ScholarSearch

Description

ScholarSearch is an environment for evaluating academic question answering. Based on the ScholarSearch benchmark from Peking University, agents are given academic research questions in Chinese across 12 disciplines and must provide concise, accurate answers. An LLM grader evaluates correctness.

Capabilities

  • Academic research question answering across multiple disciplines
  • Cross-disciplinary knowledge spanning 12 fields including Computer Science, Biology, Economics, Physics, and more
  • Chinese-language academic comprehension
  • Domain-specific knowledge evaluation

Compute Requirements

Agents are given a standard environment with no sandbox or file system access.

License

Apache 2.0

Tasks

Split: test (223 tasks)

Tasks span 12 academic disciplines covering diverse fields of study. Questions are curated by undergraduate and graduate students and faculty at Peking University.

Reward Structure

ScholarSearch follows a single-turn evaluation paradigm:

  1. Agent receives an academic research question
  2. Agent submits an answer via the answer tool
  3. An LLM grader (gpt-4.1) evaluates the answer against the reference answer
  4. Binary reward: 1.0 if correct, 0.0 if incorrect
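The four steps above can be sketched as a minimal loop. This is an illustrative assumption, not the OpenReward API: grade() stands in for the actual gpt-4.1 LLM grading call (exact match is used here only for demonstration), and run_task() and the agent callable are hypothetical names.

```python
# Minimal sketch of the single-turn evaluation flow. grade() is a stand-in
# for the gpt-4.1 LLM grader (exact match for illustration only); run_task()
# and the agent callable are hypothetical, not the real OpenReward API.

def grade(answer: str, reference: str) -> float:
    """Binary grading: 1.0 if the answer matches the reference, else 0.0."""
    return 1.0 if answer.strip().lower() == reference.strip().lower() else 0.0

def run_task(question: str, reference: str, agent) -> dict:
    """One episode: pose the question, collect the answer, grade it."""
    answer = agent(question)            # steps 1-2: agent answers the question
    reward = grade(answer, reference)   # step 3: grader scores the answer
    return {"reward": reward, "finished": True}  # step 4: binary reward
```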

Data

File: ScholarSearch.json (223 entries)

Data is sourced from the Hugging Face dataset PKU-DS-LAB/ScholarSearch. Task data is stored on the OpenReward platform.
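For reference, a minimal loader for the task file might look like the sketch below. Only the filename and entry count come from the text above; the per-entry field names are not specified here and would need to be checked against the actual file.

```python
import json

def load_tasks(path: str = "ScholarSearch.json") -> list[dict]:
    """Load the task entries (223 in the test split) from the JSON file.
    The per-entry schema is an unknown here; this only assumes the file
    is a JSON array of objects."""
    with open(path, encoding="utf-8") as f:
        tasks = json.load(f)
    assert isinstance(tasks, list), "expected a JSON array of task entries"
    return tasks
```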

Tools

Tool: answer

Submit a text answer for LLM-based grading.

Parameters:

  • text (string): The answer to the academic question

Returns:

  • reward (float): 1.0 if correct, 0.0 if incorrect
  • finished (bool): True (single-turn environment)
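As a sketch, the tool's response can be modeled as a plain dict whose keys mirror the Returns list above. The validator below is illustrative, not part of the environment:

```python
def validate_answer_response(resp: dict) -> bool:
    """Check a response against the documented schema: a binary float
    reward (0.0 or 1.0) plus a finished flag that is always True in
    this single-turn setting."""
    return (
        isinstance(resp.get("reward"), float)
        and resp["reward"] in (0.0, 1.0)
        and resp.get("finished") is True
    )
```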

Time Horizon

Single-turn. Each task is evaluated in a single interaction.

Environment Difficulty

Model                        Accuracy
gpt-4o-search-preview        18.83%
gpt-4o-mini-search-preview   10.31%
deepseek-r1-0528              8.52%
gpt-4.1                       7.17%
gpt-4o-2024-11-20             3.59%
gpt-4o-mini                   2.24%

Other Environment Requirements

OpenAI API key required for LLM-based grading. Pass via secrets={"openai_api_key": "..."} when creating an environment session.
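One way to assemble that secrets mapping is to read the key from the process environment, as in the sketch below. Only the {"openai_api_key": "..."} shape comes from the text above; reading OPENAI_API_KEY is an assumed convention, and the session-creation call itself is omitted.

```python
import os

def build_secrets() -> dict:
    """Build the secrets mapping passed when creating an environment
    session. Reads OPENAI_API_KEY from the process environment (an
    assumed convention; any source of the key works)."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is required for LLM-based grading")
    return {"openai_api_key": key}
```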

Safety

Agents in ScholarSearch answer academic questions in a standard environment. The environment does not present direct safety risks.

Citation

@misc{zhou2025scholarsearchbenchmarkingscholarsearching,
  title={ScholarSearch: Benchmarking Scholar Searching Ability of LLMs},
  author={Junting Zhou and Wang Li and Yiyan Liao and Nengyuan Zhang and Tingjia Miao and Zhihui Qi and Yuhan Wu and Tong Yang},
  year={2025},
  eprint={2506.13784},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2506.13784}
}