GeneralReasoner
Description
GeneralReasoner is an environment for evaluating general reasoning capabilities using the WebInstruct-verified dataset from the General-Reasoner project by TIGER-AI-Lab. It provides diverse reasoning tasks spanning multiple categories and difficulty levels, with LLM-based semantic grading for flexible answer evaluation.
Capabilities
- General reasoning evaluation across multiple domains
- Multi-category question answering
- Semantic answer verification
- Varied difficulty levels
Compute Requirements
Agents are given a standard environment with no sandbox or file system access.
Tasks
There are two splits in this environment:
- train: 228,736 tasks
- test: 1,000 tasks
Tasks span multiple categories with varying difficulty levels.
Reward Structure
This is a single-turn environment. The agent submits an answer via the answer tool. An LLM grader (gpt-5-mini) evaluates semantic correctness against the reference answer. Reward is binary: 1.0 if correct, 0.0 if incorrect.
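The binary reward rule can be sketched as follows. This is only an illustration: `binary_reward` and `normalized_match` are hypothetical names, and the string-matching judge is a stand-in for the LLM grader, which is not reproduced here.

```python
from typing import Callable


def normalized_match(submitted: str, reference: str) -> bool:
    """Stand-in judge (assumption): case- and whitespace-insensitive equality.

    The real environment uses an LLM grader for semantic correctness;
    this placeholder only keeps the sketch runnable offline.
    """
    def normalize(text: str) -> str:
        return " ".join(text.split()).lower()

    return normalize(submitted) == normalize(reference)


def binary_reward(
    submitted: str,
    reference: str,
    judge: Callable[[str, str], bool] = normalized_match,
) -> float:
    """Return 1.0 if the judge accepts the answer, otherwise 0.0."""
    return 1.0 if judge(submitted, reference) else 0.0
```

Passing the judge as a parameter keeps the reward rule separate from the grading method, so a semantic grader could be swapped in without changing the 1.0/0.0 mapping.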
Data
Data consists of Parquet files sourced from the WebInstruct-verified dataset. Each row contains a question, answer, answer type, category, and difficulty level. Data is stored on the OpenReward platform.
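As a rough sketch of the row layout described above, a single row could be modeled like this. The field names (`question`, `answer`, `answer_type`, `category`, `difficulty`) are assumptions derived from the description, not confirmed Parquet column names, and `ReasoningTask` and `task_from_row` are hypothetical helpers.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping


@dataclass(frozen=True)
class ReasoningTask:
    """One dataset row; field names are assumed, not taken from the files."""
    question: str
    answer: str
    answer_type: str
    category: str
    difficulty: str


def task_from_row(row: Mapping[str, Any]) -> ReasoningTask:
    """Build a ReasoningTask from a dict-like row, failing loudly on missing fields."""
    names = [f.name for f in fields(ReasoningTask)]
    missing = [name for name in names if name not in row]
    if missing:
        raise KeyError(f"row is missing fields: {missing}")
    # Values are coerced to str so mixed-type columns do not break the sketch.
    return ReasoningTask(**{name: str(row[name]) for name in names})
```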
Tools
| Tool | Description |
|---|---|
| answer | Submit your answer for LLM grading. Ends the episode. |
Time Horizon
Single-turn. The agent reads the question and submits one answer.
Environment Difficulty
[Put environment difficulty statistics here]
Other Environment Requirements
An OpenAI API key is required for LLM-based grading. Pass it via secrets={"openai_api_key": "..."}.
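One way to avoid hard-coding the key is to build the secrets mapping from an environment variable. This is a minimal sketch: `build_secrets` is a hypothetical helper, and reading from `OPENAI_API_KEY` is an assumption about how the key is stored locally. Only the `openai_api_key` entry comes from the requirement above.

```python
import os
from typing import Dict, Mapping, Optional


def build_secrets(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the secrets mapping, reading the key from OPENAI_API_KEY (assumed)."""
    source = os.environ if env is None else env
    key = source.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; LLM-based grading needs it")
    return {"openai_api_key": key}
```

The resulting dictionary can then be supplied wherever the environment accepts its `secrets` argument.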
Safety
Agents in GeneralReasoner answer reasoning questions in a standard environment. The environment does not present direct safety risks.
Citation
@inproceedings{ma2025generalreasoner,
title={General-Reasoner: Advancing {LLM} Reasoning Across All Domains},
author={Ma, Xueguang and Liu, Qian and Jiang, Dongfu and Zhang, Ge and Ma, Zejun and Chen, Wenhu},
booktitle={Proceedings of the Neural Information Processing Systems (NeurIPS)},
year={2025}
}