GeneralReasoner

Name: GeneralReasoning/GeneralReasoner
Author: General Reasoning

Description

GeneralReasoner is an environment for evaluating general reasoning capabilities using the WebInstruct-verified dataset from the General-Reasoner project by TIGER-AI-Lab. It provides diverse reasoning tasks spanning multiple categories and difficulty levels, with LLM-based semantic grading for flexible answer evaluation.

Capabilities

General reasoning evaluation across multiple domains
Multi-category question answering
Semantic answer verification
Varied difficulty levels

Compute Requirements

Agents are given a standard environment with no sandbox or file system access.

Tasks

There are two splits in this environment:

train: 228,736 tasks
test: 1,000 tasks

Tasks span multiple categories with varying difficulty levels.

Reward Structure

This is a single-turn environment. The agent submits an answer via the `answer` tool, and an LLM grader (gpt-5-mini) judges whether it is semantically equivalent to the reference answer. Reward is binary: 1.0 if correct, 0.0 if incorrect.
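The binary reward rule can be sketched as follows. This is a minimal illustration, not the environment's actual grading code: `binary_reward` and `exact_match_grader` are hypothetical names, and the stand-in grader uses a normalized string comparison where the real environment calls an LLM.

```python
from typing import Callable

# Hypothetical signature: (question, submitted answer, reference answer) -> is it correct?
Grader = Callable[[str, str, str], bool]


def binary_reward(question: str, submitted: str, reference: str, grader: Grader) -> float:
    """Return 1.0 when the grader accepts the answer, otherwise 0.0."""
    return 1.0 if grader(question, submitted, reference) else 0.0


def exact_match_grader(question: str, submitted: str, reference: str) -> bool:
    """Stand-in for the LLM grader (assumption): case- and whitespace-insensitive match."""
    return submitted.strip().lower() == reference.strip().lower()


if __name__ == "__main__":
    print(binary_reward("What is 2 + 2?", " 4 ", "4", exact_match_grader))
```

A semantic grader would also accept equivalent phrasings that this string comparison rejects, which is why the environment relies on an LLM rather than exact matching.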

Data

Data consists of Parquet files sourced from the WebInstruct-verified dataset. Each row contains a question, answer, answer type, category, and difficulty level. Data is stored on the OpenReward platform.
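As a rough sketch of the row layout described above, the five fields could be modeled like this. The field names (`question`, `answer`, `answer_type`, `category`, `difficulty`) and the `Task` and `task_from_row` helpers are assumptions for illustration; the actual Parquet column names and types may differ.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Task:
    # Field names are assumed, not taken from the dataset schema.
    question: str
    answer: str
    answer_type: str
    category: str
    difficulty: str


def task_from_row(row: dict) -> Task:
    """Build a Task from a row dict, ignoring extra columns.

    Raises KeyError if one of the assumed columns is missing.
    """
    return Task(**{f.name: str(row[f.name]) for f in fields(Task)})


if __name__ == "__main__":
    # Made-up row for illustration only; not taken from the dataset.
    row = {
        "question": "What is 2 + 2?",
        "answer": "4",
        "answer_type": "Integer",
        "category": "Mathematics",
        "difficulty": "Easy",
        "extra_column": "ignored",
    }
    print(task_from_row(row))
```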

Tools

Tool	Description
`answer`	Submit your answer for LLM grading. Ends the episode.

Time Horizon

Single-turn. The agent reads the question and submits one answer.

Environment Difficulty

[Put environment difficulty statistics here]

Other Environment Requirements

An OpenAI API key is required for LLM-based grading. Pass it via `secrets={"openai_api_key": "..."}`.
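One way to build that `secrets` mapping without hard-coding the key is to read it from the process environment. This is a sketch under assumptions: the `OPENAI_API_KEY` variable name and the `build_secrets` helper are illustrative, and the call that receives `secrets` is not shown because it depends on the client you use.

```python
import os
from typing import Mapping


def build_secrets(env: Mapping[str, str] = os.environ) -> dict:
    """Return the secrets mapping, failing early if the key is absent or empty."""
    key = env.get("OPENAI_API_KEY")  # assumed variable name
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return {"openai_api_key": key}


if __name__ == "__main__":
    secrets = build_secrets()
    # Never print the key itself; listing the names is enough to confirm the shape.
    print(sorted(secrets))
```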

Safety

Agents in GeneralReasoner answer reasoning questions in a standard environment. The environment does not present direct safety risks.

Citation

@inproceedings{ma2025generalreasoner,
  title={General-Reasoner: Advancing {LLM} Reasoning Across All Domains},
  author={Ma, Xueguang and Liu, Qian and Jiang, Dongfu and Zhang, Ge and Ma, Zejun and Chen, Wenhu},
  booktitle={Proceedings of the Neural Information Processing Systems (NeurIPS)},
  year={2025}
}

Repository

Source repository

EnvCommons/GeneralReasonerDataset

Compute Configuration

Resource allocation for this environment.

Component	Configuration
Environment Server	1 vCPU / 4 GB RAM
Sandbox Machine	Not configured

Estimated Cost

Pay per second of active session usage. Billing starts when your session begins and stops when it ends.

Component	Cost / second
Environment	$0.0000320
Sandbox	Not configured
Total	$0.0000320

Examples

5-minute session	$0.0096
1-hour session	$0.1152
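The session figures follow directly from the per-second rate: cost equals rate multiplied by active seconds. A small sketch of that arithmetic, using `Decimal` to avoid floating-point rounding (the `session_cost` helper is hypothetical; the rate is the Total from the cost table):

```python
from decimal import Decimal

# Total cost per second of active session, from the cost table.
RATE_PER_SECOND = Decimal("0.0000320")


def session_cost(seconds: int) -> Decimal:
    """Cost in US dollars for a session that is active for `seconds` seconds."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return RATE_PER_SECOND * seconds


if __name__ == "__main__":
    for label, seconds in [("5-minute session", 5 * 60), ("1-hour session", 60 * 60)]:
        print(f"{label}: ${session_cost(seconds):.4f}")
```

With 300 and 3,600 seconds this gives $0.0096 and $0.1152, matching the listed examples.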