GSM8K


Description

GSM8K is an environment for evaluating grade school math word problem solving. Based on OpenAI's GSM8K benchmark, agents are given math word problems requiring 2-8 steps of basic arithmetic and must provide the final numerical answer.

Capabilities

  • Multi-step arithmetic reasoning
  • Grade school math problem solving
  • Numerical answer extraction

Compute Requirements

Agents are given a standard environment with no sandbox or file system access.

License

MIT.

Tasks

There are two splits in this environment:

  • train: 7,473 grade school math word problems.
  • test: 1,319 grade school math word problems.

Total: 8,792 tasks. Each task presents a math word problem requiring 2-8 steps of basic arithmetic (addition, subtraction, multiplication, division) to solve.

Reward Structure

This is a single-turn, verifiable reward environment. The agent submits its answer via the answer tool. The answer is verified using the math_verify library for mathematical equivalence against the gold answer. The reward is binary: 1.0 if the answer is correct, 0.0 if incorrect.

We do not use LLM graders for this task.
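
The binary reward can be illustrated with a minimal sketch. This is not the math_verify library: it is a standard-library stand-in that compares plain numeric strings exactly, and the function names `to_number` and `binary_reward` are hypothetical. The actual environment may accept more answer forms than this sketch does.

```python
from fractions import Fraction
from typing import Optional


def to_number(text: str) -> Optional[Fraction]:
    """Parse a plain numeric string such as '1,250', '$18', '0.5' or '3/4'.

    Hypothetical helper: a stand-in for real answer parsing, not math_verify.
    """
    cleaned = text.strip().replace(",", "").replace("$", "")
    try:
        return Fraction(cleaned)
    except (ValueError, ZeroDivisionError):
        # Not a number this sketch understands (e.g. 'eighteen' or '1/0').
        return None


def binary_reward(submitted: str, gold: str) -> float:
    """Return 1.0 when both strings parse to the same number, else 0.0."""
    submitted_value = to_number(submitted)
    gold_value = to_number(gold)
    if submitted_value is None or gold_value is None:
        return 0.0
    return 1.0 if submitted_value == gold_value else 0.0


if __name__ == "__main__":
    print(binary_reward("18.0", "18"))   # equivalent forms of the same number
    print(binary_reward("17", "18"))     # different numbers
```

Using exact rational arithmetic avoids floating-point tolerance questions: "0.5" and "1/2" compare equal, while any unparseable submission simply earns 0.0.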

Data

Data consists of two Parquet files (train-00000-of-00001.parquet, test-00000-of-00001.parquet) sourced from the openai/gsm8k dataset on Hugging Face. Each record contains a question (the math word problem) and an answer (the gold solution, ending with a final numerical answer). Data files are stored on the OpenReward platform.
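
In the upstream openai/gsm8k dataset, the answer field holds the worked solution followed by a final line of the form `#### <number>`. Assuming the Parquet files here keep that format, the final numerical answer could be pulled out roughly as follows. The record below is a made-up example in the same shape, and `extract_gold` is a hypothetical helper, not part of the environment.

```python
import re
from typing import Optional

# Matches the number after the '####' marker, allowing a sign,
# thousands separators, and a decimal part.
FINAL_ANSWER = re.compile(r"####\s*(-?[\d,]*\.?\d+)")


def extract_gold(answer_field: str) -> Optional[str]:
    """Return the final numerical answer as a string, or None if absent."""
    match = FINAL_ANSWER.search(answer_field)
    if match is None:
        return None
    # Drop thousands separators so '1,250' becomes '1250'.
    return match.group(1).replace(",", "")


# Hypothetical record, written to mimic the dataset's two fields.
record = {
    "question": "A box holds 6 pencils. How many pencils are in 3 boxes?",
    "answer": "Each box holds 6 pencils, so 3 boxes hold "
              "6 * 3 = <<6*3=18>>18 pencils.\n#### 18",
}

if __name__ == "__main__":
    print(extract_gold(record["answer"]))
```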

Tools

Agents have access to a single tool:

  • answer -- Submit a final numerical answer. The answer is checked for mathematical equivalence against the gold answer using the math_verify library. This tool finishes the episode.

Time Horizon

Single-turn. The agent reads the math problem and submits one answer.

Environment Difficulty

Model               Accuracy
GPT-4 (DUP)         97.1%
Llama 3 405B        96.8%
Claude 3.5 Sonnet   96.4%
GPT-4o              96.1%
Llama 3 70B         95.1%
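
Because the reward is binary, an accuracy figure like those above corresponds to the mean reward over the evaluated tasks. A minimal sketch, with a hypothetical `accuracy` helper and made-up rewards:

```python
from typing import Sequence


def accuracy(rewards: Sequence[float]) -> float:
    """Mean of binary rewards (1.0 = correct, 0.0 = incorrect)."""
    if not rewards:
        raise ValueError("need at least one reward")
    return sum(rewards) / len(rewards)


if __name__ == "__main__":
    # Made-up rewards for four episodes: three correct, one incorrect.
    episode_rewards = [1.0, 1.0, 0.0, 1.0]
    print(f"{accuracy(episode_rewards):.1%}")
```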

Other Environment Requirements

There are no further environment requirements; GSM8K works out of the box with the OpenReward endpoint without any external API keys.

Safety

Agents in GSM8K solve grade school math problems in a standard environment. The environment does not present direct safety risks.

Citation

@article{cobbe2021gsm8k,
  title={Training Verifiers to Solve Math Word Problems},
  author={Cobbe, Karl and Kosaraju, Vineet and Bavarian, Mohammad and Chen, Mark and Jun, Heewoo and Kaiser, Lukasz and Plappert, Matthias and Tworek, Jerry and Hilton, Jacob and Nakano, Reiichiro and Hesse, Christopher and Schulman, John},
  journal={arXiv preprint arXiv:2110.14168},
  year={2021}
}