SMT2025

Description

SMT2025 is an environment for evaluating mathematical reasoning on problems from the 2025 Stanford Math Tournament (SMT). Agents solve competition-level mathematics problems and submit their answers in LaTeX \boxed{} format. A specialized grader parses and verifies the submitted answers.

Capabilities

  • Competition-level mathematical problem solving
  • LaTeX answer parsing and verification
  • Stanford Math Tournament problem evaluation
  • Multi-step mathematical reasoning

Compute Requirements

Agents are given a standard environment with no sandbox or file system access.

Tasks

There is one split in this environment:

  • test: SMT 2025 competition problems

Problems cover various mathematical topics from the Stanford Math Tournament.

Reward Structure

This is a sparse, verifiable reward environment. The agent calls the answer tool to submit a solution and receives one of two rewards:

  • 1.0: Answer matches the gold answer after parsing
  • 0.0: Answer is incorrect

The grader parses both the model's answer and the gold answer before comparing them, so that differences in LaTeX formatting alone do not cause a mismatch.
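The environment's actual grader is not reproduced in this README. As a rough illustration of what parsing both answers can involve, the Python sketch below extracts the last \boxed{...} from a response, applies a few textual LaTeX normalizations, and falls back to exact rational comparison for numeric answers. The function names (extract_boxed, normalize, as_number, reward) and every normalization rule are hypothetical assumptions, not the environment's implementation.

```python
import re
from fractions import Fraction
from typing import Optional

# Hypothetical simplified grader for illustration only; it is NOT the
# environment's actual grader.


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len("\\boxed{"):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # the box was never closed


def normalize(ans: str) -> str:
    """Apply a few assumed, purely textual LaTeX normalizations."""
    s = re.sub(r"\\(left|right)(?![A-Za-z])", "", ans)  # drop \left / \right
    s = re.sub(r"\\[,;:! ]", "", s)  # drop LaTeX spacing commands
    s = s.replace("\\dfrac", "\\frac").replace("\\tfrac", "\\frac")
    s = re.sub(r"\s+", "", s)  # drop all whitespace
    return s.rstrip(".")


def as_number(s: str) -> Optional[Fraction]:
    """Parse integers, decimals, a/b, or \\frac{a}{b} as an exact rational."""
    m = re.fullmatch(r"(-?)\\frac\{(-?\d+)\}\{(-?\d+)\}", s)
    try:
        if m:
            value = Fraction(int(m.group(2)), int(m.group(3)))
            return -value if m.group(1) else value
        return Fraction(s)
    except (ValueError, ZeroDivisionError):
        return None


def reward(model_output: str, gold: str) -> float:
    """Sparse binary reward: 1.0 on a match after normalization, else 0.0."""
    pred = extract_boxed(model_output)
    if pred is None:
        return 0.0  # no boxed answer to grade
    p, g = normalize(pred), normalize(gold)
    if p == g:
        return 1.0
    p_num, g_num = as_number(p), as_number(g)
    return 1.0 if p_num is not None and p_num == g_num else 0.0
```

Exact rational comparison lets, for example, 0.5 and \frac{1}{2} count as the same answer without a floating-point tolerance. Symbolic answers such as radicals or expressions would need a computer-algebra check, which this sketch does not attempt.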

Data

Data is sourced from the MathArena/smt_2025 dataset on Hugging Face.
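A minimal sketch of pulling the problems with the Hugging Face datasets library follows. The split name "train" and the column names problem and answer are assumptions about the dataset schema, and row_to_task and its prompt wording are hypothetical; check the dataset card before relying on any of them.

```python
from typing import Any, Dict, List

# ASSUMPTION: rows expose "problem" and "answer" columns; check the dataset card.
PROMPT_SUFFIX = "\n\nPut your final answer within \\boxed{}."  # hypothetical wording


def row_to_task(row: Dict[str, Any]) -> Dict[str, str]:
    """Convert one dataset row into a prompt and a gold answer string."""
    return {
        "prompt": str(row["problem"]).strip() + PROMPT_SUFFIX,
        "gold": str(row["answer"]).strip(),
    }


def load_tasks(split: str = "train") -> List[Dict[str, str]]:
    """Download MathArena/smt_2025 and convert every row.

    ASSUMPTION: the Hub split is named "train"; it may differ from this
    environment's "test" split name.
    """
    # Imported lazily so row_to_task works without `datasets` installed.
    from datasets import load_dataset

    ds = load_dataset("MathArena/smt_2025", split=split)
    return [row_to_task(row) for row in ds]
```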

Tools

Tool      Description
answer    Submit the final answer (use \boxed{} format)
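The README does not specify the exact arguments of the answer tool. Assuming it accepts a single answer string, a hypothetical helper like the one below wraps a final LaTeX answer in \boxed{} and rejects unbalanced braces, which would otherwise produce a box the grader likely cannot parse. The name make_submission is illustrative only.

```python
def make_submission(answer_latex: str) -> str:
    """Wrap a LaTeX answer in \\boxed{}; raise ValueError on unbalanced braces.

    Hypothetical helper: the answer tool's real argument format is not
    documented here. Escaped braces (\\{ \\}) are counted like plain ones,
    which is fine as long as they appear in matched pairs.
    """
    depth = 0
    for ch in answer_latex:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced '}' in answer")
    if depth != 0:
        raise ValueError("unbalanced '{' in answer")
    return "\\boxed{" + answer_latex.strip() + "}"
```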

Time Horizon

Single-turn. The agent receives a problem and submits one answer.

Other Environment Requirements

No secrets are required other than an OpenReward API key.

Safety

Agents in SMT2025 solve mathematical problems in a standard environment. The environment does not present direct safety risks.

Citation

@inproceedings{balunovic2025matharena,
  title={MathArena: Evaluating LLMs on Uncontaminated Math Competitions},
  author={Balunovi{\'c}, Mislav and Dekoninck, Jasper and Petrov, Ivo and Jovanovi{\'c}, Nikola and Vechev, Martin},
  booktitle={Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks (NeurIPS)},
  year={2025}
}
GeneralReasoning/SMT2025 | OpenReward