MathVista

Description

MathVista is an environment for evaluating vision-language mathematical reasoning. It contains 6,141 examples combining 3 newly created datasets with 28 existing datasets, testing mathematical reasoning within visual contexts including geometry diagrams, function plots, charts, tables, and scientific figures.

Capabilities

  • Vision-language mathematical reasoning
  • Multiple-choice and free-form question answering
  • Mathematical problem solving with visual context
  • Multi-domain reasoning (arithmetic, algebra, geometry, statistics, scientific reasoning)

Compute Requirements

Agents are given a standard environment with no sandbox or file system access.

License

CC BY-SA 4.0.

Tasks

There are two splits in this environment:

  • testmini: 1,000 tasks
  • test: 5,141 tasks

Questions span multiple mathematical skills (arithmetic, algebraic, geometric, statistical reasoning) and visual contexts (diagrams, charts, plots, tables, scientific figures).

Reward Structure

This is a single-turn environment. The agent submits an answer via the submit_answer tool. An LLM grader (gpt-5-mini) evaluates answer correctness with flexible matching:

  • Multiple choice: Letter matching, full text matching, or semantic equivalence
  • Integer: Exact integer value extraction and comparison
  • Float: Numerical comparison with precision-based tolerance
  • List: Element-by-element comparison
  • Text: Semantic equivalence

Reward is binary: 1.0 if correct, 0.0 if incorrect.
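
The environment grades with an LLM, so the exact matching logic is not reproduced here. As a rough illustration of the answer-type rules listed above, the following rule-based sketch approximates them in plain Python. The function names, the answer_type labels ("integer", "float", "list", "text"), and the precision parameter (decimal places used for float comparison) are assumptions made for this example, and exact string comparison stands in for semantic equivalence.

```python
import re
from typing import Optional, Sequence

NUMBER = r"-?\d+(?:\.\d+)?"


def extract_number(text: str) -> Optional[float]:
    """Return the first number found in text, or None if there is none."""
    match = re.search(NUMBER, text.replace(",", ""))
    return float(match.group()) if match else None


def grade(prediction: str, answer: str, answer_type: str,
          choices: Optional[Sequence[str]] = None, precision: int = 1) -> float:
    """Return 1.0 if the prediction matches the reference answer, else 0.0."""
    pred = prediction.strip()
    if choices:
        # Multiple choice: map a bare letter such as "B" or "(b)" to its option text.
        letters = "ABCDEFGH"[:len(choices)]
        token = pred.strip("(). ").upper()
        if len(token) == 1 and token in letters:
            pred = choices[letters.index(token)]
        return float(pred.lower() == answer.strip().lower())
    if answer_type == "integer":
        # Integer: the extracted value must equal the reference exactly.
        value = extract_number(pred)
        return float(value is not None and value == int(answer))
    if answer_type == "float":
        # Float: compare after rounding both sides to `precision` decimal places.
        value = extract_number(pred)
        return float(value is not None
                     and round(value, precision) == round(float(answer), precision))
    if answer_type == "list":
        # List: element-by-element comparison of the numbers in order.
        pred_items = [float(x) for x in re.findall(NUMBER, pred)]
        ref_items = [float(x) for x in re.findall(NUMBER, answer)]
        return float(bool(ref_items) and pred_items == ref_items)
    # Text: case-insensitive equality as a stand-in for semantic equivalence.
    return float(pred.lower() == answer.strip().lower())
```

A rule-based checker like this is stricter than an LLM grader: it would reject a correct answer phrased differently from the reference, which is presumably why the environment delegates the judgment to a model.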

Data

The data consists of two Parquet files (testmini.parquet, test.parquet) sourced from the Hugging Face dataset AI4Math/MathVista. Each row contains a question, an image, answer choices (for multiple-choice questions), the reference answer, and metadata. The data is stored on the OpenReward platform.
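
To make the row layout concrete, here is a minimal sketch of turning one row into a text prompt. The key names "question" and "choices" follow the fields described above but are assumptions about the exact column names, build_prompt is a hypothetical helper, and the example row is invented for illustration rather than taken from the dataset. The image would be passed to the model separately.

```python
from typing import Any, Mapping


def build_prompt(row: Mapping[str, Any]) -> str:
    """Format one task row as a text prompt (the image is handled elsewhere)."""
    lines = [row["question"]]
    choices = row.get("choices")
    # Free-form rows are assumed to have no choices (None or an empty list).
    if choices is not None and len(choices) > 0:
        lines.append("Choices:")
        lines.extend(f"({chr(ord('A') + i)}) {c}" for i, c in enumerate(choices))
        lines.append("Answer with the letter of the correct option.")
    else:
        lines.append("Answer with a single value.")
    return "\n".join(lines)


# Hypothetical row for illustration only; not an actual dataset entry.
example_row = {
    "question": "What is the length of side AB?",
    "choices": ["3 cm", "5 cm", "7 cm"],
    "answer": "5 cm",
}
print(build_prompt(example_row))
```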

Tools

Tool            Description
submit_answer   Submit your answer (letter choice for multiple choice, value for numerical/text). Ends the episode.

Time Horizon

Single-turn. The agent views the question and image, then submits one answer.

Environment Difficulty

MathVista evaluates mathematical reasoning in visual contexts across 31 source datasets and 7 mathematical skill categories.

Other Environment Requirements

An OpenAI API key is required for LLM-based grading. Pass it via secrets={"openai_api_key": "..."}.

Safety

Agents in MathVista solve visual mathematical problems in a standard environment. The environment does not present direct safety risks.

Citation

@inproceedings{lu2024mathvista,
  author = {Lu, Pan and Bansal, Hritik and Xia, Tony and Liu, Jiacheng and
            Li, Chunyuan and Hajishirzi, Hannaneh and Cheng, Hao and
            Chang, Kai-Wei and Galley, Michel and Gao, Jianfeng},
  title = {MathVista: Evaluating Mathematical Reasoning of Foundation Models
           in Visual Contexts},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year = {2024}
}