MathVista
Description
MathVista is an environment for evaluating vision-language mathematical reasoning. It contains 6,141 examples combining 3 newly created datasets with 28 existing datasets, testing mathematical reasoning within visual contexts including geometry diagrams, function plots, charts, tables, and scientific figures.
Capabilities
- Vision-language mathematical reasoning
- Multiple choice and free-form question answering
- Mathematical problem solving with visual context
- Multi-domain reasoning (arithmetic, algebra, geometry, statistics, scientific reasoning)
Compute Requirements
Agents are given a standard environment with no sandbox or file system access.
License
Tasks
There are two splits in this environment:
- testmini: 1,000 tasks
- test: 5,141 tasks
Questions span seven mathematical reasoning skills (algebraic, arithmetic, geometry, logical, numeric commonsense, scientific, and statistical reasoning) and a range of visual contexts (diagrams, charts, plots, tables, scientific figures).
Reward Structure
This is a single-turn environment. The agent submits an answer via the submit_answer tool. An LLM grader (gpt-5-mini) evaluates answer correctness with flexible matching:
- Multiple choice: Letter matching, full text matching, or semantic equivalence
- Integer: Exact integer value extraction and comparison
- Float: Numerical comparison with precision-based tolerance
- List: Element-by-element comparison
- Text: Semantic equivalence
Reward is binary: 1.0 if correct, 0.0 if incorrect.
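The grading itself is done by an LLM, but the per-type rules above can be approximated deterministically. The sketch below is a hypothetical, rule-based stand-in (the function `grade` and its parameters are illustrative, not part of the environment). It assumes each task carries an `answer_type` of "integer", "float", "list", or "text", an optional list of `choices`, and a `precision` giving the number of decimal places for float answers; these mirror fields on the AI4Math/MathVista dataset card but are assumptions here. Text answers are only normalized for case and whitespace, since true semantic equivalence is what the LLM grader adds.

```python
from __future__ import annotations

import re


def _numbers(text: str) -> list[float]:
    """Extract every number in `text`, in order of appearance."""
    return [float(m) for m in re.findall(r"-?\d+(?:\.\d+)?", text)]


def _norm(text: str) -> str:
    """Lower-case and collapse whitespace for loose string comparison."""
    return " ".join(text.lower().split())


def grade(prediction: str, answer: str, answer_type: str,
          choices: list[str] | None = None, precision: int = 2) -> float:
    """Hypothetical rule-based stand-in for the LLM grader; returns 1.0 or 0.0."""
    pred = prediction.strip()

    if choices:
        # Multiple choice: map a bare letter ("B", "(B)", "b.") to its option text,
        # then fall back to full-text matching against the gold option.
        letters = [chr(ord("A") + i) for i in range(len(choices))]
        letter = pred.strip("().: ").upper()
        if letter in letters:
            pred = choices[letters.index(letter)]
        return float(_norm(pred) == _norm(answer))

    if answer_type == "integer":
        # Strip thousands separators, take the first number, require an exact match.
        nums = _numbers(pred.replace(",", ""))
        return float(bool(nums) and nums[0] == int(answer))

    if answer_type == "float":
        # Precision-based tolerance: compare after rounding both to `precision` decimals.
        nums = _numbers(pred.replace(",", ""))
        return float(bool(nums) and
                     round(nums[0], precision) == round(float(answer), precision))

    if answer_type == "list":
        # Element-by-element comparison of the numbers in each list.
        return float(_numbers(pred) == _numbers(answer))

    # Text: the real grader checks semantic equivalence; this only normalizes case/spacing.
    return float(_norm(pred) == _norm(answer))
```

Rounding both sides to the required number of decimals keeps the float rule symmetric: an answer of 3.142 matches a gold value of 3.14 at precision 2, while 3.15 does not.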
Data
Data consists of Parquet files (testmini.parquet, test.parquet) sourced from HuggingFace AI4Math/MathVista. Each row contains a question, image, choices (for multiple choice), answer, and metadata. Data is stored on the OpenReward platform.
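For multiple-choice rows, the agent answers with a letter, so the `choices` column has to be presented with letters attached. The environment's actual prompt template is not documented here; the sketch below is one plausible way to do it, assuming `choices` is a list of option strings (or empty/None for free-form questions) and that letters are assigned A, B, C, ... in list order. The helper `format_prompt` is hypothetical.

```python
from __future__ import annotations

import string


def format_prompt(question: str, choices: list[str] | None) -> str:
    """Hypothetical prompt builder: append lettered options for multiple-choice rows."""
    if not choices:
        # Free-form question: the agent submits a value, so no options are listed.
        return question
    lines = [question, "Choices:"]
    # Assign letters in list order so "A" maps to choices[0], "B" to choices[1], ...
    for letter, option in zip(string.ascii_uppercase, choices):
        lines.append(f"({letter}) {option}")
    return "\n".join(lines)
```

Keeping the letter order identical to the stored `choices` order is what lets a submitted letter be mapped back to the option text during grading.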
Tools
| Tool | Description |
|---|---|
| submit_answer | Submit your answer (letter choice for multiple choice, value for numerical/text). Ends the episode. |
Time Horizon
Single-turn. The agent views the question and image, then submits one answer.
Environment Difficulty
MathVista evaluates mathematical reasoning in visual contexts across 31 source datasets and 7 mathematical skill categories.
Other Environment Requirements
An OpenAI API key is required for LLM-based grading. Pass it via secrets={"openai_api_key": "..."}.
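To avoid hard-coding the key, the secrets dictionary can be built from an environment variable. This is a minimal sketch assuming the key is exported as OPENAI_API_KEY (the conventional name used by OpenAI's SDKs); the helper `grading_secrets` is hypothetical, and the call that consumes `secrets` is not shown because its exact signature is not documented here.

```python
from __future__ import annotations

import os


def grading_secrets() -> dict[str, str]:
    """Build the secrets mapping for the LLM grader from the environment (hypothetical helper)."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        # Fail early rather than letting grading fail mid-run without a key.
        raise RuntimeError("Set OPENAI_API_KEY before running MathVista grading")
    return {"openai_api_key": key}
```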
Safety
Agents in MathVista solve visual mathematical problems in a standard environment. The environment does not present direct safety risks.
Citation
@inproceedings{lu2024mathvista,
author = {Lu, Pan and Bansal, Hritik and Xia, Tony and Liu, Jiacheng and
Li, Chunyuan and Hajishirzi, Hannaneh and Cheng, Hao and
Chang, Kai-Wei and Galley, Michel and Gao, Jianfeng},
title = {MathVista: Evaluating Mathematical Reasoning of Foundation Models
in Visual Contexts},
booktitle = {International Conference on Learning Representations (ICLR)},
year = {2024}
}