ScrapeBench
Description
ScrapeBench is an environment for evaluating an agent's ability to scrape structured data from websites. Given a natural language instruction describing what data to extract from a specific website, the agent must write code to scrape, parse, and structure the data into JSON or JSONL format. Tasks cover diverse domains including sports statistics and financial data.
Capabilities
- Web scraping using Python libraries (requests, BeautifulSoup, Selenium)
- Parsing and structuring web data into JSON/JSONL format
- Working with diverse web sources
- Developing robust data extraction pipelines
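A typical task exercises these capabilities together: fetch a page, parse it, and emit structured records. The sketch below parses an inlined HTML table with BeautifulSoup and writes JSONL; the table id, field names, and record schema are hypothetical stand-ins (in a real task the HTML would come from requests.get(url).text and the schema from the task instructions).

```python
import json
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a fetched page; a real task
# would obtain this via requests.get(url).text.
HTML = """
<table id="stats">
  <tr><th>Player</th><th>Points</th></tr>
  <tr><td>Alice</td><td>31</td></tr>
  <tr><td>Bob</td><td>27</td></tr>
</table>
"""

def parse_rows(html: str) -> list[dict]:
    """Extract one record per data row of the #stats table."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.select("#stats tr")[1:]  # skip the header row
    records = []
    for tr in rows:
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        records.append({"player": cells[0], "points": int(cells[1])})
    return records

def write_jsonl(records: list[dict], path: str = "submission.jsonl") -> None:
    """Write one JSON object per line, the JSONL submission format."""
    with open(path, "w") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")

if __name__ == "__main__":
    write_jsonl(parse_rows(HTML))
```

For JavaScript-heavy pages the same pipeline applies, with Selenium supplying the rendered HTML instead of requests.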
Compute Requirements
Agents are given a sandbox with 1 CPU and 2GB RAM, with network access enabled for web scraping. The sandbox includes Python with web scraping libraries (requests, BeautifulSoup, Selenium, lxml).
Tasks
There are two splits in this environment:
- Train: 29 web scraping tasks
- Test: 29 web scraping tasks
Tasks span sports statistics, financial data, and other structured web content.
Reward Structure
This is a multi-turn, sandbox-based environment. The agent writes scraping code, executes it, and submits results as submission.json or submission.jsonl. Reward is continuous (0.0 to 1.0) based on deterministic algorithmic matching:
- JSONL format: Bipartite matching with 60% similarity threshold, reward = matched entries / total ground truth entries
- JSON format: Field-by-field comparison with string similarity (difflib.SequenceMatcher) and numeric tolerance (10% relative error)
No LLM graders are used.
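The scoring rules above can be sketched in pure stdlib Python. The 60% similarity threshold and 10% relative tolerance come from the description; everything else (field weighting, the greedy stand-in for bipartite matching, zero-handling for numerics) is an illustrative assumption, not the evaluator's actual code.

```python
import difflib

def field_match(pred, truth, str_threshold: float = 0.6, rel_tol: float = 0.10) -> bool:
    """Compare one field: numeric tolerance for numbers, string similarity otherwise.
    Exact edge-case handling here is an assumption."""
    if isinstance(truth, (int, float)) and isinstance(pred, (int, float)):
        if truth == 0:
            return pred == 0
        return abs(pred - truth) / abs(truth) <= rel_tol  # 10% relative error
    ratio = difflib.SequenceMatcher(None, str(pred), str(truth)).ratio()
    return ratio >= str_threshold

def json_reward(pred: dict, truth: dict) -> float:
    """JSON format: fraction of ground-truth fields the submission matches."""
    if not truth:
        return 0.0
    hits = sum(1 for k, v in truth.items() if k in pred and field_match(pred[k], v))
    return hits / len(truth)

def jsonl_reward(preds: list[dict], truths: list[dict], threshold: float = 0.6) -> float:
    """JSONL format: one-to-one entry matching, reward = matched / total ground truth.
    Greedy matching here approximates the bipartite matching described above."""
    if not truths:
        return 0.0
    used, matched = set(), 0
    for t in truths:
        best_i, best_s = None, 0.0
        for i, p in enumerate(preds):
            if i not in used and json_reward(p, t) > best_s:
                best_i, best_s = i, json_reward(p, t)
        if best_i is not None and best_s >= threshold:
            used.add(best_i)
            matched += 1
    return matched / len(truths)
```

For example, a submitted entry with points = 30 against a ground truth of 31 is within the 10% tolerance, so it still counts as a match.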
Data
Ground truth data consists of reference JSON/JSONL files for each task stored on the OpenReward platform. Tasks reference live websites that agents must scrape.
Tools
Agents get CLI tools (bash, read, write, edit, multi_edit, grep, glob, ls, todo_write) plus 1 environment-specific tool:
| Tool | Description |
|---|---|
| submit_solution | Submit scraped data (submission.json or submission.jsonl) for evaluation against ground truth. |
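For the JSONL case, the submitted file contains one JSON object per line. The fields below are hypothetical; the actual schema is dictated by each task's instructions.

```jsonl
{"player": "Alice", "points": 31}
{"player": "Bob", "points": 27}
```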
Time Horizon
ScrapeBench tasks unfold over multiple turns: agents read the scraping instructions, write code, test and iterate on it in the sandbox, and submit structured results.
Environment Difficulty
[Put environment difficulty statistics here]
Other Environment Requirements
There are no further environment requirements; ScrapeBench works out of the box with the OpenReward endpoint without any external API keys.
Safety
Agents in ScrapeBench scrape publicly available websites in a sandboxed environment. The environment does not present direct safety risks.
Citation
@dataset{GRScrapeBench,
author = {General Reasoning Inc. Team},
title = {ScrapeBench},
year = {2026},
publisher = {OpenReward},
url = {https://openreward.ai/GeneralReasoning/ScrapeBench}
}