ScrapeBench
Description
ScrapeBench is an environment for evaluating an agent's ability to scrape structured data from websites. Given a natural language instruction describing what data to extract from a specific website, the agent must write code to scrape, parse, and structure the data into JSON or JSONL format. Tasks cover diverse domains including sports statistics and financial data.
Capabilities
- Web scraping using Python libraries (requests, BeautifulSoup, Selenium)
- Parsing and structuring web data into JSON/JSONL format
- Working with diverse web sources
- Developing robust data extraction pipelines
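A typical task exercises these capabilities together: fetch a page, parse it, and emit structured records. The sketch below parses an inlined HTML table with BeautifulSoup and writes JSONL; the table id, field names, and record schema are hypothetical stand-ins (in a real task the HTML would come from requests.get(url).text and the schema from the task instructions).

```python
import json
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a fetched page; a real task
# would obtain this via requests.get(url).text.
HTML = """
<table id="stats">
  <tr><th>Player</th><th>Points</th></tr>
  <tr><td>Alice</td><td>31</td></tr>
  <tr><td>Bob</td><td>27</td></tr>
</table>
"""

def parse_rows(html: str) -> list[dict]:
    """Extract one record per data row of the #stats table."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.select("#stats tr")[1:]  # skip the header row
    records = []
    for tr in rows:
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        records.append({"player": cells[0], "points": int(cells[1])})
    return records

def write_jsonl(records: list[dict], path: str = "submission.jsonl") -> None:
    """Write one JSON object per line, the JSONL submission format."""
    with open(path, "w") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")

if __name__ == "__main__":
    write_jsonl(parse_rows(HTML))
```

For JavaScript-heavy pages the same pipeline applies, with Selenium supplying the rendered HTML instead of requests.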
Compute Requirements
Agents are given a sandbox with 1 CPU and 2GB RAM, with network access enabled for web scraping. The sandbox includes Python with web scraping libraries (requests, BeautifulSoup, Selenium, lxml).
Tasks
There are two splits in this environment:
- Train: 29 web scraping tasks
- Test: 29 web scraping tasks
Tasks span sports statistics, financial data, and other structured web content.
Reward Structure
This is a multi-turn, sandbox-based environment. The agent writes scraping code, executes it, and submits results as submission.json or submission.jsonl. Reward is continuous (0.0 to 1.0) based on deterministic algorithmic matching:
- JSONL format: Bipartite matching with 60% similarity threshold, reward = matched entries / total ground truth entries
- JSON format: Field-by-field comparison with string similarity (difflib.SequenceMatcher) and numeric tolerance (10% relative error)
No LLM graders are used.
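The scoring rules above can be sketched in pure stdlib Python. The 60% similarity threshold and 10% relative tolerance come from the description; everything else (field weighting, the greedy stand-in for bipartite matching, zero-handling for numerics) is an illustrative assumption, not the evaluator's actual code.

```python
import difflib

def field_match(pred, truth, str_threshold: float = 0.6, rel_tol: float = 0.10) -> bool:
    """Compare one field: numeric tolerance for numbers, string similarity otherwise.
    Exact edge-case handling here is an assumption."""
    if isinstance(truth, (int, float)) and isinstance(pred, (int, float)):
        if truth == 0:
            return pred == 0
        return abs(pred - truth) / abs(truth) <= rel_tol  # 10% relative error
    ratio = difflib.SequenceMatcher(None, str(pred), str(truth)).ratio()
    return ratio >= str_threshold

def json_reward(pred: dict, truth: dict) -> float:
    """JSON format: fraction of ground-truth fields the submission matches."""
    if not truth:
        return 0.0
    hits = sum(1 for k, v in truth.items() if k in pred and field_match(pred[k], v))
    return hits / len(truth)

def jsonl_reward(preds: list[dict], truths: list[dict], threshold: float = 0.6) -> float:
    """JSONL format: one-to-one entry matching, reward = matched / total ground truth.
    Greedy matching here approximates the bipartite matching described above."""
    if not truths:
        return 0.0
    used, matched = set(), 0
    for t in truths:
        best_i, best_s = None, 0.0
        for i, p in enumerate(preds):
            if i not in used and json_reward(p, t) > best_s:
                best_i, best_s = i, json_reward(p, t)
        if best_i is not None and best_s >= threshold:
            used.add(best_i)
            matched += 1
    return matched / len(truths)
```

For example, a submitted entry with points = 30 against a ground truth of 31 is within the 10% tolerance, so it still counts as a match.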
Data
Ground truth data consists of reference JSON/JSONL files for each task stored on the OpenReward platform. Tasks reference live websites that agents must scrape.
Tools
Agents get CLI tools (bash, read, write, edit, multi_edit, grep, glob, ls, todo_write) plus 1 environment-specific tool:
| Tool | Description |
|---|---|
| submit_solution | Submit scraped data (submission.json or submission.jsonl) for evaluation against ground truth. |
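For the JSONL case, the submitted file contains one JSON object per line. The fields below are hypothetical; the actual schema is dictated by each task's instructions.

```jsonl
{"player": "Alice", "points": 31}
{"player": "Bob", "points": 27}
```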
Time Horizon
ScrapeBench tasks unfold over multiple turns: agents read the scraping instructions, write code, test and iterate on it in the sandbox, and submit structured results.
Environment Difficulty
[Put environment difficulty statistics here]
Other Environment Requirements
There are no further environment requirements; ScrapeBench works out of the box with the OpenReward endpoint without any external API keys.
Safety
Agents in ScrapeBench scrape publicly available websites in a sandboxed environment. The environment does not present direct safety risks.
Citation
@dataset{GRScrapeBench,
author = {General Reasoning Inc. Team},
title = {ScrapeBench},
year = {2026},
publisher = {OpenReward},
url = {https://openreward.ai/GeneralReasoning/ScrapeBench}
}