
ScrapeBench

OpenReward Environment

Description

ScrapeBench is an environment for evaluating an agent's ability to scrape structured data from websites. Given a natural language instruction describing what data to extract from a specific website, the agent must write code to scrape, parse, and structure the data into JSON or JSONL format. Tasks cover diverse domains including sports statistics and financial data.

Capabilities

  • Web scraping using Python libraries (requests, BeautifulSoup, Selenium)
  • Parsing and structuring web data into JSON/JSONL format
  • Working with diverse web sources
  • Developing robust data extraction pipelines
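As a sketch of the first two capabilities, the following fetches a page with requests, parses it with BeautifulSoup, and writes rows to submission.jsonl. The URL argument and the two-cell table layout assumed here are hypothetical; real tasks specify their own targets and schemas.

```python
# Minimal sketch of a ScrapeBench-style pipeline: fetch, parse, write JSONL.
# The table structure assumed here (name/value cell pairs) is hypothetical.
import json

import requests
from bs4 import BeautifulSoup


def parse_rows(html: str) -> list[dict]:
    """Extract (name, value) pairs from the first two cells of each table row."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select("table tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) >= 2:  # header rows use <th>, so they yield no <td> cells
            rows.append({"name": cells[0], "value": cells[1]})
    return rows


def scrape_to_jsonl(url: str, out_path: str = "submission.jsonl") -> int:
    """Fetch `url` and write one JSON object per line; returns the row count."""
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    rows = parse_rows(resp.text)
    with open(out_path, "w") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
    return len(rows)
```

Separating parsing from fetching keeps the extraction logic testable against saved HTML, which helps when iterating inside the sandbox.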

Compute Requirements

Agents are given a sandbox with 1 CPU and 2GB RAM, with network access enabled for web scraping. The sandbox includes Python with web scraping libraries (requests, BeautifulSoup, Selenium, lxml).

Tasks

There are two splits in this environment:

  • Train: 29 web scraping tasks
  • Test: 29 web scraping tasks

Tasks span sports statistics, financial data, and other structured web content.

Reward Structure

This is a multi-turn, sandbox-based environment. The agent writes scraping code, executes it, and submits results as submission.json or submission.jsonl. Reward is continuous (0.0 to 1.0) based on deterministic algorithmic matching:

  • JSONL format: Bipartite matching with 60% similarity threshold, reward = matched entries / total ground truth entries
  • JSON format: Field-by-field comparison with string similarity (difflib.SequenceMatcher) and numeric tolerance (10% relative error)

No LLM graders are used.
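To make the scoring rules concrete, here is a hedged sketch of the described scheme: per-field similarity via difflib.SequenceMatcher for strings and a 10% relative tolerance for numbers, with one-to-one matching of submitted entries to ground-truth entries at a 0.6 threshold. This is an illustration of the documented rules, not OpenReward's actual grading code; it uses greedy matching as a stand-in for true bipartite matching, and all helper names are our own.

```python
# Sketch of the documented reward: match predictions to ground truth
# one-to-one at a 0.6 similarity threshold; reward = matched / total truth.
# Greedy matching here approximates the bipartite matching described above.
from difflib import SequenceMatcher


def field_similarity(a, b) -> float:
    """Numeric fields: pass/fail at 10% relative error. Strings: difflib ratio."""
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        if a == b:
            return 1.0
        return 1.0 if abs(a - b) / max(abs(a), abs(b)) <= 0.10 else 0.0
    return SequenceMatcher(None, str(a), str(b)).ratio()


def entry_similarity(pred: dict, truth: dict) -> float:
    """Average field similarity over the ground-truth entry's keys."""
    if not truth:
        return 0.0
    return sum(field_similarity(pred.get(k, ""), v) for k, v in truth.items()) / len(truth)


def reward(preds: list[dict], truths: list[dict], threshold: float = 0.6) -> float:
    """Each ground-truth entry may claim at most one unused prediction."""
    if not truths:
        return 0.0
    unused = list(range(len(preds)))
    matched = 0
    for t in truths:
        best, best_sim = None, threshold
        for i in unused:
            sim = entry_similarity(preds[i], t)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best is not None:
            unused.remove(best)
            matched += 1
    return matched / len(truths)
```

Under these rules, a numeric field that is within 10% of the reference value counts as fully correct, so near-miss scrapes of noisy numeric data can still match.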

Data

Ground truth data consists of reference JSON/JSONL files for each task stored on the OpenReward platform. Tasks reference live websites that agents must scrape.
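The ground-truth files themselves are not visible to agents, but the JSONL convention is simple: one self-contained JSON object per line. A small loader like the following (the sample field names in the test are hypothetical) is handy for sanity-checking a submission locally before submitting:

```python
# Load a JSONL file into a list of dicts: one JSON object per non-blank line.
import json


def load_jsonl(path: str) -> list[dict]:
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]
```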

Tools

Agents get CLI tools (bash, read, write, edit, multi_edit, grep, glob, ls, todo_write) plus one environment-specific tool:

Tool             Description
submit_solution  Submit scraped data (submission.json or submission.jsonl) for evaluation against ground truth.

Time Horizon

Agents work over multiple turns within the sandbox session: they read the scraping instructions, write and test code, and submit structured results.

Environment Difficulty

[Put environment difficulty statistics here]

Other Environment Requirements

There are no further environment requirements; ScrapeBench works out of the box with the OpenReward endpoint without any external API keys.

Safety

Agents in ScrapeBench scrape publicly available websites in a sandboxed environment. The environment does not present direct safety risks.

Citation

@dataset{GRScrapeBench,
  author    = {General Reasoning Inc. Team},
  title     = {ScrapeBench},
  year      = {2026},
  publisher = {OpenReward},
  url       = {https://openreward.ai/GeneralReasoning/ScrapeBench}
}