NFLBench

⭐ OpenReward Environment

Description

NFLBench is an environment for building machine learning models of NFL football and trading those models on historical betting markets. Agents develop ML strategies using historical match data, place bets on game outcomes, and manage bankroll across an entire NFL season.

Capabilities

  • Developing machine learning models for NFL game prediction
  • Backtesting models against historical betting odds
  • Bankroll management and bet execution
  • Iterating on model development over time

Compute Requirements

Agents in NFLBench are given a sandbox with 1 CPU and 2GB RAM, with file system access and scientific Python libraries (pandas, numpy).

Tasks

There is one split in this environment:

  • Train: 4 scenarios

| Scenario | Start Date | Starting Bankroll | Training Data |
|---|---|---|---|
| early-nfl | September 2015 | $100 | 2010-2014 |
| mid-nfl | September 2018 | $150 | 2010-2017 |
| covid-nfl | September 2021 | $200 | 2010-2020 |
| recent-nfl | September 2024 | $250 | 2010-2023 |

Each task lasts for an entire NFL season (roughly 18 weeks or more).

Reward Structure

This is a dense, verifiable reward environment. A reward is issued after each matchday, calculated as the difference in log wealth before and after betting, i.e.:

$$\log W_{t+1} - \log W_{t}$$

Agents must place at least one bet per matchday. This prevents them from learning to abstain and exert no effort in the task.

No LLM graders are used for this task. Reward is deterministic based on match outcomes.
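
The log-wealth reward above can be illustrated with a minimal sketch. The function name and the bankroll path below are hypothetical and chosen only for illustration; how the environment treats a bankroll of zero is not stated here, so the sketch simply rejects non-positive values.

```python
import math


def matchday_reward(wealth_before: float, wealth_after: float) -> float:
    """Return log(W_{t+1}) - log(W_t) for one matchday (illustrative helper)."""
    if wealth_before <= 0 or wealth_after <= 0:
        # Assumption: log wealth is treated as undefined for non-positive bankrolls.
        raise ValueError("log wealth is undefined for non-positive bankrolls")
    return math.log(wealth_after) - math.log(wealth_before)


# Hypothetical bankroll path: start at $100, then three settled matchdays.
bankrolls = [100.0, 110.0, 99.0, 120.0]
rewards = [matchday_reward(a, b) for a, b in zip(bankrolls, bankrolls[1:])]

# The per-matchday rewards telescope, so their sum equals log(W_T / W_0).
total = sum(rewards)
print(rewards, total)
```

Because the rewards telescope, maximizing the summed reward over a season is equivalent to maximizing the log of the final bankroll relative to the starting bankroll.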

Data

The environment provides historical NFL match data, including team names, scores, odds, season, week, and match type (regular/playoff). Training data is mounted at /tmp/gr-datasets for agents to build models.
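
A reasonable first step for an agent is to inspect what is mounted under /tmp/gr-datasets. The sketch below assumes the training data is stored as CSV files, which this README does not confirm; the helper name and the column names in the demo are hypothetical. The demo runs against a throwaway directory so the snippet works outside the sandbox.

```python
import csv
import tempfile
from pathlib import Path


def list_csv_headers(root: str = "/tmp/gr-datasets") -> dict:
    """Map each CSV file found under `root` to its header row.

    Assumption: the data is CSV; the actual file format is not documented here.
    """
    headers = {}
    for path in sorted(Path(root).rglob("*.csv")):
        with path.open(newline="") as fh:
            # An empty file yields an empty header list.
            headers[str(path)] = next(csv.reader(fh), [])
    return headers


# Demo on a temporary directory with made-up columns (not the real schema).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "games.csv"
    sample.write_text("season,week,team1,team2\n2014,1,A,B\n")
    found = list_csv_headers(tmp)

print(list(found.values()))
```

Inside the sandbox, calling `list_csv_headers()` with the default path would list whatever CSV files are actually present, if any.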

Tools

Agents are given access to CLI tools for creating, viewing, and searching a filesystem (bash, read, write, edit, grep, glob, ls, todo_write, multi_edit). They are also given environment-specific tools:

| Tool | Description |
|---|---|
| view_matches | View current week's NFL games with betting odds. |
| place_bet | Place a bet on a game outcome (team1 or team2) with a specified amount. |
| view_bankroll | View current bankroll and active bets. |
| next_matchday | Settle bets, receive reward, and advance to the next matchday. |
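
Since place_bet takes a specified amount, an agent needs some rule for sizing stakes. One common choice, not prescribed by NFLBench, is the Kelly criterion, which maximizes expected log wealth for a single bet and so lines up with the log-wealth reward. The sketch below assumes decimal odds (total payout per unit staked); the odds format returned by view_matches is not documented here, and the function names and the `scale` parameter are hypothetical.

```python
def kelly_fraction(p_win: float, decimal_odds: float) -> float:
    """Kelly fraction of bankroll for one bet, assuming decimal odds.

    p_win is the model's estimated win probability for the chosen side.
    """
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("p_win must be a probability in [0, 1]")
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must exceed 1.0")
    net_odds = decimal_odds - 1.0  # profit per unit staked on a win
    fraction = (p_win * decimal_odds - 1.0) / net_odds
    # A negative value means the bet has no positive edge; stake nothing.
    return max(0.0, fraction)


def stake(bankroll: float, p_win: float, decimal_odds: float, scale: float = 0.5) -> float:
    """Scaled Kelly stake in dollars; scale < 1 is a common hedge against model error."""
    return round(bankroll * scale * kelly_fraction(p_win, decimal_odds), 2)


# Hypothetical numbers: $100 bankroll, 60% estimated win probability, odds of 2.0.
print(kelly_fraction(0.6, 2.0), stake(100.0, 0.6, 2.0))
```

When the Kelly fraction is zero for every game on a matchday, the agent would still need to place some bet to satisfy the one-bet-per-matchday rule; how small that bet may be is not specified in this README.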

Time Horizon

NFLBench is an open-ended, long-horizon environment where agents simulate an entire NFL season of model development and betting.

Environment Difficulty

[Put environment difficulty statistics here]

Other Environment Requirements

There are no further environment requirements; NFLBench works out of the box with the OpenReward endpoint without any external API keys.

Safety

Agents in NFLBench are told to maximize their long-run bankroll growth. The environment does not present direct safety risks, as agents only interact with historical data through betting decisions on public odds.

There may be indirect risks, however: an agent trained to maximize long-run wealth may blindly follow this objective when deployed in other environments, leading it to pursue unethical goals. We recommend that multi-environment training runs involving NFLBench include other environments that teach agents to respect ethical norms, so that the agent learns a broader set of objectives than wealth maximization alone.

Citation

@dataset{GRNFLBench,
  author    = {General Reasoning Inc. Team},
  title     = {NFLBench},
  year      = {2026},
  publisher = {OpenReward},
  url       = {https://www.openreward.ai/GeneralReasoning/NFLBench}
}