API Endpoint

Leaderboard

Loading leaderboard...

README

BioReason

Description

BioReason is an environment for evaluating biological reasoning derived from the BioReason benchmark. Based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database, agents answer questions about molecular pathways and genetic mechanisms that elucidate mechanistic connections between genetic variants and disease phenotypes. An LLM grader evaluates answer correctness.

Capabilities

Multi-step causal reasoning across molecular networks
Integration of pathway and variant data
Precise molecular mechanism identification
Clinical database integration (ClinVar, dbSNP, OMIM, COSM)
Standardized molecular network representation

Compute Requirements

Agents are given a standard environment with no sandbox or file system access.

License

Apache 2.0

Tasks

Tasks span multiple splits covering different aspects of biological reasoning. Each entry contains a question about genetic variants and their mechanistic connections to disease phenotypes.

Reward Structure

This is a sparse reward environment with LLM-based grading:

Agent receives a biological reasoning question
Agent submits an answer via the answer tool
An LLM grader (gpt-4.1) evaluates the answer against the reference
Binary reward: 1.0 if correct, 0.0 if incorrect

Data

Data is derived from the KEGG pathway database with integration from clinical databases (ClinVar, dbSNP, OMIM, COSM). Task data is stored on the OpenReward platform.

Tools

Tool	Description
`answer`	Submit answer for LLM-based grading

Time Horizon

Single-turn. Each task is evaluated in a single interaction.

Environment Difficulty

Model performance on BioReason KEGG benchmark (290 test samples):

Model	Accuracy	F1-Score
Evo2 + Qwen3-4B (+GRPO)	98.28%	93.05%
Evo2 + Qwen3-4B	97.24%	86.30%
Qwen3-4B	93.48%	85.44%

Other Environment Requirements

OpenAI API key required for LLM-based grading. Pass via secrets={"openai_api_key": "..."}.

Safety

Agents in BioReason answer biological reasoning questions in a standard environment. The environment does not present direct safety risks.

Citation

@misc{fallahpour2025bioreasonincentivizingmultimodalbiological,
  title={BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model},
  author={Adibvafa Fallahpour and Andrew Magnuson and Purav Gupta and Shihao Ma and Jack Naimer and Arnav Shah and Haonan Duan and Omar Ibrahim and Hani Goodarzi and Chris J. Maddison and Bo Wang},
  year={2025},
  eprint={2505.23579},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2505.23579}
}

Tools

Tools available in the environment

No tools available for this environment, it probably hasn't been indexed yet.

Compute Configuration

Resource allocation for this environment.

Component	Configuration
Environment Server	1 vCPU / 4 GB RAM
Sandbox Machine	Not configured

Estimated Cost

Pay per second of active session usage. Billing starts when your session begins and stops when it ends.

Component	Cost / second
Environment	$0.0000320
Sandbox	Not configured
Total	$0.0000320

Examples

5-minute session$0.0096

1-hour session$0.1152

BioReason

GeneralReasoning/BioReason

BioReason

Description

Capabilities

Compute Requirements

License

Tasks

Reward Structure

Data

Tools

Time Horizon

Environment Difficulty

Other Environment Requirements

Safety

Citation

Tools

Compute Configuration

Estimated Cost

Examples