llm-srbench

Description

LLM-SRBench is a benchmark for evaluating LLM-based scientific equation discovery. It comprises 239 problems across four scientific domains, designed so that they cannot be solved by trivial memorization. The problems fall into two categories: LSR-Transform, which converts common physical models into less common mathematical representations to test reasoning beyond memorized forms, and LSR-Synth, which creates synthetic, data-driven discovery problems. Current methods reach at best 31.5% symbolic accuracy on the benchmark.
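
To make the LSR-Transform idea concrete, here is a minimal sketch of what "a less common representation of a common model" could look like. It is a hypothetical illustration, not taken from the benchmark: the choice of the kinetic-energy relation and the function names kinetic_energy, speed_from_energy, and make_dataset are all assumptions. The familiar form E = 0.5 * m * v^2 is used to generate data, and the discovery target becomes the same relation solved for v.

```python
import math

# Hypothetical example; not an actual LLM-SRBench problem.

def kinetic_energy(m, v):
    """Familiar, easily memorized form: E = 0.5 * m * v**2."""
    return 0.5 * m * v ** 2

def speed_from_energy(m, E):
    """Same relation solved for v (taking v >= 0): v = sqrt(2 * E / m)."""
    return math.sqrt(2.0 * E / m)

def make_dataset(masses, speeds):
    """Build rows of ((m, E), v): inputs are mass and energy, target is speed."""
    return [((m, kinetic_energy(m, v)), v) for m in masses for v in speeds]

dataset = make_dataset([1.0, 2.0, 4.0], [0.5, 1.0, 3.0])

# The transformed expression should reproduce every target in the dataset.
max_error = max(abs(speed_from_energy(m, E) - v) for (m, E), v in dataset)
```

A method that only recalls the textbook form would have to rearrange it before it fits this dataset, which is the kind of reasoning the description says LSR-Transform is meant to test.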

Implementations (1)
Environment                   Stars   Last Updated
parshinsh/llmsr-bench-full    3       2 weeks ago