BrowseComp

Description

BrowseComp is a benchmark for measuring agents' web-browsing ability. It comprises 1,266 questions that require persistent navigation to locate hard-to-find, entangled information. Each question has a short, easily verifiable answer, and the benchmark tests an agent's persistence and creativity in finding information. It plays the same role for browsing agents that programming competitions play for coding agents.
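
Because the answers are short, scoring can be illustrated with a simple comparison. The sketch below is a minimal illustration only, assuming normalized exact-match grading; it is not the benchmark's official grader. The function names (`normalize`, `accuracy`) and the sample questions are hypothetical.

```python
import string


def normalize(answer: str) -> str:
    """Lowercase, strip ASCII punctuation, and collapse whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(answer.lower().translate(table).split())


def accuracy(predictions: dict[str, str], references: dict[str, str]) -> float:
    """Fraction of reference questions answered correctly.

    Assumption: an answer counts as correct when it equals the reference
    after normalization. Missing predictions count as incorrect.
    """
    if not references:
        return 0.0
    correct = sum(
        normalize(predictions.get(qid, "")) == normalize(ref)
        for qid, ref in references.items()
    )
    return correct / len(references)


# Invented sample data, not taken from the benchmark.
references = {"q1": "Ada Lovelace", "q2": "1969"}
predictions = {"q1": "ada lovelace.", "q2": "1970"}
print(accuracy(predictions, references))  # prints 0.5
```

Exact match is the simplest possible check and would reject correct answers that are merely phrased differently, so a real evaluation may need a more tolerant comparison.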

Implementations (1)

Environment: GeneralReasoning/BrowseComp
Stars: 0
Last Updated: 1 month ago