BrowseComp

Name: OpenAI/BrowseComp
Author: OpenAI

Description

BrowseComp is a benchmark for measuring the web-browsing ability of agents. It comprises 1,266 questions that require persistent navigation to locate hard-to-find, entangled information. Each question has a short, easily verifiable answer, and the benchmark tests an agent's persistence and creativity in finding information. For browsing agents it plays a role analogous to that of programming competitions for coding agents.
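Because the answers are short, checking a prediction can be cheap. The sketch below illustrates one simple way to do that, using normalized exact match. It is an assumption made for illustration, not the benchmark's official grader, which may score answers differently. The function names `normalize`, `is_correct`, and `accuracy` are hypothetical.

```python
import re
import string


def normalize(answer: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = answer.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def is_correct(predicted: str, reference: str) -> bool:
    """Hypothetical check: exact match after normalization."""
    return normalize(predicted) == normalize(reference)


def accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions that match their reference answer."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    correct = sum(is_correct(p, r) for p, r in zip(predictions, references))
    return correct / len(references)
```

Exact match is strict: it rejects paraphrases and alternative spellings of a correct answer, so a real evaluation would likely need a more tolerant comparison.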

Implementations (1)

Environment	Stars	Last Updated
GeneralReasoning/BrowseComp	0	3 months ago