From Single-Pass RAG to Self-Editing Search Agents: Designing and Training Agentic Search
The Setup

You have a corpus, a search API, and users asking questions that no single search query can answer. Your job is to design a system that learns to search well: not just to retrieve, but to plan what to search, evaluate what came back, decide whether to search again, and manage what stays in context.

The term “multi-hop search” covers two different skills:

Type 1: questions whose constraints are bundled into the question text, sometimes explicitly tagged, sometimes encoded obliquely. For example: “Find the EMNLP paper between 2018 and 2023 where the first author did their undergrad at Dartmouth and the fourth at UPenn.” Here the constraints are explicit and tagged with field names; the agent has to parse them, issue a search for each, and combine the results.

Or, in a harder form: “A sacred structure in a western European capital was designed in a style combining two ancient architectural traditions, selected through a competitive process initiated in the late 1860s. The community for whom this building was constructed gained official state recognition during the early 1830s. On what date was this building formally inaugurated?” This is the same skill (the constraints are all in the question), but the constraints are encoded obliquely: the agent has to decode “competitive process initiated in the late 1860s” into something searchable. ...
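The explicit form of a Type 1 question (parse the constraints, issue a search for each, combine the results) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the system the article goes on to design: the `Paper` record, the in-memory `CORPUS`, and the `search`, `answer`, and `author_undergrad_is` helpers are all hypothetical, the records are invented, and "between 2018 and 2023" is assumed to be inclusive. A real agent would call a search API and would have to extract the constraints from the question text itself.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Paper:
    paper_id: str
    venue: str
    year: int
    # Undergraduate institution of each author, in author order.
    author_undergrad: tuple[str, ...]


# Hypothetical toy corpus; every record here is invented for illustration.
CORPUS = [
    Paper("p1", "EMNLP", 2020, ("Dartmouth", "MIT", "CMU", "UPenn")),
    Paper("p2", "EMNLP", 2016, ("Dartmouth", "MIT", "CMU", "UPenn")),
    Paper("p3", "ACL", 2021, ("Dartmouth", "MIT", "CMU", "UPenn")),
    Paper("p4", "EMNLP", 2021, ("Stanford", "MIT", "CMU", "UPenn")),
]

# A constraint is a predicate over a single record.
Constraint = Callable[[Paper], bool]


def search(constraint: Constraint) -> set[str]:
    """Stand-in for one search call: ids of all papers matching one constraint."""
    return {p.paper_id for p in CORPUS if constraint(p)}


def answer(constraints: list[Constraint]) -> set[str]:
    """Issue one search per constraint and intersect the result sets."""
    result_sets = [search(c) for c in constraints]
    return set.intersection(*result_sets) if result_sets else set()


def author_undergrad_is(position: int, school: str) -> Constraint:
    """Constraint: the author at `position` (0-based) did their undergrad at `school`."""
    return lambda p: (
        len(p.author_undergrad) > position
        and p.author_undergrad[position] == school
    )


# The four constraints from the example question, already parsed by hand.
constraints = [
    lambda p: p.venue == "EMNLP",
    lambda p: 2018 <= p.year <= 2023,  # assumed inclusive range
    author_undergrad_is(0, "Dartmouth"),
    author_undergrad_is(3, "UPenn"),
]

matches = answer(constraints)
print(matches)
```

Intersecting per-constraint result sets only works when each constraint is directly searchable. In the oblique form of the question, the parsing step that is done by hand here becomes the hard part.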