In-Context Learning for Pure Exploration in Continuous Spaces

Notice: This research summary and analysis were generated automatically using AI. For full accuracy, please refer to the original arXiv source.

In active sequential testing, also termed pure exploration, a learner adaptively acquires information so as to identify an unknown ground-truth hypothesis with as few queries as possible. This problem, originally studied by Chernoff in 1959, has several applications: classical formulations include Best-Arm Identification (BAI) in bandits, where actions index hypotheses, and generalized search problems, where strategically chosen queries reveal partial information about a hidden label. In many modern settings, however, the hypothesis space is continuous and naturally coincides with the query/action space: for example, identifying an optimal action in a continuous-armed bandit, localizing an $ε$-ball contained in a target region, or estimating the minimizer of an unknown function from a sequence of observations. In this work, we study pure exploration in such continuous spaces and introduce C-ICPE-TS (Continuous In-Context Pure Exploration), an algorithm that meta-trains deep neural policies to map observation histories to (i) the next continuous query action and (ii) a predicted hypothesis, thereby learning transferable sequential-testing strategies directly from data. At inference time, C-ICPE-TS actively gathers evidence on previously unseen tasks and infers the true hypothesis without parameter updates or explicit hand-crafted information models. We validate C-ICPE-TS across a range of benchmarks spanning continuous best-arm identification, region localization, and function-minimizer identification.
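The in-context inference loop described above can be sketched as follows. This is a hedged illustration, not the paper's implementation: `policy`, `predictor`, and the toy environment are stand-in names, and the uniform-query policy merely shows the interface (history in, next continuous action and final hypothesis out, with no parameter updates).

```python
# Illustrative sketch of an in-context pure-exploration episode.
# `policy` and `predictor` stand in for the meta-trained networks
# (names are hypothetical, not from the paper); toy closures make
# the loop runnable end to end.
import random

def run_episode(policy, predictor, env, budget=50):
    """Query the environment adaptively, then output a hypothesis.

    `history` holds (action, observation) pairs; the policy maps the
    whole history to the next continuous query, and the predictor maps
    it to a hypothesis estimate -- no parameter updates occur.
    """
    history = []
    for _ in range(budget):
        action = policy(history)   # next continuous query
        obs = env(action)          # noisy observation
        history.append((action, obs))
    return predictor(history)      # predicted hypothesis

# Toy task: identify the minimizer of f(x) = (x - c)^2 from noisy values.
c = 0.3
env = lambda x: (x - c) ** 2 + random.gauss(0.0, 0.01)
policy = lambda h: random.uniform(0.0, 1.0)          # uniform queries (placeholder)
predictor = lambda h: min(h, key=lambda p: p[1])[0]  # action with lowest noisy value

random.seed(0)
estimate = run_episode(policy, predictor, env, budget=200)
```

In the paper's setting the placeholder policy would be the trained network exploiting structure across tasks; the uniform policy here only demonstrates the episode mechanics.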


💡 Research Summary

This paper tackles the problem of pure exploration—also known as active sequential hypothesis testing—in settings where both the hypothesis space and the query (action) space are continuous. Classical pure‑exploration literature has focused on finite hypothesis sets (e.g., best‑arm identification in multi‑armed bandits), but many modern scientific and engineering tasks involve continuous decisions: finding an optimal action in a continuous‑armed bandit, localizing an ε‑ball inside a target region, or minimizing an unknown function based on noisy evaluations. The authors formalize the fixed‑confidence regime, requiring that the algorithm return an ε‑optimal hypothesis with probability at least 1 − δ while minimizing the expected number of queries.

A central theoretical contribution is the introduction of the posterior success probability $q_t(h, x) = \mathbb{P}(L_\theta(x) \le \varepsilon \mid H_t = h)$, which replaces the point-mass posterior used in finite settings. The maximal posterior success $r_t(h) = \max_x q_t(h, x)$ serves as a natural confidence metric. By applying a Lagrangian dual to the $(\varepsilon, \delta)$ constraint, the authors obtain a dual objective $V_\lambda(\pi, I, \tau) = -\mathbb{E}$
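The posterior quantities above lend themselves to a simple Monte Carlo estimate: draw samples of $\theta$ from the posterior given the history, count how often the loss at a candidate $x$ falls within $\varepsilon$, and maximize over candidates. The sketch below is illustrative only; the mock Gaussian posterior, squared-distance loss, and grid of candidates are assumptions for the example, not the paper's construction.

```python
# Illustrative Monte Carlo estimate of the posterior success probability
# q_t(h, x) and its maximum r_t(h). The posterior sampler and loss here
# are toy stand-ins, not the paper's model.
import random

def q_hat(x, theta_samples, loss, eps):
    """Estimate q_t(h, x) = P(L_theta(x) <= eps | H_t = h) by the
    fraction of posterior samples whose loss at x is within eps."""
    hits = sum(1 for th in theta_samples if loss(th, x) <= eps)
    return hits / len(theta_samples)

def r_hat(candidates, theta_samples, loss, eps):
    """Estimate r_t(h) = max_x q_t(h, x) over a finite candidate grid."""
    return max(q_hat(x, theta_samples, loss, eps) for x in candidates)

# Toy example: theta is a 1-D minimizer, loss is squared distance.
random.seed(0)
theta_samples = [random.gauss(0.5, 0.05) for _ in range(1000)]  # mock posterior
loss = lambda th, x: (x - th) ** 2
candidates = [i / 100 for i in range(101)]
eps = 0.01  # success = being within sqrt(eps) = 0.1 of the true minimizer

r = r_hat(candidates, theta_samples, loss, eps)
```

A fixed-confidence stopping rule in this spirit would declare the answer once the estimated $r_t(h)$ exceeds $1 - \delta$, which is exactly the confidence role the summary attributes to $r_t$.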

