51�� Information Theory Society

Quantile Multi-Armed Bandits: Optimal Best-Arm Identification and a Differentially Private Scheme

Submitted by admin on Mon, 10/28/2024 - 01:24

We study the best-arm identification problem in multi-armed bandits with stochastic rewards when the goal is to identify the arm with the highest quantile at a fixed, prescribed level. First, we propose a successive elimination algorithm for strictly optimal best-arm identification, show that it is $\delta $ -PAC and characterize its sample complexity. Further, we provide a lower bound on the expected number of pulls, showing that the proposed algorithm is essentially optimal up to logarithmic factors.

Active Learning for Classification With Abstention

Submitted by admin on Mon, 10/28/2024 - 01:24

We construct and analyze active learning algorithms for the problem of binary classification with abstention, in which the learner has an additional option to withhold its decision on certain points in the input space. We consider this problem in the fixed-cost setting, where the learner incurs a cost $\lambda \in (0, 1/2)$ every time the abstain option is invoked. Our proposed algorithm can work with the three most commonly used active learning query models, namely, membership-query, pool-based, and stream-based models.

Bayesian Algorithms for Decentralized Stochastic Bandits

Submitted by admin on Mon, 10/28/2024 - 01:24

We study a decentralized cooperative multi-agent multi-armed bandit (MAB) problem with K arms and N agents connected over a network. In this model, each arm's reward distribution is the same for every agent, and rewards are drawn independently across agents and over time steps. At each iteration, agents independently choose an arm to play and exchange at most poly(K) real-valued messages with their neighbors.

Nonparametric Iterated-Logarithm Extensions of the Sequential Generalized Likelihood Ratio Test

Submitted by admin on Mon, 10/28/2024 - 01:24

We develop a nonparametric extension of the sequential generalized likelihood ratio (GLR) test and corresponding time-uniform confidence sequences for the mean of a univariate distribution. By utilizing a geometric interpretation of the GLR statistic, we derive a simple analytic upper bound on the probability that it exceeds any prespecified boundary; these are intractable to approximate via simulations due to infinite horizon of the tests and the composite nonparametric nulls under consideration.

Cautious Reinforcement Learning via Distributional Risk in the Dual Domain

Submitted by admin on Mon, 10/28/2024 - 01:24

We study the estimation of risk-sensitive policies in reinforcement learning (RL) problems defined by a Markov Decision Process (MDPs) whose state and action spaces are countably finite. Prior efforts are predominately afflicted by computational challenges associated with the fact that risk-sensitive MDPs are time-inconsistent, and hence modifications of Bellman's equations are required, which often do not permit efficient fixed point schemes.

Asynchronous Decentralized Accelerated Stochastic Gradient Descent

Submitted by admin on Mon, 10/28/2024 - 01:24

In this paper, we introduce an asynchronous decentralized accelerated stochastic gradient descent type of algorithm for decentralized stochastic optimization. Considering communication and synchronization costs are the major bottlenecks for decentralized optimization, we attempt to reduce these costs from an algorithmic design aspect, in particular, we are able to reduce the number of agents involved in one round of update via randomization.

Curiosity Killed or Incapacitated the Cat and the Asymptotically Optimal Agent

Submitted by admin on Mon, 10/28/2024 - 01:24

Reinforcement learners are agents that learn to pick actions that lead to high reward. Ideally, the value of a reinforcement learner’s policy approaches optimality—where the optimal informed policy is the one which maximizes reward. Unfortunately, we show that if an agent is guaranteed to be “asymptotically optimal” in any (stochastically computable) environment, then subject to an assumption about the true environment, this agent will be either “destroyed” or “incapacitated” with probability 1.

Asynchronous Delayed Optimization With Time-Varying Minibatches

Submitted by admin on Mon, 10/28/2024 - 01:24

Large-scale learning and optimization problems are often solved in parallel. In a master-worker distributed setup, worker nodes are most often assigned fixed-sized minibatches of data points to process. However, workers may take different amounts of time to complete their per-batch calculations. To deal with such variability in processing times, an alternative approach has recently been proposed wherein each worker is assigned a fixed duration to complete the calculations associated with each batch.

Robust Change Detection via Information Projection

Submitted by admin on Mon, 10/28/2024 - 01:24

We study the robust transient and quickest change detection problems with unknown post-change distributions under finite alphabets. When the distribution after the change point is unknown, the change in the distribution of observations may occur in multiple ways without much structure on the observations, whereas, before the change point, a false alarm is highly structured, following a particular sample path with respect to the distribution of observations and the detection scheme.

Bandit-Based Monte Carlo Optimization for Nearest Neighbors

Submitted by admin on Mon, 10/28/2024 - 01:24

The celebrated Monte Carlo method estimates an expensive-to-compute quantity by random sampling. Bandit-based Monte Carlo optimization is a general technique for computing the minimum of many such expensive-to-compute quantities by adaptive random sampling. The technique converts an optimization problem into a statistical estimation problem which is then solved via multi-armed bandits.

Subscribe to Our Mailing List