51�� Information Theory Society

Generalization Bounds via Information Density and Conditional Information Density

Submitted by admin on Mon, 10/28/2024 - 01:24

We present a general approach, based on an exponential inequality, to derive bounds on the generalization error of randomized learning algorithms. Using this approach, we provide bounds on the average generalization error as well as bounds on its tail probability, for both the PAC-Bayesian and single-draw scenarios. Specifically, for the case of sub-Gaussian loss functions, we obtain novel bounds that depend on the information density between the training data and the output hypothesis.

Fast Variational Inference for Joint Mixed Sparse Graphical Models

Submitted by admin on Mon, 10/28/2024 - 01:24

Mixed graphical models are widely implemented to capture interactions among different types of variables. To simultaneously learn the topology of multiple mixed graphical models and encourage common structure, people have developed a variational maximum likelihood inference approach, which takes advantage of the log-determinant relaxation. In this article, we further improve the computational efficiency of this method by exploiting the block diagonal structure of the solution.

Generalized Autoregressive Linear Models for Discrete High-Dimensional Data

Submitted by admin on Mon, 10/28/2024 - 01:24

Fitting multivariate autoregressive (AR) models is fundamental for time-series data analysis in a wide range of applications. This article considers the problem of learning a $p$ -lag multivariate AR model where each time step involves a linear combination of the past $p$ states followed by a probabilistic, possibly nonlinear, mapping to the next state. The problem is to learn the linear connectivity tensor from observations of the states. We focus on the sparse setting, which arises in applications with a limited number of direct connections between variables.

rTop-k: A Statistical Estimation Approach to Distributed SGD

Submitted by admin on Mon, 10/28/2024 - 01:24

The large communication cost for exchanging gradients between different nodes significantly limits the scalability of distributed training for large-scale learning models. Motivated by this observation, there has been significant recent interest in techniques that reduce the communication cost of distributed Stochastic Gradient Descent (SGD), with gradient sparsification techniques such as top-k and random-k shown to be particularly effective.

A Unified Approach to Translate Classical Bandit Algorithms to the Structured Bandit Setting

Submitted by admin on Mon, 10/28/2024 - 01:24

We consider a finite-armed structured bandit problem in which mean rewards of different arms are known functions of a common hidden parameter $\theta ^{*}$ . Since we do not place any restrictions on these functions, the problem setting subsumes several previously studied frameworks that assume linear or invertible reward functions. We propose a novel approach to gradually estimate the hidden $\theta ^{*}$ and use the estimate together with the mean reward functions to substantially reduce exploration of sub-optimal arms.

Minimax Estimation of Divergences Between Discrete Distributions

Submitted by admin on Mon, 10/28/2024 - 01:24

We study the minimax estimation of α-divergences between discrete distributions for integer α ≥ 1, which include the Kullback-Leibler divergence and the χ2-divergences as special examples. Dropping the usual theoretical tricks to acquire independence, we construct the first minimax rate-optimal estimator which does not require any Poissonization, sample splitting, or explicit construction of approximating polynomials.

Convex Parameter Recovery for Interacting Marked Processes

Submitted by admin on Mon, 10/28/2024 - 01:24

We introduce a new general modeling approach for multivariate discrete event data with categorical interacting marks, which we refer to as marked Bernoulli processes. In the proposed model, the probability of an event of a specific category to occur in a location may be influenced by past events at this and other locations. We do not restrict interactions to be positive or decaying over time as it is commonly adopted, allowing us to capture an arbitrary shape of influence from historical events, locations, and events of different categories.

Information-Theoretic Limits for the Matrix Tensor Product

Submitted by admin on Mon, 10/28/2024 - 01:24

This article studies a high-dimensional inference problem involving the matrix tensor product of random matrices. This problem generalizes a number of contemporary data science problems including the spiked matrix models used in sparse principal component analysis and covariance estimation and the stochastic block model used in network analysis.

Tensor Estimation With Structured Priors

Submitted by admin on Mon, 10/28/2024 - 01:24

We consider rank-one symmetric tensor estimation when the tensor is corrupted by gaussian noise and the spike forming the tensor is a structured signal coming from a generalized linear model. The latter is a mathematically tractable model of a non-trivial hidden lower-dimensional latent structure in a signal. We work in a large dimensional regime with fixed ratio of signal-to-latent space dimensions.

Fast Robust Subspace Tracking via PCA in Sparse Data-Dependent Noise

Submitted by admin on Mon, 10/28/2024 - 01:24

This work studies the robust subspace tracking (ST) problem. Robust ST can be simply understood as a (slow) time-varying subspace extension of robust PCA. It assumes that the true data lies in a low-dimensional subspace that is either fixed or changes slowly with time. The goal is to track the changing subspaces over time in the presence of additive sparse outliers and to do this quickly (with a short delay). We introduce a “fast” mini-batch robust ST solution that is provably correct under mild assumptions.

Subscribe to Our Mailing List