
Statistics

New submissions


New submissions for Mon, 20 May 24

[1]  arXiv:2405.10329 [pdf, ps, other]
Title: Causal inference approach to appraise long-term effects of maintenance policy on functional performance of asphalt pavements
Subjects: Applications (stat.AP); Artificial Intelligence (cs.AI)

Asphalt pavements, as the most prevalent transportation infrastructure, are prone to serious traffic safety problems due to functional or structural damage caused by stresses or strains imposed through repeated traffic loads and continuous climatic cycles. The good quality or high serviceability of infrastructure networks is vital to the urbanization and industrial development of nations. In order to maintain good functional pavement performance and extend the service life of asphalt pavements, the long-term performance of pavements under maintenance policies needs to be evaluated and favorable options selected based on the condition of the pavement. A major challenge in evaluating maintenance policies is to produce valid treatments for the outcome assessment under the control of uncertainty of vehicle loads and the disturbance of freeze-thaw cycles in the climatic environment. In this study, a novel causal inference approach combining a classical causal structural model and a potential outcome model framework is proposed to appraise the long-term effects of four preventive maintenance treatments for longitudinal cracking over a 5-year period of upkeep. Three fundamental issues were brought to our attention: 1) detection of causal relationships prior to variables under environmental loading (identification of causal structure); 2) obtaining direct causal effects of treatment on outcomes excluding covariates (identification of causal effects); and 3) sensitivity analysis of causal relationships. The results show that the method can accurately evaluate the effect of preventive maintenance treatments and assess the maintenance time to cater well for the functional performance of different preventive maintenance approaches. This framework could help policymakers to develop appropriate maintenance strategies for pavements.

[2]  arXiv:2405.10371 [pdf, other]
Title: Causal Discovery in Multivariate Extremes with a Hydrological Analysis of Swiss River Discharges
Subjects: Methodology (stat.ME); Applications (stat.AP)

Causal asymmetry is based on the principle that an event is a cause only if its absence would not have been a cause. From there, uncovering causal effects becomes a matter of comparing a well-defined score in both directions. Motivated by studying causal effects at extreme levels of a multivariate random vector, we propose to construct a model-agnostic causal score relying solely on the assumption of the existence of a max-domain of attraction. Based on a representation of a Generalized Pareto random vector, we construct the causal score as the Wasserstein distance between the margins and a well-specified random variable. The proposed methodology is illustrated on a hydrologically simulated dataset of different characteristics of catchments in Switzerland: discharge, precipitation, and snowmelt.

[3]  arXiv:2405.10399 [pdf, ps, other]
Title: A note on continuous-time online learning
Authors: Lexing Ying
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Numerical Analysis (math.NA); Optimization and Control (math.OC)

In online learning, the data is provided in a sequential order, and the goal of the learner is to make online decisions to minimize overall regrets. This note is concerned with continuous-time models and algorithms for several online learning problems: online linear optimization, adversarial bandit, and adversarial linear bandit. For each problem, we extend the discrete-time algorithm to the continuous-time setting and provide a concise proof of the optimal regret bound.
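
As a point of reference for the regret notion mentioned above, the following is a minimal sketch of the discrete-time exponential-weights (Hedge) learner for online linear optimization over the probability simplex. It is not taken from the note, which works in continuous time; the function name `hedge`, the learning rate, and the random loss sequence are illustrative assumptions.

```python
import math
import random


def hedge(losses, eta):
    """Run exponential weights on a sequence of loss vectors in [0, 1]^n.

    Returns (learner's cumulative expected loss, regret against the best
    fixed coordinate in hindsight). Hypothetical helper for illustration.
    """
    n = len(losses[0])
    cum = [0.0] * n          # cumulative loss of each coordinate so far
    total = 0.0              # learner's cumulative expected loss
    for loss in losses:
        m = min(cum)         # shift for numerical stability
        w = [math.exp(-eta * (c - m)) for c in cum]
        z = sum(w)
        p = [wi / z for wi in w]                 # play a point on the simplex
        total += sum(pi * li for pi, li in zip(p, loss))
        cum = [c + l for c, l in zip(cum, loss)]
    return total, total - min(cum)


if __name__ == "__main__":
    rng = random.Random(0)   # assumed synthetic losses, not data from the note
    T, n = 500, 5
    losses = [[rng.random() for _ in range(n)] for _ in range(T)]
    eta = math.sqrt(8 * math.log(n) / T)
    total, regret = hedge(losses, eta)
    print(f"cumulative loss {total:.2f}, regret {regret:.2f}")
```

With losses in [0, 1] and this choice of `eta`, the standard discrete-time analysis bounds the regret by sqrt(T ln(n) / 2), which is the kind of guarantee the continuous-time arguments aim to recover.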

[4]  arXiv:2405.10412 [pdf, other]
Title: Property testing in graphical models: testing small separation numbers
Subjects: Statistics Theory (math.ST)

In many statistical applications, the dimension is too large to handle for standard high-dimensional machine learning procedures. This is particularly true for graphical models, where the interpretation of a large graph is difficult and learning its structure is often computationally impossible either because the underlying graph is not sufficiently sparse or the number of vertices is too large. To address this issue, we develop a procedure to test a property of a graph underlying a graphical model that requires only a subquadratic number of correlation queries (i.e., we require that the algorithm can only access a tiny fraction of the covariance matrix). This provides a conceptually simple test to determine whether the underlying graph is a tree or, more generally, if it has a small separation number, a quantity closely related to the treewidth of the graph. The proposed method is a divide-and-conquer algorithm that can be applied to quite general graphical models.

[5]  arXiv:2405.10453 [pdf, other]
Title: Expected Points Above Average: A Novel NBA Player Metric Based on Bayesian Hierarchical Modeling
Subjects: Other Statistics (stat.OT)

Team and player evaluation in professional sport is extremely important given the financial implications of success/failure. It is especially critical to identify and retain elite shooters in the National Basketball Association (NBA), one of the premier basketball leagues worldwide, because the ultimate goal of the game is to score more points than one's opponent. To this end we propose two novel basketball metrics: "expected points" for team-based comparisons and "expected points above average (EPAA)" as a player-evaluation tool. Both metrics leverage posterior samples from a Bayesian hierarchical modeling framework to cluster teams and players based on their shooting propensities and abilities. We illustrate the concepts for the top 100 shot takers over the last decade and offer our metric as an additional tool for evaluating players.

[6]  arXiv:2405.10458 [pdf, other]
Title: Decision theory via model-free generalized fiducial inference
Subjects: Statistics Theory (math.ST)

Building on the recent development of the model-free generalized fiducial (MFGF) paradigm (Williams, 2023) for predictive inference with finite-sample frequentist validity guarantees, in this paper, we develop an MFGF-based approach to decision theory. Beyond the utility of the new tools we contribute to the field of decision theory, our work establishes a formal connection between decision theories from the perspectives of fiducial inference, conformal prediction, and imprecise probability theory. In our paper, we establish pointwise and uniform consistency of an MFGF upper risk function as an approximation to the true risk function via the derivation of nonasymptotic concentration bounds, and our work serves as the foundation for future investigations of the properties of the MFGF upper risk from the perspective of new decision-theoretic, finite-sample validity criterion, as in Martin (2021).

[7]  arXiv:2405.10461 [pdf, other]
Title: Prediction in Measurement Error Models
Authors: Fei Jiang, Yanyuan Ma
Subjects: Methodology (stat.ME)

We study the well-known difficult problem of prediction in measurement error models. By directly targeting the prediction interval instead of the point prediction, we construct a prediction interval by providing estimators of both the center and the length of the interval which achieves a pre-determined prediction level. The constructing procedure requires a working model for the distribution of the variable prone to error. If the working model is correct, the prediction interval estimator obtains the smallest variability in terms of assessing the true center and length. If the working model is incorrect, the prediction interval estimation is still consistent. We further study how the length of the prediction interval depends on the choice of the true prediction interval center and provide guidance on obtaining minimal prediction interval length. Numerical experiments are conducted to illustrate the performance and we apply our method to predict concentration of Abeta1-12 in cerebrospinal fluid in an Alzheimer's disease dataset.

[8]  arXiv:2405.10490 [pdf, ps, other]
Title: Neural Optimization with Adaptive Heuristics for Intelligent Marketing System
Comments: KDD 2024
Subjects: Methodology (stat.ME); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG); Optimization and Control (math.OC)

Computational marketing has become increasingly important in today's digital world, facing challenges such as massive heterogeneous data, multi-channel customer journeys, and limited marketing budgets. In this paper, we propose a general framework for marketing AI systems, the Neural Optimization with Adaptive Heuristics (NOAH) framework. NOAH is the first general framework for marketing optimization that considers both to-business (2B) and to-consumer (2C) products, as well as both owned and paid channels. We describe key modules of the NOAH framework, including prediction, optimization, and adaptive heuristics, providing examples for bidding and content optimization. We then detail the successful application of NOAH to LinkedIn's email marketing system, showcasing significant wins over the legacy ranking system. Additionally, we share details and insights that are broadly useful, particularly on: (i) addressing delayed feedback with lifetime value, (ii) performing large-scale linear programming with randomization, (iii) improving retrieval with audience expansion, (iv) reducing signal dilution in targeting tests, and (v) handling zero-inflated heavy-tail metrics in statistical testing.

[9]  arXiv:2405.10527 [pdf, other]
Title: Hawkes Models And Their Applications
Subjects: Methodology (stat.ME); Probability (math.PR); Applications (stat.AP)

The Hawkes process is a model for counting the number of arrivals to a system which exhibits the self-exciting property - that one arrival creates a heightened chance of further arrivals in the near future. The model, and its generalizations, have been applied in a plethora of disparate domains, though two particularly developed applications are in seismology and in finance. As the original model is elegantly simple, generalizations have been proposed which: track marks for each arrival, are multivariate, have a spatial component, are driven by renewal processes, treat time as discrete, and so on. This paper creates a cohesive review of the traditional Hawkes model and the modern generalizations, providing details on their construction, simulation algorithms, and giving key references to the appropriate literature for a detailed treatment.
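
To make the self-exciting property concrete, here is a minimal sketch of simulating a univariate Hawkes process by thinning (in the style of Ogata's algorithm). It is an illustration under stated assumptions, not code from the review: the exponential kernel `alpha * exp(-beta * (t - s))`, the parameter names `mu`, `alpha`, `beta`, and the function names are all choices made for this example.

```python
import math
import random


def hawkes_intensity(t, events, mu, alpha, beta):
    """Conditional intensity at time t: baseline mu plus an exponentially
    decaying bump alpha * exp(-beta * (t - s)) for every earlier arrival s."""
    return mu + sum(alpha * math.exp(-beta * (t - s)) for s in events if s < t)


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate arrival times on [0, horizon) by thinning.

    Assumes an exponential kernel, so the intensity only decays between
    arrivals and its value just after the current time is a valid upper bound.
    """
    rng = random.Random(seed)
    events = []
    t = 0.0
    while True:
        # Upper bound: intensity just after t, counting any arrival at t itself.
        lam_bar = mu + sum(alpha * math.exp(-beta * (t - s)) for s in events)
        t += rng.expovariate(lam_bar)            # candidate arrival time
        if t >= horizon:
            return events
        # Accept the candidate with probability intensity(t) / lam_bar.
        if rng.random() * lam_bar <= hawkes_intensity(t, events, mu, alpha, beta):
            events.append(t)


if __name__ == "__main__":
    arrivals = simulate_hawkes(mu=1.0, alpha=0.5, beta=1.5, horizon=50.0, seed=1)
    print(len(arrivals), "arrivals; first few:", [round(a, 3) for a in arrivals[:5]])
```

With `alpha / beta < 1` the process is stable; each accepted arrival raises the intensity by `alpha`, which is what produces the clustering of arrivals.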

[10]  arXiv:2405.10552 [pdf, other]
Title: Data Science Principles for Interpretable and Explainable AI
Authors: Kris Sankaran
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Society's capacity for algorithmic problem-solving has never been greater. Artificial Intelligence is now applied across more domains than ever, a consequence of powerful abstractions, abundant data, and accessible software. As capabilities have expanded, so have risks, with models often deployed without fully understanding their potential impacts. Interpretable and interactive machine learning aims to make complex models more transparent and controllable, enhancing user agency. This review synthesizes key principles from the growing literature in this field.
We first introduce precise vocabulary for discussing interpretability, like the distinction between glass box and explainable algorithms. We then explore connections to classical statistical and design principles, like parsimony and the gulfs of interaction. Basic explainability techniques -- including learned embeddings, integrated gradients, and concept bottlenecks -- are illustrated with a simple case study. We also review criteria for objectively evaluating interpretability approaches. Throughout, we underscore the importance of considering audience goals when designing interactive algorithmic systems. Finally, we outline open challenges and discuss the potential role of data science in addressing them. Code to reproduce all examples can be found at https://go.wisc.edu/3k1ewe.

[11]  arXiv:2405.10582 [pdf, ps, other]
Title: General oracle inequalities for a penalized log-likelihood criterion based on non-stationary data
Authors: Julien Aubert (UniCA, LJAD, CNRS), Luc Lehéricy (LJAD, UniCA, CNRS), Patricia Reynaud-Bouret (LJAD, UniCA, CNRS)
Subjects: Statistics Theory (math.ST)

We prove oracle inequalities for a penalized log-likelihood criterion that hold even if the data are not independent and not stationary, based on a martingale approach. The assumptions are checked for various contexts: density estimation with independent and identically distributed (i.i.d.) data, hidden Markov models, spiking neural networks, adversarial bandits. In each case, we compare our results to the literature, showing that, although we lose some logarithmic factors in the most classical case (i.i.d.), these results are comparable to or more general than the existing results in the more dependent cases.

[12]  arXiv:2405.10588 [pdf, other]
Title: Decompounding with unknown noise through several independent channels
Authors: Guillaume Garnier (LJLL, MERGE)
Subjects: Statistics Theory (math.ST)

In this article, we consider two different statistical models. First, we focus on the estimation of the jump intensity of a compound Poisson process in the presence of unknown noise. This problem combines both the deconvolution problem and the decompounding problem. More specifically, we observe several independent compound Poisson processes but we assume that all these observations are noisy due to measurement noise. We construct a Fourier estimator of the jump density and we study its mean integrated squared error. Then, we propose an adaptive method to correctly select the cutoff of the estimator and we illustrate the efficiency of the method with numerical results. Secondly, we introduce in this paper the multiplicative decompounding problem. We study this problem with Mellin density estimators. We develop an adaptive procedure to select the optimal cutoff parameter.

[13]  arXiv:2405.10712 [pdf, other]
Title: Comparative evaluation of earthquake forecasting models: An application to Italy
Subjects: Applications (stat.AP)

Testing earthquake forecasts is essential to obtain scientific information on forecasting models and sufficient credibility for societal usage. We aim at enhancing the testing phase proposed by the Collaboratory for the Study of Earthquake Predictability (CSEP, Schorlemmer et al., 2018) with new statistical methods supported by mathematical theory. To demonstrate their applicability, we evaluate three short-term forecasting models that were submitted to the CSEP Italy experiment, and two ensemble models thereof. The models produce weekly overlapping forecasts for the expected number of M4+ earthquakes in a collection of grid cells. We compare the models' forecasts using consistent scoring functions for means or expectations, which are widely used and theoretically principled tools for forecast evaluation. We further discuss and demonstrate their connection to CSEP-style earthquake likelihood model testing. Then, using tools from isotonic regression, we investigate forecast reliability and apply score decompositions in terms of calibration and discrimination. Our results show where and how models outperform their competitors and reveal a substantial lack of calibration for various models. The proposed methods also apply to full-distribution (e.g., catalog-based) forecasts, without requiring Poisson distributions or making any other type of parametric assumption.

[14]  arXiv:2405.10719 [pdf, other]
Title: $\ell_1$-Regularized Generalized Least Squares
Comments: 13 pages, 6 figures
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Machine Learning (stat.ML)

In this paper we propose an $\ell_1$-regularized GLS estimator for high-dimensional regressions with potentially autocorrelated errors. We establish non-asymptotic oracle inequalities for estimation accuracy in a framework that allows for highly persistent autoregressive errors. In practice, the whitening matrix required to implement the GLS is unknown; we present a feasible estimator for this matrix, derive consistency results and ultimately show how our proposed feasible GLS can closely recover the optimal performance (as if the errors were white noise) of the LASSO. A simulation study verifies the performance of the proposed method, demonstrating that the penalized (feasible) GLS-LASSO estimator performs on par with the LASSO in the case of white noise errors, whilst outperforming it in terms of sign-recovery and estimation error when the errors exhibit significant correlation.

[15]  arXiv:2405.10742 [pdf, other]
Title: Efficient Sampling in Disease Surveillance through Subpopulations: Sampling Canaries in the Coal Mine
Authors: Ivo V. Stoepker
Comments: 15 pages, 1 figure
Subjects: Methodology (stat.ME); Applications (stat.AP)

We consider disease outbreak detection settings where the population under study consists of various subpopulations available for stratified surveillance. These subpopulations can for example be based on age cohorts, but may also correspond to other subgroups of the population under study such as international travellers. Rather than sampling uniformly over the entire population, one may elevate the effectiveness of the detection methodology by optimally choosing a subpopulation for sampling. We show (under some assumptions) that the relative sampling efficiency between two subpopulations is inversely proportional to the ratio of their respective baseline disease risks. This leads to a considerable potential increase in sampling efficiency when sampling from the subpopulation with higher baseline disease risk, if the two subpopulation baseline risks differ strongly. Our mathematical results require a careful treatment of the power curves of exact binomial tests as a function of their sample size, which are erratic and non-monotonic due to the discreteness of the underlying distribution. Subpopulations with comparatively high baseline disease risk are typically in greater contact with health professionals, and thus when sampled for surveillance purposes this is typically motivated merely through a convenience argument. With this study, we aim to elevate the status of such "convenience surveillance" to optimal subpopulation surveillance.
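
The non-monotonic power curve mentioned above can be seen in a small generic example. The sketch below computes the power of a one-sided exact binomial test as a function of sample size; it is not the paper's procedure, and the baseline risk `p0 = 0.1`, elevated risk `p1 = 0.3`, level `alpha = 0.05`, and all function names are assumptions chosen for illustration.

```python
from math import comb


def upper_tail(n, c, p):
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))


def critical_value(n, p0, alpha):
    """Smallest c such that rejecting when X >= c has size at most alpha.

    Returns n + 1 when no outcome is extreme enough (the test never rejects).
    """
    for c in range(n + 2):
        if upper_tail(n, c, p0) <= alpha:
            return c


def power(n, p0, p1, alpha):
    """Probability of rejecting H0: p = p0 when the true risk is p1."""
    return upper_tail(n, critical_value(n, p0, alpha), p1)


if __name__ == "__main__":
    for n in range(1, 13):
        c = critical_value(n, 0.1, 0.05)
        print(f"n={n:2d}  critical value={c:2d}  power={power(n, 0.1, 0.3, 0.05):.4f}")
```

Because the critical value can only move in whole steps, adding one observation sometimes raises it and the power drops: with these assumed values the power falls from 0.216 at n = 3 to about 0.084 at n = 4.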

[16]  arXiv:2405.10769 [pdf, ps, other]
Title: Efficient estimation of target population treatment effect from multiple source trials under effect-measure transportability
Subjects: Methodology (stat.ME)

When the marginal causal effect comparing the same treatment pair is available from multiple trials, we wish to transport all results to make inference on the target population effect. To account for the differences between populations, statistical analysis is often performed controlling for relevant variables. However, when transportability assumptions are placed on conditional causal effects, rather than the distribution of potential outcomes, we need to carefully choose these effect measures. In particular, we present identifiability results in two cases: target population average treatment effect for a continuous outcome and causal mean ratio for a positive outcome. We characterize the semiparametric efficiency bounds of the causal effects under the respective transportability assumptions and propose estimators that are doubly robust against model misspecifications. We highlight an important discussion on the tension between the non-collapsibility of conditional effects and the variational independence induced by transportability in the case of multiple source trials.

[17]  arXiv:2405.10773 [pdf, ps, other]
Title: Proximal indirect comparison
Subjects: Methodology (stat.ME)

We consider the problem of indirect comparison, where a treatment arm of interest is absent by design in the target randomized control trial (RCT) but available in a source RCT. The identifiability of the target population average treatment effect often relies on conditional transportability assumptions. However, it is a common concern whether all relevant effect modifiers are measured and controlled for. We highlight a new proximal identification result in the presence of shifted, unobserved effect modifiers based on proxies: an adjustment proxy in both RCTs and an additional reweighting proxy in the source RCT. We propose an estimator which is doubly-robust against misspecifications of the so-called bridge functions and asymptotically normal under mild consistency of the nuisance models. An alternative estimator is presented to accommodate missing outcomes in the source RCT, which we then apply to conduct a proximal indirect comparison analysis using two weight management trials.

[18]  arXiv:2405.10795 [pdf, other]
Title: Non trivial optimal sampling rate for estimating a Lipschitz-continuous function in presence of mean-reverting Ornstein-Uhlenbeck noise
Comments: 14 pages, 5 figures
Subjects: Statistics Theory (math.ST); Probability (math.PR); Methodology (stat.ME)

We examine a mean-reverting Ornstein-Uhlenbeck process that perturbs an unknown Lipschitz-continuous drift and aim to estimate the drift's value at a predetermined time horizon by sampling the path of the process. Due to the time-varying nature of the drift, we propose an estimation procedure that involves an online, time-varying optimization scheme implemented using a stochastic gradient ascent algorithm to maximize the log-likelihood of our observations. The objective of the paper is to investigate the optimal sample size/rate for achieving the minimum mean square distance between our estimator and the true value of the drift. In this setting we uncover a trade-off between the correlation of the observations, which increases with the sample size, and the dynamic nature of the unknown drift, which is weakened by increasing the frequency of observation. The mean square error is shown to be non-monotonic in the sample size, attaining a global minimum whose precise description depends on the parameters that govern the model. In the static case, i.e. when the unknown drift is constant, our method outperforms the arithmetic mean of the observations in highly correlated regimes, despite the latter being a natural candidate estimator. We then compare our online estimator with the global maximum likelihood estimator.

[19]  arXiv:2405.10817 [pdf, ps, other]
Title: Restless Linear Bandits
Authors: Azadeh Khaleghi
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG)

A more general formulation of the linear bandit problem is considered to allow for dependencies over time. Specifically, it is assumed that there exists an unknown $\mathbb{R}^d$-valued stationary $\varphi$-mixing sequence of parameters $(\theta_t,~t \in \mathbb{N})$ which gives rise to pay-offs. This instance of the problem can be viewed as a generalization of both the classical linear bandits with iid noise, and the finite-armed restless bandits. In light of the well-known computational hardness of optimal policies for restless bandits, an approximation is proposed whose error is shown to be controlled by the $\varphi$-dependence between consecutive $\theta_t$. An optimistic algorithm, called LinMix-UCB, is proposed for the case where $\theta_t$ has an exponential mixing rate. The proposed algorithm is shown to incur a sub-linear regret of $\mathcal{O}\left(\sqrt{d n\mathrm{polylog}(n) }\right)$ with respect to an oracle that always plays a multiple of $\mathbb{E}\theta_t$. The main challenge in this setting is to ensure that the exploration-exploitation strategy is robust against long-range dependencies. The proposed method relies on Berbee's coupling lemma to carefully select near-independent samples and construct confidence ellipsoids around empirical estimates of $\mathbb{E}\theta_t$.

[20]  arXiv:2405.10925 [pdf, ps, other]
Title: High-dimensional multiple imputation (HDMI) for partially observed confounders including natural language processing-derived auxiliary covariates
Subjects: Methodology (stat.ME); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Multiple imputation (MI) models can be improved by including auxiliary covariates (AC), but their performance in high-dimensional data is not well understood. We aimed to develop and compare high-dimensional MI (HDMI) approaches using structured and natural language processing (NLP)-derived AC in studies with partially observed confounders. We conducted a plasmode simulation study using data from opioid vs. non-steroidal anti-inflammatory drug (NSAID) initiators (X) with observed serum creatinine labs (Z2) and time-to-acute kidney injury as outcome. We simulated 100 cohorts with a null treatment effect, including X, Z2, atrial fibrillation (U), and 13 other investigator-derived confounders (Z1) in the outcome generation. We then imposed missingness (MZ2) on 50% of Z2 measurements as a function of Z2 and U and created different HDMI candidate AC using structured and NLP-derived features. We mimicked scenarios where U was unobserved by omitting it from all AC candidate sets. Using LASSO, we data-adaptively selected HDMI covariates associated with Z2 and MZ2 for MI, and with U to include in propensity score models. The treatment effect was estimated following propensity score matching in MI datasets and we benchmarked HDMI approaches against a baseline imputation and complete case analysis with Z1 only. HDMI using claims data showed the lowest bias (0.072). Combining claims and sentence embeddings led to an improvement in the efficiency displaying the lowest root-mean-squared-error (0.173) and coverage (94%). NLP-derived AC alone did not perform better than baseline MI. HDMI approaches may decrease bias in studies with partially observed confounders where missingness depends on unobserved factors.

[21]  arXiv:2405.10930 [pdf, other]
Title: Submodular Information Selection for Hypothesis Testing with Misclassification Penalties
Comments: 23 pages, 4 figures
Subjects: Machine Learning (stat.ML); Computational Complexity (cs.CC); Information Theory (cs.IT); Machine Learning (cs.LG); Optimization and Control (math.OC)

We consider the problem of selecting an optimal subset of information sources for a hypothesis testing/classification task where the goal is to identify the true state of the world from a finite set of hypotheses, based on finite observation samples from the sources. In order to characterize the learning performance, we propose a misclassification penalty framework, which enables non-uniform treatment of different misclassification errors. In a centralized Bayesian learning setting, we study two variants of the subset selection problem: (i) selecting a minimum cost information set to ensure that the maximum penalty of misclassifying the true hypothesis remains bounded and (ii) selecting an optimal information set under a limited budget to minimize the maximum penalty of misclassifying the true hypothesis. Under mild assumptions, we prove that the objective (or constraints) of these combinatorial optimization problems are weak (or approximate) submodular, and establish high-probability performance guarantees for greedy algorithms. Further, we propose an alternate metric for information set selection which is based on the total penalty of misclassification. We prove that this metric is submodular and establish near-optimal guarantees for the greedy algorithms for both the information set selection problems. Finally, we present numerical simulations to validate our theoretical results over several randomly generated instances.

Cross-lists for Mon, 20 May 24

[22]  arXiv:2405.09843 (cross-list from econ.TH) [pdf, other]
Title: Organizational Selection of Innovation
Comments: 40 pages, 13 figures, 2 tables
Subjects: Theoretical Economics (econ.TH); Multiagent Systems (cs.MA); Physics and Society (physics.soc-ph); Applications (stat.AP)

Budgetary constraints force organizations to pursue only a subset of possible innovation projects. Identifying which subset is most promising is an error-prone exercise, and involving multiple decision makers may be prudent. This raises the question of how to most effectively aggregate their collective nous. Our model of organizational portfolio selection provides some first answers. We show that portfolio performance can vary widely. Delegating evaluation makes sense when organizations employ the relevant experts and can assign projects to them. In most other settings, aggregating the impressions of multiple agents leads to better performance than delegation. In particular, letting agents rank projects often outperforms alternative aggregation rules -- including averaging agents' project scores as well as counting their approval votes -- especially when organizations have tight budgets and can select only a few project alternatives out of many.

[23]  arXiv:2405.10410 (cross-list from math.NA) [pdf, ps, other]
Title: The fast committor machine: Interpretable prediction with kernels
Comments: 10 pages, 7 figures
Subjects: Numerical Analysis (math.NA); Machine Learning (stat.ML)

In the study of stochastic dynamics, the committor function describes the probability that a process starting from an initial configuration $x$ will reach set $A$ before set $B$. This paper introduces a fast and interpretable method for approximating the committor, called the "fast committor machine" (FCM). The FCM is based on simulated trajectory data, and it uses this data to train a kernel model. The FCM identifies low-dimensional subspaces that optimally describe the $A$ to $B$ transitions, and the subspaces are emphasized in the kernel model. The FCM uses randomized numerical linear algebra to train the model with runtime that scales linearly in the number of data points. This paper applies the FCM to example systems including the alanine dipeptide miniprotein: in these experiments, the FCM is generally more accurate and trains more quickly than a neural network with a similar number of parameters.
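
To illustrate the definition of the committor (not the FCM itself, which is a kernel method trained on trajectory data), here is a minimal sketch for a one-dimensional nearest-neighbour random walk. The chain, the choice of set $B$ as the leftmost state and set $A$ as the rightmost state, and the function name `committor_1d` are assumptions made for this example.

```python
def committor_1d(n_states, p_right=0.5, sweeps=5000):
    """Committor q[i] = P(reach the last state (set A) before state 0 (set B))
    for a walk on 0..n_states-1 that steps right with probability p_right.

    Solves q[i] = p_right * q[i+1] + (1 - p_right) * q[i-1] in the interior,
    with boundary values q[0] = 0 and q[-1] = 1, by Gauss-Seidel sweeps.
    """
    q = [0.0] * n_states
    q[-1] = 1.0
    for _ in range(sweeps):
        for i in range(1, n_states - 1):
            q[i] = p_right * q[i + 1] + (1 - p_right) * q[i - 1]
    return q


if __name__ == "__main__":
    q = committor_1d(11)
    print([round(v, 3) for v in q])
```

For the symmetric walk the committor is linear in the starting position, `q[i] = i / (n_states - 1)`, which gives a quick sanity check on the solver.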

[24]  arXiv:2405.10462 (cross-list from astro-ph.IM) [pdf, other]
Title: Rotation of the Globular Cluster Population of the Dark Matter Deficient Galaxy NGC 1052-DF4: Implication for the Total Mass
Comments: 9 pages, 6 figures. Accepted for publication in PASA
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Astrophysics of Galaxies (astro-ph.GA); Applications (stat.AP)

We explore the globular cluster population of NGC 1052-DF4, a dark matter deficient galaxy, using Bayesian inference to search for the presence of rotation. The existence of such a rotating component is relevant to the estimation of the mass of the galaxy, and therefore the question of whether NGC 1052-DF4 is truly deficient in dark matter, similar to NGC 1052-DF2, another galaxy in the same group. The rotational characteristics of seven globular clusters in NGC 1052-DF4 were investigated, finding that a non-rotating kinematic model has a higher Bayesian evidence than a rotating model, by a factor of approximately 2.5. In addition, we find that under the assumption of rotation, its amplitude must be small. This distinct lack of rotation strengthens the case that, based on its intrinsic velocity dispersion, NGC 1052-DF4 is a truly dark matter deficient galaxy.

[25]  arXiv:2405.10469 (cross-list from cs.AI) [pdf, other]
Title: Simulation-Based Benchmarking of Reinforcement Learning Agents for Personalized Retail Promotions
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Econometrics (econ.EM); Machine Learning (stat.ML)

The development of open benchmarking platforms could greatly accelerate the adoption of AI agents in retail. This paper presents comprehensive simulations of customer shopping behaviors for the purpose of benchmarking reinforcement learning (RL) agents that optimize coupon targeting. The difficulty of this learning problem is largely driven by the sparsity of customer purchase events. We trained agents using offline batch data comprising summarized customer purchase histories to help mitigate this effect. Our experiments revealed that contextual bandit and deep RL methods that are less prone to over-fitting the sparse reward distributions significantly outperform static policies. This study offers a practical framework for simulating AI agents that optimize the entire retail customer journey. It aims to inspire the further development of simulation tools for retail AI systems.

[26]  arXiv:2405.10618 (cross-list from cs.LG) [pdf, other]
Title: Distributed Event-Based Learning via ADMM
Comments: 29 pages, 12 figures
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

We consider a distributed learning problem, where agents minimize a global objective function by exchanging information over a network. Our approach has two distinct features: (i) It substantially reduces communication by triggering communication only when necessary, and (ii) it is agnostic to the data-distribution among the different agents. We can therefore guarantee convergence even if the local data-distributions of the agents are arbitrarily distinct. We analyze the convergence rate of the algorithm and derive accelerated convergence rates in a convex setting. We also characterize the effect of communication drops and demonstrate that our algorithm is robust to communication failures. The article concludes by presenting numerical results from a distributed LASSO problem, and distributed learning tasks on MNIST and CIFAR-10 datasets. The experiments underline communication savings of 50% or more due to the event-based communication strategy, show resilience towards heterogeneous data-distributions, and highlight that our approach outperforms common baselines such as FedAvg, FedProx, and FedADMM.
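
The idea of triggering communication only when necessary can be illustrated with a generic send-on-delta rule. This is a minimal sketch and not the algorithm from the paper: the function name `event_triggered_send`, the scalar state, and the fixed threshold are all assumptions made for illustration.

```python
def event_triggered_send(values, threshold):
    """Return the iteration indices at which an agent would communicate.

    Hypothetical send-on-delta rule (an assumption, not the paper's method):
    the agent transmits its local value only when it differs from the last
    transmitted value by more than `threshold`.
    """
    sent_at = []
    last_sent = None
    for k, x in enumerate(values):
        # Always send the first value; afterwards send only on a large change.
        if last_sent is None or abs(x - last_sent) > threshold:
            sent_at.append(k)
            last_sent = x
    return sent_at


# Illustrative local iterates of a single agent (made-up numbers).
trajectory = [0.0, 0.05, 0.2, 0.25, 0.6, 0.61]
events = event_triggered_send(trajectory, threshold=0.1)
print(events)
```

In this toy run the agent communicates at three of six iterations; a larger threshold trades fewer messages for staler information at the receivers.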

[27]  arXiv:2405.10763 (cross-list from cond-mat.dis-nn) [pdf, other]
Title: Integer Traffic Assignment Problem: Algorithms and Insights on Random Graphs
Comments: 37 pages, 15 figures
Subjects: Disordered Systems and Neural Networks (cond-mat.dis-nn); Discrete Mathematics (cs.DM); Optimization and Control (math.OC); Computation (stat.CO)

Path optimization is a fundamental concern across various real-world scenarios, ranging from traffic congestion issues to efficient data routing over the internet. The Traffic Assignment Problem (TAP) is a classic continuous optimization problem in this field. This study considers the Integer Traffic Assignment Problem (ITAP), a discrete variant of TAP. ITAP involves determining optimal routes for commuters in a city represented by a graph, aiming to minimize congestion while adhering to integer flow constraints on paths. This restriction makes ITAP an NP-hard problem. While conventional TAP prioritizes repulsive interactions to minimize congestion, this work also explores the case of attractive interactions, related to minimizing the number of occupied edges. We present and evaluate multiple algorithms to address ITAP, including a message passing algorithm, a greedy approach, simulated annealing, and relaxation of ITAP to TAP. Inspired by studies of random ensembles in the large-size limit in statistical physics, comparisons between these algorithms are conducted on large sparse random regular graphs with a random set of origin-destination pairs. Our results indicate that while the simplest greedy algorithm performs competitively in the repulsive scenario, in the attractive case the message-passing-based algorithm and simulated annealing demonstrate superiority. We then investigate the relationship between TAP and ITAP in the repulsive case. We find that, as the number of paths increases, the solution of TAP converges toward that of ITAP, and we investigate the speed of this convergence. Depending on the number of paths, our analysis leads us to identify two scaling regimes: in one the average flow per edge is of order one, and in another the number of paths scales quadratically with the size of the graph, in which case the continuous relaxation solves the integer problem closely.

[28]  arXiv:2405.10815 (cross-list from math.OC) [pdf, other]
Title: A Functional Model Method for Nonconvex Nonsmooth Conditional Stochastic Optimization
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)

We consider stochastic optimization problems involving an expected value of a nonlinear function of a base random vector and a conditional expectation of another function depending on the base random vector, a dependent random vector, and the decision variables. We call such problems conditional stochastic optimization problems. They arise in many applications, such as uplift modeling, reinforcement learning, and contextual optimization. We propose a specialized single time-scale stochastic method for nonconvex constrained conditional stochastic optimization problems with a Lipschitz smooth outer function and a generalized differentiable inner function. In the method, we approximate the inner conditional expectation with a rich parametric model whose mean squared error satisfies a stochastic version of a Łojasiewicz condition. The model is used by an inner learning algorithm. The main feature of our approach is that unbiased stochastic estimates of the directions used by the method can be generated with one observation from the joint distribution per iteration, which makes it applicable to real-time learning. The directions, however, are not gradients or subgradients of any overall objective function. We prove the convergence of the method with probability one, using the method of differential inclusions and a specially designed Lyapunov function, involving a stochastic generalization of the Bregman distance. Finally, a numerical illustration demonstrates the viability of our approach.

[29]  arXiv:2405.10839 (cross-list from nucl-th) [pdf, other]
Title: Model orthogonalization and Bayesian forecast mixing via Principal Component Analysis
Comments: 12 pages, 4 figures
Subjects: Nuclear Theory (nucl-th); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)

One can improve predictability in the unknown domain by combining forecasts of imperfect complex computational models using a Bayesian statistical machine learning framework. In many cases, however, the models used in the mixing process are similar. In addition to contaminating the model space, the existence of such similar, or even redundant, models during the multimodeling process can result in misinterpretation of results and deterioration of predictive performance. In this work we describe a method based on the Principal Component Analysis that eliminates model redundancy. We show that by adding model orthogonalization to the proposed Bayesian Model Combination framework, one can arrive at better prediction accuracy and reach excellent uncertainty quantification performance.

[30]  arXiv:2405.10875 (cross-list from eess.SY) [pdf, other]
Title: Recursively Feasible Shrinking-Horizon MPC in Dynamic Environments with Conformal Prediction Guarantees
Subjects: Systems and Control (eess.SY); Machine Learning (stat.ML)

In this paper, we focus on the problem of shrinking-horizon Model Predictive Control (MPC) in uncertain dynamic environments. We consider controlling a deterministic autonomous system that interacts with uncontrollable stochastic agents during its mission. Employing tools from conformal prediction, existing works derive high-confidence prediction regions for the unknown agent trajectories, and integrate these regions in the design of suitable safety constraints for MPC. Despite guaranteeing probabilistic safety of the closed-loop trajectories, these constraints do not ensure feasibility of the respective MPC schemes for the entire duration of the mission. We propose a shrinking-horizon MPC that guarantees recursive feasibility via a gradual relaxation of the safety constraints as new prediction regions become available online. This relaxation enforces the safety constraints to hold over the least restrictive prediction region from the set of all available prediction regions. In a comparative case study with the state of the art, we empirically show that our approach results in tighter prediction regions and verify recursive feasibility of our MPC scheme.

[31]  arXiv:2405.10938 (cross-list from cs.LG) [pdf, other]
Title: Observational Scaling Laws and the Predictability of Language Model Performance
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)

Understanding how language model performance varies with scale is critical to benchmark and algorithm development. Scaling laws are one approach to building this understanding, but the requirement of training models across many different scales has limited their use. We propose an alternative, observational approach that bypasses model training and instead builds scaling laws from ~80 publicly available models. Building a single scaling law from multiple model families is challenging due to large variations in their training compute efficiencies and capabilities. However, we show that these variations are consistent with a simple, generalized scaling law where language model performance is a function of a low-dimensional capability space, and model families only vary in their efficiency in converting training compute to capabilities. Using this approach, we show the surprising predictability of complex scaling phenomena: we show that several emergent phenomena follow a smooth, sigmoidal behavior and are predictable from small models; we show that the agent performance of models such as GPT-4 can be precisely predicted from simpler non-agentic benchmarks; and we show how to predict the impact of post-training interventions like Chain-of-Thought and Self-Consistency as language model capabilities continue to improve.

Replacements for Mon, 20 May 24

[32]  arXiv:1810.02071 (replaced) [pdf, other]
Title: Leave-one-out least squares Monte Carlo algorithm for pricing Bermudan options
Journal-ref: Journal of Futures Markets (2024)
Subjects: Computational Finance (q-fin.CP); Mathematical Finance (q-fin.MF); Machine Learning (stat.ML)
[33]  arXiv:2003.05492 (replaced) [pdf, other]
Title: An asymptotic Peskun ordering and its application to lifted samplers
Journal-ref: Bernoulli 30(3), 2301-2325, (August 2024)
Subjects: Computation (stat.CO); Methodology (stat.ME)
[34]  arXiv:2110.07051 (replaced) [pdf, other]
Title: Fast and Scalable Inference for Spatial Extreme Value Models
Subjects: Methodology (stat.ME); Computation (stat.CO)
[35]  arXiv:2210.05983 (replaced) [pdf, other]
Title: Model-based clustering in simple hypergraphs through a stochastic blockmodel
Authors: Luca Brusa (UNIMIB), Catherine Matias (LPSM (UMR_8001))
Subjects: Methodology (stat.ME)
[36]  arXiv:2210.09560 (replaced) [pdf, other]
Title: A Bayesian Convolutional Neural Network-based Generalized Linear Model
Comments: 25 pages, 7 figures
Subjects: Methodology (stat.ME)
[37]  arXiv:2303.07158 (replaced) [pdf, other]
Title: Uniform Pessimistic Risk and its Optimal Portfolio
Subjects: Portfolio Management (q-fin.PM); Machine Learning (cs.LG); Computation (stat.CO); Machine Learning (stat.ML)
[38]  arXiv:2305.12100 (replaced) [pdf, other]
Title: How Spurious Features Are Memorized: Precise Analysis for Random and NTK Features
Comments: Revision after ICML2024 acceptance. Motivation of the paper changed from Privacy to Spurious Features. arXiv admin note: text overlap with arXiv:2302.01629
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[39]  arXiv:2305.15671 (replaced) [pdf, other]
Title: Matrix Autoregressive Model with Vector Time Series Covariates for Spatio-Temporal Data
Subjects: Methodology (stat.ME)
[40]  arXiv:2306.08485 (replaced) [pdf, other]
Title: Graph-Aligned Random Partition Model (GARP)
Comments: Journal of the American Statistical Association 2024
Subjects: Methodology (stat.ME)
[41]  arXiv:2306.09555 (replaced) [pdf, other]
Title: Geometric-Based Pruning Rules For Change Point Detection in Multiple Independent Time Series
Comments: 34 pages, 11 figures, 1 table
Subjects: Methodology (stat.ME); Computation (stat.CO); Machine Learning (stat.ML)
[42]  arXiv:2306.15075 (replaced) [pdf, other]
Title: Differences in academic preparedness do not fully explain Black-White enrollment disparities in advanced high school coursework
Subjects: Methodology (stat.ME); Applications (stat.AP)
[43]  arXiv:2307.09864 (replaced) [pdf, ps, other]
Title: Asymptotic equivalence of Principal Components and Quasi Maximum Likelihood estimators in Large Approximate Factor Models
Authors: Matteo Barigozzi
Comments: arXiv admin note: text overlap with arXiv:2211.01921 which is written by the same author. The two papers do not overlap as they contain different results although they have the same assumptions
Subjects: Econometrics (econ.EM); Methodology (stat.ME)
[44]  arXiv:2310.09319 (replaced) [pdf, other]
Title: Topological Data Analysis in smart manufacturing
Comments: Preprint still under review
Subjects: Machine Learning (cs.LG); Algebraic Topology (math.AT); Applications (stat.AP)
[45]  arXiv:2310.12788 (replaced) [pdf, other]
Title: Continuous Time Locally Stationary Wavelet Processes
Comments: 38 pages, 12 figures
Subjects: Statistics Theory (math.ST)
[46]  arXiv:2310.14691 (replaced) [pdf, other]
Title: Identifiability of total effects from abstractions of time series causal graphs
Comments: Accepted to the 40th Conference on Uncertainty in Artificial Intelligence (UAI) 2024, Barcelona, Spain
Subjects: Statistics Theory (math.ST); Artificial Intelligence (cs.AI)
[47]  arXiv:2310.17582 (replaced) [pdf, other]
Title: Convergence of flow-based generative models via proximal gradient descent in Wasserstein space
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC); Statistics Theory (math.ST)
[48]  arXiv:2311.00905 (replaced) [pdf, other]
Title: Data-driven fixed-point tuning for truncated realized variations
Subjects: Statistics Theory (math.ST); Econometrics (econ.EM)
[49]  arXiv:2311.03644 (replaced) [pdf, other]
Title: BOB: Bayesian Optimized Bootstrap for Uncertainty Quantification in Gaussian Mixture Models
Comments: 35 pages, 8 figures
Subjects: Methodology (stat.ME); Computation (stat.CO)
[50]  arXiv:2311.05794 (replaced) [pdf, other]
Title: An Experimental Design for Anytime-Valid Causal Inference on Multi-Armed Bandits
Subjects: Methodology (stat.ME); Machine Learning (cs.LG)
[51]  arXiv:2311.17778 (replaced) [pdf, other]
Title: Unified Binary and Multiclass Margin-Based Classification
Comments: Accepted for publication in Journal of Machine Learning Research
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[52]  arXiv:2312.07792 (replaced) [pdf, other]
Title: Differentially private projection-depth-based medians
Comments: 44 pages, 1 figure
Subjects: Statistics Theory (math.ST); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Methodology (stat.ME)
[53]  arXiv:2312.10563 (replaced) [pdf, other]
Title: Mediation Analysis with Mendelian Randomization and Efficient Multiple GWAS Integration
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
[54]  arXiv:2402.02969 (replaced) [pdf, other]
Title: Towards Understanding the Word Sensitivity of Attention Layers: A Study via Random Features
Comments: Revision after ICML2024 reviews
Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Machine Learning (cs.LG)
[55]  arXiv:2402.14775 (replaced) [pdf, other]
Title: Localised Natural Causal Learning Algorithms for Weak Consistency Conditions
Comments: UAI2024
Subjects: Methodology (stat.ME)
[56]  arXiv:2403.02058 (replaced) [pdf, other]
Title: Utility-based optimization of Fujikawa's basket trial design -- Pre-specified protocol of a comparison study
Comments: 26 pages, 1 figure; updated content in reaction to anonymous review: new section "Methodology of utility functions in basket trial designs", discussion of literature and four new scenario sets in section "Outcome scenarios", two new algorithms and detailed explanation in section "Optimization algorithms", new section "Discussion", further minor changes
Subjects: Methodology (stat.ME); Applications (stat.AP)
[57]  arXiv:2404.05484 (replaced) [pdf, other]
Title: On Computational Modeling of Sleep-Wake Cycle
Authors: Xin Li
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[58]  arXiv:2405.07836 (replaced) [pdf, other]
Title: Forecasting with Hyper-Trees
Comments: Forecasting, Gradient Boosting, Hyper-Networks, LightGBM, Parameter Non-Stationarity, Time Series, XGBoost
Subjects: Machine Learning (cs.LG); Methodology (stat.ME)
[59]  arXiv:2405.08253 (replaced) [pdf, ps, other]
Title: Thompson Sampling for Infinite-Horizon Discounted Decision Processes
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)
[60]  arXiv:2405.10067 (replaced) [pdf, other]
Title: Sparse and Orthogonal Low-rank Collective Matrix Factorization (solrCMF): Efficient data integration in flexible layouts
Subjects: Methodology (stat.ME)