New submissions

New submissions for Thu, 25 Nov 21

[1]  arXiv:2111.12149 [pdf, other]
Title: Binned multinomial logistic regression for integrative cell type annotation
Subjects: Applications (stat.AP)

Categorizing individual cells into one of many known cell type categories, also known as cell type annotation, is a critical step in the analysis of single-cell genomics data. The current process of annotation is time-intensive and subjective, which has led to different studies describing cell types with labels of varying degrees of resolution. While supervised learning approaches have provided automated solutions to annotation, there remains a significant challenge in fitting a unified model for multiple datasets with inconsistent labels. In this article, we propose a new multinomial logistic regression estimator which can be used to model cell type probabilities by integrating multiple datasets with labels of varying resolution. To compute our estimator, we solve a nonconvex optimization problem using a blockwise proximal gradient descent algorithm. We show through simulation studies that our approach estimates cell type probabilities more accurately than competitors in a wide variety of scenarios. We apply our method to ten single-cell RNA-seq datasets and demonstrate its utility in predicting fine resolution cell type labels on unlabeled data as well as refining cell type labels on data with existing coarse resolution annotations. An R package implementing the method is available at https://github.com/keshav-motwani/IBMR and the collection of datasets we analyze is available at https://github.com/keshav-motwani/AnnotatedPBMC.

[2]  arXiv:2111.12157 [pdf, other]
Title: Bayesian Sample Size Prediction for Online Activity
Comments: 10 pages, 7 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

In many contexts it is useful to predict the number of individuals in some population who will initiate a particular activity during a given period. Examples include the number of users who will install a software update, or the number of customers who will use a new feature on a website or participate in an A/B test. In practical settings, there is heterogeneity amongst individuals with regard to the distribution of time until they will initiate. For this reason it is inappropriate to assume that the number of new individuals observed on successive days will be identically distributed. Given observations on the number of unique users participating in an initial period, we present a simple but novel Bayesian method for predicting the number of additional individuals who will participate during a subsequent period. We illustrate the performance of the method in predicting sample size in online experimentation.

[3]  arXiv:2111.12161 [pdf, other]
Title: Sensitivity Analysis of Individual Treatment Effects: A Robust Conformal Inference Approach
Subjects: Methodology (stat.ME)

We propose a model-free framework for sensitivity analysis of individual treatment effects (ITEs), building upon ideas from conformal inference. For any unit, our procedure reports the $\Gamma$-value, a number which quantifies the minimum strength of confounding needed to explain away the evidence for ITE. Our approach rests on the reliable predictive inference of counterfactuals and ITEs in situations where the training data is confounded. Under the marginal sensitivity model of Tan (2006), we characterize the shift between the distribution of the observations and that of the counterfactuals. We first develop a general method for predictive inference of test samples from a shifted distribution; we then leverage this to construct covariate-dependent prediction sets for counterfactuals. No matter the value of the shift, these prediction sets (resp. approximately) achieve marginal coverage if the propensity score is known exactly (resp. estimated). We also describe a distinct procedure that attains coverage conditional on the training data. In the latter case, we prove a sharpness result showing that for certain classes of prediction problems, the prediction intervals cannot possibly be tightened. We verify the validity and performance of the new methods via simulation studies and apply them to analyze real datasets.

[4]  arXiv:2111.12163 [pdf, other]
Title: spOccupancy: An R package for single species, multispecies, and integrated spatial occupancy models
Comments: 31 pages, 4 figures
Subjects: Applications (stat.AP)

Occupancy modeling is a common approach to assess spatial and temporal species distribution patterns, while explicitly accounting for measurement errors common in detection-nondetection data. Numerous extensions of the basic single species occupancy model exist to address dynamics, multiple species or states, interactions, false positive errors, autocorrelation, and to integrate multiple data sources. However, development of specialized and computationally efficient software to fit spatial models to large data sets is scarce or absent. We introduce the spOccupancy R package designed to fit single species, multispecies, and integrated spatially-explicit occupancy models. Using a Bayesian framework, we leverage Pólya-Gamma data augmentation and Nearest Neighbor Gaussian Processes to ensure models are computationally efficient for potentially massive data sets. spOccupancy provides user-friendly functions for data simulation, model fitting, model validation (by posterior predictive checks), model comparison (using information criteria and k-fold cross-validation), and out-of-sample prediction. We illustrate the package's functionality via a vignette, simulated data analysis, and two bird case studies, in which we estimate occurrence of the Black-throated Green Warbler (Setophaga virens) across the eastern USA and species richness of a foliage-gleaning bird community in the Hubbard Brook Experimental Forest in New Hampshire, USA. The spOccupancy package provides a user-friendly approach to fit a variety of single and multispecies occupancy models, making it straightforward to address detection biases and spatial autocorrelation in species distribution models even for large data sets.

[5]  arXiv:2111.12201 [pdf, other]
Title: Parameter estimation and uncertainty quantification using information geometry
Comments: 50 pages (exc. references), 12 figures. Review
Subjects: Methodology (stat.ME); Applications (stat.AP)

In this work we (1) review likelihood-based inference for parameter estimation and the construction of confidence regions, and (2) explore the use of techniques from information geometry, including geodesic curves and Riemann scalar curvature, to supplement typical techniques for uncertainty quantification such as Bayesian methods, profile likelihood, asymptotic analysis and bootstrapping. These techniques from information geometry provide data-independent insights into uncertainty and identifiability, and can be used to inform data collection decisions. All code used in this work to implement the inference and information geometry techniques is available on GitHub.

[6]  arXiv:2111.12224 [pdf, other]
Title: Asymptotics for Markov chain mixture detection
Comments: To be published in Econometrics and Statistics
Subjects: Statistics Theory (math.ST)

Sufficient conditions are provided under which the log-likelihood ratio test statistic fails to have a limiting chi-squared distribution under the null hypothesis when testing between one and two components under a general two-component mixture model, but rather tends to infinity in probability. These conditions are verified when the component densities describe continuous-time, discrete-statespace Markov chains and the results are illustrated via a parametric bootstrap simulation on an analysis of the migrations over time of a set of corporate bond ratings. The precise limiting distribution is derived in a simple case with two states, one of which is absorbing, which leads to a right-censored exponential scale mixture model. In that case, when centred by a function growing logarithmically in the sample size, the statistic has a limiting distribution of Gumbel extreme-value type rather than chi-squared.

[7]  arXiv:2111.12244 [pdf, other]
Title: A Unified Decision Framework for Phase I Dose-Finding Designs
Subjects: Methodology (stat.ME)

The purpose of a phase I dose-finding clinical trial is to investigate the toxicity profiles of various doses for a new drug and identify the maximum tolerated dose. Over the past three decades, various dose-finding designs have been proposed and discussed, including conventional model-based designs, new model-based designs using toxicity probability intervals, and rule-based designs. We present a simple decision framework that can generate several popular designs as special cases. We show that these designs share common elements under the framework, such as the same likelihood function, the use of loss functions, and the nature of the optimal decisions as Bayes rules. They differ mostly in the choice of the prior distributions. We present theoretical results on the decision framework and its link to specific and popular designs like mTPI, BOIN, and CRM. These results provide useful insights into the designs and their underlying assumptions, and convey information to help practitioners select an appropriate design.

[8]  arXiv:2111.12267 [pdf, other]
Title: The Practical Scope of the Central Limit Theorem
Comments: 47 pages, 17 figures
Subjects: Other Statistics (stat.OT); Statistics Theory (math.ST); Applications (stat.AP); Methodology (stat.ME)

The Central Limit Theorem (CLT) is at the heart of a great deal of applied problem-solving in statistics and data science, but the theorem is silent on an important implementation issue: how much data do you need for the CLT to give accurate answers to practical questions? Here we examine several approaches to addressing this issue -- along the way reviewing the history of this problem over the last 290 years -- and we illustrate the calculations with case-studies from finite-population sampling and gambling. A variety of surprises emerge.
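
One way to make the "how much data" question concrete is to measure the normal approximation error exactly for a simple case. The sketch below is not taken from the paper: it computes the Kolmogorov distance between a standardized Binomial(n, p) distribution and the standard normal, and compares it with a Berry-Esseen bound. The function names are hypothetical, and the constant 0.4748 is an assumed value for the Berry-Esseen constant (a commonly cited upper bound for i.i.d. summands).

```python
import math

# Assumed value of the Berry-Esseen constant for i.i.d. summands.
BERRY_ESSEEN_C = 0.4748


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binomial_pmf(n, k, p):
    """Binomial pmf computed in log space to avoid overflow for large n."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def kolmogorov_distance(n, p):
    """Sup distance between the standardized Binomial(n, p) CDF and the normal CDF.

    The binomial CDF is a step function, so the supremum is attained at a jump:
    compare the normal CDF with both the left limit and the value at each k.
    """
    mean = n * p
    sd = math.sqrt(n * p * (1.0 - p))
    cdf_left = 0.0
    worst = 0.0
    for k in range(n + 1):
        cdf_right = cdf_left + binomial_pmf(n, k, p)
        phi = normal_cdf((k - mean) / sd)
        worst = max(worst, abs(cdf_left - phi), abs(cdf_right - phi))
        cdf_left = cdf_right
    return worst


def berry_esseen_bound(n, p):
    """Berry-Esseen bound C * rho / (sigma^3 * sqrt(n)) for Bernoulli(p) summands."""
    q = 1.0 - p
    rho = p * q * (p * p + q * q)  # third absolute central moment E|X - p|^3
    sigma = math.sqrt(p * q)
    return BERRY_ESSEEN_C * rho / (sigma ** 3 * math.sqrt(n))


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, kolmogorov_distance(n, 0.5), berry_esseen_bound(n, 0.5))
```

For fair-coin sums the exact distance shrinks at the rate 1/sqrt(n), so a tenfold gain in accuracy costs roughly a hundredfold increase in sample size.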

[9]  arXiv:2111.12272 [pdf, other]
Title: Causal Analysis and Prediction of Human Mobility in the U.S. during the COVID-19 Pandemic
Subjects: Applications (stat.AP); Machine Learning (cs.LG)

Since the spread of COVID-19 in the U.S., which had the highest number of confirmed cases and deaths in the world as of September 2020, most states in the country have enforced travel restrictions resulting in sharp reductions in mobility. However, the overall impact and long-term implications of this crisis for travel and mobility remain uncertain. To this end, this study develops an analytical framework that determines and analyzes the most dominant factors impacting human mobility and travel in the U.S. during this pandemic. In particular, the study uses Granger causality to determine the important predictors influencing daily vehicle miles traveled (VMT) and utilizes linear regularization algorithms, including Ridge and LASSO techniques, to model and predict mobility. State-level time-series data were obtained from various open-access sources for the period from March 1, 2020 through June 13, 2020, and the entire data set was divided into two parts for training and testing purposes. The variables selected by Granger causality were used to train three different reduced order models by ordinary least squares regression, Ridge regression, and LASSO regression algorithms. Finally, the prediction accuracy of the developed models was examined on the test data. The results indicate that factors including the number of new COVID cases, social distancing index, population staying at home, percent of out of county trips, trips to different destinations, socioeconomic status, percent of people working from home, and statewide closure, among others, were the most important factors influencing daily VMT. Also, among all the modeling techniques, Ridge regression provided the best performance with the least error, while LASSO regression also performed better than the ordinary least squares model.

[10]  arXiv:2111.12283 [pdf, other]
Title: Coexchangeable process modelling for uncertainty quantification in joint climate reconstruction
Comments: Submitted to the Journal of the American Statistical Association
Subjects: Applications (stat.AP)

Any experiment with climate models relies on a potentially large set of spatio-temporal boundary conditions. These can represent both the initial state of the system and/or forcings driving the model output throughout the experiment. Whilst these boundary conditions are typically fixed using available reconstructions in climate modelling studies, they are highly uncertain, that uncertainty is unquantified, and the effect on the output of the experiment can be considerable. We develop efficient quantification of these uncertainties that combines relevant data from multiple models and observations. Starting from the coexchangeability model, we develop a coexchangeable process model to capture multiple correlated spatio-temporal fields of variables. We demonstrate that further exchangeability judgements over the parameters within this representation lead to a Bayes linear analogy of a hierarchical model. We use the framework to provide a joint reconstruction of sea-surface temperature and sea-ice concentration boundary conditions at the last glacial maximum (19-23 ka) and use it to force an ensemble of ice-sheet simulations using the FAMOUS-Ice coupled atmosphere and ice-sheet model. We demonstrate that existing boundary conditions typically used in these experiments are implausible given our uncertainties and demonstrate the impact of using more plausible boundary conditions on ice-sheet simulation.

[11]  arXiv:2111.12348 [pdf]
Title: Comparative Evaluation of Statistical Orbit Determination Algorithms for Short-Term Prediction of Geostationary and Geosynchronous Satellite Orbits in NavIC Constellation
Subjects: Applications (stat.AP)

NavIC is a newly established Indian regional navigation constellation with 3 satellites in geostationary Earth orbit (GEO) and 4 satellites in geosynchronous orbit (GSO). Satellite positions are essential in navigation for various positioning applications. In this paper, we propose a Bootstrap Particle Filter (BPF) approach to determine the satellite positions in the NavIC constellation for a short duration of 1 hr. The BPF-based approach was found to be efficient, with meter-level prediction accuracy, compared to other methods such as Least Squares (LS), Extended Kalman Filter (EKF), Unscented Kalman Filter (UKF) and Ensemble Kalman Filter (EnKF). The residual analysis revealed that the BPF approach addressed the problem of non-linearity in the dynamics model as well as the non-Gaussian nature of the state of the NavIC satellites.

[12]  arXiv:2111.12482 [pdf, other]
Title: One More Step Towards Reality: Cooperative Bandits with Imperfect Communication
Journal-ref: Conference on Neural Information Processing Systems, 2021
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

The cooperative bandit problem is becoming increasingly relevant due to its applications in large-scale decision-making. However, most research for this problem focuses exclusively on the setting with perfect communication, whereas in most real-world distributed settings, communication is often over stochastic networks, with arbitrary corruptions and delays. In this paper, we study cooperative bandit learning under three typical real-world communication scenarios, namely, (a) message-passing over stochastic time-varying networks, (b) instantaneous reward-sharing over a network with random delays, and (c) message-passing with adversarially corrupted rewards, including byzantine communication. For each of these environments, we propose decentralized algorithms that achieve competitive performance, along with near-optimal guarantees on the incurred group regret. Furthermore, in the setting with perfect communication, we present an improved delayed-update algorithm that outperforms the existing state-of-the-art on various network topologies. Finally, we present tight network-dependent minimax lower bounds on the group regret. Our proposed algorithms are straightforward to implement and obtain competitive empirical performance.

[13]  arXiv:2111.12526 [pdf]
Title: Mining Meta-indicators of University Ranking: A Machine Learning Approach Based on SHAP
Authors: Shudong Yang (1), Miaomiao Liu (1) ((1) Dalian University of Technology)
Comments: 4 pages, 1 figure
Subjects: Applications (stat.AP); Machine Learning (cs.LG); Machine Learning (stat.ML)

University evaluation and ranking is an extremely complex activity. Major universities are struggling because of the increasingly complex indicator systems of world university rankings. Can we find the meta-indicators of the index system by reducing this complexity? This research discovered three meta-indicators based on interpretable machine learning. The first is time: to be friends with time, believe in the power of time, and accumulate historical deposits. The second is space: to be friends with the city and grow together through co-development. The third is relationships: to be friends with alumni and strive for more alumni donations without ceiling.

[14]  arXiv:2111.12603 [pdf, ps, other]
Title: Strong Invariance Principles for Ergodic Markov Processes
Subjects: Statistics Theory (math.ST); Probability (math.PR); Computation (stat.CO)

Strong invariance principles describe the error term of a Brownian approximation of the partial sums of a stochastic process. While these strong approximation results have many applications, the results for continuous-time settings have been limited. In this paper, we obtain strong invariance principles for a broad class of ergodic Markov processes. The main results rely on ergodicity requirements and an application of Nummelin splitting for continuous-time processes. Strong invariance principles provide a unified framework for analysing commonly used estimators of the asymptotic variance in settings with a dependence structure. We demonstrate how this can be used to analyse the batch means method for simulation output of Piecewise Deterministic Monte Carlo samplers. We also derive a fluctuation result for additive functionals of ergodic diffusions using our strong approximation results.
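
For orientation, the batch means method mentioned in the abstract can be sketched in a few lines. This is the textbook non-overlapping batch means estimator applied to a generic list of simulation output, not the authors' analysis or code; the function name and the choice of batch size are illustrative assumptions.

```python
import statistics


def batch_means_variance(samples, batch_size):
    """Non-overlapping batch means estimate of the asymptotic variance.

    The output is split into consecutive batches of length `batch_size`
    (trailing samples that do not fill a batch are dropped). The estimate is
    batch_size times the sample variance of the batch means.
    """
    n_batches = len(samples) // batch_size
    if n_batches < 2:
        raise ValueError("need at least two full batches")
    means = [
        statistics.fmean(samples[j * batch_size:(j + 1) * batch_size])
        for j in range(n_batches)
    ]
    return batch_size * statistics.variance(means)


if __name__ == "__main__":
    output = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    print(batch_means_variance(output, batch_size=2))
```

In practice the batch size is chosen to grow with the run length so that the batch means become approximately independent; the choice of growth rate is exactly what results of this kind help to justify.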

[15]  arXiv:2111.12604 [pdf, other]
Title: State-space deep Gaussian processes with applications
Authors: Zheng Zhao
Comments: See reproducible codes in this https URL Permanent link this http URL
Journal-ref: Doctoral dissertation, Aalto University, 2021
Subjects: Methodology (stat.ME); Signal Processing (eess.SP); Machine Learning (stat.ML)

This thesis is mainly concerned with state-space approaches for solving deep (temporal) Gaussian process (DGP) regression problems. More specifically, we represent DGPs as hierarchically composed systems of stochastic differential equations (SDEs), and we consequently solve the DGP regression problem by using state-space filtering and smoothing methods. The resulting state-space DGP (SS-DGP) models generate a rich class of priors compatible with modelling a number of irregular signals/functions. Moreover, due to their Markovian structure, SS-DGP regression problems can be solved efficiently by using Bayesian filtering and smoothing methods. The second contribution of this thesis is that we solve continuous-discrete Gaussian filtering and smoothing problems by using the Taylor moment expansion (TME) method. This induces a class of filters and smoothers that can be asymptotically exact in predicting the mean and covariance of SDE solutions. Moreover, the TME method and TME filters and smoothers are compatible with simulating SS-DGPs and solving their regression problems. Lastly, this thesis features a number of applications of state-space (deep) GPs. These applications mainly include (i) estimation of unknown drift functions of SDEs from partially observed trajectories and (ii) estimation of spectro-temporal features of signals.

[16]  arXiv:2111.12612 [pdf, other]
Title: Multiplier bootstrap for Bures-Wasserstein barycenters
Comments: 36 pages, 2 figures
Subjects: Statistics Theory (math.ST); Applications (stat.AP)

The Bures-Wasserstein barycenter is a popular and promising tool in the analysis of complex data such as graphs and images. In many applications the input data are random with an unknown distribution, and uncertainty quantification becomes a crucial issue. This paper offers an approach based on multiplier bootstrap to quantify the error of approximating the true Bures-Wasserstein barycenter $Q_*$ by its empirical counterpart $Q_n$. The main results state the bootstrap validity under general assumptions on the data generating distribution $P$ and specify the approximation rates for the case of sub-exponential $P$. The performance of the method is illustrated on synthetic data generated from the weighted stochastic block model.

[17]  arXiv:2111.12676 [pdf, other]
Title: Super-polynomial accuracy of one dimensional randomized nets using the median-of-means
Subjects: Computation (stat.CO); Numerical Analysis (math.NA); Statistics Theory (math.ST)

Let $f$ be analytic on $[0,1]$ with $|f^{(k)}(1/2)|\leq A\alpha^kk!$ for some constant $A$ and $\alpha<2$. We show that the median estimate of $\mu=\int_0^1f(x)\,\mathrm{d}x$ under random linear scrambling with $n=2^m$ points converges at the rate $O(n^{-c\log(n)})$ for any $c< 3\log(2)/\pi^2\approx 0.21$. We also get a super-polynomial convergence rate for the sample median of $2k-1$ random linearly scrambled estimates, when $k=\Omega(m)$. When $f$ has a $p$'th derivative that satisfies a $\lambda$-Hölder condition then the median-of-means has error $O( n^{-(p+\lambda)+\epsilon})$ for any $\epsilon>0$, if $k\to\infty$ as $m\to\infty$.
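
The "sample median of $2k-1$ randomized estimates" construction can be illustrated with a much simpler randomization than the one studied in the paper. The sketch below swaps in a random shift modulo 1 of an equally spaced point set in place of random linear scrambling, so it shows only the shape of the estimator and does not reproduce the super-polynomial rate. All function names are hypothetical.

```python
import random
import statistics


def shifted_rule_estimate(f, n, rng):
    """One randomized estimate of the integral of f over [0, 1].

    Uses n equally spaced points shifted by a single uniform draw, modulo 1.
    This is a stand-in randomization, not random linear scrambling.
    """
    u = rng.random()
    return sum(f((i / n + u) % 1.0) for i in range(n)) / n


def median_of_estimates(f, n, k, seed=0):
    """Sample median of 2k - 1 independent randomized estimates."""
    rng = random.Random(seed)
    estimates = [shifted_rule_estimate(f, n, rng) for _ in range(2 * k - 1)]
    return statistics.median(estimates)


if __name__ == "__main__":
    # Integrand x^2 has exact integral 1/3.
    print(median_of_estimates(lambda x: x * x, n=64, k=6))
```

Taking an odd number of replicates makes the median one of the actual estimates, and the median discards the occasional badly behaved randomization that would otherwise dominate a mean.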

Cross-lists for Thu, 25 Nov 21

[18]  arXiv:2111.12139 (cross-list from cs.LG) [pdf, other]
Title: ChebLieNet: Invariant Spectral Graph NNs Turned Equivariant by Riemannian Geometry on Lie Groups
Comments: submitted to NeurIPS'21, this https URL
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We introduce ChebLieNet, a group-equivariant method on (anisotropic) manifolds. Surfing on the success of graph- and group-based neural networks, we take advantage of the recent developments in the geometric deep learning field to derive a new approach to exploit any anisotropies in data. Via discrete approximations of Lie groups, we develop a graph neural network made of anisotropic convolutional layers (Chebyshev convolutions), spatial pooling and unpooling layers, and global pooling layers. Group equivariance is achieved via equivariant and invariant operators on graphs with anisotropic left-invariant Riemannian distance-based affinities encoded on the edges. Thanks to its simple form, the Riemannian metric can model any anisotropies, both in the spatial and orientation domains. This control on anisotropies of the Riemannian metrics allows one to balance equivariance (anisotropic metric) against invariance (isotropic metric) of the graph convolution layers. Hence we open the doors to a better understanding of anisotropic properties. Furthermore, we empirically prove the existence of (data-dependent) sweet spots for anisotropic parameters on CIFAR10. This crucial result is evidence of the benefit we could get by exploiting anisotropic properties in data. We also evaluate the scalability of this approach on STL10 (image data) and ClimateNet (spherical data), showing its remarkable adaptability to diverse tasks.

[19]  arXiv:2111.12140 (cross-list from cs.LG) [pdf, ps, other]
Title: Filter Methods for Feature Selection in Supervised Machine Learning Applications -- Review and Benchmark
Comments: Source code of the analysis is available on request
Subjects: Machine Learning (cs.LG); Databases (cs.DB); Machine Learning (stat.ML)

The amount of data for machine learning (ML) applications is constantly growing. Not only the number of observations but especially the number of measured variables (features) increases with ongoing digitization. Selecting the most appropriate features for predictive modeling is an important lever for the success of ML applications in business and research. Many feature selection methods (FSM) that are independent of a certain ML algorithm - so-called filter methods - have been suggested, but little guidance exists for researchers and quantitative modelers to choose appropriate approaches for typical ML problems. This review synthesizes the substantial literature on feature selection benchmarking and evaluates the performance of 58 methods in the widely used R environment. For concrete guidance, we consider four typical dataset scenarios that are challenging for ML models (noisy, redundant, imbalanced data and cases with more features than observations). Drawing on the experience of earlier benchmarks, which have considered far fewer FSMs, we compare the performance of the methods according to four criteria (predictive performance, number of relevant features selected, stability of the feature sets and runtime). We found that methods relying on the random forest approach, the double input symmetrical relevance filter (DISR) and the joint impurity filter (JIM) were well-performing candidate methods for the given dataset scenarios.

[20]  arXiv:2111.12143 (cross-list from cs.LG) [pdf, other]
Title: Critical initialization of wide and deep neural networks through partial Jacobians: general theory and applications to LayerNorm
Comments: 28 pages, 8 figures
Subjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); High Energy Physics - Theory (hep-th); Machine Learning (stat.ML)

Deep neural networks are notorious for defying theoretical treatment. However, when the number of parameters in each layer tends to infinity the network function is a Gaussian process (GP) and a quantitatively predictive description is possible. Gaussian approximation allows one to formulate criteria for selecting hyperparameters, such as variances of weights and biases, as well as the learning rate. These criteria rely on the notion of criticality defined for deep neural networks. In this work we describe a new way to diagnose (both theoretically and empirically) this criticality. To that end, we introduce partial Jacobians of a network, defined as derivatives of preactivations in layer $l$ with respect to preactivations in layer $l_0<l$. These quantities are particularly useful when the network architecture involves many different layers. We discuss various properties of the partial Jacobians such as their scaling with depth and relation to the neural tangent kernel (NTK). We derive the recurrence relations for the partial Jacobians and utilize them to analyze criticality of deep MLP networks with (and without) LayerNorm. We find that the normalization layer changes the optimal values of hyperparameters and critical exponents. We argue that LayerNorm is more stable when applied to preactivations, rather than activations, due to larger correlation depth.

[21]  arXiv:2111.12148 (cross-list from eess.SP) [pdf, other]
Title: Machine Learning Based Forward Solver: An Automatic Framework in gprMax
Comments: 6 pages, 6 figures
Subjects: Signal Processing (eess.SP); Geophysics (physics.geo-ph); Machine Learning (stat.ML)

General full-wave electromagnetic solvers, such as those utilizing the finite-difference time-domain (FDTD) method, are computationally demanding for simulating practical GPR problems. We explore the performance of a near-real-time, forward modeling approach for GPR that is based on a machine learning (ML) architecture. To ease the process, we have developed a framework that is capable of generating these ML-based forward solvers automatically. The framework uses an innovative training method that combines a predictive dimensionality reduction technique and a large data set of modeled GPR responses from our FDTD simulation software, gprMax. The forward solver is parameterized for a specific GPR application, but the framework can be extended in a straightforward manner to different electromagnetic problems.

[22]  arXiv:2111.12151 (cross-list from cs.LG) [pdf, other]
Title: Best Arm Identification with Safety Constraints
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

The best arm identification problem in the multi-armed bandit setting is an excellent model of many real-world decision-making problems, yet it fails to capture the fact that in the real world, safety constraints often must be met while learning. In this work we study the question of best-arm identification in safety-critical settings, where the goal of the agent is to find the best safe option out of many, while exploring in a way that guarantees certain, initially unknown safety constraints are met. We first analyze this problem in the setting where the reward and safety constraint take a linear structure, and show nearly matching upper and lower bounds. We then analyze a much more general version of the problem where we only assume the reward and safety constraint can be modeled by monotonic functions, and propose an algorithm in this setting which is guaranteed to learn safely. We conclude with experimental results demonstrating the effectiveness of our approaches in scenarios such as safely identifying the best drug out of many in order to treat an illness.

[23]  arXiv:2111.12166 (cross-list from cs.IT) [pdf, other]
Title: Towards Empirical Sandwich Bounds on the Rate-Distortion Function
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Machine Learning (stat.ML)

The rate-distortion (R-D) function, a key quantity in information theory, characterizes the fundamental limit of how much a data source can be compressed subject to a fidelity criterion, by any compression algorithm. As researchers push for ever-improving compression performance, establishing the R-D function of a given data source is not only of scientific interest, but also sheds light on the possible room for improving compression algorithms. Previous work on this problem relied on distributional assumptions on the data source (Gibson, 2017) or only applied to discrete data. By contrast, this paper makes the first attempt at an algorithm for sandwiching the R-D function of a general (not necessarily discrete) source requiring only i.i.d. data samples. We estimate R-D sandwich bounds on Gaussian and high-dimensional banana-shaped sources, as well as GAN-generated images. Our R-D upper bound on natural images indicates room for improving the performance of state-of-the-art image compression methods by 1 dB in PSNR at various bitrates.

[24]  arXiv:2111.12187 (cross-list from cs.LG) [pdf, other]
Title: Input Convex Gradient Networks
Comments: Accepted to NeurIPS 2021 Optimal Transport and Machine Learning Workshop this https URL
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

The gradients of convex functions are expressive models of non-trivial vector fields. For example, Brenier's theorem yields that the optimal transport map between any two measures on Euclidean space under the squared distance is realized as a convex gradient, which is a key insight used in recent generative flow models. In this paper, we study how to model convex gradients by integrating a Jacobian-vector product parameterized by a neural network, which we call the Input Convex Gradient Network (ICGN). We theoretically study ICGNs and compare them to taking the gradient of an Input-Convex Neural Network (ICNN), empirically demonstrating that a single layer ICGN can fit a toy example better than a single layer ICNN. Lastly, we explore extensions to deeper networks and connections to constructions from Riemannian geometry.

[25]  arXiv:2111.12193 (cross-list from cs.LG) [pdf, other]
Title: Multiset-Equivariant Set Prediction with Approximate Implicit Differentiation
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Most set prediction models in deep learning use set-equivariant operations, but they actually operate on multisets. We show that set-equivariant functions cannot represent certain functions on multisets, so we introduce the more appropriate notion of multiset-equivariance. We identify that the existing Deep Set Prediction Network (DSPN) can be multiset-equivariant without being hindered by set-equivariance and improve it with approximate implicit differentiation, allowing for better optimization while being faster and saving memory. In a range of toy experiments, we show that the perspective of multiset-equivariance is beneficial and that our changes to DSPN achieve better results in most cases. On CLEVR object property prediction, we substantially improve over the state-of-the-art Slot Attention from 8% to 77% in one of the strictest evaluation metrics because of the benefits made possible by implicit differentiation.

[26]  arXiv:2111.12258 (cross-list from econ.EM) [pdf, other]
Title: On Recoding Ordered Treatments as Binary Indicators
Subjects: Econometrics (econ.EM); Methodology (stat.ME)

Researchers using instrumental variables to investigate the effects of ordered treatments (e.g., years of education, months of healthcare coverage) often recode treatment into a binary indicator for any exposure (e.g., any college, any healthcare coverage). The resulting estimand is difficult to interpret unless the instruments only shift compliers from no treatment to some positive quantity and not from some treatment to more -- i.e., there are extensive margin compliers only (EMCO). When EMCO holds, recoded endogenous variables capture a weighted average of treatment effects across complier groups that can be partially unbundled into each group's treated and untreated means. Invoking EMCO along with the standard Local Average Treatment Effect assumptions is equivalent to assuming choices are determined by a simple two-factor selection model in which agents first decide whether to participate in treatment at all and then decide how much. The instruments must only impact relative utility in the first step. Although EMCO constrains unobserved counterfactual choices, it places testable restrictions on the joint distribution of outcomes, treatments, and instruments.

[27]  arXiv:2111.12292 (cross-list from cs.CV) [pdf, other]
Title: Improved Fine-tuning by Leveraging Pre-training Data: Theory and Practice
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)

As a dominant paradigm, fine-tuning a pre-trained model on the target data is widely used in many deep learning applications, especially for small data sets. However, recent studies have empirically shown that, on some vision tasks, training from scratch achieves final performance no worse than this pre-training strategy once the number of training iterations is increased. In this work, we revisit this phenomenon from the perspective of generalization analysis, which is popular in learning theory. Our result reveals that the final prediction precision may depend only weakly on the pre-trained model, especially when the number of training iterations is large. This observation inspires us to leverage pre-training data for fine-tuning, since this data is also available at fine-tuning time. The generalization result for using pre-training data shows that the final performance on a target task can be improved when appropriate pre-training data is included in fine-tuning. Building on this theoretical finding, we propose a novel strategy for selecting a subset of the pre-training data to help improve generalization on the target task. Extensive experimental results for image classification tasks on 8 benchmark data sets verify the effectiveness of the proposed data-selection-based fine-tuning pipeline.

[28]  arXiv:2111.12295 (cross-list from cs.LG) [pdf, other]
Title: Animal Behavior Classification via Deep Learning on Embedded Systems
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP); Machine Learning (stat.ML)

We develop an end-to-end deep-neural-network-based algorithm for classifying animal behavior using accelerometry data on the embedded system of an artificial intelligence of things (AIoT) device installed in a wearable collar tag. The proposed algorithm jointly performs feature extraction and classification utilizing a set of infinite-impulse-response (IIR) and finite-impulse-response (FIR) filters together with a multilayer perceptron. The utilized IIR and FIR filters can be viewed as specific types of recurrent and convolutional neural network layers, respectively. We evaluate the performance of the proposed algorithm via two real-world datasets collected from grazing cattle. The results show that the proposed algorithm offers good intra- and inter-dataset classification accuracy and outperforms its closest contenders including two state-of-the-art convolutional-neural-network-based time-series classification algorithms, which are significantly more complex. We implement the proposed algorithm on the embedded system of the collar tag's AIoT device to perform in-situ classification of animal behavior. We achieve real-time in-situ behavior inference from accelerometry data without imposing any strain on the available computational, memory, or energy resources of the embedded system.
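
The correspondence between filters and network layers noted above can be illustrated with a minimal sketch. An FIR filter is a causal 1-D convolution over the input, while an IIR filter feeds its previous output back in, like a linear recurrent cell. The function names, the first-order IIR form, and the coefficients are illustrative assumptions, not the filters used in the paper.

```python
def fir_filter(x, b):
    """FIR filter: y[n] = sum_k b[k] * x[n-k], with zero initial conditions.

    Structurally a causal 1-D convolution, i.e. a convolutional layer
    with kernel b and no nonlinearity.
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, bk in enumerate(b):
            if n - k >= 0:
                acc += bk * x[n - k]
        y.append(acc)
    return y


def iir_filter_first_order(x, b0, a1):
    """First-order IIR filter: y[n] = b0 * x[n] + a1 * y[n-1].

    The dependence on the previous output makes this a linear recurrent
    cell whose hidden state is y[n-1].
    """
    y = []
    state = 0.0  # y[-1], assumed zero
    for xn in x:
        state = b0 * xn + a1 * state
        y.append(state)
    return y


# Two-tap moving average (FIR) and a decaying impulse response (IIR).
fir_out = fir_filter([1.0, 2.0, 3.0], [0.5, 0.5])
iir_out = iir_filter_first_order([1.0, 0.0, 0.0], b0=1.0, a1=0.5)
```

In a learned setting, the coefficients `b`, `b0`, and `a1` would be trainable parameters rather than fixed constants.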

[29]  arXiv:2111.12399 (cross-list from cs.LG) [pdf, other]
Title: Dictionary-based Low-Rank Approximations and the Mixed Sparse Coding problem
Authors: Jeremy E. Cohen
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Constrained tensor and matrix factorization models make it possible to extract interpretable patterns from multiway data. Identifiability properties and efficient algorithms for constrained low-rank approximations are therefore important research topics. This work deals with columns of factor matrices of a low-rank approximation being sparse in a known and possibly overcomplete basis, a model coined Dictionary-based Low-Rank Approximation (DLRA). While earlier contributions focused on finding factor columns inside a dictionary of candidate columns, i.e. one-sparse approximations, this work is the first to tackle DLRA with sparsity larger than one. I propose to focus on the sparse-coding subproblem, coined Mixed Sparse-Coding (MSC), that emerges when solving DLRA with an alternating optimization strategy. Several algorithms based on sparse-coding heuristics (greedy methods, convex relaxations) are provided to solve MSC. The performance of these heuristics is evaluated on simulated data. Then, I show how to adapt an efficient MSC solver based on the LASSO to compute Dictionary-based Matrix Factorization and Canonical Polyadic Decomposition in the context of hyperspectral image processing and chemometrics. These experiments suggest that DLRA extends the modeling capabilities of low-rank approximations, helps reduce estimation variance, and enhances the identifiability and interpretability of estimated factors.

[30]  arXiv:2111.12429 (cross-list from cs.LG) [pdf, other]
Title: tsflex: flexible time series processing & feature extraction
Comments: The first two authors contributed equally. Submitted to SoftwareX
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)

Time series processing and feature extraction are crucial and time-intensive steps in conventional machine learning pipelines. Existing packages are limited in their real-world applicability, as they cannot cope with irregularly-sampled and asynchronous data. We therefore present $\texttt{tsflex}$, a domain-independent, flexible, and sequence-first Python toolkit for processing & feature extraction that is capable of handling irregularly-sampled sequences with unaligned measurements. This toolkit is sequence-first as (1) sequence-based arguments are leveraged for strided-window feature extraction, and (2) the sequence index is maintained through all supported operations. $\texttt{tsflex}$ is flexible as it (1) natively supports multivariate time series, (2) supports multiple window-stride configurations, and (3) integrates with processing and feature functions from other packages, while (4) making no assumptions about the regularity and synchronization of the data sampling rate. Other functionalities of this package are multiprocessing, in-depth execution time logging, support for categorical & time-based data, chunking sequences, and embedded serialization. $\texttt{tsflex}$ is developed to enable fast and memory-efficient time series processing & feature extraction. Results indicate that $\texttt{tsflex}$ is more flexible than similar packages while outperforming these toolkits in both runtime and memory usage.
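
To make the idea of strided-window feature extraction on irregularly-sampled data concrete, here is a minimal plain-Python sketch in which windows are defined in time units rather than sample counts, so no fixed sampling rate is assumed. It deliberately does not use the tsflex API; the function name, its arguments, and the choice of the mean as the feature are illustrative assumptions.

```python
def strided_window_mean(times, values, window, stride):
    """Mean of `values` over time-based windows [start, start + window).

    `times` must be sorted ascending; samples may be irregularly spaced.
    Windows advance by `stride` time units and empty windows are skipped.
    Returns a list of (window_start, mean) pairs, which keeps the
    sequence index attached to every extracted feature.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if not times:
        return []
    features = []
    start = times[0]
    last = times[-1]
    while start <= last:
        in_window = [v for t, v in zip(times, values)
                     if start <= t < start + window]
        if in_window:
            features.append((start, sum(in_window) / len(in_window)))
        start += stride
    return features


# Irregular timestamps (seconds) with a 1-second window and 1-second stride.
result = strided_window_mean([0.0, 0.4, 1.1, 2.5], [1, 3, 5, 7],
                             window=1.0, stride=1.0)
```

This scan is quadratic in the worst case; an efficient implementation would locate window boundaries with a binary search over the sorted timestamps.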

[31]  arXiv:2111.12460 (cross-list from cs.CV) [pdf, other]
Title: ViCE: Self-Supervised Visual Concept Embeddings as Contextual and Pixel Appearance Invariant Semantic Representations
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)

This work presents a self-supervised method to learn dense semantically rich visual concept embeddings for images inspired by methods for learning word embeddings in NLP. Our method improves on prior work by generating more expressive embeddings and by being applicable to high-resolution images. Viewing the generation of natural images as a stochastic process where a set of latent visual concepts give rise to observable pixel appearances, our method is formulated to learn the inverse mapping from pixels to concepts. Our method greatly improves the effectiveness of self-supervised learning for dense embedding maps by introducing superpixelization as a natural hierarchical step up from pixels to a small set of visually coherent regions. Additional contributions are regional contextual masking with nonuniform shapes matching visually coherent patches and complexity-based view sampling inspired by masked language models. The enhanced expressiveness of our dense embeddings is demonstrated by significantly improving the state-of-the-art representation quality benchmarks on COCO (+12.94 mIoU, +87.6%) and Cityscapes (+16.52 mIoU, +134.2%). Results show favorable scaling and domain generalization properties not demonstrated by prior work.

[32]  arXiv:2111.12486 (cross-list from physics.ao-ph) [pdf, other]
Title: Enhanced monitoring of atmospheric methane from space with hierarchical Bayesian inference
Comments: 20 pages, 6 figures. Under consideration at Nature Communications
Subjects: Atmospheric and Oceanic Physics (physics.ao-ph); Geophysics (physics.geo-ph); Applications (stat.AP)

Methane is a strong greenhouse gas, with a higher radiative forcing per unit mass and shorter atmospheric lifetime than carbon dioxide. The remote sensing of methane in regions of industrial activity is a key step toward the accurate monitoring of emissions that drive climate change. Whilst the TROPOspheric Monitoring Instrument (TROPOMI) on board the Sentinel-5P satellite is capable of providing daily global measurement of methane columns, data are often compromised by cloud cover. Here, we develop a statistical model which uses nitrogen dioxide concentration data from TROPOMI to accurately predict values of methane columns, expanding the average daily spatial coverage of observations of the Permian Basin from 16% to 88% in the year 2019. The addition of predicted methane abundances at locations where direct observations are not available will support inversion methods for estimating methane emission rates at shorter timescales than is currently possible.

[33]  arXiv:2111.12545 (cross-list from cs.LG) [pdf, other]
Title: Learning to Refit for Convex Learning Problems
Subjects: Machine Learning (cs.LG); Computation (stat.CO)

Machine learning (ML) models need to be frequently retrained on changing datasets in a wide variety of application scenarios, including data valuation and uncertainty quantification. To efficiently retrain the model, linear approximation methods such as influence function have been proposed to estimate the impact of data changes on model parameters. However, these methods become inaccurate for large dataset changes. In this work, we focus on convex learning problems and propose a general framework to learn to estimate optimized model parameters for different training sets using neural networks. We propose to enforce the predicted model parameters to obey optimality conditions and maintain utility through regularization techniques, which significantly improve generalization. Moreover, we rigorously characterize the expressive power of neural networks to approximate the optimizer of convex problems. Empirical results demonstrate the advantage of the proposed method in accurate and efficient model parameter estimation compared to the state-of-the-art.

[34]  arXiv:2111.12550 (cross-list from cs.HC) [pdf, other]
Title: A Worker-Task Specialization Model for Crowdsourcing: Efficient Inference and Fundamental Limits
Subjects: Human-Computer Interaction (cs.HC); Information Theory (cs.IT); Machine Learning (cs.LG); Machine Learning (stat.ML)

Crowdsourcing systems have emerged as an effective platform for labeling data at relatively low cost by using non-expert workers. However, inferring correct labels from multiple noisy answers has been a challenging problem, since the quality of answers varies widely across tasks and workers. Many previous works have assumed a simple model where the order of workers in terms of their reliabilities is fixed across tasks, and focused on estimating the worker reliabilities to aggregate answers with different weights. We propose a highly general $d$-type worker-task specialization model in which the reliability of each worker can change depending on the type of a given task, where the number $d$ of types can scale with the number of tasks. In this model, we characterize the optimal sample complexity to correctly infer labels with any given recovery accuracy, and propose an inference algorithm achieving the order-wise optimal bound. We conduct experiments on both synthetic and real-world datasets, and show that our algorithm outperforms existing algorithms developed under strict model assumptions.

[35]  arXiv:2111.12577 (cross-list from cs.CV) [pdf, other]
Title: A Method for Evaluating the Capacity of Generative Adversarial Networks to Reproduce High-order Spatial Context
Comments: Submitted to IEEE-TPAMI. Early version with partial results has been accepted for poster presentation at SPIE-MI 2022
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Machine Learning (stat.ML)

Generative adversarial networks (GANs) are a kind of deep generative model with the potential to revolutionize biomedical imaging. This is because GANs have a learned capacity to draw whole-image variates from a lower-dimensional representation of an unknown, high-dimensional distribution that fully describes the input training images. The overarching problem with GANs in clinical applications is that there are no adequate or automatic means of assessing the diagnostic quality of images generated by GANs. In this work, we demonstrate several tests of the statistical accuracy of images output by two popular GAN architectures. We designed several stochastic object models (SOMs) of distinct features that can be recovered after generation by a trained GAN. Several of these features are high-order, algorithmic pixel-arrangement rules which are not readily expressed in covariance matrices. We designed and validated statistical classifiers to detect the known arrangement rules. We then tested the rates at which the different GANs correctly reproduced the rules under a variety of training scenarios and degrees of feature-class similarity. We found that ensembles of generated images can appear accurate visually, and correspond to low Frechet Inception Distance (FID) scores, while not exhibiting the known spatial arrangements. Furthermore, GANs trained on a spectrum of distinct spatial orders did not respect the given prevalence of those orders in the training data. The main conclusion is that while low-order ensemble statistics are largely correct, there are numerous quantifiable errors per image that plausibly can affect subsequent use of the GAN-generated images.

[36]  arXiv:2111.12594 (cross-list from cs.CV) [pdf, other]
Title: Conditional Object-Centric Learning from Video
Comments: Project page at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)

Object-centric representations are a promising path toward more systematic generalization by providing flexible abstractions upon which compositional world models can be built. Recent work on simple 2D and 3D datasets has shown that models with object-centric inductive biases can learn to segment and represent meaningful objects from the statistical structure of the data alone without the need for any supervision. However, such fully-unsupervised methods still fail to scale to diverse realistic data, despite the use of increasingly complex inductive biases such as priors for the size of objects or the 3D geometry of the scene. In this paper, we instead take a weakly-supervised approach and focus on how 1) using the temporal dynamics of video data in the form of optical flow and 2) conditioning the model on simple object location cues can be used to enable segmenting and tracking objects in significantly more realistic synthetic data. We introduce a sequential extension to Slot Attention which we train to predict optical flow for realistic looking synthetic scenes and show that conditioning the initial state of this model on a small set of hints, such as center of mass of objects in the first frame, is sufficient to significantly improve instance segmentation. These benefits generalize beyond the training distribution to novel objects, novel backgrounds, and to longer video sequences. We also find that such initial-state-conditioning can be used during inference as a flexible interface to query the model for specific objects or parts of objects, which could pave the way for a range of weakly-supervised approaches and allow more effective interaction with trained models.

[37]  arXiv:2111.12664 (cross-list from cs.CV) [pdf, other]
Title: MIO : Mutual Information Optimization using Self-Supervised Binary Contrastive Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Self-supervised contrastive learning is one of the domains which has progressed rapidly over the last few years. Most of the state-of-the-art self-supervised algorithms use a large number of negative samples, momentum updates, specific architectural modifications, or extensive training to learn good representations. Such arrangements make the overall training process complex and challenging to analyze. In this paper, we propose a mutual information optimization based loss function for contrastive learning, where we cast contrastive learning as a binary classification problem that predicts whether a pair is positive or not. This formulation not only makes the problem mathematically tractable but also helps us to outperform existing algorithms. Unlike the existing methods that only maximize the mutual information in a positive pair, the proposed loss function optimizes the mutual information in both positive and negative pairs. We also present a mathematical expression for the parameter gradients flowing into the projector and the displacement of the feature vectors in the feature space. This gives us mathematical insight into the working principle of contrastive learning. An additive $L_2$ regularizer is also used to prevent the feature vectors from diverging and to improve performance. The proposed method outperforms the state-of-the-art algorithms on benchmark datasets like STL-10, CIFAR-10, CIFAR-100. After only 250 epochs of pre-training, the proposed model achieves the best accuracy of 85.44%, 60.75%, 56.81% on CIFAR-10, STL-10, CIFAR-100 datasets, respectively.

[38]  arXiv:2111.12683 (cross-list from physics.ao-ph) [pdf, other]
Title: Data-Based Models for Hurricane Evolution Prediction: A Deep Learning Approach
Subjects: Atmospheric and Oceanic Physics (physics.ao-ph); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Methodology (stat.ME)

Fast and accurate prediction of hurricane evolution from genesis onwards is needed to reduce loss of life and enhance community resilience. In this work, a novel model development methodology for predicting storm trajectory is proposed based on two classes of Recurrent Neural Networks (RNNs). The RNN models are trained on input features available in or derived from the HURDAT2 North Atlantic hurricane database maintained by the National Hurricane Center (NHC). The models use probabilities of storms passing through any location, computed from historical data. A detailed analysis of model forecasting error shows that Many-to-One prediction models are less accurate than Many-to-Many models owing to compounded error accumulation, with the exception of 6-hr predictions, for which the two types of model perform comparably. Application to 75 or more test storms in the North Atlantic basin showed that, for short-term forecasting up to 12 hours, the Many-to-Many RNN storm trajectory prediction models presented herein are significantly faster than ensemble models used by the NHC, while leading to errors of comparable magnitude.

Replacements for Thu, 25 Nov 21

[39]  arXiv:1904.12218 (replaced) [pdf, other]
Title: Graph Kernels: A Survey
Journal-ref: Journal of Artificial Intelligence Research (2021), Volume 72, Pages 943-1027
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[40]  arXiv:2006.10679 (replaced) [pdf, other]
Title: REGroup: Rank-aggregating Ensemble of Generative Classifiers for Robust Predictions
Comments: WACV 2022. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[41]  arXiv:2007.06226 (replaced) [pdf, other]
Title: AMITE: A Novel Polynomial Expansion for Analyzing Neural Network Nonlinearities
Comments: 13 pages, 2 tables, 9 figures, LaTeX; minor grammar updates, equation numbering, and exposition clarification updates
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Machine Learning (stat.ML)
[42]  arXiv:2007.09738 (replaced) [pdf, other]
Title: Hypothesis tests for structured rank correlation matrices
Subjects: Methodology (stat.ME)
[43]  arXiv:2009.09525 (replaced) [pdf, other]
Title: Deep Autoencoders: From Understanding to Generalization Guarantees
Journal-ref: R. Cosentino, R. Balestriero, R. Baraniuk, B. Aazhang, 2nd Annual Conference on Mathematical and Scientific Machine Learning (2021)
Subjects: Machine Learning (cs.LG); Group Theory (math.GR); Machine Learning (stat.ML)
[44]  arXiv:2010.01184 (replaced) [pdf, other]
Title: Effective Sample Size, Dimensionality, and Generalization in Covariate Shift Adaptation
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Methodology (stat.ME)
[45]  arXiv:2010.15764 (replaced) [pdf, other]
Title: Domain adaptation under structural causal models
Comments: 80 pages, 22 figures, accepted in JMLR
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
[46]  arXiv:2011.09468 (replaced) [pdf, other]
Title: Gradient Starvation: A Learning Proclivity in Neural Networks
Comments: Proceeding of NeurIPS 2021
Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS); Machine Learning (stat.ML)
[47]  arXiv:2011.12873 (replaced) [pdf, other]
Title: Hybrid Confidence Intervals for Informative Uniform Asymptotic Inference After Model Selection
Authors: Adam McCloskey
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
[48]  arXiv:2101.01299 (replaced) [pdf, other]
Title: Bayesian Uncertainty Quantification for Low-rank Matrix Completion
Subjects: Methodology (stat.ME)
[49]  arXiv:2102.03906 (replaced) [pdf, ps, other]
Title: Causal versions of Maximum Entropy and Principle of Insufficient Reason
Authors: Dominik Janzing
Comments: 16 pages
Journal-ref: Journal of Causal Inference (2021)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[50]  arXiv:2102.09159 (replaced) [pdf, other]
Title: Robust and Differentially Private Mean Estimation
Comments: 58 pages, 2 figures, both exponential time and efficient algorithms no longer require a known bound on the true mean
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Information Theory (cs.IT); Machine Learning (stat.ML)
[51]  arXiv:2103.02457 (replaced) [pdf, other]
Title: Continuous scaled phase-type distributions
Subjects: Probability (math.PR); Statistics Theory (math.ST)
[52]  arXiv:2103.07088 (replaced) [pdf, other]
Title: Orthogonalized Kernel Debiased Machine Learning for Multimodal Data Analysis
Authors: Xiaowu Dai, Lexin Li
Subjects: Methodology (stat.ME); Machine Learning (stat.ML)
[53]  arXiv:2104.09401 (replaced) [pdf, ps, other]
Title: Efficient multivariate inference in general factorial diagnostic studies
Subjects: Statistics Theory (math.ST)
[54]  arXiv:2105.00416 (replaced) [pdf, other]
Title: Selective Inference in Propensity Score Analysis
Comments: 32 pages, 2 figures, 5 tables
Subjects: Methodology (stat.ME)
[55]  arXiv:2105.09429 (replaced) [pdf, other]
Title: Point process simulation of generalised inverse Gaussian processes and estimation of the Jaeger integral
Subjects: Methodology (stat.ME); Signal Processing (eess.SP); Probability (math.PR)
[56]  arXiv:2106.00058 (replaced) [pdf, other]
Title: PUDLE: Implicit Acceleration of Dictionary Learning by Backpropagation
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
[57]  arXiv:2106.03969 (replaced) [pdf, other]
Title: Chow-Liu++: Optimal Prediction-Centric Learning of Tree Ising Models
Comments: 49 pages, 3 figures, to appear in FOCS'21
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Statistics Theory (math.ST)
[58]  arXiv:2108.10573 (replaced) [pdf, other]
Title: The staircase property: How hierarchical structure can guide deep learning
Comments: 60 pages, accepted to NeurIPS '21
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
[59]  arXiv:2109.02624 (replaced) [pdf, other]
Title: Functional additive models on manifolds of planar shapes and forms
Subjects: Methodology (stat.ME); Applications (stat.AP); Machine Learning (stat.ML)
[60]  arXiv:2109.08229 (replaced) [pdf, ps, other]
Title: Policy Choice and Best Arm Identification: Asymptotic Analysis of Exploration Sampling
Comments: Submitted to Econometrica
Subjects: Econometrics (econ.EM); Machine Learning (cs.LG); Methodology (stat.ME)
[61]  arXiv:2109.11939 (replaced) [pdf, other]
Title: Discovering PDEs from Multiple Experiments
Comments: Fourth Workshop on Machine Learning and the Physical Sciences (NeurIPS 2021)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Computational Physics (physics.comp-ph)
[62]  arXiv:2109.14501 (replaced) [pdf, other]
Title: Towards a theory of out-of-distribution learning
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[63]  arXiv:2110.05428 (replaced) [pdf, other]
Title: Learning Temporally Causal Latent Processes from General Temporal Data
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[64]  arXiv:2110.13081 (replaced) [pdf, ps, other]
Title: A Note on Consistency of the Bayes Estimator of the Density
Authors: A.G. Nogales
Comments: arXiv admin note: text overlap with arXiv:2008.00683
Subjects: Statistics Theory (math.ST)
[65]  arXiv:2111.04805 (replaced) [pdf, other]
Title: Solution to the Non-Monotonicity and Crossing Problems in Quantile Regression
Comments: 8 pages, 14 figures, IEEE conference format
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[66]  arXiv:2111.05070 (replaced) [pdf, other]
Title: Almost Optimal Universal Lower Bound for Learning Causal DAGs with Atomic Interventions
Comments: Added a new upper bound
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Discrete Mathematics (cs.DM); Methodology (stat.ME); Machine Learning (stat.ML)
[67]  arXiv:2111.11655 (replaced) [pdf, other]
Title: Multi-task manifold learning for small sample size datasets
Comments: 22 pages, 15 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)