Statistics
New submissions
New submissions for Wed, 18 May 22
 [1] arXiv:2205.07880 [pdf, ps, other]

Title: A Note on the Chernoff Bound for Random Variables in the Unit Interval
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
The Chernoff bound is a well-known tool for obtaining a high-probability bound on the expectation of a Bernoulli random variable in terms of its sample average. This bound is commonly used in statistical learning theory to upper bound the generalisation risk of a hypothesis in terms of its empirical risk on held-out data, for the case of a binary-valued loss function. However, the extension of this bound to the case of random variables taking values in the unit interval is less well known in the community. In this note we provide a proof of this extension for convenience and future reference.
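A minimal sketch of how such a bound is typically evaluated. It assumes the extension takes the standard "kl-inverse" form: with probability at least 1 - delta, the mean is at most the largest p such that kl(sample mean || p) <= ln(1/delta)/n. The function names are illustrative. A Hoeffding bound is included for comparison.

```python
import math


def kl_bernoulli(q: float, p: float) -> float:
    """Binary KL divergence kl(q || p) between Bernoulli(q) and Bernoulli(p)."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep the logarithms finite
    total = 0.0
    if q > 0.0:
        total += q * math.log(q / p)
    if q < 1.0:
        total += (1.0 - q) * math.log((1.0 - q) / (1.0 - p))
    return total


def chernoff_upper_bound(sample_mean: float, n: int, delta: float, tol: float = 1e-10) -> float:
    """Largest p >= sample_mean with kl(sample_mean || p) <= ln(1/delta) / n, found by bisection."""
    budget = math.log(1.0 / delta) / n
    lo, hi = sample_mean, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kl_bernoulli(sample_mean, mid) <= budget:
            lo = mid  # mid is still inside the confidence set
        else:
            hi = mid  # mid is excluded, so the bound lies below it
    return hi  # conservative end of the final bracket


def hoeffding_upper_bound(sample_mean: float, n: int, delta: float) -> float:
    """Hoeffding bound for comparison: sample_mean + sqrt(ln(1/delta) / (2n))."""
    return min(1.0, sample_mean + math.sqrt(math.log(1.0 / delta) / (2.0 * n)))
```

Pinsker's inequality gives kl(q || p) >= 2(p - q)^2. The kl-based bound is therefore never looser than Hoeffding's, and it is noticeably tighter when the empirical risk is close to 0, e.g. `chernoff_upper_bound(0.1, 1000, 0.05)`.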
 [2] arXiv:2205.07918 [pdf, other]

Title: Fat-Tailed Variational Inference with Anisotropic Tail Adaptive Flows
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
While fat-tailed densities commonly arise as posterior and marginal distributions in robust models and scale mixtures, they present challenges when Gaussian-based variational inference fails to capture tail decay accurately. We first improve previous theory on tails of Lipschitz flows by quantifying how the tails affect the rate of tail decay and by expanding the theory to non-Lipschitz polynomial flows. Then, we develop an alternative theory for multivariate tail parameters which is sensitive to tail-anisotropy. In doing so, we unveil a fundamental problem which plagues many existing flow-based methods: they can only model tail-isotropic distributions (i.e., distributions having the same tail parameter in every direction). To mitigate this and enable modeling of tail-anisotropic targets, we propose anisotropic tail-adaptive flows (ATAF). Experimental results on both synthetic and real-world targets confirm that ATAF is competitive with prior work while also exhibiting appropriate tail-anisotropy.
 [3] arXiv:2205.07937 [pdf, ps, other]

Title: Mean-Field Nonparametric Estimation of Interacting Particle Systems
Subjects: Statistics Theory (math.ST)
This paper concerns the nonparametric estimation problem of the distribution-state dependent drift vector field in an interacting $N$-particle system. Observing single-trajectory data for each particle, we derive the mean-field rate of convergence for the maximum likelihood estimator (MLE), which depends on both the Gaussian complexity and the Rademacher complexity of the function class. In particular, when the function class contains $\alpha$-smooth Hölder functions, our rate of convergence is minimax optimal on the order of $N^{-\frac{\alpha}{d+2\alpha}}$. Combining this with a Fourier analytical deconvolution argument, we derive the consistency of the MLE for the external force and interaction kernel in the McKean-Vlasov equation.
 [4] arXiv:2205.07946 [pdf, other]

Title: binspp: An R Package for Bayesian Inference for Neyman-Scott Point Processes with Complex Inhomogeneity Structure
Subjects: Methodology (stat.ME)
The Neyman-Scott point process is a widely used point process model which is easily interpretable and easily extendable to include various types of inhomogeneity. Inference for such complex models is then complicated, and fast methods, such as the minimum contrast method or the composite likelihood approach, do not provide accurate estimates or fail completely. Therefore, we introduce a Bayesian MCMC approach for the inference of Neyman-Scott point process models with inhomogeneity in any or all of the following model components: the process of cluster centers, the mean number of points in a cluster, and the spread of the clusters. We also extend the Neyman-Scott point process to the case of overdispersed or underdispersed cluster sizes and provide a Bayesian MCMC algorithm for its inference. The R package binspp provides these estimation methods in an easy-to-handle implementation, with detailed graphical output including traceplots for all model parameters and further diagnostic plots. All inhomogeneities are modelled by spatial covariates, and Bayesian inference for the corresponding regression parameters is provided.
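For intuition about the model being fitted, here is a minimal simulation sketch of a homogeneous Thomas process. This is a Neyman-Scott process with Poisson cluster sizes and isotropic Gaussian displacement, observed on the unit square. It is not the binspp API (binspp is an R package). The function name and parameter values are illustrative. It also ignores edge effects, because parents are generated only inside the window. Inhomogeneity, as in binspp, would replace the constants `kappa`, `mu`, and `sigma` with covariate-driven functions.

```python
import numpy as np


def simulate_thomas(kappa: float, mu: float, sigma: float, rng: np.random.Generator) -> np.ndarray:
    """Simulate a homogeneous Thomas process on the unit square [0, 1]^2.

    kappa: intensity of cluster centres (parents); mu: mean number of points per cluster;
    sigma: standard deviation of the Gaussian displacement of each point from its centre.
    """
    n_parents = rng.poisson(kappa)  # unit-area window, so the mean count is kappa
    parents = rng.uniform(0.0, 1.0, size=(n_parents, 2))
    counts = rng.poisson(mu, size=n_parents)  # Poisson cluster sizes
    centres = np.repeat(parents, counts, axis=0)  # one row per offspring point
    points = centres + rng.normal(0.0, sigma, size=centres.shape)
    inside = np.all((points >= 0.0) & (points <= 1.0), axis=1)
    return points[inside]  # discard offspring that fall outside the window
```

Example: `pts = simulate_thomas(20, 10, 0.02, np.random.default_rng(1))` returns an `(m, 2)` array of clustered points.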
 [5] arXiv:2205.07999 [pdf, other]

Title: An Exponentially Increasing Stepsize for Parameter Estimation in Statistical Models
Comments: 26 pages. The authors are listed in alphabetical order
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC); Statistics Theory (math.ST)
Using gradient descent (GD) with a fixed or decaying step size is standard practice in unconstrained optimization problems. However, when the loss function is only locally convex, such a step-size schedule artificially slows GD down, as it cannot explore the flat curvature of the loss function. To overcome this issue, we propose to exponentially increase the step size of the GD algorithm. Under homogeneous assumptions on the loss function, we demonstrate that the iterates of the proposed \emph{exponential step size gradient descent} (EGD) algorithm converge linearly to the optimal solution. Leveraging that optimization insight, we then consider using the EGD algorithm for parameter estimation under non-regular statistical models whose loss function becomes locally convex as the sample size goes to infinity. We demonstrate that the EGD iterates reach the final statistical radius within the true parameter after a logarithmic number of iterations, in stark contrast to the \emph{polynomial} number of iterations required by the GD algorithm. Therefore, the total computational complexity of the EGD algorithm is \emph{optimal} and exponentially cheaper than that of GD for parameter estimation in non-regular statistical models. To the best of our knowledge, this resolves a long-standing gap between the statistical and algorithmic computational complexities of parameter estimation in non-regular statistical models. Finally, we provide targeted applications of the general theory to several classes of statistical models, including generalized linear models with polynomial link functions and location Gaussian mixture models.
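A toy sketch contrasting fixed-step GD with an exponentially increasing step size. It uses the homogeneous loss f(theta) = theta^4, whose curvature vanishes at the optimum. The initial step 0.1 and growth factor 2 are illustrative assumptions, not the paper's tuning. The function name is hypothetical.

```python
def iterations_to_tolerance(theta0: float, step0: float, growth: float, tol: float,
                            max_iter: int = 100_000) -> int:
    """GD on f(theta) = theta**4 with step size step0 * growth**t.

    growth = 1.0 is ordinary fixed-step GD; growth > 1.0 is the exponentially
    increasing schedule. Returns the number of iterations until |theta| < tol.
    """
    theta, step = theta0, step0
    for t in range(max_iter):
        if abs(theta) < tol:
            return t
        grad = 4.0 * theta ** 3  # f'(theta) for f(theta) = theta**4
        theta -= step * grad
        step *= growth
    return max_iter


gd_iters = iterations_to_tolerance(1.0, 0.1, growth=1.0, tol=1e-2)
egd_iters = iterations_to_tolerance(1.0, 0.1, growth=2.0, tol=1e-2)
```

In this toy, fixed-step GD shrinks theta only like 1/sqrt(t), so it needs on the order of ten thousand iterations. With the doubling step size, the quantity 4 * step * theta^2 settles near a constant, so theta contracts by a roughly fixed factor per step (linear convergence) and reaches the tolerance in a few dozen iterations at most.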
 [6] arXiv:2205.08000 [pdf, ps, other]

Title: Causal influence, causal effects, and path analysis in the presence of intermediate confounding
Authors: Iván Díaz
Subjects: Methodology (stat.ME)
Recent approaches to causal inference have focused on the identification and estimation of \textit{causal effects}, defined as (properties of) the distribution of counterfactual outcomes under hypothetical actions that alter the nodes of a graphical model. In this article we explore an alternative approach using the concept of \textit{causal influence}, defined through operations that alter the information propagated through the edges of a directed acyclic graph. Causal influence may be more useful than causal effects in settings in which interventions on the causal agents are infeasible or of no substantive interest, for example when considering gender, race, or genetics as a causal agent. Furthermore, the proposed "information transfer" interventions allow us to solve a long-standing problem in causal mediation analysis, namely the nonparametric identification of path-specific effects in the presence of treatment-induced mediator-outcome confounding. We propose efficient nonparametric estimators for a covariance version of the proposed causal influence measures, using data-adaptive regression coupled with semiparametric efficiency theory to address model misspecification bias while retaining $\sqrt{n}$-consistency and asymptotic normality. We illustrate the use of our methods in two examples using publicly available data.
 [7] arXiv:2205.08010 [pdf, other]

Title: The e-value and the Full Bayesian Significance Test: Logical Properties and Philosophical Consequences
Authors: Julio Michael Stern, Carlos Alberto de Braganca Pereira, Marcelo de Souza Lauretto, Luis Gustavo Esteves, Rafael Izbicki, Rafael Bassi Stern, Marcio Alves Diniz
Subjects: Statistics Theory (math.ST)
This article gives a conceptual review of the e-value, ev(H|X), the epistemic value of hypothesis H given observations X. This statistical significance measure was developed to allow logically coherent and consistent tests of hypotheses, including sharp or precise hypotheses, via the Full Bayesian Significance Test (FBST). Arguments of analysis allow a full characterization of this statistical test by its logical or compositional properties, showing a mutual complementarity between results of mathematical statistics and the logical desiderata lying at the foundations of this theory.
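A worked sketch of the e-value for the simplest case: a sharp hypothesis H: theta = theta0 under a one-dimensional normal posterior. It assumes a flat reference density. The e-value is then 1 minus the posterior mass of the "tangential set" of parameter values whose posterior density exceeds the density at theta0. The function names are illustrative, not from the article.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fbst_evalue_normal(post_mean: float, post_sd: float, theta0: float) -> float:
    """e-value ev(H|X) for the sharp hypothesis H: theta = theta0.

    Assumes a N(post_mean, post_sd**2) posterior and a flat reference density, so the
    tangential set T = {theta : p(theta|x) > p(theta0|x)} is the interval
    |theta - post_mean| < |theta0 - post_mean|, and ev(H|X) = 1 - P(theta in T | x).
    """
    z = abs(theta0 - post_mean) / post_sd
    mass_tangential = normal_cdf(z) - normal_cdf(-z)
    return 1.0 - mass_tangential
```

In this special case, a hypothesis at the posterior mode gets ev = 1. A hypothesis 1.96 posterior standard deviations away gets ev of about 0.05.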
 [8] arXiv:2205.08030 [pdf, other]

Title: Interpretable sensitivity analysis for the Baron-Kenny approach to mediation with unmeasured confounding
Subjects: Methodology (stat.ME)
Mediation analysis assesses the extent to which the treatment affects the outcome indirectly through a mediator and the extent to which it operates directly through other pathways. As the most popular method in empirical mediation analysis, the Baron-Kenny approach estimates the indirect and direct effects of the treatment on the outcome based on linear structural equation models. However, when the treatment and the mediator are not randomized, the estimates may be biased due to unmeasured confounding among the treatment, mediator, and outcome. Building on Cinelli and Hazlett (2020), we propose a sharp and interpretable sensitivity analysis method for the Baron-Kenny approach to mediation in the presence of unmeasured confounding. We first generalize their sensitivity analysis method for linear regression to allow for heteroskedasticity and model misspecification. We then apply the general result to develop a sensitivity analysis method for the Baron-Kenny approach. To facilitate interpretation, we express the sensitivity parameters in terms of the partial $R^2$'s that correspond to the natural factorization of the joint distribution of the directed acyclic graph for mediation analysis. They measure the proportions of variability explained by unmeasured confounding given the observed variables. Moreover, we extend the method to deal with multiple mediators, based on a novel matrix version of the partial $R^2$ and a general form of the omitted-variable bias formula. Importantly, we prove that all our sensitivity bounds are attainable and thus sharp.
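For reference, this is a sketch of the linear-regression omitted-variable bias bound of Cinelli and Hazlett (2020), which the paper builds on. It is written in terms of two partial R^2 sensitivity parameters. It is not the paper's generalized Baron-Kenny version, which also handles heteroskedasticity, misspecification, and multiple mediators. The function names are illustrative.

```python
import math


def ovb_bias_bound(se: float, dof: int, r2_yz_dx: float, r2_dz_x: float) -> float:
    """Maximum |bias| of an OLS treatment coefficient due to an omitted confounder Z.

    se, dof: standard error and residual degrees of freedom of the observed coefficient.
    r2_yz_dx: partial R^2 of Z with the outcome, given treatment D and covariates X.
    r2_dz_x: partial R^2 of Z with the treatment D, given covariates X (must be < 1).
    """
    return se * math.sqrt(dof * r2_yz_dx * r2_dz_x / (1.0 - r2_dz_x))


def adjusted_range(estimate: float, se: float, dof: int, r2_yz_dx: float, r2_dz_x: float):
    """Range of point estimates compatible with the assumed sensitivity parameters."""
    bias = ovb_bias_bound(se, dof, r2_yz_dx, r2_dz_x)
    return estimate - bias, estimate + bias
```

For example, consider an estimate of 0.5 with standard error 0.1 on 100 degrees of freedom. A confounder explaining 10% of the residual outcome variance and 50% of the residual treatment variance could shift it by up to 0.1 * sqrt(10), about 0.32.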
 [9] arXiv:2205.08036 [pdf, ps, other]

Title: On Semiparametric Efficiency of an Emerging Class of Regression Models for Between-subject Attributes
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
Semiparametric regression models have attracted increasing attention owing to their robustness compared to their parametric counterparts. This paper discusses the efficiency bound for functional response models (FRM), an emerging class of semiparametric regression that serves as a timely solution for research questions involving pairwise observations. This new paradigm is especially appealing for reducing astronomical data dimensions, such as those arising from wearable devices and high-throughput technology, e.g., microbiome beta-diversity, viral genetic linkage, and single-cell RNA sequencing. Despite the growing applications, the efficiency of their estimators has not been investigated carefully, due to the extreme difficulty of addressing the inherent correlations among pairs. Leveraging the Hilbert-space-based semiparametric efficiency theory for classical within-subject attributes, this manuscript extends such asymptotic efficiency to the broader regression involving between-subject attributes and pinpoints the most efficient estimator, which leads to sensitive signal detection in practice. With pairwise outcomes burgeoning immensely as effective dimension-reduction summaries, the established theory will not only fill the critical gap in identifying the most efficient semiparametric estimator but also propel wide-ranging implementations of this new paradigm for between-subject attributes.
 [10] arXiv:2205.08047 [pdf, other]

Title: Perfect Spectral Clustering with Discrete Covariates
Comments: 23 pages, 1 figure
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Social and Information Networks (cs.SI); Statistics Theory (math.ST)
Among community detection methods, spectral clustering enjoys two desirable properties: computational efficiency and theoretical guarantees of consistency. Most studies of spectral clustering consider only the edges of a network as input to the algorithm. Here we consider the problem of performing community detection in the presence of discrete node covariates, where network structure is determined by a combination of a latent block model structure and homophily on the observed covariates. We propose a spectral algorithm that we prove achieves perfect clustering with high probability on a class of large, sparse networks with discrete covariates, effectively separating latent network structure from homophily on observed covariates. To our knowledge, our method is the first to offer a guarantee of consistent latent structure recovery using spectral clustering in the setting where edge formation is dependent on both latent and observed factors.
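As a baseline for what the paper extends, here is a minimal sketch of plain adjacency spectral clustering on a two-block planted-partition network. Nodes are labelled by the sign of the eigenvector for the second-largest adjacency eigenvalue. It deliberately ignores node covariates and homophily, so it is not the paper's covariate-aware algorithm. The function names and parameter values are illustrative.

```python
import numpy as np


def planted_partition(n: int, p_in: float, p_out: float, rng: np.random.Generator):
    """Symmetric adjacency matrix of a two-block stochastic block model (n even, no self-loops)."""
    labels = np.repeat([0, 1], n // 2)
    same_block = labels[:, None] == labels[None, :]
    probs = np.where(same_block, p_in, p_out)
    upper = np.triu(rng.random((n, n)) < probs, k=1)  # sample each node pair once
    adj = (upper | upper.T).astype(float)
    return adj, labels


def spectral_two_communities(adj: np.ndarray) -> np.ndarray:
    """Label nodes by the sign of the eigenvector for the second-largest adjacency eigenvalue."""
    _, eigvecs = np.linalg.eigh(adj)  # eigenvalues come back in ascending order
    return (eigvecs[:, -2] > 0).astype(int)


adj, labels = planted_partition(200, 0.6, 0.2, np.random.default_rng(0))
pred = spectral_two_communities(adj)
# Community labels are only identified up to swapping the two groups.
accuracy = max(np.mean(pred == labels), np.mean(pred != labels))
```

When edges also depend on observed covariates, this baseline can pick up the covariate homophily instead of the latent blocks. Separating those two sources of structure is the problem the paper addresses.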
 [11] arXiv:2205.08052 [pdf, other]

Title: An Inverse Probability Weighted Regression Method that Accounts for Right-censoring for Causal Inference with Multiple Treatments and a Binary Outcome
Subjects: Methodology (stat.ME)
Comparative effectiveness research often involves evaluating the differences in the risks of an event of interest between two or more treatments using observational data. Often, the post-treatment outcome of interest is whether the event happens within a pre-specified time window, which leads to a binary outcome. One source of bias for estimating the causal treatment effect is the presence of confounders, which are usually controlled using propensity score-based methods. An additional source of bias is right-censoring, which occurs when the information on the outcome of interest is not completely available due to dropout, study termination, or treatment switch before the event of interest. We propose an inverse probability weighted regression-based estimator that can simultaneously handle both confounding and right-censoring, calling the method C-IPWR, with the letter C highlighting the censoring component. C-IPWR estimates the average treatment effects by averaging the predicted outcomes obtained from a logistic regression model that is fitted using a weighted score function. The C-IPWR estimator has a double robustness property: estimation consistency can be achieved when either the model for the outcome or the models for both treatment and censoring are correctly specified. We establish the asymptotic properties of the C-IPWR estimator for conducting inference, and compare its finite sample performance with that of several alternatives through simulation studies. The methods under comparison are applied to a cohort of prostate cancer patients from an insurance claims database for comparing the adverse effects of four candidate drugs for advanced-stage prostate cancer.
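To show how confounding and right-censoring weights combine, here is a minimal inverse-probability-weighted sketch. It is a simple Horvitz-Thompson style estimator, not C-IPWR itself, which instead averages predictions from a weighted logistic regression. The sketch makes several simplifying assumptions. There is a single binary treatment. The propensity and censoring probabilities are known (in practice they would be estimated). Censoring is independent of the outcome given treatment and covariates. The function name and the simulated data-generating values are illustrative.

```python
import numpy as np


def ipw_ipcw_ate(treated, uncensored, outcome, propensity, prob_uncensored):
    """Weighted difference in means for a binary treatment and a binary outcome.

    Each uncensored subject is weighted by 1 / (P(A = a | X) * P(uncensored | A, X));
    censored subjects get weight 0, so their unobserved outcome value is never used.
    """
    w1 = uncensored * treated / (propensity * prob_uncensored)
    w0 = uncensored * (1 - treated) / ((1.0 - propensity) * prob_uncensored)
    return np.mean(w1 * outcome) - np.mean(w0 * outcome)


# Synthetic check with known nuisance probabilities (illustrative values; true ATE = 0.3).
rng = np.random.default_rng(0)
n = 200_000
x = rng.binomial(1, 0.5, n)  # a binary confounder
e = 0.3 + 0.4 * x  # P(A = 1 | X)
a = rng.binomial(1, e)
y = rng.binomial(1, 0.2 + 0.3 * a + 0.2 * x)  # outcome risk differs by 0.3 between arms
g = 0.9 - 0.2 * x  # P(uncensored | A, X)
delta = rng.binomial(1, g)
y_obs = np.where(delta == 1, y, 0)  # censored outcomes are not observed
ate_hat = ipw_ipcw_ate(a, delta, y_obs, e, g)
```

Dropping the censoring weight (setting `prob_uncensored` to 1) would bias the estimate here, because censoring depends on the confounder `x`.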
 [12] arXiv:2205.08132 [pdf, other]

Title: Latent Variable Method Demonstrator - Software for Understanding Multivariate Data Analytics Algorithms
Comments: 18 pages, 14 figures, code available: this https URL, preprint submitted to Computers & Chemical Engineering
Subjects: Machine Learning (stat.ML); Computers and Society (cs.CY); Machine Learning (cs.LG)
The ever-increasing quantity of multivariate process data is driving a need for skilled engineers to analyze, interpret, and build models from such data. Multivariate data analytics relies heavily on linear algebra, optimization, and statistics, and can be challenging for students to understand given that most curricula do not have strong coverage of these three topics. This article describes interactive software, the Latent Variable Demonstrator (LAVADE), for teaching, learning, and understanding latent variable methods. In this software, users can interactively compare latent variable methods such as Partial Least Squares (PLS) and Principal Component Regression (PCR) with other regression methods such as Least Absolute Shrinkage and Selection Operator (lasso), Ridge Regression (RR), and Elastic Net (EN). LAVADE helps to build intuition on choosing appropriate methods, hyperparameter tuning, and model coefficient interpretation, fostering a conceptual understanding of the algorithms' differences. The software contains a data generation method and three chemical process datasets, allowing results to be compared across datasets with different levels of complexity. LAVADE is released as open-source software so that others can apply and advance the tool for use in teaching or research.
 [13] arXiv:2205.08144 [pdf, other]

Title: BayesMix: Bayesian Mixture Models in C++
Subjects: Computation (stat.CO); Other Statistics (stat.OT)
We describe BayesMix, a C++ library for MCMC posterior simulation for general Bayesian mixture models. The goal of BayesMix is to provide a self-contained ecosystem for computer scientists, statisticians, and practitioners to perform inference for mixture models. The key idea of this library is extensibility, as we wish users to be able to easily adapt our software to their specific Bayesian mixture models. In addition to the several models and MCMC algorithms for posterior inference included in the library, new users with little familiarity with mixture models and the related MCMC algorithms can extend our library with minimal coding effort. Our library is computationally very efficient compared to competitor software. Examples show that typical code runtimes are two to 25 times faster than those of competitors for data dimensions from one to ten. Our library is publicly available on GitHub at https://github.com/bayesmix-dev/bayesmix/.
 [14] arXiv:2205.08187 [pdf, other]

Title: Deep neural networks with dependent weights: Gaussian Process mixture limit, heavy tails, sparsity and compressibility
Comments: 89 pages, 11 figures, 7 tables
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Probability (math.PR); Statistics Theory (math.ST)
This article studies the infinite-width limit of deep feedforward neural networks whose weights are dependent, and modelled via a mixture of Gaussian distributions. Each hidden node of the network is assigned a non-negative random variable that controls the variance of the outgoing weights of that node. We make minimal assumptions on these per-node random variables: they are iid and their sum, in each layer, converges to some finite random variable in the infinite-width limit. Under this model, we show that each layer of the infinite-width neural network can be characterised by two simple quantities: a non-negative scalar parameter and a Lévy measure on the positive reals. If the scalar parameters are strictly positive and the Lévy measures are trivial at all hidden layers, then one recovers the classical Gaussian process (GP) limit, obtained with iid Gaussian weights. More interestingly, if the Lévy measure of at least one layer is non-trivial, we obtain a mixture of Gaussian processes (MoGP) in the large-width limit. The behaviour of the neural network in this regime is very different from the GP regime. One obtains correlated outputs, with non-Gaussian distributions, possibly with heavy tails. Additionally, we show that, in this regime, the weights are compressible, and feature learning is possible. Many sparsity-promoting neural network models can be recast as special cases of our approach, and we discuss their infinite-width limits; we also present an asymptotic analysis of the pruning error. We illustrate some of the benefits of the MoGP regime over the GP regime in terms of representation learning and compressibility on simulated, MNIST and Fashion MNIST datasets.
 [15] arXiv:2205.08245 [pdf, other]

Title: Bayesian Inference for Non-Parametric Extreme Value Theory
Authors: Tobias Kallehauge
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
Statistical inference for extreme values of random events is difficult in practice due to low sample sizes and inaccurate models for the studied rare events. If prior knowledge of extreme values is available, Bayesian statistics can be applied to reduce the sample complexity, but this requires a known probability distribution. By working with the quantiles for extremely low probabilities (on the order of $10^{-2}$ or lower) and relying on their asymptotic normality, inference can be carried out without assuming any particular distribution. Despite relying on asymptotic results, it is shown that a Bayesian framework that incorporates prior information can reduce the number of observations required to estimate a particular quantile to some level of accuracy.
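A sketch of the general idea, assuming a conjugate normal prior on the quantile. It uses the classical asymptotic result that the sample p-quantile is approximately normal with standard error sqrt(p(1 - p)/n) / f(q_p). The density value f(q_p) at the quantile is an assumed input here and would need to be estimated in practice. The paper's exact construction may differ, and the function names are illustrative.

```python
import math


def quantile_standard_error(p: float, n: int, density_at_quantile: float) -> float:
    """Asymptotic standard error of the sample p-quantile: sqrt(p (1 - p) / n) / f(q_p)."""
    return math.sqrt(p * (1.0 - p) / n) / density_at_quantile


def normal_posterior(prior_mean: float, prior_sd: float, estimate: float, se: float):
    """Conjugate normal update: a precision-weighted average of prior mean and estimate."""
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = 1.0 / se ** 2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * estimate)
    return post_mean, math.sqrt(post_var)
```

The posterior standard deviation is always smaller than both the prior standard deviation and the data-only standard error. This is the mechanism by which prior information can reduce the number of observations needed for a target accuracy.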
 [16] arXiv:2205.08275 [pdf]

Title: Calculating LRs for presence of body fluids from mRNA assay data in mixtures
Comments: 28 pages. This is a pre-publication version
Journal-ref: Forensic Science International: Genetics, Volume 52, 2021, 102455. https://www.sciencedirect.com/science/article/pii/S1872497320302271
Subjects: Applications (stat.AP)
Messenger RNA (mRNA) profiling can identify body fluids present in a stain, yielding information on what activities could have taken place at a crime scene. To account for uncertainty in such identifications, recent work has focused on devising statistical models that allow probabilistic statements on the presence of body fluids. A major hurdle for practical adoption is that evidentiary stains are likely to contain more than one body fluid, and current models are ill-suited to analyse such mixtures. Here, we construct a likelihood ratio (LR) system that can handle mixtures, considering the hypotheses H1: the sample contains at least one of the body fluids of interest (and possibly other body fluids); H2: the sample contains none of the body fluids of interest (but possibly other body fluids). Thus, the LR-system outputs an LR-value for any combination of mRNA profile and set of body fluids of interest that are given as input. The calculation is based on an augmented dataset obtained by in silico mixing of real single body fluid mRNA profiles. These digital mixtures are used to construct a probabilistic classification method (a 'multi-label classifier'). The probabilities produced are subsequently used to calculate an LR, via calibration. We test a range of different classification methods from the field of machine learning, ways to preprocess the data, and multi-label strategies for their performance on in silico mixed test data. Furthermore, we study their robustness to different assumptions on background levels of the body fluids. We find that logistic regression works as well as more flexible classifiers, but shows higher robustness and better explainability. We test the system's performance on lab-generated mixture samples, and discuss practical usage in casework.
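A minimal sketch of the final step that turns a classifier probability into an LR. By Bayes' rule, LR = posterior odds / prior odds for H1 versus H2. This assumes the probability is already calibrated and that the prior probability of H1 implied by the training data is known. It does not reproduce the paper's calibration procedure, and the function name is illustrative.

```python
def likelihood_ratio(posterior_h1: float, prior_h1: float) -> float:
    """LR for H1 vs H2 (H2 = not H1) from a calibrated posterior probability and its prior."""
    posterior_odds = posterior_h1 / (1.0 - posterior_h1)
    prior_odds = prior_h1 / (1.0 - prior_h1)
    return posterior_odds / prior_odds
```

For example, a calibrated probability of 0.9 for H1 obtained under a 50/50 training prior corresponds to an LR of about 9. The same probability under a training prior of 0.8 corresponds to an LR of only about 2.25. This is why the prior implied by the in silico mixing matters.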
 [17] arXiv:2205.08295 [pdf, other]

Title: Semi-Parametric Contextual Bandits with Graph-Laplacian Regularization
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Non-stationarity is ubiquitous in human behavior, and addressing it in contextual bandits is challenging. Several works have addressed the problem by investigating semiparametric contextual bandits and warned that ignoring non-stationarity could harm performance. Another prevalent human behavior is social interaction, which has become available in the form of a social network or graph structure. As a result, graph-based contextual bandits have received much attention. In this paper, we propose "SemiGraphTS," a novel contextual Thompson-sampling algorithm for a graph-based semiparametric reward model. Our algorithm is the first to be proposed in this setting. We derive an upper bound on the cumulative regret that can be expressed as a multiple of a factor depending on the graph structure and the order for the semiparametric model without a graph. We evaluate the proposed and existing algorithms via simulation and a real-data example.
 [18] arXiv:2205.08340 [pdf, other]

Title: A unified framework for dataset shift diagnostics
Authors: Felipe Maia Polo, Rafael Izbicki, Evanildo Gomes Lacerda Jr, Juan Pablo Ibieta-Jimenez, Renato Vicente
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Methodology (stat.ME)
Most machine learning (ML) methods assume that the data used in the training phase come from the distribution of the target population. However, in practice one often faces dataset shift, which, if not properly taken into account, may decrease the predictive performance of the ML models. In general, if the practitioner knows which type of shift is taking place (e.g., covariate shift or label shift), they may apply transfer learning methods to obtain better predictions. Unfortunately, current methods for detecting shift are only designed to detect specific types of shift or cannot formally test their presence. We introduce a general framework that gives insights on how to improve prediction methods by detecting the presence of different types of shift and quantifying how strong they are. Our approach can be used for any data type (tabular/image/text) and for both classification and regression tasks. Moreover, it uses formal hypothesis tests that control false alarms. We illustrate how our framework is useful in practice using both artificial and real datasets. Our package for dataset shift detection is available at https://github.com/felipemaiapolo/detectshift.
 [19] arXiv:2205.08349 [pdf, other]

Title: Topological Signal Processing using the Weighted Ordinal Partition Network
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Signal Processing (eess.SP)
One of the most important problems arising in time series analysis is that of bifurcation, or change point detection. That is, given a collection of time series over a varying parameter, when has the structure of the underlying dynamical system changed? For this task, we turn to the field of topological data analysis (TDA), which encodes information about the shape and structure of data. The idea of utilizing tools from TDA for signal processing tasks, known as topological signal processing (TSP), has gained much attention in recent years, largely through a standard pipeline that computes the persistent homology of the point cloud generated by the Takens' embedding. However, this procedure is limited by computation time, since the simplicial complex generated in this case is large and also contains a great deal of redundant data. For this reason, we turn to a more recent method for encoding the structure of the attractor, which constructs an ordinal partition network (OPN) representing information about when the dynamical system has passed between certain regions of state space. The result is a weighted graph whose structure encodes information about the underlying attractor. Our previous work began to find ways to package the information of the OPN in a manner that is amenable to TDA; however, that work only used the network structure and did nothing to encode the additional weighting information. In this paper, we take the next step: building a pipeline to analyze the weighted OPN with TDA and showing that this framework provides more resilience to noise or perturbations in the system and improves the accuracy of the dynamic state detection.
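A minimal sketch of how a weighted ordinal partition network can be built. Each delay-embedded window is replaced by its ordinal pattern (the permutation that sorts it). Directed edges between consecutive patterns are weighted by transition counts. The dimension, delay, tie-breaking rule, and the choice to keep self-loops are assumptions of this sketch rather than details taken from the paper. The function names are illustrative.

```python
from collections import Counter


def ordinal_pattern(window):
    """Permutation of indices that sorts the window (ties broken by position)."""
    return tuple(sorted(range(len(window)), key=lambda i: window[i]))


def ordinal_partition_network(series, dim=3, delay=1):
    """Weighted directed edges (pattern_t -> pattern_{t+1}) with transition counts.

    Self-loops (repeated patterns) are kept; some OPN constructions drop them.
    """
    span = (dim - 1) * delay
    patterns = [
        ordinal_pattern(series[t : t + span + 1 : delay])
        for t in range(len(series) - span)
    ]
    return Counter(zip(patterns, patterns[1:]))
```

A monotone series such as `[1, 2, 3, 4, 5]` yields a single node with a self-loop of weight 2. An oscillating series produces edges that cycle between a few patterns.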
 [20] arXiv:2205.08439 [pdf, other]

Title: A case study of glucose levels during sleep using fast function on scalar regression inference
Authors: Renat Sergazinov, Andrew Leroux, Erjia Cui, Ciprian Crainiceanu, R. Nisha Aurora, Naresh M. Punjabi, Irina Gaynanova
Subjects: Applications (stat.AP); Computation (stat.CO)
Continuous glucose monitors (CGMs) are increasingly used to measure blood glucose levels and provide information about the treatment and management of diabetes. Our motivating study contains CGM data during sleep for 174 study participants with type II diabetes mellitus, measured at a 5-minute frequency for an average of 10 nights. We aim to quantify the effects of diabetes medications and sleep apnea severity on glucose levels. Statistically, this is an inference question about the association between scalar covariates and functional responses. However, many characteristics of the data make analyses difficult, including (1) non-stationary within-day patterns; (2) substantial between-day heterogeneity, non-Gaussianity, and outliers; and (3) large dimensionality due to the number of study participants, sleep periods, and time points. We evaluate and compare two methods: fast univariate inference (FUI) and functional additive mixed models (FAMM). We introduce a new approach for calculating p-values for testing a global null effect of covariates using FUI, and provide practical guidelines for speeding up FAMM computations, making it feasible for our data. While FUI and FAMM are philosophically different, they lead to similar point estimators in our study. In contrast to FAMM, FUI is fast, accounts for within-day correlations, and enables the construction of joint confidence intervals. Our analyses reveal that: (1) biguanide medication and sleep apnea severity significantly affect glucose trajectories during sleep, and (2) the estimated effects are time-invariant.
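A heavily simplified sketch of the two-step idea behind fast univariate inference. First, fit a separate regression of the response on the scalar covariates at each time point. Second, smooth each resulting coefficient curve along time. This sketch uses ordinary least squares and a moving-average smoother. It ignores repeated nights per participant (the actual method fits mixed models at each time point) and it omits the p-value and joint-interval steps. The function name and window length are illustrative.

```python
import numpy as np


def fast_univariate_inference(X: np.ndarray, Y: np.ndarray, window: int = 5):
    """Pointwise OLS followed by moving-average smoothing of each coefficient curve.

    X: (n_subjects, n_covariates) design matrix, including an intercept column.
    Y: (n_subjects, n_timepoints) functional responses; n_timepoints >= window.
    Returns the raw and smoothed coefficient curves, each (n_covariates, n_timepoints).
    """
    # Step 1: lstsq with a 2-D right-hand side solves one OLS problem per time point.
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    # Step 2: moving average, normalised so the curve ends are not shrunk toward 0.
    kernel = np.ones(window)
    counts = np.convolve(np.ones(coef.shape[1]), kernel, mode="same")
    smoothed = np.empty_like(coef)
    for k in range(coef.shape[0]):
        smoothed[k] = np.convolve(coef[k], kernel, mode="same") / counts
    return coef, smoothed
```

Because each time point is fitted independently, step 1 is trivially parallel. This is the main source of FUI's speed relative to fitting one joint functional model.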
 [21] arXiv:2205.08494 [pdf, ps, other]

Title: Covariance Estimation: Optimal Dimension-free Guarantees for Adversarial Corruption and Heavy Tails
Comments: 31 pages
Subjects: Statistics Theory (math.ST); Data Structures and Algorithms (cs.DS); Probability (math.PR)
We provide an estimator of the covariance matrix that achieves the optimal rate of convergence (up to constant factors) in the operator norm under two standard notions of data contamination: we allow the adversary to corrupt an $\eta$-fraction of the sample arbitrarily, while the distribution of the remaining data points only satisfies that the $L_{p}$-marginal moment with some $p \ge 4$ is equivalent to the corresponding $L_2$-marginal moment. Despite requiring the existence of only a few moments, our estimator achieves the same tail estimates as if the underlying distribution were Gaussian. As part of our analysis, we prove a dimension-free Bai-Yin type theorem in the regime $p > 4$.
 [22] arXiv:2205.08528 [pdf, other]

Title: High-dimensional additive Gaussian processes under monotonicity constraints
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
We introduce an additive Gaussian process framework that accounts for monotonicity constraints and scales to high dimensions. Our contributions are threefold. First, we show that our framework enables the constraints to be satisfied everywhere in the input space. We also show that more general componentwise linear inequality constraints, such as componentwise convexity, can be handled similarly. Second, we propose the additive MaxMod algorithm for sequential dimension reduction. By sequentially maximizing a squared-norm criterion, MaxMod identifies the active input dimensions and refines the most important ones. This criterion can be computed explicitly at a linear cost. Finally, we provide open-source code for our full framework. We demonstrate the performance and scalability of the methodology in several synthetic examples with hundreds of dimensions under monotonicity constraints, as well as on a real-world flood application.
 [23] arXiv:2205.08530 [pdf, other]

Title: High-resolution landscape-scale biomass mapping using a spatiotemporal patchwork of LiDAR coverages
Authors: Lucas K. Johnson (1), Michael J. Mahoney (1), Eddie Bevilacqua (1), Stephen V. Stehman (1), Grant Domke (2), Colin M. Beier (1) ((1) State University of New York College of Environmental Science and Forestry, (2) USDA Forest Service)
Comments: Manuscript: 19 pages, 7 figures; Supplements: 14 pages, 5 figures; Submitted to: Environmental Research Letters, Carbon Monitoring Systems Research and Applications focus collection
Subjects: Applications (stat.AP)
Estimating forest aboveground biomass (AGB) at fine spatial scales has become increasingly important for greenhouse gas estimation, monitoring, and verification efforts to mitigate climate change. Airborne LiDAR continues to be a valuable source of remote sensing data for estimating aboveground biomass. However, airborne LiDAR collections may take place at local or regional scales covering irregular, non-contiguous footprints, resulting in a 'patchwork' of different landscape segments at different points in time. Here we address common obstacles, including the selection of training data, the investigation of regional or coverage-specific patterns in bias and error, map agreement, and model-based precision assessments at multiple scales.
Three machine learning algorithms and an ensemble model were trained using field inventory data (FIA), airborne LiDAR, and topographic, climatic and cadastral geodata. Using strict selection criteria, 801 FIA plots were selected with co-located point clouds drawn from a patchwork of 17 leaf-off LiDAR coverages (2014-2019). Our ensemble model created 30 m AGB prediction surfaces within a predictor-defined area of applicability (98% of LiDAR coverage), and the resulting AGB predictions were compared with FIA plot-level and areal estimates at multiple scales of aggregation. Our model was accurate overall (%RMSE 13-33%), had very low bias (MBE $\leq \pm 5$ Mg ha$^{-1}$), explained most field-observed variation (R$^2$ 0.74-0.93), and produced estimates that were both largely consistent with FIA's aggregate summaries (86% of estimates within the 95% CI) and precise when aggregated to arbitrary small areas (mean bootstrap standard error 0.37 Mg ha$^{-1}$). We share practical solutions to challenges faced when using spatiotemporal patchworks of LiDAR to meet growing needs for biomass prediction and mapping, and for applications in carbon accounting and ecosystem stewardship.
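The plot-level accuracy figures quoted above (%RMSE, MBE, R$^2$) can be computed from paired observed and predicted values. The sketch below assumes common conventions that the abstract does not spell out (RMSE relative to the mean observed value, MBE as mean prediction minus observation, R$^2$ as one minus residual over total sum of squares); the paper's exact definitions may differ, and the function name and the numbers in the usage line are made up for illustration.

```python
import math


def accuracy_metrics(observed, predicted):
    """Return (%RMSE, MBE, R^2) for paired observations and predictions.

    Assumed conventions: %RMSE = 100 * RMSE / mean(observed),
    MBE = mean(predicted - observed), R^2 = 1 - SS_res / SS_tot.
    """
    n = len(observed)
    residuals = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    rmse = math.sqrt(ss_res / n)
    mbe = sum(residuals) / n
    return 100.0 * rmse / mean_obs, mbe, 1.0 - ss_res / ss_tot


# Hypothetical plot-level AGB values in Mg/ha (not data from the paper).
pct_rmse, mbe, r2 = accuracy_metrics([100.0, 150.0, 200.0], [110.0, 140.0, 200.0])
```

Here the over- and under-prediction cancel, so MBE is zero even though %RMSE is not: low bias and low error are separate properties, which is why the abstract reports both.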
Cross-lists for Wed, 18 May 22
 [24] arXiv:2205.07932 (cross-list from cs.LG) [pdf, other]

Title: Distributed Feature Selection for High-dimensional Additive Models
Comments: 40 pages, 2 figures
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Applications (stat.AP); Computation (stat.CO); Machine Learning (stat.ML)
Distributed statistical learning is a common strategy for handling massive data: the learning task is divided among multiple local machines and the results are aggregated afterward. However, most existing work considers the case where the samples are divided. In this work, we propose a new algorithm, DDAC-SpAM, that divides the features under the high-dimensional sparse additive model. The algorithm has three steps: divide, decorrelate, and conquer. We show that after the decorrelation operation, every local estimator can consistently recover the sparsity pattern of each additive component without imposing strict constraints on the correlation structure among variables. Theoretical analysis of the aggregated estimator and empirical results on synthetic and real data show that DDAC-SpAM is effective and competitive in fitting sparse additive models.
 [25] arXiv:2205.08017 (cross-list from cs.LG) [pdf, other]

Title: $\mathscr{H}$-Consistency Estimation Error of Surrogate Loss Minimizers
Comments: ICML 2022 (long presentation)
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We present a detailed study of estimation errors in terms of surrogate loss estimation errors. We refer to such guarantees as $\mathscr{H}$-consistency estimation error bounds, since they account for the hypothesis set $\mathscr{H}$ adopted. These guarantees are significantly stronger than $\mathscr{H}$-calibration or $\mathscr{H}$-consistency. They are also more informative than similar excess error bounds derived in the literature when $\mathscr{H}$ is the family of all measurable functions. We prove general theorems providing such guarantees for both the distribution-dependent and distribution-independent settings. We show that our bounds are tight, modulo a convexity assumption. We also show that previous excess error bounds can be recovered as special cases of our general results.
We then present a series of explicit bounds for the zero-one loss, with multiple choices of the surrogate loss, for both the family of linear functions and neural networks with one hidden layer. We further prove more favorable distribution-dependent guarantees in that case. We also present a series of explicit bounds for the adversarial loss, with surrogate losses based on the supremum of the $\rho$-margin, hinge or sigmoid loss, for the same two general hypothesis sets. Here too, we prove several enhancements of these guarantees under natural distributional assumptions. Finally, we report the results of simulations illustrating our bounds and their tightness.
 [26] arXiv:2205.08033 (cross-list from cs.SI) [pdf]

Title: Using Embeddings for Causal Estimation of Peer Influence in Social Networks
Comments: 17 pages, 1 figure, 4 tables
Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG); Machine Learning (stat.ML)
We address the problem of using observational data to estimate peer contagion effects, that is, the influence of treatments applied to individuals in a network on the outcomes of their neighbors. A main challenge to such estimation is that homophily (the tendency of connected units to share similar latent traits) acts as an unobserved confounder for contagion effects. Informally, it is hard to tell whether your friends have similar outcomes because they were influenced by your treatment, or because of some common trait that caused you to be friends in the first place. Because these common causes are not usually directly observed, they cannot simply be adjusted for. We describe an approach that performs the required adjustment using node embeddings learned from the network itself. The main aim is to perform this adjustment nonparametrically, without functional-form assumptions on either the process that generated the network or the treatment assignment and outcome processes. The key contributions are to formalize the causal effect nonparametrically in a way that accounts for homophily, and to show how embedding methods can be used to identify and estimate this effect. Code is available at https://github.com/IrinaCristali/PeerContagiononNetworks.
 [27] arXiv:2205.08098 (cross-list from cs.LG) [pdf, other]

Title: Can We Do Better Than Random Start? The Power of Data Outsourcing
Comments: 22 pages, 5 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Many organizations have access to abundant data but lack the computational power to process it. While they can outsource the computational task to other facilities, there are various constraints on the amount of data that can be shared. It is natural to ask what data outsourcing can accomplish under such constraints. We address this question from a machine learning perspective. When a model is trained with optimization algorithms, the quality of the results often depends heavily on the points at which the algorithms are initialized. Random start is one of the most popular methods for tackling this issue, but it can be computationally expensive and infeasible for organizations lacking computing resources. For three different scenarios, we propose simulation-based algorithms that use a small amount of outsourced data to find good initial points. Under suitable regularity conditions, we provide theoretical guarantees showing that the algorithms find good initial points with high probability. We also conduct numerical experiments demonstrating that our algorithms perform significantly better than the random start approach.
 [28] arXiv:2205.08099 (cross-list from cs.LG) [pdf, other]

Title: Dimensionality Reduced Training by Pruning and Freezing Parts of a Deep Neural Network, a Survey
Comments: Survey for pruning and freezing methods applied before training starts
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
State-of-the-art deep learning models have parameter counts that reach into the billions. Training, storing and transferring such models is energy- and time-consuming, and thus costly. A large part of these costs is caused by training the network. Model compression lowers storage and transfer costs, and can further make training more efficient by decreasing the number of computations in the forward and/or backward pass. Thus, compressing networks already at training time while maintaining high performance is an important research topic. This work is a survey of methods that reduce the number of trained weights in deep learning models throughout training. Most of the methods introduced set network parameters to zero, which is called pruning. The pruning approaches presented are categorized into pruning at initialization, lottery tickets and dynamic sparse training. Moreover, we discuss methods that freeze parts of a network at their random initialization. Freezing weights reduces the number of trainable parameters, which reduces gradient computations and the dimensionality of the model's optimization space. In this survey we first propose dimensionality reduced training as an underlying mathematical model that covers pruning and freezing during training. Afterwards, we present and discuss different dimensionality reduced training methods.
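Pruning, as described above, sets a subset of weights to zero. A minimal sketch of one common variant, global magnitude pruning applied once at initialization, is shown below; the function names are hypothetical, the weights are a flat list for simplicity, and this is only one of the many pruning criteria a survey like this covers. A frozen weight, by contrast, would keep its nonzero random initial value but be excluded from gradient updates.

```python
def magnitude_prune_mask(weights, sparsity):
    """Return a 0/1 mask that zeroes the `sparsity` fraction of smallest-magnitude weights."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute weight.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:n_prune]:
        mask[i] = 0
    return mask


def apply_mask(weights, mask):
    """Zero out pruned weights; reapplying after each update keeps them at zero."""
    return [w * m for w, m in zip(weights, mask)]


# Usage with a toy weight vector: prune 40% of 5 weights, i.e. 2 of them.
w = [0.5, -0.05, 0.2, -0.9, 0.01]
mask = magnitude_prune_mask(w, 0.4)  # -> [1, 0, 1, 1, 0]
pruned = apply_mask(w, mask)
```

Keeping the mask fixed corresponds to the pruning-at-initialization setting; dynamic sparse training would instead recompute or regrow parts of the mask during training.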
 [29] arXiv:2205.08178 (cross-list from cs.LG) [pdf, other]

Title: Active learning of causal probability trees
Authors: Tue Herlau
Subjects: Machine Learning (cs.LG); Methodology (stat.ME)
The past two decades have seen growing interest in combining causal information, commonly represented using causal graphs, with machine learning models. Probability trees provide a simple yet powerful alternative representation of causal information. They enable the computation of both interventions and counterfactuals, and are strictly more general than causal graphs, since they allow context-dependent causal dependencies. Here we present a Bayesian method for learning probability trees from a combination of interventional and observational data. The method quantifies the expected information gain from an intervention and selects the interventions with the largest gain. We demonstrate the efficiency of the method on simulated and real data. An effective method for learning probability trees on a limited interventional budget will greatly expand their applicability.
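The selection rule described above (pick the intervention with the largest expected information gain) can be sketched generically for a finite set of hypotheses and a binary outcome. This is standard Bayesian experimental design, not the paper's computation on probability trees; the hypotheses, the intervention names `do(A=1)` and `do(B=1)`, and the function names are all hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def expected_information_gain(prior, likelihoods):
    """Mutual information between the unknown hypothesis and one binary outcome.

    prior[k]       -- current belief that hypothesis k is true
    likelihoods[k] -- P(outcome = 1 | hypothesis k) under the candidate intervention
    EIG = H(Y) - sum_k prior[k] * H(Y | hypothesis k)
    """
    p1 = sum(w * q for w, q in zip(prior, likelihoods))
    marginal = entropy([p1, 1.0 - p1])
    conditional = sum(w * entropy([q, 1.0 - q]) for w, q in zip(prior, likelihoods))
    return marginal - conditional


def choose_intervention(prior, candidates):
    """Return the name of the candidate intervention with the largest EIG."""
    return max(candidates, key=lambda name: expected_information_gain(prior, candidates[name]))


# Two hypothetical hypotheses and two hypothetical interventions.
prior = [0.5, 0.5]
candidates = {
    "do(A=1)": [0.9, 0.1],  # outcome probabilities differ a lot between hypotheses
    "do(B=1)": [0.5, 0.5],  # outcome is uninformative about the hypothesis
}
best = choose_intervention(prior, candidates)  # -> "do(A=1)"
```

An intervention whose outcome distribution is the same under every hypothesis has zero expected gain, so a budget-limited learner would never spend it there.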
 [30] arXiv:2205.08199 (cross-list from cs.IT) [pdf, ps, other]

Title: Sharp asymptotics on the compression of two-layer neural networks
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Machine Learning (stat.ML)
In this paper, we study the compression of a target two-layer neural network with $N$ nodes into a compressed network with $M < N$ nodes. More precisely, we consider the setting in which the weights of the target network are i.i.d. sub-Gaussian, and we minimize the population $L_2$ loss between the outputs of the target and compressed networks, under the assumption of Gaussian inputs. Using tools from high-dimensional probability, we show that this non-convex problem can be simplified when the target network is sufficiently over-parameterized, and we provide the error rate of this approximation as a function of the input dimension and $N$. For a ReLU activation function, we conjecture that the optimum of the simplified optimization problem is achieved by taking weights on the Equiangular Tight Frame (ETF), while the scaling of the weights and the orientation of the ETF depend on the parameters of the target network. Numerical evidence is provided to support this conjecture.
 [31] arXiv:2205.08234 (cross-list from cs.LG) [pdf, other]

Title: Delaytron: Efficient Learning of Multiclass Classifiers with Delayed Bandit Feedbacks
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
In this paper, we present an online algorithm called {\it Delaytron} for learning multiclass classifiers using delayed bandit feedback. The sequence of feedback delays $\{d_t\}_{t=1}^T$ is unknown to the algorithm. At the $t$-th round, the algorithm observes an example $\mathbf{x}_t$, predicts a label $\tilde{y}_t$, and receives the bandit feedback $\mathbb{I}[\tilde{y}_t=y_t]$ only $d_t$ rounds later. When $t+d_t>T$, we consider the feedback for the $t$-th round to be missing. We show that the proposed algorithm achieves a regret of $\mathcal{O}\left(\sqrt{\frac{2K}{\gamma}\left[\frac{T}{2}+\left(2+\frac{L^2}{R^2\Vert \mathbf{W}\Vert_F^2}\right)\sum_{t=1}^T d_t\right]}\right)$ when the loss for each missing sample is upper bounded by $L$. When the loss for missing samples is not upper bounded, the regret achieved by Delaytron is $\mathcal{O}\left(\sqrt{\frac{2K}{\gamma}\left[\frac{T}{2}+2\sum_{t=1}^T d_t+\vert \mathcal{M}\vert T\right]}\right)$, where $\mathcal{M}$ is the set of missing samples in $T$ rounds. These bounds are achieved with a constant step size, which requires knowledge of $T$ and $\sum_{t=1}^T d_t$. For the case when $T$ and $\sum_{t=1}^T d_t$ are unknown, we use a doubling trick for online learning and propose Adaptive Delaytron. We show that Adaptive Delaytron achieves a regret bound of $\mathcal{O}\left(\sqrt{T+\sum_{t=1}^T d_t}\right)$. We show the effectiveness of our approach by experimenting on various datasets and comparing with state-of-the-art approaches.
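The doubling trick mentioned above removes the need to know the horizon in advance. The sketch below shows only the textbook version over the number of rounds: time is split into epochs of length 1, 2, 4, ..., and the learner restarts at each epoch with a constant step size tuned for that epoch's length. Adaptive Delaytron applies the idea to both $T$ and $\sum_{t=1}^T d_t$, and its exact restart rule and step sizes are not given in the abstract; the schedule `c / sqrt(2**k)` and all names here are hypothetical.

```python
import math


def doubling_epochs(horizon):
    """Split rounds 1..horizon into consecutive epochs of length 1, 2, 4, ...

    The horizon is used here only to list the epochs for illustration; an
    online learner that does not know T simply starts a new epoch whenever
    the current one ends.
    """
    epochs, start, k = [], 1, 0
    while start <= horizon:
        end = min(start + 2 ** k - 1, horizon)
        epochs.append((start, end))
        start, k = end + 1, k + 1
    return epochs


def step_size_for_epoch(k, c=1.0):
    """Hypothetical constant step size for epoch k, tuned as if T = 2**k."""
    return c / math.sqrt(2 ** k)


epochs = doubling_epochs(10)  # -> [(1, 1), (2, 3), (4, 7), (8, 10)]
```

Since the epoch lengths grow geometrically, summing a $\sqrt{\text{length}}$ regret over epochs only costs a constant factor relative to knowing $T$ in advance, which is why the trick preserves the rate.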
 [32] arXiv:2205.08364 (cross-list from cs.LG) [pdf, other]

Title: Network Gradient Descent Algorithm for Decentralized Federated Learning
Subjects: Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)
We study a fully decentralized federated learning algorithm, which is a novel gradient descent algorithm executed on a communication-based network. For convenience, we refer to it as a network gradient descent (NGD) method. In the NGD method, only statistics (e.g., parameter estimates) need to be communicated, minimizing privacy risk. Meanwhile, different clients communicate with each other directly according to a carefully designed network structure without a central master. This greatly enhances the reliability of the entire algorithm. These properties motivate us to study the NGD method carefully, both theoretically and numerically. Theoretically, we start with a classical linear regression model. We find that both the learning rate and the network structure play significant roles in determining the NGD estimator's statistical efficiency. The resulting NGD estimator can be statistically as efficient as the global estimator if the learning rate is sufficiently small and the network structure is well balanced, even if the data are distributed heterogeneously. These findings are then extended to general models and loss functions. Extensive numerical studies are presented to corroborate our theoretical findings. Classical deep learning models are also presented for illustration purposes.
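The abstract does not spell out the NGD update. A common form of decentralized gradient descent, assumed here, has each client first average its neighbours' parameter estimates with weights from a doubly stochastic matrix and then take one gradient step on its own loss. The sketch below applies that assumed update to a one-parameter least-squares model with heterogeneous clients (different covariate ranges), a small learning rate, and a balanced ring network; the model, data, and function names are hypothetical. Under these settings every client's estimate should end up close to the pooled (global) least-squares slope, which the last lines compute for comparison.

```python
import random


def network_gradient_descent(clients, W, lr=0.01, iters=2000):
    """Decentralized gradient descent for the one-parameter model y = b * x.

    clients -- list of (xs, ys) local datasets, one per client
    W       -- mixing matrix: W[i][j] > 0 only if client j is a neighbour of i
               (or j == i); rows and columns each sum to 1
    Each round, client i averages its neighbours' current estimates with
    weights W[i][j] and then takes one gradient step on its own mean squared
    error. Only parameter estimates are exchanged, never raw data.
    """
    m = len(clients)
    theta = [0.0] * m
    for _ in range(iters):
        mixed = [sum(W[i][j] * theta[j] for j in range(m)) for i in range(m)]
        new_theta = []
        for i, (xs, ys) in enumerate(clients):
            # Gradient of 0.5 * mean((x * b - y)^2) at this client's estimate.
            grad = sum(x * (x * theta[i] - y) for x, y in zip(xs, ys)) / len(xs)
            new_theta.append(mixed[i] - lr * grad)
        theta = new_theta
    return theta


# Hypothetical heterogeneous clients: client i only sees covariates in [i, i + 1].
random.seed(0)
clients = []
for i in range(4):
    xs = [random.uniform(i, i + 1) for _ in range(50)]
    ys = [2.0 * x + random.gauss(0.0, 0.1) for x in xs]
    clients.append((xs, ys))

# Balanced ring: each client weights itself and its two neighbours by 1/3.
W = [[1 / 3 if (j - i) % 4 in (0, 1, 3) else 0.0 for j in range(4)] for i in range(4)]
theta = network_gradient_descent(clients, W)

# Global (pooled) least-squares slope for comparison.
all_pairs = [(x, y) for xs, ys in clients for x, y in zip(xs, ys)]
b_global = sum(x * y for x, y in all_pairs) / sum(x * x for x, _ in all_pairs)
```

With a constant learning rate this kind of update settles near, not exactly at, the global estimate; the gap shrinks as the learning rate decreases, which is consistent with the abstract's condition that the learning rate be sufficiently small.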
 [33] arXiv:2205.08370 (cross-list from cs.LG) [pdf, other]

Title: Individualized Risk Assessment of Preoperative Opioid Use by Interpretable Neural Network Regression
Comments: 14 pages, 6 tables and 2 figures in main text
Subjects: Machine Learning (cs.LG); Applications (stat.AP)
Preoperative opioid use has been reported to be associated with higher preoperative opioid demand, worse postoperative outcomes, and increased postoperative healthcare utilization and expenditures. Understanding the risk of preoperative opioid use helps establish patient-centered pain management. In machine learning, deep neural networks (DNNs) have emerged as a powerful means of risk assessment because of their superb predictive power; however, such black-box algorithms may make the results less interpretable than statistical models. Bridging the gap between the statistical and machine learning fields, we propose a novel Interpretable Neural Network Regression (INNER), which combines the strengths of statistical and DNN models. We use the proposed INNER to conduct individualized risk assessment of preoperative opioid use. Intensive simulations and an analysis of 34,186 patients expecting surgery in the Analgesic Outcomes Study (AOS) show that INNER not only predicts preoperative opioid use from preoperative characteristics as accurately as a DNN, but also estimates the patient-specific odds of opioid use without pain and the odds ratio of opioid use for a unit increase in reported overall body pain, leading to more straightforward interpretations of the tendency to use opioids than a DNN offers. Our results identify the patient characteristics that are strongly associated with opioid use and are largely consistent with previous findings, providing evidence that INNER is a useful tool for individualized risk assessment of preoperative opioid use.
 [34] arXiv:2205.08418 (cross-list from eess.SP) [pdf]

Title: Fault Detection for Non-Condensing Boilers using Simulated Building Automation System Sensor Data
Authors: Rony Shohet, Mohamed Kandil (1), J.J. McArthur (1), ((1) Department Architectural Science, Ryerson University, Toronto, Canada)
Comments: 41 pages, 55106 words
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Systems and Control (eess.SY); Applications (stat.AP)
Building performance has been shown to degrade significantly after commissioning, resulting in increased energy consumption and associated greenhouse gas emissions. Continuous Commissioning using existing sensor networks and IoT devices has the potential to minimize this waste by continually identifying system degradation and re-tuning control strategies to adapt to real building performance. Because gas boiler systems for building heating contribute significantly to greenhouse gas emissions, their performance is critical. A review of boiler performance studies was used to develop a set of common faults and degraded performance conditions, which were integrated into a MATLAB/Simulink emulator. This resulted in a labeled dataset with approximately 10,000 simulations of steady-state performance for each of 14 non-condensing boilers. The collected data were used to train and test fault classifiers based on K-nearest neighbours, decision trees, random forests, and support vector machines. The results show that the support vector machine method gave the best prediction accuracy, consistently exceeding 90%, while generalization across multiple boilers is not possible due to low classification accuracy.
 [35] arXiv:2205.08532 (cross-list from cs.DS) [pdf, ps, other]

Title: New Lower Bounds for Private Estimation and a Generalized Fingerprinting Lemma
Comments: Modified title and abstract for arxiv and made some improvements to the writing
Subjects: Data Structures and Algorithms (cs.DS); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
We prove new lower bounds for statistical estimation tasks under the constraint of $(\varepsilon, \delta)$-differential privacy. First, we provide tight lower bounds for private covariance estimation of Gaussian distributions. We show that estimating the covariance matrix in Frobenius norm requires $\Omega(d^2)$ samples, and in spectral norm requires $\Omega(d^{3/2})$ samples, both matching upper bounds up to logarithmic factors. We prove these bounds via our main technical contribution, a broad generalization of the fingerprinting method to exponential families. Additionally, using the private Assouad method of Acharya, Sun, and Zhang, we show a tight $\Omega(d/(\alpha^2 \varepsilon))$ lower bound for estimating the mean of a distribution with bounded covariance to $\alpha$-error in $\ell_2$-distance. Previously known lower bounds for all these problems were either polynomially weaker or held under the stricter condition of $(\varepsilon,0)$-differential privacy.
Replacements for Wed, 18 May 22
 [36] arXiv:1508.02905 (replaced) [pdf, other]

Title: Bayesian Dropout
Comments: 21 pages, 3 figures. Manuscript prepared 2014 and awaiting submission
Journal-ref: Procedia Computer Science 201 (2022) 771-776
Subjects: Machine Learning (stat.ML)
 [37] arXiv:2004.09455 (replaced) [pdf, other]

Title: Enforcing stationarity through the prior in vector autoregressions
Authors: Sarah E. Heaps
Comments: Accepted for publication in the Journal of Computational and Graphical Statistics
Subjects: Methodology (stat.ME)
 [38] arXiv:2007.02938 (replaced) [pdf, other]

Title: Causal Feature Selection via Orthogonal Search
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
 [39] arXiv:2008.10957 (replaced) [pdf, ps, other]

Title: Are You All Normal? It Depends!
Comments: arXiv admin note: text overlap with arXiv:2004.07332 by other authors
Subjects: Methodology (stat.ME)
 [40] arXiv:2009.07427 (replaced) [pdf, other]

Title: Intrinsic Riemannian Functional Data Analysis for Sparse Longitudinal Observations
Comments: 56 pages
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
 [41] arXiv:2009.07439 (replaced) [pdf, other]

Title: On the Landscape of One-hidden-layer Sparse Networks and Beyond
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [42] arXiv:2011.01831 (replaced) [pdf, other]

Title: Nonparametric Estimation of Functional Dynamic Factor Model
Comments: 28 pages, 6 figures
Subjects: Methodology (stat.ME)
 [43] arXiv:2105.10210 (replaced) [pdf, other]

Title: Bayesian Uncertainty Quantification of Local Volatility Model
Subjects: Applications (stat.AP); Numerical Analysis (math.NA); Other Statistics (stat.OT)
 [44] arXiv:2105.12120 (replaced) [pdf, other]

Title: Sampling random graphs with specified degree sequences
Comments: 18 pages, 14 figures, added references and applications, methods substantially improved, results expanded. Code available at this http URL
Subjects: Social and Information Networks (cs.SI); Data Analysis, Statistics and Probability (physics.data-an); Methodology (stat.ME)
 [45] arXiv:2106.01814 (replaced) [pdf, other]

Title: Explaining Recruitment to Extremism: A Bayesian Case-Control Approach
Subjects: Methodology (stat.ME); Applications (stat.AP)
 [46] arXiv:2107.14323 (replaced) [pdf, other]

Title: Reconstruction of Random Geometric Graphs: Breaking the Omega(r) distortion barrier
Comments: v1 on arxiv was titled "Improved Reconstruction of Random Geometric Graphs." An extended abstract with the above title appeared in ICALP 2022. The current version includes the proofs that were omitted from the ICALP version and adds the section "Missing Edges."
Subjects: Computational Geometry (cs.CG); Social and Information Networks (cs.SI); Probability (math.PR); Physics and Society (physics.soc-ph); Machine Learning (stat.ML)
 [47] arXiv:2108.02151 (replaced) [pdf, other]

Title: Semiparametric Functional Factor Models with Bayesian Rank Selection
Subjects: Methodology (stat.ME); Econometrics (econ.EM); Computation (stat.CO)
 [48] arXiv:2108.06138 (replaced) [pdf, ps, other]

Title: Stochastic orders and measures of skewness and dispersion based on expectiles
Subjects: Statistics Theory (math.ST)
 [49] arXiv:2109.03457 (replaced) [pdf, other]

Title: Uncertainty Quantification and Experimental Design for Large-Scale Linear Inverse Problems under Gaussian Process Priors
Comments: under review
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Applications (stat.AP); Computation (stat.CO); Methodology (stat.ME)
 [50] arXiv:2109.05386 (replaced) [pdf, other]

Title: Microbiome subcommunity learning with logistic-tree normal latent Dirichlet allocation
Subjects: Applications (stat.AP); Machine Learning (stat.ML)
 [51] arXiv:2109.13374 (replaced) [pdf, other]

Title: Variance partitioning in spatiotemporal disease mapping models
Subjects: Methodology (stat.ME)
 [52] arXiv:2110.01899 (replaced) [pdf, ps, other]

Title: Random matrices in service of ML footprint: ternary random features with no performance loss
Comments: Published as a conference paper at ICLR 2022
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [53] arXiv:2110.05308 (replaced) [pdf, ps, other]

Title: Clustering of Diverse Multiplex Networks
Comments: 40 pages, 19 figures
Subjects: Methodology (stat.ME)
 [54] arXiv:2110.10422 (replaced) [pdf, other]

Title: PriorVAE: Encoding spatial priors with VAEs for small-area estimation
Authors: Elizaveta Semenova, Yidan Xu, Adam Howes, Theo Rashid, Samir Bhatt, Swapnil Mishra, Seth Flaxman
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [55] arXiv:2110.12406 (replaced) [pdf]

Title: Robust Variable Selection under Cellwise Contamination
Comments: 17 pages, 4 figures
Subjects: Methodology (stat.ME); Computation (stat.CO)
 [56] arXiv:2111.01050 (replaced) [pdf, ps, other]

Title: Extended probabilities and their application to statistical inference
Subjects: Statistics Theory (math.ST); Probability (math.PR)
 [57] arXiv:2112.04626 (replaced) [pdf, other]

Title: Bayesian Semiparametric Longitudinal Inverse-Probit Mixed Models for Category Learning
Comments: arXiv admin note: text overlap with arXiv:1912.02774
Subjects: Methodology (stat.ME)
 [58] arXiv:2201.09098 (replaced) [pdf, other]

Title: Estimation of the covariance structure from SNP allele frequencies
Comments: In this new version we added the proof that the operator norm of D/2 is exactly the square root of m
Subjects: Methodology (stat.ME)
 [59] arXiv:2202.03051 (replaced) [pdf, ps, other]

Title: Using Partial Monotonicity in Submodular Maximization
Comments: 45 pages; 7 figures
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [60] arXiv:2202.04648 (replaced) [pdf, other]

Title: A survey of unsupervised learning methods for high-dimensional uncertainty quantification in black-box-type problems
Authors: Katiana Kontolati, Dimitrios Loukrezis, Dimitris G. Giovanis, Lohit Vandanapu, Michael D. Shields
Comments: 45 pages, 14 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [61] arXiv:2203.06126 (replaced) [pdf, other]

Title: Distribution-free Prediction Sets Adaptive to Unknown Covariate Shift
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Machine Learning (stat.ML)
 [62] arXiv:2203.10906 (replaced) [pdf, other]

Title: Bayesian inference in Epidemics: linear noise analysis
Comments: This version final after internal revision
Subjects: Statistics Theory (math.ST); Applications (stat.AP); Computation (stat.CO)
 [63] arXiv:2204.01686 (replaced) [pdf, other]

Title: Bayesian Semiparametric Covariate Informed Multivariate Density Deconvolution
Authors: Abhra Sarkar
Comments: arXiv admin note: text overlap with arXiv:1912.05084
Subjects: Methodology (stat.ME)
 [64] arXiv:2204.08182 (replaced) [pdf, other]

Title: Modality-Balanced Embedding for Video Retrieval
Authors: Xun Wang, Bingqing Ke, Xuanping Li, Fangyu Liu, Mingyu Zhang, Xiao Liang, Qiushi Xiao, Cheng Luo, Yue Yu
Comments: Accepted by SIGIR2022, short paper
Journal-ref: SIGIR, 2022
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (stat.ML)
 [65] arXiv:2204.12993 (replaced) [pdf, other]

Title: Counterfactual harm
Comments: Changes to definition 3. Typos corrected and document shortened. Updated Appendices A-C
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [66] arXiv:2205.02143 (replaced) [pdf]

Title: Estimating Complier Average Causal Effects for Clustered RCTs When the Treatment Affects the Service Population
Authors: Peter Z. Schochet
Subjects: Methodology (stat.ME)
 [67] arXiv:2205.03310 (replaced) [pdf, ps, other]

Title: Discussion of 'Event History and Topological Data Analysis'
Authors: Peter Bubenik
Comments: 4 pages, added citation to Garside et al
Journal-ref: Biometrika, Volume 108, Issue 4, December 2021, Pages 785-788
Subjects: Statistics Theory (math.ST); Algebraic Topology (math.AT)
 [68] arXiv:2205.05777 (replaced) [pdf, other]

Title: Efficient estimation of modified treatment policy effects based on the generalized propensity score
Subjects: Methodology (stat.ME)
 [69] arXiv:2205.07331 (replaced) [pdf, other]

Title: Sobolev Acceleration and Statistical Optimality for Learning Elliptic Equations via Gradient Descent
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Statistics Theory (math.ST); Computational Physics (physics.comp-ph); Machine Learning (stat.ML)