# Statistics

## New submissions


### New submissions for Wed, 18 May 22

[1]
Title: A Note on the Chernoff Bound for Random Variables in the Unit Interval
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

The Chernoff bound is a well-known tool for obtaining a high probability bound on the expectation of a Bernoulli random variable in terms of its sample average. This bound is commonly used in statistical learning theory to upper bound the generalisation risk of a hypothesis in terms of its empirical risk on held-out data, for the case of a binary-valued loss function. However, the extension of this bound to the case of random variables taking values in the unit interval is less well known in the community. In this note we provide a proof of this extension for convenience and future reference.
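As a rough illustration of how such a bound is used in practice, the sketch below inverts the one-sided relative-entropy form of the Chernoff bound, which is how it is commonly stated in the learning-theory literature: with probability at least $1-\delta$, the mean $p$ satisfies $\mathrm{kl}(\hat{p}\,\|\,p) \le \ln(1/\delta)/n$. The note's exact statement may differ; the function names and the bisection approach are illustrative assumptions, not taken from the note.

```python
import math

def binary_kl(q: float, p: float) -> float:
    """KL divergence kl(q || p) between Bernoulli(q) and Bernoulli(p)."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # keep the logarithms finite
    total = 0.0
    if q > 0.0:
        total += q * math.log(q / p)
    if q < 1.0:
        total += (1.0 - q) * math.log((1.0 - q) / (1.0 - p))
    return total

def chernoff_upper_bound(sample_mean: float, n: int, delta: float) -> float:
    """Largest p >= sample_mean with kl(sample_mean || p) <= ln(1/delta) / n."""
    budget = math.log(1.0 / delta) / n
    lo, hi = sample_mean, 1.0
    for _ in range(100):  # bisection: kl(q || p) is increasing in p for p >= q
        mid = 0.5 * (lo + hi)
        if binary_kl(sample_mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo

# Hypothetical numbers: empirical risk 0.10 on 1000 held-out points, delta = 0.05.
bound = chernoff_upper_bound(sample_mean=0.10, n=1000, delta=0.05)
print(round(bound, 4))
```

By Pinsker's inequality the resulting bound is never looser than the Hoeffding-style bound $\hat{p} + \sqrt{\ln(1/\delta)/(2n)}$.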

[2]
Title: Fat-Tailed Variational Inference with Anisotropic Tail Adaptive Flows
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

While fat-tailed densities commonly arise as posterior and marginal distributions in robust models and scale mixtures, they present challenges when Gaussian-based variational inference fails to capture tail decay accurately. We first improve previous theory on tails of Lipschitz flows by quantifying how the tails affect the rate of tail decay and by expanding the theory to non-Lipschitz polynomial flows. Then, we develop an alternative theory for multivariate tail parameters which is sensitive to tail-anisotropy. In doing so, we unveil a fundamental problem which plagues many existing flow-based methods: they can only model tail-isotropic distributions (i.e., distributions having the same tail parameter in every direction). To mitigate this and enable modeling of tail-anisotropic targets, we propose anisotropic tail-adaptive flows (ATAF). Experimental results on both synthetic and real-world targets confirm that ATAF is competitive with prior work while also exhibiting appropriate tail-anisotropy.

[3]
Title: Mean-Field Nonparametric Estimation of Interacting Particle Systems
Subjects: Statistics Theory (math.ST)

This paper concerns the nonparametric estimation problem of the distribution-state dependent drift vector field in an interacting $N$-particle system. Observing single-trajectory data for each particle, we derive the mean-field rate of convergence for the maximum likelihood estimator (MLE), which depends on both Gaussian complexity and Rademacher complexity of the function class. In particular, when the function class contains $\alpha$-smooth Hölder functions, our rate of convergence is minimax optimal on the order of $N^{-\frac{\alpha}{d+2\alpha}}$. Combining with a Fourier analytical deconvolution argument, we derive the consistency of MLE for the external force and interaction kernel in the McKean-Vlasov equation.

[4]
Title: binspp: An R Package for Bayesian Inference for Neyman-Scott Point Processes with Complex Inhomogeneity Structure
Subjects: Methodology (stat.ME)

The Neyman-Scott point process is a widely used point process model which is easily interpretable and easily extendable to include various types of inhomogeneity. Inference for such complex models is then complicated, and fast methods, such as the minimum contrast method or the composite likelihood approach, do not provide accurate estimates or fail completely. Therefore, we introduce a Bayesian MCMC approach for the inference of Neyman-Scott point process models with inhomogeneity in any or all of the following model components: process of cluster centers, mean number of points in a cluster, spread of the clusters. We also extend the Neyman-Scott point process to the case of overdispersed or underdispersed cluster sizes and provide a Bayesian MCMC algorithm for its inference. The R package binspp provides these estimation methods in an easy-to-handle implementation, with detailed graphical output including traceplots for all model parameters and further diagnostic plots. All inhomogeneities are modelled by spatial covariates and the Bayesian inference for the corresponding regression parameters is provided.

[5]
Title: An Exponentially Increasing Step-size for Parameter Estimation in Statistical Models
Comments: 26 pages. The authors are listed in alphabetical order
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC); Statistics Theory (math.ST)

Using gradient descent (GD) with fixed or decaying step-size is standard practice in unconstrained optimization problems. However, when the loss function is only locally convex, such a step-size schedule artificially slows GD down as it cannot explore the flat curvature of the loss function. To overcome that issue, we propose to exponentially increase the step-size of the GD algorithm. Under homogeneous assumptions on the loss function, we demonstrate that the iterates of the proposed \emph{exponential step size gradient descent} (EGD) algorithm converge linearly to the optimal solution. Leveraging that optimization insight, we then consider using the EGD algorithm for solving parameter estimation under non-regular statistical models whose loss function becomes locally convex when the sample size goes to infinity. We demonstrate that the EGD iterates reach the final statistical radius around the true parameter after a logarithmic number of iterations, which is in stark contrast to a \emph{polynomial} number of iterations of the GD algorithm. Therefore, the total computational complexity of the EGD algorithm is \emph{optimal} and exponentially cheaper than that of the GD for solving parameter estimation in non-regular statistical models. To the best of our knowledge, it resolves a long-standing gap between statistical and algorithmic computational complexities of parameter estimation in non-regular statistical models. Finally, we provide targeted applications of the general theory to several classes of statistical models, including generalized linear models with polynomial link functions and location Gaussian mixture models.
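The contrast between fixed and exponentially increasing step sizes can be seen on a toy problem. The sketch below is not the authors' experiment: it assumes the homogeneous loss $f(\theta) = \theta^4/4$, which is flat around its minimizer, and a hypothetical schedule $\eta_t = \eta_0 \beta^t$ with $\eta_0 = 0.1$ and $\beta = 1.5$ chosen only for illustration.

```python
def grad(theta: float) -> float:
    """Gradient of the toy homogeneous loss f(theta) = theta**4 / 4."""
    return theta ** 3

def run_gd(theta0: float, eta0: float, beta: float, steps: int) -> float:
    """Gradient descent with step size eta0 * beta**t (beta = 1 gives fixed-step GD)."""
    theta = theta0
    for t in range(steps):
        theta -= eta0 * (beta ** t) * grad(theta)
    return theta

fixed = run_gd(theta0=1.0, eta0=0.1, beta=1.0, steps=50)  # fixed step size
egd = run_gd(theta0=1.0, eta0=0.1, beta=1.5, steps=50)    # exponentially increasing step size
print(f"fixed step: {fixed:.3e}, exponential step: {egd:.3e}")
```

On this toy loss the fixed-step iterate shrinks only polynomially in the number of iterations, while the exponentially increasing schedule drives it towards zero at a geometric rate. The growth factor matters: for this particular loss, a factor that is too large makes the iteration unstable.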

[6]
Title: Causal influence, causal effects, and path analysis in the presence of intermediate confounding
Authors: Iván Díaz
Subjects: Methodology (stat.ME)

Recent approaches to causal inference have focused on the identification and estimation of \textit{causal effects}, defined as (properties of) the distribution of counterfactual outcomes under hypothetical actions that alter the nodes of a graphical model. In this article we explore an alternative approach using the concept of \textit{causal influence}, defined through operations that alter the information propagated through the edges of a directed acyclic graph. Causal influence may be more useful than causal effects in settings in which interventions on the causal agents are infeasible or of no substantive interest, for example when considering gender, race, or genetics as a causal agent. Furthermore, the "information transfer" interventions proposed allow us to solve a long-standing problem in causal mediation analysis, namely the non-parametric identification of path-specific effects in the presence of treatment-induced mediator-outcome confounding. We propose efficient non-parametric estimators for a covariance version of the proposed causal influence measures, using data-adaptive regression coupled with semi-parametric efficiency theory to address model misspecification bias while retaining $\sqrt{n}$-consistency and asymptotic normality. We illustrate the use of our methods in two examples using publicly available data.

[7]
Title: The e-value and the Full Bayesian Significance Test: Logical Properties and Philosophical Consequences
Subjects: Statistics Theory (math.ST)

This article gives a conceptual review of the e-value, ev(H|X) -- the epistemic value of hypothesis H given observations X. This statistical significance measure was developed in order to allow logically coherent and consistent tests of hypotheses, including sharp or precise hypotheses, via the Full Bayesian Significance Test (FBST). Arguments of analysis allow a full characterization of this statistical test by its logical or compositional properties, showing a mutual complementarity between results of mathematical statistics and the logical desiderata lying at the foundations of this theory.

[8]
Title: Interpretable sensitivity analysis for the Baron-Kenny approach to mediation with unmeasured confounding
Subjects: Methodology (stat.ME)

Mediation analysis assesses the extent to which the treatment affects the outcome indirectly through a mediator and the extent to which it operates directly through other pathways. As the most popular method in empirical mediation analysis, the Baron-Kenny approach estimates the indirect and direct effects of the treatment on the outcome based on linear structural equation models. However, when the treatment and the mediator are not randomized, the estimates may be biased due to unmeasured confounding among the treatment, mediator, and outcome. Building on Cinelli and Hazlett (2020), we propose a sharp and interpretable sensitivity analysis method for the Baron-Kenny approach to mediation in the presence of unmeasured confounding. We first generalize their sensitivity analysis method for linear regression to allow for heteroskedasticity and model misspecification. We then apply the general result to develop a sensitivity analysis method for the Baron-Kenny approach. To facilitate the interpretation, we express the sensitivity parameters in terms of the partial $R^2$'s that correspond to the natural factorization of the joint distribution of the directed acyclic graph for mediation analysis. They measure the proportions of variability explained by unmeasured confounding given the observed variables. Moreover, we extend the method to deal with multiple mediators, based on a novel matrix version of the partial $R^2$ and a general form of the omitted-variable bias formula. Importantly, we prove that all our sensitivity bounds are attainable and thus sharp.
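For readers unfamiliar with the baseline being analysed, the following is a minimal sketch of the standard Baron-Kenny product-of-coefficients estimate on simulated data with no unmeasured confounding. It does not implement the sensitivity analysis proposed in the paper; the path coefficients, sample size, and helper `ols` are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
a_true, b_true, c_true = 0.5, 0.8, 0.3  # hypothetical path coefficients

t = rng.normal(size=n)                             # treatment T
m = a_true * t + rng.normal(size=n)                # mediator model: M ~ T
y = c_true * t + b_true * m + rng.normal(size=n)   # outcome model: Y ~ T + M

def ols(X, target):
    """Least-squares coefficients with an intercept in position 0."""
    X1 = np.column_stack([np.ones(len(target)), X])
    coef, *_ = np.linalg.lstsq(X1, target, rcond=None)
    return coef

a_hat = ols(t, m)[1]                                      # effect of T on M
_, direct_hat, b_hat = ols(np.column_stack([t, m]), y)    # direct effect, effect of M on Y
indirect_hat = a_hat * b_hat                              # product-of-coefficients estimate

print(f"indirect: {indirect_hat:.3f}, direct: {direct_hat:.3f}")
```

With these simulated values the true indirect effect is $0.5 \times 0.8 = 0.4$ and the true direct effect is $0.3$; unmeasured confounding, the subject of the paper, would bias both estimates.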

[9]
Title: On Semiparametric Efficiency of an Emerging Class of Regression Models for Between-subject Attributes
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)

Semiparametric regression models have attracted increasing attention owing to their robustness compared to their parametric counterparts. This paper discusses the efficiency bound for functional response models (FRM), an emerging class of semiparametric regression that serves as a timely solution for research questions involving pairwise observations. This new paradigm is especially appealing for reducing the astronomical data dimensions arising from wearable devices and high-throughput technology, such as microbiome Beta-diversity, viral genetic linkage, and single-cell RNA sequencing. Despite the growing applications, the efficiency of their estimators has not been investigated carefully due to the extreme difficulty of addressing the inherent correlations among pairs. Leveraging the Hilbert-space-based semiparametric efficiency theory for classical within-subject attributes, this manuscript extends such asymptotic efficiency into the broader regression involving between-subject attributes and pinpoints the most efficient estimator, which leads to a sensitive signal-detection in practice. With pairwise outcomes burgeoning immensely as effective dimension-reduction summaries, the established theory will not only fill the critical gap in identifying the most efficient semiparametric estimator but also propel wide-ranging implementations of this new paradigm for between-subject attributes.

[10]
Title: Perfect Spectral Clustering with Discrete Covariates
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Social and Information Networks (cs.SI); Statistics Theory (math.ST)

Among community detection methods, spectral clustering enjoys two desirable properties: computational efficiency and theoretical guarantees of consistency. Most studies of spectral clustering consider only the edges of a network as input to the algorithm. Here we consider the problem of performing community detection in the presence of discrete node covariates, where network structure is determined by a combination of a latent block model structure and homophily on the observed covariates. We propose a spectral algorithm that we prove achieves perfect clustering with high probability on a class of large, sparse networks with discrete covariates, effectively separating latent network structure from homophily on observed covariates. To our knowledge, our method is the first to offer a guarantee of consistent latent structure recovery using spectral clustering in the setting where edge formation is dependent on both latent and observed factors.
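As background, the sketch below shows plain spectral clustering on a simulated two-block network that uses only the edges, which is the setting most prior studies consider. It is not the covariate-aware algorithm proposed in the paper; the block sizes and edge probabilities are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
labels = np.repeat([0, 1], n // 2)          # latent block memberships

# Two-block stochastic block model with hypothetical edge probabilities.
same = labels[:, None] == labels[None, :]
probs = np.where(same, 0.30, 0.05)
upper = np.triu(rng.random((n, n)) < probs, k=1)   # sample each pair once
adj = (upper | upper.T).astype(float)              # symmetric, no self-loops

# The eigenvector of the second-largest eigenvalue separates the two blocks.
eigvals, eigvecs = np.linalg.eigh(adj)             # eigenvalues in ascending order
second = eigvecs[:, -2]
pred = (second > 0).astype(int)

agreement = np.mean(pred == labels)
accuracy = max(agreement, 1.0 - agreement)         # cluster labels are arbitrary
print(f"accuracy: {accuracy:.2f}")
```

When edge formation also depends on observed covariates, this edge-only procedure can confuse homophily with latent structure, which is the problem the paper addresses.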

[11]
Title: An Inverse Probability Weighted Regression Method that Accounts for Right-censoring for Causal Inference with Multiple Treatments and a Binary Outcome
Subjects: Methodology (stat.ME)

Comparative effectiveness research often involves evaluating the differences in the risks of an event of interest between two or more treatments using observational data. Often, the post-treatment outcome of interest is whether the event happens within a pre-specified time window, which leads to a binary outcome. One source of bias for estimating the causal treatment effect is the presence of confounders, which are usually controlled using propensity score-based methods. An additional source of bias is right-censoring, which occurs when the information on the outcome of interest is not completely available due to dropout, study termination, or treatment switch before the event of interest. We propose an inverse probability weighted regression-based estimator that can simultaneously handle both confounding and right-censoring, calling the method CIPWR, with the letter C highlighting the censoring component. CIPWR estimates the average treatment effects by averaging the predicted outcomes obtained from a logistic regression model that is fitted using a weighted score function. The CIPWR estimator has a double robustness property such that estimation consistency can be achieved when either the model for the outcome or the models for both treatment and censoring are correctly specified. We establish the asymptotic properties of the CIPWR estimator for conducting inference, and compare its finite sample performance with that of several alternatives through simulation studies. The methods under comparison are applied to a cohort of prostate cancer patients from an insurance claims database for comparing the adverse effects of four candidate drugs for advanced stage prostate cancer.

[12]
Title: Latent Variable Method Demonstrator -- Software for Understanding Multivariate Data Analytics Algorithms
Comments: 18 pages, 14 figures, code available: this https URL, preprint submitted to Computers & Chemical Engineering
Subjects: Machine Learning (stat.ML); Computers and Society (cs.CY); Machine Learning (cs.LG)

The ever-increasing quantity of multivariate process data is driving a need for skilled engineers to analyze, interpret, and build models from such data. Multivariate data analytics relies heavily on linear algebra, optimization, and statistics and can be challenging for students to understand given that most curricula do not have strong coverage in the latter three topics. This article describes interactive software -- the Latent Variable Demonstrator (LAVADE) -- for teaching, learning, and understanding latent variable methods. In this software, users can interactively compare latent variable methods such as Partial Least Squares (PLS), and Principal Component Regression (PCR) with other regression methods such as Least Absolute Shrinkage and Selection Operator (lasso), Ridge Regression (RR), and Elastic Net (EN). LAVADE helps to build intuition on choosing appropriate methods, hyperparameter tuning, and model coefficient interpretation, fostering a conceptual understanding of the algorithms' differences. The software contains a data generation method and three chemical process datasets, allowing for comparing results of datasets with different levels of complexity. LAVADE is released as open-source software so that others can apply and advance the tool for use in teaching or research.

[13]
Title: BayesMix: Bayesian Mixture Models in C++
Subjects: Computation (stat.CO); Other Statistics (stat.OT)

We describe BayesMix, a C++ library for MCMC posterior simulation for general Bayesian mixture models. The goal of BayesMix is to provide a self-contained ecosystem to perform inference for mixture models to computer scientists, statisticians and practitioners. The key idea of this library is extensibility, as we wish the users to easily adapt our software to their specific Bayesian mixture models. In addition to the several models and MCMC algorithms for posterior inference included in the library, new users with little familiarity with mixture models and the related MCMC algorithms can extend our library with minimal coding effort. Our library is computationally very efficient when compared to competitor software. Examples show that the typical code runtimes are from two to 25 times faster than competitors for data dimension from one to ten. Our library is publicly available on GitHub at https://github.com/bayesmix-dev/bayesmix/.

[14]
Title: Deep neural networks with dependent weights: Gaussian Process mixture limit, heavy tails, sparsity and compressibility
Comments: 89 pages, 11 figures, 7 tables
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Probability (math.PR); Statistics Theory (math.ST)

This article studies the infinite-width limit of deep feedforward neural networks whose weights are dependent, and modelled via a mixture of Gaussian distributions. Each hidden node of the network is assigned a nonnegative random variable that controls the variance of the outgoing weights of that node. We make minimal assumptions on these per-node random variables: they are iid and their sum, in each layer, converges to some finite random variable in the infinite-width limit. Under this model, we show that each layer of the infinite-width neural network can be characterised by two simple quantities: a non-negative scalar parameter and a Lévy measure on the positive reals. If the scalar parameters are strictly positive and the Lévy measures are trivial at all hidden layers, then one recovers the classical Gaussian process (GP) limit, obtained with iid Gaussian weights. More interestingly, if the Lévy measure of at least one layer is non-trivial, we obtain a mixture of Gaussian processes (MoGP) in the large-width limit. The behaviour of the neural network in this regime is very different from the GP regime. One obtains correlated outputs, with non-Gaussian distributions, possibly with heavy tails. Additionally, we show that, in this regime, the weights are compressible, and feature learning is possible. Many sparsity-promoting neural network models can be recast as special cases of our approach, and we discuss their infinite-width limits; we also present an asymptotic analysis of the pruning error. We illustrate some of the benefits of the MoGP regime over the GP regime in terms of representation learning and compressibility on simulated, MNIST and Fashion MNIST datasets.

[15]
Title: Bayesian Inference for Non-Parametric Extreme Value Theory
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)

Statistical inference for extreme values of random events is difficult in practice due to low sample sizes and inaccurate models for the studied rare events. If prior knowledge for extreme values is available, Bayesian statistics can be applied to reduce the sample complexity, but this requires a known probability distribution. By working with the quantiles for extremely low probabilities (on the order of $10^{-2}$ or lower) and relying on their asymptotic normality, inference can be carried out without assuming any distributions. Despite relying on asymptotic results, it is shown that a Bayesian framework that incorporates prior information can reduce the number of observations required to estimate a particular quantile to some level of accuracy.

[16]
Title: Calculating LRs for presence of body fluids from mRNA assay data in mixtures
Comments: 28 pages. This is a pre-publication version
Journal-ref: Forensic Science International: Genetics, Volume 52, 2021, 102455. https://www.sciencedirect.com/science/article/pii/S1872497320302271
Subjects: Applications (stat.AP)

Messenger RNA (mRNA) profiling can identify body fluids present in a stain, yielding information on what activities could have taken place at a crime scene. To account for uncertainty in such identifications, recent work has focused on devising statistical models to allow for probabilistic statements on the presence of body fluids. A major hurdle for practical adoption is that evidentiary stains are likely to contain more than one body fluid and current models are ill-suited to analyse such mixtures. Here, we construct a likelihood ratio (LR) system that can handle mixtures, considering the hypotheses H1: the sample contains at least one of the body fluids of interest (and possibly other body fluids); H2: the sample contains none of the body fluids of interest (but possibly other body fluids). Thus, the LR-system outputs an LR-value for any combination of mRNA profile and set of body fluids of interest that are given as input. The calculation is based on an augmented dataset obtained by in silico mixing of real single body fluid mRNA profiles. These digital mixtures are used to construct a probabilistic classification method (a 'multi-label classifier'). The probabilities produced are subsequently used to calculate an LR, via calibration. We test a range of different classification methods from the field of machine learning, ways to preprocess the data and multi-label strategies for their performance on in silico mixed test data. Furthermore, we study their robustness to different assumptions on background levels of the body fluids. We find logistic regression works as well as more flexible classifiers, but shows higher robustness and better explainability. We test the system's performance on lab-generated mixture samples, and discuss practical usage in case work.
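The step from a classifier probability to an LR can be illustrated with Bayes' rule in odds form: the LR is the posterior odds divided by the prior odds. The sketch below assumes the classifier output is a well-calibrated posterior probability and that the prior implied by the training data is known; the paper's actual calibration procedure may differ, and the function name is hypothetical.

```python
def likelihood_ratio(posterior_prob: float, prior_prob: float) -> float:
    """LR for H1 versus H2 from a calibrated posterior and the prior used in training.

    Assumes both probabilities lie strictly between 0 and 1.
    """
    posterior_odds = posterior_prob / (1.0 - posterior_prob)
    prior_odds = prior_prob / (1.0 - prior_prob)
    return posterior_odds / prior_odds

# Hypothetical example: the classifier reports 0.9 for H1 under a 50/50 training prior.
lr = likelihood_ratio(posterior_prob=0.9, prior_prob=0.5)
print(round(lr, 2))
```

An LR above 1 supports H1 (at least one body fluid of interest is present), and an LR below 1 supports H2.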

[17]
Title: Semi-Parametric Contextual Bandits with Graph-Laplacian Regularization
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Non-stationarity is ubiquitous in human behavior and addressing it in contextual bandits is challenging. Several works have addressed the problem by investigating semi-parametric contextual bandits and warned that ignoring non-stationarity could harm performance. Another prevalent human behavior is social interaction, which has become available in the form of a social network or graph structure. As a result, graph-based contextual bandits have received much attention. In this paper, we propose "SemiGraphTS," a novel contextual Thompson-sampling algorithm for a graph-based semi-parametric reward model. Our algorithm is the first to be proposed in this setting. We derive an upper bound of the cumulative regret that can be expressed as a multiple of a factor depending on the graph structure and the order for the semi-parametric model without a graph. We evaluate the proposed and existing algorithms via simulation and a real data example.

[18]
Title: A unified framework for dataset shift diagnostics
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Methodology (stat.ME)

Most machine learning (ML) methods assume that the data used in the training phase comes from the distribution of the target population. However, in practice one often faces dataset shift, which, if not properly taken into account, may decrease the predictive performance of the ML models. In general, if the practitioner knows which type of shift is taking place - e.g., covariate shift or label shift - they may apply transfer learning methods to obtain better predictions. Unfortunately, current methods for detecting shift are only designed to detect specific types of shift or cannot formally test their presence. We introduce a general framework that gives insights on how to improve prediction methods by detecting the presence of different types of shift and quantifying how strong they are. Our approach can be used for any data type (tabular/image/text) and both for classification and regression tasks. Moreover, it uses formal hypothesis tests that control false alarms. We illustrate how our framework is useful in practice using both artificial and real datasets. Our package for dataset shift detection can be found at https://github.com/felipemaiapolo/detectshift.

[19]
Title: Topological Signal Processing using the Weighted Ordinal Partition Network
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Signal Processing (eess.SP)

One of the most important problems arising in time series analysis is that of bifurcation, or change point detection. That is, given a collection of time series over a varying parameter, when has the structure of the underlying dynamical system changed? For this task, we turn to the field of topological data analysis (TDA), which encodes information about the shape and structure of data. The idea of utilizing tools from TDA for signal processing tasks, known as topological signal processing (TSP), has gained much attention in recent years, largely through a standard pipeline that computes the persistent homology of the point cloud generated by the Takens' embedding. However, this procedure is limited by computation time since the simplicial complex generated in this case is large, but also has a great deal of redundant data. For this reason, we turn to a more recent method for encoding the structure of the attractor, which constructs an ordinal partition network (OPN) representing information about when the dynamical system has passed between certain regions of state space. The result is a weighted graph whose structure encodes information about the underlying attractor. Our previous work began to find ways to package the information of the OPN in a manner that is amenable to TDA; however, that work only used the network structure and did nothing to encode the additional weighting information. In this paper, we take the next step: building a pipeline to analyze the weighted OPN with TDA and showing that this framework provides more resilience to noise or perturbations in the system and improves the accuracy of the dynamic state detection.
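To make the construction concrete, the sketch below builds a weighted ordinal partition network from a scalar time series: each delay-embedded window is mapped to the permutation that sorts it, and edge weights count transitions between consecutive permutations. The function name, the default embedding dimension and delay, and the decision to keep self-transitions are assumptions for illustration; the paper's pipeline and its TDA step are not reproduced here.

```python
from collections import Counter

import numpy as np

def ordinal_partition_network(x, dim=3, delay=1):
    """Return a Counter mapping (permutation, next_permutation) to transition counts."""
    x = np.asarray(x, dtype=float)
    n_windows = len(x) - (dim - 1) * delay
    symbols = []
    for s in range(n_windows):
        window = x[s : s + (dim - 1) * delay + 1 : delay]   # dim delayed samples
        order = np.argsort(window, kind="stable")           # ordinal pattern of the window
        symbols.append(tuple(int(i) for i in order))
    # Nodes are the observed permutations; edge weights count consecutive transitions.
    return Counter(zip(symbols[:-1], symbols[1:]))

# Small periodic example with dimension 2: rising windows map to (0, 1), falling ones to (1, 0).
edges = ordinal_partition_network([0, 1, 2, 0, 1, 2, 0], dim=2, delay=1)
for (src, dst), weight in sorted(edges.items()):
    print(src, "->", dst, "weight", weight)
```

The resulting weighted edge list can be handed to a graph library for the distance computations that a persistent-homology step would need.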

[20]
Title: A case study of glucose levels during sleep using fast function on scalar regression inference
Subjects: Applications (stat.AP); Computation (stat.CO)

Continuous glucose monitors (CGMs) are increasingly used to measure blood glucose levels and provide information about the treatment and management of diabetes. Our motivating study contains CGM data during sleep for 174 study participants with type II diabetes mellitus measured at a 5-minute frequency for an average of 10 nights. We aim to quantify the effects of diabetes medications and sleep apnea severity on glucose levels. Statistically, this is an inference question about the association between scalar covariates and functional responses. However, many characteristics of the data make analyses difficult, including (1) non-stationary within-day patterns; (2) substantial between-day heterogeneity, non-Gaussianity, and outliers; (3) large dimensionality due to the number of study participants, sleep periods, and time points. We evaluate and compare two methods: fast univariate inference (FUI) and functional additive mixed models (FAMM). We introduce a new approach for calculating p-values for testing a global null effect of covariates using FUI, and provide practical guidelines for speeding up FAMM computations, making it feasible for our data. While FUI and FAMM are philosophically different, they lead to similar point estimators in our study. In contrast to FAMM, FUI is fast, accounts for within-day correlations, and enables the construction of joint confidence intervals. Our analyses reveal that: (1) biguanide medication and sleep apnea severity significantly affect glucose trajectories during sleep, and (2) the estimated effects are time-invariant.

[21]
Title: Covariance Estimation: Optimal Dimension-free Guarantees for Adversarial Corruption and Heavy Tails
Subjects: Statistics Theory (math.ST); Data Structures and Algorithms (cs.DS); Probability (math.PR)

We provide an estimator of the covariance matrix that achieves the optimal rate of convergence (up to constant factors) in the operator norm under two standard notions of data contamination: We allow the adversary to corrupt an $\eta$-fraction of the sample arbitrarily, while the distribution of the remaining data points only satisfies that the $L_{p}$-marginal moment with some $p \ge 4$ is equivalent to the corresponding $L_2$-marginal moment. Despite requiring the existence of only a few moments, our estimator achieves the same tail estimates as if the underlying distribution were Gaussian. As a part of our analysis, we prove a dimension-free Bai-Yin type theorem in the regime $p > 4$.

[22]
Title: High-dimensional additive Gaussian processes under monotonicity constraints
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

We introduce an additive Gaussian process framework accounting for monotonicity constraints and scalable to high dimensions. Our contributions are threefold. First, we show that our framework enables the constraints to be satisfied everywhere in the input space. We also show that more general componentwise linear inequality constraints can be handled similarly, such as componentwise convexity. Second, we propose the additive MaxMod algorithm for sequential dimension reduction. By sequentially maximizing a squared-norm criterion, MaxMod identifies the active input dimensions and refines the most important ones. This criterion can be computed explicitly at a linear cost. Finally, we provide open-source codes for our full framework. We demonstrate the performance and scalability of the methodology in several synthetic examples with hundreds of dimensions under monotonicity constraints as well as on a real-world flood application.

[23]
Title: High-resolution landscape-scale biomass mapping using a spatiotemporal patchwork of LiDAR coverages
Authors: Lucas K. Johnson (1), Michael J. Mahoney (1), Eddie Bevilacqua (1), Stephen V. Stehman (1), Grant Domke (2), Colin M. Beier (1) ((1) State University of New York College of Environmental Science and Forestry, (2) USDA Forest Service)
Comments: Manuscript: 19 pages, 7 figures; Supplements: 14 pages, 5 figures; Submitted to: Environmental Research Letters, Carbon Monitoring Systems Research and Applications focus collection
Subjects: Applications (stat.AP)

Estimating forest aboveground biomass at fine spatial scales has become increasingly important for greenhouse gas estimation, monitoring, and verification efforts to mitigate climate change. Airborne LiDAR continues to be a valuable source of remote sensing data for estimating aboveground biomass. However, airborne LiDAR collections may take place at local or regional scales covering irregular, non-contiguous footprints, resulting in a 'patchwork' of different landscape segments at different points in time. Here we addressed common obstacles including selection of training data, the investigation of regional or coverage-specific patterns in bias and error, map agreement, and model-based precision assessments at multiple scales.
Three machine learning algorithms and an ensemble model were trained using field inventory data (FIA), airborne LiDAR, and topographic, climatic and cadastral geodata. Using strict selection criteria, 801 FIA plots were selected with co-located point clouds drawn from a patchwork of 17 leaf-off LiDAR coverages (2014-2019). Our ensemble model created 30 m AGB prediction surfaces within a predictor-defined area of applicability (98% of LiDAR coverage) and resulting AGB predictions were compared with FIA plot-level and areal estimates at multiple scales of aggregation. Our model was overall accurate (% RMSE 13-33%), had very low bias (MBE $\leq$ $\pm$5 Mg ha$^{-1}$), explained most field-observed variation (R$^2$ 0.74-0.93), and produced estimates that were both largely consistent with FIA's aggregate summaries (86% of estimates within 95% CI) and precise when aggregated to arbitrary small areas (mean bootstrap standard error 0.37 Mg ha$^{-1}$). We share practical solutions to challenges faced when using spatiotemporal patchworks of LiDAR to meet growing needs for biomass prediction and mapping, and applications in carbon accounting and ecosystem stewardship.

### Cross-lists for Wed, 18 May 22

[24]  arXiv:2205.07932 (cross-list from cs.LG) [pdf, other]
Title: Distributed Feature Selection for High-dimensional Additive Models
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Applications (stat.AP); Computation (stat.CO); Machine Learning (stat.ML)

Distributed statistical learning is a common strategy for handling massive data, in which the learning task is divided among multiple local machines and the results are aggregated afterward. However, most existing work considers the case where the samples are divided. In this work, we propose a new algorithm, DDAC-SpAM, that divides features under the high-dimensional sparse additive model. The new algorithm contains three steps: divide, decorrelate, and conquer. We show that after the decorrelation operation, every local estimator can recover the sparsity pattern for each additive component consistently without imposing strict constraints on the correlation structure among variables. Theoretical analysis of the aggregated estimator and empirical results on synthetic and real data illustrate that the DDAC-SpAM algorithm is effective and competitive in fitting sparse additive models.

[25]  arXiv:2205.08017 (cross-list from cs.LG) [pdf, other]
Title: $\mathscr{H}$-Consistency Estimation Error of Surrogate Loss Minimizers
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We present a detailed study of estimation errors in terms of surrogate loss estimation errors. We refer to such guarantees as $\mathscr{H}$-consistency estimation error bounds, since they account for the hypothesis set $\mathscr{H}$ adopted. These guarantees are significantly stronger than $\mathscr{H}$-calibration or $\mathscr{H}$-consistency. They are also more informative than similar excess error bounds derived in the literature, when $\mathscr{H}$ is the family of all measurable functions. We prove general theorems providing such guarantees, for both the distribution-dependent and distribution-independent settings. We show that our bounds are tight, modulo a convexity assumption. We also show that previous excess error bounds can be recovered as special cases of our general results.
We then present a series of explicit bounds in the case of the zero-one loss, with multiple choices of the surrogate loss and for both the family of linear functions and neural networks with one hidden layer. We further prove more favorable distribution-dependent guarantees in that case. We also present a series of explicit bounds in the case of the adversarial loss, with surrogate losses based on the supremum of the $\rho$-margin, hinge or sigmoid loss and for the same two general hypothesis sets. Here too, we prove several enhancements of these guarantees under natural distributional assumptions. Finally, we report the results of simulations illustrating our bounds and their tightness.

[26]  arXiv:2205.08033 (cross-list from cs.SI) [pdf]
Title: Using Embeddings for Causal Estimation of Peer Influence in Social Networks
Comments: 17 pages, 1 figure, 4 tables
Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG); Machine Learning (stat.ML)

We address the problem of using observational data to estimate peer contagion effects, the influence of treatments applied to individuals in a network on the outcomes of their neighbors. A main challenge to such estimation is that homophily - the tendency of connected units to share similar latent traits - acts as an unobserved confounder for contagion effects. Informally, it's hard to tell whether your friends have similar outcomes because they were influenced by your treatment, or whether it's due to some common trait that caused you to be friends in the first place. Because these common causes are not usually directly observed, they cannot be simply adjusted for. We describe an approach to perform the required adjustment using node embeddings learned from the network itself. The main aim is to perform this adjustment nonparametrically, without functional form assumptions on either the process that generated the network or the treatment assignment and outcome processes. The key contributions are to nonparametrically formalize the causal effect in a way that accounts for homophily, and to show how embedding methods can be used to identify and estimate this effect. Code is available at https://github.com/IrinaCristali/Peer-Contagion-on-Networks.

[27]  arXiv:2205.08098 (cross-list from cs.LG) [pdf, other]
Title: Can We Do Better Than Random Start? The Power of Data Outsourcing
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Many organizations have access to abundant data but lack the computational power to process the data. While they can outsource the computational task to other facilities, there are various constraints on the amount of data that can be shared. It is natural to ask what data outsourcing can accomplish under such constraints. We address this question from a machine learning perspective. When training a model with optimization algorithms, the quality of the results often relies heavily on the points where the algorithms are initialized. Random start is one of the most popular methods to tackle this issue, but it can be computationally expensive and not feasible for organizations lacking computing resources. Based on three different scenarios, we propose simulation-based algorithms that can utilize a small amount of outsourced data to find good initial points accordingly. Under suitable regularity conditions, we provide theoretical guarantees showing the algorithms can find good initial points with high probability. We also conduct numerical experiments to demonstrate that our algorithms perform significantly better than the random start approach.

[28]  arXiv:2205.08099 (cross-list from cs.LG) [pdf, other]
Title: Dimensionality Reduced Training by Pruning and Freezing Parts of a Deep Neural Network, a Survey
Comments: Survey for pruning and freezing methods applied before training starts
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

State-of-the-art deep learning models have a parameter count that reaches into the billions. Training, storing and transferring such models is energy- and time-consuming, and thus costly. A large part of these costs is caused by training the network. Model compression lowers storage and transfer costs, and can further make training more efficient by decreasing the number of computations in the forward and/or backward pass. Thus, compressing networks at training time as well, while maintaining high performance, is an important research topic. This work is a survey of methods that reduce the number of trained weights in deep learning models throughout training. Most of the introduced methods set network parameters to zero, which is called pruning. The presented pruning approaches are categorized into pruning at initialization, lottery tickets and dynamic sparse training. Moreover, we discuss methods that freeze parts of a network at its random initialization. Freezing weights reduces the number of trainable parameters, which in turn reduces gradient computations and the dimensionality of the model's optimization space. In this survey we first propose dimensionality reduced training as an underlying mathematical model that covers pruning and freezing during training. Afterwards, we present and discuss different dimensionality reduced training methods.

[29]  arXiv:2205.08178 (cross-list from cs.LG) [pdf, other]
Title: Active learning of causal probability trees
Authors: Tue Herlau
Subjects: Machine Learning (cs.LG); Methodology (stat.ME)

The past two decades have seen a growing interest in combining causal information, commonly represented using causal graphs, with machine learning models. Probability trees provide a simple yet powerful alternative representation of causal information. They enable both computation of intervention and counterfactuals, and are strictly more general, since they allow context-dependent causal dependencies. Here we present a Bayesian method for learning probability trees from a combination of interventional and observational data. The method quantifies the expected information gain from an intervention, and selects the interventions with the largest gain. We demonstrate the efficiency of the method on simulated and real data. An effective method for learning probability trees on a limited interventional budget will greatly expand their applicability.

[30]  arXiv:2205.08199 (cross-list from cs.IT) [pdf, ps, other]
Title: Sharp asymptotics on the compression of two-layer neural networks
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Machine Learning (stat.ML)

In this paper, we study the compression of a target two-layer neural network with N nodes into a compressed network with M < N nodes. More precisely, we consider the setting in which the weights of the target network are i.i.d. sub-Gaussian, and we minimize the population L2 loss between the outputs of the target and of the compressed network, under the assumption of Gaussian inputs. By using tools from high-dimensional probability, we show that this non-convex problem can be simplified when the target network is sufficiently over-parameterized, and provide the error rate of this approximation as a function of the input dimension and N . For a ReLU activation function, we conjecture that the optimum of the simplified optimization problem is achieved by taking weights on the Equiangular Tight Frame (ETF), while the scaling of the weights and the orientation of the ETF depend on the parameters of the target network. Numerical evidence is provided to support this conjecture.

[31]  arXiv:2205.08234 (cross-list from cs.LG) [pdf, other]
Title: Delaytron: Efficient Learning of Multiclass Classifiers with Delayed Bandit Feedbacks
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

In this paper, we present an online algorithm called Delaytron for learning multiclass classifiers using delayed bandit feedbacks. The sequence of feedback delays $\{d_t\}_{t=1}^T$ is unknown to the algorithm. At the $t$-th round, the algorithm observes an example $\mathbf{x}_t$, predicts a label $\tilde{y}_t$, and receives the bandit feedback $\mathbb{I}[\tilde{y}_t=y_t]$ only $d_t$ rounds later. When $t+d_t>T$, we consider the feedback for the $t$-th round to be missing. We show that the proposed algorithm achieves regret of $\mathcal{O}\left(\sqrt{\frac{2 K}{\gamma}\left[\frac{T}{2}+\left(2+\frac{L^2}{R^2\Vert \mathbf{W}\Vert_F^2}\right)\sum_{t=1}^Td_t\right]}\right)$ when the loss for each missing sample is upper bounded by $L$. In the case when the loss for missing samples is not upper bounded, the regret achieved by Delaytron is $\mathcal{O}\left(\sqrt{\frac{2 K}{\gamma}\left[\frac{T}{2}+2\sum_{t=1}^Td_t+\vert \mathcal{M}\vert T\right]}\right)$, where $\mathcal{M}$ is the set of missing samples in $T$ rounds. These bounds are achieved with a constant step size, which requires knowledge of $T$ and $\sum_{t=1}^Td_t$. For the case when $T$ and $\sum_{t=1}^Td_t$ are unknown, we use a doubling trick for online learning and propose Adaptive Delaytron. We show that Adaptive Delaytron achieves a regret bound of $\mathcal{O}\left(\sqrt{T+\sum_{t=1}^Td_t}\right)$. We show the effectiveness of our approach by experimenting on various datasets and comparing with state-of-the-art approaches.

[32]  arXiv:2205.08364 (cross-list from cs.LG) [pdf, other]
Title: Network Gradient Descent Algorithm for Decentralized Federated Learning
Subjects: Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)

We study a fully decentralized federated learning algorithm, which is a novel gradient descent algorithm executed on a communication-based network. For convenience, we refer to it as a network gradient descent (NGD) method. In the NGD method, only statistics (e.g., parameter estimates) need to be communicated, minimizing the risk of privacy leakage. Meanwhile, different clients communicate with each other directly according to a carefully designed network structure without a central master. This greatly enhances the reliability of the entire algorithm. These properties motivate us to study the NGD method carefully, both theoretically and numerically. Theoretically, we start with a classical linear regression model. We find that both the learning rate and the network structure play significant roles in determining the NGD estimator's statistical efficiency. The resulting NGD estimator can be statistically as efficient as the global estimator if the learning rate is sufficiently small and the network structure is well balanced, even if the data are distributed heterogeneously. These findings are then extended to general models and loss functions. Extensive numerical studies are presented to corroborate our theoretical findings. Classical deep learning models are also presented for illustration purposes.
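
To make the idea of gradient descent on a network without a central master concrete, here is a minimal sketch for a one-parameter linear regression. The update rule is an assumption made for illustration and may differ from the paper's NGD: each client averages its neighbours' current estimates and then takes one gradient step on its own local data. The ring topology, the noise-free data, and the names `local_gradient` and `network_gradient_descent` are all hypothetical.

```python
def local_gradient(theta, xs, ys):
    """Gradient of the local loss (1 / 2n) * sum((y - theta * x)^2) at theta."""
    n = len(xs)
    return -sum(x * (y - theta * x) for x, y in zip(xs, ys)) / n


def network_gradient_descent(client_data, neighbors, learning_rate, num_rounds):
    """Run an assumed neighbour-averaging update; only estimates are exchanged."""
    thetas = [0.0] * len(client_data)
    for _ in range(num_rounds):
        updated = []
        for m, (xs, ys) in enumerate(client_data):
            # Combine the estimates received from this client's neighbours.
            avg = sum(thetas[j] for j in neighbors[m]) / len(neighbors[m])
            # One gradient step on the client's own data, starting from avg.
            updated.append(avg - learning_rate * local_gradient(avg, xs, ys))
        thetas = updated
    return thetas


# Hypothetical heterogeneous clients: each sees a different range of x values.
TRUE_SLOPE = 2.0
client_xs = [[0.5, 1.0], [1.0, 1.5], [1.5, 2.0], [0.2, 0.4]]
client_data = [(xs, [TRUE_SLOPE * x for x in xs]) for xs in client_xs]
# Ring network: client m listens to clients m - 1 and m + 1.
neighbors = {m: [(m - 1) % 4, (m + 1) % 4] for m in range(4)}
estimates = network_gradient_descent(client_data, neighbors, 0.1, 2000)
```

Because the toy data are noise-free, every client shares the same minimiser and all estimates approach the true slope. With noisy data, the abstract's conditions on the learning rate and on network balance would govern how close the clients get to the global estimator.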

[33]  arXiv:2205.08370 (cross-list from cs.LG) [pdf, other]
Title: Individualized Risk Assessment of Preoperative Opioid Use by Interpretable Neural Network Regression
Comments: 14 pages, 6 tables and 2 figures in main text
Subjects: Machine Learning (cs.LG); Applications (stat.AP)

Preoperative opioid use has been reported to be associated with higher preoperative opioid demand, worse postoperative outcomes, and increased postoperative healthcare utilization and expenditures. Understanding the risk of preoperative opioid use helps establish patient-centered pain management. In the field of machine learning, deep neural networks (DNNs) have emerged as a powerful means for risk assessment because of their superb prediction power; however, these black-box algorithms may make the results less interpretable than those of statistical models. Bridging the gap between the statistical and machine learning fields, we propose a novel Interpretable Neural Network Regression (INNER), which combines the strengths of statistical and DNN models. We use the proposed INNER to conduct individualized risk assessment of preoperative opioid use. Intensive simulations and an analysis of 34,186 patients expecting surgery in the Analgesic Outcomes Study (AOS) show that the proposed INNER not only can predict preoperative opioid use from preoperative characteristics as accurately as a DNN, but also can estimate the patient-specific odds of opioid use without pain and the odds ratio of opioid use for a unit increase in the reported overall body pain, leading to more straightforward interpretations of the tendency to use opioids than a DNN provides. Our results identify the patient characteristics that are strongly associated with opioid use and are largely consistent with previous findings, providing evidence that INNER is a useful tool for individualized risk assessment of preoperative opioid use.

[34]  arXiv:2205.08418 (cross-list from eess.SP) [pdf]
Title: Fault Detection for Non-Condensing Boilers using Simulated Building Automation System Sensor Data
Authors: Rony Shohet, Mohamed Kandil (1), J.J. McArthur (1), ((1) Department of Architectural Science, Ryerson University, Toronto, Canada)
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Systems and Control (eess.SY); Applications (stat.AP)

Building performance has been shown to degrade significantly after commissioning, resulting in increased energy consumption and associated greenhouse gas emissions. Continuous Commissioning using existing sensor networks and IoT devices has the potential to minimize this waste by continually identifying system degradation and re-tuning control strategies to adapt to real building performance. Due to its significant contribution to greenhouse gas emissions, the performance of gas boiler systems for building heating is critical. A review of boiler performance studies has been used to develop a set of common faults and degraded performance conditions, which have been integrated into a MATLAB/Simulink emulator. This resulted in a labeled dataset with approximately 10,000 simulations of steady-state performance for each of 14 non-condensing boilers. The collected data is used for training and testing fault classification using K-nearest neighbour, Decision tree, Random Forest, and Support Vector Machines. The results show that the Support Vector Machines method gave the best prediction accuracy, consistently exceeding 90%, but that generalization across multiple boilers is not possible due to low classification accuracy.

[35]  arXiv:2205.08532 (cross-list from cs.DS) [pdf, ps, other]
Title: New Lower Bounds for Private Estimation and a Generalized Fingerprinting Lemma
Comments: Modified title and abstract for arxiv and made some improvements to the writing
Subjects: Data Structures and Algorithms (cs.DS); Cryptography and Security (cs.CR); Machine Learning (stat.ML)

We prove new lower bounds for statistical estimation tasks under the constraint of $(\varepsilon, \delta)$-differential privacy. First, we provide tight lower bounds for private covariance estimation of Gaussian distributions. We show that estimating the covariance matrix in Frobenius norm requires $\Omega(d^2)$ samples, and in spectral norm requires $\Omega(d^{3/2})$ samples, both matching upper bounds up to logarithmic factors. We prove these bounds via our main technical contribution, a broad generalization of the fingerprinting method to exponential families. Additionally, using the private Assouad method of Acharya, Sun, and Zhang, we show a tight $\Omega(d/(\alpha^2 \varepsilon))$ lower bound for estimating the mean of a distribution with bounded covariance to $\alpha$-error in $\ell_2$-distance. Prior known lower bounds for all these problems were either polynomially weaker or held under the stricter condition of $(\varepsilon,0)$-differential privacy.

### Replacements for Wed, 18 May 22

[36]  arXiv:1508.02905 (replaced) [pdf, other]
Title: Bayesian Dropout
Comments: 21 pages, 3 figures. Manuscript prepared 2014 and awaiting submission
Journal-ref: Procedia Computer Science 201 (2022) 771-776
Subjects: Machine Learning (stat.ML)
[37]  arXiv:2004.09455 (replaced) [pdf, other]
Title: Enforcing stationarity through the prior in vector autoregressions
Authors: Sarah E. Heaps
Comments: Accepted for publication in the Journal of Computational and Graphical Statistics
Subjects: Methodology (stat.ME)
[38]  arXiv:2007.02938 (replaced) [pdf, other]
Title: Causal Feature Selection via Orthogonal Search
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
[39]  arXiv:2008.10957 (replaced) [pdf, ps, other]
Title: Are You All Normal? It Depends!
Subjects: Methodology (stat.ME)
[40]  arXiv:2009.07427 (replaced) [pdf, other]
Title: Intrinsic Riemannian Functional Data Analysis for Sparse Longitudinal Observations
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
[41]  arXiv:2009.07439 (replaced) [pdf, other]
Title: On the Landscape of One-hidden-layer Sparse Networks and Beyond
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[42]  arXiv:2011.01831 (replaced) [pdf, other]
Title: Nonparametric Estimation of Functional Dynamic Factor Model
Subjects: Methodology (stat.ME)
[43]  arXiv:2105.10210 (replaced) [pdf, other]
Title: Bayesian Uncertainty Quantification of Local Volatility Model
Subjects: Applications (stat.AP); Numerical Analysis (math.NA); Other Statistics (stat.OT)
[44]  arXiv:2105.12120 (replaced) [pdf, other]
Title: Sampling random graphs with specified degree sequences
Comments: 18 pages, 14 figures, added references and applications, methods substantially improved, results expanded. Code available at this http URL
Subjects: Social and Information Networks (cs.SI); Data Analysis, Statistics and Probability (physics.data-an); Methodology (stat.ME)
[45]  arXiv:2106.01814 (replaced) [pdf, other]
Title: Explaining Recruitment to Extremism: A Bayesian Case-Control Approach
Subjects: Methodology (stat.ME); Applications (stat.AP)
[46]  arXiv:2107.14323 (replaced) [pdf, other]
Title: Reconstruction of Random Geometric Graphs: Breaking the Omega(r) distortion barrier
Comments: v1 on arxiv was titled "Improved Reconstruction of Random Geometric Graphs." An extended abstract with the above title appeared in ICALP 2022. The current version includes the proofs that were omitted from the ICALP version and adds the section "Missing Edges."
Subjects: Computational Geometry (cs.CG); Social and Information Networks (cs.SI); Probability (math.PR); Physics and Society (physics.soc-ph); Machine Learning (stat.ML)
[47]  arXiv:2108.02151 (replaced) [pdf, other]
Title: Semiparametric Functional Factor Models with Bayesian Rank Selection
Subjects: Methodology (stat.ME); Econometrics (econ.EM); Computation (stat.CO)
[48]  arXiv:2108.06138 (replaced) [pdf, ps, other]
Title: Stochastic orders and measures of skewness and dispersion based on expectiles
Subjects: Statistics Theory (math.ST)
[49]  arXiv:2109.03457 (replaced) [pdf, other]
Title: Uncertainty Quantification and Experimental Design for Large-Scale Linear Inverse Problems under Gaussian Process Priors
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Applications (stat.AP); Computation (stat.CO); Methodology (stat.ME)
[50]  arXiv:2109.05386 (replaced) [pdf, other]
Title: Microbiome subcommunity learning with logistic-tree normal latent Dirichlet allocation
Subjects: Applications (stat.AP); Machine Learning (stat.ML)
[51]  arXiv:2109.13374 (replaced) [pdf, other]
Title: Variance partitioning in spatio-temporal disease mapping models
Subjects: Methodology (stat.ME)
[52]  arXiv:2110.01899 (replaced) [pdf, ps, other]
Title: Random matrices in service of ML footprint: ternary random features with no performance loss
Comments: Published as a conference paper at ICLR 2022
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[53]  arXiv:2110.05308 (replaced) [pdf, ps, other]
Title: Clustering of Diverse Multiplex Networks
Subjects: Methodology (stat.ME)
[54]  arXiv:2110.10422 (replaced) [pdf, other]
Title: PriorVAE: Encoding spatial priors with VAEs for small-area estimation
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[55]  arXiv:2110.12406 (replaced) [pdf]
Title: Robust Variable Selection under Cellwise Contamination
Subjects: Methodology (stat.ME); Computation (stat.CO)
[56]  arXiv:2111.01050 (replaced) [pdf, ps, other]
Title: Extended probabilities and their application to statistical inference
Subjects: Statistics Theory (math.ST); Probability (math.PR)
[57]  arXiv:2112.04626 (replaced) [pdf, other]
Title: Bayesian Semiparametric Longitudinal Inverse-Probit Mixed Models for Category Learning
Subjects: Methodology (stat.ME)
[58]  arXiv:2201.09098 (replaced) [pdf, other]
Title: Estimation of the covariance structure from SNP allele frequencies
Comments: In this new version we added the proof that the operator norm of -D/2 is exactly square root of m
Subjects: Methodology (stat.ME)
[59]  arXiv:2202.03051 (replaced) [pdf, ps, other]
Title: Using Partial Monotonicity in Submodular Maximization
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Optimization and Control (math.OC); Machine Learning (stat.ML)
[60]  arXiv:2202.04648 (replaced) [pdf, other]
Title: A survey of unsupervised learning methods for high-dimensional uncertainty quantification in black-box-type problems
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[61]  arXiv:2203.06126 (replaced) [pdf, other]
Title: Distribution-free Prediction Sets Adaptive to Unknown Covariate Shift
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Machine Learning (stat.ML)
[62]  arXiv:2203.10906 (replaced) [pdf, other]
Title: Bayesian inference in Epidemics: linear noise analysis
Comments: This version final after internal revision
Subjects: Statistics Theory (math.ST); Applications (stat.AP); Computation (stat.CO)
[63]  arXiv:2204.01686 (replaced) [pdf, other]
Title: Bayesian Semiparametric Covariate Informed Multivariate Density Deconvolution
Authors: Abhra Sarkar
Subjects: Methodology (stat.ME)
[64]  arXiv:2204.08182 (replaced) [pdf, other]
Title: Modality-Balanced Embedding for Video Retrieval
Comments: Accepted by SIGIR-2022, short paper
Journal-ref: SIGIR, 2022
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (stat.ML)
[65]  arXiv:2204.12993 (replaced) [pdf, other]
Title: Counterfactual harm
Comments: Changes to definition 3. Typos corrected and document shortened. Updated Appendices A - C
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
[66]  arXiv:2205.02143 (replaced) [pdf]
Title: Estimating Complier Average Causal Effects for Clustered RCTs When the Treatment Affects the Service Population
Subjects: Methodology (stat.ME)
[67]  arXiv:2205.03310 (replaced) [pdf, ps, other]
Title: Discussion of 'Event History and Topological Data Analysis'
Authors: Peter Bubenik
Journal-ref: Biometrika, Volume 108, Issue 4, December 2021, Pages 785-788
Subjects: Statistics Theory (math.ST); Algebraic Topology (math.AT)
[68]  arXiv:2205.05777 (replaced) [pdf, other]
Title: Efficient estimation of modified treatment policy effects based on the generalized propensity score
Subjects: Methodology (stat.ME)
[69]  arXiv:2205.07331 (replaced) [pdf, other]
Title: Sobolev Acceleration and Statistical Optimality for Learning Elliptic Equations via Gradient Descent
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Statistics Theory (math.ST); Computational Physics (physics.comp-ph); Machine Learning (stat.ML)