Title: Dealing with missing data under stratified sampling designs where strata are study domains
Subjects: Applications (stat.AP)

A quick count seeks to estimate the voting trends of an election and communicate them to the population on the evening of election day. In quick counts, the sampling is based on a stratified design of polling stations. Voting information is gathered gradually, often with no guarantee of obtaining the complete sample or even information in all the strata. Nevertheless, accurate interval estimates must be obtained from partial information. The task becomes more challenging when the strata are also study domains. To produce partial estimates, two strategies are proposed: 1) a Bayesian model using a dynamic post-stratification strategy and a single imputation process defined after a thorough analysis of historic voting information, with a credibility level correction included to address the underestimation of the variance; 2) a frequentist alternative that combines standard multiple imputation ideas with classic sampling techniques to obtain estimates under a missing information framework. Both solutions are illustrated and compared using information from the 2021 quick count, whose aim was to estimate the composition of the Chamber of Deputies in Mexico.
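
As a rough illustration of the frequentist strategy, the sketch below pairs a textbook stratified estimator with Rubin's combining rules for multiple imputation. This is an assumption about what "standard multiple imputation ideas" and "classic sampling techniques" could look like in practice, not the authors' procedure; the function names, the stratum weights, and the toy numbers are all hypothetical.

```python
import statistics

def stratified_estimate(strata):
    """Stratified mean and its variance.

    strata: list of (weight, values) pairs, weights summing to 1.
    The finite-population correction is ignored for simplicity.
    """
    est = sum(w * statistics.fmean(v) for w, v in strata)
    var = sum(w ** 2 * statistics.variance(v) / len(v) for w, v in strata)
    return est, var

def rubin_combine(estimates, variances):
    """Combine M completed-data estimates with Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)        # pooled point estimate
    within = statistics.fmean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance
    total = within + (1 + 1 / m) * between     # total variance
    return q_bar, total

# Hypothetical example: three imputed data sets, two strata each.
imputed = [
    [(0.5, [0.40, 0.44]), (0.5, [0.38, 0.42])],
    [(0.5, [0.41, 0.45]), (0.5, [0.39, 0.43])],
    [(0.5, [0.42, 0.46]), (0.5, [0.40, 0.44])],
]
results = [stratified_estimate(s) for s in imputed]
pooled, total_var = rubin_combine([r[0] for r in results], [r[1] for r in results])
print(pooled, total_var)
```

The total variance adds a between-imputation term to the usual sampling variance, so intervals widen when the imputed data sets disagree.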

Title: Inferring the heritability of bacterial traits in the era of machine learning
Comments: arXiv admin note: text overlap with arXiv:1910.11743
Subjects: Applications (stat.AP); Genomics (q-bio.GN)

Quantification of heritability is a fundamental aim in genetics, providing an answer to the question of how much genetic variation influences variation in a particular trait of interest. The traditional computational approaches for assessing the heritability of a trait have been developed in the field of quantitative genetics. However, modern sequencing methods have provided us with whole genome sequences from large populations, often together with rich phenotypic data, and this increase in data scale has led to the development of several new machine-learning-based approaches to inferring heritability. In this review, we systematically summarize recent advances in machine learning which can be used to perform heritability inference. We focus on bacterial genomes, where heritability plays a key role in understanding phenotypes such as drug resistance and virulence, which are particularly important due to the rising frequency of antimicrobial resistance. Specifically, we present applications of these newer machine learning methods to estimate the heritability of antibiotic resistance phenotypes in several pathogens. This study presents lessons and insights for future research when using machine learning methods in heritability inference.

Title: Bullwhip Effect of Supply Networks: Joint Impact of Network Structure and Market Demand
Subjects: Systems and Control (eess.SY); Applications (stat.AP)

The progressive amplification of demand fluctuations as demand travels upstream in a supply chain is known as the bullwhip effect. We first analytically characterize the bullwhip effect in general supply chain networks in two cases: (i) all suppliers have a unique layer position, where our method is founded on the control-theoretic approach, and (ii) not all suppliers have a unique layer position due to the presence of intra-layer links or inter-layer links between suppliers that are not positioned in consecutive layers, where we use both the absorbing Markov chain and the control-theoretic approach. We then investigate how network structures impact the bullwhip effect of supply chain networks. In particular, we analytically show that (i) if the market demand is generated from the same stationary process, the structure of supply networks does not affect the layer-wise bullwhip effect of supply networks, and (ii) if the market demand is generated from different stationary or non-stationary market processes, wider supply networks lead to a lower level of layer-wise bullwhip effect. Finally, numerical simulations are used to validate our propositions.
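
To make the layer-wise amplification concrete, here is a minimal simulation sketch of a two-layer serial chain. It assumes a textbook order-up-to policy with a moving-average forecast and measures the bullwhip effect as the ratio of order variance to market-demand variance; this is not the paper's control-theoretic or Markov-chain analysis, and `stage_orders`, `lead_time`, and `window` are hypothetical names chosen for illustration.

```python
import random
import statistics

def stage_orders(demand, lead_time, window):
    """Orders placed by one stage facing `demand`.

    Assumed policy: order-up-to level = lead_time * moving-average forecast.
    Orders may be negative (returns allowed) to keep the policy linear.
    """
    orders = []
    prev_target = None
    for t in range(window, len(demand)):
        forecast = sum(demand[t - window:t]) / window  # mean of the last `window` demands
        target = lead_time * forecast                  # order-up-to level
        if prev_target is not None:
            # Replenish last period's demand and adjust to the new target.
            orders.append(demand[t - 1] + target - prev_target)
        prev_target = target
    return orders

def bullwhip_ratio(orders, market_demand):
    """Variance of orders relative to variance of market demand."""
    return statistics.pvariance(orders) / statistics.pvariance(market_demand)

random.seed(0)
market = [random.gauss(100.0, 10.0) for _ in range(20000)]
layer1 = stage_orders(market, lead_time=2, window=4)   # retailer-like layer
layer2 = stage_orders(layer1, lead_time=2, window=4)   # upstream layer sees layer1's orders
ratio1 = bullwhip_ratio(layer1, market)
ratio2 = bullwhip_ratio(layer2, market)
print(ratio1, ratio2)
```

For independent demand this policy has a theoretical ratio of 1 + 2L/p + 2L²/p² at the first layer (2.5 for L = 2, p = 4), and the ratio grows again at the next layer because that layer forecasts from already-amplified orders.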

Title: Parameter Estimation in Ill-conditioned Low-inertia Power Systems
Comments: Submitted to 2022 IEEE North American Power Symposium (NAPS)
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC); Applications (stat.AP)

This paper examines model parameter estimation in dynamic power systems whose governing electro-mechanical equations are ill-conditioned or singular. The ill-conditioning arises because converter-interfaced generators contribute zero or small inertia. Consequently, the overall system inertia decreases, resulting in low-inertia power systems. We show that standard least-squares or subspace estimators based on the state-space model fail to exist for these models. We overcome this challenge by applying a least-squares estimator directly to the coupled swing-equation model rather than to its transformed first-order state-space form. We specifically focus on estimating inertia (mechanical and virtual) and damping constants, although our method is general enough for estimating other parameters. Our theoretical analysis highlights the role of network topology in the parameter estimates of an individual generator. For generators with greater connectivity, estimation of the associated parameters is more susceptible to variations in other generator states. Furthermore, we numerically show that estimating the parameters while ignoring their ill-conditioning yields highly unreliable results.
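
The following worked equations sketch why the first-order form can fail while the second-order form remains usable. They assume a linearized swing-equation model with inertia matrix $M = \mathrm{diag}(m_i)$, damping matrix $D = \mathrm{diag}(d_i)$, network coupling matrix $L$, rotor angles $\delta$, and power injections $p$; this notation is an assumption for illustration and may differ from the paper's.

```latex
% Coupled second-order (swing-equation) model, assumed linearized form
M \ddot{\delta} + D \dot{\delta} + L \delta = p

% Standard first-order state-space form with \omega = \dot{\delta};
% it requires M^{-1}, which does not exist if some m_i = 0
% and is ill-conditioned if some m_i are very small
\frac{d}{dt}
\begin{bmatrix} \delta \\ \omega \end{bmatrix}
=
\begin{bmatrix} 0 & I \\ -M^{-1} L & -M^{-1} D \end{bmatrix}
\begin{bmatrix} \delta \\ \omega \end{bmatrix}
+
\begin{bmatrix} 0 \\ M^{-1} \end{bmatrix} p

% Least squares posed directly on the second-order model for generator i,
% using samples at times t_k; no inversion of M is needed
(\hat{m}_i, \hat{d}_i)
= \arg\min_{m_i,\, d_i} \sum_{k}
\Bigl( p_i(t_k) - \sum_{j} L_{ij}\, \delta_j(t_k)
       - m_i\, \ddot{\delta}_i(t_k) - d_i\, \dot{\delta}_i(t_k) \Bigr)^{2}
```

In this sketch the coupling term $\sum_j L_{ij}\,\delta_j$ enters the regression target for generator $i$, which is one way to see why better-connected generators depend more on the states of other generators.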

Title: Restricted mean survival time estimate using covariate adjusted pseudovalue regression to improve precision
Subjects: Methodology (stat.ME); Applications (stat.AP)

Covariate adjustment is desired by both practitioners and regulators of randomized clinical trials because it improves precision for estimating treatment effects. However, covariate adjustment presents a particular challenge in time-to-event analysis. We propose to apply covariate-adjusted pseudovalue regression to estimate the between-treatment difference in restricted mean survival times (RMST). Our proposed method incorporates a prognostic covariate to increase the precision of the treatment effect estimate, maintaining strict type I error control without introducing bias. In addition, the increase in precision can be quantified and taken into account in the sample size calculation at the study design stage. Consequently, our proposed method provides the ability to design smaller randomized studies at no expense to statistical power.
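
For readers unfamiliar with pseudovalues, the sketch below computes the RMST as the area under a Kaplan-Meier curve up to a horizon `tau` and then forms leave-one-out (jackknife) pseudovalues. It covers only this generic building block, under the usual definitions; the covariate-adjusted regression step of the proposed method is not reproduced, and the function names and toy data are hypothetical.

```python
def km_rmst(times, events, tau):
    """Restricted mean survival time: area under the Kaplan-Meier curve on [0, tau].

    times: observed follow-up times; events: 1 = event, 0 = censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, last_t, area = 1.0, 0.0, 0.0
    i = 0
    while i < len(data) and data[i][0] <= tau:
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:   # handle ties at time t
            deaths += data[i][1]
            removed += 1
            i += 1
        area += surv * (t - last_t)                # rectangle up to time t
        if deaths:
            surv *= 1.0 - deaths / n_at_risk       # Kaplan-Meier step
        n_at_risk -= removed
        last_t = t
    return area + surv * (tau - last_t)            # last rectangle up to tau

def rmst_pseudovalues(times, events, tau):
    """Jackknife pseudovalues: n * theta_hat - (n - 1) * theta_hat without subject i."""
    n = len(times)
    full = km_rmst(times, events, tau)
    pseudo = []
    for i in range(n):
        loo_t = times[:i] + times[i + 1:]
        loo_e = events[:i] + events[i + 1:]
        pseudo.append(n * full - (n - 1) * km_rmst(loo_t, loo_e, tau))
    return pseudo

# Hypothetical toy data with two censored subjects.
times = [1.0, 2.0, 2.5, 3.0, 4.0, 5.0]
events = [1, 0, 1, 1, 0, 1]
print(km_rmst(times, events, tau=4.0))
print(rmst_pseudovalues(times, events, tau=4.0))
```

The pseudovalues can then serve as the response in an ordinary regression on treatment and a prognostic covariate; without censoring, each pseudovalue reduces to min(T_i, tau).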

Title: Copulaboost: additive modeling with copula-based model components
Subjects: Methodology (stat.ME); Applications (stat.AP); Computation (stat.CO); Machine Learning (stat.ML)

We propose a type of generalised additive model with model components based on pair-copula constructions, with prediction as a main aim. The model components are designed such that our model may capture potentially complex interaction effects in the relationship between the response and the covariates. In addition, our model does not require discretisation of continuous covariates, and is therefore suitable for problems with many such covariates. Further, we have designed a fitting algorithm inspired by gradient boosting, as well as efficient procedures for model selection and evaluation of the model components, through constraints on the model space and approximations that speed up time-costly computations. In addition to being absolutely necessary for our model to be a realistic alternative in higher dimensions, these techniques may also be useful as a basis for designing efficient model selection algorithms for other types of copula regression models. We have explored the characteristics of our method in a simulation study, in particular comparing it to natural alternatives, such as logic regression, classic boosting models and penalised logistic regression. We have also illustrated our approach on the Wisconsin breast cancer dataset and on the Boston housing dataset. The results show that our method has a prediction performance that is either better than or comparable to the other methods, even when the proportion of discrete covariates is high.

Title: Applying data technologies to combat AMR: current status, challenges, and opportunities on the way forward
Comments: 65 pages, 3 figures
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Applications (stat.AP)

Antimicrobial resistance (AMR) is a growing public health threat, estimated to cause over 10 million deaths per year and cost the global economy 100 trillion USD by 2050 under status quo projections. These losses would mainly result from an increase in the morbidity and mortality from treatment failure, AMR infections during medical procedures, and a loss of quality of life attributed to AMR. Numerous interventions have been proposed to control the development of AMR and mitigate the risks posed by its spread. This paper reviews key aspects of bacterial AMR management and control which make essential use of data technologies such as artificial intelligence, machine learning, and mathematical and statistical modelling, fields that have seen rapid developments in this century. Although data technologies have become an integral part of biomedical research, their impact on AMR management has remained modest. We outline the use of data technologies to combat AMR, detailing recent advancements in four complementary categories: surveillance, prevention, diagnosis, and treatment. We provide an overview of current AMR control approaches using data technologies within biomedical research, clinical practice, and in the "One Health" context. We discuss the potential impact of, and the challenges facing, wider implementation of data technologies in high-income as well as in low- and middle-income countries, and recommend concrete actions needed to allow these technologies to be more readily integrated within the healthcare and public health sectors.

Title: Improving Fairness in Criminal Justice Algorithmic Risk Assessments Using Optimal Transport and Conformal Prediction Sets
Comments: 51 pages, 7 figures
Subjects: Applications (stat.AP)

Title: Nested Sampling for Non-Gaussian Inference in SLAM Factor Graphs
Journal-ref: IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 9232-9239, Oct. 2022
Subjects: Robotics (cs.RO); Applications (stat.AP)
