Computer Science

New submissions

[ total of 397 entries: 1-397 ]

New submissions for Thu, 25 Nov 21

[1]  arXiv:2111.12111 [pdf, other]
Title: Context-based navigation for ground mobile robot in a semi-structured indoor environment
Subjects: Robotics (cs.RO); Software Engineering (cs.SE)

There is a growing demand for mobile robots to operate in more variable environments, where guaranteeing safe robot navigation is a priority, in addition to time performance. To achieve this, current solutions for local planning use a specific configuration tuned to the characteristics of the application environment. In this paper, we present an approach for developing quality models that can be used by a self-adaptation framework to adapt the local planner configuration at run-time based on the perceived environment. We contribute a definition of a safety model that predicts the safety of a navigation configuration given the perceived environment. Experiments have been performed in a realistic navigation scenario for a retail application to validate the obtained models and demonstrate their integration in a self-adaptation framework.

[2]  arXiv:2111.12115 [pdf, other]
Title: Algorithmic Fairness in Face Morphing Attack Detection
Comments: Accepted to WACVW2022
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Face morphing attacks can compromise Face Recognition Systems (FRS) by exploiting their vulnerabilities. Face Morphing Attack Detection (MAD) techniques have been developed in the recent past to deter such attacks and mitigate the risks they pose. MAD algorithms, like any other algorithms, should treat images of subjects from different ethnic origins equally and provide non-discriminatory results. While promising MAD algorithms have been tested for robustness, no study has comprehensively benchmarked their behaviour across ethnicities. In this paper, we study and present a comprehensive analysis of the algorithmic fairness of existing Single image-based Morph Attack Detection (S-MAD) algorithms. We attempt to better understand the influence of ethnic bias on MAD algorithms, and to this end we study the performance of MAD algorithms on a newly created dataset consisting of four different ethnic groups. With extensive experiments using six different S-MAD techniques, we first present a benchmark of detection performance and then quantify the algorithmic fairness of each technique using the Fairness Discrepancy Rate (FDR). The results indicate a lack of fairness in all six S-MAD methods when trained and tested on different ethnic groups, suggesting the need for reliable MAD approaches that mitigate algorithmic bias.
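
The Fairness Discrepancy Rate mentioned in this abstract can be illustrated with a minimal sketch. It assumes the commonly cited definition FDR = 1 - (alpha * A + (1 - alpha) * B), where A and B are the largest gaps between any two demographic groups in the two error rates (accept-type and reject-type) measured at a shared decision threshold. The function name, the alpha default, and the per-group error rates are hypothetical; the abstract does not say how the authors parameterise the metric.

```python
from itertools import combinations


def fairness_discrepancy_rate(accept_errors, reject_errors, alpha=0.5):
    """Return 1 - weighted sum of the largest between-group error gaps.

    accept_errors / reject_errors map a group name to that group's error
    rate at one shared threshold. A value of 1.0 means no gap was observed.
    """
    def max_gap(rates):
        # Largest absolute difference between any two groups.
        return max(
            (abs(a - b) for a, b in combinations(rates.values(), 2)),
            default=0.0,
        )

    return 1.0 - (alpha * max_gap(accept_errors)
                  + (1.0 - alpha) * max_gap(reject_errors))


# Hypothetical error rates for four groups (illustrative values only).
accept_errors = {"g1": 0.02, "g2": 0.10, "g3": 0.04, "g4": 0.06}
reject_errors = {"g1": 0.05, "g2": 0.05, "g3": 0.15, "g4": 0.07}
fdr = fairness_discrepancy_rate(accept_errors, reject_errors)
print(round(fdr, 4))
```

Under this definition a score closer to 1 indicates smaller error-rate gaps between groups at the chosen threshold.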

[3]  arXiv:2111.12116 [pdf, other]
Title: Caviar: An E-graph Based TRS for Automatic Code Optimization
Subjects: Programming Languages (cs.PL)

Term Rewriting Systems (TRS) are used in compilers to simplify and prove expressions. State-of-the-art TRSs in compilers use a greedy algorithm that applies a set of rewriting rules in a predefined order (where some of the rules are not axiomatic). This leads to a loss in the ability to simplify certain expressions. E-graphs and equality saturation sidestep this issue by representing the different equivalent expressions in a compact manner from which the optimal expression can be extracted. While an e-graph-based TRS can be more powerful than a TRS that uses a greedy algorithm, it is slower because expressions may have a large or sometimes infinite number of equivalent expressions. Accelerating e-graph construction is crucial for making the use of e-graphs practical in compilers. In this paper, we present Caviar, an e-graph-based TRS for proving expressions within compilers. Caviar is a fast (20x faster than base e-graph TRS) and flexible (completely parameterized) TRS that relies on three novel techniques: 1) a technique that stops e-graphs from growing when the goal is reached, called Iteration Level Check; 2) a mechanism that balances exploration and exploitation in the equality saturation algorithm, called Pulsing Caviar; 3) a technique to stop e-graph construction before reaching saturation when a non-provable pattern is detected, called Non-Provable Patterns Detection (NPPD). We evaluate Caviar on Halide, an optimizing compiler that relies on a greedy-algorithm-based TRS to simplify and prove its expressions. The proposed techniques allow Caviar to accelerate e-graph expansion by 20x for the task of proving expressions. They also allow Caviar to prove 51% of the expressions that Halide's TRS cannot prove while being only 0.68x slower.

[4]  arXiv:2111.12122 [pdf]
Title: Bounding Box-Free Instance Segmentation Using Semi-Supervised Learning for Generating a City-Scale Vehicle Dataset
Comments: 38 pages, 10 figures, submitted to journal
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Databases (cs.DB)

Vehicle classification is a hot computer vision topic, with studies ranging from ground-view up to top-view imagery. In remote sensing, the usage of top-view images allows for understanding city patterns, vehicle concentration, traffic management, and others. However, there are some difficulties when aiming for pixel-wise classification: (a) most vehicle classification studies use object detection methods, and most publicly available datasets are designed for this task, (b) creating instance segmentation datasets is laborious, and (c) traditional instance segmentation methods underperform on this task since the objects are small. Thus, the present research objectives are: (1) propose a novel semi-supervised iterative learning approach using GIS software, (2) propose a box-free instance segmentation approach, and (3) provide a city-scale vehicle dataset. The iterative learning procedure considered: (1) label a small number of vehicles, (2) train on those samples, (3) use the model to classify the entire image, (4) convert the image prediction into a polygon shapefile, (5) correct some areas with errors and include them in the training data, and (6) repeat until results are satisfactory. To separate instances, we considered vehicle interior and vehicle borders, and the DL model was the U-net with the Efficient-net-B7 backbone. When removing the borders, the vehicle interior becomes isolated, allowing for unique object identification. To recover the deleted 1-pixel borders, we proposed a simple method to expand each prediction. The results show better pixel-wise metrics when compared to the Mask-RCNN (82% against 67% in IoU). On per-object analysis, the overall accuracy, precision, and recall were greater than 90%. This pipeline applies to any remote sensing target, being very efficient for segmentation and generating datasets.
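
The interior/border idea described in this abstract can be sketched in a few lines. This is an illustration under assumptions, not the authors' implementation: the helper names `label_interiors` and `expand_one_pixel` are hypothetical, the masks are tiny hand-made grids, and the abstract does not specify how border pixels are reassigned (here each border pixel takes the label of the first labelled neighbour found in its 8-neighbourhood).

```python
from collections import deque


def label_interiors(interior):
    """Label 4-connected components of interior pixels (value 1)."""
    h, w = len(interior), len(interior[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if interior[r][c] == 1 and labels[r][c] == 0:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and interior[ny][nx] == 1
                                and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def expand_one_pixel(labels, border):
    """Give each border pixel the label of an adjacent interior, if any."""
    h, w = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    for r in range(h):
        for c in range(w):
            if border[r][c] != 1 or labels[r][c] != 0:
                continue
            # Look only at the ORIGINAL labels so growth is exactly one pixel.
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = r + dy, c + dx
                    if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] > 0:
                        out[r][c] = labels[ny][nx]
                        break
                if out[r][c]:
                    break
    return out


# Two touching vehicles: interiors at (1,1) and (1,3), borders elsewhere.
interior = [[0, 0, 0, 0, 0],
            [0, 1, 0, 1, 0],
            [0, 0, 0, 0, 0]]
border = [[1 - v for v in row] for row in interior]
labels, count = label_interiors(interior)
expanded = expand_one_pixel(labels, border)
```

Because the border class separates the two interiors, connected-component labelling yields two instances even though the vehicles touch; the expansion step then restores the removed one-pixel rim.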

[5]  arXiv:2111.12123 [pdf, other]
Title: MICS : Multi-steps, Inverse Consistency and Symmetric deep learning registration network
Comments: In submission
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Deformable registration consists of finding the best dense correspondence between two different images. Many algorithms have been published, but clinical application has been hindered by the long computation time needed to solve the optimisation problem. Deep learning overcame this limitation by taking advantage of GPU computation and the learning process. However, many deep learning methods do not take into account desirable properties respected by classical algorithms.
In this paper, we present MICS, a novel deep learning algorithm for medical image registration. As registration is an ill-posed problem, we designed our algorithm to respect several properties: inverse consistency, symmetry and orientation conservation. We also combined our algorithm with a multi-step strategy to refine and improve the deformation grid. While many approaches have applied registration to brain MRI, we explored a more challenging body region: abdominal CT. Finally, we evaluated our method on a dataset used during the Learn2Reg challenge, allowing a fair comparison with published methods.

[6]  arXiv:2111.12124 [pdf, ps, other]
Title: Towards Learning Universal Audio Representations
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

The ability to learn universal audio representations that can solve diverse speech, music, and environment tasks can spur many applications that require general sound content understanding. In this work, we introduce a holistic audio representation evaluation suite (HARES) spanning 12 downstream tasks across audio domains and provide a thorough empirical study of recent sound representation learning systems on that benchmark. We discover that previous sound event classification or speech models do not generalize outside of their domains. We observe that more robust audio representations can be learned with the SimCLR objective; however, the model's transferability depends heavily on the model architecture. We find the Slowfast architecture is good at learning rich representations required by different domains, but its performance is affected by the normalization scheme. Based on these findings, we propose a novel normalizer-free Slowfast NFNet and achieve state-of-the-art performance across all domains.

[7]  arXiv:2111.12126 [pdf]
Title: Panoptic Segmentation Meets Remote Sensing
Comments: 43 pages, 10 figures, submitted to journal
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Databases (cs.DB)

Panoptic segmentation combines instance and semantic predictions, allowing the detection of "things" and "stuff" simultaneously. Effectively approaching panoptic segmentation in remotely sensed data is promising for many challenging problems since it allows continuous mapping and specific target counting. Several difficulties have prevented the growth of this task in remote sensing: (a) most algorithms are designed for traditional images, (b) image labelling must encompass "things" and "stuff" classes, and (c) the annotation format is complex. Thus, aiming to solve these difficulties and increase the operability of panoptic segmentation in remote sensing, this study has five objectives: (1) create a novel data preparation pipeline for panoptic segmentation, (2) propose annotation conversion software to generate panoptic annotations, (3) propose a novel dataset on urban areas, (4) modify Detectron2 for the task, and (5) evaluate difficulties of this task in the urban setting. We used an aerial image with a 0.24-meter spatial resolution considering 14 classes. Our pipeline considers three image inputs, and the proposed software uses point shapefiles for creating samples in the COCO format. Our study generated 3,400 samples with 512x512 pixel dimensions. We used the Panoptic-FPN with two backbones (ResNet-50 and ResNet-101), and the model evaluation considered semantic, instance, and panoptic metrics. We obtained 93.9, 47.7, and 64.9 for the mean IoU, box AP, and PQ, respectively. Our study presents the first effective pipeline for panoptic segmentation and an extensive database for other researchers to use and deal with other data or related problems requiring a thorough scene understanding.
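
For readers unfamiliar with the PQ figure reported here, the following sketch computes Panoptic Quality for a single class using the standard definition from the panoptic segmentation literature: the sum of IoUs over matched segment pairs divided by |TP| + 0.5 |FP| + 0.5 |FN|, where a predicted and a ground-truth segment count as matched when their IoU exceeds 0.5. The function name and the sample numbers are hypothetical and are not taken from the paper.

```python
def panoptic_quality(matched_ious, num_false_positives, num_false_negatives):
    """Panoptic Quality for one class.

    matched_ious: IoU of every matched (prediction, ground truth) pair,
    i.e. the true positives; each value is expected to be > 0.5.
    """
    num_true_positives = len(matched_ious)
    denominator = (num_true_positives
                   + 0.5 * num_false_positives
                   + 0.5 * num_false_negatives)
    if denominator == 0:
        return 0.0  # no segments at all for this class
    return sum(matched_ious) / denominator


# Three matched segments, one spurious prediction, one missed object.
pq = panoptic_quality([0.9, 0.8, 0.7], num_false_positives=1,
                      num_false_negatives=1)
print(round(pq, 3))
```

The overall PQ is usually obtained by averaging the per-class values, which is why it sits between a pure segmentation score (mean IoU) and a pure detection score (AP).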

[8]  arXiv:2111.12128 [pdf, other]
Title: On the Unreasonable Effectiveness of Feature propagation in Learning on Graphs with Missing Node Features
Subjects: Machine Learning (cs.LG)

While Graph Neural Networks (GNNs) have recently become the de facto standard for modeling relational data, they impose a strong assumption on the availability of the node or edge features of the graph. In many real-world applications, however, features are only partially available; for example, in social networks, age and gender are available only for a small subset of users. We present a general approach for handling missing features in graph machine learning applications that is based on minimization of the Dirichlet energy and leads to a diffusion-type differential equation on the graph. The discretization of this equation produces a simple, fast and scalable algorithm which we call Feature Propagation. We experimentally show that the proposed approach outperforms previous methods on seven common node-classification benchmarks and can withstand surprisingly high rates of missing features: on average we observe only around 4% relative accuracy drop when 99% of the features are missing. Moreover, it takes only 10 seconds to run on a graph with $\sim$2.5M nodes and $\sim$123M edges on a single GPU.
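
The diffusion idea in this abstract can be sketched as an iteration that repeatedly averages features over neighbours and then resets the observed entries to their known values. This is a simplified illustration under assumptions: it uses a scalar feature per node and plain neighbour averaging, whereas the abstract does not state which graph normalisation the authors use; the function name and the toy graph are hypothetical.

```python
def feature_propagation(neighbors, features, known, num_iters=40):
    """Fill in missing node features by diffusion over the graph.

    neighbors: adjacency list, neighbors[i] is a list of node indices.
    features:  one scalar per node; entries for unknown nodes are ignored.
    known:     known[i] is True when features[i] was observed.
    """
    x = [features[i] if known[i] else 0.0 for i in range(len(features))]
    for _ in range(num_iters):
        updated = []
        for i, nbrs in enumerate(neighbors):
            if known[i]:
                updated.append(features[i])          # clamp observed value
            elif nbrs:
                updated.append(sum(x[j] for j in nbrs) / len(nbrs))
            else:
                updated.append(x[i])                 # isolated node
        x = updated
    return x


# Path graph 0 - 1 - 2 - 3 with the two end features observed.
neighbors = [[1], [0, 2], [1, 3], [2]]
features = [0.0, None, None, 3.0]
known = [True, False, False, True]
filled = feature_propagation(neighbors, features, known)
print([round(v, 3) for v in filled])
```

On this toy path the missing values converge to a linear interpolation between the observed endpoints, which is the minimiser of the Dirichlet energy for that graph.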

[9]  arXiv:2111.12132 [pdf, other]
Title: Robust Principal Component Analysis: A Construction Error Minimization Perspective
Authors: Kai Liu, Yarui Cao
Comments: 13 pages, 2 figures
Subjects: Machine Learning (cs.LG)

In this paper we propose a novel optimization framework to systematically solve the robust PCA problem with rigorous theoretical guarantees, based on which we investigate computationally economical updating algorithms.

[10]  arXiv:2111.12137 [pdf, other]
Title: Learning Interactive Driving Policies via Data-driven Simulation
Comments: The first two authors contributed equally to this work. Code is available here: this http URL
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Data-driven simulators promise high data-efficiency for driving policy learning. When used for modelling interactions, this data-efficiency becomes a bottleneck: Small underlying datasets often lack interesting and challenging edge cases for learning interactive driving. We address this challenge by proposing a simulation method that uses in-painted ado vehicles for learning robust driving policies. Thus, our approach can be used to learn policies that involve multi-agent interactions and allows for training via state-of-the-art policy learning methods. We evaluate the approach for learning standard interaction scenarios in driving. In extensive experiments, our work demonstrates that the resulting policies can be directly transferred to a full-scale autonomous vehicle without making use of any traditional sim-to-real transfer techniques such as domain randomization.

[11]  arXiv:2111.12139 [pdf, other]
Title: ChebLieNet: Invariant Spectral Graph NNs Turned Equivariant by Riemannian Geometry on Lie Groups
Comments: submitted to NeurIPS'21, this https URL
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We introduce ChebLieNet, a group-equivariant method on (anisotropic) manifolds. Building on the success of graph- and group-based neural networks, we take advantage of recent developments in the geometric deep learning field to derive a new approach to exploit any anisotropies in data. Via discrete approximations of Lie groups, we develop a graph neural network made of anisotropic convolutional layers (Chebyshev convolutions), spatial pooling and unpooling layers, and global pooling layers. Group equivariance is achieved via equivariant and invariant operators on graphs with anisotropic left-invariant Riemannian distance-based affinities encoded on the edges. Thanks to its simple form, the Riemannian metric can model any anisotropies, both in the spatial and orientation domains. This control over the anisotropies of the Riemannian metric makes it possible to balance equivariance (anisotropic metric) against invariance (isotropic metric) of the graph convolution layers. Hence we open the door to a better understanding of anisotropic properties. Furthermore, we empirically demonstrate the existence of (data-dependent) sweet spots for anisotropic parameters on CIFAR10. This crucial result is evidence of the benefit we could get by exploiting anisotropic properties in data. We also evaluate the scalability of this approach on STL10 (image data) and ClimateNet (spherical data), showing its remarkable adaptability to diverse tasks.

[12]  arXiv:2111.12140 [pdf, ps, other]
Title: Filter Methods for Feature Selection in Supervised Machine Learning Applications -- Review and Benchmark
Comments: Source code of the analysis is available on request
Subjects: Machine Learning (cs.LG); Databases (cs.DB); Machine Learning (stat.ML)

The amount of data for machine learning (ML) applications is constantly growing. Not only the number of observations but especially the number of measured variables (features) increases with ongoing digitization. Selecting the most appropriate features for predictive modeling is an important lever for the success of ML applications in business and research. Many feature selection methods (FSM) that are independent of a particular ML algorithm - so-called filter methods - have been suggested, but little guidance exists for researchers and quantitative modelers choosing appropriate approaches for typical ML problems. This review synthesizes the substantial literature on feature selection benchmarking and evaluates the performance of 58 methods in the widely used R environment. For concrete guidance, we consider four typical dataset scenarios that are challenging for ML models (noisy, redundant, imbalanced data and cases with more features than observations). Drawing on the experience of earlier benchmarks, which have considered far fewer FSMs, we compare the performance of the methods according to four criteria (predictive performance, number of relevant features selected, stability of the feature sets and runtime). We found that methods relying on the random forest approach, the double input symmetrical relevance filter (DISR) and the joint impurity filter (JIM) were well-performing candidate methods for the given dataset scenarios.

[13]  arXiv:2111.12143 [pdf, other]
Title: Critical initialization of wide and deep neural networks through partial Jacobians: general theory and applications to LayerNorm
Comments: 28 pages, 8 figures
Subjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); High Energy Physics - Theory (hep-th); Machine Learning (stat.ML)

Deep neural networks are notorious for defying theoretical treatment. However, when the number of parameters in each layer tends to infinity, the network function is a Gaussian process (GP) and a quantitatively predictive description is possible. The Gaussian approximation allows one to formulate criteria for selecting hyperparameters, such as variances of weights and biases, as well as the learning rate. These criteria rely on the notion of criticality defined for deep neural networks. In this work we describe a new way to diagnose (both theoretically and empirically) this criticality. To that end, we introduce partial Jacobians of a network, defined as derivatives of preactivations in layer $l$ with respect to preactivations in layer $l_0<l$. These quantities are particularly useful when the network architecture involves many different layers. We discuss various properties of the partial Jacobians such as their scaling with depth and relation to the neural tangent kernel (NTK). We derive the recurrence relations for the partial Jacobians and utilize them to analyze criticality of deep MLP networks with (and without) LayerNorm. We find that the normalization layer changes the optimal values of hyperparameters and critical exponents. We argue that LayerNorm is more stable when applied to preactivations, rather than activations, due to larger correlation depth.
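
The verbal definition of the partial Jacobian given in this abstract can be written out as follows. The symbol $h^{l}$ for the preactivation vector of layer $l$ and the notation $J^{l_0,l}$ are assumptions made for illustration; the abstract fixes only the layer indices $l$ and $l_0$. The second line is just the chain rule for a feedforward network and is not claimed to be the recurrence derived in the paper.

```latex
% Partial Jacobian: derivative of layer-l preactivations with respect to
% layer-l_0 preactivations, for l_0 < l (notation assumed for illustration).
J^{l_0,l}_{ij} \;=\; \frac{\partial h^{l}_{i}}{\partial h^{l_0}_{j}},
\qquad l_0 < l .

% Chain rule across the intermediate layers of a feedforward network,
% with the later layers multiplying from the left.
J^{l_0,l} \;=\; \frac{\partial h^{l}}{\partial h^{l-1}}\,
                \frac{\partial h^{l-1}}{\partial h^{l-2}}\cdots
                \frac{\partial h^{l_0+1}}{\partial h^{l_0}} .
```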

[14]  arXiv:2111.12144 [pdf, other]
Title: Mimicking Playstyle by Adapting Parameterized Behavior Trees in RTS Games
Subjects: Artificial Intelligence (cs.AI)

The discovery of Behavior Trees (BTs) impacted the field of Artificial Intelligence (AI) in games by providing a flexible and natural representation of non-player character (NPC) logic that is manageable by game designers. Nevertheless, increasing pressure for ever better NPC AI agents has made handcrafted BTs so complex that they are barely tractable and error-prone. On the other hand, while many newly launched online games suffer from a shortage of players, the existence of AI with a broad range of capabilities could increase player retention. Therefore, to handle the above challenges, recent trends in the field have focused on the automatic creation of AI agents: from deep and reinforcement learning techniques to combinatorial (constrained) optimization and evolution of BTs. In this paper, we present a novel approach to the semi-automatic construction of AI agents that mimic and generalize given human gameplays by adapting and tuning an expert-created BT under a developed similarity metric between source and BT gameplays. To this end, we formulated a mixed discrete-continuous optimization problem, in which topological and functional changes of the BT are reflected in numerical variables, and constructed a dedicated hybrid metaheuristic. The performance of the presented approach was verified experimentally in a prototype real-time strategy game. The experiments confirmed the efficiency and prospects of the presented approach, which is going to be applied in a commercial game.

[15]  arXiv:2111.12146 [pdf, other]
Title: Sharing to learn and learning to share - Fitting together Meta-Learning, Multi-Task Learning, and Transfer Learning : A meta review
Comments: 16 pages, 8 figures
Subjects: Machine Learning (cs.LG)

Integrating knowledge across different domains is an essential feature of human learning. Learning paradigms like transfer learning, meta learning, and multi-task learning reflect the human learning process by exploiting prior knowledge for new tasks, encouraging faster learning and good generalization. This article gives a detailed view of these learning paradigms along with a comparative analysis. The weakness of one learning algorithm turns out to be the strength of another, and thereby merging them is a prevalent trait in the literature. This work delivers a literature review of articles that fuse two algorithms to accomplish multiple tasks. A global generic learning network, an ensemble of meta learning, transfer learning, and multi-task learning, is also introduced here, along with some open research questions and directions for future research.

[16]  arXiv:2111.12147 [pdf, other]
Title: kmclib: Automated Inference and Verification of Session Types
Comments: kmclib is available at this https URL
Subjects: Programming Languages (cs.PL)

Theories and tools based on multiparty session types offer correctness guarantees for concurrent programs that communicate using message-passing. These guarantees usually come at the cost of an intrinsically top-down approach, which requires the communication behaviour of the entire program to be specified as a global type. This paper introduces kmclib: an OCaml library that supports the development of correct message-passing programs without having to write any types. The library utilises the meta-programming facilities of OCaml to automatically infer the session types of concurrent programs and verify their compatibility (k-MC). Well-typed programs, written with kmclib, do not lead to communication errors and cannot get stuck.

[17]  arXiv:2111.12150 [pdf, other]
Title: Jointly Learning from Decentralized (Federated) and Centralized Data to Mitigate Distribution Shift
Comments: 9 pages, 1 figure. Camera-ready NeurIPS 2021 DistShift workshop version
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)

With privacy as a motivation, Federated Learning (FL) is an increasingly used paradigm where learning takes place collectively on edge devices, each with a cache of user-generated training examples that remain resident on the local device. These on-device training examples are gathered in situ during the course of users' interactions with their devices, and thus are highly reflective of at least part of the inference data distribution. Yet a distribution shift may still exist; the on-device training examples may lack some data inputs expected to be encountered at inference time. This paper proposes a way to mitigate this shift: selective usage of datacenter data, mixed in with FL. By mixing decentralized (federated) and centralized (datacenter) data, we can form an effective training data distribution that better matches the inference data distribution, resulting in more useful models while still meeting the private training data access constraints imposed by FL.

[18]  arXiv:2111.12151 [pdf, other]
Title: Best Arm Identification with Safety Constraints
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

The best arm identification problem in the multi-armed bandit setting is an excellent model of many real-world decision-making problems, yet it fails to capture the fact that in the real world, safety constraints often must be met while learning. In this work we study the question of best-arm identification in safety-critical settings, where the goal of the agent is to find the best safe option out of many, while exploring in a way that guarantees certain, initially unknown safety constraints are met. We first analyze this problem in the setting where the reward and safety constraints take a linear structure, and show nearly matching upper and lower bounds. We then analyze a much more general version of the problem where we only assume the reward and safety constraints can be modeled by monotonic functions, and propose an algorithm in this setting which is guaranteed to learn safely. We conclude with experimental results demonstrating the effectiveness of our approaches in scenarios such as safely identifying the best drug out of many in order to treat an illness.

[19]  arXiv:2111.12153 [pdf]
Title: Methodology and feasibility of neurofeedback to improve visual attention to letters in mild Alzheimer's disease
Comments: 50 pages including 6 figures and 4 tables
Subjects: Human-Computer Interaction (cs.HC)

Brain-computer interface (BCI) systems are controlled by users through neurophysiological input for a variety of applications including communication, environmental control, motor rehabilitation, and cognitive training. Although individuals with severe speech and physical impairment are the primary users of this technology, BCIs have emerged as a potential tool for broader populations, especially with regard to delivering cognitive training or interventions with neurofeedback (NFB). The goal of this study was to investigate the feasibility of using a BCI system with neurofeedback as an intervention for people with mild Alzheimer's disease (AD). The study focused on visual attention and language since AD is often associated with functional impairments in language and reading. The study enrolled five adults with mild AD in a nine- to thirteen-week BCI EEG-based neurofeedback intervention to improve attention and reading skills. Two participants completed the intervention entirely. The remaining three participants could not complete the intervention phase because of restrictions related to COVID-19. Pre- and post-assessment measures were used to assess reliability of outcome measures and generalization of treatment to functional reading, processing speed, attention, and working memory skills. Participants demonstrated steady improvement in most cognitive measures across experimental phases, although there was not a significant effect of NFB on most measures of attention. One subject demonstrated statistically significant improvement in letter cancellation during NFB. All participants with mild AD learned to operate a BCI system with training. Results have broad implications for the design and use of BCI systems for participants with cognitive impairment. Preliminary evidence justifies implementing NFB-based cognitive measures in AD.

[20]  arXiv:2111.12154 [pdf, other]
Title: A Review on Analysis and Visualization Methods for Biclustering
Authors: Melih Sozdinler
Comments: 13 pages, 1 figure and 1 table
Subjects: Human-Computer Interaction (cs.HC); Data Structures and Algorithms (cs.DS)

Biclustering has recently become one of the hot topics in bioinformatics and has attracted the attention of authors from several different disciplines. Hence, many different methodologies from a variety of disciplines have been proposed as solutions to the biclustering problem. As a consequence, the variety of solutions makes it harder to evaluate the proposed methods. With this review paper, we aim to discuss both analysis and visualization of biclustering as a guide for comparisons between new and existing biclustering algorithms. Additionally, we concentrate on the tools that provide visualizations with accompanying analysis techniques. Throughout the paper, we give several references that also serve as a short review of the state of the art for those who will pursue research on biclustering. The paper is outlined as follows: we first give the visualization and analysis methods; then we evaluate each proposed tool by its visualization contribution and analysis options; finally, we discuss future directions for biclustering and propose standards for future work.

[21]  arXiv:2111.12155 [pdf]
Title: In-field early disease recognition of potato late blight based on deep learning and proximal hyperspectral imaging
Authors: Chao Qi (1 and 2), Murilo Sandroni (3), Jesper Cairo Westergaard (4), Ea Høegh Riis Sundmark (5), Merethe Bagge (5), Erik Alexandersson (3), Junfeng Gao (1 and 6) ((1) Lincoln Agri-Robotics, Lincoln Institute for Agri-Food Technology, University of Lincoln, Lincoln, UK, (2) College of Engineering, Nanjing Agricultural University, Nanjing 210031, China, (3) Department of Plant Protection Biology, Swedish University of Agricultural Sciences, Alnarp, Sweden, (4) Department of Plant and Environmental Sciences, University of Copenhagen, Taastrup, Denmark, (5) Danespo Breeding Company, Give, Denmark, (6) Lincoln Centre for Autonomous System, University of Lincoln, Lincoln, UK)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Effective early detection of potato late blight (PLB) is an essential aspect of potato cultivation. However, it is a challenge to detect late blight at an early stage in fields with conventional imaging approaches because of the lack of visual cues displayed at the canopy level. Hyperspectral imaging can capture spectral signals from a wide range of wavelengths, including those outside the visible range. In this context, we propose a deep learning classification architecture for hyperspectral images by combining a 2D convolutional neural network (2D-CNN) and a 3D-CNN with deep cooperative attention networks (PLB-2D-3D-A). First, 2D-CNN and 3D-CNN are used to extract rich spectral space features, and then the attention mechanism AttentionBlock and SE-ResNet are used to emphasize the salient features in the feature maps and increase the generalization ability of the model. The dataset is built with 15,360 images (64x64x204), cropped from 240 raw images captured in an experimental field with over 20 potato genotypes. The accuracy in the test dataset of 2000 images reached 0.739 in the full band and 0.790 in the specific bands (492nm, 519nm, 560nm, 592nm, 717nm and 765nm). This study shows an encouraging result for early detection of PLB with deep learning and proximal hyperspectral imaging.

[22]  arXiv:2111.12158 [pdf, other]
Title: Using Language Model to Bootstrap Human Activity Recognition Ambient Sensors Based in Smart Homes
Journal-ref: Electronics, MDPI, 2021, 10 (20), pp.2498
Subjects: Machine Learning (cs.LG)

Long Short-Term Memory (LSTM)-based structures have demonstrated their efficiency for daily-living activity recognition in smart homes by capturing the order of sensor activations and their temporal dependencies. Nevertheless, they still fail to deal with the semantics and the context of the sensors. More than isolated IDs and their ordered activation values, sensors also carry meaning. Indeed, their nature and type of activation can reflect various activities. Their logs are correlated with each other, creating a global context. We propose to use and compare two Natural Language Processing embedding methods to enhance LSTM-based structures in activity-sequence classification tasks: Word2Vec, a static semantic embedding, and ELMo, a contextualized embedding. Results on real smart home datasets indicate that this approach provides useful information, such as a sensor organization map, and causes less confusion between daily activity classes. It helps to perform better on datasets with competing activities of other residents or pets. Our tests also show that the embeddings can be pretrained on datasets other than the target one, enabling transfer learning. We thus demonstrate that taking into account the context of the sensors and their semantics increases classification performance and enables transfer learning.

[23]  arXiv:2111.12159 [pdf, other]
Title: Rhythm is a Dancer: Music-Driven Motion Synthesis with Global Structure
Subjects: Graphics (cs.GR); Machine Learning (cs.LG)

Synthesizing human motion with a global structure, such as a choreography, is a challenging task. Existing methods tend to concentrate on local smooth pose transitions and neglect the global context or the theme of the motion. In this work, we present a music-driven motion synthesis framework that generates long-term sequences of human motions which are synchronized with the input beats, and jointly form a global structure that respects a specific dance genre. In addition, our framework enables generation of diverse motions that are controlled by the content of the music, and not only by the beat. Our music-driven dance synthesis framework is a hierarchical system that consists of three levels: pose, motif, and choreography. The pose level consists of an LSTM component that generates temporally coherent sequences of poses. The motif level guides sets of consecutive poses to form a movement that belongs to a specific distribution using a novel motion perceptual-loss. And the choreography level selects the order of the performed movements and drives the system to follow the global structure of a dance genre. Our results demonstrate the effectiveness of our music-driven framework to generate natural and consistent movements on various dance types, having control over the content of the synthesized motions, and respecting the overall structure of the dance.

[24]  arXiv:2111.12166 [pdf, other]
Title: Towards Empirical Sandwich Bounds on the Rate-Distortion Function
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Machine Learning (stat.ML)

The rate-distortion (R-D) function, a key quantity in information theory, characterizes the fundamental limit of how much a data source can be compressed subject to a fidelity criterion, by any compression algorithm. As researchers push for ever-improving compression performance, establishing the R-D function of a given data source is not only of scientific interest, but also sheds light on the possible room for improving compression algorithms. Previous work on this problem relied on distributional assumptions on the data source (Gibson, 2017) or only applied to discrete data. By contrast, this paper makes the first attempt at an algorithm for sandwiching the R-D function of a general (not necessarily discrete) source requiring only i.i.d. data samples. We estimate R-D sandwich bounds on Gaussian and high-dimensional banana-shaped sources, as well as GAN-generated images. Our R-D upper bound on natural images indicates room for improving the performance of state-of-the-art image compression methods by 1 dB in PSNR at various bitrates.
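
For orientation, the Gaussian source mentioned in the abstract is a case where the R-D function is known in closed form under squared-error distortion. The textbook expression is reproduced below as a reference point only; the symbols \sigma^2 (source variance) and D (allowed mean squared error) are notation chosen here and do not come from the abstract. An upper and a lower estimate "sandwich" the function when the true R(D) lies between them at every distortion level.

```latex
% Scalar Gaussian source X ~ N(0, \sigma^2), squared-error distortion.
R(D) =
\begin{cases}
  \dfrac{1}{2}\,\log_2 \dfrac{\sigma^2}{D}, & 0 < D \le \sigma^2, \\[6pt]
  0, & D > \sigma^2,
\end{cases}
\qquad
R_{\mathrm{lower}}(D) \;\le\; R(D) \;\le\; R_{\mathrm{upper}}(D).
```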

[25]  arXiv:2111.12167 [pdf, other]
Title: PT-VTON: an Image-Based Virtual Try-On Network with Progressive Pose Attention Transfer
Comments: Short Version with 4 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Virtual try-on systems have gained great attention due to their potential to give customers a realistic, personalized product presentation in virtualized settings. In this paper, we present PT-VTON, a novel pose-transfer-based framework for cloth transfer that enables virtual try-on with arbitrary poses. PT-VTON can be applied to the fashion industry with minimal modification of existing systems while satisfying the overall visual fashionability and detailed fabric appearance requirements. It enables efficient clothes transfer between model and user images with arbitrary pose and body shape. We implement a prototype of PT-VTON and demonstrate that our system can match or surpass many other approaches when facing a drastic variation of poses by preserving detailed human and fabric characteristic appearances. PT-VTON is shown to outperform alternative approaches both on machine-based quantitative metrics and qualitative results.

[26]  arXiv:2111.12170 [pdf, other]
Title: Domain-Agnostic Clustering with Self-Distillation
Comments: NeurIPS 2021 Workshop: Self-Supervised Learning - Theory and Practice
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Recent advancements in self-supervised learning have reduced the gap between supervised and unsupervised representation learning. However, most self-supervised and deep clustering techniques rely heavily on data augmentation, rendering them ineffective for many learning tasks where insufficient domain knowledge exists for performing augmentation. We propose a new self-distillation based algorithm for domain-agnostic clustering. Our method builds upon the existing deep clustering frameworks and requires no separate student model. The proposed method outperforms existing domain-agnostic (augmentation-free) algorithms on CIFAR-10. We empirically demonstrate that knowledge distillation can improve unsupervised representation learning by extracting richer 'dark knowledge' from the model than using predicted labels alone. Preliminary experiments also suggest that self-distillation improves the convergence of DeepCluster-v2.

[27]  arXiv:2111.12172 [pdf, other]
Title: Multi-label Iterated Learning for Image Classification with Label Ambiguity
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Transfer learning from large-scale pre-trained models has become essential for many computer vision tasks. Recent studies have shown that datasets like ImageNet are weakly labeled since images with multiple object classes present are assigned a single label. This ambiguity biases models towards a single prediction, which could result in the suppression of classes that tend to co-occur in the data. Inspired by language emergence literature, we propose multi-label iterated learning (MILe) to incorporate the inductive biases of multi-label learning from single labels using the framework of iterated learning. MILe is a simple yet effective procedure that builds a multi-label description of the image by propagating binary predictions through successive generations of teacher and student networks with a learning bottleneck. Experiments show that our approach exhibits systematic benefits on ImageNet accuracy as well as ReaL F1 score, which indicates that MILe deals better with label ambiguity than the standard training procedure, even when fine-tuning from self-supervised weights. We also show that MILe is effective at reducing label noise, achieving state-of-the-art performance on real-world large-scale noisy data such as WebVision. Furthermore, MILe improves performance in class incremental settings such as IIRC and is robust to distribution shifts. Code: https://github.com/rajeswar18/MILe

[28]  arXiv:2111.12174 [pdf, other]
Title: Using Distributional Principles for the Semantic Study of Contextual Language Models
Authors: Olivier Ferret
Comments: PACLIC 35
Subjects: Computation and Language (cs.CL)

Many recent studies have investigated the properties of contextual language models but, surprisingly, only a few of them consider the properties of these models in terms of semantic similarity. In this article, we first focus on these properties for English by exploiting the distributional principle of substitution as a probing mechanism in the controlled context of SemCor and WordNet paradigmatic relations. Then, we propose to adapt the same method to a more open setting for characterizing the differences between static and contextual language models.

[29]  arXiv:2111.12175 [pdf, other]
Title: Three-Way Deep Neural Network for Radio Frequency Map Generation and Source Localization
Comments: 5 pages, 5 figures
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)

In this paper, we present a Generative Adversarial Network (GAN) machine learning model to interpolate irregularly distributed measurements across the spatial domain to construct a smooth radio frequency map (RFMap) and then perform localization using a deep neural network. Monitoring wireless spectrum over spatial, temporal, and frequency domains will become a critical feature in facilitating dynamic spectrum access (DSA) in beyond-5G and 6G communication technologies. Localization, wireless signal detection, and spectrum policy-making are several of the applications where distributed spectrum sensing will play a significant role. Detection and positioning of wireless emitters is a very challenging task in a large spectral and spatial area. In order to construct a smooth RFMap database, a large number of measurements are required, which can be very expensive and time-consuming. One approach to help realize these systems is to collect finite localized measurements across a given area and then interpolate the measurement values to construct the database. Current methods in the literature employ channel modeling to construct the radio frequency map, which lacks the granularity for accurate localization, whereas our proposed approach reconstructs a new generalized RFMap. Localization results are presented and compared with conventional channel models.

[30]  arXiv:2111.12181 [pdf, other]
Title: Channel Characterization of Diffusion-based Molecular Communication with Multiple Fully-absorbing Receivers
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

In this paper, an analytical model is introduced to describe the impulse response of the diffusive channel between a pointwise transmitter and a given fully-absorbing (FA) receiver in a molecular communication (MC) system. The presence of neighbouring FA nanomachines in the environment is taken into account by describing them as sources of negative molecules. The channel impulse responses of all the receivers are linked in a system of integral equations. The solution of the system with two receivers is obtained analytically. For a higher number of receivers, the system of integral equations is solved numerically. It is also shown that the channel impulse response shape is distorted by the presence of the interferers. For instance, there is a time shift of the peak in the number of absorbed molecules compared to the case without interference, as predicted by the proposed model. The analytical derivations are validated by means of particle-based simulations.

[31]  arXiv:2111.12182 [pdf]
Title: Identifying Terms and Conditions Important to Consumers using Crowdsourcing
Subjects: Human-Computer Interaction (cs.HC)

Terms and conditions (T&Cs) are pervasive on the web and often contain important information for consumers, but are rarely read. Previous research has explored methods to surface alarming privacy policies using manual labelers, natural language processing, and deep learning techniques. However, this prior work used pre-determined categories for annotations, and did not investigate what consumers really deem important from their perspective. In this paper, we instead combine crowdsourcing with an open definition of "what is important" in T&Cs. We present a workflow consisting of pairwise comparisons, agreement validation, and Bradley-Terry rank modeling, to effectively establish rankings of T&C statements from non-expert crowd workers on this open definition, and we further analyze consumers' preferences. We applied this workflow to 1,551 T&C statements from 27 e-commerce websites, contributed by 3,462 unique crowd workers doing 203,068 pairwise comparisons, and conducted thematic and readability analysis on the statements considered important or unimportant. We found that consumers especially cared about policies related to after-sales and money, and tended to regard harder-to-understand statements as more important. We also present machine learning models to identify T&C clauses that consumers considered important, achieving at best a 92.7% balanced accuracy, 91.6% recall, and 89.2% precision. We foresee using our workflow and model to efficiently and reliably highlight important T&Cs on websites at a large scale, improving consumers' awareness
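
The abstract names Bradley-Terry rank modeling as the step that turns pairwise comparisons into a ranking, but gives no implementation details. The following is a minimal sketch of how such a fit could look, using the standard minorization-maximization (MM) update for Bradley-Terry strengths. The function name `bradley_terry`, the `(winner, loser)` input format, and the example statement labels are assumptions made for illustration, not the authors' code.

```python
from collections import defaultdict


def bradley_terry(comparisons, n_iter=200, tol=1e-9):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Uses the classic MM update p_i <- W_i / sum_j n_ij / (p_i + p_j),
    where W_i is the number of wins of item i and n_ij is the number of
    comparisons between items i and j. Strengths are rescaled after each
    sweep so that they average to 1.
    """
    wins = defaultdict(float)
    pair_counts = defaultdict(float)
    items = set()
    for winner, loser in comparisons:
        wins[winner] += 1.0
        pair_counts[frozenset((winner, loser))] += 1.0
        items.update((winner, loser))

    strength = {item: 1.0 for item in items}
    for _ in range(n_iter):
        updated = {}
        for i in items:
            denom = 0.0
            for j in items:
                if i == j:
                    continue
                n_ij = pair_counts.get(frozenset((i, j)), 0.0)
                if n_ij:
                    denom += n_ij / (strength[i] + strength[j])
            # An item that was never compared keeps its current strength.
            updated[i] = wins[i] / denom if denom > 0 else strength[i]
        total = sum(updated.values())
        updated = {i: v * len(items) / total for i, v in updated.items()}
        delta = max(abs(updated[i] - strength[i]) for i in items)
        strength = updated
        if delta < tol:
            break
    return strength


if __name__ == "__main__":
    # Hypothetical votes: each tuple is (statement judged more important, other).
    votes = [("refund", "newsletter")] * 3 + [("newsletter", "refund")]
    scores = bradley_terry(votes)
    for name in sorted(scores, key=scores.get, reverse=True):
        print(name, scores[name])
```

Sorting items by the fitted strength gives the ranking. An item that never wins is driven to strength zero, which is the degenerate maximum-likelihood solution; a practical pipeline would likely add regularization or a prior to avoid this.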

[32]  arXiv:2111.12184 [pdf, other]
Title: Style-Guided Web Application Exploration
Subjects: Human-Computer Interaction (cs.HC); Software Engineering (cs.SE)

A wide range of analysis and testing techniques targeting modern web apps rely on the automated exploration of their state space by firing events that mimic user interactions. However, finding out which elements are actionable in web apps is not a trivial task. To improve the efficacy of exploring the event space of web apps, we propose a browser-independent, instrumentation-free approach based on structural and visual stylistic cues. Our approach, implemented in a tool called StyleX, employs machine learning models, trained on 700,000 web elements from 1,000 real-world websites, to predict actionable elements on a webpage a priori. In addition, our approach uses stylistic cues for ranking these actionable elements while exploring the app. Our actionable predictor models achieve 90.14% precision and 87.76% recall when considering the click event listener, and on average, 75.42% precision and 77.76% recall when considering the five most-frequent event types. Our evaluations show that StyleX can improve the JavaScript code coverage achieved by a general-purpose crawler by up to 23%.

[33]  arXiv:2111.12187 [pdf, other]
Title: Input Convex Gradient Networks
Comments: Accepted to NeurIPS 2021 Optimal Transport and Machine Learning Workshop
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

The gradients of convex functions are expressive models of non-trivial vector fields. For example, Brenier's theorem yields that the optimal transport map between any two measures on Euclidean space under the squared distance is realized as a convex gradient, which is a key insight used in recent generative flow models. In this paper, we study how to model convex gradients by integrating a Jacobian-vector product parameterized by a neural network, which we call the Input Convex Gradient Network (ICGN). We theoretically study ICGNs and compare them to taking the gradient of an Input-Convex Neural Network (ICNN), empirically demonstrating that a single layer ICGN can fit a toy example better than a single layer ICNN. Lastly, we explore extensions to deeper networks and connections to constructions from Riemannian geometry.

[34]  arXiv:2111.12193 [pdf, other]
Title: Multiset-Equivariant Set Prediction with Approximate Implicit Differentiation
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Most set prediction models in deep learning use set-equivariant operations, but they actually operate on multisets. We show that set-equivariant functions cannot represent certain functions on multisets, so we introduce the more appropriate notion of multiset-equivariance. We identify that the existing Deep Set Prediction Network (DSPN) can be multiset-equivariant without being hindered by set-equivariance and improve it with approximate implicit differentiation, allowing for better optimization while being faster and saving memory. In a range of toy experiments, we show that the perspective of multiset-equivariance is beneficial and that our changes to DSPN achieve better results in most cases. On CLEVR object property prediction, we substantially improve over the state-of-the-art Slot Attention from 8% to 77% in one of the strictest evaluation metrics because of the benefits made possible by implicit differentiation.

[35]  arXiv:2111.12197 [pdf, other]
Title: Fixed Points in Cyber Space: Rethinking Optimal Evasion Attacks in the Age of AI-NIDS
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Cyber attacks are increasing in volume, frequency, and complexity. In response, the security community is looking toward fully automating cyber defense systems using machine learning. However, so far the resultant effects on the coevolutionary dynamics of attackers and defenders have not been examined. In this whitepaper, we hypothesise that increased automation on both sides will accelerate the coevolutionary cycle, thus begging the question of whether there are any resultant fixed points, and how they are characterised. Working within the threat model of Locked Shields, Europe's largest cyberdefense exercise, we study blackbox adversarial attacks on network classifiers. Given already existing attack capabilities, we question the utility of optimal evasion attack frameworks based on minimal evasion distances. Instead, we suggest a novel reinforcement learning setting that can be used to efficiently generate arbitrary adversarial perturbations. We then argue that attacker-defender fixed points are themselves general-sum games with complex phase transitions, and introduce a temporally extended multi-agent reinforcement learning framework in which the resultant dynamics can be studied. We hypothesise that one plausible fixed point of AI-NIDS may be a scenario where the defense strategy relies heavily on whitelisted feature flow subspaces. Finally, we demonstrate that a continual learning approach is required to study attacker-defender dynamics in temporally extended general-sum games.

[36]  arXiv:2111.12202 [pdf]
Title: Combinations of Jaccard with Numerical Measures for Collaborative Filtering Enhancement: Current Work and Future Proposal
Comments: 13 pages, 6 Tables and 2 Figures
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

Collaborative filtering (CF) is an important approach for recommendation systems and is widely used in many aspects of our lives, especially in online commercial systems. One popular algorithm in CF is the K-nearest neighbors (KNN) algorithm, in which similarity measures are used to determine the nearest neighbors of a user, and thus to quantify the degree of dependency between the relevant user/item pair. Consequently, the CF approach is not merely sensitive to the similarity measure; it is entirely contingent on the selection of that measure. While Jaccard, one of the commonly used similarity measures for CF tasks, concerns the existence of ratings, other numerical measures such as cosine and Pearson concern the magnitude of ratings. In particular, Jaccard is not a dominant measure, but it has long been shown to be an important factor in improving any measure. Therefore, in our continuing efforts to find the most effective similarity measures for CF, this research focuses on proposing new similarity measures by combining Jaccard with several numerical measures. The combined measures would take advantage of both existence and magnitude. Experimental results on the MovieLens dataset showed that the combined measures outperform all single measures on the considered evaluation metrics.
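
To make the distinction between "existence" and "magnitude" concrete, here is a minimal sketch of one way a combined measure could be formed: multiplying Jaccard (overlap of rated items) by cosine similarity (agreement of rating values on co-rated items). The abstract does not say how the measures are combined, so the product form, the co-rated-items convention for cosine, and the function names are assumptions for illustration only.

```python
import math


def jaccard(u, v):
    """Existence-based similarity: shared rated items over all rated items."""
    items_u, items_v = set(u), set(v)
    union = items_u | items_v
    return len(items_u & items_v) / len(union) if union else 0.0


def cosine(u, v):
    """Magnitude-based similarity over the items both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def jaccard_cosine(u, v):
    """Assumed combination: product of the existence and magnitude terms."""
    return jaccard(u, v) * cosine(u, v)


if __name__ == "__main__":
    # Hypothetical users, given as {item: rating} dictionaries.
    alice = {"m1": 5, "m2": 3}
    bob = {"m1": 5, "m2": 3, "m3": 1, "m4": 2}
    print(jaccard(alice, bob), cosine(alice, bob), jaccard_cosine(alice, bob))
```

In this example the two users agree perfectly on the items they both rated, so cosine alone would call them identical, while the Jaccard factor discounts the score because they share only half of the rated items.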

[37]  arXiv:2111.12204 [pdf, other]
Title: The Reproducibility of Programming-Related Issues in Stack Overflow Questions
Comments: This article is under minor revision at the EMSE journal
Subjects: Software Engineering (cs.SE)

Software developers often look for solutions to their code-level problems using the Stack Overflow Q&A website. To receive help, developers frequently submit questions containing sample code segments and a description of the programming issue. Unfortunately, it is not always possible to reproduce the issues from the code segments, which may prevent questions from receiving prompt and appropriate solutions. We conducted an exploratory study on the reproducibility of issues discussed in 400 Java and 400 Python questions. We parsed, compiled, executed, and carefully examined the code segments from these questions to reproduce the reported programming issues. The outcomes of our study are three-fold. First, we found that we can reproduce approximately 68% of Java and 71% of Python issues, whereas we were unable to reproduce approximately 22% of Java and 19% of Python issues using the code segments. Of the issues that were reproducible, approximately 67% of the Java code segments and 20% of the Python code segments required minor or major modifications to reproduce the issues. Second, we carefully investigated why programming issues could not be reproduced and provided evidence-based guidelines for writing effective code examples for Stack Overflow questions. Third, we investigated the correlation between the issue reproducibility status of questions and the corresponding answer meta-data, such as the presence of an accepted answer. According to our analysis, a reproducible question is at least twice as likely to receive an accepted answer as an irreproducible question. In addition, the median time delay in receiving accepted answers is double if the issues reported in questions could not be reproduced. We also investigate the confounding factors (e.g., reputation) and find that confounding factors do not hurt the correlation between reproducibility status and answer meta-data.

[38]  arXiv:2111.12209 [pdf]
Title: Sistema de sensoriamento sem fio aplicavel a deteccao de incendios florestais
Comments: in Portuguese
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)

In this research work, a hardware and software system is developed that uses wireless sensors to monitor environmental variables such as temperature, gas concentration and luminosity, in order to detect the existence of forest fires. LoRa technology was used for the wireless sensor network, with a communication range that can reach on average up to 5km in urban areas and 10km in rural areas. The developed system also has an integrated web application (dashboard) that collects data in real time from the wireless sensors, which together form the sensor module, also called the device. This data is then presented on a map associated with the positioning of each sensor module. The developed system was tested using practical experiments that used flames, gases and lighting, simulating the occurrence of fires. The tests demonstrated the feasibility of the developed hardware/software system in detecting fires in the simulated scenarios. Therefore, it was found that the research is promising, and may advance in the future to the detection of real fires.

[39]  arXiv:2111.12210 [pdf, other]
Title: From Kepler to Newton: the Role of Explainable AI in Science Discovery
Comments: 14 pages, 8 figures, 6 tables
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Symbolic Computation (cs.SC)

The research paradigm of the Observation-Hypothesis-Prediction-Experimentation loop has been practiced by researchers for years towards scientific discovery. However, with the data explosion in both mega-scale and milli-scale scientific research, it has sometimes been very difficult to manually analyze the data and propose new hypotheses to drive the cycle for scientific discovery. In this paper, we introduce an Explainable AI-assisted paradigm for science discovery. The key is to use Explainable AI (XAI) to help derive data or model interpretations and science discoveries. We show how computational and data-intensive methodology, together with experimental and theoretical methodology, can be seamlessly integrated for scientific research. To demonstrate the AI-assisted science discovery process, and to pay our respect to some of the greatest minds in human history, we show how Kepler's laws of planetary motion and Newton's law of universal gravitation can be rediscovered by (explainable) AI based on the astronomical observation data of Tycho Brahe, whose work led the scientific revolution in the 16th and 17th centuries. This work also highlights the importance of Explainable AI (as compared to black-box AI) in science discovery to help humans prevent or better prepare for the possible technological singularity which may happen in the future.

[40]  arXiv:2111.12212 [pdf, other]
Title: Long-Term CSI-based Design for RIS-Aided Multiuser MISO Systems Exploiting Deep Reinforcement Learning
Comments: Under revision in IEEE journal. Keywords: Reconfigurable intelligent surface (RIS), intelligent reflecting surface (IRS)
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

In this paper, we study the transmission design for reconfigurable intelligent surface (RIS)-aided multiuser communication networks. Unlike most existing contributions, we consider long-term CSI-based transmission design, where both the beamforming vectors at the base station (BS) and the phase shifts at the RIS are designed based on long-term CSI, which can significantly reduce the channel estimation overhead. Due to the lack of an explicit ergodic data rate expression, we propose a novel deep deterministic policy gradient (DDPG) based algorithm to solve the optimization problem, which is trained using channel vectors generated in an offline manner. Simulation results demonstrate that the achievable net throughput is higher than that achieved by the conventional instantaneous-CSI based scheme when taking the channel estimation overhead into account.

[41]  arXiv:2111.12213 [pdf, other]
Title: Ex-DoF: Expansion of Action Degree-of-Freedom with Virtual Camera Rotation for Omnidirectional Image
Authors: Kosuke Tahara, Noriaki Hirose (Toyota Central R&D Labs., Inc.)
Comments: 8 pages, 9 figures, 2 tables
Subjects: Robotics (cs.RO)

Inter-robot transfer of training data is a little-explored topic in learning and vision-based robot control. Thus, we propose a transfer method from a robot with a lower Degree-of-Freedom (DoF) action to one with a higher DoF utilizing an omnidirectional camera. The virtual rotation of the robot camera enables data augmentation in this transfer learning process. In this study, a vision-based control policy for a 6-DoF robot was trained using a dataset collected by a differential wheeled ground robot with only three DoFs. Towards application to robotic manipulation, we also demonstrate a control system of a 6-DoF arm robot using multiple policies with different fields of view to enable object reaching tasks.

[42]  arXiv:2111.12217 [pdf, other]
Title: Scale-Invariant Strength Assortativity of Streaming Butterflies
Comments: Submitted for publication
Subjects: Data Structures and Algorithms (cs.DS); Databases (cs.DB); Discrete Mathematics (cs.DM); Social and Information Networks (cs.SI)

Bipartite graphs are rich data structures with prevalent applications and identifier structural features. However, less is known about their growth patterns, particularly in streaming settings. Current works study the patterns of static or aggregated temporal graphs optimized for certain down-stream analytics or ignoring multipartite/non-stationary data distributions, emergence patterns of subgraphs, and streaming paradigms. To address these, we perform statistical network analysis over web log streams and identify the governing patterns underlying the bursty emergence of mesoscopic building blocks, 2,2-bicliques known as butterflies, leading to a phenomenon that we call "scale-invariant strength assortativity of streaming butterflies". We provide the graph-theoretic explanation of this phenomenon. We further introduce a set of micro-mechanics in the body of a streaming growth algorithm, sGrow, to pinpoint the generative origins. sGrow supports streaming paradigms and the emergence of 4-vertex graphlets, and provides user-specified configurations for the scale, burstiness, level of strength assortativity, probability of out-of-order records, generation time, and time-sensitive connections. Comprehensive evaluations on pattern reproduction and stress testing validate the effectiveness, efficiency, and robustness of sGrow in realizing the observed patterns independent of initial conditions, scale, temporal characteristics, and model configurations. Theoretical and experimental analyses verify the robust ability of sGrow to generate streaming graphs based on user-specified configurations that affect the scale and burstiness of the stream, level of strength assortativity, probability of out-of-order streaming records, generation time, and time-sensitive connections.
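
As a concrete illustration of the "butterfly" building block, a 2,2-biclique is two vertices on one side of a bipartite graph that share two neighbours on the other side. The sketch below counts butterflies in a static edge list by counting, for every pair of left vertices, how many right vertices they have in common. It is a plain offline counter written for illustration, not the streaming analysis or the sGrow algorithm described in the abstract; the function name and the `(left, right)` edge format are assumptions.

```python
from collections import Counter
from itertools import combinations


def count_butterflies(edges):
    """Count 2,2-bicliques in a bipartite graph given as (left, right) edges.

    Two left vertices with w common right neighbours take part in
    w * (w - 1) / 2 butterflies together, so it suffices to count common
    neighbours ("wedges") for every pair of left vertices.
    """
    lefts_of = {}
    for left, right in set(edges):  # set() drops duplicate edges
        lefts_of.setdefault(right, set()).add(left)

    wedges = Counter()
    for lefts in lefts_of.values():
        for pair in combinations(sorted(lefts), 2):
            wedges[pair] += 1

    return sum(w * (w - 1) // 2 for w in wedges.values())


if __name__ == "__main__":
    # Hypothetical web-log style edges: (user, page).
    log = [("u1", "p1"), ("u1", "p2"), ("u2", "p1"), ("u2", "p2"), ("u3", "p1")]
    print(count_butterflies(log))
```

The pairwise enumeration is quadratic in the degree of each right vertex, which is acceptable for a small example but is exactly the kind of cost that motivates specialized streaming approaches.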

[43]  arXiv:2111.12218 [pdf, other]
Title: Flexible Pattern Discovery and Analysis
Comments: Preprint. 10 figures, 4 tables
Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI)

Based on the analysis of the proportion of utility in the supporting transactions used in the field of data mining, high utility-occupancy pattern mining (HUOPM) has recently attracted widespread attention. Unlike high-utility pattern mining (HUPM), which involves the enumeration of high-utility (e.g., profitable) patterns, HUOPM aims to find patterns representing a collection of existing transactions. In practical applications, however, not all patterns are used or valuable. For example, a pattern might contain too many items, that is, the pattern might be too specific and therefore lack value for users in real life. To achieve qualified patterns with a flexible length, we constrain the minimum and maximum lengths during the mining process and introduce a novel algorithm for the mining of flexible high utility-occupancy patterns. Our algorithm is referred to as HUOPM+. To ensure the flexibility of the patterns and tighten the upper bound of the utility-occupancy, a strategy called the length upper-bound (LUB) is presented to prune the search space. In addition, a utility-occupancy nested list (UO-nlist) and a frequency-utility-occupancy table (FUO-table) are employed to avoid multiple scans of the database. Evaluation results of the subsequent experiments confirm that the proposed algorithm can effectively control the length of the derived patterns, for both real-world and synthetic datasets. Moreover, it can decrease the execution time and memory consumption.

[44]  arXiv:2111.12221 [pdf]
Title: Source-free unsupervised domain adaptation for cross-modality abdominal multi-organ segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

It is valuable to achieve domain adaptation to transfer the learned knowledge from the source labeled CT dataset to the target unlabeled MR dataset for abdominal multi-organ segmentation. Meanwhile, it is highly desirable to avoid the high annotation cost of the target dataset and to protect the privacy of the source dataset. Therefore, we propose an effective source-free unsupervised domain adaptation method for cross-modality abdominal multi-organ segmentation without accessing the source dataset. The proposed framework includes two stages. In the first stage, the feature map statistics loss is used to align the distributions of the source and target features in the top segmentation network, and entropy minimization loss is used to encourage high-confidence segmentations. The pseudo-labels output by the top segmentation network are used to guide the style compensation network to generate source-like images. The pseudo-labels output by the middle segmentation network are used to supervise the learning of the desired model (the bottom segmentation network). In the second stage, circular learning and pixel-adaptive mask refinement are used to further improve the performance of the desired model. With this approach, we achieve satisfactory performance on the segmentation of liver, right kidney, left kidney, and spleen, with dice similarity coefficients of 0.884, 0.891, 0.864, and 0.911, respectively. In addition, the proposed approach can be easily extended to the situation where target annotation data exists. The performance improves from 0.888 to 0.922 in average dice similarity coefficient, close to supervised learning (0.929), with only one labeled MR volume.
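
The results above are reported as dice similarity coefficients. As a reminder of what that metric measures, here is a minimal sketch of the usual definition, 2|A ∩ B| / (|A| + |B|), applied to flattened label masks. The function name, the flat-list input format, and the convention of returning 1.0 when both masks are empty are assumptions for illustration; the abstract does not describe the evaluation code.

```python
def dice_coefficient(pred, target, label=1):
    """Dice similarity between two flattened label masks for one class."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    pred_hits = sum(1 for p in pred if p == label)
    target_hits = sum(1 for t in target if t == label)
    overlap = sum(1 for p, t in zip(pred, target) if p == label and t == label)
    if pred_hits + target_hits == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * overlap / (pred_hits + target_hits)


if __name__ == "__main__":
    # Toy masks: one of two predicted voxels overlaps the reference.
    print(dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0]))

    # Per-organ scores quoted in the abstract (liver, right kidney,
    # left kidney, spleen); their mean is the "average dice" figure.
    per_organ = [0.884, 0.891, 0.864, 0.911]
    print(sum(per_organ) / len(per_organ))
```

The mean of the four per-organ scores is about 0.8875, which is consistent with the average of 0.888 quoted as the starting point before a labeled MR volume is added.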

[45]  arXiv:2111.12229 [pdf, other]
Title: Subspace Adversarial Training
Subjects: Machine Learning (cs.LG)

Single-step adversarial training (AT) has received wide attention as it has proved to be both efficient and robust. However, it suffers from a serious problem of catastrophic overfitting, i.e., the robust accuracy against projected gradient descent (PGD) attack suddenly drops to $0\%$ during training. In this paper, we approach this problem from a novel optimization perspective and are the first to reveal the close link between the fast-growing gradient of each sample and overfitting, which can also be applied to understand the robust overfitting phenomenon in multi-step AT. To control the growth of the gradient during training, we propose a new AT method, subspace adversarial training (Sub-AT), which constrains the AT in a carefully extracted subspace. It successfully resolves both kinds of overfitting and hence significantly boosts the robustness. In the subspace, we also allow single-step AT with larger steps and a larger radius, which further improves the robustness performance. As a result, we achieve state-of-the-art single-step AT performance: our pure single-step AT can reach over $\mathbf{51}\%$ robust accuracy against a strong PGD-50 attack with radius $8/255$ on CIFAR-10, even surpassing the standard multi-step PGD-10 AT with huge computational advantages. The code is released at https://github.com/nblt/Sub-AT.

[46]  arXiv:2111.12231 [pdf, other]
Title: Universal Deep Network for Steganalysis of Color Image based on Channel Representation
Comments: To be improved version
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)

Up to now, most existing steganalytic methods are designed for grayscale images, and they are not suitable for color images that are widely used in current social networks. In this paper, we design a universal color image steganalysis network (called UCNet) in spatial and JPEG domains. The proposed method includes preprocessing, convolutional, and classification modules. To preserve the steganographic artifacts in each color channel, in preprocessing module, we firstly separate the input image into three channels according to the corresponding embedding spaces (i.e. RGB for spatial steganography and YCbCr for JPEG steganography), and then extract the image residuals with 62 fixed high-pass filters, finally concatenate all truncated residuals for subsequent analysis rather than adding them together with normal convolution like existing CNN-based steganalyzers. To accelerate the network convergence and effectively reduce the number of parameters, in convolutional module, we carefully design three types of layers with different shortcut connections and group convolution structures to further learn high-level steganalytic features. In classification module, we employ a global average pooling and fully connected layer for classification. We conduct extensive experiments on ALASKA II to demonstrate that the proposed method can achieve state-of-the-art results compared with the modern CNN-based steganalyzers (e.g., SRNet and J-YeNet) in both spatial and JPEG domains, while keeping relatively few memory requirements and training time. Furthermore, we also provide necessary descriptions and many ablation experiments to verify the rationality of the network design.

[47]  arXiv:2111.12232 [pdf, other]
Title: PMSSC: Parallelizable Multi-Subset based Self-Expressive Model for Subspace Clustering
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Subspace clustering methods that embrace a self-expressive model, which represents each data point as a linear combination of other data points in the dataset, are powerful unsupervised learning techniques. However, when dealing with large-scale datasets, the representation of each data point by referring to all data points as a dictionary suffers from high computational complexity. To alleviate this issue, we introduce a parallelizable multi-subset based self-expressive model (PMS) which represents each data point by combining multiple subsets, with each consisting of only a small percentage of samples. The adoption of PMS in subspace clustering (PMSSC) leads to computational advantages because each optimization problem decomposed into each subset is small, and can be solved efficiently in parallel. Besides, PMSSC is able to combine multiple self-expressive coefficient vectors obtained from subsets, which contributes to the improvement of self-expressiveness. Extensive experiments on synthetic data and real-world datasets show the efficiency and effectiveness of our approach against competitive methods.

[48]  arXiv:2111.12233 [pdf, other]
Title: Scaling Up Vision-Language Pre-training for Image Captioning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

In recent years, we have witnessed a significant performance boost in the image captioning task based on vision-language pre-training (VLP). Scale is believed to be an important factor for this advance. However, most existing work only focuses on pre-training transformers with moderate sizes (e.g., 12 or 24 layers) on roughly 4 million images. In this paper, we present LEMON, a LargE-scale iMage captiONer, and provide the first empirical study on the scaling behavior of VLP for image captioning. We use the state-of-the-art VinVL model as our reference model, which consists of an image feature extractor and a transformer model, and scale the transformer both up and down, with model sizes ranging from 13 to 675 million parameters. In terms of data, we conduct experiments with up to 200 million image-text pairs which are automatically collected from the web based on the alt attribute of the image (dubbed as ALT200M). Extensive analysis helps to characterize the performance trend as the model size and the pre-training data size increase. We also compare different training recipes, especially for training on large-scale noisy data. As a result, LEMON achieves new state-of-the-art results on several major image captioning benchmarks, including COCO Caption, nocaps, and Conceptual Captions. We also show LEMON can generate captions with long-tail visual concepts when used in a zero-shot manner.

[49]  arXiv:2111.12238 [pdf, other]
Title: Composing Loop-carried Dependence with Other Loops
Subjects: Programming Languages (cs.PL)

Sparse fusion is a compile-time loop transformation and runtime scheduling implemented as a domain-specific code generator. Sparse fusion generates efficient parallel code for the combination of two sparse matrix kernels where at least one of the kernels has loop-carried dependencies. Available implementations optimize individual sparse kernels. When optimized separately, the irregular dependence patterns of sparse kernels create synchronization overheads and load imbalance, and their irregular memory access patterns result in inefficient cache usage, which reduces parallel efficiency. Sparse fusion uses a novel inspection strategy with code transformations to generate parallel fused code for sparse kernel combinations that is optimized for data locality and load balance. Code generated by Sparse fusion outperforms the existing implementations ParSy and MKL on average 1.6X and 5.1X respectively and outperforms the LBC and DAGP coarsening strategies applied to a fused data dependence graph on average 5.1X and 7.2X respectively for various kernel combinations.

[50]  arXiv:2111.12239 [pdf, ps, other]
Title: Harmonic Centrality of Some Graph Families
Comments: 10 pages, 5 figures
Subjects: Discrete Mathematics (cs.DM); Combinatorics (math.CO)

One of the more recent measures of centrality in social network analysis is the normalized harmonic centrality. A variant of the closeness centrality, harmonic centrality sums the inverse of the geodesic distances of each node to other nodes where it is 0 if there is no path from one node to another. It is then normalized by dividing it by m-1, where m is the number of nodes of the graph. In this paper, we present notions regarding the harmonic centrality of some important classes of graphs.
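
The definition in this abstract can be made concrete with a short sketch. The code below is an illustration written for this listing, not code from the paper: it assumes an unweighted, undirected graph stored as an adjacency dictionary, and the function names `bfs_distances` and `normalized_harmonic_centrality` are hypothetical. Unreachable nodes contribute 0 to the sum, and the sum is divided by m-1, as described above.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return geodesic (hop) distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def normalized_harmonic_centrality(adj):
    """Sum of inverse distances to all other nodes, divided by m - 1.

    Nodes with no path from u never appear in the BFS result, so they
    contribute 0 to the sum.
    """
    m = len(adj)
    if m < 2:
        return {u: 0.0 for u in adj}
    centrality = {}
    for u in adj:
        dist = bfs_distances(adj, u)
        total = sum(1.0 / d for v, d in dist.items() if v != u)
        centrality[u] = total / (m - 1)
    return centrality


# Path graph 0 - 1 - 2: the middle node is the most central.
path = {0: [1], 1: [0, 2], 2: [1]}
print(normalized_harmonic_centrality(path))
```

For the path on three nodes, an end node has distances 1 and 2 to the others, so its value is (1 + 1/2) / 2 = 0.75, while the middle node reaches both neighbours in one hop and gets 1.0.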

[51]  arXiv:2111.12241 [pdf, other]
Title: Hierarchical Federated Learning based Anomaly Detection using Digital Twins for Smart Healthcare
Subjects: Cryptography and Security (cs.CR)

Internet of Medical Things (IoMT) is becoming ubiquitous with a proliferation of smart medical devices and applications used in smart hospitals, smart-home based care, and nursing homes. It utilizes smart medical devices and cloud computing services along with core Internet of Things (IoT) technologies to sense patients' vital body parameters, monitor health conditions and generate multivariate data to support just-in-time health services. Mostly, this large amount of data is analyzed in centralized servers. Anomaly Detection (AD) in a centralized healthcare ecosystem is often plagued by significant delays in response time with high performance overhead. Moreover, there are inherent privacy issues associated with sending patients' personal health data to a centralized server, which may also introduce several security threats to the AD model, such as possibility of data poisoning. To overcome these issues with centralized AD models, here we propose a Federated Learning (FL) based AD model which utilizes edge cloudlets to run AD models locally without sharing patients' data. Since existing FL approaches perform aggregation on a single server which restricts the scope of FL, in this paper, we introduce a hierarchical FL that allows aggregation at different levels enabling multi-party collaboration. We introduce a novel disease-based grouping mechanism where different AD models are grouped based on specific types of diseases. Furthermore, we develop a new Federated Time Distributed (FedTimeDis) Long Short-Term Memory (LSTM) approach to train the AD model. We present a Remote Patient Monitoring (RPM) use case to demonstrate our model, and illustrate a proof-of-concept implementation using Digital Twin (DT) and edge cloudlets.

[52]  arXiv:2111.12242 [pdf, other]
Title: PU-Transformer: Point Cloud Upsampling Transformer
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Given the rapid development of 3D scanners, point clouds are becoming popular in AI-driven machines. However, point cloud data is inherently sparse and irregular, causing major difficulties for machine perception. In this work, we focus on the point cloud upsampling task that intends to generate dense high-fidelity point clouds from sparse input data. Specifically, to activate the transformer's strong capability in representing features, we develop a new variant of a multi-head self-attention structure to enhance both point-wise and channel-wise relations of the feature map. In addition, we leverage a positional fusion block to comprehensively capture the local context of point cloud data, providing more position-related information about the scattered points. As the first transformer model introduced for point cloud upsampling, we demonstrate the outstanding performance of our approach by comparing with the state-of-the-art CNN-based methods on different benchmarks quantitatively and qualitatively.

[53]  arXiv:2111.12243 [pdf, other]
Title: Differentiating-based Vectorization for Sparse Kernels
Subjects: Programming Languages (cs.PL)

Sparse computations frequently appear in scientific simulations and the performance of these simulations relies heavily on the optimization of the sparse codes. The compact data structures and irregular computation patterns in sparse matrix computations introduce challenges to vectorizing these codes. Available approaches primarily vectorize regular regions of computations in the sparse code. They also reorganize data and computations, at a cost, to increase the number of regular regions. In this work, we propose a novel polyhedral model, called the partially strided codelets (PSC), that enables the vectorization of computation regions with irregular data access patterns. PSCs also improve data locality in sparse computation. Our DDF inspector-executor framework efficiently mines the memory accesses in the sparse computation, using an access function differentiation approach, to find PSC codelets. It generates vectorized code for the sparse matrix-vector multiplication kernel (SpMV), a kernel with parallel outer loops, and for kernels with carried dependence, specifically the sparse triangular solver (SpTRSV). We demonstrate the performance of the DDF-generated code on a set of 60 large and small matrices (0.05-130M nonzeros). DDF outperforms the highly specialized library MKL with an average speedup of 1.93X and 4.5X for SpMV and SpTRSV, respectively. For the same matrices, DDF outperforms the state-of-the-art inspector-executor framework Sympiler [1] for the SpTRSV kernel by up to 11X and the work by Augustine et al. [2] for the SpMV kernel by up to 12X.
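
To make the two kernels named in this abstract concrete, here is a minimal reference sketch of SpMV (sparse matrix-vector multiplication) and a lower-triangular SpTRSV over the common CSR layout. It is plain scalar Python written for illustration and is not the DDF framework or its generated code; the function names and the CSR argument names (`indptr`, `indices`, `data`) are assumptions. It shows why SpMV has independent outer iterations while SpTRSV carries a dependence from row to row, and where the irregular, index-driven memory accesses come from.

```python
def spmv_csr(indptr, indices, data, x):
    """y = A @ x for a CSR matrix. Each row i is independent of the others."""
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        for k in range(indptr[i], indptr[i + 1]):
            # x is read through indices[k]: an irregular access pattern.
            y[i] += data[k] * x[indices[k]]
    return y


def sptrsv_csr_lower(indptr, indices, data, b):
    """Solve L @ x = b for a lower-triangular CSR matrix L.

    Row i reads x[j] for j < i, so iterations carry a dependence on
    earlier rows and cannot simply run in parallel.
    """
    n = len(indptr) - 1
    x = [0.0] * n
    for i in range(n):
        acc = b[i]
        diag = None
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j == i:
                diag = data[k]
            elif j < i:
                acc -= data[k] * x[j]  # depends on previously solved rows
            else:
                raise ValueError("matrix is not lower triangular")
        if diag is None or diag == 0:
            raise ValueError("missing or zero diagonal entry")
        x[i] = acc / diag
    return x


# L = [[2, 0, 0],
#      [1, 3, 0],
#      [0, 4, 5]] stored in CSR form.
indptr = [0, 1, 3, 5]
indices = [0, 0, 1, 1, 2]
data = [2.0, 1.0, 3.0, 4.0, 5.0]

b = spmv_csr(indptr, indices, data, [1.0, 2.0, 3.0])
print(b)                                            # [2.0, 7.0, 23.0]
print(sptrsv_csr_lower(indptr, indices, data, b))   # [1.0, 2.0, 3.0]
```

The round trip (multiply, then solve) recovers the original vector, which is a convenient sanity check when experimenting with alternative loop orders or layouts.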

[54]  arXiv:2111.12253 [pdf, other]
Title: Third-party Service Dependencies and Centralization Around the World
Authors: Rashna Kumar, Sana Asif, Elise Lee, Fabián E. Bustamante (Northwestern University)
Comments: 17 pages, 14 figures
Subjects: Networking and Internet Architecture (cs.NI)

There is a growing concern about consolidation trends in Internet services, with, for instance, a large fraction of popular websites depending on a handful of third-party service providers. In this paper, we report on a large-scale study of third-party dependencies around the world, using vantage points from 50 countries, from all inhabited continents, and regional top-500 popular websites. This broad perspective shows that dependencies vary widely around the world. We find that between 15% and as much as 80% of websites, across all countries, depend on a DNS, CDN or CA third-party provider. Sites' critical dependencies, while lower, are equally spread, ranging from 9% to 61% (CDN and DNS in China, respectively). Despite this high variability, our results suggest a highly concentrated market of third-party providers: three third-party providers across all countries serve an average of 91.2% and Google, by itself, serves an average of 72% of the surveyed websites. We explore various factors that may help explain the differences and similarities in degrees of third-party dependency across countries, including economic conditions, Internet development, language, and economic trading partners.

[55]  arXiv:2111.12255 [pdf, other]
Title: A Family of Independent Variable Eddington Factor Methods with Efficient Linear Solvers
Subjects: Numerical Analysis (math.NA); Computational Physics (physics.comp-ph)

We present a family of discretizations for the Variable Eddington Factor (VEF) equations that have high-order accuracy on curved meshes and efficient preconditioned iterative solvers. The VEF discretizations are combined with a high-order Discontinuous Galerkin transport discretization to form an effective high-order, linear transport method. The VEF discretizations are derived by extending the unified analysis of Discontinuous Galerkin methods for elliptic problems to the VEF equations. This framework is used to define analogs of the interior penalty, second method of Bassi and Rebay, minimal dissipation local Discontinuous Galerkin, and continuous finite element methods. The analysis of subspace correction preconditioners, which use a continuous operator to iteratively precondition the discontinuous discretization, is extended to the case of the non-symmetric VEF system. Numerical results demonstrate that the VEF discretizations have arbitrary-order accuracy on curved meshes, preserve the thick diffusion limit, and are effective on a proxy problem from thermal radiative transfer in both outer transport iterations and inner preconditioned linear solver iterations. In addition, a parallel weak scaling study of the interior penalty VEF discretization demonstrates the scalability of the method out to 1152 processors.

[56]  arXiv:2111.12256 [pdf, other]
Title: ACD-EDMD: Analytical Construction for Dictionaries of Lifting Functions in Koopman Operator-based Nonlinear Robotic Systems
Comments: Accepted to IEEE Robotics and Automation Letters (RA-L), November 2021
Subjects: Robotics (cs.RO)

Koopman operator theory has been gaining momentum for model extraction, planning, and control of data-driven robotic systems. The Koopman operator's ability to extract dynamics from data depends heavily on the selection of an appropriate dictionary of lifting functions. In this paper we propose ACD-EDMD, a new method for Analytical Construction of Dictionaries of appropriate lifting functions for a range of data-driven Koopman operator based nonlinear robotic systems. The key insight of this work is that information about fundamental topological spaces of the nonlinear system (such as its configuration space and workspace) can be exploited to steer the construction of Hermite polynomial-based lifting functions. We show that the proposed method leads to dictionaries that are simple to implement while enjoying provable completeness and convergence guarantees when observables are weighted bounded. We evaluate ACD-EDMD using a range of diverse nonlinear robotic systems in both simulated and physical hardware experimentation (a wheeled mobile robot, a two-revolute-joint robotic arm, and a soft robotic leg). Results reveal that our method leads to dictionaries that enable high-accuracy prediction and that can generalize to diverse validation sets. The associated GitHub repository of our algorithm can be accessed at https://github.com/UCR-Robotics/ACD-EDMD.

[57]  arXiv:2111.12257 [pdf, ps, other]
Title: Post-Quantum Zero Knowledge, Revisited (or: How to Do Quantum Rewinding Undetectably)
Comments: 96 pages, 9 figures
Subjects: Cryptography and Security (cs.CR); Quantum Physics (quant-ph)

A major difficulty in quantum rewinding is the fact that measurement is destructive: extracting information from a quantum state irreversibly changes it. This is especially problematic in the context of zero-knowledge simulation, where preserving the adversary's state is essential.
In this work, we develop new techniques for quantum rewinding in the context of extraction and zero-knowledge simulation:
(1) We show how to extract information from a quantum adversary by rewinding it without disturbing its internal state. We use this technique to prove that important interactive protocols, such as the Goldreich-Micali-Wigderson protocol for graph non-isomorphism and the Feige-Shamir protocol for NP, are zero-knowledge against quantum adversaries.
(2) We prove that the Goldreich-Kahan protocol for NP is post-quantum zero knowledge using a simulator that can be seen as a natural quantum extension of the classical simulator.
Our results achieve (constant-round) black-box zero-knowledge with negligible simulation error, appearing to contradict a recent impossibility result due to Chia-Chung-Liu-Yamakawa (FOCS 2021). This brings us to our final contribution:
(3) We introduce coherent-runtime expected quantum polynomial time, a computational model that (a) captures all of our zero-knowledge simulators, (b) cannot break any polynomial hardness assumptions, and (c) is not subject to the CCLY impossibility. In light of our positive results and the CCLY negative results, we propose coherent-runtime simulation to be the right quantum analogue of classical expected polynomial-time simulation.

[58]  arXiv:2111.12262 [pdf, other]
Title: Reinforcement Learning based Path Exploration for Sequential Explainable Recommendation
Comments: arXiv admin note: substantial text overlap with arXiv:2101.01433
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

Recent advances in path-based explainable recommendation systems have attracted increasing attention thanks to the rich information provided by knowledge graphs. Most existing explainable recommendations only utilize static knowledge graphs and ignore the dynamic user-item evolutions, leading to less convincing and inaccurate explanations. Although there are some works that realize that modelling user's temporal sequential behaviour could boost the performance and explainability of the recommender systems, most of them either only focus on modelling user's sequential interactions within a path or independently and separately of the recommendation mechanism. In this paper, we propose a novel Temporal Meta-path Guided Explainable Recommendation leveraging Reinforcement Learning (TMER-RL), which utilizes reinforcement item-item path modelling between consecutive items with attention mechanisms to sequentially model dynamic user-item evolutions on dynamic knowledge graph for explainable recommendation. Compared with existing works that use heavy recurrent neural networks to model temporal information, we propose simple but effective neural networks to capture users' historical item features and path-based context to characterize the next purchased item. Extensive evaluations of TMER on two real-world datasets show state-of-the-art performance compared against recent strong baselines.

[59]  arXiv:2111.12263 [pdf, other]
Title: APANet: Adaptive Prototypes Alignment Network for Few-Shot Semantic Segmentation
Comments: 11 pages, Submitted to IEEE TMM. arXiv admin note: substantial text overlap with arXiv:2104.09216
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Few-shot semantic segmentation aims to segment novel-class objects in a given query image with only a few labeled support images. Most advanced solutions exploit a metric learning framework that performs segmentation through matching each query feature to a learned class-specific prototype. However, this framework suffers from biased classification due to incomplete feature comparisons. To address this issue, we present an adaptive prototype representation by introducing class-specific and class-agnostic prototypes and thus construct complete sample pairs for learning semantic alignment with query features. The complementary features learning manner effectively enriches feature comparison and helps yield an unbiased segmentation model in the few-shot setting. It is implemented with a two-branch end-to-end network (i.e., a class-specific branch and a class-agnostic branch), which generates prototypes and then combines query features to perform comparisons. In addition, the proposed class-agnostic branch is simple yet effective. In practice, it can adaptively generate multiple class-agnostic prototypes for query images and learn feature alignment in a self-contrastive manner. Extensive experiments on PASCAL-5$^i$ and COCO-20$^i$ demonstrate the superiority of our method. At no expense of inference efficiency, our model achieves state-of-the-art results in both 1-shot and 5-shot settings for semantic segmentation.

[60]  arXiv:2111.12264 [pdf, other]
Title: Pixel-wise Energy-biased Abstention Learning for Anomaly Segmentation on Complex Urban Driving Scenes
Subjects: Computer Vision and Pattern Recognition (cs.CV)

State-of-the-art (SOTA) anomaly segmentation approaches on complex urban driving scenes explore pixel-wise classification uncertainty learned from outlier exposure, or external reconstruction models. However, previous uncertainty approaches that directly associate high uncertainty to anomaly may sometimes lead to incorrect anomaly predictions, and external reconstruction models tend to be too inefficient for real-time self-driving embedded systems. In this paper, we propose a new anomaly segmentation method, named pixel-wise energy-biased abstention learning (PEBAL), that explores pixel-wise abstention learning (AL) with a model that learns an adaptive pixel-level anomaly class, and an energy-based model (EBM) that learns inlier pixel distribution. More specifically, PEBAL is based on a non-trivial joint training of EBM and AL, where EBM is trained to output high-energy for anomaly pixels (from outlier exposure) and AL is trained such that these high-energy pixels receive adaptive low penalty for being included to the anomaly class. We extensively evaluate PEBAL against the SOTA and show that it achieves the best performance across four benchmarks. Code is available at https://github.com/tianyu0207/PEBAL.

[61]  arXiv:2111.12265 [pdf, other]
Title: Distribution Estimation to Automate Transformation Policies for Self-Supervision
Comments: NeurIPS 2021 Workshop: Self-Supervised Learning - Theory and Practice
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In recent visual self-supervision works, an imitated classification objective, called pretext task, is established by assigning labels to transformed or augmented input images. The goal of pretext can be predicting what transformations are applied to the image. However, it is observed that image transformations already present in the dataset might be less effective in learning such self-supervised representations. Building on this observation, we propose a framework based on generative adversarial network to automatically find the transformations which are not present in the input dataset and thus effective for the self-supervised learning. This automated policy allows to estimate the transformation distribution of a dataset and also construct its complementary distribution from which training pairs are sampled for the pretext task. We evaluated our framework using several visual recognition datasets to show the efficacy of our automated transformation policy.

[62]  arXiv:2111.12273 [pdf, other]
Title: Sharpness-aware Quantization for Deep Neural Networks
Comments: Tech report
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Network quantization is an effective compression method to reduce the model size and computational cost. Despite the high compression ratio, training a low-precision model is difficult due to the discrete and non-differentiable nature of quantization, resulting in considerable performance degradation. Recently, Sharpness-Aware Minimization (SAM) was proposed to improve the generalization performance of the models by simultaneously minimizing the loss value and the loss curvature. In this paper, we devise a Sharpness-Aware Quantization (SAQ) method to train quantized models, leading to better generalization performance. Moreover, since each layer contributes differently to the loss value and the loss sharpness of a network, we further devise an effective method that learns a configuration generator to automatically determine the bitwidth configurations of each layer, encouraging lower bits for flat regions and vice versa for sharp landscapes, while simultaneously promoting the flatness of minima to enable more aggressive quantization. Extensive experiments on CIFAR-100 and ImageNet show the superior performance of the proposed methods. For example, our quantized ResNet-18 with 55.1x Bit-Operation (BOP) reduction even outperforms the full-precision one by 0.7% in terms of the Top-1 accuracy. Code is available at https://github.com/zhuang-group/SAQ.

[63]  arXiv:2111.12274 [pdf, other]
Title: Formalization of Bond Graph using Higher-order-logic Theorem Proving
Comments: ISA Transactions, Elsevier
Subjects: Logic in Computer Science (cs.LO)

Bond graph is a unified graphical approach for describing the dynamics of complex engineering and physical systems and is widely adopted in a variety of domains, such as, electrical, mechanical, medical, thermal and fluid mechanics. Traditionally, these dynamics are analyzed using paper-and-pencil proof methods and computer-based techniques. However, both of these techniques suffer from their inherent limitations, such as human-error proneness, approximations of results and enormous computational requirements. Thus, these techniques cannot be trusted for performing the bond graph based dynamical analysis of systems from the safety-critical domains like robotics and medicine. Formal methods, in particular, higher-order-logic theorem proving, can overcome the shortcomings of these traditional methods and provide an accurate analysis of these systems. It has been widely used for analyzing the dynamics of engineering and physical systems. In this paper, we propose to use higher-order-logic theorem proving for performing the bond graph based analysis of the physical systems. In particular, we provide formalization of bond graph, which mainly includes functions that allow conversion of a bond graph to its corresponding mathematical model (state-space model) and the verification of its various properties, such as, stability. To illustrate the practical effectiveness of our proposed approach, we present the formal stability analysis of a prosthetic mechatronic hand using HOL Light theorem prover. Moreover, to help non-experts in HOL, we encode our formally verified stability theorems in MATLAB to perform the stability analysis of an anthropomorphic prosthetic mechatronic hand.

[64]  arXiv:2111.12276 [pdf, other]
Title: Utilizing Resource-Rich Language Datasets for End-to-End Scene Text Recognition in Resource-Poor Languages
Comments: Accept as short paper at ACM MMAsia 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

This paper presents a novel training method for end-to-end scene text recognition. End-to-end scene text recognition offers high recognition accuracy, especially when using the encoder-decoder model based on Transformer. To train a highly accurate end-to-end model, we need to prepare a large image-to-text paired dataset for the target language. However, it is difficult to collect this data, especially for resource-poor languages. To overcome this difficulty, our proposed method utilizes well-prepared large datasets in resource-rich languages such as English, to train the resource-poor encoder-decoder model. Our key idea is to build a model in which the encoder reflects knowledge of multiple languages while the decoder specializes in knowledge of just the resource-poor language. To this end, the proposed method pre-trains the encoder by using a multilingual dataset that combines the resource-poor language's dataset and the resource-rich language's dataset to learn language-invariant knowledge for scene text recognition. The proposed method also pre-trains the decoder by using the resource-poor language's dataset to make the decoder better suited to the resource-poor language. Experiments on Japanese scene text recognition using a small, publicly available dataset demonstrate the effectiveness of the proposed method.

[65]  arXiv:2111.12278 [pdf, ps, other]
Title: An efficient estimation of nested expectations without conditional sampling
Subjects: Numerical Analysis (math.NA)

Estimating nested expectations is an important task in computational mathematics and statistics. In this paper we propose a new Monte Carlo method using post-stratification to estimate nested expectations efficiently without taking samples of the inner random variable from the conditional distribution given the outer random variable. This property provides the advantage over many existing methods that it enables us to estimate nested expectations only with a dataset on the pair of the inner and outer variables drawn from the joint distribution. We show an upper bound on the mean squared error of the proposed method under some assumptions. Numerical experiments are conducted to compare our proposed method with several existing methods (nested Monte Carlo method, multilevel Monte Carlo method, and regression-based method), and we see that our proposed method is superior to the compared methods in terms of efficiency and applicability.
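
The general idea of working only from joint samples can be sketched as follows. This is a simplified illustration and not the estimator analysed in the paper: it assumes a scalar outer variable Y, a target of the form E[f(E[X | Y])], and equal-count strata obtained by sorting on Y. The function name `post_stratified_nested_estimate` and the toy model are hypothetical.

```python
import random


def post_stratified_nested_estimate(pairs, f, num_strata):
    """Estimate E[f(E[X | Y])] from joint samples (y, x).

    Samples are sorted by the outer variable y and split into strata of
    (nearly) equal size. Within each stratum, the mean of x stands in for
    the conditional expectation, so no conditional sampling is needed.
    """
    n = len(pairs)
    if not 1 <= num_strata <= n:
        raise ValueError("num_strata must be between 1 and len(pairs)")
    ordered = sorted(pairs, key=lambda p: p[0])
    total = 0.0
    for s in range(num_strata):
        lo = s * n // num_strata
        hi = (s + 1) * n // num_strata
        stratum = ordered[lo:hi]
        inner_mean = sum(x for _, x in stratum) / len(stratum)
        total += len(stratum) * f(inner_mean)
    return total / n


# Toy model (an assumption for this sketch): Y ~ Uniform(0, 1) and
# X = Y + Gaussian noise, so E[X | Y] = Y and E[(E[X | Y])^2] = 1/3.
rng = random.Random(0)
samples = []
for _ in range(20000):
    y = rng.random()
    samples.append((y, y + rng.gauss(0.0, 0.5)))

estimate = post_stratified_nested_estimate(samples, lambda z: z * z, 100)
print(estimate)  # should be close to 1/3
```

The number of strata trades bias against variance in this sketch: more strata track the conditional mean more closely, but each within-stratum average is then noisier.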

[66]  arXiv:2111.12281 [pdf, other]
Title: Locality-based Graph Reordering for Processing Speed-Ups and Impact of Diameter
Comments: 20 pages
Subjects: Hardware Architecture (cs.AR)

Graph analysis involves a high number of random memory access patterns. Earlier research has shown that the cache miss latency is responsible for more than half of the graph processing time, with the CPU execution having the smaller share. There has been significant study on decreasing the CPU computing time, for example, by employing better cache prefetching and replacement policies. In this paper, we study the various methods that do so by attempting to decrease the CPU cache miss ratio. Graph Reordering attempts to exploit the power-law distribution of graphs (a few sparsely-populated vertices in the graph have a high number of connections) to keep the frequently accessed vertices together locally and hence decrease the cache misses. However, reordering the graph by keeping the hot vertices together may affect the spatial locality of the graph, and thus add to the total CPU compute time. We also need to have control over the total reordering time and its inverse relation with the final CPU execution time. In order to exploit this trade-off between reordering as per vertex hotness and spatial locality, we introduce the light-weight Community-based Reordering. We attempt to maintain the community structure of the graph by storing the hot members in the community locally together. The implementation also takes into consideration the impact of graph diameter on the execution time. We compare our implementation with other reordering implementations and find a significantly better result on five graph processing algorithms: BFS, CC, CCSV, PR and BC. Lorder achieved a speed-up of up to 7x and an average speed-up of 1.2x as compared to other reordering algorithms.

[67]  arXiv:2111.12282 [pdf, ps, other]
Title: Self-orthogonality matrix and Reed-Muller code
Subjects: Information Theory (cs.IT)

Kim et al. (2021) gave a method to embed a given binary $[n,k]$ code $\mathcal{C}$ $(k = 3, 4)$ into a self-orthogonal code of the shortest length which has the same dimension $k$ and minimum distance $d' \ge d(\mathcal{C})$. We extend this result for $k=5$ and $6$ by proposing a new method related to a special matrix, called the self-orthogonality matrix $SO_k$, obtained by shortening a Reed-Muller code $\mathcal{R}(2,k)$. Furthermore, we disprove partially the conjecture (Kim et al. (2021)) by showing that if $31 \le n \le 256$ and $n\equiv 14,22,29 \pmod{31}$, then there exist optimal $[n,5]$ codes which are self-orthogonal. We also construct optimal self-orthogonal $[n,6]$ codes when $41 \le n \le 256$ satisfies $n \ne 46, 54, 61$ and $n \not\equiv 7, 14, 22, 29, 38, 45, 53, 60 \pmod{63}$.
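
As background, a binary linear code is self-orthogonal exactly when every pair of rows of a generator matrix, including each row with itself, has an even number of common ones, i.e. $GG^T = 0$ over GF(2). The short check below illustrates only this definition. It does not implement the embedding method or the matrix $SO_k$ from the abstract, and the example matrices are chosen by hand for illustration.

```python
def is_self_orthogonal(generator_rows):
    """Return True if all pairwise inner products of the rows vanish modulo 2."""
    for i, row in enumerate(generator_rows):
        # Start at i so each row is also checked against itself (even weight).
        for other in generator_rows[i:]:
            if sum(a & b for a, b in zip(row, other)) % 2 != 0:
                return False
    return True

# A length-8 example: every row has weight 4 and every pair overlaps in 2 positions.
even_overlap = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
]
# A single row of odd weight is not orthogonal to itself.
odd_weight = [[1, 1, 1, 0]]

print(is_self_orthogonal(even_overlap))
print(is_self_orthogonal(odd_weight))
```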

[68]  arXiv:2111.12284 [pdf, other]
Title: A Self-Supervised Automatic Post-Editing Data Generation Tool
Subjects: Computation and Language (cs.CL)

Data building for automatic post-editing (APE) requires extensive and expert-level human effort, as it contains an elaborate process that involves identifying errors in sentences and providing suitable revisions. Hence, we develop a self-supervised data generation tool, deployable as a web application, that minimizes human supervision and constructs personalized APE data from a parallel corpus for several language pairs with English as the target language. Data-centric APE research can be conducted using this tool, involving many language pairs that have not been studied thus far owing to the lack of suitable data.

[69]  arXiv:2111.12289 [pdf]
Title: Real-time smart vehicle surveillance system
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Over the last decade, there has been a spike in criminal activity all around the globe. According to the Indian police department, vehicle theft is one of the least solved offenses, and almost 19% of all recorded cases are related to motor vehicle theft. To overcome these adversities, we propose a real-time vehicle surveillance system, which detects and tracks the suspect vehicle using the CCTV video feed. The proposed system extracts various attributes of the vehicle such as Make, Model, Color, License plate number, and type of the license plate. Various image processing and deep learning algorithms are employed to meet the objectives of the proposed system. The extracted features can be used as evidence to report violations of law. Although the system uses more parameters, it is still able to make real-time predictions with minimal latency and accuracy loss.

[70]  arXiv:2111.12290 [pdf, other]
Title: Attention-based Dual-stream Vision Transformer for Radar Gait Recognition
Comments: Under review
Subjects: Computer Vision and Pattern Recognition (cs.CV); Signal Processing (eess.SP)

Radar gait recognition is robust to light variations and infringes less on privacy. Previous studies often utilize either spectrograms or cadence velocity diagrams. While the former shows the time-frequency patterns, the latter encodes the repetitive frequency patterns. In this work, a dual-stream neural network with attention-based fusion is proposed to fully aggregate the discriminant information from these two representations. Both streams are designed based on the Vision Transformer, which captures well the gait characteristics embedded in these representations. The proposed method is validated on a large benchmark dataset for radar gait recognition, which shows that it significantly outperforms state-of-the-art solutions.

[71]  arXiv:2111.12292 [pdf, other]
Title: Improved Fine-tuning by Leveraging Pre-training Data: Theory and Practice
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)

As a dominant paradigm, fine-tuning a pre-trained model on the target data is widely used in many deep learning applications, especially for small data sets. However, recent studies have empirically shown that, in some vision tasks, training from scratch achieves a final performance no worse than this pre-training strategy once the number of training iterations is increased. In this work, we revisit this phenomenon from the perspective of generalization analysis, which is popular in learning theory. Our result reveals that the final prediction precision may have a weak dependency on the pre-trained model, especially in the case of large training iterations. The observation inspires us to leverage pre-training data for fine-tuning, since this data is also available for fine-tuning. The generalization result of using pre-training data shows that the final performance on a target task can be improved when the appropriate pre-training data is included in fine-tuning. With the insight of the theoretical finding, we propose a novel selection strategy to select a subset from pre-training data to help improve the generalization on the target task. Extensive experimental results for image classification tasks on 8 benchmark data sets verify the effectiveness of the proposed data selection based fine-tuning pipeline.

[72]  arXiv:2111.12293 [pdf, other]
Title: PTQ4ViT: Post-Training Quantization Framework for Vision Transformers
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Quantization is one of the most effective methods to compress neural networks, and it has achieved great success on convolutional neural networks (CNNs). Recently, vision transformers have demonstrated great potential in computer vision. However, previous post-training quantization methods did not perform well on vision transformers, resulting in more than 1% accuracy drop even in 8-bit quantization. Therefore, we analyze the problems of quantization on vision transformers. We observe that the distributions of activation values after softmax and GELU functions are quite different from the Gaussian distribution. We also observe that common quantization metrics, such as MSE and cosine distance, are inaccurate for determining the optimal scaling factor. In this paper, we propose the twin uniform quantization method to reduce the quantization error on these activation values. We also propose to use a Hessian guided metric to evaluate different scaling factors, which improves the accuracy of calibration with a small cost. To enable the fast quantization of vision transformers, we develop an efficient framework, PTQ4ViT. Experiments show the quantized vision transformers achieve near-lossless prediction accuracy (less than 0.5% drop at 8-bit quantization) on the ImageNet classification task.

[73]  arXiv:2111.12294 [pdf, other]
Title: An Image Patch is a Wave: Phase-Aware Vision MLP
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Different from the traditional convolutional neural network (CNN) and vision transformer, the multilayer perceptron (MLP) is a new kind of vision model with an extremely simple architecture that is built only by stacking fully-connected layers. An input image of vision MLP is usually split into multiple tokens (patches), while the existing MLP models directly aggregate them with fixed weights, neglecting the varying semantic information of tokens from different images. To dynamically aggregate tokens, we propose to represent each token as a wave function with two parts, amplitude and phase. Amplitude is the original feature and the phase term is a complex value changing according to the semantic contents of input images. Introducing the phase term can dynamically modulate the relationship between tokens and fixed weights in MLP. Based on the wave-like token representation, we establish a novel Wave-MLP architecture for vision tasks. Extensive experiments demonstrate that the proposed Wave-MLP is superior to the state-of-the-art MLP architectures on various vision tasks such as image classification, object detection and semantic segmentation.

[74]  arXiv:2111.12295 [pdf, other]
Title: Animal Behavior Classification via Deep Learning on Embedded Systems
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP); Machine Learning (stat.ML)

We develop an end-to-end deep-neural-network-based algorithm for classifying animal behavior using accelerometry data on the embedded system of an artificial intelligence of things (AIoT) device installed in a wearable collar tag. The proposed algorithm jointly performs feature extraction and classification utilizing a set of infinite-impulse-response (IIR) and finite-impulse-response (FIR) filters together with a multilayer perceptron. The utilized IIR and FIR filters can be viewed as specific types of recurrent and convolutional neural network layers, respectively. We evaluate the performance of the proposed algorithm via two real-world datasets collected from grazing cattle. The results show that the proposed algorithm offers good intra- and inter-dataset classification accuracy and outperforms its closest contenders including two state-of-the-art convolutional-neural-network-based time-series classification algorithms, which are significantly more complex. We implement the proposed algorithm on the embedded system of the collar tag's AIoT device to perform in-situ classification of animal behavior. We achieve real-time in-situ behavior inference from accelerometry data without imposing any strain on the available computational, memory, or energy resources of the embedded system.
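
The correspondence drawn in this abstract between filters and network layers can be seen in a minimal sketch. An FIR filter computes each output as a weighted sum of recent inputs, which is the same operation as a causal one-dimensional convolution. An IIR filter feeds its previous output back into the next one, which is a linear recurrence. The function names and the first-order IIR form are assumptions for illustration, not the filters used in the paper.

```python
def fir_filter(x, taps):
    """y[n] = sum_k taps[k] * x[n - k]: a causal 1-D convolution."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, b in enumerate(taps):
            if n - k >= 0:  # samples before the start of the signal count as zero
                acc += b * x[n - k]
        y.append(acc)
    return y

def iir_filter_first_order(x, a, b):
    """y[n] = a * y[n - 1] + b * x[n]: a linear recurrence with state y[n - 1]."""
    y = []
    prev = 0.0
    for sample in x:
        prev = a * prev + b * sample
        y.append(prev)
    return y

print(fir_filter([1.0, 2.0, 3.0, 4.0], [0.5, 0.5]))
print(iir_filter_first_order([1.0, 0.0, 0.0, 0.0], a=0.5, b=1.0))
```

The impulse response of the FIR filter ends after `len(taps)` samples, while the IIR response decays geometrically and never becomes exactly zero, which is where the two names come from.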

[75]  arXiv:2111.12296 [pdf, other]
Title: Spatial-context-aware deep neural network for multi-class image classification
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Multi-label image classification is a fundamental but challenging task in computer vision. Over the past few decades, solutions exploring relationships between semantic labels have made great progress. However, the underlying spatial-contextual information of labels is under-exploited. To tackle this problem, a spatial-context-aware deep neural network is proposed to predict labels taking into account both semantic and spatial information. This proposed framework is evaluated on Microsoft COCO and PASCAL VOC, two widely used benchmark datasets for image multi-labelling. The results show that the proposed approach is superior to the state-of-the-art solutions on dealing with the multi-label image classification problem.

[76]  arXiv:2111.12299 [pdf, other]
Title: EH-DNAS: End-to-End Hardware-aware Differentiable Neural Architecture Search
Comments: 8 pages, 5 figures
Subjects: Machine Learning (cs.LG)

In hardware-aware Differentiable Neural Architecture Search (DNAS), it is challenging to compute gradients of hardware metrics to perform architecture search. Existing works rely on linear approximations with limited support to customized hardware accelerators. In this work, we propose End-to-end Hardware-aware DNAS (EH-DNAS), a seamless integration of end-to-end hardware benchmarking, and fully automated DNAS to deliver hardware-efficient deep neural networks on various platforms, including Edge GPUs, Edge TPUs, Mobile CPUs, and customized accelerators. Given a desired hardware platform, we propose to learn a differentiable model predicting the end-to-end hardware performance of neural network architectures for DNAS. We also introduce E2E-Perf, an end-to-end hardware benchmarking tool for customized accelerators. Experiments on CIFAR10 and ImageNet show that EH-DNAS improves the hardware performance by an average of $1.4\times$ on customized accelerators and $1.6\times$ on existing hardware processors while maintaining the classification accuracy.

[77]  arXiv:2111.12301 [pdf, other]
Title: One-shot Visual Reasoning on RPMs with an Application to Video Frame Prediction
Comments: Under review
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Raven's Progressive Matrices (RPMs) are frequently used in evaluating human visual reasoning ability. Researchers have made considerable effort in developing a system which could automatically solve the RPM problem, often through a black-box end-to-end Convolutional Neural Network (CNN) for both visual recognition and logical reasoning tasks. Towards the objective of developing a highly explainable solution, we propose a One-shot Human-Understandable ReaSoner (Os-HURS), which is a two-step framework including a perception module and a reasoning module, to tackle the challenges of real-world visual recognition and subsequent logical reasoning tasks, respectively. For the reasoning module, we propose a "2+1" formulation that can be better understood by humans and significantly reduces the model complexity. As a result, a precise reasoning rule can be deduced from one RPM sample only, which is not feasible for existing solution methods. The proposed reasoning module is also capable of yielding a set of reasoning rules, precisely modeling the human knowledge in solving the RPM problem. To validate the proposed method on real-world applications, an RPM-like One-shot Frame-prediction (ROF) dataset is constructed, where visual reasoning is conducted on RPMs constructed using real-world video frames instead of synthetic images. Experimental results on various RPM-like datasets demonstrate that the proposed Os-HURS achieves a significant and consistent performance gain compared with the state-of-the-art models.

[78]  arXiv:2111.12305 [pdf, other]
Title: Thundernna: a white box adversarial attack
Authors: Linfeng Ye
Comments: 10 pages, 5 figures
Subjects: Machine Learning (cs.LG)

Existing work shows that a neural network trained by a naive gradient-based optimization method is prone to adversarial attacks: adding a small malicious perturbation to an ordinary input is enough to make the neural network produce a wrong output. At the same time, attacking a neural network is the key to improving its robustness. Training against adversarial examples can make neural networks resist some kinds of adversarial attacks. Moreover, an adversarial attack against a neural network can also reveal some characteristics of the neural network, a complex high-dimensional non-linear function, as discussed in previous work.
In this project, we develop a first-order method to attack the neural network. Compared with other first-order attacks, our method has a much higher success rate. Furthermore, it is much faster than second-order attacks and multi-step first-order attacks.

[79]  arXiv:2111.12306 [pdf, ps, other]
Title: Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability
Subjects: Machine Learning (cs.LG)

We study the $K$-armed contextual dueling bandit problem, a sequential decision making setting in which the learner uses contextual information to make two decisions, but only observes \emph{preference-based feedback} suggesting that one decision was better than the other. We focus on the regret minimization problem under realizability, where the feedback is generated by a pairwise preference matrix that is well-specified by a given function class $\mathcal F$. We provide a new algorithm that achieves the optimal regret rate for a new notion of best response regret, which is a strictly stronger performance measure than those considered in prior works. The algorithm is also computationally efficient, running in polynomial time assuming access to an online oracle for square loss regression over $\mathcal F$. This resolves an open problem of Dud\'ik et al. [2015] on oracle efficient, regret-optimal algorithms for contextual dueling bandits.

[80]  arXiv:2111.12309 [pdf, other]
Title: RegionCL: Can Simple Region Swapping Contribute to Contrastive Learning?
Comments: 15 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Self-supervised methods (SSL) have achieved significant success via maximizing the mutual information between two augmented views, where cropping is a popular augmentation technique. Cropped regions are widely used to construct positive pairs, while the regions left over after cropping have rarely been explored in existing methods, although they together constitute the same image instance and both contribute to the description of the category. In this paper, we make the first attempt to demonstrate the importance of both regions in cropping from a complete perspective and propose a simple yet effective pretext task called Region Contrastive Learning (RegionCL). Specifically, given two different images, we randomly crop a region (called the paste view) from each image with the same size and swap them to compose two new images together with the remaining regions (called the canvas view), respectively. Then, contrastive pairs can be efficiently constructed according to the following simple criteria, i.e., each view is (1) positive with views augmented from the same original image and (2) negative with views augmented from other images. With minor modifications to popular SSL methods, RegionCL exploits those abundant pairs and helps the model distinguish the region features from both canvas and paste views, therefore learning better visual representations. Experiments on ImageNet, MS COCO, and Cityscapes demonstrate that RegionCL improves MoCo v2, DenseCL, and SimSiam by large margins and achieves state-of-the-art performance on classification, detection, and segmentation tasks. The code will be available at https://github.com/Annbless/RegionCL.git.
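
The region-swapping step described in this abstract can be sketched on toy images stored as nested lists. This is a simplified illustration, not the released RegionCL code: it assumes both regions sit at the same position in their images, and the function name and arguments are hypothetical.

```python
def swap_regions(img_a, img_b, top, left, height, width):
    """Swap one same-size rectangular region between two images.

    In each returned image the swapped-in rectangle plays the role of the
    paste view and the untouched remainder plays the role of the canvas view.
    """
    out_a = [row[:] for row in img_a]  # copy so the inputs are not modified
    out_b = [row[:] for row in img_b]
    for r in range(top, top + height):
        for c in range(left, left + width):
            out_a[r][c] = img_b[r][c]
            out_b[r][c] = img_a[r][c]
    return out_a, out_b

img_a = [["a"] * 3 for _ in range(3)]
img_b = [["b"] * 3 for _ in range(3)]
mixed_a, mixed_b = swap_regions(img_a, img_b, top=0, left=1, height=2, width=2)
for row in mixed_a:
    print(row)
```

Following the criteria in the abstract, the pixels marked "b" inside `mixed_a` would form a positive pair with views of `img_b` and a negative pair with the "a" canvas around them.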

[81]  arXiv:2111.12313 [pdf, ps, other]
Title: Explicit solution of divide-and-conquer dividing by a half recurrences with polynomial independent term
Comments: 50 pages
Subjects: Discrete Mathematics (cs.DM); Combinatorics (math.CO); Populations and Evolution (q-bio.PE)

Divide-and-conquer dividing by a half recurrences, of the form $x_n =a\cdot x_{\left\lceil{n}/{2}\right\rceil}+a\cdot x_{\left\lfloor{n}/{2}\right\rfloor}+p(n)$, $n\geq 2$, appear in many areas of applied mathematics, from the analysis of algorithms to the optimization of phylogenetic balance indices. The Master Theorems that solve these equations do not provide the solution's explicit expression, only its big-$\Theta$ order of growth. In this paper we give an explicit expression (in terms of the binary decomposition of $n$) for the solution $x_n$ of a recurrence of this form, with given initial condition $x_1$, when the independent term $p(n)$ is a polynomial in $\lceil{n}/{2}\rceil$ and $\lfloor{n}/{2}\rfloor$.
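
For small inputs, a recurrence of this form can be evaluated directly, which is useful for checking any closed-form expression against concrete values. The sketch below evaluates $x_n$ by memoized recursion. It does not implement the paper's explicit solution, and the function name and argument order are assumptions.

```python
from functools import lru_cache

def solve_recurrence(n, a, p, x1):
    """Evaluate x_n = a*x_ceil(n/2) + a*x_floor(n/2) + p(n) for n >= 2, with x_1 = x1."""
    @lru_cache(maxsize=None)
    def x(m):
        if m == 1:
            return x1
        # (m + 1) // 2 is ceil(m / 2) and m // 2 is floor(m / 2) for positive integers.
        return a * x((m + 1) // 2) + a * x(m // 2) + p(m)
    return x(n)

# Example with a = 1, p(n) = n = ceil(n/2) + floor(n/2), and x_1 = 0.
for n in (2, 3, 4, 8):
    print(n, solve_recurrence(n, a=1, p=lambda m: m, x1=0))
```

With these choices, $x_n = n\log_2 n$ whenever $n$ is a power of two (for example $x_8 = 24$), in line with the $\Theta(n \log n)$ growth the Master Theorem gives for this case. Values at other $n$, such as $x_3 = 5$, are the kind of detail the order of growth alone does not determine.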

[82]  arXiv:2111.12315 [pdf, other]
Title: Dynamic Texture Recognition using PDV Hashing and Dictionary Learning on Multi-scale Volume Local Binary Pattern
Comments: 5 pages, 1 figure
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Spatial-temporal local binary pattern (STLBP) has been widely used in dynamic texture recognition. STLBP often encounters the high-dimension problem as its dimension increases exponentially, so that STLBP could only utilize a small neighborhood. To tackle this problem, we propose a method for dynamic texture recognition using PDV hashing and dictionary learning on multi-scale volume local binary pattern (PHD-MVLBP). Instead of forming very high-dimensional LBP histogram features, it first uses hash functions to map the pixel difference vectors (PDVs) to binary vectors, then forms a dictionary using the derived binary vectors, and encodes them using the derived dictionary. In such a way, the PDVs are mapped to feature vectors of the size of the dictionary, instead of LBP histograms of very high dimension. Such an encoding scheme could extract the discriminant information from videos in a much larger neighborhood effectively. The experimental results on two widely-used dynamic texture datasets, DynTex++ and UCLA, show the superior performance of the proposed approach over the state-of-the-art methods.

[83]  arXiv:2111.12317 [pdf, other]
Title: Handling tree-structured text: parsing directory pages
Subjects: Computation and Language (cs.CL)

The determination of the reading sequence of text is fundamental to document understanding. This problem is easily solved in pages where the text is organized into a sequence of lines and vertical alignment runs the height of the page (producing multiple columns which can be read from left to right). We present a situation -- the directory page parsing problem -- where information is presented on the page in an irregular, visually-organized, two-dimensional format. Directory pages are fairly common in financial prospectuses and carry information about organizations, their addresses and relationships that is key to business tasks in client onboarding. Interestingly, directory pages sometimes have hierarchical structure, motivating the need to generalize the reading sequence to a reading tree. We present solutions to the problem of identifying directory pages and constructing the reading tree, using (learnt) classifiers for text segments and a bottom-up (right to left, bottom-to-top) traversal of segments. The solution is a key part of a production service supporting automatic extraction of organization, address and relationship information from client onboarding documents.

[84]  arXiv:2111.12320 [pdf, other]
Title: Consistency Regularization for Deep Face Anti-Spoofing
Comments: 10 tables, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Face anti-spoofing (FAS) plays a crucial role in securing face recognition systems. Empirically, given an image, a model with more consistent output on different views of this image usually performs better, as shown in Fig.1. Motivated by this exciting observation, we conjecture that encouraging feature consistency of different views may be a promising way to boost FAS models. In this paper, we explore this way thoroughly by enhancing both Embedding-level and Prediction-level Consistency Regularization (EPCR) in FAS. Specifically, at the embedding-level, we design a dense similarity loss to maximize the similarities between all positions of two intermediate feature maps in a self-supervised fashion; while at the prediction-level, we optimize the mean square error between the predictions of two views. Notably, our EPCR is free of annotations and can directly integrate into semi-supervised learning schemes. Considering different application scenarios, we further design five diverse semi-supervised protocols to measure semi-supervised FAS techniques. We conduct extensive experiments to show that EPCR can significantly improve the performance of several supervised and semi-supervised tasks on benchmark datasets. The codes and protocols will be released soon.
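
The prediction-level term mentioned in this abstract, a mean square error between the predictions for two views of the same image, can be written in a few lines. This sketch covers only that term, under the assumption that each prediction is a flat list of scores. The embedding-level dense similarity loss is not shown, and the function name is hypothetical.

```python
def prediction_consistency_mse(pred_view_1, pred_view_2):
    """Mean square error between the model outputs for two views of one image."""
    if len(pred_view_1) != len(pred_view_2):
        raise ValueError("both views must produce predictions of the same length")
    squared_diffs = [(p - q) ** 2 for p, q in zip(pred_view_1, pred_view_2)]
    return sum(squared_diffs) / len(squared_diffs)

# Two views whose predictions disagree by 0.2 in each entry.
loss = prediction_consistency_mse([0.9, 0.2], [0.7, 0.4])
print(loss)
```

The value is zero only when the two views yield identical predictions and it needs no labels, which matches the abstract's remark that the regularizer is free of annotations.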

[85]  arXiv:2111.12321 [pdf]
Title: Efficient Secure Aggregation Based on SHPRG For Federated Learning
Comments: 6 pages, 3 figures
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)

We propose a novel secure aggregation scheme based on a seed-homomorphic pseudo-random generator (SHPRG) to prevent private training data leakage from model-related information in Federated Learning systems. Our constructions leverage the homomorphic property of SHPRG to simplify the masking and demasking scheme, which entails a linear overhead while revealing nothing beyond the aggregation result against colluding entities. Additionally, our scheme is resilient to dropouts without extra overhead. We experimentally demonstrate that our scheme significantly improves efficiency, to 20 times that of the baseline, especially in the more realistic case in which the number of clients and model size become large and a certain percentage of clients drop out from the system.

[86]  arXiv:2111.12322 [pdf]
Title: Stochastic optimal scheduling of demand response-enabled microgrids with renewable generations: An analytical-heuristic approach
Comments: Accepted by Journal of Cleaner Production
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

In the context of the transition towards cleaner and sustainable energy production, microgrids have become an effective way of tackling environmental pollution and energy crisis issues. With the increasing penetration of renewables, how to coordinate demand response and renewable generations is a critical and challenging issue in the field of microgrid scheduling. To this end, a bi-level scheduling model is put forward for isolated microgrids with consideration of multi-stakeholders in this paper, where the lower- and upper-level models respectively aim to minimize the user cost and the microgrid operational cost under real-time electricity pricing environments. In order to solve this model, this research combines the Jaya algorithm and the interior point method (IPM) to develop a hybrid analysis-heuristic solution method called Jaya-IPM, where the lower and upper levels are respectively addressed by the IPM and the Jaya, and the scheduling scheme is obtained via iterations between the two levels. After that, the real-time prices updated by the upper-level model and the electricity plans determined by the lower-level model will be alternately iterated between the upper and lower levels through the real-time pricing mechanism to obtain an optimal scheduling plan. The test results show that the proposed method can coordinate the uncertainty of renewable generations with demand response strategies, thereby achieving a balance between the interests of microgrid and users; and that by leveraging demand response, the flexibility of the load side can be fully exploited to achieve peak load shaving while maintaining the balance of supply and demand. In addition, the Jaya-IPM algorithm is proven to be superior to the traditional hybrid intelligent algorithm (HIA) and the CPLEX solver in terms of optimization results and calculation efficiency.

[87]  arXiv:2111.12323 [pdf, other]
Title: Information Dispersal with Provable Retrievability for Rollups
Subjects: Cryptography and Security (cs.CR)

The ability to verifiably retrieve transaction or state data stored off-chain is crucial to blockchain scaling techniques such as rollups or sharding. We formalize the problem and design a storage- and communication-efficient protocol using linear erasure-correcting codes and homomorphic vector commitments. Motivated by application requirements for rollups, our solution departs from earlier Verifiable Information Dispersal schemes in that we do not require comprehensive termination properties or retrievability from any but only from some known sufficiently large set of storage nodes. Compared to Data Availability Oracles, under no circumstance do we fall back to returning empty blocks. Distributing a file of 28.8 MB among 900 storage nodes (up to 300 of which may be adversarial) requires in total approx. 95 MB of communication and storage and approx. 30 seconds of cryptographic computation on a single-threaded consumer-grade laptop computer. Our solution requires no modification to on-chain contracts of Validium rollups such as StarkWare's StarkEx. Additionally, it provides privacy of the dispersed data against honest-but-curious storage nodes.

[88]  arXiv:2111.12324 [pdf, other]
Title: How Speech is Recognized to Be Emotional - A Study Based on Information Decomposition
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

The way that humans encode their emotion into speech signals is complex. For instance, an angry man may increase his pitch and speaking rate, and use impolite words. In this paper, we present a preliminary study on various emotional factors and investigate how each of them impacts modern emotion recognition systems. The key tool of our study is the SpeechFlow model presented recently, by which we are able to decompose speech signals into separate information factors (content, pitch, rhythm). Based on this decomposition, we carefully studied the performance of each information component and their combinations. We conducted the study on three different speech emotion corpora and chose an attention-based convolutional RNN as the emotion classifier. Our results show that rhythm is the most important component for emotional expression. Moreover, the cross-corpus results are very poor (even worse than random guessing), demonstrating that the present speech emotion recognition model is rather weak. Interestingly, by removing one or several unimportant components, the cross-corpus results can be improved. This demonstrates the potential of the decomposition approach towards generalizable emotion recognition.

[89]  arXiv:2111.12325 [pdf, other]
Title: MonoPLFlowNet: Permutohedral Lattice FlowNet for Real-Scale 3D Scene Flow Estimation with Monocular Images
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Real-scale scene flow estimation has become increasingly important for 3D computer vision. Some works successfully estimate real-scale 3D scene flow with LiDAR. However, these ubiquitous and expensive sensors are still unlikely to be equipped widely for real application. Other works use monocular images to estimate scene flow, but their scene flow estimations are normalized with scale ambiguity, where additional depth or point cloud ground truth are required to recover the real scale. Even though they perform well in 2D, these works do not provide accurate and reliable 3D estimates. We present a deep learning architecture on permutohedral lattice - MonoPLFlowNet. Different from all previous works, our MonoPLFlowNet is the first work where only two consecutive monocular images are used as input, while both depth and 3D scene flow are estimated in real scale. Our real-scale scene flow estimation outperforms all state-of-the-art monocular-image based works recovered to real scale by ground truth, and is comparable to LiDAR approaches. As a by-product, our real-scale depth estimation also outperforms other state-of-the-art works.

[90]  arXiv:2111.12326 [pdf, other]
Title: A Study on Decoupled Probabilistic Linear Discriminant Analysis
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

Probabilistic linear discriminant analysis (PLDA) has broad application in open-set verification tasks, such as speaker verification. A key concern for PLDA is that the model is too simple (linear Gaussian) to deal with complicated data; however, the simplicity by itself is a major advantage of PLDA, as it leads to desirable generalization. An interesting research question therefore is how to improve the modeling capacity of PLDA while retaining the simplicity. This paper presents a decoupling approach, which involves a global model that is simple and generalizable, and a local model that is complex and expressive. While the global model holds a bird's-eye view on the entire data, the local model represents the details of individual classes. We conduct a preliminary study towards this direction and investigate a simple decoupling model including both the global and local models. The new model, which we call decoupled PLDA, is tested on a speaker verification task. Experimental results show that it consistently outperforms the vanilla PLDA when the model is based on raw speaker vectors. However, when the speaker vectors are processed by length normalization, the advantage of decoupled PLDA will be largely lost, suggesting future research on non-linear local models.

[91]  arXiv:2111.12330 [pdf, other]
Title: Hidden-Fold Networks: Random Recurrent Residuals Using Sparse Supermasks
Comments: 13 pages, 7 figures. Accepted to the British Machine Vision Conference (BMVC) 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Deep neural networks (DNNs) are so over-parametrized that recent research has found them to already contain a subnetwork with high accuracy at their randomly initialized state. Finding these subnetworks is a viable alternative training method to weight learning. In parallel, another line of work has hypothesized that deep residual networks (ResNets) are trying to approximate the behaviour of shallow recurrent neural networks (RNNs) and has proposed a way for compressing them into recurrent models. This paper proposes blending these lines of research into a highly compressed yet accurate model: Hidden-Fold Networks (HFNs). By first folding ResNet into a recurrent structure and then searching for an accurate subnetwork hidden within the randomly initialized model, a high-performing yet tiny HFN is obtained without ever updating the weights. As a result, HFN achieves equivalent performance to ResNet50 on CIFAR100 while occupying 38.5x less memory, and similar performance to ResNet34 on ImageNet with a memory size 26.8x smaller. The HFN will become even more attractive by minimizing data transfers while staying accurate when it runs on highly-quantized and randomly-weighted DNN inference accelerators. Code available at https://github.com/Lopez-Angel/hidden-fold-networks

[92]  arXiv:2111.12331 [pdf, other]
Title: An MAP Estimation for Between-Class Variance
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

Probabilistic linear discriminant analysis (PLDA) has been widely used in open-set verification tasks, such as speaker verification. A potential issue of this model is that the training set often contains a limited number of classes, which makes the estimation of the between-class variance unreliable. This unreliable estimation often leads to degraded generalization. In this paper, we present an MAP estimation for the between-class variance, by employing an Inverse-Wishart prior. A key problem is that with hierarchical models such as PLDA, the prior is placed on the variance of class means while the likelihood is based on class members, which makes the posterior inference intractable. We derive a simple MAP estimation for such a model, and test it in both PLDA scoring and length normalization. In both cases, the MAP-based estimation delivers interesting performance improvement.

[93]  arXiv:2111.12332 [pdf, other]
Title: Securing Proof-of-Stake Nakamoto Consensus Under Bandwidth Constraint
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)

Satoshi Nakamoto's Proof-of-Work (PoW) longest chain (LC) protocol was a breakthrough for Internet-scale open-participation consensus. Many Proof-of-Stake (PoS) variants of Nakamoto's protocol such as Ouroboros or Snow White aim to preserve the advantages of LC by mimicking PoW LC closely, while mitigating downsides of PoW by using PoS for Sybil resistance. Previous works have proven these PoS LC protocols secure assuming all network messages are delivered within a bounded delay. However, this assumption is not compatible with PoS when considering bandwidth constraints in the underlying communication network. This is because PoS enables the adversary to reuse block production opportunities and spam the network with equivocating blocks, which is impossible in PoW. The bandwidth constraint necessitates that nodes choose carefully which blocks to spend their limited download budget on. We show that 'download along the longest header chain', a natural download rule for PoW LC, emulated by PoS variants, is insecure for PoS LC. Instead, we propose 'download towards the freshest block' and prove that PoS LC with this download rule is secure in bandwidth constrained networks. Our result can be viewed as a first step towards the co-design of consensus and network layer protocols.

[94]  arXiv:2111.12334 [pdf, other]
Title: MobileXNet: An Efficient Convolutional Neural Network for Monocular Depth Estimation
Subjects: Robotics (cs.RO)

Depth is a vital piece of information for autonomous vehicles to perceive obstacles. Due to the relatively low price and small size of monocular cameras, depth estimation from a single RGB image has attracted great interest in the research community. In recent years, the application of Deep Neural Networks (DNNs) has significantly boosted the accuracy of monocular depth estimation (MDE). State-of-the-art methods are usually designed on top of complex and extremely deep network architectures, which require more computational resources and cannot run in real-time without using high-end GPUs. Although some researchers tried to accelerate the running speed, the accuracy of depth estimation is degraded because the compressed model does not represent images well. In addition, the inherent characteristic of the feature extractor used by the existing approaches results in severe spatial information loss in the produced feature maps, which also impairs the accuracy of depth estimation on small sized images. In this study, we are motivated to design a novel and efficient Convolutional Neural Network (CNN) that assembles two shallow encoder-decoder style subnetworks in succession to address these problems. In particular, we place our emphasis on the trade-off between the accuracy and speed of MDE. Extensive experiments have been conducted on the NYU depth v2, KITTI, Make3D and Unreal data sets. Compared with the state-of-the-art approaches which have an extremely deep and complex architecture, the proposed network not only achieves comparable performance but also runs at a much faster speed on a single, less powerful GPU.

[95]  arXiv:2111.12340 [pdf, other]
Title: How does AI play football? An analysis of RL and real-world football strategies
Comments: 11 pages, 7 figures; accepted as a full paper for a 25 minutes oral presentation at ICAART 2022 (URL will be updated when available)
Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

Recent advances in reinforcement learning (RL) have made it possible to develop sophisticated agents that excel in a wide range of applications. Simulations using such agents can provide valuable information in scenarios that are difficult to study experimentally in the real world. In this paper, we examine the play-style characteristics of football RL agents and uncover how strategies may develop during training. The learnt strategies are then compared with those of real football players. We explore what can be learnt from the use of simulated environments by using aggregated statistics and social network analysis (SNA). As a result, we found that (1) there are strong correlations between the competitiveness of an agent and various SNA metrics and (2) aspects of the RL agents' play style become similar to those of real-world footballers as the agent becomes more competitive. We discuss further advances that may be necessary to improve the understanding needed to fully utilise RL for the analysis of football.

[96]  arXiv:2111.12341 [pdf, other]
Title: EvDistill: Asynchronous Events to End-task Learning via Bidirectional Reconstruction-guided Cross-modal Knowledge Distillation
Comments: CVPR 2021 (updated references in this version)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Event cameras sense per-pixel intensity changes and produce asynchronous event streams with high dynamic range and less motion blur, showing advantages over conventional cameras. A hurdle in training event-based models is the lack of large, high-quality labeled data. Prior works learning end-tasks mostly rely on labeled or pseudo-labeled datasets obtained from the active pixel sensor (APS) frames; however, such datasets' quality is far from rivaling those based on the canonical images. In this paper, we propose a novel approach, called EvDistill, to learn a student network on the unlabeled and unpaired event data (target modality) via knowledge distillation (KD) from a teacher network trained with large-scale, labeled image data (source modality). To enable KD across the unpaired modalities, we first propose a bidirectional modality reconstruction (BMR) module to bridge both modalities and simultaneously exploit them to distill knowledge via the crafted pairs, causing no extra computation in the inference. The BMR is improved by the end-tasks and KD losses in an end-to-end manner. Second, we leverage the structural similarities of both modalities and adapt the knowledge by matching their distributions. Moreover, as most prior feature KD methods are uni-modality and less applicable to our problem, we propose to leverage an affinity graph KD loss to boost the distillation. Our extensive experiments on semantic segmentation and object recognition demonstrate that EvDistill achieves significantly better results than the prior works and KD with only events and APS frames.

[97]  arXiv:2111.12345 [pdf, other]
Title: dCSR: A Memory-Efficient Sparse Matrix Representation for Parallel Neural Network Inference
Comments: Accepted at International Conference on Computer-Aided Design (ICCAD) 2021
Subjects: Data Structures and Algorithms (cs.DS)

Reducing the memory footprint of neural networks is a crucial prerequisite for deploying them in small and low-cost embedded devices. Network parameters can often be reduced significantly through pruning. We discuss how to best represent the indexing overhead of sparse networks for the coming generation of Single Instruction, Multiple Data (SIMD)-capable microcontrollers. From this, we develop Delta-Compressed Storage Row (dCSR), a storage format for sparse matrices that allows for both low overhead storage and fast inference on embedded systems with wide SIMD units. We demonstrate our method on an ARM Cortex-M55 MCU prototype with M-Profile Vector Extension (MVE). A comparison of memory consumption and throughput shows that our method achieves competitive compression ratios and increases throughput over dense methods by up to $2.9 \times$ for sparse matrix-vector multiplication (SpMV)-based kernels and $1.06 \times$ for sparse matrix-matrix multiplication (SpMM). This is accomplished through handling the generation of index information directly in the SIMD unit, leading to an increase in effective memory bandwidth.
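
To make the notion of delta-compressed column indices concrete, here is a minimal sketch. The abstract does not describe the bit-level layout of dCSR, so this assumes the simplest variant: a CSR-like structure in which each column index is stored as the gap to the previous nonzero in the same row. The function names `to_delta_csr` and `spmv` are hypothetical, and nothing here is vectorised.

```python
def to_delta_csr(dense):
    """Encode a dense matrix (list of rows) with per-row delta-coded columns.

    Returns (values, deltas, row_ptr). deltas[i] is the distance from the
    previous nonzero column in the same row (or from column 0 for the first).
    """
    values, deltas, row_ptr = [], [], [0]
    for row in dense:
        prev_col = 0
        for col, v in enumerate(row):
            if v != 0:
                values.append(v)
                deltas.append(col - prev_col)
                prev_col = col
        row_ptr.append(len(values))
    return values, deltas, row_ptr


def spmv(values, deltas, row_ptr, x):
    """Sparse matrix-vector product that rebuilds column indices on the fly."""
    y = []
    for r in range(len(row_ptr) - 1):
        col, acc = 0, 0
        for i in range(row_ptr[r], row_ptr[r + 1]):
            col += deltas[i]  # running sum recovers the absolute column
            acc += values[i] * x[col]
        y.append(acc)
    return y


dense = [[0, 0, 3, 0, 4],
         [0, 0, 0, 0, 0],
         [5, 0, 0, 6, 0]]
values, deltas, row_ptr = to_delta_csr(dense)
y = spmv(values, deltas, row_ptr, [1, 2, 3, 4, 5])
```

Small gaps fit in fewer bits than absolute indices, which is the usual motivation for delta coding; how the paper packs them for SIMD is not shown here.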

[98]  arXiv:2111.12346 [pdf, other]
Title: Arbitrary Virtual Try-On Network: Characteristics Preservation and Trade-off between Body and Clothing
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Deep learning based virtual try-on systems have achieved some encouraging progress recently, but several big challenges remain to be solved, such as trying on arbitrary clothes of all types, trying on clothes from one category to another, and generating image-realistic results with few artifacts. To address these issues, in this paper we first collect a new dataset with all types of clothes, i.e., tops, bottoms, and whole clothes, each of which has multiple categories with rich information on clothing characteristics such as patterns, logos, and other details. Based on this dataset, we then propose the Arbitrary Virtual Try-On Network (AVTON) for all types of clothes, which can synthesize realistic try-on images by preserving and trading off characteristics of the target clothes and the reference person. Our approach includes three modules: 1) Limbs Prediction Module, which predicts the human body parts by preserving the characteristics of the reference person. This is especially useful for handling cross-category try-on tasks (e.g., long sleeves \(\leftrightarrow\) short sleeves or long pants \(\leftrightarrow\) skirts, etc.), where the exposed arms or legs with the skin colors and details can be reasonably predicted; 2) Improved Geometric Matching Module, which is designed to warp clothes according to the geometry of the target person. We improve the TPS based warping method with a compactly supported radial function (Wendland's \(\Psi\)-function); 3) Trade-Off Fusion Module, which trades off the characteristics of the warped clothes and the reference person. This module makes the generated try-on images look more natural and realistic based on a fine-tune symmetry of the network structure. Extensive simulations are conducted and our approach can achieve better performance compared with the state-of-the-art virtual try-on methods.

[99]  arXiv:2111.12350 [pdf, other]
Title: Supervised Neural Discrete Universal Denoiser for Adaptive Denoising
Comments: Preprint
Subjects: Machine Learning (cs.LG)

We improve the recently developed Neural DUDE, a neural network-based adaptive discrete denoiser, by combining it with the supervised learning framework. Namely, we make the supervised pre-training of Neural DUDE compatible with the adaptive fine-tuning of the parameters based on the given noisy data subject to denoising. As a result, we achieve a significant denoising performance boost compared to the vanilla Neural DUDE, which only carries out the adaptive fine-tuning step with randomly initialized parameters. Moreover, we show that the adaptive fine-tuning makes the algorithm robust, such that a noise-mismatched or blindly trained supervised model can still achieve the performance of the matched model. Furthermore, we make a few algorithmic advancements to make Neural DUDE more scalable and able to deal with multi-dimensional data or data with larger alphabet size. We systematically show our improvements on two very diverse datasets, binary images and DNA sequences.

[100]  arXiv:2111.12351 [pdf, other]
Title: Decoupling Visual-Semantic Feature Learning for Robust Scene Text Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Semantic information has been proved effective in scene text recognition. Most existing methods tend to couple both visual and semantic information in an attention-based decoder. As a result, the learning of semantic features is prone to have a bias on the limited vocabulary of the training set, which is called vocabulary reliance. In this paper, we propose a novel Visual-Semantic Decoupling Network (VSDN) to address the problem. Our VSDN contains a Visual Decoder (VD) and a Semantic Decoder (SD) to learn purer visual and semantic feature representation respectively. Besides, a Semantic Encoder (SE) is designed to match SD, which can be pre-trained together by additional inexpensive large vocabulary via a simple word correction task. Thus the semantic feature is more unbiased and precise to guide the visual feature alignment and enrich the final character representation. Experiments show that our method achieves state-of-the-art or competitive results on the standard benchmarks, and outperforms the popular baseline by a large margin under circumstances where the training set has a small size of vocabulary.

[101]  arXiv:2111.12358 [pdf, other]
Title: SPCL: A New Framework for Domain Adaptive Semantic Segmentation via Semantic Prototype-based Contrastive Learning
Comments: 15 pages; The code is publicly available at https://github.com/BinhuiXie/SPCL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Although there is significant progress in supervised semantic segmentation, it remains challenging to deploy the segmentation models to unseen domains due to domain biases. Domain adaptation can help in this regard by transferring knowledge from a labeled source domain to an unlabeled target domain. Previous methods typically attempt to perform the adaptation on global features; however, the local semantic affiliations accounting for each pixel in the feature space are often ignored, resulting in less discriminability. To solve this issue, we propose a novel semantic prototype-based contrastive learning framework for fine-grained class alignment. Specifically, the semantic prototypes provide supervisory signals for per-pixel discriminative representation learning and each pixel of source and target domains in the feature space is required to reflect the content of the corresponding semantic prototype. In this way, our framework is able to explicitly make intra-class pixel representations closer and inter-class pixel representations further apart to improve the robustness of the segmentation model as well as alleviate the domain shift problem. Our method is easy to implement and attains superior results compared to state-of-the-art approaches, as is demonstrated with a number of experiments. The code is publicly available at https://github.com/BinhuiXie/SPCL.

[102]  arXiv:2111.12360 [pdf, other]
Title: Fault-Tolerant Perception for Automated Driving: A Lightweight Monitoring Approach
Subjects: Robotics (cs.RO)

While the most visible part of the safety verification process of automated vehicles concerns the planning and control system, it is often overlooked that safety of the latter crucially depends on the fault-tolerance of the preceding environment perception. Modern perception systems feature complex and often machine-learning-based components with various failure modes that can jeopardize the overall safety. At the same time, a verification by for example redundant execution is not always feasible due to resource constraints. In this paper, we address the need for feasible and efficient perception monitors and propose a lightweight approach that helps to protect the integrity of the perception system while keeping the additional compute overhead minimal. In contrast to existing solutions, the monitor is realized by a well-balanced combination of sensor checks -- here using LiDAR information -- and plausibility checks on the object motion history. It is designed to detect relevant errors in the distance and velocity of objects in the environment of the automated vehicle. In conjunction with an appropriate planning system, such a monitor can help to make safe automated driving feasible.

[103]  arXiv:2111.12364 [pdf, other]
Title: Crawling the MobileCoin Quorum System
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR)

We continuously crawl the young MobileCoin network, uncovering the quorum configurations of core nodes and the quorum system resulting from these configurations. This report discusses our crawl methodology, encountered challenges, and our current empirical results. We find that the MobileCoin quorum system currently comprises 7 organisations controlling a total of 10 validator nodes. Current quorum set configurations prioritise safety over liveness. At the time of writing, one of the involved organisations is technically able to block the approval of new blocks, as is the case for one of the (two) ISPs employed by crawled nodes.

[104]  arXiv:2111.12370 [pdf, other]
Title: Uniform Convergence Rates for Lipschitz Learning on Graphs
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Analysis of PDEs (math.AP)

Lipschitz learning is a graph-based semi-supervised learning method where one extends labels from a labeled to an unlabeled data set by solving the infinity Laplace equation on a weighted graph. In this work we prove uniform convergence rates for solutions of the graph infinity Laplace equation as the number of vertices grows to infinity. Their continuum limits are absolutely minimizing Lipschitz extensions with respect to the geodesic metric of the domain where the graph vertices are sampled from. We work under very general assumptions on the graph weights, the set of labeled vertices, and the continuum domain. Our main contribution is that we obtain quantitative convergence rates even for very sparsely connected graphs, as they typically appear in applications like semi-supervised learning. In particular, our framework allows for graph bandwidths down to the connectivity radius. For proving this we first show a quantitative convergence statement for graph distance functions to geodesic distance functions in the continuum. Using the "comparison with distance functions" principle, we can pass these convergence statements to infinity harmonic functions and absolutely minimizing Lipschitz extensions.
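
As a rough illustration of the label-extension step, the sketch below solves the graph infinity Laplace equation on a tiny unweighted graph. It assumes unit edge weights, in which case the equation reduces to each unlabeled value being the midpoint of the largest and smallest neighbouring values, and it uses a plain fixed-point sweep. The function name `lipschitz_learning` and the example graph are hypothetical; the paper's weighted setting and convergence analysis are not reproduced.

```python
def lipschitz_learning(neighbors, labels, iterations=2000, tol=1e-12):
    """Extend labels by iterating u(v) = (max_nbr u + min_nbr u) / 2.

    neighbors: dict mapping each vertex to a list of adjacent vertices.
    labels: dict mapping labeled vertices to fixed real values.
    """
    u = {v: labels.get(v, 0.0) for v in neighbors}
    for _ in range(iterations):
        change = 0.0
        for v in neighbors:
            if v in labels:
                continue  # labeled vertices act as boundary conditions
            vals = [u[w] for w in neighbors[v]]
            new = 0.5 * (max(vals) + min(vals))
            change = max(change, abs(new - u[v]))
            u[v] = new
        if change < tol:
            break
    return u


# Path graph 0 - 1 - 2 - 3 - 4 with labels at both ends.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
u = lipschitz_learning(path, {0: 0.0, 4: 1.0})
```

On this path graph the result is the linear interpolation between the two labels, which is also the extension with the smallest Lipschitz constant.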

[105]  arXiv:2111.12372 [pdf, other]
Title: Privacy-Preserving Biometric Matching Using Homomorphic Encryption
Subjects: Cryptography and Security (cs.CR)

Biometric matching involves storing and processing sensitive user information. Maintaining the privacy of this data is thus a major challenge, and homomorphic encryption offers a possible solution. We propose a privacy-preserving biometrics-based authentication protocol based on fully homomorphic encryption, where the biometric sample for a user is gathered by a local device but matched against a biometric template by a remote server operating solely on encrypted data. The design ensures that 1) the user's sensitive biometric data remains private, and 2) the user and client device are securely authenticated to the server. A proof-of-concept implementation building on the TFHE library is also presented, which includes the underlying basic operations needed to execute the biometric matching. Performance results from the implementation show how complex it is to make FHE practical in this context, but it appears that, with implementation optimisations and improvements, the protocol could be used for real-world applications.

[106]  arXiv:2111.12373 [pdf, other]
Title: Solving cubic matrix equations arising in conservative dynamics
Comments: 11 pages, 1 figure
Subjects: Numerical Analysis (math.NA)

In this paper we consider the spatial semi-discretization of conservative PDEs. Such finite dimensional approximations of infinite dimensional dynamical systems can be described as flows in suitable matrix spaces, which in turn leads to the need to solve polynomial matrix equations, a classical and important topic both in theoretical and in applied mathematics. Solving these equations numerically is challenging due to the presence of several conservation laws which our finite models incorporate and which must be retained while integrating the equations of motion. In the last thirty years, the theory of geometric integration has provided a variety of techniques to tackle this problem. These numerical methods require solving both direct and inverse problems in matrix spaces. We present two algorithms to solve a cubic matrix equation arising in the geometric integration of isospectral flows. This type of ODE includes finite models of ideal hydrodynamics, plasma dynamics, and spin particles, which we use as test problems for our algorithms.

[107]  arXiv:2111.12374 [pdf, other]
Title: MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing
Comments: Technical Report
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recognizing and localizing events in videos is a fundamental task for video understanding. Since events may occur in auditory and visual modalities, multimodal detailed perception is essential for complete scene comprehension. Most previous works attempted to analyze videos from a holistic perspective. However, they do not consider semantic information at multiple scales, which makes it difficult for the model to localize events of various lengths. In this paper, we present a Multimodal Pyramid Attentional Network (MM-Pyramid) that captures and integrates multi-level temporal features for audio-visual event localization and audio-visual video parsing. Specifically, we first propose the attentive feature pyramid module. This module captures temporal pyramid features via several stacked pyramid units, each composed of a fixed-size attention block and a dilated convolution block. We also design an adaptive semantic fusion module, which leverages a unit-level attention block and a selective fusion block to integrate pyramid features interactively. Extensive experiments on audio-visual event localization and weakly-supervised audio-visual video parsing tasks verify the effectiveness of our approach.

[108]  arXiv:2111.12375 [pdf, other]
Title: Human Activity Recognition Using 3D Orthogonally-projected EfficientNet on Radar Time-Range-Doppler Signature
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In radar activity recognition, 2D signal representations such as spectrogram, cepstrum and cadence velocity diagram are often utilized, while range information is often neglected. In this work, we propose to utilize the 3D time-range-Doppler (TRD) representation, and design a 3D Orthogonally-Projected EfficientNet (3D-OPEN) to effectively capture the discriminant information embedded in the 3D TRD cubes for accurate classification. The proposed model aggregates the discriminant information from three orthogonal planes projected from the 3D feature space. It alleviates the difficulty of 3D CNNs in exploiting sparse semantic abstractions directly from the high-dimensional 3D representation. The proposed method is evaluated on the Millimeter-Wave Radar Walking Dataset. It significantly and consistently outperforms the state-of-the-art methods for radar activity recognition.

[109]  arXiv:2111.12379 [pdf, other]
Title: Efficient Anomaly Detection Using Self-Supervised Multi-Cue Tasks
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Deep anomaly detection has proven to be an efficient and robust approach in several fields. The introduction of self-supervised learning has greatly helped many methods including anomaly detection where simple geometric transformation recognition tasks are used. However, these methods do not perform well on fine-grained problems since they lack finer features and are usually highly dependent on the anomaly type. In this paper, we explore each step of self-supervised anomaly detection with pretext tasks. First, we introduce novel discriminative and generative tasks which focus on different visual cues. A piece-wise jigsaw puzzle task focuses on structure cues, while a tint rotation recognition is used on each piece for colorimetry and a partial re-colorization task is performed. In order for the re-colorization task to focus more on the object rather than on the background, we propose to include the contextual color information of the image border. Then, we present a new out-of-distribution detection function and highlight its better stability compared to other out-of-distribution detection methods. Along with it, we also experiment with different score fusion functions. Finally, we evaluate our method on a comprehensive anomaly detection protocol composed of object anomalies with classical object recognition, style anomalies with fine-grained classification and local anomalies with face anti-spoofing datasets. Our model can more accurately learn highly discriminative features using these self-supervised tasks. It outperforms state-of-the-art with up to 36% relative error improvement on object anomalies and 40% on face anti-spoofing problems.

[110]  arXiv:2111.12382 [pdf, ps, other]
Title: Compressed Sensing Channel Estimation for OTFS Modulation in Non-Integer Delay-Doppler Domain
Comments: This is the author's self-archive preprint of a paper accepted in IEEE GLOBECOM 2021
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

This paper introduces a Compressed Sensing (CS) estimation scheme for Orthogonal Time Frequency Space (OTFS) channels with sparse multipath. The OTFS waveform represents signals in a two dimensional Delay-Doppler (DD) orthonormal basis. The proposed model does not require the assumption that the delays are integer multiples of the sampling period. The analysis shows that non-integer delay and Doppler shifts in the channel cannot be accurately modelled by integer approximations. An Orthogonal Matching Pursuit with Binary-division Refinement (OMPBR) estimation algorithm is proposed. The proposed estimator finds the best channel approximation over a continuous DD dictionary without integer approximations. This results in a significant reduction of the estimation normalized mean squared error with reasonable computational complexity.

[111]  arXiv:2111.12385 [pdf, other]
Title: Space-Partitioning RANSAC
Subjects: Computer Vision and Pattern Recognition (cs.CV)

A new algorithm is proposed to accelerate RANSAC model quality calculations. The method is based on partitioning the joint correspondence space, e.g., 2D-2D point correspondences, into a pair of regular grids. The grid cells are mapped by minimal sample models, estimated within RANSAC, to reject correspondences that are inconsistent with the model parameters early. The proposed technique is general. It works with arbitrary transformations even if a point is mapped to a point set, e.g., as a fundamental matrix maps to epipolar lines. The method is tested on thousands of image pairs from publicly available datasets on fundamental and essential matrix, homography and radially distorted homography estimation. On average, it reduces the RANSAC run-time by 41% with provably no deterioration in the accuracy. It can be straightforwardly plugged into state-of-the-art RANSAC frameworks, e.g. VSAC.

[112]  arXiv:2111.12386 [pdf, other]
Title: One to Transfer All: A Universal Transfer Framework for Vision Foundation Model with Few Data
Comments: Technical Report
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The foundation model is not the last chapter of the model production pipeline. Transferring with few data in a general way to thousands of downstream tasks is becoming a trend of the foundation model's application. In this paper, we propose a universal transfer framework, One to Transfer All (OTA), to transfer any Vision Foundation Model (VFM) to any downstream task with few downstream data. We first transfer a VFM to a task-specific model by Image Re-representation Fine-tuning (IRF) and then distill knowledge from the task-specific model to a deployed model with data produced by Downstream Image-Guided Generation (DIGG). OTA has no dependency on upstream data, VFM, and downstream tasks when transferring. It also provides a way for VFM researchers to release their upstream information for better transferring without leaking data, in line with privacy requirements. Massive experiments validate the effectiveness and superiority of our methods in the few-data setting. Our code will be released.

[113]  arXiv:2111.12389 [pdf, other]
Title: Track Boosting and Synthetic Data Aided Drone Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

As the usage of drones increases with lowered costs and improved drone technology, drone detection emerges as a vital object detection task. However, detecting distant drones under unfavorable conditions, namely weak contrast, long-range, low visibility, requires effective algorithms. Our method approaches the drone detection problem by fine-tuning a YOLOv5 model with real and synthetically generated data using a Kalman-based object tracker to boost detection confidence. Our results indicate that augmenting the real data with an optimal subset of synthetic data can increase the performance. Moreover, temporal information gathered by object tracking methods can increase performance further.

[114]  arXiv:2111.12392 [pdf, ps, other]
Title: Characterization of canonical systems with six types of coins for the change-making problem
Comments: 18 pages
Subjects: Discrete Mathematics (cs.DM)

This paper analyzes a necessary and sufficient condition for the change-making problem to be solvable with a greedy algorithm. The change-making problem is to minimize the number of coins used to pay a given value in a specified currency system. This problem is NP-hard, and therefore the greedy algorithm does not always yield an optimal solution. Yet for almost all real currency systems, the greedy algorithm outputs an optimal solution. A currency system for which the greedy algorithm returns an optimal solution for any value of payment is called a canonical system. Canonical systems with at most five types of coins have been characterized in previous studies. In this paper, we give a characterization of canonical systems with six types of coins, and we propose a partial generalization of the characterization of canonical systems.
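
The definition of a canonical system can be checked by brute force on small examples. The sketch below compares the greedy coin count with a dynamic-programming optimum and reports the first value where they differ. It is not the paper's characterization. It assumes the coin system contains a coin of value 1 and at least two coin types, and by default it searches up to the sum of the two largest coins, a bound attributed to Kozen and Zaks that is treated here as an assumption. All function names are hypothetical.

```python
def greedy_count(coins, amount):
    """Number of coins used by the greedy rule (largest coin first)."""
    count = 0
    for c in sorted(coins, reverse=True):
        count += amount // c
        amount %= c
    return count  # remainder is 0 because a coin of value 1 is assumed


def optimal_counts(coins, limit):
    """best[a] = minimum number of coins needed to pay a, for a in 0..limit."""
    inf = float("inf")
    best = [0] + [inf] * limit
    for a in range(1, limit + 1):
        best[a] = min((best[a - c] + 1 for c in coins if c <= a), default=inf)
    return best


def first_counterexample(coins, limit=None):
    """Smallest value where greedy is suboptimal, or None if none is found."""
    coins = sorted(coins)
    if limit is None:
        limit = coins[-1] + coins[-2]  # assumed search bound, see lead-in
    best = optimal_counts(coins, limit)
    for a in range(1, limit + 1):
        if greedy_count(coins, a) != best[a]:
            return a
    return None


non_canonical = first_counterexample([1, 3, 4])
canonical = first_counterexample([1, 5, 10, 25])
```

For the system {1, 3, 4}, the value 6 is paid greedily as 4 + 1 + 1 (three coins) but optimally as 3 + 3 (two coins), so the system is not canonical.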

[115]  arXiv:2111.12393 [pdf, other]
Title: On the convergence of Broyden's method and some accelerated schemes for singular problems
Authors: Florian Mannel
Comments: 32 pages, 8 tables, 1 figure
Subjects: Numerical Analysis (math.NA)

We consider Broyden's method and some accelerated schemes for nonlinear equations having a strongly regular singularity of first order with a one-dimensional nullspace. Our two main results are as follows. First, we show that the use of a preceding Newton-like step ensures convergence for starting points in a starlike domain with density 1. This extends the domain of convergence of these methods significantly. Second, we establish that the matrix updates of Broyden's method converge q-linearly with the same asymptotic factor as the iterates. This contributes to the long-standing question whether the Broyden matrices converge by showing that this is indeed the case for the setting at hand. Furthermore, we prove that the Broyden directions violate uniform linear independence, which implies that existing results for convergence of the Broyden matrices cannot be applied. Numerical experiments of high precision confirm the enlarged domain of convergence, the q-linear convergence of the matrix updates, and the lack of uniform linear independence. In addition, they suggest that these results can be extended to singularities of higher order and that Broyden's method can converge r-linearly without converging q-linearly. The underlying code is freely available.
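
For readers unfamiliar with the method, the following is a minimal sketch of Broyden's ("good") rank-one update for a system of two equations. It is not the author's code, and the test problem has a regular root rather than the singular setting studied in the paper. The helper names `solve2` and `broyden`, the test function, and the starting matrix are all illustrative assumptions.

```python
import math


def solve2(B, rhs):
    """Solve the 2x2 linear system B s = rhs by Cramer's rule."""
    (a, b), (c, d) = B
    det = a * d - b * c
    return [(rhs[0] * d - b * rhs[1]) / det, (a * rhs[1] - c * rhs[0]) / det]


def broyden(F, x, B, tol=1e-12, max_iter=50):
    """Iterate x <- x + s with B s = -F(x), then update B by a rank-one term."""
    fx = F(x)
    for _ in range(max_iter):
        if math.hypot(fx[0], fx[1]) < tol:
            break
        s = solve2(B, [-fx[0], -fx[1]])
        x = [x[0] + s[0], x[1] + s[1]]
        f_new = F(x)
        y = [f_new[0] - fx[0], f_new[1] - fx[1]]
        Bs = [B[0][0] * s[0] + B[0][1] * s[1], B[1][0] * s[0] + B[1][1] * s[1]]
        ss = s[0] ** 2 + s[1] ** 2
        # Rank-one update: B <- B + (y - B s) s^T / (s^T s)
        B = [[B[i][j] + (y[i] - Bs[i]) * s[j] / ss for j in range(2)]
             for i in range(2)]
        fx = f_new
    return x, B


def F(v):
    """Illustrative system: circle of radius 2 intersected with the line x = y."""
    return [v[0] ** 2 + v[1] ** 2 - 4.0, v[0] - v[1]]


# Start near the root (sqrt(2), sqrt(2)) with the exact Jacobian at x0.
root, B_final = broyden(F, [1.5, 1.5], [[3.0, 3.0], [1.0, -1.0]])
```

The method needs no derivative evaluations after the initial matrix; whether the matrices `B` themselves converge is exactly the kind of question the paper addresses for singular problems.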

[116]  arXiv:2111.12395 [pdf, other]
Title: I'll be back: Examining Restored Accounts On Twitter
Subjects: Social and Information Networks (cs.SI)

Online social networks like Twitter actively monitor their platform to identify accounts that go against their rules. Twitter enforces account level moderation, i.e. suspension of a Twitter account in severe cases of platform abuse. A point of note is that these suspensions are sometimes temporary and even incorrect. Twitter provides a redressal mechanism to 'restore' suspended accounts. We refer to all suspended accounts that later have their suspension reversed as 'restored accounts'. In this paper, we release the first-ever dataset and methodology to identify restored accounts. We inspect account properties and tweets of these restored accounts to get key insights into the effects of suspension. We build a prediction model to classify an account into normal, suspended or restored. We use SHAP values to interpret this model and identify important features. SHAP (SHapley Additive exPlanations) is a method to explain individual predictions. We show that profile features like date of account creation and the ratio of retweets to total tweets are more important than content-based features like sentiment scores and Ekman emotion scores when it comes to classification of an account as normal, suspended or restored. We investigate restored accounts further in the pre-suspension and post-restoration phases. We see that the number of tweets per account drops by 53.95% in the post-restoration phase, signifying less 'spammy' behaviour after reversal of suspension. However, there was no substantial difference in the content of the tweets posted in the pre-suspension and post-restoration phases.

[117]  arXiv:2111.12399 [pdf, other]
Title: Dictionary-based Low-Rank Approximations and the Mixed Sparse Coding problem
Authors: Jeremy E. Cohen
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Constrained tensor and matrix factorization models make it possible to extract interpretable patterns from multiway data. Therefore identifiability properties and efficient algorithms for constrained low-rank approximations are nowadays important research topics. This work deals with columns of factor matrices of a low-rank approximation being sparse in a known and possibly overcomplete basis, a model coined as Dictionary-based Low-Rank Approximation (DLRA). While earlier contributions focused on finding factor columns inside a dictionary of candidate columns, i.e. one-sparse approximations, this work is the first to tackle DLRA with sparsity larger than one. I propose to focus on the sparse-coding subproblem coined Mixed Sparse-Coding (MSC) that emerges when solving DLRA with an alternating optimization strategy. Several algorithms based on sparse-coding heuristics (greedy methods, convex relaxations) are provided to solve MSC. The performance of these heuristics is evaluated on simulated data. Then, I show how to adapt an efficient MSC solver based on the LASSO to compute Dictionary-based Matrix Factorization and Canonical Polyadic Decomposition in the context of hyperspectral image processing and chemometrics. These experiments suggest that DLRA extends the modeling capabilities of low-rank approximations, helps reduce estimation variance and enhances the identifiability and interpretability of estimated factors.

[118]  arXiv:2111.12405 [pdf, other]
Title: An Attack on Feature Level-based Facial Soft-biometric Privacy Enhancement
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In the recent past, different researchers have proposed novel privacy-enhancing face recognition systems designed to conceal soft-biometric information at feature level. These works have reported impressive results, but usually do not consider specific attacks in their analysis of privacy protection. In most cases, the privacy protection capabilities of these schemes are tested through simple machine learning-based classifiers and visualisations of dimensionality reduction tools. In this work, we introduce an attack on feature level-based facial soft-biometric privacy-enhancement techniques. The attack is based on two observations: (1) to achieve high recognition accuracy, certain similarities between facial representations have to be retained in their privacy-enhanced versions; (2) highly similar facial representations usually originate from face images with similar soft-biometric attributes. Based on these observations, the proposed attack compares a privacy-enhanced face representation against a set of privacy-enhanced face representations with known soft-biometric attributes. Subsequently, the best obtained similarity scores are analysed to infer the unknown soft-biometric attributes of the attacked privacy-enhanced face representation. That is, the attack only requires a relatively small database of arbitrary face images and the privacy-enhancing face recognition algorithm as a black-box. In the experiments, the attack is applied to two representative approaches which have previously been reported to reliably conceal the gender in privacy-enhanced face representations. It is shown that the presented attack is able to circumvent the privacy enhancement to a considerable degree and is able to correctly classify gender with an accuracy of up to approximately 90% for both of the analysed privacy-enhancing face recognition systems.

[119]  arXiv:2111.12406 [pdf, other]
Title: Auto robust relative radiometric normalization via latent change noise modelling
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Relative radiometric normalization (RRN) of different satellite images of the same terrain is necessary for change detection, object classification/segmentation, and map-making tasks. However, traditional RRN models are not robust, being disturbed by object change, and RRN models that explicitly consider object change cannot robustly obtain the no-change set. This paper proposes auto robust relative radiometric normalization methods via latent change noise modeling. They utilize the prior knowledge that no-change points possess small-scale noise under relative radiometric normalization and that change points possess large-scale radiometric noise after radiometric normalization, combining the stochastic expectation maximization method to quickly and robustly extract the no-change set to learn the relative radiometric normalization mapping functions. This makes our model theoretically grounded in probability theory and mathematical deduction. Specifically, when we select histogram matching as the relative radiometric normalization learning scheme and integrate it with the mixture of Gaussian noise (HM-RRN-MoG), the HM-RRN-MoG model achieves the best performance. Our model is robust against clouds, fog, and changes. Our method naturally generates a robust evaluation indicator for RRN, namely the root mean square error on the no-change set. We apply the HM-RRN-MoG model to the subsequent vegetation/water change detection task, where it reduces the radiometric contrast and NDVI/NDWI differences on the no-change set and generates consistent and comparable results. We also use the no-change set in the building change detection task, efficiently reducing the pseudo-change and boosting the precision.

[120]  arXiv:2111.12408 [pdf, other]
Title: Markov Chain Generative Adversarial Neural Networks for Solving Bayesian Inverse Problems in Physics Applications
Subjects: Numerical Analysis (math.NA)

In the context of solving inverse problems for physics applications within a Bayesian framework, we present a new approach, Markov Chain Generative Adversarial Neural Networks (MCGANs), to alleviate the computational costs associated with solving the Bayesian inference problem. GANs pose a very suitable framework to aid in the solution of Bayesian inference problems, as they are designed to generate samples from complicated high-dimensional distributions. By training a GAN to sample from a low-dimensional latent space and then embedding it in a Markov Chain Monte Carlo method, we can highly efficiently sample from the posterior, by replacing both the high-dimensional prior and the expensive forward map. We prove that the proposed methodology converges to the true posterior in the Wasserstein-1 distance and that sampling from the latent space is equivalent to sampling in the high-dimensional space in a weak sense. The method is showcased on three test cases where we perform both state and parameter estimation simultaneously. The approach is shown to be up to two orders of magnitude more accurate than alternative approaches while also being up to an order of magnitude computationally faster, in several test cases, including the important engineering setting of detecting leaks in pipelines.

[121]  arXiv:2111.12417 [pdf, other]
Title: NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

This paper presents a unified multimodal pre-trained model called NÜWA that can generate new or manipulate existing visual data (i.e., images and videos) for various visual synthesis tasks. To cover language, image, and video at the same time for different scenarios, a 3D transformer encoder-decoder framework is designed, which can not only deal with videos as 3D data but also adapt to texts and images as 1D and 2D data, respectively. A 3D Nearby Attention (3DNA) mechanism is also proposed to consider the nature of the visual data and reduce the computational complexity. We evaluate NÜWA on 8 downstream tasks. Compared to several strong baselines, NÜWA achieves state-of-the-art results on text-to-image generation, text-to-video generation, video prediction, etc. Furthermore, it also shows surprisingly good zero-shot capabilities on text-guided image and video manipulation tasks. Project repo is https://github.com/microsoft/NUWA.

[122]  arXiv:2111.12419 [pdf, other]
Title: NAM: Normalization-based Attention Module
Comments: 3 pages, 2 figures, 2 tables, 2 tables in the appendix
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Recognizing less salient features is the key to model compression. However, it has not been investigated in the revolutionary attention mechanisms. In this work, we propose a novel normalization-based attention module (NAM), which suppresses less salient weights. It applies a weight sparsity penalty to the attention modules, thus making them more computationally efficient while retaining similar performance. A comparison with three other attention mechanisms on both Resnet and Mobilenet indicates that our method results in higher accuracy. Code for this paper can be publicly accessed at https://github.com/Christian-lyc/NAM.

[123]  arXiv:2111.12420 [pdf, other]
Title: CircuitFlow: A Domain Specific Language for Dataflow Programming (with appendices)
Comments: 31 pages, 5 figures, to be published in PADL 2022
Subjects: Programming Languages (cs.PL)

Dataflow applications, such as machine learning algorithms, can run for days, making it desirable to have assurances that they will work correctly. Current tools are not good enough: too often the interactions between tasks are not type-safe, leading to undesirable run-time errors. This paper presents a new declarative Haskell Embedded DSL (eDSL) for dataflow programming: CircuitFlow. Defined as a Symmetric Monoidal Preorder (SMP) on data that models dependencies in the workflow, it has a strong mathematical basis, refocusing on how data flows through an application, resulting in a more expressive solution that not only catches errors statically, but also achieves competitive run-time performance. In our preliminary evaluation, CircuitFlow outperforms the industry-leading Luigi library of Spotify by scaling better with the number of inputs. The innovative creation of CircuitFlow is also of note, exemplifying how to create a modular eDSL whose semantics necessitates effects, and where storing complex type information for program correctness is paramount.

[124]  arXiv:2111.12421 [pdf, other]
Title: Few-shot Named Entity Recognition with Cloze Questions
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

Despite the huge and continuous advances in computational linguistics, the lack of annotated data for Named Entity Recognition (NER) is still a challenging issue, especially in low-resource languages and when domain knowledge is required for high-quality annotations. Recent findings in NLP show the effectiveness of cloze-style questions in enabling language models to leverage the knowledge they acquired during the pre-training phase. In our work, we propose a simple and intuitive adaptation of Pattern-Exploiting Training (PET), a recent approach which combines the cloze-questions mechanism and fine-tuning for few-shot learning: the key idea is to rephrase the NER task with patterns. Our approach achieves considerably better performance than standard fine-tuning and comparable or improved results with respect to other few-shot baselines without relying on manually annotated data or distant supervision on three benchmark datasets: NCBI-disease, BC2GM and a private Italian biomedical corpus.

[125]  arXiv:2111.12423 [pdf]
Title: Machine Learning Guided Cross-Contract Fuzzing
Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)

Smart contract transactions are increasingly interleaved by cross-contract calls. While many tools have been developed to identify a common set of vulnerabilities to guard smart contracts, cross-contract vulnerabilities are overlooked by existing tools. Cross-contract vulnerabilities are exploitable bugs that manifest in the presence of more than two interacting contracts. Existing methods are, however, limited to analyzing a maximum of two contracts at the same time. Detecting cross-contract vulnerabilities is highly non-trivial. With multiple interacting contracts, the search space is much larger than that of a single contract. To address this problem, we present xFuzz, a machine learning guided smart contract fuzzing framework. The machine learning models are trained with novel features (e.g., word vectors and instructions) and are used to filter likely benign program paths. Compared with existing static tools, the machine learning model proves to be more robust, avoiding the direct adoption of manually defined rules in specific tools. We compare xFuzz with three state-of-the-art tools on 7,391 contracts. xFuzz detects 18 exploitable cross-contract vulnerabilities, of which 15 vulnerabilities are exposed for the first time. Furthermore, our approach is shown to be efficient in detecting non-cross-contract vulnerabilities as well: using less than 20% of the time of other fuzzing tools, xFuzz detects twice as many vulnerabilities.

[126]  arXiv:2111.12427 [pdf, other]
Title: Challenges of Adversarial Image Augmentations
Comments: To appear at the ICBINB 2021 Neurips Workshop
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Image augmentations applied during training are crucial for the generalization performance of image classifiers. Therefore, a large body of research has focused on finding the optimal augmentation policy for a given task. Yet, RandAugment [2], a simple random augmentation policy, has recently been shown to outperform existing sophisticated policies. Only Adversarial AutoAugment (AdvAA) [11], an approach based on the idea of adversarial training, has been shown to be better than RandAugment. In this paper, we show that random augmentations are still competitive compared to an optimal adversarial approach, as well as to simple curricula, and conjecture that the success of AdvAA is due to the stochasticity of the policy controller network, which introduces a mild form of curriculum.

[127]  arXiv:2111.12429 [pdf, other]
Title: tsflex: flexible time series processing & feature extraction
Comments: The first two authors contributed equally. Submitted to SoftwareX
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)

Time series processing and feature extraction are crucial and time-intensive steps in conventional machine learning pipelines. Existing packages are limited in their real-world applicability, as they cannot cope with irregularly-sampled and asynchronous data. We therefore present $\texttt{tsflex}$, a domain-independent, flexible, and sequence first Python toolkit for processing & feature extraction, that is capable of handling irregularly-sampled sequences with unaligned measurements. This toolkit is sequence first as (1) sequence based arguments are leveraged for strided-window feature extraction, and (2) the sequence-index is maintained through all supported operations. $\texttt{tsflex}$ is flexible as it natively supports (1) multivariate time series, (2) multiple window-stride configurations, and (3) integrates with processing and feature functions from other packages, while (4) making no assumptions about the data sampling rate regularity and synchronization. Other functionalities from this package are multiprocessing, in-depth execution time logging, support for categorical & time based data, chunking sequences, and embedded serialization. $\texttt{tsflex}$ is developed to enable fast and memory-efficient time series processing & feature extraction. Results indicate that $\texttt{tsflex}$ is more flexible than similar packages while outperforming these toolkits in both runtime and memory usage.
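
The idea of strided-window feature extraction on irregularly sampled data can be sketched in plain Python as follows. This is a minimal illustration and deliberately does not use the tsflex API: the function name, its parameters, and the choice to label each window by its start time are assumptions introduced here. Windows are defined on the sequence index (time), not on sample counts, so no fixed sampling rate is assumed.

```python
def strided_window_features(times, values, window, stride, func):
    """Apply `func` to the values falling in each window [start, start + window).

    `times` must be sorted in ascending order. Window starts advance by `stride`
    from the first timestamp; windows containing no samples are skipped.
    Returns a list of (window_start, feature_value) pairs.
    """
    results = []
    if not times:
        return results
    k = 0
    while times[0] + k * stride <= times[-1]:
        start = times[0] + k * stride
        inside = [v for t, v in zip(times, values) if start <= t < start + window]
        if inside:
            results.append((start, func(inside)))
        k += 1
    return results


def mean(xs):
    """Arithmetic mean of a non-empty list."""
    return sum(xs) / len(xs)


if __name__ == "__main__":
    # Irregularly spaced timestamps (seconds) with one measurement each.
    t = [0.0, 0.4, 1.1, 3.0, 3.2]
    x = [1, 2, 3, 4, 5]
    print(strided_window_features(t, x, window=2.0, stride=1.0, func=mean))
```

The linear scan per window keeps the sketch short; a real implementation would typically locate window boundaries by binary search on the sorted index.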

[128]  arXiv:2111.12434 [pdf, other]
Title: The Evolving Path of "the Right to Be Left Alone" - When Privacy Meets Technology
Authors: Michela Iezzi
Comments: Accepted at IEEE International Conference on Trust, Privacy and Security in Intelligent Systems, and Applications 2021 (IEEE TPS-ISA 2021)
Subjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY)

This paper deals with the hot, evergreen topic of the relationship between privacy and technology. We give extensive motivation for why the privacy debate is still alive for private citizens and institutions, and we investigate the privacy concept. This paper proposes a novel vision of the privacy ecosystem, introducing privacy dimensions, the related users' expectations, the privacy violations, and the changing factors. We provide a critical assessment of the Privacy by Design paradigm, strategies, tactics, patterns, and Privacy-Enhancing Technologies, highlighting the current open issues. We believe that promising approaches to tackle the privacy challenges move in two directions: (i) identification of effective privacy metrics; and (ii) adoption of formal tools to design privacy-compliant applications.

[129]  arXiv:2111.12436 [pdf, ps, other]
Title: Matroid Partition Property and the Secretary Problem
Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM); Combinatorics (math.CO)

A matroid $\mathcal{M}$ on a set $E$ of elements has the $\alpha$-partition property, for some $\alpha>0$, if it is possible to (randomly) construct a partition matroid $\mathcal{P}$ on (a subset of) elements of $\mathcal{M}$ such that every independent set of $\mathcal{P}$ is independent in $\mathcal{M}$ and for any weight function $w:E\to\mathbb{R}_{\geq 0}$, the expected value of the optimum of the matroid secretary problem on $\mathcal{P}$ is at least an $\alpha$-fraction of the optimum on $\mathcal{M}$. We show that the complete binary matroid, ${\cal B}_d$ on $\mathbb{F}_2^d$ does not satisfy the $\alpha$-partition property for any constant $\alpha>0$ (independent of $d$).
Furthermore, we refute a recent conjecture of Bérczi, Schwarcz, and Yamaguchi by showing the same matroid is $2^d/d$-colorable but cannot be reduced to an $\alpha 2^d/d$-colorable partition matroid for any $\alpha$ that is sublinear in $d$.
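
For readers unfamiliar with the complete binary matroid, its independent sets are the sets of vectors in $\mathbb{F}_2^d$ that are linearly independent over $\mathbb{F}_2$. The short Python sketch below tests this by Gaussian elimination on bitmasks. It only illustrates the independence oracle and is not part of the paper's argument; encoding each vector as an integer bitmask and the function name are assumptions made here.

```python
def is_independent_f2(vectors):
    """Return True if the given vectors (integer bitmasks) are linearly
    independent over F_2, i.e. independent in the complete binary matroid."""
    basis = []  # reduced vectors; each has a leading bit that later entries lack
    for v in vectors:
        # Reduce v against the basis in insertion order: XOR with b is kept
        # exactly when it clears the leading bit of b in v.
        for b in basis:
            v = min(v, v ^ b)
        if v == 0:
            return False  # v lies in the span of the earlier vectors
        basis.append(v)
    return True


if __name__ == "__main__":
    print(is_independent_f2([0b001, 0b010, 0b100]))  # standard basis of F_2^3
    print(is_independent_f2([0b011, 0b101, 0b110]))  # 011 + 101 = 110 over F_2
```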

[130]  arXiv:2111.12443 [pdf, ps, other]
Title: A topology optimisation of acoustic devices based on the frequency response estimation with the Padé approximation
Subjects: Numerical Analysis (math.NA); Computational Engineering, Finance, and Science (cs.CE)

We propose a topology optimisation of acoustic devices that work in a certain bandwidth. To achieve this, we define the objective function as the frequency-averaged sound intensity at given observation points, which is represented by a frequency integral over a given frequency band. It is, however, prohibitively expensive to evaluate such an integral naively by a quadrature. We thus estimate the frequency response by the Padé approximation and integrate the approximated function to obtain the objective function. The corresponding topological derivative is derived with the help of the adjoint variable method and chain rule. It is shown that the objective and its sensitivity can be evaluated semi-analytically. We present efficient numerical procedures to compute them and incorporate them into a topology optimisation based on the level-set method. We confirm the validity and effectiveness of the present method through some numerical examples.

[131]  arXiv:2111.12444 [pdf, other]
Title: Edge Artificial Intelligence for 6G: Vision, Enabling Technologies, and Applications
Comments: This work is a JSAC invited survey & tutorial paper. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)

The thriving of artificial intelligence (AI) applications is driving the further evolution of wireless networks. It has been envisioned that 6G will be transformative and will revolutionize the evolution of wireless from "connected things" to "connected intelligence". However, state-of-the-art deep learning and big data analytics based AI systems require tremendous computation and communication resources, causing significant latency, energy consumption, network congestion, and privacy leakage in both the training and inference processes. By embedding model training and inference capabilities into the network edge, edge AI stands out as a disruptive technology for 6G to seamlessly integrate sensing, communication, computation, and intelligence, thereby improving the efficiency, effectiveness, privacy, and security of 6G networks. In this paper, we shall provide our vision for scalable and trustworthy edge AI systems with integrated design of wireless communication strategies and decentralized machine learning models. New design principles of wireless networks, service-driven resource allocation optimization methods, as well as a holistic end-to-end system architecture to support edge AI will be described. Standardization, software and hardware platforms, and application scenarios are also discussed to facilitate the industrialization and commercialization of edge AI systems.

[132]  arXiv:2111.12447 [pdf, other]
Title: Revisiting Contextual Toxicity Detection in Conversations
Subjects: Computation and Language (cs.CL)

Understanding toxicity in user conversations is undoubtedly an important problem. As it has been argued in previous work, addressing "covert" or implicit cases of toxicity is particularly hard and requires context. Very few previous studies have analysed the influence of conversational context in human perception or in automated detection models. We dive deeper into both these directions. We start by analysing existing contextual datasets and come to the conclusion that toxicity labelling by humans is in general influenced by the conversational structure, polarity and topic of the context. We then propose to bring these findings into computational detection models by introducing (a) neural architectures for contextual toxicity detection that are aware of the conversational structure, and (b) data augmentation strategies that can help model contextual toxicity detection. Our results have shown the encouraging potential of neural architectures that are aware of the conversation structure. We have also demonstrated that such models can benefit from synthetic data, especially in the social media domain.

[133]  arXiv:2111.12448 [pdf, other]
Title: 3D Shape Variational Autoencoder Latent Disentanglement via Mini-Batch Feature Swapping for Bodies and Faces
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)

Learning a disentangled, interpretable, and structured latent representation in 3D generative models of faces and bodies is still an open problem. The problem is particularly acute when control over identity features is required. In this paper, we propose an intuitive yet effective self-supervised approach to train a 3D shape variational autoencoder (VAE) which encourages a disentangled latent representation of identity features. Curating the mini-batch generation by swapping arbitrary features across different shapes makes it possible to define a loss function leveraging known differences and similarities in the latent representations. Experimental results conducted on 3D meshes show that state-of-the-art methods for latent disentanglement are not able to disentangle identity features of faces and bodies. Our proposed method properly decouples the generation of such features while maintaining good representation and reconstruction capabilities.

[134]  arXiv:2111.12449 [pdf, other]
Title: Background-Click Supervision for Temporal Action Localization
Comments: To appear at TPAMI
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Weakly supervised temporal action localization aims at learning the instance-level action pattern from the video-level labels, where a significant challenge is action-context confusion. To overcome this challenge, one recent work builds an action-click supervision framework. It requires similar annotation costs but can steadily improve the localization performance when compared to the conventional weakly supervised methods. In this paper, by revealing that the performance bottleneck of the existing approaches mainly comes from the background errors, we find that a stronger action localizer can be trained with labels on the background video frames rather than those on the action frames. To this end, we convert the action-click supervision to the background-click supervision and develop a novel method, called BackTAL. Specifically, BackTAL implements two-fold modeling on the background video frames, i.e. the position modeling and the feature modeling. In position modeling, we not only conduct supervised learning on the annotated video frames but also design a score separation module to enlarge the score differences between the potential action frames and backgrounds. In feature modeling, we propose an affinity module to measure frame-specific similarities among neighboring frames and dynamically attend to informative neighbors when calculating temporal convolution. Extensive experiments on three benchmarks are conducted, which demonstrate the high performance of the established BackTAL and the rationality of the proposed background-click supervision. Code is available at https://github.com/VividLe/BackTAL.

[135]  arXiv:2111.12454 [pdf, other]
Title: Exploring Business Process Deviance with Sequential and Declarative Patterns
Subjects: Artificial Intelligence (cs.AI)

Business process deviance refers to the phenomenon whereby a subset of the executions of a business process deviate, in a negative or positive way, with respect to their expected or desirable outcomes. Deviant executions of a business process include those that violate compliance rules, or executions that undershoot or exceed performance targets. Deviance mining is concerned with uncovering the reasons for deviant executions by analyzing event logs stored by the systems supporting the execution of a business process. In this paper, the problem of explaining deviations in business processes is first investigated by using features based on sequential and declarative patterns, and a combination of them. Then, the explanations are further improved by leveraging the data attributes of events and traces in event logs through features based on pure data attribute values and data-aware declarative rules. The explanations characterizing the deviances are then extracted by direct and indirect methods for rule induction. Using real-life logs from multiple domains, a range of feature types and different forms of decision rules are evaluated in terms of their ability to accurately discriminate between non-deviant and deviant executions of a process as well as in terms of understandability of the final outcome returned to the users.

[136]  arXiv:2111.12456 [pdf, other]
Title: SoK: Untangling File-based Encryption on Mobile Devices
Subjects: Cryptography and Security (cs.CR)

File-based encryption (FBE) schemes have been developed by software vendors to address security concerns related to data storage. While methods of encrypting data-at-rest may seem relatively straightforward, the main proponents of these technologies in mobile devices have nonetheless created seemingly different FBE solutions. As most of the underlying design decisions are described either at a high-level in whitepapers, or are accessible at a low-level by examining the corresponding source code (Android) or through reverse-engineering (iOS), comparisons between schemes and discussions on their relative strengths are scarce. In this paper, we propose a formal framework for the study of file-based encryption systems, focusing on two prominent implementations: the FBE scheme used in Android and Linux operating systems, as well as the FBE scheme used in iOS. Our proposed formal model and our detailed description of the existing algorithms are based on documentation of diverse nature, such as whitepapers, technical reports, presentations and blog posts, among others. Using our framework we validate the security of the existing key derivation chains, as well as the security of the overall designs, under widely-known security assumptions for symmetric ciphers, such as IND-CPA or INT-CTXT security, in the random-oracle model.

[137]  arXiv:2111.12460 [pdf, other]
Title: ViCE: Self-Supervised Visual Concept Embeddings as Contextual and Pixel Appearance Invariant Semantic Representations
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)

This work presents a self-supervised method to learn dense semantically rich visual concept embeddings for images inspired by methods for learning word embeddings in NLP. Our method improves on prior work by generating more expressive embeddings and by being applicable for high-resolution images. Viewing the generation of natural images as a stochastic process where a set of latent visual concepts give rise to observable pixel appearances, our method is formulated to learn the inverse mapping from pixels to concepts. Our method greatly improves the effectiveness of self-supervised learning for dense embedding maps by introducing superpixelization as a natural hierarchical step up from pixels to a small set of visually coherent regions. Additional contributions are regional contextual masking with nonuniform shapes matching visually coherent patches and complexity-based view sampling inspired by masked language models. The enhanced expressiveness of our dense embeddings is demonstrated by significantly improving the state-of-the-art representation quality benchmarks on COCO (+12.94 mIoU, +87.6%) and Cityscapes (+16.52 mIoU, +134.2%). Results show favorable scaling and domain generalization properties not demonstrated by prior work.

[138]  arXiv:2111.12465 [pdf, ps, other]
Title: Introduction to Presentation Attack Detection in Iris Biometrics and Recent Advances
Comments: Chapter of the Handbook of Biometric Anti-Spoofing (Third Edition)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Iris recognition technology has attracted increasing interest in recent decades, during which we have witnessed a migration from research laboratories to real-world applications. The deployment of this technology raises questions about the main vulnerabilities and security threats related to these systems. Among these threats, presentation attacks stand out as some of the most relevant and studied. Presentation attacks can be defined as the presentation of human characteristics or artifacts directly to the capture device of a biometric system in an attempt to interfere with its normal operation. In the case of the iris, these attacks include the use of real irises as well as artifacts with different levels of sophistication, such as photographs or videos. This chapter introduces iris Presentation Attack Detection (PAD) methods that have been developed to reduce the risk posed by presentation attacks. First, we summarise the most popular types of attacks, including the main challenges to address. Secondly, we present a taxonomy of Presentation Attack Detection methods as a brief introduction to this very active research area. Finally, we discuss the integration of these methods into Iris Recognition Systems according to the most important scenarios of practical application.

[139]  arXiv:2111.12472 [pdf, other]
Title: COVID-19 vaccination certificates in the Darkweb
Subjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY)

COVID-19 vaccines have been rolled out in many countries and with them a number of vaccination certificates. For instance, the EU is utilizing a digital certificate in the form of a QR-code that is digitally signed and can be easily validated throughout all EU countries. In this paper, we investigate the current state of the COVID-19 vaccination certificate market in the darkweb with a focus on the EU Digital Green Certificate (DGC). We investigate $17$ marketplaces and $10$ vendor shops that include vaccination certificates in their listings. Our results suggest that a multitude of sellers in both types of platforms are advertising selling capabilities. According to their claims, it is possible to buy fake vaccination certificates issued in most countries worldwide. We demonstrate some examples of such sellers, including how they advertise their capabilities, and the methods they claim to be using to provide their services. We highlight two particular cases of vendor shops, with one of them showing an elevated degree of professionalism, showcasing forged valid certificates, the validity of which we verify using two different national mobile COVID-19 applications.

[140]  arXiv:2111.12476 [pdf, other]
Title: Hierarchical Modular Network for Video Captioning
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Video captioning aims to generate natural language descriptions according to the content, where representation learning plays a crucial role. Existing methods are mainly developed within the supervised learning framework via word-by-word comparison of the generated caption against the ground-truth text without fully exploiting linguistic semantics. In this work, we propose a hierarchical modular network to bridge video representations and linguistic semantics from three levels before generating captions. In particular, the hierarchy is composed of: (I) Entity level, which highlights objects that are most likely to be mentioned in captions. (II) Predicate level, which learns the actions conditioned on highlighted objects and is supervised by the predicate in captions. (III) Sentence level, which learns the global semantic representation and is supervised by the whole caption. Each level is implemented by one module. Extensive experimental results show that the proposed method performs favorably against the state-of-the-art models on the two widely-used benchmarks: MSVD 104.0% and MSR-VTT 51.5% in CIDEr score.

[141]  arXiv:2111.12477 [pdf, other]
Title: Selection of pseudo-annotated data for adverse drug reaction classification across drug groups
Comments: Accepted to AIST 2021
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Automatic monitoring of adverse drug events (ADEs) or reactions (ADRs) is currently receiving significant attention from the biomedical community. In recent years, user-generated data on social media has become a valuable resource for this task. Neural models have achieved impressive performance on automatic text classification for ADR detection. Yet, training and evaluation of these methods are carried out on user-generated texts about a targeted drug. In this paper, we assess the robustness of state-of-the-art neural architectures across different drug groups. We investigate several strategies to use pseudo-labeled data in addition to a manually annotated train set. Out-of-dataset experiments diagnose the bottleneck of supervised models in terms of breakdown performance, while additional pseudo-labeled data improves overall results regardless of the text selection strategy.

[142]  arXiv:2111.12478 [pdf, other]
Title: Predictive Data Race Detection for GPUs
Comments: 14 pages, 12 figures, 4 tables
Subjects: Programming Languages (cs.PL)

The high degree of parallelism and relatively complicated synchronization mechanisms in GPUs make writing correct kernels difficult. Data races pose one such concurrency correctness challenge, and therefore, effective methods of detecting as many data races as possible are required.
Predictive partial order relations for CPU programs aim to expose data races that can be hidden during a dynamic execution. Existing predictive partial orders cannot be naïvely applied to analyze GPU kernels because of the differences in programming models. This work proposes GWCP, a predictive partial order for data race detection of GPU kernels. GWCP extends a sound and precise relation called weak-causally-precedes (WCP) proposed in the context of multithreaded shared memory CPU programs to GPU kernels. GWCP takes into account the GPU thread hierarchy and different synchronization semantics such as barrier synchronization and scoped atomics and locks.
We implement a tool called PreDataR that tracks the GWCP relation using binary instrumentation. PreDataR includes three optimizations and a novel vector clock compression scheme that are readily applicable to other partial order based analyses. Our evaluation with several microbenchmarks and benchmarks shows that PreDataR has better data race coverage compared to prior techniques at practical run-time overheads.
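
The abstract refers to partial-order analyses tracked with vector clocks. As a rough illustration only, the sketch below shows a generic happens-before style race check over vector clocks. It is not the GWCP relation and does not model PreDataR's compression scheme or GPU scopes; the `Access` record and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    """One recorded memory access (hypothetical record layout)."""
    thread: str
    address: int
    is_write: bool
    clock: dict  # vector clock: thread name -> logical counter


def vc_join(a, b):
    """Component-wise maximum of two vector clocks."""
    return {t: max(a.get(t, 0), b.get(t, 0)) for t in a.keys() | b.keys()}


def happens_before(a, b):
    """True if clock a is <= clock b in every component and differs somewhere."""
    keys = a.keys() | b.keys()
    not_greater = all(a.get(t, 0) <= b.get(t, 0) for t in keys)
    strictly_less = any(a.get(t, 0) < b.get(t, 0) for t in keys)
    return not_greater and strictly_less


def is_race(x, y):
    """Two accesses conflict and race if neither is ordered before the other."""
    if x.thread == y.thread or x.address != y.address:
        return False
    if not (x.is_write or y.is_write):
        return False  # two reads never race
    return not happens_before(x.clock, y.clock) and not happens_before(y.clock, x.clock)
```

A synchronization operation would typically be modeled by joining the releasing thread's clock into the acquiring thread's clock with `vc_join`, which is what orders later accesses after earlier ones.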

[143]  arXiv:2111.12479 [pdf, other]
Title: Construction and evaluation of PH curves in exponential-polynomial spaces
Subjects: Numerical Analysis (math.NA)

In the past few decades, polynomial curves with Pythagorean Hodograph (for short PH curves) have received considerable attention due to their usefulness in various CAD/CAM areas, manufacturing, numerical control machining and robotics. This work deals with classes of PH curves built upon exponential-polynomial spaces (for short EPH curves). In particular, for the two most frequently encountered exponential-polynomial spaces, we first provide necessary and sufficient conditions to be satisfied by the control polygon of the Bézier-like curve in order to fulfill the PH property. Then, for such EPH curves, fundamental characteristics like parametric speed or cumulative and total arc length are discussed to show the interesting analogies with their well-known polynomial counterparts. Differences and advantages with respect to ordinary PH curves become apparent when discussing the solutions to application problems like the interpolation of first-order Hermite data. Finally, a new evaluation algorithm for EPH curves is proposed and shown to compare favorably with the celebrated de Casteljau-like algorithm and two recently proposed methods: Woźny and Chudy's algorithm and the dynamic evaluation procedure by Yang and Hong.
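
For readers unfamiliar with the baseline mentioned above, the following is a minimal sketch of the classic de Casteljau algorithm for ordinary polynomial Bézier curves. It is not the de Casteljau-like variant for exponential-polynomial spaces, nor the new evaluation algorithm proposed in the paper; the function name is an assumption made for illustration.

```python
def de_casteljau(control_points, t):
    """Evaluate a polynomial Bezier curve at parameter t by repeated
    linear interpolation between consecutive control points."""
    points = [tuple(p) for p in control_points]
    if not points:
        raise ValueError("at least one control point is required")
    while len(points) > 1:
        # Each pass replaces n points with n - 1 interpolated points.
        points = [
            tuple((1 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(points, points[1:])
        ]
    return points[0]


# Quadratic curve with control polygon (0,0), (1,2), (2,0).
midpoint = de_casteljau([(0, 0), (1, 2), (2, 0)], 0.5)  # (1.0, 1.0)
```

The scheme uses only convex combinations, which is the usual reason it is regarded as numerically stable, at a cost quadratic in the number of control points.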

[144]  arXiv:2111.12480 [pdf, other]
Title: Octree Transformer: Autoregressive 3D Shape Generation on Hierarchically Structured Sequences
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)

Autoregressive models have proven to be very powerful in NLP text generation tasks and lately have gained popularity for image generation as well. However, they have seen limited use for the synthesis of 3D shapes so far. This is mainly due to the lack of a straightforward way to linearize 3D data as well as to scaling problems with the length of the resulting sequences when describing complex shapes. In this work we address both of these problems. We use octrees as a compact hierarchical shape representation that can be sequentialized by traversal ordering. Moreover, we introduce an adaptive compression scheme, that significantly reduces sequence lengths and thus enables their effective generation with a transformer, while still allowing fully autoregressive sampling and parallel training. We demonstrate the performance of our model by comparing against the state-of-the-art in shape generation.

[145]  arXiv:2111.12481 [pdf, other]
Title: It Is Different When Items Are Older: Debiasing Recommendations When Selection Bias and User Preferences Are Dynamic
Comments: WSDM 2022
Subjects: Information Retrieval (cs.IR)

User interactions with recommender systems (RSs) are affected by user selection bias, e.g., users are more likely to rate popular items (popularity bias) or items that they expect to enjoy beforehand (positivity bias). Methods exist for mitigating the effects of selection bias in user ratings on the evaluation and optimization of RSs. However, these methods treat selection bias as static, despite the fact that the popularity of an item may change drastically over time and the fact that user preferences may also change over time. We focus on the age of an item and its effect on selection bias and user preferences. Our experimental analysis reveals that the rating behavior of users on the MovieLens dataset is better captured by methods that consider effects from the age of an item on bias and preferences. We theoretically show that in a dynamic scenario in which both the selection bias and user preferences are dynamic, existing debiasing methods are no longer unbiased. To address this limitation, we introduce DebiAsing in the dyNamiC scEnaRio (DANCER), a novel debiasing method that extends the inverse propensity scoring debiasing method to account for dynamic selection bias and user preferences. Our experimental results indicate that DANCER improves rating prediction performance compared to debiasing methods that incorrectly assume that selection bias is static in a dynamic scenario. To the best of our knowledge, DANCER is the first debiasing method that accounts for dynamic selection bias and user preferences in RSs.
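
To make the starting point concrete, here is a minimal sketch of the standard static inverse propensity scoring (IPS) estimate of mean squared error, which the abstract says DANCER extends. This is not DANCER itself: the propensities here do not depend on time or item age, and the function name and dictionary layout are assumptions made for illustration.

```python
def ips_mse(ratings, predictions, propensities):
    """Static IPS estimate of the mean squared error over all user-item pairs.

    ratings:      {(user, item): observed rating}, observed pairs only
    predictions:  {(user, item): predicted rating}, every user-item pair
    propensities: {(user, item): probability that the rating is observed}
    """
    if not predictions:
        raise ValueError("predictions must cover at least one pair")
    weighted_error = 0.0
    for pair, rating in ratings.items():
        p = propensities[pair]
        if not 0.0 < p <= 1.0:
            raise ValueError(f"propensity for {pair} must be in (0, 1]")
        # Rarely observed pairs are up-weighted to offset selection bias.
        weighted_error += (predictions[pair] - rating) ** 2 / p
    return weighted_error / len(predictions)
```

In a dynamic variant one would presumably index the propensities by time or item age as well, but the exact form used by DANCER is not given in the abstract.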

[146]  arXiv:2111.12485 [pdf, other]
Title: Graph Modularity: Towards Understanding the Cross-Layer Transition of Feature Representations in Deep Neural Networks
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

There are good arguments to support the claim that feature representations eventually transition from general to specific in deep neural networks (DNNs), but this transition remains relatively underexplored. In this work, we move a tiny step towards understanding the transition of feature representations. We first characterize this transition by analyzing the class separation in intermediate layers, and next model the process of class separation as community evolution in dynamic graphs. Then, we introduce modularity, a common metric in graph theory, to quantify the evolution of communities. We find that modularity tends to rise as the layer goes deeper, but descends or reaches a plateau at particular layers. Through an asymptotic analysis, we show that modularity can provide quantitative analysis of the transition of the feature representations. With the insight on feature representations, we demonstrate that modularity can also be used to identify and locate redundant layers in DNNs, which provides theoretical guidance for layer pruning. Based on this inspiring finding, we propose a layer-wise pruning method based on modularity. Further experiments show that our method can prune redundant layers with minimal impact on performance. The codes are available at https://github.com/yaolu-zjut/Dynamic-Graphs-Construction.
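
As background for the metric named above, the sketch below computes Newman modularity for an undirected, unweighted graph with a given community assignment. How the paper builds its graphs from layer features is not described in the abstract; treating nodes as samples and communities as class labels is an assumption made here for illustration, as is the function name.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q = sum_c [ e_c / m - (d_c / (2m))^2 ].

    edges:     iterable of (u, v) pairs, each undirected edge listed once
    community: dict mapping every node to its community label
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    intra = defaultdict(int)   # e_c: edges with both endpoints in community c
    degree = defaultdict(int)  # d_c: total degree of the nodes in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            intra[cu] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q = modularity(edges, labels)  # 2 * (3/7 - 0.25), about 0.357
```

Higher values indicate that edges concentrate inside communities more than a degree-preserving random graph would predict, which matches the abstract's use of modularity as a proxy for class separation.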

[147]  arXiv:2111.12487 [pdf, other]
Title: Distributed Evaluation of Graph Queries using Recursive Relational Algebra
Authors: Sarah Chlyah (TYREX), Pierre Genevès (TYREX), Nabil Layaïda (TYREX)
Subjects: Databases (cs.DB)

We present a system called Dist-$\mu$-RA for the distributed evaluation of recursive graph queries. Dist-$\mu$-RA builds on the recursive relational algebra and extends it with evaluation plans suited for the distributed setting. The goal is to offer expressivity for high-level queries while providing efficiency at scale and reducing communication costs. Experimental results on both real and synthetic graphs show the effectiveness of the proposed approach compared to existing systems.

[148]  arXiv:2111.12488 [pdf, other]
Title: Intuitive Shape Editing in Latent Space
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)

The use of autoencoders for shape generation and editing suffers from manipulations in latent space that may lead to unpredictable changes in the output shape. We present an autoencoder-based method that enables intuitive shape editing in latent space by disentangling latent sub-spaces to obtain control points on the surface and style variables that can be manipulated independently. The key idea is adding a Lipschitz-type constraint to the loss function, i.e. bounding the change of the output shape proportionally to the change in latent space, leading to interpretable latent space representations. The control points on the surface can then be freely moved around, allowing for intuitive shape editing directly in latent space. We evaluate our method by comparing it to state-of-the-art data-driven shape editing methods. Besides shape manipulation, we demonstrate the expressiveness of our control points by leveraging them for unsupervised part segmentation.

[149]  arXiv:2111.12489 [pdf, ps, other]
Title: Repeated-root Constacyclic Codes with Optimal Locality
Subjects: Information Theory (cs.IT)

A code is called a locally repairable code (LRC) if any code symbol is a function of a small fraction of other code symbols. When a locally repairable code is employed in a distributed storage system, an erased symbol can be recovered by accessing only a small number of other symbols, thereby reducing the network resources required during the repair process. In this paper we consider repeated-root constacyclic codes, a generalization of cyclic codes, that are optimal with respect to a Singleton-like bound on minimum distance. An LRC with the structure of a constacyclic code can be encoded efficiently using any encoding algorithm for constacyclic codes in general. In this paper we obtain optimal LRCs among these repeated-root constacyclic codes. Several infinite classes of optimal LRCs over a fixed alphabet are found. Under a further assumption that the ambient space of the repeated-root constacyclic codes is a chain ring, we show that there is no other optimal LRC.

[150]  arXiv:2111.12490 [pdf, other]
Title: Causal Regularization Using Domain Priors
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Neural networks leverage both causal and correlation-based relationships in data to learn models that optimize a given performance criterion, such as classification accuracy. This results in learned models that may not necessarily reflect the true causal relationships between input and output. When domain priors of causal relationships are available at the time of training, it is essential that a neural network model maintains these relationships as causal, even as it learns to optimize the performance criterion. We propose a causal regularization method that can incorporate such causal domain priors into the network and which supports both direct and total causal effects. We show that this approach can generalize to various kinds of specifications of causal priors, including monotonicity of causal effect of a given input feature or removing a certain influence for purposes of fairness. Our experiments on eleven benchmark datasets show the usefulness of this approach in regularizing a learned neural network model to maintain desired causal effects. On most datasets, domain-prior consistent models can be obtained without compromising on accuracy.

[151]  arXiv:2111.12493 [pdf, other]
Title: Time and Memory Efficient Algorithm for Structural Graph Summaries over Evolving Graphs
Subjects: Data Structures and Algorithms (cs.DS); Databases (cs.DB)

Existing graph summarization algorithms are tailored to specific graph summary models, only support one-time batch computation, are designed and implemented for a specific task, or are evaluated using static graphs. Our novel, incremental, parallel algorithm addresses all of these shortcomings. We support infinitely many structural graph summary models defined in a formal language. All graph summaries can be updated in time $\mathcal{O}(\Delta \cdot d^k)$, where $\Delta$ is the number of additions, deletions, and modifications to the input graph, $d$ is its maximum degree, and $k$ is the maximum distance in the subgraphs considered while summarizing. We empirically evaluate the performance of our incremental algorithm on benchmark and real-world datasets. Overall our experiments show that, for commonly used summary models and datasets, the incremental summarization algorithm almost always outperforms its batch counterpart, even when about $50\%$ of the graph database changes. Updating the summaries of the real-world DyLDO-core dataset with our incremental algorithm is $5$ to $44$~times faster than computing a new summary, when using four cores. Furthermore, the incremental computations require a low memory overhead of only $8\%$ ($\pm 1\%$). Finally, the incremental summarization algorithm outperforms the batch algorithm even when using fewer cores.

[152]  arXiv:2111.12494 [pdf, other]
Title: Time-Energy-Constrained Closed-Loop FBL Communication for Dependable MEC
Comments: Accepted for publication at CSCN 2021
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

The deployment of multi-access edge computing (MEC) is paving the way towards pervasive intelligence in future 6G networks. This new paradigm also poses emerging requirements for dependable communications, which go beyond ultra-reliable low-latency communication (URLLC) by focusing on the performance of a closed loop instead of that of a unidirectional link. This work studies the simple but efficient one-shot transmission scheme, investigating the closed-loop-reliability-optimal policy of blocklength allocation under stringent time and energy constraints.

[153]  arXiv:2111.12495 [pdf, other]
Title: Softmax Gradient Tampering: Decoupling the Backward Pass for Improved Fitting
Comments: 13 pages, 4 figures, conference
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

We introduce Softmax Gradient Tampering, a technique for modifying the gradients in the backward pass of neural networks in order to enhance their accuracy. Our approach transforms the predicted probability values using a power-based probability transformation and then recomputes the gradients in the backward pass. This modification results in a smoother gradient profile, which we demonstrate empirically and theoretically. We do a grid search for the transform parameters on residual networks. We demonstrate that modifying the softmax gradients in ConvNets may result in increased training accuracy, thus increasing the fit across the training data and maximally utilizing the learning capacity of neural networks. We get better test metrics and lower generalization gaps when combined with regularization techniques such as label smoothing. Softmax gradient tampering improves ResNet-50's test accuracy by $0.52\%$ over the baseline on the ImageNet dataset. Our approach is very generic and may be used across a wide range of different network architectures and datasets.

[154]  arXiv:2111.12497 [pdf, ps, other]
Title: Performance of Reconfigurable Intelligent Surfaces in the Presence of Generalized Gaussian Noise
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

In this letter, we investigate the performance of reconfigurable intelligent surface (RIS)-assisted communications, under the assumption of generalized Gaussian noise (GGN), over Rayleigh fading channels. Specifically, we consider an RIS, equipped with $N$ reflecting elements, and derive a novel closed-form expression for the symbol error rate (SER) of arbitrary modulation schemes. The usefulness of the derived new expression is that it can be used to capture the SER performance in the presence of special additive noise distributions such as Gamma, Laplacian, and Gaussian noise. These special cases are also considered and their associated asymptotic SER expressions are derived, and then employed to quantify the achievable diversity order of the system. The theoretical framework is corroborated by numerical results, which reveal that the shaping parameter of the GGN ($\alpha$) has a negligible effect on the diversity order of RIS-assisted systems, particularly for large $\alpha$ values. Accordingly, the maximum achievable diversity order is determined by $N$.

[155]  arXiv:2111.12498 [pdf, other]
Title: Meta Mask Correction for Nuclei Segmentation in Histopathological Image
Comments: Accepted by BIBM 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Nuclei segmentation is a fundamental task in digital pathology analysis and can be automated by deep learning-based methods. However, the development of such an automated method requires a large amount of data with precisely annotated masks, which is hard to obtain. Training with weakly labeled data is a popular solution for reducing the workload of annotation. In this paper, we propose a novel meta-learning-based nuclei segmentation method which follows the label correction paradigm to leverage data with noisy masks. Specifically, we design a fully conventional meta-model that can correct noisy masks using a small amount of clean meta-data. Then the corrected masks can be used to supervise the training of the segmentation model. Meanwhile, a bi-level optimization method is adopted to alternately update the parameters of the main segmentation model and the meta-model in an end-to-end way. Extensive experimental results on two nuclear segmentation datasets show that our method achieves state-of-the-art results. In some noisy settings, it even achieves performance comparable to that of a model trained on supervised data.

[156]  arXiv:2111.12502 [pdf, other]
Title: TriStereoNet: A Trinocular Framework for Multi-baseline Disparity Estimation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Stereo vision is an effective technique for depth estimation with broad applicability in autonomous urban and highway driving. While various deep learning-based approaches have been developed for stereo, the input data from a binocular setup with a fixed baseline are limited. To address this problem, we present an end-to-end network for processing the data from a trinocular setup, which is a combination of a narrow and a wide stereo pair. In this design, two pairs of binocular data with a common reference image are treated with shared weights of the network and a mid-level fusion. We also propose a Guided Addition method for merging the 4D data of the two baselines. Additionally, an iterative sequential self-supervised and supervised learning on real and synthetic datasets is presented, making the training of the trinocular system practical with no need for ground-truth data for the real dataset. Experimental results demonstrate that the trinocular disparity network surpasses the scenario where individual pairs are fed into a similar architecture. Code and dataset: https://github.com/cogsys-tuebingen/tristereonet.

[157]  arXiv:2111.12503 [pdf, other]
Title: Extracting Triangular 3D Models, Materials, and Lighting From Images
Authors: Jacob Munkberg (1), Jon Hasselgren (1), Tianchang Shen (1,2,3), Jun Gao (1,2,3), Wenzheng Chen (1), Alex Evans (1), Thomas Müller (1), Sanja Fidler (1,2,3) ((1) NVIDIA, (2) University of Toronto, (3) Vector Institute)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

We present an efficient method for joint optimization of topology, materials and lighting from multi-view image observations. Unlike recent multi-view reconstruction approaches, which typically produce entangled 3D representations encoded in neural networks, we output triangle meshes with spatially-varying materials and environment lighting that can be deployed in any traditional graphics engine unmodified. We leverage recent work in differentiable rendering, coordinate-based networks to compactly represent volumetric texturing, alongside differentiable marching tetrahedrons to enable gradient-based optimization directly on the surface mesh. Finally, we introduce a differentiable formulation of the split sum approximation of environment lighting to efficiently recover all-frequency lighting. Experiments show our extracted models used in advanced scene editing, material decomposition, and high quality view interpolation, all running at interactive rates in triangle-based renderers (rasterizers and path tracers).

[158]  arXiv:2111.12506 [pdf, ps, other]
Title: A Unified Approach to Variational Autoencoders and Stochastic Normalizing Flows via Markov Chains
Subjects: Machine Learning (cs.LG); Probability (math.PR)

Normalizing flows, diffusion normalizing flows and variational autoencoders are powerful generative models. In this paper, we provide a unified framework to handle these approaches via Markov chains. Indeed, we consider stochastic normalizing flows as a pair of Markov chains fulfilling some properties and show that many state-of-the-art models for data generation fit into this framework. The Markov chain point of view enables us to couple both deterministic layers, such as invertible neural networks, and stochastic layers, such as Metropolis-Hastings layers, Langevin layers and variational autoencoders, in a mathematically sound way. Besides layers with densities, such as Langevin layers, diffusion layers or variational autoencoders, layers having no densities, such as deterministic layers or Metropolis-Hastings layers, can also be handled. Hence our framework establishes a useful mathematical tool to combine the various approaches.

[159]  arXiv:2111.12511 [pdf, other]
Title: Deep learning-based reduced order models for the real-time simulation of the nonlinear dynamics of microstructures
Comments: arXiv admin note: text overlap with arXiv:2001.04001
Subjects: Numerical Analysis (math.NA)

We propose a non-intrusive Deep Learning-based Reduced Order Model (DL-ROM) capable of capturing the complex dynamics of mechanical systems showing inertia and geometric nonlinearities. In the first phase, a limited number of high fidelity snapshots are used to generate a POD-Galerkin ROM which is subsequently exploited to generate the data, covering the whole parameter range, used in the training phase of the DL-ROM. A convolutional autoencoder is employed to map the system response onto a low-dimensional representation and, in parallel, to model the reduced nonlinear trial manifold. The system dynamics on the manifold is described by means of a deep feedforward neural network that is trained together with the autoencoder. The strategy is benchmarked against high fidelity solutions on a clamped-clamped beam and on a real micromirror with softening response and multiplicity of solutions. By comparing the different computational costs, we discuss the impressive gain in performance and show that the DL-ROM truly represents a real-time tool which can be profitably and efficiently employed in complex system-level simulation procedures for design and optimisation purposes.

[160]  arXiv:2111.12513 [pdf, other]
Title: FLACOCO: Fault Localization for Java based on Industry-grade Coverage
Comments: 4 pages, tool paper, demo available under this https URL, code available under this https URL
Subjects: Software Engineering (cs.SE)

Fault localization is an essential step in the debugging process. Spectrum-Based Fault Localization (SBFL) is a popular fault localization family of techniques, utilizing code-coverage to predict suspicious lines of code. In this paper, we present FLACOCO, a new fault localization tool for Java. The key novelty of FLACOCO is that it is built on top of one of the most used and most reliable coverage libraries for Java, JaCoCo. FLACOCO is made available through a well-designed command-line interface and Java API and supports all Java versions. We validate FLACOCO on two use-cases from the automatic program repair domain by reproducing previous scientific experiments. We find it is capable of effectively replacing the state-of-the-art FL library. Overall, we hope that FLACOCO will help research in fault localization as well as industry adoption thanks to being founded on industry-grade code coverage. An introductory video is available at https://youtu.be/RFRyvQuwRYA
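
To illustrate how spectrum-based fault localization turns coverage into suspiciousness scores, the sketch below ranks lines with the Ochiai formula, one commonly used SBFL metric. The abstract does not say which formulas FLACOCO implements, so the choice of Ochiai is an assumption, and the function names and input layout are hypothetical rather than FLACOCO's API.

```python
import math


def ochiai(failed_cover, passed_cover, total_failed):
    """Ochiai score: ef / sqrt(total_failed * (ef + ep))."""
    covered = failed_cover + passed_cover
    if total_failed == 0 or covered == 0:
        return 0.0
    return failed_cover / math.sqrt(total_failed * covered)


def rank_lines(coverage, outcomes):
    """Rank lines from most to least suspicious.

    coverage: {test name: set of line numbers the test executes}
    outcomes: {test name: True if the test passed, False if it failed}
    """
    total_failed = sum(1 for passed in outcomes.values() if not passed)
    lines = set().union(*coverage.values())
    scores = {}
    for line in lines:
        ef = sum(1 for t, cov in coverage.items() if line in cov and not outcomes[t])
        ep = sum(1 for t, cov in coverage.items() if line in cov and outcomes[t])
        scores[line] = ochiai(ef, ep, total_failed)
    # Highest score first; ties broken by line number for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Lines executed mostly by failing tests rise to the top of the ranking, which is the list an automatic program repair tool would then consume.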

[161]  arXiv:2111.12525 [pdf, other]
Title: Causality-inspired Single-source Domain Generalization for Medical Image Segmentation
Comments: Preprint. 10 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Deep learning models usually suffer from domain shift issues, where models trained on one source domain do not generalize well to other unseen domains. In this work, we investigate the single-source domain generalization problem: training a deep network that is robust to unseen domains, under the condition that training data is only available from one source domain, which is common in medical imaging applications. We tackle this problem in the context of cross-domain medical image segmentation. Under this scenario, domain shifts are mainly caused by different acquisition processes. We propose a simple causality-inspired data augmentation approach to expose a segmentation model to synthesized domain-shifted training examples. Specifically, 1) to make the deep model robust to discrepancies in image intensities and textures, we employ a family of randomly-weighted shallow networks. They augment training images using diverse appearance transformations. 2) Further we show that spurious correlations among objects in an image are detrimental to domain robustness. These correlations might be taken by the network as domain-specific clues for making predictions, and they may break on unseen domains. We remove these spurious correlations via causal intervention. This is achieved by stratifying the appearances of potentially correlated objects. The proposed approach is validated on three cross-domain segmentation tasks: cross-modality (CT-MRI) abdominal image segmentation, cross-sequence (bSSFP-LGE) cardiac MRI segmentation, and cross-center prostate MRI segmentation. The proposed approach yields consistent performance gains compared with competitive methods when tested on unseen domains.

[162]  arXiv:2111.12527 [pdf, other]
Title: MorphMLP: A Self-Attention Free, MLP-Like Backbone for Image and Video
Comments: preprint version
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Self-attention has become an integral component of recent network architectures, e.g., Transformer, that dominate major image and video benchmarks. This is because self-attention can flexibly model long-range information. For the same reason, researchers have recently attempted to revive the Multi-Layer Perceptron (MLP) and have proposed a few MLP-Like architectures that show great potential. However, the current MLP-Like architectures are not good at capturing local details and lack a progressive understanding of core details in images and/or videos. To overcome this issue, we propose a novel MorphMLP architecture that focuses on capturing local details at the low-level layers, while gradually changing to focus on long-term modeling at the high-level layers. Specifically, we design a Fully-Connected-Like layer, dubbed MorphFC, consisting of two morphable filters that gradually grow their receptive fields along the height and width dimensions. More interestingly, we propose to flexibly adapt our MorphFC layer to the video domain. To the best of our knowledge, we are the first to create an MLP-Like backbone for learning video representations. Finally, we conduct extensive experiments on image classification, semantic segmentation and video classification. MorphMLP, a self-attention-free backbone, can be as powerful as, and even outperform, self-attention-based models.

[163]  arXiv:2111.12528 [pdf]
Title: Systematic Analysis of Programming Languages and Their Execution Environments for Spectre Attacks
Subjects: Cryptography and Security (cs.CR)

In this paper, we analyze the security of programming languages and their execution environments (compilers and interpreters) with respect to Spectre attacks. The analysis shows that only 16 out of 42 execution environments have mitigations against at least one Spectre variant, i.e., 26 have no mitigations against any Spectre variant. Using our novel tool Speconnector, we develop Spectre proof-of-concept attacks in 8 programming languages and on code generated by 11 execution environments that were previously not known to be affected. Our results highlight some programming languages that are used to implement security-critical code, but remain entirely unprotected, even three years after the discovery of Spectre.

[164]  arXiv:2111.12531 [pdf, ps, other]
Title: Non-Intrusive Binaural Speech Intelligibility Prediction from Discrete Latent Representations
Comments: 4 pages + 1 refs; 1 figure; submitted to IEEE SPL (pending review)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Non-intrusive speech intelligibility (SI) prediction from binaural signals is useful in many applications. However, most existing signal-based measures are designed to be applied to single-channel signals. Measures specifically designed to take into account the binaural properties of the signal are often intrusive - characterised by requiring access to a clean speech signal - and typically rely on combining both channels into a single-channel signal before making predictions. This paper proposes a non-intrusive SI measure that computes features from a binaural input signal using a combination of vector quantization (VQ) and contrastive predictive coding (CPC) methods. VQ-CPC feature extraction does not rely on any model of the auditory system and is instead trained to maximise the mutual information between the input signal and output features. The computed VQ-CPC features are input to a predicting function parameterized by a neural network. Two predicting functions are considered in this paper. Both the feature extractor and the predicting functions are trained on simulated binaural signals with isotropic noise. They are tested on simulated signals with isotropic and real noise. For all signals, the ground truth scores are the (intrusive) deterministic binaural STOI. Results are presented in terms of correlations and MSE and demonstrate that VQ-CPC features are able to capture information relevant to modelling SI and outperform all the considered benchmarks - even when evaluating on data comprising different noise field types.

[165]  arXiv:2111.12535 [pdf, other]
Title: Knowledge Enhanced Sports Game Summarization
Comments: Accepted to WSDM 2022
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Sports game summarization aims at generating sports news from live commentaries. However, existing datasets are all constructed through automated collection and cleaning processes, resulting in a lot of noise. Besides, current works neglect the knowledge gap between live commentaries and sports news, which limits the performance of sports game summarization. In this paper, we introduce K-SportsSum, a new dataset with two characteristics: (1) K-SportsSum collects a large amount of data from massive games. It has 7,854 commentary-news pairs. To improve the quality, K-SportsSum employs a manual cleaning process; (2) Different from existing datasets, to narrow the knowledge gap, K-SportsSum further provides a large-scale knowledge corpus that contains the information of 523 sports teams and 14,724 sports players. Additionally, we also introduce a knowledge-enhanced summarizer that utilizes both live commentaries and the knowledge to generate sports news. Extensive experiments on K-SportsSum and SportsSum datasets show that our model achieves new state-of-the-art performances. Qualitative analysis and human study further verify that our model generates more informative sports news.

[166]  arXiv:2111.12537 [pdf, ps, other]
Title: Processing of optical signals by "surgical" methods for the Gelfand-Levitan-Marchenko equation
Subjects: Numerical Analysis (math.NA)

We propose a new method for solving the Gelfand-Levitan-Marchenko equation (GLME) based on the block version of the Toeplitz Inner-Bordering (TIB) with an arbitrary point to start the calculation. This makes it possible to find solutions of the GLME at an arbitrary point with a cutoff of the matrix coefficient, which avoids numerical instability and allows calculations for soliton solutions spaced apart in the time domain. Using an example of two solitons, we demonstrate our method and its range of applicability. An example of eight solitons shows how the method can be applied to a more complex signal configuration.

[167]  arXiv:2111.12539 [pdf, other]
Title: Information-Theoretic Approach for Model Reduction Over Finite Time Horizon
Subjects: Systems and Control (eess.SY)

This paper presents an information-theoretic approach to model reduction for finite-time simulation. Although system models are typically used for simulation over a finite time, most of the metrics (and pseudo-metrics) used for model accuracy assessment consider asymptotic behavior, e.g., Hankel singular values and the Kullback-Leibler (KL) rate metric. These metrics could further be used for model order reduction. Hence, in this paper, we propose a generalization of the KL divergence-based metric, called the n-step KL rate metric, which can be used to compare models over a finite time horizon. We then demonstrate that the asymptotic metrics for comparing dynamical systems may not accurately assess the model prediction uncertainties over a finite time horizon. Motivated by this finite-time analysis, we propose a new pragmatic approach to compute the influence of a subset of states on a combination of states, called information transfer (IT). Model reduction typically involves the removal or truncation of states. IT combines the concepts from the n-step KL rate metric and model reduction. Finally, we demonstrate the application of information transfer for model reduction. Although the analysis and definitions presented in this paper assume linear systems, they can be extended to nonlinear systems.

[168]  arXiv:2111.12542 [pdf]
Title: Autonomous bot with ML-based reactive navigation for indoor environment
Comments: This paper was presented in RIACT2021, an international conference, and was awarded 'outstanding oral presentation'. It was also selected for publication in springer special issue 2021. 12 pages, with 6 main figures and 1 figure in appendix
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Local or reactive navigation is essential for autonomous mobile robots that operate in an indoor environment. Techniques such as SLAM and computer vision require significant computational power, which increases cost. Conversely, using rudimentary methods makes the robot susceptible to inconsistent behavior. This paper aims to develop a robot that balances cost and accuracy by using machine learning to predict the best obstacle avoidance move based on distance inputs from four ultrasonic sensors that are strategically mounted on the front, front-left, front-right, and back of the robot. The underlying hardware consists of an Arduino Uno and a Raspberry Pi 3B. The machine learning model is first trained on the data collected by the robot. Then the Arduino continuously polls the sensors and calculates the distance values, and in case of critical need for avoidance, a suitable maneuver is made by the Arduino. In other scenarios, sensor data is sent to the Raspberry Pi over a USB connection and the machine learning model generates the best move for navigation, which is sent to the Arduino to drive the motors accordingly. The system is mounted on a 2-WD robot chassis and tested in a cluttered indoor setting with most impressive results.

[169]  arXiv:2111.12544 [pdf, other]
Title: LDDMM meets GANs: Generative Adversarial Networks for diffeomorphic registration
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

The purpose of this work is to contribute to the state of the art of deep-learning methods for diffeomorphic registration. We propose an adversarial learning LDDMM method for pairs of 3D mono-modal images based on Generative Adversarial Networks. The method is inspired by the recent literature for deformable image registration with adversarial learning. We combine the best performing generative, discriminative, and adversarial ingredients from the state of the art within the LDDMM paradigm. We have successfully implemented two models with the stationary and the EPDiff-constrained non-stationary parameterizations of diffeomorphisms. Our unsupervised and data-hungry approach has shown a competitive performance with respect to a benchmark supervised and rich-data approach. In addition, our method has shown similar results to model-based methods with a computational time under one second.

[170]  arXiv:2111.12545 [pdf, other]
Title: Learning to Refit for Convex Learning Problems
Subjects: Machine Learning (cs.LG); Computation (stat.CO)

Machine learning (ML) models need to be frequently retrained on changing datasets in a wide variety of application scenarios, including data valuation and uncertainty quantification. To efficiently retrain the model, linear approximation methods such as influence functions have been proposed to estimate the impact of data changes on model parameters. However, these methods become inaccurate for large dataset changes. In this work, we focus on convex learning problems and propose a general framework to learn to estimate optimized model parameters for different training sets using neural networks. We propose to enforce the predicted model parameters to obey optimality conditions and maintain utility through regularization techniques, which significantly improve generalization. Moreover, we rigorously characterize the expressive power of neural networks to approximate the optimizer of convex problems. Empirical results demonstrate the advantage of the proposed method in accurate and efficient model parameter estimation compared to the state-of-the-art.

[171]  arXiv:2111.12548 [pdf, other]
Title: AutoDC: Automated data-centric processing
Comments: NeurIPS 2021- Data-Centric AI (DCAI) workshop
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

AutoML (automated machine learning) has been extensively developed in the past few years for the model-centric approach. As for the data-centric approach, the processes to improve the dataset, such as fixing incorrect labels, adding examples that represent edge cases, and applying data augmentation, are still very artisanal and expensive. Here we develop an automated data-centric tool (AutoDC) that, similar in purpose to AutoML, aims to speed up the dataset improvement processes. In our preliminary tests on 3 open source image classification datasets, AutoDC is estimated to reduce the manual time for data improvement tasks by roughly 80% while improving model accuracy by 10-15% with fixed ML code.

[172]  arXiv:2111.12549 [pdf, ps, other]
Title: Interpolating Rotations with Non-abelian Kuramoto Model on the 3-Sphere
Journal-ref: Advanced Technologies, Systems, and Applications VI. IAT 2021. Lecture Notes in Networks and Systems, vol 316. Springer, Cham
Subjects: Graphics (cs.GR)

The paper presents a novel method for interpolating rotations based on the non-Abelian Kuramoto model on the sphere S3. The algorithm introduced in this paper finds the shortest and most direct path between two rotations. We have found that it gives approximately the same results as the Spherical Linear Interpolation algorithm. Simulation results of our algorithm are visualized on S2 using the Hopf fibration. In addition, to provide better insight, we have included a short video illustrating the rotation of an object between two positions.
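
For readers unfamiliar with the baseline named in the abstract above, the following is a minimal Python sketch of standard Spherical Linear Interpolation between two unit quaternions. It is not the Kuramoto-based method of the paper. The function name `slerp`, the `(w, x, y, z)` component order, and the 0.9995 near-parallel threshold are assumptions chosen for illustration.

```python
import math

def slerp(q0, q1, t):
    """Standard Slerp between unit quaternions q0 and q1 at parameter t in [0, 1].

    Quaternions are tuples in (w, x, y, z) order (an assumed convention).
    """
    dot = sum(a * b for a, b in zip(q0, q1))
    # q and -q encode the same rotation; flip one to take the shorter arc.
    if dot < 0.0:
        q1 = tuple(-c for c in q1)
        dot = -dot
    dot = min(1.0, dot)  # guard acos against rounding error
    if dot > 0.9995:
        # Nearly parallel inputs: fall back to linear interpolation.
        out = tuple(a + t * (b - a) for a, b in zip(q0, q1))
    else:
        theta = math.acos(dot)  # angle between q0 and q1 on S3
        s = math.sin(theta)
        w0 = math.sin((1.0 - t) * theta) / s
        w1 = math.sin(t * theta) / s
        out = tuple(w0 * a + w1 * b for a, b in zip(q0, q1))
    # Renormalize so the result stays on the unit sphere.
    norm = math.sqrt(sum(c * c for c in out))
    return tuple(c / norm for c in out)

# Halfway between the identity and a 90-degree rotation about the z axis.
identity = (1.0, 0.0, 0.0, 0.0)
quarter_turn_z = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
halfway = slerp(identity, quarter_turn_z, 0.5)
```

In this example `halfway` should correspond to a 45-degree rotation about the z axis, since Slerp moves at constant angular speed along the great arc.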

[173]  arXiv:2111.12550 [pdf, other]
Title: A Worker-Task Specialization Model for Crowdsourcing: Efficient Inference and Fundamental Limits
Subjects: Human-Computer Interaction (cs.HC); Information Theory (cs.IT); Machine Learning (cs.LG); Machine Learning (stat.ML)

Crowdsourcing systems have emerged as an effective platform for labeling data at relatively low cost by using non-expert workers. However, inferring correct labels from multiple noisy answers has been a challenging problem, since the quality of answers varies widely across tasks and workers. Many previous works have assumed a simple model where the order of workers in terms of their reliabilities is fixed across tasks, and focused on estimating the worker reliabilities to aggregate answers with different weights. We propose a highly general $d$-type worker-task specialization model in which the reliability of each worker can change depending on the type of a given task, where the number $d$ of types can scale in the number of tasks. In this model, we characterize the optimal sample complexity to correctly infer labels with any given recovery accuracy, and propose an inference algorithm achieving the order-wise optimal bound. We conduct experiments on both synthetic and real-world datasets, and show that our algorithm outperforms the existing algorithms developed based on strict model assumptions.

[174]  arXiv:2111.12553 [pdf, ps, other]
Title: CycleQ: An Efficient Basis for Cyclic Equational Reasoning
Subjects: Programming Languages (cs.PL)

We propose a new cyclic proof system for automated, equational reasoning about the behaviour of pure functional programs. The key to the system is the way in which cyclic proof and equational reasoning are mediated by the use of contextual substitution as a cut rule. We show that our system, although simple, already subsumes several of the approaches to implicit induction variously known as "inductionless induction", "rewriting induction", and "proof by consistency". By restricting the form of the traces, we show that global correctness in our system can be verified incrementally, taking advantage of the well-known size-change principle, which leads to an efficient implementation of proof search. Our CycleQ tool, accessible as a GHC plugin, shows promising results on a number of standard benchmarks.

[175]  arXiv:2111.12555 [pdf, other]
Title: Serpens: A High Bandwidth Memory Based Accelerator for General-Purpose Sparse Matrix-Vector Multiplication
Subjects: Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC)

Sparse matrix-vector multiplication (SpMV) multiplies a sparse matrix with a dense vector. SpMV plays a crucial role in many applications, from graph analytics to deep learning. The random memory accesses of the sparse matrix make accelerator design challenging. However, high bandwidth memory (HBM) based FPGAs are a good fit for designing accelerators for SpMV. In this paper, we present Serpens, an HBM based accelerator for general-purpose SpMV. Serpens features (1) a general-purpose design, (2) memory-centric processing engines, and (3) index coalescing to support the efficient processing of arbitrary SpMVs. From the evaluation of twelve large-size matrices, Serpens is 1.91x and 1.76x better in terms of geomean throughput than the latest accelerators GraphLily and Sextans, respectively. We also evaluate 2,519 SuiteSparse matrices, and Serpens achieves 2.10x higher throughput than a K80 GPU. For energy efficiency, Serpens is 1.71x, 1.90x, and 42.7x better compared with GraphLily, Sextans, and K80, respectively. After scaling up to 24 HBM channels, Serpens achieves up to 30,204 MTEPS and up to 3.79x over GraphLily.
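
As a reference for the operation itself, the following is a minimal Python sketch of SpMV over a matrix stored in compressed sparse row (CSR) form. It only illustrates what SpMV computes and says nothing about the Serpens hardware design; the CSR layout and the names `spmv_csr`, `indptr`, `indices`, and `data` are assumptions chosen for illustration.

```python
from typing import List, Sequence

def spmv_csr(indptr: Sequence[int], indices: Sequence[int],
             data: Sequence[float], x: Sequence[float]) -> List[float]:
    """Compute y = A @ x, with A given in CSR form.

    indptr[r]:indptr[r + 1] delimits the nonzeros of row r;
    indices holds their column ids and data their values.
    """
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            # x is read at a data-dependent position: this is the
            # random access pattern mentioned in the abstract.
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y

# A = [[2, 0, 1],
#      [0, 3, 0],
#      [4, 0, 5]]
indptr = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data = [2.0, 1.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]
y = spmv_csr(indptr, indices, data, x)  # [5.0, 6.0, 19.0]
```

The inner loop touches `x` at indices dictated by the sparsity pattern, which is why memory bandwidth and access irregularity, rather than arithmetic, usually limit SpMV throughput.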

[176]  arXiv:2111.12557 [pdf, ps, other]
Title: Optimization-free Ground Contact Force Constraint Satisfaction in Quadrupedal Locomotion
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

We seek control design paradigms for legged systems that bypass the costly algorithms widely used in these systems, which depend on heavy on-board computers, while matching their capabilities with less expensive optimization-free frameworks. In this work, we present our preliminary results in the modeling and control design of a quadrupedal robot called Husky Carbon, which is under development at Northeastern University (NU) in Boston. In our approach, we utilize a supervisory controller and an Explicit Reference Governor (ERG) to enforce ground reaction force constraints. These constraints are usually enforced using costly optimizations. However, in this work, the ERG manipulates the state references applied to the supervisory controller to enforce the ground contact constraints through an updated law based on Lyapunov stability arguments. As a result, the approach is much faster to compute than the widely used optimization-based methods.

[177]  arXiv:2111.12560 [pdf, other]
Title: Building Object-based Causal Programs for Human-like Generalization
Comments: To appear in NeurIPS workshop WHY-21 - Causal Inference & Machine Learning: Why now?
Subjects: Artificial Intelligence (cs.AI); Other Computer Science (cs.OH)

We present a novel task that measures how people generalize objects' causal powers based on observing a single (Experiment 1) or a few (Experiment 2) causal interactions between object pairs. We propose a computational modeling framework that can synthesize human-like generalization patterns in our task setting, and sheds light on how people may navigate the compositional space of possible causal functions and categories efficiently. Our modeling framework combines a causal function generator that makes use of agent and recipient objects' features and relations, and a Bayesian non-parametric inference process to govern the degree of similarity-based generalization. Our model has a natural "resource-rational" variant that outperforms a naive Bayesian account in describing participants, in particular reproducing a generalization-order effect and causal asymmetry observed in our behavioral experiments. We argue that this modeling framework provides a computationally plausible mechanism for real world causal generalization.

[178]  arXiv:2111.12577 [pdf, other]
Title: A Method for Evaluating the Capacity of Generative Adversarial Networks to Reproduce High-order Spatial Context
Comments: Submitted to IEEE-TPAMI. Early version with partial results has been accepted for poster presentation at SPIE-MI 2022
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Machine Learning (stat.ML)

Generative adversarial networks (GANs) are a kind of deep generative model with the potential to revolutionize biomedical imaging. This is because GANs have a learned capacity to draw whole-image variates from a lower-dimensional representation of an unknown, high-dimensional distribution that fully describes the input training images. The overarching problem with GANs in clinical applications is that there is no adequate or automatic means of assessing the diagnostic quality of images generated by GANs. In this work, we demonstrate several tests of the statistical accuracy of images output by two popular GAN architectures. We designed several stochastic object models (SOMs) of distinct features that can be recovered after generation by a trained GAN. Several of these features are high-order, algorithmic pixel-arrangement rules which are not readily expressed in covariance matrices. We designed and validated statistical classifiers to detect the known arrangement rules. We then tested the rates at which the different GANs correctly reproduced the rules under a variety of training scenarios and degrees of feature-class similarity. We found that ensembles of generated images can appear accurate visually, and correspond to low Frechet Inception Distance (FID) scores, while not exhibiting the known spatial arrangements. Furthermore, GANs trained on a spectrum of distinct spatial orders did not respect the given prevalence of those orders in the training data. The main conclusion is that while low-order ensemble statistics are largely correct, there are numerous quantifiable errors per image that plausibly can affect subsequent use of the GAN-generated images.

[179]  arXiv:2111.12579 [pdf]
Title: Water Care: Water Surface Cleaning Bot and Water Body Surveillance System
Comments: This paper was presented in RIACT 2021, an international conference, and was selected for publication in springer special issue 2021
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

Whenever a person hears about pollution, more often than not, the first thought that comes to their mind is air pollution. One of the most under-mentioned and under-discussed forms of pollution globally is that caused by non-biodegradable waste in our water bodies. In the case of India, there is a lot of plastic waste on the surface of rivers and lakes. The Ganga river is one of the 10 rivers which account for 90 percent of the plastic that ends up in the sea, and there are major cases of local nalaas and lakes being contaminated by this waste. This limits the sources of clean water, which leads to major depletion of water sources. From 2001 to 2012, in the city of Hyderabad, 3245 hectares of lakes dissipated. The water recedes by nine feet a year on average in southern New Delhi. Thus, cleaning these local water bodies and rivers is of utmost importance. Our aim is to develop a water surface cleaning bot that is deployed across the shore. The bot will detect garbage patches on its way and collect the garbage, thus making the water bodies clean. This solution employs a surveillance mechanism to alert the authorities in case anyone is found polluting the water bodies. The system is powered by solar energy to make it more sustainable. Computer vision algorithms are used for detecting trash on the surface of the water. This trash is collected by the bot and is disposed of at a designated location. In addition to cleaning the water bodies, preventive measures have also been implemented with the help of a virtual fencing algorithm that alerts the authorities if anyone tries to pollute the water premises. A web application and a mobile app are deployed to keep a check on the movement of the bot and shore surveillance, respectively. This complete solution involves both the preventive and curative measures that are required for water care.

[180]  arXiv:2111.12580 [pdf, other]
Title: UDA-COPE: Unsupervised Domain Adaptation for Category-level Object Pose Estimation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Learning to estimate object pose often requires ground-truth (GT) labels, such as CAD models and absolute-scale object poses, which are expensive and laborious to obtain in the real world. To tackle this problem, we propose an unsupervised domain adaptation (UDA) method for category-level object pose estimation, called UDA-COPE. Inspired by recent multi-modal UDA techniques, the proposed method exploits a teacher-student self-supervised learning scheme to train a pose estimation network without using target domain labels. We also introduce a bidirectional filtering method between the predicted normalized object coordinate space (NOCS) map and the observed point cloud, not only to make our teacher network more robust to the target domain but also to provide more reliable pseudo labels for the student network training. Extensive experimental results demonstrate the effectiveness of our proposed method both quantitatively and qualitatively. Notably, without leveraging target-domain GT labels, our proposed method achieves comparable or sometimes superior performance to existing methods that depend on the GT labels.

[181]  arXiv:2111.12581 [pdf, other]
Title: Medium Access Control protocol for Collaborative Spectrum Learning in Wireless Networks
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)

In recent years there has been a growing effort to provide learning algorithms for spectrum collaboration. In this paper we present a medium access control protocol which allows spectrum collaboration with minimal regret and high spectral efficiency in highly loaded networks. We present a fully-distributed algorithm for spectrum collaboration in congested ad-hoc networks. The algorithm jointly solves both the channel allocation and access scheduling problems. We prove that the algorithm has an optimal logarithmic regret. Based on the algorithm we provide a medium access control protocol which allows distributed implementation of the algorithm in ad-hoc networks. The protocol utilizes single-channel opportunistic carrier sensing to carry out a low-complexity distributed auction in time and frequency. We also discuss practical implementation issues such as bounded frame size and speed of convergence. Computer simulations comparing the algorithm to state-of-the-art distributed medium access control protocols show the significant advantage of the proposed scheme.

[182]  arXiv:2111.12583 [pdf, other]
Title: Optimizing Latent Space Directions For GAN-based Local Image Editing
Comments: 4 pages, 5 figures, 1 table
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Generative Adversarial Network (GAN) based localized image editing can suffer from ambiguity between semantic attributes. We thus present a novel objective function to evaluate the locality of an image edit. By introducing supervision from a pre-trained segmentation network and optimizing the objective function, our framework, called Locally Effective Latent Space Direction (LELSD), is applicable to any dataset and GAN architecture. Our method is also computationally fast and exhibits a high extent of disentanglement, which allows users to interactively perform a sequence of edits on an image. Our experiments on both GAN-generated and real images qualitatively demonstrate the high quality and advantages of our method.

[183]  arXiv:2111.12588 [pdf, other]
Title: Towards Cross-Cultural Analysis using Music Information Dynamics
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

A music piece is comprehended both hierarchically, from sonic events to melodies, and sequentially, in the form of repetition and variation. Music from different cultures establishes different aesthetics by having different style conventions on these two aspects. We propose a framework that could be used to quantitatively compare music from different cultures by looking at these two aspects.
The framework is based on a Music Information Dynamics model, a Variable Markov Oracle (VMO), and is extended with a variational representation learning of audio. A variational autoencoder (VAE) is trained to map audio fragments into a latent representation. The latent representation is fed into a VMO. The VMO then learns a clustering of the latent representation via a threshold that maximizes the information rate of the quantized latent representation sequence. This threshold effectively controls the sensibility of the predictive step to acoustic changes, which determines the framework's ability to track repetitions on longer time scales. This approach allows characterization of the overall information contents of a musical signal at each level of acoustic sensibility.
Our findings under this framework show that sensibility to subtle acoustic changes is higher for East-Asian musical traditions, while the Western works exhibit longer motivic structures at higher thresholds of differences in the latent space. This suggests that a profile of information contents, analyzed as a function of the level of acoustic detail, can serve as a possible cultural characteristic.

[184]  arXiv:2111.12591 [pdf, other]
Title: Lepard: Learning partial point cloud matching in rigid and deformable scenes
Comments: Code and data: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present Lepard, a Learning based approach for partial point cloud matching in rigid and deformable scenes. The key characteristics of Lepard are the following approaches that exploit 3D positional knowledge for point cloud matching: 1) An architecture that disentangles point cloud representation into feature space and 3D position space. 2) A position encoding method that explicitly reveals 3D relative distance information through the dot product of vectors. 3) A repositioning technique that modifies the cross-point-cloud relative positions. Ablation studies demonstrate the effectiveness of the above techniques. For rigid point cloud matching, Lepard sets a new state-of-the-art on the 3DMatch / 3DLoMatch benchmarks with 93.6% / 69.0% registration recall. In deformable cases, Lepard achieves +27.1% / +34.8% higher non-rigid feature matching recall than the prior art on our newly constructed 4DMatch / 4DLoMatch benchmark.

[185]  arXiv:2111.12594 [pdf, other]
Title: Conditional Object-Centric Learning from Video
Comments: Project page at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)

Object-centric representations are a promising path toward more systematic generalization by providing flexible abstractions upon which compositional world models can be built. Recent work on simple 2D and 3D datasets has shown that models with object-centric inductive biases can learn to segment and represent meaningful objects from the statistical structure of the data alone without the need for any supervision. However, such fully-unsupervised methods still fail to scale to diverse realistic data, despite the use of increasingly complex inductive biases such as priors for the size of objects or the 3D geometry of the scene. In this paper, we instead take a weakly-supervised approach and focus on how 1) using the temporal dynamics of video data in the form of optical flow and 2) conditioning the model on simple object location cues can be used to enable segmenting and tracking objects in significantly more realistic synthetic data. We introduce a sequential extension to Slot Attention which we train to predict optical flow for realistic looking synthetic scenes and show that conditioning the initial state of this model on a small set of hints, such as center of mass of objects in the first frame, is sufficient to significantly improve instance segmentation. These benefits generalize beyond the training distribution to novel objects, novel backgrounds, and to longer video sequences. We also find that such initial-state-conditioning can be used during inference as a flexible interface to query the model for specific objects or parts of objects, which could pave the way for a range of weakly-supervised approaches and allow more effective interaction with trained models.

[186]  arXiv:2111.12600 [pdf, other]
Title: Learning State Representations via Retracing in Reinforcement Learning
Subjects: Machine Learning (cs.LG)

We propose learning via retracing, a novel self-supervised approach for learning the state representation (and the associated dynamics model) for reinforcement learning tasks. In addition to the predictive (reconstruction) supervision in the forward direction, we propose to include "retraced" transitions for representation/model learning, by enforcing the cycle-consistency constraint between the original and retraced states, hence improving the sample efficiency of learning. Moreover, learning via retracing explicitly propagates information about future transitions backward for inferring previous states, thus facilitating stronger representation learning. We introduce Cycle-Consistency World Model (CCWM), a concrete instantiation of learning via retracing implemented under an existing model-based reinforcement learning framework. Additionally, we propose a novel adaptive "truncation" mechanism for counteracting the negative impacts brought by the "irreversible" transitions such that learning via retracing can be maximally effective. Through extensive empirical studies on continuous control benchmarks, we demonstrate that CCWM achieves state-of-the-art performance in terms of sample efficiency and asymptotic performance.

[187]  arXiv:2111.12602 [pdf, other]
Title: Hierarchical Graph-Convolutional Variational AutoEncoding for Generative Modelling of Human Motion
Comments: Under Review at CVPR
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Probability (math.PR)

Models of human motion commonly focus either on trajectory prediction or action classification but rarely both. The marked heterogeneity and intricate compositionality of human motion render each task vulnerable to the data degradation and distributional shift common to real-world scenarios. A sufficiently expressive generative model of action could in theory enable data conditioning and distributional resilience within a unified framework applicable to both tasks. Here we propose a novel architecture based on hierarchical variational autoencoders and deep graph convolutional neural networks for generating a holistic model of action over multiple time-scales. We show this Hierarchical Graph-convolutional Variational Autoencoder (HG-VAE) to be capable of generating coherent actions, detecting out-of-distribution data, and imputing missing data by gradient ascent on the model's posterior. Trained and evaluated on H3.6M and the largest collection of open source human motion data, AMASS, we show HG-VAE can facilitate downstream discriminative learning better than baseline models.

[188]  arXiv:2111.12606 [pdf, other]
Title: Deep metric learning improves lab of origin prediction of genetically engineered plasmids
Comments: 20 pages, 7 figures, 48 citations
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

Genome engineering is undergoing unprecedented development and is now becoming widely available. To ensure responsible biotechnology innovation and to reduce misuse of engineered DNA sequences, it is vital to develop tools to identify the lab-of-origin of engineered plasmids. Genetic engineering attribution (GEA), the ability to make sequence-lab associations, would support forensic experts in this process. Here, we propose a method, based on metric learning, that ranks the most likely labs-of-origin whilst simultaneously generating embeddings for plasmid sequences and labs. These embeddings can be used to perform various downstream tasks, such as clustering DNA sequences and labs, as well as using them as features in machine learning models. Our approach employs a circular shift augmentation approach and is able to correctly rank the lab-of-origin $90\%$ of the time within its top 10 predictions - outperforming all current state-of-the-art approaches. We also demonstrate that we can perform few-shot-learning and obtain $76\%$ top-10 accuracy using only $10\%$ of the sequences. This means, we outperform the previous CNN approach using only one-tenth of the data. We also demonstrate that we are able to extract key signatures in plasmid sequences for particular labs, allowing for an interpretable examination of the model's outputs.
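
Since plasmids are circular, any rotation of the sequence string describes the same molecule, which is what makes a circular shift a natural augmentation. The sketch below is a minimal illustration of that idea only; the function names, the toy sequence, and the choice of a uniformly random offset are assumptions, not details taken from the paper.

```python
import random

def circular_shift(seq, offset):
    """Rotate a circular sequence so that it starts at position `offset`."""
    if not seq:
        return seq
    k = offset % len(seq)
    return seq[k:] + seq[:k]

def augment(seq, rng):
    """Return a copy of `seq` rotated by a uniformly random offset (assumed policy)."""
    if not seq:
        return seq
    return circular_shift(seq, rng.randrange(len(seq)))

plasmid = "ATGCCGTA"          # toy sequence, not real data
rng = random.Random(0)        # fixed seed so the example is repeatable
views = [augment(plasmid, rng) for _ in range(3)]
```

Every element of `views` has the same length and base composition as `plasmid` and appears as a substring of `plasmid + plasmid`.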

[189]  arXiv:2111.12608 [pdf, other]
Title: Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing
Comments: code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Multi-task indoor scene understanding is widely considered as an intriguing formulation, as the affinity of different tasks may lead to improved performance. In this paper, we tackle the new problem of joint semantic, affordance and attribute parsing. However, successfully resolving it requires a model to capture long-range dependency, learn from weakly aligned data and properly balance sub-tasks during training. To this end, we propose an attention-based architecture named Cerberus and a tailored training framework. Our method effectively addresses the aforementioned challenges and achieves state-of-the-art performance on all three tasks. Moreover, an in-depth analysis shows concept affinity consistent with human cognition, which inspires us to explore the possibility of weakly supervised learning. Surprisingly, Cerberus achieves strong results using only 0.1%-1% annotation. Visualizations further confirm that this success is credited to common attention maps across tasks. Code and models can be accessed at https://github.com/OPEN-AIR-SUN/Cerberus.

[190]  arXiv:2111.12609 [pdf, other]
Title: GreedyNASv2: Greedier Search with a Greedy Path Filter
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Training a good supernet in one-shot NAS methods is difficult since the search space is usually considerably huge (e.g., $13^{21}$). In order to enhance the supernet's evaluation ability, one greedy strategy is to sample good paths, and let the supernet lean towards the good ones and ease its evaluation burden as a result. However, in practice the search can still be quite inefficient since the identification of good paths is not accurate enough and sampled paths still scatter around the whole search space. In this paper, we leverage an explicit path filter to capture the characteristics of paths and directly filter out the weak ones, so that the search can be carried out on the shrunk space more greedily and efficiently. Concretely, based on the fact that good paths are far fewer than weak ones in the space, we argue that the label of "weak paths" will be more confident and reliable than that of "good paths" in multi-path sampling. In this way, we cast the training of the path filter in the positive and unlabeled (PU) learning paradigm, and also encourage a path embedding as a better path/operation representation to enhance the identification capacity of the learned filter. By dint of this embedding, we can further shrink the search space by aggregating similar operations with similar embeddings, and the search can be more efficient and accurate. Extensive experiments validate the effectiveness of the proposed method GreedyNASv2. For example, our obtained GreedyNASv2-L achieves $81.1\%$ Top-1 accuracy on ImageNet dataset, significantly outperforming the ResNet-50 strong baselines.

[191]  arXiv:2111.12614 [pdf, other]
Title: PSSL: Self-supervised Learning for Personalized Search with Contrastive Sampling
Comments: 10 pages
Subjects: Information Retrieval (cs.IR)

Personalized search plays a crucial role in improving user search experience owing to its ability to build user profiles based on historical behaviors. Previous studies have made great progress in extracting personal signals from the query log and learning user representations. However, neural personalized search is extremely dependent on sufficient data to train the user model. Data sparsity is an inevitable challenge for existing methods to learn high-quality user representations. Moreover, the overemphasis on final ranking quality leads to rough data representations and impairs the generalizability of the model. To tackle these issues, we propose a Personalized Search framework with Self-supervised Learning (PSSL) to enhance data representations. Specifically, we adopt a contrastive sampling method to extract paired self-supervised information from sequences of user behaviors in query logs. Four auxiliary tasks are designed to pre-train the sentence encoder and the sequence encoder used in the ranking model. They are optimized by contrastive loss which aims to close the distance between similar user sequences, queries, and documents. Experimental results on two datasets demonstrate that our proposed model PSSL achieves state-of-the-art performance compared with existing baselines.

[192]  arXiv:2111.12618 [pdf, other]
Title: Group based Personalized Search by Integrating Search Behaviour and Friend Network
Comments: 10 pages
Subjects: Information Retrieval (cs.IR)

The key to personalized search is to build the user profile based on historical behaviour. To deal with users who lack historical data, group based personalized models were proposed to incorporate the profiles of similar users when re-ranking the results. However, similar users are mostly found based on simple lexical or topical similarity in search behaviours. In this paper, we propose a neural network enhanced method to highlight similar users in semantic space. Furthermore, we argue that behaviour-based similar users are still insufficient to understand a new query when a user's historical activities are limited. To tackle this issue, we introduce the friend network into personalized search to determine the closeness between users in another way. Since friendship is often formed based on similar background or interest, there are plenty of personalized signals naturally hidden in the friend network. Specifically, we propose a friend network enhanced personalized search model, which groups the user into multiple friend circles based on search behaviours and friend relations respectively. These two types of friend circles are complementary and together construct a more comprehensive group profile for refining the personalization. Experimental results show the significant improvement of our model over existing personalized search models.

[193]  arXiv:2111.12620 [pdf, other]
Title: Convergence of the harmonic balance method for smooth Hilbert space valued differential-algebraic equations
Subjects: Numerical Analysis (math.NA)

We analyze the convergence of the harmonic balance method for computing isolated periodic solutions of a large class of continuously differentiable Hilbert space valued differential-algebraic equations (DAEs). We establish asymptotic convergence estimates for (i) the approximate periodic solution in terms of the number of approximated harmonics and (ii) the inexact Newton method used to compute the approximate Fourier coefficients. The convergence estimates are determined by the rate of convergence of the Fourier series of the exact solution and the structure of the DAE. Both the case that the period is known and unknown are analyzed, where in the latter case we require enforcing an appropriately defined phase condition. The theoretical results are illustrated with several numerical experiments from circuit modeling and structural dynamics.

[194]  arXiv:2111.12621 [pdf, other]
Title: Accelerating Deep Learning with Dynamic Data Pruning
Comments: 11 pages, 13 figures, under review
Subjects: Machine Learning (cs.LG)

Deep learning's success has been attributed to the training of large, overparameterized models on massive amounts of data. As this trend continues, model training has become prohibitively costly, requiring access to powerful computing systems to train state-of-the-art networks. A large body of research has been devoted to addressing the cost per iteration of training through various model compression techniques like pruning and quantization. Less effort has been spent targeting the number of iterations. Previous work, such as forget scores and GraNd/EL2N scores, addresses this problem by identifying important samples within a full dataset and pruning the remaining samples, thereby reducing the iterations per epoch. Though these methods decrease the training time, they use expensive static scoring algorithms prior to training. When accounting for the scoring mechanism, the total run time is often increased. In this work, we address this shortcoming with dynamic data pruning algorithms. Surprisingly, we find that uniform random dynamic pruning can outperform the prior work at aggressive pruning rates. We attribute this to the existence of "sometimes" samples -- points that are important to the learned decision boundary only some of the training time. To better exploit the subtlety of sometimes samples, we propose two algorithms, based on reinforcement learning techniques, to dynamically prune samples and achieve even higher accuracy than the random dynamic method. We test all our methods against a full-dataset baseline and the prior work on CIFAR-10 and CIFAR-100, and we can reduce the training time by up to 2x without significant performance loss. Our results suggest that data pruning should be understood as a dynamic process that is closely tied to a model's training trajectory, instead of a static step based solely on the dataset.
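
The uniform random dynamic baseline mentioned above can be sketched as follows, under the assumption that the kept subset is redrawn once per epoch (the abstract does not state the resampling interval). The function names, the pruning rate, and the dataset size are illustrative only; the reinforcement-learning-based variants are not shown.

```python
import random

def random_dynamic_subset(num_samples, prune_rate, rng):
    """Choose, uniformly at random, the sample indices kept for one epoch."""
    keep = max(1, round(num_samples * (1.0 - prune_rate)))
    return rng.sample(range(num_samples), keep)

def epochs_of_indices(num_samples, prune_rate, num_epochs, seed=0):
    """Draw a fresh subset for every epoch (assumed resampling interval)."""
    rng = random.Random(seed)
    return [random_dynamic_subset(num_samples, prune_rate, rng)
            for _ in range(num_epochs)]

# Hypothetical setting: 1000 samples, 70% pruned away in each epoch.
schedule = epochs_of_indices(num_samples=1000, prune_rate=0.7, num_epochs=5)
```

Each entry of `schedule` lists the indices a training loop would visit in that epoch. Because the subset changes between epochs, a sample skipped in one epoch can still be seen later, unlike with a static pruning step done once before training.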

[195]  arXiv:2111.12624 [pdf, other]
Title: Self-slimmed Vision Transformer
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Vision transformers (ViTs) have become popular architectures and have outperformed convolutional neural networks (CNNs) on various vision tasks. However, such powerful transformers bring a huge computation burden. The essential barrier behind this is the exhausting token-to-token comparison. To alleviate this, we delve deeply into the model properties of ViT and observe that ViTs exhibit sparse attention with high token similarity. This intuitively suggests a feasible structure-agnostic dimension, the number of tokens, along which to reduce the computational cost. Based on this exploration, we propose a generic self-slimmed learning approach for vanilla ViTs, namely SiT. Specifically, we first design a novel Token Slimming Module (TSM), which can boost the inference efficiency of ViTs by dynamic token aggregation. Unlike hard token dropping, our TSM softly integrates redundant tokens into fewer informative ones, which can dynamically zoom visual attention without cutting off discriminative token relations in the images. Furthermore, we introduce a concise Dense Knowledge Distillation (DKD) framework, which densely transfers unorganized token information in a flexible auto-encoder manner. Due to the similar structure between teacher and student, our framework can effectively leverage structure knowledge for better convergence. Finally, we conduct extensive experiments to evaluate our SiT. They demonstrate that our method can speed up ViTs by 1.7x with negligible accuracy drop, and even speed up ViTs by 3.6x while maintaining 97% of their performance. Surprisingly, by simply arming LV-ViT with our SiT, we achieve new state-of-the-art performance on ImageNet, surpassing all the CNNs and ViTs in the recent literature.

[196]  arXiv:2111.12628 [pdf, other]
Title: Efficient Decompositional Rule Extraction for Deep Neural Networks
Comments: Accepted at NeurIPS 2021 Workshop on eXplainable AI approaches for debugging and diagnosis (XAI4Debugging)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

In recent years, there has been significant work on increasing both interpretability and debuggability of a Deep Neural Network (DNN) by extracting a rule-based model that approximates its decision boundary. Nevertheless, current DNN rule extraction methods that consider a DNN's latent space when extracting rules, known as decompositional algorithms, are either restricted to single-layer DNNs or intractable as the size of the DNN or data grows. In this paper, we address these limitations by introducing ECLAIRE, a novel polynomial-time rule extraction algorithm capable of scaling to both large DNN architectures and large training datasets. We evaluate ECLAIRE on a wide variety of tasks, ranging from breast cancer prognosis to particle detection, and show that it consistently extracts more accurate and comprehensible rule sets than the current state-of-the-art methods while using orders of magnitude less computational resources. We make all of our methods available, including a rule set visualisation interface, through the open-source REMIX library (https://github.com/mateoespinosa/remix).

[197]  arXiv:2111.12629 [pdf, other]
Title: WFDefProxy: Modularly Implementing and Empirically Evaluating Website Fingerprinting Defenses
Subjects: Cryptography and Security (cs.CR)

Tor, an onion-routing anonymity network, has been shown to be vulnerable to Website Fingerprinting (WF), which de-anonymizes web browsing by analyzing the unique characteristics of the encrypted network traffic. Although many defenses have been proposed, few have been implemented and tested in the real world; others were only simulated. Due to its synthetic nature, simulation may fail to capture the real performance of these defenses. To figure out how these defenses perform in the real world, we propose WFDefProxy, a general platform for WF defense implementation on Tor using pluggable transports. We create the first full implementation of three WF defenses: FRONT, Tamaraw and Random-WT. We evaluate each defense in both simulation and implementation to compare their results, and we find that simulation correctly captures the strength of each defense against attacks. In addition, we confirm that Random-WT is not effective in both simulation and implementation, reducing the strongest attacker's accuracy by only 7%.
We also found a minor difference in overhead between simulation and implementation. We analyze how this may be due to assumptions made in simulation regarding packet delays and queuing, or the soft stop condition we implemented in WFDefProxy to detect the end of a page load. The implementation of FRONT cost about 23% more data overhead than simulation, while the implementation of Tamaraw cost about 28% - 45% less data overhead. In addition, the implementation of Tamaraw incurred only 21% time overhead, compared to 51% - 242% estimated by simulation in previous work.

[198]  arXiv:2111.12631 [pdf, other]
Title: EAD: an ensemble approach to detect adversarial examples from the hidden features of deep neural networks
Subjects: Computer Vision and Pattern Recognition (cs.CV)

One of the key challenges in Deep Learning is the definition of effective strategies for the detection of adversarial examples. To this end, we propose a novel approach named Ensemble Adversarial Detector (EAD) for the identification of adversarial examples, in a standard multiclass classification scenario. EAD combines multiple detectors that exploit distinct properties of the input instances in the internal representation of a pre-trained Deep Neural Network (DNN). Specifically, EAD integrates the state-of-the-art detectors based on Mahalanobis distance and on Local Intrinsic Dimensionality (LID) with a newly introduced method based on One-class Support Vector Machines (OSVMs). Although all constituting methods assume that the greater the distance of a test instance from the set of correctly classified training instances, the higher its probability to be an adversarial example, they differ in the way such distance is computed. In order to exploit the effectiveness of the different methods in capturing distinct properties of data distributions and, accordingly, efficiently tackle the trade-off between generalization and overfitting, EAD employs detector-specific distance scores as features of a logistic regression classifier, after independent hyperparameters optimization. We evaluated the EAD approach on distinct datasets (CIFAR-10, CIFAR-100 and SVHN) and models (ResNet and DenseNet) and with regard to four adversarial attacks (FGSM, BIM, DeepFool and CW), also by comparing with competing approaches. Overall, we show that EAD achieves the best AUROC and AUPR in the large majority of the settings and comparable performance in the others. The improvement over the state-of-the-art, and the possibility to easily extend EAD to include any arbitrary set of detectors, pave the way to a widespread adoption of ensemble approaches in the broad field of adversarial example detection.

[199]  arXiv:2111.12638 [pdf, other]
Title: Optimal Robust Exact Differentiation via Linear Adaptive Techniques
Subjects: Systems and Control (eess.SY)

The problem of differentiating a function with bounded second derivative in the presence of bounded measurement noise is considered. Performance limitations in terms of the smallest achievable worst-case differentiation error of causal and exact differentiators are shown. A robust exact differentiator is then constructed via the adaptation of a single parameter of a linear differentiator. It is demonstrated that the resulting differentiator is robust with respect to noise, that it instantaneously converges to the exact derivative in the absence of noise, and that it attains the smallest possible -- hence optimal -- upper bound on its differentiation error under noisy measurements. For practical realization in the presence of sampled measurements, a discrete-time realization is shown that achieves optimal asymptotic accuracy with respect to the noise and the sampling period.

[200]  arXiv:2111.12642 [pdf, other]
Title: A global quadratic speed-up for computing the principal eigenvalue of Perron-like operators
Authors: Dong Li, Jianan Li
Comments: 18 pages
Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)

We consider a new algorithm in light of the min-max Collatz-Wielandt formalism to compute the principal eigenvalue and the eigenvector (eigen-function) for a class of positive Perron-Frobenius-like operators. Such operators are natural generalizations of the usual nonnegative primitive matrices. These have nontrivial applications in PDE problems such as computing the principal eigenvalue of Dirichlet Laplacian operators on general domains. We rigorously prove that for general initial data the corresponding numerical iterates converge globally to the unique principal eigenvalue with quadratic convergence. We show that the quadratic convergence is sharp with compatible upper and lower bounds. We demonstrate the effectiveness of the scheme via several illustrative numerical examples.
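
For orientation, the min-max Collatz-Wielandt bounds can be illustrated on a small positive matrix: for any strictly positive vector x, the smallest and largest ratios (Ax)_i / x_i bracket the principal eigenvalue. The sketch below pairs these bounds with ordinary power iteration, which converges only linearly; it is not the quadratically convergent scheme of the paper, and the matrix and function names are assumptions chosen for the example.

```python
def matvec(A, x):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def collatz_wielandt_bounds(A, x):
    """For positive x, min and max of (Ax)_i / x_i bracket the principal eigenvalue."""
    ratios = [yi / xi for yi, xi in zip(matvec(A, x), x)]
    return min(ratios), max(ratios)

def power_iteration(A, iters=50):
    """Plain power iteration from the all-ones vector, normalised to sum 1."""
    x = [1.0] * len(A)
    for _ in range(iters):
        y = matvec(A, x)
        s = sum(y)
        x = [yi / s for yi in y]
    return collatz_wielandt_bounds(A, x), x

# Toy positive matrix; its principal eigenvalue is (5 + sqrt(5)) / 2.
A = [[2.0, 1.0], [1.0, 3.0]]
initial_bounds = collatz_wielandt_bounds(A, [1.0, 1.0])
(lower, upper), vec = power_iteration(A)
```

With the all-ones vector the bounds are simply the smallest and largest row sums, and they tighten around the principal eigenvalue as the iterate approaches the positive eigenvector.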

[201]  arXiv:2111.12643 [pdf, other]
Title: SM3D: Simultaneous Monocular Mapping and 3D Detection
Comments: This paper is published in the 2021 IEEE International Conference on Image Processing (ICIP 2021), this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Mapping and 3D detection are two major issues in vision-based robotics and self-driving. While previous works only focus on each task separately, we present an innovative and efficient multi-task deep learning framework (SM3D) for Simultaneous Mapping and 3D Detection by bridging the gap with robust depth estimation and "Pseudo-LiDAR" point cloud for the first time. The Mapping module takes consecutive monocular frames to generate depth and pose estimation. In the 3D Detection module, the depth estimation is projected into 3D space to generate a "Pseudo-LiDAR" point cloud, to which a LiDAR-based 3D detector can be applied for vehicular 3D detection and localization. By end-to-end training of both modules, the proposed mapping and 3D detection method outperforms the state-of-the-art baseline by 10.0% and 13.2% in accuracy, respectively. While achieving better accuracy, our monocular multi-task SM3D is more than 2 times faster than a pure stereo 3D detector, and 18.3% faster than using two modules separately.

[202]  arXiv:2111.12661 [pdf, ps, other]
Title: Analysing Statistical methods for Automatic Detection of Image Forgery
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Image manipulation and forgery detection have been a topic of research for more than a decade now. New-age tools and large-scale social platforms have given space for manipulated media to thrive. These media can be potentially dangerous and thus innumerable methods have been designed and tested to prove their robustness in detecting forgery. However, the results reported by state-of-the-art systems indicate that supervised approaches achieve almost perfect performance but only with particular datasets. In this work, we analyze the issue of out-of-distribution generalisability of the current state-of-the-art image forgery detection techniques through several experiments. Our study focuses on models that utilise handcrafted features for image forgery detection. We show that the developed methods fail to perform well on cross-dataset evaluations and in-the-wild manipulated media. As a consequence, a question is raised about the current evaluation and overestimated performance of the systems under consideration. Note: This work was done during a summer research internship at ITMR Lab, IIIT-Allahabad under the supervision of Prof. Anupam Agarwal.

[203]  arXiv:2111.12663 [pdf, ps, other]
Title: PointPCA: Point Cloud Objective Quality Assessment Using PCA-Based Descriptors
Comments: 14 pages, 9 figures, 6 tables
Subjects: Multimedia (cs.MM)

With the increasing popularity of extended reality technology and the adoption of depth-enhanced visual data in information exchange and telecommunication systems, point clouds have emerged as a promising 3D imaging modality. Similarly to other types of content representations, visual quality predictors for point cloud data are vital for a wide range of applications, enabling perceptually optimized solutions from acquisition to rendering. Recent standardization activities on point cloud compression have urged the need for objective quality evaluation methods, driving the research community to the development of relevant algorithms. In this work, we complement existing approaches by proposing a new quality metric that compares local shape and appearance measurements between a reference and a distorted point cloud. To this aim, a large set of geometric and textural descriptors is defined, and the prediction accuracy of corresponding statistical features is evaluated in the context of quality assessment. Different combination strategies are examined, providing insights regarding the effectiveness of different metric designs. The performance of the proposed method is validated against subjectively-annotated datasets, showing better performance against state-of-the-art solutions in the majority of cases. A software implementation of the metric is made available here: https://github.com/cwi-dis/pointpca.

[204]  arXiv:2111.12664 [pdf, other]
Title: MIO : Mutual Information Optimization using Self-Supervised Binary Contrastive Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Self-supervised contrastive learning is one of the domains which has progressed rapidly over the last few years. Most of the state-of-the-art self-supervised algorithms use a large number of negative samples, momentum updates, specific architectural modifications, or extensive training to learn good representations. Such arrangements make the overall training process complex and challenging to realize analytically. In this paper, we propose a mutual information optimization based loss function for contrastive learning where we cast contrastive learning as a binary classification problem to predict if a pair is positive or not. This formulation not only helps us to track the problem mathematically but also helps us to outperform existing algorithms. Unlike the existing methods that only maximize the mutual information in a positive pair, the proposed loss function optimizes the mutual information in both positive and negative pairs. We also present a mathematical expression for the parameter gradients flowing into the projector and the displacement of the feature vectors in the feature space. This helps us to get a mathematical insight into the working principle of contrastive learning. An additive $L_2$ regularizer is also used to prevent the feature vectors from diverging and to improve performance. The proposed method outperforms the state-of-the-art algorithms on benchmark datasets like STL-10, CIFAR-10, CIFAR-100. After only 250 epochs of pre-training, the proposed model achieves the best accuracy of 85.44%, 60.75%, 56.81% on CIFAR-10, STL-10, CIFAR-100 datasets, respectively.

[205]  arXiv:2111.12665 [pdf, ps, other]
Title: Finite-Time Error Bounds for Distributed Linear Stochastic Approximation
Subjects: Machine Learning (cs.LG)

This paper considers a novel multi-agent linear stochastic approximation algorithm driven by Markovian noise and general consensus-type interaction, in which each agent evolves according to its local stochastic approximation process which depends on the information from its neighbors. The interconnection structure among the agents is described by a time-varying directed graph. While the convergence of consensus-based stochastic approximation algorithms when the interconnection among the agents is described by doubly stochastic matrices (at least in expectation) has been studied, less is known about the case when the interconnection matrix is simply stochastic. For any uniformly strongly connected graph sequences whose associated interaction matrices are stochastic, the paper derives finite-time bounds on the mean-square error, defined as the deviation of the output of the algorithm from the unique equilibrium point of the associated ordinary differential equation. For the case of interconnection matrices being stochastic, the equilibrium point can be any unspecified convex combination of the local equilibria of all the agents in the absence of communication. Both the cases with constant and time-varying step-sizes are considered. In the case when the convex combination is required to be a straight average and interaction between any pair of neighboring agents may be uni-directional, so that doubly stochastic matrices cannot be implemented in a distributed manner, the paper proposes a push-sum-type distributed stochastic approximation algorithm and provides its finite-time bound for the time-varying step-size case by leveraging the analysis for the consensus-type algorithm with stochastic matrices and developing novel properties of the push-sum algorithm.

[206]  arXiv:2111.12673 [pdf, other]
Title: Adaptively Calibrated Critic Estimates for Deep Reinforcement Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)

Accurate value estimates are important for off-policy reinforcement learning. Algorithms based on temporal difference learning typically are prone to an over- or underestimation bias building up over time. In this paper, we propose a general method called Adaptively Calibrated Critics (ACC) that uses the most recent high variance but unbiased on-policy rollouts to alleviate the bias of the low variance temporal difference targets. We apply ACC to Truncated Quantile Critics, which is an algorithm for continuous control that allows regulation of the bias with a hyperparameter tuned per environment. The resulting algorithm adaptively adjusts the parameter during training rendering hyperparameter search unnecessary and sets a new state of the art on the OpenAI gym continuous control benchmark among all algorithms that do not tune hyperparameters for each environment. Additionally, we demonstrate that ACC is quite general by further applying it to TD3 and showing an improved performance also in this setting.

[207]  arXiv:2111.12675 [pdf, other]
Title: The Surprising Benefits of Hysteresis in Unlimited Sampling: Theory, Algorithms and Experiments
Comments: 24 pages
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

The Unlimited Sensing Framework (USF) was recently introduced to overcome the sensor saturation bottleneck in conventional digital acquisition systems. At its core, the USF allows for high-dynamic-range (HDR) signal reconstruction by converting a continuous-time signal into folded, low-dynamic-range (LDR), modulo samples. HDR reconstruction is then carried out by algorithmic unfolding of the folded samples. In hardware, however, implementing an ideal modulo folding requires careful calibration, analog design and high precision. At the interface of theory and practice, this paper explores a computational sampling strategy that relaxes strict hardware requirements by compensating them via a novel, mathematically guaranteed recovery method. Our starting point is a generalized model for USF. The generalization relies on two new parameters modeling hysteresis and folding transients in addition to the modulo threshold. Hysteresis accounts for the mismatch between the reset threshold and the amplitude displacement at the folding time and we refer to a continuous transition period in the implementation of a reset as folding transient. Both these effects are motivated by our hardware experiments and also occur in previous, domain-specific applications. We show that the effect of hysteresis is beneficial for the USF and we leverage it to derive the first recovery guarantees in the context of our generalized USF model. Additionally, we show how the proposed recovery can be directly generalized for the case of lower sampling rates. Our theoretical work is corroborated by hardware experiments that are based on a hysteresis-enabled, modulo ADC testbed comprising off-the-shelf electronic components. Thus, by capitalizing on a collaboration between hardware and algorithms, our paper enables an end-to-end pipeline for HDR sampling allowing more flexible hardware implementations.
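
As a rough illustration of the folding and unfolding idea described in this abstract, the sketch below applies an ideal centered modulo to a sampled signal and then recovers it from first-order differences. It covers only the idealized case: it does not model hysteresis or folding transients, and it is not the recovery method of the paper. The names `fold`, `unfold` and `lam` (the modulo threshold) are illustrative, and the recovery assumes that successive samples differ by less than `lam` and that the first sample is already within range.

```python
import math


def fold(x, lam):
    """Ideal centered modulo: map x into the range [-lam, lam)."""
    return (x + lam) % (2 * lam) - lam


def unfold(folded, lam):
    """Recover the original samples from folded ones.

    Assumes neighbouring original samples differ by less than lam and
    that the first original sample lies inside [-lam, lam).
    """
    out = [folded[0]]
    for prev, cur in zip(folded, folded[1:]):
        # The folded difference equals the true difference modulo 2*lam,
        # so folding it again removes the jump introduced by a reset.
        out.append(out[-1] + fold(cur - prev, lam))
    return out


lam = 1.0
# A sine of amplitude 3, which exceeds the [-1, 1) range of the folder.
signal = [3.0 * math.sin(2 * math.pi * n / 500) for n in range(500)]
folded = [fold(s, lam) for s in signal]
recovered = unfold(folded, lam)
max_err = max(abs(a - b) for a, b in zip(signal, recovered))
```

Here `max_err` stays at the level of floating-point rounding because the signal is heavily oversampled relative to `lam`; with coarser sampling the difference-based recovery in this sketch would fail.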

[208]  arXiv:2111.12677 [pdf, other]
Title: Topological and Algebraic Structures of the Space of Atanassov's Intuitionistic Fuzzy Values
Subjects: Artificial Intelligence (cs.AI)

We demonstrate that the space of intuitionistic fuzzy values (IFVs) with the linear order based on a score function and an accuracy function has the same algebraic structure as the one induced by the linear order based on a similarity function and an accuracy function. By introducing a new operator for IFVs via the linear order based on a score function and an accuracy function, we show that such an operator is a strong negation on IFVs. Moreover, we propose that the space of IFVs is a complete lattice and a Kleene algebra with the new operator. We also observe that the topological space of IFVs with the order topology induced by the above two linear orders is not separable and metrizable but compact and connected. From new perspectives, our results partially answer three open problems posed by Atanassov [Intuitionistic Fuzzy Sets: Theory and Applications, Springer, 1999] and [On Intuitionistic Fuzzy Sets Theory, Springer, 2012]. Furthermore, we construct an isomorphism between the spaces of IFVs and q-rung orthopair fuzzy values (q-ROFVs) under the corresponding linear orders. Meanwhile, we introduce the concept of admissible similarity measures with particular orders for IFSs, extending the previous definition of the similarity measure for IFSs, and construct an admissible similarity measure with the linear order based on a score function and an accuracy function, which is effectively applied to a pattern recognition problem about the classification of building materials.

[209]  arXiv:2111.12678 [pdf, ps, other]
Title: Output Regulation by Postprocessing Internal Models for a Class of Multivariable Nonlinear Systems
Comments: The published version contains a few small rendering issues in some formulae. Here, these are corrected
Journal-ref: International Journal of Robust and Nonlinear Control, vol. 30, pp. 1115-1140, 2020
Subjects: Systems and Control (eess.SY)

In this paper we propose a new design paradigm, which employs a postprocessing internal model unit, to approach the problem of output regulation for a class of multivariable minimum-phase nonlinear systems possessing a partial normal form. Contrary to previous approaches, the proposed regulator handles control inputs of dimension larger than the number of regulated variables, provided that a controllability assumption holds, and can employ additional measurements that need not vanish at the ideal error-zeroing steady state, but that can be useful for stabilization purposes or to fulfil the minimum-phase requirement. Conditions for practical and asymptotic output regulation are given, underlining how in postprocessing schemes the design of internal models is necessarily intertwined with that of the stabilizer.

[210]  arXiv:2111.12679 [pdf, other]
Title: Reinforcement Learning for General LTL Objectives Is Intractable
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

In recent years, researchers have made significant progress in devising reinforcement-learning algorithms for optimizing linear temporal logic (LTL) objectives and LTL-like objectives. Despite these advancements, there are fundamental limitations to how well this problem can be solved that previous studies have alluded to but, to our knowledge, have not examined in depth. In this paper, we address theoretically the hardness of learning with general LTL objectives. We formalize the problem under the probably approximately correct learning in Markov decision processes (PAC-MDP) framework, a standard framework for measuring sample complexity in reinforcement learning. In this formalization, we prove that the optimal policy for any LTL formula is PAC-MDP-learnable only if the formula is in the most limited class in the LTL hierarchy, consisting of only finite-horizon-decidable properties. Practically, our result implies that it is impossible for a reinforcement-learning algorithm to obtain a PAC-MDP guarantee on the performance of its learned policy after finitely many interactions with an unconstrained environment for non-finite-horizon-decidable LTL objectives.

[211]  arXiv:2111.12680 [pdf, other]
Title: An XGBoost-Based Forecasting Framework for Product Cannibalization
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Two major challenges in demand forecasting are product cannibalization and long-term forecasting. Product cannibalization is a phenomenon in which high demand for some products leads to a reduction in sales of other products. Long-term forecasting involves forecasting sales over a longer time frame, which is critical for strategic business purposes. Also, conventional methods, for instance recurrent neural networks, may be ineffective where the training data size is small, as is the case in this study. This work presents an XGBoost-based three-stage framework that addresses product cannibalization and the associated long-term error propagation problems. The performance of the proposed three-stage XGBoost-based framework is compared to, and found superior to, that of the regular XGBoost algorithm.

[212]  arXiv:2111.12681 [pdf, other]
Title: VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling
Comments: Code is available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

A great challenge in video-language (VidL) modeling lies in the disconnection between fixed video representations extracted from image/video understanding models and downstream VidL data. Recent studies try to mitigate this disconnection via end-to-end training. To make it computationally feasible, prior works tend to "imagify" video inputs, i.e., a handful of sparsely sampled frames are fed into a 2D CNN, followed by a simple mean-pooling or concatenation to obtain the overall video representations. Although achieving promising results, such simple approaches may lose temporal information that is essential for performing downstream VidL tasks. In this work, we present VIOLET, a fully end-to-end VIdeO-LanguagE Transformer, which adopts a video transformer to explicitly model the temporal dynamics of video inputs. Further, unlike previous studies that found pre-training tasks on video inputs (e.g., masked frame modeling) not very effective, we design a new pre-training task, Masked Visual-token Modeling (MVM), for better video modeling. Specifically, the original video frame patches are "tokenized" into discrete visual tokens, and the goal is to recover the original visual tokens based on the masked patches. Comprehensive analysis demonstrates the effectiveness of both explicit temporal modeling via video transformer and MVM. As a result, VIOLET achieves new state-of-the-art performance on 5 video question answering tasks and 4 text-to-video retrieval tasks.

[213]  arXiv:2111.12682 [pdf, other]
Title: A Formally-Verified Framework for Fair Synchronization in Kotlin Coroutines
Subjects: Programming Languages (cs.PL); Data Structures and Algorithms (cs.DS)

Writing concurrent code that is both correct and efficient is notoriously difficult: thus, programmers often prefer to use synchronization abstractions, which render code simpler and easier to reason about. Despite a wealth of work on this topic, there is still a gap between the rich semantics provided by synchronization abstractions in modern programming languages--specifically, fair FIFO ordering of synchronization requests and support for abortable operations--and frameworks for implementing such semantics correctly and efficiently. Supporting such semantics is critical given the rising popularity of constructs for asynchronous programming, such as coroutines, which abort frequently, and should be cheaper to suspend and resume compared to native threads.
We introduce a new framework called the CancellableQueueSynchronizer (CQS), which enables efficient fair and abortable implementations of fundamental synchronization primitives such as mutexes, semaphores, barriers, count-down-latches, and blocking pools. Our first contribution is algorithmic, as implementing both fairness and abortability efficiently at this level of generality is non-trivial. Importantly, all our algorithms come with formal proofs in the Iris framework for Coq. These proofs are modular, so it is easy to prove correctness for new primitives implemented on top of CQS. To validate practical impact, we integrated CQS into the Kotlin Coroutines library. Compared against Java's AbstractQueuedSynchronizer, the only practical abstraction to provide similar semantics, CQS shows significant improvements across all benchmarks, of up to two orders of magnitude. In sum, CQS is the first framework to combine expressiveness with formal guarantees and strong practical performance, and should be extensible to other languages and other families of synchronization primitives.

[214]  arXiv:2111.12685 [pdf, other]
Title: EgoRenderer: Rendering Human Avatars from Egocentric Camera Images
Comments: ICCV 2021. this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present EgoRenderer, a system for rendering full-body neural avatars of a person captured by a wearable, egocentric fisheye camera that is mounted on a cap or a VR headset. Our system renders photorealistic novel views of the actor and her motion from arbitrary virtual camera locations. Rendering full-body avatars from such egocentric images comes with unique challenges due to the top-down view and large distortions. We tackle these challenges by decomposing the rendering process into several steps, including texture synthesis, pose construction, and neural image translation. For texture synthesis, we propose Ego-DPNet, a neural network that infers dense correspondences between the input fisheye images and an underlying parametric body model, and extracts textures from egocentric inputs. In addition, to encode dynamic appearances, our approach also learns an implicit texture stack that captures detailed appearance variation across poses and viewpoints. For correct pose generation, we first estimate body pose from the egocentric view using a parametric model. We then synthesize an external free-viewpoint pose image by projecting the parametric model to the user-specified target viewpoint. We next combine the target pose image and the textures into a combined feature image, which is transformed into the output color image using a neural image translation network. Experimental evaluations show that EgoRenderer is capable of generating realistic free-viewpoint avatars of a person wearing an egocentric camera. Comparisons to several baselines demonstrate the advantages of our approach.

[215]  arXiv:2111.12689 [pdf, other]
Title: A stacked deep convolutional neural network to predict the remaining useful life of a turbofan engine
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

This paper presents the data-driven techniques and methodologies used to predict the remaining useful life (RUL) of a fleet of aircraft engines that can suffer failures of diverse nature. The solution presented is based on two Deep Convolutional Neural Networks (DCNN) stacked in two levels. The first DCNN is used to extract a low-dimensional feature vector using the normalized raw data as input. The second DCNN ingests a list of vectors taken from the former DCNN and estimates the RUL. Model selection was carried out by means of Bayesian optimization using a repeated random subsampling validation approach. The proposed methodology was ranked in the third place of the 2021 PHM Conference Data Challenge.

[216]  arXiv:2111.12690 [pdf, other]
Title: Automatic Mapping with Obstacle Identification for Indoor Human Mobility Assessment
Subjects: Robotics (cs.RO)

We propose a framework that allows a mobile robot to build a map of an indoor scenario, identifying and highlighting objects that may be considered a hindrance to people with limited mobility. The map is built by combining recent developments in monocular SLAM with information from inertial sensors of the robot platform, resulting in a metric point cloud that can be further processed to obtain a mesh. The images from the monocular camera are simultaneously analyzed with an object recognition neural network, tuned to detect a particular class of targets. This information is then processed and incorporated on the metric map, resulting in a detailed survey of the locations and bounding volumes of the objects of interest. The result can be used to inform policy makers and users with limited mobility of the hazards present in a particular indoor location. Our initial tests were performed using a micro-UAV and will be extended to other robotic platforms.

[217]  arXiv:2111.12696 [pdf, other]
Title: A Lightweight Graph Transformer Network for Human Mesh Reconstruction from 2D Human Pose
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

Existing deep learning-based human mesh reconstruction approaches have a tendency to build larger networks in order to achieve higher accuracy. Computational complexity and model size are often neglected, despite being key characteristics for practical use of human mesh reconstruction models (e.g. virtual try-on systems). In this paper, we present GTRS, a lightweight pose-based method that can reconstruct human mesh from 2D human pose. We propose a pose analysis module that uses graph transformers to exploit structured and implicit joint correlations, and a mesh regression module that combines the extracted pose feature with the mesh template to reconstruct the final human mesh. We demonstrate the efficiency and generalization of GTRS by extensive evaluations on the Human3.6M and 3DPW datasets. In particular, GTRS achieves better accuracy than the SOTA pose-based method Pose2Mesh while only using 10.2% of the parameters (Params) and 2.5% of the FLOPs on the challenging in-the-wild 3DPW dataset. Code will be publicly available.

[218]  arXiv:2111.12698 [pdf, other]
Title: Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Open-vocabulary instance segmentation aims at segmenting novel classes without mask annotations. It is an important step toward reducing laborious human supervision. Most existing works first pretrain a model on captioned images covering many novel classes and then finetune it on limited base classes with mask annotations. However, the high-level textual information learned from caption pretraining alone cannot effectively encode the details required for pixel-wise segmentation. To address this, we propose a cross-modal pseudo-labeling framework, which generates training pseudo masks by aligning word semantics in captions with visual features of object masks in images. Thus, our framework is capable of labeling novel classes in captions via their word semantics to self-train a student model. To account for noise in pseudo masks, we design a robust student model that selectively distills mask knowledge by estimating the mask noise levels, hence mitigating the adverse impact of noisy pseudo masks. Through extensive experiments, we show the effectiveness of our framework, where we significantly improve mAP score by 4.5% on MS-COCO and 5.1% on the large-scale Open Images & Conceptual Captions datasets compared to the state-of-the-art.

[219]  arXiv:2111.12701 [pdf, other]
Title: Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes
Comments: 19 pages, 14 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Whilst diffusion probabilistic models can generate high quality image content, key limitations remain in terms of both generating high-resolution imagery and their associated high computational requirements. Recent Vector-Quantized image models have overcome this limitation of image resolution but are prohibitively slow and unidirectional as they generate tokens via element-wise autoregressive sampling from the prior. By contrast, in this paper we propose a novel discrete diffusion probabilistic model prior which enables parallel prediction of Vector-Quantized tokens by using an unconstrained Transformer architecture as the backbone. During training, tokens are randomly masked in an order-agnostic manner and the Transformer learns to predict the original tokens. This parallelism of Vector-Quantized token prediction in turn facilitates unconditional generation of globally consistent high-resolution and diverse imagery at a fraction of the computational expense. In this manner, we can generate image resolutions exceeding that of the original training set samples whilst additionally provisioning per-image likelihood estimates (in a departure from generative adversarial approaches). Our approach achieves state-of-the-art results in terms of Density (LSUN Bedroom: 1.51; LSUN Churches: 1.12; FFHQ: 1.20) and Coverage (LSUN Bedroom: 0.83; LSUN Churches: 0.73; FFHQ: 0.80), and performs competitively on FID (LSUN Bedroom: 3.64; LSUN Churches: 4.07; FFHQ: 6.11) whilst offering advantages in terms of both computation and reduced training set requirements.

[220]  arXiv:2111.12702 [pdf, other]
Title: Density-aware Chamfer Distance as a Comprehensive Metric for Point Cloud Completion
Comments: Accepted to NeurIPS 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Chamfer Distance (CD) and Earth Mover's Distance (EMD) are two broadly adopted metrics for measuring the similarity between two point sets. However, CD is usually insensitive to mismatched local density, and EMD is usually dominated by global distribution while overlooking the fidelity of detailed structures. Besides, their unbounded value range induces a heavy influence from the outliers. These defects prevent them from providing a consistent evaluation. To tackle these problems, we propose a new similarity measure named Density-aware Chamfer Distance (DCD). It is derived from CD and benefits from several desirable properties: 1) it can detect disparity of density distributions and is thus a more intensive measure of similarity compared to CD; 2) it is stricter with detailed structures and significantly more computationally efficient than EMD; 3) the bounded value range encourages a more stable and reasonable evaluation over the whole test set. We adopt DCD to evaluate the point cloud completion task, where experimental results show that DCD pays attention to both the overall structure and local geometric details and provides a more reliable evaluation even when CD and EMD contradict each other. We can also use DCD as the training loss, which outperforms the same model trained with CD loss on all three metrics. In addition, we propose a novel point discriminator module that estimates the priority for another guided down-sampling step, and it achieves noticeable improvements under DCD together with competitive results for both CD and EMD. We hope our work could pave the way for a more comprehensive and practical point cloud similarity evaluation. Our code will be available at: https://github.com/wutong16/Density_aware_Chamfer_Distance .
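
The remark that CD is insensitive to mismatched local density can be seen in a small example. The sketch below implements one common form of the plain Chamfer Distance (the mean nearest-neighbour distance in each direction, summed); conventions differ between papers (squared distances, different averaging), so this is an assumed variant and not the DCD measure proposed in the abstract. The function name `chamfer_distance` and the toy point sets are illustrative.

```python
import math


def chamfer_distance(a, b):
    """Symmetric Chamfer Distance between two non-empty point sets.

    Each set is a list of coordinate tuples. This variant averages the
    nearest-neighbour Euclidean distance in each direction and sums both.
    """
    def one_way(src, dst):
        # Mean distance from each point in src to its closest point in dst.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


# Two sets covering the same locations with very different local density:
# 'clumped' places three of its four points at the origin.
uniform = [(0.0, 0.0), (1.0, 0.0)]
clumped = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (1.0, 0.0)]

same_support = chamfer_distance(uniform, clumped)
far_apart = chamfer_distance([(0.0, 0.0)], [(3.0, 4.0)])
```

In this toy case `same_support` is zero although the two sets distribute their points very differently, because every point has an exact nearest neighbour in the other set.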

[221]  arXiv:2111.12704 [pdf, other]
Title: Investigating Tradeoffs in Real-World Video Super-Resolution
Comments: Tech report, 14 pages, 14 figures. Code can be found at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The diversity and complexity of degradations in real-world video super-resolution (VSR) pose non-trivial challenges in inference and training. First, while long-term propagation leads to improved performance in cases of mild degradations, severe in-the-wild degradations could be exaggerated through propagation, impairing output quality. To balance the tradeoff between detail synthesis and artifact suppression, we found an image pre-cleaning stage indispensable to reduce noise and artifacts prior to propagation. Equipped with a carefully designed cleaning module, our RealBasicVSR outperforms existing methods in both quality and efficiency. Second, real-world VSR models are often trained with diverse degradations to improve generalizability, requiring increased batch size to produce a stable gradient. Inevitably, the increased computational burden results in various problems, including 1) speed-performance tradeoff and 2) batch-length tradeoff. To alleviate the first tradeoff, we propose a stochastic degradation scheme that reduces training time by up to 40% without sacrificing performance. We then analyze different training settings and suggest that employing longer sequences rather than larger batches during training allows more effective uses of temporal information, leading to more stable performance during inference. To facilitate fair comparisons, we propose the new VideoLQ dataset, which contains a large variety of real-world low-quality video sequences containing rich textures and patterns. Our dataset can serve as a common ground for benchmarking. Code, models, and the dataset will be made publicly available.

[222]  arXiv:2111.12705 [pdf, other]
Title: MixSyn: Learning Composition and Style for Multi-Source Image Synthesis
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Synthetic images created by generative models increase in quality and expressiveness as newer models utilize larger datasets and novel architectures. Although this photorealism is a positive side-effect from a creative standpoint, it becomes problematic when such generative models are used for impersonation without consent. Most of these approaches are built on the partial transfer between source and target pairs, or they generate completely new samples based on an ideal distribution, still resembling the closest real sample in the dataset. We propose MixSyn (read as " mixin' ") for learning novel fuzzy compositions from multiple sources and creating novel images as a mix of image regions corresponding to the compositions. MixSyn not only combines uncorrelated regions from multiple source masks into a coherent semantic composition, but also generates mask-aware high quality reconstructions of non-existing images. We compare MixSyn to state-of-the-art single-source sequential generation and collage generation approaches in terms of quality, diversity, realism, and expressive power; while also showcasing interactive synthesis, mix & match, and edit propagation tasks, with no mask dependency.

[223]  arXiv:2111.12706 [pdf, ps, other]
Title: Gap Edit Distance via Non-Adaptive Queries: Simple and Optimal
Subjects: Data Structures and Algorithms (cs.DS)

We study the problem of approximating edit distance in sublinear time. This is formalized as a promise problem $(k,k^c)$-Gap Edit Distance, where the input is a pair of strings $X,Y$ and parameters $k,c>1$, and the goal is to return YES if $ED(X,Y)\leq k$ and NO if $ED(X,Y)> k^c$. Recent years have witnessed significant interest in designing sublinear-time algorithms for Gap Edit Distance.
We resolve the non-adaptive query complexity of Gap Edit Distance, improving over several previous results. Specifically, we design a non-adaptive algorithm with query complexity $\tilde{O}(\frac{n}{k^{c-0.5}})$, and further prove that this bound is optimal up to polylogarithmic factors.
Our algorithm also achieves optimal time complexity $\tilde{O}(\frac{n}{k^{c-0.5}})$ whenever $c\geq 1.5$. For $1<c<1.5$, the running time of our algorithm is $\tilde{O}(\frac{n}{k^{2c-1}})$. For the restricted case of $k^c=\Omega(n)$, this matches a known result [Batu, Ergün, Kilian, Magen, Raskhodnikova, Rubinfeld, and Sami, STOC 2003], and in all other (nontrivial) cases, our running time is strictly better than all previous algorithms, including the adaptive ones.
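
To make the promise problem in this abstract concrete, the sketch below decides $(k,k^c)$-Gap Edit Distance by computing the edit distance exactly with the textbook dynamic program. This takes quadratic time and reads both strings in full, so it is only a reference for the problem definition and not the sublinear, non-adaptive algorithm of the paper. The function names and the `None` return value for inputs that violate the promise are illustrative choices.

```python
def edit_distance(x, y):
    """Exact edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(y) + 1))  # distances from the empty prefix of x
    for i, cx in enumerate(x, 1):
        cur = [i]
        for j, cy in enumerate(y, 1):
            cur.append(min(prev[j] + 1,                # delete cx
                           cur[j - 1] + 1,             # insert cy
                           prev[j - 1] + (cx != cy)))  # match or substitute
        prev = cur
    return prev[-1]


def gap_edit_distance(x, y, k, c):
    """Return "YES" if ED(x, y) <= k and "NO" if ED(x, y) > k**c.

    Inputs with k < ED(x, y) <= k**c fall outside the promise; a gap
    algorithm may answer either way there, so this sketch returns None.
    """
    d = edit_distance(x, y)
    if d <= k:
        return "YES"
    if d > k ** c:
        return "NO"
    return None


close = gap_edit_distance("kitten", "sitting", 3, 2)    # ED = 3 <= k
far = gap_edit_distance("aaaa", "bbbb", 2, 1.5)         # ED = 4 > 2**1.5
in_gap = gap_edit_distance("kitten", "sitting", 2, 2)   # 2 < ED = 3 <= 4
```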

[224]  arXiv:2111.12707 [pdf, other]
Title: MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation
Comments: open sourced
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Estimating 3D human poses from monocular videos is a challenging task due to depth ambiguity and self-occlusion. Most existing works attempt to solve both issues by exploiting spatial and temporal relationships. However, those works ignore the fact that it is an inverse problem where multiple feasible solutions (i.e., hypotheses) exist. To relieve this limitation, we propose a Multi-Hypothesis Transformer (MHFormer) that learns spatio-temporal representations of multiple plausible pose hypotheses. In order to effectively model multi-hypothesis dependencies and build strong relationships across hypothesis features, the task is decomposed into three stages: (i) Generate multiple initial hypothesis representations; (ii) Model self-hypothesis communication, merge multiple hypotheses into a single converged representation and then partition it into several diverged hypotheses; (iii) Learn cross-hypothesis communication and aggregate the multi-hypothesis features to synthesize the final 3D pose. Through the above processes, the final representation is enhanced and the synthesized pose is much more accurate. Extensive experiments show that MHFormer achieves state-of-the-art results on two challenging datasets: Human3.6M and MPI-INF-3DHP. Without bells and whistles, its performance surpasses the previous best result by a large margin of 3% on Human3.6M. Code and models are available at https://github.com/Vegetebird/MHFormer.

[225]  arXiv:2111.12710 [pdf, other]
Title: PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

This paper explores a better codebook for BERT pre-training of vision transformers. The recent work BEiT successfully transfers BERT pre-training from NLP to the vision field. It directly adopts one simple discrete VAE as the visual tokenizer, but has not considered the semantic level of the resulting visual tokens. By contrast, the discrete tokens in the NLP field are naturally highly semantic. This difference motivates us to learn a perceptual codebook. We find, surprisingly, one simple yet effective idea: enforcing perceptual similarity during the dVAE training. We demonstrate that the visual tokens generated by the proposed perceptual codebook do exhibit better semantic meanings, and subsequently help pre-training achieve superior transfer performance in various downstream tasks. For example, we achieve 84.5 Top-1 accuracy on ImageNet-1K with ViT-B backbone, outperforming the competitive method BEiT by +1.3 with the same pre-training epochs. It can also improve the performance of object detection and segmentation tasks on COCO val by +1.3 box AP and +1.0 mask AP, and semantic segmentation on ADE20k by +1.0 mIoU. The code and models will be available at https://github.com/microsoft/PeCo.

Cross-lists for Thu, 25 Nov 21

[226]  arXiv:2111.12138 (cross-list from eess.IV) [pdf, other]
Title: Multi-Modality Microscopy Image Style Transfer for Nuclei Segmentation
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Quantitative Methods (q-bio.QM)

Annotating microscopy images for nuclei segmentation is laborious and time-consuming. To leverage the few existing annotations, also across multiple modalities, we propose a novel microscopy-style augmentation technique based on a generative adversarial network (GAN). Unlike other style transfer methods, it can not only deal with different cell assay types and lighting conditions, but also with different imaging modalities, such as bright-field and fluorescence microscopy. Using disentangled representations for content and style, we can preserve the structure of the original image while altering its style during augmentation. We evaluate our data augmentation on the 2018 Data Science Bowl dataset consisting of various cell assays, lighting conditions, and imaging modalities. With our style augmentation, the segmentation accuracy of the two top-ranked Mask R-CNN-based nuclei segmentation algorithms in the competition increases significantly. Thus, our augmentation technique renders the downstream task more robust to the test data heterogeneity and helps counteract class imbalance without resampling of minority classes.

[227]  arXiv:2111.12157 (cross-list from stat.ML) [pdf, other]
Title: Bayesian Sample Size Prediction for Online Activity
Comments: 10 pages, 7 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

In many contexts it is useful to predict the number of individuals in some population who will initiate a particular activity during a given period. Examples include the number of users who will install a software update, or the number of customers who will use a new feature on a website or participate in an A/B test. In practical settings, there is heterogeneity amongst individuals with regard to the distribution of time until they will initiate. For this reason it is inappropriate to assume that the number of new individuals observed on successive days will be identically distributed. Given observations on the number of unique users participating in an initial period, we present a simple but novel Bayesian method for predicting the number of additional individuals who will participate during a subsequent period. We illustrate the performance of the method in predicting sample size in online experimentation.

[228]  arXiv:2111.12203 (cross-list from eess.AS) [pdf, other]
Title: KUIELab-MDX-Net: A Two-Stream Neural Network for Music Demixing
Comments: MDX Workshop @ ISMIR 2021, 7 pages, 3 figures
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Recently, many methods based on deep learning have been proposed for music source separation. Some state-of-the-art methods have shown that stacking many layers with many skip connections improves the SDR performance. Although such a deep and complex architecture shows outstanding performance, it usually requires substantial computing resources and time for training and evaluation. This paper proposes a two-stream neural network for music demixing, called KUIELab-MDX-Net, which shows a good balance of performance and required resources. The proposed model has a time-frequency branch and a time-domain branch, each of which separates stems. It blends results from the two streams to generate the final estimation. KUIELab-MDX-Net took second place on leaderboard A and third place on leaderboard B in the Music Demixing Challenge at ISMIR 2021. This paper also summarizes experimental results on another benchmark, MUSDB18. Our source code is available online.

[229]  arXiv:2111.12215 (cross-list from eess.IV) [pdf, other]
Title: Explainable multiple abnormality classification of chest CT volumes with AxialNet and HiResCAM
Comments: 25 pages, 7 figures, 6 tables
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Understanding model predictions is critical in healthcare, to facilitate rapid verification of model correctness and to guard against use of models that exploit confounding variables. We introduce the challenging new task of explainable multiple abnormality classification in volumetric medical images, in which a model must indicate the regions used to predict each abnormality. To solve this task, we propose a multiple instance learning convolutional neural network, AxialNet, that allows identification of top slices for each abnormality. Next we incorporate HiResCAM, an attention mechanism, to identify sub-slice regions. We prove that for AxialNet, HiResCAM explanations are guaranteed to reflect the locations the model used, unlike Grad-CAM which sometimes highlights irrelevant locations. Armed with a model that produces faithful explanations, we then aim to improve the model's learning through a novel mask loss that leverages HiResCAM and 3D allowed regions to encourage the model to predict abnormalities based only on the organs in which those abnormalities appear. The 3D allowed regions are obtained automatically through a new approach, PARTITION, that combines location information extracted from radiology reports with organ segmentation maps obtained through morphological image processing. Overall, we propose the first model for explainable multi-abnormality prediction in volumetric medical images, and then use the mask loss to achieve a 33% improvement in organ localization of multiple abnormalities in the RAD-ChestCT data set of 36,316 scans, representing the state of the art. This work advances the clinical applicability of multiple abnormality modeling in chest CT volumes.

[230]  arXiv:2111.12272 (cross-list from stat.AP) [pdf, other]
Title: Causal Analysis and Prediction of Human Mobility in the U.S. during the COVID-19 Pandemic
Subjects: Applications (stat.AP); Machine Learning (cs.LG)

Since the spread of COVID-19 in the U.S., which had the highest number of confirmed cases and deaths in the world as of September 2020, most states in the country have enforced travel restrictions resulting in sharp reductions in mobility. However, the overall impact and long-term implications of this crisis for travel and mobility remain uncertain. To this end, this study develops an analytical framework that determines and analyzes the most dominant factors impacting human mobility and travel in the U.S. during this pandemic. In particular, the study uses Granger causality to determine the important predictors influencing daily vehicle miles traveled (VMT) and utilizes linear regularization algorithms, including Ridge and LASSO techniques, to model and predict mobility. State-level time-series data were obtained from various open-access sources for the period from March 1, 2020 through June 13, 2020, and the entire data set was divided into two parts for training and testing purposes. The variables selected by Granger causality were used to train three different reduced-order models by ordinary least squares regression, Ridge regression, and LASSO regression algorithms. Finally, the prediction accuracy of the developed models was examined on the test data. The results indicate that factors including the number of new COVID cases, social distancing index, population staying at home, percent of out-of-county trips, trips to different destinations, socioeconomic status, percent of people working from home, and statewide closure, among others, were the most important factors influencing daily VMT. Also, among all the modeling techniques, Ridge regression provides the best performance with the least error, while LASSO regression also performed better than the ordinary least squares model.

[231]  arXiv:2111.12277 (cross-list from eess.AS) [pdf, other]
Title: One-shot Voice Conversion For Style Transfer Based On Speaker Adaptation
Comments: Submitted to ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

One-shot style transfer is a challenging task, since training on one utterance makes the model extremely prone to over-fitting to the training data and causes low speaker similarity and a lack of expressiveness. In this paper, we build on the recognition-synthesis framework and propose a one-shot voice conversion approach for style transfer based on speaker adaptation. First, a speaker normalization module is adopted to remove speaker-related information in bottleneck features extracted by ASR. Second, we adopt weight regularization in the adaptation process to prevent over-fitting caused by using only one utterance from the target speaker as training data. Finally, to comprehensively decouple the speech factors, i.e., content, speaker, and style, and to transfer the source style to the target, a prosody module is used to extract a prosody representation. Experiments show that our approach is superior to the state-of-the-art one-shot VC systems in terms of style and speaker similarity; additionally, our approach also maintains good speech quality.

[232]  arXiv:2111.12312 (cross-list from math.PR) [pdf, ps, other]
Title: Lossy Compression of General Random Variables
Subjects: Probability (math.PR); Information Theory (cs.IT)

This paper is concerned with the lossy compression of general random variables, specifically with rate-distortion theory and quantization of random variables taking values in general measurable spaces such as, e.g., manifolds and fractal sets. Manifold structures are prevalent in data science, e.g., in compressed sensing, machine learning, image processing, and handwritten digit recognition. Fractal sets find application in image compression and in the modeling of Ethernet traffic. Our main contributions are bounds on the rate-distortion function and the quantization error. These bounds are very general and essentially only require the existence of reference measures satisfying certain regularity conditions in terms of small ball probabilities. To illustrate the wide applicability of our results, we particularize them to random variables taking values in i) manifolds, namely, hyperspheres and Grassmannians, and ii) self-similar sets characterized by iterated function systems satisfying the weak separation property.

[233]  arXiv:2111.12316 (cross-list from math.DS) [pdf, ps, other]
Title: A comment on stabilizing reinforcement learning
Subjects: Dynamical Systems (math.DS); Machine Learning (cs.LG); Systems and Control (eess.SY)

This is a short comment on the paper "Asymptotically Stable Adaptive-Optimal Control Algorithm With Saturating Actuators and Relaxed Persistence of Excitation" by Vamvoudakis et al. The question of stability of reinforcement learning (RL) agents remains hard and the said work suggested an on-policy approach with a suitable stability property using a technique from adaptive control - a robustifying term to be added to the action. However, there is an issue with this approach to stabilizing RL, which we will explain in this note. Furthermore, Vamvoudakis et al. seem to have made a fallacious assumption on the Hamiltonian under a generic policy. To provide a positive result, we will not only indicate this mistake, but also show critic neural network weight convergence in a stochastic, continuous-time environment, provided certain conditions on the behavior policy hold.

[234]  arXiv:2111.12451 (cross-list from physics.flu-dyn) [pdf, other]
Title: Geometrically reduced modelling of pulsatile flow in perivascular networks
Subjects: Fluid Dynamics (physics.flu-dyn); Numerical Analysis (math.NA)

Flow of cerebrospinal fluid in perivascular spaces is a key mechanism underlying brain transport and clearance. In this paper, we present a mathematical and numerical formalism for reduced models of pulsatile viscous fluid flow in networks of generalized annular cylinders. We apply this framework to study cerebrospinal fluid flow in perivascular spaces induced by pressure differences, cardiac pulse wave-induced vascular wall motion and vasomotion. The reduced models provide approximations of the cross-section average pressure and cross-section flux, both defined over the topologically one-dimensional centerlines of the network geometry. Comparing the full and reduced model predictions, we find that the reduced models capture pulsatile flow characteristics and provide accurate pressure and flux predictions across the range of idealized and image-based scenarios investigated at a fraction of the computational cost of the corresponding full models. The framework presented thus provides a robust and effective computational approach for large scale in-silico studies of pulsatile perivascular fluid flow and transport.

[235]  arXiv:2111.12470 (cross-list from math.OC) [pdf, other]
Title: Combinatorial Optimization Problems with Balanced Regret
Subjects: Optimization and Control (math.OC); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)

For decision making under uncertainty, min-max regret has been established as a popular methodology to find robust solutions. In this approach, we compare the performance of our solution against the best possible performance had we known the true scenario in advance. We introduce a generalization of this setting which allows us to compare against solutions that are also affected by uncertainty, which we call balanced regret. Using budgeted uncertainty sets, this allows for a wider range of possible alternatives the decision maker may choose from. We analyze this approach for general combinatorial problems, providing an iterative solution method and insights into solution properties. We then consider a type of selection problem in more detail and show that, while the classic regret setting with budgeted uncertainty sets can be solved in polynomial time, the balanced regret problem becomes NP-hard. In computational experiments using random and real-world data, we show that balanced regret solutions provide a useful trade-off for the performance in classic performance measures.
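
The classic min-max regret criterion described above can be illustrated with a small brute-force sketch. It assumes a finite list of cost scenarios instead of the budgeted uncertainty sets used in the paper, and a selection problem of choosing p out of n items; the function name and input format are hypothetical and not taken from the paper. It does not implement balanced regret.

```python
from itertools import combinations


def min_max_regret(scenarios, p):
    """Brute-force min-max regret for choosing p of n items.

    scenarios: list of cost vectors, one per scenario (assumed input format).
    Returns (best_selection, worst_case_regret).
    """
    n = len(scenarios[0])
    selections = list(combinations(range(n), p))

    def cost(sel, c):
        return sum(c[i] for i in sel)

    # Best achievable cost in each scenario, had that scenario been known in advance.
    optimum = [min(cost(sel, c) for sel in selections) for c in scenarios]

    def worst_regret(sel):
        # Regret in a scenario = our cost minus the best cost in hindsight.
        return max(cost(sel, c) - opt for c, opt in zip(scenarios, optimum))

    best = min(selections, key=worst_regret)
    return best, worst_regret(best)


# Two scenarios over three items; pick exactly one item.
scenarios = [
    [1, 5, 3],
    [4, 1, 3],
]
selection, regret = min_max_regret(scenarios, p=1)
print(selection, regret)  # prints "(2,) 2"
```

Item 2 is never the cheapest choice in either scenario, yet it minimizes the worst-case gap to the hindsight optimum, which is the kind of hedging behavior the regret criterion rewards.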

[236]  arXiv:2111.12482 (cross-list from stat.ML) [pdf, other]
Title: One More Step Towards Reality: Cooperative Bandits with Imperfect Communication
Journal-ref: Conference on Neural Information Processing Systems, 2021
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

The cooperative bandit problem is becoming increasingly relevant due to its applications in large-scale decision-making. However, most research on this problem focuses exclusively on the setting with perfect communication, whereas in most real-world distributed settings, communication is often over stochastic networks, with arbitrary corruptions and delays. In this paper, we study cooperative bandit learning under three typical real-world communication scenarios, namely, (a) message-passing over stochastic time-varying networks, (b) instantaneous reward-sharing over a network with random delays, and (c) message-passing with adversarially corrupted rewards, including byzantine communication. For each of these environments, we propose decentralized algorithms that achieve competitive performance, along with near-optimal guarantees on the incurred group regret. Furthermore, in the setting with perfect communication, we present an improved delayed-update algorithm that outperforms the existing state-of-the-art on various network topologies. Finally, we present tight network-dependent minimax lower bounds on the group regret. Our proposed algorithms are straightforward to implement and obtain competitive empirical performance.

[237]  arXiv:2111.12483 (cross-list from eess.IV) [pdf, other]
Title: LDP-Net: An Unsupervised Pansharpening Network Based on Learnable Degradation Processes
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Pansharpening of remote sensing images aims at acquiring a high-resolution multispectral (HRMS) image directly by fusing a low-resolution multispectral (LRMS) image with a panchromatic (PAN) image. The main concern is how to effectively combine the rich spectral information of the LRMS image with the abundant spatial information of the PAN image. Recently, many methods based on deep learning have been proposed for the pansharpening task. However, these methods usually have two main drawbacks: 1) requiring HRMS for supervised learning; and 2) simply ignoring the latent relation between the MS and PAN images and fusing them directly. To solve these problems, we propose a novel unsupervised network based on learnable degradation processes, dubbed LDP-Net. A reblurring block and a graying block are designed to learn the corresponding degradation processes, respectively. In addition, a novel hybrid loss function is proposed to constrain both spatial and spectral consistency between the pansharpened image and the PAN and LRMS images at different resolutions. Experiments on Worldview2 and Worldview3 images demonstrate that our proposed LDP-Net can fuse PAN and LRMS images effectively without the help of HRMS samples, achieving promising performance in terms of both qualitative visual effects and quantitative metrics.

[238]  arXiv:2111.12491 (cross-list from math.OC) [pdf, other]
Title: Efficient semidefinite bounds for multi-label discrete graphical models
Subjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI)

By concisely representing a joint function of many variables as the combination of small functions, discrete graphical models (GMs) provide a powerful framework to analyze stochastic and deterministic systems of interacting variables. One of the main queries on such models is to identify the extremum of this joint function. This is known as the Weighted Constraint Satisfaction Problem (WCSP) on deterministic Cost Function Networks and as Maximum a Posteriori (MAP) inference on stochastic Markov Random Fields. Algorithms for approximate WCSP inference typically rely on local consistency algorithms or belief propagation. These methods are intimately related to linear programming (LP) relaxations and often coupled with reparametrizations defined by the dual solution of the associated LP. Since the seminal work of Goemans and Williamson, it is well understood that convex SDP relaxations can provide superior guarantees to LP. But the inherent computational cost of interior point methods has limited their application. The situation has improved with the introduction of non-convex Burer-Monteiro style methods which are well suited to handle the SDP relaxation of combinatorial problems with binary variables (such as MAXCUT, MaxSAT or MAP/Ising). We compute low rank SDP upper and lower bounds for discrete pairwise graphical models with arbitrary number of values and arbitrary binary cost functions by extending a Burer-Monteiro style method based on row-by-row updates. We consider a traditional dualized constraint approach and a dedicated Block Coordinate Descent approach which avoids introducing large penalty coefficients to the formulation. On increasingly hard and dense WCSP/CFN instances, we observe that the BCD approach can outperform the dualized approach and provide tighter bounds than local consistencies/convergent message passing approaches.

[239]  arXiv:2111.12516 (cross-list from eess.AS) [pdf, other]
Title: LightSAFT: Lightweight Latent Source Aware Frequency Transform for Source Separation
Comments: MDX Workshop @ ISMIR 2021, 7 pages, 1 figure
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)

Conditioned source separations have attracted significant attention because of their flexibility, applicability and extensionality. Their performance was usually inferior to the existing approaches, such as the single source separation model. However, a recently proposed method called LaSAFT-Net has shown that conditioned models can show comparable performance against existing single-source separation models. This paper presents LightSAFT-Net, a lightweight version of LaSAFT-Net. As a baseline, it provided a sufficient SDR performance for comparison during the Music Demixing Challenge at ISMIR 2021. This paper also enhances the existing LightSAFT-Net by replacing the LightSAFT blocks in the encoder with TFC-TDF blocks. Our enhanced LightSAFT-Net outperforms the previous one with fewer parameters.

[240]  arXiv:2111.12521 (cross-list from math.OC) [pdf, other]
Title: Probabilistic Behavioral Distance and Tuning - Reducing and aggregating complex systems
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY); Adaptation and Self-Organizing Systems (nlin.AO)

Given a complex system with a given interface to the rest of the world, what does it mean for the system to behave close to a simpler specification describing the behavior at the interface? We give several definitions for useful notions of distances between a complex system and a specification by combining a behavioral and probabilistic perspective. These distances can be used to tune a complex system to a specification. We show that our approach can successfully tune non-linear networked systems to behave like much smaller networks, allowing us to aggregate large sub-networks into one or two effective nodes. Finally, we discuss similarities and differences between our approach and $H_\infty$ model reduction.

[241]  arXiv:2111.12526 (cross-list from stat.AP) [pdf]
Title: Mining Meta-indicators of University Ranking: A Machine Learning Approach Based on SHAP
Authors: Shudong Yang (1), Miaomiao Liu (1) ((1) Dalian University of Technology)
Comments: 4 pages, 1 figure
Subjects: Applications (stat.AP); Machine Learning (cs.LG); Machine Learning (stat.ML)

University evaluation and ranking is an extremely complex activity. Major universities are struggling because of the increasingly complex indicator systems of world university rankings. Can we find the meta-indicators of the index system by simplifying this complexity? This research discovered three meta-indicators based on interpretable machine learning. The first is time: to be friends with time, believe in the power of time, and accumulate historical deposits. The second is space: to be friends with the city and grow together through co-development. The third is relationships: to be friends with alumni and strive for more alumni donations without ceiling.

[242]  arXiv:2111.12533 (cross-list from math.CO) [pdf, other]
Title: Tight bounds on the expected number of holes in random point sets
Subjects: Combinatorics (math.CO); Computational Geometry (cs.CG); Discrete Mathematics (cs.DM); Probability (math.PR)

For integers $d \geq 2$ and $k \geq d+1$, a $k$-hole in a set $S$ of points in general position in $\mathbb{R}^d$ is a $k$-tuple of points from $S$ in convex position such that the interior of their convex hull does not contain any point from $S$. For a convex body $K \subseteq \mathbb{R}^d$ of unit $d$-dimensional volume, we study the expected number $EH^K_{d,k}(n)$ of $k$-holes in a set of $n$ points drawn uniformly and independently at random from $K$.
We prove an asymptotically tight lower bound on $EH^K_{d,k}(n)$ by showing that, for all fixed integers $d \geq 2$ and $k\geq d+1$, the number $EH_{d,k}^K(n)$ is at least $\Omega(n^d)$. For some small holes, we even determine the leading constant $\lim_{n \to \infty}n^{-d}EH^K_{d,k}(n)$ exactly. We improve the currently best known lower bound on $\lim_{n \to \infty}n^{-d}EH^K_{d,d+1}(n)$ by Reitzner and Temesvari (2019). In the plane, we show that the constant $\lim_{n \to \infty}n^{-2}EH^K_{2,k}(n)$ is independent of $K$ for every fixed $k \geq 3$ and we compute it exactly for $k=4$, improving earlier estimates by Fabila-Monroy, Huemer, and Mitsche (2015) and by the authors (2020).
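
To make the definition concrete, the smallest planar case ($d=2$, $k=3$) can be sketched as follows: a 3-hole is a triangle spanned by three points of $S$ with no other point of $S$ in its interior. The brute-force counter below and the choice of the unit square as the convex body $K$ are illustrative assumptions; the function names are hypothetical and the code is not the authors' method.

```python
import random
from itertools import combinations


def orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def strictly_inside(p, a, b, c):
    """True if p lies in the open interior of triangle abc."""
    d1, d2, d3 = orient(a, b, p), orient(b, c, p), orient(c, a, p)
    return (d1 > 0 and d2 > 0 and d3 > 0) or (d1 < 0 and d2 < 0 and d3 < 0)


def count_3_holes(points):
    """Count empty triangles in a planar point set assumed in general position."""
    count = 0
    for a, b, c in combinations(points, 3):
        others = (p for p in points if p not in (a, b, c))
        if not any(strictly_inside(p, a, b, c) for p in others):
            count += 1
    return count


def estimate_expected_3_holes(n, trials, seed=0):
    """Monte Carlo estimate of the expected number of 3-holes among n uniform
    points in the unit square (an assumed choice of convex body K)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        pts = [(rng.random(), rng.random()) for _ in range(n)]
        total += count_3_holes(pts)
    return total / trials


# A triangle with one interior point: the outer triangle is not a hole,
# the three inner triangles are.
print(count_3_holes([(0, 0), (4, 0), (0, 4), (1, 1)]))  # prints 3
# Empirical ratio to n^2 for a small n (value depends on the random seed).
n = 20
print(estimate_expected_3_holes(n, trials=50) / n**2)
```

The counter runs in $O(n^4)$ time, so it is only suitable for small experiments such as checking the quadratic growth suggested by the $\Omega(n^d)$ bound.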

[243]  arXiv:2111.12541 (cross-list from astro-ph.IM) [pdf, other]
Title: Rethinking the modeling of the instrumental response of telescopes with a differentiable optical model
Comments: 10 pages. Accepted for the Fourth Workshop on Machine Learning and the Physical Sciences (NeurIPS 2021)
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Computer Vision and Pattern Recognition (cs.CV); Optics (physics.optics)

We propose a paradigm shift in the data-driven modeling of the instrumental response field of telescopes. By adding a differentiable optical forward model into the modeling framework, we change the data-driven modeling space from the pixels to the wavefront. This makes it possible to transfer a great deal of complexity from the instrumental response into the forward model while still adapting to the observations and remaining data-driven. Our framework offers a way forward to building powerful models that are physically motivated, interpretable, and that do not require special calibration data. We show that for a simplified setting of a space telescope, this framework represents a real performance breakthrough compared to existing data-driven approaches, with reconstruction errors decreasing 5-fold at observation resolution and more than 10-fold for a 3x super-resolution. We successfully model chromatic variations of the instrument's response using only noisy broad-band in-focus observations.

[244]  arXiv:2111.12559 (cross-list from physics.comp-ph) [pdf, ps, other]
Title: Two step clustering for data reduction combining DBSCAN and k-means clustering
Subjects: Computational Physics (physics.comp-ph); Databases (cs.DB); Plasma Physics (physics.plasm-ph)

A novel combination of two widely-used clustering algorithms is proposed here for the detection and reduction of high data density regions. The Density Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm is used for the detection of high data density regions and the k-means algorithm for reduction. The proposed algorithm iterates while successively decrementing the DBSCAN search radius, allowing for an adaptive reduction factor based on the effective data density. The algorithm is demonstrated for a physics simulation application, where a surrogate model for fusion reactor plasma turbulence is generated with neural networks. A training dataset for the surrogate model is created with a quasilinear gyrokinetics code for turbulent transport calculations in fusion plasmas. The training set consists of model inputs derived from a repository of experimental measurements, meaning there is a potential risk of over-representing specific regions of this input parameter space. By applying the proposed reduction algorithm to this dataset, this study demonstrates that the training dataset can be reduced by a factor ~20 using the proposed algorithm, without a noticeable loss in the surrogate model accuracy. This reduction provides a novel way of analyzing existing high-dimensional datasets for biases and consequently reducing them, which lowers the cost of re-populating that parameter space with higher quality data.

[245]  arXiv:2111.12566 (cross-list from q-bio.QM) [pdf, other]
Title: Acoustical Analysis of Speech Under Physical Stress in Relation to Physical Activities and Physical Literacy
Comments: Submitted to Speech Prosody 2022
Subjects: Quantitative Methods (q-bio.QM); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Human speech production encompasses physiological processes that naturally react to physical stress. Stress caused by physical activity (PA), e.g., running, may lead to significant changes in a person's speech. The major changes are related to the aspects of pitch level, speaking rate, pause pattern, and breathiness. The extent of change depends presumably on the physical fitness and well-being of the person, as well as the intensity of PA. The general wellness of a person is further related to his/her physical literacy (PL), which refers to a holistic description of engagement in PA. This paper presents the development of a Cantonese speech database that contains audio recordings of speech before and after physical exercises of different intensity levels. The corpus design and data collection process are described. Preliminary results of acoustical analysis are presented to illustrate the impact of PA on pitch level, pitch range, speaking and articulation rate, and time duration of pauses. It is also noted that the effect of PA is correlated with some of the PA and PL measures.

[246]  arXiv:2111.12574 (cross-list from astro-ph.IM) [pdf, other]
Title: Citation method, please? A case study in astrophysics
Authors: Alice Allen
Comments: 11 pages, 6 figures, 1 table
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Digital Libraries (cs.DL); Physics and Society (physics.soc-ph)

Software citation has accelerated in astrophysics in the past decade, resulting in the field now having multiple trackable ways to cite computational methods. Yet most software authors do not specify how they would like their code to be cited, while others specify a citation method that is not easily tracked (or tracked at all) by most indexers. Two metadata file formats, codemeta.json and CITATION.cff, developed in 2016 and 2017 respectively, are useful for specifying how software should be cited. In 2020, the Astrophysics Source Code Library (ASCL, ascl.net) undertook a year-long effort to generate and send these software metadata files, specific to each computational method, to code authors for editing and inclusion on their code sites. We wanted to answer the question, "Would sending these files to software authors increase adoption of one, the other, or both of these metadata files?" The answer in this case was no. Furthermore, only 41% of the 135 code sites examined for use of these files had citation information in any form available. The lack of such information creates an obstacle for article authors to provide credit to software creators, thus hindering citation of and recognition for computational contributions to research and the scientists who develop and maintain software.

[247]  arXiv:2111.12641 (cross-list from quant-ph) [pdf, other]
Title: A Classical Algorithm Which Also Beats $\frac{1}{2}+\frac{2}{\pi}\frac{1}{\sqrt{D}}$ For High Girth MAX-CUT
Comments: 4 pages, 0 figures
Subjects: Quantum Physics (quant-ph); Data Structures and Algorithms (cs.DS)

We give a simple classical algorithm which provably achieves the performance in the title. The algorithm is a simple modification of the Gaussian wave process.

[248]  arXiv:2111.12649 (cross-list from math.OC) [pdf, ps, other]
Title: Global Output Feedback Stabilization of Semilinear Reaction-Diffusion PDEs
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

This paper addresses the topic of global output feedback stabilization of semilinear reaction-diffusion PDEs. The semilinearity is assumed to satisfy a sector condition. We consider two different types of actuation configurations, namely: bounded control operator and right Robin boundary control. The measurement is selected as a left Dirichlet trace. The control strategy is finite dimensional and is designed based on a linear version of the plant. We derive a set of sufficient conditions ensuring the global exponential stabilization of the semilinear reaction-diffusion PDE. These conditions are shown to be feasible provided the order of the controller is large enough and the size of the sector to which the semilinearity is confined is small enough.

[249]  arXiv:2111.12676 (cross-list from stat.CO) [pdf, other]
Title: Super-polynomial accuracy of one dimensional randomized nets using the median-of-means
Subjects: Computation (stat.CO); Numerical Analysis (math.NA); Statistics Theory (math.ST)

Let $f$ be analytic on $[0,1]$ with $|f^{(k)}(1/2)|\leq A\alpha^kk!$ for some constant $A$ and $\alpha<2$. We show that the median estimate of $\mu=\int_0^1f(x)\,\mathrm{d}x$ under random linear scrambling with $n=2^m$ points converges at the rate $O(n^{-c\log(n)})$ for any $c< 3\log(2)/\pi^2\approx 0.21$. We also get a super-polynomial convergence rate for the sample median of $2k-1$ random linearly scrambled estimates, when $k=\Omega(m)$. When $f$ has a $p$'th derivative that satisfies a $\lambda$-H\"older condition then the median-of-means has error $O( n^{-(p+\lambda)+\epsilon})$ for any $\epsilon>0$, if $k\to\infty$ as $m\to\infty$.
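
The median-of-scrambled-estimates idea can be sketched in a few lines. The abstract does not spell out the construction, so the code below assumes a common form of random linear scrambling: the $2^m$ base-2 van der Corput points are transformed by a random lower-triangular binary matrix with unit diagonal plus a random digital shift. The function names, the 32-bit precision, and the test integrand are hypothetical choices for illustration only.

```python
import math
import random
from statistics import median


def scrambled_points(m, precision=32, rng=random):
    """Return 2**m points in [0, 1): van der Corput points in base 2 under an
    assumed random linear scramble over GF(2) plus a random digital shift."""
    # rows[r] is a bit mask selecting which input digits feed output digit r.
    rows = []
    for r in range(precision):
        mask = 0
        for j in range(min(r, m)):
            if rng.getrandbits(1):
                mask |= 1 << j  # random entry below the diagonal
        if r < m:
            mask |= 1 << r  # unit diagonal keeps the net property
        rows.append(mask)
    shift = [rng.getrandbits(1) for _ in range(precision)]

    points = []
    for i in range(2 ** m):
        # Digit j of the unscrambled van der Corput point is bit j of i.
        x = 0.0
        for r in range(precision):
            digit = (bin(rows[r] & i).count("1") & 1) ^ shift[r]
            x += digit * 2.0 ** -(r + 1)
        points.append(x)
    return points


def median_estimate(f, m, k, seed=0):
    """Median of 2k-1 independently scrambled equal-weight estimates of the
    integral of f over [0, 1]."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(2 * k - 1):
        pts = scrambled_points(m, rng=rng)
        estimates.append(sum(f(x) for x in pts) / len(pts))
    return median(estimates)


# exp is analytic on [0, 1]; its exact integral is e - 1.
estimate = median_estimate(math.exp, m=8, k=6)
print(abs(estimate - (math.e - 1)))
```

Because the first $m$ output digits are a bijective function of the point index, every interval $[j/2^m,(j+1)/2^m)$ receives exactly one point, so each individual estimate of a monotone integrand is already within $(f(1)-f(0))/n$ of $\mu$; the median is used to discard the occasional poorly scrambled replicate.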

[250]  arXiv:2111.12683 (cross-list from physics.ao-ph) [pdf, other]
Title: Data-Based Models for Hurricane Evolution Prediction: A Deep Learning Approach
Subjects: Atmospheric and Oceanic Physics (physics.ao-ph); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Methodology (stat.ME)

Fast and accurate prediction of hurricane evolution from genesis onwards is needed to reduce loss of life and enhance community resilience. In this work, a novel model development methodology for predicting storm trajectory is proposed based on two classes of Recurrent Neural Networks (RNNs). The RNN models are trained on input features available in or derived from the HURDAT2 North Atlantic hurricane database maintained by the National Hurricane Center (NHC). The models use probabilities of storms passing through any location, computed from historical data. A detailed analysis of model forecasting error shows that Many-To-One prediction models are less accurate than Many-To-Many models owing to compounded error accumulation, with the exception of 6-hr predictions, for which the two types of model perform comparably. Application to 75 or more test storms in the North Atlantic basin showed that, for short-term forecasting up to 12 hours, the Many-to-Many RNN storm trajectory prediction models presented herein are significantly faster than ensemble models used by the NHC, while leading to errors of comparable magnitude.

Replacements for Thu, 25 Nov 21

[251]  arXiv:1607.03943 (replaced) [pdf, ps, other]
Title: Generalized hybrid iterative methods for large-scale Bayesian inverse problems
Subjects: Numerical Analysis (math.NA)
[252]  arXiv:1612.05924 (replaced) [src]
Title: Asymmetric Hat Game with three players and three colors
Authors: Theo van Uem
Comments: it is now part of arXiv:1612.00276 (Ebert's asymmetric Hat Game)
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM); Information Theory (cs.IT)
[253]  arXiv:1704.04244 (replaced) [src]
Title: General three person two color Hat Game
Authors: Theo van Uem
Comments: it is now part of arXiv:1612.00276 (Ebert's asymmetric Hat Game)
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM); Information Theory (cs.IT)
[254]  arXiv:1802.00938 (replaced) [pdf, other]
Title: DeepProcess: Supporting business process execution using a MANN-based recommender system
Comments: Accepted at ICSOC 2021
Subjects: Neural and Evolutionary Computing (cs.NE)
[255]  arXiv:1807.07686 (replaced) [pdf, other]
Title: Exact minimum number of bits to stabilize a linear system
Comments: Extended version of the paper accepted to IEEE Transactions on Automatic Control
Journal-ref: IEEE Transactions on Automatic Control, Oct. 2022
Subjects: Systems and Control (eess.SY)
[256]  arXiv:1904.12218 (replaced) [pdf, other]
Title: Graph Kernels: A Survey
Journal-ref: Journal of Artificial Intelligence Research (2021), Volume 72, Pages 943-1027
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[257]  arXiv:1905.11968 (replaced) [pdf, ps, other]
Title: Chasing Convex Bodies Optimally
Authors: Mark Sellke
Subjects: Data Structures and Algorithms (cs.DS); Metric Geometry (math.MG)
[258]  arXiv:1907.02064 (replaced) [pdf]
Title: Accelerator-level Parallelism
Comments: 6 pages, 3 figures, & 7 references
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Operating Systems (cs.OS); Performance (cs.PF); Programming Languages (cs.PL)
[259]  arXiv:1909.00426 (replaced) [pdf, other]
Title: Global Entity Disambiguation with Pretrained Contextualized Embeddings of Words and Entities
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[260]  arXiv:1910.02534 (replaced) [pdf, other]
Title: The CEO problem with inter-block memory
Journal-ref: IEEE Transactions on Information Theory, v. 67, No. 12, pp. 7752--7768, Dec. 2021
Subjects: Information Theory (cs.IT); Systems and Control (eess.SY)
[261]  arXiv:1911.06442 (replaced) [pdf, ps, other]
Title: Weak Monotone Comparative Statics
Subjects: Theoretical Economics (econ.TH); Computer Science and Game Theory (cs.GT)
[262]  arXiv:2001.04194 (replaced) [pdf, other]
Title: Cascaded Coded Distributed Computing Schemes Based on Placement Delivery Arrays
Subjects: Information Theory (cs.IT)
[263]  arXiv:2002.09827 (replaced) [pdf, ps, other]
Title: A Formal Treatment of Contract Signature
Comments: This paper has been accepted to IEEE Transactions on Services Computing. Revisions to the previous version include expanded material on smart contracts in Section 9
Subjects: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO); Multiagent Systems (cs.MA)
[264]  arXiv:2003.04175 (replaced) [pdf, other]
Title: Phase Transition Analysis for Covariance Based Massive Random Access with Massive MIMO
Comments: Accepted in IEEE Transactions on Information Theory
Subjects: Information Theory (cs.IT)
[265]  arXiv:2004.09963 (replaced) [pdf, other]
Title: Structural clustering of volatility regimes for dynamic trading strategies
Comments: Accepted manuscript
Subjects: Statistical Finance (q-fin.ST); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG); Risk Management (q-fin.RM)
[266]  arXiv:2006.10679 (replaced) [pdf, other]
Title: REGroup: Rank-aggregating Ensemble of Generative Classifiers for Robust Predictions
Comments: WACV 2022. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[267]  arXiv:2007.04171 (replaced) [pdf, other]
Title: Domain Adaptation with Auxiliary Target Domain-Oriented Classifier
Comments: Fix typos after CVPR 2021. Code is available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[268]  arXiv:2007.05046 (replaced) [pdf, other]
Title: RulePad: Interactive Authoring of Checkable Design Rules
Subjects: Software Engineering (cs.SE)
[269]  arXiv:2007.06226 (replaced) [pdf, other]
Title: AMITE: A Novel Polynomial Expansion for Analyzing Neural Network Nonlinearities
Comments: 13 pages, 2 tables, 9 figures, LaTeX; minor grammar updates, equation numbering, and exposition clarification updates
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Machine Learning (stat.ML)
[270]  arXiv:2009.01826 (replaced) [pdf, other]
Title: A Python Library for Exploratory Data Analysis on Twitter Data based on Tokens and Aggregated Origin-Destination Information
Subjects: Computation and Language (cs.CL)
[271]  arXiv:2009.09379 (replaced) [pdf, other]
Title: Exploring the Generalizability of Spatio-Temporal Crowd Flow Prediction: Meta-Modeling and an Analytic Framework
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[272]  arXiv:2009.09525 (replaced) [pdf, other]
Title: Deep Autoencoders: From Understanding to Generalization Guarantees
Journal-ref: R. Cosentino, R. Balestriero, R. Baraniuk, B. Aazhang, 2nd Annual Conference on Mathematical and Scientific Machine Learning (2021)
Subjects: Machine Learning (cs.LG); Group Theory (math.GR); Machine Learning (stat.ML)
[273]  arXiv:2010.01184 (replaced) [pdf, other]
Title: Effective Sample Size, Dimensionality, and Generalization in Covariate Shift Adaptation
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Methodology (stat.ME)
[274]  arXiv:2010.09145 (replaced) [pdf, other]
Title: MROS: Runtime Adaptation For Robot Control Architectures
Subjects: Robotics (cs.RO); Software Engineering (cs.SE)
[275]  arXiv:2010.11625 (replaced) [src]
Title: One-shot Distributed Algorithm for Generalized Eigenvalue Problem
Comments: The derivation of the bound in the proof of Theorem 1 contains some errors that cannot be resolved at this time.
Subjects: Machine Learning (cs.LG)
[276]  arXiv:2010.15764 (replaced) [pdf, other]
Title: Domain adaptation under structural causal models
Comments: 80 pages, 22 figures, accepted in JMLR
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
[277]  arXiv:2011.04138 (replaced) [pdf, other]
Title: Posture Adjustment for a Wheel-legged Robotic System via Leg Force Control with Prescribed Transient Performance
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
[278]  arXiv:2011.09468 (replaced) [pdf, other]
Title: Gradient Starvation: A Learning Proclivity in Neural Networks
Comments: Proceedings of NeurIPS 2021
Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS); Machine Learning (stat.ML)
[279]  arXiv:2011.14814 (replaced) [pdf, other]
Title: Cost Function Unrolling in Unsupervised Optical Flow
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[280]  arXiv:2012.00968 (replaced) [pdf, other]
Title: Reconfigurable Intelligent Surfaces in Action for Non-Terrestrial Networks
Comments: 7 pages, 6 figures
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)
[281]  arXiv:2012.04515 (replaced) [pdf, other]
Title: Digital Gimbal: End-to-end Deep Image Stabilization with Learnable Exposure Times
Comments: CVPR 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[282]  arXiv:2012.04946 (replaced) [pdf, other]
Title: Generating semantic maps through multidimensional scaling: linguistic applications and theory
Comments: 40 pages; pre-print; accepted for publication in Corpus Linguistics & Linguistic Theory (Nov. 2021)
Subjects: Computation and Language (cs.CL)
[283]  arXiv:2012.08729 (replaced) [pdf, ps, other]
Title: Data Trading with a Monopoly Social Network: Outcomes are Mostly Privacy Welfare Damaging
Comments: Incrementally updated version of the paper in IEEE Networking Letters; this work is based upon results in NBER w26296
Subjects: Social and Information Networks (cs.SI)
[284]  arXiv:2012.12468 (replaced) [pdf, other]
Title: CN-Celeb: multi-genre speaker recognition
Comments: submitted to Speech Communication
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[285]  arXiv:2012.12471 (replaced) [pdf, other]
Title: A Principle Solution for Enroll-Test Mismatch in Speaker Recognition
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[286]  arXiv:2012.13806 (replaced) [pdf, other]
Title: Time-Fluid Field-Based Coordination through Programmable Distributed Schedulers
Subjects: Logic in Computer Science (cs.LO); Distributed, Parallel, and Cluster Computing (cs.DC)
[287]  arXiv:2012.15197 (replaced) [pdf, other]
Title: SemGloVe: Semantic Co-occurrences for GloVe from BERT
Comments: 10 pages, 3 figures, 5 tables
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[288]  arXiv:2101.00304 (replaced) [pdf]
Title: Interval Type-2 Enhanced Possibilistic Fuzzy C-Means Clustering for Gene Expression Data Analysis
Subjects: Genomics (q-bio.GN); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[289]  arXiv:2101.03329 (replaced) [pdf, ps, other]
Title: Coupling a generative model with a discriminative learning framework for speaker verification
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[290]  arXiv:2101.11529 (replaced) [pdf, other]
Title: NTU-X: An Enhanced Large-scale Dataset for Improving Pose-based Recognition of Subtle Human Actions
Comments: First two authors contributed equally. Code repository at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
[291]  arXiv:2101.11948 (replaced) [pdf]
Title: Choice modelling in the age of machine learning - discussion paper
Comments: 40 pages, 2 tables, 0 figures
Subjects: Econometrics (econ.EM); Machine Learning (cs.LG)
[292]  arXiv:2102.03906 (replaced) [pdf, ps, other]
Title: Causal versions of Maximum Entropy and Principle of Insufficient Reason
Authors: Dominik Janzing
Comments: 16 pages
Journal-ref: Journal of Causal Inference (2021)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[293]  arXiv:2102.04525 (replaced) [pdf, other]
Title: Unified Focal loss: Generalising Dice and cross entropy-based losses to handle class imbalanced medical image segmentation
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[294]  arXiv:2102.06462 (replaced) [pdf, other]
Title: Two Sides of the Same Coin: Heterophily and Oversmoothing in Graph Convolutional Neural Networks
Comments: Including 15 page supplement
Subjects: Machine Learning (cs.LG)
[295]  arXiv:2102.08098 (replaced) [pdf, other]
Title: GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training
Comments: NeurIPS 2021, fixing typos
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[296]  arXiv:2102.09159 (replaced) [pdf, other]
Title: Robust and Differentially Private Mean Estimation
Comments: 58 pages, 2 figures, both exponential time and efficient algorithms no longer require a known bound on the true mean
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Information Theory (cs.IT); Machine Learning (stat.ML)
[297]  arXiv:2102.09788 (replaced) [pdf, other]
Title: Sequential- and Parallel- Constrained Max-value Entropy Search via Information Lower Bound
Comments: 39 pages, 10 figures
Subjects: Machine Learning (cs.LG)
[298]  arXiv:2103.00782 (replaced) [pdf, other]
Title: Sparse Activity Detection in Multi-Cell Massive MIMO Exploiting Channel Large-Scale Fading
Comments: This is the final version published in IEEE Transactions on Signal Processing
Subjects: Information Theory (cs.IT)
[299]  arXiv:2103.02434 (replaced) [pdf]
Title: 5G New Radio for Public Safety Mission Critical Communications
Comments: 8 pages, 5 figures, 1 table, Accepted by IEEE Communications Standards Magazine
Subjects: Networking and Internet Architecture (cs.NI)
[300]  arXiv:2103.06342 (replaced) [pdf, other]
Title: Continual Semantic Segmentation via Repulsion-Attraction of Sparse and Disentangled Latent Representations
Comments: CVPR 2021. 22 pages, 10 figures, 11 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
[301]  arXiv:2103.13581 (replaced) [pdf, other]
Title: EfficientTDNN: Efficient Architecture Search for Speaker Recognition
Comments: 13 pages, 12 figures, submitted to TASLP
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI)
[302]  arXiv:2103.14799 (replaced) [pdf, other]
Title: A Survey of Orthogonal Moments for Image Representation: Theory, Implementation, and Evaluation
Comments: ACM Computing Surveys, Volume 55, Issue 1, January 2023, Article No 1, pp 1-35, this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[303]  arXiv:2104.06219 (replaced) [pdf, other]
Title: UAV-ReID: A Benchmark on Unmanned Aerial Vehicle Re-identification
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
[304]  arXiv:2104.08013 (replaced) [pdf, other]
Title: Data-Driven 3D Reconstruction of Dressed Humans From Sparse Views
Comments: 3DV 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[305]  arXiv:2105.05364 (replaced) [pdf, ps, other]
Title: A Hermite Method with a Discontinuity Sensor for Hamilton-Jacobi Equations
Subjects: Numerical Analysis (math.NA)
[306]  arXiv:2105.05458 (replaced) [pdf, other]
Title: Distributionally Robust Graph Learning from Smooth Signals under Moment Uncertainty
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Optimization and Control (math.OC)
[307]  arXiv:2105.05801 (replaced) [pdf, ps, other]
Title: SoK: Practical Foundations for Software Spectre Defenses
Subjects: Cryptography and Security (cs.CR); Programming Languages (cs.PL)
[308]  arXiv:2105.08291 (replaced) [pdf, other]
Title: Independent Asymmetric Embedding for Cascade Prediction on Social Networks
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
[309]  arXiv:2105.10362 (replaced) [pdf, other]
Title: Functionals in the Clouds: An abstract architecture of serverless Cloud-Native Apps
Comments: improved version submitted to CCGrid_2022
Subjects: Computation and Language (cs.CL); Logic in Computer Science (cs.LO)
[310]  arXiv:2105.10598 (replaced) [pdf, other]
Title: Embracing New Techniques in Deep Learning for Estimating Image Memorability
Comments: 27 pages, 15 figures, Presented at the Proceedings of the Vision Sciences Society 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[311]  arXiv:2105.13889 (replaced) [pdf, other]
Title: Equilibrium and non-Equilibrium regimes in the learning of Restricted Boltzmann Machines
Comments: 12 pages, 4 figures. Accepted to NeurIPS 2021 Supplementary Material
Journal-ref: NeurIPS 2021
Subjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Statistical Mechanics (cond-mat.stat-mech)
[312]  arXiv:2105.14024 (replaced) [pdf, other]
Title: Near-Optimal Multi-Perturbation Experimental Design for Causal Structure Learning
Authors: Scott Sussex (1), Andreas Krause (1), Caroline Uhler (2) ((1) Department of Computer Science, ETH Zürich, (2) Laboratory for Information & Decision Systems, Massachusetts Institute of Technology)
Comments: 10 pages, 2 figures, appendix, to be published in 35th Conference on Neural Information Processing Systems (NeurIPS 2021), fixed typos and clarified wording
Subjects: Machine Learning (cs.LG)
[313]  arXiv:2106.00058 (replaced) [pdf, other]
Title: PUDLE: Implicit Acceleration of Dictionary Learning by Backpropagation
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
[314]  arXiv:2106.03719 (replaced) [pdf, other]
Title: Incremental False Negative Detection for Contrastive Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[315]  arXiv:2106.03969 (replaced) [pdf, other]
Title: Chow-Liu++: Optimal Prediction-Centric Learning of Tree Ising Models
Comments: 49 pages, 3 figures, to appear in FOCS'21
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Statistics Theory (math.ST)
[316]  arXiv:2106.04284 (replaced) [pdf, other]
Title: LLAMA: The Low-Level Abstraction For Memory Access
Comments: 40 pages, 10 figures, 11 listings
Subjects: Performance (cs.PF)
[317]  arXiv:2106.05234 (replaced) [pdf, other]
Title: Do Transformers Really Perform Bad for Graph Representation?
Journal-ref: NeurIPS 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[318]  arXiv:2106.06935 (replaced) [pdf, other]
Title: Neural Bellman-Ford Networks: A General Graph Neural Network Framework for Link Prediction
Comments: NeurIPS 2021
Subjects: Machine Learning (cs.LG)
[319]  arXiv:2106.08827 (replaced) [pdf, other]
Title: JRDB-Act: A Large-scale Dataset for Spatio-temporal Action, Social Group and Activity Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[320]  arXiv:2106.10717 (replaced) [pdf, other]
Title: Strategies for convex potential games and an application to decision-theoretic online learning
Authors: Yoav Freund
Subjects: Machine Learning (cs.LG)
[321]  arXiv:2106.10933 (replaced) [pdf, ps, other]
Title: Semi-uniform Input-to-state Stability of Infinite-dimensional Systems
Authors: Masashi Wakaiki
Comments: 27 pages
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
[322]  arXiv:2106.14020 (replaced) [pdf, ps, other]
Title: An Improved Physical ZKP for Nonogram
Comments: This paper has appeared at COCOA 2021
Subjects: Computational Complexity (cs.CC); Cryptography and Security (cs.CR)
[323]  arXiv:2106.14472 (replaced) [pdf, other]
Title: Hyperbolic Busemann Learning with Ideal Prototypes
Comments: accepted at NeurIPS 2021 (35th Conference on Neural Information Processing Systems)
Subjects: Machine Learning (cs.LG)
[324]  arXiv:2106.15715 (replaced) [pdf, other]
Title: No Calm in The Storm: Investigating QAnon Website Relationships
Subjects: Computers and Society (cs.CY); Social and Information Networks (cs.SI)
[325]  arXiv:2107.07653 (replaced) [pdf, other]
Title: TAPEX: Table Pre-training via Learning a Neural SQL Executor
Comments: Work in progress, the project homepage is at this https URL, the code is released at this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[326]  arXiv:2107.09507 (replaced) [pdf]
Title: EEG-based Cross-Subject Driver Drowsiness Recognition with an Interpretable Convolutional Neural Network
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)
[327]  arXiv:2107.10138 (replaced) [pdf, other]
Title: Secure Random Sampling in Differential Privacy
Subjects: Cryptography and Security (cs.CR)
[328]  arXiv:2107.10294 (replaced) [pdf, other]
Title: User-Centric Perspective in Random Access Cell-Free Aided by Spatial Separability
Comments: 14 pages, 8 figures
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
[329]  arXiv:2107.11665 (replaced) [pdf, other]
Title: Clinical Utility of the Automatic Phenotype Annotation in Unstructured Clinical Notes: ICU Use Cases
Comments: Manuscript under review
Subjects: Computation and Language (cs.CL)
[330]  arXiv:2107.14669 (replaced) [pdf, ps, other]
Title: Representing preorders with injective monotones
Subjects: Information Theory (cs.IT)
[331]  arXiv:2108.01115 (replaced) [pdf, other]
Title: Triangular body-cover model of the vocal folds with coordinated activation of the five intrinsic laryngeal muscles
Comments: Primitive version, 54 pages, 8 figures, 4 tables. The present manuscript has been submitted to the Journal of the Acoustical Society of America (JASA)
Subjects: Medical Physics (physics.med-ph); Sound (cs.SD); Audio and Speech Processing (eess.AS); Biological Physics (physics.bio-ph)
[332]  arXiv:2108.01265 (replaced) [pdf, other]
Title: Memorize, Factorize, or be Naïve: Learning Optimal Feature Interaction Methods for CTR Prediction
Comments: Published in ICDE 2022
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR)
[333]  arXiv:2108.01584 (replaced) [pdf, other]
Title: Numerical Solution of Stiff ODEs with Physics-Informed RPNNs
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG)
[334]  arXiv:2108.02040 (replaced) [pdf, other]
Title: Convergence of gradient descent for learning linear neural networks
Comments: Minor changes
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
[335]  arXiv:2108.02446 (replaced) [pdf, other]
Title: Finetuning Pretrained Transformers into Variational Autoencoders
Comments: Proceedings of the Second Workshop on Insights from Negative Results in NLP
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[336]  arXiv:2108.03227 (replaced) [pdf, other]
Title: Bird's-Eye-View Panoptic Segmentation Using Monocular Frontal View Images
Comments: 17 pages, 13 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[337]  arXiv:2108.03980 (replaced) [pdf]
Title: Decentralized Deep Learning for Multi-Access Edge Computing: A Survey on Communication Efficiency and Trustworthiness
Comments: 11 pages, 8 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
[338]  arXiv:2108.08723 (replaced) [pdf, other]
Title: Feature-weighted Stacking for Nonseasonal Time Series Forecasts: A Case Study of the COVID-19 Epidemic Curves
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[339]  arXiv:2108.09640 (replaced) [pdf, other]
Title: DenseTNT: End-to-end Trajectory Prediction from Dense Goal Sets
Comments: Accepted to ICCV 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[340]  arXiv:2108.10218 (replaced) [pdf]
Title: Analysis of Chronic Pain Experiences Based on Online Reports: the RRCP Dataset for quality-of-life assessment
Comments: 10 pages, 5 figures, 3 tables
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Quantitative Methods (q-bio.QM)
[341]  arXiv:2108.10573 (replaced) [pdf, other]
Title: The staircase property: How hierarchical structure can guide deep learning
Comments: 60 pages, accepted to NeurIPS '21
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
[342]  arXiv:2108.13033 (replaced) [pdf, ps, other]
Title: Resource Allocation for Active IRS-Assisted Multiuser Communication Systems
Comments: 3 figures, submitted to Asilomar 2021
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
[343]  arXiv:2109.01303 (replaced) [pdf, other]
Title: Self-supervised Multi-class Pre-training for Unsupervised Anomaly Detection and Segmentation in Medical Images
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[344]  arXiv:2109.01801 (replaced) [pdf, other]
Title: Dual Transfer Learning for Event-based End-task Prediction via Pluggable Event to Image Translation
Comments: ICCV 2021 (updated references in this version)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[345]  arXiv:2109.05735 (replaced) [pdf, ps, other]
Title: The decidability of the genus of regular languages and directed emulators
Comments: 35 pages
Subjects: Formal Languages and Automata Theory (cs.FL)
[346]  arXiv:2109.07346 (replaced) [pdf, other]
Title: Introducing an Abusive Language Classification Framework for Telegram to Investigate the German Hater Community
Subjects: Computation and Language (cs.CL)
[347]  arXiv:2109.08229 (replaced) [pdf, ps, other]
Title: Policy Choice and Best Arm Identification: Asymptotic Analysis of Exploration Sampling
Comments: Submitted to Econometrica
Subjects: Econometrics (econ.EM); Machine Learning (cs.LG); Methodology (stat.ME)
[348]  arXiv:2109.08717 (replaced) [pdf]
Title: The Optimization of the Constant Flow Parallel Micropump Using RBF Neural Network
Comments: Accepted to International Conference on Robotics and Automation Engineering (ICRAE), 2021
Subjects: Machine Learning (cs.LG); Robotics (cs.RO); Systems and Control (eess.SY)
[349]  arXiv:2109.10617 (replaced) [pdf, other]
Title: Solving Large Steiner Tree Problems in Graphs for Cost-Efficient Fiber-To-The-Home Network Expansion
Comments: Accepted at ICAART 2022, 10 pages, 18 figures
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
[350]  arXiv:2109.11159 (replaced) [pdf, other]
Title: OH-Former: Omni-Relational High-Order Transformer for Person Re-Identification
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[351]  arXiv:2109.11939 (replaced) [pdf, other]
Title: Discovering PDEs from Multiple Experiments
Comments: Fourth Workshop on Machine Learning and the Physical Sciences (NeurIPS 2021)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Computational Physics (physics.comp-ph)
[352]  arXiv:2109.12848 (replaced) [pdf, other]
Title: A General Gaussian Heatmap Labeling for Arbitrary-Oriented Object Detection
Comments: 15 pages, 12 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[353]  arXiv:2109.14501 (replaced) [pdf, other]
Title: Towards a theory of out-of-distribution learning
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[354]  arXiv:2109.15169 (replaced) [pdf, other]
Title: Variational learning of quantum ground states on spiking neuromorphic hardware
Comments: 14 pages, 7 figures
Subjects: Quantum Physics (quant-ph); Disordered Systems and Neural Networks (cond-mat.dis-nn); Emerging Technologies (cs.ET); Neural and Evolutionary Computing (cs.NE)
[355]  arXiv:2110.05428 (replaced) [pdf, other]
Title: Learning Temporally Causal Latent Processes from General Temporal Data
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[356]  arXiv:2110.06178 (replaced) [pdf, other]
Title: TAda! Temporally-Adaptive Convolutions for Video Understanding
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[357]  arXiv:2110.06532 (replaced) [pdf, ps, other]
Title: An Efficient Source Model Selection Framework in Model Databases
Subjects: Machine Learning (cs.LG); Databases (cs.DB)
[358]  arXiv:2110.09748 (replaced) [pdf, other]
Title: User Based Design and Evaluation Pipeline for Indoor Airships
Comments: Submitting to ICRA 2022
Subjects: Systems and Control (eess.SY)
[359]  arXiv:2110.11024 (replaced) [pdf, other]
Title: Watermarking Graph Neural Networks based on Backdoor Attacks
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
[360]  arXiv:2110.15292 (replaced) [pdf, other]
Title: Class-wise Thresholding for Detecting Out-of-Distribution Data
Comments: 12 pages, 7 figures, 7 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[361]  arXiv:2110.15742 (replaced) [pdf, other]
Title: Barlow Graph Auto-Encoder for Unsupervised Network Embedding
Subjects: Machine Learning (cs.LG)
[362]  arXiv:2111.03380 (replaced) [pdf, other]
Title: Integral state-feedback control of linear time-varying systems: A performance preserving approach
Subjects: Systems and Control (eess.SY)
[363]  arXiv:2111.03977 (replaced) [pdf, other]
Title: A Virtual Reality Simulation Pipeline for Online Mental Workload Modeling
Comments: 7 pages, 4 figures, and 1 table. Currently under review as a conference paper for IEEE VR 2022. v2: Spelling corrections
Subjects: Human-Computer Interaction (cs.HC); Robotics (cs.RO)
[364]  arXiv:2111.03984 (replaced) [pdf, other]
Title: Proposing an Interactive Audit Pipeline for Visual Privacy Research
Comments: Extended version of IEEE BigData 2021 Short Paper, 14 pages, grammar edits
Subjects: Computers and Society (cs.CY); Machine Learning (cs.LG)
[365]  arXiv:2111.04398 (replaced) [pdf, other]
Title: Sub-realtime simulation of a neuronal network of natural density
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
[366]  arXiv:2111.04805 (replaced) [pdf, other]
Title: Solution to the Non-Monotonicity and Crossing Problems in Quantile Regression
Comments: 8 pages, 14 figures, IEEE conference format
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[367]  arXiv:2111.05070 (replaced) [pdf, other]
Title: Almost Optimal Universal Lower Bound for Learning Causal DAGs with Atomic Interventions
Comments: Added a new upper bound
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Discrete Mathematics (cs.DM); Methodology (stat.ME); Machine Learning (stat.ML)
[368]  arXiv:2111.05177 (replaced) [pdf, other]
Title: On Training Implicit Models
Comments: 24 pages, 4 figures, in The 35th Conference on Neural Information Processing Systems (NeurIPS 2021)
Subjects: Machine Learning (cs.LG)
[369]  arXiv:2111.05225 (replaced) [pdf, ps, other]
Title: Helly systems and certificates in optimization
Subjects: Optimization and Control (math.OC); Computational Complexity (cs.CC)
[370]  arXiv:2111.05792 (replaced) [pdf, other]
Title: HARPO: Learning to Subvert Online Behavioral Advertising
Comments: Accepted at NDSS'22
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
[371]  arXiv:2111.06240 (replaced) [pdf, other]
Title: Improvements to short-term weather prediction with recurrent-convolutional networks
Authors: Jussi Leinonen
Comments: 6 pages, 4 figures. Accepted to the session "Bigdata Cup Challenges: IARAI's Weather4cast Competition" at IEEE Big Data Conference 2021
Subjects: Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph)
[372]  arXiv:2111.06366 (replaced) [pdf, ps, other]
Title: Answer Set Programming Made Easy
Subjects: Artificial Intelligence (cs.AI)
[373]  arXiv:2111.06628 (replaced) [pdf, other]
Title: Learning to Break Deep Perceptual Hashing: The Use Case NeuralHash
Comments: 22 pages, 15 figures, 5 tables
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
[374]  arXiv:2111.08283 (replaced) [pdf, other]
Title: Hierarchical Topometric Representation of 3D Robotic Maps
Comments: Temporarily
Journal-ref: Autonomous Robots (2021): 1-17
Subjects: Robotics (cs.RO)
[375]  arXiv:2111.09298 (replaced) [pdf, other]
Title: SeCGAN: Parallel Conditional Generative Adversarial Networks for Face Editing via Semantic Consistency
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[376]  arXiv:2111.09444 (replaced) [pdf, ps, other]
Title: Hypercontractivity on High Dimensional Expanders: a Local-to-Global Approach for Higher Moments
Comments: New title to distinguish from independent work of Gur, Lifshitz, and Liu
Subjects: Discrete Mathematics (cs.DM); Combinatorics (math.CO)
[377]  arXiv:2111.10434 (replaced) [pdf, other]
Title: Machine Learning for Mechanical Ventilation Control (Extended Abstract)
Comments: Machine Learning for Health (ML4H) at NeurIPS 2021 - Extended Abstract. arXiv admin note: substantial text overlap with arXiv:2102.06779
Subjects: Machine Learning (cs.LG)
[378]  arXiv:2111.10491 (replaced) [pdf, other]
Title: Malicious Selling Strategies in Livestream Shopping: A Case Study of Alibaba's Taobao and ByteDance's Douyin
Subjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
[379]  arXiv:2111.10520 (replaced) [pdf, other]
Title: StylePart: Image-based Shape Part Manipulation
Comments: 10 pages, Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[380]  arXiv:2111.10954 (replaced) [pdf, other]
Title: Generation Drawing/Grinding Trajectory Based on Hierarchical CVAE
Comments: 7 pages, 18 figures
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
[381]  arXiv:2111.11001 (replaced) [pdf]
Title: Easy construction of representations of multivariate functions with low-dimensional terms via Gaussian process regression kernel design
Comments: 8 pages, 1 figure, 2 tables
Subjects: Numerical Analysis (math.NA); Computational Physics (physics.comp-ph); Data Analysis, Statistics and Probability (physics.data-an)
[382]  arXiv:2111.11025 (replaced) [pdf, other]
Title: On immersed boundary kernel functions: a constrained quadratic minimization perspective
Subjects: Numerical Analysis (math.NA)
[383]  arXiv:2111.11087 (replaced) [pdf, ps, other]
Title: Bayesian Inversion of Log-normal Eikonal Equations
Comments: fixed bbl errors on page 3, immediately before and after eq. (2.4)
Subjects: Numerical Analysis (math.NA)
[384]  arXiv:2111.11133 (replaced) [pdf, other]
Title: L-Verse: Bidirectional Generation Between Image and Text
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
[385]  arXiv:2111.11148 (replaced) [pdf, ps, other]
Title: A Novel Randomized XR-Based Preconditioned CholeskyQR Algorithm
Comments: 23 pages, 11 figures, 6 tables
Subjects: Numerical Analysis (math.NA)
[386]  arXiv:2111.11426 (replaced) [pdf, other]
Title: Neural Fields in Visual Computing and Beyond
Comments: Equal advising: Vincent Sitzmann and Srinath Sridhar
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
[387]  arXiv:2111.11534 (replaced) [pdf, other]
Title: Poisoning Attacks to Local Differential Privacy Protocols for Key-Value Data
Comments: To appear in USENIX Security Symposium, 2022
Subjects: Cryptography and Security (cs.CR)
[388]  arXiv:2111.11646 (replaced) [pdf, other]
Title: CytoImageNet: A large-scale pretraining dataset for bioimage transfer learning
Comments: Accepted paper at NeurIPS 2021 Learning Meaningful Representations for Life (LMRL) Workshop
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Quantitative Methods (q-bio.QM)
[389]  arXiv:2111.11655 (replaced) [pdf, other]
Title: Multi-task manifold learning for small sample size datasets
Comments: 22 pages, 15 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[390]  arXiv:2111.11720 (replaced) [pdf]
Title: Gait Identification under Surveillance Environment based on Human Skeleton
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[391]  arXiv:2111.11723 (replaced) [pdf, ps, other]
Title: A new dynamical model for solving rotation averaging problem
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[392]  arXiv:2111.11750 (replaced) [pdf, ps, other]
Title: S-SimCSE: Sampled Sub-networks for Contrastive Learning of Sentence Embedding
Comments: 2 pages
Subjects: Computation and Language (cs.CL)
[393]  arXiv:2111.11843 (replaced) [pdf, other]
Title: U-shape Transformer for Underwater Image Enhancement
Comments: 8 pages, 6 images
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[394]  arXiv:2111.12000 (replaced) [pdf, other]
Title: Virtual element method for elliptic bulk-surface PDEs in three space dimensions
Comments: 24 pages, 4 figures, 1 table. This replacement adds a "Data availability" statement to the manuscript and fixes a capital letter in the bibliography. arXiv admin note: substantial text overlap with arXiv:2002.11748
Subjects: Numerical Analysis (math.NA)
[395]  arXiv:2111.12026 (replaced) [pdf, other]
Title: CINNAMON: A Module for AUTOSAR Secure Onboard Communication
Journal-ref: G. Bella, P. Biondi, G. Costantino and I. Matteucci, "CINNAMON: A Module for AUTOSAR Secure Onboard Communication," 2020 16th European Dependable Computing Conference (EDCC), 2020, pp. 103-110
Subjects: Cryptography and Security (cs.CR)
[396]  arXiv:2111.12027 (replaced) [pdf, other]
Title: Privacy and modern cars through a dual lens
Journal-ref: G. Bella, P. Biondi, M. D. Vincenzi and G. Tudisco, "Privacy and modern cars through a dual lens," 2021 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), 2021, pp. 136-143
Subjects: Cryptography and Security (cs.CR)
[397]  arXiv:2111.12077 (replaced) [pdf, other]
Title: Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields
Comments: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)