Recommended from our members
The Sample Complexity of Learning Dynamical Systems
Machine learning has emerged as a leading force in revolutionizing technology, education, and almost every aspect of our lives. Reinforcement learning is a sub-field of machine learning that deals with the effects of dynamic feedback and systems that interact with the environment. In these settings, classic statistical and algorithmic guarantees often do not hold because of non-i.i.d. data, dynamic feedback, and distribution shift.
We develop a framework for single-trajectory learning of nonlinear dynamical systems using mixing arguments. Our main result studies the landscape of empirical risk minimization for learning nonlinear dynamical systems from a single trajectory, and provides a uniform gradient convergence guarantee, which is combined with novel one-point convexity to facilitate the learning of nonlinear dynamical systems. Our proposed framework allows for a non-convex loss landscape, and our sample complexity and statistical error rates are optimal in terms of the trajectory length, the dimensions of the system, and the input/noise strength.
Next, we study the problem of learning bilinear dynamical systems from a single trajectory of the system’s states and inputs. Our main contribution is the application of martingale small-ball arguments to derive learning guarantees for non-mixing bilinear dynamical systems. We further extend our analysis to time-varying dynamical systems by studying the problem of learning non-mixing Markov jump systems. Specifically, we learn the dynamics in each mode and the Markov transition matrix underlying the evolution of the mode switches from a single trajectory of the system’s states, inputs, and modes. Our sample complexity and statistical error rates are optimal in terms of the trajectory length, the dimensions of the system, and the input/noise strength.
Lastly, as a preliminary to the problem of finding the best LTI dynamical system that minimizes the least-squares loss given a single trajectory of an unknown dynamical system, we study the simpler problem of finding the best linear model in high dimensions, given a dataset. Specifically, we analyze the projected gradient descent algorithm for estimating the population minimizer in the finite-sample regime. We show that the nonlinearity of the problem can be treated as uncorrelated noise, and we establish a linear convergence rate and data-dependent estimation error bounds for the projected gradient descent algorithm.
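The projected gradient descent step named in this abstract can be illustrated with a minimal sketch. It assumes a least-squares loss and an l2-ball constraint set; the abstract does not specify the constraint set, and the function names, step size, and radius below are hypothetical choices made for illustration only.

```python
import math


def project_l2_ball(theta, radius):
    """Project theta onto the l2 ball of the given radius (assumed constraint set)."""
    norm = math.sqrt(sum(t * t for t in theta))
    if norm <= radius:
        return list(theta)
    return [t * radius / norm for t in theta]


def projected_gradient_descent(X, y, radius, step, iters):
    """Minimize (1/2n) * ||X theta - y||^2 subject to ||theta||_2 <= radius."""
    n, d = len(X), len(X[0])
    theta = [0.0] * d
    for _ in range(iters):
        # Residuals of the current linear model on every sample.
        residuals = [
            sum(row[j] * theta[j] for j in range(d)) - target
            for row, target in zip(X, y)
        ]
        # Gradient of the averaged least-squares loss.
        grad = [sum(r * row[j] for r, row in zip(residuals, X)) / n for j in range(d)]
        # Gradient step followed by projection back onto the constraint set.
        theta = project_l2_ball([t - step * g for t, g in zip(theta, grad)], radius)
    return theta


# Noiseless toy data generated by theta_true = [1, -2].
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, -2.0, -1.0]
theta_hat = projected_gradient_descent(X, y, radius=10.0, step=0.5, iters=500)
```

With a radius large enough to contain the least-squares solution, the projection never activates and the iterates approach the unconstrained minimizer; a smaller radius keeps every iterate inside the ball.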
The Role of Data Quality and Heterogeneity on the Calibration of Neural Networks
Neural networks have been widely studied and used in recent years due to their high classification accuracy and training efficiency. With the increase of network depth, however, the models become worse calibrated, meaning their outputs no longer reflect the true probabilities. On the other hand, in many applications such as medical diagnosis, facial recognition, and self-driving cars, calibrated output probabilities are of critical importance. Therefore, understanding the cause of miscalibration in deep neural networks is of much concern. The influence of model structure on output calibration has been explored. However, the impact of training dataset quality and heterogeneity, such as dataset size and label noise, remains unclear. In this thesis, the impact of data quality and heterogeneity on output calibration is investigated theoretically and experimentally. Afterwards, the defects of calibration methods that use a single global parameter are discussed. To overcome the calibration issues resulting from dataset heterogeneity, we propose an improved calibration technique that can give better performance.
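As a rough illustration of what calibration means here, the sketch below computes the expected calibration error (ECE), one common calibration metric, and applies a single global temperature to logits. Both are assumptions made for illustration: the abstract names neither the metric nor the single-parameter method it examines, and all function names are hypothetical.

```python
import math


def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy over equal-width bins."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += len(members) / n * abs(avg_conf - accuracy)
    return ece


def softmax_with_temperature(logits, temperature):
    """Softmax of logits divided by one global temperature (assumed calibration knob)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Ten predictions at 90% confidence, of which only five are correct.
overconfident = expected_calibration_error([0.9] * 10, [1] * 5 + [0] * 5)
```

A temperature above 1 flattens the output distribution and lowers the top confidence, which is how a single global parameter can reduce overconfidence without changing the predicted class.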
A-GWR: Fast and Accurate Geospatial Inference via Augmented Geographically Weighted Regression
Geographically Weighted Regression (GWR) is a seminal technique with rich applications in geospatial data analysis. However, it has critical drawbacks in the age of big data in terms of expressiveness, i.e., predictive power, and scalability. This work proposes Augmented GWR (A-GWR), which alleviates these drawbacks. A-GWR adapts a novel technique, Stateless-MGWR or S-MGWR, that enriches the predictive power by allowing distinct bandwidths for individual features of the training data. S-MGWR uses a customized black-box optimization approach for discovering optimal bandwidths in a fast and efficient way. In addition, A-GWR modularly combines S-MGWR or GWR with versatile models such as random forest models. Moreover, A-GWR enables scalability by operating on flexible partitions of the data that can adapt to the computational budget. Our extensive evaluations on various real and synthetic datasets demonstrate the scalability and accuracy benefits of the proposed techniques over state-of-the-art competitors.
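For readers unfamiliar with the baseline, plain GWR fits a separate weighted regression at each location, with sample weights that decay with distance according to a bandwidth. The sketch below shows that idea for a single feature with a Gaussian kernel. It is not A-GWR or S-MGWR; the kernel choice and all function names are assumptions made for illustration.

```python
import math


def gaussian_weight(distance, bandwidth):
    """Gaussian distance-decay kernel (one common choice, assumed here)."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def gwr_fit_at(target, coords, x, y, bandwidth):
    """Weighted least-squares fit of y = intercept + slope * x, local to `target`."""
    weights = [gaussian_weight(math.dist(target, c), bandwidth) for c in coords]
    w_sum = sum(weights)
    # Weighted means of the feature and the response.
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / w_sum
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / w_sum
    # Weighted variance and covariance give the closed-form local slope.
    s_xx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    s_xy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Four observations on a unit square whose response follows y = 1 + 2x everywhere.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
intercept, slope = gwr_fit_at((0.5, 0.5), coords, x, y, bandwidth=1.0)
```

A smaller bandwidth makes each local fit depend on fewer nearby points. A multiscale variant, as the abstract describes for S-MGWR, would allow a different bandwidth per feature, which this single-feature sketch does not attempt.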
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first of a cited work's authors, and a non-traditional type, which takes into account the first five authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
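The two counting schemes compared in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions: a pair of authors is counted once per citing paper, co-authors of the same cited work count as co-cited under the multi-author scheme, and the function name and data layout are hypothetical rather than taken from the study.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author pairs co-cited by the same citing paper.

    citing_papers: one reference list per citing paper; each reference is the
    cited work's author list in byline order.
    max_authors: 1 gives first-author counting, 5 gives first-five-author counting.
    """
    counts = Counter()
    for cited_works in citing_papers:
        authors = set()
        for author_list in cited_works:
            # Keep only the leading authors of each cited work.
            authors.update(author_list[:max_authors])
        # Each unordered pair is counted once per citing paper.
        for pair in combinations(sorted(authors), 2):
            counts[pair] += 1
    return counts


papers = [
    [["Smith", "Lee"], ["Jones"]],
    [["Smith"], ["Lee", "Jones"]],
]
first_author = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
```

In this toy data, first-author counting never pairs Jones with Lee, because Lee and Jones are never first authors of works cited together, while first-five-author counting links all three authors in both citing papers.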
Variations on the Author
“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourse; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.
Understanding Language Models: Optimization, Architecture, and Emergent Abilities
The remarkable success of large language models (LLMs) has led to significant advances across a wide range of tasks. However, their underlying mechanisms remain poorly understood, largely due to the complexity of their architectures (e.g., Transformers, Mamba) and the intricate ways in which predictions depend on data relationships. This thesis aims to uncover the fundamental principles behind the effectiveness of LLMs.
A central focus of this work is the optimization behavior of attention mechanisms, the core computational component of Transformer architectures. Unlike traditional neural networks, attention allows models to capture rich dependencies across sequences through token-to-token interactions. This thesis investigates the underlying mechanism of attention by analyzing its optimization dynamics. We show that optimized attention behaves similarly to a Support Vector Machine (SVM), effectively separating important tokens from less relevant ones using linear constraints on token-pair outer products. These selected tokens contribute most significantly to model performance. We further extend this analysis to next-token prediction, where we theoretically prove that a similar implicit bias holds.
While softmax attention has demonstrated strong empirical performance, its quadratic time and memory complexity limits its efficiency. To address this, recent architectures such as linear attention, state-space models, and gated linear attention have been proposed, achieving near-linear complexity per token via recurrent formulations. In addition to analyzing softmax attention, this thesis studies the optimization landscapes of these efficient alternatives in the context of in-context learning (ICL). We show that they implicitly perform variants of gradient descent over the in-context demonstrations, treating them as training data. We also investigate the role of model depth in leveraging unlabeled data. Our analysis reveals that while single-layer architectures fail to benefit from unlabeled in-context examples, multi-layer attention models can effectively exploit them, highlighting the importance of depth in semi-supervised in-context learning. Beyond architectural differences, this thesis explores optimization behavior across diverse problem settings, including retrieval-augmented generation (RAG), LoRA adaptation, and multitask prompting, providing insights that align more closely with real-world applications.
Finally, we examine the emergent abilities of LLMs through both theoretical and empirical lenses. We formalize ICL as an algorithm learning problem, where the sequence model implicitly constructs a hypothesis function from the input prompt at inference time. We show numerically that sufficiently large, well-pretrained models can implement near-optimal algorithms. We also investigate chain-of-thought (CoT) reasoning, where models decompose complex tasks into simpler subproblems. We propose a two-stage interpretation of CoT: first, filtering and grouping relevant reasoning steps; second, performing in-context learning over each group. This framework explains the benefits of CoT reasoning in enhancing model expressivity and reducing in-context sample complexity.
Overall, this thesis aims to uncover the foundations of LLM effectiveness through the lens of optimization behavior, model architecture, and emergent capabilities.
PhD, Electrical and Computer Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
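The abstract notes that linear attention avoids the quadratic cost of softmax attention through a recurrent formulation. A minimal sketch of that equivalence follows, assuming causal, unnormalized linear attention with no feature map; these simplifications and the function names are illustrative assumptions, not the thesis's models.

```python
def linear_attention_parallel(queries, keys, values):
    """Quadratic form: output_t = sum over s <= t of (q_t . k_s) * v_s."""
    outputs = []
    for t, q in enumerate(queries):
        out = [0.0] * len(values[0])
        for s in range(t + 1):
            score = sum(a * b for a, b in zip(q, keys[s]))
            for j, v in enumerate(values[s]):
                out[j] += score * v
        outputs.append(out)
    return outputs


def linear_attention_recurrent(queries, keys, values):
    """Recurrent form: keep a running state S = sum of outer(k_s, v_s), read it with q_t."""
    d_k, d_v = len(keys[0]), len(values[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for q, k, v in zip(queries, keys, values):
        # Fold the current key-value pair into the state (constant work per token).
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] += k[i] * v[j]
        # The output is the query applied to the accumulated state.
        outputs.append([sum(q[i] * state[i][j] for i in range(d_k)) for j in range(d_v)])
    return outputs


queries = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
keys = [[1.0, 1.0], [0.0, 1.0], [2.0, 0.0]]
values = [[1.0], [2.0], [3.0]]
parallel_out = linear_attention_parallel(queries, keys, values)
recurrent_out = linear_attention_recurrent(queries, keys, values)
```

Both functions return the same outputs, but the recurrent version does a fixed amount of work per token because the state has a fixed size, whereas the parallel version revisits every earlier token.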
Appropriate Similarity Measures for Author Cocitation Analysis
We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.
Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
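One well-known difference between the Pearson correlation and the cosine can be shown with a short sketch: appending authors with whom neither profile has any cocitations (shared zeros) changes the Pearson value but leaves the cosine unchanged. This is offered as an illustration of a general property, not necessarily the argument made in the paper; the profiles and function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two cocitation profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(u, v):
    """Pearson correlation, computed as the cosine of the mean-centered profiles."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine([a - mean_u for a in u], [b - mean_v for b in v])


# Cocitation profiles of two authors with respect to three other authors.
u = [1.0, 2.0, 3.0]
v = [3.0, 2.0, 1.0]

# The same profiles after five more authors are added, none cocited with either.
u_padded = u + [0.0] * 5
v_padded = v + [0.0] * 5
```

Here the Pearson correlation moves from -1 on the original profiles to a positive value on the padded ones, while the cosine is identical in both cases.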
Dispelling the Myths Behind First-author Citation Counts
We conducted a full-scale evaluative citation analysis study of scholars in the XML research field to explore just how much the author rankings produced by different citation counting methods actually differ from each other, and to demonstrate the capability of emerging data and tools on the Web to support more realistic citation counting methods. Our results contest some common arguments for the continued use of first-author citation counts in the evaluation of scholars, such as high correlations between author rankings by first-author citation counts and by other citation counting methods, and the high cost of using more realistic citation counting methods that are not well supported by the ISI databases. It is argued that increasingly available digital full-text research papers make it possible for citation analysis studies to go beyond what the ISI databases have directly supported and to employ more sophisticated methods.
