    Knowledge and social relatedness shape research portfolio diversification

    Scientific discovery is shaped by scientists’ choices and thus by their career patterns. The increasing knowledge required to work at the frontier of science makes it harder for an individual to embark on unexplored paths. Yet collaborations can reduce learning costs—albeit at the expense of increased coordination costs. In this article, we use data on the publication histories of a very large sample of physicists to measure the effects of knowledge and social relatedness on their diversification strategies. Using bipartite networks, we compute a measure of topic similarity and a measure of social proximity. We find that scientists’ strategies are not random, and that they are significantly affected by both. Knowledge relatedness across topics explains ≈ 10% of logistic regression deviances and social relatedness as much as ≈ 30%, suggesting that science is an eminently social enterprise: when scientists move out of their core specialization, they do so through collaborations. Interestingly, we also find a significant negative interaction between knowledge and social relatedness, suggesting that the farther scientists move from their specialization, the more they rely on collaborations. Our results provide a starting point for broader quantitative analyses of scientific diversification strategies, which could also be extended to the domain of technological innovation—offering insights from a comparative and policy perspective.
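
    The abstract above does not state which topic-similarity measure is computed from the bipartite network. As a minimal sketch, assuming cosine similarity over an author-topic bipartite graph (a common choice, not necessarily the authors'), with hypothetical author and topic names:

```python
from math import sqrt


def topic_similarity(author_topics):
    """Cosine similarity between topic pairs, from an author -> set-of-topics map.

    Assumption: two topics are similar when many of the same authors work on both.
    """
    topics = {t for ts in author_topics.values() for t in ts}
    # Topic degree in the bipartite graph: number of authors active in the topic.
    degree = {t: 0 for t in topics}
    # Number of authors shared by each (alphabetically ordered) topic pair.
    shared = {}
    for ts in author_topics.values():
        ordered = sorted(ts)
        for i, a in enumerate(ordered):
            degree[a] += 1
            for b in ordered[i + 1:]:
                shared[(a, b)] = shared.get((a, b), 0) + 1
    return {
        (a, b): n / sqrt(degree[a] * degree[b])
        for (a, b), n in shared.items()
    }


# Hypothetical toy data, for illustration only.
authors = {
    "alice": {"optics", "plasma"},
    "bob": {"optics", "plasma"},
    "carol": {"optics", "gravity"},
}
sim = topic_similarity(authors)
```

    Topic pairs with no shared author are simply absent from the result, which is equivalent to a similarity of zero.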

    MIP-BOOST: Efficient and Effective L0 Feature Selection for Linear Regression

    Recent advances in mathematical programming have made mixed integer optimization a competitive alternative to popular regularization methods for selecting features in regression problems. The approach exhibits unquestionable foundational appeal and versatility, but also poses important challenges. Here, we propose MIP-BOOST, a revision of standard mixed integer programming feature selection that reduces the computational burden of tuning the critical sparsity bound parameter and improves performance in the presence of feature collinearity and of signals that vary in nature and strength. The final outcome is a more efficient and effective L0 feature selection method for applications of realistic size and complexity, grounded on rigorous cross-validation tuning and exact optimization of the associated mixed integer program. Computational viability and improved performance in realistic scenarios are achieved through three independent but synergistic proposals. Supplementary materials including additional results, pseudocode, and computer code are available online.

    Towards Novel Statistical Methods for Anomaly Detection in Industrial Processes

    This paper presents a novel methodology based on first principles of statistics and statistical learning for anomaly detection in industrial processes and IoT environments. We present a 5-level analytical pipeline that cleans, smooths, and eliminates redundancies from the data, and identifies outliers as well as the features that contribute most to these anomalies. We show how smoothing can make our methodology less sensitive to short-lived anomalies that might be, e.g., due to sensor noise. We validate the methodology on a dataset freely available in the literature. Our results show that we can identify all anomalies in the considered dataset while controlling the number of false positives. This work is the result of a research project co-funded by the Tuscany Region and a leading company in the paper and nonwovens sector. Although the methodology was developed for this domain, we consider here a dataset from a different industrial sector. This shows that our methodology can be generalized to other contexts with similar constraints on limited resources, interpretability, time, and budget.
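
    The effect of smoothing on short-lived anomalies can be illustrated with a small sketch. It assumes a rolling median as the smoother and a fixed deviation threshold as the outlier rule. Both are stand-ins chosen for illustration, since the abstract does not specify the techniques used in the 5-level pipeline, and the signal is synthetic.

```python
from statistics import median


def rolling_median(values, window=5):
    """Centred rolling median; the window is truncated at the series edges."""
    half = window // 2
    return [
        median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


def flag_outliers(values, threshold):
    """Indices whose absolute deviation from the series median exceeds threshold."""
    centre = median(values)
    return [i for i, v in enumerate(values) if abs(v - centre) > threshold]


# Synthetic signal (hypothetical): a flat baseline with two disturbances.
signal = [10.0] * 30
signal[5] = 50.0            # one-sample spike, e.g. sensor noise
for i in range(12, 18):     # sustained shift lasting six samples
    signal[i] = 30.0

raw_flags = flag_outliers(signal, threshold=5.0)
smooth_flags = flag_outliers(rolling_median(signal, window=5), threshold=5.0)
```

    On the raw signal both disturbances are flagged. After smoothing, the one-sample spike disappears while the sustained shift is still detected, which is the trade-off the abstract describes.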

    Process Mining Meets Statistical Model Checking: Towards a Novel Approach to Model Validation and Enhancement

    We propose a novel research line integrating Statistical Model Checking (SMC), a family of simulation-based analysis techniques from quantitative formal methods, with Process Mining (PM), a collection of data-driven process-oriented techniques. SMC and PM are complementary. SMC focuses on performing the right number of simulations to obtain statistically reliable estimates (e.g., the probability of success of an attack). PM focuses on reconstructing a model of a system using logs of its traces. Nevertheless, both approaches aim at providing evidence of issues or guarantees of the system, and at proposing enhancements. We aim at enriching SMC by explaining why it produced specific estimates. This might help, e.g., to identify issues in the model (validation) or to suggest improvements (enhancement). Given that SMC uses statistics to decide the correct number of simulations (or traces), we avoid by construction the complex issue of under-representation of system behavior in the logs, which is crucial to many PM exercises. This work-in-progress paper demonstrates the proposed methodology and its usefulness using a simple example from the security threat modeling domain. We show how PM helps highlight both mistakes in the model and possibilities for improvement.
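
    One standard way to fix "the right number of simulations" is the Chernoff-Hoeffding bound: N ≥ ln(2/δ) / (2ε²) runs guarantee that the estimated probability is within ε of the true value with confidence at least 1 − δ. The sketch below assumes this bound. The abstract does not say which criterion the authors' tooling uses, and `toy_attack` is a hypothetical stand-in for a real model simulation.

```python
from math import ceil, log
import random


def required_simulations(epsilon, delta):
    """Runs needed so that |estimate - p| <= epsilon with probability >= 1 - delta."""
    return ceil(log(2 / delta) / (2 * epsilon ** 2))


def estimate_probability(simulate, epsilon, delta, rng):
    """Monte Carlo estimate of P(simulate(rng) is True) with a fixed sample size."""
    n = required_simulations(epsilon, delta)
    successes = sum(1 for _ in range(n) if simulate(rng))
    return successes / n, n


def toy_attack(rng):
    """Hypothetical model: the attack succeeds with probability 0.3."""
    return rng.random() < 0.3


estimate, n = estimate_probability(toy_attack, epsilon=0.02, delta=0.05,
                                   rng=random.Random(0))
```

    Because the sample size depends only on ε and δ, not on the model, the same number of traces can be handed to a process-mining step afterwards.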

    Investigating Functional Data Analysis for Wearable Physiological Sensor Data in Stress Evaluation

    Measuring stress levels objectively is crucial for personalized health monitoring. While traditional methods require a clinical setting, wearables provide a valuable alternative. In this paper, we approach stress assessment as a regression task, focusing on stress exposure, and evaluate Functional Data Analysis (FDA) to extract richer information from physiological signals. We apply scalar-on-function regression and functional clustering to WESAD, a public dataset that contains signals from wearables and psychometric questionnaires, which we use as ground truth for stress. We compare the results obtained by applying FDA with those achieved by methods using features extracted from signals rather than the signals themselves. The comparison reveals that FDA excels in capturing signal variations and their association with stress, offering new insights into how this association changes with different stressful activities. While non-functional techniques suffice for some analyses, FDA is key to capturing patterns over time that are linked to stress levels.

    Support vector machines categorize the scaling of human grip configurations

    In previous work (Cesari & Newell, 2002), we used a graphical dimensional analysis to show that grip transitions obey the body-scaled relation $K = \ln L_o + \ln M_o/(a + bM_h + cL_h)$, where $L_o$ and $M_o$ are the object's length and mass, and $L_h$ and $M_h$ the length and mass of the grasper's hand. However, the generality of the equation was limited by the ad hoc graphical method that defined the lines for grip separation and by the assumption that these lines be negatively sloped and parallel to one another. This article reports an independent test of this relation by the geometrical and statistical categorization of body-scaled invariants for the transition of human grip configurations through support vector machines (SVMs). The SVM analysis confirmed the fit of linear, negatively sloped, and approximately parallel transition boundaries in the scaling of human grip configuration within a single hand. The SVM analysis has provided a theoretical refinement to the scaling model of human grip configurations.