
    Poster: LLMs for online customer reviews analysis: Oracles or tools? Experiments with GPT 3.5

    Generative Large Language Models (LLMs), pre-trained on a huge amount of human-authored text, are showing emergent capabilities in understanding and accomplishing a variety of NLP and text comprehension tasks. Recently, interest has been growing in understanding to what extent LLMs can support humans, or even replace them, in accomplishing non-trivial data analysis tasks. In this paper, we explore the capabilities of OpenAI's GPT in online customer review analysis, a multi-step analysis activity which typically involves both human knowledge and predictive data analysis techniques (e.g. topic extraction, aspect-based sentiment analysis). We explore different interaction modalities where the LLM covers all or part of the analysis process, and provide a preliminary evaluation against human annotation outcomes.

    Leveraging n-gram neural embeddings to improve deep learning DGA detection

    Several families of malware are based on the need to establish a connection with a Command and Control (C&C) server. In addition, to avoid detection, these servers "hide" behind domain names that are periodically changed according to a specific Domain Generation Algorithm (DGA). Hence, the malware that has infected a particular host uses the same DGA to make DNS queries in order to establish a connection with the C&C server. The identification of "malicious" domain names used in DNS queries is therefore crucial for detecting such malware. For this purpose, various machine learning techniques have been used; recently, deep learning techniques in particular have proved especially effective. However, to get good results, these techniques require very large labelled training datasets, and the construction of such datasets, especially with regard to the collection of malicious domain names, is a very difficult and non-scalable task. In this paper, therefore, we explore the possibility of exploiting unsupervised character n-gram embeddings to improve the performance of a Deep Learning DGA classifier. Embeddings are trained using a large dataset of benign names, opening up the possibility of using a small classifier training dataset requiring a small number of malicious names. A series of experiments, which use the same embedding for classifiers trained with datasets of increasing size, are then presented. These experiments show that the embedding is particularly effective for classifiers trained with small datasets containing a small number of malicious names.
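As a rough illustration of the character n-gram idea in the abstract above, the sketch below splits domain names into overlapping n-grams and maps them to integer indices that an embedding layer could consume. The function names, the `^`/`$` boundary markers, and the choice of n are assumptions made for this example; the abstract does not describe the authors' tokenization or embedding training, neither of which is reproduced here.

```python
from collections import Counter


def char_ngrams(domain, n=3):
    """Return the overlapping character n-grams of a domain name.

    The '^' and '$' boundary markers are an assumption of this sketch.
    """
    padded = "^" + domain.lower().strip(".") + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def build_vocabulary(domains, n=3, min_count=1):
    """Map each sufficiently frequent n-gram to an index; 0 is reserved for unknowns."""
    counts = Counter(g for d in domains for g in char_ngrams(d, n))
    kept = sorted(g for g, c in counts.items() if c >= min_count)
    return {g: i + 1 for i, g in enumerate(kept)}


def encode(domain, vocab, n=3):
    """Turn a domain name into the index sequence an embedding layer would read."""
    return [vocab.get(g, 0) for g in char_ngrams(domain, n)]
```

In the setting the abstract describes, the vocabulary would be built from benign names only, so n-grams that appear only in algorithmically generated names fall back to the unknown index.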

    Measurement of stride time by machine learning: sensitivity analysis for the simplification of the experimental protocol

    Limited stride-time variability is considered a marker of safe walking. Thus, the measurement of stride time provides meaningful information for gait analysis. Machine-learning (ML) techniques have proven useful to this aim, even though the amount of data provided as input influences the computation process. The present study aims to analyze the sensitivity to the experimental protocol (number of sensors and signals) of the performance of a stride-time measurement system based on ML interpretation of surface EMG (sEMG) signals. To this purpose, sEMG signals from ten leg muscles of 30 volunteers are used to train a single-layer neural network. Five experimental protocols (from five to one sEMG sensors per leg) are comparatively tested. Results show that reducing the sEMG-protocol complexity (fewer sensors used) decreases prediction performance. Based on the test results, this study proposes an experimental protocol composed of two sEMG sensors per leg (over gastrocnemius lateralis and tibialis anterior) as the best compromise between the need for a simplified experimental set-up and the need for high performance (F1-score ± SD = 99.0 ± 1.2 %; mean absolute error, MAE ± SD = 17.9 ± 4.3 ms). The use of only two sEMG probes could have a great impact on gait analysis, improving patient comfort and reducing clinical costs and time consumption. A further reduction of the experimental protocol to a single muscle (gastrocnemius lateralis) is feasible, at the cost of a less accurate prediction of stride time.
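The MAE figure quoted above compares predicted stride times with reference ones. A minimal sketch of that arithmetic follows, assuming stride time is taken as the interval between consecutive heel strikes of the same foot and that predicted and reference strides are already paired one to one; both assumptions, and the function names, belong to this example rather than to the study.

```python
def stride_times(heel_strikes_ms):
    """Stride time = interval between consecutive heel strikes of the same foot."""
    return [b - a for a, b in zip(heel_strikes_ms, heel_strikes_ms[1:])]


def mean_absolute_error(predicted, reference):
    """Average absolute difference between paired predicted and reference values."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


# Made-up heel-strike instants in milliseconds, for illustration only.
reference = stride_times([0, 1000, 2050, 3100])
predicted = stride_times([10, 1000, 2070, 3100])
mae_ms = mean_absolute_error(predicted, reference)
```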

    Influence of EMG-signal processing and experimental set-up on prediction of gait events by neural network

    Machine-learning approaches have been satisfactorily implemented for classifying and assessing gait events from surface electromyographic (sEMG) signals alone during walking. However, it is acknowledged that the choice of sEMG-processing type may affect the reliability of methodologies based on it. Analogously, the number of sEMG signals involved in the machine-learning procedure could influence the classification process. The aim of this study is to quantify the impact of different EMG-signal-processing specifications and/or different complexity of the experimental sEMG protocol (different number of sEMG sensors) on the performance of a neural-network-based approach for binary classification of gait phases and prediction of gait-event timing. To this purpose, sEMG signals are collected from eight leg muscles in about 10,000 strides from 23 healthy adults during walking and then fed to a multi-layer perceptron model. Four different signal-processing approaches are tested and five experimental set-ups (from four to one sEMG sensors per leg) are compared. Results indicate that both the choice of sEMG processing and the reduction of sEMG-protocol complexity affect classification/prediction performance. Moreover, the study succeeds in the double goal of identifying the linear envelope as the sEMG-processing type which reaches the best neural-network performance (classification accuracy of 93.4 ± 2.3 %; mean absolute error of 21.6 ± 7.0 ms and 38.1 ± 15.2 ms for heel-strike and toe-off prediction, respectively) and of quantifying the progressive deterioration of classification/prediction performance as the number of sensors is reduced (from 93.4 ± 2.3 % to 79.9 ± 6.1 % for classification accuracy). These findings could help clinics choose the most suitable approach, balancing technical performance, patient comfort, and clinical needs.
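The linear envelope named above is commonly obtained by full-wave rectifying the sEMG signal and then low-pass filtering it. The sketch below follows that general recipe with a centred moving average standing in for the low-pass filter; the filter type, the window length, and the function name are assumptions of this example, since the abstract does not state the processing parameters used in the study.

```python
def linear_envelope(semg, window=5):
    """Full-wave rectify the signal, then smooth it with a centred moving average.

    The moving average is a simple stand-in for a low-pass filter; near the
    edges the window is truncated to the samples that are available.
    """
    if window < 1:
        raise ValueError("window must be at least 1 sample")
    rectified = [abs(x) for x in semg]
    half = window // 2
    envelope = []
    for i in range(len(rectified)):
        lo = max(0, i - half)
        hi = min(len(rectified), i + half + 1)
        envelope.append(sum(rectified[lo:hi]) / (hi - lo))
    return envelope
```

A longer window gives a smoother envelope but blurs the onset and offset of muscle activity, which is the trade-off any low-pass cut-off choice faces.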

    Social media analytics system for action inspection on social networks

    Social networks are increasingly used for discussing all kinds of topics, including those related to politics, serving as a virtual arena. Consequently, analysing online conversations, for example, to predict election outcomes, is becoming a popular and challenging research area. On social networking sites, citizens express themselves spontaneously regarding political topics, often driven by specific events in social life. Real-time analysis of social media can provide valuable feedback and insights to both politicians and news agencies. In this paper, we discuss the design and implementation of a system for tracking and analysing social media. The SocMINT system provides an easy-to-use, visual dashboard to monitor the discussion on specific topics, to capture trends in communities and, by iteratively applying multidimensional data analysis and filtering, to deeply analyse posts and influencers. SocMINT aggregates data from multiple social sources and performs sentiment analysis on textual, visual and mixed content via a specifically designed neural network architecture. The system was applied in a real context by administrative staff of a political party to effectively analyse candidates’ political communication on Facebook, Instagram and Twitter and the related online community reactions and discussion. In the paper, we report on this real-world case study, showing how the system meaningfully captures trends in public opinion, comparing the main KPIs provided by SocMINT with the outcomes of traditional polls

    A Comparative Analysis of Datasets for Intrusion Detection in Software-Defined Networks

    Software-Defined Networking (SDN) offers centralized management, programmability, flexibility and scalability but carries significant security risks, especially DDoS attacks against the SDN controller, which threaten network availability. Machine learning (ML) and deep learning (DL) show promise in mitigating these threats, but their success depends on the quality of the available datasets. Existing SDN datasets often focus narrowly on specific DDoS scenarios or synthetic environments, limiting their real-world applicability. This paper analyzes SDN threat datasets, evaluating their methodologies, features and ML applications. It highlights strengths such as realistic traffic emulation and accessibility, alongside limitations such as narrow attack coverage and synthetic biases. A roadmap is proposed to guide the generation of new datasets, emphasizing diverse attacks, richer features, realistic augmentation and public access to enable robust ML/DL-based SDN security solutions.

    A deep learning approach to EMG-based classification of gait phases during level ground walking

    Correctly identifying gait phases is a prerequisite for a spatial/temporal characterization of muscular recruitment during walking. The literature reports few machine-learning-based approaches for gait-phase classification from the surface electromyographic (sEMG) signal during treadmill walking. To our knowledge, no attempts have been made during ground walking in daily-life conditions. A methodology for classification of stance/swing and prediction of the foot-floor-contact signal during ground walking in conditions similar to daily life is proposed here, based on the application of Multi-Layer Perceptron models to the sEMG signal alone. sEMG signals were acquired from eight lower-limb muscles in about 13,000 strides from 23 healthy adults during ground walking, following an eight-shaped path including natural deceleration, reversing, curve, and acceleration. Classification and prediction accuracy were tested vs. the ground truth, represented by the basographic signal provided by three foot-switches, through samples not used in the learning phase, coming from both the same group of subjects used to generate the learning set (LS-Test) and brand-new subjects (unlearned, US). Results showed an average classification accuracy (± SD) over 23 folds of 94.9 ± 0.3 for LS-test and 93.4 ± 2.3 for US. Prediction of the foot-floor-contact signal was quantified in terms of timing of heel strike and toe off: the mean (over ten folds) absolute difference between predictions and footswitch data for UL was 15 ± 17 ms and 36 ± 22 ms for heel strike and toe off, respectively. The performance achieved by the proposed method suggests that it could be successfully used to automatically classify gait phases and predict the foot-floor-contact signal from sEMG signals during ground walking in daily-life conditions.
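Once each sample has been classified as stance or swing, heel-strike and toe-off instants can be read off the transitions of the resulting binary sequence. The sketch below shows one way to do that, assuming stance is coded as 1, swing as 0, and a fixed sampling period; the coding, the sampling period, and the function name are illustrative choices, not details taken from the abstract.

```python
def gait_events(phases, sample_period_ms=1.0):
    """Locate gait events in a binary stance (1) / swing (0) sequence.

    A 0 -> 1 transition is read as a heel strike and a 1 -> 0 transition
    as a toe off; event times are returned in milliseconds.
    """
    heel_strikes, toe_offs = [], []
    for i in range(1, len(phases)):
        if phases[i - 1] == 0 and phases[i] == 1:
            heel_strikes.append(i * sample_period_ms)
        elif phases[i - 1] == 1 and phases[i] == 0:
            toe_offs.append(i * sample_period_ms)
    return heel_strikes, toe_offs
```

Applying the same function to the predicted sequence and to the foot-switch signal gives two sets of event times whose absolute differences can then be averaged.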

    Feature selection in ML-based SDN intrusion detection system

    Within the branch of Software-Defined Networking (SDN), research in Cyber Security has underscored the pressing need to combat cyber-attacks. These crimes include the unauthorized access and manipulation of critical data, jeopardizing user confidentiality, authenticity, and system integrity. To address these challenges, the deployment of Intrusion Detection Systems (IDS) has become paramount. These systems play a crucial role in safeguarding both the SDN infrastructure and its users. IDSs operate much like classification systems, making them suitable for the application of machine learning techniques in identifying intrusions. These techniques rely on labeled datasets to train the system to differentiate between benign and malicious events based on various features. Once trained, the system can categorize new events as benign or malicious. Therefore, identifying which features are relevant for classification purposes is crucial. In the current literature, few studies have focused on the effectiveness of IDSs applied to SDNs. The performance evaluation of IDSs based on machine learning techniques within SDN environments involves the development of specialized datasets, comprising network traffic features essential for discerning attack patterns. Moreover, as the landscape of network attacks within SDN evolves, there arises a need for continuously updated datasets to evaluate IDS effectiveness. This paper aims to investigate which features are relevant to detect the most common attack types in an SDN. To do this, labeled datasets of network traffic in an SDN must be available. Unfortunately, to the best of our knowledge, there is only one publicly available dataset for SDN traffic: InSDN. In this paper, we present the result of a feature selection process on the InSDN dataset, based on the SHAP toolset, aimed at identifying the most relevant features for different types of attacks. We also compare the performance of different classification algorithms trained on both the full dataset and the reduced one, showing that, for many attack types, the classifiers' performance is comparable.

    Prediction of stride duration by neural-network interpretation of surface EMG signals

    Measuring stride duration as a marker of regular walking is a relevant issue in modern gait analysis. The present project was designed to test the hypothesis that an artificial-neural-network approach is able to provide a reliable prediction of stride, stance, and swing duration, based on the analysis of only EMG signals acquired during able-bodied walking. To this end, surface EMG (sEMG) signals from ten leg muscles of 23 adult subjects are used to train a multi-layer perceptron model. Classifier performance is tested vs. the gold standard, represented by foot-floor-contact signals measured by means of three footswitches positioned under each foot. Outcomes indicate an accurate prediction of stride duration (mean absolute error, MAE ± SD = 18.1 ± 6.2 ms), stance duration (MAE ± SD = 29.2 ± 10.3 ms), and swing duration (MAE ± SD = 28.8 ± 9.6 ms), at least comparable to those reported in IMU-based studies. A significant contribution of this approach is that, after training the neural network, only sEMG signals (and no further data) acquired during patient walking are needed to obtain the gait durations. This helps reduce the cost of the test, the clinical time required, and the invasiveness of the instrumentation worn by the patient, making this approach especially suitable for the clinical analysis of neuromuscular disorders where the evaluation of muscular recruitment is recommended.

    A General Approach to Uniformly Handle Different String Metrics Based on Heterogeneous Alphabets

    In the last few years, we have witnessed a great increase in the usage of strings in the most disparate areas. In the meantime, the development of the Internet has brought the necessity of managing strings from very different contexts, possibly using different alphabets. This issue is not addressed by the numerous string comparison metrics previously proposed in the literature. In this paper, we aim to provide a contribution in this context. First, we propose an approach to measure the similarity of strings based on different alphabets. Then we show that our approach can be adapted to several classic string comparison metrics and that each specialization can address completely different issues.
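To make the problem concrete, the sketch below shows one simple way a classic metric, the Levenshtein edit distance, can be parameterized by a symbol-matching rule so that strings over different alphabets become comparable. This is only an illustration of the general idea: the abstract does not describe the authors' construction, and the `matches` predicate and the Latin-to-Greek mapping are hypothetical.

```python
def generalized_edit_distance(s, t, matches):
    """Levenshtein distance where matches(a, b) decides whether two symbols agree.

    With matches = lambda a, b: a == b this reduces to the classic metric.
    """
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            cost = 0 if matches(a, b) else 1
            curr.append(min(prev[j] + 1,          # delete a from s
                            curr[j - 1] + 1,      # insert b into s
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical correspondence between a Latin and a Greek alphabet.
latin_to_greek = {"a": "α", "b": "β", "g": "γ"}


def cross_alphabet_match(a, b):
    """Treat a Latin symbol and a Greek symbol as equal if the mapping pairs them."""
    return latin_to_greek.get(a) == b
```

Keeping the matching rule separate from the dynamic-programming recurrence means the same routine serves both the single-alphabet and the cross-alphabet case.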