
    Student research abstract: A key-entity graph for clustering multichannel news

    Social networks (SN) have gained a very important role in the dissemination of news: they allow news to be shared more widely than websites do and provide updates more promptly, publishing several updated versions of the same news item on the same day. The use of a variety of communication media (or channels) stimulates the need for integration and analysis of the huge amount of information published globally. The scale and heterogeneity of these messages make the analysis of news very challenging. This paper presents in-progress research: the definition of a tool for clustering news according to topic, in order to understand whether there are correlations between news published by different newspapers on the same channel or by the same newspaper on different channels. We started by implementing a method [3] based on the Keygraph algorithm [4] to perform multichannel clustering of news according to topic. In this paper, we extend that method [3] by considering entities in addition to keywords to detect topics. We argue that each event can be described by entities such as times, locations, persons, things and topics. Detecting entities in a news item might improve the clustering results.

    Towards sustainability and safety solutions for a smart city (Verso soluzioni di sostenibilità e sicurezza per una città intelligente)

    A smart city is a place where technology is exploited to help public administrations make decisions. Technology can contribute to the management of multiple aspects of everyday life, offering more reliable services to citizens and improving the quality of life. However, technology alone is not enough to make a city smart; suitable methods are needed to analyze the collected data and manage them so as to generate useful information. Examples of smart services include apps that let users reach a destination via the least busy route or find the nearest free parking slot, and apps that suggest better walking paths based on air quality. This thesis focuses on two aspects of smart cities: sustainability and safety. The first aspect concerns studying the impact of vehicular traffic on air quality through the development of a network of traffic and air quality sensors and the implementation of a chain of simulation models. This work is part of the TRAFAIR project, co-financed by the European Union, the first project aimed at monitoring air quality in real time and predicting it on an urban scale in 6 European cities, including Modena. The project required the management of a large amount of heterogeneous data and their integration on a complex and scalable data platform shared by all the project partners. The data platform is a PostgreSQL database, suitable for dealing with spatio-temporal data, that contains more than 60 tables and 435 GB of data (for Modena alone). All the processes of the TRAFAIR pipeline, the dashboards and the mobile apps use the database to get their input data and, where applicable, store their output, generating big data streams.
The simulation models, executed on HPC resources, use the sensor data and provide results in real time (as soon as the sensor data are stored in the database). Therefore, the anomaly detection techniques applied to sensor data must also run in real time and complete quickly. After a careful study of the distribution of the sensor data and the correlation among the measurements, several anomaly detection techniques were implemented and applied to the sensor data. A novel approach for traffic data that employs a flow-speed correlation filter, STL decomposition and IQR analysis was developed. In addition, an innovative framework that implements 3 algorithms for anomaly detection in air quality sensor data was created. The experimental results were compared with those of an LSTM autoencoder, and performance was evaluated after the calibration process. The safety aspect of the smart city is related to a crime analysis project, where crime analysis denotes the analytical processes directed at providing timely and pertinent information to assist the police in crime reduction, prevention, and evaluation. Due to the lack of official data, this project exploits the news articles published in online newspapers. The goal is to categorize the news articles by crime category, geolocate the crime events, detect the date of the event, and identify some features (e.g. what was stolen during a theft). A Java application was developed for the analysis of news articles, the extraction of semantic information through NLP techniques, and the connection of entities to Linked Data resources. Word Embeddings were employed for text categorization, while Question Answering through BERT was used to extract the 5W+1H. News articles referring to the same event were identified by applying cosine similarity to the shingles of the articles' text.
Finally, a tool was developed to show the geolocated events on a map and provide statistics and annual reports. This is the only project in Italy that, starting from news articles, tries to provide analyses of crimes and makes them available through a visualization tool.
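
The duplicate-detection step mentioned in this abstract (cosine similarity over shingles) can be illustrated with a minimal sketch. The abstract does not give the shingle size, the tokenization, or the decision threshold, and the thesis reports a Java application; the word-level shingles, `k = 3`, the `threshold` value and all function names below are assumptions made only for illustration.

```python
from collections import Counter
from math import sqrt


def shingles(text: str, k: int = 3) -> Counter:
    """Count the word-level k-shingles (runs of k consecutive tokens) in a text."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse shingle-count vectors."""
    dot = sum(count * b[shingle] for shingle, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a text shorter than k tokens has no shingles
    return dot / (norm_a * norm_b)


def same_event(text_a: str, text_b: str, k: int = 3, threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: two articles match if similarity >= threshold."""
    return cosine_similarity(shingles(text_a, k), shingles(text_b, k)) >= threshold
```

Shingles keep some word order, so two articles that reuse the same phrases score higher than two that merely share vocabulary.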

    Building an Urban Theft Map by Analyzing Newspaper Crime Reports

    One of the main issues in today's cities is public safety, which can be improved by implementing a systematic analysis for identifying and analyzing patterns and trends in crime, also called crime mapping. Mapping crime allows police analysts to identify crime hot spots; moreover, it increases public confidence and citizen engagement and promotes transparency. This paper focuses on analyzing and mapping thefts reported in online newspapers, using text mining techniques, for an Italian city.

    Document-level event extraction from Italian crime news using minimal data

    Event extraction from unstructured text is a critical task in natural language processing, often requiring substantial annotated data. This study presents an approach to document-level event extraction applied to Italian crime news, utilizing large language models (LLMs) with minimal labeled data. Our method leverages zero-shot prompting and in-context learning to effectively extract relevant event information. We address three key challenges: (1) identifying text spans corresponding to event entities, (2) associating related spans dispersed throughout the text with the same entity, and (3) formatting the extracted data into a structured JSON. The findings are promising: LLMs achieve an F1-score of approximately 60% for detecting event-related text spans, demonstrating their potential even in resource-constrained settings. This work represents a significant advancement in utilizing LLMs for tasks traditionally dependent on extensive data, showing that meaningful results are achievable with minimal data annotation. Additionally, the proposed approach outperforms several baselines, confirming its robustness and adaptability to various event extraction scenarios
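
For reference, the span-level F1-score quoted in this abstract can be computed as in the sketch below. The abstract does not say how predicted and gold spans are matched; exact matching on a `(label, start, end)` triple is an assumption here, and the function name and the span representation are hypothetical.

```python
def span_f1(predicted, gold):
    """F1 over text spans, assuming a prediction counts only on exact match.

    Each span is a (label, start, end) tuple; this matching rule is an
    assumption, not something stated in the abstract.
    """
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With 2 exact matches out of 3 predicted and 4 gold spans, precision is 2/3, recall is 1/2, and F1 is 4/7. A more lenient criterion, such as partial overlap, would give higher scores on the same output.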

    Topic detection in multichannel Italian newspapers

    Nowadays, any person, company or public institution uses different channels to share private or public information with other people (friends, customers, relatives, etc.) or institutions. This context has changed journalism: the major newspapers now report news not just on their own websites but also on several social media such as Twitter or YouTube. The use of multiple communication media stimulates the need for integration and analysis of the content published globally, not just at the level of a single medium. An analysis is needed to achieve a comprehensive overview of the information that reaches end users and of how they consume it. This analysis should identify the main topics in the news flow and reveal the mechanisms by which news is published on different media (e.g. the news timeline). Currently, most of the work in this area is still focused on a single medium, so an analysis across different media (channels) should improve the results of topic detection. This paper shows the application of a graph analytical approach, called Keygraph, to a set of very heterogeneous documents such as the news published on various media. A preliminary evaluation on the news published over a 5-day period was able to identify the main topics within the publications of a single newspaper, and also within the publications of 20 newspapers on several online channels.
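
The graph-based idea behind this abstract can be sketched in a much simplified form: link keywords that co-occur in enough documents, then treat each connected group of keywords as a candidate topic. This is a stand-in, not the Keygraph algorithm itself, which detects communities in the keyword graph with a more refined procedure. The `min_cooccurrence` threshold, the function names and the toy keywords are assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_graph(documents, min_cooccurrence=2):
    """Link two keywords if they appear together in enough documents."""
    pair_counts = Counter()
    for keywords in documents:
        for pair in combinations(sorted(set(keywords)), 2):
            pair_counts[pair] += 1
    graph = defaultdict(set)
    for (a, b), n in pair_counts.items():
        if n >= min_cooccurrence:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Group keywords into candidate topics via depth-first search."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        components.append(component)
    return components


# Toy input: each document is reduced to a list of keywords (hypothetical data).
docs = [["election", "vote", "party"], ["election", "vote"],
        ["match", "goal"], ["match", "goal", "coach"]]
topics = connected_components(cooccurrence_graph(docs))
```

Each document could then be assigned to the topic whose keywords it shares most, which is one simple way to turn keyword groups into document clusters.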

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed
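
The two counting schemes compared in this abstract can be contrasted with a small sketch. It assumes that an author pair is counted once per citing paper whenever both authors appear among the considered authors of that paper's references; the study's exact counting rules are not given in the abstract, and the function name, the `max_authors` parameter and the placeholder author labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations.

    citing_papers: one reference list per citing paper; each reference is the
    ordered author list of a cited work. max_authors=1 gives first-author
    counting; max_authors=5 considers the first five authors of each work.
    """
    counts = Counter()
    for references in citing_papers:
        cited_authors = set()
        for authors in references:
            cited_authors.update(authors[:max_authors])
        # Each unordered author pair is counted once per citing paper.
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts


# Two citing papers with placeholder author labels (hypothetical data).
papers = [
    [["A", "B"], ["C"]],
    [["D", "A"], ["C", "E"]],
]
first_author = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
```

In this toy data, first-author counting sees A and C co-cited once, because A is a second author in the other paper, while first-five counting sees them co-cited twice.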