University Students: Who are they? Results from a cluster analysis based on Cognitive and Personality Traits
Pesi e Metriche nell'Analisi dei Dati Testuali (Weights and Metrics in Textual Data Analysis)
The paper reviews some tools now considered classical in Text Mining procedures and software: Latent Semantic Indexing for dimensionality reduction, and the wide literature on how to weight word importance and how to measure similarities between words and between words and queries. Visualisation is strongly affected by these choices. Here we compare some alternatives from a statistical viewpoint. A corpus consisting of six years of the Italian edition of Le Monde Diplomatique is analysed to show the effects of the different weighting systems, together with the potential of Textual Data Analysis for summarising and representing newspaper information.
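As a rough illustration of the weighting and similarity choices this abstract refers to, the sketch below applies one common TF-IDF variant (raw term frequency times log(N/df)) and cosine similarity to three toy documents. The documents, the function names and the specific weighting formula are assumptions made for illustration; they are not taken from the paper, which compares several alternatives.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Weight each term by raw frequency times log(N / document frequency)."""
    n_docs = len(docs)
    tokenized = [d.lower().split() for d in docs]
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vocab = sorted(df)
    idf = {term: math.log(n_docs / df[term]) for term in vocab}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append([tf[term] * idf[term] for term in vocab])
    return vocab, idf, vectors

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical toy corpus, not the Le Monde Diplomatique data.
docs = [
    "economy crisis europe",
    "europe politics election",
    "football match result",
]
vocab, idf, vectors = tf_idf(docs)
sim_01 = cosine(vectors[0], vectors[1])  # documents sharing "europe"
sim_02 = cosine(vectors[0], vectors[2])  # documents with no shared term
```

Changing the weighting (for example, using binary term presence instead of raw frequency) changes the similarities, and therefore any map built from them, which is the effect the abstract describes.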
Visualization Techniques in Non Symmetrical Relationships
Many Text Retrieval strategies are based on Latent Semantic Indexing and its variations, which consider different weighting systems for words and documents. Correspondence Analysis and LSI share the same basic algebraic tool, i.e. Singular Value Decomposition and its generalisations, which correspond to different ways of measuring the importance of each element, both in determining and in representing similarities between documents and words. The aim of the paper is to propose a peculiar factorial approach for better visualising the relations between textual data and documents, compared with classical Correspondence Analysis. Here we consider a TF/IDF indexing scheme, mainly developed for Text Retrieval, in a textual data analysis context.
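To make the contrast between the two weighting systems concrete, the following sketch computes, on a small hypothetical lexical table, the chi-square distance between document profiles used by classical Correspondence Analysis (terms weighted by the inverse of their column mass) next to the IDF weight log(N/df) used in Text Retrieval. The table values and names are assumptions for illustration only, and this is not the factorial approach proposed in the paper.

```python
import math

# Hypothetical lexical table: rows are documents, columns are terms.
table = [
    [4, 2, 0, 2],
    [2, 1, 0, 1],
    [0, 1, 5, 1],
]
n_rows, n_cols = len(table), len(table[0])
grand_total = sum(sum(row) for row in table)

# Column masses: share of all occurrences taken by each term.
col_mass = [sum(row[j] for row in table) / grand_total for j in range(n_cols)]

# Row profiles: each document's term distribution (rows sum to 1).
profiles = [[x / sum(row) for x in row] for row in table]

def chi2_distance(i, k):
    """Chi-square distance between two row profiles, as in classical CA."""
    return math.sqrt(sum(
        (profiles[i][j] - profiles[k][j]) ** 2 / col_mass[j]
        for j in range(n_cols)
    ))

# IDF weights: log(N / df), where df counts documents containing the term.
doc_freq = [sum(1 for row in table if row[j] > 0) for j in range(n_cols)]
idf = [math.log(n_rows / df) for df in doc_freq]

d_01 = chi2_distance(0, 1)  # documents 0 and 1 have proportional counts
d_02 = chi2_distance(0, 2)
```

Both schemes down-weight widespread terms, but in different ways: the chi-square metric depends on how many occurrences a term has overall, while IDF depends only on how many documents contain it, so a term present in every document gets an IDF weight of zero.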
What volunteers do? A textual analysis of voluntary activities in the Italian context
The complex phenomenon of volunteering has mainly been analyzed in the economic literature with respect to its "economic value added", i.e. the capability of this kind of activity to increase the level of productivity of some specific goods or services. In this paper, the point of view switches and voluntary organizations are analyzed as places of job market innovation, where new jobs arise and where people acquire new skills. Thus, volunteering can be thought of as a "social innovation" factor. In order to analyze the contents of voluntary work we use data from the Istat survey "Multiscopo, Aspetti della vita quotidiana" (Multipurpose survey, aspects of daily life) for the year 2013. In our textual analysis, we use the information in the open-ended answers in which people describe the tasks they perform individually as volunteers. After stemming, lemmatization, and cleaning, the data were analyzed by means of Community Detection based on Semantic Network Analysis in order to discover patterns of jobs, and through Correspondence Analysis on Generalized Aggregated Lexical Tables (CA-GALT) in order to discover profiles of volunteers. In particular, we look for differences by gender, age, educational level, region of residence and type of voluntary association.
BMS: An improved Dunn index for Document Clustering validation
Document Clustering aims at organizing a large quantity of unlabeled documents into a smaller number of meaningful and coherent clusters. One of the main unsolved problems in the literature is the lack of a reliable methodology to evaluate the results, although a wide variety of validation measures has been proposed. Validation measures are often unsatisfactory with numerical data, and perform even worse with textual data. Our attention focuses on the use of cosine similarity in the clustering process. A new measure based on the same criterion is proposed here. The effectiveness of the proposal is shown by an extensive comparative study.
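For orientation, the classical Dunn index (smallest distance between clusters divided by the largest cluster diameter) can be computed with cosine distance as in the sketch below. This is the standard index, not the BMS measure proposed in the paper, whose definition is not given in the abstract; the vectors, labels and function names are hypothetical.

```python
import math
from itertools import combinations

def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def dunn_index(vectors, labels):
    """Classical Dunn index: min inter-cluster distance / max diameter."""
    clusters = {}
    for vec, lab in zip(vectors, labels):
        clusters.setdefault(lab, []).append(vec)
    # Diameter: largest pairwise distance inside any single cluster.
    max_diameter = max(
        (cosine_distance(a, b)
         for members in clusters.values()
         for a, b in combinations(members, 2)),
        default=0.0,
    )
    # Separation: smallest distance between points of different clusters.
    min_separation = min(
        cosine_distance(a, b)
        for c1, c2 in combinations(clusters.values(), 2)
        for a in c1 for b in c2
    )
    return min_separation / max_diameter if max_diameter > 0 else math.inf

# Hypothetical document vectors forming two obvious groups.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
good = dunn_index(vectors, [0, 0, 1, 1])  # groups similar documents
bad = dunn_index(vectors, [0, 1, 0, 1])   # splits similar documents apart
```

Higher values indicate compact, well-separated clusters, so the partition that keeps similar vectors together scores higher than the one that separates them.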
Text Analytics
Focusing on methodologies, applications and challenges of textual data analysis and related fields, this book gathers selected and peer-reviewed contributions presented at the 14th International Conference on Statistical Analysis of Textual Data (JADT 2018), held in Rome, Italy, on June 12-15, 2018. Statistical analysis of textual data is a multidisciplinary field of research that has been mainly fostered by statistics, linguistics, mathematics and computer science. The respective sections of the book focus on techniques, methods and models for text analytics, dictionaries and specific languages, multilingual text analysis, and the applications of text analytics. The interdisciplinary contributions cover topics including text mining, text analytics, network text analysis, information extraction, sentiment analysis, web mining, social media analysis, corpus and quantitative linguistics, statistical and computational methods, and textual data in sociology, psychology, politics, law and marketing.
JADT'18 PROCEEDING OF THE 14TH INTERNATIONAL CONFERENCE ON STATISTICAL ANALYSIS OF TEXTUAL DATA
