A topic detection method for high dimensional datasets
Topic extraction from documents has become increasingly important due to its effectiveness in many tasks, including information retrieval, information filtering and the organization of document collections in digital libraries. Topic detection consists of finding the most significant topics within a document corpus. In this paper we explore the adoption of a feature extraction and reduction methodology to highlight the most significant topics within a corpus. We used an approach based on a clustering algorithm (X-means) over the tf-idf matrix computed from the corpus, which describes the frequency of the terms (the columns) that occur in each document (the rows). To extract the topics, we build n binary problems, where n is the number of clusters produced by the unsupervised clustering step, and we apply supervised feature selection to each of them, taking the top features as the topic descriptors. We show the results obtained on two different corpora. Both collections are in Italian: the first consists of documents of the University of Naples Federico II, the second of medical records. Copyright © (2014) by Universita Reggio Calabria & Centro di Competenza (ICT-SUD). All rights reserved.
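The pipeline described in this abstract can be illustrated with a minimal Python sketch. It makes several assumptions that are not in the source: the cluster labels are supplied by hand instead of being produced by X-means, the feature-selection criterion is a simple difference of mean tf-idf weights (the abstract does not name the criterion used), and the function names `tfidf_matrix` and `topic_descriptors` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocab, matrix): one row per document, one column per term.

    Assumes every document contains at least one token.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [(counts[t] / len(toks)) * math.log(n_docs / df[t]) for t in vocab]
        matrix.append(row)
    return vocab, matrix


def topic_descriptors(vocab, matrix, labels, top_k=3):
    """For each cluster, rank terms on the binary problem 'cluster vs. rest'.

    The score is mean tf-idf inside the cluster minus mean tf-idf outside it,
    an assumed stand-in for the supervised feature selection step.
    """
    topics = {}
    for cluster in sorted(set(labels)):
        inside = [row for row, lab in zip(matrix, labels) if lab == cluster]
        outside = [row for row, lab in zip(matrix, labels) if lab != cluster]
        scores = []
        for j, term in enumerate(vocab):
            mean_in = sum(r[j] for r in inside) / len(inside)
            mean_out = sum(r[j] for r in outside) / len(outside) if outside else 0.0
            scores.append((mean_in - mean_out, term))
        # Highest score first; ties are broken alphabetically.
        scores.sort(key=lambda s: (-s[0], s[1]))
        topics[cluster] = [term for _, term in scores[:top_k]]
    return topics


docs = [
    "exam schedule university course",
    "university course enrollment exam",
    "patient diagnosis therapy hospital",
    "hospital patient therapy discharge",
]
labels = [0, 0, 1, 1]  # hypothetical cluster assignments (X-means is not run here)
vocab, matrix = tfidf_matrix(docs)
topics = topic_descriptors(vocab, matrix, labels, top_k=3)
print(topics)
```

In this toy run, each cluster's descriptors are drawn only from terms that occur in that cluster's documents, because terms that occur only in the other cluster receive a negative score.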
An RDF-Based Framework for Semantic Indexing of Web Pages
Managing very large amounts of digital documents efficiently and effectively requires indexes that can capture and express the documents' semantics. In this work, we propose an RDF-based framework for the semantic indexing of web pages that considers the related textual information. In particular, we propose to capture the semantic nature of a given document, commonly expressed in natural language, by retrieving a number of RDF triples, and to index the documents semantically on the basis of the meaning of the triples' elements (i.e., subject, verb, object). Preliminary experiments are reported to evaluate the proposed indexing strategy.
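As a rough illustration of indexing documents by the elements of their triples, the Python sketch below builds an inverted index keyed by role and element. It is a simplification under stated assumptions: the triples are taken as already extracted, matching is by exact string and not by meaning as the abstract proposes, and the names `build_index` and `search` and the sample triples are hypothetical.

```python
from collections import defaultdict


def build_index(doc_triples):
    """Map (role, element) to the set of (doc_id, triple_position) postings."""
    index = defaultdict(set)
    for doc_id, triples in doc_triples.items():
        for pos, (subject, verb, obj) in enumerate(triples):
            index[("subject", subject)].add((doc_id, pos))
            index[("verb", verb)].add((doc_id, pos))
            index[("object", obj)].add((doc_id, pos))
    return index


def search(index, **roles):
    """Return ids of documents holding one triple that meets every constraint."""
    postings = [index.get((role, element), set()) for role, element in roles.items()]
    if not postings:
        return set()
    # Intersecting postings keeps only triples that satisfy all constraints.
    return {doc_id for doc_id, _ in set.intersection(*postings)}


# Hypothetical triples, assumed to have been extracted from two web pages.
doc_triples = {
    "page1": [("aspirin", "treats", "headache")],
    "page2": [("aspirin", "causes", "nausea"), ("ginger", "treats", "nausea")],
}
index = build_index(doc_triples)
by_subject = search(index, subject="aspirin")
by_subject_and_verb = search(index, subject="aspirin", verb="treats")
print(by_subject, by_subject_and_verb)
```

Storing the triple position in each posting means that a query on both subject and verb matches only documents in which a single triple satisfies both constraints.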
Recommendation of Multimedia Objects for Social Network Applications
Recommender systems help people retrieve information that matches their preferences by recommending products or services from a large number of candidates, and support people in making decisions in various contexts: what items to buy, which movie to watch or even whom to invite to their social network. They are especially useful in environments characterized by a vast amount of information, since they can effectively select a small subset of items that appear to fit the user's needs. We present the main points related to recommender systems using multimedia data, especially for social network applications. We also describe, as an example, a framework developed at the University of Naples Federico II. It provides customized recommendations by combining, in an original way, intrinsic features of multimedia objects (low-level and semantic similarity), the past behavior of individual users and the overall behavior of the entire community of users, and eventually considering users' preferences and social interests.
Inter-disciplinary and multi-level study of seismic liquefaction susceptibility in the coastal area of Casamicciola Terme (Ischia Island, Italy)
Combining syntactic and semantic vector space models in the health domain by using a clustering ensemble
The adoption of services for automatic information management is one of the most interesting open problems in various professional and social fields. We focus on the health domain, which is characterized by the production of a huge amount of documents and in which the adoption of innovative systems for information management can significantly improve the tasks performed by the actors involved and the quality of the health services offered. In this work we propose a methodology for automatic document categorization based on unsupervised learning techniques. We extracted both semantic and syntactic features in order to define the vector space models, and proposed the use of a clustering ensemble in order to increase the discriminative power of our approach. Results on real medical records, digitized by means of a state-of-the-art OCR technique, demonstrated the effectiveness of the proposed approach.
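One common way to combine clusterings is a co-association matrix, sketched below in Python. The abstract does not say which ensemble technique was used, so this is an assumed example and not the authors' method; the two input partitions stand in for clusterings obtained from the syntactic and semantic vector space models, and the names `co_association` and `consensus_clusters` are hypothetical.

```python
def co_association(partitions):
    """Fraction of partitions in which each pair of documents shares a cluster."""
    n = len(partitions[0])
    matrix = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += 1.0 / len(partitions)
    return matrix


def consensus_clusters(matrix, threshold=0.5):
    """Group documents linked by co-association strictly above the threshold.

    Consensus clusters are the connected components of the thresholded graph.
    """
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Hypothetical cluster labels for five documents from the two models.
syntactic = [0, 0, 1, 1, 1]
semantic = [0, 0, 0, 1, 1]
matrix = co_association([syntactic, semantic])
consensus = consensus_clusters(matrix, threshold=0.5)
print(consensus)  # [0, 0, 1, 2, 2]
```

Here the third document, on which the two partitions disagree, ends up in its own consensus cluster, while the pairs on which both partitions agree stay together.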
- …
