Search CORE

1,720,994 research outputs found

Threshold optimisation for multi-label classifiers

Author: FUMERA GIORGIO
ROLI FABIO
Pillai I
Publication venue
Publication date: 01/01/2013
Field of study

Many multi-label classifiers provide a real-valued score for each class. A well known design approach consists of tuning the corresponding decision thresholds by optimising the performance measure of interest. We address two open issues related to the optimisation of the widely used F measure and precision–recall (P–R) curve, with respect to the class-related decision thresholds, on a given data set. (i) We derive properties of the micro-averaged F, which allow its global maximum to be found by an optimisation strategy with a low computational cost. So far, only a suboptimal threshold selection rule and a greedy algorithm with no optimality guarantee were known. (ii) We rigorously define the macro- and micro-P–R curves, analyse a previously suggested strategy for computing them, based on maximising F, and develop two possible implementations, which can be also exploited for optimising related performance measures. We evaluate our algorithms on five data sets related to three different application domains

Crossref

Archivio istituzionale della ricerca - Università di Cagliari

Archivio istituzionale della ricerca - Università di Genova

Spam filtering based on the analysis of text information embedded into images

Author: FUMERA GIORGIO
ROLI FABIO
Pillai I
Publication venue
Publication date: 01/01/2006
Field of study

In recent years anti-spam filters have become necessary tools for Internet service providers to face up to the continuously growing spam phenomenon. Current server-side anti-spam filters are made up of several modules aimed at detecting different features of spam e-mails. In particular, text categorisation techniques have been investigated by researchers for the design of modules for the analysis of the semantic content of e-mails, due to their potentially higher generalisation capability with respect to manually derived classification rules used in current server-side filters. However, very recently spammers introduced a new trick consisting of embedding the spam message into attached images, which can make all current techniques based on the analysis of digital text in the subject and body fields of e-mails ineffective. In this paper we propose an approach to anti-spam filtering which exploits the text information embedded into images sent as attachments. Our approach is based on the application of state-of-the-art text categorisation techniques to the analysis of text extracted by OCR tools from images attached to e-mails. The effectiveness of the proposed approach is experimentally evaluated on two large corpora of spam e-mails

Archivio istituzionale della ricerca - Università di Genova

Multi-label classification with a reject option

Author: FUMERA GIORGIO
ROLI FABIO
Pillai I
Publication venue
Publication date: 01/01/2013
Field of study

We consider multi-label classification problems in application scenarios where classifier accuracy is not satisfactory, but manual annotation is too costly. In single-label problems, a well known solution consists of using a reject option, i.e., allowing a classifier to withhold unreliable decisions, leaving them (and only them) to human operators. We argue that this solution can be exploited also in multi-label problems. However, the current theoretical framework for classification with a reject option applies only to single-label problems. We thus develop a specific framework for multi-label ones. In particular, we extend multi-label accuracy measures to take into account rejections, and define manual annotation cost as a cost function. We then formalise the goal of attaining a desired trade-off between classifier accuracy on non-rejected decisions, and the cost of manually handling rejected decisions, as a constrained optimisation problem. We finally develop two possible implementations of our framework, tailored to the widely used F accuracy measure, and to the only cost models proposed so far for multi- label annotation tasks, and experimentally evaluate them on five application domains

Crossref

Archivio istituzionale della ricerca - Università di Cagliari

Archivio istituzionale della ricerca - Università di Genova

F-measure optimisation in multi-label classifiers

Author: FUMERA GIORGIO
ROLI FABIO
Pillai I
Publication venue
Publication date: 01/01/2012
Field of study

When a multi-label classifier outputs a real-valued score for each class, a well known design strategy consists of tuning the corresponding decision thresholds by optimising the performance measure of interest on validation data. In this paper we focus on the F-measure, which is widely used in multi-label problems. We derive two properties of the micro-averaged F measure, viewed as a function of the threshold values, which allow its global maximum to be found by an optimisation strategy with an upper bound on computational complexity of O(n2N2), where N and n are respectively the number of classes and of validation samples. So far, only a suboptimal threshold selection rule and a greedy algorithm without any optimality guarantee were known for this task. We then devise a possible optimisation algorithm based on our strategy, and evaluate it on three benchmark, multi-label data sets

Archivio istituzionale della ricerca - Università di Genova

F-measure optimisation in multi-label classifiers

Author: FUMERA GIORGIO
ROLI FABIO
Pillai I
Publication venue
Publication date: 01/01/2012
Field of study

Archivio istituzionale della ricerca - Università di Cagliari

Author Instructions

Author: Instructions Author
Publication venue
Publication date: 04/11/2013
Field of study

Crossref

Cartographic Perspectives (E-Journal - North American Cartographic Information Society, NACIS)

Going Beyond Counting First Authors in Author Co-citation Analysis

Author: Zhao Dangzhi
Publication venue
Publication date: 01/01/2005
Field of study

The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed

E-LIS

Evading SpamAssassin

Author: Satta R.
FUMERA GIORGIO
ROLI FABIO
BIGGIO BATTISTA
Pillai I
Publication venue
Publication date: 01/01/2007
Field of study

Archivio istituzionale della ricerca - Università di Cagliari

Evading SpamAssassin with obfuscated text images

Author: SATTA R
FUMERA G
BIGGIO B
PILLAI I
ROLI F
Publication venue
Publication date: 01/01/2007
Field of study

Archivio istituzionale della ricerca - Università di Genova

Variations on the Author

Author: Sayad Cecilia
Publication venue
Publication date: 01/01/2016
Field of study

“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer that rather than express opinions propels discourses; an interviewer that is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent form the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship

Crossref

Kent Academic Repository