Judgment attribution in IMDb
This research uses topic models to obtain author representations in order to identify authors of anonymous texts.
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation
counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings
are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that
only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into
account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
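The two counting schemes compared in this abstract can be illustrated with a minimal Python sketch. The function name `cocitation_counts`, the `max_authors` parameter, and the toy data are hypothetical and not taken from the study. As a simplifying assumption, every pair of credited authors appearing in the same reference list counts as one co-citation, including co-authors of the same cited work.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations over a set of citing papers.

    citing_papers: list of reference lists; each reference is an ordered
                   list of author names of one cited work.
    max_authors:   number of leading authors of each cited work to credit
                   (1 = traditional first-author counting).
    """
    counts = Counter()
    for references in citing_papers:
        # Collect the credited authors of this citing paper, each once.
        cited = set()
        for authors in references:
            cited.update(authors[:max_authors])
        # Every unordered pair of credited authors is co-cited once.
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts


# Toy data (hypothetical): two citing papers with two references each.
papers = [
    [["Smith", "Lee"], ["Jones", "Kim"]],
    [["Smith"], ["Kim", "Jones"]],
]
first_only = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
```

In this toy data, first-author counting never pairs Jones with Kim, while counting up to five authors does, which is the kind of difference the abstract attributes to the choice of counting scheme.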
Variations on the Author
“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who propels discourse rather than expressing opinions, an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.
Non-intrusive user modelling and behaviour prediction in museums
This thesis investigates non-intrusive user modelling techniques for predicting museum visitors' movements and interests in exhibits. Our research is motivated by the need to provide automated support to visitors of museums. Such support is needed as visitors can be overwhelmed by the vast amount of information in museum spaces, making it difficult to select personally interesting exhibits. To assist visitors in this selection process, computer-based technology can process non-intrusive observations of visitors' movements in the physical museum to provide input to our models. Our models in turn will eventually enable personalised exhibit recommendations based on the predictions they generate. The physicality of the museum domain poses practical challenges for developing predictive user models. For example, datasets of visitor pathways through a museum are difficult to obtain prior to deploying positioning technology in the physical museum space. However, such datasets are necessary to assess different modelling techniques. This thesis describes an approach for computer-supported semi-automated collection of visitor pathways by observationally tracking visitors in a museum. We used this approach to conduct a data collection at Melbourne Museum (Melbourne, Australia). The resultant dataset of 158 complete visit trajectories serves as a basis for evaluating our user models. For predicting visitor pathways, we discuss distance-based transition models derived from the spatial layout of the museum, and develop frequency-based transition models derived from non-intrusive observations of other visitors' previous movements. These models are then used to predict a visitor's next few most likely exhibits as a ranked set and sequence. Our results show that the frequency-based models mostly outperform the distance-based baselines, which suggests that other people's movements are better predictors of a visitor's movements than the spatial layout of the museum. 
Additionally, our results indicate that sequence-based prediction outperforms set-based prediction when predicting more than one next exhibit, which suggests that sequence information aids prediction. To measure interest, we transform a visitor's previous viewing durations at museum exhibits into implicit exhibit ratings. These ratings serve as input to two nearest-neighbour collaborative filters and two content-based models for interest prediction. We also develop an interest model based on the theory of spatial processes, which models visitors' rating vectors as independent Gaussian random vectors, but shares the mean vector and exhibit-to-exhibit covariance matrix across visitors. This covariance matrix has a special structure, which requires a notion of distance between exhibits. We develop models of museum exhibit distance derived from viewing-time similarity, semantic similarity, and walking distance. Our results suggest that utilising walking and semantic distances between exhibits enables more accurate predictions of a visitor's interests in unseen exhibits than using distances derived from observed exhibit viewing times. Our evaluation also shows that content-based interest prediction yields better results than nearest-neighbour collaborative prediction, and that our model based on spatial processes attains the highest predictive accuracy overall. We also explore ways of improving the performance of our pathway and interest models by means of model hybridisation: (1) we incorporate a visitor's interests in exhibits into one of our models for pathway prediction; and (2) propose a generic user- and item-aware weighting scheme for linearly combining predictive user models, which is used to combine two variants of our interest model based on spatial processes. Personalising the museum experience is a challenging task, as predictions differ from recommendations (we do not want to recommend exhibits that visitors are going to see anyway). 
This is in contrast to traditional recommender systems for the virtual domain, where predictions regarding a user's interests directly determine the ranking of items and recommendations. To round off the thesis, we suggest an approach for generating interesting exhibit recommendations based on the predictions of our models. This approach compares the exhibits predicted to be of interest to a visitor (generated by our interest models) with a prediction of the visitor's short-term pathway through the museum (generated by our pathway models), and supports the recommendation of personally interesting exhibits that are not going to be seen immediately if the predicted pathway is followed. The key contributions of this thesis are as follows:
- A computer-supported approach for recording, visualising and analysing the movements and viewing behaviour of museum visitors
- Models for predicting visitors' next few most likely exhibits from non-intrusive observations of the visitors' previous movements through the museum
- Models for predicting visitors' interests in exhibits from non-intrusive observations of the visitors' previous viewing behaviour in the museum
- Ways of improving predictive accuracy by means of model hybridisation
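The frequency-based transition model described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the thesis's implementation: the function names, the first-order (previous exhibit only) formulation, and the toy pathways are all hypothetical.

```python
from collections import Counter, defaultdict


def fit_transitions(pathways):
    """Count how often visitors moved from one exhibit directly to another."""
    counts = defaultdict(Counter)
    for path in pathways:
        for current, following in zip(path, path[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, current, visited, k=3):
    """Rank the k most frequent unvisited successors of the current exhibit."""
    successors = counts.get(current, Counter())
    ranked = [e for e, _ in successors.most_common() if e not in visited]
    return ranked[:k]


# Toy pathways of earlier visitors (hypothetical exhibit labels).
paths = [
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["A", "C", "D"],
    ["B", "C", "D"],
]
model = fit_transitions(paths)
top = predict_next(model, "B", visited={"A", "B"}, k=2)
```

A distance-based baseline of the kind mentioned in the abstract would rank candidate exhibits by spatial proximity instead of by observed transition frequency.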
Sentiment analysis under resource constraints
Sentiment Analysis (SA) deals with detecting the sentiment of textual content from a speaker’s perspective. Both supervised and unsupervised approaches exist for this task. Previous studies show that supervised approaches perform better than unsupervised approaches. However, supervised approaches heavily depend on the availability of training data. We present two resource constraints with respect to training data for SA, one in the language of operation and the other in the domain of operation. In this thesis, we propose approaches which can alleviate the problems caused by these constraints. The majority of research on SA is in English. This has skewed resource development in favour of the popular language of the web. Two SA resources are (i) sentiment lexicons and (ii) annotated corpora. In this thesis, we address the problem of unavailability or inadequacy of annotated corpora. We present an approach to leverage data from languages which have annotated data. Our approach uses wordnet senses (otherwise known as synsets) and is based on the fact that semantics influences sentiment. We compared the results of sense-based and lexeme-based features for sentiment analysis in a monolingual setting. We found that sense-based features perform better than lexeme-based features. Also, as we move from the lexeme feature space to the sense feature space, dimensionality reduces. This dimensionality reduction additionally solves the data sparsity problem. As per this approach, we replace synsets not present in the test set with similar synsets from the training set using a wordnet similarity metric. A significant improvement in classification accuracy is obtained through this approach. Sense identifiers for the same concepts in different languages are the same if their wordnets are developed using the merge method. We leverage this fact to address the problem of unavailability or inadequacy of annotated corpora in a language. 
A document in the test-set language (L_Test) is tested for polarity through a classifier trained on sense-marked and polarity-labeled corpora of the training language (L_Train). We perform our experiments on two widely spoken Indian languages, Hindi and Marathi. Results show that wordnet senses can bridge the language gap for SA. However, sense annotation is an additional task in a sentiment analysis system. Hence, to study the cost of annotation and its benefit to the end application, we introduce an economic model. Our model suggests that annotation is beneficial in terms of the performance achieved vis-a-vis the cost associated with developing the system. Existing approaches to reduce resource constraints based on the language of operation depend on machine translation. However, we question the efficacy of these approaches since machine translation is very resource intensive. To test this, we convert data in a resource-scarce language, RL_Test, to a resource-rich language, RL_Train, using various machine translation techniques. We perform our analysis on 4 European languages (English, French, German, Russian). Our study shows that such a strategy ignores the fact that a machine translation system is much more demanding in terms of resources than an SA engine. Moreover, these approaches fail to take into account the divergence in the expression of sentiments across languages. We provide strong experimental evidence that the performance of such systems comes nowhere close to that obtained by using only a few polarity-annotated documents in the target language. A drop in accuracy due to a shift in domain is a common problem for all NLP tasks, including sentiment analysis. To address resource constraints in the domain of operation, we present an approach for cross-domain sentiment analysis. The idea is to use a group of classifiers trained on the source domain to generate noisy tagged data for the target domain. 
A small amount of hand-labeled target domain data is then used to decide a confidence threshold for filtering out the noise. The remaining data, tagged with high confidence, is then used to train a high-accuracy sentiment tagger for the target domain. On a training domain similar to the target domain, our system performs on par with or even better than a classifier trained using in-domain data. Thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy of the Indian Institute of Technology Bombay, India and Monash University, Australia.
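The confidence-threshold filtering step in the cross-domain approach might look something like the sketch below. The abstract does not say how the threshold is chosen, so the rule used here (the lowest threshold at which the retained hand-labeled examples reach a target accuracy) is an assumption, as are the function names, the `min_accuracy` parameter, and the toy data.

```python
def choose_threshold(dev, min_accuracy=0.9):
    """Pick the lowest confidence cut-off that meets a target accuracy.

    dev: list of (confidence, predicted_label, gold_label) tuples from the
         small hand-labeled target-domain set.
    Returns None if no cut-off reaches min_accuracy.
    """
    for threshold in sorted({conf for conf, _, _ in dev}):
        kept = [(pred, gold) for conf, pred, gold in dev if conf >= threshold]
        accuracy = sum(pred == gold for pred, gold in kept) / len(kept)
        if accuracy >= min_accuracy:
            return threshold
    return None


def filter_confident(tagged, threshold):
    """Keep only noisy-tagged target examples at or above the threshold."""
    return [(text, label) for text, label, conf in tagged if conf >= threshold]


# Hypothetical hand-labeled development data: (confidence, predicted, gold).
dev = [
    (0.55, "pos", "neg"),
    (0.60, "neg", "neg"),
    (0.80, "pos", "pos"),
    (0.95, "neg", "neg"),
]
threshold = choose_threshold(dev, min_accuracy=0.9)

# Hypothetical noisy tags produced by source-domain classifiers.
tagged = [("great battery", "pos", 0.91), ("meh", "neg", 0.58)]
training_data = filter_confident(tagged, threshold)
```

The filtered `training_data` would then be used to train the target-domain tagger; that training step is omitted here.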
Appropriate Similarity Measures for Author Cocitation Analysis
We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.
Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
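One property that separates the Pearson correlation from the cosine can be shown with a small Python sketch. The abstract does not spell out its own argument, so the example below is only an illustration of a known mathematical difference: appending authors with whom neither profile is cocited (extra zeros) leaves the cosine unchanged but shifts the Pearson correlation. The profile vectors are hypothetical.

```python
import math


def cosine(x, y):
    """Cosine of the angle between two cocitation profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson correlation, i.e. the cosine of the mean-centred profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


# Hypothetical cocitation profiles of two authors over four other authors.
x = [3, 0, 2, 1]
y = [1, 2, 0, 3]

# The same profiles after adding six authors cocited with neither.
x_padded = x + [0] * 6
y_padded = y + [0] * 6

print(cosine(x, y), cosine(x_padded, y_padded))
print(pearson(x, y), pearson(x_padded, y_padded))
```

With these vectors the Pearson correlation changes sign once the zeros are added, while the cosine stays the same.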
Dispelling the Myths Behind First-author Citation Counts
We conducted a full-scale evaluative citation analysis study of scholars in the XML research field to explore just how different from each other author rankings resulting from different citation counting methods actually are, and to demonstrate the capability of emerging data and tools on the Web in supporting more realistic citation counting methods. Our results contest some common arguments for the continued
use of first-author citation counts in the evaluation of scholars, such as high correlations between author rankings by first-author citation counts and other citation
counting methods, and high costs of using more realistic citation counting methods that are not well-supported by the ISI databases. It is argued that increasingly available digital full text research papers make it possible for citation analysis studies to go beyond what the ISI databases have directly supported and to employ more
sophisticated methods
