Analysis and Diagnostics for Censored Regression and Multivariate Data
This thesis investigates three research problems which arise in multivariate
data and censored regression. The first is the identification of outliers
in multivariate data. The second is a dissimilarity measure for clustering
purposes. The third is the diagnostics analysis for the Buckley-James
method in censored regression.
An outlier can be defined simply as an observation (or a subset of observations)
that is isolated from the other observations in the data set. There
are two main reasons to look for outliers. The first is the
researcher's intention. The second is the effect of outliers on analyses:
outliers affect means, variances and regression
coefficients, bias or distort estimates, and
inflate sums of squares, so false conclusions are likely
to be drawn. Sometimes the identification of outliers is the main objective
of the analysis; in other cases the question is whether to remove the outliers or
down-weight them prior to fitting a non-robust model.
This thesis does not differentiate between the various justifications for
outlier detection. The aim is to advise the analyst of observations that
are considerably different from the majority. Note that the techniques for
the identification of outliers introduced in this thesis are applicable to a wide
variety of settings. They are applied to both large and small
data sets. In this thesis, observations that are located far away from the
remaining data are considered to be outliers.
Additionally, it is noted that some techniques for the identification of
outliers are available for finding clusters. There are two major challenges
in clustering. The first is that identifying clusters in high-dimensional data sets
is a difficult task because of the curse of dimensionality. The second is that a
new dissimilarity measure is needed, as some traditional distance functions
cannot capture the pattern dissimilarity among the objects. This thesis
deals with the latter challenge. It introduces the Influence Angle
Cluster Approach (iaca), which may be used as a dissimilarity matrix, and
shows that iaca successfully forms clusters
when it is used in partitioning clustering, even if the data set has mixed
variables, i.e. interval and categorical variables. The iaca is developed
based on the influence eigenstructure.
The first two problems in this thesis deal with complete data sets. It is
also of interest to study incomplete data, i.e. censored data
sets. The term 'censored' is mostly used in the biological sciences, for example in
survival analysis. Nowadays, researchers are interested in comparing
the survival distributions of two samples. Even though this can be done
by using the logrank test, this method cannot examine the effects of more
than one variable at a time. This difficulty can be overcome by using
a survival regression model. Examples of survival regression models
are the Cox model, Miller's model, the Buckley-James model and the
Koul-Susarla-Van Ryzin model.
The Buckley-James model's performance is comparable with that of the Cox
model, and it performs best when compared with both the Miller
model and the Koul-Susarla-Van Ryzin model. Previous comparison studies
found that the Buckley-James estimator is more stable and easier to
explain to non-statisticians than the Cox model. Today, however, researchers
prefer to use the Cox model instead of the Buckley-James model. This
is because of the lack of support for the Buckley-James model in statistical
software and the limited choice of diagnostics analyses. Currently, only a
few diagnostics analyses exist for the Buckley-James model.
Therefore, this thesis proposes two new diagnostics analyses for the
Buckley-James model. The first proposed diagnostics analysis is called
renovated Cook's distance. This method produces results comparable with
previous findings. Nevertheless, it cannot identify influential
observations from the censored group. It can only detect influential
observations from the uncensored group. This issue needs further investigation
because of the possibility of censored points becoming influential
cases in censored regression.
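For orientation, the classical Cook's distance for an ordinary least-squares fit without censoring can be sketched as follows. This is only the textbook baseline, not the renovated Cook's distance proposed in the thesis, whose definition is not given in this abstract. The function name `cooks_distance` and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def cooks_distance(X, y):
    """Classical Cook's distance for an OLS fit with intercept (no censoring)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix with an intercept column.
    Xd = np.column_stack([np.ones(len(y)), X])
    n, p = Xd.shape
    # Hat matrix H = Xd (Xd'Xd)^{-1} Xd'; its diagonal holds the leverages.
    H = Xd @ np.linalg.solve(Xd.T @ Xd, Xd.T)
    h = np.diag(H)
    resid = y - H @ y
    s2 = resid @ resid / (n - p)  # residual variance estimate
    # D_i = e_i^2 / (p * s^2) * h_i / (1 - h_i)^2
    return resid**2 / (p * s2) * h / (1.0 - h) ** 2

# Synthetic, illustrative data: a straight line with one contaminated response.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 20)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=x.size)
y[-1] += 8.0  # shift the last response far from the line
d = cooks_distance(x, y)
most_influential = int(np.argmax(d))
```

In this sketch every observation is treated as fully observed, which is exactly the setting where the classical measure applies; handling censored responses requires the modifications the thesis discusses.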
Secondly, the local influence approach for the Buckley-James model
is proposed. This thesis presents local influence diagnostics for the
Buckley-James model, which consist of variance perturbation, response
variable perturbation, censoring status perturbation, and independent variables
perturbation. The proposed diagnostics improve on, and also challenge,
previous findings by allowing both censored and uncensored
observations to be identified as influential.
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation
counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings
are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that
only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into
account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
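The difference between the two counting schemes can be illustrated with a minimal sketch. The function `cocitation_counts`, its `max_authors` parameter and the author names are hypothetical, and the choice to treat co-authors of the same cited work as co-cited with each other is an assumption of this sketch rather than something stated in the abstract.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations.

    citing_papers: list of reference lists; each reference is an ordered
    list of author names. Only the first `max_authors` authors of each
    cited work are considered. A pair is counted once per citing paper.
    """
    counts = Counter()
    for references in citing_papers:
        cited = set()
        for authors in references:
            cited.update(authors[:max_authors])
        # Each unordered pair of cited authors is co-cited once by this paper.
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts

# Hypothetical data: two citing papers, each with two references.
papers = [
    [["Smith", "Lee"], ["Garcia", "Chen", "Okafor"]],
    [["Smith", "Lee"], ["Chen"]],
]
first_author = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
```

With `max_authors=1`, Lee never appears in any pair because Lee is never a first author; with `max_authors=5`, Lee is paired with every other cited author.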
Variations on the Author
“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourses; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.
Appropriate Similarity Measures for Author Cocitation Analysis
We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.
Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
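One way in which the Pearson correlation and the cosine can behave differently on cocitation profiles is sketched below. It illustrates a single property (sensitivity to added zeros) and is not necessarily the argument made in the paper; the profile vectors and helper names are invented for illustration.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two cocitation profiles."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def pearson(u, v):
    """Pearson correlation between two cocitation profiles."""
    return float(np.corrcoef(u, v)[0, 1])

# Hypothetical cocitation counts of authors A and B with four other authors.
a = np.array([5.0, 3.0, 0.0, 1.0])
b = np.array([4.0, 0.0, 2.0, 1.0])

# Extend the author set with six authors who are cocited with neither A nor B.
a_ext = np.concatenate([a, np.zeros(6)])
b_ext = np.concatenate([b, np.zeros(6)])

cos_before, cos_after = cosine(a, b), cosine(a_ext, b_ext)
r_before, r_after = pearson(a, b), pearson(a_ext, b_ext)
```

Here the cosine is unchanged by the added zeros, while the Pearson correlation rises even though no new cocitation information about A and B has been added.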
Eigenstructure-based angle for detecting outliers in multivariate data
There are two main reasons that motivate people to detect outliers; the first is the researchers' intention; see the example of Mr Haldum's cases in Barnett and Lewis. The second is the effect of outliers on analyses. This article does not differentiate between the various justifications for outlier detection. The aim was to advise the analyst about observations that are isolated from the other observations in the data set. In this article, we introduce the eigenstructure-based angle for outlier detection. This method is simple and effective in dealing with masking and swamping problems. The method proposed is illustrated and compared with Mahalanobis distance by using several data sets.
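The Mahalanobis distance used as the comparison baseline can be sketched as follows, assuming the classical sample mean and covariance. This is not the eigenstructure-based angle itself, which the abstract does not define; the function name and the synthetic data are illustrative assumptions.

```python
import numpy as np

def mahalanobis_distances(X):
    """Mahalanobis distance of each row from the sample mean,
    using the classical (non-robust) sample covariance."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
    # Row-wise quadratic form: d_i^2 = c_i' S^{-1} c_i
    d2 = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
    return np.sqrt(d2)

# Synthetic data: 50 bivariate normal points plus one isolated observation.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
X = np.vstack([X, [8.0, -8.0]])
d = mahalanobis_distances(X)
suspect = int(np.argmax(d))
```

Because the mean and covariance are themselves estimated from data that include the outlier, this classical version is known to be prone to the masking and swamping problems mentioned above when several outliers are present.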
Dispelling the Myths Behind First-author Citation Counts
We conducted a full-scale evaluative citation analysis study of scholars in the XML research field to explore just how different from each other author rankings resulting from different citation counting methods actually are, and to demonstrate the capability of emerging data and tools on the Web in supporting more realistic citation counting methods. Our results contest some common arguments for the continued
use of first-author citation counts in the evaluation of scholars, such as high correlations between author rankings by first-author citation counts and other citation
counting methods, and high costs of using more realistic citation counting methods that are not well-supported by the ISI databases. It is argued that increasingly available digital full text research papers make it possible for citation analysis studies to go beyond what the ISI databases have directly supported and to employ more
sophisticated methods.
koamabayili/VECTRON-author-checklist: VECTRON author checklist
We have done our best to complete the author checklist relating to the use of animals in the hut study. Note that the objective for the hut study was to evaluate the IRS treatment applications for residual efficacy against Anopheles mosquitoes, including the local An. coluzzii mosquito population. Cows were only used to attract mosquitoes into the huts and no tests were carried out directly on the cows. The author checklist is intended for use with studies where experiments are carried out on animals, which is why we have had such difficulty in completing this for the hut study, as many of the questions do not relate to how the cows were used.
- …
