Support vector machines for the classification and selection of gene expressions from microarrays
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
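The difference between the two counting schemes can be made concrete with a small sketch. The function name `cocitation_counts`, the parameter `max_authors`, and the toy data are hypothetical and not taken from the study. The sketch also assumes that two authors are co-cited once per citing paper whenever both are credited in its reference list, including co-authors of the same cited work; the study may define this differently.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations.

    citing_papers: list of reference lists; each reference is a list of
    author names in byline order. Only the first `max_authors` authors of
    each cited work are credited (1 = traditional first-author counting).
    """
    counts = Counter()
    for references in citing_papers:
        # Authors credited in this citing paper, deduplicated.
        cited = set()
        for authors in references:
            cited.update(authors[:max_authors])
        # Each unordered pair of credited authors is co-cited once per paper.
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts

# Hypothetical toy data: two citing papers with two references each.
papers = [
    [["A", "B"], ["C"]],
    [["A", "D"], ["B", "C"]],
]
first_author = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
print(dict(first_author))
print(dict(first_five))
```

With first-author counting, B is only credited when B leads a cited work, so the pair (B, C) never appears; with first-five counting, the same pair is counted in both citing papers.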
Variations on the Author
“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourses; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.
Appropriate Similarity Measures for Author Cocitation Analysis
We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.
Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
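As a minimal sketch of how the Pearson correlation and the cosine can disagree on the same cocitation profiles, the example below compares both on hypothetical count vectors. It illustrates one commonly noted property: appending authors who are never cocited with either of the two authors (zero counts) changes the Pearson correlation but leaves the cosine unchanged. The abstract does not state which argument the paper itself makes, so this is an illustration, not a summary of its reasoning.

```python
import math

def cosine(x, y):
    """Cosine of the angle between two count vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def pearson(x, y):
    """Pearson correlation, computed as the cosine of mean-centred vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])

# Hypothetical cocitation profiles of two authors against four other authors.
x = [4, 0, 2, 1]
y = [1, 3, 0, 2]

# The same profiles after adding six authors cocited with neither of them.
x_padded = x + [0] * 6
y_padded = y + [0] * 6

print(cosine(x, y), cosine(x_padded, y_padded))
print(pearson(x, y), pearson(x_padded, y_padded))
```

On this toy data the cosine is identical before and after padding, while the Pearson correlation moves from a negative to a positive value.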
Preparation and in vitro evaluation of chitosan matrices for the colonic controlled release of proteins
Matrices for colonic controlled release of the model protein drug bovine serum albumin (BSA) were prepared by direct compression of chitosan (CH) or chitosan hydrochloride (CHHCl) microspheres containing 4% BSA, without the aid of any other ingredient. The matrices (6 mm diameter, 50 mg weight each) are intended to be filled into enteric-coated size 00 capsules and thereby conveyed to the proximal colon. Protein release was studied in vitro using pH 7.4 buffer as the elution medium. Release from microspheres was exceedingly fast, whereas the matrices were able to control release over prolonged periods. In the elution medium both the CH- and the CHHCl-based matrices swelled without disintegrating. The latter matrix type swelled to a much higher degree. Protein release from matrices was diffusion-controlled and independent of external medium hydrodynamics. The time for release of 50% of the dose was much shorter for the CHHCl-based than for the CH-based matrix (0.9 vs. 12.0 h), so the latter is more suitable as a controlled-release system for proteins.
Evaluation of the solution impregnation method for loading drugs into suspension-type polymer matrices: a study of factors determining the patterns of solid drug distribution in matrix and drug release from matrix
Control of selection bias in microarray data analysis
We present an experimental setup for analysis and prediction on microarray data, specifically designed to identify and correct the impact of selection bias in high-throughput problems. A number of recently published, overoptimistic studies present feature selection and gene profiling processes that suffer from overfitting effects. We outline the selection bias problem and demonstrate its effect on synthetic and microarray data. We then introduce and describe a procedure that successfully deals with the problem through extensive resampling and label randomization techniques, employing Support Vector Machines as the base classifier and an improved version of the Recursive Feature Elimination algorithm for gene ranking.
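The selection bias effect can be reproduced with a small simulation. The sketch below is not the authors' setup: it swaps in a nearest-centroid classifier and a class-mean-difference ranking in place of Support Vector Machines and Recursive Feature Elimination, and all names (`select`, `cv_accuracy`, `select_inside`) are hypothetical. The features are pure noise, so the labels carry no information, which plays the role of a label randomization control. Any accuracy well above 50% is therefore an artifact of selecting features on the full data before cross-validation.

```python
import random

def score(X, y, j, rows):
    """Absolute difference of class means for feature j (crude ranking score)."""
    a = [X[i][j] for i in rows if y[i] == 0]
    b = [X[i][j] for i in rows if y[i] == 1]
    return abs(sum(a) / len(a) - sum(b) / len(b))

def select(X, y, rows, k):
    """Indices of the k top-ranked features, using only the given rows."""
    p = len(X[0])
    return sorted(range(p), key=lambda j: score(X, y, j, rows), reverse=True)[:k]

def centroid(X, rows, feats):
    return [sum(X[i][j] for i in rows) / len(rows) for j in feats]

def predict(X, y, train, feats, i):
    """Nearest-centroid prediction for sample i on the selected features."""
    c0 = centroid(X, [r for r in train if y[r] == 0], feats)
    c1 = centroid(X, [r for r in train if y[r] == 1], feats)
    d0 = sum((X[i][j] - m) ** 2 for j, m in zip(feats, c0))
    d1 = sum((X[i][j] - m) ** 2 for j, m in zip(feats, c1))
    return 0 if d0 <= d1 else 1

def cv_accuracy(X, y, k, folds=5, select_inside=True):
    """k-fold accuracy with selection inside each fold or once on all data."""
    rows = list(range(len(X)))
    global_feats = select(X, y, rows, k)  # sees the test samples: biased
    correct = 0
    for f in range(folds):
        test = [i for i in rows if i % folds == f]
        train = [i for i in rows if i % folds != f]
        feats = select(X, y, train, k) if select_inside else global_feats
        correct += sum(predict(X, y, train, feats, i) == y[i] for i in test)
    return correct / len(X)

random.seed(0)
n_samples, n_features = 40, 1000
X = [[random.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
y = [i % 2 for i in range(n_samples)]  # labels unrelated to the noise features

biased = cv_accuracy(X, y, k=10, select_inside=False)
unbiased = cv_accuracy(X, y, k=10, select_inside=True)
print("selection outside CV:", biased)
print("selection inside CV: ", unbiased)
```

Selecting inside each fold keeps the held-out samples out of the ranking step, which is the property any resampling scheme needs in order to give an honest error estimate.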
Gene selection and classification by entropy-based recursive feature elimination
We analyse E-RFE (Entropy-based Recursive Feature Elimination), a new wrapper algorithm for fast feature ranking in classification problems. The E-RFE method eliminates chunks of uninteresting features according to the entropy of the weight distribution of an SVM classifier. The method is designed to support computationally intensive model selection in classification problems in which the number of features is much larger than the number of samples. We test the elimination procedure on synthetic data sets, and we demonstrate the applicability of E-RFE for the identification of biomedically important genes in predictive classification of microarray data.
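The idea of letting the entropy of the weight distribution decide how many features to drop can be sketched as follows. The abstract does not give the actual E-RFE rule, so everything specific here is an assumption for illustration: the histogram with 10 bins, the threshold of half the maximum entropy, and the choice to drop the whole lowest bin when entropy is low. The weights are taken as given; training the SVM is out of scope for this sketch.

```python
import math

def weight_entropy(weights, bins=10):
    """Shannon entropy (bits) of the histogram of |weights| scaled to [0, 1]."""
    mags = [abs(w) for w in weights]
    lo, hi = min(mags), max(mags)
    span = (hi - lo) or 1.0  # avoid division by zero when all weights are equal
    counts = [0] * bins
    for m in mags:
        idx = min(int((m - lo) / span * bins), bins - 1)
        counts[idx] += 1
    n = len(mags)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

def features_to_drop(weights, bins=10, entropy_threshold=None):
    """Indices to eliminate in one step (hypothetical rule, not the paper's)."""
    if entropy_threshold is None:
        entropy_threshold = 0.5 * math.log2(bins)  # assumed threshold
    mags = [abs(w) for w in weights]
    lo, hi = min(mags), max(mags)
    span = (hi - lo) or 1.0
    if weight_entropy(weights, bins) < entropy_threshold:
        # Weights are concentrated: drop the whole lowest bin as one chunk.
        cut = lo + span / bins
        chunk = [i for i, m in enumerate(mags) if m < cut]
        if 0 < len(chunk) < len(weights):
            return chunk
    # Otherwise fall back to plain RFE: drop the single lowest-ranked feature.
    return [min(range(len(mags)), key=mags.__getitem__)]

# Hypothetical weights: 95 near-zero features and 5 strong ones.
concentrated = [0.01] * 95 + [1.0] * 5
# Hypothetical weights spread evenly over the whole range.
spread = [i / 99 for i in range(100)]

print(len(features_to_drop(concentrated)))  # a chunk of features
print(features_to_drop(spread))             # a single feature
```

In a full loop, the classifier would be retrained on the surviving features after each call, which is where chunk elimination saves training rounds compared with removing one feature at a time.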
Gene selection and classification with support vector machines applied to microarray data
Microarray expression studies are producing massive quantities of high-throughput gene expression and other functional genomics data. One of the most challenging aspects of the discovery process based on gene expression data matrices is the identification of small subsets of genes likely to be strongly related to the biological pathways involved in the experiment.
We developed a gene selection method based on the Recursive Feature Elimination procedure for Support Vector Machines (SVM-RFE, Guyon et al., 2002). To better control and speed up the elimination of genes (typically from several thousand to fewer than 10), we introduced a reduction algorithm, E-RFE, based on the structure of the distribution of weights obtained from an SVM classifier during feature elimination. The reduction algorithm is based on an entropy measure of the distribution and allows chunks of uninteresting genes to be eliminated until the remaining distribution stabilizes, typically at 50 genes. Single-step SVM-RFE is then applied. Our first experiments on public and unpublished microarray data are very promising: the accuracy of SVM classification is maintained even with very few remaining genes, with a remarkable acceleration with respect to the SVM-RFE procedure. A first analysis of the oncological interest of the selected genes, performed by specialists, also gave interesting results, as only genes relevant to cancer classification were selected. In particular, one gene related to tissue composition that was selected by RFE was not selected by E-RFE.
This is a case in which the features selected may matter more than the classifier used. On tasks such as prediction of patients' response to therapy, we aim to develop accurate classification systems based on a very small number of genes in order to provide, at the same time, a predictive methodology and an analysis tool in experimental oncology. The automatic selection of genes relevant to the underlying oncological basis is thus crucial in the design of targeted experiments.
In this paper we present an application of the method to three different microarray data sets: a data set of diffuse large B-cell lymphoma (96 samples and 4026 genes, of which 7 were selected), the AML/ALL data set (discriminating Acute Myeloid Leukemia from Acute Lymphoblastic Leukemia, 72 cases and 7129 genes, of which 9 were selected), and a colon tumor data set (discriminating tumor from normal colon tissues, 62 cases and 2000 genes, of which 7 were selected). In all cases, classification accuracy with the reduced models was comparable to previously published results.
With a view to automating the gene selection procedure within an integrated discovery process, we are now developing a system for a complete bioinformatics treatment, including interaction with data through a database system connection, support for tasks such as comparing gene selection results with the BLAST service, and specialized data displays produced by statistical software.
