A computational procedure for functional characterization of potential marker genes from molecular data: Alzheimer's as a case study
Background: A molecular characterization of Alzheimer's Disease (AD) is the key to identifying altered gene sets that lead to AD progression. We rely on the assumption that candidate marker genes for a given disease belong to specific pathogenic pathways, and we aim to unveil those pathways that are stable across tissues, treatments and measurement systems. In this context, we analyzed three heterogeneous datasets, two microarray gene expression sets and one protein abundance set, applying a recently proposed feature selection method based on regularization. Results: For each dataset we identified a signature that was subsequently evaluated from both the computational and the functional characterization viewpoints, estimating the classification error and retrieving the most relevant biological knowledge from different repositories. Each signature includes genes already known to be related to AD and genes that are likely to be involved in the pathogenesis or in the disease progression. The integrated analysis revealed a meaningful overlap at the functional level. Conclusions: The identification of three gene signatures showing a relevant overlap of pathways and ontologies increases the likelihood of finding potential marker genes for AD.
Missing Values in Multiple Joint Inference of Gaussian Graphical Models
Real-world phenomena are often not fully measured or completely observable, which gives rise to the so-called missing data problem. As a consequence, ad hoc techniques that cope with this issue are needed in many inference contexts. In this paper, we focus on the inference of Gaussian Graphical Models (GGMs) from multiple input datasets having complex relationships (e.g. multi-class or temporal). We propose a method that generalises state-of-the-art approaches to the inference of both multi-class and temporal GGMs while naturally dealing with two types of missing data: partial and latent. Synthetic experiments show that our method outperforms the state of the art. In particular, we compared results with single network inference methods that suitably deal with missing data, and with multiple joint network inference methods coupled with standard pre-processing techniques (e.g. imputation). When dealing with fully observed datasets, our method analytically reduces to state-of-the-art approaches, providing a good alternative, as our implementation reaches convergence in shorter or comparable time. Finally, we show that properly addressing the missing data problem in a multi-class real-world example allows us to discover interesting varying patterns.
Building Kernels from Binary Strings for Image Matching
In the statistical learning framework, the use of appropriate kernels may be the key for substantial improvement in solving a given problem. In essence, a kernel is a similarity measure between input points satisfying some mathematical requirements and possibly capturing the domain knowledge. In this paper, we focus on kernels for images: we represent the image information content with binary strings and discuss various bitwise manipulations obtained using logical operators and convolution with nonbinary stencils. In the theoretical contribution of our work, we show that histogram intersection is a Mercer’s kernel and we determine the modifications under which a similarity measure based on the notion of Hausdorff distance is also a Mercer’s kernel. In both cases, we determine explicitly the mapping from input to feature space. The presented experimental results support the relevance of our analysis for developing effective trainable systems.
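The claim that histogram intersection is a Mercer's kernel can be illustrated numerically. The sketch below is not taken from the paper: it assumes integer-valued histograms with a known maximum bin count, and uses a unary binary encoding (a standard construction, assumed here) as the explicit feature map, so that the intersection of two histograms equals the dot product of their binary strings. The function names and the random test data are hypothetical.

```python
import numpy as np

def histogram_intersection(h1, h2):
    """Sum over bins of the smaller of the two counts."""
    return int(np.minimum(h1, h2).sum())

def unary_encode(h, max_count):
    """Map each bin count c to c ones followed by (max_count - c) zeros,
    then concatenate the bins into a single binary string."""
    h = np.asarray(h)
    bits = np.arange(max_count)[None, :] < h[:, None]
    return bits.astype(np.int64).ravel()

def gram_matrix(hists):
    """Pairwise histogram intersection for a stack of histograms."""
    H = np.asarray(hists)
    return np.minimum(H[:, None, :], H[None, :, :]).sum(axis=2)

# Hypothetical data: 6 histograms with 4 bins, counts between 0 and 8.
max_count = 8
rng = np.random.default_rng(0)
hists = rng.integers(0, max_count + 1, size=(6, 4))

K = gram_matrix(hists)
phi = np.stack([unary_encode(h, max_count) for h in hists])

# The kernel value equals an inner product in the binary feature space,
# so the Gram matrix is positive semidefinite by construction.
assert np.array_equal(K, phi @ phi.T)
min_eig = np.linalg.eigvalsh(K.astype(float)).min()
```

Because the Gram matrix factors as `phi @ phi.T`, its smallest eigenvalue is non-negative up to floating-point error, which is the property a Mercer's kernel requires on any finite sample.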
User-centered design method and system for the structuring and automatic updating of informational content
The patent, the result of interdisciplinary work in design and data science, describes a user-centered design method that reorganizes complex textual content to give users only the information relevant to their specific case, in a clear, concise form organized according to the order in which it is used.
It produces a corpus of structured texts that keeps websites and apps automatically up to date, greatly improving their effectiveness and efficiency.
Particularly suited to public administrations and complex organizations that need to present their services to users, it is based on process analysis and on profiling users according to their possible stances toward the services provided. The result is a corpus of reusable, structured, and interconnected texts that describe the information according to a user-centered design approach.
Functional characterization of Parkinson by high-throughput data analysis with l1l2 regularization
