Progetto ARES: Advanced networking for EU genomic RESearch
The rate at which genomic data are generated is growing faster than Moore's law, and therefore
significantly faster than the upgrade of transmission and storage capacity in data networks. As a
result, users experience increasing difficulty in managing genomic data, to the point that the data
are sometimes transferred by means other than networks. For example, the Beijing Genomics Institute,
which currently processes 2,000 human genomes per day, ships hard disks containing the data by
express courier instead of transmitting them over the Internet or other networks [8]. To get an idea
of how serious the problem is, suppose that a researcher wants to determine the characteristics of a
genome with respect to a specific disease found in several countries around the world. In that case,
not only does the number of genome files to manage and analyze become extremely large, but each data
set relating to a single individual is itself significantly large, on the order of tens of GB.
Genome processing, especially for the human genome, generally proceeds by running a pipeline of
software packages. There are several types of pipeline, each tailored to specific research or
diagnostic needs [1]. The input files of a pipeline are genome files, results of previous processing
steps, called annotations, and the human genome reference model [6] used to align the data
[5], [6], [7].
Although a patient's genome can be stored in a local database, all the other files, which reside in
databases physically and geographically located on different servers, must be downloaded over the
network. The overall size of these files varies from a few GB to tens of GB. Data processing, which
can itself last hours, can start only once all the files have been transferred. In practice, the
total time required to obtain the results of a processing request could exceed 24 hours. Given the
rapid and imminent spread of sequencing and of the use of genomic data for diagnostic purposes, this
issue raises at least two critical aspects: minimizing the service delivery time, for example when
the diagnosis of a serious disease is at stake, and managing data traffic in the network.
While researchers in a relatively small number of prestigious organizations have access to powerful
parallel computing facilities [3], this is generally not true for ordinary medical centers and
public hospitals, particularly in countries where the network and service infrastructure does not
offer high performance.
In this context, the Perugia research unit is the unit responsible for the ARES project (Advanced
networking for EU genomic RESearch) [4], whose main objective is to optimize the management of
network resources for processing and transferring human genome data, which, if treated as generic
“big data”, lead to sub-optimal management of network resources. In addition to describing the
system, this paper reports preliminary experimental results showing that a careful choice of the
parameters of the processing, management, and service delivery algorithms, which are based on the
integration of the Content Distribution Network (CDN) model and the Cloud model, makes it possible
to tailor network services to the specific needs of medical and healthcare personnel who request the
processing of genomic data characterized by very large file sizes. The ARES project, accepted under
the first open call of the Géant/GN3plus project, is co-funded by the European Commission
ARES: Advanced Networking for Distributing Genomic Data
This paper presents the network and service architecture being implemented within the ARES project (Advanced networking for the EU genomic RESearch). The architecture is designed both to deliver genomic data sets over the GÉANT network and to support genomic research in EU countries. To this end, the strategic objective of the ARES project is to create a novel Content Distribution Network (CDN) architecture suitable for handling the rapidly increasing diffusion of genomic data. This paper summarizes the status of the project, the ongoing research, and the achieved and expected results. The CDN architecture is based on an evolved form of NSIS signalling, and addresses the major challenges of managing genomic data sets over a shared wideband network with a limited amount of resources made available to the service.
Besides a detailed description of the functional entities included in the ARES architecture, we illustrate the signalling protocols that support their interaction, and provide preliminary experimental results obtained by the implementation and deployment of two significant research scenarios within our research laboratories
A resource discovery framework for cloud-based genomics computing
In recent years scientific computing has evolved into a massive usage of cloud computing, due to its flexibility in managing computing resources. In this paper, we focus on genomic data processing, which is rapidly gaining momentum in research and medical activities. The main characteristic of these data sets is that not only is the number of available genome files becoming extremely large, but each individual data set is also significantly large, in the order of tens of GB. Hence, a wide diffusion of cloud-based genomic data processing will have a significant impact on network resources, since each processing request will require the transfer of tens of GB to the computing nodes. To address this issue, we propose a resource discovery framework which provides decision agents with the information needed to select the most suitable computing nodes. We have implemented this resource discovery function in a distributed fashion, and extensively tested it in a lab testbed consisting of about 70 nodes. We found that the overhead of the proposed solution is negligible in comparison with the amount of transferred data
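As a rough illustration of how a decision agent might use discovered resource information, the sketch below picks the node with the lowest estimated completion time (transfer plus compute). This is not the framework described in the abstract: the `NodeInfo` fields, the cost model, and all numbers are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class NodeInfo:
    """Hypothetical record a discovery service might return for one computing node."""
    name: str
    bandwidth_mbps: float  # assumed available throughput from the data source to the node
    free_cores: int        # assumed number of idle CPU cores


def estimated_completion_s(node: NodeInfo, data_gb: float, core_seconds: float) -> float:
    """Simple cost model: time to move the data plus perfectly parallel compute time."""
    transfer_s = data_gb * 8e9 / (node.bandwidth_mbps * 1e6)
    compute_s = core_seconds / node.free_cores
    return transfer_s + compute_s


def select_node(nodes: Iterable[NodeInfo], data_gb: float, core_seconds: float) -> NodeInfo:
    """Return the usable node with the smallest estimated completion time."""
    usable = [n for n in nodes if n.free_cores > 0 and n.bandwidth_mbps > 0]
    if not usable:
        raise ValueError("no usable computing node")
    return min(usable, key=lambda n: estimated_completion_s(n, data_gb, core_seconds))


if __name__ == "__main__":
    nodes = [
        NodeInfo("A", bandwidth_mbps=1000, free_cores=4),
        NodeInfo("B", bandwidth_mbps=100, free_cores=32),
        NodeInfo("C", bandwidth_mbps=500, free_cores=16),
    ]
    # Assumed request: 30 GB of input and 16 core-hours of work.
    best = select_node(nodes, data_gb=30, core_seconds=16 * 3600)
    print(best.name)
```

With these made-up figures the fastest link (A) and the largest node (B) both lose to node C, which balances the two terms. The point of the sketch is only that, for inputs of tens of GB, the transfer term is large enough to change which node is most suitable.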
The ARES Project: Cloud Services for Medical Genomics
This paper presents the cloud services provided by the ARES project. The network solutions are illustrated in a companion paper at the same conference. The ARES project aims to deploy CDN services over a broadband network for accessing and exchanging genomic datasets, accessible by medical and research personnel through a Cloud interface. This paper illustrates the procedure defined to access such services, and provides a case-study simulation to show the implementation of the bioinformatics pipeline included. The experimental activity in ARES aims to gain a detailed understanding of the network problems relating to its sustainability given the increasing use of genomics for diagnostic purposes. The main aim is to allow an extensive use of genomic data through the collection of relevant information available from the network in the medical and disease-diagnostics field
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting: the traditional type, which only counts the first of a cited work's authors, and a non-traditional type, which takes into account the first 5 authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed
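The two counting schemes compared in this abstract can be sketched in a few lines. The example below is a minimal illustration, not the study's procedure: the data layout, the function name `cocitation_counts`, and the author names are hypothetical, and it makes the simplifying assumption that two co-authors of the same cited work also count as co-cited.

```python
from collections import Counter
from itertools import combinations
from typing import Sequence


def cocitation_counts(
    citing_papers: Sequence[Sequence[Sequence[str]]], max_authors: int = 1
) -> Counter:
    """Count, for each author pair, the citing papers that cite both authors.

    citing_papers: one reference list per citing paper; each reference is the
    ordered author list of a cited work. Only the first max_authors authors of
    each cited work are considered (1 = traditional first-author counting).
    """
    counts: Counter = Counter()
    for references in citing_papers:
        cited_authors = set()
        for authors in references:
            cited_authors.update(authors[:max_authors])
        # Each unordered pair is counted once per citing paper.
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    # Toy data with made-up author names.
    papers = [
        [["Adams", "Baker"], ["Chen"]],
        [["Chen", "Diaz"], ["Baker", "Adams"]],
    ]
    first_only = cocitation_counts(papers, max_authors=1)
    first_five = cocitation_counts(papers, max_authors=5)
    print(first_only[("Adams", "Chen")], first_five[("Adams", "Chen")])
```

In the toy data, first-author counting sees Adams and Chen co-cited only once, because Adams is second author of the work cited by the second paper; counting the first 5 authors recovers that second co-citation.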
Variations on the Author
“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who propels discourse rather than expressing opinions; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship
Appropriate Similarity Measures for Author Cocitation Analysis
We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance. Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
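To make the contrast between the Pearson correlation and the cosine concrete, the sketch below computes both for two toy cocitation profiles. The abstract does not spell out its argument, so the property shown here (Pearson changes when authors with zero cocitations to both profiles are added, the cosine does not) is one commonly discussed concern in this debate, offered as an assumption about the kind of issue involved; the profiles are made up.

```python
from math import sqrt
from typing import Sequence


def cosine(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine of the angle between two cocitation profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y)))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two cocitation profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / (sqrt(sum(a * a for a in dx)) * sqrt(sum(b * b for b in dy)))


if __name__ == "__main__":
    # Made-up cocitation counts of two authors with two other authors.
    x, y = [2, 1], [1, 2]
    # Same profiles after adding two authors never cocited with either.
    x0, y0 = x + [0, 0], y + [0, 0]
    print(cosine(x, y), cosine(x0, y0))
    print(pearson(x, y), pearson(x0, y0))
```

In this toy case the cosine is unchanged by the added zeros, while the Pearson correlation moves from perfectly negative to clearly positive, even though nothing about the relationship between the two authors has changed.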
Dispelling the Myths Behind First-author Citation Counts
We conducted a full-scale evaluative citation analysis study of scholars in the XML research field to explore just how different from each other author rankings resulting from different citation counting methods actually are, and to demonstrate the capability of emerging data and tools on the Web in supporting more realistic citation counting methods. Our results contest some common arguments for the continued use of first-author citation counts in the evaluation of scholars, such as high correlations between author rankings by first-author citation counts and other citation counting methods, and high costs of using more realistic citation counting methods that are not well-supported by the ISI databases. It is argued that increasingly available digital full text research papers make it possible for citation analysis studies to go beyond what the ISI databases have directly supported and to employ more sophisticated methods
