Hydrophobic similarity between molecules: Application to three-dimensional molecular overlays with PharmScreen
DrugProt corpus: Biocreative VII Track 1 - Text mining drug and chemical-protein interactions
Gold Standard annotations of the DrugProt corpus (training and development sets)
Introduction
The aim of the DrugProt track (similar to the previous CHEMPROT task of BioCreative VI) is to promote the development and evaluation of systems that are able to automatically detect relations between chemical compounds/drugs and genes/proteins. We have therefore generated a manually annotated corpus, the DrugProt corpus, where domain experts have exhaustively labeled: (a) all chemical and gene mentions, and (b) all binary relationships between them corresponding to a specific set of biologically relevant relation types (DrugProt relation classes). There is also an increasing interest in the integration of chemical and biomedical data, understood as the curation of relationships between biological and chemical entities from text and the storage of such information in the form of structured annotation databases. Such databases are of key relevance not only for biological but also for pharmacological and clinical research. A range of different types of chemical-protein/gene interactions are of key relevance for biology, including metabolic relations (e.g. substrates, products), inhibition, binding, or induction associations.
The DrugProt track aims to address these needs and to promote the development of systems able to extract chemical-protein interactions that might be of relevance for precision medicine as well as for drug discovery and basic biomedical research.
The DrugProt track in BioCreative VII (BC VII) will explore recognition of chemical-protein entity relations from abstracts.
Teams participating in this track are provided with:
PubMed abstracts
Manually annotated chemical compound mentions
Manually annotated gene/protein mentions
Manually annotated chemical compound-protein relations
Zip structure:
Training set folder with
drugprot_training_abstracts.tsv: PubMed records
drugprot_training_entities.tsv: manually labeled mention annotations of chemical compounds and genes/proteins
drugprot_training_relations.tsv: chemical-protein relation annotations
Development set folder with
drugprot_development_abstracts.tsv
drugprot_development_entities.tsv
drugprot_development_relations.tsv
Data format description
The input text files for the DrugProt track are plain-text, UTF-8-encoded PubMed records in a tab-separated format with the following three columns:
Article identifier (PMID, PubMed identifier)
Title of the article
Abstract of the article
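A minimal sketch of reading such a three-column file in Python is given here. The names `Abstract` and `read_abstracts` are hypothetical helpers introduced for illustration, and the sample record is made up rather than taken from the corpus. Quote handling is switched off on the assumption that quotation marks in titles and abstracts are literal text.

```python
import csv
import io
from typing import Dict, NamedTuple


class Abstract(NamedTuple):
    pmid: str
    title: str
    abstract: str


def read_abstracts(handle) -> Dict[str, Abstract]:
    """Read three-column, tab-separated PubMed records keyed by PMID."""
    records: Dict[str, Abstract] = {}
    # QUOTE_NONE: treat quote characters in titles/abstracts as literal text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(f"expected 3 columns, got {len(row)}: {row!r}")
        pmid, title, abstract = row
        records[pmid] = Abstract(pmid, title, abstract)
    return records


# Made-up sample record (not from the corpus), used only to exercise the parser.
sample = io.StringIO('12345\tA "toy" title.\tA toy abstract about aspirin and COX-1.\n')
abstracts = read_abstracts(sample)
print(abstracts["12345"].title)
```

For a file on disk, the same function could be called with `open(path, encoding="utf-8", newline="")`, where `path` points to, for example, drugprot_training_abstracts.tsv.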
DrugProt entity mention annotation files contain manually labeled mention annotations of chemical compounds and genes/proteins. Such files consist of tab-separated fields containing the following six columns:
Article identifier (PMID)
Term number (for this record)
Type of entity mention (CHEMICAL, GENE-Y, GENE-N)
Start character offset of the entity mention
End character offset of the entity mention
Text string of the entity mention
Each line contains one entity, and each entity is uniquely identified by its PMID and term number. Each annotation also contains an annotation type, the start offset (the index of the first character of the annotated span in the text), the end offset (the index of the first character after the annotated span), and the text spanned by the annotation.
Example DrugProt training entity mention annotations:
11808879 T1 GENE-Y 1860 1866 KIR6.2
11808879 T2 GENE-N 1993 2016 glutamate dehydrogenase
11808879 T3 GENE-Y 2242 2253 glucokinase
23017395 T1 CHEMICAL 216 223 HMG-CoA
23017395 T2 CHEMICAL 258 261 EPA
Example DrugProt development entity mention annotations (no distinction between GENE-Y and GENE-N):
11808879 T1 GENE 1860 1866 KIR6.2
11808879 T2 GENE 1993 2016 glutamate dehydrogenase
11808879 T3 GENE 2242 2253 glucokinase
23017395 T1 CHEMICAL 216 223 HMG-CoA
23017395 T2 CHEMICAL 258 261 EPA
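Because the end offset points to the first character after the span, a mention can be recovered with an ordinary half-open slice. The sketch below parses one of the example lines above and checks that the offsets agree with the mention length. `Entity`, `parse_entity_line`, and `span_matches` are hypothetical names, and the toy document text is made up. Which string the offsets index into (for instance, title and abstract joined together) is not stated here, so that part is left as an assumption for the caller.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    pmid: str
    term_id: str
    entity_type: str  # CHEMICAL, GENE-Y, GENE-N (or GENE in the development set)
    start: int        # index of the first character of the span
    end: int          # index of the first character after the span
    text: str


def parse_entity_line(line: str) -> Entity:
    """Parse one six-column, tab-separated entity annotation line."""
    pmid, term_id, entity_type, start, end, text = line.rstrip("\n").split("\t")
    return Entity(pmid, term_id, entity_type, int(start), int(end), text)


def span_matches(doc_text: str, entity: Entity) -> bool:
    """Check that slicing doc_text with the half-open offsets yields the mention."""
    return doc_text[entity.start:entity.end] == entity.text


# Example line from the entity annotations shown above.
entity = parse_entity_line("11808879\tT1\tGENE-Y\t1860\t1866\tKIR6.2")
print(entity.end - entity.start == len(entity.text))

# Made-up document text and annotation, used only to illustrate the slicing.
toy_text = "Aspirin inhibits COX-1."
toy_entity = parse_entity_line("0\tT2\tGENE\t17\t22\tCOX-1")
print(span_matches(toy_text, toy_entity))
```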
DrugProt relation annotations will be distributed as a file that contains the detailed chemical-protein relation annotations prepared for the DrugProt track. It consists of tab-separated columns containing:
Article identifier (PMID)
DrugProt relation
Interactor argument 1 (of type CHEMICAL)
Interactor argument 2 (of type GENE)
Each line contains one relation, and each relation is identified by the PMID, the relation type, and the two related entities. In the example below, to find the entities involved in the first relation, look up the entities with term identifiers T1 and T52 within PMID 12488248.
Example DrugProt relation annotations:
12488248 INHIBITOR Arg1:T1 Arg2:T52
12488248 INHIBITOR Arg1:T2 Arg2:T52
23220562 ACTIVATOR Arg1:T12 Arg2:T42
23220562 ACTIVATOR Arg1:T12 Arg2:T43
23220562 INDIRECT-DOWNREGULATOR Arg1:T1 Arg2:T14
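The lookup described above can be sketched as follows: parse a relation line, strip the `Arg1:` and `Arg2:` prefixes, and use the (PMID, term identifier) pair as a key into an entity index. `Relation` and `parse_relation_line` are hypothetical names, and the two mention strings in `entity_index` are placeholders, since the entities of PMID 12488248 are not listed in this description.

```python
from typing import Dict, NamedTuple, Tuple


class Relation(NamedTuple):
    pmid: str
    relation_type: str
    arg1: str  # term identifier of the CHEMICAL argument
    arg2: str  # term identifier of the GENE argument


def parse_relation_line(line: str) -> Relation:
    """Parse one four-column, tab-separated relation annotation line."""
    pmid, relation_type, arg1, arg2 = line.rstrip("\n").split("\t")
    # Drop the "Arg1:" / "Arg2:" prefixes to obtain bare term identifiers.
    return Relation(pmid, relation_type, arg1.split(":", 1)[1], arg2.split(":", 1)[1])


# Placeholder entity index: (PMID, term identifier) -> mention text.
# The mention strings are invented for illustration only.
entity_index: Dict[Tuple[str, str], str] = {
    ("12488248", "T1"): "compound A",
    ("12488248", "T52"): "protein B",
}

relation = parse_relation_line("12488248\tINHIBITOR\tArg1:T1\tArg2:T52")
chemical = entity_index[(relation.pmid, relation.arg1)]
gene = entity_index[(relation.pmid, relation.arg2)]
print(f"{chemical} -[{relation.relation_type}]-> {gene}")
```

In practice the index would be built from the entity annotation file of the same set, so that term identifiers are always resolved within their own PMID.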
Please cite:
@inproceedings{krallinger2017overview, title={Overview of the BioCreative VI chemical-protein interaction Track}, author={Krallinger, Martin and Rabal, Obdulia and Akhondi, Saber A and P{\'e}rez, Mart{\i}n P{\'e}rez and Santamar{\'\i}a, Jes{\'u}s and Rodr{\'\i}guez, Gael P{\'e}rez and others}, booktitle={Proceedings of the sixth BioCreative challenge evaluation workshop}, volume={1}, pages={141--146}, year={2017}}
Summary statistics:
                      Training set   Development set
Documents             3500           750
Tokens                1001168        199620
Annotated Entities    89529          18858
Annotated Relations   17288          3765
Annotated Entities:
                             Training Entities   Development Entities
CHEMICAL                     46274               9853
GENE-Y [Normalizable]        28421               -
GENE-N [Non-Normalizable]    14834               -
Gene Total (N+Y)             43255               9005
Total                        89529               18858
Annotated Relations:
                          Training Relations   Development Relations
INDIRECT-DOWNREGULATOR    1330                 332
INDIRECT-UPREGULATOR      1379                 302
DIRECT-REGULATOR          2250                 458
ACTIVATOR                 1429                 246
INHIBITOR                 5392                 1152
AGONIST                   659                  131
AGONIST-ACTIVATOR         29                   10
AGONIST-INHIBITOR         13                   2
ANTAGONIST                972                  218
PRODUCT-OF                921                  158
SUBSTRATE                 2003                 495
SUBSTRATE_PRODUCT-OF      25                   3
PART-OF                   886                  258
Total                     17288                3765
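As a quick consistency check, the per-class relation counts listed above can be summed and compared with the reported totals. This is only a sketch: the dictionary name `relation_counts` is an arbitrary choice, and the numbers are copied from the table.

```python
# (training, development) counts per relation class, copied from the table above.
relation_counts = {
    "INDIRECT-DOWNREGULATOR": (1330, 332),
    "INDIRECT-UPREGULATOR": (1379, 302),
    "DIRECT-REGULATOR": (2250, 458),
    "ACTIVATOR": (1429, 246),
    "INHIBITOR": (5392, 1152),
    "AGONIST": (659, 131),
    "AGONIST-ACTIVATOR": (29, 10),
    "AGONIST-INHIBITOR": (13, 2),
    "ANTAGONIST": (972, 218),
    "PRODUCT-OF": (921, 158),
    "SUBSTRATE": (2003, 495),
    "SUBSTRATE_PRODUCT-OF": (25, 3),
    "PART-OF": (886, 258),
}

train_total = sum(train for train, _ in relation_counts.values())
dev_total = sum(dev for _, dev in relation_counts.values())
print(train_total, dev_total)  # prints: 17288 3765
```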
For further information, please visit https://biocreative.bioinformatics.udel.edu/tasks/biocreative-vii/track-1/ or email us at [email protected] and [email protected]
Related resources:
Web
Evaluation library
Relation annotation guidelines
Gene and protein annotation guidelines
Chemicals and drugs annotation guidelines
FAQ
The DrugProt corpus is promoted by the Plan de Impulso de las Tecnologías del Lenguaje de la Agenda Digital (Plan TL).
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first of a cited work's authors, and a non-traditional type, which takes into account the first 5 authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
Variations on the Author
“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourses; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.
Appropriate Similarity Measures for Author Cocitation Analysis
We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.
Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
Dispelling the Myths Behind First-author Citation Counts
We conducted a full-scale evaluative citation analysis study of scholars in the XML research field to explore just how different from each other author rankings resulting from different citation counting methods actually are, and to demonstrate the capability of emerging data and tools on the Web in supporting more realistic citation counting methods. Our results contest some common arguments for the continued use of first-author citation counts in the evaluation of scholars, such as high correlations between author rankings by first-author citation counts and other citation counting methods, and high costs of using more realistic citation counting methods that are not well-supported by the ISI databases. It is argued that increasingly available digital full-text research papers make it possible for citation analysis studies to go beyond what the ISI databases have directly supported and to employ more sophisticated methods.
koamabayili/VECTRON-author-checklist: VECTRON author checklist
We have done our best to complete the author checklist relating to the use of animals in the hut study. Note that the objective of the hut study was to evaluate the IRS treatment applications for residual efficacy against Anopheles mosquitoes, including the local An. coluzzii mosquito population. Cows were only used to attract mosquitoes into the huts, and no tests were carried out directly on the cows. The author checklist is intended for use with studies where experiments are carried out on animals, which is why we have had such difficulty in completing it for the hut study, as many of the questions do not relate to how the cows were used.
Author-wise bibliometric analysis based on entropy.
