    Assessment of the E3C corpus for the recognition of disorders in clinical texts

    Disorder named entity recognition (DNER) is a fundamental task of biomedical natural language processing that has attracted considerable attention. The task consists of extracting named entities of disorders, such as diseases, symptoms, and pathological functions, from unstructured text. The European Clinical Case Corpus (E3C) is a freely available multilingual corpus (English, French, Italian, Spanish, and Basque) of semantically annotated clinical case texts. The entities of type disorder in the clinical cases are annotated at both mention and concept level. At mention level, the annotation identifies the entity text spans, for example, abdominal pain. At concept level, the entity text spans are associated with their concept identifiers in the Unified Medical Language System (UMLS), for example, C0000737. This corpus can be exploited as a benchmark for training and assessing information extraction systems. In the present work, multiple experiments were conducted to test the appropriateness of the mention-level annotation of the E3C corpus for training DNER models. In these experiments, traditional machine learning models such as conditional random fields and more recent multilingual pre-trained models based on deep learning were compared with standard baselines. The multilingual pre-trained models were fine-tuned (i) on each language of the corpus to test per-language performance, (ii) on all languages to test multilingual learning, and (iii) on all languages except the target language to test cross-lingual transfer learning. The results show that the E3C corpus is appropriate for training a system capable of mining disorder entities from clinical case texts. Researchers can use these results as baselines for this corpus against which to compare their own models. The implemented models have been made available through the European Language Grid platform for quick and easy access.
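    To train sequence labelers such as conditional random fields or fine-tuned transformers, span-level (mention-level) annotations are commonly converted into token-level BIO tags. The sketch below assumes plain whitespace tokenization and character-offset spans; the span format, the `DISORDER` label and all helper names are hypothetical and do not reproduce the E3C distribution format. It also encodes the three fine-tuning settings listed in the abstract as a choice of training languages.

```python
from typing import List, Tuple

# Hypothetical span format: (start_char, end_char, label), end exclusive.
Span = Tuple[int, int, str]

# The five E3C languages, as ISO 639-1 codes.
LANGUAGES = ["en", "fr", "it", "es", "eu"]


def tokenize_with_offsets(text: str) -> List[Tuple[str, int, int]]:
    """Whitespace tokenization that keeps each token's character offsets."""
    tokens = []
    pos = 0
    for tok in text.split():
        start = text.index(tok, pos)
        end = start + len(tok)
        tokens.append((tok, start, end))
        pos = end
    return tokens


def spans_to_bio(text: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Tag tokens inside an annotated span with B-/I-<label>, all others with O."""
    tagged = []
    for tok, start, end in tokenize_with_offsets(text):
        tag = "O"
        for span_start, span_end, label in spans:
            if start >= span_start and end <= span_end:
                # The first token of a span opens it; later tokens continue it.
                tag = ("B-" if start == span_start else "I-") + label
                break
        tagged.append((tok, tag))
    return tagged


def training_languages(setting: str, target: str) -> List[str]:
    """Fine-tuning languages for the three settings: per-language, multilingual, cross-lingual."""
    if setting == "per-language":
        return [target]
    if setting == "multilingual":
        return list(LANGUAGES)
    if setting == "cross-lingual":
        # Train on every language except the one being evaluated.
        return [lang for lang in LANGUAGES if lang != target]
    raise ValueError(f"unknown setting: {setting}")


if __name__ == "__main__":
    sentence = "Patient reports abdominal pain since Monday"
    start = sentence.index("abdominal pain")
    for token, tag in spans_to_bio(sentence, [(start, start + 14, "DISORDER")]):
        print(token, tag)
    print(training_languages("cross-lingual", "eu"))
```

    Punctuation glued to a word (for example "pain,") would break the whitespace tokenizer's alignment with span boundaries, so a real pipeline would use the tokenizer of the model being fine-tuned.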

    DockingApp RF: A State-of-the-Art Novel Scoring Function for Molecular Docking in a User-Friendly Interface to AutoDock Vina

    Motivation: Bringing a new drug to the market is expensive and time-consuming. To cut costs and time, computer-aided drug design (CADD) approaches have been increasingly included in the drug discovery pipeline. However, while traditional docking tools sample the conformational space well, they are still unable to produce accurate binding affinity predictions. This work presents a novel scoring function for molecular docking seamlessly integrated into DockingApp, a user-friendly graphical interface for AutoDock Vina. The proposed function is based on a random forest model and a selection of specific features to overcome the limits of Vina's original scoring mechanism. A new version of DockingApp, named DockingApp RF, has been developed to host the proposed scoring function and to automate the rescoring of AutoDock Vina's output, making it accessible even to nonexpert users. Results: By coupling intermolecular interaction and solvent accessible surface area features with Vina's energy terms, DockingApp RF's new scoring function improves the binding affinity predictions of AutoDock Vina. Furthermore, comparison tests carried out on the CASF-2013 and CASF-2016 datasets demonstrate that DockingApp RF's performance is comparable to that of other state-of-the-art machine-learning- and deep-learning-based scoring functions. The new scoring function thus represents a significant advance in the reliability and effectiveness of docking compared with AutoDock Vina's scoring function. At the same time, the characteristics that made DockingApp appealing to a wide range of users are retained in this new version and have been complemented with additional features.
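    The CASF benchmarks assess a scoring function's "scoring power" as the Pearson correlation between predicted and experimentally measured binding affinities. The following minimal sketch shows that computation, together with one way to assemble the three feature groups named in the abstract into a fixed-order vector for a random forest; the feature names and helper functions are illustrative assumptions, not DockingApp RF's actual feature set or code.

```python
import math
from typing import Dict, List, Sequence


def build_feature_vector(vina_terms: Dict[str, float],
                         sasa_features: Dict[str, float],
                         interaction_features: Dict[str, float]) -> List[float]:
    """Concatenate the three feature groups, each in sorted key order.

    A fixed ordering matters: models trained on plain numeric arrays,
    as random forest implementations typically are, identify features
    by column position rather than by name.
    """
    vector: List[float] = []
    for group in (vina_terms, sasa_features, interaction_features):
        vector.extend(group[key] for key in sorted(group))
    return vector


def pearson_r(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Pearson correlation between predicted and measured binding affinities."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_m)
```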

    FGDB: a comprehensive graph database of ligand fragments from the Protein Data Bank

    This work presents Fragment Graph DataBase (FGDB), a graph database of ligand fragments extracted and generated from the protein entries available in the Protein Data Bank (PDB). FGDB is meant to support and elicit fragment-based drug design campaigns by enabling users to query it to construct ad hoc, target-specific libraries. In this regard, the database features more than 17 000 fragments, typically small, highly soluble and chemically stable molecules, expressed via their canonical Simplified Molecular Input Line Entry System (SMILES) representation. For these fragments, the database provides information on their contact frequencies with amino acids, the ligands they are contained in, and the proteins those ligands bind to. The graph database can be queried via standard web forms and textual searches by a number of identifiers (SMILES, ligand and protein PDB IDs), as well as via graphical queries performed against the graph itself, providing users with an intuitive and effective view of the underlying biological entities. Advanced conjunctive/disjunctive/negated textual queries are also supported, allowing scientists to look for specific relationships and export their results for further studies. This work also presents two sample use cases where maternal embryonic leucine zipper kinase and mesotrypsin are used as targets, as proteins of high biomedical relevance for the development of cancer therapies. Database URL: http://biochimica3.bio.uniroma3.it/fragments-web
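    The fragment-ligand-protein relationships described in the abstract map naturally onto a graph, and conjunctive, disjunctive and negated queries then become set operations over the proteins reachable from each fragment. This is a minimal in-memory sketch under that assumption; the class, its methods and any example identifiers are hypothetical and unrelated to FGDB's actual storage engine or query interface.

```python
from collections import defaultdict
from typing import AbstractSet, Dict, Set


class FragmentGraph:
    """Toy fragment -> ligand -> protein graph (a sketch, not the FGDB schema)."""

    def __init__(self) -> None:
        self.fragment_to_ligands: Dict[str, Set[str]] = defaultdict(set)
        self.ligand_to_proteins: Dict[str, Set[str]] = defaultdict(set)

    def add_fragment(self, smiles: str, ligand: str) -> None:
        """Record that a fragment (canonical SMILES) occurs in a ligand."""
        self.fragment_to_ligands[smiles].add(ligand)

    def add_binding(self, ligand: str, protein: str) -> None:
        """Record that a ligand binds a protein."""
        self.ligand_to_proteins[ligand].add(protein)

    def proteins_for_fragment(self, smiles: str) -> Set[str]:
        """Follow fragment -> ligand -> protein edges."""
        proteins: Set[str] = set()
        for ligand in self.fragment_to_ligands.get(smiles, set()):
            proteins |= self.ligand_to_proteins.get(ligand, set())
        return proteins

    def query(self,
              all_of: AbstractSet[str] = frozenset(),
              any_of: AbstractSet[str] = frozenset(),
              none_of: AbstractSet[str] = frozenset()) -> Set[str]:
        """Fragments whose reachable proteins satisfy AND, OR and NOT conditions."""
        hits: Set[str] = set()
        for smiles in self.fragment_to_ligands:
            targets = self.proteins_for_fragment(smiles)
            if not set(all_of) <= targets:               # conjunctive
                continue
            if any_of and not set(any_of) & targets:     # disjunctive
                continue
            if set(none_of) & targets:                   # negated
                continue
            hits.add(smiles)
        return hits
```

    For example, `query(any_of={"PROTEASE"}, none_of={"KINASE"})` would return the fragments that reach some protease-labelled protein without reaching a kinase-labelled one.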

    Neural Network-Based Imitation Learning for Approximating Stochastic Battery Management Systems

    Lithium-ion batteries play a pivotal role in enabling eco-friendly mobility, particularly in electric vehicles, but optimizing their charging process to improve battery lifespan, safety, and overall efficiency remains a significant challenge. Traditional predictive control methods are limited by their reliance on precise models, whose accuracy is often undermined by uncertainties in battery parameters due to aging, production variability, and operating conditions. While stochastic predictive control policies can address these uncertainties by incorporating them directly into the optimization process, they typically introduce considerable computational complexity. In response to this challenge, this paper presents a novel approach that adapts imitation learning to efficiently approximate stochastic predictive control strategies, significantly reducing the computational burden through offline training. Specifically, the proposed method leverages the Dataset Aggregation algorithm to overcome distributional shift, a common limitation of imitation learning frameworks. Simulations based on a detailed electrochemical model demonstrate the effectiveness of the method, which satisfies probabilistic constraints while offering a scalable and computationally efficient solution for advanced battery management systems.
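    Dataset Aggregation (DAgger) repeatedly rolls out the current learned policy, asks the expert to label the states that policy actually visits, aggregates those labels into the training set and retrains. The sketch below shows that loop on a deliberately tiny problem, under loudly stated assumptions: the "expert" is a cheap hypothetical charging rule standing in for the stochastic predictive controller, the dynamics are a one-dimensional toy rather than an electrochemical model, and a nearest-neighbour lookup replaces the neural network used in the paper.

```python
import random
from typing import Callable, List, Tuple

Policy = Callable[[float], float]


def expert_policy(soc: float) -> float:
    """Hypothetical stand-in for the expensive stochastic predictive controller.

    Charges toward 80% state of charge, with current normalized to [0, 1].
    """
    return max(0.0, min(1.0, 2.0 * (0.8 - soc)))


def step(soc: float, current: float, rng: random.Random) -> float:
    """Toy charging dynamics with an uncertain efficiency (not an electrochemical model)."""
    efficiency = 0.1 * (1.0 + rng.gauss(0.0, 0.05))
    return min(1.0, soc + efficiency * current)


def fit_nearest_neighbour(data: List[Tuple[float, float]]) -> Policy:
    """Learner: return the expert label of the closest stored state."""
    snapshot = list(data)  # freeze the dataset this learner was trained on

    def policy(soc: float) -> float:
        return min(snapshot, key=lambda pair: abs(pair[0] - soc))[1]

    return policy


def rollout(policy: Policy, soc0: float, horizon: int, rng: random.Random) -> List[float]:
    """States visited when `policy` drives the system for `horizon` steps."""
    states, soc = [], soc0
    for _ in range(horizon):
        states.append(soc)
        soc = step(soc, policy(soc), rng)
    return states


def dagger(iterations: int = 5, horizon: int = 20,
           seed: int = 0) -> Tuple[Policy, List[Tuple[float, float]]]:
    """DAgger variant: expert rollouts first, then learner-only rollouts."""
    rng = random.Random(seed)
    visited = rollout(expert_policy, rng.uniform(0.1, 0.5), horizon, rng)
    data = [(s, expert_policy(s)) for s in visited]
    learner = fit_nearest_neighbour(data)
    for _ in range(iterations):
        # Roll out the learner so the data covers the states *it* reaches,
        # then label those states with the expert's actions and aggregate.
        visited = rollout(learner, rng.uniform(0.1, 0.5), horizon, rng)
        data.extend((s, expert_policy(s)) for s in visited)
        learner = fit_nearest_neighbour(data)
    return learner, data
```

    Rolling out the learner rather than the expert is the essential design choice: the states the learner drifts into are exactly those missing from pure expert demonstrations, which is the distributional shift the abstract refers to. The expensive expert is only queried offline, during training.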

    Protein-ligand binding site detection as an alternative route to molecular docking and drug repurposing

    In the post-genomic era, the detection of ligand binding sites in proteins has emerged over the last few years as a powerful tool for protein function prediction. Several approaches, both sequence- and structure-based, have been developed, but the full potential of the corresponding tools has not been exploited yet. Here, we describe the development and classification of a large, almost exhaustive, collection of protein-ligand binding sites to be used, in conjunction with the Ligand Binding Site Recognition Application Web Application developed in our laboratory, as an alternative to virtual screening through molecular docking simulations to identify novel lead compounds for known targets. Ligand binding sites derived from the Protein Data Bank have been clustered according to ligand similarity, so that, given a known ligand, the binding mode of related ligands to the same target can be predicted. The collection contains more than 200,000 sites corresponding to more than 20,000 different ligands. Furthermore, the ligand binding sites of all Food and Drug Administration (FDA)-approved drugs have been classified as well, making it possible to investigate the possible binding of each of them (and of related compounds) to a given target for drug repurposing and redesign initiatives. Sample use cases are also described to demonstrate the effectiveness of this approach.
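    Clustering binding sites by ligand similarity requires a ligand similarity measure; a common choice for binary fingerprints is the Tanimoto coefficient. The following sketch assumes fingerprints given as sets of on-bit indices and uses a simple greedy leader clustering plus a similarity lookup. The threshold, the clustering scheme and all names are assumptions for illustration, not the procedure used to build the collection.

```python
from typing import Dict, FrozenSet, List

Fingerprint = FrozenSet[int]  # indices of the "on" bits of a binary fingerprint


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity |A & B| / |A | B| of two bit-set fingerprints."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def leader_cluster(fps: Dict[str, Fingerprint], threshold: float) -> List[List[str]]:
    """Greedy leader clustering: join the first cluster whose leader is similar enough."""
    clusters: List[List[str]] = []
    for ligand, fp in fps.items():
        for cluster in clusters:
            if tanimoto(fp, fps[cluster[0]]) >= threshold:
                cluster.append(ligand)
                break
        else:  # no sufficiently similar leader: start a new cluster
            clusters.append([ligand])
    return clusters


def candidate_sites(query: Fingerprint,
                    fps: Dict[str, Fingerprint],
                    sites_by_ligand: Dict[str, List[str]],
                    threshold: float) -> List[str]:
    """Binding sites of known ligands similar to `query`, as candidate binding modes."""
    return sorted(site
                  for ligand, fp in fps.items()
                  if tanimoto(query, fp) >= threshold
                  for site in sites_by_ligand.get(ligand, []))
```

    Leader clustering depends on input order; it is used here only because it is short. Order-independent schemes such as Butina clustering are common alternatives.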

    Ontology-driven Generation of Training Paths in the Legal Domain

    This paper presents a methodology for helping citizens obtain guidance and training when they submit a natural language description of a legal case they are interested in. This is done via an automatic mechanism that first extracts relevant legal concepts from the given textual description, relying on an underlying legal ontology built for this purpose and on an enrichment process based on common-sense knowledge. It then generates a training path meant to give citizens a better understanding of the legal issues arising from the given case, with links to relevant laws and jurisprudence retrieved from an external legal repository. This work describes the creation of the underlying legal ontology from existing sources and the ontology integration algorithm used to produce it; it also details the generation of the training paths and reports the results of the preliminary experimentation carried out so far. This methodology has been implemented in an Online Dispute Resolution (ODR) system that is part of an Italian initiative for assisted legal mediation.
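    The two steps described in this abstract, concept extraction followed by path generation, can be sketched with a toy ontology in which each concept lists its surface labels and its prerequisite concepts. In this sketch, extraction is naive substring matching (standing in for the ontology-based extraction and common-sense enrichment) and the training path is a prerequisite-first (topological) ordering; the ontology content and function names are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical mini-ontology: concept -> surface labels and prerequisite concepts.
ONTOLOGY: Dict[str, Dict[str, List[str]]] = {
    "contract": {"labels": ["contract", "agreement"], "requires": []},
    "lease": {"labels": ["lease", "rent"], "requires": ["contract"]},
    "breach_of_contract": {"labels": ["breach", "did not pay"], "requires": ["contract"]},
    "mediation": {"labels": ["mediation", "mediator"], "requires": []},
}


def extract_concepts(text: str) -> List[str]:
    """Naive extraction: a concept matches if any of its labels occurs in the text."""
    lowered = text.lower()
    return [concept for concept, entry in ONTOLOGY.items()
            if any(label in lowered for label in entry["labels"])]


def training_path(concepts: List[str]) -> List[str]:
    """Order concepts so every prerequisite precedes the concepts that need it.

    Depth-first post-order traversal; assumes the prerequisite graph is acyclic,
    which holds for the toy ontology above.
    """
    path: List[str] = []
    seen: Set[str] = set()

    def visit(concept: str) -> None:
        if concept in seen:
            return
        seen.add(concept)
        for prereq in ONTOLOGY[concept]["requires"]:
            visit(prereq)
        path.append(concept)

    for concept in concepts:
        visit(concept)
    return path
```

    Each step of the resulting path would then be linked to the relevant laws and case law retrieved from the external repository.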

    ARISTOTELE: An environment for managing knowledge-intensive enterprises

    We present ARISTOTELE, a platform for managing the activities of knowledge-intensive enterprises through the integration of tools and services for knowledge discovery, competence management, collaborative work and adaptive learning. We first present an overview of the platform, with its main characteristics and high-level tools. Then, we delve deeper into the conceptual models established to represent semantic data and knowledge coming from both inside and outside the adopting organization. Finally, we detail and motivate its architectural choices by describing the service and data layers that make it up and how they fit within the whole platform.

    DockingApp: a user friendly interface for facilitated docking simulations with AutoDock Vina

    Molecular docking is a powerful technique that helps uncover the structural and energetic bases of the interaction between macromolecules and substrates, endogenous and exogenous ligands, and inhibitors. Moreover, this technique plays a pivotal role in accelerating the screening of large libraries of compounds for drug development purposes. The need to promote community-driven drug development efforts, especially for neglected diseases, calls for user-friendly tools that allow non-expert users to exploit the full potential of molecular docking. Along this path, this work describes the implementation of DockingApp, a freely available, extremely user-friendly, platform-independent application for performing docking simulations and virtual screening tasks using AutoDock Vina. DockingApp sports an intuitive graphical user interface that greatly facilitates both the input phase and the analysis of the results, which can be visualized in graphical form using the embedded JMol applet. The application comes with the DrugBank set of more than 1400 ready-to-dock, FDA-approved drugs, to facilitate virtual screening and drug repurposing initiatives. Furthermore, other compound databases such as ZINC, also available in AutoDock format, can be readily plugged in.
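    A front end such as DockingApp ultimately has to drive AutoDock Vina once per receptor-ligand pair. The sketch below only builds the argument lists for a batch screening run, using standard Vina command-line options (`--receptor`, `--ligand`, `--center_x`/`--size_x` and their y and z counterparts, `--exhaustiveness`, `--out`). The helper names, box values and output naming scheme are assumptions, and nothing here reflects how DockingApp is implemented internally.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List, Sequence, Tuple


@dataclass
class SearchBox:
    """Docking search space: box center and edge lengths, in Angstrom."""
    center: Tuple[float, float, float]
    size: Tuple[float, float, float]


def vina_command(receptor: str, ligand: str, box: SearchBox,
                 out_dir: str, exhaustiveness: int = 8) -> List[str]:
    """Argument list for one AutoDock Vina run (receptor and ligand in PDBQT format)."""
    out_file = PurePosixPath(out_dir) / (PurePosixPath(ligand).stem + "_out.pdbqt")
    cmd = ["vina", "--receptor", receptor, "--ligand", ligand]
    for axis, center, size in zip("xyz", box.center, box.size):
        cmd += [f"--center_{axis}", str(center), f"--size_{axis}", str(size)]
    cmd += ["--exhaustiveness", str(exhaustiveness), "--out", str(out_file)]
    return cmd


def screening_commands(receptor: str, ligands: Sequence[str],
                       box: SearchBox, out_dir: str) -> List[List[str]]:
    """One Vina invocation per ligand, all against the same receptor and box."""
    return [vina_command(receptor, ligand, box, out_dir) for ligand in ligands]
```

    If the `vina` executable is on the `PATH`, each list can be passed to `subprocess.run(cmd, check=True)`. `PurePosixPath` keeps the generated paths identical across platforms in this example; real code would normally use `pathlib.Path`.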

    LIBRA-WA: a web application for ligand binding site detection and protein function recognition

    Recently, LIBRA, a tool for active/ligand binding site prediction, was described. LIBRA's effectiveness was comparable to that of similar state-of-the-art tools; however, its scoring scheme, output presentation, dependence on local resources and overall convenience were amenable to improvement. To solve these issues, LIBRA-WA, a web application based on an improved LIBRA engine, has been developed. It features a novel scoring scheme that consistently improves LIBRA's performance, and a refined algorithm that can identify binding sites hosted at the interface between different subunits. LIBRA-WA also sports additional functionalities such as ligand clustering and a completely redesigned interface for easier analysis of the output. Extensive tests on 373 apoprotein structures indicate that LIBRA-WA is able to identify the biologically relevant ligand/ligand binding site in 357 cases (~96%), with the correct prediction ranking first in 349 cases (~98% of the latter, ~94% of the total). The earlier stand-alone tool has also been updated and dubbed LIBRA+, by integrating LIBRA-WA's improved engine for cross-compatibility purposes.
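    The percentages quoted for the 373-structure test follow from the raw counts (correct site found, correct site ranked first among the found cases, and ranked first overall):

```latex
\frac{357}{373} \approx 0.957 \ (\approx 96\%), \qquad
\frac{349}{357} \approx 0.978 \ (\approx 98\%), \qquad
\frac{349}{373} \approx 0.936 \ (\approx 94\%)
```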

    A Semantic-Based Architecture for Collaborative Enterprise Management: The ARISTOTELE Platform

    We present the semantic-based architecture of the ARISTOTELE platform, which is based on the definition and development of models, methodologies, technologies and tools that support the emergence of competences and creativity in workers through the self-organized acquisition, processing and sharing of new information inside knowledge-intensive organizations. ARISTOTELE's architecture relies on semantic data by means of a number of conceptual models, which define an enterprise's context of interest via a set of concepts and the relationships among them. Instances of these models are used to annotate content data, thus creating a semantic network of information that realizes the Linked Data paradigm within the information space of an organization. In this paper we describe the building elements of the ARISTOTELE platform, the conceptual models behind them and the core Linked Data Layer component responsible for managing information for the whole system.
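    The annotation scheme described in this abstract, where instances of conceptual models are attached to content, yields subject-predicate-object statements in the Linked Data style. The sketch below uses a tiny in-memory triple set with wildcard matching and a two-hop query (from a worker's competence, to a concept, to the documents about it). The `ex:` identifiers, predicates and helpers are hypothetical; a production system would typically rely on RDF tooling and SPARQL instead.

```python
from typing import Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TripleStore:
    """Minimal in-memory set of (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> List[Triple]:
        """Triples matching the pattern, sorted; None acts as a wildcard."""
        return sorted(t for t in self.triples
                      if (s is None or t[0] == s)
                      and (p is None or t[1] == p)
                      and (o is None or t[2] == o))


def annotate(store: TripleStore, document: str, concepts: Iterable[str]) -> None:
    """Link a content item to the concept instances it is about."""
    for concept in concepts:
        store.add(document, "ex:about", concept)


def documents_for_worker(store: TripleStore, worker: str) -> List[str]:
    """Two hops: worker -> competence concept -> documents annotated with it."""
    docs: Set[str] = set()
    for _, _, concept in store.match(worker, "ex:hasCompetence", None):
        for doc, _, _ in store.match(None, "ex:about", concept):
            docs.add(doc)
    return sorted(docs)
```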