On the Use of Knowledge Transfer Techniques for Biomedical Named Entity Recognition
Biomedical named entity recognition (BioNER) is a preliminary task for many other tasks, e.g., relation extraction and semantic search. Extracting the text of interest from biomedical documents becomes more demanding as the volume of online data increases. Deep learning models have been adopted for BioNER because deep learning has proved very successful in many other tasks. Nevertheless, the complex structure of biomedical text remains challenging for deep learning models, and the limited supply of annotated biomedical text makes it harder to train models with millions of trainable parameters. A single-task model, which focuses on learning one specific task, struggles to learn complex feature representations from a limited quantity of annotated data. Moreover, manually constructing annotated data is time-consuming. It is therefore vital to find other efficient ways to train deep learning models on the available annotated data. This work improves BioNER performance by taking advantage of two knowledge transfer techniques: multitask learning and transfer learning. It presents two multitask models (MTMs), which learn shared and task-specific features through shared and task-specific layers. In addition, the trained MTM is fine-tuned on each specific dataset to tailor it from a general feature representation to a specialized one. The empirical results and statistical analysis show that the proposed techniques significantly improve on the performance of the corresponding single-task model (STM).
Goal Recognition with Deep Learning and Embedded Representation of State Traces
The identification of the goal that an agent is going to achieve is an important task with several applications in robotics and security. Although several approaches to Goal Recognition (GR) rely on automated planning techniques, this task has recently been addressed by GRNet, which exploits deep learning techniques and has reached a new state of the art, solving GR instances more accurately and more quickly. The input required by GRNet is a trace of actions, given as the names of the observed actions. However, we intend to study this approach when the input is a state trace instead of an action trace. In this situation, two problems arise immediately: how can a state be encoded in a form that a neural network can process, and can a sequence of states be analysed with the same techniques used for actions? In this work, we propose a modification of GRNet that makes it effective also for observations consisting of traces of states. In particular, we add an autoencoder capable of deriving a numerical representation of a state. We then perform an experimental analysis over two well-known benchmark domains.
Supervised Bias Detection in Transformers-based Language Models
Training Large Language Models on biased datasets tends to teach discriminatory behavior to the systems themselves, as shown by the recent literature on fairness in AI and Machine Learning algorithms. Existing bias-detection strategies often ignore the internals of the model, which makes the methodology easy to generalize but makes the underlying motivations harder to understand. In this paper, we present a general approach for detecting unwanted prejudices in Language Models that requires only a small set of input data. Our strategy works on the embedding representation of languages, without any constraint on model architecture, yet it is able to detect which parts of the representation are the most prejudice-affected. © 2024 Copyright for this paper by its authors
On the Behaviour of BERT’s Attention for the Classification of Medical Reports
Since BERT and other Transformer-based models have proved successful in many NLP tasks, several studies have been conducted to understand why these complex deep learning architectures reach such remarkable results. Such studies have focused on visualising and analysing the behaviour of each self-attention mechanism, and they are often conducted with large, generic, annotated datasets for the English language, using supervised probing tasks to test specific linguistic capabilities. However, several practical contexts pose difficulties: probing tasks may not be available, the documents can contain a strict technical lexicon, and the datasets can be noisy. In this work we analyse the behaviour of BERT in a specific context, i.e. the classification of radiology reports collected from an Italian hospital. We propose (i) a simplified way to classify head patterns without relying on probing tasks or manual observations, and (ii) an algorithm for extracting the most relevant relations among words captured by each self-attention mechanism. Combining these techniques with manual observations, we present several examples of linguistic information that can be extracted from BERT in our application.
Learning General Policies for Planning through GPT Models
Transformer-based architectures, such as T5, BERT and GPT, have demonstrated revolutionary capabilities in Natural Language Processing. Several studies have shown that deep learning models using these architectures not only possess remarkable linguistic knowledge, but also exhibit forms of factual knowledge, common sense, and even programming skills. However, the scientific community still debates their reasoning capabilities, which have recently been tested in the context of automated AI planning; the literature presents mixed results, and the prevailing view is that current transformer-based models may not be adequate for planning. In this paper, we address this challenge differently. We introduce a GPT-based model customised for planning (PLANGPT) that learns a general policy for classical planning by being trained from scratch on a dataset of solved planning instances. Once PLANGPT has been trained for a domain, it can be used to generate a solution plan for an input problem instance in that domain. Our training procedure exploits automated planning knowledge to enhance the performance of the trained model. We build and evaluate our GPT model on several planning domains, and we compare its performance with other recent deep learning techniques for generalised planning, demonstrating the effectiveness of the proposed approach.
