1,720,966 research outputs found

    Letter: Shifting focus—From ChatGPT to specialised medical LLMs: Authors' reply

    Editors, We appreciate the comments provided by Wang et al.1 and the idea of having specialised medical large language models (LLMs) over general models (e.g. ChatGPT). We concur with the authors that, ideally, we would have specialised medical LLMs that leverage domain-specific datasets from sources such as PubMed and the Cochrane Library. How to achieve this, however, can take different forms: utilising prompt architecture tweaks such as retrieval augmented generation with medical guidelines,2 altering the weights of the language model through fine-tuning2, 3 with medical expertise or real/synthetic4 patients' data (clinical, imaging or genomics) derived from electronic health records, moulding the language models to reflect human preferences through reinforcement learning with human feedback, or training language models with tasks specified to match specialised clinical goals. Our review quantifies the variability of general-purpose LLMs like ChatGPT in the field of gastroenterology and hepatology in terms of accuracy (from 6.4% to 91.4%),5 which highlights the nascent stage of LLM application in specialised medical fields. This variability underscores not only the current limitations when general LLMs tackle complex medical topics but also the crucial need for advancements that focus on specialised fields. While general-purpose models like ChatGPT are groundbreaking, they are preliminary steps towards more refined applications. Our findings advocate for a prudent approach to developing next-generation LLMs that meet the stringent requirements of clinical accuracy and reliability and reduce the risk of patient harm due to plausible-sounding but inaccurate answers (i.e., hallucinations).6, 7 As we look to the future, it is crucial to understand whether LLMs can provide accurate answers and perform clinical reasoning tasks.8, 9 Clinical reasoning involves complex decision-making processes that may not be fully captured by existing model training paradigms. 
    This points to a fundamental requirement for defining new tasks that are custom-tailored to facilitate such sophisticated functionalities in LLMs. We appreciate the dialogue initiated by the comments and are excited about the potential transformations in health care that specialised LLMs could bring. We are committed to contributing robustly to this evolving field and to furthering the discourse on effectively integrating LLM technologies into healthcare practices.

    Does Ectopic Beats Bring More Discriminatory Information to Diagnose Ischemic Heart Disease?

    Early non-invasive diagnosis of Ischemic Heart Disease (IHD) can often be challenging. Heart rate variability (HRV) features have a potentially important role in risk stratification for subjects with suspected heart disease. However, there is no consensus on the HRV preprocessing steps, particularly on how to properly treat ectopic beats. We aimed to compare models classifying early IHD versus healthy control subjects (HC) when the HRV features were extracted from signals excluding ectopic beats and when the same features were extracted from signals containing both ectopic and normal heartbeats. This study encompassed 385 subjects (170 IHD and 215 HC). The models were built with logistic regression on two sets of HRV features, one per preprocessing approach. The results showed that the model whose input features came from HRV signals including normal and ectopic beats achieved a higher classification accuracy (72.7%) than the model based on features extracted only from normal heartbeats (67.8%). In addition, the evaluation of feature importance through analysis of the resulting nomograms, together with the significant differences observed between features extracted with the two preprocessing approaches, also showed that excluding ectopic beats modifies the features' discriminatory power between HC and IHD.
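To make the two preprocessing approaches concrete, here is a minimal sketch of how two common time-domain HRV features change when ectopic beats are dropped. The abstract does not say how ectopic beats were identified or which features were used; the 20% deviation rule, the choice of SDNN and RMSSD, the function names, and the RR series below are all illustrative assumptions.

```python
import math


def sdnn(rr):
    """Sample standard deviation of the RR intervals (ms)."""
    mean = sum(rr) / len(rr)
    return math.sqrt(sum((x - mean) ** 2 for x in rr) / (len(rr) - 1))


def rmssd(rr):
    """Root mean square of successive RR differences (ms)."""
    diffs = [b - a for a, b in zip(rr, rr[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def drop_ectopic(rr, threshold=0.2):
    """Assumed heuristic: keep an interval only if it deviates by at most
    `threshold` (as a fraction) from the last accepted interval."""
    kept = [rr[0]]
    for x in rr[1:]:
        if abs(x - kept[-1]) / kept[-1] <= threshold:
            kept.append(x)
    return kept


# Hypothetical RR series (ms): a premature beat (520) followed by a
# compensatory pause (1090) inside an otherwise regular rhythm.
rr_all = [800, 810, 790, 520, 1090, 805, 795, 800]
rr_normal = drop_ectopic(rr_all)

features_all = {"sdnn": sdnn(rr_all), "rmssd": rmssd(rr_all)}
features_normal = {"sdnn": sdnn(rr_normal), "rmssd": rmssd(rr_normal)}
print(features_all)
print(features_normal)
```

Each feature dictionary would then feed a separate classifier, so the comparison in the abstract amounts to training the same model on `features_all`-style inputs versus `features_normal`-style inputs. Note that this sketch computes RMSSD across the gap left by removed beats, which a real pipeline might handle differently.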

    Systematic review: The use of large language models as medical chatbots in digestive diseases

    Background: Interest in large language models (LLMs), such as OpenAI's ChatGPT, across multiple specialties has grown as a source of patient-facing medical advice and provider-facing clinical decision support. The accuracy of LLM responses for gastroenterology and hepatology-related questions is unknown. Aims: To evaluate the accuracy and potential safety implications for LLMs for the diagnosis, management and treatment of questions related to gastroenterology and hepatology. Methods: We conducted a systematic literature search including Cochrane Library, Google Scholar, Ovid Embase, Ovid MEDLINE, PubMed, Scopus and the Web of Science Core Collection to identify relevant articles published from inception until January 28, 2024, using a combination of keywords and controlled vocabulary for LLMs and gastroenterology or hepatology. Accuracy was defined as the percentage of entirely correct answers. Results: Among the 1671 reports screened, we identified 33 full-text articles on using LLMs in gastroenterology and hepatology and included 18 in the final analysis. The accuracy of question-responding varied across different model versions. For example, accuracy ranged from 6.4% to 45.5% with ChatGPT-3.5 and was between 40% and 91.4% with ChatGPT-4. In addition, the absence of standardised methodology and reporting metrics for studies involving LLMs places all the studies at a high risk of bias and does not allow for the generalisation of single-study results. Conclusions: Current general-purpose LLMs have unacceptably low accuracy on clinical gastroenterology and hepatology tasks, which may lead to adverse patient safety events through incorrect information or triage recommendations, which might overburden healthcare systems or delay necessary care

    Semi-automatic Approach to Estimate the Degree of Non-alcoholic Fatty Liver Disease (NAFLD) from Ultrasound Images

    The early diagnosis of Non-Alcoholic Fatty Liver Disease (NAFLD) is crucial to prevent fibrosis progression or the onset of advanced chronic liver disease. Among the non-invasive methods, ultrasound (US) B-mode imaging is recommended for population screening and follow-up. Hamaguchi’s score was proposed to improve the evaluation of the fatty liver from US images. In our study, we aimed to objectively assess the Hamaguchi score through an advanced semi-automatic analysis of US images. The study encompassed a dataset of 325 bariatric patients with NAFLD diagnosed by liver biopsy who underwent ultrasound assessment at the Liver Clinic at Trieste University Hospital. The classification models for the estimation of the three Hamaguchi sub-scores were produced by semi-automatic US image analysis based on clustering and a Convolutional Neural Network (CNN) with transfer learning techniques. The results showed that the models were able to estimate the three sub-scores with high classification accuracy. The predictive models for the liver brightness and hepatorenal echo contrast, diaphragm deep attenuation, and vessel blurring sub-scores achieved classification accuracies of 92.6%, 84.8%, and 90.9%, respectively. In conclusion, the results of this preliminary study indicate that computer-aided diagnostic models for NAFLD can be produced from the analysis of US images.

    Optimizing Liver Stiffness Assessment in HCV Patients: A Machine Learning Approach to Identify Confounding Factors in Fibrosis Estimation

    Hepatitis C Virus (HCV) infection is a significant global health concern with approximately 1.5 million new infections yearly. The choice of the most appropriate HCV treatment depends on several factors, including liver fibrosis status. Current guidelines recommend liver fibrosis evaluation using non-invasive techniques such as Liver Stiffness Measurement (LSM) using liver elastography. Although LSM revolutionized patient care in the last decade, allowing biopsy-free treatments, several factors can lead to overestimation or underestimation of liver stiffness values, affecting management strategies. This study presents a machine-learning approach using an eXtreme Gradient Boosting model to predict possible LSM inaccuracies in a cohort of 509 HCV-positive treated patients. The dataset, characterized by 55 variables, underwent feature reduction and balancing to mitigate class imbalance to train the predictive algorithm. The developed model can identify inaccuracy in LSM and achieves an accuracy of 88.0% on the training set and 92.0% on the test set. Furthermore, it exhibited a consistent mean Area Under the Curve One-vs-One (AUC-ovo) of 0.97 across both datasets. The model’s performance in predicting abnormal LSM may enable healthcare providers to tailor treatment plans more precisely, optimize patient follow-up, and reduce unnecessary invasive procedures. These findings highlight the potential of machine learning in improving patient care in the context of chronic HCV management
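The one-vs-one AUC reported above can be illustrated with a small stdlib-only sketch. It assumes the usual macro-averaged definition: for every pair of classes, compute the AUC with each class in turn as the positive class, average the two, then average over all pairs. The abstract does not define its classes, so the labels 0, 1 and 2 and the probabilities below are hypothetical.

```python
from itertools import combinations


def binary_auc(pos_scores, neg_scores):
    """Probability that a random positive outranks a random negative
    (ties count as half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def auc_ovo(y_true, probs):
    """Macro-averaged one-vs-one AUC.

    y_true: integer class labels; probs: per-sample lists of class
    probabilities, indexed by class label.
    """
    classes = sorted(set(y_true))
    pair_aucs = []
    for a, b in combinations(classes, 2):
        a_rows = [p for y, p in zip(y_true, probs) if y == a]
        b_rows = [p for y, p in zip(y_true, probs) if y == b]
        # Class a as positive, scored by P(a); then class b, scored by P(b).
        auc_a = binary_auc([p[a] for p in a_rows], [p[a] for p in b_rows])
        auc_b = binary_auc([p[b] for p in b_rows], [p[b] for p in a_rows])
        pair_aucs.append((auc_a + auc_b) / 2)
    return sum(pair_aucs) / len(pair_aucs)


# Hypothetical three-class example in which every pair is perfectly separated.
y = [0, 0, 1, 1, 2, 2]
p = [
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.5, 0.4, 0.1],
    [0.1, 0.2, 0.7],
    [0.2, 0.2, 0.6],
]
score = auc_ovo(y, p)
print(score)
```

Because the metric averages over class pairs rather than over samples, it is less sensitive to class imbalance than plain accuracy, which may be why it is reported alongside the accuracy figures.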

    Optimization of hepatological clinical guidelines interpretation by large language models: a retrieval augmented generation-based framework

    Large language models (LLMs) can potentially transform healthcare, particularly in providing the right information to the right provider at the right time in the hospital workflow. This study investigates the integration of LLMs into healthcare, specifically focusing on improving clinical decision support systems (CDSSs) through accurate interpretation of medical guidelines for chronic Hepatitis C Virus infection management. Utilizing OpenAI’s GPT-4 Turbo model, we developed a customized LLM framework that incorporates retrieval augmented generation (RAG) and prompt engineering. Our framework involved converting the guidelines into the best-structured format that LLMs can process efficiently to provide the most accurate output. An ablation study was conducted to evaluate the impact of different formatting and learning strategies on the LLM’s answer generation accuracy. The baseline GPT-4 Turbo model’s performance was compared against five experimental setups with increasing levels of complexity: inclusion of in-context guidelines, guideline reformatting, and implementation of few-shot learning. Our primary outcome was the qualitative assessment of accuracy based on expert review, while secondary outcomes included the quantitative measurement of similarity of LLM-generated responses to expert-provided answers using text-similarity scores. The results showed a significant improvement in accuracy from 43% to 99% (p < 0.001) when guidelines were provided as context in a coherent corpus of text and non-text sources were converted into text. In addition, few-shot learning did not seem to improve overall accuracy. The study highlights that structured guideline reformatting and advanced prompt engineering (data quality vs. data quantity) can enhance the efficacy of LLM integration into CDSSs for guideline delivery.
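The retrieval-augmented setup described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' framework: the abstract does not describe the retriever, the prompt wording, or the similarity metric, so the bag-of-words cosine similarity, the function names, and the placeholder guideline passages are all hypothetical. No model is called; the sketch stops at building the prompt.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between the bag-of-words vectors of two strings."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, passages, k=2):
    """Return the k passages most similar to the question."""
    ranked = sorted(passages, key=lambda p: cosine_similarity(question, p), reverse=True)
    return ranked[:k]


def build_prompt(question, passages, k=2):
    """Place the retrieved guideline text in context ahead of the question."""
    context = "\n\n".join(retrieve(question, passages, k))
    return (
        "Answer using only the guideline excerpts below.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


# Placeholder passages standing in for reformatted guideline text.
guideline_passages = [
    "Placeholder section on fibrosis staging: how liver fibrosis is assessed before treatment.",
    "Placeholder section on drug interactions: what to check before prescribing.",
    "Placeholder section on follow-up: monitoring after treatment ends.",
]
question = "How should liver fibrosis be assessed?"
prompt = build_prompt(question, guideline_passages, k=1)
print(prompt)
```

The same `cosine_similarity` helper could serve as a crude stand-in for the secondary outcome, scoring a generated answer against an expert-provided one, although the study's actual text-similarity scores are not specified in the abstract.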

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed
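The difference between the two counting schemes can be shown with a short sketch. It assumes each citing paper is represented as a list of cited works, each an ordered author list, and that two authors are co-cited once per citing paper whose reference list contains both. Counting co-authors of the same cited work as a co-cited pair is a simplifying assumption of this sketch, and the author names are invented.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors):
    """Count author pairs that appear together in a reference list.

    max_authors=1 gives traditional first-author counting; max_authors=5
    takes the first five authors of every cited work into account.
    """
    counts = Counter()
    for references in citing_papers:
        # Each author counts once per citing paper, however often cited.
        authors = {a for work in references for a in work[:max_authors]}
        for pair in combinations(sorted(authors), 2):
            counts[pair] += 1
    return counts


# Hypothetical data: two citing papers, each with two cited works.
citing_papers = [
    [["Smith", "Lee"], ["Garcia"]],
    [["Smith", "Lee"], ["Chen", "Garcia"]],
]

first_author = cocitation_counts(citing_papers, max_authors=1)
first_five = cocitation_counts(citing_papers, max_authors=5)
print(first_author)
print(first_five)
```

Under first-author counting, Garcia is linked to Smith only through the paper where Garcia is listed first; including later authors links them in both citing papers and brings Lee into the picture, which is the kind of denser, more coherent grouping the abstract describes.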

    Variations on the Author

    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourse; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.