MIRACLE at ImageCLEFanot 2007: Machine Learning Experiments on Medical Image Annotation
This paper describes the participation of the MIRACLE research consortium in the Medical Image Annotation task of ImageCLEF 2007. Our areas of expertise do not include image analysis, so we approach this task as a machine-learning problem, regardless of the domain. FIRE is used as a black-box algorithm to extract different groups of image features, which are then used to train different classifiers based on the kNN algorithm to predict the IRMA code. The main idea behind our experiments is to evaluate whether predicting the code axis by axis is better or worse than predicting it by pairs of axes or as a complete code.
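The contrast these experiments evaluate can be sketched with a plain kNN vote. This is a minimal illustration, not the consortium's code: it assumes the FIRE features are already numeric vectors, uses Euclidean distance and a simple majority vote, and the IRMA codes in the example are invented placeholders in the four-axis TTTT-DDD-AAA-BBB layout. Prediction by pairs of axes is omitted.

```python
from collections import Counter

import numpy as np


def knn_neighbours(train_x: np.ndarray, query: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k training vectors closest to `query` (Euclidean distance)."""
    dists = np.linalg.norm(train_x - query, axis=1)
    return np.argsort(dists, kind="stable")[:k]


def predict_full_code(train_x, train_codes, query, k=3):
    """Majority vote over complete IRMA codes; ties go to the nearest neighbour's code."""
    idx = knn_neighbours(train_x, query, k)
    return Counter(train_codes[i] for i in idx).most_common(1)[0][0]


def predict_axis_by_axis(train_x, train_codes, query, k=3):
    """Independent majority vote on each of the four axes, then re-joined."""
    idx = knn_neighbours(train_x, query, k)
    axes = [train_codes[i].split("-") for i in idx]
    voted = [Counter(a[j] for a in axes).most_common(1)[0][0] for j in range(4)]
    return "-".join(voted)


# Hypothetical toy data: three close neighbours that disagree on different axes.
train_x = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
codes = ["1121-120-310-700", "1121-120-200-400", "1121-127-200-700", "1123-211-500-000"]
query = np.array([0.0, 0.0])
print(predict_full_code(train_x, codes, query))     # 1121-120-310-700
print(predict_axis_by_axis(train_x, codes, query))  # 1121-120-200-700
```

The example shows the trade-off the experiments probe: voting per axis can assemble a code that none of the neighbours actually carry, which may be more accurate per axis but is not guaranteed to be a combination seen in training.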
MIRACLE at ImageCLEFannot 2008: Classification of Image Features for Medical Image Annotation
This paper describes the participation of the MIRACLE research consortium in the Medical Image Annotation task of ImageCLEF 2008. This year, considerable effort was invested in developing our own MATLAB-based image analysis system for use in our experiments. This system extracts a variety of global and local features, including histograms, image statistics, Gabor features, fractal dimension, DCT and DWT coefficients, Tamura features, and co-occurrence matrix statistics. A k-Nearest Neighbour algorithm then analyzes the extracted feature vectors to determine the IRMA code associated with a given image. Our experiments focus mainly on testing and evaluating this system in depth and on comparing diverse configuration parameters, such as the number of images used for relevance feedback in the classification module.
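One of the listed feature families, co-occurrence matrix statistics, can be sketched as below. The original system was written in MATLAB; this is a Python illustration only, and the single horizontal offset, the pre-quantised grey levels, and the three statistics chosen (contrast, energy, homogeneity) are assumptions rather than the paper's settings.

```python
import numpy as np


def cooccurrence_matrix(img: np.ndarray, levels: int, dx: int = 1, dy: int = 0) -> np.ndarray:
    """Count how often grey level i is followed by grey level j at offset (dy, dx).

    Assumes `img` is already quantised to integers in [0, levels) and dx, dy >= 0.
    """
    h, w = img.shape
    src = img[: h - dy, : w - dx]   # reference pixels
    dst = img[dy:, dx:]             # neighbours at the given offset
    glcm = np.zeros((levels, levels), dtype=np.int64)
    np.add.at(glcm, (src.ravel(), dst.ravel()), 1)
    return glcm


def glcm_statistics(glcm: np.ndarray) -> dict:
    """Summary statistics of the normalised co-occurrence matrix."""
    p = glcm / glcm.sum()
    i, j = np.indices(p.shape)
    return {
        "contrast": float(np.sum(p * (i - j) ** 2)),
        "energy": float(np.sum(p ** 2)),
        "homogeneity": float(np.sum(p / (1.0 + np.abs(i - j)))),
    }


img = np.array([[0, 0, 1], [1, 2, 2]])
print(glcm_statistics(cooccurrence_matrix(img, levels=3)))
```

In a full feature extractor these statistics would typically be computed for several offsets and directions and concatenated into the vector passed to the k-Nearest Neighbour classifier.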
Overview of the first workshop on medical content-based retrieval for clinical decision support at MICCAI 2009
In this paper, we provide an overview of the first workshop on Medical Content-Based Retrieval for Clinical Decision Support (MCBR-CDS), which was held in conjunction with the Medical Image Computing and Computer Assisted Intervention (MICCAI) conference in 2009 in London, UK. The goal of the workshop was to bring together researchers from diverse communities, including medical image analysis, text and image retrieval, data mining, and machine learning, to discuss new techniques for multimodal image retrieval and the use of images in clinical decision support. We discuss the motivation for this workshop, provide details about its organization and participation, and review the current state of the art in clinical image retrieval and the use of images for clinical decision support. We conclude with open issues and challenges that lie ahead for the domain of medical content-based retrieval. © 2010 Springer Berlin Heidelberg
Classification and retrieval of endoscopic images from the clinical outcomes research initiative (CORI) collection
Traditionally, image retrieval systems have been text-based, relying on the annotations or captions associated with the images. Although text-based information retrieval methods are mature and well-researched, they are limited by the quality and availability of the annotations associated with the images. Advances in computer vision have led to methods that use the image itself as the search entity. Our project aimed to create an image retrieval system using a set of 1,500 upper endoscopic images from the Clinical Outcomes Research Initiative (CORI) collection.
Machine-learning derived Pulmonary X-ray Severity (PXS) score to predict COVID-19 severe outcomes
Abstract
Background
Using a previously validated machine learning algorithm that extracts a quantitative Pulmonary X-ray Severity (PXS) score, we compared lung involvement among different SARS-CoV-2 variants and assessed the association between PXS score and severe pulmonary outcomes in adult COVID-19 patients at MGH.
Methods
We performed a retrospective study of 8,207 patients who tested positive for COVID-19 by polymerase chain reaction (PCR) at MGH between March 4, 2020, and February 20, 2022, and had at least one chest X-ray within ±14 days of their COVID-19 test. Patients were categorized into three groups according to the COVID-19 waves in the US (Original virus, Delta variant, and Omicron variant). Unadjusted and adjusted quantile regression models were used to estimate the association between PXS scores and COVID-19 waves, and logistic regression models were used to measure the association between PXS score and ICU admission and death.
Results
At baseline, the mean age was 56.8 ± 18 years, 51.5% were male, and 50.4% were white. A smaller proportion of the Omicron group had PXS scores >9 than the Delta and Original groups (2.5% vs. 5.8% vs. 5.9%, respectively, P<.001). Omicron patients were also less likely to be admitted to the ICU than the Delta and Original groups (3.8% vs. 5.9% vs. 6.8%, P<.001). Death proportions were significantly lower in the Omicron group (4%) than in the Delta (5.6%) and Original (6.3%) groups (P<.001). The adjusted ORs for the association between PXS scores and ICU admission were 10.4, 12.6, and 9.8 in the Original, Delta, and Omicron waves, respectively (P<.001), while the ORs for the association between PXS scores and death were 8.3, 10.3, and 2.7 in the three groups (P<.001).
Conclusions
The Omicron group tended to have lower PXS scores than the Original virus and Delta variant groups. Moreover, patients with high PXS scores (>9) had a higher likelihood of ICU admission or death than those with PXS scores ≤9.
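To make the reported odds ratios concrete, the sketch below fits a logistic regression by Newton-Raphson and exponentiates the coefficients. It is a minimal illustration under loud assumptions: a single binary predictor (PXS score >9, the cut-off named in the Conclusions) and no covariates, whereas the study's adjusted models include covariates the abstract does not specify. The patient counts in the example are invented, not study data; in practice a statistics package would be used.

```python
import numpy as np


def logistic_odds_ratios(X: np.ndarray, y: np.ndarray, n_iter: int = 25) -> np.ndarray:
    """Fit logistic regression by Newton-Raphson and return exp(coefficients).

    X must already contain an intercept column; y holds 0/1 outcomes.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))         # predicted probabilities
        grad = X.T @ (y - p)                         # score (gradient of log-likelihood)
        info = X.T @ (X * (p * (1.0 - p))[:, None])  # information matrix (negative Hessian)
        beta = beta + np.linalg.solve(info, grad)    # Newton step
    return np.exp(beta)


# Hypothetical cohort: 30 of 100 high-PXS and 10 of 100 low-PXS patients had the outcome.
high_pxs = np.r_[np.ones(100), np.zeros(100)]
outcome = np.r_[np.ones(30), np.zeros(70), np.ones(10), np.zeros(90)]
X = np.column_stack([np.ones_like(high_pxs), high_pxs])
print(logistic_odds_ratios(X, outcome)[1])  # about 3.857 = (30/70) / (10/90)
```

With one binary predictor and an intercept, the fitted odds ratio equals the crude 2×2 odds ratio, which is a convenient check; adding covariates is what turns it into an adjusted OR.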
Improving Segmentation Pipelines for Medical Imaging using Deep Learning
One of the most important steps in the clinical workflow is the segmentation of medical images, which can be used for a variety of clinical decision-making tasks such as disease diagnosis and treatment response evaluation. Manual segmentation of 3D medical images (such as computed tomography (CT) or magnetic resonance imaging (MRI)) by a clinical expert can be too time-consuming to be feasible in a routine clinical workflow, and is moreover susceptible to human error and inconsistency. In recent years, deep learning (DL) based methods have exhibited human-level performance for a variety of computer vision tasks, making them an attractive choice for researchers aiming to automate the segmentation of medical images. This thesis considers two medical imaging scenarios and examines how fully automatic image segmentation via DL can enhance downstream clinical tasks.
The first scenario evaluates the clinical workflow for diagnosing incidental adrenal masses on CT. Despite standardized reporting systems and strict guidelines for defining an adrenal mass, there is significant inter-rater variability for this task. To enable objective and reproducible characterization of the adrenal gland, this thesis develops the first DL method for adrenal gland segmentation and classification on CT. Using a large-scale retrospectively acquired dataset, this method is used to identify potential detections missed by radiologists, and the clinical implications of these misses are discussed.
The second scenario focuses on the treatment response assessment of metastatic brain tumor patients on MRI. Due to the large number of metastases a patient can have, standard radiographic analyses track only a select few target lesions through the course of therapy in order to assess the efficacy of a treatment. With this paradigm, smaller non-target lesions may be neglected or even missed due to the lack of quantitative emphasis. To that end, a pipeline is developed to automatically segment brain tumor metastases on MRI and output standard response assessment metrics. With the prevalence of longitudinal imaging data available for brain metastases patients, a secondary model is formulated to improve the detection and segmentation of micro-metastases by utilizing known prior time point information.
Ph.D
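The abstract does not say which response assessment metrics the pipeline outputs. As a hedged illustration only, the sketch below assumes volumetric metrics computed from binary segmentation masks: it splits a 3D mask into 6-connected lesions, reports per-lesion volumes, and computes the percent change in total volume between time points. It is not the thesis's implementation, and many published response criteria are diameter-based rather than volume-based.

```python
from collections import deque

import numpy as np

# 6-connected neighbourhood in (z, y, x)
OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def lesion_volumes(mask: np.ndarray, voxel_volume_mm3: float) -> list[float]:
    """Split a binary 3D mask into 6-connected lesions and return each lesion's volume."""
    seen = np.zeros(mask.shape, dtype=bool)
    volumes = []
    for start in zip(*np.nonzero(mask)):
        if seen[start]:
            continue  # voxel already belongs to a labelled lesion
        seen[start] = True
        queue, count = deque([start]), 0
        while queue:  # breadth-first flood fill of one lesion
            z, y, x = queue.popleft()
            count += 1
            for dz, dy, dx in OFFSETS:
                n = (z + dz, y + dy, x + dx)
                if all(0 <= c < s for c, s in zip(n, mask.shape)) and mask[n] and not seen[n]:
                    seen[n] = True
                    queue.append(n)
        volumes.append(count * voxel_volume_mm3)
    return volumes


def percent_change(baseline: float, follow_up: float) -> float:
    """Relative change in a lesion burden measure between two time points."""
    return 100.0 * (follow_up - baseline) / baseline


baseline = np.zeros((4, 4, 4), dtype=bool)
baseline[0, 0, 0:2] = True   # 2-voxel lesion
baseline[3, 3, 3] = True     # 1-voxel micro-metastasis
follow_up = np.zeros_like(baseline)
follow_up[0, 0, 0] = True
vb, vf = lesion_volumes(baseline, 1.0), lesion_volumes(follow_up, 1.0)
print(len(vb), len(vf), percent_change(sum(vb), sum(vf)))
```

`scipy.ndimage.label` performs the same face-connected labelling in one call; the pure-NumPy flood fill keeps the sketch dependency-free. Counting every component, however small, is what lets non-target lesions and micro-metastases enter the totals.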
Generating Clinically Translatable AI Models for Cancer Diagnostics
Given the large heterogeneity in the oncogenesis, pathophysiology and therapeutic targeting of cancers, and the significant morbidity that cancer imposes on society, standard-of-care oncotherapy has focused on earlier screening and detection, and subsequent preferential removal of lesions via a combination of chemo-, radio- and/or targeted therapy to improve overall survival. Despite significant advancements in the therapeutic arm for most cancers, the clinical diagnostic gold standard remains the biopsy, particularly for identifying the mutational status of therapeutically relevant molecular markers and oncogenic drivers, and for shedding light on the tumor microenvironment. However, biopsies are invasive (requiring organ penetration), expensive (both in terms of cost and time) and frequently have poor sensitivity. Therefore, there is a critical need for minimally-invasive, well-validated diagnostic biomarkers that can supplant biopsies and facilitate precision medicine.
The use of artificial intelligence (AI) and deep learning (DL) models trained on routinely collected imaging (e.g., CT, MRI, colposcopy) has recently emerged as a minimally-invasive alternative diagnostic biomarker in several clinical domains, with optimized models reporting near-clinician-level performance. However, translation of DL models from bench to bedside remains sparse. To be clinically translatable, deep neural networks (DNN) should be robust, computationally efficient, low-cost, and blend well with existing clinical workflows, ensuring the inputs/outputs of the model and the task it performs are most relevant to the clinician for a given use case. This is often not the case with current state-of-the-art (SOA) models, which are frequently hindered by several key methodological flaws in their design, thereby undermining their validity and hindering clinical translation. In the context of my work, model robustness refers to two key attributes: 1. repeatability or reproducibility, defined as the ability of a model to generate near-identical predictions for the same patient under identical conditions, ensuring that the model produces precise, reliable and consistent outputs in the clinical setting; and 2. generalizability or portability, defined as the ability of a model to adapt well to domain expansion or, alternatively, the ability of a model to perform well on datasets that are out of distribution, i.e., having different characteristics from the training data (e.g., different device, geography, quality, and/or patient population). There is a paucity of work in the current literature that assesses one or both of these attributes, with models tending to overfit the training data distribution.
In my thesis work, I address both these attributes head-on, via comprehensively-designed, biologically-inspired AI pipelines that are optimized specifically for clinical translation. Key technical innovations highlighted in my work include repeatability-centric model optimization and combination loss functions with Monte Carlo dropout (for improved repeatability), as well as novel metrics for distribution distance characterization and optimized retraining (for improved generalizability). My work incorporates these innovations into DL-based pipelines in several oncology domains. In particular, for cervical cancer, we developed a multi-stage AI pipeline utilizing a cervical-colposcope-thermocoagulator device in a workflow involving, in sequence: 1. image capture, 2. cervix detection, 3. an image quality classifier that filters the images for quality, 4. a diagnostic classifier that classifies the cervix into one of three diagnoses (normal, intermediate, and precancer or above), and 5. thermal ablation using the attached thermocoagulator if indicated, otherwise surgical excision or deferral. Chapters 1 through 4 highlight the multi-stage, comprehensive development of our cervical cancer screening pipeline, together with the technical innovations explored in our work to assess model repeatability and generalizability. Given the high clinical burden of cervical cancer, this pipeline can serve as a triage tool for human papillomavirus (HPV)-positive women in low-resource settings, where existing screening methods, such as biopsy and visual inspection after application of acetic acid (VIA), are invasive, expensive and/or inaccurate. A large-scale prospective efficacy and effectiveness study of this pipeline is already underway at the partner institutions of our consortium in low- and middle-income countries. Further work towards generating AI models in other clinical domains, in particular brain tumors, is highlighted in Appendix Chapter 5.
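The abstract credits Monte Carlo dropout, combined with particular loss functions, with improving repeatability but gives no further detail. The sketch below shows only the generic MC dropout inference idea: dropout stays active at prediction time and class probabilities are averaged over many stochastic passes. The two-layer network, its random weights, the dropout rate, and the number of passes are all hypothetical stand-ins for a trained classifier; the three outputs simply mirror the three diagnostic classes named above.

```python
import numpy as np


def mc_dropout_predict(x, w1, w2, p_drop=0.2, n_samples=50, seed=0):
    """Average class probabilities over stochastic forward passes with dropout left on.

    Returns (mean probabilities, per-class standard deviation across passes).
    """
    rng = np.random.default_rng(seed)
    probs = []
    for _ in range(n_samples):
        h = np.maximum(0.0, x @ w1)                      # ReLU hidden layer
        keep = rng.random(h.shape) >= p_drop             # random dropout mask
        h = h * keep / (1.0 - p_drop)                    # inverted-dropout rescaling
        logits = h @ w2
        e = np.exp(logits - logits.max(axis=-1, keepdims=True))
        probs.append(e / e.sum(axis=-1, keepdims=True))  # softmax over classes
    probs = np.stack(probs)
    return probs.mean(axis=0), probs.std(axis=0)


# Hypothetical untrained weights; outputs ordered as (normal, intermediate, precancer+).
rng = np.random.default_rng(1)
w1, w2 = rng.normal(size=(8, 16)), rng.normal(size=(16, 3))
x = rng.normal(size=8)
mean, spread = mc_dropout_predict(x, w1, w2)
print(mean.round(3), spread.round(3))
```

Averaging over passes smooths out the randomness of any single dropout mask, and the per-class spread gives a rough uncertainty signal; how the thesis combines this with its loss functions is not described in the abstract.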
Domain and User-Centered Machine Learning for Medical Image Analysis
The utilization of diagnostic imaging in the United States and worldwide is steadily growing. Combined with a shortage of trained staff, this growth results in an increased and unsustainable workload for radiologists. Consequently, there is a high clinical need for the automation of cognitively challenging tasks, such as analyzing and interpreting medical images, to lighten the burden on radiologists and avoid a further increase in healthcare expenditure. Machine learning (ML), including deep learning (DL), offers a potential solution, as these algorithms can learn to automatically recognize subtle patterns in large amounts of data and augment clinical decision-making.
Despite the high enthusiasm for ML algorithms, concerns regarding their readiness for clinical deployment are impeding their clinical translation. In this thesis, we address three fundamental challenges to the translation of ML algorithms into clinical care settings.
First, algorithms must perform robustly in routine clinical care settings. We demonstrate how appropriate image preprocessing improves the stability of handcrafted radiomic features extracted from brain MRIs. Second, the selected network design must be appropriate for a specific task. Here, we illustrate the advantages of shifting from a strictly discrete (ordinal) model of disease severity distribution to a continuously valued one. We introduce a generalized framework that can recover information lost by discretizing continuous variables into discrete training labels. Furthermore, disagreements in the labels generated by different annotators can be caused by individually varying decision thresholds. Therefore, we present the first design and demonstration of two methods that enable the joint learning of annotators’ ordinal classification and their individual biases for a latent, continuously valued target variable like disease severity. Lastly, the performance of ML algorithms needs to be evaluated in a clinically meaningful manner. We address the disconnect between the subjective quality perception of clinical experts and the metrics that are typically used to evaluate performance. Furthermore, we identify criteria that experts use to evaluate the quality of automatically generated segmentations and describe their thought processes as they correct them.
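The idea that annotator disagreement can stem from individually varying decision thresholds on a latent, continuously valued severity can be sketched as follows. This shows only the assumed generative picture, not the thesis's learning methods: a shared latent severity, per-annotator thresholds, and (as an assumption) a cumulative-logit link for grade probabilities. All threshold values are hypothetical.

```python
import numpy as np


def ordinal_label(severity: float, thresholds) -> int:
    """Ordinal grade = number of the annotator's thresholds that the severity exceeds."""
    return int(np.sum(severity > np.asarray(thresholds)))


def grade_probabilities(severity: float, thresholds, scale: float = 1.0) -> np.ndarray:
    """Cumulative-logit model: P(grade > k) = sigmoid((severity - t_k) / scale)."""
    t = np.sort(np.asarray(thresholds, dtype=float))
    exceed = 1.0 / (1.0 + np.exp(-(severity - t) / scale))  # P(grade > k), k = 0..K-1
    upper = np.concatenate([[1.0], exceed])                  # P(grade >= k), k = 0..K
    lower = np.concatenate([exceed, [0.0]])                  # P(grade >= k + 1), k = 0..K
    return upper - lower                                     # P(grade == k), k = 0..K


# Same latent severity, two hypothetical annotators with different cut-offs.
strict, lenient = [0.5, 1.5, 2.5], [0.0, 1.0, 2.0]
print(ordinal_label(1.2, strict), ordinal_label(1.2, lenient))  # 1 2
print(grade_probabilities(1.2, strict).round(3))
```

Here the two annotators disagree by one grade even though they observe the same underlying severity; jointly modelling each annotator's thresholds alongside the latent variable is what allows such disagreement to be separated from genuine differences in disease severity.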
Based on the lessons learned from our work, we conclude with concrete recommendations for developing robust and trustworthy ML tools for medical imaging.
Ph.D
Image registration and bias evaluation for a COVID-19 pulmonary X-ray severity (PXS) score prediction algorithm
As COVID-19 spreads, it is increasingly important to track disease trajectory in order to provide better care for patients. Analyzing chest X-rays (CXRs) is one method used by radiologists to assess disease severity, but manual interpretation is time-consuming and subject to inter- and intra-rater variability. One study has used a Siamese neural network to predict numerical COVID-19 pulmonary disease severity scores [19], but because CXRs from the same patient tend to differ in positioning and acquisition in ways unrelated to disease progression, image registration can be used to standardize the CXRs and improve longitudinal comparison. In this study, we show that affine image registration using VoxelMorph [3] has the potential to improve the prediction of longitudinal change in COVID-19 severity. Additionally, external generalization is a challenging problem for medical AI, and a model used in healthcare settings must be free of bias in order to be clinically valid. To this end, we analyze the performance of the Siamese prediction model on an external dataset and show that its predictions correlate with expert disease severity labels, and that it performs similarly across demographic groups (age, sex, BMI, and international location). These preliminary results suggest that the model may be a reliable and equitable way, among the subgroups evaluated, to quantify disease severity.
M.Eng
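To clarify what an affine registration estimates (rotation, scaling, shear, and translation), the sketch below swaps in a classical technique rather than VoxelMorph, which learns the transform from image intensities: a least-squares fit of a 2D affine transform from corresponding landmark points, for example hypothetical landmarks placed on two CXRs of the same patient. The landmark coordinates and transform in the example are invented.

```python
import numpy as np


def fit_affine_2d(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Least-squares 2D affine transform mapping src points (N, 2) onto dst points (N, 2).

    Returns a 2x3 matrix [A | t] with dst ≈ src @ A.T + t; needs N >= 3 non-collinear points.
    """
    design = np.hstack([src, np.ones((src.shape[0], 1))])  # rows: [x, y, 1]
    params, *_ = np.linalg.lstsq(design, dst, rcond=None)  # shape (3, 2)
    return params.T                                         # shape (2, 3)


def apply_affine_2d(M: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Map points (N, 2) through the affine matrix [A | t]."""
    return pts @ M[:, :2].T + M[:, 2]


# Hypothetical landmarks on a baseline CXR and their positions on a follow-up CXR.
A_true = np.array([[1.1, 0.2], [-0.1, 0.9]])
t_true = np.array([3.0, -2.0])
src = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [7.0, 4.0]])
dst = src @ A_true.T + t_true
M = fit_affine_2d(src, dst)
print(M.round(3))
```

Once the follow-up image is resampled into the baseline frame with such a transform, differences in patient positioning are reduced, so a longitudinal model such as the Siamese network can focus more on changes related to disease.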
