1,721,080 research outputs found

    Generating Multi-label Discrete Patient Records using Generative Adversarial Networks

    No full text
    Access to electronic health record (EHR) data has motivated computational advances in medical research. However, various concerns, particularly over privacy, can limit access to and collaborative use of EHR data. Sharing synthetic EHR data could mitigate this risk. In this paper, we propose a new approach, medical Generative Adversarial Network (medGAN), to generate realistic synthetic patient records. Based on input real patient records, medGAN can generate high-dimensional discrete variables (e.g., binary and count features) via a combination of an autoencoder and generative adversarial networks. We also propose minibatch averaging to efficiently avoid mode collapse, and increase the learning efficiency with batch normalization and shortcut connections. To demonstrate feasibility, we show that medGAN generates synthetic patient records that achieve performance comparable to real data in many experiments, including distribution statistics, predictive modeling tasks, and a medical expert review. We also empirically observe a limited privacy risk in both identity and attribute disclosure using medGAN.
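    The minibatch averaging mentioned in the abstract can be illustrated roughly as follows. This is a minimal sketch, assuming the average of the minibatch is concatenated to each sample before it reaches the discriminator; the function name `with_minibatch_average` and the toy binary records are hypothetical and are not taken from the paper's code.

```python
import numpy as np

def with_minibatch_average(batch: np.ndarray) -> np.ndarray:
    """Append the minibatch mean to every sample (rows are samples)."""
    mean = batch.mean(axis=0, keepdims=True)         # shape (1, d)
    tiled = np.repeat(mean, batch.shape[0], axis=0)  # shape (n, d)
    return np.concatenate([batch, tiled], axis=1)    # shape (n, 2d)

# Toy binary "patient records": 4 patients, 6 codes (illustrative data only).
rng = np.random.default_rng(0)
records = (rng.random((4, 6)) < 0.3).astype(np.float32)
augmented = with_minibatch_average(records)
print(augmented.shape)  # (4, 12)
```

    Under this assumption, a discriminator that sees the batch mean next to each record can tell when every generated record looks alike, which is one way to discourage mode collapse.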

    A multiresolution approach for tensor completion from coarse and partial observations

    Full text link
    Existing tensor completion formulations mostly rely on partial observations from a single tensor. However, tensors extracted from real-world data are often more complex due to: (i) Partial observation: Only a small subset of tensor elements is available. (ii) Coarse observation: Some tensor modes only present coarse and aggregated patterns (e.g., monthly summaries instead of daily reports). In this paper, we are given a subset of the tensor and some aggregated/coarse observations (along one or more modes) and seek to recover the original fine-granular tensor with low-rank factorization. We formulate a coupled tensor completion problem and propose an efficient Multi-resolution Tensor Completion model (MTC) to solve the problem. Our MTC model explores tensor mode properties and leverages the hierarchy of resolutions to recursively initialize an optimization setup, and optimizes on the coupled system using alternating least squares. MTC ensures low computational and space complexity. We evaluate our model on two COVID-19 related spatio-temporal tensors. The experiments show that MTC could provide a percentage of fitness (PoF) of 65.20% and 75.79% in tensor completion with only 5% fine-granular observations, which is a 27.96% relative improvement over the best baseline. To evaluate the learned low-rank factors, we also design a tensor prediction task for daily and cumulative disease case predictions, where MTC achieves 50% in PoF and a 30% relative improvement over the best baseline.
    Thesis by Chaoqi Yang, approved for publication on 2021-04-23; published under a 24-month embargo labeled 'U of I Access', lasting until 2023-05-01.
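    The two kinds of observation and the PoF metric named in the abstract can be made concrete with a small sketch. It assumes that a coarse observation is a sum over consecutive slices of one mode and that PoF is defined as 1 - ||X - X_hat||_F / ||X||_F; both are assumptions, since the abstract does not state them, and the helper names are hypothetical. The sketch does not implement the MTC model itself.

```python
import numpy as np

def aggregate_mode(tensor: np.ndarray, mode: int, factor: int) -> np.ndarray:
    """Sum consecutive groups of `factor` slices along `mode` (e.g. daily -> weekly)."""
    moved = np.moveaxis(tensor, mode, 0)
    if moved.shape[0] % factor != 0:
        raise ValueError("mode length must be divisible by factor")
    grouped = moved.reshape(moved.shape[0] // factor, factor, *moved.shape[1:])
    return np.moveaxis(grouped.sum(axis=1), 0, mode)

def percentage_of_fitness(truth: np.ndarray, estimate: np.ndarray) -> float:
    """Assumed definition: 1 - ||truth - estimate||_F / ||truth||_F."""
    return 1.0 - np.linalg.norm(truth - estimate) / np.linalg.norm(truth)

# Toy rank-2 tensor: 6 locations x 5 features x 14 days (illustrative data only).
rng = np.random.default_rng(0)
A, B, C = rng.random((6, 2)), rng.random((5, 2)), rng.random((14, 2))
fine = np.einsum("ir,jr,kr->ijk", A, B, C)

coarse = aggregate_mode(fine, mode=2, factor=7)  # weekly sums, shape (6, 5, 2)
mask = rng.random(fine.shape) < 0.05             # about 5% of fine entries observed

print(coarse.shape, int(mask.sum()))
print(percentage_of_fitness(fine, fine))         # a perfect estimate gives 1.0
```

    A completion method would receive only `fine[mask]` and `coarse` and would be scored by `percentage_of_fitness` against the held-out `fine` tensor.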

    StructInf: Mining structural influence from social streams

    No full text
    Social influence is a fundamental issue in social network analysis and has attracted tremendous attention with the rapid growth of online social networks. However, existing research mainly focuses on studying peer influence. This paper introduces a novel notion of structural influence and studies how to efficiently discover structural influence patterns from social streams. We present three sampling algorithms with theoretical unbiasedness guarantees to speed up the discovery process. Experiments on a large microblogging dataset show that the proposed sampling algorithms can achieve a 10-fold speedup compared to the exact influence pattern mining algorithm, with an average error rate of only 1.0%. The extracted structural influence patterns have many applications. We apply them to predict retweet behavior, which significantly improves prediction performance.

    Automatic rare disease extraction based on large language models

    Full text link
    Thesis by Lang Cao, approved for publication on 2024-04-05; access label: 'Open Access'.
    Identifying and extracting information on rare diseases is crucial in various medical contexts. However, mature rare disease extraction methods are lacking in low-resource settings. In this paper, we aim to create an end-to-end system called AutoRD, which automates extracting information from clinical text about rare diseases. We achieve this using large language models and medical knowledge graphs developed from open-source medical ontologies. Large language models (LLMs) aid in language analysis, while knowledge graphs provide content-specific facts, thus filling in any information gaps. Our system, AutoRD, is a software pipeline involving data preprocessing, entity extraction, relation extraction, entity calibration, and knowledge graph construction. Large language models and open-source data are leveraged throughout the pipeline. We have conducted various tests to evaluate the performance of AutoRD and highlighted its strengths and limitations in this paper. We quantitatively evaluate our system in terms of entity extraction, relation extraction, and the performance of knowledge graph construction. AutoRD achieves an overall F1 score of 47.3%, an improvement of 0.8% compared to the fine-tuned model, and a 14.4% improvement compared to the base LLM. In detail, AutoRD achieves an overall entity extraction F1 score of 56.1% (rare_disease: 83.5%, disease: 35.8%, symptom_and_sign: 46.1%, anaphor: 67.5%) and an overall relation extraction F1 score of 38.6% (produces: 34.7%, increases_risk_of: 12.4%, is_a: 37.4%, is_acronym: 44.1%, is_synonym: 16.3%, anaphora: 57.5%). Our qualitative experiment also demonstrates that the performance in constructing the knowledge graph is commendable. Several designs, including the incorporation of ontology-enhanced LLMs, contribute to the improvement of AutoRD. AutoRD demonstrates superior performance compared to other methods, showing the potential of LLM applications in rare disease detection and AI for healthcare.
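    For readers unfamiliar with how an entity extraction F1 score is obtained, the following is a minimal sketch. It assumes exact matching of (text, type) pairs and a single pooled score; the abstract does not say which matching or averaging scheme AutoRD uses, and the example entities are invented for illustration.

```python
def f1_score(predicted: set, gold: set) -> float:
    """F1 over exactly matching items; returns 0.0 when nothing matches."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)

# Hypothetical gold and predicted entities as (text, type) pairs.
gold = {
    ("cystic fibrosis", "rare_disease"),
    ("chronic cough", "symptom_and_sign"),
    ("pneumonia", "disease"),
    ("CF", "rare_disease"),
}
predicted = {
    ("cystic fibrosis", "rare_disease"),
    ("chronic cough", "symptom_and_sign"),
    ("pneumonia", "disease"),
    ("cough", "symptom_and_sign"),  # span mismatch counts as an error here
}

overall_f1 = f1_score(predicted, gold)
print(overall_f1)  # 0.75
```

    Per-type scores such as those listed in the abstract would follow from the same function applied to the subsets of pairs sharing one type.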

    Aligning synthetic clinical trial data with human-preferred clinical endpoints

    No full text
    Thesis by Trisha Das, approved for publication on 2024-12-02; published under a 24-month embargo labeled 'Closed Access', lasting until 2026-12-01.
    Each year, hundreds of clinical trials are conducted to evaluate new medical interventions, but sharing patient records from these trials with other institutions can be challenging due to privacy concerns and federal regulations. To help mitigate privacy concerns, researchers have proposed methods for generating synthetic patient data. However, existing approaches for generating synthetic clinical trial data disregard the usage requirements of these data, including maintaining specific properties of clinical outcomes, and only use post hoc assessments that are not coupled with the data generation process. In this paper, we propose SynRL, which leverages reinforcement learning to improve the performance of patient data generators by customizing the generated data to meet user-specified requirements for synthetic data outcomes and endpoints. Our method includes a data value critic function to evaluate the quality of the generated data and uses reinforcement learning to align the data generator with the users’ needs based on the critic’s feedback. We performed experiments on four clinical trial datasets and demonstrated the advantages of SynRL in improving the quality of the generated synthetic data while keeping the privacy risks low. We also show that SynRL can be utilized as a general framework that can customize the data generation of multiple types of synthetic data generators. Our code is available at https://anonymous.4open.science/r/SynRL-DB0F/

    Effective knowledge extraction and knowledge-enhanced machine learning for health

    Full text link
    Thesis by Pengcheng Jiang, approved for publication on 2024-04-29; access label: 'Open Access'.
    This work explores the frontier of knowledge extraction and its application in enhancing machine learning models, with a special focus on healthcare. Through innovative methodologies, it presents a novel approach to deriving structured knowledge from unstructured data, leveraging the power of pre-trained language models and sophisticated text analysis techniques. The work introduces groundbreaking strategies for optimizing knowledge graph completion tasks, evaluating the efficiency and accuracy of knowledge extraction from textual data, and revolutionizing text summarization to improve knowledge extraction processes. Furthermore, it delves into the application of this extracted knowledge in healthcare, demonstrating the potential of knowledge-enhanced machine learning in predicting healthcare outcomes and molecule properties with unprecedented precision. This research not only advances the field of knowledge extraction and machine learning but also opens up new avenues for future research and applications, particularly in enhancing the quality of healthcare and drug discovery. Through its innovative methodologies and significant findings, this thesis underscores the transformative potential of artificial intelligence in extracting and leveraging knowledge for scientific and medical advancements.

    Towards robust clinical predictive modeling with heterogeneous electronic health record data

    No full text
    Thesis by Zhenbang Wu, approved for publication on 2024-07-15; published under a 24-month embargo labeled 'U of I Access', lasting until 2026-08-01.
    With the widespread adoption of Electronic Health Record (EHR) systems, there has been increasing interest in leveraging deep learning for clinical predictive modeling. However, existing models typically assume a uniform feature and label space. In contrast, different hospitals often use varied EHR systems with unique schemas (i.e., feature space). Additionally, the clinical tasks (i.e., label space) can change dynamically. In this work, we present two methods to address discrepancies in feature and label spaces across different healthcare settings. AutoMap is designed to enable the deployment of clinical predictive models across hospitals with diverse medical coding systems. It automatically aligns medical codes across different EHR systems via ontology-level alignment and code-level refinement. EDGE is designed to recommend newly developed drugs, which often lack extensive historical prescription data. By formulating new drug recommendation as a few-shot learning problem, it employs a drug-dependent multi-phenotype few-shot learner to quickly adapt to new drugs. We validate both methods using real-world EHR datasets from MIMIC-III, MIMIC-IV, eICU, and Claims databases. Our results demonstrate their effectiveness in addressing the challenges posed by unmatched feature and label spaces in clinical predictive modeling.