Search CORE

1,721,080 research outputs found

Generating Multi-label Discrete Patient Records using Generative Adversarial Networks

Author: Malin Bradley
Sun Jimeng
Duke Jon
Choi Yoonjae
Stewart Walter F
Biswal Siddharth
Publication date: 2017

Access to electronic health record (EHR) data has motivated computational advances in medical research. However, various concerns, particularly over privacy, can limit access to and collaborative use of EHR data. Sharing synthetic EHR data could mitigate this risk. In this paper, we propose a new approach, medical Generative Adversarial Network (medGAN), to generate realistic synthetic patient records. Given real patient records as input, medGAN can generate high-dimensional discrete variables (e.g., binary and count features) via a combination of an autoencoder and generative adversarial networks. We also propose minibatch averaging to efficiently avoid mode collapse, and we increase learning efficiency with batch normalization and shortcut connections. To demonstrate feasibility, we show that medGAN generates synthetic patient records that achieve performance comparable to real data in many experiments, including distribution statistics, predictive modeling tasks, and a medical expert review. We also empirically observe limited privacy risk in both identity and attribute disclosure using medGAN.

KAIST Institutional Repository
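The medGAN abstract mentions minibatch averaging as a guard against mode collapse. A minimal NumPy sketch of the idea as we understand it, assuming the discriminator sees each record concatenated with the average of its minibatch; the function name `minibatch_average_input` is hypothetical and not taken from the medGAN code:

```python
import numpy as np

def minibatch_average_input(batch: np.ndarray) -> np.ndarray:
    """Append the minibatch mean to every record before it reaches the discriminator.

    batch: (batch_size, num_codes) array of binary or count features.
    returns: (batch_size, 2 * num_codes) array.
    """
    mean = batch.mean(axis=0, keepdims=True)          # (1, num_codes) batch statistics
    tiled = np.repeat(mean, batch.shape[0], axis=0)   # same average attached to each row
    return np.concatenate([batch, tiled], axis=1)

rng = np.random.default_rng(0)
real = (rng.random((4, 6)) < 0.3).astype(np.float32)  # toy binary patient records
disc_in = minibatch_average_input(real)
print(disc_in.shape)  # (4, 12)
```

If the generator collapses onto a few modes, the averages of its batches drift away from those of real batches, which gives the discriminator an easy signal to exploit and pushes the generator back toward diversity.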

A multiresolution approach for tensor completion from coarse and partial observations

Author: Yang Chaoqi
Publication date: 2021

Existing tensor completion formulations mostly rely on partial observations from a single tensor. However, tensors extracted from real-world data are often more complex due to: (i) Partial observation: only a small subset of tensor elements is available. (ii) Coarse observation: some tensor modes only present coarse, aggregated patterns (e.g., monthly summaries instead of daily reports). In this paper, we are given a subset of the tensor and some aggregated/coarse observations (along one or more modes) and seek to recover the original fine-granular tensor with low-rank factorization. We formulate a coupled tensor completion problem and propose an efficient Multi-resolution Tensor Completion model (MTC) to solve it. Our MTC model explores tensor mode properties, leverages the hierarchy of resolutions to recursively initialize an optimization setup, and optimizes the coupled system using alternating least squares. MTC ensures low computational and space complexity. We evaluate our model on two COVID-19-related spatio-temporal tensors. The experiments show that MTC achieves a percentage of fitness (PoF) of 65.20% and 75.79% in tensor completion with only 5% fine-granular observations, a 27.96% relative improvement over the best baseline. To evaluate the learned low-rank factors, we also design a tensor prediction task for daily and cumulative disease case predictions, where MTC achieves 50% in PoF and a 30% relative improvement over the best baseline.

Submission published under a 24-month embargo labeled 'U of I Access'; the embargo will last until 2023-05-01. The student, Chaoqi Yang, accepted the attached license and submitted this thesis for approval on 2021-04-21 (at 12:53 and 13:49, respectively). This thesis was approved for publication on 2021-04-23 at 16:46.

Illinois Digital Environment for Access to Learning and Scholarship Repository
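Two quantities in the MTC abstract can be made concrete: a coarse observation is a fine-grained tensor summed along one mode (for example, days into longer periods), and percentage of fitness is commonly defined as PoF = 1 - ||X - X̂||_F / ||X||_F. The sketch below assumes both definitions; the helper names are hypothetical, and it does not implement MTC's alternating least squares solver.

```python
import numpy as np

def aggregate_mode(tensor: np.ndarray, agg: np.ndarray, mode: int) -> np.ndarray:
    """Coarsen one mode with an aggregation matrix `agg` of shape (coarse, fine)."""
    moved = np.moveaxis(tensor, mode, 0)            # put the mode to coarsen first
    coarse = np.tensordot(agg, moved, axes=(1, 0))  # sum fine slices into coarse ones
    return np.moveaxis(coarse, 0, mode)             # restore the original mode order

def pof(true: np.ndarray, est: np.ndarray) -> float:
    """Percentage of fitness (assumed definition): 1 - ||X - X_hat||_F / ||X||_F."""
    # np.linalg.norm on an n-d array with default ord is the Frobenius norm
    return float(1.0 - np.linalg.norm(true - est) / np.linalg.norm(true))

rng = np.random.default_rng(0)
fine = rng.random((3, 4, 2))           # (locations, days, features), toy data
two_day = np.array([[1, 1, 0, 0],
                    [0, 0, 1, 1]])     # days 0-1 -> period 0, days 2-3 -> period 1
coarse = aggregate_mode(fine, two_day, mode=1)
print(coarse.shape)     # (3, 2, 2)
print(pof(fine, fine))  # 1.0
```

Expressing aggregation as a matrix along one mode keeps the coarse tensor linear in the fine one, which is what allows coarse and partial observations to be coupled in a single least-squares objective.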

GBASE: A Scalable and General Graph Management System

Author: Tong Hanghang
Christos Faloutsos
Kang U
Lin Ching-Yung
Sun Jimeng
Publication date: 22/08/2011

KAIST Institutional Repository

StructInf: Mining structural influence from social streams

Author: Zhang Jing
Hall Wendy
Zhong Yuanyi
Li Juanzi
Sun Jimeng
Mo Yuchen
Tang Jie
Song Guojie
Publication date: 10/02/2017

Social influence is a fundamental issue in social network analysis and has attracted tremendous attention with the rapid growth of online social networks. However, existing research mainly focuses on peer influence. This paper introduces a novel notion of structural influence and studies how to efficiently discover structural influence patterns from social streams. We present three sampling algorithms with theoretical unbiasedness guarantees to speed up the discovery process. Experiments on a large microblogging dataset show that the proposed sampling algorithms achieve a 10x speedup over the exact influence pattern mining algorithm, with an average error rate of only 1.0%. The extracted structural influence patterns have many applications. We apply them to predict retweet behavior, significantly improving prediction performance.

Southampton (e-Prints Soton)

Association for the Advancement of Artificial Intelligence: AAAI Publications
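StructInf's sampling algorithms come with unbiasedness guarantees. The generic principle behind such guarantees can be sketched with inverse-probability weighting: keep each candidate with probability q and scale the number of matches by 1/q, so the expected estimate equals the exact count. This is a textbook illustration with a hypothetical `is_match` predicate, not one of the three StructInf algorithms:

```python
import random

def exact_count(items, is_match) -> int:
    """Baseline: test every candidate."""
    return sum(1 for x in items if is_match(x))

def sampled_count(items, is_match, q: float, rng: random.Random) -> float:
    """Unbiased estimate: keep each item with probability q and scale hits by 1/q.

    The predicate is evaluated only on sampled items, which is where the speedup comes from.
    """
    hits = sum(1 for x in items if rng.random() < q and is_match(x))
    return hits / q

def is_match(x: int) -> bool:
    """Stand-in for an expensive structural-pattern check."""
    return x % 7 == 0

items = range(20_000)
rng = random.Random(42)
exact = exact_count(items, is_match)
estimates = [sampled_count(items, is_match, q=0.1, rng=rng) for _ in range(20)]
mean_estimate = sum(estimates) / len(estimates)
print(exact)  # 2858
```

Each individual estimate is noisy, but its expectation is exactly the true count; averaging repeated estimates, as above, shrinks the error.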

Automatic rare disease extraction based on large language models

Author: Cao Lang
Publication date: 05/04/2024

Submission original under an indefinite embargo labeled 'Open Access'. The submission was exported from Vireo on 2024-09-16 without embargo terms. The student, Lang Cao, accepted the attached license and submitted this thesis for approval on 2024-04-04 (at 16:31 and 16:39, respectively). This thesis was approved for publication on 2024-04-05 at 15:19.

Identifying and extracting information on rare diseases is crucial in various medical contexts. However, mature rare disease extraction methods are lacking in low-resource settings. In this paper, we aim to create an end-to-end system called AutoRD, which automates the extraction of information about rare diseases from clinical text. We achieve this using large language models (LLMs) and medical knowledge graphs developed from open-source medical ontologies. LLMs aid in language analysis, while knowledge graphs provide content-specific facts, filling in information gaps. AutoRD is a software pipeline involving data preprocessing, entity extraction, relation extraction, entity calibration, and knowledge graph construction; LLMs and open-source data are leveraged throughout the pipeline. We conducted various tests to evaluate the performance of AutoRD and highlight its strengths and limitations in this paper. We quantitatively evaluate our system in terms of entity extraction, relation extraction, and knowledge graph construction. AutoRD achieves an overall F1 score of 47.3%, an improvement of 0.8% over the fine-tuned model and 14.4% over the base LLM.
In detail, AutoRD achieves an overall entity extraction F1 score of 56.1% (rare_disease: 83.5%, disease: 35.8%, symptom_and_sign: 46.1%, anaphor: 67.5%) and an overall relation extraction F1 score of 38.6% (produces: 34.7%, increases_risk_of: 12.4%, is_a: 37.4%, is_acronym: 44.1%, is_synonym: 16.3%, anaphora: 57.5%). Our qualitative experiment also shows that AutoRD performs well in constructing the knowledge graph. Several design choices, including ontology-enhanced LLMs, contribute to the improvement of AutoRD. AutoRD outperforms the other methods compared, demonstrating the potential of LLM applications in rare disease detection and AI for healthcare.

Illinois Digital Environment for Access to Learning and Scholarship Repository
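The AutoRD results are reported as F1 scores, overall and per entity type. A minimal sketch of exact-match F1, assuming an entity counts as correct only when its span and type both match the gold annotation (the thesis may use a different matching criterion); the offsets and labels below are made up for illustration:

```python
def prf1(gold: set, pred: set) -> tuple[float, float, float]:
    """Precision, recall, and F1 for exact-match sets of (start, end, type) tuples."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

def per_type_f1(gold: set, pred: set) -> dict[str, float]:
    """F1 restricted to each entity type seen in gold or predictions."""
    types = {t for _, _, t in gold | pred}
    return {
        t: prf1({e for e in gold if e[2] == t}, {e for e in pred if e[2] == t})[2]
        for t in sorted(types)
    }

gold = {(0, 3, "rare_disease"), (10, 15, "symptom_and_sign"), (20, 22, "anaphor")}
pred = {(0, 3, "rare_disease"), (10, 15, "disease")}  # one hit, one wrong type
p, r, f1 = prf1(gold, pred)
print(round(f1, 3))  # 0.4
```

Note that the second prediction has the right span but the wrong type, so under exact matching it counts as both a false positive and a false negative.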

Aligning synthetic clinical trial data with human-preferred clinical endpoints

Author: Das Trisha
Publication date: 2024

Submission published under a 24-month embargo labeled 'Closed Access'; the embargo will last until 2026-12-01. The student, Trisha Das, accepted the attached license and submitted this thesis for approval on 2024-12-02 (at 09:59 and 10:09, respectively). This thesis was approved for publication on 2024-12-02 at 11:49.

Each year, hundreds of clinical trials are conducted to evaluate new medical interventions, but sharing patient records from these trials with other institutions can be challenging due to privacy concerns and federal regulations. To help mitigate privacy concerns, researchers have proposed methods for generating synthetic patient data. However, existing approaches for generating synthetic clinical trial data disregard the usage requirements of these data, including maintaining specific properties of clinical outcomes, and rely only on post hoc assessments that are not coupled with the data generation process. In this paper, we propose SynRL, which leverages reinforcement learning to improve patient data generators by customizing the generated data to meet user-specified requirements for synthetic data outcomes and endpoints. Our method includes a data value critic function that evaluates the quality of the generated data, and it uses reinforcement learning to align the data generator with the users’ needs based on the critic’s feedback. We performed experiments on four clinical trial datasets and demonstrated that SynRL improves the quality of the generated synthetic data while keeping privacy risks low. We also show that SynRL can serve as a general framework for customizing the data generation of multiple types of synthetic data generators. Our code is available at https://anonymous.4open.science/r/SynRL-DB0F/

Illinois Digital Environment for Access to Learning and Scholarship Repository
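SynRL aligns a data generator with user-specified endpoint requirements using reinforcement learning driven by a critic's feedback. The toy sketch below shows that general pattern with REINFORCE: a one-parameter generator emits a binary trial endpoint, a hypothetical critic rewards batches whose endpoint rate is close to a user-specified target, and the policy gradient moves the generator toward that target. The generator, critic, and hyperparameters are all assumptions for illustration, not SynRL's actual design:

```python
import numpy as np

def critic(batch: np.ndarray, target_rate: float) -> float:
    """Hypothetical data-value critic: 0 is best, more negative the further the
    batch's endpoint rate is from the user-specified target."""
    return -abs(batch.mean() - target_rate)

def align_generator(target_rate: float = 0.3, steps: int = 2000,
                    batch_size: int = 64, lr: float = 0.5, seed: int = 0) -> float:
    """REINFORCE on a Bernoulli generator with P(endpoint = 1) = sigmoid(logit)."""
    rng = np.random.default_rng(seed)
    logit, baseline = 0.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-logit))
        batch = (rng.random(batch_size) < p).astype(float)  # synthetic endpoints
        reward = critic(batch, target_rate)
        # d/dlogit of the batch log-likelihood is sum(x - p); scaled by 1/batch_size
        grad_logp = (batch - p).mean()
        logit += lr * (reward - baseline) * grad_logp        # policy-gradient step
        baseline = 0.9 * baseline + 0.1 * reward              # moving-average baseline
    return float(1.0 / (1.0 + np.exp(-logit)))

final_rate = align_generator()  # starts at 0.5 and drifts toward the 0.3 target
```

The critic only scores whole batches and is never differentiated; the generator learns purely from the reward signal, which is what lets such a critic be an arbitrary, non-differentiable evaluation of data value.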

Effective knowledge extraction and knowledge-enhanced machine learning for health

Author: Jiang Pengcheng
Publication date: 29/04/2024

Submission original under an indefinite embargo labeled 'Open Access'. The submission was exported from Vireo on 2024-09-16 without embargo terms. The student, Pengcheng Jiang, accepted the attached license and submitted this thesis for approval on 2024-04-16 (at 00:16 and 00:32, respectively). This thesis was approved for publication on 2024-04-29 at 10:00.

This work explores knowledge extraction and its application to enhancing machine learning models, with a special focus on healthcare. It presents an approach to deriving structured knowledge from unstructured data that leverages pre-trained language models and text analysis techniques. The work introduces strategies for optimizing knowledge graph completion, evaluates the efficiency and accuracy of knowledge extraction from textual data, and advances text summarization to improve knowledge extraction. It then applies the extracted knowledge in healthcare, demonstrating the potential of knowledge-enhanced machine learning for predicting healthcare outcomes and molecule properties. This research advances knowledge extraction and machine learning and opens new avenues for future research and applications, particularly in improving healthcare quality and drug discovery. Overall, the thesis underscores the potential of artificial intelligence for extracting and leveraging knowledge for scientific and medical advancement.

Illinois Digital Environment for Access to Learning and Scholarship Repository
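The thesis mentions optimizing knowledge graph completion. As a generic illustration of that task (not the thesis's own method), a translational model such as TransE scores a triple (head, relation, tail) by how close head + relation lands to tail, and completion ranks candidate tails by that score. The two-dimensional embeddings below are hypothetical:

```python
import numpy as np

def transe_score(h: np.ndarray, r: np.ndarray, t: np.ndarray) -> float:
    """TransE plausibility: higher (less negative) when h + r is close to t (L2 distance)."""
    return -float(np.linalg.norm(h + r - t))

def rank_tails(h: np.ndarray, r: np.ndarray, candidates: np.ndarray) -> np.ndarray:
    """Return candidate indices ordered from most to least plausible tail."""
    scores = np.array([transe_score(h, r, t) for t in candidates])
    return np.argsort(-scores)

head = np.array([0.0, 0.0])
relation = np.array([1.0, 0.0])
candidates = np.array([[0.0, 1.0],   # distance sqrt(2)
                       [1.0, 0.0],   # exact match: h + r == t
                       [2.0, 2.0]])  # distance sqrt(5)
ranking = rank_tails(head, relation, candidates)
print(ranking.tolist())  # [1, 0, 2]
```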

Towards robust clinical predictive modeling with heterogeneous electronic health record data

Author: Wu Zhenbang
Publication date: 2024

Submission published under a 24-month embargo labeled 'U of I Access'; the embargo will last until 2026-08-01. The student, Zhenbang Wu, accepted the attached license and submitted this thesis for approval on 2024-07-09 (at 19:28 and 19:37, respectively). This thesis was approved for publication on 2024-07-15 at 11:43.

With the widespread adoption of Electronic Health Record (EHR) systems, there has been increasing interest in leveraging deep learning for clinical predictive modeling. However, existing models typically assume a uniform feature and label space. In practice, different hospitals often use varied EHR systems with unique schemas (i.e., feature spaces), and the clinical tasks (i.e., label spaces) can change dynamically. In this work, we present two methods to address discrepancies in feature and label spaces across healthcare settings. AutoMap enables the deployment of clinical predictive models across hospitals with diverse medical coding systems. It automatically aligns medical codes across EHR systems via ontology-level alignment and code-level refinement. EDGE recommends newly developed drugs, which often lack extensive historical prescription data. By formulating new drug recommendation as a few-shot learning problem, it employs a drug-dependent multi-phenotype few-shot learner to quickly adapt to new drugs. We validate both methods using real-world EHR datasets from the MIMIC-III, MIMIC-IV, eICU, and Claims databases. Our results demonstrate their effectiveness in addressing the challenges posed by unmatched feature and label spaces in clinical predictive modeling.

Illinois Digital Environment for Access to Learning and Scholarship Repository
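AutoMap aligns medical codes across EHR systems, with code-level refinement following ontology-level alignment. A much simpler stand-in for the code-level step, assuming each code already has an embedding in a shared space, maps every source code to its nearest target code by cosine similarity. The codes and vectors are hypothetical, and this is not AutoMap's algorithm:

```python
import numpy as np

def align_codes(src_emb: np.ndarray, tgt_emb: np.ndarray,
                src_codes: list[str], tgt_codes: list[str]) -> dict[str, str]:
    """Map each source code to the most similar target code by cosine similarity."""
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)  # unit-length rows
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sim = src @ tgt.T                  # (num_src, num_tgt) cosine similarities
    best = sim.argmax(axis=1)          # index of the closest target code per source code
    return {s: tgt_codes[j] for s, j in zip(src_codes, best)}

src_codes = ["A_101", "A_202"]         # hypothetical codes from hospital A
tgt_codes = ["B_x", "B_y"]             # hypothetical codes from hospital B
src_emb = np.array([[1.0, 0.0], [0.0, 1.0]])
tgt_emb = np.array([[0.1, 2.0], [3.0, 0.2]])
mapping = align_codes(src_emb, tgt_emb, src_codes, tgt_codes)
print(mapping)  # {'A_101': 'B_y', 'A_202': 'B_x'}
```

Cosine similarity ignores embedding magnitude, so codes are matched by direction alone; a real alignment would also need to handle codes with no good counterpart, which a plain argmax cannot express.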

Author Instructions

Author: Instructions Author
Publication date: 04/11/2013

Crossref

Cartographic Perspectives (E-Journal - North American Cartographic Information Society, NACIS)