Predicting the Onset of Chronic Obstructive Pulmonary Disease in the English Longitudinal Study of Ageing
Chronic obstructive pulmonary disease (COPD) is a chronic lung disease estimated to be responsible for about 5% of all deaths worldwide. Identifying subjects at risk of developing COPD is important to reduce its global burden, as early interventions on modifiable risk factors (e.g., smoking) can delay or even prevent the decline of lung function. A few models to predict the risk of COPD onset in the general population have been developed, but they include only a small set of risk factors. The aim of this work is to develop a new predictive model of COPD onset, testing the predictive ability of a variety of variables, including socio-economic and lifestyle factors, wellbeing status, respiratory symptoms, medical history, lung function measurements and blood test biomarkers. The model was developed by applying logistic regression to a training set (n=2897) extracted from the English Longitudinal Study of Ageing. The most important variables for COPD prediction were selected by least absolute shrinkage and selection operator (LASSO) regularization. The analysis showed that variables not considered by existing models in the literature, such as physical activity, depression, marital status, self-reported health, fibrinogen, C-reactive protein and cholesterol, can be important predictors of COPD onset. The derived model showed good discrimination and calibration on an independent test set (n=724), with an area under the receiver operating characteristic curve (AUC) of 0.81 and an expected-to-observed event ratio of 0.93. Future work includes an external validation of the model, the use of different modelling techniques (e.g., survival models) and the application of variable ranking methods.
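The two test-set metrics quoted above can be computed with a few lines of code. This is a minimal, dependency-free sketch under stated assumptions, not the authors' code: the AUC uses the pairwise (Mann-Whitney) formulation, and the expected-to-observed ratio is assumed to be the sum of predicted risks divided by the number of observed events. The function names and toy data are hypothetical.

```python
def auc_mann_whitney(y_true, y_score):
    """AUC as the probability that a randomly chosen event case receives a
    higher score than a randomly chosen non-event case (ties count as 1/2)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def expected_to_observed(y_true, y_prob):
    """Calibration-in-the-large: sum of predicted risks (expected events)
    divided by the number of observed events. Values below 1 mean the model
    predicts fewer events than were observed."""
    observed = sum(y_true)
    if observed == 0:
        raise ValueError("no observed events")
    return sum(y_prob) / observed


# Toy test-set outcomes (1 = COPD onset) and predicted risks; purely illustrative.
y = [0, 0, 1, 1, 0, 1]
p = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]
print(auc_mann_whitney(y, p))      # 8 of 9 event/non-event pairs ranked correctly
print(expected_to_observed(y, p))  # 2.75 expected vs 3 observed
```

An E/O ratio of 0.93, as reported above, would therefore indicate that the model slightly underestimates the number of new COPD cases.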
Development of predictive models for short-term prediction of disability progression in multiple sclerosis
Multiple sclerosis (MS) is an autoimmune degenerative disease of the central nervous system, in which chronic inflammation leads to demyelination with transient or permanent axonal damage. Symptoms of MS include problems with vision, movement, sensation and balance, which can be intermittent or progressively worsen over time, eventually leading to permanent disability. Predictive models of MS disability progression can help clinicians choose the best care for each patient. The aim of this work is to develop predictive models of short-term MS disability progression. The data come from the Multiple Sclerosis Outcome Assessments Consortium (MSOAC) Placebo database, which includes longitudinal demographic and clinical data of 2465 MS patients enrolled in the control arm of different MS clinical trials. Variables collected at the first visit were used to predict a binary outcome of disability progression at 6 and 18 months from baseline, using a logistic regression model. Disability progression was defined as a 1.5-point increase in the Expanded Disability Status Scale (EDSS) score relative to baseline. Twenty input variables were considered in each model, including demographics, medical history, functional tests, questionnaires and MS phenotype. The preprocessed data were split into a training set and a test set in an 80%/20% proportion. Logistic regression models were trained on the training set, using over- and undersampling techniques to balance the classes. The models were then evaluated on the test set by the area under the receiver operating characteristic curve (AUC). Prediction performance on the test set was satisfactory, although not optimal, with an AUC of 0.74 at 6 months and 0.71 at 18 months. These results are comparable with those obtained by other studies in the literature on smaller cohorts.
Future developments of this work include the use of other machine learning techniques for model training, the application of feature selection and variable ranking techniques, the incorporation of new variables (e.g., imaging variables) and the external validation of the models on new populations.
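The outcome definition and class balancing described above can be sketched as follows. This is an illustration under assumptions, not the study's pipeline: "a 1.5-point increase" is read as "an increase of at least 1.5 points", only random oversampling is shown (the abstract also mentions undersampling), and the function names and data are hypothetical.

```python
import random


def progressed(edss_baseline, edss_followup, threshold=1.5):
    """Binary progression label. Treating 'an increase of 1.5' as
    'an increase of at least 1.5 points' is an assumption."""
    return int(edss_followup - edss_baseline >= threshold)


def random_oversample(X, y, seed=0):
    """Duplicate randomly drawn rows of the smaller classes until every class
    is as large as the largest one. Apply to the training split only, so the
    test set keeps its natural class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    n_max = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(n_max - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


# Hypothetical (baseline, 6-month) EDSS pairs.
labels = [progressed(b, f) for b, f in [(2.0, 3.5), (3.0, 3.5), (4.0, 6.0), (1.5, 1.5), (2.5, 3.0)]]
print(labels)  # [1, 0, 1, 0, 0]
X_bal, y_bal = random_oversample([[i] for i in range(len(labels))], labels)
print(sorted(y_bal))  # [0, 0, 0, 1, 1, 1]
```

Balancing only the training split is the key design choice: resampling before the 80%/20% split would leak duplicated patients into the test set and inflate the AUC.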
AirPredict: a wearable sensor-based app to track particulate matter exposure and respiratory health
Air pollution poses a significant threat to public health, with particulate matter (PM) being one of the most harmful pollutants, especially for people suffering from chronic respiratory diseases. In this work, we propose AirPredict, a digital health mobile application designed to monitor personal PM exposure and respiratory outcomes in asthma patients. By integrating data from wearable sensors, the platform accurately assesses inhaled pollutant doses and estimates individual PM exposure, while users log essential clinical data daily, offering an all-in-one solution. A 14-day beta evaluation with an asthma patient demonstrated the platform's intuitive design and a positive user experience. The application's user-friendly interface empowers individuals to make informed decisions to minimize their exposure and improve their quality of life.
A practical perspective on the concordance index for the evaluation and selection of prognostic time-to-event models
Developing a prognostic model for biomedical applications typically requires mapping an individual's set of covariates to a measure of the risk that he or she may experience the event to be predicted. Many scenarios, however, especially those involving adverse pathological outcomes, are better described by explicitly accounting for the timing of these events as well as their probability. In these cases, traditional classification or ranking metrics may be inadequate to inform model evaluation or selection. To address this limitation, it is common practice to reframe the problem in the context of survival analysis and to resort instead to the concordance index (C-index), which summarises how well a predicted risk score describes an observed sequence of events. A practically meaningful interpretation of the C-index, however, may present several difficulties and pitfalls. Specifically, we identify two main issues: i) the C-index remains implicitly, and subtly, dependent on time, and ii) its relationship with the number of subjects whose risk was incorrectly predicted is not straightforward. Failing to consider these two aspects may introduce unwanted biases in the evaluation process and even result in the selection of a suboptimal model. Here, we therefore discuss ways to obtain a meaningful interpretation in spite of these difficulties. Aiming to assist experimenters regardless of their familiarity with the C-index, we start with an introductory-level presentation of its most popular estimator, highlighting its temporal dependency and suggesting how it might be correctly used to inform model selection. We also address the nonlinearity of the C-index with respect to the number of correct risk predictions, developing a simplified framework that may enable an easier interpretation and quantification of C-index improvements or deteriorations.
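A minimal sketch of Harrell's C-index, the estimator usually meant by "the C-index", may help make the pair-counting logic and its time dependence concrete. We assume this is the estimator the abstract refers to; the `horizon` argument is an illustrative way to expose the dependence on time and is not necessarily the adjustment proposed in the paper. Names and data are hypothetical.

```python
def harrell_c_index(times, events, risks, horizon=None):
    """Harrell's concordance index.

    A pair (i, j) is comparable when subject i had an observed event and
    times[i] < times[j]; it is concordant when risks[i] > risks[j]. Ties in
    risk count 1/2, pairs tied in time are skipped (a simplifying choice).
    If `horizon` is given, only events observed up to that time are used as
    the earlier member of a pair, which makes the time dependence explicit.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i] or (horizon is not None and times[i] > horizon):
            continue
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: follow-up times, event indicators (0 = censored) and predicted risks.
times = [2, 5, 3, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.1, 0.7, 0.2]
print(harrell_c_index(times, events, risks))             # 0.75
print(harrell_c_index(times, events, risks, horizon=4))  # 1.0
```

The same risk scores give 0.75 over the full follow-up but 1.0 when only early events are considered, a small instance of the implicit time dependence discussed above. The example also hints at the nonlinearity: a single misordered pair out of four moves the index by 0.25.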
A Variable Ranking Method for Machine Learning Models with Correlated Features: In-Silico Validation and Application for Diabetes Prediction
When building a model to predict a clinical outcome using machine learning techniques, developers are often interested in ranking the features according to their predictive ability. A commonly used approach to obtain a robust variable ranking is to apply recursive feature elimination (RFE) on multiple resamplings of the training set and then aggregate the rankings using the Borda count method. However, the presence of highly correlated features in the training set can deteriorate the ranking performance. In this work, we propose a variant of the RFE-Borda count method that takes into account the correlation between variables during the ranking procedure, in order to improve the ranking performance in the presence of highly correlated features. The proposed algorithm is tested on simulated datasets in which the true variable importance is known and compared with the standard RFE-Borda count method. According to the root mean square error between the estimated rank and the true (i.e., simulated) feature importance, the proposed algorithm outperforms the standard RFE-Borda count method. Finally, the proposed algorithm is applied to a case study on the development of a predictive model of type 2 diabetes onset.
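The aggregation step of the standard RFE-Borda count method can be sketched as follows. This shows only the standard Borda count, not the authors' correlation-aware variant, whose details are not given here; each input ranking is assumed to come from one RFE run on one resampling (the RFE step itself is not shown). Feature names are hypothetical.

```python
def borda_aggregate(rankings):
    """Aggregate feature rankings with the Borda count.

    Each ranking lists the same p features from most to least important.
    The feature in position k (0-based) receives p - k points; points are
    summed over rankings and features are sorted by total score, with
    alphabetical order breaking ties (an arbitrary choice).
    """
    scores = {}
    for ranking in rankings:
        p = len(ranking)
        for k, feature in enumerate(ranking):
            scores[feature] = scores.get(feature, 0) + (p - k)
    order = sorted(scores, key=lambda f: (-scores[f], f))
    return order, scores


# Hypothetical RFE rankings obtained on three resamplings of a training set.
rankings = [
    ["glucose", "bmi", "age", "sbp"],
    ["bmi", "glucose", "age", "sbp"],
    ["glucose", "age", "bmi", "sbp"],
]
order, scores = borda_aggregate(rankings)
print(order)   # ['glucose', 'bmi', 'age', 'sbp']
print(scores)  # {'glucose': 11, 'bmi': 9, 'age': 7, 'sbp': 3}
```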
Development of an error model for a factory-calibrated continuous glucose monitoring sensor with 10-day lifetime
Factory-calibrated continuous glucose monitoring (FC-CGM) sensors are new devices used in type 1 diabetes (T1D) therapy to measure glucose concentration almost continuously for 10–14 days without requiring any in vivo calibration. Understanding and modelling CGM errors is important when designing new tools for T1D therapy. CGM error models available in the literature are not suitable to describe the FC-CGM sensor error, since their domain of validity is limited to 12-h time windows, i.e., the time between two consecutive in vivo calibrations. The aim of this paper is to develop a model of the error of FC-CGM sensors. The dataset contains 79 FC-CGM traces collected with the Dexcom G6 sensor. The model is designed to dissect the error into its three main components: the effect of plasma-interstitium kinetics, calibration error and random measurement noise. The main novelties are the extension of the model to cover the entire sensor lifetime and a new single-step identification procedure. The final error model, which combines a first-order linear dynamic model of plasma-interstitium kinetics, a second-order polynomial model of calibration error and an autoregressive model of measurement noise, proved suitable to describe FC-CGM sensor errors, in particular improving the estimation of the physiological time delay.
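A minimal forward simulation shows how the three error components described above combine into a CGM trace. The structure follows the abstract (first-order kinetics, second-order polynomial calibration error, autoregressive noise), but the rest is assumed: calibration error is modelled as a gain and an offset that drift as second-order polynomials of time since insertion, the noise is AR(2), the 5-minute step matches the Dexcom G6 sampling interval, and all parameter values are placeholders rather than identified values. The single-step identification procedure is not reproduced.

```python
import random


def poly2(coef, t):
    """Second-order polynomial c0 + c1*t + c2*t**2."""
    return coef[0] + coef[1] * t + coef[2] * t * t


def simulate_cgm(bg, ts_min=5.0, tau_min=10.0,
                 gain_coef=(1.0, 0.0, 0.0), offset_coef=(0.0, 0.0, 0.0),
                 ar_coef=(0.7, -0.1), noise_sd=0.0, seed=0):
    """Simulate a CGM trace from a blood glucose (BG) trace in mg/dL.

    1. Plasma-interstitium kinetics: first-order model dIG/dt = (BG - IG) / tau,
       discretised with a forward Euler step of length ts_min.
    2. Calibration error: gain and offset that drift as second-order
       polynomials of time since sensor insertion (in days).
    3. Measurement noise: AR(2) process driven by Gaussian white noise.
    """
    rng = random.Random(seed)
    ig = bg[0]          # assume plasma and interstitium at equilibrium at start
    v1 = v2 = 0.0       # previous two noise samples
    cgm = []
    for k, b in enumerate(bg):
        t_days = k * ts_min / (60.0 * 24.0)
        v = ar_coef[0] * v1 + ar_coef[1] * v2 + rng.gauss(0.0, noise_sd)
        cgm.append(poly2(gain_coef, t_days) * ig + poly2(offset_coef, t_days) + v)
        v2, v1 = v1, v
        ig += ts_min / tau_min * (b - ig)   # Euler step toward current BG
    return cgm


# A step in BG from 100 to 200 mg/dL, noise switched off, no calibration drift.
trace = simulate_cgm([100.0] + [200.0] * 5)
print(trace)  # [100.0, 100.0, 150.0, 175.0, 187.5, 193.75]
```

With noise and drift switched off, the printed trace isolates the kinetic component: the simulated CGM lags the BG step and approaches it exponentially, which is the physiological delay the identified model aims to estimate.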
A model to forecast the two-year variation of subjective wellbeing in the elderly population
Background
The ageing global population presents significant public health challenges, especially in relation to the subjective wellbeing of the elderly. In this study, our aim was to investigate the potential for developing a model to forecast the two-year variation of the perceived wellbeing of individuals aged over 50. We also aimed to identify the variables that predict changes in subjective wellbeing, as measured by the CASP-12 scale, over a two-year period.
Methods
Data from the European SHARE project were used, specifically the demographic, health, social and financial variables of 9422 subjects. Subjective wellbeing was measured with the CASP-12 scale. The study outcome was binary, i.e., worsening vs. not worsening of the CASP-12 score over 2 years. Logistic regression, logistic regression with LASSO regularisation and random forest were considered as candidate models. Performance was assessed in terms of accuracy in predicting the outcome, area under the curve (AUC) and F1 score.
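The binary outcome and two of the reported metrics can be sketched as follows. Two assumptions are made here and are not confirmed by the abstract: any decrease in CASP-12 between the two waves counts as worsening, and worsening is treated as the positive class for the F1 score. Names and data are hypothetical.

```python
def worsening_label(casp_baseline, casp_followup):
    """1 if CASP-12 decreased over the 2-year interval (higher CASP-12 means
    better wellbeing); treating any decrease as worsening is an assumption."""
    return int(casp_followup < casp_baseline)


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Accuracy and F1 score, with `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Hypothetical CASP-12 scores at baseline and after 2 years, plus model predictions.
y_true = [worsening_label(b, f) for b, f in [(35, 33), (40, 41), (30, 30), (42, 39), (38, 36)]]
y_pred = [1, 1, 0, 0, 1]
print(y_true)                          # [1, 0, 0, 1, 1]
print(accuracy_and_f1(y_true, y_pred)) # accuracy 0.6, F1 = 2/3
```

Because F1 depends on which class is labelled positive, the choice of positive class matters when comparing the reported F1 with the accuracy.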
Results
The best-performing model was the random forest, achieving an accuracy of 65%, AUC = 0.659 and F1 = 0.710. All models proved able to generalise both across subjects and over time. The most predictive variables were the CASP-12 score at baseline, the presence of depression and financial difficulties.
Conclusions
While we identify the random forest model as the most suitable, the models based on logistic regression or on logistic regression with LASSO regularisation are also viable options, given the similarity in performance.
Assessment and comparison of the measurement error components of Dexcom G5 Mobile and Eversense continuous glucose monitoring systems
Assessing Personal Exposure to Airborne Particulate Matter with Wearable Sensors and Ventilation Rate Models
Air pollution is a major contributor to global morbidity and mortality. Accurate assessment of an individual's exposure to air pollution is important to quantify the impact of air pollution on human health. Historically, human exposure to air pollution has been quantified using pollutant concentrations from fixed air quality monitoring stations. This approach does not consider the subject's activities or the differences between indoor and outdoor air pollution; these limitations, however, can be overcome using wearable sensors. In this work, we propose a new approach to measure personal exposure to airborne particulate matter (PM) that consists of using a wearable/portable air quality sensor to measure air quality at the subject's location, a wearable heart rate (HR) sensor to collect HR time series, and a ventilation rate (VE) model to estimate the volume of inhaled air per minute (L/min) based on HR and other subject covariates. VE and PM time series are then combined to estimate the inhaled pollutant dose over time, as a measure of personal exposure. To model VE as a function of HR, four literature models are considered. The estimates obtained with the four models are compared in three representative subjects. Initial data analysis showed that the four models may lead to statistically significant differences in exposure estimates, so the choice of the model can be a critical aspect of this approach. Regardless of the model used, the time series of inhaled PM revealed significant daily variations in pollutant exposure, highlighting the importance of methodologies for accurate personal exposure assessment.
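The dose computation described above (VE estimated from HR, multiplied by the PM concentration and accumulated over time) can be sketched as follows. The VE model here is a hypothetical linear placeholder standing in for the four literature models, which the abstract does not name; any function mapping HR to L/min can be passed as `ve_model`. Assumed units: PM in µg/m³, VE in L/min, sampling interval in minutes, with HR and PM aligned on the same time grid.

```python
def linear_ve(hr_bpm, intercept=-10.0, slope=0.25):
    """Hypothetical VE model (L/min) linear in HR; the coefficients are
    placeholders, not taken from any of the four literature models."""
    return max(0.0, intercept + slope * hr_bpm)


def inhaled_dose_ug(hr_bpm, pm_ug_m3, dt_min, ve_model=linear_ve):
    """Cumulative inhaled PM dose in micrograms.

    Each sample contributes VE [L/min] * dt [min] / 1000 [L per m^3] * PM [ug/m^3].
    HR and PM series are assumed to be aligned on the same time grid.
    """
    if len(hr_bpm) != len(pm_ug_m3):
        raise ValueError("HR and PM series must have the same length")
    dose = 0.0
    cumulative = []
    for hr, pm in zip(hr_bpm, pm_ug_m3):
        dose += ve_model(hr) * dt_min / 1000.0 * pm
        cumulative.append(dose)
    return cumulative


# Two one-minute samples: rest (80 bpm, 20 ug/m^3) then effort (120 bpm, 40 ug/m^3).
print(inhaled_dose_ug([80, 120], [20.0, 40.0], dt_min=1.0))  # approx. [0.2, 1.0]
```

Passing a different `ve_model` with the same HR and PM series changes the cumulative dose, which is why the choice among VE models can drive differences in exposure estimates.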
Mid-Term Blood Glucose Prediction: A Hybrid Approach Using Grammatical Evolution and Physiological Models
- …
