Search CORE

1,770,464 research outputs found

Academic Data Science Alliance: Member Book 2022-2023

Author: Academic Data Science Alliance
Publication date: 09/03/2023

The Academic Data Science Alliance (ADSA) is a community network for data science leaders, practitioners, and educators who take responsibility for a just, equitable future where data science approaches are thoughtfully applied in all domains for the benefit of all. The 2022-2023 ADSA Member Book highlights the institutions that have provided funds for ADSA through membership dues, support ADSA as a leading organization for academic data science, and agree with the guiding principles in ADSA's Mission, Vision, and Values. ADSA member institutions represent a range of models, maturity levels, and target audiences at academic institutions in the US and beyond.

ZENODO

Diversifying the genomic data science research community

Author: Genomic Data Science Community Network
Publication date: 20/07/2022

Over the past 20 years, the explosion of genomic data collection and the cloud computing revolution have made computational and data science research accessible to anyone with a web browser and an internet connection. However, students at institutions with limited resources have received relatively little exposure to curricula or professional development opportunities that lead to careers in genomic data science. To broaden participation in genomics research, the scientific community needs to support these programs in local education and research at underserved institutions (UIs). These include community colleges, historically Black colleges and universities, Hispanic-serving institutions, and tribal colleges and universities that support ethnically, racially, and socioeconomically underrepresented students in the United States. We have formed the Genomic Data Science Community Network to support students, faculty, and their networks to identify opportunities and broaden access to genomic data science. These opportunities include expanding access to infrastructure and data, providing UI faculty development opportunities, strengthening collaborations among faculty, recognizing UI teaching and research excellence, fostering student awareness, developing modular and open-source resources, expanding course-based undergraduate research experiences (CUREs), building curriculum, supporting student professional development and research, and removing financial barriers through funding programs and collaborator support.

Cold Spring Harbor Laboratory Institutional Repository

Data science and the policy completion problem

Author: Chawla Sanjay
Girosi Federico
Wang Fei
Publication date: 01/01/2014

The link between policy analysis and data science is more delicate than it may appear. A new policy, by definition, will change the underlying data generating model, rendering classification or supervised learning inapplicable. Perhaps eliciting causal relations from observational data is the correct framework for estimating policy impact. However, there are substantial gaps between the theory, practice and feasibility of causal models. In this paper we argue that transduction, a form of inference where we reason from specific training instances to specific test instances, may provide an appropriate framework for evidence-based policy analysis. In particular, we will demonstrate that the matrix completion problem, introduced in the data science community for making predictions in recommendation systems, can be a powerful tool for both predicting and evaluating the impact of new policy changes.

Western Sydney ResearchDirect
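
The abstract of "Data science and the policy completion problem" refers to matrix completion as used in recommendation systems. As a rough illustration only, and not the authors' method, the sketch below fills in an unobserved cell of a small matrix by fitting a low-rank factorization with plain stochastic gradient descent. The function name `complete_matrix`, its parameters, and the region-by-policy data are all hypothetical and chosen for this example.

```python
import random


def complete_matrix(observed, n_rows, n_cols, rank=1, steps=2000,
                    lr=0.02, reg=0.01, seed=0):
    """Estimate all entries of a matrix from a subset of observed cells.

    observed: dict mapping (row, col) -> known value (hypothetical input format).
    Fits M ~ U V^T by stochastic gradient descent on the observed cells only.
    Returns an n_rows x n_cols list of lists with the fitted values.
    """
    rng = random.Random(seed)
    # Small random starting factors: U is n_rows x rank, V is n_cols x rank.
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    cells = list(observed.items())
    for _ in range(steps):
        rng.shuffle(cells)
        for (i, j), value in cells:
            pred = sum(U[i][k] * V[j][k] for k in range(rank))
            err = value - pred
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                # Gradient step on squared error with L2 regularization.
                U[i][k] += lr * (err * v - reg * u)
                V[j][k] += lr * (err * u - reg * v)
    return [[sum(U[i][k] * V[j][k] for k in range(rank))
             for j in range(n_cols)] for i in range(n_rows)]


# Hypothetical data: rows are regions, columns are policies, values are outcomes.
# The underlying matrix is rank 1; cell (2, 2), whose true value is 12, is hidden.
observed = {
    (0, 0): 1.0, (0, 1): 2.0, (0, 2): 4.0,
    (1, 0): 2.0, (1, 1): 4.0, (1, 2): 8.0,
    (2, 0): 3.0, (2, 1): 6.0,
}
filled = complete_matrix(observed, n_rows=3, n_cols=3, rank=1)
print(round(filled[2][2], 2))  # estimate for the unobserved cell
```

In this toy setting the hidden cell plays the role of a policy outcome that was never observed for a given region; the low-rank assumption is what lets the observed cells constrain it. Real policy data would call for a more careful choice of rank, regularization, and validation.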

Improving Interoperability and Digitalization of the European Union Healthcare System: A Data Science Perspective on the ATHINA Platform and Threat Response.

Author: TELLEZ JUAREZ BRENDA ELOISA
Publication date: 2023

This thesis represents the culmination of an end-of-studies project undertaken to obtain a Master's degree in Data Science from the prestigious University of Padova. Conducted as an internship in collaboration with Intellera Consulting and the University of Padova, this research delves into the imperative task of digitalizing the healthcare system. Specifically, it explores the efforts made by the European Union to establish an optimal framework for interoperability of healthcare information, aiming to enhance efficiency, improve patient care, promote interoperability, empower patients, and foster innovation. My involvement in this project encompassed its development, implementation, and assessment from a technical perspective. The global health crisis in recent times has underscored the urgent need for a more efficient and coordinated approach to address such threats. To enhance our response capabilities, I will elucidate my contributions to the development of the Advanced Technology for Health INtelligence and Action IT System. This cutting-edge system leverages advanced data science techniques to support anticipatory threat assessment, enable swift evaluation of health threats, facilitate the prioritization of Medical Countermeasures (MCM) through sophisticated data analysis and decision-making models, and more. Through this research, we aim to contribute to the ongoing efforts in creating a resilient and proactive healthcare system capable of effectively addressing emerging challenges.

Padua Thesis and Dissertation Archive

Kaggle: Nigerians in Data Science

Author: Alao David I.
Publication date: 24/11/2021

Analytical report on the 2021 Kaggle Machine Learning & Data Science Survey. Documentation project. Source competition: Kaggle Survey 2021. Publisher: Kaggle. License: Apache 2.0. Note: cite the author when you use any part of this report.

ZENODO

The Generali Case: Deploying Data Science for Risk Selection in Life Insurance Underwriting

Author: KACI FLAVIO
Publication date: 2025

This thesis investigates how Data Science can enhance the underwriting process in life insurance, through a case study on Dr. Mouse, an AI-based virtual underwriter developed by Generali Italia. Underwriting plays a central role in assessing the risk profile of applicants, particularly for protection products such as Term Life and Long-Term Care. While most applications can be processed through standard business rules, more complex or ambiguous cases, often involving unstructured medical documentation or free-text health disclosures, still require manual review. The research was conducted during my internship at Generali Italia, where I worked at the intersection of business and data functions. Specifically, I collaborated with both the Chief Life Office, including the underwriting and policy issuance teams, and the Data Office, within the Advanced Analytics Models team (comprising data scientists and engineers), where I now serve as a Data Scientist. This setting provided a unique perspective on both the operational constraints and the technical opportunities in automating risk selection. Dr. Mouse tackles the challenge by automating the extraction and interpretation of health-related information from scanned medical reports, blood tests, and free-text answers. Its architecture combines OCR, BERT-based document classification, ML- and rule-based data extraction, and a risk prediction model built with XGBoost and SHAP explainability. Fully integrated into Generali's digital sales platform, the system supports underwriters in making more consistent, transparent, and timely decisions. The thesis describes the end-to-end pipeline, evaluates model performance, and discusses challenges such as document heterogeneity and limited training data. It also explores ongoing developments involving Generative AI, which are showing promising results in handling complex documents like oncological and cardiological reports, which have historically been difficult for traditional approaches. Ultimately, this work illustrates how intelligent automation can extend underwriting capabilities, reduce operational burden, and pave the way for more scalable and customer-friendly insurance processes.

Padua Thesis and Dissertation Archive

Data Science: An Introduction

Author: Said Alan
Torra Vicenç
Publication date: 20/09/2018

This chapter gives a general introduction to data science as a concept and to the topics covered in this book. First, we present a rough definition of data science, and point out how it relates to the areas of statistics, machine learning and big data technologies. Then, we review some of the most relevant tools that can be used in data science ranging from optimization to software. We also discuss the relevance of building models from data. The chapter ends with a detailed review of the structure of the book.

Crossref

Swepub

A summary of the Data Science Session at INFORMATIK 2020

Author: Böhm Klemens
König-Ries Birgitta
Publication date: 01/01/2021

In this short article, we briefly summarize the Data Science session at INFORMATIK 2020. With three invited talks, the session focused on data-science challenges beyond the development of new machine learning models.

Digital Library of Gesellschaft für Informatik e.V.

Template (Jupyter Notebook) published in the Environmental Data Science book - snapshot

Author: Environmental Data Science Book Community
Publication date: 31/10/2022

The research object refers to the Template notebook published in the Environmental Data Science book. Research Object in rohub2020: https://w3id.org/ro-id/92654099-ca41-4bc3-8450-0b5b267861a

ZENODO

Security of data science and data science for security

Author: Schweizer Remo
Tellenbach Bernhard
Rennhard Marc
Publication date: 01/01/2019

In this chapter, we present a brief overview of important topics regarding the connection of data science and security. In the first part, we focus on the security of data science and discuss a selection of security aspects that data scientists should consider to make their services and products more secure. In the second part, about data science for security, we switch sides and present some applications where data science plays a critical role in pushing the state of the art in securing information systems. This includes a detailed look at the potential and challenges of applying machine learning to the problem of detecting obfuscated JavaScript.

Crossref

ZHAW digitalcollection (Zurich Univ. of Applied Sciences)