Academic Data Science Alliance: Member Book 2022-2023
The Academic Data Science Alliance is a community network for data science leaders, practitioners, and educators who take responsibility for a just, equitable future where data science approaches are thoughtfully applied in all domains for the benefit of all.
The 2022-2023 ADSA Member Book highlights the institutions that have provided funds for ADSA through membership dues, support ADSA as a leading organization for academic data science, and agree with the guiding principles in ADSA's Mission, Vision, Values. ADSA member institutions represent a range of models, maturity levels, and target audiences at academic institutions in the US and beyond.
Diversifying the genomic data science research community
Over the past 20 years, the explosion of genomic data collection and the cloud computing revolution have made computational and data science research accessible to anyone with a web browser and an internet connection. However, students at institutions with limited resources have received relatively little exposure to curricula or professional development opportunities that lead to careers in genomic data science. To broaden participation in genomics research, the scientific community needs to support these programs in local education and research at underserved institutions (UIs). These include community colleges, historically Black colleges and universities, Hispanic-serving institutions, and tribal colleges and universities that support ethnically, racially, and socioeconomically underrepresented students in the United States. We have formed the Genomic Data Science Community Network to support students, faculty, and their networks to identify opportunities and broaden access to genomic data science. These opportunities include expanding access to infrastructure and data, providing UI faculty development opportunities, strengthening collaborations among faculty, recognizing UI teaching and research excellence, fostering student awareness, developing modular and open-source resources, expanding course-based undergraduate research experiences (CUREs), building curriculum, supporting student professional development and research, and removing financial barriers through funding programs and collaborator support
Data science and the policy completion problem
The link between policy analysis and data science is more delicate than it may appear. A new policy, by definition, will change the underlying data generating model, rendering classification or supervised learning inapplicable. Perhaps eliciting causal relations from observational data is the correct framework for estimating policy impact. However, there are substantial gaps between the theory, practice and feasibility of causal models. In this paper we argue that transduction, a form of inference where we reason from specific training instances to specific test instances, may provide an appropriate framework for evidence-based policy analysis. In particular, we will demonstrate that the matrix completion problem, introduced in the data science community for making predictions in recommendation systems, can be a powerful tool for both predicting and evaluating the impact of new policy changes.
Improving Interoperability and Digitalization of the European Union Healthcare System: A Data Science Perspective on the ATHINA Platform and Threat Response.
This thesis represents the culmination of an end-of-studies project undertaken to obtain a Master's degree in Data Science from the prestigious University of Padova. Conducted as an internship in collaboration with Intellera Consulting and the University of Padova, this research delves into the imperative task of digitalizing the healthcare system. Specifically, it explores the efforts made by the European Union to establish an optimal framework for interoperability of healthcare information, aiming to enhance efficiency, improve patient care, promote interoperability, empower patients, and foster innovation. My involvement in this project encompassed its development, implementation, and assessment from a technical perspective.
The global health crisis in recent times has underscored the urgent need for a more efficient and coordinated approach to address such threats. To enhance our response capabilities, I will elucidate my contributions to the development of the Advanced Technology for Health INtelligence and Action IT System. This cutting-edge system leverages advanced data science techniques to support anticipatory threat assessment, enable swift evaluation of health threats, facilitate the prioritization of Medical Countermeasures (MCM) through sophisticated data analysis and decision-making models, and more. Through this research, we aim to contribute to the ongoing efforts in creating a resilient and proactive healthcare system capable of effectively addressing emerging challenges.
Kaggle: Nigerians in Data Science
Analytical Report Publication on the 2021 Kaggle Machine Learning & Data Science Survey.
Documentation
Project Source
Competition: Kaggle Survey 2021
Publisher: Kaggle
License: Apache 2.0
NOTE: Cite the author when you use any part of this report.
The Generali Case: Deploying Data Science for Risk Selection in Life Insurance Underwriting
This thesis investigates how Data Science can enhance the underwriting process in life insurance, through a case study on Dr. Mouse, an AI-based virtual underwriter developed by Generali Italia. Underwriting plays a central role in assessing the risk profile of applicants, particularly for protection products such as Term Life and Long-Term Care. While most applications can be processed through standard business rules, more complex or ambiguous cases, often involving unstructured medical documentation or free-text health disclosures, still require manual review.
The research was conducted during my internship at Generali Italia, where I worked at the intersection of business and data functions. Specifically, I collaborated with both the Chief Life Office, including the underwriting and policy issuance teams, and the Data Office, within the Advanced Analytics Models team—comprising data scientists and engineers—where I now serve as a Data Scientist. This setting provided a unique perspective on both the operational constraints and the technical opportunities in automating risk selection.
Dr. Mouse tackles the challenge by automating the extraction and interpretation of health-related information from scanned medical reports, blood tests, and free-text answers. Its architecture combines OCR, BERT-based document classification, ML- and rule-based data extraction, and a risk prediction model built with XGBoost and SHAP explainability. Fully integrated into Generali’s digital sales platform, the system supports underwriters in making more consistent, transparent, and timely decisions.
The thesis describes the end-to-end pipeline, evaluates model performance, and discusses challenges such as document heterogeneity and limited training data. It also explores ongoing developments involving Generative AI, which are showing promising results in handling complex documents like oncological and cardiological reports—historically difficult for traditional approaches.
Ultimately, this work illustrates how intelligent automation can extend underwriting capabilities, reduce operational burden, and pave the way for more scalable and customer-friendly insurance processes.
Data Science: An Introduction
This chapter gives a general introduction to data science as a concept and to the topics covered in this book. First, we present a rough definition of data science, and point out how it relates to the areas of statistics, machine learning and big data technologies. Then, we review some of the most relevant tools that can be used in data science ranging from optimization to software. We also discuss the relevance of building models from data. The chapter ends with a detailed review of the structure of the book.
A summary of the Data Science Session at INFORMATIK 2020
In this short article, we briefly summarize the Data Science session at INFORMATIK 2020. With three invited talks, the session focused on data-science challenges beyond the development of new machine learning models
Template (Jupyter Notebook) published in the Environmental Data Science book - snapshot
The research object refers to the Template notebook published in the Environmental Data Science book. Research Object in rohub2020: https://w3id.org/ro-id/92654099-ca41-4bc3-8450-0b5b267861a
Security of data science and data science for security
In this chapter, we present a brief overview of important topics regarding the connection of data science and security. In the first part, we focus on the security of data science and discuss a selection of security aspects that data scientists should consider to make their services and products more secure. In the second part, about data science for security, we switch sides and present some applications where data science plays a critical role in pushing the state of the art in securing information systems. This includes a detailed look at the potential and challenges of applying machine learning to the problem of detecting obfuscated JavaScript.
