Empowering Health Care Actors to Contribute to the Implementation of Health Data Integration Platforms: Retrospective of the medEmotion Project
Health data integration platforms are vital to drive collaborative, interdisciplinary medical research projects. Developing such a platform requires input from different stakeholders. Managing these stakeholders and steering platform development is challenging, and misaligning the platform to the partners' strategies might lead to a low acceptance of the final platform. We present the medEmotion project, a collaborative effort among 7 partners from health care, academia, and industry to develop a health data integration platform for the region of Limburg in Belgium. We focus on the development process and stakeholder engagement, aiming to give practical advice for similar future efforts based on our reflections on medEmotion. We introduce Personas to paraphrase different roles that stakeholders take and Demonstrators that summarize personas' requirements with respect to the platform. Both the personas and the demonstrators serve 2 purposes. First, they are used to define technical requirements for the medEmotion platform. Second, they represent a communication vehicle that simplifies discussions among all stakeholders. Based on the personas and demonstrators, we present the medEmotion platform based on components from the Microsoft Azure cloud. The demonstrators are based on real-world use cases and showcase the utility of the platform. We reflect on the development process of medEmotion and distill takeaway messages that will be helpful for future projects. Investing in community building, stakeholder engagement, and education is vital to building an ecosystem for a health data integration platform. Instead of academic-led projects, the health care providers themselves ideally drive collaboration among health care providers. The providers are best positioned to address hospital-specific requirements, while academics take a neutral mediator role. This also includes the ideation phase, where it is vital to ensure the involvement of all stakeholders. 
Finally, balancing innovation with implementation is key to developing an innovative yet sustainable health data integration platform. We thank our 3 partner hospitals Jessa Ziekenhuis, Noorderhart, and Ziekenhuis Oost-Limburg for their contributions toward the medEmotion project. Further, we thank the Limburg Clinical Research Center for sharing their expertise in clinical research projects. The software development of the medEmotion platform was funded by LRM, with the support of the European Regional Development Fund (EFRO-1308). This research received funding from the Flemish Government under the “Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen” program.
Schema Matching with Large Language Models: an Experimental Study
Large Language Models (LLMs) have shown useful applications in a variety of tasks, including data wrangling. In this paper, we investigate the use of an off-the-shelf LLM for schema matching. Our objective is to identify semantic correspondences between elements of two relational schemas using only names and descriptions. Using a newly created benchmark from the health domain, we propose different so-called task scopes. These are methods for prompting the LLM to do schema matching, which vary in the amount of context information contained in the prompt. Using these task scopes, we compare LLM-based schema matching against a string similarity baseline, investigating matching quality, verification effort, decisiveness, and complementarity of the approaches. We find that matching quality suffers from a lack of context information, but also from providing too much context information. In general, using newer LLM versions increases decisiveness. We identify task scopes that have acceptable verification effort and succeed in identifying a significant number of true semantic matches. Our study shows that LLMs have potential in bootstrapping the schema matching process and are able to assist data engineers in speeding up this task solely based on schema element names and descriptions without the need for data instances. S. Vansummeren was supported by the Bijzonder Onderzoeksfonds (BOF) of Hasselt University under Grant No. BOF20ZAP02. This research received funding from the Flemish Government under the “Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen” programme. This work was supported by Research Foundation – Flanders (FWO) for ELIXIR Belgium (I002819N). The resources and services used in this work were provided by the VSC (Flemish Supercomputer Center), funded by the Research Foundation – Flanders (FWO) and the Flemish Government.
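To make the idea of a string similarity baseline concrete, the following is a minimal sketch of how schema elements could be matched using only their names and descriptions. It is not the baseline used in the study: the similarity function (Python's `difflib.SequenceMatcher`), the threshold, the function names, and the example schemas are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (assumed measure, not the paper's)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_schemas(source: dict, target: dict, threshold: float = 0.6) -> list:
    """Propose candidate matches between two schemas.

    `source` and `target` map element names to free-text descriptions.
    A pair is proposed when either its names or its descriptions are
    similar enough; the 0.6 threshold is an arbitrary illustrative value.
    """
    candidates = []
    for s_name, s_desc in source.items():
        for t_name, t_desc in target.items():
            score = max(similarity(s_name, t_name), similarity(s_desc, t_desc))
            if score >= threshold:
                candidates.append((s_name, t_name, round(score, 3)))
    # Highest-scoring candidates first, so a reviewer can verify them in order.
    return sorted(candidates, key=lambda c: c[2], reverse=True)


if __name__ == "__main__":
    # Hypothetical health-domain schema fragments.
    source = {
        "patient_id": "Unique identifier of the patient",
        "birth_date": "Date of birth of the patient",
    }
    target = {
        "person_id": "Unique identifier of the person",
        "date_of_birth": "Birth date of the person",
    }
    for candidate in match_schemas(source, target):
        print(candidate)
```

A baseline of this kind only sees surface overlap between strings, which is one reason a comparison against prompting an LLM with the same names and descriptions is informative.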
Applying FAIRness: Redesigning a Biomedical Informatics Research Data Management Pipeline
Teachers at Public Schools 2015/2016: Results Report
As part of an empirical study commissioned by the Gewerkschaft Erziehung und Wissenschaften Niedersachsen, 2,869 teachers recorded their working hours for one year, promptly and to the minute, in a time-tracking tool developed specifically for school teaching activities. In this way, the working time of teachers in Lower Saxony was, for the first time, recorded systematically on a large scale rather than derived from estimation methods. Working time was recorded by its actual extent, its distribution over time, its time and activity structure, and by school type, and was compared with normative requirements in a target-versus-actual comparison. The survey period ran from Easter 2015 to Easter 2016. Study design: a.) The study period covers a complete pedagogical year; all relevant work phases that influence teachers' working time were taken into account. b.) The study covers all school types in Lower Saxony under public sponsorship. For the three school types Grundschule, Gesamtschule, and Gymnasium, representative sample results are available and can be generalized to the Lower Saxony population. c.) The basis is a standards-compliant and practicable method for recording and systematizing teachers' working time, which was implemented statewide for the first time in this study. The differentiated analysis of six school types (three representative, three non-representative) is supplemented by cross-school-type analyses by activity, age, employment level (part-time/full-time), gender, region (districts), and catchment area of the student body (urban/rural), which also examine occupational health questions (excessively long working hours, overtime, working despite illness, as well as opportunities for recovery and tendencies toward blurred boundaries between work and private life).
Measuring Approximate Functional Dependencies: A Comparative Study
Approximate functional dependencies (AFDs) are functional dependencies (FDs) that “almost” hold in a relation. While various measures have been proposed to quantify the level to which an FD holds approximately, they are difficult to compare and it is unclear which measure is preferable when one needs to discover FDs in real-world data, i.e., data that only approximately satisfies the FD. In response, this paper formally and qualitatively compares AFD measures. We obtain a formal comparison through a novel presentation of measures in terms of Shannon and logical entropy. Qualitatively, we perform a sensitivity analysis w.r.t. structural properties of input relations and quantitatively study the effectiveness of AFD measures for ranking AFDs on real world data. Based on this analysis, we give clear recommendations for the AFD measures to use in practice. We thank Dan Suciu for helpful discussions. S. Vansummeren was supported by the Bijzonder Onderzoeksfonds (BOF) of Hasselt University under Grant No. BOF20ZAP02. This research received funding from the Flemish Government under the “Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen” programme. This work was supported by Research Foundation – Flanders (FWO) for ELIXIR Belgium (I002819N). The resources and services used in this work were provided by the VSC (Flemish Supercomputer Center), funded by the Research Foundation – Flanders (FWO) and the Flemish Government.
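As an illustration of what it means for an FD to “almost” hold, the sketch below computes one well-known measure from the literature, commonly called g3: the smallest fraction of tuples that must be removed so that the FD holds exactly. The abstract does not name the measures it compares, so the choice of g3, the function name, and the example relation are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict


def g3(rows: list, lhs: list, rhs: list) -> float:
    """Minimum fraction of rows to delete so that lhs -> rhs holds exactly.

    `rows` is a list of dicts (one per tuple); `lhs` and `rhs` are lists of
    attribute names. A value of 0.0 means the FD holds on the relation.
    """
    if not rows:
        return 0.0
    # For each lhs value, count how often each rhs value occurs with it.
    groups = defaultdict(Counter)
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        groups[x][y] += 1
    # Within each group we can keep only the most frequent rhs value.
    kept = sum(max(counts.values()) for counts in groups.values())
    return 1 - kept / len(rows)


if __name__ == "__main__":
    # Hypothetical relation in which zip -> city is violated by one tuple.
    relation = [
        {"zip": "3500", "city": "Hasselt"},
        {"zip": "3500", "city": "Hasselt"},
        {"zip": "3500", "city": "Genk"},
        {"zip": "3600", "city": "Genk"},
    ]
    print(g3(relation, ["zip"], ["city"]))  # 0.25
```

Here one of the four tuples has to be dropped for `zip -> city` to hold, giving a score of 0.25; ranking candidate FDs by such a score is one way a measure can be used for discovery on data that is not perfectly clean.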
Measuring approximate functional dependencies: a comparative study
Approximate functional dependencies (abbreviated: AFDs) are functional dependencies (FDs) that "almost" hold in a relation. While various measures have been proposed to quantify the level to which an FD holds approximately, they are difficult to compare and it is unclear which measure is preferable when one needs to discover FDs in real-world data, i.e., data that only approximately satisfies the FD. In response, this paper formally and qualitatively compares AFD measures. We obtain a formal comparison through a novel presentation of measures in terms of Shannon and logical entropy. Qualitatively, we perform a sensitivity analysis w.r.t. structural properties of input relations. Quantitatively, we study the effectiveness of AFD measures for ranking linear AFDs on real world data. Based on this analysis, we give clear recommendations for the AFD measures to use in practice. We thank Dan Suciu for helpful discussions. S. Vansummeren was supported by the Bijzonder Onderzoeksfonds (BOF) of Hasselt University under Grant No. BOF20ZAP02. This research received funding from the Flemish Government under the “Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen” programme. This work was supported by Research Foundation – Flanders (FWO) for ELIXIR Belgium (I002819N). The resources and services used in this work were provided by the VSC (Flemish Supercomputer Center), funded by the Research Foundation – Flanders (FWO) and the Flemish Government.
