IT University of Copenhagen

The IT University of Copenhagen's Repository

9607 research outputs found

data2lang2vec: Data Driven Typological Features Completion

Author: Amirzadeh Hamidreza
Jafari Sadegh
Harju Anika
van der Goot Rob
Publication venue: Association for Computational Linguistics
Publication date: 01/01/2025

Language typology databases enhance multilingual Natural Language Processing (NLP) by improving model adaptability to diverse linguistic structures. The widely used lang2vec toolkit integrates several such databases, but its coverage remains limited at 28.9%. Previous work on automatically increasing coverage predicts missing values based on features from other languages or focuses on single features; we propose to use textual data for better-informed feature prediction. To this end, we introduce a multilingual Part-of-Speech (POS) tagger, achieving over 70% accuracy across 1,749 languages, and experiment with external statistical features and a variety of machine learning algorithms. We also introduce a more realistic evaluation setup, focusing on typology features that are likely to be missing, and show that our approach outperforms previous work in both setups.

Toward more realistic career path prediction: evaluation and methods

Author: Senger Elena
Campbell Yuri
van der Goot Rob
Plank Barbara
Publication date: 25/08/2025

Predicting career trajectories is a complex yet impactful task, offering significant benefits for personalized career counseling, recruitment optimization, and workforce planning. However, effective career path prediction (CPP) modeling faces challenges including highly variable career trajectories, free-text resume data, and limited publicly available benchmark datasets. In this study, we present a comprehensive comparative evaluation of CPP models, namely linear projection, multilayer perceptron (MLP), LSTM, and large language models (LLMs), across multiple input settings and two recently introduced public datasets. Our contributions are threefold: (1) we propose novel model variants, including an MLP extension and a standardized LLM approach, (2) we systematically evaluate model performance across input types (titles only vs. title+description, standardized vs. free-text), and (3) we investigate the role of synthetic data and fine-tuning strategies in addressing data scarcity and improving model generalization. Additionally, we provide a detailed qualitative analysis of prediction behaviors across industries, career lengths, and transitions. Our findings establish new baselines, reveal the trade-offs of different modeling strategies, and offer practical insights for deploying CPP systems in real-world settings.

"I tell him everything that I do": An investigation of privacy and safety implications of AI companion usage

Author: Henriksen Anine
Asadi Raha
Gerdes Anne
Kulyk Oksana
Mayer Peter
Publication date: 2025

Prediction

Author: Galis Vasilis
Oppen Ingebrigtsen Gundhus Helen
Kilis Emil
Publication venue: De Gruyter
Publication date: 03/03/2025

Prediction has a long history in the social sciences, and advances in computing and statistics have transformed our ability to predict in a wide range of domains. However, concerns have been raised about an indiscriminate application of a predictive logic, and crime is an area where this is quite pronounced. Indeed, while the police, correctional service, and criminal courts have become increasingly reliant on digital systems of prediction, critics have drawn our attention to numerous issues and complexities attendant to this process. This chapter looks at prediction in the criminological realm and provides an overview of key arguments concerning the way data are generated, organized, and used as input for predictive tools and technologies, and how the results are interpreted in the context of criminal justice. By doing so, it aims to show that the discussions surrounding prediction highlight how digital tools are transforming the nature of knowledge and expertise within the criminal justice system.

Path to GPU-Initiated I/O for Data-Intensive Systems

Author: Torp Karl B.
Lund Simon A. F.
Tözün Pinar
Publication venue: Association for Computing Machinery
Publication date: 22/06/2025

The process of training and serving deep learning (DL) models is computationally expensive, mandating the use of powerful and expensive accelerators such as GPUs and TPUs. Furthermore, the prevalence of GPUs in data centers today motivates developing database systems that can leverage the available GPU resources. Both the latency of DL tasks and database queries and the high utilization of these accelerators depend on how efficiently we can move the data to the accelerators. Given today’s dataset sizes, fitting everything in GPU or even CPU memory is not always feasible or can be expensive. The I/O path while fetching the data from disks, however, still predominantly relies on CPUs. In this work, we take a step toward understanding today’s landscape for optimizing the I/O path for reading data to GPUs from disks, with a focus on SSDs. First, we review the prominent technologies that target GPU-centric storage accesses. Then, we dive deeper into BaM, as the state-of-the-art method for GPU-centric storage, and evaluate its performance in comparison to the state-of-the-art CPU-centric storage interface SPDK. Our results demonstrate that while BaM is able to match the performance of SPDK without involving CPUs on the I/O path, this comes at the cost of a very high GPU use. Finally, we highlight future research directions to enable an I/O path that is both efficient and easy to adopt for data-intensive systems that use GPUs.

Quantifiers for Differentiable Logics in Rocq (Extended Abstract)

Author: Bruni Alessandro
Publication date: 01/01/2025

The interpretation of logical expressions into loss functions has given rise to so-called differentiable logics. They function as a bridge between formal logic and machine learning, offering a novel approach for property-driven training. The added expressiveness of these logics comes at the price of a more intricate semantics for first-order quantifiers. To ease their integration into machine-learning backends, we explore how to formalize semantics for first-order differentiable logics using the Mathematical Components library in the Rocq proof assistant. We seek to give rigorous semantics for quantifiers, verify their properties with respect to other logical connectives, and prove the soundness and completeness of the resulting logics.

Evaluating Quality of Gaming Narratives Co-created with AI

Author: Valdivia Martinez Arturo
Burelli Paolo
Publication venue: IEEE
Publication date: 01/01/2025

This paper proposes a structured methodology to evaluate AI-generated game narratives, leveraging the Delphi study structure with a panel of narrative design experts. Our approach synthesizes story quality dimensions from literature and expert insights, mapping them into the Kano model framework to understand their impact on player satisfaction. The results can help game developers prioritize quality aspects when co-creating game narratives with generative AI.

Mask of Truth: Model Sensitivity to Unexpected Regions of Medical Images

Author: Sourget Théo
Hestbek-Møller Michelle
Jiménez Sánchez Amelia
Xu Jack Junchi
Cheplygina Veronika
Publication date: 20/05/2025

The development of larger models for medical image analysis has led to increased performance. However, it has also affected our ability to explain and validate model decisions. Models can use non-relevant parts of images, also called spurious correlations or shortcuts, to obtain high performance on benchmark datasets but fail in real-world scenarios. In this work, we challenge the capacity of convolutional neural networks (CNNs) to classify chest X-rays and eye fundus images while masking out clinically relevant parts of the image. We show that all models trained on the PadChest dataset, irrespective of the masking strategy, are able to obtain an area under the curve (AUC) above random. Moreover, the models trained on full images obtain good performance on images without the region of interest (ROI), even superior to that obtained on images containing only the ROI. We also reveal a possible spurious correlation in the Chákṣu dataset, although performance there is more aligned with what is expected of an unbiased model. We go beyond the performance analysis by using the explainability method SHAP and analysing embeddings. We asked a radiology resident to interpret chest X-rays under different masking strategies to complement our findings with clinical knowledge.

Functional Reactive GUI Programming with Modal Types.

Author: Disch Jean-Claude
Heegaard Asger
Bahr Patrick
Publication venue: Springer
Publication date: 01/10/2025

Functional reactive programming (FRP) is a programming paradigm for implementing reactive systems, i.e. programs that continuously interact with their environments. While FRP allows for a functional, high-level programming style, FRP programs are prone to undesirable operational behaviours such as space leaks. To ensure favourable operational properties of FRP programs, modal type systems have been introduced, which – among other things – make it impossible to write FRP programs with implicit space leaks. In a recent development, several modal FRP languages have been introduced that are able to accommodate asynchronous events and behaviours – motivated by the goal to use such languages for GUI programming. This paper explores the suitability of one such asynchronous modal FRP language – called Async Rattus – for GUI programming in practice. To this end, we have implemented a mild extension of the Async Rattus language and used it to implement a small GUI framework. We demonstrate the language and its GUI framework by a number of case studies.

4,472 full texts; 9,607 metadata records; updated in the last 30 days.
