IT University of Copenhagen

The IT University of Copenhagen's Repository
    9607 research outputs found

    data2lang2vec: Data Driven Typological Features Completion

    No full text
    Language typology databases enhance multilingual Natural Language Processing (NLP) by improving model adaptability to diverse linguistic structures. The widely-used lang2vec toolkit integrates several such databases, but its coverage remains limited at 28.9%. Previous work on automatically increasing coverage predicts missing values based on features from other languages or focuses on single features; we propose to use textual data for better-informed feature prediction. To this end, we introduce a multi-lingual Part-of-Speech (POS) tagger, achieving over 70% accuracy across 1,749 languages, and experiment with external statistical features and a variety of machine learning algorithms. We also introduce a more realistic evaluation setup, focusing on typology features that are likely to be missing, and show that our approach outperforms previous work in both setups

    Toward more realistic career path prediction: evaluation and methods

    No full text
    Predicting career trajectories is a complex yet impactful task, offering significant benefits for personalized career counseling, recruitment optimization, and workforce planning. However, effective career path prediction (CPP) modeling faces challenges including highly variable career trajectories, free-text resume data, and limited publicly available benchmark datasets. In this study, we present a comprehensive comparative evaluation of CPP models—linear projection, multilayer perceptron (MLP), LSTM, and large language models (LLMs)—across multiple input settings and two recently introduced public datasets. Our contributions are threefold: (1) we propose novel model variants, including an MLP extension and a standardized LLM approach, (2) we systematically evaluate model performance across input types (titles only vs. title+description, standardized vs. free-text), and (3) we investigate the role of synthetic data and fine-tuning strategies in addressing data scarcity and improving model generalization. Additionally, we provide a detailed qualitative analysis of prediction behaviors across industries, career lengths, and transitions. Our findings establish new baselines, reveal the trade-offs of different modeling strategies, and offer practical insights for deploying CPP systems in real-world settings

    Prediction

    No full text
    Prediction has a long history in the social sciences, and advances in computing and statistics have transformed our ability to predict in a wide range of domains. However, concerns have been raised about an indiscriminate application of a predictive logic, and crime is an area where this is quite pronounced. Indeed, while the police, correctional service, and criminal courts have become increasingly reliant on digital systems of prediction, critics have drawn our attention to numerous issues and complexities attendant to this process. This chapter looks at prediction in the criminological realm and provides an overview of key arguments concerning the way data are generated, organized, and used as input for predictive tools and technologies, and how the results are interpreted in the context of criminal justice. By doing so, it aims to show that the discussions surrounding prediction highlight how digital tools are transforming the nature of knowledge and expertise within the criminal justice system

    Path to GPU-Initiated I/O for Data-Intensive Systems

    No full text
    The process of training and serving deep learning (DL) models is computationally expensive, mandating the use of powerful and expensive accelerators such as GPUs and TPUs. Furthermore, the prevalence of GPUs in data centers today motivates developing database systems that can leverage the available GPU resources. Both the latency of DL tasks and database queries and high utilization of these accelerators depend on how efficiently we can move the data to the accelerators. Given today’s dataset sizes, fitting everything in GPU or even CPU memory is not always feasible or can be expensive. The I/O path while fetching the data from disks, however, still dominantly relies on CPUs. In this work, we take a step toward understanding today’s landscape for optimizing the I/O path for reading data to GPUs from disks, with a focus on SSDs. First, we review the prominent technologies that target GPU-centric storage accesses. Then, we dive deeper into BaM, as the state-of-the-art method for GPU-centric storage, and evaluate its performance in comparison to the state-of-the-art CPU-centric storage interface SPDK. Our results demonstrate that while BaM is able to match the performance of SPDK without involving CPUs on the I/O path, this comes at the cost of a very high GPU use. Finally, we highlight future research directions to enable an I/O path that is both efficient and easy-to-adopt for data-intensive systems that use GPUs

    Quantifiers for Differentiable Logics in Rocq (Extended Abstract)

    No full text
    The interpretation of logical expressions into loss functions has given rise to so-called differentiable logics. They function as a bridge between formal logic and machine learning, offering a novel approach for property-driven training. The added expressiveness of these logics comes at the price of a more intricate semantics for first-order quantifiers. To ease their integration into machine-learning backends, we explore how to formalize semantics for first-order differentiable logics using the Mathematical Components library in the Rocq proof assistant. We seek to give rigorous semantics for quantifiers, verify their properties with respect to other logical connectives, as well as prove the soundness and completeness of the resulting logics

    Evaluating Quality of Gaming Narratives Co-created with AI

    No full text
    This paper proposes a structured methodology to evaluate AI-generated game narratives, leveraging the Delphi study structure with a panel of narrative design experts. Our approach synthesizes story quality dimensions from literature and expert insights, mapping them into the Kano model framework to understand their impact on player satisfaction. The results can inform game developers on prioritizing quality aspects when co-creating game narratives with generative AI

    Mask of Truth: Model Sensitivity to Unexpected Regions of Medical Images

    No full text
    The development of larger models for medical image analysis has led to increased performance. However, it also affected our ability to explain and validate model decisions. Models can use non-relevant parts of images, also called spurious correlations or shortcuts, to obtain high performance on benchmark datasets but fail in real-world scenarios. In this work, we challenge the capacity of convolutional neural networks (CNN) to classify chest X-rays and eye fundus images while masking out clinically relevant parts of the image. We show that all models trained on the PadChest dataset, irrespective of the masking strategy, are able to obtain an area under the curve (AUC) above random. Moreover, the models trained on full images obtain good performance on images without the region of interest (ROI), even superior to the one obtained on images only containing the ROI. We also reveal a possible spurious correlation in the Chákṣu dataset while the performances are more aligned with the expectation of an unbiased model. We go beyond the performance analysis with the usage of the explainability method SHAP and the analysis of embeddings. We asked a radiology resident to interpret chest X-rays under different masking to complement our findings with clinical knowledge

    Functional Reactive GUI Programming with Modal Types.

    Full text link
    Functional reactive programming (FRP) is a programming paradigm for implementing reactive systems, i.e. programs that continuously interact with their environments. While FRP allows for a functional, high-level programming style, FRP programs are prone to undesirable operational behaviours such as space leaks. To ensure favourable operational properties of FRP programs, modal type systems have been introduced, which – among other things – make it impossible to write FRP programs with implicit space leaks. In a recent development, several modal FRP languages have been introduced that are able to accommodate asynchronous events and behaviours – motivated by the goal to use such languages for GUI programming. This paper explores the suitability of one such asynchronous modal FRP language – called Async Rattus – for GUI programming in practice. To this end, we have implemented a mild extension of the Async Rattus language and used it to implement a small GUI framework. We demonstrate the language and its GUI framework by a number of case studies

    4,472 full texts, 9,607 metadata records. Updated in last 30 days.