Advancing hyperparameter optimization
Hyperparameter optimization (HPO) is a fundamental aspect of machine learning (ML), directly influencing model performance and adaptability.
As a computationally expensive black-box optimization problem, HPO requires efficient algorithms to identify optimal hyperparameter configurations.
This thesis advances the field of HPO along three key dimensions: foundational insights, HPO in the presence of more than one objective, and algorithmic innovations through benchmarking.
First, we revisit resampling strategies for performance estimation, demonstrating both theoretically and empirically that reshuffling resampling splits across hyperparameter configurations enhances generalization.
Additionally, we conduct an in-depth analysis of HPO validation landscapes, revealing characteristics such as low multimodality and broad plateaus that differentiate them from conventional black-box optimization benchmarks.
Second, we introduce novel algorithms for HPO in multi-objective and quality diversity settings.
We propose a new approach for simultaneously optimizing model performance and interpretability, quantifying interpretability through feature sparsity, sparsity of interaction effects, and sparsity of non-monotone features.
Furthermore, we bridge the field of quality diversity optimization with HPO, which allows us to discover diverse yet well-performing neural architectures that satisfy varying hardware constraints within a single optimization run.
Third, we use benchmarking to drive algorithmic innovation and insights in HPO.
We present YAHPO Gym, a scalable benchmarking suite supporting single-objective, multi-fidelity, and multi-objective HPO via surrogate benchmarks.
Using this framework, we define new quality diversity problems inspired by HPO and develop a novel multi-fidelity optimization algorithm guided by programming by optimization principles. Additionally, we ablate a state-of-the-art neural architecture search algorithm to assess the impact of individual components and introduce a systematic approach for constructing synthetic black-box functions that admit specific optimization landscape properties.
By deepening our general understanding of HPO, proposing novel multi-objective and quality diversity optimization strategies, and developing scalable benchmarking tools, this thesis enhances the efficiency and effectiveness of HPO across diverse ML applications.
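The reshuffling idea from the abstract above can be illustrated with a minimal sketch. The abstract does not spell out the thesis's actual procedure, so everything here is an assumption made for illustration: the toy data, the one-dimensional ridge model, the holdout protocol, and all function names (`make_data`, `fit_ridge_slope`, `holdout_mse`, `random_search`) are hypothetical. The only point shown is the difference between reusing one train/validation split for every hyperparameter configuration and drawing a fresh split per configuration.

```python
import random

def make_data(n, rng):
    """Toy 1-D regression data: y = 2x + Gaussian noise (illustrative only)."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0.0, 0.5)) for x in xs]

def fit_ridge_slope(train, lam):
    """Closed-form 1-D ridge regression without intercept: w = Sxy / (Sxx + lam)."""
    sxy = sum(x * y for x, y in train)
    sxx = sum(x * x for x, _ in train)
    return sxy / (sxx + lam)

def holdout_mse(data, lam, split_rng, train_frac=0.7):
    """Validation MSE of the ridge model on a holdout split drawn with split_rng."""
    idx = list(range(len(data)))
    split_rng.shuffle(idx)
    cut = int(train_frac * len(data))
    train = [data[i] for i in idx[:cut]]
    valid = [data[i] for i in idx[cut:]]
    w = fit_ridge_slope(train, lam)
    return sum((y - w * x) ** 2 for x, y in valid) / len(valid)

def random_search(data, n_configs, reshuffle, seed=0):
    """Random search over the ridge penalty; returns (best score, best lambda)."""
    config_rng = random.Random(seed)
    results = []
    for i in range(n_configs):
        lam = 10 ** config_rng.uniform(-3, 2)  # log-uniform penalty
        # Fixed mode: the same split seed for every configuration.
        # Reshuffled mode: a new split seed for each configuration.
        split_seed = seed + 1 + i if reshuffle else seed + 1
        score = holdout_mse(data, lam, random.Random(split_seed))
        results.append((score, lam))
    return min(results)

data = make_data(60, random.Random(42))
best_fixed = random_search(data, 20, reshuffle=False)
best_reshuffled = random_search(data, 20, reshuffle=True)
```

Both calls evaluate the same 20 configurations, so any difference in the selected penalty comes only from how the validation splits are drawn. Comparing the two selections on fresh test data would be the natural next step; this sketch makes no claim about which one wins.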
The Irish in the Caribbean as an online post historical phenomenon
This dissertation investigates how the narrative of "Irish slavery" in the Caribbean was reshaped and instrumentalized through digital media, particularly during its period of online visibility from 2015 to 2020. It argues that both the popular meme portraying the Irish as slaves and the counter-reactions to it perpetuate historical distortions by valuing narratives for their political or emotional utility rather than factual accuracy. The study introduces the concept of Ophelialogy, a heuristic tool that analyzes how historical claims are used and what effects they produce. Drawing on a wide array of sources including memes, fact-checkers, journalists, and academics, the research shows that viral narratives often bypass epistemological standards in favor of simplified, polarizing claims. The work concludes that this post-historical shift in discourse has serious implications for how history is produced, circulated, and consumed in the digital age.
An external validation study of the diagnostic accuracy of an artificial-intelligence-based model for detecting caries in photographs
Identification of thunderstorm occurrence in convection-permitting ensemble forecasts using deep neural networks
Thunderstorms have potentially hazardous impacts on society and the economy due to accompanying phenomena, such as lightning, strong winds, and intense precipitation, creating a demand for accurate and timely thunderstorm forecasts. Thunderstorm forecasts several hours in advance are based on simulations of the future atmosphere via numerical weather prediction (NWP). However, as none of the NWP state variables, such as temperature, pressure, or specific humidity, directly indicates thunderstorm occurrence, surrogate variables like convective available potential energy or synthetic radar reflectivity are used as proxies instead.
Surrogate variables of thunderstorm occurrence are typically derived from NWP state variables through the consideration of physical principles and empirical knowledge. In this thesis, however, we present a machine learning (ML) model based on deep learning which bypasses the use of such surrogate variables; instead, the model directly processes the vertical variation of the NWP state variables with height to infer the corresponding probability of thunderstorm occurrence. In addition, this thesis makes use of a convection-permitting ensemble NWP model, i.e., an NWP model which (1) allows for resolving atmospheric convection without parameterizations, and (2) generates multiple possible forecasts consistent with forecast uncertainty. While these two properties have individually shown promise for improving thunderstorm forecasts, their combined potential for this task has so far been less explored. Specifically, we train our model on forecasts of ICON-D2-EPS, a limited-area model for Central Europe run operationally by the German Meteorological Service (DWD), with observations from the lightning detection network LINET serving as the ground truth. With regard to model architecture, we employ considerations based on physics and symmetries to keep model size and inference times computationally efficient. For instance, a sparse layer encourages interactions at similar height levels, whereas a shuffling mechanism forces the model to learn pressure coordinates instead of non-physical patterns tied to the vertical NWP grid.
Evaluating our model for lead times up to 11 hours, we find that it outperforms a baseline model relying on traditional thunderstorm surrogate variables, which shows that deep learning methods can discover, on their own, skillful representations of thunderstorm occurrence in NWP data. A linear sensitivity analysis (saliency map) suggests that the patterns found in the data are to a considerable extent physically interpretable: our model has learned the climatological propagation direction of thunderstorms in the study region and relies on fine-grained structures, such as ice-particle content near the tropopause and cloud cover, as well as mesoscale structures related to atmospheric instability and moisture. As additional results, we quantitatively explain the skill gains resulting from our use of ensemble data. Finally, we demonstrate how neural network models like ours help keep thunderstorm occurrence predictable for longer lead times than models which do not rely on ML.
This thesis primarily contributes to improving the skill of thunderstorm forecasts by combining high-resolution NWP and ensemble systems with deep learning. Beyond this application, many concepts and methods derived here apply to general binary classification problems, especially when high class imbalance is involved. More generally, our results exemplify the usefulness of incorporating physical considerations and symmetry principles into ML architectures to achieve lightweight models.
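The linear sensitivity analysis (saliency map) mentioned in the abstract above can be sketched in a few lines. This is an illustration under stated assumptions, not the thesis's method: `toy_model` is a hypothetical stand-in for a trained network, with made-up weights, and `saliency` estimates the partial derivative of the predicted probability with respect to each input by central finite differences. A real implementation would more likely use automatic differentiation.

```python
import math

# Hypothetical weights and bias; they do not come from the thesis.
WEIGHTS = [1.5, -0.5, 0.0]
BIAS = 0.1

def toy_model(x):
    """Stand-in for a trained classifier: logistic model returning a probability."""
    z = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))

def saliency(model, x, eps=1e-5):
    """Central finite-difference estimate of d model / d x_i for each input i."""
    grads = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grads.append((model(up) - model(down)) / (2 * eps))
    return grads

x = [0.2, 0.4, 1.0]
sal = saliency(toy_model, x)
```

For this logistic stand-in the exact gradient is p(1 - p) times the weight vector, where p is the predicted probability. The third input has zero weight and therefore receives zero saliency.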