26838 research outputs found
Sort by
A Better Multi-Objective GP-GOMEA - But do we Need it?
In Symbolic Regression (SR), achieving a proper balance between accuracy and interpretability remains a key challenge. The Genetic Programming variant of the Gene-pool Optimal Mixing Evolutionary Algorithm (GP-GOMEA) is of particular interest as it achieves state-of-the-art performance using a template that limits the size of expressions. A recently introduced expansion, modular GP-GOMEA, is capable of decomposing expressions using multiple subexpressions, further increasing chances of interpretability. However, modular GP-GOMEA may create larger expressions, increasing the need to balance size and accuracy. A multi-objective variant of GP-GOMEA exists, which can be used, for instance, to optimize for size and accuracy simultaneously, discovering their trade-off. However, even with enhancements that we propose in this paper to improve the performance of multi-objective modular GP-GOMEA, when optimizing for size and accuracy, the single-objective version in which a multi-objective archive is used only for logging, still consistently finds a better average hypervolume. We consequently analyze when a single-objective approach should be preferred. Additionally, we explore an objective that stimulates re-use in multi-objective modular GP-GOMEA
The Pitfalls and Potentials of Adding Gene-invariance to Optimal Mixing
Optimal Mixing (OM) is a variation operator that integrates local search with genetic recombination. EAs with OM are capable of state-of-the-art optimization in discrete spaces, offering significant advantages over classic recombination-based EAs. This success is partly due to high selection pressure that drives rapid convergence. However, this can also negatively impact population diversity, complicating the solving of hierarchical problems, which feature multiple layers of complexity. While there have been attempts to address this issue, these solutions are often complicated and prone to bias. To overcome this, we propose a solution inspired by the Gene Invariant Genetic Algorithm (GIGA), which preserves gene frequencies in the population throughout the process. This technique is tailored to and integrated with the Gene-pool Optimal Mixing Evolutionary Algorithm (GOMEA), resulting in GI-GOMEA. The simple, yet elegant changes are found to have striking potential: GI-GOMEA outperforms GOMEA on a range of well-known problems, even when these problems are adjusted for pitfalls - biases in much-used benchmark problems that can be easily exploited by maintaining gene invariance. Perhaps even more notably, GI-GOMEA is also found to be effective at solving hierarchical problems, including newly introduced asymmetric hierarchical trap functions
U-Index: A universal indexing framework for matching long patterns
Motivation. Text indexing is a fundamental and well-studied problem. Classic solutions to this problem either replace the original text with a compressed representation, e.g., the FM-index and its variants, or keep it uncompressed but attach some redundancy - an index - to accelerate matching, e.g., the suffix array. The former solutions thus retain excellent compressed space, but are practically slow to construct and query. The latter approaches, instead, sacrifice space efficiency but are typically faster; for example, the suffix array takes much more space than the text itself for commonly used alphabets, like ASCII or DNA, but it is very fast to construct and query. Methods. In this paper, we show that efficient text indexing can be achieved using just a small extra space on top of the original text, provided that the query patterns are sufficiently long. More specifically, we develop a new indexing paradigm in which a sketch of a query pattern is first matched against a sketch of the text. Once candidate matches are retrieved, they are verified using the original text. This paradigm is thus universal in the sense that it allows us to use any solution to index the sketched text, like a suffix array, FM-index, or r-index. Results. We explore both the theory and the practice of this universal framework. With an extensive experimental analysis, we show that, surprisingly, universal indexes can be constructed much faster than their unsketched counterparts and take a fraction of the space, as a direct consequence of (i) having a lower bound on the length of patterns and (ii) working in sketch space. Furthermore, these data structures have the potential of retaining or even improving query time, because matching against the sketched text is faster and verifying candidates can be theoretically done in constant time per occurrence (or, in practice, by short and cache-friendly scans of the text). Finally, we discuss some important applications of this novel indexing paradigm to computational biology. We hypothesize that such indexes will be particularly effective when the queries are sufficiently long, and so we demonstrate applications in long-read mapping
On String and Graph Sanitization
Data sanitization is a process that conceals sensitive patterns from a given dataset. A secondary goal is to not severely harm the utility of the underlying data along this process. We survey some recent advancements on two related data sanitization topics: string and graph sanitization. In particular, we highlight the important contributions of our friend Prof. Roberto Grossi along this journey
Lattice reduction via dense sublattices: A cryptanalytic No-Go
Most concrete analyses of lattice reduction focus on the BKZ algorithm or its variants relying on Shortest Vector Problem (SVP) oracles. However, a variant by Li and Nguyen (Cambridge U. Press 2014) exploits more powerful oracles, namely for the Densest rank-k Sublattice Problem (DSPk) for k ≥ 2. We first observe that, for random lattices, DSP2 –and possibly even DSP3– seems heuristically not much more expensive than solving SVP with the current best algorithm. We indeed argue that a densest sublattice can be found among pairs or triples of vectors produced by lattice sieving, at a negligible additional cost. This gives hope that this approach could be competitive. We therefore proceed to a heuristic and average-case analysis of the slope of DSPk-BKZ output, inspired by a theorem of Kim (Journal of Number Theory 2022) which suggests a prediction for the volume of the densest rank-k sublattice of a random lattice. Under this heuristic, the slope for k = 2 or 3 appears tenuously better than that of BKZ, making this approach less effective than regular BKZ using the “Dimensions for Free” of Ducas (EUROCRYPT 2018). Furthermore, our experiments show that this heuristic is overly optimistic. Despite the hope raised by our first observation, we therefore conclude that this approach appears to be a No-Go for cryptanalysis of generic lattice problems
Script-strategy aligned generation: Aligning LLMs with expert-crafted dialogue scripts and therapeutic strategies for psychotherapy
Chatbots or conversational agents (CAs) are increasingly used to improve access to digital psychotherapy. Many current systems rely on rigid, rule-based designs, heavily dependent on expert-crafted dialogue scripts for guiding therapeutic conversations. Although advances in large language models (LLMs) offer potential for more flexible interactions, their lack of controllability and explanability poses challenges in high-stakes contexts like psychotherapy. To address this, we conducted two studies in this work to explore how aligning LLMs with expert-crafted scripts can enhance psychotherapeutic chatbot performance. In Study 1 (N=43), an online experiment with a within-subjects design, we compared rule-based, pure LLM, and LLMs aligned with expert-crafted scripts via fine-tuning and prompting. Results showed that aligned LLMs significantly outperformed the other types of chatbots in empathy, dialogue relevance, and adherence to therapeutic principles. Building on findings, we proposed "Script-Strategy Aligned Generation (SSAG)", a more flexible alignment approach that reduces reliance on fully scripted content while maintaining LLMs' therapeutic adherence and controllability. In a 10-day field Study 2 (N=21), SSAG achieved comparable therapeutic effectiveness to full-scripted LLMs while requiring less than 40% of expert-crafted dialogue content. Beyond these results, this work advances LLM applications in psychotherapy by providing a controllable and scalable solution, reducing reliance on expert effort. By enabling domain experts to align LLMs through high-level strategies rather than full scripts, SSAG supports more efficient co-development and expands access to a broader context of psychotherapy
SIG PhysioCHI: Human-centered physiological computing in practice
In recent years, integrating physiological signals in Human-Computer Interaction research has significantly advanced our understanding of user experiences and interactions. However, the interdisciplinary nature of this research presents numerous challenges, including the need for standardized protocols and reporting guidelines. By fostering cross-disciplinary collaboration, we seek to enhance the reproducibility, transparency, and ethical considerations of physiological data in HCI. The purpose of this SIG is to offer a lightweight opportunity for CHI attendees to connect with the community around the center point of physiological computing. This SIG will address key topics such as technical challenges, ethical implications, reproducibility, and open science. We aim to meet as a community and connect with HCI researchers and practitioners to network and exchange bi-directional ideas. Ultimately, our goal is to create a foundation for future research and to establish a community around physiological computing
Topology reduction for determining worst-case attacks in radially operated distribution networks
With the increasing digitalization of the power system, cyber attacks that threaten physical disruption, such as power outages, are increasing. Hence, understanding system resilience and the operator responses is crucial for anticipating and mitigating threats that may cause outages through power line disconnections. A common approach to assessing grid resilience is to consider the worst-case attack, in which the attack is assumed to maximize the potential damage while the operators react to minimize such loss. This assessment, however, can have a vast number of possible actions by the attacker as the size of the power network and the severity of the attack increase, making it computationally expensive. We propose a topology reduction technique on radially operated distribution networks, which reduces the set of lines to be considered in the worstcase attack. The reduced network can determine such an attack more efficiently compared to the original network. Case studies on 33- and 119-bus systems showed that the introduced method reduced the network sizes by 25% and 38%, respectively, and its effectiveness on the worst-case attack computation increased as larger attacks with more line disconnections were considered