Portail HAL de Télécom Paris
    14191 research outputs found

    Where Experts Disagree, Models Fail: Detecting Implicit Legal Citations in French Court Decisions

    No full text
    Computational methods applied to legal scholarship hold the promise of analyzing law at scale. We start from a simple question: how often do courts implicitly apply statutory rules? This requires distinguishing legal reasoning from semantic similarity. We focus on implicit citation of the French Civil Code in first-instance court decisions and introduce a benchmark of 1,015 passage-article pairs annotated by three legal experts. We show that expert disagreement predicts model failures. Inter-annotator agreement is moderate (κ = 0.33) with 43% of disagreements involving the boundary between factual description and legal reasoning. Our supervised ensemble achieves F1 = 0.70 (77% accuracy), but this figure conceals an asymmetry: 68% of false positives fall on the 33% of cases where the annotators disagreed. Despite these limits, reframing the task as top-k ranking and leveraging multi-model consensus yields 76% precision at k = 200 in an unsupervised setting. Moreover, the remaining false positives tend to surface legally ambiguous applications rather than obvious errors
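The top-k reframing in the abstract above can be illustrated with a small sketch. The abstract does not say how multi-model consensus is computed, so the mean-rank aggregation below is an assumption, and the names `consensus_ranking` and `precision_at_k` are hypothetical.

```python
from statistics import mean


def consensus_ranking(scores_by_model):
    """Order candidates by their mean rank across models (lower is better).

    Assumption: every model scores the same set of candidate ids, and a
    higher score means "more likely an implicit citation".
    """
    ranks = {}
    for scores in scores_by_model:
        ordered = sorted(scores, key=scores.get, reverse=True)
        for position, candidate in enumerate(ordered):
            ranks.setdefault(candidate, []).append(position)
    # Ties on mean rank are broken by candidate id to keep the order stable.
    return sorted(ranks, key=lambda c: (mean(ranks[c]), c))


def precision_at_k(ranked, positives, k):
    """Fraction of the top-k ranked candidates that are true positives."""
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(1 for c in top if c in positives) / len(top)


# Toy data: two hypothetical models scoring four passage-article pairs.
model_scores = [
    {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.2},
    {"a": 0.7, "b": 0.2, "c": 0.3, "d": 0.9},
]
ranked = consensus_ranking(model_scores)
print(ranked, precision_at_k(ranked, {"a", "b"}, 2))
```

Mean rank is only one possible consensus rule; a vote count or a product of scores would fit the same interface.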

    Communications optiques en espace libre dans le brouillard : comparaison des longueurs d'onde pour la résilience aux interférences intersymboles

    No full text
    Free-space optical (FSO) communications at 1.55 μm are gaining increasing recognition due to their high data rates and smaller beam divergence compared to radio-frequency systems. However, FSO links are highly sensitive to adverse atmospheric conditions and turbulence, which limits their practicality in urban environments, particularly in the presence of fog. To mitigate these impairments, the use of longer wavelengths has been proposed. The aim of this work is to investigate the effects of fog on optical communication links operating at 1.55 μm and 10.3 μm. The study focuses on the temporal spreading of transmitted optical pulses caused by multiple scattering interactions between photons and water droplets. This pulse broadening can significantly limit the achievable data rate due to inter-symbol interference (ISI). A radiative transfer model based on the radiative transfer equation (RTE) is employed to compute the impulse response of a foggy atmospheric slab. These impulse responses are then applied to an amplitude-modulated optical signal to assess system performance under fog conditions. The results show that the link operating at 10.2 μm is less affected by temporal spreading than the 1.55 μm link, demonstrating improved robustness against fog-induced ISI degradation
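The step of applying an impulse response to an amplitude-modulated signal can be sketched as a discrete convolution. The impulse responses here are made-up placeholders, not outputs of an RTE model, and `eye_opening` is a hypothetical helper that measures how far ISI closes the gap between received "one" and "zero" levels.

```python
def convolve(signal, kernel):
    """Plain discrete linear convolution of two sequences."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(kernel):
            out[i + j] += s * h
    return out


def ook_waveform(bits, samples_per_bit):
    """On-off keying: hold each bit value for samples_per_bit samples."""
    return [float(b) for b in bits for _ in range(samples_per_bit)]


def eye_opening(bits, impulse_response, samples_per_bit):
    """Smallest received 'one' minus largest received 'zero' at bit centres.

    Assumes bits contains at least one 0 and one 1. A value near 1 means
    negligible ISI; a value at or below 0 means the eye is closed.
    """
    total = sum(impulse_response)
    h = [v / total for v in impulse_response]  # unit-energy-sum channel
    rx = convolve(ook_waveform(bits, samples_per_bit), h)
    centre = samples_per_bit // 2
    samples = [rx[i * samples_per_bit + centre] for i in range(len(bits))]
    ones = [s for s, b in zip(samples, bits) if b == 1]
    zeros = [s for s, b in zip(samples, bits) if b == 0]
    return min(ones) - max(zeros)


bits = [1, 0, 1, 1, 0, 0, 1, 0]
narrow = [1.0]        # placeholder: almost no temporal spreading
broad = [1.0] * 6     # placeholder: spreading over 1.5 bit periods
print(eye_opening(bits, narrow, 4), eye_opening(bits, broad, 4))
```

A broader impulse response smears each pulse into its neighbours, which shows up here as a smaller eye opening.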

    INSTANT: COMPRESSING GRADIENTS AND ACTIVATIONS FOR RESOURCE-EFFICIENT TRAINING

    No full text
    Deep learning has advanced at an unprecedented pace. This progress has led to a significant increase in its complexity. However, despite extensive research on accelerating inference, training deep models directly within a resource-constrained budget remains a considerable challenge due to its high computational and memory requirements. In this paper, we introduce INSTANT (compressIng gradieNtS and acTivAtions for resource-efficieNt Training), a method designed to address both the computational and the memory bottlenecks when training. INSTANT reduces resource demands during backpropagation by projecting gradients and activations into a low-rank subspace and performing computation within that compressed representation. Experimental results demonstrate that INSTANT achieves a 15× reduction in computational cost and 32× reduction in activation memory with negligible impact on model performance. The code is available at INSTANT.
    • We introduce a low-cost calibration technique to generate calibrated orthonormal bases for tensor projection, enabling significant reductions in memory and computations (Sec. 3.2).
    • We project activation tensors and gradients onto these orthonormal bases. To our knowledge, this is the first work to exploit the low-rank structure of activation gradients for all types of data distribution. We provide an error analysis of our gradient compression, illustrating that a high compression ratio is achievable with limited performance degradation (Sec. 3.3).
    • We evaluate INSTANT across multiple datasets and model architectures, consistently demonstrating good performance, achieving up to 32× memory savings and 15× computational cost reduction with only a 1% trade-off in accuracy compared to vanilla fine-tuning (Sec. 4).
    RELATED WORK
    Activation compression. Activation compression is a recently emerging research direction that addresses the memory challenges during training. This approach offers several key advantages based on the following observations: (i) model weights remain uncompressed during training, thereby preserving their expressive capacity; (ii) activations are often large and exhibit significant redundancy, making them suitable for compression (Sakr & Khailany, 2024; Miles et al., 2024). Nguyen et al. (2024) apply SVD to compress activations and reduce their large memory footprint. However, this approach incurs substantial computational overhead due to the high cost of performing SVD in each training iteration. ESPACE (Sakr & Khailany, 2024) tackles the computational expense of SVD by using calibrated subspaces, which are periodically updated, to compress activations. This enables activation compression in the forward pass, reducing computational overhead in both the forward and backward phases. However, ESPACE is prone to error accumulation, as it relies on a single fixed subspace across varying activations.
    Optimizer state compression. Weight gradients are inherently low-rank (Yang et al., 2023a). Previous studies (Bernstein et al., 2018; Vogels et al., 2019) have leveraged this characteristic to address communication bottlenecks in distributed learning by reducing inter-device data transmission. GaLore (Zhao et al., 2024) and its variants (Muhamed et al., 2024; Shamshoum et al., 2025) leverage the low-rank property of weight gradients to compress them, significantly reducing memory usage in the optimizer state. CompAct (Shamshoum et al., 2025) further reduces the memory overhead
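The idea of projecting a tensor onto an orthonormal basis and working in the compressed representation can be sketched generically. This is not the INSTANT calibration procedure, which the text above does not detail. The basis is hand-picked for illustration, and `compress` and `reconstruct` are hypothetical names.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def compress(x, basis):
    """Project the rows of x (n x d) onto the r columns of basis (d x r).

    Assumption: the columns of basis are orthonormal, so the result
    (n x r) holds the coordinates of each row in the subspace.
    """
    return matmul(x, basis)


def reconstruct(z, basis):
    """Map compressed coordinates (n x r) back to the original space (n x d)."""
    return matmul(z, transpose(basis))


# Hand-picked unit vector (0.6, 0.8) as a rank-1 basis in 2 dimensions.
basis = [[0.6], [0.8]]
inside = [[3.0, 4.0], [1.5, 2.0]]  # rows lie in the subspace
outside = [[4.0, -3.0]]            # row is orthogonal to the subspace

print(reconstruct(compress(inside, basis), basis))
print(reconstruct(compress(outside, basis), basis))
```

Rows that lie in the subspace survive the round trip, while the orthogonal row is lost entirely, which is why the choice of basis determines how much error the compression introduces.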

    Towards Reliable LLM-Based Model Driven Engineering: when Full Syntax Checking and Formal Verification Join the Loop

    No full text
    Model-Driven Engineering facilitates the design of embedded systems by promoting abstraction and enabling early verification of design correctness. Recent approaches have integrated Large Language Models into MDE workflows to automatically generate models from textual specifications. However, these models often require extensive prompt refinement and lack formal guarantees of correctness. This paper introduces an enhanced LLM-based generation process in TTool-AI, incorporating a novel dual feedback loop that combines automated syntactic checking with formal verification of safety properties. The loop iteratively refines LLM-generated SysML block and state-machine diagrams to ensure syntactic validity and verify safety properties. A first experimental evaluation on both academic and industrial-grade specifications demonstrates that the proposed mechanism reliably produces syntactically correct models, enabling direct model checking of LLM-produced models and reducing the effort required by engineers to obtain correct-by-construction designs

    From AI models to agents: technical aspects and legal risks

    No full text
    This presentation introduces the foundations of agentic AI systems, the technical challenges ahead and the legal risks they raise. Made in partnership with the leading law firm Baker McKenzie Paris, it is intended for general counsel, business leaders and legal directors

    AI Agents and the Future of Deliberation: Designing Human-AI Collaboration for Democratic Dialogue

    No full text
    As societies grapple with increasing polarization and information complexity, the need for constructive, inclusive, and well-informed deliberation has reached an unparalleled level. At the same time, AI agents, ranging from LLMs and multi-agent simulations and systems to conversational assistants and reflective companions, are rapidly reshaping how people communicate, reason together, and form collective judgments. These technologies hold the potential to scale democratic participation, foster inclusivity by bridging linguistic and cultural barriers, and introduce new forms of collaborative reasoning. Yet they also pose epistemic challenges to established notions of authenticity, legitimacy, and human autonomy in civic dialogue. This panel brings together leading researchers from Asia, Europe and North America to examine how AI technologies are transforming deliberation as both a social process and a design problem. It interrogates AI's role in shaping deliberative norms, influencing group dynamics, and redefining what it means to "reason together" in hybrid human-AI spaces. Through interactive polling, structured debates, and audience co-deliberation, the session invites CHI participants to collectively explore how we can design responsible, inclusive, and trustworthy deliberation interfaces that preserve the democratic values of deliberation while embracing the creative potential of AI

    The Inverse Drum Machine: Source Separation Through Joint Transcription and Analysis-by-Synthesis

    No full text
    We present the Inverse Drum Machine (IDM), a novel approach to Drum Source Separation that leverages an analysis-by-synthesis framework combined with deep learning. Unlike recent supervised methods that require isolated stem recordings for training, our approach is trained on drum mixtures with only transcription annotations. IDM integrates Automatic Drum Transcription and One-shot Drum Sample Synthesis, jointly optimizing these tasks in an end-to-end manner. By convolving synthesized one-shot samples with estimated onsets, akin to a drum machine, we reconstruct the individual drum stems and train a Deep Neural Network on the reconstruction of the mixture. Experiments on the StemGMD dataset demonstrate that IDM achieves separation quality comparable to state-of-the-art supervised methods that require isolated stem data
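The drum-machine analogy, convolving one-shot samples with onsets to rebuild stems and then the mixture, can be sketched as follows. Only the rendering step is shown; the transcription and synthesis networks are omitted, and the samples, onsets, and function names are hypothetical toy values.

```python
def render_stem(onsets, one_shot, length):
    """Convolve a sparse onset train with a one-shot sample.

    onsets maps a frame index to a velocity (a hypothetical representation
    of the transcription output); one_shot is the sample for one
    instrument. Audio beyond `length` frames is truncated.
    """
    stem = [0.0] * length
    for start, velocity in onsets.items():
        for offset, value in enumerate(one_shot):
            if start + offset < length:
                stem[start + offset] += velocity * value
    return stem


def mix(stems):
    """Sum equal-length stems frame by frame to get the mixture."""
    return [sum(frame) for frame in zip(*stems)]


kick = render_stem({0: 1.0, 4: 1.0}, [1.0, 0.5], length=8)
snare = render_stem({2: 1.0, 4: 0.5}, [0.8, 0.4, 0.2], length=8)
mixture = mix([kick, snare])
print(kick)
print(mixture)
```

In a training setup of this kind, the reconstructed mixture would be compared with the observed mixture, so the individual stems come out as a by-product without isolated stem recordings.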

    An Autoethnography on Visualization Literacy: A Wicked Measurement Problem

    No full text
    We contribute an autoethnographic reflection on the complexity of defining and measuring visualization literacy (i.e., the ability to interpret and construct visualizations) to expose our tacit thoughts that often exist in-between polished works and remain unreported in individual research papers. Our work is inspired by the growing number of empirical studies in visualization research that rely on visualization literacy as a basis for developing effective data representations or educational interventions. Researchers have already made various efforts to assess this construct, yet it is often hard to pinpoint either what we want to measure or what we are effectively measuring. In this autoethnography, we gather insights from 14 internal interviews with researchers who are users or designers of visualization literacy tests. We aim to identify what makes visualization literacy assessment a “wicked” problem. We further reflect on the fluidity of visualization literacy and discuss how this property may lead to misalignment between what the construct is and how measurements of it are used or designed. We also examine potential threats to measurement validity from conceptual, operational, and methodological perspectives. Based on our experiences and reflections, we propose several calls to action aimed at tackling the wicked problem of visualization literacy measurement, such as by broadening test scopes and modalities, improving test ecological validity, making it easier to use tests, seeking interdisciplinary collaboration, and drawing from continued dialogue on visualization literacy to expect and be more comfortable with its fluidity

    What is AI Doing to Job Quality? Platformization, Fissured Workplaces and Dispersion

    No full text
    Debates about AI and labour are focused on job losses, yet give little consideration to how this technology affects actual working experiences. This chapter examines the impact of AI on job quality through its use and through the invisibilized labour that sustains it, using the framework of the European Job Quality Index. AI deployment often intensifies work, increases surveillance and limits worker autonomy, while its production relies on a highly stratified global workforce of engineers and hidden data workers. These arrangements have resulted in companies outsourcing functions while maintaining control over production in what is known as a ‘fissured workplace’. Furthermore, AI-driven workplaces are creating a ‘regime of dispersion’ where workers juggle numerous fragmented tasks. The chapter explores these phenomena, correlated as they are in terms of AI and job quality, while identifying the research agenda and policy implications which would safeguard all users in AI’s global supply chains

    NeuroSnitch: Exploiting Inter-Spike Interval Statistics for Timing Side-Channel Attacks on Noisy Neuromorphic Systems

    No full text
    Neuromorphic computing promises energy-efficient solutions for embedded and edge systems, but introduces unique security challenges and a new attack surface. This paper presents NeuroSnitch, a first-ever timing side-channel attack to leverage subtle statistical variations in Inter-Spike Intervals (ISIs) on Spiking Neural Networks (SNNs) to extract secret information. We show that secret data, when modulating a neuron's input current, can be profiled through higher-order ISI statistics (mean, variance, skewness, and kurtosis), even under realistic noise sources, including observation noise, current fluctuation, and voltage jitter. Using the Leaky Integrate-and-Fire (LIF) neuron model, we demonstrate that a Random Forest classifier can achieve 98.41% character-level classification accuracy on noisy ISI traces, enabling complete recovery of a 33-character secret string. This work exposes a previously underexplored and robust timing leakage vector in SNNs, underscoring the urgent need for tailored security measures in this emerging computing paradigm, particularly for sensitive embedded and IoT applications.
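The four ISI statistics named in the abstract can be computed from a list of spike times as in the sketch below. It covers only feature extraction: the neuron simulation, the noise models, and the classifier are omitted. `isi_features` is a hypothetical name, and using population (biased) moments with excess kurtosis is an assumption.

```python
def isi_features(spike_times):
    """Mean, variance, skewness and excess kurtosis of inter-spike intervals.

    spike_times must be sorted and contain at least two spikes.
    Population (biased) moments are used; this is an assumption, not
    necessarily the estimator used in the paper.
    """
    isis = [b - a for a, b in zip(spike_times, spike_times[1:])]
    n = len(isis)
    mean = sum(isis) / n

    def central_moment(order):
        return sum((x - mean) ** order for x in isis) / n

    variance = central_moment(2)
    if variance == 0:
        # Perfectly regular spiking: higher-order shape is undefined.
        return {"mean": mean, "variance": 0.0, "skewness": 0.0, "kurtosis": 0.0}
    skewness = central_moment(3) / variance ** 1.5
    kurtosis = central_moment(4) / variance ** 2 - 3.0  # excess kurtosis
    return {"mean": mean, "variance": variance,
            "skewness": skewness, "kurtosis": kurtosis}


features = isi_features([0.0, 1.0, 3.0, 4.0, 6.0])
print(features)
```

A feature vector like this, computed per observed trace, is the kind of input a classifier such as a Random Forest could be trained on.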

    0 full texts, 14,191 metadata records
    Updated in last 30 days.