Portail HAL ENC (École nationale des chartes-PSL)
3792 research outputs found
How to Efficiently Explore Noisy Historical Data? Leveraging Corpus Pre-Targeting to Enhance Graph-based RAG
International audience. Graph-based Retrieval-Augmented Generation (RAG) is increasingly used to explore long, heterogeneous, and weakly structured corpora, including historical archives. In such settings, however, naive full-corpus indexing is often computationally costly and sensitive to OCR noise, document redundancy, and topical dispersion. In this paper, we investigate corpus pre-targeting strategies as an intermediate layer to improve the efficiency and effectiveness of graph-based RAG for historical research. We evaluate a set of pre-targeting heuristics tailored to single-hop and multi-hop historical questions on HistoriQA-ThirdRepublic, a French question-answering dataset derived from parliamentary debates and contemporary newspapers. Our results show that appropriate pre-targeting strategies can improve retrieval recall by 3–5% while reducing token consumption by 32–37% compared to full-corpus indexing, without degrading coverage of relevant documents. Beyond performance gains, this work highlights the importance of corpus-level optimization for applying RAG to large-scale historical collections, and provides practical insights for adapting graph-based RAG pipelines to the specific constraints of digitized archives.
Timing In stand-up Comedy: Text, Audio, Laughter, Kinesics (TIC-TALK): Pipeline and Database for the Multimodal Study of Comedic Timing
Stand-up comedy, and humor in general, are often studied through their verbal content. Yet live performance relies just as much on embodied presence and audience feedback. We introduce TIC-TALK, a multimodal resource with 5,400+ temporally aligned topic segments capturing language, gesture, and audience response across 90 professionally filmed stand-up comedy specials (2015-2024). The pipeline combines BERTopic for 60 s thematic segmentation with dense sentence embeddings, Whisper-AT for 0.8 s laughter detection, a fine-tuned YOLOv8-cls shot classifier, and YOLOv8s-pose for raw keypoint extraction at 1 fps. Raw 17-joint skeletal coordinates are retained without prior clustering, enabling the computation of continuous kinematic signals (arm spread, kinetic energy, and trunk lean) that serve as proxies for performance dynamics. All streams are aligned by hierarchical temporal containment without resampling, and each topic segment stores its sentence-BERT embedding for downstream similarity and clustering tasks. As a concrete use case, we study laughter dynamics across 24 thematic topics: kinetic energy negatively predicts audience laughter rate (r = -0.75, N = 24), consistent with a stillness-before-punchline pattern; personal and bodily content elicits more laughter than geopolitical themes; and shot close-up proportion correlates positively with laughter (r = +0.28), consistent with reactive montage.
Sull’edizione di Bartolomeo Anglico, o della collazione a campione di manoscritti digitalizzati
International audience
Rares ou invisibles ? Les crabes durant la Préhistoire
International audience. Prehistoric shell middens along the European Atlantic façade have been studied by archaeologists since the 19th century. These sites correspond to the refuse of daily activities of human populations living near the sea. Among these, food remains such as seashells are numerous and reflect the exploitation of marine environments. In order to understand the place of crustaceans in the daily life of past populations, this article reviews crab remains found in prehistoric shell middens in Atlantic Europe, from the Palaeolithic to the Neolithic. Data from the Mesolithic and Neolithic periods come from the online European Atlantic Prehistoric Shell-middens database (EAPSM). In addition, a synthesis of previous publications on crab remains from Palaeolithic sites in Atlantic Europe is provided. The main objective is to assess the presence and role of crab in the diet of prehistoric populations from the Palaeolithic to the Neolithic. These topics are addressed by examining the impact of the excavation methods used to detect crab remains, as well as the preservation of coastal sites in relation to sea-level variations over time. Analysis of published data shows a very uneven representation of sites across periods and regions, with, for example, a high concentration in the Mesolithic and in southern regions (Iberian Peninsula). This first result can be explained by a lower sea level during the Palaeolithic. Except on the high cliffs of Spain and Portugal, Palaeolithic shell middens, if they ever existed, have not been preserved. For all periods, crab remains are generally underrepresented. This result seems to be largely due both to their fragmentary state and to the lack of systematic identification protocols. These gaps limit our understanding of how human populations exploited crab. The article recommends a reassessment of archaeological material where sieved sediment exists, and the implementation of standardized methods on new material, for the study of crabs in coastal subsistence economies during prehistory.
Aligner méthode historique et RAG : transformer un assistant conversationnel en chaîne de preuve auditable et discutable
This article examines the challenges raised by deploying Retrieval-Augmented Generation (RAG) systems for the exploration of digitized historical sources. Starting from the observation that disciplinary acceptance remains fragile, it asks the following question: how can a RAG system applied to noisy and heterogeneous archives guarantee conditions of verification and critique compatible with the historical method? Presented as a position paper, this text does not describe a stabilized system: it offers a framing and preliminary directions to guide the development of RAG systems aligned with these requirements. It proposes restoring control over the chain of interpretation by articulating three conditions: traceability (locating documents and passages precisely), auditability (making the transformations and parameters of the chain inspectable), and debatability (opening a statement to debate by separating evidence from interpretation). The main contribution is an auditability grid that translates historians' requirements into instrumented conditions: (1) documentary anchoring (provenance and integrity), (2) explicit separation of quotation, paraphrase, and inference, (3) restoration of context and plurality of sources, (4) traceability of execution conditions and error diagnosis (retrieval vs. generation), and (5) abstention mechanisms when the evidence is insufficient.
Seamless Integration Process of Model-Based Approaches for Aircraft Design
International audience. The aviation industry continually strives to improve the safety, efficiency, and reliability of aircraft. With an increasing focus on novel aircraft configurations to reduce the environmental impact of aviation, aircraft have become even more complex than before. Model Based Systems Engineering (MBSE), Model Based Safety Assessment (MBSA), and Multidisciplinary Design Analysis and Optimization (MDAO) have thereby gained popularity over the document-centric approach in the past decade. The common integration strategy, extending the model within a single MBSE framework, inherently reduces the use of the unique capabilities of specialized MBSA and MDAO platforms. This position paper proposes an alternative methodology that uses model transformation techniques to develop a robust link between these domains. This is achieved via a custom script that extracts relevant information from an MBSE model file and transforms it into the file formats required by MBSA/MDAO tools. The primary contribution is maintaining consistency through a novel iterative design process that formalizes the model transformation. This approach preserves the extensive capabilities offered by each domain-specific tool. The formalism is supported by QVTo mapping rules, and verification is strengthened with OCL constraints and Python code developed to ensure that bidirectional transformations occur without information loss or distortion. This systematic integration streamlines the design process by enabling parallel safety assessment from an early design phase and facilitating a comprehensive exploration of the design space, thereby fostering informed decision-making. The technical feasibility of this methodology is demonstrated through its application to a UAV case study, establishing a foundation for future development and real aerospace applications.
Une institution pontificale en crise. La daterie d’Avignon au milieu du XVIIIe siècle
International audience. The administration of graces by the datary of Avignon is examined through the lens of a conflict between two officers in the mid-eighteenth century. Far from being a strictly interpersonal opposition, the conflict highlights the transformation of the activity of the pontifical institution itself. Increasingly regulated by the legislative action of the monarchy, the collation of ecclesiastical benefices by the papacy is no more than a secondary object of the economy of grace, which is now massively occupied by matrimonial dispensations.
From Veracity to Diffusion: Addressing Operational Challenges in Moving From Fake-News Detection to Information Disorders
A large part of research on misinformation has relied on fake-news detection, a task framed as the prediction of veracity labels attached to articles or claims. Yet social-science research has repeatedly emphasized that information manipulation goes beyond fabricated content and often relies on amplification dynamics. This theoretical turn has consequences for operationalization in applied social-science research. What changes empirically when prediction targets move from veracity to diffusion? And what level of performance can be attained in limited-resource setups? In this paper we compare fake-news detection and virality prediction across two datasets, EVONS and FakeNewsNet. We adopt an evaluation-first perspective and examine how benchmark behavior changes when the prediction target shifts from veracity to diffusion. Our experiments show that fake-news detection is comparatively stable once strong textual embeddings are available, whereas virality prediction is much more sensitive to operational choices such as threshold definition and early observation windows. The paper proposes practical ways to operationalize lightweight, transparent pipelines for misinformation-related prediction tasks that can rival state-of-the-art approaches.