9086 research outputs found
Sort by
„Forschung begeistert“ : Interview mir Dr. Alexandra Bormann zur Forschung an der Hochschule Furtwangen
Effect of riboflavin and roseoflavin, a structural analogue of riboflavin, on the growth of selected microbial isolates from human saliva using a riboflavin-free cultivation medium
Catch Me If You Can : Rogue AI Detection and Correction at Scale
Modern AI systems can strategically misreport information when incentives diverge from truthfulness, posing risks for oversight and deployment. Prior studies often examine this behavior within a single paradigm; systematic, cross-architecture evidence under a unified protocol has been limited. We introduce the Strategy Elicitation Battery (SEB), a standardized probe suite for measuring deceptive reporting across large language models (LLMs), reinforcement-learning agents, vision-only classifiers, multimodal encoders, state-space models, and diffusion models. SEB uses Bayesian inference tasks with persona-controlled instructions, schema-constrained outputs, deterministic decoding where supported, and a probe mix (near-threshold, repeats, neutralized, cross-checks). Estimates use clustered bootstrap intervals, and significance is assessed with a logistic regression by architecture; a mixed-effects analysis is planned once the per-round agent/episode traces are exported. On the latest pre-correction runs, SEB shows a consistent cross-architecture pattern in deception rates: ViT 80.0%, CLIP 15.0%, Mamba 10.0%, RL agents 10.0%, Stable Diffusion 10.0%, and LLMs 5.0% (20 scenarios/architecture). A logistic regression on per-scenario flags finds a significant overall architecture effect (likelihood-ratio test vs. intercept-only: 2(5)=41.56, =7.22×10−8). Holm-adjusted contrasts indicate ViT is significantly higher than all other architectures in this snapshot; the remaining pairs are not significant. Post-correction acceptance decisions are evaluated separately using residual deception and override rates under SEB-Correct. Latency varies by architecture (sub-second to minutes), enabling pre-deployment screening broadly and real-time auditing for low-latency classes. Results indicate that SEB-Detect deception flags are not confined to any one paradigm, that distinct architectures can converge to similar levels under a common interface, and that reporting interfaces and incentive framing are central levers for mitigation. We operationalize “deception” as reward-sensitive misreport flags, and we separate detection from intervention via a correction wrapper (SEB-Correct), supporting principled acceptance decisions for deployment