Ruhr-Universität Bochum (RUB): Open Journal Systems
4280 research outputs found
Homomorphic Field Trace Revisited : Breaking the Cubic Noise Barrier
We present a novel homomorphic trace evaluation algorithm, RevHomTrace, which mitigates the phase amplification problem that comes with the definition of the field trace. RevHomTrace overcomes phase amplification with only negligible computational overhead, thereby improving the usability of the homomorphic field trace algorithm. Moreover, our tweak also improves the noise propagation of HomTrace and reduces the traditional O(N^3) variance bound of previous works to O(N log N). Our experimental results, obtained by integrating RevHomTrace into state-of-the-art homomorphic encryption algorithms, further demonstrate the usefulness of our algorithm. Specifically, RevHomTrace improves the noise accumulation of (high-precision) circuit bootstrapping and achieves a speedup of up to 1.30× by replacing the costly high-precision trace evaluation. Building on the idea behind RevHomTrace, we also present a low-latency, high-precision LWE-to-GLWE packing algorithm, MS-PackLWEs, and show that it significantly reduces the packing error without severe degradation of performance.
PWNN: Power-Wasting Neural Network As Remote Fault Injector
The explosive growth of AI-driven services has led to cloud-based Field Programmable Gate Array (FPGA) accelerators as key enablers of high-performance training and inference in modern data centers. Since 2024, the demand for deploying large AI workloads, especially Large Language Models (LLMs), in the cloud has increased dramatically, intensifying competition among cloud providers and increasing pressure on shared FPGA infrastructures. This increasing reliance highlights the need for robust hardware security measures for cloud FPGAs. A particularly serious threat is fault injection attacks, which exploit dynamic voltage fluctuations to induce timing faults, potentially compromising functional integrity and bypassing cryptographic protections. However, existing verification procedures and structural Design Rule Checks (DRC) remain blind to attacks embedded in benign-looking circuits. In this paper, we present Power-Wasting Neural Network (PWNN), a novel adversarial technique that leverages the inherent switching behavior of neural network operations to act as a power-waster circuit under adversarial input patterns. We systematically explore network architectures and input patterns to craft configurations that induce voltage fluctuations capable of triggering timing faults for successful Differential Fault Analysis (DFA). Our PWNN implementation uses a standard open-source tool chain and passes all pre-implementation verification checks, while covertly inducing faults at runtime. We demonstrate on both the AMD ZCU104 and PYNQ-Z2 that PWNN can reliably cause timing faults on the critical path of a co-located AES-128 block cipher, enabling the rapid collection of correct/faulty ciphertext pairs needed for DFA-based key recovery. These results show that functionally correct, DRC-compliant accelerators can serve as powerful, adaptive fault injectors that invalidate assumptions about bitstream security and hardware isolation.
Sota Voce: Low-Noise Sampling of Sparse Fixed-Weight Vectors
Many post-quantum cryptosystems require generating an n-bit binary vector with a prescribed Hamming weight ω, a process known as fixed-weight sampling. When ω = O(n), we call this dense fixed-weight sampling, which commonly appears in lattice-based cryptosystems, like those in the NTRU family. In contrast, code-based cryptosystems typically use sparse fixed-weight sampling with ω = o(n) (e.g., O(√n)). Sparse fixed-weight sampling generally involves three constant-time steps to keep the sampled vector secret: 1. sample ω nearly uniform random integers from a series of decreasing intervals; 2. map these integers into a set of ω distinct indices in [0, n), called the support; 3. generate a binary n-bit vector with bits set only at the support indices. Remarkably, some of the core algorithms employed in fixed-weight sampling date back nearly a century, yet developing efficient and secure techniques remains essential for modern post-quantum cryptographic applications. In this paper, we present novel algorithms for steps two and three of the fixed-weight sampling process. We demonstrate their practical applicability by replacing the current fixed-weight sampling routine in the HQC post-quantum key exchange mechanism, recently selected for NIST standardization. We rigorously prove that our procedures are sound, secure, and introduce little to no bias. Our implementation of the proposed algorithms accelerates step 2 by up to 2.9x and step 3 by up to 5.8x compared to an optimized version of the fixed-weight sampler currently used in HQC. Since fixed-weight sampling constitutes a significant portion of HQC’s execution time, these speedups translate into protocol-level improvements of up to 1.37x, 1.28x and 1.21x for key generation, encapsulation and decapsulation, respectively.
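The three steps listed in this abstract can be illustrated with a minimal Python sketch. It uses a classic shuffle-style mapping for step 2, chosen here only for illustration; it is not the paper's new algorithm, it is not constant-time, and the function name `sample_fixed_weight` and its `rand_below` parameter are hypothetical.

```python
import secrets

def sample_fixed_weight(n, w, rand_below=secrets.randbelow):
    """Return (support, vector) for an n-bit vector of Hamming weight w.

    Illustrative only: branches and memory accesses depend on secret data,
    so this sketch is NOT constant-time.
    """
    if not 0 <= w <= n:
        raise ValueError("weight must satisfy 0 <= w <= n")

    # Step 1: one random integer from each decreasing interval [0, n - i).
    draws = [rand_below(n - i) for i in range(w)]

    # Step 2: map the draws to w distinct indices in [0, n).
    # Candidate i lies in [i, n). Scanning from the last position down,
    # a candidate that collides with a later entry is replaced by i,
    # which cannot collide because every later entry is larger than i.
    support = [i + draws[i] for i in range(w)]
    for i in range(w - 1, -1, -1):
        if support[i] in support[i + 1:]:
            support[i] = i

    # Step 3: set exactly the support bits in an n-bit vector.
    vector = [0] * n
    for idx in support:
        vector[idx] = 1
    return support, vector

if __name__ == "__main__":
    support, vector = sample_fixed_weight(64, 8)
    print(sorted(support), sum(vector))
```

A production sampler would replace the data-dependent membership test and the indexed writes with constant-time comparisons and masked updates, which is the part the abstract reports accelerating.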
M3-Mix: A Multi-Coin, Memory-Light, Mixer Architecture for Privacy-Preserving Embedded Devices
Embedded systems, as computational platforms with limited memory, compute capability, and bandwidth-constrained interfaces, are deployed across industrial infrastructures such as smart grid metering units and supply chain modules to manage transactional credits bound to physical commodities or operational entitlements. To facilitate decentralized coordination across administrative domains of embedded systems, blockchain-based ledgers provide globally verifiable settlement and token-based credits, yet their inherent transparency exposes transaction metadata (credit transaction flow, credit types, values transferred and exchanged, etc.), raising confidentiality concerns. These concerns are compounded by the requirement to enable multi-type credit transfer and exchange across heterogeneous embedded systems, each backed by a distinct blockchain, and by the challenge of coordinating cryptographic protocol execution when only edge servers, connected to the local embedded devices, act as blockchain nodes. We present M3-Mix, a system with a protocol suite for cross-chain credit transfer and exchange that prevents source–destination traceability (a mixer for unlinkability) and supports multi-token-type mixing and credit exchange under embedded system constraints by offloading the embedded devices’ cryptographic operations to edge servers acting as blockchain relayers. The architecture integrates SNARK-verifiable nullifier-based mixing, commitment-compatible credit encoding, and a Book-Settle coordination protocol that resolves nondeterminism in token exchanges under adversarial reordering.
We implement a comprehensive system using Groth16 over the hash within R1CS via the Gnark framework, and benchmark it on emulated ARM embedded processors, demonstrating low-latency proof generation, small and bounded RAM usage, and constant-size proofs consisting of eight 256-bit words. Smart contract deployments on Ethereum Sepolia confirm practical gas efficiency for various numbers of transactions in a mixer obfuscated set (Merkle depths).
A Mirrored-Circuits Approach for Low-Latency and Low-Randomness Composable PINI Gadgets
Masking is a crucial countermeasure against side-channel attacks, yet it is challenging to implement in hardware. At CHES 2024, Kumar S.V. et al. introduced the concept of Time Sharing Masking (TSM) for constructing low-latency first-order PINI gadgets. While TSM has demonstrated its effectiveness, we propose a new approach called mirrored-circuits to further enhance hardware performance. This approach constructs two MIRROR circuits that share randomness between them. For Boolean functions represented in the Algebraic Normal Form (ANF), the mirrored-circuits approach outperforms previous works in terms of the number of randomness bits and registers. By reconstructing the ANF, these two metrics can be further optimized using an SMT approach. We provide experimental results from several case studies to evaluate the efficiency of our approach. For 4-bit S-boxes, we achieve substantial reductions in randomness, delay, and area. Specifically, the 1-cycle SKINNY S-box achieves reductions of 50%, 37%, and 29% in randomness, delay, and area, respectively. For the AES S-box, the 1-cycle design reduces randomness by 49%, and the 2-cycle design reduces randomness by 35% and area by 47%. To ensure the security of our schemes, we conduct comprehensive verification along three dimensions: theoretical proof, formal verification using SILVER, and FPGA-based experiments.
NeonCROSS: Vectorized Implementation of Post-Quantum Signature CROSS on Cortex-A72 and Apple M3
The advancement of quantum computing threatens traditional public-key cryptographic systems, prompting the development of post-quantum cryptography (PQC). As part of the NIST PQC standardization process, code-based cryptographic schemes have gained attention due to their strong security foundations and longstanding resistance to both classical and quantum attacks. CROSS is a code-based digital signature scheme that relies on the hardness of decoding restricted vectors. It is designed to offer a flexible trade-off between signature size and speed, making it a promising candidate for post-quantum cryptography. This work introduces a faster modular reduction method for the pseudo-Mersenne prime in CROSS, significantly improving modular arithmetic efficiency. The proposed method is also applicable to other primes of similar structure. In addition, we present vectorized implementations of key operations. Matrix-vector multiplication is vectorized with parameter-specific strategies, and the processing of tree structures in CROSS is optimized using batch hashing methods adapted to the Neon extension. This paper presents the first vectorized implementation of the CROSS digital signature scheme on the ARMv8-A architecture, leveraging the Neon extension for optimized performance. The implementation is evaluated on both ARM Cortex-A72 and Apple M3 processors, achieving up to 63.3% speedup in signature generation and 56.3% in verification, demonstrating significant performance improvements over the reference implementation. These results highlight the potential of CROSS for efficient post-quantum cryptography on the ARMv8-A architecture.
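For readers unfamiliar with pseudo-Mersenne primes, the following Python sketch shows the textbook folding reduction for a modulus of the form p = 2^k - c. It is not the faster method proposed in the paper, which the abstract does not describe. The function name is hypothetical, and the parameters k = 9, c = 3 (p = 509) are an illustrative assumption, since the abstract does not state the prime.

```python
def reduce_pseudo_mersenne(x, k, c):
    """Reduce a non-negative integer x modulo p = 2**k - c.

    Uses the identity 2**k = c (mod p): the bits above position k are
    folded back into the low k bits after multiplication by c.
    Assumes 0 < c <= 2**(k - 1) so one final subtraction suffices.
    """
    p = (1 << k) - c
    mask = (1 << k) - 1
    # Each fold strictly shrinks x while preserving its residue mod p.
    while x >> k:
        x = (x >> k) * c + (x & mask)
    # Now x < 2**k, so x is at most one multiple of p above the result.
    if x >= p:
        x -= p
    return x

if __name__ == "__main__":
    K, C = 9, 3          # illustrative assumption: p = 2**9 - 3 = 509
    a, b = 400, 508
    print(reduce_pseudo_mersenne(a * b, K, C), (a * b) % 509)
```

The appeal of this structure is that the reduction needs only shifts, masks, and a multiplication by the small constant c, which maps well onto fixed-width vector lanes such as those offered by Neon.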
Active Electromagnetic Side-Channel Analysis: Crossing Physical Security Boundaries through Impedance Variations
Embedded devices that are physically unreachable or contained within tamper-proof enclosures are often considered naturally resilient to physical side-channel attacks. We present an active EM side-channel attack technique that enables side-channel attacks across the security boundaries introduced by these physical security measures. Our technique actively induces side-channel leakage based on a relay mechanism, which serves as a means of leakage propagation from the cryptographic IC to the attacker via intermediate Relay Points located within the boundary. In the first leakage path, the instantaneous current consumption caused by transistor switching inside the cryptographic IC affects the impedance of nearby nonlinear elements (such as regulators) that act as Relay Points. This phenomenon is unavoidable because the regulator must operate to provide a stable voltage to the cryptographic IC. In the second path, EM waves irradiated by the attacker from outside the security boundary create a leakage channel where reflected waves are modulated by the impedance variation of Relay Points, and leak outside as side-channel information. We experimentally demonstrate that side-channel attacks can be performed from the power cable on the primary side of an AC/DC adapter, even in environments protected by physical security measures, including a shielded box and ferrite cores. This attack can be executed non-invasively using EM waves in the hundreds of MHz band, and has the advantage of being able to actively control the presence and intensity of leakage. The proposed attack technique may be applicable in a wide range of applications as it exploits the behavior of nonlinear elements present in all embedded systems.
As countermeasures against the proposed attack, we discuss the effectiveness of EM fault sensors for detecting continuous wave irradiation, induced EM interference detectors with broadband monitoring capabilities, and tamper detection systems utilizing radio frequencies. This research highlights the importance of considering active EM measurements as a new threat model in physical security evaluations.
FAST: Fast and Accurate Security Testing of HRP UWB Chips
High-Rate Pulse (HRP) Ultra-Wide Band (UWB) technology is used for secure distance measurement. It was standardized by IEEE 802.15.4z in 2020 and is implemented in chips widely deployed in consumer devices. However, due to the use of proprietary signal processing algorithms, complex implementations, and subtle physical layer considerations, evaluating the security of such chips against distance-reduction attacks analytically is challenging. In this work, we investigate how to empirically evaluate the security of HRP UWB chips against random-guess attacks that were recently proven practical on real-world chips (GhostPeak). We propose FAST, a generic and efficient testing methodology that we use to accurately characterize the security of HRP UWB chips against random-guess distance reduction attacks. FAST relies on importance sampling and can accurately estimate very low success rates using a small and practical number of tests. Using FAST, we characterize the security of four HRP UWB receivers having different levels of obscurity (Qorvo DWM3000EVB, PURE, NXP SR040, and NXP SR150) across different chip configurations and attack conditions. FAST revealed success rates ranging from 2^-10 to as low as 2^-128 using only tens of thousands of test samples.
Combined Stability: Protecting against Combined Attacks
Physical attacks pose serious challenges to the secure implementation of cryptographic algorithms. While side-channel analysis (SCA) has received significant attention, leading to well-established countermeasures, fault attacks and especially their combination with SCA (i.e., combined attacks) remain less researched. Addressing such combined attacks often requires a careful integration of masking and redundancy techniques to resist the reciprocal effects of faults and probes. Recent research on combined security has gained momentum, with most approaches relying on composable security notions involving error correction, typically applied after each nonlinear operation. While effective, this approach introduces an area and performance overhead, along with additional security challenges posed by the correction circuits themselves. In this work, we take a different direction, following the concept of stability introduced in StaTI (CHES 2024), which ensures fault propagation to protect against ineffective faults. We extend this concept to combined security by proposing a new composable security notion, combined stability, which integrates an extended stability notion, diffused stability, with arbitrarily composable glitch-extended probing security notions. Notably, this framework requires only a single error detection at the end of the computation, avoiding costly intermediate error checks and corrections. To demonstrate practicality, we describe a combined secure AES S-box hardware implementation. Our results show that this approach, achieving combined security with competitive implementation costs, offers a promising alternative to error-correction-based schemes.