Ruhr-Universität Bochum (RUB): Open Journal Systems
Not a member yet
4280 research outputs found
Sort by
UP TO 50% OFF: Efficient Implementation of Polynomial Masking
While passive probing attacks and active fault attacks have been studied for multiple decades, research has only started to consider combined attacks that use both probes and faults relatively recently. During this period, polynomial masking became a promising, provably secure countermeasure to protect cryptographic computations against such combined attacks. Unlike other countermeasures, such as duplicated additive masking, polynomial masking can be implemented using a linear number of shares, as shown by Berndt et al. at CRYPTO ’23. Based upon this fact, Arnold et al. noted at CHES ’24 that polynomial masking is particularly well-suited for parallel computation. This characteristic is especially effective in scenarios involving multiple circuits with identical structures, such as the 16 SBoxes in AES. Just recently, Faust et al. showed at CHES ’25 that one can also incorporate the technique of packed secret sharing into these masking schemes, given that the state-of-the-art polynomial masking scheme is secure against combined attacks.In this work, we present provably secure advancements regarding this state-of-the-art scheme in both computational and randomness efficiency, reducing the randomness complexity by up to 50% and the computational complexity even more by going from a quadratic term to a linear one for many parameters. Moreover, we present the first implementation of a polynomial masking scheme against combined attacks along with an extensive experimental evaluation for a wide range of parameters and configurations as well as a statistical leakage detection to evaluate the security of the implementation on an Arm Cortex-M processor. Our implementation is publicly available to encourage further research in practical combined resilience
PipFHE: Resource-Efficient Privacy-Preserving Deep CNN Inference via Padded Batch Packing and Channel Merging over FHE
Privacy-Preserving Machine Learning (PPML) has demonstrated great potential in data-sensitive industries, driving the development of low-latency CNN architectures using Fully Homomorphic Encryption (FHE). However, existing methods encounter two primary challenges when processing large image datasets: 1) Multithreading approaches, which classify one image per thread, necessitate a large number of threads and demand significant memory and CPU resources. 2) Batch packing methods are hampered by inflated ciphertext counts and inefficient handling of manage image padding and consecutive convolutions, limiting their use in deep networks. These issues create a clear need for more resource-efficient and architecturally flexible FHE-based CNN inference. To address this, we propose PipFHE, an FHE-friendly privacy-preserving CNN inference approach based on RNS-CKKS. We leverage the batch packing method and introduce two effective padding strategies for efficient encrypted image convolution. Furthermore, we propose a Channel Merging method, which notably reduces the ciphertext numbers, enabling deep network architecture. PipFHE is also compatible with pre-trained standard model parameters, ensuring high flexibility. Evaluations on the CIFAR-10 and CIFAR-100 show that PipFHE achieves an amortized inference speedup and throughput increase of 1.35x to 1.83x compared to state-of-the-art designs on the same test platform. Moreover, PipFHE performs inference on 227 encrypted images using only 36 threads and 144 GB of memory, which is 2.8x lower than other research. While PipFHE incurs an accuracy drop of <0.9% compared to plaintext inference, this strategic trade-off delivers substantial reductions in hardware requirements and enable deep network architectures. Its significant resource efficiency and support for deep networks make PipFHE a practical solution for processing large image batches in resource-constrained, privacy-sensitive cloud environments
Vectorized Falcon-Sign Implementations using SSE2, AVX2, AVX-512F, NEON, and RVV
Falcon, a NTRU-based digital signature algorithm, has been selected by NIST as one of the post-quantum cryptography (PQC) standards. Compared to verification, the signature generation of Falcon is relatively slow. One of the core operations in signature generation is discrete Gaussian sampling, which involves a component known as the BaseSampler. The BaseSampler accounts for up to 30% of the time required for signature generation, making it a significant performance bottleneck. This work aims to address this bottleneck.We design a vectorized version of the BaseSample and provide optimized implementations across six different instruction sets: SSE2, AVX2, AVX-512F, NEON, RISC-V Vector (RVV), and RV64IM. The AVX2 implementation, for instance, achieves an 8.4x speedup over prior work. Additionally, we optimize the FFT/iFFT operations using RVV and RV64D. For the RVV implementation, we introduce a new method using strided load/store instructions, with 4+4 and 4+5 layer merging strategies for Falcon-{512,1024}, respectively, resulting in a speedup of more than 4x. Finally, we present the results of our optimized implementations across eight different instruction sets for signature generation of Falcon. For instance, our AVX2, AVX- 512F, and RV64GCVB implementations achieve performance improvements of 23%, 36%, and 59%, respectively, for signature generation of Falcon-512
Evaluating Larger Lookup Tables using CKKS
The Cheon–Kim–Kim–Song (CKKS) scheme is a fully homomorphic encryption scheme that traditionally supports only the evaluation of smooth functions. Recent works have enabled the evaluation of arbitrary (discontinuous) integer functions represented as lookup tables (LUT) on small inputs using the method of functional bootstrapping (FBT). Although well-suited for small integers (up to around 10 bits), the efficiency of FBT quickly declines for large LUTs, and a considerable increase in both runtime and memory requirements is observed. Building on CKKS functional bootstrapping, we propose in this paper two functional bootstrapping algorithms, specifically designed to target larger LUTs (up to 20 bits). For a 16-bit LUT, our implementation in OpenFHE achieves a speed-up of 47.5 in amortized time and 95.1 in latency for single-threaded execution, compared to the state-of-the-art CKKS-based functional bootstrapping method of Alexandru et al. (CRYPTO’25)
Fault Attacks on VOLEitH Signature Schemes
Ongoing efforts to transition to post-quantum public-key cryptosystems have created the need for algorithms with a variety of performance characteristics and security assumptions. Among the candidates in NIST’s post-quantum standardisation process for additional digital signatures are FAEST and SDitH, two Vector Oblivious Linear Evaluation in-the-Head (VOLEitH)-based schemes. The security of FAEST is based on the Advanced Encryption Standard (AES), while SDitH relies on the regular syndrome decoding problem. The VOLEitH paradigm enables competitive performance and signature sizes under conservative security assumptions. However, since it was introduced relatively recently, in 2023, its resistance to physical attacks has not yet been analysed. In this paper, we present the first security analysis of VOLEitH-based signature schemes in the context of fault injection attacks. We demonstrate two practical attacks on the reference implementations of the secondround variants of FAEST and SDitH in ARM Cortex-M4 capable of recovering the secret key from a single signature. These attacks exploit vulnerabilities of components specific to VOLEitH schemes, namely the batch all-but-one vector commitments and the VOLE generation. We also propose countermeasures to mitigate these attacks and enhance the physical security of VOLEitH-based digital signature schemes
Keep it Simple: Refreshing the NTT of Kyber’s Decapsulation to Prevent Plaintext-Checking Side-Channel Attacks
In this paper, we revisit the protection of Kyber’s NTT implementation against plaintext-checking side-channel attacks, for which the current mainstream solution is to combine masking and shuffling. For this purpose, we first bring consolidating arguments why shuffling alone is (theoretically) of limited help to improve security against such distinguishing attacks, and why masking alone may be (practically) hindered by a lack of physical noise in low-cost embedded devices. We then discuss the challenges to address when implementing and (mostly) evaluating masked and shuffled implementations, and the lack of easy-to-extrapolate scaling trends for such a mix of countermeasures. We use this discussion as a motivation for a simpler approach, namely refreshing the layers of the NTT thanks to simple gadgets with linear overheads. Using both simulated analyses and actual experiments, we show that such an approach can limit the propagation of information through the NTT layers via belief propagation, even in low-noise contexts. We also show that this combination can simplify the side-channel security evaluation of a protected NTT implementation, and lead to the exponential security amplification that is expected when masking. As side contributions, we discuss the significant differences between the (very leaky) reference C implementation of Kyber’s NTT and an efficient assembly one, together with the profiling difficulties raised by lazy reduction techniques
ML-DSA masking sweetened with SUCRE: Shuffle-and-Unmask Countermeasure for REjection sampling
We present SUCRE, a novel countermeasure designed to physically protect the rejection sampling step of ML-DSA, one of the post-quantum signature schemes standardized by NIST. At the core of SUCRE is a masking gadget that securely unmasks a vector while simultaneously applying a random permutation of its coefficients. This lightweight mechanism preserves the vector’s infinity norm, enabling rejection sampling to proceed as usual without requiring any complex mask conversions.We formally prove that a d-probing adversary can learn at most some permuted rejected values—information which, we show, should remain insufficient to endanger the security of the signature scheme. This security argument relies on a new variant of the Module Learning with Rounding (MLWR) assumption, for which we provide a dedicated concrete security analysis to assess its hardness relative to the standard MLWR assumption.Our implementation of SUCRE achieves a significant performance improvement over previous masked non-bitsliced implementations of rejection sampling—delivering four to six times faster execution than Coron et al. (TCHES 2024)—albeit at the cost of increased memory usage. Since the rejection step accounts for approximately 25% of the total runtime in masked ML-DSA implementations, and given the expected adoption of ML-DSA on embedded platforms, this speedup could significantly enhance efficiency in real-world applications
It’s TEEtime: Secure Interrupt Isolation for Normal-world Enclaves on Arm
Secure and direct peripheral access is an important building block for protecting sensitive applications on mobile and embedded devices: Otherwise, an untrusted OS or hypervisor can, for example, trivially intercept an app’s secrets when it renders them or restrict an app’s functionality by limiting the way it can interface with a peripheral. While Arm TrustZone supports secure peripheral access, its design choices limit the flexible deployment of apps in the secure world. Recent TEE designs address this by isolating applications outside the secure world, but at the cost of secure peripheral access. We propose TEEtime, a novel system that gives software running in normal-world isolated execution environments (domains) direct and secure peripheral access by relying on existing Arm TrustZone mechanisms and the secure monitor for enforcement. TEEtime introduces interrupt isolation as a novel key primitive; we design a fine-grained interrupt isolation framework for Armv8-A. We prototype TEEtime on Arm’s FVP simulator and on the Purism Librem 5 phone, showcasing a Signal messenger app running alongside an untrusted OS
Improving the Selection Rule of Correlation Attacks for Remote Power Analysis
Remote power analysis is a novel threat to information systems. Under this attack model, the adversary does not require direct physical access to the platform or specialized sensing equipment. Most of the literature in this field deals with advanced acquisition methods and adversarial models. In contrast, side-channel analysis techniques for remote attacks have not been sufficiently explored. We bridge this gap by taking a look at the characteristics of the data recovered from remote power analysis. We use these insights to propose a novel selection rule for correlationbased attacks that boosts success confidence. This improvement comes from the observation that the samples in a power trace are not independent. We show that adjacent samples can also provide useful information by proposing a post-processing step that capitalizes on these additional leakages. In contrast to previous work, the proposed technique does not rely on the selection of points of interest within the power traces. We further investigate the characteristics of “remote” power traces and their effect on the proposed selection rule through experiments with real (TDC, ChipWhisperer) and synthetic data sets. To assess the advantage of the proposed improvement, we also introduce novel performance metrics that divert from known-key evaluation techniques
Testing Security Equivalence in the Random Probing Model
The random probing model is a theoretical model that abstracts the physical leakage of an embedded device running a cryptographic scheme with more realistic assumptions compared to the threshold probing model. It assumes that the wires of the target device leak their assigned values with probability p, and the said values may reveal information about secret data, which could lead to a security violation. From that, we can compute the probability ϵ that a side-channel adversary may learn secret data from any random combination of wires as a function of the number of wire combinations that breaches security with rate p. This model is used to evaluate the security of masked cryptographic implementations, or simply named circuits; and the research community has been focusing so far on approximating or estimating the probability ϵ for one circuit. Yet, no proposition has been made to quickly compare the probability ϵ of different circuits, e.g., a circuit and its optimized version. In this context, we present two statistical tests to make decisions about the level of security in the random probing model: the equivalence test compares the security of two circuits in terms of ϵ’s and the superiority test decides whether the undetermined ϵ of one circuit falls below a security threshold ϵ0, both with quantified uncertainty about the computations of the probabilities ϵ’s. The validity of these tests is proven mathematically sound and verified via empirical studies on small masked S-boxes