ISI Digital Commons (Indian Statistical Institute )
Not a member yet
7571 research outputs found
Sort by
Advanced Techniques in Symmetric Key Cryptanalysis
Symmetric key cryptographic primitives are essential tools used extensively in daily digital interactions. These primitives are mainly designed to provide three key services: ensuring data confidentiality, maintaining data integrity, and verifying the authenticity of data sources. The primary types of symmetric key primitives that deliver these services include block ciphers, stream ciphers, hash functions, message authentication codes, and authenticated encryption with associated data. This thesis mainly explores the security analysis of hash functions, several block ciphers, and stream ciphers using some advanced cryptanalytic techniques. We begin by examining the collision security of a hash function, specifically under the assumption that the underlying compression functions are collision-resistant. This characteristic is termed the collision-resistance preserving property of a hash function. Notably, both the Merkle-Damgård and Merkle tree hash structures exhibit this property, prompting the question of whether it is possible to reduce the number of underlying compression function calls while maintaining the collision-resistance preserving property. In pursuit of this question, we prove that for an ℓn-to-sn-bit collision-preserving hash function, designed using r tn-to-n-bit compression function calls, it must hold that r ≥ ⌈(ℓ − s)/(t − 1)⌉, assuming all operations other than the compression function are linear. Shifting our focus, we delve into advanced techniques for enhanced cryptanalysis of block and stream ciphers. Initially, we concentrate on the impossible differential (ID) and zero correlation (ZC) attacks, which are pivotal cryptanalytic methods for block ciphers. We introduce an advanced, unified constraint programming (CP) approach based on satisfiability for identifying ID distinguishers in ARX and AndRX ciphers alongside a similar method for identifying ZC distinguishers. Furthermore, we extend our novel model to formulate a unified optimization problem that incorporates the distinguisher and key recovery for AndRX designs. Our approach not only enhances ID attacks but also unveils new distinguishers for various ciphers, including SIMON, SPECK, Simeck, ChaCha, Chaskey, LEA, and SipHash. Another significant cryptanalytic technique, particularly applicable to the analysis of block and stream ciphers, is the division property—an advanced version of integral cryptanalysis. Here, we explore the feasibility of the MILP method for the bit-based division property using three subsets (BDPT) propagation in ciphers with complex linear layers. We apply our novel method to discover integral distinguishers based on BDPT for the SIMON, SIMON(102), PRINCE, MANTIS, PRIDE, and KLEIN block ciphers. The integral distinguishers identified by our method are superior to or consistent with the longest existing distinguishers. Finally, we investigate the cube attack, a powerful cryptanalytic technique against stream ciphers. We study the NIST lightweight 3rd round candidate Grain-128AEAD through the lens of division property-based cube attacks. Initially, we introduce some effective cubes and construct an algorithm to identify conditional key bits for these cubes in Grain-128AEAD. Subsequently, we employ the three-subset division property without unknown subsets based cube attacks to recover exact superpolies for Grain-128AEAD in the weak-key setting, yielding improved results
Multiple Hypothesis Testing Under Dependence
We have examined various aspects of multiple hypothesis testing under dependence. Traditional algorithms designed to control the error arising from multiplicity become severely conservative when the hypotheses exhibit positive dependence, resulting in a loss of power. There is a paucity of literature explicating the behaviour of traditional algorithms when the hypotheses are dependent. In the realm of multiple testing, a popular multiplicity correction is the Bonferroni correction, which is perhaps the oldest classical approach for controlling the Family-Wise Error Rate (FWER) at a desired level, regardless of dependence among hypotheses. However, under the global null and equicorrelated normal model, the actual FWER of the Bonferroni procedure is bounded above by a line connecting 0 and the desired level (alpha), making it overly conservative. We have also examined the actual FWER of the Bonferroni method in a nearly independent setup and shown that it approximates the FWER under independence when the correlations are smaller than the order of log n. In the second part, we explored the connection between multiple testing problems and classification theory. The Bayes rule, which is optimal for traditional classification problems, is also optimal for multiple testing problems under certain mild assumptions. However, the test statistic derived from the Bayes rule is challenging to simplify under dependence, limiting its practical application. We have simplified the optimal test statistic under a Gaussian model and, through extensive simulations and demonstrated that the performance of the Oracle rule significantly surpasses that of traditional approaches like the Benjamini- Hochberg (BH) FDR controlling procedure. Finally, we addressed the problem of estimating the proportion of null hypotheses under dependence. We have shown that the estimator proposed by Benjamini and Hochberg converges to 1 under independence. Additionally, simulations have been conducted to evaluate the performance of this estimator under various dependent structures, including m- dependent and block-dependent structure
Self-Supervised Learning and its Applications in Medical Image Analysis
Self-supervised learning (SSL) enables learning robust representations from unlabeled data and it consists of two stages: pretext and downstream. The representations learnt in the pretext task are transferred to the downstream task. Self-supervised learning has appli- cations in various domains, such as computer vision tasks, natural language processing, speech and audio processing, etc. In transfer learning scenarios, due to differences in the data distribution of the source and the target data, the hierarchical co-adaptation of the representations is destroyed, and hence proper fine-tuning is required to achieve satisfactory performance. With self-supervised pre-training, it is possible to learn repre- sentations aligned with the target data distribution, thereby making it easier to fine-tune the parameters in the downstream task in the data-scarce medical image analysis domain. The primary objective of this thesis is to propose self-supervised learning frameworks that deal with specific challenges. Initially, jigsaw puzzle-solving strategy-based frameworks are devised where a semi-parallel architecture is used to decouple the representations of patches of a slice from a magnetic resonance scan to prevent learning of low-level signals and to learn context-invariant representations. The literature shows that contrastive learn- ing tasks are better than context-based tasks in learning representations. Thus, we propose a novel binary contrastive learning framework based on classifying a pair as positive or neg- ative. We also investigate the ability of self-supervised pre-training to boost the quality of transferable representations. To effectively control the uniformity-alignment trade-off, we re-formulate the binary contrastive framework from a variational perspective. We further improve this vanilla formulation by eliminating positive-positive repulsion and amplifying negative-negative repulsion. The reformulated binary contrastive learning framework out- performs the state-of-the-art contrastive and non-contrastive frameworks on benchmark datasets. Empirically, we observe that the temperature hyper-parameter plays a signifi- cant role in controlling the uniformity-alignment trade-off, consequently determining the downstream performance. Hence, we derive a form of the temperature function by solving a first-order differential equation obtained from the gradient of the InfoNCE loss with respect to the cosine similarity of a negative pair. This enables controlling the uniformity- alignment trade-off by computing an optimal temperature for each sample pair. From experimental evidence, we observe that the proposed temperature function improves the performance of a weak baseline framework to outperform the state-of-the-art contrastive and non-contrastive frameworks. Finally, to maximise the transferability of representa- tions, we propose a self-supervised few-shot segmentation pretext task to minimise the disparity between the pretext and downstream tasks. Using the Felzenszwalb-based seg- mentation method to generate the pseudo-masks, we train a segmentation network that learns representations aligned with the downstream task of one-shot segmentation. We propose a correlation-weighted prototype aggregation step to incorporate contextual in- formation efficiently. In the downstream task, we conduct inference without fine-tuning and the proposed self-supervised one-shot framework performs better or at par with the contemporary self-supervised segmentation frameworks. In conclusion, the proposed self-supervised learning frameworks offer significant improve- ments in representation learning, and enhancing performance on downstream medical im- age analysis tasks, as observed from the different experimental results of the thesis
Some Applications of Divergences to Robust Inference with Mixed Data
This thesis focuses on the application of the density power divergence to studies involving mixed-data problems. It also develops a unified the- ory of two-sample nonparametric tests for a general class of divergence measures. The main content of the thesis is divided into three parts. The first part explores parameter estimation in ordinal response mod- els, which are prevalent in many scientific studies. A typical data set generated through an ordinal response model includes continuous, non- stochastic regressors and a response variable with ordinal outcomes. The theory of non-homogeneous density power divergence is applicable here, provided appropriate conditions on the regressors and link functions are satisfied. The roles of different link functions in estimation are thor- oughly analyzed, and the robustness of the estimators is evaluated using the influence function, the (explosive) breakdown point, and the implosive breakdown point. The latter two measures are found to be very high, en- suring the robustness of the minimum density power divergence method against various types of outliers. The second part focuses on the estimation and development of Wald- type tests for polychoric correlation. Initially, the standard density power divergence is applied. Subsequently, a two-step approach is introduced, which, while theoretically more complex, substantially reduces the com- putational burden. The results from the two-step approach are highly consistent with those from the initial method. Additionally, a new divergence measure involving two tuning parame- ters, derived from the density power divergence (DPD), is proposed. These estimates perform at least as good as the DPD under pure data condi- tions, up to a threshold defined by the tuning parameters. Moreover, the proposed estimates exhibit enhanced robustness compared to the DPD. Given the effectiveness of polychoric correlation in quantifying associa- tions between categorical variables, this research provides valuable tools for applied scientists. The third part introduces a class of two-sample nonparametric tests based on the class of extended Bregman divergences to assess the equality of two completely unstructured absolutely continuous distributions. The asymptotic distributions of the test statistics are derived under both the null hypothesis and contiguous alternatives. The robustness of the pro- posed method is studied through the influence function and the asymp- totic breakdown point. Numerical studies are conducted for two spe- cific divergence families: the generalized S-Bregman divergence and the Exponential-Polynomial divergence measures. Notably, divergences out- side the power divergence family often perform better within this frame- work. Finally, a generic tuning parameter selection strategy is proposed, en- abling the application of the method to real-world data. The theoretical developments presented in this part hold the potential for extension to various other research areas in the future
Robust Matrix Factorization using the Density Power Divergence and its Applications
In the modern era of big data, high-dimensional datasets are becoming increasingly common across a range of disciplines, including machine learning, natural language processing, finance, and genomics. Extracting meaningful information from these datasets often requires uncovering low-dimensional structures hidden within the data. Singular Value Decomposition (SVD) and Principal Component Analysis (PCA) are widely used matrix factorization techniques for this purpose. However, the traditional methods to compute these are extremely sensitive to outliers, with even a single aberrant observation potentially leading to highly imprecise results. This issue is exacerbated in high-dimensional datasets, where outliers are difficult to detect. Classical robust inference techniques, such as M-estimators, struggle due to their diminishing breakdown points as the data dimension becomes extremely large. This thesis addresses these challenges by proposing a novel class of robust matrix factorization techniques based on the minimum density power divergence estimator (MD- PDE). The MDPDE, a member of the broader class of minimum divergence estimators, is well-known for its robustness and efficiency across diverse applications. Crucially, it offers a dimension-free asymptotic breakdown point, making it particularly well-suited for high- dimensional settings. In this work, we leverage this estimator to develop robust versions of SVD and PCA, referred to as rSVDdpd and rPCAdpd, respectively. The thesis is structured as follows: In Chapter 1, we provide the necessary background on classical matrix factorization techniques, introduce key concepts related to minimum divergence estimators, particularly the MDPDE, and the notations to be used throughout the thesis. Chapter 2 presents the novel rSVDdpd algorithm, detailing its theoretical properties, including different equivariance properties, algorithmic convergence and consistency. Through simulation studies, we demonstrate the algorithm’s superior robustness compared to existing methods, particularly in high-dimensional settings. We also apply the rSVDdpd algorithm to the problem of video surveillance background modelling,showcasing its real-world applicability. Chapter 3 extends this methodology to robust PCA, resulting in the rPCAdpd al- gorithm. We establish its theoretical properties such as orthogonal equivariance, con- sistency and asymptotic normality. We also demonstrate that its influence function re- mains bounded, ensuring its robustness to outliers. Comparative studies with benchmark datasets reveal that rPCAdpd outperforms existing robust PCA algorithms, particularly in scenarios with high-dimensional data with a low signal-to-noise ratio. The robust SVD and the PCA algorithms introduced in Chapters 2 and 3 require a robust estimate of the rank of the low-dimensional component of the data matrix. To this end, we propose a new penalized criterion, DICMR, in Chapter 4. Theoretical results on selection consistency and B-robustness are established, and extensive simulation studies show that DICMR is the best-performing among penalized methods, and also provides competitive performance relative to cross-validation methods while being computationally efficient. A key contribution of this thesis, explored in Chapter 5, is the demonstration that the MDPDE has a dimension-free lower bound to its asymptotic breakdown point. This property makes it uniquely robust in high-dimensional settings, a significant improve- ment over classical M-estimators. We further generalize this result in Chapter 6, showing that the dimension-free breakdown point holds for a broader class of estimators known as minimum generalized Alpha-Beta divergence estimators. We derive the necessary and sufficient conditions under which the corresponding divergence measures are well-defined and nonnegative, contributing to the theoretical understanding of generating novel statistical divergence measures that may lead to robust estimation in high-dimensional data. Chapter 7 concludes the thesis, summarizing the key findings and outlining directions for future research. This includes potential extensions of the proposed algorithms to other matrix factorization problems and the exploration of more practical applications beyond those demonstrated in the thesis. Overall, this thesis aims to contribute to the field of robust statistics by developing scalable, robust matrix factorization techniques with strong theoretical guarantees and practical relevance in high-dimensional data analysis
Making Cloud Storages Secure and Efficient
Over the years, searchable symmetric encryption (SSE) schemes have emerged as a promising tool for enabling efficient query processing over encrypted data stored in untrusted cloud servers. This thesis mainly focuses on efficiency and security enhancements of dynamic searchable symmetric encryption (DSSE) schemes, which support various query types and are secure against several adversarial conditions. For any SSE scheme, its query processing, storage, and communication costs are directly related to the size of the encrypted index stored on the server. A reduction of the index size naturally leads to enhanced search efficiency and reduced storage and communication costs. We are unaware of any previous attempts to reduce the index size of SSE schemes. We introduce a novel technique to directly reduce the index size of any SSE. Our proposed method generically transforms any secure single keyword SSE into an equivalently functional and secure version with reduced storage requirements, resulting in faster search and reduced communication overhead. Our technique involves arranging the set of document identifiers db(w) related to a keyword w in the leaf nodes of a complete binary tree, eventually obtaining a succinct representation of the set db(w). This compact representation leads to smaller index sizes. We conduct extensive theoretical analysis to prove the correctness of our scheme. Additionally, our experiments on real and synthetic data validate the effectiveness of our approach and demonstrate its practical applicability. Among the few SSE schemes available in the literature which support complex query types like conjunctive queries, the oblivious cross tag (OXT) scheme from Crypto’13 is the most efficient one. OXT has the limitation that it only works for static databases. In NDSS’20, an extension of OXT called the oblivious dynamic cross tag (ODXT) was proposed. ODTX supports conjunctive queries with dynamic updates. However, ODXT is not forward private. We propose a generic framework for designing conjunctive dynamic SSE (CDSSE) schemes, supporting conjunctive queries that allow dynamic updates while being both forward and backward private simultaneously. To the best of our knowledge such a scheme does not exist till date. Our scheme assumes a restricted update model where a document with its associated keywords can be dynamically added to or deleted from the database as a whole, but the set of keywords for a document is not modified once uploaded. We define forward and backward privacy for this new setting of updates and extend the OXT scheme to make it dynamic in the new setting. We prove the security of our construction against adaptive adversaries and analyse the precise leakages to the adversarial server. Experiments show that our schemes are very efficient. Another less studied aspect of SSE schemes is verifiability. In an SSE scheme, the server may be dishonest and may not respond to a client’s queries following the prescribed protocol. A verifiable SSE can detect such anomalous behaviour of a server. To defend against such malicious adversaries, previous approaches employ authenticated encryption (AE) to furnish a “proof” for each update. We propose a new construction where we convert any forward and backward private adaptively secure SSE scheme into a verifiable SSE. Our construction uses a new class of message authentication codes (MAC), which we call updatable message authentication codes (UdMAC). A UdMAC allows the verification tag for a message to be updated with each modification to the message without recomputing the entire MAC, ensuring efficiency. We establish security requirements for such a MAC and introduce two constructions, ConCatU and XoRU, which work with two different types of message updates, namely, concatenation and exclusive-or (XOR), respectively. Furthermore, we present the first generic construction for a forward and backward private faulttolerant verifiable DSSE using a UdMAC construction and prove its security. Our construction converts any generic forward and backward secure SSE secure in an honest-but-curious adversarial model into an equivalently secure DSSE secure in a malicious adversarial model with faulty updates
Contribution to the Linear Complementarity Problem and Completely Mixed Games
This dissertation focuses on the linear complementarity problem (LCP ), two-person zero-sum matrix games, and Q-tensors. A matrix game is considered completely mixed if all the optimal pairs of strategies in the game are completely mixed. In this thesis, we provide new characterizations of Kaplansky’s results (1945 and 1995) on completely mixed games. Pang proved that within the class of semimonotone matrices, R0-matrices are Q- matrices and conjectured that the converse is also true. Gowda proved that the conjecture is true for symmetric matrices. We prove that semimonotone Q-matrices are R0-matrices up to order 3 and provide a counterexample to show that this statement does not hold for matrices of order 4 and higher. We also provide an application of this result using completely mixed games. Stone proposed that fully semimonotone Q0-matrices are P0-matrices. In this thesis, we establish that this conjecture holds true for matrices with certain sign patterns. Since fully semimonotone matrices are semimonotone and Z-matrices are Q0, we demonstrate that semimonotone Z-matrices are P0. Gowda proved that a Z-matrix with value zero is completely mixed if and only if it is irreducible. We provide new equivalent conditions for this statement. Additionally, we present results on completely mixed games, exploring their connection to various classes of matrices. We also extend some results of Q-matrices to Q-tensors
Domain Obedient Deep Learning
Deep learning, a family of data-driven artificial intelligence techniques, has shown immense promise in a plethora of applications, and it has even outpaced experts in several domains. However, unlike symbolic approaches to learning, these methods fall short when it comes to abiding by and learning from pre-existing established principles. This is a significant deficit for deployment in critical applications such as robotics, medicine, industrial automation, etc. For a decision system to be considered for adoption in such fields, it must demonstrate the ability to adhere to specified constraints, an ability missing in deep learning-based approaches. Exploring this problem serves as the core tenet of this dissertation. This dissertation starts with an exploration of the abilities of conventional deep learning-based systems vis-à-vis domain coherence. A large-scale rule-annotated dataset is introduced to mitigate the pronounced lack of suitable constraint adherence evaluation benchmarks, and with its aid, the rule adherence abilities of vision systems are analyzed. Additionally, this study probes language models to elicit their performance characteristics with regard to domain consistency. Examination of these language models with interventions illustrates their ineptitude at obeying domain principles, and a mitigation strategy is proposed. This is followed by an exploration of techniques for imbuing deep learning systems with domain constraint information. Also, a comprehensive study of standard evaluation metrics and their blind spots pertaining to domain-aware performance estimation is undertaken. Finally, a novel technique to enforce constraint compliance in models without training is introduced, which pairs a search strategy with large language models to achieve cutting-edge performance. A key highlight of this dissertation is the emphasis on addressing pertinent real-world problems with scalable and practicable solutions. We hope the results presented here pave the way for wider adoption of deep learning-based systems in pivotal situations with enhanced confidence in their trustworthiness
Some Studies On Information Set Decoding Algorithms and Universal Hash Functions
This thesis presents some studies on Information Set Decoding algorithms and Universal Hash Functions. In the context of Information Set Decoding (ISD) the thesis studies time/memory trade-off of ISD algorithms and in the context of universal hash functions, the thesis studies design and efficient implementations of polynomial hash functions defined over prime order fields. A cornerstone of ISD algorithms is the algorithm proposed by Stern and it introduced the meet-in-the-middle collision search approach to ISD algorithms. Though this algorithm is more efficient in terms of asymptotic time complex- ity than the preceding algorithms proposed by Prange, Lee and Brickell and Leon, the minimum time complexity of Stern’s algorithm is achieved by using a large amount of memory. The algorithms which have better time complexities than Stern’s algorithm like MMT and BJMM have higher space complexities than that of Stern’s algorithm. The memory complexities of Stern’s algorithm, MMT and BJMM are of form O(2^{cn}) where n is the code length and the value of the constant c depends on the corresponding algorithms. Hence study- ing time/memory trade-off of Stern’s algorithm is relevant. Hence studying time/memory trade-off of Stern’s algorithm is relevant. Our first contribution is a generalisation of Stern’s information set decoding (ISD) algorithm. Stern’s algorithm, a variant of Stern’s algorithm due to Dumer, as well as a recent generalisation of Stern’s algorithm due to Bernstein and Chou are obtained as special cases of our generalisation. Our second contribution is to introduce the notion of a set of effective time/memory tradeoff (TMTO) points for any ISD algorithm for given ranges of values of parameters of the algorithm. Such a set succinctly and uniquely captures the entire landscape of TMTO points with only a minor loss in precision. We further describe a method to compute a set of effective TMTO points. As an application, we compute sets of effective TMTO points for the five variants of the Classic McEliece cryptosystem corre- sponding to the new algorithm as well as for Stern’s, Dumer’s and Bernstein and Chou’s algorithms. The results show that while Dumer’s and Bernstein and Chou’s algorithms do not provide any interesting TMTO points beyond what is achieved by Stern’s algorithm, the new generalisation that we propose provide about twice the number of effective TMTO points that is obtained from Stern’s algorithm. We have also discussed the consequences of the obtained TMTO points to the classification of the variants of Classic McEliece in appropriate NIST categories. Polynomial hashing technique constructs a univariate polynomial using the input message such that the coefficients of the polynomial are elements of a finite field and evaluates the polynomial at a secret point, called the key, of the same finite field to obtain the digest. The security requirement on such hash functions is that all differential probabilities are provably small. Hash families with low differential probabilities are called Almost XOR Universal and are a generalisa- tion of universal hash functions. To construct the polynomial the input message is divided into ℓ number of blocks where each block is of equal size (the last block can be smaller than the rest of the blocks) and each of the blocks is an element of the finite field. These ℓ blocks form coefficients of the polynomial. Another approach of construction and evaluation has been proposed by Bernstein based on a previous work by Rabin and Winograd and such polynomials are called BRW polynomials. BRW polynomials require half the number of field multipli- cations than Horner based evaluations for the same input message. Polynomial hash functions have important applications in Cryptography including the con- struction of authentication schemes and methods of authenticated encryption schemes. Poly1305 is a polynomial hash function which has been widely used in various real-life scenarios. The AEAD scheme ChaCha20-Poly1305 is used in TLS, SSH etc. We provide an improved SIMD implementation of Poly1305. We propose a simple algorithmic technique (which we call the Balancing Technique) which leads to noticeable and significant improvements in terms of cycles/byte for various ranges of lengths of message. The message lengths considered by us lie in the range of 49 bytes to 4KB. Our next contribution is comprehensive study of usual polynomial based hashing and hashing based on BRW polynomi- als, and the various ways to combine them. Several hash functions are proposed and upper bounds on their differential probabilities are derived. Concrete in- stantiations are provided for the primes 2^{130}-5 and 2^{127}-1. We have done an extensive 64-bit implementation of all the proposed hash functions in assembly targeted at modern Intel processors. The timing results suggest that using the prime 2^{127}-1 is significantly faster than using the prime 2^{130}-5. Further, a judicious mix of the usual polynomial based hashing and BRW-polynomial based hashing can provide a significantly faster alterna- tive to only usual polynomial based hashing. In particular, the timing results of our implementations show that our final hash function proposal for the prime is much faster than the well known Poly1305 hash function defined over the prime, achieving speed improvements of up to 40%
Mazur’s Intersection Property and its Variants
We discuss various differentiability notions in connection with ball separation prop- erties. We characterise the uniform Mazur’s intersection property (UMIP) in terms of w*-semidenting points in attempt to resolve a long standing open question: “Does UMIP imply uniformly smooth renorming?” Further, we discuss a stronger version of UMIP called the hyperplane uniform Mazur intersection property (HUMIP) which is shown to characterise uniform smoothness. Similar ball separation char- acterisations are obtained for Fr´echet smoothness and asymptotic uniform smoothness (AUS). These ball separation properties are then shown to be residual properties. Thus, we obtain that norms which have UMIP or norms which are (asymototically) uniformly smooth are residual in the set of all equivalent norms. This also helps taking the open question forward which asks for residuality of Fr´echet smooth norms. Also, in attempt to understand UMIP better, we discuss UMIP from some quantitative aspects. We obtain conditions for the stability of UMIP under ℓp-sum and use an example by Borwein and Fabian to answer the following question in negative: “Does hereditary MIP imply Fr´echet smooth- ness?” Some interesting problems and possible approaches are discussed at the end