1,721,137 research outputs found

    Analysis and Design of Regular Structures for Robust Dynamic Fault Testability

    No full text
    Recent methods of synthesizing logic that is fully and robustly testable for dynamic faults, namely path delay, transistor stuck-open and gate delay faults, rely almost exclusively on flattening given logic expressions into sum-of-products form, minimizing the cover to obtain a fully dynamic-fault testable two-level representation of the functions, and performing structural transformations to resynthesize the circuit into a multilevel network, while also maintaining full dynamic-fault testability. While this technique will work well for random or control logic, it is not practical for many regular structures.To deal with the synthesis of regular structures for dynamic-fault testability, we present a method that involves the development of a library of cells for these regular structures such that the cells are all fully path-delay-fault, transistor stuck-open fault or gate-delay-fault testable. These cells can then be utilized whenever one of these standard functions is encountered.We analyze various regular structures such as adders, arithmetic logic units, comparators, multipliers, and parity generators to determine if they are testable for dynamic faults, or how they can be modified to be testable for dynamic faults while still maintaining good area and performance characteristics. In addition to minimizing the area and delay, another key consideration is to get designs which can be scaled to an arbitrary number of bits while still maintaining complete testability. In each case, the emphasis is on obtaining circuits which are fully path-delay-fault testable. In the process of design modification to produce fully robustly testable structures, we have derived a number of new composition rules that allow cascading individual modules while maintaining robust testability under dynamic fault models.United States. Defense Advanced Research Projects Agency (Contract N00014-87-K-0825)National Science Foundation (U.S.) (Young Investigator Award

    Secure and Robust Error Correction for Physical Unclonable Functions

    No full text
    Physical unclonable functions (PUFs) offer a promising mechanism that can be used in many security, protection, and digital rights management applications. One key issue is the stability of PUF responses that is often addressed by error correction codes. The authors propose a new syndrome coding scheme that limits the amount of leaked information by the PUF error-correcting codes

    Recombination of Physical Unclonable Functions

    No full text
    A new Physical Unclonable Function (PUF) construction is described, by treating silicon unique features extracted from PUF circuits as “genetic material” unique to each silicon, and recombining this chip-unique material in a way to obtain a combination of advantages not possible with the original PUF circuits, including altering PUF output statistics to better suit PUF-based key generation and authentication

    Reliable and efficient PUF-based key generation using pattern matching

    Full text link
    We describe a novel and efficient method to reliably provision and re-generate a finite and exact sequence of bits, for use with cryptographic applications, e.g., as a key, by employing one or more challengeable Physical Unclonable Function (PUF) circuit elements. Our method reverses the conventional paradigm of using public challenges to generate secret PUF responses; it exposes response patterns and keeps secret the particular challenges that generate response patterns. The key is assembled from a series of small (initially chosen or random), secret integers, each being an index into a string of bits produced by the PUF circuit(s); a PUF unique pattern at each respective index is then persistently stored between provisioning and all subsequent key re-generations. To obtain the secret integers again, a newly repeated PUF output string is searched for highest-probability matches with the stored patterns. This means that complex error correction logic such as BCH decoders are not required. The method reveals only relatively short PUF output data in public store, thwarting opportunities for modeling attacks. We provide experimental results using data obtained from PUF ASICs, which show that keys can be efficiently and reliably generated using our scheme under extreme environmental variation

    Thread Migration Prediction for Distributed Shared Caches

    Full text link
    Chip-multiprocessors (CMPs) have become the mainstream parallel architecture in recent years; for scalability reasons, designs with high core counts tend towards tiled CMPs with physically distributed shared caches. This naturally leads to a Non-Uniform Cache Access (NUCA) design, where on-chip access latencies depend on the physical distances between requesting cores and home cores where the data is cached. Improving data locality is thus key to performance, and several studies have addressed this problem using data replication and data migration. In this paper, we consider another mechanism, hardware-level thread migration. This approach, we argue, can better exploit shared data locality for NUCA designs by effectively replacing multiple round-trip remote cache accesses with a smaller number of migrations. High migration costs, however, make it crucial to use thread migrations judiciously; we therefore propose a novel, on-line prediction scheme which decides whether to perform a remote access (as in traditional NUCA designs) or to perform a thread migration at the instruction level. For a set of parallel benchmarks, our thread migration predictor improves the performance by 24% on average over the shared-NUCA design that only uses remote accesses

    Partitioning strategies: Spatiotemporal patterns of program decomposition

    No full text
    Link to the conference: http://www.iasted.org/conferences/pastinfo-668.htmlWe describe four partitioning strategies, or patterns, used to decompose a serial application into multiple concurrently executing parts. These partitioning strategies augment the commonly used task and data parallel patterns by recognizing that applications are spatiotemporal in nature. There fore, data and instruction decomposition are further distinguished by whether the partitioning is done in the spatial or in temporal dimension. Thus, we arrive at four decomposition strategies: spatial data partitioning (SDP), temporal data partitioning (TDP), spatial instruction partitioning (SIP), and temporal instruction partitioning (TIP), and catalog the benefits and drawbacks of each. In addition, the practical use of this work is demonstrated by applying these strategies, and combinations thereof, to implement several different parallelizations of a multicore H.264 encoder for high-definition video

    The Execution Migration Machine: Directoryless Shared-Memory Architecture

    No full text
    For certain applications involving chip multiprocessors with more than 16 cores, a directoryless architecture with fine-grained and partial-context thread migration can outperform directory-based coherence, providing lighter on-chip traffic and reduced verification complexity

    The locality-aware adaptive cache coherence protocol

    Full text link
    Next generation multicore applications will process massive amounts of data with significant sharing. Data movement and management impacts memory access latency and consumes power. Therefore, harnessing data locality is of fundamental importance in future processors. We propose a scalable, efficient shared memory cache coherence protocol that enables seamless adaptation between private and logically shared caching of on-chip data at the fine granularity of cache lines. Our data-centric approach relies on in-hardware yet low-overhead runtime profiling of the locality of each cache line and only allows private caching for data blocks with high spatio-temporal locality. This allows us to better exploit the private caches and enable low-latency, low-energy memory access, while retaining the convenience of shared memory. On a set of parallel benchmarks, our low-overhead locality-aware mechanisms reduce the overall energy by 25% and completion time by 15% in an NoC-based multicore with the Reactive-NUCA on-chip cache organization and the ACKwise limited directory-based coherence protocol.United States. Defense Advanced Research Projects Agency. The Ubiquitous High Performance Computing Progra

    Locality-aware data replication in the Last-Level Cache

    Full text link
    Next generation multicores will process massive data with varying degree of locality. Harnessing on-chip data locality to optimize the utilization of cache and network resources is of fundamental importance. We propose a locality-aware selective data replication protocol for the last-level cache (LLC). Our goal is to lower memory access latency and energy by replicating only high locality cache lines in the LLC slice of the requesting core, while simultaneously keeping the off-chip miss rate low. Our approach relies on low overhead yet highly accurate in-hardware run-time classification of data locality at the cache line granularity, and only allows replication for cache lines with high reuse. Furthermore, our classifier captures the LLC pressure at the existing replica locations and adapts its replication decision accordingly. The locality tracking mechanism is decoupled from the sharer tracking structures that cause scalability concerns in traditional coherence protocols. Moreover, the complexity of our protocol is low since no additional coherence states are created. On a set of parallel benchmarks, our protocol reduces the overall energy by 16%, 14%, 13% and 21% and the completion time by 4%, 9%, 6% and 13% when compared to the previously proposed Victim Replication, Adaptive Selective Replication, Reactive-NUCA and Static-NUCA LLC management schemes
    corecore