
    Multi-Bit Flipping Decoding of LDPC Codes for NAND Storage Systems

    This letter presents a new multi-bit flipping decoding algorithm for low-density parity-check codes that enhances hard-information-based decoding performance for NAND storage systems. Conventional enhancement techniques developed for bit-flipping decoding require soft information, and the long latency needed to generate that soft information makes them hard to apply to practical NAND storage systems. The proposed algorithm requires only hard information and achieves better performance than previous hard-information-based algorithms. The proposed method flips multiple bits in each iteration, but the maximum number of bits flipped in an iteration is restricted to prevent overcorrection. In addition, to relax the hardware complexity of sorting, an efficient approximation method is proposed, reducing the hardware complexity of a 512-input sorter by 48.3% without noticeably degrading performance.
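
    The capped multi-bit flipping idea can be illustrated with a minimal sketch. It is not the letter's algorithm: the flip rule used here (flip the bits tied for the highest number of unsatisfied checks, at most `max_flips` per iteration) and the names `multi_bit_flip_decode`, `max_iters`, and `max_flips` are assumptions made for illustration only.

```python
def syndrome_of(H, x):
    """Return the parity-check syndrome H * x (mod 2) as a list of 0/1."""
    return [sum(h * b for h, b in zip(row, x)) % 2 for row in H]


def multi_bit_flip_decode(H, y, max_iters=20, max_flips=2):
    """Hard-decision bit flipping with a cap on flips per iteration.

    H is a parity-check matrix (list of 0/1 rows) and y the received hard bits.
    Assumed flip rule: flip the bits tied for the largest number of
    unsatisfied checks, but never more than max_flips bits at once.
    """
    x = list(y)
    n = len(x)
    for _ in range(max_iters):
        syndrome = syndrome_of(H, x)
        if not any(syndrome):
            return x, True  # every parity check is satisfied
        # Count the unsatisfied checks that each bit participates in.
        counts = [sum(s * row[j] for s, row in zip(syndrome, H)) for j in range(n)]
        worst = max(counts)
        candidates = [j for j in range(n) if counts[j] == worst]
        # The cap limits overcorrection when many bits tie.
        for j in candidates[:max_flips]:
            x[j] ^= 1
    return x, not any(syndrome_of(H, x))
```

    With a (7,4) Hamming parity-check matrix and a single error in the last bit, only that bit reaches the maximum count, so one iteration restores the all-zero codeword.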

    FIR filter synthesis algorithms for minimizing the delay and the number of adders

    As the complexity of digital filters is dominated by the number of multiplications, many works have focused on minimizing the complexity of the multiplier blocks that compute the constant coefficient multiplications required in filters. Although the complexity of multiplier blocks is significantly reduced by efficient techniques such as decomposing multiplications into simple operations and sharing common subexpressions, previous works have not considered the delay of multiplier blocks, which is a critical factor in the design of complex filters. In this paper, we present new algorithms to minimize the complexity of multiplier blocks under given delay constraints. By analyzing multiplier blocks in terms of delay, three delay reduction methods are proposed and combined with previous algorithms. Since the proposed algorithms can generate multiplier blocks that meet the specified delay, a trade-off between delay and hardware complexity is enabled by changing the delay constraints. Experimental results show that the proposed algorithms can reduce the delay of multiplier blocks at the cost of a slight increase in complexity.

    SIMD processor-based turbo decoder supporting multiple third-generation wireless standards

    A programmable turbo decoder is designed to support multiple third-generation wireless communication standards. We propose a hybrid architecture of hardware and software, which has small size, low power, and high performance like hardware implementations, as well as the flexibility and programmability of software. It mainly consists of a configurable hardware soft-input-soft-output (SISO) decoder and a 16-b single-instruction multiple-data processor, which is equipped with five processing elements and special instructions customized for interleaving in order to provide interleaved data at the speed of the hardware SISO. A fast and flexible software implementation of the block interleaving algorithm is also proposed. The interleaver generation is split into two parts, preprocessing and on-the-fly generation, to reduce the timing overhead of changing the interleaver structure. We present detailed descriptions of the interleaving implementation applied to the W-CDMA and cdma2000 standard turbo codes. The decoder occupies 8.90 mm² in a 0.25-µm CMOS process with five metal layers and exhibits a maximum decoding rate of 5.48 Mb/s. This work was supported in part by the Institute of Information Technology Assessment through the ITRC and IC Design Education Center (IDEC).

    Timed compiled-code functional simulation of embedded software for performance analysis of SOC design

    A new timing generation method is proposed for the performance analysis of embedded software. The time stamp generation of input/output (I/O) accesses is crucial to performance estimation and architecture exploration in timed functional simulation, which simulates the whole design at a functional level with timing. A portable compiler is modified to generate time deltas, which are the estimated cycle counts between two adjacent I/O accesses, by counting the cycles of the intermediate representation (IR) operations and using a machine description that contains information on a target processor. Since the proposed method is based on the machine-independent IR of a compiler, it can be applied to various processors by changing the machine description. The experimental results show that the proposed method is effective: the average estimation error is about 2%, and the maximum speed-up over the corresponding instruction-set simulators is about 300 times. The proposed method is also verified in a timed functional simulation environment. This work was supported in part by the Korea Science and Engineering Foundation through the MICROS center and in part by the IC Design Education Center (IDEC). This paper was recommended by Associate Editor R. Camposano.
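
    The notion of a time delta can be made concrete with a small sketch. The operation names, the cycle table standing in for the machine description, and the choice to charge an I/O operation's own cycles to the delta it closes are all assumptions for illustration; the paper's compiler pass is not reproduced here.

```python
def time_deltas(ir_ops, cycle_table, io_ops=("io_read", "io_write")):
    """Estimate cycle counts between adjacent I/O accesses.

    ir_ops      -- sequence of IR operation names in execution order
    cycle_table -- hypothetical machine description: op name -> cycles
    io_ops      -- op names treated as I/O accesses (assumed names)
    """
    deltas = []
    elapsed = 0
    for op in ir_ops:
        elapsed += cycle_table[op]
        if op in io_ops:
            # Close the current interval at this I/O access.
            deltas.append(elapsed)
            elapsed = 0
    return deltas


ops = ["add", "mul", "io_write", "load", "add", "io_read"]
table = {"add": 1, "mul": 3, "load": 2, "io_write": 1, "io_read": 1}
print(time_deltas(ops, table))  # [5, 4]
```

    Retargeting to another processor in this sketch only means supplying a different `cycle_table`, which mirrors the machine-independence argument in the abstract.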

    A fixed-point MPEG audio processor operating at low frequency

    A fixed-point pipelined processor optimized for decoding MPEG-1 audio layer III (MP3) is presented. Various examinations of and experiments on the decoding algorithm are exploited to lower the operating frequency by providing efficient instructions and addressing modes, because low frequency is directly related to low power consumption. In addition, a dynamic scaling method is used to preserve precision while computing in fixed-point arithmetic, and a novel instruction set tuned for MP3 decoding is presented. The resulting cycle count required to decode a frame is so small that the proposed processor can operate at 12.8 MHz while decoding in real time. Korea Science and Engineering Foundation through the MICROS center at KAIST, Kore
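
    One common way to realise dynamic scaling is block-floating-point normalisation, sketched below. This is an assumed interpretation, not the processor's documented method; `dynamic_scale` and `word_bits` are hypothetical names.

```python
def dynamic_scale(block, word_bits=16):
    """Shift a block of integers left to use the available headroom.

    Returns the scaled block and the shared shift amount. The largest
    magnitude stays below 2**(word_bits - 1), so scaled values still fit
    in a signed word of word_bits bits.
    """
    limit = 1 << (word_bits - 1)
    peak = max((abs(v) for v in block), default=0)
    if peak == 0:
        return list(block), 0  # nothing to normalise
    shift = 0
    # Grow the shift while the peak would still fit after one more doubling.
    while (peak << (shift + 1)) < limit:
        shift += 1
    return [v << shift for v in block], shift


scaled, shift = dynamic_scale([3, -5, 1])
print(scaled, shift)  # [12288, -20480, 4096] 12
```

    Later stages would divide by `2**shift` (or track it as a shared exponent), so small sample values keep more significant bits through intermediate fixed-point operations.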

    Digital filter synthesis based on an algorithm to generate all minimal signed digit representations

    In this paper, the authors propose an algorithm to find all the minimal signed digit (MSD) representations of a constant and present an algorithm to synthesize digital filters based on the MSD representation. The hardware complexity of a digital signal processing system depends on the number system used for the implementation. Although the canonical signed digit (CSD) representation is widely employed, as it is unique and guarantees the minimal number of nonzero digits for a constant, the MSD representation provides multiple representations that have the same number of nonzero digits as the CSD representation. The proposed filter synthesis algorithm utilizes this redundancy of the MSD representation to create as many common subexpressions as possible, leading to smaller filters. By applying the proposed algorithm to the hardware synthesis of finite impulse response filters, the authors obtained multiplier blocks that are 7% smaller than those generated from the CSD representation.
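
    The relationship between CSD and MSD can be seen in a short sketch for positive integer constants. The CSD conversion is the standard digit-by-digit recoding; the enumeration is a plain exhaustive search bounded by the CSD weight and width, used here for illustration and not claimed to be the authors' algorithm. The function names are hypothetical.

```python
def csd(n):
    """Canonical signed digit form of a positive integer, LSB first."""
    digits = []
    while n != 0:
        if n % 2 == 0:
            d = 0
        else:
            d = 2 - (n % 4)  # +1 when n % 4 == 1, -1 when n % 4 == 3
        digits.append(d)
        n = (n - d) // 2
    return digits


def _search(n, budget, width):
    """Yield digit lists (LSB first) for n using at most `budget` nonzeros."""
    if n == 0:
        yield []
        return
    if budget == 0 or width == 0:
        return
    if n % 2 == 0:
        for rest in _search(n // 2, budget, width - 1):
            yield [0] + rest
    else:
        for d in (1, -1):
            for rest in _search((n - d) // 2, budget - 1, width - 1):
                yield [d] + rest


def all_msd(n):
    """All minimal signed digit representations of a positive integer."""
    reference = csd(n)
    weight = sum(1 for d in reference if d != 0)  # minimal nonzero count
    return list(_search(n, weight, len(reference)))


print(csd(3))      # [-1, 0, 1], i.e. 4 - 1
print(all_msd(3))  # [[1, 1], [-1, 0, 1]], i.e. 2 + 1 and 4 - 1
```

    For the constant 3, the CSD form is the single representation 4 - 1, while the MSD set also contains 2 + 1; a synthesis tool can pick whichever form shares more subexpressions with other coefficients.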

    Low complexity motion estimation utilising spatial correlation

    An efficient algorithm is proposed to reduce the computational complexity of block matching motion estimation by exploiting spatial correlation. The proposed algorithm skips the motion vector search for interior macroblocks that are surrounded by identical motion vectors. Experimental results show that the proposed algorithm reduces computational complexity by 52.5% compared to conventional motion estimation at the cost of negligible performance degradation.
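
    The skipping idea can be sketched as a two-pass scheme. The checkerboard ordering, the four-neighbour test, and the `search` callback (a stand-in for a full block-matching search) are assumptions made for illustration; the abstract does not specify these details.

```python
def estimate_with_skip(rows, cols, search):
    """Motion estimation that skips interior blocks with uniform neighbours.

    search(r, c) is a hypothetical full block-matching search returning a
    motion vector for the macroblock at row r, column c.
    Returns the motion vector grid and the number of searches performed.
    """
    mv = [[None] * cols for _ in range(rows)]
    searched = 0
    # Pass 1: search the macroblocks on a checkerboard pattern.
    for r in range(rows):
        for c in range(cols):
            if (r + c) % 2 == 0:
                mv[r][c] = search(r, c)
                searched += 1
    # Pass 2: the remaining blocks inherit a vector when all four
    # neighbours exist (interior block) and agree; otherwise search them.
    for r in range(rows):
        for c in range(cols):
            if (r + c) % 2 == 1:
                interior = 0 < r < rows - 1 and 0 < c < cols - 1
                if interior:
                    around = [mv[r - 1][c], mv[r + 1][c], mv[r][c - 1], mv[r][c + 1]]
                    if all(v == around[0] for v in around):
                        mv[r][c] = around[0]
                        continue
                mv[r][c] = search(r, c)
                searched += 1
    return mv, searched
```

    In this sketch a 4x4 frame with uniform motion needs 14 searches instead of 16; larger frames with large uniformly moving regions skip a greater share, since the proportion of interior blocks grows.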