    Towards Robust Design and Training of Deep Neural Networks

    Neural networks currently run as software, which typically requires expensive GPU resources. As deep learning is adopted for a more diverse range of applications, direct hardware-implemented neural networks (HNNs) will provide deep learning solutions with far lower hardware requirements. However, Gaussian noise along hardware connections degrades model accuracy, an issue this research seeks to resolve using a novel analog error-correcting code (ECC). To aid in developing noise-tolerant deep neural networks (DNNs), this research also investigates the impact of loss functions on training. This involves alternating among multiple loss functions throughout training, with the aim of avoiding local optima. The effects on training time and final accuracy are then analyzed. Together, the analog ECC and loss function variation are intended to enable future noise-tolerant HNNs. ECC results demonstrate three to five decibel improvements to model accuracy when correcting Gaussian noise. Loss variation results demonstrate a correlation between loss function similarity and training performance. Other correlations are also presented and addressed.

    FACIAL EXPRESSION RECOGNITION: FROM NAMED EXPRESSIONS TO UNNAMED EXPRESSIONS

    Facial expressions play a very important role in interpersonal relations because they convey nonverbal cues. Automatic recognition of facial expressions is a crucial component of human-machine interfaces. The main motivation for choosing this problem is that there are many facial expressions to recognize, and categorizing them is very difficult because they are subtle. The same expression can have different meanings for different people in different contexts. Facial Expression Recognition (FER) has applications across many domains, such as business, education, and health care. Existing work mainly focuses on recognizing the seven basic expressions: happy, sad, disgust, angry, surprise, neutral, and fear. In this research, we explore a new direction in which we try to recognize many more expressions beyond the basic seven. These expressions are hard to name but exist in real life. The approach taken is one-shot learning. Every time we observe a new expression, we use a one-shot learning technique to recall previous cases where the same expression was seen. By doing this, people can understand in which contexts the same expression appears, which leads to an understanding of each expression. In the present work, we train the neural network on the basic seven expressions. We then extract the output of the penultimate fully connected (FC) layer as a feature representation of the input image. These features are used in further processing and as a basis for one-shot learning. While the current research involves 2D static images, we further extend our research from 2D expressions to 3D video clips. The main reason for doing this is that an expression is not a static image of the face at a given time; considering how the expression changes over a short period of time is more meaningful for recognition. This aspect has been less explored before. The results obtained for 2D static images show that one-shot learning does a very good job of recognizing new expressions with just one training example.
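
The recall step described in this abstract can be pictured as a nearest-neighbour lookup over stored feature vectors. The following is a minimal sketch under stated assumptions, not the authors' implementation: the 4-dimensional vectors stand in for penultimate FC-layer features, the labels `expression_A` and `expression_B` are hypothetical, and cosine similarity is an assumed choice of metric.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def one_shot_match(query, support):
    """Return the label whose single stored example is most similar to the query.

    `support` maps each label to the one feature vector observed for it.
    """
    return max(support, key=lambda label: cosine_similarity(query, support[label]))


# Hypothetical features; a real system would take them from the trained network.
support = {
    "expression_A": [0.9, 0.1, 0.0, 0.2],
    "expression_B": [0.1, 0.8, 0.3, 0.0],
}
query = [0.8, 0.2, 0.1, 0.1]
print(one_shot_match(query, support))
```

Because each label needs only one stored vector, a newly observed expression can be added to `support` without retraining the feature extractor.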

    On The Generalization of Error-Correcting WOM Codes

    WOM (Write Once Memory) codes are codes for efficiently storing and updating data in a memory whose state transitions are irreversible. Storage media that can be classified as WOM include flash memories, optical disks, and punch cards. Error-correcting WOM codes can correct errors in addition to providing the regular data-updating capability. They are increasingly important for electronic memories using MLCs (multi-level cells), where the stored data are prone to errors. In this paper, we study error-correcting WOM codes that generalize the classic models. In particular, we study codes for jointly storing and updating multiple variables, instead of one variable, in WOMs with multi-level cells. The error-correcting codes we study here are also a natural extension of the recently proposed floating codes [7]. We analyze the performance of the generalized error-correcting WOM codes and present several bounds. The number of valid states for a code is an important measure of its complexity. We present three optimal codes for storing two binary variables in n q-ary cells, where n = 1, 2, 3, respectively. We prove that among all the codes with the minimum number of valid states, the three codes maximize the total number of times the variables can be updated.
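
For readers unfamiliar with the classic model that this paper generalizes, the sketch below shows the well-known Rivest-Shamir WOM code, which stores a 2-bit value twice in three binary write-once cells. It is offered only as background: it has no error correction, uses binary rather than q-ary cells, and is not one of the codes presented in the paper. The function names are illustrative.

```python
# First-generation codewords have weight <= 1; second-generation codewords
# are their bitwise complements, so a second write only ever sets cells 0 -> 1.
FIRST = {0b00: (0, 0, 0), 0b01: (1, 0, 0), 0b10: (0, 1, 0), 0b11: (0, 0, 1)}
SECOND = {value: tuple(1 - c for c in cw) for value, cw in FIRST.items()}
DECODE = {cw: value for table in (FIRST, SECOND) for value, cw in table.items()}


def decode(cells):
    """Read the 2-bit value stored in the three cells."""
    return DECODE[tuple(cells)]


def write(cells, value):
    """Return the new cell state storing `value`; cells may only change 0 -> 1."""
    cells = tuple(cells)
    if decode(cells) == value:
        return cells  # nothing to change
    target = FIRST[value] if cells == (0, 0, 0) else SECOND[value]
    if any(old > new for old, new in zip(cells, target)):
        raise ValueError("write would require resetting a cell")
    return target


state = write((0, 0, 0), 0b10)   # first write
state = write(state, 0b11)       # second write, still only 0 -> 1 transitions
print(state, decode(state))
```

Two writes of 2 bits each use 3 cells, whereas storing them without coding would need 4 write-once cells.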

    Network Coding for Joint Storage and Transmission with Minimum Cost

    Network coding provides elegant solutions to many data transmission problems. The use of coding for distributed data storage has also been explored. In this work, we study a joint storage and transmission problem, where a source transmits a file to storage nodes whenever the file is updated, and clients read the file by retrieving data from the storage nodes. The cost includes the transmission cost for file updates and file reads, as well as the storage cost. We show that such a problem can be transformed into a pure flow problem and is solvable in polynomial time using linear programming. Coding is often necessary for obtaining the optimal solution with the minimum cost. However, we prove that for networks of generalized tree structures, where adjacent nodes can have asymmetric links between them, file splitting, instead of coding, is sufficient for achieving optimality. In particular, if there is no constraint on the number of bits that can be stored in storage nodes, there exists an optimal solution that always transmits and stores the file as a whole. The proof is accompanied by an algorithm that optimally assigns file segments to storage nodes.

    Functional Error Correction for Robust Neural Networks

    When neural networks (NeuralNets) are implemented in hardware, their weights need to be stored in memory devices. As noise accumulates in the stored weights, the NeuralNet's performance will degrade. This paper studies how to use error correcting codes (ECCs) to protect the weights. Different from classic error correction in data storage, the optimization objective is to optimize the NeuralNet's performance after error correction, instead of minimizing the Uncorrectable Bit Error Rate in the protected bits. That is, by seeing the NeuralNet as a function of its input, the error correction scheme is function-oriented. A main challenge is that a deep NeuralNet often has millions to hundreds of millions of weights, causing a large redundancy overhead for ECCs, and the relationship between the weights and the NeuralNet's performance can be highly complex. To address the challenge, we propose a Selective Protection (SP) scheme, which chooses only a subset of important bits for ECC protection. To find such bits and achieve an optimized tradeoff between the ECC's redundancy and the NeuralNet's performance, we present an algorithm based on deep reinforcement learning. Experimental results verify that, compared to the natural baseline scheme, the proposed algorithm achieves substantially better performance for the functional error correction task.
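
The intuition behind protecting only a subset of bits can be illustrated with a toy simulation. This is a rough sketch under strong assumptions, not the paper's method: weights are hypothetical 8-bit integers, protected bits are treated as perfectly corrected, the protected set is fixed by hand to the two most significant bits (the paper selects bits with deep reinforcement learning), and weight distortion is used as a stand-in for NeuralNet performance.

```python
import random


def flip_bits(weight, protected_mask, p, rng):
    """Flip each unprotected bit of an 8-bit weight with probability p."""
    for bit in range(8):
        is_protected = (protected_mask >> bit) & 1
        if not is_protected and rng.random() < p:
            weight ^= 1 << bit
    return weight


def mean_abs_error(weights, protected_mask, p=0.05, seed=0):
    """Average absolute weight distortion after noise, given a protection mask."""
    rng = random.Random(seed)
    noisy = [flip_bits(w, protected_mask, p, rng) for w in weights]
    return sum(abs(a - b) for a, b in zip(weights, noisy)) / len(weights)


weight_rng = random.Random(1)
weights = [weight_rng.randrange(256) for _ in range(1000)]

unprotected = mean_abs_error(weights, 0b00000000)
protected = mean_abs_error(weights, 0b11000000)  # protect the 2 most significant bits
print(unprotected, protected)
```

Protecting a quarter of the bits removes the largest possible distortions, which is the kind of redundancy-versus-performance tradeoff the abstract describes.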

    Learning to Understand New Facial Expressions

    Facial expression recognition is becoming popular in the research community because of its extensive use in understanding human sentiments. Among the various media of human interaction used in daily life, facial expression is the most direct form of communication and reveals a lot about human emotions. For this reason, researchers are actively exploring this field of human-computer interaction. This research aims to develop automatic facial expression annotation for context-based database generation. We point out the limitations of an existing facial expression detection system for real-world applications and study new ways to bridge current research and user applications. We propose a one-shot learning-based automatic facial expression labeling technique, which requires very few manual labels to understand the context of sentiment in an expression and uses them to train a facial expression system for a specific use case. The proposed model is evaluated with two methods: (i) we manually labeled a few more examples and tested the model against them, and (ii) from the seven basic facial expressions, we held out one facial expression and used those examples to test the efficiency of the model.

    Ring-Based Resonant Standing Wave Oscillators for 3D Clocking Applications

    Ring-based resonant standing wave oscillators have been shown to be a useful clocking technique that can generate and distribute a high-frequency, low-skew, low-power, and stable clock signal. By using through-silicon vias (TSVs), this type of standing wave oscillator can be used to generate the clocking scheme for 3D integrated circuits. In this thesis, we propose the use of such 3D standing wave oscillators and show how independent 3D oscillators in different stacks can synchronize through the use of a redistribution layer stub. Inter-chip clock synchronization is then accomplished without the need for a PLL. In addition, we propose the first 3D ring-based resonant standing wave oscillator bootstrap and reset circuit to initialize and stop oscillation. Using a 3D ring-based resonant standing wave oscillator, we propose a ring-based data fabric for 3D stacked DRAM and compare the results with existing approaches such as High Bandwidth Memory (HBM) or Wide I/O memory. We show that our Memory Architecture using a Ring-based Scheme (MARS) can provide the increases in speed necessary to overcome current memory bottlenecks, and can scale effectively as future 3D stacks become larger. Our MARS can trade off power, throughput, and latency to match different application requirements. By using a narrow bus and connecting it to all channels, the MARS8 can provide an alternative memory configuration with approximately 6.9× lower power consumption than HBM and approximately 2.7× faster speeds than Wide I/O. Using multiple ring topologies in the same stack, the channel count can double from 8 to 16, and then to 32. This is possible since MARS uses about 4× fewer TSVs per channel than HBM or Wide I/O. This provides speeds up to approximately 4.2× faster than traditional HBM. This scalable architecture allows higher throughput and faster system performance for next-generation DRAM. The MARS topology proposed in this thesis can be used in a variety of computing systems, from lightweight IoT to large-scale data centers.

    Dynamic Feature Selection via Reinforcement Learning

    Monte Carlo REINFORCE is used to design an algorithm that finds not only the optimal deep learning architecture but also the optimal set of features that maximizes the performance of that model. The algorithm is applied to the problem of predicting the onset of severe sepsis (4 hours before onset), and the results are compared with the existing severe sepsis literature. Sepsis is a life-threatening condition caused by the patient's body's extreme response to an infection, causing tissue damage and multiple organ failures. The MIMIC-III dataset, a publicly available medical dataset, is used for all the experiments. Apart from the 6 common vital sign measurements, the dataset also contains 127 physiological and laboratory features for predicting the onset of severe sepsis, which is mostly observed in intensive care units (ICUs). Reinforcement learning is used to reduce the number of features (from 133) without sacrificing the peak performance of the model that uses all 133 features. Among the discovered deep learning models, the CNN-LSTM model using 110 features achieves the best performance: an AUC of 0.933 in predicting the onset of severe sepsis.
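
The feature-selection half of this idea can be sketched with a small REINFORCE loop over binary feature masks. This is a toy illustration under stated assumptions, not the authors' algorithm: the policy is one independent Bernoulli logit per feature, the `utility` values and the additive `reward` are hypothetical stand-ins for the validation performance of a trained model, and the architecture search is omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def reward(mask, utility):
    """Toy reward: sum of per-feature utilities over the selected features."""
    return sum(u for m, u in zip(mask, utility) if m)


def reinforce_feature_selection(utility, updates=500, batch=32, lr=0.1, seed=0):
    """Learn per-feature selection probabilities with the REINFORCE estimator."""
    rng = random.Random(seed)
    logits = [0.0] * len(utility)  # one Bernoulli logit per feature
    for _ in range(updates):
        probs = [sigmoid(t) for t in logits]
        # Sample a batch of feature masks from the current policy.
        masks = [[1 if rng.random() < p else 0 for p in probs] for _ in range(batch)]
        rewards = [reward(m, utility) for m in masks]
        baseline = sum(rewards) / batch  # batch-mean baseline to reduce variance
        for i, p in enumerate(probs):
            # For a Bernoulli logit, d/dlogit log pi(mask) = m_i - p_i.
            grad = sum((r - baseline) * (m[i] - p) for m, r in zip(masks, rewards)) / batch
            logits[i] += lr * grad
    return [sigmoid(t) for t in logits]


# Hypothetical utilities: three helpful features, three harmful ones.
utility = [1.0, 1.0, 1.0, -0.5, -0.5, -0.5]
probs = reinforce_feature_selection(utility)
print([round(p, 2) for p in probs])
```

In a realistic setting each reward evaluation would require training or evaluating a model on the masked features, so the number of sampled masks would be the dominant cost.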

    Robust Word Predictions

    Natural Language Processing (NLP) is a sub-field of Artificial Intelligence (AI) that allows machines to process and comprehend human languages, bringing machines nearer to language understanding. In earlier days, statistical methods were predominant, and rules were written or calculated manually. Recent advances in machine learning and deep learning have led to many breakthroughs in various sub-fields of NLP, including language modeling, machine translation, and speech recognition. A language model (LM), whose task is to predict the next word given the previous words, forms a building block of all NLP applications. The main drawbacks of the language model are that it is limited to predicting the next word and that training requires a significant amount of vocabulary. This research aims to develop a generalized language model in which prediction is not limited to the next word but extends to any word in the future text. It seeks to expand the horizons of the language modeling problem and make it more general in terms of both understanding context and making predictions. This work proposes a neural generalized language modeling technique and tests it on two kinds of data, BBC News articles and Wikipedia, exploring the use of word embeddings and an attention mechanism to understand the context and make context-based word predictions. The main focus lies on predicting non-stop words and meaningful words. This work also explores the memory capabilities and noise tolerance of the developed model.

    A Pipeline of Energy Efficient Action Detection

    Action detection has been an essential topic in computer vision for the last decade. Much research has aimed at high accuracy in action detection based on image features. However, image features are expensive in computation and energy, which makes them poorly suited to models deployed on edge devices. A less computationally expensive method based on the skeleton is proposed by [1]. Compared to image features, skeleton features require significantly fewer FLOPs. Nonetheless, most skeleton-based action detection research focuses on existing datasets, and little research has been put into running action detection on mobile devices. In this thesis, a pipeline for energy-efficient action detection is proposed to achieve action detection on unmanned aerial vehicles (UAVs). The hardware platform is a Raspberry Pi. The pipeline can detect persons in real time on the Raspberry Pi and classify actions through a skeleton-based action classification network. There is much research on object detection in this area, including deploying models on edge devices. However, most of it considers multiple classes and general-purpose detection, and the resulting network structures are not light enough for this situation. This thesis proposes a specialized detection network with higher FPS but fewer FLOPs. In addition, a skeleton-based spatial-temporal transformer network is proposed. The action classification network consists of a graph convolution network and a multi-head transformer module. A Gaussian distribution is introduced as weak supervision in action classification. The pipeline consists of three parts: the proposed object detection network, an existing lightweight pose estimation network, and the skeleton-based action classification network. This pipeline achieves an end-to-end structure on mobile devices. To speed up inference, each model is also pruned and quantized. The pipeline has been tested by deployment on a Raspberry Pi with videos recorded by a UAV in a public environment. An energy-measuring system is established to measure energy consumption.