Multi-Camera Monitoring of Human Activities at Critical Transportation Infrastructure Sites
The goal of this work is to provide a system which can aid in monitoring crowded urban environments, which often contain tight groups of people. In this report, we consider the problem of counting the number of people in the scene and also tracking them reliably. We propose a novel method for detecting and estimating the count of people in groups, dense or otherwise, as well as tracking them. Using prior knowledge obtained from the scene and accurate camera calibration, the system learns the parameters required for estimation. This information can then be used to estimate the count of people in the scene, in real time. Groups are tracked in the same manner as individuals, using Kalman filtering techniques. Favorable results are shown for groups of various sizes moving in an unconstrained fashion.
Ribnick, Evan; Joshi, Ajay J.; Papanikolopoulos, Nikolaos P. (2008). Multi-Camera Monitoring of Human Activities at Critical Transportation Infrastructure Sites. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/96284
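The abstract states only that groups are tracked with Kalman filtering techniques; it gives no state model. As a minimal sketch, assuming a one-dimensional constant-velocity model and made-up noise variances (`q`, `r`, and the initial covariance are all hypothetical), one predict/update cycle could look like this:

```python
def kalman_cv_step(x, P, z, dt=1.0, q=1e-3, r=0.5):
    """One predict/update cycle of a 1-D constant-velocity Kalman filter.

    x = [position, velocity]; P = 2x2 covariance as nested lists;
    z = measured position. q and r are assumed noise variances.
    """
    p, v = x
    (a, b), (c, d) = P

    # Predict: x' = F x, P' = F P F^T + Q, with F = [[1, dt], [0, 1]], Q = q*I.
    p_pred = p + dt * v
    v_pred = v
    a_p = a + dt * (b + c) + dt * dt * d + q
    b_p = b + dt * d
    c_p = c + dt * d
    d_p = d + q

    # Update with H = [1, 0]: only the position is observed.
    innovation = z - p_pred
    s = a_p + r                      # innovation variance
    k0, k1 = a_p / s, c_p / s        # Kalman gain

    x_new = [p_pred + k0 * innovation, v_pred + k1 * innovation]
    P_new = [
        [(1 - k0) * a_p, (1 - k0) * b_p],
        [c_p - k1 * a_p, d_p - k1 * b_p],
    ]
    return x_new, P_new


def track(measurements, dt=1.0):
    """Run the filter over a sequence of position measurements."""
    x, P = [0.0, 0.0], [[10.0, 0.0], [0.0, 10.0]]  # assumed vague prior
    for z in measurements:
        x, P = kalman_cv_step(x, P, z, dt)
    return x


# Hypothetical target moving at 2 units per frame, observed without noise.
estimate = track([2.0 * t for t in range(1, 21)])
```

In a real tracker the state would presumably be two-dimensional (ground-plane position and velocity of an individual or a group), but the predict/update structure would be the same.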
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first of a cited work's authors, and a non-traditional type, which takes into account the first 5 authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, when the same number of top-ranked authors is selected and analyzed, this picture represents fewer specialties in the research field being studied than the picture produced through traditional first-author co-citation counting. Reasons for these effects are discussed.
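The difference between the two counting schemes can be illustrated with a short sketch. The data layout, the function name `cocitation_counts`, and the toy reference lists are hypothetical. The sketch also treats co-authors of one cited work as co-cited with each other, which is a simplification and may differ from the study's procedure.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations.

    citing_papers: one reference list per citing paper; each reference
        is an ordered list of the cited work's author names.
    max_authors: number of leading authors credited per cited work
        (1 = first-author counting, 5 = first-five-authors counting).
    """
    counts = Counter()
    for references in citing_papers:
        # Authors credited by this citing paper, each at most once.
        cited = set()
        for authors in references:
            cited.update(authors[:max_authors])
        # Every unordered pair of credited authors is co-cited once.
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy data: two citing papers, two references each.
papers = [
    [["White", "Griffith"], ["Small"]],
    [["White", "McCain"], ["Small", "Griffith"]],
]
first_author = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
```

With first-author counting only the pair (Small, White) appears. Crediting up to five authors brings Griffith and McCain into the co-citation matrix as well.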
Variations on the Author
“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourses; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.
Appropriate Similarity Measures for Author Cocitation Analysis
We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.
Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
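The abstract does not spell out its objection to the Pearson correlation. The sketch below illustrates one property that may be relevant: appending entries in which both profiles are zero (authors co-cited with neither) leaves the cosine unchanged but shifts the Pearson correlation. The profiles are hypothetical, and the two unnamed alternative measures are not reproduced here.

```python
import math


def cosine(x, y):
    """Cosine of the angle between two co-citation profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson correlation: the cosine of the mean-centered profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


# Hypothetical co-citation profiles of two authors.
x, y = [4, 2], [2, 4]
# The same profiles after adding two authors co-cited with neither.
x_padded, y_padded = x + [0, 0], y + [0, 0]

cos_before, cos_after = cosine(x, y), cosine(x_padded, y_padded)
r_before, r_after = pearson(x, y), pearson(x_padded, y_padded)
```

Here the cosine stays at 0.8 in both cases, while the Pearson correlation moves from -1 to 7/11 (about 0.64) although no co-citation count between the two authors has changed.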
True shared memory architecture for next-generation multi-GPU systems
Machine learning (ML) is now omnipresent in all spheres of life. The use of deep neural networks (DNNs) for ML has gained popularity over the past few years. This is because DNNs are capable of efficiently solving complex problems such as image processing, object detection, language processing, etc. To train these DNN workloads, graphics processing units (GPUs) have become the most widely used platform. A GPU can support a large number of parallel threads that execute simultaneously to achieve a very high throughput. However, as the sizes of the DNN workloads grow, a single GPU is no longer adequate to provide fast training, and developers resort to using multi-GPU (MGPU) systems that can reduce the training time significantly. Consequently, to keep pace with the growth of DNN applications, GPU vendors are actively developing novel and efficient MGPU systems.
To better understand the challenges associated with designing MGPU systems for DNN workloads, in this thesis, we first present our efforts to understand the behavior of the DNN workloads, in particular, the training of DNN workloads on MGPU systems. Using the DNN workloads as benchmarks, we observe the evolution of MGPU system architecture. Based on our profiling and characterization of DNN workloads on existing high-performance MGPU systems, we identify the computation- and communication-intensiveness of the DNN workloads and the hardware- and software-level inefficiencies present in the existing MGPU systems. We find that three factors pose serious limitations in the execution of ever-scaling DNN workloads on MGPU systems: data movement across multiple GPUs and the high cost of remote data access, which lead to NUMA effects; data duplication and inefficient use of GPU memory, which lead to memory capacity issues; and the complexity of programming MGPUs.
To overcome the limitations of existing MGPU systems, we propose to unify the main memory of GPUs to design an MGPU system with true shared memory (MGPU-TSM). Our proposed MGPU-TSM system demonstrates a significant performance boost (3.8× for a 4-GPU system) over the best-performing existing MGPU system. This is because the MGPU-TSM system eliminates the NUMA effects and the necessity for data duplication. To provide seamless data sharing across multiple GPUs and ease programming of MGPU-TSM, we propose a light-weight coherence protocol called MGCC. MGCC is a timestamp-based protocol that provides both intra- and inter-GPU coherence. We implement a number of hardware features, including a unified memory controller, a request tracker, and a timestamp storage unit, to support MGCC. Using both standard and synthetic stress benchmarks, we evaluate the MGPU-TSM system with MGCC leveraging sequential as well as relaxed consistency. Our evaluation of a 4-GPU system using the MGPUSim simulator suggests that our proposed coherent MGPU system achieves up to 3.8× better performance than the current best-performing MGPU system, while the stress tests performed using synthetic benchmarks suggest that MGCC leads to up to 46.1% performance overhead.
Domain-specific accelerators using optically-addressed phase change memory
2025
In recent years, the exponential growth in data generation and the increasing complexity of computational tasks have created a pressing need for more efficient computing solutions. To address this demand, researchers have developed domain-specific accelerators (DSAs) for various applications, including machine learning (ML), combinatorial optimization, and fully homomorphic encryption (FHE). However, traditional electronic accelerators face significant challenges in both performance and energy efficiency, largely due to the memory wall problem and the limitations of complementary metal-oxide-semiconductor (CMOS) technology scaling. As a result, electronic devices are increasingly unable to meet the growing computational demands, necessitating the exploration of alternative computing paradigms. Among various solutions, optically-addressed phase change memory (OPCM) has emerged as a promising candidate, offering high computational and communication throughput, along with processing-in-memory (PIM) capabilities. However, OPCM also presents unique challenges (such as high programming overhead, low storage density, and limited computational precision) that differ significantly from those of traditional electronic devices. To fully exploit the potential of OPCM while mitigating these limitations, it is necessary to design OPCM-based DSAs that are specifically tailored to the distinct characteristics of the technology. Accordingly, this thesis focuses on the design of OPCM-based DSAs, incorporating optimizations at the device, architecture, and algorithm levels. We first present an ML accelerator using OPCM. OPCM-based PIM systems offer a promising solution to mitigate the data movement overhead in deep neural network (DNN) inference. Prior OPCM-based accelerators have primarily targeted small-scale DNNs that can fit entirely within a limited OPCM array, while neglecting the impact of programming cost. This assumption does not hold for practical deployments.
To address this, we propose a system-level design that explicitly accounts for OPCM's high programming overhead and demonstrate that this cost becomes the dominant factor in DNN inference performance on OPCM-based PIM architectures. We conduct a thorough design space exploration to identify the most energy-efficient OPCM array size and batch size configurations. Additionally, we introduce a novel thresholding and weight block reordering technique to further reduce programming overhead. Through these optimizations, our approach achieves up to 65.2× higher throughput compared to existing photonic accelerators when applied to realistic DNN workloads. We then present an Ising machine accelerator using OPCM for solving combinatorial optimization problems. Previous implementations of Ising machines required the hardware capacity to be larger than the problem size; otherwise, their performance degraded significantly. We propose SOPHIE, a Scalable Optical PHase-change-memory based Ising Engine that targets the scalability challenge of Ising machines. SOPHIE's modified algorithm incorporates a symmetric local update technique and a stochastic global synchronization strategy, which reduce the overall computation demand and global synchronization overhead. We apply device-level optimizations to support the modified algorithm, including employing bi-directional OPCM arrays and dual-precision analog-to-digital converters (ADCs). Our symmetric tile mapping method at the architecture level reduces the OPCM array area by approximately half, enhancing the scalability of the system. SOPHIE is 3× faster than the state-of-the-art (SOTA) photonic Ising machines on small graphs and 125× faster than the field-programmable gate array (FPGA)-based designs on large problems. SOPHIE alleviates the hardware capacity constraints of Ising machines, offering a scalable and efficient alternative for solving Ising problems.
Finally, we present our FHE over the torus (TFHE) accelerator using OPCM. FHE enables secure computation on encrypted data, making it a promising solution for privacy-preserving applications in the cloud. However, its high computation and communication overhead, particularly in the fast Fourier transform (FFT) operations required during bootstrapping, limits its practicality for real-world applications. To tackle these challenges, we propose PHAT, a Photonic Accelerator for TFHE that leverages OPCM. OPCM-based PIM systems offer high computational and communication throughput, making them well-suited for accelerating FFT operations in TFHE. Nonetheless, mapping FFT computations onto OPCM introduces new challenges, such as supporting high-precision analog operations and mitigating the latency and energy costs associated with OPCM programming. To address these issues, PHAT introduces a novel electro-photonic architecture that consists of OPCM-based FFT units, a twiddle-stationary dataflow optimized for OPCM, and a scheduling mechanism to improve FFT unit utilization. PHAT achieves 1.39×-1.77× speedup over the SOTA application-specific integrated circuit (ASIC) accelerator across four programmable bootstrapping configurations, and delivers 2.14×-5.10× speedup on real-world TFHE-based machine learning workloads. These results demonstrate that PHAT significantly improves the practicality and efficiency of TFHE, paving the way for scalable, privacy-preserving computation in cloud environments
Building next-generation deep learning hardware using photonic computing
In recent years, the demand for computational power has skyrocketed due to the rapid advancement of artificial intelligence (AI). As we move past Moore’s Law, the limitations of traditional digital computing are pushing the exploration of alternative computing paradigms. Among the emerging technologies, integrated photonics stands out as a highly promising candidate for the next generation of high-performance AI computing as it offers low latency, high bandwidth, and high parallelism. However, there still exist challenges associated with photonic hardware for AI acceleration including the need for slower and less efficient electronic circuits and memory units, lack of efficient nonlinearity in photonics, limited precision, analog noise, and various device non-idealities. In this thesis, we investigate the opportunities and challenges of photonics technology for accelerating state-of-the-art AI workloads from a realistic perspective, evaluate the performance benefits, and propose solutions to address the associated challenges.
First, we outline our strategy for designing and evaluating ADEPT, a complete electro-photonic accelerator for deep neural network (DNN) inference. ADEPT leverages a photonic computing unit for general matrix-matrix multiplication (GEMM) operations, a vectorized digital electronic application-specific integrated circuit (ASIC) for non-GEMM operations, and static random-access memory (SRAM) arrays for storing DNN parameters and activations. Unlike previous photonic DNN accelerators, we adopt a system-level perspective to provide a more realistic assessment of the photonics technology and its applicability in accelerating state-of-the-art DNNs. We detail our design steps and introduce optimizations to minimize the overhead of electronic devices. Our evaluation shows that ADEPT achieves, on average, 5.73× higher throughput per watt compared to systolic arrays (SAs), and more than 6.8× and 2.5× better throughput per watt compared to state-of-the-art electronic and photonic accelerators, respectively.
Second, we focus on the precision limitations in analog computing and propose using the residue number system (RNS) to compose high-precision operations from multiple low-precision operations. This approach eliminates the need for high-precision data converters and avoids information loss. Our study shows that our technology-agnostic RNS-based approach can achieve ≥ 99% of 32-bit floating-point (FP32) accuracy for state-of-the-art DNN inference with only 6-bit and training with 7-bit fixed-point (FXP) arithmetic. This indicates that using RNS can significantly reduce the energy consumption of analog accelerators while maintaining the same throughput and precision. In addition, we present a fault-tolerant dataflow using redundant RNS (RRNS) to protect computations against noise and errors inherent in analog hardware.
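The idea of composing a high-precision operation from several low-precision ones can be sketched as follows. The moduli `(5, 7, 8)` and the function names are hypothetical choices for illustration; the abstract does not state the moduli, bit widths per residue, or the reconstruction method used in the thesis. Reconstruction here uses the Chinese remainder theorem.

```python
from math import prod

# Assumed pairwise-coprime moduli; every residue fits in 3 bits and the
# representable range is 5 * 7 * 8 = 280.
MODULI = (5, 7, 8)


def to_rns(value, moduli=MODULI):
    """Encode a non-negative integer as one residue per modulus."""
    return tuple(value % m for m in moduli)


def rns_dot(xs, ys, moduli=MODULI):
    """Dot product computed independently in each low-precision channel."""
    acc = [0] * len(moduli)
    for x, y in zip(xs, ys):
        rx, ry = to_rns(x, moduli), to_rns(y, moduli)
        for i, m in enumerate(moduli):
            acc[i] = (acc[i] + rx[i] * ry[i]) % m
    return tuple(acc)


def from_rns(residues, moduli=MODULI):
    """Recover the integer via the Chinese remainder theorem."""
    total_range = prod(moduli)
    value = 0
    for r, m in zip(residues, moduli):
        partial = total_range // m
        # pow(partial, -1, m) is the modular inverse (Python 3.8+).
        value += r * partial * pow(partial, -1, m)
    return value % total_range


# 3*6 + 4*7 + 5*8 = 86, which lies inside the range [0, 280).
result = from_rns(rns_dot([3, 4, 5], [6, 7, 8]))
```

The result is exact only while the true value stays below the product of the moduli, so the moduli set has to be sized for the largest expected accumulation.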
Finally, leveraging this RNS-based framework, we propose Mirage, a photonic DNN training accelerator. Mirage employs a novel micro-architecture to support modular arithmetic in the analog domain, achieving high energy efficiency without compromising precision. Our study shows that, on average, Mirage achieves FP32 accuracy with 23.8× lower training time and 32.1× lower energy-delay product (EDP) in an iso-energy scenario, and 42.8× less power consumption with comparable or better EDP in an iso-area scenario, compared to SAs.
Dispelling the Myths Behind First-author Citation Counts
We conducted a full-scale evaluative citation analysis study of scholars in the XML research field to explore how much the author rankings produced by different citation counting methods actually differ from each other, and to demonstrate the capability of emerging data and tools on the Web in supporting more realistic citation counting methods. Our results contest some common arguments for the continued use of first-author citation counts in the evaluation of scholars, such as high correlations between author rankings by first-author citation counts and other citation counting methods, and high costs of using more realistic citation counting methods that are not well-supported by the ISI databases. It is argued that increasingly available digital full text research papers make it possible for citation analysis studies to go beyond what the ISI databases have directly supported and to employ more sophisticated methods.
