SensiMix: Sensitivity-Aware 8-bit index & 1-bit value mixed precision quantization for BERT compression
© 2022 Piao et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Given a pre-trained BERT, how can we compress it into a fast and lightweight model while maintaining its accuracy? Pre-trained language models such as BERT are effective for improving the performance of natural language processing (NLP) tasks. However, heavy models like BERT suffer from large memory cost and long inference time. In this paper, we propose SENSIMIX (Sensitivity-Aware Mixed Precision Quantization), a novel quantization-based BERT compression method that considers the sensitivity of different modules of BERT. SENSIMIX applies 8-bit index quantization to the sensitive parts of BERT and 1-bit value quantization to the insensitive parts, maximizing the compression rate while minimizing the accuracy drop. We also propose three novel 1-bit training methods to minimize the accuracy drop: Absolute Binary Weight Regularization, Prioritized Training, and Inverse Layer-wise Fine-tuning. Moreover, for fast inference, we apply FP16 general matrix multiplication (GEMM) and XNOR-Count GEMM to the 8-bit and 1-bit quantized parts of the model, respectively. Experiments on four GLUE downstream tasks show that SENSIMIX compresses the original BERT model into an equally effective but lightweight one, reducing the model size by a factor of 8× and shrinking the inference time by around 80% without a noticeable accuracy drop.
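The abstract above mentions XNOR-Count GEMM for the 1-bit part of the model. As a rough illustration of the general XNOR-and-popcount idea only (not the authors' kernel), the sketch below computes the dot product of two ±1 vectors from packed bits. The function names `pack_signs` and `xnor_count_dot` are hypothetical, and the encoding (bit set for +1, clear for -1) is an assumption made for this example.

```python
def pack_signs(values):
    """Pack a sequence of +1/-1 values into an int; bit i is set when values[i] is +1."""
    bits = 0
    for i, v in enumerate(values):
        if v > 0:
            bits |= 1 << i
    return bits


def xnor_count_dot(a_bits, b_bits, n):
    """Dot product of two length-n +1/-1 vectors given in packed form.

    XNOR marks the positions where the signs agree. Each agreement
    contributes +1 and each disagreement -1, so the dot product is
    matches - (n - matches) = 2 * matches - n.
    """
    mask = (1 << n) - 1                 # keep only the n valid bit positions
    agree = ~(a_bits ^ b_bits) & mask   # XNOR restricted to n bits
    matches = bin(agree).count("1")     # population count
    return 2 * matches - n


a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
result = xnor_count_dot(pack_signs(a), pack_signs(b), len(a))
reference = sum(x * y for x, y in zip(a, b))
```

Here `result` and `reference` agree; a real GEMM kernel would apply the same identity to machine words across whole matrix rows and columns.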
DAO-CP: Data-Adaptive Online CP decomposition for tensor stream
© 2022 Son et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. How can we accurately and efficiently decompose a tensor stream? Tensor decomposition is a crucial task in a wide range of applications and plays a significant role in latent feature extraction and estimation of unobserved entries of data. The problem of efficiently decomposing tensor streams has been of great interest because much real-world data changes dynamically over time. However, existing methods for dynamic tensor decomposition sacrifice too much accuracy, which limits their use in practice. Moreover, the accuracy loss becomes even more serious when the tensor stream has an inconsistent temporal pattern, since current methods cannot adapt quickly to a sudden change in data. In this paper, we propose DAO-CP, an accurate and efficient online CP decomposition method which adapts to data changes. DAO-CP tracks local error norms of the tensor stream and detects change points in them. It then chooses the best strategy depending on the degree of change to balance the trade-off between speed and accuracy. Specifically, DAO-CP decides whether to (1) reuse the previous factor matrices for fast running time or (2) discard them and restart the decomposition to increase accuracy. Experimental results show that DAO-CP achieves state-of-the-art accuracy without noticeable loss of speed compared to existing methods.
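The reuse-or-restart decision described above can be pictured with a small sketch. This is not the DAO-CP algorithm: the abstract does not specify the change-point test, so the z-score rule, the `threshold` default, and the name `choose_strategy` are all assumptions introduced here for illustration.

```python
from statistics import mean, pstdev


def choose_strategy(past_errors, new_error, threshold=3.0):
    """Return "reuse" or "restart" for the next chunk of a tensor stream.

    Hypothetical rule: treat the new local error norm as a change point
    when it lies more than `threshold` standard deviations above the
    mean of the previously observed error norms.
    """
    if len(past_errors) < 2:
        return "reuse"  # too little history to judge a change
    mu = mean(past_errors)
    sigma = max(pstdev(past_errors), 1e-12)  # guard against division by zero
    z_score = (new_error - mu) / sigma
    return "restart" if z_score > threshold else "reuse"


history = [1.0, 1.1, 0.9, 1.0]
steady = choose_strategy(history, 1.05)  # error close to the history
sudden = choose_strategy(history, 5.0)   # error far above the history
```

With this history, `steady` is `"reuse"` and `sudden` is `"restart"`. A gentler policy could add an intermediate action between the two, but that is beyond what the abstract states.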
Fast and Memory-Efficient Tucker Decomposition for Answering Diverse Time Range Queries
© 2021 ACM. Given a temporal dense tensor and an arbitrary time range, how can we efficiently obtain latent factors in the range? Tucker decomposition is a fundamental tool for analyzing dense tensors to discover hidden factors, and has been exploited in many data mining applications. However, existing decomposition methods do not provide the functionality to analyze a specific range of a temporal tensor. The existing methods are one-off, with the main focus on performing Tucker decomposition once for a whole input tensor. Although a few existing methods with a preprocessing phase can deal with a time range query, they are still time-consuming and suffer from low accuracy. In this paper, we propose Zoom-Tucker, a fast and memory-efficient Tucker decomposition method for finding hidden factors of temporal tensor data in an arbitrary time range. Zoom-Tucker fully exploits block structure to compress a given tensor, supporting an efficient query and capturing local information. Zoom-Tucker answers diverse time range queries quickly and memory-efficiently, by elaborately decoupling the preprocessed results included in the range and carefully determining the order of computations. We demonstrate that Zoom-Tucker is up to 171.9x faster and requires up to 230x less space than existing methods while providing comparable accuracy.
Learning to Walk across Time for Interpretable Temporal Knowledge Graph Completion
© 2021 ACM. Static knowledge graphs (KGs), despite their wide usage in relational reasoning and downstream tasks, fall short of realistic modeling of knowledge and facts that are only temporarily valid. Compared to static knowledge graphs, temporal knowledge graphs (TKGs) inherently reflect the transient nature of real-world knowledge. Naturally, automatic TKG completion has drawn much research interest as a more realistic modeling of relational reasoning. However, most existing models for TKG completion extend static KG embeddings that do not fully exploit TKG structure, and thus lack 1) an account of temporally relevant events already residing in the local neighborhood of a query, and 2) path-based inference that facilitates multi-hop reasoning and better interpretability. In this paper, we propose T-GAP, a novel model for TKG completion that maximally utilizes both temporal information and graph structure in its encoder and decoder. T-GAP encodes the query-specific substructure of a TKG by focusing on the temporal displacement between each event and the query timestamp, and performs path-based inference by propagating attention through the graph. Our empirical experiments demonstrate that T-GAP not only achieves superior performance against state-of-the-art baselines, but also competently generalizes to queries with unseen timestamps. Through extensive qualitative analyses, we also show that T-GAP enjoys transparent interpretability, and follows human intuition in its reasoning process.
Accurate News Recommendation Coalescing Personal and Global Temporal Preferences
© Springer Nature Switzerland AG 2020. Given session-based news watch history of users, how can we precisely recommend news articles? Unlike other items for recommendation, the worth of news articles decays quickly and various news sources publish fresh ones every second. Moreover, people frequently select news articles regardless of their personal preferences to understand popular topics at a specific time. Conventional recommendation methods, designed for other recommendation domains, perform poorly because of these peculiarities of news articles. In this paper, we propose PGT (News Recommendation Coalescing Personal and Global Temporal Preferences), an accurate news recommendation method designed with consideration of the above characteristics of news articles. PGT extracts latent features from both personal and global temporal preferences to sufficiently reflect users' behaviors. Furthermore, we propose an attention-based architecture to extract adequate coalesced features from both kinds of preferences. Experimental results show that PGT provides the most accurate news recommendation, achieving state-of-the-art accuracy.
Accurate Online Tensor Factorization for Temporal Tensor Streams with Missing Values
© 2021 ACM. Given a time-evolving tensor stream with missing values, how can we accurately discover latent factors in an online manner to predict missing values? Online tensor factorization is a crucial task with many important applications including the analysis of climate, network traffic, and epidemic disease. However, existing online methods have disregarded temporal locality and thus have limited accuracy. In this paper, we propose STF (Streaming Tensor Factorization), an accurate online tensor factorization method for real-world temporal tensor streams with missing values. We exploit an attention-based temporal regularization to learn inherent temporal patterns of the streams. We also propose an efficient online learning algorithm which allows each row of the temporal factor matrix to be updated from past and future information. Extensive experiments show that the proposed method achieves state-of-the-art accuracy and quickly processes each tensor slice.
Model-Agnostic Augmentation for Accurate Graph Classification
© 2022 ACM. Given a graph dataset, how can we augment it for accurate graph classification? Graph augmentation is an essential strategy to improve the performance of graph-based tasks, and has been widely utilized for analyzing web and social graphs. However, previous works for graph augmentation either a) involve the target model in the process of augmentation, losing the generalizability to other tasks, or b) rely on simple heuristics that lead to unreliable results. In this work, we introduce five desired properties for effective augmentation. Then, we propose NodeSam (Node Split and Merge) and SubMix (Subgraph Mix), two model-agnostic algorithms for graph augmentation that satisfy all desired properties with different motivations. NodeSam makes a balanced change of the graph structure to minimize the risk of semantic change, while SubMix mixes random subgraphs of multiple graphs to create rich soft labels combining the evidence for different classes. Our experiments on social networks and molecular graphs show that NodeSam and SubMix outperform existing approaches in graph classification.
