On Convergence of Lookahead in Smooth Games
A key challenge in smooth games is that there is no general guarantee for gradient methods to converge to an equilibrium. Recently, Chavdarova et al. (2021) reported a promising empirical observation that Lookahead (Zhang et al., 2019) significantly improves GAN training. While promising, few theoretical guarantees have been established for Lookahead in smooth games. In this work, we establish the first convergence guarantees of Lookahead for smooth games. We present a spectral analysis and provide a geometric explanation of how and when it actually improves the convergence around a stationary point. Based on the analysis, we derive sufficient conditions for Lookahead to stabilize or accelerate the local convergence in smooth games. Our study reveals that Lookahead provides a general mechanism for stabilization and acceleration in smooth games.
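As a rough illustration of the stabilization effect described above, the following sketch runs simultaneous gradient descent-ascent with and without Lookahead on the bilinear game f(x, y) = x * y. The game, the step size, and the values of k and alpha are assumptions chosen for illustration and are not taken from the paper; the function names are hypothetical.

```python
def gda_step(x, y, lr):
    # Simultaneous gradient descent-ascent on f(x, y) = x * y:
    # x minimizes (df/dx = y), y maximizes (df/dy = x).
    return x - lr * y, y + lr * x


def run_gda(x, y, lr, steps):
    # Plain GDA; on this bilinear game the iterates spiral outward.
    for _ in range(steps):
        x, y = gda_step(x, y, lr)
    return x, y


def run_lookahead(x, y, lr, k, alpha, outer_steps):
    # Lookahead: take k fast GDA steps from the slow point, then move the
    # slow point a fraction alpha of the way toward the fast result.
    for _ in range(outer_steps):
        fast_x, fast_y = run_gda(x, y, lr, k)
        x = x + alpha * (fast_x - x)
        y = y + alpha * (fast_y - y)
    return x, y


def norm(x, y):
    # Distance to the equilibrium at (0, 0).
    return (x * x + y * y) ** 0.5


if __name__ == "__main__":
    # Same number of gradient evaluations for both runs (2500).
    print("GDA distance:      ", norm(*run_gda(1.0, 1.0, 0.1, 2500)))
    print("Lookahead distance:", norm(*run_lookahead(1.0, 1.0, 0.1, 5, 0.5, 500)))
```

With these illustrative settings, plain GDA moves away from the equilibrium at the origin while the Lookahead iterates contract toward it. Other choices of k and alpha may not contract.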
Limit properties of continuous self-exciting processes
We introduce a self-exciting continuous process based on Brownian motion, and derive its limit properties. We find conditions under which the limit behaviors of the given process and its associated Hawkes process agree. The Kolmogorov-Smirnov test was applied to check the statistical similarity of the two processes.
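For readers unfamiliar with the test mentioned above, here is a minimal sketch of the two-sample Kolmogorov-Smirnov statistic, the largest gap between two empirical distribution functions. It is a generic illustration, not the authors' procedure: the function name is hypothetical, and the sketch computes only the statistic, not a p-value.

```python
from bisect import bisect_right


def ks_statistic(xs, ys):
    """Two-sample KS statistic: max |F_x(v) - F_y(v)| over all sample points."""
    xs, ys = sorted(xs), sorted(ys)
    gap = 0.0
    # Both empirical CDFs are step functions that only change at sample
    # points, so checking every sample value is enough to find the maximum.
    for v in xs + ys:
        f_x = bisect_right(xs, v) / len(xs)  # fraction of xs <= v
        f_y = bisect_right(ys, v) / len(ys)  # fraction of ys <= v
        gap = max(gap, abs(f_x - f_y))
    return gap


if __name__ == "__main__":
    print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))
```

A value near 0 indicates that the two samples have similar empirical distributions, while a value near 1 indicates that they barely overlap.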
Retrieval of sentence sequences for an image stream via coherence recurrent convolutional networks
We propose an approach for retrieving a sequence of natural sentences for an image stream. Since general users often take a series of pictures of their experiences, much online visual information exists in the form of image streams, for which it would be better to take the whole image stream into consideration when producing natural language descriptions. While almost all previous studies have dealt with the relation between a single image and a single natural sentence, our work extends both the input and output dimensions to a sequence of images and a sequence of sentences. For retrieving a coherent flow of multiple sentences for a photo stream, we propose a multimodal neural architecture called coherence recurrent convolutional network (CRCN), which consists of convolutional neural networks, bidirectional long short-term memory (LSTM) networks, and an entity-based local coherence model. Our approach learns directly from the vast user-generated resource of blog posts as text-image parallel training data. We collect more than 22 K unique blog posts with 170 K associated images for the travel topics of NYC, Disneyland, Australia, and Hawaii. We demonstrate that our approach outperforms other state-of-the-art image captioning methods for text sequence generation, using both quantitative measures and user studies via Amazon Mechanical Turk.
StyleMix: Separating Content and Style for Enhanced Data Augmentation
© 2021 IEEE. In spite of the great success of deep neural networks for many challenging classification tasks, the learned networks are vulnerable to overfitting and adversarial attacks. Recently, mixup-based augmentation methods have been actively studied as one practical remedy for these drawbacks. However, these approaches do not distinguish between the content and style features of the image, but mix or cut-and-paste the images. We propose StyleMix and StyleCutMix as the first mixup methods that separately manipulate the content and style information of input image pairs. By carefully mixing up the content and style of images, we can create more abundant and robust samples, which eventually enhance the generalization of model training. We also develop an automatic scheme to decide the degree of style mixing according to the pair's class distance, to prevent messy mixed images from too differently styled pairs. Our experiments on CIFAR-10, CIFAR-100 and ImageNet datasets show that StyleMix achieves better or comparable performance to state-of-the-art mixup methods and learns classifiers that are more robust to adversarial attacks.
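For context, the baseline that the abstract contrasts itself with can be sketched as plain input mixup, which blends two examples and their labels with a single random weight. This is not StyleMix or StyleCutMix: it makes no distinction between content and style, and the function name, flat-list inputs, and default alpha are assumptions made for illustration.

```python
import random


def mixup(x_a, y_a, x_b, y_b, alpha=1.0, rng=random):
    """Blend two (input, one-hot label) pairs with a Beta(alpha, alpha) weight."""
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    # The same weight is applied to the inputs and to the labels.
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    image_a, label_a = [0.0, 1.0, 0.5], [1.0, 0.0]
    image_b, label_b = [1.0, 0.0, 0.5], [0.0, 1.0]
    print(mixup(image_a, label_a, image_b, label_b, rng=rng))
```

Because one scalar weight mixes whole inputs, this baseline cannot treat content and style differently, which is the gap the abstract says StyleMix addresses.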
Multi-Task Self-Supervised Object Detection via Recycling of Bounding Box Annotations
© 2019 IEEE. In spite of the recent enormous success of deep convolutional networks in object detection, they require a large amount of bounding box annotations, which are often time-consuming and error-prone to obtain. To make better use of given limited labels, we propose a novel object detection approach that takes advantage of both multi-task learning (MTL) and self-supervised learning (SSL). We propose a set of auxiliary tasks that help improve the accuracy of object detection. They create their own labels by recycling the bounding box labels (i.e. annotations of the main task) in an SSL manner, and are jointly trained with the object detection model in an MTL way. Our approach can be integrated with any region-proposal-based detection model. We empirically validate that our approach effectively improves detection performance on various architectures and datasets. We test two state-of-the-art region proposal object detectors, Faster R-CNN and R-FCN, with three CNN backbones of ResNet-101, Inception-ResNet-v2, and MobileNet on two benchmark datasets of PASCAL VOC and COCO.
Video Question Answering with Spatio-Temporal Reasoning
Vision and language understanding has emerged as a subject undergoing intense study in Artificial Intelligence. Among many tasks in this line of research, visual question answering (VQA) has been one of the most successful ones, where the goal is to learn a model that understands visual content at region-level details and finds their associations with pairs of questions and answers in the natural language form. Despite the rapid progress in the past few years, most existing work in VQA has focused primarily on images. In this paper, we focus on extending VQA to the video domain and contribute to the literature in three important ways. First, we propose three new tasks designed specifically for video VQA, which require spatio-temporal reasoning from videos to answer questions correctly. Next, we introduce a new large-scale dataset for video VQA named TGIF-QA that extends existing VQA work with our new tasks. Finally, we propose a dual-LSTM based approach with both spatial and temporal attention and show its effectiveness over conventional VQA techniques through empirical evaluations.
POL360: A Universal Mobile VR Motion Controller using Polarized Light
© 2019 Association for Computing Machinery. We introduce POL360: the first universal VR motion controller that leverages the principle of light polarization. POL360 enables a user who holds it and wears a VR headset to see their hand motion in a virtual world via its accurate 6-DOF position tracking. Compared to other techniques for VR positioning, POL360 has the following advantages. (1) Mobile compatibility: No additional computing resource such as a PC or console, and no complicated pre-installation, is required in the environment. The only necessary device is a VR headset with an IR LED module as a light source to which a thin-film linear polarizer is attached. (2) On-device computing: POL360's computation for positioning is completed on the microprocessor in the device. Thus, it does not require additional computing resources from the VR headset. (3) Competitive accuracy and update rate: In spite of its superior mobile compatibility and affordability, POL360 attains competitive accuracy and update rates: it achieves subcentimeter positioning accuracy and a tracking rate higher than 60 Hz. In this paper, we derive the mathematical formulation of 6-DOF positioning using light polarization for the first time and implement a POL360 prototype that can directly operate with any commercial VR headset system. In order to demonstrate POL360's performance and usability, we carry out a thorough quantitative evaluation and a user study and develop three game demos as use cases.
Better to Follow, Follow to Be Better: Towards Precise Supervision of Feature Super-Resolution for Small Object Detection
© 2019 IEEE. In spite of the recent success of proposal-based CNN models for object detection, it is still difficult to detect small objects due to the limited and distorted information that small regions of interest (RoIs) contain. One way to alleviate this issue is to enhance the features of small RoIs using a super-resolution (SR) technique. We investigate how to improve feature-level super-resolution especially for small object detection, and discover that its performance can be significantly improved by (i) utilizing proper high-resolution target features as supervision signals for training an SR model and (ii) matching the relative receptive fields of training pairs of input low-resolution features and target high-resolution features. We propose a novel feature-level super-resolution approach that not only correctly addresses these two desiderata but also can be integrated with any proposal-based detector with feature pooling. In our experiments, our approach significantly improves the performance of Faster R-CNN on three benchmarks of Tsinghua-Tencent 100K, PASCAL VOC and MS COCO. The improvement for small objects is remarkably large, and encouragingly, those for medium and large objects are nontrivial too. As a result, we achieve new state-of-the-art performance on Tsinghua-Tencent 100K and highly competitive results on both PASCAL VOC and MS COCO.
ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning
© 2021 IEEE. The natural association between visual observations and their corresponding sound provides powerful self-supervisory signals for learning video representations, which makes the ever-growing amount of online videos an attractive source of training data. However, large portions of online videos contain irrelevant audio-visual signals because of edited/overdubbed audio, and models trained on such uncurated videos have been shown to learn suboptimal representations. Therefore, existing self-supervised approaches rely on datasets with predetermined taxonomies of semantic concepts, where there is a high chance of audio-visual correspondence. Unfortunately, constructing such datasets requires labor-intensive manual annotation and/or verification, which severely limits the utility of online videos for large-scale learning. In this work, we present an automatic dataset curation approach based on subset optimization where the objective is to maximize the mutual information between audio and visual channels in videos. We demonstrate that our approach finds videos with high audio-visual correspondence and show that self-supervised models trained on our data achieve competitive performance compared to models trained on existing manually curated datasets. The most significant benefit of our approach is scalability: We release ACAV100M, which contains 100 million videos with high audio-visual correspondence, ideal for self-supervised video representation learning.
Strategies Employing Transition Metal Complexes To Modulate Amyloid-beta Aggregation
Aggregation of amyloid-beta (A beta) peptides is implicated in the development of Alzheimer's disease (AD), the most common type of dementia. Thus, numerous efforts to identify chemical tactics to control the aggregation pathways of A beta peptides have been made. Among them, transition metal complexes as a class of chemical modulators against A beta aggregation have been designed and utilized. Transition metal complexes are able to carry out a variety of chemistry with A beta peptides (e.g., coordination chemistry and oxidative and proteolytic reactions for peptide modifications) based on their tunable characteristics, including the oxidation state of and coordination geometry around the metal center. This Viewpoint illustrates three strategies employing transition metal complexes toward modulation of A beta aggregation pathways (i.e., oxidation and hydrolysis of A beta as well as coordination to A beta), along with some examples of such transition metal complexes. In addition, proposed mechanisms for three reactivities of transition metal complexes with A beta peptides are discussed. Our greater understanding of how transition metal complexes have been engineered and used for alteration of A beta aggregation could provide insight into the discovery of new chemical reagents against A beta peptides found in AD.
