
    Towards Accurate Open-Set Recognition via Background-Class Regularization

    In open-set recognition (OSR), classifiers should be able to reject unknown-class samples while maintaining high closed-set classification accuracy. To solve the OSR problem effectively, previous studies attempted to limit the latent feature space and reject data located outside the limited space via offline analyses, e.g., distance-based feature analyses, or via complicated network architectures. To conduct OSR via a simple inference process (without offline analyses) in standard classifier architectures, we use distance-based classifiers instead of conventional Softmax classifiers. We then design a background-class regularization strategy, which uses background-class data as surrogates for unknown-class data during the training phase. Specifically, we formulate a novel regularization loss suitable for distance-based classifiers, which reserves sufficiently large class-wise latent feature spaces for known classes and forces background-class samples to be located far away from the limited spaces. Through extensive experiments, we show that the proposed method provides robust OSR results while maintaining high closed-set classification accuracy.
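As a rough illustration of the general idea of a distance-based classifier with rejection, the sketch below assigns each feature vector to its nearest class center and labels it unknown when even the nearest center is too far away. This is not the paper's method or loss: the function name `predict_open_set`, the Euclidean distance, and the single `reject_threshold` are all assumptions made for the example.

```python
import numpy as np

def predict_open_set(features, class_centers, reject_threshold):
    """Nearest-center classification with rejection (illustrative only).

    features: array of shape (n, d), latent feature vectors.
    class_centers: array of shape (k, d), one center per known class.
    reject_threshold: hypothetical distance beyond which a sample is
        treated as unknown and labeled -1.
    """
    features = np.asarray(features, dtype=float)
    class_centers = np.asarray(class_centers, dtype=float)
    # Pairwise Euclidean distances between samples and centers, shape (n, k).
    diffs = features[:, None, :] - class_centers[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    nearest = dists.argmin(axis=1)
    min_dist = dists.min(axis=1)
    # Keep the nearest class only when it is close enough; otherwise reject.
    return np.where(min_dist <= reject_threshold, nearest, -1)
```

Under these assumptions, a sample far from every center is rejected at inference time with no separate offline analysis step.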

    Devil is in the Detail: Towards Injecting Fine Details of Image Prompt in Image Generation via Conflict-free Guidance and Stratified Attention

    While large-scale text-to-image diffusion models enable the generation of high-quality, diverse images from text prompts, these prompts struggle to capture intricate details, such as textures, preventing the user intent from being reflected. This limitation has led to efforts to generate images conditioned on user-provided images, referred to as image prompts. Recent work modifies the self-attention mechanism to impose image conditions on generated images by replacing or concatenating the keys and values from the image prompt. This enables the self-attention layer to work like a cross-attention layer, which is generally used to incorporate text prompts. In this paper, we identify two common issues in existing methods of modifying self-attention that hinder diffusion models from reflecting the image prompt. By addressing these issues, we propose a novel method that generates images that properly reflect the details of image prompts. First, existing approaches often neglect the importance of image prompts in classifier-free guidance, which directs the model towards the intended conditions and away from undesirable ones. Specifically, current methods use image prompts as both desired and undesired conditions, causing conflicting signals. To resolve this, we propose conflict-free guidance, which uses image prompts only as desired conditions, ensuring that the generated image faithfully reflects the image prompt. In addition, we observe that the two most common self-attention modifications involve a trade-off between the realism of the generated image and alignment with the image prompt, achieved by selectively using keys and values from both images. Specifically, selecting more keys and values from the image prompt improves alignment, while selecting more from the generated image enhances realism. To balance both, we propose an alternative self-attention modification method, Stratified Attention, which jointly uses keys and values from both images rather than selecting between them. Through extensive experiments across three distinct image generation tasks, we demonstrate that the proposed method outperforms existing image-prompting models in faithfully reflecting the image prompt.
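For readers unfamiliar with classifier-free guidance, the sketch below shows the standard way two noise predictions are combined: the result moves away from the prediction made under the undesired (or empty) condition and towards the prediction made under the desired condition. It illustrates only this generic combination step, not the conflict-free guidance or Stratified Attention proposed in the abstract; the function name `guided_noise` and its parameter names are assumptions made for the example.

```python
import numpy as np

def guided_noise(eps_undesired, eps_desired, guidance_scale):
    """Standard classifier-free guidance combination (illustrative only).

    eps_undesired: noise predicted under the undesired or empty condition.
    eps_desired: noise predicted under the desired condition.
    guidance_scale: 0 returns eps_undesired, 1 returns eps_desired, and
        values above 1 extrapolate further towards the desired condition.
    """
    eps_undesired = np.asarray(eps_undesired, dtype=float)
    eps_desired = np.asarray(eps_desired, dtype=float)
    return eps_undesired + guidance_scale * (eps_desired - eps_undesired)
```

If the same condition feeds both inputs, the difference term vanishes and the guidance scale has no effect on that condition.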

    OPT-OUT: Investigating Entity-Level Unlearning for Large Language Models via Optimal Transport

    Instruction-following large language models (LLMs), such as ChatGPT, have become widely popular among everyday users. However, these models can inadvertently disclose private, sensitive information to their users, underscoring the need for machine unlearning techniques that remove selected information from the models. While prior work has focused on forgetting small, random subsets of training data at the instance level, we argue that real-world scenarios often require the removal of an entire user's data, which may call for a more careful approach. In this study, we explore entity-level unlearning, which aims to erase all knowledge related to a target entity while preserving the remaining model capabilities. To address this, we introduce OPT-OUT, an optimal transport-based unlearning method that utilizes the Wasserstein distance from the model's initial parameters to achieve more effective and fine-grained unlearning. We also present the first Entity-Level Unlearning Dataset (ELUDe) designed to evaluate entity-level unlearning. Our empirical results demonstrate that OPT-OUT surpasses existing methods, establishing a new standard for secure and adaptable LLMs that can accommodate user data removal requests without the need for full retraining.

    Novel Natural Language Summarization of Program Code via Leveraging Multiple Input Representations

    The lack of a description for a given piece of program code is a major hurdle for developers who are new to the code base and trying to understand it. To tackle this problem, previous work on code summarization, the task of automatically generating a description for a given piece of code, reported that an auxiliary learning model trained to produce API (Application Programming Interface) embeddings showed promising results when applied to a downstream code summarization model. However, different pieces of code with different summaries can have the same set of API sequences. If we train a model to generate summaries given an API sequence, the model will not be able to learn effectively. Nevertheless, we note that the API sequence can still be useful and has not been actively utilized. This work proposes a novel multi-task approach that simultaneously trains two similar tasks: 1) summarizing a given code (code to summary), and 2) summarizing a given API sequence (API sequence to summary). We propose a novel code-level encoder based on BERT that is capable of expressing the semantics of code and obtains representations for every line of code. Our work is the first code summarization work that utilizes a natural language-based contextual pretrained language model in its encoder. We evaluate our approach using two common datasets (Java and Python) that have been widely used in previous studies. Our experimental results show that our multi-task approach improves over the baselines and achieves a new state of the art.

    Whose Opinion Matters? Analyzing Relationships between Bitcoin Prices and User Groups in Online Community

    Public interest in cryptocurrencies has consistently risen over the past decade. Owing to this rapid growth, cryptocurrency-related information is being increasingly shared online. As considerable portions of such information in online communities are noise, extracting meaningful information is important. Therefore, judging whose opinion should be considered more important or who the opinion leaders in online communities are is critical. This study analyzed the topics that contain meaningful information, in particular, user groups, by investigating the correlation between topic weights and their price change. The proposed analysis method involves (1) effective classification of the user groups using a hypertext-induced topic selection algorithm, (2) textual information analysis through topic modeling, and (3) the identification of user groups that have a high interest in the Bitcoin price by measuring the correlation between the price and the topics and by measuring the topic similarities between each user group and all users to determine the user group that can effectively represent the entire community. By analyzing the information shared by users, we observed that most users are interested in the price information, whereas users having social influence are not only interested in the price but also in other information.
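The hypertext-induced topic selection (HITS) algorithm named in step (1) scores each node of a directed graph as a hub and as an authority. The sketch below is a minimal power-iteration version, assuming a hypothetical adjacency matrix in which entry (i, j) is 1 when user i links to user j (for example, by replying). How the study builds its graph or turns scores into user groups is not stated in the abstract, so the function name `hits` and the iteration count are assumptions made for the example.

```python
import numpy as np

def hits(adjacency, num_iters=50):
    """Compute HITS hub and authority scores by power iteration.

    adjacency: square array; adjacency[i, j] = 1 means node i links to node j.
    Returns (hubs, authorities), each L2-normalized.
    """
    A = np.asarray(adjacency, dtype=float)
    n = A.shape[0]
    hubs = np.ones(n)
    auths = np.ones(n)
    for _ in range(num_iters):
        # A node is a good authority if good hubs point to it.
        auths = A.T @ hubs
        norm = np.linalg.norm(auths)
        if norm > 0:
            auths = auths / norm
        # A node is a good hub if it points to good authorities.
        hubs = A @ auths
        norm = np.linalg.norm(hubs)
        if norm > 0:
            hubs = hubs / norm
    return hubs, auths
```

In a community graph built this way, users with high authority scores are those whom many active users respond to, which is one plausible proxy for social influence.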

    Augmenting Imbalanced Time-series Data via Adversarial Perturbation in Latent Space

    The success of training deep learning models largely depends on the amount and quality of training data. Although numerous data augmentation techniques have already been proposed for certain domains such as computer vision, where simple schemes such as rotation and flipping have been shown to be effective, other domains such as time-series data have a relatively smaller set of augmentation techniques readily available. In addition, data imbalance is a phenomenon that is often observed in real-world data. However, simple oversampling may make a model vulnerable to overfitting, so proper data augmentation is desired. To tackle these problems, we propose a data augmentation method that utilizes latent vectors of an autoencoder in a novel way. When input data is perturbed in its latent space, the reconstructed input data retains similar properties to the original one. Adversarial augmentation, on the other hand, is a technique for training deep neural networks that are robust against unforeseen data shifts or corruptions by providing a downstream model with samples that are difficult to predict. Our method adversarially perturbs input data in its latent space so that the augmented data is diverse and conducive to reducing the test error of a downstream model. The experimental results demonstrate that our method strikes the right balance between modifying the input data significantly enough to help generalization and keeping it realistic.

    Data analysis and processing for spatio-temporal forecasting

    Spatio-temporal forecasting is a research area applicable to many industrial fields, such as forecasting real-life power consumption and predicting road traffic conditions. For example, in traffic forecasting, it is important to analyze spatial relations and temporal trends in order to predict how traffic on roads changes over time. In the spatio-temporal forecasting task, previous studies applied graph modeling to capture spatial relations. However, existing models use only the most recently available data to predict traffic conditions, which degrades model performance. Further research is necessary for predicting the speed in the far future. To tackle this issue, we aim to improve the performance of the model by providing it with additional data obtained through time-series segmentation. To verify whether the additional data is meaningful to the model, we conducted an experiment comparing the performance of the model trained with existing data against the model trained with our data, and analyzed the distribution of the additional data.

    High-Resolution Virtual Try-On with Misalignment and Occlusion-Handled Conditions

    Image-based virtual try-on aims to synthesize an image of a person wearing a given clothing item. To solve the task, the existing methods warp the clothing item to fit the person's body and generate the segmentation map of the person wearing the item before fusing the item with the person. However, when the warping and the segmentation generation stages operate individually without information exchange, the misalignment between the warped clothes and the segmentation map occurs, which leads to the artifacts in the final image. The information disconnection also causes excessive warping near the clothing regions occluded by the body parts, so-called pixel-squeezing artifacts. To settle the issues, we propose a novel try-on condition generator as a unified module of the two stages (i.e., warping and segmentation generation stages). A newly proposed feature fusion block in the condition generator implements the information exchange, and the condition generator does not create any misalignment or pixel-squeezing artifacts. We also introduce discriminator rejection that filters out the incorrect segmentation map predictions and assures the performance of virtual try-on frameworks. Experiments on a high-resolution dataset demonstrate that our model successfully handles the misalignment and occlusion, and significantly outperforms the baselines. Code is available at https://github.com/sangyun884/HR-VITON