1,720,963 research outputs found

    AIPO: Automatic Instruction Prompt Optimization by model itself with "Gradient Ascent"

    Large language models (LLMs) can perform a variety of tasks, such as summarization, translation, and question answering, by generating answers from a user's input prompt. The text given to the model as input, including the instruction, is called the input prompt. There are two types of input prompt: zero-shot prompting provides a question with no examples, whereas few-shot prompting provides a question together with multiple examples. How the input prompt is written can have a large impact on the accuracy of the model's output. Research on this topic is called prompt engineering. Within prompt engineering, prompt optimization seeks the best prompt for each model and task. Manually written prompts can be optimal, but writing them is time-consuming and expensive. Research is therefore being conducted on automatically generating prompts that are as effective as human-crafted ones for each task. We propose Automatic Instruction Prompt Optimization (AIPO), which lets the model generate an initial prompt directly through instruction induction when given a task in a zero-shot setting, and then refine that initial prompt into an optimal prompt for the model using a "gradient ascent" algorithm. With the final prompt generated by AIPO, we achieve more accurate generation than with a manual prompt on benchmark datasets, regardless of the output format.

    Adaptive class token knowledge distillation for efficient vision transformer

    The Vision Transformer (ViT) outperforms Convolutional Neural Networks (CNNs) but at the cost of significantly higher computational demands. Knowledge Distillation (KD) has shown promise in compressing complex networks by transferring knowledge from a large pre-trained model to a smaller one. However, current KD methods for ViT often rely on CNNs as teachers or neglect the importance of class token ([CLS]) information, resulting in ineffective distillation of ViT's unique knowledge. In this paper, we propose Adaptive Class token Knowledge Distillation ([CLS]-KD), which fully exploits information from the class token and patches in ViT. For class embedding (CLS) distillation, the intermediate CLS of the student model is aligned with the corresponding CLS of the teacher model through a projector. Furthermore, we introduce CLS-patch attention map distillation, where an attention map between the CLS and patch embeddings is generated and matched at each layer. This empowers the student model to learn how to dynamically extract patch embedding information into the CLS under teacher guidance. Finally, we propose Adaptive Layer-wise Distillation (ALD) to mitigate the imbalance in distillation effects varying with the depth of layers. This method assigns greater weight to the losses in layers where the training discrepancies between the teacher and student models are larger during distillation. Through these strategies, [CLS]-KD consistently surpasses existing state-of-the-art methods on the ImageNet-1K dataset across various teacher-student configurations. Furthermore, the proposed method demonstrates its generalization capability through transfer learning experiments on the CIFAR-10, CIFAR-100, and CALTECH-256 datasets.
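
    The layer-weighting idea described above can be illustrated with a minimal sketch. The abstract does not give the actual ALD formula, so the softmax-over-losses form, the function names, and the `temperature` parameter below are all assumptions chosen only to show "larger discrepancy, larger weight".

```python
import math

def adaptive_layer_weights(layer_losses, temperature=1.0):
    """Map per-layer distillation losses to weights that sum to 1.

    Assumed form (not taken from the paper): a softmax over the losses,
    so layers with a larger teacher-student discrepancy get more weight.
    """
    m = max(layer_losses)  # subtract the max for numerical stability
    exps = [math.exp((loss - m) / temperature) for loss in layer_losses]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_distillation_loss(layer_losses, temperature=1.0):
    """Combine per-layer losses using the adaptive weights."""
    weights = adaptive_layer_weights(layer_losses, temperature)
    return sum(w * loss for w, loss in zip(weights, layer_losses))

# Hypothetical per-layer losses, shallow to deep.
layer_losses = [0.2, 0.5, 1.1]
weights = adaptive_layer_weights(layer_losses)
total_loss = weighted_distillation_loss(layer_losses)
```

    In this sketch the deepest layer, which has the largest loss, receives the largest weight, so the combined loss is higher than a plain average of the three values.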

    Difficulty level-based knowledge distillation

    Knowledge distillation (KD) enables a simple model (the student) to perform like a complex model (the teacher) by distilling knowledge from a pre-trained teacher model. Existing soft-label distillation methods often use a fixed temperature value in the softmax function to prevent overconfidence in the distillation process. However, this approach can suppress important 'dark knowledge' about non-target classes in difficult samples, while over-smoothing the confidence values for easier samples. To address this issue, we propose a novel approach called difficulty level-based knowledge distillation (DLKD), which considers the difficulty level of each sample to distill refined knowledge with high or low confidence, depending on the sample's complexity. Our method calculates the difficulty level based on the Euclidean distance between the teacher model's predictions and the pruned teacher model's predictions. Experimental results demonstrate that our DLKD method outperforms state-of-the-art methods on challenging samples, including those with noisy labels or augmented data, achieving superior results on CIFAR-100, FGVR, and ImageNet datasets for image classification.
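
    The two ingredients named in this abstract, a temperature-scaled softmax and a Euclidean distance between two sets of predictions, can be sketched as follows. The logits are made-up values, and how DLKD turns the distance into a per-sample confidence is not stated in the abstract, so that step is deliberately left out.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a flatter output."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def difficulty(teacher_probs, pruned_probs):
    """Euclidean distance between teacher and pruned-teacher predictions."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(teacher_probs, pruned_probs)))

# Hypothetical logits for one sample (illustrative values only).
teacher_logits = [4.0, 1.0, 0.5]
pruned_logits = [2.0, 1.8, 0.5]

sharp = softmax(teacher_logits, temperature=1.0)
smooth = softmax(teacher_logits, temperature=4.0)
d = difficulty(softmax(teacher_logits), softmax(pruned_logits))
```

    Raising the temperature lowers the top-class probability in `smooth` relative to `sharp`, which is the smoothing effect a fixed temperature applies uniformly to every sample.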

    Teach sample-specific knowledge: Separated distillation based on samples

    Recent advancements in deep neural networks have revolutionized computer vision, enabling practical applications like classification and object detection. However, deploying these models on resource-constrained devices remains a critical challenge due to their high computational demands. Knowledge Distillation (KD) has emerged as an effective technique to address this issue by transferring knowledge from complex teacher models to lightweight student models, enhancing efficiency while maintaining high performance. Traditional logit-based KD methods use forward Kullback-Leibler divergence (FKLD) to transfer meaningful knowledge. However, FKLD typically exhibits a mode-averaging property, causing students to focus on non-target information regardless of whether the teacher's predictions are correct. Additionally, when handling uncertain samples, even teacher models may fail to classify them accurately, leading to incorrect predictions that confuse the students. To address these issues, we divide the dataset into two groups based on the teacher's predictions: correct and incorrect samples. To ensure a more reliable transfer of knowledge from teacher to student for correct samples, we employ both FKLD and reverse Kullback-Leibler divergence (RKLD), which has mode-focusing properties. We also reduce temperature scaling for RKLD to enhance the focus on target information, ensuring that the student model prioritizes meaningful knowledge while minimizing the influence of non-target information. Conversely, for incorrect predictions, our method minimizes the teacher's knowledge, encouraging students to rely more on the true labels by focusing on cross-entropy loss. Experimental results on both classification and object detection tasks demonstrate that our method, Teach Sample-Specific Knowledge (TSSK), outperforms state-of-the-art KD methods, making it well suited to on-device deployment in real-world scenarios.
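
    The split between correct and incorrect teacher predictions can be sketched roughly as below. This is an illustration under stated assumptions, not the TSSK loss itself: the weights `alpha` and `beta` are hypothetical, the teacher term is dropped entirely for incorrect samples (the abstract only says it is minimized), and the reduced temperature for RKLD is omitted.

```python
import math

EPS = 1e-12  # guards against log(0)

def kl(p, q):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + EPS) / (qi + EPS)) for pi, qi in zip(p, q) if pi > 0)

def cross_entropy(student, label):
    """Cross-entropy of the student's distribution against the true label index."""
    return -math.log(student[label] + EPS)

def sample_loss(teacher, student, label, alpha=1.0, beta=1.0):
    """Per-sample loss that depends on whether the teacher is correct.

    alpha and beta are hypothetical weights for FKLD and RKLD.
    """
    ce = cross_entropy(student, label)
    teacher_pred = max(range(len(teacher)), key=teacher.__getitem__)
    if teacher_pred == label:
        fkld = kl(teacher, student)  # forward KL: mode-averaging
        rkld = kl(student, teacher)  # reverse KL: mode-focusing
        return ce + alpha * fkld + beta * rkld
    # Teacher is wrong: rely on the true label only (assumed simplification).
    return ce

teacher = [0.7, 0.2, 0.1]
student = [0.5, 0.3, 0.2]
loss_teacher_correct = sample_loss(teacher, student, label=0)
loss_teacher_wrong = sample_loss(teacher, student, label=1)
```

    Note that `kl(teacher, student)` and `kl(student, teacher)` generally differ, which is why the two directions can be weighted and scaled separately.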

    MHSAN: Multi-Head Self-Attention Network for Visual Semantic Embedding

    Visual-semantic embedding enables various tasks such as image-text retrieval, image captioning, and visual question answering. The key to successful visual-semantic embedding is to express visual and textual data properly by accounting for their intricate relationship. While previous studies have made considerable progress by encoding visual and textual data into a joint space where similar concepts are closely located, they often represent data by a single vector, ignoring the presence of multiple important components in an image or text. Thus, in addition to the joint embedding space, we propose a novel multi-head self-attention network that captures various components of visual and textual data by attending to important parts of the data. Our approach achieves new state-of-the-art results in image-text retrieval tasks on the MS-COCO and Flickr30K datasets. Through visualization of the attention maps, which capture distinct semantic components at multiple positions in the image and the text, we demonstrate that our method achieves an effective and interpretable visual-semantic joint space.

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first of a cited work's authors, and a non-traditional type, which takes into account the first 5 authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
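
    The difference between the two counting schemes can be made concrete with a small sketch. The data layout, the function name, and the author labels are hypothetical, and two counting details are assumptions of this example rather than the study's definitions: each citing paper contributes at most one count per author pair, and co-authors of the same cited work count as co-cited.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations.

    citing_papers: list of reference lists; each reference is a list of
    author names in byline order. max_authors=1 reproduces first-author
    counting; max_authors=5 considers the first five authors of each work.
    """
    counts = Counter()
    for cited_works in citing_papers:
        authors = set()
        for work_authors in cited_works:
            authors.update(work_authors[:max_authors])
        for pair in combinations(sorted(authors), 2):
            counts[pair] += 1
    return counts

# Two hypothetical citing papers; author labels are placeholders.
citing_papers = [
    [["A", "B"], ["C"]],
    [["A", "B"], ["D", "B"]],
]
first_author = cocitation_counts(citing_papers, max_authors=1)
first_five = cocitation_counts(citing_papers, max_authors=5)
```

    Author B never appears in `first_author` because B is never listed first, but B takes part in several pairs in `first_five`.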

    Variations on the Author

    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourse; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.

    Appropriate Similarity Measures for Author Cocitation Analysis

    We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.
    Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
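
    One mathematical difference between the two measures, offered here as background rather than as the argument made in this abstract, is how they react when authors who are cocited with neither of the two compared authors are added to the profiles. The sketch below uses made-up cocitation profiles and computes both measures with the standard definitions.

```python
import math

def cosine(x, y):
    """Cosine similarity between two equal-length numeric profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def pearson(x, y):
    """Pearson correlation, computed as the cosine of the mean-centered profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])

# Hypothetical cocitation profiles of two authors.
x = [5, 3, 0, 1]
y = [4, 0, 2, 1]

# The same profiles after adding three authors cocited with neither.
x_padded = x + [0, 0, 0]
y_padded = y + [0, 0, 0]

cos_before, cos_after = cosine(x, y), cosine(x_padded, y_padded)
r_before, r_after = pearson(x, y), pearson(x_padded, y_padded)
```

    The cosine is unchanged by the added zero entries, whereas the Pearson correlation shifts because the zeros change both means.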

    Semi-supervised abnormal vehicle detection using ambiguous samples in real-world situations

    Master's thesis - KAIST (Korea Advanced Institute of Science and Technology): School of Electrical Engineering, 2021.2, [v, 24 p.]
    Autonomous vehicle accidents continue to occur because of the abnormal behavior of nearby traffic agents. Abnormal agents can be identified by observing their ambiguous behavior before the actual abnormality occurs. Therefore, to prevent accidents effectively, models should be trained with soft labels of ambiguous situations. However, existing anomaly datasets contain only hard labels, which cannot convey the ambiguity of actual scenarios. To fully utilize the small amount of hard-labeled ambiguous data, we propose a simple and effective semi-supervised approach named STADAS. STADAS exploits two regulatory signals from unlabeled data. One is the ambiguity soft label, computed from the teacher model, which indicates the ambiguity of unlabeled samples. With our novel Ambiguity loss, the model can properly understand and handle ambiguous situations. The other is the pseudo distance label, which carries information about the dynamics of a vehicle. We show, quantitatively and qualitatively, that our method has impressive anomaly detection ability. Quantitatively, it outperforms other semi-supervised methods (e.g., pseudo label, mean-teacher) in ROC-AUC and F1 score, and it produces fewer false negatives than the supervised model. Qualitatively, it detects ambiguous situations that other methods could not properly detect.