    Towards Reliable and Practical Phishing Detection

    As phishing attacks become more prevalent, demand is growing for more robust detection technologies. Building on recent advances in AI, we discuss how to construct a reliable and practical phishing detection system using language models. For this system, we introduce the first large-scale Korean dataset for phishing detection, encompassing six types of phishing attacks. We consider multiple factors in building a real-time detection system for edge devices, such as model size, Speech-To-Text quality, split length, training technique, and multi-task learning. We evaluate the model in two ways: in-domain performance and detection of unseen attacks, which we refer to as zero-day performance. Additionally, we demonstrate the importance of accurate comparison groups and evaluation datasets, showing that voice phishing detection performs reasonably well while smishing detection remains challenging. Both the dataset and the trained model will be available upon request.

    Gradient Ascent Post-training Enhances Language Model Generalization

    In this work, we empirically show that updating pretrained LMs (350M, 1.3B, 2.7B) with just a few steps of Gradient Ascent Post-training (GAP) on random, unlabeled text corpora enhances their zero-shot generalization capabilities across diverse NLP tasks. Specifically, we show that GAP can make LMs comparable to 2-3x larger LMs across 12 different NLP tasks. We also show that applying GAP on out-of-distribution corpora leads to the most reliable performance improvements. Our findings indicate that GAP can be a promising method for improving the generalization capability of LMs without any task-specific fine-tuning.
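
    The core mechanic named in the abstract, taking a few gradient steps that increase rather than decrease the language-modeling loss, can be illustrated with a toy sketch. This is not the authors' setup: it assumes a unigram softmax model over a three-token vocabulary instead of a pretrained LM, and the function names, learning rate, and step count are hypothetical choices made only for illustration.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def nll(logits, tokens):
    # Mean negative log-likelihood of the token ids under a unigram model.
    probs = softmax(logits)
    return -sum(math.log(probs[t]) for t in tokens) / len(tokens)

def nll_grad(logits, tokens):
    # Gradient of the mean NLL w.r.t. the logits:
    # softmax probability minus empirical token frequency.
    probs = softmax(logits)
    counts = [0.0] * len(logits)
    for t in tokens:
        counts[t] += 1.0
    n = len(tokens)
    return [p - c / n for p, c in zip(probs, counts)]

def gradient_ascent_steps(logits, tokens, lr=0.1, steps=3):
    # Hypothetical helper: a few ascent steps on the loss.
    # Ordinary training would subtract lr * grad; ascent adds it.
    logits = list(logits)
    for _ in range(steps):
        grad = nll_grad(logits, tokens)
        logits = [z + lr * g for z, g in zip(logits, grad)]
    return logits

# Toy "unlabeled corpus" of token ids (an assumption for illustration).
corpus = [0, 0, 1, 2]
start = [0.0, 0.0, 0.0]
after = gradient_ascent_steps(start, corpus)
print(nll(start, corpus), nll(after, corpus))
```

    In this toy model the only visible effect is that the loss on the sampled text goes up after the ascent steps. Whether such an update improves zero-shot generalization is the abstract's empirical claim about pretrained LMs, and this sketch does not demonstrate it.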