Eszterházy Károly College

EKE Repository of Publications
    7405 research outputs found

    „Róka maga farkát szokta csak dicsérni” ("The fox usually praises only its own tail")

    Fine-tuning and multilingual pre-training for abstractive summarization task for the Arabic language

    The main goal of our research is to train abstractive summarization models for the Arabic language. Work on abstractive Arabic text summarization has hardly begun, owing to the lack of suitable datasets. In our previous research, we created the first monolingual Arabic corpus for abstractive text summarization and fine-tuned various transformer models on it. We tested the PreSumm and multilingual BART models and achieved a state-of-the-art result with the PreSumm method. The present study continues this line of research. We extended our corpus, AraSum, to up to 50 thousand items, each consisting of an article and its corresponding lead. In addition, we pretrained our own monolingual and trilingual BART models for Arabic and fine-tuned them, along with the mT5 model, for abstractive text summarization on the AraSum corpus. Although our resources and infrastructure leave room for improvement, the results show that most of our models surpassed XL-Sum, which has so far been considered the state of the art for abstractive Arabic text summarization. The AraSum corpus will be released to facilitate future work on abstractive Arabic text summarization.

    Benchmarking Redis and HBase NoSQL Databases using Yahoo Cloud Service Benchmarking tool

    NoSQL ("Not only SQL") databases have become more relevant to application developers as the need for scalable and flexible data storage for online applications has grown. Each NoSQL database system provides features that suit particular types of applications, so developers must choose carefully according to the application's needs. Redis is a key-value NoSQL database that provides fast data access. Apache HBase, by contrast, is a column-oriented database that offers scalability and fast data access, making it a promising alternative to Redis for some types of applications. This paper uses the Yahoo Cloud Serving Benchmark (YCSB) to compare the performance of the two databases, Redis and HBase. YCSB is used to measure the throughput of both databases under different workloads. The paper evaluates both databases with six workloads and varying numbers of threads.

    7,262 full texts
    7,405 metadata records
    Updated in last 30 days.