
    Optimizing the AI Development Process by Providing the Best Support Environment

    The purpose of this study is to investigate the development process for artificial intelligence (AI) and machine learning (ML) applications in order to provide the best support environment. The main stages of ML development are problem understanding, data management, model building, model deployment, and maintenance. This project focuses on the data management stage and its obstacles, as it is the most important stage of ML development: the accuracy of the final model depends on the kind of data fed into it. The biggest obstacle found at this stage was the lack of sufficient data for model learning, especially in fields where data is confidential. This project aimed to build a framework for researchers and developers that can help solve the lack of sufficient data during the data management stage. The framework uses several data augmentation techniques to generate new data from the original dataset, which can improve the overall performance of ML applications by increasing the quantity and quality of the data available to the model. The framework was built in Python and performs data augmentation using deep learning advancements.
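
The abstract does not say which augmentation techniques the framework implements. As a rough illustration of the general idea only, the sketch below expands a tiny image dataset with two classic transformations (horizontal flip and additive Gaussian noise). The function names, the nested-list image format, and the toy data are all assumptions made for this example and are not taken from the framework.

```python
import random

def horizontal_flip(image):
    """Return a mirrored copy of a 2-D image stored as a list of rows."""
    return [list(reversed(row)) for row in image]

def add_gaussian_noise(image, sigma, rng):
    """Return a copy with zero-mean Gaussian noise added, clipped to [0, 255]."""
    return [
        [min(255.0, max(0.0, pixel + rng.gauss(0.0, sigma))) for pixel in row]
        for row in image
    ]

def augment(dataset, sigma=5.0, seed=0):
    """Expand (image, label) pairs into original, flipped, and noisy copies."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    augmented = []
    for image, label in dataset:
        augmented.append((image, label))
        augmented.append((horizontal_flip(image), label))
        augmented.append((add_gaussian_noise(image, sigma, rng), label))
    return augmented

# Hypothetical 2x2 grayscale images with made-up labels.
tiny_dataset = [([[0, 64], [128, 255]], "a"), ([[10, 20], [30, 40]], "b")]
augmented = augment(tiny_dataset)
```

Each original sample yields three training samples with the same label, which is the sense in which augmentation increases the quantity of available data.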

    Robust generic Structured Document Classification System / Hamam M.Ibrahim Mokayed

    The Structured Document Classification System (SDCS) is an industry-driven technology that can efficiently classify the piles of structured documents collected every day in different places. Although SDCS technology has advanced tremendously, one of the most challenging tasks is to propose a classifier that supports various layouts for different categories and different script languages with high accuracy and in efficient time. To solve this issue, a Robust Generic Structured Document Classifier (RGSDC) has been proposed. RGSDC starts by finding the best objects that can be used to fit the target and solve the issue. A detailed study of previous thresholding techniques is conducted to introduce a new categorization method based on the transformation value of input images. This study is a good basis for finding a reliable thresholding algorithm. A new thresholding technique based on ordinal structure fuzzy logic (OSFM) is proposed to provide a robust generic image thresholding technique (RGT) that is able to extract clear mixed predefined objects for problems involving different languages and multiple layouts. Two different sets of features that distinguish structured documents with different languages and multiple layouts are proposed.

    Signature verification system based on multiple classifiers and multi fusion decision approach

    With the increase in identity fraud and the emphasis on security, there is a growing and urgent need to verify human identity efficiently. Signature and handwriting verification applications are used in many fields, such as banking and the public sector. Document and cheque verification has triggered a real need for a reliable, accurate, and robust system. This work adopts different classification techniques for the local-feature-based and global-feature-based parts of the signature system, in addition to different fusion techniques that combine the outputs of the different classifiers with the global-feature-based results, to improve the error rate of the behavioral system. The main goal is to develop a more accurate and robust signature verification system than the previously developed system, with a False Rejection Rate (FRR) of 5.3 and a False Acceptance Rate (FAR) of 0. To achieve this goal, first, multiple classification techniques are applied to the signature verification system: an artificial neural network, a support vector machine, and Pearson correlation. These techniques are then fused by applying two complex fusion techniques, fuzzy logic and sequential fuzzy logic, and one simple fusion technique, max voting. Lastly, a rule-based decision is applied to specify whether the signature is genuine or not. Second, the improved signature verification system is extended with the high-performance Hitachi system. This biometric-based system can be realized in many real-world and web-based applications where there is a need for higher security and robust identification.
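
Of the fusion techniques named above, max voting is simple enough to sketch, together with how FAR and FRR are usually computed. The code below is a minimal illustration under stated assumptions: the label strings, the toy votes, and the choice to report rates as percentages are hypothetical, and the fuzzy-logic fusion and rule-based decision stages of the paper are not modeled.

```python
from collections import Counter

def max_voting(decisions):
    """Return the label chosen by the largest number of classifiers."""
    return Counter(decisions).most_common(1)[0][0]

def error_rates(predictions, truths):
    """Return (FAR, FRR) as percentages (an assumption of this sketch).

    FAR: share of forged signatures that were accepted as genuine.
    FRR: share of genuine signatures that were rejected as forged.
    """
    on_forged = [p for p, t in zip(predictions, truths) if t == "forged"]
    on_genuine = [p for p, t in zip(predictions, truths) if t == "genuine"]
    far = 100.0 * on_forged.count("genuine") / len(on_forged)
    frr = 100.0 * on_genuine.count("forged") / len(on_genuine)
    return far, frr

# Hypothetical votes per signature from three classifiers
# (standing in for ANN, SVM, and Pearson correlation).
votes = [
    ("genuine", "genuine", "forged"),
    ("forged", "forged", "genuine"),
    ("genuine", "forged", "forged"),
    ("genuine", "genuine", "genuine"),
]
truths = ["genuine", "forged", "genuine", "genuine"]

fused = [max_voting(v) for v in votes]
far, frr = error_rates(fused, truths)
```

With an odd number of classifiers and two labels, max voting never ties, which is one reason it is a common baseline for fusing binary decisions.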

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first author of a cited work, and a non-traditional type, which takes into account the first five authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
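
The difference between the two counting types can be made concrete with a small sketch. The data layout (each citing paper as a list of cited works, each cited work as an ordered author list), the author names, and the function name are hypothetical. The sketch also counts co-authors of a single cited work as co-cited, which is one possible convention and not necessarily the one used in the study.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(citing_papers, max_authors):
    """Count author pairs that appear together in a citing paper's references.

    Only the first `max_authors` authors of each cited work are considered:
    max_authors=1 gives first-author counting, max_authors=5 the
    first-five-authors variant.
    """
    counts = Counter()
    for reference_list in citing_papers:
        cited_authors = set()
        for authors in reference_list:
            cited_authors.update(authors[:max_authors])
        # Each unordered pair is counted once per citing paper.
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts

# Three hypothetical citing papers with made-up author names.
citing_papers = [
    [["Adams", "Baker"], ["Chen"]],
    [["Adams", "Baker"], ["Chen", "Diaz"]],
    [["Diaz"], ["Baker", "Adams"]],
]

first_author = cocitation_counts(citing_papers, max_authors=1)
first_five = cocitation_counts(citing_papers, max_authors=5)
```

In this toy data, first-author counting never links Adams and Baker, while the first-five variant records them as co-cited in all three citing papers.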

    Finding Similarities between Structured Documents as a Crucial Stage for Generic Structured Document Classifier

    One of the problems addressed in classifying structured documents is the definition of a similarity measure that is applicable in real situations, where query documents are allowed to differ from the database templates. Furthermore, previous approaches have used rotated [1], noise-corrupted [2], or manually edited forms and documents as test sets under different schemes, making direct comparison a crucial issue [3]. Another problem is that a huge number of forms may be written in different languages; for example, in Malaysia, forms may be written in Malay, Chinese, English, and other languages. In that case, text recognition (such as OCR) cannot be applied to classify the requested documents, even though OCR is considered easier and more accurate than layout detection. Keywords: Feature Extraction, Document processing, Document Classification

    Variations on the Author

    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who propels discourse rather than expressing opinions, an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.

    Appropriate Similarity Measures for Author Cocitation Analysis

    We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance. Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
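
The abstract does not spell out why the Pearson correlation is considered unsatisfactory. One concern often raised in this methodological discussion, assumed here for illustration, is that Pearson changes when authors with no cocitations to either profile are added to the data, whereas the cosine does not. The sketch below shows that effect on two made-up cocitation profiles; the two further measures the paper considers are not shown.

```python
import math

def cosine(x, y):
    """Cosine of the angle between two equally long numeric vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def pearson(x, y):
    """Pearson correlation, computed as the cosine of the mean-centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])

# Hypothetical cocitation profiles of two authors against four other authors.
profile_a = [3, 0, 2, 5]
profile_b = [1, 4, 0, 2]

# The same profiles after adding 20 authors who are cocited with neither.
padded_a = profile_a + [0] * 20
padded_b = profile_b + [0] * 20

cosine_before, cosine_after = cosine(profile_a, profile_b), cosine(padded_a, padded_b)
pearson_before, pearson_after = pearson(profile_a, profile_b), pearson(padded_a, padded_b)
```

In this example the cosine is identical before and after padding, while the Pearson correlation moves from negative to positive even though no new cocitation information was added.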

    Dispelling the Myths Behind First-author Citation Counts

    We conducted a full-scale evaluative citation analysis study of scholars in the XML research field to explore just how much author rankings resulting from different citation counting methods actually differ from each other, and to demonstrate the capability of emerging data and tools on the Web to support more realistic citation counting methods. Our results contest some common arguments for the continued use of first-author citation counts in the evaluation of scholars, such as high correlations between author rankings by first-author citation counts and other citation counting methods, and high costs of using more realistic citation counting methods that are not well supported by the ISI databases. It is argued that increasingly available digital full-text research papers make it possible for citation analysis studies to go beyond what the ISI databases have directly supported and to employ more sophisticated methods.

    Author Index

    Not provided