Optimizing the AI Development Process by Providing the Best Support Environment
The purpose of this study is to investigate the development process for
artificial intelligence (AI) and machine learning (ML) applications in order to
provide the best support environment. The main stages of ML development are
problem understanding, data management, model building, model deployment, and
maintenance. This project focuses on the data management stage and its
obstacles, because it is the most important stage of ML development: the
accuracy of the final model depends on the kind of data fed into it. The
biggest obstacle found at this stage was the lack of sufficient data for model
learning, especially in fields where data is confidential. This project aimed
to build a framework for researchers and developers that helps address the lack
of sufficient data during the data management stage. The framework uses several
data augmentation techniques to generate new data from the original dataset,
which can improve the overall performance of ML applications by increasing the
quantity and quality of the data available to the model. The framework was
built in Python and performs data augmentation using deep learning advancements
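The abstract does not say which augmentation techniques the framework uses, so the following is only a minimal sketch of the general idea of generating new rows from an original dataset. It swaps in simple Gaussian jitter in place of the deep learning methods the authors mention; the function name `jitter_augment` and its parameters are hypothetical.

```python
import random


def jitter_augment(rows, copies=2, noise_scale=0.05, seed=0):
    """Return the original rows plus noisy copies of each row.

    Assumption: every row is a list of numeric features. This is plain
    Gaussian jitter, not the framework described in the abstract.
    """
    rng = random.Random(seed)
    augmented = [list(row) for row in rows]  # keep the originals first
    for row in rows:
        for _ in range(copies):
            # Noise is scaled relative to each value (or 1.0 for zeros).
            noisy = [v + rng.gauss(0.0, noise_scale * (abs(v) or 1.0)) for v in row]
            augmented.append(noisy)
    return augmented


original = [[1.0, 20.0], [2.0, 35.0]]
bigger = jitter_augment(original)
print(len(original), "rows ->", len(bigger), "rows")
```

Each original row yields `copies` perturbed rows, so the dataset grows by a factor of `copies + 1`.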
Robust generic Structured Document Classification System / Hamam M.Ibrahim Mokayed
The Structured Document Classification System (SDCS) is an industry-driven technology that can efficiently classify piles of structured documents collected every day in different places. Although SDCS technology has advanced tremendously, one of the most challenging tasks is to propose a classifier that supports various layouts for different categories and different script languages with high accuracy and in efficient time. To solve this issue, a Robust Generic Structured Document Classifier (RGSDC) has been proposed. RGSDC starts by finding the best objects that can be used to fit the target and solve the issue. A detailed study of previous thresholding techniques is conducted to introduce a new categorization method based on the transformation value of input images. This study is a good base for finding a reliable thresholding algorithm. A new thresholding technique based on ordinal structure fuzzy logic (OSFM) is proposed to provide a robust generic image thresholding technique (RGT) that is able to extract clear mixed predefined objects for different languages and multi-layout problems. Two different sets of features that distinguish structured documents in different languages and with multiple layouts are proposed
Signature verification system based on multiple classifiers and multi fusion decision approach
With the increase in identity fraud and the emphasis on security, there is a growing and urgent need to verify human identity efficiently. Signature and handwriting verification applications are used in many fields, such as banking and the public sector. Document and cheque verification has created a real need for a reliable, accurate, and robust system. This work applies different classification techniques to the local and global features of the signature, together with different fusion techniques that combine the outputs of the classifiers, to improve the error rate of the behavioral system. The main goal is to develop a more accurate and robust signature verification system than the previously developed system with a False Rejection Rate (FRR) of 5.3 and a False Acceptance Rate (FAR) of 0. To achieve this goal, first, multiple classification techniques are applied to the signature verification system, namely an artificial neural network, a support vector machine, and Pearson correlation. These techniques are then fused using two complex fusion techniques, fuzzy logic and sequential fuzzy logic, and one simple fusion technique, max voting. Lastly, a rule-based decision is applied to determine whether the signature is genuine. Second, the improved signature verification system is extended with the high-performance Hitachi system. This biometric-based system can be used in many real-world and web-based applications where there is a need for higher security and robust identification
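Of the fusion techniques named above, max voting is the simplest to illustrate. The sketch below assumes each classifier has already produced a label; the classifier keys, the label strings, and the function name `max_voting` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def max_voting(decisions):
    """Return the label predicted by the largest number of classifiers.

    Assumption: with an odd number of binary classifiers there are no ties.
    If a tie did occur, the label seen first would be returned.
    """
    label, _ = Counter(decisions).most_common(1)[0]
    return label


# Hypothetical outputs of the three classifiers named in the abstract.
votes = {"ann": "genuine", "svm": "forged", "pearson": "genuine"}
result = max_voting(list(votes.values()))
print(result)
```

The fuzzy logic and sequential fuzzy logic fusion steps are not shown, because the abstract gives no details of their rules.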
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation
counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings
are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that
only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into
account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed
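The difference between the two counting schemes can be sketched as follows. This is an illustration under stated assumptions, not the study's procedure: each citing paper is given as a list of cited works, each cited work as an ordered author list, and the author names are invented. Note that with more than one author per work, co-authors of the same cited work are also counted as co-cited here, which the study may handle differently.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count how often each pair of authors is cited by the same paper.

    max_authors=1 gives first-author counting; max_authors=5 takes the
    first five authors of every cited work into account.
    """
    counts = Counter()
    for references in citing_papers:
        cited = set()
        for authors in references:
            cited.update(authors[:max_authors])
        # Each unordered author pair is counted once per citing paper.
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts


# Two hypothetical citing papers, each citing two works.
papers = [
    [["Adams", "Baker"], ["Clark"]],
    [["Adams", "Davis"], ["Clark", "Evans"]],
]
first_author = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
print(len(first_author), "pairs vs", len(first_five), "pairs")
```

With first-author counting only the pair of first authors appears, while the first-five scheme brings the remaining authors into the co-citation matrix.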
Finding Similarities between Structured Documents as a Crucial Stage for Generic Structured Document Classifier
One of the problems addressed in classifying structured documents is the definition of a similarity measure that is applicable in real situations, where query documents are allowed to differ from the database templates. Furthermore, query documents might be rotated [1], noise-corrupted [2], or manually edited, and approaches use different forms and documents as test sets under different schemes, making direct comparison a crucial issue [3]. Another problem is that a huge number of forms could be written in different languages; for example, here in Malaysia forms could be written in Malay, Chinese, English, and other languages. In that case, text recognition (such as OCR) cannot be applied to classify the requested documents, even though OCR is considered easier and more accurate than layout detection. Keywords: Feature Extraction, Document processing, Document Classification
Variations on the Author
“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourses; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship
Appropriate Similarity Measures for Author Cocitation Analysis
We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance. Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
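The two well-known measures named in this abstract can be compared with a small sketch. The cocitation profiles below are invented, and the property shown (padding both profiles with zeros leaves the cosine unchanged but shifts the Pearson correlation) is one commonly discussed difference between the measures; the abstract itself does not state which argument the authors make. The two new measures they propose are not shown.

```python
import math


def cosine(x, y):
    """Cosine of the angle between two cocitation profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson correlation, computed as the cosine of mean-centered profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


# Hypothetical cocitation profiles of two authors against three other authors.
x, y = [3, 0, 1], [1, 2, 0]
# The same profiles after adding three authors never cocited with either.
x_padded, y_padded = x + [0, 0, 0], y + [0, 0, 0]

print("cosine: ", cosine(x, y), cosine(x_padded, y_padded))
print("pearson:", pearson(x, y), pearson(x_padded, y_padded))
```

In this example the Pearson correlation even changes sign once the zero entries are added, while the cosine is unaffected.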
Dispelling the Myths Behind First-author Citation Counts
We conducted a full-scale evaluative citation analysis study of scholars in the XML research field to explore just how different from each other author rankings resulting from different citation counting methods actually are, and to demonstrate the capability of emerging data and tools on the Web in supporting more realistic citation counting methods. Our results contest some common arguments for the continued
use of first-author citation counts in the evaluation of scholars, such as high correlations between author rankings by first-author citation counts and other citation
counting methods, and high costs of using more realistic citation counting methods that are not well-supported by the ISI databases. It is argued that increasingly available digital full text research papers make it possible for citation analysis studies to go beyond what the ISI databases have directly supported and to employ more
sophisticated methods
