Also By The Same Author: AKTiveAuthor, a Citation Graph Approach to Name Disambiguation
The desire for definitive data and the semantic web drive for inference over heterogeneous data sources require co-reference resolution to be performed on those data. In particular, name disambiguation is required to allow accurate publication lists, citation counts and impact measures to be determined. This paper describes a graph-based approach to author disambiguation on large-scale citation networks. Using self-citation, co-authorship and document source analyses, AKTiveAuthor clusters papers, achieving precision of 0.997 and recall of 0.818 over a test group of eight surname clusters
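As a rough illustration of the graph-based idea in this abstract, the sketch below merges papers that share an ambiguous author name whenever one cites another (treated here as a self-citation) or when they share a co-author. It is not AKTiveAuthor's actual algorithm: the function name, the input layout and the union-find merging are assumptions made for illustration, and the document source analysis is left out.

```python
from collections import defaultdict


def cluster_papers(papers, citations):
    """Group papers bearing one ambiguous author name into candidate authors.

    papers:    dict mapping paper id -> set of co-author names (hypothetical layout)
    citations: iterable of (citing_id, cited_id) pairs
    """
    parent = {pid: pid for pid in papers}

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Self-citation evidence: both papers carry the ambiguous name,
    # so a citation between them suggests the same person.
    for citing, cited in citations:
        if citing in papers and cited in papers:
            union(citing, cited)

    # Co-authorship evidence: papers sharing a co-author are merged.
    by_coauthor = defaultdict(list)
    for pid, coauthors in papers.items():
        for name in coauthors:
            by_coauthor[name].append(pid)
    for pids in by_coauthor.values():
        for other in pids[1:]:
            union(pids[0], other)

    clusters = defaultdict(list)
    for pid in papers:
        clusters[find(pid)].append(pid)
    return sorted(sorted(members) for members in clusters.values())


# Toy data: p1 and p2 share a co-author, p3 cites p1, p4 is unconnected.
papers = {
    "p1": {"A. Lee"},
    "p2": {"A. Lee", "B. Kim"},
    "p3": set(),
    "p4": {"C. Roy"},
}
citations = [("p3", "p1")]
result = cluster_papers(papers, citations)
```

On the toy data, p1, p2 and p3 end up in one cluster and p4 stays on its own. Merging on any single link favours recall; a stricter variant would require more than one kind of evidence before merging.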
Approximate Personal Name-Matching Through Finite-State Graphs
This article shows how finite-state methods can be employed in a new and different task: the conflation of personal name variants in standard forms. In bibliographic databases and citation index systems, variant forms create problems of inaccuracy that affect information retrieval, the quality of information from databases, and the citation statistics used for the evaluation of scientists' work. A number of approximate string matching techniques have been developed to validate variant forms, based on similarity and equivalence relations. We classify the personal name variants as nonvalid and valid forms. In establishing an equivalence relation between valid variants and the standard form of its equivalence class, we defend the application of finite-state transducers. The process of variant identification requires the elaboration of: (a) binary matrices and (b) finite-state graphs. This procedure was tested on samples of author names from bibliographic records, selected from the Library and Information Science Abstracts (LISA) and Science Citation Index Expanded (SCI-E) databases. The evaluation involved calculating the measures of precision and recall, based on completeness and accuracy. The results demonstrate the usefulness of this approach, although it should be complemented with methods based on similarity relations for the recognition of spelling variants and misspellings
That's 'é' not 'þ' '?' or '☐': a user-driven context-aware approach to erroneous metadata in digital libraries
In this paper we present a novel system for user-driven integration of name variants when interacting with web-based information systems. The growth and diversity of online information means that many users experience disambiguation and collocation errors in their information searching. We approach these issues via a client-side JavaScript browser extension that can reorganise web content and also integrate remote data sources. The system is illustrated through three worked examples using existing digital libraries
Towards a Flexible Author Name Disambiguation Framework
In this paper we propose a flexible, modular framework for author name disambiguation. Our solution consists of the core which orchestrates the disambiguation process, and replaceable modules performing concrete tasks. The approach is suitable for distributed computing, in particular it maps well to the MapReduce framework. We describe each component in detail and discuss possible alternatives. Finally, we propose procedures for calibration and evaluation of the described system
Dataset of Author Names and Name Frequencies
This file is a gzipped semicolon separated text file containing block id, frequency of the first name (number of times it appears in the 38M World of Code version Q author IDs), frequency of the last name, full name, email, and Author ID. The largest block contains 993 Author IDs.
The email addresses and Author IDs of individual authors have been replaced by their corresponding SHA1 values for privacy reasons
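A minimal sketch of how a file in this layout might be read, assuming the six columns appear in the order listed above and that the file is UTF-8 encoded. The field names, the helper functions and the sample record are hypothetical; the real file may differ in encoding or quoting.

```python
import csv
import gzip
import hashlib
import io

# Column order follows the description above; the names themselves are assumptions.
FIELDS = [
    "block_id",
    "first_name_freq",
    "last_name_freq",
    "full_name",
    "email_sha1",
    "author_id_sha1",
]


def sha1_hex(value: str) -> str:
    """Return the hexadecimal SHA1 digest of a string (UTF-8 assumed)."""
    return hashlib.sha1(value.encode("utf-8")).hexdigest()


def read_rows(stream):
    """Yield one dict per record from a gzipped, semicolon-separated byte stream."""
    with gzip.open(stream, mode="rt", encoding="utf-8", newline="") as text:
        for row in csv.reader(text, delimiter=";"):
            yield dict(zip(FIELDS, row))


def make_sample() -> bytes:
    """Build a one-line gzipped sample in memory; the record is invented."""
    line = ";".join([
        "1",
        "120",
        "45",
        "Jane Doe",
        sha1_hex("jane@example.org"),
        sha1_hex("Jane Doe <jane@example.org>"),
    ])
    return gzip.compress((line + "\n").encode("utf-8"))


rows = list(read_rows(io.BytesIO(make_sample())))
```

For the real file, the same `read_rows` function could be given a path string instead of the in-memory stream, since `gzip.open` accepts either.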
Phonetic Similarity in Brand Name Innovation
When developing a new brand name, similarity of the new brand name to an existing brand name may affect perceptions of the new brand name. However, marketers typically have little guidance on the optimal level of similarity versus originality. Based on linguistic theory, we develop a method to determine this optimal level. In four experiments, we examine the phonetic similarity of a company’s new brand names to the company’s original brand name, implementing a highly controlled methodology based on linguistic rules. When pre-existing attitudes towards a company are positive, an inverted U-shaped pattern is observed in brand name attitudes, such that moderate levels of phonetic similarity are preferred over closer or more distant levels of phonetic similarity. When pre-existing attitudes towards a company are negative, an opposite, U-shaped pattern is observed, such that moderate levels of phonetic similarity are less preferred than closer or more distant levels of phonetic similarity. However, when there are no pre-existing attitudes towards the company, a direct, linear relation between phonetic similarity and attitudes is observed, such that close levels are preferred over moderate levels which, in turn, are preferred over distant levels, consistent with a simple familiarity effect on brand name attitudes. Keywords: Brand Names, Linguistics, Attitudes
Ethnicity-based name partitioning for author name disambiguation using supervised machine learning
In several author name disambiguation studies, some ethnic name groups such as East Asian names are reported to be more difficult to disambiguate than others. This implies that disambiguation approaches might be improved if ethnic name groups are distinguished before disambiguation. We explore the potential of ethnic name partitioning by comparing performance of four machine learning algorithms trained and tested on the entire data or specifically on individual name groups. Results show that ethnicity-based name partitioning can substantially improve disambiguation performance because the individual models are better suited for their respective name group. The improvements occur across all ethnic name groups with different magnitudes. Performance gains in predicting matched name pairs outweigh losses in predicting nonmatched pairs. Feature (e.g., coauthor name) similarities of name pairs vary across ethnic name groups. Such differences may enable the development of ethnicity-specific feature weights to improve prediction for specific ethnic name categories. These findings are observed for three labeled datasets with a natural distribution of problem sizes as well as one in which all ethnic name groups are controlled for the same sizes of ambiguous names. This study is expected to motivate scholars to group author names based on ethnicity prior to disambiguation. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/168534/1/asi24459.pdf http://deepblue.lib.umich.edu/bitstream/2027.42/168534/2/asi24459_am.pd
Journal/Author Name Estimator (JANE)
The Journal/Author Name Estimator (JANE) is a free online bibliographic journal selection tool. Interfacing directly with PubMed/MEDLINE, the resource is web-based and allows users to easily input keywords, abstract text, or author names and view related articles based on terms. JANE is recommended for those working in health and biomedical fields
