96 research outputs found
Tagging of Biomedical Articles on CiteULike: A Comparison of User, Author and Professional Indexing
This paper examines the context of online indexing from the viewpoint of three different groups: users, authors, and professional indexers. User tags, author keywords, and descriptors were collected from academic journal articles that were both indexed in PubMed and tagged on CiteULike, and were then analysed. Descriptive statistics, informetric measures, and thesaural term comparison show that there are important differences in the use of keywords between the three groups, as well as similarities that can be used to enhance support for search and browse. While some tags and author keywords matched descriptors exactly, other terms did not match but provided important expansion to the indexing lexicon. These additional terms could be used to enhance support for searching and browsing in article databases, and to provide invaluable data for entry vocabulary and emergent terminology for regular updates to indexing systems. Additionally, the study suggests that tags support organisation by association to task, project, and subject while making important connections to traditional systems that classify into subject categories.
CFGT: A Lexicon-based Chinese Address Element Parsing Model
As a key step in the geocoding process, address element parsing directly affects the accuracy of geocoding. Due to the diversity and complexity of Chinese address expressions, two similar address texts may be completely different in geographical representation. Traditional address element parsing based on dictionary matching cannot handle ambiguous words well, and thus shows poor recognition accuracy. A lexicon-based Chinese address element parsing model, CFGT (collaborative flat-graph transformer), is proposed, which uses self-matched words, nearest contextual words, and other lexical information to enhance the character sequence representation of address text, effectively curbing the ambiguity of address text expression. Specifically, the model first constructs two collaboration graphs, flat-lattice and flat-shift, to capture the knowledge of self-matched words and nearest contextual words for address characters, and designs a fusion layer to implement collaboration between the graphs. Secondly, with the help of an improved relative position encoding, the enhancing effect of word information on the address text character sequence is further strengthened. Finally, a Transformer and conditional random fields are used to parse address elements. Experiments are conducted on multiple public datasets such as Weibo and Resume, as well as the private dataset Address. Experimental results show that the performance of CFGT is superior to previous Chinese address element parsing models and existing models in the field of Chinese named entity recognition.
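To make the notion of self-matched words concrete, the following is a minimal sketch of lexicon matching over a character sequence. It is not the CFGT implementation: the function names `match_lexicon` and `self_matched_words`, the `max_len` parameter, the toy lexicon, and the example string are all assumptions chosen for illustration. It only shows how overlapping lexicon matches, each recorded as a (head, tail, word) span, give rise to the ambiguity the abstract describes.

```python
from collections import defaultdict


def match_lexicon(text, lexicon, max_len=4):
    """Return every (head, tail, word) span of `text` found in `lexicon`.

    head and tail are inclusive character indices. Only words of
    2..max_len characters are considered (illustrative assumption).
    """
    spans = []
    for head in range(len(text)):
        for tail in range(head + 1, min(head + max_len, len(text))):
            word = text[head:tail + 1]
            if word in lexicon:
                spans.append((head, tail, word))
    return spans


def self_matched_words(text, spans):
    """Map each character index to the lexicon words that cover it."""
    covering = defaultdict(list)
    for head, tail, word in spans:
        for i in range(head, tail + 1):
            covering[i].append(word)
    return {i: covering.get(i, []) for i in range(len(text))}


# Hypothetical toy lexicon and text ("Nanjing Yangtze River Bridge").
lexicon = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥"}
text = "南京市长江大桥"

spans = match_lexicon(text, lexicon)
matched = self_matched_words(text, spans)
for i, ch in enumerate(text):
    print(i, ch, matched[i])
```

In this toy example the character 市 is covered by both 南京市 ("Nanjing city") and 市长 ("mayor"), which is the kind of overlap that plain dictionary matching cannot resolve and that a model has to disambiguate from context.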
Hobson-Jobson: The East India Company lexicon
Henry Yule and A.C. Burnell’s Hobson-Jobson: A Glossary of Colloquial Anglo-Indian Words and Phrases (1886) offers a richly nuanced history of the East India Company. This article argues that the lexicon shows the influence of comparative philology, particularly the work of Friedrich Max Müller. Compiled at the same time as the India Office archives were first catalogued, Hobson-Jobson engages with the primary sources of Company history. The article examines both the impact of Asian words and goods on Britain, and the cultural and trading connections between colonies. Through a series of close readings, the article demonstrates that Hobson-Jobson offers fresh ways to approach the global networks of Company trade, and personal networks of affiliation. © 2017, Wiley. The attached document (embargoed until 09/11/2019) is an author-produced version of a paper published in World Englishes, uploaded in accordance with the publisher’s self-archiving policy. The final published version (version of record) is available online at the link below. Some minor differences between this version and the final published version may remain. We suggest you refer to the final published version should you wish to cite from it.
