    On co-authorship for author disambiguation

    Author name disambiguation deals with clustering records of same-name authors into distinct individuals. To attack the problem, many studies have employed a variety of disambiguation features such as coauthors, titles of papers/publications, topics of articles, emails/affiliations, etc. Among these, co-authorship is the most easily accessible and influential, since the inter-person acquaintances represented by co-authorship can discriminate the identities of authors more clearly than other features. This study explores the net effects of co-authorship on author clustering in bibliographic data. First, to handle the shortage of explicit coauthors listed in known citations, a web-assisted technique for acquiring implicit coauthors of the target author to be disambiguated is proposed. Then, a coauthor disambiguation hypothesis, namely that the identity of an author can be determined by his/her coauthors, is examined and confirmed through a variety of author disambiguation experiments. (C) 2008 Elsevier Ltd. All rights reserved.
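
    The coauthor hypothesis above can be illustrated with a minimal sketch. It is not the paper's method: it simply assumes that two papers by a same-name author belong to the same individual whenever they share at least one coauthor, and merges them with a union-find structure. The function name `cluster_by_coauthors` and the sample data are hypothetical.

```python
from collections import defaultdict


def cluster_by_coauthors(papers):
    """Group papers of one ambiguous author name by shared coauthors.

    papers: dict mapping paper id -> set of coauthor names
            (the ambiguous target name itself is excluded).
    Returns a sorted list of sorted paper-id clusters.
    """
    parent = {pid: pid for pid in papers}

    def find(x):
        # Walk up to the root, compressing the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Assumption: one shared coauthor is enough evidence to merge two papers.
    first_seen = {}
    for pid, coauthors in papers.items():
        for name in coauthors:
            if name in first_seen:
                union(first_seen[name], pid)
            else:
                first_seen[name] = pid

    clusters = defaultdict(set)
    for pid in papers:
        clusters[find(pid)].add(pid)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical papers that all list the same ambiguous author name.
papers = {
    "p1": {"A. Lee", "B. Park"},
    "p2": {"B. Park", "C. Choi"},
    "p3": {"D. Jung"},
    "p4": {"C. Choi"},
}
clusters = cluster_by_coauthors(papers)
print(clusters)
```

    Here p1, p2, and p4 are linked through a chain of shared coauthors, while p3 stays in its own cluster. Implicit coauthors gathered from the web, as the abstract describes, would simply be added to the coauthor sets before clustering.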

    An efficient multiuser detector with low decision delay for multiple chip rate DS/CDMA mobile radio systems

    In this paper, an efficient multiuser detection scheme with low decision delay is presented for asynchronous multiple chip rate (MCR) DS/CDMA systems in mobile radio channels, without any restrictions on processing gains and chip rates. An equivalent synchronous single bit rate DS/CDMA system is first formulated to break up the detection problem into finite-length blocks called processing windows. Then, a low delay multipath-combining decision-feedback multiuser detector (LDMCDF) is proposed based on the equivalent system model. Since the LDMCDF makes the decisions for data bits in every processing window, the decision delay is less than the interval of one processing window. The effect of the processing window length on the performance of the LDMCDF is evaluated, and the simulation results show that the LDMCDF provides good performance with negligible decision delay. (C) 2001 Elsevier Science B.V. All rights reserved.

    Subtopic mining using simple patterns and hierarchical structure of subtopic candidates from web documents

    The intention gap between users and queries results in ambiguous and broad queries. To solve these problems, subtopic mining has been studied, which returns a ranked list of possible subtopics according to their relevance, popularity, and diversity. This paper proposes a novel method to mine subtopics using simple patterns and a hierarchical structure of subtopic candidates. First, relevant and varied phrases are extracted as subtopic candidates using simple patterns based on noun phrases and alternative partial-queries. Second, a hierarchical structure of the subtopic candidates is constructed using sets of relevant documents from a web document collection. Finally, the subtopic candidates are ranked considering a balance between popularity and diversity using this structure. In experiments, our proposed methods outperformed the baselines and even an external-resource-based method at high-ranked subtopics, which shows that our methods can be effective and useful in various search scenarios such as result diversification. (C) 2015 Elsevier Ltd. All rights reserved.
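
    The final ranking step, which balances popularity against diversity, can be sketched as a greedy re-ranking. This is an illustrative stand-in in the style of maximal marginal relevance, not the hierarchical method of the paper: popularity scores are assumed to be given, similarity is assumed to be word-level Jaccard overlap, and the name `rank_subtopics` and the weight `lam` are hypothetical.

```python
def jaccard(a, b):
    """Word-level Jaccard similarity between two phrases."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def rank_subtopics(popularity, lam=0.5):
    """Greedily order subtopic candidates.

    popularity: dict mapping candidate phrase -> popularity score in [0, 1].
    lam: assumed trade-off weight (1.0 = popularity only).
    Each step picks the candidate with the best popularity minus a penalty
    for its similarity to candidates that were already selected.
    """
    remaining = dict(popularity)
    ranked = []
    while remaining:
        def gain(cand):
            redundancy = max((jaccard(cand, s) for s in ranked), default=0.0)
            return lam * remaining[cand] - (1 - lam) * redundancy

        best = max(remaining, key=gain)
        ranked.append(best)
        del remaining[best]
    return ranked


# Hypothetical candidates for the ambiguous query "apple".
candidates = {
    "apple iphone price": 0.90,
    "apple iphone cost": 0.85,
    "apple fruit nutrition": 0.60,
}
ranking = rank_subtopics(candidates)
print(ranking)
```

    With `lam=0.5`, the less popular but more distinct candidate "apple fruit nutrition" moves ahead of the near-duplicate "apple iphone cost"; with `lam=1.0` the ranking falls back to pure popularity order.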

    Syntactic analysis of long sentences based on S-clauses

    In dependency parsing of long sentences with fewer subjects than predicates, it is difficult to recognize which predicate governs which subject. To handle such syntactic ambiguity between subjects and predicates, an "S(ubject)-clause" is defined as a group of words containing several predicates and their common subject, and then an automatic S-clause segmentation method is proposed using semantic features as well as morpheme features. We also propose a new dependency tree to reflect S-clauses. Trace information is used to indicate the omitted subject of each predicate. The S-clause information turned out to be very effective in analyzing long sentences, with an improved parsing performance of 4.5%. The precision in determining the governors of subjects in dependency parsing was improved by 32%.

    Text categorization based on k-nearest neighbor approach for Web site classification

    Automatic categorization is a viable method to deal with the scaling problem on the World Wide Web. For Web site classification, this paper proposes using the Web pages linked from the home page, in contrast to previous research, which used the home page alone. To implement our proposed method, we derive a scheme for Web site classification based on the k-nearest neighbor (k-NN) approach. It consists of three phases: Web page selection (connectivity analysis), Web page classification, and Web site classification. Given a Web site, the Web page selection phase chooses several representative Web pages using connectivity analysis. The k-NN classifier then classifies each of the selected Web pages. Finally, the page-level classifications are combined into a classification of the entire Web site. To improve performance, we supplement the k-NN approach with a feature selection method and a term weighting scheme using markup tags, and also revise its document-document similarity measure. In our experiments on a Korean commercial Web directory, the proposed system, using both a home page and its linked pages, improved the micro-averaging breakeven point by 30.02% compared with an ordinary classification that uses a home page only. (C) 2002 Elsevier Science Ltd. All rights reserved.
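
    The three phases described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the system from the paper: page selection is reduced to "the home page plus the pages it links to", pages are plain bag-of-words vectors compared by cosine similarity, and the site label is a majority vote over page labels. The feature selection and markup-tag term weighting from the abstract are omitted, and all names and data are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_label(page_text, training, k=3):
    """Phase 2: label one page by majority vote of its k nearest neighbors."""
    vec = Counter(page_text.lower().split())
    nearest = sorted(
        training,
        key=lambda ex: cosine(vec, Counter(ex[0].lower().split())),
        reverse=True,
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def classify_site(site, training, k=3):
    # Phase 1 (assumed simplification of connectivity analysis):
    # keep the home page and every page it links to directly.
    selected = [site["home"]] + [
        site["pages"][url] for url in site["links"] if url in site["pages"]
    ]
    # Phase 2: classify each selected page with k-NN.
    labels = [knn_label(text, training, k) for text in selected]
    # Phase 3: extend page labels to the whole site by majority vote.
    return Counter(labels).most_common(1)[0][0]


# Hypothetical labeled training pages.
training = [
    ("cheap flights hotel booking travel", "travel"),
    ("hotel resort vacation travel deals", "travel"),
    ("flights airline tickets travel", "travel"),
    ("laptop computer hardware store", "computers"),
    ("computer software download", "computers"),
    ("hardware graphics card computer", "computers"),
]

site = {
    "home": "travel deals and hotel booking",
    "links": ["/flights", "/resorts"],
    "pages": {
        "/flights": "flights airline tickets",
        "/resorts": "resort vacation hotel",
    },
}
site_label = classify_site(site, training)
print(site_label)
```

    Voting over several pages rather than the home page alone is the point of the scheme: a sparse or generic home page contributes only one vote among the selected pages.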

    Building a pronominalization model by feature selection and machine learning

    Pronominalization is an important component in generating a coherent text. In this paper, we identify features that influence pronominalization, and construct a pronoun generation model by using various machine learning techniques. The old entities, which are the target of pronominalization, are categorized into three types according to their tendency in attentional state: Cb and old-Cp derived from a Centering model, and the remaining old entities. We construct a pronoun generation model for each type. Eighty-seven texts are gathered from three genres for training and testing. Using this, we verify that our proposed features are well defined to explain pronominalization in Korean, and we also show that our model significantly outperforms previous ones with 99% confidence level by t-test. We also identify central features that have a strong influence on pronominalization across genres.

    Two-phase S-clause segmentation

    When a dependency parser analyzes long sentences with fewer subjects than predicates, it is difficult for it to recognize which predicate governs which subject. To handle such syntactic ambiguity between subjects and predicates, we define a "subject clause (S-clause)" as a group of words containing several predicates and their common subject. This paper proposes a two-phase method for S-clause segmentation. The first phase reduces the number of candidate S-clause boundaries, and the second performs S-clause segmentation using decision trees. In experimental evaluation, the S-clause information turned out to be effective for determining the governor of a subject and that of a predicate in dependency parsing. Further syntactic analysis using S-clauses achieved an improvement in precision of 5 percent.

    Description technique for component composition focusing on black-box view

    As component-based software is developed by integrating components that are implemented independently, expressing the usage protocols of each component is essential. However, there is no known proper way to describe them comprehensibly from the point of view of the component user or developer. The black-box (external) view of component composition sees component-based development from the user's or the system assembler's point of view. However, a description technique that specifies the dynamic constraints explicitly is needed to define the external view more precisely. The key contribution of this paper is a technique for describing the structure of components in the black-box view using UML 2.0. First, we present the relevant UML notations for describing the black-box point of view and then provide diagrams showing their usage. We further illustrate how this leads to a component-based software specification of the structure of composition focusing on the black-box view.

    Generation of zero pronouns based on the centering theory and pairwise salience of entities

    This paper investigates zero pronouns in Korean, focusing especially on the center transitions of adjacent utterances under the framework of Centering Theory. Four types of nominal entity (Epair, Einter, Eintra, and Enon) from Centering Theory are defined with the concept of inter-, intra-, and pairwise salience. For each entity type, a case study of zero phenomena is performed by analyzing a corpus and building a pronominalization model. This study shows that the zero phenomena of entities that have been neglected in previous Centering works are explained via the center transition of the second previous utterance, and provides valuable results for pronominalization of such entities, such as the p2-trans rule. We improve the accuracy of the pronominalization model by optimal feature selection and show that it outperforms the accuracy of previous works.

    Identifying Top News Stories Based on their Popularity in the Blogosphere

    A huge volume of news stories is reported by various news channels on a daily basis. Subscribing to all the stories and keeping track of the important ones day after day is very time-consuming. This paper proposes several approaches to identify important news stories. To this end, we take advantage of the blogosphere as an information source to evaluate the importance of news stories. Blogs reflect the diverse opinions of bloggers about news stories, and the attention that these stories receive can help estimate the importance of the stories. In this paper, we define the popularity of a news story in the blogosphere as the attention it attracts from users. We measure the popularity of the stories in the blogosphere from two viewpoints: content and timeline. In terms of content, we suggest several approaches to estimate language models for a news story and blog posts, and we evaluate the importance of the story using these language models. Furthermore, we generate a temporal profile of a news story by exploring the timeline of blog posts related to the story, and evaluate its importance based on the temporal profile. We experimentally verify the effectiveness of the proposed approaches for identifying top news stories.
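
    The content-based viewpoint can be sketched with unigram language models. The abstract does not say how the story and blog models are compared, so the scoring below is an assumption: each story is scored by the negative KL divergence between its add-one-smoothed unigram model and a model estimated from the blog posts, so that stories whose wording is closer to what bloggers write about score higher. The names `unigram_model` and `rank_stories` and the sample texts are hypothetical.

```python
import math
from collections import Counter


def unigram_model(texts, vocab, alpha=1.0):
    """Additively smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(w for t in texts for w in t.lower().split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def rank_stories(stories, blog_posts):
    """Order news stories by closeness of their language model to the blogs.

    Assumed score: negative KL divergence KL(story || blogs); a higher score
    means the story's vocabulary is better covered by the blog posts.
    """
    vocab = {w for t in stories + blog_posts for w in t.lower().split()}
    blog_lm = unigram_model(blog_posts, vocab)

    def score(story):
        story_lm = unigram_model([story], vocab)
        return -sum(p * math.log(p / blog_lm[w]) for w, p in story_lm.items())

    return sorted(stories, key=score, reverse=True)


# Hypothetical blog posts and candidate news stories.
blog_posts = [
    "election results shock voters",
    "voters react to election results",
    "election night coverage",
]
stories = [
    "local bakery opens downtown",
    "election results announced tonight",
]
top = rank_stories(stories, blog_posts)
print(top[0])
```

    The timeline viewpoint from the abstract would add a second signal, for example the number of related blog posts per day after publication, which this sketch does not model.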