DBLP-derived labeled data for author name disambiguation
This is a DBLP-derived labeled dataset for author name disambiguation. It was originally created by Dr. C. Lee Giles at Penn State University and then filtered for duplicate removal and error correction by Dr. Jinseok Kim at the University of Michigan. For more details, see the two Kim (2018) papers listed below under "For the filtered dataset".
Each row refers to one author name instance, with the following fields separated by tabs:
author name: full name string extracted from DBLP
unique author id: label assigned manually by Dr. C. Lee Giles's team
paper id: assigned by Dr. Jinseok Kim
author list: names of the authors in the byline of the paper
year: publication year
venue: conference or journal name
title: paper title with stopwords removed and words stemmed by Porter's stemmer
If you use this dataset, please consider citing the papers below.
For the original dataset: Han, H., Giles, L., Zha, H., Li, C., & Tsioutsiouliklis, K. (2004). Two Supervised Learning Approaches for Name Disambiguation in Author Citations. JCDL 2004: Proceedings of the Fourth ACM/IEEE Joint Conference on Digital Libraries, 296-305. doi:10.1145/996350.996419
For the filtered dataset, either:
1. Kim, Jinseok (2018). Evaluating author name disambiguation for digital libraries: a case of DBLP. Scientometrics. doi:10.1007/s11192-018-2824-5
2. Kim, Jinseok & Kim, Jenna (2018). The impact of imbalanced training data on machine learning for author name disambiguation. Scientometrics. doi:10.1007/s11192-018-2865-9
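A minimal loader for rows in the tab-separated layout described above might look like the sketch below. It assumes the columns appear in the order the fields are listed, that the file has no header row, and that it is UTF-8 encoded; the function and class names are hypothetical. The separator used between names inside the author list is not stated, so that field is kept as a raw string.

```python
from dataclasses import dataclass

# Assumed column order: the order in which the fields are listed (no header row).
FIELDS = ("author_name", "author_id", "paper_id", "author_list", "year", "venue", "title")

@dataclass
class AuthorInstance:
    author_name: str
    author_id: str    # manually assigned label; equal ids mean the same person
    paper_id: str
    author_list: str  # raw byline string; the name separator is not documented here
    year: str
    venue: str
    title: str        # already stopword-filtered and Porter-stemmed

def parse_line(line):
    """Split one tab-separated row into an AuthorInstance."""
    parts = line.rstrip("\r\n").split("\t")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} tab-separated fields, got {len(parts)}")
    return AuthorInstance(*parts)

def load(path):
    """Read every non-empty row of a (hypothetical) dataset file."""
    with open(path, encoding="utf-8") as f:
        return [parse_line(line) for line in f if line.strip()]

def group_by_author(instances):
    """Gold clusters: name instances that share an author id refer to one person."""
    clusters = {}
    for inst in instances:
        clusters.setdefault(inst.author_id, []).append(inst)
    return clusters
```

The gold clusters returned by `group_by_author` are what a disambiguation method's predicted clusters would be compared against.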
Khoo Kay Kim, professor of Malaysian history : a biobibliometric study
Presents an analysis of the publication productivity, authorship pattern, channels of communication, journal preference and language preference of Professor Dato' Khoo Kay Kim, Professor of Malaysian History at the University of Malaya, Kuala Lumpur. The results of this biobibliometric study indicate that he can serve as a role model for future Malaysian historians, who may emulate his achievements, especially in the field of history education.
A note on Kim-Ma characterization of the Hilbert ball
This is an open access article under the CC BY license. No abstract available. Research supported in part by grant KRF 2002-070-C00005 from the Korea Research Foundation (KRF); the National Research Foundation of Korea (NRF) is also listed as a funder. Contacts: K.-T. Kim and D. Ma.
Author recognition for Turkish documents
Author recognition studies are carried out today to solve problems that have arisen with the development of technology and the spread of information. Among these problems are identifying the author of a document whose author is unknown and determining the author of a text whose authorship is uncertain. In this study, author recognition systems were developed for Turkish documents using newspaper columns written by six authors. A corpus of 420 documents, 70 per author, was constructed for training and testing; for each author, 20 documents were used for training and the remaining 50 for testing. The 20 training documents of each author were first merged into a single document, and word, stem, syllable and character n-gram feature vectors were extracted from the six resulting documents. Author recognition systems were developed with the K-Nearest Neighbor, Support Vector Machine, Multi-Layer Perceptron and Learning Vector Quantization methods. For K-Nearest Neighbor, feature vector lengths of 120, 180 and 240 per author were tested; because the best results were obtained with a length of 120, this length was also used for the other methods. After training, the systems were tested and compared according to their accuracy and F-measure values.
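The K-Nearest Neighbor pipeline described above (merge each author's training columns, keep a fixed-length n-gram feature vector per author, then assign a test document to the closest author) can be sketched as follows. This is a minimal illustration, not the authors' system: it assumes character trigrams only, cosine similarity, and k = 1 because each author is represented by a single merged profile. All function names are hypothetical.

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    """Count overlapping character n-grams in lower-cased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def build_profiles(train_docs, n=3, length=120):
    """train_docs maps author -> list of training texts. Each author's texts are
    merged into one document; its `length` most frequent n-grams form the profile."""
    profiles = {}
    for author, docs in train_docs.items():
        merged = " ".join(docs)
        profiles[author] = dict(char_ngrams(merged, n).most_common(length))
    return profiles

def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b.get(gram, 0) for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict(profiles, text, n=3):
    """1-nearest neighbour: choose the author whose profile is most similar."""
    counts = char_ngrams(text, n)
    return max(profiles, key=lambda author: cosine(counts, profiles[author]))
```

Word, stem or syllable n-grams could be tried by substituting a different feature extractor for `char_ngrams`; the `length` parameter plays the role of the 120/180/240 vector lengths compared in the study.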
Missing-data handling methods for lifelogs-based wellness index estimation: Comparative analysis with panel data
Background: A lifelogs-based wellness index (LWI) is a function for calculating wellness scores based on health behavior lifelogs (eg, daily walking steps and sleep times collected via a smartwatch). A wellness score intuitively shows the users of smart wellness services the overall condition of their health behaviors. LWI development includes estimation (ie, estimating coefficients in LWI with data). A panel data set comprising health behavior lifelogs allows LWI estimation to control for unobserved variables, thereby resulting in less bias. However, these data sets typically have missing data due to events that occur in daily life (eg, smart devices stop collecting data when batteries are depleted), which can introduce biases into LWI coefficients. Thus, the appropriate choice of method to handle missing data is important for reducing biases in LWI estimations with panel data. However, there is a lack of research in this area. Objective: This study aims to identify a suitable missing-data handling method for LWI estimation with panel data. Methods: Listwise deletion, mean imputation, expectation maximization-based multiple imputation, predictive-mean matching-based multiple imputation, k-nearest neighbors-based imputation, and low-rank approximation-based imputation were comparatively evaluated by simulating an existing case of LWI development. A panel data set comprising health behavior lifelogs of 41 college students over 4 weeks was transformed into a reference data set without any missing data. Then, 200 simulated data sets were generated by randomly introducing missing data at proportions from 1% to 80%. The missing-data handling methods were each applied to transform the simulated data sets into complete data sets, and coefficients in a linear LWI were estimated for each complete data set. For each proportion for each method, a bias measure was calculated by comparing the estimated coefficient values with values estimated from the reference data set. 
Results: Methods performed differently depending on the proportion of missing data. For proportions of 1% to 30%, low-rank approximation-based imputation, predictive-mean matching-based multiple imputation, and expectation maximization-based multiple imputation were superior. For proportions of 31% to 60%, low-rank approximation-based imputation and predictive-mean matching-based multiple imputation performed best. For proportions above 60%, only low-rank approximation-based imputation performed acceptably. Conclusions: Low-rank approximation-based imputation was the best of the 6 missing-data handling methods regardless of the proportion of missing data. This superiority is generalizable to other panel data sets comprising health behavior lifelogs, given their verified low-rank nature, for which low-rank approximation-based imputation is known to perform effectively. This result will guide missing-data handling to reduce coefficient biases in new development cases of linear LWIs with panel data. Methodology and Organisation of Design.
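To show why low-rank structure helps, the sketch below compares mean imputation with one common low-rank approximation-based method, iterative truncated-SVD imputation (often called "hard impute"). It assumes NumPy, a rows-by-variables matrix with NaN marking missing cells, and an illustrative rank; the study's exact low-rank algorithm and settings may differ, and the function names are hypothetical.

```python
import numpy as np

def mean_impute(X):
    """Replace each missing value (NaN) with its column's observed mean."""
    X = np.asarray(X, dtype=float)
    return np.where(np.isnan(X), np.nanmean(X, axis=0), X)

def lowrank_impute(X, rank=1, n_iter=500, tol=1e-10):
    """Iterative truncated-SVD imputation: fill the gaps, fit a rank-`rank`
    approximation, overwrite only the missing cells, and repeat."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = mean_impute(X)  # starting guess
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Observed cells stay fixed; missing cells take the low-rank estimate
        updated = np.where(missing, approx, X)
        change = np.linalg.norm(updated - filled)
        filled = updated
        if change <= tol * max(np.linalg.norm(filled), 1.0):
            break
    return filled

# Toy example: an exactly rank-1 matrix with two cells removed
truth = np.outer(np.arange(1, 7, dtype=float), np.arange(1, 5, dtype=float))
X = truth.copy()
X[0, 1] = np.nan
X[3, 2] = np.nan
print(mean_impute(X)[0, 1], lowrank_impute(X)[0, 1], truth[0, 1])
```

On this toy matrix, the low-rank fill recovers the removed cells closely, while mean imputation misses them substantially. Real lifelog panels are only approximately low-rank, so the rank would have to be chosen from the data.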
Author Correction: El Niño–Southern Oscillation complexity (Nature, (2018), 559, 7715, (535-545), 10.1038/s41586-018-0252-6)
In this Review, the middle initial of author Kim M. Cobb was omitted. The original Review Article has been corrected online. © 2019, The Author(s), under exclusive licence to Springer Nature Limited.
Constructing tree decompositions of graphs with bounded gonality
In this paper, we give a constructive proof of the fact that the treewidth of a graph is at most its divisorial gonality. The proof gives a polynomial-time algorithm to construct a tree decomposition of width at most k when an effective divisor of degree k that reaches all vertices is given. We also give a similar result for two related notions: stable divisorial gonality and stable gonality. Accepted author manuscript. Discrete Mathematics and Optimization.
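For readers unfamiliar with the object being constructed: a tree decomposition of width k places the graph's vertices into bags of at most k + 1 vertices arranged in a tree, so that every edge lies in some bag and the bags containing any given vertex form a connected subtree. The sketch below only checks a given decomposition and reports its width. It does not implement the paper's divisor-based construction, and all names are hypothetical.

```python
from collections import defaultdict, deque

def _is_connected(nodes, adj):
    """Breadth-first search restricted to `nodes`; True if they form one component."""
    nodes = set(nodes)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y in nodes and y not in seen:
                seen.add(y)
                queue.append(y)
    return seen == nodes

def tree_decomposition_width(vertices, edges, bags, tree_edges):
    """Return the width (largest bag size minus one) if `bags` joined by
    `tree_edges` is a valid tree decomposition of the graph, else None."""
    nodes = set(bags)
    adj = defaultdict(set)
    for a, b in tree_edges:
        if a not in nodes or b not in nodes:
            return None
        adj[a].add(b)
        adj[b].add(a)
    # The bag nodes must form a tree: connected with exactly |nodes| - 1 edges
    if len(tree_edges) != len(nodes) - 1 or not _is_connected(nodes, adj):
        return None
    # Every vertex and every edge must lie inside some bag
    if any(not any(v in bag for bag in bags.values()) for v in vertices):
        return None
    if any(not any(u in bag and v in bag for bag in bags.values()) for u, v in edges):
        return None
    # The bags containing any given vertex must form a connected subtree
    for v in vertices:
        if not _is_connected([t for t, bag in bags.items() if v in bag], adj):
            return None
    return max(len(bag) for bag in bags.values()) - 1
```

For example, the 4-cycle with bags {0, 1, 2} and {0, 2, 3} joined by a single tree edge is a valid decomposition of width 2. Splitting the vertices into bags {0, 1} and {2, 3} is rejected because edge (1, 2) is not covered.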
Coos River Basin fish management plan
Prepared by Linda J. Wagoner, Kim K. Jones, Reese E. Bender, Jerry A. Butler, Darrell E. Demory, Thomas F. Gaumer, Joel A. Hurtado, William G. Mullarkey, Paul E. Reimers, Neil T. Richmond, and Thomas J. Rumreich. This archived document is maintained by the State Library of Oregon as part of the Oregon Documents Depository Program. It is for informational purposes and may not be suitable for legal purposes. Includes bibliographical references (pages 122-124). Mode of access: Internet from the Oregon Government Publications Collection. Text in English.
Low-Rank Tensor Decompositions for Nonlinear System Identification: A Tutorial with Examples
Nonlinear parametric system identification is the estimation of nonlinear models of dynamical systems from measured data. Nonlinear models are parameterized, and it is exactly these parameters that must be estimated. Extending familiar linear models to their nonlinear counterparts quickly leads to practical problems. For example, the generalization of a multivariate linear function to a multivariate polynomial implies that the number of parameters grows exponentially with the total degree of the polynomial. This exponential explosion of model parameters is an instance of the so-called curse of dimensionality. Both the storage and computational complexities are limiting factors in the development of system identification methods for such models. Team Kim Batselie
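To make the parameter explosion concrete: a homogeneous polynomial of total degree d in n inputs can be stored as a d-way coefficient tensor with n^d entries, which is exponential in d. Low-rank tensor decompositions replace that tensor with a few small factors. The sketch below compares n^d with the storage of a tensor train, one such decomposition assumed here for illustration, whose internal ranks are all r. The rank is a hypothetical choice, and the count ignores symmetry and lower-degree terms.

```python
def full_tensor_params(n, d):
    """Entries of a d-way coefficient tensor with n entries per mode."""
    return n ** d

def tensor_train_params(n, d, r):
    """Entries of a tensor train with all internal ranks equal to r:
    cores of shape 1 x n x r and r x n x 1 at the ends, r x n x r in between."""
    if d == 1:
        return n
    return 2 * n * r + (d - 2) * n * r * r

for d in (2, 4, 6, 8):
    print(d, full_tensor_params(10, d), tensor_train_params(10, d, 5))
```

With n = 10 inputs and rank r = 5, degree 8 needs 100,000,000 full-tensor coefficients but only 1,600 tensor-train entries. The tensor-train count grows linearly in d rather than exponentially, at the cost of restricting the model to low rank.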
Using imagery to solve spatial problems
This report focuses on the use of imagery to solve a range of spatial problems. The research projects reviewed in this report offer some insight into the range of strategies used by solvers of spatial problems and point to relationships between spatial and verbal skills.
