Machine Learning methods and applications in Economics
The thesis introduces supervised and unsupervised learning concepts with an inexperienced audience in mind, pointing out relevant references for further study. Moreover, we highlight the relevance of Machine Learning for Economics and its possible applications. The work then proceeds with two contributions. The first is a methodological contribution to cluster analysis: we propose a novel method to score and evaluate clustering solutions in which clusters are parametrized by center, scatter and size parameters. The second contribution is an application of Machine Learning methods to Labor Economics. We explore the assignment of employees to tasks and use tree-based learning algorithms to retrieve a mapping for the assignment. We show that the resulting assignment rule helps explain productivity drivers
Asymptotic Results for the Estimation of the Quadratic Score of a Clustering
In cluster analysis one often finds several partitions of a data set using different clustering methods and algorithms set with a variety of hyperparameters and tunings. The number of clusters K is one of the most relevant of such hyperparameters. Cluster selection is the task of choosing the desired partitions. The Bootstrap Quadratic Scoring is a recently introduced method where the cluster selection is performed by optimizing a score attached to a partition that is based on the quadratic discriminant function. Previously, we proposed the estimation of this cluster score via bootstrap resampling and investigated the proposed estimator based on numerical experiments and real data applications. However, that earlier work did not provide theoretical guarantees. In this paper, we fill that gap. We study the asymptotic behavior of the scoring method and show that the proposed estimator converges to well-defined population counterparts
Selecting the number of clusters, clustering models, and algorithms. A unifying approach based on the quadratic discriminant score
Cluster analysis requires fixing the number of clusters and often many hyper-parameters. In practice, one produces several partitions, and a final one is chosen based on validation or selection criteria. There exists an abundance of validation methods that, implicitly or explicitly, assume a certain clustering notion. In this paper, we focus on groups that can be well separated by quadratic or linear boundaries. The reference cluster concept is defined through the quadratic discriminant function and parameters describing clusters’ size, center and scatter. We develop two cluster-quality criteria that are consistent with groups generated from a class of elliptic-symmetric distributions. Using bootstrap resampling of the proposed criteria, we propose a selection rule that allows choosing among many clustering solutions, possibly obtained from different methods. Extensive experimental analysis shows that the proposed methodology achieves a better overall performance compared to established alternatives from the literature
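To make the idea of scoring a partition through the quadratic discriminant function concrete, here is a minimal sketch for univariate data. It is an illustration under stated assumptions, not the paper's criteria: the names `quadratic_score` and `hard_score` are hypothetical, each cluster is described by a (size, center, variance) triplet, and the partition score is assumed to be the average over points of the largest per-cluster quadratic discriminant value. The bootstrap step of the selection rule is not shown.

```python
import math

def quadratic_score(x, size, center, variance):
    """Univariate quadratic discriminant value of point x for one cluster.

    size: cluster proportion in (0, 1]; center: mean; variance: scatter.
    The additive constant common to all clusters is omitted.
    """
    return (math.log(size)
            - 0.5 * math.log(variance)
            - 0.5 * (x - center) ** 2 / variance)

def hard_score(data, clusters):
    """Average, over points, of the best quadratic score among clusters.

    clusters: list of (size, center, variance) triplets (assumed format).
    """
    total = 0.0
    for x in data:
        total += max(quadratic_score(x, *c) for c in clusters)
    return total / len(data)

# Toy data (made up for illustration): two tight groups around 0 and 5.
data = [0.0, 0.1, -0.1, 5.0, 5.1, 4.9]
two_clusters = [(0.5, 0.0, 0.01), (0.5, 5.0, 0.01)]
one_cluster = [(1.0, 2.5, 6.3)]
better = hard_score(data, two_clusters) > hard_score(data, one_cluster)
```

Under this assumed definition, a solution whose clusters fit the data tightly receives a higher score than a single diffuse cluster, which is the kind of comparison a selection rule would rank.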
Analysis of the Mirkin’s Distance on Binary Relations for Clustering Stability
Clustering stability is a popular approach to cluster validation, where the stability of clustering solutions is evaluated across resamples to select the most stable structure. However, there are few empirical studies that analyze clustering stability methods. This paper investigates the use of the Mirkin distance for evaluating the stability of clustering solutions across non-parametric bootstrap resamples. The proposed strategy is validated with an extensive experimental analysis, providing useful insights into clustering stability for practical applications
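As a point of reference, the Mirkin distance between two partitions of the same objects can be computed from cluster sizes and the contingency counts. The sketch below assumes the standard unnormalized definition (the number of ordered pairs of objects that are grouped together in one partition but separated in the other); the function name `mirkin_distance` is hypothetical, and the normalization and bootstrap scheme used in the paper are not reproduced here.

```python
from collections import Counter

def mirkin_distance(labels_a, labels_b):
    """Unnormalized Mirkin distance between two label vectors.

    Computed as sum|A_i|^2 + sum|B_j|^2 - 2 * sum n_ij^2, where n_ij is the
    number of objects in cluster i of the first partition and cluster j of
    the second. Label names do not matter, only the induced groupings.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    sizes_a = Counter(labels_a)
    sizes_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    return (sum(c * c for c in sizes_a.values())
            + sum(c * c for c in sizes_b.values())
            - 2 * sum(c * c for c in joint.values()))

# Toy partitions of four objects (made up for illustration).
same = mirkin_distance([0, 0, 1, 1], [1, 1, 0, 0])      # identical up to relabeling
crossed = mirkin_distance([0, 0, 1, 1], [0, 1, 0, 1])   # groupings disagree
```

In a stability analysis, one would typically average such distances between the solutions obtained on pairs of resamples, with smaller values indicating a more stable structure.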
Likelihood-type methods for comparing clustering solutions
Selecting an optimal clustering solution is a longstanding problem. In model-based clustering this amounts to choosing the architecture of the model mixture distribution. Decisions to be made pertain to: cluster prototype distribution; number of mixture components; (optionally) restrictions on the clusters’ geometry. Classical proposals address this issue via penalized model selection criteria based on the observed likelihood function. In this study, we compare these techniques with the less explored cross-validation alternative, which is rather popular for many other data-driven optimized methods. We analyze both classical methods such as BIC, AIC, AIC3 and ICL, and several cross-validation schemes where the risk is defined in terms of minus the log-likelihood function. Selection methods are compared by using the Iris dataset
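For readers unfamiliar with the penalized criteria named above, the following sketch computes them from a fitted mixture's maximized log-likelihood. It assumes the common "smaller is better" convention and the entropy-based approximation of ICL (BIC plus twice the classification entropy); the function name `information_criteria` and its arguments are hypothetical, and the exact variants used in the study may differ.

```python
import math

def information_criteria(log_lik, n_params, n_obs, posteriors=None):
    """Penalized likelihood criteria (smaller is better, by assumption).

    log_lik: maximized log-likelihood of the fitted mixture.
    n_params: number of free parameters; n_obs: sample size.
    posteriors: optional list of rows of cluster membership probabilities,
    needed only for the ICL approximation.
    """
    deviance = -2.0 * log_lik
    out = {
        "AIC": deviance + 2 * n_params,
        "AIC3": deviance + 3 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
    }
    if posteriors is not None:
        # Classification entropy: zero when every assignment is certain.
        entropy = -sum(p * math.log(p)
                       for row in posteriors for p in row if p > 0)
        out["ICL"] = out["BIC"] + 2 * entropy
    return out

# Hypothetical numbers, for illustration only.
crit = information_criteria(log_lik=-100.0, n_params=5, n_obs=150,
                            posteriors=[[1.0, 0.0], [0.5, 0.5]])
```

A cross-validation alternative would instead estimate the risk by averaging minus the log-likelihood of held-out observations under the model fitted on the remaining data.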
Quadratic discriminant scoring for selecting clustering solutions
Selecting an optimal clustering solution is a difficult problem. There exist many data-driven validation strategies in the literature to perform this task. In this paper, we focus on a recent proposal, based on quadratic discriminant scores and bootstrap resampling, namely the BQH and BQS from Coraggio and Coretto [4]. These strategies proved to be extremely successful with elliptic-symmetric clusters and, in general, when clusters can be separated by quadratic boundaries. In this work, we review the BQH and BQS strategies and try to shed more light on how they work by comparing them with alternative likelihood-based validation indexes and with different resampling schemes
JAQ of All Trades: Job Mismatch, Firm Productivity and Managerial Quality
We develop a novel measure of job-worker allocation quality (JAQ) by exploiting employer-employee data with machine learning techniques. Based on our measure, the quality of job-worker matching correlates positively with individual labor earnings and firm productivity, as well as with market competition, non-family firm status, and employees’ human capital. Management plays a key role in job-worker matching: when managerial hirings and firings persistently raise management quality, the matching of rank-and-file workers to their jobs improves. JAQ can be constructed from any employer-employee data set including workers’ occupations, and used to explore research questions in corporate finance and organization economics
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed
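The difference between the two counting schemes can be illustrated with a small sketch. It assumes that each citing paper is given as a list of cited works, each cited work as an ordered author list, and that an author pair is counted once per citing paper that cites both; the function name `cocitation_counts`, the `max_authors` parameter and the toy data are hypothetical, and the study's exact counting rules may differ.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations across citing papers.

    citing_papers: list of reference lists; each reference is an ordered
    list of author names. max_authors=1 gives first-author counting,
    max_authors=5 considers the first five authors of each cited work.
    """
    counts = Counter()
    for references in citing_papers:
        cited_authors = set()
        for authors in references:
            cited_authors.update(authors[:max_authors])
        # Each unordered author pair is counted once per citing paper.
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts

# Toy data (made up): two citing papers, each citing two works.
papers = [
    [["A", "B"], ["C"]],
    [["A"], ["D", "C"]],
]
first_author = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
```

In this toy example, authors A and C are co-cited twice under first-five counting but only once under first-author counting, because C appears as a second author in one of the cited works.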
