
    A multi-dimensional quality assessment of state-of-the-art process discovery algorithms using real-life event logs

    Process mining is the research domain that is dedicated to the a posteriori analysis of business process executions. The techniques developed within this research area are specifically designed to provide profound insight by exploiting the untapped reservoir of knowledge that resides within event logs of information systems. Process discovery is one specific subdomain of process mining that entails the discovery of control-flow models from such event logs. Assessing the quality of discovered process models is an essential element, both for conducting process mining research and for the use of process mining in practice. In this paper, a multi-dimensional quality assessment is presented in order to comprehensively evaluate process discovery techniques. In contrast to previous studies, the major contribution of this paper is the use of eight real-life event logs. For instance, we show that evaluation based on real-life event logs significantly differs from the traditional approach of assessing process discovery techniques using artificial event logs. In addition, we provide an extensive overview of available process discovery techniques and we describe how discovered process models can be assessed regarding both accuracy and comprehensibility. The results of our study indicate that the HeuristicsMiner algorithm is especially suited to a real-life setting. However, it is also shown that, particularly for highly complex event logs, knowledge discovery from such data sets can become a major problem for traditional process discovery techniques.

    Building intelligent credit scoring systems using decision tables

    Accuracy and comprehensibility are two important criteria when developing decision support systems for credit scoring. In this paper, we focus on the second criterion and propose the use of decision tables as an alternative knowledge visualization formalism which lends itself very well to building intelligent and user-friendly credit scoring systems. Starting from a set of propositional if-then rules extracted by a neural network rule extraction algorithm, we develop decision tables and demonstrate their efficiency and user-friendliness for two real-life credit scoring cases.

    Ant-based approach to the knowledge fusion problem

    Data mining involves the automated process of finding patterns in data and has been a research topic for decades. Although very powerful data mining techniques exist to extract classification models from data, the techniques often infer counter-intuitive patterns or lack patterns that are logical for domain experts. The problem of consolidating the knowledge extracted from the data with the knowledge representing the experience of domain experts is called the knowledge fusion problem. Providing a proper solution for this problem is a key success factor for any data mining application. In this paper, we explain how the AntMiner+ classification technique can be extended to incorporate such domain knowledge. By changing the environment and influencing the heuristic values, we can respectively limit and direct the search of the ants to those regions of the solution space that the expert believes to be logical and intuitive.

    Mining software repositories for comprehensible software fault prediction models

    Software managers are routinely confronted with software projects that contain errors or inconsistencies and exceed budget and time limits. By mining software repositories with comprehensible data mining techniques, predictive models can be induced that offer software managers the insights they need to tackle these quality and budgeting problems in an efficient way. This paper deals with the role that the Ant Colony Optimization (ACO)-based classification technique AntMiner+ can play as a comprehensible data mining technique to predict erroneous software modules. In an empirical comparison on three real-world public datasets, the rule-based models produced by AntMiner+ are shown to achieve a predictive accuracy that is competitive with that of the models induced by several other included classification techniques, such as C4.5, logistic regression and support vector machines. In addition, we will argue that the intuitiveness and comprehensibility of the AntMiner+ models can be considered superior to those of the latter models.

    Ants constructing rule-based classifiers

    Swarm Intelligence is an innovative distributed intelligent paradigm for solving optimization problems that originally took its inspiration from biological examples of swarming, flocking and herding phenomena in vertebrates. Data Mining is an analytic process designed to explore large amounts of data in search of consistent patterns and/or systematic relationships between variables, and then to validate the findings by applying the detected patterns to new subsets of data. This book deals with the application of swarm intelligence in data mining. Addressing the various issues of swarm intelligence and data mining using different intelligent approaches is the novelty of this edited volume. This volume comprises 11 chapters, including an introductory chapter giving the fundamental definitions and some important research challenges. Important features include the detailed overview of the various swarm intelligence and data mining paradigms, excellent coverage of timely, advanced data mining topics, state-of-the-art theoretical research and application developments, and chapters authored by pioneers in the field. Academics, scientists and engineers engaged in research, development and application of optimization techniques and data mining will find the comprehensive coverage of this book invaluable. Written for: Engineers, researchers, and graduate students in Computational Intelligence.

    Predicting loss given default

    The topic of credit risk modeling has arguably become more important than ever before given the recent financial turmoil. In conformance with the international Basel accords on banking supervision, financial institutions need to prove that they hold sufficient capital to protect themselves and the financial system against unforeseen losses caused by defaulters. In order to determine the required minimal capital, empirical models can be used to predict the loss given default (LGD). The main objectives of this doctoral thesis are to obtain new insights into how to develop and validate predictive LGD models through regression techniques. The first part reveals how well real-life LGD can be predicted and which techniques are best. Its value lies in particular in the use of default data from six major international financial institutions and the evaluation of twenty-four different regression techniques, making this the largest LGD benchmarking study so far. Nonetheless, it is found that the resulting models have limited predictive performance no matter what technique is employed, although non-linear techniques yield higher performance than traditional linear techniques. The results of this study strongly advocate the need for financial institutions to invest in the collection of more relevant data. The second part introduces a novel validation framework to backtest the predictive performance of LGD models. The proposed key idea is to assess the test performance relative to the performance during model development with statistical hypothesis tests based on commonly used LGD predictive performance metrics. The value of this framework lies in providing a solution to the lack of reference values to determine acceptable performance and to possible performance bias caused by too little data. This study offers financial institutions a practical tool to prove the validity of their LGD models and corresponding predictions as required by national regulators.
The third part uncovers whether the optimal regression technique can be selected based on typical characteristics of the data. Its value lies especially in the use of the recently introduced concept of datasetoids, which allows the generation of thousands of datasets representing real-life relations, thereby circumventing the scarcity of publicly available real-life datasets and making this the largest meta-learning regression study so far. It is found that typical data-based characteristics do not play any role in the performance of a technique. Nonetheless, it is proven that algorithm-based characteristics are good drivers to select the optimal technique. This thesis may be valuable for any financial institution implementing credit risk models to determine their minimal capital requirements in compliance with the Basel accords. The new insights provided in this thesis may support financial institutions in developing and validating their own LGD models. The results of the benchmarking and meta-learning studies can help financial institutions to select the appropriate regression technique to model their LGD portfolios. In addition, the proposed backtesting framework, together with the benchmarking results, can be employed to support the validation of the internally developed LGD models.

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first of a cited work's authors, and a non-traditional type, which takes into account the first five authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.

    Distributed business process coordination.

    Business Process Management (BPM) and Process Aware Information Systems (PAIS) are becoming more and more integrated in today's business environments. The formulation of business strategies leads to business requirements for business processes. These business processes are designed, modeled and documented using formal, commercial or ad-hoc modeling languages. Furthermore, some modeled business processes get translated to executable entities, enabling the automatic enactment and follow-up of a business process. Especially in combination with Service Oriented Architectures (SOA), an enactment environment is created where processes can be deployed and executed automatically. From a managerial and technical point of view, the interpretation, control and enactment of a business process's control flow very often happen at one point in the organizational and IT structure. This creates an inflexible environment, where control over and visibility of cross-departmental processes cannot be distributed across these organizational entities. Moreover, a centralized approach creates a performance bottleneck and single point of failure in process execution. As the number of process instantiation requests increases, the time to successfully handle one process instance also increases. Process model fragmentation and distribution is a technique to overcome the aforementioned issues. Process model fragmentation is the process of splitting a process model that was modeled as a whole into logically different, smaller model fragments with the intention to distribute the fragments over different execution and controlling partners. This dissertation studies the challenging aspects of process fragmentation and proposes a non-intrusive, automatic approach to fragment and distribute the process flow over different organizational entities, thereby also increasing performance and removing any single point of failure.
Special attention is given to flexibility: each distributed process part becomes autonomous, creating the ability to easily add new process fragments, change the deployment structure and add monitoring and management tools without any additional effort. The first part of this dissertation discusses the fragmented enactment environment, the architecture, transformations and semantics needed to deploy and run a fragmented process model. The second part focuses on runtime adaptability of the process model in the fragmented and distributed environment. Similar to centralized process enactment, the distributed environment should support the ability to respond effectively to process changes. In the second part, the difficulties, advantages and issues of process model change support are discussed. Moreover, a system is proposed which tackles the identified issues and allows the propagation and coordination of process changes in the distributed process execution architecture. Two change types are discussed: top-down and bottom-up. The top-down change is issued by a central coordinator, given the originally designed process model. The bottom-up change is issued by a local fragment coordinator, given only the information available in its own process model fragment. Both thesis parts are supported by a formal and experimental verification and a proof-of-concept implementation, showing the feasibility of the runtime and adaptability potential of the fragmented process execution environment.

    Enterprise architecture for small and medium-sized enterprises : CHOOSE

    Enterprise architecture (EA) is a coherent whole of principles, methods, and models that are used in the design and realization of an enterprise’s organizational structure, business processes, information systems, and IT infrastructure. EA is used as a holistic approach to keep things aligned in a company. Some emphasize the use of EA to align IT with the business; others see it more broadly and use it to also keep the processes aligned with the strategy. Recent research indicates the need for EA in small and medium-sized enterprises (SMEs), important drivers of the economy, as they struggle with problems related to a lack of structure and overview of their business. However, existing EA frameworks are perceived as too complex and, to date, none of the EA approaches are sufficiently adapted to the SME context. Therefore, in this PhD, we present the CHOOSE approach for EA for SMEs. The approach consists of four artifacts: a metamodel, a method, software tool support, and a visualization. The approach is kept simple so that it may be applied in an SME context and is based on the essential dimensions of EA frameworks. Five steps were taken: first, the problem of EA in SMEs was extensively analyzed. Next, the CHOOSE metamodel was developed during action research in SMEs. Then, action research in six companies was used to develop an adequate method (consisting of guidelines, a roadmap, and stop criteria) and to further refine this CHOOSE metamodel, while different types of software tools (PC, iPad, Android, ...) were developed to enable the evaluation rounds. Finally, a proper visualization was established.