Rule Mining with RuM (Extended Abstract)
Declarative process modeling languages are especially suitable to model loosely-structured, flexible business processes. One of the most prominent of these languages is Declare. The Declare language can be used for all process mining branches and a plethora of techniques have been implemented to support process mining with Declare. The process mining application RuM integrates multiple Declare-based process mining methods into a single application and is developed to be the starting point for the use of Declare both in industry and academia. RuM has been evaluated by conducting a qualitative user evaluation, the results of which have been used as input for further development. In this paper, we give a short overview of the current functionalities of RuM, including the main improvements made thus far.
RuM: Declarative Process Mining, Distilled
Flexibility is a key characteristic of numerous business process management domains. In these domains, the paths to fulfil process goals may not be fully predetermined, but can strongly depend on dynamic decisions made based on the current circumstances of a case. A common example is the adaptation of a standard treatment process to the needs of a specific patient. However, high flexibility does not mean chaos: certain key process rules still delimit the execution space, such as rules that prohibit the joint administration of certain drugs in a treatment, due to dangerous interactions. A renowned means to handle flexibility by design is the declarative approach, which aims to define processes through their core behavioural rules, thus leaving room for dynamic adaptation. This declarative approach to both process modelling and mining involves a paradigm shift in process thinking and, therefore, the support of novel concepts and tools. Complementing our tutorial with the same title, this paper provides a high-level introduction to declarative process mining, including its operationalisation through the RuM toolkit, key conceptual considerations, and an outlook for the future
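The drug-interaction rule mentioned in this abstract corresponds to what Declare calls a not-coexistence constraint: two activities must never both occur in the same case. The following is a minimal sketch of checking such a rule against an event log. It does not use RuM or any of its interfaces; the function names, the activity labels, and the list-of-lists log format are all assumptions made for illustration.

```python
def satisfies_not_coexistence(trace, a, b):
    """Return True if activities a and b do not both occur in the trace."""
    return not (a in trace and b in trace)


def violating_cases(log, a, b):
    """Return the indices of the traces in which both a and b occur."""
    return [i for i, trace in enumerate(log)
            if not satisfies_not_coexistence(trace, a, b)]


# Hypothetical treatment log: each trace is the ordered list of activities of one case.
log = [
    ["admit", "give_drug_A", "discharge"],
    ["admit", "give_drug_A", "give_drug_B", "discharge"],
    ["admit", "give_drug_B", "discharge"],
]

violations = violating_cases(log, "give_drug_A", "give_drug_B")
print(violations)
```

Only the second case contains both drugs, so it is the only one reported. The rule says nothing about the order or number of the remaining activities, which is the kind of flexibility the declarative approach is meant to preserve.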
Counting word frequencies based on limited regular expressions
This bachelor's thesis concentrates on developing and implementing an algorithm for a subtask in a biomarker discovery pipeline. The pipeline itself is being developed at the BIIT group at the University of Tartu as part of an industrial collaboration. The input of this algorithm is data about a large number of different biological samples.
The data about these samples are represented by short words and their corresponding frequencies, which make it possible to find significant differences between samples. It is also known that in some cases a limited regular expression gives a much better representation of these differences. However, the frequencies that correspond to a given regular expression are not known in advance and need to be calculated from the words and the frequencies of these words.
This problem can be divided into two parts. First, we need to find all of the words that match the given regular expression; this is achieved by using large bitvectors that are kept in memory at all times. The second part concentrates on calculating the frequencies of the regular expression from the frequencies of the matching words. Speed is achieved here by keeping the frequencies in memory as a sparse matrix, in a format chosen so that matrix rows can be added together as quickly as possible, sample column by sample column.
The resulting algorithm is implemented in both Python and C++. The details of these implementations are given, and finally the speed of both implementations is measured against a naive solution developed for the same task.
The bachelor's thesis results in a program that is able to find the frequencies of input regular expressions with sufficient speed.
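The two-part problem described in this abstract can be sketched in a few lines. This is closer to the naive baseline than to the thesis algorithm: it rescans every word instead of using precomputed bitvectors, and it stores the sparse frequencies as nested dictionaries. The function name, the data layout, and the sample data are assumptions for illustration only.

```python
import re


def regex_frequencies(word_freqs, pattern, n_samples):
    """Sum per-sample frequencies over all words that fully match the pattern.

    word_freqs maps each word to a sparse row {sample_index: count};
    samples in which the word does not occur are simply absent from the row.
    """
    compiled = re.compile(pattern)
    totals = [0] * n_samples
    for word, row in word_freqs.items():
        # Part 1: decide whether the word matches the regular expression.
        if compiled.fullmatch(word):
            # Part 2: add the word's sparse row to the running totals.
            for sample, count in row.items():
                totals[sample] += count
    return totals


# Hypothetical data: four short words observed in two samples.
word_freqs = {
    "ACGT": {0: 3},
    "ACTT": {0: 1, 1: 2},
    "GGGT": {0: 5, 1: 5},
    "ACCT": {1: 4},
}

totals = regex_frequencies(word_freqs, r"AC[GC]T", n_samples=2)
print(totals)  # [3, 4]
```

Only `ACGT` and `ACCT` match `AC[GC]T`, so the totals are the sums of those two rows. Precomputing which words match which pattern positions, as the abstract describes with bitvectors, avoids running the regular expression over every word for every query.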
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
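The difference between the two counting schemes can be made concrete with a small sketch. It assumes that two authors are co-cited whenever both appear, within the first `max_authors` positions of some cited work, in the reference list of the same citing paper; under this assumption coauthors of a single cited work also count as co-cited. The function name, the data layout, and the author names are hypothetical and are not taken from the study.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors):
    """Count author pairs that are cited together by the same paper.

    citing_papers is a list of reference lists; each reference is the
    ordered author list of one cited work. Only the first max_authors
    authors of each cited work are taken into account.
    """
    counts = Counter()
    for references in citing_papers:
        cited_authors = set()
        for work_authors in references:
            cited_authors.update(work_authors[:max_authors])
        # Each unordered pair is counted once per citing paper.
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts


# Hypothetical reference lists of two citing papers.
citing_papers = [
    [["Smith", "Lee"], ["Chen"]],
    [["Smith", "Lee"], ["Garcia", "Chen"]],
]

first_author = cocitation_counts(citing_papers, max_authors=1)
first_five = cocitation_counts(citing_papers, max_authors=5)
print(first_author[("Chen", "Smith")], first_five[("Chen", "Smith")])
```

With first-author counting, Chen and Smith are co-cited once, because Chen is a second author in the second paper's reference list. Counting the first five authors raises that to two and also brings Lee into the picture.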
A Desktop Application for Advanced Business Rule Mining
Process mining is one of the research disciplines belonging to the field of Business Process Management (BPM). The central idea of process mining is to use real process execution logs in order to discover, model, and improve business processes. There are multiple approaches to modeling processes with the most prevalent being procedural models. However, procedural models can be difficult to use in cases where the process is less structured and has a high number of different branches and exceptions. In these cases, it may be better to use declarative models, because declarative models do not aim to model the end-to-end processes step by step, but they constrain the behaviour of the process using rules thus allowing for more variability in the process model.
There are multiple applications available for working with procedural models. Examples include Disco and Apromore, both of which have a highly polished user interface and are relatively easy to use. However, there are currently no comparable applications for working with declarative models.
This thesis builds on the Master’s Thesis of D. Kapisiz in order to develop an already existing application, RuM, into an accessible and easy to use process mining application. While RuM itself already has most of the needed functionality, the user interface of RuM is not well polished and does not have an appealing look in general. In this Master’s Thesis we will completely redesign and reimplement the user interface of RuM while also making technical changes in order to enable its continued development. The new user interface has been thoroughly evaluated by conducting a user evaluation involving 4 experts of declarative models and 4 experts of business process mining in general. The main findings of the user evaluation will be presented as a part of this thesis
Variations on the Author
“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourses; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.
Appropriate Similarity Measures for Author Cocitation Analysis
We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.
Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
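The abstract does not say which shortcoming of the Pearson correlation the authors have in mind. As one illustration of how the two measures can diverge, the sketch below shows a well-known property: appending authors with whom neither profile is cocited (extra zeros) leaves the cosine unchanged but changes the Pearson correlation. The profiles are made-up numbers, and the abstract's two new measures are not shown here.

```python
import math


def cosine(x, y):
    """Cosine of the angle between two cocitation profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson correlation, computed as the cosine of the mean-centered profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


# Hypothetical cocitation counts of two authors with three other authors.
x = [4, 2, 1]
y = [1, 2, 4]

# The same profiles after adding five authors with whom neither is cocited.
x_padded = x + [0] * 5
y_padded = y + [0] * 5

print(cosine(x, y), cosine(x_padded, y_padded))
print(pearson(x, y), pearson(x_padded, y_padded))
```

The cosine is the same for both pairs of profiles, while the Pearson correlation moves from negative to positive once the zeros are added, because the zeros shift the means that the profiles are centered on.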
Dispelling the Myths Behind First-author Citation Counts
We conducted a full-scale evaluative citation analysis study of scholars in the XML research field to explore just how different from each other author rankings resulting from different citation counting methods actually are, and to demonstrate the capability of emerging data and tools on the Web in supporting more realistic citation counting methods. Our results contest some common arguments for the continued use of first-author citation counts in the evaluation of scholars, such as high correlations between author rankings by first-author citation counts and other citation counting methods, and high costs of using more realistic citation counting methods that are not well-supported by the ISI databases. It is argued that increasingly available digital full text research papers make it possible for citation analysis studies to go beyond what the ISI databases have directly supported and to employ more sophisticated methods.
