Closed sequential pattern mining for sitemap generation
A sitemap represents an explicit specification of the design concept and knowledge organization of a website and is therefore considered the website's basic ontology. It not only presents the main usage flows for users, but also hierarchically organizes the concepts of the website. Typically, sitemaps are defined by webmasters in the very early stages of website design. However, during their lifetime websites significantly change their structure, their content and their possible navigation paths. Even when this is not the case, webmasters can fail either to define sitemaps that reflect the actual website content or, conversely, to organize pages and links so that they reflect the intended organization of the content encoded in the sitemaps. In this paper we propose an approach which automatically generates sitemaps. Contrary to other approaches proposed in the literature, which mainly generate sitemaps from the textual content of the pages, in this work sitemaps are generated by analyzing the Web graph of a website. This allows us to: i) automatically generate a sitemap on the basis of possible navigation paths, ii) compare the generated sitemaps with either the sitemap provided by the Web designer or with the intended sitemap of the website and, consequently, iii) plan possible website re-organization. The solution we propose is based on closed frequent sequence extraction and concentrates only on hyperlinks organized in "Web lists", which are logical lists embedded in the pages. These "Web lists" are typically used to support users in website navigation, and they include menus, navbars and content tables. Experiments performed on three real datasets show that the extracted sitemaps are much more similar to those defined by website curators than those obtained by competitor algorithms.
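As a rough illustration of what "closed frequent sequence extraction" means, the sketch below mines closed frequent sequences from a few toy link lists by brute force. It is not the algorithm used in the paper: the page names, the `min_support` value, the function names and the choice to allow gaps between matched items are all assumptions made for the example. A sequence is frequent if it occurs in at least `min_support` input sequences, and closed if no longer frequent sequence containing it has the same support.

```python
from itertools import combinations


def is_subsequence(pattern, sequence):
    """True if pattern occurs in sequence in order (gaps allowed; an assumption)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support(pattern, sequences):
    """Number of input sequences that contain the pattern."""
    return sum(is_subsequence(pattern, s) for s in sequences)


def frequent_sequences(sequences, min_support):
    """Brute force: enumerate every subsequence of every input, keep the frequent ones."""
    candidates = set()
    for s in sequences:
        for n in range(1, len(s) + 1):
            for idx in combinations(range(len(s)), n):
                candidates.add(tuple(s[i] for i in idx))
    frequent = {}
    for c in candidates:
        sup = support(c, sequences)
        if sup >= min_support:
            frequent[c] = sup
    return frequent


def closed_sequences(frequent):
    """Keep patterns that have no longer super-sequence with the same support."""
    closed = {}
    for p, sup in frequent.items():
        absorbed = any(
            len(q) > len(p) and q_sup == sup and is_subsequence(p, q)
            for q, q_sup in frequent.items()
        )
        if not absorbed:
            closed[p] = sup
    return closed


# Hypothetical link sequences, e.g. read off menus of three pages.
web_lists = [
    ("home", "research", "people"),
    ("home", "research", "projects"),
    ("home", "teaching"),
]
closed = closed_sequences(frequent_sequences(web_lists, min_support=2))
```

Here `("research",)` is frequent but not closed, because `("home", "research")` contains it and has the same support. Exhaustive enumeration is only workable on tiny inputs.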
Recent advances in mining patterns from complex data
Data mining and knowledge discovery are advanced research fields with numerous algorithms and studies to extract patterns and models from complex data sources like blogs, event or log data, biological data, spatio-temporal data, social networks, mobility data, and sensor data and streams. The works presented in this special issue of the Journal of Intelligent Information Systems should hold the attention of both researchers and practitioners of data mining who are interested in the advances and latest developments in the area of extracting patterns. Behavioral Process Mining for Unstructured Processes by Claudia Diamantini, Laura Genga and Domenico Potena addresses the challenging problem of extracting useful information from the huge volume of events recorded by several of today's enterprise systems.
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first of a cited work's authors, and a non-traditional type, which takes into account the first 5 authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
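The difference between the two counting schemes can be made concrete with a small sketch. The data, the author names and the `cocitation_counts` function are hypothetical, and the counting rule is a simplification assumed for illustration: each citing paper contributes at most one count per author pair, and co-authors of the same cited work are also treated as co-cited. The study's exact definitions may differ.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author pairs cited together, using the first max_authors of each cited work.

    citing_papers: list of reference lists; each reference is an ordered author list.
    """
    counts = Counter()
    for references in citing_papers:
        cited_authors = set()
        for authors in references:
            cited_authors.update(authors[:max_authors])
        # Each citing paper adds one count per unordered pair of cited authors.
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts


# Two hypothetical citing papers, each citing two works.
papers = [
    [["Smith", "Lee"], ["Jones", "Kim"]],
    [["Smith"], ["Kim", "Jones"]],
]
first_author = cocitation_counts(papers, max_authors=1)  # traditional counting
first_five = cocitation_counts(papers, max_authors=5)    # first 5 authors
```

With first-author counting, Smith and Jones are co-cited once, because Jones is a second author in the second paper. Counting the first 5 authors picks up that second co-citation, so the same pair is counted twice.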
Distributed and explainable GHSOM for anomaly detection in sensor networks
The identification of anomalous activities is a challenging and crucially important task in sensor networks. This task is becoming increasingly complex with the increasing volume of data generated in real-world domains, and greatly benefits from the use of predictive models to identify anomalies in real time. A key use case for this task is the identification of misbehavior that may be caused by involuntary faults or deliberate actions. However, currently adopted anomaly detection methods are often affected by limitations such as the inability to analyze large-scale data, a reduced effectiveness when data presents multiple densities, a strong dependence on user-defined threshold configurations, and a lack of explainability in the extracted predictions. In this paper, we propose a distributed deep learning method that extends growing hierarchical self-organizing maps, originally designed for clustering tasks, to address anomaly detection tasks. The SOM-based modeling capabilities of the method enable the analysis of data with multiple densities, by exploiting multiple SOMs organized as a hierarchy. Our map-reduce implementation under Apache Spark allows the method to process and analyze large-scale sensor network data. An automatic threshold-tuning strategy reduces user efforts and increases the robustness of the method with respect to noisy instances. Moreover, an explainability component resorting to instance-based feature ranking emphasizes the most salient features influencing the decisions of the anomaly detection model, supporting users in their understanding of raised alerts. Experiments are conducted on five real-world sensor network datasets, including wind and photovoltaic energy production, vehicular traffic, and pedestrian flows. Our results show that the proposed method outperforms state-of-the-art anomaly detection competitors. Furthermore, a scalability analysis reveals that the method is able to scale linearly as the data volume presented increases, leveraging multiple worker nodes in a distributed computing setting. Qualitative analyses on the level of anomalous pollen in the air further emphasize the effectiveness of our proposed method, and its potential in determining the level of danger in raised alerts.
Autophagy inhibitors in the treatment of colorectal cancer: a brief review
Colorectal cancer (CRC) is the third most frequent cancer. The first-line adjuvant or neoadjuvant chemotherapy is 5-fluorouracil (5-FU), but its application is limited by the induction of chemoresistance. Recent studies have shown that 5-FU resistance in CRC is closely related to the activation of autophagy. During human carcinogenesis, autophagy has been demonstrated to play opposite roles, as inhibitor or promoter of malignant progression, depending on whether growth is at an initial or advanced stage. Currently, chloroquine (CQ) and its derivative, hydroxychloroquine (HCQ), are the only autophagy inhibitors approved by the Food and Drug Administration (FDA) for clinical use. This review summarizes recent findings on the possible use of autophagy inhibitors to overcome chemoresistance in CRC.
Introduction to the special issue on discovery science
Welcome to the Discovery Science special issue. This issue contains both extended papers from the Discovery Science 2016 conference, held in Bari, Italy (19–21 October 2016), and new contributions solicited by an open call. Discovery science is a research discipline spanning multiple areas, including the development and analysis of methods for discovering scientific knowledge, drawn from machine learning, data mining, and intelligent data analysis, as well as their application in various scientific domains including, but not limited to, the biomedical, astronomical, physical and social sciences. Applications to massive, heterogeneous, complex, continuous or imprecise data sets are of particular interest for the discipline.
