    H-3: A hybrid handheld healthcare framework

    Handheld devices, which have been widely adopted in clinical environments for medical data archiving, have revolutionized the error-prone manual processes of the past. However, their use has been reported to be limited to data entry and archiving, without fully leveraging their computing and retrieval capabilities. This paper studies a hybrid system that complements the current state of the art by combining intelligent retrieval techniques developed for middleware environments, for retrieval effectiveness, with flash-aware data management techniques, for retrieval efficiency. By enabling intelligent ranked retrieval, the limited resources of handheld devices, e.g., limited display and computation capabilities, can be utilized effectively by selectively retrieving the few most relevant results. To achieve this goal, however, we need a hybrid approach that adapts middleware-based ranked retrieval techniques to the flash memory storage typically adopted by handheld devices. We address this newly emerging challenge and propose a flash-aware framework, H-3, whose effectiveness we empirically validate over baseline alternatives.

    Report on Data-intensive Software Management and Mining


    Search structures and algorithms for personalized ranking

    As data of an unprecedented scale become accessible on the Web, personalization, that is, narrowing down retrieval to meet user-specific information needs, is becoming more and more critical. For instance, while web search engines traditionally retrieve the same results for all users, they have begun to offer beta services that personalize results to adapt to user-specific contexts such as prior search history or other application contexts. In clear contrast to search engines dealing with unstructured text data, this paper studies how to enable such personalization in the context of structured data retrieval. In particular, we adopt a contextual ranking model to formalize personalization as a cost-based optimization over collected contextual rankings. With this formalism, personalization can be abstracted as a cost-optimal retrieval of contextual rankings closely matching the user-specific retrieval context. With the retrieved matching context, we adopt a machine learning approach to effectively and efficiently identify the ideal personalized ranked results for this specific user. Our empirical evaluations over synthetic and real-life data validate both the efficiency and effectiveness of our framework. (c) 2008 Elsevier Inc. All rights reserved.

    Probe minimization by schedule optimization: Supporting top-k queries with expensive predicates

    This paper addresses the problem of evaluating ranked top-k queries with expensive predicates. As major DBMSs now all support expensive user-defined predicates for Boolean queries, we believe such support for ranked queries will be even more important: First, ranked queries often need to model user-specific concepts of preference, relevance, or similarity, which call for dynamic user-defined functions. Second, middleware systems must incorporate external predicates for integrating autonomous sources typically accessible only by per-object queries. Third, ranked queries often accompany Boolean ranking conditions, which may turn predicates into expensive ones, as the index structure on the predicate built on the base table may no longer be effective in retrieving the filtered objects in order. Fourth, fuzzy joins are inherently expensive, as they are essentially user-defined operations that dynamically associate multiple relations. These predicates, being dynamically defined or externally accessed, cannot rely on index mechanisms to provide zero-time sorted output, and instead require per-object probes to evaluate. To enable probe minimization, we formulate the problem as a cost-based optimization that searches over potential probe schedules. In particular, we decouple probe scheduling into object and predicate scheduling problems and develop an analytical object scheduling optimization and a dynamic predicate scheduling optimization, which together form a cost-effective probe schedule.
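The idea of probing only when necessary can be illustrated with a minimal sketch. It is not the paper's scheduling optimization: it assumes scores in [0, 1], `min` as the monotone scoring function, a fixed predicate order, and an optimistic bound of 1.0 for every predicate not yet probed. The function name `topk_min_probe` and the sample data are hypothetical.

```python
import heapq
from typing import Callable, List, Sequence, Tuple


def topk_min_probe(
    objects: Sequence[str],
    predicates: Sequence[Callable[[str], float]],
    k: int,
) -> Tuple[List[Tuple[str, float]], int]:
    """Return the top-k objects under min-aggregation and the number of probes used.

    Assumption: every predicate returns a score in [0, 1], so treating an
    unprobed predicate as 1.0 gives an upper bound on the final score.
    """
    probes = 0
    # Max-heap via negated upper bound: (-upper_bound, object, next predicate index).
    heap = [(-1.0, obj, 0) for obj in objects]
    heapq.heapify(heap)
    results: List[Tuple[str, float]] = []
    while heap and len(results) < k:
        neg_bound, obj, idx = heapq.heappop(heap)
        if idx == len(predicates):
            # Fully probed and still on top: no other object can score higher.
            results.append((obj, -neg_bound))
            continue
        # Probe only the object that currently has the highest upper bound.
        score = predicates[idx](obj)
        probes += 1
        heapq.heappush(heap, (-min(-neg_bound, score), obj, idx + 1))
    return results, probes


# Hypothetical scores for two expensive predicates.
p1 = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.2}
p2 = {"a": 0.7, "b": 0.85, "c": 0.9, "d": 0.95}
results, probes = topk_min_probe(list(p1), [p1.__getitem__, p2.__getitem__], k=1)
print(results, probes)  # [('b', 0.8)] 6
```

In this example the second predicate is never probed for `c` and `d`, because their upper bounds after the first probe already fall below the score of `b`; exhaustive evaluation would need 8 probes.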

    Scalable skyline computation using a balanced pivot selection technique

    Skyline queries have recently received considerable attention as an alternative decision-making operator in the database community. The conventional skyline algorithms have primarily focused on optimizing the dominance of points in order to remove non-skyline points as efficiently as possible, but have neglected to take into account the incomparability of points in order to bypass unnecessary comparisons. To design a scalable skyline algorithm, we first analyze a cost model that copes with both dominance and incomparability, and develop a novel technique to select a cost-optimal point, called a pivot point, that minimizes the number of comparisons in point-based space partitioning. We then implement the proposed pivot point selection technique in the existing sorting- and partitioning-based algorithms. For point insertions/deletions, we also discuss how to maintain the current skyline using a skytree, derived from recursive point-based space partitioning. Furthermore, we design an efficient greedy algorithm for the k representative skyline using the skytree. Experimental results demonstrate that the proposed algorithms are significantly faster than the state-of-the-art algorithms. (C) 2013 Elsevier Ltd. All rights reserved.

    Toward efficient multidimensional subspace skyline computation

    Skyline queries have attracted considerable attention for assisting multicriteria analysis of large-scale datasets. In this paper, we focus on multidimensional subspace skyline computation, which has been actively studied along two approaches. First, to narrow down a full-space skyline, users may consider multiple subspace skylines reflecting their interest. For this purpose, we tackle the concept of a skycube, which consists of all possible non-empty subspace skylines in a given full space. Second, to understand the diverse semantics of subspace skylines, we address skyline groups, in which a skyline point (or a set of skyline points) is annotated with decisive subspaces. Our primary contributions are to identify common building blocks of the two approaches and to develop orthogonal optimization principles that benefit both approaches. Our experimental results show the efficiency of the proposed algorithms by comparing them with state-of-the-art algorithms on both synthetic and real-life datasets.
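To make the skycube definition concrete, the following naive sketch enumerates every non-empty subset of dimensions and computes its skyline by pairwise comparison. It assumes smaller values are preferred on every dimension; the function names and sample points are hypothetical, and the brute-force enumeration is exactly the cost that the optimizations described in the abstract aim to avoid.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point, dims: Sequence[int]) -> bool:
    """p dominates q on dims: no worse on every dimension, strictly better on one.

    Assumption: smaller values are preferred.
    """
    return all(p[d] <= q[d] for d in dims) and any(p[d] < q[d] for d in dims)


def skyline(points: Sequence[Point], dims: Sequence[int]) -> List[Point]:
    """Points not dominated by any other point when only dims are considered."""
    return [p for p in points if not any(dominates(q, p, dims) for q in points)]


def skycube(points: Sequence[Point], n_dims: int) -> Dict[Tuple[int, ...], List[Point]]:
    """Map every non-empty subspace (a tuple of dimension indices) to its skyline."""
    cube: Dict[Tuple[int, ...], List[Point]] = {}
    for size in range(1, n_dims + 1):
        for dims in combinations(range(n_dims), size):
            cube[dims] = skyline(points, dims)
    return cube


points = [(1, 4), (2, 2), (4, 1), (3, 3)]
for dims, sky in skycube(points, n_dims=2).items():
    print(dims, sky)
# (0,) [(1, 4)]
# (1,) [(4, 1)]
# (0, 1) [(1, 4), (2, 2), (4, 1)]
```

A space with d dimensions has 2^d - 1 non-empty subspaces, so this enumeration grows exponentially with d, and each subspace skyline costs a quadratic number of comparisons here.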

    Optimizing top-k queries for middleware access: A unified cost-based approach

    This article studies the optimization of top-k queries in middleware. While many assorted algorithms have been proposed, none is generally applicable to a wide range of possible scenarios. Existing algorithms lack both the "generality" to support a wide range of access scenarios and the systematic "adaptivity" to account for runtime specifics. To fill this critical gap, we take a cost-based optimization approach: By runtime search over a space of algorithms, cost-based optimization is general across a wide range of access scenarios, yet adaptive to the specific access costs at runtime. While such optimization has been taken for granted for relational queries from early on, it has been clearly lacking for ranked queries. In this article, we thus identify and address the barriers to realizing such a unified framework. As the first barrier, we need to define a "comprehensive" space encompassing all possibly optimal algorithms to search over. As the second barrier and a conflicting goal, such a space should also be "focused" enough to enable efficient search. For SQL queries that are explicitly composed of relational operators, such a space, by definition, consists of schedules of relational operators (or "query plans"). In contrast, top-k queries do not have logical tasks, such as relational operators. We thus define the logical tasks of top-k queries as building blocks to identify a comprehensive and focused space for top-k queries. We then develop efficient search schemes over such a space for identifying the optimal algorithm. Our study indicates that our framework not only unifies, but also outperforms existing algorithms specifically designed for their scenarios.

    Efficient entity matching using materialized lists

    Entity matching (EM) is the task of identifying records that refer to the same entity from different sources. EM is widely used in real-world applications such as data integration and data cleaning, but the naive method of EM leads to exhaustive pair-wise comparisons. To enhance the efficiency of EM, we transform EM into the top-k query problem of identifying the best k results for a given match function, and propose a new EM algorithm using pre-materialized lists, which refer to the sorted lists of record pairs. Our proposed algorithm identifies the EM results with sub-linear cost using the materialized lists. However, because it requires materializing sorted lists of all record pairs, this approach can be impractical. To address this problem, we reduce the size of the materialized lists so that they store only 1% of all pairs without sacrificing EM accuracy. This method is inspired by the notion of skyline queries. In addition, we extend our proposed framework to collective entity matching that exploits both attributes and the reference relationships across records. Experimental results show that the proposed algorithms are an order of magnitude faster than the state-of-the-art algorithms without compromising accuracy. (C) 2013 Elsevier Inc. All rights reserved.
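One well-known way to answer a top-k query over sorted lists without scanning them fully is Fagin's threshold algorithm, sketched below as an illustration of how materialized lists can yield sub-linear cost. This is not necessarily the algorithm proposed in the paper. The sketch assumes one list per attribute, each holding every record pair with its similarity in descending order, and a weighted sum as the match function; all names and data are hypothetical.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

SortedList = List[Tuple[str, float]]  # (pair id, similarity), sorted descending


def threshold_topk(
    lists: Sequence[SortedList], weights: Sequence[float], k: int
) -> Tuple[List[Tuple[str, float]], int]:
    """Return the k best pairs by weighted-sum score and the scan depth reached.

    Assumption: every pair appears in every list, so all lists have equal length.
    """
    lookup: List[Dict[str, float]] = [dict(lst) for lst in lists]  # random access
    scores: Dict[str, float] = {}
    depth_reached = 0
    for depth in range(len(lists[0])):
        depth_reached = depth + 1
        frontier = []
        for lst in lists:
            pair, sim = lst[depth]  # sorted access
            frontier.append(sim)
            if pair not in scores:
                scores[pair] = sum(w * t[pair] for w, t in zip(weights, lookup))
        # No unseen pair can score above the aggregate of the current frontier.
        threshold = sum(w * s for w, s in zip(weights, frontier))
        best = heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])
        if len(best) == k and best[-1][1] >= threshold:
            return best, depth_reached
    return heapq.nlargest(k, scores.items(), key=lambda kv: kv[1]), depth_reached


# Hypothetical per-attribute similarity lists for four candidate record pairs.
name_sim = [("p1", 0.875), ("p2", 0.75), ("p3", 0.25), ("p4", 0.125)]
addr_sim = [("p2", 1.0), ("p1", 0.5), ("p4", 0.375), ("p3", 0.125)]
print(threshold_topk([name_sim, addr_sim], weights=[1.0, 1.0], k=1))
# ([('p2', 1.75)], 2)
```

Here the scan stops at depth 2 of 4: the best seen score (1.75) already exceeds the threshold (0.75 + 0.5 = 1.25), so pairs `p3` and `p4` are never scored.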

    Supporting efficient distributed skyline computation using skyline views

    Skyline queries return a set of objects, or a skyline, that are not dominated by any other objects. While providing users with an intuitive query formulation, skyline queries may return too many results, especially for high-dimensional data. To tackle this problem, subspace skyline queries, which deal with a subset of dimensions, have been recently studied. To identify interesting skylines, users can iteratively refine multiple relevant subspaces for skyline queries. Existing work focuses primarily on supporting efficient subspace skyline computation in centralized databases. In clear contrast, this paper aims to address subspace skyline computation in distributed environments such as the Web. Toward this goal, we make use of pre-computed subspace skylines as views in databases, called skyline views. Specifically, we propose distributed subspace skyline computation which minimizes the total access cost by leveraging the skyline views. Our experimental results validate that our proposed algorithms significantly outperform state-of-the-art algorithms on extensive synthetic datasets. (C) 2011 Elsevier Inc. All rights reserved.

    Efficient bitmap-based indexing of time-based interval sequences

    In this paper, we discuss similarity searches for time series data represented as interval sequences. For instance, the time series of phone call records can be represented by time-based interval sequences, or T-interval sequences, which consist of the start and end times of the call records. To support an efficient similarity search for such sequences, we address the desirable semantics for similarity measures for the T-interval sequences, observe how existing measures fail to address such semantics, and propose a new measure that satisfies all our semantics. We then propose approximate encoding methods for T-interval sequences. More specifically, we propose two bitmap-based feature extraction methods: (1) a bin-bitmap encoding method that transforms the T-interval sequences into bitmaps of fixed length, and (2) a segmented feature extraction method that takes the longest bitmap sequences of consecutive '1' elements. Finally, we propose two query processing schemes using these bitmap-based approximate representations. We validate the efficiency and effectiveness of our proposed solutions empirically. (C) 2011 Elsevier Inc. All rights reserved.
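The two feature extraction steps can be sketched roughly as follows. The abstract does not specify the encoding details, so this sketch assumes equal-width bins over a fixed time horizon, with a bit set whenever any interval overlaps the bin, and it reports only the single longest run of 1s. The function names and sample intervals are hypothetical.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # (start, end) with start < end


def bin_bitmap(intervals: Sequence[Interval], horizon: float, n_bins: int) -> List[int]:
    """Encode intervals within [0, horizon) as a fixed-length bitmap.

    Assumption: bin b covers [b * width, (b + 1) * width) and its bit is 1
    if any interval overlaps that range.
    """
    width = horizon / n_bins
    bits = [0] * n_bins
    for start, end in intervals:
        for b in range(n_bins):
            lo, hi = b * width, (b + 1) * width
            if start < hi and end > lo:
                bits[b] = 1
    return bits


def longest_run(bits: Sequence[int]) -> Tuple[int, int]:
    """Return (start index, length) of the longest run of 1s, or (-1, 0) if none."""
    best_start, best_len = -1, 0
    cur_start, cur_len = 0, 0
    for i, bit in enumerate(bits):
        if bit:
            if cur_len == 0:
                cur_start = i
            cur_len += 1
            if cur_len > best_len:
                best_start, best_len = cur_start, cur_len
        else:
            cur_len = 0
    return best_start, best_len


calls = [(1, 3), (3.5, 4), (7, 8)]  # hypothetical call start and end times
bits = bin_bitmap(calls, horizon=10, n_bins=10)
print(bits)               # [0, 1, 1, 1, 0, 0, 0, 1, 0, 0]
print(longest_run(bits))  # (1, 3)
```

The bitmap is approximate by design: the gap between 3 and 3.5 disappears because both neighbouring intervals touch adjacent bins, which is the kind of information loss a fixed-length encoding trades for cheaper comparison.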