
    Optimizing Concurrency Control: Robustness Against Read Committed Revisited

    While serializability always guarantees application correctness, lower isolation levels can be chosen to improve transaction throughput at the risk of introducing certain anomalies. A set of transactions is robust against a given isolation level if every possible interleaving of the transactions under the specified isolation level is serializable. Robustness therefore always guarantees application correctness with the performance benefit of the lower isolation level. In this thesis, we focus on robustness against Read Committed (RC), a popular isolation level offered by many database systems. We consider different flavours of RC, both in single version (SVRC) and multiversion (MVRC) settings. The first main contribution of this thesis is a characterization of robustness against both isolation levels when the workload consists of a set of transactions. These characterizations are in terms of the absence of counterexample schedules of a specific form (multi-split schedules for SVRC and multiversion split schedules for MVRC). Based on these characterizations, we show that robustness against SVRC is coNP-hard and provide a polynomial time algorithm deciding robustness against MVRC. In practice, the total number of possible transactions might be huge, but all these transactions are often generated by a small set of predefined transaction programs. The second main contribution of this thesis is a study of robustness against MVRC where workloads are specified as a set of transaction programs. To identify such cases, we introduce an expressive model of transaction programs, called transaction templates, to better reason about the serializability of these workloads. We develop a tractable algorithm to decide whether any possible schedule over such a workload executed under MVRC is serializable. Our approach yields robust subsets that are larger than those identified by previous methods.
    We provide experimental evidence that workloads that are robust against MVRC can be evaluated faster under MVRC compared to stronger isolation levels. We discuss techniques for making workloads robust against MVRC by promoting selective read operations to updates. Depending on the scenario, the performance improvements can be considerable. Robustness testing and safely executing transactions under the lower isolation level MVRC can therefore provide a direct way to increase transaction throughput without changing DBMS internals. An important insight is that, by more accurately modeling transaction programs, we are able to recognize larger sets of workloads as robust. The third main contribution of this thesis is a further increase of the modeling power of transaction templates by extending them with functional constraints, which are useful for capturing data dependencies like foreign keys. We show that the incorporation of functional constraints can identify more workloads as robust against MVRC that otherwise would not be. Even though we establish that the robustness problem becomes undecidable in its most general form, we show that various restrictions on functional constraints lead to decidable and even tractable fragments that can be used to model and test for robustness against MVRC for realistic scenarios.

    Allocating Isolation Levels to Transactions in a Multiversion Setting

    A serializable concurrency control mechanism ensures consistency for OLTP systems at the expense of a reduced transaction throughput. A DBMS therefore usually offers the possibility to allocate lower isolation levels for some transactions when it is safe to do so. However, such trading of consistency for efficiency does not come with any safety guarantees. In this paper, we study the mixed robustness problem which asks whether, for a given set of transactions and a given allocation of isolation levels, every possible interleaved execution of those transactions that is allowed under the provided allocation is always serializable. That is, whether the given allocation is indeed safe. While robustness has already been studied in the literature for the homogeneous setting where all transactions are allocated the same isolation level, the heterogeneous setting that we consider in this paper, despite its practical relevance, has largely been ignored. We focus on multiversion concurrency control and consider the isolation levels that are available in Postgres and Oracle: read committed (RC), snapshot isolation (SI) and serializable snapshot isolation (SSI). We show that the mixed robustness problem can be decided in polynomial time. In addition, we provide a polynomial time algorithm for computing the optimal robust allocation for a given set of transactions, prioritizing lower over higher isolation levels. The present results therefore establish the groundwork to automate isolation level allocation within existing databases supporting multiversion concurrency control. This work is funded by FWO-grant G019921N.

    Parallel-Correctness and Transferability for Conjunctive Queries under Bag Semantics

    Single-round multiway join algorithms first reshuffle data over many servers and then evaluate the query at hand in a parallel and communication-free way. A key question is whether a given distribution policy for the reshuffle is adequate for computing a given query. This property is referred to as parallel-correctness. Another key problem is to detect whether the data reshuffle step can be avoided when evaluating subsequent queries. The latter problem is referred to as transfer of parallel-correctness. This paper extends the study of parallel-correctness and transfer of parallel-correctness of conjunctive queries to incorporate bag semantics. We provide semantic characterizations for both problems, obtain complexity bounds and discuss the relationship with their set semantics counterparts. Finally, we revisit both problems under a modified distribution model that takes advantage of a linear order on compute nodes and obtain tight complexity bounds.
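
The difference between set and bag semantics described above can be illustrated on a single, fixed instance. The sketch below is not the paper's method: the relations `R(a, b)` and `S(b, c)`, the two-node cluster, the policy functions `by_join_key` and `broadcast`, and the assumption that a node receives a fact with its full multiplicity are all hypothetical choices made for illustration. Parallel-correctness proper quantifies over all instances, whereas this only compares results on one.

```python
from collections import Counter

def bag_join(r, s):
    """Bag join of R(a, b) with S(b, c); inputs map tuples to multiplicities."""
    out = Counter()
    for (a, b), m in r.items():
        for (b2, c), n in s.items():
            if b == b2:
                out[(a, b, c)] += m * n
    return out

def distributed_join(r, s, policy, nodes):
    """Reshuffle facts according to policy, join locally, add up the local bags."""
    total = Counter()
    for node in nodes:
        local_r = Counter({f: m for f, m in r.items() if node in policy("R", f)})
        local_s = Counter({f: m for f, m in s.items() if node in policy("S", f)})
        total += bag_join(local_r, local_s)
    return total

# Hypothetical instance: integer join keys keep the partitioning deterministic.
R = Counter({("a1", 1): 2, ("a2", 2): 1})
S = Counter({(1, "c1"): 1, (2, "c2"): 3})
NODES = [0, 1]

def by_join_key(rel, fact):
    """Send each fact to exactly one node, chosen by its join attribute b."""
    b = fact[1] if rel == "R" else fact[0]
    return {b % len(NODES)}

def broadcast(rel, fact):
    """Send every fact to every node."""
    return set(NODES)

central = bag_join(R, S)
partitioned = distributed_join(R, S, by_join_key, NODES)
replicated = distributed_join(R, S, broadcast, NODES)
```

On this instance, partitioning by the join key reproduces the centralized bag, while broadcasting both relations yields the right set of tuples but inflated multiplicities, since every node derives every output tuple.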

    When View- and Conflict-Robustness Coincide for Multiversion Concurrency Control

    A DBMS allows trading consistency for efficiency through the allocation of isolation levels that are strictly weaker than serializability. The robustness problem asks whether, for a given set of transactions and a given allocation of isolation levels, every possible interleaved execution of those transactions that is allowed under the provided allocation is always safe. In the literature, safe is interpreted as conflict-serializable (to which we refer here as conflict-robustness). In this paper, we study the view-robustness problem, interpreting safe as view-serializable. View-serializability is a more permissive notion that allows for a greater number of schedules to be serializable and aligns more closely with the intuitive understanding of what it means for a database to be consistent. However, view-serializability is more complex to analyze (e.g., conflict-serializability can be decided in polynomial time whereas deciding view-serializability is NP-complete). While conflict-robustness implies view-robustness, the converse does not hold in general. In this paper, we provide a sufficient condition for isolation levels guaranteeing that conflict- and view-robustness coincide and show that this condition is satisfied by the isolation levels occurring in Postgres and Oracle: read committed (RC), snapshot isolation (SI) and serializable snapshot isolation (SSI). It hence follows that for these systems, widening from conflict- to view-serializability does not allow for more sets of transactions to become robust. Interestingly, the complexity of deciding serializability within these isolation levels is still quite different. Indeed, deciding conflict-serializability for schedules allowed under RC and SI remains in polynomial time, while we show that deciding view-serializability within these isolation levels remains NP-complete.
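
The polynomial-time test for conflict-serializability mentioned above is, in its textbook form, an acyclicity check on the conflict graph. The following is a minimal sketch of that textbook test, assuming single-version schedules given as lists of hypothetical `(transaction, action, object)` triples. It does not model the version order of the multiversion schedules studied in the paper.

```python
from itertools import combinations

def conflict_edges(schedule):
    """Edge (Ti, Tj) when an operation of Ti precedes a conflicting one of Tj.

    Two operations conflict if they belong to different transactions,
    touch the same object, and at least one of them is a write.
    """
    edges = set()
    for (t1, a1, o1), (t2, a2, o2) in combinations(schedule, 2):
        if t1 != t2 and o1 == o2 and "W" in (a1, a2):
            edges.add((t1, t2))
    return edges

def is_conflict_serializable(schedule):
    """True iff the conflict graph is acyclic (checked with Kahn's algorithm)."""
    nodes = {t for t, _, _ in schedule}
    succ = {n: set() for n in nodes}
    indeg = {n: 0 for n in nodes}
    for u, v in conflict_edges(schedule):
        succ[u].add(v)
        indeg[v] += 1
    ready = [n for n in nodes if indeg[n] == 0]
    removed = 0
    while ready:
        n = ready.pop()
        removed += 1
        for v in succ[n]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    # Every node can be removed exactly when there is no cycle.
    return removed == len(nodes)

# Lost update: both transactions read x before either writes it.
lost_update = [("T1", "R", "x"), ("T2", "R", "x"), ("T1", "W", "x"), ("T2", "W", "x")]
# The same operations executed one transaction after the other.
serial = [("T1", "R", "x"), ("T1", "W", "x"), ("T2", "R", "x"), ("T2", "W", "x")]
```

In the lost-update schedule the graph contains both `T1 -> T2` and `T2 -> T1`, so the check fails. The serial schedule produces only `T1 -> T2` and passes.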

    Robustness against read committed for transaction templates

    The isolation level Multiversion Read Committed (RC), offered by many database systems, is known to trade consistency for increased transaction throughput. Sometimes, transaction workloads can be safely executed under RC obtaining the perfect isolation of serializability at the lower cost of RC. To identify such cases, we introduce an expressive model of transaction programs to better reason about the serializability of transactional workloads. We develop tractable algorithms to decide whether any possible schedule of a workload executed under RC is serializable (referred to as the robustness problem). Our approach yields robust subsets that are larger than those identified by previous methods. We provide experimental evidence that workloads that are robust against RC can be evaluated faster under RC compared to stronger isolation levels. We discuss techniques for making workloads robust against RC by promoting selective read operations to updates. Depending on the scenario, the performance improvements can be considerable. Robustness testing and safely executing transactions under the lower isolation level RC can therefore provide a direct way to increase transaction throughput without changing DBMS internals.

    Schema Matching with Large Language Models: an Experimental Study

    Large Language Models (LLMs) have shown useful applications in a variety of tasks, including data wrangling. In this paper, we investigate the use of an off-the-shelf LLM for schema matching. Our objective is to identify semantic correspondences between elements of two relational schemas using only names and descriptions. Using a newly created benchmark from the health domain, we propose different so-called task scopes. These are methods for prompting the LLM to do schema matching, which vary in the amount of context information contained in the prompt. Using these task scopes we compare LLM-based schema matching against a string similarity baseline, investigating matching quality, verification effort, decisiveness, and complementarity of the approaches. We find that matching quality suffers from a lack of context information, but also from providing too much context information. In general, using newer LLM versions increases decisiveness. We identify task scopes that have acceptable verification effort and succeed in identifying a significant number of true semantic matches. Our study shows that LLMs have potential in bootstrapping the schema matching process and are able to assist data engineers in speeding up this task solely based on schema element names and descriptions without the need for data instances. S. Vansummeren was supported by the Bijzonder Onderzoeksfonds (BOF) of Hasselt University under Grant No. BOF20ZAP02. This research received funding from the Flemish Government under the “Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen” programme. This work was supported by Research Foundation - Flanders (FWO) for ELIXIR Belgium (I002819N). The resources and services used in this work were provided by the VSC (Flemish Supercomputer Center), funded by the Research Foundation - Flanders (FWO) and the Flemish Government.
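
A string similarity baseline of the kind mentioned above could look like the sketch below. The abstract does not say which similarity measure the study uses, so the choice of `difflib.SequenceMatcher`, the `0.6` threshold, and the example attribute names are assumptions made purely for illustration.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1] between two element names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match(source, target, threshold=0.6):
    """Pair each source name with its most similar target name, if similar enough."""
    matches = []
    for s in source:
        best = max(target, key=lambda t: similarity(s, t))
        score = similarity(s, best)
        if score >= threshold:
            matches.append((s, best, score))
    return matches

# Hypothetical schema element names.
candidates = match(["patient_id", "xyz"], ["PatientID", "admission_date"])
```

Here `patient_id` is paired with `PatientID`, while `xyz` shares no characters with either target and is left unmatched. A purely lexical baseline like this cannot relate names such as `dob` and `birth_date`, which is the kind of semantic correspondence the LLM-based approach aims at.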

    Detecting Robustness against MVRC for Transaction Programs with Predicate Reads

    The transactional robustness problem revolves around deciding whether, for a given workload, a lower isolation level than Serializable is sufficient to guarantee serializability. The paper presents a new characterization for robustness against isolation level (multi-version) Read Committed. It supports transaction programs with control structures (loops and conditionals) and inserts, deletes, and predicate reads: scenarios that trigger the phantom problem, which is known to be hard to analyze in this context. The characterization is graph-theoretic and not unlike previous decision mechanisms known from the concurrency control literature that database researchers and practitioners are comfortable with. We show experimentally that our characterization pushes the frontier by recognizing more, and more complex, workloads as robust than before.
