Optimizing Concurrency Control: Robustness Against Read Committed Revisited
While serializability always guarantees application correctness, lower isolation levels can be
chosen to improve transaction throughput at the risk of introducing certain anomalies. A set
of transactions is robust against a given isolation level if every possible interleaving of the
transactions under the specified isolation level is serializable. Robustness therefore always
guarantees application correctness with the performance benefit of the lower isolation level.
In this thesis, we focus on robustness against Read Committed (RC), a popular isolation
level offered by many database systems. We consider different flavours of RC, both in single-version
(SVRC) and multiversion (MVRC) settings. The first main contribution of this thesis
is a characterization of robustness against both isolation levels when the workload consists of
a set of transactions. These characterizations are in terms of the absence of counterexample
schedules of a specific form (multi-split schedules for SVRC and multiversion split schedules
for MVRC). Based on these characterizations, we show that robustness against SVRC is
coNP-hard and provide a polynomial-time algorithm deciding robustness against MVRC.
In practice, the total number of possible transactions might be huge, but all these transactions are often generated by a small set of predefined transaction programs. The second
main contribution of this thesis is a study of robustness against MVRC where workloads are
specified as a set of transaction programs. To this end, we introduce an expressive model of transaction programs, called transaction templates, to better reason about the
serializability of these workloads. We develop a tractable algorithm to decide whether any
possible schedule over such a workload executed under MVRC is serializable. Our approach
yields robust subsets that are larger than those identified by previous methods. We provide
experimental evidence that workloads that are robust against MVRC can be evaluated faster
under MVRC compared to stronger isolation levels. We discuss techniques for making workloads robust against MVRC by promoting selected read operations to updates. Depending
on the scenario, the performance improvements can be considerable. Robustness testing and
safely executing transactions under the lower isolation level MVRC can therefore provide a
direct way to increase transaction throughput without changing DBMS internals.
An important insight is that, by more accurately modeling transaction programs, we are
able to recognize larger sets of workloads as robust. The third main contribution of this thesis
is a further increase of the modeling power of transaction templates by extending them with
functional constraints, which are useful for capturing data dependencies like foreign keys. We
show that incorporating functional constraints allows more workloads to be identified as robust
against MVRC than would otherwise be possible. Even though we establish that the robustness
problem becomes undecidable in its most general form, we show that various restrictions on
functional constraints lead to decidable and even tractable fragments that can be used to
model and test robustness against MVRC in realistic scenarios.
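The robustness definition above asks that every interleaving admitted by the isolation level be serializable; the literature usually reads this as conflict-serializability, which holds exactly when the schedule's conflict graph is acyclic. The following is a minimal Python sketch of that textbook test, not the thesis's algorithm. It assumes a schedule is a list of (transaction, action, object) triples with actions "r" or "w", checks one schedule at a time, and does not model which schedules RC actually admits.

```python
from collections import defaultdict

def conflict_graph(schedule):
    """Edge Ti -> Tj when an op of Ti precedes a conflicting op of Tj
    (same object, different transactions, at least one write)."""
    edges = defaultdict(set)
    for i, (ti, ai, xi) in enumerate(schedule):
        for tj, aj, xj in schedule[i + 1:]:
            if ti != tj and xi == xj and "w" in (ai, aj):
                edges[ti].add(tj)
    return edges

def has_cycle(edges):
    """Depth-first search with three colours; a GREY successor closes a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)  # unseen nodes default to WHITE (0)

    def visit(u):
        color[u] = GREY
        for v in edges.get(u, ()):
            if color[v] == GREY or (color[v] == WHITE and visit(v)):
                return True
        color[u] = BLACK
        return False

    return any(color[u] == WHITE and visit(u) for u in list(edges))

def conflict_serializable(schedule):
    return not has_cycle(conflict_graph(schedule))

# Lost-update interleaving: T1 -> T2 (r1(x) before w2(x)) and T2 -> T1 (r2(x) before w1(x)).
lost_update = [("T1", "r", "x"), ("T2", "r", "x"), ("T1", "w", "x"), ("T2", "w", "x")]
serial = [("T1", "r", "x"), ("T1", "w", "x"), ("T2", "r", "x"), ("T2", "w", "x")]

print(conflict_serializable(lost_update))  # False
print(conflict_serializable(serial))       # True
```

Robustness then amounts to this test succeeding on every schedule the isolation level allows; enumerating those schedules naively is exponential, which is why characterizations in terms of specific counterexample schedules matter.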
Allocating Isolation Levels to Transactions in a Multiversion Setting
A serializable concurrency control mechanism ensures consistency for OLTP systems at the expense of a reduced transaction throughput. A DBMS therefore usually offers the possibility to allocate lower isolation levels for some transactions when it is safe to do so. However, such trading of consistency for efficiency does not come with any safety guarantees. In this paper, we study the mixed robustness problem which asks whether, for a given set of transactions and a given allocation of isolation levels, every possible interleaved execution of those transactions that is allowed under the provided allocation is always serializable. That is, whether the given allocation is indeed safe. While robustness has already been studied in the literature for the homogeneous setting where all transactions are allocated the same isolation level, the heterogeneous setting that we consider in this paper, despite its practical relevance, has largely been ignored. We focus on multiversion concurrency control and consider the isolation levels that are available in Postgres and Oracle: read committed (RC), snapshot isolation (SI) and serializable snapshot isolation (SSI). We show that the mixed robustness problem can be decided in polynomial time. In addition, we provide a polynomial-time algorithm for computing the optimal robust allocation for a given set of transactions, prioritizing lower over higher isolation levels. The present results therefore establish the groundwork to automate isolation level allocation within existing databases supporting multiversion concurrency control.
This work is funded by FWO-grant G019921N.
Parallel-Correctness and Transferability for Conjunctive Queries under Bag Semantics
Single-round multiway join algorithms first reshuffle data over many servers and then evaluate the query at hand in a parallel and communication-free way. A key question is whether a given distribution policy for the reshuffle is adequate for computing a given query. This property is referred to as parallel-correctness. Another key problem is to detect whether the data reshuffle step can be avoided when evaluating subsequent queries. The latter problem is referred to as transfer of parallel-correctness. This paper extends the study of parallel-correctness and transfer of parallel-correctness of conjunctive queries to incorporate bag semantics. We provide semantic characterizations for both problems, obtain complexity bounds and discuss the relationship with their set semantics counterparts. Finally, we revisit both problems under a modified distribution model that takes advantage of a linear order on compute nodes and obtain tight complexity bounds.
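To make parallel-correctness concrete, the Python sketch below evaluates the join query Q(x, z) :- R(x, y), S(y, z) once centrally and once distributed, for a single fixed instance, and compares the results. Several assumptions are ours rather than the paper's: a policy is modelled as a function from a fact to the set of nodes it is sent to, the multiplicity of an answer under bag semantics is taken to be the number of satisfying valuations, and local results are combined by bag union. Because it checks one instance only, it can refute parallel-correctness but never prove it. The two hypothetical policies illustrate why bag semantics differs from set semantics: replicating every fact to every node yields the right set of answers but counts each answer once per node.

```python
from collections import Counter

def eval_q(R, S):
    """Q(x, z) :- R(x, y), S(y, z); one output per satisfying valuation (bag semantics)."""
    out = Counter()
    for x, y in R:
        for y2, z in S:
            if y == y2:
                out[(x, z)] += 1
    return out

def distributed_eval(R, S, policy, nodes):
    """Evaluate Q locally on every node and combine local results by bag union."""
    total = Counter()
    for n in nodes:
        local_R = {f for f in R if n in policy("R", f)}
        local_S = {f for f in S if n in policy("S", f)}
        total += eval_q(local_R, local_S)
    return total

def hash_on_y(rel, fact):
    """Hypothetical policy: send each fact to node (y mod 2), y being its join value."""
    y = fact[1] if rel == "R" else fact[0]
    return {y % 2}

def replicate_all(rel, fact):
    """Hypothetical policy: send every fact to both nodes."""
    return {0, 1}

R = {(1, 10), (2, 10), (3, 11)}
S = {(10, 5), (11, 6)}
nodes = [0, 1]
central = eval_q(R, S)

print(distributed_eval(R, S, hash_on_y, nodes) == central)      # True
print(distributed_eval(R, S, replicate_all, nodes) == central)  # False: every count doubled
```

Hashing on the join variable places both facts of each valuation on exactly one node, so every answer is produced exactly once; full replication produces each answer on both nodes, which is harmless under set union but not under bag union.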
When View- and Conflict-Robustness Coincide for Multiversion Concurrency Control
A DBMS allows trading consistency for efficiency through the allocation of isolation levels that are strictly weaker than serializability. The robustness problem asks whether, for a given set of transactions and a given allocation of isolation levels, every possible interleaved execution of those transactions that is allowed under the provided allocation is always safe. In the literature, safe is interpreted as conflict-serializable (to which we refer here as conflict-robustness). In this paper, we study the view-robustness problem, interpreting safe as view-serializable. View-serializability is a more permissive notion that allows for a greater number of schedules to be serializable and aligns more closely with the intuitive understanding of what it means for a database to be consistent. However, view-serializability is more complex to analyze (e.g., conflict-serializability can be decided in polynomial time whereas deciding view-serializability is NP-complete). While conflict-robustness implies view-robustness, the converse does not hold in general. In this paper, we provide a sufficient condition for isolation levels guaranteeing that conflict- and view-robustness coincide and show that this condition is satisfied by the isolation levels occurring in Postgres and Oracle: read committed (RC), snapshot isolation (SI) and serializable snapshot isolation (SSI). It hence follows that for these systems, widening from conflict- to view-serializability does not allow for more sets of transactions to become robust. Interestingly, the complexity of deciding serializability within these isolation levels is still quite different. Indeed, deciding conflict-serializability for schedules allowed under RC and SI remains in polynomial time, while we show that deciding view-serializability within these isolation levels remains NP-complete.
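The gap between the two notions can be seen on the classic blind-write schedule r1(x) w2(x) w1(x) w3(x): its conflict graph contains T1 -> T2 (r1(x) before w2(x)) and T2 -> T1 (w2(x) before w1(x)), so it is not conflict-serializable, yet it is view-equivalent to the serial order T1, T2, T3. The Python sketch below is a brute-force view-serializability check that tries every serial order, which is exponential and consistent with the NP-completeness mentioned above; it is an illustration, not the paper's technique. It assumes schedules are lists of (transaction, action, object) triples and compares reads-from relations and final writers.

```python
from collections import Counter
from itertools import permutations

def view_signature(schedule):
    """Reads-from relation and final writer per object.
    A read is identified by (transaction, position within that transaction);
    None stands for the initial database value."""
    last_writer = {}
    reads_from = {}
    pos = Counter()
    for t, a, x in schedule:
        if a == "r":
            reads_from[(t, pos[t])] = last_writer.get(x)
        else:
            last_writer[x] = t
        pos[t] += 1
    return reads_from, last_writer

def view_serializable(schedule):
    """Return a view-equivalent serial order, or None if none exists."""
    txns = list(dict.fromkeys(t for t, _, _ in schedule))
    ops = {t: [op for op in schedule if op[0] == t] for t in txns}
    target = view_signature(schedule)
    for order in permutations(txns):
        serial = [op for t in order for op in ops[t]]
        if view_signature(serial) == target:
            return order
    return None

blind = [("T1", "r", "x"), ("T2", "w", "x"), ("T1", "w", "x"), ("T3", "w", "x")]
lost = [("T1", "r", "x"), ("T2", "r", "x"), ("T1", "w", "x"), ("T2", "w", "x")]

print(view_serializable(blind))  # ('T1', 'T2', 'T3')
print(view_serializable(lost))   # None
```

The blind writes w2(x) and w1(x) are overwritten by w3(x) and never read, which is exactly the kind of freedom view-serializability allows and conflict-serializability does not.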
Robustness against read committed for transaction templates
The isolation level Multiversion Read Committed (RC), offered by
many database systems, is known to trade consistency for increased
transaction throughput. Sometimes, transaction workloads can be
safely executed under RC, obtaining the perfect isolation of serializability at the lower cost of RC. To identify such cases, we introduce
an expressive model of transaction programs to better reason about
the serializability of transactional workloads. We develop tractable
algorithms to decide whether any possible schedule of a workload
executed under RC is serializable (referred to as the robustness
problem). Our approach yields robust subsets that are larger than
those identified by previous methods. We provide experimental evidence that workloads that are robust against RC can be evaluated
faster under RC compared to stronger isolation levels. We discuss
techniques for making workloads robust against RC by promoting
selected read operations to updates. Depending on the scenario,
the performance improvements can be considerable. Robustness
testing and safely executing transactions under the lower isolation
level RC can therefore provide a direct way to increase transaction
throughput without changing DBMS internals.
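As a rough illustration of what a transaction template and read promotion might look like, the Python sketch below models a template as a list of operations over typed variables, instantiates it by mapping variables to concrete tuples, and tests whether two instantiated operations conflict at the attribute level. All names here (TemplateOp, instantiate, conflicts, promote, the "R"/"W"/"U" kinds, and the Account examples) are hypothetical and chosen for illustration; the formalism in the paper may differ in its details.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class TemplateOp:
    kind: str         # "R" = read, "W" = write, "U" = update (read followed by write)
    var: str          # template variable, e.g. "X"
    relation: str     # relation the variable ranges over
    attrs: frozenset  # attributes the operation touches

WRITES = {"W", "U"}

def instantiate(template, mu):
    """Turn a template into a concrete transaction by mapping variables to tuple ids."""
    return [(op.kind, op.relation, mu[op.var], op.attrs) for op in template]

def conflicts(op1, op2):
    """Same tuple, overlapping attributes, and at least one side writes."""
    k1, rel1, t1, a1 = op1
    k2, rel2, t2, a2 = op2
    return (rel1, t1) == (rel2, t2) and bool(a1 & a2) and (k1 in WRITES or k2 in WRITES)

def promote(template, i):
    """Promote the read at position i to an update that writes what it read."""
    assert template[i].kind == "R", "only reads can be promoted"
    return template[:i] + [replace(template[i], kind="U")] + template[i + 1:]

# Hypothetical templates over an Account relation.
deposit = [TemplateOp("U", "X", "Account", frozenset({"balance"}))]
audit = [TemplateOp("R", "X", "Account", frozenset({"owner"}))]
check = [TemplateOp("R", "X", "Account", frozenset({"balance"}))]

d = instantiate(deposit, {"X": "a1"})
a = instantiate(audit, {"X": "a1"})
c = instantiate(check, {"X": "a1"})
print(conflicts(d[0], a[0]))  # False: same tuple, disjoint attributes
print(conflicts(d[0], c[0]))  # True: the update writes the balance that is read
```

Promoting a read to an update turns read-write dependencies into write-write ones; under MVRC, concurrent writes to the same tuple are not both allowed to succeed, which is the intuition behind using promotion to make a workload robust.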
Schema Matching with Large Language Models: an Experimental Study
Large Language Models (LLMs) have shown useful applications in a variety of tasks, including data wrangling. In this paper, we investigate the use of an off-the-shelf LLM for schema matching. Our objective is to identify semantic correspondences between elements of two relational schemas using only names and descriptions. Using a newly created benchmark from the health domain, we propose different so-called task scopes. These are methods for prompting the LLM to do schema matching, which vary in the amount of context information contained in the prompt. Using these task scopes we compare LLM-based schema matching against a string similarity baseline, investigating matching quality,
verification effort, decisiveness, and complementarity of the approaches. We find that matching quality suffers from a lack of context information, but also from providing too much context information. In general, using newer LLM versions increases decisiveness. We identify task scopes that have acceptable verification effort and succeed in identifying a significant number of true semantic matches. Our study shows that LLMs have potential in bootstrapping the schema matching process and are able to assist data engineers in speeding up this task solely based on schema element names and descriptions without the need for data instances.
S. Vansummeren was supported by the Bijzonder Onderzoeksfonds (BOF) of Hasselt University under Grant No. BOF20ZAP02. This research received funding from the Flemish Government under the “Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen” programme. This work was supported by Research Foundation – Flanders (FWO) for ELIXIR Belgium (I002819N). The resources and services used in this work were provided by the VSC (Flemish Supercomputer Center), funded by Research Foundation – Flanders (FWO) and the Flemish Government.
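The abstract does not say which string similarity baseline was used, so the Python sketch below is only one plausible, hypothetical baseline of that kind: it normalizes element names and scores every source/target pair with the standard library's difflib.SequenceMatcher ratio, keeping the best target per source element if it clears a threshold. It ignores element descriptions, and the threshold value and example schemas are assumptions for illustration.

```python
from difflib import SequenceMatcher

def normalize(name):
    """Lower-case and drop underscores and spaces, so patient_id and PatientID compare equal."""
    return name.lower().replace("_", "").replace(" ", "")

def best_matches(source, target, threshold=0.5):
    """Map each source element to its most similar target element (name similarity only)."""
    result = {}
    for s in source:
        scored = [(SequenceMatcher(None, normalize(s), normalize(t)).ratio(), t) for t in target]
        score, t = max(scored)
        if score >= threshold:
            result[s] = (t, round(score, 2))
    return result

source = ["patient_id", "birth_date"]
target = ["PatientID", "date_of_birth"]
print(best_matches(source, target, threshold=0.0))
```

A purely lexical score like this cannot recognize matches whose names share few characters, which is the gap that prompting an LLM with names and descriptions is meant to address.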
Detecting Robustness against MVRC for Transaction Programs with Predicate Reads
The transactional robustness problem revolves around deciding whether, for a given workload, a lower isolation level than Serializable is sufficient to guarantee serializability. The paper presents a new characterization for robustness against the isolation level (multiversion) Read Committed. It supports transaction programs with control structures (loops and conditionals) as well as inserts, deletes, and predicate reads, scenarios that trigger the phantom problem, which is known to be hard to analyze in this context. The characterization is graph-theoretic and similar to decision mechanisms from the concurrency control literature that database researchers and practitioners are familiar with. We show experimentally that our characterization pushes the frontier, recognizing more workloads, and more complex ones, as robust than was previously possible.
