
    The cache sketch: revisiting expiration-based caching in the age of cloud data management

    The expiration-based caching model of the web is generally considered irreconcilable with the dynamic workloads of cloud database services, where expiration dates are not known in advance. In this paper, we present the Cache Sketch data structure, which makes expiration-based caching of database records feasible with rich tunable consistency guarantees. The Cache Sketch enables database services to leverage the large existing caching infrastructure of content delivery networks, browser caches and web caches to provide low latency and high scalability. The Cache Sketch employs Bloom filters to create compact representations of potentially stale records to transfer the task of cache coherence to clients. Furthermore, it also minimizes the number of invalidations the service has to perform on caches that support them (e.g., CDNs). With different age-control policies the Cache Sketch achieves very high cache hit ratios with arbitrarily low stale read probabilities. We present the Constrained Adaptive TTL Estimator to provide cache expiration dates that optimize the performance of the Cache Sketch and invalidations. To quantify the performance gains and to derive workload-optimal Cache Sketch parameters, we introduce the YCSB Monte-Carlo Caching Simulator (YMCA), a generic framework for simulating the performance and consistency characteristics of any caching and replication topology. We also provide empirical evidence for the efficiency of the Cache Sketch construction and the real-world latency reductions of database workloads under CDN caching.
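
The idea of shipping a Bloom filter of potentially stale records to clients can be illustrated with a minimal sketch. It is not the paper's implementation: the `BloomFilter` class, the `read` helper, the filter size, the hash scheme, and the toy keys are all assumptions made for illustration. A client serves a record from its cache only when the filter says the key is definitely not stale; otherwise it goes back to the origin.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter over string keys (m bits, k hash functions)."""

    def __init__(self, m_bits: int = 1024, k_hashes: int = 4) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def read(key, stale_filter, cache, origin):
    """Hypothetical client read path: go to the origin only if the key may be stale."""
    if key in cache and not stale_filter.might_contain(key):
        return cache[key], "cache"
    value = origin[key]
    cache[key] = value  # refresh the cached copy
    return value, "origin"


# Toy data (invented): user:1 was updated after it had been cached.
origin = {"user:1": "v2", "user:2": "v1"}
cache = {"user:1": "v1", "user:2": "v1"}
stale = BloomFilter()
stale.add("user:1")  # the service marks the record as potentially stale

result_stale = read("user:1", stale, cache, origin)
result_fresh = read("user:2", stale, cache, origin)
```

Because a Bloom filter has no false negatives, a record that was added to the filter is always fetched from the origin. A false positive only costs an unnecessary origin request, never a stale read.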

    Who Watches the Watchmen? On the Lack of Validation in NoSQL Benchmarking

    There are numerous approaches towards quantifying the performance of NoSQL datastores with respect to dimensions that are notoriously hard to capture, such as staleness or consistency in general. Many of these approaches, though, are built on assumptions regarding the underlying infrastructure or the test scenario and may lead to invalid results if those assumptions do not hold. As a consequence, in-depth knowledge of both the system under test and the benchmarking procedure is required to prevent misleading results. In this paper, we want to make the case for more experimental validation in NoSQL benchmarking to uncover the bounds of existing benchmarking approaches.

    Skalierbare Push-basierte Echtzeitanfragen auf Pull-basierten Datenbanken

    Many of today's web applications notify users of status updates and other events in real time. But even though more and more usage scenarios revolve around the interaction between users, detecting and publishing changes remains notoriously hard even with state-of-the-art data management systems. While traditional database systems excel at complex queries over historical data, they are inherently pull-based and therefore ill-equipped to push new information to clients. Systems for data stream management and processing, on the other hand, are natively push-oriented and thus facilitate reactive behavior. However, they do not retain data indefinitely and are therefore not able to answer historical queries. The separation between these two system classes gives rise to both high complexity and high maintenance costs for applications that require persistence and real-time change notifications at the same time. How can push-based access be enabled for database queries over historical data collections in a simple and efficient manner? In this thesis, we explore the system space between pull-oriented database systems and push-oriented stream management systems. Specifically, we focus on the novel system class of real-time databases that bridge the gap between both paradigms by providing collection-based semantics for pull-based and push-based queries alike. Through an in-depth system survey, we uncover deficiencies in existing implementations and scale-prohibitive limitations in their respective designs. In order to address these issues, we propose the system design InvaliDB, which makes push-based real-time queries available as an opt-in feature for existing pull-based database systems. InvaliDB exhibits several substantial benefits over current real-time database architectures. First, it avoids the scalability bottlenecks that other systems are constrained by through a novel two-dimensional workload partitioning scheme. Second, our design supports more expressive queries than its peers, including sorted filter queries with limit and offset clauses, aggregations, and joins. Third, InvaliDB is database-agnostic through a pluggable query engine and can therefore be applied to existing (pull-based) application stacks in order to enable push-based data access. We provide an experimental evaluation to demonstrate that sustainable query matching throughput scales linearly with the number of servers employed for query matching, while end-to-end notification latency remains consistently low across all InvaliDB configurations. A detailed case study of our InvaliDB prototype in a production deployment further illustrates that our approach is feasible to implement, enables easy-to-use query interfaces, and is practically useful for data-intensive industry applications.

    Skalierbare Push-basierte Echtzeitanfragen für Pull-basierte DBs

    Traditional database systems are optimized for pull-based queries, i.e., they provide information as a direct response to a client's request. While this access pattern is practical for predominantly static domains, it requires inefficient and slow workarounds (e.g., periodic re-evaluation of a query) when clients need to be kept up to date. Modern real-time databases address this shortcoming conceptually by proactively delivering result updates to their clients through push-based real-time queries. However, the systems currently on the market are of limited practical relevance because they are hard to integrate into existing applications, scale poorly, or do not support complex queries in the first place. To solve these problems, this dissertation proposes the system design InvaliDB, which provides linear read and write scalability for expressive real-time queries as an opt-in feature for pull-based database systems. InvaliDB has been in production use since July 2017 as part of the backend-as-a-service platform of the company Baqend.

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed
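
The difference between the two counting schemes can be made concrete with a small sketch. It is an illustration under stated assumptions, not the study's procedure: the function `cocitation_counts`, the toy reference lists, and the author names A to D are invented. The sketch also counts each author pair at most once per citing paper and pairs co-authors of a single cited work, which are simplifications.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists, max_authors):
    """Count author pairs that appear together in one citing paper's references.

    reference_lists: one list per citing paper; each cited work is a list of
    author names in byline order. Only the first `max_authors` authors of
    each cited work are taken into account.
    """
    counts = Counter()
    for cited_works in reference_lists:
        authors = set()
        for byline in cited_works:
            authors.update(byline[:max_authors])
        # Each unordered pair is counted once per citing paper.
        for pair in combinations(sorted(authors), 2):
            counts[pair] += 1
    return counts


# Toy data: two citing papers, each citing two works.
papers = [
    [["A", "B"], ["C"]],
    [["A", "D"], ["C", "B"]],
]

first_author = cocitation_counts(papers, max_authors=1)  # traditional counting
first_five = cocitation_counts(papers, max_authors=5)    # first 5 authors
```

With first-author counting only the pair (A, C) receives any co-citations in this toy data, while counting up to five authors also links B and D to the others.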

    Variations on the Author

    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourse; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.

    Appropriate Similarity Measures for Author Cocitation Analysis

    We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance. Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
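
One way in which the Pearson correlation and the cosine can disagree on cocitation profiles is sketched below. This is an illustration, not necessarily the argument made in the article: the profiles are invented, and the helper functions `cosine` and `pearson` are written here from the standard textbook definitions. Appending entries in which both authors have zero cocitations leaves the cosine unchanged but changes the Pearson correlation.

```python
import math


def cosine(x, y):
    """Cosine of the angle between two equal-length profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson correlation, computed as the cosine of the mean-centered profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


# Invented cocitation profiles of two authors with two other authors.
x, y = [2, 1], [1, 2]
# The same profiles after adding two authors with whom neither is cocited.
x_padded, y_padded = x + [0, 0], y + [0, 0]

cos_before, cos_after = cosine(x, y), cosine(x_padded, y_padded)
r_before, r_after = pearson(x, y), pearson(x_padded, y_padded)
```

In this toy case the cosine stays at 0.8, whereas the Pearson correlation moves from -1 to roughly 0.64 once the two joint zeros are added.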

    Dispelling the Myths Behind First-author Citation Counts

    We conducted a full-scale evaluative citation analysis study of scholars in the XML research field to explore how much author rankings produced by different citation counting methods actually differ, and to demonstrate the capability of emerging data and tools on the Web in supporting more realistic citation counting methods. Our results contest some common arguments for the continued use of first-author citation counts in the evaluation of scholars, such as high correlations between author rankings by first-author citation counts and other citation counting methods, and high costs of using more realistic citation counting methods that are not well supported by the ISI databases. It is argued that increasingly available digital full-text research papers make it possible for citation analysis studies to go beyond what the ISI databases have directly supported and to employ more sophisticated methods.

    Author Index

    Not provided