1,721,110 research outputs found
An experimental exploration of Marsaglia's xorshift Generators, Scrambled
Marsaglia proposed xorshift generators are a class of very fast, good-quality pseudorandom number generators. Subsequent analysis by Panneton and L'Ecuyer has lowered the expectations raised by Marsaglia's article, showing several weaknesses of such generators. Nonetheless, many of the weaknesses of xorshift generators fade away if their result is scrambled by a nonlinear operation (as originally suggested by Marsaglia). In this article we explore the space of possible generators obtained by multiplying the result of a xorshift generator by a suitable constant. We sample generators at 100 points of their state space and obtain detailed statistics that lead us to choices of parameters that improve on the current ones. We then explore for the first time the space of high-dimensional xorshift generators, following another suggestion in Marsaglia's article, finding choices of parameters providing periods of length 21024 - 1 and 24096 - 1. The resulting generators are of extremely high quality, faster than current similar alternatives, and generate long-period sequences passing strong statistical tests using only eight logical operations, one addition, and one multiplication by a constant
Spectral ranking
We sketch the history of spectral ranking - a general umbrella name for techniques that apply the theory of linear maps (in particular, eigenvalues and eigenvectors) to matrices that do not represent geometric transformations, but rather some kind of relationship between entities. Albeit recently made famous by the ample press coverage of Google's PageRank algorithm, spectral ranking was devised more than 60 years ago, almost exactly in the same terms, and has been studied in psychology, social sciences, bibliometrics, economy, and choice theory. We describe the contribution given by previous scholars in precise and modern mathematical terms: Along the way, we show how to express in a general way damped rankings, such as Katz's index, as dominant eigenvectors of perturbed matrices, and then use results on the Drazin inverse to go back to the dominant eigenvectors by a limit process. The result suggests a regularized definition of spectral ranking that yields for a general matrix a unique vector depending on a boundary condition
fastutil 5.0
fastutil extends the JavaTM Collections Framework by providing type-specific maps, sets, lists and queues with a small memory footprint and fast access and insertion; it also includes a fast I/O API for binary and text files
Further scramblings of Marsaglia's xorshift generators
xorshift* generators are a variant of Marsaglia's xorshift generators that eliminate linear artifacts typical of generators based on Z/2Z-linear operations using multiplication by a suitable constant. Shortly after high-dimensional xorshift* generators were introduced, Saito and Matsumoto suggested a different way to eliminate linear artifacts based on addition in Z/232Z, leading to the XSadd generator. Starting from the observation that the lower bits of XSadd are very weak, as its reverse fails several statistical tests, we explore variants of XSadd using 64-bit operations, and describe in detail xorshift128+, an extremely fast generator that passes strong statistical tests using only three shifts, four xors and an addition
Quasi-succinct indices
Compressed inverted indices in use today are based on the idea of gap compression: documents pointers are stored in increasing order, and the gaps between successive document pointers are stored using suitable codes which represent smaller gaps using less bits. Additional data such as counts and positions is stored using similar techniques. A large body of research has been built in the last 30 years around gap compression, including theoretical modeling of the gap distribution, specialized instantaneous codes suitable for gap encoding, and ad hoc document reorderings which increase the efficiency of instantaneous codes. This paper proposes to represent an index using a different architecture based on quasi-succinct representation of monotone sequences. We show that, besides being theoretically elegant and simple, the new index provides expected constant-time operations, space savings, and, in practice, significant performance improvements on conjunctive, phrasal and proximity queries. © 2013 ACM
The topos of labelled trees : a categorical semantics for SCCS
The work presented in this paper starts from the observation that labelled trees together with labeland
parenthood-preserving morphisms form a topos: a slice category of the topos of sheaves over
the poset (!, ) for the topology generated by the empty cover for 0.
After a very well presented and easily accessible exposition of this idea two applications are
elaborated. First, it is shown how the use of topos theory simplifies the interpretation of SCCS
using labelled trees, in particular, the treatment of guarded recursion.
Second, a new characterisation of bisimulation in terms of morphisms is presented: two trees
S, T are bisimilar iff there are conflict-preserving morphisms f: S ! T and g: T ! S such that
fgf = f and gfg = g. In elementary terms (distilled from Propositions 6.1 and 6.2), a morphism
f: S !T is conflict-preserving if whenever x!a y in S and f(x) = f(x0) then x0 !a y0 for some y0
with f(y) = f(y0).
The proof of the characterisation of bisimulation in terms of conflict-preserving morphisms
makes essential use of the topos-theoretic view.
The presentation is very understandable and does not require prior knowledge of topos theory
WebGraph: things you thought you could not do with Java
Studying web graphs is often difficult due to their large size. The WebGraph framework is a suite of codes, algorithms and tools that make it easy to manipulate large web graphs, and to store them in a limited space, by exploiting the inner redundancies of the web. WebGraph is based on sophisticated bitwise compression techniques, and functional-style lazy constructions. Common wisdom would say that the most unlikely language to implement such a framework is Java. We are going to tell you the real story
MG4J (Managing Gigabyte for Java) 1.1
MG4J (Managing Gigabytes for Java) is a free full-text search engine for large document collections written in Java. As a by-product, it offers several general-purpose optimised classes, including fast & compact mutable strings, bit-level I/O, (possibly signed) minimal perfect hashing for very large strings collections, etc
Universal dynamic synchronous self-stabilization
We prove the existence of a "universal" synchronous self-stabilizing protocol, that is, a protocol that allows a distributed system to stabilize to a desired nonreactive behaviour (as long as a protocol stabilizing to that behaviour exists). Previous proposals required drastic increases in asymmetry and knowledge to work, whereas our protocol does not use any additional knowledge, and does not require more symmetry-breaking conditions than available; thus, it is also stabilizing with respect to dynamic changes in the topology. We prove an optimal quiescence time n + D for a synchronous network of n processors and diameter D; the protocol can be made finite state with a negligible loss in quiescence time. Moreover, an optimal D + 1 protocol is given for the case of unique identifiers. As a consequence, we provide an effective proof technique that allows to show whether self-stabilization to a certain behaviour is possible under a wide range of models
- …
