Virginia Tech - Wake Forest University School of Biomedical Engineering & Sciences
Computer Science Technical Reports @Virginia TechNot a member yet
997 research outputs found
Sort by
Data Leak Detection As a Service: Challenges and Solutions
We describe a network-based data-leak detection (DLD)
technique, the main feature of which is that the detection
does not require the data owner to reveal the content of the
sensitive data. Instead, only a small amount of specialized
digests are needed. Our technique – referred to as the fuzzy
fingerprint – can be used to detect accidental data leaks due
to human errors or application flaws. The privacy-preserving
feature of our algorithms minimizes the exposure of sensitive
data and enables the data owner to safely delegate the
detection to others.We describe how cloud providers can offer
their customers data-leak detection as an add-on service
with strong privacy guarantees.
We perform extensive experimental evaluation on the privacy,
efficiency, accuracy and noise tolerance of our techniques.
Our evaluation results under various data-leak scenarios
and setups show that our method can support accurate
detection with very small number of false alarms, even
when the presentation of the data has been transformed. It
also indicates that the detection accuracy does not degrade
when partial digests are used. We further provide a quantifiable
method to measure the privacy guarantee offered by our
fuzzy fingerprint framework
The Poset Cover Problem
A partial order or poset P = (X,<) on a (finite) base set X determines the set L(P)
of linear extensions of P. The problem of computing, for a poset P, the cardinality
of L(P) is #P-complete. A set {P1, P2, . . . , Pk} of posets on X covers the set of linear
orders that is the union of the L(Pi). Given linear orders L1,L2, . . . ,Lm on X, the
Poset Cover problem is to determine the smallest number of posets that cover
{L1,L2, . . . ,Lm}. Here, we show that the decision version of this problem is NP-
complete. On the positive side, we explore the use of cover relations for finding
posets that cover a set of linear orders and present a polynomial-time algorithm
to find a partial poset cover
Adaptive Key Protection in Complex Cryptosystems with Attributes
In the attribute-based encryption (ABE) model, attributes (as opposed
to identities) are used to encrypt messages, and all the receivers
with qualifying attributes can decrypt the ciphertext. However, compromised
attribute keys may affect the communications of many users
who share the same access control policies. We present the notion of
forward-secure attribute-based encryption (fs-ABE) and give a concrete
construction based on bilinear map and decisional bilinear Diffie-Hellman
assumption. Forward security means that a compromised private key by
an adversary at time t does not break the confidentiality of the communication
that took place prior to t. We describe how to achieve both
forward security and encryption with attributes, and formally prove our
security against the adaptive chosen-ciphertext adversaries. Our scheme
is non-trivial, and the key size only grows polynomially with logN (where
N is the number of time periods). We further generalize our scheme
to support the individualized key-updating schedule for each attribute,
which provides a finer granularity for key management. Our insights on
the required properties that an ABE scheme needs to possess in order to be forward-secure compatible are useful beyond the specific fs-ABE construction
given. We raise an open question at the end of the paper on the
escrow problem of the master key in ABE schemes
A Practical Method to Estimate Information Content in the Context of 4D-Var Data Assimilation. I: Methodology
Data assimilation obtains improved estimates of the state of a physical system by combining
imperfect model results with sparse and noisy observations of reality. all observations used in data
assimilation are equally valuable. The ability to characterize the usefulness of different data points
is important for analyzing the effectiveness of the assimilation system, for data pruning, and for the
design of future sensor systems.
This paper focuses on the four dimensional variational (4D-Var) data assimilation framework. Metrics
from information theory are used to quantify the contribution of observations to decreasing the
uncertainty with which the system state is known. We establish an interesting relationship between different
information-theoretic metrics and the variational cost function/gradient under Gaussian linear
assumptions. Based on this insight we derive an ensemble-based computational procedure to estimate
the information content of various observations in the context of 4D-Var. The approach is illustrated
on a nonlinear test problem. In the companion paper (Singh et al., 2012a) the methodology is applied
to a global chemical data assimilation experiment
A Practical Blended Analysis for Dynamic Features in JavaScript
The JavaScript Blended Analysis Framework is designed to
perform a general-purpose, practical combined static/dynamic
analysis of JavaScript programs, while handling dynamic
features such as run-time generated code and variadic func-
tions. The idea of blended analysis is to focus static anal-
ysis on a dynamic calling structure collected at runtime in
a lightweight manner, and to rene the static analysis us-
ing additional dynamic information. We perform blended
points-to analysis of JavaScript with our framework and
compare results with those computed by a pure static points-
to analysis. Using JavaScript codes from actual webpages
as benchmarks, we show that optimized blended analysis
for JavaScript obtains good coverage (86.6% on average per
website) of the pure static analysis solution and nds ad-
ditional points-to pairs (7.0% on average per website) con-
tributed by dynamically generated/loaded code
A Framework to Analyze the Performance of Load Balancing Schemes for Ensembles of Stochastic Simulations
Ensembles of simulations are employed to estimate the statistics of possible future states of a system, and are widely used in important applications such as climate change and biological modeling. Ensembles of runs can naturally be executed in parallel. However, when the CPU times of individual simulations vary considerably, a simple strategy of assigning an equal number of tasks per processor can lead to serious work imbalances and low parallel efficiency. This paper presents a new probabilistic framework to analyze the performance of dynamic load balancing algorithms for ensembles of simulations where many tasks are mapped onto each processor, and where the individual compute times vary considerably among tasks. Four load balancing strategies are discussed: most-dividing, all-redistribution, random-polling, and neighbor-redistribution. Simulation results with a stochastic budding yeast cell cycle model is consistent with the theoretical analysis. It is especially significant that there is a provable global decrease in load imbalance for the local rebalancing algorithms due to scalability concerns for the global rebalancing algorithms. The overall simulation time is reduced by up to 25%, and the total processor idle time by 85%
GreenVis: Energy-Saving Color Schemes for Sequential Data Visualization on OLED Displays
The organic light emitting diode (OLED) display has recently become popular in the consumer electronics market. Compared with current LCD display technology, OLED is an emerging display technology that emits light by the pixels themselves and doesn’t need an external back light as the illumination source. In this paper, we offer an approach to reduce power consumption on OLED displays for sequential data visualization. First, we create a multi-objective optimization approach to find the most energy-saving color scheme for given visual perception difference levels. Second, we apply the model in two situations: pre-designed color schemes and auto generated color schemes. Third, our experiment results show that the energy-saving sequential color scheme can reduce power consumption by 17.2% for pre-designed color schemes. For auto-generated color schemes, it can save 21.9% of energy in comparison to the reference color scheme for sequential data
A Practical Method to Estimate Information Content in the Context of 4D-Var Data Assimilation. II: Application to Global Ozone Assimilation
Data assimilation obtains improved estimates of the state of a physical system by combining imperfect
model results with sparse and noisy observations of reality. Not all observations used in data assimilation
are equally valuable. The ability to characterize the usefulness of different data points is important
for analyzing the effectiveness of the assimilation system, for data pruning, and for the design of future
sensor systems.
In the companion paper (Sandu et al., 2012) we derive an ensemble-based computational procedure
to estimate the information content of various observations in the context of 4D-Var. Here we apply
this methodology to quantify the signal and degrees of freedom for signal information metrics of satellite observations used in a global chemical data assimilation problem with the GEOS-Chem chemical
transport model. The assimilation of a subset of data points characterized by the highest information
content yields an analysis comparable in quality with the one obtained using the entire data set
A Practical Blended Analysis for Dynamic Features in JavaScript
JavaScript is widely used in Web applications;
however, its dynamism renders static analysis ineffective. Our
JavaScript Blended Analysis Framework is designed to handle
JavaScript dynamic features. It performs a flexible combined
static/dynamic analysis. The blended analysis focuses static
analysis on a dynamic calling structure collected at runtime
in a lightweight manner, and refines the static analysis using
dynamic information. The framework is instantiated for points-to
analysis with stmt-level MOD analysis and tainted input analysis.
Using JavaScript codes from actual webpages as benchmarks,
we show that blended points-to analysis for JavaScript obtains
good coverage (86.6% on average per website) of the pure static
analysis solution and finds additional points-to pairs (7.0% on average
per website) contributed by dynamically generated/loaded
code. Blended tainted input analysis reports all 6 true positives
reported by static analysis, but without false alarms, and finds
three additional true positives
Architecture-Aware Optimization on a 1600-core Graphics Processor
The graphics processing unit (GPU) continues to
make significant strides as an accelerator in commodity cluster
computing for high-performance computing (HPC). For example,
three of the top five fastest supercomputers in the world, as
ranked by the TOP500, employ GPUs as accelerators. Despite this
increasing interest in GPUs, however, optimizing the performance
of a GPU-accelerated compute node requires deep technical
knowledge of the underlying architecture. Although significant
literature exists on how to optimize GPU performance on the
more mature NVIDIA CUDA architecture, the converse is true
for OpenCL on the AMD GPU.
Consequently, we present and evaluate architecture-aware optimizations
for the AMD GPU. The most prominent optimizations
include (i) explicit use of registers, (ii) use of vector types, (iii)
removal of branches, and (iv) use of image memory for global data.
We demonstrate the efficacy of our AMD GPU optimizations by
applying each optimization in isolation as well as in concert to
a large-scale, molecular modeling application called GEM. Via
these AMD-specific GPU optimizations, the AMD Radeon HD
5870 GPU delivers 65% better performance than with the wellknown
NVIDIA-specific optimizations