
    Divide, Conquer, and Combine: a New Inference Strategy for Probabilistic Programs with Stochastic Support

    Universal probabilistic programming systems (PPSs) provide a powerful framework for specifying rich and complex probabilistic models. They further attempt to automate the process of drawing inferences from these models, but doing this successfully is severely hampered by the wide range of non-standard models they can express. As a result, although one can specify complex models in a universal PPS, the provided inference engines often fall far short of what is required. In particular, we show that they perform surprisingly poorly on models where the support may vary between executions, often doing no better than importance sampling from the prior. To address this, we introduce a new inference framework, Divide, Conquer, and Combine, which remains efficient for such models, and show how it can be implemented as an automated and general-purpose PPS inference engine. We empirically demonstrate substantial performance improvements over existing approaches on two examples.
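The baseline the abstract mentions, importance sampling from the prior, can be sketched in a few lines. This is only an illustration of that baseline, not of Divide, Conquer, and Combine itself. The toy model (prior N(0, 1), likelihood N(theta, 1), one observation) and the names `normal_pdf` and `prior_importance_sampling` are assumptions chosen because the exact posterior mean, y/2, is known for this conjugate pair.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def prior_importance_sampling(y_obs, num_samples, seed=0):
    """Self-normalized importance sampling using the prior as the proposal.

    Hypothetical toy model: theta ~ N(0, 1), y | theta ~ N(theta, 1).
    Because the proposal equals the prior, each importance weight is
    simply the likelihood of the observation under the sampled theta.
    """
    rng = random.Random(seed)
    weighted_sum = 0.0
    weight_total = 0.0
    for _ in range(num_samples):
        theta = rng.gauss(0.0, 1.0)            # draw from the prior
        weight = normal_pdf(y_obs, theta, 1.0)  # likelihood as weight
        weighted_sum += weight * theta
        weight_total += weight
    return weighted_sum / weight_total          # posterior-mean estimate


estimate = prior_importance_sampling(y_obs=1.0, num_samples=100_000)
print(estimate)  # should be close to the exact posterior mean 0.5
```

In this one-dimensional example the prior overlaps the posterior well, so the estimate is accurate. When the prior places little mass where the posterior concentrates, most weights are negligible, and this baseline degrades.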

    Modality-Agnostic Variational Compression of Implicit Neural Representations

    We introduce a modality-agnostic neural compression algorithm based on a functional view of data and parameterised as an Implicit Neural Representation (INR). Bridging the gap between latent coding and sparsity, we obtain compact latent representations non-linearly mapped to a soft gating mechanism. This allows the specialisation of a shared INR network to each data item through subnetwork selection. After obtaining a dataset of such latent representations, we directly optimise the rate/distortion trade-off in a modality-agnostic space using neural compression. Variational Compression of Implicit Neural Representations (VC-INR) shows improved performance given the same representational capacity pre-quantisation while also outperforming previous quantisation schemes used for other INR techniques. Our experiments demonstrate strong results over a large set of diverse modalities using the same algorithm without any modality-specific inductive biases. We show results on images, climate data, 3D shapes and scenes as well as audio and video, introducing VC-INR as the first INR-based method to outperform codecs as well-known and diverse as JPEG 2000, MP3 and AVC/HEVC on their respective modalities.

    On the stick-breaking representation for homogeneous NRMIs

    In this paper, we consider homogeneous normalized random measures with independent increments (hNRMI), a class of nonparametric priors recently introduced in the literature. Many of their distributional properties are known by now, but their stick-breaking representation is missing. Here we display such a representation, which features dependent stick-breaking weights, and then derive explicit versions for noteworthy special cases of hNRMI. Posterior characterizations are also discussed. Finally, we devise an algorithm for slice sampling mixture models based on hNRMIs, which relies on the representation we have obtained, and implement it to analyze real data.
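For orientation, here is a minimal sketch of the classical stick-breaking construction for the Dirichlet process, which is a normalized gamma process and so a well-known member of the hNRMI class. It is not the representation derived in the paper: here the stick-breaking fractions are independent Beta(1, alpha) draws, whereas the abstract states that the general hNRMI case has dependent weights. The function name `dp_stick_breaking` and the truncation at `num_sticks` are assumptions made for illustration.

```python
import random


def dp_stick_breaking(alpha, num_sticks, seed=0):
    """Truncated stick-breaking weights for a Dirichlet process.

    Each fraction v_k ~ Beta(1, alpha) is drawn independently, and
    w_k = v_k * prod_{j<k} (1 - v_j). Returns the weights together with
    the stick mass left unassigned after truncation.
    """
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of the stick not yet broken off
    for _ in range(num_sticks):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights, remaining


weights, leftover = dp_stick_breaking(alpha=2.0, num_sticks=200)
print(sum(weights) + leftover)  # weights plus leftover mass total 1 up to rounding
```

A larger `alpha` spreads the mass over more sticks, so more sticks are needed before the leftover mass becomes negligible.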

    Learning to Contextualize Web Pages for Enhanced Decision Making by LLM Agents

    Recent advances in large language models (LLMs) have led to a growing interest in developing LLM-based agents for automating web tasks. However, these agents often struggle with even simple tasks on real-world websites due to their limited capability to understand and process complex web page structures. In this work, we introduce LCoW, a framework for Learning language models to Contextualize complex Web pages into a more comprehensible form, thereby enhancing decision making by LLM agents. LCoW decouples web page understanding from decision making by training a separate contextualization module to transform complex web pages into a comprehensible format, which is then used by the decision-making agent. We demonstrate that our contextualization module effectively integrates with LLM agents of various scales to significantly enhance their decision-making capabilities in web automation tasks. Notably, LCoW improves the success rates of closed-source LLMs (e.g., Gemini-1.5-flash, GPT-4o, Claude-3.5-Sonnet) by an average of 15.6%, and demonstrates a 23.7% average improvement in success rates for open-source LMs (e.g., Llama-3.1-8B, Llama-3.1-70B) on the WorkArena benchmark. Moreover, the Gemini-1.5-flash agent with LCoW achieves state-of-the-art results on the WebShop benchmark, outperforming human experts. The relevant code materials are available at our project page: https://lcowiclr2025.github.io

    Optimal allocation strategies for the dark pool problem

    We study the problem of allocating stocks to dark pools. We propose and analyze an optimal approach for allocations, if continuous-valued allocations are allowed. We also propose a modification for the case when only integer-valued allocations are possible. We extend the previous work on this problem (Ganchev et al., 2009) to adversarial scenarios, while also improving on their results in the iid setup. The resulting algorithms are efficient, and perform well in simulations under stochastic and adversarial inputs.

    Interpretable Models in Probabilistic Machine Learning

    This thesis describes contributions to the field of interpretable models in probabilistic machine learning, by first outlining the desiderata and properties associated with the term interpretability. We claim that probabilistic models are suitable candidates for interpretable machine learning, and this claim is supported by examples of such models that satisfy two key properties of interpretability: transparency, which offers an understanding of the model's mechanism, and post-hoc interpretability, which gives other useful information about the model after training, such as explanations of its predictions. We then introduce relevant background literature in probabilistic machine learning, focusing on Bayesian inference of probabilistic models. Armed with these prerequisites, we proceed to describe examples of probabilistic models that enjoy various interpretable properties. First, we propose a method for regression that is motivated by Gaussian Processes (GPs), has applications to collaborative filtering with side-information, and generalises classic probabilistic matrix factorisation methods in this context. Second, we develop a scalable algorithm for automated GP model selection, whereby the form of the selected GP models allows them to be translated into a natural language description of their properties. Third, we introduce a Variational Autoencoder (VAE) model that can disentangle independent factors of variation in a dataset of images by learning a factorisable latent distribution in an unsupervised fashion. Finally, we describe a model that can learn stochastic processes in a data-driven fashion with deep architectures by using the concept of attention.

    Modelling, inference and optimization in probabilistic machine learning

    Bayesian machine learning has gained tremendous attention in the machine learning community over the past few years. Bayesian methods offer coherent reasoning for quantifying uncertainties in the decision-making procedure, based on Bayes' rule. One of the core advantages of Bayesian methods is the separation of modelling and inference. In other words, the likelihood models are completely independent of the computation of the posterior distribution of the parameters. There are many Bayesian models that are widely used in the machine learning community. For example, non-parametric models such as Gaussian Processes and Dirichlet Processes are flexible models which are able to capture and learn the structure of the data. Bayesian deep learning models, which are based on neural networks, are another example of flexible Bayesian models that are rich enough to represent non-linear structures in the data. The process of inferring the posterior lies at the center of Bayesian inference. When computing the posterior distribution exactly is not feasible, due to intractability of the posterior or to computational or memory constraints, approximate Bayesian inference comes into play. In this PhD thesis, I develop and investigate various Bayesian modelling and inference techniques and apply them to multiple interesting domains and tasks. We begin with Tucker Gaussian Processes (TGP), a class of flexible non-parametric models based on Gaussian Processes (GP). We apply the method to 1) regression problems on structured input data, and 2) collaborative filtering problems where TGP offers an elegant way of incorporating side information. We demonstrate superior results compared with benchmarks on a number of examples across different domains. A closely related line of research based on GPs is Bayesian Optimization (BO). It is a black-box optimizer where one optimizes an objective function through sequential queries that choose the next input location to evaluate. However, this method does not work well when the input space is non-Euclidean or combinatorial. We alleviate the problem by learning a low-dimensional Euclidean representation of the combinatorial input space with variational inference, using a Variational Auto-encoder (VAE). The optimization can then be conducted on the low-dimensional embedding instead. We apply our method to the Automatic Statistician and to natural scene understanding, with promising results. For approximate Bayesian inference, we first propose an algorithm called Relativistic Hamiltonian Monte Carlo (RHMC), which is a variant of MCMC. In particular, we replace Newton’s kinetic energy in the Hamiltonian with Einstein’s relativistic kinetic energy, which makes the algorithm more robust. There are several extensions to RHMC, including a stochastic gradient version for scalability, a thermostat version based on the temperature of the physical system, and a resulting optimization algorithm which gives performance comparable to the state of the art. Finally, we propose another sampling-based inference method called Adaptive Importance Sampling with Exploration and Exploitation (Daisee), where we look into the problem of exploration-exploitation in adaptive importance sampling by establishing a natural connection between importance sampling and the multi-armed bandit problem. In particular, through a finite-time regret analysis we show that the regret of the proposed algorithm grows sublinearly with time. Further, we propose a hierarchical extension of Daisee to encourage exploration in regions with high uncertainty. The new models proposed in this thesis allow for more flexible Bayesian modelling, and the inference techniques introduced can open new research directions for efficient and accurate posterior inference. These contribute to Bayesian inference and probabilistic machine learning.
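The kinetic-energy substitution behind RHMC can be written out as follows. This is a sketch assuming the standard special-relativity form; the symbols $p$ (momentum), $m$ (mass) and $c$ (the "speed of light", treated as a tunable constant) are assumptions, since the abstract does not give the thesis's own notation.

```latex
% Newtonian kinetic energy used in standard HMC
K_{\text{Newton}}(p) = \frac{p^\top p}{2m}

% Relativistic kinetic energy (assumed form)
K_{\text{rel}}(p) = m c^2 \sqrt{\frac{p^\top p}{m^2 c^2} + 1}

% The resulting velocity is bounded in norm by c
\nabla_p K_{\text{rel}}(p) = \frac{p}{m \sqrt{\dfrac{p^\top p}{m^2 c^2} + 1}},
\qquad
\left\lVert \nabla_p K_{\text{rel}}(p) \right\rVert
  = \frac{c \,\lVert p \rVert}{\sqrt{\lVert p \rVert^2 + m^2 c^2}} < c
```

Under this form the position update per step is bounded however large the momentum becomes, which is one plausible reading of the abstract's claim that the substitution makes the algorithm more robust.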