572 research outputs found

    MineDojo Internet Knowledge Base (Wiki)

    Project website: minedojo.org
    Paper: arxiv.org/abs/2206.08853
    GitHub: github.com/MineDojo/MineDojo

    The Minecraft Wiki pages cover almost every aspect of the game mechanics and supply a rich source of unstructured knowledge in multimodal tables, recipes, illustrations, and step-by-step tutorials. We scrape 6,735 pages that interleave text, images, tables, and diagrams. To preserve the layout information, we also save screenshots of entire pages and extract bounding boxes of the visual elements.

    There are two files in our Wiki knowledge base:
    wiki_samples.zip: A sample version of the full knowledge base (10 pages).
    wiki_full.zip: The full knowledge base (6,735 pages).

    Cite Us
    @article{fan2022minedojo,
      title = {MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge},
      author = {Linxi Fan and Guanzhi Wang and Yunfan Jiang and Ajay Mandlekar and Yuncong Yang and Haoyi Zhu and Andrew Tang and De-An Huang and Yuke Zhu and Anima Anandkumar},
      year = {2022},
      journal = {arXiv preprint arXiv: Arxiv-2206.08853}
    }
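A quick way to get oriented in either archive is to count its members by file type before unpacking anything. The sketch below uses only Python's standard `zipfile` module; the helper name `summarize_archive` is hypothetical, and nothing is assumed about the internal layout of `wiki_samples.zip` beyond it being a regular zip file.

```python
import sys
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_archive(zip_path):
    """Count archive members by lowercase file extension, skipping directories."""
    counts = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no content
            # Zip member names always use forward slashes, hence PurePosixPath.
            suffix = PurePosixPath(info.filename).suffix.lower() or "(none)"
            counts[suffix] += 1
    return dict(counts)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python summarize.py wiki_samples.zip
    for suffix, n in sorted(summarize_archive(sys.argv[1]).items()):
        print(f"{suffix}\t{n}")
```

Running this on the 10-page sample first is a cheap check of which file types (text, images, screenshots, bounding-box metadata) are present before downloading the full archive.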

    MineDojo Internet Knowledge Base (Reddit)

    Project website: minedojo.org
    Paper: arxiv.org/abs/2206.08853
    GitHub: github.com/MineDojo/MineDojo

    We collect 340K+ Reddit posts along with 6.6M comments under the “r/Minecraft” subreddit. These posts ask questions on how to solve certain tasks, showcase cool architectures and achievements in image/video snippets, and discuss general tips and tricks for players of all expertise levels. Large language models can be finetuned on our Reddit corpus to internalize Minecraft-specific concepts and develop sophisticated strategies.

    Cite Us
    @article{fan2022minedojo,
      title = {MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge},
      author = {Linxi Fan and Guanzhi Wang and Yunfan Jiang and Ajay Mandlekar and Yuncong Yang and Haoyi Zhu and Andrew Tang and De-An Huang and Yuke Zhu and Anima Anandkumar},
      year = {2022},
      journal = {arXiv preprint arXiv: Arxiv-2206.08853}
    }

    Regret Bound of Adaptive Control in Linear Quadratic Gaussian (LQG) Systems

    We study the problem of adaptive control in partially observable linear quadratic Gaussian control systems, where the model dynamics are unknown a priori. We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty, to effectively minimize the overall control cost. We employ the predictor state evolution representation of the system dynamics and propose a new approach for closed-loop system identification, estimation, and confidence bound construction. LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model for further exploration and exploitation. We provide stability guarantees for LqgOpt, and prove the regret upper bound of $\tilde{\mathcal{O}}(\sqrt{T})$ for adaptive control of linear quadratic Gaussian (LQG) systems, where $T$ is the time horizon of the problem.

    S. Lale is supported in part by DARPA PAI. K. Azizzadenesheli is supported in part by Raytheon and Amazon Web Services. B. Hassibi is supported in part by the National Science Foundation under grants CNS-0932428, CCF-1018927, CCF-1423663 and CCF-1409204, by a grant from Qualcomm Inc., by NASA’s Jet Propulsion Laboratory through the President and Director’s Fund, and by King Abdullah University of Science and Technology. A. Anandkumar is supported in part by Bren endowed chair, DARPA PAI HR00111890035 and LwLL grants, Raytheon, Microsoft, Google, and Adobe faculty fellowships.
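For orientation, regret in this setting is commonly defined as the gap between the cumulative cost the learner incurs and the cost of the optimal controller that knows the true model. The following is a sketch under assumed notation: the symbols $y_t$ (observation), $u_t$ (control input), $Q$, $R$ (cost matrices), and $J_\star$ (optimal average cost) do not appear in the abstract above and are introduced here only for illustration.

```latex
\mathrm{Regret}(T) = \sum_{t=1}^{T} \left( c_t - J_\star \right),
\qquad
c_t = y_t^\top Q\, y_t + u_t^\top R\, u_t ,
\\[4pt]
\mathrm{Regret}(T) = \tilde{\mathcal{O}}\!\left(\sqrt{T}\right)
\;\Longrightarrow\;
\frac{\mathrm{Regret}(T)}{T} = \tilde{\mathcal{O}}\!\left(\frac{1}{\sqrt{T}}\right) \longrightarrow 0
\quad \text{as } T \to \infty .
```

Read this way, a sublinear regret bound says that the learner's average per-step cost approaches that of the optimal controller as the horizon grows.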

    Regret Minimization in Partially Observable Linear Quadratic Control

    We study the problem of regret minimization in partially observable linear quadratic control systems when the model dynamics are unknown a priori. We propose ExpCommit, an explore-then-commit algorithm that learns the model Markov parameters and then follows the principle of optimism in the face of uncertainty to design a controller. We propose a novel way to decompose the regret and provide an end-to-end sublinear regret upper bound for partially observable linear quadratic control. Finally, we provide stability guarantees and establish a regret upper bound of $\tilde{\mathcal{O}}(T^{2/3})$ for ExpCommit, where $T$ is the time horizon of the problem.

    K. Azizzadenesheli is supported in part by Raytheon. B. Hassibi is supported in part by the National Science Foundation under grants CNS-0932428, CCF-1018927, CCF-1423663 and CCF-1409204, by a grant from Qualcomm Inc., by NASA’s Jet Propulsion Laboratory through the President and Director’s Fund, and by King Abdullah University of Science and Technology. A. Anandkumar is supported in part by Bren endowed chair, DARPA PAI HR00111890035 and LwLL grants, Raytheon, Microsoft, Google, and Adobe faculty fellowships.

    Logarithmic regret bound in partially observable linear dynamical systems

    We study the problem of system identification and adaptive control in partially observable linear dynamical systems. Adaptive and closed-loop system identification is a challenging problem due to correlations introduced in data collection. In this paper, we present the first model estimation method with finite-time guarantees in both open- and closed-loop system identification. Deploying this estimation method, we propose adaptive control online learning (ADAPTON), an efficient reinforcement learning algorithm that adaptively learns the system dynamics and continuously updates its controller through online learning steps. ADAPTON estimates the model dynamics by occasionally solving a linear regression problem through interactions with the environment. Using policy re-parameterization and the estimated model, ADAPTON constructs counterfactual loss functions to be used for updating the controller through online gradient descent. Over time, ADAPTON improves its model estimates and obtains more accurate gradient updates to improve the controller. We show that ADAPTON achieves a regret upper bound of polylog(T) after T time steps of agent-environment interaction. To the best of our knowledge, ADAPTON is the first algorithm that achieves polylog(T) regret in adaptive control of unknown partially observable linear dynamical systems, which includes linear quadratic Gaussian (LQG) control.

    S. Lale is supported in part by DARPA PAI. K. Azizzadenesheli gratefully acknowledges the financial support of Raytheon and Amazon Web Services. B. Hassibi is supported in part by the National Science Foundation under grants CNS-0932428, CCF-1018927, CCF-1423663 and CCF-1409204, by a grant from Qualcomm Inc., by NASA’s Jet Propulsion Laboratory through the President and Director’s Fund, and by King Abdullah University of Science and Technology. A. Anandkumar is supported in part by Bren endowed chair, DARPA PAI HR00111890035 and LwLL grants, Raytheon, Microsoft, Google, and Adobe faculty fellowships.