572 research outputs found
MineDojo Internet Knowledge Base (Wiki)
Project website: minedojo.org
Paper: arxiv.org/abs/2206.08853
GitHub: github.com/MineDojo/MineDojo
The Minecraft Wiki pages cover almost every aspect of the game mechanics, and supply a rich source of unstructured knowledge in multimodal tables, recipes, illustrations, and step-by-step tutorials. We scrape 6,735 pages that interleave text, images, tables, and diagrams. To preserve the layout information, we also save the screenshots of entire pages and extract bounding boxes of the visual elements.
There are two files in our Wiki knowledge base.
wiki_samples.zip: A sample version of the full knowledge base (10 pages).
wiki_full.zip: The full knowledge base (6,735 pages).
Cite Us
@article{fan2022minedojo,
title = {MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge},
author = {Linxi Fan and Guanzhi Wang and Yunfan Jiang and Ajay Mandlekar and Yuncong Yang and Haoyi Zhu and Andrew Tang and De-An Huang and Yuke Zhu and Anima Anandkumar},
year = {2022},
journal = {arXiv preprint arXiv: Arxiv-2206.08853}
MineDojo Internet Knowledge Base (Reddit)
Project website: minedojo.org
Paper: arxiv.org/abs/2206.08853
GitHub: github.com/MineDojo/MineDojo
We collect 340K+ Reddit posts along with 6.6M comments under the “r/Minecraft” subreddit. These posts ask questions on how to solve certain tasks, showcase cool architectures and achievements in image/video snippets, and discuss general tips and tricks for players of all expertise levels. Large language models can be finetuned on our Reddit corpus to internalize Minecraft-specific concepts and develop sophisticated strategies.
Cite Us
@article{fan2022minedojo,
title = {MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge},
author = {Linxi Fan and Guanzhi Wang and Yunfan Jiang and Ajay Mandlekar and Yuncong Yang and Haoyi Zhu and Andrew Tang and De-An Huang and Yuke Zhu and Anima Anandkumar},
year = {2022},
journal = {arXiv preprint arXiv: Arxiv-2206.08853}
Regret Bound of Adaptive Control in Linear Quadratic Gaussian (LQG) Systems
We study the problem of adaptive control in partially observable linear quadratic Gaussian control systems, where the model dynamics are unknown a priori. We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty, to effectively minimize the overall control cost. We employ the predictor state evolution representation of the system dynamics and propose a new approach for closed-loop system identification, estimation, and confidence bound construction. LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model for further exploration and exploitation. We provide stability guarantees for LqgOpt, and prove the regret upper bound of for adaptive control of linear quadratic Gaussian (LQG) systems, where is the time horizon of the problem.S. Lale is supported in part by DARPA PAI. K. Azizzadenesheli is supported in part by Raytheon and Amazon Web Service. B. Hassibi is supported in part by the National Science Foundation under grants CNS-0932428, CCF-1018927, CCF-1423663 and CCF-1409204, by a grant from Qualcomm Inc., by NASA’s Jet Propulsion Laboratory through the President and Director’s Fund, and by King Abdullah University of Science and Technology. A. Anandkumar is supported in part by Bren endowed chair, DARPA PAIHR00111890035 and LwLL grants, Raytheon, Microsoft, Google, and Adobe faculty fellowships
Regret Minimization in Partially Observable Linear Quadratic Control
We study the problem of regret minimization in partially observable linear quadratic control systems when the model dynamics are unknown a priori. We propose ExpCommit, an explore-then-commit algorithm that learns the model Markov parameters and then follows the principle of optimism in the face of uncertainty to design a controller. We propose a novel way to decompose the regret and provide an end-to-end sublinear regret upper bound for partially observable linear quadratic control. Finally, we provide stability guarantees and establish a regret upper bound of for ExpCommit, where is the time horizon of the problem.K. Azizzadenesheli is supported in part by Raytheon. B. Hassibi is supported in part by the National Science Foundation under grants CNS-0932428, CCF-1018927, CCF-1423663 and CCF-1409204, by a grant from Qualcomm Inc., by NASA’s Jet Propulsion Laboratory through the President and Director’s Fund, and by King Abdullah University of Science and Technology. A. Anandkumar is supported in part by Bren endowed chair, DARPA PAIHR00111890035 and LwLL grants, Raytheon, Microsoft, Google, and Adobe faculty fellowships
Logarithmic regret bound in partially observable linear dynamical systems
We study the problem of system identification and adaptive control in partially observable linear dynamical systems. Adaptive and closed-loop system identification is a challenging problem due to correlations introduced in data collection. In this paper, we present the first model estimation method with finite-time guarantees in both open and closed-loop system identification. Deploying this estimation method, we propose adaptive control online learning (ADAPTON), an efficient reinforcement learning algorithm that adaptively learns the system dynamics and continuously updates its controller through online learning steps. ADAPTON estimates the model dynamics by occasionally solving a linear regression problem through interactions with the environment. Using policy re-parameterization and the estimated model, ADAPTON constructs counterfactual loss functions to be used for updating the controller through online gradient descent. Over time, ADAPTON improves its model estimates and obtains more accurate gradient updates to improve the controller. We show that ADAPTON achieves a regret upper bound of polylog (T), after T time steps of agent-environment interaction. To the best of our knowledge, ADAPTON is the first algorithm that achieves polylog (T) regret in adaptive control of unknown partially observable linear dynamical systems which includes linear quadratic Gaussian (LQG) control.S. Lale is supported in part by DARPA PAI. K. Azizzadenesheli gratefully acknowledge the financial support of Raytheon and Amazon Web Services. B. Hassibi is supported in part by the National Science Foundation under grants CNS-0932428, CCF-1018927, CCF-1423663 and CCF-1409204, by a grant from Qualcomm Inc., by NASA’s Jet Propulsion Laboratory through the President and Director’s Fund, and by King Abdullah University of Science and Technology. A. Anandkumar is supported in part by Bren endowed chair, DARPA PAIHR00111890035 and LwLL grants, Raytheon, Microsoft, Google, and Adobe faculty fellowships
Learning Mixed Membership Community Models in Social Tagging Networks through Tensor Methods
Recommended from our members
Efficient approaches for escaping higher order saddle points in non-convex optimization
Local search heuristics for non-convex optimizations are popular in applied machine learning. However, in general it is hard to guarantee that such algorithms even converge to a local minimum, due to the existence of complicated saddle point structures in high dimensions. Many functions have degenerate saddle points such that the first and second order derivatives cannot distinguish them with local optima. In this paper we use higher order derivatives to escape these saddle points: we design the first efficient algorithm guaranteed to converge to a third order local optimum (while existing techniques are at most second order). We also show that it is NP-hard to extend this further to finding fourth order local optima
Recommended from our members
Training Input-Output Recurrent Neural Networks through Spectral Methods
We consider the problem of training input-output recurrent neural networks (RNN) for sequence labeling tasks. We propose a novel spectral approach for learning the network parameters. It is based on decomposition of the cross-moment tensor between the output and a non-linear transformation of the input, based on score functions. We guarantee consistent learning with polynomial sample and computational complexity under transparent conditions such as non-degeneracy of model parameters, polynomial activations for the neurons, and a Markovian evolution of the input sequence. We also extend our results to Bidirectional RNN which uses both previous and future information to output the label at each time point, and is employed in many NLP tasks such as POS tagging
Recommended from our members
Provable Methods for Training Neural Networks with Sparse Connectivity
We provide novel guaranteed approaches for training feedforward neural networks with sparse connectivity. We leverage on the techniques developed previously for learning linear networks and show that they can also be effectively adopted to learn non-linear networks. We operate on the moments involving label and the score function of the input, and show that their factorization provably yields the weight matrix of the first layer of a deep network under mild conditions. In practice, the output of our method can be employed as effective initializers for gradient descent
- …
