Learning to play Minecraft with Video PreTraining

We trained a neural network to play Minecraft by Video PreTraining (VPT) on a massive unlabeled video dataset of human Minecraft play, while using only a small amount of labeled contractor data. With fine-tuning, our model can learn to craft diamond tools, a task that usually takes proficient humans over 20 minutes (24,000 actions). Our model uses the native human interface of keypresses and mouse movements, making it quite general, and represents a step towards general computer-using agents.