Learning to Follow Verbal Instructions

Co-PIs: Marie desJardins, Michael Littman, Smaranda Muresan
UMBC students: James MacGlashan, Kevin Winner, Richard Adjogah
Rutgers students: Monica Babeş-Vroman, Ruoyuan Gao

We address the problem of training an artificial agent to follow verbal instructions. Our training data is a set of natural language instructions paired with demonstrations of optimal behavior. From the demonstrated behavior, the agent infers a reward function for each task via inverse reinforcement learning. From the verbal instructions, the agent obtains a semantic parse of each sentence. We add an abstraction component that enables the agent to learn parameterized tasks.


Learning to follow verbal instructions comes naturally to humans, but automating it remains a challenging problem. In this project, we address the following question: given a verbal instruction, what sequence of actions must the instructed agent perform to successfully carry out the corresponding task? Such an agent faces many challenges: What is the meaning of the spoken sentence, and how should it be parsed? How do words map to objects in the real world? Once the task is identified, how should it be executed? Once the task is learned and executed, how can it be generalized to a new context, with objects and properties that the agent has not seen before? And how should the agent's learning be evaluated?

Preliminary Experiments
We created a simple Sokoban-style environment for our agent consisting of several rooms and objects. Depending on the task, the agent must manipulate the objects and/or itself into one of the rooms.
[Figure: Sokoban-style domain]
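The domain can be pictured as a small grid world in which the agent walks between rooms and pushes a star by stepping into it. The sketch below is illustrative only; the class name, grid layout, and dynamics are assumptions, not the project's actual implementation.

```python
# Minimal sketch of a Sokoban-style grid world. Walking into the star
# pushes it one cell in the direction of movement; blocked moves are no-ops.

class GridWorld:
    """Toy environment: an agent and a single pushable star on a grid."""

    ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

    def __init__(self, width, height, agent, star, walls=frozenset()):
        self.width, self.height = width, height
        self.agent, self.star = agent, star
        self.walls = set(walls)

    def _open(self, cell):
        # A cell is open if it lies on the grid and is not a wall.
        x, y = cell
        return 0 <= x < self.width and 0 <= y < self.height and cell not in self.walls

    def step(self, action):
        dx, dy = self.ACTIONS[action]
        nxt = (self.agent[0] + dx, self.agent[1] + dy)
        if not self._open(nxt):
            return self.agent, self.star              # blocked: nothing moves
        if nxt == self.star:                          # try to push the star
            pushed = (nxt[0] + dx, nxt[1] + dy)
            if not self._open(pushed):
                return self.agent, self.star          # star blocked: no move
            self.star = pushed
        self.agent = nxt
        return self.agent, self.star
```

For example, an agent at (0, 0) with the star directly above it at (0, 1) pushes the star to (0, 2) by moving up.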

To generate sample commands in natural language, we ran an Amazon Mechanical Turk study that showed users an example trajectory and asked them to write a command that would allow another person to recreate it.

Example Sentences Collected from Users
• Walk out the door, find the star and go home.
• Go down, lift the star, and push up to the second opening.
• Leave the room on the right and move the star to the room on the left.
• Go through the door to the red room. Find the yellow star to your right. Push it through the door into the green room.
• Go through the door below you into the red area, pick up the star on the far left and bring it to the closest doorway of the green area.
• Go to the star in the red room and bring it to the green room.

System Architecture
The diagram below shows the overall architecture of the system. Semantic parsing techniques generate possible sentence meanings and lexical bindings to objects in the world. A task abstraction module then interprets the natural language commands as parameterized actions, using existing task models or creating a new task model for a previously unseen task. Inverse reinforcement learning is applied to the parameterized actions and observed state-space trajectories, producing a set of possible grounded actions, which in turn can be used to update the likelihood of alternative semantic interpretations of the associated commands. By applying this reasoning framework to a series of trajectories and associated natural language commands, the system learns generalized task representations that can be used to carry out future instructions.
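The feedback step in this loop can be sketched as follows: each candidate parse induces a reward function, the reward function induces Q-values, and the likelihood of the demonstrated trajectory under a Boltzmann policy re-weights the parse distribution. This is a hedged illustration, not the project's code; the function names are hypothetical, and Q-values are assumed to be precomputed per hypothesis for brevity.

```python
import math

def trajectory_log_likelihood(trajectory, q_values, beta=1.0):
    """Log-likelihood of (state, action) pairs under a Boltzmann policy,
    where q_values[state] maps each action to its Q-value for one hypothesis."""
    ll = 0.0
    for state, action in trajectory:
        scores = q_values[state]
        z = sum(math.exp(beta * q) for q in scores.values())
        ll += beta * scores[action] - math.log(z)
    return ll

def rerank_parses(parses, trajectory):
    """parses: list of (prior, q_values) pairs, one per candidate interpretation.
    Returns the posterior over parses given the demonstrated trajectory."""
    weights = [prior * math.exp(trajectory_log_likelihood(trajectory, q))
               for prior, q in parses]
    total = sum(weights)
    return [w / total for w in weights]
```

With two equally likely candidate interpretations, the one whose induced Q-values better explain the demonstrated actions receives the higher posterior weight, which is the sense in which the grounded actions "update the likelihood of alternative semantic interpretations."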

[Figure: System architecture]

Learning to Interpret Natural Language Instructions
Following Verbal Commands

This project is sponsored by the National Science Foundation.