NLP M8: Task-Oriented Dialogue (Meta)

Module 8 of CS 7650 - Natural Language Processing @ Georgia Tech.

Posted Oct 28, 2025 Updated Nov 14, 2025

By nrtedesco

Introduction

A Task-Oriented Dialogue (ToD) system helps a user accomplish a specific task over multiple natural language turns while maintaining the dialogue history. Examples include using an AI agent to book a restaurant or plan a travel itinerary. A successful ToD system must keep track of the dialogue context and piece together the information needed to complete the task at hand.

ToD systems typically have the following major characteristics:

Turn Taking: dialogue is interleaved between at least two speakers, with one being the agent.
Speech Acts: each user utterance performs one of several categories of action, such as providing information or making a request.
Grounding: establishing that the user and agent mutually agree on the information being exchanged.

Components of ToD System

Automatic Speech Recognition

ToD systems rely on Automatic Speech Recognition (ASR) to capture user speech.

User utterances are recorded as speech segments.
Acoustic features are computed from the speech segments and treated as a sequence of observations, which is then translated to a phonemic sequence.
The phonemic representation is then decoded into sub-words or words via a language model.

ASR models are evaluated using metrics such as character / word / slot error rate, where a slot is an entity or span contained in a given utterance.
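
As a rough illustration of the error-rate metrics above, the sketch below computes word and character error rate as the edit distance between a reference transcript and the ASR hypothesis, divided by the reference length. The function names are illustrative, whitespace tokenization is an assumption, and slot error rate is left out because its exact definition depends on how slots are annotated.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    # prev[j] is the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edits divided by the number of reference words."""
    ref = reference.split()  # assumption: whitespace tokenization
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hypothesis.split()) / len(ref)


def char_error_rate(reference, hypothesis):
    """Character-level edits divided by the number of reference characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, comparing the reference "book a table for two" against the hypothesis "book the table for you" gives two substitutions over five reference words, so the word error rate is 0.4.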

Intent Understanding

Converting speech recognition output into a semantic representation is challenging for several reasons:

ASR errors may carry forward to the context understanding component.
Natural language utterances contain substantial ambiguity.
The same word or phrase may have different meanings depending on context.

As part of Intent Prediction, the system attempts to understand the user goal. The system may conduct additional information seeking or clarification in order to gather necessary details. It may also define knowledge slots to fill based on task requirements. Note that the task may be completed via different paths depending on slot information available to the system.
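
To make slot filling concrete, here is a minimal sketch that extracts slots from an utterance and reports which ones still need clarification. The slot names and regular expressions are hypothetical and stand in for a learned model; a real system would not rely on patterns this brittle.

```python
import re

# Hypothetical slots for a restaurant-booking task, each with a toy pattern.
SLOT_PATTERNS = {
    "party_size": re.compile(r"\bfor (\d+)\b"),
    "time": re.compile(r"\bat (\d{1,2}(?::\d{2})?\s?(?:am|pm))\b"),
    "cuisine": re.compile(r"\b(thai|italian|mexican)\b"),
}


def fill_slots(utterance, slots=None):
    """Return a copy of `slots` updated with any values found in the utterance."""
    slots = dict(slots or {})
    text = utterance.lower()
    for name, pattern in SLOT_PATTERNS.items():
        match = pattern.search(text)
        if match:
            slots[name] = match.group(1)
    return slots


def missing_slots(slots):
    """Slots the system still has to ask about before completing the task."""
    return [name for name in SLOT_PATTERNS if name not in slots]
```

Calling `fill_slots` once per user turn and passing the previous result back in lets the frame fill up in whatever order the user supplies the information, which is one way the same task can be completed via different paths.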

Dialogue Manager

The purpose of the Dialogue Manager is to orchestrate the content and structure of a ToD conversation. It may achieve this using a handcrafted dialogue task tree with fixed transitions between dialogue states.
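
A handcrafted task tree with fixed transitions can be sketched as a lookup table from (state, event) pairs to next states. The state and event names below are invented for illustration and do not come from the course materials.

```python
# Hypothetical task tree for booking or cancelling a reservation.
TRANSITIONS = {
    ("greet", "book"): "ask_party_size",
    ("greet", "cancel"): "ask_booking_id",
    ("ask_party_size", "provided"): "ask_time",
    ("ask_time", "provided"): "confirm",
    ("ask_booking_id", "provided"): "confirm",
    ("confirm", "yes"): "done",
    ("confirm", "no"): "greet",
}


def step(state, event):
    """Follow a fixed transition; unknown events keep the current state."""
    # Staying in place lets the system re-prompt instead of failing.
    return TRANSITIONS.get((state, event), state)


def run(events, start="greet"):
    """Replay a sequence of events and return the final dialogue state."""
    state = start
    for event in events:
        state = step(state, event)
    return state
```

For instance, `run(["book", "provided", "provided", "yes"])` walks the booking branch and ends in the `done` state.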

The state tracked by the manager consists of the dialogue history:

user requests, information, and slots provided so far.
information requested and provided by the system.
user preferences.

Dialogue policy refers to the decision-making process that selects the next action given the current dialogue state. For a simple system, a handful of conditionals may suffice as the policy. For more complicated systems, the policy can instead be learned with Reinforcement Learning, modeling the dialogue as a Partially Observable Markov Decision Process (POMDP).
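
The simple-conditionals case might look like the sketch below, which pairs a small dialogue state with a rule-based policy. The field names, required slots, and action labels are assumptions chosen for a restaurant-booking example; a learned policy would replace the `policy` function.

```python
from dataclasses import dataclass, field

# Hypothetical slots the booking task needs before it can complete.
REQUIRED_SLOTS = ("cuisine", "party_size", "time")


@dataclass
class DialogueState:
    user_slots: dict = field(default_factory=dict)        # slots provided so far
    system_requests: list = field(default_factory=list)   # what the system has asked for
    user_preferences: dict = field(default_factory=dict)  # e.g. dietary needs
    confirmed: bool = False                               # user agreed to the summary


def policy(state):
    """Pick the next system action as an (action, argument) pair."""
    for slot in REQUIRED_SLOTS:
        if slot not in state.user_slots:
            return ("request", slot)
    if not state.confirmed:
        return ("confirm", None)
    return ("book", None)
```

Because the policy only reads the state, the rest of the system can update the state after each turn and call `policy` again to get the next action.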

Natural Language Generation

Text Generation as part of ToD takes one of two major forms:

User-Initiated Dialogue: the user submits some initial query to the system.
System-Initiated Dialogue: the system begins the conversation with the user.

Methods for generation include template-based responses, structured prediction, seq2seq encoder/decoder models, and structured fusion networks. The system may also use slot-level attention to attend to the relevant context provided by the user.
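
Of these methods, template-based generation is the simplest to sketch: each system action maps to a fixed string with placeholders for slot values. The action names and template wording below are made up for illustration.

```python
# Hypothetical templates keyed by system action.
TEMPLATES = {
    "request": "What {slot} would you like?",
    "confirm": "You want a {cuisine} restaurant for {party_size} at {time}. Is that right?",
    "inform": "I booked {name} for {party_size} at {time}.",
}


def generate(action, **values):
    """Fill the template for `action`; raises KeyError on a missing action or value."""
    return TEMPLATES[action].format(**values)
```

Templates give predictable output at the cost of repetitive phrasing, which is one motivation for the learned approaches listed above.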

Other Remarks

End-to-End ToD

An End-to-End ToD system combines many individual deep neural networks into a single cohesive unit:

Intent Network: seq2seq model to map user utterance to intent.
Belief Tracker: maps the user utterance to a distribution over slot values. Beliefs are typically per-slot, meaning each slot has its own probability distribution over candidate values.
Database Operator: uses beliefs to construct SQL query to select most relevant data.
Policy Network: feedforward layer combining the outputs of the individual networks.
Generation Network: seq2seq model responsible for creating final text to be returned to user.
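
To illustrate how the belief tracker and database operator might connect, the sketch below turns per-slot scores into probability distributions and builds a parameterized SQL query from the most likely value of each sufficiently confident slot. The table name, column names, and threshold are all assumptions, and in an actual end-to-end system the scores would come from a neural network rather than being written by hand.

```python
import math

# Hypothetical slots that double as column names in a `restaurants` table.
SLOT_COLUMNS = ("cuisine", "area")


def softmax(scores):
    """Turn raw per-value scores for one slot into a probability distribution."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {value: math.exp(s - peak) for value, s in scores.items()}
    total = sum(exps.values())
    return {value: e / total for value, e in exps.items()}


def build_query(beliefs, threshold=0.6):
    """Build (sql, params) using the top value of each confident slot."""
    clauses, params = [], []
    for slot in SLOT_COLUMNS:  # only allowlisted names reach the SQL text
        dist = beliefs.get(slot)
        if not dist:
            continue
        value, prob = max(dist.items(), key=lambda kv: kv[1])
        if prob >= threshold:
            clauses.append(f"{slot} = ?")
            params.append(value)  # values are bound as parameters, not inlined
    sql = "SELECT name FROM restaurants"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params
```

Slots whose top value falls below the threshold are left out of the query, so an uncertain belief widens the search rather than filtering on a guess. The returned pair can be passed directly to a DB-API call such as `sqlite3`'s `Connection.execute(sql, params)`.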

Recovering from Errors

Because the system has many different components, there are many opportunities for error:

Misunderstanding Errors: miscommunication due to speech recognition issues or problematic mapping to user intent.
Nonunderstanding Errors: issues with communication due to missed slot values, requiring additional turns for resolution.

Recovery strategies for ToD systems include:

confirmation questions to recover reliable information.
task completion via restart, context switch, or termination.
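
One way these strategies could be combined is to choose a recovery action from the recognizer's confidence in a slot value. The thresholds, action labels, and retry limit in this sketch are assumptions for illustration, not values from the course materials.

```python
def recovery_action(slot, value, confidence, failed_attempts=0,
                    accept_at=0.8, confirm_at=0.4, max_attempts=3):
    """Return an (action, prompt) pair for a recognized slot value."""
    if failed_attempts >= max_attempts:
        # Too many failed turns: give up on this path and restart the task.
        return ("restart", None)
    if confidence >= accept_at:
        # Confident enough to use the value without asking.
        return ("accept", None)
    if confidence >= confirm_at:
        # Possible misunderstanding: ask a confirmation question.
        return ("confirm", f"Did you say {value} for {slot}?")
    # Likely nonunderstanding: ask for the slot again.
    return ("reask", f"Sorry, what {slot} would you like?")
```

A mid-confidence value triggers a confirmation question, while a low-confidence value costs an extra turn to ask again, mirroring the two error types described above.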

(all images obtained from Georgia Tech NLP course materials)

This post is licensed under CC BY 4.0 by the author.