Date: 2021-08-14



Conversational agents are NLP-driven dialogue systems that respond to specific queries in human language. Leveraging advanced deep learning tools and natural language understanding, they can go beyond simple chatbot responses and give more contextual answers. Conversational AI covers three main areas of artificial intelligence research: automatic speech recognition (ASR), natural language processing (NLP), and text-to-speech (TTS, or speech synthesis). These dialogue systems read from an input channel and reply over an output channel with relevant responses in the form of graphics, speech, or haptic-assisted physical gestures.

Modern conversational models often struggle when faced with temporal relationships and disfluencies. The temporal reasoning ability of large pre-trained language models such as T5 and GPT-3 in dialogs has not yet been fully explored. The lack of datasets containing these conversational and speech phenomena has slowed progress in improving performance. To address this gap, Google has introduced two new datasets for conversational NLP.

Google's published research introduces TimeDial and Disfl-QA, which target temporal commonsense reasoning and contextual disfluencies in dialogs, respectively, and uses them to examine how well pre-trained language models cope. Both are benchmark datasets that expose the gap between human performance and current state-of-the-art NLP models.

TimeDial dataset

TimeDial tests how well conversational agents handle temporal concepts in a dialog, such as the duration, frequency, and relative order of events. Current NLP models tend to make poor choices on fill-in-the-blank questions that require basic world knowledge to reason about time. TimeDial introduces a multiple-choice span-filling task for temporal understanding.

For example, consider this conversation shown on the Google AI Blog.

Credit: Google AI Blog

Filling in the blank requires the NLP model to understand the temporal relationships between events: half past one comes before three o'clock, half past three comes after both, and so on. It also requires world knowledge to determine that the speakers are not yet late for the meeting. Yet current models such as T5 and BERT pick the wrong answer.

Google's TimeDial is a benchmark dataset that measures a model's temporal commonsense reasoning within the context of a dialog by posing multiple-choice questions with four options each.

Google ran experiments across three modeling paradigms: classification over the four provided options using BERT, mask filling of the masked span in the dialog using BERT-MLM, and generative methods using T5.

Error analysis concluded that pre-trained language models do not truly reason over the context. Instead, they often rely on shallow, spurious features such as text matching. This calls for new ways of representing temporal objects in general text representations.
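To make the text-matching shortcut concrete, here is a minimal sketch of a purely lexical baseline on a span-filling question. It is not one of the models Google evaluated, and the dialog, the options, the `<MASK>` placeholder, and the function names are all invented for illustration rather than taken from TimeDial.

```python
import re

MASK = "<MASK>"  # hypothetical placeholder, not necessarily the token TimeDial uses


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def overlap_score(dialog, option):
    """Shallow feature: how many option tokens already appear in the dialog."""
    context = tokenize(dialog.replace(MASK, " "))
    return len(tokenize(option) & context)


def pick_by_text_matching(dialog, options):
    """Return the index of the option with the highest lexical overlap."""
    scores = [overlap_score(dialog, option) for option in options]
    return max(range(len(options)), key=scores.__getitem__)


# Invented toy example: the meeting is at three, and the speakers are not late.
dialog = (
    "Ana: The meeting starts at three o'clock. "
    "Ben: It is only <MASK> now, so we are not late yet."
)
options = [
    "half past one",        # plausible: before three
    "half past three",      # implausible: after three
    "three o'clock sharp",  # implausible, but repeats words from the dialog
    "quarter to two",       # plausible: before three
]

choice = pick_by_text_matching(dialog, options)
print(options[choice])
```

The baseline prefers the option that repeats words already present in the dialog, even though that option contradicts "not late yet". A model leaning on the same cue would fail in the same way, which is the behavior the error analysis describes.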

The dataset is published at https://github.com/google-research-datasets/timedial.

Disfl-QA dataset

Disfluencies occur in the text output produced by speech recognition systems. It is therefore essential to study such disfluent text in order to build conversational agents that understand human speech. However, NLP research faces two hurdles.

First, the lack of curated datasets containing these disfluencies hinders deeper research and model innovation. Second, the datasets that are available are limited in size and complexity.

These limitations make it hard for researchers to stress-test NLP models.

Google claims that Disfl-QA is the first dataset to contain contextual disfluencies in an information-seeking setting. It is a dataset targeted at disfluencies, in which all of the questions (about 12k) contain them.

Nearly 90% of the disfluencies in Disfl-QA are corrections or restarts, which makes it a difficult test set for disfluency correction. In addition, it has a wider range of semantic distractors, that is, distractors that carry semantic meaning rather than simpler speech disfluencies.

Google has shown this with the help of an example.

Credit: Google AI Blog

In this example, Q1 is a question about the location of Normandy. However, the disfluent version (DQ1) mentions a Scandinavian language before the question is corrected. This correction disfluency confuses the QA model because the model relies on shallow text cues to answer the question.

According to their results, existing language models performed poorly when tested on Disfl-QA. Data augmentation methods can partially recover this lost performance. The researchers also found that large disfluency datasets are needed for NLP models to become robust to disfluencies.
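As a rough illustration of what disfluency-oriented data augmentation could look like, the sketch below turns a fluent question into a disfluent one by inserting a distractor followed by a correction phrase. This is a hypothetical heuristic, not the augmentation method used in the Disfl-QA study; the phrase list, the function name, and the example question are assumptions made for illustration.

```python
import random

# Hypothetical correction phrases; real disfluencies are far more varied.
CORRECTION_PHRASES = ["no wait", "sorry, I mean", "or rather"]


def add_correction_disfluency(question, target, distractor, rng):
    """Insert `distractor` plus a correction phrase right before `target`.

    The fluent meaning is unchanged: the speaker starts with the wrong
    term and then corrects it to the intended one.
    """
    if target not in question:
        raise ValueError(f"target {target!r} not found in question")
    phrase = rng.choice(CORRECTION_PHRASES)
    return question.replace(target, f"{distractor} {phrase} {target}", 1)


rng = random.Random(0)  # seeded so repeated runs give the same augmentation
fluent = "Who wrote Hamlet?"
disfluent = add_correction_disfluency(fluent, "Hamlet", "Macbeth", rng)
print(disfluent)
```

Pairs of fluent and synthetic disfluent questions like these could be added to the training data of a QA model. Because the distractor carries real meaning, it resembles the semantic distractors described above more than a simple filler word would.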

The dataset is published at https://github.com/google-research-datasets/disfl-qa.

Avi Gopani

I’m a liberal arts graduate and enjoy studying new topics and writing about them. As an avid journalist, I love reading books, going out for a drive on a rainy day, and listening to old Bollywood music.

