Spoken Language Generation. Algorithms for Generating Natural Language in Spoken Dialogue Systems (SDS)

Essay, 2007, 10 pages

English Language and Literature Studies - Linguistics



This essay describes several algorithms for generating natural language in spoken dialogue systems (SDS). Natural language generation (NLG) deals with the transformation of semantic representations into well-formed utterances. Because speech differs significantly from written text, its generation requires different approaches. An SDS should produce easily understandable, human-like sentences so that users can retrieve information more easily and find the system more convenient to use. The purpose of this essay is to compare template-based (e.g. GENESIS and GENESIS-II), rule-based, and hybrid linguistic / statistical (e.g. HALogen, Acorn, Communicator, NLG[1 – 4], SPoT and SPaRKy) methods and to highlight their strengths and weaknesses. This evaluation may be helpful when building new SDS in practice. However, the final choice of algorithm depends on the task and the users’ needs as well as on the time, money, and effort available for developing the system.

1 Introduction

In this essay, several methods for generating language in SDS are discussed. The topic is interesting because the performance of existing SDS could be improved to better meet the needs of different applications; for example, the output speech could be more varied. Moreover, an SDS should be easy to use. Giving the user the initiative in a dialogue may help meet this goal but requires a more flexible system. Efficiency and user satisfaction are important metrics for evaluating a system’s performance. In an SDS, system utterances should be understandable and appropriate, so that the user always knows what to say next.

Research in this area is important because an SDS has to create a suitable response to a user query in real time: grammatically well-formed sentences must be produced without long response times, whereas written text can usually be generated without time constraints. Furthermore, spoken utterances are generally shorter and less grammatically complex than written texts (Oh & Rudnicki, 2002).

The aim of this essay is to present the different approaches to NLG in SDS. Techniques currently in use include template-based (e.g. GENESIS and GENESIS-II), rule-based, and hybrid linguistic / statistical (e.g. HALogen, Acorn, Communicator, NLG[1 – 4], SPoT and SPaRKy) methods. Weighing the benefits and drawbacks of these algorithms helps in deciding which method to use when a new SDS is built.

The rest of the essay is organized as follows. Section 2 gives an overview of the different algorithms, describes them in more detail using the example systems mentioned above, and points out their strengths and weaknesses. Section 3 then gives recommendations for practical use and future developments. Finally, section 4 summarizes the work discussed in this essay.

2 Algorithms

There are three main approaches to generating language in SDS: template-based, rule-based, and hybrid linguistic / statistical methods. The following subsections analyze these algorithms and present examples.

2.1 Template-based Methods

Template-based methods are probably the easiest way to generate language. Templates are handcrafted and usually consist of text strings. They can be canned text, but templates with slots to be filled are also common. Figure 1 shows a predefined text string, whereas the template in figure 2 allows different words to be inserted for the variables that start with a “$” sign.

[Figure not included in this preview]

Figure 1: Canned Text

[Figure not included in this preview]

Figure 2: Filling Slots
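The difference between the two kinds of template can be illustrated with Python's standard `string.Template` class, which happens to use the same “$” convention for variables. This is a minimal sketch: the greeting, the template wording, and the slot names `cuisine`, `name`, and `area` are hypothetical and are not the contents of figures 1 and 2.

```python
from string import Template

# Canned text: a fixed string that is output verbatim, whatever the context.
CANNED_GREETING = "Welcome to the restaurant information system. How can I help you?"

# Slot filling: words prefixed with "$" are variables replaced at run time.
# The wording and slot names are hypothetical examples.
RESULT_TEMPLATE = Template("There is a $cuisine restaurant called $name in $area.")


def fill_slots(template: Template, slots: dict[str, str]) -> str:
    """Substitute slot values; Template.substitute raises KeyError if a slot is missing."""
    return template.substitute(slots)


if __name__ == "__main__":
    print(CANNED_GREETING)
    print(fill_slots(RESULT_TEMPLATE,
                     {"cuisine": "Thai", "name": "Siam House", "area": "the city centre"}))
```

The hard-coded article already shows the rigidity discussed below: filling `$cuisine` with “Indian” would produce “a Indian restaurant”, because the template cannot adapt the words around a slot.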

Templates are simple and easy to create without much linguistic knowledge. They generally give good quality within a given domain, but a large number of templates is usually needed. The output of an SDS can only be as good as its templates: it cannot generate anything that was not specified in advance. Template-based algorithms are mostly domain-specific and cannot be reused in general. They produce inflexible dialogues; for example, they allow little variation in style. Nevertheless, templates can be enhanced with linguistic rules to generate more complex sentences. In addition, template-based SDS are hard to maintain. For example, subject-verb agreement has to be implemented repeatedly for every option (“there is 1 restaurant” vs. “there are 2 restaurants”), and new templates, e.g. for a different domain, must be added by hand. This is not only time-consuming but also makes word choice difficult, as the template writer is usually not an expert in the domain. Two examples of this approach are presented next.
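Before turning to the examples, the agreement problem mentioned above can be made concrete. Without a linguistic rule, every count-dependent message needs hand-written singular and plural variants. The sketch below is hypothetical (the message types and the `realize` function are invented for illustration) and simply selects one of two variants per message.

```python
# Hypothetical template table: each message type needs one variant per number,
# and every new message type must repeat this singular/plural split by hand.
TEMPLATES = {
    "restaurant_count": {
        "singular": "there is {n} restaurant",
        "plural": "there are {n} restaurants",
    },
    "hotel_count": {
        "singular": "there is {n} hotel",
        "plural": "there are {n} hotels",
    },
}


def realize(message_type: str, n: int) -> str:
    """Pick the variant that agrees with n and fill in the number."""
    variant = "singular" if n == 1 else "plural"
    return TEMPLATES[message_type][variant].format(n=n)


if __name__ == "__main__":
    for count in (1, 2):
        print(realize("restaurant_count", count))
```

A rule that inflects the verb and noun from a single template would remove this duplication, which is the kind of linguistic enhancement of templates mentioned above.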

GENESIS: The MIT GENESIS system (Glass et al., 1994) can be adapted for multi-domain and multilingual generation. It represents the meaning of sentences as semantic frames; figure 3, for example, shows the frame for the query “Are there any banks on Main Street?”. The clause expresses the existence of the topic “bank”, which is marked as plural; the quantifier contributes the word “any”, and the predicate produces the location “on Main Street”.

[Figure not included in this preview]

Figure 3: Semantic Frame (Glass et al., 1994)

The generation component consists of a lexicon, various templates, and several rewrite rules for English and French. The lexicon contains each word’s part of speech, its stem, and its derived forms, whereas the rewrite rules handle phonetic constraints such as the choice between the articles “a” and “an”. Each template consists of a name and a value made up of word strings or keywords; default values can be specified for cases where there is no input. The templates control the order of the sentence constituents: clauses, topics, and predicates are treated as the parts of a sentence, and each of them is linked to templates. Figure 4 gives examples of English templates. The query in figure 3 has to be expressed with a form of the auxiliary verb “be”, the word “there”, and the topic, and the topic template specifies that this constituent is built from the quantifier followed by the noun phrase.

[Figure not included in this preview]

Figure 4: Message Templates (Glass et al., 1994)
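The following Python sketch ties these components together for the example query. It is a hypothetical reconstruction from the prose description, not GENESIS code: the frame encoding, the “:” keyword notation, the lexicon entries (including the extra word “ATM”, added only to exercise the a/an rule), and the function names `expand`, `rewrite`, and `generate` are all assumptions.

```python
# Hypothetical encoding of the semantic frame for
# "Are there any banks on Main Street?"; GENESIS's real frame format differs.
FRAME = {
    "clause": "existential",
    "topic": {"name": "bank", "number": "pl", "quantifier": "any"},
    "predicate": {"name": "on", "object": "Main Street"},
}

# Lexicon: part of speech, stem, and a derived (plural) form for each word.
LEXICON = {
    "bank": {"pos": "noun", "stem": "bank", "pl": "banks"},
    "ATM": {"pos": "noun", "stem": "ATM", "pl": "ATMs"},
}

# Templates fix the order of constituents; ":" marks a keyword filled from the frame.
TEMPLATES = {
    "existential": [":be", "there", ":topic", ":predicate"],
    "topic": [":quantifier", ":noun"],
}


def expand(template_name: str, values: dict[str, str]) -> str:
    """Fill a template in order; keywords without a value default to nothing."""
    words = []
    for item in TEMPLATES[template_name]:
        if item.startswith(":"):
            value = values.get(item[1:], "")
            if value:
                words.append(value)
        else:
            words.append(item)
    return " ".join(words)


def rewrite(words: list[str]) -> list[str]:
    """Phonetic rewrite rule: "a" becomes "an" before a word starting with a vowel letter.

    Spelling only approximates pronunciation ("a university" would be mishandled).
    """
    out = list(words)
    for i in range(len(out) - 1):
        if out[i].lower() == "a" and out[i + 1][0].lower() in "aeiou":
            out[i] = "an"
    return out


def generate(frame: dict) -> str:
    """Realize an existential yes/no question (the only clause type in this sketch)."""
    topic = frame["topic"]
    entry = LEXICON[topic["name"]]
    plural = topic["number"] == "pl"
    topic_text = expand("topic", {
        "quantifier": topic.get("quantifier", ""),
        "noun": entry["pl"] if plural else entry["stem"],
    })
    pred = frame.get("predicate")
    pred_text = f"{pred['name']} {pred['object']}" if pred else ""
    sentence = expand(frame["clause"], {
        "be": "are" if plural else "is",  # number agreement with the topic
        "topic": topic_text,
        "predicate": pred_text,
    })
    text = " ".join(rewrite(sentence.split()))
    return text[0].upper() + text[1:] + "?"


if __name__ == "__main__":
    print(generate(FRAME))
```

Running the script prints “Are there any banks on Main Street?”. A frame with the singular topic “ATM”, the quantifier “a”, and the same predicate yields “Is there an ATM on Main Street?”, and omitting the predicate shows the empty default at work (“Is there a bank?”).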

GENESIS-II: GENESIS-II (Baptist & Seneff, 2000) is an improved version of the original GENESIS system with more advanced features. It produces higher-quality output, particularly for foreign languages and for artificial languages such as SQL or HTML. The system converts semantic frames into shorter e-forms; figure 5 shows the e-form for the utterance “United flight 94 leaves from Boston at 3:00 p.m.”.

[Figure not included in this preview]

Figure 5: E-form (Baptist & Seneff, 2000)

The e-form is processed in a similar way to the original GENESIS system, using the information depicted in figure 6. There, flight_leg specifies that airline_flight is followed by leaves_from. The former looks up airline in the e-form and finds “UA”; looking this up in the lexicon file yields “United”. The word “flight” and the flight number “94” follow. The second part, leaves_from, consists of from_source and at_dpt_time, which produce “leaves from Boston at 3:00 p.m.” once the values in the e-form have been filled in. If the e-form contains no value for the source or the time, that phrase is left out so that the utterance remains grammatically correct.

[Figure not included in this preview]

Figure 6: Generation Process (Baptist & Seneff, 2000)
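The recursive expansion just described, including the omission of phrases whose values are missing, can be sketched as follows. The rule names flight_leg, airline_flight, leaves_from, from_source, and at_dpt_time and the e-form key airline come from the description; the other e-form keys, the literal word “leaves”, the `$key` notation, and the `expand` function are assumptions and do not reproduce the GENESIS-II file format.

```python
from typing import Optional

# Hypothetical e-form for "United flight 94 leaves from Boston at 3:00 p.m.".
# Only the "airline" key is named in the text; the other keys are assumptions.
EFORM = {
    "airline": "UA",
    "flight_number": "94",
    "source": "Boston",
    "departure_time": "3:00 p.m.",
}

# "$key" reads a value from the e-form, rule names expand recursively,
# and anything else is a literal word.
RULES = {
    "flight_leg": ["airline_flight", "leaves_from"],
    "airline_flight": ["$airline", "flight", "$flight_number"],
    "leaves_from": ["leaves", "from_source", "at_dpt_time"],
    "from_source": ["from", "$source"],
    "at_dpt_time": ["at", "$departure_time"],
}

LEXICON = {"UA": "United"}  # maps e-form codes to surface words


def expand(symbol: str, eform: dict[str, str]) -> Optional[str]:
    """Return the words for symbol, or None if a required e-form value is missing."""
    if symbol.startswith("$"):
        value = eform.get(symbol[1:])
        return None if value is None else LEXICON.get(value, value)
    if symbol not in RULES:
        return symbol  # literal word
    words = []
    for child in RULES[symbol]:
        text = expand(child, eform)
        if not text:
            if child.startswith("$"):
                return None  # this rule's own value is missing: drop the whole phrase
            continue  # a sub-phrase was dropped; keep the rest of the sentence
        words.append(text)
    return " ".join(words)


if __name__ == "__main__":
    print(expand("flight_leg", EFORM))
    no_time = {k: v for k, v in EFORM.items() if k != "departure_time"}
    print(expand("flight_leg", no_time))
```

With the full e-form the result is “United flight 94 leaves from Boston at 3:00 p.m.”; without a departure time the at_dpt_time phrase disappears and the output is “United flight 94 leaves from Boston”.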

Advanced features include an info frame that stores current information about syntactic, semantic, and prosodic cues during generation, for example a plural marker used to enforce number agreement. There is also a mechanism that tags list elements retrieved from the database so that commas are inserted between the elements and the word “and” before the last one. Word-sense disambiguation was improved as well, using context-dependent selectors that also support prosody marking and generation in different languages.
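The list mechanism and the plural marker can be illustrated together. In this minimal sketch, a plain dictionary stands in for the info frame, whose actual structure is not described here; the functions `join_list` and `describe_airlines`, the sentence pattern, and the airline names are hypothetical.

```python
def join_list(items: list[str]) -> str:
    """Commas between elements and "and" before the last one (no serial comma)."""
    if len(items) <= 1:
        return "".join(items)
    return ", ".join(items[:-1]) + " and " + items[-1]


def describe_airlines(airlines: list[str]) -> str:
    """Realize a list of airlines with a verb that agrees in number."""
    if not airlines:
        raise ValueError("expected at least one airline")
    # Stand-in for the info frame: cues recorded during generation for later steps,
    # here a plural marker derived from the number of list elements.
    info = {"plural": len(airlines) > 1}
    verb = "fly" if info["plural"] else "flies"
    return f"{join_list(airlines)} {verb} this route."


if __name__ == "__main__":
    print(describe_airlines(["United"]))
    print(describe_airlines(["United", "Delta", "American"]))
```

Recording the plural marker once and consulting it later means the verb choice does not have to be duplicated in every template, which addresses the agreement problem of plain templates.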

The developers of GENESIS-II hypothesize that the system is faster than the hybrid systems described below because its generation algorithm uses a very efficient binary search to find entries in the lexicon and the rule files. For example, GENESIS-II’s response times in the flight domain are small fractions of a second, even for complex utterances (Seneff, 2002).
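Binary search over a sorted lexicon needs only a logarithmic number of comparisons per lookup. The sketch below uses Python's standard `bisect` module; the lexicon entries and the `lookup` function are hypothetical, and the actual file format and search code of GENESIS-II are not described in this essay.

```python
from bisect import bisect_left
from typing import Optional

# Hypothetical lexicon stored as (key, surface form) pairs sorted by key.
LEXICON = sorted([
    ("UA", "United"),
    ("AA", "American"),
    ("DL", "Delta"),
    ("BOS", "Boston"),
])
KEYS = [key for key, _ in LEXICON]


def lookup(key: str) -> Optional[str]:
    """Binary search: O(log n) comparisons instead of scanning every entry."""
    i = bisect_left(KEYS, key)  # leftmost position where key could be inserted
    if i < len(KEYS) and KEYS[i] == key:
        return LEXICON[i][1]
    return None


if __name__ == "__main__":
    print(lookup("UA"))
    print(lookup("XX"))
```

In Python itself a `dict` would give average constant-time lookups; the sorted list is used here only to mirror the binary search named in the source.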



Institution / College: University of Sheffield
Grade: 1,0 (100%)