Hear Your Web Site


Last week, I talked about how XHTML Basic is becoming the new standard for handheld devices such as PDAs and mobile phones. But what if you could convert an existing XHTML document into another markup language that turned it into speech?

This is where VoiceXML comes in. VoiceXML is an XML-based markup language that allows developers to create audio dialogs. These dialogs can feature:

  • synthesized speech
  • digitized audio
  • speech recognition
  • touch-tone (DTMF) key recognition
  • recording of spoken input
  • and more

To implement a VoiceXML interpreter, you need the following:

  • A way to acquire the document. The request is usually triggered by an incoming phone call.
  • A way to provide audio output. This can be either audio files or text-to-speech.
  • A way to accept audio input. There are three parts to this:
    • character input (touch-tone signals, for example)
    • speech recognition
    • audio recording

Here is a simple example of a VoiceXML script:

<?xml version="1.0"?>
<vxml version="1.0">
 <form>
  <block>Hello World!</block>
 </form>
</vxml>
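
The conversion idea mentioned at the start (XHTML in, VoiceXML out) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name xhtml_to_vxml and the sample page are made up for this example, the input must be well-formed XHTML, and only headings and paragraphs are turned into spoken blocks. A real converter would also have to deal with links, lists, and forms.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

def xhtml_to_vxml(xhtml: str) -> str:
    """Hypothetical helper: speak each heading and paragraph as a <block>."""
    root = ET.fromstring(xhtml)
    vxml = ET.Element("vxml", version="1.0")
    form = ET.SubElement(vxml, "form")
    for el in root.iter():
        # Strip the XHTML namespace so "{ns}h1" is compared as plain "h1".
        tag = el.tag.replace(XHTML_NS, "")
        if tag in ("h1", "h2", "h3", "p"):
            # Flatten inline markup such as <em> and collapse whitespace.
            text = " ".join("".join(el.itertext()).split())
            if text:
                ET.SubElement(form, "block").text = text
    return ET.tostring(vxml, encoding="unicode")

# Made-up sample page, used only to exercise the sketch.
page = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Greeting</title></head>
  <body><h1>Hello World!</h1><p>Welcome to <em>my</em> site.</p></body>
</html>"""

print(xhtml_to_vxml(page))
```

For the sample page, the result is a single form containing two blocks, one for the heading and one for the paragraph. The page title is skipped because only body text is meant to be spoken.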

Using VoiceXML, you can create many kinds of voice-activated menus, forms, and more. With speech synthesis, your applications can contain as much information as you like, and all of it is audible. For example, I could convert my HTML Tag Library into a VoiceXML application for use over a phone:

<form>
 <block>Learn HTML Tags from HTML at About</block>
 <field name="tags">
  <prompt>What HTML Tag would you like to learn?</prompt>
  <help>Say one of paragraph, ul, or code.</help>
  <!-- Spoken choices and the value each one stores in the field -->
  <grammar type="application/x-jsgf">
   paragraph {p} | ul {ul} | code {code}
  </grammar>
  <!-- Touch-tone equivalents of the spoken choices -->
  <dtmf type="application/x-jsgf">
   1 {p} | 2 {ul} | 3 {code}
  </dtmf>
  <filled>
   <submit next="/cgi-bin/tags.cgi"
    method="post" namelist="tags" />
  </filled>
 </field>
</form>

If you called the above application, you might hear something like this:
Computer: Learn HTML Tags from HTML at About
C: What HTML Tag would you like to learn?
Human: help
C: Say one of paragraph, ul, or code.
H: html
C: I did not understand what you said.
H: (uses touch-tone phone) 2
C: (submits the choice to /cgi-bin/tags.cgi)
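
To make the matching step concrete, here is a rough Python simulation of how the two grammars in the example map what the caller says or presses to the value that gets submitted. It is a sketch, not how a VoiceXML platform is implemented: the names SPEECH_GRAMMAR, DTMF_GRAMMAR, recognize, and run_dialog are invented for this illustration, real speech recognition is replaced by plain string matching, and the submitted value is assumed to travel under the name "tags" from the namelist.

```python
from typing import List, Optional

# Mirror the example's grammars: recognized input -> value stored in the field.
SPEECH_GRAMMAR = {"paragraph": "p", "ul": "ul", "code": "code"}
DTMF_GRAMMAR = {"1": "p", "2": "ul", "3": "code"}

def recognize(user_input: str) -> Optional[str]:
    """Return the field value for a word or key press, or None if nothing matches."""
    token = user_input.strip().lower()
    if token in DTMF_GRAMMAR:
        return DTMF_GRAMMAR[token]
    return SPEECH_GRAMMAR.get(token)

def run_dialog(inputs: List[str]) -> List[str]:
    """Replay caller inputs and collect what the computer would say or do."""
    spoken = ["Learn HTML Tags from HTML at About",
              "What HTML Tag would you like to learn?"]
    for user_input in inputs:
        value = recognize(user_input)
        if value is None:
            # Input outside both grammars: the caller is asked again.
            spoken.append("I did not understand what you said.")
            continue
        # A match fills the field, and the form is then submitted.
        spoken.append(f"(submit tags={value} to /cgi-bin/tags.cgi)")
        break
    return spoken

for line in run_dialog(["html", "2"]):
    print("C:", line)
```

Replaying the inputs "html" and "2" reproduces the end of the call above: the first input matches neither grammar, and the key press 2 selects ul and triggers the submit.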

