Tuesday, 8 January 2013

SL4A and Bing Human Language Conversation

Just before Christmas, my English colleague told me how he likes to conduct instant messaging sessions with my Italian colleague in Italian.  My English colleague doesn't speak Italian; he just uses the Babelfish site to hold the conversation.

So that's fine for written translation, but what about spoken translation?  There are apps available to do this but I thought it would be fun to write my own (and you learn more by doing than by simply using).  To do this I used my all-time favourite scripting capability for Android: Python using SL4A.

The plan was to have something that would:
  1. Listen to my voice and turn it into (English) text, i.e. speech-to-text.
  2. Translate the English into another language.
  3. Do text-to-speech on the resulting language.

Speech-to-text is very easy in SL4A.  Here's a code fragment:

import android

#Set up our droid object
droid = android.Android()


#Get the speech
speech = droid.recognizeSpeech("Talk Now",None,None)
print "You said: " + speech[1]


This just pops up a dialogue that prompts you to speak.  When it detects a pause it goes away (presumably to Google's servers) and does the speech-to-text conversion.  The script then prints the result to screen.  It's pretty accurate and can even handle whole sentences.
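The fragment above reads the recognised text with speech[1].  As far as I can tell, SL4A's android.py hands every call back as a small named tuple with id, result and error fields, so speech[1] is the same thing as speech.result.  Here's a minimal sketch of a helper that copes with an empty or failed recognition.  The Result definition below is my stand-in for the SL4A one (an assumption, so the sketch runs off-device) and spoken_text is a made-up helper name, not part of SL4A:

```python
import collections

# Assumption: mirrors the (id, result, error) tuple that SL4A's android.py returns.
Result = collections.namedtuple('Result', 'id,result,error')


def spoken_text(speech):
    """Return the recognised text, or None if nothing usable came back."""
    if speech is None or speech.error is not None:
        return None
    text = speech.result
    if not text or not text.strip():
        return None
    return text.strip()


# Stand-in values; on a device these would come from droid.recognizeSpeech(...)
print(spoken_text(Result(1, "hello world", None)))
print(spoken_text(Result(2, None, None)))
```

On the handset you would pass the return value of droid.recognizeSpeech("Talk Now",None,None) straight in and skip the translation step whenever you get None back.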

Text-to-speech is pretty easy as well.  Assuming you've imported Android and created an Android object (as above), it's simply:

droid.ttsSpeak(TextToSpeak)

So easy! 

To get the text-to-speech part to work I did need to change some settings on my Android device.  Specifically, on my HTC Desire HD I needed to go to Menu - Settings - Voice input & output settings - Text-to-speech settings and:

1) Install voice data (a quick download from the Play Store), and

2) Set the language - French (France) for my tinkering.

Both of these things used to be premium products and those chaps from Google give them away for free!

The trickier part was doing the translation from English to French.  However, as I've learnt through my tinkering, there's an API for pretty much everything these days...  A quick Google search led me to the Bing translation API.  This provides HTTP REST, AJAX and SOAP interfaces to perform a range of tasks.

Using it is relatively simple.  You:

1) Register a developer account, register a translation application and get a client ID and client secret.

2) Make an HTTP POST call to get a temporary access token (it lasts 10 minutes).

3) Make an HTTP GET call to translate your chosen text.
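To make steps 2 and 3 a bit more concrete, here's a rough sketch that only builds the two requests and sends nothing over the network.  The URLs and parameter names are the ones used in the full listing further down; the function names build_token_body and build_translate_request are my own inventions for illustration.  The import has a fallback so it runs on Python 2 (as on SL4A) or Python 3:

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode        # Python 2, as used by SL4A

# Endpoints as they appear in the full listing.
TOKEN_URL = 'https://datamarket.accesscontrol.windows.net/v2/OAuth2-13'
TRANSLATE_URL = 'http://api.microsofttranslator.com/v2/Http.svc/Translate'


def build_token_body(client_id, client_secret):
    """Form-encoded body to POST to TOKEN_URL (step 2)."""
    return urlencode([
        ('client_id', client_id),
        ('client_secret', client_secret),
        ('grant_type', 'client_credentials'),
        ('scope', 'http://api.microsofttranslator.com'),
    ])


def build_translate_request(access_token, text, to_lang, from_lang=None):
    """URL and headers for the GET call that does the translation (step 3)."""
    params = [('text', text), ('to', to_lang)]
    if from_lang:
        params.append(('from', from_lang))
    url = TRANSLATE_URL + '?' + urlencode(params)
    headers = {'Authorization': 'Bearer ' + access_token}
    return url, headers


url, headers = build_translate_request('example-token', 'hello world', 'fr', 'en')
print(url)
print(headers['Authorization'])
```

The token travels in the Authorization header rather than in the URL, which is why step 2 has to happen before step 3.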

As luck would have it, this excellent blog (http://denis.papathanasiou.org/?p=948) details the process you go through to set up and use the API and has Python code available for you to re-use.  All credit to the blog author, Denis Papathanasiou.
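One wrinkle worth thinking about is the 10 minute token lifetime.  My script fetches a token once and then loops forever, so a long session would eventually be using a stale token.  Here's a minimal sketch of one way round that: a little cache that fetches a fresh token shortly before the old one runs out.  TokenCache, the 30 second safety margin and the fake fetch function in the demo are all my own assumptions, not part of the API or of Denis's code:

```python
import time

TOKEN_LIFETIME_SECONDS = 10 * 60   # tokens last 10 minutes
SAFETY_MARGIN_SECONDS = 30         # assumption: refresh a little early


class TokenCache(object):
    """Hand out a cached token, fetching a new one shortly before expiry."""

    def __init__(self, fetch_token, clock=time.time):
        self._fetch_token = fetch_token   # callable that returns a token string
        self._clock = clock               # injectable so it can be tried off-device
        self._token = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        max_age = TOKEN_LIFETIME_SECONDS - SAFETY_MARGIN_SECONDS
        expired = self._fetched_at is None or now - self._fetched_at >= max_age
        if self._token is None or expired:
            self._token = self._fetch_token()
            self._fetched_at = now
        return self._token


# Demo with a fake clock and a fake fetch function (no network involved).
fake_now = [0]
fetch_count = [0]

def fake_fetch():
    fetch_count[0] += 1
    return 'token-%d' % fetch_count[0]

cache = TokenCache(fake_fetch, clock=lambda: fake_now[0])
print(cache.get())      # first call fetches
fake_now[0] = 300
print(cache.get())      # five minutes in, still cached
fake_now[0] = 600
print(cache.get())      # past the lifetime, fetches again
```

In the real script the fetch function would be something like lambda: get_access_token(MY_CLIENT_ID, MY_CLIENT_SECRET), and the loop would ask the cache for a token before each call to translate.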

Here's a screenshot.  You can see how it's detected what I've spoken and translated it into French.  Of course you can't hear the spoken response but trust me, it works a treat.  The nasty error message at the end just comes from the inelegant way I ended the script in order to remove the "Speak Now" dialogue that obscured the text:


Full code listing is below.  To get it to work for you simply edit the MY_CLIENT_ID and MY_CLIENT_SECRET values to match yours.  To change languages simply edit the line TheResponse = translate(token, speech[1], 'fr', 'en') and also change the text-to-speech language in the handset settings menu.  Inelegant I know but hey, this is tinkering!


# SL4A Demos Transcribe Speech
# http://blog.matthewashrafi.com/
# http://denis.papathanasiou.org/?p=948

#Language translation and suchlike

#Secret keys.  Enter your own here.
MY_CLIENT_ID = ""
MY_CLIENT_SECRET = ""

#!/usr/bin/python

"""

msmt.py

Functions to access the Microsoft Translator API HTTP Interface, using python's urllib/urllib2 libraries



"""



import urllib, urllib2
import json
import android


from datetime import datetime



def datestring (display_format="%a, %d %b %Y %H:%M:%S", datetime_object=None):
    """Convert the datetime.date object (defaults to now, in utc) into a string, in the given display format"""
    if datetime_object is None:
        datetime_object = datetime.utcnow()
    return datetime.strftime(datetime_object, display_format)



def get_access_token (client_id, client_secret):
    """Make an HTTP POST request to the token service, and return the access_token,
    as described in number 3, here: http://msdn.microsoft.com/en-us/library/hh454949.aspx
    """

    data = urllib.urlencode({
            'client_id' : client_id,
            'client_secret' : client_secret,
            'grant_type' : 'client_credentials',
            'scope' : 'http://api.microsofttranslator.com'
            })

    try:
        request = urllib2.Request('https://datamarket.accesscontrol.windows.net/v2/OAuth2-13')
        #Attaching data makes urllib2 send this as a POST rather than a GET
        request.add_data(data)

        response = urllib2.urlopen(request)
        response_data = json.loads(response.read())

        #Falls through (returning None) if the server sent no token
        if 'access_token' in response_data:
            return response_data['access_token']

    except urllib2.URLError, e:
        if hasattr(e, 'reason'):
            print datestring(), 'Could not connect to the server:', e.reason
        elif hasattr(e, 'code'):
            print datestring(), 'Server error: ', e.code
    except TypeError:
        print datestring(), 'Bad data from server'



supported_languages = { # as defined here: http://msdn.microsoft.com/en-us/library/hh456380.aspx
    'ar' : 'Arabic',
    'bg' : 'Bulgarian',
    'ca' : 'Catalan',
    'zh-CHS' : 'Chinese (Simplified)',
    'zh-CHT' : 'Chinese (Traditional)',
    'cs' : 'Czech',
    'da' : 'Danish',
    'nl' : 'Dutch',
    'en' : 'English',
    'et' : 'Estonian',
    'fi' : 'Finnish',
    'fr' : 'French',
    'de' : 'German',
    'el' : 'Greek',
    'ht' : 'Haitian Creole',
    'he' : 'Hebrew',
    'hi' : 'Hindi',
    'hu' : 'Hungarian',
    'id' : 'Indonesian',
    'it' : 'Italian',
    'ja' : 'Japanese',
    'ko' : 'Korean',
    'lv' : 'Latvian',
    'lt' : 'Lithuanian',
    'mww' : 'Hmong Daw',
    'no' : 'Norwegian',
    'pl' : 'Polish',
    'pt' : 'Portuguese',
    'ro' : 'Romanian',
    'ru' : 'Russian',
    'sk' : 'Slovak',
    'sl' : 'Slovenian',
    'es' : 'Spanish',
    'sv' : 'Swedish',
    'th' : 'Thai',
    'tr' : 'Turkish',
    'uk' : 'Ukrainian',
    'vi' : 'Vietnamese',
}



def print_supported_languages ():
    """Display the list of supported language codes and the descriptions as a single string
    (used when a call to translate requests an unsupported code)"""

    codes = []
    for k,v in supported_languages.items():
        codes.append('\t'.join([k, '=', v]))
    return '\n'.join(codes)



def to_bytestring (s):
    """Convert the given unicode string to a bytestring, using utf-8 encoding,
    unless it's already a bytestring"""

    if s:
        if isinstance(s, str):
            return s
        else:
            return s.encode('utf-8')



def translate (access_token, text, to_lang, from_lang=None):
    """Use the HTTP Interface to translate text, as described here:
    http://msdn.microsoft.com/en-us/library/ff512387.aspx
    and return an xml string if successful
    """

    if not access_token:
        print 'Sorry, the access token is invalid'
    else:
        if to_lang not in supported_languages:
            print 'Sorry, the API cannot translate to', to_lang
            print 'Please use one of these instead:'
            print print_supported_languages()
        else:
            data = { 'text' : to_bytestring(text), 'to' : to_lang }

            if from_lang:
                if from_lang not in supported_languages:
                    print 'Sorry, the API cannot translate from', from_lang
                    print 'Please use one of these instead:'
                    print print_supported_languages()
                    return
                else:
                    data['from'] = from_lang

            try:
                request = urllib2.Request('http://api.microsofttranslator.com/v2/Http.svc/Translate?'+urllib.urlencode(data))
                request.add_header('Authorization', 'Bearer '+access_token)

                response = urllib2.urlopen(request)
                return response.read()

            except urllib2.URLError, e:
                if hasattr(e, 'reason'):
                    print datestring(), 'Could not connect to the server:', e.reason
                elif hasattr(e, 'code'):
                    print datestring(), 'Server error: ', e.code

#Gets the text from the XML response from MSFT 
def GetTextFromXML (InXML):
  #Here be an example
  #The response was: <string xmlns="http://schemas.microsoft.com/2003/10/Serialization/">grue</string>
  #The '>' and subsequent '<' can be used to denote the string
  
  #Get the position of the first '>'
  BracketPos = InXML.find('>')

  #Now get everything from that point
  MyStr = InXML[BracketPos+1:len(InXML)-1]
  #print "String so far " + MyStr

  #Now get the second '<'
  BracketPos = MyStr.find('<')

  #And get everything before that
  MyStr = MyStr[0:BracketPos]

  return MyStr


###The main body of code


#Set up our droid object
droid = android.Android()

#Get the token to use for the loop
token = get_access_token(MY_CLIENT_ID, MY_CLIENT_SECRET)
#print "Token: " + token

#Print some funky stuff
print "###################################################"
print "# SL4A Demos Transcribe Speech                    #"
print "# From http://blog.matthewashrafi.com/            #"
print "# Also from http://denis.papathanasiou.org/?p=948 #" 
print "###################################################"

while True:
  
  #Get the speech
  speech = droid.recognizeSpeech("Talk Now",None,None)
  print "You said: " + speech[1]
  
  #Call a def to get a response 
  TheResponse = translate(token, speech[1], 'fr', 'en')
  print "The response was: " + TheResponse

  #Now extract from the XML...
  ExtractedResponse = GetTextFromXML(TheResponse)
  print "Extracted Response: " + ExtractedResponse

  #Do the text to speech bit
  droid.ttsSpeak(ExtractedResponse)
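A closing thought on GetTextFromXML above.  It finds the translated text by hunting for angle brackets, which works for the simple responses I saw but would leave escaped characters such as &amp; untouched.  Here's a minimal sketch of an alternative using the standard library's xml.etree.ElementTree, which does the unescaping for you.  The function name get_text_from_xml and the second sample string are made up for illustration; the first sample is the response quoted in the listing:

```python
import xml.etree.ElementTree as ET


def get_text_from_xml(xml_string):
    """Return the text inside the root <string> element, or '' if it is empty."""
    root = ET.fromstring(xml_string)
    return root.text or ''


# The response quoted in the listing, plus a made-up one with an escaped ampersand.
sample = '<string xmlns="http://schemas.microsoft.com/2003/10/Serialization/">grue</string>'
escaped = '<string xmlns="http://schemas.microsoft.com/2003/10/Serialization/">fish &amp; chips</string>'

print(get_text_from_xml(sample))
print(get_text_from_xml(escaped))
```

I haven't swapped this in to the script on my handset, so treat it as a starting point rather than a tested replacement.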