Wednesday, 29 March 2017

Giving Alexa a new Sense - Vision! (Using Microsoft Cognitive APIs)

Amazon Alexa is just awesome so I thought it would be fun to let her "see".  Here's a video of this in action:



...and here's a diagram of the "architecture" for this solution:



The clever bit is the Microsoft Cognitive API so let's look at that first!  You can get a description of the APIs here and sign up for a free account.  To give Alexa "vision" I decided to use the Computer Vision API, which takes an image URL or an uploaded image, analyses it and provides a description.

Using the Microsoft Cognitive API developer console, I analysed the image of a famous person shown below and requested a "Description":



...and within the response JSON I got:

"captions": [ { "text": "Elizabeth II wearing a hat and glasses", "confidence": 0.28962254803103227 } ]

...now that's quite some "hat" she's wearing there (!) but it's a pretty good description.
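If you'd rather try the same call programmatically than through the console, here's a minimal sketch.  It assumes the westus region and the v1.0 "describe" endpoint (the same ones used in the script later in this post), plus a placeholder subscription key and a placeholder image URL:

import requests

#Assumptions: westus region, v1.0 "describe" endpoint, placeholder key and image URL
url = "https://westus.api.cognitive.microsoft.com/vision/v1.0/describe"
headers = {
    "Ocp-Apim-Subscription-Key": "your key",   #paste your primary key here
    "Content-Type": "application/json"         #JSON body because we pass an image URL
}
body = {"url": "https://example.com/some-image.jpg"}

response = requests.post(url, json=body, headers=headers)
print(response.json()["description"]["captions"][0]["text"])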

OK - So here's a step-by-step guide as to how I put it all together.

Step 1 - An Apache Webserver with Python Scripts
I needed AWS Lambda to be able to trigger a picture to be taken and a Cognitive API call to be made so I decided to run this all from a Raspberry Pi + camera in my home.

I already have an Apache webserver running on my Raspberry Pi 2 and there are plenty of descriptions on the internet of how to set one up (like this one).

I like a bit of Python so I decided to use Python scripts to carry out the various tasks.  Enabling Python for cgi-bin is very easy; here's an example of how to do it.

So to test it I created the following script:

#!/usr/bin/env python
print "Content-type: text/html\n\n"
print "<h1>Hello World</h1>"

...and saved it as /usr/lib/cgi-bin/hello.py.  I then tested it by browsing to http://192.168.1.3/cgi-bin/hello.py (where 192.168.1.3 is the IP address on my home LAN that my Pi is sitting on).  I saw this:



Step 2 - cgi-bin Python Script to Take a Picture
The first script I needed was one to trigger my Pi to take a picture with the Raspberry Pi camera.  (More here on setting up and using the camera).

After some trial and error I ended up with this script:

#!/usr/bin/env python
from subprocess import call
import cgi

def picture(PicName):
  #Use raspistill to capture a photo straight into the Apache web root
  call("/usr/bin/raspistill -o /var/www/html/" + PicName + " -t 1s -w 720 -h 480", shell=True)

#Get arguments from the query string (e.g. name=hello.jpg)
ArgStr = ""
arguments = cgi.FieldStorage()
for i in arguments.keys():
  ArgStr = ArgStr + arguments[i].value

#Call the function to take a picture
picture(ArgStr)

print "Content-type: application/json\n\n"

#Respond with a simple (double-quoted, so valid) JSON string
print '{"Response": "OK", "Arguments": "' + ArgStr + '"}'

So what does this do?  The ArgStr and for i in arguments.keys() etc. code section makes the Python script parse the URL entered by the user and extract any query strings.  The query string can be used to specify the file name of the photo that is taken.  So for example this URL:

http://192.168.1.3/cgi-bin/take_picture_v1.py?name=hello.jpg

...will mean a picture is taken and saved as hello.jpg.

The "def Picture" function then uses the "call" module to run a command line command to take a picture with the Raspberry pi camera and save it in the root directory for the Apache 2 webserver.

Finally the script responds with a simple JSON string that can be rendered in a browser or used by AWS Lambda.  The response looks like this in a browser:
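For example, assuming the script was called with name=hello.jpg, the JSON body shown in the browser is simply:

{"Response": "OK", "Arguments": "hello.jpg"}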


Step 3 - Microsoft Cognitive API for Image Analysis
So now we've got a picture, we need to analyse it.  For this task I leaned heavily on the code published here, so all plaudits and credit to chsienki and none to me.  I used most of the code but removed the lines that overlaid the results on top of the image and showed it on screen.

#!/usr/bin/env python
import time
from subprocess import call
import requests
import cgi

# Variables
#_url = 'https://westus.api.cognitive.microsoft.com/vision/v1.0/analyze'
_url = 'https://westus.api.cognitive.microsoft.com/vision/v1.0/describe'

_key = "your key"   #Here you have to paste your primary key
_maxNumRetries = 10

#Does the actual results request
def processRequest( json, data, headers, params ):

    """
    Helper function to process the request to Project Oxford

    Parameters:
    json: Used when processing images from its URL. See API Documentation
    data: Used when processing image read from disk. See API Documentation
    headers: Used to pass the key information and the data type request
    params: Used to pass any query-string parameters for the API call
    """

    retries = 0
    result = None

    while True:
        print("This is the URL: " + _url)
        response = requests.request( 'post', _url, json = json, data = data, headers = headers, params = params )

        if response.status_code == 429:

            print( "Message: %s" % ( response.json()['error']['message'] ) )

            if retries <= _maxNumRetries:
                time.sleep(1)
                retries += 1
                continue
            else:
                print( 'Error: failed after retrying!' )
                break

        elif response.status_code == 200 or response.status_code == 201:

            if 'content-length' in response.headers and int(response.headers['content-length']) == 0:
                result = None
            elif 'content-type' in response.headers and isinstance(response.headers['content-type'], str):
                if 'application/json' in response.headers['content-type'].lower():
                    result = response.json() if response.content else None
                elif 'image' in response.headers['content-type'].lower():
                    result = response.content
        else:
            print( "Error code: %d" % ( response.status_code ) )
            #print( "Message: %s" % ( response.json()['error']['message'] ) )
            print (str(response))

        break

    return result

#Get arguments from the query string sent
ArgStr = ""
arguments = cgi.FieldStorage()
for i in arguments.keys():
    ArgStr = ArgStr + arguments[i].value

# Load raw image file into memory
pathToFileInDisk = r'/var/www/html/' + ArgStr

with open( pathToFileInDisk, 'rb' ) as f:
    data = f.read()

# Computer Vision parameters
params = { 'visualFeatures' : 'Color,Categories'}

headers = dict()
headers['Ocp-Apim-Subscription-Key'] = _key
headers['Content-Type'] = 'application/octet-stream'

json = None

result = processRequest( json, data, headers, params )

#Turn to a string
JSONStr = str(result)

#Change single to double quotes
JSONStr = JSONStr.replace(chr(39),chr(34))

#Get rid of preceding u in string
JSONStr = JSONStr.replace("u"+chr(34),chr(34))


if result is not None:
  print "content-type: application/json\n\n"

  print JSONStr
else:
  #Still return a header and valid JSON if the API gave us nothing back
  print "content-type: application/json\n\n"

  print '{"error": "no result returned from the Computer Vision API"}'

So here I take arguments as before to know which file to process, read the file and then use the API to get a description of it.  I had to play a bit with the response to get it into a format that could be parsed by the Python json module: this is where I turn single quotes into double quotes and get rid of the preceding "u" characters on Unicode strings.  There's maybe a more Pythonic way to do this; please let me know if you know one (see the sketch below for one option).
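One option, sketched below (untested against the script above), is to skip the string juggling entirely: processRequest() already returns a Python dict, so the json module can re-serialise it as valid, double-quoted JSON:

import json

if result is not None:
    print "content-type: application/json\n\n"
    #result is already a dict, so json.dumps() emits proper JSON with double
    #quotes and no u'' prefixes
    print json.dumps(result)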

When you call the script via a browser you get:


Looking at the JSON structure in more detail you can see a "description" element which is how the Microsoft Cognitive API has described the image.
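Abridged, and based on the earlier example of the Queen, that "description" element looks something like this:

"description": {
  "captions": [
    { "text": "Elizabeth II wearing a hat and glasses", "confidence": 0.2896 }
  ]
}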

Step 4 - Alexa Skills Kit Configuration and Lambda Development
The next step is to configure the Alexa Skills kit and write the associated AWS Lambda function.  I've covered how to do this previously (like here) so won't cover all that again here.

The invocation name is "yourself"; hence you can say "Alexa, ask yourself...".

There is only one utterance which is:
AlexaSeeIntent what can you see

...so what you actually say to Alexa is "Alexa, ask yourself what can you see".  

This then maps to the intent structure below:

{
  "intents": [
    {
      "intent": "AlexaSeeIntent"
    },
    {
      "intent": "AMAZON.HelpIntent"
    },
    {
      "intent": "AMAZON.StopIntent"
    },
    {
      "intent": "AMAZON.CancelIntent"
    }
  ]
}

Here we have a boilerplate intent structure with the addition of AlexaSeeIntent, which is what will be passed to the AWS Lambda function.

I won't list the whole AWS Lambda function below, but here are the relevant bits:

#Some constants
TakePictureURL = "http://<URL or IP Address>/cgi-bin/take_picture_v1.py?name=hello.jpg"
DescribePictureURL = "http://<URL or IP Address>/cgi-bin/picture3.py?name=hello.jpg"

Then the main Lambda function to handle the AlexaSeeIntent:

def handle_see(intent, session):
  session_attributes = {}
  reprompt_text = None
  
  #Call the Python script that takes the picture
  APIResponse = urllib2.urlopen(TakePictureURL).read()
  
  #Call the Python script that analyses the picture.  Strip newlines
  APIResponse = urllib2.urlopen(DescribePictureURL).read().strip()
  
  #Turn the response into a JSON object we can parse
  JSONData = json.loads(APIResponse)
  
  PicDescription = str(JSONData["description"]["captions"][0]["text"])
  
  speech_output = PicDescription
  should_end_session = True
  
  # Setting reprompt_text to None signifies that we do not want to reprompt
  # the user. If the user does not respond or says something that is not
  # understood, the session will end.
  return build_response(session_attributes, build_speechlet_response(
        intent['name'], speech_output, reprompt_text, should_end_session))

So super, super simple.  Call the API to take the picture, call the API to analyse it, pick out the description and read it out.
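For completeness, build_response and build_speechlet_response are the usual helper functions from the Amazon Alexa Python samples; a rough sketch (my paraphrase, not necessarily the exact code used in this skill) is:

def build_speechlet_response(title, output, reprompt_text, should_end_session):
    #What Alexa should say, what to show on the card, and whether to end the session
    return {
        'outputSpeech': {'type': 'PlainText', 'text': output},
        'card': {'type': 'Simple', 'title': title, 'content': output},
        'reprompt': {'outputSpeech': {'type': 'PlainText', 'text': reprompt_text}},
        'shouldEndSession': should_end_session
    }

def build_response(session_attributes, speechlet_response):
    #Wrap the speechlet in the top-level response envelope expected by Alexa
    return {
        'version': '1.0',
        'sessionAttributes': session_attributes,
        'response': speechlet_response
    }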

Here's the image that was analysed for the Teddy Bear video:



Here's another example:


The image being:


...and another:


...based upon this image:
  

Now to think about what other senses I can give to Alexa...


Using IBM Bluemix Watson APIs to Optimise my CV (Resume)

I kept seeing adverts for IBM Bluemix popping up on my social media feeds and online adverts so I thought I'd take a look and see what it was all about.  You can sign up for an account and get 30 days' access for free, so that was all good for a home hobbyist like me!

So what is Bluemix?  Here's a snippet from the IBM Bluemix website:

The IBM Bluemix cloud platform helps you solve real problems and drive business value with applications, infrastructure and services.


So it does what it says on the tin really.  A bunch of cloud-based capabilities that let you do interesting stuff!  So what interesting stuff could I do with it?  After creating an account (free for 30 days, no payment card required) and browsing the Bluemix catalogue, my eye was drawn to the Watson APIs.  Watson was made famous by winning the US gameshow Jeopardy! and there are a bunch of exciting artificial intelligence capabilities like Natural Language Understanding and Personality Insights you can use.

As a starting point I decided to play with the "Tone Analyzer" API, the description of which is as follows:

People show various tones, such as joy, sadness, anger, and agreeableness, in daily communications. Such tones can impact the effectiveness of communication in different contexts. Tone Analyzer leverages cognitive linguistic analysis to identify a variety of tones at both the sentence and document level. This insight can then be used to refine and improve communications.

At the moment I'm updating my CV (resume for you good people in the USA) and I'm told that when faced with an avalanche of CVs, recruiters will sometimes only read the very first "personal profile" section of the document to make their initial decision.  Additionally, recruiters are increasingly using AI tools to filter CVs.  I thought that if I could use the Tone Analyzer to optimise that first section of my CV then this would be a good use of Bluemix.

To use the API you simply click "Create" and you're given some credentials to access it.  IBM provides a lot of guidance on how to use the API and offers SDKs for languages like Python and Node.js.  I decided to use curl as all I wanted to do was throw some text at the API and see the result.

So here's a curl command to access the API:

curl -v -u "username":"password" -H "Content-Type: text/plain" -d "Some text" "https://gateway.watsonplatform.net/tone-analyzer/api/v3/tone?version=2016-05-19"

(Replace username and password with the ones you are provided by Bluemix).

The response is a JSON structure that looks like this (abridged):
{"document_tone":{"tone_categories":[{"tones":[{"score":0.135461,"tone_id":"anger","tone_name":"Anger"},{"score":0.045643,"tone_id":"disgust","tone_name":"Disgust"},{"score":0.71908,"tone_id":"fear","tone_name":"Fear"},{"score":0.232038,"tone_id":"joy","tone_name":"Joy"},{"score":0.524529,"tone_id":"sadness","tone_name":"Sadness"}],"category_id":"emotion_tone","category_name":"Emotion Tone"},

The structure provides numeric values (range 0 to 1) based upon a set of analysis criteria that IBM defines as follows:

It detects three types of tones, including emotion (anger, disgust, fear, joy and sadness), social propensities (openness, conscientiousness, extroversion, agreeableness, and emotional range), and language styles (analytical, confident and tentative) from text.

Numeric values are provided for the whole piece of text you provide, plus the text is broken down into sentences and each sentence is analysed individually.  My grand plan is to pick out those attributes that I deem important for the type of job I would like to get and then "tune" the text from my CV to improve those attributes.
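As a quick sanity check before wiring anything up to Plotly, here's a minimal sketch that pulls the document-level scores out of a saved response and sorts them; tone.json is an assumed file name holding a response like the abridged one above:

import json

#Load a saved Tone Analyzer response (e.g. the output of the curl command above)
with open("tone.json") as f:
    tone = json.load(f)

#Collect every tone name and score across all of the categories
scores = {}
for category in tone["document_tone"]["tone_categories"]:
    for t in category["tones"]:
        scores[t["tone_name"]] = t["score"]

#Print the tones from strongest to weakest
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print("%-20s %.3f" % (name, score))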

First I need to be able to take the JSON API response and turn it into something I could read and interpret.  I decided to use Python to analyse the JSON (because Python rocks) and use an online charting capability called Plotly to visualise the data.  I used Plotly as it has an API that I thought would be fun to learn about.

Plotly provides a REST API that you can HTTP POST to and have Plotly render a chart in your online account that you can, for example, reference in another website. Plotly provide online descriptions of the REST API here but in simple terms you specify the data to plot and some formatting parameters and Plotly does the rest for you.

Here's a example POST message body:

un=chris& key=kdfa3d& origin=plot& platform=lisp& args=[[0, 1, 2], [3, 4, 5], [1, 2, 3], [6, 6, 5]]& kwargs={"filename": "plot from api", "fileopt": "overwrite", "style": { "type": "bar" }, "traces": [1], "layout": { "title": "experimental data" }, "world_readable": true }

Here's some Python I wrote to extract data from the JSON structure and use the Plotly API, (replace Watson API response and credentials with your values):

import json
import pprint
import urllib.request
import urllib.error

#Constants
#Baseline response  
APIResponse = 'Bluemix JSON Response Here'

PlotlyURL = "https://plot.ly/clientresp"
UserName = "username"
APIKey = "YourKey"

#Example simple arguments Strings
#NArgsString = '[["One", "Two", "Three"], [0.98, 0.87, 0.87]]'


#This is an arguments string (kwargs) for plotly formatting
KwargsJSON = {"filename": "plot from api",
        "fileopt": "overwrite",
        "style": {"type": "bar"},
    "traces": [0],
    "layout": {
        "title": "Less Anger!"
    },
    "world_readable": True
}

#First we extract all the data from the JSON from Watson.  Can pretty print if you want
ToneJSON = json.loads(APIResponse)
#pprint.pprint(ToneJSON)

#Initialise the sub components of the plotly argument string
XList = []
YList = []

#Iterate through the JSON structure picking up attributes and values
for MyTone in ToneJSON["document_tone"]["tone_categories"]:
  for TheTones in MyTone["tones"]: 
    #Build the x and y Python lists that we'll use for the plotly argument
    XList.append(str(TheTones["tone_name"]))
    YList.append(float(TheTones["score"]))
    
#This is the arguments string for plotly.  We need to use the .join method to make sure the arguments string is properly formatted for the x axis values
NArgsString = "[[" + ','.join('"{0}"'.format(w) for w in XList) + "], " + str(YList) + "]"

#Make sure we have " not ' around the JSON elements
NArgsString = NArgsString.replace(chr(39),chr(34))

#Form the body for the HTTP POST
KwargsString = json.dumps(KwargsJSON)
PostBody = "un=" + UserName + "&" + "key=" + APIKey + "&" + "origin=plot&platform=lisp&args=" + NArgsString + "&" + "kwargs=" + KwargsString

#Encode the whole post body
PostBody = PostBody.encode('utf-8')

#Form the request
MyRequest = urllib.request.Request(PlotlyURL, data=PostBody)

try:
  #Execute the request
  wp = urllib.request.urlopen(MyRequest)
  
  #Read the response and print it for the user
  TheResponse = wp.read()
  print(str(TheResponse))

#Handle pesky errors
except urllib.error.HTTPError as e:
  print("HTTP Error caught when making request " + str(e.code) + "\n")
except urllib.error.URLError as e:
  print("URL Error caught when making request\n")

So we're ready to analyse some text.  First I used some made-up text to test how good the Tone Analyzer API is.  Here's the text:

I am excellent at everything.  There is nothing I can not do.  Throw a challenge at me and I will succeed.  I have beaten every target ever set for me.  Employ me and you will employ a winner.

...and here's the resulting Plotly chart:


That seems about right, in particular the sky-high confidence score!

Here's my current CV profile statement:

A Solution Architect with a wide range of knowledge and experience in the Telecommunications and IT industry.  Has significant experience of leading cross-functional teams to deliver innovative solutions spanning IT, Network and TV systems.  A strong self-starter with proven analytical and problem solving skills.  Able to learn about new technologies quickly and apply this knowledge to design tasks.  Well-developed communication and presentation skills, both written and oral.

...which when analysed by Watson and charted by Plotly yields this:


So I would say that for the type of job I want I need to:

  • Reduce the anger and sadness
  • Maintain analytical
  • Have some confidence!
  • Improve conscientiousness

...but as another test of the API I analysed this version of my profile (the addition being the final sentence):

A Solution Architect with a wide range of knowledge and experience in the Telecommunications and IT industry.  Has significant experience of leading cross-functional teams to deliver innovative solutions spanning IT, Network and TV systems.  A strong self-starter with proven analytical and problem solving skills.  Able to learn about new technologies quickly and apply this knowledge to design tasks.  Well-developed communication and presentation skills, both written and oral.  I’m so afraid that if I don’t get this CV right then I won’t be employed by anyone; I’m really really scared, worried and frightened about this!

Watson and Plotly yield this:


There we go!  Fear increases from negligible to ~0.7 so there's a definite correlation between text and the analysis.

Back to business.  Here's a modification to my profile to try and boost confidence:

A Solution Architect with a track record of successful delivery in the Telecommunications and IT industry.  Has significant experience of leading cross-functional teams to deliver innovative solutions spanning IT, Network and TV systems.  A strong self-starter with proven analytical and problem solving skills.  In a fast paced, ever changing technology world, is confident in his abilities to quickly learn and apply new skills.  Well-developed communication and presentation skills, both written and oral.

Watson and Plotly say:


Bingo!

Now to up the conscientiousness.  I actually had to play with the language a lot and even then I only managed to improve it by 0.1.  Here's what I wrote:

A Solution Architect with a track record of successful delivery in the Telecommunications and IT industry.  Has significant experience of leading cross-functional teams to deliver innovative solutions spanning IT, Network and TV systems.  A strong self-starter with proven analytical and problem solving skills.  In a fast paced, ever changing technology world, is confident in his abilities to quickly learn and apply new skills.  A conscientious, reliable individual who always sets challenging goals, forms structured plans to achieve them and follows through until the job is complete.

Which results in:

Finally to drop the anger levels as anger is never a good look!  Here's what I wrote:

A Solution Architect with a track record of successful delivery in the Telecommunications and IT industry.  Has significant experience of leading cross-functional teams to deliver innovative solutions spanning IT, Network and TV systems.  A strong self-starter with proven analytical and problem solving skills who is never happier than when working with like-minded individuals to harmoniously collaborate and solve problems.  In a fast paced, ever changing technology world, is confident in his abilities to quickly learn and apply new skills.  A conscientious, reliable individual who always sets challenging goals, forms structured plans to achieve them and follows through until the job is complete.

The net result being:

So finally I used Plotly to compare the initial (baseline) analysis with the final text.  Here's the result:

So: less anger, more joy, less sadness, more confidence and more conscientiousness; all looking good here.  However, I've also dropped the analytical and openness scores, which isn't so good for the type of role I'd like, but I can live with that!  So now to use this for my real-life CV.  Wish me luck...