Wednesday 19 April 2017

Half Marathon Comparison Using Azure, Google Maps, Python, MongoDB and Javascript

This time last year I blogged about a half-marathon I had run where I paced it badly and slowed down massively at the end.  I did the same race this year and ran a faster time but more importantly paced it more consistently and so enjoyed the experience more.

The run was over the same course and the weather was similar so this provides a good opportunity to compare and contrast both years.  At a superficial level, as part of the results that are provided you get to see your split after every 5K.  Hence it was possible to compare the splits of last year with this year:

So simply put:
  • After 5K, in 2017 I was 38 seconds behind where I was in 2016.
  • After 10K I was 44 seconds behind and after 15K I was a massive 80 seconds behind.
  • However after 20K in 2017 I had turned this around and was 14 seconds ahead of 2016.
  • Then I ran the final 1.1K 27 seconds faster in 2017 than in 2016 to finish 41 seconds up.
Note that none of this was down to significantly better fitness; I just paced the run more sensibly in 2017.  (Put differently, I was a lot more stupid in 2016!)
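
The per-segment picture falls out of those cumulative gaps with a little arithmetic.  Here is a minimal sketch in Python, assuming the gaps listed above (negative means 2017 was behind 2016).  The names cumulative_gap and segment_gains are just mine for illustration.

```python
# Cumulative gap (seconds) of 2017 relative to 2016 at each checkpoint.
# Negative = 2017 behind, positive = 2017 ahead. Values taken from the list above.
cumulative_gap = [
    ("5K", -38),
    ("10K", -44),
    ("15K", -80),
    ("20K", 14),
    ("Finish", 41),
]


def segment_gains(gaps):
    """Return how many seconds 2017 gained (+) or lost (-) on each segment."""
    gains = []
    previous = 0
    for checkpoint, gap in gaps:
        gains.append((checkpoint, gap - previous))
        previous = gap
    return gains


if __name__ == "__main__":
    for checkpoint, gain in segment_gains(cumulative_gap):
        print("Up to %s: %+d seconds" % (checkpoint, gain))
```

Going from 80 seconds behind at 15K to 14 seconds ahead at 20K means 94 seconds were clawed back in that one 5K stretch.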

As a Geek I wanted to go further in this analysis so I thought it would be fun to visually compare 2016 versus 2017 on a map, i.e. see my 2016 self zoom past my 2017 self, then see my 2017 self catch up and pass 2016.  Having tinkered with AWS and Bluemix it was time to test drive a different cloud computing offering so I decided to take up Microsoft's kind offer of £150 of credit.

Here's the result.  The "6" marker is 2016, the "7" marker is 2017.



So you can see:
  • Me starting further up the road in 2017.
  • The 2016 me catching up and passing the 2017 me around the University.
  • 2016 me staying ahead for a long period of time.
  • 2017 me catching up and quickly passing 2016 me on the final straight stretch to the finish.

So a fascinating profile!

Here's a diagram of what I put together.  A full description and the code then follow.



The above diagram shows the following key steps:
  • Garmin Sports watch syncs with Garmin Connect (standard activity)
  • GPX files downloaded from Garmin Connect and uploaded to Azure Virtual Machine (covered in Step 1 below).
  • Python script to parse GPX files and load them in a MongoDB instance (Step 2)
  • Apache webserver and Python cgi-bin to extract data from the MongoDB instance and offer a simple API (Step 3)
  • HTML, CSS and Javascript to access API and present animated map markers using the Google Maps Javascript API (Step 4)
Step 1 - Getting an Azure Linux Virtual Machine
Microsoft Azure is very easy and intuitive to use.  I already had a Microsoft account for Outlook.com so just used this to go through the Azure free trial sign up process.  This gave me £150 worth of free credit on the Azure platform.

After quickly reviewing tutorials I requested a Linux Virtual Machine using the steps New - Compute - Ubuntu Server 16.04 LTS and then providing some basic configuration details.  Within roughly a minute the server had been set up and I could get details as to how to SSH onto the VM using PuTTY.  The size of the platform was Standard DS1 v2 (1 core, 3.5 GB memory) which was suitable for my needs.

A tile on the Azure dashboard gave me access to all manner of information and configuration options for the VM.  Example below:



Take a step back now - for an olde skool Technology guy such as myself I am still super impressed by cloud computing capabilities.  No massive forms to fill out, no tetchy administrators to haggle with, no IP networking to organise - just BOOM! and you've got a machine to play with.

The final part of this step was to use an FTP client (WinSCP) to upload the Garmin GPX files to the VM.

Step 2 - MongoDB, GPX File Parsing and Database Loading
The plan was to use the GPX files recorded by my Garmin sports watch in 2016 and 2017 to allow map markers to be animated.  So what's a GPX file?  Here's a definition:

GPX, or GPS Exchange Format, is an XML schema designed as a common GPS data format for software applications. It can be used to describe waypoints, tracks, and routes. The format is open and can be used without the need to pay license fees.

Here's the top section of one of my half-marathon GPX files:

<?xml version="1.0" encoding="UTF-8"?>
<gpx creator="Garmin Connect" version="1.1"
  xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/11.xsd"
  xmlns="http://www.topografix.com/GPX/1/1"
  xmlns:ns3="http://www.garmin.com/xmlschemas/TrackPointExtension/v1"
  xmlns:ns2="http://www.garmin.com/xmlschemas/GpxExtensions/v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <metadata>
    <link href="connect.garmin.com">
      <text>Garmin Connect</text>
    </link>
    <time>2016-04-03T09:19:19.000Z</time>
  </metadata>
  <trk>
    <name>Whitley Ward Running</name>
    <type>running</type>
    <trkseg>
      <trkpt lat="51.42623794265091419219970703125" lon="-0.992680527269840240478515625">
        <ele>45.40000152587890625</ele>
        <time>2016-04-03T09:19:19.000Z</time>
        <extensions>
          <ns3:TrackPointExtension>
            <ns3:hr>82</ns3:hr>
          </ns3:TrackPointExtension>
        </extensions>
      </trkpt>
      <trkpt lat="51.42622503452003002166748046875" lon="-0.9927202574908733367919921875">
        <ele>45.40000152587890625</ele>
        <time>2016-04-03T09:19:20.000Z</time>
        <extensions>
          <ns3:TrackPointExtension>
            <ns3:hr>82</ns3:hr>
          </ns3:TrackPointExtension>
        </extensions>
      </trkpt>

So some metadata and then a <trk> section with a <trkseg> subsection which is a container for a bunch of <trkpt> elements.  Within each of these you can see:

  • The position logged (latitude and longitude)
  • Elevation
  • Date and time
  • Heart rate 
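
Those four fields can be pulled out by element name rather than by position.  Here is a minimal sketch using Python's standard xml.etree.ElementTree, assuming the two namespace URIs shown in the file header above.  The function name parse_trackpoints and the cut-down sample document are purely illustrative.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the GPX header; the prefixes are local to this script.
NS = {
    "gpx": "http://www.topografix.com/GPX/1/1",
    "tpx": "http://www.garmin.com/xmlschemas/TrackPointExtension/v1",
}

# Cut-down sample with a single track point, for illustration only.
SAMPLE_GPX = """<?xml version="1.0" encoding="UTF-8"?>
<gpx creator="Garmin Connect" version="1.1"
  xmlns="http://www.topografix.com/GPX/1/1"
  xmlns:ns3="http://www.garmin.com/xmlschemas/TrackPointExtension/v1">
  <trk>
    <trkseg>
      <trkpt lat="51.42623794265091419219970703125" lon="-0.992680527269840240478515625">
        <ele>45.40000152587890625</ele>
        <time>2016-04-03T09:19:19.000Z</time>
        <extensions>
          <ns3:TrackPointExtension>
            <ns3:hr>82</ns3:hr>
          </ns3:TrackPointExtension>
        </extensions>
      </trkpt>
    </trkseg>
  </trk>
</gpx>"""


def parse_trackpoints(gpx_text):
    """Return a list of dicts, one per <trkpt>, looked up by element name."""
    root = ET.fromstring(gpx_text.encode("utf-8"))
    points = []
    for trkpt in root.findall("gpx:trk/gpx:trkseg/gpx:trkpt", NS):
        hr = trkpt.find("gpx:extensions/tpx:TrackPointExtension/tpx:hr", NS)
        points.append({
            "lat": float(trkpt.get("lat")),
            "lng": float(trkpt.get("lon")),
            "elevation": float(trkpt.find("gpx:ele", NS).text),
            "timestamp": trkpt.find("gpx:time", NS).text,
            # Heart rate is optional in GPX, so tolerate its absence
            "heart": int(hr.text) if hr is not None else None,
        })
    return points


if __name__ == "__main__":
    for point in parse_trackpoints(SAMPLE_GPX):
        print(point)
```

Looking elements up by name is a bit more typing than using indices, but it should keep working if Garmin ever adds or reorders elements in the file.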

I wanted to store all the data in a database and I chose to use MongoDB on Azure as I enjoyed using it for a Raspberry Pi project last year (and so had already cracked using Python to write to and read from the database).

Getting the database was super easy.  Within the Azure portal I did New - Databases - "Database as a Service for MongoDB", entered a few details and a minute or two later had a MongoDB instance.

Remembering that the structure of a document database is different to a relational database as follows:

Relational Database Term    Document Database Term
Database                    Database
Table                       Collection
Row                         Document

...I set up a database called "geekmongo" and created a collection within it called "Test".  Hence the task at hand was to parse the GPX files, create JSON documents and write them to the Test collection in the geekmongo database.

#Import statements
import xml.etree.ElementTree as ET
from datetime import datetime
from pymongo import MongoClient

#File to process
FileOne = '/home/map/rhm/activity_1109977624_2016.gpx'

#Database related - Got all these from "Connection String" area on Azure for the database instance
#Created the collection test myself manually on Azure
dBAddress = 'Your Connection String'

#Start parsing that XML
tree = ET.parse(FileOne)
root = tree.getroot()

#Set up for the mongo instance, the database then the collection
#Connect to the database
client = MongoClient(dBAddress)
db = client.geekmongo
collection = db.Test

#Get the first timestamp as we'll reference all subsequent ones to this in order to be able to calculate elapsed timestamp
FirstTimeStamp = root[1][2][0][1].text
#Turn it into a time object we can use
TStart = datetime.strptime(FirstTimeStamp[:-5],"%Y-%m-%dT%H:%M:%S")

#Loop through the <trkpt> elements picking out the values and writing each one to the database as a document
#Example is {"elapsed": 0, "lat":51.4566827,"lng":-0.9690389},
for myTrkpt in root[1][2]:
  #Calculate the elapsed time
  TimeNow = myTrkpt[1].text
  TNow = datetime.strptime(TimeNow[:-5],"%Y-%m-%dT%H:%M:%S")
  TimeElapsed = abs(TNow - TStart).seconds

  #Build a Python dictionary that we'll write to the MongoDB
  MongoDoc = {}
  MongoDoc["elapsed"] = TimeElapsed
  MongoDoc["lat"] =  myTrkpt.attrib.get('lat')
  MongoDoc["lng"] =  myTrkpt.attrib.get('lon')
  MongoDoc["elevation"] =  myTrkpt[0].text
  MongoDoc["timestamp"] =  TimeNow[:-5]
  MongoDoc["heart"] = myTrkpt[2][0][0].text
  MongoDoc["cadence"] = 0

  #Write the document to the Test collection
  collection.insert_one(MongoDoc)

print("Done")


So here I use the Python XML and pymongo modules to parse the GPX files and write to the database respectively.

With pymongo you create an object and then connect to the database using a "Connection String".  You get this from the Azure management console for the MongoDB instance under the "Connection String" settings area.  This string contains the database address and the credentials required to access it.  You then can create a collection object which you write documents to using the insert_one() method.
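
For a few thousand track points one insert_one() call per document is fine, but batching is an option.  Here is a hedged sketch, assuming a pymongo collection object like the one above.  insert_many() is a standard pymongo collection method, while load_in_batches and the batch size of 500 are my own illustrative choices.  The helper only needs an object with an insert_many() method, so a stand-in class lets it run without a database.

```python
def load_in_batches(collection, documents, batch_size=500):
    """Write documents with collection.insert_many() in chunks; return the count written."""
    written = 0
    batch = []
    for doc in documents:
        batch.append(doc)
        if len(batch) == batch_size:
            collection.insert_many(batch)
            written += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        collection.insert_many(batch)
        written += len(batch)
    return written


class FakeCollection(object):
    """Stand-in for a pymongo collection so the sketch runs without a database."""

    def __init__(self):
        self.stored = []

    def insert_many(self, docs):
        self.stored.extend(docs)


if __name__ == "__main__":
    fake = FakeCollection()
    docs = [{"elapsed": i, "lat": "51.4", "lng": "-0.99"} for i in range(1200)]
    print(load_in_batches(fake, docs))
```

Whether this is noticeably faster against the Azure service would need measuring; I have not tried it.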

Using the XML module you create an object called "root" and can use indices to access the different parts of the GPX structure.  So for example root[0] will be the first part of the GPX file.

The code then loops through the <trkpt> elements of the GPX file, picks out all the relevant data and then creates a Python dictionary which will be written to the database.  I also calculate an "elapsed" field which is the difference in seconds between the first <trkpt> element and the element in question.  I foresee this being useful later...
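
The "elapsed" calculation can be tried on its own.  This is a small sketch, assuming timestamps in the format shown in the GPX extract (ending in ".000Z"); the function names are illustrative and not part of the loader script.

```python
from datetime import datetime

GPX_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"


def parse_gpx_time(stamp):
    """Parse a stamp such as '2016-04-03T09:19:19.000Z', dropping the trailing '.000Z'."""
    return datetime.strptime(stamp[:-5], GPX_TIME_FORMAT)


def elapsed_seconds(first_stamp, stamp):
    """Whole seconds between the first track point and this one."""
    delta = parse_gpx_time(stamp) - parse_gpx_time(first_stamp)
    return int(delta.total_seconds())


if __name__ == "__main__":
    # The first two track points from the GPX extract are one second apart
    print(elapsed_seconds("2016-04-03T09:19:19.000Z", "2016-04-03T09:19:20.000Z"))
```

I have used total_seconds() here rather than the .seconds attribute because .seconds ignores whole days.  For a half marathon both give the same answer.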

It was interesting to look at the Azure console as I ran the scripts to parse the GPX files and write documents to the database.  Here's what was shown:

Here you can see peaks in "insert" requests as the data was being inserted.  The two peaks represent the two separate files being parsed and loaded.

Step 3 - A Web Server and an API
I wanted to create an API such that Javascript running within a browser could make an AJAX request to extract the data.  At some point I'll explore an Azure Web App for this sort of thing but for now I decided to use an Apache web server running on the Azure Linux VM and use a cgi-bin Python script to provide the functionality of the API.  I simply ran the sudo apt-get install apache2 command to install Apache and used this guide to get cgi-bin working for Python scripts.

To get the web server to work I had to do some configuration within the Azure console.  Specifically I had to configure a rule to enable HTTP traffic (port 80) on the platform.  To do this I selected the VM from the console, selected "Network Interfaces" then selected the Network Security Group.  I then configured the "HTTPRule" shown below:


To create the API I wrote the following Python script:



#!/usr/bin/env python

#Import statements
from pymongo import MongoClient
import cgi
import re
import cgitb

#Enable error logging
cgitb.enable()

#Database related - Got all these from "Connection String" area on Azure for the database instance
dBAddress = 'Your Connection String' 
CollectionID = 'Test'

#Get the query string parameters provided.  'name' field is the mongo name, 'value' field is the value
arguments = cgi.FieldStorage()
MongoName = arguments['name'].value
MongoValue = arguments['value'].value

#Form the document to use for the database access.  We will do a Regex because we may be searching on a partial date
MongoRegex = {}
MongoRegex['$regex'] = MongoValue
MongoDoc = {}
MongoDoc[MongoName] = MongoRegex

#Connect to the database, get a database object and get a collection
client = MongoClient(dBAddress)
db = client.geekmongo
collection = db.Test

print ('content-type: application/json\n\n')

#Get the total number of documents returned and set up a counter variable
TotalDocCount = collection.count(MongoDoc)
DocCounter = 0

#Start the output string
OutString = '{"markers":['

#Do a database find based upon the parameters provided.  Use this to form the output.  Need elapsed (integer), lat (long 4dp), lng (long 4dp)
#elevation (1dp), timestamp (string) and heart rate (integer)
for rhmDoc in collection.find(MongoDoc):
  OutString = OutString + '{"elapsed":' + str(rhmDoc["elapsed"]) + ','
  OutString = OutString + '"lat":' + str(round(float(rhmDoc["lat"]),4)) + ',' 
  OutString = OutString + '"lng":' + str(round(float(rhmDoc["lng"]),4)) + ','
  OutString = OutString + '"elevation":' + str(round(float(rhmDoc["elevation"]),1)) + ','
  OutString = OutString + '"timestamp":' + chr(34) + rhmDoc["timestamp"] + '",'
  OutString = OutString + '"heart":' + str(rhmDoc["heart"]) + '}'

  #See how many documents we've dealt with and whether we need to add a , after this document
  DocCounter += 1
  if (DocCounter < TotalDocCount):
    OutString = OutString + ','

#Close the JSON array (outside the loop, so the output is still valid JSON when nothing matched) then add the total items part
OutString = OutString + '],"TotalItems":' + str(TotalDocCount) + '}'

#Stream the output to the client
print(OutString)

Here I use the CGI module to read query string parameters provided by the client.  So for example the URL:

http://<server URL>/cgi-bin/GetGPXData.py?name=heart&value=82


...will result in a database query being made for all documents that contain the heart rate value of 82.

The script then takes the response of the database query and forms a string JSON document with all the values to pass back to the client.
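
Building the JSON string by hand works, but Python's standard json module can produce the same shape.  This is a sketch, assuming documents shaped like those written in Step 2 (values stored as strings, "elapsed" as an integer).  build_response is a hypothetical helper name, not something in the script above.

```python
import json


def build_response(docs):
    """Return a JSON string of the form {"markers": [...], "TotalItems": n}."""
    markers = []
    for doc in docs:
        markers.append({
            "elapsed": int(doc["elapsed"]),
            "lat": round(float(doc["lat"]), 4),
            "lng": round(float(doc["lng"]), 4),
            "elevation": round(float(doc["elevation"]), 1),
            "timestamp": doc["timestamp"],
            "heart": int(doc["heart"]),
        })
    return json.dumps({"markers": markers, "TotalItems": len(markers)})


if __name__ == "__main__":
    sample = [{
        "elapsed": 0,
        "lat": "51.42623794265091419219970703125",
        "lng": "-0.992680527269840240478515625",
        "elevation": "45.40000152587890625",
        "timestamp": "2016-04-03T09:19:19",
        "heart": "82",
    }]
    print(build_response(sample))
```

Letting json.dumps() handle the commas, quotes and brackets means there is no bookkeeping for the last item, and an empty result comes out as valid JSON too.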


The "regex" component of the MongoDB query document means you can do a "contains" search on the database.  i.e. "contains 2016" to return all the values for 2016.
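
To make the "contains" behaviour concrete, here is a small sketch that builds the same query document as the API script and then roughly emulates the match locally with Python's re module.  The matches() function is only my approximation of an unanchored $regex match for illustration; it is not how MongoDB evaluates the query internally.

```python
import re


def build_query(field, value):
    """Build the filter document the API script sends: {field: {"$regex": value}}."""
    return {field: {"$regex": value}}


def matches(query, doc):
    """Rough local emulation of an unanchored $regex match using re.search."""
    for field, condition in query.items():
        if re.search(condition["$regex"], str(doc.get(field, ""))) is None:
            return False
    return True


if __name__ == "__main__":
    doc = {"timestamp": "2016-04-03T09:19:19", "heart": "182"}
    print(matches(build_query("timestamp", "2016"), doc))
    # An unanchored pattern of 82 also matches a stored value of 182
    print(matches(build_query("heart", "82"), doc))
    # Anchoring the pattern gives an exact match instead
    print(matches(build_query("heart", "^82$"), doc))
```

One thing to watch is that a "contains" search for a heart rate of 82 will also pick up 182, so a pattern anchored with ^ and $ is the safer choice when an exact value is wanted.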

Step 4 - Web Page, Javascript and Google Maps API
So the final part of the project was to write a web page that could use Javascript to a) download data from the API I just created and b) plot it on a map using the Google Maps API.

The HTML, CSS and Javascript are below.  Highlights:
  • Uses Google Maps Javascript API to bring up a map, place and move markers.  I started with this tutorial.
  • Uses the API previously described to acquire the position data to plot (functions loadLocsOne and loadLocsTwo).
  • Uses a Javascript interval to "fire" and cause an assessment of position and the map to be updated.
  • Calculates the straight line distance between the markers.
<!DOCTYPE html>
<html>
<head>
<style>
#map {
height: 500px;
width: 100%;
}
</style>

</head>
<body>
<h3>Reading Half Marathon Analysis</h3>
<div id="map"></div>
<input type='button' id='btnLoad' value='Load One' onclick='loadLocsOne();'>
<input type='button' id='btnLoad' value='Load Two' onclick='loadLocsTwo();'>
<p id="Distance"></p>
<input type='button' id='btnLoad' value='Race' onclick='startRace();'>
<input type='button' id='btnLoad' value='Stop' onclick='stopRace();'>
<input type='button' id='btnLoad' value='Heart Chart' onclick='heartChart();'>
<script type="text/javascript">
//This is V10 that adds getting the data from an 'API'
//Some nasty global variables. Discovered needed to use setInterval to control a marker and these were needed for that.
var map //Enables us to reference the map in all parts of the code
var markerOne //A marker entity
var markerTwo //A marker entity
var timeElapsed //A variable to hold how many seconds have elapsed
var maxElapsed //Defines the maximum elapsed time we'll have across the two JSON structures
var locsOne = {}; //A position array
var locsTwo = {}; //A position array
var intervalVar //Use for the setInterval thingy
var raceStarted //Boolean that defines whether the race has started
//Initialise the map and put a marker on it
function initMap() {
var readingOne = {lat: 51.4366827, lng: -0.9680389};
var readingTwo = {lat: 51.4466827, lng: -0.9780389};
map = new google.maps.Map(document.getElementById('map'), {
zoom: 13,
center: readingOne
});
markerOne = new google.maps.Marker({
position: readingOne,
map: map,
label: "6"
});
markerTwo = new google.maps.Marker({
position: readingTwo,
map: map,
label: "7"
//color: 0xFFFFFF
});
//This is a load or reload so state that the race is not started
raceStarted = false;
}
//Just move a marker
function positionMarker(inMarker, inLat, inLng)
{
//Set the position of the marker
var newPos = {lat: inLat, lng: inLng};
//Set the position of the marker
inMarker.setPosition(newPos);
}
//Initialises matters when user presses "Race"
function startRace()
{
//What we do first depends on whether the race is started!
if (raceStarted == false)
{
//Initialise the position number
timeElapsed = 0;
//We need to find out the max elapsed time across the two structures. In this way we'll increment the elapsed time every time the interval handler
//fires. If we find a position we update the marker. If not we leave the marker where it is. When we've exhausted all possible elapsed times then
//we know to stop the handler
var maxElapsedOne = locsOne.markers[locsOne.TotalItems - 1].elapsed;
var maxElapsedTwo = locsTwo.markers[locsTwo.TotalItems - 1].elapsed;
//Set up the max elapsed value
if (maxElapsedOne > maxElapsedTwo)
{
maxElapsed = maxElapsedOne;
}
else
{
maxElapsed = maxElapsedTwo;
}
raceStarted = true;
}
//Set up to move the marker
intervalVar = setInterval(function(){ assessMarkerMove()}, 10);
}
//Stops the race
function stopRace()
{
clearInterval(intervalVar);
}
//Handles assessing whether to move the markers and if required doing so
//locs.markers[posNumber].lat,locs.markers[posNumber].lng
function assessMarkerMove()
{
//Variables
var i
//See if we can find a marker associated with the current elapsed time
for (i in locsOne.markers)
{
if (locsOne.markers[i].elapsed == timeElapsed)
{
positionMarker(markerOne, locsOne.markers[i].lat, locsOne.markers[i].lng);
}
}
for (i in locsTwo.markers)
{
if (locsTwo.markers[i].elapsed == timeElapsed)
{
positionMarker(markerTwo, locsTwo.markers[i].lat, locsTwo.markers[i].lng)
}
}
//Calculate the straight line distance between the markers. Only do this every 10 iterations else it looks messy
if (Number.isInteger(timeElapsed / 10) == true){
var distanceBetween = Math.round(google.maps.geometry.spherical.computeDistanceBetween(markerOne.position, markerTwo.position));
document.getElementById("Distance").innerHTML = distanceBetween + ' metres between!';}
//Increment the counter of how many times this has been called
timeElapsed++;
//See whether we've reached the end of the array
if (timeElapsed > maxElapsed)
{
clearInterval(intervalVar);
raceStarted = false;
}
}
//Called when the load data button is pressed
function loadData()
{
loadLocsOne();
loadLocsTwo();
document.getElementById("Distance").innerHTML = 'Data Loaded!!';
}
//Load the first array
function loadLocsOne() {
var xhttp = new XMLHttpRequest();
xhttp.onreadystatechange = function() {
if (this.readyState == 4 && this.status == 200) {
//alert(this.responseText);
locsOne = JSON.parse(this.responseText);
document.getElementById("Distance").innerHTML = 'locsOne Loaded';
}
};
xhttp.open("GET", "http://a.b.c.d/cgi-bin/GetGPXData.py?name=timestamp&value=2016", true);
xhttp.send();
}
//Load the second array
function loadLocsTwo() {
var xhttp = new XMLHttpRequest();
xhttp.onreadystatechange = function() {
if (this.readyState == 4 && this.status == 200) {
locsTwo = JSON.parse(this.responseText);
document.getElementById("Distance").innerHTML = 'locsTwo Loaded';}
};
xhttp.open("GET", "http://a.b.c.d/cgi-bin/GetGPXData.py?name=timestamp&value=2017", true);
xhttp.send();
}
</script>
<script async defer
src="https://maps.googleapis.com/maps/api/js?key=<Your Key Here>&callback=initMap&libraries=geometry">
</script>
</body>
</html>
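
The replay logic in the page can also be tried away from the browser.  Below is a sketch in Python of the same idea: step through elapsed time, move each "marker" when a position exists for that second, and work out the straight line gap every 10 steps.  It uses the haversine formula with an Earth radius of 6378137 metres, which I believe is the default radius the Google geometry library uses, but treat that as an assumption.  Unlike the page, the markers start at each run's first recorded position, and the function names and sample positions are illustrative.

```python
import math

EARTH_RADIUS_M = 6378137.0  # assumed spherical Earth radius in metres


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two lat/lng points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def replay(markers_one, markers_two, step=10):
    """Return (elapsed, gap in metres) pairs, sampled every `step` seconds."""
    # Index positions by elapsed second so each lookup is a dictionary access
    by_time_one = {m["elapsed"]: m for m in markers_one}
    by_time_two = {m["elapsed"]: m for m in markers_two}
    max_elapsed = max(max(by_time_one), max(by_time_two))
    pos_one, pos_two = markers_one[0], markers_two[0]
    gaps = []
    for t in range(max_elapsed + 1):
        # Keep the previous position when no point was logged for this second
        pos_one = by_time_one.get(t, pos_one)
        pos_two = by_time_two.get(t, pos_two)
        if t % step == 0:
            gap = haversine_m(pos_one["lat"], pos_one["lng"], pos_two["lat"], pos_two["lng"])
            gaps.append((t, int(round(gap))))
    return gaps


if __name__ == "__main__":
    # Two made-up tracks using the marker start positions from initMap
    run_one = [{"elapsed": 0, "lat": 51.4366827, "lng": -0.9680389},
               {"elapsed": 10, "lat": 51.4367827, "lng": -0.9680389}]
    run_two = [{"elapsed": 0, "lat": 51.4466827, "lng": -0.9780389},
               {"elapsed": 10, "lat": 51.4465827, "lng": -0.9780389}]
    for elapsed, gap in replay(run_one, run_two):
        print("%d seconds: %d metres between" % (elapsed, gap))
```

Indexing the positions by elapsed second avoids looping over every marker on each tick, which is what the page does in assessMarkerMove.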