Paul's Geek Dad Blog: January 2016

Sunday, 31 January 2016

Fitbit API Data Analysis Using Raspberry Pi, Python and R

I'm still feeling vaguely bad about the machine I made to cheat on pedometer step counting, so I felt I had to pay penance by doing more Fitbit API data analysis.

I wanted to try and find some more interesting ways to visualise the data I get from the Fitbit API. My inspiration was “Information is Beautiful”, a book I bought from a well-known South American river book company just before Christmas. Apart from my foray into creating a Sleep infographic I’ve always been a bit conservative in terms of how I visualise data, relying on bog-standard, boring bar charts and scatter graphs. “Information is Beautiful” has many and various infographics that make analysing data accessible, intuitive and just, well, beautiful! So here’s the journey I went on…

Here's what I produced, I'll then tell you how I did it!

I've had my Fitbit Charge HR for just over a year now so I thought I'd "celebrate" by analysing a whole year's worth of data from the Fitbit API! To do this I used the OAUTH2.0 method I wrote about here.

To get a year's worth of data I simply had to use the following URL for the API call:

https://api.fitbit.com/1/user/-/activities/steps/date/2016-01-31/1y.json

So this is asking for my step data (activities/steps) for the one year period up to and including 2016-01-31. The command I ran was:

sudo python fitbit_oauth_request_v1.py > 2016-01-31.json

...meaning the output was redirected to the file 2016-01-31.json. The content of the file looked like this (after trimming off some initial text that came from the print statements in the Python script):

more 2016-01-31.json
{"activities-steps":[{"dateTime":"2015-02-01","value":"21803"},
{"dateTime":"2015-02-02","value":"7324"},
{"dateTime":"2015-02-03","value":"10293"},
{"dateTime":"2015-02-04","value":"12714"},
{"dateTime":"2015-02-05","value":"10383"},
{"dateTime":"2015-02-06","value":"11496"},
{"dateTime":"2015-02-07","value":"17795"},
{"dateTime":"2015-02-08","value":"19735"},
{"dateTime":"2015-02-09","value":"10808"},
{"dateTime":"2015-02-10","value":"8897"},
{"dateTime":"2015-02-11","value":"10106"},
{"dateTime":"2015-02-12","value":"9779"},
{"dateTime":"2015-02-13","value":"9850"},
{"dateTime":"2015-02-14","value":"12108"},
{"dateTime":"2015-02-15","value":"27393"},
{"dateTime":"2015-02-16","value":"12992"}

So a simple JSON structure that has one element per day of the year with a simple step count in it. I then transferred the JSON file to my PC to process it with R.
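For anyone who would rather stay in Python for this step, here is a minimal sketch (Python 3, not the R route I actually took) of reading that structure and turning the text values into proper dates and integers. The function name parse_steps is just something I made up for illustration, and the sample string holds the first three records from the output above.

```python
import json
from datetime import date

# First three records copied from the API output shown above
raw = ('{"activities-steps":['
       '{"dateTime":"2015-02-01","value":"21803"},'
       '{"dateTime":"2015-02-02","value":"7324"},'
       '{"dateTime":"2015-02-03","value":"10293"}]}')


def parse_steps(text):
    """Return a list of (date, steps) tuples from a steps time series.

    parse_steps is a hypothetical helper name, not part of any Fitbit library.
    """
    records = json.loads(text)["activities-steps"]
    # Both fields arrive as strings, so convert them explicitly
    return [(date.fromisoformat(r["dateTime"]), int(r["value"])) for r in records]


daily_steps = parse_steps(raw)
for day, steps in daily_steps:
    print(day, steps)
```

Converting the types up front avoids the "everything is text" problem when the data is later plotted or summed.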

I loaded up the JSON structure in R using:

> library(jsonlite)
> stepdata2015 <- fromJSON(file.choose(),flatten=TRUE)

Where file.choose() means the Windows file chooser form is opened to allow you to select the JSON file. The data looked like this (abridged):

> stepdata2015
$`activities-steps`
dateTime value
1 2015-02-01 21803
2 2015-02-02 7324
3 2015-02-03 10293
4 2015-02-04 12714
5 2015-02-05 10383

Looking at the type of data I saw:
> stepdata2015[0]
named list()

So not the "data frame" I've worked with in the past. This was reflected in the fact that I couldn't manipulate the data in a similar way to how I'd done it in the past. So I turned it into a data frame by doing this:

> stepdata2015_df <- as.data.frame(stepdata2015)

...which made the data look like this (abridged):

> stepdata2015_df
activities.steps.dateTime activities.steps.value
1 2015-02-01 21803
2 2015-02-02 7324
3 2015-02-03 10293
4 2015-02-04 12714
5 2015-02-05 10383

Then I graphed the data using these commands:
> library(ggplot2)
> graphval <- qplot(activities.steps.dateTime, activities.steps.value, data=stepdata2015_df)
> graphval + labs(title="Fitbit Step Data - 2015",x = "Day",y = "Steps")

...which yielded this graph:

This is definitely a graph, but every single X value and Y value has its own axis label, most likely because both columns are treated as text fields. To convert the X axis values to a date/time type I did:

> stepdata2015_df$TimePosix <- as.POSIXct(stepdata2015_df$activities.steps.dateTime)

Then to turn the Y axis values into numbers I did:

> stepdata2015_df$StepsInt <- as.integer(stepdata2015_df$activities.steps.value)

...yielding:

> stepdata2015_df
activities.steps.dateTime activities.steps.value TimePosix StepsInt
1 2015-02-01 21803 2015-02-01 21803
2 2015-02-02 7324 2015-02-02 7324
3 2015-02-03 10293 2015-02-03 10293
4 2015-02-04 12714 2015-02-04 12714
5 2015-02-05 10383 2015-02-05 10383

Which means a much nicer looking graph which understands the X axis as a date and the Y axis as a number and intelligently provides fewer labels:

A nicer graph but really just a random collection of points to my eye. A bit of reading showed me you could add a smoother trendline to the chart by using a "geom" parameter and doing this:

> graphval <- qplot(TimePosix, StepsInt, data=stepdata2015_df,geom = c("point", "smooth"))
> graphval + labs(title="Fitbit Step Data - 2015",x = "Day",y = "Steps")

Yielding:

...which actually tells the story of my year quite nicely and shows how my step totals are really influenced by how much running I do. I started 2015 doing a little bit of running, did lots of running up to May/June, then cut back over the summer as I got injured and then did more towards the end of the year and into 2016 as I came back from injury. In fact, I've been really careful coming back from injury, increasing my weekly KM by no more than 10% and this is reflected in the gradient of the trendline.
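To get a feel for what smoothing does to noisy daily counts, here is a rough Python 3 sketch of a trailing moving average. To be clear, this is a much simpler technique than the smoother that ggplot2 fits, and it is not what produced the trendline above; the function name rolling_mean and the 7-day window are my own assumptions. The step values are the first nine days from the data shown earlier.

```python
def rolling_mean(values, window=7):
    """Trailing moving average: one output value per complete window."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(values[i - window:i]) / window
        for i in range(window, len(values) + 1)
    ]


# First nine daily step counts from the data shown earlier
steps = [21803, 7324, 10293, 12714, 10383, 11496, 17795, 19735, 10808]
print(rolling_mean(steps, window=7))
```

A wider window gives a smoother line but reacts more slowly to real changes, such as a sudden stop in running due to injury.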

I then decided the data needed aggregating into monthly totals and so did this:

> stepdata_2015_agg_sum <- aggregate(list(Steps = stepdata2015_df$StepsInt), list(month = cut(stepdata2015_df$TimePosix, "month")), sum)

Yielding (abridged):

> stepdata_2015_agg_sum
month Steps
1 2015-02-01 350767
2 2015-03-01 385209
3 2015-04-01 385578
4 2015-05-01 477423
5 2015-06-01 391484
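The same month-by-month grouping can be sketched in Python 3 as below. This is only an illustration of the idea (bucket each day by the first day of its month, then sum), not a translation of what R's aggregate and cut do internally. The name monthly_totals is hypothetical, and the March value in the sample is made up purely to show a second bucket.

```python
from collections import defaultdict
from datetime import date


def monthly_totals(daily_steps):
    """Sum (date, steps) pairs into totals keyed by the first day of each month."""
    totals = defaultdict(int)
    for day, steps in daily_steps:
        totals[day.replace(day=1)] += steps
    return dict(sorted(totals.items()))


sample = [
    (date(2015, 2, 1), 21803),   # from the data shown earlier
    (date(2015, 2, 2), 7324),    # from the data shown earlier
    (date(2015, 3, 1), 100),     # made-up value, only to show a second month
]

for month, total in monthly_totals(sample).items():
    print(month, total)
```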

I also decided to create my own infographic to visualise my month-on-month step count.

To calculate how many footsteps I needed to show on my visualisation I added some summaries:

> stepdata_2015_agg_sum$tenthoublocks <- stepdata_2015_agg_sum$Steps / 10000
> stepdata_2015_agg_sum$footsteps <- round(stepdata_2015_agg_sum$tenthoublocks, digits=0)

...yielding (abridged):

> stepdata_2015_agg_sum
month Steps tenthoublocks footsteps
1 2015-02-01 350767 35.0767 35
2 2015-03-01 385209 38.5209 39
3 2015-04-01 385578 38.5578 39
4 2015-05-01 477423 47.7423 48
5 2015-06-01 391484 39.1484 39
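The footstep arithmetic is simple enough to check by hand, but here is a small Python 3 sketch of it using the five monthly totals shown above. The names STEPS_PER_FOOT and footsteps are mine, chosen for illustration.

```python
# One foot icon per 10,000 steps, as in the infographic
STEPS_PER_FOOT = 10000

# Monthly totals copied from the table above
monthly = {
    "2015-02": 350767,
    "2015-03": 385209,
    "2015-04": 385578,
    "2015-05": 477423,
    "2015-06": 391484,
}


def footsteps(total_steps, per_icon=STEPS_PER_FOOT):
    """Number of foot icons to draw, rounded to the nearest whole icon."""
    return round(total_steps / per_icon)


for month, total in monthly.items():
    print(month, total / STEPS_PER_FOOT, footsteps(total))
```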

I then opened the data in Excel to graph it (or create a pictograph to use the proper lingo). Using this website to tell me how to create charts with images instead of boring bars I came up with the chart below. Each foot represents 10,000 steps:

I then thought I'd create my own! Each step on the “path” below represents 10,000 steps and I did it by manually copying, pasting and formatting in Excel:

Notwithstanding that months are of a different length, the infographic does nicely tally with my 2015 running profile of running a bit (Feb to April), running a lot (May - too much really), getting injured (June to October), getting back into running (October to Jan). It’s not the most beautiful infographic in the world and Mrs Geek thinks the footsteps look like butterflies but I’m happy with it!!

I think the standard Excel generated one was just fine!

Sunday, 24 January 2016

Google API Access Using OAUTH2.0, Python and Raspberry Pi

This is a post about using Python to retrieve data from a Google API. For a long time I've been aware that there were lots of Google APIs that gave access to lots of delicious data but I've not got around to playing with them. Finally I found the time.

I decided to try to get access to the Blogger API (i.e. to give information about this blog!) using OAUTH2.0 and Python. The process of accessing the API is similar to that for the Fitbit API I blogged about recently. So in simple terms the procedure is:

Register your app, specify access to the Blogger API and get your credentials.
Use OAUTH2.0 to get permission from the user to access their data (and get an authorisation code).
Swap your authorisation code for access and refresh tokens and use them to access the API.
Periodically get a new access token using the refresh token.

On their overview pages, Google recommend the use of pre-defined libraries to authenticate and access the APIs. Who am I to argue? So this is what I did!

Part 1 - Register Your App
To do this go to the Google Developer Console and log in (I assume you've got a Google account). You see a screen that looks like the image below. Click on the project list and select "Create a project":

From the Developer Console select "Enable and manage APIs". Select the API you want (in my case Blogger V3) and then "Enable API".

You'll then be prompted to create credentials for the project (these are the OAUTH2.0 credentials). To do this I basically followed the Wizard that Google takes you through to specify what you want. In summary this was:

Select "Go to Credentials"
When asked "Where will you be calling the API from?" specified "Other UI".
When asked "What data will you be accessing?" specified "User data".
This told me I needed OAUTH2.0 credentials and so I clicked "Create client ID"

This took me to a place where I could define what details the user (in this case always me) would see when they're asked to authenticate access from my app. I specified "Blogger API Application" as the name of the application.

Click "Continue" and your OAUTH2.0 credentials are created. At this point there's the option to download a file containing your credentials. Do this and rename the file to be "client_secrets.json", (you'll need this later).

Part 2 - Everything Else!

As stated before, the nice people from Google recommend that you use pre-defined software modules to authenticate and access the API. Sounds like a cunning plan so, as a Python man, the first thing I did was run this command on my Raspberry Pi to download and install the Python module for Google API access:

sudo pip install --upgrade google-api-python-client

(I'm pretty sure pip was either already installed on my Pi or I installed it in the dim and distant past).

I then created a directory to hold my Python script to authenticate and use the API. For me this was:

/home/pi/google/blogger

In this directory I placed the client_secrets.json file I downloaded in the earlier step.

I then set about writing the Python script required to authenticate and access the API. However to my resounding joy I found that there were loads of pre-written scripts on the interweb, including one for the blogger API. Never one to look a gift horse in the mouth* I set about cribbing** a pre-written script.

(*-An English saying meaning if someone offers you something then take it. **-Another English saying meaning flagrant copying).

There are stacks of documentation here telling you how to use the Google Python module for API access. After a quick look at this I selected "Samples" then "Sample Applications" which took me to a Github page. On here there is a stack of pre-written Python scripts for Google API access including one for Blogger API use. It's written by Joe Gregorio (jcgregorio@google.com) so all credit to him and zero credit to me.

I copied the script and pasted it into a Nano editor. To create this file I did:

sudo nano blogger_2.py

Then I ran the script using the command:

sudo python blogger_2.py --noauth_local_webserver

(The --noauth_local_webserver switch was because I was running the script from a remote SSH session).

Here's a screenshot of what I saw in the SSH session:

So I've redacted some sensitive stuff on the image above but you copy the URL that the script presents and paste it into a browser. You're shown a page like that shown below and you click "Allow" to permit access from your application to your data:

You're then served a page with an authorisation code on it (redacted below).

Copy this and paste it back onto the command line for the Python script. The script then continues, accesses the API and prints the result!

So there you go. The script looks at my Blogger account, gets the blogs for my user and then, for each blog, prints out all of its posts.

If you look in the directory from which you ran the script you can see that a file called blogger.dat is created. This contains your current access and refresh tokens and so is used and overwritten when new tokens are needed.

To show I'm not a complete cribber and to learn a bit more I set about adding extra lines to the script to get stats relating to my blog.

For this you need the Python API module reference which is here. Here's a screen shot:

In the script you can see an object called "service" is created using this line:

service, flags = sample_tools.init(
    argv, 'blogger', 'v3', __doc__, __file__,
    scope='https://www.googleapis.com/auth/blogger')

This can then be used to create objects to access different methods within the API. Such as:

blogs = service.blogs()

...which is then used to get the list of blogs on the user's Blogger account using:

# Retrieve the list of Blogs this user has write privileges on

thisusersblogs = blogs.listByUser(userId='self').execute()

Then it iterates through each blog using:

# List the posts for each blog this user has

for blog in thisusersblogs['items']:

So before it does this you can create an object for page views using:

pageviews = service.pageViews()

(See the Python module documentation referenced above.) Then within the "for blog" loop you can do:

    print('The stats for %s:' % blog['name'])
    request = pageviews.get(blogId=blog['id'],range='all')
    views_doc = request.execute()
    print(views_doc)

The range='all' parameter specifies that you want stats for the lifetime of the blog. The options are:

      30DAYS - Page view counts from the last thirty days.
      7DAYS - Page view counts from the last seven days.
      all - Total page view counts from all time.

...and overall you get output like this:

The stats for Paul's Geek Dad Blog:
{u'kind': u'blogger#page_views', u'counts': [{u'count': u'94521', u'timeRange': u'ALL_TIME'}], u'blogId': u'123456678902'}

The stats for Paul's Blog:
{u'kind': u'blogger#page_views', u'counts': [{u'count': u'22', u'timeRange': u'ALL_TIME'}], u'blogId': u'123456678902'}
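The counts come back as text inside a small nested structure, so if you want to do sums with them they need picking out and converting. Here is a minimal Python 3 sketch of that, using a dictionary shaped like the first output above. The helper name view_counts is my own invention and not part of the Google module.

```python
# A response shaped like the output shown above
views_doc = {
    'kind': 'blogger#page_views',
    'counts': [{'count': '94521', 'timeRange': 'ALL_TIME'}],
    'blogId': '123456678902',
}


def view_counts(doc):
    """Map each timeRange in a page views response to an integer count.

    view_counts is a hypothetical helper, not a function from the API client.
    """
    return {entry['timeRange']: int(entry['count']) for entry in doc.get('counts', [])}


print(view_counts(views_doc))
```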

Enjoy!

Friday, 15 January 2016

Code.org Hour of Code Exercises are Awesome!

Here's a word -> Awesome. What's awesome you ask? The Hour of Code exercises on the code.org website are awesome I tell you!!

Prompted by an excellent teacher at my youngest daughter's school she did a few of these exercises at home before Christmas. I helped her with a few odds and ends but by-and-large she did it all herself.

I'll give you an overview as to what it's all about and as I go along, I'll tell you why it's awesome!

What is It?
Based upon a set of well-known movies, characters or toys you choose a theme to do some coding with. Here are a few examples:

Among other things, my daughter chose to do the Minecraft activities. I think because it's based upon famous things that kids will have heard of it's more engaging than "Hello World" or some other more abstract topic.

How Does It Work?
You step through a set of tasks associated with the theme. First you are shown a video to give you some background (you also get videos as you go along as new concepts are introduced):

Your challenge is then presented to you:

You then get presented with a Scratch-style IDE that allows you to drag and drop code blocks and run the resulting code:

You drag and drop your code, press "Run" and see what happens. If your code is correct you get a nice "well done" message. So really short, snappy and interactive with loads of sound effects, feedback and prompts to make you want to do more.

What Coding Concepts do you Learn?
Different blocks for different actions:

Increasing the complexity of the steps:

Introducing loops, showing that you don't have to create endless code to do repeated steps:

You get a nice error message if you get something wrong. You can then modify your code to get it right:

You get tips that your code could be more efficient and can even see the detail of the JavaScript underlying your blocks:

You learn about IF statements:

You even get a certificate to print out and stick on the fridge!

Conclusion
So here you've learned about the main building blocks of writing code (sequence, selection and repetition) in a fun and engaging way. I wrote an email to the people behind it just to say thanks and how ace I thought it was.

Two quotes from my daughter:

"It's really cool"
"I like how it lets you use the skills you've learned"

Go do it. Here's the link again!

Monday, 11 January 2016

Fitbit API Access Using OAUTH2.0 and Raspberry Pi

Previously I've blogged on accessing the Fitbit API to do sleep analysis, a sleep infographic and to get per minute data.

This used the OAUTH1.0 authentication method, but Fitbit is deprecating that as of March 2016. The requirement is to move to OAUTH2.0 instead, so this post is about how I rolled up my sleeves and did that!

The Fitbit developer site has excellent documentation on using OAUTH2.0 to access the API. I basically followed this step by step in order to gain access. I'll describe what I actually did and give code examples below.

Note this is for the general hobbyist, not for someone trying to do this professionally!

Step 1 - Register App
Go to step 1 of my original Fitbit API post and carry out the steps to register your application. Then come back here.

To do OAUTH2.0 you also need to go onto the Fitbit developer site and:

Select "MANAGE MY APPS", select your app and log your "OAuth 2.0 Client ID". (I assume you've already got your "Client (Consumer) Secret" logged).
Then select "Edit Application Settings" and set your "OAuth 2.0 Application Type" to "Personal".

...you're all set to go!

Step 2 - Get an Authorisation Code
Carry out the steps under "Authorization Page" section of the OAUTH2.0 documentation. This simply means forming a URL, pasting it into a browser and then following the steps on the resulting web page to authorise the app to use your data.

https://www.fitbit.com/oauth2/authorize?response_type=code&client_id=22942C&redirect_uri=http%3A%2F%2Fexample.com%2Fcallback&scope=activity%20nutrition%20heartrate%20location%20nutrition%20profile%20settings%20sleep%20social%20weight

The example URL above is from the Fitbit documentation. Simply change the client_id to the one you logged above to make it work for you. "response_type=code" means your request type is "Authorization Code Flow". From reading the documentation this means you get both an access token and a refresh token for using the API; more on this later.

This will result in Fitbit redirecting to the callback URL you specified when registering your application and appending an "Authorization Code" to the end of the URL. Log this authorisation code for later use.
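If you'd rather build the URL in code than edit it by hand, here is a minimal Python 3 sketch (the scripts in this post are Python 2, so treat this as a separate illustration). It assembles the authorisation URL and then pulls the code back out of the redirect. The function names are mine, the callback URL in the usage lines is made up, and I'm assuming the code comes back in a query parameter called "code", which is the standard OAUTH2.0 behaviour.

```python
from urllib.parse import urlencode, urlsplit, parse_qs, quote

AUTHORIZE_URL = "https://www.fitbit.com/oauth2/authorize"


def build_authorize_url(client_id, redirect_uri, scopes):
    """Build the URL to paste into a browser (hypothetical helper)."""
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),
    }
    # quote_via=quote encodes spaces as %20, matching the documented example
    return AUTHORIZE_URL + "?" + urlencode(params, quote_via=quote)


def extract_code(callback_url):
    """Pull the authorisation code out of the URL you were redirected to."""
    codes = parse_qs(urlsplit(callback_url).query).get("code")
    if not codes:
        raise ValueError("no 'code' parameter in the callback URL")
    return codes[0]


print(build_authorize_url("22942C", "http://example.com/callback", ["activity", "sleep"]))
# Made-up callback URL, only to show the parsing
print(extract_code("http://example.com/callback?code=abc123#_=_"))
```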

Step 3 - Get Access and Refresh Tokens
You now need to use your authorisation code to obtain your first access and refresh tokens. You need to do this within 10 minutes of step 2 above.

How to do this is specified in the section "Access Token Request" of the Fitbit OAUTH2.0 page. To follow these steps I wrote a Python script on my Raspberry Pi to put the parameters together and execute the HTTP POST that you need to do. Simply take the code below and enter your client ID, your consumer secret, your authorisation code and your redirect URL.

If it works the result will be a JSON response printed to screen that shows your first access token and refresh token. Log these for use in the next step!

import base64
import urllib2
import urllib

#These are the secrets etc from Fitbit developer
OAuthTwoClientID = "Your_ID_Here"
ClientOrConsumerSecret = "Your_Secret_Here"

#This is the Fitbit URL
TokenURL = "https://api.fitbit.com/oauth2/token"

#I got this from the first verifier part when authorising my application
AuthorisationCode = "Your_Code_Here"

#Form the data payload
BodyText = {'code' : AuthorisationCode,
'redirect_uri' : 'http://pdwhomeautomation.blogspot.co.uk/',
'client_id' : OAuthTwoClientID,
'grant_type' : 'authorization_code'}

BodyURLEncoded = urllib.urlencode(BodyText)
print BodyURLEncoded

#Start the request
req = urllib2.Request(TokenURL,BodyURLEncoded)

#Add the headers, first we base64 encode the client id and client secret with a : inbetween and create the authorisation header
req.add_header('Authorization', 'Basic ' + base64.b64encode(OAuthTwoClientID + ":" + ClientOrConsumerSecret))
req.add_header('Content-Type', 'application/x-www-form-urlencoded')

#Fire off the request
try:
    response = urllib2.urlopen(req)

    FullResponse = response.read()

    print "Output >>> " + FullResponse
#Catch HTTPError (not the more general URLError) because only HTTPError has .code and .read()
except urllib2.HTTPError as e:
    print e.code
    print e.read()
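The Authorization header is the part of the script above that's easiest to get wrong, so here is a minimal Python 3 sketch of just that piece (in Python 3 the text has to be converted to bytes before encoding, which the Python 2 script doesn't need). The function name basic_auth_header and the "user"/"pass" values are placeholders of mine.

```python
import base64


def basic_auth_header(client_id, client_secret):
    """Return the value for an HTTP Basic Authorization header."""
    # Join ID and secret with a colon, then base64 encode the bytes
    raw = (client_id + ":" + client_secret).encode("ascii")
    return "Basic " + base64.b64encode(raw).decode("ascii")


# Placeholder credentials, not real ones
print(basic_auth_header("user", "pass"))
```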

Step 4 - Make an API Call and Refresh Tokens
This follows the steps described under the "Making Requests" and "Refreshing Tokens" section of the Fitbit OAUTH2.0 document.

In simple terms, you make a request using the access token. This has a limited lifetime (one hour) so when it runs out you use the refresh token to get a new access token (to use now) and a new refresh token (to get the next access token).

To get data from the API and get new tokens (if required) I used the code pasted in below. In simple terms this:

Reads the current tokens from a text file. I created this text file on my Raspberry Pi, pasted in my access token, pressed return then pasted in the refresh token. (Both tokens from step 3 above).
Makes an HTTP GET to the API URL using the access token. If this works, happy days.
If the HTTP GET doesn't work, it does an HTTP POST using the refresh token and logs the new tokens to file ready for next time.

To make it work:

Edit the IniFile variable to specify where you have your file stored.
Enter your client ID and client secret.

import base64
import urllib2
import urllib
import sys
import json
import os

#This is the Fitbit URL to use for the API call
FitbitURL = "https://api.fitbit.com/1/user/-/profile.json"

#Use this URL to refresh the access token
TokenURL = "https://api.fitbit.com/oauth2/token"

#Get and write the tokens from here
IniFile = "/home/pi/fitbit/tokens.txt"

#From the developer site
OAuthTwoClientID = "Your_ID_Here"
ClientOrConsumerSecret = "Your_Secret_Here"

#Some constants defining API error handling responses
TokenRefreshedOK = "Token refreshed OK"
ErrorInAPI = "Error when making API call that I couldn't handle"

#Get the config from the config file. This is the access and refresh tokens
def GetConfig():
    print "Reading from the config file"

    #Open the file
    FileObj = open(IniFile,'r')

    #Read first two lines - first is the access token, second is the refresh token
    AccToken = FileObj.readline()
    RefToken = FileObj.readline()

    #Close the file
    FileObj.close()

    #Strip any newline characters from the end of the strings
    AccToken = AccToken.rstrip("\r\n")
    RefToken = RefToken.rstrip("\r\n")

    #Return values
    return AccToken, RefToken

def WriteConfig(AccToken,RefToken):
    print "Writing new token to the config file"
    print "Writing this: " + AccToken + " and " + RefToken

    #Delete the old config file
    os.remove(IniFile)

    #Open and write to the file
    FileObj = open(IniFile,'w')
    FileObj.write(AccToken + "\n")
    FileObj.write(RefToken + "\n")
    FileObj.close()

#Make an HTTP POST to get a new access token
def GetNewAccessToken(RefToken):
    print "Getting a new access token"

    #Form the data payload
    BodyText = {'grant_type' : 'refresh_token',
                'refresh_token' : RefToken}
    #URL Encode it
    BodyURLEncoded = urllib.urlencode(BodyText)
    print "Using this as the body when getting access token >>" + BodyURLEncoded

    #Start the request
    tokenreq = urllib2.Request(TokenURL,BodyURLEncoded)

    #Add the headers, first we base64 encode the client id and client secret with a : in between and create the authorisation header
    tokenreq.add_header('Authorization', 'Basic ' + base64.b64encode(OAuthTwoClientID + ":" + ClientOrConsumerSecret))
    tokenreq.add_header('Content-Type', 'application/x-www-form-urlencoded')

    #Fire off the request
    try:
        tokenresponse = urllib2.urlopen(tokenreq)

        #See what we got back. If it's this part of the code it was OK
        FullResponse = tokenresponse.read()

        #Need to pick out the access token and write it to the config file. Use a JSON manipulation module
        ResponseJSON = json.loads(FullResponse)

        #Read the access and refresh tokens as strings
        NewAccessToken = str(ResponseJSON['access_token'])
        NewRefreshToken = str(ResponseJSON['refresh_token'])
        #Write the new tokens to the ini file
        WriteConfig(NewAccessToken,NewRefreshToken)

        print "New access token output >>> " + FullResponse
    #Catch HTTPError (not the more general URLError) because only HTTPError has .code and .read()
    except urllib2.HTTPError as e:
        #Getting to this part of the code means we got an error
        print "An error was raised when getting the access token. Need to stop here"
        print e.code
        print e.read()
        sys.exit()

#This makes an API call. It also catches errors and tries to deal with them
def MakeAPICall(InURL,AccToken,RefToken):
    #Start the request
    req = urllib2.Request(InURL)

    #Add the access token in the header
    req.add_header('Authorization', 'Bearer ' + AccToken)

    print "I used this access token " + AccToken
    #Fire off the request
    try:
        #Do the request
        response = urllib2.urlopen(req)
        #Read the response
        FullResponse = response.read()

        #Return values
        return True, FullResponse
    #Catch HTTP errors, e.g. a 401 error that signifies the need for a new access token
    except urllib2.HTTPError as e:
        print "Got this HTTP error: " + str(e.code)
        HTTPErrorMessage = e.read()
        print "This was in the HTTP error message: " + HTTPErrorMessage
        #See what the error was
        if (e.code == 401) and (HTTPErrorMessage.find("Access token invalid or expired") > 0):
            GetNewAccessToken(RefToken)
            return False, TokenRefreshedOK
        #Return that this didn't work, allowing the calling function to handle it
        return False, ErrorInAPI

#Main part of the code
#Declare these global variables that we'll use for the access and refresh tokens
AccessToken = ""
RefreshToken = ""

print "Fitbit API Test Code"

#Get the config
AccessToken, RefreshToken = GetConfig()

#Make the API call
APICallOK, APIResponse = MakeAPICall(FitbitURL, AccessToken, RefreshToken)

if APICallOK:
    print APIResponse
else:
    if (APIResponse == TokenRefreshedOK):
        print "Refreshed the access token. Can go again"
    else:
        print ErrorInAPI
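The script above stops after refreshing the tokens and leaves you to run it again. If you wanted it to retry straight away, the logic could look something like this Python 3 sketch. It is only the control flow: the two functions passed in (call_api and refresh_tokens) are stand-ins I've invented so the example runs without touching the network, and the fake versions below simply pretend that a token called "fresh" works and anything else gets a 401.

```python
def call_with_refresh(call_api, refresh_tokens, access_token, refresh_token):
    """Call the API once; on a 401, refresh the tokens and try one more time.

    call_api(access_token) must return (status_code, body).
    refresh_tokens(refresh_token) must return (new_access, new_refresh).
    Both are hypothetical callables supplied by the caller.
    """
    status, body = call_api(access_token)
    if status == 401:
        access_token, refresh_token = refresh_tokens(refresh_token)
        status, body = call_api(access_token)
    # Hand back the tokens too, so the caller can save them for next time
    return status, body, access_token, refresh_token


# Fake functions standing in for the real HTTP calls
def fake_call_api(token):
    return (200, '{"ok": true}') if token == "fresh" else (401, "expired")


def fake_refresh(refresh_token):
    return "fresh", "next-refresh"


result = call_with_refresh(fake_call_api, fake_refresh, "stale", "old-refresh")
print(result)
```

Retrying only once keeps the script from looping forever if the refresh token itself has stopped working.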

Thursday, 7 January 2016

Strava API Analysis Using R

In a couple of previous posts I've covered how I have used the Strava API to analyse exercise data I've captured with my Garmin Forerunner 910 watch.

Analysing this data has always been a bit laborious but my new found discovery of R (see here for how I used it for Fitbit data) means it's now easy.

Getting data from the Strava API is pretty easy: you just have to register, get a key and then you can use simple HTTP GET requests to get your data (the first link above shows how I did it). So no HTTP POST, no forming a payload, no 256 hashes or anything.

This makes it easy to import into R using the jsonlite library. Here's an example (assuming you've installed jsonlite):

library(jsonlite)
stravadata <- fromJSON('https://www.strava.com/api/v3/activities?access_token=<your key here>&per_page=200&after=1420070400',flatten=TRUE)

This then yields an R data frame that you can manipulate. First have a quick look at the first row of the data frame:

> stravadata[c(1),]
id resource_state external_id upload_id name distance moving_time elapsed_time total_elevation_gain type start_date
1 236833349 2 <NA> NA First swim of 2015 1300 2700 2700 0 Swim 2015-01-04T20:45:00Z
start_date_local timezone start_latlng end_latlng location_city location_state location_country start_latitude start_longitude
1 2015-01-04T20:45:00Z (GMT+00:00) Europe/London NULL NULL <NA> <NA> United Kingdom NA NA
achievement_count kudos_count comment_count athlete_count photo_count trainer commute manual private flagged gear_id average_speed max_speed total_photo_count
1 0 0 0 1 0 FALSE FALSE TRUE FALSE FALSE <NA> 0.481 0 0
has_kudoed average_cadence average_watts device_watts average_heartrate max_heartrate elev_high elev_low workout_type kilojoules athlete.id athlete.resource_state
1 FALSE NA NA NA NA NA NA NA NA NA 4309532 1
map.id map.summary_polyline map.resource_state
1 a236833349 <NA> 2

This gives you a good idea of interesting fields to further analyse:

name = column 5
distance = column 6
type = column 10

...which lets you look at the data in a more refined format. So first row and the three columns listed above:

> stravadata[c(1),c(5,6,10)]
name distance type
1 First swim of 2015 1300 Swim

Before going much further I needed to filter the results to just show those for 2015 as my Strava API call would have included everything from 2016 to date as well. Do this by:

strava2015 <- stravadata[grep("2015-", stravadata$start_date), ]

...which yields this (just first 3 rows shown):

> strava2015[c(1:3),c(5,6)]
name distance
1 First swim of 2015 1300.0
2 HIIT 20150106 4716.1
3 HIIT 20140108 4709.2
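Two bits of date handling are going on here: the "after=1420070400" in the API URL is a Unix timestamp for the start of 2015, and the filter keeps rows whose start_date is in 2015. Here is a minimal Python 3 sketch of both. The function names are mine, the second activity in the sample is made up, and note that my filter checks the start of the string, which is slightly stricter than the grep in the R command (that matches "2015-" anywhere in the text).

```python
from datetime import datetime, timezone


def epoch_seconds(year, month=1, day=1):
    """Unix timestamp for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def in_year(activities, year):
    """Keep activities whose ISO start_date begins with the given year."""
    prefix = "%d-" % year
    return [a for a in activities if a["start_date"].startswith(prefix)]


activities = [
    {"name": "First swim of 2015", "start_date": "2015-01-04T20:45:00Z"},
    # Made-up entry, only to show something being filtered out
    {"name": "Hypothetical 2016 run", "start_date": "2016-01-02T09:00:00Z"},
]

print(epoch_seconds(2015))
print([a["name"] for a in in_year(activities, 2015)])
```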

Then picking out just the type and the distance:

> strava2015simple <- strava2015[,c(10,6)]

...and looking at first 3 rows of this:

> strava2015simple[c(1:3),]
type distance
1 Swim 1300.0
2 Ride 4716.1
3 Ride 4709.2

Making it very easy to compute some aggregated stats for distances for 2015:

First averages:

> stravaagg <- aggregate(list(Distance = strava2015simple$distance), list(Type = strava2015simple$type), mean)
> stravaagg
Type Distance
1 Ride 17765.398
2 Run 5487.856
3 Swim 1067.619

...then totals:

> stravaagg <- aggregate(list(Distance = strava2015simple$distance), list(Type = strava2015simple$type), sum)

> stravaagg
Type Distance
1 Ride 1030393.1
2 Run 334759.2
3 Swim 50178.1

So easy! (I'm not going to trouble the Brownlee brothers with these figures!)
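For comparison, the same group-then-summarise idea can be sketched in plain Python 3 without any extra libraries. This is an illustration of the technique rather than a translation of R's aggregate, the name aggregate_by_type is my own, and the sample is just the three rows shown earlier, so the numbers won't match the full-year figures above.

```python
from collections import defaultdict


def aggregate_by_type(rows):
    """Group (type, distance) pairs by type and compute the mean and sum."""
    groups = defaultdict(list)
    for activity_type, distance in rows:
        groups[activity_type].append(distance)
    return {
        activity_type: {"mean": sum(d) / len(d), "sum": sum(d)}
        for activity_type, d in sorted(groups.items())
    }


# The three rows shown earlier (type, distance in metres)
rows = [("Swim", 1300.0), ("Ride", 4716.1), ("Ride", 4709.2)]

stats = aggregate_by_type(rows)
for activity_type, values in stats.items():
    print(activity_type, values["mean"], values["sum"])
```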