Paul's Geek Dad Blog: Strava API Analysis Using R

In a couple of previous posts I've covered how I have used the Strava API to analyse exercise data I've captured with my Garmin Forerunner 910 watch.

Analysing this data has always been a bit laborious, but my newfound discovery of R (see here for how I used it for Fitbit data) means it's now easy.

Getting data from the Strava API is pretty easy: you just register, get a key and then use simple HTTP GET requests to fetch your data (the first link above shows how I did it). That means no HTTP POST, no forming a payload, no SHA-256 hashes or anything like that.

This makes it easy to import the data into R using the jsonlite package. Here's an example (assuming you've installed jsonlite):

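library(jsonlite)
# after=1420070400 is 2015-01-01 00:00:00 UTC; flatten = TRUE turns nested
# objects such as athlete and map into columns like athlete.id and map.id
stravadata <- fromJSON('https://www.strava.com/api/v3/activities?access_token=<your key here>&per_page=200&after=1420070400', flatten = TRUE)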
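If you'd rather not hard-code the after value, here's a minimal base R sketch that derives it and assembles the same query string. The endpoint and parameters are the ones from the call above; the token value YOUR_TOKEN and the variable names are hypothetical. The after parameter is a Unix timestamp in seconds, and 1420070400 is midnight UTC on 1 January 2015.

```r
# Sketch only: rebuild the activities URL used above in base R.
# "YOUR_TOKEN" is a hypothetical placeholder for your own Strava key.
access_token <- "YOUR_TOKEN"

# The 'after' parameter takes a Unix timestamp (seconds since
# 1970-01-01 UTC); midnight UTC on 1 January 2015 is 1420070400.
after_2015 <- as.integer(as.POSIXct("2015-01-01 00:00:00", tz = "UTC"))

strava_url <- paste0(
  "https://www.strava.com/api/v3/activities",
  "?access_token=", access_token,
  "&per_page=200",
  "&after=", after_2015
)

print(after_2015)
print(strava_url)
```

Once YOUR_TOKEN is replaced with a real key, passing strava_url to fromJSON(strava_url, flatten = TRUE) should give the same result as the hard-coded call, and changing the date string is enough to start from a different day.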

This then yields an R data frame that you can manipulate. First, have a quick look at its first row:

> stravadata[c(1),]
         id resource_state external_id upload_id               name distance moving_time elapsed_time total_elevation_gain type           start_date
1 236833349              2        <NA>        NA First swim of 2015     1300        2700         2700                    0 Swim 2015-01-04T20:45:00Z
      start_date_local                  timezone start_latlng end_latlng location_city location_state location_country start_latitude start_longitude
1 2015-01-04T20:45:00Z (GMT+00:00) Europe/London         NULL       NULL          <NA>           <NA>   United Kingdom             NA              NA
  achievement_count kudos_count comment_count athlete_count photo_count trainer commute manual private flagged gear_id average_speed max_speed total_photo_count
1                 0           0             0             1           0   FALSE   FALSE   TRUE   FALSE   FALSE    <NA>         0.481         0                 0
  has_kudoed average_cadence average_watts device_watts average_heartrate max_heartrate elev_high elev_low workout_type kilojoules athlete.id athlete.resource_state
1      FALSE              NA            NA           NA                NA            NA        NA       NA           NA         NA    4309532                      1
      map.id map.summary_polyline map.resource_state
1 a236833349                 <NA>                  2

This gives you a good idea of interesting fields to further analyse:

name = column 5
distance = column 6 (in metres)
type = column 10

...which lets you look at the data in a more focused way, for example the first row and just the three columns listed above:

> stravadata[c(1),c(5,6,10)]
name distance type
1 First swim of 2015 1300 Swim
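Counting columns by eye is easy to get wrong, and the numbers would shift if the API added or reordered fields, so a slightly more robust option is to look the positions up by name, or skip them altogether. This is a sketch on a hypothetical one-row data frame that mimics the first ten columns shown above; with the real stravadata the same calls should work unchanged.

```r
# Hypothetical one-row stand-in for stravadata: the first ten columns,
# in the same order as the Strava output above, with the same values.
sample_acts <- data.frame(
  id = 236833349, resource_state = 2L, external_id = NA_character_,
  upload_id = NA, name = "First swim of 2015", distance = 1300,
  moving_time = 2700L, elapsed_time = 2700L, total_elevation_gain = 0,
  type = "Swim", stringsAsFactors = FALSE
)

# Find the column positions from the names rather than counting by eye
wanted <- c("name", "distance", "type")
positions <- match(wanted, names(sample_acts))
print(positions)

# Or index by name directly, which keeps working if columns move
print(sample_acts[1, wanted])
```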

Before going much further, I needed to filter the results down to 2015 only, as my Strava API call would also have included everything from 2016 to date. Do this by:

# Keep only rows whose start_date (in UTC) contains "2015-", i.e. activities from 2015
strava2015 <- stravadata[grep("2015-", stravadata$start_date), ]

...which yields this (just the first 3 rows shown):

> strava2015[c(1:3),c(5,6)]
                name distance
1 First swim of 2015   1300.0
2      HIIT 20150106   4716.1
3      HIIT 20140108   4709.2
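grep() works here because start_date always begins with the year, but anchoring the pattern, or parsing the timestamp into a proper date-time, makes the intent explicit and leaves room for other date logic later. A minimal base R sketch on a few made-up rows (the 2016 activity and the times are hypothetical):

```r
# Made-up sample rows; start_date uses the same ISO 8601 UTC format as the
# Strava output above. The 2016 activity is hypothetical.
acts <- data.frame(
  name = c("First swim of 2015", "HIIT 20150106", "New Year ride"),
  start_date = c("2015-01-04T20:45:00Z", "2015-01-06T07:00:00Z",
                 "2016-01-02T10:00:00Z"),
  stringsAsFactors = FALSE
)

# Option 1: anchor the pattern so "2015-" must appear at the very start
by_prefix <- acts[grepl("^2015-", acts$start_date), ]

# Option 2: parse the timestamps (T and Z are matched literally) and test the year
started <- as.POSIXct(acts$start_date, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
by_date <- acts[format(started, "%Y") == "2015", ]

print(by_prefix$name)
print(identical(by_prefix, by_date))
```

If you'd rather split years by local time than by UTC, the same approach should work on the start_date_local column instead.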

Then picking out just the type and the distance:

> strava2015simple <- strava2015[,c(10,6)]

...and looking at the first 3 rows of this:

> strava2015simple[c(1:3),]
  type distance
1 Swim   1300.0
2 Ride   4716.1
3 Ride   4709.2

That makes it very easy to compute some aggregate distance stats for 2015.

First, averages:

> stravaagg <- aggregate(list(Distance = strava2015simple$distance), list(Type = strava2015simple$type), mean)
> stravaagg
  Type  Distance
1 Ride 17765.398
2  Run  5487.856
3 Swim  1067.619

...then totals:

> stravaagg <- aggregate(list(Distance = strava2015simple$distance), list(Type = strava2015simple$type), sum)

> stravaagg
  Type  Distance
1 Ride 1030393.1
2  Run  334759.2
3 Swim   50178.1

So easy! (I'm not going to trouble the Brownlee brothers with these figures!)
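If you want counts, averages and totals side by side, and in kilometres rather than metres, here's a minimal sketch using the formula interface of aggregate(). The five rows are made-up sample data in the same shape as strava2015simple (type plus distance in metres); swapping in the real data frame should give the real figures.

```r
# Made-up sample in the same shape as strava2015simple (distance in metres)
sample_simple <- data.frame(
  type = c("Swim", "Ride", "Ride", "Run", "Run"),
  distance = c(1300.0, 4716.1, 4709.2, 5000.0, 6000.0),
  stringsAsFactors = FALSE
)

# Formula interface: one call per statistic, grouped (and sorted) by type
means  <- aggregate(distance ~ type, data = sample_simple, FUN = mean)
totals <- aggregate(distance ~ type, data = sample_simple, FUN = sum)
counts <- aggregate(distance ~ type, data = sample_simple, FUN = length)

# Combine into one summary and convert metres to kilometres
summary_km <- data.frame(
  type = totals$type,
  activities = counts$distance,
  mean_km = round(means$distance / 1000, 2),
  total_km = round(totals$distance / 1000, 1),
  stringsAsFactors = FALSE
)
print(summary_km)
```

Applied to the real 2015 totals above, dividing by 1000 gives roughly 1,030 km of riding, 335 km of running and 50 km of swimming, and dividing each total by its average implies 58 rides, 61 runs and 47 swims.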

Thursday, 7 January 2016