Thursday, 7 January 2016

Strava API Analysis Using R

In a couple of previous posts I've covered how I used the Strava API to analyse exercise data captured with my Garmin Forerunner 910 watch.

Analysing this data has always been a bit laborious, but my recent discovery of R (see here for how I used it for Fitbit data) means it's now easy.

Getting data from the Strava API is pretty easy: you just register, get a key, and then use simple HTTP GET requests to fetch your data (the first link above shows how I did it).  So no HTTP POST, no forming a payload, no SHA-256 hashes or anything.

This makes it easy to import into R using the jsonlite library.  Here's an example (assuming you've installed jsonlite):

library(jsonlite)
stravadata <- fromJSON('<your key here>&per_page=200&after=1420070400', flatten=TRUE)
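
The after value in the request above is a Unix epoch timestamp, and 1420070400 corresponds to midnight UTC on 1 January 2015. As a minimal sketch (the variable names here are just illustrative), you could derive such a value in base R rather than looking it up:

```r
# Convert a calendar date to the epoch seconds used by the `after` parameter.
start_2015 <- as.POSIXct("2015-01-01 00:00:00", tz = "UTC")
after_epoch <- as.numeric(start_2015)
after_epoch
# [1] 1420070400

# Build the query-string fragment; format() avoids scientific notation.
query <- paste0("per_page=200&after=", format(after_epoch, scientific = FALSE))
query
# [1] "per_page=200&after=1420070400"
```

Changing the date string should give the matching value for any other start point.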

This yields an R data frame that you can manipulate.  First, have a quick look at the first row of the data frame:

> stravadata[c(1),]
         id resource_state external_id upload_id               name distance moving_time elapsed_time total_elevation_gain type           start_date
1 236833349              2        <NA>        NA First swim of 2015     1300        2700         2700                    0 Swim 2015-01-04T20:45:00Z
      start_date_local                  timezone start_latlng end_latlng location_city location_state location_country start_latitude start_longitude
1 2015-01-04T20:45:00Z (GMT+00:00) Europe/London         NULL       NULL          <NA>           <NA>   United Kingdom             NA              NA
  achievement_count kudos_count comment_count athlete_count photo_count trainer commute manual private flagged gear_id average_speed max_speed total_photo_count
1                 0           0             0             1           0   FALSE   FALSE   TRUE   FALSE   FALSE    <NA>         0.481         0                 0
  has_kudoed average_cadence average_watts device_watts average_heartrate max_heartrate elev_high elev_low workout_type kilojoules athlete.id athlete.resource_state
1      FALSE              NA            NA           NA                NA            NA        NA       NA           NA         NA    4309532                      1
      map.id map.summary_polyline map.resource_state
1 a236833349                 <NA>                  2

This gives you a good idea of interesting fields to further analyse:

  • name = column 5
  • distance = column 6
  • type = column 10

...which lets you look at the data in a more refined format. So, the first row and the three columns listed above:

> stravadata[c(1),c(5,6,10)]
                name distance type
1 First swim of 2015     1300 Swim
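
Counting column positions by eye is a bit fragile, since the API could add or reorder fields. A hedged alternative is to select by column name instead. The sketch below uses a tiny toy data frame standing in for the real stravadata (only four columns, so the positions differ from those above):

```r
# Toy stand-in for the data frame returned by fromJSON();
# values are copied from the first row shown above.
stravadata <- data.frame(
  id = 236833349,
  name = "First swim of 2015",
  distance = 1300,
  type = "Swim",
  stringsAsFactors = FALSE
)

# Look up column positions by name rather than counting them.
match(c("name", "distance", "type"), names(stravadata))

# Select the first row and the three columns directly by name.
first_row <- stravadata[1, c("name", "distance", "type")]
first_row
```

On the real data frame the same name-based selection should return the row shown above without needing to know that the columns are 5, 6 and 10.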

Before going much further I needed to filter the results to just those for 2015, as my Strava API call only sets a start date and so would also have included everything from 2016 to date.  Do this by:

strava2015 <- stravadata[grep("2015-", stravadata$start_date), ]

...which yields this (just the first 3 rows shown):

> strava2015[c(1:3),c(5,6)]
                name distance
1 First swim of 2015   1300.0
2      HIIT 20150106   4716.1
3      HIIT 20140108   4709.2
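
The grep approach works because start_date is an ISO 8601 string. If you'd rather filter on real dates, here is a minimal sketch that parses the timestamps first. It uses toy data: the first row comes from the output above, the other two rows are made up for illustration.

```r
# Toy data in the same start_date format as the API output.
stravadata <- data.frame(
  name = c("First swim of 2015", "Made-up 2015 ride", "Made-up 2016 ride"),
  start_date = c("2015-01-04T20:45:00Z", "2015-06-01T18:30:00Z", "2016-01-02T09:00:00Z"),
  stringsAsFactors = FALSE
)

# Parse the timestamps as UTC, then keep rows whose year is 2015.
parsed <- as.POSIXct(stravadata$start_date, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
strava2015 <- stravadata[format(parsed, "%Y") == "2015", ]
nrow(strava2015)
```

Another option, which I haven't used here, would be to cap the request itself: the activities endpoint also takes a before parameter, so adding before=1451606400 (midnight UTC on 1 January 2016) to the query should avoid fetching the 2016 rows at all.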

Then picking out just the type and the distance:

> strava2015simple <- strava2015[,c(10,6)]

...and looking at the first 3 rows of this:

> strava2015simple[c(1:3),]
  type distance
1 Swim   1300.0
2 Ride   4716.1
3 Ride   4709.2

This makes it very easy to compute some aggregate distance stats for 2015 (Strava reports distances in metres).

First, averages:

> stravaagg <- aggregate(list(Distance = strava2015simple$distance), list(Type = strava2015simple$type), mean)
> stravaagg
  Type  Distance
1 Ride 17765.398
2  Run  5487.856
3 Swim  1067.619

...then totals:

> stravaagg <- aggregate(list(Distance = strava2015simple$distance), list(Type = strava2015simple$type), sum)

> stravaagg
  Type  Distance
1 Ride 1030393.1
2  Run  334759.2
3 Swim   50178.1
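
If you want the counts, averages and totals side by side, and in kilometres rather than metres, something like the following sketch could work. It uses aggregate's formula interface on a toy data frame; the distances are made up and are not my real figures, and stravasummary is just an illustrative name.

```r
# Toy distances in metres (illustrative values only).
strava2015simple <- data.frame(
  type = c("Ride", "Ride", "Run", "Swim", "Swim"),
  distance = c(4716.1, 4709.2, 5000, 1300, 1000),
  stringsAsFactors = FALSE
)

# One aggregate() call per statistic; each returns groups in the same sorted order.
means  <- aggregate(distance ~ type, data = strava2015simple, FUN = mean)
totals <- aggregate(distance ~ type, data = strava2015simple, FUN = sum)
counts <- aggregate(distance ~ type, data = strava2015simple, FUN = length)

# Combine into one table, converting metres to kilometres.
stravasummary <- data.frame(
  Type = means$type,
  Count = counts$distance,
  MeanKm = means$distance / 1000,
  TotalKm = totals$distance / 1000
)
stravasummary
```

Run against the real strava2015simple, this should reproduce the averages and totals above divided by 1000, plus the number of activities of each type.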

So easy!  (I'm not going to trouble the Brownlee brothers with these figures!)