Monday, 4 January 2016

Using R to Analyse Fitbit API Data

I felt vaguely bad about my last post, which was about creating a machine to cheat a pedometer/step counter.  Hence I thought I'd make amends by analysing my own step data, obtained from the Fitbit API.

Previously I've written blog posts on accessing the Fitbit API and creating a Fitbit-based infographic.  For this post I wanted to explore using R to ease the process of analysing Fitbit data.  And wow!  Just wow!  Once you get your head around R it's so easy!

Why R, you may ask?  Simply because for this sort of analysis I usually end up manipulating data in Python or Excel and then graphing it in Excel.  I'd read that R has all sorts of capabilities to make this easier and so wanted to give it a go.

I obtained a day's worth of my own Fitbit API data for 2015-12-31 using the method I documented here.  The data looked like this (abridged):

...{"time":"00:30:00","value":0},{"time":"00:31:00","value":0},...

Basically it was one long JSON object, represented as a single string in a text file.

To do the analysis, first you need to download and install R.  I did this from here.

To load the JSON into R for analysis I simply did this from the command line:

> install.packages("jsonlite")
> library(jsonlite)
> stepdata <- fromJSON(file.choose(),flatten=TRUE)

So here I've installed a package called jsonlite that can be used to pull in JSON objects to manipulate in R.  When you install an R package for the first time you have to choose a site to download it from.  I simply chose my nearest site geographically.

The library(jsonlite) statement just makes the library available to use.

The stepdata <- fromJSON(file.choose(),flatten=TRUE) statement then pulls the data into R.  The file.choose() component of this causes the Windows file chooser form to be loaded, allowing you to choose a file.
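If you would rather not click through a file chooser every time, fromJSON() will also take a file path or a JSON string directly.  Here is a minimal sketch using a tiny hand-written string.  The sample values, the "activities-steps" key and the overall layout are assumptions pieced together from the output shown in this post, not a real API response.

```r
library(jsonlite)

# Hand-written sample (an assumption, not real API output)
sample_json <- '{
  "activities-steps": [{"dateTime": "2015-12-31", "value": "21446"}],
  "activities-steps-intraday": {
    "dataset": [
      {"time": "00:00:00", "value": 0},
      {"time": "00:01:00", "value": 7},
      {"time": "00:02:00", "value": 11}
    ]
  }
}'

# fromJSON() accepts a JSON string, a file path or a URL
stepdata <- fromJSON(sample_json, flatten = TRUE)

# List the top-level components, then peek at the overall structure
names(stepdata)
str(stepdata)
```

Swapping sample_json for a path such as "steps.json" (a hypothetical file name) should load a saved file in the same way.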

You can then look at the resulting R variable by doing this (output abridged):

> stepdata
    dateTime value
1 2015-12-31 21446

         time value
1    00:00:00     0
2    00:01:00     7
3    00:02:00    11
4    00:03:00     0
5    00:04:00     0
6    00:05:00     0

So each sub-component of the JSON string becomes an R sub-component that you can easily reference.  To see just the step data, without any of the other related data, you can do this (output abridged):

> juststepdata <- stepdata$`activities-steps-intraday`$dataset
> juststepdata
         time value
1    00:00:00     0
2    00:01:00     7
3    00:02:00    11
4    00:03:00     0
5    00:04:00     0
6    00:05:00     0

So now I had a variable (actually an R "data frame") that I could use for graphing.
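Before graphing, a data frame like this can be sanity-checked with a few base R calls.  The sketch below rebuilds just the first six rows shown above by hand (so the totals are for those six minutes only, not the whole day).  On the full data you would expect the minute values to add up to the daily total, though that is an assumption worth checking rather than a given.

```r
# First six rows from the output above, typed in by hand
juststepdata <- data.frame(
  time  = c("00:00:00", "00:01:00", "00:02:00",
            "00:03:00", "00:04:00", "00:05:00"),
  value = c(0, 7, 11, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Column types and a quick look at the top of the data
str(juststepdata)
head(juststepdata, 3)

# Total steps, minutes with any movement, and the busiest minute
total_steps    <- sum(juststepdata$value)
active_minutes <- sum(juststepdata$value > 0)
busiest_minute <- juststepdata[which.max(juststepdata$value), ]
```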

To produce a nice graph I first had to get the time fields into a proper date and time format that R would understand as such.  To do this I did (output abridged):

> juststepdata$DateAndTime <- paste("2015-12-31",juststepdata$time,sep=" ")
> juststepdata$TimePosix <- as.POSIXct(juststepdata$DateAndTime)
> juststepdata

         time value         DateAndTime           TimePosix
1    00:00:00     0 2015-12-31 00:00:00 2015-12-31 00:00:00
2    00:01:00     7 2015-12-31 00:01:00 2015-12-31 00:01:00
3    00:02:00    11 2015-12-31 00:02:00 2015-12-31 00:02:00
4    00:03:00     0 2015-12-31 00:03:00 2015-12-31 00:03:00
5    00:04:00     0 2015-12-31 00:04:00 2015-12-31 00:04:00
6    00:05:00     0 2015-12-31 00:05:00 2015-12-31 00:05:00

First I added a date and time string column, using the paste() function to concatenate a fixed date string onto each time string.  Then the as.POSIXct() function call turns the date + time string into a full POSIX-style date and time value that R can understand as such.
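The same conversion can be made a little more explicit by telling as.POSIXct() which format and time zone to use, rather than relying on the defaults.  This is only a sketch: the three sample times are made up, and "UTC" is an assumption chosen to keep the result the same on any machine.

```r
# Made-up sample of time strings in the same HH:MM:SS shape as the Fitbit data
times <- c("00:00:00", "00:01:00", "13:00:00")

# Glue a fixed date onto each time, as in the post
date_and_time <- paste("2015-12-31", times, sep = " ")

# Spell out the expected format and time zone (UTC is an assumption here)
time_posix <- as.POSIXct(date_and_time,
                         format = "%Y-%m-%d %H:%M:%S",
                         tz = "UTC")

# Real date-times can be reformatted and subtracted, unlike plain strings
format(time_posix, "%H:%M")
hours_apart <- as.numeric(difftime(time_posix[3], time_posix[1], units = "hours"))
```

The main benefit of a proper date-time column is that plotting functions can lay the points out on a true time axis instead of treating each time as a separate label.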

Then to graph simply do:

> install.packages("ggplot2")
> library(ggplot2)

These install and load the library.  Then, to draw the graph:

> graphval <- qplot(TimePosix, value, data=juststepdata)
> graphval + labs(title="Fitbit Step Data- 31/12/2015",x = "Time of Day",y = "Steps")

...which results in:

Simples!  I just love the way you can draw a decent looking graph with just two lines of R.  No mucking about with rows and columns like in Excel and no jiggery-pokery with axis label positions and the like.
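Newer ggplot2 releases flag qplot() as deprecated, so here is a rough equivalent written with ggplot() and geom_point().  It is a sketch on six hand-typed rows rather than the full day of data, and the "%H:%M" axis labels are a choice made for this example, not something from the original graph.

```r
library(ggplot2)

# Six hand-typed rows standing in for the full data frame
minutes <- 0:5
juststepdata <- data.frame(
  TimePosix = as.POSIXct("2015-12-31 00:00:00", tz = "UTC") + 60 * minutes,
  value     = c(0, 7, 11, 0, 0, 0)
)

# Scatter plot of steps per minute against time of day
graphval <- ggplot(juststepdata, aes(x = TimePosix, y = value)) +
  geom_point() +
  scale_x_datetime(date_labels = "%H:%M") +
  labs(title = "Fitbit Step Data- 31/12/2015",
       x = "Time of Day",
       y = "Steps")

# Draw the plot on the current graphics device
print(graphval)
```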

On the graph I observe:

  • Lots of 0s when I'm asleep and when I'm sitting around during the day.
  • A general "buzz" of ~25 steps per minute during the daytime.
  • An afternoon walk at ~1300 with a walking cadence of ~100 steps per minute, plus a few higher values when I ran.
  • A few extra peaks during the evening.  It was New Year's Eve and we spent the evening visiting different people's houses.
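
Observations like these can also be put into numbers with a little base R.  The sketch below uses a handful of made-up rows (not my real data), and the cut-off of 90 steps per minute for "walking" is an assumption picked for illustration.

```r
# Made-up rows: a quiet night, a brisk walk around 13:00, an evening stroll
juststepdata <- data.frame(
  time  = c("00:00:00", "00:01:00", "13:00:00",
            "13:01:00", "13:02:00", "20:15:00"),
  value = c(0, 7, 102, 98, 130, 25),
  stringsAsFactors = FALSE
)

# The hour is just the first two characters of the HH:MM:SS string
juststepdata$hour <- substr(juststepdata$time, 1, 2)

# Total steps per hour
hourly <- tapply(juststepdata$value, juststepdata$hour, sum)

# Minutes with no steps at all
idle_minutes <- sum(juststepdata$value == 0)

# Minutes at or above an assumed walking cadence of 90 steps per minute
walking <- juststepdata[juststepdata$value >= 90, ]
```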

I'm sure there are more interesting ways to represent the data and get more insight.  Something for a future post perhaps...