Sunday, 31 January 2016

Fitbit API Data Analysis Using Raspberry Pi, Python and R

I'm still feeling vaguely bad about the machine I made to cheat on pedometer step counting, so I felt I had to pay penance by doing more Fitbit API data analysis.

I wanted to try and find some more interesting ways to visualise the data I get from the Fitbit API.  My inspiration was “Information is Beautiful”, a book I bought from a well-known South American river book company just before Christmas.  Apart from my foray into creating a Sleep infographic I’ve always been a bit conservative in terms of how I visualise data, relying on bog-standard, boring bar charts and scatter graphs.  “Information is Beautiful” has many and various infographics that make analysing data accessible, intuitive and just, well, beautiful!  That was my inspiration.  Here’s the journey I went on…

Here's what I produced.  I'll then tell you how I did it!


I've had my Fitbit Charge HR for just over a year now so I thought I'd "celebrate" by analysing a whole year's worth of data from the Fitbit API!  To do this I used the OAuth 2.0 method I wrote about here.

To get a year's worth of data I simply had to use the following URL for the API call:

https://api.fitbit.com/1/user/-/activities/steps/date/2016-01-31/1y.json
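The URL above breaks down into a resource (activities/steps), an end date and a period.  As a rough sketch only, here is how those pieces could be put together in Python.  The helper name steps_url is made up for illustration and is not part of my script or of any Fitbit library; it just reproduces the pattern of the URL shown above.

```python
from datetime import date


def steps_url(end_date: date, period: str = "1y") -> str:
    """Assemble the step time-series URL in the same shape as the one above.

    Hypothetical helper: it only does string formatting and makes no request.
    """
    return (
        "https://api.fitbit.com/1/user/-/activities/steps/date/"
        f"{end_date.isoformat()}/{period}.json"
    )


url = steps_url(date(2016, 1, 31))
print(url)
```

Keeping the date as a proper date object rather than a string means the same helper can be reused for any end date without worrying about formatting.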

So this is asking for my step data (activities/steps) for the one year period up to and including 2016-01-31.  The command I ran was:

sudo python fitbit_oauth_request_v1.py > 2016-01-31.json

...meaning the output was redirected to the file 2016-01-31.json.  The content of the file looked like this (after trimming off some initial text that came from the print statements in the Python script):

more 2016-01-31.json
{"activities-steps":[{"dateTime":"2015-02-01","value":"21803"},{"dateTime":"2015-02-02","value":"7324"},
{"dateTime":"2015-02-03","value":"10293"},{"dateTime":"2015-02-04","value":"12714"},
{"dateTime":"2015-02-05","value":"10383"},{"dateTime":"2015-02-06","value":"11496"},
{"dateTime":"2015-02-07","value":"17795"},{"dateTime":"2015-02-08","value":"19735"},
{"dateTime":"2015-02-09","value":"10808"},{"dateTime":"2015-02-10","value":"8897"},
{"dateTime":"2015-02-11","value":"10106"},{"dateTime":"2015-02-12","value":"9779"},
{"dateTime":"2015-02-13","value":"9850"},{"dateTime":"2015-02-14","value":"12108"},
{"dateTime":"2015-02-15","value":"27393"},{"dateTime":"2015-02-16","value":"12992"}

So a simple JSON structure that has one element per day of the year with a simple step count in it.  I then transferred the JSON file to my PC to process it with R.
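For anyone who would rather stay on the Pi, the same file can be read with Python's standard json module.  This is only a sketch: the function name load_steps is made up, the leading line in the sample is a stand-in for whatever the print statements produced, and it assumes that leading text contains no "{" character.

```python
import json


def load_steps(raw: str) -> list:
    """Return the list of per-day records from the redirected script output."""
    # Skip anything printed before the JSON object starts.
    # Assumption: the leading text does not itself contain a "{".
    start = raw.index("{")
    payload = json.loads(raw[start:])
    return payload["activities-steps"]


# Stand-in for the file content; the first line is placeholder text.
sample = (
    "placeholder for text from print statements\n"
    '{"activities-steps":[{"dateTime":"2015-02-01","value":"21803"},'
    '{"dateTime":"2015-02-02","value":"7324"}]}'
)

days = load_steps(sample)
for day in days:
    print(day["dateTime"], day["value"])
```

Note that both the date and the step count come back as strings, exactly as they appear in the JSON.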

I loaded up the JSON structure in R using:

> library(jsonlite)
> stepdata2015 <- fromJSON(file.choose(),flatten=TRUE)

Where file.choose() means the Windows file chooser form is opened to allow you to select the JSON file.  The data looked like this (abridged):

> stepdata2015
$`activities-steps`
      dateTime value
1  2015-02-01 21803
2  2015-02-02  7324
3  2015-02-03 10293
4  2015-02-04 12714
5  2015-02-05 10383

Looking at the type of data I saw:
> stepdata2015[0]
named list()

So not the "data frame" I've worked with in the past but a list, with the data frame sitting inside it as the "activities-steps" element.  This was reflected in the fact that I couldn't manipulate the data in the way I'd done before.  So I turned it into a data frame by doing this:

> stepdata2015_df <- as.data.frame(stepdata2015)

...which made the data look like this (abridged):

> stepdata2015_df
   activities.steps.dateTime activities.steps.value
1                 2015-02-01                  21803
2                 2015-02-02                   7324
3                 2015-02-03                  10293
4                 2015-02-04                  12714
5                 2015-02-05                  10383

Then I graphed the data using these commands:
> library(ggplot2)
> graphval <- qplot(activities.steps.dateTime, activities.steps.value, data=stepdata2015_df)
> graphval + labs(title="Fitbit Step Data - 2015",x = "Day",y = "Steps")

...which yielded this graph:


This is definitely a graph but every single X value and Y value has a corresponding axis label, most likely because they're both considered to be text fields.  To turn the X axis values into date/time values I did:

> stepdata2015_df$TimePosix <- as.POSIXct(stepdata2015_df$activities.steps.dateTime)

Then to turn the Y axis values into numbers I did:

> stepdata2015_df$StepsInt <- as.integer(stepdata2015_df$activities.steps.value)

...yielding:

> stepdata2015_df
    activities.steps.dateTime activities.steps.value  TimePosix StepsInt
1                  2015-02-01                  21803 2015-02-01    21803
2                  2015-02-02                   7324 2015-02-02     7324
3                  2015-02-03                  10293 2015-02-03    10293
4                  2015-02-04                  12714 2015-02-04    12714
5                  2015-02-05                  10383 2015-02-05    10383
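The same text-versus-number problem shows up in any language.  As a small Python sketch (the variable names are mine, the values are the first three days from the data above), this converts the strings into real dates and integers and shows why leaving them as text goes wrong:

```python
from datetime import date

# First three records as they arrive from the JSON: everything is a string.
rows = [
    {"dateTime": "2015-02-01", "value": "21803"},
    {"dateTime": "2015-02-02", "value": "7324"},
    {"dateTime": "2015-02-03", "value": "10293"},
]

# Compared as text, "7324" beats "21803" because "7" sorts after "2".
largest_as_text = max(r["value"] for r in rows)

# Convert each record to a (date, int) pair so comparisons behave numerically.
typed = [(date.fromisoformat(r["dateTime"]), int(r["value"])) for r in rows]
largest_as_number = max(steps for _, steps in typed)

print(largest_as_text, largest_as_number)
```

Once the values are proper dates and numbers, a plotting library can treat the axes as continuous scales rather than a pile of labels.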

Which means a much nicer looking graph which understands the X axis as a date and the Y axis as a number and intelligently provides fewer labels:


A nicer graph, but really just a random collection of points to my eye.  A bit of reading showed me you could add a smoothed trendline to the chart by using a "geom" parameter and doing this:

> graphval <- qplot(TimePosix, StepsInt, data=stepdata2015_df,geom = c("point", "smooth"))
> graphval + labs(title="Fitbit Step Data - 2015",x = "Day",y = "Steps")

Yielding:



...which actually tells the story of my year quite nicely and shows how my step totals are really influenced by how much running I do.  I started 2015 doing a little bit of running, did lots of running up to May/June, then cut back over the summer as I got injured and then did more towards the end of the year and into 2016 as I came back from injury.  In fact, I've been really careful coming back from injury, increasing my weekly km by no more than 10%, and this is reflected in the gradient of the trendline.
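To get a feel for what a trendline does to noisy daily counts, here is a sketch of a plain centred moving average in Python.  To be clear, this is not the smoother that ggplot2 fits; it is a much simpler stand-in, and the function name and window size are my own choices.  The input is the first seven days of the data above.

```python
def moving_average(values, window=7):
    """Centred moving average; the window shrinks at the two ends of the list."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


steps = [21803, 7324, 10293, 12714, 10383, 11496, 17795]
smoothed = moving_average(steps, window=3)
print(smoothed)
```

A wider window gives a flatter line that follows the long-term trend, while a narrow one stays closer to the individual days.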

I then decided the data needed aggregating into monthly totals and so did this:

> stepdata_2015_agg_sum <- aggregate(list(Steps = stepdata2015_df$StepsInt), list(month = cut(stepdata2015_df$TimePosix, "month")), sum)

Yielding (abridged):

> stepdata_2015_agg_sum
        month  Steps
1  2015-02-01 350767
2  2015-03-01 385209
3  2015-04-01 385578
4  2015-05-01 477423
5  2015-06-01 391484
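The aggregate/cut combination above labels every day with the first day of its month and then sums within each label.  A rough Python equivalent, assuming the daily data is already a list of (date, steps) pairs, might look like this.  The function name monthly_totals is made up, and the sample only holds the first five days of February from the data above, so it produces a single bucket rather than a full year.

```python
from collections import defaultdict
from datetime import date


def monthly_totals(daily):
    """Sum step counts per month, keyed by the first day of that month."""
    totals = defaultdict(int)
    for day, steps in daily:
        totals[day.replace(day=1)] += steps
    # Return a plain dict ordered by month.
    return dict(sorted(totals.items()))


daily = [
    (date(2015, 2, 1), 21803),
    (date(2015, 2, 2), 7324),
    (date(2015, 2, 3), 10293),
    (date(2015, 2, 4), 12714),
    (date(2015, 2, 5), 10383),
]

for month, total in monthly_totals(daily).items():
    print(month.isoformat(), total)
```

Keying on the first day of the month mirrors the month column in the R output, which makes the two results easy to compare side by side.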

I also decided to create my own infographic to visualise my month-on-month step count.

To calculate how many footsteps I needed to show on my visualisation I added some summaries:

> stepdata_2015_agg_sum$tenthoublocks <- stepdata_2015_agg_sum$Steps / 10000
> stepdata_2015_agg_sum$footsteps <- round(stepdata_2015_agg_sum$tenthoublocks, digits=0)

...yielding (abridged):

> stepdata_2015_agg_sum
        month  Steps tenthoublocks footsteps
1  2015-02-01 350767       35.0767        35
2  2015-03-01 385209       38.5209        39
3  2015-04-01 385578       38.5578        39
4  2015-05-01 477423       47.7423        48
5  2015-06-01 391484       39.1484        39
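The footsteps arithmetic is simple enough to sketch in Python too, along with a crude text-only pictograph.  The monthly totals are the five shown above; the asterisk standing in for a foot and the variable names are just my choices for illustration.

```python
# Monthly step totals taken from the table above.
monthly_steps = {
    "2015-02": 350767,
    "2015-03": 385209,
    "2015-04": 385578,
    "2015-05": 477423,
    "2015-06": 391484,
}

STEPS_PER_FOOT = 10000

# One footstep symbol per 10,000 steps, rounded to the nearest whole foot.
footsteps = {
    month: round(total / STEPS_PER_FOOT)
    for month, total in monthly_steps.items()
}

for month, feet in footsteps.items():
    print(f"{month} {'*' * feet} ({feet})")
```

Each printed row has one asterisk per 10,000 steps, so the row lengths can be compared by eye in the same way as the feet on the chart.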

I then opened the data in Excel to graph it (or create a pictograph to use the proper lingo).  Using this website to tell me how to create charts with images instead of boring bars I came up with the chart below.  Each foot represents 10,000 steps:



I then thought I'd create my own!  Each step on the “path” below represents 10,000 steps and I did it by manually copying, pasting and formatting in Excel:


Notwithstanding that months are of different lengths, the infographic does nicely tally with my 2015 running profile of running a bit (Feb to April), running a lot (May - too much really), getting injured (June to October) and getting back into running (October to Jan).  It’s not the most beautiful infographic in the world and Mrs Geek thinks the footsteps look like butterflies but I’m happy with it!!

I think the standard Excel-generated one was just fine!