16th September 2016
This blog post is based on some work we did a while back to support the International Health Analytics Capability. The challenge was initially to investigate the impact of socio-economic status on depression and to investigate examples of health inequality. This is a particular area of interest for the Government here in Northern Ireland for whom we produced a report on this.
In order to investigate this, we had to find some suitable datasets, and it just so happens that we have a hugely rich dataset here on GP prescribing. This dataset can be obtained from Open Data NI.
As you can see from the screenshot, this is essentially a CSV file that contains all of the GP prescribing info linked to a specific practice ID. Fantastic data but limited in what it can really tell us. If we do a bit of searching around, we can find a means of linking this data to the actual GP practices themselves with the address and postcode available. Obviously this brings us one step further to making this data useful as we can associate the practice to a geographic location. GP practices here have a limited catchment area so although there will be some outliers and issues, the level of aggregation we are looking at should negate that.
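The linking step described above is essentially a join on the practice ID. Here is a minimal sketch of that idea in Python. The column names (practice_id, drug_name, items, postcode) and the sample rows are made up for illustration and are not the real Open Data NI schema.

```python
import csv
import io

# Toy stand-ins for the two files. Column names and values are invented
# for illustration; the real files will look different.
PRESCRIBING_CSV = """practice_id,drug_name,items
1,Venlafaxine,120
1,Fluoxetine,80
2,Mirtazapine,95
3,Venlafaxine,40
"""

PRACTICES_CSV = """practice_id,practice_name,postcode
1,Example Surgery A,BT1 1AA
2,Example Surgery B,BT48 6AA
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def link_to_practices(prescribing_rows, practice_rows):
    """Attach each practice's postcode to its prescribing rows.

    Returns (linked, unmatched) so that rows with no known practice
    are kept visible rather than silently dropped.
    """
    by_id = {p["practice_id"]: p for p in practice_rows}
    linked, unmatched = [], []
    for row in prescribing_rows:
        practice = by_id.get(row["practice_id"])
        if practice is None:
            unmatched.append(row)
            continue
        linked.append({**row, "postcode": practice["postcode"]})
    return linked, unmatched


linked, unmatched = link_to_practices(
    load_rows(PRESCRIBING_CSV), load_rows(PRACTICES_CSV)
)
```

Keeping the unmatched rows separate makes it easy to check how much prescribing activity could not be tied to a location before trusting any map built on top of it.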
To aggregate the data, we looked at a few different options and eventually settled on electoral areas as a suitable level of granularity. This splits Northern Ireland, a relatively small place, into 18 regions. These regions are often similar in socio-economic status and, of course, when producing a report for government, the electoral regions are what they will be interested in.
In order to delve into the socio-economic aspect, we pulled in census data on health status and deprivation measures as well as unemployment statistics from the Northern Ireland Statistics and Research Agency (NISRA). We were also able to obtain data from NISRA about population levels for normalisation.
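Normalising by population is what makes regions of different sizes comparable. A rough sketch of that step, assuming we already have total prescription items and population per region (the region names and numbers below are invented, and "items per 1,000 people" is just one reasonable choice of rate):

```python
def rate_per_1000(items_by_region, population_by_region):
    """Convert raw prescription counts into items per 1,000 population.

    Regions with a missing or zero population are skipped, since no
    meaningful rate can be computed for them.
    """
    rates = {}
    for region, items in items_by_region.items():
        population = population_by_region.get(region)
        if not population:
            continue
        rates[region] = 1000 * items / population
    return rates


# Invented example values, not real NISRA figures.
items = {"Region A": 5000, "Region B": 3000, "Region C": 100}
population = {"Region A": 100000, "Region B": 50000}

rates = rate_per_1000(items, population)
```

Note that Region B has fewer items than Region A in absolute terms but a higher rate once population is taken into account, which is exactly why the raw counts are not much use on their own.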
Finally, we used MapIt to obtain GeoJSON files for the mapping of the data. So what we ended up with is a number of datasets from 4 different locations in a number of different formats. In our last blog we talked about the right datastore for the right data and data virtualisation. This was a prime example of doing that where all the data is stored in Analytics Engines XDP™ and pulled together using our unified data view.
For the initial exploratory analysis and experimentation, we simply plugged in Tableau, a popular BI tool. This enabled us to explore the data and look for trends. With the datasets presented via our UnifiedDataView, the integration is straightforward and lets the user just get to work with the data rather than spending the fabled 80% of their time “data wrangling”. One of our engineers also created a nice D3 visualisation for me that let me create lists of drugs, link them to conditions, map the incidence geographically and integrate it with the socio-economic data. This was what I used to explore the data, and while the figures below are static screengrabs, in D3 this is of course interactive.
For the rest of this post I will focus on answering my initial question regarding the link between mental health and socio-economic status. There were a lot of other interesting insights I gained from having the flexibility of just being able to play with the data and by using some statistical analysis, but that will perhaps be the subject of a future post.
To examine the original question, I first created a list of the antidepressants and anti-anxiety medications commonly prescribed here. I actually found it a little curious that venlafaxine and mirtazapine are the most commonly prescribed antidepressants, rather than a selective serotonin reuptake inhibitor (SSRI) like fluoxetine or paroxetine. Venlafaxine is a serotonin-norepinephrine reuptake inhibitor (SNRI); mirtazapine also acts on both serotonin and norepinephrine, although it is usually classed separately as a NaSSA. SNRIs are generally considered more effective, but SSRIs generally have fewer side effects. The general consensus is that SSRIs are more commonly prescribed; however, based on this data, it seems this may not actually be the case here. In fact, the NHS information on antidepressants states that SSRIs are the most commonly used. I’m going to have to restrain myself from going off on tangents here, although it’s so easy to do with such a rich and easily explored dataset.
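To give a feel for how a drug list like this can be applied to the prescribing rows, here is a small sketch. The four drug names come from the paragraph above; the row layout (drug_name, items) and the sample rows are assumptions for illustration. Real prescribing data usually includes strength and form in the name, so exact matching on the bare drug name is a simplification.

```python
from collections import Counter

# Drug names mentioned above; a real list would be longer.
ANTIDEPRESSANTS = {"venlafaxine", "mirtazapine", "fluoxetine", "paroxetine"}


def items_by_drug(rows, drug_list):
    """Total the prescription items for each drug on the list.

    Names are compared case-insensitively with surrounding whitespace
    removed; anything not on the list is ignored.
    """
    totals = Counter()
    for row in rows:
        name = row["drug_name"].strip().lower()
        if name in drug_list:
            totals[name] += int(row["items"])
    return totals


# Invented rows for illustration only.
rows = [
    {"drug_name": "Venlafaxine", "items": "10"},
    {"drug_name": " venlafaxine ", "items": "5"},
    {"drug_name": "Fluoxetine", "items": "7"},
    {"drug_name": "Amoxicillin", "items": "20"},
]

totals = items_by_drug(rows, ANTIDEPRESSANTS)
ranking = totals.most_common()  # most prescribed first
```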
The next figure shows a map of Northern Ireland and a timeline which shows the frequency of prescription in the different electoral regions and over the period of time for which we have this dataset. The data we have dates back to April 2014 and runs up to this month as the HSC now releases this data monthly.
Our next step was to correlate this data with some socio-economic factors. The first we look at is the unemployment rate. A fairly strong trend line can be seen here, indicating that antidepressant usage correlates strongly with the percentage unemployment in the region. There is a notable outlier in the data: Foyle, in the north-west of the province, has the highest unemployment but a surprisingly low use of antidepressants.
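For anyone who wants to reproduce this kind of comparison outside a BI tool, the trend line and correlation can be computed by hand. The sketch below uses an ordinary least-squares line and Pearson's r, which is an assumption on my part about what sits behind the chart, and the regions and numbers are invented rather than taken from our data.

```python
from math import sqrt


def _centred_sums(xs, ys):
    """Return mean_x, mean_y and the centred sums sxx, syy, sxy."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return mean_x, mean_y, sxx, syy, sxy


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    _, _, sxx, syy, sxy = _centred_sums(xs, ys)
    return sxy / sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares trend line; returns (slope, intercept)."""
    mean_x, mean_y, sxx, _, sxy = _centred_sums(xs, ys)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented values: unemployment (%) and prescription rate per region.
regions = ["Region A", "Region B", "Region C", "Region D", "Region E", "Region F"]
unemployment = [2, 3, 4, 5, 6, 9]
rx_rate = [20, 30, 40, 50, 60, 25]

r = pearson_r(unemployment, rx_rate)
slope, intercept = fit_line(unemployment, rx_rate)

# Residual = observed rate minus the rate the trend line predicts.
# A large negative residual means lower usage than unemployment suggests.
residuals = {
    name: y - (slope * x + intercept)
    for name, x, y in zip(regions, unemployment, rx_rate)
}
```

In this toy data Region F plays the part of the outlier: high unemployment but a rate well below the line, which shows up as a negative residual. A single point like that also drags the fitted line towards itself, so it is worth looking at the chart and not just the numbers.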
The next figure shows the same drug usage but in relation to the “deprivation measure”. This measure, developed by NISRA, is made up of 52 indicators grouped into seven domains (Income Deprivation; Employment Deprivation; Health Deprivation and Disability; Education, Skills and Training Deprivation; Proximity to Services; Living Environment; Crime and Disorder). These show a highly similar trend that suggests that increased deprivation in an area leads to an increase in depression rates. Foyle, the outlier from the previous figure, has fallen back in line here and follows the trend, indicating that while unemployment may be high in the region, the overall level of deprivation and quality of life may be better. This is an important observation, highlighting that single statistics, such as unemployment, are not necessarily a good measure of quality of life in a region.
The final figure here is an analysis of prescription rates compared with “bad and very bad health”. This is a self-reported statistic garnered from census data. Unsurprisingly, the trend is similar. I won’t go into it here, but this “bad and very bad health” is not limited to depression and mental health; it spans across the board and follows the same regional and socio-economic trends.
I mentioned before that it’s easy to go off on tangents when playing with this data; in fact, since we included the prescription data from England (undoubtedly the subject of a future post), it’s got worse. While writing this I got distracted by some research into SSRIs vs SNRIs as antidepressants and narrowly avoided really going down a rabbit hole of looking into MAOIs vs tricyclics, the frequency of coeliac disease across the UK, and why there seems to be a regional preference for low protein snacks.
Playing with datasets like this can be fun as well as informative. It’s fascinating that we can look at so much through the integration of these datasets. For example, I could go and see if there was any sign of a correlation between unemployment level and consumption of gluten free bread (nope), or whether people in Northern Ireland get prescribed a lot of low protein snacks (also nope), or where in the UK has the highest usage of depigmenting agents (London, by a huge amount).
One of the most interesting things though is that this isn’t just abstract data… It’s very real and personal. While it’s completely de-identified, anonymised and based on aggregates… I know my data is in there, and the simple interactive visualisations make it easy to really feel that impact with a simple click.
Fundamentally though, what we see here is the impact of being able to integrate disparate data. By the very nature of the datasets we have looked at here, it makes a lot of sense to pull them together. When we do so, there are clearly interesting insights that can be derived. In reality though, the challenge of integrating data holds people back from doing so. The easier we can make this process, the more time people can spend analysing and getting insight from their data.