With our upcoming event on scrutiny and governance in mind, we recognise that the scrutiny process involves dealing with a lot of data. Our chat with Suzanne Draper from Data Unit Wales highlighted a key question: how can we get data to work for you? In this blog, Suzanne gives some tips on how to understand and use data effectively.
“Lies, damned lies and statistics”
This famous quote, which Mark Twain attributed to Benjamin Disraeli, suggests that statistics can't be trusted. And, indeed, all data should be questioned; the trick is to make sure you are asking the right questions.
Here are our top 10 questions to help you better understand and use data:
Is the data relevant?
Data is everywhere. We are bombarded daily with facts and figures. It can be overwhelming, confusing even. How do you know what is important?
The trick is to focus on what you are trying to achieve and ask yourself: does this data help me understand more about the topic? Will I be able to make better, more informed decisions as a result?
If not, move on.
Is the data reliable?
In the same way, it can be difficult to know who or what to believe – will that face cream really reduce my wrinkles in just 7 days?
When using data, you need to be able to trust it. To do this, you need to understand where the data comes from and how it was produced.
There are many credible organisations that produce and publish quality data, including Data Unit Wales! These organisations have robust methods of collecting, verifying and publishing data to make sure it is as accurate and reliable as possible.
Is the definition clear?
Definitions are often simplified to make data more accessible. However, this can be misleading. Take, for example, a headline which appeared in a British newspaper in 2013:
“1,200 killed by mental patients”
A closer look at the underlying data, however, shows that around half of those who committed the reported homicides had symptoms of mental illness at the time, but were not, in fact, 'mental patients'. What's more, the study noted that it was unclear whether these symptoms led to the homicides.
While misrepresentation of the facts is usually unintentional, it can have a big impact on how you perceive the data and what you do with it.
Are the units clear?
Data is presented in a variety of formats, each with its own purpose.
Numbers, or counts, help you to understand the quantity or amount of something, e.g. 151,000 tonnes of waste was sent to landfill in 2016-17.
I don’t know about you, but I wouldn’t know whether this is a lot or not. Percentages and rates are, therefore, used to make the data more meaningful and accessible:
e.g. 10% of waste was sent to landfill in 2016-17
e.g. 0.05 tonnes of waste per person was sent to landfill in 2016-17
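To show how the same count turns into a percentage and a rate, here is a minimal Python sketch. The total waste and population figures are our own assumptions, back-calculated from the rounded figures above purely for illustration; they are not official statistics.

```python
# Hypothetical denominators, back-calculated from the rounded figures above
# purely for illustration; they are NOT official statistics.
LANDFILL_TONNES = 151_000
TOTAL_WASTE_TONNES = 1_510_000  # implied by "10% of waste was sent to landfill"
POPULATION = 3_020_000          # implied by "0.05 tonnes of waste per person"


def percentage(part: float, whole: float) -> float:
    """Share of the whole, expressed as a percentage."""
    return 100 * part / whole


def rate_per_person(count: float, population: float) -> float:
    """Amount per head of population."""
    return count / population


if __name__ == "__main__":
    print(f"{percentage(LANDFILL_TONNES, TOTAL_WASTE_TONNES):.0f}% of waste sent to landfill")
    print(f"{rate_per_person(LANDFILL_TONNES, POPULATION):.2f} tonnes per person")
```

The count stays the same in both cases; only the denominator changes, which is why it always pays to ask what a percentage or rate is "out of".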
When using percentages, it is important to understand the underlying data. For instance, if two local planning authorities both decided 50% of their planning applications within 8 weeks, you'd say they were performing at the same level. However, if you knew that Authority A had decided 100 applications (50 of them within 8 weeks) and Authority B just four (two within 8 weeks), would you still say they were performing at the same level?
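One way to see why the two authorities differ is to put a plausible range around each percentage. The sketch below uses the Wilson score interval, a standard method for proportions; using it here is our own choice, not something the figures above come with, and it treats each authority's applications as a simple sample.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a proportion (z = 1.96)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Figures from the planning example above.
    for name, decided_in_time, total in [("Authority A", 50, 100), ("Authority B", 2, 4)]:
        low, high = wilson_interval(decided_in_time, total)
        print(f"{name}: {decided_in_time / total:.0%} (plausible range {low:.0%} to {high:.0%})")
```

Both authorities show 50%, but the range is roughly 40% to 60% for Authority A and roughly 15% to 85% for Authority B: with only four applications, the percentage tells you very little.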
How current is the data?
It is important to be clear about what time period the data relates to – is it this month, last month, this year, last year?
Most good quality data takes some time to produce. Usually, annual data will take between 6 and 12 months to be published, but some larger datasets may take longer.
Data shouldn’t be disregarded simply because it is ‘old’ – there are many valid reasons why we might use such data. For instance, it may be collected infrequently (such as Census data) or it may simply be the best estimate available.
How robust is the data?
Most data has a degree of unknown error; it is almost impossible to guarantee that a piece of data is 100% accurate. However, some data is likely to be more robust than other data because of the way it was collected. Counts and estimates are likely to be very robust, while survey data may be less so because of sample sizes and the subjectivity of responses.
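To illustrate how sample size affects survey data, here is a minimal sketch of the usual margin of error for a survey percentage. It assumes a simple random sample and uses the normal approximation; it says nothing about the subjectivity of responses, which no formula can capture.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a survey proportion p from n responses
    (normal approximation, simple random sample assumed)."""
    return z * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    # p = 0.5 gives the widest margin, so it is a cautious default.
    for n in (100, 400, 1600):
        print(f"n={n}: +/- {margin_of_error(0.5, n):.1%}")
```

With 100 responses the margin is about plus or minus 10 percentage points; quadrupling the sample only halves it.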
Are the comparisons valid?
Comparisons are very useful in helping put data into perspective, but only if the data is comparable. This may seem obvious, but it is very easy to make a mistake. There are two key things to consider when comparing data:
Has the data been produced to the same definition? For instance, does it include and exclude the same things, and does it cover the same period?
Has the data been standardised to take account of other factors that might explain differences in the data? For instance, if you were comparing staff age profiles across organisations, you would expect a bigger organisation to have more staff in every age bracket, so comparing raw numbers wouldn't tell you anything you didn't already know. If, however, you compared the percentage of staff in each age bracket (removing the effect of organisation size), you'd quickly see how the age profiles compared.
So, if you answer ‘no’ to either of these questions, chances are the comparisons aren’t valid.
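The age-profile example can be sketched in a few lines of Python. The headcounts below are entirely hypothetical; the point is simply that converting counts into percentages removes the effect of organisation size.

```python
# Hypothetical headcounts by age bracket for two organisations of different sizes.
headcounts = {
    "Small org": {"16-24": 20, "25-49": 120, "50+": 60},
    "Large org": {"16-24": 150, "25-49": 600, "50+": 250},
}


def age_profile(counts: dict[str, int]) -> dict[str, float]:
    """Convert raw headcounts into the percentage of staff in each age bracket."""
    total = sum(counts.values())
    return {bracket: 100 * n / total for bracket, n in counts.items()}


if __name__ == "__main__":
    for org, counts in headcounts.items():
        profile = age_profile(counts)
        print(org, {bracket: f"{pct:.0f}%" for bracket, pct in profile.items()})
```

The large organisation has more staff in every bracket, but the percentages (10%, 60% and 30% against 15%, 60% and 25%) show that it actually has a younger profile.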
Are the graphics clear?
In addition to the above considerations, when looking at data in charts or graphs there are a couple more things you should look out for:
- Always check the axes: if the value axis doesn't start from zero, your perspective may be distorted;
- Beware of 3D charts: their perspective effects make it hard to read and compare values accurately.
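To see how much a truncated axis can exaggerate a difference, here is a small sketch comparing how many times taller one bar looks than another. The two values (48% and 52%) and the axis starting point are hypothetical.

```python
def apparent_ratio(a: float, b: float, axis_start: float = 0.0) -> float:
    """How many times taller bar b looks than bar a when the value axis starts at axis_start."""
    return (b - axis_start) / (a - axis_start)


if __name__ == "__main__":
    # Hypothetical values: two results of 48% and 52%.
    print(f"Axis from zero: {apparent_ratio(48, 52):.2f}x")
    print(f"Axis from 45:   {apparent_ratio(48, 52, axis_start=45):.2f}x")
```

With the axis starting at zero, the second bar looks about 1.08 times as tall, which matches the data; starting the axis at 45 makes it look more than twice as tall.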
A graphic should have one clear message. If you can't find it quickly, don't waste your time; find another way to look at the data.
Do you have the complete picture?
So often, particularly in the media, you are presented with a single, lone figure on which to form an opinion.
In no other aspect of our lives would we expect this to happen. For instance, we wouldn’t expect a doctor to make a diagnosis based on our blood pressure reading alone.
And so it follows that the data you are using should provide you with a balanced picture – it should allow you to answer both ‘what?’ and ‘why?’.
Are there any other factors that need to be taken into consideration?
It’s important to make sure you have all the information to help you understand the data. For instance, is the data rounded? Has some of the data been ‘hidden’ in order to protect individuals? Is there any national (or local) legislation that has a direct bearing on the data and its use?
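Two of these factors, rounding and hidden data, can be illustrated with a short sketch. The suppression threshold of 5 and the ward names are hypothetical; real publishers set their own rules, and often also hide further cells so that suppressed values can't be worked out from totals, which this sketch doesn't do.

```python
# Rounded percentages need not add up to 100.
shares = [1 / 3, 1 / 3, 1 / 3]
rounded = [round(100 * s) for s in shares]
print(rounded, sum(rounded))  # [33, 33, 33] 99

SUPPRESSION_THRESHOLD = 5  # hypothetical; real thresholds vary by publisher


def suppress_small_counts(counts: dict[str, int], threshold: int = SUPPRESSION_THRESHOLD) -> dict[str, str]:
    """Replace counts below the threshold with '*' so individuals are harder to identify."""
    return {area: "*" if n < threshold else str(n) for area, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical ward-level counts.
    print(suppress_small_counts({"Ward 1": 42, "Ward 2": 3, "Ward 3": 17}))
```

If a table's percentages add up to 99 or 101, or some cells show an asterisk, check the metadata before assuming something has gone wrong.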
Most organisations publish metadata alongside their data. Metadata is “data about data” and is designed to provide you with all the necessary information about the data that you are looking at, including any ‘special instructions’.
So, to summarise, in order to use data effectively you need to understand what you are looking at. If in doubt, ask!