Big data is big enough
Having a data strategy means you'll collect the right data for the right reasons
Big data: extremely large data sets that may be analysed computationally to reveal patterns, trends, and associations, especially relating to human behaviour and interactions.
The data about data is mind-blowing. More data has been created in the past two years than in all of previous history, and this growth is expected to remain exponential. Google processes 63,000 searches every second. We send 300 billion emails every day. We use 20 billion smart devices, and by 2025 the global pool of data is forecast to swell to an incomprehensible 175 zettabytes.
Harlem 1947: Reclusive millionaire and notorious hoarder Langley Collyer is found dead in his home after being crushed beneath a pile of fallen clutter.
Organisations are fanatically hoarding as much data as they can on the assumption that it will become useful in the future, yet recent studies suggest that over 99% of it will never be used. That is hard to justify. Yes, the cost of data storage has fallen, but the cost of curating data (verification, backup, recovery and so on) can be very high, requiring substantial input from specialist personnel.
That valuable people resource should be focused on analysing your existing data and searching for the Holy Grail: the trends that deliver competitive advantage. Instead, it is spent firefighting a never-ending deluge. One analyst recently likened modern data analytics to searching for a needle in a haystack where the haystack gets ten times bigger every day.
December 2016: Yahoo reports that a 2013 data breach affected 1 billion of its users. Later investigation revealed the true figure to be nearer 3 billion.
There is another problem. The more of something you have, the less you really know about it. So in the event of a serious data breach (and, let's face it, breaches are becoming worryingly frequent), organisations find themselves having to tell their stakeholders that they have no real idea of the scale of the breach or of the content of the data affected. Aside from being extremely embarrassing, this is poor corporate governance.
So hoarding data costs too much, dilutes the value of the data you already have, and wipes value off your organisation when something goes wrong.
Clearly, then, the way forward is to declutter? Unfortunately, it isn't that straightforward. There's no Marie Kondo tutorial for data. The cost of the people needed to make sure you don't throw the needle away with the haystack would be enormous: yet more waste on top of the money already spent building the haystack in the first place.
So what’s the answer?
First, define the problems you want your data to solve. Then get clear on the outcomes that need to be achieved. All the while, interrogate the commercials: ask yourself, will solving these problems cost us more than we stand to make? Finally, define your analytics strategy, identifying the data pools and streams you will work with.
Here at Tharsus, we're using motion data from our own manufacturing teams to optimise the layout of the shop floor. We gather specific data that feeds a number of supervised and unsupervised machine learning algorithms, driving significant improvements in both productivity and the working environment. These improvements, in turn, add value to our business and to our customers' businesses.
For our customers, who value data-driven automation, we understand the need to collect the right data rather than hoarding as much data as possible in the hope that it will provide insight. Machine learning models are only as good as the data they learn from. Quite simply, you get out what you put in. At Tharsus, we're developing a better understanding of how to define the data you need to meet your actual business needs, and no more than that.
You don’t need to climb on the big data bandwagon.
Bandwagon: climb on
Idiom definition: to become involved with or support an activity or cause that has recently become popular.
Paul Featonby is Digital Technology Director at Tharsus