Do you know what data your organisation holds, and why should you care? This blog discusses the cost of storing data, the challenges it creates, and some suggestions for managing it.
In recent years there has been a data explosion. Each new technological advance adds to the volume of data that has to be stored. Examples include biometrics in passports (retina scanners or facial recognition), fingerprint readers on mobile phones, PST files, hard disks in photocopiers, the security required for children's school lunches, and CCTV. The list of new data sources keeps growing. Since COVID-19 there has been far more home working, with data copied to local computers to work on. Remote working also means more data is shared on central storage platforms such as Microsoft Teams in order to collaborate.
According to IDC research [1], global data was expected to grow from 4ZB (zettabytes) in 2010 to a staggering 59ZB in 2020 (a zettabyte is 1,000,000,000,000,000 megabytes!). A worrying statistic is that 65% of this data is expected to be unstructured. Unstructured data is information held in emails or on file servers rather than in a structured database. These files can exist in many versions or copies spread across many computers. Because of their unstructured nature, it is very difficult to keep track of where the files are and what data they contain.
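One practical way to get a first view of how many copies of the same file exist is to group files by a content hash. The sketch below is only an illustration under stated assumptions: the function names `file_digest` and `find_duplicates` are hypothetical, it uses only the Python standard library, and it detects byte-for-byte identical copies only, not edited versions of the same document.

```python
import hashlib
import os
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Map each digest to the list of files under `root` sharing that exact content.

    Only digests that occur more than once are returned.
    """
    by_digest = defaultdict(list)
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                by_digest[file_digest(path)].append(path)
            except OSError:
                continue  # skip files that cannot be read
    return {d: paths for d, paths in by_digest.items() if len(paths) > 1}
```

Calling `find_duplicates` on a shared drive path would return groups of identical files, which can then be reviewed before anything is deleted. On large volumes, hashing every file is slow, so comparing file sizes first would be a sensible refinement.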
A further file-management issue is highlighted by a UK study by Veritas, which shows that 53% of data is classed as Dark Data [2]. This is not dangerous data or data that hackers are working with; it is unknown, untouched data. Companies have no idea of the value or sensitivity of these files.
Companies therefore have less visibility of, and control over, data that is stored in corporate systems. This unknown data might contain commercially sensitive data or personal information that must be protected under regulations such as the American Health Insurance Portability and Accountability Act (HIPAA), the EU Markets in Financial Instruments Directive (MiFID II), the EU General Data Protection Regulation (GDPR) and many others. Understanding this data is therefore essential. Take the GDPR as an example: if you don't know what you have, how can you protect the personal data, or respond to an Article 17 Right to erasure ("Right to be Forgotten") request within the short timeframes that the regulation allows?
There are therefore potential business costs to not knowing about your data. Failing to comply with regulatory requirements, to extract knowledge, or to search all your data effectively could lead to financial and reputational penalties.
There is also the physical cost of storing all this data. The raw disk space cost per terabyte (TB) would be $48.41 [3]. For the total global data of 59ZB (59 billion TB), the disk cost would be $2.856 trillion. That is roughly equivalent to the total projected GDP of the UK ($2.716 trillion) in 2020, according to the February 2020 StatisticsTimes report [4]. If 53% of data is dark data, that is the equivalent of $1.514 trillion of potentially wasted or unneeded storage!
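The storage cost arithmetic can be reproduced in a few lines. This is a minimal sketch using the figures quoted in this blog (59ZB of global data, $48.41 per TB, 53% dark data); it assumes decimal units, so 1ZB is 1,000,000,000 TB and a trillion is 10^12. The constant and function names are illustrative only.

```python
# Figures quoted in the article; decimal (SI) units are assumed throughout.
TB_PER_ZB = 10**9          # 1 ZB = 1,000,000,000 TB
COST_PER_TB_USD = 48.41    # raw disk cost per TB
GLOBAL_DATA_ZB = 59        # IDC forecast for 2020
DARK_DATA_SHARE = 0.53     # share of data classed as dark data


def storage_cost_usd(zettabytes, cost_per_tb=COST_PER_TB_USD):
    """Raw disk cost in US dollars for a given volume of data in zettabytes."""
    return zettabytes * TB_PER_ZB * cost_per_tb


total_cost = storage_cost_usd(GLOBAL_DATA_ZB)
dark_cost = total_cost * DARK_DATA_SHARE

print(f"Total storage cost: ${total_cost / 1e12:.3f} trillion")
print(f"Dark data cost:     ${dark_cost / 1e12:.3f} trillion")
```

Under these assumptions the totals come to about $2.856 trillion for all data and about $1.514 trillion for the dark data share. Swapping in your own volume and cost per TB gives a rough figure for a single organisation.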
With all this unknown data around we have the following challenges:
♦︎ We don't know if the data has been kept longer than necessary (a potential breach of data regulations)
♦︎ We won't know which volumes to search for DSARs
♦︎ Can we be sure the personal data or other sensitive data is secure?
♦︎ Can we build an asset register if we don't know where and what data we hold?
To summarise, we all need to understand what data we have, delete any Dark Data, remove PST files, delete data that is no longer required and move sensitive data into a location that is protected. We also need a way to search through this data quickly, firstly to help identify what we have, and then to satisfy any eDisclosure or Data Subject Access Request (DSAR) searches.
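A first step towards understanding what you hold could be a simple inventory scan of a file share. The following is a rough sketch, not a complete data discovery tool: the `scan` function, the `InventoryReport` type and the three-year staleness threshold are all assumptions made for illustration. It uses the last-modified time as a proxy for "untouched" data and flags PST files by extension only.

```python
import os
import time
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class InventoryReport:
    pst_files: list = field(default_factory=list)    # files ending in .pst
    stale_files: list = field(default_factory=list)  # not modified since the cutoff
    total_bytes: int = 0                             # size of all files seen


def scan(root, stale_after_days=3 * 365, now=None):
    """Walk `root` and report PST files and files not modified recently.

    `stale_after_days` is an assumed retention threshold; set it to match
    your own retention policy.
    """
    now = time.time() if now is None else now
    cutoff = now - stale_after_days * 86400
    report = InventoryReport()
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.stat()
            except OSError:
                continue  # unreadable or vanished file
            report.total_bytes += info.st_size
            if path.suffix.lower() == ".pst":
                report.pst_files.append(path)
            if info.st_mtime < cutoff:
                report.stale_files.append(path)
    return report
```

Running `scan` against a file server path gives a list of candidates for review. It does not look inside files, so identifying personal or sensitive content would still need content inspection or a dedicated tool.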
1. This information is from the IDC Global DataSphere forecast 2020 - https://www.idc.com/getdoc.jsp?containerId=prUS46286020
2. Statistics from the updated Veritas UK 2020 Databerg Report - https://www.veritas.com/en/uk/form/whitepaper/the-uk-2020-databerg-report
3. Based on a single 180TB disk costing in the region of $8,713. 28 x 180TB disks would provide 5PB (5.04PB) of storage (without RAID). 28 x $8,713 = $243,964, or $48.41 per TB