Saturday, June 24, 2023

The Top 3 Cyber Risks Of Latent Data

 


There is no doubt that businesses today are facing uncertain times.  A lot of this has been due to the layoffs in the tech sector and the persistent interest rate hikes by the Federal Reserve to bring inflation down and keep it at bay.

But one thing is certain: the growth of AI and ML has picked up pace very quickly since the beginning of this year, and a lot of that has been driven by the evolution of ChatGPT.

But what people fail to realize is that both AI and ML need to learn.  In other words, they need to be given a baseline from which they can learn, and from there, try to predict the outcome of an issue or an event, or simply answer a query that an end user poses to ChatGPT.

But in order to do this, it all takes data, and tons of it.  This can be compared to putting fuel in your car.  If you don’t have any, you of course will not go anywhere. 

This is the same with AI and ML.  They need data as their fuel to keep their algorithms and models running on a real time basis. 

This can be in the form of structured data (which is quantitative in nature), or unstructured data (which is qualitative in nature).  But what the actual datasets need to be will depend upon what the AI/ML application has been designed to do.

The world of Data Science is truly a unique one, and in fact, to get off the subject a little bit, this is where the majority of jobs will be in the future.  But there are different kinds of data (apart from the ones just mentioned).

For example, there is Data at Rest, Data in Motion, and Data in Transaction.  To make life even more confusing, there is now a new data classification that has been emerging out of the woodwork.

This is known as “Dark Data”.  What is it, you may be asking?  Well, a technical definition of it is as follows:

These are “the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing).”

(SOURCE:  https://www.gartner.com/en/information-technology/glossary/dark-data)

Put in simpler terms, this is the information and data that a business collects but does not use.  It is simply being stored for no useful purpose.  One might wonder why a business would do this, but it is hard to give an answer.

Obviously, they have their own reasons for doing this, and it is something that would not be public information.

One of the biggest issues with storing Dark Data (also referred to here as Latent Data) is the sheer cost of storage, which can add up quickly.  For example, if you have On Premises Infrastructure, you have finite resources.  But if your IT and Network Infrastructure is based in the Cloud (such as AWS or Azure), you will have many more resources at your disposal to store these datasets.

Although the Cloud offers you both elasticity and scalability in this regard, using more storage will also add to your monthly bill.  To give you an example of this, it has been quoted that Netflix has spent nearly $10 Million per month on storing Latent Data.  (SOURCE:  https://www.comparitech.com/blog/vpn-privacy/netflix-statistics-facts-figures/).
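To get a feel for how quickly storage costs add up, here is a minimal back-of-the-envelope sketch in Python.  The price per GB and the dataset size are hypothetical placeholders (they are not taken from any provider's price list or from the Netflix figure above), so plug in the numbers from your own Cloud bill.

```python
def monthly_storage_cost(size_tb: float, price_per_gb_month: float) -> float:
    """Estimate the monthly bill (in dollars) for keeping size_tb terabytes in storage."""
    size_gb = size_tb * 1000  # decimal terabytes: 1 TB = 1000 GB
    return round(size_gb * price_per_gb_month, 2)


# Hypothetical inputs for illustration only.
UNUSED_DATA_TB = 50          # assumed amount of data nobody is using
PRICE_PER_GB_MONTH = 0.023   # assumed storage price in dollars per GB per month

monthly = monthly_storage_cost(UNUSED_DATA_TB, PRICE_PER_GB_MONTH)
print(f"Monthly: ${monthly:,.2f}")       # Monthly: $1,150.00
print(f"Yearly:  ${monthly * 12:,.2f}")  # Yearly:  $13,800.00
```

Even at these modest assumed numbers, data that nobody touches is a recurring line item, and it only grows as more of it accumulates.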

Another key issue to keep in mind is that even if you are not using these kinds of datasets, simply storing them for indefinite periods of time will also make you subject to the provisions of the various data privacy laws, such as the GDPR, CCPA, HIPAA, etc.  This means that you will have to make sure that you have implemented the right kinds of controls to protect these datasets.

If you don’t, and the datasets are leaked, you will not only be the subject of an audit, but you could also face very stiff fines and penalties.  For example, under the GDPR, this can amount to up to 4% of your total worldwide annual turnover or €20 million, whichever is higher.  Now, that is a huge chunk of change, IMHO.
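To put that percentage into perspective, here is a small sketch of the arithmetic.  It assumes the upper tier of GDPR fines (the greater of €20 million or 4% of total worldwide annual turnover for the preceding financial year), and the turnover figures are made up for illustration.  It only computes the ceiling; an actual fine is decided by the regulator and can be far lower.

```python
GDPR_FLAT_CAP_EUR = 20_000_000   # upper-tier flat cap
GDPR_TURNOVER_RATE = 0.04        # 4% of total worldwide annual turnover


def gdpr_max_fine(annual_turnover_eur: float) -> float:
    """Return the upper-tier GDPR fine ceiling: whichever of the two caps is higher."""
    return max(GDPR_FLAT_CAP_EUR, GDPR_TURNOVER_RATE * annual_turnover_eur)


# Hypothetical companies for illustration only.
print(f"{gdpr_max_fine(1_000_000_000):,.0f}")  # 40,000,000 (the 4% applies)
print(f"{gdpr_max_fine(100_000_000):,.0f}")    # 20,000,000 (the flat cap applies)
```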

Third, there is a huge risk that data kept around for no useful purpose whatsoever will fall prey to the Cyberattacker.  In fact, it would be a very easy target to go after.  If he or she gets hold of it, they can use it to launch ID Theft attacks, sell it on the Dark Web, or worse yet, make it publicly available in an extortion-like attack.

By having this “useless” kind of data, not only are you putting your employees and customers at grave risk, but you are also risking your complete brand image if you do experience a data leakage issue, whether it is intentional or not.

My Thoughts On This:

Simply put, keeping any sort of extraneous datasets around is a huge risk to bear.  Not only can it be costly, but it can even lead to potential security risks, as just reviewed.  So what is the best way out of this situation?  Simply delete whatever you don’t need or use.

For example, if you have launched a recent marketing campaign, and have already used the information and data that has been collected from it, there is no use having it around.

Remember, datasets can lose their value to a company quickly over time if they are not updated.  Keeping them updated can also be a costly proposition if you have no solid business case for doing so.

But, if you do intend to get rid of Latent Data, make sure you hire a data destruction company to do it.  Have everything documented in case you do ever face an audit from a regulator.
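As a starting point for finding what to get rid of, here is a minimal sketch in Python that lists files that have not been modified within a retention window and writes them to a CSV file you can keep as documentation.  The function names, the 365 day window, and the output file name are all hypothetical choices for illustration, and the script deliberately does not delete anything; the actual destruction should still follow your own retention policy.

```python
import csv
import time
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Optional


def find_stale_files(root: str, max_age_days: int, now: Optional[float] = None) -> List[Path]:
    """Return files under root whose last modification is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds in a day
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )


def write_audit_log(files: List[Path], log_path: str) -> None:
    """Record each candidate file with its size and last-modified time (UTC)."""
    with open(log_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["path", "size_bytes", "last_modified_utc"])
        for p in files:
            info = p.stat()
            modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
            writer.writerow([str(p), info.st_size, modified.isoformat()])


if __name__ == "__main__":
    # Hypothetical settings: scan the current directory with a one year window.
    candidates = find_stale_files(".", max_age_days=365)
    write_audit_log(candidates, "stale_data_candidates.csv")
    print(f"{len(candidates)} file(s) flagged for review")
```

Last-modified time is only a rough proxy for "not being used", so treat the output as a list to review, not a list to delete blindly.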
