There is no doubt that businesses today are facing uncertain
times. A lot of this has been due to the
layoffs in the tech sector, and the persistent interest rate hikes by the Federal
Reserve to keep inflation at bay.
But one thing is for certain: the growth of AI and ML
has picked up pace very quickly since the beginning of this year, and a lot
of that has been driven by the evolution of ChatGPT.
But what people fail to realize is that both AI and ML need
to learn. In other words, they need to be
given a baseline from which they can learn, and from there,
try to predict the outcome of an issue or an event, or simply answer a
query that an end user poses to ChatGPT.
But doing this takes data, and tons of
it. This can be compared to putting fuel
in your car. If you don’t have any, you
of course will not go anywhere.
It is the same with AI and ML. They need data as their fuel to keep their
algorithms and models running on a real-time basis.
This data can be structured (which is
quantitative in nature), or unstructured (which is qualitative in nature). But what the actual datasets need to be will
depend upon what the AI/ML application has been designed to do.
The world of Data Science is truly a unique one, and in fact, to get off
the subject a little bit, this is where I expect a lot of the jobs to be in the future. But there are different kinds of data (apart
from the ones just mentioned).
For
example, there is Data at Rest, Data in Motion, and Data in Transaction. To make life even more confusing, there is
now a new data classification that has been emerging out of the woodwork.
This
is known as “Dark Data”. What is it, you may
be asking? Well, a technical definition of
it is as follows:
“These
are the information assets organizations collect, process and store during
regular business activities, but generally fail to use for other purposes (for
example, analytics, business relationships and direct monetizing).”
(SOURCE: https://www.gartner.com/en/information-technology/glossary/dark-data)
Put in simpler terms, this is the information
and data that is not being used by a business.
It is simply being stored for no useful purpose. One might wonder why a business would do
this, but it is hard to give a single answer.
Each business has its own reasons for doing this, and
these are usually not public information.
One of the biggest issues with storing Latent Data is the sheer
cost of storage, which can add up quickly.
For example, if you have On Premises Infrastructure, you have finite
resources. But if you have your IT and
Network Infrastructure based in the Cloud (such as AWS or Azure), you will have
many more resources at your disposal to store these datasets.
Although the Cloud offers you both elasticity and scalability
in this regard, using more storage will also add to your monthly bill. To give you an example of this, it
has been reported that Netflix has spent nearly $10 Million per month on storing
Latent Data. (SOURCE: https://www.comparitech.com/blog/vpn-privacy/netflix-statistics-facts-figures/).
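
To get a rough feel for how unused data shows up on a Cloud bill, here is a minimal sketch. The function name, the 500 TB volume, the 60% unused share, and the price of 0.02 USD per GB per month are all made-up assumptions for illustration; they are not quotes from AWS, Azure, or Netflix.

```python
def monthly_storage_cost(total_tb, dark_fraction, usd_per_gb_month):
    """Split a monthly storage bill into the used and the unused (dark) share."""
    if not 0.0 <= dark_fraction <= 1.0:
        raise ValueError("dark_fraction must be between 0 and 1")
    total_gb = total_tb * 1000  # decimal terabytes to gigabytes
    total_cost = total_gb * usd_per_gb_month
    dark_cost = total_cost * dark_fraction
    return {
        "total": round(total_cost, 2),
        "dark": round(dark_cost, 2),
        "used": round(total_cost - dark_cost, 2),
    }


# Illustrative inputs only: 500 TB stored, 60% never read, 0.02 USD per GB-month.
bill = monthly_storage_cost(total_tb=500, dark_fraction=0.6, usd_per_gb_month=0.02)
print(bill)
```

Plugging in your own volume and your provider's actual price list gives a first estimate of what the unused share costs you each month.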
Another key issue to keep in mind is that even if you are
not using these kinds of datasets, simply storing them for indefinite periods
of time will still make you subject to the provisions of the various data privacy
laws, such as the GDPR, CCPA, HIPAA, etc.
This means that you will have to make sure that you have implemented
the right kinds of controls to protect these
datasets.
If you don’t and the datasets are leaked, you will not
only be the subject of an audit, but you could also face very stiff fines and penalties.
For example, under the GDPR,
this can amount to up to 4% of your total worldwide annual revenue (or 20 million euros, whichever is higher). Now, that is a huge chunk of change,
IMHO.
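
To put a number on that, here is a small sketch of the GDPR ceiling for the most serious infringements: 20 million euros or 4% of total worldwide annual turnover, whichever is higher. The function and constant names are my own, the two turnover figures are made up, and this is only the upper limit; the actual fine is decided by the regulator and can be far lower.

```python
GDPR_FIXED_CAP_EUR = 20_000_000   # fixed ceiling, in euros
GDPR_TURNOVER_PERCENT = 4         # percentage of worldwide annual turnover


def gdpr_max_fine_eur(annual_turnover_eur):
    """Return the upper-tier GDPR fine ceiling for a given turnover (whole euros)."""
    percent_cap = annual_turnover_eur * GDPR_TURNOVER_PERCENT // 100
    # The higher of the two ceilings applies.
    return max(GDPR_FIXED_CAP_EUR, percent_cap)


# Hypothetical companies: 50 million and 1 billion euros in annual turnover.
small_company_cap = gdpr_max_fine_eur(50_000_000)
large_company_cap = gdpr_max_fine_eur(1_000_000_000)
print(small_company_cap, large_company_cap)
```

Notice that for the smaller company the fixed 20 million euro ceiling is the one that applies, because 4% of its turnover is below that amount.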
Third, there is a huge risk that data kept around for
no useful purpose whatsoever will become prey to the Cyberattacker. In fact, it would be a very easy target to
go after. If he or she gets hold of it, they
can use it to launch ID Theft attacks, sell it on the Dark Web, or worse yet,
make it publicly available in an extortion-like attack.
By having this “useless” kind of data, not only are you
putting your employees and customers at grave risk, but you are also risking
your complete brand image if you do experience a data leakage issue, whether it
is intentional or not.
My Thoughts On This:
Simply put, keeping any sort of extraneous datasets around
is a huge risk to bear. Not only can it
be costly, but it can even lead to potential security risks, as just reviewed. So what is the best way out of this
situation? Simply delete whatever
you don’t need or use.
For example, if you have launched a recent marketing
campaign, and have already used the information and data that was
collected from it, there is no use in having it around.
Remember, datasets can lose their value to a company quickly
over time if they have not been updated.
Updating them can also be a costly proposition if you intend to do it but have no solid
business case for it.
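
One simple way to find candidates for deletion is to list the files that have not been touched in a long time. The sketch below assumes that the last modification time is a fair stand-in for "not being used", which will not be true for every dataset, and the function name, the 365-day threshold, and the file names are all hypothetical. It only reports files; it does not delete anything.

```python
import os
import tempfile
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def find_stale_files(root, max_age_days, now=None):
    """Return files under root last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    stale = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            stale.append(path)
    return sorted(stale)


# Small demo in a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as demo_dir:
    old_file = Path(demo_dir) / "campaign_2021.csv"
    new_file = Path(demo_dir) / "campaign_current.csv"
    old_file.write_text("id,email\n")
    new_file.write_text("id,email\n")
    # Backdate the first file by 400 days so it counts as stale.
    backdated = time.time() - 400 * SECONDS_PER_DAY
    os.utime(old_file, (backdated, backdated))
    stale_names = [p.name for p in find_stale_files(demo_dir, max_age_days=365)]

print(stale_names)
```

A list like this is a starting point for a review with the data owners, not a deletion order in itself.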
But if you do intend to get rid of Latent Data, make sure
you hire a data destruction company to do it.
Have everything documented in case you ever face an audit from a
regulator.