It just struck me today that I have been doing a lot of
writing on AI and ML this year. In fact,
I even wrote an entire book on this subject, and it came out in the springtime. Even on my podcasts, I usually ask my guests
what they think of it, and if they are currently using or planning to implement
it in their future product and service lines.
The answer I get most of the time is that they plan to deploy
it in a phased approach.
The main reason for this is that the terms AI and ML are
very much misused in the Cyber Industry, especially by the vendors themselves. There is a lot of hype around it when selling
products, and of course, customers get excited when they think they are
deploying the latest and greatest. It’s
like calculating Cyber Risk.
Every vendor under the sun claims to have their own unique,
trademarked formula. But truth be told,
all of these fancy formulas stem from the same set of statistical techniques. That is about as far as I am going to
take this; I don't want to get too critical (though I do have some rather harsh
views on this).
It’s the same thing for AI and ML: Garbage In and Garbage Out for the data that
is being used to train the systems.
But with respect to the latter, it finally dawned on me the
other day that the datasets feeding into the AI and ML systems also
have to be compliant with the data privacy laws, especially the GDPR
and the CCPA. That is something that
never occurred to me, even as I wrote my book about it. With this in mind, you now have to take
compliance into account from this perspective as well.
So how do you do it?
Here are some quick tips:
1) Make sure that you have permission to use it:
This relates to when you are
collecting customer information and data as they visit your website. Nowadays, you have to have a check box on
your contact form asking the prospect to consent to having their private
data collected, stored, and processed for uses down the
road. You also have to give them the right
to opt out and have their Personally Identifiable Information (PII) datasets
deleted from your databases upon request.
But this does not stop at your website. It also includes processing orders
from customers over the phone or in your online store. The main intention here, of course, is to feed this
kind of real-time information and data into your AI system so that you can further
study their buying patterns, and even try to predict what future purchases could
be. But you have to let them know for
what purposes the information/data is being collected, and they also have the
right to opt out in this regard.
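As a minimal sketch of the idea above: before any customer records reach a training pipeline, filter out anyone who never consented or who opted out later. The field names here ("consented", "opted_out") are my own illustrative assumptions, not a prescribed schema.

```python
# Hypothetical sketch: keep only records whose owners gave consent
# and have not since opted out, before they ever reach model training.
# Field names are assumptions for illustration.

def filter_consented(records):
    """Return only records that are safe to feed into training."""
    return [
        r for r in records
        if r.get("consented") and not r.get("opted_out")
    ]

customers = [
    {"name": "A", "consented": True,  "opted_out": False},
    {"name": "B", "consented": True,  "opted_out": True},   # opted out later
    {"name": "C", "consented": False, "opted_out": False},  # never consented
]

training_set = filter_consented(customers)
# Only customer "A" should survive the filter.
```

The point of making this an explicit, testable step is that an opt-out request then only has to flip one flag to keep that person's data out of all future training runs.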
2) Scrub your data:
The above scenario pretty much
applies to when you are interacting with customers and prospects on a real-time
basis. But what if you are feeding data into
your AI and ML systems from previously collected sets? How do you stay compliant with the GDPR and the CCPA
in this regard? The best way here is to
go through these datasets and purge anything that appears to be personal, or
that can identify a particular individual.
In this regard, you will probably be purchasing these kinds of datasets
from a vendor, and really it is their job to do this. But to be really safe, you should probably run
a double check on it as well. If you can,
try to train your systems without the PII datasets
involved at all. Although the GDPR and the CCPA
are still very murky in this area (thus giving you some leeway), it is best to
be safe rather than sorry. This all
comes down to the "Right To Be Forgotten" provision in the GDPR. More information about this can be seen at
this link:
https://gdpr-info.eu/art-17-gdpr/
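To show what that "double check" on a vendor's dataset might look like, here is a deliberately minimal sketch that redacts two obvious kinds of PII (email addresses and US-style phone numbers) from free text. Real scrubbing needs far more patterns, or a dedicated tool; this only illustrates the idea.

```python
import re

# Hypothetical sketch: purge obvious PII tokens from free-text fields
# before they reach a training set. These two patterns are assumptions
# covering only the simplest cases.

EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
PHONE_RE = re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")

def scrub(text):
    """Replace recognizable PII tokens with a [REDACTED] placeholder."""
    text = EMAIL_RE.sub("[REDACTED]", text)
    text = PHONE_RE.sub("[REDACTED]", text)
    return text

clean = scrub("Contact jane.doe@example.com or 555-867-5309 for details.")
# -> "Contact [REDACTED] or [REDACTED] for details."
```

Running a pass like this over a purchased dataset, and spot-checking the leftovers by hand, is a cheap second line of defense even when the vendor claims the data is already clean.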
3) Know where your data resides:
I remember reading an article some
time ago polling CISOs on where their data is stored. Quite astonishingly, a majority of them did
not know where it was kept. But
you should not be a part of this statistic! If you have to collect PII datasets to train your
AI and ML models, you must know, at every step of the way, how these types of data
are being archived, stored, and even processed.
In fact, this all has to be documented.
Yeah, it sounds like a real pain, but taking the extra steps to do this
now will save you the nightmares of having to go through an audit and face serious
financial penalties down the road. In
fact, you really have no choice in the matter: the GDPR and the CCPA both have specific
mandates that require you to know where all of your data is kept.
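The documentation this tip calls for can start as something very simple: a data inventory record per dataset. The fields below are my own assumptions about what an auditor would want to see, not anything mandated verbatim by the GDPR or the CCPA.

```python
from dataclasses import dataclass

# Hypothetical sketch: one inventory record per dataset used in
# training, so "where does this data live and who owns it?" always
# has a documented answer. Field choices are illustrative assumptions.

@dataclass
class DatasetRecord:
    name: str
    contains_pii: bool
    storage_location: str   # e.g. region / bucket / database
    retention_days: int
    owner: str              # who answers for this dataset in an audit

inventory = [
    DatasetRecord("crm_export_2023", True, "eu-west-1/crm-archive", 365, "data-team"),
    DatasetRecord("clickstream_agg", False, "us-east-1/warehouse", 730, "analytics"),
]

# The datasets the GDPR/CCPA questions actually apply to:
pii_datasets = [d.name for d in inventory if d.contains_pii]
```

Even a flat list like this, kept current, answers the question most of those polled CISOs could not.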
4) Always protect the data:
If you ever read through the
provisions of both the GDPR and the CCPA, one of the most common terms you will
see is "controls". Essentially,
these are the tools that you have put into place to protect your PII datasets
in case they are ever stolen, such as in a Ransomware attack. This is where conducting an assessment of your
existing state of controls on at least a yearly basis, and fixing up any
weaknesses, is absolutely crucial. By doing
this, not only are you showing regulators that you are in compliance, but you will
have a much better chance of getting a liability payout from your insurance
company as well. But this is something that
you should not go at alone. You really
need to have a compliance specialist help you out with this, such as a
vCCO or vDPO. Also, consider coming up with and
using fake identifiers to mask out certain datasets, just to play
it even safer.
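The "fake identifiers" idea is usually called pseudonymization, and one common way to sketch it is a keyed hash: the same person always maps to the same opaque token (so buying patterns survive in the training data), but the real identifier never enters the model. This is a minimal illustration, assuming the secret key is kept outside the training environment (e.g. in a vault), not a complete scheme.

```python
import hashlib
import hmac

# Hypothetical sketch: replace real identifiers with salted HMAC-SHA256
# pseudonyms before training. Deterministic, so one person keeps one
# token across datasets; without the key, the token cannot be reversed
# to the original identifier by simple guessing.

SECRET_KEY = b"keep-this-out-of-the-training-env"  # assumption: loaded from a vault

def pseudonymize(identifier):
    """Deterministically map a real identifier to an opaque token."""
    digest = hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256)
    return digest.hexdigest()[:16]  # truncated for readability

token_a = pseudonymize("jane.doe@example.com")
token_b = pseudonymize("jane.doe@example.com")
token_c = pseudonymize("john.smith@example.com")
# token_a == token_b, and both differ from token_c.
```

Note that under the GDPR, pseudonymized data is still personal data as long as the key exists, so this reduces exposure but does not remove the compliance obligation.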
My Thoughts On This:
So as you have fun experimenting around with your ML and AI
systems, just keep in mind that you, too, are subject to the tenets of the GDPR
and the CCPA in this regard.
Everybody has a stake in this, but the overall responsibility ultimately
comes down to the CISO, so make sure to keep him or her apprised at all times.
Also, make sure that you purchase your datasets (which will be
used for training purposes) from a vendor that has a good, solid reputation, and
is also abiding by the rules and regulations of the data privacy laws. If need be, you may even have to institute an
entirely new vetting process for this.