Saturday, November 20, 2021

How To Stay GDPR/CCPA Compliant With ML & AI

 


It just struck me today that I have been doing a lot of writing on AI and ML this year.  In fact, I even wrote an entire book on this subject, and it came out in the springtime.  Even on my podcasts, I usually ask my guests what they think of it, and if they are currently using or planning to implement it in their future product and service lines.  

The answer I get most of the time is that they plan to deploy it in a phased approach.

The main reason for this is that the terms AI and ML are very much misused in the Cyber Industry, especially by the vendors themselves.  There is a lot of hype around it when selling products, and of course, customers get excited when they think they are deploying the latest and greatest.  It’s like calculating Cyber Risk. 

Every vendor under the sun claims to have their own unique, trademarked formula.  But truth be told, these fancy formulas all stem from the same set of statistical techniques.  That is about as far as I am going to take this; I don’t want to get too critical (though I do have some rather harsh views on this). 

It’s the same thing for AI and ML:  Garbage In and Garbage Out for the data that is being used to train the systems.

But with respect to the latter, it finally dawned on me the other day that the datasets feeding into AI and ML systems also have to be compliant with the data privacy laws, especially the GDPR and the CCPA.  That is something that never occurred to me, even as I wrote my book about it.  With this in mind, you now have to take compliance into account from this perspective as well.

So how do you do it?  Here are some quick tips:

1)     Make sure that you have the permission to use it:

This relates to when you are collecting customer information and data as they visit your website.  Nowadays, you have to have a check box on your contact form asking prospects to consent to having their private data collected, stored, and processed for uses down the road.  You also have to give them the right to opt out and have their Personally Identifiable Information (PII) datasets deleted from your databases upon request.  But it does not stop at your website.  It also applies if you are processing orders from customers over the phone or through your online store.  The main intention here, of course, is to feed this kind of real time information and data into your AI system so that you can further study their buying patterns, and even try to predict what their future purchases could be.  But you have to let them know for what purposes the information/data is being collected, and they have the right to opt out in this regard as well.
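To make the consent point concrete, here is a minimal sketch of how you might filter records before they ever reach a training job.  The record layout, the purpose labels ("ml_training", "order_fulfilment") and the function name are all hypothetical, made up for illustration; your own consent records will look different.

```python
from dataclasses import dataclass, field


@dataclass
class CustomerRecord:
    # Hypothetical record layout, for illustration only.
    customer_id: str
    purchases: list
    consented_purposes: set = field(default_factory=set)
    opted_out: bool = False


def records_allowed_for(records, purpose):
    """Keep only records whose owner consented to `purpose` and has not opted out."""
    return [
        r for r in records
        if purpose in r.consented_purposes and not r.opted_out
    ]


records = [
    CustomerRecord("c1", ["book"], {"order_fulfilment", "ml_training"}),
    CustomerRecord("c2", ["lamp"], {"order_fulfilment"}),               # never agreed to ML use
    CustomerRecord("c3", ["desk"], {"ml_training"}, opted_out=True),    # agreed, later opted out
]

training_set = records_allowed_for(records, "ml_training")
print([r.customer_id for r in training_set])  # ['c1']
```

The idea is simply that consent is tracked per purpose, so agreeing to have an order processed is not treated as agreeing to have buying patterns modeled.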

2)     Scrub your data:

The above scenario pretty much applies to when you are interacting with customers and prospects on a real time basis.  But what if you are feeding data into your AI and ML systems from previously collected sets?  How do you stay clean with the GDPR and the CCPA in this regard?  The best way here is to go through these datasets and purge anything that appears to be personal, or that can identify a particular individual.  You will probably be purchasing these kinds of datasets from a vendor, and really it is their job to do this.  But to be really safe, you should run a double check on them as well.  As far as possible, try to train your systems without the PII datasets involved.  Although the GDPR and the CCPA are still very murky in this area (thus giving you some leeway), it is best to be safe rather than sorry.  This all comes down to the “Right To Be Forgotten” provision in the GDPR.  More information about this can be seen at this link:

https://gdpr-info.eu/art-17-gdpr/
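
As a rough illustration of the double check on a purchased or previously collected dataset, the sketch below drops columns that identify a person directly and masks email addresses and phone numbers found in free text.  The column names and the two regular expressions are assumptions for this example; simple patterns like these will miss plenty of real-world PII, so treat this as a first pass and not a complete scrubber.

```python
import re

# Deliberately simple patterns, assumed for illustration; they are not exhaustive.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

# Hypothetical column names that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "email", "phone", "address"}


def scrub_row(row):
    """Drop direct-identifier columns and mask emails/phones in remaining text fields."""
    cleaned = {}
    for key, value in row.items():
        if key in DIRECT_IDENTIFIERS:
            continue  # purge the column entirely
        if isinstance(value, str):
            value = EMAIL_RE.sub("[EMAIL]", value)
            value = PHONE_RE.sub("[PHONE]", value)
        cleaned[key] = value
    return cleaned


row = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "basket_total": 42.5,
    "note": "Call me at 555-123-4567 or mail jane@example.com",
}
print(scrub_row(row))
# {'basket_total': 42.5, 'note': 'Call me at [PHONE] or mail [EMAIL]'}
```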

3)     Know where your data resides:

I remember reading an article some time ago that polled CISOs on whether they knew where their data was stored.  Quite astonishingly, a majority of them did not know where it was kept.  But you should not be a part of this statistic!  If you have to collect PII datasets to train your AI and ML models, you must know every step of the way how these types of data are being archived, stored, and processed.  In fact, this all has to be documented.  Yes, it sounds like a real pain, but taking the extra steps to do this now will save you the nightmare of going through an audit and facing serious financial penalties down the road.  In fact, you really have no choice in the matter.  The GDPR and the CCPA both have specific mandates that require you to know where all of your data is kept.
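
One lightweight way to start that documentation is a data inventory that your training pipeline is checked against.  The sketch below is only an assumption about what such an inventory could hold; the field names, the dataset names and the storage locations are hypothetical, and neither law prescribes this exact layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataAsset:
    # Hypothetical inventory fields, for illustration only.
    name: str
    contains_pii: bool
    storage_location: str
    processing_purpose: str
    retention_days: int


def undocumented(assets_in_use, inventory):
    """Return the names of datasets that are in use but missing from the inventory."""
    known = {asset.name for asset in inventory}
    return sorted(set(assets_in_use) - known)


inventory = [
    DataAsset("crm_export", True, "eu-datacenter/db01", "purchase prediction", 365),
    DataAsset("product_catalog", False, "us-datacenter/share02", "recommendations", 730),
]

# Datasets the (hypothetical) training job actually reads.
in_use = ["crm_export", "web_logs"]
print(undocumented(in_use, inventory))  # ['web_logs']
```

Anything that shows up in the output is data you are processing without knowing, on paper at least, where it lives or why you have it.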

4)     Always protect the data:

If you ever read through the provisions of both the GDPR and the CCPA, one of the most common terms you will see is that of “controls”.  Essentially, these are the tools that you have put into place to protect your PII datasets in case they are ever heisted, such as in a Ransomware attack.  This is where assessing your existing state of controls on at least a yearly basis, and fixing any weaknesses, is absolutely crucial.  By doing this, not only are you showing regulators that you are in compliance, but you will also have a much better chance of getting a liability payout from your insurance company.  But this is something that you should not go at alone.  You really need to have a compliance specialist help you out with this, such as a vCCO or vDPO.  Also, if you have to mark out certain datasets, try to come up with and use fake identifiers, just to play it even safer.
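
For the fake identifiers, one common approach is to replace the real identifier with a keyed hash, so the same customer always maps to the same stand-in but the original value cannot be read off the dataset.  This is a minimal sketch using Python's standard hmac and hashlib modules; the function name, the "user_" prefix and the 16-character truncation are my own assumptions, and the key shown is a placeholder that would have to live in a proper secrets store.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Map a real identifier to a stable fake one using HMAC-SHA256."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncated for readability; keep more characters if collisions are a concern.
    return "user_" + digest[:16]


# Placeholder key for illustration only; never hard-code a real key.
key = b"replace-with-a-key-from-your-secrets-store"

alias_a = pseudonymize("jane@example.com", key)
alias_b = pseudonymize("jane@example.com", key)
alias_c = pseudonymize("john@example.com", key)

print(alias_a == alias_b)  # True: same person, same fake identifier
print(alias_a == alias_c)  # False: different people, different identifiers
```

Keep in mind that whoever holds the key can re-link the fake identifiers to real people, so the key needs the same level of protection as the PII itself.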

My Thoughts On This:

So as you have fun experimenting with your ML and AI systems, just keep in mind that you, too, are subject to the tenets of the GDPR and the CCPA in this regard.  Everybody has a stake in this, but the overall responsibility ultimately comes down to the CISO, so make sure to keep him or her always apprised.

Also, make sure that you purchase your datasets (which will be used for training purposes) from a vendor that has a good, solid reputation, and that is also abiding by the rules and regulations of the data privacy laws.  If need be, you may even have to institute an entirely new vetting process for this.
