It seems like all the news headlines in Cyber today are about Generative AI and its many subsets, such as Large Language Models (also known as "LLMs"). I have covered this topic very extensively in the four books that I have written about it, as well as in the white papers, articles, and blogs that I have written for other people. But there is one area that, unbelievably, I have not yet touched upon, and that is what is known as "AI Data Poisoning".
You may be wondering what it is, so here is a technical definition of it:

"Data poisoning is a type of cyberattack where threat actors manipulate or corrupt the training data used to develop artificial intelligence (AI) and machine learning (ML) models."

(SOURCE: What Is Data Poisoning? | IBM)
Remember, as I have written about in the past, what drives a Generative AI model is the data that is fed into it. It can easily be compared to a car, which needs gasoline to run and go places. Likewise, it is the data that fuels the model and gives it the momentum it needs to produce an answer, or output, to the query that has been submitted to it.
But keep in mind that not just any output will do. It must meet what the end user is looking for. To make sure this happens, whoever is in charge of the model must ensure that the datasets fed into it are cleansed and robust, as well as free of any statistical outliers.
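As a rough illustration of that cleansing step, here is a minimal sketch (assuming numeric, tabular data and a simple z-score rule; real data pipelines are far more involved) that drops statistical outliers before the data is fed into a model:

```python
import numpy as np

def cleanse(dataset: np.ndarray, z_threshold: float = 3.0) -> np.ndarray:
    """Drop any row whose value lies more than z_threshold standard
    deviations from the column mean -- a simple statistical-outlier filter."""
    z_scores = np.abs((dataset - dataset.mean(axis=0)) / dataset.std(axis=0))
    return dataset[(z_scores < z_threshold).all(axis=1)]

rng = np.random.default_rng(1)

# Twenty well-behaved values near 1.0, plus one extreme value slipped in.
raw = np.concatenate([rng.normal(1.0, 0.1, 20), [50.0]]).reshape(-1, 1)
clean = cleanse(raw)
print(len(raw), "->", len(clean))  # 21 -> 20
```

A z-score cutoff is only one of many possible rules here; the point is that the filtering happens before training, which is exactly the stage a Poisoning Attack targets.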
Using our car example again, you need to give it the right kind of fuel so that the engine will not get damaged (for instance, you do not pump diesel fuel into a Honda). The same is true of the Generative AI model. It needs the right data to make its algorithms (which are its engine) run just as smoothly.
But Generative AI is a field that is changing on an almost daily basis. As a result, trying to deploy the latest Cybersecurity controls can be an almost impossible task to accomplish. The Cyberattacker is fully aware of this and knows the vulnerabilities that are present. Thus, they launch what are known as Poisoning Attacks to insert fake data into the model.
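To make this concrete, here is a toy sketch, not tied to any real product, of how inserted fake data can blind a model. The "model" here is deliberately simple (a basic anomaly detector that learns a "normal" traffic threshold from its training set), and all the numbers are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(42)

def fit_detector(train: np.ndarray) -> float:
    """A naive anomaly detector: flag anything beyond mean + 3 sigma
    of the training data."""
    return train.mean() + 3 * train.std()

# Legitimate training data: normal traffic around 100 requests/minute.
clean_train = rng.normal(100, 5, 1000)

# Poisoning Attack: the attacker slips extreme values into the training
# set, so the model "learns" that very high traffic is normal.
poison = np.full(100, 500.0)
poisoned_train = np.concatenate([clean_train, poison])

attack_value = 200.0  # clearly anomalous versus real normal traffic

clean_threshold = fit_detector(clean_train)
poisoned_threshold = fit_detector(poisoned_train)

print(bool(attack_value > clean_threshold))     # True: clean detector flags the attack
print(bool(attack_value > poisoned_threshold))  # False: poisoned detector misses it
```

The same principle scales up: corrupt what the model learns from, and every decision it makes afterward is skewed in the attacker's favor.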
But it does not stop here. They can also quite easily insert a malicious payload to serve two key purposes:

- Launch another Supply Chain Attack (just as we saw with SolarWinds and CrowdStrike) that could have huge, cascading effects.

- Launch a Data Exfiltration Attack to not only steal the legitimate datasets that are being used in the model itself, but also those datasets which reside in the IT and Network Infrastructure of a business entity.
So, given all of this, there are three trends that are expected to happen at some point down the road, which are as follows:
1) Back to SolarWinds:

Yes, I know I just mentioned this, but the kind of attack that can happen here to a Generative AI model will be magnified by at least ten times because of a Poisoning Attack. To put it into perspective: when the SolarWinds hack took place, there were about 1,000 victims. Now, there could be at least 10,000 victims or even more, all over the world. In this regard, the main point of insertion for a malicious payload would be the LLM, if there is one present.
2) The Role of the CDO:

This is an acronym that stands for "Chief Data Officer". This job title can be compared to that of the CISO, but their focus is on the datasets that their company has and is currently using. Up until now, their main task was simply to write the Security Policies that would help fortify the lines of defense around a Generative AI model. But with the advent of Data Poisoning, their role will now shift to hiring and managing a team of employees whose sole mission is the cleansing and optimization of the datasets before they are fed into the model. Another key role for them is to make sure that whatever datasets they are using comply with the data privacy laws, such as the GDPR and the CCPA.
3) It Is Going to Happen:

Just as Phishing has stuck around, so will Poisoning Attacks. They will start to evolve this year and pick up steam later on. As companies keep using Generative AI, this will become a highly favored threat variant for the Cyberattacker. In fact, according to a recent market survey that was conducted by McKinsey, over 65% of businesses today use Generative AI on a daily basis. To see the full report, access the link below:

http://cyberresources.solutions/Blogs/Gen_AI_Report.pdf
My Thoughts on This:

I am far from being an actual Generative AI practitioner, but I would like to offer my opinion as to how you can mitigate the threat of a Poisoning Attack impacting your business:
- Generative AI models are not just one thing. The model, or models, is connected to many other resources in the external world. There are a lot of interconnectivities here, so I would recommend keeping a map or visual to track all of this, and updating it on a real-time basis as more connections are made. This will also give you a clear idea as to exactly where you need to deploy your Cybersecurity controls in the Generative AI Ecosystem.
- If you can, hire a CDO as quickly as you can. You do not have to hire them as a full-time employee; you can also hire them on a contract basis to keep them affordable. But you will need them ASAP if you are going to make use of Generative AI based models.
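That mapping idea can even be sketched in code. The component names below are purely hypothetical; the point is that once the interconnectivities are recorded as a simple graph, you can compute exactly which parts of the ecosystem a poisoned data source could reach, and that is where your controls belong:

```python
from collections import deque

# Hypothetical map of a Generative AI ecosystem: each component lists
# the components it feeds data into. All names are illustrative only.
ecosystem = {
    "public_web_scrape": ["training_pipeline"],
    "vendor_dataset":    ["training_pipeline"],
    "training_pipeline": ["llm"],
    "llm":               ["chat_app", "internal_api"],
    "chat_app":          [],
    "internal_api":      ["crm_database"],
    "crm_database":      [],
}

def downstream(graph: dict, source: str) -> set:
    """Breadth-first walk: everything reachable from one data source --
    in other words, everything a poisoned source could influence."""
    seen, queue = set(), deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(sorted(downstream(ecosystem, "public_web_scrape")))
```

Even a lightweight visual like this, kept current as new connections are made, shows at a glance how far a single tainted dataset can propagate.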
Poisoning Attacks are going to be around for a long time. So, now is the time to get prepared!!!