In the world
of AI today, we are certainly hearing a lot of buzzwords that are floating around
today. A lot of them of them come from the
vendors themselves, most notably those of Google, Microsoft, and OpenAI. But on a technical level, the only one that
most people have at least heard of is that of “Generative AI”.
Simply put,
this is where you submit a query to ChatGPT, and the output to it (which is
actually the answer you are looking for) can come in a wide variety of formats,
ranging from the simple text answer to even an audio or video file.
But another
integral part of AI that is going to also take the world by “storm” is that of Large
Language Models, also known as “LLMs” for short. But before we go any further on this, it is
first important to define it, which as follows:
“Large
language models (LLMs) are a category of foundation models trained on immense
amounts of data making them capable of understanding and generating natural
language and other types of content to perform a wide range of tasks.”
(SOURCE: https://www.ibm.com/topics/large-language-models)
So while you
think that ChatGPT already uses large amounts of data for it to learn, and answer
your queries, the LLM can take datasets that are at least 100X as large and
still have the ability to generate the right outputs. Some differentiating factors between this and
other areas of AI, such as Machine Learning and Neural Networks include:
*It needs to
be hosted on several Virtual Machines given the size of the datasets that they
process.
*It also
tries to comprehend the human language that is spoken to it, and even tries to
create the output in the same way.
But given its
sheer power, LLMs are also prone to be in the cross hairs of the Cyberattacker. For example, if an LLM is used in a Chatbot
(or “Digital Personality”), it can actually be quickly manipulated in such a
way that it can easily launch a Social Engineering Attack. For instance, after the tool has developed a good,
and trusting rapport with the end user, the conversation can then shift to him
or her giving away their confidential information.
So in order
to help mitigate this risk of happening, it is very important to establish a
set of best practices and standards that you should follow. Here are some starting points:
1)
Always
keep an eye:
One
of the cardinal rules in Cybersecurity is to always keep tabs on abnormal
behavior. But if your organization is
large enough in terms of endpoints and network security devices, this can be an
almost task to do for your IT Security team to accomplish in a timely
fashion. Therefore, for the purposes of automation,
and to only provide those messages and warnings that are truly legitimate, you should
seriously consider using a Generative AI based tool in this regard. But keep in mind that that this too will have
to be trained, so it can learn what to look out for in the future with regard
to unusual trends.
2)
Create
solid prompts:
The
advent of ChatGPT has created a new field called “Prompt Engineering”. This is the art of writing queries that will
guide the Generative AI model or LLM into giving you the most specific answer
possible. For example, when you type in
keywords in Google, within seconds, you get a list of a ton of resources that
you can use to find the answer to your question. But this is not the case with Generative
AI. The goal of it is not to give you a
list of resources to use (unless you actually ask for that), its objective is
to give you the best possible answer the first time around. But in order to do this, at the sending end,
you need to craft a query that will allow for it to happen. This is not something that you can learn from
taking an online class, it comes with lots of time as well as practice. There are tools available to help you to do this,
and I know for a fact that CoPilot from Microsoft, has a library of prompts
that you can use and even further customize to your own needs. But, creating open ended prompts can also
pose a security risk to the LLM. Therefore,
if you are going be using something like ChatGPT quite heavily, it is highly
recommended that you get better at “Prompt Engineering”.
3)
Keep
training ‘em:
Unfortunately,
many people think that once you have an AI model in hand, it will always work
forever. While this is true, the performance
of it will degrade over time quickly if you don’t keep optimizing it. By this I mean that you are constantly giving
it datasets for it to keep on learning.
But keep in mind also that these datasets have to be cleansed and optimized,
to make sure that there are no levels of skewness or outliers that persist. Remember in the end, all AI is “Garbage In
And Garbage Out”. In other words, the outputs
that you get from it are only as good as the datasets that you feed into it.
4)
Keep
‘em safe:
Not
everybody in your organization needs to know the proverbial “Secret Sauce” that
creates the foundation for your Generative AI model or LLM. Therefore in this regard, access should be highly
restricted to those who need to have
it. Even in these cases, make sure that
you are following the concepts of “Least Privilege” which explicitly states that
the rights, privileges, and permissions that have been assigned are no longer what
needs to be done in terms of the job tasks.
5)
Find
the holes:
Just
like anything else in Cybersecurity, even Generative AI models and LLMs are
prone to having their fair share of weaknesses and gaps. Therefore, you need to be able to find and remediate them quickly. Some of the best ways to do this are through
Penetration Testing and Vulnerability Scanning.
Also, you can implement a methodology called “Adversarial Testing”. In this scenario, you are taking the mindset
of a Cyberattacker, and breaking down your models to see where all of the weak
points are at.
My
Thoughts On This:
The above list
is to get you started on thinking about how important it is to secure your
Generative AI models and LLMs. If you
don’t take this seriously, you could be facing a huge Data Exfiltration Attack. Also, it is very important to keep in mind
that all of the datasets you use and store for the purposes of AI now also come
under the data privacy laws, those of the GDPR, CCPA, HIPAA, etc.
If you don’t
have the right controls in place and face a security breach, you could be prone
to a very exhaustive audit and even face very harsh penalties as a result. For more details on this, click on the link
below:
No comments:
Post a Comment