Sunday, January 26, 2025

Breaking Down What Shadow Gen AI Is All About

It seems like we can never get away from this one topic: Generative AI.  The bottom line is that whether you love it or hate it, use it or not, it is going to be with us for a long time to come.  Of the many facets of Generative AI that touch our lives, the most influential is ChatGPT.  Personally, I have never used it, and I have made a promise to myself never to use it.

Anyways, differences aside, there is a new trend emerging, and it is called “Shadow Generative AI”.  It is just like its cousin, “Shadow IT”.  With Shadow IT, employees either keep using outdated software or download non-sanctioned applications, either because they are creatures of habit who do not want to adopt what is new, or because they are retaliating against something that has happened to them personally at their workplace.

“Shadow Generative AI” works the same way.  It is the situation where employees use non-approved Generative AI models to help them do their daily job tasks.  Because of the inherent risks this can bring, many businesses in Corporate America are now cracking down on it.  Consider some of these stats:

*Using non-approved Generative AI models is now banned outright in the healthcare and financial sectors.

*Even technology companies are cracking down, among them Apple, Samsung, and even Amazon.

*An alarming 74% of ChatGPT usage in the workplace takes place through accounts that have not been approved for corporate use.  The same can also be said of its competitors, Gemini and Bard.

*Given how quickly Generative AI is developing, companies are finding it difficult to enforce their data security policies, and at least 27% of the data that is fed into these models is not secured.

(SOURCE:  The Security Risk of Rampant Shadow AI)

The main reason “Shadow Generative AI” carries such a huge risk is that the datasets fed into a model are its lifeblood.  For example, large amounts of data are needed not only to initially train the model, but also to keep it learning going forward.

Because of this, datasets have become a prized target not only for the Cyberattacker, but also for those rogue employees who are considering launching an Insider Attack.

So now all of this begs the question: How can a CISO and their IT Security team mitigate the risks of “Shadow Generative AI” in the first place?  This can be examined from a couple of different angles, but let us start at the heart of the matter: the datasets.

Every effort must be taken to ensure that they are as secure as possible.  You can even think of this as a “Wash, Rinse, and Repeat” cycle:

1)     Before Ingestion:

Make sure that all pathways leading from the database to the actual model are secure.  It is at this first point that the datasets are “ingested”, and thus it can be considered the most vulnerable point.  Aside from this, you must make sure that the datasets are cleansed and optimized as much as possible.  This is imperative, because if they are not, the processing of any submitted query, and the output created from it, will be skewed.
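To make this concrete, here is a minimal sketch in Python of what an ingestion-time cleansing step could look like.  The regular expressions and the record handling are illustrative assumptions made for the sketch, not a reference to any particular product; a real deployment would lean on a vetted PII-detection library.

import re

# Illustrative patterns for two common kinds of PII; these regexes are
# assumptions made for the sketch, not production-grade detectors.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def cleanse_record(text: str) -> str:
    """Scrub obvious PII from a record before it reaches the model."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED-{label.upper()}]", text)
    return text

def ingest(records: list[str]) -> list[str]:
    """Cleanse, trim, and deduplicate records prior to training."""
    cleansed = [cleanse_record(r.strip()) for r in records if r.strip()]
    return list(dict.fromkeys(cleansed))  # drop exact duplicates, keep order

print(ingest([
    "Contact jane.doe@example.com for details.",
    "Contact jane.doe@example.com for details.",
    "SSN on file: 123-45-6789",
]))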

2)     In Process:

The appropriate controls also need to be put into place to protect the model itself.  This simply means that only authorized employees should be able to gain access to it, such as the data scientists who built the model and the data analysts who examine the processes taking place within it.
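As a rough illustration, the sketch below gates queries behind an approved list of roles.  The role names and the run_inference stand-in are hypothetical assumptions of the sketch; the point is simply that the model endpoint refuses anyone who is not on the list.

AUTHORIZED_ROLES = {"data_scientist", "data_analyst"}

def run_inference(prompt: str) -> str:
    # Stand-in for the approved, internally hosted model endpoint.
    return f"(model output for: {prompt})"

def query_model(user_role: str, prompt: str) -> str:
    """Refuse any query from a role that is not on the approved list."""
    if user_role not in AUTHORIZED_ROLES:
        raise PermissionError(f"Role '{user_role}' may not query the model.")
    return run_inference(prompt)

print(query_model("data_analyst", "Summarize the access logs."))
# query_model("marketing_intern", "...") would raise PermissionError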

3)     What Comes Out:

In the end, once all the datasets have been inputted and processed per the query that was submitted by the end user, the result is the answer to the question, or in more technical terms, the “output”.  But even here, the appropriate safeguards must be put into place, not only to ensure the privacy of the end user, but also because outputs created for market intelligence purposes can be considered Intellectual Property, which needs to be highly protected.
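Here is a minimal sketch of such an output safeguard, assuming a simple email pattern as the stand-in for personal data and a hypothetical classification label for outputs that count as Intellectual Property:

import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def safeguard_output(output: str, market_intelligence: bool) -> dict:
    """Redact personal data from an output and label IP-sensitive results."""
    cleaned = EMAIL.sub("[REDACTED-EMAIL]", output)
    # The label below is an assumed convention for routing IP results to
    # restricted storage, not a standard.
    label = "confidential-ip" if market_intelligence else "internal"
    return {"text": cleaned, "classification": label}

print(safeguard_output("Send the report to jane.doe@example.com.", True))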

This entire process must be run each time a new query is submitted to the model, or each time a new model is created from scratch.

The second angle of attack here is the set of policies that the CISO and the IT Security team implement with regard to the usage of Generative AI in the workplace.  Here are some key things to consider:

Ø  Use Obfuscation:  This is where Data Tokens can be created to represent the actual datasets.  As a result, even if they are hijacked, there is very little that the Cyberattacker can do with them.  A short sketch of this appears after this list.


Ø  Watch The Access:  In this regard, you will want to follow the concept of “Least Privilege”.  This is where you assign only those rights, permissions, and privileges that are absolutely necessary for the employee to do their job, and no more than that.  This also holds true for the Generative AI models: the scientists and analysts who work on them should only be given what they absolutely need.
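To show the Data Token idea from the first bullet above, here is a minimal sketch.  The in-memory dictionary stands in for a hardened token vault, which is an assumption of the sketch; the key point is that a hijacked token reveals nothing without access to that vault.

import secrets

# The dict below stands in for a hardened token vault; that, and the
# token format, are assumptions made for this sketch.
_vault: dict[str, str] = {}

def tokenize(value: str) -> str:
    """Replace a sensitive value with a random token before it is shared."""
    token = f"tok_{secrets.token_hex(8)}"
    _vault[token] = value
    return token

def detokenize(token: str) -> str:
    """Recover the original value; only the vault holder can do this."""
    return _vault[token]

card = tokenize("4111-1111-1111-1111")
print(card)              # e.g. tok_9f2a... (useless if hijacked)
print(detokenize(card))  # the original, recoverable only via the vault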

My Thoughts on This:

There is another motivating factor for the CISO and their IT Security team to be on their toes.  All of the datasets that are used are now starting to come under the purview of the data privacy laws, such as the GDPR and the CCPA.  Meaning, if the controls are not in place, or have not been further optimized to protect the data, the business could also face an exhaustive audit and steep financial penalties.

In the end, all of this may seem to be a huge rat race, and in fact, it can be quite overwhelming.  IMHO, it is thus very important to break the “Wash, Rinse, and Repeat” cycle down into smaller tasks so that it becomes much more manageable.
