Generative artificial intelligence (AI) tools like ChatGPT, Bard, and Bing Chat have revolutionized the way companies interact with customers, generate content, and perform other tasks. However, the use of these tools also raises significant concerns around data security and data protection. In this article, we’ll explore the risks of using third-party generative AI tools and offer some tips for ensuring that your company is using them safely and responsibly.
Generative AI and data security
One of the primary risks associated with using generative AI tools is the potential for data breaches or leaks. Because these tools are designed to generate content automatically, they often collect and store large amounts of data. This is the main reason why financial institutions such as JPMorgan Chase and Deutsche Bank have either restricted or outright banned the use of ChatGPT in their workplaces.
The risks involved in using these third-party generative AI tools aren’t obvious to many users. Because ChatGPT is an iterative tool that relies on machine learning, any interaction with the bot can be used as training data for future interactions. This means any information shared in the chat is saved on the AI provider’s servers, and that data may include sensitive, personally identifiable information (PII). Because this data resides outside the user’s own infrastructure, it will be hard to retrieve and delete if and when a data leak occurs.
The power of generative AI is impressive, which is why many people have embraced it and integrated it into their day-to-day tasks, such as drafting emails or generating code. This latter case, however, was a cause for concern at Samsung, which has also banned the use of generative AI tools in the workplace. The ban came after engineers uploaded source code to ChatGPT, potentially sharing sensitive intellectual property (IP) with a third party.
Using generative AI responsibly
The risks that come with using generative AI are largely tied to how people use it. Because these tools help increase our efficiency, it can be tempting to use them indiscriminately. However, once you understand how the technology works, genuine concerns around data protection quickly emerge. Here are some steps to help avoid data leaks when using generative AI tools:
- Limit the data you share: Only share the data necessary for the tool to function, and make sure that any sensitive data is either anonymized, obfuscated (for example, by converting figures to percentages or scaling them by a factor of 10), or not shared at all (see the first code sketch after this list).
- Train your employees: Make sure that all employees who use the tool are trained on data protection and privacy best practices. Help them understand the risks so they can avoid sharing sensitive IP or proprietary information.
- Keep data in-house with a local large language model (LLM): Instead of sending data to a third party, you can use AI solutions that rely on a local language model. This requires developing an LLM specific to your company, typically done by an AI solutions provider (see the second code sketch after this list).
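To illustrate the first step, here is a minimal Python sketch of redacting sensitive values from a prompt before it reaches a third-party tool. The regex patterns and the redact helper are illustrative assumptions only; production-grade PII detection usually calls for a dedicated library or service.

```python
import re

# Hypothetical patterns for a few common kinds of PII; real detection
# needs far broader coverage (names, addresses, account numbers, etc.).
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace anything matching a known PII pattern with a placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

prompt = "Draft a reply to jane.doe@example.com; her phone is 555-123-4567."
print(redact(prompt))
# Draft a reply to [EMAIL REDACTED]; her phone is [PHONE REDACTED].
```

The redacted prompt still gives the tool enough context to do its job, while the values that could identify a person never leave your infrastructure.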
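And to illustrate the third step, here is a minimal sketch of querying an open model that runs entirely on your own hardware, using the Hugging Face transformers library. The model name "gpt2" is a placeholder assumption; in practice you would load whichever open model your company has vetted or fine-tuned on its own data.

```python
from transformers import pipeline

# Load an open model locally; "gpt2" is a placeholder, not a recommendation.
generator = pipeline("text-generation", model="gpt2")

prompt = "Draft a short status update for the quarterly infrastructure review:"
result = generator(prompt, max_new_tokens=80, do_sample=True)

# After the one-time model download, inference runs in-process,
# so the prompt never leaves your own infrastructure.
print(result[0]["generated_text"])
```

Because the model weights live on your own machines, nothing typed into the prompt is sent to, or retained by, an outside provider.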
Benefits of using a local large language model (LLM)
The third option for ensuring the security and protection of company data is the path taken by Bloomberg, the financial services company. It developed its own language model, BloombergGPT, which is specifically tailored to the needs of the financial industry. The 50-billion-parameter LLM was built using Bloomberg’s financial data and domain-specific knowledge. Bloomberg’s AI team decided to leverage the rich data the company has amassed over 40 years and translate it into training data for its own generative AI solution.
BloombergGPT is a great example of a domain-specific LLM that outperforms general-purpose LLMs in a narrow field: in this case, financial markets. And although security may not have been Bloomberg’s motivation for developing its own LLM, having one ensures that its employees won’t inadvertently share sensitive data outside the company. BloombergGPT is poised to help many of the company’s employees by generating accurate reports and processing the vast amounts of data that flow through the Bloomberg Terminal.
Generative AI tools offer tremendous benefits to companies looking to automate tasks and generate high-quality content quickly and efficiently. However, it’s important to recognize that these tools also come with risks around data security and data protection. Having your own company- or domain-specific LLM will help prevent such data leaks while still giving your employees the benefits of generative AI.