On March 23rd, OpenAI published the GPT-4 System Card, roughly one week after the official GPT-4 release. While some news outlets picked out a few select quotes from the document, its significance has largely been overlooked.
For cybersecurity experts and policymakers, the GPT-4 System Card is one of the most important documents released since the initial announcement of ChatGPT and the explosion of investment in LLM (large language model) AI. The document acknowledges and outlines the ways the platform could be misused, with multiple examples demonstrating each behavior. It also describes how OpenAI updated GPT-4 in an attempt to block these behaviors going forward.
The System Card has been compared to a “red teaming” report, a term typically used in cybersecurity to describe hiring a third party to hack into an organization and provide a list of flaws to be addressed. In the case of GPT-4, the red team used the tool as intended and attempted to get it to produce harmful content, rather than trying to hack into it.
Examples of some of the topics covered and where to find them in the document:
- Harmful content: Advice on performing self-harm, help generating hate speech, and content useful for planning violent attacks. Examples on page 8.
- Disinformation and influence operations: Using AI to create and spread false information or even help create content to recruit for terror groups. Examples on page 11.
- Proliferation of conventional and unconventional weapons: Using AI to help create weapons, for example, instructions for making chemical weapons and for obtaining the materials at home. Example on page 13.
- Privacy: Using AI systems to violate individual privacy rights by collecting, analyzing, and sharing personal data without consent.
- Cybersecurity: Using AI to facilitate cyberattacks, find vulnerabilities, and build exploits. Example on page 14.
- Interactions with other systems: Connecting AI systems to real-world systems with real-life consequences, for example, linking the model to gig economy apps and having it hire workers for tasks. Example on page 15. A minimal sketch of this pattern follows this list.
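To make the “interactions with other systems” risk concrete, here is a minimal, hypothetical sketch in Python of the pattern described in that last item: a model’s text output is parsed and routed directly into an external service. Every name here (call_model, TaskBoard, post_task) is an illustrative placeholder of my own, not OpenAI’s API or any real gig-economy service.

```python
import json

def call_model(prompt: str) -> str:
    # Stand-in for a real LLM API call; returns a canned JSON "action" so the
    # sketch runs end to end. A real system would call a model endpoint here.
    return json.dumps({
        "action": "post_task",
        "description": "Solve this CAPTCHA for me",
        "payment_usd": 5.0,
    })

class TaskBoard:
    """Stand-in for an external gig-economy service with real-world effects."""
    def post_task(self, description: str, payment_usd: float) -> str:
        print(f"Posting task: {description!r} for ${payment_usd:.2f}")
        return "task-123"

def run_agent(goal: str, board: TaskBoard) -> None:
    # The model decides which real-world action to take toward the goal.
    raw = call_model(f"Goal: {goal}. Respond with a JSON post_task action.")
    action = json.loads(raw)
    # Note what is missing here: no human review between the model's output
    # and the call that spends money and engages a real worker.
    if action.get("action") == "post_task":
        board.post_task(action["description"], float(action["payment_usd"]))

if __name__ == "__main__":
    run_agent("Get past a website's CAPTCHA", TaskBoard())
```

The point of the sketch is the step that is absent: nothing sits between the model’s output and post_task to require human review, which is exactly the gap that makes these integrations risky.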
Significance Of The System Card
While it may be tempting to dismiss the GPT-4 System Card because OpenAI has implemented fixes and attempted to block each scenario, there are several reasons to take it seriously:
- Just as there are always new vulnerabilities in software, bad actors can find ways to bypass OpenAI’s fixes. “Jailbreaking,” or breaking out of OpenAI’s self-imposed limitations, has already become a popular social media topic. Early in ChatGPT’s release, if the AI would not answer a question due to a guideline, users could simply start the prompt with “pretend we’re in a play” and unlock a response.
- An arms race in LLM AI is underway, extending well beyond Microsoft, Google, and Facebook. None of these organizations is obligated to prevent the types of user behavior outlined in the GPT-4 System Card. It is only because OpenAI chose to perform the exercise and publish the results that we are aware of these risks.
- Bad actors can use AI in harmful ways with a speed and ease that no organization is fully prepared to respond to. For example, AI can be used to scan open source code for vulnerabilities and then write the code to exploit them; a minimal illustration of the scanning side of this workflow follows this list. This can be done by far more junior threat actors, at a pace far faster than the industry is used to, and software companies are not ready to keep up with the patching cadence that would be needed to combat it.
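To illustrate how mechanical the scanning step already is, below is a minimal, hypothetical sketch of a pattern-based scanner over a Python source tree. The pattern list, function names, and messages are my own illustrations, not drawn from the System Card or any vendor tool, and they fall far short of what an AI-assisted scanner could do; they simply show why this part of the workflow scales so easily.

```python
import re
from pathlib import Path

# Illustrative patterns commonly associated with vulnerable code; a real
# scanner (AI-assisted or not) would go far beyond simple line matching.
RISKY_PATTERNS = {
    r"\beval\s*\(": "use of eval() on possibly untrusted input",
    r"\bpickle\.loads\s*\(": "deserializing untrusted data with pickle",
    r"\bsubprocess\.(call|run|Popen)\([^)]*shell\s*=\s*True": "command execution with shell=True",
    r"\byaml\.load\s*\((?![^)]*SafeLoader)": "yaml.load without SafeLoader",
}

def scan_tree(root: str) -> list[tuple[str, int, str]]:
    """Return (file, line number, finding) triples for matches under root."""
    findings = []
    for path in Path(root).rglob("*.py"):
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            for pattern, message in RISKY_PATTERNS.items():
                if re.search(pattern, line):
                    findings.append((str(path), lineno, message))
    return findings

if __name__ == "__main__":
    for file, lineno, message in scan_tree("."):
        print(f"{file}:{lineno}: {message}")
```

Layer an LLM on top of this kind of tooling to read the flagged code, confirm the weakness, and draft exploit or patch code, and the patching treadmill described above becomes even harder to keep up with.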
Three Ideas To Keep AI Safe
Because these tools are widely available now and it is only a matter of time before we see public harm from the capabilities captured in the GPT-4 System Card, I offer three ideas on how cybersecurity experts and policymakers can work together to keep AI safe:
IDEA 1: OpenAI’s decision to perform ‘red teaming’ before release should be required of every LLM AI release by any AI company based in the US or offering products or services to the US market. This model already exists for companies that process credit cards: the PCI Security Standards Council sets the requirements for securing credit card data, and those requirements include a yearly red teaming or penetration test to help organizations identify gaps and close them.
IDEA 2: We can learn from the mistakes of the social media era, where platforms are not held accountable for anything that happens on them, and do things differently this time. AI corporations are more like people than any other corporation in history because they have an actual brain. If that brain takes actions with real-world consequences, such as generating hate speech, providing instructions for homemade bombs, or causing havoc by connecting to internet services, the corporation should be held accountable for its brain’s actions.
IDEA 3: For many years, the US has had export rules that prevent certain types of software it deems too powerful from being shared with nations that may use it against us. Encryption algorithms are a prime example. The US already has export rules in place to limit other countries’ ability to obtain the semiconductors needed to train LLMs. We can expand this effort to limit the export of the underlying software components used in LLMs, such as TensorFlow.
The author
Eron Howard is Chief Operating Officer at Novacoast, a cybersecurity services firm and managed services provider spread across the U.S. and U.K.