
AI Use Cases For The Modern Cybersecurity Professional

Incorporating AI as a tool available to security professionals can lower barriers to complex tasks and improve overall productivity. Here are a few ideas to get started.

Once the initial excitement of OpenAI’s ChatGPT subsided, it became apparent that these tools in their current form are extremely useful and powerful assistants to the daily tasks of anyone in a technical role, much like search engines, but…different. This article provides a framework for how AI can fit into the toolkit of the modern cybersecurity professional.

What kind of AI are we talking about?

We’re talking about generative AI, which uses large language models and machine learning to generate a host of outputs: answers to questions, images based on seed parameters, or complete essays on a given topic. If generative AI were a region of the human brain, it would be the frontal lobe, where language and communication are controlled.

This amounts to a very handy tool for retrieval of information using complex requests. Specific to cybersecurity and IT, these tools are assistants or helpers in getting:

  • Deterministic responses
    Responses that can be easily tested as correct or incorrect
  • General guidance
    Advice that is widely available via search engines but well summarized and curated by AI
  • Research
    Similar to general guidance but a deeper dive on any given topic, or a search for topical references

What is AI good at? What is it NOT good at?

AI excels at problems that have deterministic solutions, such as programming code, scripting languages, and query languages. If a large body of examples exists for how a particular programming problem is solved, an AI can readily retrieve and adapt them. The more published examples available, the higher the theoretical accuracy and confidence of the response.

This may seem ironic, but generative AI is not good at math. Because it is based on a language model, it calculates the probability of the next token in a cascading, linear fashion. Math is a logic- and formula-based operation, so it is not necessarily handled well by AI. How does it write code so well, then? Because code is a programming language with syntax and rules, like any language.
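
To see why, here is a minimal sketch of greedy next-token generation (the vocabulary and probabilities are invented for illustration and are not from any real model):

# Toy "language model": each context maps to invented next-token odds.
# Generation is greedy and linear -- the cascading process described
# above. The model never computes 2 + 2; it emits whichever token was
# most probable after this context in its training data.
probs = {
    ("two",): {"plus": 1.0},
    ("two", "plus"): {"two": 1.0},
    ("two", "plus", "two"): {"equals": 1.0},
    # If sloppy examples dominated training, the wrong sum wins:
    ("two", "plus", "two", "equals"): {"five": 0.6, "four": 0.4},
}

def generate(tokens):
    # Keep appending the single most probable next token
    while tuple(tokens) in probs:
        dist = probs[tuple(tokens)]
        tokens.append(max(dist, key=dist.get))
    return " ".join(tokens)

print(generate(["two"]))  # -> "two plus two equals five"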

When it comes to digesting and summarizing complex information or providing guidance on a topic, AI can be incredible, as long as it truly has been trained on that topic. If not, it can confidently make something up. This is called a "hallucination," and it is problematic.

With that in mind, here are some ideal use cases for AI:

Use Cases for Security Professionals

With the above background and caveats in mind, here are four use cases primed for AI assistance. Each falls into one of the above categories: Deterministic, General Guidance, or Research:

Deterministic #1. Procedures for common types of security tasks

A common scenario for an analyst might be writing SIEM rules, EDR hunt queries, or code to build out SOAR automations.

Here’s an example to generate a Sigma rule that could be used when configuring a SIEM:

Generate a Sigma rule for SIEM
Write a Sigma rule that blocks an IP address at a threshold traffic rate if the IP address is based in India

(YAML)

title: Traffic from India-based IP addresses exceeding a threshold rate
description: Detects source IP addresses geolocated to India whose traffic exceeds a threshold rate so they can be blocked at the firewall
status: experimental
author: ChatGPT

logsource:
    category: firewall
    product: any

detection:
    selection:
        # Field name depends on your SIEM's GeoIP enrichment mapping
        src_geo_country: IN
    timeframe: 1m
    # Legacy sigmac-style aggregation: fire when a single source IP
    # produces more than 10 matching events within the timeframe
    condition: selection | count() by src_ip > 10

level: medium
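
Note that Sigma is a detection format rather than an enforcement mechanism: a rule like this flags the offending sources, and the block itself happens downstream at the firewall or in a SOAR playbook. In practice the rule gets converted into a query for your SIEM. As a rough sketch using the open-source sigma-cli tool (the rule filename is an assumption, and exact flags vary by version and backend):

sigma convert -t splunk india_threshold.yml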

Deterministic #2. Code Writing Assistance

Writing code is an excellent use of AI. Programming languages are well-defined, and examples abound on the internet for how to accomplish almost anything.

SQL, for example, is a fairly simple and well-documented query language, but writing lengthy queries can be tedious and demands time and concentration from even the most experienced data analysts. AI excels at converting a natural-language data analysis request into SQL.
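
For instance, a hypothetical exchange in the same vein (the table and column names are invented for illustration, and the query assumes a PostgreSQL-flavored database):

Convert a natural language request to SQL
Show me the ten source IPs with the most failed logins in the last 24 hours, from a table called auth_events

SELECT src_ip, COUNT(*) AS failed_logins
FROM auth_events
WHERE outcome = 'failure'
  AND event_time >= NOW() - INTERVAL '24 hours'
GROUP BY src_ip
ORDER BY failed_logins DESC
LIMIT 10;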

AI will write blocks of procedural code in languages like Python, though it can take several iterations and corrections to get what you want. But for the novice coder, the running leap it provides by outlining a general structure and importing the right libraries can be a huge time saver. Example:

Code writing assistance
Write a script in Python that uses native libraries and reads through a CSV file of IP addresses to create IPTables rulesets to block the IP addresses

Sure, here's an example script in Python that uses the csv and subprocess libraries to read through a CSV file of IP addresses and create IPTables rulesets to block them:
import csv
import subprocess

# Path to CSV file containing IP addresses
csv_file = 'ip_addresses.csv'

# Define the chain and table to use
chain = 'INPUT'
table = 'filter'

# Loop through each IP address in the CSV file
with open(csv_file, 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        # Skip any blank lines in the CSV
        if not row:
            continue

        # Extract the IP address from the row
        ip_address = row[0].strip()

        # Construct the IPTables command to block the IP address
        command = f'iptables -t {table} -A {chain} -s {ip_address} -j DROP'

        # Execute the IPTables command (requires root privileges)
        subprocess.run(command.split(), check=True)
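
As a usage sketch (the script filename and addresses are invented for illustration): populate ip_addresses.csv with one address per line, such as 203.0.113.5, then run the script as root, e.g. sudo python3 block_ips.py, since iptables can only modify firewall rules with elevated privileges.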

Another scenario is debugging. Feeding AI a block of code that isn’t working as you’d expect, along with a pointed question about what you want it to do, can resolve the issue much more quickly than diligently reading through sometimes-cryptic debugger output.
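
For example, here’s a classic Python pitfall that reads fine but misbehaves; the snippet is invented to illustrate the kind of prompt that works well:

def add_alert(alert, queue=[]):
    # Bug: the default list is created once at definition time and is
    # shared across calls, so unrelated invocations accumulate alerts
    # in the same queue
    queue.append(alert)
    return queue

print(add_alert("brute force"))  # ['brute force']
print(add_alert("port scan"))    # ['brute force', 'port scan'] -- surprise

Pasting this function along with the question "why does the second call return both alerts?" will typically get back an immediate explanation of the mutable default argument and the standard queue=None fix.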

Security analysts and IT professionals are usually required to be jacks-of-all-trades, not necessarily software engineers. Leaning on AI can save a ton of time.

General Guidance

Many of the tasks that security engineers, analysts, and IT admins perform are very procedural in nature, and AI is very good at returning procedures that are accepted by the industry as the right way to do things.

Here’s an example exchange to get guidance for a common task:

Steps to back up a primary Active Directory Domain
Hi, can you tell me the steps to back up a primary Active Directory Domain?

1. Take a complete system state backup of the primary Domain Controller. 

2. Ensure that the backup is stored in a secure off-site location.

3. Run the NTDSUTIL command to create a copy of the Active Directory database.

4. Ensure that the database file is stored in a secure off-site location.

5. Create a backup of the Group Policy Objects (GPOs).

6. Ensure that the backup is stored in a secure off-site location.

7. Create backups of the SYSVOL folder (including all the scripts and Group Policy Templates).

8. Ensure that the backup is stored in a secure off-site location.

9. Create a backup of the DNS configuration.

10. Ensure that the backup is stored in a secure off-site location.
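
For step 1, as a concrete illustration, the built-in Windows Server Backup tooling can take a system state backup from an elevated prompt (the target drive letter is an assumption):

wbadmin start systemstatebackup -backupTarget:E: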

It’s a good place to start if you can present at least part of the problem or objective. The process may be iterative, with new answers presenting new threads to follow.

Research: Threat Intelligence

Given the nature of threat intelligence research, a lot of searching and correlation between multiple sources is involved. One clue can lead to the next, so a foundation of known, reliable sources that give context is important for generating accurate and actionable intel.

In theory, the research capabilities of AI will get very good if it can provide exact referential sources for its information. Until recently, ChatGPT replied that it was unable to provide sources, claiming its body of knowledge is essentially an amalgam of its datasets. But traceability of information and sources is critical in research.

If an AI response does include sources, it is very possible they are hallucinations that don’t actually exist, which makes it important to vet any cited sources. In my own testing of threat intelligence from ChatGPT, using Emotet malware as an example, the sources it provided in response to my request for information did in fact resolve to actual articles.

Different types of information need to be attributed differently. An article with background on a particular malware might remain accurate and useful for years, but a malicious IP address tracker or a list of known C2 hosts deals with far more ephemeral data and requires frequent updates. Can AI provide accurate references within this tighter window of time? Expectations may need to be tempered.

Let’s ask an AI for threat intelligence on a particular malware:

Malware research
What kind of malware is miniduke?

Miniduke is a type of malicious software (malware) designed to steal sensitive information from infected computers. It is a form of advanced persistent threat (APT) malware that is typically used for targeted attacks.

Can you write a Splunk query that will detect it downloading additional malicious code?

The following Splunk query can be used to detect Miniduke downloading additional malicious code:

index=* sourcetype=access_* | regex _raw="https?://[^ ]*\.(dll|exe|bat)" | where http_user_agent="Miniduke" | table userip, request, http_user_agent

Some of it is a little high-level, but it could certainly be used to prime the knowledge of an analyst working to quickly educate themselves on a firehose of incoming threats.

Here’s a good example of why awareness of hallucinations is important:

Hallucination example
Can you tell me about this vulnerability? CVE-2020-1472
 
CVE-2020-1472 is a vulnerability in Microsoft's implementation of the Netlogon protocol. It allows an attacker with access to a domain controller to compromise all computers in the domain, including any servers and workstations, without any user interaction. This vulnerability can be used to gain Domain Admin privileges, allowing an attacker full control over the domain. It was discovered in August 2020 and Microsoft has released a patch to address the vulnerability.

How about this one? CVE-2023-28343

CVE-2023-28343 is a vulnerability in the Apache Tomcat application server. It allows an attacker to access and execute arbitrary code on the server. This vulnerability can be used to gain remote access to the server and potentially compromise other resources in the network. The vulnerability was discovered in August 2021 and a patch has been released to address it.

The first response is a correct summary of CVE-2020-1472. The second response is a hallucination: the query was originally made in 2022 while testing OpenAI, before any CVEs from 2023 had been assigned, yet it is amazingly convincing. CVE-2023-28343 has since been assigned as of this writing, and it covers a completely different technology and vulnerability.

Caveats

While the current crop of generative AIs is getting better, there are a few issues that can cast doubt on any answer returned.

Hallucination is the term for an answer provided by an AI with apparent full confidence despite being completely wrong. This is a tricky area of AI, as erroneous answers can be nearly indistinguishable from plausible ones.

Hallucinations have a few causes. One is the age of the dataset the AI was trained on: it may not contain any data about recent events that bear on the question. Another is that the supervised learning that uses labeling to give data context may be factually correct but irrelevant or meaningless in the context of the overall dataset. AIs will provide an answer without “knowing” that their data is incorrect.

Value Proposition

When considering whether to integrate AI as a tool in cybersecurity work environments, the conclusion is pretty simple: it is very useful with minimal downside.

Cost savings and productivity increases are easily measured by comparing task duration with and without AI assistance. If a task normally takes a resource 6 hours to complete but only 2 hours with the assistance of AI tooling, the net ROI is clearly positive.
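
As a back-of-the-envelope sketch of that arithmetic (every figure here is an invented assumption):

# Illustrative ROI arithmetic -- all numbers are assumptions
hours_without_ai = 6
hours_with_ai = 2
tasks_per_month = 20
hourly_cost = 85  # fully loaded analyst cost in USD

hours_saved = (hours_without_ai - hours_with_ai) * tasks_per_month
monthly_savings = hours_saved * hourly_cost
print(f"{hours_saved} analyst-hours and ${monthly_savings:,} saved per month")
# -> 80 analyst-hours and $6,800 saved per month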

As with anything that requires some confidence in information or intelligence, understanding the caveats and using an approach that incorporates fact checking and validation will be required to avoid falling victim to inaccuracies.

The author

Eron Howard is Chief Operating Officer at Novacoast, a cybersecurity services firm and managed services provider spread across the U.S. and U.K.
