CVSS – A misunderstood scoring system which leads to inefficiencies in defensive teams
Through my experience as a Penetration Tester at Novacoast, I have observed that organizations
often struggle with timely patching and accurate threat modeling of their assets. This stems not from a lack of expertise or commitment, but from limited visibility, or a threat model built on the wrong parameters.
CVSSv3 is a widely used severity scoring system, but it is commonly misused to assess risk and prioritize patching efforts for vulnerabilities. Research has shown such a patching strategy to be severely inefficient; in fact, it's about as good as patching at random! [46] This suggests a gap between how vulnerability severity is viewed, misunderstood, or outright confused with risk, and how organizations can effectively translate that information into actionable security decisions.
We’re going to explore why CVSS is not as useful as you might think, what the alternatives are, and what we recommend for accurate threat modeling of a system.
TL;DR
- CVSS assesses severity, i.e. “how bad” if it got exploited. It can’t and does not measure “how likely” it is to be exploited, and thus patching by this metric is severely inefficient.
- EPSS assesses likelihood of exploitation within 30 days. 0.8 means 80% likelihood, i.e. very likely to be exploited. Patching just by this score is mathematically more efficient.
- The big issue with any of these scores is that they capture only CVEs, i.e. vulnerabilities (cataloged bugs in software). There is a whole world of issues that can lead to an organization being hacked, such as misconfigurations or dangerous defaults, that are not captured by any score and are tricky or impossible to detect with automated tools.
- There is no substitute for actual penetration testing that can juggle the world of vulnerabilities with defaults, misconfigurations, systematic issues, bad practices, and more.
Understanding the Fundamentals: Key Terms for Effective Threat Modeling
To understand why CVSS falls short for threat modeling, we need to establish clear definitions for some crucial terms that are often confused or misused in security discussions.
Core Definitions
Threat – A destructive circumstance or a person with malicious intent that can cause loss. This loss occurs in relation to confidentiality, integrity, availability, and reputation. The most prominent threats in cybersecurity are cybercriminals, malware, and APTs.
Vulnerability – A bug or unwanted feature in software (whether an app or protocol) that allows someone with malicious intent to perform unintended actions, often causing damage or gaining remote control over computer systems. In simple terms, vulnerabilities are what hackers look for to compromise your systems.
Severity – A subjective measure of how bad a vulnerability could be. Various scoring systems attempt to quantify vulnerability severity, with CVSS being the most commonly used.
Risk – How likely it is for a threat to cause some kind of loss. When we discuss risk associated with software, we mean there is a threat with some likelihood of acting on a vulnerability that could be exploited by hackers or malware.
A Practical Example: Why Severity ≠ Risk
Let’s illustrate this critical distinction:
- Company A has assets vulnerable to vulnerability X, which scores 3 out of 10 in severity measure S
- Company B has assets vulnerable to vulnerability Y, which scores 10 out of 10 in severity measure S
For simplicity let’s say that for each score in severity metric S, you risk a loss of 1 million US dollars ($1M).
Now, consider this:
- Cybercriminal gangs have exploited vulnerability X against 850 out of 1,000 companies with this vulnerability since publication.
- The same threat groups have exploited vulnerability Y against only 10 out of 1,000 companies since publication.
- Both vulnerabilities were published simultaneously.
The result: Company A faces high risk despite the lower severity score, while Company B faces low risk despite having the more severe vulnerability.
To use a financial analogy: an 85% chance of losing $3M represents greater risk than a 1% chance of losing $10M.
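The arithmetic behind the analogy can be sketched in a few lines; the figures come straight from the hypothetical Company A / Company B example above:

```python
# Expected loss = likelihood of exploitation x potential loss.
# All figures come from the hypothetical example above.
def expected_loss(p_exploit: float, loss_usd: int) -> float:
    return p_exploit * loss_usd

company_a = expected_loss(850 / 1000, 3_000_000)    # severity 3  -> $3M at stake
company_b = expected_loss(10 / 1000, 10_000_000)    # severity 10 -> $10M at stake

print(f"Company A expected loss: ${company_a:,.0f}")  # $2,550,000
print(f"Company B expected loss: ${company_b:,.0f}")  # $100,000
```

Despite the far lower severity score, Company A's expected loss is roughly 25 times Company B's.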
What is Threat Modeling?
Threat modeling is a complex task that aims to objectively assess software (applications, operating systems, etc.) to identify threats and quantify risk.
While the core objective remains consistent, motivations can differ. Threat modeling approaches typically fall into three categories: asset-centric, attacker-centric, or software-centric.
There’s no universally “best” approach—it’s a complex undertaking with inherent trade-offs, and the chosen method should align with specific organizational goals [44].
Threat modeling expert Adam Shostack points out that threat models fail when organizations add steps without understanding their associated costs, benefits, and potential issues. They also fail when they become overly complex or introduce excessive subjectivity. Therefore, an ideal threat model should enable quick and accurate assessment while clearly defining its intended users and their required skill sets [44].
Current Threat Modeling Solutions: A Critical Analysis
Understanding existing frameworks helps illuminate why we need better approaches:
DREAD
- Structure: Mnemonic for Damage, Reproducibility, Exploitability, Affected Users, Discoverability
- Scoring: Each category receives a score between 1 and 10, with the final score being the arithmetic mean
- Fatal flaws: The “Discoverability” component promotes “Security Through Obscurity,” which is fundamentally flawed. Many practitioners moved to “DREAD-D” (removing Discoverability) or automatically rated it as 10
- Status: Microsoft discontinued DREAD in 2008 due to excessive subjectivity
STRIDE
- Origin: Microsoft threat model dating to the late 1990s
- Structure: Mnemonic for Spoofing, Tampering, Repudiation, Information disclosure, Denial of service, Elevation of privilege
- Strengths: Each element describes potential threats against applications/protocols using general terms that remain applicable today. Can be integrated throughout the development cycle and works well for threat categorization
- Limitations: Provides no numerical scoring — it’s primarily a brainstorming framework. Time-consuming and impossible to automate
Attack Trees
- Format: Diagram-based model structured as a tree
- Function: Root represents the ultimate attack goal, while branches represent possible attack paths
- Advantages: Simple to build and present visually
- Limitations: Difficult to quantify risk when used alone
- Usage: Often employed alongside CVSS and STRIDE frameworks [39]
Others
This is not meant to be an exhaustive list, but a quick overview with solid examples. Other threat models exist, such as PASTA, while others seem to have faded into obscurity, e.g. TRIKE.
Furthermore, some companies modify the existing models and thus create their “original” frameworks.
CVSS and Its Fundamental Misuse
Many automated tools exist to ease the burden on the Blue Team, helping them assess the state of the network and prioritize patching efforts. Unfortunately, CVSS seems to be the parameter that defensive teams and CISOs obsess over.
After all, if a newly found vulnerability carries a CVSS of 10.0 (Critical!), we should drop everything and patch it ASAP, right? If we stop and reflect for a moment, what do we know about this vulnerability and how it affects our business? Nothing. We know it would be bad if it were exploited, but how likely is that to actually happen? And even then, once it is exploited, what is the impact on the rest of the network or on business operations? Perhaps it’s an isolated machine, maybe even a VM, such that a full compromise causes little disruption beyond rolling back to the previous snapshot.
Often, organizations find it difficult to contextualize the risk that vulnerabilities—particularly those rated Low or Medium severity by CVSS—actually present to their business operations. The risk those issues present can be High even if their severity is dubbed low, because attackers will chain a few lower-severity vulnerabilities that, put together, allow a real threat to cause significant damage.
On the contrary, some critical-severity vulnerabilities may have taken researchers many months to develop a single unstable exploit (which they did not publish), and may trigger only under very specific conditions, making them practically impossible to reproduce and thus lowering the real risk by a significant margin.
Once again, pentests are what truly allow us to put vulnerabilities in context. But what if we don’t have an onsite, full-time pentesting team to consistently review the risk landscape? I believe there are two solutions:
- A) Change the parameter that dictates prioritization efforts to something designed specifically for this purpose, e.g. EPSS[26]. This is the low-effort, quick, small-but-tangible improvement over utilizing CVSS.
- B) Hire pentesters to review your systems so you know where you stand, then continue to develop a penetration testing program that starts with yearly tests but moves towards a more continuous cadence (quarterly, monthly).
Industry Recognition of the Problem
During the 2021 RSA Conference, security expert Allan Liska expressed serious concerns about using CVSS for organizational threat modeling. He argued that CVSS fails to adequately represent risk, emphasizing that “correct threat modeling must be risk-centric.” [32]
While Liska’s concerns stem from decades of field experience, concrete evidence appears in the recent paper “CVSS: Ubiquitous and Broken”[24]. This research demonstrates that patching based
on CVSS scores—prioritizing software with higher CVSS ratings—provides no benefit. In fact, CVSS-based patching performs no better than random patching.
Why CVSS Fails: The Missing Link
Howland explains this failure occurs because there is no correlation between vulnerability weaponization and severity[24]. Simply put: just because a vulnerability could be very severe doesn’t tell us whether it will actually be exploited.
Threat groups often chain multiple lower-severity exploits to achieve their objectives. This fundamental flaw has been recognized across multiple research papers that either criticize CVSS[46][42][45] or acknowledge its shortcomings while proposing alternatives.[16][18]
Important note:
The above statements hold true for historical data, but we should keep an open mind about the future. We are living in a world of massive leaps in development (yes, AI, I’m talking about you), and it is becoming apparent that CVE weaponization is now happening more quickly than ever.
In Q1/Q2 of 2025, there was an 8.5% increase over 2024 in exploitation within less than 24 hours of a CVE being published. [49] And there is a steady increase in the number of published CVEs, the number of CVEs with public exploits, and the number of CVEs with proof-of-concept code.
Interestingly, a group of researchers published a paper outlining a multi-agentic framework for automatic CVE weaponization. [50]
These events and this research clearly show that weaponization might be accelerating. Although it is unclear whether vulnerability weaponization and severity will stay uncorrelated, or whether developments in AI might allow more CVEs to be weaponized, it seems logical that threat actors would prioritize the highest-severity exploits.
Institutional Misuse
The most concerning misuse of CVSS as a risk score appears in recommendations from authoritative sources like the National Institute of Standards and Technology (NIST)[35] and the payment card industry[12], demonstrating how deeply entrenched this flawed approach has become.
EPSS: A Superior Alternative
The Exploit Prediction Scoring System (EPSS) deserves detailed examination as a CVSS replacement for automated threat modeling.
Meaning of the Score
How the score is calculated is beyond the scope of this article, but in essence a huge amount of threat intelligence is collected, and for each CVE a value between 0 and 1 is produced.
A score of 0.8 means an 80% likelihood that the vulnerability will be exploited within the next 30 days (for EPSSv3). Similarly, 0.5 gives 50%, 0.13 means 13%, etc.
Effectiveness of patching by EPSS instead of CVSS
To understand why and how much better it is for a team to patch by prioritizing EPSS scores rather than CVSS we need to establish common terminology.
Evaluation Metrics: Understanding EPSS evaluation requires familiarity with classification terminology:
- True positive: Vulnerability patched and attempted for exploitation
- False negative: Vulnerability not patched but exploited
- True negative: Vulnerability not patched and not attempted for exploitation
- False positive: Vulnerability patched but not attempted for exploitation
Performance Definitions:
- Efficiency (Precision): True positives ÷ (true positives + false positives)
- Coverage (Recall): True positives ÷ (true positives + false negatives)
- Patching effort: Sum of true positives and false positives
- F1 Score: The harmonic mean[51] of precision and recall. The threshold that maximizes F1 balances both metrics simultaneously; in our case this means we are maximizing coverage AND efficiency at the same time.
In simpler terms
- In the vulnerability management context, efficiency addresses the question: of all the vulnerabilities remediated, how many were actually exploited? If a remediation strategy suggests patching 100 vulnerabilities, 60 of which were exploited, the efficiency would be 60%.
- In the vulnerability management context, coverage addresses the question: of all the vulnerabilities that are being exploited, how many were actually remediated? If 100 vulnerabilities are exploited, 40 of which are patched, the coverage would be 40%.
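These definitions map directly onto a few lines of code. A minimal sketch, using the two worked examples above:

```python
def efficiency(tp: int, fp: int) -> float:
    """Precision: of all vulnerabilities patched, what share was exploited?"""
    return tp / (tp + fp)

def coverage(tp: int, fn: int) -> float:
    """Recall: of all vulnerabilities exploited, what share was patched?"""
    return tp / (tp + fn)

def f1(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of efficiency (precision) and coverage (recall)."""
    e, c = efficiency(tp, fp), coverage(tp, fn)
    return 2 * e * c / (e + c)

# 100 vulnerabilities patched, 60 of which were exploited -> 60% efficiency
print(efficiency(tp=60, fp=40))   # 0.6
# 100 vulnerabilities exploited, 40 of which were patched -> 40% coverage
print(coverage(tp=40, fn=60))     # 0.4
```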
Validation Results
When basing patching strategy on optimal F1 threshold, you could:
- Patch vulnerabilities with EPSSv3 score 0.36 or higher, and achieve 78.5% efficiency and 67.8% coverage.[1]
- Patch vulnerabilities with CVSSv3 score of 9.7 or higher, and achieve an efficiency rating of 6.5% and coverage of 32.3%. [1]
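In practice, these thresholds reduce to a simple prioritization rule. A sketch against a hypothetical inventory (the CVE IDs and scores below are made up for illustration; only the 0.36 threshold comes from [1]):

```python
# Hypothetical inventory: CVE ID -> (CVSSv3 score, EPSSv3 score).
inventory = {
    "CVE-A": (9.8, 0.02),   # critical severity, rarely exploited
    "CVE-B": (6.5, 0.52),   # medium severity, frequently exploited
    "CVE-C": (9.9, 0.91),
    "CVE-D": (4.3, 0.40),
}

EPSS_THRESHOLD = 0.36  # optimal-F1 EPSSv3 threshold reported in [1]

# Patch everything at or above the threshold, most likely exploited first.
to_patch = sorted(
    (cve for cve, (_, epss) in inventory.items() if epss >= EPSS_THRESHOLD),
    key=lambda cve: inventory[cve][1],
    reverse=True,
)
print(to_patch)  # ['CVE-C', 'CVE-B', 'CVE-D']
```

Note that the equivalent CVSS rule (score ≥ 9.7) would select CVE-A and CVE-C instead, spending effort on the rarely exploited CVE-A while missing the frequently exploited CVE-B.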
The bottom line is that even when patching by the most mathematically optimal CVSS threshold, your efficiency will be very low. EPSS is vastly superior if you care to patch quickly and efficiently—metrics that are increasingly important in times when CVEs are being published and weaponized faster than ever.
It is worth noting that EPSS v4 was released this year, but no research on it has been published just yet. It is presumed to be an improvement over the previous version, as we have seen in the past, so the good news is that these numbers are probably even better!
Why EPSS Works for Automated Threat Modeling
Beyond EPSS’s direct alignment with risk-based goals, it provides an easy-to-use API, making it the ideal candidate for risk scoring in automated threat modeling systems. Even better, it is readily available in a few vulnerability scanners already, so we should really see a bigger shift in recommended patching strategies across the industry, as it becomes the norm.
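As a sketch of what that integration looks like: the public API at api.first.org returns per-CVE scores as JSON. The field names below reflect the v1 API's documented response shape, and the sample score is made up; verify both against the current EPSS API docs before relying on this:

```python
import json
import urllib.request

EPSS_API = "https://api.first.org/data/v1/epss"

def fetch_epss(cve_ids):
    """Query the EPSS API for a batch of CVE IDs; returns {cve: probability}."""
    url = f"{EPSS_API}?cve={','.join(cve_ids)}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_epss(json.load(resp))

def parse_epss(payload):
    """Extract {cve: epss_probability} from an EPSS API response body."""
    return {row["cve"]: float(row["epss"]) for row in payload.get("data", [])}

# Offline illustration with a response shaped like the API's output
# (the score value here is invented for the example):
sample = {"status": "OK", "data": [
    {"cve": "CVE-2021-44228", "epss": "0.975", "percentile": "0.999"},
]}
print(parse_epss(sample))  # {'CVE-2021-44228': 0.975}
```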
The Bad News
Now that we know we can utilize EPSS instead of CVSS, can we hang up our coats and put our feet up? Sadly, no.
There is a whole world of issues that a human can uncover and contextualize that automated tools (including AI) still struggle with.
Namely: misconfigurations, defaults, bad practices, poor protocols and/or poor user education (which usually lead to successful social engineering attacks), lacking or insufficient backups, insufficient network segregation, insufficient logging and monitoring…just to name a few.
The scary thing is that we find environments that come back clean on vulnerability scanners, but by reviewing for misconfigurations it becomes apparent that a total takeover is possible within minutes.
This is often related to Man-in-the-middle attacks due to weak defaults in Active Directory environments, or it can relate to misconfigured certificate templates (e.g., any user on a domain can make themselves Domain Admin, and trigger no alerts). It can also happen to web applications that have scanned clean in SAST, but are riddled with logical flaws. The bottom line is that there are issues that vulnerability scanners don’t catch, and we need to do more than that.
What can we do about it?
- Stop using only scoring or platform results to prioritize issues. Be mindful to not limit your visibility to only what the platform is scanning—it won’t properly threat model your systems. No vulnerability scanner can find logical flaws or detect bad practices.
- If you aren’t doing pentests, you should! An experienced penetration tester will put your system into perspective and properly threat model the environment.
- If you are doing pentests, you should strive to enhance your program, for example, by:
- Improving breadth—moving from external-only to also include internal, cloud, and social engineering pentesting.
- Improving frequency—increasing from annual pentesting to ultimately reach continuous pentesting.
Closing Words
There is so much more that could be said on threat modeling, what pentesting really is, and the kind of things that automated tools miss. But this article is already too long, so I’m looking forward to publishing another one specifically on the crucial difference between vulnerabilities and everything else that a bad actor (or pentester) can exploit to gain footholds and escalate privileges. I plan to showcase those differences and give concrete examples of weak defaults and misconfigurations that allow fast and
stealthy ways to take over the entire domain. Stay tuned!
About the Author
London, England-based Patryk Sipowicz is a penetration tester at the Novacoast Attack Team (NCAT) with broad security expertise spanning mobile, social engineering, and infrastructure assessments.
References Cited:
[1] Jay Jacobs, Sasha Romanosky, Octavian Suciu, Benjamin Edwards, and Armin Sarabi. Enhancing Vulnerability Prioritization: Data-Driven Exploit Predictions with Community-Driven Insights. arXiv:2302.14172, 2023. https://arxiv.org/abs/2302.14172
[12] Payment Card Industry Security Standards Council, August 2022.
[16] Katheryn A. Farris, Ankit Shah, George Cybenko, Rajesh Ganesan, and Sushil Jajodia. VULCON: A System for Vulnerability Prioritization, Mitigation, and Management. ACM Transactions on Privacy and Security (TOPS), 21(4):1–28, 2018.
[18] Christian Fruhwirth and Tomi Mannisto. Improving CVSS-based Vulnerability Prioritization and Response with Context Information. In 2009 3rd International Symposium on Empirical Software Engineering and Measurement, pages 535–544, 2009.
[23] Siv Hilde Houmb, Virginia N. L. Franqueira, and Erlend A. Engum. Quantifying Security Risk Level from CVSS Estimates of Frequency and Impact. Journal of Systems and Software, 83(9):1622–1634, 2010.
[24] Henry Howland. CVSS: Ubiquitous and Broken. Digital Threats: Research and Practice, 4(1):1–12, 2023.
[26] Jay Jacobs, Sasha Romanosky, Benjamin Edwards, Idris Adjerid, and Michael Roytman. Exploit Prediction Scoring System (EPSS). Digital Threats: Research and Practice, 2(3):1–17, 2021.
[32] Allan Liska. CVSS Scores Are Dead. Let’s Explore 4 Alternatives. In Proceedings of RSA Conference 2021, May 2021.
[35] Murugiah Souppaya (NIST) and Karen Scarfone (Scarfone Cybersecurity). Guide to Enterprise Patch Management Planning: Preventive Maintenance for Technology. NIST Special Publication (SP) 800-40r4, National Institute of Standards and Technology, Gaithersburg, MD, 2023.
[39] Bradley Potteiger, Goncalo Martins, and Xenofon Koutsoukos. Software and Attack Centric Integrated Threat Modeling for Quantitative Risk Assessment. In Proceedings of the Symposium and Bootcamp on the Science of Security, pages 99–108, 2016.
[42] Karen Scarfone and Peter Mell. An Analysis of CVSS Version 2 Vulnerability Scoring. In 2009 3rd International Symposium on Empirical Software Engineering and Measurement, pages 516–525, 2009.
[44] Adam Shostack. Threat Modeling: Designing for Security. John Wiley & Sons, 2014.
[45] Jonathan Spring, Eric Hatleback, Allen Householder, Art Manion, and Deana Shick. Time to Change the CVSS? IEEE Security & Privacy, 19(2):74–78, 2021.
[46] Jonathan Spring, Eric Hatleback, A. Manion, and D. Shick. Towards Improving CVSS. Software Engineering Institute, Carnegie Mellon University, Tech. Rep., 2018.
[47] Wei Tai. What Is VPR and How Is It Different from CVSS? May 2020. https://www.tenable.com/blog/what-is-vpr-and-how-is-it-different-from-cvss
[48] National Coordinator for Critical Infrastructure Security and Resilience. SSVC Calculator. https://www.cisa.gov/ssvc-calculator
[49] VulnCheck. https://www.vulncheck.com/blog/state-of-exploitation-1h-2025
[50] Saad Ullah, Praneeth Balasubramanian, Wenbo Guo, Amanda Burnett, Hammond Pearce, Christopher Kruegel, Giovanni Vigna, and Gianluca Stringhini. From CVE Entries to Verifiable Exploits: An Automated Multi-Agent Framework for Reproducing CVEs. https://arxiv.org/html/2509.01835v1
[51] Harmonic mean, Wikipedia. https://en.wikipedia.org/wiki/Harmonic_mean