From 3eaf33014fc4ce36d33c221b59b5e9f47965d113 Mon Sep 17 00:00:00 2001
From: Davide Guerri
Date: Sat, 17 Aug 2024 08:20:22 +0200
Subject: [PATCH] Update 2024-08-13-cvss-vectors-with-embeddings-and-random-forests.md

add one reason why we can't use LLMs directly to predict the entire CVSS vector
---
 ...08-13-cvss-vectors-with-embeddings-and-random-forests.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/_posts/2024-08-13-cvss-vectors-with-embeddings-and-random-forests.md b/_posts/2024-08-13-cvss-vectors-with-embeddings-and-random-forests.md
index 44c1c3e..5b54cfb 100644
--- a/_posts/2024-08-13-cvss-vectors-with-embeddings-and-random-forests.md
+++ b/_posts/2024-08-13-cvss-vectors-with-embeddings-and-random-forests.md
@@ -3,7 +3,7 @@ layout: post
 description: Predicting CVSS Vectors with text embeddings and random forests
 comments: true
 date: 2024-08-13
-last-update: 22024-08-14
+last-update: 2024-08-17
 ---
 
 Tired of hearing/reading only about generative AI models? This post explores how Artificial Intelligence and Machine Learning can help with a very real cybersecurity problem.
@@ -78,6 +78,8 @@ Can AI do that? It certainly can, to some extent.
 You probably cannot just feed the description to a large language model and hope to get a super accurate CVSS vector.
 At least in 2024.
 
+One reason is that LLMs generate tokens one at a time, each conditioned on the tokens seen (or generated) before it, so the metrics at the start of the CVSS vector can influence the ones that follow. How much this matters depends heavily on the data the specific model was trained on.
+
 But fear not, AI is not just LLMs and sharks.
 
 Getting text and embeddings is a great way to extract meaning from words. That meaning, encoded with a vector in a highly dimensional space, is a perfect candidate for classifying machine learning models.
@@ -294,7 +296,7 @@ Not bad for a quick and dirty model!
 To summarise, "just" looking at the CVE description we are able to predict
-- the attack vector, attack complexity, need of user interaction, scope with an accuracy of over 90%
+- the attack vector, attack complexity, need for user interaction, scope with an accuracy of over 90%
 - the impacts with an accuracy of over 83%
 - the need of privileges with an accuracy of 75%