Skip to content

vuln-structure is a package that extracts vulnerability details from raw text and outputs standardized, structured data for security teams.

Notifications You must be signed in to change notification settings

chigwell/vuln-structure

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 

Repository files navigation

Vuln‑Structure

PyPI version License: MIT Downloads LinkedIn

A lightweight Python package that extracts and structures critical security vulnerability information from unstructured text.
Given a raw description of a cybersecurity flaw (for example, the WatchGuard firewall RCE vulnerability), vuln_structure returns a list of clean, machine‑readable data entries that include:

  • Vulnerability type
  • Affected systems
  • Potential impact
  • Recommended actions

It uses the llmatch-messages library to validate that the data returned by the LLM matches a strict regular‑expression pattern, ensuring consistent formatting for automated processing.

Tip: The output is intentionally simple CSV‑like items so that security teams can drop the data into dashboards, SIEMs, or other triage tools with minimal plumbing.


📦 Installation

pip install vuln_structure

⚡ Quick Start

from vuln_structure import vuln_structure

user_input = """
WatchGuard WatchGuard Firebox RCE vulnerability. A remote attacker can trigger
remote code execution by sending a specially crafted GET request on port 80.
"""

# Using the default LLM7 model
results = vuln_structure(user_input)

# results is a list of strings, one per extracted data item.
print(results)

📚 Alternative LLM Backends

The function accepts any langchain_core.language_models.BaseChatModel instance.
Below are short examples for the most common back‑ends.

OpenAI

from langchain_openai import ChatOpenAI
from vuln_structure import vuln_structure

llm = ChatOpenAI()  # API key taken from environment (`OPENAI_API_KEY`)

results = vuln_structure(user_input, llm=llm)

Anthropic

from langchain_anthropic import ChatAnthropic
from vuln_structure import vuln_structure

llm = ChatAnthropic()  # API key taken from environment (`ANTHROPIC_API_KEY`)

results = vuln_structure(user_input, llm=llm)

Google Gemini

from langchain_google_genai import ChatGoogleGenerativeAI
from vuln_structure import vuln_structure

llm = ChatGoogleGenerativeAI()  # API key from `GOOGLE_API_KEY`

results = vuln_structure(user_input, llm=llm)

Note: If you don't pass an llm argument, the package will automatically initialise a ChatLLM7 instance (from the langchain_llm7 package). The free tier of LLM7 imposes generous rate limits that are usually sufficient for typical use cases.


🚀 Configuration

Parameter Type Description
user_input str Raw vulnerability description.
llm Optional[BaseChatModel] LangChain LLM instance. If omitted, a default ChatLLM7 is used.
api_key Optional[str] API key for LLM7. If omitted, the code will look for the LLM7_API_KEY environment variable; a fallback value of "None" is used if both are missing.

🔑 Getting an LLM7 API Key

  1. Sign up at the LLM7 token portal.
  2. Store the key safely:
    export LLM7_API_KEY="your_token_here"
    or pass it directly: vuln_structure(user_input, api_key="your_token_here").

🗂️ Output Format

Each item in the returned list is a single line that follows the regex pattern defined in prompts.py.
Typical items look like:

"Vulnerability: Remote Code Execution (CVE‑2023‑5265)"
"Affected System: WatchGuard Firebox Series, Version ≤ 4.6.0"
"Impact: Full system compromise"
"Mitigation: Update to version 4.6.1 or later"

You can easily parse these strings into JSON or CSV with standard Python tools.


🤝 Contributing & Issues


📄 License

This project is licensed under the MIT License.


📬 Author

Eugene Evstafev
Email: hi@euegne.plus
GitHub: chigwell