This repository was archived by the owner on Dec 12, 2024. It is now read-only.

GitHub action (#5)
* add daily upload action

* updating some docs

* Update cve2stix

* Update README.md

* Update daily-r2.yml

* add support for secrets.CLOUDFLARE_*; closes #2

* change dir structure; closes #4

* update submodules

* update submodules

* adding more docs

* fixing names for cloudflare

* Update README.md

* adding latest cpe2stix

* Update README.md

* updating git sub modules

---------

Co-authored-by: Fadl <chaos@efqr.dev>
himynamesdave and fqrious authored Aug 25, 2024
1 parent ab1c2a6 commit 2a2ff4a
Showing 4 changed files with 221 additions and 24 deletions.
79 changes: 79 additions & 0 deletions .github/workflows/daily-r2.yml
@@ -0,0 +1,79 @@
name: R2 Daily Upload
run-name: ${{ github.actor }} is running cxe2stix
on:
  schedule:
    - cron: "0 7 * * *" # 7am everyday
jobs:
  upload-daily:
    runs-on: ubuntu-latest
    env:
      DATE: ${{ vars.DATE || 'yesterday' }}
    services:
      redis:
        image: redis
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
        ports:
          - 6379:6379
    steps:
      - name: install gh
        if: ${{ env.ACT }}
        run: |
          (type -p wget >/dev/null || (sudo apt update && sudo apt-get install wget -y)) \
            && sudo mkdir -p -m 755 /etc/apt/keyrings \
            && wget -qO- https://cli.github.com/packages/githubcli-archive-keyring.gpg | sudo tee /etc/apt/keyrings/githubcli-archive-keyring.gpg > /dev/null \
            && sudo chmod go+r /etc/apt/keyrings/githubcli-archive-keyring.gpg \
            && echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" | sudo tee /etc/apt/sources.list.d/github-cli.list > /dev/null \
            && sudo apt update \
            && sudo apt install gh -y
      - id: setup_rclone_config
        if: ${{ !secrets.RCLONE_CONFIG }}
        run: |
          rclone_config=$(echo -e '[r2]' \
            '\ntype = s3' \
            '\nprovider = Cloudflare' \
            '\naccess_key_id = ${{ secrets.CLOUDFLARE_ACCESS_KEY_ID }}' \
            '\nsecret_access_key = ${{ secrets.CLOUDFLARE_ACCESS_KEY_SECRET }}' \
            '\nregion = auto' \
            '\nendpoint = ${{ secrets.CLOUDFLARE_ACCOUNT_ID }}.r2.cloudflarestorage.com' \
            '\nacl = private')
          rclone_config_b64=$(base64 -w0 <<< $rclone_config)
          echo "::add-mask::$rclone_config_b64"
          echo rclone_cf_config=$rclone_config_b64 >> $GITHUB_ENV
      - name: Setup Rclone
        uses: AnimMouse/setup-rclone@v1
        with:
          rclone_config: ${{ secrets.RCLONE_CONFIG || env.rclone_cf_config }}
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}


      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5

      - name: Install requirements
        run: |
          # create a venv
          python -m venv cxe2stix_helper-venv
          source cxe2stix_helper-venv/bin/activate
          # install requirements
          pip install -r requirements.txt
      - name: Run CPE2STIX & CVE2STIX
        run: |
          source cxe2stix_helper-venv/bin/activate
          YESTERDAY=$(date -u -d $DATE +"%Y-%m-%d")
          python3 cxe2stix_helper.py \
            --run_cpe2stix \
            --run_cve2stix \
            --last_modified_earliest "$YESTERDAY"T00:00:00 \
            --last_modified_latest "$YESTERDAY"T23:59:59 \
            --file_time_range 1d
      - name: upload bundle to r2
        run: rclone copy output/bundles/ r2:cxe2stix-helper-github-action-output/ -v
147 changes: 128 additions & 19 deletions README.md
@@ -2,12 +2,6 @@

A small wrapper to download data using cve2stix and cpe2stix, organising it into STIX bundles based on time ranges.

## Before you get started

If you do not want to backfill, maintain, or support your own CVE and CPE STIX objects check out CTI Butler which provides a fully manage database of these objects and more!

https://www.ctibutler.com/

## Install the script

```shell
@@ -52,7 +46,7 @@ Where;
* default: none
* `last_modified_latest` (required, date in format `YYYY-MM-DDThh:mm:ss`): used in the cve2stix/cpe2stix config
* default: none
* `file_time_range` (optional): defines how much data should be packed in each output bundle. Use `d` for days, `m` for months, `y` for years. Note for months and years, bundles are packed per calendar month / year (see example for more info)
* `file_time_range` (optional): defines how much data should be packed in each output bundle. Use `d` for days, `m` for months, `y` for years. Note, if no results are found for a time period, a bundle will not be generated. This usually explains why you see "missing" bundles for a day or month.
* default `1m` (1 month)

Both scripts also use the following parameters that a user does not enter at the command line
@@ -70,13 +64,18 @@ python3 cxe2stix_helper.py \
--file_time_range 1m
```

Will generate 4 bundle files:

* `cpe-bundle-2023_03_04-2023_03_31.json`
* `cpe-bundle-2023_04_01-2023_04_30.json`
* `cpe-bundle-2023_05_01-2023_05_31.json`
* `cpe-bundle-2023_06_01-2023_06_04.json`

Will generate 4 bundle files in directories as follows:

```txt
output
└── bundles
    └── cpe
        └── 2023
            ├── cpe-bundle-2023_03_04-2023_03_31.json
            ├── cpe-bundle-2023_04_01-2023_04_30.json
            ├── cpe-bundle-2023_05_01-2023_05_31.json
            └── cpe-bundle-2023_06_01-2023_06_04.json
```
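
The calendar-month packing in this example can be sketched in a few lines of Python. This is a minimal illustration rather than the helper's actual `get_time_ranges` code: the function name `split_by_calendar_month` is hypothetical, and the input dates (4 March to 4 June 2023) are inferred from the bundle names listed in the example.

```python
import calendar
from datetime import datetime, timedelta


def split_by_calendar_month(earliest: datetime, latest: datetime) -> list[tuple[datetime, datetime]]:
    """Split [earliest, latest] into ranges that never cross a calendar-month boundary."""
    ranges = []
    lo = earliest
    while lo <= latest:
        # Last second of the calendar month that `lo` falls in.
        last_day = calendar.monthrange(lo.year, lo.month)[1]
        hi = lo.replace(day=last_day, hour=23, minute=59, second=59)
        # Clamp the final range to the requested end of the window.
        hi = min(hi, latest)
        ranges.append((lo, hi))
        # The next range starts one second after this one ends.
        lo = hi + timedelta(seconds=1)
    return ranges


ranges = split_by_calendar_month(datetime(2023, 3, 4), datetime(2023, 6, 4, 23, 59, 59))
for lo, hi in ranges:
    # README-style file names (date part only).
    print(f"cpe-bundle-{lo:%Y_%m_%d}-{hi:%Y_%m_%d}.json")
```

The first and last ranges are partial months because they are clamped to the requested window; the ranges in between cover whole calendar months.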

### Example 2: Get 3 days of CVE data (split into STIX bundles of 1 day)

@@ -94,6 +93,16 @@ Will generate 3 bundle files:
* `cve-bundle-2023_01_02-2023_01_02.json`
* `cve-bundle-2023_01_03-2023_01_03.json`

```txt
output
└── bundles
    └── cve
        └── 2023-01
            ├── cve-bundle-2023_01_01-2023_01_01.json
            ├── cve-bundle-2023_01_02-2023_01_02.json
            └── cve-bundle-2023_01_03-2023_01_03.json
```

### Example 3: Get 2 days of CVE and CPE data (split into STIX bundles of 2 months)

```shell
@@ -107,8 +116,16 @@ python3 cxe2stix_helper.py \

Will generate 2 bundle files:

* `cpe-bundle-2023_01_01-2023_01_02.json`
* `cve-bundle-2023_01_01-2023_01_02.json`
```txt
output
└── bundles
    ├── cve
    │   └── 2023
    │       └── cve-bundle-2023_01_01-2023_01_02.json
    └── cpe
        └── 2023
            └── cpe-bundle-2023_01_01-2023_01_02.json
```
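
The directory a bundle lands in depends on the unit of `file_time_range`: monthly bundles are grouped by year, daily bundles by year and month. The sketch below mirrors that rule as it appears in the `cxe2stix_helper.py` change in this commit. The function name `bundle_relpath` is hypothetical, and the file names follow the date-only form used in the README examples, which is an assumption about the final naming.

```python
from datetime import datetime
from pathlib import PurePosixPath


def bundle_relpath(kind: str, time_unit: str, start: datetime, end: datetime) -> str:
    """Return a bundle path relative to output/bundles (README-style file names)."""
    if time_unit == "m":
        subdir = start.strftime("%Y")     # monthly bundles are grouped by year
    elif time_unit == "d":
        subdir = start.strftime("%Y-%m")  # daily bundles are grouped by month
    else:
        subdir = "."                      # any other unit: no extra subdirectory
    name = f"{kind}-bundle-{start:%Y_%m_%d}-{end:%Y_%m_%d}.json"
    return str(PurePosixPath(kind) / subdir / name)


# Example 2: one daily CVE bundle.
print(bundle_relpath("cve", "d", datetime(2023, 1, 1), datetime(2023, 1, 1)))
# cve/2023-01/cve-bundle-2023_01_01-2023_01_01.json

# Example 3: one CPE bundle with a month-based range.
print(bundle_relpath("cpe", "m", datetime(2023, 1, 1), datetime(2023, 1, 2)))
# cpe/2023/cpe-bundle-2023_01_01-2023_01_02.json
```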

## Why not run the scripts (cpe2stix / cve2stix) independently?

@@ -120,13 +137,15 @@ Which means you need to manually edit the .env files for many time ranges each t

cxe2stix_helper is designed to automate the process of downloading very large datasets whilst also allowing control on the output filenames.

If you want to keep a copy of each individual STIX .json object, you should use cve2stix or cpe2stix. cxe2stix_helper will only print the final bundles.

## Recommendations for running large backfills

### CVE

The first CVE published was `1988-10-01T04:00:00.000`. There are 250,888 CVEs at the time of writing, and this number is increasing rapidly.

Due to the volume and size of CVEs, we recommend iterating through the data in months. This means all bundles (especially those after 2018) will always be less than 100mb.
Due to the volume and size of CVEs, we recommend iterating through the data in days. This means all bundles (especially those after 2018) will always be less than 10mb.

Here is what we use;

@@ -135,7 +154,7 @@ python3 cxe2stix_helper.py \
--run_cve2stix \
--last_modified_earliest 2005-01-01T00:00:00 \
--last_modified_latest 2024-01-01T23:59:59 \
--file_time_range 1m
--file_time_range 1d
```

Note, whilst the first CVE was published in October 1988, it appears all CVEs published before 2005 were updated at the end of 2005 (or afterwards). The
@@ -153,7 +172,7 @@ python3 cxe2stix_helper.py \
--run_cpe2stix \
--last_modified_earliest 2007-01-01T00:00:00 \
--last_modified_latest 2024-01-01T23:59:59 \
--file_time_range 3m
--file_time_range 1d
```

The earliest CPEs have a last modified date in 2007.
@@ -176,6 +195,96 @@ git checkout main
git pull
```

## Support for Cloudflare R2 + Github action

We use a GitHub action to run this script daily and store the bundles generated by cxe2stix_helper on Cloudflare R2.

The action runs at 07:00 UTC every day (GitHub schedules are evaluated in UTC) using cron: `"0 7 * * *"`

You can see the action in: `/.github/workflows/daily-r2.yml`.

Essentially, the action runs the following command every day

```shell
python3 cxe2stix_helper.py \
--run_cve2stix \
--run_cpe2stix \
--last_modified_earliest "YESTERDAY (00:00:00)" \
--last_modified_latest "YESTERDAY (23:59:59)" \
--file_time_range 1d
```

The action will store the data in the bucket as follows;

```txt
cxe2stix-helper-github-action-output
├── cve
│   └── 2023-01
│       └── cve-bundle-2023_01_01-2023_01_02.json
└── cpe
    └── 2023-01
        └── cpe-bundle-2023_01_01-2023_01_02.json
```

If you'd like to run the action in your own repository to create your own data store you will need to do the following;

### Create Cloudflare bucket/keys

First, go to Cloudflare.com and navigate to R2. Create a new bucket called `cxe2stix-helper-github-action-output`.

Now you need to create a Cloudflare API key. When you create it, make sure to set the permissions to `Admin Read & Write`. For security, it is also worth limiting the scope of the key to the bucket `cxe2stix-helper-github-action-output` (defined in the action).

### Set Github vars

Then go to the GitHub repo and navigate to `repo > settings > secrets and variables > actions > new repository secret`.

![](docs/github-repo-vars.png)

Then choose one of the following options;

#### Option 1: use `CLOUDFLARE_*` vars

Set the following in the secrets;

```txt
CLOUDFLARE_ACCOUNT_ID=#Get this in Cloudflare R2 UI
CLOUDFLARE_ACCESS_KEY_ID=#Get this in Cloudflare R2 UI
CLOUDFLARE_ACCESS_KEY_SECRET=#Get this in Cloudflare R2 UI
NVD_API_KEY=#Get this from https://nvd.nist.gov/developers/request-an-api-key
```

You most likely want to use this approach.

#### Option 2: use `RCLONE_CONFIG` var

In the `RCLONE_CONFIG` var, add a valid rclone conf file (the section title must be `[r2]`, matching the `r2:` remote used by the action), e.g.

```
[r2]
type = s3
provider = Cloudflare
access_key_id = <ACCESS_KEY>
secret_access_key = <SECRET_ACCESS_KEY>
region = auto
endpoint = https://<ACCOUNT_ID>.r2.cloudflarestorage.com
acl = private
```

This approach allows you to potentially use other services than just Cloudflare, if you know what you're doing.

Where:

* `[r2]`: A custom name (an alias) for the storage service. The action uses it to address the remote when copying files.
* `type` = s3: The type of file operation API. R2 supports the S3 standard protocol.
* `provider` = Cloudflare: The storage provider ID. You can run `man rclone` in your terminal to see the supported providers.
* `access_key_id`: You need to create a token with Admin Read & Write permissions on the R2 console (note, I am not sure if this is a bug, but I couldn’t get it to work with any other permission levels)
* `secret_access_key`: Same as above.
* `endpoint`: The URL that rclone uses to operate files. You can find the account ID at the top right of the R2 homepage.
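
If you prefer to generate the config rather than write it by hand, a minimal Python sketch follows. It renders the same fields as the example above and base64-encodes the result, which is what the workflow does with the config it builds from the `CLOUDFLARE_*` secrets. The function name `render_rclone_config` is hypothetical, the placeholder values are not real credentials, and whether your own `RCLONE_CONFIG` secret must be base64-encoded is an assumption to check against the setup-rclone action's documentation.

```python
import base64
import configparser


def render_rclone_config(access_key_id: str, secret_access_key: str, account_id: str) -> str:
    """Render an rclone config with a single [r2] remote pointing at Cloudflare R2."""
    lines = [
        "[r2]",
        "type = s3",
        "provider = Cloudflare",
        f"access_key_id = {access_key_id}",
        f"secret_access_key = {secret_access_key}",
        "region = auto",
        f"endpoint = https://{account_id}.r2.cloudflarestorage.com",
        "acl = private",
    ]
    return "\n".join(lines) + "\n"


config_text = render_rclone_config("<ACCESS_KEY>", "<SECRET_ACCESS_KEY>", "<ACCOUNT_ID>")

# Sanity check: the text parses as INI and contains exactly one remote named r2.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(config_text)
assert parser.sections() == ["r2"]

# Base64-encode the config, as the workflow does before handing it to the setup step.
encoded = base64.b64encode(config_text.encode()).decode()
print(encoded)
```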

### Backfill advice

Because of its size, a full backfill will time out if you try to run it on GitHub. Similarly, setting `file_time_range` above `1d` is likely to cause timeouts due to data sizes. It is better to run the backfill locally up to day N, and then start the automated action so that it continues from day N+1.
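
One way to keep a local backfill manageable is to run it one year at a time. The sketch below only builds the command lines; it does not execute them. The helper name `backfill_commands` and the year-by-year split are assumptions made for illustration, while the flags are the ones documented in this README. Note that the helper removes a working directory at start-up (`shutil.rmtree(PARENT_PATH, ignore_errors=True)` in `cxe2stix_helper.py`), so check whether bundles from an earlier run need copying elsewhere before you start the next one.

```python
import shlex


def backfill_commands(first_year: int, last_year: int, flag: str = "--run_cve2stix") -> list[list[str]]:
    """Build one cxe2stix_helper.py invocation per calendar year (nothing is executed)."""
    commands = []
    for year in range(first_year, last_year + 1):
        commands.append([
            "python3", "cxe2stix_helper.py",
            flag,
            "--last_modified_earliest", f"{year}-01-01T00:00:00",
            "--last_modified_latest", f"{year}-12-31T23:59:59",
            "--file_time_range", "1d",
        ])
    return commands


commands = backfill_commands(2005, 2023)
for argv in commands:
    # Print each command so it can be reviewed or pasted into a shell.
    print(shlex.join(argv))
```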

## Support

[Minimal support provided via the DOGESEC community](https://community.dogesec.com/).
19 changes: 14 additions & 5 deletions cxe2stix_helper.py
@@ -63,7 +63,7 @@ def get_time_ranges(s, earliest: dt, latest: dt) -> list[tuple[dt, dt]]:
        hi = hi.replace(hour=23, minute=59, second=59)
        if hi >= latest:
            hi = latest
        output.append((lo, hi))
        output.append((unit, lo, hi))
        hi += ONESEC
    return output

@@ -122,16 +122,24 @@ def main():

    shutil.rmtree(PARENT_PATH, ignore_errors=True)

    for start_date, end_date in get_time_ranges(file_time_range, last_modified_earliest, last_modified_latest):
    for time_unit, start_date, end_date in get_time_ranges(file_time_range, last_modified_earliest, last_modified_latest):
        # end_date = dt.combine(end_date, dt.max.time())
        start_day, end_day = start_date.strftime('%Y_%m_%d-%H_%M_%S'), end_date.strftime('%Y_%m_%d-%H_%M_%S')

        subdir = ""
        match time_unit:
            case 'm':
                subdir = start_date.strftime('%Y')
            case 'd':
                subdir = start_date.strftime('%Y-%m')
            case _:
                subdir = '.'

        if run_cve2stix:
            file_system = OBJECTS_PARENT/f"cve_objects-{start_day}-{end_day}"
            file_system.mkdir(parents=True, exist_ok=True)
            cprocess = start_celery("cve2stix.celery", "cve2stix", env=dict(RESULTS_PER_PAGE=cve_results_per_page))
            bundle_name = f"cve-bundle-{start_day}-{end_day}.json"
            bundle_name = f"cve/{subdir}/cve-bundle-{start_day}-{end_day}.json"
            (BUNDLE_PATH/bundle_name).parent.mkdir(parents=True, exist_ok=True)
            celery_task = cve2stix.main(
                filename=bundle_name,
                config=cve2stix.Config(
@@ -151,7 +159,8 @@ def main():
            file_system = OBJECTS_PARENT/f"cpe_objects-{start_day}-{end_day}"
            file_system.mkdir(parents=True, exist_ok=True)
            cprocess = start_celery("cpe2stix.celery", "cpe2stix", env=dict(RESULTS_PER_PAGE=cpe_results_per_page))
            bundle_name = f"cpe-bundle-{start_day}-{end_day}.json"
            bundle_name = f"cpe/{subdir}/cpe-bundle-{start_day}-{end_day}.json"
            (BUNDLE_PATH/bundle_name).parent.mkdir(parents=True, exist_ok=True)
            celery_task = cpe2stix.main(
                filename=bundle_name,
                config=cpe2stix.Config(
Binary file added docs/github-repo-vars.png
