3 changes: 2 additions & 1 deletion .gitignore
@@ -1,3 +1,4 @@
*.pyc
.DS_Store
dist/
dist/
utbox/local/
41 changes: 23 additions & 18 deletions README.md
@@ -20,15 +20,15 @@

URL Toolbox (UTBox) is a set of building blocks for Splunk specially created for URL manipulation. UTBox has been created to be modular, easy to use and easy to deploy in any Splunk environments.

One of the core feature of UTBox is to correctly parse URLs and complicated TLDs (Top Level Domain) using the Mozilla Suffix List. Other functions like shannon entropy, counting, suites, meaning ratio, bayesian analysis, etc, are also available.
One of the core features of UTBox is correctly parsing URLs and complicated TLDs (Top Level Domains) using the Mozilla Suffix List. Other functions like Shannon entropy, counting, suites, meaning ratio, Bayesian analysis, etc., are also available.

UTBox has firstly be created for security analysts but may fit other needs as it’s a set of building blocks. UTBox only needs to be deployed on Splunk Search Heads (the bundles will automatically be sent to your Splunk Indexers). Finally, each lookups is shipped with a macro to make it easier to use.
UTBox was created for security analysts but may fit other needs as it’s a set of building blocks. UTBox only needs to be deployed on Splunk Search Heads (the bundles will automatically be sent to your Splunk Indexers). Finally, each lookup is shipped with a macro to make it easier to use.

[Read about this app on Splunk Blogs!](https://www.splunk.com/en_us/blog/security/ut-parsing-domains-like-house-slytherin.html)

## Getting Started

This section outlines the steps required to use the app on a Splunk Enterprise environment. If you want to develop the code base further, refer to the [Development](##development) section of this README.
This section outlines the steps required to use the app on a Splunk Enterprise environment. If you want to develop the code base further, refer to the [Development](#development) section of this README.


### Prerequisites
@@ -46,7 +46,7 @@ This app needs to be installed on the Search tier of your deployment.

This app provides a set of macros that simplify the interaction with the bundled lookups.

Please find below some selected samples of commands and their respective output. Please find more in-depth examples and explanation [in the docs](utbox/appserver/static/documentation.pdf).
Please find below some selected samples of commands and their respective output. Please find more in-depth examples and explanation [in the PDF docs](utbox/appserver/static/documentation.pdf) or [in the Markdown docs](utbox/appserver/static/documentation.md).


### ut_parse_simple
@@ -59,9 +59,10 @@ Please find below some selected samples of commands and their respective output.
```

**Output**
|_time |url |ut_fragment|ut_netloc |ut_params|ut_path|ut_query|ut_scheme|
|----------------------------|------------------|-----------|----------|---------|-------|--------|---------|
|2021-12-16T10:29:07.000+0000|https://www.splunk.com/en_us/blog/security/ut-parsing-domains-like-house-slytherin.html|None |www.splunk.com|None |/en_us/blog/security/ut-parsing-domains-like-house-slytherin.html|None |https |

| _time | url | ut_fragment | ut_netloc | ut_params | ut_path | ut_query | ut_scheme |
|------------------------------|--------------------|-------------|------------|-----------|-------------------------------------------------------------------|----------|-----------|
| 2021-12-16T10:29:07.000+0000 | https://www.splunk.com/en_us/blog/security/ut-parsing-domains-like-house-slytherin.html | None | www.splunk.com | None | /en_us/blog/security/ut-parsing-domains-like-house-slytherin.html | None | https |

## ut_parse

@@ -72,9 +73,10 @@ Please find below some selected samples of commands and their respective output.
| `ut_parse(url, list)`
```
**Output**
|_time |list |url |ut_domain |ut_domain_without_tld|ut_fragment|ut_netloc|ut_params|ut_path |ut_port|ut_query|ut_scheme|ut_subdomain|ut_subdomain_count|ut_subdomain_level_1|ut_tld|
|----------------------------|------------------|----|----------|---------------------|-----------|---------|---------|-----------------------------------------------------------------|-------|--------|---------|------------|------------------|--------------------|------|
|2021-12-16T10:30:00.000+0000|* |https://www.splunk.com/en_us/blog/security/ut-parsing-domains-like-house-slytherin.html|splunk.com|splunk |None |www.splunk.com|None |/en_us/blog/security/ut-parsing-domains-like-house-slytherin.html|80 |None |https |www |1 |www |com |

| _time | list | url | ut_domain | ut_domain_without_tld | ut_fragment | ut_netloc | ut_params | ut_path | ut_port | ut_query | ut_scheme | ut_subdomain | ut_subdomain_count | ut_subdomain_level_1 | ut_tld |
|------------------------------|------|-----------------------------------------------------------------------------------------|------------|-----------------------|-------------|----------------|-----------|-------------------------------------------------------------------|---------|----------|-----------|--------------|--------------------|----------------------|--------|
| 2021-12-16T10:30:00.000+0000 | * | https://www.splunk.com/en_us/blog/security/ut-parsing-domains-like-house-slytherin.html | splunk.com | splunk | None | www.splunk.com | None | /en_us/blog/security/ut-parsing-domains-like-house-slytherin.html | 443 | None | https | www | 1 | www | com |
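
The `ut_port` value of 443 in the row above is inferred from the `https` scheme, since the example URL carries no explicit port; the tests in `tests/test_parse.py` expect 80 for http, 443 for https, and 21 for ftp. A minimal sketch of that fallback, for illustration only rather than the app's actual parsing logic (which lives in `ut_parse_lib`):

```python
from typing import Optional

# Hypothetical helper mirroring the scheme-to-default-port expectations
# in tests/test_parse.py; not the implementation used by the app.
DEFAULT_PORTS = {"http": "80", "https": "443", "ftp": "21"}

def infer_port(scheme: str, explicit_port: Optional[str] = None) -> str:
    # An explicit port (e.g. https://example.com:8443/...) takes precedence.
    return explicit_port or DEFAULT_PORTS.get(scheme, "None")
```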

## ut_shannon

@@ -86,13 +88,15 @@ Please find below some selected samples of commands and their respective output.
```

**Output**
|_time |url |ut_shannon|
|----------------------------|------------------|----------|
|2021-12-16T10:32:19.000+0000|buttercup |2.725480556997868|

| _time | url | ut_shannon |
|------------------------------|-----------|-------------------|
| 2021-12-16T10:32:19.000+0000 | buttercup | 2.725480556997868 |
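
The `ut_shannon` value matches the character-level Shannon entropy of the string, H = -Σ p(c) log2 p(c) over the character frequencies. A minimal sketch of the computation, for illustration only and not necessarily the app's implementation:

```python
import math
from collections import Counter

def shannon(text: str) -> float:
    # Character-level Shannon entropy: -sum(p * log2(p)) over character frequencies.
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

print(shannon("buttercup"))  # ~2.72548, matching the table above
```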

## ut_countset

**SPL**

```
|makeresults count=1
| eval url="buttercup"
@@ -101,9 +105,10 @@ Please find below some selected samples of commands and their respective output.
```

**Output**
|_time |set |url |ut_countset |
|----------------------------|------------------|----|---------------------------------------------|
|2021-12-16T10:34:17.000+0000|tu |buttercup|{"ut_countset": {"sum": 4, "74": 2, "75": 2}}|

| _time | set | url | ut_countset |
|------------------------------|-----|-----------|-----------------------------------------------|
| 2021-12-16T10:34:17.000+0000 | tu | buttercup | {"ut_countset": {"sum": 4, "74": 2, "75": 2}} |
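
The keys in the JSON output appear to be lowercase hexadecimal character codes ('t' is 0x74, 'u' is 0x75), and `sum` holds the total number of matches of the set characters in the URL. A minimal sketch that reproduces the output above (illustrative only, not the app's implementation; whether unmatched set characters are omitted is an assumption):

```python
import json

def countset(text: str, charset: str) -> str:
    # Count occurrences of each charset character in text, keyed by its hex
    # character code ('t' -> "74", 'u' -> "75"), plus the overall sum.
    counts = {format(ord(c), "x"): text.count(c) for c in charset if c in text}
    return json.dumps({"ut_countset": {"sum": sum(counts.values()), **counts}})

print(countset("buttercup", "tu"))  # {"ut_countset": {"sum": 4, "74": 2, "75": 2}}
```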


## Development
@@ -137,11 +142,11 @@ This will create an app package in `dist/utbox.tar.gz`

## License

Please refer to the [License on Splunkbase.](https://cdn.apps.splunk.com/static/misc/eula.html)
Please refer to the [License on Splunkbase.](https://cdn.splunkbase.splunk.com/static/misc/eula.html)

## Bug Fixes / Enhancement Requests

🐞 ✍🏼 💡 Create [issue](https://github.com/splunk/utbox/issues/new) with appropriate label to submit the request.
🐞 ✍🏼 💡 Create [issue](https://github.com/splunk/utbox/issues/new) with appropriate labels to submit the request.

## Troubleshooting

2 changes: 0 additions & 2 deletions docker-compose.yml
@@ -1,5 +1,3 @@
version: "3.6"

services:
splunk:
image: splunk/splunk:9.0.1
7 changes: 7 additions & 0 deletions requirements.txt
@@ -0,0 +1,7 @@
publicsuffixlist
## Splunk provides the following Python packages
certifi
charset-normalizer
idna
requests
urllib3
98 changes: 98 additions & 0 deletions tests/data/publicsuffix_tests.txt
@@ -0,0 +1,98 @@
// Any copyright is dedicated to the Public Domain.
// https://creativecommons.org/publicdomain/zero/1.0/

// null input.
null null
// Mixed case.
COM null
example.COM example.com
WwW.example.COM example.com
// Leading dot.
.com null
.example null
.example.com null
.example.example null
// Unlisted TLD.
example null
example.example example.example
b.example.example example.example
a.b.example.example example.example
// Listed, but non-Internet, TLD.
//local null
//example.local null
//b.example.local null
//a.b.example.local null
// TLD with only 1 rule.
biz null
domain.biz domain.biz
b.domain.biz domain.biz
a.b.domain.biz domain.biz
// TLD with some 2-level rules.
com null
example.com example.com
b.example.com example.com
a.b.example.com example.com
uk.com null
example.uk.com example.uk.com
b.example.uk.com example.uk.com
a.b.example.uk.com example.uk.com
test.ac test.ac
// TLD with only 1 (wildcard) rule.
mm null
c.mm null
b.c.mm b.c.mm
a.b.c.mm b.c.mm
// More complex TLD.
jp null
test.jp test.jp
www.test.jp test.jp
ac.jp null
test.ac.jp test.ac.jp
www.test.ac.jp test.ac.jp
kyoto.jp null
test.kyoto.jp test.kyoto.jp
ide.kyoto.jp null
b.ide.kyoto.jp b.ide.kyoto.jp
a.b.ide.kyoto.jp b.ide.kyoto.jp
c.kobe.jp null
b.c.kobe.jp b.c.kobe.jp
a.b.c.kobe.jp b.c.kobe.jp
city.kobe.jp city.kobe.jp
www.city.kobe.jp city.kobe.jp
// TLD with a wildcard rule and exceptions.
ck null
test.ck null
b.test.ck b.test.ck
a.b.test.ck b.test.ck
www.ck www.ck
www.www.ck www.ck
// US K12.
us null
test.us test.us
www.test.us test.us
ak.us null
test.ak.us test.ak.us
www.test.ak.us test.ak.us
k12.ak.us null
test.k12.ak.us test.k12.ak.us
www.test.k12.ak.us test.k12.ak.us
// IDN labels.
食狮.com.cn 食狮.com.cn
食狮.公司.cn 食狮.公司.cn
www.食狮.公司.cn 食狮.公司.cn
shishi.公司.cn shishi.公司.cn
公司.cn null
食狮.中国 食狮.中国
www.食狮.中国 食狮.中国
shishi.中国 shishi.中国
中国 null
// Same as above, but punycoded.
xn--85x722f.com.cn xn--85x722f.com.cn
xn--85x722f.xn--55qx5d.cn xn--85x722f.xn--55qx5d.cn
www.xn--85x722f.xn--55qx5d.cn xn--85x722f.xn--55qx5d.cn
shishi.xn--55qx5d.cn shishi.xn--55qx5d.cn
xn--55qx5d.cn null
xn--85x722f.xn--fiqs8s xn--85x722f.xn--fiqs8s
www.xn--85x722f.xn--fiqs8s xn--85x722f.xn--fiqs8s
shishi.xn--fiqs8s shishi.xn--fiqs8s
xn--fiqs8s null
97 changes: 87 additions & 10 deletions tests/test_parse.py
@@ -1,30 +1,107 @@
import sys
import os
import sys
import unittest
from pathlib import Path

log_path = Path(os.environ["SPLUNK_HOME"] + "/var/log/splunk")
log_path.mkdir(parents=True, exist_ok=True)
(log_path / Path("utbox.log")).write_text("")
SPLUNK_HOME = os.environ.get("SPLUNK_HOME")

if SPLUNK_HOME:
    log_path = Path(SPLUNK_HOME) / "var" / "log" / "splunk"
    log_path.mkdir(parents=True, exist_ok=True)
    (log_path / Path("utbox.log")).write_text("")


bin_path = Path(__file__).resolve().parents[1] / "utbox" / "bin"

sys.path.append(str(bin_path))

path_to_add = os.path.abspath(os.path.join(__file__, '..','..','utbox', 'bin'))
sys.path.append(path_to_add)

import ut_parse_lib


class TestParseMethods(unittest.TestCase):

    def test_parse_extended(self):
    def test_parse_iana_domain(self):
        domain = "http://www.example.com/123/123.php"
        lists = ["mozilla", "iana", "custom"]
        lists = {l: "example.com" for l in ["iana", "icann", "mozilla"]}

        for l in lists:
            with self.subTest(l=l):
                TLDList = ut_parse_lib.loadTLDFile(l)
                parse_result = ut_parse_lib.parse_extended(domain, TLDList)
                tld_list = ut_parse_lib.get_public_suffix_list(l)
                parse_result = ut_parse_lib.parse_extended(domain, tld_list)
                self.assertEqual(parse_result["ut_domain"], "example.com")

    def test_parse_psl_domain(self):
        domain = "http://www.example.co.uk/123/123.php"
        lists = {
            "iana": "co.uk",
            "icann": "example.co.uk",
            "mozilla": "example.co.uk",
        }

        for l, val in lists.items():
            with self.subTest(l=l):
                tld_list = ut_parse_lib.get_public_suffix_list(l)
                parse_result = ut_parse_lib.parse_extended(domain, tld_list)
                self.assertEqual(parse_result["ut_domain"], val)

    def test_parse_private_domain(self):
        domain = "http://putz.priv.at/123/123.php"
        lists = {
            "iana": "priv.at",
            "icann": "priv.at",
            "mozilla": "putz.priv.at",
        }

        for l, val in lists.items():
            with self.subTest(l=l):
                tld_list = ut_parse_lib.get_public_suffix_list(l)
                parse_result = ut_parse_lib.parse_extended(domain, tld_list)
                self.assertEqual(parse_result["ut_domain"], val)

    def test_mozilla_domains(self):
        # Parsing data/publicsuffix_tests.txt, adapted from https://github.com/publicsuffix/list/tree/main/tests
        tests = Path("data") / "publicsuffix_tests.txt"
        tld_list = ut_parse_lib.get_public_suffix_list("mozilla")
        for line in tests.read_text().splitlines():
            if not line or line.startswith("//"):
                continue
            domain, expected = line.split(" ", 1)
            domain = "None" if domain == "null" else domain
            expected = "None" if expected == "null" else expected
            with self.subTest(domain=domain):
                parse_result = ut_parse_lib.parse_extended(domain, tld_list)
                self.assertEqual(parse_result["ut_domain"], expected)

    def test_parse_protocol(self):
        urls = {
            "http://www.example.com/123/123.php": "80",
            "https://www.example.com/123/123.php": "443",
            "https://www.example.com:8443/123/123.php": "8443",
            "ftp://www.example.com/123/123.php": "21",
        }

        tld_list = ut_parse_lib.get_public_suffix_list("mozilla")
        for url, port in urls.items():
            with self.subTest(url=url):
                parse_result = ut_parse_lib.parse_extended(url, tld_list)
                self.assertEqual(parse_result["ut_port"], port)

    def test_ip_host(self):
        urls = {
            "https://192.0.46.8/123/123.php": "192.0.46.8",
            "https://192.0.46.8:8443/123/123.php": "192.0.46.8",
            "https://[2620:0000:2830:0200:0000:0000:000b:0008]/123abc/123abc.php": "2620:0:2830:200::b:8",
            "https://[2620:0000:2830:0200:0000:0000:000b:0008]:8443/123abc/123abc.php": "2620:0:2830:200::b:8",
        }

        tld_list = ut_parse_lib.get_public_suffix_list("mozilla")
        for url, ip in urls.items():
            with self.subTest(url=url):
                parse_result = ut_parse_lib.parse_extended(url, tld_list)
                self.assertEqual(parse_result["ut_tld"], "None")
                self.assertEqual(parse_result["ut_domain_without_tld"], ip)


if __name__ == "__main__":
unittest.main()
6 changes: 6 additions & 0 deletions utbox/README/inputs.conf.spec
@@ -0,0 +1,6 @@
[update_tld_lists://<default>]
* Download updates to IANA TLD list, Mozilla Public Suffix List, and generate related lookup files

PSL_URL =
IANA_URL =
create_lookup =
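
For reference, a hypothetical `inputs.conf` stanza based on this spec might look like the sketch below. The stanza name, the `interval` setting, and the `create_lookup` value are illustrative assumptions; the two URLs are the standard published locations of the Mozilla Public Suffix List and the IANA TLD list.

```
# Hypothetical example stanza; the exact accepted values are not documented here.
[update_tld_lists://default]
PSL_URL = https://publicsuffix.org/list/public_suffix_list.dat
IANA_URL = https://data.iana.org/TLD/tlds-alpha-by-domain.txt
create_lookup = 1
interval = 86400
```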