@malvidin

Adds an update mechanism for #4. When executed, the ut_parse_extended lookup attempts to update the Mozilla and IANA lists if the files have not been updated in the last 30 days. This likely also addresses #5. A custom lookup could be added that uses native WILDCARD lookup capabilities to extract the Mozilla TLD.
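
A minimal sketch of the 30-day staleness check described above, assuming the decision is based on the file's modification time. The function name `needs_update` and its parameters are hypothetical and are not taken from the scripts in this PR.

```python
import os
import time

MAX_AGE_DAYS = 30  # threshold from the description above


def needs_update(path, max_age_days=MAX_AGE_DAYS, now=None):
    """Return True if `path` is missing or older than `max_age_days`.

    Hypothetical helper: age is measured from the file's mtime,
    which is an assumption about how "updated" is determined.
    """
    now = time.time() if now is None else now
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        # A missing or unreadable file is treated as needing an update.
        return True
    return (now - mtime) > max_age_days * 86400
```

Treating a missing file as stale means the first run on a fresh install would trigger a download rather than fail.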

This changes how list="*" is handled by the extended lookup: the list now defaults to "iana" unless the value is "iana", "icann", "mozilla", or "custom". The custom lookup can be modified to provide capabilities similar to list="*".
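
The fallback described above can be sketched as follows. The name `resolve_list` is hypothetical, and the case-insensitive, whitespace-trimmed comparison is an assumption rather than confirmed behavior of the lookup script.

```python
VALID_LISTS = ("iana", "icann", "mozilla", "custom")
DEFAULT_LIST = "iana"


def resolve_list(value):
    """Map a user-supplied `list` argument to a supported list name.

    Anything other than the four supported names, including "*",
    falls back to the default.
    """
    if isinstance(value, str):
        candidate = value.strip().lower()  # normalization is an assumption
        if candidate in VALID_LISTS:
            return candidate
    return DEFAULT_LIST
```

With this sketch, `resolve_list("*")` and `resolve_list(None)` both resolve to "iana", while `resolve_list("mozilla")` is passed through unchanged.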

For #7, the use of the publicsuffixlist package fixes that issue, but the output is not the same as what @dbranger listed. If the TLD is not in the selected list, "None" is returned (accept_unknown=False). This could be modified to accept unknown TLDs, or to only reject unknown TLDs for the ut_tld field.
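
To illustrate the accept_unknown trade-off, here is a simplified, self-contained sketch. It is not the publicsuffixlist implementation: it does plain longest-suffix matching against a small set and ignores the wildcard and exception rules of the real Public Suffix List. The function name and the sample suffix set are made up for illustration.

```python
def public_suffix(domain, suffixes, accept_unknown=False):
    """Return the longest entry in `suffixes` that `domain` ends with.

    If nothing matches, return the last label when `accept_unknown`
    is True, otherwise None.
    """
    labels = domain.lower().strip(".").split(".")
    # Try the longest candidate first, then drop leading labels.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            return candidate
    return labels[-1] if accept_unknown else None


# Illustrative suffix set only; the real lists are far larger.
SAMPLE_SUFFIXES = {"com", "uk", "co.uk"}

known = public_suffix("www.example.co.uk", SAMPLE_SUFFIXES)
rejected = public_suffix("host.internal", SAMPLE_SUFFIXES)
accepted = public_suffix("host.internal", SAMPLE_SUFFIXES, accept_unknown=True)
```

Here `known` is "co.uk", `rejected` is None, and `accepted` is "internal", which mirrors the choice between rejecting unknown TLDs and passing them through.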

The ut_bayesian export is now a JSON object, so the values per ngram can be distinguished.

I recommend bumping the version to 1.10.0 or higher because of the changes above.

Update Public Suffix List and IANA TLDs
Move TLD Lists to "default"
Create lists updates in "local"
Move Bayesian and Meaning lists to Lookups
Add lookup/macro to unwrap rewritten URLs
Add updater for PSL/TLDs
Black formatting on all scripts
Update all scripts to use main()
Update many scripts to use built-in functions
Add cache of requirements packages for Python 3.9
Add packages for Python 3.7 (pre-Splunk 9.2)
Add CSV-based domain parsing with generate_psl_lookup.py
Add lookup macro descriptions to documentation.md
Remove Python packages that were already available
Move TLD and PSL direct downloads to simple modular input.
Bump version
Add test for Mozilla test domains
Add test for common protocol ports
Add test for IP address as host
Remove Python 3.7 specific package folder
Fix typo in lookup generator
Fix IPv6 host parsing without port