This repository has been archived by the owner on Feb 3, 2024. It is now read-only.

Commit

Merge pull request #336 from maarten-boot/master
update to whoisdomain 1.20231115.1
maarten-boot authored Nov 15, 2023
2 parents 995091d + 7cd3a23 commit 495d315
Showing 17 changed files with 541 additions and 150 deletions.
177 changes: 71 additions & 106 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,141 +1,106 @@
# whois
* A Python package for retrieving WHOIS information of domains.
* This package will not support querying ip CIDR ranges or AS information
* It requires the whois cli component of your os to be installed: e.g. `/usr/bin/whois` on Linux

## NOTE
* 2023-04-25: mboot
* when DannyCork returns he can decide on the future of this repo.
* in his absence future development will take place in: https://github.com/mboot-github/WhoisDomain
* and new pypi releases will come from: https://pypi.org/project/whoisdomain/
* efforts will be made to keep the v1.x.y version of whoisdomain compatible with this repo
* changes will be verified and back copied also here for the time being
* starting 2024-02, this repo will be abandon-ware

## Support
* Python 3.x is supported for x >= 9
* Python 2.x IS NOT supported.
# whoisdomain

* A Python package for retrieving WHOIS information of DOMAINS ONLY.
* Python 2.x IS NOT supported.
* Currently no additional python packages need to be installed.

## Features
* Python wrapper for the "whois" cli command of your operating system.
* Simple interface to access parsed WHOIS data for a given domain.
* Able to extract data for all the popular TLDs (com, org, net, biz, info, pl, jp, uk, nz, ...).
* Query a WHOIS server directly instead of going through an intermediate web service like many others do.
* Works with Python >= 3.9
* All dates as datetime objects.
* Possibility to cache results.
* Verbose output on stderr during debugging to see how the internal functions are doing their work
* raise an exception on quota-exceeded type responses
* raise an exception on PrivateRegistry tld's where we know the tld and know we don't know anything
* optionally allow cleaning the whois response before extracting information
* optionally allow IDN's to be translated to Punycode
* optionally specify the whois command on query(...,cmd="whois") as in: https://github.com/gen1us2k/python-whois/

## Dependencies
* please install also the command line "whois" of your distribution
* this library parses the output of the "whois" cli command of your operating system
---

## Docker
* docker pull mbootgithub/whoisdomain:latest
## Notes

## Help Wanted
Your contributions are welcome, look for the Help wanted tag https://github.com/DannyCork/python-whois/labels/help%20wanted
* This package will not support querying ip CIDR ranges or AS information
* This was a copy of the original DannyCork 'whois'.
* Significantly refactored in 2023.
* The output is still compatible with DannyCork 'whois'

## Usage example
## Versioning

Install the cli `whois` of your operating system if it is not present already
* I will start versioning at 1.x.x
* the second item will be YYYYMMDD,
* the third item will start from 1 and be only used if more than one update will have to be done in one day.

Install `whois` package from your distribution (e.g apt install whois)
Versions `1.x.x` will keep the output compatible with Danny Cork until 2024-02-03 (February 2024)
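
The numbering scheme described above can be unpacked mechanically. The following is a minimal sketch and not part of the package; `ReleaseVersion` and `parse_version` are hypothetical names introduced only for illustration.

```python
from datetime import date, datetime
from typing import NamedTuple


class ReleaseVersion(NamedTuple):
    """Hypothetical container for a 1.YYYYMMDD.N version string."""

    major: int
    released: date
    sequence: int


def parse_version(text: str) -> ReleaseVersion:
    """Split e.g. '1.20231115.1' into major, release date and same-day sequence."""
    major, day, sequence = text.split(".")
    released = datetime.strptime(day, "%Y%m%d").date()
    return ReleaseVersion(int(major), released, int(sequence))


v = parse_version("1.20231115.1")
print(v.major, v.released.isoformat(), v.sequence)
```

Because the result is a tuple, two parsed versions compare in release order, so a same-day `.3` release sorts after `.1`.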

## Releases

$pip install whois
* Releases are available at: [Pypi](https://pypi.org/project/whoisdomain/)

>>> import whois
>>> domain = whois.query('google.com')
Pypi releases can be installed with:

>>> print(domain.__dict__)
{
'expiration_date': datetime.datetime(2020, 9, 14, 0, 0),
'last_updated': datetime.datetime(2011, 7, 20, 0, 0),
'registrar': 'MARKMONITOR INC.',
'name': 'google.com',
'creation_date': datetime.datetime(1997, 9, 15, 0, 0)
}
* `pip install whoisdomain`

>>> print(domain.name)
google.com
## Features
* See: [Features](docs/Features.md)

>>> print(domain.expiration_date)
2020-09-14 00:00:00
## Dependencies
* please install also the command line "whois" of your distribution as this library parses the output of the "whois" cli command of your operating system

## ccTLD & TLD support
see the file: ./whois/tld_regexpr.py
or call whois.validTlds()
### Notes for Mac users
* it has been observed that the default cli whois on Mac shows each forwarding step in its output, which makes parsing the result very unreliable.
* installing whois with brew (`brew install whois`) will in general give better results.

## Issues
* Raise an issue https://github.com/DannyCork/python-whois/issues/new
## Docker release
* See [Docker](docs/Docker.md)

## Changes: 2022-06-09: maarten_boot:
* the returned list of name_servers is now a sorted unique list and not a set
* the help function whois.validTlds() now outputs the true tld with dots
## Usage example
* See [Usage](docs/Usage.md)

## 2022-09-27: maarten_boot
* add test2.py to replace test.py
* ./test2.py -h will show the possible usage
* all tests from the original program are now files in the ./tests directory
* tests can be run on all supported tld's with -a or --all and limited by regex with -r <pattern> or --reg=<pattern>
## whoisdomain
* the cli `whoisdomain` is documented in [whoisdomain-cli](docs/whoisdomain-cli.md)

## 2022-11-04: maarten_boot
* add support for Iana example.com, example.net
## ccTLD & TLD support

## 2022-11-07: maarten_boot
* add testing against static known data in dir: ./testdata/<domain>/output
* test.sh will test all domains in testdata without actually calling whois, the input data is instead read from testdata/<domain>/input
Most `tld's` are now autodetected via IANA root db, see the Analizer directory
and `make suggest`.

## 2022-11-11: maarten_boot
* see the file: [tld_regexpr](./whoisdomain/tldDb/tld_regexpr.py)
* for python use: `whoisdomain.validTlds()`
* for cli use `whoisdomain -S`
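
Since supported TLDs may have more than one level (for example `com.ru`), a caller may want the longest supported suffix of a domain. The sketch below is an illustration only and not the library's own lookup; `longest_supported_tld` is a hypothetical helper, and the `supported` list stands in for what `whoisdomain.validTlds()` would return, assuming that call yields dotted TLD strings.

```python
from typing import Iterable, Optional


def longest_supported_tld(domain: str, supported: Iterable[str]) -> Optional[str]:
    """Return the longest suffix of domain found in supported, or None."""
    known = {t.lower().strip(".") for t in supported}
    labels = domain.lower().rstrip(".").split(".")
    # Try the longest candidate suffix first; never the whole name itself.
    for start in range(1, len(labels)):
        candidate = ".".join(labels[start:])
        if candidate in known:
            return candidate
    return None


# Hand-written stand-in list (assumption), not the real output of validTlds().
supported = ["com", "ru", "com.ru", "uk", "co.uk"]
print(longest_supported_tld("www.example.co.uk", supported))
```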

* add support for returning the raw data from the whois command: flag include_raw_whois_text
* add support for handling unsupported domains via whois raw text only: flag return_raw_text_for_unsupported_tld
---

## 2023-01-18: sorrowless
## Support
* Python 3.x is supported for x >= 9
* Python 2.x IS NOT supported.

* add an opportunity to specify maximum cache age
## Authors
* See: [Authors](docs/Authors.md)

## 2023-01-25: maarten_boot
---

* convert the tld file to a Dict, we now no longer need a mapper for python keywords or second level domains.
* utf8 level domains also need no mapper anymore and can be added as is with a translation to xn--<something>
* added xn-- tlds for all known utf-8 domains we currently have
* we can now add new domains on the fly or change them: whois.mergeExternalDictWithRegex(aDictToOverride) see example exampleExtend.py
## Updates
* see [Updates](docs/Updates.md) for a full history of changes.
* Only the latest update is mentioned here

## 2023-01-27: maarten_boot
### 1.20230906.1
* introduce parsing based on functions
* allow contextual search in splitted data and plain data
* allow contextual search based on earlier result
* fix a few tld to return the proper registrant string (not nic handle)

* add autodetect via iana tld file (this has only tld's)
* add a central collection of all compiled regexes and reuse them: REG_COLLECTION_BY_KEY in _0_init_tld.py
* refresh testdata now that tld has dot instead of _ if more than one level
* add additional strings meaning domain does not exist
### 1.20230913.1
* if you have installed `tld` (pip install tld) you can enable withPublicSuffix=True to process until you reach the pseudo tld.
* the public_suffix info is added if available (and if requested)
* example case is: ./test2.py -d www.dublin.airport.aero --withPublicSuffix

## 2023-02-02: maarten_boot
### 1.20230913.3
* fix use of re.NOFLAG: it was only added in Python 3.11 and is not available in 3.9

* whois.QuotaStringsAdd(str) to add additional strings for over quota detection. whois.QuotaStrings() lists the current configured strings
* whois.NoneStringsAdd(str) to add additional strings for NoSuchDomainExists detection (whois.query() returning None). whois.NoneStrings() lists the currently configured strings
* suppress messages to stderr if not verbose=True
## 1.20230917.1
* prepare work on pylint
* switch to logging: all verbose is currently log.debug(); to show set LOGLEVEL=DEBUG before calling, see Makefile: make test
* experimental: add extractServers: bool default False; when true we will try to extract the "redirect info chain" on rfc1036/whois and jwhois for linux/darwin
* add missing option to query(), test in production environment done

## 2023-07-20: maarten_boot
## 1.20231102.1
* fix from kazet for .pl tld.

* sync with https://github.com/mboot-github/WhoisDomain; 1.20230720.1; (gov.tr), (com.ru, msk.ru, spb.ru), (option to preserve partial output after timeout)
* sync with https://github.com/mboot-github/WhoisDomain; 1.20230720.2; add t_test hint support; fix some server hints
## 1.20231115.1
New tld's and removal of a few tlds no longer supported at iana

## 2023-08-21: mboot-github (maarten_boot)
* abb, bw, bn, crown, crs, fj (does not work), gp (does not work), weir, realtor, post, mw, pf (a strange one), iq (gives timeout), mm, int, hm (does not work)

* abandon any python below 3.9 (mypy compatibilities)
* major refactor into a more object-based approach and parameterContext
* allow custom caching backends (e.g. redis, dbm, ...)
---

## 2023-09-22 see new parameters in whois/context/parameterContext.py
## in progress

* Sync with latest whoisdomain
* Allow cleaning up the http(s) info in the status response.
* Allow correlation with tld (pip install tld) public_suffix.
* Allow display of what whois-servers were used until we reach the final item.
7 changes: 7 additions & 0 deletions docs/Authors.md
@@ -0,0 +1,7 @@
# Authors

* this is a renamed copy of the original work done in: https://github.com/DannyCork/python-whois
* the project is also related to the project: https://github.com/gen1us2k/python-whois
* both seem derived from an older Google Code site: https://code.google.com/archive/p/python-whois
* aside from the original authors, many others already contributed to these repositories
* if authors/contributors prefer to be named explicitly, they can add a line in [Historical.txt](Historical.txt)
11 changes: 11 additions & 0 deletions docs/Docker.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
# Docker

[Docker](https://hub.docker.com/r/mbootgithub/whoisdomain)

* docker pull mbootgithub/whoisdomain:latest
* docker run mbootgithub/whoisdomain -V # show version
* docker run mbootgithub/whoisdomain -d google.com # run one domain
* docker run mbootgithub/whoisdomain -a # run all tld
* docker run mbootgithub/whoisdomain -d google.com -j | jq -r . # run one domain, output as json and reformat with jq
* docker run mbootgithub/whoisdomain -d google.com -j | jq -r '.expiration_date' # output only expire date
* docker run mbootgithub/whoisdomain -d google.com -j | jq -r '[ .expiration_date, .creation_date ]' # output expire date and creation date
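
The jq filters above can also be reproduced in Python once the JSON text has been captured. This is a minimal sketch: `pick_fields` is a hypothetical helper, and the `sample` string is a hand-written stand-in for the `-j` output, so the exact date format shown is an assumption.

```python
import json
from typing import Any, List


def pick_fields(json_text: str, fields: List[str]) -> List[Any]:
    """Return the requested keys from one JSON record (None when a key is absent)."""
    record = json.loads(json_text)
    return [record.get(name) for name in fields]


# Stand-in for the output of: docker run mbootgithub/whoisdomain -d google.com -j
# (the date format is an assumption, not verified against the container).
sample = (
    '{"name": "google.com", '
    '"creation_date": "1997-09-15 09:00:00", '
    '"expiration_date": "2028-09-13 09:00:00"}'
)
print(pick_fields(sample, ["expiration_date", "creation_date"]))
```

In practice the JSON text would come from the captured standard output of the `docker run` command, for example via `subprocess.run(..., capture_output=True, text=True).stdout`.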
20 changes: 20 additions & 0 deletions docs/Features.md
@@ -0,0 +1,20 @@
# Features
* Python wrapper for the "whois" cli command of your operating system.
* Simple interface to access parsed WHOIS data for a given domain.
* Able to extract data for all the popular TLDs (com, org, net, biz, info, pl, jp, uk, nz, ...).
* Query a WHOIS server directly instead of going through an intermediate web service like many others do.
* Works with Python >= 3.9
* All dates as datetime objects.
* Possibility to cache results.
* Verbose output on stderr during debugging to see how the internal functions are doing their work
* raise an exception on quota-exceeded type responses
* raise an exception on PrivateRegistry tld's where we know the tld and know we don't know anything
* optionally allow cleaning the response before extracting information
* optionally allow IDN's to be translated to Punycode
* optionally specify the whois command on query(...,cmd="whois") as in: https://github.com/gen1us2k/python-whois/
* the module is now 'mypy --strict' clean
* the module now also exports a cli command domainwhois
* both the module and the cli now support showing the version with lib:whois.getVersion() or cli:whoisdomain -V
* the whoisdomain can now output json data (one per domain: e.g 'whoisdomain -d google.com -j' )
* withRedacted: bool = False has been added to query(), if set to True any redacted fields will now be shown also (also supported in the cli whoisdomain as --withRedacted)
* an analizer directory is present in the github repo; it is used to look for new IANA tld's that are currently unsupported but match known whois servers
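
To illustrate what translating an IDN to Punycode means, here is a minimal sketch using only Python's built-in `idna` codec. It is not the package's own implementation; `to_punycode` is a hypothetical helper name.

```python
def to_punycode(domain: str) -> str:
    """Return the ASCII (xn--) form of an internationalized domain name."""
    # The built-in "idna" codec converts each dot-separated label on its own.
    return domain.encode("idna").decode("ascii")


print(to_punycode("münchen.de"))  # xn--mnchen-3ya.de
print(to_punycode("google.com"))  # plain ASCII names pass through unchanged
```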
106 changes: 106 additions & 0 deletions docs/Updates.md
@@ -0,0 +1,106 @@
# Updates
## 1.20230627.2
* add the proper whois server for Kenya and its known second level domains
## 1.20230627.3
* add rw tld proper whois server and second level ; restore mistakenly deleted .toml file
## 1.20230627.3
* additional Kenya second level domains
## 1.20230712.2
* tld .edu now can have up to 10 nameservers; remove action on pull request
## 1.20230717.1
* add tld: com.ru, msk.ru, spb.ru (all have a test documented), also update the tld: ru, the newlines are not needed.
## 1.20230717.2
* add option to parse a partial result after a timeout has occurred (parse_partial_response:bool default False); this needs `stdbuf` installed, otherwise it will fail
## 1.20230718.3
* fix typo in whois server hint for tld: ru
## 1.20230720.1
* add gov.tr; switch off status:available and status:free as None response, we should not interpret the result by default (we can add an option later)
## 1.20230720.2
* fix server hints for derived second level "xxx.tr", add processing "_test" hints during 'test2.py -a'
* add external caching framework that can be overridden for use of your own caching implementation
* renaming various vars to make them more verbose
* preparing for capturing all parameters in one object and passing that object around instead of many arguments in methods/functions
* switch to json so we don't need an additional dependency in ParamContext
* finish rework of args to ParameterContext, split off domain as file
## 1.20230803.1
* frenzy refactor-release
## 1.20230804.1
* testing
## 1.20230804.2
* testing after remove of leading dot in rw second level domains
## 1.20230804.3
* simplify cache implementation after feedback from `baderdean`
* "more lembas bread", refactor parse and query
* remove option to typecheck CACHE_STUB, use try/catch/exit instead, does not work when timeout happens, removed ;-(
* refactor doQuery, create processWhoisDomainRequest, split off lastWhois
## 1.20230806.1
* testing done, prep new release: "more lembas bread"
* bug found with the default timeout: if no timeout is specified the program fails: all pypi releases before 2023-07-17 yanked
## 1.20230807.1
* fix default timeout
* add DummyCache, DBMCache, RedisCache with simple test in testCache.py, testing custom cache options
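
The idea of a replaceable cache with a maximum age can be sketched as follows. This is an illustration under stated assumptions and not the package's cache classes: `MaxAgeCache` and its `get`/`put` method names are hypothetical, and the interface expected by the library is not shown in these notes.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class MaxAgeCache:
    """Hypothetical in-memory cache; entries older than max_age seconds are misses."""

    def __init__(self, max_age: float, clock: Callable[[], float] = time.time) -> None:
        self.max_age = max_age
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self.store: Dict[str, Tuple[float, str]] = {}

    def put(self, key: str, value: str) -> str:
        """Store value under key together with the current time."""
        self.store[key] = (self.clock(), value)
        return value

    def get(self, key: str) -> Optional[str]:
        """Return the cached value, or None when absent or too old."""
        entry = self.store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at > self.max_age:
            del self.store[key]  # expired: drop it and report a miss
            return None
        return value
```

Keying such a cache on the domain name and storing the raw whois text would avoid repeating a lookup until the entry expires.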
## 1.20230811.1
* replace type hint | with Union for py3.9 compat; switch off experimental redis tools
* switch off 3.[6-8] minimal is 3.9 we test against
* start working on dataContext;
* add more \_test items; reorder parts of tld_regexpr;
* propagate all meta domains servers as they are not inherited, testing , some domains have been retracted mboot; 2023-08-23;
* add suggestion from baderdean to parse fr domains with more focus on ORGANISATION
* 2023-08-24: mboot: more \_test added to tld
* verify all \_test on whois.nic.<tld> \_test: nic.<tld> fix where needed; remove some abandoned tld's
## 1.20230824.1
* mboot; to combine all new tests and changes, "the galloping Chutzpah release"
## 1.20230824.5
* mboot; fix missing module in whl
* restore python 3.6 test as i still use it on one remaining app with python 3.6 (make testP36)
* finalize verification of all tld's in iana, add test where this can be auto generated from whois.nic.<tld> 2023-08-28; mboot
## 1.20230829.1
* mboot; all \_test now work, using analizer tool to verify that iana tld db web site and tld_regexpr match
* add DEBUG to all verbose strings
* remove tldString and dList and domain, all go via dc (dataContext) now
* run tests and add new TODO
* moving all TLD_RE activities to tldInfo.py, and all exported helper funcs to helpers.py
* thinking about adding more complicated nested regex extractors to target contact info
* start with dependency inject: parser is passed as arg
* add cli interface to dependency inject, rightsize after test
* finish dependency inject move Domain create outside
* prep for other types or regex; all simple regex strings in tld_regexpr.py now need R() around them
* use currying to make all regex strings into function calls in whoisParser.py; all regexes in tld_regexpr.py are now converted on import to function calls via R()
* update tld: sk to use contextual extract, test with google.sk
* add findFromToAndLookForWithFindFirst contextual search based on a previous findFirst, used in "fr" tld, example google.fr, {} is used to add to fromStr

## 1.20230904.1
* only on pypi-test

---

## 1.20230906.1
* introduce parsing based on functions
* allow contextual search in splitted data and plain data
* allow contextual search based on earlier result
* fix a few tld to return the proper registrant string (not nic handle)

### 1.20230913.1
* if you have installed `tld` (pip install tld) you can enable withPublicSuffix=True to process until you reach the pseudo tld.
* the public_suffix info is added if available (and if requested)
* example case is: ./test2.py -d www.dublin.airport.aero --withPublicSuffix

### 1.20230913.3
* fix use of re.NOFLAG: it was only added in Python 3.11 and is not available in 3.9

---

## in progress

* prepare work on pylint
* switch to logging: all verbose is currently log.debug(); to show set LOGLEVEL=DEBUG before calling, see Makefile: make test



34 changes: 34 additions & 0 deletions docs/Usage.md
@@ -0,0 +1,34 @@
# Usage

## Requirements

* Install the cli `whois` of your operating system if it is not present already:
  * Debian / Ubuntu: `sudo apt install whois`
  * Fedora/Centos/Rocky: `sudo yum install whois`

## whois used in python (compatible with the Danny Cork version)

### example for fedora 37


sudo yum install whois
pip install whoisdomain

python
# to make it compatible with Danny_Cork whois
>>> import whoisdomain as whois
>>> d = whois.query('google.com')
>>> print(d.__dict__)

{'name': 'google.com', 'tld': 'com', 'registrar': 'MarkMonitor, Inc.', 'registrant_country': 'US', 'creation_date': datetime.datetime(1997, 9, 15, 9, 0), 'expiration_date': datetime.datetime(2028, 9, 13, 9, 0), 'last_updated': datetime.datetime(2019, 9, 9, 17, 39, 4), 'status': 'clientUpdateProhibited (https://www.icann.org/epp#clientUpdateProhibited)', 'statuses': ['clientDeleteProhibited (https://www.icann.org/epp#clientDeleteProhibited)', 'clientTransferProhibited (https://www.icann.org/epp#clientTransferProhibited)', 'clientUpdateProhibited (https://www.icann.org/epp#clientUpdateProhibited)', 'serverDeleteProhibited (https://www.icann.org/epp#serverDeleteProhibited)', 'serverTransferProhibited (https://www.icann.org/epp#serverTransferProhibited)', 'serverUpdateProhibited (https://www.icann.org/epp#serverUpdateProhibited)'], 'dnssec': False, 'name_servers': ['ns1.google.com', 'ns2.google.com', 'ns3.google.com', 'ns4.google.com'], 'registrant': 'Google LLC', 'emails': ['abusecomplaints@markmonitor.com', 'whoisrequest@markmonitor.com']}

>>> print (d.expiration_date)
2028-09-13 09:00:00

>>> print(d.name)
google.com

>>> print (d.creation_date)
1997-09-15 09:00:00
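
Because the dates in the session above are `datetime` objects, they can be used directly in date arithmetic. The sketch below is an illustration only: `days_until_expiry` is a hypothetical helper, and a `SimpleNamespace` stands in for the object returned by `whois.query('google.com')` so the example runs without network access.

```python
from datetime import datetime
from types import SimpleNamespace
from typing import Any, Optional


def days_until_expiry(domain: Any, now: datetime) -> Optional[int]:
    """Whole days from now until domain.expiration_date, or None if unknown."""
    if domain is None or domain.expiration_date is None:
        return None  # no result for the domain, or no expiration date parsed
    return (domain.expiration_date - now).days


# Stand-in for the result of whois.query('google.com') shown above.
d = SimpleNamespace(name="google.com", expiration_date=datetime(2028, 9, 13, 9, 0))
print(days_until_expiry(d, datetime(2028, 9, 3, 9, 0)))  # 10
```

With the real package the same helper would be called as `days_until_expiry(whois.query('google.com'), datetime.now())`.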