mongo-diff is a command-line tool for comparing two MongoDB collections.
Those collections can reside in either a single database or two separate databases (even across servers).
%% This is the source code of a Mermaid diagram, which GitHub will render as a diagram.
%% Note: PyPI does not render Mermaid diagrams, and instead displays their source code.
%% Reference: https://github.com/pypi/warehouse/issues/13083
graph LR
script[["mongo_diff.py"]]
result["List of<br>differences"]
subgraph s1 [Server]
subgraph d1 [Database]
collection_a[("Collection A")]
end
end
subgraph s2 [Server]
subgraph d2 [Database]
collection_b[("Collection B")]
end
end
collection_a --> script
collection_b --> script
script --> result
Assuming you have pipx installed, you can install the tool by running the following command:
pipx install mongo-diff
pipx is a tool for downloading and installing Python scripts hosted on PyPI. You can install pipx by running $ python -m pip install pipx
You can display the tool's --help snippet by running:
mongo-diff --help
At the time of this writing, the tool's --help snippet is:
Usage: mongo-diff [OPTIONS]
Compare two MongoDB collections.
Those collections can reside in either a single database or two separate
databases (even across servers).
╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --include-id --no-include-id Includes the `_id` field when comparing │
│ documents. │
│ [default: no-include-id] │
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Collection A ───────────────────────────────────────────────────────────────╮
│ * --mongo-uri-a TEXT Connection string for accessing │
│ the MongoDB server containing │
│ collection A. │
│ [env var: MONGO_URI_A] │
│ [required] │
│ * --database-name-a TEXT Name of the database containing │
│ collection A. │
│ [required] │
│ * --collection-name-a TEXT Name of collection A. [required] │
│ --identifier-field-name-a TEXT Name of the field of each document │
│ in collection A to use to identify │
│ a corresponding document in │
│ collection B. │
│ [default: id] │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Collection B ───────────────────────────────────────────────────────────────╮
│ --mongo-uri-b TEXT Connection string for accessing the │
│ MongoDB server containing collection │
│ B (if different from that specified │
│ for collection A). │
│ [env var: MONGO_URI_B] │
│ --database-name-b TEXT Name of the database containing │
│ collection B (if different from that │
│ specified for collection A). │
│ --collection-name-b TEXT Name of collection B (if different │
│ from that specified for collection │
│ A). │
│ --identifier-field-name-b TEXT Name of the field of each document in │
│ collection B to use to identify a │
│ corresponding document in collection │
│ A (if different from that specified │
│ for collection A). │
╰──────────────────────────────────────────────────────────────────────────────╯
Note: The above snippet was captured from a terminal window whose width was 80 characters.
As documented in the --help snippet above, you can provide the MongoDB connection strings to the tool via either (a) command-line options or (b) environment variables named MONGO_URI_A and MONGO_URI_B. The latter can come in handy for MongoDB connection strings that contain passwords.
Here's how you could create those environment variables:
export MONGO_URI_A='mongodb://localhost:27017'
export MONGO_URI_B='mongodb://username:password@host.example.com:22222'
Note: That will only create those environment variables in the current shell process. You can persist them by adding those same commands to your shell initialization script (e.g. ~/.bashrc, ~/.zshrc).
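The fallback behavior described in the --help snippet, where collection B's settings default to collection A's when omitted, can be sketched in Python. This is a minimal illustration and not the tool's actual source code: the function name `resolve_settings` and the dictionary keys are hypothetical, and the sketch assumes that a command-line option takes precedence over its environment variable.

```python
import os


def resolve_settings(options, environ=os.environ):
    """Fill in collection B's unspecified settings from collection A's.

    `options` is a hypothetical dict of parsed command-line options.
    """
    # Assumption: a command-line option wins over the environment variable.
    uri_a = options.get("mongo_uri_a") or environ.get("MONGO_URI_A")
    if uri_a is None:
        raise ValueError("A connection string for collection A is required.")

    settings_a = {
        "mongo_uri": uri_a,
        "database_name": options["database_name_a"],
        "collection_name": options["collection_name_a"],
        "identifier_field_name": options.get("identifier_field_name_a") or "id",
    }

    # Each setting for B falls back to the corresponding setting for A.
    settings_b = {
        "mongo_uri": options.get("mongo_uri_b")
        or environ.get("MONGO_URI_B")
        or settings_a["mongo_uri"],
        "database_name": options.get("database_name_b")
        or settings_a["database_name"],
        "collection_name": options.get("collection_name_b")
        or settings_a["collection_name"],
        "identifier_field_name": options.get("identifier_field_name_b")
        or settings_a["identifier_field_name"],
    }
    return {"a": settings_a, "b": settings_b}


# Example: only the collection name differs between A and B.
example = resolve_settings(
    {
        "database_name_a": "db",
        "collection_name_a": "people",
        "collection_name_b": "people_v2",
    },
    environ={"MONGO_URI_A": "mongodb://localhost:27017"},
)
print(example["b"])
```

In this example, collection B inherits the connection string, database name, and identifier field name from collection A, and only its collection name differs.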
As the tool compares the collections, it will display the differences it detects, like this:
Documents differ between collections: id=1,id=1. Differences: [('change', 'name', ('Joe', 'Joseph'))]
Document exists in collection A only: id=2
Document exists in collection A only: id=4
Document exists in collection B only: id=5
When the tool finishes comparing the collections, it will display a summary of the result, like this:
                         Result
╭───────────────────────────────────────────┬──────────╮
│ Description                               │ Quantity │
├───────────────────────────────────────────┼──────────┤
│ Documents in collection A                 │ 4        │
│ Documents in collection B                 │ 3        │
├───────────────────────────────────────────┼──────────┤
│ Documents in collection A only            │ 2        │
│ Documents in collection B only            │ 1        │
├───────────────────────────────────────────┼──────────┤
│ Documents that differ between collections │ 1        │
╰───────────────────────────────────────────┴──────────╯
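The kind of comparison that produces output like the above can be sketched with plain Python dictionaries standing in for MongoDB documents. This is a minimal sketch under stated assumptions, not the tool's actual implementation: the function names `diff_fields` and `compare_collections` are hypothetical, the sample documents (and every name other than Joe and Joseph) are made up, and a single identifier field name is used for both collections to keep the example short.

```python
def diff_fields(doc_a, doc_b):
    """Return a flat list of field-level differences between two documents."""
    differences = []
    for key in sorted(doc_a.keys() | doc_b.keys()):
        if key not in doc_b:
            differences.append(("remove", key, doc_a[key]))
        elif key not in doc_a:
            differences.append(("add", key, doc_b[key]))
        elif doc_a[key] != doc_b[key]:
            differences.append(("change", key, (doc_a[key], doc_b[key])))
    return differences


def compare_collections(docs_a, docs_b, id_field="id", include_id=False):
    """Compare two lists of documents; return (messages, summary)."""

    def prepare(doc):
        # Drop the `_id` field unless the caller asked to compare it.
        return doc if include_id else {k: v for k, v in doc.items() if k != "_id"}

    index_b = {doc[id_field]: doc for doc in docs_b}
    seen_in_a = set()
    messages = []
    summary = {"a": len(docs_a), "b": len(docs_b), "a_only": 0, "b_only": 0, "differ": 0}

    for doc_a in docs_a:
        identifier = doc_a[id_field]
        seen_in_a.add(identifier)
        if identifier not in index_b:
            summary["a_only"] += 1
            messages.append(f"Document exists in collection A only: {id_field}={identifier}")
            continue
        differences = diff_fields(prepare(doc_a), prepare(index_b[identifier]))
        if differences:
            summary["differ"] += 1
            messages.append(
                f"Documents differ between collections: "
                f"{id_field}={identifier},{id_field}={identifier}. "
                f"Differences: {differences}"
            )

    for doc_b in docs_b:
        identifier = doc_b[id_field]
        if identifier not in seen_in_a:
            summary["b_only"] += 1
            messages.append(f"Document exists in collection B only: {id_field}={identifier}")

    return messages, summary


# Made-up sample data: four documents in A, three in B.
docs_a = [
    {"_id": "a1", "id": 1, "name": "Joe"},
    {"_id": "a2", "id": 2, "name": "Ann"},
    {"_id": "a3", "id": 3, "name": "Sam"},
    {"_id": "a4", "id": 4, "name": "Lee"},
]
docs_b = [
    {"_id": "b1", "id": 1, "name": "Joseph"},
    {"_id": "b3", "id": 3, "name": "Sam"},
    {"_id": "b5", "id": 5, "name": "Kim"},
]
messages, summary = compare_collections(docs_a, docs_b)
for message in messages:
    print(message)
print(summary)
```

Because `include_id` defaults to `False` here, mirroring the tool's `--no-include-id` default, the differing `_id` values in the sample data do not count as differences.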
You can update the tool to the latest version available on PyPI by running:
pipx upgrade mongo-diff
You can uninstall the tool from your computer by running:
pipx uninstall mongo-diff
We use Poetry to both (a) manage dependencies and (b) publish packages to PyPI.
pyproject.toml: Configuration file for Poetry and other tools (was generated via $ poetry init)
poetry.lock: List of dependencies, direct and indirect (was generated via $ poetry update)
git clone https://github.com/eecavanna/mongo-diff.git
cd mongo-diff
Create a Poetry virtual environment and attach to its shell:
poetry shell
You can see information about the Poetry virtual environment by running:
$ poetry env info
You can detach from the Poetry virtual environment's shell by running:
$ exit
From now on, I'll refer to the Poetry virtual environment's shell as the "Poetry shell."
At the Poetry shell, install the project's dependencies:
poetry install
Edit the tool's source code and documentation however you want.
While editing the tool's source code, you can run the tool as you normally would in order to test things out:
mongo-diff --help
PyPI doesn't allow people to publish the same "version" of a package multiple times.
You can update the version identifier of the package by running:
poetry version {version_or_keyword}
You can replace {version_or_keyword} with either a literal version identifier (e.g. 0.1.1) or a keyword (e.g. major, minor, or patch). You can run $ poetry version --help to see the valid keywords.
Alternatively, you can manually edit a line in pyproject.toml:
- version = "0.1.0"
+ version = "0.1.1"
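To illustrate what the major, minor, and patch keywords do to a version identifier, here is a small Python sketch. It is an illustration only, not Poetry's code: the function name `bump_version` is hypothetical, it handles only plain MAJOR.MINOR.PATCH identifiers, and it covers only the three keywords mentioned above.

```python
def bump_version(version, keyword):
    """Return `version` with the part named by `keyword` incremented.

    Parts to the right of the incremented part are reset to zero.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if keyword == "major":
        return f"{major + 1}.0.0"
    if keyword == "minor":
        return f"{major}.{minor + 1}.0"
    if keyword == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unsupported keyword: {keyword}")


print(bump_version("0.1.0", "patch"))  # 0.1.1
print(bump_version("0.1.0", "minor"))  # 0.2.0
print(bump_version("0.1.0", "major"))  # 1.0.0
```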
At the Poetry shell, build the package based upon the latest source code:
poetry build
That will create both a source distribution file (whose name ends with .tar.gz) and a wheel file (whose name ends with .whl) in the dist directory.
At the Poetry shell, create the following environment variable, which Poetry checks when credentials aren't provided to it in another way:
export POETRY_PYPI_TOKEN_PYPI="{api_token}"
Replace {api_token} with a PyPI API token whose scope includes the PyPI project to which you want to publish the package.
At the Poetry shell, publish the newly built package to PyPI:
poetry publish
At this point, people will be able to download and install the package from PyPI.