Closed
Hello,
Currently, setting the environment variable PROJ_DATA has no effect on pyproj when the pyproj installation ships its own data. I think it would be good to lower the priority of the internal data and let users override the PROJ data with the environment variable in more cases.
Example: (from a fresh virtual env, python 3.12)
$ pip install pyproj
...
Successfully installed certifi-2024.8.30 pyproj-3.7.0
$ # create a custom proj data dir, here just a copy of the default one
$ cp -r .venv/lib/python3.12/site-packages/pyproj/proj_dir/share/proj test/
$ # without env var, pyproj finds its own data directory
$ pyproj -v
pyproj info:
pyproj: 3.7.0
PROJ (runtime): 9.4.1
PROJ (compiled): 9.4.1
data dir: /tmp/t/.venv/lib/python3.12/site-packages/pyproj/proj_dir/share/proj
...
$ # even with the env var, it uses its own directory
$ PROJ_DATA=test/ pyproj -v
...
data dir: /tmp/t/.venv/lib/python3.12/site-packages/pyproj/proj_dir/share/proj
...
$ # remove the internal dir manually, now it works
$ rm -fr .venv/lib/python3.12/site-packages/pyproj/proj_dir/share/proj
$ PROJ_DATA=test/ pyproj -v
...
data dir: test/
...
(related discussion: NixOS/nixpkgs#282139)
snowman2 commented on Oct 5, 2024
This is by design. It prevents pyproj from using the PROJ_DIR of a different, incompatible PROJ installation. The PROJ database must be the one provided for that specific PROJ version and should not be interchanged.
If you would like to use a separate PROJ installation, you should install pyproj from source instead of from a wheel.
https://pyproj4.github.io/pyproj/stable/api/datadir.html
kidanger commented on Oct 5, 2024
Thank you for the fast answer.
Then I'm not sure why pyproj.datadir.set_data_dir would have precedence over pyproj's internal data but PROJ_DATA doesn't, but I don't know all the details of pyproj and PROJ. Maybe this is not the goal of PROJ_DATA. My use case is to bundle specific datum grids when distributing software, to avoid network downloads or relying on user folders.
Feel free to close the issue if the behavior is intended.
snowman2 commented on Oct 5, 2024
The reason set_data_dir exists is to set the data directory if it cannot be found automatically. It is guaranteed to be for the specific instance of pyproj and not for another installation of PROJ.
With multiple installations of PROJ on a single machine, PROJ_DATA could potentially point to an incorrect directory that shouldn't be used by pyproj.
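For readers following along, the lookup order discussed in this thread can be modeled with a small standalone sketch. This is not pyproj's actual code: the function name resolve_data_dir and its parameters are hypothetical, and the validity check (a proj.db file being present in the directory) is an assumption. It only illustrates the priority described above: a directory given to set_data_dir first, then the data bundled with the wheel, then PROJ_DATA.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_data_dir(
    explicit_dir: Optional[str],
    internal_dir: Optional[str],
    env: Mapping[str, str],
) -> Optional[str]:
    """Hypothetical model of the data directory priority described in this thread.

    A candidate is assumed to be usable only if it contains a proj.db file.
    """
    candidates = [
        explicit_dir,          # 1. directory passed explicitly (as with set_data_dir)
        internal_dir,          # 2. data directory bundled with the wheel
        env.get("PROJ_DATA"),  # 3. environment variable, only used as a fallback
    ]
    for candidate in candidates:
        if candidate and Path(candidate, "proj.db").exists():
            return candidate
    return None  # nothing usable was found
```

Under this model, PROJ_DATA only takes effect once the bundled directory is missing, which matches the behavior shown in the example at the top of the issue.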