Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add composite_trip Migration Scripts #1009

Merged

Conversation

JGreenlee
Copy link
Contributor

@JGreenlee JGreenlee commented Jan 10, 2025

add _common.py with run_on_all_deployments for migration scripts

run_on_all_deployments can be used byother migration scripts, causing them to be run on all production deployments
It does this by scanning the nrel-openpath-deploy-configs repo for a list of deployments and modifying the DB_HOST accordingly in between each time the function is called

add trim_fluff_from_composite_trips.py

"""
Trim unnecessary fields from composite trips in the analysis_timeseries_db.
The shape of the remaining fields is unchanged.
"""

`run_on_all_deployments` can be used byother migration scripts, causing them to be run on all production deployments
It does this by scanning the `nrel-openpath-deploy-configs` repo for a list of deployments and modifying the DB_HOST accordingly in between each time the function is called
"""
Trim unnecessary fields from composite trips in the analysis_timeseries_db.
The shape of the remaining fields is unchanged.
"""
@JGreenlee JGreenlee marked this pull request as ready for review January 14, 2025 21:12
Copy link
Contributor

@shankari shankari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this was ready yesterday afternoon, and it is pressing, I am just going to make the changes now so we can run the script overnight


import emission.core.get_database as edb

DB_HOST_TEMPLATE = "mongodb://localhost:27017/openpath_prod_REPLACEME"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
DB_HOST_TEMPLATE = "mongodb://localhost:27017/openpath_prod_REPLACEME"
DB_HOST_TEMPLATE = os.environ["DB_HOST_TEMPLATE"]

The template for DB_HOST on DocumentDB is very different. It has a different host (not localhost), and includes user+password

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 947492d

Comment on lines 11 to 19
'rm -rf nrel-openpath-deploy-configs && ' +
'git clone --no-checkout https://github.com/e-mission/nrel-openpath-deploy-configs.git && ' +
'cd nrel-openpath-deploy-configs && ' +
'git ls-tree -r --name-only HEAD | grep configs/',
shell=True,
capture_output=True,
text=True)
filenames = proc.stdout.replace("configs/", "").split("\n")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this works, and I don't want the perfect to be the enemy of the good, but this seems overly convoluted. Why isn't this just

Suggested change
'rm -rf nrel-openpath-deploy-configs && ' +
'git clone --no-checkout https://github.com/e-mission/nrel-openpath-deploy-configs.git && ' +
'cd nrel-openpath-deploy-configs && ' +
'git ls-tree -r --name-only HEAD | grep configs/',
shell=True,
capture_output=True,
text=True)
filenames = proc.stdout.replace("configs/", "").split("\n")
'rm -rf nrel-openpath-deploy-configs && ' +
'git clone https://github.com/e-mission/nrel-openpath-deploy-configs.git)'
filenames = os.listdir('nrel-openpath-deploy-configs')

It seems much cleaner than capturing stdout and mucking around with it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
'rm -rf nrel-openpath-deploy-configs && ' +
'git clone --no-checkout https://github.com/e-mission/nrel-openpath-deploy-configs.git && ' +
'cd nrel-openpath-deploy-configs && ' +
'git ls-tree -r --name-only HEAD | grep configs/',
shell=True,
capture_output=True,
text=True)
filenames = proc.stdout.replace("configs/", "").split("\n")
with tempfile.TemporaryDirectory() as tmpdirname:
print('created temporary directory', tmpdirname)
proc = subprocess.run('git clone --no-checkout https://github.com/e-mission/nrel-openpath-deploy-configs.git ' + tmpdirname)
filenames = os.listdir(tmpdirnme+"/configs")

Copy link
Contributor Author

@JGreenlee JGreenlee Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this to avoid downloading the actual files since all we need are the filenames, which are viewable even when --no-checkout is used
May have been overengineered

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it may be something like that (no-checkout) but yes, I thought it was overly complex, and having the full configs might be helpful for some functions (e.g. maybe the function will be sensitive to the variation in the configs).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also fixed in 947492d

This commit builds on 1c16800 and:
- simplifies the retrieval (via git) to dynamically identify the full list of
  deployments. Notably it no longer relies on munging command output
- supports providing a subset of the deployments via an environment variable

It also adds a new function that is lightweight and just prints out the
configured URL as a simple template for testing and adapting

Testing done:
- ran the template function above

```
./e-mission-py.bash bin/historical/migrations/all_deployments_template.py
Connecting to database URLmongodb://localhost:27017/openpath_prod_nc_transit_equity_study
Running print_connect_url for ebikegj on DB ebikegj
Config file not found, returning a copy of the environment variables instead...
Retrieved config: {'DB_HOST': 'mongodb://localhost:27017/openpath_prod_ebikegj', 'DB_RESULT_LIMIT': None}
Connecting to database URL mongodb://localhost:27017/openpath_prod_ebikegj
Connecting to database URLmongodb://localhost:27017/openpath_prod_ebikegj
Running print_connect_url for cortezebikes on DB cortezebikes
Config file not found, returning a copy of the environment variables instead...
Retrieved config: {'DB_HOST': 'mongodb://localhost:27017/openpath_prod_cortezebikes', 'DB_RESULT_LIMIT': None}
Connecting to database URL mongodb://localhost:27017/openpath_prod_cortezebikes
Connecting to database URLmongodb://localhost:27017/openpath_prod_cortezebikes
Running print_connect_url for dcebike on DB dcebike
Config file not found, returning a copy of the environment variables instead...
Retrieved config: {'DB_HOST': 'mongodb://localhost:27017/openpath_prod_dcebike', 'DB_RESULT_LIMIT': None}
Connecting to database URL mongodb://localhost:27017/openpath_prod_dcebike
Connecting to database URLmongodb://localhost:27017/openpath_prod_dcebike
Running print_connect_url for washingtoncommons on DB washingtoncommons
Config file not found, returning a copy of the environment variables instead...
Retrieved config: {'DB_HOST': 'mongodb://localhost:27017/openpath_prod_washingtoncommons', 'DB_RESULT_LIMIT': None}
```
- ran the template function above with an overridden `PROD_LIST`. Note that the
  first two entries are from final values set to the DB_HOST variable before
  the program terminated previously. We may want to unset the variable before
  the program ends

```
$ export PROD_LIST='cortezebikes,fortmorgan'
$ ./e-mission-py.bash bin/historical/migrations/all_deployments_template.py
Config file not found, returning a copy of the environment variables instead...
Retrieved config: {'DB_HOST': 'mongodb://localhost/openpath_prod_nrel_commute', 'DB_RESULT_LIMIT': None}
Connecting to database URL mongodb://localhost/openpath_prod_nrel_commute
PROD_LIST: ['cortezebikes', 'fortmorgan']
Running print_connect_url for cortezebikes on DB cortezebikes
Config file not found, returning a copy of the environment variables instead...
Retrieved config: {'DB_HOST': 'mongodb://localhost:27017/openpath_prod_cortezebikes', 'DB_RESULT_LIMIT': None}
Connecting to database URL mongodb://localhost:27017/openpath_prod_cortezebikes
Connecting to database URLmongodb://localhost:27017/openpath_prod_cortezebikes
Running print_connect_url for fortmorgan on DB fortmorgan
Config file not found, returning a copy of the environment variables instead...
Retrieved config: {'DB_HOST': 'mongodb://localhost:27017/openpath_prod_fortmorgan', 'DB_RESULT_LIMIT': None}
Connecting to database URL mongodb://localhost:27017/openpath_prod_fortmorgan
Connecting to database URLmongodb://localhost:27017/openpath_prod_fortmorgan

```
@shankari
Copy link
Contributor

I stayed up until 2am-ish last night so I will merge and run this tomorrow, when I will be more careful.
I don't want to run scripts against production when I am burned out

@shankari shankari merged commit 33eda70 into e-mission:master Jan 16, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants