Commit d43b904

fix main function args logic

Author: Naman Jain
1 parent c39a3bb commit d43b904

4 files changed, with 14 additions and 22 deletions.

Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -12,4 +12,4 @@ COPY . .
 
 ENV PYTHONPATH=/app
 
-ENTRYPOINT ["poetry", "run", "python", "src/__main__.py"]
+ENTRYPOINT ["poetry", "run", "python", "netcdf_to_parquet/__main__.py"]

README.md

Lines changed: 5 additions & 4 deletions
@@ -14,19 +14,19 @@ The transformed data supports:
 ### Using Docker
 To build the docker image, run:
 ```sh
-docker build -t data_transformations .
+docker build -t netcdf_to_parquet .
 ```
 
 To run the docker container, here's an example command:
-`docker run -v $(pwd)/output:/app/output data_transformations <start_date> <end_date> <out_dir>`
+`docker run -v $(pwd)/output:/app/output netcdf_to_parquet <start_date> <end_date> <out_dir>`
 <br/>
 Here, `start_date` and `end_date` are in DD-MM-YYYY format.
 <br/>
 And `out_dir` is the directory where the parquet files will be generated.
 
 For example:
 ```sh
-docker run -v $(pwd)/output:/app/output data_transformations 01-01-2022 04-01-2022 out_dir
+docker run -v $(pwd)/output:/app/output netcdf_to_parquet 01-01-2022 04-01-2022 out_dir
 ```
 
 ### Using Poetry
@@ -36,7 +36,7 @@ To generate and activate the environment, run following commands from the root d
 ```sh
 poetry install
 poetry shell
-python -m data_transformations 01-01-2023 03-01-2023 ./parquet_files
+python -m netcdf_to_parquet 01-01-2023 03-01-2023 ./parquet_files
 ```
 
 ### Using pip
@@ -116,6 +116,7 @@ Note that some of the tests are live tests that actually download a file from GC
 - add coverage report?
 - fix black and flake8 conflict
 - why is GCS getting initialized when imported as library
+- add a test for using it as a library
 
 ## Improvements
 - speed up

netcdf_to_parquet/__main__.py

Lines changed: 7 additions & 16 deletions
@@ -80,37 +80,28 @@ def main(
         out_dir (str): Output directory for Parquet files. Defaults to "parquet_files".
         args (argparse.Namespace, optional): Command-line arguments. Defaults to None.
     """
-    if start_date is None or end_date is None:
-        if args is None:
-            raise ValueError(
-                "Please provide values for start_date and end_date either as arguments or through the command line interface."
-            )
+    if args is None:
+        args = parse_arguments()
 
-        start_date = args.start_date or start_date
-        end_date = args.end_date or end_date
-        out_dir = args.out_dir or out_dir
+    start_date = args.start_date or start_date
+    end_date = args.end_date or end_date
+    out_dir = args.out_dir or out_dir
 
-    # TODO: improve logic
     if start_date is None or end_date is None:
         raise ValueError("start_date and end_date must be provided.")
 
     pathlib.Path(out_dir).mkdir(parents=True, exist_ok=True)
-
     current_date = start_date
     total_days = (end_date - current_date).days + 1
-
     for _ in tqdm.tqdm(range(total_days), desc="Processing dates"):
         try:
             date_str = current_date.strftime("%Y/%m/%d")
             file_path = f"{constants.GCS_BASE_URL}/{date_str}/total_precipitation/surface.nc"
             output_path = f"{out_dir}/precipitation_{current_date.strftime('%d_%m_%Y')}.parquet"
             netcdf_to_parquet(file_path, output_path, file_system)
         except Exception as e:
-            logging.error(
-                f"Failed to process date {current_date.strftime('%d_%m_%Y')}: {e}"
-            )
-        finally:
-            current_date += datetime.timedelta(days=1)
+            logging.error(f"Failed to process date {current_date}: {e}")
+        current_date += datetime.timedelta(days=1)
 
 
 if __name__ == "__main__":
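
The precedence this change gives `main` (command-line values first, keyword arguments as fallback, then a single validation step) can be tried in isolation with the minimal sketch below. It is not the project's code: `parse_arguments` here is a hypothetical stand-in for the real helper, whose body is not part of this diff, and `parse_date`, `resolve_inputs`, `iter_dates`, and the `argv` parameter are names assumed for illustration. The DD-MM-YYYY date format and the output file name pattern are taken from the README and the diff above.

```python
import argparse
import datetime

DATE_FORMAT = "%d-%m-%Y"  # DD-MM-YYYY, the format the README documents


def parse_date(value):
    """Convert a DD-MM-YYYY string into a datetime.date."""
    return datetime.datetime.strptime(value, DATE_FORMAT).date()


def parse_arguments(argv=None):
    """Hypothetical stand-in for the project's parse_arguments()."""
    parser = argparse.ArgumentParser(prog="netcdf_to_parquet")
    # nargs="?" makes each positional optional, so a missing value is None.
    parser.add_argument("start_date", nargs="?", type=parse_date)
    parser.add_argument("end_date", nargs="?", type=parse_date)
    parser.add_argument("out_dir", nargs="?")
    return parser.parse_args(argv)


def resolve_inputs(start_date=None, end_date=None, out_dir="parquet_files",
                   args=None, argv=None):
    """Apply the commit's precedence: CLI values first, keywords as fallback."""
    if args is None:
        # argv is an assumption added here so the sketch can be tested
        # without touching sys.argv; argv=None falls back to sys.argv[1:].
        args = parse_arguments(argv)

    start_date = args.start_date or start_date
    end_date = args.end_date or end_date
    out_dir = args.out_dir or out_dir

    if start_date is None or end_date is None:
        raise ValueError("start_date and end_date must be provided.")
    return start_date, end_date, out_dir


def iter_dates(start_date, end_date):
    """Yield every date from start_date to end_date, inclusive."""
    total_days = (end_date - start_date).days + 1
    for offset in range(total_days):
        yield start_date + datetime.timedelta(days=offset)


if __name__ == "__main__":
    start, end, out_dir = resolve_inputs(
        argv=["01-01-2022", "04-01-2022", "out_dir"]
    )
    for day in iter_dates(start, end):
        print(f"{out_dir}/precipitation_{day.strftime('%d_%m_%Y')}.parquet")
```

Deriving each date from its offset, rather than incrementing a shared `current_date`, is one way to keep the loop counter independent of whether a given day's conversion fails. Note that when `args` is `None` and no `argv` is supplied, the parser reads `sys.argv`, which may matter when the function is called as a library.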

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "netcdf-to-parquet"
-version = "0.1.4"
+version = "0.1.5"
 description = "Transforms total precipitation NetCDF files into Parquet format."
 authors = ["Naman Jain"]
 license = "MIT"
