Since we do not have the permission to publish the Ta datasets measured in mainland China, we use the publicly available Ta datasets measured in Finland to illustrate how to use the code.
The detailed information for mainland China and Finland is available in Table I and Table II.
Please download the Ta datasets from the GSOD website. We provide a function (utils/dl_gsod.py
) for you to download the datasets.
To obtain the Ta images:
utils/dl_gsod.py
: you will obtain the observed Ta which contains all the observed Ta worldwide.utils/process_gsod.py
: filter the Ta measured in Finland. A square limitation (longitudes and latitudes) is applied, thus there are some stations may not really in but near Finland. You will obtain thedatasets/met/Ta_FI_raw.csv
file.utils/interp.py
: You will obtain the Ta images and the related masks. The Ta images are the interpolated using the Inverse Distance Weighting (IDW) method. The mask files are also created to illustrate the pixels that contain the observed Ta. Acsv
file (datasets/met/Ta_FI.csv
) that contains the related rows and columns for stations in the Ta images is also available after this step.
To save time, we provide the Ta_FI.csv
for you: Ta_FI.csv. Please download and save it under the datasets/met
folder and execute the utils/interp.py
file to obtain Ta and mask images.
The datasets/station_loc.csv
file contain the station locations (longitudes, latitudes, rows, cols) for stations in Finland. You can obtain this file by filtering the state
column in datasets/met/Ta_FI.csv
.
df = pd.read_csv('datasets/met/Ta_FI.csv')
df_unique = df[['state', 'longitude', 'latitude', 'row', 'col']].drop_duplicates()
df_FI = df_unique[df_unique['state'].str.contains(', FI')]
df_FI.to_csv('datasets/station_loc.csv', index=False)
STEP 1
The MODIS Aqua LST datasets are available from the Google Earth Engine: MYD11A1.061. Download LST datasets from 2010-01-01 to 2021-12-31 using the geemap
package. We provide a shapfile of Finland (under the folder datasets/topo/
) for you to upload to the Google Earth Engine to download these images.
Note that the files will be downloaded to your google drive if scale is less than 5000. The unit of scale is meter. For more details, please refer to geemap.
To speed up, you can execute several .py
files simultaneously and split the dates. For instance, you can download files from '2010-01-01' to '2011-01-01' and from '2011-01-01' to '2012-01-01' simultaneously.
Map = geemap.Map()
ROI_ = ee.FeatureCollection('users/{your_user_name}/FI')
ROI = ROI_.geometry()
Map.centerObject(ROI)
Map.addLayer(ROI, {}, 'FI')
# LST
bands = ['LST_Day_1km', 'QC_Day', 'LST_Night_1km', 'QC_Night']
names = ['FI_Day', 'FI_QCD', 'FI_Night', 'FI_QCN']
for band, name in zip(bands, names):
dataset = ee.ImageCollection('MODIS/061/MYD11A1')\
.filter(ee.Filter.date('2010-01-01', '2022-01-01'))\
.select(band)\
.filterBounds(ROI)
geemap.ee_export_image_collection_to_drive(dataset, folder=f'{name}', scale=1000, region=ROI, crs='EPSG:4326')
If there is no copyright problem, you may write an email to the authors to obtain the LST datasets.
- datasets
- tif
- LSTD
- QCD
- LSTN
- QCN
- tif
STEP 2
Then, you need to convert the file format from .tif
to .npy
by applying the utils/tif2npy.py
file.
We use an error level control of 1 k, but 3 k is also possible, and we find that 3 k will obtain better retrieval results. We use 1 k to illustrate that our method works well under the worst case.
- datasets
- npy
- LSTD
- QCD
- LSTN
- QCN
- npy
run the datasets/LST2Ta_loader.py
to test whether you obtain the correct model inputs and outputs.
We use the UNet
model, but other models are also available.
For the UNet model, you can choose single conv layer, double conv layer, or single conv with SE module. We find that different types of convs do not affect the retrieval results.
LST2Ta
Options for LST2Ta (running on one GPU)
:
$ python main.py --name LST2Ta --model_name LST2Ta --dist none --epochs 200 --early_stop 120 --batch_size 8 --input_folders datasets/npy/LSTD datasets/npy/LSTN --output_folders datasets/Ta --input_nc 2 --output_nc 3 --val_ratio 0.1 --test_ratio 0.1 --pad_size 400 --lr 0.00016 --beta1 0.5
To speed up training, you can choose distributed training: DP or DDP. Note that PyTorch has a new version of DDP, and we do not run on DDP. So, if you want to use DDP, it is at your own risk.
We use DP on a machine with 4 NVIDIA V100 GPUs, and thus the learning rate is accordingly 0.00004.
pad_size
is applied to avoid overfitting.
Simply add an option --isTest to obtain the retrieval results on the test set.
The code is released under the MIT license.
Please kindly cite our paper if you use our code.
@article{su2023lst2ta,
author={Su, Peifeng and Abera, Temesgen and Guan, Yanlong and Pellikka, Petri},
journal={IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing},
title={Image-to-Image Training for Spatially Seamless Air Temperature Estimation With Satellite Images and Station Data},
year={2023},
volume={16},
number={},
pages={3353-3363},
doi={10.1109/JSTARS.2023.3256363}}
Our code is inspired by PyTorch CycleGAN code.