Iterating over api results in no images downloaded #16
Comments
Thanks for opening your first issue here! Be sure to follow the issue template!
I might have run into this bug before, but I ignored it, thinking it was a network connection issue. Here is a workable idea to solve it, though I am not sure it will be efficient: keep a temporary database for each run containing the tile IDs, with a marker for the tile image and the corresponding road mask, and keep downloading until every marker is set to 1.
I am not sure this will be a suitable solution. It might also happen that some tiles are not downloading at all and we need to repeat the process. Please provide suggestions if you find a better way to mitigate the problem :)
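For illustration, a minimal sketch of that tracking-table idea, assuming a plain SQLite file; the table and column names here are hypothetical and not part of jimutmap:

```python
# Hypothetical per-run tracking table: one row per tile, with markers that
# flip to 1 once the tile image and its road mask exist on disk.
import sqlite3

conn = sqlite3.connect("run_tracker.sqlite")
conn.execute("""CREATE TABLE IF NOT EXISTS tiles (
                    tile_id   TEXT PRIMARY KEY,
                    tile_done INTEGER DEFAULT 0,
                    mask_done INTEGER DEFAULT 0
                )""")
conn.commit()

def pending_tiles(conn):
    """Return the IDs whose tile image or road mask is still missing."""
    rows = conn.execute(
        "SELECT tile_id FROM tiles WHERE tile_done = 0 OR mask_done = 0")
    return [r[0] for r in rows]

# The run would loop, re-downloading pending_tiles(conn) and updating the
# markers, until the list comes back empty.
```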
It appears the first 2-3 requests go through, then subsequent ones do not. I've tried adding some …
Your suggestion is essentially to use retries? I've used this approach in a previous role as a data engineer.
I am thinking of using retries, as that came to mind at first glance at the issue. If the problem is an IP block or authentication, then I am not sure whether that will work. I also thought it might be an issue with the …
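For what a plain retry wrapper might look like, here is a sketch only, with `download_one` as a hypothetical stand-in for whatever fetches a single tile:

```python
import time

def download_with_retries(download_one, url, max_attempts=5):
    """Retry a single-tile download with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return download_one(url)
        except Exception:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            time.sleep(2 ** attempt)  # back off: 2 s, 4 s, 8 s, ...
```

As noted above, this would not help if the server is blocking the IP or rejecting the access key.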
Yes, I suspect something to do with multiprocessing. Setting threads=1 seems to work.
I will probably try to find a solution by next week. Best,
OK today …
I think the maximum number of threads offered by the CPU (in my case, 4) will also work. Thanks for pinpointing the issue; now I am sure it is a threading issue. The only problem with threads=1 is that it will be very slow compared to the others, since it searches deterministically. But increasing the thread count beyond the hardware's capacity may also slow the computer down; for example, on Linux- and Windows-based OSes it slows down considerably and may even result in deadlock (hang). Retries will again slow things down, since we are checking repeatedly. It looks like I will have to use some buffer mechanism that selectively retries the links using the database. That will slow things down considerably, but using multiprocessing within the retries may solve the issue. I have an exam tomorrow. Let's see; I hope to come up with a workable, efficient solution by next week.
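As a sketch of that hardware cap, assuming only the standard library (this is not jimutmap's actual code):

```python
import os

# Cap the requested thread count at what the CPU actually offers, so an
# oversized threads_ value cannot oversubscribe the machine.
requested_threads = 50
threads = min(requested_threads, os.cpu_count() or 1)  # cpu_count() may return None
```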
Eventually I have to use a database anyway, since I have to write the image-stitching module at some point. This tool was created (back in 2019) with the hypothetical idea of converting 2D satellite images to 3D using GANs and other related unsupervised deep learning techniques. Not sure when I will get time to work on that original purpose :) But I will come up with a solution to the present bug by next week.
I'm quite happy to leave it running over a weekend on my Mac, so speed is not my main concern. Are the generated filenames unique? One suggestion: the download method could return a dictionary of the created files, the request, etc. This could then be appended to a pandas DataFrame, inserted into an SQLite DB, and so on, as sketched below.
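A sketch of that bookkeeping idea; the record shape and filenames are made up, since download() does not currently return anything like this:

```python
import sqlite3
import pandas as pd

# Hypothetical per-tile records that a future download() might return.
records = [
    {"file": "myOutputFolder/86294_116105.jpg", "url": "...", "status": "ok"},
    {"file": "myOutputFolder/86294_116106.jpg", "url": "...", "status": "ok"},
]

df = pd.DataFrame.from_records(records)

# Persist the run's manifest so failed or missing tiles can be re-queried later.
with sqlite3.connect("downloads.sqlite") as conn:
    df.to_sql("downloads", conn, if_exists="append", index=False)
```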
Hi, it should work now. I created a dirty patch, and it will probably be a bit slow to start. The patch uses the maximum number of threads your CPU cores can provide, so that is the hardware's upper limit. Be sure to install the latest version using pip, then check the test.py file and update accordingly:

```python
"""
Jimut Bahan Pal
First updated : 22-03-2021
Last updated  : 04-04-2022
"""
import os
import glob
import shutil

from jimutmap import api, sanity_check

download_obj = api(min_lat_deg=10,
                   max_lat_deg=10.01,
                   min_lon_deg=10,
                   max_lon_deg=10.01,
                   zoom=19,
                   verbose=False,
                   threads_=50,
                   container_dir="myOutputFolder")

# If you don't have Chrome and can't take advantage of the auto access-key
# fetch, set download_obj.ac_key = ACCESS_KEY_STRING here.
# Pass getMasks=False if you just need the tiles.
download_obj.download(getMasks=True)

# create the object of class jimutmap's api
sanity_obj = api(min_lat_deg=10,
                 max_lat_deg=10.01,
                 min_lon_deg=10,
                 max_lon_deg=10.01,
                 zoom=19,
                 verbose=False,
                 threads_=50,
                 container_dir="myOutputFolder")

# re-check the downloaded tiles and fetch anything that is missing
sanity_check(min_lat_deg=10,
             max_lat_deg=10.01,
             min_lon_deg=10,
             max_lon_deg=10.01,
             zoom=19,
             verbose=False,
             threads_=50,
             container_dir="myOutputFolder")

print("Cleaning up... hold on")

# remove the temporary sqlite files created during the run
sqlite_temp_files = glob.glob('*.sqlite*')
print("Temporary sqlite files to be deleted = {} ?".format(sqlite_temp_files))
inp = input("(y/N) : ")
if inp in ('y', 'Y', 'yes'):
    for item in sqlite_temp_files:
        os.remove(item)

# Try to remove the leftover chromedriver folders; if that fails, show the
# error on screen via try...except.
try:
    chromedriver_folders = glob.glob('[0-9]*')
    print("Temporary chromedriver folders to be deleted = {} ?".format(chromedriver_folders))
    inp = input("(y/N) : ")
    if inp in ('y', 'Y', 'yes'):
        for item in chromedriver_folders:
            shutil.rmtree(item)
except OSError as e:
    print("Error: %s - %s." % (e.filename, e.strerror))
```

Kindly tell me whether it works or not. Note: this patch will force-download all the road masks too.
Tried 1.4.0 but the issue persists. I set threads=20 and get this nice warning: Running …
Could you please tell me the expected number of files to be downloaded? I think increasing the sleep in your code might fix the issue.
If I use …
I am not sure about this. Sorry, I couldn't solve it; I give up. It is probably a multiprocessing issue.
Using threads=1, I left it overnight and it completed. OK, thanks for looking into this; I will consider other ways to parallelize if I need to in the future. Cheers
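For the record, one caller-side way to parallelize later, sketched with the standard library only (fetch_area is a hypothetical per-location function, not part of jimutmap):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_all(locations, fetch_area, workers=4):
    """Run fetch_area over (lat, lon) tuples in a small thread pool."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch_area, loc): loc for loc in locations}
        for fut in as_completed(futures):
            loc = futures[fut]
            try:
                results[loc] = fut.result()
            except Exception as exc:
                results[loc] = exc  # keep failures around for a retry pass
    return results
```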
Describe the bug
I have a pandas DataFrame with locations I wish to download tiles for, fetching a limited area around each location. However, when I place the download in a loop, I find that often no images are downloaded. Deleting the chromedriver and retrying can fix the issue, but not always.
To Reproduce
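A hedged reproduction sketch of the loop described above; the DataFrame columns and the 0.01-degree box are illustrative, while the api arguments mirror those used earlier in this thread:

```python
import pandas as pd
from jimutmap import api

locations = pd.DataFrame({"lat": [10.00, 10.05], "lon": [10.00, 10.05]})

for row in locations.itertuples():
    download_obj = api(min_lat_deg=row.lat, max_lat_deg=row.lat + 0.01,
                       min_lon_deg=row.lon, max_lon_deg=row.lon + 0.01,
                       zoom=19, verbose=False, threads_=20,
                       container_dir="myOutputFolder")
    download_obj.download(getMasks=True)
    # Often only the first 2-3 iterations actually produce images.
```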
Expected behavior
Either an exception is raised if there are no images to download, or some mechanism is available to retry.
Screenshots
NA
Desktop (please complete the following information):
jimutmap==1.3.9