Iterating over api results in no images downloaded #16
Comments
Thanks for opening your first issue here! Be sure to follow the issue template!
I might have run into this bug before, but I ignored it, thinking it was a network connection issue. Here is a workable idea to solve it, though I am not sure it will be efficient: keep a temporary database for each run containing the tile IDs, with a marker for the tile image and the corresponding road mask, and keep downloading until every marker is set to 1.
I am not sure this will be a suitable solution. It might also happen that some tiles are not downloading at all and we need to repeat the process. Please provide suggestions if you find a better way to mitigate the problem :)
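For illustration, a minimal sketch of that tracking-table idea, assuming a plain SQLite file; the table and column names here are hypothetical and not part of jimutmap:

```python
# Hypothetical per-run tracking table: one row per tile, with markers that
# flip to 1 once the tile image and its road mask exist on disk.
import sqlite3

conn = sqlite3.connect("run_tracker.sqlite")
conn.execute("""CREATE TABLE IF NOT EXISTS tiles (
                    tile_id   TEXT PRIMARY KEY,
                    tile_done INTEGER DEFAULT 0,
                    mask_done INTEGER DEFAULT 0
                )""")
conn.commit()

def pending_tiles(conn):
    """Return the IDs whose tile image or road mask is still missing."""
    rows = conn.execute(
        "SELECT tile_id FROM tiles WHERE tile_done = 0 OR mask_done = 0")
    return [r[0] for r in rows]

# The run would loop, re-downloading pending_tiles(conn) and updating the
# markers, until the list comes back empty.
```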
It appears the first 2-3 requests go through, then subsequent ones do not. I've tried adding some …
Your suggestion is essentially to use retries? I've used this approach in a previous role as a data engineer.
I am thinking of using retries, as that came to mind at first glance at the issue. If the problem is an IP block or authentication, then I am not sure whether that will work. I also thought it might be an issue with the …
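For what a plain retry wrapper might look like, here is a sketch only, with `download_one` as a hypothetical stand-in for whatever fetches a single tile:

```python
import time

def download_with_retries(download_one, url, max_attempts=5):
    """Retry a single-tile download with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return download_one(url)
        except Exception:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            time.sleep(2 ** attempt)  # back off: 2 s, 4 s, 8 s, ...
```

As noted above, this would not help if the server is blocking the IP or rejecting the access key.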
Yes, I suspect something to do with multiprocessing. Setting threads=1 seems to work.
I will probably try to find a solution by next week. Best,
OK today …
I think the maximum number of threads offered by the CPU (in my case, 4) will also work. Thanks for pinpointing the issue; now I am sure it is a threading issue. The only problem with threads=1 is that it will be very slow compared to the others, since it searches deterministically. But increasing the thread count beyond the hardware's capacity may also slow the computer down; for example, on Linux- and Windows-based OSes it slows down considerably and may even result in deadlock (hang). Retries will again slow things down, since we are checking repeatedly. It looks like I will have to use some buffer mechanism that selectively retries the links using the database. That will slow things down considerably, but using multiprocessing within the retries may solve the issue. I have an exam tomorrow. Let's see; I hope to come up with a workable, efficient solution by next week.
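As a sketch of that hardware cap, assuming only the standard library (this is not jimutmap's actual code):

```python
import os

# Cap the requested thread count at what the CPU actually offers, so an
# oversized threads_ value cannot oversubscribe the machine.
requested_threads = 50
threads = min(requested_threads, os.cpu_count() or 1)  # cpu_count() may return None
```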
Eventually I have to use a database anyway, since I have to write the image-stitching module at some point. This tool was created (back in 2019) with the hypothetical idea of converting 2D satellite images to 3D using GANs and other related unsupervised deep learning techniques. Not sure when I will get time to work on that original purpose :) But I will come up with a solution to the present bug by next week.
I'm quite happy to leave it running over a weekend on my Mac, so speed is not my main concern. Are the generated filenames unique? One suggestion: the download method could return a dictionary of the created files, the request, etc. This could then be appended to a pandas DataFrame, inserted into an SQLite DB, and so on, as sketched below.
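A sketch of that bookkeeping idea; the record shape and filenames are made up, since download() does not currently return anything like this:

```python
import sqlite3
import pandas as pd

# Hypothetical per-tile records that a future download() might return.
records = [
    {"file": "myOutputFolder/86294_116105.jpg", "url": "...", "status": "ok"},
    {"file": "myOutputFolder/86294_116106.jpg", "url": "...", "status": "ok"},
]

df = pd.DataFrame.from_records(records)

# Persist the run's manifest so failed or missing tiles can be re-queried later.
with sqlite3.connect("downloads.sqlite") as conn:
    df.to_sql("downloads", conn, if_exists="append", index=False)
```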
Hi, it should work now. I created a dirty patch, and it will probably be a bit slow to start. The patch uses the maximum number of threads your CPU cores can provide, so that is the hardware's upper limit. Be sure to install the latest version using pip, then check the test.py file and update accordingly:

```python
"""
Jimut Bahan Pal
First updated : 22-03-2021
Last updated  : 04-04-2022
"""
import os
import glob
import shutil

from jimutmap import api, sanity_check

download_obj = api(min_lat_deg=10,
                   max_lat_deg=10.01,
                   min_lon_deg=10,
                   max_lon_deg=10.01,
                   zoom=19,
                   verbose=False,
                   threads_=50,
                   container_dir="myOutputFolder")

# If you don't have Chrome and can't take advantage of the auto access-key
# fetch, set download_obj.ac_key = ACCESS_KEY_STRING here.
# Pass getMasks=False if you just need the tiles.
download_obj.download(getMasks=True)

# create the object of class jimutmap's api
sanity_obj = api(min_lat_deg=10,
                 max_lat_deg=10.01,
                 min_lon_deg=10,
                 max_lon_deg=10.01,
                 zoom=19,
                 verbose=False,
                 threads_=50,
                 container_dir="myOutputFolder")

# re-check the downloaded tiles and fetch anything that is missing
sanity_check(min_lat_deg=10,
             max_lat_deg=10.01,
             min_lon_deg=10,
             max_lon_deg=10.01,
             zoom=19,
             verbose=False,
             threads_=50,
             container_dir="myOutputFolder")

print("Cleaning up... hold on")

# remove the temporary sqlite files created during the run
sqlite_temp_files = glob.glob('*.sqlite*')
print("Temporary sqlite files to be deleted = {} ?".format(sqlite_temp_files))
inp = input("(y/N) : ")
if inp in ('y', 'Y', 'yes'):
    for item in sqlite_temp_files:
        os.remove(item)

# Try to remove the leftover chromedriver folders; if that fails, show the
# error on screen via try...except.
try:
    chromedriver_folders = glob.glob('[0-9]*')
    print("Temporary chromedriver folders to be deleted = {} ?".format(chromedriver_folders))
    inp = input("(y/N) : ")
    if inp in ('y', 'Y', 'yes'):
        for item in chromedriver_folders:
            shutil.rmtree(item)
except OSError as e:
    print("Error: %s - %s." % (e.filename, e.strerror))
```

Kindly tell me whether it works or not. Note: this patch will force-download all the road masks too.
Tried 1.4.0 but the issue persists. I set threads=20 and get this nice warning: Running …
Could you please tell me the expected number of files to be downloaded? I think increasing the sleep in your code might fix the issue.
If I use …
I am not sure about this. Sorry, I couldn't solve it; I give up. It is probably a multiprocessing issue.
Using threads=1, I left it overnight and it completed. OK, thanks for looking into this; I will consider other ways to parallelize if I need to in the future. Cheers
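For the record, one caller-side way to parallelize later, sketched with the standard library only (fetch_area is a hypothetical per-location function, not part of jimutmap):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_all(locations, fetch_area, workers=4):
    """Run fetch_area over (lat, lon) tuples in a small thread pool."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch_area, loc): loc for loc in locations}
        for fut in as_completed(futures):
            loc = futures[fut]
            try:
                results[loc] = fut.result()
            except Exception as exc:
                results[loc] = exc  # keep failures around for a retry pass
    return results
```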
Describe the bug
I have a pandas DataFrame with locations I wish to download tiles for, fetching a limited area around each location. However, when I place the download in a loop, I find that often no images are downloaded. Deleting the chromedriver and retrying can fix the issue, but not always.
To Reproduce
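A hedged reproduction sketch of the loop described above; the DataFrame columns and the 0.01-degree box are illustrative, while the api arguments mirror those used earlier in this thread:

```python
import pandas as pd
from jimutmap import api

locations = pd.DataFrame({"lat": [10.00, 10.05], "lon": [10.00, 10.05]})

for row in locations.itertuples():
    download_obj = api(min_lat_deg=row.lat, max_lat_deg=row.lat + 0.01,
                       min_lon_deg=row.lon, max_lon_deg=row.lon + 0.01,
                       zoom=19, verbose=False, threads_=20,
                       container_dir="myOutputFolder")
    download_obj.download(getMasks=True)
    # Often only the first 2-3 iterations actually produce images.
```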
Expected behavior
Either an exception is raised if there are no images to download, or some mechanism is available to retry.
Screenshots
NA
Desktop (please complete the following information):
jimutmap==1.3.9