-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: added download from clinvar #4
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs changes
data_collection/tools.py
Outdated
|
||
def get_file_from_clinvar(override=False): | ||
|
||
file_name = 'clinvar_result.txt' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make an argument with default value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this file has a static name when downloading. So if we want to have it with another name, we would download the file "clinvar_result" and then make os call to rename it with desired name, renaming only can be done once the file is fully downloaded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
provide functionality to rename it as user wants
data_collection/tools.py
Outdated
firefox_options.set_preference("browser.download.dir", download_dir) | ||
firefox_options.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/octet-stream") | ||
|
||
file_exist = file_exists_and_not_empty(download_dir, file_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it matter, if it is not empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well if there's no data in the file its either file is not fully downloaded yet or the file is corrupted. I reckon it should always contain data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
User accidently deleted all data from file and saved it. He wants to download it again
data_collection/tools.py
Outdated
driver = webdriver.Firefox(options=firefox_options) | ||
driver.get(url) | ||
driver.execute_script("document.getElementsByName(\"EntrezSystem2.PEntrez.clinVar.clinVar_Entrez_ResultsPanel.Entrez_DisplayBar.SendToSubmit\")[0].click()") | ||
WebDriverWait(driver, 30).until(lambda driver: file_exists_and_not_empty(download_dir, file_name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if it will take longer than 30 seconds to download?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to increase the value, but from testing it seemed fine to me. From what I have read, selenium WebDriverWait is synchronous based so the whole script would freeze and wait for it to stop downloading. That's why this timeout is included. I guess we can create a new process on another thread to execute it, but that could also just hang
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, tested it a bit more. It will download anyway ether the program is running or not (script can crash, but the file will still to continue downloading). So it's the matter of time when to break from waiting (file is downloading for 30 sec, we break out, it continues to download further until it's done)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are not using driver.quit() that's why it doesn't stop. It also means that the geckodriver keeps running and using computer's resources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add driver.quit() as Nojus suggested.
data_collection/tools.py
Outdated
os.remove(os.path.join(download_dir, file_name)) | ||
|
||
driver = webdriver.Firefox(options=firefox_options) | ||
driver.get(url) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will it throw if there's no internet connection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are handling the same error for get_file_from_url
except the only difference that for that one we were using requests
. Better to catch this error.
data_collection/tools.py
Outdated
driver.get(url) | ||
driver.execute_script("document.getElementsByName(\"EntrezSystem2.PEntrez.clinVar.clinVar_Entrez_ResultsPanel.Entrez_DisplayBar.SendToSubmit\")[0].click()") | ||
WebDriverWait(driver, 30).until(lambda driver: file_exists_and_not_empty(download_dir, file_name)) | ||
print("File downloaded") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to check, if file is downloaded and return True/False, so user could decide if he needs to print this info or no
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As of testing, it will always download. Maybe just can check if it's downloaded now (as time of function ending).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Server may crash, internet may dissapear, unpredicted things may happen. Therefore
- Make sure it doesn't leave function until we are sure that download was either successful or failed
- Use
logging
module instead ofprint
- Specify which file was downloaded
data_collection/tools.py
Outdated
driver.execute_script("document.getElementsByName(\"EntrezSystem2.PEntrez.clinVar.clinVar_Entrez_ResultsPanel.Entrez_DisplayBar.SendToSubmit\")[0].click()") | ||
WebDriverWait(driver, 30).until(lambda driver: file_exists_and_not_empty(download_dir, file_name)) | ||
|
||
return file_exists_and_not_empty(download_dir, file_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add new line at the end of file
Rename PR according to standarts |
Closed as issues were fixed in different PR |
No description provided.