-
Notifications
You must be signed in to change notification settings - Fork 16
Pulldown: select surveys to be exported + merged table #163
Conversation
knimin/lib/tests/test_data_access.py
Outdated
(-4, 'Surfers', True), | ||
(-5, 'Personal_Microbiome', False)], | ||
db.list_ag_surveys([-2, -4])) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify - this is testing the real database if these surveys in the db?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's testing that this method can obtain survey names AND that the third component is set to True for all surveys if NONE is provided and only those set to True that are in a list given to the function as third argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do think that this is a really critical check. Now it looks like every time that a survey gets added, this test would need to be updated. What do you think about this? Do you think that it is worth while to raise an issue to make these tests - so that we don't need to update these tests every time that the surveys are updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that adding new surveys is a frequently occurring task. Thus, updating the test each time should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok that's fine - but what do you think about having some sort of documentation that contains all of the tests that need to be updated? If there is an additional survey being added - it could be a pain in the butt to hunt and find which tests need to be updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On a second thought, I totally agree with our argument that this is not future save. Thus, I changed the code in a way that additional surveys should not influence the outcome.
knimin/templates/ag_pulldown.html
Outdated
<div>Select which American Gut surveys shall be included in the pulldown archive. <br/>Add 'all in one file' if you also want to have one addition file<br/>that contains all available columns from all American Gut surveys.</div> | ||
<select name="agsurveys" id="agsurveys" multiple /> | ||
{% for s in agsurveys %} | ||
{% if s[2] == True %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if s[2]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
<select name="external" id="external" multiple /> | ||
{% for s in surveys %} | ||
<option value='{{s}}'>{{s}}</option> | ||
{% end %} | ||
</select> | ||
</p> | ||
{% if merged == 'True' %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a string comparison?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes it is, since 'merged' holds the content of 'value' of the checkbox which I defined as the string 'True'.
@sjanssen2 looks like there are merge conflicts |
…_pulldown_filenames
80a9f6d
to
1bf5fc0
Compare
We have issues with Geocoder searching for locations where addresses are now random utf-8 characters. This results in unstable results for metadata pulldown. Thus, I only compare the first 1000 characters of each file in the pulldown archive - basically to continue working on #184 |
@wasade @josenavas @antgonza can I please get a rapid review to be able to continue working on #184 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few comments
knimin/handlers/ag_pulldown.py
Outdated
# be the name of the external survey. In order to not block their | ||
# pulldown I check that a skipped survey ID must be in the set of | ||
# all available surveys. | ||
if ((-1 * survey) in selected_ag_surveys) or \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
abs_survey = abs(survey)
and use abs_survey
instead of -1 * survey
in order to avoid code duplication
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
knimin/handlers/ag_pulldown.py
Outdated
pd_meta.set_index('sample_name', inplace=True) | ||
results_as_pd.append(pd_meta) | ||
|
||
pd_all = pd.DataFrame() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these 3 lines of code be moved inside the if
below? Not sure how expensive is the merge, but if it is only needed when merged = True then just do it there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch. Fixed
knimin/lib/data_access.py
Outdated
Parameters | ||
---------- | ||
selected : list of int | ||
To keep track of which surveys have been selected by the user via |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the resulting list of tupels third element holds this information as a boolean value
-> I don't understand what this means, can you rephrase?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better now?
knimin/lib/data_access.py
Outdated
sql = """SELECT group_order, american | ||
FROM ag.survey_group | ||
WHERE group_order < 0""" | ||
return [(id, name, (selected is None) or (id in selected)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
id
-> id_
id
is a reserved word
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
knimin/lib/data_access.py
Outdated
|
||
Returns | ||
------- | ||
list of (int, str, bool) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the last boolean needed? It is also confusing that the selected attribute is only used to mark this boolean - shouldn't it limit which ones are returned, rather than the user then going back over the list to filter the ones that he is not interested in? And by user I mean the developer that consumes this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is used to render all survey names on the metadata pulldown website. The user (who is browsing labadmin) has the option to tick a survey to include or exclude it for the actual pulldown.
Thus, this function is a hybrid of DB query and interface rendering, but since it is so simple I thought it would be OK.
@@ -61,6 +61,42 @@ def write_to_buffer(self): | |||
return self.in_memory_data.getvalue() | |||
|
|||
|
|||
def extract_zip(input_zip): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this function blow up the memory usage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for sure, if you extract a huge archive. But it need to do the extraction for unit testing and I thought it is more elegant than doing system calls to 'unzip'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok!
knimin/lib/mem_zip.py
Outdated
|
||
Parameters | ||
---------- | ||
len : int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
archive
is missing
@@ -329,6 +328,23 @@ def test_get_ag_barcode_details(self): | |||
self.assertEqual({k: obs[key][k] for k in exp[key]}, exp[key]) | |||
self.assertIn(obs[key]['participant_name'], participant_names) | |||
|
|||
def test_list_ag_surveys(self): | |||
truth = [(-1, 'Personal Information', True), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this test can be simplified as
self.assertItemsEqual(db.list_ag_surveys(), truth)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope. Consider that we add surveys in the future. This would brake your test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test would not be correct, since it will not be testing that all surveys are returned.
knimin/templates/ag_pulldown.html
Outdated
@@ -71,14 +80,34 @@ <h3 style="color:red">Pulldown Processing, please wait for file download. It may | |||
<form enctype="multipart/form-data" action="/ag_pulldown/" name="agForm" id="agForm" method="post"> | |||
<p>Upload qiime mapping file, or other file with first line header and one barcode per line</p> | |||
<p>Barcodes File <input type="file" name="barcodes" id="barcodes"/></p> | |||
<p>American Gut Surveys: | |||
<div>Select which American Gut surveys shall be included in the pulldown archive. <br/>Add 'all in one file' if you also want to have one addition file<br/>that contains all available columns from all American Gut surveys.</div> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addition -> additional
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
knimin/tests/test_ag_pulldown.py
Outdated
@@ -53,6 +55,13 @@ def test_get(self): | |||
self.assertIn("<option value='%s'>%s</option>" % (survey, survey), | |||
response.body) | |||
self.assertNotIn('<input type="submit" disabled>', response.body) | |||
for (id, name, selected) in db.list_ag_surveys(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in this case none are selected, correct? according to the get
call that you are doing all should be false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remember the definition of list_ag_surveys. There it is said that all get "selected" if selected is not set, i.e it is None. So in fact, all are selected in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, in that case what it means is that you don't need the if statement because all are selected. Given the get issued, all should be selected so the test should actually be:
for (_id, name, selected) in db.list_ag_surveys():
self.assertIn("<option value='%i' selected>%s</option>" % (_id, name), response.body)
Note also the change from id
-> _id
.
…edSurveyTestData
@josenavas could you review again please? |
@wasade @antgonza @josenavas I need some timely reviews please! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some questions/comments.
knimin/handlers/ag_pulldown.py
Outdated
if self.get_argument('external'): | ||
|
||
# query which surveys have been selected by the user | ||
if self.get_argument('selected_ag_surveys', []): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this if really necessary? you could do:
selected_ag_surveys = map(int, self.get_argument('selected_ag_surveys', []).split(','))
right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I figured that it can be even easier than that, if we use get_aguments (not the s). Then, you can provide a list of strings as parameters.
knimin/handlers/ag_pulldown.py
Outdated
else: | ||
selected_ag_surveys = [] | ||
|
||
if self.get_argument('external', []): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see above
knimin/handlers/ag_pulldown.py
Outdated
|
||
# check database about what surveys are available | ||
available_agsurveys = {} | ||
for (id, name, selected) in db.list_ag_surveys(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you replace id -> _id and perhaps selected by _ as it's not being used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
------- | ||
dict{str, str} where the first component is the filename and the second | ||
the first <len> characters of the file.""" | ||
return map(lambda (k, v): {k: v[:len]}, archive.items()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just FYI, if the string is shorter it will return the full string ... this can cause problems in certain cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be fine in this case, but thanks for the info
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I now know :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few more comments
knimin/handlers/ag_pulldown.py
Outdated
|
||
# check database about what surveys are available | ||
available_agsurveys = {} | ||
for (id, name, selected) in db.list_ag_surveys(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
id
-> _id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also catched by Antonio. Fixed.
knimin/handlers/ag_pulldown.py
Outdated
if (abs_survey in selected_ag_surveys) or \ | ||
(abs_survey not in available_agsurveys): | ||
meta_zip.append('survey_%s_md.txt' % | ||
available_agsurveys[-1 * survey], meta) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-1 * survey
-> abs_survey
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that would be wrong, because those keys are actually negative numbers; it's what the user provided (i.e. the database)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't it then be just survey
? Note that survey holds the negative number, while abs_survey
holds the positive one. [-1 * survey] == abs_survey
as the code stands.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I now fixed it. We cannot change the keys in available_agsurveys which are negative, but we can flip the sign for variable survey :-)
@@ -61,6 +61,42 @@ def write_to_buffer(self): | |||
return self.in_memory_data.getvalue() | |||
|
|||
|
|||
def extract_zip(input_zip): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok!
@@ -329,6 +328,23 @@ def test_get_ag_barcode_details(self): | |||
self.assertEqual({k: obs[key][k] for k in exp[key]}, exp[key]) | |||
self.assertIn(obs[key]['participant_name'], participant_names) | |||
|
|||
def test_list_ag_surveys(self): | |||
truth = [(-1, 'Personal Information', True), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test would not be correct, since it will not be testing that all surveys are returned.
knimin/tests/test_ag_pulldown.py
Outdated
@@ -53,6 +55,13 @@ def test_get(self): | |||
self.assertIn("<option value='%s'>%s</option>" % (survey, survey), | |||
response.body) | |||
self.assertNotIn('<input type="submit" disabled>', response.body) | |||
for (id, name, selected) in db.list_ag_surveys(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, in that case what it means is that you don't need the if statement because all are selected. Given the get issued, all should be selected so the test should actually be:
for (_id, name, selected) in db.list_ag_surveys():
self.assertIn("<option value='%i' selected>%s</option>" % (_id, name), response.body)
Note also the change from id
-> _id
.
@josenavas regarding test_list_ag_surveys: |
@sjanssen2 Unit tests should test that the functions are doing what they are supposed to do. The current get call on test_list_ag_surveys is used to test that all surveys are returned. If we add another survey and there is a bug in which we don't return the new survey, the test will pass w/o detecting the issue. However, if we change to use assertItemsEqual, if a new survey is added the test would fail, which is the expected behavior since you will also need to update the test to make sure that the function still does what it is supposed to do, correct? |
@josenavas regarding test_list_ag_surveys: |
…ng, not a list. Thus, I added code to make it compatible to both ways to provide args
Just remembered that you can retrieve a list with: self.request.arguments.get('yourvar', your_default_value) |
@antgonza unfortunately, its not working here :-( |
k, thanks for testing. |
ready to merge @josenavas ? |
would be great to get this done before I leave into the weekend, which is in 20 minutes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 comment
knimin/handlers/ag_pulldown.py
Outdated
@@ -92,3 +135,22 @@ def get(self): | |||
except Exception as e: | |||
msg = 'ERROR: %s' % str(e) | |||
self.write(msg) | |||
|
|||
|
|||
def listify(list): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
list
is a python function, use a different variable name (e.g. l
it's ok given that the function is simple).
This function is also missing unit test.
I recommend to use some kind of text editor that has python syntax highlighting to easily identify the reserved words.
finally. Cool! Thanks @josenavas @antgonza @mortonjt |
Update to the metadata pulldown handler, such that the user now sees which different internal surveys are available plus he/she can select which of those should be exported to the download.
There is now an option to join all columns from all surveys into one huge file.
Filenames in resulting zip archive are more speaking now.
There seems to be a change in the flake8 on Travis. Thus, I had to add some blank lines to certain files.