Pulldown: select surveys to be exported + merged table #163

sjanssen2 · 2016-11-30T19:39:07Z

Update to the metadata pulldown handler: the user now sees which internal surveys are available and can select which of them should be included in the download.
There is now an option to join all columns from all surveys into one large file.
Filenames in the resulting zip archive are now more descriptive.
There seems to have been a change in flake8 on Travis, so I had to add some blank lines to certain files.

mortonjt · 2016-11-30T21:11:56Z

knimin/lib/tests/test_data_access.py

+                               (-4, 'Surfers', True),
+                               (-5, 'Personal_Microbiome', False)],
+                              db.list_ag_surveys([-2, -4]))
+


Just to clarify: is this testing against the real database, i.e. checking that these surveys are in the db?

It tests that this method can obtain the survey names AND that the third component is True for all surveys if None is provided, and otherwise True only for those surveys that are in the list passed to the function.

I do think that this is a really critical check. But it looks like this test would need to be updated every time a survey gets added. What do you think about this? Would it be worthwhile to raise an issue to make these tests more robust, so that we don't need to update them every time the surveys change?

I don't think that adding new surveys is a frequently occurring task. Thus, updating the test each time should be fine.

OK, that's fine, but what do you think about having some sort of documentation that lists all of the tests that need to be updated? If an additional survey is added, it could be a pain in the butt to hunt down which tests need to be updated.

On second thought, I totally agree with your argument that this is not future-safe. Thus, I changed the code so that additional surveys should not influence the outcome.

wasade · 2016-12-01T18:42:06Z

knimin/templates/ag_pulldown.html

+  <div>Select which American Gut surveys shall be included in the pulldown archive. <br/>Add 'all in one file' if you also want to have one addition file<br/>that contains all available columns from all American Gut surveys.</div>
+  <select name="agsurveys" id="agsurveys" multiple />
+  {% for s in agsurveys %}
+  {% if s[2] == True %}


wasade · 2016-12-01T18:42:18Z

knimin/templates/ag_pulldown.html

  <select name="external" id="external" multiple />
  {% for s in surveys %}
  <option value='{{s}}'>{{s}}</option>
  {% end %}
  </select>
 </p>
+{% if merged == 'True' %}


Is this a string comparison?

Yes it is, since 'merged' holds the content of the checkbox's 'value' attribute, which I defined as the string 'True'.
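The reason the template compares against the string 'True' can be illustrated with a minimal sketch. It assumes a URL-encoded form body and uses the standard library's `urllib.parse.parse_qs` rather than the Tornado handler from this PR; the field name `merged` and the value `'True'` come from the thread, while `is_merged_requested` is a hypothetical helper.

```python
from urllib.parse import parse_qs


def is_merged_requested(body):
    """Return True if the 'merged' checkbox was ticked in a URL-encoded body.

    Hypothetical helper: form values always arrive as strings, so the check
    compares against the string 'True', never against the boolean True.
    """
    fields = parse_qs(body)
    # parse_qs maps each field name to a list of string values; an
    # unticked checkbox is simply absent from the submitted form.
    values = fields.get('merged', [])
    return 'True' in values


ticked = is_merged_requested('agsurveys=-1&merged=True')
unticked = is_merged_requested('agsurveys=-1')
```

Comparing the submitted value with the boolean `True` would never match, which is why the string comparison in the template is intentional.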

wasade · 2016-12-20T16:21:39Z

@sjanssen2 looks like there are merge conflicts

sjanssen2 · 2017-03-29T22:24:51Z

We have issues with Geocoder searching for locations where the addresses are now random UTF-8 characters. This makes the metadata pulldown results unstable. Thus, I only compare the first 1000 characters of each file in the pulldown archive, basically so that I can continue working on #184.
Let us fix Geocoder later.

coveralls · 2017-03-29T22:26:28Z

Coverage increased (+1.4%) to 93.409% when pulling a4c6729 on sjanssen2:addedSurveyTestData into 9b0c0ca on biocore:master.

sjanssen2 · 2017-03-29T22:27:31Z

@wasade @josenavas @antgonza can I please get a rapid review to be able to continue working on #184

josenavas

Few comments

josenavas · 2017-03-29T23:35:19Z

knimin/handlers/ag_pulldown.py

+            # be the name of the external survey. In order to not block their
+            # pulldown I check that a skipped survey ID must be in the set of
+            # all available surveys.
+            if ((-1 * survey) in selected_ag_surveys) or \


Define abs_survey = abs(survey) and use abs_survey instead of -1 * survey in order to avoid code duplication.

josenavas · 2017-03-29T23:37:13Z

knimin/handlers/ag_pulldown.py

+                pd_meta.set_index('sample_name', inplace=True)
+                results_as_pd.append(pd_meta)
+
+        pd_all = pd.DataFrame()


Should these 3 lines of code be moved inside the if below? Not sure how expensive the merge is, but if it is only needed when merged is True, then just do it there.

Good catch. Fixed.

josenavas · 2017-03-29T23:38:37Z

knimin/lib/data_access.py

+        Parameters
+        ----------
+        selected : list of int
+            To keep track of which surveys have been selected by the user via


"the resulting list of tupels third element holds this information as a boolean value" -> I don't understand what this means; can you rephrase?

Better now?

josenavas · 2017-03-29T23:39:30Z

knimin/lib/data_access.py

+        sql = """SELECT group_order, american
+                FROM ag.survey_group
+                WHERE group_order < 0"""
+        return [(id, name, (selected is None) or (id in selected))


id -> id_
id is a Python built-in function, so the loop variable should not shadow it.

josenavas · 2017-03-29T23:41:16Z

knimin/lib/data_access.py

+
+        Returns
+        -------
+        list of (int, str, bool)


Why is the last boolean needed? It is also confusing that the selected attribute is only used to mark this boolean - shouldn't it limit which ones are returned, rather than the user then going back over the list to filter the ones that he is not interested in? And by user I mean the developer that consumes this function.

This function is used to render all survey names on the metadata pulldown website. The user (who is browsing labadmin) has the option to tick a survey to include or exclude it for the actual pulldown.
Thus, this function is a hybrid of DB query and interface rendering, but since it is so simple I thought it would be OK.
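The behaviour described here can be sketched without a database. The snippet below is only an illustration under stated assumptions: `flag_selected_surveys` is a hypothetical stand-in for `list_ag_surveys`, and the rows are passed in directly instead of being fetched from `ag.survey_group`; the survey IDs and names are the ones quoted in the tests of this thread.

```python
def flag_selected_surveys(rows, selected=None):
    """Return (id, name, is_selected) tuples for every survey row.

    rows     : iterable of (group_order, name) pairs, standing in for the
               result of the SQL query (assumption for this sketch).
    selected : list of int or None. None marks every survey as selected.
    """
    return [(id_, name, (selected is None) or (id_ in selected))
            for id_, name in rows]


# Rows as they appear in the test data quoted in this thread.
rows = [(-1, 'Personal Information'),
        (-4, 'Surfers'),
        (-5, 'Personal_Microbiome')]

all_flagged = flag_selected_surveys(rows)
some_flagged = flag_selected_surveys(rows, selected=[-4])
```

Every survey is always returned; the third element only tells the template which options to render as pre-selected.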

josenavas · 2017-03-29T23:42:23Z

knimin/lib/mem_zip.py

@@ -61,6 +61,42 @@ def write_to_buffer(self):
        return self.in_memory_data.getvalue()


+def extract_zip(input_zip):


Can this function blow up the memory usage?

For sure, if you extract a huge archive. But I need to do the extraction for unit testing, and I thought it would be more elegant than making system calls to 'unzip'.
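For readers wondering what an in-memory extraction looks like, here is a minimal sketch using only the standard library. It is not the `extract_zip` from this PR, whose body is not shown in the thread; `extract_zip_sketch` and the file names are hypothetical, and the input is assumed to be the raw bytes of a ZIP archive.

```python
import io
import zipfile


def extract_zip_sketch(zip_bytes):
    """Return {filename: content_bytes} for every member of a ZIP archive.

    Each member is read completely into memory, so the footprint grows
    with the uncompressed size of the archive.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


# Build a small archive in memory to exercise the function.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('survey_a.txt', 'sample_name\tage\n')
    archive.writestr('survey_b.txt', 'sample_name\tdiet\n')

extracted = extract_zip_sketch(buffer.getvalue())
```

For the small archives produced in unit tests this is cheap; for large archives, iterating over the members and processing them one at a time would keep memory usage bounded.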

josenavas · 2017-03-29T23:42:45Z

knimin/lib/mem_zip.py

+
+    Parameters
+    ----------
+    len : int


archive is missing

josenavas · 2017-03-29T23:44:38Z

knimin/lib/tests/test_data_access.py

@@ -329,6 +328,23 @@ def test_get_ag_barcode_details(self):
            self.assertEqual({k: obs[key][k] for k in exp[key]}, exp[key])
            self.assertIn(obs[key]['participant_name'], participant_names)

+    def test_list_ag_surveys(self):
+        truth = [(-1, 'Personal Information', True),


I think this test can be simplified as

self.assertItemsEqual(db.list_ag_surveys(), truth)

Nope. Consider that we add surveys in the future. This would break your test.

The test would not be correct, since it will not be testing that all surveys are returned.

josenavas · 2017-03-29T23:46:10Z

knimin/templates/ag_pulldown.html

@@ -71,14 +80,34 @@ <h3 style="color:red">Pulldown Processing, please wait for file download. It may
 <form enctype="multipart/form-data" action="/ag_pulldown/" name="agForm" id="agForm" method="post">
 <p>Upload qiime mapping file, or other file with first line header and one barcode per line</p>
 <p>Barcodes File <input type="file" name="barcodes" id="barcodes"/></p>
+<p>American Gut Surveys:
+  <div>Select which American Gut surveys shall be included in the pulldown archive. <br/>Add 'all in one file' if you also want to have one addition file<br/>that contains all available columns from all American Gut surveys.</div>


addition -> additional

josenavas · 2017-03-29T23:48:02Z

knimin/tests/test_ag_pulldown.py

@@ -53,6 +55,13 @@ def test_get(self):
            self.assertIn("<option value='%s'>%s</option>" % (survey, survey),
                          response.body)
        self.assertNotIn('<input type="submit" disabled>', response.body)
+        for (id, name, selected) in db.list_ag_surveys():


I think in this case none are selected, correct? According to the get call that you are doing, all should be False.

Remember the definition of list_ag_surveys. It says that all surveys get "selected" if selected is not set, i.e. it is None. So in fact, all are selected in this case.

ok, in that case what it means is that you don't need the if statement because all are selected. Given the get issued, all should be selected so the test should actually be:

for (_id, name, selected) in db.list_ag_surveys():
    self.assertIn("<option value='%i' selected>%s</option>" % (_id, name), response.body)

Note also the change from id -> _id.

coveralls · 2017-03-30T14:34:52Z

Coverage increased (+1.4%) to 93.412% when pulling 7b1d57d on sjanssen2:addedSurveyTestData into 66eff37 on biocore:master.

sjanssen2 · 2017-03-30T16:20:59Z

@josenavas could you review again please?

sjanssen2 · 2017-03-30T18:56:49Z

@wasade @antgonza @josenavas I need some timely reviews please!

antgonza

Some questions/comments.

antgonza · 2017-03-30T19:00:30Z

knimin/handlers/ag_pulldown.py

-        if self.get_argument('external'):
+
+        # query which surveys have been selected by the user
+        if self.get_argument('selected_ag_surveys', []):


is this if really necessary? you could do:
selected_ag_surveys = map(int, self.get_argument('selected_ag_surveys', []).split(','))
right?

I figured that it can be even easier than that if we use get_arguments (note the s). Then you can provide a list of strings as parameters.

antgonza · 2017-03-30T19:00:38Z

knimin/handlers/ag_pulldown.py

+        else:
+            selected_ag_surveys = []
+
+        if self.get_argument('external', []):


Same with this.

antgonza · 2017-03-30T19:01:26Z

knimin/handlers/ag_pulldown.py

+
+        # check database about what surveys are available
+        available_agsurveys = {}
+        for (id, name, selected) in db.list_ag_surveys():


could you replace id -> _id and perhaps selected by _ as it's not being used?

antgonza · 2017-03-30T19:03:13Z

knimin/lib/mem_zip.py

+    -------
+    dict{str, str} where the first component is the filename and the second
+    the first <len> characters of the file."""
+    return map(lambda (k, v): {k: v[:len]}, archive.items())


Just FYI, if the string is shorter it will return the full string ... this can cause problems in certain cases.

should be fine in this case, but thanks for the info

I now know :-)
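The slicing behaviour mentioned above can be checked with a small sketch. It assumes the archive is already available as a dict of filename to content; `first_chars` is a hypothetical name, and the parameter is called `length` here to avoid shadowing the built-in `len`. A dict comprehension is used because tuple-unpacking lambdas such as `lambda (k, v): ...` only work in Python 2.

```python
def first_chars(archive, length=1000):
    """Return a dict mapping each filename to its first `length` characters.

    Slicing past the end of a string does not raise; files shorter than
    `length` are returned in full.
    """
    return {name: content[:length] for name, content in archive.items()}


# Hypothetical archive contents.
archive = {'long.txt': 'x' * 5000, 'short.txt': 'abc'}
heads = first_chars(archive, length=1000)
```

This version returns one dict, as the docstring in the diff describes, rather than a list of single-key dicts.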

josenavas

Few more comments

josenavas · 2017-03-30T18:59:32Z

knimin/handlers/ag_pulldown.py

+
+        # check database about what surveys are available
+        available_agsurveys = {}
+        for (id, name, selected) in db.list_ag_surveys():


id -> _id

Also caught by Antonio. Fixed.

josenavas · 2017-03-30T18:59:58Z

knimin/handlers/ag_pulldown.py

+            if (abs_survey in selected_ag_surveys) or \
+               (abs_survey not in available_agsurveys):
+                meta_zip.append('survey_%s_md.txt' %
+                                available_agsurveys[-1 * survey], meta)


-1 * survey -> abs_survey

that would be wrong, because those keys are actually negative numbers; it's what the user provided (i.e. the database)

Shouldn't it then be just survey? Note that survey holds the negative number, while abs_survey holds the positive one. [-1 * survey] == abs_survey as the code stands.

I think I have fixed it now. We cannot change the keys in available_agsurveys, which are negative, but we can flip the sign of the variable survey :-)
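The sign confusion in this exchange comes down to which sign the dictionary keys carry. The sketch below is one possible way to sidestep it, not necessarily what was committed: `survey_name` is a hypothetical helper that normalises the ID to the negative form before the lookup, and the dictionary is a hypothetical excerpt using survey names quoted elsewhere in this thread.

```python
# Keys are the negative survey IDs as stored in the database.
available_agsurveys = {-1: 'Personal Information', -4: 'Surfers'}


def survey_name(survey, available):
    """Look up a survey name, accepting the ID with either sign.

    The keys of `available` are negative, so the ID is normalised to its
    negative form first; abs(survey) alone would never match a key.
    """
    return available[-abs(survey)]


name_from_positive = survey_name(4, available_agsurveys)
name_from_negative = survey_name(-4, available_agsurveys)
```

Note that `abs(survey)` and `-1 * survey` only agree when `survey` is zero or negative, which is why swapping one for the other changes which keys can be found.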

josenavas · 2017-03-30T19:03:06Z

knimin/lib/tests/test_data_access.py

@@ -329,6 +328,23 @@ def test_get_ag_barcode_details(self):
            self.assertEqual({k: obs[key][k] for k in exp[key]}, exp[key])
            self.assertIn(obs[key]['participant_name'], participant_names)

+    def test_list_ag_surveys(self):
+        truth = [(-1, 'Personal Information', True),


The test would not be correct, since it will not be testing that all surveys are returned.

josenavas · 2017-03-30T19:05:27Z

knimin/tests/test_ag_pulldown.py

@@ -53,6 +55,13 @@ def test_get(self):
            self.assertIn("<option value='%s'>%s</option>" % (survey, survey),
                          response.body)
        self.assertNotIn('<input type="submit" disabled>', response.body)
+        for (id, name, selected) in db.list_ag_surveys():


ok, in that case what it means is that you don't need the if statement because all are selected. Given the get issued, all should be selected so the test should actually be:

for (_id, name, selected) in db.list_ag_surveys():
    self.assertIn("<option value='%i' selected>%s</option>" % (_id, name), response.body)

Note also the change from id -> _id.

id -> _id

coveralls · 2017-03-30T19:46:28Z

Coverage increased (+1.4%) to 93.394% when pulling ff7aee5 on sjanssen2:addedSurveyTestData into 66eff37 on biocore:master.

sjanssen2 · 2017-03-30T20:58:51Z

@josenavas regarding test_list_ag_surveys:
One of @mortonjt's first comments addressed that, and I changed my originally static tests to account for the future addition of surveys. You now ask me to revert. I don't know how to satisfy both of you. Personally, I prefer Jamie's way.

josenavas · 2017-03-30T21:04:07Z

@sjanssen2 Unit tests should test that the functions are doing what they are supposed to do. The current get call on test_list_ag_surveys is used to test that all surveys are returned. If we add another survey and there is a bug in which we don't return the new survey, the test will pass w/o detecting the issue.

However, if we change to use assertItemsEqual, if a new survey is added the test would fail, which is the expected behavior since you will also need to update the test to make sure that the function still does what it is supposed to do, correct?
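The trade-off between the two testing styles can be made concrete with a small sketch. It runs on Python 3, where `assertItemsEqual` is named `assertCountEqual`; `list_surveys_stub` is a hypothetical stand-in for `db.list_ag_surveys()` so that the example needs no database.

```python
import unittest


def list_surveys_stub():
    """Hypothetical stand-in for db.list_ag_surveys() called without arguments."""
    return [(-1, 'Personal Information', True),
            (-4, 'Surfers', True),
            (-5, 'Personal_Microbiome', True)]


class ListSurveysTests(unittest.TestCase):
    def test_exact_set(self):
        # Order-insensitive, but fails if a survey is missing or extra, so
        # adding a survey forces the ground truth to be updated.
        truth = [(-5, 'Personal_Microbiome', True),
                 (-1, 'Personal Information', True),
                 (-4, 'Surfers', True)]
        self.assertCountEqual(list_surveys_stub(), truth)

    def test_subset_only(self):
        # Keeps passing when surveys are added, but would also keep passing
        # if a newly added survey were silently dropped by the function.
        truth = [(-1, 'Personal Information', True)]
        obs = list_surveys_stub()
        for row in truth:
            self.assertIn(row, obs)


result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(ListSurveysTests).run(result)
```

The first test is stricter and needs maintenance when surveys change; the second is maintenance-free but checks less.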

sjanssen2 · 2017-03-30T21:29:32Z

@josenavas regarding test_list_ag_surveys:
OK, reverted as you said.

antgonza · 2017-03-30T21:35:47Z

Just remembered that you can retrieve a list with: self.request.arguments.get('yourvar', your_default_value)

sjanssen2 · 2017-03-30T22:02:17Z

@antgonza unfortunately, it's not working here :-(

antgonza · 2017-03-30T22:05:32Z

k, thanks for testing.

coveralls · 2017-03-30T22:05:35Z

Coverage increased (+1.4%) to 93.412% when pulling 7d49672 on sjanssen2:addedSurveyTestData into 66eff37 on biocore:master.

sjanssen2 · 2017-03-30T22:07:05Z

ready to merge @josenavas ?

sjanssen2 · 2017-03-30T23:44:47Z

It would be great to get this done before I leave for the weekend, which is in 20 minutes.

josenavas

1 comment

josenavas · 2017-03-31T14:39:50Z

knimin/handlers/ag_pulldown.py

@@ -92,3 +135,22 @@ def get(self):
        except Exception as e:
            msg = 'ERROR: %s' % str(e)
        self.write(msg)
+
+
+def listify(list):


list is a Python built-in, so use a different variable name (e.g. l is OK given that the function is simple).
This function is also missing a unit test.

I recommend using a text editor with Python syntax highlighting to easily identify built-in names and reserved words.
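The body of `listify` is not shown in this thread, so the following is only a guess at its intent, based on the commit message saying that the HTML page provides a string rather than a list. The sketch assumes the handler receives a list whose items may themselves be comma-separated strings; the parameter name `_list` follows the later rename commit. It also shows how to check whether a name is a keyword or a built-in.

```python
import builtins
import keyword


def listify(_list):
    """Flatten a list of possibly comma-separated strings.

    Hypothetical behaviour: ['-1,-2', '-3'] becomes ['-1', '-2', '-3'],
    so both ways of passing the argument yield the same result.
    """
    result = []
    for item in _list:
        # Skip empty fragments, e.g. from an empty form field.
        result.extend(part for part in item.split(',') if part)
    return result


flattened = listify(['-1,-2', '-3'])

# 'list' is not a reserved word, but it is a built-in that can be shadowed.
list_is_keyword = keyword.iskeyword('list')
list_is_builtin = hasattr(builtins, 'list')
```

Shadowing a built-in is legal Python, which is exactly why it goes unnoticed until the built-in is needed inside the same scope.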

coveralls · 2017-03-31T15:47:53Z

Coverage increased (+1.4%) to 93.412% when pulling dc55575 on sjanssen2:addedSurveyTestData into 66eff37 on biocore:master.

sjanssen2 · 2017-03-31T15:56:01Z

finally. Cool! Thanks @josenavas @antgonza @mortonjt

mortonjt reviewed Nov 30, 2016

wasade reviewed Dec 1, 2016

sjanssen2 closed this Mar 24, 2017

sjanssen2 deleted the addedSurveyTestData branch March 24, 2017 23:47

sjanssen2 restored the addedSurveyTestData branch March 24, 2017 23:47

sjanssen2 reopened this Mar 28, 2017

sjanssen2 added 10 commits March 29, 2017 09:11

new test data

41068be

new function to extract a ZIP archive + unit tests

5b0b4d4

new function to return the list of internal surveys + unit tests

8e7a61b

updated test data to new scrubbed DB version

de6ade8

adapted test data to new scrubbed DB

55bb5c1

work in progress

58e61d0

Merge branch 'master' of https://github.com/biocore/labadmin into fix…

e667921

…_pulldown_filenames

updated test data

36b88a6

made parameter optional + check empty result list

43f32d4

adapted to scrubbed DB

1bf5fc0

sjanssen2 force-pushed the addedSurveyTestData branch from 80a9f6d to 1bf5fc0 Compare March 29, 2017 21:22

sjanssen2 added 5 commits March 29, 2017 14:56

updated groud truth

a99666a

update in test

f980040

a new function that only returns the first X char of each file in an …

ba14da3

…archive

comparing only first 1000 chars of each file

afabe84

docstr

a4c6729

josenavas suggested changes Mar 29, 2017

sjanssen2 added 2 commits March 29, 2017 17:44

addressing Jose's comments

c7f43ce

clarification in docstring + avoid using keyword id

64cbb39

Merge branch 'master' of https://github.com/biocore/labadmin into add…

7b1d57d

…edSurveyTestData

antgonza reviewed Mar 30, 2017

josenavas suggested changes Mar 30, 2017

sjanssen2 added 3 commits March 30, 2017 12:40

corrected passing arguments to handler, i.e. by using lists

0d81e3a

getargument_s completely avoids split(',') issues

f9e3f66

id -> _id

new ground truth

ff7aee5

sjanssen2 added 3 commits March 30, 2017 14:29

shit: get_parameters would be nice, but the HTML page provides a stri…

ccd0eac

…ng, not a list. Thus, I added code to make it compatible to both ways to provide args

using Jose's style

8051f70

also testing HTML parameter passing

003fed4

try to reduce complexity

7d49672

josenavas approved these changes Mar 31, 2017

rename var list -> _list + unit tests

dc55575

josenavas approved these changes Mar 31, 2017

josenavas merged commit 4a0e2f8 into biocore:master Mar 31, 2017

sjanssen2 deleted the addedSurveyTestData branch April 3, 2017 19:02
