Updating annotator to work with Python 3 #90

milen-sage · 2019-02-06T02:12:48Z

See commits for various other changes.

…n required by pandas > 0.23.0; for now enforce typing < 3.5, which constrains pandas to 0.23.0; when the incompatibility is resolved we can move on with later versions of pandas and typing.

…aspects) of annotation json to df conversion are correct w.r.t. annotations Synapse table schema. Adding warning if SSL certificate verification fails w/ instructions for a possible fix.

philerooski · 2019-02-06T03:34:27Z

Change the base branch to Sage-Bionetworks:develop? This repo is long overdue for cleaning up the PR's into develop then merging develop with master.

karawoo · 2019-02-06T16:40:33Z

README.md

@@ -25,7 +25,6 @@ Clone the [source code repository](https://github.com/Sage-Bionetworks/annotator

    git clone https://github.com/Sage-Bionetworks/annotator.git
    cd annotator
-    git checkout develop


It looks like these instructions are specifically if you want to use the develop branch, so I think this line needs to stay.

I agree, but let's spell that out directly - if we want all development work to happen on 'develop' then we can state it explicitly and require it as part of a contribution guidelines. We could also have people fork annotator, handle their dev work the way they want, and PR in master (similar to the synapseAnnotations contribution workflow). I guess that goes back to @philerooski 's point: do we want annotator maintained as a project with specific up-to-date goal(s) or rather have it as an ad-hoc package of hodgepodge utilities used internally from time to time (with required bug fixes each of the 2 times a year it's used). In the latter case I'd be ok with not having contribute guidelines and be somewhat lax w/ overall code requirements, as long as whatever is in master does its job - e.g. updates synapse annotation table when we do a new release. If we want to actively maintain the package - changing/adding/removing features; keeping up-to-date with dependencies, etc. - then having a bit more formal contribution requirements would be good?

Agree, but we shouldn't make the instructions inaccurate in the meantime. This section is about how to install the develop version, which requires git checkout develop.

We could also have people fork annotator, handle their dev work the way they want, and PR in master (similar to the synapseAnnotations contribution workflow).

I am fine with this, and in this case the whole "install develop branch" section of the readme could be deleted, but just a note that this isn't the synapseAnnotations workflow -- people submit PRs to develop there.

Sounds good. Do we want to try discussing who's using/maintaining annotator; and what are the main use cases we want it to handle in the future - @philerooski @kdaily?

karawoo · 2019-02-06T16:46:27Z

annotator/schema.py

+        if list(flatten_df.columns).count("name") > 1:
+            flatten_df.columns = ['index', 'columnType', 'description', 'maximumSize', 'name', 'valueDescription', 'value', 'source', 'module'] 
+
+            #columns_map = dict(zip(flatten_df.columns, ['index', 'columnType', 'description', 'maximumSize', 'name', 'valueDescription', 'value', 'source', 'module']))


Can we delete this if it's not used?

Yes - I meant to remove those - done.

karawoo · 2019-02-06T16:55:01Z

Hmm some of my comments didn't get saved, one moment...

karawoo · 2019-02-06T16:56:02Z

annotator/__main__.py


    # re-arrange columns/fields and sort data.
    all_modules_df = all_modules_df[annotation_schema]
    all_modules_df.sort_values(key, ascending=[True, True, True], inplace=True)
-    all_modules_df.valueDescription = all_modules_df.valueDescription.str.encode('utf-8')
+    # no need for line below on python3/pandas > 20.0
+    #all_modules_df.valueDescription = all_modules_df.valueDescription.str.encode('utf-8')


Should we leave this in but make it conditional on the python/pandas version, or are we officially dropping support for python 2 in this code? I don't know who else uses this, so that might require a wider discussion.

Yes - going back to the previous discussion about package maintenance: #90 (comment)

Agreed if we are maintaining the package, we can explicitly state which versions of pandas/python we support, if we don't provide backward compatibility (which would be hard in the case of python 2 to python 3 related changes).

karawoo · 2019-02-06T16:57:10Z

annotator/__main__.py

+            # somewhat obscure errors; in this case we trust ourselves... with a warning
+            if (not os.environ.get('PYTHONHTTPSVERIFY', '') and getattr(ssl, '_create_unverified_context', None)): 
+                print()
+                print("WARNING: CERTIFICATE VERIFICATION FAILURE. This is probably benign and due to outdated openssl installation. Please upgrade your openssl package to the latest version (may require a pip/conda update).")


Is there a way to detect the system openssl version and check whether it is compatible?

At a minimum I think these should be true warnings rather than print statements.

:) these are lines of codes that I felt bad adding... the issue is that later versions of python by default perform certificate validation when libraries like urllib try to open https connections; there are different reasons for verification failure on mac os x vs. other systems (e.g. python 3.6 doesn't by default use mac os x root certificates and can't find them; the version of openssl shipped with python 3.7 might not work with verifying certificates coming from a package - e.g. certifi - that was compatible with an earlier version of openssl, and other fun things :)). If anyone figures out a way to detect these problems and suggest better resolution that would be great. I didn't have time to get through it in a more meaningful way; this warning message is a bit misleading, too and more of an eyesore to force us resolve that better.

milen-sage added 3 commits February 5, 2019 18:09

Updated readme: no need to switch to branch develop.

8523bc1

Avoiding incompatibility between Python 3.7 and package typing versio…

3bf016a

…n required by pandas > 0.23.0; for now enforce typing < 3.5, which constrains pandas to 0.23.0; when the incompatibility is resolved we can move on with later versions of pandas and typing.

Fixing errors due to Python 2 -> Python 3 transition; ensuring (some …

cadb49c

…aspects) of annotation json to df conversion are correct w.r.t. annotations Synapse table schema. Adding warning if SSL certificate verification fails w/ instructions for a possible fix.

milen-sage requested a review from karawoo February 6, 2019 02:12

karawoo reviewed Feb 6, 2019

View reviewed changes

Removed dead commented out code.

8e47aab

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Updating annotator to work with Python 3 #90

Updating annotator to work with Python 3 #90

milen-sage commented Feb 6, 2019

philerooski commented Feb 6, 2019

karawoo Feb 6, 2019

milen-sage Feb 6, 2019

karawoo Feb 6, 2019

milen-sage Feb 6, 2019

karawoo Feb 6, 2019

milen-sage Feb 6, 2019

karawoo commented Feb 6, 2019

karawoo Feb 6, 2019

milen-sage Feb 6, 2019

karawoo Feb 6, 2019

milen-sage Feb 6, 2019

Updating annotator to work with Python 3 #90

Are you sure you want to change the base?

Updating annotator to work with Python 3 #90

Conversation

milen-sage commented Feb 6, 2019

philerooski commented Feb 6, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

karawoo commented Feb 6, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment