-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updating annotator to work with Python 3 #90
base: master
Are you sure you want to change the base?
Conversation
…n required by pandas > 0.23.0; for now enforce typing < 3.5, which constrains pandas to 0.23.0; when the incompatibility is resolved we can move on with later versions of pandas and typing.
…aspects) of annotation json to df conversion are correct w.r.t. annotations Synapse table schema. Adding warning if SSL certificate verification fails w/ instructions for a possible fix.
Change the base branch to Sage-Bionetworks:develop? This repo is long overdue for cleaning up the PR's into develop then merging develop with master. |
@@ -25,7 +25,6 @@ Clone the [source code repository](https://github.com/Sage-Bionetworks/annotator | |||
|
|||
git clone https://github.com/Sage-Bionetworks/annotator.git | |||
cd annotator | |||
git checkout develop |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like these instructions are specifically if you want to use the develop branch, so I think this line needs to stay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but let's spell that out directly - if we want all development work to happen on 'develop' then we can state it explicitly and require it as part of a contribution guidelines. We could also have people fork annotator, handle their dev work the way they want, and PR in master (similar to the synapseAnnotations contribution workflow). I guess that goes back to @philerooski 's point: do we want annotator maintained as a project with specific up-to-date goal(s) or rather have it as an ad-hoc package of hodgepodge utilities used internally from time to time (with required bug fixes each of the 2 times a year it's used). In the latter case I'd be ok with not having contribute guidelines and be somewhat lax w/ overall code requirements, as long as whatever is in master does its job - e.g. updates synapse annotation table when we do a new release. If we want to actively maintain the package - changing/adding/removing features; keeping up-to-date with dependencies, etc. - then having a bit more formal contribution requirements would be good?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, but we shouldn't make the instructions inaccurate in the meantime. This section is about how to install the develop version, which requires git checkout develop
.
We could also have people fork annotator, handle their dev work the way they want, and PR in master (similar to the synapseAnnotations contribution workflow).
I am fine with this, and in this case the whole "install develop branch" section of the readme could be deleted, but just a note that this isn't the synapseAnnotations workflow -- people submit PRs to develop there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Do we want to try discussing who's using/maintaining annotator; and what are the main use cases we want it to handle in the future - @philerooski @kdaily?
annotator/schema.py
Outdated
if list(flatten_df.columns).count("name") > 1: | ||
flatten_df.columns = ['index', 'columnType', 'description', 'maximumSize', 'name', 'valueDescription', 'value', 'source', 'module'] | ||
|
||
#columns_map = dict(zip(flatten_df.columns, ['index', 'columnType', 'description', 'maximumSize', 'name', 'valueDescription', 'value', 'source', 'module'])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we delete this if it's not used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - I meant to remove those - done.
Hmm some of my comments didn't get saved, one moment... |
|
||
# re-arrange columns/fields and sort data. | ||
all_modules_df = all_modules_df[annotation_schema] | ||
all_modules_df.sort_values(key, ascending=[True, True, True], inplace=True) | ||
all_modules_df.valueDescription = all_modules_df.valueDescription.str.encode('utf-8') | ||
# no need for line below on python3/pandas > 20.0 | ||
#all_modules_df.valueDescription = all_modules_df.valueDescription.str.encode('utf-8') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we leave this in but make it conditional on the python/pandas version, or are we officially dropping support for python 2 in this code? I don't know who else uses this, so that might require a wider discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - going back to the previous discussion about package maintenance: #90 (comment)
Agreed if we are maintaining the package, we can explicitly state which versions of pandas/python we support, if we don't provide backward compatibility (which would be hard in the case of python 2 to python 3 related changes).
# somewhat obscure errors; in this case we trust ourselves... with a warning | ||
if (not os.environ.get('PYTHONHTTPSVERIFY', '') and getattr(ssl, '_create_unverified_context', None)): | ||
print() | ||
print("WARNING: CERTIFICATE VERIFICATION FAILURE. This is probably benign and due to outdated openssl installation. Please upgrade your openssl package to the latest version (may require a pip/conda update).") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to detect the system openssl version and check whether it is compatible?
At a minimum I think these should be true warnings rather than print statements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:) these are lines of codes that I felt bad adding... the issue is that later versions of python by default perform certificate validation when libraries like urllib try to open https connections; there are different reasons for verification failure on mac os x vs. other systems (e.g. python 3.6 doesn't by default use mac os x root certificates and can't find them; the version of openssl shipped with python 3.7 might not work with verifying certificates coming from a package - e.g. certifi - that was compatible with an earlier version of openssl, and other fun things :)). If anyone figures out a way to detect these problems and suggest better resolution that would be great. I didn't have time to get through it in a more meaningful way; this warning message is a bit misleading, too and more of an eyesore to force us resolve that better.
See commits for various other changes.