Modified namedb.py and annotate-seqs.py to work with files supplied via ... #4
base: master
…ia command line to annotate-seqs.py
mouse.namedb, mouse.namedb.fullname and mouse.protein.faa were hardcoded into namedb.py. Now it picks them up from sys.argv[4] and sys.argv[5], where sys.argv is supplied to annotate-seqs.py.
Possible regression bugs if namedb.py is used by scripts other than annotate-seqs.py that use a different command-line argument pattern.
|
Ram, thanks for doing, I'll take a look. Can you submit a similar pull request updating the khmer-protocols tutorial, please? |
|
Sure! First thing tomorrow! Ram On Sat, Aug 24, 2013 at 5:06 PM, C. Titus Brown notifications@github.com wrote:
|
|
@ctb: The tutorial is not wired up to the GitHub repo. I'm not sure how to work on the linking; any help would be appreciated! |
|
See Khmer-protocols repository. C. Titus Brown, ctb@msu.edu On Aug 25, 2013, at 8:36, Ram RS notifications@github.com wrote:
|
|
One immediate comment: 'namedb' is an imported script and should not reference sys.argv. If we want to have the database name be an argument, the proper way to handle this would be to have a function that you use to load the pickles, I think. I also don't like making commands even more complicated :). So, I'm leaning towards doing this differently -- you should never need multiple namedbs in a given directory, so why not have 'names.db' which is a pickle that also contains a reference to the sequence file being used? Then you set up the names.db once, with the specific sequence file, make-namedb.py mouse.protein.faa and all the info goes into a standard location, 'names.db', which contains a reference to 'mouse.protein.faa'. Whaddya think? |
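A minimal sketch of the loader-function idea from the comment above, assuming Python's standard `pickle` module. The name `load_namedb` and its default path are hypothetical and not taken from the repository; the only point is that the imported module receives the path as a parameter instead of reading `sys.argv`.

```python
import pickle


def load_namedb(path="names.db"):
    """Load a pickled name database from `path`.

    Hypothetical helper: the caller passes the path explicitly, so the
    imported module never has to inspect sys.argv.
    """
    with open(path, "rb") as fp:
        return pickle.load(fp)
```

Under this sketch, a script such as annotate-seqs.py would call `load_namedb()` itself and keep all command-line handling in the script.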
|
I was thinking the exact same thing, just did not know how to express it. Like you say, scripts that are not directly invoked should not use sys.argv. I agree, we could introduce an intermediate step to create the names file. Also, I'm unfamiliar with the term "pickle" (n.). Am I right in saying it refers to a file with the output of a serialization operation? |
|
On Sun, Aug 25, 2013 at 08:05:12AM -0700, Ram RS wrote:
We already have one, right? make-namedb.py. So let's just modify it...
http://docs.python.org/2/library/pickle.html --titus C. Titus Brown, ctb@msu.edu |
|
I am reading that and a couple of other docs too, just not fluent enough to Ram On Sun, Aug 25, 2013 at 11:07 AM, C. Titus Brown
|
|
Just clarifying, we |
|
A seems simplest. And I would just pickle the name of the sequence database as the first dump into the names.db. C. Titus Brown, ctb@msu.edu On Aug 25, 2013, at 11:54, Ram RS notifications@github.com wrote:
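The "first dump" suggestion above can be sketched as follows, assuming the standard `pickle` module. The function names `write_namedb` and `read_namedb` are hypothetical; the repository's make-namedb.py and namedb.py may be organized differently. Several objects can be pickled into one file one after another, and they must be loaded back in the same order.

```python
import pickle


def write_namedb(seq_filename, names, path="names.db"):
    """Pickle the sequence filename first, then the names mapping."""
    with open(path, "wb") as fp:
        pickle.dump(seq_filename, fp)  # first record: sequence file name
        pickle.dump(names, fp)         # second record: the names mapping


def read_namedb(path="names.db"):
    """Read the records back in the order they were written."""
    with open(path, "rb") as fp:
        seq_filename = pickle.load(fp)
        names = pickle.load(fp)
    return seq_filename, names
```

With this layout, a call such as `write_namedb("mouse.protein.faa", names)` records which sequence file the database was built from, so later scripts do not need it on the command line.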
|
|
Alright, working on that now. Just a quick question - why not dump names and load into a variable (say, "data") and access using data["names"] and Ram On Sun, Aug 25, 2013 at 12:57 PM, C. Titus Brown
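The alternative raised in this question, a single pickled dict accessed by key, might look like the sketch below. The `"names"` key comes from the comment; the `"seqfile"` key and both function names are assumptions made only for illustration.

```python
import pickle


def save_bundle(seq_filename, names, path="names.db"):
    """Store everything in one dict so readers can look up fields by key."""
    data = {"seqfile": seq_filename, "names": names}  # "seqfile" is a hypothetical key
    with open(path, "wb") as fp:
        pickle.dump(data, fp)


def load_bundle(path="names.db"):
    """Return the whole dict; callers use data["names"], data["seqfile"]."""
    with open(path, "rb") as fp:
        return pickle.load(fp)
```

The trade-off is that a dict removes any dependence on the order of `dump()` and `load()` calls, at the cost of agreeing on key names.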
|
file and constant namedb files
make-namedb changed to pickle into constant names.db and fullnames.db files, and to pickle the sequence filename into names.db. namedb.py modified to unpickle correspondingly. Lines expecting extra command-line args in annotate-seqs removed.
|
These changes are being overridden by the next commit, so I'm closing this pull request. I'll open a new pr on the latest commit shortly. |
|
No, please continue on this pr. If you find yourself wanting to close a pr, please don't! They record valuable history and conversation! C. Titus Brown, ctb@msu.edu On Aug 25, 2013, at 18:16, Ram RS notifications@github.com wrote:
|
|
But closed prs exist too, don't they? Ram On Sun, Aug 25, 2013 at 8:02 PM, C. Titus Brown notifications@github.com wrote:
|
|
A closed pr is an idea we've totally abandoned, or one we've merged. C. Titus Brown, ctb@msu.edu On Aug 25, 2013, at 20:26, Ram RS notifications@github.com wrote:
|
|
Alright, I'll reopen the request, but we're no longer working on the version referenced by the pr - just a note! |
|
You should be working on the version referenced by the PR; that's kinda the point. Fix the version referenced by the PR, don't create a new version or branch. |
|
Got it, will do from the next time I work on this - I'm reading up on Git Ram On Sun, Aug 25, 2013 at 9:58 PM, C. Titus Brown notifications@github.com wrote:
|
|
np. On Sun, Aug 25, 2013 at 07:15:33PM -0700, Ram RS wrote:
C. Titus Brown, ctb@msu.edu |
The dict() used in the previous commit has been removed; sequential calls to dump() and load() are used instead. A variable 'seqFile' is used to store the sequence file name from the command-line args.