-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
updated models and load functions for almeida #18
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great @sherrie9. I've added a few notes, mostly just asking to remove lines of commented-out code if they're not needed.
tb_db/parsers.py
Outdated
|
||
with open(speciation_path, 'r') as f: | ||
reader = csv.DictReader(f) | ||
count = 0 | ||
#count = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove this count
variable if it isn't going to be used rather than leaving it commented out
tb_db/parsers.py
Outdated
#species.append(speci) | ||
#count+=1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove these lines?
tb_db/parsers.py
Outdated
qcs = [] | ||
with open(qc_path, 'r') as f: | ||
reader = csv.DictReader(f) | ||
for row in reader: | ||
qc = { | ||
'sample_id': row['sample_id'], | ||
'sample_name': row['sample_id'], | ||
#'sample_name': row['sample_id'], | ||
'sequencing_run_id' : sequencing_run_id, | ||
'most_abundant_species_name':row['most_abundant_species_name'], | ||
'most_abundant_species_fraction_total_reads': float(row['most_abundant_species_fraction_total_reads']), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is optional but I have a suggestion:
I try to wrap conversions to int
and float
in try
/ catch
blocks like this:
try:
total_bases = int(row['total_bases'])
except ValueError as e:
total_bases = None
If there are several fields that need to be converted then I'll do something like this:
float_fields = [
'average_base_quality',
'percent_bases_above_q30',
'percent_gc',
]
for field in float_fields:
try:
row[field] = float(row[field])
except ValueError as e:
row[field] = None
tb_db/parsers.py
Outdated
@@ -185,7 +185,7 @@ def parse_cgmlst(cgmlst_path: str, uncalled='-'): | |||
reader = csv.DictReader(f) | |||
for row in reader: | |||
sample_id = row.pop('sample_id') | |||
sample_id = sample_id[:6] | |||
#sample_id = sample_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we remove this line?
tb_db/models.py
Outdated
#accession = Column(String) | ||
#collection_date = Column(Date) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove these if they aren't being used
tb_db/models.py
Outdated
@@ -71,9 +71,9 @@ class Library(Base): | |||
""" | |||
|
|||
sample_id = Column(Integer, ForeignKey("sample.id"), nullable=False) | |||
sample_name = Column(String) | |||
#sample_name = Column(String) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we remove this?
tb_db/models.py
Outdated
sequencing_run_id = Column(String) | ||
library_id = Column(String) | ||
#library_id = Column(String) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we remove this?
@@ -349,7 +349,7 @@ def create_cgmlst_profile(self, sample, profile,runid): | |||
assert(created_cgmlst_profile.library_id == created_libraries[0].id) | |||
|
|||
|
|||
|
|||
TestSampleCrudMachine = SampleCrudMachine.TestCase | |||
#comment the below out as it says 'dict' object has no attribute 'startswith' that i am not sure how to fix, refer to line 305 in crud.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll have to take a look at the overall testing processes and make sure everything passes, but I'm not going to let it block merging this. We can address that later.
No description provided.