-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Complete workflow for Y-STRs using custom sequence ranges #79
Conversation
lusSTR/wrappers/filter.py
Outdated
def process_input( | ||
input_name, | ||
outpath, | ||
profile_type, | ||
data_type, | ||
output_dir, | ||
info, | ||
separate, | ||
output_type, | ||
nofilters, | ||
strand, | ||
separate, | ||
custom, | ||
sex, | ||
info, | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've never had this happen before- @standage when I specify nofilters
as True
in the config, it's passed as True
to the function but then is changed to False
, so the no filters
if statement is bypassed. Why is that happening?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do notice that you name the parameter filters
in the Snakemake rule params block, but nofilters
in the config file and the Python code. That's a possible source of confusion, but it doesn't explain this behavior. And otherwise I can't see any obvious reason the variable's value would be changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if I do print(nofilters)
in main() it's True, if I print it in process_input() it's False. 🫤
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had changed the order of the function arguments..... 🤦♀️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that'll do it. For functions with that many arguments, I almost always try to find which arguments have a reasonable default value and convert them to keyword arguments, which reduces that kind of issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some keyword arguments.
try: | ||
final_df = final_df.astype({"CE_Allele": "float64", "Reads": "int"}) | ||
except KeyError: | ||
final_df = None | ||
return final_df, flags_df |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this to skip by any samples with no YSTR data without erring out.
This is ready for review @standage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few questions
if len(sequence) > 14: | ||
final_string = ( | ||
f"{collapse_repeats_by_length(sequence[:14], 4)} " | ||
f"{collapse_repeats_by_length(sequence[14:], 4)}" | ||
) | ||
else: | ||
final_string = collapse_repeats_by_length(sequence, 4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious about the significance of the value 14 here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's ensuring the beginning of the bracketed form looks like this [GAAA]3 AG GAAG ...
.
lusSTR/wrappers/filter.py
Outdated
def process_strs(dict_loc, datatype, seq_col, brack_col): | ||
def process_strs(dict_loc, datatype, seq_col, brack_col, sex): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like the sex
argument is used in this function. Did I miss something, or is it really unnecessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, you're right! I've removed it.
process_input( | ||
input_name, | ||
outpath, | ||
profile_type, | ||
data_type, | ||
output_type, | ||
strand=strand, | ||
nofiltering=nofilters, | ||
separate=separate, | ||
custom=custom, | ||
sex=sex, | ||
info=info, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is called whether sex
is true or false? To make sure I understand, enabling sex
includes processing of the Y-STRs in addition to the autosomals?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct
Completing the workflow for Y-STRs using PowerSeq custom sequence ranges. Y-STRs have to be incorporated into the
filter
step.