Complete workflow for Y-STRs using custom sequence ranges #79

Merged
merged 13 commits into from
Aug 20, 2024

rnmitchell
Contributor

Completing the workflow for Y-STRs using PowerSeq custom sequence ranges. Y-STRs have to be incorporated into the filter step.

Comment on lines 436 to 447
def process_input(
    input_name,
    outpath,
    profile_type,
    data_type,
    output_dir,
    info,
    separate,
    output_type,
    nofilters,
    strand,
    separate,
    custom,
    sex,
    info,
):
Contributor Author

I've never had this happen before. @standage, when I specify nofilters as True in the config, it's passed as True to the function but then changes to False, so the nofilters if statement is bypassed. Why is that happening?

Member

I do notice that you name the parameter filters in the Snakemake rule params block, but nofilters in the config file and the Python code. That's a possible source of confusion, but it doesn't explain this behavior. And otherwise I can't see any obvious reason the variable's value would be changed.

Contributor Author

If I do print(nofilters) in main(), it's True; if I print it in process_input(), it's False. 🫤

Contributor Author

I had changed the order of the function arguments..... 🤦‍♀️

Member

Ah, that'll do it. For functions with that many arguments, I almost always try to find which arguments have a reasonable default value and convert them to keyword arguments, which reduces that kind of issue.

Contributor Author

I added some keyword arguments.
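
The argument-order mix-up in this thread can be reproduced in miniature. The sketch below uses hypothetical, shortened signatures (not the project's actual process_input) and assumes the caller kept the old positional order after the definition was reordered. Making the flags keyword-only, as suggested above, removes the dependence on order.

```python
# Hypothetical, shortened signatures for illustration only.

def process_input_reordered(input_name, outpath, nofilters, separate):
    """Definition after reordering: nofilters now comes before separate."""
    return nofilters


def process_input_keywords(input_name, outpath, *, nofilters=False, separate=False):
    """Flags after the bare * must be passed by name, so order cannot matter."""
    return nofilters


nofilters, separate = True, False

# The caller still passes (separate, nofilters) in the old order, so the
# value of separate (False) lands in the nofilters parameter.
seen_positional = process_input_reordered("sample.txt", "out", separate, nofilters)

# Keyword arguments bind by name, whatever order they are written in.
seen_keyword = process_input_keywords(
    "sample.txt", "out", separate=separate, nofilters=nofilters
)

print(seen_positional, seen_keyword)
```

With keyword-only parameters, a call that passes the flags positionally raises a TypeError instead of silently swapping values.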

Comment on lines +140 to 144
try:
    final_df = final_df.astype({"CE_Allele": "float64", "Reads": "int"})
except KeyError:
    final_df = None
return final_df, flags_df
Contributor Author

I added this to skip any samples with no Y-STR data without erroring out.
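
A minimal sketch of why the try/except works, assuming a sample with no Y-STR data yields a DataFrame that lacks the CE_Allele and Reads columns: pandas raises KeyError when a dtype mapping passed to astype names a column that is not present. The helper name coerce_types and the example data are hypothetical.

```python
import pandas as pd


def coerce_types(final_df):
    """Return final_df with typed columns, or None if the columns are absent."""
    try:
        return final_df.astype({"CE_Allele": "float64", "Reads": "int"})
    except KeyError:
        # astype raises KeyError when a mapped column does not exist,
        # which is the assumed shape of a sample with no Y-STR data.
        return None


with_data = pd.DataFrame({"CE_Allele": ["12", "13.2"], "Reads": ["100", "57"]})
no_data = pd.DataFrame()

typed = coerce_types(with_data)
missing = coerce_types(no_data)
print(typed.dtypes)
print(missing)
```

Callers then need to handle a None result before using the returned frame.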

rnmitchell marked this pull request as ready for review August 15, 2024 10:12
rnmitchell
Contributor Author

This is ready for review @standage

Member

standage left a comment

A few questions

Comment on lines +1456 to +1462
if len(sequence) > 14:
    final_string = (
        f"{collapse_repeats_by_length(sequence[:14], 4)} "
        f"{collapse_repeats_by_length(sequence[14:], 4)}"
    )
else:
    final_string = collapse_repeats_by_length(sequence, 4)
Member

I'm curious about the significance of the value 14 here.

Contributor Author

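It ensures the beginning of the bracketed form looks like this: [GAAA]3 AG GAAG ... The first 14 bases ([GAAA]3 is 12 bases, plus the 2-base AG) are collapsed separately, so the rest of the sequence is read starting from GAAG.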
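
To illustrate the effect of the 14-base split, here is a sketch that uses a hypothetical stand-in for collapse_repeats_by_length (the project's real implementation may differ). It assumes the function chunks the sequence into fixed-length units and collapses runs of identical units. The example sequence is made up to match the pattern described above.

```python
from itertools import groupby


def collapse_repeats_by_length(sequence, n):
    """Hypothetical stand-in: chunk into n-mers, collapse runs of equal chunks."""
    units = [sequence[i:i + n] for i in range(0, len(sequence), n)]
    parts = []
    for unit, run in groupby(units):
        count = len(list(run))
        parts.append(f"[{unit}]{count}" if count > 1 else unit)
    return " ".join(parts)


def bracket(sequence):
    """Mirror of the snippet under review: split at base 14, then collapse."""
    if len(sequence) > 14:
        return (
            f"{collapse_repeats_by_length(sequence[:14], 4)} "
            f"{collapse_repeats_by_length(sequence[14:], 4)}"
        )
    return collapse_repeats_by_length(sequence, 4)


# Made-up sequence: three GAAA units, a 2-base AG, then two GAAG units.
seq = "GAAA" * 3 + "AG" + "GAAG" * 2

split_form = bracket(seq)
unsplit_form = collapse_repeats_by_length(seq, 4)
print(split_form)    # [GAAA]3 AG [GAAG]2
print(unsplit_form)  # [GAAA]3 [AGGA]2 AG
```

Without the split, the 2-base AG shifts the reading frame and the downstream repeats come out as AGGA instead of GAAG.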

Comment on lines 69 to 93
- def process_strs(dict_loc, datatype, seq_col, brack_col):
+ def process_strs(dict_loc, datatype, seq_col, brack_col, sex):
Member

It doesn't look like the sex argument is used in this function. Did I miss something, or is it really unnecessary?

Contributor Author

Yup, you're right! I've removed it.

Comment on lines +536 to +548
process_input(
    input_name,
    outpath,
    profile_type,
    data_type,
    output_type,
    strand=strand,
    nofiltering=nofilters,
    separate=separate,
    custom=custom,
    sex=sex,
    info=info,
)
Member

This is called whether sex is true or false? To make sure I understand, enabling sex includes processing of the Y-STRs in addition to the autosomals?

Contributor Author

Correct

standage merged commit e0c5e40 into master Aug 20, 2024
2 checks passed
standage deleted the ystrs-custom branch August 20, 2024 14:58
2 participants