Merge pull request #103 from apriha/develop
v4.2.0
apriha authored Feb 21, 2022
2 parents 3be7a99 + 3afe893 commit 40f33d0
Showing 9 changed files with 98 additions and 115 deletions.
1 change: 1 addition & 0 deletions .github/workflows/ci.yml
@@ -66,6 +66,7 @@ jobs:
steps:
- uses: actions/checkout@v2
with:
fetch-depth: 0
persist-credentials: false
- name: Setup Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
18 changes: 9 additions & 9 deletions README.rst
@@ -152,8 +152,8 @@ calculating the centiMorgans of shared DNA and plotting the results:
>>> results = l.find_shared_dna([user662, user663], cM_threshold=0.75, snp_threshold=1100)
Downloading resources/genetic_map_HapMapII_GRCh37.tar.gz
Downloading resources/cytoBand_hg19.txt.gz
Saving output/shared_dna_User662_User663_HapMap2.png
Saving output/shared_dna_one_chrom_User662_User663_GRCh37_HapMap2.csv
Saving output/shared_dna_User662_User663_0p75cM_1100snps_GRCh37_HapMap2.png
Saving output/shared_dna_one_chrom_User662_User663_0p75cM_1100snps_GRCh37_HapMap2.csv

Notice that the centiMorgan and SNP thresholds for each DNA segment can be tuned. Additionally,
notice that two files were downloaded to facilitate the analysis and plotting - future analyses
@@ -178,7 +178,7 @@ created; these files are detailed in the documentation and their generation can
``save_output=False`` argument. In this example, the output files consist of a CSV file that
details the shared segments of DNA on one chromosome and a plot that illustrates the shared DNA:

.. image:: https://raw.githubusercontent.com/apriha/lineage/master/docs/images/shared_dna_User662_User663_HapMap2.png
.. image:: https://raw.githubusercontent.com/apriha/lineage/master/docs/images/shared_dna_User662_User663_0p75cM_1100snps_GRCh37_HapMap2.png

`Find Shared Genes <https://lineage.readthedocs.io/en/stable/lineage.html#lineage.Lineage.find_shared_dna>`_
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
@@ -214,11 +214,11 @@ Now let's find the shared genes, specifying a
Downloading resources/CEU_omni_recombination_20130507.tar
Downloading resources/knownGene_hg19.txt.gz
Downloading resources/kgXref_hg19.txt.gz
Saving output/shared_dna_User4583_User4584_CEU.png
Saving output/shared_dna_one_chrom_User4583_User4584_GRCh37_CEU.csv
Saving output/shared_dna_two_chroms_User4583_User4584_GRCh37_CEU.csv
Saving output/shared_genes_one_chrom_User4583_User4584_GRCh37_CEU.csv
Saving output/shared_genes_two_chroms_User4583_User4584_GRCh37_CEU.csv
Saving output/shared_dna_User4583_User4584_0p75cM_1100snps_GRCh37_CEU.png
Saving output/shared_dna_one_chrom_User4583_User4584_0p75cM_1100snps_GRCh37_CEU.csv
Saving output/shared_dna_two_chroms_User4583_User4584_0p75cM_1100snps_GRCh37_CEU.csv
Saving output/shared_genes_one_chrom_User4583_User4584_0p75cM_1100snps_GRCh37_CEU.csv
Saving output/shared_genes_two_chroms_User4583_User4584_0p75cM_1100snps_GRCh37_CEU.csv

The plot that illustrates the shared DNA is shown below. Note that in addition to outputting the
shared DNA segments on either one or both chromosomes, the shared genes on either one or both
@@ -235,7 +235,7 @@ of shared DNA:
>>> len(results['two_chrom_shared_dna'])
36

.. image:: https://raw.githubusercontent.com/apriha/lineage/master/docs/images/shared_dna_User4583_User4584_CEU.png
.. image:: https://raw.githubusercontent.com/apriha/lineage/master/docs/images/shared_dna_User4583_User4584_0p75cM_1100snps_GRCh37_CEU.png

Documentation
-------------
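The README notes that the centiMorgan and SNP thresholds for each DNA segment can be tuned (here `cM_threshold=0.75` and `snp_threshold=1100`). The sketch below illustrates that filtering idea in isolation; it is not lineage's implementation. It assumes a segment is reported only when its length in cM and its SNP count both strictly exceed the thresholds. The `Segment` type, the `filter_segments` function, and the segment values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical shared-DNA segment, loosely mirroring the output CSV columns."""

    chrom: str
    start: int
    end: int
    cMs: float
    snps: int


def filter_segments(
    segments: List[Segment], cM_threshold: float = 0.75, snp_threshold: int = 1100
) -> List[Segment]:
    """Keep segments that exceed both thresholds (strict comparison is an assumption)."""
    return [s for s in segments if s.cMs > cM_threshold and s.snps > snp_threshold]


# Made-up segments chosen to exercise each threshold
candidates = [
    Segment("1", 1_000_000, 5_000_000, 4.2, 2500),  # exceeds both thresholds: kept
    Segment("2", 200_000, 300_000, 0.5, 1500),  # too short in cM: dropped
    Segment("3", 10_000, 900_000, 1.1, 800),  # too few SNPs: dropped
]
print([s.chrom for s in filter_segments(candidates)])  # ['1']
```

Lowering either threshold admits shorter or sparser segments. This is also why the thresholds now appear in the output filenames: results computed with different settings no longer overwrite each other.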
Binary file modified docs/images/lineage_banner.png
49 changes: 28 additions & 21 deletions docs/output_files.rst
@@ -53,30 +53,37 @@ Shared DNA between two or more individuals can be identified with
:meth:`~lineage.Lineage.find_shared_dna`. One PNG file and up to two CSV files are output when
``save_output=True``.

In the filenames below, ``name1`` is the name of the first
:class:`~lineage.individual.Individual` and ``name2`` is the name of the second
:class:`~lineage.individual.Individual`. (If more individuals are compared, all
:class:`~lineage.individual.Individual` names will be included in the filenames and plot titles
using the same conventions.) Additionally, ``genetic_map`` corresponds to the genetic map used
in the calculations of shared DNA, specified as a parameter to :meth:`~lineage.Lineage.find_shared_dna`.
In the filenames below,

- ``name1`` is the name of the first :class:`~lineage.individual.Individual`
- ``name2`` is the name of the second :class:`~lineage.individual.Individual`
- ``cM_threshold`` corresponds to the same named parameter of
:meth:`~lineage.Lineage.find_shared_dna`; "." is replaced by "p" with precision of 2, e.g., "0p75"
- ``snp_threshold`` corresponds to the same named parameter of
:meth:`~lineage.Lineage.find_shared_dna`
- ``genetic_map`` corresponds to the same named parameter of
:meth:`~lineage.Lineage.find_shared_dna`.

.. note:: If more than two individuals are compared, all :class:`~lineage.individual.Individual`
names will be included in the filenames and plot titles using the same conventions.

.. note:: Genetic maps do not have recombination rates for the Y chromosome since the Y
chromosome does not recombine. Therefore, shared DNA will not be shown on the Y
chromosome.

shared_dna_<name1>_<name2>_<genetic_map>.png
````````````````````````````````````````````
shared_dna_<name1>_<name2>_<cM_threshold>cM_<snp_threshold>snps_GRCh37_<genetic_map>.png
````````````````````````````````````````````````````````````````````````````````````````
This plot illustrates shared DNA (i.e., no shared DNA, shared DNA on one chromosome, and shared
DNA on both chromosomes). The centromere for each chromosome is also detailed. Two examples of
this plot are shown below.

.. image:: https://raw.githubusercontent.com/apriha/lineage/master/docs/images/shared_dna_User662_User663_HapMap2.png
.. image:: https://raw.githubusercontent.com/apriha/lineage/master/docs/images/shared_dna_User662_User663_0p75cM_1100snps_GRCh37_HapMap2.png

In the above plot, note that the two individuals only share DNA on one chromosome. In this plot,
the larger regions where "No shared DNA" is indicated are due to SNPs not being available in
those regions (i.e., SNPs were not tested in those regions).

.. image:: https://raw.githubusercontent.com/apriha/lineage/master/docs/images/shared_dna_User4583_User4584_CEU.png
.. image:: https://raw.githubusercontent.com/apriha/lineage/master/docs/images/shared_dna_User4583_User4584_0p75cM_1100snps_GRCh37_CEU.png

In the above plot, the areas where "No shared DNA" is indicated are the regions where SNPs were
not tested or where DNA is not shared. The areas where "One chromosome shared" is indicated are
@@ -86,8 +93,8 @@ shared" is indicated are regions where the individuals share DNA on both chromos
Note that the regions where DNA is shared on both chromosomes are a subset of the regions where
one chromosome is shared.

shared_dna_one_chrom_<name1>_<name2>_GRCh37_<genetic_map>.csv
`````````````````````````````````````````````````````````````
shared_dna_one_chrom_<name1>_<name2>_<cM_threshold>cM_<snp_threshold>snps_GRCh37_<genetic_map>.csv
``````````````````````````````````````````````````````````````````````````````````````````````````
If DNA is shared on one chromosome, a CSV file details the shared segments of DNA.

======= ===========
@@ -101,8 +108,8 @@ cMs CentiMorgans of matching DNA segment
snps Number of SNPs in matching DNA segment
======= ===========

shared_dna_two_chroms_<name1>_<name2>_GRCh37_<genetic_map>.csv
``````````````````````````````````````````````````````````````
shared_dna_two_chroms_<name1>_<name2>_<cM_threshold>cM_<snp_threshold>snps_GRCh37_<genetic_map>.csv
```````````````````````````````````````````````````````````````````````````````````````````````````
If DNA is shared on two chromosomes, a CSV file details the shared segments of DNA.

======= ===========
@@ -129,11 +136,11 @@ In the filenames below, ``name1`` is the name of the first
:class:`~lineage.individual.Individual` names will be included in the filenames using the same
convention.)

shared_genes_one_chrom_<name1>_<name2>_GRCh37_<genetic_map>.csv
```````````````````````````````````````````````````````````````
shared_genes_one_chrom_<name1>_<name2>_<cM_threshold>cM_<snp_threshold>snps_GRCh37_<genetic_map>.csv
````````````````````````````````````````````````````````````````````````````````````````````````````
If DNA is shared on one chromosome, this file details the genes shared between the individuals
on at least one chromosome; these genes are located in the shared DNA segments specified in
`shared_dna_one_chrom_<name1>_<name2>_GRCh37_<genetic_map>.csv`_.
`shared_dna_one_chrom_<name1>_<name2>_<cM_threshold>cM_<snp_threshold>snps_GRCh37_<genetic_map>.csv`_.

=========== ============
Column* Description*
@@ -152,10 +159,10 @@ description Description
\* `UCSC Genome Browser <http://genome.ucsc.edu>`_ /
`UCSC Table Browser <http://genome.ucsc.edu/cgi-bin/hgTables>`_

shared_genes_two_chroms_<name1>_<name2>_GRCh37_<genetic_map>.csv
````````````````````````````````````````````````````````````````
shared_genes_two_chroms_<name1>_<name2>_<cM_threshold>cM_<snp_threshold>snps_GRCh37_<genetic_map>.csv
`````````````````````````````````````````````````````````````````````````````````````````````````````
If DNA is shared on both chromosomes in a pair, this file details the genes shared between the
individuals on both chromosomes; these genes are located in the shared DNA segments specified in
`shared_dna_two_chroms_<name1>_<name2>_GRCh37_<genetic_map>.csv`_.
`shared_dna_two_chroms_<name1>_<name2>_<cM_threshold>cM_<snp_threshold>snps_GRCh37_<genetic_map>.csv`_.

The file has the same columns as `shared_genes_one_chrom_<name1>_<name2>_GRCh37_<genetic_map>.csv`_.
The file has the same columns as `shared_genes_one_chrom_<name1>_<name2>_<cM_threshold>cM_<snp_threshold>snps_GRCh37_<genetic_map>.csv`_.
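The updated naming convention can be sketched as a small standalone function. It follows the commit's `filename_details` logic: two-decimal `cM_threshold` with "." replaced by "p", then the SNP threshold, `GRCh37`, and the genetic map. It assumes individual names are joined with underscores, as the README's example filenames suggest. The helper names are hypothetical. The CSV files are only written when the corresponding DataFrame is non-empty, so the list below covers every file that *may* be produced.

```python
from typing import List, Sequence


def filename_details(
    names: Sequence[str], cM_threshold: float, snp_threshold: int, genetic_map: str
) -> str:
    """Build the shared part of the output filenames (hypothetical helper)."""
    individuals = "_".join(names)  # assumption: names joined with underscores
    # Two decimal places, "." replaced by "p": 0.75 -> "0p75", 1 -> "1p00"
    cM = "{:.2f}".format(cM_threshold).replace(".", "p")
    return f"{individuals}_{cM}cM_{snp_threshold}snps_GRCh37_{genetic_map}"


def output_filenames(
    names: Sequence[str], cM_threshold: float, snp_threshold: int, genetic_map: str
) -> List[str]:
    """List every file find_shared_dna may write under the new convention."""
    details = filename_details(names, cM_threshold, snp_threshold, genetic_map)
    return [
        f"shared_dna_{details}.png",
        f"shared_dna_one_chrom_{details}.csv",
        f"shared_dna_two_chroms_{details}.csv",
        f"shared_genes_one_chrom_{details}.csv",
        f"shared_genes_two_chroms_{details}.csv",
    ]


for name in output_filenames(["User662", "User663"], 0.75, 1100, "HapMap2"):
    print(name)
```

The first two names printed match the PNG and one-chromosome CSV listed in the README example above.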
69 changes: 33 additions & 36 deletions src/lineage/__init__.py
@@ -500,6 +500,8 @@ def find_shared_dna(
if save_output:
self._find_shared_dna_output_helper(
individuals,
cM_threshold,
snp_threshold,
one_chrom_shared_dna,
two_chrom_shared_dna,
one_chrom_shared_genes,
@@ -547,12 +549,24 @@ def _find_shared_dna_helper(self, df, cM_threshold, snp_threshold, one_x_chrom):
def _find_shared_dna_output_helper(
self,
individuals,
cM_threshold,
snp_threshold,
one_chrom_shared_dna,
two_chrom_shared_dna,
one_chrom_shared_genes,
two_chrom_shared_genes,
genetic_map,
):
def output_csv(df, file, float_format="%.2f"):
save_df_as_csv(
df,
self._output_dir,
file,
comment=self._get_csv_header(),
prepend_info=False,
float_format=float_format,
)

cytobands = self._resources.get_cytoBand_hg19()

individuals_filename = ""
@@ -565,63 +579,48 @@ def _find_shared_dna_output_helper(
individuals_filename = individuals_filename[:-1]
individuals_plot_title = individuals_plot_title[:-3]

cM = "{:.2f}".format(cM_threshold).replace(".", "p")
filename_details = (
f"{individuals_filename}_{cM}cM_{snp_threshold}snps_GRCh37_{genetic_map}"
)

if create_dir(self._output_dir):
plot_chromosomes(
one_chrom_shared_dna,
two_chrom_shared_dna,
cytobands,
os.path.join(
self._output_dir,
f"shared_dna_{individuals_filename}_{genetic_map}.png",
f"shared_dna_{filename_details}.png",
),
f"{individuals_plot_title} shared DNA",
37,
)

if len(one_chrom_shared_dna) > 0:
file = (
f"shared_dna_one_chrom_{individuals_filename}_GRCh37_{genetic_map}.csv"
)
save_df_as_csv(
output_csv(
one_chrom_shared_dna,
self._output_dir,
file,
comment=self._get_csv_header(),
prepend_info=False,
float_format="%.2f",
f"shared_dna_one_chrom_{filename_details}.csv",
)

if len(two_chrom_shared_dna) > 0:
file = (
f"shared_dna_two_chroms_{individuals_filename}_GRCh37_{genetic_map}.csv"
)
save_df_as_csv(
output_csv(
two_chrom_shared_dna,
self._output_dir,
file,
comment=self._get_csv_header(),
prepend_info=False,
float_format="%.2f",
f"shared_dna_two_chroms_{filename_details}.csv",
)

if len(one_chrom_shared_genes) > 0:
file = f"shared_genes_one_chrom_{individuals_filename}_GRCh37_{genetic_map}.csv"
save_df_as_csv(
output_csv(
one_chrom_shared_genes,
self._output_dir,
file,
comment=self._get_csv_header(),
prepend_info=False,
f"shared_genes_one_chrom_{filename_details}.csv",
None,
)

if len(two_chrom_shared_genes) > 0:
file = f"shared_genes_two_chroms_{individuals_filename}_GRCh37_{genetic_map}.csv"
save_df_as_csv(
output_csv(
two_chrom_shared_genes,
self._output_dir,
file,
comment=self._get_csv_header(),
prepend_info=False,
f"shared_genes_two_chroms_{filename_details}.csv",
None,
)

def _find_shared_dna_return_helper(
@@ -712,7 +711,7 @@ def _compute_snp_distances(self, task):
temp = task["snps"]

# merge genetic map for this chrom
temp = temp.append(genetic_map, ignore_index=False, sort=True)
temp = pd.concat([temp, genetic_map], ignore_index=False, sort=True)

# sort based on pos
temp = temp.sort_values("pos")
@@ -880,8 +879,6 @@ def _remap_snps_to_GRCh37(self, individuals):

def _get_csv_header(self):
return (
"# Generated by lineage v{}, https://pypi.org/project/lineage/\n"
"# Generated at {} UTC\n".format(
__version__, datetime.datetime.utcnow().strftime("%Y-%m-%d %H:%M:%S")
)
f"# Generated by lineage v{__version__}; https://pypi.org/project/lineage/{os.linesep}"
f"# Generated at {datetime.datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S')} UTC{os.linesep}"
)
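The rewritten `_get_csv_header` produces a two-line comment header using f-strings and `os.linesep`. Below is a standalone sketch of the same output. The version string is passed in as a parameter instead of being read from `lineage.__version__`. The function name and the optional `now` parameter are hypothetical and exist only so the timestamp can be fixed for testing.

```python
import datetime
import os
from typing import Optional


def csv_header(version: str, now: Optional[datetime.datetime] = None) -> str:
    """Standalone sketch of the two-line comment header written atop each CSV."""
    if now is None:
        # The commit calls datetime.datetime.utcnow(); an aware UTC datetime
        # formats identically with this strftime pattern.
        now = datetime.datetime.now(datetime.timezone.utc)
    return (
        f"# Generated by lineage v{version}; https://pypi.org/project/lineage/{os.linesep}"
        f"# Generated at {now.strftime('%Y-%m-%d %H:%M:%S')} UTC{os.linesep}"
    )


print(csv_header("4.2.0", datetime.datetime(2022, 2, 21, 12, 0, 0)), end="")
```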
57 changes: 21 additions & 36 deletions src/lineage/visualization.py
@@ -235,6 +235,23 @@ def _patch_chromosomal_features(cytobands, one_chrom_match, two_chrom_match):
the start and stop positions of particular features on each
chromosome
"""

def concat(df, chrom, start, end, gie_stain):
return pd.concat(
[
df,
pd.DataFrame(
{
"chrom": [chrom],
"start": [start],
"end": [end],
"gie_stain": [gie_stain],
}
),
],
ignore_index=True,
)

chromosomes = cytobands["chrom"].unique()

df = pd.DataFrame()
@@ -253,52 +270,20 @@ def _patch_chromosomal_features(cytobands, one_chrom_match, two_chrom_match):
]

# background of chromosome
df = df.append(
{
"chrom": chromosome,
"start": 0,
"end": chromosome_length,
"gie_stain": "gneg",
},
ignore_index=True,
)
df = concat(df, chromosome, 0, chromosome_length, "gneg")

# add markers for shared DNA on one chromosome
for marker in one_chrom_match_markers.itertuples():
df = df.append(
{
"chrom": chromosome,
"start": marker.start,
"end": marker.end,
"gie_stain": "one_chrom",
},
ignore_index=True,
)
df = concat(df, chromosome, marker.start, marker.end, "one_chrom")

# add markers for shared DNA on both chromosomes
for marker in two_chrom_match_markers.itertuples():
df = df.append(
{
"chrom": chromosome,
"start": marker.start,
"end": marker.end,
"gie_stain": "two_chrom",
},
ignore_index=True,
)
df = concat(df, chromosome, marker.start, marker.end, "two_chrom")

# add centromeres
for item in cytobands.loc[
(cytobands["chrom"] == chromosome) & (cytobands["gie_stain"] == "acen")
].itertuples():
df = df.append(
{
"chrom": chromosome,
"start": item.start,
"end": item.end,
"gie_stain": "centromere",
},
ignore_index=True,
)
df = concat(df, chromosome, item.start, item.end, "centromere")

return df
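Both `__init__.py` and `visualization.py` replace `DataFrame.append`, which pandas deprecated in 1.4 and removed in 2.0, with `pd.concat`. The first function below reproduces the commit's `concat` helper: it wraps the new row in a one-row DataFrame and concatenates it. The second function is an alternative that is *not* used in the commit. It collects row dicts in a list and builds the DataFrame once, which avoids copying the frame on every iteration. The name `build_rows_once` and its `markers` argument (a list of `(start, end)` tuples) are hypothetical.

```python
import pandas as pd

COLUMNS = ["chrom", "start", "end", "gie_stain"]


def concat(df: pd.DataFrame, chrom, start, end, gie_stain) -> pd.DataFrame:
    """Append one row by concatenating a one-row DataFrame (replaces df.append)."""
    row = pd.DataFrame(
        {"chrom": [chrom], "start": [start], "end": [end], "gie_stain": [gie_stain]}
    )
    return pd.concat([df, row], ignore_index=True)


def build_rows_once(chrom, length, markers) -> pd.DataFrame:
    """Alternative sketch: collect dicts in a list, then build a DataFrame once."""
    rows = [{"chrom": chrom, "start": 0, "end": length, "gie_stain": "gneg"}]
    rows += [
        {"chrom": chrom, "start": s, "end": e, "gie_stain": "one_chrom"}
        for s, e in markers
    ]
    return pd.DataFrame(rows, columns=COLUMNS)


df = pd.DataFrame()
df = concat(df, "1", 0, 1_000_000, "gneg")
df = concat(df, "1", 100_000, 200_000, "one_chrom")
print(df["gie_stain"].tolist())  # ['gneg', 'one_chrom']
```

For the handful of rows per chromosome that `_patch_chromosomal_features` builds, either approach is fine. The list-based version matters more when many rows are added in a loop.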
19 changes: 6 additions & 13 deletions tests/test_lineage.py
@@ -195,20 +195,13 @@ def _assert_does_not_exist(self, files, idx):
def _make_file_exist_assertions(
self, inds, exist="all", genetic_map="HapMap2", output_dir="output"
):
filename_details = f"{inds}_0p75cM_1100snps_GRCh37_{genetic_map}"
files = [
os.path.join(
output_dir, f"shared_dna_one_chrom_{inds}_GRCh37_{genetic_map}.csv"
),
os.path.join(
output_dir, f"shared_dna_two_chroms_{inds}_GRCh37_{genetic_map}.csv"
),
os.path.join(
output_dir, f"shared_genes_one_chrom_{inds}_GRCh37_{genetic_map}.csv"
),
os.path.join(
output_dir, f"shared_genes_two_chroms_{inds}_GRCh37_{genetic_map}.csv"
),
os.path.join(output_dir, f"shared_dna_{inds}_{genetic_map}.png"),
os.path.join(output_dir, f"shared_dna_one_chrom_{filename_details}.csv"),
os.path.join(output_dir, f"shared_dna_two_chroms_{filename_details}.csv"),
os.path.join(output_dir, f"shared_genes_one_chrom_{filename_details}.csv"),
os.path.join(output_dir, f"shared_genes_two_chroms_{filename_details}.csv"),
os.path.join(output_dir, f"shared_dna_{filename_details}.png"),
]

if exist == "all":
