Commit
Showing 9 changed files with 1,303 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,54 @@ | ||
# DIGS screening database field definitions | ||
|
||
When a screen is initiated by the DIGS tool, a project-specific, relational database is created. This 'screening database' captures data generated by DIGS. | ||
|
||
### `searches_performed` table | ||
|
||
Records the details of BLAST searches that have been performed in this DIGS project. | ||
|
||
| **Field** | **Type** | **Description** | | ||
|---|---|---| | ||
| record_ID | INT | Automatically incremented primary key | | ||
| probe_ID | VARCHAR | Unique ID of probe sequence | | ||
| probe_name | VARCHAR | Name of probe sequence | | ||
| probe_gene | VARCHAR | Name of probe sequence gene | | ||
| target_id | VARCHAR | Unique identifier of the target database (TDb) file | | ||
| organism | VARCHAR | Name of the organism (Latin binomial) from which TDb was generated | | ||
| target_datatype | VARCHAR | Data type of the TDb file | | ||
| target_version | VARCHAR | Version details of the TDb file | | ||
| target_name | VARCHAR | Name of TDb file | | ||
| Timestamp | TIMESTAMP | Timestamp of the table entry | | ||
|
||
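One way this table can be used is to skip probe/target pairs that have already been searched, so that a screen is not repeated. The following is a minimal sketch only: it assumes an in-memory SQLite database as a stand-in for the project's RDBMS and creates just a subset of the columns listed above. The helper names `already_searched` and `record_search`, and all sample values, are hypothetical and are not part of the DIGS tool.

```python
import sqlite3

# Illustration only: SQLite stands in for the project's RDBMS, and only a
# subset of the searches_performed columns is created.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE searches_performed (
        record_ID  INTEGER PRIMARY KEY AUTOINCREMENT,
        probe_ID   VARCHAR(100) NOT NULL,
        probe_name VARCHAR(100),
        probe_gene VARCHAR(100),
        target_id  VARCHAR(100) NOT NULL,
        organism   VARCHAR(100),
        Timestamp  TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    )
    """
)


def already_searched(conn, probe_id, target_id):
    """Return True if this probe/target pair has a row in searches_performed."""
    row = conn.execute(
        "SELECT 1 FROM searches_performed "
        "WHERE probe_ID = ? AND target_id = ? LIMIT 1",
        (probe_id, target_id),
    ).fetchone()
    return row is not None


def record_search(conn, probe_id, probe_name, probe_gene, target_id, organism):
    """Insert one row describing a completed search (hypothetical helper)."""
    conn.execute(
        "INSERT INTO searches_performed "
        "(probe_ID, probe_name, probe_gene, target_id, organism) "
        "VALUES (?, ?, ?, ?, ?)",
        (probe_id, probe_name, probe_gene, target_id, organism),
    )


# Hypothetical work list; the third entry repeats the first and is skipped.
pending = [
    ("probe_001", "target_A"),
    ("probe_002", "target_A"),
    ("probe_001", "target_A"),
]

performed = []
for probe_id, target_id in pending:
    if already_searched(conn, probe_id, target_id):
        continue  # this pair is already recorded, so do not search again
    # A real screen would run the BLAST search here before recording it.
    record_search(conn, probe_id, "example_probe", "example_gene",
                  target_id, "Example organism")
    performed.append((probe_id, target_id))
```

Checking the table before each search is what keeps the loop non-redundant: only two of the three pending pairs are searched and recorded.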
|
||
### `digs_results` table | ||
|
||
Contains the extracted sequences of the loci specified in the BLAST_results table, and the results of the second round of paired BLAST, in which extracted sequences are 'genotyped' by BLAST comparison to the reference sequence library (RSL). | ||
|
||
| **Field** | **Type** | **Description** | | ||
|---|---|---| | ||
| record_ID | INT | Automatically incremented primary key | | ||
| organism | VARCHAR | Organism name (Latin binomial) | | ||
| target_datatype | VARCHAR | Genome data type | | ||
| target_version | VARCHAR | Genome build version details | | ||
| target_name | VARCHAR | Name of genome data file containing the BLAST hit | | ||
| probe_type | VARCHAR | Type of probe sequence (amino acid or nucleotide) | | ||
| extract_start | INT | 5’ (start) position of the extracted sequence in the target scaffold | | ||
| extract_end | INT | 3’ (end) position of the extracted sequence in the target scaffold | | ||
| scaffold | VARCHAR | Name of scaffold/contig/chromosome containing the BLAST hit | | ||
| orientation | ENUM | Orientation of the BLAST hit relative to the probe | | ||
| assigned_name | VARCHAR | Name of closest matching sequence in RSL | | ||
| assigned_gene | VARCHAR | Name of gene of closest matching sequence in RSL | | ||
| bit_score | FLOAT | Bit score of the best match from reverse BLAST | | ||
| identity | FLOAT | Percentage identity of the best match from reverse BLAST | | ||
| e_value_num | FLOAT | Coefficient of the expect (e) value for the best match from reverse BLAST | | ||
| e_value_exp | INT | Exponent (base 10) of the expect (e) value for the best match from reverse BLAST | | ||
| subject_start | INT | 5’ (start) position of reverse BLAST alignment in the RSL sequence | | ||
| subject_end | INT | 3’ (end) position of reverse BLAST alignment in the RSL sequence | | ||
| query_start | INT | 5’ (start) position of reverse BLAST alignment in the probe sequence | | ||
| query_end | INT | 3’ (end) position of reverse BLAST alignment in the probe sequence | | ||
| mismatches | INT | Number of mismatches in alignment from reverse BLAST | | ||
| gap_openings | INT | Number of gap openings in alignment from reverse BLAST | | ||
| sequence_length | INT | Length of the extracted sequence | | ||
| sequence | TEXT | Text string of the extracted sequence | | ||
| timestamp | TIMESTAMP | Timestamp of the table entry | | ||
|
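The fields above can be summarised with ordinary SQL. The following is a minimal sketch only: it assumes an in-memory SQLite database as a stand-in for the project's RDBMS, creates just a subset of the `digs_results` columns, and uses made-up rows. It also assumes that the expect value is stored as a coefficient and a power-of-ten exponent, as in BLAST's scientific notation (for example `2e-80`); the helper name `e_value` is hypothetical.

```python
import sqlite3

# Illustration only: SQLite stands in for the project's RDBMS, and only a
# subset of the digs_results columns is created.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE digs_results (
        record_ID     INTEGER PRIMARY KEY AUTOINCREMENT,
        organism      VARCHAR(100),
        scaffold      VARCHAR(100),
        assigned_name VARCHAR(100),
        assigned_gene VARCHAR(100),
        bit_score     FLOAT,
        e_value_num   FLOAT,
        e_value_exp   INT
    )
    """
)

# Hypothetical rows, invented purely for this example.
rows = [
    ("Homo_sapiens", "chr1", "RefA", "gag", 310.5, 2.0, -80),
    ("Homo_sapiens", "chr7", "RefA", "gag", 120.0, 4.5, -30),
    ("Mus_musculus", "chr2", "RefB", "pol", 95.2, 1.0, -20),
]
conn.executemany(
    "INSERT INTO digs_results "
    "(organism, scaffold, assigned_name, assigned_gene, "
    " bit_score, e_value_num, e_value_exp) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)",
    rows,
)

# Count hits per organism and assigned reference sequence, with the best score.
summary = conn.execute(
    """
    SELECT organism, assigned_name, assigned_gene,
           COUNT(*) AS n_hits, MAX(bit_score) AS best_bit_score
    FROM digs_results
    GROUP BY organism, assigned_name, assigned_gene
    ORDER BY organism, assigned_name
    """
).fetchall()

# Keep only hits whose e-value exponent is -30 or lower.
strong_hits = [
    r[0]
    for r in conn.execute(
        "SELECT scaffold FROM digs_results "
        "WHERE e_value_exp <= ? ORDER BY record_ID",
        (-30,),
    )
]


def e_value(num, exp):
    """Rebuild the expect value from its coefficient and power-of-ten exponent."""
    return num * 10.0 ** exp
```

Storing the exponent as its own integer column allows hits to be filtered by order of magnitude with a plain integer comparison, as in the `strong_hits` query.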
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,217 @@ | ||
<!DOCTYPE html> | ||
<html lang="en-us"> | ||
|
||
<head> | ||
|
||
|
||
<meta charset="UTF-8"> | ||
<title>DIGS: Overview</title> | ||
<meta name="viewport" content="width=device-width, initial-scale=1"> | ||
<link rel="stylesheet" type="text/css" href="../assets/stylesheets/normalize.css" media="screen"> | ||
<link href='https://fonts.googleapis.com/css?family=Open+Sans:400,700' rel='stylesheet' type='text/css'> | ||
<link rel="stylesheet" type="text/css" href="../assets/stylesheets/stylesheet.css" media="screen"> | ||
<link rel="stylesheet" type="text/css" href="../assets/stylesheets/github-light.css" media="screen"> | ||
|
||
</head> | ||
|
||
<body> | ||
|
||
<section class="page-header"> | ||
|
||
|
||
|
||
<h1 class="project-name">DIGS</h1> | ||
<h2 class="project-tagline">Database-Integrated Genome Screening</h2> | ||
|
||
|
||
<a href="../../index.html" class="btn">Home</a> | ||
<a href="./explore.html" class="btn">Background</a> | ||
<a href="./user-guide.html" class="btn">Manual</a> | ||
<a href="https://github.com/giffordlabcvr/DIGS-tool/zipball/master" class="btn">Download</a> | ||
<a target="_blank" href="https://github.com/giffordlabcvr/DIGS-tool" class="btn">GitHub</a> | ||
<a target="_blank" href="https://twitter.com/DigsTool" class="btn">Twitter</a> | ||
|
||
</section> | ||
|
||
<section class="main-content"> | ||
|
||
<h3> | ||
<a id="SearchStructure" class="anchor" href="#SearchStructure" aria-hidden="true"><span class="octicon octicon-link"></span></a><strong>Structure of a DIGS-based investigation</strong> | ||
</h3> | ||
<hr> | ||
|
||
|
||
<p> | ||
Comparative studies using database-integrated genome screening (DIGS) entail | ||
separate 'exploration' and 'analysis' phases, | ||
with each of these phases being split into two component parts as follows: | ||
|
||
|
||
<ul> | ||
|
||
<li> <b>Exploration</b>: (1) Setting up and (2) running a similarity search-based screen.</li> | ||
<li> <b>Analysis</b>: (3) Inspecting screening output via a relational database, and (4) | ||
performing comparative analysis of exported sequence data.</li> | ||
</ul> | ||
|
||
|
||
|
||
</p> | ||
|
||
|
||
<p><img src="../assets/images/overview-phases.jpg" alt="Overview - phases" /></p> | ||
|
||
<p> | ||
|
||
As shown above, this process is usually iterative - at least to some degree - since | ||
analysis of screening results often reveals new information that can be used to | ||
design more informative or comprehensive screens. | ||
|
||
</p> | ||
|
||
|
||
<br> | ||
|
||
|
||
|
||
<h3> | ||
<a id="setUpAndRunScreen" class="anchor" href="#setUpAndRunScreen" aria-hidden="true"><span class="octicon octicon-link"></span></a><strong>Exploration phase: Setting up and running an <i>in silico</i> screen</strong> | ||
</h3> | ||
<hr> | ||
|
||
<p> | ||
|
||
DIGS is a <b>project-based framework</b> in which investigations are centred around | ||
a <b>genome feature</b> of interest. Any genome feature can be investigated in principle, | ||
so long as it contains sufficient sequence conservation to be reliably detected in a similarity | ||
search. | ||
|
||
</p> | ||
|
||
|
||
<p> | ||
The '<b>reference sequence library</b>' | ||
is a curated set of sequences relevant to the genome feature under investigation. | ||
Usually this will consist of: | ||
|
||
|
||
<ul> | ||
<li> A set of conserved DNA or polypeptide sequences derived from the genome feature of interest.</li> | ||
</ul> | ||
|
||
</p> | ||
|
||
|
||
<p> | ||
However, depending on the kind of investigation being performed, it may also contain: | ||
|
||
|
||
<ul> | ||
<li>Sequences that do not derive from the genome feature under investigation, | ||
but can provide useful information about the locus in which it occurs.</li> | ||
<li>Sequences representing genome features that are not relevant to the investigation, | ||
but are sufficiently similar to the feature of interest to generate 'false positive' matches.</li> | ||
</ul> | ||
|
||
|
||
Screening entails selecting particular sequences from the reference library for use as | ||
'<b>probes</b>' in a BLAST search of a specific '<b>target database</b>'. | ||
</p> | ||
|
||
|
||
<p> | ||
Sequences that match the query ('<b>hits</b>') can then be extracted and <b>classified</b>. | ||
A convenient way of rapidly classifying or 'genotyping' hits is via BLAST-based comparison to the | ||
reference library, as indicated in the illustration below. | ||
</p> | ||
|
||
|
||
|
||
|
||
<p><img src="../assets/images/exploration-phase.jpg" alt="Exploration phase" /></p> | ||
|
||
|
||
|
||
<blockquote> | ||
|
||
<b>Schematic representation of the <u>exploration</u> phase of a DIGS-based investigation</b>. | ||
<br> | ||
Here, the genome features being investigated are a set of related genes. | ||
In <b>step (1)</b> a sequence from the reference library is selected and used | ||
as a 'probe' or 'query' in a BLAST-based search of a chosen target database. | ||
In <b>step (2)</b>, sequences identified in this search are extracted | ||
and classified via BLAST-based comparison to the reference library. | ||
These searches provide a way to effectively 'delve into' | ||
genomic databanks and recover related sequences and, as such, provide a means | ||
to survey unmapped regions of the genomic 'landscape'. | ||
|
||
</blockquote> | ||
|
||
|
||
<br> | ||
<h3> | ||
<a id="analyseScreenOutput" class="anchor" href="#analyseScreenOutput" aria-hidden="true"><span class="octicon octicon-link"></span></a><strong>Analysis phase: Inspecting results and exporting data for comparative analysis</strong> | ||
</h3> | ||
<hr> | ||
|
||
|
||
<p> | ||
In DIGS, a similarity search-based screening pipeline is linked to a <b>relational | ||
database management system (RDBMS)</b>, and the outputs of screening are captured | ||
in a <b>project-specific relational database</b>. | ||
</p> | ||
|
||
<p> | ||
This approach not only provides a convenient and robust | ||
basis for implementing systematic, automated screens that proceed in an efficient, | ||
non-redundant way, it also allows screening data to be interrogated using | ||
<b>structured query language (SQL)</b> - a well-established, powerful approach for | ||
querying relational databases. | ||
</p> | ||
|
||
<p> | ||
The analysis phase has two component parts: | ||
</p> | ||
|
||
<ol> | ||
<li> <b>Investigation of output via the relational database.</b></li> | ||
<li> <b>Comparative genomic analysis of exported sequence data.</b></li> | ||
</ol> | ||
|
||
|
||
<br> | ||
|
||
|
||
|
||
<p><img src="../assets/images/analysis-phase.png" alt="Analysis phase" /></p> | ||
|
||
|
||
|
||
<blockquote> | ||
<b>Analysing screening output</b>: A schematic representation of the two component | ||
parts of the 'analysis' phase of a DIGS-based screen | ||
(some comparative analyses do not require an alignment, but most do). | ||
</blockquote> | ||
|
||
|
||
|
||
|
||
<br> | ||
|
||
|
||
<br> | ||
|
||
|
||
|
||
<footer class="site-footer"> | ||
<span class="site-footer-owner"><a href="https://github.com/giffordlabcvr/DIGS-tool">DIGS</a> is maintained by <a href="https://github.com/giffordlabcvr">giffordlabcvr</a>.</span> | ||
|
||
<span class="site-footer-credits">This page was generated by <a href="https://pages.github.com">GitHub Pages</a> using the <a href="https://github.com/jasonlong/cayman-theme">Cayman theme</a> by <a href="https://twitter.com/jasonlong">Jason Long</a>.</span> | ||
</footer> | ||
|
||
</section> | ||
|
||
|
||
</body> | ||
|
||
</html> |