Back to before REMOVE
breimanntools committed Sep 20, 2023
1 parent ad0253e commit 279db56
Showing 72 changed files with 2,437 additions and 2 deletions.
4 changes: 4 additions & 0 deletions aaanalysis/plotting/__init__.py
@@ -0,0 +1,4 @@
from aaanalysis.plotting.plotting_functions import plot_get_cmap, plot_get_cdict, plot_gcfs, \
plot_settings, plot_set_legend

__all__ = ["plot_get_cmap", "plot_get_cdict", "plot_settings", "plot_set_legend", "plot_gcfs"]
434 changes: 434 additions & 0 deletions aaanalysis/plotting/plotting_functions.py

Large diffs are not rendered by default.

246 changes: 246 additions & 0 deletions docs/source/_index/tables.rst
@@ -0,0 +1,246 @@
..
    Developer Notes:
    This is the index file for all tables of the AAanalysis documentation. Each table should be saved in the /tables
    directory. This file serves as the template for tables.rst, which is automatically created from the information
    provided here and in the .csv tables from the /tables directory. Add a new table as .csv in the /tables directory,
    in the overview table at the beginning of this document, and a new section with a short description of it in this
    document. Each column and important data types (e.g., categories) should be described. Each table should contain a
    'Reference' column.
    Ignore 'tables_template.rst: WARNING: document isn't included in any toctree' warning

Tables
======================

.. contents::
    :local:
    :depth: 1

Overview Table
--------------
All tables from the AAanalysis documentation are given here in chronological order of the project history.

.. _0_mapper:
.. list-table::
   :header-rows: 1
   :widths: 8 8 8

   * - Table
     - Description
     - See also
   * - 1_overview_benchmarks
     - Protein benchmark datasets
     - aa.load_dataset
   * - 2_overview_scales
     - Amino acid scale datasets
     - aa.load_scales


Protein benchmark datasets
--------------------------
Three types of benchmark datasets are provided:

- Residue prediction (AA): Datasets used to predict residue-specific (amino acid-specific) properties.
- Domain prediction (DOM): Datasets used to predict domain-specific properties.
- Sequence prediction (SEQ): Datasets used to predict sequence-specific properties.

The level of each dataset is indicated by the first part of its name, followed by an abbreviation for the
specific dataset (e.g., 'AA_LDR', 'DOM_GSEC', 'SEQ_AMYLO'). For some datasets, an additional version is provided
for positive-unlabeled (PU) learning, containing only positive (1) and unlabeled (2) data samples, as indicated by
*dataset_name_PU* (e.g., 'DOM_GSEC_PU').
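
The naming convention can be sketched in a few lines of Python. This is a minimal illustration only: ``parse_dataset_name`` and ``LEVELS`` are hypothetical names introduced here and are not part of the AAanalysis API.

```python
# Hypothetical helper (not part of AAanalysis): it only encodes the naming
# convention described above (level prefix, abbreviation, optional '_PU').
LEVELS = {"AA": "Amino acid", "DOM": "Domain", "SEQ": "Sequence"}


def parse_dataset_name(name):
    """Split a dataset name into (level, abbreviation, is_pu)."""
    is_pu = name.endswith("_PU")
    core = name[:-len("_PU")] if is_pu else name
    prefix, _, abbreviation = core.partition("_")
    if prefix not in LEVELS or not abbreviation:
        raise ValueError(f"Unexpected dataset name: {name!r}")
    return LEVELS[prefix], abbreviation, is_pu


for name in ["AA_LDR", "DOM_GSEC", "SEQ_AMYLO", "DOM_GSEC_PU"]:
    print(name, parse_dataset_name(name))
```

Stripping the '_PU' suffix before splitting keeps the abbreviation identical for a dataset and its PU version.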

.. _1_overview_benchmarks:
.. list-table::
   :header-rows: 1
   :widths: 8 8 8 8 8 8 8 8 8 8

   * - Level
     - Dataset
     - # Sequences
     - # Amino acids
     - # Positives
     - # Negatives
     - Predictor
     - Description
     - Reference
     - Label
   * - Amino acid
     - AA_CASPASE3
     - 233
     - 185605
     - 705
     - 184900
     - PROSPERous
     - Prediction of caspase-3 cleavage site
     - :ref:`Song18 <Song18>`
     - 1 (adjacent to cleavage site), 0 (not adjacent to cleavage site)
   * - Amino acid
     - AA_FURIN
     - 71
     - 59003
     - 163
     - 58840
     - PROSPERous
     - Prediction of furin cleavage site
     - :ref:`Song18 <Song18>`
     - 1 (adjacent to cleavage site), 0 (not adjacent to cleavage site)
   * - Amino acid
     - AA_LDR
     - 342
     - 118248
     - 35469
     - 82779
     - IDP-Seq2Seq
     - Prediction of long intrinsically disordered regions (LDR)
     - :ref:`Tang20 <Tang20>`
     - 1 (disordered), 0 (ordered)
   * - Amino acid
     - AA_MMP2
     - 573
     - 312976
     - 2416
     - 310560
     - PROSPERous
     - Prediction of Matrix metallopeptidase-2 (MMP2) cleavage site
     - :ref:`Song18 <Song18>`
     - 1 (adjacent to cleavage site), 0 (not adjacent to cleavage site)
   * - Amino acid
     - AA_RNABIND
     - 221
     - 55001
     - 6492
     - 48509
     - GMKSVM-RU
     - Prediction of RNA-binding protein residues (RBP60 dataset)
     - :ref:`Yang21 <Yang21>`
     - 1 (binding), 0 (non-binding)
   * - Amino acid
     - AA_SA
     - 233
     - 185605
     - 101082
     - 84523
     - PROSPERous
     - Prediction of solvent accessibility (SA) of residue (AA_CASPASE3 data set)
     - :ref:`Song18 <Song18>`
     - 1 (exposed/accessible), 0 (buried/non-accessible)
   * - Sequence
     - SEQ_AMYLO
     - 1414
     - 8484
     - 511
     - 903
     - ReRF-Pred
     - Prediction of amyloidogenic regions
     - :ref:`Teng21 <Teng21>`
     - 1 (amyloidogenic), 0 (non-amyloidogenic)
   * - Sequence
     - SEQ_CAPSID
     - 7935
     - 3364680
     - 3864
     - 4071
     - VIRALpro
     - Prediction of capsid proteins
     - :ref:`Galiez16 <Galiez16>`
     - 1 (capsid protein), 0 (non-capsid protein)
   * - Sequence
     - SEQ_DISULFIDE
     - 2547
     - 614470
     - 897
     - 1650
     - Dipro
     - Prediction of disulfide bridges in sequences
     - :ref:`Cheng06 <Cheng06>`
     - 1 (sequence with SS bond), 0 (sequence without SS bond)
   * - Sequence
     - SEQ_LOCATION
     - 1835
     - 732398
     - 1045
     - 790
     - nan
     - Prediction of subcellular location of protein (cytoplasm vs plasma membrane)
     - :ref:`Shen19 <Shen19>`
     - 1 (protein in cytoplasm), 0 (protein in plasma membrane)
   * - Sequence
     - SEQ_SOLUBLE
     - 17408
     - 4432269
     - 8704
     - 8704
     - SOLpro
     - Prediction of soluble and insoluble proteins
     - :ref:`Magnan09 <Magnan09>`
     - 1 (soluble), 0 (insoluble)
   * - Sequence
     - SEQ_TAIL
     - 6668
     - 2671690
     - 2574
     - 4094
     - VIRALpro
     - Prediction of tail proteins
     - :ref:`Galiez16 <Galiez16>`
     - 1 (tail protein), 0 (non-tail protein)
   * - Domain
     - DOM_GSEC
     - 126
     - 92964
     - 63
     - 63
     - nan
     - Prediction of gamma-secretase substrates
     - :ref:`Breimann23c <Breimann23c>`
     - 1 (substrate), 0 (non-substrate)
   * - Domain
     - DOM_GSEC_PU
     - 694
     - 494524
     - 63
     - 0
     - nan
     - Prediction of gamma-secretase substrates (PU dataset)
     - :ref:`Breimann23c <Breimann23c>`
     - 1 (substrate), 2 (unknown substrate status)
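
The counts in the table can be cross-checked with a short script. The sketch below assumes that samples are residues for 'AA' datasets and whole sequences for 'SEQ' and 'DOM' datasets; this reading is inferred from the numbers and is not stated by the source. ``ROWS`` and ``n_unlabeled`` are hypothetical names used only for illustration.

```python
# Counts copied from the benchmark table:
# (dataset, n_sequences, n_amino_acids, n_positives, n_negatives)
ROWS = [
    ("AA_CASPASE3", 233, 185605, 705, 184900),
    ("AA_FURIN", 71, 59003, 163, 58840),
    ("AA_LDR", 342, 118248, 35469, 82779),
    ("AA_MMP2", 573, 312976, 2416, 310560),
    ("AA_RNABIND", 221, 55001, 6492, 48509),
    ("AA_SA", 233, 185605, 101082, 84523),
    ("SEQ_AMYLO", 1414, 8484, 511, 903),
    ("SEQ_CAPSID", 7935, 3364680, 3864, 4071),
    ("SEQ_DISULFIDE", 2547, 614470, 897, 1650),
    ("SEQ_LOCATION", 1835, 732398, 1045, 790),
    ("SEQ_SOLUBLE", 17408, 4432269, 8704, 8704),
    ("SEQ_TAIL", 6668, 2671690, 2574, 4094),
    ("DOM_GSEC", 126, 92964, 63, 63),
    ("DOM_GSEC_PU", 694, 494524, 63, 0),
]


def n_unlabeled(name, n_seq, n_aa, n_pos, n_neg):
    """Number of samples that are neither positive nor negative.

    Assumption: samples are residues for 'AA_' datasets and sequences otherwise.
    """
    n_samples = n_aa if name.startswith("AA_") else n_seq
    return n_samples - n_pos - n_neg


for row in ROWS:
    print(row[0], n_unlabeled(*row))
```

Under this assumption, only the PU dataset is expected to report a non-zero number of unlabeled samples.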


Amino acid scale datasets
-------------------------
The following amino acid scale datasets are provided:

.. _2_overview_scales:
.. list-table::
   :header-rows: 1
   :widths: 8 8 8 8

   * - Dataset
     - Description
     - # Scales
     - Reference
   * - scales
     - Amino acid scales (min-max normalized)
     - 586
     - :ref:`Breimann23b <Breimann23b>`
   * - scales_raw
     - Amino acid scales (raw values)
     - 586
     - :ref:`Kawashima08 <Kawashima08>`
   * - scales_classification
     - Classification of scales (AAontology)
     - 586
     - :ref:`Breimann23b <Breimann23b>`
   * - scales_pc
     - Principal component (PC) compressed scales
     - 20
     - :ref:`Breimann23a <Breimann23a>`
   * - top60
     - Top 60 scale subsets
     - 60
     - :ref:`Breimann23a <Breimann23a>`
   * - top60_eval
     - Evaluation of top 60 scale subsets
     - 60
     - :ref:`Breimann23a <Breimann23a>`
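
The relation between raw and min-max normalized scales can be illustrated as follows. This is a generic sketch of min-max normalization with made-up values; whether AAanalysis normalizes each scale in exactly this way is an assumption, and ``min_max_normalize`` is a hypothetical helper.

```python
def min_max_normalize(scale):
    """Linearly rescale a mapping of amino acid -> value to the range [0, 1]."""
    low, high = min(scale.values()), max(scale.values())
    if high == low:
        raise ValueError("Scale is constant; min-max normalization is undefined.")
    return {aa: (value - low) / (high - low) for aa, value in scale.items()}


# Made-up raw values for three residues (for illustration only)
raw = {"A": 2.0, "G": 0.0, "R": -2.0}
normalized = min_max_normalize(raw)
print(normalized)
```

After normalization, the smallest raw value maps to 0 and the largest to 1, which makes scales with different units comparable.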


Binary file added docs/source/_index/tables/0_mapper.xlsx
Binary file not shown.
Binary file not shown.
Binary file added docs/source/_index/tables/2_overview_scales.xlsx
Binary file not shown.
4 changes: 2 additions & 2 deletions docs/source/conf.py
@@ -9,7 +9,7 @@

sys.path.append(os.path.abspath('.'))

-#from create_tables_doc import generate_table_rst
+from create_tables_doc import generate_table_rst

# -- Path and Platform setup --------------------------------------------------
SEP = "\\" if platform.system() == "Windows" else "/"
@@ -172,7 +172,7 @@
]

# Create table.rst
-#generate_table_rst()
+generate_table_rst()

# -- Linkcode configuration ---------------------------------------------------
_module_path = os.path.dirname(importlib.util.find_spec("aaanalysis").origin) # type: ignore
4 changes: 4 additions & 0 deletions docs/source/index.rst
@@ -5,6 +5,7 @@
Welcome to the AAanalysis documentation
=======================================
.. include:: index/badges.rst
.. include:: index/overview.rst

Install
@@ -24,12 +25,14 @@ Install
:caption: OVERVIEW

index/introduction.rst
index/usage_principles.rst
index/CONTRIBUTING_COPY.rst

.. toctree::
:maxdepth: 1
:caption: EXAMPLES

tutorials.rst

.. toctree::
:maxdepth: 2
@@ -40,6 +43,7 @@ Install
.. toctree::
:maxdepth: 1

_index/tables.rst
index/references.rst

Indices and tables
44 changes: 44 additions & 0 deletions docs/source/index/tables_template.rst
@@ -0,0 +1,44 @@
..
    Developer Notes:
    This is the index file for all tables of the AAanalysis documentation. Each table should be saved in the /tables
    directory. This file serves as the template for tables.rst, which is automatically created from the information
    provided here and in the .csv tables from the /tables directory. Add a new table as .csv in the /tables directory,
    in the overview table at the beginning of this document, and a new section with a short description of it in this
    document. Each column and important data types (e.g., categories) should be described. Each table should contain a
    'Reference' column.
    Ignore 'tables_template.rst: WARNING: document isn't included in any toctree' warning

Tables
======================

.. contents::
    :local:
    :depth: 1

Overview Table
--------------
All tables from the AAanalysis documentation are given here in chronological order of the project history.

.. _0_mapper:

Protein benchmark datasets
--------------------------
Three types of benchmark datasets are provided:

- Residue prediction (AA): Datasets used to predict residue-specific (amino acid-specific) properties.
- Domain prediction (DOM): Datasets used to predict domain-specific properties.
- Sequence prediction (SEQ): Datasets used to predict sequence-specific properties.

The level of each dataset is indicated by the first part of its name, followed by an abbreviation for the
specific dataset (e.g., 'AA_LDR', 'DOM_GSEC', 'SEQ_AMYLO'). For some datasets, an additional version is provided
for positive-unlabeled (PU) learning, containing only positive (1) and unlabeled (2) data samples, as indicated by
*dataset_name_PU* (e.g., 'DOM_GSEC_PU').

.. _1_overview_benchmarks:

Amino acid scale datasets
-------------------------
The following amino acid scale datasets are provided:

.. _2_overview_scales:

22 changes: 22 additions & 0 deletions docs/source/index/usage_principles.rst
@@ -0,0 +1,22 @@
.. Developer Notes:
    This is the index file for usage principles. Files for each part are saved in the /usage_principles directory
    and the overview of the AAanalysis package is given as component diagram (internal dependencies) and context
    diagram (external dependencies). Always give concise code examples reflecting the usage examples. Instead of
    including comprehensive tables here, add them in tables.rst and refer to them with a short explanation.

Usage Principles
================
Import AAanalysis as:

.. code-block:: python

    import aaanalysis as aa

.. toctree::
    :maxdepth: 1

    usage_principles/data_flow_entry_points
    usage_principles/aaontology
    usage_principles/feature_identification
    usage_principles/pu_learning
    usage_principles/xai
usage_principles/xai
5 changes: 5 additions & 0 deletions docs/source/index/usage_principles/aaontology.rst
@@ -0,0 +1,5 @@
AAontology: Classification of amino acid scales
===============================================

AAontology is a two-level classification of amino acid scales, introduced in.

8 changes: 8 additions & 0 deletions docs/source/index/usage_principles/data_flow_entry_points.rst
@@ -0,0 +1,8 @@
Data Flow and Entry Points
==========================

The AAanalysis toolkit organizes its data in DataFrames. The entry points are DataFrames containing amino acid scale
information (df_scales, df_cat) or sequence information (df_seq); the latter can be processed to obtain specific
sequence parts (df_parts). Amino acid scales and sequence parts, together with split settings, are the input for the
CPP algorithm, which creates physicochemical features (df_feat) by comparing two sets of protein sequences.
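
The data flow can be pictured with a toy example in plain Python. This is not the CPP algorithm and does not use the AAanalysis API: the scale values, sequences, and helper names (``get_part``, ``part_mean``, ``mean_difference``) are hypothetical stand-ins for df_scales, df_seq, df_parts, and a single df_feat entry.

```python
from statistics import mean

# Hypothetical stand-in for one column of df_scales (amino acid -> value)
scale = {"A": 0.7, "L": 1.0, "G": 0.4, "K": 0.1, "D": 0.0}

# Hypothetical stand-in for df_seq: (entry, sequence, label)
sequences = [
    ("P1", "KALLLAGD", 1),
    ("P2", "DALLAGKK", 1),
    ("P3", "KDGKDGAK", 0),
    ("P4", "DKGGKDAD", 0),
]


def get_part(sequence, start, stop):
    """Return the sequence part from position start to stop (1-based, inclusive)."""
    return sequence[start - 1:stop]


def part_mean(part, scale):
    """Average scale value over all residues of a sequence part."""
    return mean(scale[aa] for aa in part)


def mean_difference(sequences, scale, start, stop):
    """Difference of the average part value between label 1 and label 0."""
    values = {0: [], 1: []}
    for _, sequence, label in sequences:
        values[label].append(part_mean(get_part(sequence, start, stop), scale))
    return mean(values[1]) - mean(values[0])


# One toy 'feature': a part (positions 2 to 5) combined with one scale
diff = mean_difference(sequences, scale, start=2, stop=5)
print(round(diff, 3))
```

A positive difference means that the chosen part has, on average, a higher scale value in the first set of sequences than in the second.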

7 changes: 7 additions & 0 deletions docs/source/index/usage_principles/feature_identification.rst
@@ -0,0 +1,7 @@
Identifying Physicochemical Signatures using CPP
================================================

The central algorithm of the AAanalysis platform is Comparative Physicochemical Profiling (CPP), a novel sequence-based
feature engineering algorithm, designed to enable interpretable protein prediction.

