From 261fd9f8e1af05fb45fe6fde1a05387592049301 Mon Sep 17 00:00:00 2001
From: PierreBarrat <p.barrat@live.fr>
Date: Mon, 30 Sep 2024 18:02:39 +0200
Subject: [PATCH] add readme

---
 README.md         | 44 ++++++++++++++++++++++++++++++++++++++++++++
 docs/src/index.md |  2 +-
 2 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index cf3441e..0908ed2 100644
--- a/README.md
+++ b/README.md
@@ -2,3 +2,47 @@
 
 [![Stable](https://img.shields.io/badge/docs-stable-blue.svg)](https://PierreBarrat.github.io/BioSequenceMappings.jl/stable/)
 [![Dev](https://img.shields.io/badge/docs-dev-blue.svg)](https://PierreBarrat.github.io/BioSequenceMappings.jl/dev/)
+
+The aim of the package is to facilitate the task of converting biological sequences (nucleotides, amino acids) to integers or to onehot representation. 
+It also provides simple function to manipulate alignments or to compute simple statistics. 
+
+*Note*: I do not frequently use the onehot format, so for now it is not documented / not well tested. I'll develop it more when I get the time. 
+
+## Installation
+
+From a Julia session: 
+```julia
+using Pkg
+Pkg.add("BioSequenceMappings")
+using BioSequenceMappings
+```
+
+
+## Usage
+
+### Filter sequences by Hamming distance
+
+Load an alignment, find all sequences with a Hamming distance smaller than 66% to the first one, create a new alignment object from them and save it to a file. 
+
+```julia
+using BioSequenceMappings
+A = read_fasta("example/PF00014.fasta");
+size(A) # 100 sequences of length 53
+s0 = A[1]; 
+indices = findall(s -> hamming(s, s0; normalize=true) < 0.66, A)
+B = subsample(A, indices)
+write("example/PF00014_subsample.fasta", B)
+```
+
+### Remove columns with too many gaps
+
+Load an alignment, find columns with more than 5% gaps and remove them, then create a new alignment object. 
+
+```julia
+using BioSequenceMappings
+A = read_fasta("example/PF00014.fasta"); # uses the default alphabet `aa_alphabet`
+gap_digit = A.alphabet('-') 
+f1 = site_specific_frequencies(A)
+non_gapped_colums = findall(j -> f1[gap_digit, j] <= 0.05, 1:size(f1, 2))
+B = Alignment(A.data[non_gapped_colums, :], A.alphabet; A.names) # sequences are stored as columns
+```
\ No newline at end of file
diff --git a/docs/src/index.md b/docs/src/index.md
index 272a168..e58dc45 100644
--- a/docs/src/index.md
+++ b/docs/src/index.md
@@ -12,7 +12,7 @@ It also provides simple function to manipulate alignments or to compute simple s
 ## Installation
 
 From the Julia REPL: 
-```@repl 
+```@example
 using Pkg
 Pkg.add("BioSequenceMappings")
 using BioSequenceMappings