From 60d90c513d216ea23c214e865ec9114dafdcacda Mon Sep 17 00:00:00 2001
From: "Documenter.jl"
Date: Tue, 8 Oct 2024 13:31:02 +0000
Subject: [PATCH] build based on 22768ca

---
 dev/.documenter-siteinfo.json |  2 +-
 dev/alignments/index.html     | 20 ++++++++++----------
 dev/reference/index.html      |  2 +-
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json
index ac2923a..d8a7dd0 100644
--- a/dev/.documenter-siteinfo.json
+++ b/dev/.documenter-siteinfo.json
@@ -1 +1 @@
-{"documenter":{"julia_version":"1.11.0","generation_timestamp":"2024-10-08T13:30:56","documenter_version":"1.7.0"}}
\ No newline at end of file
+{"documenter":{"julia_version":"1.11.0","generation_timestamp":"2024-10-08T13:30:58","documenter_version":"1.7.0"}}
\ No newline at end of file
diff --git a/dev/alignments/index.html b/dev/alignments/index.html
index 4aaccac..71fddd0 100644
--- a/dev/alignments/index.html
+++ b/dev/alignments/index.html
@@ -20,9 +20,9 @@
 1 3 10 11 16 11 4 15 7 17 5 5 5 3 10 10 18 3 1
 14 3 5 15 14 12 4 2 7 14 5 2 17 3 7 21 8 3 10
 1 3 21 19 7 16 13 5 7 21 15 5 17 3 5 2 16 3 1
- 9 3 12 11 14 10 4 18 7 18 15 5 2 3 5 2 19 3 2
julia> size(A) # length and number of sequences
(53, 100)

When reading from FASTA, the choice of the alphabet is made by reading the first five sequences, and comparing the observed characters with the list of default alphabets (see The Alphabet type). If they fit one of the defaults, it will be used. Otherwise, an alphabet will be created ad hoc:


julia> A = read_fasta("../../example/strange_characters.fasta"); # warning produced because no default alphabet was found
┌ Warning: Could not find a default alphabet for characters ['!', '-', '/', '0', '9', '@', 'A', 'C', 'G', 'T']
-│ Using Alphabet{Char,Int64}: ['!', '-', '/', '0', '9', '@', 'A', 'C', 'G', 'T']
-└ @ BioSequenceMappings ~/work/BioSequenceMappings.jl/BioSequenceMappings.jl/src/IO.jl:62
julia> A.alphabet |> symbols |> prod
"!-/09@ACGT"

Writing to a FASTA file is just as easy:

julia> write("new_fasta_file.fasta", A) # or...
julia> open("new_fasta_file.fasta", "w") do io
+ 9 3 12 11 14 10 4 18 7 18 15 5 2 3 5 2 19 3 2
julia> size(A) # length and number of sequences
(53, 100)

When reading from FASTA, the choice of the alphabet is made by reading the first five sequences, and comparing the observed characters with the list of default alphabets (see The Alphabet type). If they fit one of the defaults, it will be used. Otherwise, an alphabet will be created ad hoc:


julia> A = read_fasta("../../example/strange_characters.fasta"); # warning produced because no default alphabet was found
┌ Warning: Could not find a default alphabet for characters ['!', '-', '/', '0', '9', '@', 'A', 'C', 'G', 'T']
+│ Using Alphabet{Char,Int64}: ['!', '-', '/', '0', '9', '@', 'A', 'C', 'G', 'T']
+└ @ BioSequenceMappings ~/work/BioSequenceMappings.jl/BioSequenceMappings.jl/src/IO.jl:62
julia> A.alphabet |> symbols |> prod
"!-/09@ACGT"
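
The selection rule described above (probe the first five sequences, try the defaults, otherwise fall back to an ad hoc alphabet) can be sketched in plain Julia. This is only an illustration under stated assumptions: `DEFAULT_ALPHABETS` and `guess_alphabet` are hypothetical names, and the default character sets below are placeholders rather than the package's actual defaults.

```julia
# Hypothetical default alphabets; the package's real defaults may differ.
const DEFAULT_ALPHABETS = Dict(
    :nucleotide => Set("-ACGT"),
    :amino_acid => Set("-ACDEFGHIKLMNPQRSTVWY"),
)

# Return the name of the first default alphabet that contains every observed
# character, or the sorted observed characters if no default fits.
function guess_alphabet(sequences::Vector{String}; n_probe::Int = 5)
    probe = sequences[1:min(n_probe, length(sequences))]
    observed = Set(join(probe))              # characters seen in the probed sequences
    for name in (:nucleotide, :amino_acid)   # fixed order, smallest alphabet first
        if issubset(observed, DEFAULT_ALPHABETS[name])
            return name
        end
    end
    return sort(collect(observed))           # ad hoc alphabet
end
```

With the characters from the warning above, the fallback branch yields them in sorted order, which is the same order as the string printed by `A.alphabet |> symbols |> prod`.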

Writing to a FASTA file is just as easy:

julia> write("new_fasta_file.fasta", A) # or...
julia> open("new_fasta_file.fasta", "w") do io
           write(io, A)
       end

Accessing & iterating

Sequences can be accessed by indexing. Indexing with a range returns a view into the underlying data matrix.

julia> A[1] # the first sequence of the alignment
53-element view(::Matrix{Int64}, :, 1) with eltype Int64:
   1
@@ -86,14 +86,14 @@
 3×10 adjoint(::Matrix{Int64}) with eltype Int64:
  4  3  5  3  3  5  5  3  5  1
  3  3  5  2  2  1  5  3  1  2
- 5  3  4  2  4  1  5  1  5  5
julia> subsample_random(A, 12) # sampling without replacement: this will error since size(A, 1) < 12
ERROR: AssertionError: Cannot take 12 different sequences from alignment of size 5
julia> rand(A) # one random sequence from A (returns a view)
10-element view(::Matrix{Int64}, :, 1) with eltype Int64:
+ 5  3  4  2  4  1  5  1  5  5
julia> subsample_random(A, 12) # sampling without replacement: this will error since size(A, 1) < 12
ERROR: AssertionError: Cannot take 12 different sequences from alignment of size 5
julia> rand(A) # one random sequence from A (returns a view)
10-element view(::Matrix{Int64}, :, 3) with eltype Int64:
+ 4
+ 3
  5
  3
- 4
- 2
- 4
- 1
+ 3
  5
- 1
  5
- 5

OneHotAlignment

TBA

+ 3
+ 5
+ 1

OneHotAlignment

TBA

diff --git a/dev/reference/index.html b/dev/reference/index.html
index fd19903..eacc3ba 100644
--- a/dev/reference/index.html
+++ b/dev/reference/index.html
@@ -1,5 +1,5 @@
-Reference · BioSequenceMappings.jl

Documentation for BioSequenceMappings.

BioSequenceMappings.Alignment Type
mutable struct Alignment{A,T} where {A, T<:Integer}
    data::Matrix{T}
+Reference · BioSequenceMappings.jl

Documentation for BioSequenceMappings.

BioSequenceMappings.Alignment Type
mutable struct Alignment{A,T} where {A, T<:Integer}
    data::Matrix{T}
     alphabet::Union{Nothing, Alphabet{A,T}}
     weights::Vector{Float64} = ones(size(data,1))/size(data,1) # phylogenetic weights of sequences
     names::Vector{String} = fill("", size(data, 1))

Biological sequences as vectors of type T<:Integer. data stores sequences in columns: size(data) returns a tuple (L, M), with L the sequence length and M the number of sequences. When displayed, data is shown as an M×L matrix to match traditional alignments.

alphabet{A,T} represents the mapping between integers in data and biological symbols of type A (nucleotides, amino acids...). If nothing, the alignment cannot be mapped to biological sequences.

weights represent phylogenetic weights, and are initialized to 1/M. They must sum to 1. names are the labels of the sequences, and are expected to be in the same order as the columns of data. They do not have to be unique, and can be ignored.

Important: When built from a matrix, assumes that the sequences are stored in columns.
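
The column-major convention can be illustrated with a plain matrix and Base functions only. This is a minimal sketch of the layout described above, not a call into the package; the variable names are chosen for illustration.

```julia
# Three sequences of length 4, one per column: size is (L, M) = (4, 3).
data = [1 2 3;
        1 2 4;
        2 2 3;
        4 1 3]
L, M = size(data)

# Uniform weights, one per sequence, summing to 1.
weights = fill(1 / M, M)

# The first sequence is the first column of the matrix.
first_sequence = data[:, 1]

# Transposing gives the M×L layout used when displaying an alignment.
display_form = permutedims(data)
```

A matrix that stores sequences in rows would therefore need to be transposed (for example with `permutedims`) before being used as `data`.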

Methods

  • getindex(X::Alignment, i) returns the matrix or vector X.data[:, i].
  • for s in X::Alignment iterates over sequences.
  • eachsequence(X::Alignment) returns an iterator over sequences (Vector{Int}).
  • eachsequence_weighted(X::Alignment) returns an iterator over sequences and weights as tuples.
  • subaln(X::Alignment, idx) constructs the sub-alignment defined by the index idx.
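
What these methods amount to on the underlying matrix can be sketched with Base Julia alone. The snippet below does not call the package; it mimics the listed behaviour on a plain matrix, and the renormalization of the weights in the last step is an assumption rather than documented behaviour.

```julia
data = [1 2 3; 1 2 4; 2 2 3; 4 1 3]   # L = 4, M = 3, sequences in columns
weights = [0.5, 0.25, 0.25]

# Indexing one sequence: column 2 as a view (no copy).
seq2 = view(data, :, 2)

# Iterating over sequences, in the spirit of eachsequence.
lengths = [length(s) for s in eachcol(data)]

# Iterating over (sequence, weight) pairs, in the spirit of eachsequence_weighted:
# here, the weighted average of the first position.
weighted_first = sum(w * s[1] for (s, w) in zip(eachcol(data), weights))

# Sub-alignment from a set of sequence indices, in the spirit of subaln.
idx = [1, 3]
sub = data[:, idx]
sub_weights = weights[idx] ./ sum(weights[idx])   # renormalized: an assumption
```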
BioSequenceMappings.Alignment Method
Alignment(data::AbstractMatrix{T}; alphabet = :auto, kwargs...)

Keyword argument alphabet can be :auto, :none/nothing, or an input to the constructor Alphabet. Other keyword arguments are passed to the default constructor of Alignment.

BioSequenceMappings.Alignment Method
Alignment(data::AbstractMatrix, alphabet; kwargs...)

data is a matrix of integers, with sequences stored in columns. alphabet can be one of:

  • an Alphabet
  • nothing: no conversion from integers to biological symbols.
  • something to build an alphabet from (e.g. a symbol like :aa, a string, ...). The constructor Alphabet will be called like so: Alphabet(alphabet).

If the types of alphabet and data mismatch, data is converted.

data can also have one of the following shapes:

  • vector of integer vectors, e.g. [[1,2], [3,4]]: each element is considered as a sequence
  • vector of integers: single sequence alignment
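
One way to picture how these shapes reduce to the column-major matrix is the sketch below. `as_matrix` is a hypothetical helper written for illustration; it is not part of the package and makes no claim about how the constructor is actually implemented.

```julia
# Normalize the accepted input shapes to an L×M integer matrix
# (hypothetical helper, not part of the package).
as_matrix(data::AbstractMatrix{<:Integer}) = Matrix(data)

# A single vector of integers is one sequence: an L×1 matrix.
as_matrix(data::AbstractVector{<:Integer}) = reshape(collect(data), :, 1)

# A vector of integer vectors: each element becomes one column.
function as_matrix(data::AbstractVector{<:AbstractVector{<:Integer}})
    n = length(first(data))
    @assert all(==(n), length.(data)) "all sequences must have the same length"
    return reduce(hcat, data)
end
```

For example, `as_matrix([[1,2], [3,4]])` gives a 2×2 matrix whose columns are the two input sequences.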