HAlign-G: A rapid and low-memory multiple-genome aligner for thousands of human genomes

HAlign-G is a tool written in C++ for aligning multiple gemones. It runs on Linux and Windows.

Usage

halign-g Input_file Output_file [-r/--reference val] [-p/--threads val] [-bwt/--bwt val] [-dc/--divide_conquer val] [-sv/--svLen val] [-h/--help]
        Input_file      :  Input file/folder path[Please use .fasta as the file suffix or a forder]
        Output_file     :  Output file path[Please use .maf or .fasta as the file suffix]
        -r/--reference            : The reference sequence name [Please delete all whitespace] (defualt value:[Longest])
        -p/--threads              : The number of threads (defualt value:1)
        -bwt/--bwt                : The global BWT threshold (defualt value:15)
        -dc/--divide_conquer      : The divide & conquer Kband threshold (defualt value:10000)
        -sv/--svLen               : The structure variation length threshold (defualt value:200)
        -h/--help                 : Show this help message (defualt value:false)

Installation

Linux/WSL (Windows Subsystem for Linux) - from Anaconda

1.Install WSL for Windows. Instructional video 1 or 2 (Copyright belongs to the original work).

2.Download and install Anaconda. Download Anaconda for different systems here. Instructional video of anaconda installation 1 or 2 (Copyright belongs to the original work).

3.Install HAlign-G.

#1 Create and activate a conda environment for HAlign-G
conda create -n haligng_env
conda activate haligng_env

#2 Add channels to conda
conda config --add channels malab
conda config --add channels conda-forge

#3 Install HAlign-G
conda install -c malab -c conda-forge halign-g

#4 Test HAlign-G
halign-g -h

Linux/WSL(Windows Subsystem for Linux) - from the source code

Download and Compile the source code. (Make sure your version of gcc >= 9.4.0 or clang >= 13.0.0)

#1 Download
git clone https://github.com/malabz/HAlign-G.git

#2 Open the folder
cd HAlign-G/src

#3 Compile
make -j16

#4 Test
./halign-g -h

Windows - from Visual Studio 2022

First, download zip and use Bandizip, WinRAR or any other archive manager software to extract this file.
Make programs from Visual Studio 2022:

First of all, download and install Visual Studio 2022.
Open halign-g.sln in Visual Studio 2022, then choose Build -> Build Solution.
Open x64\Debug folder, you will find halign-g.exe.
- Note: If you want to release this solution, please switch it to Release mode. Select Properties, choose Configuration Properties, then press Configuration Manager... button, change Active Solution configuration to Release. You can find halign-g.exe in x64\Release folder.

Or, install via conda and simple test:

conda install -c malab -c conda-forge halign-g
halign-g.exe -h

Test data

https://github.com/malabz/HAlign-G-data

HAlign-G tools in `tools` folder

tool id	file name	description
0	time_mem.py	Calculate the maximum time and memory, can process multiple threads/processes
1	quN.cpp	Remove the degenerate base
2	dotplot.py	Generate the alignment graph for two sequences in `maf` and `fasta` format file
3	maf-tongji.py	Calculate the M-score for multiple genome alignment in `MAF` format file
4	fasta-tongji.cpp	Calculate the M-score for multiple sequence alignment in `fasta` format file
5	maf-tree/	Generate the M-score similarity matrix for `maf` format
6	jhtree.py	Generate and draw evolutionary tree based on similarity matrix
7	simulate.py	Generate genome sequences with structural variations. The reference file is `sv.maf`
8	r.py	Remove all `\r` in files in a specific folder (`Catctus` can not process `\r`)
9	xmfa2maf.cpp	Convert `xmfa` format file to `maf` format file
10	delta2maf.py	Convert `delta` format from `MUMmer` to `maf` format file
11	split-maf/	Convert blocks in `maf` format with multiple sequences to two sequences, which has only one center sequence and one non-center sequence
12	maf-filter/	Filter the `maf` format file with the sequence in block, the length of block and the continuity of block
13	ref-sort-maf/	Resort the `maf` format file by a specific sequence in `block`
14	shiyan1.py	Generate the graph of Lab 1
15	shiyan2.py	Generate the graph of Lab 2
16	shiyan4.py	Generate the graph of Lab 4
17	cov100w.py	Generate M-score histogram using 100w SARS-CoV-2 sequences
18	fast_read_file/	A fast read `txt` file module without reading file to memory

Contacts

The software tools are developed and maintained by ZOU's lab.

If you find any bug, welcome to contact us on the issues page or email us.

More tools and infomation can visit our github.

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
src		src
tools		tools
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
halign-g.filters		halign-g.filters
halign-g.sln		halign-g.sln
halign-g.user		halign-g.user
halign-g.vcxproj		halign-g.vcxproj
halign-g.vcxproj.user		halign-g.vcxproj.user

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

HAlign-G: A rapid and low-memory multiple-genome aligner for thousands of human genomes

Usage

Installation

Linux/WSL (Windows Subsystem for Linux) - from Anaconda

Linux/WSL(Windows Subsystem for Linux) - from the source code

Windows - from Visual Studio 2022

Test data

HAlign-G tools in `tools` folder

Contacts

About

Releases

Packages

Contributors 3

Languages

License

malabz/HAlign-G

Folders and files

Latest commit

History

Repository files navigation

HAlign-G: A rapid and low-memory multiple-genome aligner for thousands of human genomes

Usage

Installation

Linux/WSL (Windows Subsystem for Linux) - from Anaconda

Linux/WSL(Windows Subsystem for Linux) - from the source code

Windows - from Visual Studio 2022

Test data

HAlign-G tools in tools folder

Contacts

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

HAlign-G tools in `tools` folder

Packages