Here are some of my past and ongoing projects, which I think are pretty cool. Not all projects are being actively developed, but I will certainly respond to issues and pull requests.
I am the main developer for (more recent projects are listed first):
- sweepystats: Python package implementing the statistical sweep operation
- RootCauseDiscovery: Python and Julia package for detecting disease-causing genes in rare disease patients from gene expression data
- GhostKnockoffGWAS: Package for performing knockoff-based analysis for GWAS summary statistics data
- Ghostbasil.jl: (WIP) Provides Julia wrappers to the C++ code of ghostbasil
- Knockoffs.jl: Implements the knockoff filter framework for variable selection, which performs conditional independence testing and controls the FDR (false discovery rate)
- groupknockoffs: Simple app to solve group knockoff optimization, without Julia installed!
- EasyLD.jl: Julia utilities for downloading and reading LD (linkage disequilibrium) matrices stored in Hail's BlockMatrix format
- knockoffspy: A Python package that provides a direct wrapper over Knockoffs.jl
- knockoffsr: An R package that provides a direct wrapper over Knockoffs.jl
- MendelIHT.jl: Implements iterative hard thresholding (l0 penalized regression solver). It is highly optimized for handling compressed (binary PLINK) genotype data
- MendelImpute.jl: A package for genotype imputation, phasing, and (global/local) ancestry inference utilizing a reference haplotype panel. It is significantly faster than existing methods but slightly less accurate
- Thyrosim.jl: An updated version of THYROSIM, Thyrosim.jl produces individualized thyroid hormone predictions (TT4/TT3/TSH) based on a rather complicated ODE model
- VCFTools.jl: Julia utilities for handling VCF (Variant Call Format) files
- fastPHASE.jl: Julia wrapper for the famous fastPHASE genetics software. The primary use case is to allow the original program to run on binary PLINK data.
I am a contributor to (at least 5 commits):
- bge_analysis: Python, R, and Julia code for imputation and quality control scripts used for the blended genome-exome (BGE) data.
- QuasiCopula.jl: Implements a new class of distributions (Quasi-Copulas) that efficiently captures correlation among non-Gaussian random variables
- SnpArrays.jl: Julia package for handling binary PLINK formatted data. It has the fastest (matrix)-(vector) multiplication routine for compressed PLINK files as far as I know.
- MendelKinship.jl: Calculates various empirical and theoretical kinship coefficients, based on pedigree or genotype data.
I enjoy sharing my knowledge with others, so here are a few tutorials I made:
- Interfacing Julia with R/Python/C++ (as of early 2024)
- A general introduction to Julia (2022 version)
- A tutorial for multithreading and parallel computation in Julia.
- Some notes on random graph theory, presented in Biomath 203 at UCLA (2020)
- A tutorial on imputation and phasing using MendelImpute.jl, presented at the 2020 ASHG meeting (see homepage).
- A tutorial on iterative hard thresholding (IHT), presented at the 2020 Lange Symposium.