diff --git a/2024_edition/01_sherlockConnection.md b/2024_edition/01_sherlockConnection.md
new file mode 100644
index 0000000..891b72e
--- /dev/null
+++ b/2024_edition/01_sherlockConnection.md
@@ -0,0 +1,154 @@
# Connection to Sherlock and familiarization with the Linux shell

We will do all of our bioinformatics analysis on the Stanford High Performance Computing cluster, Sherlock (www.sherlock.stanford.edu). In this part, we will illustrate the basic workflow for connecting to the cluster and getting everything ready for a computational project. The steps are:
- ssh into the cluster
- set up our working directory
- create a persistent session on the cluster to come back to
- request computational resources
- load the necessary packages


At the end of this notebook you will know how to:
- navigate the Linux filesystem with `ls`, `cd`, `mkdir`, `pwd`
- set up a nicely organized folder for the project
- use a GNU Screen session
- use SLURM to access computational resources


## Connection to Sherlock
In your browser, go to https://login.sherlock.stanford.edu, then click on `>Shell`.
Alternatively, you can ssh into Sherlock using a terminal app on your computer. On a Mac, you can use the native Terminal app. Open Terminal (Command-Space, then type "terminal"), then run
```
ssh <sunetid>@login.sherlock.stanford.edu
```


## Setting up our workspace for the project
Go to the group minicourse directory
```
cd $GROUP_SCRATCH/biochem_minicourse_2024
```
Note that `$GROUP_SCRATCH` is a bash variable, which contains the path to a default "scratch" folder for our lab. You can see the content of this variable with
```
echo $GROUP_SCRATCH
```

Make a new folder for yourself, and move to that folder. For example, for team Straight:
```
mkdir straightlab
cd straightlab
```

This folder is currently empty. You can list the content of the current folder with
```
ls
```

You can always see your current location in the filesystem with
```
pwd
```

Now let's create a few subfolders to organize our work. We want our project directory (the team directory in this case) to look like this
```text
.
├── data
│   ├── samples
│   └── external
└── tmp
```

- The `tmp` folder is a scratch folder which we will use to hold temporary files (some programs create temporary files and require a dedicated folder for these files).
- The `data` folder will contain the data and analyses for specific datasets, arranged into subfolders.

Make these directories with
```bash
mkdir data

mkdir data/samples data/external

mkdir tmp
```
Verify the tree structure with
```bash
tree .
```

## Running persistent / recoverable sessions with GNU Screen

What happens if I am in the middle of some task on Sherlock and I lose my internet connection or close my computer? To avoid having to go back to square one, we need to set up a persistent bash session. The standard way to do this is using a window manager such as GNU Screen.

```bash
screen -S genomics
```

This creates a new session called `genomics`. Let's demonstrate what it does by putting a mark in our terminal, leaving, and then coming back.

```bash
# leave a mark
echo "I am here"
# keep track of the node number in the prompt (for example sh01-ln03);
# the screen session will only be accessible from this login node

# close your terminal, then log back into Sherlock
ssh sunetid@login.sherlock.stanford.edu
```
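As an aside, a few standard GNU Screen commands are handy at this point. These are stock Screen features (not specific to Sherlock), shown here as a quick reference:

```bash
# list your existing screen sessions on this login node
screen -ls

# detach from the current session without killing it: press Ctrl-a, then d

# reattach to a detached session by name
screen -r genomics

# kill a session you no longer need
screen -X -S genomics quit
```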
Now reconnect to your session:

```bash
# if you're not assigned the same login node as before, connect to the
# original one (if it's the same, skip this step)
ssh sh01-ln03

# get back to your persistent session
screen -x
```

## Getting computational resources on Sherlock

When we ssh into Sherlock, we automatically obtain a shell on a "login node". So far, we've run all of our commands on a login node. You can see that from the look of the prompt `[username@sh01-ln03 login ~]`. Login nodes are not meant for any heavy computation. You can think of them as entry points into Sherlock, and places to do very simple operations, like browsing or creating directories, viewing files, etc.

For anything more computationally heavy than that, you need to request dedicated resources (in fact, if you run a command on a login node that takes too long to complete or requests too much memory, it will automatically get aborted). The way to ask for resources on Sherlock is through the resource manager and job scheduler Slurm (https://slurm.schedmd.com/documentation.html). There are several ways to ask for resources with Slurm, depending on whether you want to run an interactive job (where you keep entering commands) or want to submit a job that will run without your input when resources become available. This is explained in "Running jobs on Sherlock" (linked under Additional resources below).

For this bootcamp, we are going to request 2 CPUs each for 3 hours:
```
salloc -p astraigh --time=03:00:00 --cpus-per-task=2
```

You should quickly get a message that looks like `salloc: Pending job allocation <jobid>`. Your command line will then read `[username@sh02-09n13 /scratch/groups/astraigh/biochem_minicourse_2024/]$`. This shell is running on a dedicated compute node (here sh02-09n13). Within this shell, you'll have access to 2 CPUs and 32 GB of RAM.

This compute node is part of the partition `astraigh` (specified by the `-p astraigh` flag), which is reserved for our lab. There are other partitions you can use; refer to the Sherlock documentation.

## Loading packages for our subsequent analysis

We preinstalled some bioinformatics tools we're going to use during this bootcamp inside a conda environment. Conda is a package manager that helps keep your software packages in order, and an environment is an isolated set of software packages that are required for a task or computational pipeline.
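For context, here is how an environment like ours could be created from the bioconda channel. This is only an illustrative sketch (the package list is an assumption); you do not need to run it, since the `minicourse` environment is already set up:

```bash
# hypothetical example: build an environment with the tools used in this bootcamp
conda create -n minicourse -c bioconda -c conda-forge fastqc sra-tools

# environments are then activated by name
conda activate minicourse
```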
First add the conda installation to your PATH in your bash profile with these two commands (note the single quotes in the first command, so that `$PATH` is written literally to `~/.bashrc` rather than expanded on the spot):

```
echo 'export PATH=/home/groups/astraigh/miniconda3_py38_4.9.2/bin:$PATH' >> ~/.bashrc
export PATH=/home/groups/astraigh/miniconda3_py38_4.9.2/bin:$PATH
```

Load this environment with
```
conda activate minicourse
# or you can try:
source activate minicourse
```
(alternatively, `conda activate /home/groups/astraigh/miniconda3_py38_4.9.2/envs/minicourse`)

## Managing your SLURM jobs

You can see the jobs you currently have running on the cluster with
```
squeue -u $USER
```

You can cancel a running job with
```
scancel <jobid>
```
where the job ID is obtained from `squeue`.

## Additional resources

- Running jobs on Sherlock: https://www.sherlock.stanford.edu/docs/user-guide/running-jobs/
- Advanced connection options: https://www.sherlock.stanford.edu/docs/advanced-topics/connection/
- GNU Screen: https://www.howtoforge.com/linux_screen

diff --git a/2024_edition/02_externalData_downloadAndBasicsQC.md b/2024_edition/02_externalData_downloadAndBasicsQC.md
new file mode 100644
index 0000000..7aad034
--- /dev/null
+++ b/2024_edition/02_externalData_downloadAndBasicsQC.md
@@ -0,0 +1,145 @@
# Intro to long read data

## Goals
In this part, we're going to download some sequencing data from the Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/), a public genomics data repository. To familiarize ourselves with accessing, downloading, and analyzing data from other groups, we will use a dataset derived from mouse sperm cells: https://www.ncbi.nlm.nih.gov/sra?term=SRX9822381


Tools and concepts covered in this notebook:
- fastq files
- use basic commands to view files: `cat`, `head` and `less`
- use `wc` to count lines and very simple `awk` scripts to perform line-by-line operations on a file
- use `sort` and `uniq` to create histograms
- chain linux commands with `|` and redirect output to a file with `>`
- run a command line program such as `fastqc` to process the data
- download data from Sherlock to your local machine with `rsync`

## GEO data download
Sequencing data linked from GEO is stored in the Sequence Read Archive (SRA) and can be downloaded using the `fasterq-dump` tool. We'll download the data into the subfolder `external` within the `data` folder we created earlier. Always keep things organized to avoid headaches down the road!

```bash
# recall your folder name & set an environment variable 'me' so you can refer
# to this folder later (replace <teamname> with your folder name)
export me="$GROUP_SCRATCH/biochem_minicourse_2024/<teamname>"

# move to the folder for this dataset
cd $me/data/external

# download the data (note that we use the tmp folder created earlier for the
# temporary files generated by this program)
fasterq-dump SRR13403380 --progress --threads 2 --temp $me/tmp --outdir $me/data/external
```

Note: the download takes a bit of time, so we've already downloaded the file into our directory. Use the `cp` command to copy the subset from our directory to yours (this is the first 100000 lines from the full fastq file):

```bash
cp $GROUP_SCRATCH/biochem_minicourse_2024/straightlab/data/external/SRR13403380_subset.fastq ./
```

Let's do some quick inspection of the data we downloaded.
```
ls -lh
```
We can see that the previous command downloaded a file called `SRR13403380.fastq`, which is 897 MB in size. We can also see that the subset file we copied from the straightlab directory, `SRR13403380_subset.fastq`, is 16 MB.
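For reference, the subset file was described above as simply the first 100000 lines of the full file. A subset like that can be produced with `head`, as in this sketch (using the file names above):

```bash
# how a subset like SRR13403380_subset.fastq can be made from the full file
head -n 100000 SRR13403380.fastq > SRR13403380_subset.fastq
```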
This is a fastq-formatted file, which is the standard format for sequencing data. We can figure out how the file is organized by looking at the first 10 lines.

```
head SRR13403380_subset.fastq
```

This is of the form
```text
<@READID (first read) and some other read info>
<SEQUENCE>
<+ repeat of the READID line>
<QUALITY SCORES>
<@READID (next read) and some other read info>
etc..
```
So each read takes up 4 lines, with the sequence for read 1 at line 2, the sequence for read 2 at line 4+2=6, etc.

## Basic QC
### The quick manual way (optional)
A lot of basic analysis of sequencing data can be done without relying on external software. It is extremely useful to know a few built-in linux commands that come in handy in many contexts, especially for parsing raw sequencing data. Four commands that are used often are `wc`, `sort`, `awk` and `uniq` (if you want to learn more commands, you should also look at `cut`, `join` and `paste`). Let's demonstrate a real-world application of these commands by doing the following QC analysis:

- how many reads are present in our dataset?
- what is the read length distribution?

We can see how many reads are in this file by counting the number of lines and dividing by 4

```
wc -l SRR13403380_subset.fastq
# 100000 SRR13403380_subset.fastq
# 100000/4 = 25000 reads

# we can do the same for the full file
wc -l /scratch/groups/astraigh/biochem_minicourse_2024/straightlab/data/external/SRR13403380.fastq
```

What is the length of each read? We need to count the number of characters on each sequence line. First let's extract the sequences from the file. `awk` is a great unix program for manipulating text files line by line. Here we just keep lines whose number modulo 4 equals 2 (line numbers that leave a remainder of 2 when divided by 4). Let's extract the first 3 sequences (3 reads times 4 lines = `head -n12`). The pipe operator `|` allows us to chain commands. The parenthesized expression in `awk` serves as an if statement: the action runs only on lines where it is true. `$0` is a variable containing the whole line.

```
head -n12 SRR13403380_subset.fastq | awk '(NR%4==2){print $0}'
```

Now, to get the length of these first three sequences, we just print the length of each line instead of the line itself

```
head -n12 SRR13403380_subset.fastq | awk '(NR%4==2){print length($0)}'
```

Finally, we want to build a histogram of how many times each read length is represented. For that, we need two useful commands: `sort` and `uniq`. Sort will just sort the lines, and uniq (which requires sorted input) will count (with flag `-c`) how many times each unique line occurs. We're ready to do that for the whole file rather than the first three sequences, so let's replace `head` with `cat`, which just reads through the whole file

```
cat SRR13403380_subset.fastq | awk '(NR%4==2){print length($0)}' | sort | uniq -c
```

We can redirect the output to a file with the `>` operator. Let's call this file `readlength.hist.txt`. One last command we may want to add to the chain is another `sort`, so that the histogram is sorted with the mode of the distribution first (the length that shows up most often). This is achieved with

```bash
# create histogram and save it into a file
cat SRR13403380_subset.fastq | awk '(NR%4==2){print length($0)}' | sort | uniq -c | sort -k1,1nr > readlength.hist.txt
```
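As a bonus, the same `awk` filtering idea can produce summary statistics in a single pass, by accumulating values across lines and printing them in an `END` block (a sketch; `n` and `total` are just variable names chosen here):

```bash
# count the reads and compute the mean read length in one pass
cat SRR13403380_subset.fastq | awk '(NR%4==2){n++; total+=length($0)} END{print n" reads, mean length "total/n}'
```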
We can look at this file in a scrollable way with `less`.
```
less readlength.hist.txt
```

### QC with a dedicated tool, fastqc, and downloading data
The `fastqc` tool (preinstalled in the `minicourse` conda environment) can be used to get some other QC metrics, in particular about sequencing quality and overrepresented sequences.

```
fastqc SRR13403380_subset.fastq -t 2
```

This produces an HTML file, which we need to download to our computer to look at. File transfer operations to and from Sherlock are best done using `rsync`.

Open a new terminal tab on your computer, and then run
```
# replace <sunetid> and <teamname> with your SUNet ID and folder name, and replace ~/Downloads if necessary
rsync -ah --progress <sunetid>@dtn.sherlock.stanford.edu:/scratch/groups/astraigh/biochem_minicourse_2024/<teamname>/data/external/SRR13403380_subset_fastqc.html ~/Downloads

```
The command has the form `rsync <options> <source> <destination>`.
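Transfers work in both directions: to upload from your laptop to Sherlock, just swap the source and destination. A hypothetical example (the local file name here is made up):

```bash
# upload a local file to your Sherlock project folder
rsync -ah --progress ~/Downloads/notes.txt <sunetid>@dtn.sherlock.stanford.edu:/scratch/groups/astraigh/biochem_minicourse_2024/<teamname>/
```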