
Notes week1 (v49)

Xueqing edited this page Dec 5, 2017 · 4 revisions

Nov 28th, Tue, First Meeting Summary

Project aim: Understanding transcriptional changes in HCC progression.

Initial timeline setting:

  • Week 49 (Nov 27th - Dec 3rd): Preparation (processing) of the data
      • Identifying replicates
      • Combining transcripts into genes
  • Week 50 (Dec 4th - Dec 10th): Analysis of the data
      • Differential expression analysis
      • Exploring the result (optional)
      • Generating co-expression network
  • Week 51 (Dec 11th - Dec 17th): Validation of the results
      • Previous studies
      • Human Protein Atlas
      • Co-expression network
  • Week 2 (Jan 8th - Jan 10th): Concluding the work
      • Summarizing results
      • Making the poster

Additional notes:

  • The replicates are already indicated in the "metadata_patient_code.txt" file, in which the condition codes are: H → Healthy, S → Steatosis, N → NASH, C → HCC. Samples with the same NASH_IDENTIFIER are technical replicates sequenced at different depths, so we will need to sum their data. The last three samples (U001, X001, X002) are of poor quality and should be excluded.
  • We need to combine the transcripts into identified genes. Example mapping files can be found in the Ensembl human GTF file.
  • Differential expression analysis can be done quickly with DESeq. After that we can compare the results with previous studies, e.g. the TCSBN network database. Depending on the quality of the results we get, we can make PCA plots or build a network based on our own findings, and analyze the results in our own way (e.g. if we choose the top 100 genes in each of the four conditions, will we see differences in the gene categories and/or co-expression stages?).
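The transcript-to-gene step mentioned above could be sketched roughly as follows. This is only a minimal illustration, assuming that the GTF attribute column carries `gene_id` and `transcript_id` entries (as Ensembl GTF files do) and that "combining" means summing transcript-level counts per gene, which is our assumption. The function names and the toy lines are made up for the example.

```python
import re
from collections import defaultdict

# A GTF attribute column looks like: gene_id "ENSG..."; transcript_id "ENST...";
_ATTR = re.compile(r'(\S+) "([^"]*)"')


def transcript_to_gene(gtf_lines):
    """Build {transcript_id: gene_id} from the lines of a GTF file."""
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header/comment lines
        fields = line.rstrip("\n").split("\t")
        # Only "transcript" feature rows are needed for the mapping.
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        attrs = dict(_ATTR.findall(fields[8]))
        if "transcript_id" in attrs and "gene_id" in attrs:
            mapping[attrs["transcript_id"]] = attrs["gene_id"]
    return mapping


def collapse_to_genes(transcript_counts, mapping):
    """Sum transcript-level counts per gene (assumption: summing is wanted).

    Transcripts missing from the mapping are skipped.
    """
    gene_counts = defaultdict(int)
    for transcript_id, value in transcript_counts.items():
        gene_id = mapping.get(transcript_id)
        if gene_id is not None:
            gene_counts[gene_id] += value
    return dict(gene_counts)


# Made-up GTF lines and counts, for illustration only.
gtf = [
    '1\tsrc\ttranscript\t1\t100\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    '1\tsrc\ttranscript\t50\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T2";',
    '2\tsrc\ttranscript\t1\t80\t.\t-\t.\tgene_id "G2"; transcript_id "T3";',
]
mapping = transcript_to_gene(gtf)
gene_counts = collapse_to_genes({"T1": 4, "T2": 6, "T3": 3, "T9": 1}, mapping)
```

Here `T9` has no entry in the toy GTF and is silently dropped; in the real data it may be worth logging such transcripts instead.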

Nov 29th: The columns in the data files are separated by a space.
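A small sketch of reading such a space-separated file, using only the standard library. The helper name and the example content are hypothetical; the real files may have a header line or more than two columns.

```python
import csv
import io


def read_space_separated(handle):
    """Return the rows of a space-delimited text file as lists of strings."""
    reader = csv.reader(handle, delimiter=" ")
    return [row for row in reader if row]  # skip blank lines


# Made-up content standing in for an opened data file.
example = io.StringIO("ENST0001 12\nENST0002 0\n")
rows = read_space_separated(example)
```

With a single-space delimiter, runs of several spaces would produce empty fields, so this only fits files that use exactly one space between columns.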

Nov 30th: Second Group Meeting Notes:

  • The transcripts have been preliminarily mapped to Ensembl reference genes.
  • Use (exonic) gene length to calculate TPM, which is directly comparable between samples. Genes with TPM < 1 are considered not expressed.
  • The first filter should be based on the protein-coding genes (refer to the HPA No.6).
  • Then do differential expression analysis. (Sequencing platform: Illumina HiSeq)
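The TPM calculation and the TPM < 1 cut-off from the notes above could look roughly like this. It is a sketch using the standard TPM definition (counts divided by length in kilobases, then scaled so that each sample sums to one million); the function name and the toy numbers are made up.

```python
def tpm(counts, lengths_bp):
    """Convert raw counts to TPM for one sample.

    counts:     {gene_id: raw_count}
    lengths_bp: {gene_id: exonic gene length in base pairs}
    """
    # Reads per kilobase of exonic length.
    rate = {g: counts[g] / (lengths_bp[g] / 1000) for g in counts}
    total = sum(rate.values())
    if total == 0:
        return {g: 0.0 for g in counts}
    # Scale so that the values of one sample sum to one million.
    return {g: r / total * 1e6 for g, r in rate.items()}


# Toy numbers, for illustration only.
counts = {"geneA": 100, "geneB": 300, "geneC": 0}
lengths_bp = {"geneA": 1000, "geneB": 3000, "geneC": 2000}
values = tpm(counts, lengths_bp)

# Genes that pass the TPM cut-off of 1 mentioned in the meeting notes.
passing = {g for g, v in values.items() if v >= 1}
```

In the toy data `geneA` and `geneB` have the same reads per kilobase, so they end up with the same TPM even though their raw counts differ.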

To do list:

By end of this week:

  • Combine metadata and raw counts → patient code vs gene codes.
  • Look up whether to sum up or take the median/mean for the samples that were sequenced multiple times (deeper coverage).
  • Get information regarding the workflow so far, up until we get the raw counts.
  • Upload a reference regarding the TPM value cut-off.
  • The details of how sequencing and mapping were done need to be further understood.
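One way the first two to-do items might fit together is sketched below. It assumes the metadata can be read into a `{sample_id: patient_code}` dictionary and that repeated samples of the same patient are summed, as suggested in the first meeting; whether sum, mean or median is the right choice is still the open question on the list. The toy sample IDs and patient codes are made up, apart from the three poor-quality samples named in the notes.

```python
from collections import defaultdict

# Samples flagged as poor quality in the first meeting notes.
POOR_QUALITY = {"U001", "X001", "X002"}


def combine(counts, sample_to_patient):
    """Return {patient_code: {gene_id: summed_count}}.

    counts:            {sample_id: {gene_id: raw_count}}
    sample_to_patient: {sample_id: patient_code}, taken from the metadata
    """
    merged = defaultdict(lambda: defaultdict(int))
    for sample_id, gene_counts in counts.items():
        if sample_id in POOR_QUALITY:
            continue  # excluded before any summing
        patient = sample_to_patient[sample_id]
        for gene_id, value in gene_counts.items():
            merged[patient][gene_id] += value
    return {patient: dict(genes) for patient, genes in merged.items()}


# Toy data, for illustration only.
counts = {
    "A001": {"geneA": 10, "geneB": 0},
    "A002": {"geneA": 5, "geneB": 2},  # deeper re-sequencing of the same patient
    "B001": {"geneA": 7, "geneB": 1},
    "U001": {"geneA": 99, "geneB": 99},  # poor quality, dropped
}
sample_to_patient = {"A001": "H1", "A002": "H1", "B001": "C1", "U001": "N1"}
merged = combine(counts, sample_to_patient)
```

Summing keeps the values as integer counts, which count-based tools such as DESeq expect; a mean or median would not.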

Questions in this week:

  • 1128-1: What do we know about the initial samples? Do we have any information about which (or how many) are biological/technical replicates, or shall we figure them out all by ourselves?
  • 1128-2: What does "cleaning up data" mean in detail?
  • 1129-1: For files like "raw_count_data.txt" and "metadata.txt", what does each line look like (as a string)? Are the two columns separated by a space or something else?
