
Calculate the profile of ChIP peaks binding to specific TSS regions

Astrid Deschenes edited this page Aug 13, 2014 · 39 revisions

Analyse the profile of ChIP peaks binding to TSS regions

1. One sample example: CREB transcription factor

We want to analyse the profile of ChIP peak binding sites of the CREB (cAMP response element-binding protein) transcription factor for all known TSS regions in mouse. Our protocol includes one CREB-enriched sample and a control sample without enrichment; only the CREB-enriched sample (one BAM file) is used in this example.

[Figures: CREB all TSS graph; CREB model from Wikipedia]

The trimming and alignment steps have already been run, so the aligned BAM file is available.

In R:

  1. First, we load the metagene package:

    library(metagene)
    
  2. We then create a vector containing the alignment file (1 BAM file) used in the analysis:

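    # Path to the aligned CREB BAM file (the log below shows /home/CREB.bam);
    # system.file("CREB.bam") would search the base package and return ""
    bamFileCREB <- "/home/CREB.bam"
    bamFile <- c(bamFileCREB)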
    
  3. All TSS features have to be retrieved (this step might take a while) and associated with the read density from the alignment file. The distance around each TSS to include in the plot is set to 10,000 bp with the maxDistance parameter.

    groupsFeatures <- parseFeatures(bamFiles=bamFile, features,  
                                specie="mouse", maxDistance=10000)
    

    The progress of each step of the function is printed while processing:

    Step 1: Prepare bam files... Done!
    Step 2: Prepare regions... Done!
    Step 3: Parse bam files...
    [1] "allTSS"
    [1] "Current bam: /home/CREB.bam"
    Step 3: Parse bam files... Done!
    Step 4: Merge matrix... Done!
    
  4. To generate a plot, we first have to create a list containing the elements we wish to plot. The groupsFeatures object holds the names of the elements that can be used. Since we only have one ChIP sample for the CREB transcription factor, groupsFeatures contains only one element, "allTSS". We create a group named CREB that contains the element "allTSS"; the name of the group will be used as the plot title. Finally, this group has to be embedded in a generic list, as this is the format expected by the plotMatrices function.

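    # List the elements available for plotting
    names(groupsFeatures$matrix)
    ## [1] "allTSS"
    # Group "allTSS" under the name CREB (used as the plot title)
    groupToPlot <- list(CREB = c("allTSS"))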
    
  5. The plotMatrices function is used to generate the plot. The list containing the elements to plot is passed to the matricesGroups parameter, while the groupsFeatures object created earlier is passed to the data parameter. The binSize parameter sets the number of nucleotides included in each bin for the bootstrap step. The bootstrap step uses the data from all TSS regions to generate a confidence interval around the final profile; by default, a 95% confidence interval is used. The smaller the binSize, the more refined the final plot and the more time-consuming the bootstrap step.

    DF<-plotMatrices(matricesGroups = groupToPlot, 
                  data = groupsFeatures, binSize = 50)
    

    The generated graph shows the profile of ChIP peak binding sites of the CREB (cAMP response element-binding protein) transcription factor, with a 95% confidence interval, for all known TSS regions in mouse.

[Figure: CREB all TSS graph]
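The binning and bootstrap described in step 5 can be sketched in base R to show what binSize and the confidence interval mean. This is not metagene's implementation: the coverage matrix is simulated toy data, the helpers binMatrix() and bootstrapProfile() are hypothetical names, and the percentile bootstrap over TSS regions is assumed here as one common way to build such an interval.

```r
# Minimal sketch, NOT metagene's internal code: bin a coverage matrix and
# compute a percentile-bootstrap 95% confidence interval around the mean profile.
# Rows = TSS regions, columns = positions around the TSS (toy data).

set.seed(42)

# Hypothetical simulated read counts: 200 regions x 2000 positions
nRegions <- 200
nPositions <- 2000
coverage <- matrix(rpois(nRegions * nPositions, lambda = 5),
                   nrow = nRegions, ncol = nPositions)

# Average the coverage inside consecutive bins of binSize positions
binMatrix <- function(m, binSize) {
  binId <- ceiling(seq_len(ncol(m)) / binSize)
  # rowsum() sums rows by group, so transpose to sum columns per bin
  sums <- t(rowsum(t(m), group = binId))
  counts <- as.vector(table(binId))
  # Divide each bin's column of sums by the number of positions in that bin
  sweep(sums, 2, counts, "/")
}

# Resample regions with replacement; take percentile bounds of the bin means
bootstrapProfile <- function(binned, nBoot = 1000, alpha = 0.05) {
  means <- replicate(nBoot, {
    idx <- sample.int(nrow(binned), replace = TRUE)
    colMeans(binned[idx, , drop = FALSE])
  })
  data.frame(
    bin   = seq_len(ncol(binned)),
    mean  = colMeans(binned),
    lower = apply(means, 1, quantile, probs = alpha / 2),
    upper = apply(means, 1, quantile, probs = 1 - alpha / 2)
  )
}

binned <- binMatrix(coverage, binSize = 50)
profile <- bootstrapProfile(binned)
head(profile)
```

Plotting profile$mean with profile$lower and profile$upper as a band gives a graph of the same kind as the CREB graph. A smaller binSize yields more bins, so each bootstrap replicate has more columns to average, which is why the bootstrap step gets slower.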