Extract regions sharing features in more than one experiment — findConsensusPeakRegions • consensusSeekeR

Find regions sharing the same features for a minimum number of experiments using called peaks of signal enrichment based on pooled, normalized data (mainly coming from narrowPeak files). The peaks and narrow peaks are used to identify the consensus regions. The minimum number of experiments that must have at least on peak in a region so that it is retained as a consensus region is specified by user, as well as the size of mining regions. Only the chromosomes specified by the user are treated. The function can be parallized by specifying a number of threads superior to 1.

When the padding is small, the detected regions are smaller than the one that could be obtained by doing an overlap of the narrow regions. Even more, the parameter specifying the minimum number of experiments needed to retain a region add versatility to the function.

Beware that the side of the padding can have a large effect on the detected consensus regions. It is recommanded to test more than one size and to do some manual validation of the resulting consensus regions before selecting the final padding size.

findConsensusPeakRegions(
  narrowPeaks,
  peaks,
  chrInfo,
  extendingSize = 250,
  expandToFitPeakRegion = FALSE,
  shrinkToFitPeakRegion = FALSE,
  minNbrExp = 1L,
  nbrThreads = 1L
)

Arguments

narrowPeaks: a GRanges containing called peak regions of signal enrichment based on pooled, normalized data for all analyzed experiments. All GRanges entries must have a metadata field called "name" which identifies the region to the called peak. All GRanges entries must also have a row name which identifies the experiment of origin. Each peaks entry must have an associated narrowPeaks entry. A GRanges entry is associated to a narrowPeaks entry by having a identical metadata "name" field and a identical row name.
peaks: a GRanges containing called peaks of signal enrichment based on pooled, normalized data for all analyzed experiments. All GRanges entries must have a metadata field called "name" which identifies the called peak. All GRanges entries must have a row name which identifies the experiment of origin. Each peaks entry must have an associated narrowPeaks entry. A GRanges entry is associated to a narrowPeaks entry by having a identical metadata "name" field and a identical row name.
chrInfo: a Seqinfo containing the name and the length of the chromosomes to analyze. Only the chomosomes contained in this Seqinfo will be analyzed.
extendingSize: a numeric value indicating the size of padding on both sides of the position of the peaks median to create the consensus region. The minimum size of the consensus region is equal to twice the value of the extendingSize parameter. The size of the extendingSize must be a positive integer. Default = 250.
expandToFitPeakRegion: a logical indicating if the region size, which is set by the extendingSize parameter is extended to include the entire narrow peak regions of all peaks included in the unextended consensus region. The narrow peak regions of the peaks added because of the extension are not considered for the extension. Default: FALSE.
shrinkToFitPeakRegion: a logical indicating if the region size, which is set by the extendingSize parameter is shrinked to fit the narrow peak regions of the peaks when all those regions are smaller than the consensus region. Default: FALSE.
minNbrExp: a positive numeric or a positive integer indicating the minimum number of experiments in which at least one peak must be present for a potential consensus region. The numeric must be a positive integer inferior or equal to the number of experiments present in the narrowPeaks and peaks parameters. Default = 1.
nbrThreads: a numeric or a integer indicating the number of threads to use in parallel. The nbrThreads must be a positive integer. Default = 1.

Value

an list of class "consensusRanges" containing :

call the matched call.
consensusRanges a GRanges containing the consensus regions.

Author

Astrid Deschênes

Examples


## Loading datasets
data(A549_CTCF_MYN_NarrowPeaks_partial)
data(A549_CTCF_MYN_Peaks_partial)
data(A549_CTCF_MYJ_NarrowPeaks_partial)
data(A549_CTCF_MYJ_Peaks_partial)

## Assigning experiment name "CTCF_MYJ" to first experiment
names(A549_CTCF_MYJ_NarrowPeaks_partial) <- rep("CTCF_MYJ",
    length(A549_CTCF_MYJ_NarrowPeaks_partial))
names(A549_CTCF_MYJ_Peaks_partial) <- rep("CTCF_MYJ",
    length(A549_CTCF_MYJ_Peaks_partial))

## Assigning experiment name "CTCF_MYN" to second experiment
names(A549_CTCF_MYN_NarrowPeaks_partial) <- rep("CTCF_MYN",
    length(A549_CTCF_MYN_NarrowPeaks_partial))
names(A549_CTCF_MYN_Peaks_partial) <- rep("CTCF_MYN",
    length(A549_CTCF_MYN_Peaks_partial))

## Only choromsome 1 is going to be analysed
chrList <- Seqinfo("chr1", 249250621, NA)

## Find consensus regions with both experiments
results <- findConsensusPeakRegions(
    narrowPeaks = c(A549_CTCF_MYJ_NarrowPeaks_partial,
        A549_CTCF_MYN_NarrowPeaks_partial),
    peaks = c(A549_CTCF_MYJ_Peaks_partial,
        A549_CTCF_MYN_Peaks_partial),
    chrInfo = chrList,
    extendingSize = 300,
    expandToFitPeakRegion = TRUE,
    shrinkToFitPeakRegion = FALSE,
    minNbrExp = 2,
    nbrThreads = 1)

## Print 2 first consensus regions
head(results$consensusRanges, 2)
#> GRanges object with 2 ranges and 0 metadata columns:
#>       seqnames              ranges strand
#>          <Rle>           <IRanges>  <Rle>
#>   [1]     chr1 246022229-246022829      *
#>   [2]     chr1 246051998-246052598      *
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths