R/findConsensusPeakRegions.R
findConsensusPeakRegions.Rd
Find regions sharing the same features for a minimum number of experiments, using called peaks of signal enrichment based on pooled, normalized data (mainly coming from narrowPeak files). The peaks and narrow peaks are used to identify the consensus regions. The user specifies the minimum number of experiments that must have at least one peak in a region for that region to be retained as a consensus region, as well as the size of the mining regions. Only the chromosomes specified by the user are processed. The function can be parallelized by setting the number of threads to a value greater than 1.
When the padding is small, the detected regions are smaller than those that could be obtained by overlapping the narrow regions. Moreover, the parameter specifying the minimum number of experiments needed to retain a region adds versatility to the function.
Beware that the size of the padding can have a large effect on the detected consensus regions. It is recommended to test more than one size and to manually validate some of the resulting consensus regions before selecting the final padding size.
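The core idea (pad both sides of the median position of a group of peaks, then keep the resulting region only if enough distinct experiments contribute a peak) can be illustrated with a minimal base-R sketch on plain numeric positions. This is a simplified illustration, not the algorithm implemented by findConsensusPeakRegions(): the function simple_consensus(), the left-to-right seeding and the candidate window of width 2 * extendingSize are hypothetical choices made only for this example.

```r
# Hypothetical, simplified illustration of median-plus-padding consensus
# regions on a single chromosome (NOT the consensusSeekeR implementation).
# pos: peak positions; experiment: experiment of origin of each peak.
simple_consensus <- function(pos, experiment, extendingSize = 250,
                             minNbrExp = 1) {
  ord <- order(pos)
  pos <- pos[ord]
  experiment <- experiment[ord]
  used <- logical(length(pos))  # peaks already assigned to a region
  regions <- data.frame(start = numeric(0), end = numeric(0),
                        nbrExp = integer(0))
  for (i in seq_along(pos)) {
    if (used[i]) next
    # Assumed seeding rule: unused peaks in a window of width
    # 2 * extendingSize starting at the leftmost unused peak.
    cand <- !used & pos >= pos[i] & pos <= pos[i] + 2 * extendingSize
    # Centre the region on the median position of the candidate peaks.
    center <- round(median(pos[cand]))
    # Keep the candidates lying within +/- extendingSize of the median.
    inRegion <- cand & abs(pos - center) <= extendingSize
    used[inRegion] <- TRUE
    used[i] <- TRUE  # always consume the seed so the loop progresses
    # Count distinct experiments, not peaks.
    nExp <- length(unique(experiment[inRegion]))
    if (nExp >= minNbrExp) {
      regions <- rbind(regions,
                       data.frame(start = center - extendingSize,
                                  end = center + extendingSize,
                                  nbrExp = nExp))
    }
  }
  regions
}

pos <- c(1000, 1100, 1150, 5000, 5050, 9000)
experiments <- c("A", "B", "A", "A", "B", "A")
simple_consensus(pos, experiments, extendingSize = 100, minNbrExp = 2)
```

With these made-up peaks, the call returns two regions, 1000-1200 and 4925-5125; the isolated peak at 9000 comes from a single experiment and is dropped. Re-running the sketch with several extendingSize values is a cheap way to see how sensitive the regions are to the padding.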
findConsensusPeakRegions(
narrowPeaks,
peaks,
chrInfo,
extendingSize = 250,
expandToFitPeakRegion = FALSE,
shrinkToFitPeakRegion = FALSE,
minNbrExp = 1L,
nbrThreads = 1L
)
narrowPeaks
a GRanges containing called peak regions of signal enrichment based on pooled, normalized data for all analyzed experiments. All GRanges entries must have a metadata field called "name" which links the region to its called peak. All GRanges entries must also have a row name which identifies the experiment of origin. Each peaks entry must have an associated narrowPeaks entry. A peaks entry is associated with a narrowPeaks entry by having an identical metadata "name" field and an identical row name.
peaks
a GRanges containing called peaks of signal enrichment based on pooled, normalized data for all analyzed experiments. All GRanges entries must have a metadata field called "name" which identifies the called peak. All GRanges entries must also have a row name which identifies the experiment of origin. Each peaks entry must have an associated narrowPeaks entry. A peaks entry is associated with a narrowPeaks entry by having an identical metadata "name" field and an identical row name.
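The association rule for narrowPeaks and peaks amounts to matching entries on the pair (experiment, name). The base-R sketch below checks that rule on plain data frames; the helper has_narrow_peak() and the "experiment" column are hypothetical stand-ins. With real GRanges objects the two keys would presumably come from names() (used in the example to set the row names) and the "name" metadata column.

```r
# Hypothetical check of the peaks/narrowPeaks association rule.
# Each data frame stands in for a GRanges: the "experiment" column plays the
# role of the row names and the "name" column the "name" metadata field.
has_narrow_peak <- function(peaks, narrowPeaks) {
  # An entry is identified by the pair (experiment, name).
  peakKey <- paste(peaks$experiment, peaks$name, sep = "\t")
  narrowKey <- paste(narrowPeaks$experiment, narrowPeaks$name, sep = "\t")
  peakKey %in% narrowKey
}

peaks <- data.frame(experiment = c("CTCF_MYJ", "CTCF_MYJ", "CTCF_MYN"),
                    name = c("peak1", "peak2", "peak1"))
narrowPeaks <- data.frame(experiment = c("CTCF_MYJ", "CTCF_MYN", "CTCF_MYN"),
                          name = c("peak1", "peak1", "peak2"))
has_narrow_peak(peaks, narrowPeaks)
```

Here the second peak is reported as unmatched: a narrow peak called "peak2" exists, but it belongs to CTCF_MYN, so an identical name alone is not enough.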
chrInfo
a Seqinfo containing the name and the length of the chromosomes to analyze. Only the chromosomes contained in this Seqinfo will be analyzed.
extendingSize
a numeric value indicating the size of the padding added on both sides of the median position of the peaks to create the consensus region. The minimum size of the consensus region is equal to twice the value of the extendingSize parameter. The value of extendingSize must be a positive integer. Default = 250.
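The padding arithmetic can be checked with a few lines of base R. This sketch assumes closed, 1-based coordinates (as printed by GRanges) and a median rounded down to a whole position; pad_region() is a hypothetical helper. The three peak positions are invented and were only chosen so that the result matches the first range printed in the example (246022229-246022829 with extendingSize = 300); the actual peaks behind that range are not shown here.

```r
# Sketch of the padding arithmetic behind extendingSize, assuming closed,
# 1-based coordinates and a median rounded down to a whole position.
pad_region <- function(peakPositions, extendingSize = 250L) {
  # extendingSize must be a single positive whole number
  stopifnot(is.numeric(extendingSize), length(extendingSize) == 1,
            extendingSize > 0, extendingSize == round(extendingSize))
  center <- floor(median(peakPositions))
  c(start = center - extendingSize, end = center + extendingSize)
}

region <- pad_region(c(246022500, 246022529, 246022600), extendingSize = 300)
region
region[["end"]] - region[["start"]] + 1  # width of the closed interval
```

Under the closed-interval assumption, end minus start equals 2 * extendingSize (600 here), while the width counted in bases is 2 * extendingSize + 1 (601).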
expandToFitPeakRegion
a logical indicating if the region size, which is set by the extendingSize parameter, is extended to include the entire narrow peak regions of all peaks included in the unextended consensus region. The narrow peak regions of the peaks added because of the extension are not considered for the extension. Default: FALSE.
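One way to read the expansion rule is as a single pass over the peaks that lie inside the unextended region. The base-R sketch below follows that reading on plain coordinates; expand_region() and its arguments are hypothetical and are not part of the package.

```r
# Hypothetical sketch of expandToFitPeakRegion = TRUE on plain coordinates.
# peakPos: peak positions; narrowStart/narrowEnd: their narrow peak regions.
expand_region <- function(regionStart, regionEnd, peakPos,
                          narrowStart, narrowEnd) {
  # Only peaks inside the unextended region drive the extension.
  inside <- peakPos >= regionStart & peakPos <= regionEnd
  # A single pass: peaks that fall inside only after the extension are
  # not used to extend the region any further.
  c(start = min(regionStart, narrowStart[inside]),
    end = max(regionEnd, narrowEnd[inside]))
}

expand_region(1000, 1200,
              peakPos = c(1050, 1180, 1230),
              narrowStart = c(950, 1150, 1210),
              narrowEnd = c(1100, 1260, 1400))
```

The region grows to 950-1260. The peak at 1230 ends up inside the expanded region, but because it was outside the unextended region its narrow peak end (1400) is ignored.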
shrinkToFitPeakRegion
a logical indicating if the region size, which is set by the extendingSize parameter, is shrunk to fit the narrow peak regions of the peaks when all those regions are smaller than the consensus region. Default: FALSE.
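The shrinking rule can be sketched the same way. This base-R version assumes the literal reading of the description: the region is shrunk to the outermost narrow peak boundaries only when every narrow peak region lies inside it, and is otherwise left unchanged. shrink_region() is a hypothetical helper, and the package may apply the rule differently.

```r
# Hypothetical sketch of shrinkToFitPeakRegion = TRUE on plain coordinates.
shrink_region <- function(regionStart, regionEnd, narrowStart, narrowEnd) {
  newStart <- min(narrowStart)
  newEnd <- max(narrowEnd)
  # Shrink only when every narrow peak region lies inside the current region.
  if (newStart >= regionStart && newEnd <= regionEnd) {
    c(start = newStart, end = newEnd)
  } else {
    c(start = regionStart, end = regionEnd)
  }
}

shrink_region(1000, 1600, narrowStart = c(1150, 1280), narrowEnd = c(1260, 1350))
shrink_region(1000, 1600, narrowStart = c(1150, 1280), narrowEnd = c(1260, 1700))
```

The first call shrinks the region to 1150-1350. In the second call one narrow peak region extends past 1600, so the region keeps its original 1000-1600 bounds.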
minNbrExp
a positive numeric or a positive integer indicating the minimum number of experiments in which at least one peak must be present for a potential consensus region. The value must be a positive integer less than or equal to the number of experiments present in the narrowPeaks and peaks parameters. Default = 1.
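The minNbrExp constraints and the retention rule can be expressed as two small base-R predicates. Both helpers are hypothetical: valid_min_nbr_exp() checks the documented constraints against the experiment names (with real inputs, presumably the row names of peaks), and keep_region() counts distinct experiments, not peaks.

```r
# Hypothetical helpers illustrating the minNbrExp rule.
valid_min_nbr_exp <- function(minNbrExp, experiments) {
  nbrExp <- length(unique(experiments))
  # A single positive whole number, no larger than the number of experiments
  is.numeric(minNbrExp) && length(minNbrExp) == 1 && !is.na(minNbrExp) &&
    minNbrExp == round(minNbrExp) && minNbrExp >= 1 && minNbrExp <= nbrExp
}

# A candidate region is kept when its peaks come from enough experiments.
keep_region <- function(peakExperiments, minNbrExp) {
  length(unique(peakExperiments)) >= minNbrExp
}

experiments <- c("CTCF_MYJ", "CTCF_MYN")
valid_min_nbr_exp(2L, experiments)
valid_min_nbr_exp(3, experiments)
keep_region(c("CTCF_MYJ", "CTCF_MYJ"), minNbrExp = 2)
keep_region(c("CTCF_MYJ", "CTCF_MYN"), minNbrExp = 2)
```

Two peaks from the same experiment do not satisfy minNbrExp = 2; one peak from each of the two experiments does.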
nbrThreads
a numeric or an integer indicating the number of threads to use in parallel. The value of nbrThreads must be a positive integer. Default = 1.
a list of class "consensusRanges" containing:
call
the matched call.
consensusRanges
a GRanges containing the consensus regions.
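## Loading the package (GenomicRanges makes Seqinfo() available)
library(consensusSeekeR)
library(GenomicRanges)
## Loading datasets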
data(A549_CTCF_MYN_NarrowPeaks_partial)
data(A549_CTCF_MYN_Peaks_partial)
data(A549_CTCF_MYJ_NarrowPeaks_partial)
data(A549_CTCF_MYJ_Peaks_partial)
## Assigning experiment name "CTCF_MYJ" to first experiment
names(A549_CTCF_MYJ_NarrowPeaks_partial) <- rep("CTCF_MYJ",
length(A549_CTCF_MYJ_NarrowPeaks_partial))
names(A549_CTCF_MYJ_Peaks_partial) <- rep("CTCF_MYJ",
length(A549_CTCF_MYJ_Peaks_partial))
## Assigning experiment name "CTCF_MYN" to second experiment
names(A549_CTCF_MYN_NarrowPeaks_partial) <- rep("CTCF_MYN",
length(A549_CTCF_MYN_NarrowPeaks_partial))
names(A549_CTCF_MYN_Peaks_partial) <- rep("CTCF_MYN",
length(A549_CTCF_MYN_Peaks_partial))
## Only chromosome 1 is going to be analysed
chrList <- Seqinfo("chr1", 249250621, NA)
## Find consensus regions with both experiments
results <- findConsensusPeakRegions(
narrowPeaks = c(A549_CTCF_MYJ_NarrowPeaks_partial,
A549_CTCF_MYN_NarrowPeaks_partial),
peaks = c(A549_CTCF_MYJ_Peaks_partial,
A549_CTCF_MYN_Peaks_partial),
chrInfo = chrList,
extendingSize = 300,
expandToFitPeakRegion = TRUE,
shrinkToFitPeakRegion = FALSE,
minNbrExp = 2,
nbrThreads = 1)
## Print the first 2 consensus regions
head(results$consensusRanges, 2)
#> GRanges object with 2 ranges and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] chr1 246022229-246022829 *
#> [2] chr1 246051998-246052598 *
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths