R/synthetic.R
select1KGPop.Rd
The function randomly selects a fixed number of reference samples for each subcontinental population present in the 1KG GDS file. When a subcontinental population has fewer samples than this fixed number, all samples from that subcontinental population are selected.
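The selection rule described above can be illustrated on an in-memory table. The following is a minimal sketch, not the RAIDS implementation: `selectPerPopulation()` is a hypothetical helper, and the column names `sample.id`, `pop.group` and `superPop` in the toy data are assumptions chosen for illustration. The helper draws `nbProfiles` rows without replacement per population and keeps every row of smaller populations.

```r
## Hypothetical helper: selectPerPopulation() is NOT part of RAIDS.
## It mimics the documented behaviour on an in-memory data.frame.
selectPerPopulation <- function(samples, nbProfiles) {
    ## Split the row indices by subcontinental population
    idxByPop <- split(seq_len(nrow(samples)), samples$pop.group)
    ## Draw nbProfiles rows without replacement per population,
    ## or keep every row when the population is not larger than nbProfiles
    picked <- lapply(idxByPop, function(idx) {
        if (length(idx) <= nbProfiles) {
            idx
        } else {
            ## sample.int() avoids the sample(x) pitfall when x has length 1
            idx[sample.int(length(idx), nbProfiles)]
        }
    })
    res <- samples[unlist(picked, use.names = FALSE), , drop = FALSE]
    rownames(res) <- NULL
    res
}

## Hypothetical toy reference: 8 samples in POP1, 3 samples in POP2
toy <- data.frame(
    sample.id = paste0("S", 1:11),
    pop.group = c(rep("POP1", 8), rep("POP2", 3)),
    superPop  = "SUPER1",
    stringsAsFactors = FALSE)

set.seed(42)
sel <- selectPerPopulation(toy, nbProfiles = 5L)
table(sel$pop.group)   ## POP1: 5 samples, POP2: 3 samples (all of them)
```

Sampling row indices rather than rows keeps the other columns aligned with each selected sample identifier.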
select1KGPop(gdsReference, nbProfiles)
gdsReference: an object of class gds.class (a GDS file), the opened 1KG GDS file.
nbProfiles: a single positive integer representing the number of samples that will be selected for each subcontinental population present in the 1KG GDS file. If the number of samples in a specific subcontinental population is smaller than nbProfiles, the number of samples selected in this subcontinental population will correspond to the size of this population.
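Because each population contributes the smaller of its size and `nbProfiles`, the expected number of selected samples can be computed in advance. The sketch below works through that arithmetic on hypothetical population sizes; `isSinglePositiveInteger()` is likewise a hypothetical helper (not part of RAIDS) that checks the documented "single positive integer" requirement.

```r
## Hypothetical helper (not part of RAIDS): checks that a value is
## a single, finite, positive whole number, as nbProfiles is documented to be
isSinglePositiveInteger <- function(x) {
    is.numeric(x) && length(x) == 1L && is.finite(x) &&
        x > 0 && x == round(x)
}

## Hypothetical subcontinental population sizes (illustration only)
popSizes <- c(POP1 = 12L, POP2 = 7L, POP3 = 3L)
nbProfiles <- 5L
stopifnot(isSinglePositiveInteger(nbProfiles))

## Each population contributes min(population size, nbProfiles) samples
expectedPerPop <- pmin(popSizes, nbProfiles)
expectedPerPop        ## POP1 = 5, POP2 = 5, POP3 = 3
sum(expectedPerPop)   ## 13 samples selected in total
```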
a data.frame containing these columns:
a character string representing the sample identifier;
a character string representing the subcontinental population assigned to the sample;
a character string representing the super-population assigned to the sample.
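A returned table can be sanity-checked against the documented contract: three character columns, and no subcontinental population with more than `nbProfiles` samples. The sketch below assumes the population column is named `pop.group`; that name, the helper `checkSelection()` and the toy values are all hypothetical, so adjust `popColumn` to the column name actually returned.

```r
## Hypothetical helper (not part of RAIDS): stops if the table does not have
## three leading character columns, then returns TRUE when no population
## holds more than nbProfiles samples. Column names are assumptions.
checkSelection <- function(selection, nbProfiles, popColumn = "pop.group") {
    stopifnot(is.data.frame(selection), ncol(selection) >= 3L,
              all(vapply(selection[, 1:3], is.character, logical(1))))
    ## Count selected samples per subcontinental population
    counts <- table(selection[[popColumn]])
    all(counts <= nbProfiles)
}

## Toy selection (hypothetical values)
toySel <- data.frame(
    sample.id = c("S1", "S2", "S3", "S9"),
    pop.group = c("POP1", "POP1", "POP1", "POP2"),
    superPop  = c("SUPER1", "SUPER1", "SUPER1", "SUPER1"),
    stringsAsFactors = FALSE)

checkSelection(toySel, nbProfiles = 5L)   ## TRUE
checkSelection(toySel, nbProfiles = 2L)   ## FALSE: POP1 has 3 samples
```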
## Required libraries
library(RAIDS)
library(gdsfmt)
## The number of samples needed per subcontinental population
## The number is small for demonstration purposes
nbProfiles <- 5L
## Open the 1KG GDS Demo file
## This file contains only one superpopulation (for demonstration purposes)
dataDir <- system.file("extdata", package="RAIDS")
fileGDS <- file.path(dataDir, "PopulationReferenceDemo.gds")
gdsFileOpen <- openfn.gds(fileGDS, readonly=TRUE)
## Extract a selected number of random samples
## for each subcontinental population
## In the 1KG GDS Demo file, there is one subcontinental population
dataR <- select1KGPop(gdsReference=gdsFileOpen, nbProfiles=nbProfiles)
## Close the 1KG GDS Demo file (important)
closefn.gds(gdsFileOpen)
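Since closing the GDS file is important, one option is to tie the close to the function exit so that it also happens when an error interrupts the selection. The sketch below is a hypothetical wrapper, not part of RAIDS or gdsfmt; it only relies on `gdsfmt::openfn.gds()` and `gdsfmt::closefn.gds()`, both used in the example above, together with base R's `on.exit()`.

```r
## Hypothetical helper (not part of RAIDS): opens a GDS file read-only,
## runs fun() on it and always closes the file, even if fun() fails
withReferenceGDS <- function(fileGDS, fun, ...) {
    gds <- gdsfmt::openfn.gds(fileGDS, readonly = TRUE)
    on.exit(gdsfmt::closefn.gds(gds), add = TRUE)
    fun(gds, ...)
}

## Usage (requires RAIDS and gdsfmt to be installed):
## fileGDS <- file.path(system.file("extdata", package="RAIDS"),
##                      "PopulationReferenceDemo.gds")
## dataR <- withReferenceGDS(fileGDS, function(gds)
##     select1KGPop(gdsReference=gds, nbProfiles=5L))
```

Registering the close with `on.exit()` right after opening means no separate cleanup line can be skipped by an early error.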