R/synthetic.R
select1KGPopForSynthetic.Rd
The function randomly selects a fixed number of reference samples for each subcontinental population present in the 1KG GDS file. When a subcontinental population has fewer samples than the fixed number, all samples from that subcontinental population are selected.
select1KGPopForSynthetic(fileReferenceGDS, nbProfiles)
fileReferenceGDS: a character string representing the file name of the Reference GDS file. The file must exist.
nbProfiles: a single positive integer representing the number of samples that will be selected for each subcontinental population present in the 1KG GDS file. If a subcontinental population contains fewer samples than nbProfiles, all samples from that subcontinental population are selected.
a data.frame containing the following columns:
a character string representing the sample identifier.
a character string representing the subcontinental population assigned to the sample.
a character string representing the super-population assigned to the sample.
## Required library
library(gdsfmt)
## The number of samples to select for each subcontinental population
## The number is kept small for demonstration purposes
nbProfiles <- 5L
## 1KG GDS Demo file
## This file contains only one superpopulation (for demonstration purposes)
dataDir <- system.file("extdata", package="RAIDS")
fileGDS <- file.path(dataDir, "PopulationReferenceDemo.gds")
## Extract a selected number of random samples
## for each subcontinental population
## In the 1KG GDS Demo file, there is one subcontinental population
dataR <- select1KGPopForSynthetic(fileReferenceGDS=fileGDS, nbProfiles=nbProfiles)
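
The selection rule described for this function can be sketched in base R on a toy annotation table. This is a minimal illustration under stated assumptions, not the package's implementation: the helper `sampleByPop()`, the sample identifiers and the column names `sample.id`, `pop.group` and `superPop` are hypothetical names chosen for the example, and no GDS file is read.

```r
## Toy sample annotation (hypothetical identifiers and column names)
annot <- data.frame(
    sample.id = paste0("S", 1:12),
    pop.group = rep(c("GBR", "FIN", "TSI"), times = c(7, 3, 2)),
    superPop  = "EUR",
    stringsAsFactors = FALSE
)

## Hypothetical helper: keep at most nbProfiles random rows per population
sampleByPop <- function(annot, nbProfiles) {
    stopifnot(is.numeric(nbProfiles), length(nbProfiles) == 1,
              nbProfiles > 0)
    ## Row indices grouped by subcontinental population
    rowsByPop <- split(seq_len(nrow(annot)), annot$pop.group)
    keep <- lapply(rowsByPop, function(rows) {
        if (length(rows) <= nbProfiles) {
            ## Populations at or below the requested size are kept whole
            rows
        } else {
            ## sample.int() avoids the sample() pitfall on length-1 input
            rows[sample.int(length(rows), nbProfiles)]
        }
    })
    res <- annot[unlist(keep, use.names = FALSE), , drop = FALSE]
    rownames(res) <- NULL
    res
}

## Fix the seed so the random selection is reproducible
set.seed(42)
selected <- sampleByPop(annot, nbProfiles = 5L)
table(selected$pop.group)
```

With this toy table, the population with 7 samples contributes 5 of them, while the populations with 3 and 2 samples contribute all of theirs. Which 5 samples are drawn depends on the seed.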