The function randomly selects a fixed number of reference for each subcontinental population present in the 1KG GDS file. When a subcontinental population has less samples than the fixed number, all samples from the subcontinental population are selected.

select1KGPopForSynthetic(fileReferenceGDS, nbProfiles)

Arguments

fileReferenceGDS

a character string representing the file name of the Reference GDS file. The file must exist.

nbProfiles

a single positive integer representing the number of samples that will be selected for each subcontinental population present in the 1KG GDS file. If the number of samples in a specific subcontinental population is smaller than the nbProfiles, the number of samples selected in this subcontinental population will correspond to the size of this population.

Value

a data.frame containing those columns:

sample.id

a character string representing the sample identifier.

pop.group

a character string representing the subcontinental population assigned to the sample.

superPop

a character string representing the super-population assigned to the sample.

Author

Pascal Belleau, Astrid Deschênes and Alexander Krasnitz

Examples


## Required library
library(gdsfmt)

## The number of samples needed by subcontinental population
## The number is small for demonstration purpose
nbProfiles <- 5L

## 1KG GDS Demo file
## This file only one superpopulation (for demonstration purpose)
dataDir <- system.file("extdata", package="RAIDS")
fileGDS <- file.path(dataDir, "PopulationReferenceDemo.gds")

## Extract a selected number of random samples
## for each subcontinental population
## In the 1KG GDS Demo file, there is one subcontinental population
dataR <- select1KGPopForSynthetic(fileReferenceGDS=fileGDS, nbProfiles=nbProfiles)