Using the pedigree file from the Reference, this function extracts the needed information and formats it into a data.frame so it can be used in the following steps of the ancestry inference process. The function also requires that the genotyping files associated with each sample be available in a specified directory.

prepPed1KG(filePed, pathGeno = file.path("data", "sampleGeno"), batch = 0L)

Arguments

filePed

a character string representing the path and file name of the pedigree file (PED file) that contains the information related to the profiles present in the Reference GDS file. The PED file must exist.

pathGeno

a character string representing the path where the Reference genotyping files for each profile are located. Only the profiles with associated genotyping files are retained in the final data.frame. The names of the genotyping files must correspond to the individual identifiers (Individual.ID) in the pedigree file (PED file). Default: "./data/sampleGeno".
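As a minimal sketch of the filtering rule described above (only profiles with a genotyping file are kept), the base R code below matches pedigree IDs against the file names found in a directory. It assumes, hypothetically, that each genotyping file is named `<Individual.ID>.csv.bz2`. The helper `keepProfilesWithGeno()`, the file extension and the toy `Population` column are illustrative assumptions, not the RAIDS implementation.

```r
## Hypothetical helper (not part of RAIDS): keep only the pedigree rows
## whose Individual.ID has a matching genotyping file in 'pathGeno'.
## ASSUMPTION: genotyping files are named "<Individual.ID>.csv.bz2".
keepProfilesWithGeno <- function(ped, pathGeno, ext = ".csv.bz2") {
    ## All file names present in the genotyping directory
    genoFiles <- list.files(pathGeno)
    ## Keep files with the expected extension, then strip it to get IDs
    genoFiles <- genoFiles[endsWith(genoFiles, ext)]
    idsWithGeno <- substr(genoFiles, 1L, nchar(genoFiles) - nchar(ext))
    ## Retain only the profiles that have a genotyping file
    ped[ped$Individual.ID %in% idsWithGeno, , drop = FALSE]
}

## Toy example: two of the three profiles have a genotyping file
pathGeno <- file.path(tempdir(), "sampleGenoDemo")
dir.create(pathGeno, showWarnings = FALSE)
invisible(file.create(file.path(pathGeno,
                                c("HG00100.csv.bz2", "HG00102.csv.bz2"))))

ped <- data.frame(Individual.ID = c("HG00100", "HG00101", "HG00102"),
                  Population = c("ACB", "ACB", "ACB"),
                  stringsAsFactors = FALSE)

retained <- keepProfilesWithGeno(ped, pathGeno)
retained$Individual.ID
```

Matching on exact IDs with `%in%` avoids accidental partial matches (for example, `HG0010` matching `HG00100`) that a plain `grepl()` on file names could produce.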

batch

an integer that uniquely identifies the source of the pedigree information. The Reference is usually assigned 0L. Default: 0L.

Value

a data.frame containing the needed pedigree information from the Reference. The data.frame contains these columns:

sample.id

a character string representing the profile unique ID.

Name.ID

a character string representing the profile name.

sex

a character string representing the sex of the profile.

pop.group

a character string representing the sub-continental ancestry of the profile.

superPop

a character string representing the continental ancestry of the profile.

batch

an integer representing the batch of the profile.
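To illustrate the documented shape of the returned data.frame (and not the actual RAIDS code), the sketch below builds a toy data.frame with the six columns listed above. As in the example output, it uses `sample.id` as row names, and it then counts profiles per continental ancestry with `table()`. The helper `formatPedigree()`, the input column names `Individual.ID`, `Gender`, `Population` and `superPop`, and the sample IDs are hypothetical.

```r
## Hypothetical helper mirroring the documented output columns.
## ASSUMPTION: the input pedigree has columns Individual.ID, Gender,
## Population and superPop; these names are illustrative only.
formatPedigree <- function(ped, batch = 0L) {
    res <- data.frame(sample.id = as.character(ped$Individual.ID),
                      Name.ID   = as.character(ped$Individual.ID),
                      sex       = as.character(ped$Gender),
                      pop.group = as.character(ped$Population),
                      superPop  = as.character(ped$superPop),
                      batch     = rep(as.integer(batch), nrow(ped)),
                      stringsAsFactors = FALSE)
    ## Use the profile IDs as row names, as in the example output
    rownames(res) <- res$sample.id
    res
}

## Toy pedigree with made-up sample IDs
ped <- data.frame(Individual.ID = c("S1", "S2", "S3"),
                  Gender = c(1L, 2L, 2L),
                  Population = c("ACB", "ACB", "CEU"),
                  superPop = c("AFR", "AFR", "EUR"),
                  stringsAsFactors = FALSE)

pedFormatted <- formatPedigree(ped, batch = 0L)

## Number of profiles per continental ancestry
table(pedFormatted$superPop)
```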

Author

Pascal Belleau, Astrid Deschênes and Alexander Krasnitz

Examples


## Directory in this package where the demo pedigree file is located
dataDir <- system.file("extdata", package="RAIDS")

## Path where the demo genotype CSV files are located
pathGeno <- file.path(dataDir, "demoProfileGenotypes")

## Demo pedigree file
pedDemoFile <- file.path(dataDir, "PedigreeDemo.ped")

## Create a data.frame containing the information of the retained
## samples (samples with existing genotyping files)
prepPed1KG(filePed=pedDemoFile, pathGeno=pathGeno, batch=0L)
#>         sample.id Name.ID sex pop.group superPop batch
#> HG00100   HG00100 HG00100   1       ACB      AFR     0
#> HG00101   HG00101 HG00101   2       ACB      AFR     0
#> HG00102   HG00102 HG00102   2       ACB      AFR     0
#> HG00103   HG00103 HG00103   1       ACB      AFR     0
#> HG00104   HG00104 HG00104   2       ACB      AFR     0
#> HG00105   HG00105 HG00105   1       ACB      AFR     0
#> HG00106   HG00106 HG00106   2       ACB      AFR     0
#> HG00107   HG00107 HG00107   1       ACB      AFR     0
#> HG00108   HG00108 HG00108   2       ACB      AFR     0
#> HG00109   HG00109 HG00109   2       ACB      AFR     0