R/process1KG.R
prepPed1KG.Rd
Using the pedigree file from Reference, this function extracts the
needed information and formats it into a data.frame so it can be used
in the following steps of the ancestry inference process. The function
also requires that the genotyping files associated with each sample be
available in a specified directory.
prepPed1KG(filePed, pathGeno = file.path("data", "sampleGeno"), batch = 0L)
filePed: a character string representing the path and file name of the
pedigree file (PED file) that contains the information related to the
profiles present in the Reference GDS file. The PED file must exist.
pathGeno: a character string representing the path where the Reference
genotyping files for each profile are located. Only the profiles with
associated genotyping files are retained in the final data.frame. The
name of each genotyping file must correspond to the individual
identification (Individual.ID) in the pedigree file (PED file).
Default: "./data/sampleGeno".
batch: an integer that uniquely identifies the source of the pedigree
information. The Reference is usually 0L. Default: 0L.
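The file-matching rule described for pathGeno can be illustrated with a short sketch. This is not the package's implementation: the helper hasGenoFile(), the temporary directory and the ".csv" extension are assumptions made only for this example.

```r
## Hypothetical helper (not part of RAIDS): flag the individuals whose
## genotype file is present in pathGeno. File names are assumed to start
## with the Individual.ID, followed by an extension.
hasGenoFile <- function(ids, pathGeno) {
    ## Drop everything from the first dot onward to recover the ID
    fileIDs <- sub("\\..*$", "", list.files(pathGeno))
    ids %in% fileIDs
}

## Toy directory holding genotype files for two of three individuals
tmpGeno <- file.path(tempdir(), "sampleGenoSketch")
dir.create(tmpGeno, showWarnings=FALSE)
invisible(file.create(file.path(tmpGeno, c("HG00100.csv", "HG00102.csv"))))

## Only the individuals with a matching file are kept
ids <- c("HG00100", "HG00101", "HG00102")
keep <- hasGenoFile(ids, tmpGeno)
ids[keep]
```

In this toy case HG00101 has no genotype file, so it would be the one profile left out of the final data.frame.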
a data.frame containing the needed pedigree information from Reference.
The data.frame contains the following columns:
sample.id: a character string representing the unique ID of the profile.
Name.ID: a character string representing the profile name.
sex: a character string representing the sex of the profile.
pop.group: a character string representing the sub-continental ancestry
of the profile.
superPop: a character string representing the continental ancestry of
the profile.
batch: an integer representing the batch of the profile.
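To make the column layout concrete, the sketch below builds by hand a small data.frame with the same six columns and checks their types. The values are illustrative only, and pedSketch is a name chosen for this example; the real object is the one returned by prepPed1KG().

```r
## Hand-built stand-in for the returned data.frame (illustrative values)
pedSketch <- data.frame(
    sample.id=c("HG00100", "HG00101"),
    Name.ID=c("HG00100", "HG00101"),
    sex=c("1", "2"),
    pop.group=c("ACB", "ACB"),
    superPop=c("AFR", "AFR"),
    batch=c(0L, 0L),
    stringsAsFactors=FALSE)
rownames(pedSketch) <- pedSketch$sample.id

## Column classes: five character columns and one integer column
vapply(pedSketch, class, character(1))

## Number of profiles per continental ancestry
table(pedSketch$superPop)
```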
## Path to the directory in this package where the demo files are located
dataDir <- system.file("extdata", package="RAIDS")
## Path where the demo genotype CSV files are located
pathGeno <- file.path(dataDir, "demoProfileGenotypes")
## Demo pedigree file
pedDemoFile <- file.path(dataDir, "PedigreeDemo.ped")
## Create a data.frame containing the information of the retained
## samples (samples with existing genotyping files)
prepPed1KG(filePed=pedDemoFile, pathGeno=pathGeno, batch=0L)
#>         sample.id Name.ID sex pop.group superPop batch
#> HG00100   HG00100 HG00100   1       ACB      AFR     0
#> HG00101   HG00101 HG00101   2       ACB      AFR     0
#> HG00102   HG00102 HG00102   2       ACB      AFR     0
#> HG00103   HG00103 HG00103   1       ACB      AFR     0
#> HG00104   HG00104 HG00104   2       ACB      AFR     0
#> HG00105   HG00105 HG00105   1       ACB      AFR     0
#> HG00106   HG00106 HG00106   2       ACB      AFR     0
#> HG00107   HG00107 HG00107   1       ACB      AFR     0
#> HG00108   HG00108 HG00108   2       ACB      AFR     0
#> HG00109   HG00109 HG00109   2       ACB      AFR     0