R/synthetic_internal.R
prepPedSynthetic1KG.Rd
The function extracts the information for the profiles associated with a specific study in the GDS Sample file. The information is extracted from the 'study.annot' node as a 'data.frame'.
Then, the function uses the 1KG GDS file to extract specific information about each sample and adds it, as an extra column, to the 'data.frame'.
As an example, this function can extract the synthetic profiles from a GDS Sample file and add, as an extra column in the final 'data.frame', the super-population of the 1KG sample used to generate each synthetic profile.
prepPedSynthetic1KG(gdsReference, gdsSample, studyID, popName)
gdsReference
an object of class gdsfmt::gds.class, the opened 1KG GDS file.
gdsSample
an object of class gdsfmt::gds.class, the opened Profile GDS file.
studyID
a character string representing the name of the study that will be extracted from the GDS Sample 'study.annot' node.
popName
a character string representing the name of a column from the data.frame stored in the 'sample.annot' node of the 1KG GDS file. The column must be present in the data.frame.
a data.frame containing the columns extracted from the GDS Sample 'study.annot' node, with an extra column, named after the 'popName' parameter, that has been extracted from the 1KG GDS 'sample.annot' node. Only the rows corresponding to the specified study ('studyID' parameter) are returned.
As an example, this function can extract the synthetic profiles from a Profile GDS file and add, as an extra column in the final 'data.frame', the super-population of the 1KG sample used to generate each synthetic profile. In that situation, the 'popName' parameter corresponds to the super-population column and the 'studyID' parameter is the name given to the synthetic dataset.
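The filter-and-annotate logic described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper name `addRefColumn` is hypothetical, the two toy data.frames stand in for the 'study.annot' and 'sample.annot' nodes, and the sketch assumes that the 'case.id' column holds the identifier of the 1KG sample used to generate each synthetic profile and that the reference annotation uses the 1KG sample identifiers as row names.

```r
## Toy stand-in for the 'study.annot' node of the Profile GDS file
studyAnnot <- data.frame(
    data.id=c("HG00101.Synthetic.01", "HG00102.Synthetic.02", "Sample.A"),
    case.id=c("HG00101", "HG00102", "Sample.A"),
    study.id=c("TCGA.Synthetic", "TCGA.Synthetic", "Other.Study"),
    stringsAsFactors=FALSE)

## Toy stand-in for the 'sample.annot' node of the 1KG GDS file
## (assumption: row names are the 1KG sample identifiers)
refAnnot <- data.frame(superPop=c("SAS", "EAS"),
    row.names=c("HG00101", "HG00102"), stringsAsFactors=FALSE)

## Hypothetical helper illustrating the documented behavior
addRefColumn <- function(studyAnnot, refAnnot, studyID, popName) {
    ## The requested column must be present in the reference annotation
    if (!(popName %in% colnames(refAnnot))) {
        stop("The column '", popName,
                "' is not present in the reference annotation.")
    }
    ## Keep only the rows of the requested study
    res <- studyAnnot[studyAnnot$study.id == studyID, , drop=FALSE]
    ## Look up the reference value through the 1KG sample identifier
    res[[popName]] <- refAnnot[[popName]][match(res$case.id,
                                                rownames(refAnnot))]
    rownames(res) <- res$data.id
    res
}

synthInfo <- addRefColumn(studyAnnot, refAnnot,
    studyID="TCGA.Synthetic", popName="superPop")
```

Using `match()` keeps the lookup exact and yields `NA` for any profile whose 1KG sample is missing from the reference annotation, instead of silently dropping the row.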
## Required library
library(gdsfmt)
## The open 1KG GDS file is required (this is a demo file)
dataDir <- system.file("extdata", package="RAIDS")
gds_1KG_file <- file.path(dataDir, "PopulationReferenceDemo.gds")
gds1KG <- openfn.gds(gds_1KG_file)
## The open Profile GDS file is also required (this is a demo file)
fileSampleGDS <- file.path(dataDir, "GDS_Sample_with_study_demo.gds")
gdsSample <- openfn.gds(fileSampleGDS)
## Extract the study information for the "TCGA.Synthetic" study present in
## the Profile GDS file and merge the "superPop" column from the 1KG GDS
## file into the returned data.frame
## This call retrieves the super-population associated with the
## 1KG samples that have been used to create the synthetic profiles
RAIDS:::prepPedSynthetic1KG(gdsReference=gds1KG, gdsSample=gdsSample,
    studyID="TCGA.Synthetic", popName="superPop")
#>                                   data.id case.id sample.type diagnosis
#> HG00101.Synthetic.01 HG00101.Synthetic.01 HG00101   Synthetic         C
#> HG00101.Synthetic.02 HG00101.Synthetic.02 HG00101   Synthetic         C
#> HG00102.Synthetic.02 HG00102.Synthetic.02 HG00102   Synthetic         C
#> HG00109.Synthetic.02 HG00109.Synthetic.02 HG00109   Synthetic         C
#>                         source       study.id superPop
#> HG00101.Synthetic.01 Synthetic TCGA.Synthetic      SAS
#> HG00101.Synthetic.02 Synthetic TCGA.Synthetic      SAS
#> HG00102.Synthetic.02 Synthetic TCGA.Synthetic      EAS
#> HG00109.Synthetic.02 Synthetic TCGA.Synthetic      AMR
## The GDS files must be closed
gdsfmt::closefn.gds(gds1KG)
gdsfmt::closefn.gds(gdsSample)