The function extracts the information for the profiles associated with a specific study in the GDS Sample file. The information is extracted from the 'study.annot' node as a 'data.frame'.

Then, the function uses the 1KG GDS file to extract specific information about each sample and adds it, as an extra column, to the 'data.frame'.

For example, this function can extract the synthetic profiles from a GDS Sample file; the super-population of the 1KG sample used to generate each synthetic profile is then added as an extra column to the final 'data.frame'.

prepPedSynthetic1KG(gdsReference, gdsSample, studyID, popName)

Arguments

gdsReference

an object of class gdsfmt::gds.class, the opened 1KG GDS file.

gdsSample

an object of class gdsfmt::gds.class, the opened Profile GDS file.

studyID

a character string representing the name of the study that will be extracted from the GDS Sample 'study.annot' node.

popName

a character string representing the name of the column from the data.frame stored in the 'sample.annot' node of the 1KG GDS file. The column must be present in the data.frame.

Value

a data.frame containing the columns extracted from the GDS Sample 'study.annot' node, with an extra column, named after the 'popName' parameter, that has been extracted from the 1KG GDS 'sample.annot' node. Only the rows corresponding to the specified study ('studyID' parameter) are returned.

Details

For example, this function can extract the synthetic profiles from a Profile GDS file; the super-population of the 1KG sample used to generate each synthetic profile is then added as an extra column to the final 'data.frame'. In that situation, the 'popName' parameter would correspond to the super-population column and the 'studyID' parameter would be the name given to the synthetic dataset.
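The filter-and-lookup logic described above can be sketched with plain data.frames. This is only an illustration, not the package's actual implementation: the two data.frames below are hypothetical stand-ins for the 'study.annot' and 'sample.annot' nodes, the sample identifiers and population values are made up, and the sketch assumes that the 'case.id' column holds the identifier of the 1KG sample used to generate each synthetic profile and that the reference annotation is indexed by sample identifier.

```r
## Hypothetical stand-in for the 'study.annot' data.frame of a Profile GDS file
studyAnnot <- data.frame(
    data.id=c("refA.Synthetic.01", "refB.Synthetic.01", "patient1"),
    case.id=c("refA", "refB", "patient1"),
    study.id=c("Demo.Synthetic", "Demo.Synthetic", "Demo.Study"),
    stringsAsFactors=FALSE)

## Hypothetical stand-in for the 'sample.annot' data.frame of the 1KG GDS file;
## row names are assumed to be the reference sample identifiers
sampleAnnot <- data.frame(superPop=c("EUR", "AFR"),
    row.names=c("refA", "refB"), stringsAsFactors=FALSE)

studyID <- "Demo.Synthetic"
popName <- "superPop"

## The requested column must be present in the reference annotation
stopifnot(popName %in% colnames(sampleAnnot))

## Keep only the rows associated with the requested study
pedSyn <- studyAnnot[studyAnnot$study.id == studyID, , drop=FALSE]
rownames(pedSyn) <- pedSyn$data.id

## For each profile, look up the value of the source reference sample and
## add it as an extra column named after 'popName'
pedSyn[[popName]] <- sampleAnnot[pedSyn$case.id, popName]

pedSyn
```

Indexing the reference annotation by row name keeps the row order of the study table, so each synthetic profile receives the value of its own source sample even when several profiles share the same source.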

Author

Pascal Belleau, Astrid Deschênes and Alexander Krasnitz

Examples


## Required library
library(gdsfmt)

## The open 1KG GDS file is required (this is a demo file)
dataDir <- system.file("extdata", package="RAIDS")
gds_1KG_file <- file.path(dataDir, "PopulationReferenceDemo.gds")
gds1KG <- openfn.gds(gds_1KG_file)

## The opened Profile GDS file is also required (this is a demo file)
fileSampleGDS <- file.path(dataDir, "GDS_Sample_with_study_demo.gds")
gdsSample <- openfn.gds(fileSampleGDS)

## Extract the study information for the "TCGA.Synthetic" study present in
## the Profile GDS file and merge the "superPop" column from the 1KG GDS
## file into the returned data.frame.
## This retrieves the super-population associated with the 1KG samples
## that have been used to create the synthetic profiles
RAIDS:::prepPedSynthetic1KG(gdsReference=gds1KG, gdsSample=gdsSample,
    studyID="TCGA.Synthetic", popName="superPop")
#>                                   data.id case.id sample.type diagnosis
#> HG00101.Synthetic.01 HG00101.Synthetic.01 HG00101   Synthetic         C
#> HG00101.Synthetic.02 HG00101.Synthetic.02 HG00101   Synthetic         C
#> HG00102.Synthetic.02 HG00102.Synthetic.02 HG00102   Synthetic         C
#> HG00109.Synthetic.02 HG00109.Synthetic.02 HG00109   Synthetic         C
#>                         source       study.id superPop
#> HG00101.Synthetic.01 Synthetic TCGA.Synthetic      SAS
#> HG00101.Synthetic.02 Synthetic TCGA.Synthetic      SAS
#> HG00102.Synthetic.02 Synthetic TCGA.Synthetic      EAS
#> HG00109.Synthetic.02 Synthetic TCGA.Synthetic      AMR

## The GDS files must be closed
gdsfmt::closefn.gds(gds1KG)
gdsfmt::closefn.gds(gdsSample)