The function uses one cancer profile in combination with one 1KG reference profile to generate a synthetic profile that is saved in the Profile GDS file.

When more than one 1KG reference profile is specified, the function recursively generates synthetic profiles for each cancer profile + 1KG reference profile combination.

The number of synthetic profiles generated per combination is set by the number of simulations requested (nbSim).
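
As a minimal sketch of this bookkeeping, the base R snippet below enumerates the reference + simulation combinations for one cancer profile. The naming pattern prefix.profileID.referenceID.simulationIndex is an assumption inferred from the sample identifiers shown in the Examples output, and the helper expectedSyntheticIDs() is hypothetical; it is not part of RAIDS.

```r
## Hypothetical helper (not part of RAIDS): list the identifiers expected
## for each reference + simulation combination.
## Assumption: names follow prefix.profileID.referenceID.simulationIndex
expectedSyntheticIDs <- function(profileID, listSampleRef, nbSim=1L,
                                    prefix="") {
    ## One row per reference sample and simulation index
    combos <- expand.grid(sim=seq_len(nbSim), ref=listSampleRef,
                            stringsAsFactors=FALSE)
    paste(prefix, profileID, combos$ref, combos$sim, sep=".")
}

ids <- expectedSyntheticIDs(profileID="ex1",
            listSampleRef=c("HG00243", "HG00150"), nbSim=2L,
            prefix="synthTest")

## Number of synthetic profiles = number of references x nbSim
length(ids)
#> [1] 4
```

With two reference samples and nbSim=2L, four synthetic profiles would be expected, two per cancer profile + reference combination.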

syntheticGeno(
  gdsReference,
  gdsRefAnnot,
  fileProfileGDS,
  profileID,
  listSampleRef,
  nbSim = 1L,
  prefix = "",
  pRecomb = 0.01,
  minProb = 0.999,
  seqError = 0.001
)

Arguments

gdsReference

an object of class gds.class (a GDS file), the opened 1KG GDS file.

gdsRefAnnot

an object of class gds.class (a GDS file), the opened 1KG SNV Annotation GDS file.

fileProfileGDS

a character string representing the file name of the Profile GDS file containing the information about the sample. The file must exist.

profileID

a character string representing the unique identifier of the cancer profile.

listSampleRef

a vector of character strings representing the sample identifiers of the selected 1KG reference samples.

nbSim

a single positive integer representing the number of simulations that will be generated per sample + 1KG reference combination. Default: 1L.

prefix

a character string that represents the prefix that will be added to the name of the synthetic profiles generated by the function. Default: "".

pRecomb

a single positive numeric between 0 and 1 that represents the frequency of phase switching in the synthetic profiles. Default: 0.01.

minProb

a single positive numeric between 0 and 1 that represents the probability that the genotype is correct. Default: 0.999.

seqError

a single positive numeric between 0 and 1 representing the sequencing error rate. Default: 0.001.
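
The constraints listed above can be checked before calling the function. The sketch below is based only on the argument descriptions; the helpers isSingleProb() and checkSyntheticArgs() are hypothetical, are not part of RAIDS, and do not reproduce the validation that syntheticGeno() performs internally. Treating the bounds 0 and 1 as inclusive is also an assumption.

```r
## Hypothetical pre-call checks (not part of RAIDS), based only on the
## constraints listed in the argument descriptions.
## Assumption: the bounds 0 and 1 are accepted.
isSingleProb <- function(x) {
    is.numeric(x) && length(x) == 1L && !is.na(x) && x >= 0 && x <= 1
}

checkSyntheticArgs <- function(nbSim=1L, prefix="", pRecomb=0.01,
                                minProb=0.999, seqError=0.001) {
    ## nbSim: a single positive whole number
    stopifnot(is.numeric(nbSim), length(nbSim) == 1L, !is.na(nbSim),
                nbSim >= 1, nbSim == round(nbSim))
    ## prefix: a single character string
    stopifnot(is.character(prefix), length(prefix) == 1L)
    ## pRecomb, minProb and seqError: single values between 0 and 1
    stopifnot(isSingleProb(pRecomb), isSingleProb(minProb),
                isSingleProb(seqError))
    invisible(TRUE)
}

## Passes silently with the values used in the Examples section
checkSyntheticArgs(nbSim=1L, prefix="synthTest")
```

A call such as checkSyntheticArgs(pRecomb=1.5) stops with an error, which catches an out-of-range value before any GDS file is modified.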

Value

The integer 0L when the function is successful.

Author

Pascal Belleau, Astrid Deschênes and Alexander Krasnitz

Examples


## Required libraries
library(gdsfmt)
library(SNPRelate)  ## provides snpgdsOpen()

## Path to the demo 1KG GDS files located in this package
dataDir <- system.file("extdata/tests", package="RAIDS")

## Profile GDS file (temporary)
fileNameGDS <- file.path(tempdir(), "ex1.gds")

## Copy the Profile GDS file demo that has been pruned and annotated
file.copy(file.path(dataDir, "ex1_demo_with_pruning_and_1KG_annot.gds"),
                 fileNameGDS)
#> [1] TRUE

## Information about the synthetic data set
syntheticStudyDF <- data.frame(study.id="MYDATA.Synthetic",
        study.desc="MYDATA synthetic data", study.platform="PLATFORM",
        stringsAsFactors=FALSE)

## Add information related to the synthetic profiles into the Profile GDS
prepSynthetic(fileProfileGDS=fileNameGDS,
        listSampleRef=c("HG00243", "HG00150"), profileID="ex1",
        studyDF=syntheticStudyDF, nbSim=1L, prefix="synthTest",
        verbose=FALSE)
#> [1] 0

## The 1KG files
gds1KG <- snpgdsOpen(file.path(dataDir,
                            "ex1_good_small_1KG.gds"))
gds1KGAnnot <- openfn.gds(file.path(dataDir,
                            "ex1_good_small_1KG_Annot.gds"))

## Generate the synthetic profiles and add them into the Profile GDS
syntheticGeno(gdsReference=gds1KG, gdsRefAnnot=gds1KGAnnot,
        fileProfileGDS=fileNameGDS, profileID="ex1",
        listSampleRef=c("HG00243", "HG00150"), nbSim=1L,
        prefix="synthTest",
        pRecomb=0.01, minProb=0.999, seqError=0.001)
#> [1] 0

## Open Profile GDS file
profileGDS <- openfn.gds(fileNameGDS)

tail(read.gdsn(index.gdsn(profileGDS, "sample.id")))
#> [1] "NA20872"                 "NA20906"                
#> [3] "NA20875"                 "ex1"                    
#> [5] "synthTest.ex1.HG00243.1" "synthTest.ex1.HG00150.1"

## Close GDS files (important)
closefn.gds(profileGDS)
closefn.gds(gds1KG)
closefn.gds(gds1KGAnnot)

## Remove Profile GDS file (created for demo purpose)
unlink(fileNameGDS, force=TRUE)