R/synthetic.R
prepSynthetic.Rd
This function adds entries related to synthetic profiles to a Profile GDS file. The entries cover two types of information: the synthetic study and the synthetic profiles.
The study information is appended to the Profile GDS file "study.list" node. The "study.platform" entry is always set to 'Synthetic'.
The profile information, for all selected synthetic profiles, is appended to the Profile GDS file "study.annot" node. Both the "Source" and the "Sample.Type" entries are always set to 'Synthetic'.
The synthetic profiles are assigned unique names by joining four parts with dots:
prefix.data.id.profile.listSampleRef.simulation number (1 to nbSim)
For example: synthetic.ex1.HG00243.1
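The naming scheme can be reproduced in base R. This is a minimal sketch, not the package's internal code: the variable syntheticNames and the ordering of the result (all simulations of one reference before the next) are assumptions made for illustration.

```r
## Illustrative inputs (values borrowed from the example on this page,
## except nbSim, which is raised to 2 to show the simulation counter)
prefix <- "synthetic"
profileID <- "ex1"
listSampleRef <- c("HG00243", "HG00150")
nbSim <- 2L

## One name per (reference, simulation) pair; the ordering is an assumption
syntheticNames <- paste(prefix, profileID,
    rep(listSampleRef, each=nbSim),
    rep(seq_len(nbSim), times=length(listSampleRef)),
    sep=".")
syntheticNames
```

Because the reference identifier and the simulation number are both part of the name, changing only the prefix is enough to generate a second, non-colliding batch for the same profile and references.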
prepSynthetic(
    fileProfileGDS,
    listSampleRef,
    profileID,
    studyDF,
    nbSim = 1L,
    prefix = "",
    verbose = FALSE
)
fileProfileGDS
a character string representing the file name of the Profile GDS file containing the information about the reference profiles used to generate the synthetic profiles.
listSampleRef
a vector of character strings representing the identifiers of the selected 1KG profiles that will be used as references to generate the synthetic profiles.
profileID
a character string representing the profile identifier present in fileProfileGDS that will be used to generate the synthetic profiles.
studyDF
a data.frame containing the information about the study associated with the analysed sample(s). The data.frame must have these 2 columns: "study.id" and "study.desc". Both columns must be character strings (no factor). Other columns can be present, such as "study.platform", but won't be used.
nbSim
a single positive integer representing the number of simulations per combination of sample and 1KG reference. Default: 1L.
prefix
a single character string representing the prefix that is added to the name of each synthetic profile. The prefix enables the creation of multiple synthetic profiles using the same combination of sample and 1KG reference. Default: "".
verbose
a logical indicating if messages should be printed to show the different steps in the function. Default: FALSE.
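The constraints on studyDF, nbSim and prefix can be checked before calling prepSynthetic(). The helper below, checkSyntheticInputs(), is hypothetical and not part of RAIDS; it is a sketch of the documented requirements only, and the function itself may perform different or additional validation.

```r
## Hypothetical pre-flight check (not a RAIDS function)
checkSyntheticInputs <- function(studyDF, nbSim=1L, prefix="") {
    required <- c("study.id", "study.desc")
    stopifnot(
        is.data.frame(studyDF),
        all(required %in% colnames(studyDF)),
        ## Required columns must be character strings, not factors
        all(vapply(studyDF[required], is.character, logical(1))),
        ## nbSim: a single positive whole number
        ## (this sketch also accepts a whole-valued double such as 2)
        is.numeric(nbSim), length(nbSim) == 1L,
        !is.na(nbSim), nbSim >= 1, nbSim == round(nbSim),
        ## prefix: a single character string
        is.character(prefix), length(prefix) == 1L
    )
    invisible(TRUE)
}

## A study description that satisfies the documented constraints
studyDF <- data.frame(study.id="MYDATA.Synthetic",
    study.desc="MYDATA synthetic data", stringsAsFactors=FALSE)
checkSyntheticInputs(studyDF, nbSim=1L, prefix="synthetic")
```

Passing stringsAsFactors=FALSE when building the data.frame keeps the two required columns as character strings regardless of the R version in use.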
0L when successful.
## Required library
library(gdsfmt)
## Path to the directory of demo files located in this package
dataDir <- system.file("extdata/tests", package="RAIDS")
## Temporary Profile GDS file
fileNameGDS <- file.path(tempdir(), "ex1.gds")
## Copy the demo Profile GDS file that has been pruned and annotated
file.copy(file.path(dataDir, "ex1_demo_with_pruning_and_1KG_annot.gds"),
    fileNameGDS)
#> [1] TRUE
## Information about the synthetic data set
syntheticStudyDF <- data.frame(study.id="MYDATA.Synthetic",
    study.desc="MYDATA synthetic data", study.platform="PLATFORM",
    stringsAsFactors=FALSE)
## Add information related to the synthetic profiles into the Profile GDS
prepSynthetic(fileProfileGDS=fileNameGDS,
    listSampleRef=c("HG00243", "HG00150"), profileID="ex1",
    studyDF=syntheticStudyDF, nbSim=1L, prefix="synthetic",
    verbose=FALSE)
#> [1] 0
## Open Profile GDS file
profileGDS <- openfn.gds(fileNameGDS)
## The synthetic profiles should be added in the 'study.annot' entry
tail(read.gdsn(index.gdsn(profileGDS, "study.annot")))
#>                     data.id case.id sample.type diagnosis    source
#> 154                 NA20908 NA20908   Reference Reference      IGSR
#> 155                 NA20872 NA20872   Reference Reference      IGSR
#> 156                 NA20906 NA20906   Reference Reference      IGSR
#> 157                 NA20875 NA20875   Reference Reference      IGSR
#> 158 synthetic.ex1.HG00243.1 HG00243   Synthetic    Cancer Synthetic
#> 159 synthetic.ex1.HG00150.1 HG00150   Synthetic    Cancer Synthetic
#>             study.id
#> 154          Ref.1KG
#> 155          Ref.1KG
#> 156          Ref.1KG
#> 157          Ref.1KG
#> 158 MYDATA.Synthetic
#> 159 MYDATA.Synthetic
## The synthetic study information should be added to
## the 'study.list' entry
tail(read.gdsn(index.gdsn(profileGDS, "study.list")))
#>           study.id                          study.desc        study.platform
#> 1           MYDATA                         Description              PLATFORM
#> 2          Ref.1KG Unrelated samples from 1000 Genomes GRCh38 1000 genotypes
#> 3 MYDATA.Synthetic               MYDATA synthetic data             Synthetic
## Close GDS file (important)
closefn.gds(profileGDS)
## Remove Profile GDS file (created for demo purpose)
unlink(fileNameGDS, force=TRUE)
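Once the 'study.annot' table has been read into a data.frame, the synthetic profiles can be isolated with ordinary subsetting. The sketch below uses a hand-built data.frame (annot, a stand-in copied from part of the output shown above rather than read from a GDS file) so that it runs without RAIDS or gdsfmt.

```r
## Stand-in for read.gdsn(index.gdsn(profileGDS, "study.annot"))
annot <- data.frame(
    data.id=c("NA20875", "synthetic.ex1.HG00243.1",
        "synthetic.ex1.HG00150.1"),
    case.id=c("NA20875", "HG00243", "HG00150"),
    sample.type=c("Reference", "Synthetic", "Synthetic"),
    study.id=c("Ref.1KG", "MYDATA.Synthetic", "MYDATA.Synthetic"),
    stringsAsFactors=FALSE)

## Keep only the rows added for the synthetic study
synthetic <- annot[annot$study.id == "MYDATA.Synthetic", ]

## Map each synthetic profile back to its 1KG reference (case.id column)
refBySynthetic <- setNames(synthetic$case.id, synthetic$data.id)
refBySynthetic
```

Filtering on study.id rather than on the name prefix keeps the selection valid even when several batches with different prefixes share one study.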