R/processStudy.R
createStudy2GDS1KG.Rd
The function uses the information from the Reference GDS file
and the RDS Sample Description file to create the Profile GDS files. One
Profile GDS file is created for each entry present in the
listProfiles parameter.
a character string representing the path to the
directory containing the VCF output of SNP-pileup for each sample. The
SNP-pileup files must be compressed (gz files) and be named after the
sample identifiers. A sample with the "Name.ID" identifier would have an
associated file called:
if genoSource is "VCF", then "Name.ID.vcf.gz";
if genoSource is "generic", then "Name.ID.generic.txt.gz";
if genoSource is "snp-pileup", then "Name.ID.txt.gz".
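The naming rules above can be sketched as a small helper. The function expectedGenoFile() is hypothetical and is not part of RAIDS; it only illustrates how the expected file name follows from the sample identifier and the genoSource value, and the directory used in the call is a placeholder.

```r
## Hypothetical helper (not part of RAIDS): builds the file name expected
## for one sample, following the naming rules described above
expectedGenoFile <- function(pathGeno, nameID,
                        genoSource=c("snp-pileup", "generic", "VCF")) {
    ## Reject any value other than the three documented ones
    genoSource <- match.arg(genoSource)
    suffix <- switch(genoSource,
                        "VCF"=".vcf.gz",
                        "generic"=".generic.txt.gz",
                        "snp-pileup"=".txt.gz")
    file.path(pathGeno, paste0(nameID, suffix))
}

## Placeholder directory, for illustration only
expectedGenoFile("path/to/geno", "Name.ID", "generic")
#> [1] "path/to/geno/Name.ID.generic.txt.gz"
```

Checking file.exists() on the returned path before running the analysis is one way to catch a misnamed genotype file early.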
a character string representing the path to the
RDS file that contains the information about the sample to analyse.
The RDS file must
include a data.frame with these mandatory columns: "Name.ID",
"Case.ID", "Sample.Type", "Diagnosis", "Source". All columns must be
character strings (no factor). The data.frame
must contain the information for all the profiles passed in the
listProfiles parameter. Only one of filePedRDS or pedStudy
can be defined.
a data.frame with these mandatory columns: "Name.ID",
"Case.ID", "Sample.Type", "Diagnosis", "Source". All columns must be
character strings (no factor). The data.frame
must contain the information for all the profiles passed in the
listProfiles parameter. Only one of filePedRDS or pedStudy
can be defined.
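The requirements on the sample description data.frame can be checked before calling the function. The sketch below is only an illustration: checkPedStudy() and its profileIDs argument are hypothetical names, not part of RAIDS, and the function may perform its own validation with different messages.

```r
## Hypothetical pre-check (not part of RAIDS) of the sample description
## data.frame against the requirements listed above
checkPedStudy <- function(pedStudy, profileIDs=NULL) {
    mandatory <- c("Name.ID", "Case.ID", "Sample.Type", "Diagnosis",
                        "Source")
    ## All mandatory columns must be present
    missingCols <- setdiff(mandatory, colnames(pedStudy))
    if (length(missingCols) > 0) {
        stop("Missing mandatory column(s): ",
                paste(missingCols, collapse=", "))
    }
    ## All mandatory columns must be character strings (no factor)
    isChar <- vapply(pedStudy[, mandatory, drop=FALSE], is.character,
                        logical(1))
    if (!all(isChar)) {
        stop("Non-character column(s): ",
                paste(mandatory[!isChar], collapse=", "))
    }
    ## Every requested profile must be described in the data.frame
    missingIDs <- setdiff(profileIDs, pedStudy$Name.ID)
    if (length(missingIDs) > 0) {
        stop("Profile(s) absent from Name.ID: ",
                paste(missingIDs, collapse=", "))
    }
    invisible(TRUE)
}

ped <- data.frame(Name.ID=c("ex1", "ex2"),
            Case.ID=c("Patient_h11", "Patient_h12"),
            Sample.Type=rep("Primary Tumor", 2),
            Diagnosis=rep("Cancer", 2),
            Source=rep("Databank B", 2), stringsAsFactors=FALSE)
checkPedStudy(ped, profileIDs=c("ex1"))
```

The same data.frame can be written with saveRDS() when the file-based parameter is preferred over passing the data.frame directly.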
a character string representing the file name of
the Reference GDS file. The file must exist.
a single positive integer representing the current
identifier for the batch. Beware, this field is not stored anymore.
Default: 1.
a data.frame containing the information about the
study associated to the analysed sample(s). The data.frame must have
those 3 columns: "study.id", "study.desc", "study.platform". All columns
must be in character strings (no factor).
a vector of character strings corresponding
to the profile identifiers that will have a Profile GDS file created. The
profile identifiers must be present in the "Name.ID" column of the Profile
RDS file passed to the filePedRDS parameter.
If NULL, all profiles present in filePedRDS are selected.
Default: NULL.
a character string representing the path to
the directory where the Profile GDS files will be created.
Default: NULL.
a character string with three possible values:
'snp-pileup', 'generic' or 'VCF'. It specifies whether the genotype files
are generated by snp-pileup (Facets), are generic format CSV files
with at least these columns:
'Chromosome', 'Position', 'Ref', 'Alt', 'Count', 'File1R' and 'File1A',
or are VCF files.
In the generic format, 'Count' is the depth at the specified position;
'File1R' is the depth of the reference allele and
'File1A' is the depth of the specific alternative allele.
A VCF file must contain at least these genotype
fields: GT, AD, DP.
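As an illustration of the generic format, the sketch below writes a toy compressed CSV file with the seven listed columns and reads it back to confirm that the columns are present. All values are made up, and the chromosome naming ("chr1") and the relation between 'Count', 'File1R' and 'File1A' are assumptions; the text above only specifies the column names and their meaning.

```r
## Toy generic-format table; the values are invented for illustration
generic <- data.frame(Chromosome=c("chr1", "chr1"),
                Position=c(10583L, 13110L),
                Ref=c("G", "G"), Alt=c("A", "A"),
                Count=c(20L, 15L),
                File1R=c(12L, 15L), File1A=c(8L, 0L),
                stringsAsFactors=FALSE)

## Write a gz-compressed CSV file named after the sample identifier
fileGeneric <- file.path(tempdir(), "Name.ID.generic.txt.gz")
con <- gzfile(fileGeneric, "w")
write.csv(generic, con, row.names=FALSE, quote=FALSE)
close(con)

## Read the file back and check that the mandatory columns are present
readBack <- read.csv(fileGeneric, stringsAsFactors=FALSE)
mandatory <- c("Chromosome", "Position", "Ref", "Alt", "Count",
                "File1R", "File1A")
all(mandatory %in% colnames(readBack))

## Remove the toy file
unlink(fileGeneric, force=TRUE)
```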
a logical indicating if message information should be
printed. Default: FALSE.
The function returns 0L when successful.
## The demo 1KG GDS file is located in this package
dataDir <- system.file("extdata/tests", package="RAIDS")
fileGDS <- file.path(dataDir, "ex1_good_small_1KG.gds")
## The data.frame containing the information about the study
## The 3 mandatory columns: "study.id", "study.desc", "study.platform"
## The entries should be strings, not factors (stringsAsFactors=FALSE)
studyDF <- data.frame(study.id = "MYDATA",
                study.desc = "Description",
                study.platform = "PLATFORM",
                stringsAsFactors = FALSE)
## The data.frame containing the information about the samples
## The entries should be strings, not factors (stringsAsFactors=FALSE)
samplePED <- data.frame(Name.ID=c("ex1", "ex2"),
                Case.ID=c("Patient_h11", "Patient_h12"),
                Diagnosis=rep("Cancer", 2),
                Sample.Type=rep("Primary Tumor", 2),
                Source=rep("Databank B", 2), stringsAsFactors=FALSE)
rownames(samplePED) <- samplePED$Name.ID
## Create the Profile GDS file for the profiles in the 'listProfiles' vector
## (in this case, profile "ex1")
## The Profile GDS file is created in the pathProfileGDS directory
result <- createStudy2GDS1KG(pathGeno=dataDir,
            pedStudy=samplePED, fileNameGDS=fileGDS,
            studyDF=studyDF, listProfiles=c("ex1"),
            pathProfileGDS=tempdir(),
            genoSource="snp-pileup",
            verbose=FALSE)
## The function returns 0L when successful
result
#> [1] 0
## The Profile GDS file 'ex1.gds' has been created in the
## specified directory
list.files(tempdir())
#> [1] "downlit" "ex1.gds" "file4d2deb02dfc"
## Remove Profile GDS file (created for demo purpose)
unlink(file.path(tempdir(), "ex1.gds"), force=TRUE)