R/processStudy.R
computePCARefSample.Rd
This function generates a PCA using the known reference profiles. Then, it projects the specified profile onto the PCA axes.
computePCARefSample(
gdsProfile,
currentProfile,
studyIDRef = "Ref.1KG",
np = 1L,
algorithm = c("exact", "randomized"),
eigenCount = 32L,
missingRate = NaN,
verbose = FALSE
)
gdsProfile
an object of class gds.class, an opened Profile GDS file.
currentProfile
a single character string representing the profile identifier.
studyIDRef
a single character string representing the study identifier. Default: "Ref.1KG".
np
a single positive integer representing the number of CPUs that will be used. Default: 1L.
algorithm
a character string representing the algorithm used to calculate the PCA. The 2 choices are "exact" (traditional exact calculation) and "randomized" (fast PCA with the randomized algorithm introduced in Galinsky et al. 2016). Default: "exact".
eigenCount
a single integer indicating the number of eigenvectors that will be in the output of the snpgdsPCA function; if 'eigen.cnt' <= 0, then all eigenvectors are returned. Default: 32L.
missingRate
a numeric value representing the missing rate threshold at which the SNVs are discarded; only the SNVs with a missing rate "<= missingRate" are retained in snpgdsPCA; if NaN, no missing rate threshold is applied. Default: NaN.
verbose
a logical indicating whether messages should be printed to show the different steps in the function. Default: FALSE.
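The usage shows algorithm = c("exact", "randomized") while the documented default is "exact". This matches the common R idiom in which a vector of choices is resolved with match.arg(), the first element being the default. The sketch below illustrates that idiom only; pickAlgorithm is a hypothetical helper, and whether computePCARefSample resolves the argument this way internally is an assumption.

```r
## Hypothetical helper, not part of RAIDS: shows how a vector of
## choices in a signature resolves to a single value with match.arg()
pickAlgorithm <- function(algorithm = c("exact", "randomized")) {
    ## When the argument is left at its default, the first choice is returned;
    ## otherwise the supplied value must match one of the choices
    match.arg(algorithm)
}

pickAlgorithm()               ## resolves to the first choice
pickAlgorithm("randomized")   ## an explicit choice is returned unchanged
```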
a list containing 3 entries:
sample.id
a character string representing the unique identifier of the analyzed profile.
eigenvector.ref
a numeric matrix representing the eigenvectors of the reference profiles.
eigenvector
a numeric matrix representing the eigenvectors of the analyzed profile.
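As a sketch of how the returned list might be used downstream, the code below builds a mock list with the same three entries and finds the reference profile closest to the analyzed profile in PCA space. The values, the reference names (ref1 to ref4) and the choice of Euclidean distance are illustrative assumptions; this is not an analysis step prescribed by the package.

```r
## Mock result with the same three entries as the documented return value
## (all values are simulated; "ref1".."ref4" are made-up reference names)
set.seed(1)
mockRes <- list(
    sample.id = "ex1",
    eigenvector.ref = matrix(rnorm(12), nrow = 4,
                             dimnames = list(paste0("ref", 1:4), NULL)),
    eigenvector = matrix(rnorm(3), nrow = 1,
                         dimnames = list("ex1", NULL)))

## Subtract the profile coordinates from each reference row
diffs <- sweep(mockRes$eigenvector.ref, 2, mockRes$eigenvector[1, ])

## Euclidean distance between the profile and each reference profile
dists <- sqrt(rowSums(diffs^2))

## Name of the closest reference profile
closestRef <- names(which.min(dists))
```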
Galinsky KJ, Bhatia G, Loh PR, Georgiev S, Mukherjee S, Patterson NJ, Price AL. Fast Principal-Component Analysis Reveals Convergent Evolution of ADH1B in Europe and East Asia. Am J Hum Genet. 2016 Mar 3;98(3):456-72. doi: 10.1016/j.ajhg.2015.12.022. Epub 2016 Feb 25.
## Required libraries
library(gdsfmt)
## snpgdsOpen() is provided by SNPRelate
library(SNPRelate)
library(RAIDS)
## The demo Profile GDS file is located in this package
dataDir <- system.file("extdata/demoAncestryCall", package="RAIDS")
## Open the Profile GDS file
gdsProfile <- snpgdsOpen(file.path(dataDir, "ex1.gds"))
## Project a profile onto a PCA generated using reference profiles
## The reference profiles come from 1KG
resPCA <- computePCARefSample(gdsProfile=gdsProfile,
    currentProfile=c("ex1"), studyIDRef="Ref.1KG", np=1L, verbose=FALSE)
resPCA$sample.id
#> [1] "ex1"
resPCA$eigenvector
#>            [,1]      [,2]       [,3]        [,4]        [,5]        [,6]
#> ex1 -0.03917926 0.0290796 -0.1861643 -0.05760641 -0.01053691 -0.08274071
#>          [,7]       [,8]         [,9]     [,10]      [,11]       [,12]
#> ex1 0.0777924 -0.2437205 -0.008855972 0.2156765 -0.1139829 -0.08007963
#>          [,13]    [,14]     [,15]      [,16]    [,17]      [,18]     [,19]
#> ex1 -0.1452985 0.233155 0.5753156 -0.1938115 0.504467 -0.8293339 0.5437238
#>          [,20]      [,21]      [,22]     [,23]      [,24]      [,25]     [,26]
#> ex1 -0.1480745 0.03492421 -0.2146903 0.1610501 -0.3487348 -0.2806519 0.4095053
#>          [,27]     [,28]     [,29]      [,30]      [,31]      [,32]
#> ex1 -0.1480394 -1.001517 0.2316207 -0.3235428 -0.3843232 -0.3291498
## Close the GDS file (important)
closefn.gds(gdsProfile)
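
The two steps named in the description (a PCA built from the reference profiles only, then projection of one profile onto its axes) can be illustrated with base R on simulated data. This is a conceptual sketch, not the RAIDS implementation, which relies on snpgdsPCA; the genotype-like matrices, their dimensions and the use of prcomp() and predict() are all assumptions made for illustration.

```r
## Conceptual sketch only: base R stand-in for "PCA on the references,
## then projection of one profile" (simulated data, not RAIDS internals)
set.seed(42)
snvNames <- paste0("snv", 1:50)

## Simulated genotype-like matrix: 20 reference profiles x 50 SNVs
refGeno <- matrix(rbinom(20 * 50, size = 2, prob = 0.3), nrow = 20,
                  dimnames = list(paste0("ref", 1:20), snvNames))

## One simulated profile to project, measured on the same SNVs
newGeno <- matrix(rbinom(50, size = 2, prob = 0.3), nrow = 1,
                  dimnames = list("ex1", snvNames))

## Step 1: PCA computed from the reference profiles only
pcaRef <- prcomp(refGeno, center = TRUE, scale. = FALSE)

## Step 2: project the new profile onto the reference PCA axes
projNew <- predict(pcaRef, newdata = newGeno)

## The projection is the centered profile multiplied by the loadings
manualProj <- sweep(newGeno, 2, pcaRef$center) %*% pcaRef$rotation
all.equal(unname(projNew), unname(manualProj))
```

Because the axes are estimated from the references alone, the projected profile does not influence the components it is placed on.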