R/processStudy.R
computeKNNRefSample.Rd
The function runs a k-nearest neighbors analysis on one specific profile. The function uses the knn function from the 'class' package.
listEigenvector
a list with 3 entries: 'sample.id', 'eigenvector.ref' and 'eigenvector'. The list represents the PCA done on the 1KG reference profiles and one specific profile projected onto it. The 'sample.id' entry must contain only one identifier (one profile).
listCatPop
a vector of character strings representing the list of possible ancestry assignments. Default: c("EAS", "EUR", "AFR", "AMR", "SAS").
spRef
a vector of character strings representing the known super population ancestry for the 1KG profiles. The 1KG profile identifiers are used as names for the vector.
fieldPopInfAnc
a character string representing the name of the column that will contain the inferred ancestry for the specified profile. Default: "SuperPop".
kList
a vector of integers representing the list of values tested for the K parameter. The K parameter represents the number of neighbors used in the k-nearest neighbors analysis. If NULL, the value seq(2, 15, 1) is assigned. Default: seq(2, 15, 1).
pcaList
a vector of integers representing the list of values tested for the D parameter. The D parameter represents the number of dimensions used in the PCA. If NULL, the value seq(2, 15, 1) is assigned. Default: seq(2, 15, 1).
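To make the expected input shapes concrete, here is a minimal sketch built from made-up toy data. The identifiers (ref1 to ref6, sample1), the random coordinates, and the use of row names on the matrices are assumptions for illustration only; they are not taken from the demo datasets. The last lines show how many (D, K) combinations the default kList and pcaList values produce.

```r
## Toy inputs only: identifiers and values are made up for illustration
set.seed(42)
refIDs <- paste0("ref", seq_len(6))

## PCA of the reference profiles: one row per profile, one column per dimension
eigRef <- matrix(rnorm(6 * 3), nrow=6, dimnames=list(refIDs, NULL))
## The single profile projected onto the same dimensions
eigSample <- matrix(rnorm(3), nrow=1, dimnames=list("sample1", NULL))

## The list with the 3 expected entries
toyPCA <- list(sample.id="sample1", eigenvector.ref=eigRef,
    eigenvector=eigSample)

## Known super populations, named by the reference profile identifiers
toySpRef <- c("EAS", "EAS", "EUR", "EUR", "AFR", "AFR")
names(toySpRef) <- refIDs

## Every (D, K) combination tested with the default kList and pcaList
grid <- expand.grid(K=seq(2, 15, 1), D=seq(2, 15, 1))
nrow(grid)
```

With the defaults, 14 values of K and 14 values of D give 196 combinations, so narrowing kList and pcaList, as the example further down does, keeps the result table short.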
a list containing 2 entries:
sample.id
a vector of character strings representing the identifier of the profile analysed.
matKNN
a data.frame containing the super population inference for the profile for different values of PCA dimensions D and k-neighbors values K. The title of the fourth column corresponds to the value of the fieldPopInfAnc parameter. The data.frame contains 4 columns:
sample.id
a character string representing the identifier of the profile analysed.
D
a numeric representing the value of the PCA dimension used to infer the ancestry.
K
a numeric representing the number of neighbors (k) used to infer the ancestry.
fieldPopInfAnc
a character string representing the inferred ancestry.
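The following sketch illustrates how a table with these four columns could be produced. It is not the package's implementation: it assumes that the first D principal components are used for each value of D, it calls knn() from the 'class' package directly, and all data (two toy groups labelled "EUR" and "AFR", the identifier "toy.profile") are made up.

```r
library(class)  # provides knn(); 'class' ships with standard R installations

set.seed(1)
## Toy reference PCA: two well-separated groups in 3 dimensions
refEig <- rbind(matrix(rnorm(30, mean=0, sd=0.1), ncol=3),
    matrix(rnorm(30, mean=5, sd=0.1), ncol=3))
refPop <- factor(rep(c("EUR", "AFR"), each=10), levels=c("EUR", "AFR"))
## One toy profile placed close to the second group
sampleEig <- matrix(c(5, 5, 5), nrow=1)

kList <- 2:4
pcaList <- 2:3

## One row per (D, K) pair, with the same four columns as matKNN
grid <- expand.grid(K=kList, D=pcaList)
matKNN <- data.frame(sample.id="toy.profile", D=grid$D, K=grid$K,
    SuperPop=NA_character_, stringsAsFactors=FALSE)

for (i in seq_len(nrow(matKNN))) {
    ## Assumption: keep the first D dimensions of both PCAs
    dims <- seq_len(matKNN$D[i])
    pred <- knn(train=refEig[, dims, drop=FALSE],
        test=sampleEig[, dims, drop=FALSE],
        cl=refPop, k=matKNN$K[i])
    matKNN$SuperPop[i] <- as.character(pred)
}
matKNN
```

Because the toy profile sits inside the second group, every (D, K) pair is expected to call "AFR" here; with real data the call can change from one pair to the next.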
## Load the demo PCA on the synthetic profiles projected on the
## demo 1KG reference PCA
data(demoPCASyntheticProfiles)
## Load the known ancestry for the demo 1KG reference profiles
data(demoKnownSuperPop1KG)
## The PCA with 1 profile projected on the 1KG reference PCA
## Only one profile is retained
pca <- demoPCASyntheticProfiles
pca$sample.id <- pca$sample.id[1]
pca$eigenvector <- pca$eigenvector[1, , drop=FALSE]
## Run the k-nearest neighbors analysis on the projected profile
results <- computeKNNRefSample(listEigenvector=pca,
    listCatPop=c("EAS", "EUR", "AFR", "AMR", "SAS"),
    spRef=demoKnownSuperPop1KG, fieldPopInfAnc="SuperPop",
    kList=seq(10, 15, 1), pcaList=seq(10, 15, 1))
## The ancestry assigned to the profile for different values of K and D
head(results$matKNN)
#>         sample.id  D  K SuperPop
#> 1 1.ex1.HG00246.1 10 10      SAS
#> 2 1.ex1.HG00246.1 10 11      SAS
#> 3 1.ex1.HG00246.1 10 12      SAS
#> 4 1.ex1.HG00246.1 10 13      SAS
#> 5 1.ex1.HG00246.1 10 14      SAS
#> 6 1.ex1.HG00246.1 10 15      EAS
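Since the call can differ between (D, K) pairs, as in the last row shown, one simple way to inspect the table is to count the calls per super population. The sketch below retypes the six printed rows by hand so that it runs on its own. The counting step is only an illustration and is not presented as how the package selects a final ancestry.

```r
## The six rows printed above, retyped by hand so the sketch is standalone
shown <- data.frame(sample.id="1.ex1.HG00246.1", D=10, K=10:15,
    SuperPop=c("SAS", "SAS", "SAS", "SAS", "SAS", "EAS"))

## Count how often each super population is called across the (D, K) pairs
calls <- table(shown$SuperPop)
calls

## Most frequent call among these rows
names(calls)[which.max(calls)]
```

On a real result, the same two lines can be applied to results$matKNN, using the column named by fieldPopInfAnc.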