R/synthetic.R
computeSyntheticROC.Rd
The function calculates the AUROC of the inferences for specific values of D and K using the inferred ancestry results from the synthetic profiles. The calculations are done on each super-population separately as well as on all the results together.
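As a rough illustration of a per-super-population AUROC, the base R sketch below scores each super-population one-vs-rest. It assumes an indicator encoding: the label is 1 when the known ancestry equals the super-population, and the score is 1 when the inferred ancestry equals it. The toy vectors and the helper `aucRank` are hypothetical and are not part of RAIDS. The package itself relies on the `roc` function, as the example output shows, so this only approximates the idea.

```r
## Toy known and inferred super-populations (hypothetical values)
known    <- c("EAS", "EAS", "EUR", "EUR", "AFR", "AFR")
inferred <- c("EAS", "EUR", "EUR", "EUR", "AFR", "AFR")

## AUC from the Mann-Whitney rank statistic; tied scores get average ranks
## (hypothetical helper, not a RAIDS function)
aucRank <- function(label, score) {
    r <- rank(score)
    nPos <- sum(label == 1)
    nNeg <- sum(label == 0)
    (sum(r[label == 1]) - nPos * (nPos + 1) / 2) / (nPos * nNeg)
}

## One-vs-rest AUC for each super-population present in the toy data
aucByPop <- vapply(c("EAS", "EUR", "AFR"), function(pop) {
    aucRank(as.integer(known == pop), as.integer(inferred == pop))
}, numeric(1))

print(aucByPop)
```

With a binary score, this rank-based AUC reduces to the mean of sensitivity and specificity for the super-population.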
computeSyntheticROC(
matKNN,
matKNNAncestryColumn,
pedCall,
pedCallAncestryColumn,
listCall = c("EAS", "EUR", "AFR", "AMR", "SAS")
)
matKNN
a data.frame containing the inferred ancestry results for fixed values of D and K. One of the column names of the data.frame must correspond to the matKNNAncestryColumn argument.
matKNNAncestryColumn
a character string representing the name of the column that contains the inferred ancestry for the specified synthetic profiles. The column must be present in the matKNN argument.
pedCall
a data.frame containing the super-population information from the 1KG GDS file for the profiles used to generate the synthetic profiles. The data.frame must contain a column named as specified by the pedCallAncestryColumn argument. The row names must correspond to the sample identifiers (mandatory).
pedCallAncestryColumn
a character string representing the name of the column that contains the known ancestry for the reference profiles in the Reference GDS file. The column must be present in the pedCall argument.
listCall
a vector of character strings representing the list of all possible ancestry assignments. Default: c("EAS", "EUR", "AFR", "AMR", "SAS").
a list containing 3 entries:
matAUROC.All
a data.frame containing the AUROC for all the ancestry results.
matAUROC.Call
a data.frame containing the AUROC information for each super-population.
listROC.Call
a list containing the output from the roc function for each super-population.
## Loading demo dataset containing pedigree information for synthetic
## profiles and known ancestry of the profiles used to generate the
## synthetic profiles
data(pedSynthetic)
## Loading demo dataset containing the inferred ancestry results
## for the synthetic data
data(matKNNSynthetic)
## The inferred ancestry results for the synthetic data using
## values of D=5 and K=6
matKNN <- matKNNSynthetic[matKNNSynthetic$K == 6 & matKNNSynthetic$D == 5, ]
## Compile statistics from the
## synthetic profiles for fixed values of D and K
results <- RAIDS:::computeSyntheticROC(matKNN=matKNN,
matKNNAncestryColumn="SuperPop",
pedCall=pedSynthetic, pedCallAncestryColumn="superPop",
listCall=c("EAS", "EUR", "AFR", "AMR", "SAS"))
results$matAUROC.All
#> pcaD K ROC.AUC ROC.CI N NBNA
#> 1 5 6 0.6883929 0 52 0
results$matAUROC.Call
#> pcaD K Call L AUC H
#> 1 5 6 EAS 0.5197913 0.6904762 0.8611611
#> 2 5 6 EUR 0.4807257 0.6547619 0.8287981
#> 3 5 6 AFR 0.8168697 0.9154135 1.0000000
#> 4 5 6 AMR 0.4009287 0.5681818 0.7354350
#> 5 5 6 SAS 0.4729463 0.6404762 0.8080061
results$listROC.Call
#> $EAS
#>
#> Call:
#> roc.formula(formula = fCur ~ predMat[, j], ci = TRUE, quiet = TRUE)
#>
#> Data: predMat[, j] in 42 controls (fCur 0) < 10 cases (fCur 1).
#> Area under the curve: 0.6905
#> 95% CI: 0.5198-0.8612 (DeLong)
#>
#> $EUR
#>
#> Call:
#> roc.formula(formula = fCur ~ predMat[, j], ci = TRUE, quiet = TRUE)
#>
#> Data: predMat[, j] in 42 controls (fCur 0) < 10 cases (fCur 1).
#> Area under the curve: 0.6548
#> 95% CI: 0.4807-0.8288 (DeLong)
#>
#> $AFR
#>
#> Call:
#> roc.formula(formula = fCur ~ predMat[, j], ci = TRUE, quiet = TRUE)
#>
#> Data: predMat[, j] in 38 controls (fCur 0) < 14 cases (fCur 1).
#> Area under the curve: 0.9154
#> 95% CI: 0.8169-1 (DeLong)
#>
#> $AMR
#>
#> Call:
#> roc.formula(formula = fCur ~ predMat[, j], ci = TRUE, quiet = TRUE)
#>
#> Data: predMat[, j] in 44 controls (fCur 0) < 8 cases (fCur 1).
#> Area under the curve: 0.5682
#> 95% CI: 0.4009-0.7354 (DeLong)
#>
#> $SAS
#>
#> Call:
#> roc.formula(formula = fCur ~ predMat[, j], ci = TRUE, quiet = TRUE)
#>
#> Data: predMat[, j] in 42 controls (fCur 0) < 10 cases (fCur 1).
#> Area under the curve: 0.6405
#> 95% CI: 0.4729-0.808 (DeLong)
#>
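The per-super-population table can be filtered like any other data.frame. The sketch below rebuilds the `matAUROC.Call` values printed above by hand, so that it runs without the package, and picks out the super-populations of interest. Reading `L` and `H` as the lower and upper bounds of the 95% confidence interval is an assumption, based on their agreement with the intervals printed in `listROC.Call`. The names `aboveChance` and `weakest` are illustrative only.

```r
## Values copied from the matAUROC.Call output shown above
matAUROCCall <- data.frame(
    pcaD = 5, K = 6,
    Call = c("EAS", "EUR", "AFR", "AMR", "SAS"),
    L    = c(0.5197913, 0.4807257, 0.8168697, 0.4009287, 0.4729463),
    AUC  = c(0.6904762, 0.6547619, 0.9154135, 0.5681818, 0.6404762),
    H    = c(0.8611611, 0.8287981, 1.0000000, 0.7354350, 0.8080061),
    stringsAsFactors = FALSE)

## Super-populations whose lower bound (assumed to be L) exceeds 0.5
aboveChance <- matAUROCCall$Call[matAUROCCall$L > 0.5]

## Super-population with the lowest AUC
weakest <- matAUROCCall$Call[which.min(matAUROCCall$AUC)]

print(aboveChance)
print(weakest)
```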