Description
Hello, I noticed an issue while using kBET on a subsample of my data: the kBET results differ depending on whether I declare batch as a character vector or as a factor.
An example is attached below.
Here I declare batch as a factor:
batch <- setNames(as.factor(metadatad$batch), metadatad$Cell.ID)
batch_tmp <- batch[clusters == "80"] #example with cluster 80
class(batch_tmp)
[1] "factor"
kBET_tmp.factor <- kBET(df=data_tmp, batch=batch_tmp, plot=FALSE, verbose=TRUE)
Initial neighbourhood size is set to 100.
reducing dimensions with svd first...
finding knns...done. Time:
user system elapsed
0.045 0.002 0.048
KNN input is a list, extracting nearest neighbour index.
Number of kBET tests is set to 54.
There are 62 cells (11.567%) that do not appear in any neighbourhood.
The expected frequencies for each category have been adapted.
Cell indexes are saved to result list.
Determining optimal neighbourhood size ...done.
New size of neighbourhood is set to 32.
kBET_tmp.factor$summary$kBET.observed
[1] NaN NA NA NA
If I declare batch as a character vector:
batch_vector<-setNames(as.character(batch_tmp),names(batch_tmp))
class(batch_vector)
[1] "character"
kBET_tmp.vector <- kBET(df=data_tmp, batch=batch_vector, plot=FALSE, verbose=TRUE)
Initial neighbourhood size is set to 100.
reducing dimensions with svd first...
finding knns...done. Time:
user system elapsed
0.045 0.003 0.047
KNN input is a list, extracting nearest neighbour index.
Number of kBET tests is set to 54.
There are 62 cells (11.567%) that do not appear in any neighbourhood.
The expected frequencies for each category have been adapted.
Cell indexes are saved to result list.
Determining optimal neighbourhood size ...done.
New size of neighbourhood is set to 21.
There were 40 warnings (use warnings() to see them)
warnings()
In full.classes[class.freq$class %in% names(freq.env)] <- freq.env :
number of items to replace is not a multiple of replacement length
kBET_tmp.vector$summary$kBET.observed
[1] 0.01666667 0.00000000 0.01851852 0.05555556
And this is the distribution of the batches within the cluster:
table(batch_vector)
A B C D
2 526 7 1
So I was wondering: is there a recommended format for batch so that kBET runs properly, and why do I get different results between the two?
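
One difference between the two inputs that may be worth checking: subsetting a factor in base R keeps all of its original levels, including levels with zero cells in the subset, whereas the character conversion only carries the labels that are actually present. The sketch below uses toy data (the batch "E", the cell names, and the cluster labels are hypothetical, and the post does not say whether the full dataset contains batches missing from cluster 80) to show how to detect and drop unused levels. Whether this explains the kBET difference is an assumption, not something confirmed here.

```r
# Toy data: a factor with batches A-E, where cluster "80" only
# contains cells from batches A-D (hypothetical values).
batch <- factor(c("A", "B", "B", "C", "D", "E", "E"))
names(batch) <- paste0("cell", seq_along(batch))
clusters <- c("80", "80", "80", "80", "80", "12", "12")

# Subsetting a factor keeps every original level, even empty ones.
batch_tmp <- batch[clusters == "80"]
levels(batch_tmp)   # still lists "E"
table(batch_tmp)    # "E" appears with a count of 0

# droplevels() removes the levels that have no observations left.
batch_clean <- droplevels(batch_tmp)

# The character version only ever contains the labels present.
batch_chr <- setNames(as.character(batch_tmp), names(batch_tmp))

# After dropping unused levels, both representations agree on the
# set of batches.
identical(levels(batch_clean), sort(unique(batch_chr)))
```

If `table(batch_tmp)` on the real data shows zero-count batches, passing `droplevels(batch_tmp)` to kBET would be a simple way to test whether the factor and character inputs then agree.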