Format of batch input: vector or factor?

Hello, I noticed an issue while using kBET on a subsample of my data: the kBET results differ depending on whether I declare batch as a character vector or as a factor.
An example is attached below.


Here **I declare batch as a factor:**

>batch <- setNames(as.factor(metadatad$batch), metadatad$Cell.ID) 
>batch_tmp <- batch[clusters == "80"] #example with cluster 80
>class(batch_tmp)

[1] "factor"

>kBET_tmp.factor <- kBET(df=data_tmp, batch=batch_tmp, plot=FALSE, verbose=TRUE)

Initial neighbourhood size is set to 100.
reducing dimensions with svd first...
finding knns...done. Time:
   user  system elapsed 
  0.045   0.002   0.048 
KNN input is a list, extracting nearest neighbour index.
Number of kBET tests is set to 54.
There are 62 cells (11.567%) that do not appear in any neighbourhood.
The expected frequencies for each category have been adapted.
Cell indexes are saved to result list.
Determining optimal neighbourhood size ...done.
New size of neighbourhood is set to 32.

> kBET_tmp.factor$summary$kBET.observed

[1] NaN  NA  NA  NA



**If I declare batch as a character vector:**

>batch_vector<-setNames(as.character(batch_tmp),names(batch_tmp))
>class(batch_vector)

[1] "character"

>kBET_tmp.vector <- kBET(df=data_tmp, batch=batch_vector, plot=FALSE, verbose=TRUE)

Initial neighbourhood size is set to 100.
reducing dimensions with svd first...
finding knns...done. Time:
   user  system elapsed 
  0.045   0.003   0.047 
KNN input is a list, extracting nearest neighbour index.
Number of kBET tests is set to 54.
There are 62 cells (11.567%) that do not appear in any neighbourhood.
The expected frequencies for each category have been adapted.
Cell indexes are saved to result list.
Determining optimal neighbourhood size ...done.
New size of neighbourhood is set to 21.
_There were 40 warnings (use warnings() to see them)_

>warnings()
In full.classes[class.freq$class %in% names(freq.env)] <- freq.env :
  number of items to replace is not a multiple of replacement length

>kBET_tmp.vector$summary$kBET.observed

[1] 0.01666667 0.00000000 0.01851852 0.05555556

And this is the distribution of each batch in the cluster:

>table(batch_vector)
     A   B   C   D 
     2 526   7   1 
     
     
So I was wondering: is there a recommended format for batch so that kBET runs properly, and why do I get different results?
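
One difference between the two encodings that may be worth checking is a base R behaviour: subsetting a factor keeps all of its original levels, including those with zero cells in the subset, while a character vector only carries the values actually present. Whether this is what kBET trips over here is an assumption and has not been checked against the kBET source. The sketch below uses hypothetical toy data (the extra levels `E` and `F` are invented for illustration) with the same counts for A to D as in the table above.

```r
# Toy data (hypothetical, not the real metadata): six batch levels overall,
# but only A-D occur in the subset, mirroring the counts reported above.
batch_all <- factor(c(rep("A", 2), rep("B", 526), rep("C", 7), "D", "E", "F"))
keep <- batch_all %in% c("A", "B", "C", "D")

batch_factor <- batch_all[keep]             # factor subset keeps unused levels
batch_char   <- as.character(batch_factor)  # character vector has no levels

nlevels(batch_factor)        # 6: E and F remain as levels with zero cells
length(unique(batch_char))   # 4
table(batch_factor)          # includes zero counts for E and F

# droplevels() removes the unused levels, so both encodings describe
# the same four batches.
batch_clean <- droplevels(batch_factor)
nlevels(batch_clean)         # 4
```

If `levels(batch_tmp)` lists more batches than `table(batch_vector)` does, passing `droplevels(batch_tmp)` to kBET would be a quick way to test whether the unused levels explain the `NaN`/`NA` result.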





