Format of batch input: vector or factor? #83

@martina811

Hello, I noticed an issue while running kBET on a subsample of my data: the result of kBET differs depending on whether I declare batch as a character vector or as a factor.
An example is attached below.

Here I declare batch as a factor:

batch <- setNames(as.factor(metadatad$batch), metadatad$Cell.ID)
batch_tmp <- batch[clusters == "80"] #example with cluster 80
class(batch_tmp)

[1] "factor"

kBET_tmp.factor <- kBET(df=data_tmp, batch=batch_tmp, plot=FALSE, verbose=TRUE)

Initial neighbourhood size is set to 100.
reducing dimensions with svd first...
finding knns...done. Time:
user system elapsed
0.045 0.002 0.048
KNN input is a list, extracting nearest neighbour index.
Number of kBET tests is set to 54.
There are 62 cells (11.567%) that do not appear in any neighbourhood.
The expected frequencies for each category have been adapted.
Cell indexes are saved to result list.
Determining optimal neighbourhood size ...done.
New size of neighbourhood is set to 32.

kBET_tmp.factor$summary$kBET.observed

[1] NaN NA NA NA

If I declare batch as vector:

batch_vector <- setNames(as.character(batch_tmp), names(batch_tmp))
class(batch_vector)

[1] "character"

kBET_tmp.vector <- kBET(df=data_tmp, batch=batch_vector, plot=FALSE, verbose=TRUE)

Initial neighbourhood size is set to 100.
reducing dimensions with svd first...
finding knns...done. Time:
user system elapsed
0.045 0.003 0.047
KNN input is a list, extracting nearest neighbour index.
Number of kBET tests is set to 54.
There are 62 cells (11.567%) that do not appear in any neighbourhood.
The expected frequencies for each category have been adapted.
Cell indexes are saved to result list.
Determining optimal neighbourhood size ...done.
New size of neighbourhood is set to 21.
There were 40 warnings (use warnings() to see them)

warnings()
In full.classes[class.freq$class %in% names(freq.env)] <- freq.env :
number of items to replace is not a multiple of replacement length

kBET_tmp.vector$summary$kBET.observed

[1] 0.01666667 0.00000000 0.01851852 0.05555556
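The warning reported above is base R's generic message for an indexed assignment in which the number of selected positions is not a multiple of the length of the replacement. The following is a minimal sketch that reproduces the message with toy data only; `full_classes`, `keep`, and `replacement` are hypothetical stand-ins and are not kBET's internal objects.

```r
# Toy stand-ins for illustration; these are NOT kBET's internal objects.
full_classes <- setNames(numeric(4), c("A", "B", "C", "D"))

# Logical index that selects three positions (A, B, C).
keep <- names(full_classes) %in% c("A", "B", "C")

# Replacement with only two values: 3 is not a multiple of 2.
replacement <- c(0.5, 0.5)

# Capture the warning message instead of letting it print to the console.
msg <- tryCatch(
  {
    full_classes[keep] <- replacement
    NA_character_
  },
  warning = function(w) conditionMessage(w)
)
print(msg)
```

In other words, the message indicates that the set of classes selected on the left-hand side and the vector of frequencies on the right-hand side had different lengths in at least some of the tests.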

This is the distribution of batches within the cluster:

table(batch_vector)
A B C D
2 526 7 1

Is there a recommended format for batch so that kBET runs correctly, and why do the two formats give different results?
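One difference between the two formats that can be checked in base R: subsetting a factor keeps every level of the original factor, including levels with zero observations in the subset, whereas a character vector only carries the labels that are actually present. The sketch below uses toy data that mimics the counts above; the extra batch "E" is hypothetical, and whether the real data contain such unused levels (and whether kBET treats them differently) is an assumption that is not confirmed here. Running table(batch_tmp) on the real factor would show whether any zero-count levels exist.

```r
# Toy data mimicking the reported counts. Batch "E" is a hypothetical batch
# that exists in the full data set but not in the cluster of interest.
batch_all <- factor(c(rep("A", 2), rep("B", 526), rep("C", 7), "D", rep("E", 10)))
in_cluster <- batch_all != "E"

# Subsetting a factor keeps the unused level "E".
batch_factor <- batch_all[in_cluster]

# Converting to character keeps only the labels that are present.
batch_char <- as.character(batch_factor)

print(table(batch_factor))  # five categories, "E" with a count of 0
print(table(batch_char))    # four categories only

# droplevels() removes unused levels, so both formats describe the same batches.
batch_clean <- droplevels(batch_factor)
print(identical(levels(batch_clean), sort(unique(batch_char))))
```

If unused levels turn out to be present, applying droplevels() to the subset before the call is a cheap way to test whether they account for the discrepancy.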
