Use the numerical generative process to calibrate the model #14

Open
CastielZhao opened this issue Jun 4, 2021 · 11 comments

@CastielZhao

Does the false positive rate we claim (e.g. 0.05) correspond to 5% false positives on our simulated data with no association and no outliers?

Calibration:

@stemangiola
Collaborator

stemangiola commented Jun 4, 2021

Calibrate inference of associations

  • Generate 100 datasets with the same total counts per subject (a vector of size M, where M is the number of subjects). For each dataset:
  • Number of subjects: 30; number of categories: 20
  • The design matrix has an intercept column and a factor of interest ranging between -1 and 1
  • Set up the coefficients to have the same intercept (for simplicity) and zero slope
  • Generate the data
  • Execute sccomp (see the homepage of this repository)
    • To install: devtools::install_github("stemangiola/sccomp")
    • library(sccomp)
    • Follow the README
  • Count how many categories were labelled as significantly changing. By default we use the 95% credible interval, which means that we expect 5% of calls to be false.
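The simulation steps above can be sketched in base R as follows. This is a minimal sketch of the null generative process only, not sccomp's own model: it assumes a multinomial draw with a softmax link, an intercept of 0 for every category, and per-subject totals drawn once between 500 and 5000 and then reused for all 100 datasets. The function `simulate_null_dataset` and the column `factor_of_interest` are hypothetical names.

```r
# Minimal sketch of the null simulation (assumptions, not sccomp's model):
# multinomial counts, softmax link, identical intercepts, zero slope.
simulate_null_dataset <- function(total_counts, n_categories = 20, intercept = 0) {
  n_subjects <- length(total_counts)

  # Design matrix: intercept column plus a factor of interest in [-1, 1]
  factor_of_interest <- seq(-1, 1, length.out = n_subjects)
  design <- cbind(intercept = 1, factor_of_interest = factor_of_interest)

  # Coefficients (2 x categories): same intercept everywhere, zero slope
  coefficients <- rbind(rep(intercept, n_categories), rep(0, n_categories))

  # Linear predictor (subjects x categories), softmax within each subject
  eta <- design %*% coefficients
  proportions <- exp(eta) / rowSums(exp(eta))

  # Multinomial draw per subject, keeping that subject's total fixed
  counts <- t(sapply(seq_len(n_subjects), function(i) {
    rmultinom(1, size = total_counts[i], prob = proportions[i, ])
  }))

  # Long format: one row per subject x category (column-major order of counts)
  data.frame(
    sample             = rep(paste0("subject_", seq_len(n_subjects)), times = n_categories),
    cell_group         = rep(paste0("category_", seq_len(n_categories)), each = n_subjects),
    factor_of_interest = rep(factor_of_interest, times = n_categories),
    count              = as.vector(counts)
  )
}

# 100 datasets, 30 subjects each, sharing the same total counts per subject
set.seed(2021)
total_counts <- sample(500:5000, 30, replace = TRUE)
datasets <- lapply(seq_len(100), function(i) simulate_null_dataset(total_counts))
```

Because the slope is zero, every category has the same expected proportion in every subject, so any category flagged as significantly changing in these datasets is a false positive.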

@CastielZhao
Author

CastielZhao commented Jun 7, 2021

"Setup coefficient to have same intercept (for simplicity), and zero slope"
Are there any other constraints on the coefficients, e.g. must they be integers, or lie in a given range?
Also, I assume that "zero slope" means coeff = (beta0, beta0, ..., beta0; beta1, beta1, ..., beta1), i.e. the first column is repeated 20 times.

@stemangiola
Collaborator

Run the code on the homepage of this repository and you will see which coefficients you get for a real dataset. You can take the range from those (except the intercept, which should be zero for this test).

@stemangiola
Collaborator

Whether the coefficients are integers or not makes no difference: the design matrix is multiplied by the coefficient matrix in exactly the same way.
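A quick base-R check of this point, using arbitrary illustrative values (a 30-subject design and 20 categories, as in the calibration plan): the linear predictor is just `design %*% coefficients`, so integer-valued and non-integer coefficients go through the same computation, and a zero slope gives every subject the same linear predictor for a given category.

```r
# Design: intercept plus a factor of interest in [-1, 1] for 30 subjects
design <- cbind(intercept = 1, factor_of_interest = seq(-1, 1, length.out = 30))

# Same (arbitrary) intercept for all 20 categories and zero slope,
# stored once as integers and once as doubles
coef_integer <- rbind(rep(1L, 20), rep(0L, 20))
coef_double  <- rbind(rep(1, 20), rep(0, 20))

eta_integer <- design %*% coef_integer
eta_double  <- design %*% coef_double
identical(eta_integer, eta_double)   # TRUE: storage type makes no difference

# Non-integer coefficients work the same way
eta_fractional <- design %*% rbind(rep(0.37, 20), rep(0, 20))

# With zero slope, every subject gets the same linear predictor per category
all(apply(eta_fractional, 2, function(col) all(col == col[1])))
```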

@CastielZhao
Author

Hi Stefano,

I have successfully created 100 data frames with my function. To detect changes, do I need to use the sccomp library, or should I find my own way to do that?

@stemangiola
Collaborator

Yes, run sccomp on your datasets; see the example dataset in the GitHub README. Start with a few datasets and try to compute descriptive statistics.
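For the descriptive statistics, one option is to treat every significant call as a false positive (the slope is zero by construction) and summarise the calls per dataset. This is a sketch under assumptions: the helper names and the `lower`/`upper` columns holding each category's 95% credible interval for the slope are hypothetical, so map them onto whatever columns your sccomp output actually has. The rule "significant when the interval excludes zero" is also an assumption about how calls are made.

```r
# Flag a category when its credible interval for the slope excludes zero
# (assumed decision rule; check how sccomp defines significance)
is_significant <- function(lower, upper) lower > 0 | upper < 0

# `interval_tables`: hypothetical list with one data frame per simulated dataset,
# one row per category, columns `lower` and `upper`
summarise_false_positives <- function(interval_tables) {
  calls <- vapply(interval_tables,
                  function(tab) sum(is_significant(tab$lower, tab$upper)),
                  integer(1))
  n_tested <- vapply(interval_tables, nrow, integer(1))
  list(
    calls_per_dataset   = calls,
    summary             = summary(calls),
    # Under the null every call is false, so this should be close to 0.05
    false_positive_rate = sum(calls) / sum(n_tested)
  )
}

# Toy input (made-up intervals, not sccomp output) for two datasets
toy <- list(
  data.frame(lower = c(-0.5, 0.1, -0.2), upper = c(0.4, 0.6, 0.3)),
  data.frame(lower = c(-0.3, -0.9, -0.1), upper = c(0.2, -0.05, 0.5))
)
result <- summarise_false_positives(toy)
result$false_positive_rate   # 2 calls out of 6 tests
```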

@CastielZhao
Author

Which function in sccomp is used for detecting variation?

@CastielZhao
Copy link
Author

I noticed this function:

```r
library(dplyr)   # provides %>%
library(sccomp)

res <- counts_obj %>%
  sccomp_glm(
    ~ type,
    sample, cell_group, count,
    approximate_posterior_inference = FALSE
  )
```

When analysing multiple data frames, do I need to merge them, or should I tell them apart through "cell_group" above? Also, type=category, count=count, sample=subject in our dictionary, right?

@stemangiola
Collaborator

If you are analysing different studies, no: analyse them independently. I don't know what you mean by "data frames"; a data frame can be anything. Please be more precise.

Also, type=category, count=count, sample=subject in our dictionary, right?

yes
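If several simulated datasets have already been stacked into one table, they can be split apart and analysed one at a time. A minimal base-R sketch, assuming a hypothetical `dataset` column that identifies each simulation; `fit_one` stands for the per-dataset analysis, and the toy run uses a trivial stand-in (the total count) in place of sccomp.

```r
# Analyse each simulated dataset on its own instead of merging them.
# `stacked` (hypothetical): long table of several simulations told apart by a
# `dataset` column. `fit_one`: the per-dataset analysis to run.
fit_independently <- function(stacked, fit_one) {
  per_dataset <- split(stacked, stacked$dataset)
  lapply(per_dataset, function(d) fit_one(d[, setdiff(names(d), "dataset")]))
}

# With sccomp, fit_one could wrap the call quoted in this thread, e.g.
# fit_one <- function(d) d %>% sccomp_glm(~ type, sample, cell_group, count,
#                                         approximate_posterior_inference = FALSE)
# (check the current README: the interface may have changed since 2021).

# Toy run with a trivial stand-in analysis: total count per dataset
stacked <- data.frame(
  dataset    = rep(c("sim_1", "sim_2"), each = 4),
  sample     = rep(c("subject_1", "subject_1", "subject_2", "subject_2"), times = 2),
  cell_group = rep(c("category_1", "category_2"), times = 4),
  count      = c(10, 5, 8, 7, 3, 12, 6, 9)
)
results <- fit_independently(stacked, function(d) sum(d$count))
```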

@CastielZhao
Author

By data frames, I mean the simulated data frames output by my numerical generation process.

@stemangiola
Collaborator

One data frame includes M categories and N subjects.

Another data frame includes M categories and N subjects.

A single subject constitutes a very small dataset (size = 1) that cannot be used for regression.
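The size = 1 point can be seen with ordinary least squares in base R (a stand-in for illustration, not sccomp's model): with a single subject, the intercept alone reproduces the only observation, so the slope for the factor of interest is not estimable and no residual degrees of freedom remain. The variable names are hypothetical.

```r
# One subject: a single observed proportion and its value of the factor of interest
one_subject <- data.frame(proportion = 0.05, factor_of_interest = 0.5)
fit <- lm(proportion ~ factor_of_interest, data = one_subject)

coef(fit)          # the slope is NA: it cannot be estimated from one point
df.residual(fit)   # 0: nothing left to estimate uncertainty
```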
