How do I do a non-inferiority study #171
Replies: 11 comments
-
Yes, but it is not an explicit feature. You must use the standard output and reframe it for the non-inferiority test. Please refer to Chen2012_Acad-Radiol_v19p1158, "Hypothesis testing in noninferiority and equivalence MRMC ROC studies."
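For concreteness, here is a minimal sketch in R of that reframing, assuming you have already read the estimated AUC difference and its 95% confidence interval off the standard iMRMC output; the variable names and numbers below are illustrative, not actual iMRMC field names.

```r
# Hypothetical values taken from the standard iMRMC output.
diffAUC <- 0.012   # estimated AUC difference (new modality - reference)
ciLow   <- -0.031  # lower limit of the 95% CI of the difference
margin  <- 0.05    # pre-specified non-inferiority margin

# A superiority analysis tests H0: diff <= 0. Non-inferiority shifts the
# null to H0: diff <= -margin, so conclude non-inferiority when the
# whole CI lies above -margin.
nonInferior <- ciLow > -margin
```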
-
Hi, I am working on a non-inferiority test for ROC studies. Since you are a ROC expert at the FDA, my advisor Dr. Mia Markey suggested I come here to ask for your advice. I applied two methods to the same cases and want to compare the two paired ROC curves. I found an R package called "rocNIT" (https://cran.rstudio.com/web/packages/rocNIT/) for running non-inferiority tests, based on Jen-Pei Liu et al., "Tests of equivalence and non-inferiority for diagnostic accuracy based on the paired areas under ROC curves," Statistics in Medicine, DOI: 10.1002/sim.2358. We wonder, are you aware of the "rocNIT" package? Also, could you please give some suggestions on how to determine the margin when comparing two ROC curves? For example, may we define a difference in the areas under the curves of 0.05 or 0.1 as the margin for the non-inferiority test?
-
Hi Yao (I hope I got your name right). First, I do not know the rocNIT package; thanks for pointing it out to us. I hope we can learn about it some day, but unfortunately we have other work on our plate right now. Have you looked at the paper mentioned above, Chen2012_Acad-Radiol_v19p1158, "Hypothesis testing in noninferiority and equivalence MRMC ROC studies"? The concepts are not too deep; you should be able to use the iMRMC output to do your non-inferiority test. If you do not have MRMC data (multiple readers), you will have to trick the iMRMC software by duplicating your single-reader data in each modality (giving the duplicates different names). There won't be any reader variability if you duplicate exactly, and it should produce the AUC variance estimate for a single reader as given in Eq. A.25 of Gallas2006_Acad-Radiol_v13p353. Chen's paper and the one you reference should guide you well. If there are any inconsistencies between the two methods, let us know. If you need more help, we'll do what we can.
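For example, here is a minimal sketch in R of the duplication trick, assuming the single-reader scores sit in a long-format data frame; the column names (caseID, modalityID, score, readerID) are illustrative and should be matched to the actual iMRMC input specification.

```r
# Hypothetical single-reader data: one row per case and modality.
singleReader <- data.frame(
  caseID     = rep(1:100, times = 2),
  modalityID = rep(c("modalityA", "modalityB"), each = 100),
  score      = runif(200)
)

# Duplicate the rows under two different reader names. Because the two
# "readers" are identical, the between-reader variance components vanish
# and the analysis reduces to the single-reader case.
pseudoMRMC <- rbind(
  cbind(readerID = "reader1", singleReader),
  cbind(readerID = "reader2", singleReader)
)
```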
-
Regarding the non-inferiority margin ... there is no right answer. It should be motivated by clinical decisions, but that is tough for AUC and reader studies. A margin of 0.10 sounds big enough to drive a truck through it. A margin of 0.05 is reasonable, in my opinion. You might want to reach out to your "audience" or your funding source, or at least discuss it with your collaborators.
-
Hi, Yao & Brandon,

Thank you for sharing an interesting discussion. I know I am not in a position to suddenly intrude like this, but I'd like to share my own humble experience with you. I do have experience using AUC and non-inferiority in clinical research.

First of all, AUC is not a parameter that practitioners are really interested in; I feel it exists merely in the academic domain. Stating it simply (and rudely :) ), AUC is a parameter that researchers created for convenience, to SUMMARIZE BOTH sensitivity and specificity. A non-inferiority margin has to be set at a CLINICALLY unimportant threshold, considering the other benefit(s) of a new treatment. If practitioners do not use AUC in real-world practice, it is not meaningful to set a non-inferiority margin in terms of AUC.

You could state: "with all other conditions, including specificity (or sensitivity), equal (or virtually equal), I would accept a 10 percentage-point deficit in sensitivity (or specificity), as the deficit can be regarded as clinically unimportant." (My group has been using this approach, even though I am not fully satisfied with it. If you want further detail, you can contact my colleague Park JH, pjihoon79@gmail.com.) By contrast, as AUC is a mixture of both sensitivity and specificity, the same statement in terms of AUC instead of sensitivity (or specificity) may not sound reasonable.

My ultimate recommendation to Yao is not to use non-inferiority in terms of AUC. Importantly, a non-inferiority margin has to be defined a priori. If your study is a retrospective one (this is merely my conjecture, as an MRMC study is typically retrospective), then it is not reasonable to set a non-inferiority hypothesis after the fact.

Otherwise, and more ideally, you could identify an important clinical consequence of false-positive or false-negative diagnoses, set that as the primary endpoint, and then set a non-inferiority margin that can be accepted in real-world practice. But again, this is possible only in a prospective study, perhaps only in an RCT.

BTW, I'd like to express my appreciation for Brandon's persistence in developing iMRMC. Your work is truly a stellar contribution to the whole field of diagnostic research, not limited to radiology.

Best,
Kyoung Ho Lee
-
Hi Brandon & Kyoung, I really appreciate your replies! Yes, I had a look at the paper Chen2012_Acad-Radiol_v19p1158, "Hypothesis testing in noninferiority and equivalence MRMC ROC studies." That paper actually cites the paper I mentioned as its Reference 13: Jen-Pei Liu et al., "Tests of equivalence and non-inferiority for diagnostic accuracy based on the paired areas under ROC curves," Statistics in Medicine, DOI: 10.1002/sim.2358. Regarding the non-inferiority margin, I will need to discuss further with our collaborators in the clinic. Kyoung's recommendation is not to use non-inferiority in terms of AUC. At the same time, I would also like to contact Park JH at pjihoon79@gmail.com to learn more about testing on sensitivity and specificity. Thank you very much for your help! Sincerely,
-
Hi Brandon and all,

Thanks for this thread, which provides some additional information on how to translate from a superiority to a noninferiority setting. We also want to test for noninferiority by comparing radiologists (reading cases within a 4x4 split-plot design) against stand-alone AI (reading all cases), but we have some difficulty calculating a p-value for this test within, and using the output of, the iMRMC tool.

Based on the paper Chen2012_Acad-Radiol_v19p1158, "Hypothesis testing in noninferiority and equivalence MRMC ROC studies," we understand how to interpret the average AUCs and the CIs of the two modalities and how to conclude noninferiority: e.g., noninferiority can be concluded if the AUC difference (AI − radiologists) is greater than 0 and the lower limit of the 95% confidence interval of the difference is greater than the negative of the noninferiority margin (e.g., −0.05). However, we would also like to obtain a p-value. Again, the work of Chen et al. provides that, for a noninferiority test, P = 2(1 − F(t; df0 | H0)), where F(t; df0 | H0) is the cumulative distribution function of the test statistic t under the null hypothesis H0, which is a Student's t distribution with df0 degrees of freedom. df0 can be estimated by applying Hillis's method (equation 3 of the paper), which uses various variance components from the OR method.

This is where we get stuck. Within our analysis (utilizing a non-fully-crossed design) we are only provided variance components in BCK and BDG format (and not in OR format, which is what the paper of Chen et al. uses), and therefore it seems we cannot calculate df0 following equation 3 from the paper (which takes OR parameters as input). Would there be any way to overcome this issue? For example, is there a way to convert BCK to OR parameters? Or is there another way to determine the cumulative distribution function of the test statistic t under the null hypothesis H0? Or would it maybe be better to start working in R rather than using the Java tool when it comes to more extensive analysis (not sure whether this is the case)?

Thanks in advance. If more information is needed to help us out, please let us know.

Best,
Jasper
-
I see this but haven't been able to get to it. My quick feedback is to draw a picture of the null and alternative hypotheses. The analysis of a non-inferiority study is generally the same as that of a superiority study, except the distributions are shifted to lower levels of performance. I think the answer is to simply shift the iMRMC calculations (means, variances, test threshold, integral to calculate the p-value, confidence interval). Brandon
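To make the shift concrete, here is a minimal sketch in R that applies the formula Jasper quoted from Chen et al. to a shifted null. It assumes you already have the AUC difference, its standard error, and the degrees of freedom df0 (obtaining df0 from BCK/BDG components is the open question above); all values are placeholders rather than real output.

```r
# Placeholder inputs; in practice these come from the iMRMC output and
# from Hillis's degrees-of-freedom method.
diffAUC <- 0.012  # AUC(AI) - AUC(radiologists)
seDiff  <- 0.021  # standard error of the difference
margin  <- 0.05   # pre-specified non-inferiority margin
df0     <- 40     # hypothetical degrees of freedom under H0

# Shift the null from diff = 0 to diff = -margin, then apply the quoted
# formula P = 2 * (1 - F(t; df0 | H0)).
t0 <- (diffAUC + margin) / seDiff
p  <- 2 * (1 - pt(t0, df = df0))
```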
-
Hi Brandon and All, you wrote:
When duplicating the data with a second reader, I get the following Warning message:
I assume that this is to be expected, since we duplicated the data? |
-
@hubtub2, the warnings you received are for an MRMC analysis. The trick I provided is to get you to a single-reader analysis; as such, the warnings are not relevant to your use case. We are getting close to sharing a script and example for doing the non-inferiority analysis.
-
All, I would like to point out that my research assistant @emma-gardecki-FDA created an R document outlining how to do a non-inferiority test with the iMRMC software. I hope this helps; I apologize for the delay. I will be moving this "issue" to the "discussion" area. I will also take this opportunity to give a few sentences on my perspective on AUC versus sensitivity and specificity. AUC is very relevant for studies that involve humans, especially studies with multiple humans, specifically because AUC averages over the reading threshold, which can be very different person vs. person and clinical setting vs. clinical study. AUC is a primary endpoint for many regulatory submissions of diagnostic imaging. Here are a couple of papers that support this opinion:
-
Original issue reported on code.google.com by Brandon.Gallas on 25 Mar 2015 at 5:58