How do I do a non-inferiority study #171
Replies: 11 comments
-
Yes, but it is not an explicit feature. You must use the standard output and reframe it for the non-inferiority test. Please refer to Chen2012_Acad-Radiol_v19p1158, "Hypothesis testing in noninferiority and equivalence MRMC ROC studies."
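For concreteness, here is a minimal sketch in R of that reframing, assuming you have already read the estimated AUC difference and its 95% confidence interval off the standard iMRMC output; the variable names and numbers below are illustrative, not actual iMRMC field names.

```r
# Hypothetical values taken from the standard iMRMC output.
diffAUC <- 0.012   # estimated AUC difference (new modality - reference)
ciLow   <- -0.031  # lower limit of the 95% CI of the difference
margin  <- 0.05    # pre-specified non-inferiority margin

# A superiority analysis tests H0: diff <= 0. Non-inferiority shifts the
# null to H0: diff <= -margin, so conclude non-inferiority when the
# whole CI lies above -margin.
nonInferior <- ciLow > -margin
```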
-
Hi, I am working on a non-inferiority test for ROC studies. Since you are a ROC expert at the FDA, my advisor Dr. Mia Markey suggested I come here to ask for your advice. I applied two methods to the same cases and want to compare the two paired ROC curves. I found an R package called "rocNIT" (https://cran.rstudio.com/web/packages/rocNIT/) for running non-inferiority tests, based on Jen-Pei Liu et al., "Tests of equivalence and non-inferiority for diagnostic accuracy based on the paired areas under ROC curves," Statistics in Medicine, DOI: 10.1002/sim.2358. We wonder, are you aware of the "rocNIT" package? Also, could you please give some suggestions on how to determine the margin when comparing two ROC curves? For example, may we define a difference in the areas under the curves of 0.05 or 0.1 as the margin for the non-inferiority test?
-
Hi Yao (I hope I got your name right). First, I do not know the rocNIT package; thanks for pointing it out to us. I hope we can learn about it some day, but unfortunately we have other work on our plate right now. Have you looked at the paper mentioned above, Chen2012_Acad-Radiol_v19p1158, "Hypothesis testing in noninferiority and equivalence MRMC ROC studies"? The concepts are not too deep; you should be able to use the iMRMC output to do your non-inferiority test. If you do not have MRMC data (multiple readers), you will have to trick the iMRMC software by duplicating your single-reader data in each modality (giving the duplicates different names). There won't be any reader variability if you duplicate exactly, and it should produce the AUC variance estimate for a single reader as given in Eq. A.25 of Gallas2006_Acad-Radiol_v13p353. Chen's paper and the one you reference should guide you well. If there are any inconsistencies between the two methods, let us know. If you need more help, we'll do what we can.
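For example, here is a minimal sketch in R of the duplication trick, assuming the single-reader scores sit in a long-format data frame; the column names (caseID, modalityID, score, readerID) are illustrative and should be matched to the actual iMRMC input specification.

```r
# Hypothetical single-reader data: one row per case and modality.
singleReader <- data.frame(
  caseID     = rep(1:100, times = 2),
  modalityID = rep(c("modalityA", "modalityB"), each = 100),
  score      = runif(200)
)

# Duplicate the rows under two different reader names. Because the two
# "readers" are identical, the between-reader variance components vanish
# and the analysis reduces to the single-reader case.
pseudoMRMC <- rbind(
  cbind(readerID = "reader1", singleReader),
  cbind(readerID = "reader2", singleReader)
)
```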
-
Regarding the non-inferiority margin ... there is no right answer. It should be motivated by clinical decisions, but that is tough for AUC and reader studies. A margin of 0.10 sounds big enough to drive a truck through it. A margin of 0.05 is reasonable, in my opinion. You might want to reach out to your "audience" or your funding source, or at least discuss it with your collaborators.
-
Hi, Yao & Brandon,

Thank you for sharing an interesting discussion. I know I am not in a position to suddenly intrude like this, but I'd like to share my own humble experience with you. I do have experience using AUC and non-inferiority in clinical research.

First of all, AUC is not a parameter that practitioners are really interested in; I feel it exists merely in the academic domain. Stating it simply (and rudely :) ), AUC is a parameter that researchers created for convenience, to SUMMARIZE BOTH sensitivity and specificity. A non-inferiority margin has to be set at a CLINICALLY unimportant threshold, considering the other benefit(s) of a new treatment. If practitioners do not use AUC in real-world practice, it is not meaningful to set a non-inferiority margin in terms of AUC.

You could state: "with all other conditions, including specificity (or sensitivity), equal (or virtually equal), I would accept a 10 percentage-point deficit in sensitivity (or specificity), as the deficit can be regarded as clinically unimportant." (My group has been using this approach, even though I am not fully satisfied with it. If you want further detail, you can contact my colleague Park JH, pjihoon79@gmail.com.) By contrast, as AUC is a mixture of both sensitivity and specificity, the same statement in terms of AUC instead of sensitivity (or specificity) may not sound reasonable.

My ultimate recommendation to Yao is not to use non-inferiority in terms of AUC. Importantly, a non-inferiority margin has to be defined a priori. If your study is a retrospective one (this is merely my conjecture, as an MRMC study is typically retrospective), then it is not reasonable to set a non-inferiority hypothesis after the fact.

Otherwise, and more ideally, you could identify an important clinical consequence of false-positive or false-negative diagnoses, set that as the primary endpoint, and then set a non-inferiority margin that can be accepted in real-world practice. But again, this is possible only in a prospective study, perhaps only in an RCT.

BTW, I'd like to express my appreciation for Brandon's persistence in developing iMRMC. Your work is truly a stellar contribution to the whole field of diagnostic research, not limited to radiology.

Best,
Kyoung Ho Lee
-
Hi Brandon & Kyoung, I really appreciate your replies! Yes, I had a look at the paper Chen2012_Acad-Radiol_v19p1158, "Hypothesis testing in noninferiority and equivalence MRMC ROC studies." That paper actually cites the paper I mentioned as its Reference 13: Jen-Pei Liu et al., "Tests of equivalence and non-inferiority for diagnostic accuracy based on the paired areas under ROC curves," Statistics in Medicine, DOI: 10.1002/sim.2358. Regarding the non-inferiority margin, I will need to discuss further with our collaborators in the clinic. Kyoung's recommendation is not to use non-inferiority in terms of AUC. At the same time, I would also like to contact Park JH at pjihoon79@gmail.com to learn more about testing on sensitivity and specificity. Thank you very much for your help! Sincerely,
-
Hi Brandon and all,

Thanks for this thread, which provides some additional information on how to translate from a superiority to a noninferiority setting. We also want to test for noninferiority by comparing radiologists (reading cases within a 4x4 split-plot design) against stand-alone AI (reading all cases), but we have some difficulty calculating a p-value for this test within, and using the output of, the iMRMC tool.

Based on the paper Chen2012_Acad-Radiol_v19p1158, "Hypothesis testing in noninferiority and equivalence MRMC ROC studies," we understand how to interpret the average AUCs and the CIs of the two modalities and how to conclude noninferiority: e.g., noninferiority can be concluded if the AUC difference (AI − radiologists) is greater than 0 and the lower limit of the 95% confidence interval of the difference is greater than the negative of the noninferiority margin (e.g., −0.05). However, we would also like to obtain a p-value. Again, the work of Chen et al. provides that, for a noninferiority test, P = 2(1 − F(t; df0 | H0)), where F(t; df0 | H0) is the cumulative distribution function of the test statistic t under the null hypothesis H0, which is a Student's t distribution with df0 degrees of freedom. df0 can be estimated by applying Hillis's method (equation 3 of the paper), which uses various variance components from the OR method.

This is where we get stuck. Within our analysis (utilizing a non-fully-crossed design) we are only provided variance components in BCK and BDG format (and not in OR format, which is what the paper of Chen et al. uses), and therefore it seems we cannot calculate df0 following equation 3 from the paper (which takes OR parameters as input). Would there be any way to overcome this issue? For example, is there a way to convert BCK to OR parameters? Or is there another way to determine the cumulative distribution function of the test statistic t under the null hypothesis H0? Or would it maybe be better to start working in R rather than using the Java tool when it comes to more extensive analysis (not sure whether this is the case)?

Thanks in advance. If more information is needed to help us out, please let us know.

Best,
Jasper
-
I see this but haven't been able to get to it. My quick feedback is to draw a picture of the null and alternative hypotheses. The analysis of a non-inferiority study is generally the same as that of a superiority study, except the distributions are shifted to lower levels of performance. I think the answer is to simply shift the iMRMC calculations (means, variances, test threshold, integral to calculate the p-value, confidence interval). Brandon
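To make the shift concrete, here is a minimal sketch in R that applies the formula Jasper quoted from Chen et al. to a shifted null. It assumes you already have the AUC difference, its standard error, and the degrees of freedom df0 (obtaining df0 from BCK/BDG components is the open question above); all values are placeholders rather than real output.

```r
# Placeholder inputs; in practice these come from the iMRMC output and
# from Hillis's degrees-of-freedom method.
diffAUC <- 0.012  # AUC(AI) - AUC(radiologists)
seDiff  <- 0.021  # standard error of the difference
margin  <- 0.05   # pre-specified non-inferiority margin
df0     <- 40     # hypothetical degrees of freedom under H0

# Shift the null from diff = 0 to diff = -margin, then apply the quoted
# formula P = 2 * (1 - F(t; df0 | H0)).
t0 <- (diffAUC + margin) / seDiff
p  <- 2 * (1 - pt(t0, df = df0))
```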
-
Hi Brandon and All, you wrote:
When duplicating the data with a second reader, I get the following Warning message:
I assume that this is to be expected, since we duplicated the data? |
-
@hubtub2, the warnings you received are for an MRMC analysis. The trick I provided is to get you to a single-reader analysis; as such, the warnings are not relevant to your use case. We are getting close to sharing a script and example for doing the non-inferiority analysis.
-
All, I would like to point out that my research assistant @emma-gardecki-FDA created an R document outlining how to do a non-inferiority test with the iMRMC software. I hope this helps; I apologize for the delay. I will be moving this "issue" to the "discussion" area. I will also take this opportunity to give a few sentences on my perspective on AUC versus sensitivity and specificity. AUC is very relevant for studies that involve humans, especially studies with multiple humans, specifically because AUC averages over the reading threshold, which can be very different person vs. person and clinical setting vs. clinical study. AUC is a primary endpoint for many regulatory submissions of diagnostic imaging. Here are a couple of papers that support this opinion:
-
Original issue reported on code.google.com by Brandon.Gallas on 25 Mar 2015 at 5:58