AUTHOR=Li Hongsen, Shen Jiaying, Shou Jiawei, Han Weidong, Gong Liu, Xu Yiming, Chen Peng, Wang Kaixin, Zhang Shuangfeng, Sun Chao, Zhang Jie, Niu Zhongfeng, Pan Hongming, Cai Wenli, Fang Yong
TITLE=Exploring the Interobserver Agreement in Computer-Aided Radiologic Tumor Measurement and Evaluation of Tumor Response
JOURNAL=Frontiers in Oncology
VOLUME=11 (2021)
YEAR=2022
URL=https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2021.691638
DOI=10.3389/fonc.2021.691638
ISSN=2234-943X
ABSTRACT=The accurate, objective, and reproducible evaluation of tumor response to therapy is indispensable in clinical trials. This study investigated the reliability and reproducibility of a computer-aided contouring (CAC) tool for tumor measurement and its impact on the evaluation of tumor response under RECIST 1.1 criteria. A total of 200 cancer patients were retrospectively enrolled and randomly divided into two sets of 100 patients each, one for learning and one for testing. A senior radiologist identified 744 target lesions in distinct body parts: 278 lesions in dataset 1 (learning set) and 466 lesions in dataset 2 (testing set). Five image analysts each measured lesion diameters with both manual and CAC tools in dataset 1 and were subsequently tested on dataset 2. Interobserver variability of the tumor measurements was assessed using the coefficient of variation (CV), the Pearson correlation coefficient (PCC), and the interobserver correlation coefficient (ICC). The mean CV of manual measurement remained constant between the learning and testing datasets (0.33 vs. 0.32, p=0.490), whereas it decreased for CAC measurements after learning (0.24 vs. 0.19, p<0.001). The proportion of interobserver measurements with good agreement (CV<0.20) was 29.9% (manual) vs. 49.0% (CAC) in the learning set (p<0.001) and 30.9% (manual) vs. 64.4% (CAC) in the testing set (p<0.001). The mean PCCs were 0.56±0.11 (manual) vs. 0.69±0.10 (CAC) in the learning set (p=0.013) and 0.73±0.07 (manual) vs. 0.84±0.03 (CAC) in the testing set (p<0.001). ICCs were 0.633 (manual) vs. 0.698 (CAC) in the learning set (p<0.001) and 0.716 (manual) vs. 0.824 (CAC) in the testing set (p<0.001). Fleiss' kappa analysis showed an overall agreement of 58.7% (manual) vs. 58.9% (CAC) in the learning set and 62.9% (manual) vs. 74.5% (CAC) in the testing set. The 80% agreement of tumor response evaluation was 55.0% (manual) vs. 66.0% (CAC) in the learning set and 60.6% (manual) vs. 79.7% (CAC) in the testing set. In conclusion, CAC can reduce the interobserver variability of radiologic tumor measurements and thereby improve the agreement of imaging-based evaluation of tumor response.
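
The abstract names several interobserver-agreement statistics (CV, pairwise PCC, Fleiss' kappa). The sketch below is not the authors' code; it is a minimal Python illustration, assuming hypothetical data in the paper's layout: a lesions-by-observers matrix of diameters and per-lesion counts of RECIST response categories. All function names and the toy data are illustrative only; the ICC (which typically requires a two-way model) is omitted for brevity.

```python
"""Minimal sketch of the agreement statistics named in the abstract (illustrative only)."""
import numpy as np
from scipy.stats import pearsonr


def per_lesion_cv(diameters):
    """Coefficient of variation (SD / mean) across observers for each lesion.

    diameters: array of shape (n_lesions, n_observers), in mm.
    """
    return diameters.std(axis=1, ddof=1) / diameters.mean(axis=1)


def mean_pairwise_pcc(diameters):
    """Mean Pearson correlation over all observer pairs."""
    n_obs = diameters.shape[1]
    pccs = [pearsonr(diameters[:, i], diameters[:, j])[0]
            for i in range(n_obs) for j in range(i + 1, n_obs)]
    return float(np.mean(pccs))


def fleiss_kappa(counts):
    """Fleiss' kappa for categorical ratings.

    counts: array of shape (n_lesions, n_categories); each row holds how many
    observers assigned the lesion to each response category (e.g. CR/PR/SD/PD).
    Assumes the same number of raters for every lesion.
    """
    n_raters = counts.sum(axis=1)[0]
    p_cat = counts.sum(axis=0) / counts.sum()              # category proportions
    p_i = (counts * (counts - 1)).sum(axis=1) / (n_raters * (n_raters - 1))
    p_bar = p_i.mean()                                      # observed agreement
    p_e = np.sum(p_cat ** 2)                                # chance agreement
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: 50 lesions, 5 observers, measurement noise of a few mm.
    true_diam = rng.uniform(10, 60, size=50)
    diam = true_diam[:, None] + rng.normal(0, 3, size=(50, 5))
    cv = per_lesion_cv(diam)
    print("mean CV:", cv.mean())
    print("good agreement (CV<0.20):", (cv < 0.20).mean())
    print("mean pairwise PCC:", mean_pairwise_pcc(diam))
    # Toy response categories: 5 raters classify each lesion into one of 4 categories.
    counts = np.array([np.bincount(rng.integers(0, 4, size=5), minlength=4)
                       for _ in range(50)])
    print("Fleiss' kappa:", fleiss_kappa(counts))
```

With this layout, the study's comparison reduces to running the same statistics on the manual and CAC measurement matrices and comparing the resulting CV, PCC, and kappa values between the two tools and between the learning and testing sets.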