Artificial intelligence (AI), and the machine learning algorithms that drive it, show promise as a screening tool for ocular pathologies such as diabetic retinopathy (DR), but the technology isn’t perfect. Researchers at Google found that using a small set of DR cases adjudicated by ophthalmologists and retina specialists can improve an algorithm’s accuracy.1

The researchers had the clinicians and the algorithm grade retinal fundus images from DR screening programs and used the adjudicated consensus of the retina specialists as the reference standard.
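
For illustration only, here is a minimal sketch of how a consensus reference label might be assembled from several specialist grades. The grading scale, function name and tie-breaking rule are assumptions for this example; in the study, disagreements were resolved through live adjudication among the retina specialists rather than simple voting.

```python
from collections import Counter

# Hypothetical 5-point DR scale: 0 = none ... 4 = proliferative DR
def consensus_grade(specialist_grades):
    """Return the majority grade, or None if no majority exists.

    A stand-in for the study's adjudication process: cases without a
    clear majority would be flagged for face-to-face discussion.
    """
    counts = Counter(specialist_grades)
    grade, votes = counts.most_common(1)[0]
    if votes > len(specialist_grades) / 2:
        return grade
    return None  # no majority; send to adjudication

# Example: three retina specialists grade one fundus image
print(consensus_grade([2, 2, 3]))  # -> 2 (majority agrees on moderate DR)
print(consensus_grade([1, 2, 3]))  # -> None (needs adjudication)
```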

After adding the reference cases “as a tuning dataset,” the investigators compared the area under the curve (AUC), sensitivity and specificity of the algorithm with those of manual grading. With the new data, the algorithm’s AUC for detecting moderate or worse DR improved from 0.934 to 0.986, a boost that enabled the AI system to perform on par with, or even exceed, US board-certified ophthalmologists and retina specialists, according to study author Lily Peng, MD, PhD.1,2
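
For readers curious how these metrics are computed, below is a minimal sketch using scikit-learn. The labels, scores and operating threshold are made up for the example and are not data from the study.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix

# Hypothetical data: 1 = moderate-or-worse DR per the adjudicated reference,
# 0 = no/mild DR. Scores are the model's predicted probabilities.
reference = np.array([0, 0, 1, 1, 0, 1, 1, 0, 1, 0])
model_scores = np.array([0.1, 0.3, 0.8, 0.7, 0.2, 0.9, 0.6, 0.4, 0.95, 0.05])

# Area under the ROC curve: threshold-free measure of discrimination
auc = roc_auc_score(reference, model_scores)

# Sensitivity and specificity at a chosen operating threshold
predictions = (model_scores >= 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(reference, predictions).ravel()
sensitivity = tp / (tp + fn)  # fraction of true DR cases detected
specificity = tn / (tn + fp)  # fraction of non-DR cases correctly cleared

print(f"AUC={auc:.3f}  sensitivity={sensitivity:.2f}  specificity={specificity:.2f}")
```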

Although reference cases are key to “teaching” any deep learning model to screen properly for DR, creating those standards is time-consuming. The researchers were encouraged to find that even a small subset of carefully adjudicated cases could make a substantial difference in the algorithm’s performance.
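
As a rough sketch of what using a small adjudicated subset as a tuning dataset could look like in practice, the snippet below fine-tunes a pretrained image classifier on such a subset with Keras. The architecture, learning rate, epoch count and variable names are placeholders and do not represent the study’s actual pipeline.

```python
import tensorflow as tf

# Hypothetical setup: a DR classifier pretrained on many noisily labeled
# images is fine-tuned on a small, carefully adjudicated subset.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, pooling="avg")
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(1, activation="sigmoid"),  # moderate-or-worse DR
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),  # small LR for tuning
              loss="binary_crossentropy", metrics=["AUC"])

# small_images / small_labels would hold the adjudicated tuning subset:
# model.fit(small_images, small_labels, epochs=5, validation_split=0.2)
```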

1. Krause J, Gulshan V, Rahimy E, et al. Grader variability and the importance of reference standards for evaluating machine learning models for diabetic retinopathy. Ophthalmology. March 12, 2018. [Epub ahead of print].
2. Human adjudication of DR grading enhances machine learning algorithm. Healio. March 14, 2018.