Abstract
Background: Diabetic retinopathy (DR) is a leading cause of visual impairment worldwide. Manual grading of fundus images is the gold standard in DR screening, but it is time-consuming. Artificial intelligence (AI)-based algorithms offer a faster alternative, though concerns remain about their diagnostic reliability.

Methods: A cross-sectional pilot study of DR and diabetic macular edema (DME) screening among patients (≥18 years) with diabetes was conducted at the Department of Ophthalmology, Oslo University Hospital (OUH), and the Norwegian Association of the Blind and Partially Sighted (NABP). The aim was to evaluate the validity (accuracy, sensitivity, specificity) and reliability (inter-rater agreement) of automated AI-based grading of DR and DME compared with manual consensus (MC) grading performed by a multidisciplinary team of healthcare professionals. Grading was performed manually and by EyeArt (Eyenuk) software v2.1.0, based on the International Clinical Diabetic Retinopathy (ICDR) Disease Severity Scale. Agreement was measured by Quadratic Weighted Kappa (QWK). Sensitivity, specificity, and diagnostic test accuracy (Area Under the Curve (AUC)) were also calculated.

Results: A total of 128 individuals (247 eyes; 51 women, 77 men) were included, with a median age of 52.5 years. Prevalence of any vs. referable DR (RDR) was 20.2% vs. 11.7%, while sensitivity was 94.0% vs. 89.7%, specificity was 72.6% vs. 83.0%, and AUC was 83.5% vs. 86.3%, respectively. DME was detected in only one eye by both methods.

Conclusions: AI-based grading offered high sensitivity and acceptable specificity for detecting DR, showing moderate agreement with manual assessments. Such grading may serve as an effective screening tool to support clinical evaluation, while ongoing training of human graders remains essential to ensure high-quality reference standards for assessing diagnostic accuracy and for developing AI algorithms.
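The agreement and validity measures named in the Methods can be illustrated with a minimal Python sketch. It is not the study's analysis code. It assumes eye-level grades coded 0 to 4 on the ICDR scale, takes manual consensus grading as the reference, and assumes "any DR" means grade ≥ 1 and "referable DR" means grade ≥ 2 (a common convention that the abstract does not state). The function names and the toy grades are hypothetical and are not study data.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, n_grades=5):
    """Cohen's kappa with quadratic weights for ordinal grades 0..n_grades-1."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    n = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))  # joint counts of (grade_a, grade_b)
    hist_a = Counter(rater_a)                  # marginal counts, rater A
    hist_b = Counter(rater_b)                  # marginal counts, rater B
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_grades):
        for j in range(n_grades):
            # Disagreement weight grows with the squared grade distance.
            weight = (i - j) ** 2 / (n_grades - 1) ** 2
            weighted_observed += weight * observed[(i, j)] / n
            weighted_expected += weight * hist_a[i] * hist_b[j] / (n * n)
    if weighted_expected == 0:
        # Both raters gave every eye the same single grade.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


def sensitivity_specificity(reference, test, threshold):
    """Binarise grades at `grade >= threshold` and compare test to reference."""
    tp = fn = tn = fp = 0
    for ref_grade, test_grade in zip(reference, test):
        ref_positive = ref_grade >= threshold
        test_positive = test_grade >= threshold
        if ref_positive and test_positive:
            tp += 1
        elif ref_positive:
            fn += 1
        elif test_positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical toy grades for eight eyes (invented for illustration only).
manual_grades = [0, 0, 1, 2, 3, 4, 0, 2]
ai_grades = [0, 1, 1, 2, 2, 4, 0, 0]

kappa = quadratic_weighted_kappa(manual_grades, ai_grades)
any_dr = sensitivity_specificity(manual_grades, ai_grades, threshold=1)
referable_dr = sensitivity_specificity(manual_grades, ai_grades, threshold=2)
```

Because quadratic weights penalise a two-step disagreement four times as heavily as a one-step disagreement, QWK suits an ordinal scale such as ICDR better than unweighted kappa, which treats every disagreement alike.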
| Original language | English |
|---|---|
| Article number | 4810 |
| Journal | Journal of Clinical Medicine |
| Volume | 14 |
| Issue number | 13 |
| DOIs | |
| Publication status | Published - 1 Jul 2025 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 3 Good Health and Well-being
Keywords
- diagnostic accuracy
- manual consensus grading
- EyeArt
- artificial intelligence (AI)
- fundus photography
- automated grading
- screening program
- diabetic retinopathy
- diabetic macular edema
OECD Field of Science
- 3. Medical and Health Sciences
Title: Comparison of Validity and Reliability of Manual Consensus Grading vs. Automated AI Grading for Diabetic Retinopathy Screening in Oslo, Norway: A Cross-Sectional Pilot Study