Fundus imaging is an indispensable tool in primary care for the early detection of major ophthalmic diseases such as diabetic retinopathy and glaucoma, and for guiding treatment decisions. By noninvasively visualizing the retinal vasculature and subtle changes at the optic nerve head, fundus examinations also serve as indicators of systemic health, making them a first-line tool in patient management. With the widespread adoption of high-resolution, digital camera-based fundus imaging, a wide variety of fundus camera devices have rapidly entered clinical practice.

Recently, deep-learning-based models for classifying fundus diseases have demonstrated high sensitivity and specificity, and their clinical utility has been proven through integration into numerous Software as a Medical Device (SaMD) products. For example, automated diabetic retinopathy screening systems and glaucoma-progression monitoring tools are already commercially available, contributing broadly to diagnostic support and patient screening. However, most models are trained and validated on data from a single camera type, which limits their performance when applied to images from new or infrequently used devices.

To overcome these practical constraints, this challenge aims to develop AI models that deliver consistent diagnostic performance across diverse camera environments. Through the Multi-Camera Robust Diagnosis of Fundus Diseases (MuCaRD) challenge, we will evaluate both robust classification algorithms that generalize to unseen devices and adaptive learning techniques that can rapidly fine-tune a model using only a few sample images from a new camera.
The MuCaRD challenge addresses a critical gap in AI-driven fundus screening: ensuring consistent performance across both familiar and unseen camera systems. Participants will develop and benchmark models under realistic constraints: training on a limited set of images from one device and then evaluating robustness and adaptability on entirely new devices. By simulating clinical and commercial deployment scenarios, MuCaRD promotes methods that generalize beyond a single data source and can quickly fine-tune to novel imaging hardware.
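As a purely illustrative example of the adaptation setting, the sketch below shows how a pretrained classifier might be fine-tuned on a handful of labeled images from a previously unseen camera. The use of PyTorch, the function name adapt_to_new_camera, and all hyperparameters are our own assumptions and are not part of the official MuCaRD interface; participants may use any robustness or adaptation strategy that fits the challenge's runtime constraints.

```python
# Minimal sketch of few-shot adaptation to a new camera (PyTorch assumed).
# All names (adapt_to_new_camera, support_set, ...) are illustrative and
# do not correspond to any official MuCaRD code.
import torch
import torch.nn as nn


def adapt_to_new_camera(model: nn.Module,
                        support_set: list[tuple[torch.Tensor, torch.Tensor]],
                        steps: int = 20,
                        lr: float = 1e-4) -> nn.Module:
    """Fine-tune a pretrained classifier on a few labeled images from an
    unseen camera, then return it ready for inference."""
    model.train()
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    criterion = nn.BCEWithLogitsLoss()  # multi-label disease classification

    for _ in range(steps):
        for image, labels in support_set:       # image: (C, H, W); labels: float tensor (num_diseases,)
            optimizer.zero_grad()
            logits = model(image.unsqueeze(0))  # add batch dimension
            loss = criterion(logits, labels.unsqueeze(0))
            loss.backward()
            optimizer.step()

    model.eval()
    return model
```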
Performance is measured as the average of the Area Under the ROC Curve (AUROC) and the Area Under the Precision–Recall Curve (AUPRC) for each disease. To mirror clinical feasibility, all inference and adaptation steps must complete within 10 seconds per image, although the time limit itself is not factored into the score. Submissions are limited to two runs per day during the validation phase to curtail leaderboard overfitting.
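For concreteness, a per-disease score of this form can be computed as in the sketch below, assuming binary labels and predicted probabilities. The simple mean over diseases shown here is an assumption on our part; any evaluation code released by the organizers takes precedence.

```python
# Sketch of the scoring scheme: per-disease mean of AUROC and AUPRC,
# averaged over diseases. average_precision_score is used here as the
# AUPRC estimate; aggregation details are assumptions, not official code.
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score


def disease_score(y_true: np.ndarray, y_prob: np.ndarray) -> float:
    """y_true: binary labels, shape (N,); y_prob: predicted probabilities, shape (N,)."""
    auroc = roc_auc_score(y_true, y_prob)
    auprc = average_precision_score(y_true, y_prob)
    return 0.5 * (auroc + auprc)


def challenge_score(y_true: np.ndarray, y_prob: np.ndarray) -> float:
    """y_true, y_prob: shape (N, num_diseases); mean of per-disease scores."""
    return float(np.mean([disease_score(y_true[:, d], y_prob[:, d])
                          for d in range(y_true.shape[1])]))
```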
Certificates will be presented to the top three teams in each task.
The first and corresponding authors of the winning teams will be invited to co-author the challenge summary paper and to present their results at the workshop.
For inquiries, please email: g.young@mediwhale.com