Low-cost deep learning for melanoma detection.
An automated screening system reaching 98.59% recall and 98.59% precision on HAM10000 using a ConvNeXtBase ensemble — built as a proof of concept for accessible diagnostic aid in underserved regions.
Context
Lifetime risk of developing melanoma. Early detection is the single strongest predictor of survival.
Diagnostic accuracy of visual assessment by non-specialists — a wide gap to the standard of care.
Typical out-of-pocket cost of a dermatology visit, forming a real financial barrier to early screening.
Results
Evaluated on HAM10000 combined with an auxiliary Skin Lesions dataset at a 99% confidence threshold. The system catches nearly every melanoma while keeping false positives rare.
Finding
A rare win–win on recall and precision.
Typically, pushing recall upward drags precision down. Combining ConvNeXtBase, a 5-model ensemble, expanded training data, and a 99% confidence threshold raised both metrics at once.
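The usual trade-off can be seen on a handful of toy scores. The sketch below is illustrative only: the scores, labels, and the two thresholds are made up for the example and are not taken from the project's evaluation.

```python
def precision_recall(scores, labels, threshold):
    """Return (precision, recall) when scores >= threshold are called positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical model scores; label 1 = melanoma, 0 = benign.
scores = [0.95, 0.90, 0.80, 0.60, 0.55, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

strict = precision_recall(scores, labels, 0.85)  # few positives called: high precision, low recall
loose = precision_recall(scores, labels, 0.35)   # many positives called: high recall, lower precision
```

On this toy data the strict threshold gives perfect precision but misses half the positives, while the loose threshold finds every positive at the cost of false alarms.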
Demo
A short walkthrough of the system analysing dermoscopy images with per-image confidence and Grad-CAM attention overlaid.
Visualizations
ROC curves, confusion matrix, ensemble comparison and Grad-CAM interpretability — the evidence behind the headline numbers.
ROC Curve Analysis
· AUC = 0.997
Receiver Operating Characteristic comparing a single model versus the ensemble.
Grad-CAM Visualization
· Lesion-focused attention
Visual explanations showing where the model looks when it predicts melanoma.
Confusion Matrix
· Only 7 missed cases
Classification outcomes at the 99% confidence threshold.
Ensemble vs Single Model
· 5-model diversity
Performance of the ensemble against the best single model across seeds.
Methodology
Progressive Training
- Phase 1 · Classifier only — 10 epochs, frozen backbone
- Phase 2 · Partial unfreezing — 15 epochs, last 3 blocks
- Phase 3 · Full fine-tuning — 25 epochs, discriminative learning rates
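The three phases above can be written down as a small schedule. This is a framework-agnostic sketch, not the project's training code: the block count (`NUM_BLOCKS`), `base_lr`, and the `decay` factor are hypothetical values chosen for illustration, and the classifier head is assumed to train at `base_lr` in every phase.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Phase:
    name: str
    epochs: int
    unfrozen_blocks: int   # backbone blocks trained, counted from the end
    discriminative: bool   # True: earlier blocks get smaller learning rates

NUM_BLOCKS = 4  # assumption: the backbone is treated as 4 blocks

PHASES = [
    Phase("classifier only", 10, 0, False),
    Phase("partial unfreezing", 15, 3, False),
    Phase("full fine-tuning", 25, NUM_BLOCKS, True),
]

def block_learning_rates(phase, base_lr=1e-4, decay=0.5, num_blocks=NUM_BLOCKS):
    """Learning rate per backbone block (index 0 = earliest); frozen blocks get 0.0."""
    rates = []
    for i in range(num_blocks):
        depth_from_head = num_blocks - i  # last block -> 1
        if depth_from_head > phase.unfrozen_blocks:
            rates.append(0.0)  # block stays frozen in this phase
        elif phase.discriminative:
            rates.append(base_lr * decay ** depth_from_head)
        else:
            rates.append(base_lr)
    return rates

total_epochs = sum(p.epochs for p in PHASES)
```

In a real pipeline, each non-zero rate would typically become one optimizer parameter group, and a rate of 0.0 would correspond to freezing that block's weights.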
Medical Preprocessing
- Black corner inpainting — threshold = 50, radius = 15
- Hair removal — via black-hat transform
- CLAHE contrast enhancement — applied in LAB color space
- Conservative augmentation — horizontal / vertical flips only
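To show why a black-hat transform isolates hair, here is a one-dimensional toy version in plain Python. It is a sketch of the idea only: a real pipeline would apply the 2-D equivalent from an image library, and the window width `k` and the mask cutoff of 50 used here are assumptions for the example, not the project's settings.

```python
def dilate(signal, k):
    """Grayscale dilation: each sample becomes the max over a window of width k."""
    r, n = k // 2, len(signal)
    return [max(signal[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]

def erode(signal, k):
    """Grayscale erosion: each sample becomes the min over a window of width k."""
    r, n = k // 2, len(signal)
    return [min(signal[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]

def black_hat(signal, k):
    """Closing (dilate, then erode) minus the original signal.

    Dark details narrower than the window are filled in by the closing,
    so the difference is large exactly where they were.
    """
    closed = erode(dilate(signal, k), k)
    return [c - s for c, s in zip(closed, signal)]

# One row of pixel intensities: skin (180), a 1-pixel dark hair (30),
# and a wide dark lesion (60) that should NOT be flagged.
row = [180, 180, 180, 30, 180, 180, 180, 60, 60, 60, 60, 60, 60, 60, 180, 180]
response = black_hat(row, k=5)
hair_mask = [v > 50 for v in response]  # True only at the thin dark structure
```

The wide lesion survives the closing unchanged, so only the hair pixel ends up in the mask. The masked pixels are what an inpainting step would then fill in.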
Optimization
- Focal Loss — with class weighting, γ = 2.0
- AdamW — with decoupled weight decay
- CosineAnnealingWarmRestarts — learning-rate scheduler
- Test-time augmentation — 9 variants averaged at inference
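As a reference for the loss listed above, a per-sample focal loss with class weighting can be sketched as follows. This is a binary simplification for illustration: γ = 2.0 comes from the list above, while the `class_weights` values are placeholders rather than the weights used in the project.

```python
import math

def focal_loss(p, y, gamma=2.0, class_weights=(1.0, 1.0)):
    """Binary focal loss for one sample.

    p: predicted probability of the positive class
    y: true label (0 or 1)
    class_weights: (weight for class 0, weight for class 1), hypothetical values
    """
    p_t = p if y == 1 else 1.0 - p   # probability assigned to the true class
    p_t = min(max(p_t, 1e-7), 1.0)   # clamp to avoid log(0)
    alpha_t = class_weights[y]
    # (1 - p_t) ** gamma shrinks the loss of examples that are already classified well
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy = focal_loss(0.9, 1)  # confident and correct: heavily down-weighted
hard = focal_loss(0.1, 1)  # confident and wrong: dominates the gradient
```

With `gamma=0.0` and unit weights the expression reduces to ordinary cross-entropy, which is a quick sanity check for any implementation.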
Approach
A narrow problem solved with boring engineering — medical-grade preprocessing, an honest ensemble, and a confidence gate that says “I don’t know” when it should.
ConvNeXtBase Ensemble
Five independently seeded ConvNeXtBase models averaged at inference for robust predictions and reduced variance.
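Averaging at inference amounts to taking the mean of the per-model probability vectors. A minimal sketch, with made-up outputs for one image standing in for the five models:

```python
def ensemble_average(per_model_probs):
    """Average class-probability vectors from several models for one image."""
    n_models = len(per_model_probs)
    n_classes = len(per_model_probs[0])
    return [sum(m[c] for m in per_model_probs) / n_models for c in range(n_classes)]

# Hypothetical outputs of five models for one image: [P(benign), P(melanoma)]
probs = [
    [0.10, 0.90],
    [0.20, 0.80],
    [0.05, 0.95],
    [0.40, 0.60],  # one model is less sure; averaging damps its influence
    [0.15, 0.85],
]
avg = ensemble_average(probs)
```

Because each model starts from a different seed, their errors are only partly correlated, which is why the mean tends to vary less than any single prediction.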
Medical Preprocessing
Black-corner inpainting, hair removal, and CLAHE contrast enhancement — tuned for dermoscopy, not generic imagery.
Confidence Thresholding
A 99% confidence gate keeps precision surgical and surfaces the uncertain cases for human review instead of forcing a call.
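One way such a gate could work is sketched below. It assumes the threshold is applied symmetrically to whichever class the model favours; the page does not spell out the exact rule, so treat the function name, labels, and logic as illustrative.

```python
def gated_prediction(p_melanoma, threshold=0.99):
    """Return a label only when the model is confident either way; otherwise defer."""
    if p_melanoma >= threshold:
        return "melanoma"
    if 1.0 - p_melanoma >= threshold:
        return "benign"
    return "needs review"  # uncertain case: hand off to a human

decisions = [gated_prediction(p) for p in (0.995, 0.003, 0.82)]
```

Here the first two probabilities clear the gate and the third is routed to review instead of being forced into a class.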
Grad-CAM Interpretability
Every prediction ships with a visual explanation of the lesion regions that drove it, so clinicians can audit the model’s attention.
Disclaimer
This is a research project for educational purposes only. The system is not approved for clinical diagnosis and should not replace professional medical evaluation. Always consult a qualified healthcare professional for medical advice.
Read the full work.
The repository contains the training pipeline, preprocessing code, evaluation notebooks, and trained model artifacts. Everything is reproducible.