Research · Medical AI

Low-cost deep learning for
melanoma detection.

An automated screening system that reaches 98.59% recall and 98.59% precision on HAM10000 (combined with an auxiliary Skin Lesions dataset) using a five-model ConvNeXtBase ensemble. Built as a proof of concept for accessible diagnostic aid in underserved regions.

98.59%
Recall
Sensitivity
98.59%
Precision
Share of positive predictions that are correct
99.49%
Accuracy
Overall classification
7
Missed
Out of 529 melanomas
01

Context

1 in 27 men · 1 in 40 women

Lifetime risk of developing melanoma. Early detection is the single strongest predictor of survival.

60–80%

Diagnostic accuracy of visual assessment by non-specialists — a wide gap to the standard of care.

€120+

Typical out-of-pocket cost of a dermatology visit, forming a real financial barrier to early screening.

02

Results

Evaluated on HAM10000 combined with an auxiliary Skin Lesions dataset at a 99% confidence threshold. The system catches nearly every melanoma while keeping false positives rare.

Metric                 Baseline   Ours     Gain (percentage points)
Recall (Sensitivity)   78.3%      98.59%   +20.3
Precision              69.2%      98.59%   +29.4
Accuracy               96%        99.49%   +3.5
F1-Score               73.4%      98.59%   +25.2
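
As a quick sanity check, the F1-Score figures can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The sketch below uses only the numbers quoted above; the function name is illustrative and not taken from the repository.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported baseline: precision 69.2%, recall 78.3%
baseline_f1 = f1_score(0.692, 0.783)
# Reported ensemble: precision 98.59%, recall 98.59%
ours_f1 = f1_score(0.9859, 0.9859)

print(f"baseline F1 = {baseline_f1:.4f}")  # 0.7347
print(f"ours F1     = {ours_f1:.4f}")      # 0.9859
```

The recomputed baseline (about 73.5%) agrees with the reported 73.4% to within rounding of the inputs, and equal precision and recall necessarily give an F1 of the same value.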

Finding

A rare win–win on recall and precision.

Typically, pushing recall upward drags precision down. Combining ConvNeXtBase, a 5-model ensemble, expanded training data, and a 99% confidence threshold raised both metrics at once.

03

Demo

A short walkthrough of the system analysing dermoscopy images with per-image confidence and Grad-CAM attention overlaid.

Real-time analysis on dermoscopy samples
04

Visualizations

ROC curves, confusion matrix, ensemble comparison and Grad-CAM interpretability — the evidence behind the headline numbers.

ROC Curve Analysis

· AUC = 0.997

Receiver Operating Characteristic comparing a single model versus the ensemble.

Grad-CAM Visualization

· Lesion-focused attention

Visual explanations showing where the model looks when it predicts melanoma.

Confusion Matrix

· Only 7 missed cases

Classification outcomes at the 99% confidence threshold.

Ensemble vs Single Model

· 5-model diversity

Performance of the ensemble against the best single model across seeds.
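
For readers who want to reproduce plots of this kind, the sketch below shows how an AUC, ROC points and confusion-matrix counts can be obtained with scikit-learn. The labels and scores are toy values chosen for illustration, not project data, and the 0.5 cut-off is an arbitrary choice for the example.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve

# Toy data (hypothetical): 1 = melanoma, 0 = benign
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])  # predicted melanoma probability

auc = roc_auc_score(y_true, y_score)
fpr, tpr, thresholds = roc_curve(y_true, y_score)  # points for the ROC plot

# Hard decisions at an illustrative cut-off, then the four confusion-matrix cells
y_pred = (y_score >= 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

print(f"AUC = {auc:.2f}")                    # AUC = 0.75
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")    # TN=2 FP=0 FN=1 TP=1
```

The `fn` cell is the "missed cases" count that the confusion-matrix figure highlights.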

05

Methodology

1

Progressive Training

  • Phase 1 · Classifier only — 10 epochs, frozen backbone
  • Phase 2 · Partial unfreezing — 15 epochs, last 3 blocks
  • Phase 3 · Full fine-tuning — 25 epochs, discriminative learning rates
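
A minimal PyTorch sketch of how the three phases above could be wired up. `TinyNet` is a hypothetical four-block stand-in for ConvNeXtBase so the example runs without downloading weights; the base learning rate and per-block decay factor are assumptions, not values from the project.

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for ConvNeXtBase: four conv blocks plus a classifier head."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Conv2d(3 if i == 0 else 8, 8, 3, padding=1), nn.GELU())
            for i in range(4)
        )
        self.head = nn.Linear(8, num_classes)

    def forward(self, x):
        for block in self.blocks:
            x = block(x)
        return self.head(x.mean(dim=(2, 3)))  # global average pool, then classify


def set_phase(model: TinyNet, phase: int) -> None:
    """Freeze or unfreeze parameters to match phase 1, 2 or 3."""
    for p in model.parameters():
        p.requires_grad = phase == 3           # phase 3: everything trains
    for p in model.head.parameters():
        p.requires_grad = True                 # the head trains in every phase
    if phase == 2:
        for block in model.blocks[-3:]:        # last 3 blocks only
            for p in block.parameters():
                p.requires_grad = True


def phase3_param_groups(model: TinyNet, base_lr: float = 1e-4, decay: float = 0.5):
    """Discriminative learning rates: the deeper into the backbone, the smaller the step."""
    groups = [{"params": list(model.head.parameters()), "lr": base_lr}]
    for depth, block in enumerate(reversed(model.blocks), start=1):
        groups.append({"params": list(block.parameters()), "lr": base_lr * decay**depth})
    return groups


model = TinyNet()
set_phase(model, 1)
trainable_phase1 = sum(p.requires_grad for p in model.parameters())
set_phase(model, 2)
trainable_phase2 = sum(p.requires_grad for p in model.parameters())
set_phase(model, 3)
trainable_phase3 = sum(p.requires_grad for p in model.parameters())
optimizer = torch.optim.AdamW(phase3_param_groups(model))
```

With a real backbone the same idea applies; only the way the "last 3 blocks" are selected depends on how that model exposes its stages.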
2

Medical Preprocessing

  • Black corner inpainting — threshold = 50, radius = 15
  • Hair removal — via black-hat transform
  • CLAHE contrast enhancement — applied in LAB color space
  • Conservative augmentation — horizontal / vertical flips only
3

Optimization

  • Focal Loss — with class weighting, γ = 2.0
  • AdamW — with decoupled weight decay
  • CosineAnnealingWarmRestarts — learning-rate scheduler
  • Test-time augmentation — 9 variants averaged at inference
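
The loss, optimizer and scheduler listed above could be combined roughly as follows. This is a sketch under stated assumptions: the class weights, learning rate, weight decay and restart period are placeholders, and the linear model is a hypothetical stand-in for the network. Only γ = 2.0 comes from the text.

```python
import torch
import torch.nn.functional as F
from torch import nn


class FocalLoss(nn.Module):
    """Multi-class focal loss: cross-entropy scaled by (1 - p_t)^gamma, with optional class weights."""

    def __init__(self, class_weights=None, gamma: float = 2.0):
        super().__init__()
        self.gamma = gamma
        self.register_buffer("class_weights", class_weights)

    def forward(self, logits, targets):
        log_probs = F.log_softmax(logits, dim=1)
        log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # log p of the true class
        loss = -((1 - log_pt.exp()) ** self.gamma) * log_pt            # down-weight easy examples
        if self.class_weights is not None:
            loss = loss * self.class_weights[targets]
        return loss.mean()


torch.manual_seed(0)
model = nn.Linear(16, 2)  # hypothetical stand-in for the real network
criterion = FocalLoss(class_weights=torch.tensor([1.0, 4.0]), gamma=2.0)  # weights are assumed
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-2)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=5, T_mult=2, eta_min=1e-6
)

features = torch.randn(8, 16)
labels = torch.randint(0, 2, (8,))
for epoch in range(3):
    optimizer.zero_grad()
    loss = criterion(model(features), labels)
    loss.backward()
    optimizer.step()
    scheduler.step()  # cosine decay within a cycle, jump back up at each restart
```

Setting `gamma=0` and omitting the weights reduces this loss to ordinary cross-entropy, which is a convenient check when adapting it.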
06

Approach

A narrow problem solved with boring engineering — medical-grade preprocessing, an honest ensemble, and a confidence gate that says “I don’t know” when it should.

ConvNeXtBase Ensemble

Five independently seeded ConvNeXtBase models averaged at inference for robust predictions and reduced variance.
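
In code, averaging an ensemble (optionally over test-time augmentation views) is a few lines. In the sketch below each "model" is any callable returning a melanoma probability for an image array, which is a simplification of the real networks; the four views are illustrative only, since the project's nine variants are not listed here.

```python
import numpy as np


def tta_views(image: np.ndarray) -> list:
    """Illustrative test-time views of an HxWxC image (not the project's nine variants)."""
    return [
        image,
        np.flip(image, axis=1),  # horizontal flip
        np.flip(image, axis=0),  # vertical flip
        np.rot90(image, 2),      # 180 degree rotation
    ]


def ensemble_predict(models, image: np.ndarray) -> float:
    """Average the melanoma probability over every model and every view."""
    probs = [model(view) for model in models for view in tta_views(image)]
    return float(np.mean(probs))


# Hypothetical stand-ins for five independently seeded models: each returns a fixed probability.
stub_models = [lambda view, p=p: p for p in (0.90, 0.95, 1.00, 0.85, 0.80)]
image = np.zeros((8, 8, 3), dtype=np.float32)

ensemble_prob = ensemble_predict(stub_models, image)
print(f"{ensemble_prob:.2f}")  # 0.90
```

Averaging probabilities rather than hard votes keeps the output usable as a confidence score for downstream thresholding.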

Medical Preprocessing

Black-corner inpainting, hair removal, and CLAHE contrast enhancement — tuned for dermoscopy, not generic imagery.

Confidence Thresholding

A 99% confidence gate keeps precision surgical and surfaces the uncertain cases for human review instead of forcing a call.
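
One way to read the gate is as a three-way decision: call the case only when the winning class reaches the threshold, otherwise route it to review. The sketch below assumes exactly that symmetric behaviour for a binary melanoma/benign output; the function and label names are hypothetical.

```python
import numpy as np


def confidence_gate(prob_melanoma, threshold: float = 0.99) -> np.ndarray:
    """Return 'melanoma', 'benign', or 'review' for each predicted probability."""
    p = np.asarray(prob_melanoma, dtype=float)
    confidence = np.maximum(p, 1.0 - p)                  # probability of the winning class
    label = np.where(p >= 0.5, "melanoma", "benign")
    return np.where(confidence >= threshold, label, "review")


decisions = confidence_gate([0.999, 0.004, 0.80, 0.97])
print(decisions.tolist())  # ['melanoma', 'benign', 'review', 'review']
```

Under this reading, headline metrics describe the confidently called cases, so the share of images sent to review is worth reporting alongside them.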

Grad-CAM Interpretability

Every prediction ships with a visual explanation of the lesion regions that drove it, so clinicians can audit the model’s attention.
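
A compact Grad-CAM sketch in PyTorch, for orientation. It weights the feature maps of a chosen layer by the spatially averaged gradient of the target logit, keeps the positive part, and upsamples to image size. The small CNN is a hypothetical stand-in for ConvNeXtBase, and the choice of target layer is an assumption.

```python
import torch
import torch.nn.functional as F
from torch import nn


def grad_cam(model: nn.Module, target_layer: nn.Module, image: torch.Tensor, class_idx: int):
    """Return an HxW heatmap in [0, 1] for a single image of shape (1, C, H, W)."""
    captured = {}
    handle = target_layer.register_forward_hook(
        lambda module, inputs, output: captured.update(acts=output)
    )
    try:
        logits = model(image)
    finally:
        handle.remove()

    acts = captured["acts"]                                    # (1, K, h, w) feature maps
    grads = torch.autograd.grad(logits[0, class_idx], acts)[0]  # d(logit) / d(feature maps)
    weights = grads.mean(dim=(2, 3), keepdim=True)             # one weight per channel
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True)).detach()
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
    cam = cam - cam.min()
    return (cam / (cam.max() + 1e-8))[0, 0]


torch.manual_seed(0)
cnn = nn.Sequential(  # hypothetical stand-in for the real backbone
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 2),
).eval()

heatmap = grad_cam(cnn, cnn[3], torch.rand(1, 3, 32, 32), class_idx=1)
```

The heatmap can then be colour-mapped and blended over the dermoscopy image to produce the overlay shown to clinicians.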

Built with
Python · PyTorch · ConvNeXtBase · OpenCV · scikit-learn · Grad-CAM

Disclaimer

This is a research project for educational purposes only. The system is not approved for clinical diagnosis and should not replace professional medical evaluation. Always consult a qualified healthcare professional for medical advice.

Read the
full work.

The repository contains the training pipeline, preprocessing code, evaluation notebooks, and trained model artifacts. Everything is reproducible.

Type
Research project · MSc coursework
Dataset
HAM10000 + Skin Lesions
Stack
PyTorch · ConvNeXtBase · OpenCV