March 8, 2026  ·  12 min read

Small Model, Big Results: How a 5MB On-Device NSFW Detector Outperforms Cloud APIs and Passes a Fairness Audit

By Mark Russo  ·  Methodology assistance: Claude (Anthropic)

Abstract

This paper consolidates three independent benchmark evaluations of Punge, an on-device NSFW image detector built on a custom-trained YOLO nano model, currently running YOLO26n. Across three distinct evaluation axes — accuracy against commercial cloud APIs, accuracy against open-source classifiers, and demographic fairness using a reproduction of the Leu, Nakashima & Garcia (FAccT 2024) methodology — Punge's 5.1 MB model consistently matches or outperforms significantly larger models. On suggestive content classification, Punge outperforms Google Cloud Vision SafeSearch by 14.6 percentage points. Against Falcons.ai's Vision Transformer classifier, Punge achieves a 3% misclassification rate versus 38%. On the FAccT fairness audit, Punge's gender false positive disparity ratio (1.23×) is lower than that of all three models audited in the original study, which ranged from 1.0× to 6.4×. The architectural choice to detect anatomical shapes rather than classify whole images is both the mechanism behind Punge's accuracy advantage and a structural response to the demographic bias problem identified in Garcia et al.


1. Introduction

NSFW image detection is a real problem with real stakes. Parents use it to protect children. Individuals use it to manage their own photo libraries. Platforms use it to moderate content at scale. Yet the dominant approaches — large cloud-based classifiers and general-purpose whole-image models — share a common limitation: they were built for the average case, not for the constraints of a mobile device or the specific demands of detecting explicit content accurately without encoding demographic bias.

Punge was built under a hard constraint: all processing must happen on-device, with no image ever uploaded to a server. This required a model small enough to run on a smartphone CPU in real time. The architecture chosen was YOLO, specifically the nano variant, selected for its efficiency. What emerged from this constraint was not a compromise. It was, as the benchmarks below demonstrate, a structural advantage.

This paper reports three evaluations:

1. Accuracy against an open-source classifier: a head-to-head comparison with Falcons.ai's Vision Transformer model (Study 1).
2. Accuracy against commercial and open-source baselines: Google Cloud Vision SafeSearch and Yahoo's Open NSFW model (Study 2).
3. Demographic fairness: a reproduction of the Leu, Nakashima & Garcia (FAccT 2024) audit methodology (Study 3).

The model evaluated in Studies 1 and 2 was trained on YOLOv11n. Punge has since migrated to YOLO26n. On standard COCO benchmarks, YOLO26n achieves comparable accuracy to YOLO11n (~39.8% vs ~38.5% mAP50–95) while delivering approximately 31% faster CPU inference (38.9ms vs 56.1ms) — an improvement attributable in part to YOLO26's native end-to-end, NMS-free inference design. Study 3 was conducted using YOLO26n.


2. Background: The Models We're Comparing Against

2.1 Google Cloud Vision SafeSearch

Google's Cloud Vision API is among the most widely used commercial image analysis tools. Its SafeSearch feature returns likelihood ratings across five categories: adult, spoof, medical, violence, and racy. In this study, the adult annotation was used, with an image classified as NSFW if the rating returned POSSIBLE, LIKELY, or VERY_LIKELY. As a proprietary cloud API, its internal architecture and thresholds are not public.
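The decision rule used in this study can be expressed as a small helper. The likelihood names follow Cloud Vision's published Likelihood scale; the function itself is an illustrative sketch rather than Punge's benchmarking code:

```python
# Hypothetical helper mirroring the decision rule above: an image is
# treated as NSFW when the SafeSearch "adult" likelihood is POSSIBLE,
# LIKELY, or VERY_LIKELY. Names follow Cloud Vision's Likelihood enum.
NSFW_LIKELIHOODS = {"POSSIBLE", "LIKELY", "VERY_LIKELY"}

def safesearch_is_nsfw(adult_likelihood: str) -> bool:
    """Map a SafeSearch 'adult' likelihood string to a binary decision."""
    return adult_likelihood in NSFW_LIKELIHOODS
```

Because the API returns an ordinal likelihood rather than a numeric score, this cutoff is the only calibration choice available to a SafeSearch integrator.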

2.2 Yahoo Open NSFW Model

A fine-tuned ResNet-50 model, widely used as a baseline in NSFW detection. It returns a single confidence score per image. Evaluated at thresholds of 0.50, 0.60, and 0.70.

2.3 Falcons.ai nsfw_image_detection

A Vision Transformer (ViT) model fine-tuned on approximately 80,000 images for binary NSFW classification. Based on Google's vit-base-patch16-224-in21k pretrained weights. Treats the full image as a single global representation.

2.4 Garcia et al. (FAccT 2024) Models

Leu, Nakashima, and Garcia audited three classifiers in "Auditing Image-based NSFW Classifiers for Content Filtering" (FAccT 2024):

| Model | Architecture | Size |
| --- | --- | --- |
| NSFW-CNN | InceptionV3 | 85.3 MB |
| CLIP-Classifier | CLIP + FC layer | 888.3 MB |
| CLIP-Distance | CLIP + cosine distance | 887.5 MB |

All three are whole-image classifiers trained on internet-scraped datasets.

2.5 Punge (YOLO26n)

A custom-trained YOLO nano model. Rather than classifying the full image, it detects and draws bounding boxes around specific anatomical regions. An image is flagged as NSFW if at least one detection crosses the confidence threshold. Model weight size: 5.1 MB. All inference runs locally on the user's device via CoreML (iOS) or LiteRT (Android).
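The flagging rule described above reduces to a one-line predicate over the detector's output. A minimal sketch, assuming detections arrive as (class name, confidence) pairs; the names are hypothetical, not Punge's actual API:

```python
# Illustrative sketch of the flagging rule: an image is NSFW if any
# anatomical detection clears the confidence threshold.
def image_is_nsfw(detections, threshold=0.50):
    """detections: list of (class_name, confidence) pairs from the detector."""
    return any(conf >= threshold for _, conf in detections)
```

Raising the threshold trades sensitivity for precision, which is the tunable operating point exercised at 0.50, 0.60, and 0.70 in the studies below.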

The current production model runs on YOLO26n, which introduces native NMS-free end-to-end inference. In prior YOLO versions, Non-Maximum Suppression (NMS) — the post-processing step that filters overlapping bounding boxes — was applied after the model returned its raw predictions, and its implementation details could vary across deployment environments. YOLO26 eliminates this step by building suppression into the model's prediction head directly. For a mobile deployment, this means lower and more predictable latency, and one fewer variable in the inference pipeline.
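To make concrete what YOLO26 folds into the model, here is a minimal sketch of the classic greedy NMS post-processing step that earlier YOLO versions required, using axis-aligned (x1, y1, x2, y2) boxes; this is a textbook illustration, not Punge's pipeline code:

```python
def iou(a, b):
    # Intersection-over-union of two (x1, y1, x2, y2) boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thresh=0.45):
    # Greedily keep the highest-scoring box, then drop every remaining
    # box that overlaps it beyond iou_thresh; repeat until none remain.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < iou_thresh]
    return keep
```

Because this loop historically ran outside the exported model, its exact behavior could differ between deployment runtimes; building suppression into the prediction head removes that variability.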


3. Study 1: Punge vs. Falcons.ai Vision Transformer

3.1 Methodology

Both models were evaluated on an identical 100-image test set at a 0.50 confidence threshold. The dataset is small by academic standards and is intended as a directional comparison rather than a definitive benchmark; the results should be interpreted accordingly.

3.2 Results

| Model | Misclassification Rate |
| --- | --- |
| Falcons.ai (ViT) | 38% |
| Punge (YOLO11n) | 3% |

3.3 Discussion

The 35-percentage-point gap reflects a fundamental architectural difference. Falcons.ai's ViT processes the image globally, building a holistic representation that encodes scene context, background, and visual composition. This can cause both false positives — where contextual cues mislead the classifier — and false negatives — where explicit regions are diluted by surrounding non-explicit content.

Punge detects localized regions. If an explicit anatomical shape is present anywhere in the image, the bounding box fires. If it is not, no detection occurs, regardless of the surrounding scene. This localized approach is both more sensitive to actual explicit content and more resistant to contextual misfires.


4. Study 2: Punge vs. Google Cloud Vision and Yahoo Open NSFW

4.1 Methodology

Three datasets were used:

1. Explicit imagery: a dataset of unambiguous explicit content (n=100).
2. Suggestive imagery: swimwear and similar borderline content.
3. Everyday imagery: non-explicit photos drawn from the COCO 2014 dataset.

Punge was evaluated at three confidence thresholds (0.50, 0.60, 0.70). Google Cloud Vision uses its internal threshold (not publicly specified). Yahoo was also evaluated at 0.50, 0.60, and 0.70.
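The threshold sweep described above amounts to recomputing true and false positive rates at each operating point. A minimal sketch, assuming one max-confidence score per image and binary ground-truth labels (1 = NSFW); all names are illustrative:

```python
def rates(scores, labels, threshold):
    """Return (true positive rate, false positive rate) at a threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    pos = sum(labels)            # images that are actually NSFW
    neg = len(labels) - pos      # images that are not
    return (tp / pos if pos else 0.0, fp / neg if neg else 0.0)
```

Sweeping the threshold from 0.50 to 0.70 moves a model along its sensitivity/precision curve, which is why each result below names the operating point alongside the rate.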

4.2 Results

Explicit Imagery — True Positive Rate

| Model | True Positive Rate |
| --- | --- |
| Punge — 0.50 threshold | 100% |
| Google Cloud Vision — internal threshold | 99% |
| Yahoo — 0.50 threshold | 66% |

Suggestive Imagery — False Positive Rate (lower is better)

| Model | False Positive Rate |
| --- | --- |
| Punge — 0.70 threshold | 1.6% |
| Yahoo — 0.70 threshold | 2.1% |
| Google Cloud Vision — internal threshold | 16.2% |
Google Cloud Vision flagged nearly 1 in 6 swimsuit photos as explicit content. Punge flagged fewer than 1 in 60.

Everyday Imagery (COCO 2014)

All models achieved near-perfect accuracy on non-explicit everyday imagery. Punge (at thresholds of 0.60 and above) and Yahoo both achieved 100% accuracy; Google Cloud Vision also performed well on this dataset.

4.3 Discussion

The most practically significant finding is the suggestive content result. For real-world NSFW detection, the hard problem is not identifying explicit pornography — most classifiers handle that reasonably well. The hard problem is the boundary between explicit and suggestive: swimwear, lingerie, artistic nudity, medical imagery. A model that flags swimsuit photos as NSFW is not useful.

Punge's 14.6-percentage-point advantage over Google Cloud Vision on suggestive content reflects its localized detection approach. Swimsuit photos do not contain explicit anatomical regions in Punge's training taxonomy, so they do not trigger detections. Google's classifier, operating on global image representations, appears to over-index on visual features associated with bodies, skin, and context that correlate with explicit content in its training data.

A practical advantage of the YOLO threshold approach is also worth noting: developers can tune the confidence threshold to calibrate the sensitivity/precision tradeoff for their specific application. Google's API provides a fixed operating point with no developer control.


5. Study 3: Demographic Fairness Evaluation (FAccT 2024 Methodology)

5.1 Background

In their FAccT 2024 paper, Leu, Nakashima, and Garcia audited three NSFW classifiers against the MSCOCO and Google Conceptual Captions datasets, using demographic annotations to measure whether false positive rates varied across gender, skin tone, and age. Their findings were stark:

- Gender: false positive rates for images of women ran as high as 6.4× the rate for images of men.
- Skin tone: false positive rate disparities of roughly 2.0×–2.2× across skin tone groups.
- Explainability: all three classifiers used female faces as a signal for NSFW content.

This is not a minor calibration issue. It is a systematic encoding of the demographic appearance of women as a proxy for explicit content. Garcia et al. called for broader scrutiny of NSFW detection methodology and explicitly aimed to "stimulate further exploration in this domain."

5.2 Methodology

We reproduced the Garcia et al. methodology to evaluate Punge against the same standard:

- Datasets: demographically annotated images from MSCOCO and Google Conceptual Captions (10,783 images after filtering for demographic homogeneity, versus 11,628 in the original audit).
- Metric: false positive rates compared across gender and skin tone groups, reported as disparity ratios.
- Thresholds: Punge evaluated at multiple confidence thresholds, with headline results reported at 0.60.

Because Punge uses object detection rather than image classification, a false positive is defined as any image where at least one bounding box detection crosses the confidence threshold. This is a meaningful distinction from the Garcia models: Punge never classifies a person or their demographic attributes — it detects shapes.

5.3 Results

At the 0.60 threshold, Punge's gender disparity ratio (1.23×) is lower than all three Garcia models. Its skin tone ratio (0.89×) shows near-perfect parity, slightly favoring darker skin tones, compared to 2.0–2.2× ratios in the audited models.
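A false positive rate disparity ratio, as used here and in Garcia et al., is the ratio of group-wise FPRs on images containing no explicit content, so every flag is by definition a false positive. A minimal sketch with hypothetical group labels:

```python
from collections import defaultdict

def group_fpr(records):
    """records: (group, flagged) pairs for images with no explicit content."""
    counts = defaultdict(lambda: [0, 0])  # group -> [false positives, total]
    for group, flagged in records:
        counts[group][1] += 1
        counts[group][0] += int(flagged)
    return {g: fp / n for g, (fp, n) in counts.items()}

def disparity_ratio(fprs, group_a, group_b):
    """FPR of group_a relative to group_b; 1.0 means perfect parity."""
    return fprs[group_a] / fprs[group_b]
```

Under this definition a ratio of 1.23× means the more-flagged group's FPR is 23% higher than the comparison group's, versus up to 540% higher for the worst model in the original audit.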

5.4 Why Does YOLO Produce Less Demographic Bias?

This result is not accidental. There are structural reasons why an anatomy-detecting YOLO model produces less demographic bias than whole-image NSFW classifiers.

YOLO is trained to detect explicit anatomical regions, not to model person identity. The Garcia paper's explainability finding — that all three classifiers were using female faces as NSFW signals — is only possible for models that process the entire image and build up a semantic representation that includes person identity and perceived gender. Punge's model was trained on anatomical shapes, not on people. A female face does not appear in Punge's training taxonomy, and there is no pathway from "this person appears to be female" to "detection fires."

Large models absorb demographic correlations from training data. The CLIP-based models audited by Garcia were trained on massive internet-scraped datasets, including LAION-5B. The internet contains systematic associations between certain demographics and explicit content that have nothing to do with the actual prevalence of explicit content in those groups. Models with broad training objectives and enormous capacity absorb these correlations. A compact, task-specific detector trained on anatomical shape examples has no mechanism to absorb such correlations.

The absence of a "person" concept. Whole-image classifiers have rich internal representations of people, faces, gender expression, and social context. NSFW classification layered on top of these person-aware representations makes demographic contamination nearly inevitable. YOLO trained for anatomical detection has no such person representation. It operates at the level of shapes and spatial patterns, not semantic identity.

Garcia et al.'s call to "stimulate further exploration" in NSFW detection methodology is consistent with the observation that whole-image semantic classifiers may be structurally ill-suited to this task from a fairness standpoint. A detector that never sees the person — only their anatomy, if explicit content is present — is a structural answer to the problem they identified.


6. Model Size and Deployment Context

| Model | Size | Architecture |
| --- | --- | --- |
| CLIP-Classifier | 888.3 MB | CLIP + FC classifier |
| CLIP-Distance | 887.5 MB | CLIP + cosine distance |
| NSFW-CNN | 85.3 MB | InceptionV3 CNN |
| Falcons.ai | ~350 MB | Vision Transformer |
| Punge (YOLO26n) | 5.1 MB | YOLO object detector |

Punge is roughly 17× smaller than NSFW-CNN and 175× smaller than the CLIP-based models. It outperforms all of them on both accuracy metrics and demographic fairness.

This size difference is not just an engineering curiosity. Punge runs entirely on the user's device. Content is never uploaded to a cloud server for analysis. The model makes detection decisions locally, without a network call, without any human review layer, and without any data ever leaving the phone.

This makes the fairness properties especially consequential. In a cloud-based moderation system, systematic demographic bias can be audited, corrected, and overridden by human review. In an on-device system processing a user's personal photo library, there is no such correction mechanism. A biased detection at that layer is invisible and uncorrectable at scale. The low disparity ratios observed in this evaluation are therefore not just academically interesting — they are a direct property of the deployed product, affecting real users in real time.


7. Limitations

Several limitations of this work should be acknowledged.

The benchmark datasets used in Studies 1 and 2 were curated by the author and are not publicly released. The explicit dataset (n=100) is small by academic standards, though it is representative of the detection task.

The FAccT methodology reproduction used 10,783 images compared to Garcia et al.'s 11,628, due to filtering applied to ensure demographic homogeneity within each evaluated image. This difference is unlikely to materially affect the directional findings but should be noted.

Punge's training dataset (~21,000 images at the time of YOLO26n training) is task-specific and domain-focused. Performance may vary on distributions of explicit content that differ significantly from the training data.

As an anatomy-based detector, Punge will not flag explicit content where no trained anatomical regions are visible. This includes clothed sexual acts, obscured or implied nudity, and contextually explicit scenes that do not expose specific body parts. This is a fundamental property of the detection approach rather than a calibration issue, and developers requiring detection of non-anatomical explicit content should be aware of this scope.

Finally, this evaluation was conducted by the developer of Punge. Independent replication would strengthen the findings. The methodology is documented here with sufficient detail to support reproduction, and the author welcomes external validation.


8. Conclusion

Three independent evaluations — against a commercial cloud API, an open-source ViT classifier, and a rigorous academic fairness audit — consistently find that Punge's 5.1 MB on-device YOLO model matches or outperforms significantly larger models. The central hypothesis supported by these results is that the architectural choice to detect anatomy rather than classify images is both the cause of Punge's accuracy advantage and a direct response to the demographic bias problem identified by Garcia et al.

The implication for the field is worth stating plainly: bigger is not always better, and in NSFW detection specifically, whole-image semantic classification may be the wrong architectural paradigm. A model that never sees the person — only the anatomical content, if present — is faster, smaller, more accurate on the hard cases, and less likely to encode demographic appearance as a proxy for explicit content.

A natural direction for future work is evaluating model performance on explicit content that does not involve anatomical nudity — clothed scenes, implied content, and contextual explicitness. Whole-image classifiers may perform better in this category than anatomy-based detectors, though to the author's knowledge no systematic benchmark exists for this problem. It represents an open research question for the field.


Summary Comparison

| Criterion | Garcia Models | Punge |
| --- | --- | --- |
| Architecture | Whole-image classifiers | Object detection |
| Model size | 85–888 MB | 5.1 MB |
| Training data | Internet-scraped | Task-specific |
| Gender FPR ratio | 1.0×–6.4× | 1.23× |
| Skin tone FPR ratio | 2.0×–2.2× | 0.89× |
| Deployment | Server-side | On-device, no upload |
| Suggestive FPR vs. Google | Baseline | 14.6 pp better |

References

Leu, W., Nakashima, Y., & Garcia, N. (2024). Auditing Image-based NSFW Classifiers for Content Filtering. ACM FAccT 2024.

Zhao, J., et al. (2021). Understanding and Evaluating Racial Biases in Image Captioning. Princeton Annotation Dataset.

Sapkota, R., et al. (2025). Ultralytics YOLO Evolution: An Overview of YOLO26, YOLO11, YOLOv8 and YOLOv5. arXiv:2510.09653.

Ultralytics. (2026). YOLO26 Documentation.

Punge is available on the App Store and Google Play. The benchmarking methodology described in Study 3 is available for independent reproduction. Correspondence: mark@markatlarge.com