iiiorg/piiranha-v1-detect-personal-information · Incredible amount of false positives.

This model produces an incredible number of false positives. I first tried scoring full rows of data, expecting some sort of contextual scoring, and noticed that the majority of the dataset scored a probability of over 0.4. In fact, 90% scored over 0.9.

I then decided to score individual cells of data to test what was being considered PII. The results show that even the most mundane values, such as a timestamp, are flagged as PII.

Here are some examples:

Format:
Data, Score

Timestamps:
02-009-24 5:00:27.975, 0.9903076887130737

Text:
Nowater, 0.726058840751648
Shop - residential , 0.655426561832428
Brick, 0.8724555969238281

Number on its own:
1, 0.7731475830078125
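
For anyone who wants to try reproducing the per-cell test, here is a minimal sketch of one way it could be set up. It is not necessarily how the scores above were computed: it assumes the model is loaded through the Hugging Face `transformers` token-classification pipeline, and it assumes a cell's score is the highest confidence among the entities the pipeline returns (0.0 if none are returned). The helper names `cell_score`, `score_cells`, and `load_piiranha` are hypothetical.

```python
from typing import Callable, Dict, Iterable, List


def cell_score(entities: List[dict]) -> float:
    """Assumed aggregation: highest entity confidence, or 0.0 if nothing was flagged."""
    return max((float(e["score"]) for e in entities), default=0.0)


def score_cells(cells: Iterable[str],
                classify: Callable[[str], List[dict]]) -> Dict[str, float]:
    """Run the classifier on each cell separately and keep one score per cell."""
    return {cell: cell_score(classify(cell)) for cell in cells}


def load_piiranha() -> Callable[[str], List[dict]]:
    """Build a token-classification pipeline for the model (downloads weights)."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import pipeline

    return pipeline(
        "token-classification",
        model="iiiorg/piiranha-v1-detect-personal-information",
        aggregation_strategy="simple",  # merge sub-word tokens into entity spans
    )


# Example usage (requires transformers and a model download):
#   classify = load_piiranha()
#   for cell, score in score_cells(["Brick", "Nowater", "1"], classify).items():
#       print(f"{cell}, {score}")
```

Scoring each cell on its own removes any surrounding context, so scores obtained this way may differ from what the model gives the same value inside a full row.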