Incredible amount of false positives.
#10, opened by anakaine
This model produces an incredible number of false positives. I first tried scoring full rows of data, thinking there would be some sort of contextual scoring, and noticed that the majority of the dataset scored a probability of over 0.4. In fact, 90% scored over 0.9.
I then decided to score individual cells of data to test what was considered PII. The results show that even the most mundane values, such as a timestamp, are flagged as PII.
Here are some examples (format: data, score):
Timestamps:
02-009-24 5:00:27.975, 0.9903076887130737
Text:
Nowater, 0.726058840751648
Shop - residential , 0.655426561832428
Brick, 0.8724555969238281
Number on its own:
1, 0.7731475830078125
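For anyone wanting to repeat the per-cell check, here is a minimal sketch of how the share of values above a threshold could be tallied. The helper `fraction_over` and its `score_fn` argument are hypothetical names, not part of the model's API; `score_fn` stands in for whatever call returns the model's PII probability for a string. To keep the sketch runnable without the model, it looks up the scores reported above instead of calling anything.

```python
from typing import Callable, Iterable


def fraction_over(
    values: Iterable[str],
    score_fn: Callable[[str], float],
    threshold: float,
) -> float:
    """Return the fraction of values whose score is strictly above threshold."""
    scores = [score_fn(v) for v in values]
    if not scores:
        return 0.0
    return sum(s > threshold for s in scores) / len(scores)


# Scores copied from the examples above. This dict is a stand-in for a
# real model call (assumption: one probability per input string).
reported_scores = {
    "02-009-24 5:00:27.975": 0.9903076887130737,
    "Nowater": 0.726058840751648,
    "Shop - residential ": 0.655426561832428,
    "Brick": 0.8724555969238281,
    "1": 0.7731475830078125,
}

cells = list(reported_scores)
share_over_04 = fraction_over(cells, reported_scores.__getitem__, 0.4)
share_over_09 = fraction_over(cells, reported_scores.__getitem__, 0.9)
print(f"over 0.4: {share_over_04:.0%}, over 0.9: {share_over_09:.0%}")
```

Swapping the dict lookup for a real scoring call and passing a full column of cells would give the same kind of breakdown as the 0.4 and 0.9 figures quoted for the row-level test.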