Model Card Vuurwerkverkenner
This model, developed by the Netherlands Forensic Institute, is designed to link fragments from exploded fireworks to their corresponding firework types. An application utilizing this model is available at www.vuurwerkverkenner.nl.
Architecture
The classification process involves two components: an embedding model that generates embeddings, and a classification model that determines classifications based on the distances between these embeddings. The classification component is mainly used for model evaluation. In practice, the application compares the embedding of the provided snippet image with the embeddings of the wrappers in the database. This setup allows new wrappers to be added without retraining the model.
Embedding model
First, we train an embedding model that produces similar embeddings for snippets from the same source and dissimilar embeddings for snippets from different sources. This model is based on the Vision Transformer architecture (arXiv) and fine-tuned with the following specifications:
- Model: ViT-B/32, with an L2-normalized linear layer as embedding head
- Input: RGB image of 448x448 pixels
- Output/embedding layer size: 128
- Training loss: ProxyAnchorLoss (see here) with margin = 0.5 and alpha = 64
- Optimizer: AdamW, with a fixed learning rate of 1e-4 for the model weights and 1e-2 for the proxy vectors
- Batch size: 150
- Epochs: 20
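As an illustration of the training loss listed above, the following is a minimal pure-Python sketch of the Proxy Anchor loss in its commonly published form, using the margin and alpha values from the list. It is not the training code used for this model. The function name `proxy_anchor_loss`, the toy inputs, and the assumption that all embeddings and proxies are already L2-normalized are illustrative choices.

```python
import math


def proxy_anchor_loss(embeddings, labels, proxies, margin=0.5, alpha=64.0):
    """Proxy Anchor loss for one batch (illustrative sketch).

    embeddings: list of L2-normalized vectors (assumed unit length).
    labels:     class index for each embedding.
    proxies:    one L2-normalized proxy vector per class.
    """
    def similarity(u, v):
        # Dot product equals cosine similarity for unit-length vectors.
        return sum(a * b for a, b in zip(u, v))

    classes_in_batch = set(labels)
    positive_term = 0.0
    negative_term = 0.0
    for class_index, proxy in enumerate(proxies):
        positive_sum = 0.0
        negative_sum = 0.0
        for vector, label in zip(embeddings, labels):
            s = similarity(vector, proxy)
            if label == class_index:
                # Pull same-class samples towards their proxy.
                positive_sum += math.exp(-alpha * (s - margin))
            else:
                # Push other-class samples away from this proxy.
                negative_sum += math.exp(alpha * (s + margin))
        if class_index in classes_in_batch:
            positive_term += math.log1p(positive_sum)
        negative_term += math.log1p(negative_sum)
    # Average the positive part over proxies with positives in the batch,
    # and the negative part over all proxies.
    return positive_term / len(classes_in_batch) + negative_term / len(proxies)


toy_proxies = [[1.0, 0.0], [0.0, 1.0]]
toy_embeddings = [[1.0, 0.0], [0.0, 1.0]]
loss_aligned = proxy_anchor_loss(toy_embeddings, [0, 1], toy_proxies)
loss_swapped = proxy_anchor_loss(toy_embeddings, [1, 0], toy_proxies)
```

In this toy example the loss is lower when each embedding lies on the proxy of its own class than when the labels are swapped, which is the behaviour the loss is meant to reward.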
Classification
To link a snippet photo to a firework wrapper, the trained embedding model first generates reference embeddings from a background dataset. It then generates an embedding for the snippet photo in the same way. Classification is performed by calculating the cosine distance between the snippet photo embedding and the reference embeddings of each firework wrapper. For each category, the minimum distance among its reference embeddings serves as the representative score.
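The distance-based classification can be sketched as follows. This is a minimal illustration, not the application's code: the function names, the dictionary layout of the reference embeddings, and the two-dimensional toy vectors are assumptions made for the example.

```python
import math


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def rank_wrappers(query_embedding, reference_embeddings):
    """Score each wrapper by its closest reference embedding.

    reference_embeddings maps a wrapper name to a list of embeddings.
    Returns (wrapper, score) pairs sorted from best (smallest) to worst.
    """
    scores = {
        wrapper: min(cosine_distance(query_embedding, ref) for ref in refs)
        for wrapper, refs in reference_embeddings.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical reference set with toy 2-D embeddings.
references = {
    "wrapper_a": [[1.0, 0.0], [0.9, 0.1]],
    "wrapper_b": [[0.0, 1.0]],
}
ranking = rank_wrappers([1.0, 0.05], references)
```

Because only stored embeddings are compared, adding a wrapper to `references` requires computing its embeddings but no change to the embedding model.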
Text filter
A text filter can optionally be applied after classification. It matches firework labels against text found on the snippet. The snippet text must be entered manually, and all text fragments must be present on the label to get a match.
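A minimal sketch of such a filter is shown below. The case-insensitive substring matching, the function names, and the example label texts are assumptions for illustration. The matching rules used in the application may differ.

```python
def matches_label(fragments, label_text):
    """Return True only if every entered fragment occurs on the label."""
    label = label_text.casefold()
    return all(fragment.casefold() in label for fragment in fragments)


def filter_by_text(ranked_wrappers, label_texts, fragments):
    """Keep the ranked (wrapper, score) pairs whose label has all fragments."""
    return [
        (wrapper, score)
        for wrapper, score in ranked_wrappers
        if matches_label(fragments, label_texts[wrapper])
    ]


# Hypothetical ranking and label texts.
ranked = [("wrapper_a", 0.02), ("wrapper_b", 0.10), ("wrapper_c", 0.35)]
labels = {
    "wrapper_a": "Thunder King 100 shots",
    "wrapper_b": "Thunder Queen",
    "wrapper_c": "Silver Rain",
}
filtered = filter_by_text(ranked, labels, ["thunder", "king"])
```

Here only `wrapper_a` remains, because `wrapper_b` contains "thunder" but not "king".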
Data
The model is trained and evaluated using data from fireworks involved in cases at the Netherlands Forensic Institute since 2010. The dataset is divided into three parts: train, validation, and holdout. The train and validation sets are used for training and model selection. The final model in the application is trained on all data except the holdout set. Further information on the development and application data can be found here and here.
Real snippets
We have generated snippets for the available firework categories by detonating the fireworks. These real snippets (also called 'lab snippets') are photographed with a high-quality DSLR camera against a white background, with optimal lighting conditions. The snippets are segmented, distributed across train, validation, and holdout sets, and grouped into images containing 1 to 10 snippets.
Mock-crime scene snippets
For certain categories, we have created photos that imitate crime scene conditions, e.g. by using suboptimal lighting and/or a phone camera. Because less background noise is desirable for model performance, these photos show snippets placed on 'DNA blankets', which provide a somewhat uniform background.
Artificial snippets
To ensure the embedding model can produce embeddings for all firework wrappers, including those without real snippets, we create 'artificial snippets' by randomly cropping wrapper images. Each artificial snippet image comprises 1 to 10 snippet pieces, and multiple such images are created per wrapper. An additional set is generated for each wrapper to serve as the reference dataset. Its embeddings are stored and compared against the provided image in the application.
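The random cropping step might look roughly like the sketch below, which only chooses crop coordinates for one artificial snippet image. The rectangular crops, the size limits `min_size` and `max_size`, and the function name are assumptions. The actual generation procedure, including the shape of the pieces, is not specified here.

```python
import random


def random_crop_boxes(width, height, rng, min_size=32, max_size=128):
    """Pick 1 to 10 random crop boxes inside a wrapper image.

    Assumes both image dimensions are at least min_size pixels.
    Returns (left, top, right, bottom) tuples in pixel coordinates.
    """
    piece_count = rng.randint(1, 10)
    boxes = []
    for _ in range(piece_count):
        crop_width = rng.randint(min_size, min(max_size, width))
        crop_height = rng.randint(min_size, min(max_size, height))
        left = rng.randint(0, width - crop_width)
        top = rng.randint(0, height - crop_height)
        boxes.append((left, top, left + crop_width, top + crop_height))
    return boxes


# Seeded generator so the example is reproducible.
boxes = random_crop_boxes(800, 600, random.Random(0))
```

Each box could then be passed to an image library's crop function to cut the piece out of the wrapper image.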
Evaluation
To assess differences in performance across conditions, we construct a test set containing artificial, real, and mock-crime-scene images. The evaluation covers the entire set and also examines performance per snippet type and for categories with many similar wrappers.
Metrics
| Metric | Value |
|---|---|
| RecallAtKValidator(k=1) | 0.9475017269168777 |
| RecallAtKValidator(k=3) | 0.9715634354133088 |
| RecallAtKValidator(k=5) | 0.9757080359198711 |
| CategoricalRecallAtKValidator(k=1) | 0.985493898227032 |
| CategoricalRecallAtKValidator(k=5) | 0.9945889937830993 |
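For orientation, recall at k is conventionally the fraction of query images whose true wrapper appears among the k best-ranked candidates. The sketch below illustrates that general definition only. It is not the implementation of `RecallAtKValidator`, and the definition of the categorical variant is not given in this card, so it is not reproduced here.

```python
def recall_at_k(rankings, true_wrappers, k):
    """Fraction of queries whose true wrapper is within the top k results.

    rankings:      one list of wrapper names per query, best match first.
    true_wrappers: the correct wrapper name for each query.
    """
    hits = sum(
        1 for ranking, truth in zip(rankings, true_wrappers)
        if truth in ranking[:k]
    )
    return hits / len(true_wrappers)


# Hypothetical rankings for three query images of wrapper "a".
toy_rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
toy_truth = ["a", "a", "a"]
recall_1 = recall_at_k(toy_rankings, toy_truth, 1)
recall_3 = recall_at_k(toy_rankings, toy_truth, 3)
```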
Limitations
The evaluation results may not reflect the model's real-world performance, for several reasons. Training and testing have been carried out exclusively with snippets photographed against plain backgrounds and with optimal lighting, conditions that may not always be achievable in practice. Performance is likely to be higher with better-quality photos, a sufficient number of distinctive snippets, and correctly entered text, and lower when these conditions are not met. Additionally, if the firework type under investigation is novel or rare, it may be absent from the reference database, in which case the model cannot return it.
Using the model
This model is intended for use with the Vuurwerkverkenner application, which includes the necessary code for operation. The application's source code can be accessed on GitHub.