Add examples, remove warning
README.md
CHANGED
@@ -20,9 +20,13 @@ widget:
 - text: Amelia Earhart flew her single engine Lockheed Vega 5B across the Atlantic
     to Paris.
   example_title: Amelia Earhart
-- text: Leonardo
+- text: Leonardo da Vinci painted the Mona Lisa based on Italian noblewoman
     Lisa del Giocondo.
   example_title: Leonardo da Vinci
+- text: Most of the Steven Seagal movie ``Under Siege`` (co-starring Tommy Lee Jones)
+    was filmed aboard the Battleship USS Alabama, which is docked on Mobile Bay at
+    Battleship Memorial Park and open to the public.
+  example_title: Under Siege
 base_model: roberta-large
 model-index:
 - name: SpanMarker w. roberta-large on finegrained, supervised FewNERD by Tom Aarsen
@@ -150,7 +154,7 @@ from span_marker import SpanMarkerModel
 # Download from the 🤗 Hub
 model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-roberta-large-fewnerd-fine-super")
 # Run inference
-entities = model.predict("Most of the Steven Seagal movie ``
+entities = model.predict("Most of the Steven Seagal movie ``Under Siege`` (co-starring Tommy Lee Jones) was filmed aboard the Battleship USS Alabama, which is docked on Mobile Bay at Battleship Memorial Park and open to the public.")
 ```
 
 ### Downstream Use
@@ -178,11 +182,6 @@ trainer.save_model("tomaarsen/span-marker-roberta-large-fewnerd-fine-super-finet
 ```
 </details>
 
-### ⚠️ Tokenizer Warning
-The [roberta-large](https://huggingface.co/roberta-large) tokenizer distinguishes between punctuation directly attached to a word and punctuation separated from a word by a space. For example, `Paris.` and `Paris .` are tokenized into different tokens. During training, this model is only exposed to the latter style, i.e. all words are separated by a space. Consequently, the model may perform worse when the inference text is in the former style.
-
-In short, it is recommended to preprocess your inference text such that all words and punctuation are separated by a space. Some potential approaches to convert regular text into this format are NLTK [`word_tokenize`](https://www.nltk.org/api/nltk.tokenize.word_tokenize.html) or spaCy [`Doc`](https://spacy.io/api/doc#iter) and join the resulting words with a space.
-
 ## Training Details
 
 ### Training Set Metrics