Fix broken URLs
README.md CHANGED
@@ -50,14 +50,14 @@ model-index:
 
 # SpanMarker with roberta-large on FewNERD
 
-This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model trained on the [FewNERD](https://huggingface.co/datasets/DFKI-SLT/few-nerd) dataset that can be used for Named Entity Recognition. This SpanMarker model uses [roberta-large](https://huggingface.co/
+This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model trained on the [FewNERD](https://huggingface.co/datasets/DFKI-SLT/few-nerd) dataset that can be used for Named Entity Recognition. This SpanMarker model uses [roberta-large](https://huggingface.co/roberta-large) as the underlying encoder. See [train.py](train.py) for the training script.
 
 ## Model Details
 
 ### Model Description
 
 - **Model Type:** SpanMarker
-- **Encoder:** [roberta-large](https://huggingface.co/
+- **Encoder:** [roberta-large](https://huggingface.co/roberta-large)
 - **Maximum Sequence Length:** 256 tokens
 - **Maximum Entity Length:** 8 words
 - **Training Dataset:** [FewNERD](https://huggingface.co/datasets/DFKI-SLT/few-nerd)
@@ -179,7 +179,7 @@ trainer.save_model("tomaarsen/span-marker-roberta-large-fewnerd-fine-super-finet
 </details>
 
 ### ⚠️ Tokenizer Warning
-The [roberta-large](https://huggingface.co/
+The [roberta-large](https://huggingface.co/roberta-large) tokenizer distinguishes between punctuation directly attached to a word and punctuation separated from a word by a space. For example, `Paris.` and `Paris .` are tokenized into different tokens. During training, this model is only exposed to the latter style, i.e. all words are separated by a space. Consequently, the model may perform worse when the inference text is in the former style.
 
 In short, it is recommended to preprocess your inference text such that all words and punctuation are separated by a space. Some potential approaches to convert regular text into this format are NLTK [`word_tokenize`](https://www.nltk.org/api/nltk.tokenize.word_tokenize.html) or spaCy [`Doc`](https://spacy.io/api/doc#iter) and join the resulting words with a space.
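For reference, the model card being fixed describes a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) NER model. Below is a minimal inference sketch using the `span_marker` library from that repository; the full repo id is truncated in the second hunk header above, so the id used here is a placeholder assumption:

```python
# Minimal SpanMarker inference sketch. The repo id below is a placeholder:
# the real id is truncated in the diff hunk header above.
from span_marker import SpanMarkerModel

MODEL_ID = "tomaarsen/span-marker-roberta-large-fewnerd-fine-super"  # placeholder id

# Load the trained model (roberta-large encoder, 256-token max sequence length).
model = SpanMarkerModel.from_pretrained(MODEL_ID)

# Per the tokenizer warning in the card, separate all words and punctuation
# with spaces before predicting.
text = "Amelia Earhart flew her single engine Lockheed Vega 5B across the Atlantic ."

# predict() returns a list of entity dicts with span text, label, and score.
for entity in model.predict(text):
    print(entity["span"], entity["label"], round(entity["score"], 3))
```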

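The tokenizer warning restored above recommends space-separating words and punctuation before inference. A rough sketch of the NLTK `word_tokenize` approach the card mentions, assuming `nltk` is installed and its tokenizer data is available:

```python
# Convert regular text into the space-separated style the model saw in training.
import nltk
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)  # tokenizer data; newer NLTK releases may also need "punkt_tab"

def space_separate(text: str) -> str:
    """Rejoin word_tokenize output with single spaces, e.g. 'Paris.' -> 'Paris .'"""
    return " ".join(word_tokenize(text))

print(space_separate("The model was trained in Paris."))
# -> The model was trained in Paris .
```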