fjmgAI
/

col1-210M-EuroBERT

@@ -28,40 +28,36 @@ model-index:
     - type: accuracy
       value: 0.9848384857177734
       name: Accuracy
 ---
-# PyLate model based on EuroBERT/EuroBERT-210m
-This is a [PyLate](https://github.com/lightonai/pylate) model finetuned from [EuroBERT/EuroBERT-210m](https://huggingface.co/EuroBERT/EuroBERT-210m) on the [rag-comprehensive-triplets](https://huggingface.co/datasets/baconnier/rag-comprehensive-triplets) dataset. It maps sentences & paragraphs to sequences of 128-dimensional dense vectors and can be used for semantic textual similarity using the MaxSim operator.
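The MaxSim operator mentioned above can be illustrated with a small, library-free sketch. It assumes that token embeddings are L2-normalized and that the score is the sum, over query tokens, of the highest dot product with any document token. The `normalize` and `maxsim` helpers are hypothetical names used for illustration only; they are not part of the PyLate API.

```python
from math import sqrt


def normalize(vec):
    """Return an L2-normalized copy of vec (zero vectors are returned unchanged)."""
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def maxsim(query_tokens, doc_tokens):
    """Late-interaction score: for each query token, take the best-matching
    document token (by dot product) and sum those maxima."""
    q = [normalize(v) for v in query_tokens]
    d = [normalize(v) for v in doc_tokens]
    return sum(
        max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
        for qv in q
    )


# Toy 2-dimensional example; the real model emits 128-dimensional token vectors.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]  # covers both query tokens
doc_b = [[1.0, 0.0]]              # covers only the first query token
score_a = maxsim(query, doc_a)
score_b = maxsim(query, doc_b)
```

In this toy case `doc_a` scores higher than `doc_b` because every query token finds a close match in it, which is the behaviour late-interaction retrieval relies on.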
-## Model Details
-### Model Description
-- **Model Type:** PyLate model
-- **Base model:** [EuroBERT/EuroBERT-210m](https://huggingface.co/EuroBERT/EuroBERT-210m) <!-- at revision 5a0c63d3e255a4f2005d3591d5508b7fd07cae94 -->
-- **Document Length:** 180 tokens
-- **Query Length:** 32 tokens
-- **Output Dimensionality:** 128 dimensions
-- **Similarity Function:** MaxSim
-- **Training Dataset:**
-    - [rag-comprehensive-triplets](https://huggingface.co/datasets/baconnier/rag-comprehensive-triplets)
-<!-- - **Language:** Unknown -->
-<!-- - **License:** Unknown -->
-### Model Sources
-- **Documentation:** [PyLate Documentation](https://lightonai.github.io/pylate/)
-- **Repository:** [PyLate on GitHub](https://github.com/lightonai/pylate)
-- **Hugging Face:** [PyLate models on Hugging Face](https://huggingface.co/models?library=PyLate)
-### Full Model Architecture
-```
-ColBERT(
-  (0): Transformer({'max_seq_length': 31, 'do_lower_case': False}) with Transformer model: EuroBertModel
-  (1): Dense({'in_features': 768, 'out_features': 128, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
-)
-```
 ## Usage
 First install the PyLate library:
@@ -70,379 +66,79 @@ First install the PyLate library:
 pip install -U pylate
 ```
-### Retrieval
-PyLate provides a streamlined interface to index and retrieve documents using ColBERT models. The index leverages the Voyager HNSW index to efficiently handle document embeddings and enable fast retrieval.
-#### Indexing documents
-First, load the ColBERT model and initialize the Voyager index, then encode and index your documents:
-```python
-from pylate import indexes, models, retrieve
-# Step 1: Load the ColBERT model
-model = models.ColBERT(
-    model_name_or_path="fjmgAI/col1-210M-EuroBERT",  # this repository's model id
-)
-# Step 2: Initialize the Voyager index
-index = indexes.Voyager(
-    index_folder="pylate-index",
-    index_name="index",
-    override=True,  # This overwrites the existing index if any
-)
-# Step 3: Encode the documents
-documents_ids = ["1", "2", "3"]
-documents = ["document 1 text", "document 2 text", "document 3 text"]
-documents_embeddings = model.encode(
-    documents,
-    batch_size=32,
-    is_query=False,  # Ensure that it is set to False to indicate that these are documents, not queries
-    show_progress_bar=True,
-)
-# Step 4: Add document embeddings to the index by providing embeddings and corresponding ids
-index.add_documents(
-    documents_ids=documents_ids,
-    documents_embeddings=documents_embeddings,
-)
-```
-Note that you do not have to recreate the index and encode the documents every time. Once you have created an index and added the documents, you can re-use the index later by loading it:
-```python
-# To load an index, simply instantiate it with the correct folder/name and without overriding it
-index = indexes.Voyager(
-    index_folder="pylate-index",
-    index_name="index",
-)
-```
-#### Retrieving top-k documents for queries
-Once the documents are indexed, you can retrieve the top-k most relevant documents for a given set of queries.
-To do so, initialize the ColBERT retriever with the index you want to search, encode the queries, and then retrieve the top-k documents to get the ids and relevance scores of the best matches:
-```python
-# Step 1: Initialize the ColBERT retriever
-retriever = retrieve.ColBERT(index=index)
-# Step 2: Encode the queries
-queries_embeddings = model.encode(
-    ["query for document 3", "query for document 1"],
-    batch_size=32,
-    is_query=True,  # Ensure that it is set to True to indicate that these are queries
-    show_progress_bar=True,
-)
-# Step 3: Retrieve top-k documents
-scores = retriever.retrieve(
-    queries_embeddings=queries_embeddings,
-    k=10,  # Retrieve the top 10 matches for each query
-)
-```
-### Reranking
-If you only want to use the ColBERT model to rerank the results of a first-stage retrieval pipeline without building an index, you can use the `rank.rerank` function and pass it the queries and the documents to rerank:
 ```python
-from pylate import rank, models
-queries = [
-    "query A",
-    "query B",
-]
-documents = [
-    ["document A", "document B"],
-    ["document 1", "document C", "document B"],
-]
-documents_ids = [
-    [1, 2],
-    [1, 3, 2],
-]
-model = models.ColBERT(
-    model_name_or_path="fjmgAI/col1-210M-EuroBERT",  # this repository's model id
-)
-queries_embeddings = model.encode(
-    queries,
-    is_query=True,
-)
-documents_embeddings = model.encode(
-    documents,
-    is_query=False,
-)
-reranked_documents = rank.rerank(
-    documents_ids=documents_ids,
-    queries_embeddings=queries_embeddings,
-    documents_embeddings=documents_embeddings,
-)
 ```
-<!--
-### Direct Usage (Transformers)
-<details><summary>Click to see the direct usage in Transformers</summary>
-</details>
--->
-<!--
-### Downstream Usage (Sentence Transformers)
-You can finetune this model on your own dataset.
-<details><summary>Click to expand</summary>
-</details>
--->
-<!--
-### Out-of-Scope Use
-*List how the model may foreseeably be misused and address what users ought not to do with the model.*
--->
-## Evaluation
-### Metrics
-#### ColBERT Triplet
-* Evaluated with <code>pylate.evaluation.colbert_triplet.ColBERTTripletEvaluator</code>
-| Metric       | Value      |
-|:-------------|:-----------|
-| **accuracy** | **0.9848** |
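As a rough illustration of what a triplet accuracy like the one above measures, the sketch below counts the fraction of (query, positive, negative) triplets in which the positive document scores higher than the negative one. This is an assumption about the metric's general shape, not a copy of `ColBERTTripletEvaluator`; in particular, treating ties as incorrect is a choice made here, and `triplet_accuracy` is a hypothetical helper name.

```python
def triplet_accuracy(positive_scores, negative_scores):
    """Fraction of triplets where the positive outscores the negative.

    positive_scores[i] and negative_scores[i] are the similarity scores
    (for example MaxSim) of the i-th query against its positive and
    negative document. Ties count as incorrect (an assumption).
    """
    if len(positive_scores) != len(negative_scores):
        raise ValueError("score lists must have the same length")
    if not positive_scores:
        raise ValueError("at least one triplet is required")
    correct = sum(1 for p, n in zip(positive_scores, negative_scores) if p > n)
    return correct / len(positive_scores)


# Made-up scores for four triplets; only the second one is ranked wrongly.
example_accuracy = triplet_accuracy([0.9, 0.4, 0.7, 0.8], [0.1, 0.6, 0.2, 0.3])
```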
-<!--
-## Bias, Risks and Limitations
-*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
--->
-<!--
-### Recommendations
-*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
--->
-## Training Details
-### Training Dataset
-#### rag-comprehensive-triplets
-* Dataset: [rag-comprehensive-triplets](https://huggingface.co/datasets/baconnier/rag-comprehensive-triplets) at [678e83e](https://huggingface.co/datasets/baconnier/rag-comprehensive-triplets/tree/678e83ed6a74d17c38b33344168abc7787e39754)
-* Size: 909,188 training samples
-* Columns: <code>query</code>, <code>positive</code>, <code>negative</code>, <code>original_id</code>, <code>dataset_source</code>, <code>category</code>, and <code>language</code>
-* Approximate statistics based on the first 1000 samples:
-  |         | query                                                                            | positive                                                                          | negative                                                                          | original_id                                                                     | dataset_source                                                                    | category                                                                        | language                                                                       |
-  |:--------|:---------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:--------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:--------------------------------------------------------------------------------|:-------------------------------------------------------------------------------|
-  | type    | string                                                                           | string                                                                            | string                                                                            | string                                                                          | string                                                                            | string                                                                          | string                                                                         |
-  | details | <ul><li>min: 8 tokens</li><li>mean: 23.7 tokens</li><li>max: 32 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 28.42 tokens</li><li>max: 32 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 29.19 tokens</li><li>max: 32 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 3.93 tokens</li><li>max: 4 tokens</li></ul> | <ul><li>min: 16 tokens</li><li>mean: 16.0 tokens</li><li>max: 16 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 4.62 tokens</li><li>max: 5 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 3.0 tokens</li><li>max: 3 tokens</li></ul> |
-* Samples:
-  | query                                                                                                                                                                                                                                          | positive                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | negative                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | original_id        | dataset_source                                                 | category                      | language        |
-  |:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------|:---------------------------------------------------------------|:------------------------------|:----------------|
-  | <code>Escriba una historia sobre un viaje de pesca en hielo en Minnesota que incluya todos los detalles importantes, desde la ropa hasta el equipo de pesca, y que también mencione la importancia de la seguridad y los buenos modales</code> | <code>Ah, ¡invierno! Es hora de ponerse la ropa interior larga. Ponte unos calcetines de lana y un jersey. Ponte los pantalones de nieve. Ponte el gorro de media. Coge la caña de pescar y el cubo de cebo.<br>Hay hielo en el lago, y es la estación de disfrutar de una auténtica aventura en Minnesota: la pesca en hielo. No te preocupes por pasar frío o aburrirte en un lago helado. Pescar en el hielo es fácil y emocionante. Es divertido caminar por el hielo imaginando soles hambrientos o morsas acechando debajo. Es una aventura pasar el rato alrededor de un agujero de hielo con los amigos y la familia, contando historias y sujetando una caña de pescar de aspecto gracioso mientras esperáis un bocado. Y es emocionante cuando tu bobber se desvanece de repente en el agujero y sacas un pez escurridizo del agua con un chapoteo. Así que coge a un adulto, un termo de cacao caliente y prepárate para una aventura de pesca en el hielo.<br><br>Empieza con una visita a tu tienda de cebos local o a la oficin...</code> | <code>Una aventura de pesca en hielo en Minnesota puede ser una experiencia emocionante y divertida, siempre y cuando se esté preparado con la ropa y el equipo adecuados para el verano</code> | <code>10954</code> | <code>argilla/databricks-dolly-15k-curated-multilingual</code> | <code>creative_writing</code> | <code>es</code> |
-  | <code>¿Cuáles son los materiales necesarios para realizar la impresión con bloques?</code> | <code>La impresión con bloques es una forma de arte en la que el artista talla un bloque (normalmente de vinilo o goma) y utiliza tinta para imprimir la imagen. Los materiales necesarios para ello son el bloque para tallar, una herramienta de tallado, un rodillo para aplicar la tinta, tinta, papel o material para la imagen. También se necesita una superficie lisa y plana para extender la tinta; una pequeña lámina de cristal o plexiglás funciona bien para ello.</code> | <code>La impresión con bloques es una forma de arte en la que el artista talla un bloque (normalmente de madera o cartón) y utiliza tinta para imprimir la imagen. Los materiales necesarios para ello son el bloque para tallar, una herramienta de tallado, un rodillo para aplicar la tinta, tinta, papel o material para la imagen. También se necesita una superficie rugosa y curva para extender la tinta; una pequeña lámina de plástico o tela funciona bien para ello.</code> | <code>13815</code> | <code>argilla/databricks-dolly-15k-curated-multilingual</code> | <code>brainstorming</code> | <code>es</code> |
-  | <code>¿Cuál es el propósito de la Primera Enmienda de la Constitución de Estados Unidos?</code>                                                                                                                                                | <code>La Primera Enmienda garantiza la libertad de expresión y de culto en Estados Unidos.</code>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | <code>La Primera Enmienda garantiza la libertad de asociación y de expresión en Estados Unidos.</code>                                                                                                                                                                                                                                                                                                                                                                                        | <code>4168</code>  | <code>argilla/databricks-dolly-15k-curated-multilingual</code> | <code>open_qa</code>          | <code>es</code> |
-* Loss: <code>pylate.losses.contrastive.Contrastive</code>
-### Evaluation Dataset
-#### rag-comprehensive-triplets
-* Dataset: [rag-comprehensive-triplets](https://huggingface.co/datasets/baconnier/rag-comprehensive-triplets) at [678e83e](https://huggingface.co/datasets/baconnier/rag-comprehensive-triplets/tree/678e83ed6a74d17c38b33344168abc7787e39754)
-* Size: 909,188 evaluation samples
-* Columns: <code>query</code>, <code>positive</code>, <code>negative</code>, <code>original_id</code>, <code>dataset_source</code>, <code>category</code>, and <code>language</code>
-* Approximate statistics based on the first 1000 samples:
-  |         | query                                                                             | positive                                                                         | negative                                                                          | original_id                                                                     | dataset_source                                                                    | category                                                                        | language                                                                       |
-  |:--------|:----------------------------------------------------------------------------------|:---------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:--------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:--------------------------------------------------------------------------------|:-------------------------------------------------------------------------------|
-  | type    | string                                                                            | string                                                                           | string                                                                            | string                                                                          | string                                                                            | string                                                                          | string                                                                         |
-  | details | <ul><li>min: 6 tokens</li><li>mean: 23.24 tokens</li><li>max: 32 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 28.7 tokens</li><li>max: 32 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 29.32 tokens</li><li>max: 32 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 3.94 tokens</li><li>max: 4 tokens</li></ul> | <ul><li>min: 16 tokens</li><li>mean: 16.0 tokens</li><li>max: 16 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 4.65 tokens</li><li>max: 5 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 3.0 tokens</li><li>max: 3 tokens</li></ul> |
-* Samples:
-  | query                                                                                                  | positive                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | negative                                                                                                                                                                                           | original_id        | dataset_source                                                 | category                            | language        |
-  |:-------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------|:---------------------------------------------------------------|:------------------------------------|:----------------|
-  | <code>¿Alguien puede decirme sobre el Firefly Music Festival que dura 4 días?</code>                   | <code>El Festival de Música Firefly es un evento multigénero que se celebra en Dover, Delaware, y que comenzó en 2012</code>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   | <code>El Festival de Música Firefly es un evento de música en vivo que se celebra en Dover, Delaware, y que comenzó en 2015, con una duración de 5 días y géneros como rock, pop y hip hop.</code> | <code>3446</code>  | <code>argilla/databricks-dolly-15k-curated-multilingual</code> | <code>open_qa</code>                | <code>es</code> |
-  | <code>¿Cuáles son los nombres de los siete países alpinos de oeste a este?</code> | <code>Los Alpes (/ælps/) son la cadena montañosa más alta y extensa de Europa, que se extiende aproximadamente 1.200 km a través de siete países alpinos (de oeste a este): Francia, Suiza, Italia, Liechtenstein, Austria, Alemania y Eslovenia.<br>El arco alpino se extiende desde Niza, en el Mediterráneo occidental, hasta Trieste, en el Adriático, y Viena, en el inicio de la cuenca panónica. Las montañas se formaron a lo largo de decenas de millones de años al chocar las placas tectónicas africana y euroasiática. El acortamiento extremo provocado por este acontecimiento hizo que las rocas sedimentarias marinas se elevaran por empuje y plegamiento hasta formar altos picos montañosos como el Mont Blanc y el Cervino.<br>El Mont Blanc se extiende por la frontera franco-italiana y, con 4.809 m, es la montaña más alta de los Alpes. La zona de los Alpes contiene 128 picos de más de 4.000 m de altura.</code> | <code>La región de los Alpes se extiende a lo largo de ocho países, desde Francia en el oeste hasta Hungría en el este.</code> | <code>13897</code> | <code>argilla/databricks-dolly-15k-curated-multilingual</code> | <code>information_extraction</code> | <code>es</code> |
-  | <code>quiero saber si estos números son pares o no: 13, 200, 334, 420, 5, 12, ¿me puedes decir?</code> | <code>13: Impar<br>200: Pares<br>334: Par<br>420: Par<br>5: Impar<br>12: Par</code>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            | <code>13 es par, 200 es impar, 334 es par, 420 es par, 5 es par, 12 es impar</code>                                                                                                                | <code>12562</code> | <code>argilla/databricks-dolly-15k-curated-multilingual</code> | <code>classification</code>         | <code>es</code> |
-* Loss: <code>pylate.losses.contrastive.Contrastive</code>
-### Training Hyperparameters
-#### Non-Default Hyperparameters
-- `eval_strategy`: steps
-- `per_device_train_batch_size`: 32
-- `per_device_eval_batch_size`: 32
-- `learning_rate`: 2e-05
-- `num_train_epochs`: 1
-- `fp16`: True
-- `load_best_model_at_end`: True
-#### All Hyperparameters
-<details><summary>Click to expand</summary>
-- `overwrite_output_dir`: False
-- `do_predict`: False
-- `eval_strategy`: steps
-- `prediction_loss_only`: True
-- `per_device_train_batch_size`: 32
-- `per_device_eval_batch_size`: 32
-- `per_gpu_train_batch_size`: None
-- `per_gpu_eval_batch_size`: None
-- `gradient_accumulation_steps`: 1
-- `eval_accumulation_steps`: None
-- `torch_empty_cache_steps`: None
-- `learning_rate`: 2e-05
-- `weight_decay`: 0.0
-- `adam_beta1`: 0.9
-- `adam_beta2`: 0.999
-- `adam_epsilon`: 1e-08
-- `max_grad_norm`: 1.0
-- `num_train_epochs`: 1
-- `max_steps`: -1
-- `lr_scheduler_type`: linear
-- `lr_scheduler_kwargs`: {}
-- `warmup_ratio`: 0.0
-- `warmup_steps`: 0
-- `log_level`: passive
-- `log_level_replica`: warning
-- `log_on_each_node`: True
-- `logging_nan_inf_filter`: True
-- `save_safetensors`: True
-- `save_on_each_node`: False
-- `save_only_model`: False
-- `restore_callback_states_from_checkpoint`: False
-- `no_cuda`: False
-- `use_cpu`: False
-- `use_mps_device`: False
-- `seed`: 42
-- `data_seed`: None
-- `jit_mode_eval`: False
-- `use_ipex`: False
-- `bf16`: False
-- `fp16`: True
-- `fp16_opt_level`: O1
-- `half_precision_backend`: auto
-- `bf16_full_eval`: False
-- `fp16_full_eval`: False
-- `tf32`: None
-- `local_rank`: 0
-- `ddp_backend`: None
-- `tpu_num_cores`: None
-- `tpu_metrics_debug`: False
-- `debug`: []
-- `dataloader_drop_last`: False
-- `dataloader_num_workers`: 0
-- `dataloader_prefetch_factor`: None
-- `past_index`: -1
-- `disable_tqdm`: False
-- `remove_unused_columns`: True
-- `label_names`: None
-- `load_best_model_at_end`: True
-- `ignore_data_skip`: False
-- `fsdp`: []
-- `fsdp_min_num_params`: 0
-- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
-- `fsdp_transformer_layer_cls_to_wrap`: None
-- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
-- `deepspeed`: None
-- `label_smoothing_factor`: 0.0
-- `optim`: adamw_torch
-- `optim_args`: None
-- `adafactor`: False
-- `group_by_length`: False
-- `length_column_name`: length
-- `ddp_find_unused_parameters`: None
-- `ddp_bucket_cap_mb`: None
-- `ddp_broadcast_buffers`: False
-- `dataloader_pin_memory`: True
-- `dataloader_persistent_workers`: False
-- `skip_memory_metrics`: True
-- `use_legacy_prediction_loop`: False
-- `push_to_hub`: False
-- `resume_from_checkpoint`: None
-- `hub_model_id`: None
-- `hub_strategy`: every_save
-- `hub_private_repo`: None
-- `hub_always_push`: False
-- `gradient_checkpointing`: False
-- `gradient_checkpointing_kwargs`: None
-- `include_inputs_for_metrics`: False
-- `include_for_metrics`: []
-- `eval_do_concat_batches`: True
-- `fp16_backend`: auto
-- `push_to_hub_model_id`: None
-- `push_to_hub_organization`: None
-- `mp_parameters`:
-- `auto_find_batch_size`: False
-- `full_determinism`: False
-- `torchdynamo`: None
-- `ray_scope`: last
-- `ddp_timeout`: 1800
-- `torch_compile`: False
-- `torch_compile_backend`: None
-- `torch_compile_mode`: None
-- `dispatch_batches`: None
-- `split_batches`: None
-- `include_tokens_per_second`: False
-- `include_num_input_tokens_seen`: False
-- `neftune_noise_alpha`: None
-- `optim_target_modules`: None
-- `batch_eval_metrics`: False
-- `eval_on_start`: False
-- `use_liger_kernel`: False
-- `eval_use_gather_object`: False
-- `average_tokens_across_devices`: False
-- `prompts`: None
-- `batch_sampler`: batch_sampler
-- `multi_dataset_batch_sampler`: proportional
-</details>
-### Training Logs
-| Epoch      | Step     | Training Loss | Validation Loss | accuracy |
-|:----------:|:--------:|:-------------:|:---------------:|:--------:|
-| 0.1065     | 500      | 1.6396        | -               | -        |
-| 0          | 0        | -             | -               | 0.8016   |
-| 0.1065     | 500      | -             | 0.8725          | -        |
-| 0.2131     | 1000     | 0.699         | -               | -        |
-| 0          | 0        | -             | -               | 0.8968   |
-| 0.2131     | 1000     | -             | 0.5092          | -        |
-| 0.3196     | 1500     | 0.4315        | -               | -        |
-| 0          | 0        | -             | -               | 0.9242   |
-| 0.3196     | 1500     | -             | 0.3369          | -        |
-| 0.4262     | 2000     | 0.2833        | -               | -        |
-| 0          | 0        | -             | -               | 0.9522   |
-| 0.4262     | 2000     | -             | 0.2331          | -        |
-| 0.5327     | 2500     | 0.1848        | -               | -        |
-| 0          | 0        | -             | -               | 0.9661   |
-| 0.5327     | 2500     | -             | 0.1655          | -        |
-| 0.6392     | 3000     | 0.1317        | -               | -        |
-| 0          | 0        | -             | -               | 0.9776   |
-| 0.6392     | 3000     | -             | 0.1162          | -        |
-| 0.7458     | 3500     | 0.0975        | -               | -        |
-| 0          | 0        | -             | -               | 0.9815   |
-| 0.7458     | 3500     | -             | 0.0947          | -        |
-| 0.8523     | 4000     | 0.0716        | -               | -        |
-| 0          | 0        | -             | -               | 0.9815   |
-| 0.8523     | 4000     | -             | 0.0806          | -        |
-| **0.9589** | **4500** | **0.059**     | **-**           | **-**    |
-| 0          | 0        | -             | -               | 0.9848   |
-| **0.9589** | **4500** | **-**         | **0.0673**      | **-**    |
-* The bold row denotes the saved checkpoint.
-### Framework Versions
 - Python: 3.10.12
 - Sentence Transformers: 3.4.1
 - PyLate: 1.1.7
@@ -452,48 +148,11 @@ You can finetune this model on your own dataset.
 - Datasets: 3.3.1
 - Tokenizers: 0.21.0
-## Citation
-### BibTeX
-#### Sentence Transformers
-```bibtex
-@inproceedings{reimers-2019-sentence-bert,
-    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
-    author = "Reimers, Nils and Gurevych, Iryna",
-    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
-    month = "11",
-    year = "2019",
-    publisher = "Association for Computational Linguistics",
-    url = "https://arxiv.org/abs/1908.10084"
-}
-```
-#### PyLate
-```bibtex
-@misc{PyLate,
-title={PyLate: Flexible Training and Retrieval for Late Interaction Models},
-author={Chaffin, Antoine and Sourty, Raphaël},
-url={https://github.com/lightonai/pylate},
-year={2024}
-}
-```
-<!--
-## Glossary
-*Clearly define terms in order to be accessible across audiences.*
--->
-<!--
-## Model Card Authors
-*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
--->
-<!--
-## Model Card Contact
-*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
--->

     - type: accuracy
       value: 0.9848384857177734
       name: Accuracy
+license: apache-2.0
+language:
+- es
+- en
 ---
+[<img src="https://cdn-avatars.huggingface.co/v1/production/uploads/67b2f4e49edebc815a3a4739/R1g957j1aBbx8lhZbWmxw.jpeg" width="200"/>](https://huggingface.co/fjmgAI)
+## Fine-Tuned Model
+**`fjmgAI/col1-210M-EuroBERT`**
+## Base Model
+**`EuroBERT/EuroBERT-210m`**
+## Fine-Tuning Method
+Fine-tuning was performed with **[PyLate](https://github.com/lightonai/pylate)**, using contrastive training on the [rag-comprehensive-triplets](https://huggingface.co/datasets/baconnier/rag-comprehensive-triplets) dataset. The resulting model maps sentences and paragraphs to sequences of 128-dimensional dense vectors and can be used for semantic textual similarity with the MaxSim operator.
+## Dataset
+**[`baconnier/rag-comprehensive-triplets`](https://huggingface.co/datasets/baconnier/rag-comprehensive-triplets)**
+### Description
+This is the Spanish-language subset of **rag-comprehensive-triplets**, obtained by filtering the dataset by language; it contains **303,000 examples**.
+## Fine-Tuning Details
+- The model was trained with **contrastive training**.
+- It was evaluated with <code>pylate.evaluation.colbert_triplet.ColBERTTripletEvaluator</code>:
+| Metric       | Value      |
+|:-------------|:-----------|
+| **accuracy** | **0.9848** |
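As a rough illustration of what a triplet accuracy figure means: assuming the evaluator counts a triplet as correct when the query scores higher against its positive document than against its negative one, the metric is the fraction of correctly ranked triplets. The sketch below uses made-up scores and a hypothetical `triplet_accuracy` helper; it is not PyLate's implementation.

```python
def triplet_accuracy(positive_scores, negative_scores):
    """Fraction of triplets whose positive score beats the negative score."""
    if len(positive_scores) != len(negative_scores):
        raise ValueError("score lists must have the same length")
    correct = sum(p > n for p, n in zip(positive_scores, negative_scores))
    return correct / len(positive_scores)


# Hypothetical MaxSim scores for four (query, positive, negative) triplets.
positive_scores = [0.91, 0.75, 0.40, 0.88]
negative_scores = [0.30, 0.52, 0.45, 0.10]

accuracy = triplet_accuracy(positive_scores, negative_scores)
print(f"accuracy = {accuracy:.2f}")  # 3 of the 4 triplets are ranked correctly
```

Under this reading, the reported value of 0.9848 would mean that roughly 98.5% of the evaluation triplets were ranked correctly.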
 ## Usage
 First install the PyLate library:
 ```bash
 pip install -U pylate
 ```
+### Calculate Similarity
 ```python
+import torch
+from pylate import models
+# Load the ColBERT model from the Hugging Face Hub
+# trust_remote_code=True allows the custom modeling code of the EuroBERT base model to run
+model = models.ColBERT("fjmgAI/col1-210M-EuroBERT", trust_remote_code=True)
+# Move the model to GPU if available, otherwise use CPU
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+model.to(device)
+# Example data for similarity comparison
+query = "¿Cuál es la capital de España?"  # Query sentence
+positive_doc = "La capital de España es Madrid."  # Relevant document
+negative_doc = "Florida es un estado en los Estados Unidos."  # Irrelevant document
+sentences = [query, positive_doc, negative_doc]  # Combine all texts
+# Tokenize the input sentences using ColBERT's tokenizer
+# This converts text to token IDs and attention masks
+inputs = model.tokenize(sentences)
+# Move all input tensors to the same device as the model (GPU/CPU)
+inputs = {key: value.to(device) for key, value in inputs.items()}
+# Generate token embeddings (no gradients needed for inference)
+with torch.no_grad():
+    # Forward pass through the model
+    embeddings_dict = model(inputs)  # Returns dictionary with model outputs
+    # Extract token-level embeddings (shape: [batch_size, seq_length, embedding_dim])
+    embeddings = embeddings_dict['token_embeddings']
+    print(embeddings.shape)  # Expected: [3, 32, 128] (3 texts, 32 tokens max, 128-dim embeddings)
+# Define ColBERT's MaxSim similarity function
+def colbert_similarity(query_emb, doc_emb):
+    """
+    Computes ColBERT-style similarity between query and document embeddings.
+    Uses maximum similarity (MaxSim) between individual tokens.
+    Args:
+        query_emb: [query_tokens, embedding_dim]
+        doc_emb: [doc_tokens, embedding_dim]
+    Returns:
+        Scalar tensor: the mean, over query tokens, of each token's best match in the document
+    """
+    # Compute dot product between all token pairs
+    similarity_matrix = torch.matmul(query_emb, doc_emb.T)  # [query_tokens, doc_tokens]
+    # Get maximum similarity for each query token (MaxSim)
+    max_similarities = similarity_matrix.max(dim=1).values
+    # Return the mean of the per-token maxima (the sum normalized by query length)
+    return max_similarities.mean()
+# Extract embeddings for each text
+query_emb = embeddings[0]  # [32, 128] - Query embeddings
+positive_emb = embeddings[1]  # [32, 128] - Positive document embeddings
+negative_emb = embeddings[2]  # [32, 128] - Negative document embeddings
+# Compute similarity scores
+positive_score = colbert_similarity(query_emb, positive_emb)  # Query vs positive doc
+negative_score = colbert_similarity(query_emb, negative_emb)  # Query vs negative doc
+# Print results (.item() converts each scalar tensor to a Python float, on CPU or GPU)
+print(f"Similarity with positive document: {positive_score.item():.4f}")
+print(f"Similarity with negative document: {negative_score.item():.4f}")
 ```
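The MaxSim arithmetic in `colbert_similarity` above can be checked by hand on a toy example. The following is a minimal sketch in plain Python, with no PyLate or PyTorch required; the 2-dimensional vectors and the `maxsim` helper are made up for illustration and are not part of PyLate's API.

```python
def maxsim(query_vecs, doc_vecs):
    """Mean over query tokens of the best dot product against any document token."""
    best_per_query_token = [
        max(sum(q * d for q, d in zip(q_vec, d_vec)) for d_vec in doc_vecs)
        for q_vec in query_vecs
    ]
    return sum(best_per_query_token) / len(best_per_query_token)


# Toy "token embeddings" (hypothetical values, each of unit length).
query = [[1.0, 0.0], [0.0, 1.0]]
relevant_doc = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
unrelated_doc = [[-1.0, 0.0], [0.6, -0.8]]

relevant_score = maxsim(query, relevant_doc)    # both query tokens find an exact match
unrelated_score = maxsim(query, unrelated_doc)  # best matches are only 0.6 and 0.0
print(relevant_score, unrelated_score)
```

Each query token contributes only its single best match, so a document scores well when every query token has some close counterpart in it, regardless of how many other tokens the document contains.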
+## Framework Versions
 - Python: 3.10.12
 - Sentence Transformers: 3.4.1
 - PyLate: 1.1.7
 - Datasets: 3.3.1
 - Tokenizers: 0.21.0
+## Purpose
+This fine-tuned model is designed for **Spanish-language applications** that need **efficient semantic search**. It compares embeddings at the token level with the MaxSim operator, which makes it well suited to **question answering and document retrieval**.
+- **Developed by:** fjmgAI
+- **License:** apache-2.0
+[<img src="https://github.com/lightonai/pylate/blob/main/docs/img/logo.png?raw=true" width="200"/>](https://github.com/lightonai/pylate)