zli12321/answer_equivalence_roberta-large

Text Classification · question-answering

Zongxia Li committed on Feb 21, 2024

Commit 766b554 (verified) · 1 parent: 0f4fd69

Update README.md

Files changed (1): README.md +9 -9

@@ -14,7 +14,7 @@ pipeline_tag: text-classification
 [![PyPI version qa-metrics](https://img.shields.io/pypi/v/qa-metrics.svg)](https://pypi.org/project/qa-metrics/)
-QA-Evaluation-Metrics is a fast and lightweight Python package for evaluating question-answering models. It provides various basic metrics to assess the performance of QA models. Check out our paper [**CFMatcher**](https://arxiv.org/abs/2401.13170), a matching method going beyond token-level matching and is more efficient than LLM matchings but still retains competitive evaluation performance of transformer LLM models.
+QA-Evaluation-Metrics is a fast and lightweight Python package for evaluating question-answering models. It provides various basic metrics to assess the performance of QA models. Check out our paper [**PANDA**](https://arxiv.org/abs/2402.11161), a matching method going beyond token-level matching and is more efficient than LLM matchings but still retains competitive evaluation performance of transformer LLM models.
 ## Installation
@@ -63,7 +63,7 @@ match_result = f1_match(reference_answer, candidate_answer, threshold=0.5)
 print("F1 Match: ", match_result)
 ```
-#### CFMatch
+#### PANDA
 ```python
 from qa_metrics.cfm import CFMatcher
@@ -76,13 +76,13 @@ print("Score: %s; bert Match: %s" % (scores, match_result))
 If you find this repo avialable, please cite our paper:
 ```bibtex
-@misc{li2024cfmatch,
-  title={CFMatch: Aligning Automated Answer Equivalence Evaluation with Expert Judgments For Open-Domain Question Answering},
-  author={Zongxia Li and Ishani Mondal and Yijun Liang and Huy Nghiem and Jordan Boyd-Graber},
-  year={2024},
-  eprint={2401.13170},
-  archivePrefix={arXiv},
-  primaryClass={cs.CL}
+@misc{li2024panda,
+      title={PANDA (Pedantic ANswer-correctness Determination and Adjudication):Improving Automatic Evaluation for Question Answering and Text Generation},
+      author={Zongxia Li and Ishani Mondal and Yijun Liang and Huy Nghiem and Jordan Lee Boyd-Graber},
+      year={2024},
+      eprint={2402.11161},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL}
 }
 ```
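The README in this diff calls `f1_match(reference_answer, candidate_answer, threshold=0.5)` and describes PANDA as going beyond token-level matching. As a hedged illustration of what such a token-level baseline typically computes, here is a minimal SQuAD-style token-F1 sketch. The names `normalize_answer`, `token_f1`, and `f1_match_sketch` are hypothetical and are not the `qa_metrics` API; the normalization steps, the single-string reference, and the `>=` threshold comparison are assumptions, not confirmed behavior of the package.

```python
import re
import string
from collections import Counter

# Hypothetical sketch, NOT the qa_metrics implementation.
_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """SQuAD-style normalization (assumed): lowercase, drop punctuation,
    drop English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(reference: str, candidate: str) -> float:
    """Harmonic mean of token precision and recall between two answers."""
    ref_tokens = normalize_answer(reference).split()
    cand_tokens = normalize_answer(candidate).split()
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def f1_match_sketch(reference: str, candidate: str, threshold: float = 0.5) -> bool:
    """Hypothetical thresholded match; the >= comparison is an assumption."""
    return token_f1(reference, candidate) >= threshold


if __name__ == "__main__":
    # Overlap {eiffel, tower}: precision 2/4, recall 2/2, F1 = 2/3.
    print(token_f1("the Eiffel Tower", "Eiffel Tower in Paris"))
    print(f1_match_sketch("the Eiffel Tower", "Eiffel Tower in Paris"))  # True
    print(f1_match_sketch("Paris", "London"))  # False
```

A candidate that adds correct but extra words ("in Paris") is penalized on precision, which is the kind of surface-level mismatch that model-based matchers like PANDA aim to handle better.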