arxiv:2305.13303

Towards Unsupervised Recognition of Semantic Differences in Related Documents

Published on May 22, 2023

Authors:

Jannis Vamvas ,

AI-generated summary

Recognizing semantic differences is framed as a token-level regression task and tackled with unsupervised methods based on masked language models. These methods correlate with gold labels but leave room for improvement.

Abstract

Automatically highlighting words that cause semantic differences between two documents could be useful for a wide range of applications. We formulate recognizing semantic differences (RSD) as a token-level regression task and study three unsupervised approaches that rely on a masked language model. To assess the approaches, we begin with basic English sentences and gradually move to more complex, cross-lingual document pairs. Our results show that an approach based on word alignment and sentence-level contrastive learning has a robust correlation to gold labels. However, all unsupervised approaches still leave a large margin of improvement. Code to reproduce our experiments is available at https://github.com/ZurichNLP/recognizing-semantic-differences
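The word-alignment idea in the abstract can be illustrated with a minimal sketch. It assumes each token already has a contextual embedding from a masked language model; here hand-written toy vectors stand in for those embeddings. Each token in one document is scored as 1 minus its highest cosine similarity to any token of the other document, so a token with no close counterpart gets a score near 1. The function names are hypothetical. The sketch omits the sentence-level contrastive learning that the abstract mentions and is not the authors' implementation, which is in the linked repository.

```python
import math
from typing import List, Sequence

# Hypothetical type alias: one embedding vector per token.
Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def alignment_difference_scores(doc_a: List[Vector], doc_b: List[Vector]) -> List[float]:
    """Token-level difference score for each token of doc_a.

    A token's score is 1 minus its best (maximum) cosine similarity to any
    token in doc_b. A higher score suggests the token has no semantic
    counterpart in the other document.
    """
    if not doc_b:
        # Nothing to align to: every token counts as fully different.
        return [1.0] * len(doc_a)
    return [1.0 - max(cosine(tok, other) for other in doc_b) for tok in doc_a]


# Toy usage: the first token of doc_a has an exact match in doc_b,
# while the second is orthogonal to everything in doc_b.
doc_a = [[1.0, 0.0], [0.0, 1.0]]
doc_b = [[1.0, 0.0]]
print(alignment_difference_scores(doc_a, doc_b))  # [0.0, 1.0]
```

Scoring tokens in both directions (A against B and B against A) yields difference scores for both documents. Per the abstract, such token-level scores are assessed by their correlation with gold labels.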

