Handle truncated image boundaries in `_convert` to avoid tensor size mismatch
#54, opened by maikezu
Summary
This PR proposes a change in `_convert` to handle cases where truncation (`max_inp_length`)
leaves an unmatched `<im_start>` (or `<slice_start>`) token without its closing `<im_end>` / `<slice_end>`.
When this happens, `image_start_idx` and `image_end_idx` have different lengths,
causing a runtime error at line 274:
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size x but got size x-1 for tensor number 1 in the list.
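The mismatch can be reproduced without the model. The following is a minimal sketch in plain Python, not the actual `_convert` code: the token ids (`IM_START`, `IM_END`, `TEXT`) and the helper `boundary_positions` are hypothetical names used only for illustration. It shows how cutting a sequence at `max_inp_length` can drop a closing marker while keeping its opening one.

```python
# Hypothetical token ids, chosen only for this illustration.
IM_START, IM_END, TEXT = 101, 102, 0

def boundary_positions(ids, max_inp_length):
    """Truncate ids, then return the positions of start and end markers."""
    ids = ids[:max_inp_length]
    starts = [i for i, t in enumerate(ids) if t == IM_START]
    ends = [i for i, t in enumerate(ids) if t == IM_END]
    return starts, ends

# Two complete image spans: positions 1..4 and 6..9.
ids = [TEXT, IM_START, TEXT, TEXT, IM_END, TEXT, IM_START, TEXT, TEXT, IM_END]

# Without truncation, every start has a matching end.
full_starts, full_ends = boundary_positions(ids, max_inp_length=len(ids))

# Truncating to 8 tokens keeps the second start but drops its end.
cut_starts, cut_ends = boundary_positions(ids, max_inp_length=8)
```

After truncation there are two start positions and only one end position, which is the kind of length difference that the concatenation in the error message rejects.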
Changes
- Changed `valid_image_nums` from `max(len(start), len(end))` to `min(len(start), len(end))`
  → only keep valid start–end pairs
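
As a rough illustration of the `min`-based pairing, here is a sketch on plain Python lists. The real code operates on tensors inside `_convert`; the function name `pair_image_bounds` and its list-based signature are assumptions made for this example, not part of the repository.

```python
def pair_image_bounds(starts, ends):
    """Keep only start/end positions that form complete pairs.

    Assumes both lists are in ascending order, so a start left
    unmatched by truncation is always the last one.
    """
    # Using min() discards the trailing start that has no end.
    valid_image_nums = min(len(starts), len(ends))
    return list(zip(starts[:valid_image_nums], ends[:valid_image_nums]))

# One start (position 6) lost its end marker to truncation.
bounds = pair_image_bounds([1, 6], [4])
```

With `max` instead of `min`, the slices would keep their original, unequal lengths, so the two index lists could not be stacked side by side. With `min`, both are cut to the number of complete pairs, and the dangling start is ignored.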