Shreya Goyal committed · Commit 8a73979 · Parent(s): 842c1e0

update readme for NJTs
README.md
CHANGED
@@ -166,3 +166,20 @@ If you find our paper or models helpful, please consider cite as follows:
  year={2025}
 }
 ```
+
+## Efficient DRAMA
+### Nested Tensors
+[Nested Tensors](https://docs.pytorch.org/docs/stable/nested.html) provide a way to handle ragged-shaped data within a single tensor, allowing for efficient operations on such data.
+They store data in a compact packed representation while offering a standard PyTorch tensor interface, making it easy to apply various
+operations.
+Nested Tensors are particularly advantageous for model deployments that perform inference on large batches of sequences with varying
+lengths. Traditional tensors require padding all sequences in a batch to the same length, which can be inefficient, especially when
+the batch includes many short sequences and a single long sequence. Nested Tensors eliminate the need for padding, thus avoiding
+unnecessary computation on extra pad tokens. This results in more efficient processing of batches with varying sequence lengths.
+
+### Performance
+Experiments have demonstrated a 1.7x to 2.3x improvement in queries per second (QPS) for the base, large, and 1B models on batch inference with sequences of varied lengths.
+
+### Usage
+To enable Nested Tensors, simply set the `use_nested` variable to true. This will activate the nested jagged tensors and allow you to
+take advantage of efficient inference.