jamie8johnson committed on
Commit 37958cc · verified · 1 Parent(s): 4baf009

Add model card documenting negative result

Files changed (1)
  1. README.md +50 -0
README.md ADDED
@@ -0,0 +1,50 @@
+ ---
+ language:
+ - en
+ - code
+ library_name: sentence-transformers
+ pipeline_tag: text-classification
+ tags:
+ - cross-encoder
+ - reranker
+ - code-search
+ - onnx
+ - bert
+ datasets:
+ - code_search_net
+ license: apache-2.0
+ ---
+
+ # code-reranker-v1
+
+ A cross-encoder reranker for code search, trained on CodeSearchNet pairs. **Experimental: it does not improve retrieval in our benchmarks.** Published for reproducibility.
+
+ ## Status: Negative Result
+
+ This reranker **regresses** retrieval quality on our hard eval set (55 confusable function pairs):
+
+ | Config | Recall@1 | Delta |
+ |--------|----------|-------|
+ | No reranker | 90.9% | n/a |
+ | Web-trained cross-encoder | 80.0% | **-10.9pp** |
+ | **This model (code-trained)** | **9.1%** | **-81.8pp** |
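To make the table's arithmetic concrete, here is a minimal sketch of how Recall@1 and the percentage-point delta could be computed. The function names and toy rankings are hypothetical, and the hit counts (50, 44 and 5 out of 55) are inferred from the reported percentages rather than stated in the card.

```python
def recall_at_1(ranked_ids, gold_ids):
    """Fraction of queries whose top-ranked candidate is the gold item."""
    hits = sum(
        1 for ranked, gold in zip(ranked_ids, gold_ids)
        if ranked and ranked[0] == gold
    )
    return hits / len(gold_ids)


def delta_pp(baseline, candidate):
    """Difference in percentage points between two recall values in [0, 1]."""
    return round((candidate - baseline) * 100, 1)


# Toy rankings (hypothetical), not the actual eval data.
gold = ["f1", "f2", "f3", "f4"]
ranked = [["f1", "x"], ["y", "f2"], ["f3", "z"], ["f4", "w"]]
toy_recall = recall_at_1(ranked, gold)  # 3 of 4 queries rank the gold item first

# Hit counts inferred from the reported percentages over 55 queries (assumption).
no_reranker = 50 / 55
web_trained = 44 / 55
code_trained = 5 / 55

print(delta_pp(no_reranker, web_trained))
print(delta_pp(no_reranker, code_trained))
```

Under those inferred counts, the two printed deltas match the Delta column above.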
+
+ **Root cause:** The model was trained with random same-language negatives, which are too easy for a cross-encoder to separate from the positive. It learns surface-level language patterns instead of semantic code discrimination. A V2 trained with BM25 hard negatives may fix this.
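As an illustration of what BM25 hard-negative mining could look like, the sketch below scores a toy corpus with a small self-contained BM25 implementation and keeps the top-scoring non-positive snippets as negatives. This is not the training pipeline behind this model: the tokenizer, the `k1`/`b` defaults, the function names and the toy corpus are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def mine_hard_negatives(query, positive_idx, docs, k=2):
    """Return indices of the k highest-scoring documents that are not the positive."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked if i != positive_idx and scores[i] > 0][:k]


# Toy corpus (hypothetical). Index 0 is the positive for the query.
docs = [
    "def read_json_file(path): return json.load(open(path))",
    "def write_json_file(path, data): json.dump(data, open(path, 'w'))",
    "def read_csv_file(path): return list(csv.reader(open(path)))",
    "def add(a, b): return a + b",
]
hard = mine_hard_negatives("read json file from path", 0, docs, k=2)
print(hard)
```

The lexically similar `write_json_file` and `read_csv_file` snippets are selected, while the unrelated `add` function shares no query terms and is dropped. A random sampler would pick it just as often as the confusable ones.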
+
+ ## Training
+
+ - **Architecture:** Cross-encoder (BERT-base)
+ - **Data:** 50,000 CodeSearchNet pairs + 7,500 docstring pairs
+ - **Epochs:** 3
+ - **Negatives:** Random same-language (this was the mistake)
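For reference, random same-language negative sampling as named in the last bullet might look roughly like the sketch below. The function name, the one-negative-per-positive ratio and the toy pairs are assumptions; the card does not describe the actual sampling code.

```python
import random


def sample_random_negatives(pairs, seed=0):
    """Build (query, code, label) rows from (query, code, language) pairs.

    Each positive gets one negative drawn uniformly from other snippets
    in the same language (a 1:1 ratio is assumed here).
    """
    rng = random.Random(seed)
    by_lang = {}
    for _, code, lang in pairs:
        by_lang.setdefault(lang, []).append(code)

    rows = []
    for query, code, lang in pairs:
        rows.append((query, code, 1))
        candidates = [c for c in by_lang[lang] if c != code]
        if candidates:
            rows.append((query, rng.choice(candidates), 0))
    return rows


# Toy pairs (hypothetical), two per language.
pairs = [
    ("parse a json string", "def parse(s): return json.loads(s)", "python"),
    ("sum a list", "def total(xs): return sum(xs)", "python"),
    ("open a file", "func Open(p string) (*os.File, error) { return os.Open(p) }", "go"),
    ("join strings", "func Join(a []string) string { return strings.Join(a, \",\") }", "go"),
]
rows = sample_random_negatives(pairs)
```

Because the negative is drawn without regard to the query, it is usually unrelated to it, which is the "too easy" property the root-cause note describes.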
+
+ ## Usage (if you want to experiment)
+
+ ```bash
+ # In cqs: NOT the default, opt-in only
+ CQS_RERANKER_MODEL=jamie8johnson/code-reranker-v1 cqs "query" --rerank
+ ```
+
+ ## License
+
+ Apache 2.0.