Lexoid / leaderboard.csv
dilithjay
Update benchmark and add document-wise bar plot
7cb4732
Model,sequence_matcher,cosine,jaccard,precision,recall,f1_score,Time (s),Cost ($)
AUTO,0.905 (±0.111),0.967 (±0.051),0.944 (±0.069),0.976 (±0.031),0.966 (±0.061),0.970 (±0.038),10.312432592565363,0.0006797545454545454
gemini-1.5-flash,0.868 (±0.198),0.965 (±0.041),0.925 (±0.064),0.948 (±0.056),0.974 (±0.029),0.960 (±0.035),17.19163768941706,0.0004351772727272727
gemini-1.5-pro,0.782 (±0.341),0.833 (±0.252),0.769 (±0.309),0.793 (±0.287),0.932 (±0.185),0.831 (±0.245),27.126133116808806,0.01274568181818182
gemini-2.0-flash,0.900 (±0.127),0.971 (±0.040),0.946 (±0.067),0.976 (±0.031),0.967 (±0.060),0.971 (±0.037),12.433907595547764,0.0008061636363636364
gemini-2.5-flash,0.902 (±0.151),0.984 (±0.030),0.956 (±0.078),0.986 (±0.019),0.969 (±0.078),0.976 (±0.045),48.67426510290666,0.010509618181818182
gemini-2.5-pro,0.907 (±0.151),0.973 (±0.053),0.937 (±0.091),0.982 (±0.023),0.954 (±0.092),0.965 (±0.054),22.231554529883645,0.023052272727272727
claude-opus-4-20250514,0.798 (±0.230),0.878 (±0.159),0.809 (±0.238),0.823 (±0.238),0.972 (±0.038),0.873 (±0.173),21.011434186588634,0.09233181818181818
claude-sonnet-4-20250514,0.814 (±0.197),0.903 (±0.150),0.843 (±0.220),0.862 (±0.219),0.973 (±0.045),0.898 (±0.156),21.98797264966098,0.020450454545454546
claude-3-7-sonnet-20250219,0.634 (±0.395),0.752 (±0.298),0.667 (±0.338),0.739 (±0.266),0.795 (±0.323),0.748 (±0.286),70.10332116213712,0.017747727272727273
claude-3-5-sonnet-20241022,0.873 (±0.195),0.937 (±0.095),0.872 (±0.161),0.891 (±0.160),0.974 (±0.030),0.923 (±0.108),16.859260472384367,0.017785909090909092
qwen/qwen-2.5-vl-7b-instruct,0.469 (±0.364),0.617 (±0.441),0.560 (±0.421),0.584 (±0.441),0.693 (±0.446),0.610 (±0.439),13.234280304475265,0.0005954181818181818
google/gemma-3-27b-it,0.624 (±0.357),0.750 (±0.327),0.682 (±0.341),0.701 (±0.338),0.918 (±0.151),0.755 (±0.304),24.50513126633384,0.00019525454545454547
microsoft/phi-4-multimodal-instruct,0.665 (±0.258),0.800 (±0.217),0.738 (±0.221),0.779 (±0.239),0.944 (±0.050),0.829 (±0.183),10.961827516555786,0.0004915272727272727
accounts/fireworks/models/llama4-maverick-instruct-basic,0.792 (±0.206),0.914 (±0.128),0.843 (±0.201),0.854 (±0.200),0.981 (±0.025),0.901 (±0.141),10.71379793773998,0.0014935999999999999
accounts/fireworks/models/llama4-scout-instruct-basic,0.804 (±0.242),0.931 (±0.067),0.881 (±0.099),0.916 (±0.076),0.959 (±0.078),0.934 (±0.058),9.759119467301803,0.0008719636363636363
gpt-4.1,0.622 (±0.314),0.782 (±0.191),0.628 (±0.227),0.683 (±0.224),0.899 (±0.149),0.749 (±0.180),34.65604066848755,0.014605454545454545
gpt-4.1-mini,0.767 (±0.243),0.807 (±0.197),0.706 (±0.221),0.751 (±0.229),0.929 (±0.080),0.807 (±0.178),22.64150684530085,0.003515527272727273
gpt-4o,0.796 (±0.264),0.898 (±0.117),0.821 (±0.186),0.867 (±0.178),0.948 (±0.108),0.890 (±0.123),28.233586398037996,0.014729545454545455
gpt-4o-mini,0.727 (±0.245),0.832 (±0.136),0.755 (±0.209),0.784 (±0.210),0.951 (±0.048),0.844 (±0.150),17.197155345569957,0.006503372727272727
meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo,0.677 (±0.226),0.850 (±0.134),0.780 (±0.166),0.827 (±0.122),0.919 (±0.130),0.867 (±0.112),7.225299813530662,0.0001541290909090909
meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo,0.559 (±0.233),0.822 (±0.119),0.685 (±0.193),0.807 (±0.155),0.828 (±0.194),0.799 (±0.139),27.7422512661327,0.011019381818181817
meta-llama/Llama-Vision-Free,0.682 (±0.223),0.847 (±0.135),0.781 (±0.163),0.828 (±0.126),0.923 (±0.126),0.868 (±0.111),12.31139588356018,0.0
ds4sd/SmolDocling-256M-preview,0.486 (±0.378),0.583 (±0.355),0.510 (±0.348),0.580 (±0.350),0.822 (±0.301),0.607 (±0.330),108.91359732367776,0.0
mistral-ocr-latest,0.890 (±0.097),0.930 (±0.095),0.912 (±0.089),0.934 (±0.073),0.973 (±0.037),0.952 (±0.051),5.688163432207975,0.0012727272727272728