This collection contains held-out splits for testing Flow-Judge-v0.1.
			
	
Flow AI
Company • Verified

AI & ML interests: LLM system evaluation, automatic LM improvements

Organization Card

Flow AI is the system for evaluating and improving your LLM application.

Models (7)

flowaicom/Flow-Judge-v0.1-W8A16 • 1B params • 2 downloads • 1 like
flowaicom/Flow-Judge-v0.1-W4A16 • 0.7B params • 2 downloads • 1 like
flowaicom/Flow-Judge-v0.1-FP8 • 4B params • 1 download • 1 like
flowaicom/Flow-Judge-v0.1-AWQ • Text Generation • 0.7B params • 377k downloads • 6 likes
flowaicom/Flow-Judge-v0.1 • Text Generation • 4B params • 5.5k downloads • 67 likes
flowaicom/Flow-Judge-v0.1-Llamafile • 11 downloads • 1 like
flowaicom/Flow-Judge-v0.1-GGUF • Text Generation • 4B params • 29 downloads • 10 likes

Datasets (9)

flowaicom/legalbench_contracts_qa_subset • 100 rows • 8 downloads
flowaicom/Flow-Judge-v0.1-3-likert-heldout • 300 rows • 55 downloads
flowaicom/Flow-Judge-v0.1-5-likert-heldout • 274 rows • 26 downloads
flowaicom/Flow-Judge-v0.1-binary-heldout • 316 rows • 24 downloads
flowaicom/RAGTruth_test • 2.7k rows • 11 downloads • 1 like
flowaicom/covid_qa • 1k rows • 10 downloads
flowaicom/PubMedQA • 1k rows • 10 downloads
flowaicom/HaluEval • 10k rows • 60 downloads
flowaicom/Feedback-Bench • 1k rows • 14 downloads