muhammad-mujtaba-ai committed on
Commit
4c44977
·
verified ·
1 Parent(s): f184350

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +0 -9
README.md CHANGED
@@ -62,8 +62,6 @@ On the following parameters
  - **Gas Efficiency(%)** - Degree of gas optimization based on Slither’s suggestions.
  - **Security(%)** - Percentage of code free from common vulnerabilities detected by Slither.
  - **Average Lines of Code** - Average number of non-empty, commented-included lines in generated contracts, indicating verbosity or conciseness
- - **Correctness (OpenAI Evaluation)** – GPT-4o Mini-assessed alignment of generated code with prompt using a structured correctness rubric.
- - **Correctness (Human Evaluation)** – Expert-reviewed rating of how well the generated contract fulfills the original prompt and intent.
 
  ## Benchmark
  Below is a figure summarizing the performance of each model across the four evaluation metrics.
@@ -295,13 +293,6 @@ We analyzed each contract for known security vulnerabilities using Slither’s b
  - **Average Lines of Code (LOC)**
  Captures the average number of lines per generated contract, excluding blank lines but including comments. This metric reflects code verbosity or conciseness, and helps gauge implementation completeness versus potential redundancy.
 
- - **Correctness (OpenAI Evaluation)**
- Evaluates how accurately the generated contract matches the intended prompt using GPT-4o Mini. Prompts and outputs are scored against a structured rubric, providing a scalable LLM-based perspective on prompt alignment.
-
- - **Correctness (Human Evaluation)**
- Involves manual review by a blockchain expert to assess how well the output satisfies the original prompt and category. This provides human-validated insight into the practical applicability and quality of the generated code.
-
-
  These metrics collectively provide a multi-dimensional view of the model’s effectiveness, spanning correctness, efficiency, security, and usability. They are designed to reflect both automated benchmarks and real-world developer expectations.
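
Reviewer note: the Average Lines of Code definition kept by this commit (non-blank lines, comments included) can be illustrated with a short sketch. This is not the repository's evaluation script; the function names `count_loc` and `average_loc` are hypothetical, and the sketch assumes each generated contract is available as a Solidity source string.

```python
def count_loc(source: str) -> int:
    """Count non-blank lines in one contract; comment lines are included."""
    return sum(1 for line in source.splitlines() if line.strip())


def average_loc(contracts: list[str]) -> float:
    """Average non-blank line count across contracts (0.0 for an empty list)."""
    if not contracts:
        return 0.0
    return sum(count_loc(c) for c in contracts) / len(contracts)


if __name__ == "__main__":
    # Two toy contracts, assumed here purely for illustration.
    contract_a = (
        "// SPDX-License-Identifier: MIT\n"
        "pragma solidity ^0.8.0;\n"
        "\n"
        "contract A {\n"
        "    uint256 public x;\n"
        "}\n"
    )
    contract_b = "contract B {}\n\n\n"
    print(average_loc([contract_a, contract_b]))  # prints 3.0
```

Lines containing only whitespace are treated as blank here; whether the original evaluation does the same is an assumption.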