Upload folder using huggingface_hub
README.md CHANGED
@@ -62,8 +62,6 @@ On the following parameters
 - **Gas Efficiency (%)** – Degree of gas optimization based on Slither’s suggestions.
 - **Security (%)** – Percentage of code free from common vulnerabilities detected by Slither.
 - **Average Lines of Code** – Average number of non-empty lines (comments included) in generated contracts, indicating verbosity or conciseness.
-- **Correctness (OpenAI Evaluation)** – GPT-4o Mini-assessed alignment of generated code with the prompt using a structured correctness rubric.
-- **Correctness (Human Evaluation)** – Expert-reviewed rating of how well the generated contract fulfills the original prompt and intent.

 ## Benchmark
 Below is a figure summarizing the performance of each model across the four evaluation metrics.
@@ -295,13 +293,6 @@ We analyzed each contract for known security vulnerabilities using Slither’s b
 - **Average Lines of Code (LOC)**
 Captures the average number of lines per generated contract, excluding blank lines but including comments. This metric reflects code verbosity or conciseness and helps gauge implementation completeness versus potential redundancy.
-
-- **Correctness (OpenAI Evaluation)**
-Evaluates how accurately the generated contract matches the intended prompt using GPT-4o Mini. Prompts and outputs are scored against a structured rubric, providing a scalable LLM-based perspective on prompt alignment.
-
-- **Correctness (Human Evaluation)**
-Involves manual review by a blockchain expert to assess how well the output satisfies the original prompt and category. This provides human-validated insight into the practical applicability and quality of the generated code.

 These metrics collectively provide a multi-dimensional view of the model’s effectiveness, spanning correctness, efficiency, security, and usability. They are designed to reflect both automated benchmarks and real-world developer expectations.
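The LOC metric described above (non-empty lines counted, comments included, averaged over generated contracts) can be sketched as follows. This is a minimal illustration of the counting rule as the README states it; the function names and the toy contract strings are assumptions, not part of the benchmark's actual tooling:

```python
def loc(source: str) -> int:
    """Count non-empty lines; comment lines count, blank lines do not."""
    return sum(1 for line in source.splitlines() if line.strip())


def average_loc(contracts: list[str]) -> float:
    """Average Lines of Code across a batch of generated contracts."""
    if not contracts:
        return 0.0
    return sum(loc(c) for c in contracts) / len(contracts)


# Two toy Solidity sources (hypothetical inputs):
a = "pragma solidity ^0.8.0;\n\n// token\ncontract A {}\n"
b = "contract B {\n    uint x;\n}\n"
print(average_loc([a, b]))  # (3 + 3) / 2 = 3.0
```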
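As a sketch of how a Security (%) figure like the one above could be aggregated: assuming one Slither report per contract and counting a contract as secure when Slither reports zero findings. The README does not specify the exact aggregation, so the finding counts and the per-contract rule here are illustrative assumptions:

```python
def security_percent(findings_per_contract: list[int]) -> float:
    """Percentage of contracts with zero Slither-detected vulnerabilities.

    findings_per_contract holds one finding count per generated contract
    (hypothetical data, e.g. taken from Slither's per-contract reports).
    """
    if not findings_per_contract:
        return 0.0
    clean = sum(1 for n in findings_per_contract if n == 0)
    return 100.0 * clean / len(findings_per_contract)


# Hypothetical Slither finding counts for four generated contracts:
print(security_percent([0, 2, 0, 1]))  # 50.0
```

The Gas Efficiency (%) column could be aggregated the same way over Slither's optimization suggestions rather than its vulnerability detectors.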