Upload folder using huggingface_hub
README.md
CHANGED
|
@@ -61,6 +61,8 @@ On the following parameters
|
|
| 61 |
- OpenZeppelin Compliance(%)--Adherence to OpenZeppelin library usage and standards.
|
| 62 |
- Gas Efficiency(%)--Degree of gas optimization based on Slither’s suggestions.
|
| 63 |
- Security(%)--Percentage of code free from common vulnerabilities detected by Slither.
|
| 64 |
|
| 65 |
## Benchmark
|
| 66 |
Below is a figure summarizing the performance of each model across the four evaluation metrics.
|
|
@@ -275,7 +277,7 @@ contract DecentralizedLibrary is Ownable(msg.sender) {
|
|
| 275 |
# Evaluation Metrics
|
| 276 |
To evaluate the performance of our fine-tuned LLM specialized in Solidity smart contract generation, we used **[Slither](https://github.com/crytic/slither)**, a static analysis framework widely used for analyzing Solidity code.
|
| 277 |
|
| 278 |
-
We focused on
|
| 279 |
|
| 280 |
- **Compilation Success Rate**
|
| 281 |
We measured the percentage of generated smart contracts that compile successfully without modification. This helps assess the syntactic and structural correctness of the model outputs.
|
|
@@ -289,8 +291,16 @@ Using Slither’s gas optimization analysis, we identified areas in the generate
|
|
| 289 |
- **Security Vulnerabilities**
|
| 290 |
We analyzed each contract for known security vulnerabilities using Slither’s built-in detectors. We recorded the number and severity of the vulnerabilities detected, providing a measure of the security quality of the model’s outputs.
|
| 291 |
|
| 292 |
These evaluation metrics help quantify the practical usability and reliability of the generated smart contracts in real-world scenarios.
|
| 293 |
|
| 294 |
|
| 295 |
# Summary
|
| 296 |
The model shows improved understanding and generation of Solidity compared to baseline LLMs not trained on Solidity data.
|
|
|
|
| 61 |
- OpenZeppelin Compliance(%)--Adherence to OpenZeppelin library usage and standards.
|
| 62 |
- Gas Efficiency(%)--Degree of gas optimization based on Slither’s suggestions.
|
| 63 |
- Security(%)--Percentage of code free from common vulnerabilities detected by Slither.
|
| 64 |
+
- Average Lines of Code--How lengthy, complete,
|
| 65 |
+
- Correctness of Code--
|
| 66 |
|
| 67 |
## Benchmark
|
| 68 |
Below is a figure summarizing the performance of each model across the four evaluation metrics.
|
|
|
|
| 277 |
# Evaluation Metrics
|
| 278 |
To evaluate the performance of our fine-tuned LLM specialized in Solidity smart contract generation, we used **[Slither](https://github.com/crytic/slither)**, a static analysis framework widely used for analyzing Solidity code.
|
| 279 |
|
| 280 |
+
We focused on six key evaluation criteria:
|
| 281 |
|
| 282 |
- **Compilation Success Rate**
|
| 283 |
We measured the percentage of generated smart contracts that compile successfully without modification. This helps assess the syntactic and structural correctness of the model outputs.
|
|
|
|
| 291 |
- **Security Vulnerabilities**
|
| 292 |
We analyzed each contract for known security vulnerabilities using Slither’s built-in detectors. We recorded the number and severity of the vulnerabilities detected, providing a measure of the security quality of the model’s outputs.
|
| 293 |
|
| 294 |
+
- **Average Lines of Code**
|
| 295 |
+
This metric provides insight into the verbosity or conciseness of the model’s output. A higher LOC count may indicate either redundancy or a more complete implementation, while a lower count may indicate either efficiency or missing implementation details, depending on context.
|
| 296 |
+
|
| 297 |
+
- **Correctness of Code**
|
| 298 |
+
To assess how well the generated code aligns with the given prompt and category, we evaluated each generated contract both manually and with an OpenAI LLM, comparing the prompt against the generated code to judge alignment.
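As a rough illustration of the two simplest criteria in the list above, the compilation success rate and the average lines of code could be aggregated as in the sketch below. This is not the evaluation script used for the benchmark: the names `count_code_lines` and `summarize` are hypothetical, and the `compiles` callback is an assumption standing in for whatever compiler invocation is actually used.

```python
from typing import Callable, Iterable


def count_code_lines(source: str) -> int:
    """Count lines that are neither blank nor pure `//` comments.

    Assumption: block comments (/* ... */) are counted as code here;
    a real LOC counter may treat them differently.
    """
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("//"):
            count += 1
    return count


def summarize(sources: Iterable[str], compiles: Callable[[str], bool]) -> dict:
    """Return the compile rate (%) and average LOC over generated contracts.

    `compiles` is a hypothetical callback that returns True when a
    contract compiles without modification.
    """
    contracts = list(sources)
    if not contracts:
        return {"compile_rate_pct": 0.0, "avg_loc": 0.0}
    compiled = sum(1 for src in contracts if compiles(src))
    total_loc = sum(count_code_lines(src) for src in contracts)
    return {
        "compile_rate_pct": 100.0 * compiled / len(contracts),
        "avg_loc": total_loc / len(contracts),
    }
```

In practice the `compiles` callback would wrap a call to the Solidity compiler. It is kept abstract here so the aggregation logic can be read and tested on its own.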
|
| 299 |
+
|
| 300 |
These evaluation metrics help quantify the practical usability and reliability of the generated smart contracts in real-world scenarios.
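For the Slither-based criteria, one possible way to tally findings by severity is to parse a JSON report from Slither. The sketch below assumes a report with a top-level `results` object holding a `detectors` list whose entries carry an `impact` field. That layout and the helper name `severity_counts` are assumptions for illustration, not a description of the pipeline used here.

```python
import json
from collections import Counter


def severity_counts(slither_json: str) -> Counter:
    """Count detector findings per impact level in a Slither JSON report.

    Assumption: the report looks like
    {"results": {"detectors": [{"check": ..., "impact": ...}, ...]}}.
    Findings without an `impact` field are grouped under "Unknown".
    """
    report = json.loads(slither_json)
    detectors = (report.get("results") or {}).get("detectors") or []
    return Counter(d.get("impact", "Unknown") for d in detectors)
```

A per-contract percentage such as the Security(%) score could then be derived from these counts, though the exact formula is not specified in this README.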
|
| 301 |
|
| 302 |
|
| 303 |
+
|
| 304 |
+
|
| 305 |
# Summary
|
| 306 |
The model shows improved understanding and generation of Solidity compared to baseline LLMs not trained on Solidity data.
|