Solidity LLM : AI for BlockChain

An LLM that understands, generates, and analyzes smart contracts written in Solidity

LLMs have been fine-tuned for many use cases, be it finance, language, or code. But for the first time, an LLM has been fine-tuned specifically on Solidity, the main language for writing smart contracts on Ethereum and other EVM-compatible blockchains.

What is Solidity-LLM?

Solidity-Code-LLM is a fine-tuned AI model made to help developers write, understand, and analyze smart contracts written in Solidity. It’s developed by ChainGPT, a company focused on AI tools for blockchain and Web3. This model doesn’t try to do everything. It’s built only for one task: Solidity smart contracts.

It’s based on Salesforce’s codegen-2B-multi model, with 2 billion parameters. Training happened in stages. First, it learned from a large collection of raw Solidity code. Then it was fine-tuned on cleaner, well-structured examples of real contracts, followed by instruction tuning on prompt-response pairs. This specialized training is meant to make it more accurate and reliable than general coding models when working specifically with Ethereum-based applications.

The model is open source and available on Hugging Face:

Chain-GPT/Solidity-LLM · Hugging Face
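Because the weights are on Hugging Face, the model should load like any causal language model in the transformers library. The sketch below assumes that standard `AutoTokenizer`/`AutoModelForCausalLM` loading works for this repository; the prompt wording and generation settings are illustrative assumptions, so check the model card for any recommended prompt format. The helper `clamp_new_tokens` (a hypothetical name) keeps prompt plus output inside the 2048-token context listed for the model.

```python
# Minimal sketch: load Chain-GPT/Solidity-LLM with Hugging Face transformers
# and generate Solidity from a plain-text prompt. The prompt wording and the
# generation settings are illustrative assumptions, not the model card's.

MODEL_ID = "Chain-GPT/Solidity-LLM"
MAX_CONTEXT = 2048  # max context length listed for the model


def clamp_new_tokens(prompt_len: int, requested: int, max_context: int = MAX_CONTEXT) -> int:
    """Limit generation so that prompt + new tokens fit in the context window."""
    available = max_context - prompt_len
    if available <= 0:
        raise ValueError("the prompt alone fills the context window")
    return min(requested, available)


def generate_contract(prompt: str, max_new_tokens: int = 512) -> str:
    # Heavy imports happen here so clamp_new_tokens works without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16).to(device)

    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    prompt_len = inputs["input_ids"].shape[1]
    output = model.generate(
        **inputs,
        max_new_tokens=clamp_new_tokens(prompt_len, max_new_tokens),
        do_sample=False,                      # greedy decoding for repeatable output
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 style tokenizers define no pad token
    )
    # Decode only the tokens generated after the prompt.
    return tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True)
```

Usage: `print(generate_contract("Write a Solidity ERC-20 token contract named MyToken with a fixed supply."))`. The first call downloads the weights, and whatever comes back still needs the manual review described later in this article.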

Who Built It and How

  • Developer: ChainGPT
  • Base Model: Salesforce/codegen-2B-multi
  • License: MIT (Open-source)
  • Parameters: 2 billion
  • Tokenizer: GPT2Tokenizer
  • Max Context Length: 2048 tokens
  • Precision: bfloat16
  • Demo: Available on Hugging Face

ChainGPT-Solidity-LLM – a Hugging Face Space by Chain-GPT
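The parameter count and precision in the list above give a quick estimate of how much memory the weights need. This sketch takes the listed 2 billion parameters at face value and counts 2 bytes per parameter for bfloat16; it ignores activations and the KV cache, which add more on top.

```python
# Rough size of the model weights from the spec list: 2 billion parameters
# stored in bfloat16 (2 bytes each). Weights only; activations and the
# KV cache used during generation need additional memory.

BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weight_memory_gib(num_params: float, dtype: str = "bfloat16") -> float:
    """Approximate weight memory in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_memory_gib(2e9, dtype):.1f} GiB")
```

This works out to roughly 7.5 GiB in float32 versus about 3.7 GiB in bfloat16, which is why the half-precision format matters for running the model on a single consumer GPU.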

It was trained on a compute cluster with 4 GPUs (80 GB each) for over 1,095 hours (about 1.5 months). That’s a long, focused training cycle on Solidity contracts alone.
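The training figures can be checked with a little arithmetic. This sketch assumes the 1,095 hours are wall-clock time on the whole 4-GPU cluster, which is the reading that matches "about 1.5 months"; the article does not say so explicitly.

```python
# Arithmetic behind the training figures, assuming the 1,095 hours are
# wall-clock time on the whole 4-GPU cluster (which matches "about 1.5 months").

NUM_GPUS = 4
GPU_MEMORY_GB = 80
WALL_CLOCK_HOURS = 1095
AVG_DAYS_PER_MONTH = 365.25 / 12  # about 30.44

days = WALL_CLOCK_HOURS / 24                  # 45.625 days
months = days / AVG_DAYS_PER_MONTH            # just under 1.5 months
gpu_hours = NUM_GPUS * WALL_CLOCK_HOURS       # total GPU time consumed
cluster_memory_gb = NUM_GPUS * GPU_MEMORY_GB  # aggregate GPU memory

print(f"{days:.1f} days, {months:.2f} months, {gpu_hours:,} GPU-hours, {cluster_memory_gb} GB")
```

Under that assumption the run consumed about 4,380 GPU-hours across 320 GB of combined GPU memory.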

What Makes It Special

Most large models today are trained to handle many tasks across many languages. Solidity-Code-LLM is different: it only cares about one language, Solidity. That focus is why it does a better job when it comes to smart contracts.

It was evaluated across five important metrics:

Compilation Rate:

  • 83% of contracts it generates compile successfully without needing changes.
  • This means it understands Solidity syntax and structure well.
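A compilation rate like this is simply the fraction of generated contracts that compile unchanged. The article does not describe the exact harness, so the sketch below is an assumption: it shells out to a `solc` binary on PATH (whose version must satisfy each contract's pragma) and accepts any other `compiles` callable, which makes it easy to swap in a different compiler setup or a fake for testing.

```python
import os
import subprocess
import tempfile
from typing import Callable, Iterable


def solc_compiles(source: str, solc: str = "solc") -> bool:
    """Return True if solc compiles the source without errors.

    Assumes a solc binary on PATH whose version satisfies the contract's
    pragma; imports such as @openzeppelin/... also need remappings."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "Contract.sol")
        with open(path, "w", encoding="utf-8") as f:
            f.write(source)
        result = subprocess.run([solc, "--bin", path], capture_output=True, text=True)
        return result.returncode == 0


def compilation_rate(sources: Iterable[str], compiles: Callable[[str], bool] = solc_compiles) -> float:
    """Fraction of generated contracts that compile unchanged."""
    results = [compiles(src) for src in sources]
    if not results:
        raise ValueError("no contracts to evaluate")
    return sum(results) / len(results)
```

With 100 generated contracts of which 83 compile, `compilation_rate` returns 0.83, the figure reported above.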

OpenZeppelin Compliance:

  • 65% of contracts follow standard patterns using OpenZeppelin libraries.
  • That’s important because OpenZeppelin is widely used and trusted in the blockchain world.
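The article does not explain how OpenZeppelin compliance was measured. One simple proxy, shown here purely as an assumption, is to check whether a contract imports anything from the `@openzeppelin/contracts` package; a real compliance check would likely also look at which base contracts are inherited and how they are used.

```python
import re

# Matches both `import "@openzeppelin/contracts/...";` and
# `import {ERC20} from "@openzeppelin/contracts/...";`
OZ_IMPORT = re.compile(r"""^\s*import\s+[^;]*["']@openzeppelin/contracts""", re.MULTILINE)


def uses_openzeppelin(source: str) -> bool:
    """True if the contract imports from the OpenZeppelin contracts package."""
    return OZ_IMPORT.search(source) is not None


def openzeppelin_share(sources: list[str]) -> float:
    """Fraction of contracts that import OpenZeppelin (a rough compliance proxy)."""
    if not sources:
        raise ValueError("no contracts to evaluate")
    return sum(uses_openzeppelin(s) for s in sources) / len(sources)
```

An import-based check is cheap and easy to reason about, but it can only say that a library is used, not that it is used correctly.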

Gas Efficiency:

  • A 72% gas-efficiency score, measured with Slither (a static analysis tool).
  • In practice, its contracts tend to avoid patterns that make them expensive to run.

Security Score:

  • 58% of the generated code is free from common vulnerabilities, again checked with Slither.
  • Decent performance, though there’s still room to improve.

Code Length (LOC):

  • Scores around 70%, meaning the contracts are usually neither too long nor too short.
  • The code is usually clean and not full of clutter.

What You Can Use It For

This model is a useful tool for Solidity developers and learners.

You can use it to:

  • Write smart contracts faster.
  • Learn Solidity by asking for examples.
  • Generate contract templates (ERC-20, ERC-721, DAO, governance, etc.).
  • Create simple documentation for your contracts.
  • Integrate it into smart contract IDEs or blockchain developer tools.

But avoid using it to:

  • Write code in other languages like Python or JavaScript; it’s not trained for that.
  • Skip manual review and testing. Never use the generated contracts directly in production without auditing.
  • Replace a human auditor. It can miss subtle security issues.

How It Was Trained (More Details)

Pre-training

  • Used 1 billion tokens of raw, unfiltered Solidity code.
  • These were public contracts, scraped from the internet.

Fine-tuning

  • Focused only on:
      • Solidity version 0.5 or later.
      • Contracts between 200 and 4,000 tokens long.
      • Clean contracts without duplicate code, useless comments, or broken imports.
  • Only kept contracts that could actually compile and run.
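The fine-tuning filters above translate naturally into a small data-cleaning pass. This is a sketch under stated assumptions, not ChainGPT's pipeline: `count_tokens` and `compiles` are hypothetical callables you supply (for example, the length of the GPT-2 tokenizer's output and a solc wrapper), the version check reads only the first version number in the `pragma solidity` line, and deduplication catches exact duplicates after whitespace normalization. Stripping useless comments and fixing broken imports are not shown.

```python
import hashlib
import re
from typing import Callable, Iterable, List, Optional, Tuple

PRAGMA = re.compile(r"pragma\s+solidity\s+([^;]+);")
VERSION = re.compile(r"(\d+)\.(\d+)")


def pragma_version(source: str) -> Optional[Tuple[int, int]]:
    """(major, minor) of the first version in the pragma, e.g. ^0.8.0 -> (0, 8)."""
    pragma = PRAGMA.search(source)
    if pragma is None:
        return None
    version = VERSION.search(pragma.group(1))
    if version is None:
        return None
    return int(version.group(1)), int(version.group(2))


def filter_contracts(
    sources: Iterable[str],
    count_tokens: Callable[[str], int],
    compiles: Callable[[str], bool],
    min_tokens: int = 200,
    max_tokens: int = 4000,
) -> List[str]:
    seen = set()
    kept = []
    for src in sources:
        version = pragma_version(src)
        if version is None or version < (0, 5):
            continue  # no pragma, or older than Solidity 0.5
        if not min_tokens <= count_tokens(src) <= max_tokens:
            continue  # outside the 200-4,000 token window
        # Exact-duplicate check on whitespace-normalized source.
        digest = hashlib.sha256(" ".join(src.split()).encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        if compiles(src):  # most expensive check last
            kept.append(src)
    return kept
```

Running the cheap checks (pragma, length, duplicates) before compilation keeps the expensive compiler calls to a minimum on a large scraped corpus.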

Instruction Tuning

  • Used 650,000 prompt-response pairs.
  • This helps the model answer questions and follow specific instructions better.
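The article does not say how the 650,000 pairs were formatted. A common approach for instruction tuning a causal model, sketched here as an assumption, is to concatenate prompt and response and mask the prompt tokens in the labels so that only the response contributes to the loss. Token IDs are passed in directly; in practice they would come from the model's tokenizer.

```python
from typing import List, Tuple

IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss


def build_example(
    prompt_ids: List[int],
    response_ids: List[int],
    eos_id: int,
    max_len: int = 2048,
) -> Tuple[List[int], List[int]]:
    """Concatenate prompt and response; only response tokens contribute to the loss."""
    input_ids = prompt_ids + response_ids + [eos_id]
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids + [eos_id]
    # Truncate both sequences identically so positions stay aligned.
    return input_ids[:max_len], labels[:max_len]
```

Masking the prompt teaches the model to produce answers rather than to reproduce questions, and appending the end-of-sequence token teaches it when to stop.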

Evaluation Tool:

  • Slither was used to test the contracts. It checks for:
      • Compilation errors
      • Security risks
      • Gas usage issues
      • OpenZeppelin usage
      • Code structure
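Slither can write its findings as JSON (for example, `slither Contract.sol --json report.json`), and each finding carries an impact level such as High, Medium, Low, Informational, or Optimization. The article does not explain how these findings were turned into percentages, so the sketch below only groups findings by impact, treating "no High or Medium findings" as clean and counting Optimization findings as gas-related hints; both thresholds are assumptions.

```python
import json
from collections import Counter
from typing import Any, Dict

SECURITY_IMPACTS = {"High", "Medium"}  # assumed threshold for "vulnerable"


def load_report(path: str) -> Dict[str, Any]:
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize_slither(report: Dict[str, Any]) -> Dict[str, Any]:
    """Group Slither detector findings by impact level."""
    detectors = (report.get("results") or {}).get("detectors") or []
    by_impact = Counter(d.get("impact", "Unknown") for d in detectors)
    return {
        "findings_by_impact": dict(by_impact),
        # No High/Medium findings counts as "free from common vulnerabilities" here.
        "security_clean": not any(d.get("impact") in SECURITY_IMPACTS for d in detectors),
        # Slither reports gas-saving suggestions under the "Optimization" impact.
        "optimization_findings": by_impact.get("Optimization", 0),
    }
```

Averaging `security_clean` over many generated contracts would give a share comparable in spirit to the security score above, though not necessarily computed the same way.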

Risks and Limitations

  • The model can reflect outdated or bad practices from the web.
  • It might generate code that looks correct but isn’t safe or logical.
  • It should not be used to deploy contracts without expert review.
  • It doesn’t replace human understanding or testing.

ChainGPT clearly states that this tool is an assistant, not a replacement for developer experience or security audits.

Summary

Solidity-Code-LLM is not the biggest model, but it’s one of the most focused ones. It’s built only for Solidity, and that gives it an edge: its contracts usually compile, use trusted libraries, are gas-efficient, and stay readable.

Compared to much bigger models like GPT-4.5 or Qwen 7B, it holds its own, even outperforming them in certain areas such as gas usage. It’s a good example of how smaller, task-specific models can be more useful than large general-purpose ones for narrow tasks.

If you’re building smart contracts and want a tool that helps without getting in the way, this is a solid option. Just remember: always review the output before trusting it with real money or live deployments.


Solidity LLM : AI for BlockChain was originally published in Data Science in Your Pocket on Medium, where people are continuing the conversation by highlighting and responding to this story.
