Generate music with lyrics using AI for free
The landscape of AI-generated music just hit a new high note. Enter ACE-Step, an open-source foundation model for music generation that aims to set a new benchmark for speed, fidelity, and control. Where other models trade coherence, quality, and usability against one another, ACE-Step targets all three, which is why it is being called the “ChatGPT moment” for the music AI world.
Why ACE-STEP is a Breakthrough
Most existing music models fall into two camps, each with serious limitations:
- LLM-based models (like YuE or SongGen): Strong at lyric alignment, but slow at inference and prone to structural artifacts.
- Diffusion models (like DiffRhythm): Faster synthesis, but they often lack long-range structural coherence.
ACE-Step changes the game by merging the best of both. Its architecture combines:
- Diffusion-based generation for lightning speed
- Deep Compression AutoEncoder (DCAE) for studio-grade audio quality
- Linear Transformer for maintaining musical structure
- REPA (MERT + m-hubert) for aligning semantic representations during training, which speeds up convergence
The Result?
- Generate up to 4 minutes of music in just 20 seconds on an A100 GPU (15× faster than LLM-based baselines)
- Lyric alignment that matches melody, harmony, and rhythm
- Fine-grained control over vocals, instruments, and remixing
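To put the headline speed figure in perspective, the arithmetic can be worked out directly. The sketch below uses only the numbers quoted above (4 minutes of audio, 20 seconds of compute, 15× faster than LLM-based models); the function name is purely illustrative and not part of any library.

```python
def real_time_factor(audio_seconds: float, compute_seconds: float) -> float:
    """Seconds of audio produced per second of compute."""
    if compute_seconds <= 0:
        raise ValueError("compute_seconds must be positive")
    return audio_seconds / compute_seconds


# Figures quoted above: 4 minutes of music generated in 20 seconds.
rtf = real_time_factor(audio_seconds=4 * 60, compute_seconds=20)
print(f"{rtf:.0f}x faster than real time")  # prints "12x faster than real time"

# If that is 15x faster than an LLM-based model, the same track would take
# roughly 15 * 20 = 300 seconds (5 minutes) on the slower system.
llm_seconds = 15 * 20
print(f"LLM-based estimate: {llm_seconds / 60:.0f} minutes")  # prints "LLM-based estimate: 5 minutes"
```

In other words, the quoted figures imply generation at about 12 times playback speed, while the slower approach would take longer than the song itself.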
Key Capabilities
Diverse Music Generation
- Covers genres from pop and EDM to jazz and classical
- Supports lyrics in over 15 languages
- Produces expressive vocals and realistic instrumentals
High-Level Controllability
- Variations Generator — Adjust noise to create new styles
- Repainting — Change vocals, lyrics, or genre without full regeneration
- Lyric Editing — Modify specific lines while preserving the song’s structure
Built for Creators
Use Cases
- Lyric2Vocal — Instantly turn lyrics into vocal demos
- Text2Samples — Generate loops and effects directly from text
- Voice Cloning & Remixing — Customize vocals with surgical precision
A Clean Interface for Seamless Music Creation

ACE-Step ships with a user-friendly UI split into focused tabs, one per task:
Text2Music Tab
- Input fields for tags, lyrics, and duration
- Basic and advanced settings for customization
- One-click generation for full audio synthesis
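The three inputs above can be pictured as a small request object. This is a minimal sketch, not the ACE-Step API: the class `Text2MusicRequest`, its field names, the 240-second cap (taken from the 4-minute figure in this article), and the `[verse]`/`[chorus]` section markers in the lyrics are all assumptions made for illustration.

```python
from dataclasses import dataclass

MAX_DURATION_SECONDS = 240  # assumption: the 4-minute figure quoted in this article


@dataclass
class Text2MusicRequest:
    """Hypothetical container for the three Text2Music inputs."""
    tags: list[str]
    lyrics: str
    duration_seconds: float

    def validate(self) -> None:
        if not self.tags:
            raise ValueError("at least one tag is required")
        if not 0 < self.duration_seconds <= MAX_DURATION_SECONDS:
            raise ValueError("duration must be between 0 and 240 seconds")

    def tag_prompt(self) -> str:
        # Tags are joined into one comma-separated style description.
        return ", ".join(t.strip().lower() for t in self.tags)


request = Text2MusicRequest(
    tags=["Pop", "female vocals", "upbeat"],
    lyrics="[verse]\nCity lights are calling me\n[chorus]\nWe run, we run",
    duration_seconds=60,
)
request.validate()
print(request.tag_prompt())  # prints "pop, female vocals, upbeat"
```

Keeping tags, lyrics, and duration as separate fields mirrors the tab layout: style is described in tags, while the lyrics carry the words and the song structure.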
Retake Tab
- Regenerate variations of a track
- Adjust randomness to explore new ideas
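One common way to implement an "adjust randomness" control in a diffusion model is to blend the noise behind the original take with freshly sampled noise. The sketch below illustrates that general idea only; it is not confirmed to be how ACE-Step implements retakes, and the function name `blend_noise` and its `variance` parameter are hypothetical.

```python
import math
import random


def blend_noise(base, fresh, variance):
    """Mix two noise vectors; variance=0 keeps base, variance=1 uses fresh."""
    if not 0.0 <= variance <= 1.0:
        raise ValueError("variance must be in [0, 1]")
    angle = variance * math.pi / 2
    # cos^2 + sin^2 = 1, so independent unit-variance inputs stay unit-variance.
    return [math.cos(angle) * b + math.sin(angle) * f for b, f in zip(base, fresh)]


rng = random.Random(0)
base = [rng.gauss(0, 1) for _ in range(8)]    # noise behind the original take
fresh = [rng.gauss(0, 1) for _ in range(8)]   # newly sampled noise

subtle = blend_noise(base, fresh, 0.1)   # stays close to the original take
bold = blend_noise(base, fresh, 0.9)     # drifts toward a new take
```

A low value gives a variation that stays recognizably close to the original, while a high value behaves almost like a fresh generation.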
Repainting Tab
- Modify select portions of a song
- Set custom in/out points
- Choose from previous outputs or upload your own
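Conceptually, the in/out points select which slice of the track is regenerated while everything else is kept. A minimal sketch of that selection as a per-frame mask follows; the function `repaint_mask` is hypothetical, and the rate of 10 frames per second is an illustrative placeholder rather than the model's actual latent resolution.

```python
def repaint_mask(total_seconds, start_seconds, end_seconds, frames_per_second=10):
    """Return a per-frame mask: True = regenerate, False = keep."""
    if not 0 <= start_seconds < end_seconds <= total_seconds:
        raise ValueError("need 0 <= start < end <= total")
    n_frames = round(total_seconds * frames_per_second)
    first = round(start_seconds * frames_per_second)
    last = round(end_seconds * frames_per_second)
    return [first <= i < last for i in range(n_frames)]


# Repaint seconds 10 to 15 of a 30-second clip.
mask = repaint_mask(total_seconds=30, start_seconds=10, end_seconds=15)
print(len(mask), sum(mask))  # prints "300 50"
```

Only the frames marked True would be resynthesized, which is why repainting a short section is cheaper than regenerating the whole song.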
Edit Tab
- Change lyrics or genre without losing musicality
- Choose between only_lyrics and remix modes
- Tune edit strength to preserve or innovate
Extend Tab
- Add music before or after a track
- Control how much to extend and from where
How to use ACE-Step for free
Installation instructions and the model weights are available in the ACE-Step GitHub repository:
GitHub – ace-step/ACE-Step: ACE-Step: A Step Towards Music Generation Foundation Model
The Future of Music AI Starts Here
ACE-Step isn’t just another model; it’s a creative engine. Its blend of speed, accuracy, and control opens up serious potential for:
- Musicians to prototype songs in minutes
- Producers to remix and experiment effortlessly
- Content creators to score videos with tailor-made tracks
Whether you’re building the next viral jingle or an AI-powered DAW, ACE-Step puts professional-grade music generation at your fingertips.
This is not just evolution. It’s a revolution.
Ace-Step: ChatGPT moment for AI Music Generation was originally published in Data Science in Your Pocket on Medium.