Stems are not in v1 of SONICHAOS AI Studio. v1 renders a stereo bounce. v1.5 adds post-render Demucs separation into drums, bass, vocals, and other, plus optional re-rendering of one stem from a re-prompted persona.
May 9, 2026 · 4 min read · SONICHAOS editorial
Stems are a v1.5 feature. The honest version of a roadmap names the version, not the quarter.
A question that lands in the inbox most weeks: do SONICHAOS AI Studio
renders come with stems? The honest answer is no, not yet. v1 prints a
stereo bounce — 24-bit WAV, 44.1 kHz, written to object storage with a
signed URL on the job row. There is no per-instrument multitrack on the
other end of Generate. This post is a scope statement, not a tease.
What follows is what v1 does, what v1.5 will do, and how the file
contract holds across both.
The v1 render path ends at a single file. The latent-diffusion stack
denoises a 96-channel mel-spectrogram, the vocoder lifts mel back to
waveform, and the result is muxed to a stereo 24-bit WAV. There is no
intermediate tensor for drums or bass or vocals because the model
does not generate the song that way. It generates the mix as one signal
conditioned on the persona embedding and the prompt.
We ship the bounce, the prompt, the persona id, the seed, the model version, and the integrated LUFS reading. Together these form the audit row, which is what makes the take licensable under the Studio tier. The Studio licence covers the bounce. No per-stem licence exists in v1, because there are no stems to license.
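For illustration, the audit row can be pictured as a small record. This is a minimal sketch, not the actual schema: the class name `AuditRow`, the field names, and every sample value are assumptions. Only the list of fields comes from the paragraph above.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class AuditRow:
    """Hypothetical shape of the per-job audit row; field names are assumed."""
    bounce_url: str         # signed URL to the stereo bounce in object storage
    prompt: str
    persona_id: str
    seed: int
    model_version: str
    integrated_lufs: float  # integrated loudness reading of the bounce


def to_json(row: AuditRow) -> str:
    """Serialize with sorted keys so two rows can be diffed line by line."""
    return json.dumps(asdict(row), sort_keys=True)


example = AuditRow(
    bounce_url="https://example.invalid/bounce.wav",  # placeholder, not a real URL
    prompt="placeholder prompt",
    persona_id="persona-placeholder",
    seed=1234,
    model_version="v1",
    integrated_lufs=-14.0,  # illustrative value only
)
```

Freezing the record mirrors the point of an audit row: once a take is rendered, the facts about how it was made should not change.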
If you need stems today, the workflow is to render the bounce, drop it into your DAW, and run a separator yourself. That is fine. v1.5 just moves that step inside the product so the file contract matches the licence.
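If you prefer to script that step rather than click through a DAW plugin, the open-source Demucs command line can do it. The sketch below assumes the `demucs` package is installed and on your PATH; the helper names and the output folder are illustrative, and the model name is simply the one v1.5 plans to use.

```python
import subprocess
from pathlib import Path


def demucs_command(bounce: Path, out_dir: Path, model: str = "htdemucs_ft") -> list[str]:
    """Build a Demucs CLI call.

    -n selects the model, -o the output folder, and --int24 asks for
    24-bit WAV stems instead of the 16-bit default.
    """
    return ["demucs", "-n", model, "--int24", "-o", str(out_dir), str(bounce)]


def separate(bounce: Path, out_dir: Path, model: str = "htdemucs_ft") -> Path:
    """Run the separator and return the folder the stems are written to."""
    subprocess.run(demucs_command(bounce, out_dir, model), check=True)
    # Demucs writes <out_dir>/<model>/<track name>/{drums,bass,vocals,other}.wav
    return out_dir / model / bounce.stem
```

Calling `separate(Path("bounce.wav"), Path("stems"))` would leave four WAV files under `stems/htdemucs_ft/bounce/`.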
v1.5 adds a separation stage after the render finishes. The chosen
model is Demucs v4 with the htdemucs_ft weights, the four-stem
fine-tuned variant, which separates a stereo mix into:
- drums: kit and percussion.
- bass: low-end, including 808s.
- vocals: lead and stacked vocals when present.
- other: everything that is not the first three.
The separation runs on the same H100 that handled the render, while the GPU is still warm. We measured the separation at 18 to 26 seconds for a 90-second bounce on a single H100, well under the 30-second ceiling we set for the post-render stage. The composer shows a second progress bar only when the user has opted into stems on the brief. Otherwise the page goes quiet at the end of the render, the same as v1.
Stems ship as a ZIP next to the bounce. One folder per job id, one
WAV per stem, plus the original bounce, plus a manifest.json with
the job metadata. The naming is positional and boring on purpose:
sonichaos-{jobId}/
    bounce.wav      # 24-bit, 44.1 kHz, stereo
    drums.wav       # 24-bit, 44.1 kHz, stereo
    bass.wav        # 24-bit, 44.1 kHz, stereo
    vocals.wav      # 24-bit, 44.1 kHz, stereo
    other.wav       # 24-bit, 44.1 kHz, stereo
    manifest.json   # prompt, persona, seed, model, lufs, stems
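The layout above is simple enough to express in a few lines. The following is a rough sketch of how such a package could be assembled, not the shipping packager: the function names are hypothetical, the manifest is passed in as an arbitrary dictionary, and storing the entries uncompressed is an assumption of this example.

```python
import io
import json
import zipfile

STEMS = ("drums", "bass", "vocals", "other")


def package_names(job_id: str) -> list[str]:
    """Archive paths for one job, in the order listed in the layout above."""
    root = f"sonichaos-{job_id}"
    files = ["bounce.wav", *(f"{stem}.wav" for stem in STEMS), "manifest.json"]
    return [f"{root}/{name}" for name in files]


def build_zip(job_id: str, audio: dict[str, bytes], manifest: dict) -> bytes:
    """Write the bounce, the four stems, and the manifest into an in-memory ZIP."""
    buf = io.BytesIO()
    # ZIP_STORED keeps the entries uncompressed (an assumption of this sketch).
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for path in package_names(job_id):
            name = path.rsplit("/", 1)[1]
            if name == "manifest.json":
                zf.writestr(path, json.dumps(manifest, indent=2))
            else:
                zf.writestr(path, audio[name])  # KeyError if a stem is missing
    return buf.getvalue()
```

Because the names are positional, a consumer can check a package by comparing the archive listing against `package_names(job_id)` without opening the manifest.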
Sample rate stays at 44.1 kHz across stems and bounce. We do not upsample to 48 kHz on export because every conversion costs transient definition, and the v1 render is native 44.1 kHz. If your post chain needs 48 kHz, do the conversion once on your end with a known SRC.
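Before importing stems into a session, it can be worth confirming that every file really is 24-bit, 44.1 kHz, stereo. Here is a small check using only the Python standard library; the function name and the message wording are illustrative.

```python
import wave


def check_wav(path: str, rate: int = 44_100, channels: int = 2, bits: int = 24) -> list[str]:
    """Return a list of format mismatches; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wf:
        if wf.getframerate() != rate:
            problems.append(f"sample rate {wf.getframerate()} != {rate}")
        if wf.getnchannels() != channels:
            problems.append(f"channels {wf.getnchannels()} != {channels}")
        found_bits = wf.getsampwidth() * 8  # sample width is reported in bytes
        if found_bits != bits:
            problems.append(f"bit depth {found_bits} != {bits}")
    return problems
```

The standard-library `wave` module reads uncompressed PCM only, so a file in another encoding raises `wave.Error` rather than returning a mismatch list.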
The other half of the v1.5 release is a one-stem re-render. After the
separator returns four stems, you can pick one (say vocals), open a
small re-prompt panel, swap the persona, and request a new vocal stem
that the gateway then aligns to the original drum and bass tracks.
The alignment is mechanical, not generative. We re-render the chosen
stem with the original tempo map and key as conditioning, then run a
short phase-aligned crossfade against the timing grid of the original
bounce. It works cleanly on vocals and other. It is a closer call
on drums, where micro-timing carries the groove, so the v1.5
release will gate drums re-rendering behind a flag while we measure
drift.
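One generic way to put a number on that drift is to cross-correlate a re-rendered stem against the original and read off the offset where the two line up best. The sketch below is illustrative only, not the gateway's alignment code: the function names are hypothetical, and the brute-force search suits short excerpts rather than full-length stems.

```python
def estimate_lag(reference: list[float], candidate: list[float], max_lag: int) -> int:
    """Return the offset in samples that best aligns candidate with reference.

    A positive result means the candidate arrives late relative to the reference.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, ref in enumerate(reference):
            j = i + lag
            if 0 <= j < len(candidate):
                score += ref * candidate[j]  # correlation at this offset
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def lag_ms(lag_samples: int, sample_rate: int = 44_100) -> float:
    """Convert a lag in samples to milliseconds at the given sample rate."""
    return 1000.0 * lag_samples / sample_rate
```

At 44.1 kHz, a lag of 441 samples is 10 ms, so the sample count gives a far finer reading of timing than a beat grid does.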
A short spec for the re-render call:
- The conditioning carries the original key and tempoMap.
- The result is written as v2.zip next to v1.zip, so the original take stays intact.
The obvious question, why not have the model output four stems in the first place, has a real answer. The diffusion stack is trained on full mixes. Training a stem-aware model would change the data contract, the loss, and the persona embedding. We would rather ship separation as a v1.5 feature on a known-good separator than push a stem-aware retrain into v1 and slip the launch.
A stem-aware model is on the longer roadmap, past v2. Until then, Demucs v4 on the warm GPU is the right answer.