SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis

Jiale Qian¹*, Hao Meng¹,³*, Tian Zheng¹, Pengcheng Zhu², Haopeng Lin¹, Yuhang Dai¹,⁴, Hanke Xie¹,⁴, Wenxiao Cao¹, Ruixuan Shang¹, Jun Wu¹, Hongmei Liu¹, Hanlin Wen¹, Jian Zhao², Zhonglin Jiang², Yong Chen², Shunshun Yin¹, Ming Tao¹, Jianguo Wei³, Lei Xie⁴, Xinsheng Wang¹
¹ Soul AI Lab, China
² AI Center, Geely Automobile Research Institute (Ningbo) Co., Ltd., Ningbo, China
³ Audio-Visual Cognitive Computing Team, Tianjin University, Tianjin, China
⁴ Audio, Speech and Language Processing Group (ASLP@NPU), Northwestern Polytechnical University, Xi’an, China
* Equal contribution. † Corresponding author.

Abstract

While speech synthesis has advanced rapidly in recent years, open-source singing voice synthesis (SVS) systems still face significant barriers to industrial deployment, particularly in robustness and zero-shot generalization. In this report, we introduce SoulX-Singer, a high-quality open-source SVS system designed for practical deployment. SoulX-Singer supports controllable singing generation conditioned on either symbolic musical scores (MIDI) or melodic representations, enabling flexible and expressive control in real-world production workflows. Trained on more than 42,000 hours of vocal data, the system supports Mandarin Chinese, English, and Cantonese, and consistently achieves state-of-the-art synthesis quality across these languages under diverse musical conditions. Furthermore, to enable reliable evaluation of zero-shot SVS in practical scenarios, we construct SoulX-Singer-Eval, a dedicated benchmark with strict separation between training and test data, which supports systematic assessment in zero-shot settings.
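To make the difference between the two conditioning modes more concrete, the sketch below shows one common way a symbolic score can be turned into a frame-level pitch track. Each MIDI note is converted to Hz with the standard equal-temperament formula (A4 = MIDI 69 = 440 Hz) and held for its duration. This is an illustration under our own assumptions, not SoulX-Singer's preprocessing. The `Note` format, the 10 ms hop, the 0 Hz rest convention, and the helper names `midi_to_hz` and `score_to_f0` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Note:
    """One score event (hypothetical format): a MIDI pitch, or None for a rest."""
    midi: Optional[int]
    duration_s: float


def midi_to_hz(midi: int) -> float:
    """Equal-temperament conversion with A4 (MIDI 69) = 440 Hz."""
    return 440.0 * 2.0 ** ((midi - 69) / 12.0)


def score_to_f0(notes: List[Note], hop_s: float = 0.01) -> List[float]:
    """Expand a note sequence into a step F0 contour, one value per hop.

    Rests become 0.0 Hz, a common (assumed) convention for unvoiced frames.
    """
    f0: List[float] = []
    for note in notes:
        # round() absorbs float error, e.g. 0.03 / 0.01 == 2.9999999999999996
        n_frames = round(note.duration_s / hop_s)
        value = 0.0 if note.midi is None else midi_to_hz(note.midi)
        f0.extend([value] * n_frames)
    return f0


if __name__ == "__main__":
    # C4 for 50 ms, a 20 ms rest, then E4 for 30 ms (illustrative values only).
    score = [Note(60, 0.05), Note(None, 0.02), Note(64, 0.03)]
    contour = score_to_f0(score)
    print(len(contour))          # 10 frames at a 10 ms hop
    print(round(contour[0], 2))  # 261.63 (C4)
```

A step contour like this carries only the written pitches. If a melodic representation is taken to mean an F0 contour extracted from a real performance (again an assumption here), it would additionally carry vibrato, glides, and other expressive pitch detail that a score does not encode.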

Model Architecture

SoulX-Singer Overview

Figure 1: The overall architecture of SoulX-Singer.

1. Zero-Shot SVS

Mandarin Comparison

Sample Info | StyleSinger | TCSinger | YingMusic-Singer | Vevosing | SoulX-Singer (Melody-based) | SoulX-Singer (Score-based)
以身外身做梦中梦 (Ground Truth, Prompt)
明明对你念念不忘 思前想后越发紧张 (Ground Truth, Prompt)
卷起千堆雪 (Ground Truth, Prompt)
像我这样懦弱的人 凡事都要留几分 (Ground Truth, Prompt)

English Comparison

Sample Info | TCSinger | Vevosing | SoulX-Singer (Melody-based) | SoulX-Singer (Score-based)
where the skies are blue to see you once again my love (Ground Truth, Prompt)
i heard that you've settled down that you (Ground Truth, Prompt)
every sha la la la every whoa ooh whoa still shines (Ground Truth, Prompt)

Cantonese Comparison

Sample Info | SoulX-Singer (Melody-based) | SoulX-Singer (Score-based)
马路戏院商店 天空海阔任你行 (Ground Truth, Prompt)
仍然紧守于身边 与你进退也共鸣 (Ground Truth, Prompt)
想与你一起 乘搭最早的班机 (Ground Truth, Prompt)
得不到多么好当得到不知怎算好 奢侈的一生天荒地老 (Ground Truth, Prompt)

2. Lyric Editing

Editing Info | YingMusic-Singer | Vevosing | SoulX-Singer (Score-based)

Original: 原谅捧花的我盛装出席只为错过你
Modified: 思念藏心的我悄然离席只剩回忆你
Ground Truth (Ref), Prompt

Original: 无法深情挽着你的手
Modified: 不能轻轻握住你心跳
Ground Truth (Ref), Prompt

Original: a rush a glance a touch a dance a look in somebody's eyes
Modified: a spark a glow a kiss a show a stare beneath the skies
Ground Truth (Ref), Prompt

Original: i guess you didn't care and i guess i liked that and when i fell hard you took a step back
Modified: you knew i was there you knew i admired it but when i held on you pulled back and retired it
Ground Truth (Ref), Prompt

3. Timbre & Style Transfer

Case (Text) | Source | Prompt | Result
没那么简单: 在周末晚上关上了手机舒服窝在沙发里
(懒羊羊 / Lazy Goat)
(孙燕姿 / Stefanie Sun)
(Taylor Swift)
隐形的翅膀: 我知道我一直有双隐形的翅膀
(Chinese, child voice)
(Chinese, Chinese opera)
(Chinese, rap)
who says: c'mon who says who says you're not perfect who says you're not worth it
(English, hardcore)
(English, metal)
(English, rap)

4. Long Context Generation

Prompt

Example: 传奇 (long context)

5. Other Abilities

Case (Text) | Source | Prompt | Melody-based | Score-based
Humming to Singing: 我们总把人生想的太坏
Humming to Singing: 会在这 一人留两人疚三人游
Speech Prompt to Singing: 跟着红尘跟随我浪迹一生 雨纷纷旧故里草木深
Singer? Rapper!: 做着普通的工作 在外漂泊思乡 有多少人都希望 自己的生活 能过得好一点 能改变自己的历史 让父母的压力小一点 每逢过节的时候 少不了对父母的那份思念 不能回家因为自己的梦想 还没有实现

Real-world AI Singer Showcase

Explore AI-generated songs and covers produced using SoulX-Singer on major streaming platforms.

Soul Virtual Idol: Jiang Yu

八小时时差, by 江屿 (QQ Music)
八小时时差 MV, by 江屿
怎叹 MV, by 江屿

Douyin AI Singer Operations