Abstract
While recent years have witnessed rapid progress in speech synthesis, open-source singing voice synthesis (SVS) systems still face significant barriers to industrial deployment, particularly in terms of robustness and zero-shot generalization. In this report, we introduce SoulX-Singer, a high-quality open-source SVS system designed with practical deployment considerations in mind. SoulX-Singer supports controllable singing generation conditioned on either symbolic musical scores (MIDI) or melodic representations, enabling flexible and expressive control in real-world production workflows. Trained on more than 42,000 hours of vocal data, the system supports Mandarin Chinese, English, and Cantonese, and consistently achieves state-of-the-art synthesis quality across languages under diverse musical conditions. Furthermore, to enable reliable evaluation of zero-shot SVS performance in practical scenarios, we construct SoulX-Singer-Eval, a dedicated benchmark with strict training–test disentanglement, facilitating systematic assessment in zero-shot settings.
Model Architecture
Figure 1: The overall architecture of SoulX-Singer.
1. Zero-Shot SVS
Mandarin Comparison
| Sample Info | StyleSinger | TCSinger | YingMusic-Singer | Vevosing | SoulX-Singer (Melody-based) | SoulX-Singer (Score-based) |
|---|---|---|---|---|---|---|
| 以身外身做梦中梦 / Ground Truth / Prompt | | | | | | |
| 明明对你念念不忘 思前想后越发紧张 / Ground Truth / Prompt | | | | | | |
| 卷起千堆雪 / Ground Truth / Prompt | | | | | | |
| 像我这样懦弱的人 凡事都要留几分 / Ground Truth / Prompt | | | | | | |
English Comparison
| Sample Info | TCSinger | Vevosing | SoulX-Singer (Melody-based) | SoulX-Singer (Score-based) |
|---|---|---|---|---|
| where the skies are blue to see you once again my love / Ground Truth / Prompt | | | | |
| i heard that you've settled down that you / Ground Truth / Prompt | | | | |
| every sha la la la every whoa ooh whoa still shines / Ground Truth / Prompt | | | | |
Cantonese Comparison
| Sample Info | SoulX-Singer (Melody-based) | SoulX-Singer (Score-based) |
|---|---|---|
| 马路戏院商店 天空海阔任你行 / Ground Truth / Prompt | | |
| 仍然紧守于身边 与你进退也共鸣 / Ground Truth / Prompt | | |
| 想与你一起 乘搭最早的班机 / Ground Truth / Prompt | | |
| 得不到多么好当得到不知怎算好 奢侈的一生天荒地老 / Ground Truth / Prompt | | |
2. Lyric Editing
| Editing Info | YingMusic-Singer | Vevosing | SoulX-Singer (Score-based) |
|---|---|---|---|
| Original: 原谅捧花的我盛装出席只为错过你 / Modified: 思念藏心的我悄然离席只剩回忆你 / Ground Truth (Ref) / Prompt | | | |
| Original: 无法深情挽着你的手 / Modified: 不能轻轻握住你心跳 / Ground Truth (Ref) / Prompt | | | |
| Original: a rush a glance a touch a dance a look in somebody's eyes / Modified: a spark a glow a kiss a show a stare beneath the skies / Ground Truth (Ref) / Prompt | | | |
| Original: i guess you didn't care and i guess i liked that and when i fell hard you took a step back / Modified: you knew i was there you knew i admired it but when i held on you pulled back and retired it / Ground Truth (Ref) / Prompt | | | |
3. Timbre & Style Transfer
| Case (Text) | Source | Prompt | Result |
|---|---|---|---|
| 没那么简单: 在周末晚上关上了手机舒服窝在沙发里 | (懒羊羊) | | |
| | (孙燕姿) | | |
| | (Taylor Swift) | | |
| 隐形的翅膀: 我知道我一直有双隐形的翅膀 | (Chinese, child voice) | | |
| | (Chinese, traditional opera) | | |
| | (Chinese, rap) | | |
| who says: c'mon who says who says you're not perfect who says you're not worth it | (English Hardcore) | | |
| | (English Metal) | | |
| | (English Rap) | | |
4. Long Context Generation
Sample: 传奇 (Long Context)
5. Other Abilities
| Case (Text) | Source | Prompt | Melody-based | Score-based |
|---|---|---|---|---|
| Humming to Singing: 我们总把人生想的太坏 | | | | |
| Humming to Singing: 会在这 一人留两人疚三人游 | | | | |
| Speech Prompt to Singing: 跟着红尘跟随我浪迹一生 雨纷纷旧故里草木深 | | | | |
| Singer? Rapper!: 做着普通的工作 在外漂泊思乡 有多少人都希望 自己的生活 能过得好一点 能改变自己的历史 让父母的压力小一点 每逢过节的时候 少不了对父母的那份思念 不能回家因为自己的梦想 还没有实现 | | | | |
Real-world AI Singer Showcase
Explore AI-generated songs and covers produced using SoulX-Singer on major streaming platforms.