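Changelog: Batch Dialogue Voiceover Generation API
Date: 2026-02-13
Type: Feature
Scope: AI Service, API, Celery Tasks, Storyboard Resources
Overview
Adds the /api/v1/ai/generate-dialogue-voiceovers endpoint, which generates AI voiceovers for multiple storyboard dialogues in a single request and automatically writes them to the storyboard_voiceovers table.
Core Features
1. Batch voiceover generation
- ✅ Processes up to 50 dialogues per request
- ✅ Reads the text automatically from storyboard_dialogues
- ✅ Calls the TTS provider to generate audio
- ✅ Uploads the audio to MinIO
- ✅ Writes to the storyboard_voiceovers table
2. Full parameter support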
| Parameter | Type | Default | Description |
|---|---|---|---|
| storyboardId | string | required | Storyboard ID |
| dialogueIds | string[] | required | List of dialogue IDs (1-50 items) |
| voiceId | string | required | Voice ID |
| voiceName | string | optional | Voice name |
| speed | number | 1.0 | Speaking rate (0.25-4.0) |
| volume | number | 1.0 | Volume (0.0-2.0) |
| pitch | number | 1.0 | Pitch (0.5-2.0) |
| isActive | boolean | false | Whether to set this as the active voiceover |
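Notes:
- ✅ Parameters use camelCase (per the API convention)
- ✅ The system automatically selects the best audio model
3. Partial-failure tolerance
- ✅ A single failed dialogue does not affect the others
- ✅ Voiceovers that already succeeded are kept
- ✅ Failed dialogues are flagged in the result
Detailed Changes
1. Data model
File: server/app/models/ai_job.py
from enum import IntEnum

class AIJobType(IntEnum):
    # ... existing types ...
    DIALOGUE_VOICEOVER = 10  # batch dialogue voiceover generation
2. Request schema
File: server/app/schemas/ai.py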
from typing import Optional

from pydantic import BaseModel, ConfigDict, Field

class GenerateDialogueVoiceoversRequest(BaseModel):
    """Request body for batch dialogue voiceover generation."""
    storyboard_id: str = Field(..., alias="storyboardId", description="Storyboard ID")
    dialogue_ids: list[str] = Field(..., min_length=1, max_length=50, alias="dialogueIds")
    voice_id: str = Field(..., alias="voiceId")
    voice_name: Optional[str] = Field(None, alias="voiceName")
    speed: float = Field(1.0, ge=0.25, le=4.0)
    volume: float = Field(1.0, ge=0.0, le=2.0)
    pitch: float = Field(1.0, ge=0.5, le=2.0)
    is_active: bool = Field(False, alias="isActive")
    # Accept both the camelCase aliases and the snake_case field names
    model_config = ConfigDict(populate_by_name=True)
3. API route
File: server/app/api/v1/ai.py
@router.post("/generate-dialogue-voiceovers", response_model=SuccessResponse[AIJobResponse])
async def generate_dialogue_voiceovers(
    request: GenerateDialogueVoiceoversRequest,
    current_user: User = Depends(get_current_user),
    db: AsyncSession = Depends(get_session),
):
    """Generate AI voiceovers for a batch of dialogues."""
    # ...
Endpoint: POST /api/v1/ai/generate-dialogue-voiceovers
4. Service method
File: server/app/services/ai_service.py
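async def generate_dialogue_voiceovers(
    self,
    user_id: str,
    dialogue_ids: list[str],
    voice_id: str,
    voice_name: Optional[str] = None,
    speed: float = 1.0,
    volume: float = 1.0,
    pitch: float = 1.0,
    is_active: bool = False,
    model: Optional[str] = None,
    **kwargs,
) -> Dict[str, Any]:
    """Generate AI voiceovers for a batch of dialogues."""
    # 1. Verify that the user exists
    # 2. Validate the dialogue count (1-50)
    # 3. Load all dialogues and validate them
    # 4. Verify that all dialogues belong to the same storyboard
    # 5. Check storyboard permissions
    # 6. Check the quota
    # 7. Load the model configuration
    # 8. Calculate the required credits (based on total character count)
    # 9. Pre-deduct the credits
    # 10. Create the AI job
    # 11. Start the Celery task
    # 12. Return the job_id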
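Key validations:
- ✅ All dialogues must belong to the same storyboard
- ✅ The user must have at least viewer permission on the storyboard
- ✅ The dialogue count must be between 1 and 50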
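The count and same-storyboard checks can be sketched as a small pure function. This is a minimal illustration, not the service's code: the Dialogue dataclass and validate_batch are hypothetical names, and the permission check is left out because it depends on the project's own access-control layer.

```python
from dataclasses import dataclass

MAX_DIALOGUES = 50  # batch limit stated in this changelog


@dataclass
class Dialogue:
    """Hypothetical stand-in for a storyboard_dialogues row."""
    dialogue_id: str
    storyboard_id: str
    content: str


def validate_batch(dialogue_ids: list[str], dialogues: list[Dialogue]) -> str:
    """Return the storyboard ID shared by all dialogues, or raise ValueError."""
    if not 1 <= len(dialogue_ids) <= MAX_DIALOGUES:
        raise ValueError(f"expected 1-{MAX_DIALOGUES} dialogues, got {len(dialogue_ids)}")
    # Every requested ID must have been found in the database
    found = {d.dialogue_id for d in dialogues}
    missing = [i for i in dialogue_ids if i not in found]
    if missing:
        raise ValueError(f"dialogues not found: {missing}")
    # All dialogues must point at one and the same storyboard
    storyboards = {d.storyboard_id for d in dialogues}
    if len(storyboards) != 1:
        raise ValueError("all dialogues must belong to the same storyboard")
    return storyboards.pop()
```

Returning the shared storyboard ID lets the caller run the permission check once, against a single storyboard.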
5. Celery task
File: server/app/tasks/ai_tasks.py
import asyncio

@celery_app.task(base=AITask, bind=True, max_retries=3)
def generate_dialogue_voiceovers_task(
    self,
    job_id: str,
    user_id: str,
    dialogue_ids: list[str],
    voice_id: str,
    model: str,
    voice_name: Optional[str] = None,
    speed: float = 1.0,
    volume: float = 1.0,
    pitch: float = 1.0,
    is_active: bool = False,
    **kwargs,
):
    """Batch dialogue voiceover generation task."""

    # Celery calls the task synchronously, so the awaits live in a nested coroutine
    async def _run():
        successful_voiceovers = []
        failed_dialogues = []
        # Process the dialogues one at a time
        for idx, dialogue_id in enumerate(dialogue_ids):
            try:
                # 1. Load the dialogue content
                dialogue = await get_dialogue_by_id(dialogue_id)
                text = dialogue.content
                # 2. Call the TTS provider to generate the audio
                result = await provider.generate_voice(
                    text=text,
                    voice_type=voice_id,
                    speed=speed,
                    ...
                )
                # 3. Upload the audio to MinIO
                metadata = await file_storage.upload_file(
                    file_content=result['audio_data'],
                    filename=f"dialogue_voice_{dialogue_id}.mp3",
                    category='ai-generated/dialogue-voiceovers',
                    ...
                )
                # 4. Write to the storyboard_voiceovers table
                if is_active:
                    await deactivate_all_voiceovers(dialogue_id)
                voiceover = StoryboardVoiceover(
                    voiceover_id=generate_uuid(),
                    dialogue_id=dialogue_id,
                    # Taken from the dialogue row; the task receives no storyboard_id argument
                    storyboard_id=dialogue.storyboard_id,
                    audio_url=metadata.file_url,
                    status=ResourceStatus.COMPLETED,
                    is_active=is_active,
                    voice_id=voice_id,
                    voice_name=voice_name,
                    speed=speed,
                    volume=volume,
                    pitch=pitch,
                    ...
                )
                await create_voiceover(voiceover)
                successful_voiceovers.append({
                    'dialogue_id': dialogue_id,
                    'voiceover_id': str(voiceover.voiceover_id),
                    'audio_url': metadata.file_url
                })
            except Exception as e:
                # Record the failure and move on to the next dialogue
                failed_dialogues.append({
                    'dialogue_id': dialogue_id,
                    'error': str(e)
                })
        # Mark the job as completed
        await update_job_status(
            job_id,
            AIJobStatus.COMPLETED,
            progress=100,
            output_data={
                'successful_count': len(successful_voiceovers),
                'failed_count': len(failed_dialogues),
                'successful_voiceovers': successful_voiceovers,
                'failed_dialogues': failed_dialogues
            }
        )

    asyncio.run(_run())
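Fault-tolerance strategy:
- ✅ Processing continues after a partial failure
- ✅ Voiceovers that already succeeded are kept
- ✅ Credits are not refunded (they are pre-deducted based on the total character count)
Usage Examples
API request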
curl -X POST "http://localhost:6160/api/v1/ai/generate-dialogue-voiceovers" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "storyboardId": "<storyboard-id>",
    "dialogueIds": [
      "d1d2d3d4-1234-5678-90ab-cdef12345678",
      "e2e3e4e5-1234-5678-90ab-cdef12345679",
      "f3f4f5f6-1234-5678-90ab-cdef12345680"
    ],
    "voiceId": "EXAVITQu4vr4xnSDxMaL",
    "voiceName": "Bella",
    "speed": 1.0,
    "volume": 1.0,
    "pitch": 1.0,
    "isActive": true
  }'
Response:
{
  "code": 200,
  "message": "Batch voiceover generation job created",
  "data": {
    "jobId": "550e8400-e29b-41d4-a716-446655440000",
    "taskId": "celery-task-id",
    "status": "pending",
    "estimatedCredits": 150,
    "dialogueCount": 3
  }
}
Query job status
curl -X GET "http://localhost:6160/api/v1/ai/jobs/550e8400-e29b-41d4-a716-446655440000" \
  -H "Authorization: Bearer $TOKEN"
After the job completes:
{
  "code": 200,
  "message": "Query succeeded",
  "data": {
    "jobId": "550e8400-e29b-41d4-a716-446655440000",
    "jobType": 10,
    "status": 3,
    "progress": 100,
    "outputData": {
      "successful_count": 2,
      "failed_count": 1,
      "successful_voiceovers": [
        {
          "dialogue_id": "d1d2d3d4-1234-5678-90ab-cdef12345678",
          "voiceover_id": "v1v2v3v4-1234-5678-90ab-cdef12345678",
          "audio_url": "https://minio.example.com/ai-generated/dialogue-voiceovers/dialogue_voice_d1d2d3d4.mp3"
        },
        {
          "dialogue_id": "e2e3e4e5-1234-5678-90ab-cdef12345679",
          "voiceover_id": "v2v3v4v5-1234-5678-90ab-cdef12345679",
          "audio_url": "https://minio.example.com/ai-generated/dialogue-voiceovers/dialogue_voice_e2e3e4e5.mp3"
        }
      ],
      "failed_dialogues": [
        {
          "dialogue_id": "f3f4f5f6-1234-5678-90ab-cdef12345680",
          "error": "TTS generation failed: timeout"
        }
      ]
    }
  }
}
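A script or backend client can wait for the job with a bounded polling loop. The sketch below is an assumption-laden illustration: wait_for_job and its fetch_job callback are hypothetical names, and only the COMPLETED value (3) is taken from the response above. Because this changelog does not list the status code for a failed job, the loop gives up after a fixed number of attempts rather than watching for a failure status.

```python
import time
from typing import Callable

COMPLETED = 3  # status value shown in the sample response above


def wait_for_job(
    fetch_job: Callable[[], dict],
    interval: float = 2.0,
    max_attempts: int = 150,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll fetch_job() until the job reports COMPLETED, then return its data.

    fetch_job is a caller-supplied function that performs
    GET /api/v1/ai/jobs/{jobId} and returns the "data" object.
    """
    for _ in range(max_attempts):
        job = fetch_job()
        if job["status"] == COMPLETED:
            return job
        sleep(interval)
    raise TimeoutError(f"job not completed after {max_attempts} polls")
```

Passing the HTTP call in as a function keeps the loop independent of any particular HTTP library and makes it easy to exercise with a stub.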
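Frontend integration
// 1. Generate voiceovers in batch
// storyboardId: ID of the storyboard that owns the dialogues
const res = await fetch('/api/v1/ai/generate-dialogue-voiceovers', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${token}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    storyboardId,
    dialogueIds,
    voiceId: selectedVoiceId,
    voiceName: selectedVoiceName,
    speed: 1.0,
    volume: 1.0,
    pitch: 1.0,
    isActive: true,
  }),
});
// fetch resolves to a Response, so the JSON body must be parsed explicitly
const { data } = await res.json();
// 2. Poll the job status
const jobId = data.jobId;
const interval = setInterval(async () => {
  const jobRes = await fetch(`/api/v1/ai/jobs/${jobId}`, {
    headers: { Authorization: `Bearer ${token}` },
  });
  const { data: job } = await jobRes.json();
  if (job.status === 3) { // COMPLETED
    clearInterval(interval);
    console.log(`Succeeded: ${job.outputData.successful_count}`);
    console.log(`Failed: ${job.outputData.failed_count}`);
    // Refresh the storyboard dialogue list
    await refreshDialogues();
  }
}, 2000);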
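Comparison with the existing endpoint
| Feature | /generate-voice | /generate-dialogue-voiceovers |
|---|---|---|
| Purpose | General-purpose TTS | Storyboard dialogue voiceovers |
| Input | Free text | List of dialogue IDs |
| Data source | User input | storyboard_dialogues table |
| Target table | ai_generation_results | storyboard_voiceovers |
| Batch support | ❌ | ✅ Up to 50 |
| Storyboard association | Optional | Enforced |
| Permission check | User existence | Storyboard permission |
| Failure strategy | Whole request fails | Continues after partial failure |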
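The existing endpoint is retained:
- ✅ /generate-voice remains available for general-purpose TTS (non-storyboard scenarios)
- ✅ Testing voices and previewing results
- ✅ Use by other business modules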
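Notes
1. Credit consumption
- Full pre-deduction: credits are deducted up front in a single charge, based on the total character count of all dialogues
- No refund on partial failure: credits are not returned even if some dialogues fail
- Recommendation: test with a few dialogues first, then run the full batch once the result is confirmed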
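The pre-deduction amount can be estimated from the total character count. The sketch below only illustrates that arithmetic: the pricing rate is not given in this changelog, so credits_per_100_chars is a hypothetical parameter, and rounding up is an assumption.

```python
import math


def estimate_credits(texts: list[str], credits_per_100_chars: float) -> int:
    """Estimate the credits to pre-deduct for a batch of dialogue texts.

    The rate is a placeholder; the real value would come from the model
    configuration loaded by the service.
    """
    total_chars = sum(len(text) for text in texts)
    # Round up so that a partial block of characters is still charged
    return math.ceil(total_chars * credits_per_100_chars / 100)
```

Because the charge covers the whole batch and is not refunded on partial failure, running this estimate on a small sample first gives a sense of the cost of a full run.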
2. Performance considerations
- Maximum batch size: 50 dialogues per request
- Recommended batch size: 10-20 dialogues
- Timeout: up to 60 seconds per dialogue
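A caller with more dialogues than the recommended size can split the ID list before submitting. A minimal sketch, assuming a hypothetical chunk_dialogue_ids helper on the client side; each resulting batch is sent as its own request:

```python
MAX_BATCH = 50  # hard limit per request stated in this changelog


def chunk_dialogue_ids(dialogue_ids: list[str], batch_size: int = 20) -> list[list[str]]:
    """Split dialogue IDs into consecutive batches of at most batch_size items."""
    if not 1 <= batch_size <= MAX_BATCH:
        raise ValueError(f"batch_size must be between 1 and {MAX_BATCH}")
    return [
        dialogue_ids[start:start + batch_size]
        for start in range(0, len(dialogue_ids), batch_size)
    ]
```

Since all dialogues in one request must belong to the same storyboard, the list should be grouped by storyboard before it is chunked.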
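3. Failure handling
- Partial failure: voiceovers that already succeeded are kept
- Retry strategy: failed dialogues can be resubmitted on their own
- Finding the cause: inspect outputData.failed_dialogues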
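Resubmitting only the failed dialogues amounts to reading failed_dialogues and reusing the original request with a narrower ID list. The helper names below are hypothetical; the field names follow the sample outputData and the camelCase request parameters in this changelog.

```python
from typing import Optional


def failed_dialogue_ids(output_data: dict) -> list[str]:
    """Collect the IDs listed under failed_dialogues (empty if none failed)."""
    return [item["dialogue_id"] for item in output_data.get("failed_dialogues", [])]


def build_retry_payload(original_request: dict, output_data: dict) -> Optional[dict]:
    """Return a copy of the request limited to failed dialogues, or None."""
    failed = failed_dialogue_ids(output_data)
    if not failed:
        return None
    # Keep voice, speed, volume, and pitch; replace only the ID list
    return {**original_request, "dialogueIds": failed}
```

Each retry is a new job, so credits are pre-deducted again for the resubmitted dialogues.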
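4. Activating a voiceover
- isActive=true: automatically deactivates the dialogue's other voiceovers
- isActive=false: the voiceover must be activated manually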
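The activation rule keeps at most one active voiceover per dialogue. The in-memory sketch below illustrates that rule only; Voiceover and add_voiceover are hypothetical stand-ins for the storyboard_voiceovers table and its write path.

```python
from dataclasses import dataclass


@dataclass
class Voiceover:
    """Hypothetical stand-in for a storyboard_voiceovers row."""
    voiceover_id: str
    dialogue_id: str
    is_active: bool


def add_voiceover(existing: list[Voiceover], new: Voiceover) -> None:
    """Append a voiceover, deactivating its siblings first if it is active."""
    if new.is_active:
        for row in existing:
            if row.dialogue_id == new.dialogue_id:
                row.is_active = False
    existing.append(new)
```

Deactivating before inserting mirrors the order used in the Celery task, where deactivate_all_voiceovers runs before the new row is created.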
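Future Improvements
Short term
- Parallel generation (better performance)
- Finer-grained progress (real-time progress for each dialogue)
- Automatic retry on failure
Long term
- Voice presets (a default voice per character)
- Emotion mapping (adjust parameters based on the emotion field)
- Real-time preview
- Batch export
Related Documents
- RFC 145: Batch Dialogue Voiceover Generation API
- RFC 142: ElevenLabs Integration
- RFC 144: AI Models Capability Config
Test Verification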
# 1. Create test dialogues
curl -X POST "http://localhost:6160/api/v1/storyboard-resources/dialogues" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"storyboard_id":"xxx","content":"Test text 1",...}'
# 2. Generate voiceovers in batch
curl -X POST "http://localhost:6160/api/v1/ai/generate-dialogue-voiceovers" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"storyboardId":"xxx","dialogueIds":["d1","d2"],"voiceId":"alloy"}'
# 3. Verify that the voiceovers were written
curl -X GET "http://localhost:6160/api/v1/storyboard-resources/dialogues/d1/voiceovers" \
  -H "Authorization: Bearer $TOKEN"
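Acceptance Criteria
- API endpoint implemented
- Batch generation supported (1-50 dialogues)
- Voiceovers written automatically to the storyboard_voiceovers table
- Partial-failure tolerance supported
- Full permission validation
- Credit pre-deduction and consumption records
- Celery asynchronous task implemented
- Documentation complete (RFC + Changelog)
- Unit test coverage (pending)
- Integration test coverage (pending)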