# AI Skill Prompt Format Hardening - Fix for Storyboards Not Being Generated

**Date**: 2026-02-09  
**Type**: Bug Fix  
**Scope of impact**: AI screenplay parsing, storyboard generation

---

## Problem Description

### Symptom

Screenplay parsing extracted the locations successfully but did not generate any storyboards:

```json
{
  "locations_created": 6,
  "characters_created": 0,
  "storyboards_created": 0  // ❌ storyboards should have been generated
}
```

### Root Cause

The data returned by the AI did not match the expected format:

**Format actually returned by the AI** (incorrect):
```json
{
  "scenes": [  // ❌ uses scenes instead of locations
    {
      "scene_number": 1,
      "location": "海边",  // ❌ uses the location field instead of name
      "characters": ["女孩", "大盖帽"],  // ❌ characters nested inside the scene
      "shots": [...]  // ❌ shots nested inside the scene instead of a top-level storyboards array
    }
  ]
}
```

**Expected format** (correct):
```json
{
  "characters": [...],      // ✅ top-level characters array
  "locations": [...],       // ✅ top-level locations array
  "storyboards": [...]      // ✅ top-level storyboards array
}
```
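
To make the failure mode concrete, here is a minimal sketch of why the nested format yields zero storyboards. It assumes the storage step only creates storyboards when a top-level `storyboards` key is present and non-empty, as in the `store_parsed_elements` excerpt under Technical Details. The helper name `should_create_storyboards` and the `shot_number` field are hypothetical, introduced only for illustration.

```python
from typing import Any, Dict


def should_create_storyboards(
    parsed_data: Dict[str, Any], auto_create_storyboards: bool = True
) -> bool:
    """Mirror the guard used before storyboard creation (illustrative only)."""
    return bool(auto_create_storyboards and parsed_data.get("storyboards"))


# Nested format: shots live inside each scene, so there is no top-level key.
nested = {
    "scenes": [
        {"scene_number": 1, "location": "海边", "shots": [{"shot_number": 1}]}
    ]
}

# Expected format: storyboards are a top-level array.
flat = {
    "characters": [],
    "locations": [{"name": "海边"}],
    "storyboards": [{"shot_number": 1}],
}

print(should_create_storyboards(nested))  # False
print(should_create_storyboards(flat))    # True
```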

---

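## Solution

### Choosing an Approach

**Option 1**: Strengthen the data transformation logic (accept multiple formats)  
**Option 2**: Tighten the AI Skill Prompt (fix the problem at the source) ✅

Option 2 was chosen because it:
1. Constrains the AI output format at the source
2. Keeps the code logic simpler and clearer
3. Lowers long-term maintenance cost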

---

## Implementation

### 1. Harden the Format Requirements in the AI Skill Prompt

**File**: `server/app/resources/ai_skills/screenplay_parsing.md`

**Changes**:

#### 1.1 Explicitly Forbid the Incorrect Formats

````markdown
⚠️ **Forbidden formats**:
```json
// ❌ Wrong: uses scenes instead of locations
{"scenes": [...]}

// ❌ Wrong: characters nested inside a location
{"locations": [{"characters": [...]}]}

// ❌ Wrong: shots nested inside a location
{"locations": [{"shots": [...]}]}

// ❌ Wrong: location object uses the location field instead of name
{"locations": [{"location": "海边"}]}
```
````

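#### 1.2 Add a Format Checklist

```markdown
**Format checklist** (must be self-checked before output):
- [ ] Is there a top-level `characters` array?
- [ ] Is there a top-level `locations` array (not `scenes`)?
- [ ] Is there a top-level `storyboards` array (not `shots`)?
- [ ] Do the objects in `locations` use the `name` field (not `location` or `title`)?
- [ ] Are characters in the top-level array rather than nested inside a location?
- [ ] Are storyboards in the top-level array rather than nested inside a location?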
```

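#### 1.3 Emphasize the 7 Required Top-Level Keys

```markdown
⚠️ **Strict requirement**: the response must contain all 7 of the following top-level keys; none may be omitted:

1. **characters** - top-level characters array (do not nest inside scenes)
2. **character_tags** - character tag dictionary
3. **locations** - top-level locations array (using "scenes" is forbidden)
4. **location_tags** - location tag dictionary
5. **props** - top-level props array
6. **prop_tags** - prop tag dictionary
7. **storyboards** - top-level storyboards array (using "shots" or nesting inside scenes is forbidden)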
```

### 2. Bump the AI Skill Version

**Database update**:
```sql
UPDATE ai_skills_registry 
SET version = '1.2.0' 
WHERE name = 'screenplay_parsing';
```

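**Version history**:
- `1.2.0` (2026-02-09): Hardened the output format requirements, forbade nested formats, added a format checklist
- `1.1.0` (2026-02-08): Fixed the locations/scenes key name issue, tightened format requirements
- `1.0.0` (2026-02-07): Initial version with support for dynamic parameter injection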

---

## Verification Steps

### 1. Reset the Screenplay Parsing Status

```sql
UPDATE screenplays 
SET parsing_status = 0 
WHERE screenplay_id = '019c4179-3dc6-7423-a30a-e965a7df0e09';
```

### 2. Restart the Celery Worker

```bash
docker restart jointo-server-celery-ai
```

### 3. Re-trigger Parsing

```bash
curl 'http://localhost:6160/api/v1/screenplays/019c4179-3dc6-7423-a30a-e965a7df0e09/parse' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  --data-raw '{"custom_requirements":"","storyboard_count":6}'
```

### 4. Check the Result

Expected output (each count should be greater than 0):
```json
{
  "characters_created": > 0,
  "locations_created": > 0,
  "storyboards_created": > 0  // ✅ storyboards should be generated
}
```
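
The `> 0` values above are shorthand rather than literal JSON. A minimal sketch of the same check in Python, assuming the parse result is available as a dict carrying these three counters (the helper name `missing_counts` is hypothetical and not part of the codebase):

```python
from typing import Dict, List

# Counter names taken from the result payload shown in this document.
EXPECTED_COUNTERS = ("characters_created", "locations_created", "storyboards_created")


def missing_counts(result: Dict[str, int]) -> List[str]:
    """Return the counters that are absent or not greater than zero."""
    return [key for key in EXPECTED_COUNTERS if result.get(key, 0) <= 0]


# The failing result from the problem description.
before_fix = {"locations_created": 6, "characters_created": 0, "storyboards_created": 0}
print(missing_counts(before_fix))  # ['characters_created', 'storyboards_created']
```

An empty list means the verification passed.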

---

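## Scope of Impact

### Affected Components

1. **AI Skill Registry** (`ai_skills_registry` table)
   - Version bumped to `1.2.0`

2. **Screenplay parsing task** (`parse_screenplay_task`)
   - Uses the new version of the AI Skill Prompt

3. **Frontend storyboard display**
   - Generated storyboards should be visible once parsing succeeds

### Unaffected Components

- Database schema (no changes)
- API endpoints (no changes)
- Frontend code (no changes)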

---

## Technical Details

### AI Skill Prompt Loading Flow

```python
# server/app/tasks/ai_tasks.py (parse_screenplay_task)

# 1. Load the AI Skill from the database
skill = await skill_service.get_skill_by_name("screenplay_parsing")
system_prompt = skill['content']

# 2. Dynamically inject the user's custom requirements
if custom_requirements:
    system_prompt += f"""
## User-specific requirements
{custom_requirements}
"""

# 3. Call the AI Provider
result = await provider.process_text(
    task_type='screenplay_parse',
    text=screenplay_content,
    output_format='json',
    system_prompt=system_prompt
)
```

### Data Storage Flow

```python
# server/app/services/screenplay_service.py (store_parsed_elements)

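# 1. Transform the AI response format (backward compatible with the old format)
parsed_data = self._transform_ai_tags_format(parsed_data)

# 2. Create project elements (characters/locations/props)
character_id_map = {...}
location_id_map = {...}
prop_id_map = {...}

# 3. Create element tags
tag_id_maps = {...}

# 4. Create storyboards (if enabled)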
if auto_create_storyboards and parsed_data.get('storyboards'):
    storyboard_ids = await self._create_storyboards_from_ai(...)
```

---

## Suggested Follow-up Improvements

### 1. Add Data Validation

Add format validation of the AI response in `parse_screenplay_task`:

```python
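import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)


def validate_ai_response(parsed_data: Dict[str, Any]) -> bool:
    """Validate the format of the data returned by the AI."""
    required_keys = [
        'characters', 'character_tags',
        'locations', 'location_tags',
        'props', 'prop_tags',
        'storyboards'
    ]

    # Check for the wrong key name first so the more specific error is logged
    if 'scenes' in parsed_data:
        logger.error("❌ AI response uses 'scenes' instead of 'locations'")
        return False

    for key in required_keys:
        if key not in parsed_data:
            logger.error(f"❌ AI response is missing required key: {key}")
            return False

    return True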
```
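
The function above only inspects top-level keys. The per-object items of the format checklist (the `name` field, and no nested `characters` or `shots`) could be covered by a complementary check along these lines. This is a sketch, not existing project code: `find_nested_format_errors` is a hypothetical name, and the rules are taken directly from the forbidden formats listed in section 1.1.

```python
from typing import Any, Dict, List

# Keys that must not appear inside a location object, per section 1.1.
FORBIDDEN_NESTED_KEYS = ("characters", "shots")


def find_nested_format_errors(parsed_data: Dict[str, Any]) -> List[str]:
    """Collect checklist violations found inside the `locations` array."""
    locations = parsed_data.get("locations")
    if not isinstance(locations, list):
        return ["'locations' is not a top-level array"]

    errors: List[str] = []
    for index, location in enumerate(locations):
        if not isinstance(location, dict):
            errors.append(f"locations[{index}] is not an object")
            continue
        if "name" not in location:
            errors.append(f"locations[{index}] is missing 'name'")
        for key in FORBIDDEN_NESTED_KEYS:
            if key in location:
                errors.append(f"locations[{index}] nests '{key}'")
    return errors


bad = {"locations": [{"location": "海边", "shots": []}]}
print(find_nested_format_errors(bad))
# ["locations[0] is missing 'name'", "locations[0] nests 'shots'"]
```

Returning a list of messages rather than a boolean keeps every violation visible in a single log entry.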

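### 2. Add a Fallback Strategy

If the AI returns an incorrectly formatted response, the system can:
1. Log the error
2. Attempt an automatic format conversion (Option 1)
3. Notify the user to re-run parsing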
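
The fallback steps above could be sketched roughly as follows. This is an illustrative assumption, not the project's conversion logic: the function name `flatten_scenes_format`, the `{"name": ...}` shape used for characters, and the `location` back-reference added to each storyboard are all hypothetical, since the real storyboard schema is not shown in this document.

```python
import logging
from typing import Any, Dict, List

logger = logging.getLogger(__name__)


def flatten_scenes_format(parsed_data: Dict[str, Any]) -> Dict[str, Any]:
    """Best-effort conversion of the nested `scenes` format to top-level arrays."""
    if "scenes" not in parsed_data:
        return parsed_data  # Already in the expected shape; nothing to convert.

    # Step 1: log the problem before attempting a repair.
    logger.warning("AI response used nested 'scenes'; attempting conversion")

    characters: List[Dict[str, Any]] = []
    locations: List[Dict[str, Any]] = []
    storyboards: List[Dict[str, Any]] = []

    # Step 2: lift nested values into top-level arrays, de-duplicating by name.
    for scene in parsed_data["scenes"]:
        location_name = scene.get("location") or scene.get("name")
        if location_name and {"name": location_name} not in locations:
            locations.append({"name": location_name})
        for character_name in scene.get("characters", []):
            if {"name": character_name} not in characters:
                characters.append({"name": character_name})
        for shot in scene.get("shots", []):
            # Hypothetical back-reference so each shot keeps its location.
            storyboards.append({**shot, "location": location_name})

    converted = {key: value for key, value in parsed_data.items() if key != "scenes"}
    converted.update(characters=characters, locations=locations, storyboards=storyboards)
    return converted
```

If the converted result still has an empty `storyboards` array, step 3 applies and the user would be asked to re-run parsing.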

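### 3. Monitor AI Output Quality

Periodically check the format compliance of the AI responses:
- Track the format error rate
- Analyze common error patterns
- Keep refining the Prompt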
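
As a rough illustration of the monitoring idea, the sketch below computes an error rate and tallies error patterns over a batch of parsed responses. It assumes the responses have been collected as dicts (how they are stored or sampled is not specified in this document), and `summarize_format_errors` is a hypothetical helper name.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Tuple

# The seven required top-level keys from section 1.3.
REQUIRED_KEYS = (
    "characters", "character_tags",
    "locations", "location_tags",
    "props", "prop_tags",
    "storyboards",
)


def summarize_format_errors(
    responses: Iterable[Dict[str, Any]]
) -> Tuple[float, Counter]:
    """Return (error rate, counts per error pattern) for a batch of responses."""
    total = 0
    failed = 0
    patterns: Counter = Counter()
    for response in responses:
        total += 1
        problems = [f"missing:{key}" for key in REQUIRED_KEYS if key not in response]
        if "scenes" in response:
            problems.append("uses:scenes")
        if problems:
            failed += 1
            patterns.update(problems)
    rate = failed / total if total else 0.0
    return rate, patterns


good = {key: [] for key in REQUIRED_KEYS}
rate, patterns = summarize_format_errors([good, {"scenes": []}])
print(rate)                     # 0.5
print(patterns["uses:scenes"])  # 1
```

`patterns.most_common()` then gives the most frequent violations first, which is a natural input for the next round of Prompt changes.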

---

## Related Documents

- AI Skill Prompt: `server/app/resources/ai_skills/screenplay_parsing.md`
- Screenplay parsing task: `server/app/tasks/ai_tasks.py` (parse_screenplay_task)
- Screenplay service: `server/app/services/screenplay_service.py` (store_parsed_elements)
- Original requirements document: `docs/prompts/screenplay-ai-parse-prompt.md`

---

**Maintainer**: AI Agent  
**Last updated**: 2026-02-09