# Extracting storyboard data from scenes[].shots

**Date**: 2026-02-09  
**Type**: Bug fix  
**Scope**: Screenplay-parsing AI task

## Background

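After the Markdown JSON parsing issue was fixed, the storyboard data was still empty. Investigation showed that the data structure returned by the AI did not match the structure the code expects.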

### Symptoms

```
Screenplay parsing complete: characters=0, locations=5, props=0, storyboards=0
Screenplay elements stored: characters=0, scenes=5, props=0, tags=0, storyboards=0
```

### Data structure returned by the AI

```python
{
  "scenes": [
    {
      "scene_number": 1,
      "location": "seaside",
      "time": "morning",
      "description": "...",
      "characters": ["girl", "fisherman", "old policeman", "young policeman"],
      "shots": [
        {
          "shot_number": 1,
          "shot_size": "close-up",
          "camera_movement": "static",
          "description": "...",
          "duration": 5
        }
      ]
    }
  ]
}
```

### Data structure the code expects

```python
{
  "characters": [...],
  "locations": [...],
  "storyboards": [...]  # top-level storyboard array
}
```
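To make the mismatch concrete, the two shapes can be written down as `TypedDict`s. This is a sketch only: the type names `AIShot`, `AIScene`, `AIParseResult` and `ExpectedParseResult` are hypothetical (they do not come from the codebase), only keys that appear in the two examples above are modeled, and the built-in generics (`list[str]`) assume Python 3.9+.

```python
from typing import Any, TypedDict


class AIShot(TypedDict, total=False):
    # One shot, nested inside a scene in the AI response
    shot_number: int  # numbered per scene in the example above
    shot_size: str
    camera_movement: str
    description: str
    duration: int


class AIScene(TypedDict, total=False):
    # One scene; its shots live here, not at the top level
    scene_number: int
    location: str
    time: str
    description: str
    characters: list[str]
    shots: list[AIShot]


class AIParseResult(TypedDict, total=False):
    # What the AI actually returns: only a scenes array
    scenes: list[AIScene]


class ExpectedParseResult(TypedDict, total=False):
    # What the storage code reads: flat, top-level arrays
    characters: list[Any]
    locations: list[Any]
    storyboards: list[dict[str, Any]]
```

Written this way, it is easy to see that nothing in `AIParseResult` maps onto `ExpectedParseResult["storyboards"]` without a flattening step.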

### Root cause

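1. The AI returns a `scenes` array, and each scene carries its own `shots` array.
2. The code expects a top-level `storyboards` array.
3. The `_transform_ai_tags_format` method only converted `scenes` into `locations` and never extracted `shots`, so no storyboards reached the storage step.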
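The failure mode can be reproduced in a few lines. The function below is a hypothetical stand-in for the pre-fix behaviour of `_transform_ai_tags_format` (the real step 1 is not shown in this note); the `{"name", "description"}` shape of a location entry is also an assumption made for illustration.

```python
from typing import Any


def transform_before_fix(result: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical pre-fix transform: maps scenes -> locations and nothing else."""
    scenes = result.get("scenes") or []
    if not result.get("locations"):
        # Assumed location shape; the real code may store more fields
        result["locations"] = [
            {"name": s.get("location"), "description": s.get("description", "")}
            for s in scenes
        ]
    return result


ai_output = {
    "scenes": [
        {"scene_number": 1, "location": "seaside", "shots": [{"shot_number": 1}]},
    ]
}
parsed = transform_before_fix(ai_output)
# The shot is still nested under scenes[0], so the storage step sees no storyboards
print(len(parsed["locations"]), len(parsed.get("storyboards", [])))  # 1 0
```

This matches the symptom in the log: locations are populated while storyboards stay at 0.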

## Solution

Add logic to the `_transform_ai_tags_format` method that extracts `scenes[].shots` and converts them into a top-level `storyboards` array.

### Changes

**File**: `server/app/services/screenplay_service.py`

**Step 1.5: extract scenes[].shots into a top-level storyboards array**

```python
# ✅ Step 1.5: extract scenes[].shots into a top-level storyboards array.
# `result`, `scenes` and `logger` come from the surrounding method.
if not result.get('storyboards'):
    storyboards = []
    shot_counter = 1  # global shot number, increments across all scenes

    for scene in scenes:
        scene_location = scene.get('location') or scene.get('name') or scene.get('title')
        # `or []` also guards against an explicit null from the AI
        scene_characters = scene.get('characters') or []

        for shot in scene.get('shots') or []:
            # Build a storyboard object in the standard format
            storyboard = {
                'shot_number': shot_counter,
                # Default title "镜头 N" ("Shot N") when the AI provides none
                'title': shot.get('title') or f"镜头 {shot_counter}",
                'description': shot.get('description', ''),
                'dialogue': shot.get('dialogue', ''),
                'shot_size': shot.get('shot_size'),
                'camera_movement': shot.get('camera_movement'),
                'estimated_duration': shot.get('estimated_duration') or shot.get('duration', 5),
                # Shot-level characters win; otherwise inherit the scene's list
                'characters': shot.get('characters') or scene_characters,
                'character_tags': shot.get('character_tags', {}),
                'locations': [scene_location] if scene_location else [],
                'location_tags': shot.get('location_tags', {}),
                'props': shot.get('props', []),
                'prop_tags': shot.get('prop_tags', {}),
                'meta_data': {
                    'scene_number': scene.get('scene_number'),
                    'scene_location': scene_location,
                    'scene_time': scene.get('time'),
                    # Shot-level meta_data (if any) overrides the scene keys above
                    **(shot.get('meta_data') or {}),
                },
            }
            storyboards.append(storyboard)
            shot_counter += 1

    if storyboards:
        result['storyboards'] = storyboards
        logger.info("✅ Successfully extracted %d storyboards from scenes[].shots", len(storyboards))
```

### Conversion logic

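1. **Iterate over all scenes**: take each scene from the `scenes` array. The step only runs when `result` has no `storyboards` yet.
2. **Extract scene information**: read the scene's `location` (falling back to `name`, then `title`) and its `characters`.
3. **Iterate over the scene's shots**: take each shot from `scene.shots`.
4. **Build the storyboard object**:
   - `shot_number`: global shot number, incremented across all scenes (the AI's per-scene `shot_number` is not used)
   - `title`: the shot title; if missing, the default `"镜头 N"` ("Shot N") is generated
   - `characters`: the shot's own character list if present, otherwise the scene's
   - `locations`: the scene's location
   - `meta_data`: the scene number, scene location and scene time, merged with any `meta_data` on the shot
5. **Add to the result**: if at least one shot was found, write all storyboards to the top-level `storyboards` array.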
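The same flattening can be exercised outside the service with a standalone function. This is a sketch that mirrors the step 1.5 snippet rather than the production method: `extract_storyboards` is a hypothetical name, and only a subset of the storyboard fields is kept for brevity.

```python
from typing import Any


def extract_storyboards(scenes: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Flatten scenes[].shots into one list with globally increasing shot numbers."""
    storyboards: list[dict[str, Any]] = []
    shot_counter = 1
    for scene in scenes:
        scene_location = scene.get("location") or scene.get("name") or scene.get("title")
        scene_characters = scene.get("characters") or []
        for shot in scene.get("shots") or []:
            storyboards.append({
                "shot_number": shot_counter,  # global, not the per-scene shot_number
                "title": shot.get("title") or f"镜头 {shot_counter}",  # same default as the service
                "description": shot.get("description", ""),
                "estimated_duration": shot.get("estimated_duration") or shot.get("duration", 5),
                # Shot-level characters win; otherwise inherit the scene's list
                "characters": shot.get("characters") or scene_characters,
                "locations": [scene_location] if scene_location else [],
                "meta_data": {
                    "scene_number": scene.get("scene_number"),
                    "scene_location": scene_location,
                    "scene_time": scene.get("time"),
                    **(shot.get("meta_data") or {}),
                },
            })
            shot_counter += 1
    return storyboards


scenes = [
    {"scene_number": 1, "location": "seaside", "time": "morning",
     "characters": ["girl", "fisherman"],
     "shots": [{"shot_number": 1, "duration": 5}, {"shot_number": 2, "characters": ["girl"]}]},
    {"scene_number": 2, "location": "police station", "shots": [{"shot_number": 1}]},
]
boards = extract_storyboards(scenes)
print([b["shot_number"] for b in boards])  # [1, 2, 3]
print(boards[1]["characters"])  # ['girl']
```

Note that the second scene's first shot becomes shot 3: numbering is global, so the per-scene `shot_number` from the AI is only useful for ordering within a scene.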

### Restarting the services

```bash
docker restart jointo-server-celery-ai jointo-server-app
```

## Verification

### Test results

```
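✅ Successfully converted 5 scenes to locations format
✅ Successfully extracted 12 storyboards from scenes[].shots
Storyboard creation complete: screenplay_id=..., total=12
Screenplay elements stored: characters=0, scenes=5, props=0, tags=0, storyboards=12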
```

### Verification checklist

1. ✅ Scene data correct: 5 scenes
2. ✅ Storyboard data correct: 12 storyboards
3. ✅ `scene_count` correctly updated to 5
4. ⚠️ Character data missing: needs a further fix
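The checks above were done by reading logs. A small helper along these lines could run the same checks on the transformed result; `check_parse_result` is hypothetical, and the sample data below is made up to mirror the counts in the test run (5 locations, 12 storyboards, no characters).

```python
from typing import Any


def check_parse_result(result: dict[str, Any]) -> list[str]:
    """Return human-readable warnings for a transformed parse result (hypothetical helper)."""
    warnings: list[str] = []
    storyboards = result.get("storyboards") or []
    numbers = [sb.get("shot_number") for sb in storyboards]
    # After flattening, shot numbers should run 1..N with no gaps or repeats
    if numbers != list(range(1, len(storyboards) + 1)):
        warnings.append("storyboard shot_number is not a 1..N sequence")
    for key in ("characters", "locations", "storyboards"):
        if not result.get(key):
            warnings.append(f"{key} is empty")
    return warnings


result = {
    "characters": [],
    "locations": [{"name": f"scene {i}"} for i in range(1, 6)],
    "storyboards": [{"shot_number": i} for i in range(1, 13)],
}
print(check_parse_result(result))  # ['characters is empty']
```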

## Follow-up work

### Open issues

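1. **Missing character data**:
   - The AI returns character information in `scenes[].characters`
   - All unique character names need to be extracted from the scenes
   - A top-level `characters` array needs to be built from them

2. **Missing prop data**:
   - The AI may not be returning prop information at all
   - Or the prop information may be in `scenes[].props` or `shots[].props`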
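Both open issues come down to collecting distinct names from the nested structure. A minimal sketch, assuming the entries are plain strings as in the `characters` example above; `collect_unique_names` is a hypothetical helper, and the `{"name": ...}` target shape for characters and props is an assumption, not the storage code's confirmed format.

```python
from typing import Any


def collect_unique_names(scenes: list[dict[str, Any]], key: str) -> list[str]:
    """Collect distinct names under `key` from scenes[] and scenes[].shots[], in first-seen order."""
    seen: dict[str, None] = {}  # dicts keep insertion order, so this acts as an ordered set
    for scene in scenes:
        for name in scene.get(key) or []:
            seen.setdefault(name, None)
        for shot in scene.get("shots") or []:
            for name in shot.get(key) or []:
                seen.setdefault(name, None)
    return list(seen)


scenes = [
    {"characters": ["girl", "fisherman"], "shots": [{"props": ["fishing net"]}]},
    {"characters": ["old policeman", "girl"], "props": ["notebook"], "shots": []},
]
characters = [{"name": n} for n in collect_unique_names(scenes, "characters")]
props = [{"name": n} for n in collect_unique_names(scenes, "props")]
print([c["name"] for c in characters])  # ['girl', 'fisherman', 'old policeman']
print([p["name"] for p in props])  # ['fishing net', 'notebook']
```

Keeping first-seen order (rather than using a `set`) makes the resulting arrays stable across runs, which helps when comparing parse results.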

### Improvements

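1. **Refine the AI Skill prompt**:
   - Explicitly require top-level `characters`, `locations`, `props` and `storyboards` arrays
   - Include a JSON example of the standard format

2. **Strengthen the data-transformation logic**:
   - Support more AI response formats
   - Automatically extract data from nested structures

3. **Add data validation**:
   - Check that required fields are present
   - Report more detailed error messages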
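For the validation item, one option is to check the normalized result and report every problem at once instead of failing silently with empty counts. A sketch under assumptions: `validate_parse_result`, `ParseResultError` and both required-field lists are hypothetical and would need to match what the storage code actually reads.

```python
from typing import Any

# Hypothetical requirements; align these with the real storage code
REQUIRED_TOP_LEVEL = ("characters", "locations", "props", "storyboards")
REQUIRED_STORYBOARD_FIELDS = ("shot_number", "description")


class ParseResultError(ValueError):
    """Raised when the normalized AI result is missing required data."""


def validate_parse_result(result: dict[str, Any]) -> None:
    """Raise ParseResultError listing every problem found, not just the first."""
    errors: list[str] = []
    for key in REQUIRED_TOP_LEVEL:
        if not isinstance(result.get(key), list):
            errors.append(f"top-level '{key}' missing or not a list")
    storyboards = result.get("storyboards")
    if isinstance(storyboards, list):
        for i, sb in enumerate(storyboards):
            if not isinstance(sb, dict):
                errors.append(f"storyboards[{i}] is not an object")
                continue
            missing = [f for f in REQUIRED_STORYBOARD_FIELDS if f not in sb]
            if missing:
                errors.append(f"storyboards[{i}] missing fields: {', '.join(missing)}")
    if errors:
        raise ParseResultError("; ".join(errors))


try:
    validate_parse_result({"characters": [], "locations": [], "storyboards": [{"shot_number": 1}]})
except ParseResultError as exc:
    print(exc)
# top-level 'props' missing or not a list; storyboards[0] missing fields: description
```

Note that an empty list passes the top-level check: "the AI returned no props" and "the AI returned props under a different key" are different failures, and only the second is treated as an error here.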

## Related documents

- [Fix Markdown JSON parsing](./2026-02-09-fix-markdown-json-parsing-in-ai-tasks.md)
- [Switch back to GPT-4o Mini](./2026-02-09-switch-back-to-gpt4o-mini-for-screenplay-parsing.md)
- [Fix screenplay scene count update](./2026-02-09-screenplay-scene-count-update-fix.md)

## Technical debt

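- [ ] Extract character data from scenes[].characters
- [ ] Refine the AI Skill prompt so it returns the standard format
- [ ] Add data-format validation and error handling
- [ ] Support more variants of the AI response format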