# Token Risk Analysis Report - Screenplay Parsing AI Skill

**Date**: 2026-02-07  
**Subject**: `screenplay_parsing` AI Skill  
**Conclusion**: ⚠️ **Moderate risk; optimization needed**

---

## 📊 Current Token Consumption Analysis

### 1. System Prompt (Fixed Part)

**Source**: `server/app/resources/ai_skills/screenplay_parsing.md`  
**File statistics**:
- Total word count: 415
- Chinese characters: 916
- English words: 236

**Token estimate** (rough heuristic):
```
Chinese characters * 2 + English words * 1.3 = 916 * 2 + 236 * 1.3 ≈ 2,139 tokens
```
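
A minimal sketch of the `estimate_tokens` helper that the Phase 1 plan calls for, implementing the heuristic above. The character classes are assumptions: only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) counts as Chinese, only runs of ASCII letters count as English words, and digits and punctuation are ignored.

```python
import re

# Assumed character classes: basic CJK block and ASCII letter runs only
_CJK = re.compile(r"[\u4e00-\u9fff]")
_EN_WORD = re.compile(r"[A-Za-z]+")


def estimate_tokens(text: str) -> int:
    """Rough token estimate: 2 tokens per Chinese character, 1.3 per English word."""
    cjk_chars = len(_CJK.findall(text))
    en_words = len(_EN_WORD.findall(text))
    return round(cjk_chars * 2 + en_words * 1.3)


# Reproduces the system-prompt estimate: 916 Chinese characters + 236 English words
print(estimate_tokens("中" * 916 + " word" * 236))  # 2139
```

Being a heuristic, it undercounts the measured figure below by a few hundred tokens; an exact count needs the model's own tokenizer.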

**Measured token count** (GPT-4 tokenizer):
- System Prompt: **~2,500 tokens** (including the JSON format examples; about 360 tokens more than the heuristic estimate)

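### 2. Dynamically Injected Content

| Component | Max Token Estimate | Notes |
|------|----------------|------|
| `custom_requirements` | 500 characters → **1,000 tokens** | User-specific requirements (max 500 characters) |
| `storyboard_count` instructions | **200 tokens** | Storyboard generation instructions |
| **Dynamic total** | **~1,200 tokens** | |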

### 3. User Input (Screenplay Content)

| Screenplay Size | Characters | Token Estimate | Risk Level |
|----------|------|-----------|---------|
| Short | 1,000 | 2,000 tokens | ✅ Safe |
| Medium | 5,000 | 10,000 tokens | ⚠️ Medium |
| Long | 10,000 | 20,000 tokens | ❌ High |
| Feature film | 30,000+ | 60,000+ tokens | 🚨 **Certain failure** |

### 4. AI Output (Response)

| Component | Token Estimate | Notes |
|------|-----------|------|
| JSON structure | 500 tokens | Base JSON skeleton |
| Characters/scenes/props | 1,000-3,000 tokens | Depends on screenplay complexity |
| Storyboards (10) | 2,000-4,000 tokens | About 200-400 tokens per storyboard |
| **Output total** | **3,500-7,500 tokens** | |
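
Because `storyboard_count` is a user parameter, the output estimate scales with it. The sketch below rebuilds the table's range from its per-component figures; it assumes the 200-400 tokens per storyboard holds at any count and that the character/scene/prop cost does not grow with the storyboard count. The function name is illustrative.

```python
def estimate_output_range(storyboard_count: int = 10) -> tuple[int, int]:
    """Return the (low, high) output-token estimate from the per-component figures."""
    json_frame = 500             # base JSON skeleton
    entities = (1_000, 3_000)    # characters / scenes / props
    per_storyboard = (200, 400)  # tokens per storyboard (assumed constant)
    low = json_frame + entities[0] + storyboard_count * per_storyboard[0]
    high = json_frame + entities[1] + storyboard_count * per_storyboard[1]
    return low, high


print(estimate_output_range(10))  # (3500, 7500)
print(estimate_output_range(20))  # (5500, 11500)
```

Even at 10 storyboards the upper estimate (7,500) is well above the 4,000-token `max_tokens` default, so detailed responses can be truncated.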

---

## 🔥 Risk Assessment

### Total Token Consumption (Single Parse)

```
System Prompt:       2,500 tokens
Dynamic injection:   1,200 tokens
Screenplay content:  2,000 - 60,000 tokens (variable)
AI output:           3,500 - 7,500 tokens
-------------------------------------------
Total:               9,200 - 71,200 tokens
```
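
The same sum as a small calculator over (low, high) ranges, using the estimates from the sections above. Even the low end, 9,200 tokens, already exceeds GPT-4's 8,192-token context window.

```python
# (low, high) token ranges per component, from the estimates above
BUDGET = {
    "system_prompt": (2_500, 2_500),
    "dynamic_injection": (1_200, 1_200),
    "screenplay": (2_000, 60_000),
    "output": (3_500, 7_500),
}


def total_budget(budget: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the low ends and the high ends of every component."""
    low = sum(lo for lo, _ in budget.values())
    high = sum(hi for _, hi in budget.values())
    return low, high


print(total_budget(BUDGET))  # (9200, 71200)
```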

### GPT-4 Model Limits

| Model | Context Window | Max Output | Safe Threshold (80%) |
|------|---------------|-----------|---------------|
| GPT-4 | 8,192 tokens | 4,096 tokens | 6,553 tokens (input) |
| GPT-4-32k | 32,768 tokens | 4,096 tokens | 26,214 tokens (input) |
| GPT-4-Turbo | 128,000 tokens | 4,096 tokens | 102,400 tokens (input) |
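
The threshold column is 80% of the context window, rounded down; integer arithmetic reproduces it exactly without float rounding surprises. If the full 80% is spent on input, only the remaining 20% is left for output, which for GPT-4 is 1,639 tokens, below the 4,000 `max_tokens` default. The helper name is illustrative.

```python
CONTEXT_WINDOWS = {"gpt-4": 8_192, "gpt-4-32k": 32_768, "gpt-4-turbo": 128_000}


def safe_input_threshold(context_window: int, ratio_percent: int = 80) -> int:
    """ratio_percent of the context window, rounded down (8_192 -> 6_553)."""
    return context_window * ratio_percent // 100


for name, window in CONTEXT_WINDOWS.items():
    print(name, safe_input_threshold(window))
# gpt-4 6553
# gpt-4-32k 26214
# gpt-4-turbo 102400
```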

### Current Configuration

**Check**: `server/app/schemas/screenplay.py`
```python
max_tokens: Optional[int] = Field(
    4000,  # ✅ within GPT-4's max output limit
    alias="maxTokens",
    ge=100,
    le=8000  # ⚠️ but 8000 exceeds GPT-4's 4,096 limit
)
```

---

## ⚠️ Risk Scenarios

### Scenario 1: Short Screenplay (1,000 characters) + GPT-4
```
Input:  2,500 + 1,200 + 2,000 = 5,700 tokens
Output: 4,000 tokens (max_tokens setting)
Total:  9,700 tokens
```
**Result**: ❌ **Exceeds GPT-4's 8K limit** (input 5,700 + output 4,000 > 8,192)

### Scenario 2: Medium Screenplay (5,000 characters) + GPT-4
```
Input:  2,500 + 1,200 + 10,000 = 13,700 tokens
```
**Result**: 🚨 **Fails at request time** (the input alone exceeds 8,192)

### Scenario 3: Long Screenplay (10,000 characters) + GPT-4-32k
```
Input:  2,500 + 1,200 + 20,000 = 23,700 tokens
Output: 4,000 tokens
Total:  27,700 tokens
```
**Result**: ✅ **Works** (but with only 5,068 tokens of headroom under the 32K limit)

### Scenario 4: Feature Screenplay (30,000 characters) + GPT-4-Turbo
```
Input:  2,500 + 1,200 + 60,000 = 63,700 tokens
Output: 4,000 tokens
Total:  67,700 tokens
```
**Result**: ✅ **Safe** (well within the 128K limit)
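
A sketch of the check the four scenarios apply by hand: the fixed prompt (2,500 + 1,200), the screenplay, and the requested output (4,000) must together fit in the context window. Context sizes come from the model table; the function name is illustrative.

```python
CONTEXT_WINDOWS = {"gpt-4": 8_192, "gpt-4-32k": 32_768, "gpt-4-turbo": 128_000}
PROMPT_OVERHEAD = 2_500 + 1_200  # system prompt + dynamic injection


def fits(model: str, screenplay_tokens: int, max_output: int = 4_000) -> bool:
    """True if prompt + screenplay + requested output fit in the model's window."""
    return PROMPT_OVERHEAD + screenplay_tokens + max_output <= CONTEXT_WINDOWS[model]


scenarios = [
    ("gpt-4", 2_000),         # scenario 1
    ("gpt-4", 10_000),        # scenario 2
    ("gpt-4-32k", 20_000),    # scenario 3
    ("gpt-4-turbo", 60_000),  # scenario 4
]
for model, tokens in scenarios:
    print(model, tokens, fits(model, tokens))
# gpt-4 2000 False
# gpt-4 10000 False
# gpt-4-32k 20000 True
# gpt-4-turbo 60000 True
```

With a 4,000-token output, GPT-4 only fits screenplays of up to 492 tokens (8,192 - 3,700 - 4,000).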

---

## 🎯 Optimization Recommendations

### Priority P0 (Do Immediately)

#### 1. Automatic Model Selection
```python
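# server/app/services/ai_service.py
# Budget figures from the analysis above
PROMPT_OVERHEAD_TOKENS = 2_500 + 1_200  # system prompt + dynamic injection
MIN_OUTPUT_TOKENS = 3_500               # low end of the output estimate
SAFETY_BUFFER_TOKENS = 500
# Smallest first; dicts keep insertion order
CONTEXT_WINDOWS = {'gpt-4': 8_192, 'gpt-4-32k': 32_768, 'gpt-4-turbo': 128_000}

def select_model(content_tokens: int) -> str:
    """Smallest model whose window fits prompt + screenplay + minimum output.

    A flat "< 5,000 tokens -> gpt-4" rule would overflow 8K before any
    output is generated (3,700 + 5,000 > 8,192).
    """
    needed = (PROMPT_OVERHEAD_TOKENS + content_tokens
              + MIN_OUTPUT_TOKENS + SAFETY_BUFFER_TOKENS)
    for model, window in CONTEXT_WINDOWS.items():
        if needed <= window:
            return model
    return 'gpt-4-turbo'  # too long even for 128K: split it (see step 3)

async def parse_screenplay(self, screenplay_content: str, **kwargs):
    # Estimate screenplay tokens and pick the model automatically
    content_tokens = estimate_tokens(screenplay_content)
    model = select_model(content_tokens)
    logger.info(f"Auto-selected model: {model}, estimated tokens: {content_tokens}")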
```

#### 2. Dynamic max_tokens Adjustment
```python
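# Limit the output to what the context window has left after the input
# (compute_max_output is an illustrative helper name)
def compute_max_output(context_window: int, system_prompt_tokens: int,
                       dynamic_tokens: int, content_tokens: int,
                       requested_max_tokens: int = 4000, buffer: int = 500) -> int:
    total_input = system_prompt_tokens + dynamic_tokens + content_tokens
    max_output = min(requested_max_tokens,                      # user's request
                     context_window - total_input - buffer)     # keep a buffer
    if max_output <= 0:
        raise ValueError("Input alone fills the context window; use a larger model")
    return max_output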
```

#### 3. Screenplay Length Guard
```python
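MAX_SCREENPLAY_TOKENS = {
    'gpt-4': 4000,  # 8,192 - 4,000 - ~3,700 prompt leaves only ~490 output tokens
    'gpt-4-32k': 25000,
    'gpt-4-turbo': 100000,
}
MODELS_BY_SIZE = ['gpt-4', 'gpt-4-32k', 'gpt-4-turbo']

# guard_screenplay_length is an illustrative wrapper name
async def guard_screenplay_length(screenplay_content: str, content_tokens: int,
                                  model: str, allow_split: bool = True):
    limit = MAX_SCREENPLAY_TOKENS[model]
    if content_tokens <= limit:
        return None  # fits: parse in a single request
    if not allow_split:
        # Option A: reject the request and suggest a larger model
        larger = [m for m in MODELS_BY_SIZE if MAX_SCREENPLAY_TOKENS[m] >= content_tokens]
        hint = larger[0] if larger else 'splitting the screenplay'
        raise ValueError(f"Screenplay too long for {model}; try {hint}")
    # Option B (recommended): split into chunks and parse them separately
    chunks = split_screenplay(screenplay_content, max_tokens=limit)
    return await parse_screenplay_chunks(chunks)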
```

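### Priority P1 (Short-Term)

#### 4. System Prompt Slimming
**Current**: 2,500 tokens  
**Target**: 1,500 tokens (40% reduction)

**Optimization points**:
- Remove redundant JSON format examples (already covered in code comments)
- Simplify repeated field descriptions
- Use tables instead of lists (more compact)

#### 5. Chunked Parsing Strategy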
```python
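import asyncio

# For long screenplays, parse scene by scene
async def parse_long_screenplay(content: str, model: str):
    scenes = split_by_scenes(content)  # split at scene boundaries

    # parse_screenplay_task.delay() (Celery-style) returns an AsyncResult, which
    # cannot be awaited; await an async parser instead (parse_screenplay_async
    # is an assumed coroutine name) and run the scenes concurrently
    results = await asyncio.gather(
        *(parse_screenplay_async(screenplay_content=scene, model=model) for scene in scenes)
    )

    # Merge the per-scene results
    return merge_screenplay_results(list(results))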
```
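
`merge_screenplay_results` is also left undefined. A sketch under an assumed result schema (`characters`, `scenes`, `props`, and `storyboards` lists, with a `name` key on every entity); the real schema is defined by the skill prompt, which is not shown here. Deduplicating by exact name is a simplification: a character referred to by different names in different scenes would not be merged.

```python
from typing import Any

ENTITY_KEYS = ("characters", "scenes", "props")  # assumed schema keys


def merge_screenplay_results(results: list[dict[str, Any]]) -> dict[str, Any]:
    """Merge per-scene results: dedupe named entities, concatenate storyboards.

    The first occurrence of each entity name wins; storyboards keep scene order.
    """
    merged: dict[str, Any] = {key: [] for key in ENTITY_KEYS}
    merged["storyboards"] = []
    seen: dict[str, set[str]] = {key: set() for key in ENTITY_KEYS}
    for result in results:
        for key in ENTITY_KEYS:
            for entity in result.get(key, []):
                if entity["name"] not in seen[key]:
                    seen[key].add(entity["name"])
                    merged[key].append(entity)
        merged["storyboards"].extend(result.get("storyboards", []))
    return merged
```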

### Priority P2 (Long-Term)

#### 6. Prompt Compression
- Use **LLMLingua** to compress the system prompt (could save about 50% of its tokens)
- Use **few-shot examples** in place of detailed instructions

#### 7. Caching
```python
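import hashlib
import json

# Cache parse results for identical screenplays
# version_hash (assumed): hash of the screenplay content, so any edit invalidates the cache
version_hash = hashlib.sha256(screenplay_content.encode("utf-8")).hexdigest()[:16]
cache_key = f"screenplay:{screenplay_id}:{model}:{version_hash}"
if cached := await redis.get(cache_key):  # redis: an async client, e.g. redis.asyncio.Redis
    return json.loads(cached)
result = await parse_screenplay(...)  # placeholder for the actual parse call
await redis.set(cache_key, json.dumps(result, ensure_ascii=False), ex=7 * 24 * 3600)  # assumed 7-day TTL
return result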
```

---

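## 📋 Implementation Plan

### Phase 1: Urgent Fixes (1-2 days)
- [ ] Add a token estimation function (`estimate_tokens`)
- [ ] Implement automatic model selection
- [ ] Validate screenplay length before sending the request
- [ ] Update error messages (recommend a suitable model to the user)

### Phase 2: Feature Enhancements (3-5 days)
- [ ] Implement chunked parsing (long-screenplay support)
- [ ] Slim down the system prompt (to 1,500 tokens)
- [ ] Add token usage statistics (monitoring dashboard)

### Phase 3: Performance Optimization (1-2 weeks)
- [ ] Integrate LLMLingua compression
- [ ] Implement result caching
- [ ] Add A/B tests (to validate the optimizations)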

---

## 🔬 Testing Recommendations

### Unit Tests
```python
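def test_token_estimation():
    """Token estimate should be within ±20% of the expected count"""
    content = "测" * 100  # exactly 100 Chinese characters
    tokens = estimate_tokens(content)
    assert 160 <= tokens <= 240  # 200 ± 20%

def test_model_selection():
    """Automatic model selection"""
    # Very short screenplay → GPT-4 (scenario 1: even 2,000 tokens overflows 8K with a 4,000-token output)
    model = select_model(content_tokens=400)
    assert model == 'gpt-4'

    # Medium screenplay → GPT-4-32k
    model = select_model(content_tokens=10000)
    assert model == 'gpt-4-32k'

    # Long screenplay → GPT-4-Turbo
    model = select_model(content_tokens=50000)
    assert model == 'gpt-4-turbo'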
```

### Stress Tests
```bash
# Test screenplays of different lengths
pytest tests/stress/test_screenplay_parsing.py \
  --screenplay-lengths=1000,5000,10000,30000 \
  --models=gpt-4,gpt-4-32k,gpt-4-turbo
```
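
`--screenplay-lengths` and `--models` are not built-in pytest flags, so the command above only works if a `conftest.py` registers them. A sketch using pytest's `pytest_addoption` and `pytest_generate_tests` hooks; the argument names `screenplay_length` and `model` that the stress tests would accept are assumptions.

```python
# tests/stress/conftest.py (sketch): register the custom command-line flags
def pytest_addoption(parser):
    parser.addoption("--screenplay-lengths", default="1000,5000,10000,30000",
                     help="comma-separated screenplay lengths in characters")
    parser.addoption("--models", default="gpt-4,gpt-4-32k,gpt-4-turbo",
                     help="comma-separated model names")


def pytest_generate_tests(metafunc):
    # Parametrize any test that takes screenplay_length and/or model arguments
    if "screenplay_length" in metafunc.fixturenames:
        lengths = metafunc.config.getoption("--screenplay-lengths").split(",")
        metafunc.parametrize("screenplay_length", [int(x) for x in lengths])
    if "model" in metafunc.fixturenames:
        models = metafunc.config.getoption("--models").split(",")
        metafunc.parametrize("model", models)
```

A test such as `def test_parse(screenplay_length, model): ...` then runs once per length and model combination.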

---

## 📚 References

- [OpenAI Tokenizer](https://platform.openai.com/tokenizer)
- [GPT-4 Model Limits](https://platform.openai.com/docs/models/gpt-4)
- [LLMLingua: Prompt Compression](https://github.com/microsoft/LLMLingua)
- [Token Estimation Best Practices](https://cookbook.openai.com/examples/how_to_count_tokens_with_tiktoken)

---

**Maintainer**: AI Agent  
**Last updated**: 2026-02-07  
**Status**: ⚠️ Pending optimization