Token Risk Analysis Report - Screenplay Parsing AI Skill

Date: 2026-02-07
Subject: screenplay_parsing AI Skill
Conclusion: ⚠️ Moderate risk; optimization required

📊 Current Token Consumption Analysis

1. System Prompt (fixed portion)

Source: server/app/resources/ai_skills/screenplay_parsing.md
File statistics:

- Total word count: 415
- Chinese characters: 916
- English words: 236

Token estimate (rough):

Chinese characters * 2 + English words * 1.3 = 916 * 2 + 236 * 1.3 ≈ 2,139 tokens
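The rule of thumb above can be turned into a small estimator. This is a minimal sketch, not a tokenizer: it assumes Chinese characters fall in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) and that an "English word" is a run of Latin letters; punctuation, digits, and whitespace are ignored. The name `estimate_tokens` matches the helper planned in Phase 1, but this body is a hypothetical illustration.

```python
import re

# Hypothetical implementation of the Phase 1 helper: heuristic only, not a real tokenizer.
# Basic CJK Unified Ideographs block; CJK punctuation, digits and other scripts are not counted.
_CJK_CHAR = re.compile(r"[\u4e00-\u9fff]")
# A run of Latin letters, allowing inner apostrophes or hyphens ("don't", "well-known")
_EN_WORD = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")

TOKENS_PER_CJK_CHAR = 2.0  # rule of thumb from the report
TOKENS_PER_EN_WORD = 1.3   # rule of thumb from the report


def estimate_tokens(text: str) -> int:
    """Rough estimate: CJK characters * 2 + English words * 1.3, rounded to an int."""
    cjk_chars = len(_CJK_CHAR.findall(text))
    en_words = len(_EN_WORD.findall(text))
    return round(cjk_chars * TOKENS_PER_CJK_CHAR + en_words * TOKENS_PER_EN_WORD)
```

Feeding it 916 ideographs plus 236 Latin words reproduces the ≈ 2,139 figure. Exact counts still require the model's own tokenizer (for example tiktoken), which is why the measured value below differs.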

Actual token consumption (measured with the GPT-4 tokenizer):

System Prompt: ~2,500 tokens (including the JSON format example)

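2. Dynamically injected content

| Component | Max token estimate | Notes |
|---|---|---|
| `custom_requirements` | 500 characters → 1,000 tokens | User's custom requirements (at most 500 characters) |
| `storyboard_count` instructions | 200 tokens | Instructions for storyboard generation |
| Dynamic total | ~1,200 tokens | |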

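3. User input (screenplay content)

| Screenplay size | Length (characters) | Estimated tokens | Risk level |
|---|---|---|---|
| Short | 1,000 | 2,000 | ✅ Safe |
| Medium | 5,000 | 10,000 | ⚠️ Moderate |
| Long | 10,000 | 20,000 | ❌ High |
| Feature film | 30,000+ | 60,000+ | 🚨 Certain to overflow |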

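4. AI output (response)

| Component | Estimated tokens | Notes |
|---|---|---|
| JSON structure | 500 | Base JSON skeleton |
| Characters / scenes / props | 1,000-3,000 | Depends on screenplay complexity |
| Storyboard shots (10) | 2,000-4,000 | About 200-400 tokens per shot |
| Output total | 3,500-7,500 | |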

🔥 Risk Assessment

Total token consumption (single parse)

```text
System Prompt:     2,500 tokens
Dynamic injection: 1,200 tokens
Screenplay:        2,000 - 60,000 tokens (variable)
AI output:         3,500 - 7,500 tokens
-------------------------------------------
Total:             9,200 - 71,200 tokens
```
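The totals are the sums of each component's lower and upper bounds. A small sketch that recomputes them from the breakdown; the dictionary keys are illustrative names, not identifiers from the codebase.

```python
# Illustrative: (min, max) token ranges copied from the breakdown above
BUDGET = {
    "system_prompt": (2_500, 2_500),
    "dynamic_injection": (1_200, 1_200),
    "screenplay": (2_000, 60_000),
    "ai_output": (3_500, 7_500),
}


def total_range(budget: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the lower bounds and the upper bounds of every component separately."""
    low = sum(lo for lo, _ in budget.values())
    high = sum(hi for _, hi in budget.values())
    return low, high


print(total_range(BUDGET))  # (9200, 71200)
```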

GPT-4 model limits

| Model | Context window | Max output | Safe input threshold (80% of context) |
|---|---|---|---|
| GPT-4 | 8,192 tokens | 4,096 tokens | 6,553 tokens |
| GPT-4-32k | 32,768 tokens | 4,096 tokens | 26,214 tokens |
| GPT-4-Turbo | 128,000 tokens | 4,096 tokens | 102,400 tokens |
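The safe-threshold column is 80% of each context window, rounded down. A sketch that derives it with integer arithmetic, so no floating-point rounding is involved; the `CONTEXT_WINDOWS` mapping and the function name are illustrative.

```python
# Illustrative mapping; window sizes are taken from the table above
CONTEXT_WINDOWS = {"gpt-4": 8_192, "gpt-4-32k": 32_768, "gpt-4-turbo": 128_000}


def safe_input_threshold(context_window: int, percent: int = 80) -> int:
    """Return `percent`% of the context window, rounded down, using integer math only."""
    return context_window * percent // 100


for model, window in CONTEXT_WINDOWS.items():
    print(model, safe_input_threshold(window))
```

On GPT-4 the remaining 20% is 8,192 - 6,553 = 1,639 tokens, which is less than the 4,000-token `max_tokens` default, so an input at the threshold cannot also reserve the default output.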

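Current configuration

Check: server/app/schemas/screenplay.py

```python
from typing import Optional

from pydantic import Field

# Excerpt: the max_tokens field of the screenplay parsing request schema
max_tokens: Optional[int] = Field(
    4000,  # ✅ within GPT-4's 4,096 max output limit
    alias="maxTokens",
    ge=100,
    le=8000  # ⚠️ but 8000 exceeds GPT-4's 4,096 limit
)
```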

⚠️ Risk Scenarios

Scenario 1: Short screenplay (1,000 characters) + GPT-4

Input:  2,500 + 1,200 + 2,000 = 5,700 tokens
Output: 4,000 tokens (max_tokens setting)
Total:  9,700 tokens

Result: ❌ Exceeds the GPT-4 8K limit (input 5,700 + output 4,000 > 8,192)

Scenario 2: Medium screenplay (5,000 characters) + GPT-4

Input:  2,500 + 1,200 + 10,000 = 13,700 tokens

Result: 🚨 Fails at the request stage (the input alone exceeds the 8,192-token context window)

Scenario 3: Long screenplay (10,000 characters) + GPT-4-32k

Input:  2,500 + 1,200 + 20,000 = 23,700 tokens
Output: 4,000 tokens
Total:  27,700 tokens

Result: ✅ Works, but close to the 32K limit (27,700 / 32,768 ≈ 85%)

Scenario 4: Feature-film screenplay (30,000 characters) + GPT-4-Turbo

Input:  2,500 + 1,200 + 60,000 = 63,700 tokens
Output: 4,000 tokens
Total:  67,700 tokens

Result: ✅ Safe (well within the 128K limit)
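The four verdicts follow from two comparisons: whether the input alone fits the context window, and whether input plus the requested output does. A sketch that replays the scenarios; the `Scenario` type and the verdict labels are illustrative, and Scenario 2 is assumed to request the same 4,000-token output as the others.

```python
from dataclasses import dataclass

# Illustrative replay of the four scenarios; window sizes from the model limits table
CONTEXT_WINDOWS = {"gpt-4": 8_192, "gpt-4-32k": 32_768, "gpt-4-turbo": 128_000}
PROMPT_OVERHEAD = 2_500 + 1_200  # system prompt + dynamic injection


@dataclass
class Scenario:
    name: str
    model: str
    screenplay_tokens: int
    output_tokens: int = 4_000  # the max_tokens setting

    @property
    def input_tokens(self) -> int:
        return PROMPT_OVERHEAD + self.screenplay_tokens


def verdict(s: Scenario) -> str:
    window = CONTEXT_WINDOWS[s.model]
    if s.input_tokens > window:
        return "fails at request"  # the prompt alone does not fit
    if s.input_tokens + s.output_tokens > window:
        return "exceeds context"  # the prompt fits, but prompt + max_tokens does not
    return "ok"


SCENARIOS = [
    Scenario("1: short + GPT-4", "gpt-4", 2_000),
    Scenario("2: medium + GPT-4", "gpt-4", 10_000),
    Scenario("3: long + GPT-4-32k", "gpt-4-32k", 20_000),
    Scenario("4: feature film + GPT-4-Turbo", "gpt-4-turbo", 60_000),
]

for s in SCENARIOS:
    print(f"Scenario {s.name}: input={s.input_tokens}, {verdict(s)}")
```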

🎯 Optimization Recommendations

Priority P0 (do immediately)

1. Automatic model selection

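```python
# server/app/services/ai_service.py
import logging

logger = logging.getLogger(__name__)

# Fixed prompt overhead from the analysis above: system prompt (~2,500) + dynamic injection (~1,200)
PROMPT_OVERHEAD_TOKENS = 2_500 + 1_200

# 80% safe input thresholds from the model limits table, smallest model first
SAFE_INPUT_TOKENS = [
    ('gpt-4', 6_553),          # 8K context
    ('gpt-4-32k', 26_214),     # 32K context
    ('gpt-4-turbo', 102_400),  # 128K context
]


async def parse_screenplay(self, screenplay_content: str, **kwargs):  # other parameters elided
    # Estimate the screenplay's token count (estimate_tokens is planned in Phase 1)
    content_tokens = estimate_tokens(screenplay_content)
    # Compare the whole input, not just the screenplay, against each model's threshold
    input_tokens = PROMPT_OVERHEAD_TOKENS + content_tokens

    # Smallest model that fits; otherwise the largest, and the length guard below decides
    model = next((name for name, limit in SAFE_INPUT_TOKENS if input_tokens <= limit), 'gpt-4-turbo')

    logger.info(f"Auto-selected model: {model}, estimated tokens: {content_tokens}")
```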

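2. Dynamically adjust max_tokens

```python
# Cap the output by whatever the context window leaves after the input
total_input = system_prompt_tokens + content_tokens + dynamic_tokens
max_output = min(
    4000,  # desired output (user's expectation)
    context_window - total_input - 500  # context_window: selected model's context size; 500-token buffer
)
if max_output <= 0:
    raise ValueError("Input leaves no room for output; use a larger model or split the screenplay")
```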
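The clamp above can be packaged as a pure function that also rejects inputs leaving no room for output. A sketch under the same assumptions (4,000-token desired output, 500-token buffer); the function name `dynamic_max_output` is hypothetical.

```python
# Hypothetical helper: output budget that still fits inside the context window
def dynamic_max_output(
    context_window: int,
    input_tokens: int,
    desired_output: int = 4_000,
    buffer: int = 500,
) -> int:
    """Return min(desired_output, context_window - input_tokens - buffer)."""
    available = context_window - input_tokens - buffer
    if available <= 0:
        raise ValueError(
            f"{input_tokens} input tokens leave no room for output in a "
            f"{context_window}-token context window"
        )
    return min(desired_output, available)
```

For Scenario 1 (5,700 input tokens on GPT-4) this returns 1,992; for Scenario 3 (23,700 on GPT-4-32k) it keeps the full 4,000; for Scenario 2 (13,700 on GPT-4) it raises instead of sending a request that cannot succeed.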

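3. Screenplay length guard (truncation protection)

```python
MAX_SCREENPLAY_TOKENS = {
    'gpt-4': 4000,      # 8,192 - 4,000 leaves ~4.2K for the ~3.7K prompt overhead plus output
    'gpt-4-32k': 25000,
    'gpt-4-turbo': 100000
}

if content_tokens > MAX_SCREENPLAY_TOKENS[model]:
    # Option A: reject the request and suggest a larger model
    # raise ValueError(f"Screenplay too long; please use {recommend_model}")

    # Option B (recommended): split automatically and parse chunk by chunk
    chunks = split_screenplay(screenplay_content, max_tokens=MAX_SCREENPLAY_TOKENS[model])
    results = await parse_screenplay_chunks(chunks)
```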
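The report calls `split_screenplay` without defining it. One possible shape, assuming paragraphs are separated by blank lines and that a token counter can be passed in (for example the `estimate_tokens` heuristic); the default counter is a hypothetical stand-in that applies the report's 2-tokens-per-character rule to every character. Whole paragraphs are packed greedily, so no paragraph is cut mid-sentence.

```python
import re
from typing import Callable


def _two_tokens_per_char(text: str) -> int:
    # Hypothetical default: the report's rule of thumb for Chinese text, applied to every character
    return 2 * len(text)


def split_screenplay(
    content: str,
    max_tokens: int,
    count_tokens: Callable[[str], int] = _two_tokens_per_char,
) -> list[str]:
    """Greedily pack blank-line-separated paragraphs into chunks of at most max_tokens.

    Tokens for the separators between paragraphs are not counted.
    """
    paragraphs = [p for p in re.split(r"\n\s*\n", content) if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    current_tokens = 0
    for para in paragraphs:
        para_tokens = count_tokens(para)
        if para_tokens > max_tokens:
            raise ValueError("a single paragraph exceeds max_tokens; split it further first")
        # Close the current chunk if this paragraph would push it over budget
        if current and current_tokens + para_tokens > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_tokens = [], 0
        current.append(para)
        current_tokens += para_tokens
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```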

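Priority P1 (short term)

4. Slim down the System Prompt

Current: 2,500 tokens
Target: 1,500 tokens (a 40% reduction)

Optimization points:

- Remove the redundant JSON format example (the code comments already contain it)
- Simplify repeated field descriptions
- Replace lists with tables (more compact)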

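5. Chunked parsing strategy

```python
import asyncio

from celery import group


# For long screenplays, split by scene and parse the scenes in parallel
async def parse_long_screenplay(content: str, model: str):
    scenes = split_by_scenes(content)  # split on scene boundaries

    # parse_screenplay_task.delay() returns a Celery AsyncResult, which is not awaitable;
    # dispatch all scenes as one group instead
    job = group(parse_screenplay_task.s(screenplay_content=scene, model=model) for scene in scenes)
    group_result = job.apply_async()

    # GroupResult.get() blocks and returns results in scene order, so wait in a worker thread
    results = await asyncio.to_thread(group_result.get)

    # Merge the per-scene results
    return merge_screenplay_results(results)
```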
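`merge_screenplay_results` is also left undefined. A minimal sketch, assuming each per-scene result is a dict with `characters`, `scenes`, and `props` lists whose items carry a `name`, plus a `storyboards` list; these key names are assumptions, not the skill's confirmed output schema. Named entities are de-duplicated (first occurrence wins) and storyboard shots are concatenated in scene order.

```python
from typing import Any

# Hypothetical: the result keys below are assumed, not the skill's confirmed schema
ENTITY_KEYS = ("characters", "scenes", "props")
SHOT_KEY = "storyboards"


def merge_screenplay_results(results: list[dict[str, Any]]) -> dict[str, Any]:
    """Merge per-scene results: de-duplicate named entities, concatenate shots in order."""
    merged: dict[str, Any] = {key: [] for key in ENTITY_KEYS}
    merged[SHOT_KEY] = []
    seen: dict[str, set] = {key: set() for key in ENTITY_KEYS}
    for result in results:
        for key in ENTITY_KEYS:
            for item in result.get(key, []):
                name = item.get("name")
                if name is None:
                    merged[key].append(item)  # nothing to de-duplicate on; keep it
                elif name not in seen[key]:
                    seen[key].add(name)
                    merged[key].append(item)
        merged[SHOT_KEY].extend(result.get(SHOT_KEY, []))
    return merged
```

Keeping the first occurrence is the simplest policy; a real merge might instead combine descriptions of the same character across scenes.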

Priority P2 (long term)

6. Prompt compression

- Compress the system prompt with LLMLingua (estimated savings of about 50% of tokens)
- Use few-shot examples instead of detailed instructions

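7. Caching

```python
# For an identical screenplay, return the cached parse result
cache_key = f"screenplay:{screenplay_id}:{model}:{version_hash}"
# redis-py returns bytes unless the client was created with decode_responses=True
if cached := await redis.get(cache_key):
    return cached
```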
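The snippet leaves `version_hash` undefined. One option, assumed rather than taken from the codebase, is a SHA-256 digest over every input that changes the output: the screenplay text, a prompt version string, and the custom requirements. Length-prefixing each part keeps different splits of the same bytes from colliding. `make_cache_key` and `prompt_version` are hypothetical names.

```python
import hashlib


# Hypothetical helper; which inputs go into the hash is an assumption
def make_cache_key(
    screenplay_id: str,
    model: str,
    screenplay_content: str,
    prompt_version: str,
    custom_requirements: str = "",
) -> str:
    digest = hashlib.sha256()
    for part in (screenplay_content, prompt_version, custom_requirements):
        data = part.encode("utf-8")
        # Length-prefix each part so ("ab", "c") and ("a", "bc") hash differently
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    version_hash = digest.hexdigest()[:16]  # 16 hex chars (64 bits) keeps keys short
    return f"screenplay:{screenplay_id}:{model}:{version_hash}"
```

Bumping `prompt_version` whenever the system prompt is trimmed (P1) or compressed (P2) invalidates stale results without a manual cache flush.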

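📋 Implementation Plan

Phase 1: Urgent fixes (1-2 days)

- Add a token estimation function (estimate_tokens)
- Implement automatic model selection
- Validate screenplay length before sending the request
- Update error messages to tell users which model to use

Phase 2: Feature enhancements (3-5 days)

- Implement chunked parsing (long screenplay support)
- Optimize the system prompt (trim to 1,500 tokens)
- Add token usage statistics (monitoring dashboard)

Phase 3: Performance optimization (1-2 weeks)

- Integrate LLMLingua compression
- Implement result caching
- Add A/B tests to verify the effect of the optimizations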

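🔬 Testing Recommendations

Unit tests

```python
# Assumed import path; estimate_tokens and select_model are the helpers planned in Phase 1
from app.services.ai_service import estimate_tokens, select_model


def test_token_estimation():
    """Token estimate should be within 20% of 2 tokens per Chinese character."""
    content = "剧" * 100  # exactly 100 Chinese characters
    tokens = estimate_tokens(content)
    assert 160 <= tokens <= 240  # 200 ± 20%


def test_model_selection():
    """Automatic model selection."""
    # Short screenplay: 2,500 + 1,200 + 1,000 = 4,700 input tokens, within GPT-4's 6,553 threshold
    model = select_model(content_tokens=1000)
    assert model == 'gpt-4'

    # Long screenplay → GPT-4-Turbo
    model = select_model(content_tokens=50000)
    assert model == 'gpt-4-turbo'
```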

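Stress tests

```bash
# Test screenplays of different lengths.
# --screenplay-lengths and --models are custom options, not built into pytest;
# they must be registered with a pytest_addoption hook in a conftest.py.
pytest tests/stress/test_screenplay_parsing.py \
  --screenplay-lengths=1000,5000,10000,30000 \
  --models=gpt-4,gpt-4-32k,gpt-4-turbo
```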

Maintainer: AI Agent
Last updated: 2026-02-07
Status: ⚠️ Pending optimization