# Testing Screenplay Parsing Status Transitions

This document explains how to verify that the status transitions correctly while a screenplay file is being parsed.

---

## 🎯 Goal

Verify that after a screenplay file is uploaded, the status moves through the expected states:

```
PENDING → PARSING → COMPLETED/FAILED
             ↑
   ⚠️ Key point: this state must be observable
```

---

## 🔧 What Was Fixed

### Problem

In the previous implementation, the `PARSING` status update was **not committed immediately**. As a result:

- External queries could not see the intermediate state because of transaction isolation
- The status appeared to jump straight from `PENDING` to `COMPLETED`
- The frontend could not show the "parsing in progress" indicator

### Solution

In `ScreenplayFileParserService.parse_file()`, **commit immediately** after setting the status to `PARSING`:

```python
# 0. Set the status to "parsing" (RFC 140)
await self.repository.update(screenplay_id, {
    'parsing_status': ParsingStatus.PARSING
})
await self.db.commit()  # ✅ 立即提交: commit immediately so the status is visible to other sessions
logger.debug("状态已更新为 PARSING 并已提交 | 剧本ID: %s", screenplay_id)
```

The Chinese comment marker and log message are kept verbatim because the log check and the troubleshooting `grep` in this document match on them.

---

## 📋 Prerequisites

### 1. Make sure the containers are running

```bash
cd /Users/panta/py_work/jointoai_work/Jointoai/server
docker compose ps
```

Expected output:

```
NAME                      STATUS
jointo-server-app         Up
jointo-server-celery-ai   Up
jointo-server-postgres    Up (healthy)
jointo-server-redis       Up (healthy)
jointo-server-rabbitmq    Up (healthy)
```

### 2. Restart the app and Celery

After applying the fix, restart the containers:

```bash
docker compose restart app celery-worker-ai
```

### 3. Prepare test files

Prepare a screenplay file for testing (PDF or DOCX), for example:

- `test.pdf` - tests PDF parsing
- `test.docx` - tests DOCX parsing

---

## 🧪 Test Methods

### Method 1: Shell script (recommended)

#### Step 1: Get an auth token

```bash
# Log in and extract the access token
TOKEN=$(curl -s -X POST "http://localhost:6170/api/v1/auth/login" \
  -H "Content-Type: application/json" \
  -d '{
    "email": "your-email@example.com",
    "password": "your-password"
  }' | python3 -c "import sys, json; print(json.load(sys.stdin)['data']['accessToken'])")

echo "Token: $TOKEN"
```

#### Step 2: Create or look up a project ID

```bash
# List projects
curl -s "http://localhost:6170/api/v1/projects" \
  -H "Authorization: Bearer $TOKEN" | python3 -m json.tool

# Record the project ID
PROJECT_ID="your-project-id"
```

#### Step 3: Upload a file and poll quickly

```bash
# Upload the file
RESPONSE=$(curl -s -X POST "http://localhost:6170/api/v1/screenplays/file" \
  -H "Authorization: Bearer $TOKEN" \
  -F "file=@test.pdf" \
  -F "name=test-screenplay-$(date +%H%M%S)" \
  -F "projectId=$PROJECT_ID")

# Extract the screenplay ID (quote the variable so the JSON is passed through unchanged)
SCREENPLAY_ID=$(echo "$RESPONSE" | python3 -c "import sys, json; print(json.load(sys.stdin)['data']['screenplayId'])")
echo "Screenplay ID: $SCREENPLAY_ID"

# Poll the status every 0.5 s, up to 60 times (about 30 s plus request time)
echo "Polling status..."
for i in {1..60}; do
  STATUS_RESPONSE=$(curl -s "http://localhost:6170/api/v1/screenplays/${SCREENPLAY_ID}/parse-status" \
    -H "Authorization: Bearer $TOKEN")

  STATUS=$(echo "$STATUS_RESPONSE" | python3 -c "import sys, json; print(json.load(sys.stdin)['data']['parsingStatus'])")
  MESSAGE=$(echo "$STATUS_RESPONSE" | python3 -c "import sys, json; print(json.load(sys.stdin)['data'].get('message', ''))")

  echo "[${i}] ${STATUS} - ${MESSAGE}"

  # Seeing PARSING at least once means the intermediate state is visible
  if [ "$STATUS" = "parsing" ]; then
    echo "✅ Captured the PARSING state!"
  fi

  # Stop once a terminal state is reached
  if [ "$STATUS" = "completed" ] || [ "$STATUS" = "failed" ]; then
    echo "Final status: ${STATUS}"
    break
  fi

  sleep 0.5
done
```

---

### Method 2: Python script

Create `test_parsing.py`:

```python
import time
from datetime import datetime

import httpx

API_BASE = "http://localhost:6170/api/v1"
TOKEN = "your-token-here"        # replace with a real token
PROJECT_ID = "your-project-id"   # replace with a real project ID

headers = {"Authorization": f"Bearer {TOKEN}"}


def upload_file(file_path: str):
    """Upload a screenplay file."""
    with open(file_path, "rb") as f:
        files = {"file": (file_path, f, "application/pdf")}
        data = {
            "name": f"test-screenplay-{datetime.now().strftime('%H%M%S')}",
            "projectId": PROJECT_ID,
        }
        response = httpx.post(
            f"{API_BASE}/screenplays/file",
            headers=headers,
            files=files,
            data=data,
            timeout=30.0,
        )
    return response.json()


def poll_status(screenplay_id: str, max_attempts: int = 60):
    """Poll the parsing status and report whether PARSING was observed."""
    print(f"\nPolling status (screenplay_id: {screenplay_id})...\n")

    parsing_seen = False
    last_status = None
    start_time = time.time()

    for _ in range(max_attempts):
        elapsed = time.time() - start_time

        response = httpx.get(
            f"{API_BASE}/screenplays/{screenplay_id}/parse-status",
            headers=headers,
            timeout=10.0,
        )
        data = response.json()['data']
        status = data['parsingStatus']
        message = data.get('message', '')

        # Print only when the status changes
        if status != last_status:
            print(f"[{elapsed:.1f}s] {status.upper()} - {message}")

            if status == 'parsing':
                print("  ✅ Captured the PARSING state!")
                parsing_seen = True
            elif status == 'completed':
                word_count = data.get('wordCount', 0)
                print(f"  Word count: {word_count}")
                break
            elif status == 'failed':
                error = data.get('parsingError', '')
                print(f"  Error: {error}")
                break

            last_status = status

        time.sleep(0.5)

    print("\n" + "=" * 50)
    if parsing_seen:
        print("✅ Test passed: the PARSING state was captured!")
    else:
        print("❌ Test failed: the PARSING state was not captured")
        print("Possible causes:")
        print("  1. Parsing finished too quickly (< 0.5s)")
        print("  2. The status update was not committed immediately")
        print("  3. The Celery worker was not restarted")
    print("=" * 50)


if __name__ == "__main__":
    # 1. Upload the file
    print("Uploading test file...")
    result = upload_file("test.pdf")
    screenplay_id = result['data']['screenplayId']
    print(f"✅ Upload succeeded, screenplay ID: {screenplay_id}")

    # 2. Start polling right away
    poll_status(screenplay_id)
```

Run it:

```bash
python3 test_parsing.py
```

---

### Method 3: Manual test (via the frontend)

1. **Open the frontend app**: `http://localhost:6160`
2. **Log in**
3. **Open a project**: pick any project
4. **Upload a screenplay file**: click "Upload screenplay" → choose a file → upload
5. **Watch the status change**:
   - Initially: ⏳ **Waiting to parse** ("等待解析", `pending`)
   - After 1-2 seconds: 🔄 **Parsing in progress...** ("正在解析中...", `parsing`) ← ⚠️ key state
   - After 10-30 seconds: ✅ **Parsing complete** ("解析完成", `completed`)

---

## 📊 Expected Results

### Successful status transition

```
Time   | Status    | Frontend display
-------|-----------|--------------------------------------
T+0s   | PENDING   | "Waiting to parse"
T+1s   | PARSING   | "Parsing in progress..." ✅
T+2s   | PARSING   | "Parsing in progress..." ✅
T+5s   | PARSING   | "Parsing in progress..." ✅
T+10s  | COMPLETED | "Parsing complete (8524 characters)" ✅
```

### Failed status transition (before the fix)

```
Time   | Status    | Frontend display
-------|-----------|--------------------------------------
T+0s   | PENDING   | "Waiting to parse"
T+10s  | COMPLETED | "Parsing complete (8524 characters)" ❌ PARSING skipped
```

---

## 🔍 Verification Checkpoints

### 1. Celery log check

```bash
docker compose logs --tail=50 celery-worker-ai | grep -E "(PARSING|状态)"
```

Expected output:

```
[INFO] 状态已更新为 PARSING 并已提交 | 剧本ID: 019c325e-...
[INFO] 更新剧本: screenplay_id=019c325e-..., 更新字段=['parsing_status']
[INFO] 剧本文件解析成功 | 剧本ID: 019c325e-... | 字数: 27221
```

In order, these lines report that the status was set to PARSING and committed, that the `parsing_status` field of the screenplay was updated, and that the file was parsed successfully (with the resulting character count).

### 2. Database state check

Query the database while parsing is in progress:

```sql
SELECT screenplay_id, name, parsing_status, word_count, created_at
FROM screenplays
WHERE screenplay_id = '019c325e-ef62-7e72-a4b3-c5bc5577170b';
```

You should see the intermediate state `parsing_status = 2` (`PARSING`).

### 3. API response check

```bash
curl -s "http://localhost:6170/api/v1/screenplays/${SCREENPLAY_ID}/parse-status" \
  -H "Authorization: Bearer $TOKEN" | python3 -m json.tool
```

Expected response (while parsing):

```json
{
  "data": {
    "screenplayId": "019c325e-ef62-7e72-a4b3-c5bc5577170b",
    "parsingStatus": "parsing",
    "message": "正在解析中...",
    "progress": null
  }
}
```

The `message` value is the literal string returned by the API and means "Parsing in progress...".

---

## 🐛 Troubleshooting

### Problem 1: The PARSING state still does not show up

**Possible causes**:

1. The Celery worker was not restarted
2. The code change has not taken effect
3. Parsing finishes too quickly (< 500ms) for the 0.5 s polling interval to catch it

**Solutions**:

```bash
# 1. Restart the containers
docker compose restart app celery-worker-ai

# 2. Check the deployed code (matches the "立即提交" comment next to the commit call)
docker compose exec app cat /app/app/services/screenplay_file_parser_service.py | grep -A 2 "立即提交"

# 3. Use a larger test file (> 5MB) to lengthen the parsing time
```

### Problem 2: The Celery task does not run

**Check**:

```bash
# View the Celery logs
docker compose logs celery-worker-ai --tail=100

# Inspect the RabbitMQ queues
docker compose exec rabbitmq rabbitmqctl list_queues
```

### Problem 3: Database connection errors

**Check**:

```bash
# Check that the database module imports inside the app container
docker compose exec app python -c "from app.core.database import get_async_session; print('DB OK')"
```

Note that this command only confirms that the database module can be imported; it does not open a connection to the database.

---

## 📝 Related Documents

- [RFC 140: Screenplay file storage refactor](../rfcs/140-screenplay-file-storage-refactor.md)
- [Changelog: Fix parsing status update](../changelogs/2026-02-06-fix-parsing-status-update.md)
- [ParsingStatus enum definition](../../app/models/screenplay.py)

---

## ✨ Summary

**Committing immediately** after the status update ensures that:

- ✅ External queries can see the intermediate `PARSING` state
- ✅ The frontend correctly shows the "parsing in progress" indicator
- ✅ The user experience improves, because users no longer assume the system is stuck
- ✅ The code follows a common engineering practice: update the status before starting a long-running operation
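
The visibility effect behind this fix can be reproduced in isolation. The following is a minimal sketch, not the service's code: it uses Python's built-in `sqlite3` with two connections as a stand-in for the worker session and the status API session, and the one-row `screenplays` table, its `id` column, and the string status values are assumptions made purely for the demo. It shows that a status update sitting in an open transaction is invisible to a second connection until the writer commits.

```python
import os
import sqlite3
import tempfile


def read_status(conn: sqlite3.Connection) -> str:
    """Read the status through the given connection."""
    # fetchall() drains the cursor, so the reader holds no lock afterwards
    rows = conn.execute(
        "SELECT parsing_status FROM screenplays WHERE id = 1"
    ).fetchall()
    return rows[0][0]


def demo() -> tuple[str, str]:
    """Return the status seen by a second connection before and after commit."""
    # Hypothetical single-row table standing in for the real schema
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    writer = sqlite3.connect(path)  # plays the role of the Celery worker
    reader = sqlite3.connect(path)  # plays the role of the status poller

    writer.execute(
        "CREATE TABLE screenplays (id INTEGER PRIMARY KEY, parsing_status TEXT)"
    )
    writer.execute("INSERT INTO screenplays VALUES (1, 'pending')")
    writer.commit()

    # The UPDATE opens a transaction; without a commit the reader
    # still sees the old value
    writer.execute(
        "UPDATE screenplays SET parsing_status = 'parsing' WHERE id = 1"
    )
    before_commit = read_status(reader)

    # Once the writer commits, the intermediate state becomes observable
    writer.commit()
    after_commit = read_status(reader)

    writer.close()
    reader.close()
    return before_commit, after_commit


if __name__ == "__main__":
    before, after = demo()
    print(f"before commit: {before}")
    print(f"after commit:  {after}")
```

Run as a script, this should report `pending` before the commit and `parsing` after it, which mirrors the difference between the pre-fix and post-fix behavior described in this document. The production database and session layer differ from SQLite, so treat this only as an illustration of why the early commit matters.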