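Testing Screenplay Parsing Status Transitions
This document explains how to verify that the status transitions during screenplay file parsing are correct.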
🎯 Goal
Verify that after a screenplay file is uploaded, the status transitions correctly:
PENDING → PARSING → COMPLETED/FAILED
             ↑
⚠️ Key: this status must be observable
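The flow above can be expressed as a small transition table, which makes it easy to check a recorded status sequence automatically. This is a minimal sketch: `ALLOWED_NEXT` and `is_valid_flow` are hypothetical names, and the lowercase status strings are assumed to match the values returned by the parse-status API.

```python
# Hypothetical helper: checks an observed status sequence against
# PENDING -> PARSING -> COMPLETED/FAILED. Names are illustrative only.
ALLOWED_NEXT = {
    "pending": {"parsing"},
    "parsing": {"completed", "failed"},
    "completed": set(),
    "failed": set(),
}


def is_valid_flow(observed):
    """Return True if the sequence follows the expected flow end to end."""
    # Collapse consecutive repeats, since polling reports each status many times.
    steps = [s for i, s in enumerate(observed) if i == 0 or s != observed[i - 1]]
    if not steps or steps[0] != "pending":
        return False
    for current, nxt in zip(steps, steps[1:]):
        if nxt not in ALLOWED_NEXT.get(current, set()):
            return False
    # The flow is only complete once a terminal status has been reached.
    return steps[-1] in {"completed", "failed"}
```

This version is deliberately strict: a sequence that jumps from `pending` straight to `completed`, or one whose first observation is already `parsing`, is rejected.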
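🔧 What Was Fixed
Problem
In the previous implementation, the PARSING status update was not committed immediately, which meant:
- External queries could not see the intermediate status because of transaction isolation
- The status jumped directly from PENDING to COMPLETED
- The frontend could not show the "parsing in progress" hint
Solution
In ScreenplayFileParserService.parse_file(), commit immediately after updating the status to PARSING: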
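# 0. Set the status to "parsing" (RFC 140)
await self.repository.update(screenplay_id, {
    'parsing_status': ParsingStatus.PARSING
})
# Commit immediately so the status becomes visible to external queries.
# The original inline comment is kept verbatim because a later check greps for it.
await self.db.commit()  # ✅ 立即提交,使状态对外部可见
# Log message, kept verbatim: "Status updated to PARSING and committed | screenplay ID: %s"
logger.debug("状态已更新为 PARSING 并已提交 | 剧本ID: %s", screenplay_id)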
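The visibility problem described above can be reproduced in isolation. The sketch below uses the standard-library `sqlite3` module as a stand-in for PostgreSQL, with a simplified, hypothetical `screenplays` table; the two databases differ in locking details, but in both an uncommitted update is invisible to a second connection. `read_status` and `demo_commit_visibility` are illustrative names, not part of the project.

```python
import os
import sqlite3
import tempfile


def read_status(conn):
    # fetchall() exhausts the cursor, so the reader holds no lock afterwards.
    rows = conn.execute(
        "SELECT parsing_status FROM screenplays WHERE id = 1"
    ).fetchall()
    return rows[0][0]


def demo_commit_visibility():
    """Return what a second connection sees before and after the writer commits."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        writer = sqlite3.connect(path)
        reader = sqlite3.connect(path)
        try:
            writer.execute(
                "CREATE TABLE screenplays (id INTEGER PRIMARY KEY, parsing_status TEXT)"
            )
            writer.execute("INSERT INTO screenplays VALUES (1, 'pending')")
            writer.commit()
            # Update without committing: the change exists only inside
            # the writer's open transaction.
            writer.execute(
                "UPDATE screenplays SET parsing_status = 'parsing' WHERE id = 1"
            )
            before_commit = read_status(reader)
            writer.commit()
            after_commit = read_status(reader)
            return before_commit, after_commit
        finally:
            writer.close()
            reader.close()
```

Until the writer commits, the reader still sees the old value, which mirrors how an API request on a separate database session would keep reporting PENDING while the worker's transaction is open.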
📋 Prerequisites
1. Make sure the containers are running
cd /Users/panta/py_work/jointoai_work/Jointoai/server
docker compose ps
Expected output:
NAME                     STATUS
jointo-server-app        Up
jointo-server-celery-ai  Up
jointo-server-postgres   Up (healthy)
jointo-server-redis      Up (healthy)
jointo-server-rabbitmq   Up (healthy)
2. Restart the app and Celery
The containers must be restarted after the fix is applied:
docker compose restart app celery-worker-ai
3. Prepare test files
Prepare a screenplay file for testing (PDF/DOCX), for example:
- test.pdf: tests PDF parsing
- test.docx: tests DOCX parsing
🧪 Test Methods
Method 1: Test with a shell script (recommended)
Step 1: Get an auth token
# Log in to obtain a token
TOKEN=$(curl -s -X POST "http://localhost:6170/api/v1/auth/login" \
  -H "Content-Type: application/json" \
  -d '{
    "email": "your-email@example.com",
    "password": "your-password"
  }' | python3 -c "import sys, json; print(json.load(sys.stdin)['data']['accessToken'])")
echo "Token: $TOKEN"
Step 2: Create a project or get an existing project ID
# List projects
curl -s "http://localhost:6170/api/v1/projects" \
  -H "Authorization: Bearer $TOKEN" | python3 -m json.tool
# Record the project ID
PROJECT_ID="your-project-id"
Step 3: Upload the file and poll rapidly
# Upload the file
RESPONSE=$(curl -s -X POST "http://localhost:6170/api/v1/screenplays/file" \
  -H "Authorization: Bearer $TOKEN" \
  -F "file=@test.pdf" \
  -F "name=test-screenplay-$(date +%H%M%S)" \
  -F "projectId=$PROJECT_ID")
# Extract the screenplay ID
SCREENPLAY_ID=$(echo "$RESPONSE" | python3 -c "import sys, json; print(json.load(sys.stdin)['data']['screenplayId'])")
echo "Screenplay ID: $SCREENPLAY_ID"
# Poll the status rapidly (every 0.5 s, for about 30 s)
echo "Polling status..."
for i in {1..60}; do
  STATUS_RESPONSE=$(curl -s "http://localhost:6170/api/v1/screenplays/${SCREENPLAY_ID}/parse-status" \
    -H "Authorization: Bearer $TOKEN")
  STATUS=$(echo "$STATUS_RESPONSE" | python3 -c "import sys, json; print(json.load(sys.stdin)['data']['parsingStatus'])")
  MESSAGE=$(echo "$STATUS_RESPONSE" | python3 -c "import sys, json; print(json.load(sys.stdin)['data'].get('message', ''))")
  echo "[${i}] ${STATUS} - ${MESSAGE}"
  # Seeing the PARSING status means the fix works
  if [ "$STATUS" == "parsing" ]; then
    echo "✅ Captured the PARSING status!"
  fi
  # Stop polling once parsing has completed or failed
  if [ "$STATUS" == "completed" ] || [ "$STATUS" == "failed" ]; then
    echo "Final status: ${STATUS}"
    break
  fi
  sleep 0.5
done
Method 2: Test with a Python script
Create test_parsing.py:
import time
from datetime import datetime
import httpx
API_BASE = "http://localhost:6170/api/v1"
TOKEN = "your-token-here"  # replace with a real token
PROJECT_ID = "your-project-id"  # replace with a real project ID
headers = {"Authorization": f"Bearer {TOKEN}"}
def upload_file(file_path: str):
    """Upload a screenplay file."""
    with open(file_path, "rb") as f:
        # The MIME type is fixed to PDF; adjust it when uploading a DOCX file.
        files = {"file": (file_path, f, "application/pdf")}
        data = {
            "name": f"test-screenplay-{datetime.now().strftime('%H%M%S')}",
            "projectId": PROJECT_ID,
        }
        response = httpx.post(
            f"{API_BASE}/screenplays/file",
            headers=headers,
            files=files,
            data=data,
            timeout=30.0,
        )
    response.raise_for_status()  # fail early on HTTP errors
    return response.json()
def poll_status(screenplay_id: str, max_attempts: int = 60):
    """Poll the parsing status."""
    print(f"\nPolling status (screenplay_id: {screenplay_id})...\n")
    parsing_seen = False
    last_status = None
    start_time = time.time()
    for i in range(max_attempts):
        elapsed = time.time() - start_time
        response = httpx.get(
            f"{API_BASE}/screenplays/{screenplay_id}/parse-status",
            headers=headers,
            timeout=10.0,
        )
        response.raise_for_status()
        data = response.json()['data']
        status = data['parsingStatus']
        message = data.get('message', '')
        # Print only when the status changes
        if status != last_status:
            print(f"[{elapsed:.1f}s] {status.upper()} - {message}")
            if status == 'parsing':
                print("  ✅ Captured the PARSING status!")
                parsing_seen = True
            elif status == 'completed':
                word_count = data.get('wordCount', 0)
                print(f"  Word count: {word_count}")
                break
            elif status == 'failed':
                error = data.get('parsingError', '')
                print(f"  Error: {error}")
                break
            last_status = status
        time.sleep(0.5)
    print("\n" + "=" * 50)
    if parsing_seen:
        print("✅ Test passed: the PARSING status was captured!")
    else:
        print("❌ Test failed: the PARSING status was not captured")
        print("Possible causes:")
        print("  1. Parsing finished too quickly (< 0.5s)")
        print("  2. The status update was not committed immediately")
        print("  3. The Celery worker was not restarted")
    print("=" * 50)
if __name__ == "__main__":
    # 1. Upload the file
    print("Uploading test file...")
    result = upload_file("test.pdf")
    screenplay_id = result['data']['screenplayId']
    print(f"✅ Upload succeeded, screenplay ID: {screenplay_id}")
    # 2. Start polling immediately
    poll_status(screenplay_id)
Run:
python3 test_parsing.py
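The polling logic can also be separated from the HTTP call so that it runs without a live server. The following is a sketch under that assumption: `collect_transitions` and its `fetch_status` parameter are hypothetical names, and the caller supplies whatever function actually retrieves the status.

```python
import time
from typing import Callable, List

TERMINAL = {"completed", "failed"}


def collect_transitions(
    fetch_status: Callable[[], str],
    interval: float = 0.5,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll until a terminal status and return each distinct status in the order seen."""
    seen: List[str] = []
    for _ in range(max_attempts):
        status = fetch_status()
        # Record a status only when it differs from the previous observation.
        if not seen or seen[-1] != status:
            seen.append(status)
        if status in TERMINAL:
            break
        sleep(interval)
    return seen


# Usage with a canned sequence instead of real HTTP requests:
fake = iter(["pending", "parsing", "parsing", "completed"])
print(collect_transitions(lambda: next(fake), sleep=lambda _: None))
```

Injecting `sleep` keeps a dry run instantaneous; against the real API, `fetch_status` would wrap the parse-status request shown in the script above.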
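Method 3: Manual test (through the frontend)
- Open the frontend app: http://localhost:6160
- Log in
- Enter a project: select any project
- Upload a screenplay file: click "Upload screenplay" → choose a file → upload
- Watch the status change:
  - Initially: ⏳ Waiting to parse (pending)
  - After 1-2 seconds: 🔄 Parsing... (parsing) ← ⚠️ Key
  - After 10-30 seconds: ✅ Parsing complete (completed)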
📊 Expected Results
Successful status flow
Time   | Status    | Frontend display
-------|-----------|------------------
T+0s   | PENDING   | "Waiting to parse"
T+1s   | PARSING   | "Parsing..." ✅
T+2s   | PARSING   | "Parsing..." ✅
T+5s   | PARSING   | "Parsing..." ✅
T+10s  | COMPLETED | "Parsing complete (8524 words)" ✅
Failed status flow (before the fix)
Time   | Status    | Frontend display
-------|-----------|------------------
T+0s   | PENDING   | "Waiting to parse"
T+10s  | COMPLETED | "Parsing complete (8524 words)" ❌ PARSING skipped entirely
🔍 Verification Checkpoints
1. Check the Celery logs
docker compose logs --tail=50 celery-worker-ai | grep -E "(PARSING|parsing_status|解析成功|状态)"
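Expected output (the log messages are in Chinese; the three lines report the committed PARSING update, the updated field, and the successful parse with its word count):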
[INFO] 状态已更新为 PARSING 并已提交 | 剧本ID: 019c325e-...
[INFO] 更新剧本: screenplay_id=019c325e-..., 更新字段=['parsing_status']
[INFO] 剧本文件解析成功 | 剧本ID: 019c325e-... | 字数: 27221
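The same check can be automated by confirming that the commit message is logged before the success message. A minimal sketch, assuming the log text matches the sample lines above; `commit_logged_before_success` is a hypothetical helper and the two marker strings are copied from that sample.

```python
# Marker strings copied from the sample log lines; adjust if the messages change.
COMMIT_MARKER = "状态已更新为 PARSING 并已提交"
SUCCESS_MARKER = "剧本文件解析成功"


def commit_logged_before_success(log_lines):
    """Return True if the PARSING commit line appears before the parse-success line."""
    commit_at = next(
        (i for i, line in enumerate(log_lines) if COMMIT_MARKER in line), None
    )
    success_at = next(
        (i for i, line in enumerate(log_lines) if SUCCESS_MARKER in line), None
    )
    return commit_at is not None and success_at is not None and commit_at < success_at
```

The function takes any iterable of lines once it has been turned into a list, for example the captured output of the `docker compose logs` command split on newlines.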
2. Check the database status
Query the database while parsing is in progress:
SELECT
    screenplay_id,
    name,
    parsing_status,
    word_count,
    created_at
FROM screenplays
WHERE screenplay_id = '019c325e-ef62-7e72-a4b3-c5bc5577170b';
The intermediate state parsing_status = 2 (PARSING) should be visible.
3. Check the API response
curl -s "http://localhost:6170/api/v1/screenplays/${SCREENPLAY_ID}/parse-status" \
-H "Authorization: Bearer $TOKEN" | python3 -m json.tool
Expected response (while parsing; the message value means "Parsing in progress..."):
{
  "data": {
    "screenplayId": "019c325e-ef62-7e72-a4b3-c5bc5577170b",
    "parsingStatus": "parsing",
    "message": "正在解析中...",
    "progress": null
  }
}
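A response like the one above can be checked programmatically rather than by eye. This sketch assumes the `data.parsingStatus` field shown in the sample and the four lowercase status values used elsewhere in this document; `parse_status_payload` is a hypothetical helper name.

```python
import json

# Assumed set of status strings, taken from the values used in this document.
KNOWN_STATUSES = {"pending", "parsing", "completed", "failed"}


def parse_status_payload(raw: str) -> str:
    """Extract parsingStatus from a parse-status response body and validate it."""
    payload = json.loads(raw)
    status = payload["data"]["parsingStatus"]
    if status not in KNOWN_STATUSES:
        raise ValueError(f"unexpected parsingStatus: {status!r}")
    return status
```

Raising on an unknown value makes a renamed or misspelled status surface immediately instead of being treated as "still waiting".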
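🐛 Troubleshooting
Problem 1: The PARSING status still never appears
Possible causes:
- The Celery worker was not restarted
- The code change has not taken effect
- Parsing finishes too quickly (< 500 ms)
Solutions:
# 1. Restart the containers
docker compose restart app celery-worker-ai
# 2. Check the deployed code version (the pattern matches the Chinese inline comment on the commit line)
docker compose exec app cat /app/app/services/screenplay_file_parser_service.py | grep -A 2 "立即提交"
# 3. Use a larger test file (> 5 MB) to lengthen the parsing time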
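Why a fast parse can slip past the poller comes down to timing: a status is observed only if at least one poll lands inside the window during which it is set. The sketch below counts such polls for an idealized schedule; `polls_inside_window` is a hypothetical helper, and it ignores request latency and scheduling jitter.

```python
def polls_inside_window(window_start, window_length, interval, first_poll=0.0):
    """Count polls at first_poll + k * interval that fall inside
    [window_start, window_start + window_length)."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    window_end = window_start + window_length
    count = 0
    k = 0
    while first_poll + k * interval < window_end:
        if first_poll + k * interval >= window_start:
            count += 1
        k += 1
    return count


# PARSING lasts 0.3 s starting at T+1.1 s, polled every 0.5 s: no poll lands in it.
print(polls_inside_window(1.1, 0.3, 0.5))
# The same start with a 0.5 s window: the poll at T+1.5 s catches it.
print(polls_inside_window(1.1, 0.5, 0.5))
```

Under this model, a window at least as long as the polling interval is always hit by some poll, which is why a longer parse (or a shorter interval) makes the PARSING status reliably observable.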
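Problem 2: The Celery task does not run
Check:
# View the Celery logs
docker compose logs celery-worker-ai --tail=100
# Inspect the RabbitMQ queues
docker compose exec rabbitmq rabbitmqctl list_queues
Problem 3: Database connection errors
Check:
# Confirm that the database module imports cleanly (this alone does not open a connection)
docker compose exec app python -c "from app.core.database import get_async_session; print('DB OK')"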
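✨ Summary
Committing immediately after the status update ensures that:
- ✅ External queries can see the intermediate PARSING status
- ✅ The frontend can correctly show the "parsing in progress" hint
- ✅ The user experience improves, because users no longer mistake a long parse for a frozen system
- ✅ The code follows sound engineering practice (update the status before starting a long-running operation)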