# RFC 144: ai_models 模型参数能力配置规划

## 概述

基于 AIHubMix 图片/视频生成 API 文档,规划 `ai_models` 表的能力配置方案,使不同模型支持差异化参数,同时为前端提供统一的 API 契约。

**参考文档**:

- [图片生成](https://docs.aihubmix.com/en/api/Image-Gen)
- [视频生成](https://docs.aihubmix.com/cn/api/Video-Gen)

**当前架构**:

```
API → Service → Celery Task → AIProviderFactory → AIHubMixProvider → AIHubMix API
```

**核心问题**:

- `AIHubMixProvider` 中硬编码了模型参数逻辑(if/elif 判断)
- 每次新增模型都需要修改代码
- 前端无法动态获取模型支持的参数
- 不同模型参数差异巨大(OpenAI、Flux、Qwen、Doubao、Imagen 等)

---

## 1. 方案概要

| 决策 | 说明 |
|------|------|
| **新增 capabilities 列** | 独立 JSONB 列,与 config 职责分离(config=运行时配置,capabilities=能力描述) |
| **每模型独立配置** | 每个模型的 capabilities 只存其支持的参数子集,参数不一致是预期行为 |
| **前端统一契约** | API 层定义「全量能力字段枚举」,对不支持的项返回 `supported: false`,前端按统一 schema 渲染 |
| **config 保留** | 继续用于 timeout、API 特有参数等运行时配置 |

---

## 2. 数据库设计

### 2.1 表结构调整

```diff
 ai_models
 - model_id
 - model_name
 - model_type
 - config        # 运行时配置:timeout、API 特定参数等
+ capabilities   # 模型能力:size/quality/duration 等可选值与约束(JSONB)
 ...
```

### 2.2 迁移脚本

```python
# alembic/versions/xxxx_add_capabilities_to_ai_models.py
import sqlalchemy as sa
from alembic import op
from sqlalchemy.dialects import postgresql


def upgrade():
    op.add_column('ai_models', sa.Column(
        'capabilities',
        postgresql.JSONB,
        nullable=False,
        server_default='{}',
        comment='模型参数能力配置(尺寸、质量、时长等可选值与约束)'
    ))
    op.create_index(
        'idx_ai_models_capabilities_gin',
        'ai_models',
        ['capabilities'],
        postgresql_using='gin'
    )


def downgrade():
    op.drop_index('idx_ai_models_capabilities_gin', table_name='ai_models')
    op.drop_column('ai_models', 'capabilities')
```

### 2.3 DB capabilities 存储规则

- 每个模型只存**其支持的参数**,key 用 snake_case
- 不支持的能力不写入,由 API 层补全为 `supported: false`

---

## 3. 前端统一契约

### 3.1 全量能力字段枚举

| 字段 | 说明 | 适用 model_type | 适用模型 |
|------|------|-----------------|----------|
| `size` | 尺寸/分辨率 | 2(图)、3(视频) | 所有 |
| `quality` | 质量档位 | 2 | OpenAI, 通用 |
| `aspectRatio` | 宽高比 | 2 | Flux |
| `duration` | 视频时长(秒) | 3 | 所有视频模型 |
| `referenceImage` | 支持的参考图数量(0=不支持) | 2、3 | 支持图生图/图生视频的模型 |
| `watermark` | 水印开关 | 2 | Qwen, Doubao |
| `seed` | 随机种子 | 2 | Flux, Qwen, Doubao |
| `outputFormat` | 输出格式(png/jpeg/webp) | 2 | OpenAI |
| `moderation` | 内容审核档位 | 2 | OpenAI |
| `inputFidelity` | 输入保真度(high/low) | 2 | OpenAI |
| `safetyTolerance` | 审核宽松度(0-5) | 2 | Flux |
| `raw` | 原始模式(更自然的视觉效果) | 2 | Flux |
| `responseFormat` | 返回格式(url/base64_json) | 2 | Doubao |
| `sequentialImageGeneration` | 连续图片生成控制 | 2 | Doubao |
| `n` | 生成数量(1-10) | 2 | OpenAI, 通用 |

### 3.2 API 响应 schema(统一结构)

**规则**:所有模型的能力 API 返回同一套 key,不支持的项为 `{ "supported": false }`。

```json
{
  "modelId": "xxx",
  "modelName": "flux-2-pro",
  "modelType": 2,
  "capabilities": {
    "size": { "supported": true, "values": ["1K", "2K", "4K", "auto"], "default": "auto" },
    "quality": { "supported": false },
    "aspectRatio": { "supported": true, "values": ["16:9", "1:1", "4:3"], "default": "16:9" },
    "duration": { "supported": false },
    "referenceImage": { "supported": true, "num": 5 },
    "watermark": { "supported": false },
    "seed": { "supported": true },
    "outputFormat": { "supported": false },
    "moderation": { "supported": false }
  }
}
```

**视频模型示例**(Sora 2):

```json
{
  "capabilities": {
    "size": { "supported": true, "values": ["720x1280", "1280x720"], "default": "720x1280" },
    "quality": { "supported": false },
    "aspectRatio": { "supported": false },
    "duration": { "supported": true, "values": ["4", "8", "12"], "default": "4" },
    "referenceImage": { "supported": true, "num": 1 },
    "watermark": { "supported": false },
    "seed": { "supported": false },
    "outputFormat": { "supported": false },
    "moderation": { "supported": false }
  }
}
```

### 3.3 DB → API 映射

| DB capabilities key | API 响应 key | 说明 |
|---------------------|--------------|------|
| size | size | 尺寸/分辨率 |
| quality | quality | 质量档位 |
| aspect_ratio | aspectRatio | 宽高比 |
| seconds | duration | 视频时长 |
| reference_image / input_reference | referenceImage | 参考图数量(0=不支持) |
| watermark | watermark | 水印开关 |
| seed | seed | 随机种子 |
| output_format | outputFormat | 输出格式 |
| moderation | moderation | 内容审核 |
| input_fidelity | inputFidelity | 输入保真度 |
| safety_tolerance | safetyTolerance | 审核宽松度 |
| raw | raw | 原始模式 |
| response_format | responseFormat | 返回格式 |
| sequential_image_generation | sequentialImageGeneration | 连续图片生成 |
| n | n | 生成数量 |

**API 层职责**:

1. 读取 DB capabilities(snake_case)
2. 按全量枚举补全缺失字段为 `{ "supported": false }`
3. 将 snake_case 转为 camelCase 输出
4. 特殊处理 `reference_image` 和 `input_reference` 的数字转对象格式
5. 保持类型结构一致性

#### 转换逻辑示例(Python)

```python
def transform_capabilities_to_api(db_capabilities: dict, model_type: int) -> dict:
    """将 DB capabilities 转换为前端 API 格式"""
    # 定义全量能力字段(按 model_type)
    if model_type == 2:  # 图片
        all_fields = [
            'size', 'quality', 'aspectRatio', 'referenceImage', 'watermark',
            'seed', 'outputFormat', 'moderation', 'inputFidelity',
            'safetyTolerance', 'raw', 'responseFormat',
            'sequentialImageGeneration', 'n'
        ]
    elif model_type == 3:  # 视频
        all_fields = ['size', 'duration', 'referenceImage']
    else:
        all_fields = []

    result = {}
    for field in all_fields:
        # camelCase → snake_case 映射
        db_key = camel_to_snake(field)

        # 特殊处理:reference_image / input_reference(数字 → 对象)
        if field == 'referenceImage':
            db_key = 'reference_image' if model_type == 2 else 'input_reference'
            num_value = db_capabilities.get(db_key, 0)
            if num_value > 0:
                result[field] = {"supported": True, "num": num_value}
            else:
                result[field] = {"supported": False}
            continue

        # 特殊处理:seconds → duration
        if field == 'duration':
            db_key = 'seconds'

        # 通用处理
        if db_key in db_capabilities:
            value = db_capabilities[db_key]
            if isinstance(value, dict):
                # 枚举/范围/布尔等对象统一展开(包含 values、default、type 等)
                result[field] = {"supported": True, **value}
            else:
                result[field] = {"supported": True}
        else:
            # 不存在则标记为不支持
            result[field] = {"supported": False}
    return result


def camel_to_snake(name: str) -> str:
    """camelCase → snake_case"""
    import re
    return re.sub(r'(?<!^)(?=[A-Z])', '_', name).lower()
```

### 8.1 能力字段的前端类型(片段)

```ts
type CapabilityValue =
  | { supported: true; values?: string[]; default?: string }  // 对象
  | { supported: false }                                      // 不支持
  | { supported: true; num: number }                          // 参考图数量
```

### 8.2 AIHubMix 模型与参数映射

| 模型名 | model_type | size 示例 | seconds |
|--------|------------|----------|---------|
| dall-e-3 | 2 | 1024x1024, 1792x1024, 1024x1792 | - |
| gpt-image-1.5 | 2 | 1024x1024 等 | - |
| flux-2-pro | 2 | 1K, 2K, 4K, auto | - |
| qwen-image | 2 | 512*1024, 1024*1024 等 | - |
| doubao-seedream-4-5 | 2 | 2K, 4K, auto | - |
| imagen-4.0-fast-generate-001 | 2 | 1K, 2K, 4K, auto | - |
| sora-2 | 3 | 720x1280, 1280x720 | 4, 8, 12 |
| sora-2-pro | 3 | 720x1280, 1280x720 | 4, 8, 12 |
| veo-3.1-generate-preview | 3 | 720P, 1080P | 4, 6, 8 |
| veo-3.0-generate-preview | 3 | 720P, 1080P | 4, 6, 8 |
| wan2.2-t2v-plus | 3 | 见 4.2 节 | 5 |
| wan2.5-t2v-preview | 3 | 见 4.2 节 | 5, 10 |
| wan2.2-i2v-plus | 3 | 见 4.2 节 | 5 |
| wan2.5-i2v-preview | 3 | 见 4.2 节 | 5 |

### 8.3 向后兼容

- capabilities 为空 `{}` 的模型:API 返回全量字段且均为 `supported: false`,Provider 回退原有硬编码逻辑
- 旧接口:不依赖 capabilities 的接口保持不变

---

## 9. 决策记录

| 决策 | 理由 |
|------|------|
| 新增 capabilities 列 | 与 config 职责分离,便于按能力查询、索引 |
| 每模型独立 capabilities | 不同模型参数本就不同,按需配置 |
| 前端统一字段契约 | 前端用同一 schema,按 supported 控制展示 |
| API 补全 supported: false | 前端无需判断 key 存在性,逻辑更简单 |
| config 保留 | 继续承载 timeout 等运行时配置,不混入能力描述 |
| **支持参数依赖关系** | duration ← size/aspectRatio 等条件约束通过 `constraints` 表达,前端动态更新选项 |

---

## 10. 调用 AIHubMix API 的参数转换

### 10.1 图片生成 API 调用格式

**AIHubMix 图片生成端点**:

```
POST https://aihubmix.com/v1/models/{vendor}/{model}/predictions
```

**请求体格式**:

```json
{
  "input": {
    "prompt": "...",
    "size": "...",
    "quality": "...",
    ...
  }
}
```

#### 参数映射表(我们 → AIHubMix)

| 我们的参数名 | AIHubMix API 参数名 | 说明 |
|-------------|---------------------|------|
| prompt | prompt | 提示词(一致) |
| width + height | size | 拼接为 `"{width}x{height}"` |
| quality | quality | 质量档位(一致) |
| reference_images | image | 单张:直接传 URL;多张:数组 `["url1", "url2"]` |
| num_images | n | 生成数量 |
| watermark | watermark | 水印开关 |
| seed | seed | 随机种子 |
| aspect_ratio | aspect_ratio | Flux 专用 |
| safety_tolerance | safety_tolerance | Flux 专用 |
| raw_mode | raw | Flux 专用 |
| input_fidelity | input_fidelity | OpenAI 专用 |
| moderation_level | moderation | OpenAI 专用 |
| output_format | output_format | OpenAI 专用 |
| response_format | response_format | Doubao 专用 |
| sequential_generation | sequential_image_generation | Doubao 专用 |

#### 调用示例(OpenAI)

```python
# 我们的接口参数
{
    "prompt": "A cat in the garden",
    "width": 1024,
    "height": 1024,
    "quality": "high",
    "reference_images": ["https://example.com/cat.jpg"]
}

# 转换为 AIHubMix API 调用
import httpx

async with httpx.AsyncClient() as client:
    response = await client.post(
        "https://aihubmix.com/v1/models/openai/gpt-image-1.5/predictions",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        },
        json={
            "input": {
                "prompt": "A cat in the garden",
                "size": "1024x1024",  # width x height
                "quality": "high",
                "image": "https://example.com/cat.jpg"  # 取第一张
            }
        }
    )
```

#### 调用示例(Flux 异步)

```python
async with httpx.AsyncClient() as client:
    # 步骤 1:发起生成请求
    response = await client.post(
        "https://aihubmix.com/v1/models/bfl/flux-2-pro/predictions",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        },
        json={
            "input": {
                "prompt": "A cat in the garden",
                "aspect_ratio": "16:9",
                "safety_tolerance": 2
            }
        }
    )
    task_id = response.json()["output"][0]["taskId"]

    # 步骤 2:轮询获取结果
    result = await client.get(
        f"https://api.aihubmix.com/v1/tasks/{task_id}",
        headers={"Authorization": f"Bearer {api_key}"}
    )
```

#### 调用示例(Doubao 多参考图)

```python
# 我们的接口参数
{
    "prompt": "将图1的服装换为图2的服装",
    "reference_images": [
        "https://example.com/image1.jpg",
        "https://example.com/image2.jpg"
    ],
    "size": "2K"
}

# 转换为 AIHubMix API 调用
async with httpx.AsyncClient() as client:
    response = await client.post(
        "https://aihubmix.com/v1/models/doubao/doubao-seedream-4-5/predictions",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        },
        json={
            "input": {
                "prompt": "将图1的服装换为图2的服装",
                "image": [
                    "https://example.com/image1.jpg",
                    "https://example.com/image2.jpg"
                ],  # 多张图片用数组
                "size": "2K",
                "sequential_image_generation": "disabled",
                "watermark": False  # Python 布尔值,序列化后为 false
            }
        }
    )
```

### 10.2 视频生成 API 调用格式

**AIHubMix 视频生成端点**:

```
POST https://aihubmix.com/v1/videos
```

**请求体格式(JSON)**:

```json
{
  "model": "sora-2",
  "prompt": "...",
  "size": "720x1280",
  "seconds": "4"
}
```

**请求体格式(带引导图 - multipart/form-data)**:

```
--form 'model="sora-2"'
--form 'prompt="..."'
--form 'size="1280x720"'
--form 'seconds="4"'
--form 'input_reference=@"/path/to/image.jpg"'
```

#### 参数映射表(我们 → AIHubMix)

| 我们的参数名 | AIHubMix API 参数名 | 说明 |
|-------------|---------------------|------|
| model_name | model | 模型名称(一致) |
| prompt | prompt | 提示词(一致) |
| width + height | size | Sora/Wan: `"{width}x{height}"`;Veo: `"720P"` 等档位 |
| duration | seconds | 视频时长,转为字符串 `"4"` |
| reference_image | input_reference | 图生视频:文件或 URL |

#### 调用示例(Sora 文生视频 - JSON)

```python
# 我们的接口参数
{
    "model": "sora-2",
    "prompt": "A cat playing in the garden",
    "width": 720,
    "height": 1280,
    "duration": 4
}

# 转换为 AIHubMix API 调用
async with httpx.AsyncClient() as client:
    response = await client.post(
        "https://aihubmix.com/v1/videos",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        },
        json={
            "model": "sora-2",
            "prompt": "A cat playing in the garden",
            "size": "720x1280",  # width x height
            "seconds": "4"  # 整数转字符串
        }
    )
```

#### 调用示例(Sora 图生视频 - multipart)

```python
# 我们的接口参数
{
    "model": "sora-2",
    "prompt": "The kitten is taking a nap",
    "width": 1280,
    "height": 720,
    "duration": 4,
    "reference_image": "https://example.com/cat.jpg"
}

# 下载参考图(如果是 URL),再以 multipart 调用 AIHubMix API
async with httpx.AsyncClient() as client:
    image_response = await client.get("https://example.com/cat.jpg")
    image_data = image_response.content

    response = await client.post(
        "https://aihubmix.com/v1/videos",
        headers={"Authorization": f"Bearer {api_key}"},
        data={
            "model": "sora-2",
            "prompt": "The kitten is taking a nap",
            "size": "1280x720",
            "seconds": "4"
        },
        files={
            "input_reference": ("cat.jpg", image_data, "image/jpeg")
        }
    )
```

#### 调用示例(Veo)

```python
# Veo 使用分辨率档位(720P/1080P),不是像素值
{
    "model": "veo-3.1-generate-preview",
    "prompt": "A beautiful landscape",
    "size": "1080P",  # 注意:是档位,不是像素
    "duration": 8
}

# 转换为 AIHubMix API 调用
async with httpx.AsyncClient() as client:
    response = await client.post(
        "https://aihubmix.com/v1/videos",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        },
        json={
            "model": "veo-3.1-generate-preview",
            "prompt": "A beautiful landscape",
            "size": "1080P",  # 直接传档位字符串
            "seconds": "8"
        }
    )
```

### 10.3 Provider 参数转换核心逻辑

在 `AIHubMixProvider` 中需实现:

```python
def _convert_size_for_api(
    self, width: int, height: int, model_capabilities: dict
) -> str:
    """根据模型 capabilities 转换 size 参数格式"""
    size_cap = model_capabilities.get('size', {})
    values = size_cap.get('values', [])

    # 1. 档位格式(Veo: 720P/1080P)
    if any('P' in v for v in values):
        max_dimension = max(width, height)
        if max_dimension <= 720:
            return '720P'
        return '1080P'

    # 2. K 格式(Flux/Imagen: 1K/2K/4K)
    if any('K' in v for v in values):
        max_dimension = max(width, height)
        if max_dimension <= 1024:
            return '1K'
        elif max_dimension <= 2048:
            return '2K'
        elif max_dimension <= 4096:
            return '4K'
        return 'auto'

    # 3. * 分隔(Qwen: 512*1024)
    if any('*' in v for v in values):
        return f"{width}*{height}"

    # 4. 标准 x 分隔(OpenAI/Sora/Wan: 1024x1024)
    return f"{width}x{height}"
```

### 10.4 参数转换检查清单

调用 AIHubMix API 前必须检查:

| 检查项 | 说明 |
|--------|------|
| ✅ size 格式 | 根据 capabilities.size.values 判断用哪种分隔符 |
| ✅ seconds 类型 | 整数转字符串 `str(duration)` |
| ✅ reference_image 数量 | 不超过 `capabilities.reference_image` 限制 |
| ✅ multipart vs JSON | 视频带引导图(input_reference)时用 multipart,否则用 JSON |
| ✅ 嵌套结构 | 图片用 `{ "input": {...} }`,视频直接 `{...}` |
| ✅ 模型专属参数 | 仅当 capabilities 中存在时才传递 |

---
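上面检查清单中「模型专属参数仅当 capabilities 中存在时才传递」一条,可以用下面的草图示意(`build_input_payload` 与 `optional_keys` 均为假设性命名,映射节选自 10.1 的参数映射表,并假设 DB capabilities 中「存在 key 即代表支持」):

```python
def build_input_payload(params: dict, capabilities: dict) -> dict:
    """仅透传模型 capabilities 中声明支持的可选参数(假设性辅助函数)。"""
    # 节选「我们的参数名 → AIHubMix API 参数名」映射(见 10.1 映射表)
    optional_keys = {
        "seed": "seed",
        "watermark": "watermark",
        "aspect_ratio": "aspect_ratio",
        "safety_tolerance": "safety_tolerance",
        "output_format": "output_format",
    }
    payload = {"prompt": params["prompt"]}
    for our_key, api_key in optional_keys.items():
        # 模型未声明该能力、或调用方未传值时,直接跳过
        if our_key in capabilities and our_key in params:
            payload[api_key] = params[our_key]
    return payload


# 用法示例:Flux 声明了 aspect_ratio 与 seed,未声明 watermark
flux_caps = {"aspect_ratio": {"values": ["16:9", "1:1"]}, "seed": {}}
payload = build_input_payload(
    {"prompt": "A cat", "aspect_ratio": "16:9", "watermark": True},
    flux_caps,
)
# watermark 被过滤,仅保留 prompt 与 aspect_ratio
```

这样新增模型时只需写入 capabilities 行,不必再改 Provider 的 if/elif 分支。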
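10.2 的「我们 → AIHubMix」视频参数映射(model_name→model、width+height→size、duration→str(seconds))也可以压缩成一个极简草图(`build_video_body` 为假设性命名;仅覆盖 Sora/Wan 的像素 size 格式,Veo 的档位格式需按 10.3 另行转换):

```python
def build_video_body(params: dict) -> dict:
    """按 10.2 的参数映射表组装视频请求体(示意)。"""
    return {
        "model": params["model_name"],                    # model_name → model
        "prompt": params["prompt"],                       # 提示词原样透传
        "size": f'{params["width"]}x{params["height"]}',  # width+height → "WxH"
        "seconds": str(params["duration"]),               # 时长整数转字符串
    }


body = build_video_body({
    "model_name": "sora-2",
    "prompt": "A cat playing in the garden",
    "width": 720,
    "height": 1280,
    "duration": 4,
})
# → {"model": "sora-2", "prompt": "A cat playing in the garden",
#    "size": "720x1280", "seconds": "4"}
```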