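RFC 144: Capability Configuration Plan for ai_models Model Parameters

Overview

Based on the AIHubMix image/video generation API documentation, this RFC plans a capability configuration scheme for the ai_models table, so that each model can expose its own parameters while the frontend consumes a single, uniform API contract.

References:

Current architecture:

API → Service → Celery Task → AIProviderFactory → AIHubMixProvider → AIHubMix API

Core problems:

Model parameter logic is hard-coded in AIHubMixProvider (if/elif branches)
Every new model requires a code change
The frontend cannot discover at runtime which parameters a model supports
Parameters differ widely between models (OpenAI, Flux, Qwen, Doubao, Imagen, and others)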

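1. Proposal Summary

Decision	Description
Add a capabilities column	A separate JSONB column whose responsibility is distinct from config (config = runtime configuration, capabilities = capability description)
Per-model configuration	Each model's capabilities stores only the subset of parameters it supports; differences between models are expected
Unified frontend contract	The API layer defines the full enumeration of capability fields and returns `supported: false` for unsupported items, so the frontend renders from a single schema
Keep config	Still used for runtime configuration such as timeout and API-specific parameters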

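2. Database Design

2.1 Table Changes

  ai_models
  - model_id
  - model_name
  - model_type
  - config              # runtime configuration: timeout, API-specific parameters, etc.
+ capabilities          # model capabilities: allowed values and constraints for size/quality/duration, etc. (JSONB)
  ...

2.2 Migration Script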

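# alembic/versions/xxxx_add_capabilities_to_ai_models.py
import sqlalchemy as sa
from alembic import op
from sqlalchemy.dialects import postgresql


def upgrade():
    op.add_column('ai_models', sa.Column(
        'capabilities',
        postgresql.JSONB,
        nullable=False,
        server_default='{}',
        comment='Model parameter capability config (allowed values and constraints for size, quality, duration, etc.)'
    ))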
    op.create_index(
        'idx_ai_models_capabilities_gin',
        'ai_models',
        ['capabilities'],
        postgresql_using='gin'
    )

def downgrade():
    op.drop_index('idx_ai_models_capabilities_gin', table_name='ai_models')
    op.drop_column('ai_models', 'capabilities')

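2.3 Storage Rules for DB capabilities

Each model stores only the parameters it supports; keys use snake_case
Unsupported capabilities are not written; the API layer fills them in as supported: false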

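3. Unified Frontend Contract

3.1 Full Capability Field Enumeration

Field	Description	Applicable model_type	Applicable models
`size`	Size/resolution	2 (image), 3 (video)	All
`quality`	Quality tier	2	OpenAI, general
`aspectRatio`	Aspect ratio	2	Flux
`duration`	Video duration (seconds)	3	All video models
`referenceImage`	Number of reference images supported (0 = unsupported)	2, 3	Models that support image-to-image or image-to-video
`watermark`	Watermark toggle	2	Qwen, Doubao
`seed`	Random seed	2	Flux, Qwen, Doubao
`outputFormat`	Output format (png/jpeg/webp)	2	OpenAI
`moderation`	Content moderation level	2	OpenAI
`inputFidelity`	Input fidelity (high/low)	2	OpenAI
`safetyTolerance`	Moderation tolerance (0-5)	2	Flux
`raw`	Raw mode (more natural-looking results)	2	Flux
`responseFormat`	Response format (url/base64_json)	2	Doubao
`sequentialImageGeneration`	Sequential image generation control	2	Doubao
`n`	Number of images to generate (1-10)	2	OpenAI, general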

3.2 API Response Schema (Unified Structure)

Rule: the capabilities API returns the same set of keys for every model; unsupported items are { "supported": false }.

{
  "modelId": "xxx",
  "modelName": "flux-2-pro",
  "modelType": 2,
  "capabilities": {
    "size": {
      "supported": true,
      "values": ["1K", "2K", "4K", "auto"],
      "default": "auto"
    },
    "quality": { "supported": false },
    "aspectRatio": {
      "supported": true,
      "values": ["16:9", "1:1", "4:3"],
      "default": "16:9"
    },
    "duration": { "supported": false },
    "referenceImage": {
      "supported": true,
      "num": 5
    },
    "watermark": { "supported": false },
    "seed": { "supported": true },
    "outputFormat": { "supported": false },
    "moderation": { "supported": false }
  }
}

Video model example (Sora 2):

{
  "capabilities": {
    "size": {
      "supported": true,
      "values": ["720x1280", "1280x720"],
      "default": "720x1280"
    },
    "quality": { "supported": false },
    "aspectRatio": { "supported": false },
    "duration": {
      "supported": true,
      "values": ["4", "8", "12"],
      "default": "4"
    },
    "referenceImage": {
      "supported": true,
      "num": 1
    },
    "watermark": { "supported": false },
    "seed": { "supported": false },
    "outputFormat": { "supported": false },
    "moderation": { "supported": false }
  }
}

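3.3 DB → API Mapping

DB capabilities key	API response key	Description
size	size	Size/resolution
quality	quality	Quality tier
aspect_ratio	aspectRatio	Aspect ratio
seconds	duration	Video duration
reference_image / input_reference	referenceImage	Number of reference images (0 = unsupported)
watermark	watermark	Watermark toggle
seed	seed	Random seed
output_format	outputFormat	Output format
moderation	moderation	Content moderation
input_fidelity	inputFidelity	Input fidelity
safety_tolerance	safetyTolerance	Moderation tolerance
raw	raw	Raw mode
response_format	responseFormat	Response format
sequential_image_generation	sequentialImageGeneration	Sequential image generation
n	n	Number of images

API layer responsibilities:

Read DB capabilities (snake_case)
Fill in fields missing from the full enumeration as { "supported": false }
Convert snake_case to camelCase on output
Special-case reference_image (image models) and input_reference (video models), converting the stored integer into object form
Keep the type structure consistent

Conversion logic example (Python)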

import re

# Full capability field list, shared by image and video models so that every
# model returns the same set of keys (see 3.1 and 3.2).
ALL_FIELDS = [
    'size', 'quality', 'aspectRatio', 'duration', 'referenceImage',
    'watermark', 'seed', 'outputFormat', 'moderation',
    'inputFidelity', 'safetyTolerance', 'raw', 'responseFormat',
    'sequentialImageGeneration', 'n',
]


def camel_to_snake(name: str) -> str:
    """camelCase → snake_case"""
    return re.sub(r'(?<!^)(?=[A-Z])', '_', name).lower()


def transform_capabilities_to_api(db_capabilities: dict, model_type: int) -> dict:
    """Convert DB capabilities into the frontend API format."""
    if model_type not in (2, 3):  # 2 = image, 3 = video
        return {}

    result = {}

    for field in ALL_FIELDS:
        # camelCase API key → snake_case DB key
        db_key = camel_to_snake(field)

        # Special case: reference_image (image) / input_reference (video)
        # is stored as an integer and exposed as an object
        if field == 'referenceImage':
            db_key = 'reference_image' if model_type == 2 else 'input_reference'
            num_value = db_capabilities.get(db_key, 0)
            if num_value > 0:
                result[field] = {"supported": True, "num": num_value}
            else:
                result[field] = {"supported": False}
            continue

        # Special case: seconds → duration
        if field == 'duration':
            db_key = 'seconds'

        # Missing keys are reported as unsupported
        if db_key not in db_capabilities:
            result[field] = {"supported": False}
            continue

        value = db_capabilities[db_key]
        if isinstance(value, dict):
            # Enum, integer, boolean, and object definitions all pass through
            # unchanged (values, default, type, min, max, constraints, ...)
            result[field] = {"supported": True, **value}
        else:
            result[field] = {"supported": True}

    return result

Conversion example

DB storage (Flux 2 Flex):

{
  "size": { "values": ["1K", "2K", "4K", "auto"], "default": "auto" },
  "aspect_ratio": { "values": ["16:9", "1:1", "4:3"], "default": "16:9" },
  "safety_tolerance": { "type": "integer", "min": 0, "max": 5, "default": 2 },
  "seed": { "type": "integer" },
  "reference_image": 5
}

API output:

{
  "size": {
    "supported": true,
    "values": ["1K", "2K", "4K", "auto"],
    "default": "auto"
  },
  "quality": { "supported": false },
  "aspectRatio": {
    "supported": true,
    "values": ["16:9", "1:1", "4:3"],
    "default": "16:9"
  },
  "referenceImage": {
    "supported": true,
    "num": 5
  },
  "watermark": { "supported": false },
  "seed": { "supported": true, "type": "integer" },
  "outputFormat": { "supported": false },
  "moderation": { "supported": false },
  ...
}

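4. AIHubMix API Parameter Reference

4.1 Image Generation (Full Parameter Reference)

Common parameters (all models)

Parameter	Type	Required	Description
prompt	string	✓	Prompt
size	string	-	Image size
image	string	-	Reference image path (image-to-image)
n	integer	-	Number of images (1-10, default 1)
quality	string	-	Rendering quality (low/medium/high)

OpenAI-specific parameters

Parameter	Type	Description	Allowed values
input_fidelity	string	Fidelity	high, low (default)
moderation	string	Content moderation strictness	auto (default), low
output_format	string	Output format	png, jpeg (default), webp

Supported sizes:

DALL-E 3: 1024x1024, 1792x1024, 1024x1792
GPT Image 1.5: 1024x1024, etc.

Flux-specific parameters

Parameter	Type	Description	Allowed values/range
aspect_ratio	string	Aspect ratio	16:9 (default), 1:1, 4:3
safety_tolerance	integer	Moderation tolerance	0-5 (default 2)
seed	integer	Random seed	-
raw	boolean	Raw mode	true, false (default)

Supported sizes: 1K, 2K, 4K, auto (default)

Qwen-specific parameters

Parameter	Type	Description	Allowed values/range
watermark	boolean	Watermark	true (default), false
seed	integer	Random seed	0-2147483647

Supported sizes: 512*1024, 768*512, 768*1024, 1024*576, 576*1024, 1024*1024 (default)

Doubao-specific parameters

Parameter	Type	Description	Allowed values/range
sequential_image_generation	string	Sequential image generation control	auto (default), disabled
sequential_image_generation_options	object	Sequential image configuration	`{max_images: 1-15}`
watermark	boolean	Watermark	true (default), false
seed	integer	Random seed	-1 to 2147483647 (default -1)
response_format	string	Response format	url (default), base64_json

Supported sizes:

Doubao 4.5: 2K, 4K, auto (default)
Doubao 4.0: 1K, 2K, 4K, auto (default)

Imagen-specific parameters

Supported sizes: 1K, 2K, 4K, auto (default)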

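4.2 Video Generation (Full Parameter Specification)

Based on the AIHubMix video generation endpoint

Common parameters

Parameter	Type	Required	Description
prompt	string	✓	Prompt describing shot type, subject, action, scene, lighting, and camera movement. Keeping each prompt focused on a single idea is recommended
model	string	-	Model path
size	string	-	Resolution (width×height)
seconds	string	-	Video duration (seconds)
input_reference	File	-	Reference image, used as the first frame of the video. Supports image/jpeg, image/png, image/webp

Calling with input_reference: a request that includes a guide image must use multipart/form-data; otherwise use a JSON body.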
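The JSON versus multipart choice can be sketched as a small helper. This is only an illustration under stated assumptions: `build_video_request_kwargs` and `ALLOWED_REFERENCE_TYPES` are hypothetical names, and the returned dictionary simply mirrors the `json=`, `data=`, and `files=` keyword arguments accepted by httpx request methods.

```python
from typing import Any, Optional

# MIME types listed for input_reference in the table above
ALLOWED_REFERENCE_TYPES = {"image/jpeg", "image/png", "image/webp"}


def build_video_request_kwargs(
    model: str,
    prompt: str,
    size: str,
    seconds: int,
    reference: Optional[bytes] = None,
    reference_type: str = "image/jpeg",
) -> dict[str, Any]:
    """Return keyword arguments for an HTTP client's post() call (hypothetical helper)."""
    # seconds is sent as a string, as in the parameter table
    fields = {"model": model, "prompt": prompt, "size": size, "seconds": str(seconds)}

    # No guide image: plain JSON body
    if reference is None:
        return {"json": fields}

    # Guide image present: multipart/form-data with the image as a file part
    if reference_type not in ALLOWED_REFERENCE_TYPES:
        raise ValueError(f"unsupported reference image type: {reference_type}")
    return {
        "data": fields,
        "files": {"input_reference": ("reference", reference, reference_type)},
    }
```

A caller would then spread the result into the request, for example `client.post(url, headers=headers, **kwargs)`.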

Parameter values by model series

Series	size options	Default	seconds options	Default
Sora	720x1280, 1280x720	720x1280	4, 8, 12	4
Veo	720P, 1080P	720P	4, 6, 8	8
Wan 480P	832x480, 480x832, 624x624	-	5	5
Wan 720P	1088x832, 832x1088, 960x960, 1280x720, 720x1280	-	5	5
Wan 1080P	1248x1632, 1632x1248, 1440x1440, 1080x1920, 1920x1080	-	5/10*	5

*wan2.5-t2v-preview supports 10s; wan2.2-t2v-plus supports 5s only.

i2v / t2v capability breakdown

Model	Text-to-video (t2v)	Image-to-video (i2v)	Notes
sora-2, sora-2-pro	✓	✓	Both supported
veo-3.0, veo-3.1	✓	✓	Both supported
wan2.2-t2v-plus	✓	✗	Text-to-video only
wan2.2-i2v-plus	✗	✓	Image-to-video only, input_reference required
wan2.5-t2v-preview	✓	✗	Text-to-video only
wan2.5-i2v-preview	✗	✓	Image-to-video only, input_reference required

How this is expressed in capabilities:

input_reference: 1 means one reference image is supported
input_reference: 0 means unsupported (text-to-video only)
For the wan i2v series, an additional video_mode: "i2v" can mark the model as image-to-video only
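As a rough sketch of how these three rules might be read back out of a stored capabilities object, the helper below derives t2v/i2v flags. The function name `video_modes` and the returned keys are illustrative assumptions; the DB key is taken to be `video_mode`, as in the Wan i2v configuration in section 6.2.

```python
def video_modes(capabilities: dict) -> dict:
    """Derive t2v / i2v support flags from stored capabilities (hypothetical helper)."""
    # 0 or a missing key means no reference image is accepted
    max_refs = capabilities.get("input_reference", 0)
    # video_mode == "i2v" marks a model that only does image-to-video
    i2v_only = capabilities.get("video_mode") == "i2v"
    return {
        "t2v": not i2v_only,
        "i2v": max_refs > 0,
        "reference_required": i2v_only,
    }


# Usage with the three kinds of models from the table above
sora = video_modes({"input_reference": 1})
wan_t2v = video_modes({"input_reference": 0})
wan_i2v = video_modes({"input_reference": 1, "video_mode": "i2v"})
```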

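5. Parameter Specification (capabilities Definition Standard)

5.1 Parameter Definition Structure

Each capability parameter stored in capabilities must follow one of these structures:

Structure type	JSON example	Description
Enum	`{ "values": ["a","b"], "default": "a" }`	Fixed set of allowed values
Integer range	`{ "type": "integer", "min": 0, "max": 5, "default": 2 }`	Numeric interval
Boolean	`{ "type": "boolean", "default": true }`	Toggle
Reference image count	`{ "supported": true, "num": 5 }`	Maximum number supported (API response form; the DB stores the integer, e.g. 5)
Unsupported	`{ "supported": false }`	The capability is not supported (API response form; the DB omits the key)
Object	`{ "type": "object", "properties": {...} }`	Object type (e.g. sequential_image_generation_options)

5.2 Image Parameter Summary

Parameter key	Type	values / constraints	Default	Applicable models
size	enum	See 4.1 per provider	-	All
quality	enum	low/medium/high or standard/hd	-	OpenAI, Imagen
aspect_ratio	enum	16:9, 1:1, 4:3	16:9	Flux
n	integer	min:1, max:10	1	General
reference_image	integer	0 = unsupported, 1+ = maximum count	0	Models that support image-to-image
watermark	bool	-	true	Qwen, Doubao
seed	integer	min/max per provider	-	Flux, Qwen, Doubao
output_format	enum	png, jpeg, webp	jpeg	OpenAI
moderation	enum	auto, low	auto	OpenAI
input_fidelity	enum	high, low	low	OpenAI
safety_tolerance	integer	0-5	2	Flux
raw	bool	-	false	Flux
response_format	enum	url, base64_json	url	Doubao
sequential_image_generation	enum	auto, disabled	auto	Doubao

reference_image count limit examples:

Models without support: the DB omits the key or stores 0; the API returns { "supported": false }
Most models: the DB stores 1; the API returns { "supported": true, "num": 1 }
Flux 2 Flex: the DB stores 5; the API returns { "supported": true, "num": 5 }

The frontend limits the number of uploads according to num and hides the upload control when supported is false.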
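The same limit can also be enforced on the server. The following is a minimal sketch, assuming the integer storage form described above; `reference_limit` and `check_reference_count` are hypothetical helper names, not part of the existing codebase.

```python
def reference_limit(capabilities: dict, key: str = "reference_image") -> int:
    """Return the maximum number of reference images, or 0 when unsupported."""
    value = capabilities.get(key, 0)
    return value if isinstance(value, int) and value > 0 else 0


def check_reference_count(capabilities: dict, images: list, key: str = "reference_image") -> None:
    """Raise ValueError when more reference images are supplied than the model allows."""
    limit = reference_limit(capabilities, key)
    if len(images) > limit:
        raise ValueError(f"{key}: got {len(images)} image(s), model allows at most {limit}")
```

For video models the same helpers can be called with `key="input_reference"`.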

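5.3 Video Parameter Summary

Parameter key	Type	values / constraints	Default	Applicable models
size	enum	See 4.2 per series	Per series	All video
seconds	enum	4/8/12, 4/6/8, 5, or 5/10	Per series	All video
input_reference	integer	0 = unsupported, 1+ = maximum count	0	Sora, Veo, Wan i2v

input_reference count limit examples:

Models without support (t2v series): the DB omits the key or stores 0; the API returns { "supported": false }
Sora/Veo: the DB stores 1; the API returns { "supported": true, "num": 1 }
Wan i2v: the DB stores 1; the API returns { "supported": true, "num": 1 }

5.4 Parameter Dependencies (Conditional Constraints)

The allowed values of some parameters depend on other parameters. Common cases:

Dependency	Description	Example
duration ← size	Different resolutions support different durations	High resolution may cap at 5s, while low resolution can reach 12s
duration ← aspectRatio	Landscape and portrait have different duration limits	Portrait (9:16) may cap at 8s, while landscape (16:9) can reach 12s
size ← aspectRatio	Once an aspect ratio is chosen, size is restricted to matching dimensions	Flux: with aspectRatio=16:9, size can only take 16:9 values

Expressing dependencies in capabilities

Use the constraints field to describe conditional constraints: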

{
  "seconds": {
    "values": ["4", "8", "12"],
    "default": "4",
    "constraints": {
      "when": { "size": ["720x1280", "480x832"] },
      "then": { "values": ["4", "8"] }
    }
  }
}

Explanation: when size is 720x1280 or 480x832 (high resolution/portrait), seconds can only be 4 or 8, not 12.

Multiple-constraint example

{
  "seconds": {
    "values": ["3", "5", "10", "12"],
    "default": "5",
    "constraints": [
      {
        "when": { "size": ["1920x1080", "1080x1920"] },
        "then": { "values": ["3", "5"] }
      },
      {
        "when": { "size": ["1280x720", "720x1280"] },
        "then": { "values": ["5", "10"] }
      },
      {
        "when": { "size": ["720P"] },
        "then": { "values": ["5", "10", "12"] }
      }
    ]
  }
}

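Frontend handling:

The user selects size = "1920x1080"
The frontend reads constraints[0] and finds that its when condition matches
The seconds options are updated to ["3", "5"]
If the currently selected seconds value is outside the new range, it is reset to default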
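The handling steps above can be sketched in Python (the same logic would apply to backend validation). The names `allowed_values` and `reconcile` are hypothetical. The sketch accepts `constraints` as either a single object or a list, since both forms appear in the examples, and it assumes the first matching rule wins. Falling back to the first allowed value when the default itself is excluded is an added assumption, not something the RFC specifies.

```python
def allowed_values(param_cap: dict, selection: dict) -> list:
    """Return the values currently selectable for one parameter."""
    constraints = param_cap.get("constraints", [])
    if isinstance(constraints, dict):  # single-constraint object form
        constraints = [constraints]
    for rule in constraints:
        when = rule.get("when", {})
        # A rule matches when every referenced parameter has one of the listed values
        if all(selection.get(name) in options for name, options in when.items()):
            return list(rule["then"]["values"])
    # No rule matched: the unconstrained value list applies
    return list(param_cap.get("values", []))


def reconcile(param_cap: dict, selection: dict, current):
    """Keep the current choice if it is still allowed, otherwise reset it."""
    values = allowed_values(param_cap, selection)
    if current in values:
        return current
    default = param_cap.get("default")
    if default in values or not values:
        return default
    return values[0]  # assumption: the default is excluded, so take the first allowed value
```

With the multiple-constraint example above, selecting size "1920x1080" while seconds is "12" would reset seconds to the default "5".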

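5.5 Parameter Validation Rules

values enum: the value passed by the user must be one of values
integer: the value must fall within min to max
Required capability: e.g. input_reference for wan i2v; the call is rejected if it is missing
Conditional constraints: the backend must check constraints during validation, and the frontend must update its options dynamically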
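A minimal validator for the first three rules might look like the sketch below. It assumes request parameters are already keyed by the DB snake_case names and contain only capability parameters (no prompt or model); `validate_param` and `validate_request` are hypothetical names, and conditional constraints are left out to keep the sketch short.

```python
def validate_param(name: str, value, spec: dict) -> None:
    """Raise ValueError if value violates a single capability definition."""
    if "values" in spec:
        # Enum: the value must be one of the listed options
        if value not in spec["values"]:
            raise ValueError(f"{name}: {value!r} is not one of {spec['values']}")
    elif spec.get("type") == "integer":
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{name}: expected an integer")
        if "min" in spec and value < spec["min"]:
            raise ValueError(f"{name}: {value} is below the minimum {spec['min']}")
        if "max" in spec and value > spec["max"]:
            raise ValueError(f"{name}: {value} is above the maximum {spec['max']}")
    elif spec.get("type") == "boolean":
        if not isinstance(value, bool):
            raise ValueError(f"{name}: expected a boolean")


def validate_request(params: dict, capabilities: dict) -> None:
    """Validate capability parameters of a request against the model's capabilities."""
    for name, value in params.items():
        spec = capabilities.get(name)
        if spec is None or spec == 0:
            raise ValueError(f"{name} is not supported by this model")
        if isinstance(spec, dict):
            validate_param(name, value, spec)
    # Required capability: image-to-video-only models need a reference image
    if capabilities.get("video_mode") == "i2v" and not params.get("input_reference"):
        raise ValueError("input_reference is required for this model")
```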

6. Example Model capabilities Configurations

6.1 Image Models

DALL-E 3

{
  "size": {
    "values": ["1024x1024", "1792x1024", "1024x1792"],
    "default": "1024x1024"
  },
  "quality": {
    "values": ["standard", "hd"],
    "default": "standard"
  },
  "input_fidelity": {
    "values": ["high", "low"],
    "default": "low"
  },
  "moderation": {
    "values": ["auto", "low"],
    "default": "auto"
  },
  "output_format": {
    "values": ["png", "jpeg", "webp"],
    "default": "jpeg"
  },
  "n": {
    "type": "integer",
    "min": 1,
    "max": 10,
    "default": 1
  },
  "reference_image": 1
}

GPT Image 1.5

{
  "size": {
    "values": ["1024x1024", "1792x1024", "1024x1792"],
    "default": "1024x1024"
  },
  "quality": {
    "values": ["standard", "hd"],
    "default": "standard"
  },
  "input_fidelity": {
    "values": ["high", "low"],
    "default": "low"
  },
  "output_format": {
    "values": ["png", "jpeg", "webp"],
    "default": "jpeg"
  },
  "n": {
    "type": "integer",
    "min": 1,
    "max": 10,
    "default": 1
  },
  "reference_image": 1
}

Flux 2 Flex (supports multiple reference images)

{
  "size": {
    "values": ["1K", "2K", "4K", "auto"],
    "default": "auto"
  },
  "aspect_ratio": {
    "values": ["16:9", "1:1", "4:3"],
    "default": "16:9"
  },
  "safety_tolerance": {
    "type": "integer",
    "min": 0,
    "max": 5,
    "default": 2
  },
  "seed": {
    "type": "integer"
  },
  "raw": {
    "type": "boolean",
    "default": false
  },
  "reference_image": 5
}

Note: Flux 2 Flex supports up to 5 reference images for style blending.

Flux 2 Pro

{
  "size": {
    "values": ["1K", "2K", "4K", "auto"],
    "default": "auto"
  },
  "aspect_ratio": {
    "values": ["16:9", "1:1", "4:3"],
    "default": "16:9"
  },
  "safety_tolerance": {
    "type": "integer",
    "min": 0,
    "max": 5,
    "default": 2
  },
  "seed": {
    "type": "integer"
  },
  "raw": {
    "type": "boolean",
    "default": false
  },
  "reference_image": 1
}

Qwen Image

{
  "size": {
    "values": ["512*1024", "768*512", "768*1024", "1024*576", "576*1024", "1024*1024"],
    "default": "1024*1024"
  },
  "watermark": {
    "type": "boolean",
    "default": true
  },
  "seed": {
    "type": "integer",
    "min": 0,
    "max": 2147483647
  },
  "reference_image": 1
}

Doubao SeedDream 4.5

{
  "size": {
    "values": ["2K", "4K", "auto"],
    "default": "auto"
  },
  "watermark": {
    "type": "boolean",
    "default": true
  },
  "seed": {
    "type": "integer",
    "min": -1,
    "max": 2147483647,
    "default": -1
  },
  "response_format": {
    "values": ["url", "base64_json"],
    "default": "url"
  },
  "sequential_image_generation": {
    "values": ["auto", "disabled"],
    "default": "auto"
  },
  "sequential_image_generation_options": {
    "type": "object",
    "properties": {
      "max_images": {
        "type": "integer",
        "min": 1,
        "max": 15,
        "default": 15
      }
    }
  },
  "reference_image": 1
}

Imagen 4.0 Fast

{
  "size": {
    "values": ["1K", "2K", "4K", "auto"],
    "default": "auto"
  },
  "quality": {
    "values": ["low", "medium", "high"],
    "default": "medium"
  },
  "reference_image": 1
}

6.2 Video Models

Sora 2 (with conditional constraints)

{
  "size": {
    "values": ["720x1280", "1280x720"],
    "default": "720x1280"
  },
  "seconds": {
    "values": ["4", "8", "12"],
    "default": "4",
    "constraints": [
      {
        "when": { "size": ["720x1280"] },
        "then": { "values": ["4", "8"] },
        "reason": "Portrait high resolution is capped at 8 seconds"
      },
      {
        "when": { "size": ["1280x720"] },
        "then": { "values": ["4", "8", "12"] },
        "reason": "Landscape supports up to 12 seconds"
      }
    ]
  },
  "input_reference": 1
}

Note: this is a hypothetical example; the actual constraints must be adjusted to AIHubMix's real limits. If Sora has no such restriction, constraints is not needed.

Veo 3.1

{
  "size": { "values": ["720P", "1080P"], "default": "720P" },
  "seconds": { "values": ["4", "6", "8"], "default": "8" },
  "input_reference": 1
}

Wan 2.5 T2V (text-to-video)

{
  "size": {
    "values": [
      "832x480", "480x832", "624x624",
      "1088x832", "832x1088", "960x960", "1280x720", "720x1280",
      "1248x1632", "1632x1248", "1440x1440", "1080x1920", "1920x1080"
    ],
    "default": "1280x720"
  },
  "seconds": { "values": ["5", "10"], "default": "5" },
  "input_reference": 0
}

Wan 2.2/2.5 i2v (image-to-video, input_reference required)

{
  "size": {
    "values": [
      "832x480", "480x832", "624x624",
      "1088x832", "832x1088", "960x960", "1280x720", "720x1280",
      "1248x1632", "1632x1248", "1440x1440", "1080x1920", "1920x1080"
    ],
    "default": "1280x720"
  },
  "seconds": { "values": ["5"], "default": "5" },
  "input_reference": 1,
  "video_mode": "i2v"
}

For wan2.5-i2v-preview, seconds supports 5 and 10.

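7. Implementation Plan

7.1 Phase 1: Database and Model

Add the capabilities column (Alembic migration)
Update the AIModel model definition
Populate capabilities for existing image/video models (data migration script)

7.2 Phase 2: API Layer

Define the "full capability enumeration" constant
Implement the DB capabilities → frontend schema conversion (fill in supported: false, snake_case → camelCase)
Return capabilities in the unified format from the model list/detail endpoints

7.3 Phase 3: Provider Refactoring

AIHubMixProvider reads defaults such as size and seconds from model.capabilities
Validate request parameters against capabilities
Remove the hard-coded branches keyed on model_name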
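Reading defaults from capabilities could look roughly like the sketch below. `apply_defaults` is a hypothetical helper, and whether every default should actually be sent to AIHubMix is left open here; the sketch only shows where the values would come from.

```python
def apply_defaults(params: dict, capabilities: dict) -> dict:
    """Return a copy of params with missing values filled from capability defaults."""
    merged = dict(params)  # do not mutate the caller's dictionary
    for name, spec in capabilities.items():
        if name in merged:
            continue
        # Only dict-shaped definitions carry a default; integer entries such as
        # reference_image / input_reference are limits, not parameter values
        if isinstance(spec, dict) and "default" in spec:
            merged[name] = spec["default"]
    return merged


# Usage with the Veo 3.1 configuration from section 6.2
veo_capabilities = {
    "size": {"values": ["720P", "1080P"], "default": "720P"},
    "seconds": {"values": ["4", "6", "8"], "default": "8"},
    "input_reference": 1,
}
request_params = apply_defaults({"seconds": "4"}, veo_capabilities)
```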

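7.4 Phase 4: Frontend

Define the TypeScript AiModelCapabilities interface (with the full field set)
Render parameter controls dynamically from supported and values
Send the user's selected parameters with the generation request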

8. Appendix

8.1 capabilities Field Type Conventions

type	Structure example	Description
enum	`{ "values": ["a","b"], "default": "a" }`	Enum type; the list of allowed values is required
integer	`{ "type": "integer", "min": 0, "max": 5, "default": 2 }`	Integer range type
boolean	`{ "type": "boolean", "default": true }`	Boolean type
object	`{ "type": "object", "properties": {...} }`	Object type (e.g. sequential_image_generation_options)

TypeScript type definition:

type CapabilityValue = 
  | { values: string[], default: string }                    // enum
  | { type: "integer", min?: number, max?: number, default?: number }  // integer range
  | { type: "boolean", default: boolean }                    // boolean
  | { type: "object", properties: Record<string, any> }      // object
  | { supported: false }                                     // unsupported
  | { supported: true, num: number }                         // reference image count

8.2 AIHubMix Model and Parameter Mapping

Model name	model_type	size examples	seconds
dall-e-3	2	1024x1024, 1792x1024, 1024x1792	-
gpt-image-1.5	2	1024x1024, etc.	-
flux-2-pro	2	1K, 2K, 4K, auto	-
qwen-image	2	512*1024, 1024*1024, etc.	-
doubao-seedream-4-5	2	2K, 4K, auto	-
imagen-4.0-fast-generate-001	2	1K, 2K, 4K, auto	-
sora-2	3	720x1280, 1280x720	4, 8, 12
sora-2-pro	3	720x1280, 1280x720	4, 8, 12
veo-3.1-generate-preview	3	720P, 1080P	4, 6, 8
veo-3.0-generate-preview	3	720P, 1080P	4, 6, 8
wan2.2-t2v-plus	3	See section 4.2	5
wan2.5-t2v-preview	3	See section 4.2	5, 10
wan2.2-i2v-plus	3	See section 4.2	5
wan2.5-i2v-preview	3	See section 4.2	5, 10

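8.3 Backward Compatibility

Models whose capabilities is empty ({}): the API returns the full field set with every item supported: false, and the Provider falls back to the existing hard-coded logic
Legacy endpoints: endpoints that do not depend on capabilities remain unchanged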

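9. Decision Log

Decision	Rationale
Add a capabilities column	Separates responsibilities from config and makes querying and indexing by capability easier
Per-model capabilities	Models genuinely differ in their parameters, so each is configured as needed
Unified frontend field contract	The frontend uses one schema and controls display through supported
API fills in supported: false	The frontend does not need to check whether a key exists, which simplifies its logic
Keep config	Continues to hold runtime configuration such as timeout, without mixing in capability descriptions
Support parameter dependencies	Conditional constraints such as duration ← size/aspectRatio are expressed through `constraints`, and the frontend updates its options dynamically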

10. Parameter Conversion When Calling the AIHubMix API

10.1 Image Generation API Call Format

AIHubMix image generation endpoint:

POST https://aihubmix.com/v1/models/<provider>/<model_name>/predictions

Request body format:

{
  "input": {
    "prompt": "...",
    "size": "...",
    "quality": "...",
    ...
  }
}

Parameter mapping (ours → AIHubMix)

Our parameter	AIHubMix API parameter	Description
prompt	prompt	Prompt (identical)
width + height	size	Joined as `"{width}x{height}"`
quality	quality	Quality tier (identical)
reference_images	image	Single image: pass the URL directly; multiple images: an array `["url1", "url2"]`
num_images	n	Number of images
watermark	watermark	Watermark toggle
seed	seed	Random seed
aspect_ratio	aspect_ratio	Flux only
safety_tolerance	safety_tolerance	Flux only
raw_mode	raw	Flux only
input_fidelity	input_fidelity	OpenAI only
moderation_level	moderation	OpenAI only
output_format	output_format	OpenAI only
response_format	response_format	Doubao only
sequential_generation	sequential_image_generation	Doubao only
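The mapping table can be expressed as a lookup plus a small conversion function. This is a sketch only: `IMAGE_PARAM_MAP` and `to_aihubmix_image_input` are hypothetical names, a pre-formatted `size` (such as "2K") is assumed to pass through unchanged, and width/height are always joined with "x" here (section 10.3 covers the other size formats).

```python
# Our parameter name → AIHubMix parameter name, for one-to-one renames
IMAGE_PARAM_MAP = {
    "prompt": "prompt",
    "quality": "quality",
    "num_images": "n",
    "watermark": "watermark",
    "seed": "seed",
    "aspect_ratio": "aspect_ratio",
    "safety_tolerance": "safety_tolerance",
    "raw_mode": "raw",
    "input_fidelity": "input_fidelity",
    "moderation_level": "moderation",
    "output_format": "output_format",
    "response_format": "response_format",
    "sequential_generation": "sequential_image_generation",
}


def to_aihubmix_image_input(params: dict) -> dict:
    """Convert our image parameters into an AIHubMix request body."""
    out = {IMAGE_PARAM_MAP[k]: v for k, v in params.items() if k in IMAGE_PARAM_MAP}

    # size: pass a pre-formatted value through, otherwise join width and height
    if "size" in params:
        out["size"] = params["size"]
    elif "width" in params and "height" in params:
        out["size"] = f"{params['width']}x{params['height']}"

    # image: a single URL as a string, several URLs as an array
    refs = params.get("reference_images") or []
    if len(refs) == 1:
        out["image"] = refs[0]
    elif len(refs) > 1:
        out["image"] = list(refs)

    # Image requests nest their parameters under "input"
    return {"input": out}
```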

Call example (OpenAI)

# Our API parameters
{
  "prompt": "A cat in the garden",
  "width": 1024,
  "height": 1024,
  "quality": "high",
  "reference_images": ["https://example.com/cat.jpg"]
}

# Converted AIHubMix API call (run inside an async function)
import httpx

async with httpx.AsyncClient() as client:
    response = await client.post(
        "https://aihubmix.com/v1/models/openai/gpt-image-1.5/predictions",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        },
        json={
            "input": {
                "prompt": "A cat in the garden",
                "size": "1024x1024",          # width x height
                "quality": "high",
                "image": "https://example.com/cat.jpg"  # use the first image
            }
        }
    )

Call example (Flux, asynchronous)

# Run inside an async function
import httpx

async with httpx.AsyncClient() as client:
    # Step 1: submit the generation request
    response = await client.post(
        "https://aihubmix.com/v1/models/bfl/flux-2-pro/predictions",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        },
        json={
            "input": {
                "prompt": "A cat in the garden",
                "aspect_ratio": "16:9",
                "safety_tolerance": 2
            }
        }
    )

    task_id = response.json()["output"][0]["taskId"]

    # Step 2: poll for the result
    result = await client.get(
        f"https://api.aihubmix.com/v1/tasks/{task_id}",
        headers={"Authorization": f"Bearer {api_key}"}
    )

Call example (Doubao, multiple reference images)

# Our API parameters
{
  "prompt": "Replace the outfit in image 1 with the outfit from image 2",
  "reference_images": [
    "https://example.com/image1.jpg",
    "https://example.com/image2.jpg"
  ],
  "size": "2K"
}

# Converted AIHubMix API call (run inside an async function)
import httpx

async with httpx.AsyncClient() as client:
    response = await client.post(
        "https://aihubmix.com/v1/models/doubao/doubao-seedream-4-5/predictions",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        },
        json={
            "input": {
                "prompt": "Replace the outfit in image 1 with the outfit from image 2",
                "image": [
                    "https://example.com/image1.jpg",
                    "https://example.com/image2.jpg"
                ],  # multiple images are passed as an array
                "size": "2K",
                "sequential_image_generation": "disabled",
                "watermark": False
            }
        }
    )

10.2 Video Generation API Call Format

AIHubMix video generation endpoint:

POST https://aihubmix.com/v1/videos

Request body format (JSON):

{
  "model": "sora-2",
  "prompt": "...",
  "size": "720x1280",
  "seconds": "4"
}

Request body format (with a guide image, multipart/form-data):

--form 'model="sora-2"'
--form 'prompt="..."'
--form 'size="1280x720"'
--form 'seconds="4"'
--form 'input_reference=@"/path/to/image.jpg"'

Parameter mapping (ours → AIHubMix)

Our parameter	AIHubMix API parameter	Description
model_name	model	Model name (identical)
prompt	prompt	Prompt (identical)
width + height	size	Sora/Wan: `"{width}x{height}"`; Veo: `"720P"`
duration	seconds	Video duration, converted to a string such as `"4"`
reference_image	input_reference	Image-to-video: file or URL

Call example (Sora text-to-video, JSON)

# Our API parameters
{
  "model": "sora-2",
  "prompt": "A cat playing in the garden",
  "width": 720,
  "height": 1280,
  "duration": 4
}

# Converted AIHubMix API call (run inside an async function)
import httpx

async with httpx.AsyncClient() as client:
    response = await client.post(
        "https://aihubmix.com/v1/videos",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        },
        json={
            "model": "sora-2",
            "prompt": "A cat playing in the garden",
            "size": "720x1280",     # width x height
            "seconds": "4"          # integer converted to a string
        }
    )

Call example (Sora image-to-video, multipart)

# Our API parameters
{
  "model": "sora-2",
  "prompt": "The kitten is taking a nap",
  "width": 1280,
  "height": 720,
  "duration": 4,
  "reference_image": "https://example.com/cat.jpg"
}

# Run inside an async function
import httpx

async with httpx.AsyncClient() as client:
    # Download the reference image (when it is given as a URL)
    image_response = await client.get("https://example.com/cat.jpg")
    image_data = image_response.content

    # Converted AIHubMix API call (multipart); httpx sets the
    # multipart Content-Type header itself when files= is used
    response = await client.post(
        "https://aihubmix.com/v1/videos",
        headers={"Authorization": f"Bearer {api_key}"},
        data={
            "model": "sora-2",
            "prompt": "The kitten is taking a nap",
            "size": "1280x720",
            "seconds": "4"
        },
        files={
            "input_reference": ("cat.jpg", image_data, "image/jpeg")
        }
    )

Call example (Veo)

# Veo uses resolution tiers (720P/1080P), not pixel dimensions
{
  "model": "veo-3.1-generate-preview",
  "prompt": "A beautiful landscape",
  "size": "1080P",      # note: a tier, not pixels
  "duration": 8
}

# Converted AIHubMix API call (run inside an async function)
import httpx

async with httpx.AsyncClient() as client:
    response = await client.post(
        "https://aihubmix.com/v1/videos",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        },
        json={
            "model": "veo-3.1-generate-preview",
            "prompt": "A beautiful landscape",
            "size": "1080P",        # pass the tier string directly
            "seconds": "8"
        }
    )

10.3 Core Provider Parameter Conversion Logic

AIHubMixProvider needs to implement:

def _convert_size_for_api(
    self,
    width: int,
    height: int,
    model_capabilities: dict
) -> str:
    """Convert the size parameter into the format the model's capabilities expect."""

    size_cap = model_capabilities.get('size', {})
    values = size_cap.get('values', [])

    # 1. Tier format (Veo: 720P/1080P). The tier is named after the short
    #    side, so 1280x720 maps to 720P
    if any(v.endswith('P') for v in values):
        if min(width, height) <= 720:
            return '720P'
        return '1080P'

    # 2. K format (Flux/Imagen: 1K/2K/4K), chosen by the long side
    if any(v.endswith('K') for v in values):
        max_dimension = max(width, height)
        if max_dimension <= 1024:
            return '1K'
        elif max_dimension <= 2048:
            return '2K'
        elif max_dimension <= 4096:
            return '4K'
        return 'auto'

    # 3. "*" separator (Qwen: 512*1024)
    if any('*' in v for v in values):
        return f"{width}*{height}"

    # 4. Standard "x" separator (OpenAI/Sora/Wan: 1024x1024)
    return f"{width}x{height}"

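10.4 Parameter Conversion Checklist

The following must be checked before calling the AIHubMix API:

Check	Description
✅ size format	Choose the format and separator based on capabilities.size.values
✅ seconds type	Convert the integer to a string with `str(duration)`
✅ reference_image count	Must not exceed the `capabilities.reference_image` limit
✅ multipart vs JSON	Video requests that include input_reference use multipart/form-data; all other requests use a JSON body
✅ Nested structure	Images use `{ "input": {...} }`; videos use a flat `{...}`
✅ Model-specific parameters	Pass them only when they exist in capabilities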
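The last check, sending model-specific parameters only when capabilities lists them, could be sketched as a filter over the already-converted AIHubMix parameters. `keep_supported` and `ALWAYS_SENT` are hypothetical names; the sketch assumes AIHubMix parameter names match the DB capabilities keys, with `image` as the one exception, gated by `reference_image`.

```python
# Parameters that are not capability-gated (assumption: only the prompt)
ALWAYS_SENT = {"prompt"}


def keep_supported(api_input: dict, capabilities: dict) -> dict:
    """Drop AIHubMix image parameters that the model's capabilities do not list."""
    kept = {}
    for name, value in api_input.items():
        if name in ALWAYS_SENT:
            kept[name] = value
        elif name == "image":
            # Reference images are gated by the integer reference_image limit
            if capabilities.get("reference_image", 0) > 0:
                kept[name] = value
        elif name in capabilities:
            kept[name] = value
    return kept


# Usage with a Qwen-style configuration: aspect_ratio is dropped
qwen_capabilities = {
    "size": {"values": ["1024*1024"], "default": "1024*1024"},
    "watermark": {"type": "boolean", "default": True},
    "seed": {"type": "integer", "min": 0, "max": 2147483647},
    "reference_image": 1,
}
filtered = keep_supported(
    {
        "prompt": "A cat in the garden",
        "size": "1024*1024",
        "aspect_ratio": "16:9",
        "watermark": False,
        "image": "https://example.com/cat.jpg",
    },
    qwen_capabilities,
)
```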
