# RFC 144: ai_models 模型参数能力配置规划

## 概述

基于 AIHubMix 图片/视频生成 API 文档,规划 `ai_models` 表的能力配置方案,使不同模型支持差异化参数,同时为前端提供统一的 API 契约。

**参考文档**:

- [图片生成](https://docs.aihubmix.com/en/api/Image-Gen)
- [视频生成](https://docs.aihubmix.com/cn/api/Video-Gen)

**当前架构**:

```
API → Service → Celery Task → AIProviderFactory → AIHubMixProvider → AIHubMix API
```

**核心问题**:

- `AIHubMixProvider` 中硬编码了模型参数逻辑(if/elif 判断)
- 每次新增模型都需要修改代码
- 前端无法动态获取模型支持的参数
- 不同模型参数差异巨大(OpenAI、Flux、Qwen、Doubao、Imagen 等)

---

## 1. 方案概要

| 决策 | 说明 |
|------|------|
| **新增 capabilities 列** | 独立 JSONB 列,与 config 职责分离(config=运行时配置,capabilities=能力描述) |
| **每模型独立配置** | 每个模型的 capabilities 只存其支持的参数子集,参数不一致是预期行为 |
| **前端统一契约** | API 层定义「全量能力字段枚举」,对不支持的项返回 `supported: false`,前端按统一 schema 渲染 |
| **config 保留** | 继续用于 timeout、API 特有参数等运行时配置 |

---

## 2. 数据库设计

### 2.1 表结构调整

```diff
 ai_models
 - model_id
 - model_name
 - model_type
 - config        # 运行时配置:timeout、API 特定参数等
+ capabilities   # 模型能力:size/quality/duration 等可选值与约束(JSONB)
 ...
```

### 2.2 迁移脚本

```python
# alembic/versions/xxxx_add_capabilities_to_ai_models.py
import sqlalchemy as sa
from alembic import op
from sqlalchemy.dialects import postgresql


def upgrade():
    op.add_column('ai_models', sa.Column(
        'capabilities',
        postgresql.JSONB,
        nullable=False,
        server_default='{}',
        comment='模型参数能力配置(尺寸、质量、时长等可选值与约束)'
    ))
    op.create_index(
        'idx_ai_models_capabilities_gin',
        'ai_models',
        ['capabilities'],
        postgresql_using='gin'
    )


def downgrade():
    op.drop_index('idx_ai_models_capabilities_gin', table_name='ai_models')
    op.drop_column('ai_models', 'capabilities')
```

### 2.3 DB capabilities 存储规则

- 每个模型只存**其支持的参数**,key 用 snake_case
- 不支持的能力不写入,由 API 层补全为 `supported: false`

---

## 3. 前端统一契约

### 3.1 全量能力字段枚举

| 字段 | 说明 | 适用 model_type | 适用模型 |
|------|------|-----------------|----------|
| `size` | 尺寸/分辨率 | 2(图)、3(视频) | 所有 |
| `quality` | 质量档位 | 2 | OpenAI, 通用 |
| `aspectRatio` | 宽高比 | 2 | Flux |
| `duration` | 视频时长(秒) | 3 | 所有视频模型 |
| `referenceImage` | 支持的参考图数量(0=不支持) | 2、3 | 支持图生图/图生视频的模型 |
| `watermark` | 水印开关 | 2 | Qwen, Doubao |
| `seed` | 随机种子 | 2 | Flux, Qwen, Doubao |
| `outputFormat` | 输出格式(png/jpeg/webp) | 2 | OpenAI |
| `moderation` | 内容审核档位 | 2 | OpenAI |
| `inputFidelity` | 输入保真度(high/low) | 2 | OpenAI |
| `safetyTolerance` | 审核宽松度(0-5) | 2 | Flux |
| `raw` | 原始模式(更自然的视觉效果) | 2 | Flux |
| `responseFormat` | 返回格式(url/base64_json) | 2 | Doubao |
| `sequentialImageGeneration` | 连续图片生成控制 | 2 | Doubao |
| `n` | 生成数量(1-10) | 2 | OpenAI, 通用 |

### 3.2 API 响应 schema(统一结构)

**规则**:所有模型的能力 API 返回同一套 key,不支持的项为 `{ "supported": false }`。

```json
{
  "modelId": "xxx",
  "modelName": "flux-2-pro",
  "modelType": 2,
  "capabilities": {
    "size": { "supported": true, "values": ["1K", "2K", "4K", "auto"], "default": "auto" },
    "quality": { "supported": false },
    "aspectRatio": { "supported": true, "values": ["16:9", "1:1", "4:3"], "default": "16:9" },
    "duration": { "supported": false },
    "referenceImage": { "supported": true, "num": 5 },
    "watermark": { "supported": false },
    "seed": { "supported": true },
    "outputFormat": { "supported": false },
    "moderation": { "supported": false }
  }
}
```

**视频模型示例**(Sora 2):

```json
{
  "capabilities": {
    "size": { "supported": true, "values": ["720x1280", "1280x720"], "default": "720x1280" },
    "quality": { "supported": false },
    "aspectRatio": { "supported": false },
    "duration": { "supported": true, "values": ["4", "8", "12"], "default": "4" },
    "referenceImage": { "supported": true, "num": 1 },
    "watermark": { "supported": false },
    "seed": { "supported": false },
    "outputFormat": { "supported": false },
    "moderation": { "supported": false }
  }
}
```

### 3.3 DB → API 映射

| DB capabilities key | API 响应 key | 说明 |
|---------------------|--------------|------|
| size | size | 尺寸/分辨率 |
| quality | quality | 质量档位 |
| aspect_ratio | aspectRatio | 宽高比 |
| seconds | duration | 视频时长 |
| reference_image / input_reference | referenceImage | 参考图数量(0=不支持) |
| watermark | watermark | 水印开关 |
| seed | seed | 随机种子 |
| output_format | outputFormat | 输出格式 |
| moderation | moderation | 内容审核 |
| input_fidelity | inputFidelity | 输入保真度 |
| safety_tolerance | safetyTolerance | 审核宽松度 |
| raw | raw | 原始模式 |
| response_format | responseFormat | 返回格式 |
| sequential_image_generation | sequentialImageGeneration | 连续图片生成 |
| n | n | 生成数量 |

**API 层职责**:

1. 读取 DB capabilities(snake_case)
2. 按全量枚举补全缺失字段为 `{ "supported": false }`
3. 将 snake_case 转为 camelCase 输出
4. 特殊处理 `reference_image` 和 `input_reference` 的数字转对象格式
5. 保持类型结构一致性

#### 转换逻辑示例(Python)

```python
def transform_capabilities_to_api(db_capabilities: dict, model_type: int) -> dict:
    """将 DB capabilities 转换为前端 API 格式"""
    # 定义全量能力字段(按 model_type)
    if model_type == 2:  # 图片
        all_fields = [
            'size', 'quality', 'aspectRatio', 'referenceImage', 'watermark',
            'seed', 'outputFormat', 'moderation', 'inputFidelity',
            'safetyTolerance', 'raw', 'responseFormat',
            'sequentialImageGeneration', 'n'
        ]
    elif model_type == 3:  # 视频
        all_fields = ['size', 'duration', 'referenceImage']
    else:
        all_fields = []

    result = {}
    for field in all_fields:
        # camelCase → snake_case 映射
        db_key = camel_to_snake(field)

        # 特殊处理:reference_image / input_reference(数字 → 对象)
        if field == 'referenceImage':
            db_key = 'reference_image' if model_type == 2 else 'input_reference'
            num_value = db_capabilities.get(db_key, 0)
            if num_value > 0:
                result[field] = {"supported": True, "num": num_value}
            else:
                result[field] = {"supported": False}
            continue

        # 特殊处理:seconds → duration
        if field == 'duration':
            db_key = 'seconds'

        # 通用处理
        if db_key in db_capabilities:
            value = db_capabilities[db_key]
            if isinstance(value, dict):
                # 枚举/范围/布尔等对象统一展开(包含 values、default、type 等)
                result[field] = {"supported": True, **value}
            else:
                result[field] = {"supported": True}
        else:
            # 不存在则标记为不支持
            result[field] = {"supported": False}
    return result


def camel_to_snake(name: str) -> str:
    """camelCase → snake_case"""
    import re
    return re.sub(r'(?<!^)(?=[A-Z])', '_', name).lower()
```

### 8.1 能力字段的前端类型(片段)

```ts
type CapabilityValue =
  | { supported: true; values?: string[]; default?: string }  // 对象
  | { supported: false }                                      // 不支持
  | { supported: true; num: number }                          // 参考图数量
```

### 8.2 AIHubMix 模型与参数映射

| 模型名 | model_type | size 示例 | seconds |
|--------|------------|----------|---------|
| dall-e-3 | 2 | 1024x1024, 1792x1024, 1024x1792 | - |
| gpt-image-1.5 | 2 | 1024x1024 等 | - |
| flux-2-pro | 2 | 1K, 2K, 4K, auto | - |
| qwen-image | 2 | 512*1024, 1024*1024 等 | - |
| doubao-seedream-4-5 | 2 | 2K, 4K, auto | - |
| imagen-4.0-fast-generate-001 | 2 | 1K, 2K, 4K, auto | - |
| sora-2 | 3 | 720x1280, 1280x720 | 4, 8, 12 |
| sora-2-pro | 3 | 720x1280, 1280x720 | 4, 8, 12 |
| veo-3.1-generate-preview | 3 | 720P, 1080P | 4, 6, 8 |
| veo-3.0-generate-preview | 3 | 720P, 1080P | 4, 6, 8 |
| wan2.2-t2v-plus | 3 | 见 4.2 节 | 5 |
| wan2.5-t2v-preview | 3 | 见 4.2 节 | 5, 10 |
| wan2.2-i2v-plus | 3 | 见 4.2 节 | 5 |
| wan2.5-i2v-preview | 3 | 见 4.2 节 | 5 |

### 8.3 向后兼容

- capabilities 为空 `{}` 的模型:API 返回全量字段且均为 `supported: false`,Provider 回退原有硬编码逻辑
- 旧接口:不依赖 capabilities 的接口保持不变

---

## 9. 决策记录

| 决策 | 理由 |
|------|------|
| 新增 capabilities 列 | 与 config 职责分离,便于按能力查询、索引 |
| 每模型独立 capabilities | 不同模型参数本就不同,按需配置 |
| 前端统一字段契约 | 前端用同一 schema,按 supported 控制展示 |
| API 补全 supported: false | 前端无需判断 key 存在性,逻辑更简单 |
| config 保留 | 继续承载 timeout 等运行时配置,不混入能力描述 |
| **支持参数依赖关系** | duration ← size/aspectRatio 等条件约束通过 `constraints` 表达,前端动态更新选项 |

---

## 10. 调用 AIHubMix API 的参数转换

### 10.1 图片生成 API 调用格式

**AIHubMix 图片生成端点**:

```
POST https://aihubmix.com/v1/models/{vendor}/{model}/predictions
```

**请求体格式**:

```json
{
  "input": {
    "prompt": "...",
    "size": "...",
    "quality": "...",
    ...
  }
}
```

#### 参数映射表(我们 → AIHubMix)

| 我们的参数名 | AIHubMix API 参数名 | 说明 |
|-------------|---------------------|------|
| prompt | prompt | 提示词(一致) |
| width + height | size | 拼接为 `"{width}x{height}"` |
| quality | quality | 质量档位(一致) |
| reference_images | image | 单张:直接传 URL;多张:数组 `["url1", "url2"]` |
| num_images | n | 生成数量 |
| watermark | watermark | 水印开关 |
| seed | seed | 随机种子 |
| aspect_ratio | aspect_ratio | Flux 专用 |
| safety_tolerance | safety_tolerance | Flux 专用 |
| raw_mode | raw | Flux 专用 |
| input_fidelity | input_fidelity | OpenAI 专用 |
| moderation_level | moderation | OpenAI 专用 |
| output_format | output_format | OpenAI 专用 |
| response_format | response_format | Doubao 专用 |
| sequential_generation | sequential_image_generation | Doubao 专用 |

#### 调用示例(OpenAI)

```python
# 我们的接口参数
{
    "prompt": "A cat in the garden",
    "width": 1024,
    "height": 1024,
    "quality": "high",
    "reference_images": ["https://example.com/cat.jpg"]
}

# 转换为 AIHubMix API 调用
import httpx

async with httpx.AsyncClient() as client:
    response = await client.post(
        "https://aihubmix.com/v1/models/openai/gpt-image-1.5/predictions",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        },
        json={
            "input": {
                "prompt": "A cat in the garden",
                "size": "1024x1024",  # width x height
                "quality": "high",
                "image": "https://example.com/cat.jpg"  # 取第一张
            }
        }
    )
```

#### 调用示例(Flux 异步)

```python
async with httpx.AsyncClient() as client:
    # 步骤 1:发起生成请求
    response = await client.post(
        "https://aihubmix.com/v1/models/bfl/flux-2-pro/predictions",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        },
        json={
            "input": {
                "prompt": "A cat in the garden",
                "aspect_ratio": "16:9",
                "safety_tolerance": 2
            }
        }
    )
    task_id = response.json()["output"][0]["taskId"]

    # 步骤 2:轮询获取结果
    result = await client.get(
        f"https://api.aihubmix.com/v1/tasks/{task_id}",
        headers={"Authorization": f"Bearer {api_key}"}
    )
```

#### 调用示例(Doubao 多参考图)

```python
# 我们的接口参数
{
    "prompt": "将图1的服装换为图2的服装",
    "reference_images": [
        "https://example.com/image1.jpg",
        "https://example.com/image2.jpg"
    ],
    "size": "2K"
}

# 转换为 AIHubMix API 调用
async with httpx.AsyncClient() as client:
    response = await client.post(
        "https://aihubmix.com/v1/models/doubao/doubao-seedream-4-5/predictions",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        },
        json={
            "input": {
                "prompt": "将图1的服装换为图2的服装",
                "image": [
                    "https://example.com/image1.jpg",
                    "https://example.com/image2.jpg"
                ],  # 多张图片用数组
                "size": "2K",
                "sequential_image_generation": "disabled",
                "watermark": False  # Python 布尔值,序列化后为 false
            }
        }
    )
```

### 10.2 视频生成 API 调用格式

**AIHubMix 视频生成端点**:

```
POST https://aihubmix.com/v1/videos
```

**请求体格式(JSON)**:

```json
{
  "model": "sora-2",
  "prompt": "...",
  "size": "720x1280",
  "seconds": "4"
}
```

**请求体格式(带引导图 - multipart/form-data)**:

```
--form 'model="sora-2"'
--form 'prompt="..."'
--form 'size="1280x720"'
--form 'seconds="4"'
--form 'input_reference=@"/path/to/image.jpg"'
```

#### 参数映射表(我们 → AIHubMix)

| 我们的参数名 | AIHubMix API 参数名 | 说明 |
|-------------|---------------------|------|
| model_name | model | 模型名称(一致) |
| prompt | prompt | 提示词(一致) |
| width + height | size | Sora/Wan: `"{width}x{height}"`;Veo: `"720P"` 等档位 |
| duration | seconds | 视频时长,转为字符串 `"4"` |
| reference_image | input_reference | 图生视频:文件或 URL |

#### 调用示例(Sora 文生视频 - JSON)

```python
# 我们的接口参数
{
    "model": "sora-2",
    "prompt": "A cat playing in the garden",
    "width": 720,
    "height": 1280,
    "duration": 4
}

# 转换为 AIHubMix API 调用
async with httpx.AsyncClient() as client:
    response = await client.post(
        "https://aihubmix.com/v1/videos",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        },
        json={
            "model": "sora-2",
            "prompt": "A cat playing in the garden",
            "size": "720x1280",  # width x height
            "seconds": "4"  # 整数转字符串
        }
    )
```

#### 调用示例(Sora 图生视频 - multipart)

```python
# 我们的接口参数
{
    "model": "sora-2",
    "prompt": "The kitten is taking a nap",
    "width": 1280,
    "height": 720,
    "duration": 4,
    "reference_image": "https://example.com/cat.jpg"
}

# 下载参考图(如果是 URL),再以 multipart 调用 AIHubMix API
async with httpx.AsyncClient() as client:
    image_response = await client.get("https://example.com/cat.jpg")
    image_data = image_response.content

    response = await client.post(
        "https://aihubmix.com/v1/videos",
        headers={"Authorization": f"Bearer {api_key}"},
        data={
            "model": "sora-2",
            "prompt": "The kitten is taking a nap",
            "size": "1280x720",
            "seconds": "4"
        },
        files={
            "input_reference": ("cat.jpg", image_data, "image/jpeg")
        }
    )
```

#### 调用示例(Veo)

```python
# Veo 使用分辨率档位(720P/1080P),不是像素值
{
    "model": "veo-3.1-generate-preview",
    "prompt": "A beautiful landscape",
    "size": "1080P",  # 注意:是档位,不是像素
    "duration": 8
}

# 转换为 AIHubMix API 调用
async with httpx.AsyncClient() as client:
    response = await client.post(
        "https://aihubmix.com/v1/videos",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        },
        json={
            "model": "veo-3.1-generate-preview",
            "prompt": "A beautiful landscape",
            "size": "1080P",  # 直接传档位字符串
            "seconds": "8"
        }
    )
```

### 10.3 Provider 参数转换核心逻辑

在 `AIHubMixProvider` 中需实现:

```python
def _convert_size_for_api(
    self, width: int, height: int, model_capabilities: dict
) -> str:
    """根据模型 capabilities 转换 size 参数格式"""
    size_cap = model_capabilities.get('size', {})
    values = size_cap.get('values', [])

    # 1. 档位格式(Veo: 720P/1080P)
    if any('P' in v for v in values):
        max_dimension = max(width, height)
        if max_dimension <= 720:
            return '720P'
        return '1080P'

    # 2. K 格式(Flux/Imagen: 1K/2K/4K)
    if any('K' in v for v in values):
        max_dimension = max(width, height)
        if max_dimension <= 1024:
            return '1K'
        elif max_dimension <= 2048:
            return '2K'
        elif max_dimension <= 4096:
            return '4K'
        return 'auto'

    # 3. * 分隔(Qwen: 512*1024)
    if any('*' in v for v in values):
        return f"{width}*{height}"

    # 4. 标准 x 分隔(OpenAI/Sora/Wan: 1024x1024)
    return f"{width}x{height}"
```

### 10.4 参数转换检查清单

调用 AIHubMix API 前必须检查:

| 检查项 | 说明 |
|--------|------|
| ✅ size 格式 | 根据 capabilities.size.values 判断用哪种分隔符 |
| ✅ seconds 类型 | 整数转字符串 `str(duration)` |
| ✅ reference_image 数量 | 不超过 `capabilities.reference_image` 限制 |
| ✅ multipart vs JSON | 视频带引导图(input_reference)时用 multipart,否则用 JSON |
| ✅ 嵌套结构 | 图片用 `{ "input": {...} }`,视频直接 `{...}` |
| ✅ 模型专属参数 | 仅当 capabilities 中存在时才传递 |

---
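上面检查清单中「模型专属参数仅当 capabilities 中存在时才传递」一条,可以用下面的草图示意(`build_input_payload` 与 `optional_keys` 均为假设性命名,映射节选自 10.1 的参数映射表,并假设 DB capabilities 中「存在 key 即代表支持」):

```python
def build_input_payload(params: dict, capabilities: dict) -> dict:
    """仅透传模型 capabilities 中声明支持的可选参数(假设性辅助函数)。"""
    # 节选「我们的参数名 → AIHubMix API 参数名」映射(见 10.1 映射表)
    optional_keys = {
        "seed": "seed",
        "watermark": "watermark",
        "aspect_ratio": "aspect_ratio",
        "safety_tolerance": "safety_tolerance",
        "output_format": "output_format",
    }
    payload = {"prompt": params["prompt"]}
    for our_key, api_key in optional_keys.items():
        # 模型未声明该能力、或调用方未传值时,直接跳过
        if our_key in capabilities and our_key in params:
            payload[api_key] = params[our_key]
    return payload


# 用法示例:Flux 声明了 aspect_ratio 与 seed,未声明 watermark
flux_caps = {"aspect_ratio": {"values": ["16:9", "1:1"]}, "seed": {}}
payload = build_input_payload(
    {"prompt": "A cat", "aspect_ratio": "16:9", "watermark": True},
    flux_caps,
)
# watermark 被过滤,仅保留 prompt 与 aspect_ratio
```

这样新增模型时只需写入 capabilities 行,不必再改 Provider 的 if/elif 分支。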
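10.2 的「我们 → AIHubMix」视频参数映射(model_name→model、width+height→size、duration→str(seconds))也可以压缩成一个极简草图(`build_video_body` 为假设性命名;仅覆盖 Sora/Wan 的像素 size 格式,Veo 的档位格式需按 10.3 另行转换):

```python
def build_video_body(params: dict) -> dict:
    """按 10.2 的参数映射表组装视频请求体(示意)。"""
    return {
        "model": params["model_name"],                    # model_name → model
        "prompt": params["prompt"],                       # 提示词原样透传
        "size": f'{params["width"]}x{params["height"]}',  # width+height → "WxH"
        "seconds": str(params["duration"]),               # 时长整数转字符串
    }


body = build_video_body({
    "model_name": "sora-2",
    "prompt": "A cat playing in the garden",
    "width": 720,
    "height": 1280,
    "duration": 4,
})
# → {"model": "sora-2", "prompt": "A cat playing in the garden",
#    "size": "720x1280", "seconds": "4"}
```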