
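Streaming Output

Streaming lets the model's reply arrive incrementally, piece by piece as it is generated, instead of only after the complete response is ready. Users see the first output much sooner, which matters most for long text generation.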

Enabling Streaming

Set "stream": true in the request body:

bash
# -N turns off curl's output buffering so events print as they arrive
curl -N https://api.beesai.cn/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-YOUR_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Write a poem"}],
    "stream": true
  }'

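SSE Response Format

Streaming responses use the Server-Sent Events (SSE) format. Each event is a data: line carrying one JSON object, and the stream ends with a data: [DONE] line, which is a sentinel rather than JSON: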

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1677858242,"model":"gpt-4o","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1677858242,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"Hel"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1677858242,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"lo"},"finish_reason":null}]}

data: [DONE]
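
The event format above can be read with a few lines of code. The following is a minimal sketch, not the service's reference parser: the function name iter_sse_chunks is hypothetical, and the sample events are abbreviated versions of the ones shown above.

```python
import json
from typing import Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each data: line until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything else
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # sentinel: the stream is complete
        yield json.loads(payload)


# Abbreviated sample events (hypothetical, same shape as the ones above).
sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Hel"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"lo"},"finish_reason":null}]}',
    "",
    "data: [DONE]",
]

text = "".join(
    chunk["choices"][0]["delta"].get("content") or ""
    for chunk in iter_sse_chunks(sample)
)
print(text)  # prints: Hello
```

Checking for the [DONE] sentinel before calling json.loads matters, because the sentinel is not valid JSON.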

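Delta Fields

| Stage | Delta content | Description |
| --- | --- | --- |
| Start | {"role": "assistant"} | Identifies the role |
| Content | {"content": "text"} | A piece of the generated text |
| End | {"content": null} | Marks the end of the stream |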
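
To show how the three stages combine, here is a small sketch that folds a sequence of deltas into one complete message. The helper name merge_deltas and the sample deltas are illustrative assumptions, not part of the API.

```python
from typing import Iterable


def merge_deltas(deltas: Iterable[dict]) -> dict:
    """Fold streamed deltas into a single {"role", "content"} message."""
    message = {"role": None, "content": ""}
    for delta in deltas:
        if delta.get("role"):
            message["role"] = delta["role"]  # sent once, in the first delta
        piece = delta.get("content")
        if piece:  # None (end of stream) or "" adds nothing
            message["content"] += piece
    return message


# Hypothetical deltas following the start / content / end stages above.
deltas = [
    {"role": "assistant"},
    {"content": "Hel"},
    {"content": "lo"},
    {"content": None},
]

message = merge_deltas(deltas)
print(message)  # prints: {'role': 'assistant', 'content': 'Hello'}
```

Treating a missing or null content the same way keeps the loop safe whichever form the final delta takes.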

Python Streaming Example

python
from openai import OpenAI

client = OpenAI(
    api_key="sk-YOUR_API_KEY",
    base_url="https://api.beesai.cn/v1"
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a poem about AI"}],
    stream=True
)

for chunk in stream:
    # A chunk may arrive with no choices; skip it instead of indexing blindly.
    if not chunk.choices:
        continue
    content = chunk.choices[0].delta.content
    if content is not None:
        print(content, end="", flush=True)
print()  # finish with a newline after the streamed text

JavaScript Streaming Example

javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-YOUR_API_KEY',
  baseURL: 'https://api.beesai.cn/v1'
});

const stream = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Write a poem about AI' }],
  stream: true,
});

// Each chunk carries a small piece of text; write it without a newline.
for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content || '';
  process.stdout.write(content);
}

Native Fetch Streaming Example

javascript
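const response = await fetch('https://api.beesai.cn/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer sk-YOUR_API_KEY'
  },
  body: JSON.stringify({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: 'Hello' }],
    stream: true
  })
});

if (!response.ok) {
  throw new Error(`Request failed with status ${response.status}`);
}

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';      // holds a partial line until the rest of it arrives
let finished = false; // set when the [DONE] sentinel is seen

while (!finished) {
  const { done, value } = await reader.read();
  if (done) break;

  // stream: true keeps multi-byte characters intact across reads
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop(); // the last element may be an incomplete line

  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;
    const data = line.slice(6).trim();
    if (data === '[DONE]') {
      finished = true; // leave the outer loop too, not just this one
      break;
    }
    try {
      const parsed = JSON.parse(data);
      const content = parsed.choices?.[0]?.delta?.content || '';
      if (content) process.stdout.write(content);
    } catch (e) {
      console.error('Skipping malformed event:', data);
    }
  }
}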
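
When reading the raw response body, network reads do not line up with event boundaries: one read may end in the middle of a line, or even in the middle of a multi-byte character. The sketch below shows one way to handle that in Python, using an incremental UTF-8 decoder and a line buffer. The function name iter_sse_data and the sample body are assumptions made for illustration.

```python
import codecs
import json
from typing import Iterable, Iterator


def iter_sse_data(byte_chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield the payload of each complete data: line until [DONE]."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    buffer = ""
    for chunk in byte_chunks:
        # The incremental decoder holds back incomplete characters.
        buffer += decoder.decode(chunk)
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            line = line.rstrip("\r")
            if not line.startswith("data:"):
                continue  # blank separator or other field
            data = line[len("data:"):].strip()
            if data == "[DONE]":
                return
            yield data


# Hypothetical response body, delivered one byte at a time (the worst case).
body = (
    'data: {"choices":[{"delta":{"content":"你"}}]}\n\n'
    'data: {"choices":[{"delta":{"content":"好"}}]}\n\n'
    "data: [DONE]\n\n"
).encode("utf-8")
pieces = [body[i:i + 1] for i in range(len(body))]

text = "".join(
    json.loads(data)["choices"][0]["delta"]["content"]
    for data in iter_sse_data(pieces)
)
print(text)  # prints: 你好
```

Decoding each read independently would corrupt any character split across two reads, which is why the decoder state is kept between chunks.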

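Best Practices

1. Set a Reasonable Timeout

Streaming requests can stay open for a long time, so set a longer timeout than you would for ordinary requests: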

python
from openai import OpenAI

client = OpenAI(
    api_key="sk-YOUR_API_KEY",
    base_url="https://api.beesai.cn/v1",
    timeout=120.0  # timeout in seconds
)
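
A client timeout may bound each network wait rather than the total duration of a stream. Under that assumption, an overall time budget has to be enforced separately. The wrapper below is a rough sketch of that idea: with_deadline is a hypothetical helper, and it can only check the clock when a chunk arrives, so it does not interrupt a read that has stalled.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def with_deadline(
    stream: Iterable[T],
    max_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[T]:
    """Pass items through until the overall time budget is exceeded."""
    start = clock()
    for item in stream:
        if clock() - start > max_seconds:
            raise TimeoutError(f"stream exceeded {max_seconds} s overall")
        yield item


# Demonstration with a fake clock so the example runs instantly.
ticks = iter([0.0, 1.0, 2.0, 5.0])
received = []
try:
    for item in with_deadline(["a", "b", "c"], 3.0, clock=lambda: next(ticks)):
        received.append(item)
except TimeoutError:
    timed_out = True
else:
    timed_out = False

print(received, timed_out)  # prints: ['a', 'b'] True
```

In real use the first argument would be the stream returned by the SDK, and the clock argument would be left at its default.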

2. Handle Interrupted Streams

A stream can break off partway when the network is unstable, so implement retry logic:

python
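import time

from openai import OpenAI

client = OpenAI(
    api_key="sk-YOUR_API_KEY",
    base_url="https://api.beesai.cn/v1"
)

def stream_with_retry(prompt, max_retries=3):
    """Stream a completion, retrying with exponential backoff on failure."""
    for attempt in range(max_retries):
        try:
            stream = client.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "user", "content": prompt}],
                stream=True
            )
            full_text = ""
            for chunk in stream:
                if not chunk.choices:
                    continue  # skip chunks that carry no choices
                content = chunk.choices[0].delta.content
                if content:
                    full_text += content
                    print(content, end="", flush=True)
            return full_text
        except Exception:
            # A retry restarts generation from the beginning, so text already
            # printed by the failed attempt will appear again.
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # back off: 1 s, 2 s, ...
            else:
                raise  # out of attempts: re-raise with the original traceback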
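
One design question with retries is what to do with partial output, since a retried request starts generation over. A possible approach, sketched below with a simulated stream instead of a real API call, is to buffer each attempt and release the text only when the attempt completes. The names collect_with_retry and flaky_stream are hypothetical, and ConnectionError stands in for whatever exception the real client raises on a dropped connection.

```python
import time
from typing import Callable, Iterable


def collect_with_retry(
    make_stream: Callable[[], Iterable[str]],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return the full text of the first attempt that finishes cleanly."""
    for attempt in range(max_retries):
        parts = []  # fresh buffer: partial text from a failed attempt is dropped
        try:
            for piece in make_stream():
                parts.append(piece)
            return "".join(parts)
        except ConnectionError:
            if attempt == max_retries - 1:
                raise  # out of attempts
            sleep(2 ** attempt)  # exponential backoff: 1 s, 2 s, ...
    raise ValueError("max_retries must be at least 1")


calls = {"n": 0}


def flaky_stream():
    """Simulated stream that drops the connection on its first use."""
    calls["n"] += 1
    yield "Hel"
    if calls["n"] == 1:
        raise ConnectionError("simulated drop")
    yield "lo"


delays = []
result = collect_with_retry(flaky_stream, sleep=delays.append)
print(result, delays)  # prints: Hello [1]
```

Buffering gives up the live display that streaming provides, so it suits cases where a clean final text matters more than immediacy.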

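3. Choosing Between Streaming and Non-Streaming

| Scenario | Recommended | Reason |
| --- | --- | --- |
| Chat interface | Streaming | Better user experience |
| Batch processing | Non-streaming | Simpler, with easier error handling |
| API backend | Non-streaming | Less time spent holding connections open |
| Long text generation | Streaming | Avoids a long wait before any output appears |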

