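Streaming Output
Streaming lets the model's reply arrive incrementally, fragment by fragment, instead of only after the full response has been generated. Users see the first words sooner, which improves the experience noticeably, especially when generating long text.
Enabling Streaming
Set "stream": true in the request body: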
bash
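curl https://api.beesai.cn/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-YOUR_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Write a poem"}],
    "stream": true
  }'
SSE Response Format
Streaming responses use the Server-Sent Events (SSE) format. Each event is a data: line whose payload is a JSON object, and the stream ends with the literal sentinel data: [DONE], which is not JSON: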
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1677858242,"model":"gpt-4o","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1677858242,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"Hel"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1677858242,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"lo"},"finish_reason":null}]}
data: [DONE]
Delta Fields
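| Stage | delta content | Description |
|---|---|---|
| Start | {"role": "assistant"} | Identifies the role |
| Content | {"content": "text"} | A fragment of the reply, delivered incrementally |
| End | {"content": null} | End of the stream (finish_reason is typically set at this point, e.g. "stop") |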
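The SSE format and the delta stages described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the SDK's implementation: the helper names `iter_deltas` and `assemble_message` are hypothetical, the sample lines are trimmed versions of the response format shown earlier, and the final chunk with `"finish_reason": "stop"` is an assumption about how the stream ends.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the delta dict of the first choice from each raw SSE line."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel; it is not JSON
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) == 0:
                yield choice.get("delta") or {}


def assemble_message(lines: Iterable[str]) -> dict:
    """Fold the start, content, and end deltas into one message dict."""
    role = None
    parts = []
    for delta in iter_deltas(lines):
        if delta.get("role"):
            role = delta["role"]  # start stage
        if delta.get("content"):
            parts.append(delta["content"])  # content stage
        # An end-stage delta has content null or missing: nothing to append.
    return {"role": role, "content": "".join(parts)}


# Sample stream (assumed shape, trimmed to the fields used here).
sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"Hel"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"lo"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":null},"finish_reason":"stop"}]}',
    "data: [DONE]",
]

print(assemble_message(sample))  # {'role': 'assistant', 'content': 'Hello'}
```

Separating line parsing from message assembly keeps the `[DONE]` handling in one place, so the same generator can feed either a live display or a buffer that collects the full reply.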
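Python Streaming Example
python
from openai import OpenAI

client = OpenAI(
    api_key="sk-YOUR_API_KEY",
    base_url="https://api.beesai.cn/v1",
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a poem about AI"}],
    stream=True,
)

for chunk in stream:
    # Some chunks may carry no choices; skip them instead of indexing blindly.
    if not chunk.choices:
        continue
    content = chunk.choices[0].delta.content
    # The role-only first chunk and the final chunk have no content.
    if content is not None:
        print(content, end="", flush=True)
print()  # finish with a newline
JavaScript Streaming Example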
javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-YOUR_API_KEY',
  baseURL: 'https://api.beesai.cn/v1',
});

const stream = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Write a poem about AI' }],
  stream: true,
});

// Print each content fragment as it arrives; chunks without content print nothing.
for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content || '';
  process.stdout.write(content);
}
Native Fetch Streaming Example
javascript
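const response = await fetch('https://api.beesai.cn/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer sk-YOUR_API_KEY'
  },
  body: JSON.stringify({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: 'Hello' }],
    stream: true
  })
});
if (!response.ok) throw new Error(`Request failed: HTTP ${response.status}`);

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';      // holds a partial line left over from the previous read
let finished = false; // set once [DONE] arrives

while (!finished) {
  const { done, value } = await reader.read();
  if (done) break;
  // { stream: true } keeps multi-byte characters split across reads intact.
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop(); // the last element may be an incomplete line
  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;
    const data = line.slice(6).trim();
    if (data === '[DONE]') {
      finished = true; // leave the outer loop too, not just this one
      break;
    }
    try {
      const parsed = JSON.parse(data);
      const content = parsed.choices?.[0]?.delta?.content || '';
      // process.stdout is Node.js only; in a browser, append to the page instead.
      if (content) process.stdout.write(content);
    } catch (e) {
      console.error('Skipping malformed chunk:', e);
    }
  }
}
Best Practices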
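1. Set a Reasonable Timeout
Streaming requests can stay open for a long time, so configure a longer timeout than you would for a short request: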
python
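from openai import OpenAI

client = OpenAI(
    api_key="sk-YOUR_API_KEY",
    base_url="https://api.beesai.cn/v1",
    timeout=120.0,  # 120-second timeout
)
2. Handle Interrupted Streams
A stream can be cut off when the network is unstable, so implement retry logic: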
python
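import time

# Reuses the `client` configured in the previous example.
def stream_with_retry(prompt, max_retries=3):
    for attempt in range(max_retries):
        try:
            stream = client.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "user", "content": prompt}],
                stream=True,
            )
            full_text = ""
            for chunk in stream:
                if not chunk.choices:
                    continue  # skip chunks that carry no choices
                content = chunk.choices[0].delta.content
                if content:
                    full_text += content
                    print(content, end="", flush=True)
            return full_text
        except Exception:
            if attempt < max_retries - 1:
                # Exponential backoff: 1 s, 2 s, 4 s, ...
                # Each retry restarts generation, so text already printed appears again.
                time.sleep(2 ** attempt)
            else:
                raise  # out of retries: re-raise with the original traceback
3. Streaming vs. Non-Streaming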
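| Scenario | Recommended | Reason |
|---|---|---|
| Chat interface | Streaming | Better user experience |
| Batch processing | Non-streaming | Simpler, with easier error handling |
| API backend | Non-streaming | Connections are held open for less time |
| Long-form text generation | Streaming | Avoids a long wait before any output appears |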
