
Text Embeddings API

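The Embeddings API converts text into high-dimensional vector representations that can be used for semantic search, text classification, clustering, and similar tasks.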

Endpoint

POST https://api.beesai.cn/v1/embeddings

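Request Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Embedding model ID |
| input | string or array of strings | Yes | Text to embed: a single string, or an array of strings for batch requests |
| encoding_format | string | No | Encoding format: float or base64 |
| dimensions | integer | No | Number of output dimensions (supported by some models only, e.g. the text-embedding-3 models) |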
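With encoding_format set to base64, the embedding field is a base64 string instead of a float array. The sketch below decodes it, assuming the gateway follows the OpenAI convention of packing the vector as little-endian 32-bit floats; decode_base64_embedding is a hypothetical helper name, not part of any SDK.

```python
import base64
import struct


def decode_base64_embedding(b64: str) -> list[float]:
    """Hypothetical helper: decode a base64 embedding into a list of floats.

    Assumes (OpenAI convention) the payload is a packed array of
    little-endian float32 values, 4 bytes each.
    """
    raw = base64.b64decode(b64)
    count = len(raw) // 4  # number of float32 values
    return list(struct.unpack(f"<{count}f", raw))


# Round-trip check with hand-made values that are exact in float32
sample = base64.b64encode(struct.pack("<3f", 0.5, -1.25, 2.0)).decode("ascii")
print(decode_base64_embedding(sample))  # [0.5, -1.25, 2.0]
```

base64 responses are smaller on the wire than JSON float arrays, at the cost of this extra decoding step on the client.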

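Supported Embedding Models

| Model ID | Default dimensions | Max input length | Notes |
|---|---|---|---|
| text-embedding-3-small | 1536 | 8191 tokens | Best price/performance |
| text-embedding-3-large | 3072 | 8191 tokens | Highest accuracy |
| text-embedding-ada-002 | 1536 | 8191 tokens | For compatibility with older integrations |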
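The dimensions column above is each model's default output size; the dimensions request parameter asks supported models for a shorter vector directly. If you instead store full-length vectors and shorten them yourself, OpenAI's guidance for the text-embedding-3 models is to truncate and then re-normalize to unit length. A minimal sketch of that step, assuming BeesAI returns the same vectors as the upstream models (shorten_embedding is a hypothetical name):

```python
import math


def shorten_embedding(vec, dims):
    """Hypothetical helper: keep the first `dims` values, rescale to unit (L2) length."""
    cut = vec[:dims]
    norm = math.sqrt(sum(x * x for x in cut))
    # An all-zero prefix cannot be normalized, so return it unchanged
    return [x / norm for x in cut] if norm else list(cut)


short = shorten_embedding([3.0, 4.0, 12.0], 2)
print(short)  # [0.6, 0.8]
```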

Request Examples

cURL

bash
curl https://api.beesai.cn/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-YOUR_API_KEY" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "BeesAI is a unified API gateway for large language models"
  }'
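If installing the OpenAI SDK is not an option, the same request can be built with Python's standard library alone. This is a minimal sketch: build_embeddings_request is a hypothetical helper, and the request is only constructed here, not sent.

```python
import json
import urllib.request

API_URL = "https://api.beesai.cn/v1/embeddings"


def build_embeddings_request(api_key, text, model="text-embedding-3-small"):
    """Hypothetical helper: build (but do not send) the POST request from the cURL example."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_embeddings_request("sk-YOUR_API_KEY", "BeesAI is a unified API gateway")

# Sending it requires a valid key and network access:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
#     print(len(result["data"][0]["embedding"]))
```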

Python

python
from openai import OpenAI

client = OpenAI(
    api_key="sk-YOUR_API_KEY",
    base_url="https://api.beesai.cn/v1"
)

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="BeesAI is a unified API gateway for large language models"
)

embedding = response.data[0].embedding
print(f"Vector dimension: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")

Batch Embedding

python
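# Reuses the client created above; pass a list to embed several texts in one request
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=[
        "BeesAI is a unified API gateway for large language models",
        "Supports GPT-4o, Claude, Gemini, and other models",
        "Direct connection from mainland China, no VPN required"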
    ]
)

for i, item in enumerate(response.data):
    print(f"Text {i+1}: vector dimension {len(item.embedding)}")
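One request accepts a list of inputs, but very large lists are usually split across several requests, because providers cap the number of inputs and tokens per request. BeesAI's exact limits are not stated on this page, so the following is a minimal sketch with an assumed batch size; embed_in_batches and embed_batch are hypothetical names.

```python
def embed_in_batches(texts, embed_batch, batch_size=100):
    """Hypothetical helper: embed `texts` chunk by chunk, preserving input order.

    `embed_batch` is any callable that takes a list of strings and returns one
    vector per string in the same order. batch_size=100 is an assumed value,
    not a documented BeesAI limit.
    """
    vectors = []
    for start in range(0, len(texts), batch_size):
        chunk = list(texts[start:start + batch_size])
        vectors.extend(embed_batch(chunk))
    return vectors


# A possible embed_batch built on the client above (requires network access):
# def embed_batch(chunk):
#     resp = client.embeddings.create(model="text-embedding-3-small", input=chunk)
#     return [item.embedding for item in sorted(resp.data, key=lambda d: d.index)]

# Offline check with a fake embedder that maps each text to [length]
fake_embed = lambda chunk: [[float(len(t))] for t in chunk]
print(embed_in_batches(["a", "bb", "ccc"], fake_embed, batch_size=2))
# [[1.0], [2.0], [3.0]]
```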

Response Format

json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023064255, -0.009327292, ...]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 12,
    "total_tokens": 12
  }
}
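Each element of data carries an index identifying which input it belongs to. When calling the endpoint without the SDK, a defensive parser can order results by that field instead of relying on list order. A minimal sketch, assuming the response shape shown above (parse_embeddings_response is a hypothetical helper, and the toy values are made up):

```python
import json


def parse_embeddings_response(body):
    """Hypothetical helper: return (vectors ordered by `index`, total_tokens)."""
    payload = json.loads(body)
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items], payload["usage"]["total_tokens"]


# Made-up toy response in the format above (real vectors have 1536+ values)
sample_body = json.dumps({
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.3, 0.4]},
        {"object": "embedding", "index": 0, "embedding": [0.1, 0.2]},
    ],
    "model": "text-embedding-3-small",
    "usage": {"prompt_tokens": 5, "total_tokens": 5},
})
vectors, total_tokens = parse_embeddings_response(sample_body)
print(vectors, total_tokens)  # [[0.1, 0.2], [0.3, 0.4]] 5
```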

Use Cases

1. Semantic Search

python
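import numpy as np

# Assumes `client` is the OpenAI client configured in the Python example above
def semantic_search(query, documents, top_k=3):
    # Embed the query
    query_embedding = np.array(client.embeddings.create(
        model="text-embedding-3-small",
        input=query
    ).data[0].embedding)

    # Embed all documents in a single batch request
    doc_embeddings = np.array([
        item.embedding
        for item in client.embeddings.create(
            model="text-embedding-3-small",
            input=documents
        ).data
    ])

    # Cosine similarity; equals the plain dot product when vectors are unit length
    similarities = doc_embeddings @ query_embedding / (
        np.linalg.norm(doc_embeddings, axis=1) * np.linalg.norm(query_embedding)
    )

    # Indices of the top_k highest scores, best match first
    top_indices = np.argsort(similarities)[::-1][:top_k]
    return [(documents[i], float(similarities[i])) for i in top_indices]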
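semantic_search above re-embeds every document on each call. In practice document vectors are usually computed once and stored, so each query needs only one API call. A minimal standard-library sketch of the ranking step over stored vectors (cosine_similarity and rank_by_similarity are hypothetical helpers; the 2-D toy vectors stand in for real embeddings):

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_similarity(query_vec, doc_vecs, top_k=3):
    """Hypothetical helper: (doc_index, score) pairs for the top_k matches, best first."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Toy stored vectors; real ones would come from one earlier batch request
stored = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
result = rank_by_similarity([1.0, 0.1], stored, top_k=2)
print([i for i, _ in result])  # [0, 2]
```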

2. Text Classification

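Zero-shot classification: embed the text and each category name, then pick the category whose embedding is most similar to the text: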

python
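import numpy as np

# Reuses the `client` created in the Python example above
def zero_shot_classify(text, categories):
    # Embed the input text
    text_emb = np.array(client.embeddings.create(
        model="text-embedding-3-small", input=text
    ).data[0].embedding)

    # Embed every candidate category name in one request
    cat_embs = np.array([
        item.embedding
        for item in client.embeddings.create(
            model="text-embedding-3-small", input=categories
        ).data
    ])

    # Cosine similarity between the text and each category
    similarities = cat_embs @ text_emb / (
        np.linalg.norm(cat_embs, axis=1) * np.linalg.norm(text_emb)
    )

    # The most similar category wins
    best_idx = int(np.argmax(similarities))
    return categories[best_idx], float(similarities[best_idx])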
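Bare category names such as "sports" carry little context. A common refinement, not part of the example above, is to embed a few example sentences per category and compare the text against each category's mean vector (a nearest-centroid classifier). A minimal sketch with hypothetical helper names and toy 2-D vectors standing in for embeddings:

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Element-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def nearest_centroid_label(text_vec, examples_by_label):
    """Hypothetical helper: label whose mean example embedding is closest to text_vec."""
    centroids = {label: centroid(vecs) for label, vecs in examples_by_label.items()}
    return max(centroids, key=lambda label: cosine_similarity(text_vec, centroids[label]))


# Toy vectors standing in for embeddings of example sentences per category
examples = {
    "sports": [[0.9, 0.1], [0.8, 0.2]],
    "finance": [[0.1, 0.9], [0.2, 0.8]],
}
print(nearest_centroid_label([0.85, 0.2], examples))  # sports
```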

3. Text Clustering

Group semantically similar texts with k-means clustering:

python
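from sklearn.cluster import KMeans

# Reuses the `client` created in the Python example above
def cluster_texts(texts, n_clusters=3):
    # Embed all texts in one request (needs at least n_clusters texts)
    embeddings = client.embeddings.create(
        model="text-embedding-3-small", input=texts
    ).data

    vectors = [item.embedding for item in embeddings]
    # Fixed random_state for reproducible labels; an explicit n_init avoids
    # version-dependent defaults across scikit-learn releases
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)
    labels = kmeans.fit_predict(vectors)

    # Group the original texts by their cluster label
    clusters = {i: [] for i in range(n_clusters)}
    for text, label in zip(texts, labels):
        clusters[int(label)].append(text)

    return clusters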
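To show what the KMeans call does with the embedding vectors, here is a minimal pure-Python sketch of Lloyd's algorithm: assign each vector to its nearest centroid, then move each centroid to the mean of its members, until assignments stop changing. It is illustrative only; kmeans is a hypothetical helper, and scikit-learn adds k-means++ seeding and multiple restarts, so its labels can differ.

```python
def kmeans(vectors, k, iterations=20):
    """Hypothetical helper: Lloyd's k-means returning one cluster label per vector.

    Centroids start at the first k vectors so results are deterministic;
    scikit-learn's KMeans uses k-means++ initialization instead.
    """
    if not 0 < k <= len(vectors):
        raise ValueError("k must be between 1 and len(vectors)")
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(v, centroids[c])))
            for v in vectors
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


points = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0]]
print(kmeans(points, 2))  # [0, 1, 0, 1]
```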


© 2026 BeesAI. All rights reserved.