POST /v1/chat/completions
curl --request POST \
  --url https://api.globalaiopc.com/v1/chat/completions \
  --header 'Authorization: Bearer {{YOUR_API_KEY}}' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "gemini-3.1-pro-preview",
    "messages": [
      {
        "role": "system",
        "content": "You are an emotional support assistant"
      },
      {
        "role": "user",
        "content": "hello world"
      }
    ],
    "temperature": 0.7,
    "max_completion_tokens": 2000
  }'
{
  "code": "200",
  "msg": "成功",
  "data": {
    "content": "Hello! I'm Gemini, nice to serve you. Is there anything I can help you with?",
    "usage": {
      "prompt_tokens": 25,
      "completion_tokens": 150,
      "total_tokens": 175
    },
    "model": "gemini-3-pro",
    "id": "chatcmpl-abc123",
    "success": true
  }
}

Authentication

All requests require a Bearer token in the request header:
Authorization: Bearer {{YOUR_API_KEY}}
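The authenticated request can be sketched with Python's standard library alone. This is a minimal illustration, not an official client: the endpoint and headers come from the docs above, while YOUR_API_KEY is a placeholder you must replace.

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder; substitute your real key

payload = {
    "model": "gemini-3.1-pro-preview",
    "messages": [{"role": "user", "content": "hello world"}],
}

req = urllib.request.Request(
    "https://api.globalaiopc.com/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would send it; omitted here so the
# snippet stays runnable without a live key.
```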

Request Parameters

model
string
required
The ID of the model to use. Supported models:
  • gemini-3-flash-preview
  • gemini-2.5-flash-lite
  • gemini-2.5-flash
  • gemini-3.1-pro-preview
  • gemini-3-pro-preview
  • gemini-2.5-pro
messages
array
required
A list of messages comprising the conversation so far. Each message object contains role and content fields.
temperature
number
Sampling temperature, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
We recommend altering this or top_p, but not both.
top_p
number
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
We recommend altering this or temperature, but not both.
stream
boolean
Defaults to false. If set to true, partial message deltas are sent, as in ChatGPT. Tokens are sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message.
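A streaming consumer only needs to read the data-only SSE lines until the [DONE] sentinel. The sketch below assumes an OpenAI-style chunk shape ({"choices":[{"delta":{"content": …}}]}), which this page does not document; treat the field names as an assumption to verify against real stream output.

```python
import json

def parse_sse_stream(raw: str) -> str:
    """Concatenate message deltas from a data-only SSE stream.

    Assumes each event is one 'data: <json>' line and the stream
    ends with 'data: [DONE]', as described above. The chunk field
    names are an assumption modeled on OpenAI-style streaming.
    """
    pieces = []
    for line in raw.splitlines():
        if not line.startswith("data: "):
            continue  # skip blank lines and comments
        body = line[len("data: "):]
        if body == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(body)
        # hypothetical chunk shape: {"choices":[{"delta":{"content":"..."}}]}
        pieces.append(chunk["choices"][0]["delta"].get("content", ""))
    return "".join(pieces)

sample = (
    'data: {"choices":[{"delta":{"content":"Hel"}}]}\n'
    'data: {"choices":[{"delta":{"content":"lo"}}]}\n'
    'data: [DONE]\n'
)
print(parse_sse_stream(sample))  # Hello
```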
max_completion_tokens
integer
Defaults to inf (no upper limit). The maximum number of tokens that can be generated in the chat completion. The total length of input tokens plus generated tokens is limited by the model's context length.
stop
string/array
Defaults to null. Up to 4 sequences where the API will stop generating further tokens.
presence_penalty
number
Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
frequency_penalty
number
Defaults to 0. Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

Messages Array Structure

Each message object contains the following fields:
role
string
required
The role of the message author. Options: system, user, assistant.
content
string/array
required
The content of the message. Either a string (plain text) or an array (multimodal content supporting text, images, and videos).
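The message structure above is plain JSON, so a small helper keeps conversations well-formed. This is an illustrative sketch, not part of the API; make_message is a hypothetical helper name.

```python
def make_message(role: str, content) -> dict:
    """Build one message object with the required role and content fields.

    role must be one of the three roles the API accepts; content is a
    string for plain text or a list for multimodal input.
    """
    if role not in {"system", "user", "assistant"}:
        raise ValueError(f"unsupported role: {role!r}")
    return {"role": role, "content": content}

# A two-turn setup matching the curl example at the top of this page.
messages = [
    make_message("system", "You are an emotional support assistant"),
    make_message("user", "hello world"),
]
```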

Multimodal Content Structure

When content is an array, it supports the following object types:
Text object:
{
  "type": "text",
  "text": "Message text content"
}
Image/Video object:
{
  "type": "image_url",
  "image_url": {
    "url": "Image or video URL address"
  }
}
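Building these objects in code is mechanical; the sketch below wraps the two shapes shown above in tiny helper functions (the helper names and the example URL are illustrative, not part of the API).

```python
def text_part(text: str) -> dict:
    """Text object, as documented above."""
    return {"type": "text", "text": text}

def media_part(url: str) -> dict:
    """Image/Video object. Per the docs, videos also use image_url."""
    return {"type": "image_url", "image_url": {"url": url}}

# A user message mixing text with a video URL (placeholder address;
# the URL must be publicly accessible).
user_message = {
    "role": "user",
    "content": [
        text_part("Describe what happens in this clip"),
        media_part("https://example.com/clip.mp4"),
    ],
}
```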

Response Parameters

code
string
Response status code; "200" when successful
msg
string
Response message; "成功" ("success") when successful
data.content
string
AI-generated reply content
data.usage.prompt_tokens
integer
Number of tokens used in the prompt
data.usage.completion_tokens
integer
Number of tokens used in the completion
data.usage.total_tokens
integer
Total tokens (prompt_tokens + completion_tokens)
data.model
string
The model name used
data.id
string
A unique identifier for the chat completion
data.success
boolean
Whether the request was successful
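The fields above can be read straight out of the parsed JSON; note that total_tokens is defined as the sum of the other two usage counters. A minimal parsing sketch, using the sample response from the top of this page:

```python
import json

# Sample response body, copied from the example at the top of the page.
raw = """{
  "code": "200",
  "msg": "成功",
  "data": {
    "content": "Hello! I'm Gemini, nice to serve you. Is there anything I can help you with?",
    "usage": {"prompt_tokens": 25, "completion_tokens": 150, "total_tokens": 175},
    "model": "gemini-3-pro",
    "id": "chatcmpl-abc123",
    "success": true
  }
}"""

resp = json.loads(raw)
if resp["code"] == "200" and resp["data"]["success"]:
    usage = resp["data"]["usage"]
    # total_tokens = prompt_tokens + completion_tokens
    assert usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]
    reply = resp["data"]["content"]
```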

Multimodal Support

Gemini models support multimodal input, capable of processing text, images, and videos in the same request:

Pure Text Conversation

Use string directly for content

Image Analysis

Use array for content, including text and image_url objects

Video Analysis

Use array for content; videos also use the image_url type

Mixed Input

Can include text, images, and videos in the same content array
Important Notes:
  • Image and video URLs must be publicly accessible
  • Video files also use the image_url type; there is no separate video_url type
  • Multimodal support varies by model; we recommend gemini-3.1-pro-preview or newer
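Putting the notes above together, a mixed-input request body carries text, an image, and a video in one content array. The URLs below are placeholders (they must be publicly accessible in a real request), and the payload is a sketch of the documented shape rather than a captured request.

```python
import json

payload = {
    "model": "gemini-3.1-pro-preview",  # recommended for multimodal input
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Compare this photo with this clip"},
                # Image: uses image_url as documented.
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
                # Video: ALSO uses image_url, not video_url.
                {"type": "image_url", "image_url": {"url": "https://example.com/clip.mp4"}},
            ],
        }
    ],
}
body = json.dumps(payload)  # serialized request body for the POST
```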

Best Practices

  1. Requests must use application/json format
  2. Recommend adjusting only one of temperature or top_p parameters
  3. When using JSON mode, explicitly instruct the model to generate JSON in the message
  4. Streaming responses return in Server-Sent Events (SSE) format, ending with data: [DONE]
  5. When finish_reason is length, generation was truncated because it reached max_completion_tokens or the conversation exceeded the model's maximum context length