Gemini Chat API

Authentication

All requests require a Bearer token in the request header:

Authorization: Bearer {{YOUR_API_KEY}}

Request Parameters

model

string

required

The ID of the model to use. Supported models:

gemini-3-flash-preview
gemini-2.5-flash-lite
gemini-2.5-flash
gemini-3.1-pro-preview
gemini-3-pro-preview
gemini-2.5-pro

messages

array

required

A list of messages comprising the conversation so far. Each message object contains role and content fields

temperature

number

Sampling temperature, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.

We recommend altering this or top_p but not both

top_p

number

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.

We recommend altering this or temperature but not both

stream

boolean

Defaults to false. If set to true, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message

max_completion_tokens

integer

Defaults to inf. The maximum number of tokens that can be generated in the chat completion. The total length of input tokens and generated tokens is limited by the model’s context length

stop

string/array

Defaults to null. Up to 4 sequences where the API will stop generating further tokens

presence_penalty

number

Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics

frequency_penalty

number

Defaults to 0. Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim

Messages Array Structure

Each message object contains the following fields:

role

string

required

The role of the message, options: system, user, assistant

content

string/array

required

The content of the message. Can be a string (plain text), or array (multimodal content, supporting text, images, videos)

Multimodal Content Structure

When content is an array, it supports the following types of objects: Text object:

{
  "type": "text",
  "text": "Message text content"
}

Image/Video object:

{
  "type": "image_url",
  "image_url": {
    "url": "Image or video URL address"
  }
}

Response Parameters

code

string

Response status code, “200” when successful

msg

string

Response message, “成功” when successful

data.content

string

AI-generated reply content

data.usage.prompt_tokens

integer

Number of tokens used in the prompt

data.usage.completion_tokens

integer

Number of tokens used in the completion

data.usage.total_tokens

integer

Total tokens (prompt_tokens + completion_tokens)

data.model

string

The model name used

data.id

string

A unique identifier for the chat completion

data.success

boolean

Whether the request was successful

Multimodal Support

Gemini models support multimodal input, capable of processing text, images, and videos in the same request:

Pure Text Conversation

Use string directly for content

Image Analysis

Use array for content, including text and image_url objects

Video Analysis

Use array for content, videos also use image_url type

Mixed Input

Can include text, images, and videos in the same content array

Important Notes:

Image and video URLs must be publicly accessible
Video files also use image_url type, not video_url
Different models may have varying levels of multimodal support, recommend using gemini-3.1-pro-preview or higher

Best Practices

Best Practices:

Requests must use application/json format
Recommend adjusting only one of temperature or top_p parameters
When using JSON mode, explicitly instruct the model to generate JSON in the message
Streaming responses return in Server-Sent Events (SSE) format, ending with data: [DONE]
When finish_reason is length, it indicates generation exceeded max_completion_tokens or conversation exceeded maximum context length

curl --request POST \
  --url https://zcbservice.aizfw.cn/kyyReactApiServer/v1/chat/completions \
  --header 'Authorization: Bearer {{YOUR_API_KEY}}' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "gemini-3.1-pro-preview",
    "messages": [
      {
        "role": "system",
        "content": "You are an emotional support assistant"
      },
      {
        "role": "user",
        "content": "hello world"
      }
    ],
    "temperature": 0.7,
    "max_completion_tokens": 2000
  }'

{
  "code": "200",
  "msg": "成功",
  "data": {
    "content": "Hello! I'm Gemini, nice to serve you. Is there anything I can help you with?",
    "usage": {
      "prompt_tokens": 25,
      "completion_tokens": 150,
      "total_tokens": 175
    },
    "model": "gemini-3-pro",
    "id": "chatcmpl-abc123",
    "success": true
  }
}

API Introduction

Changelog

Text API

Image API

Video API

Authentication

Request Parameters

Messages Array Structure

Multimodal Content Structure

Response Parameters

Multimodal Support

Pure Text Conversation

Image Analysis

Video Analysis

Mixed Input

Best Practices

API Introduction

Changelog

Text API

Image API

Video API

Documentation Index

​Authentication

​Request Parameters

​Messages Array Structure

​Multimodal Content Structure

​Response Parameters

​Multimodal Support

Pure Text Conversation

Image Analysis

Video Analysis

Mixed Input

​Best Practices

Authentication

Request Parameters

Messages Array Structure

Multimodal Content Structure

Response Parameters

Multimodal Support

Best Practices