Gemini

AI-Mocks Gemini is a mock server for the Google Vertex AI Gemini API, built using Mokksy.

MockGemini is tested against the Spring AI framework with the Vertex AI Gemini integration.

Currently, it supports basic content generation requests and streaming responses.

Quick Start

Include the library in your test dependencies (Maven or Gradle).

build.gradle.kts
testImplementation("dev.mokksy.aimocks:ai-mocks-gemini-jvm:$latestVersion")

pom.xml
<dependency>
  <groupId>dev.mokksy.aimocks</groupId>
  <artifactId>ai-mocks-gemini-jvm</artifactId>
  <version>[LATEST_VERSION]</version>
  <scope>test</scope>
</dependency>

Content Generation API

Set up a mock server and define mock responses:

val gemini = MockGemini(verbose = true)

Let's simulate the Gemini content generation API:

// Define mock response
gemini.generateContent {
  temperature = 0.7
  model = "gemini-2.0-flash"
  project = "your-project-id"
  location = "us-central1"
  apiVersion = "v1beta1"
  path = null // custom request path, overrides "apiVersion"
  seed = 42
  maxTokens = 100
  topK = 40
  topP = 0.95
  maxOutputTokens(200)
  systemMessageContains("helpful pirate")
  userMessageContains("say 'Hello!'")
  requestBodyContains("helpful")
  requestBodyContainsIgnoringCase("PIRATE")
  requestBodyDoesNotContains("unwanted text")
  requestBodyDoesNotContainsIgnoringCase("unwanted case insensitive text")
  requestMatchesPredicate { it.generationConfig?.topP == 0.95 }
} responds {
  content = "Ahoy there, matey! Hello!"
  finishReason = "stop"
  role = "model"
  delay = 42.milliseconds // delay before answer
}

Configuration Options

The following tables list all available configuration options for mocking Gemini API calls.

Request Configuration Options

Option	Description
`temperature`	Controls randomness of the output. Lower values make output more deterministic.
`model`	The Gemini model to use.
`maxTokens`	Maximum number of tokens to generate.
`topK`	Limits token selection to the K most likely next tokens.
`topP`	Limits token selection to tokens with cumulative probability of P.
`project`	Google Cloud project ID.
`location`	Google Cloud location.
`apiVersion`	API version to use.
`path`	Custom request path.
`seed`	Seed for deterministic generation.
`maxOutputTokens`	Maximum number of tokens to generate.
`systemMessageContains`	Matches requests with system messages containing the specified text.
`userMessageContains`	Matches requests with user messages containing the specified text.
`requestBodyContains`	Matches requests with bodies containing the specified text.
`requestBodyContainsIgnoringCase`	Matches requests with bodies containing the specified text (case-insensitive).
`requestBodyDoesNotContains`	Matches requests with bodies not containing the specified text.
`requestBodyDoesNotContainsIgnoringCase`	Matches requests with bodies not containing the specified text (case-insensitive).
`requestMatchesPredicate`	Matches requests satisfying a custom predicate.
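
The body matchers above can be read as substring checks on the raw request body. The following is a minimal sketch of those semantics in plain Kotlin, assuming that all configured matchers must hold at once (logical AND). `matchesBody` and the sample JSON bodies are hypothetical and only illustrate the idea; they are not part of the AI-Mocks API or its actual implementation.

```kotlin
// Hypothetical illustration of the matcher semantics; not the library's implementation.
fun matchesBody(body: String): Boolean =
    body.contains("helpful") &&                       // requestBodyContains
        body.contains("PIRATE", ignoreCase = true) && // requestBodyContainsIgnoringCase
        !body.contains("unwanted text") &&            // requestBodyDoesNotContains
        // requestBodyDoesNotContainsIgnoringCase
        !body.contains("unwanted case insensitive text", ignoreCase = true)

fun main() {
    val accepted = """{"systemInstruction": "You are a helpful pirate"}"""
    val rejected =
        """{"systemInstruction": "You are a helpful pirate", "note": "UNWANTED case insensitive TEXT"}"""
    println(matchesBody(accepted)) // true
    println(matchesBody(rejected)) // false
}
```

For conditions that substring checks cannot express, `requestMatchesPredicate` accepts a custom predicate over the parsed request, as in the example earlier in this section.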

Response Configuration Options

Option	Description	Default Value
`content`	The content to include in the response.	`"This is a mock response from Gemini API."`
`finishReason`	The reason why the model stopped generating tokens.	`"STOP"`
`role`	The role of the content.	`"model"`
`delay`	The delay before sending the response.	`Duration.ZERO`
`delayMillis`	The delay before sending the response in milliseconds.	N/A
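
`delay` is expressed as a `kotlin.time.Duration`, while `delayMillis` expresses the same delay as a number of milliseconds. The sketch below uses only the Kotlin standard library to show how the two forms relate; `asMillis` is a hypothetical helper introduced for illustration, not a library function.

```kotlin
import kotlin.time.Duration
import kotlin.time.Duration.Companion.milliseconds
import kotlin.time.Duration.Companion.seconds

// Hypothetical helper: the millisecond value equivalent to a Duration-based delay.
fun asMillis(delay: Duration): Long = delay.inWholeMilliseconds

fun main() {
    println(asMillis(42.milliseconds)) // 42
    println(asMillis(1.5.seconds))     // 1500
    println(asMillis(Duration.ZERO))   // 0
}
```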

Streaming Content Generation

Here's an example of setting up a streaming content generation mock:

// Define streaming mock response
gemini.generateContentStream {
  temperature = 0.7
  model = "gemini-2.0-flash"
  project = "your-project-id"
  location = "us-central1"
  apiVersion = "v1beta1"
  seed = 42
  maxTokens = 100
  topK = 40
  topP = 0.95
  maxOutputTokens(200)
  systemMessageContains("helpful pirate")
  userMessageContains("say 'Hello!'")
} respondsStream {
  responseFlow = flow {
    emit("Ahoy")
    emit(" there,")
    delay(100.milliseconds)
    emit(" matey!")
    emit(" Hello!")
  }
  // Alternatively, you can use responseChunks = listOf("Ahoy", " there,", " matey!", " Hello!")
  // Or chunks("Ahoy", " there,", " matey!", " Hello!")
  finishReason = "stop"
  delay = 60.milliseconds // delay before first chunk
  delayBetweenChunks = 15.milliseconds // delay between chunks
}

Streaming Response Configuration Options

Option	Description	Default Value
`responseFlow`	A flow of content chunks to include in the streaming response.	`null`
`responseChunks`	A list of content chunks to include in the streaming response.	`null`
`chunks`	Sets the chunks of content for the streaming response.	N/A
`delayBetweenChunks`	The delay between sending chunks.	`Duration.ZERO`
`finishReason`	The reason why the model stopped generating tokens.	`"STOP"`
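
When choosing a client timeout for a streaming test, it can help to estimate how long the mock needs to deliver all chunks. The sketch below assumes the delays simply add up: `delay` once before the first chunk, then `delayBetweenChunks` between consecutive chunks. `estimateStreamDuration` is a hypothetical helper, not part of the library, and any delays inside a custom `responseFlow` (such as `delay(100.milliseconds)` in the example above) would come on top of the estimate.

```kotlin
import kotlin.time.Duration
import kotlin.time.Duration.Companion.milliseconds

// Hypothetical helper; assumes one initial delay plus one gap between each pair of chunks.
fun estimateStreamDuration(
    chunkCount: Int,
    delay: Duration = Duration.ZERO,
    delayBetweenChunks: Duration = Duration.ZERO,
): Duration {
    require(chunkCount >= 0) { "chunkCount must not be negative" }
    val gaps = (chunkCount - 1).coerceAtLeast(0)
    return delay + delayBetweenChunks * gaps
}

fun main() {
    // Four chunks, 60 ms before the first, 15 ms between chunks: 60 + 3 * 15 = 105 ms
    val estimate = estimateStreamDuration(4, 60.milliseconds, 15.milliseconds)
    println(estimate.inWholeMilliseconds) // 105
}
```

A test timeout comfortably above this estimate, like the 5 seconds used in the Spring-AI streaming example in this document, leaves room for scheduling jitter.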

Integration with Spring-AI

First, we need a function that creates a VertexAI client configured to use an arbitrary server endpoint and credentials.

internal fun createTestVertexAI(
    endpoint: String,
    projectId: String,
    location: String,
    timeout: Duration,
): VertexAI {
    try {
        val channelProvider =
            LlmUtilityServiceStubSettings
                .defaultHttpJsonTransportProviderBuilder()
                .setEndpoint(endpoint)
                .build()

        val newHttpJsonBuilder = LlmUtilityServiceStubSettings.newHttpJsonBuilder()
        newHttpJsonBuilder.unaryMethodSettingsBuilders().forEach { builder ->
            builder.setSimpleTimeoutNoRetriesDuration(timeout.toJavaDuration())
        }

        val llmUtilityServiceStubSettings =
            newHttpJsonBuilder
                .setEndpoint(endpoint)
                .setCredentialsProvider(NoCredentialsProvider.create())
                .setTransportChannelProvider(channelProvider)
                .build()

        val llmUtilityServiceClient =
            LlmUtilityServiceClient.create(
                LlmUtilityServiceSettings.create(llmUtilityServiceStubSettings),
            )

        val predictionServiceSettingsBuilder =
            PredictionServiceSettings
                .newHttpJsonBuilder()
                .setEndpoint(endpoint)
                .setCredentialsProvider(NoCredentialsProvider.create())
                .applyToAllUnaryMethods { updater ->
                    updater.setSimpleTimeoutNoRetriesDuration(timeout.toJavaDuration()) as? Void?
                }

        val predictionServiceSettings = predictionServiceSettingsBuilder.build()
        val predictionClient = PredictionServiceClient.create(predictionServiceSettings)

        return VertexAI
            .Builder()
            .setTransport(Transport.REST)
            .setProjectId(projectId)
            .setLocation(location)
            .setLlmClientSupplier { llmUtilityServiceClient }
            .setPredictionClientSupplier { predictionClient }
            .setCredentials(ApiKeyCredentials.create("dummy-key"))
            .build()
    } catch (e: IOException) {
        throw RuntimeException(e)
    }
}

Then create a MockGemini server and test the Spring-AI integration:

// create mock server
val gemini = MockGemini(verbose = true)

// Create a VertexAI client that connects to the mock server
val vertexAI = createTestVertexAI(
    endpoint = gemini.baseUrl(),
    projectId = "your-project-id",
    location = "us-central1",
    timeout = 5.seconds,
)

// create Spring-AI client
val chatClient =
  ChatClient
    .builder(
      VertexAiGeminiChatModel
        .builder()
        .vertexAI(vertexAI)
        .build(),
    ).build()

// Set up a mock for the LLM call
gemini.generateContent {
  temperature = 0.7
  model = "gemini-2.0-flash"
  project = "your-project-id"
  location = "us-central1"
  systemMessageContains("You are a helpful pirate")
  userMessageContains("Just say 'Hello!'")
} responds {
  content = "Ahoy there, matey! Hello!"
  finishReason = "stop"
  delay = 42.milliseconds
}

// Configure Spring-AI client call
val response =
  chatClient
    .prompt()
    .system("You are a helpful pirate")
    .user("Just say 'Hello!'")
    .options(VertexAiGeminiChatOptions.builder().temperature(0.7).build())
    // Make a call
    .call()
    .chatResponse()

// Verify the response
response shouldNotBeNull {
  result shouldNotBeNull {
    metadata.finishReason shouldBe "STOP"
    output.text shouldBe "Ahoy there, matey! Hello!"
  }
}

Streaming Responses

Mock streaming responses easily with flow support:

// configure mock gemini
gemini.generateContentStream {
  temperature = 0.7
  model = "gemini-2.0-flash"
  project = "your-project-id"
  location = "us-central1"
  systemMessageContains("You are a helpful pirate")
  userMessageContains("Just say 'Hello!'")
}.respondsStream(sse = false) {
  responseFlow =
    flow {
      emit("Ahoy")
      emit(" there,")
      delay(100.milliseconds)
      emit(" matey!")
      emit(" Hello!")
    }
  delay = 60.milliseconds
  delayBetweenChunks = 50.milliseconds
}

// Use Spring AI's streaming API
val buffer = StringBuffer()
val chunkCount =
  chatClient
    .prompt()
    .system("You are a helpful pirate")
    .user("Just say 'Hello!'")
    .options(VertexAiGeminiChatOptions.builder().temperature(0.7).build())
    .stream()
    .chatResponse()
    .doOnNext { chunk ->
      // Process each chunk as it arrives
      chunk.result.output.text?.let(buffer::append)
    }.count()
    .block(5.seconds.toJavaDuration())

// Verify the complete response
buffer.toString() shouldBe "Ahoy there, matey! Hello!"

Integration with Google Gen AI Java SDK

AI-Mocks Gemini can also be used to test applications that use the Google Gen AI Java SDK directly.

Setting up the Client

First, create a mock Gemini server:

val gemini = MockGemini(verbose = true)

Then, configure the Google Gen AI Java SDK client to use the mock server:

val client = Client.builder()
  .project("your-project-id")
  .location("us-central1")
  .credentials(
    GoogleCredentials.create(
      AccessToken.newBuilder().setTokenValue("dummy-token").build()
    )
  )
  .vertexAI(true)
  .httpOptions(HttpOptions.builder().baseUrl(gemini.baseUrl()).build())
  .build()

Regular Content Generation

Set up a mock response for a regular content generation request:

gemini.generateContent {
  temperature = 0.7
  seed = 42
  model = "gemini-2.0-flash"
  project = "your-project-id"
  location = "us-central1"
  apiVersion = "v1beta1"
  systemMessageContains("You are a helpful pirate")
  userMessageContains("Just say 'Hello!'")
} responds {
  content = "Ahoy there, matey! Hello!"
  delay = 60.milliseconds
}

Make a request using the Google Gen AI Java SDK:

val config = GenerateContentConfig.builder()
  .seed(42)
  .maxOutputTokens(100)
  .temperature(0.7f)
  .systemInstruction(
    Content.builder().role("system")
      .parts(Part.fromText("You are a helpful pirate")).build()
  )
  .build()

val response = client.models.generateContent(
  "gemini-2.0-flash",
  "Just say 'Hello!'",
  config
)

// Verify the response
response.text() shouldBe "Ahoy there, matey! Hello!"

Streaming Content Generation

Set up a mock response for a streaming content generation request:

gemini.generateContentStream {
  temperature = 0.7
  apiVersion = "v1beta1"
  location = "us-central1"
  maxOutputTokens(100)
  model = "gemini-2.0-flash"
  project = "your-project-id"
  seed = 42
  systemMessageContains("You are a helpful pirate")
  userMessageContains("Just say 'Hello!'")
} respondsStream {
  responseFlow =
    flow {
      emit("Ahoy")
      emit(" there,")
      delay(100.milliseconds)
      emit(" matey!")
      emit(" Hello!")
    }
  delay = 60.milliseconds
  delayBetweenChunks = 15.milliseconds
}

Make a streaming request using the Google Gen AI Java SDK:

val response = client.models.generateContentStream(
  "gemini-2.0-flash",
  "Just say 'Hello!'",
  config
)

// Collect and verify the streaming response
val fullResponse = response.joinToString(separator = "") {
  it.text() ?: ""
}
fullResponse shouldBe "Ahoy there, matey! Hello!"

See the integration tests for more examples.