Gemini


AI-Mocks Gemini is a specialized mock server implementation for mocking the Google Vertex AI Gemini API, built using Mokksy.

MockGemini is tested against the Spring AI framework with the Vertex AI Gemini integration.

Currently, it supports basic content generation requests and streaming responses.

Quick Start

Include the library in your test dependencies (Maven or Gradle).

build.gradle.kts
testImplementation("dev.mokksy.aimocks:ai-mocks-gemini-jvm:$latestVersion")
pom.xml
<dependency>
  <groupId>dev.mokksy.aimocks</groupId>
  <artifactId>ai-mocks-gemini-jvm</artifactId>
  <version>[LATEST_VERSION]</version>
  <scope>test</scope>
</dependency>

Content Generation API

Set up a mock server and define mock responses:

val gemini = MockGemini(verbose = true)

Next, define a mock for the Gemini content generation API:

// Define mock response
gemini.generateContent {
  temperature = 0.7
  model = "gemini-2.0-flash"
  project = "your-project-id"
  location = "us-central1"
  apiVersion = "v1beta1"
  path = null // custom request path, overrides "apiVersion"
  seed = 42
  maxTokens = 100
  topK = 40
  topP = 0.95
  maxOutputTokens(200)
  systemMessageContains("helpful pirate")
  userMessageContains("say 'Hello!'")
  requestBodyContains("helpful")
  requestBodyContainsIgnoringCase("PIRATE")
  requestBodyDoesNotContains("unwanted text")
  requestBodyDoesNotContainsIgnoringCase("unwanted case insensitive text")
  requestMatchesPredicate { it.generationConfig?.topP == 0.95 }
} responds {
  content = "Ahoy there, matey! Hello!"
  finishReason = "stop"
  role = "model"
  delay = 42.milliseconds // delay before answer
}
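The project, location, apiVersion and model fields correspond to segments of the Vertex AI REST URL. The sketch below shows how those segments fit together, assuming the standard public Vertex AI path layout. The vertexPath helper is hypothetical and not part of AI-Mocks, and whether MockGemini matches on exactly this path is an assumption.

```kotlin
// Hypothetical helper (not part of AI-Mocks): builds the request path that a
// Vertex AI client would call, assuming the public Vertex AI REST layout.
fun vertexPath(
    apiVersion: String,
    project: String,
    location: String,
    model: String,
    stream: Boolean = false,
): String {
    // Streaming calls use a different method suffix than unary calls.
    val method = if (stream) "streamGenerateContent" else "generateContent"
    return "/$apiVersion/projects/$project/locations/$location" +
        "/publishers/google/models/$model:$method"
}

fun main() {
    println(vertexPath("v1beta1", "your-project-id", "us-central1", "gemini-2.0-flash"))
    println(vertexPath("v1beta1", "your-project-id", "us-central1", "gemini-2.0-flash", stream = true))
}
```

If a mock never matches, comparing the path logged by the verbose mock server with a path built this way is one way to spot a wrong project, location or API version.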

Configuration Options

The following tables list all available configuration options for mocking Gemini API calls.

Request Configuration Options

| Option | Description |
| --- | --- |
| temperature | Controls randomness of the output. Lower values make output more deterministic. |
| model | The Gemini model to use. |
| maxTokens | Maximum number of tokens to generate. |
| topK | Limits token selection to the K most likely next tokens. |
| topP | Limits token selection to tokens with cumulative probability of P. |
| project | Google Cloud project ID. |
| location | Google Cloud location. |
| apiVersion | API version to use. |
| path | Custom request path. |
| seed | Seed for deterministic generation. |
| maxOutputTokens | Maximum number of tokens to generate. |
| systemMessageContains | Matches requests with system messages containing the specified text. |
| userMessageContains | Matches requests with user messages containing the specified text. |
| requestBodyContains | Matches requests with bodies containing the specified text. |
| requestBodyContainsIgnoringCase | Matches requests with bodies containing the specified text (case-insensitive). |
| requestBodyDoesNotContains | Matches requests with bodies not containing the specified text. |
| requestBodyDoesNotContainsIgnoringCase | Matches requests with bodies not containing the specified text (case-insensitive). |
| requestMatchesPredicate | Matches requests satisfying a custom predicate. |
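To make the matcher descriptions concrete, here is a minimal sketch of the intended semantics using plain string predicates. The functions below are hypothetical stand-ins, not the AI-Mocks implementation, and the sketch assumes that a request is accepted only when every configured matcher passes.

```kotlin
// Hypothetical stand-ins for the body matchers; not the AI-Mocks implementation.
fun bodyContains(text: String): (String) -> Boolean =
    { body -> body.contains(text) }

fun bodyContainsIgnoringCase(text: String): (String) -> Boolean =
    { body -> body.contains(text, ignoreCase = true) }

fun bodyDoesNotContain(text: String): (String) -> Boolean =
    { body -> !body.contains(text) }

// Assumption: a request matches only when every configured matcher accepts it.
fun matchesAll(body: String, matchers: List<(String) -> Boolean>): Boolean =
    matchers.all { matcher -> matcher(body) }

fun main() {
    // Simplified request body, for illustration only.
    val body = """{"system":"You are a helpful pirate","user":"say 'Hello!'"}"""
    val matchers = listOf(
        bodyContains("helpful"),
        bodyContainsIgnoringCase("PIRATE"),
        bodyDoesNotContain("unwanted text"),
    )
    println(matchesAll(body, matchers))
}
```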

Response Configuration Options

| Option | Description | Default Value |
| --- | --- | --- |
| content | The content to include in the response. | "This is a mock response from Gemini API." |
| finishReason | The reason why the model stopped generating tokens. | "STOP" |
| role | The role of the content. | "model" |
| delay | The delay before sending the response. | Duration.ZERO |
| delayMillis | The delay before sending the response in milliseconds. | N/A |
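The delay and delayMillis options describe the same quantity in two forms. As a small illustration using only kotlin.time, the sketch below shows that a millisecond count and a Duration literal are interchangeable. The delayFromMillis helper is hypothetical, and treating delayMillis as a shortcut for delay is an assumption based on the table above.

```kotlin
import kotlin.time.Duration
import kotlin.time.Duration.Companion.milliseconds

// Hypothetical helper: converts a raw millisecond count into a Duration,
// which is what a millisecond-based option would presumably do internally.
fun delayFromMillis(millis: Long): Duration = millis.milliseconds

fun main() {
    val viaDuration: Duration = 42.milliseconds
    val viaMillis: Duration = delayFromMillis(42)
    println(viaDuration == viaMillis)
    println(viaDuration.inWholeMilliseconds)
}
```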

Streaming Content Generation

Here's an example of setting up a streaming content generation mock:

// Define streaming mock response
gemini.generateContentStream {
  temperature = 0.7
  model = "gemini-2.0-flash"
  project = "your-project-id"
  location = "us-central1"
  apiVersion = "v1beta1"
  seed = 42
  maxTokens = 100
  topK = 40
  topP = 0.95
  maxOutputTokens(200)
  systemMessageContains("helpful pirate")
  userMessageContains("say 'Hello!'")
} respondsStream {
  responseFlow = flow {
    emit("Ahoy")
    emit(" there,")
    delay(100.milliseconds)
    emit(" matey!")
    emit(" Hello!")
  }
  // Alternatively, you can use responseChunks = listOf("Ahoy", " there,", " matey!", " Hello!")
  // Or chunks("Ahoy", " there,", " matey!", " Hello!")
  finishReason = "stop"
  delay = 60.milliseconds // delay before first chunk
  delayBetweenChunks = 15.milliseconds // delay between chunks
}

Streaming Response Configuration Options

| Option | Description | Default Value |
| --- | --- | --- |
| responseFlow | A flow of content chunks to include in the streaming response. | null |
| responseChunks | A list of content chunks to include in the streaming response. | null |
| chunks | Sets the chunks of content for the streaming response. | N/A |
| delayBetweenChunks | The delay between sending chunks. | Duration.ZERO |
| finishReason | The reason why the model stopped generating tokens. | "STOP" |
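Whichever option supplies the chunks, a client typically rebuilds the full text by concatenating them in order. The sketch below illustrates this with a plain list. The assemble function is a hypothetical name, and it assumes chunks are joined verbatim, which is why the spaces live inside the chunks themselves.

```kotlin
// Hypothetical helper: joins streamed chunks exactly as received,
// without inserting any separator between them.
fun assemble(chunks: List<String>): String = chunks.joinToString(separator = "")

fun main() {
    // Leading spaces are part of each chunk; nothing is added when joining.
    val chunks = listOf("Ahoy", " there,", " matey!", " Hello!")
    println(assemble(chunks))
}
```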

Integration with Spring-AI

First, we need a function that creates a VertexAI client configured with an arbitrary server endpoint and no real credentials.

internal fun createTestVertexAI(
    endpoint: String,
    projectId: String,
    location: String,
    timeout: Duration,
): VertexAI {
    try {
        val channelProvider =
            LlmUtilityServiceStubSettings
                .defaultHttpJsonTransportProviderBuilder()
                .setEndpoint(endpoint)
                .build()

        val newHttpJsonBuilder = LlmUtilityServiceStubSettings.newHttpJsonBuilder()
        newHttpJsonBuilder.unaryMethodSettingsBuilders().forEach { builder ->
            builder.setSimpleTimeoutNoRetriesDuration(timeout.toJavaDuration())
        }

        val llmUtilityServiceStubSettings =
            newHttpJsonBuilder
                .setEndpoint(endpoint)
                .setCredentialsProvider(NoCredentialsProvider.create())
                .setTransportChannelProvider(channelProvider)
                .build()

        val llmUtilityServiceClient =
            LlmUtilityServiceClient.create(
                LlmUtilityServiceSettings.create(llmUtilityServiceStubSettings),
            )

        val predictionServiceSettingsBuilder =
            PredictionServiceSettings
                .newHttpJsonBuilder()
                .setEndpoint(endpoint)
                .setCredentialsProvider(NoCredentialsProvider.create())
                .applyToAllUnaryMethods { updater ->
                    updater.setSimpleTimeoutNoRetriesDuration(timeout.toJavaDuration()) as? Void?
                }

        val predictionServiceSettings = predictionServiceSettingsBuilder.build()
        val predictionClient = PredictionServiceClient.create(predictionServiceSettings)

        return VertexAI
            .Builder()
            .setTransport(Transport.REST)
            .setProjectId(projectId)
            .setLocation(location)
            .setLlmClientSupplier { llmUtilityServiceClient }
            .setPredictionClientSupplier { predictionClient }
            .setCredentials(ApiKeyCredentials.create("dummy-key"))
            .build()
    } catch (e: IOException) {
        throw RuntimeException(e)
    }
}

Then create a MockGemini server and test the Spring-AI integration:

// create mock server
val gemini = MockGemini(verbose = true)

// Create a VertexAI client that connects to the mock server
val vertexAI = createTestVertexAI(
    endpoint = gemini.baseUrl(),
    projectId = "your-project-id",
    location = "us-central1",
    timeout = 5.seconds,
)

// create Spring-AI client
val chatClient =
  ChatClient
    .builder(
      VertexAiGeminiChatModel
        .builder()
        .vertexAI(vertexAI)
        .build(),
    ).build()

// Set up a mock for the LLM call
gemini.generateContent {
  temperature = 0.7
  model = "gemini-2.0-flash"
  project = "your-project-id"
  location = "us-central1"
  systemMessageContains("You are a helpful pirate")
  userMessageContains("Just say 'Hello!'")
} responds {
  content = "Ahoy there, matey! Hello!"
  finishReason = "stop"
  delay = 42.milliseconds
}

// Configure Spring-AI client call
val response =
  chatClient
    .prompt()
    .system("You are a helpful pirate")
    .user("Just say 'Hello!'")
    .options(VertexAiGeminiChatOptions.builder().temperature(0.7).build())
    // Make a call
    .call()
    .chatResponse()

// Verify the response
response shouldNotBeNull {
  result shouldNotBeNull {
    metadata.finishReason shouldBe "STOP"
    output.text shouldBe "Ahoy there, matey! Hello!"
  }
}

Streaming Responses

Mock streaming responses easily with flow support:

// configure mock gemini
gemini.generateContentStream {
  temperature = 0.7
  model = "gemini-2.0-flash"
  project = "your-project-id"
  location = "us-central1"
  systemMessageContains("You are a helpful pirate")
  userMessageContains("Just say 'Hello!'")
}.respondsStream(sse = false) {
  responseFlow =
    flow {
      emit("Ahoy")
      emit(" there,")
      delay(100.milliseconds)
      emit(" matey!")
      emit(" Hello!")
    }
  delay = 60.milliseconds
  delayBetweenChunks = 50.milliseconds
}

// Use Spring AI's streaming API
val buffer = StringBuffer()
val chunkCount =
  chatClient
    .prompt()
    .system("You are a helpful pirate")
    .user("Just say 'Hello!'")
    .options(VertexAiGeminiChatOptions.builder().temperature(0.7).build())
    .stream()
    .chatResponse()
    .doOnNext { chunk ->
      // Process each chunk as it arrives
      chunk.result.output.text?.let(buffer::append)
    }.count()
    .block(5.seconds.toJavaDuration())

// Verify the complete response
buffer.toString() shouldBe "Ahoy there, matey! Hello!"

Integration with Google Gen AI Java SDK

AI-Mocks Gemini can also be used to test applications that use the Google Gen AI Java SDK directly.

Setting up the Client

First, create a mock Gemini server:

val gemini = MockGemini(verbose = true)

Then, configure the Google Gen AI Java SDK client to use the mock server:

val client = Client.builder()
  .project("your-project-id")
  .location("us-central1")
  .credentials(
    GoogleCredentials.create(
      AccessToken.newBuilder().setTokenValue("dummy-token").build()
    )
  )
  .vertexAI(true)
  .httpOptions(HttpOptions.builder().baseUrl(gemini.baseUrl()).build())
  .build()

Regular Content Generation

Set up a mock response for a regular content generation request:

gemini.generateContent {
  temperature = 0.7
  seed = 42
  model = "gemini-2.0-flash"
  project = "your-project-id"
  location = "us-central1"
  apiVersion = "v1beta1"
  systemMessageContains("You are a helpful pirate")
  userMessageContains("Just say 'Hello!'")
} responds {
  content = "Ahoy there, matey! Hello!"
  delay = 60.milliseconds
}

Make a request using the Google Gen AI Java SDK:

val config = GenerateContentConfig.builder()
  .seed(42)
  .maxOutputTokens(100)
  .temperature(0.7f)
  .systemInstruction(
    Content.builder().role("system")
      .parts(Part.fromText("You are a helpful pirate")).build()
  )
  .build()

val response = client.models.generateContent(
  "gemini-2.0-flash",
  "Just say 'Hello!'",
  config
)

// Verify the response
response.text() shouldBe "Ahoy there, matey! Hello!"

Streaming Content Generation

Set up a mock response for a streaming content generation request:

gemini.generateContentStream {
  temperature = 0.7
  apiVersion = "v1beta1"
  location = "us-central1"
  maxOutputTokens(100)
  model = "gemini-2.0-flash"
  project = "your-project-id"
  seed = 42
  systemMessageContains("You are a helpful pirate")
  userMessageContains("Just say 'Hello!'")
} respondsStream {
  responseFlow =
    flow {
      emit("Ahoy")
      emit(" there,")
      delay(100.milliseconds)
      emit(" matey!")
      emit(" Hello!")
    }
  delay = 60.milliseconds
  delayBetweenChunks = 15.milliseconds
}

Make a streaming request using the Google Gen AI Java SDK:

val response = client.models.generateContentStream(
  "gemini-2.0-flash",
  "Just say 'Hello!'",
  config
)

// Collect and verify the streaming response
val fullResponse = response.joinToString(separator = "") {
  it.text() ?: ""
}
fullResponse shouldBe "Ahoy there, matey! Hello!"

See the integration tests for more examples.