Gemini
AI-Mocks Gemini is a specialized mock server implementation for mocking the Google Vertex AI Gemini API, built using Mokksy.
MockGemini is tested against the Spring AI framework with the Vertex AI Gemini integration.
Currently, it supports basic content generation requests and streaming responses.
Quick Start
Include the library in your test dependencies (Maven or Gradle).
Gradle:

```kotlin
testImplementation("dev.mokksy.aimocks:ai-mocks-gemini-jvm:$latestVersion")
```

Maven:

```xml
<dependency>
    <groupId>dev.mokksy.aimocks</groupId>
    <artifactId>ai-mocks-gemini-jvm</artifactId>
    <version>[LATEST_VERSION]</version>
    <scope>test</scope>
</dependency>
```

Content Generation API
Set up a mock server and define mock responses:
```kotlin
val gemini = MockGemini(verbose = true)
```

Next, simulate the Gemini content generation API:
```kotlin
// Requires: import kotlin.time.Duration.Companion.milliseconds

// Define mock response
gemini.generateContent {
    temperature = 0.7
    model = "gemini-2.0-flash"
    project = "your-project-id"
    location = "us-central1"
    apiVersion = "v1beta1"
    path = null // custom request path, overrides "apiVersion"
    seed = 42
    maxTokens = 100
    topK = 40
    topP = 0.95
    maxOutputTokens(200)
    systemMessageContains("helpful pirate")
    userMessageContains("say 'Hello!'")
    requestBodyContains("helpful")
    requestBodyContainsIgnoringCase("PIRATE")
    requestBodyDoesNotContains("unwanted text")
    requestBodyDoesNotContainsIgnoringCase("unwanted case insensitive text")
    requestMatchesPredicate { it.generationConfig?.topP == 0.95 }
} responds {
    content = "Ahoy there, matey! Hello!"
    finishReason = "stop"
    role = "model"
    delay = 42.milliseconds // delay before answer
}
```

Configuration Options
The following tables list all available configuration options for mocking Gemini API calls.
Request Configuration Options
| Option | Description |
|---|---|
| `temperature` | Controls randomness of the output. Lower values make output more deterministic. |
| `model` | The Gemini model to use. |
| `maxTokens` | Maximum number of tokens to generate. |
| `topK` | Limits token selection to the K most likely next tokens. |
| `topP` | Limits token selection to tokens with cumulative probability of P. |
| `project` | Google Cloud project ID. |
| `location` | Google Cloud location. |
| `apiVersion` | API version to use. |
| `path` | Custom request path. |
| `seed` | Seed for deterministic generation. |
| `maxOutputTokens` | Maximum number of tokens to generate. |
| `systemMessageContains` | Matches requests with system messages containing the specified text. |
| `userMessageContains` | Matches requests with user messages containing the specified text. |
| `requestBodyContains` | Matches requests with bodies containing the specified text. |
| `requestBodyContainsIgnoringCase` | Matches requests with bodies containing the specified text (case-insensitive). |
| `requestBodyDoesNotContains` | Matches requests with bodies not containing the specified text. |
| `requestBodyDoesNotContainsIgnoringCase` | Matches requests with bodies not containing the specified text (case-insensitive). |
| `requestMatchesPredicate` | Matches requests satisfying a custom predicate. |
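
To make the matcher options above more concrete, here is a small standalone model of how several request matchers might be evaluated together. It is only a sketch: `FakeRequest`, `Matcher`, and the helper functions are hypothetical names invented for this illustration and are not part of AI-Mocks, and it assumes that a stub applies only when every configured matcher accepts the request (logical AND), which is the usual convention for mock servers.

```kotlin
// Illustrative model only: none of these types belong to AI-Mocks.
data class FakeRequest(val body: String, val topP: Double?)

typealias Matcher = (FakeRequest) -> Boolean

fun bodyContains(text: String): Matcher = { it.body.contains(text) }

fun bodyContainsIgnoringCase(text: String): Matcher =
    { it.body.contains(text, ignoreCase = true) }

fun bodyDoesNotContain(text: String): Matcher = { !it.body.contains(text) }

fun matchesPredicate(predicate: (FakeRequest) -> Boolean): Matcher = predicate

// Assumption: a request is accepted only if all matchers accept it.
fun matchesAll(request: FakeRequest, matchers: List<Matcher>): Boolean =
    matchers.all { it(request) }

fun main() {
    val matchers = listOf(
        bodyContains("helpful"),
        bodyContainsIgnoringCase("PIRATE"),
        bodyDoesNotContain("unwanted text"),
        matchesPredicate { it.topP == 0.95 },
    )
    val request = FakeRequest(body = "You are a helpful pirate", topP = 0.95)

    println(matchesAll(request, matchers)) // true
    println(matchesAll(request.copy(body = "unwanted text"), matchers)) // false
}
```

Under this assumption, a narrow stub (many matchers) is less likely to capture unrelated requests, while a stub with few matchers behaves more like a catch-all.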
Response Configuration Options
| Option | Description | Default Value |
|---|---|---|
| `content` | The content to include in the response. | `"This is a mock response from Gemini API."` |
| `finishReason` | The reason why the model stopped generating tokens. | `"STOP"` |
| `role` | The role of the content. | `"model"` |
| `delay` | The delay before sending the response. | `Duration.ZERO` |
| `delayMillis` | The delay before sending the response in milliseconds. | N/A |
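
The `delay` values in the examples on this page are written as `42.milliseconds`, which is the Kotlin standard library's `kotlin.time.Duration` syntax. The short sketch below uses only the standard library (no AI-Mocks calls) to show how such values are built and how they convert to `java.time.Duration`, which the Spring AI examples need for timeouts. How `delayMillis` relates to `delay` internally is not shown here, since this page does not specify it.

```kotlin
import kotlin.time.Duration
import kotlin.time.Duration.Companion.milliseconds
import kotlin.time.Duration.Companion.seconds
import kotlin.time.toJavaDuration

fun main() {
    // Extension properties create kotlin.time.Duration values.
    val delay: Duration = 42.milliseconds
    println(delay.inWholeMilliseconds) // 42

    // Java APIs expect java.time.Duration, so convert explicitly.
    val timeout: java.time.Duration = 5.seconds.toJavaDuration()
    println(timeout.toMillis()) // 5000
}
```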
Streaming Content Generation
Here's an example of setting up a streaming content generation mock:
```kotlin
// Define streaming mock response
gemini.generateContentStream {
    temperature = 0.7
    model = "gemini-2.0-flash"
    project = "your-project-id"
    location = "us-central1"
    apiVersion = "v1beta1"
    seed = 42
    maxTokens = 100
    topK = 40
    topP = 0.95
    maxOutputTokens(200)
    systemMessageContains("helpful pirate")
    userMessageContains("say 'Hello!'")
} respondsStream {
    responseFlow = flow {
        emit("Ahoy")
        emit(" there,")
        delay(100.milliseconds)
        emit(" matey!")
        emit(" Hello!")
    }
    // Alternatively, you can use responseChunks = listOf("Ahoy", " there,", " matey!", " Hello!")
    // Or chunks("Ahoy", " there,", " matey!", " Hello!")
    finishReason = "stop"
    delay = 60.milliseconds // delay before first chunk
    delayBetweenChunks = 15.milliseconds // delay between chunks
}
```

Streaming Response Configuration Options
| Option | Description | Default Value |
|---|---|---|
| `responseFlow` | A flow of content chunks to include in the streaming response. | `null` |
| `responseChunks` | A list of content chunks to include in the streaming response. | `null` |
| `chunks` | Sets the chunks of content for the streaming response. | N/A |
| `delayBetweenChunks` | The delay between sending chunks. | `Duration.ZERO` |
| `finishReason` | The reason why the model stopped generating tokens. | `"STOP"` |
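
When asserting on a streamed response, it can help to derive the expected text from the same chunk list that configures the mock. The sketch below is plain Kotlin with no AI-Mocks calls; it only illustrates that the chunks are concatenated as-is, so any spacing has to be carried inside the chunks themselves.

```kotlin
fun main() {
    // The same chunks used in the streaming example above.
    val chunks = listOf("Ahoy", " there,", " matey!", " Hello!")

    // A client that appends each streamed chunk ends up with this text.
    val expected = chunks.joinToString(separator = "")
    println(expected) // Ahoy there, matey! Hello!
}
```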
Integration with Spring-AI
First, we need a function that creates a VertexAI client configured to use an arbitrary server endpoint and dummy credentials.
```kotlin
// `timeout` is a kotlin.time.Duration; toJavaDuration() comes from kotlin.time.
internal fun createTestVertexAI(
    endpoint: String,
    projectId: String,
    location: String,
    timeout: Duration,
): VertexAI {
    try {
        // HTTP/JSON transport pointed at the mock server
        val channelProvider =
            LlmUtilityServiceStubSettings
                .defaultHttpJsonTransportProviderBuilder()
                .setEndpoint(endpoint)
                .build()

        // Apply the timeout to every unary method and disable retries
        val newHttpJsonBuilder = LlmUtilityServiceStubSettings.newHttpJsonBuilder()
        newHttpJsonBuilder.unaryMethodSettingsBuilders().forEach { builder ->
            builder.setSimpleTimeoutNoRetriesDuration(timeout.toJavaDuration())
        }

        val llmUtilityServiceStubSettings =
            newHttpJsonBuilder
                .setEndpoint(endpoint)
                .setCredentialsProvider(NoCredentialsProvider.create())
                .setTransportChannelProvider(channelProvider)
                .build()

        val llmUtilityServiceClient =
            LlmUtilityServiceClient.create(
                LlmUtilityServiceSettings.create(llmUtilityServiceStubSettings),
            )

        val predictionServiceSettingsBuilder =
            PredictionServiceSettings
                .newHttpJsonBuilder()
                .setEndpoint(endpoint)
                .setCredentialsProvider(NoCredentialsProvider.create())
                .applyToAllUnaryMethods { updater ->
                    // The callback must return Void; the safe cast evaluates to null.
                    updater.setSimpleTimeoutNoRetriesDuration(timeout.toJavaDuration()) as? Void?
                }

        val predictionServiceSettings = predictionServiceSettingsBuilder.build()
        val predictionClient = PredictionServiceClient.create(predictionServiceSettings)

        // Wire both clients into a REST-based VertexAI instance
        return VertexAI
            .Builder()
            .setTransport(Transport.REST)
            .setProjectId(projectId)
            .setLocation(location)
            .setLlmClientSupplier { llmUtilityServiceClient }
            .setPredictionClientSupplier { predictionClient }
            .setCredentials(ApiKeyCredentials.create("dummy-key"))
            .build()
    } catch (e: IOException) {
        throw RuntimeException(e)
    }
}
```

Then create a MockGemini server and test the Spring AI integration:
```kotlin
// Create the mock server
val gemini = MockGemini(verbose = true)

// Create a VertexAI client that connects to the mock server
val vertexAI = createTestVertexAI(
    endpoint = gemini.baseUrl(),
    projectId = "your-project-id",
    location = "us-central1",
    timeout = 5.seconds,
)

// Create the Spring AI client
val chatClient =
    ChatClient
        .builder(
            VertexAiGeminiChatModel
                .builder()
                .vertexAI(vertexAI)
                .build(),
        ).build()

// Set up a mock for the LLM call
gemini.generateContent {
    temperature = 0.7
    model = "gemini-2.0-flash"
    project = "your-project-id"
    location = "us-central1"
    systemMessageContains("You are a helpful pirate")
    userMessageContains("Just say 'Hello!'")
} responds {
    content = "Ahoy there, matey! Hello!"
    finishReason = "stop"
    delay = 42.milliseconds
}

// Configure the Spring AI client call
val response =
    chatClient
        .prompt()
        .system("You are a helpful pirate")
        .user("Just say 'Hello!'")
        .options(VertexAiGeminiChatOptions.builder().temperature(0.7).build())
        // Make a call
        .call()
        .chatResponse()

// Verify the response
response shouldNotBeNull {
    result shouldNotBeNull {
        metadata.finishReason shouldBe "STOP"
        output.text shouldBe "Ahoy there, matey! Hello!"
    }
}
```

Streaming Responses
Mock streaming responses easily with flow support:
```kotlin
// Configure the Gemini mock
gemini.generateContentStream {
    temperature = 0.7
    model = "gemini-2.0-flash"
    project = "your-project-id"
    location = "us-central1"
    systemMessageContains("You are a helpful pirate")
    userMessageContains("Just say 'Hello!'")
}.respondsStream(sse = false) {
    responseFlow =
        flow {
            emit("Ahoy")
            emit(" there,")
            delay(100.milliseconds)
            emit(" matey!")
            emit(" Hello!")
        }
    delay = 60.milliseconds
    delayBetweenChunks = 50.milliseconds
}

// Use Spring AI's streaming API
val buffer = StringBuffer()
val chunkCount =
    chatClient
        .prompt()
        .system("You are a helpful pirate")
        .user("Just say 'Hello!'")
        .options(VertexAiGeminiChatOptions.builder().temperature(0.7).build())
        .stream()
        .chatResponse()
        .doOnNext { chunk ->
            // Process each chunk as it arrives
            chunk.result.output.text?.let(buffer::append)
        }.count()
        .block(5.seconds.toJavaDuration())

// Verify the complete response
buffer.toString() shouldBe "Ahoy there, matey! Hello!"
```

Integration with Google Gen AI Java SDK
AI-Mocks Gemini can also be used to test applications that use the Google Gen AI Java SDK directly.
Setting up the Client
First, create a mock Gemini server:
```kotlin
val gemini = MockGemini(verbose = true)
```

Then, configure the Google Gen AI Java SDK client to use the mock server:
```kotlin
val client = Client.builder()
    .project("your-project-id")
    .location("us-central1")
    .credentials(
        // Dummy token: the mock server does not validate credentials
        GoogleCredentials.create(
            AccessToken.newBuilder().setTokenValue("dummy-token").build()
        )
    )
    .vertexAI(true)
    // Point the SDK at the mock server instead of the real endpoint
    .httpOptions(HttpOptions.builder().baseUrl(gemini.baseUrl()).build())
    .build()
```

Regular Content Generation
Set up a mock response for a regular content generation request:
```kotlin
gemini.generateContent {
    temperature = 0.7
    seed = 42
    model = "gemini-2.0-flash"
    project = "your-project-id"
    location = "us-central1"
    apiVersion = "v1beta1"
    systemMessageContains("You are a helpful pirate")
    userMessageContains("Just say 'Hello!'")
} responds {
    content = "Ahoy there, matey! Hello!"
    delay = 60.milliseconds
}
```

Make a request using the Google Gen AI Java SDK:
```kotlin
val config = GenerateContentConfig.builder()
    .seed(42)
    .maxOutputTokens(100)
    .temperature(0.7f)
    .systemInstruction(
        Content.builder().role("system")
            .parts(Part.fromText("You are a helpful pirate")).build()
    )
    .build()

val response = client.models.generateContent(
    "gemini-2.0-flash",
    "Just say 'Hello!'",
    config
)

// Verify the response
response.text() shouldBe "Ahoy there, matey! Hello!"
```

Streaming Content Generation
Set up a mock response for a streaming content generation request:
```kotlin
gemini.generateContentStream {
    temperature = 0.7
    apiVersion = "v1beta1"
    location = "us-central1"
    maxOutputTokens(100)
    model = "gemini-2.0-flash"
    project = "your-project-id"
    seed = 42
    systemMessageContains("You are a helpful pirate")
    userMessageContains("Just say 'Hello!'")
} respondsStream {
    responseFlow =
        flow {
            emit("Ahoy")
            emit(" there,")
            delay(100.milliseconds)
            emit(" matey!")
            emit(" Hello!")
        }
    delay = 60.milliseconds
    delayBetweenChunks = 15.milliseconds
}
```

Make a streaming request using the Google Gen AI Java SDK:
```kotlin
// Reuses `config` from the regular content generation example
val response = client.models.generateContentStream(
    "gemini-2.0-flash",
    "Just say 'Hello!'",
    config
)

// Collect and verify the streaming response
val fullResponse = response.joinToString(separator = "") {
    it.text() ?: ""
}
fullResponse shouldBe "Ahoy there, matey! Hello!"
```

For more examples, see the integration tests.