Ollama
AI-Mocks Ollama is a mock server for the Ollama API, built on Mokksy.
MockOllama is tested against the LangChain4j framework with its Ollama integration.
Currently, it supports the main endpoints of the Ollama API, including:
- Generate completions
- Chat completions
- Model management
- Embeddings
Quick Start
Include the library in your test dependencies (Maven or Gradle).
Gradle:

```kotlin
testImplementation("dev.mokksy.aimocks:ai-mocks-ollama-jvm:$latestVersion")
```

Maven:

```xml
<dependency>
  <groupId>dev.mokksy.aimocks</groupId>
  <artifactId>ai-mocks-ollama-jvm</artifactId>
  <version>[LATEST_VERSION]</version>
  <scope>test</scope>
</dependency>
```

Basic Setup
Set up a mock server and define mock responses:
```kotlin
// Create a mock Ollama server
val ollama = MockOllama(verbose = true)

// Get the base URL of the mock server
val baseUrl = ollama.baseUrl()
```

Generate Completions API
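The request examples in this document call `client` and `json` without defining them. From the calls they make, `client` appears to be the JDK's `java.net.http.HttpClient` and `json` a kotlinx.serialization `Json` instance. The following is a minimal sketch of the HTTP side of that setup using only the JDK; `postJson` is a hypothetical helper name introduced here for illustration and is not part of AI-Mocks.

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest

// Shared HTTP client; the JDK client can be reused across requests and tests.
val client: HttpClient = HttpClient.newHttpClient()

// Hypothetical helper (not part of AI-Mocks): builds a JSON POST request
// for an endpoint path such as "/api/generate", "/api/chat" or "/api/embed".
fun postJson(baseUrl: String, path: String, body: String): HttpRequest =
    HttpRequest.newBuilder()
        .uri(URI.create("$baseUrl$path"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build()
```

With this in place, a request could be sent with `client.send(postJson(ollama.baseUrl(), "/api/generate", payload), HttpResponse.BodyHandlers.ofString())`, where `payload` is the serialized request body.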
Let's simulate Ollama's Generate Completions API:
```kotlin
// Define mock response
ollama.generate {
    model = "llama3"
    userMessageContains("Tell me a joke")
} responds {
    content("Why did the chicken cross the road? To get to the other side!")
    doneReason("stop")
    delay = 42.milliseconds
}

// Create request
val request = GenerateRequest(
    model = "llama3",
    prompt = "Tell me a joke",
    stream = false,
    options = ModelOptions(temperature = 0.7, topP = 0.9)
)

// Send request to mock server.
// `client` is a java.net.http.HttpClient and `json` a kotlinx.serialization Json instance, both created elsewhere.
val httpRequest = HttpRequest.newBuilder()
    .uri(URI.create("${ollama.baseUrl()}/api/generate"))
    .header("Content-Type", "application/json")
    .POST(
        HttpRequest.BodyPublishers.ofString(
            json.encodeToString(GenerateRequest.serializer(), request)
        )
    )
    .build()

val response = client.send(httpRequest, HttpResponse.BodyHandlers.ofString())

// Verify response
response.statusCode() shouldBe 200
val generateResponse = json.decodeFromString<GenerateResponse>(response.body())
generateResponse.response shouldBe "Why did the chicken cross the road? To get to the other side!"
generateResponse.model shouldBe "llama3"
generateResponse.done shouldBe true
generateResponse.doneReason shouldBe "stop"
```

Chat Completions API
Let's simulate Ollama's Chat Completions API:
```kotlin
// Define mock response
ollama.chat {
    model = "llama3"
    userMessageContains("Hello")
} responds {
    content("Hello, how can I help you today?")
    delay = 42.milliseconds
}

// Create request
val request = ChatRequest(
    model = "llama3",
    messages = listOf(
        Message(
            role = "user",
            content = "Hello"
        )
    ),
    stream = false,
    options = ModelOptions(temperature = 0.7, topP = 0.9)
)

// Send request to mock server
val httpRequest = HttpRequest.newBuilder()
    .uri(URI.create("${ollama.baseUrl()}/api/chat"))
    .header("Content-Type", "application/json")
    .POST(
        HttpRequest.BodyPublishers.ofString(
            json.encodeToString(ChatRequest.serializer(), request)
        )
    )
    .build()

val response = client.send(httpRequest, HttpResponse.BodyHandlers.ofString())

// Verify response
response.statusCode() shouldBe 200
val chatResponse = json.decodeFromString<ChatResponse>(response.body())
chatResponse.message.content shouldBe "Hello, how can I help you today?"
chatResponse.model shouldBe "llama3"
chatResponse.done shouldBe true
```

Embeddings API
Let's simulate Ollama's Embeddings API:
```kotlin
// Define mock response for a single string input
val embeddings = listOf(listOf(0.1f, 0.2f, 0.3f, 0.4f, 0.5f))

ollama.embed {
    model = "llama3"
    stringInput = "The sky is blue"
} responds {
    embeddings(embeddings)
    delay = 42.milliseconds
}

// Create request
val request = EmbeddingsRequest(
    model = "llama3",
    input = listOf("The sky is blue"),
    options = ModelOptions(temperature = 0.7, topP = 0.9)
)

// Send request to mock server
val httpRequest = HttpRequest.newBuilder()
    .uri(URI.create("${ollama.baseUrl()}/api/embed"))
    .header("Content-Type", "application/json")
    .POST(
        HttpRequest.BodyPublishers.ofString(
            json.encodeToString(EmbeddingsRequest.serializer(), request)
        )
    )
    .build()

val response = client.send(httpRequest, HttpResponse.BodyHandlers.ofString())

// Verify response
response.statusCode() shouldBe 200
val embedResponse = json.decodeFromString<EmbeddingsResponse>(response.body())
embedResponse.embeddings shouldBe embeddings
embedResponse.model shouldBe "llama3"
```

You can also mock embeddings for a list of strings:
```kotlin
// Define mock response for multiple string inputs
val embeddings = listOf(
    listOf(0.1f, 0.2f, 0.3f, 0.4f, 0.5f),
    listOf(0.6f, 0.7f, 0.8f, 0.9f, 1.0f)
)

ollama.embed {
    model = "llama3"
    stringListInput = listOf("The sky is blue", "The grass is green")
} responds {
    embeddings(embeddings)
    delay = 42.milliseconds
}

// Create request
val request = EmbeddingsRequest(
    model = "llama3",
    input = listOf("The sky is blue", "The grass is green"),
    options = ModelOptions(temperature = 0.7, topP = 0.9)
)

// Send request to mock server
val httpRequest = HttpRequest.newBuilder()
    .uri(URI.create("${ollama.baseUrl()}/api/embed"))
    .header("Content-Type", "application/json")
    .POST(
        HttpRequest.BodyPublishers.ofString(
            json.encodeToString(EmbeddingsRequest.serializer(), request)
        )
    )
    .build()

val response = client.send(httpRequest, HttpResponse.BodyHandlers.ofString())

// Verify response
response.statusCode() shouldBe 200
val embedResponse = json.decodeFromString<EmbeddingsResponse>(response.body())
embedResponse.embeddings shouldBe embeddings
embedResponse.model shouldBe "llama3"
```

Streaming Responses
AI-Mocks-Ollama supports streaming responses for both generate and chat endpoints:
```kotlin
// Define streaming mock response for generate endpoint
ollama.generate {
    model = "llama3"
    stream = true
    userMessageContains("Tell me a story")
} respondsStream {
    responseChunks = listOf(
        "Once upon a time",
        " in a land far, far away",
        " there lived a programmer",
        " who never had to debug in production."
    )
    delayBetweenChunks = 100.milliseconds
}

// Define streaming mock response for chat endpoint
ollama.chat {
    model = "llama3"
    stream = true
} respondsStream {
    responseChunks = listOf(
        "Hello",
        ", how can I",
        " help you today?"
    )
    delayBetweenChunks = 100.milliseconds
}
```

Request Configuration Options
The following tables list the available configuration options for mocking Ollama API calls.
Generate Request Configuration Options
| Option | Description |
|---|---|
| `model` | The model to match in the request |
| `prompt` | The prompt to match in the request |
| `system` | The system message to match in the request |
| `template` | The template to match in the request |
| `stream` | Whether to match streaming requests |
| `requestBodyString` | Adds a string matcher for the request body |
Chat Request Configuration Options
| Option | Description |
|---|---|
| `model` | The model to match in the request |
| `messages` | The messages to match in the request |
| `stream` | Whether to match streaming requests |
| `requestBodyString` | Adds a string matcher for the request body |
| `userMessage` | Adds a user message to match in the request |
| `systemMessage` | Adds a system message to match in the request |
Embed Request Configuration Options
| Option | Description |
|---|---|
| `model` | The model to match in the request |
| `stringInput` | The string input to match in the request |
| `stringListInput` | The list of string inputs to match in the request |
| `truncate` | Whether to truncate the input to fit within context length |
| `options` | Additional model parameters to match in the request |
| `keepAlive` | Controls how long the model will stay loaded into memory |
| `requestBodyString` | Adds a string matcher for the request body |
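
Ollama's `/api/embed` endpoint accepts `input` either as a single string or as a list of strings, which is presumably why the table above has separate `stringInput` and `stringListInput` matchers. The sketch below only illustrates those two wire shapes with kotlinx.serialization's JSON builders. The function names are hypothetical, and how MockOllama compares a matcher against each shape is not shown here.

```kotlin
import kotlinx.serialization.json.JsonObject
import kotlinx.serialization.json.add
import kotlinx.serialization.json.buildJsonObject
import kotlinx.serialization.json.put
import kotlinx.serialization.json.putJsonArray

// Hypothetical helper: request body with `input` as a single string.
fun singleInputBody(model: String, input: String): JsonObject = buildJsonObject {
    put("model", model)
    put("input", input)
}

// Hypothetical helper: request body with `input` as a list of strings.
fun listInputBody(model: String, inputs: List<String>): JsonObject = buildJsonObject {
    put("model", model)
    putJsonArray("input") { inputs.forEach { add(it) } }
}
```

Calling `toString()` on either result yields the compact JSON text that can be posted to `/api/embed`.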
Response Configuration Options
Generate Response Configuration Options
| Option | Description | Default Value |
|---|---|---|
| `content` | The content to include in the response | `"This is a mock response from Ollama."` |
| `doneReason` | The reason why generation completed (e.g., `"stop"`, `"length"`) | `"stop"` |
| `delay` | The delay before sending the response | `Duration.ZERO` |
Chat Response Configuration Options
| Option | Description | Default Value |
|---|---|---|
| `content` | The content to include in the response | `"This is a mock response from Ollama."` |
| `thinking` | The thinking process of the model | `null` |
| `toolCalls` | The tool calls to include in the response | `null` |
| `delay` | The delay before sending the response | `Duration.ZERO` |
Embed Response Configuration Options
| Option | Description | Default Value |
|---|---|---|
| `embeddings` | The embeddings to include in the response | `listOf(listOf(0.1f, 0.2f, 0.3f, 0.4f, 0.5f))` |
| `embedding` | A single embedding to include in the response | N/A |
| `model` | The model name to include in the response | `null` |
| `delay` | The delay before sending the response | `Duration.ZERO` |
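
The `delay` option in the tables above takes a `kotlin.time.Duration`, as in `delay = 42.milliseconds` from the earlier examples. To check in a test that a configured delay was actually applied, one option is to time the client call. This is a minimal sketch assuming Kotlin 1.9 or later (for the stable `measureTime`); `tookAtLeast` is a hypothetical helper, not part of AI-Mocks.

```kotlin
import kotlin.time.Duration
import kotlin.time.Duration.Companion.milliseconds
import kotlin.time.measureTime

// Hypothetical helper: runs `call` and reports whether it took at least `expected`.
fun tookAtLeast(expected: Duration, call: () -> Unit): Boolean {
    val elapsed = measureTime(call)
    return elapsed >= expected
}
```

A test could then wrap the request, for example `tookAtLeast(42.milliseconds) { client.send(httpRequest, HttpResponse.BodyHandlers.ofString()) }`. Asserting only a lower bound keeps the check stable on slow CI machines.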
Streaming Response Configuration Options
| Option | Description | Default Value | Availability |
|---|---|---|---|
| `responseFlow` | A flow of content chunks for the streaming response | `null` | Generate & Chat |
| `responseChunks` | A list of content chunks for the streaming response | `null` | Generate & Chat |
| `delayBetweenChunks` | The delay between sending chunks | `Duration.ZERO` | Generate & Chat |
| `doneReason` | The reason why generation completed | `"stop"` | Generate only |
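
On the client side, Ollama's streaming generate endpoint returns newline-delimited JSON: each line is an object with a `response` text fragment, and the last line has `done` set to `true`. The sketch below reassembles such a body, assuming the mock follows the same format. `collectGenerateStream` and `StreamResult` are hypothetical names introduced for illustration.

```kotlin
import kotlinx.serialization.json.Json
import kotlinx.serialization.json.booleanOrNull
import kotlinx.serialization.json.contentOrNull
import kotlinx.serialization.json.jsonObject
import kotlinx.serialization.json.jsonPrimitive

// Hypothetical result type: the concatenated text and whether a final chunk was seen.
data class StreamResult(val text: String, val done: Boolean)

// Hypothetical helper: joins the `response` fragments of a newline-delimited JSON body.
fun collectGenerateStream(body: String): StreamResult {
    val text = StringBuilder()
    var done = false
    for (line in body.lineSequence()) {
        if (line.isBlank()) continue // tolerate trailing newlines
        val chunk = Json.parseToJsonElement(line).jsonObject
        chunk["response"]?.jsonPrimitive?.contentOrNull?.let { text.append(it) }
        if (chunk["done"]?.jsonPrimitive?.booleanOrNull == true) done = true
    }
    return StreamResult(text.toString(), done)
}
```

For the chat endpoint, Ollama places each fragment under `message.content` rather than `response`, so the lookup would need adjusting accordingly.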
Integration Testing
Create a test class with a MockOllama instance to test your Ollama client integration:
```kotlin
class MyOllamaTest {
    private val ollama = MockOllama()

    @Test
    fun `Should respond to Chat Completion`() = runTest {
        // Configure mock response
        ollama.chat {
            model = "llama3"
        } responds {
            content("Hello, how can I help you today?")
        }

        // Use your Ollama client to make a request and verify the response
    }
}
```

Integration with LangChain4j
AI-Mocks-Ollama can be used with LangChain4j's Ollama integration:
```kotlin
// Create a mock Ollama server
val ollama = MockOllama(verbose = true)

// Configure mock response
ollama.chat {
    model = "llama3"
} responds {
    content("Hello, how can I help you today?")
    delay = 42.milliseconds
}

// Create LangChain4j Ollama client
val model = OllamaChatModel.builder()
    .baseUrl(ollama.baseUrl())
    .modelName("llama3")
    .temperature(0.7)
    .topP(0.9)
    .build()

// Use LangChain4j Kotlin DSL to send a request
val result = model.chat {
    messages += userMessage("Hello")
}

// Verify response
result.apply {
    aiMessage().text() shouldBe "Hello, how can I help you today?"
}
```

Check for examples in the integration tests.