Ollama

AI-Mocks Ollama is a mock server for the Ollama API, built on Mokksy.

MockOllama is tested against LangChain4j's Ollama integration.

Currently, it supports the main endpoints of the Ollama API, including:

  • Generate completions
  • Chat completions
  • Model management
  • Embeddings

Quick Start

Include the library in your test dependencies (Maven or Gradle).

build.gradle.kts

testImplementation("dev.mokksy.aimocks:ai-mocks-ollama-jvm:$latestVersion")

pom.xml

<dependency>
  <groupId>dev.mokksy.aimocks</groupId>
  <artifactId>ai-mocks-ollama-jvm</artifactId>
  <version>[LATEST_VERSION]</version>
  <scope>test</scope>
</dependency>

Basic Setup

Set up a mock server and define mock responses:

// Create a mock Ollama server
val ollama = MockOllama(verbose = true)

// Get the base URL of the mock server
val baseUrl = ollama.baseUrl()

Generate Completions API

Let's simulate Ollama's Generate Completions API:

// Assumed fixtures, not shown in the original snippet: a JDK HTTP client
// and a kotlinx.serialization Json instance
val client = HttpClient.newHttpClient()
val json = Json { ignoreUnknownKeys = true }

// Define mock response
ollama.generate {
  model = "llama3"
  userMessageContains("Tell me a joke")
} responds {
  content("Why did the chicken cross the road? To get to the other side!")
  doneReason("stop")
  delay = 42.milliseconds
}

// Create request
val request = GenerateRequest(
  model = "llama3",
  prompt = "Tell me a joke",
  stream = false,
  options = ModelOptions(temperature = 0.7, topP = 0.9)
)

// Send request to mock server
val httpRequest = HttpRequest.newBuilder()
  .uri(URI.create("${ollama.baseUrl()}/api/generate"))
  .header("Content-Type", "application/json")
  .POST(
    HttpRequest.BodyPublishers.ofString(
      json.encodeToString(GenerateRequest.serializer(), request)
    )
  )
  .build()

val response = client.send(httpRequest, HttpResponse.BodyHandlers.ofString())

// Verify response
response.statusCode() shouldBe 200
val generateResponse = json.decodeFromString<GenerateResponse>(response.body())
generateResponse.response shouldBe "Why did the chicken cross the road? To get to the other side!"
generateResponse.model shouldBe "llama3"
generateResponse.done shouldBe true
generateResponse.doneReason shouldBe "stop"

Chat Completions API

Let's simulate Ollama's Chat Completions API:

// Define mock response
ollama.chat {
  model = "llama3"
  userMessageContains("Hello")
} responds {
  content("Hello, how can I help you today?")
  delay = 42.milliseconds
}

// Create request
val request = ChatRequest(
  model = "llama3",
  messages = listOf(
    Message(
      role = "user",
      content = "Hello"
    )
  ),
  stream = false,
  options = ModelOptions(temperature = 0.7, topP = 0.9)
)

// Send request to mock server
val httpRequest = HttpRequest.newBuilder()
  .uri(URI.create("${ollama.baseUrl()}/api/chat"))
  .header("Content-Type", "application/json")
  .POST(
    HttpRequest.BodyPublishers.ofString(
      json.encodeToString(ChatRequest.serializer(), request)
    )
  )
  .build()

val response = client.send(httpRequest, HttpResponse.BodyHandlers.ofString())

// Verify response
response.statusCode() shouldBe 200
val chatResponse = json.decodeFromString<ChatResponse>(response.body())
chatResponse.message.content shouldBe "Hello, how can I help you today?"
chatResponse.model shouldBe "llama3"
chatResponse.done shouldBe true

Embeddings API

Let's simulate Ollama's Embeddings API:

// Define mock response for a single string input
val embeddings = listOf(listOf(0.1f, 0.2f, 0.3f, 0.4f, 0.5f))

ollama.embed {
  model = "llama3"
  stringInput = "The sky is blue"
} responds {
  embeddings(embeddings)
  delay = 42.milliseconds
}

// Create request
val request = EmbeddingsRequest(
  model = "llama3",
  input = listOf("The sky is blue"),
  options = ModelOptions(temperature = 0.7, topP = 0.9)
)

// Send request to mock server
val httpRequest = HttpRequest.newBuilder()
  .uri(URI.create("${ollama.baseUrl()}/api/embed"))
  .header("Content-Type", "application/json")
  .POST(
    HttpRequest.BodyPublishers.ofString(
      json.encodeToString(EmbeddingsRequest.serializer(), request)
    )
  )
  .build()

val response = client.send(httpRequest, HttpResponse.BodyHandlers.ofString())

// Verify response
response.statusCode() shouldBe 200
val embedResponse = json.decodeFromString<EmbeddingsResponse>(response.body())
embedResponse.embeddings shouldBe embeddings
embedResponse.model shouldBe "llama3"

You can also mock embeddings for a list of strings:

// Define mock response for multiple string inputs
val embeddings = listOf(
  listOf(0.1f, 0.2f, 0.3f, 0.4f, 0.5f),
  listOf(0.6f, 0.7f, 0.8f, 0.9f, 1.0f)
)

ollama.embed {
  model = "llama3"
  stringListInput = listOf("The sky is blue", "The grass is green")
} responds {
  embeddings(embeddings)
  delay = 42.milliseconds
}

// Create request
val request = EmbeddingsRequest(
  model = "llama3",
  input = listOf("The sky is blue", "The grass is green"),
  options = ModelOptions(temperature = 0.7, topP = 0.9)
)

// Send request to mock server
val httpRequest = HttpRequest.newBuilder()
  .uri(URI.create("${ollama.baseUrl()}/api/embed"))
  .header("Content-Type", "application/json")
  .POST(
    HttpRequest.BodyPublishers.ofString(
      json.encodeToString(EmbeddingsRequest.serializer(), request)
    )
  )
  .build()

val response = client.send(httpRequest, HttpResponse.BodyHandlers.ofString())

// Verify response
response.statusCode() shouldBe 200
val embedResponse = json.decodeFromString<EmbeddingsResponse>(response.body())
embedResponse.embeddings shouldBe embeddings
embedResponse.model shouldBe "llama3"
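
Hand-written vectors become tedious once a test needs many inputs. The following is a minimal sketch of a fixture helper that derives a stable pseudo-embedding from each input string. The names `fakeEmbedding` and `fakeEmbeddings` are hypothetical and not part of AI-Mocks; the values carry no semantic meaning and only serve as repeatable test data that could be passed to `embeddings(...)`.

```kotlin
import kotlin.random.Random

// Hypothetical helper (not part of AI-Mocks): builds a deterministic
// pseudo-embedding so the same text always maps to the same vector.
fun fakeEmbedding(text: String, dimensions: Int = 5): List<Float> {
    require(dimensions > 0) { "dimensions must be positive" }
    // String.hashCode() is specified on the JVM, so the seed is stable across runs.
    val random = Random(text.hashCode())
    return List(dimensions) { random.nextFloat() }
}

// One vector per input, in the same order as the inputs.
fun fakeEmbeddings(inputs: List<String>, dimensions: Int = 5): List<List<Float>> =
    inputs.map { fakeEmbedding(it, dimensions) }

fun main() {
    val inputs = listOf("The sky is blue", "The grass is green")
    val vectors = fakeEmbeddings(inputs)
    inputs.zip(vectors).forEach { (text, vector) -> println("$text -> $vector") }
}
```

Kotlin only guarantees that a seeded `Random` repeats its sequence within the same runtime version, so avoid hard-coding the generated numbers in assertions; compare against the helper's output instead.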

Streaming Responses

AI-Mocks-Ollama supports streaming responses for both generate and chat endpoints:

// Define streaming mock response for generate endpoint
ollama.generate {
  model = "llama3"
  stream = true
  userMessageContains("Tell me a story")
} respondsStream {
  responseChunks = listOf(
    "Once upon a time",
    " in a land far, far away",
    " there lived a programmer",
    " who never had to debug in production."
  )
  delayBetweenChunks = 100.milliseconds
}

// Define streaming mock response for chat endpoint
ollama.chat {
  model = "llama3"
  stream = true
} respondsStream {
  responseChunks = listOf(
    "Hello",
    ", how can I",
    " help you today?"
  )
  delayBetweenChunks = 100.milliseconds
}
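
The chunk lists above are written by hand. For longer texts, a small helper can split a full response into chunks whose concatenation equals the original, which keeps the expected final text and the streamed chunks in sync. This is a sketch only: `chunkByWords` is a hypothetical name, not an AI-Mocks function, and its result would be assigned to `responseChunks`.

```kotlin
// Hypothetical helper (not part of AI-Mocks): splits text into chunks of at most
// `wordsPerChunk` words. Whitespace stays attached to the following word, so
// joining the chunks reproduces the input exactly.
fun chunkByWords(text: String, wordsPerChunk: Int): List<String> {
    require(wordsPerChunk > 0) { "wordsPerChunk must be positive" }
    // A token is optional leading whitespace plus one word, or trailing whitespace.
    val tokens = Regex("""\s*\S+|\s+\z""").findAll(text).map { it.value }.toList()
    return tokens.chunked(wordsPerChunk) { it.joinToString("") }
}

fun main() {
    val story = "Once upon a time in a land far, far away there lived a programmer"
    val chunks = chunkByWords(story, wordsPerChunk = 4)
    chunks.forEach { println("[$it]") }
    // The streamed pieces always add up to the full text.
    check(chunks.joinToString("") == story)
}
```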

Request Configuration Options

The following tables list the available configuration options for mocking Ollama API calls.

Generate Request Configuration Options

| Option | Description |
|---|---|
| model | The model to match in the request |
| prompt | The prompt to match in the request |
| system | The system message to match in the request |
| template | The template to match in the request |
| stream | Whether to match streaming requests |
| requestBodyString | Adds a string matcher for the request body |

Chat Request Configuration Options

| Option | Description |
|---|---|
| model | The model to match in the request |
| messages | The messages to match in the request |
| stream | Whether to match streaming requests |
| requestBodyString | Adds a string matcher for the request body |
| userMessage | Adds a user message to match in the request |
| systemMessage | Adds a system message to match in the request |

Embed Request Configuration Options

| Option | Description |
|---|---|
| model | The model to match in the request |
| stringInput | The string input to match in the request |
| stringListInput | The list of string inputs to match in the request |
| truncate | Whether to truncate the input to fit within context length |
| options | Additional model parameters to match in the request |
| keepAlive | Controls how long the model will stay loaded into memory |
| requestBodyString | Adds a string matcher for the request body |

Response Configuration Options

Generate Response Configuration Options

| Option | Description | Default Value |
|---|---|---|
| content | The content to include in the response | "This is a mock response from Ollama." |
| doneReason | The reason why generation completed (e.g., "stop", "length") | "stop" |
| delay | The delay before sending the response | Duration.ZERO |
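
When a stub is configured with a `delay`, a test may want to confirm that the client actually waited for it. The sketch below times a call with the Kotlin standard library (`measureTimedValue`, stable since Kotlin 1.9). `requireTakesAtLeast` is a hypothetical helper, and `Thread.sleep` stands in for the real request so the example runs on its own.

```kotlin
import kotlin.time.Duration
import kotlin.time.Duration.Companion.milliseconds
import kotlin.time.measureTimedValue

// Hypothetical helper (not part of AI-Mocks): runs the block, fails if it
// finished faster than `minimum`, and otherwise returns the block's result.
fun <T> requireTakesAtLeast(minimum: Duration, block: () -> T): T {
    val (value, elapsed) = measureTimedValue(block)
    check(elapsed >= minimum) { "Expected at least $minimum, but the call took $elapsed" }
    return value
}

fun main() {
    // Thread.sleep stands in for a request to a stub with delay = 42.milliseconds.
    val body = requireTakesAtLeast(42.milliseconds) {
        Thread.sleep(60)
        "response body"
    }
    println(body)
}
```

Asserting only a lower bound keeps the check robust on slow CI machines, where an upper bound on latency would be flaky.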

Chat Response Configuration Options

| Option | Description | Default Value |
|---|---|---|
| content | The content to include in the response | "This is a mock response from Ollama." |
| thinking | The thinking process of the model | null |
| toolCalls | The tool calls to include in the response | null |
| delay | The delay before sending the response | Duration.ZERO |

Embed Response Configuration Options

| Option | Description | Default Value |
|---|---|---|
| embeddings | The embeddings to include in the response | listOf(listOf(0.1f, 0.2f, 0.3f, 0.4f, 0.5f)) |
| embedding | A single embedding to include in the response | N/A |
| model | The model name to include in the response | null |
| delay | The delay before sending the response | Duration.ZERO |

Streaming Response Configuration Options

| Option | Description | Default Value | Availability |
|---|---|---|---|
| responseFlow | A flow of content chunks for the streaming response | null | Generate & Chat |
| responseChunks | A list of content chunks for the streaming response | null | Generate & Chat |
| delayBetweenChunks | The delay between sending chunks | Duration.ZERO | Generate & Chat |
| doneReason | The reason why generation completed | "stop" | Generate only |
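
On the client side, a streamed Ollama response arrives as newline-delimited JSON, one object per chunk, ending with an object whose `done` field is true. The sketch below reads such a body line by line with the JDK HTTP client. To stay runnable without dependencies it uses a JDK `HttpServer` as a hypothetical stand-in for the mock server; in a real test the base URL would come from `ollama.baseUrl()`. It is an assumption that the mock's wire format matches this stand-in exactly.

```kotlin
import com.sun.net.httpserver.HttpServer
import java.net.InetSocketAddress
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse
import java.util.stream.Collectors

// Hypothetical stand-in for the mock server: writes one JSON object per line.
fun startStreamingStandIn(chunks: List<String>): HttpServer {
    val server = HttpServer.create(InetSocketAddress("127.0.0.1", 0), 0)
    server.createContext("/api/generate") { exchange ->
        exchange.requestBody.readAllBytes() // drain the request body
        exchange.responseHeaders.add("Content-Type", "application/x-ndjson")
        exchange.sendResponseHeaders(200, 0) // length 0 selects chunked transfer encoding
        exchange.responseBody.use { out ->
            for (chunk in chunks) {
                out.write("{\"response\":\"$chunk\",\"done\":false}\n".toByteArray())
                out.flush()
            }
            out.write("{\"response\":\"\",\"done\":true,\"done_reason\":\"stop\"}\n".toByteArray())
        }
    }
    server.start()
    return server
}

// Sends a streaming generate request and collects the body line by line;
// each line is one JSON chunk.
fun readStreamedLines(baseUrl: String): List<String> {
    val request = HttpRequest.newBuilder()
        .uri(URI.create("$baseUrl/api/generate"))
        .header("Content-Type", "application/json")
        .POST(
            HttpRequest.BodyPublishers.ofString(
                """{"model":"llama3","prompt":"Tell me a story","stream":true}"""
            )
        )
        .build()
    val response = HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofLines())
    return response.body().use { lines -> lines.collect(Collectors.toList()) }
}

fun main() {
    val server = startStreamingStandIn(listOf("Once upon a time", " in a land far, far away"))
    try {
        readStreamedLines("http://127.0.0.1:${server.address.port}").forEach(::println)
    } finally {
        server.stop(0)
    }
}
```

`BodyHandlers.ofLines()` hands lines over as they arrive, so a test could also process the stream lazily instead of collecting it, for example to observe `delayBetweenChunks`.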

Integration Testing

Create a test class with a MockOllama instance to test your Ollama client integration:

class MyOllamaTest {
  private val ollama = MockOllama()

  @Test
  fun `Should respond to Chat Completion`() = runTest {
    // Configure mock response
    ollama.chat {
      model = "llama3"
    } responds {
      content("Hello, how can I help you today?")
    }

    // Use your Ollama client to make a request and verify the response
  }
}

Integration with LangChain4j

AI-Mocks-Ollama can be used with LangChain4j's Ollama integration:

// Create a mock Ollama server
val ollama = MockOllama(verbose = true)

// Configure mock response
ollama.chat {
  model = "llama3"
} responds {
  content("Hello, how can I help you today?")
  delay = 42.milliseconds
}

// Create LangChain4j Ollama client
val model = OllamaChatModel.builder()
  .baseUrl(ollama.baseUrl())
  .modelName("llama3")
  .temperature(0.7)
  .topP(0.9)
  .build()

// Use LangChain4j Kotlin DSL to send a request
val result = model.chat {
  messages += userMessage("Hello")
}

// Verify response
result.apply {
  aiMessage().text() shouldBe "Hello, how can I help you today?"
}

See the integration tests for more examples.