Ollama

AI-Mocks Ollama is a mock server for the Ollama API, built on Mokksy.

MockOllama is tested against LangChain4j's Ollama integration.

Currently, it supports the main endpoints of the Ollama API, including:

Generate completions
Chat completions
Model management
Embeddings

Quick Start

Include the library in your test dependencies (Maven or Gradle).

build.gradle.kts
testImplementation("dev.mokksy.aimocks:ai-mocks-ollama-jvm:$latestVersion")

pom.xml
<dependency>
  <groupId>dev.mokksy.aimocks</groupId>
  <artifactId>ai-mocks-ollama-jvm</artifactId>
  <version>[LATEST_VERSION]</version>
  <scope>test</scope>
</dependency>

Basic Setup

Set up a mock server and define mock responses:

// Create a mock Ollama server
val ollama = MockOllama(verbose = true)

// Get the base URL of the mock server
val baseUrl = ollama.baseUrl()
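
The request examples in the following sections use a `client` and a `json` value without defining them. The sketch below shows one way `client` could be set up with the JDK's built-in `java.net.http.HttpClient`. The `jsonPost` helper is hypothetical (it is not part of AI-Mocks) and only wraps the request-building steps those examples repeat. `json` is presumably a kotlinx.serialization `Json` instance, which is an assumption based on the `encodeToString(serializer, value)` calls and is not shown here.

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest

// Shared JDK HTTP client, reused for every request in a test
val client: HttpClient = HttpClient.newHttpClient()

// Hypothetical helper (not part of AI-Mocks): builds a JSON POST request
// for a given path on the mock server.
fun jsonPost(baseUrl: String, path: String, body: String): HttpRequest =
  HttpRequest.newBuilder()
    .uri(URI.create("$baseUrl$path"))
    .header("Content-Type", "application/json")
    .POST(HttpRequest.BodyPublishers.ofString(body))
    .build()
```

With this in place, a call such as `jsonPost(ollama.baseUrl(), "/api/generate", body)` would produce the same kind of request that the examples build by hand.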

Generate Completions API

Let's simulate Ollama's Generate Completions API:

// Define mock response
ollama.generate {
  model = "llama3"
  userMessageContains("Tell me a joke")
} responds {
  content("Why did the chicken cross the road? To get to the other side!")
  doneReason("stop")
  delay = 42.milliseconds
}

// Create request
val request = GenerateRequest(
  model = "llama3",
  prompt = "Tell me a joke",
  stream = false,
  options = ModelOptions(temperature = 0.7, topP = 0.9)
)

// Send request to mock server
val httpRequest = HttpRequest.newBuilder()
  .uri(URI.create("${ollama.baseUrl()}/api/generate"))
  .header("Content-Type", "application/json")
  .POST(
    HttpRequest.BodyPublishers.ofString(
      json.encodeToString(GenerateRequest.serializer(), request)
    )
  )
  .build()

val response = client.send(httpRequest, HttpResponse.BodyHandlers.ofString())

// Verify response
response.statusCode() shouldBe 200
val generateResponse = json.decodeFromString<GenerateResponse>(response.body())
generateResponse.response shouldBe "Why did the chicken cross the road? To get to the other side!"
generateResponse.model shouldBe "llama3"
generateResponse.done shouldBe true
generateResponse.doneReason shouldBe "stop"

Chat Completions API

Let's simulate Ollama's Chat Completions API:

// Define mock response
ollama.chat {
  model = "llama3"
  userMessageContains("Hello")
} responds {
  content("Hello, how can I help you today?")
  delay = 42.milliseconds
}

// Create request
val request = ChatRequest(
  model = "llama3",
  messages = listOf(
    Message(
      role = "user",
      content = "Hello"
    )
  ),
  stream = false,
  options = ModelOptions(temperature = 0.7, topP = 0.9)
)

// Send request to mock server
val httpRequest = HttpRequest.newBuilder()
  .uri(URI.create("${ollama.baseUrl()}/api/chat"))
  .header("Content-Type", "application/json")
  .POST(
    HttpRequest.BodyPublishers.ofString(
      json.encodeToString(ChatRequest.serializer(), request)
    )
  )
  .build()

val response = client.send(httpRequest, HttpResponse.BodyHandlers.ofString())

// Verify response
response.statusCode() shouldBe 200
val chatResponse = json.decodeFromString<ChatResponse>(response.body())
chatResponse.message.content shouldBe "Hello, how can I help you today?"
chatResponse.model shouldBe "llama3"
chatResponse.done shouldBe true

Embeddings API

Let's simulate Ollama's Embeddings API:

// Define mock response for a single string input
val embeddings = listOf(listOf(0.1f, 0.2f, 0.3f, 0.4f, 0.5f))

ollama.embed {
  model = "llama3"
  stringInput = "The sky is blue"
} responds {
  embeddings(embeddings)
  delay = 42.milliseconds
}

// Create request
val request = EmbeddingsRequest(
  model = "llama3",
  input = listOf("The sky is blue"),
  options = ModelOptions(temperature = 0.7, topP = 0.9)
)

// Send request to mock server
val httpRequest = HttpRequest.newBuilder()
  .uri(URI.create("${ollama.baseUrl()}/api/embed"))
  .header("Content-Type", "application/json")
  .POST(
    HttpRequest.BodyPublishers.ofString(
      json.encodeToString(EmbeddingsRequest.serializer(), request)
    )
  )
  .build()

val response = client.send(httpRequest, HttpResponse.BodyHandlers.ofString())

// Verify response
response.statusCode() shouldBe 200
val embedResponse = json.decodeFromString<EmbeddingsResponse>(response.body())
embedResponse.embeddings shouldBe embeddings
embedResponse.model shouldBe "llama3"

You can also mock embeddings for a list of strings:

// Define mock response for multiple string inputs
val embeddings = listOf(
  listOf(0.1f, 0.2f, 0.3f, 0.4f, 0.5f),
  listOf(0.6f, 0.7f, 0.8f, 0.9f, 1.0f)
)

ollama.embed {
  model = "llama3"
  stringListInput = listOf("The sky is blue", "The grass is green")
} responds {
  embeddings(embeddings)
  delay = 42.milliseconds
}

// Create request
val request = EmbeddingsRequest(
  model = "llama3",
  input = listOf("The sky is blue", "The grass is green"),
  options = ModelOptions(temperature = 0.7, topP = 0.9)
)

// Send request to mock server
val httpRequest = HttpRequest.newBuilder()
  .uri(URI.create("${ollama.baseUrl()}/api/embed"))
  .header("Content-Type", "application/json")
  .POST(
    HttpRequest.BodyPublishers.ofString(
      json.encodeToString(EmbeddingsRequest.serializer(), request)
    )
  )
  .build()

val response = client.send(httpRequest, HttpResponse.BodyHandlers.ofString())

// Verify response
response.statusCode() shouldBe 200
val embedResponse = json.decodeFromString<EmbeddingsResponse>(response.body())
embedResponse.embeddings shouldBe embeddings
embedResponse.model shouldBe "llama3"

Streaming Responses

AI-Mocks Ollama supports streaming responses for both the generate and chat endpoints:

// Define streaming mock response for generate endpoint
ollama.generate {
  model = "llama3"
  stream = true
  userMessageContains("Tell me a story")
} respondsStream {
  responseChunks = listOf(
    "Once upon a time",
    " in a land far, far away",
    " there lived a programmer",
    " who never had to debug in production."
  )
  delayBetweenChunks = 100.milliseconds
}

// Define streaming mock response for chat endpoint
ollama.chat {
  model = "llama3"
  stream = true
} respondsStream {
  responseChunks = listOf(
    "Hello",
    ", how can I",
    " help you today?"
  )
  delayBetweenChunks = 100.milliseconds
}
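
To verify a streamed generate response, a test needs to reassemble the chunks. The sketch below assumes the mock follows Ollama's streaming wire format, where the body is newline-delimited JSON and each line carries a text fragment in its `response` field. `joinGenerateChunks` is a hypothetical helper, not part of AI-Mocks. It extracts the field with a regular expression to stay dependency-free and does not unescape JSON escape sequences, so a real test would more likely decode each line with a JSON library.

```kotlin
// Matches the "response" string field in one NDJSON line.
// Sketch only: escaped characters inside the value are not unescaped.
val responseField = Regex("\"response\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"")

// Hypothetical helper: concatenates the "response" fragments of all streamed lines.
fun joinGenerateChunks(lines: List<String>): String =
  lines
    .filter { it.isNotBlank() }
    .mapNotNull { responseField.find(it)?.groupValues?.get(1) }
    .joinToString("")
```

The lines themselves can be read with the JDK client by passing `HttpResponse.BodyHandlers.ofLines()` to `client.send(...)` and collecting the resulting stream into a list.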

Request Configuration Options

The following tables list the available configuration options for mocking Ollama API calls.

Generate Request Configuration Options

Option	Description
`model`	The model to match in the request
`prompt`	The prompt to match in the request
`system`	The system message to match in the request
`template`	The template to match in the request
`stream`	Whether to match streaming requests
`requestBodyString`	Adds a string matcher for the request body

Chat Request Configuration Options

Option	Description
`model`	The model to match in the request
`messages`	The messages to match in the request
`stream`	Whether to match streaming requests
`requestBodyString`	Adds a string matcher for the request body
`userMessage`	Adds a user message to match in the request
`systemMessage`	Adds a system message to match in the request

Embed Request Configuration Options

Option	Description
`model`	The model to match in the request
`stringInput`	The string input to match in the request
`stringListInput`	The list of string inputs to match in the request
`truncate`	Whether to truncate the input to fit within context length
`options`	Additional model parameters to match in the request
`keepAlive`	Controls how long the model will stay loaded into memory
`requestBodyString`	Adds a string matcher for the request body

Response Configuration Options

Generate Response Configuration Options

Option	Description	Default Value
`content`	The content to include in the response	`"This is a mock response from Ollama."`
`doneReason`	The reason why generation completed (e.g., "stop", "length")	`"stop"`
`delay`	The delay before sending the response	`Duration.ZERO`

Chat Response Configuration Options

Option	Description	Default Value
`content`	The content to include in the response	`"This is a mock response from Ollama."`
`thinking`	The thinking process of the model	`null`
`toolCalls`	The tool calls to include in the response	`null`
`delay`	The delay before sending the response	`Duration.ZERO`

Embed Response Configuration Options

Option	Description	Default Value
`embeddings`	The embeddings to include in the response	`listOf(listOf(0.1f, 0.2f, 0.3f, 0.4f, 0.5f))`
`embedding`	A single embedding to include in the response	N/A
`model`	The model name to include in the response	`null`
`delay`	The delay before sending the response	`Duration.ZERO`

Streaming Response Configuration Options

Option	Description	Default Value	Availability
`responseFlow`	A flow of content chunks for the streaming response	`null`	Generate & Chat
`responseChunks`	A list of content chunks for the streaming response	`null`	Generate & Chat
`delayBetweenChunks`	The delay between sending chunks	`Duration.ZERO`	Generate & Chat
`doneReason`	The reason why generation completed	`"stop"`	Generate only

Integration Testing

Create a test class with a MockOllama instance to test your Ollama client integration:

class MyOllamaTest {
  private val ollama = MockOllama()

  @Test
  fun `Should respond to Chat Completion`() = runTest {
    // Configure mock response
    ollama.chat {
      model = "llama3"
    } responds {
      content("Hello, how can I help you today?")
    }

    // Use your Ollama client to make a request and verify the response
  }
}

Integration with LangChain4j

AI-Mocks Ollama works with LangChain4j's Ollama integration:

// Create a mock Ollama server
val ollama = MockOllama(verbose = true)

// Configure mock response
ollama.chat {
  model = "llama3"
} responds {
  content("Hello, how can I help you today?")
  delay = 42.milliseconds
}

// Create LangChain4j Ollama client
val model = OllamaChatModel.builder()
  .baseUrl(ollama.baseUrl())
  .modelName("llama3")
  .temperature(0.7)
  .topP(0.9)
  .build()

// Use LangChain4j Kotlin DSL to send a request
val result = model.chat {
  messages += userMessage("Hello")
}

// Verify response
result.apply {
  aiMessage().text() shouldBe "Hello, how can I help you today?"
}

See the integration tests for more examples.