- [Overview](#overview)
Ollama Pipe Class API
Table of Contents
Overview
The OllamaPipe class provides a TPipe abstraction for interacting with local Ollama instances. It supports multimodal inputs, streaming, reasoning extraction (specifically for models like DeepSeek-R1), and native tool calling.
class OllamaPipe : Pipe()
Public Functions
Server Configuration
setIP(ip: String): OllamaPipe
Sets the IP address of the Ollama server. Defaults to 127.0.0.1.
setPort(port: Int): OllamaPipe
Sets the port number of the Ollama server. Defaults to 11434.
init(): Pipe
Initializes the pipe. Checks if the Ollama server is running; if not, attempts to start it in the background using ollama serve.
API Mode
useChatApi(): OllamaPipe
Switches to the modern /api/chat endpoint. This is the default and enables conversation history and native tool calling.
useLegacyApi(): OllamaPipe
Switches to the legacy /api/generate endpoint.
Inference Settings
setKeepAlive(duration: String): OllamaPipe
Sets how long the model stays loaded in memory after a request.
5m: 5 minutes (default)1h: 1 hour0: Unload immediately-1: Keep loaded indefinitely
enableThink(): OllamaPipe
Enables reasoning extraction for models that use <think> tags (e.g., DeepSeek-R1). Extracted reasoning is populated in MultimodalContent.modelReasoning.
setMinP(minP: Float): OllamaPipe
Sets the minimum probability threshold for token sampling.
setTypicalP(typicalP: Float): OllamaPipe
Sets the typical probability for sampling.
setMirostat(mode: Int, eta: Float? = null, tau: Float? = null): OllamaPipe
Enables and configures Mirostat sampling.
mode: 0 (off), 1 (Mirostat), 2 (Mirostat 2.0)
setRepeatLastN(n: Int): OllamaPipe
Sets how many tokens back to look for repeat penalty.
setPenalizeNewline(penalize: Boolean): OllamaPipe
Controls whether the model is penalized for generating newlines.
Resource Management
setGpuSettings(numGpu: Int, mainGpu: Int? = null): OllamaPipe
Configures GPU offloading.
numGpu: Number of layers to offload to GPU.mainGpu: The ID of the main GPU to use.
setNumThread(numThread: Int): OllamaPipe
Sets the number of CPU threads to use for generation.
setBatchSize(batchSize: Int): OllamaPipe
Sets the prompt processing batch size.
setNumCtx(numCtx: Int): OllamaPipe
Sets the model’s context window size (tokens).
setLowVram(lowVram: Boolean): OllamaPipe
Enables low VRAM mode for limited hardware.
setNuma(useNuma: Boolean): OllamaPipe
Enables NUMA optimization on supported systems.
Streaming
enableStreaming(callback: (suspend (String) -> Unit)? = null, showReasoning: Boolean = false, streamReasoning: Boolean = true): OllamaPipe
Enables real-time streaming of responses.
callback: Suspendable function receiving text chunks.showReasoning: Propagates streaming to reasoning pipes.streamReasoning: Whether to emit reasoning chunks to the callback.
setStreamingCallback(callback: suspend (String) -> Unit): OllamaPipe
Registers a single streaming callback and enables streaming.
streamingCallbacks(builder: StreamingCallbackBuilder.() -> Unit): OllamaPipe
Configures multiple streaming callbacks using a DSL.
Ollama-Specific Options
OllamaPipe automatically maps standard TPipe parameters (temperature, topP, topK, maxTokens, stopSequences, repetitionPenalty) to the equivalent Ollama options. Advanced options listed above allow for fine-tuned control over the local runner environment.