Getting Started with Ollama

Get started with TPipe-Ollama for local LLM inference, using the Ollama runtime with models such as Llama 3, Mistral, and DeepSeek-R1.

Introduction

TPipe-Ollama is a local alternative to cloud-based AI providers. It uses the Ollama runtime to run models such as Llama 3, Mistral, and DeepSeek-R1 directly on your machine, so prompts and responses stay on the local host.

Prerequisites

  1. Ollama Installed: Ensure you have Ollama installed and available in your system’s PATH. You can download it from ollama.com.
  2. Models Pulled: Pull the models you want to use before running TPipe.
    ollama pull llama3
    ollama pull deepseek-r1:1.5b
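
To confirm from code that the pulled models are visible, one option is to query the Ollama server directly. The sketch below is independent of TPipe: it assumes the server is already running at Ollama's default address (http://localhost:11434) and calls the /api/tags endpoint, which lists locally available models. The function name listLocalModels is hypothetical.

```kotlin
import java.io.IOException
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse
import java.time.Duration

// Returns the raw JSON listing of locally pulled models, or null when the
// Ollama server cannot be reached or answers with a non-200 status.
// Assumption: the server listens on Ollama's default address.
fun listLocalModels(baseUrl: String = "http://localhost:11434"): String? {
    val client = HttpClient.newBuilder()
        .connectTimeout(Duration.ofSeconds(2))
        .build()
    val request = HttpRequest.newBuilder(URI.create("$baseUrl/api/tags"))
        .timeout(Duration.ofSeconds(5))
        .GET()
        .build()
    return try {
        val response = client.send(request, HttpResponse.BodyHandlers.ofString())
        if (response.statusCode() == 200) response.body() else null
    } catch (e: IOException) {
        null // connection refused, timed out, or otherwise unreachable
    }
}
```

A null result only means the server did not answer; it does not mean the models are missing. Running ollama list on the command line gives the same information.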

Basic Usage

To get started, create an OllamaPipe instance and configure the model.

import ollamaPipe.OllamaPipe
import com.TTT.Pipe.MultimodalContent
import kotlinx.coroutines.runBlocking

fun main() = runBlocking {
    val pipe = OllamaPipe()
        .setModel("llama3")
        .setTemperature(0.7)
        .setSystemPrompt("You are a helpful local assistant.")

    // Initialize the pipe (starts Ollama server if not running)
    pipe.init()

    val result = pipe.execute("What is the speed of light?")
    println(result.text)
}

Advanced Features

Multimodal Support

OllamaPipe supports multimodal models like llava. You can pass images as base64 strings or byte arrays.

// imageBytes is a ByteArray holding the raw image data (for example, read from a file).
val result = pipe.execute(MultimodalContent(
    text = "Describe this image.",
    binaryContent = mutableListOf(BinaryContent.Bytes(imageBytes))
))
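
The snippet above expects the image as a byte array. A minimal sketch of producing both input forms mentioned here (raw bytes and a base64 string) with the standard library follows. The helper names loadImageBytes and toBase64 are hypothetical and are not part of TPipe.

```kotlin
import java.io.File
import java.util.Base64

// Reads an image file from disk into a byte array.
fun loadImageBytes(path: String): ByteArray = File(path).readBytes()

// Encodes raw bytes as a standard (not URL-safe) base64 string.
fun toBase64(bytes: ByteArray): String =
    Base64.getEncoder().encodeToString(bytes)
```

For example, loadImageBytes("photo.png") yields a value that can be passed where imageBytes is used, and toBase64 converts it when a base64 string is needed.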

DeepSeek-R1 Reasoning

To extract reasoning from models like DeepSeek-R1 that use <think> tags, enable the thinking mode.

val pipe = OllamaPipe()
    .setModel("deepseek-r1:1.5b")
    .enableThink() // Extracts <think> content into result.modelReasoning

val result = pipe.execute("Explain quantum entanglement.")
println("Reasoning: ${result.modelReasoning}")
println("Answer: ${result.text}")
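
To make the idea of <think> extraction concrete, here is a standalone sketch of splitting raw model output into reasoning and answer. It illustrates the general technique only and is not TPipe's actual implementation. The names ThinkSplit and splitThink are hypothetical.

```kotlin
// Holds the two parts of a response that uses <think> tags.
data class ThinkSplit(val reasoning: String, val answer: String)

// Splits raw model output into the text inside <think>...</think> blocks
// (the reasoning) and everything outside them (the answer).
fun splitThink(raw: String): ThinkSplit {
    // DOT_MATCHES_ALL lets the pattern span multiple lines.
    val thinkBlock = Regex("<think>(.*?)</think>", RegexOption.DOT_MATCHES_ALL)
    val reasoning = thinkBlock.findAll(raw)
        .joinToString("\n") { it.groupValues[1].trim() }
    val answer = thinkBlock.replace(raw, "").trim()
    return ThinkSplit(reasoning, answer)
}
```

If the output contains no <think> block, reasoning is an empty string and answer is the trimmed input.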

Native Tool Calling

TPipe automatically maps PCP tools to Ollama’s native tool calling system.

val pipe = OllamaPipe()
    .setModel("llama3.1") // Tool-calling capable model
    .setPcPContext(myPcpContext)

val result = pipe.execute("Check the current system CPU usage.")
// result.text will contain the tool call JSON
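
For orientation, Ollama's native tool calling expects each tool to be described as a function entry in the tools array of an /api/chat request. The sketch below builds one such entry by hand. How TPipe translates a PCP context into this shape is not shown in this guide, so treat the mapping as an assumption. The helper name ollamaToolJson and the tool name used in the note are hypothetical.

```kotlin
// Builds one entry of the "tools" array in Ollama's /api/chat request format.
// The parameter schema is left empty for brevity, and the inputs are not
// JSON-escaped, so this is only suitable for simple illustrative values.
fun ollamaToolJson(name: String, description: String): String = """
{
  "type": "function",
  "function": {
    "name": "$name",
    "description": "$description",
    "parameters": { "type": "object", "properties": {}, "required": [] }
  }
}
""".trimIndent()
```

Calling ollamaToolJson("get_cpu_usage", "Returns current CPU usage") produces a definition a tool-capable model can select. In real code, a JSON library should build this structure instead of string templates.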

Resource Configuration

For large models or specific hardware, you can configure resource limits.

pipe.setGpuSettings(numGpu = 35) // Offload 35 model layers to the GPU
    .setNumThread(8)             // Use 8 CPU threads
    .setNumCtx(8192)             // Set the context window to 8192 tokens
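
These setters resemble Ollama's own runtime options num_gpu, num_thread, and num_ctx, which Ollama accepts in the options object of an /api/chat request. Assuming TPipe forwards the values under those names (not confirmed by this guide), the equivalent options object could be sketched as follows. The function name ollamaOptionsJson is hypothetical.

```kotlin
// Builds the JSON "options" object that Ollama accepts on /api/chat requests.
// Assumption: TPipe's resource setters map one-to-one onto these option names.
fun ollamaOptionsJson(numGpu: Int, numThread: Int, numCtx: Int): String =
    """{"num_gpu": $numGpu, "num_thread": $numThread, "num_ctx": $numCtx}"""
```

With the values from the snippet above, ollamaOptionsJson(35, 8, 8192) yields an object with num_gpu 35, num_thread 8, and num_ctx 8192. Larger num_ctx values increase memory use, so it is worth raising the context window only as far as the workload needs.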

Comparison with BedrockPipe

Feature             BedrockPipe          OllamaPipe
------------------  -------------------  -------------------
API Endpoint        AWS Bedrock SDK      /api/chat (local)
Reasoning           reasoningContent     <think> Extraction
Tool Calling        Converse API Tools   Native Tool Calling
Streaming           Stream Handler       Ktor Async Client
Multimodal          Base64 Images        Base64 Images
Context Management  S3 / Managed         Local Memory

Next Steps