Using Ollama with Swarms Agents¶
This guide demonstrates how to use Ollama (a tool for running large language models locally) with Swarms agents. You can use Ollama in two ways: the simple model_name approach or a custom wrapper for advanced control.
What is Ollama?¶
Ollama is a tool for running large language models locally. It provides:
- Easy installation and setup
- Support for many open-source models
- Simple API interface
- No API keys required
- Runs entirely on your machine
Prerequisites¶
Install Ollama and the Python client:
# Install Ollama (follow instructions at https://ollama.ai)
# Then install the Python client
pip install ollama
Quick Start: Simple Approach¶
The easiest way to use Ollama is with the model_name parameter. This uses LiteLLM under the hood and follows LiteLLM conventions.
from swarms import Agent
# Initialize the agent with Ollama
agent = Agent(
agent_name="Financial-Analysis-Agent",
model_name="ollama/llama2", # Use ollama/ prefix
system_prompt="Agent system prompt here",
agent_description="Agent performs financial analysis.",
)
# Run a query
agent.run("What are the components of a startup's stock incentive equity plan?")
Available Models¶
You can use any Ollama model with the `ollama/` prefix. Here are some commonly used models:
# Llama 2 models
agent = Agent(
agent_name="Llama2-7B-Agent",
model_name="ollama/llama2", # 7B version (default)
max_loops=1,
)
agent = Agent(
agent_name="Llama2-13B-Agent",
model_name="ollama/llama2:13b", # 13B version
max_loops=1,
)
# Mistral models
agent = Agent(
agent_name="Mistral-Agent",
model_name="ollama/mistral", # 7B version
max_loops=1,
)
# CodeLlama models
agent = Agent(
agent_name="CodeLlama-Agent",
model_name="ollama/codellama", # 7B version
max_loops=1,
)
agent = Agent(
agent_name="CodeLlama-13B-Agent",
model_name="ollama/codellama:13b", # 13B version
max_loops=1,
)
# Mixtral (mixture of experts)
agent = Agent(
agent_name="Mixtral-Agent",
model_name="ollama/mixtral:8x7b", # 8x7B MoE model
max_loops=1,
)
# Qwen models
agent = Agent(
agent_name="Qwen-Agent",
model_name="ollama/qwen:7b", # 7B version
max_loops=1,
)
# Neural Chat
agent = Agent(
agent_name="NeuralChat-Agent",
model_name="ollama/neural-chat:7b", # 7B version
max_loops=1,
)
Advanced: Custom Wrapper Approach¶
For advanced use cases where you need full control over the Ollama interface, you can create a custom wrapper class. This approach gives you more flexibility and customization options.
Creating a Custom Ollama Wrapper¶
The Agent class accepts an `llm` parameter that expects an object with a `run` method that takes a `task` parameter.
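Any object that exposes this interface will work. The stub below is a minimal sketch of the contract (a hypothetical echo model, not a real backend), just to show what the agent expects; the next section builds a full Ollama wrapper on the same interface:
class MinimalLLM:
    """Smallest interface a Swarms Agent needs from a custom LLM."""

    def run(self, task: str, **kwargs) -> str:
        # A real implementation would call your model here.
        return f"You asked: {task}"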
Basic Ollama Wrapper¶
import ollama
from typing import Optional, Any, Dict
class OllamaWrapper:
    """
    Custom wrapper for Ollama to use with Swarms Agent.

    This wrapper implements the required interface for Swarms agents:
    - A `run` method that accepts a `task` parameter
    - Optional support for additional parameters like `img`, `stream`, etc.
    """

    def __init__(
        self,
        model_name: str = "llama2",
        base_url: str = "http://localhost:11434",
        temperature: float = 0.7,
        top_p: float = 0.9,
        top_k: int = 40,
        num_predict: int = 2048,
        system_prompt: Optional[str] = None,
    ):
        """
        Initialize the Ollama wrapper.

        Args:
            model_name: Name of the Ollama model to use (e.g., "llama2", "mistral", "codellama")
            base_url: Base URL of the Ollama server
            temperature: Sampling temperature (0.0 to 1.0)
            top_p: Top-p sampling parameter
            top_k: Top-k sampling parameter
            num_predict: Maximum number of tokens to generate
            system_prompt: Optional system prompt to use for all requests
        """
        self.model_name = model_name
        self.base_url = base_url
        self.temperature = temperature
        self.top_p = top_p
        self.top_k = top_k
        self.num_predict = num_predict
        self.system_prompt = system_prompt

        # Initialize Ollama client
        self.client = ollama.Client(host=base_url)

        # Verify model is available
        self._verify_model()

    def _verify_model(self):
        """Verify that the model is available."""
        try:
            models = self.client.list()
            # Note: depending on your ollama client version, each entry may expose
            # the name as model["name"] or as a typed attribute such as model.model.
            model_names = [model["name"] for model in models["models"]]
            if self.model_name not in model_names:
                print(f"Warning: Model '{self.model_name}' not found.")
                print(f"Available models: {', '.join(model_names)}")
                print(f"To pull a model, run: ollama pull {self.model_name}")
        except Exception as e:
            print(f"Warning: Could not verify model availability: {e}")

    def run(
        self,
        task: str,
        img: Optional[str] = None,
        temperature: Optional[float] = None,
        top_p: Optional[float] = None,
        top_k: Optional[int] = None,
        num_predict: Optional[int] = None,
        system_prompt: Optional[str] = None,
        **kwargs
    ) -> str:
        """
        Run inference on the given task.

        This method implements the required interface for Swarms agents.

        Args:
            task: The input prompt/task to process
            img: Optional image input (for vision models)
            temperature: Override default temperature
            top_p: Override default top_p
            top_k: Override default top_k
            num_predict: Override default num_predict
            system_prompt: Override default system prompt
            **kwargs: Additional parameters

        Returns:
            str: The generated text response
        """
        # Prepare options
        options = {
            "temperature": temperature if temperature is not None else self.temperature,
            "top_p": top_p if top_p is not None else self.top_p,
            "top_k": top_k if top_k is not None else self.top_k,
            "num_predict": num_predict if num_predict is not None else self.num_predict,
        }

        # Prepare messages
        messages = []

        # Add system prompt if provided
        system = system_prompt or self.system_prompt
        if system:
            messages.append({"role": "system", "content": system})

        # Add user message
        user_message = {"role": "user", "content": task}

        # Add image if provided (for vision models)
        if img:
            user_message["images"] = [img]

        messages.append(user_message)

        # Generate response
        response = self.client.chat(
            model=self.model_name,
            messages=messages,
            options=options,
        )

        # Extract and return the generated text
        return response["message"]["content"]

    def __call__(self, task: str, **kwargs) -> str:
        """
        Make the wrapper callable for convenience.

        Args:
            task: The input prompt/task to process
            **kwargs: Additional parameters

        Returns:
            str: The generated text response
        """
        return self.run(task, **kwargs)
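Because the wrapper defines `__call__`, you can smoke-test it on its own before wiring it into an agent. A quick check, assuming a local Ollama server with the llama2 model already pulled:
llm = OllamaWrapper(model_name="llama2", temperature=0.3)

# __call__ delegates to run(), so the wrapper can be invoked directly
print(llm("Summarize what Ollama does in one sentence."))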
When to Use Each Approach¶
| Approach | Use Case | Pros | Cons |
|---|---|---|---|
| Simple (`model_name`) | Quick setup, standard use cases | Easy to use, no extra code | Less control, limited customization |
| Custom Wrapper | Advanced features, custom logic | Full control, extensible | Requires more code |
Using the Custom Ollama Wrapper with Agent¶
Basic Example¶
from swarms import Agent
from ollama_wrapper import OllamaWrapper # Your custom wrapper
# Initialize the Ollama wrapper
ollama_llm = OllamaWrapper(
model_name="llama2", # Or "mistral", "codellama", etc.
temperature=0.7,
num_predict=2048,
)
# Create agent with the custom LLM
agent = Agent(
agent_name="Ollama-Agent",
agent_description="Agent using Ollama for local inference",
llm=ollama_llm, # Pass the custom wrapper
system_prompt="You are a helpful AI assistant.",
max_loops=1,
)
# Run the agent
response = agent.run("Explain machine learning in simple terms.")
print(response)
Advanced Example with Custom Configuration¶
from swarms import Agent
from ollama_wrapper import OllamaWrapper
# Initialize Ollama with advanced settings
ollama_llm = OllamaWrapper(
model_name="mistral", # Use Mistral model
temperature=0.8,
top_p=0.95,
top_k=50,
num_predict=4096,
system_prompt="You are an expert researcher and analyst.",
)
# Create agent with streaming enabled
agent = Agent(
agent_name="Advanced-Ollama-Agent",
agent_description="High-performance agent with Ollama",
llm=ollama_llm,
system_prompt="You are an expert researcher and analyst.",
max_loops=3,
streaming_on=True,
verbose=True,
)
# Run with a complex task
task = """
Analyze the following research question and provide a comprehensive answer:
What are the key differences between transformer architectures and
recurrent neural networks in natural language processing?
"""
response = agent.run(task)
print(response)
Complete Working Example¶
Here's a complete example that you can run:
"""
Complete example: Using Ollama with Swarms Agent
"""
import ollama
from typing import Optional
from swarms import Agent
class OllamaWrapper:
    """Custom Ollama wrapper for Swarms Agent."""

    def __init__(
        self,
        model_name: str = "llama2",
        base_url: str = "http://localhost:11434",
        temperature: float = 0.7,
        top_p: float = 0.9,
        num_predict: int = 2048,
        system_prompt: Optional[str] = None,
    ):
        """Initialize Ollama wrapper."""
        self.model_name = model_name
        self.base_url = base_url
        self.temperature = temperature
        self.top_p = top_p
        self.num_predict = num_predict
        self.system_prompt = system_prompt

        # Initialize Ollama client
        print(f"Connecting to Ollama at {base_url}")
        self.client = ollama.Client(host=base_url)

        # Verify model
        self._verify_model()

    def _verify_model(self):
        """Verify that the model is available."""
        try:
            models = self.client.list()
            model_names = [model["name"] for model in models["models"]]
            if self.model_name not in model_names:
                print(f"Warning: Model '{self.model_name}' not found.")
                print(f"Available models: {', '.join(model_names)}")
                print(f"To pull a model, run: ollama pull {self.model_name}")
        except Exception as e:
            print(f"Warning: Could not verify model availability: {e}")

    def run(self, task: str, **kwargs) -> str:
        """Run inference on task."""
        options = {
            "temperature": kwargs.get("temperature", self.temperature),
            "top_p": kwargs.get("top_p", self.top_p),
            "num_predict": kwargs.get("num_predict", self.num_predict),
        }

        messages = []

        # Add system prompt if provided
        system = kwargs.get("system_prompt", self.system_prompt)
        if system:
            messages.append({"role": "system", "content": system})

        # Add user message
        messages.append({"role": "user", "content": task})

        # Generate response
        response = self.client.chat(
            model=self.model_name,
            messages=messages,
            options=options,
        )

        return response["message"]["content"]


def main():
    """Main function to run the example."""
    # Initialize Ollama wrapper
    ollama_llm = OllamaWrapper(
        model_name="llama2",  # Change to your preferred model
        temperature=0.7,
        num_predict=1024,
    )

    # Create agent
    agent = Agent(
        agent_name="Research-Agent",
        agent_description="Agent for research and analysis tasks",
        llm=ollama_llm,
        system_prompt="You are a helpful research assistant.",
        max_loops=1,
        verbose=True,
    )

    # Run agent
    task = "What are the main advantages of using Ollama for local LLM inference?"
    print("Running agent...")
    response = agent.run(task)
    print("\nResponse:")
    print(response)


if __name__ == "__main__":
    main()
Using Different Ollama Models¶
Ollama supports many models. Here are examples with different models:
Mistral Model¶
ollama_llm = OllamaWrapper(
model_name="mistral", # Real Mistral 7B model
temperature=0.8,
)
agent = Agent(
agent_name="Mistral-Agent",
llm=ollama_llm,
max_loops=1,
)
CodeLlama Model¶
ollama_llm = OllamaWrapper(
model_name="codellama", # Real CodeLlama 7B model
temperature=0.2, # Lower temperature for code generation
)
agent = Agent(
agent_name="Code-Agent",
llm=ollama_llm,
system_prompt="You are an expert programmer.",
max_loops=1,
)
# Or use the 13B version for better code quality
ollama_llm = OllamaWrapper(
model_name="codellama:13b", # Real CodeLlama 13B model
temperature=0.2,
)
Llama 2 Model Variants¶
# Llama 2 7B (default, fastest)
ollama_llm = OllamaWrapper(
model_name="llama2", # Real Llama 2 7B model
temperature=0.7,
)
# Llama 2 13B (better quality)
ollama_llm = OllamaWrapper(
model_name="llama2:13b", # Real Llama 2 13B model
temperature=0.7,
)
# Llama 2 70B (best quality, requires more memory)
ollama_llm = OllamaWrapper(
model_name="llama2:70b", # Real Llama 2 70B model
temperature=0.7,
)
Mixtral Model (Mixture of Experts)¶
ollama_llm = OllamaWrapper(
model_name="mixtral:8x7b", # Real Mixtral 8x7B MoE model
temperature=0.7,
)
agent = Agent(
agent_name="Mixtral-Agent",
llm=ollama_llm,
max_loops=1,
)
Qwen Model¶
ollama_llm = OllamaWrapper(
model_name="qwen:7b", # Real Qwen 7B model
temperature=0.7,
)
agent = Agent(
agent_name="Qwen-Agent",
llm=ollama_llm,
max_loops=1,
)
Streaming Support¶
You can extend the wrapper to support streaming:
def run_stream(
    self,
    task: str,
    callback: Optional[callable] = None,
    **kwargs
):
    """
    Run inference with streaming support.

    Args:
        task: Input prompt
        callback: Optional callback function for each chunk
        **kwargs: Additional parameters

    Yields:
        str: Streaming text chunks
    """
    options = {
        "temperature": kwargs.get("temperature", self.temperature),
        "top_p": kwargs.get("top_p", self.top_p),
        "num_predict": kwargs.get("num_predict", self.num_predict),
    }

    messages = [{"role": "user", "content": task}]

    stream = self.client.chat(
        model=self.model_name,
        messages=messages,
        options=options,
        stream=True,
    )

    for chunk in stream:
        content = chunk["message"]["content"]
        if callback:
            callback(content)
        yield content
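Here is a usage sketch for the streaming method above, assuming run_stream has been added to OllamaWrapper and a local llama2 model is available:
llm = OllamaWrapper(model_name="llama2")

chunks = []
for chunk in llm.run_stream("Explain overfitting in one paragraph."):
    print(chunk, end="", flush=True)  # print tokens as they arrive
    chunks.append(chunk)

full_text = "".join(chunks)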
Vision Models¶
Ollama supports vision models. Here's how to use them:
class OllamaVisionWrapper(OllamaWrapper):
    """Extended wrapper for vision models."""

    def run(
        self,
        task: str,
        img: Optional[str] = None,
        **kwargs
    ) -> str:
        """Run inference with optional image input."""
        options = {
            "temperature": kwargs.get("temperature", self.temperature),
            "top_p": kwargs.get("top_p", self.top_p),
            "num_predict": kwargs.get("num_predict", self.num_predict),
        }

        messages = [{"role": "user", "content": task}]

        # Add image if provided
        if img:
            # Read image file and encode as base64
            import base64

            with open(img, "rb") as f:
                image_data = base64.b64encode(f.read()).decode("utf-8")
            messages[0]["images"] = [image_data]

        response = self.client.chat(
            model=self.model_name,  # Use a vision model like "llava"
            messages=messages,
            options=options,
        )

        return response["message"]["content"]
# Usage with vision
vision_llm = OllamaVisionWrapper(
model_name="llava", # Vision model
)
agent = Agent(
agent_name="Vision-Agent",
llm=vision_llm,
max_loops=1,
)
response = agent.run(
"Describe this image in detail.",
img="path/to/image.jpg"
)
Best Practices¶
1. Model Selection¶
Choose the right model for your task:
- llama2 or llama2:13b: General purpose, good balance (7B or 13B)
- mistral: Fast and efficient (7B)
- codellama or codellama:13b: Best for code generation (7B or 13B)
- mixtral:8x7b: High quality, mixture of experts (8x7B)
- llava or llava:13b: For vision tasks (7B or 13B)
- qwen:7b or qwen:14b: Good multilingual support (7B or 14B)
- phi or phi:2.7b: Small and fast (2.7B)
- gemma:2b or gemma:7b: Google's efficient models (2B or 7B)
2. Temperature Settings¶
# For creative tasks
ollama_llm = OllamaWrapper(temperature=0.9)
# For factual tasks
ollama_llm = OllamaWrapper(temperature=0.3)
# For code generation
ollama_llm = OllamaWrapper(temperature=0.2)
3. Error Handling¶
Add error handling to your wrapper:
def run(self, task: str, **kwargs) -> str:
    """Run inference with error handling."""
    try:
        # ... existing code ...
        response = self.client.chat(...)
        return response["message"]["content"]
    except Exception as e:
        print(f"Error in Ollama inference: {e}")
        raise
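For transient failures (for example, the Ollama server still starting up), you can also wrap the call in a simple retry with backoff. This is only a sketch, not part of the wrapper above; _chat_once is a hypothetical helper that wraps the client.chat call shown earlier:
import time

def run(self, task: str, retries: int = 3, **kwargs) -> str:
    """Run inference, retrying transient failures with exponential backoff."""
    for attempt in range(retries):
        try:
            return self._chat_once(task, **kwargs)  # hypothetical helper around client.chat
        except Exception as e:
            if attempt == retries - 1:
                raise
            wait = 2 ** attempt
            print(f"Ollama call failed ({e}); retrying in {wait}s...")
            time.sleep(wait)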
Troubleshooting¶
Common Issues¶
- Model Not Found: Pull the model first: `ollama pull llama2`
- Connection Error: Ensure Ollama is running: `ollama serve`
- Out of Memory: Use a smaller model or reduce `num_predict`
- Slow Inference: Use a faster model like `mistral` or reduce model size
Debug Tips¶
# Test Ollama connection
import ollama
client = ollama.Client()
models = client.list()
print("Available models:", [m["name"] for m in models["models"]])
# Test the wrapper directly
ollama_llm = OllamaWrapper(model_name="llama2") # Real Llama 2 7B model
response = ollama_llm.run("Test prompt")
print(response)
# Or test with a different model
ollama_llm = OllamaWrapper(model_name="mistral") # Real Mistral 7B model
response = ollama_llm.run("Test prompt")
print(response)
Real Ollama Models Reference¶
Common Ollama models you can use:
| Model | Description | Pull Command |
|---|---|---|
| `llama2` | Meta's Llama 2 (7B default) | `ollama pull llama2` |
| `llama2:13b` | Llama 2 13B version | `ollama pull llama2:13b` |
| `llama2:70b` | Llama 2 70B version | `ollama pull llama2:70b` |
| `mistral` | Mistral AI 7B model | `ollama pull mistral` |
| `codellama` | Code-focused Llama 7B | `ollama pull codellama` |
| `codellama:13b` | CodeLlama 13B | `ollama pull codellama:13b` |
| `codellama:34b` | CodeLlama 34B | `ollama pull codellama:34b` |
| `mixtral:8x7b` | Mixtral 8x7B MoE model | `ollama pull mixtral:8x7b` |
| `llava` | Vision-language model 7B | `ollama pull llava` |
| `llava:13b` | LLaVA 13B version | `ollama pull llava:13b` |
| `phi` | Microsoft's Phi-2 (2.7B) | `ollama pull phi` |
| `phi:2.7b` | Phi-2 2.7B | `ollama pull phi:2.7b` |
| `gemma:2b` | Google's Gemma 2B | `ollama pull gemma:2b` |
| `gemma:7b` | Google's Gemma 7B | `ollama pull gemma:7b` |
| `qwen:7b` | Qwen 7B model | `ollama pull qwen:7b` |
| `qwen:14b` | Qwen 14B model | `ollama pull qwen:14b` |
| `neural-chat:7b` | Neural Chat 7B | `ollama pull neural-chat:7b` |
| `starling-lm:7b` | Starling LM 7B | `ollama pull starling-lm:7b` |
| `orca-mini:3b` | Orca Mini 3B | `ollama pull orca-mini:3b` |
Pull a model with `ollama pull <model_name>`, for example: `ollama pull mistral`
Conclusion¶
Using Ollama with Swarms agents provides:
- Local Deployment: Run models entirely on your machine
- No API Keys: No need for external API keys
- Privacy: All data stays local
- Full Control: Customize the interface to your needs (with custom wrapper)
- Easy Setup: Simple installation and configuration
This approach is ideal for:
- Development and testing
- Privacy-sensitive applications
- Offline deployments
- Cost-effective local inference