Local AI Models
Chucki supports running AI models locally on your own infrastructure, whether for privacy, offline usage, or custom model deployments. Local models work for both Agent Chat and Inline Completions across all license tiers.
Overview
Local AI models run directly within your development environment — on your machine or network infrastructure. This gives you:
- Maximum Privacy: No code or conversations leave your machine
- Offline Capability: Work without internet connectivity
- Full Control: Complete autonomy over data and model execution
- Cost Optimization: No per-token API costs for local models
- Customization: Fine-tuning and adaptation to your workflows
Use Cases
Local models are valuable for:
- Privacy-First Development: Sensitive projects requiring on-device AI
- Air-Gapped Environments: Isolated networks without external connectivity
- Cost-Sensitive Teams: Unlimited local model usage without API fees
- Compliance Requirements: Data residency and regulation requirements (HIPAA, GDPR, etc.)
- Offline Development: Work without internet or with unreliable connections
- Custom Models: Deploy fine-tuned models tailored to your codebase
Where Local Models Work
Local models are fully supported in Chucki:
- Agent Chat: Use local models for all chat-based interactions
- Inline Completions: Get inline suggestions with local models (Ghost Text or CodeInsight)
All license tiers can add local models as an upgrade.
Configuration
Setup
Local models are configured in Tools -> Options -> Chucki:

- Navigate to Chat settings
- Under Local Models, click Add
- Enter your local model server URL (e.g., http://localhost:1234/v1)
- Click Check Connection to verify and load available models
- Select a model from the Name dropdown
- Save the configuration
Context Window: Make sure your local chat model supports a minimum context window of approximately 14K tokens. Larger context windows (32K or more) are recommended for better handling of extended conversations and larger code snippets.
For inline completions, similar settings are available under Completions. While a smaller context window is acceptable for inline completions, a larger one will help generate better suggestions by considering more surrounding code.
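The Check Connection step corresponds to querying the server's /v1/models endpoint, which every OpenAI-compatible server exposes. As a rough sketch (the URL is the example from the steps above, and the response parsing assumes the standard OpenAI response shape), the same check can be done outside the IDE:

```python
import json
import urllib.error
import urllib.request

def extract_model_ids(payload: dict) -> list:
    """Pull model IDs out of an OpenAI-format /v1/models response."""
    return [m["id"] for m in payload.get("data", [])]

def check_connection(base_url: str) -> list:
    """Fetch the model list from an OpenAI-compatible server,
    mirroring what Check Connection does in the IDE."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=5) as resp:
        return extract_model_ids(json.load(resp))

if __name__ == "__main__":
    try:
        # Example URL from the setup steps; adjust to your server.
        print(check_connection("http://localhost:1234/v1"))
    except (urllib.error.URLError, OSError) as exc:
        print(f"Server not reachable: {exc}")
```

If this prints a non-empty list, Chucki's connection check against the same URL should succeed as well.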
Supported Model Formats
- OpenAI-compatible APIs: Any model server compatible with OpenAI’s API format (e.g., Ollama, LM Studio, vLLM, Text Generation WebUI)
- LLaMA models: Via Ollama or similar wrappers
- Mistral: Via Ollama or compatible servers
- Open-source quantized models: GGUF and other formats supported by compatible servers
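Because all of these servers speak the OpenAI request format, a single client sketch works against any of them. A minimal illustration (the endpoint URL and model name are placeholders, not anything Chucki-specific):

```python
import json
import urllib.request

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Minimal OpenAI-format chat payload understood by Ollama,
    LM Studio, vLLM, and similar servers."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def ask(base_url: str, payload: dict) -> str:
    """POST the payload to /chat/completions and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]

# Usage (requires a running local server):
#   ask("http://localhost:1234/v1", build_chat_request("llama3", "Explain GGUF"))
```

This interchangeability is the practical benefit of the OpenAI-compatible format: switching between Ollama, LM Studio, or vLLM only means changing the base URL and model name.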
Recommended Tools
Popular options for running local models:
- Ollama (https://ollama.ai) — Simple setup, model management, OpenAI-compatible API
- LM Studio — GUI-based model management with built-in server
- vLLM — High-performance serving for larger models
- Text Generation WebUI — Feature-rich interface with many model options
Model Performance Considerations
- Hardware Requirements: GPU recommended for reasonable performance (NVIDIA, AMD, or Apple Silicon)
- Model Size: Smaller models (7-13B parameters) typically offer the best balance of speed and quality for development tasks
- Latency: Local models may respond more slowly than cloud APIs but are not subject to rate limits
- Memory: Ensure sufficient RAM/VRAM for your chosen model
- Context Window for Agent Chat: Your local model should support a context size of at least ~14K tokens. Larger context windows (32K or more) are recommended for handling longer conversations and larger code blocks.
- Context Window for Inline Completions: A smaller context window is acceptable for inline completions, but larger context windows are recommended to generate better suggestions based on more code context.
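The RAM/VRAM point can be sanity-checked with simple arithmetic: the memory needed for model weights is roughly parameter count times bits per parameter. A back-of-envelope helper (quantization level and runtime overhead vary, so treat the result as a lower bound):

```python
def estimate_weight_memory_gb(params_billion: float, bits_per_param: int = 4) -> float:
    """Rough memory footprint of model weights alone, in decimal GB.
    Excludes the KV cache and runtime overhead, which add more on top."""
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1e9

# A 7B model quantized to 4 bits needs about 3.5 GB just for weights;
# the same model at 16-bit precision needs about 14 GB.
print(estimate_weight_memory_gb(7, 4))   # -> 3.5
print(estimate_weight_memory_gb(7, 16))  # -> 14.0
```

This is why the 7-13B range fits comfortably on consumer GPUs when quantized, while larger models quickly exceed typical VRAM budgets.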
Enterprise Deployment
For enterprise deployments and fine-tuning services:
- Model Customization: Fine-tune models on your codebase and domain
- Infrastructure Support: Assistance with setup and optimization for your environment
- Professional Services: Custom integrations and deployment consulting
- Bulk Licensing: Site licenses with local model support across multiple machines
Contact our enterprise team for specialized deployment scenarios.
Related Features
- Agent Chat — Learn about agent-powered chat
- Ghost Text — Inline completions with local models
- CodeInsight — CodeInsight with local models
- Settings — Configure models and completions