Local AI Models
Chucki supports running AI models locally on your own infrastructure, whether for privacy, offline usage, or custom model deployments. Local models work for both Agent Chat and Inline Completions across all license tiers.
Overview
Local AI models run directly within your development environment — on your machine or network infrastructure. This gives you:
- Maximum Privacy: No code or conversations leave your machine
- Offline Capability: Work without internet connectivity
- Full Control: Complete autonomy over data and model execution
- Cost Optimization: No per-token API costs for local models
- Customization: Fine-tuning and adaptation to your workflows
Use Cases
Local models are valuable for:
- Privacy-First Development: Sensitive projects requiring on-device AI
- Air-Gapped Environments: Isolated networks without external connectivity
- Cost-Sensitive Teams: Unlimited local model usage without API fees
- Compliance Requirements: Data residency and regulation requirements (HIPAA, GDPR, etc.)
- Offline Development: Work without internet or with unreliable connections
- Custom Models: Deploy fine-tuned models tailored to your codebase
Where Local Models Work
Local models are fully supported in Chucki:
- Agent Chat: Use local models for all chat-based interactions
- Inline Completions: Get inline suggestions with local models (Ghost Text or CodeInsight)
All license tiers can add local models as an upgrade.
Configuration
Setup
Local models are configured in Tools -> Options -> Chucki:

- Navigate to Chat settings
- Under Local Models, click Add
- Enter your local model server URL (e.g., http://localhost:1234/v1)
- Click Check Connection to verify and load available models
- Select a model from the Name dropdown
- Save the configuration
Context Window: Make sure your local chat model supports a minimum context window of approximately 14K tokens. Larger context windows (32K or more) are recommended for better handling of extended conversations and larger code snippets.
For inline completions, similar settings are available under Completions. While a smaller context window is acceptable for inline completions, a larger one will help generate better suggestions by considering more surrounding code.
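The Check Connection step corresponds to querying the server's /v1/models endpoint, which every OpenAI-compatible server exposes. As a rough sketch (the URL is the example from the steps above, and the response parsing assumes the standard OpenAI response shape), the same check can be done outside the IDE:

```python
import json
import urllib.error
import urllib.request

def extract_model_ids(payload: dict) -> list:
    """Pull model IDs out of an OpenAI-format /v1/models response."""
    return [m["id"] for m in payload.get("data", [])]

def check_connection(base_url: str) -> list:
    """Fetch the model list from an OpenAI-compatible server,
    mirroring what Check Connection does in the IDE."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=5) as resp:
        return extract_model_ids(json.load(resp))

if __name__ == "__main__":
    try:
        # Example URL from the setup steps; adjust to your server.
        print(check_connection("http://localhost:1234/v1"))
    except (urllib.error.URLError, OSError) as exc:
        print(f"Server not reachable: {exc}")
```

If this prints a non-empty list, Chucki's connection check against the same URL should succeed as well.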
Supported Model Formats
- OpenAI-compatible APIs: Any model server compatible with OpenAI’s API format (e.g., Ollama, LM Studio, vLLM, Text Generation WebUI)
- LLaMA models: Via Ollama or similar wrappers
- Mistral: Via Ollama or compatible servers
- Open-source quantized models: GGUF and other formats supported by compatible servers
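Because all of these servers speak the OpenAI request format, a single client sketch works against any of them. A minimal illustration (the endpoint URL and model name are placeholders, not anything Chucki-specific):

```python
import json
import urllib.request

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Minimal OpenAI-format chat payload understood by Ollama,
    LM Studio, vLLM, and similar servers."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def ask(base_url: str, payload: dict) -> str:
    """POST the payload to /chat/completions and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]

# Usage (requires a running local server):
#   ask("http://localhost:1234/v1", build_chat_request("llama3", "Explain GGUF"))
```

This interchangeability is the practical benefit of the OpenAI-compatible format: switching between Ollama, LM Studio, or vLLM only means changing the base URL and model name.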
Recommended Tools
Popular options for running local models:
- Ollama (https://ollama.ai) — Simple setup, model management, OpenAI-compatible API
- LM Studio — GUI-based model management with built-in server
- vLLM — High-performance serving for larger models
- Text Generation WebUI — Feature-rich interface with many model options
Model Performance Considerations
- Hardware Requirements: GPU recommended for reasonable performance (NVIDIA, AMD, or Apple Silicon)
- Model Size: Smaller models (7-13B parameters) typically offer the best balance of speed and quality for development tasks
- Latency: Local models may respond more slowly than cloud APIs but are not subject to rate limits
- Memory: Ensure sufficient RAM/VRAM for your chosen model
- Context Window for Agent Chat: Your local model should support a context size of at least ~14K tokens. Larger context windows (32K or more) are recommended for handling longer conversations and larger code blocks.
- Context Window for Inline Completions: A smaller context window is acceptable for inline completions, but larger context windows are recommended to generate better suggestions based on more code context.
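The RAM/VRAM point can be sanity-checked with simple arithmetic: the memory needed for model weights is roughly parameter count times bits per parameter. A back-of-envelope helper (quantization level and runtime overhead vary, so treat the result as a lower bound):

```python
def estimate_weight_memory_gb(params_billion: float, bits_per_param: int = 4) -> float:
    """Rough memory footprint of model weights alone, in decimal GB.
    Excludes the KV cache and runtime overhead, which add more on top."""
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1e9

# A 7B model quantized to 4 bits needs about 3.5 GB just for weights;
# the same model at 16-bit precision needs about 14 GB.
print(estimate_weight_memory_gb(7, 4))   # -> 3.5
print(estimate_weight_memory_gb(7, 16))  # -> 14.0
```

This is why the 7-13B range fits comfortably on consumer GPUs when quantized, while larger models quickly exceed typical VRAM budgets.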
Enterprise Deployment
For enterprise deployments and fine-tuning services:
- Model Customization: Fine-tune models on your codebase and domain
- Infrastructure Support: Assistance with setup and optimization for your environment
- Professional Services: Custom integrations and deployment consulting
- Bulk Licensing: Site licenses with local model support across multiple machines
Contact our enterprise team for specialized deployment scenarios.
Related Features
- Agent Chat — Learn about agent-powered chat
- Ghost Text — Inline completions with local models
- CodeInsight — CodeInsight with local models
- Settings — Configure models and completions