Section 1
The Problem with Cloud-First AI
The artificial intelligence industry has converged on a cloud-delivery model as its primary distribution mechanism. Users access language models through web interfaces, mobile applications, and APIs — all of which route every prompt and every response through remote servers operated by a small number of large corporations. This architecture introduces several structural problems that disproportionately affect individual users, small organizations, and people in regions with different regulatory environments.
Every time you type a thought into a cloud AI assistant, you are not just using a service — you are contributing training data, revealing behavioral patterns, and accepting terms that can change without your consent.
Subscription lock-in is the first and most visible problem. Cloud AI access is priced by token, by month, or by API call. The cost structure is designed to make sustained, professional use dependent on continuous payment. Models that were free in beta become paid; pricing tiers change; capabilities are gated behind premium plans. A professional who integrates cloud AI into their daily workflow becomes economically dependent on a vendor whose incentives may not align with the user's continuity of access.
Data exposure is the second and more serious problem. When a prompt is sent to a cloud API, it leaves the user's machine and traverses the internet to a remote server. Even with encryption in transit, the receiving server processes the plaintext prompt. Terms of service for many AI providers explicitly reserve the right to use interactions for model improvement, safety monitoring, and abuse detection. For users handling confidential information — medical records, legal documents, business strategy, personal correspondence — this creates an unacceptable disclosure risk that may also carry regulatory consequences (GDPR, HIPAA, attorney-client privilege).
Censorship and policy enforcement is the third problem. Cloud models are subject to the content policies of the companies that host them. These policies evolve over time, vary by region, and can create unexpected blocks for legitimate professional or creative use. A researcher studying extremist rhetoric, a security professional testing prompt injections, a novelist exploring dark themes, or simply a user asking a politically charged question may encounter refusals that have nothing to do with the user's actual intent. When the model runs locally, the user and their system prompt define the behavioral envelope.
Infrastructure dependency is the fourth problem. Cloud AI services go down, introduce regressions in new model versions, or deprecate models entirely. A workflow built around GPT-4 may break when the model is replaced with a successor that behaves differently. Local models, once downloaded, are stable: the same model weights produce the same outputs indefinitely, independent of network connectivity or vendor decisions.
Section 2
Open-Source Models as the Foundational Layer
MIKA5 is built on the premise that open-source language models are now capable enough to handle the majority of real-world AI tasks, and that this capability will only increase over time. The benchmarks support this position: models like Qwen2.5 72B, DeepSeek-R1 70B, and Llama 3.3 70B perform comparably to GPT-4 on most standardized evaluations, and in specific domains — reasoning under constrained compute, multilingual understanding, and code generation — often surpass it.
Open-source AI is not a compromise. It is a choice with distinct advantages beyond privacy. Open models can be inspected — their weights are public and their architecture is documented. They can be fine-tuned on domain-specific data without vendor approval. They can be run at any quantization level to trade quality for speed. They can be deployed in air-gapped environments with zero external dependencies. And perhaps most importantly, they cannot be suddenly discontinued, repriced, or fundamentally altered by a commercial decision made in a boardroom that the user has no access to.
The open-source model ecosystem is built on a collaborative infrastructure that no single company controls. Meta releases Llama under a license permitting commercial use. Alibaba releases Qwen with multilingual breadth that outpaces Western models. DeepSeek's reinforcement-learning approach to reasoning achieves frontier performance with open weights. Mistral contributes European research and strong European-language capabilities. The diversity of this ecosystem provides resilience that no single cloud provider can match.
MIKA5 treats open-source models not as a fallback when cloud access fails, but as the primary interface between human thought and machine capability.
Ollama serves as the runtime layer. It provides a unified API across all GGUF-format models, handles quantization, manages GPU/CPU memory allocation, and exposes a clean HTTP interface that MIKA5 communicates with over localhost. This means the entire compute stack — from user input to AI response — operates within a single machine, with no data leaving the network boundary. Ollama's design also makes it trivial to add new models: pulling a new model is a single command, and it becomes immediately available to MIKA5 without any application code change.
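As a sketch of this localhost round-trip, the request a provider might send to Ollama can look like the following. The endpoint and payload shape follow Ollama's documented /api/chat API; the helper name is illustrative, not MIKA5's actual code.

```javascript
// Build a chat request against the local Ollama server. Nothing here leaves
// the machine: the host is hard-coded to the loopback interface.
function buildOllamaChatRequest(model, messages) {
  return {
    url: "http://127.0.0.1:11434/api/chat",
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, messages, stream: false }),
    },
  };
}

// Usage (assumes an Ollama server is running and the model has been pulled):
// const { url, options } = buildOllamaChatRequest("qwen2.5:7b",
//   [{ role: "user", content: "Summarize this paragraph." }]);
// const reply = await fetch(url, options).then((r) => r.json());
```

Because pulling a model makes it immediately visible through this same API, no application code changes are needed to support a newly installed model.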
Section 3
System Architecture
MIKA5 is a cross-platform Electron desktop application that supports Windows, macOS, and Linux from a single codebase and follows a strict two-process model. Platform-specific installers are produced by electron-builder: NSIS for Windows, DMG for macOS (universal builds for Apple Silicon and Intel), and AppImage for Linux. The main process runs in a Node.js environment with full system access and handles all sensitive operations: database read/write, file system access, AI provider communication, encryption, and RAG indexing. The renderer process runs in a sandboxed Chromium environment and handles only the user interface. All communication between the processes occurs through Electron's IPC (Inter-Process Communication) channels via a tightly scoped contextBridge API.
This architectural separation is a security boundary, not merely a technical convention. The renderer process cannot directly access the file system, cannot make network requests to external services, and cannot access the database. It can only invoke the specific IPC handlers that the main process has registered, each of which validates inputs before processing. This means that even if a malicious actor achieved code execution in the renderer (for example, through a cross-site scripting injection in rendered Markdown), they would have no path to the sensitive data or network capabilities available to the main process.
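The validate-then-handle pattern at this boundary can be illustrated with a minimal registry sketch. The names here (makeRegistry, register, invoke) are hypothetical; in the real main process this logic would wrap Electron's ipcMain.handle registrations.

```javascript
// Minimal sketch: every channel gets a validator that runs before its handler,
// and unknown channels are rejected outright.
function makeRegistry() {
  const handlers = new Map();
  return {
    register(channel, validate, handler) {
      handlers.set(channel, async (...args) => {
        if (!validate(...args)) throw new Error(`Invalid input on ${channel}`);
        return handler(...args);
      });
    },
    async invoke(channel, ...args) {
      const entry = handlers.get(channel);
      if (!entry) throw new Error(`No handler registered for ${channel}`);
      return entry(...args);
    },
  };
}
```

A compromised renderer can only reach channels that were explicitly registered, and even those refuse malformed input before any database or filesystem code runs.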
Architecture Diagram
# User Interface Layer (Renderer — sandboxed)
index.html → app.js → window.mika5.* (contextBridge)
# IPC Boundary (Electron contextBridge + ipcRenderer)
↕ db:*, ai:*, rag:*, appState:*, app:*
# Core Services Layer (Main Process — Node.js)
main.js → db.js (SQLite/better-sqlite3)
→ ai-providers.js (Ollama + Cloud APIs)
→ rag.js (chunking + embeddings + cosine search)
→ ollama.js (HTTP client → localhost:11434)
# Local Runtime
Ollama server → GGUF model weights (GPU/CPU)
SQLite DB → mika5.db (AppData/Roaming/MIKA5)
The database layer uses better-sqlite3, a synchronous SQLite binding that provides ACID guarantees and WAL (Write-Ahead Logging) mode for concurrent read performance. All database operations are synchronous, which eliminates an entire class of race conditions that arise from asynchronous database access. SQLite was chosen over alternatives like IndexedDB (browser-native but no true persistence guarantees) or a networked database (would require a server process and break the offline-first guarantee) because it is a mature, battle-tested, embedded database with a zero-configuration deployment story.
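The connection setup this implies can be sketched as follows. The pragma calls use better-sqlite3's real API, but the Database constructor is injected as a parameter purely so the pattern is visible without the native dependency; the actual db.js may differ.

```javascript
// Open the database with the settings described above: WAL for concurrent
// reads during writes, and foreign-key enforcement at the connection level.
function openDatabase(Database, path) {
  const db = new Database(path);
  db.pragma("journal_mode = WAL");
  db.pragma("foreign_keys = ON");
  return db;
}

// Real use:
// const Database = require("better-sqlite3");
// const db = openDatabase(Database, "mika5.db");
```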
Section 4
Privacy Model
MIKA5's privacy model operates on three principles: local by default, explicit cloud disclosure, and encrypted credential storage.
Local by default means that in its default configuration, MIKA5 makes zero outbound network requests to external servers. All AI inference occurs on the local Ollama instance. All conversation history is stored in a local SQLite database. All document knowledge is indexed and searched locally. There is no telemetry, no analytics, no crash reporting, and no background synchronization. The application does not "phone home" in any form unless the user explicitly enables a cloud provider and initiates a conversation that routes through it.
Explicit cloud disclosure means that when a user activates a cloud provider (OpenAI, Anthropic, Groq, or Moonshot), the UI visually changes to reflect this: the privacy indicator changes from a green "Private" badge to an orange "Cloud" badge; a warning banner appears above the input area; and the toolbar text shows the active provider name. This disclosure is persistent and cannot be accidentally missed. The design philosophy is that cloud mode should feel explicitly opted into rather than invisible — users should always know when their prompts are leaving their machine.
Encrypted credential storage means that API keys are never stored in plaintext. When a user saves an API key, it is encrypted using AES-256-GCM before being written to the SQLite database. The encryption key is derived using two layers of protection: a per-installation salt file (stored in the application's userData directory) and Electron's safeStorage API, which ties decryption to the operating system's credential store (DPAPI on Windows, Keychain on macOS). This means that even if an attacker extracts the database file, they cannot decrypt the API keys without also having access to the specific machine's OS credential store. On systems where safeStorage is unavailable, the system falls back to a salt-based key derivation that still ensures the API keys are not human-readable in the database file.
The cloud is not prohibited — it is disclosed. MIKA5 users who want access to frontier models can have it, with full transparency about when and how their data is being transmitted.
MIKA5 also includes a basic PII (Personally Identifiable Information) sanitization step for cloud requests. When cloud mode is active, email addresses, phone numbers, and long numeric sequences (potential ID numbers) in prompts are replaced with placeholder tokens ([EMAIL], [PHONE], [ID]) before transmission. This is a defense-in-depth measure and does not replace proper data handling policies, but it reduces the risk of accidental disclosure of common PII patterns when cloud providers are used.
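A minimal version of this pass might look like the following. The patterns are deliberately loose illustrations (the phone pattern, for instance, only matches international-format numbers), not MIKA5's exact regexes.

```javascript
// Replace common PII patterns with placeholder tokens before a prompt is
// handed to a cloud provider. Order matters: emails first, then phone
// numbers, then bare numeric runs.
function sanitizePII(text) {
  return text
    .replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, "[EMAIL]")
    .replace(/\+\d[\d\s().-]{7,}\d/g, "[PHONE]")
    .replace(/\b\d{6,}\b/g, "[ID]");
}

// sanitizePII("Mail a@b.co or call +49 170 1234567, ref 12345678")
//   → "Mail [EMAIL] or call [PHONE], ref [ID]"
```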
Section 5
The RAG Engine — Knowledge-Grounded Inference
Retrieval-Augmented Generation (RAG) addresses one of the most significant limitations of language models: they are trained on static datasets with a knowledge cutoff, and they have no access to information created after that cutoff or to documents that were not in their training data. For professional use — analyzing your own documents, referencing your company's internal knowledge, working with recent research — a model without RAG is fundamentally constrained.
MIKA5's RAG engine operates entirely locally. When a user uploads a document to the Knowledge Base, the following pipeline executes in the background:
1. Chunking. The document text is split into overlapping chunks of 4,200 characters with a 700-character overlap. The overlap ensures that concepts that span chunk boundaries are not lost — a sentence that starts near the end of one chunk will also appear near the beginning of the next. This chunking strategy was chosen based on empirical performance across diverse document types and is configurable in the source code via the CHUNK_CHARS and OVERLAP_CHARS constants.
2. Embedding. Each chunk is passed to Ollama's embedding endpoint using the nomic-embed-text model (or another installed embedding model), which converts the text into a 768-dimensional floating-point vector. This vector captures the semantic content of the chunk — two chunks with similar meaning will have similar vectors, even if they use different words.
3. Storage. The chunk text and its embedding vector are stored in the knowledge_chunks table in SQLite. The vector is stored as a binary blob (serialized Float32 array). A foreign key links each chunk to its parent knowledge_items row, and an optional chat_id column scopes chunks to specific chat conversations for per-chat knowledge isolation.
4. Query. When the user sends a message, the query text is embedded using the same embedding model. The resulting query vector is compared against all stored chunk vectors using cosine similarity (computed as a dot product of unit-normalized vectors). The top-k (default: 3) most similar chunks are retrieved and injected into the conversation as a system message before the AI model is called. This grounds the model's response in the actual content of the user's documents rather than its training data.
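The chunking step above (step 1) can be sketched directly from the constants the text names. This is an illustrative reading of the strategy, not the exact source.

```javascript
const CHUNK_CHARS = 4200;
const OVERLAP_CHARS = 700;

// Split text into overlapping windows: each chunk starts 3,500 characters
// after the previous one, so 700 characters are shared across each boundary.
function chunkText(text, size = CHUNK_CHARS, overlap = OVERLAP_CHARS) {
  const chunks = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // final chunk reached the end
  }
  return chunks;
}
```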
RAG Pipeline
User uploads document
→ chunkText(doc, 4200 chars, 700 overlap)
→ ollama.embed(chunk) → Float32[768]
→ SQLite: knowledge_chunks (text, embedding, chat_id)
User sends message
→ ollama.embed(query) → Float32[768]
→ cosineSimilarity(query, all chunks) → top-3
→ system message: "Context:\n{chunk1}\n{chunk2}\n{chunk3}"
→ AI model receives grounded context → better answer
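The scoring step can be sketched as follows. Per the text, vectors are unit-normalized so cosine similarity reduces to a dot product; the helper names (normalize, topK) are illustrative.

```javascript
// Unit-normalize a vector once (at indexing time for chunks, at query time
// for the query), so similarity becomes a plain dot product.
function normalize(vec) {
  const norm = Math.hypot(...vec);
  return Float32Array.from(vec, (v) => v / norm);
}

// Rank stored chunks by cosine similarity to the query and keep the top k.
function topK(queryVec, chunks, k = 3) {
  const q = normalize(queryVec);
  return chunks
    .map((c) => ({
      text: c.text,
      score: c.embedding.reduce((sum, v, i) => sum + v * q[i], 0),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```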
The RAG engine maintains an in-memory cache of embedding vectors per project, with a 5-minute TTL. This means that after the first query in a project, subsequent queries do not need to load embeddings from SQLite — they use the cached vectors for cosine similarity computation. This provides a significant performance improvement for interactive use while ensuring that newly added documents are incorporated after the cache expires or on the next app start.
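A sketch of such a cache, using the five-minute TTL from the text. The function name and cache shape are assumptions; "now" is a parameter only so the expiry behavior is easy to exercise.

```javascript
const TTL_MS = 5 * 60 * 1000; // five minutes, as described above
const embeddingCache = new Map(); // projectId → { vectors, expires }

// Return cached vectors when fresh; otherwise reload from the database
// via the supplied loader and refresh the entry's expiry.
function getProjectVectors(projectId, loadFromDb, now = Date.now()) {
  const entry = embeddingCache.get(projectId);
  if (entry && entry.expires > now) return entry.vectors;
  const vectors = loadFromDb(projectId);
  embeddingCache.set(projectId, { vectors, expires: now + TTL_MS });
  return vectors;
}
```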
Knowledge isolation is a first-class feature. Documents uploaded within a specific chat conversation are tagged with that conversation's ID and are only retrieved during queries in that same conversation. Project-level documents are retrieved for all chats within that project. Quick Chat has its own isolated knowledge base. This design allows users to build specialized knowledge contexts for different workflows without cross-contamination.
Section 6
Provider Architecture — Extensible AI Backends
MIKA5's AI provider system is designed around a uniform interface that abstracts over fundamentally different backends: a local Ollama server, multiple cloud API providers, and a llama.cpp server. Each provider implements three methods: chat(), listModels(), and status(). The renderer only ever calls window.mika5.ai.chat() — it is agnostic to whether the underlying call goes to Ollama, OpenAI, or any other provider.
This architecture has two important properties. First, it is easy to add new providers without changing any renderer code — only a new entry in the PROVIDERS object in ai-providers.js is required. Second, it ensures that privacy-sensitive routing decisions happen entirely in the main process, where they can be audited and controlled without the user interface having any influence over them.
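The shape of that dispatch can be sketched as follows. The entries shown are stand-ins; the real PROVIDERS object in ai-providers.js implements full HTTP clients behind the same three-method interface.

```javascript
// Every backend implements the same three methods, so the rest of the main
// process can dispatch without knowing which backend is active.
const PROVIDERS = {
  ollama: {
    chat: async (messages, model) => ({ provider: "ollama", model, messages }),
    listModels: async () => ["qwen2.5:7b"],
    status: async () => ({ ok: true, local: true }),
  },
  // openai: { chat, listModels, status }, anthropic: { ... }, and so on.
};

async function dispatchChat(providerId, messages, model) {
  const provider = PROVIDERS[providerId];
  if (!provider) throw new Error(`Unknown provider: ${providerId}`);
  return provider.chat(messages, model); // the renderer never sees this dispatch
}
```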
Current providers:
Ollama (local) — The primary provider. Routes all requests to the local Ollama HTTP server on port 11434. Model capabilities (including vision support) are probed via the /api/show endpoint, which returns model metadata including architecture information. A name-based fallback handles models that don't respond to the show endpoint.
OpenAI — Routes to the OpenAI REST API. Filters the model list to GPT-family chat models. Includes a 2-minute timeout for long-form generation. Vision capabilities are supported for GPT-4o and GPT-4-turbo models.
Anthropic — Routes to the Anthropic Messages API. Because Anthropic's API requires a single top-level system field rather than multiple system messages in the conversation array, the provider merges all system messages (project context, RAG context) into a single concatenated string before transmission. It also enforces Anthropic's strict user/assistant alternation requirement by merging consecutive same-role messages.
Groq — Routes to Groq's OpenAI-compatible API endpoint. Groq provides hardware-accelerated inference that delivers dramatically higher token throughput than most alternatives. The model list is fetched dynamically from Groq's /models endpoint so that newly available models are automatically shown without an app update.
Moonshot (Kimi) — Routes to the Moonshot AI API. Kimi's models feature extremely long context windows (up to 256k tokens), making them suitable for whole-document analysis tasks that exceed the context limits of other providers.
llama.cpp — Routes to a locally-running llama.cpp server, which exposes an OpenAI-compatible API on port 8080. This provider enables users who prefer llama.cpp's GGUF runtime over Ollama to use MIKA5 without Ollama installed. The provider auto-detects whether the server is running via a 3-second probe at startup.
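The Anthropic-specific normalization described above is a good illustration of what a provider adapter does. A sketch, with an illustrative function name:

```javascript
// Convert a mixed message list into Anthropic's expected shape: one top-level
// system string, and strictly alternating user/assistant messages.
function toAnthropicFormat(messages) {
  const system = messages
    .filter((m) => m.role === "system")
    .map((m) => m.content)
    .join("\n\n");
  const merged = [];
  for (const m of messages) {
    if (m.role === "system") continue;
    const last = merged[merged.length - 1];
    if (last && last.role === m.role) {
      last.content += "\n\n" + m.content; // merge consecutive same-role messages
    } else {
      merged.push({ role: m.role, content: m.content });
    }
  }
  return { system, messages: merged };
}
```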
Section 7
Data Model — SQLite Schema
All user data is stored in a single SQLite database file located at %APPDATA%\MIKA5\mika5.db on Windows (%APPDATA% expands to the user's AppData\Roaming directory). The schema is organized around five core tables:
projects — The top-level organizational unit. Each project has a name, description, optional icon and color for visual organization, and a system_prompt field that defines the AI's behavior within that project. A special project with ID quick-chat serves as the default project for conversations that don't belong to a user-defined project.
chats — Individual conversation threads, each belonging to a project. Chats have a title (auto-generated from the first user message), a system_prompt for per-chat instruction overrides, and a creation timestamp. Foreign key constraints with ON DELETE CASCADE ensure that deleting a project automatically deletes all its chats.
messages — Individual messages within chats. Each message stores the role (user/assistant/system), content text, the model that generated it, optional base64-encoded images, and an optional reply_to_id for threaded context. The model field enables model tracking — the UI shows which model answered each message and highlights model changes within a conversation.
knowledge_items — Metadata records for uploaded documents. Stores the title, MIME type, raw content, and an optional chat_id for per-chat scoping. The index_status field tracks whether the document has been embedded (pending → indexing → ready). Documents that fail to index (e.g., because no embedding model is available) enter an error state and can be retried.
knowledge_chunks — The core of the RAG engine. Each row stores a text chunk and its binary embedding vector. Chunks are linked to their parent knowledge_item and optionally to a specific chat. When a project or chat is deleted, cascading deletes automatically remove all associated chunks.
The database runs in WAL (Write-Ahead Log) mode, which allows concurrent reads during writes — important for the background indexing pipeline that runs alongside interactive chat. Foreign key enforcement is enabled at the connection level, preventing orphaned records. All writes are wrapped in transactions to maintain consistency in the event of unexpected process termination.
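A condensed sketch of this schema as DDL. Column lists are abbreviated and some names are assumed; the real schema in db.js carries more fields.

```javascript
// The DDL db.js might execute on first run. ON DELETE CASCADE is what makes
// deleting a project remove its chats, messages, and chunks automatically.
const SCHEMA_SQL = `
CREATE TABLE IF NOT EXISTS projects (
  id TEXT PRIMARY KEY, name TEXT NOT NULL, system_prompt TEXT
);
CREATE TABLE IF NOT EXISTS chats (
  id TEXT PRIMARY KEY, project_id TEXT NOT NULL, title TEXT, system_prompt TEXT,
  FOREIGN KEY (project_id) REFERENCES projects(id) ON DELETE CASCADE
);
CREATE TABLE IF NOT EXISTS messages (
  id TEXT PRIMARY KEY, chat_id TEXT NOT NULL, role TEXT, content TEXT, model TEXT,
  FOREIGN KEY (chat_id) REFERENCES chats(id) ON DELETE CASCADE
);
CREATE TABLE IF NOT EXISTS knowledge_items (
  id TEXT PRIMARY KEY, title TEXT, mime TEXT, content TEXT,
  chat_id TEXT, index_status TEXT DEFAULT 'pending'
);
CREATE TABLE IF NOT EXISTS knowledge_chunks (
  id INTEGER PRIMARY KEY, item_id TEXT NOT NULL, chat_id TEXT,
  text TEXT, embedding BLOB,
  FOREIGN KEY (item_id) REFERENCES knowledge_items(id) ON DELETE CASCADE
);
`;
```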
Section 8
Security Design
Security in MIKA5 operates across four domains: process isolation, input validation, credential protection, and network boundary enforcement.
Process isolation is provided by Electron's two-process model with contextIsolation: true and nodeIntegration: false. The renderer process runs in a browser-equivalent security context with no Node.js access. The contextBridge API is the only communication channel between processes and exposes only the specific operations needed by the UI.
Input validation is applied at the IPC layer before any database or filesystem operation. Project IDs, including those attached to knowledge items, are validated against a UUID regex pattern; any ID containing path traversal characters (a malicious project ID like ../../etc/passwd, for example) is rejected with an INVALID_NAME error code before it can reach a file operation. The app_state table uses an allowlist of permitted keys, preventing arbitrary data injection through the state API.
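The ID check can be sketched as follows. The regex matches the canonical UUID shape, the quick-chat exception is inferred from the schema section's special project, and the error-code name comes from the text; the function name is illustrative.

```javascript
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Validate a project ID at the IPC boundary, before it can reach any file
// or database operation. Traversal strings like "../../etc/passwd" fail here.
function assertValidProjectId(id) {
  if (id === "quick-chat") return id; // the one non-UUID project ID in the schema
  if (typeof id !== "string" || !UUID_RE.test(id)) {
    const err = new Error("Invalid project ID");
    err.code = "INVALID_NAME";
    throw err;
  }
  return id;
}
```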
Credential protection uses a two-layer scheme. The outer layer is Electron's safeStorage API, which on Windows wraps DPAPI (Data Protection API) — a system service that ties encryption to the user's Windows login credentials. Even if an attacker extracts the database file from disk, decryption requires the same OS user session. The inner layer is AES-256-GCM encryption with a per-installation salt, providing a fallback on systems where safeStorage is unavailable and an additional layer of protection on systems where it is available. GCM's authentication tag ensures that tampered or corrupted ciphertext fails loudly at decryption instead of being silently decrypted into garbage.
Network boundary enforcement relies on the main process as the sole actor with network access. The renderer cannot make fetch requests to external hosts — all such calls go through IPC to the main process, which enforces provider-specific routing and timeout policies. Outbound URL handling is secured through setWindowOpenHandler, which intercepts all window.open and link-click events in the renderer and routes external URLs through the OS's default browser rather than opening new Electron windows with web content.
Markdown rendering uses marked.js with sanitization to prevent cross-site scripting via AI-generated responses containing HTML. Code blocks are syntax-highlighted using highlight.js, which escapes the code content as it emits highlight markup rather than injecting raw HTML. User-supplied HTML in knowledge documents is not rendered — only the extracted text is used for embedding and retrieval.
Section 9
llama.cpp Integration — A Second Local Runtime
While Ollama provides the primary local inference runtime for MIKA5, there is a meaningful segment of users who prefer llama.cpp directly — power users who want fine-grained control over quantization parameters, hardware acceleration settings, batch sizes, and model loading behavior that Ollama's managed interface abstracts away. llama.cpp also has slightly lower overhead for users who run a single model permanently rather than switching between models.
llama.cpp's server mode exposes an OpenAI-compatible HTTP API on http://127.0.0.1:8080/v1, implementing the standard /v1/chat/completions and /v1/models endpoints. This compatibility means MIKA5's llama.cpp provider can use the same request format as the OpenAI provider, without any model-specific adaptation code.
Technical viability assessment: The integration is technically straightforward. MIKA5's provider abstraction is designed exactly for this — adding llama.cpp requires only a new entry in the PROVIDERS dictionary with appropriate chat(), listModels(), and status() implementations. The llama.cpp provider has been implemented in v0.1.0 and is accessible by selecting "llama.cpp (Local)" in the provider dropdown.
To use llama.cpp with MIKA5:
# Download and build llama.cpp
$ git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp
$ make # or cmake for GPU support
# Download a GGUF model (example: Qwen2.5 7B Q4)
$ wget https://huggingface.co/Qwen/Qwen2.5-7B-GGUF/resolve/main/qwen2.5-7b-instruct-q4_k_m.gguf
# Start the server (GPU with CUDA)
$ ./llama-server -m qwen2.5-7b-instruct-q4_k_m.gguf --port 8080 -ngl 99
# In MIKA5: select "llama.cpp (Local)" from provider dropdown
✓ MIKA5 auto-detects the running server and lists available models
Key differences from the Ollama provider: llama.cpp does not manage model downloads — the user is responsible for obtaining GGUF files and specifying the correct path when starting the server. Vision models require a special multimodal llama.cpp server build with CLIP support. The -ngl flag controls GPU layer offloading — setting it to 99 offloads all layers to GPU, while setting it to 0 forces CPU-only inference. These tradeoffs give power users full control over the inference stack that Ollama's managed interface does not expose.
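The startup probe can be sketched as follows. The endpoint is llama.cpp's OpenAI-compatible /v1/models; the fetch function is injectable here purely so the sketch can be exercised without a running server.

```javascript
// Probe the local llama.cpp server with a short timeout. Returns the model
// list on success and a quiet "not running" result on any failure.
async function probeLlamaCpp(fetchFn = fetch) {
  try {
    const res = await fetchFn("http://127.0.0.1:8080/v1/models", {
      signal: AbortSignal.timeout(3000), // the 3-second probe from the text
    });
    if (!res.ok) return { running: false, models: [] };
    const { data } = await res.json(); // OpenAI-style { data: [{ id: ... }] }
    return { running: true, models: data.map((m) => m.id) };
  } catch {
    return { running: false, models: [] };
  }
}
```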
Section 10
Roadmap
MIKA5 v0.1.0 represents a functional foundation, not a complete product. Cross-platform support (Windows, macOS, and Linux) has been completed and is available as of v0.1.0, with a CI/CD pipeline via GitHub Actions that automatically builds all three platform installers on every version tag push. The following capabilities are planned for future releases:
Streaming responses. Token-by-token streaming delivery — where tokens appear progressively as they are generated — provides a significantly better user experience for long responses and allows users to begin reading and interrupt generation early. Implementation requires switching the IPC architecture from a single invoke to a stream of send events.
Drag-and-drop file upload. Direct file ingestion into the knowledge base via drag-and-drop, eliminating the need to navigate file dialogs for document import workflows.
Agent / tool use. Structured output and function calling support would allow MIKA5 to integrate with user-defined tools — web search, calculator, file system access — creating an agent loop where the AI can take actions and observe results. This is architecturally feasible with the current main-process design but requires careful security design for the tool execution sandbox.
BM25 hybrid retrieval. The current RAG engine uses pure cosine similarity (dense retrieval). Hybrid retrieval combines dense vector search with sparse keyword search (BM25) to improve retrieval quality, particularly for named entities, technical terms, and cases where the user's query phrasing differs significantly from the document's phrasing. SQLite's FTS5 full-text search extension makes this implementable without adding a separate search library.
Model management UI. A built-in interface for browsing, downloading, and managing Ollama models without leaving the application. Currently, model management requires using the Ollama CLI.
Encrypted knowledge export/import. The ability to export a project's knowledge base (documents and embeddings) and import it on another machine, with optional encryption for sensitive knowledge bases.
Multi-model routing. Automatic routing of different types of queries to specialized models — for example, sending code questions to a code-specialized model and math questions to a reasoning model — based on query classification.
Cloud sync (optional). An opt-in synchronization layer for users who work across multiple machines, allowing encrypted replication of projects, conversations, and knowledge bases via a user-controlled cloud backend. Cloud sync will always be explicitly disclosed and never enabled by default.
Plugin / extension system. A sandboxed plugin architecture that allows third-party developers to extend MIKA5 with custom tools, providers, and UI panels without modifying core application code.
Mobile companion app. A lightweight mobile client for reviewing conversations, querying knowledge bases, and optionally routing inference to the user's desktop machine running MIKA5, maintaining the local-first principle even on mobile.
Section 11
Conclusion
The central thesis of MIKA5 is that AI should serve the individual, not collect from them. The current generation of open-source language models has crossed a threshold of capability that makes this thesis practically realizable — not as a compromise, but as a genuine choice that, for many users and many tasks, produces better outcomes than cloud alternatives.
By placing open-source models at the core, SQLite as the persistent memory layer, and a carefully scoped RAG engine as the knowledge interface, MIKA5 creates a self-contained AI workstation that is owned entirely by the person using it. The cloud remains accessible as an option — always disclosed, never hidden — for cases where frontier capability justifies the tradeoff.
Sovereignty over one's AI tools is not a luxury. It is the foundation on which professional work, creative exploration, and personal use of AI can be built without dependency on external actors whose incentives may not align with the user's interests.
MIKA5 is free, open-source, and designed to remain so. The model ecosystem it runs on is free and open-source. The database that stores your conversations is a standard file you can open, back up, and migrate without any proprietary tool. Every architectural decision has been made with the goal of giving the user maximum control and minimum dependency on external systems.
This is version 0.1.0. The foundation is in place. The work continues.
Technical References
Ollama — Local AI model runtime · ollama.com
Meta Llama 3 — Open-source language model · ai.meta.com/llama
Alibaba Qwen 2.5 — Multilingual language model series · qwen.readthedocs.io
DeepSeek-R1 — Reinforcement learning reasoning model · deepseek.com
llama.cpp — C++ GGUF inference engine · github.com/ggerganov/llama.cpp
Electron — Cross-platform desktop framework · electronjs.org
SQLite — Embedded relational database · sqlite.org
Nomic Embed Text — Open-source embedding model · nomic.ai
AES-256-GCM — NIST FIPS 197, SP 800-38D authenticated encryption
Electron safeStorage — Windows DPAPI, macOS Keychain wrapper · electronjs.org/docs/api/safe-storage