GitHub Repository: sagemathinc/cocalc
Path: blob/master/src/docs/llm.md

LLM / AI Integration

This document explains how CoCalc integrates large language models — provider routing, cost tracking, streaming, the Conat messaging bridge, and frontend components.

Overview

CoCalc supports multiple LLM providers through a unified architecture:

  • Server (packages/server/llm/): evaluation engine, provider routing via the Vercel AI SDK, abuse prevention, cost tracking

  • Conat bridge (packages/conat/llm/): request/response messaging with streaming between frontend and server

  • Frontend (packages/frontend/frame-editors/llm/, packages/frontend/client/llm.ts): model selector, inline assistant, cost estimation

  • Types & config (packages/util/db-schema/llm-utils.ts, packages/util/types/llm.ts): model definitions, pricing, validation

```
┌───────────────┐     ┌───────────────┐
│   Frontend    │     │   REST API    │
│   LLMClient   │     │  /api/v2/llm  │
└───────┬───────┘     └───────┬───────┘
        │ Conat multiresponse │ HTTP
        └──────────┬──────────┘
           ┌───────▼────────┐
           │   Server LLM   │
           │   evaluate()   │
           └───────┬────────┘
                   │ AI SDK
       ┌───────────┼───────────┐
       │           │           │
  ┌────▼───┐   ┌───▼───┐  ┌────▼────┐
  │ OpenAI │   │Google │  │Anthropic│ ...
  └────────┘   └───────┘  └─────────┘
```

Supported Providers

```ts
// packages/util/db-schema/llm-utils.ts
const SERVICES = [
  "openai",
  "google",
  "mistralai",
  "anthropic",
  "ollama",
  "custom_openai",
  "xai",
] as const;
```
| Provider | Model prefix | Examples |
| --- | --- | --- |
| OpenAI | `gpt-` | gpt-4o-8k, gpt-5.2-8k |
| Google | `gemini-` | gemini-2.5-flash-8k, gemini-3-flash-preview-16k |
| Anthropic | `claude-` | claude-4-6-sonnet-8k, claude-3-5-sonnet |
| Mistral | `mistral-` | mistral-large, mistral-small |
| xAI | `grok-` | grok-4-1-fast-non-reasoning-16k, grok-code-fast-1-16k |
| Ollama | `ollama-` | User-configured local models |
| Custom OpenAI | `custom_openai-` | User-configured endpoints |
| User-defined | `user-` | `user-{service}-{id}` |

Default priority when auto-selecting: Google -> OpenAI -> Anthropic -> Mistral -> xAI -> Ollama -> Custom OpenAI.
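The auto-selection described above can be sketched as a first-match scan over the priority list; `pickDefaultService` and the `enabled` set are illustrative names, not the actual CoCalc API:

```ts
// Hypothetical sketch of provider auto-selection using the priority
// order documented above; in practice `enabled` would be derived from
// which provider API keys are configured on the server.
const PRIORITY = [
  "google",
  "openai",
  "anthropic",
  "mistralai",
  "xai",
  "ollama",
  "custom_openai",
] as const;

function pickDefaultService(
  enabled: Set<string>,
): (typeof PRIORITY)[number] | undefined {
  // return the highest-priority service that is actually enabled
  return PRIORITY.find((s) => enabled.has(s));
}
```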

Server-Side Evaluation

Entry Point

packages/server/llm/index.ts — the main evaluate() function:

```ts
async function evaluate(opts: ChatOptions): Promise<string> {
  // 1. Validate model
  // 2. Check for abuse (rate limits)
  // 3. Route to provider:
  //    - User-defined → evaluateUserDefinedLLM()
  //    - All others (incl. Ollama) → evaluateWithAI()
  // 4. Save response to database
  // 5. Create purchase record (for non-free models)
}
```

AI SDK Unified Handler

packages/server/llm/evaluate.ts — routes all providers (OpenAI, Google, Anthropic, Mistral, xAI, Ollama, Custom OpenAI) through the Vercel AI SDK:

```ts
const PROVIDER_CONFIGS = {
  openai: { createModel: () => openai(model) },
  google: { createModel: () => google(model) },
  anthropic: { createModel: () => anthropic(model) },
  mistralai: { createModel: () => mistral(model) },
  xai: { createModel: () => xai(model) },
  ollama: { createModel: () => getOllamaModel(model) },
  // ...
};
```

Each provider config includes:

  • createModel() — instantiate an AI SDK model instance

  • checkEnabled() — verify API key is configured

  • canonicalModel() — normalize model name

  • supportsCaching — whether the provider supports prompt caching (e.g. Anthropic)

After evaluation, provider metadata is logged for diagnostics — this includes cached token counts (Anthropic) and reasoning token counts (OpenAI, xAI).

Streaming

Streaming uses Conat multiresponse requests:

  1. Frontend sends request to llm.account-{account_id}.api

  2. Server sends chunks with incrementing sequence numbers

  3. Frontend reassembles via stream callback: (output: string | null) => void

  4. null signals completion
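The reassembly in steps 2–4 can be sketched as follows. This is a minimal illustration, not the actual Conat client code: the `Chunk` shape and out-of-order buffering are assumptions; the source only specifies incrementing sequence numbers and `null` as the completion signal.

```ts
// Sketch of stream reassembly: chunks carry incrementing sequence
// numbers and may arrive out of order; a null text signals completion.
// The Chunk shape here is an assumption, not the actual wire format.
interface Chunk {
  seq: number;
  text: string | null; // null signals completion
}

function makeReassembler(stream: (output: string | null) => void) {
  const pending = new Map<number, string | null>();
  let next = 0;
  return (chunk: Chunk) => {
    pending.set(chunk.seq, chunk.text);
    // emit buffered chunks in order as soon as the next seq is present
    while (pending.has(next)) {
      const text = pending.get(next) ?? null;
      pending.delete(next);
      next += 1;
      stream(text); // text === null tells the caller the stream ended
    }
  };
}
```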

User-Defined LLMs

packages/server/llm/user-defined.ts — models configured by individual users:

```ts
async function evaluateUserDefinedLLM(opts, account_id) {
  // 1. Parse model name: "user-{service}-{id}"
  // 2. Fetch config from accounts.other_settings["userdefined_llm"]
  // 3. Route to appropriate evaluator with user's API key
}
```
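Step 1 — parsing the `user-{service}-{id}` format — might look like the following. This is a hypothetical re-implementation for illustration, not the actual user-defined.ts code:

```ts
// Hypothetical parser for the "user-{service}-{id}" model-name format
// described above; not the actual user-defined.ts implementation.
const USER_LLM_PREFIX = "user-";

function parseUserDefinedModel(
  model: string,
): { service: string; id: number } | null {
  if (!model.startsWith(USER_LLM_PREFIX)) return null;
  const rest = model.slice(USER_LLM_PREFIX.length);
  // the numeric id is the trailing segment after the last hyphen
  const idx = rest.lastIndexOf("-");
  if (idx <= 0) return null;
  const idStr = rest.slice(idx + 1);
  if (!/^\d+$/.test(idStr)) return null;
  return { service: rest.slice(0, idx), id: Number(idStr) };
}
```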

Reasoning & Thinking Tokens

Different providers expose reasoning and caching metadata in different ways via the AI SDK's providerMetadata:

| Provider | Reasoning tokens | Cache tokens | Notes |
| --- | --- | --- | --- |
| OpenAI | `providerMetadata.openai.reasoningTokens` | -- | o3-mini: ~86% of output can be reasoning tokens |
| xAI | `providerMetadata.openai.reasoningTokens` (OpenAI-compatible) | -- | Reasoning count may exceed output count (different methodology) |
| Anthropic | -- | `providerMetadata.anthropic.cacheCreationInputTokens`, `cacheReadInputTokens` | Extended thinking exists but not exposed as token counts |
| Google | -- | -- | Reasoning tokens included in totals but not yet exposed via AI SDK |

Billing note: reasoning tokens are already included in completion_tokens, so billing is correct even when reasoning counts are not separately displayed.

Core Types

ChatOptions

```ts
// packages/util/types/llm.ts
interface ChatOptions {
  input: string; // user message
  system?: string; // system prompt
  history?: History; // conversation history
  model?: LanguageModel; // model identifier
  account_id?: string;
  project_id?: string;
  path?: string; // file context
  tag?: string; // analytics tag
  maxTokens?: number;
  timeout?: number;
  stream?: (output: string | null) => void;
}

type History = {
  role: "assistant" | "user" | "system";
  content: string;
}[];
```

ChatOutput

```ts
interface ChatOutput {
  output: string;
  total_tokens: number;
  prompt_tokens: number;
  completion_tokens: number;
}
```

Cost Tracking

Pricing

packages/util/db-schema/llm-utils.ts defines per-model pricing:

```ts
const LLM_COST: { [name in LanguageModelCore]: Cost } = {
  "gpt-4": {
    prompt_tokens: /* USD per 1M */,
    completion_tokens: /* ... */,
  },
  "claude-3-5-sonnet": { ... },
  // ...
};

function getLLMCost(model, markup_percentage): {
  prompt_tokens: number; // USD per token with markup
  completion_tokens: number; // USD per token with markup
}
```

Purchase Flow

After evaluation:

  1. Check isFreeModel(model) — free models skip charging

  2. Calculate cost: prompt_cost * prompt_tokens + completion_cost * completion_tokens

  3. Create purchase via createPurchase() with type, token counts, tag
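As a worked example of step 2, with illustrative prices only (the dollar figures below are made up for the example, not CoCalc's actual rates):

```ts
// Sketch of the charge computation from step 2 above. The Cost values
// here assume getLLMCost() has already applied the markup; the prices
// used in the example are invented, not real CoCalc rates.
interface Cost {
  prompt_tokens: number; // USD per input token (markup applied)
  completion_tokens: number; // USD per output token (markup applied)
}

function computeCharge(
  cost: Cost,
  prompt_tokens: number,
  completion_tokens: number,
): number {
  return (
    cost.prompt_tokens * prompt_tokens +
    cost.completion_tokens * completion_tokens
  );
}

// e.g. $2.50 per 1M input tokens and $10 per 1M output tokens:
const exampleCost = { prompt_tokens: 2.5e-6, completion_tokens: 10e-6 };
const charge = computeCharge(exampleCost, 1200, 350);
// 1200 * 2.5e-6 + 350 * 10e-6 = 0.003 + 0.0035 = $0.0065
```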

Free Models

Determined by isFreeModel(model, isCoCalcCom):

  • Ollama models (self-hosted)

  • Some user-defined LLMs

  • Platform-specific free tiers

Abuse Prevention

packages/server/llm/abuse.ts:

```ts
// Configurable via environment variables:
COCALC_LLM_QUOTA_NO_ACCOUNT; // default: 0 (disabled)
COCALC_LLM_QUOTA_ACCOUNT; // default: 100,000 tokens
COCALC_LLM_QUOTA_GLOBAL; // default: 1,000,000 tokens
```

Prometheus metrics: llm_abuse_usage_global_pct (gauge), llm_abuse_usage_account_pct (histogram), llm_abuse_rejected_total (counter).
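The quota logic can be sketched roughly as below. This is a hypothetical simplification in the spirit of abuse.ts: the function name, the exact comparison semantics, and the usage-window bookkeeping are assumptions.

```ts
// Hypothetical sketch of a token-quota check along the lines of
// abuse.ts; names and exact semantics are assumptions for illustration.
interface Quotas {
  noAccount: number; // 0 disables anonymous usage entirely
  account: number; // per-account token budget for the window
  global: number; // global token budget for the window
}

function isRejected(
  quotas: Quotas,
  usedByAccount: number,
  usedGlobally: number,
  hasAccount: boolean,
): boolean {
  // anonymous requests are refused when the no-account quota is 0
  if (!hasAccount && quotas.noAccount === 0) return true;
  // per-account and global budgets are enforced independently
  if (hasAccount && usedByAccount >= quotas.account) return true;
  if (usedGlobally >= quotas.global) return true;
  return false;
}
```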

Database Schema

openai_chatgpt_log Table

Despite the legacy name, this table stores interactions with all LLM providers:

| Field | Type | Description |
| --- | --- | --- |
| id | serial | Primary key |
| time | timestamp | Request time |
| account_id | UUID | Requesting user |
| input | text | User message |
| output | text | Model response |
| history | jsonb | Conversation history |
| model | text | Model identifier |
| system | text | System prompt |
| tag | text | Analytics tag ({vendor}:{category}) |
| total_tokens | integer | Total tokens used |
| prompt_tokens | integer | Input tokens |
| total_time_s | float | Response time |
| project_id | UUID | Context project |
| path | text | Context file path |
Related embedding tables:

  • openai_embedding_log — vector embedding usage tracking

  • openai_embedding_cache — embedding cache (keyed by input_sha1)

Conat Messaging

Subject Pattern

```ts
// packages/conat/llm/server.ts
llm.account-{account_id}.api   // user requests
llm.project-{project_id}.api   // project requests
llm.hub.api                    // hub-level requests
```
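A helper that builds these subject strings might look like the following (a hypothetical sketch; `llmSubject` is an illustrative name, not necessarily the actual function in the Conat package):

```ts
// Hypothetical builder for the Conat subject patterns shown above;
// the function name is illustrative, not the actual Conat API.
function llmSubject(opts: {
  account_id?: string;
  project_id?: string;
}): string {
  if (opts.account_id) return `llm.account-${opts.account_id}.api`;
  if (opts.project_id) return `llm.project-${opts.project_id}.api`;
  return "llm.hub.api"; // hub-level requests
}
```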

Server Registration

```ts
// packages/server/conat/llm.ts
export async function init() {
  await init0(evaluate); // subscribes to llm.*.api subjects
}
```

Client

```ts
// packages/conat/llm/client.ts
export async function llm(options: ChatOptions): Promise<string> {
  // Sends multiresponse request to llmSubject
  // Handles streaming via options.stream callback
  // Returns concatenated output
}
```

Frontend Components

LLMClient

packages/frontend/client/llm.ts:

```ts
class LLMClient {
  async query(opts: QueryLLMProps): Promise<string>; // one-shot query
  queryStream(opts): ChatStream; // streaming query
}
```

Handles: default system prompt, locale settings, purchase permission checks, history/message truncation to fit context window, Conat call.

Model Selector

packages/frontend/frame-editors/llm/llm-selector.tsx — dropdown for choosing model. Groups models by provider, shows inline cost estimation, includes user-defined LLMs, validates availability.

AI Assistant Integration Points

Frame editors (packages/frontend/frame-editors/llm/):

| Component | Purpose |
| --- | --- |
| llm-assistant-button.tsx | Main AI button in editor toolbar |
| help-me-fix-button.tsx | Error explanation button |
| help-me-fix-dialog.tsx | Full dialog for fix suggestions |
| llm-query-dropdown.tsx | Quick action menu |
| llm-history-selector.tsx | Previous query history |

Jupyter (packages/frontend/jupyter/llm/):

| Component | Purpose |
| --- | --- |
| cell-tool.tsx | Per-cell AI assistant button |
| cell-context-selector.tsx | Choose context scope |
| split-cells.ts | LLM-powered cell splitting |

Chat (packages/frontend/chat/):

  • llm-cost-estimation.tsx — cost display in chat messages

  • Message summarization via LLM

Token Estimation

packages/frontend/misc/llm.ts:

```ts
numTokensEstimate(content: string): number // ~8 chars/token heuristic
truncateMessage(content: string, maxTokens): string // truncate to fit
truncateHistory(history, maxTokens, model): History // remove oldest entries
```
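Oldest-first history truncation under the ~8 chars/token heuristic could be sketched like this (a hypothetical re-implementation for illustration, not the actual packages/frontend/misc/llm.ts code):

```ts
// Hypothetical sketch of oldest-first history truncation under the
// ~8 chars/token heuristic mentioned above; not the actual
// packages/frontend/misc/llm.ts implementation.
type History = { role: "assistant" | "user" | "system"; content: string }[];

function numTokensEstimate(content: string): number {
  return Math.ceil(content.length / 8); // ~8 chars per token
}

// Drop the oldest entries until the estimated total fits maxTokens.
function truncateHistory(history: History, maxTokens: number): History {
  const kept = [...history];
  let total = kept.reduce((n, m) => n + numTokensEstimate(m.content), 0);
  while (kept.length > 0 && total > maxTokens) {
    const dropped = kept.shift()!; // oldest entry goes first
    total -= numTokensEstimate(dropped.content);
  }
  return kept;
}
```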

Cost Estimation Component

packages/frontend/misc/llm-cost-estimation.tsx — displays estimated cost before execution. Free models marked as "free to use".

User-Defined LLMs

Users can add their own LLM endpoints:

```ts
// packages/util/db-schema/llm-utils.ts
interface UserDefinedLLM {
  id: number;
  service: UserDefinedLLMService; // "openai", "anthropic", etc.
  model: string; // model name at provider
  display: string; // display name
  endpoint: string; // API endpoint URL
  apiKey: string; // API key
  icon?: string;
  max_tokens?: number;
}

// Stored in: accounts.other_settings["userdefined_llm"] as JSON array
// Model name format: "user-{service}-{id}"
const USER_LLM_PREFIX = "user-";
```

User-Defined LLM Hook (Frontend)

```ts
// packages/frontend/frame-editors/llm/use-userdefined-llm.ts
function useUserDefinedLLM(): UserDefinedLLM[];
function getUserDefinedLLMByModel(model: string): UserDefinedLLM | null;
```

REST API

```
POST /api/v2/llm/evaluate
Body:     { input, system?, history?, model?, tag? }
Response: { output, success } | { error }
```
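A minimal client for this endpoint might look like the sketch below. Authentication (session cookie or API key) is omitted, and the fetch implementation is injected so the function can be exercised without a network; the helper name is illustrative:

```ts
// Sketch of calling the REST endpoint described above. Auth handling
// is omitted; fetchImpl is injectable so the logic is testable offline.
type FetchLike = (
  url: string,
  init: object,
) => Promise<{ json(): Promise<any> }>;

async function evaluateViaREST(
  input: string,
  model: string | undefined,
  fetchImpl: FetchLike,
): Promise<string> {
  const resp = await fetchImpl("/api/v2/llm/evaluate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ input, model, tag: "docs:example" }),
  });
  const data = await resp.json();
  // the endpoint returns either { output, success } or { error }
  if (data.error) throw new Error(data.error);
  return data.output;
}
```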

Server Settings

| Setting | Description |
| --- | --- |
| default_llm | Default model (fallback: gemini-3-flash-preview-16k) |
| pay_as_you_go_openai_markup_percentage | Cost markup (0-100%) |
| user_defined_llm | Enable/disable user-defined LLM support |

Key Source Files

| File | Description |
| --- | --- |
| packages/util/db-schema/llm-utils.ts | Model definitions, pricing, validation (~66KB) |
| packages/util/types/llm.ts | ChatOptions, History, ChatOutput types |
| packages/util/db-schema/llm.ts | Database schema for log tables |
| packages/server/llm/index.ts | Main evaluate() entry point |
| packages/server/llm/evaluate.ts | AI SDK unified handler |
| packages/server/llm/client.ts | Ollama / Custom OpenAI model factories |
| packages/server/llm/utils.ts | Token counting, provider metadata extraction |
| packages/server/llm/user-defined.ts | User-defined LLM evaluation |
| packages/server/llm/abuse.ts | Rate limiting and quotas |
| packages/server/llm/save-response.ts | Database persistence |
| packages/conat/llm/client.ts | Frontend -> server messaging |
| packages/conat/llm/server.ts | Subject routing and handling |
| packages/frontend/client/llm.ts | LLMClient class |
| packages/frontend/frame-editors/llm/llm-selector.tsx | Model picker |
| packages/frontend/frame-editors/llm/llm-assistant-button.tsx | AI button |
| packages/frontend/jupyter/llm/cell-tool.tsx | Jupyter cell assistant |
| packages/frontend/misc/llm-cost-estimation.tsx | Cost display |
| packages/frontend/misc/llm.ts | Token estimation utilities |
| packages/next/pages/api/v2/llm/evaluate.ts | REST API endpoint |

Tests

Unit Tests

```sh
cd packages/util && pnpm test db-schema/llm-utils.test.ts
```

Integration Tests (requires Postgres + API keys)

The suite is opt-in and skipped unless COCALC_TEST_LLM=true.

```sh
cd packages/server && COCALC_TEST_LLM=true pnpm test llm/test/models.test.ts
```

Required environment variables (see packages/server/llm/test/shared.ts):

  • COCALC_TEST_OPENAI_KEY

  • COCALC_TEST_GOOGLE_GENAI_KEY

  • COCALC_TEST_ANTHROPIC_KEY

  • COCALC_TEST_MISTRAL_AI_KEY

  • COCALC_TEST_XAI_KEY

Common Patterns for Agents

Making an LLM Query (Frontend)

```ts
const result = await webapp_client.llm_client.query({
  input: "Explain this error",
  system: "You are a helpful coding assistant",
  model: "gpt-4o",
  project_id: "...",
  path: "file.py",
  tag: "editor:help-me-fix",
});
```

Streaming Response (Frontend)

```ts
const chatStream = webapp_client.llm_client.queryStream({
  input: "Write a function...",
  model: "claude-3-5-sonnet",
  tag: "jupyter:cell-tool",
});
chatStream.on("token", (token) => {
  /* update UI */
});
chatStream.on("done", (fullOutput) => {
  /* final result */
});
```

Checking Model Availability

```ts
import {
  isLanguageModelValid,
  isFreeModel,
  getLLMCost,
} from "@cocalc/util/db-schema/llm-utils";

if (isLanguageModelValid(model)) {
  const free = isFreeModel(model, isCoCalcCom);
  const cost = getLLMCost(model, markup_percentage);
}
```