Book a Demo!
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutPoliciesSign UpSign In
sagemathinc
GitHub Repository: sagemathinc/cocalc
Path: blob/master/src/docs/database.md
10782 views

Database Schema and Tables

This document covers the PostgreSQL database schema that backs CoCalc. It focuses on the tables that appear most frequently in the codebase and are essential for understanding data flow.

Overview

CoCalc uses PostgreSQL as its primary data store. The schema is defined declaratively in TypeScript using a Table() registration function.

Schema Definition System

All table schemas live in packages/util/db-schema/. Each file calls Table() to register into the global SCHEMA object:

// packages/util/db-schema/types.ts Table({ name, fields, rules });
  • fields — column definitions with type, pg_type, desc

  • rules — primary key, indexes, durability, and query permissions

  • rules.user_query — defines what frontend clients can read/write

  • rules.project_query — defines what project daemons can read/write

The index file (packages/util/db-schema/index.ts) side-effect-imports all schema files, making the entire schema available as SCHEMA.

Access Patterns

From hub/backend code — use the connection pool directly:

import getPool from "@cocalc/database/pool"; const pool = getPool(); const { rows } = await pool.query("SELECT * FROM accounts WHERE account_id = $1", [id]);

From frontend — use schema-driven user queries via SyncTable or the API. Queries are validated against user_query rules and filtered by pg_where clauses (e.g., account_id, project_id).

Durability Levels

LevelMeaningUsed by
hard (default)Fully persistedaccounts, projects, purchases
softMay be lost without serious impactproject_log, cursors, file_use, stats
ephemeralIn-memory only (SyncTable, no DB)ipywidgets

Virtual Tables

Many tables are virtual — they query the same underlying PostgreSQL table but with different permissions or filters. For example, crm_accounts is a virtual table over accounts that grants admin-only access to all fields.

Table({ name: "crm_accounts", rules: { virtual: "accounts", user_query: { get: { admin: true, ... } } }, fields: schema.accounts.fields, });

Core Tables

accounts

File: packages/util/db-schema/accounts.ts Primary key: account_id (UUID)

Every registered user. This is the most-queried table in the system.

FieldTypeDescription
account_idUUIDPrimary key
email_addressVARCHAR(254)Unique email (optional — SSO users may not have one)
password_hashVARCHAR(173)SHA-512 hash with salt
first_name, last_nameVARCHAR(254)Display name
nameVARCHAR(39)Globally unique username (optional)
createdtimestampAccount creation time
last_activetimestampLast activity
groupsTEXT[]Group memberships (e.g., ['admin'])
passportsmapSSO logins: {"strategy-id": profile}
editor_settingsmapEditor config (font size, key bindings, etc.)
other_settingsmapGeneral settings (dark mode, confirm close, etc.)
terminalmapTerminal settings
bannedbooleanAccount banned flag
deletedbooleanAccount deleted flag
balanceREALCurrent USD balance (not source of truth — display only)
min_balanceREALMinimum allowed balance (admin-set credit limit)
auto_balancemapAuto-topup configuration
stripe_customer_idstringStripe integration
api_keystringFull-access API key (sk_..., 24 chars, base62)
ssh_keysmapSSH key fingerprints to key objects
purchase_closing_dayintegerBilling cutoff day (1–28)
profilemapAvatar and presence data
lti_idTEXT[]LTI ISS and user IDs
unlistedbooleanExclude from name searches
tagsTEXT[]Interest tags

Key indexes: created, last_active DESC, email_address (unique), lti_id, ((passports IS NOT NULL)), ((ssh_keys IS NOT NULL))

Virtual tables: crm_accounts (admin), crm_agents (admin accounts only), accounts_v2 (name search), collaborators (find collaborators)


projects

File: packages/util/db-schema/projects.ts Primary key: project_id (UUID)

Every project. Central to most of CoCalc's functionality.

FieldTypeDescription
project_idUUIDPrimary key
titlestringShort title
descriptionstringMarkdown description
nameVARCHAR(100)Unique name per owner
usersmap{account_id: {group:"owner"|"collaborator", hide, upgrades, ssh}}
deletedbooleanSoft-delete flag
createdtimestampCreation time
last_editedtimestampLast file edit time
last_activemap{account_id: timestamp} per-user activity
statemap{state:"running"|"stopped"|..., time, ip, error}
statusmapDetailed status from project daemon
settingsmapBase quotas: {cores, memory, disk_quota, network, mintime, ...}
run_quotamapActual running quota (computed)
site_licensemap{license_id: {memory, cores, ...}} applied licenses
action_requestmap{action:"start"|"stop", started, finished, err}
compute_imagestringUnderlying compute image name
coursemapCourse management: {project_id, path, pay, account_id}
envmapAdditional environment variables
sandboxbooleanAuto-add visiting users as collaborators
hostmap{host: hostname, assigned: timestamp}
pay_as_you_go_quotasmapPAYG quotas per account
secret_tokenVARCHAR(256)Ephemeral auth token
manage_users_owner_onlybooleanRestrict collaborator management to owners
avatar_image_tinystring32x32 image (~3kb)
colorstringVisual identification color
lti_idTEXT[]LTI context IDs

Key indexes: last_edited, created, USING GIN (users), USING GIN (state), ((state ->> 'state')), deleted, site_license

User query constraints: Only projects edited in the last 6 weeks are returned by default (PROJECTS_CUTOFF = "6 weeks"), limited to 300 results.

Virtual tables: projects_all, projects_admin, projects_owner, project_avatar_images, crm_projects


syncstrings

File: packages/util/db-schema/syncstring-schema.ts Primary key: string_id (CHAR(40) — SHA1 of project_id + path)

Coordination record for each collaboratively-edited document. See syncstrings.md for full architecture details.

FieldTypeDescription
string_idCHAR(40)sha1(project_id, path)
project_idUUIDOwning project
pathstringFile path
usersUUID[]Editor account IDs (index = user_id in patches)
last_activetimestampLast user interaction
last_snapshottimestampMost recent snapshot time
snapshot_intervalintegerPatches between snapshots (default: 300)
doctypestringJSON: {"type":"string"} or {"type":"db","opts":{...}}
savemap{state:"requested"|"done", hash, error}
initmap{time, size, error}
read_onlybooleanFile is read-only
settingsmapShared editing config
archivedUUIDBlob ID if patches are archived
hugebooleanToo many patches to process

Related tables: patches, cursors, eval_inputs, eval_outputs, ipywidgets — all keyed by string_id. See syncstrings.md.


patches

File: packages/util/db-schema/syncstring-schema.ts Primary key: (string_id, time, is_snapshot) — compound

Individual edit patches for synchronized documents.

FieldTypeDescription
string_idCHAR(40)Which syncstring
timetimestampLogical timestamp
walltimestampWallclock time for display
user_idintegerIndex into syncstrings.users
patchTEXTJSON-encoded compressed DMP patch
is_snapshotbooleanWhether this is a snapshot entry
snapshotstringFull document state (if snapshot)
parentsINTEGER[]Parent patch timestamps (DAG)
versionintegerUser-friendly version number
formatinteger0 = string, 1 = db-doc
seq_infomapConat sequence info for incremental loading

Constraints: unique_writes: true — no reason to write the same patch twice. Cannot change user_id or patch after creation.


server_settings

File: packages/util/db-schema/server-settings.ts Primary key: name (string)

Global configuration for the entire CoCalc installation.

FieldTypeDescription
namestringSetting name
valuestringSetting value (stored as string)
readonlybooleanExternally controlled — UI should not edit

Access: Admin-only set; values are read by the server at startup and cached. Settings are defined in packages/util/db-schema/site-defaults.ts and site-settings-extras.ts, which enumerate all valid setting names with defaults, types, and descriptions.

Common settings include: site_name, site_description, help_email, commercial (boolean), ssh_gateway, default_quotas, max_upgrades, email_enabled, AI/LLM configuration, and many more.


passport_settings

File: packages/util/db-schema/server-settings.ts Primary key: strategy (string)

SSO (Single Sign-On) authentication strategy configuration.

FieldTypeDescription
strategystringUnique lowercase identifier (e.g., google, github)
confmapStrategy configuration consumed by auth.ts
infomapPublic display info: {icon, display, public, exclusive_domains, disabled}

Used by the login system to determine which SSO buttons to show and how to authenticate users.


site_licenses

File: packages/util/db-schema/site-licenses.ts Primary key: id (UUID)

License keys that upgrade project quotas.

FieldTypeDescription
idUUIDLicense ID
titlestringDescriptive name
descriptionstringLonger description
infomap{purchased: PurchaseInfo} — specs and price
activatestimestampWhen license starts working
expirestimestampWhen license stops working
createdtimestampCreation time
last_usedtimestampLast used (throttled)
managersTEXT[]Account IDs allowed to manage
run_limitintegerMax simultaneously running upgraded projects
quotamap{cpu, ram, disk, member, boost, idle_timeout, always_running, ...}
upgradesmapLegacy format: {cores, memory, disk_quota, network, mintime}
subscription_idintegerIf auto-renewing
voucher_codestringIf created from voucher

Virtual tables: manager_site_licenses, site_license_public_info, site_license_usage_stats, projects_using_site_license, matching_site_licenses


purchases

File: packages/util/db-schema/purchases.ts Primary key: id (integer, auto-increment)

All financial transactions. Negative cost = credit to user.

FieldTypeDescription
idintegerAuto-increment ID
timetimestampWhen recorded
account_idUUIDWho is paying
costnumberCost in USD (negative = credit)
cost_per_hournumberHourly rate (metered purchases)
period_start, period_endtimestampBilling period
servicestringCategory: license, compute-server, openai-*, etc.
descriptionmapService-specific details
project_idUUIDAffected project (optional)
invoice_idstringStripe invoice/payment intent (unique)
day_statement_idintegerDaily statement
month_statement_idintegerMonthly statement
tagstringAnalytics tag

Service types: credit, refund, license, project-upgrade, compute-server, compute-server-network-usage, compute-server-storage, openai-* (LLM usage), voucher, edit-license

Virtual tables: crm_purchases (admin)


subscriptions

File: packages/util/db-schema/subscriptions.ts Primary key: id (integer)

Recurring subscription management.

FieldTypeDescription
idintegerAuto-increment ID
account_idUUIDSubscriber
costnumberCost per period in USD
intervalstring"month" or "year"
statusstring"active", "canceled", "unpaid", "past_due"
current_period_starttimestampPeriod start
current_period_endtimestampPeriod end
metadatamap{type:"license", license_id}
canceled_attimestampWhen canceled

Activity and Logging Tables

project_log

File: packages/util/db-schema/project-log.ts Primary key: id (UUID) Durability: soft

Activity log for each project (file opens, settings changes, etc.).

FieldTypeDescription
idUUIDEvent ID
project_idUUIDWhich project
timetimestampWhen (kept for 2 months)
account_idUUIDWho
eventmapWhat happened (JSON)

Indexes: project_id, time, account_id


file_use

File: packages/util/db-schema/file-use.ts Primary key: id (CHAR(40) — SHA1 of project_id + path) Durability: soft

Tracks file access for notifications and "last edited" info.

FieldTypeDescription
idCHAR(40)sha1(project_id, path)
project_idUUIDWhich project
pathstringFile path
usersmap{account_id: {edit: timestamp, chat: timestamp, open: timestamp}}
last_editedtimestampMost recent edit (kept for 21 days)

central_log

File: packages/util/db-schema/central-log.ts Primary key: id (UUID) Durability: soft

System-wide analytics log. Not read by frontend (except admins).

FieldTypeDescription
idUUIDEvent ID
eventstringEvent name (must start with "webapp-" for user-set events)
valuemapArbitrary JSON data
timetimestampWhen
expiretimestampAuto-deletion time

blobs

File: packages/util/db-schema/blobs.ts Primary key: id (UUID — derived from SHA1 of content)

Binary data storage for uploads, Sage worksheet output, etc.

FieldTypeDescription
idUUIDSHA1-based content hash
blobBufferBinary content
sizenumberSize in bytes
expiretimestampExpiration time
createdtimestampCreation time
project_idUUIDAssociated project
account_idUUIDCreator (recorded since late 2024)
gcloudstringCloud storage bucket name
compressstring"gzip" or "zlib"

Limits: MAX_BLOB_SIZE = 10MB, daily per-project limit of 100MB (licensed) or 10MB (unlicensed).


Infrastructure Tables

hub_servers

File: packages/util/db-schema/hub-servers.ts Primary key: host (VARCHAR(63)) Durability: soft

Active hub server instances for load balancing.

FieldTypeDescription
hostVARCHAR(63)Hostname
portintegerPort number
clientsintegerConnected client count
expiretimestampTTL for this record

stats

File: packages/util/db-schema/stats.ts Primary key: id (UUID) Durability: soft

Periodic system-wide statistics snapshots.

FieldTypeDescription
idUUIDSnapshot ID
timetimestampWhen computed
accountsintegerTotal account count
projectsintegerTotal project count
accounts_createdmapCounts by time window: {"5min", "1h", "1d", "7d", "30d"}
accounts_activemapSame windows
projects_createdmapSame windows
projects_editedmapSame windows
running_projectsmapCurrently running projects
hub_serversJSONB[]Active hub server info

registration_tokens

File: packages/util/db-schema/registration-tokens.ts Primary key: token (string)

Tokens required to create accounts (admin-managed).

FieldTypeDescription
tokenstringThe token string
descrstringDescription
expirestimestampExpiration time
limitnumberMax accounts to create
counternumberAccounts created so far (read-only)
disabledbooleanDisable this token
ephemeralnumberLifetime in ms for created accounts/projects
customizemapAccount customization overrides

public_paths

File: packages/util/db-schema/public-paths.ts Primary key: id (CHAR(40) — SHA1 of project_id + path)

Published/shared files and directories.

FieldTypeDescription
idCHAR(40)sha1(project_id, path)
project_idUUIDWhich project
pathstringFile/directory path
nameVARCHAR(100)Unique name within project
descriptionstringDescription
disabledbooleanUnpublish without deleting
unlistedbooleanHide from public listing
authenticatedbooleanRequire login to view
counterintegerView count
licensestringContent license
redirectstringRedirect URL
jupyter_apibooleanEnable Jupyter API access

Query Permission System

User Queries

Each table defines user_query.get and user_query.set rules:

user_query: { get: { pg_where: ["account_id"], // filter by authenticated user fields: { account_id: null, ... }, // allowed fields (null = no default) admin: true, // admin-only access options: [{ limit: 100 }], // query options }, set: { fields: { account_id: "account_id", // auto-filled from session project_id: "project_write", // must have write access title: true, // user can set freely }, check_hook(db, obj, account_id, project_id, cb) { ... }, before_change(db, old_val, new_val, account_id, cb) { ... }, on_change(db, old_val, new_val, account_id, cb) { ... }, }, }

pg_where Magic Values

ValueMeaning
"account_id"Filter to authenticated user's account
"projects"Filter to user's projects
"project_id"Must specify and have read access
"project_id-public"Must specify; project has public paths
"all_projects_read"All project IDs user can read
"collaborators"All account IDs of user's collaborators

Hook System

HookWhenPurpose
check_hookBefore processingValidate permissions
before_changeBefore DB writePre-processing, validation
on_changeAfter DB writeSide effects, notifications
instead_of_changeReplaces DB writeCustom write logic
instead_of_queryReplaces entire queryCustom query logic

CRM Virtual Tables

Most core tables have crm_* counterparts for admin access:

Virtual TableBase TableNotes
crm_accountsaccountsAll fields + notes, salesloft_id
crm_projectsprojectsAll fields + notes
crm_purchasespurchasesSet tag/notes
crm_subscriptionssubscriptionsSet notes
crm_site_licensessite_licensesFull admin access
crm_project_logproject_logAdmin read
crm_file_usefile_useAdmin read
crm_syncstringssyncstringsAdmin read
crm_patchespatchesAdmin read (limit 200)

Key Source Files

FileDescription
packages/util/db-schema/index.tsSchema registry — imports all schema files
packages/util/db-schema/types.tsTable() function, TableSchema, Fields types
packages/util/db-schema/accounts.tsAccounts schema
packages/util/db-schema/projects.tsProjects schema
packages/util/db-schema/syncstring-schema.tsSyncstrings, patches, cursors, eval, ipywidgets
packages/util/db-schema/server-settings.tsServer/passport settings
packages/util/db-schema/site-defaults.tsAll server setting names and defaults
packages/util/db-schema/site-licenses.tsLicense management
packages/util/db-schema/purchases.tsFinancial transactions
packages/util/db-schema/subscriptions.tsRecurring subscriptions
packages/util/db-schema/blobs.tsBinary storage
packages/util/db-schema/project-log.tsActivity logging
packages/util/db-schema/file-use.tsFile access tracking
packages/util/db-schema/central-log.tsSystem-wide analytics
packages/util/db-schema/public-paths.tsPublished files
packages/util/db-schema/registration-tokens.tsSignup tokens
packages/util/db-schema/client-db.tsClient-side DB helpers (sha1, etc.)
packages/database/pool/pool.tsPostgreSQL connection pool
packages/database/postgres/Server-side query implementations