Documentation

Run models, query the API, configure your setup. Everything you need to ship with Tessera.

Getting started

Tessera runs open-weight language models on your machine. In two commands you'll have a local model answering your questions — no accounts, no API keys, no outbound calls.

A sixty-second tour

Install the CLI, pull a model, start chatting. Everything happens locally; your prompts and responses never leave your machine.

$ brew install tessera
$ tessera run lumen-4
# lumen-4 is a balanced 7B general chat model, ~4 GB
# once it's pulled you'll see a > prompt. start typing.

First run downloads ~4 GB. Subsequent runs start instantly — the model is cached under ~/.tessera/models/.

What Tessera is not

  • Not a hosted API. Everything runs on your hardware.
  • Not a training framework. Tessera loads and serves existing open-weight models.
  • Not opinionated about your stack. Talk to it over HTTP from any language.

Installation

Tessera ships as a single binary. Pick your platform below.

macOS

$ brew install tessera
# or download the .dmg from the releases page

Linux

$ curl -fsSL https://tessera.ai/install.sh | sh
$ tessera --version
tessera 1.4.2 (a3f9c8e)

Windows

Windows builds ship through winget:

PS> winget install Tessera.Tessera

Verifying the install

After installation, check that the server is reachable:

$ tessera status
tessera is running on http://localhost:11434
3 models installed · 11 GB used

API reference

Tessera speaks an unopinionated HTTP API on port 11434. If you can curl, you can talk to a model.

POST /api/chat

The primary endpoint for conversational use. Post a list of messages and receive the assistant's reply.

$ curl http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lumen-4",
    "messages": [
      { "role": "user", "content": "Write a haiku about Rust." }
    ]
  }'

Response

{
  "model": "lumen-4",
  "message": {
    "role": "assistant",
    "content": "Borrow, move, and own —\nthe compiler's quiet chorus.\nSafer than it sounds."
  },
  "done": true,
  "total_duration_ms": 842
}
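The same call from Python, using only the standard library. The endpoint, request body, and response shape follow the curl example above; the function names here are illustrative, not part of Tessera, and `chat` assumes a local server is running:

```python
import json
import urllib.request

def build_chat_payload(model, messages):
    # Mirrors the request body in the curl example above.
    return {"model": model, "messages": messages}

def chat(model, messages, base_url="http://localhost:11434"):
    """POST /api/chat and return the assistant's reply text."""
    data = json.dumps(build_chat_payload(model, messages)).encode()
    req = urllib.request.Request(
        f"{base_url}/api/chat",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The reply text lives at message.content, per the response above.
        return json.load(resp)["message"]["content"]

# With a running server:
# reply = chat("lumen-4", [{"role": "user", "content": "Write a haiku about Rust."}])
```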

POST /api/embed

Generate embeddings for one or more inputs.

$ curl http://localhost:11434/api/embed \
  -d '{ "model": "embed-3", "input": "hello world" }'
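Embedding vectors are typically compared by cosine similarity (e.g. for retrieval ranking). A small pure-Python helper you can apply to vectors returned by this endpoint; nothing here calls Tessera:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm  # 1.0 = same direction, 0.0 = orthogonal
```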

GET /api/tags

List the models currently installed and ready to serve.

Models

Tessera supports any model in the open-weight registry. Models are pulled by name; tags select quantization.

Pulling a model

$ tessera pull lumen-4:q4
# tags: q4 (4-bit), q8 (8-bit), fp16 (full precision)

Recommended defaults

  • lumen-4 — balanced general chat, 7B, ~4 GB on disk at q4.
  • dev-code — coding and refactors, 14B, long context.
  • atlas-large — research-grade reasoning, 70B, ~40 GB. Needs 48 GB RAM to comfortably host at q4.
  • nano-1 — tiny 1B helper for tool use and classification. Runs on anything.
  • vision-v — multimodal vision+text. Pass images via base64 in the images field.
  • embed-3 — small embedding model for RAG pipelines.
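For vision-v, the docs above say images go base64-encoded in the images field. A sketch of building such a payload in Python — placing the field inside a chat-style message is an assumption, as is the helper name:

```python
import base64

def build_vision_payload(prompt, image_bytes, model="vision-v"):
    # Base64-encode the raw image bytes for the "images" field, as stated above.
    # Attaching it to a user message is an assumption about the payload layout.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {"role": "user", "content": prompt, "images": [encoded]}
        ],
    }
```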

Removing a model

$ tessera rm atlas-large
removed atlas-large · freed 39.8 GB

Configuration

Most teams never need to touch the config. When you do, ~/.tessera/config.toml is the single source of truth.

Default config

# ~/.tessera/config.toml

[server]
host = "127.0.0.1"
port = 11434
cors_allow_origin = "*"

[models]
cache_dir = "~/.tessera/models"
auto_unload_after_seconds = 600

[telemetry]
# opt-in only. off by default.
enabled = false

Environment variables

Every config value can be overridden via an env var:

  • TESSERA_HOST — bind the server to a different interface.
  • TESSERA_PORT — change the listening port.
  • TESSERA_CACHE_DIR — relocate the model cache (useful on an external SSD).
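For example, to relocate the cache and change the port before starting Tessera (the mount path is illustrative):

```shell
# Overrides take effect the next time tessera starts.
export TESSERA_CACHE_DIR=/mnt/ssd/tessera-models
export TESSERA_PORT=8080
echo "$TESSERA_CACHE_DIR"
```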

Security tip. If you expose Tessera on a non-127.0.0.1 interface, put a reverse proxy with auth in front of it. The built-in server has no authentication by design.
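As one possible setup, an nginx reverse proxy adding HTTP basic auth in front of a locally bound Tessera; hostnames, ports, and the certificate/htpasswd paths are all illustrative:

```nginx
server {
    listen 8443 ssl;
    server_name tessera.example.com;

    ssl_certificate     /etc/nginx/tls/tessera.crt;
    ssl_certificate_key /etc/nginx/tls/tessera.key;

    # Require a username/password before anything reaches Tessera.
    auth_basic           "Tessera";
    auth_basic_user_file /etc/nginx/tessera.htpasswd;

    location / {
        proxy_pass http://127.0.0.1:11434;
    }
}
```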