Getting started
Tessera runs open-weight language models on your machine. In three commands you'll have a local model answering your questions — no accounts, no API keys, no outbound calls.
A sixty-second tour
Install the CLI, pull a model, start chatting. Everything happens locally; your prompts and responses never leave your machine.
```shell
$ brew install tessera
$ tessera run lumen-4
# lumen-4 is a balanced 7B general chat model, ~4 GB
# once it's pulled you'll see a > prompt. start typing.
```
First run downloads ~4 GB. Subsequent runs start instantly — the model is cached under ~/.tessera/models/.
What Tessera is not
- Not a hosted API. Everything runs on your hardware.
- Not a training framework. Tessera loads and serves existing open-weight models.
- Not opinionated about your stack. Talk to it over HTTP from any language.
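As a sketch of that last point, here is a one-turn chat call using nothing but Python's standard library. `chat_payload` and `chat` are hypothetical helper names; the endpoint, port, and payload shape follow the `/api/chat` examples later in this document, and a Tessera server is assumed to be running locally.

```python
import json
import urllib.request

def chat_payload(model, prompt):
    # One-turn conversation: a single user message.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt, model="lumen-4", base="http://localhost:11434"):
    # POST the payload to /api/chat and return the assistant's text.
    req = urllib.request.Request(
        base + "/api/chat",
        data=json.dumps(chat_payload(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]
```

With a server running, `chat("Write a haiku about Rust.")` returns the reply text; the same request works identically from curl or any other HTTP client.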
Installation
Tessera ships as a single binary. Pick your platform below.
macOS
```shell
$ brew install tessera
# or download the .dmg from the releases page
```
Linux
```shell
$ curl -fsSL https://tessera.ai/install.sh | sh
$ tessera --version
tessera 1.4.2 (a3f9c8e)
```
Windows
Windows builds ship through winget:
```powershell
PS> winget install Tessera.Tessera
```
Verifying the install
After installation, check that the server is reachable:
```shell
$ tessera status
tessera is running on http://localhost:11434
3 models installed · 11 GB used
```
API reference
Tessera speaks an unopinionated HTTP API on port 11434. If you can curl, you can talk to a model.
POST /api/chat
The primary endpoint for conversational use: send a list of messages and get back the assistant's reply.
```bash
$ curl http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lumen-4",
    "messages": [
      { "role": "user", "content": "Write a haiku about Rust." }
    ]
  }'
```
Response
```json
{
  "model": "lumen-4",
  "message": {
    "role": "assistant",
    "content": "Borrow, move, and own —\nthe compiler's quiet chorus.\nSafer than it sounds."
  },
  "done": true,
  "total_duration_ms": 842
}
```
POST /api/embed
Generate embeddings for one or more inputs.
```bash
$ curl http://localhost:11434/api/embed \
  -d '{ "model": "embed-3", "input": "hello world" }'
```
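Embeddings are only useful once you compare them. Below is a plain cosine-similarity helper in Python; the vectors are toy stand-ins for two embed-3 outputs, since the shape of the `/api/embed` response is not shown above.

```python
import math

def cosine(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|).
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy vectors standing in for two embedding outputs.
v1 = [0.1, 0.8, 0.3]
v2 = [0.2, 0.7, 0.4]
```

Scores near 1.0 mean the inputs are semantically close; this is the core comparison step in a RAG retrieval loop.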
GET /api/tags
List the models currently installed and ready to serve.
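A client can use `/api/tags` to gate on a model being present before issuing requests. The response shape below is an assumption for illustration only (a `models` array of objects with a `name` field is not specified above):

```python
import json

# Assumed /api/tags response shape — illustrative, not specified above.
raw = '{ "models": [ { "name": "lumen-4" }, { "name": "embed-3" } ] }'

def installed(tags_response, model):
    # True if the given model name appears in the tags listing.
    return any(m["name"] == model for m in tags_response["models"])

tags = json.loads(raw)
```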
Models
Tessera supports any model in the open-weight registry. Models are pulled by name; tags select quantization.
Pulling a model
```shell
$ tessera pull lumen-4:q4
# tags: q4 (4-bit), q8 (8-bit), fp16 (full precision)
```
Recommended defaults
- lumen-4 — balanced general chat, 7B, ~4 GB on disk at q4.
- dev-code — coding and refactors, 14B, long context.
- atlas-large — research-grade reasoning, 70B, ~40 GB. Needs 48 GB RAM to comfortably host at q4.
- nano-1 — tiny 1B helper for tool use and classification. Runs on anything.
- vision-v — multimodal vision+text. Pass images via base64 in the images field.
- embed-3 — small embedding model for RAG pipelines.
Removing a model
```shell
$ tessera rm atlas-large
removed atlas-large · freed 39.8 GB
```
Configuration
Most teams never need to touch the config. When you do, ~/.tessera/config.toml is the single source of truth.
Default config
```toml
# ~/.tessera/config.toml

[server]
host = "127.0.0.1"
port = 11434
cors_allow_origin = "*"

[models]
cache_dir = "~/.tessera/models"
auto_unload_after_seconds = 600

[telemetry]
# opt-in only. off by default.
enabled = false
```
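To relocate the model cache, for example onto an external SSD, only the `[models]` table changes (the path below is a placeholder):

```toml
[models]
cache_dir = "/Volumes/ssd/tessera-models"
```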
Environment variables
Every config value can be overridden via an env var:
- TESSERA_HOST — bind the server to a different interface.
- TESSERA_PORT — change the listening port.
- TESSERA_CACHE_DIR — relocate the model cache (useful on an external SSD).
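On the client side it is handy to honor the same variables when building the base URL. A sketch, assuming only that the two variables hold a hostname and a port, falling back to the documented defaults:

```python
import os

def base_url(env=None):
    # Respect TESSERA_HOST / TESSERA_PORT, with the documented defaults.
    env = os.environ if env is None else env
    host = env.get("TESSERA_HOST", "127.0.0.1")
    port = env.get("TESSERA_PORT", "11434")
    return f"http://{host}:{port}"
```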
Security tip. If you expose Tessera on a non-127.0.0.1 interface, put a reverse proxy with auth in front of it. The built-in server has no authentication by design.