A Language Model Built for German, Designed to Run Locally

Most large language models treat German as an afterthought. They are trained predominantly on English text, with German mixed into a multilingual soup alongside dozens of other languages. The result: German works, but it does not work well. Compound words get fragmented. Morphology gets mangled. Token efficiency suffers, which means higher costs and shorter usable context.

We built Faust-1 to change that.

Faust-1 is a 1.6B parameter language model trained entirely from scratch with German as the primary language. Not fine-tuned from an English model. Not adapted from a multilingual base. Built from the ground up with German syntax, compounding, morphology, and reasoning patterns as the default operating regime.

And it runs on your laptop.

Why German-First Matters

When a model is trained primarily on English and then evaluated on German, several things go wrong.

Tokenization is inefficient. Standard tokenizers split German compound words into fragments that make no linguistic sense. “Krankenversicherungsbeitrag” (health insurance contribution) might become five or six tokens. Each extra token costs compute, reduces effective context length, and introduces fragmentation that the model has to work around.

Syntax is mismodeled. German word order is fundamentally different from English. Verb-final clauses, separable prefixes, and complex noun phrases require a model that has seen enough German text to internalize these patterns — not one that treats them as edge cases.

Reasoning patterns differ. The way information is structured, arguments are built, and conclusions are drawn in German text follows different conventions. A model that has primarily learned to reason through English text will produce German that reads like a translation.

Faust-1 addresses all of this by making German the primary training language, representing approximately 90% of the training corpus.

What We Built

Faust-1 is a decoder-only causal language model with 1.6B parameters. The development process covers the full pipeline:

  • Large-scale data collection with synthetic data generation
  • Data cleaning, normalization, and deduplication to reduce contamination and redundancy
  • Pre-training on a predominantly German corpus using a decoder-only language modeling objective
  • Supervised post-training (instruction tuning) using labeled input-output pairs for conversational and task-oriented use
  • Preference-based optimization including Direct Preference Optimization (DPO) to improve response quality and alignment

The result is a conversational model that understands and generates German at a level you would not expect from a model this size.
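The post does not publish the training recipe in detail, but the DPO objective named in the last pipeline step can be sketched for a single preference pair. The log-probabilities and the β value below are illustrative assumptions, not values from Faust-1's training:

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Inputs are summed log-probabilities of the chosen/rejected responses
    under the policy being trained (pi_*) and a frozen reference model (ref_*).
    """
    # Implicit reward margins: how much the policy prefers each response
    # relative to the reference model.
    chosen_margin = pi_chosen - ref_chosen
    rejected_margin = pi_rejected - ref_rejected
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)): small when the policy ranks the chosen
    # response above the rejected one, large otherwise.
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# The loss drops as the policy widens the gap in favour of the chosen response.
good = dpo_loss(pi_chosen=-5.0, pi_rejected=-9.0, ref_chosen=-6.0, ref_rejected=-6.0)
bad = dpo_loss(pi_chosen=-9.0, pi_rejected=-5.0, ref_chosen=-6.0, ref_rejected=-6.0)
```

Minimizing this loss nudges the model toward responses human raters preferred, without training a separate reward model.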

A Custom Tokenizer for German

Tokenization is one of those details that sounds boring but has enormous practical impact. Every token you waste on a badly split word is a token you cannot use for actual content.

We built a custom tokenizer optimized specifically for German morphology and compounding. The difference is measurable:

Tokenizer efficiency comparison — Faust-1 vs. other tokenizers on German text

Lower token counts on German text translate directly into more usable context, lower inference cost, and less fragmentation on compound-heavy inputs. This is not a marginal improvement. On typical German text, Faust-1’s tokenizer produces significantly fewer tokens than tokenizers designed for English-first models.

Faust-1 tokenizer compared to OpenAI tokenizers
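To make the context-length effect concrete, here is a back-of-the-envelope sketch. The tokens-per-word (fertility) values are illustrative assumptions, not measured numbers for Faust-1 or any specific tokenizer:

```python
def usable_words(context_tokens: int, tokens_per_word: float) -> int:
    """Approximate how many words of text fit in a fixed token budget."""
    return int(context_tokens / tokens_per_word)

# Hypothetical fertility on German text: an English-first BPE might average
# ~2.0 tokens per German word, a German-optimized tokenizer closer to ~1.3.
context = 4096
english_first = usable_words(context, 2.0)
german_first = usable_words(context, 1.3)
```

Under these assumed rates, the same 4096-token window holds roughly 2,000 words with an English-first tokenizer but over 3,000 words with a German-optimized one, and the same ratio applies to per-token inference cost.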

Trained on Verified Synthetic Data

A substantial portion of the training signal comes from synthetic data. This is a deliberate choice, not a shortcut.

Real-world data is messy. It contains inconsistencies, labeling errors, and uneven coverage. Synthetic data, when done correctly, solves these problems. But “when done correctly” is the key phrase. Generating training data without verification produces garbage.

Our approach pairs generation with explicit verification and filtering:

  • LLM-as-judge style evaluations
  • Rule-based and programmatic checks
  • Consistency and self-agreement filtering

This allows broad coverage of instruction-following and reasoning patterns while maintaining the quality control that makes the training signal actually useful.

Benchmark Performance

We evaluated Faust-1 on standard German-language benchmarks: ARC_de, GSM8K_de, HellaSwag_de, MMLU_de, and TruthfulQA_de.

German benchmark performance — Faust-1 vs. comparable models in the 1-2B parameter range

The target is best-in-class performance within the 1–2B parameter range for German-focused models. These benchmarks are straightforward to reproduce using standard Hugging Face evaluation pipelines.

Runs on Consumer Hardware

This is where Faust-1 differs most from the “bigger is better” trend in the industry.

Faust-1 is deliberately sized and optimized to run on consumer-grade hardware. No data-center GPUs required. No cloud dependency. No API costs.

Laptop or desktop (CPU or small GPU): Runs on modern CPUs or entry-level GPUs — Apple Silicon, RTX 3060/4060, RX 6600 — using optimized runtimes like GGUF, MLX, or ONNX.

Single-GPU workstation: Efficiently serves interactive workloads on a single consumer GPU with low VRAM requirements compared to larger multilingual models.

On-device and privacy-sensitive setups: Suitable for local assistants, offline document analysis, and private RAG pipelines where data must not leave the machine.

This makes Faust-1 practical for researchers, developers, and small teams who want strong German language performance without cloud dependency or high inference costs.
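A rough weight-memory estimate shows why a 1.6B-parameter model fits on this class of hardware. This counts weights only; activations and the KV cache add overhead on top:

```python
def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Back-of-the-envelope memory footprint of the model weights alone."""
    return params_billion * 1e9 * bytes_per_param / 1e9

fp16 = weight_memory_gb(1.6, 2.0)  # float16: 2 bytes per parameter
q8 = weight_memory_gb(1.6, 1.0)    # 8-bit quantization: ~1 byte per parameter
```

That is roughly 3.2 GB in float16 and about 1.6 GB at 8-bit, which is why the model runs comfortably on an 8 GB GPU or in system RAM on a laptop.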

Quick Start

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "tabularisai/Faust-1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Gib mir eine kurze Einführung in große Sprachmodelle (LLM)."}
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=256,
    temperature=0.6,
    do_sample=True,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Structured output with GuideGen

Faust-1 supports structured output generation via GuideGen, our library for constrained decoding with Pydantic schemas:

pip install git+https://github.com/tabularis-ai/guidegen.git

import guidegen as gg
from pydantic import BaseModel, Field
from typing import Literal

class EmailSummary(BaseModel):
    Absender: str = Field(description="Der Name des Absenders.")
    Betreff: str = Field(description="Worum geht es in der E-Mail? (max 5 Wörter)")
    Zusammenfassung: str = Field(description="Kurze Zusammenfassung (max 2 Sätze).")
    Prioritaet: Literal["hoch", "mittel", "niedrig"] = Field(description="Wie wichtig die E-Mail ist.")

gen = gg.GuideGen(
    "tabularisai/Faust-1",
    use_chat_template=True,
    enable_thinking=False,
)

prompt = "Fasse die folgende E-Mail zusammen: <E-Mail-Text hier>"

summary = gen.generate(prompt, EmailSummary)

This means you can use Faust-1 not just for free-form conversation, but for structured extraction, classification, and any task where you need reliable JSON output.
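Constrained decoding keeps the output schema-shaped, but it is still good practice to validate at the application boundary. A stdlib-only sketch, where the field names mirror the EmailSummary schema above and the sample JSON is invented:

```python
import json

REQUIRED = {"Absender", "Betreff", "Zusammenfassung", "Prioritaet"}
PRIORITIES = {"hoch", "mittel", "niedrig"}

def parse_email_summary(raw: str) -> dict:
    """Parse and sanity-check a JSON object matching the EmailSummary schema."""
    data = json.loads(raw)
    missing = REQUIRED - data.keys()
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if data["Prioritaet"] not in PRIORITIES:
        raise ValueError(f"invalid priority: {data['Prioritaet']}")
    return data

raw = ('{"Absender": "Anna Schmidt", "Betreff": "Meeting am Montag", '
       '"Zusammenfassung": "Das Meeting wird auf Montag verschoben.", '
       '"Prioritaet": "mittel"}')
summary = parse_email_summary(raw)
```

Downstream code can then rely on the parsed dictionary having the expected keys and a valid priority value.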

Deployment options

vLLM (OpenAI-compatible API)

vllm serve tabularisai/Faust-1 --dtype float16

SGLang

python -m sglang.launch_server \
  --model-path tabularisai/Faust-1 \
  --dtype float16

llama.cpp (GGUF, local / on-device)

./llama-cli \
  -m faust_1_q8_0.gguf \
  -p "Erkläre kurz, was ein großes Sprachmodell ist."

The repository includes a prebuilt Q8_0 GGUF file for efficient local inference.

Use Cases

  • German conversational assistants that run locally with full data privacy
  • Research and benchmarking on German NLP tasks
  • Privacy-sensitive deployments where data must not leave the infrastructure
  • On-device and edge experimentation with a model small enough to iterate quickly
  • Structured data extraction from German documents using constrained decoding

What Comes Next

We are actively working on two follow-up variants:

  • Reasoning-focused variant — enhanced chain-of-thought and multi-step reasoning for German
  • Agent-oriented variant — optimized for tool use, function calling, and agentic workflows

Getting Started

  1. Visit the Hugging Face model page
  2. Accept the license terms to get access
  3. Try it on your own German text data

For commercial licensing, enterprise integrations, or questions, contact us at [email protected].

Community

Join our Discord to share feedback, report issues, or discuss use cases:



Faust-1 is developed by Tabularis AI in Tübingen. We build efficient, privacy-preserving AI solutions for the German language.