Configuration
Settings are stored in /usr/local/antmedia/conf/scribe.properties as YAML. You can edit
the file directly or use the web UI.
Changes to configuration can be applied without restarting AntMedia Server using the hot reload
feature (Dashboard → “Reload Configuration” button or POST /rest/scribe/reload). Active
streams continue with their existing configuration; new streams use the reloaded settings.
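The reload endpoint mentioned above can also be triggered from a script. The following is a minimal sketch, not an official client: the base URL `http://localhost:5080` is a hypothetical value, the helper names are illustrative, and any authentication or path prefix your deployment requires is not shown. Only the `/rest/scribe/reload` path and the POST method come from this document.

```python
import urllib.request


def build_reload_request(base_url: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for the reload endpoint.

    base_url is an assumption: point it at wherever your server's REST
    API is reachable, including any path prefix your deployment needs.
    """
    url = base_url.rstrip("/") + "/rest/scribe/reload"
    # An empty body with an explicit method makes this a POST.
    return urllib.request.Request(url, data=b"", method="POST")


def reload_configuration(base_url: str, timeout: float = 5.0) -> int:
    """Send the reload request and return the HTTP status code."""
    request = build_reload_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Example call (hypothetical host and port):
# status = reload_configuration("http://localhost:5080")
```

Building the request separately from sending it makes it easy to inspect the URL and method before touching a live server.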
Supported Providers
Scribe supports multiple transcription providers:
Speechmatics — Cloud-based real-time transcription with multi-language translation
Speaches — Self-hosted OpenAI-compatible server using faster-whisper (recommended for self-hosting)
Example: Speechmatics (Cloud)
licenseKey: "YOUR-LICENSE-KEY"
provider: "speechmatics"
speechmaticsApiKey: "YOUR-SPEECHMATICS-API-KEY"
applications:
  LiveApp:
    speechmaticsLanguage: "en"
    speechmaticsTranslate: "fr,de"
    logTranscriptions: false
    subtitleMaxCharsPerLine: 32
    subtitleWordsPerMinute: 160
    streamNameTemplates:
      "^spanish-.*":
        speechmaticsLanguage: "es"
        speechmaticsTranslate: "en"
      ".*-test$":
        logTranscriptions: true
Example: Speaches (Self-Hosted, Recommended)
licenseKey: "YOUR-LICENSE-KEY"
provider: "speaches"
speachesServerUrl: "ws://localhost:8000"
applications:
  LiveApp:
    speachesModel: "Systran/faster-distil-whisper-small.en"
    speachesLanguage: "en"
    speachesVadThreshold: 0.5
    speachesVadSilenceDurationMs: 300
    logTranscriptions: false
    subtitleMaxCharsPerLine: 32
    subtitleWordsPerMinute: 160
    streamNameTemplates:
      "^multilingual-.*":
        speachesModel: "Systran/faster-whisper-large-v3"
        speachesLanguage: ""
The plugin only activates for applications that appear under applications:. The
application name must match the AntMedia webapp name exactly (case-sensitive).
An application is enabled when it has an entry under applications: and disabled when
that entry is absent. Using the web UI to disable an application removes its entry from the
file.
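The enablement rule described above amounts to a case-sensitive key lookup. A minimal sketch, assuming the YAML has already been parsed into a Python dictionary (the `is_app_enabled` helper and the sample `config` are illustrative, not part of the plugin):

```python
def is_app_enabled(config: dict, app_name: str) -> bool:
    """Return True if app_name has an entry under `applications`.

    The lookup is an exact, case-sensitive match on the application name.
    """
    applications = config.get("applications") or {}
    return app_name in applications


# Sample parsed configuration (illustrative values only).
config = {
    "provider": "speaches",
    "applications": {
        "LiveApp": {"speachesLanguage": "en"},
    },
}

print(is_app_enabled(config, "LiveApp"))  # entry present
print(is_app_enabled(config, "liveapp"))  # different case, no entry
```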
Per-stream settings (streamNameTemplates)
Within an application block you can define streamNameTemplates to apply different settings
to individual streams based on their stream ID. Keys are Java regular expressions matched
against the full stream ID; the first matching pattern wins.
applications:
  LiveApp:
    speechmaticsLanguage: "en"        # default for all streams in this app
    subtitleMaxCharsPerLine: 32
    streamNameTemplates:
      "^spanish-.*":                  # streams whose ID starts with "spanish-"
        speechmaticsLanguage: "es"
        speechmaticsTranslate: "en"
      ".*-test$":                     # streams whose ID ends with "-test"
        logTranscriptions: true
        subtitleMaxCharsPerLine: 48
Any per-application setting (provider-agnostic or provider-specific) can be overridden inside a stream template. Settings absent from the matching template fall back to the application-level value, then to the global value.
| Behaviour | Detail |
|---|---|
| Order matters | Patterns are evaluated top-to-bottom; the first match is used. |
| Full-string match | The regex must match the entire stream ID. Anchors |
| Runtime behaviour | Stream templates are resolved when a stream starts. Changing templates requires a configuration reload; streams already in progress retain their original settings until they are restarted. |
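The matching and fallback rules above can be sketched in a few lines. This is an illustration under stated assumptions, not the plugin's implementation: Python's `re` module stands in for Java regular expressions (the patterns used here behave the same in both), and the `resolve_settings` helper and sample dictionaries are hypothetical.

```python
import re
from collections import ChainMap

# Illustrative global and application-level settings.
GLOBAL_SETTINGS = {"provider": "speechmatics"}
APP_SETTINGS = {
    "speechmaticsLanguage": "en",
    "subtitleMaxCharsPerLine": 32,
    "streamNameTemplates": {
        "^spanish-.*": {"speechmaticsLanguage": "es", "speechmaticsTranslate": "en"},
        ".*-test$": {"logTranscriptions": True, "subtitleMaxCharsPerLine": 48},
    },
}


def resolve_settings(stream_id: str, app: dict, global_settings: dict) -> dict:
    """Resolve the effective settings for one stream.

    Templates are tried in file order; the first pattern that matches the
    whole stream ID wins. Missing keys fall back to the application level,
    then to the global level.
    """
    matched = {}
    for pattern, overrides in app.get("streamNameTemplates", {}).items():
        if re.fullmatch(pattern, stream_id):  # full-string match
            matched = overrides
            break  # first match wins
    app_level = {k: v for k, v in app.items() if k != "streamNameTemplates"}
    return dict(ChainMap(matched, app_level, global_settings))


settings = resolve_settings("spanish-test", APP_SETTINGS, GLOBAL_SETTINGS)
print(settings["speechmaticsLanguage"])  # "es": the first template matched
```

Note that `spanish-test` matches both patterns, but only the first one is applied, so `logTranscriptions` from the second template is not picked up.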
Overridable settings per provider
All per-application settings listed in the tables under "Per-application settings" can appear inside a stream template. Common overrides:
Provider-agnostic: logTranscriptions, subtitleMaxCharsPerLine, subtitleWordsPerMinute
Speechmatics: speechmaticsLanguage, speechmaticsTranslate
Speaches: speachesLanguage, speachesModel, speachesServerUrl,
speachesVadThreshold, speachesVadSilenceDurationMs
Global settings
These settings apply to all applications unless overridden at the application level.
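| Key | Description | Default |
|---|---|---|
| licenseKey | Your Scribe license key. | — |
| provider | Transcription backend: speechmatics or speaches. | |
| speechmaticsApiKey | Speechmatics API key (when provider is speechmatics). | — |
| speachesServerUrl | Speaches base WebSocket URL (when provider is speaches). | |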
Per-application settings
These go under applications.<appName>. Any setting not specified falls back to the global
value, then to the built-in default.
Provider-agnostic settings
These settings work with all providers.
| Key | Description | Default |
|---|---|---|
| logTranscriptions | When | |
| subtitleMaxCharsPerLine | Maximum number of characters per subtitle line before the text is wrapped onto a new cue. | |
| subtitleWordsPerMinute | Reading speed (words per minute) used to calculate the minimum display time for each subtitle cue. Industry standard is 160–180 WPM. | |
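To make the two subtitle settings concrete, here is a minimal sketch of how a line-length limit and a reading speed could translate into wrapped cues and a minimum display time. The exact formula the plugin uses is not given in this document, so the calculation below (word count divided by words per minute) is an assumption; the helper names are illustrative, and the default arguments simply reuse the values from the example configuration.

```python
import textwrap


def wrap_subtitle(text: str, max_chars_per_line: int = 32) -> list[str]:
    """Split text into lines no longer than max_chars_per_line."""
    return textwrap.wrap(text, width=max_chars_per_line)


def min_display_seconds(text: str, words_per_minute: int = 160) -> float:
    """Estimate how long a cue must stay on screen to be read.

    Assumed formula: number of words / reading speed, converted to seconds.
    """
    word_count = len(text.split())
    return word_count * 60.0 / words_per_minute


text = "one two three four five six seven eight"
for cue in wrap_subtitle(text):
    print(f"{min_display_seconds(cue):.2f}s  {cue}")
```

Under this assumption, lowering `subtitleWordsPerMinute` keeps each cue on screen longer, while lowering `subtitleMaxCharsPerLine` produces more, shorter cues.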
Speechmatics-specific settings
These settings apply when provider is set to speechmatics.
| Key | Description | Default |
|---|---|---|
| speechmaticsLanguage | Source language code (ISO 639-1), e.g. en | |
| speechmaticsTranslate | Comma-separated list of target languages for live translation, e.g. fr,de | — |
Speaches-specific settings
These settings apply when provider is set to speaches.
| Key | Description | Default |
|---|---|---|
| speachesServerUrl | Speaches base WebSocket URL (can override global setting per-app). | |
| speachesModel | Hugging Face model ID, e.g. Systran/faster-distil-whisper-small.en | |
| speachesLanguage | Source language code (ISO 639-1), e.g. en | |
| speachesVadThreshold | Voice activity detection sensitivity (0.0–1.0). Higher values require more confident speech detection before triggering transcription. Raising this reduces false positives from music or background noise. | |
| speachesVadSilenceDurationMs | Milliseconds of silence required before the speech buffer is committed for transcription. Lower values produce more frequent, shorter transcriptions. Try 200–400 for fast speakers. | |
Setting Up Speaches
Speaches is an OpenAI API-compatible server for
real-time speech transcription powered by faster-whisper. It runs on GPU and CPU and
downloads models automatically from Hugging Face.
Quick Start (Docker)
# CPU
docker run -p 8000:8000 ghcr.io/speaches-ai/speaches
# NVIDIA GPU
docker run --gpus all -p 8000:8000 ghcr.io/speaches-ai/speaches
Model Selection
Speaches models are specified as Hugging Face model IDs. The server downloads them automatically on first use.
| Model | Speed | Accuracy | Use Case |
|---|---|---|---|
| | Very fast | Good | Recommended (English only) |
| | Fast | Good | Multilingual |
| | Medium | Better | Higher accuracy needed |
| | Slow | High | Maximum accuracy |
| | Medium | High | Fast large model |
VAD Tuning
The two most impactful settings for real-time subtitle quality:
``speachesVadSilenceDurationMs`` — Lower values (200–300 ms) produce more frequent, shorter transcription segments, which reduces subtitle latency for fast speakers.
``speachesVadThreshold`` — Raise to 0.6–0.8 if music or background noise causes spurious transcriptions (e.g. repeated words during non-speech audio).
Note
Whisper models may produce repetitive tokens (e.g. “you you you”) when processing
non-speech audio such as music. Raising speachesVadThreshold reduces this effect.
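The interaction between the two VAD settings can be illustrated with a simplified model. This is a sketch under stated assumptions, not how Speaches implements voice activity detection: it assumes a per-frame speech probability is available, a fixed frame length (`frame_ms`, a hypothetical parameter), and that a segment is committed once the trailing silence reaches the configured duration.

```python
def segment_speech(probs, threshold=0.5, silence_ms=300, frame_ms=100):
    """Group frames into committed speech segments.

    probs: per-frame speech probabilities (0.0 to 1.0).
    Returns a list of (start_frame, end_frame) pairs, end exclusive.
    A frame counts as speech when its probability is >= threshold.
    A segment is committed after silence_ms of continuous non-speech.
    """
    segments = []
    start = None        # first speech frame of the open segment
    last_speech = None  # most recent speech frame
    for i, p in enumerate(probs):
        if p >= threshold:
            if start is None:
                start = i
            last_speech = i
        elif start is not None:
            if (i - last_speech) * frame_ms >= silence_ms:
                segments.append((start, last_speech + 1))
                start = None
    if start is not None:  # flush whatever is still open at the end
        segments.append((start, last_speech + 1))
    return segments


probs = [0.9, 0.9, 0.1, 0.9, 0.1, 0.1, 0.1, 0.9]
print(segment_speech(probs, silence_ms=300))  # fewer, longer segments
print(segment_speech(probs, silence_ms=100))  # more, shorter segments
```

In this model a shorter silence duration splits the same audio into more segments, and a higher threshold causes low-confidence frames (for example, music scored at 0.6) to be ignored entirely.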