Configuration

Settings are stored as YAML in `/usr/local/antmedia/conf/scribe.properties` (despite the `.properties` extension). You can edit the file directly or use the web UI.

Configuration changes can be applied without restarting AntMedia Server by using hot reload (Dashboard → “Reload Configuration” button, or `POST /rest/scribe/reload`). Active streams keep the configuration they started with; streams started after the reload use the new settings.
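A scripted reload might look like the sketch below, written with Python's standard library. Only the path `/rest/scribe/reload` and the `POST` method come from this page; the base URL `http://localhost:5080`, the absence of authentication, and the helper names `build_reload_request` and `reload_configuration` are assumptions for illustration. Adjust them to your deployment.

```python
import urllib.request

# Assumption: base URL of the AntMedia Server. Adjust host, port, any path
# prefix, and authentication to match your installation.
BASE_URL = "http://localhost:5080"


def build_reload_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request to the Scribe reload endpoint."""
    # Hypothetical helper: only the path /rest/scribe/reload comes from the docs.
    url = base_url.rstrip("/") + "/rest/scribe/reload"
    return urllib.request.Request(url, data=b"", method="POST")


def reload_configuration(base_url: str = BASE_URL, timeout: float = 10.0) -> int:
    """Send the reload request and return the HTTP status code."""
    with urllib.request.urlopen(build_reload_request(base_url), timeout=timeout) as resp:
        return resp.status
```

Calling `reload_configuration()` sends the request; streams that are already running keep their existing settings.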

Supported Providers

Scribe supports multiple transcription providers:

Speechmatics — Cloud-based real-time transcription with multi-language translation
Speaches — Self-hosted OpenAI-compatible server using faster-whisper

Example: Speechmatics (Cloud)

```yaml
licenseKey: "YOUR-LICENSE-KEY"
provider: "speechmatics"
speechmaticsApiKey: "YOUR-SPEECHMATICS-API-KEY"

applications:
  LiveApp:
    speechmaticsLanguage: "en"
    speechmaticsTranslate: "fr,de"   # a separate subtitle track per target language
    logTranscriptions: false
    subtitleMaxCharsPerLine: 32
    subtitleWordsPerMinute: 160
    streamNameTemplates:
      "^spanish-.*":                 # stream IDs starting with "spanish-"
        speechmaticsLanguage: "es"
        speechmaticsTranslate: "en"
      ".*-test$":                    # stream IDs ending with "-test"
        logTranscriptions: true
```

Example: Speaches (Self-Hosted)

```yaml
licenseKey: "YOUR-LICENSE-KEY"
provider: "speaches"
speachesServerUrl: "ws://localhost:8000"

applications:
  LiveApp:
    speachesModel: "Systran/faster-distil-whisper-small.en"   # English-only model
    speachesLanguage: "en"
    speachesVadThreshold: 0.5
    speachesVadSilenceDurationMs: 300
    logTranscriptions: false
    subtitleMaxCharsPerLine: 32
    subtitleWordsPerMinute: 160
    streamNameTemplates:
      "^multilingual-.*":
        speachesModel: "Systran/faster-whisper-large-v3"     # multilingual model
        speachesLanguage: ""                                 # blank = automatic language detection
```

The plugin only activates for applications listed under `applications:`. Each key must match the AntMedia webapp name exactly; matching is case-sensitive, so `liveapp` does not match `LiveApp`.

An application is therefore enabled only while it has an entry under `applications:`. Disabling an application in the web UI removes its entry from the file.
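The activation rule amounts to an exact, case-sensitive key lookup. The sketch below assumes the file has already been parsed into Python dictionaries (the YAML parsing step is omitted); `is_enabled` is a hypothetical helper, not part of the plugin.

```python
from typing import Any, Mapping


def is_enabled(config: Mapping[str, Any], app_name: str) -> bool:
    """Return True if app_name has an entry under `applications:`.

    Dictionary membership is an exact, case-sensitive comparison, mirroring
    the rule that the entry must match the webapp name exactly.
    """
    applications = config.get("applications") or {}  # tolerate a missing or empty block
    return app_name in applications


# Parsed form of a minimal configuration.
config = {
    "licenseKey": "YOUR-LICENSE-KEY",
    "provider": "speechmatics",
    "applications": {"LiveApp": {"speechmaticsLanguage": "en"}},
}

print(is_enabled(config, "LiveApp"))   # True
print(is_enabled(config, "liveapp"))   # False: names are case-sensitive
print(is_enabled(config, "OtherApp"))  # False: no entry, so the plugin stays inactive
```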

Global settings

These settings apply to all applications unless overridden at the application level.

| Key | Description | Default |
|---|---|---|
| `licenseKey` | Your Scribe license key. | (none) |
| `provider` | Transcription backend: `speechmatics` or `speaches`. | `speechmatics` |
| `speechmaticsApiKey` | Speechmatics API key (used when `provider` is `speechmatics`). | (none) |
| `speachesServerUrl` | Speaches base WebSocket URL (used when `provider` is `speaches`), e.g. `ws://localhost:8000`. | `ws://localhost:8000` |

Per-application settings

These go under `applications.<appName>`. Any setting not specified for an application falls back to the global value, then to the built-in default.
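One way to picture this fallback chain is Python's `collections.ChainMap`, which returns the value from the first mapping that contains a key. The sketch assumes the configuration has already been parsed into dictionaries; `app_settings` and the hostname `gpu-box` are hypothetical, and `DEFAULTS` holds only a few of the built-in defaults from the tables on this page.

```python
from collections import ChainMap
from typing import Any, Mapping

# A few built-in defaults copied from the tables on this page.
DEFAULTS = {
    "logTranscriptions": False,
    "subtitleMaxCharsPerLine": 32,
    "subtitleWordsPerMinute": 160,
    "speachesServerUrl": "ws://localhost:8000",
}


def app_settings(config: Mapping[str, Any], app_name: str) -> ChainMap:
    """Layer application -> global -> built-in default; the first layer with a key wins."""
    app = dict((config.get("applications") or {}).get(app_name) or {})
    app.pop("streamNameTemplates", None)  # per-stream templates are resolved separately
    global_settings = {k: v for k, v in config.items() if k != "applications"}
    return ChainMap(app, global_settings, DEFAULTS)


config = {
    "provider": "speaches",
    "speachesServerUrl": "ws://localhost:8000",
    "applications": {
        "LiveApp": {"speachesServerUrl": "ws://gpu-box:8000", "subtitleMaxCharsPerLine": 40},
    },
}
settings = app_settings(config, "LiveApp")
print(settings["speachesServerUrl"])       # ws://gpu-box:8000  (application level)
print(settings["provider"])                # speaches           (global)
print(settings["subtitleWordsPerMinute"])  # 160                (built-in default)
```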

Provider-agnostic settings

These settings work with all providers.

| Key | Description | Default |
|---|---|---|
| `logTranscriptions` | When `true`, every transcript line is written to the AntMedia application log. | `false` |
| `subtitleMaxCharsPerLine` | Maximum number of characters per subtitle line; longer text is wrapped onto a new cue. | `32` |
| `subtitleWordsPerMinute` | Reading speed, in words per minute, used to calculate the minimum display time of each subtitle cue. Common subtitling guidelines use 160–180 WPM. | `160` |
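The two subtitle settings can be illustrated with a little arithmetic. At `subtitleWordsPerMinute: 160`, a reader needs 60 / 160 = 0.375 s per word, so a 9-word cue should stay on screen for at least 9 × 0.375 = 3.375 s. The sketch below applies that formula together with a simple greedy word wrap at `subtitleMaxCharsPerLine`. It illustrates the idea under these assumptions and is not the plugin's actual cue-building algorithm; for example, how the plugin handles a single word longer than the limit is not documented.

```python
from typing import List


def min_display_seconds(text: str, words_per_minute: int = 160) -> float:
    """Minimum on-screen time for a cue: word count divided by reading speed."""
    return len(text.split()) * 60.0 / words_per_minute


def wrap_into_cues(text: str, max_chars: int = 32) -> List[str]:
    """Greedily pack whole words into chunks of at most max_chars characters.

    A single word longer than max_chars is kept whole in its own chunk.
    """
    cues: List[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if not current or len(candidate) <= max_chars:
            current = candidate
        else:
            cues.append(current)
            current = word
    if current:
        cues.append(current)
    return cues


print(min_display_seconds("the quick brown fox jumps over the lazy dog"))  # 3.375
for cue in wrap_into_cues("the quick brown fox jumps over the lazy dog and keeps running"):
    print(f"{min_display_seconds(cue):.3f}s  {cue}")
# 2.250s  the quick brown fox jumps over
# 2.250s  the lazy dog and keeps running
```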

Speechmatics-specific settings

These settings apply when `provider` is set to `speechmatics`.

| Key | Description | Default |
|---|---|---|
| `speechmaticsLanguage` | Source language code (ISO 639-1), e.g. `en`, `de`, `nl`, `fr`. | `en` |
| `speechmaticsTranslate` | Comma-separated list of target languages for live translation, e.g. `fr,de`. A separate subtitle track is created for each target language. | (none) |
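Each entry in the comma-separated value corresponds to one subtitle track. The sketch below parses the value the way a reader would expect; `target_languages` is a hypothetical helper, and its tolerance for spaces (e.g. `"fr, de"`) is an assumption rather than documented plugin behaviour. When in doubt, write the list without spaces, as in the examples.

```python
from typing import List, Optional


def target_languages(value: Optional[str]) -> List[str]:
    """Split a speechmaticsTranslate value such as "fr,de" into language codes.

    Whitespace around codes and empty items are dropped (an assumption about
    how tolerant the plugin's own parser is).
    """
    if not value:
        return []
    return [code.strip() for code in value.split(",") if code.strip()]


for lang in target_languages("fr,de"):
    print(f"subtitle track for target language: {lang}")
# subtitle track for target language: fr
# subtitle track for target language: de
```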

Speaches-specific settings

These settings apply when `provider` is set to `speaches`.

| Key | Description | Default |
|---|---|---|
| `speachesServerUrl` | Speaches base WebSocket URL; overrides the global `speachesServerUrl` for this application. | `ws://localhost:8000` |
| `speachesModel` | Hugging Face model ID, e.g. `Systran/faster-distil-whisper-small.en` or `Systran/faster-whisper-large-v3`. The server downloads the model automatically on first use. | `Systran/faster-whisper-small` |
| `speachesLanguage` | Source language code (ISO 639-1), e.g. `en`, `es`, `fr`. Set to an empty string (`""`) for automatic language detection, which requires a multilingual model rather than an English-only `.en` model. | `en` |
| `speachesVadThreshold` | Voice activity detection sensitivity (0.0–1.0). Higher values require more confident speech detection before transcription is triggered, which reduces false positives from music or background noise. | `0.5` |
| `speachesVadSilenceDurationMs` | Milliseconds of silence required before the speech buffer is committed for transcription. Lower values produce more frequent, shorter transcriptions; try 200–400 ms for fast speakers. | `300` |
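The interaction between the two VAD settings can be pictured with a simplified model: audio is split into short frames, each with a speech probability; frames at or above `speachesVadThreshold` count as speech, and a segment is committed once `speachesVadSilenceDurationMs` of consecutive non-speech follows it. The sketch below is a hypothetical illustration of that idea with made-up probabilities and an assumed 100 ms frame size; it is not the VAD implementation used by Speaches.

```python
from typing import List, Optional, Tuple

FRAME_MS = 100  # assumption: each probability value covers 100 ms of audio


def segment(probs: List[float], threshold: float = 0.5,
            silence_ms: int = 300) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) speech segments from per-frame speech probabilities.

    A frame counts as speech when its probability is >= threshold. An open
    segment is committed once silence_ms of consecutive non-speech follows it.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None  # start of the open segment, if any
    last_speech = 0              # end time of the most recent speech frame
    silence = 0                  # non-speech time since that frame
    for i, p in enumerate(probs):
        t = i * FRAME_MS
        if p >= threshold:
            if start is None:
                start = t
            last_speech = t + FRAME_MS
            silence = 0
        elif start is not None:
            silence += FRAME_MS
            if silence >= silence_ms:
                segments.append((start, last_speech))
                start, silence = None, 0
    if start is not None:  # flush a segment still open when the audio ends
        segments.append((start, last_speech))
    return segments


# Made-up probabilities: speech, a 200 ms pause, speech, silence, then music scoring 0.55-0.6.
probs = [0.9, 0.9, 0.2, 0.2, 0.9, 0.9, 0.1, 0.1, 0.1, 0.6, 0.55, 0.1, 0.1, 0.1]
print(segment(probs))                  # [(0, 600), (900, 1100)]
print(segment(probs, silence_ms=200))  # [(0, 200), (400, 600), (900, 1100)]
print(segment(probs, threshold=0.7))   # [(0, 600)]
```

In this toy model, lowering the silence duration splits the first utterance at the 200 ms pause, so shorter segments are committed sooner; raising the threshold to 0.7 drops the low-confidence "music" frames that would otherwise be sent for transcription.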

Per-stream settings

Within an application block you can define `streamNameTemplates` to apply different settings to individual streams based on their stream ID. Keys are Java regular expressions matched against the full stream ID; the first matching pattern wins.

```yaml
applications:
  LiveApp:
    speechmaticsLanguage: "en"        # default for all streams in this app
    subtitleMaxCharsPerLine: 32
    streamNameTemplates:
      "^spanish-.*":                  # streams whose ID starts with "spanish-"
        speechmaticsLanguage: "es"
        speechmaticsTranslate: "en"
      ".*-test$":                     # streams whose ID ends with "-test"
        logTranscriptions: true
        subtitleMaxCharsPerLine: 48
```

Any per-application setting, provider-agnostic or provider-specific, can be overridden inside a stream template. Settings absent from the matching template fall back to the application-level value, then to the global value, then to the built-in default.

| Behaviour | Detail |
|---|---|
| Order matters | Patterns are evaluated top-to-bottom; the first match is used. |
| Full-string match | The regex must match the entire stream ID. Anchors `^` and `$` are optional but recommended for clarity; the match is always against the whole string. |
| Runtime behaviour | Stream templates are resolved when a stream starts. Changing templates requires a configuration reload; streams already in progress keep their original settings until they are restarted. |
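The matching rules above can be sketched in a few lines. This is a Python approximation, not the plugin's code: `re.fullmatch` stands in for Java's whole-string matching (as in `Matcher.matches()`), and the two regex flavours agree on simple patterns like these but differ in some advanced syntax. It assumes the parsed templates keep the order in which they appear in the file (Python dicts preserve insertion order). `find_template` and `stream_settings` are hypothetical helpers.

```python
import re
from collections import ChainMap
from typing import Any, Mapping, Optional


def find_template(templates: Mapping[str, Any], stream_id: str) -> Optional[Mapping[str, Any]]:
    """Return the overrides of the first pattern that matches the whole stream ID."""
    for pattern, overrides in templates.items():  # file order, top to bottom
        if re.fullmatch(pattern, stream_id):
            return overrides or {}
    return None


def stream_settings(app: Mapping[str, Any], global_settings: Mapping[str, Any],
                    defaults: Mapping[str, Any], stream_id: str) -> ChainMap:
    """Layer matching template -> application -> global -> built-in default."""
    template = find_template(app.get("streamNameTemplates") or {}, stream_id) or {}
    app_level = {k: v for k, v in app.items() if k != "streamNameTemplates"}
    return ChainMap(dict(template), app_level, dict(global_settings), dict(defaults))


# The LiveApp block from the example above, already parsed into dictionaries.
live_app = {
    "speechmaticsLanguage": "en",
    "subtitleMaxCharsPerLine": 32,
    "streamNameTemplates": {
        "^spanish-.*": {"speechmaticsLanguage": "es", "speechmaticsTranslate": "en"},
        ".*-test$": {"logTranscriptions": True, "subtitleMaxCharsPerLine": 48},
    },
}
defaults = {"logTranscriptions": False, "subtitleMaxCharsPerLine": 32}

s = stream_settings(live_app, {}, defaults, "spanish-news")
print(s["speechmaticsLanguage"], s["subtitleMaxCharsPerLine"])  # es 32
s = stream_settings(live_app, {}, defaults, "spanish-test")
print(s["speechmaticsLanguage"], s["logTranscriptions"])       # es False (first match wins)
print(find_template({"spanish": {}}, "spanish-news"))          # None (must match the whole ID)
```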

Overridable settings per provider

All per-application settings listed in the tables above can appear inside a stream template. Common overrides:

Provider-agnostic: `logTranscriptions`, `subtitleMaxCharsPerLine`, `subtitleWordsPerMinute`

Speechmatics: `speechmaticsLanguage`, `speechmaticsTranslate`

Speaches: `speachesLanguage`, `speachesModel`, `speachesServerUrl`, `speachesVadThreshold`, `speachesVadSilenceDurationMs`
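Because a template can hold keys for either provider, a stray key (for example a `speaches*` setting in a Speechmatics deployment) is easy to miss. The sketch below groups the keys exactly as listed above and reports template keys that do not belong to the active provider. `check_templates` is a hypothetical lint helper; how the plugin itself treats such keys is not documented here.

```python
from typing import Any, List, Mapping

# Keys grouped as in the lists above.
AGNOSTIC = {"logTranscriptions", "subtitleMaxCharsPerLine", "subtitleWordsPerMinute"}
PROVIDER_KEYS = {
    "speechmatics": {"speechmaticsLanguage", "speechmaticsTranslate"},
    "speaches": {"speachesLanguage", "speachesModel", "speachesServerUrl",
                 "speachesVadThreshold", "speachesVadSilenceDurationMs"},
}


def check_templates(provider: str, app: Mapping[str, Any]) -> List[str]:
    """List template keys that are neither provider-agnostic nor documented for `provider`."""
    allowed = AGNOSTIC | PROVIDER_KEYS.get(provider, set())
    problems = []
    for pattern, overrides in (app.get("streamNameTemplates") or {}).items():
        for key in (overrides or {}):
            if key not in allowed:
                problems.append(
                    f"{pattern!r}: {key} is not a documented setting for provider {provider!r}")
    return problems


app = {"streamNameTemplates": {"^spanish-.*": {"speachesLanguage": "es", "logTranscriptions": True}}}
for problem in check_templates("speechmatics", app):
    print(problem)
# '^spanish-.*': speachesLanguage is not a documented setting for provider 'speechmatics'
```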

Setting Up Speaches

Speaches is an OpenAI API-compatible server for real-time speech transcription powered by faster-whisper. It runs on GPU and CPU and downloads models automatically from Hugging Face.

Quick Start (Docker)

```shell
# CPU
docker run -p 8000:8000 ghcr.io/speaches-ai/speaches

# NVIDIA GPU
docker run --gpus all -p 8000:8000 ghcr.io/speaches-ai/speaches
```

Model Selection

Speaches models are specified as Hugging Face model IDs. The server downloads them automatically on first use.

| Model | Speed | Accuracy | Use case |
|---|---|---|---|
| `Systran/faster-distil-whisper-small.en` | Very fast | Good | English only |
| `Systran/faster-whisper-small` | Fast | Good | Multilingual |
| `Systran/faster-whisper-medium` | Medium | Better | Higher accuracy needed |
| `Systran/faster-whisper-large-v3` | Slow | High | Maximum accuracy |
| `deepdml/faster-whisper-large-v3-turbo-ct2` | Medium | High | Faster large model |
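The `.en` model in the table is English-only, so it is a poor fit for non-English sources or for automatic detection (`speachesLanguage: ""`). A configuration check along these lines could catch such mismatches before a stream starts. `model_language_warning` is a hypothetical helper, and it relies on the naming convention of the models listed here (an `.en` suffix marks English-only weights), which may not hold for other models.

```python
from typing import Optional


def model_language_warning(model: str, language: str) -> Optional[str]:
    """Warn when an English-only model is paired with a non-English language setting.

    Assumes English-only models are marked by an ".en" suffix, as in
    Systran/faster-distil-whisper-small.en.
    """
    if not model.endswith(".en"):
        return None  # multilingual model: any language code, or "" for auto-detection
    if language == "en":
        return None
    wanted = repr(language) if language else "automatic detection"
    return f"{model} is English-only but speachesLanguage requests {wanted}"


print(model_language_warning("Systran/faster-distil-whisper-small.en", "en"))  # None
print(model_language_warning("Systran/faster-whisper-large-v3", ""))           # None
print(model_language_warning("Systran/faster-distil-whisper-small.en", ""))
# Systran/faster-distil-whisper-small.en is English-only but speachesLanguage requests automatic detection
```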

VAD Tuning

Two settings have the most impact on real-time subtitle quality:

`speachesVadSilenceDurationMs`: lower values (200–300 ms) produce more frequent, shorter transcription segments, which reduces subtitle latency for fast speakers.
`speachesVadThreshold`: raise to 0.6–0.8 if music or background noise causes spurious transcriptions (e.g. repeated words during non-speech audio).

Note

Whisper models may produce repetitive tokens (e.g. “you you you”) when processing non-speech audio such as music. Raising speachesVadThreshold reduces this effect.