Configuration

Settings are stored in /usr/local/antmedia/conf/scribe.properties. Despite the .properties extension, the file uses YAML syntax. You can edit the file directly or use the web UI.

Configuration changes can be applied without restarting Ant Media Server by using hot reload: click the “Reload Configuration” button on the Dashboard or send POST /rest/scribe/reload. Active streams keep the configuration they started with; new streams use the reloaded settings.
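
As a rough illustration of triggering the reload from a script, the sketch below sends the POST request with Python's standard library. The base URL (http://localhost:5080 in the comment) is an assumption, as is the absence of authentication; adjust both to match your deployment.

```python
import urllib.request


def build_reload_request(base_url: str) -> urllib.request.Request:
    """Build (without sending) the POST request for the reload endpoint."""
    # base_url is an assumption: point it at your own server, including any
    # path prefix your deployment puts in front of /rest/scribe/reload.
    url = base_url.rstrip("/") + "/rest/scribe/reload"
    return urllib.request.Request(url, data=b"", method="POST")


def reload_configuration(base_url: str, timeout: float = 10.0) -> int:
    """Send the reload request and return the HTTP status code."""
    request = build_reload_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Example call against a hypothetical local server (performs a network call):
#   reload_configuration("http://localhost:5080")
```

Separating request construction from sending makes it easy to inspect the URL and method before anything goes over the network.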

Supported Providers

Scribe supports multiple transcription providers:

  • Speechmatics — Cloud-based real-time transcription with multi-language translation

  • Speaches — Self-hosted OpenAI-compatible server using faster-whisper (recommended for self-hosting)

Example: Speechmatics (Cloud)

licenseKey: "YOUR-LICENSE-KEY"
provider: "speechmatics"
speechmaticsApiKey: "YOUR-SPEECHMATICS-API-KEY"

applications:
  LiveApp:
    speechmaticsLanguage: "en"
    speechmaticsTranslate: "fr,de"
    logTranscriptions: false
    subtitleMaxCharsPerLine: 32
    subtitleWordsPerMinute: 160
    streamNameTemplates:
      "^spanish-.*":
        speechmaticsLanguage: "es"
        speechmaticsTranslate: "en"
      ".*-test$":
        logTranscriptions: true

Per-stream settings (streamNameTemplates)

Within an application block you can define streamNameTemplates to apply different settings to individual streams based on their stream ID. Keys are Java regular expressions matched against the full stream ID; the first matching pattern wins.

applications:
  LiveApp:
    speechmaticsLanguage: "en"        # default for all streams in this app
    subtitleMaxCharsPerLine: 32
    streamNameTemplates:
      "^spanish-.*":                  # streams whose ID starts with "spanish-"
        speechmaticsLanguage: "es"
        speechmaticsTranslate: "en"
      ".*-test$":                     # streams whose ID ends with "-test"
        logTranscriptions: true
        subtitleMaxCharsPerLine: 48

Any per-application setting (provider-agnostic or provider-specific) can be overridden inside a stream template. Settings absent from the matching template fall back to the application-level value, then to the global value.

| Behaviour | Detail |
|---|---|
| Order matters | Patterns are evaluated top-to-bottom; the first match is used. |
| Full-string match | The regex must match the entire stream ID. Anchors ^ / $ are recommended for clarity but the match is always against the whole string. |
| Runtime behaviour | Stream templates are resolved when a stream starts. Changing templates requires a configuration reload; streams already in progress retain their original settings until they are restarted. |
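
The resolution rules described in this section (first match wins, whole-string matching, fallback from template to application to global) can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not Scribe's actual code: the function name resolve_settings and the dict layout are hypothetical, Python's re.fullmatch stands in for Java's whole-string regex match, and built-in defaults are not modelled.

```python
import re


def resolve_settings(stream_id: str, global_cfg: dict, app_cfg: dict) -> dict:
    """Return the effective settings for one stream.

    Precedence, lowest to highest: global values, application values,
    then the first stream template whose pattern matches the whole ID.
    """
    effective = {k: v for k, v in global_cfg.items() if k != "applications"}
    effective.update(
        {k: v for k, v in app_cfg.items() if k != "streamNameTemplates"}
    )
    # Dicts keep insertion order, so templates are tried top-to-bottom.
    for pattern, overrides in app_cfg.get("streamNameTemplates", {}).items():
        # re.fullmatch is a stand-in for Java's whole-string match (assumption).
        if re.fullmatch(pattern, stream_id):
            effective.update(overrides)
            break  # first match wins; later templates are ignored
    return effective


global_cfg = {"provider": "speechmatics", "logTranscriptions": False}
live_app = {
    "speechmaticsLanguage": "en",
    "subtitleMaxCharsPerLine": 32,
    "streamNameTemplates": {
        "^spanish-.*": {"speechmaticsLanguage": "es", "speechmaticsTranslate": "en"},
        ".*-test$": {"logTranscriptions": True, "subtitleMaxCharsPerLine": 48},
    },
}

# "spanish-test" matches both patterns, but only the first one is applied.
print(resolve_settings("spanish-test", global_cfg, live_app))
print(resolve_settings("demo-test", global_cfg, live_app))
```

Note that a stream named spanish-test keeps subtitleMaxCharsPerLine at 32, because the ".*-test$" template is never reached once "^spanish-.*" has matched.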

Overridable settings per provider

All per-application settings listed in the tables below can appear inside a stream template. Common overrides:

Provider-agnostic: logTranscriptions, subtitleMaxCharsPerLine, subtitleWordsPerMinute

Speechmatics: speechmaticsLanguage, speechmaticsTranslate

Speaches: speachesLanguage, speachesModel, speachesServerUrl, speachesVadThreshold, speachesVadSilenceDurationMs

Global settings

These settings apply to all applications unless overridden at the application level.

| Key | Description | Default |
|---|---|---|
| licenseKey | Your Scribe license key. | |
| provider | Transcription backend: speechmatics or speaches. | speechmatics |
| speechmaticsApiKey | Speechmatics API key (when provider is speechmatics). | |
| speachesServerUrl | Speaches base WebSocket URL (when provider is speaches), e.g. ws://localhost:8000. | ws://localhost:8000 |

Per-application settings

These go under applications.<appName>. Any setting not specified falls back to the global value, then to the built-in default.

Provider-agnostic settings

These settings work with all providers.

| Key | Description | Default |
|---|---|---|
| logTranscriptions | When true, every transcript line is written to the Ant Media Server application log. | false |
| subtitleMaxCharsPerLine | Maximum number of characters per subtitle line before the text is wrapped onto a new cue. | 32 |
| subtitleWordsPerMinute | Reading speed (words per minute) used to calculate the minimum display time for each subtitle cue. Industry standard is 160–180 WPM. | 160 |
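
To give a feel for how the two subtitle settings interact, here is a minimal sketch. The exact formula Scribe uses is not documented here, so the calculation below (word count divided by reading speed) and both function names are assumptions made for illustration only.

```python
import textwrap


def wrap_cue_text(text: str, max_chars: int = 32) -> list[str]:
    """Split text into pieces of at most max_chars, preferring word boundaries."""
    return textwrap.wrap(text, width=max_chars)


def min_display_seconds(text: str, words_per_minute: int = 160) -> float:
    """Minimum time a cue stays on screen at the given reading speed."""
    word_count = len(text.split())
    return word_count * 60 / words_per_minute


# Each wrapped piece is treated as its own cue in this sketch.
for line in wrap_cue_text("The quick brown fox jumps over the lazy dog"):
    print(f"{min_display_seconds(line):.2f}s  {line}")
```

Under this assumption, lowering subtitleWordsPerMinute keeps each cue on screen longer, while lowering subtitleMaxCharsPerLine produces more, shorter cues.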

Speechmatics-specific settings

These settings apply when provider is set to speechmatics.

| Key | Description | Default |
|---|---|---|
| speechmaticsLanguage | Source language code (ISO 639-1), e.g. en, de, nl, fr. | en |
| speechmaticsTranslate | Comma-separated list of target languages for live translation, e.g. fr,de. A separate subtitle track is created for each language. | |

Speaches-specific settings

These settings apply when provider is set to speaches.

| Key | Description | Default |
|---|---|---|
| speachesServerUrl | Speaches base WebSocket URL (can override global setting per-app). | ws://localhost:8000 |
| speachesModel | Hugging Face model ID, e.g. Systran/faster-distil-whisper-small.en or Systran/faster-whisper-large-v3. The server downloads the model automatically on first use. | Systran/faster-whisper-small |
| speachesLanguage | Source language code (ISO 639-1), e.g. en, es, fr. Leave blank for automatic language detection. | en |
| speachesVadThreshold | Voice activity detection sensitivity (0.0–1.0). Higher values require more confident speech detection before triggering transcription. Raising this reduces false positives from music or background noise. | 0.5 |
| speachesVadSilenceDurationMs | Milliseconds of silence required before the speech buffer is committed for transcription. Lower values produce more frequent, shorter transcriptions. Try 200–400 for fast speakers. | 300 |

Setting Up Speaches

Speaches is an OpenAI API-compatible server for real-time speech transcription powered by faster-whisper. It runs on GPU and CPU and downloads models automatically from Hugging Face.

Quick Start (Docker)

# CPU
docker run -p 8000:8000 ghcr.io/speaches-ai/speaches

# NVIDIA GPU
docker run --gpus all -p 8000:8000 ghcr.io/speaches-ai/speaches
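
Once the container is running, it can be useful to confirm that the server is reachable before pointing Scribe at it. The sketch below derives an HTTP URL from the WebSocket URL used in speachesServerUrl and asks the server for its model list. It assumes the server exposes the OpenAI-style GET /v1/models endpoint; the helper names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def http_base_url(ws_url: str) -> str:
    """Map a ws:// or wss:// URL to the matching http:// or https:// URL."""
    parts = urlsplit(ws_url)
    scheme = {"ws": "http", "wss": "https"}.get(parts.scheme, parts.scheme)
    return urlunsplit((scheme, parts.netloc, parts.path.rstrip("/"), "", ""))


def list_models(ws_url: str = "ws://localhost:8000", timeout: float = 10.0) -> list:
    """Return the model IDs reported by the server (performs a network call)."""
    # Assumption: the server answers GET /v1/models in the OpenAI list format.
    url = http_base_url(ws_url) + "/v1/models"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return [model["id"] for model in payload.get("data", [])]
```

If list_models() raises a connection error, the container is probably not listening on the expected port yet.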

Model Selection

Speaches models are specified as Hugging Face model IDs. The server downloads them automatically on first use.

| Model | Speed | Accuracy | Use Case |
|---|---|---|---|
| Systran/faster-distil-whisper-small.en | Very fast | Good | Recommended; English only |
| Systran/faster-whisper-small | Fast | Good | Multilingual |
| Systran/faster-whisper-medium | Medium | Better | Higher accuracy needed |
| Systran/faster-whisper-large-v3 | Slow | High | Maximum accuracy |
| deepdml/faster-whisper-large-v3-turbo-ct2 | Medium | High | Fast large model |

VAD Tuning

The two most impactful settings for real-time subtitle quality:

  • speachesVadSilenceDurationMs: Lower values (200–300 ms) produce more frequent, shorter transcription segments, which reduces subtitle latency for fast speakers.

  • speachesVadThreshold: Raise to 0.6–0.8 if music or background noise causes spurious transcriptions (e.g. repeated words during non-speech audio).

Note

Whisper models may produce repetitive tokens (e.g. “you you you”) when processing non-speech audio such as music. Raising speachesVadThreshold reduces this effect.
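
The effect of the two VAD settings can be illustrated with a toy simulation. The sketch below is not the Speaches implementation: it assumes audio arrives in fixed 100 ms frames, each with a speech probability, and that a buffer is committed once enough consecutive non-speech frames have accumulated. The function name and the sample probabilities are invented for the example.

```python
def segment_speech(probs, frame_ms=100, threshold=0.5, silence_ms=300):
    """Return (start_ms, end_ms) spans that would be committed for transcription."""
    segments = []
    start = None          # start time of the current speech buffer, if any
    last_speech_end = 0   # end time of the most recent speech frame
    silence = 0           # silence accumulated since the last speech frame
    for i, prob in enumerate(probs):
        t = i * frame_ms
        if prob >= threshold:
            if start is None:
                start = t
            last_speech_end = t + frame_ms
            silence = 0
        elif start is not None:
            silence += frame_ms
            if silence >= silence_ms:
                # Enough silence: commit the buffer and wait for new speech.
                segments.append((start, last_speech_end))
                start = None
                silence = 0
    if start is not None:
        segments.append((start, last_speech_end))  # flush at end of input
    return segments


# Hypothetical per-frame speech probabilities; 0.55 mimics faint background noise.
probs = [0.9, 0.9, 0.1, 0.1, 0.9, 0.1, 0.1, 0.1, 0.55, 0.1, 0.1, 0.1]

print(segment_speech(probs, threshold=0.5, silence_ms=300))
print(segment_speech(probs, threshold=0.5, silence_ms=200))
print(segment_speech(probs, threshold=0.6, silence_ms=300))
```

In this toy model, shortening the silence window splits the first utterance into two shorter segments, and raising the threshold drops the low-confidence frame that would otherwise be sent for transcription.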