.. _Configuration:

Configuration
=============

Settings are stored in ``/usr/local/antmedia/conf/scribe.properties`` as YAML. You can edit the file directly or use the :ref:`web UI `.

Changes to configuration can be applied without restarting AntMedia Server using the hot reload feature (Dashboard → "Reload Configuration" button or ``POST /rest/scribe/reload``). Active streams continue with their existing configuration; new streams use the reloaded settings.

Supported Providers
-------------------

Scribe supports multiple transcription providers:

* **Speechmatics** — Cloud-based real-time transcription with multi-language translation
* **Speaches** — Self-hosted OpenAI-compatible server using faster-whisper (recommended for self-hosting)

Example: Speechmatics (Cloud)
------------------------------

.. code-block:: yaml

   licenseKey: "YOUR-LICENSE-KEY"
   provider: "speechmatics"
   speechmaticsApiKey: "YOUR-SPEECHMATICS-API-KEY"

   applications:
     LiveApp:
       speechmaticsLanguage: "en"
       speechmaticsTranslate: "fr,de"
       logTranscriptions: false
       subtitleMaxCharsPerLine: 32
       subtitleWordsPerMinute: 160
       streamNameTemplates:
         "^spanish-.*":
           speechmaticsLanguage: "es"
           speechmaticsTranslate: "en"
         ".*-test$":
           logTranscriptions: true

Example: Speaches (Self-Hosted, Recommended)
--------------------------------------------

.. code-block:: yaml

   licenseKey: "YOUR-LICENSE-KEY"
   provider: "speaches"
   speachesServerUrl: "ws://localhost:8000"

   applications:
     LiveApp:
       speachesModel: "Systran/faster-distil-whisper-small.en"
       speachesLanguage: "en"
       speachesVadThreshold: 0.5
       speachesVadSilenceDurationMs: 300
       logTranscriptions: false
       subtitleMaxCharsPerLine: 32
       subtitleWordsPerMinute: 160
       streamNameTemplates:
         "^multilingual-.*":
           speachesModel: "Systran/faster-whisper-large-v3"
           speachesLanguage: ""

The plugin only activates for applications that appear under ``applications:``. The application name must match the AntMedia webapp name exactly (case-sensitive).
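
To illustrate the matching rule, the sketch below enables Scribe for two applications. The second name, ``WebRTCAppEE``, is only an assumed example of another webapp on the server, and the language values are placeholders; substitute the names and settings of your own applications.

.. code-block:: yaml

   applications:
     LiveApp:                        # must match the webapp name exactly
       speechmaticsLanguage: "en"
     WebRTCAppEE:                    # assumed second webapp, for illustration only
       speechmaticsLanguage: "de"

Because the match is case-sensitive, an entry spelled ``liveapp`` would not match a webapp named ``LiveApp``, and the plugin would stay inactive for that application.
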
An application is enabled when it has an entry under ``applications:`` and disabled when that entry is absent. Using the web UI to disable an application removes its entry from the file.

Per-stream settings (``streamNameTemplates``)
---------------------------------------------

Within an application block you can define ``streamNameTemplates`` to apply different settings to individual streams based on their stream ID. Keys are Java regular expressions matched against the full stream ID; the **first** matching pattern wins.

.. code-block:: yaml

   applications:
     LiveApp:
       speechmaticsLanguage: "en"        # default for all streams in this app
       subtitleMaxCharsPerLine: 32
       streamNameTemplates:
         "^spanish-.*":                  # streams whose ID starts with "spanish-"
           speechmaticsLanguage: "es"
           speechmaticsTranslate: "en"
         ".*-test$":                     # streams whose ID ends with "-test"
           logTranscriptions: true
           subtitleMaxCharsPerLine: 48

Any per-application setting (provider-agnostic or provider-specific) can be overridden inside a stream template. Settings absent from the matching template fall back to the application-level value, then to the global value.

.. list-table::
   :widths: 30 70
   :header-rows: 1

   * - Behaviour
     - Detail
   * - Order matters
     - Patterns are evaluated top-to-bottom; the first match is used.
   * - Full-string match
     - The regex must match the **entire** stream ID. Anchors ``^`` / ``$`` are recommended for clarity but the match is always against the whole string.
   * - Runtime behaviour
     - Stream templates are resolved when a stream starts. Changing templates requires a configuration reload; streams already in progress retain their original settings until they are restarted.

Overridable settings per provider
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

All per-application settings listed in the tables below can appear inside a stream template.
Common overrides:

*Provider-agnostic:* ``logTranscriptions``, ``subtitleMaxCharsPerLine``, ``subtitleWordsPerMinute``

*Speechmatics:* ``speechmaticsLanguage``, ``speechmaticsTranslate``

*Speaches:* ``speachesLanguage``, ``speachesModel``, ``speachesServerUrl``, ``speachesVadThreshold``, ``speachesVadSilenceDurationMs``

Global settings
---------------

These settings apply to all applications unless overridden at the application level.

.. list-table::
   :widths: 30 55 15
   :header-rows: 1

   * - Key
     - Description
     - Default
   * - ``licenseKey``
     - Your Scribe license key.
     - —
   * - ``provider``
     - Transcription backend: ``speechmatics`` or ``speaches``.
     - ``speechmatics``
   * - ``speechmaticsApiKey``
     - Speechmatics API key (when provider is ``speechmatics``).
     - —
   * - ``speachesServerUrl``
     - Speaches base WebSocket URL (when provider is ``speaches``), e.g. ``ws://localhost:8000``.
     - ``ws://localhost:8000``

Per-application settings
------------------------

These go under ``applications.``. Any setting not specified falls back to the global value, then to the built-in default.

Provider-agnostic settings
^^^^^^^^^^^^^^^^^^^^^^^^^^

These settings work with all providers.

.. list-table::
   :widths: 30 55 15
   :header-rows: 1

   * - Key
     - Description
     - Default
   * - ``logTranscriptions``
     - When ``true``, every transcript line is written to the AntMedia application log.
     - ``false``
   * - ``subtitleMaxCharsPerLine``
     - Maximum number of characters per subtitle line before the text is wrapped onto a new cue.
     - ``32``
   * - ``subtitleWordsPerMinute``
     - Reading speed (words per minute) used to calculate the minimum display time for each subtitle cue. Industry standard is 160–180 WPM.
     - ``160``

Speechmatics-specific settings
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

These settings apply when ``provider`` is set to ``speechmatics``.

.. list-table::
   :widths: 30 55 15
   :header-rows: 1

   * - Key
     - Description
     - Default
   * - ``speechmaticsLanguage``
     - Source language code (`ISO 639-1 `_), e.g. ``en``, ``de``, ``nl``, ``fr``.
     - ``en``
   * - ``speechmaticsTranslate``
     - Comma-separated list of target languages for live translation, e.g. ``fr,de``. A separate subtitle track is created for each language.
     - —

Speaches-specific settings
^^^^^^^^^^^^^^^^^^^^^^^^^^

These settings apply when ``provider`` is set to ``speaches``.

.. list-table::
   :widths: 35 50 15
   :header-rows: 1

   * - Key
     - Description
     - Default
   * - ``speachesServerUrl``
     - Speaches base WebSocket URL (can override global setting per-app).
     - ``ws://localhost:8000``
   * - ``speachesModel``
     - Hugging Face model ID, e.g. ``Systran/faster-distil-whisper-small.en`` or ``Systran/faster-whisper-large-v3``. The server downloads the model automatically on first use.
     - ``Systran/faster-whisper-small``
   * - ``speachesLanguage``
     - Source language code (`ISO 639-1 `_), e.g. ``en``, ``es``, ``fr``. Leave blank for automatic language detection.
     - ``en``
   * - ``speachesVadThreshold``
     - Voice activity detection sensitivity (0.0–1.0). Higher values require more confident speech detection before triggering transcription. Raising this reduces false positives from music or background noise.
     - ``0.5``
   * - ``speachesVadSilenceDurationMs``
     - Milliseconds of silence required before the speech buffer is committed for transcription. Lower values produce more frequent, shorter transcriptions. Try 200–400 for fast speakers.
     - ``300``

Setting Up Speaches
--------------------

`Speaches `_ is an OpenAI API-compatible server for real-time speech transcription powered by ``faster-whisper``. It runs on GPU and CPU and downloads models automatically from Hugging Face.

Quick Start (Docker)
^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

   # CPU
   docker run -p 8000:8000 ghcr.io/speaches-ai/speaches

   # NVIDIA GPU
   docker run --gpus all -p 8000:8000 ghcr.io/speaches-ai/speaches

Model Selection
^^^^^^^^^^^^^^^^

Speaches models are specified as Hugging Face model IDs. The server downloads them automatically on first use.

.. list-table::
   :widths: 45 15 15 25
   :header-rows: 1

   * - Model
     - Speed
     - Accuracy
     - Use Case
   * - ``Systran/faster-distil-whisper-small.en``
     - Very fast
     - Good
     - **Recommended — English only**
   * - ``Systran/faster-whisper-small``
     - Fast
     - Good
     - Multilingual
   * - ``Systran/faster-whisper-medium``
     - Medium
     - Better
     - Higher accuracy needed
   * - ``Systran/faster-whisper-large-v3``
     - Slow
     - High
     - Maximum accuracy
   * - ``deepdml/faster-whisper-large-v3-turbo-ct2``
     - Medium
     - High
     - Fast large model

VAD Tuning
^^^^^^^^^^^

The two most impactful settings for real-time subtitle quality:

* ``speachesVadSilenceDurationMs`` — Lower values (200–300 ms) produce more frequent, shorter transcription segments, which reduces subtitle latency for fast speakers.
* ``speachesVadThreshold`` — Raise to 0.6–0.8 if music or background noise causes spurious transcriptions (e.g. repeated words during non-speech audio).

.. note::

   Whisper models may produce repetitive tokens (e.g. "you you you") when processing non-speech audio such as music. Raising ``speachesVadThreshold`` reduces this effect.
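
As a rough illustration of the tuning advice above, the following sketch combines both VAD settings with ``streamNameTemplates`` so that only selected streams deviate from the application defaults. The stream ID patterns ``^music-.*`` and ``^fast-.*`` are hypothetical, and the override values are assumptions picked from the ranges mentioned above, not recommended defaults.

.. code-block:: yaml

   provider: "speaches"
   speachesServerUrl: "ws://localhost:8000"

   applications:
     LiveApp:
       speachesVadThreshold: 0.5             # documented default
       speachesVadSilenceDurationMs: 300     # documented default
       streamNameTemplates:
         "^music-.*":                        # hypothetical IDs: streams with background music
           speachesVadThreshold: 0.7         # assumed value within the 0.6-0.8 range
         "^fast-.*":                         # hypothetical IDs: streams with fast speakers
           speachesVadSilenceDurationMs: 200 # assumed value at the low end of 200-300 ms

Settings that a template does not list fall back to the application-level values, so each template only needs the keys it changes.
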