.. _Configuration:

Configuration
=============

Settings are stored in ``/usr/local/antmedia/conf/scribe.properties``. Despite the
``.properties`` extension, the file uses YAML syntax. You can edit the file directly or use
the :ref:`web UI <Web UI>`.

Configuration changes can be applied without restarting AntMedia Server by using the hot
reload feature: click "Reload Configuration" on the Dashboard or send
``POST /rest/scribe/reload``. Active streams keep their existing configuration; new streams
use the reloaded settings.

Supported Providers
-------------------

Scribe supports multiple transcription providers:

* **Speechmatics** — Cloud-based real-time transcription with multi-language translation
* **Speaches** — Self-hosted OpenAI-compatible server using faster-whisper (recommended for self-hosting)

Example: Speechmatics (Cloud)
------------------------------

.. code-block:: yaml

   licenseKey: "YOUR-LICENSE-KEY"
   provider: "speechmatics"
   speechmaticsApiKey: "YOUR-SPEECHMATICS-API-KEY"

   applications:
     LiveApp:
       speechmaticsLanguage: "en"
       speechmaticsTranslate: "fr,de"
       logTranscriptions: false
       subtitleMaxCharsPerLine: 32
       subtitleWordsPerMinute: 160
       streamNameTemplates:
         "^spanish-.*":
           speechmaticsLanguage: "es"
           speechmaticsTranslate: "en"
         ".*-test$":
           logTranscriptions: true

Example: Speaches (Self-Hosted, Recommended)
--------------------------------------------

.. code-block:: yaml

   licenseKey: "YOUR-LICENSE-KEY"
   provider: "speaches"
   speachesServerUrl: "ws://localhost:8000"

   applications:
     LiveApp:
       speachesModel: "Systran/faster-distil-whisper-small.en"
       speachesLanguage: "en"
       speachesVadThreshold: 0.5
       speachesVadSilenceDurationMs: 300
       logTranscriptions: false
       subtitleMaxCharsPerLine: 32
       subtitleWordsPerMinute: 160
       streamNameTemplates:
         "^multilingual-.*":
           speachesModel: "Systran/faster-whisper-large-v3"
           speachesLanguage: ""

The plugin is active only for applications that have an entry under ``applications:``; an
application without an entry is disabled. The application name must match the AntMedia
webapp name exactly (case-sensitive). Disabling an application in the web UI removes its
entry from the file.

Per-stream settings (``streamNameTemplates``)
---------------------------------------------

Within an application block you can define ``streamNameTemplates`` to apply different settings
to individual streams based on their stream ID. Keys are Java regular expressions matched
against the full stream ID; the **first** matching pattern wins.

.. code-block:: yaml

   applications:
     LiveApp:
       speechmaticsLanguage: "en"        # default for all streams in this app
       subtitleMaxCharsPerLine: 32
       streamNameTemplates:
         "^spanish-.*":                  # streams whose ID starts with "spanish-"
           speechmaticsLanguage: "es"
           speechmaticsTranslate: "en"
         ".*-test$":                     # streams whose ID ends with "-test"
           logTranscriptions: true
           subtitleMaxCharsPerLine: 48

Any per-application setting (provider-agnostic or provider-specific) can be overridden
inside a stream template. Settings absent from the matching template fall back to the
application-level value, then to the global value.

.. list-table::
   :widths: 30 70
   :header-rows: 1

   * - Behaviour
     - Detail
   * - Order matters
     - Patterns are evaluated top-to-bottom; the first match is used.
   * - Full-string match
     - The regex must match the **entire** stream ID. Anchors ``^`` / ``$`` are
       recommended for clarity but the match is always against the whole string.
   * - Runtime behaviour
     - Stream templates are resolved when a stream starts. Changing templates
       requires a configuration reload; streams already in progress retain their
       original settings until they are restarted.
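
The matching rules in this section can be sketched as follows. This is a minimal illustration
in Python, not the plugin's code: ``select_template`` is a hypothetical helper, and
``re.fullmatch`` stands in for a Java whole-string match. The two regex dialects agree for
the simple patterns shown here but differ in some advanced syntax.

.. code-block:: python

   import re


   def select_template(stream_id, templates):
       """Return the settings of the first template matching the whole stream ID."""
       # Dicts preserve insertion order, so patterns are tried top-to-bottom.
       for pattern, settings in templates.items():
           if re.fullmatch(pattern, stream_id):
               return settings
       return {}  # no template matched: only application-level settings apply


   templates = {
       r"^spanish-.*": {"speechmaticsLanguage": "es", "speechmaticsTranslate": "en"},
       r".*-test$": {"logTranscriptions": True, "subtitleMaxCharsPerLine": 48},
   }

   # Both patterns match this ID, but the first one listed wins.
   both = select_template("spanish-news-test", templates)

   # "spanish-" appears mid-string, so neither pattern matches the entire ID.
   none = select_template("my-spanish-feed", templates)

Because a stream ID such as ``spanish-news-test`` matches both patterns, putting the more
specific pattern first is the only way to control which template applies.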

Overridable settings per provider
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

All per-application settings listed in the tables under "Per-application settings" below can
appear inside a stream template. Common overrides:

*Provider-agnostic:* ``logTranscriptions``, ``subtitleMaxCharsPerLine``, ``subtitleWordsPerMinute``

*Speechmatics:* ``speechmaticsLanguage``, ``speechmaticsTranslate``

*Speaches:* ``speachesLanguage``, ``speachesModel``, ``speachesServerUrl``,
``speachesVadThreshold``, ``speachesVadSilenceDurationMs``

Global settings
---------------

These settings apply to all applications unless overridden at the application level.

.. list-table::
   :widths: 30 55 15
   :header-rows: 1

   * - Key
     - Description
     - Default
   * - ``licenseKey``
     - Your Scribe license key.
     - —
   * - ``provider``
     - Transcription backend: ``speechmatics`` or ``speaches``.
     - ``speechmatics``
   * - ``speechmaticsApiKey``
     - Speechmatics API key (when provider is ``speechmatics``).
     - —
   * - ``speachesServerUrl``
     - Speaches base WebSocket URL (when provider is ``speaches``), e.g. ``ws://localhost:8000``.
     - ``ws://localhost:8000``

Per-application settings
------------------------

These go under ``applications.<appName>``. Any setting not specified falls back to the global
value, then to the built-in default.
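
The fallback order can be pictured as a chain of lookups. The sketch below is an assumption
about how such layering might be modelled, not the plugin's implementation;
``effective_settings`` and ``BUILT_IN_DEFAULTS`` are hypothetical names, and the default
values are copied from the tables on this page.

.. code-block:: python

   from collections import ChainMap

   # Built-in defaults as documented in the settings tables (subset).
   BUILT_IN_DEFAULTS = {
       "logTranscriptions": False,
       "subtitleMaxCharsPerLine": 32,
       "subtitleWordsPerMinute": 160,
       "speechmaticsLanguage": "en",
   }


   def effective_settings(global_cfg, app_cfg, template_cfg=None):
       """Layer settings: stream template, then application, then global, then defaults."""
       # ChainMap returns the value from the first mapping that contains the key.
       return ChainMap(template_cfg or {}, app_cfg, global_cfg, BUILT_IN_DEFAULTS)


   global_cfg = {"provider": "speechmatics"}
   app_cfg = {"speechmaticsLanguage": "en", "subtitleMaxCharsPerLine": 40}
   template_cfg = {"speechmaticsLanguage": "es"}

   settings = effective_settings(global_cfg, app_cfg, template_cfg)
   language = settings["speechmaticsLanguage"]    # from the stream template
   max_chars = settings["subtitleMaxCharsPerLine"]  # from the application block
   wpm = settings["subtitleWordsPerMinute"]       # from the built-in defaults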

Provider-agnostic settings
^^^^^^^^^^^^^^^^^^^^^^^^^^

These settings work with all providers.

.. list-table::
   :widths: 30 55 15
   :header-rows: 1

   * - Key
     - Description
     - Default
   * - ``logTranscriptions``
     - When ``true``, every transcript line is written to the AntMedia application log.
     - ``false``
   * - ``subtitleMaxCharsPerLine``
     - Maximum number of characters per subtitle line before the text is wrapped onto
       a new cue.
     - ``32``
   * - ``subtitleWordsPerMinute``
     - Reading speed (words per minute) used to calculate the minimum display time for
       each subtitle cue. Subtitling guidelines commonly recommend 160–180 WPM.
     - ``160``
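
To get a feel for the two subtitle settings, the sketch below estimates cue timing and
wrapping. It assumes the minimum display time is simply the word count divided by the reading
speed, and that wrapping happens on word boundaries; the plugin's exact formulas are not
documented here, and both function names are hypothetical.

.. code-block:: python

   import textwrap


   def min_display_seconds(text, words_per_minute=160):
       """Assumed formula: time needed to read the text at the given speed."""
       words = len(text.split())
       return words * 60 / words_per_minute


   def wrap_cue(text, max_chars_per_line=32):
       """Split text on word boundaries so no piece exceeds the character limit."""
       return textwrap.wrap(text, width=max_chars_per_line)


   text = "The quick brown fox jumps over the lazy dog near the river"
   seconds = min_display_seconds("one two three four five six seven eight")
   pieces = wrap_cue(text)

With the default of 160 WPM, an eight-word cue would stay on screen for at least three
seconds under this assumed formula; raising ``subtitleWordsPerMinute`` shortens that time.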

Speechmatics-specific settings
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

These settings apply when ``provider`` is set to ``speechmatics``.

.. list-table::
   :widths: 30 55 15
   :header-rows: 1

   * - Key
     - Description
     - Default
   * - ``speechmaticsLanguage``
     - Source language code (`ISO 639-1 <https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes>`_),
       e.g. ``en``, ``de``, ``nl``, ``fr``.
     - ``en``
   * - ``speechmaticsTranslate``
     - Comma-separated list of target languages for live translation,
       e.g. ``fr,de``. A separate subtitle track is created for each language.
     - —

Speaches-specific settings
^^^^^^^^^^^^^^^^^^^^^^^^^^

These settings apply when ``provider`` is set to ``speaches``.

.. list-table::
   :widths: 35 50 15
   :header-rows: 1

   * - Key
     - Description
     - Default
   * - ``speachesServerUrl``
     - Speaches base WebSocket URL (can override global setting per-app).
     - ``ws://localhost:8000``
   * - ``speachesModel``
     - Hugging Face model ID, e.g. ``Systran/faster-distil-whisper-small.en``
       or ``Systran/faster-whisper-large-v3``. The server downloads the model
       automatically on first use.
     - ``Systran/faster-whisper-small``
   * - ``speachesLanguage``
     - Source language code (`ISO 639-1 <https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes>`_),
       e.g. ``en``, ``es``, ``fr``. Leave blank for automatic language detection.
     - ``en``
   * - ``speachesVadThreshold``
     - Voice activity detection threshold (0.0–1.0). Higher values require
       more confident speech detection before triggering transcription.
       Raising this reduces false positives from music or background noise.
     - ``0.5``
   * - ``speachesVadSilenceDurationMs``
     - Milliseconds of silence required before the speech buffer is committed
       for transcription. Lower values produce more frequent, shorter transcriptions.
       Try 200–400 for fast speakers.
     - ``300``

Setting Up Speaches
--------------------

`Speaches <https://github.com/speaches-ai/speaches>`_ is an OpenAI API-compatible server for
real-time speech transcription powered by ``faster-whisper``. It runs on GPU and CPU and
downloads models automatically from Hugging Face.

Quick Start (Docker)
^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

   # CPU
   docker run -p 8000:8000 ghcr.io/speaches-ai/speaches

   # NVIDIA GPU
   docker run --gpus all -p 8000:8000 ghcr.io/speaches-ai/speaches

Model Selection
^^^^^^^^^^^^^^^^

Speaches models are specified as Hugging Face model IDs. The server downloads them automatically
on first use.

.. list-table::
   :widths: 45 15 15 25
   :header-rows: 1

   * - Model
     - Speed
     - Accuracy
     - Use Case
   * - ``Systran/faster-distil-whisper-small.en``
     - Very fast
     - Good
     - **Recommended — English only**
   * - ``Systran/faster-whisper-small``
     - Fast
     - Good
     - Multilingual
   * - ``Systran/faster-whisper-medium``
     - Medium
     - Better
     - Higher accuracy needed
   * - ``Systran/faster-whisper-large-v3``
     - Slow
     - High
     - Maximum accuracy
   * - ``deepdml/faster-whisper-large-v3-turbo-ct2``
     - Medium
     - High
     - Fast large model

VAD Tuning
^^^^^^^^^^^

The two most impactful settings for real-time subtitle quality:

* ``speachesVadSilenceDurationMs``: Lower values (200–300 ms) produce more frequent,
  shorter transcription segments, which reduces subtitle latency for fast speakers.
* ``speachesVadThreshold``: Raise to 0.6–0.8 if music or background noise causes
  spurious transcriptions (e.g. repeated words during non-speech audio).

.. note::
   Whisper models may produce repetitive tokens (e.g. "you you you") when processing
   non-speech audio such as music. Raising ``speachesVadThreshold`` reduces this effect.
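
To see how the two settings interact, here is a simplified model of threshold-plus-silence
segmentation. It is an illustrative sketch only, not Speaches' actual VAD implementation:
the function name ``segment_speech``, the fixed 100 ms frame size, and the per-frame speech
probabilities are all assumptions.

.. code-block:: python

   def segment_speech(probs, frame_ms=100, threshold=0.5, silence_ms=300):
       """Return (start_ms, end_ms) speech segments from per-frame probabilities.

       A frame counts as speech when its probability reaches the threshold.
       A segment is committed once silence_ms of continuous non-speech follows it.
       """
       segments = []
       start = None        # start time of the segment being built, if any
       speech_end = None   # end time of the most recent speech frame
       silence = 0         # continuous silence accumulated since speech_end
       for index, prob in enumerate(probs):
           time_ms = index * frame_ms
           if prob >= threshold:
               if start is None:
                   start = time_ms
               speech_end = time_ms + frame_ms
               silence = 0
           elif start is not None:
               silence += frame_ms
               if silence >= silence_ms:
                   segments.append((start, speech_end))
                   start = None
                   silence = 0
       if start is not None:
           segments.append((start, speech_end))  # flush the trailing segment
       return segments


   probs = [0.9, 0.9, 0.1, 0.9, 0.1, 0.1, 0.1, 0.9]
   default_segments = segment_speech(probs)                # silence_ms=300
   short_silence = segment_speech(probs, silence_ms=100)   # commits sooner
   strict = segment_speech(probs, threshold=0.95)          # nothing counts as speech

In this toy model, lowering the silence duration splits the same audio into more, shorter
segments, while raising the threshold discards frames that are less confidently speech.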