Noise gate and Silero VAD now work fully independently: the worklet
attaches when either is enabled and, when only VAD is on, bypasses the
amplitude gate (noiseGateActive flag). SileroVADGate gains a two-phase
lifecycle: init(ctx) loads the ONNX model eagerly when the AudioContext
is first created; start(stream) is then near-instant when the user
enables VAD. stop() pauses without unloading the model so re-enabling
is also instant. VAD checkbox no longer requires the noise gate.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
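The two-phase lifecycle described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the actual SileroVADGate: the class name TwoPhaseGate, the loadModel callback, and the wanted/running fields are hypothetical, and the real init(ctx)/start(stream) take an AudioContext and a stream that are omitted here.

```typescript
// Hypothetical sketch of an eager-load / fast-start lifecycle.
// Only the method names init, start and stop come from the commit message.
class TwoPhaseGate<M> {
  private readonly loadModel: () => Promise<M>;
  private model: Promise<M> | null = null;
  private wanted = false;
  private running = false;

  constructor(loadModel: () => Promise<M>) {
    this.loadModel = loadModel;
  }

  // Phase 1: start the expensive load once; later calls reuse the same promise.
  init(): Promise<M> {
    if (this.model === null) {
      this.model = this.loadModel();
    }
    return this.model;
  }

  // Phase 2: near-instant when init() has already resolved.
  async start(): Promise<void> {
    this.wanted = true;
    await this.init();
    // If stop() was called while the model was still loading, stay stopped.
    this.running = this.wanted;
  }

  // Pause only: the loaded model is kept so the next start() is cheap.
  stop(): void {
    this.wanted = false;
    this.running = false;
  }

  get isRunning(): boolean {
    return this.running;
  }
}
```

Caching the load promise rather than the model itself means a start() issued while loading is still in flight simply waits on the same load instead of triggering a second one.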
After setProcessor resolves, track.mediaStreamTrack returns the processed
(noise-gated) track. The VAD was seeing gated silence, closing immediately,
and deadlocking with both gates closed. Capture the raw MediaStreamTrack
before calling setProcessor and pass that to SileroVADGate instead.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
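The ordering fix in the commit above comes down to reading the track before the processor swaps it. A minimal sketch, assuming a track object shaped like the one the commit describes; the TrackLike interface and the helper name are hypothetical stand-ins, not the media library's real types:

```typescript
// Hypothetical structural stand-in for the published audio track.
interface TrackLike<T> {
  mediaStreamTrack: T;
  setProcessor(processor: unknown): Promise<void>;
}

// Reads the raw track first, then attaches the processor.
// `raw` is what the VAD should consume; `ready` resolves once the
// processor is attached and mediaStreamTrack returns the processed track.
function attachProcessorKeepingRaw<T>(
  track: TrackLike<T>,
  processor: unknown,
): { raw: T; ready: Promise<void> } {
  const raw = track.mediaStreamTrack; // must happen BEFORE setProcessor
  const ready = track.setProcessor(processor);
  return { raw, ready };
}
```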
The legacy model is hardcoded to 1536-sample frames (96ms); v5 uses
512-sample frames (32ms), cutting the frame interval, and with it gate-open
latency, to a third. Also lower the default positive/negative thresholds to
0.2/0.1 so the gate opens at the first sign of speech rather than waiting
for high model confidence.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
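The frame timings quoted in the commit above follow directly from the frame size. A small worked check, assuming a 16 kHz model input rate (an inference from "1536 samples = 96ms", not something the commit states):

```typescript
// Assumed sample rate, inferred from 1536 samples lasting 96 ms.
const ASSUMED_SAMPLE_RATE_HZ = 16_000;

// Duration of one model frame in milliseconds.
function frameDurationMs(
  frameSamples: number,
  sampleRateHz: number = ASSUMED_SAMPLE_RATE_HZ,
): number {
  return (frameSamples * 1000) / sampleRateHz;
}

const legacyFrameMs = frameDurationMs(1536); // 96
const v5FrameMs = frameDurationMs(512); // 32
const frameSpeedup = legacyFrameMs / v5FrameMs; // 3
```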
Replace the hard 0/1 VAD gate with a 20ms ramp in the worklet to prevent
clicks on open/close transitions. Expose positive and negative speech
probability thresholds as user-adjustable settings (defaults 0.5/0.35).
Sliders with restore-defaults button added to the VAD section of the
audio settings tab.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
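One way to picture the 20ms ramp from the commit above is a per-sample gain that steps toward the VAD target instead of jumping. The sketch below assumes a linear ramp; the commit does not say which curve the worklet uses, and the GainRamp class is a hypothetical illustration:

```typescript
// Hypothetical linear gain ramp; the curve shape is an assumption.
class GainRamp {
  private gain: number;
  private readonly step: number;

  constructor(sampleRateHz: number, rampMs: number = 20, initialGain: number = 1) {
    // Number of samples in the ramp determines the per-sample step.
    this.step = 1 / ((sampleRateHz * rampMs) / 1000);
    this.gain = initialGain;
  }

  // Moves one sample toward the target (0 = closed, 1 = open)
  // and returns the gain to multiply the current sample by.
  next(target: 0 | 1): number {
    if (this.gain < target) {
      this.gain = Math.min(target, this.gain + this.step);
    } else if (this.gain > target) {
      this.gain = Math.max(target, this.gain - this.step);
    }
    return this.gain;
  }
}
```

At 48 kHz a 20ms ramp spans 960 samples, so a close transition fades out over 960 steps rather than cutting to zero in one.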
vite-plugin-static-copy only copies files at build time; in dev the /vad/
requests fell through to the SPA 404 handler, returning text/html which
caused the WASM magic-number validation error. Add a configureServer
middleware that serves the worklet bundle, ONNX model, and WASM files
directly from node_modules with correct MIME types during development.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
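The core of the dev middleware from the commit above is deciding which /vad/ requests to serve and with which Content-Type. A minimal sketch of that decision, assuming a flat /vad/ directory; the MIME table and the resolveVadRequest helper are hypothetical, and where each file actually lives under node_modules is deliberately left out:

```typescript
// Hypothetical extension-to-MIME table for the asset kinds named in the commit.
const VAD_MIME_TYPES: Record<string, string> = {
  ".wasm": "application/wasm",
  ".onnx": "application/octet-stream",
  ".js": "text/javascript",
  ".mjs": "text/javascript",
};

// Returns the requested file name and Content-Type for a /vad/ URL,
// or null when the request should fall through to the next handler.
function resolveVadRequest(
  url: string,
): { fileName: string; contentType: string } | null {
  const queryStart = url.indexOf("?");
  const path = queryStart >= 0 ? url.slice(0, queryStart) : url;
  if (!path.startsWith("/vad/")) return null;

  const fileName = path.slice("/vad/".length);
  // Reject empty names and anything that could escape the asset directory.
  if (fileName === "" || fileName.includes("/") || fileName.includes("..")) {
    return null;
  }

  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot) : "";
  const contentType = VAD_MIME_TYPES[ext];
  return contentType ? { fileName, contentType } : null;
}
```

Inside a Vite plugin's configureServer hook, a handler registered with server.middlewares.use could call this helper, set the Content-Type header, and stream the matching file, calling next() whenever it returns null.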
vad-web's own dependency was resolved to ort@1.24.3 (nested in its
node_modules), which only has threaded WASM requiring a .mjs dynamic
import that Vite fails to serve correctly. Pin ort to 1.18.0 via yarn
resolutions so all packages share the same copy with ort-wasm-simd.wasm
(non-threaded SIMD). Also remove the now-unnecessary COOP/COEP headers.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ort 1.19+ dropped non-threaded WASM binaries and replaced them with a
threaded .mjs loader that Vite's dev server fails to serve correctly
(wrong MIME type / transform interception). ort 1.18 ships ort-wasm-simd.wasm
which works with numThreads=1 and needs no .mjs dynamic import.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The threaded ORT WASM requires ort-wasm-simd-threaded.mjs to be served
alongside the .wasm files, and needs SharedArrayBuffer (COOP/COEP headers).
Add the .mjs to the static copy targets, add the required headers to the
Vite dev server, and set ort.env.wasm.numThreads=1 as a single-threaded
fallback that avoids the SharedArrayBuffer requirement entirely.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Starting the gate closed caused permanent silence if the ONNX model or
WASM files failed to load (onFrameProcessed never fired). Gate now starts
open so audio flows immediately; the first silence frame closes it. Also
ensures the gate is always reset to open when VAD is disabled.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
onSpeechStart/onSpeechEnd fire only at segment boundaries: with constant
non-speech noise, onSpeechEnd never fired, so the gate stayed open.
Switch to onFrameProcessed which fires every ~96ms and applies hysteresis
(open at >0.5, close at <0.35) matching Silero's own thresholds. Gate now
starts closed and opens only once the first speech frame is confirmed.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
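The hysteresis rule from the commit above (open at >0.5, close at <0.35) can be sketched as a pure state update applied once per processed frame. The function name and parameter names are hypothetical; only the two thresholds come from the commit:

```typescript
// Per-frame gate update with hysteresis. Probabilities between the two
// thresholds leave the gate in whatever state it was already in.
function nextGateState(
  isOpen: boolean,
  speechProb: number,
  openAbove: number = 0.5,
  closeBelow: number = 0.35,
): boolean {
  if (!isOpen && speechProb > openAbove) return true;
  if (isOpen && speechProb < closeBelow) return false;
  return isOpen;
}

// Example: the gate starts closed and is driven by a run of frame probabilities.
const frameProbs = [0.1, 0.4, 0.6, 0.4, 0.3, 0.4];
const gateTrace: boolean[] = [];
let gateOpen = false;
for (const p of frameProbs) {
  gateOpen = nextGateState(gateOpen, p);
  gateTrace.push(gateOpen);
}
// gateTrace is [false, false, true, true, false, false]
```

The gap between the two thresholds is what keeps a probability hovering around 0.4 from toggling the gate on every frame.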
Integrates @ricky0123/vad-web's MicVAD as an optional voice activity detector
alongside the noise gate. When enabled, the Silero ONNX model classifies each
audio frame as speech or silence; silence frames mute the worklet's output via
a new VAD gate message. VAD is wired into Publisher.ts alongside the existing
noise gate transformer. Vite is configured to copy the worklet bundle, ONNX
model, and ORT WASM files to /vad/ so they're reachable at runtime.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements a per-sample transient suppressor in the noise gate AudioWorklet
that instantly cuts gain when a sudden loud peak (desk hit, mic bump) exceeds
the slow background RMS by a configurable threshold, then releases over a
short window. Exposes enable, sensitivity, and release controls in the audio
settings tab.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
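A rough sketch of how a per-sample transient suppressor like the one in the commit above might work. Everything beyond the commit's description is an assumption: the class name, the exponential mean-square tracker, the linear release, the default values, and the small floor that keeps the threshold meaningful in near silence.

```typescript
// Hypothetical transient suppressor: instant cut, gradual release.
class TransientSuppressor {
  private meanSquare = 0; // slow estimate of background power
  private gain = 1;
  private readonly thresholdRatio: number;
  private readonly floor: number;
  private readonly rmsCoeff: number;
  private readonly releaseStep: number;

  constructor(
    sampleRateHz: number,
    thresholdRatio: number = 8, // "sensitivity": peak-to-RMS ratio that triggers a cut
    releaseMs: number = 80,
    rmsWindowMs: number = 500,
    floor: number = 1e-4,
  ) {
    this.thresholdRatio = thresholdRatio;
    this.floor = floor;
    this.rmsCoeff = 1 - Math.exp(-1 / ((sampleRateHz * rmsWindowMs) / 1000));
    this.releaseStep = 1 / ((sampleRateHz * releaseMs) / 1000);
  }

  process(sample: number): number {
    const backgroundRms = Math.sqrt(this.meanSquare);
    const peak = Math.abs(sample);
    if (peak > Math.max(backgroundRms, this.floor) * this.thresholdRatio) {
      this.gain = 0; // sudden loud peak: cut immediately
    } else {
      this.gain = Math.min(1, this.gain + this.releaseStep); // recover gradually
    }
    // Always update the slow tracker so the estimate can adapt after start-up.
    this.meanSquare += this.rmsCoeff * (sample * sample - this.meanSquare);
    return sample * this.gain;
  }
}
```

Because the background tracker is slow, a single spike barely moves it, while the gain recovers over the release window once the input falls back under the threshold.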