new URL('./file.ts', import.meta.url) copies the file verbatim, so the
browser receives raw TypeScript and addModule() rejects with a
DOMException. Importing with ?worker&url tells Vite to compile and bundle
the file, producing a .js output that the browser can actually execute.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Use compound-web Form/InlineField/RadioControl/Label/HelpMessage for
VAD mode selection (proper radio button rendering)
- Standard mode: 256 samples / 16 ms hop + 5 ms open / 20 ms close ramp
- Aggressive mode: 160 samples / 10 ms hop + 1 ms open / 5 ms close ramp
- Worklet stores WebAssembly.Module and recreates TenVADRuntime with the
correct hop size whenever the mode changes
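
For reference, a minimal sketch of the two modes as a lookup table. Only the
numbers come from this change; the names (VadModeConfig, VAD_MODES,
hopDurationMs, configFor) are illustrative and may not match the codebase.

```typescript
// Hypothetical names; the values are the ones listed in this commit.
type VadMode = "standard" | "aggressive";

interface VadModeConfig {
  hopSamples: number; // samples per VAD decision at 16 kHz
  openRampMs: number; // gain ramp when the gate opens
  closeRampMs: number; // gain ramp when the gate closes
}

const VAD_SAMPLE_RATE_HZ = 16000;

const VAD_MODES: Record<VadMode, VadModeConfig> = {
  standard: { hopSamples: 256, openRampMs: 5, closeRampMs: 20 },
  aggressive: { hopSamples: 160, openRampMs: 1, closeRampMs: 5 },
};

// Time between VAD decisions, derived from the hop size.
function hopDurationMs(mode: VadMode): number {
  return (VAD_MODES[mode].hopSamples * 1000) / VAD_SAMPLE_RATE_HZ;
}

// Maps the boolean setting onto a mode entry.
function configFor(vadAggressive: boolean): VadModeConfig {
  return VAD_MODES[vadAggressive ? "aggressive" : "standard"];
}
```

hopDurationMs("standard") gives 16 and hopDurationMs("aggressive") gives 10,
matching the hop times above.
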
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Standard: 5 ms open / 20 ms close ramp (comfortable feel)
Aggressive: 1 ms open / 5 ms close ramp (lowest possible latency)
The mode is surfaced as a radio selector in Settings → Audio → Voice
activity detection, visible while VAD is enabled. Wired through
NoiseGateParams.vadAggressive → worklet updateParams.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Hop size 256 → 160 samples @ 16 kHz: VAD decision every 10 ms instead
of 16 ms (160 samples is the minimum hop TEN-VAD supports)
- Asymmetric VAD ramp: 5 ms open (was 20 ms) to avoid masking speech onset,
20 ms close retained for de-click on silence
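
A rough sketch of what an asymmetric ramp can look like per sample. It
assumes a linear ramp and a 48 kHz worklet sample rate, neither of which is
stated here; nextGain and applyRamp are illustrative names.

```typescript
// Assumptions: linear ramp, 48 kHz worklet rate. Only the 5 ms / 20 ms
// ramp times come from this commit.
const SAMPLE_RATE_HZ = 48000;
const OPEN_RAMP_MS = 5;
const CLOSE_RAMP_MS = 20;

// Gain change per sample so a full 0..1 sweep takes the ramp time.
const OPEN_STEP = 1000 / (OPEN_RAMP_MS * SAMPLE_RATE_HZ);
const CLOSE_STEP = 1000 / (CLOSE_RAMP_MS * SAMPLE_RATE_HZ);

// Move the gain one sample toward 1 (gate open) or 0 (gate closed).
function nextGain(gain: number, vadGateOpen: boolean): number {
  return vadGateOpen
    ? Math.min(1, gain + OPEN_STEP)
    : Math.max(0, gain - CLOSE_STEP);
}

// Apply the ramp in place to one block and return the gain to carry over.
function applyRamp(
  block: Float32Array,
  startGain: number,
  vadGateOpen: boolean,
): number {
  let gain = startGain;
  for (let i = 0; i < block.length; i++) {
    gain = nextGain(gain, vadGateOpen);
    block[i] = block[i] * gain;
  }
  return gain;
}
```

Under these assumptions the gate opens fully in 240 samples and closes in
960, so speech onsets are attenuated for a quarter as long as before.
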
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
TEN-VAD (official TEN-framework/ten-vad WASM, no npm dependency) replaces
@ricky0123/vad-web. The WASM module is compiled once on the main thread and
passed to the AudioWorklet via processorOptions, where it is instantiated
synchronously and called every 16 ms with no IPC round-trip.
- Add public/vad/ten_vad.{wasm,js} from official upstream lib/Web/
- NoiseGateProcessor: TenVADRuntime class wraps the Emscripten WASM with
minimal import stubs; 3:1 decimation accumulates 256 Int16 samples @
16 kHz per hop; hysteresis controls vadGateOpen directly in-worklet
- NoiseGateTransformer: fetch+compile WASM once (module-level cache),
pass WebAssembly.Module via processorOptions; remove setVADOpen()
- Publisher: remove all SileroVADGate lifecycle (init/start/stop/destroy,
rawMicTrack capture); VAD params folded into single combineLatest;
fix transient suppressor standalone attach (shouldAttach now includes
transientSuppressorEnabled)
- vite.config.ts: remove viteStaticCopy, serveVadAssets plugin, and all
vad-web/onnxruntime copy targets (public/vad/ served automatically)
- Remove @ricky0123/vad-web, onnxruntime-web deps and resolution
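
To illustrate the decimation step, here is a minimal sketch of accumulating
one 256-sample Int16 hop from worklet input. It assumes a 48 kHz input
(implied by 3:1 down to 16 kHz) and uses a plain three-sample average as the
low-pass, which is a stand-in and not necessarily what NoiseGateProcessor
does. HopAccumulator is a hypothetical name.

```typescript
// Assumption: 48 kHz float input, averaged in groups of three to get 16 kHz.
const DECIMATION = 3;
const HOP_SAMPLES = 256;

class HopAccumulator {
  private readonly hop = new Int16Array(HOP_SAMPLES);
  private filled = 0; // samples written into the current hop
  private sum = 0; // running sum of the current group of three
  private count = 0; // inputs seen in the current group

  // Feed one input block; onHop runs each time a full hop is ready.
  // The hop buffer is reused, so the callback must consume it synchronously.
  push(input: Float32Array, onHop: (hop: Int16Array) => void): void {
    for (let i = 0; i < input.length; i++) {
      this.sum += input[i];
      this.count++;
      if (this.count < DECIMATION) continue;

      const avg = this.sum / DECIMATION;
      this.sum = 0;
      this.count = 0;

      // Clamp to [-1, 1] and scale to the Int16 range.
      const clamped = Math.max(-1, Math.min(1, avg));
      this.hop[this.filled++] = Math.round(clamped * 32767);

      if (this.filled === HOP_SAMPLES) {
        onHop(this.hop);
        this.filled = 0;
      }
    }
  }
}
```

Because 768 input samples make one hop, six 128-sample blocks yield exactly
one VAD call, and the leftover group state carries across block boundaries.
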
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Noise gate and Silero VAD now work fully independently: the worklet
attaches when either is enabled and bypasses the amplitude gate when
only VAD is on (noiseGateActive flag). SileroVADGate gains a two-phase
lifecycle: init(ctx) loads the ONNX model eagerly when the AudioContext
is first created; start(stream) is then near-instant when the user
enables VAD. stop() pauses without unloading the model so re-enabling
is also instant. VAD checkbox no longer requires the noise gate.
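
The attach rule can be summarised in a few lines. This is a sketch only:
noiseGateActive is the flag named above, while AudioFlags, shouldAttach and
workletFlags are illustrative names.

```typescript
// Hypothetical shapes; only noiseGateActive is a name from this commit.
interface AudioFlags {
  noiseGateEnabled: boolean;
  vadEnabled: boolean;
}

// The worklet is needed as soon as either feature is on.
function shouldAttach(flags: AudioFlags): boolean {
  return flags.noiseGateEnabled || flags.vadEnabled;
}

// With only VAD on, the amplitude gate is bypassed via noiseGateActive.
function workletFlags(flags: AudioFlags): {
  noiseGateActive: boolean;
  vadActive: boolean;
} {
  return {
    noiseGateActive: flags.noiseGateEnabled,
    vadActive: flags.vadEnabled,
  };
}
```
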
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
After setProcessor resolves, track.mediaStreamTrack returns the processed
(noise-gated) track. The VAD was seeing gated silence, closing immediately,
and deadlocking with both gates closed. Capture the raw MediaStreamTrack
before calling setProcessor and pass that to SileroVADGate instead.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The legacy model is hardcoded to 1536 samples (96 ms frames); v5 uses 512
samples (32 ms), cutting gate-open latency to a third. Also lower the
default positive/negative thresholds to 0.2/0.1 so the gate opens at the
first sign of speech rather than waiting for high model confidence.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the hard 0/1 VAD gate with a 20ms ramp in the worklet to prevent
clicks on open/close transitions. Expose positive and negative speech
probability thresholds as user-adjustable settings (defaults 0.5/0.35).
Sliders with restore-defaults button added to the VAD section of the
audio settings tab.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
vite-plugin-static-copy only copies files at build time; in dev the /vad/
requests fell through to the SPA 404 handler, returning text/html which
caused the WASM magic-number validation error. Add a configureServer
middleware that serves the worklet bundle, ONNX model, and WASM files
directly from node_modules with correct MIME types during development.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
vad-web's own dependency was resolved to ort@1.24.3 (nested in its
node_modules), which only has threaded WASM requiring a .mjs dynamic
import that Vite fails to serve correctly. Pin ort to 1.18.0 via yarn
resolutions so all packages share the same copy with ort-wasm-simd.wasm
(non-threaded SIMD). Also remove the now-unnecessary COOP/COEP headers.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ort 1.19+ dropped non-threaded WASM binaries and replaced them with a
threaded .mjs loader that Vite's dev server fails to serve correctly
(wrong MIME type / transform interception). ort 1.18 ships ort-wasm-simd.wasm
which works with numThreads=1 and needs no .mjs dynamic import.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The threaded ORT WASM requires ort-wasm-simd-threaded.mjs to be served
alongside the .wasm files, and needs SharedArrayBuffer (COOP/COEP headers).
Add the .mjs to the static copy targets, add the required headers to the
Vite dev server, and set ort.env.wasm.numThreads=1 as a single-threaded
fallback that avoids the SharedArrayBuffer requirement entirely.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Starting the gate closed caused permanent silence if the ONNX model or
WASM files failed to load (onFrameProcessed never fired). Gate now starts
open so audio flows immediately; the first silence frame closes it. Also
ensures the gate is always reset to open when VAD is disabled.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
onSpeechStart/onSpeechEnd fire only at segment boundaries; with constant
non-speech noise, onSpeechEnd never fired, so the gate stayed open.
Switch to onFrameProcessed which fires every ~96ms and applies hysteresis
(open at >0.5, close at <0.35) matching Silero's own thresholds. Gate now
starts closed and opens only once the first speech frame is confirmed.
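
The hysteresis rule, sketched as a pure function. The 0.5 and 0.35 thresholds
are the ones above; nextGateState is an illustrative name, not the handler
in the code.

```typescript
// Open above the upper threshold, close below the lower one, and hold the
// previous state for anything in between.
function nextGateState(
  isOpen: boolean,
  speechProbability: number,
  openAbove: number = 0.5,
  closeBelow: number = 0.35,
): boolean {
  if (!isOpen && speechProbability > openAbove) return true;
  if (isOpen && speechProbability < closeBelow) return false;
  return isOpen;
}
```

Starting closed, the probabilities 0.1, 0.4, 0.6, 0.4, 0.3 give closed,
closed, open, open, closed: the 0.4 frame keeps whatever state came before
it, which is what stops the gate from chattering.
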
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Integrates @ricky0123/vad-web's MicVAD as an optional voice activity detector
alongside the noise gate. When enabled, the Silero ONNX model classifies each
audio frame as speech or silence; silence frames mute the worklet's output via
a new VAD gate message. VAD is wired into Publisher.ts alongside the existing
noise gate transformer. Vite is configured to copy the worklet bundle, ONNX
model, and ORT WASM files to /vad/ so they're reachable at runtime.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements a per-sample transient suppressor in the noise gate AudioWorklet
that instantly cuts gain when a sudden loud peak (desk hit, mic bump) exceeds
the slow background RMS by a configurable threshold, then releases over a
short window. Exposes enable, sensitivity, and release controls in the audio
settings tab.
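
A simplified sketch of the idea, not the worklet code itself. It assumes a
one-pole mean-square tracker for the background level, a hard cut to zero
gain, and a linear release. TransientSuppressor, thresholdRatio, releaseMs
and rmsWindowMs are illustrative names for the controls described above.

```typescript
// Assumptions: one-pole background tracker, hard cut, linear release.
class TransientSuppressor {
  private meanSquare = 0; // slow background power estimate
  private gain = 1;
  private readonly thresholdRatio: number;
  private readonly rmsCoeff: number;
  private readonly releaseStep: number;

  constructor(
    sampleRate: number,
    thresholdRatio: number, // peak must exceed background RMS by this factor
    releaseMs: number, // time to ramp back to unity gain
    rmsWindowMs: number = 500, // time constant of the background tracker
  ) {
    this.thresholdRatio = thresholdRatio;
    this.rmsCoeff = Math.exp(-1000 / (rmsWindowMs * sampleRate));
    this.releaseStep = 1000 / (releaseMs * sampleRate);
  }

  // Process one sample and return the gain-adjusted value.
  process(sample: number): number {
    const backgroundRms = Math.sqrt(this.meanSquare);
    if (Math.abs(sample) > backgroundRms * this.thresholdRatio) {
      this.gain = 0; // instant cut on a sudden peak
    } else {
      this.gain = Math.min(1, this.gain + this.releaseStep); // release
    }
    // Update the background estimate after the decision so the peak itself
    // does not raise the threshold it is compared against.
    this.meanSquare =
      this.rmsCoeff * this.meanSquare +
      (1 - this.rmsCoeff) * sample * sample;
    return sample * this.gain;
  }
}
```

With a steady low-level input the gain settles at 1; a single loud sample is
muted outright and the following samples fade back in over releaseMs.
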
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>