12 Commits

Author SHA1 Message Date
mk
dc1f30b84f feat: replace Silero VAD with TEN-VAD running inside the AudioWorklet
TEN-VAD (official TEN-framework/ten-vad WASM, no npm dependency) replaces
@ricky0123/vad-web. The WASM module is compiled once on the main thread and
passed to the AudioWorklet via processorOptions, where it is instantiated
synchronously and called every 16 ms with no IPC round-trip.

- Add public/vad/ten_vad.{wasm,js} from official upstream lib/Web/
- NoiseGateProcessor: TenVADRuntime class wraps the Emscripten WASM with
  minimal import stubs; 3:1 decimation accumulates 256 Int16 samples @
  16 kHz per hop; hysteresis controls vadGateOpen directly in-worklet
- NoiseGateTransformer: fetch+compile WASM once (module-level cache),
  pass WebAssembly.Module via processorOptions; remove setVADOpen()
- Publisher: remove all SileroVADGate lifecycle (init/start/stop/destroy,
  rawMicTrack capture); VAD params folded into single combineLatest;
  fix transient suppressor standalone attach (shouldAttach now includes
  transientSuppressorEnabled)
- vite.config.ts: remove viteStaticCopy, serveVadAssets plugin, and all
  vad-web/onnxruntime copy targets (public/vad/ served automatically)
- Remove @ricky0123/vad-web, onnxruntime-web deps and resolution

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-24 07:43:52 -03:00
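The decimation and hop accumulation described in this commit can be sketched outside the worklet as a small pure class. This is only an illustration, not the committed code: `HopAccumulator`, `floatToInt16`, and the `onHop` callback are hypothetical names, and it assumes a 48 kHz input decimated 3:1 by simple averaging.

```typescript
// Hypothetical sketch: 3:1 decimation plus Int16 hop accumulation.
const HOP_SIZE = 256; // samples per hop at 16 kHz, i.e. 16 ms

// Clamp a float sample to [-1, 1] and scale it to the Int16 range.
function floatToInt16(x: number): number {
  const clamped = Math.max(-1, Math.min(1, x));
  return Math.round(clamped * 32767);
}

class HopAccumulator {
  private readonly hop = new Int16Array(HOP_SIZE);
  private readonly ratio: number;
  private readonly onHop: (hop: Int16Array) => void;
  private hopCount = 0;
  private acc = 0;
  private phase = 0;

  constructor(ratio: number, onHop: (hop: Int16Array) => void) {
    this.ratio = ratio;
    this.onHop = onHop;
  }

  // Feed input-rate samples; onHop fires once per HOP_SIZE decimated samples.
  push(samples: Float32Array): void {
    for (let i = 0; i < samples.length; i++) {
      this.acc += samples[i];
      this.phase++;
      if (this.phase < this.ratio) continue;
      // Average `ratio` input samples into one output sample.
      this.hop[this.hopCount++] = floatToInt16(this.acc / this.ratio);
      this.acc = 0;
      this.phase = 0;
      if (this.hopCount === HOP_SIZE) {
        this.hopCount = 0;
        this.onHop(this.hop);
      }
    }
  }
}

// Usage: 768 input samples at 48 kHz yield exactly one 256-sample hop.
const hops: Int16Array[] = [];
const accumulator = new HopAccumulator(3, (hop) => {
  hops.push(hop.slice()); // copy, because the buffer is reused
});
accumulator.push(new Float32Array(768).fill(0.5));
```

Averaging is the cheapest possible low-pass before decimation; a real resampler would filter more carefully, but the VAD model may tolerate the difference.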
mk
dbd4eef899 feat: decouple noise gate and VAD, pre-warm model for instant enable
Noise gate and Silero VAD now work fully independently — the worklet
attaches when either is enabled and bypasses the amplitude gate when
only VAD is on (noiseGateActive flag). SileroVADGate gains a two-phase
lifecycle: init(ctx) loads the ONNX model eagerly when the AudioContext
is first created; start(stream) is then near-instant when the user
enables VAD. stop() pauses without unloading the model so re-enabling
is also instant. VAD checkbox no longer requires the noise gate.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-24 00:15:32 -03:00
mk
325094b54d fix: feed VAD the raw mic track captured before setProcessor
After setProcessor resolves, track.mediaStreamTrack returns the processed
(noise-gated) track. The VAD was seeing gated silence, closing immediately,
and deadlocking with both gates closed. Capture the raw MediaStreamTrack
before calling setProcessor and pass that to SileroVADGate instead.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-24 00:06:47 -03:00
mk
aff09d0e49 fix: use Silero v5 model for 32ms frames and lower default thresholds
The legacy model is hardcoded to 1536 samples (96ms frames); v5 uses 512
samples (32ms), reducing gate open latency by 3x. Also lower default
positive/negative thresholds to 0.2/0.1 so the gate opens at the first
sign of speech rather than waiting for high model confidence.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-24 00:02:17 -03:00
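The latency figures in this commit follow from frame size divided by sample rate. A minimal sketch of that arithmetic, assuming a 16 kHz model input rate (implied by 1536 samples = 96 ms); `frameDurationMs` is a hypothetical helper:

```typescript
// Frame duration in milliseconds for a given frame size and sample rate.
function frameDurationMs(frameSamples: number, sampleRateHz: number): number {
  return (frameSamples * 1000) / sampleRateHz;
}

const SAMPLE_RATE_HZ = 16000; // assumption: implied by 1536 samples = 96 ms

const legacyMs = frameDurationMs(1536, SAMPLE_RATE_HZ); // legacy model frame
const v5Ms = frameDurationMs(512, SAMPLE_RATE_HZ); // v5 model frame
const speedup = legacyMs / v5Ms; // how much sooner a decision can arrive
```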
mk
859db651e0 feat: add VAD threshold controls and smooth gate ramp
Replace the hard 0/1 VAD gate with a 20ms ramp in the worklet to prevent
clicks on open/close transitions. Expose positive and negative speech
probability thresholds as user-adjustable settings (defaults 0.5/0.35).
Sliders with restore-defaults button added to the VAD section of the
audio settings tab.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 23:57:35 -03:00
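The 20 ms ramp amounts to moving the gain a fixed step per sample toward its target. A minimal sketch, assuming a 48 kHz sample rate; `rampRate` and `stepToward` are hypothetical names, not functions from the worklet:

```typescript
// Per-sample gain increment so that a full 0..1 ramp takes rampMs.
function rampRate(rampMs: number, sampleRateHz: number): number {
  return 1 / ((rampMs / 1000) * sampleRateHz);
}

// Move `current` one step toward `target` without overshooting.
function stepToward(current: number, target: number, rate: number): number {
  if (current < target) return Math.min(target, current + rate);
  if (current > target) return Math.max(target, current - rate);
  return current;
}

// Usage: close the gate from fully open at 48 kHz (assumed rate).
const rate = rampRate(20, 48000);
let gain = 1.0;
for (let i = 0; i < 480; i++) gain = stepToward(gain, 0.0, rate);
const gainAtHalfRamp = gain; // roughly 0.5 after 10 ms
for (let i = 0; i < 520; i++) gain = stepToward(gain, 0.0, rate);
const gainAfterRamp = gain; // clamped at 0 once the ramp has finished
```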
mk
1ffee2d25e fix: serve VAD assets from node_modules in dev mode
vite-plugin-static-copy only copies files at build time; in dev the /vad/
requests fell through to the SPA 404 handler, returning text/html which
caused the WASM magic-number validation error. Add a configureServer
middleware that serves the worklet bundle, ONNX model, and WASM files
directly from node_modules with correct MIME types during development.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 23:50:55 -03:00
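The magic-number error mentioned here arises because a WebAssembly binary must begin with the four bytes `\0asm`, and an HTML fallback page does not. A small sketch of that check; `hasWasmMagic` is a hypothetical helper, not part of Vite or the browser API:

```typescript
// The first four bytes of every WebAssembly binary: "\0asm".
const WASM_MAGIC = [0x00, 0x61, 0x73, 0x6d];

// Returns true when the payload starts with the WebAssembly magic number.
function hasWasmMagic(bytes: Uint8Array): boolean {
  return bytes.length >= 4 && WASM_MAGIC.every((b, i) => bytes[i] === b);
}

// A valid module header (magic followed by version 1).
const wasmHeader = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);
// What an SPA fallback handler might return instead of the binary.
const htmlFallback = Uint8Array.from("<!doctype html>", (c) => c.charCodeAt(0));

const wasmOk = hasWasmMagic(wasmHeader);
const htmlOk = hasWasmMagic(htmlFallback);
```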
mk
4a58277090 fix: force onnxruntime-web@1.18.0 via resolutions to eliminate nested 1.24.3
vad-web's own dependency was resolved to ort@1.24.3 (nested in its
node_modules), which only has threaded WASM requiring a .mjs dynamic
import that Vite fails to serve correctly. Pin ort to 1.18.0 via yarn
resolutions so all packages share the same copy with ort-wasm-simd.wasm
(non-threaded SIMD). Also remove the now-unnecessary COOP/COEP headers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 23:48:44 -03:00
mk
f2988cd689 fix: downgrade onnxruntime-web to 1.18 for non-threaded SIMD WASM
ort 1.19+ dropped non-threaded WASM binaries and replaced them with a
threaded .mjs loader that Vite's dev server fails to serve correctly
(wrong MIME type / transform interception). ort 1.18 ships ort-wasm-simd.wasm
which works with numThreads=1 and needs no .mjs dynamic import.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 23:45:05 -03:00
mk
b25cec3aa0 fix: copy ort .mjs file, add COOP/COEP headers, set numThreads=1
The threaded ORT WASM requires ort-wasm-simd-threaded.mjs to be served
alongside the .wasm files, and needs SharedArrayBuffer (COOP/COEP headers).
Add the .mjs to the static copy targets, add the required headers to the
Vite dev server, and set ort.env.wasm.numThreads=1 as a single-threaded
fallback that avoids the SharedArrayBuffer requirement entirely.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 23:41:51 -03:00
mk
edd1e1d34e fix: start VAD gate open to avoid permanent silence on model load failure
Starting the gate closed caused permanent silence if the ONNX model or
WASM files failed to load (onFrameProcessed never fired). Gate now starts
open so audio flows immediately; the first silence frame closes it. Also
ensures the gate is always reset to open when VAD is disabled.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 23:37:36 -03:00
mk
9f5b639190 fix: switch VAD gate to per-frame probability control
onSpeechStart/onSpeechEnd fire at segment boundaries — with constant
non-speech noise, onSpeechEnd never fires so the gate stayed open.
Switch to onFrameProcessed which fires every ~96ms and applies hysteresis
(open at >0.5, close at <0.35) matching Silero's own thresholds. Gate now
starts closed and opens only once the first speech frame is confirmed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 23:34:08 -03:00
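The per-frame hysteresis described in this commit can be sketched as a pure state transition. `nextGateOpen` is a hypothetical name; the 0.5 and 0.35 thresholds and the closed initial state are taken from the commit message above.

```typescript
// One hysteresis step: open above openAt, close below closeAt, else hold.
function nextGateOpen(
  isOpen: boolean,
  prob: number,
  openAt: number,
  closeAt: number,
): boolean {
  if (!isOpen && prob > openAt) return true;
  if (isOpen && prob < closeAt) return false;
  return isOpen;
}

// Usage: feed a sequence of per-frame speech probabilities.
const probs = [0.1, 0.6, 0.4, 0.45, 0.3, 0.4];
const states: boolean[] = [];
let gateOpen = false; // this commit starts the gate closed
for (const p of probs) {
  gateOpen = nextGateOpen(gateOpen, p, 0.5, 0.35);
  states.push(gateOpen);
}
// states is [false, true, true, true, false, false]
```

Probabilities between the two thresholds leave the gate in its previous state, which is what prevents rapid toggling on borderline frames.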
mk
428b76db25 feat: add Silero VAD toggle to audio pipeline
Integrates @ricky0123/vad-web's MicVAD as an optional voice activity detector
alongside the noise gate. When enabled, the Silero ONNX model classifies each
audio frame as speech or silence; silence frames mute the worklet's output via
a new VAD gate message. VAD is wired into Publisher.ts alongside the existing
noise gate transformer. Vite is configured to copy the worklet bundle, ONNX
model, and ORT WASM files to /vad/ so they're reachable at runtime.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 23:29:43 -03:00
8 changed files with 503 additions and 58 deletions

30
public/vad/ten_vad.js Normal file

@@ -0,0 +1,30 @@
var createVADModule = (() => {
var _scriptDir = import.meta.url;
return (
function(createVADModule) {
createVADModule = createVADModule || {};
var a;a||(a=typeof createVADModule !== 'undefined' ? createVADModule : {});var k,l;a.ready=new Promise(function(b,c){k=b;l=c});var p=Object.assign({},a),r="object"==typeof window,u="function"==typeof importScripts,v="",w;
if(r||u)u?v=self.location.href:"undefined"!=typeof document&&document.currentScript&&(v=document.currentScript.src),_scriptDir&&(v=_scriptDir),0!==v.indexOf("blob:")?v=v.substr(0,v.replace(/[?#].*/,"").lastIndexOf("/")+1):v="",u&&(w=b=>{var c=new XMLHttpRequest;c.open("GET",b,!1);c.responseType="arraybuffer";c.send(null);return new Uint8Array(c.response)});var aa=a.print||console.log.bind(console),x=a.printErr||console.warn.bind(console);Object.assign(a,p);p=null;var y;a.wasmBinary&&(y=a.wasmBinary);
var noExitRuntime=a.noExitRuntime||!0;"object"!=typeof WebAssembly&&z("no native wasm support detected");var A,B=!1,C="undefined"!=typeof TextDecoder?new TextDecoder("utf8"):void 0,D,E,F;function J(){var b=A.buffer;D=b;a.HEAP8=new Int8Array(b);a.HEAP16=new Int16Array(b);a.HEAP32=new Int32Array(b);a.HEAPU8=E=new Uint8Array(b);a.HEAPU16=new Uint16Array(b);a.HEAPU32=F=new Uint32Array(b);a.HEAPF32=new Float32Array(b);a.HEAPF64=new Float64Array(b)}var K=[],L=[],M=[];
function ba(){var b=a.preRun.shift();K.unshift(b)}var N=0,O=null,P=null;function z(b){if(a.onAbort)a.onAbort(b);b="Aborted("+b+")";x(b);B=!0;b=new WebAssembly.RuntimeError(b+". Build with -sASSERTIONS for more info.");l(b);throw b;}function Q(){return R.startsWith("data:application/octet-stream;base64,")}var R;if(a.locateFile){if(R="ten_vad.wasm",!Q()){var S=R;R=a.locateFile?a.locateFile(S,v):v+S}}else R=(new URL("ten_vad.wasm",import.meta.url)).href;
function T(){var b=R;try{if(b==R&&y)return new Uint8Array(y);if(w)return w(b);throw"both async and sync fetching of the wasm failed";}catch(c){z(c)}}function ca(){return y||!r&&!u||"function"!=typeof fetch?Promise.resolve().then(function(){return T()}):fetch(R,{credentials:"same-origin"}).then(function(b){if(!b.ok)throw"failed to load wasm binary file at '"+R+"'";return b.arrayBuffer()}).catch(function(){return T()})}function U(b){for(;0<b.length;)b.shift()(a)}
var da=[null,[],[]],ea={a:function(){z("")},f:function(b,c,m){E.copyWithin(b,c,c+m)},c:function(b){var c=E.length;b>>>=0;if(2147483648<b)return!1;for(var m=1;4>=m;m*=2){var h=c*(1+.2/m);h=Math.min(h,b+100663296);var d=Math;h=Math.max(b,h);d=d.min.call(d,2147483648,h+(65536-h%65536)%65536);a:{try{A.grow(d-D.byteLength+65535>>>16);J();var e=1;break a}catch(W){}e=void 0}if(e)return!0}return!1},e:function(){return 52},b:function(){return 70},d:function(b,c,m,h){for(var d=0,e=0;e<m;e++){var W=F[c>>2],
X=F[c+4>>2];c+=8;for(var G=0;G<X;G++){var f=E[W+G],H=da[b];if(0===f||10===f){f=H;for(var n=0,q=n+NaN,t=n;f[t]&&!(t>=q);)++t;if(16<t-n&&f.buffer&&C)f=C.decode(f.subarray(n,t));else{for(q="";n<t;){var g=f[n++];if(g&128){var I=f[n++]&63;if(192==(g&224))q+=String.fromCharCode((g&31)<<6|I);else{var Y=f[n++]&63;g=224==(g&240)?(g&15)<<12|I<<6|Y:(g&7)<<18|I<<12|Y<<6|f[n++]&63;65536>g?q+=String.fromCharCode(g):(g-=65536,q+=String.fromCharCode(55296|g>>10,56320|g&1023))}}else q+=String.fromCharCode(g)}f=q}(1===
b?aa:x)(f);H.length=0}else H.push(f)}d+=X}F[h>>2]=d;return 0}};
(function(){function b(d){a.asm=d.exports;A=a.asm.g;J();L.unshift(a.asm.h);N--;a.monitorRunDependencies&&a.monitorRunDependencies(N);0==N&&(null!==O&&(clearInterval(O),O=null),P&&(d=P,P=null,d()))}function c(d){b(d.instance)}function m(d){return ca().then(function(e){return WebAssembly.instantiate(e,h)}).then(function(e){return e}).then(d,function(e){x("failed to asynchronously prepare wasm: "+e);z(e)})}var h={a:ea};N++;a.monitorRunDependencies&&a.monitorRunDependencies(N);if(a.instantiateWasm)try{return a.instantiateWasm(h,
b)}catch(d){x("Module.instantiateWasm callback failed with error: "+d),l(d)}(function(){return y||"function"!=typeof WebAssembly.instantiateStreaming||Q()||"function"!=typeof fetch?m(c):fetch(R,{credentials:"same-origin"}).then(function(d){return WebAssembly.instantiateStreaming(d,h).then(c,function(e){x("wasm streaming compile failed: "+e);x("falling back to ArrayBuffer instantiation");return m(c)})})})().catch(l);return{}})();
a.___wasm_call_ctors=function(){return(a.___wasm_call_ctors=a.asm.h).apply(null,arguments)};a._malloc=function(){return(a._malloc=a.asm.i).apply(null,arguments)};a._free=function(){return(a._free=a.asm.j).apply(null,arguments)};a._ten_vad_create=function(){return(a._ten_vad_create=a.asm.k).apply(null,arguments)};a._ten_vad_process=function(){return(a._ten_vad_process=a.asm.l).apply(null,arguments)};a._ten_vad_destroy=function(){return(a._ten_vad_destroy=a.asm.m).apply(null,arguments)};
a._ten_vad_get_version=function(){return(a._ten_vad_get_version=a.asm.n).apply(null,arguments)};var V;P=function fa(){V||Z();V||(P=fa)};
function Z(){function b(){if(!V&&(V=!0,a.calledRun=!0,!B)){U(L);k(a);if(a.onRuntimeInitialized)a.onRuntimeInitialized();if(a.postRun)for("function"==typeof a.postRun&&(a.postRun=[a.postRun]);a.postRun.length;){var c=a.postRun.shift();M.unshift(c)}U(M)}}if(!(0<N)){if(a.preRun)for("function"==typeof a.preRun&&(a.preRun=[a.preRun]);a.preRun.length;)ba();U(K);0<N||(a.setStatus?(a.setStatus("Running..."),setTimeout(function(){setTimeout(function(){a.setStatus("")},1);b()},1)):b())}}
if(a.preInit)for("function"==typeof a.preInit&&(a.preInit=[a.preInit]);0<a.preInit.length;)a.preInit.pop()();Z();
return createVADModule.ready
}
);
})();
export default createVADModule;

BIN
public/vad/ten_vad.wasm Normal file

Binary file not shown.

View File

@@ -8,6 +8,9 @@ Please see LICENSE in the repository root for full details.
declare const sampleRate: number;
declare class AudioWorkletProcessor {
public readonly port: MessagePort;
public constructor(options?: {
processorOptions?: Record<string, unknown>;
});
public process(
inputs: Float32Array[][],
outputs: Float32Array[][],
@@ -21,6 +24,7 @@ declare function registerProcessor(
): void;
interface NoiseGateParams {
noiseGateActive: boolean;
threshold: number; // dBFS — gate opens above this, closes below it
attackMs: number;
holdMs: number;
@@ -28,6 +32,15 @@ interface NoiseGateParams {
transientEnabled: boolean;
transientThresholdDb: number; // dB above background RMS that triggers suppression
transientReleaseMs: number; // how quickly suppression fades after transient ends
// TEN-VAD params
vadEnabled: boolean;
vadPositiveThreshold: number; // open gate when prob >= this (0–1)
vadNegativeThreshold: number; // close gate when prob < this (0–1)
}
interface VADGateMessage {
type: "vad-gate";
open: boolean;
}
function dbToLinear(db: number): number {
@@ -35,19 +48,146 @@ function dbToLinear(db: number): number {
}
/**
* AudioWorkletProcessor implementing a noise gate and an optional transient
* suppressor, both running per-sample in a single pass.
* Thin synchronous wrapper around the TEN-VAD Emscripten WASM module.
* Instantiated synchronously in the AudioWorklet constructor from a
* pre-compiled WebAssembly.Module passed via processorOptions.
*/
class TenVADRuntime {
private readonly mem: WebAssembly.Memory;
private readonly freeFn: (ptr: number) => void;
private readonly processFn: (
handle: number,
audioPtr: number,
hopSize: number,
probPtr: number,
flagPtr: number,
) => number;
private readonly destroyFn: (handle: number) => number;
private readonly handle: number;
private readonly audioBufPtr: number;
private readonly probPtr: number;
private readonly flagPtr: number;
public readonly hopSize: number;
public constructor(
module: WebAssembly.Module,
hopSize: number,
threshold: number,
) {
this.hopSize = hopSize;
// Late-bound memory reference — emscripten_resize_heap and memmove
// are only called after instantiation, so closing over this is safe.
const state = { mem: null as WebAssembly.Memory | null };
const imports = {
a: {
// abort
a: (): never => {
throw new Error("ten_vad abort");
},
// fd_write / proc_exit stub
b: (): number => 0,
// emscripten_resize_heap
c: (reqBytes: number): number => {
if (!state.mem) return 0;
try {
const cur = state.mem.buffer.byteLength;
if (cur >= reqBytes) return 1;
state.mem.grow(Math.ceil((reqBytes - cur) / 65536));
return 1;
} catch {
return 0;
}
},
// fd_write stub
d: (): number => 0,
// environ stub
e: (): number => 0,
// memmove
f: (dest: number, src: number, len: number): void => {
if (state.mem) {
new Uint8Array(state.mem.buffer).copyWithin(dest, src, src + len);
}
},
},
};
// Synchronous instantiation — valid in Worker/AudioWorklet global scope
// eslint-disable-next-line @typescript-eslint/no-explicit-any
const instance = new WebAssembly.Instance(module, imports as any);
const asm = instance.exports as {
g: WebAssembly.Memory; // exported memory
h: () => void; // __wasm_call_ctors
i: (n: number) => number; // malloc
j: (p: number) => void; // free
k: (handlePtr: number, hopSize: number, threshold: number) => number; // ten_vad_create
l: (handle: number, audioPtr: number, hopSize: number, probPtr: number, flagPtr: number) => number; // ten_vad_process
m: (handle: number) => number; // ten_vad_destroy
};
state.mem = asm.g;
this.mem = asm.g;
this.freeFn = asm.j;
this.processFn = asm.l;
this.destroyFn = asm.m;
// Run Emscripten static constructors
asm.h();
// Allocate persistent buffers (malloc is 8-byte aligned, so alignment is fine)
this.audioBufPtr = asm.i(hopSize * 2); // Int16Array
this.probPtr = asm.i(4); // float
this.flagPtr = asm.i(4); // int
// Create VAD handle — ten_vad_create(void** handle, int hopSize, float threshold)
const handlePtrPtr = asm.i(4);
const ret = asm.k(handlePtrPtr, hopSize, threshold);
if (ret !== 0) throw new Error(`ten_vad_create failed: ${ret}`);
this.handle = new Int32Array(this.mem.buffer)[handlePtrPtr >> 2];
asm.j(handlePtrPtr);
}
/** Process one hop of Int16 audio. Returns speech probability [0–1]. */
public process(samples: Int16Array): number {
new Int16Array(this.mem.buffer).set(samples, this.audioBufPtr >> 1);
this.processFn(
this.handle,
this.audioBufPtr,
this.hopSize,
this.probPtr,
this.flagPtr,
);
return new Float32Array(this.mem.buffer)[this.probPtr >> 2];
}
public destroy(): void {
this.destroyFn(this.handle);
this.freeFn(this.audioBufPtr);
this.freeFn(this.probPtr);
this.freeFn(this.flagPtr);
}
}
/**
* AudioWorkletProcessor implementing a noise gate, an optional transient
* suppressor, and an optional in-worklet TEN-VAD gate — all running
* per-sample in a single pass.
*
* Noise gate: opens when instantaneous peak exceeds threshold, closes below.
* Attack, hold, and release times smooth the attenuation envelope.
*
* Transient suppressor: tracks a slow-moving RMS background level. When the
* instantaneous peak exceeds the background by more than transientThresholdDb,
* gain is instantly cut to 0 and releases over transientReleaseMs. This catches
* desk hits, mic bumps, and other sudden loud impacts without affecting speech.
* gain is instantly cut to 0 and releases over transientReleaseMs.
*
* TEN-VAD gate: accumulates audio with 3:1 decimation (48 kHz → 16 kHz),
* runs the TEN-VAD model synchronously every 256 samples (16 ms), and
* controls vadGateOpen with hysteresis. No IPC round-trip required.
*/
class NoiseGateProcessor extends AudioWorkletProcessor {
// Noise gate state
private noiseGateActive = true;
private threshold = dbToLinear(-60);
private attackRate = 1.0 / (0.025 * sampleRate);
private releaseRate = 1.0 / (0.15 * sampleRate);
@@ -58,28 +198,88 @@ class NoiseGateProcessor extends AudioWorkletProcessor {
// Transient suppressor state
private transientEnabled = false;
private transientRatio = dbToLinear(15); // peak must exceed rms by this factor
private transientRatio = dbToLinear(15);
private transientReleaseRate = 1.0 / (0.08 * sampleRate);
private transientAttenuation = 1.0; // 1 = fully open, ramps to 0 on transient
private transientAttenuation = 1.0;
private slowRms = 0;
// Exponential smoothing coefficient for background RMS (~200ms time constant)
private rmsCoeff = Math.exp(-1.0 / (0.2 * sampleRate));
// VAD gate state
private vadGateOpen = true; // starts open; TEN-VAD closes it on first silent frame
private vadAttenuation = 1.0;
private readonly vadRampRate = 1.0 / (0.02 * sampleRate);
// TEN-VAD state
private vadEnabled = false;
private vadPositiveThreshold = 0.5;
private vadNegativeThreshold = 0.3;
private tenVadRuntime: TenVADRuntime | null = null;
// decRatio:1 decimation from the AudioContext sample rate to ~16 kHz (3:1 at 48 kHz)
private readonly decRatio = Math.max(1, Math.round(sampleRate / 16000));
private decPhase = 0;
private decAcc = 0;
private readonly vadHopBuf = new Int16Array(256);
private vadHopCount = 0;
private logCounter = 0;
public constructor() {
super();
this.port.onmessage = (e: MessageEvent<NoiseGateParams>): void => {
this.updateParams(e.data);
public constructor(options?: {
processorOptions?: Record<string, unknown>;
}) {
super(options);
// Try to instantiate TEN-VAD from the pre-compiled module passed by the main thread
const tenVadModule = options?.processorOptions?.tenVadModule as
| WebAssembly.Module
| undefined;
if (tenVadModule) {
try {
// hopSize = 256 samples @ 16 kHz = 16 ms. The gate decision uses the returned
// probability with vadPositiveThreshold/vadNegativeThreshold from params, not this 0.5.
this.tenVadRuntime = new TenVADRuntime(tenVadModule, 256, 0.5);
this.port.postMessage({
type: "log",
msg: "[NoiseGate worklet] TEN-VAD runtime initialized, decRatio=" + this.decRatio,
});
} catch (e) {
this.port.postMessage({
type: "log",
msg: "[NoiseGate worklet] TEN-VAD init failed: " + String(e),
});
}
}
this.port.onmessage = (
e: MessageEvent<NoiseGateParams | VADGateMessage>,
): void => {
if ((e.data as VADGateMessage).type === "vad-gate") {
this.vadGateOpen = (e.data as VADGateMessage).open;
} else {
this.updateParams(e.data as NoiseGateParams);
}
};
this.updateParams({
threshold: -60, attackMs: 25, holdMs: 200, releaseMs: 150,
transientEnabled: false, transientThresholdDb: 15, transientReleaseMs: 80,
noiseGateActive: true,
threshold: -60,
attackMs: 25,
holdMs: 200,
releaseMs: 150,
transientEnabled: false,
transientThresholdDb: 15,
transientReleaseMs: 80,
vadEnabled: false,
vadPositiveThreshold: 0.5,
vadNegativeThreshold: 0.3,
});
this.port.postMessage({
type: "log",
msg: "[NoiseGate worklet] constructor called, sampleRate=" + sampleRate,
});
this.port.postMessage({ type: "log", msg: "[NoiseGate worklet] constructor called, sampleRate=" + sampleRate });
}
private updateParams(p: NoiseGateParams): void {
this.noiseGateActive = p.noiseGateActive ?? true;
this.threshold = dbToLinear(p.threshold);
this.attackRate = 1.0 / ((p.attackMs / 1000) * sampleRate);
this.releaseRate = 1.0 / ((p.releaseMs / 1000) * sampleRate);
@@ -87,11 +287,17 @@ class NoiseGateProcessor extends AudioWorkletProcessor {
this.transientEnabled = p.transientEnabled;
this.transientRatio = dbToLinear(p.transientThresholdDb);
this.transientReleaseRate = 1.0 / ((p.transientReleaseMs / 1000) * sampleRate);
this.vadEnabled = p.vadEnabled ?? false;
this.vadPositiveThreshold = p.vadPositiveThreshold ?? 0.5;
this.vadNegativeThreshold = p.vadNegativeThreshold ?? 0.3;
// When VAD is disabled, open the gate immediately
if (!this.vadEnabled) this.vadGateOpen = true;
this.port.postMessage({
type: "log",
msg: "[NoiseGate worklet] params updated: threshold=" + p.threshold
+ " transientEnabled=" + p.transientEnabled
+ " transientThresholdDb=" + p.transientThresholdDb,
+ " vadEnabled=" + p.vadEnabled
+ " vadPos=" + p.vadPositiveThreshold
+ " vadNeg=" + p.vadNegativeThreshold,
});
}
@@ -114,41 +320,95 @@ class NoiseGateProcessor extends AudioWorkletProcessor {
// --- Transient suppressor ---
let transientGain = 1.0;
if (this.transientEnabled) {
// Update slow RMS background (exponential moving average of energy)
this.slowRms = Math.sqrt(
this.rmsCoeff * this.slowRms * this.slowRms +
(1.0 - this.rmsCoeff) * curLevel * curLevel,
);
const background = Math.max(this.slowRms, 1e-6);
if (curLevel > background * this.transientRatio) {
// Transient detected — instantly cut gain
this.transientAttenuation = 0.0;
} else {
// Release: ramp back toward 1
this.transientAttenuation = Math.min(1.0, this.transientAttenuation + this.transientReleaseRate);
this.transientAttenuation = Math.min(
1.0,
this.transientAttenuation + this.transientReleaseRate,
);
}
transientGain = this.transientAttenuation;
}
// --- Noise gate ---
if (curLevel > this.threshold && !this.isOpen) {
this.isOpen = true;
}
if (curLevel <= this.threshold && this.isOpen) {
this.heldTime = 0;
this.isOpen = false;
}
if (this.isOpen) {
this.gateAttenuation = Math.min(1.0, this.gateAttenuation + this.attackRate);
if (this.noiseGateActive) {
if (curLevel > this.threshold && !this.isOpen) {
this.isOpen = true;
}
if (curLevel <= this.threshold && this.isOpen) {
this.heldTime = 0;
this.isOpen = false;
}
if (this.isOpen) {
this.gateAttenuation = Math.min(
1.0,
this.gateAttenuation + this.attackRate,
);
} else {
this.heldTime += samplePeriod;
if (this.heldTime > this.holdTime) {
this.gateAttenuation = Math.max(
0.0,
this.gateAttenuation - this.releaseRate,
);
}
}
} else {
this.heldTime += samplePeriod;
if (this.heldTime > this.holdTime) {
this.gateAttenuation = Math.max(0.0, this.gateAttenuation - this.releaseRate);
this.gateAttenuation = 1.0;
}
// --- TEN-VAD in-worklet processing ---
// Accumulate raw mono samples with decRatio:1 decimation (48 kHz → 16 kHz).
// Every 256 output samples (16 ms) run the WASM VAD and update vadGateOpen.
if (this.vadEnabled && this.tenVadRuntime !== null) {
this.decAcc += input[0]?.[i] ?? 0;
this.decPhase++;
if (this.decPhase >= this.decRatio) {
this.decPhase = 0;
const avg = this.decAcc / this.decRatio;
this.decAcc = 0;
// Float32 [-1,1] → Int16 with clamping
const s16 =
avg >= 1.0
? 32767
: avg <= -1.0
? -32768
: Math.round(avg * 32767); // round to nearest for both signs
this.vadHopBuf[this.vadHopCount++] = s16;
if (this.vadHopCount >= 256) {
this.vadHopCount = 0;
const prob = this.tenVadRuntime.process(this.vadHopBuf);
if (!this.vadGateOpen && prob >= this.vadPositiveThreshold) {
this.vadGateOpen = true;
} else if (this.vadGateOpen && prob < this.vadNegativeThreshold) {
this.vadGateOpen = false;
}
}
}
}
const gain = this.gateAttenuation * transientGain;
// Ramp VAD attenuation toward target to avoid clicks
const vadTarget = this.vadGateOpen ? 1.0 : 0.0;
if (this.vadAttenuation < vadTarget) {
this.vadAttenuation = Math.min(
vadTarget,
this.vadAttenuation + this.vadRampRate,
);
} else if (this.vadAttenuation > vadTarget) {
this.vadAttenuation = Math.max(
vadTarget,
this.vadAttenuation - this.vadRampRate,
);
}
const gain = this.gateAttenuation * transientGain * this.vadAttenuation;
for (let c = 0; c < output.length; c++) {
const inCh = input[c] ?? input[0];
@@ -166,7 +426,8 @@ class NoiseGateProcessor extends AudioWorkletProcessor {
msg: "[NoiseGate worklet] gateOpen=" + this.isOpen
+ " gateAtten=" + this.gateAttenuation.toFixed(3)
+ " transientAtten=" + this.transientAttenuation.toFixed(3)
+ " slowRms=" + this.slowRms.toFixed(5),
+ " vadOpen=" + this.vadGateOpen
+ " vadAtten=" + this.vadAttenuation.toFixed(3),
});
}

View File

@@ -11,6 +11,7 @@ import { logger } from "matrix-js-sdk/lib/logger";
const log = logger.getChild("[NoiseGateTransformer]");
export interface NoiseGateParams {
noiseGateActive: boolean;
threshold: number; // dBFS — gate opens above this, closes below it
attackMs: number;
holdMs: number;
@@ -18,6 +19,10 @@ export interface NoiseGateParams {
transientEnabled: boolean;
transientThresholdDb: number; // dB above background RMS that triggers suppression
transientReleaseMs: number; // ms for suppression to fade after transient ends
// TEN-VAD params — processed entirely inside the AudioWorklet
vadEnabled: boolean;
vadPositiveThreshold: number; // open gate when isSpeech prob >= this (0–1)
vadNegativeThreshold: number; // close gate when isSpeech prob < this (0–1)
}
/**
@@ -42,13 +47,36 @@ export interface AudioTrackProcessor {
destroy(): Promise<void>;
}
// Cached compiled TEN-VAD module — compiled once, reused across processor restarts.
let tenVadModulePromise: Promise<WebAssembly.Module> | null = null;
function getTenVADModule(): Promise<WebAssembly.Module> {
if (!tenVadModulePromise) {
tenVadModulePromise = fetch("/vad/ten_vad.wasm")
.then((r) => {
if (!r.ok) throw new Error(`Failed to fetch ten_vad.wasm: ${r.status}`);
return r.arrayBuffer();
})
.then((buf) => WebAssembly.compile(buf))
.catch((e) => {
// Clear the cache so a retry is possible on next attach
tenVadModulePromise = null;
throw e;
});
}
return tenVadModulePromise;
}
/**
* LiveKit audio track processor that applies the OBS-style noise gate via
* AudioWorklet.
* LiveKit audio track processor that applies a noise gate, optional transient
* suppressor, and optional TEN-VAD gate via AudioWorklet.
*
* Builds the audio graph: sourceNode → workletNode → destinationNode, then
* exposes destinationNode's track as processedTrack for LiveKit to swap into
* the WebRTC sender via sender.replaceTrack(processedTrack).
* The TEN-VAD WASM module is fetched once, compiled, and passed to the worklet
* via processorOptions so it runs synchronously inside the audio thread —
* no IPC round-trip, ~16 ms VAD latency.
*
* Audio graph: sourceNode → workletNode → destinationNode
* processedTrack is destinationNode.stream.getAudioTracks()[0]
*/
export class NoiseGateTransformer implements AudioTrackProcessor {
public readonly name = "noise-gate";
@@ -68,6 +96,15 @@ export class NoiseGateTransformer implements AudioTrackProcessor {
log.info("init() called, audioContext state:", audioContext.state, "params:", this.params);
// Fetch and compile the TEN-VAD WASM module (cached after first call)
let tenVadModule: WebAssembly.Module | undefined;
try {
tenVadModule = await getTenVADModule();
log.info("TEN-VAD WASM module compiled");
} catch (e) {
log.warn("TEN-VAD WASM module unavailable — VAD disabled:", e);
}
const workletUrl = new URL(
"./NoiseGateProcessor.worklet.ts",
import.meta.url,
@@ -79,8 +116,15 @@ export class NoiseGateTransformer implements AudioTrackProcessor {
this.workletNode = new AudioWorkletNode(
audioContext,
"noise-gate-processor",
{
processorOptions: {
tenVadModule,
},
},
);
this.workletNode.port.onmessage = (e: MessageEvent<{ type: string; msg: string }>): void => {
this.workletNode.port.onmessage = (
e: MessageEvent<{ type: string; msg: string }>,
): void => {
if (e.data?.type === "log") log.debug(e.data.msg);
};
this.sendParams();
@@ -113,7 +157,7 @@ export class NoiseGateTransformer implements AudioTrackProcessor {
this.processedTrack = undefined;
}
/** Push updated gate parameters to the running worklet. */
/** Push updated gate/VAD parameters to the running worklet. */
public updateParams(params: NoiseGateParams): void {
this.params = { ...params };
this.sendParams();

View File

@@ -32,6 +32,9 @@ import {
transientSuppressorEnabled as transientSuppressorEnabledSetting,
transientThreshold as transientThresholdSetting,
transientRelease as transientReleaseSetting,
vadEnabled as vadEnabledSetting,
vadPositiveThreshold as vadPositiveThresholdSetting,
vadNegativeThreshold as vadNegativeThresholdSetting,
} from "./settings";
import { PreferencesSettingsTab } from "./PreferencesSettingsTab";
import { Slider } from "../Slider";
@@ -129,6 +132,13 @@ export const SettingsModal: FC<Props> = ({
const [showAdvancedGate, setShowAdvancedGate] = useState(false);
// Voice activity detection
const [vadActive, setVadActive] = useSetting(vadEnabledSetting);
const [vadPositiveThreshold, setVadPositiveThreshold] = useSetting(vadPositiveThresholdSetting);
const [vadPositiveThresholdRaw, setVadPositiveThresholdRaw] = useState(vadPositiveThreshold);
const [vadNegativeThreshold, setVadNegativeThreshold] = useSetting(vadNegativeThresholdSetting);
const [vadNegativeThresholdRaw, setVadNegativeThresholdRaw] = useState(vadNegativeThreshold);
// Transient suppressor settings
const [transientEnabled, setTransientEnabled] = useSetting(transientSuppressorEnabledSetting);
const [transientThreshold, setTransientThreshold] = useSetting(transientThresholdSetting);
@@ -310,6 +320,76 @@ export const SettingsModal: FC<Props> = ({
</>
)}
</div>
<div className={styles.noiseGateSection}>
<Heading
type="body"
weight="semibold"
size="sm"
as="h4"
className={styles.noiseGateHeading}
>
Voice Activity Detection
</Heading>
<Separator className={styles.noiseGateSeparator} />
<FieldRow>
<InputField
id="vadEnabled"
type="checkbox"
label="Enable voice activity detection"
description="Uses TEN-VAD to mute audio when no speech is detected (~16 ms latency)."
checked={vadActive}
onChange={(e: ChangeEvent<HTMLInputElement>): void =>
setVadActive(e.target.checked)
}
/>
</FieldRow>
{vadActive && (
<>
<div className={`${styles.volumeSlider} ${styles.thresholdSlider}`}>
<span className={styles.sliderLabel}>Open threshold: {Math.round(vadPositiveThresholdRaw * 100)}%</span>
<p>How confident the model must be before opening the gate.</p>
<Slider
label="VAD open threshold"
value={vadPositiveThresholdRaw}
onValueChange={setVadPositiveThresholdRaw}
onValueCommit={setVadPositiveThreshold}
min={0.1}
max={0.9}
step={0.05}
tooltip={false}
/>
</div>
<div className={styles.volumeSlider}>
<span className={styles.sliderLabel}>Close threshold: {Math.round(vadNegativeThresholdRaw * 100)}%</span>
<p>How low the probability must drop before closing the gate.</p>
<Slider
label="VAD close threshold"
value={vadNegativeThresholdRaw}
onValueChange={setVadNegativeThresholdRaw}
onValueCommit={setVadNegativeThreshold}
min={0.05}
max={0.7}
step={0.05}
tooltip={false}
/>
</div>
<div className={styles.restoreDefaults}>
<Button
kind="secondary"
size="sm"
onClick={(): void => {
const pos = vadPositiveThresholdSetting.defaultValue;
const neg = vadNegativeThresholdSetting.defaultValue;
setVadPositiveThreshold(pos); setVadPositiveThresholdRaw(pos);
setVadNegativeThreshold(neg); setVadNegativeThresholdRaw(neg);
}}
>
Restore defaults
</Button>
</div>
</>
)}
</div>
<div className={styles.noiseGateSection}>
<Heading
type="body"


@@ -145,6 +145,12 @@ export const noiseGateHold = new Setting<number>("noise-gate-hold", 200);
// Time in ms for the gate to fully close after hold expires
export const noiseGateRelease = new Setting<number>("noise-gate-release", 150);
export const vadEnabled = new Setting<boolean>("vad-enabled", false);
// Probability above which the VAD opens the gate (0–1)
export const vadPositiveThreshold = new Setting<number>("vad-positive-threshold", 0.2);
// Probability below which the VAD closes the gate (0–1)
export const vadNegativeThreshold = new Setting<number>("vad-negative-threshold", 0.1);
export const transientSuppressorEnabled = new Setting<boolean>(
"transient-suppressor-enabled",
false,
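
The two VAD threshold settings above describe a hysteresis gate: it opens once the speech probability rises above one value and closes only after it falls below a lower one. The in-worklet logic is not shown in this part of the diff, so the following is only a minimal sketch of that idea. `VadThresholds`, `nextGateState`, and `runGate` are hypothetical names that do not appear in the commit, and the strict comparisons are an assumption read from the setting comments.

```typescript
// Hypothetical illustration only; none of these names come from the commit.
interface VadThresholds {
  positive: number; // gate opens when probability rises above this (0-1)
  negative: number; // gate closes when probability drops below this (0-1)
}

// Compute the next gate state from the current state and one VAD probability.
function nextGateState(
  open: boolean,
  probability: number,
  t: VadThresholds,
): boolean {
  if (!open && probability > t.positive) return true; // closed -> open
  if (open && probability < t.negative) return false; // open -> closed
  return open; // between the thresholds: keep the current state
}

// Run the gate over a sequence of per-hop probabilities, starting closed.
function runGate(probabilities: number[], t: VadThresholds): boolean[] {
  let open = false;
  const states: boolean[] = [];
  for (const p of probabilities) {
    open = nextGateState(open, p, t);
    states.push(open);
  }
  return states;
}
```

With the defaults shown above (0.2 to open, 0.1 to close), a probability of 0.15 keeps an open gate open but does not open a closed one, which is what prevents rapid toggling near a single threshold.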


@@ -41,6 +41,9 @@ import {
transientSuppressorEnabled,
transientThreshold,
transientRelease,
vadEnabled,
vadPositiveThreshold,
vadNegativeThreshold,
} from "../../../settings/settings.ts";
import {
type NoiseGateParams,
@@ -437,6 +440,7 @@ export class Publisher {
let audioCtx: AudioContext | null = null;
const currentParams = (): NoiseGateParams => ({
noiseGateActive: noiseGateEnabled.getValue(),
threshold: noiseGateThreshold.getValue(),
attackMs: noiseGateAttack.getValue(),
holdMs: noiseGateHold.getValue(),
@@ -444,14 +448,18 @@ export class Publisher {
transientEnabled: transientSuppressorEnabled.getValue(),
transientThresholdDb: transientThreshold.getValue(),
transientReleaseMs: transientRelease.getValue(),
vadEnabled: vadEnabled.getValue(),
vadPositiveThreshold: vadPositiveThreshold.getValue(),
vadNegativeThreshold: vadNegativeThreshold.getValue(),
});
// Attach / detach processor when enabled state or the track changes.
combineLatest([audioTrack$, noiseGateEnabled.value$])
// Attach / detach processor when any processing feature changes or the track changes.
combineLatest([audioTrack$, noiseGateEnabled.value$, vadEnabled.value$, transientSuppressorEnabled.value$])
.pipe(scope.bind())
.subscribe(([audioTrack, enabled]) => {
.subscribe(([audioTrack, ngEnabled, vadActive, transientActive]) => {
if (!audioTrack) return;
if (enabled && !audioTrack.getProcessor()) {
const shouldAttach = ngEnabled || vadActive || transientActive;
if (shouldAttach && !audioTrack.getProcessor()) {
const params = currentParams();
this.logger.info("[NoiseGate] attaching processor, params:", params);
transformer = new NoiseGateTransformer(params);
@@ -459,17 +467,16 @@ export class Publisher {
this.logger.info("[NoiseGate] AudioContext state before resume:", audioCtx.state);
// eslint-disable-next-line @typescript-eslint/no-explicit-any
(audioTrack as any).setAudioContext(audioCtx);
audioCtx.resume().then(() => {
audioCtx.resume().then(async () => {
this.logger.info("[NoiseGate] AudioContext state after resume:", audioCtx?.state);
return audioTrack
// eslint-disable-next-line @typescript-eslint/no-explicit-any
.setProcessor(transformer as any);
// eslint-disable-next-line @typescript-eslint/no-explicit-any
return audioTrack.setProcessor(transformer as any);
}).then(() => {
this.logger.info("[NoiseGate] setProcessor resolved");
}).catch((e: unknown) => {
this.logger.error("[NoiseGate] setProcessor failed", e);
});
} else if (!enabled && audioTrack.getProcessor()) {
} else if (!shouldAttach && audioTrack.getProcessor()) {
this.logger.info("[NoiseGate] removing processor");
void audioTrack.stopProcessor();
void audioCtx?.close();
@@ -477,13 +484,21 @@ export class Publisher {
transformer = null;
// eslint-disable-next-line @typescript-eslint/no-explicit-any
(audioTrack as any).setAudioContext(undefined);
} else if (shouldAttach && audioTrack.getProcessor()) {
// Processor already attached — push updated params (e.g. noiseGateActive toggled)
transformer?.updateParams(currentParams());
} else {
this.logger.info("[NoiseGate] tick — enabled:", enabled, "hasProcessor:", !!audioTrack.getProcessor());
this.logger.info(
"[NoiseGate] tick — ngEnabled:", ngEnabled,
"vadActive:", vadActive,
"hasProcessor:", !!audioTrack.getProcessor(),
);
}
});
// Push param changes to the live worklet without recreating the processor.
// Push all param changes (noise gate + VAD) to the live worklet.
combineLatest([
noiseGateEnabled.value$,
noiseGateThreshold.value$,
noiseGateAttack.value$,
noiseGateHold.value$,
@@ -491,13 +506,22 @@ export class Publisher {
transientSuppressorEnabled.value$,
transientThreshold.value$,
transientRelease.value$,
vadEnabled.value$,
vadPositiveThreshold.value$,
vadNegativeThreshold.value$,
])
.pipe(scope.bind())
.subscribe(([threshold, attackMs, holdMs, releaseMs,
transientEnabled, transientThresholdDb, transientReleaseMs]) => {
.subscribe(([
noiseGateActive, threshold, attackMs, holdMs, releaseMs,
transientEnabled, transientThresholdDb, transientReleaseMs,
vadActive, vadPos, vadNeg,
]) => {
transformer?.updateParams({
threshold, attackMs, holdMs, releaseMs,
noiseGateActive, threshold, attackMs, holdMs, releaseMs,
transientEnabled, transientThresholdDb, transientReleaseMs,
vadEnabled: vadActive,
vadPositiveThreshold: vadPos,
vadNegativeThreshold: vadNeg,
});
});
}
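
The attach/detach subscription above reduces to a four-way decision on two inputs: whether any processing feature is enabled, and whether the track already has a processor. A minimal sketch of that decision as a pure function follows, assuming the same three flags as the diff. `ProcessorAction` and `decideProcessorAction` are hypothetical names introduced for illustration and are not part of the commit.

```typescript
// Hypothetical illustration only; these names do not appear in the commit.
type ProcessorAction = "attach" | "detach" | "update" | "none";

interface FeatureFlags {
  ngEnabled: boolean; // noise gate enabled
  vadActive: boolean; // voice activity detection enabled
  transientActive: boolean; // transient suppressor enabled
}

// Mirror the branch order of the subscription in the diff.
function decideProcessorAction(
  flags: FeatureFlags,
  hasProcessor: boolean,
): ProcessorAction {
  const shouldAttach =
    flags.ngEnabled || flags.vadActive || flags.transientActive;
  if (shouldAttach && !hasProcessor) return "attach"; // first feature turned on
  if (!shouldAttach && hasProcessor) return "detach"; // last feature turned off
  if (shouldAttach && hasProcessor) return "update"; // push fresh params
  return "none"; // nothing enabled and nothing attached
}
```

Keeping the decision separate from the side effects (creating the AudioContext, calling setProcessor) would make each branch easy to check in isolation. That is a design suggestion, not something the commit does.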


@@ -7,7 +7,6 @@ Please see LICENSE in the repository root for full details.
import {
loadEnv,
PluginOption,
searchForWorkspaceRoot,
type ConfigEnv,
type UserConfig,
@@ -34,7 +33,8 @@ export default ({
// In future we might be able to do what is needed via code splitting at
// build time.
process.env.VITE_PACKAGE = packageType ?? "full";
const plugins: PluginOption[] = [
const plugins = [
react(),
svgrPlugin({
svgrOptions: {