fix: use Silero v5 model for 32ms frames and lower default thresholds

The legacy model is hardcoded to 1536-sample frames (96ms); v5 uses
512-sample frames (32ms), cutting gate-open latency to about a third.
Also lower the default positive/negative thresholds to 0.2/0.1 so the
gate opens at the first sign of speech rather than waiting for high
model confidence.
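
The frame-size arithmetic and the effect of the two thresholds can be sketched as follows. This is an illustrative sketch only, not the @ricky0123/vad-web implementation: the 16 kHz sample rate is an assumption (the message gives only sample counts and durations, which are consistent with it), and `frameMs` and `gateStates` are hypothetical helper names.

```typescript
// Assumption: 16 kHz input audio. The commit message does not state the
// sample rate, but 1536 samples = 96 ms and 512 samples = 32 ms imply it.
const SAMPLE_RATE_HZ = 16000;

// Duration of one model frame in milliseconds (hypothetical helper).
function frameMs(frameSamples: number, sampleRateHz: number = SAMPLE_RATE_HZ): number {
  return (frameSamples * 1000) / sampleRateHz;
}

const legacyFrameMs = frameMs(1536); // legacy model frame
const v5FrameMs = frameMs(512);      // v5 model frame

// Simplified hysteresis gate (hypothetical): opens when the speech
// probability reaches `positive`, closes only when it falls below `negative`.
// A real VAD typically also waits some number of frames before closing.
function gateStates(probs: number[], positive = 0.2, negative = 0.1): boolean[] {
  let open = false;
  return probs.map((p) => {
    if (!open && p >= positive) {
      open = true;
    } else if (open && p < negative) {
      open = false;
    }
    return open;
  });
}

console.log(legacyFrameMs, v5FrameMs, legacyFrameMs / v5FrameMs);
console.log(gateStates([0.05, 0.25, 0.15, 0.05, 0.3]));
```

With the lower thresholds, a frame scoring 0.25 is already enough to open the gate, and the gate stays open at 0.15 because that value is still above the negative threshold.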

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
mk
2026-03-24 00:02:17 -03:00
parent 859db651e0
commit aff09d0e49
3 changed files with 9 additions and 3 deletions

@@ -79,6 +79,10 @@ export default ({
src: "node_modules/@ricky0123/vad-web/dist/silero_vad_legacy.onnx",
dest: "vad",
},
{
src: "node_modules/@ricky0123/vad-web/dist/silero_vad_v5.onnx",
dest: "vad",
},
{
src: "node_modules/onnxruntime-web/dist/*.wasm",
dest: "vad",