The legacy model is hardcoded to 1536 samples (96ms frames); v5 uses 512 samples (32ms), reducing gate open latency by 3x. Also lower default positive/negative thresholds to 0.2/0.1 so the gate opens at the first sign of speech rather than waiting for high model confidence. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
6.9 KiB
6.9 KiB