Frequently Asked Questions

ResponsiveVoice is a TypeScript-first text-to-speech library for browsers and Node.js. The questions we hear most often are answered below.

ResponsiveVoice is free to use. The library is open source and available from npm or a CDN. Create a free account at responsivevoice.org/register (it takes a few seconds) to unlock free server voices for your site. Without an account, the library runs in demo mode. Paid plans add features such as streaming, and premium voice providers (Microsoft Azure, OpenAI, Google Cloud) are supported via Bring Your Own Key (BYOK).

Which browsers and runtimes are supported?

ResponsiveVoice runs in all evergreen browsers and in Node.js, using the native Web Speech API where available and server voices (with an account) otherwise. See the Browser Support guide for the full compatibility matrix.
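Detecting whether the native path is available can be sketched as follows. This is a minimal illustration, not ResponsiveVoice's own detection logic: `hasWebSpeech` and `pickBackend` are hypothetical helpers, and the check only looks for the two standard globals that Web Speech synthesis exposes (`speechSynthesis` and `SpeechSynthesisUtterance`).

```typescript
// Hypothetical helpers for illustration; not part of the ResponsiveVoice API.

// True when the given global scope exposes Web Speech synthesis.
function hasWebSpeech(scope: object): boolean {
  return "speechSynthesis" in scope && "SpeechSynthesisUtterance" in scope;
}

// Assumption: prefer the native engine and otherwise fall back to
// server voices (which require an account).
function pickBackend(scope: object): "native" | "server" {
  return hasWebSpeech(scope) ? "native" : "server";
}
```

Calling `pickBackend(globalThis)` would return `"server"` in a plain Node.js process, where neither global is defined.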

You need a key to use server voices, and it's free: register an account to get one. The key is a website identity, not a secret. It's tied to your registered domain, so it's safe to include in client-side code. Without a key, the library runs in demo mode.

Does ResponsiveVoice support streaming audio?

Yes — on higher-tier plans. Audio is delivered as it's synthesized via HTTP audio streaming or WebSocket streaming, so playback can start before the full clip is ready. Other tiers return the complete audio in a single response.
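The idea of starting playback before the full clip arrives can be sketched generically. The example below is not the ResponsiveVoice streaming API: `forEachAudioChunk` is a hypothetical helper that hands each chunk to a callback as soon as it arrives, and the chunk source is assumed to be any `AsyncIterable<Uint8Array>`.

```typescript
// Hypothetical helper for illustration; not part of the ResponsiveVoice API.
// Passes each audio chunk to `onChunk` as soon as it arrives and resolves
// with the total number of bytes received once the stream ends.
async function forEachAudioChunk(
  chunks: AsyncIterable<Uint8Array>,
  onChunk: (chunk: Uint8Array) => void,
): Promise<number> {
  let totalBytes = 0;
  for await (const chunk of chunks) {
    onChunk(chunk); // a player could begin decoding here, before the stream ends
    totalBytes += chunk.byteLength;
  }
  return totalBytes;
}
```

In a browser, the callback might append each chunk to a Media Source Extensions `SourceBuffer`; with non-streaming delivery, the same callback would simply run once with the complete clip.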

How many voices and languages are available?

The base catalog includes around 100 voices across many languages and genders. The voice resolution chain chooses which one is used for a request (native Web Speech or a fallback). Bring Your Own Key (BYOK) providers add their own voices on top, growing the catalog to thousands.

ResponsiveVoice handles mobile audio restrictions automatically. iOS and some other mobile browsers require a user gesture before audio can play, so the library shows a built-in permission prompt that captures the first tap and unlocks audio. You don't need to add your own button. The prompt is customizable, or you can disable it and trigger speech from your own UI instead.

To improve pronunciation and pacing, start with punctuation: commas and periods add natural pauses and shape emphasis. For tricky pronunciations, respell a word phonetically, add hyphens between syllables, or spell it out letter by letter. You can also configure text replacements for consistent pronunciation of names and domain terms.
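The effect of text replacements can be illustrated with a small preprocessing step. This is a sketch under assumptions, not the library's configuration format: `Replacement` and `applyReplacements` are hypothetical names, and the rules match whole words case-insensitively.

```typescript
// Hypothetical types and helper for illustration; not the ResponsiveVoice API.
interface Replacement {
  written: string; // the term as it appears in the text
  spoken: string;  // the phonetic respelling to send to the voice
}

// Escape characters that have special meaning in a regular expression.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Apply each rule in order, replacing whole-word, case-insensitive matches.
function applyReplacements(text: string, rules: Replacement[]): string {
  return rules.reduce(
    (out, { written, spoken }) =>
      out.replace(new RegExp(`\\b${escapeRegExp(written)}\\b`, "gi"), () => spoken),
    text,
  );
}

const rules: Replacement[] = [
  { written: "SQL", spoken: "sequel" },
  { written: "nginx", spoken: "engine-x" },
];

console.log(applyReplacements("Query the SQL server behind nginx.", rules));
// prints "Query the sequel server behind engine-x."
```

Keeping the rules in one list means a name or domain term is respelled the same way everywhere it appears.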

Can I change the speaking rate, pitch, and volume?

Yes. Set rate and pitch (0–2, default 1) and volume (0–1, default 1) per request. Native browser voices (Web Speech API) apply them directly. For voices synthesized server-side (API-only voices, or voices the browser lacks), how each adjustment applies depends on the provider.
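The ranges and defaults above can be captured in a small validation step. The sketch below is illustrative only: `SpeechParams` and `normalizeSpeechParams` are hypothetical names, and clamping out-of-range values is an assumption rather than documented library behavior.

```typescript
// Hypothetical types and helper for illustration; not the ResponsiveVoice API.
interface SpeechParams {
  rate: number;   // 0–2, default 1
  pitch: number;  // 0–2, default 1
  volume: number; // 0–1, default 1
}

// Clamp a value into [min, max]; fall back to 1 (the default) for
// missing or non-finite input.
function clampOrDefault(value: number | undefined, min: number, max: number): number {
  if (value === undefined || !Number.isFinite(value)) return 1;
  return Math.min(max, Math.max(min, value));
}

// Assumption: out-of-range values are clamped rather than rejected.
function normalizeSpeechParams(input: Partial<SpeechParams> = {}): SpeechParams {
  return {
    rate: clampOrDefault(input.rate, 0, 2),
    pitch: clampOrDefault(input.pitch, 0, 2),
    volume: clampOrDefault(input.volume, 0, 1),
  };
}

console.log(normalizeSpeechParams({ rate: 3, volume: 0.5 }));
// prints { rate: 2, pitch: 1, volume: 0.5 }
```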

ResponsiveVoice does not currently accept SSML. SSML (voice markup) is part of the Web Speech API specification, but no current browser implements it, and there's no announced commitment to add it, so ResponsiveVoice takes plain text. Shape delivery with the rate, pitch, and volume parameters, and use punctuation and text replacements for pacing and pronunciation. If browsers add SSML support, ResponsiveVoice will adopt it.