Let’s talk about using JavaScript for speech synthesis, also known as text to speech. You can use it to make the browser read text out loud, which is pretty neat. It’s all done with vanilla JavaScript and surprisingly easy to get started with, though you’ll start uncovering quirks as you dive deeper into it.
Naturally, you’ll want to turn your sound on for the demos in this article.

Quick Start
Here’s the code to make text to speech happen.
const utterance = new SpeechSynthesisUtterance('Hello!');
window.speechSynthesis.speak(utterance);
Seriously, that’s it! Here’s a demo to hear for yourself.
The demo does add a line of code worth discussing.
window.speechSynthesis.cancel();
If window.speechSynthesis is already speaking and you ask it to speak something else, the new text gets added to a queue. For the demo, I’d rather nix the queue and cancel any speaking in progress so that new text can be spoken immediately. That’s what cancel() does.
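The cancel-then-speak pattern is easy to wrap in a small helper. This is only a minimal sketch, not the demo’s actual code: speakNow is a hypothetical name, and the synthesizer is passed in as a parameter (normally window.speechSynthesis) so the function can be tried out with a stand-in object.

```javascript
// Hypothetical helper: drop anything queued or in progress, then speak.
// `synth` is expected to be window.speechSynthesis, or any object with the
// same cancel() and speak() methods.
function speakNow(synth, utterance) {
  synth.cancel();         // empties the queue and stops current speech
  synth.speak(utterance); // the new utterance is now first in line
}

// In a browser:
// speakNow(window.speechSynthesis, new SpeechSynthesisUtterance('Hello!'));
```

Keep in mind that cancel() removes everything in the queue, so this only makes sense when you never want utterances to line up.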
Customizing the Voice
You can change the voice used when speaking. The available voices will differ depending on browser and OS. Use the following to get an array of available voices (SpeechSynthesisVoice objects as documented here).
const voices = window.speechSynthesis.getVoices();
Be warned that while some voices are available immediately, other voices may be added asynchronously. Fortunately, you can detect when new voices are added.
window.speechSynthesis.onvoiceschanged = function() {
  const updatedVoices = window.speechSynthesis.getVoices();
};
To use a voice, get it from the array and set it on the SpeechSynthesisUtterance object before speaking. Here’s an example that uses the last voice in the array.
const voices = window.speechSynthesis.getVoices();
const lastVoice = voices[voices.length - 1];
const utterance = new SpeechSynthesisUtterance('Hello!');
utterance.voice = lastVoice; // change voice
window.speechSynthesis.speak(utterance);
You can also adjust pitch, rate, and volume properties. These all default to 1 if not specified.
const utterance = new SpeechSynthesisUtterance('Hello!');
utterance.pitch = 0.7; // a little lower
utterance.rate = 1.4; // a little faster
utterance.volume = 0.8; // a little quieter
window.speechSynthesis.speak(utterance);
Here’s where it gets messy. Different voices can have different ranges of usable values for pitch and rate. On top of that, different browsers have their own quirks when setting these properties. I’ll detail the quirks I found near the end of this article.
My testing found that it’s safest to keep both pitch and rate between 0.1 and 2, inclusive. Thankfully, volume is easy: 0 to 1, no surprises.
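If you take values from user input, one way to stay inside those ranges is to clamp them before setting them on the utterance. The sketch below is my own illustration: clamp and safeSettings are hypothetical helper names, and the limits are simply the ones from my testing above, not anything the API enforces.

```javascript
// Hypothetical helper: keep a number within [min, max].
function clamp(value, min, max) {
  return Math.min(max, Math.max(min, value));
}

// Hypothetical helper: limit settings to the ranges found safest above.
// Any setting left out falls back to the default of 1.
function safeSettings({ pitch = 1, rate = 1, volume = 1 } = {}) {
  return {
    pitch: clamp(pitch, 0.1, 2),
    rate: clamp(rate, 0.1, 2),
    volume: clamp(volume, 0, 1),
  };
}

// In a browser:
// const utterance = new SpeechSynthesisUtterance('Hello!');
// Object.assign(utterance, safeSettings({ pitch: 3, rate: 0.05 }));
// window.speechSynthesis.speak(utterance);
```

This does not account for individual voices that narrow the usable range further, so treat it as a starting point.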
Alright, let’s try all this stuff out!
Pausing and Resuming
Speaking can be paused and resumed. It’s pretty straightforward.
window.speechSynthesis.pause();
window.speechSynthesis.resume();
Sadly, pausing/resuming does not work on Android. I’ll talk more about it when I cover quirks later in this article.
To find out if speaking is paused, check window.speechSynthesis.paused. To find out if speaking is in progress, check window.speechSynthesis.speaking. Note that these are not mutually exclusive! Speaking is considered in progress even if it’s paused. Also, it’s possible to be in a paused state even when there is nothing to speak.
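Since the two flags overlap, it can help to fold them into a single label before deciding what the UI should show. Here’s a rough sketch of that idea. The function name playbackState and the four labels are my own invention, and synth stands in for window.speechSynthesis.

```javascript
// Hypothetical helper: combine `speaking` and `paused` into one label.
// `synth` is expected to be window.speechSynthesis.
function playbackState(synth) {
  if (synth.speaking && synth.paused) return 'paused';   // mid-utterance, on hold
  if (synth.speaking) return 'speaking';                 // actively talking
  if (synth.paused) return 'paused-idle';                // paused with nothing to say
  return 'idle';
}

// In a browser:
// console.log(playbackState(window.speechSynthesis));
```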
Events
You can add event listeners to a SpeechSynthesisUtterance object to react to the following events.
- 'start' fires when speaking starts.
- 'pause' fires when speaking is paused.
- 'resume' fires when speaking is resumed.
- 'end' fires when speaking reaches the end of the text. Browsers other than Safari will also fire this when speaking is cancelled.
- 'boundary' fires when speaking reaches a new word or sentence. It does not fire on Android, unfortunately. We’ll talk more about this one in a bit.
All of these are SpeechSynthesisEvent events that provide a few extra properties you might find handy. Here’s an example of handling one of them.
const utterance = new SpeechSynthesisUtterance('These are words!');
utterance.addEventListener('pause', function(event) {
  console.log('Paused after ' + event.elapsedTime + 'ms.');
});
window.speechSynthesis.speak(utterance);
'boundary' events come with a name property that will be set to either 'word' or 'sentence' as appropriate. 'boundary' events also provide a charIndex value in all browsers and a charLength value in all browsers except Safari. Putting these together, you can tell which word in the text is being spoken at that moment.
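To make that concrete, here’s a small sketch of pulling the current word out of the text. wordAt is a hypothetical helper, not part of the API, and the whitespace fallback for a missing charLength is just one assumption about how you might cope with Safari.

```javascript
// Hypothetical helper: return the word that starts at charIndex.
// Uses charLength when the event provides a positive value; otherwise
// (for example in Safari) reads up to the next whitespace character.
function wordAt(text, charIndex, charLength) {
  if (charLength > 0) {
    return text.slice(charIndex, charIndex + charLength);
  }
  const match = text.slice(charIndex).match(/^\S+/);
  return match ? match[0] : '';
}

// In a browser:
// const utterance = new SpeechSynthesisUtterance('These are words!');
// utterance.addEventListener('boundary', function(event) {
//   console.log(wordAt(utterance.text, event.charIndex, event.charLength));
// });
// window.speechSynthesis.speak(utterance);
```

Note that the fallback keeps any punctuation attached to the word, so the two paths can give slightly different results for the same position.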
Our final demo has buttons to play, pause, and stop speaking. It listens for 'start', 'pause', 'resume', and 'end' events to enable/disable buttons when appropriate and uses 'boundary' events to highlight the current word.
Appendix of Quirks
JavaScript speech synthesis has pretty good browser support, but is still considered experimental. I found it easy to use at a basic level, but the more I dug in, the quirkier things got. I’ll list the bugs and gotchas I found here to hopefully help save some of your sanity.
Using a different SpeechSynthesisVoice for the voice:
- [Chrome/Firefox on Android] The voice cannot be changed from the device’s default voice.
- [Safari] addEventListener is undefined on window.speechSynthesis, so you can’t use it to listen for the 'voiceschanged' event. Use onvoiceschanged instead, as documented here.
- [Safari] Be mindful that SpeechSynthesisVoice.name is not always unique. If you need a unique identifier for voices, use SpeechSynthesisVoice.voiceURI.
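Pulling the voice notes above together, here’s a sketch of waiting for voices via onvoiceschanged and then looking one up by voiceURI. loadVoices and findVoiceByURI are hypothetical names of my own, and synth stands in for window.speechSynthesis. It resolves with the first non-empty list it sees, so voices added later would be missed, and it never resolves if the browser never reports any voices.

```javascript
// Hypothetical helper: resolve with the voice list once one is available.
// `synth` is expected to be window.speechSynthesis.
function loadVoices(synth) {
  return new Promise(function(resolve) {
    const voices = synth.getVoices();
    if (voices.length > 0) {
      resolve(voices);
      return;
    }
    // Assign the handler property instead of calling addEventListener,
    // which the Safari note above says is unavailable.
    synth.onvoiceschanged = function() {
      resolve(synth.getVoices());
    };
  });
}

// Hypothetical helper: find a voice by voiceURI, since names can repeat.
// Returns null when nothing matches.
function findVoiceByURI(voices, voiceURI) {
  return voices.find(function(voice) {
    return voice.voiceURI === voiceURI;
  }) || null;
}

// In a browser (savedURI is a voiceURI you stored earlier):
// loadVoices(window.speechSynthesis).then(function(voices) {
//   const voice = findVoiceByURI(voices, savedURI);
//   if (voice) utterance.voice = voice;
// });
```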
Setting SpeechSynthesisUtterance properties:
- [Safari] All values for pitch at 0.5 and below sound the same.
- [Safari] If speaking happens with a rate of 0.5 or higher, and the rate is then changed to below 0.5, subsequent speaking will retain the previous higher rate.
- [Edge] When using a non-local voice, any value set for pitch is ignored and will always sound like it’s at 1.
- [Chrome] When using a non-local voice, setting pitch to 0 will revert to 1.
- [Chrome] When using a non-local voice, speaking will not happen if rate is higher than 2.
- [all browsers] Some voices may further constrain the usable range of values for pitch and rate.
Handling SpeechSynthesisEvent events:
- [Chrome/Firefox on Android] The 'boundary' event does not fire.
- [Safari] The 'end' event does not fire if speaking is stopped via cancel().
- [Safari] charLength is not provided on 'boundary' events.
- [macOS] The 'boundary' event will never have a name value of 'sentence'. Instead, you’ll see an extra event with 'word'.
- [Chrome/Edge on Windows] 'boundary' events that happen for a sentence will provide a charLength of 0, not the length of the sentence.
- [Chrome/Edge/Safari] charIndex is always 0 for events that are not 'boundary' events.
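The Safari 'end' quirk matters if you rely on that event to reset your UI. One possible workaround, sketched below, is to run your cleanup yourself right after cancel() and guard it so it only runs once in browsers that do fire 'end'. speakWithCleanup is a hypothetical helper of my own, not how the demo is written, and synth stands in for window.speechSynthesis.

```javascript
// Hypothetical helper: speak, and guarantee `onFinished` runs exactly once,
// whether speech ends naturally, is cancelled with an 'end' event, or is
// cancelled without one. Returns a function that stops speaking.
function speakWithCleanup(synth, utterance, onFinished) {
  let finished = false;

  function finish() {
    if (finished) return; // ignore repeat calls
    finished = true;
    onFinished();
  }

  utterance.addEventListener('end', finish);
  synth.speak(utterance);

  return function stop() {
    synth.cancel(); // note: this clears the whole queue, not just this utterance
    finish();       // covers browsers that skip 'end' after cancel()
  };
}

// In a browser (resetButtons is whatever cleanup your page needs):
// const stop = speakWithCleanup(window.speechSynthesis, utterance, resetButtons);
// stopButton.addEventListener('click', stop);
```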
Other stuff:
- [Chrome/Firefox on Android] Pausing and resuming does not work.
- [iOS] Speaking is inaudible when the “soft mute” switch is on.
- [Android/iOS] There are a couple more issues on mobile devices that I haven’t mentioned specifically, but you can read about them in this helpful write-up.
Conclusion
Whew! That’s a lot of quirks. It’s disappointing to see so many issues, especially the ones on Android, but again, this is an experimental feature.
JavaScript text to speech at its basic level actually has really good cross-browser compatibility. Just be careful if you go beyond the basics. I still had a lot of fun playing with it. Hopefully future development will iron out all these quirks.
