Voice Tags — Add Pauses, Emotions, and Sounds to Narration

Make AI narration more expressive by inserting inline pause, emotion, and sound tags directly in your script.

Default AI narration sounds professional but uniform: every sentence is delivered at the same pace and energy, with no breathing room. Voice tags let you add inline cues to your script so the narrator delivers individual lines with the expression you want.

Narration with an Excited emotion tag, a 0.5s pause, and a Laughs sound tag inline

Adding a voice tag

In the script editor, click the + button next to a sentence. A picker opens with the three tag categories. Choose a tag to insert it inline at that point in the script.

To edit or remove a tag, click the existing tag in the script.

⚠️ Legacy voices don’t support voice tags. If you’ve selected a legacy voice, an exclamation icon appears on each tag in the script, and those tags are ignored at render time. To make the tags take effect, open the speaker tag at the top of the script and pick a current voice. Any voice listed in the enhanced voice catalog supports voice tags.

The three tag types

Pause

Insert silence of a specific duration anywhere within or between sentences. This is useful for giving viewers a beat to absorb a key point, breaking up dense narration, or timing a punchline.

Click the pause tag in the script to open the duration slider. Drag to set the length; the slider moves in 0.25-second increments. To delete the tag, click Remove pause in the popover.

Pause-duration slider showing 1.25 seconds and a Remove pause option

Emotions

Apply an emotional register to a sentence. The currently available emotions are:

  • Curious — slight upward inflection, engaged delivery
  • Excited — energetic, higher pitch and pace

Sounds & pacing

Drop in a short non-verbal sound or a change in delivery at the point where you place the tag. The narrator performs it inline. The currently available options are:

  • Laughs — a single, brief laugh
  • Starts laughing — a laugh that builds into the next sentence
  • Wheezing — a held, breathless laugh
  • Whisper — a hushed delivery for the next sentence
  • Sighs — an audible exhale
  • Exhales — a breath out
  • Gulps — a swallow sound

🎯 Tag scope. A tag typically applies to the sentence it’s placed in. To carry the same delivery cue across multiple sentences, place the tag at the start of each one.

Tags vs. other narration controls

Voice tags work alongside the rest of the narration controls in Tutorial AI:

  • Pause tags vs. silence between sync markers — for pauses inside or between sentences of the narration, voice tags are usually simpler. For pacing the video itself (slowing scenes down to match a longer narration, or holding on a UI element), see the article How do I add a pause or slow down my video?
  • Emotions vs. picking a different voice — voice tags shape the delivery of the voice you’ve already picked. To change the underlying voice itself — language, accent, baseline speaking style — use the speaker tag at the top of the script.
  • Pronunciations — for fixing how a specific word is said (brand names, acronyms), use the Pronunciation Lexicon, not a voice tag.
