This article provides a step-by-step guide on implementing a simple text-to-speech (TTS) application using C# and the System.Speech.Synthesis namespace. It explains the code used, including creating a SpeechSynthesizer instance and synthesizing speech from text, and offers suggestions for exploring more advanced TTS features.
Introduction
This tip explains how to implement a text-to-speech (TTS) application in C# using the System.Speech.Synthesis namespace. TTS technology has many practical use cases, such as in accessibility tools and speech-enabled applications. By following the step-by-step guide provided, readers will be able to create a simple console application that synthesizes speech from text. The article also provides an explanation of the code used and offers suggestions for exploring more advanced features of the SpeechSynthesizer class.
Implementing Text-to-Speech in C# using System.Speech.Synthesis
Text-to-speech (TTS) technology has been around for a while and has found many use cases, such as in language learning, accessibility tools for visually impaired individuals, and in speech-enabled applications. In this article, we will explore how to implement a simple TTS application using C# and the System.Speech.Synthesis namespace.
Prerequisites
Before we begin, you need to have the following on your machine:
Windows (System.Speech drives the Windows speech engine and is supported only on Windows)
.NET Framework 4.6.1 or higher
Visual Studio 2017 or higher
Implementation
We will be using the System.Speech.Synthesis namespace, which provides classes for synthesizing speech from text. Follow the steps below to create a console application in C# and implement TTS.
Open Visual Studio and create a new Console Application project.
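Add a reference to the System.Speech assembly. Right-click on the project in Solution Explorer, select Add Reference, and then choose System.Speech from the list of assemblies. If your project targets .NET (formerly .NET Core) rather than .NET Framework, install the System.Speech NuGet package instead.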
In the Program.cs file, add the following code:
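```csharp
using System;
using System.IO;
using System.Speech.Synthesis;
using System.Threading;

class Program
{
    static void Main(string[] args)
    {
        // Disposing the synthesizer releases its resources, including the WAV file opened below.
        using (SpeechSynthesizer synth = new SpeechSynthesizer())
        {
            // Basic TTS: speak through the default audio device.
            synth.SetOutputToDefaultAudioDevice();
            synth.Speak("Hello, world!");

            // Changing the voice: picks the closest installed match for the hints.
            synth.SelectVoiceByHints(VoiceGender.Female, VoiceAge.Adult);
            synth.Speak("Hello, I am a female voice!");

            // Changing the rate and volume: Rate ranges from -10 to 10 (default 0), Volume from 0 to 100.
            synth.Rate = -2;
            synth.Volume = 100;
            synth.Speak("Hello, I am speaking slower and louder!");

            // Pausing and resuming: Speak() is synchronous, so the sentence has already finished
            // when Pause() runs; the 3-second gap comes from Thread.Sleep.
            synth.Speak("Hello, I will pause for 3 seconds now.");
            synth.Pause();
            Thread.Sleep(3000);
            synth.Resume();
            synth.Speak("I am back!");

            // Saving speech to a WAV file in the current working directory.
            synth.SetOutputToWaveFile("output.wav");
            synth.Speak("Hello, I am saving my speech to a WAV file!");

            // Streaming speech into memory. ToArray() returns exactly the bytes written,
            // whereas GetBuffer() would also return unused buffer capacity.
            using (MemoryStream stream = new MemoryStream())
            {
                synth.SetOutputToWaveStream(stream);
                synth.Speak("Hello, I am being streamed to a memory stream!");
                synth.SetOutputToNull(); // stop writing to the stream before reading it
                byte[] speechBytes = stream.ToArray();
                Console.WriteLine($"Captured {speechBytes.Length} bytes of WAV data.");
            }

            // Changing the voice and emphasis with PromptBuilder. Every StartVoice/StartStyle
            // needs a matching EndVoice/EndStyle, otherwise Speak() throws
            // "Cannot generate SSML data: Voice element not closed."
            PromptBuilder builder = new PromptBuilder();
            builder.StartVoice(VoiceGender.Female, VoiceAge.Adult, 1);
            builder.AppendText("Hello, my name is Emily.");
            builder.EndVoice();

            builder.StartVoice(VoiceGender.Female, VoiceAge.Teen, 2);
            builder.AppendText("I am from New York City.");
            builder.StartStyle(new PromptStyle() { Emphasis = PromptEmphasis.Strong });
            builder.AppendText("I really love chocolate!");
            builder.EndStyle();
            builder.StartStyle(new PromptStyle() { Emphasis = PromptEmphasis.Reduced });
            builder.AppendText("But I'm allergic to it...");
            builder.EndStyle();
            builder.EndVoice();

            // Send the prompt to the speakers; the previous output target was not the audio device.
            synth.SetOutputToDefaultAudioDevice();
            synth.Speak(builder);
        }

        Console.ReadLine();
    }
}
```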
Code Outline
Basic TTS
Creates a SpeechSynthesizer instance and synthesizes the text "Hello, world!" using the default audio device.
Changing the Voice
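Asks for a female adult voice with SelectVoiceByHints and synthesizes the text "Hello, I am a female voice!" using that voice. The hints select the closest match among the installed voices; the method does not report an error when nothing matches, it simply uses another installed voice.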
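Because SelectVoiceByHints falls back silently, a missing voice (for example a German one) is easy to overlook. The sketch below uses a hypothetical VoicePicker helper, not part of System.Speech, built on the real GetInstalledVoices and SelectVoice methods to list the voices System.Speech can see and to select one for a specific culture. An assumption worth checking on your own machine: voices added through Windows Settings are often registered only for the newer OneCore speech platform and may not appear in this list at all.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Speech.Synthesis;

// Hypothetical helper (not part of System.Speech) for inspecting and choosing voices.
static class VoicePicker
{
    // Prints every voice System.Speech can see, with its culture, gender and age.
    public static void ListVoices(SpeechSynthesizer synth)
    {
        foreach (InstalledVoice voice in synth.GetInstalledVoices())
        {
            VoiceInfo info = voice.VoiceInfo;
            Console.WriteLine("{0} | {1} | {2} | {3} | enabled: {4}",
                info.Name, info.Culture.Name, info.Gender, info.Age, voice.Enabled);
        }
    }

    // Selects the first enabled voice for the given culture.
    // Returns false and leaves the current voice unchanged if none is installed.
    public static bool TrySelectVoice(SpeechSynthesizer synth, CultureInfo culture)
    {
        InstalledVoice match = synth.GetInstalledVoices(culture).FirstOrDefault(v => v.Enabled);
        if (match == null)
        {
            return false;
        }
        synth.SelectVoice(match.VoiceInfo.Name);
        return true;
    }
}
```

For example, call VoicePicker.TrySelectVoice(synth, new CultureInfo("de-DE")) before speaking German text, and show a message instead of speaking if it returns false.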
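Changing the Rate and Volume
Sets the speech rate to -2 (slower than the default of 0, on a scale from -10 to 10) and the volume to 100 (the top of its 0 to 100 range), and synthesizes the text "Hello, I am speaking slower and louder!". Both settings remain in effect for every later Speak call.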
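Since Rate and Volume are properties of the synthesizer, they also apply to everything spoken later, including the WAV and memory-stream output. If only one sentence should change, a small helper can save and restore the previous values. SpeakWith below is a hypothetical name used for illustration; inside a PromptBuilder, the Rate and Volume properties of PromptStyle offer another way to scope such changes to part of a prompt.

```csharp
using System.Speech.Synthesis;

// Hypothetical helper: speaks one sentence with temporary prosody settings.
static class ProsodyHelper
{
    public static void SpeakWith(SpeechSynthesizer synth, string text, int rate, int volume)
    {
        int oldRate = synth.Rate;
        int oldVolume = synth.Volume;
        try
        {
            synth.Rate = rate;     // -10 (slowest) to 10 (fastest)
            synth.Volume = volume; // 0 to 100
            synth.Speak(text);
        }
        finally
        {
            // Restore the previous settings even if Speak throws.
            synth.Rate = oldRate;
            synth.Volume = oldVolume;
        }
    }
}
```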
Pausing and Resuming Speech
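Synthesizes the text "Hello, I will pause for 3 seconds now.", waits 3 seconds, and then synthesizes the text "I am back!". Because Speak() is synchronous, the first sentence has already finished when Pause() is called, so the silence actually comes from Thread.Sleep(3000); to pause in the middle of an utterance, start it with SpeakAsync() instead.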
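A minimal sketch of pausing mid-sentence with the asynchronous API follows. PauseDemo and SpeakWithPause are hypothetical names; SpeakAsync, Pause, Resume and Prompt.IsCompleted are existing SpeechSynthesizer and Prompt members. How much of the text is heard before the pause depends on the voice, the rate, and the playMs value you pass.

```csharp
using System.Speech.Synthesis;
using System.Threading;

// Hypothetical helper: pauses speech while it is still playing.
static class PauseDemo
{
    // Starts speaking asynchronously, pauses after playMs milliseconds for pauseMs
    // milliseconds, resumes, and then waits until the prompt has finished.
    public static void SpeakWithPause(SpeechSynthesizer synth, string text, int playMs, int pauseMs)
    {
        Prompt prompt = synth.SpeakAsync(text); // returns immediately
        Thread.Sleep(playMs);                   // let part of the text play
        synth.Pause();                          // halt output
        Thread.Sleep(pauseMs);
        synth.Resume();                         // continue speaking

        // Block until the asynchronous prompt has completed.
        while (!prompt.IsCompleted)
        {
            Thread.Sleep(50);
        }
    }
}
```

For example, PauseDemo.SpeakWithPause(synth, "Hello, I will pause for three seconds in the middle of this sentence.", 1500, 3000) should stop partway through the sentence rather than after it.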
Saving Speech to a WAV File
Sets the output of the SpeechSynthesizer to a WAV file named "output.wav", and synthesizes the text "Hello, I am saving my speech to a WAV file!".
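If you need a particular audio format, or want the file closed as soon as it is written, a separate short-lived synthesizer is convenient. The sketch below uses a hypothetical SaveToWav helper with the existing SetOutputToWaveFile(string, SpeechAudioFormatInfo) overload; the 16 kHz, 16-bit mono format is an arbitrary example choice.

```csharp
using System.Speech.AudioFormat;
using System.Speech.Synthesis;

// Hypothetical helper: renders text to a WAV file with an explicit audio format.
static class WavExport
{
    public static void SaveToWav(string text, string path)
    {
        // 16 kHz, 16-bit, mono PCM; use the single-argument overload for the engine default.
        var format = new SpeechAudioFormatInfo(16000, AudioBitsPerSample.Sixteen, AudioChannel.Mono);

        using (var synth = new SpeechSynthesizer())
        {
            synth.SetOutputToWaveFile(path, format);
            synth.Speak(text);
        } // disposing the synthesizer releases the file
    }
}
```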
Setting the Speech Stream
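Sets the output of the SpeechSynthesizer to a memory stream, synthesizes the text "Hello, I am being streamed to a memory stream!", and copies the resulting WAV bytes out of the stream. Note that MemoryStream.GetBuffer() returns the stream's entire internal buffer, including unused capacity at the end, whereas ToArray() returns only the bytes actually written. The example does not use speechBytes any further; in a real application you might save them, send them over a network, or play them back.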
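One thing you might do with the captured bytes is check which audio format the engine produced by reading the RIFF/WAVE header. WavHeader and WavInfo below are hypothetical helpers, not part of System.Speech. The parser assumes the standard RIFF chunk layout and a little-endian machine, because BitConverter uses the host's byte order.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical result type describing a PCM WAV stream.
class WavInfo
{
    public int Channels { get; set; }
    public int SampleRate { get; set; }
    public int BitsPerSample { get; set; }
    public int DataBytes { get; set; }
}

// Hypothetical helper: reads format fields from RIFF/WAVE bytes (little-endian host assumed).
static class WavHeader
{
    public static WavInfo Parse(byte[] wav)
    {
        if (wav.Length < 12
            || Encoding.ASCII.GetString(wav, 0, 4) != "RIFF"
            || Encoding.ASCII.GetString(wav, 8, 4) != "WAVE")
        {
            throw new InvalidDataException("Not a RIFF/WAVE stream.");
        }

        var info = new WavInfo();
        int pos = 12; // first chunk follows "RIFF", the overall size, and "WAVE"
        // Each chunk: 4-byte id, 4-byte payload size, then the payload.
        while (pos + 8 <= wav.Length)
        {
            string id = Encoding.ASCII.GetString(wav, pos, 4);
            int size = BitConverter.ToInt32(wav, pos + 4);
            if (size < 0) throw new InvalidDataException("Invalid chunk size.");

            if (id == "fmt " && pos + 24 <= wav.Length)
            {
                info.Channels = BitConverter.ToInt16(wav, pos + 10);
                info.SampleRate = BitConverter.ToInt32(wav, pos + 12);
                info.BitsPerSample = BitConverter.ToInt16(wav, pos + 22);
            }
            else if (id == "data")
            {
                info.DataBytes = size;
                return info;
            }

            long next = (long)pos + 8 + size + (size & 1); // payloads are padded to an even length
            if (next > wav.Length) break;
            pos = (int)next;
        }
        throw new InvalidDataException("No data chunk found.");
    }
}
```

For example, WavHeader.Parse(stream.ToArray()) reports the channel count, sample rate and bit depth of the streamed speech. To be safe, you may want to redirect the output away from the stream (for example with SetOutputToNull()) before reading it, so the synthesizer is no longer writing to it.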
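Changing the Voice and Emphasis
Uses the PromptBuilder class to create a more complex prompt: it changes the voice for parts of the prompt and applies strong and reduced emphasis to individual sentences. Every StartVoice() and StartStyle() call must be closed with a matching EndVoice() or EndStyle(); otherwise Speak() throws an exception such as "Cannot generate SSML data: Voice element not closed." Because the output was previously redirected to a memory stream, call SetOutputToDefaultAudioDevice() before synthesizing the prompt with the SpeechSynthesizer if you want to hear it.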
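PromptBuilder is a convenience layer over SSML (Speech Synthesis Markup Language); calling builder.ToXml() shows the markup it generates, which is a quick way to spot an unclosed voice element. You can also write the SSML yourself and pass it to SpeakSsml. The sketch below builds roughly the same prompt with LINQ to XML; SsmlSketch is a hypothetical name, and the age values 30 and 15 are assumptions meant to mirror VoiceAge.Adult and VoiceAge.Teen. As with the hint-based methods, the engine may use a different voice when no installed voice matches.

```csharp
using System.Xml.Linq;

// Hypothetical helper: builds SSML roughly equivalent to the PromptBuilder example.
static class SsmlSketch
{
    static readonly XNamespace Ssml = "http://www.w3.org/2001/10/synthesis";

    public static string Build()
    {
        var speak = new XElement(Ssml + "speak",
            new XAttribute("version", "1.0"),
            new XAttribute(XNamespace.Xml + "lang", "en-US"),
            // First voice section; the element structure guarantees it is closed.
            new XElement(Ssml + "voice",
                new XAttribute("gender", "female"),
                new XAttribute("age", "30"),
                new XAttribute("variant", "1"),
                "Hello, my name is Emily."),
            // Second voice section containing strongly and weakly emphasized text.
            new XElement(Ssml + "voice",
                new XAttribute("gender", "female"),
                new XAttribute("age", "15"),
                new XAttribute("variant", "2"),
                "I am from New York City. ",
                new XElement(Ssml + "emphasis", new XAttribute("level", "strong"), "I really love chocolate!"),
                new XElement(Ssml + "emphasis", new XAttribute("level", "reduced"), "But I'm allergic to it...")));
        return speak.ToString();
    }
}
```

Usage would be synth.SpeakSsml(SsmlSketch.Build()), after selecting the output device as described above.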
These code examples demonstrate some of the basic and advanced functionality of the SpeechSynthesizer class, including changing the voice, rate and volume, pausing and resuming speech, and saving synthesized speech to a file or memory stream.
History
16th March, 2023: Initial version
As an Artificial Intelligence Engineer with over 15 years of experience, I excel in innovating, designing, and developing state-of-the-art technology. My expertise lies in coding complex algorithms, and engineering robots to automate tasks with precision and efficiency. I'm a passionate coder who is always exploring cutting-edge technologies and pushing the boundaries of what's possible.
Hello,
I tried adding English, French and German voices to the project. It worked for English and French pronunciation, but not for German: the German text was still spoken with the French voice. How do you install voices for different languages on Windows? Does anybody have experience with this problem?
Here is the modified code I tested:
```csharp
using System;
using System.IO;
using System.Speech.Synthesis;

class Program
{
    static System.Globalization.CultureInfo MyCultureInfo = new System.Globalization.CultureInfo("en-US");
    static System.Globalization.CultureInfo MyCultureInfoGerman = new System.Globalization.CultureInfo("de-DE");

    static void Main(string[] args)
    {
        SpeechSynthesizer synthD = new SpeechSynthesizer();
        synthD.SelectVoiceByHints(VoiceGender.Male, VoiceAge.Adult, 1, MyCultureInfoGerman);
        synthD.SetOutputToDefaultAudioDevice();
        // "Hello, please start the initialization now if the machine is ready!"
        synthD.Speak("Hallo, bitte starten Sie jetzt die Initialisierung, falls die Maschine bereit ist!");

        SpeechSynthesizer synth = new SpeechSynthesizer();
        synth.SelectVoiceByHints(VoiceGender.Male, VoiceAge.Adult, 1, MyCultureInfo);
        synth.SetOutputToDefaultAudioDevice();
        synth.Speak("Hello, world!");

        synth.SelectVoiceByHints(VoiceGender.Female, VoiceAge.Adult, 0, MyCultureInfo);
        synth.Rate = 2;
        synth.Volume = 40;
        synth.Speak("Hello, I am a female voice!");

        synth.Rate = -2;
        synth.Volume = 100;
        synth.Speak("Hello, I am speaking slower and louder!");

        synth.Speak("Hello, I will pause for 3 seconds now.");
        synth.Pause();
        System.Threading.Thread.Sleep(3000);
        synth.Resume();
        synth.Speak("I am back!");

        synth.SetOutputToWaveFile("output.wav");
        synth.Speak("Hello, I am saving my speech to a WAV file!");

        MemoryStream stream = new MemoryStream();
        synth.SetOutputToWaveStream(stream);
        synth.Speak("Hello, I am being streamed to a memory stream!");
        byte[] speechBytes = stream.GetBuffer();

        PromptBuilder builder = new PromptBuilder(MyCultureInfo);
        builder.StartVoice(VoiceGender.Female, VoiceAge.Adult, 1);
        builder.AppendText("Hello, my name is Emily.");
        builder.EndVoice();
        builder.StartVoice(VoiceGender.Female, VoiceAge.Teen, 2);
        builder.AppendText("I am from New York City.");
        builder.StartStyle(new PromptStyle() { Emphasis = PromptEmphasis.Strong });
        builder.AppendText("I really love chocolate!");
        builder.EndStyle();
        builder.StartStyle(new PromptStyle() { Emphasis = PromptEmphasis.Reduced });
        builder.AppendText("But I'm allergic to it...");
        builder.EndStyle();
        builder.EndVoice();

        synth.SetOutputToDefaultAudioDevice();
        synth.Speak(builder);

        Console.ReadLine();
    }
}
```
Clearly there are a few things missing here.
1) Using VS2022, I had to download and install System.Speech > 8.0.0-preview.2.23128.3 using the NuGet Package Manager.
2) The code 'as is' throws exceptions like 'Cannot generate SSML data: Voice element not closed.' MS should change this message to say something like 'Close Voice element using EndVoice()'.
3) The code following '// Setting the Speech Stream' really does nothing, because the speechBytes are not used anywhere! (Am I missing something??)
4) To hear the text synthesized in the 'builder', you will need to add the following before the synth.Speak(builder) line:
synth.SetOutputToDefaultAudioDevice(); //otherwise this is going to WavStream??
5) In System.Speech > 8.0.0-preview.2.23128.3, VoiceAge.Teen sounds exactly like VoiceAge.Adult.
So, the code that appears to work on my system looks like this:
```csharp
SpeechSynthesizer synth = new SpeechSynthesizer();
synth.SetOutputToDefaultAudioDevice();
synth.Speak("Hello, world!");

synth.SelectVoiceByHints(VoiceGender.Female, VoiceAge.Adult);
synth.Speak("Hello, I am a female voice!");

synth.Rate = -2;
synth.Volume = 100;
synth.Speak("Hello, I am speaking slower and louder!");

synth.Speak("Hello, I will pause for 3 seconds now.");
synth.Pause();
System.Threading.Thread.Sleep(3000);
synth.Resume();
synth.Speak("I am back!");

synth.SetOutputToWaveFile("output.wav");
synth.Speak("Hello, I am saving my speech to a WAV file!");

MemoryStream stream = new MemoryStream();
synth.SetOutputToWaveStream(stream);
synth.Speak("Hello, I am being streamed to a memory stream!");
byte[] speechBytes = stream.GetBuffer();

PromptBuilder builder = new PromptBuilder();
builder.StartVoice(VoiceGender.Female, VoiceAge.Adult, 1);
builder.AppendText("Hello, my name is Emily.");
builder.EndVoice();
builder.StartVoice(VoiceGender.Female, VoiceAge.Teen, 2);
builder.AppendText("I am from New York City.");
builder.StartStyle(new PromptStyle() { Emphasis = PromptEmphasis.Strong });
builder.AppendText("I really love chocolate!");
builder.EndStyle();
builder.StartStyle(new PromptStyle() { Emphasis = PromptEmphasis.Reduced });
builder.AppendText("But I'm allergic to it...");
builder.EndStyle();
builder.EndVoice();

synth.SetOutputToDefaultAudioDevice();
synth.Speak(builder);
```
Code had an error...you didn't "EndVoice()" after the "StartVoice()" calls
SBGTrading
18-Mar-23 12:33
I had to add "builder.EndVoice();" calls after the Emily append, and the New York City append statements.
Interesting, even if the article is far from exhaustive; in particular, it only deals with .NET Framework, not .NET.
You're missing an EndVoice in the builder section
A few months ago I looked into the code available in this namespace, after spending some time with Edge's Read Aloud function. Edge apparently uses a completely different TTS engine, and comparing the results, it's obvious Edge's sounds a lot better. Like, it's in a completely different league.
Have you looked into what Edge is using? Any idea whether there are APIs one can hook into to use Edge's TTS rather than what's available here?
Maybe this is what you are referring to?
How to get new edge's TTS voices into C# .NET winforms app?
Text to Speech – Realistic AI Voice Generator | Microsoft Azure
Bringing cloud powered voices to Microsoft Edge Insiders - Microsoft Edge Blog
Just guessing and copy-pasting URLs.