Convert text to speech with Cognitive Services Speech in an Express.js app - JavaScript on Azure

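Application architecture

This tutorial takes a minimal Express.js app and adds functionality using a combination of:

A new route for the server API that converts text to speech and returns an MP3 stream

A new route for an HTML form where you enter your information

A new HTML form, with JavaScript, that provides a client-side call to the Speech service

The application provides three different calls to convert text to speech:

The first server call creates a file on the server, then returns it to the client. You would typically use this for longer text, or for text that needs to be served more than once.

The second server call is for shorter-term text, which is held in memory before being returned to the client.

The client call demonstrates a direct call to the Speech service using the SDK. You might choose to make this call if you have a client-only application without a server.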
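The remark that the file-based call suits text served more than once can be illustrated with a small sketch. It is not part of the sample repository: the helper name cachedAudioPath and the ./temp directory are assumptions made for illustration, and the tutorial's own route uses a timestamp-based file name instead. The idea is to derive the file name from a hash of the phrase, so a repeated phrase maps to the same MP3 file and could be reused rather than synthesized again.

```javascript
const crypto = require('crypto');
const path = require('path');

// Hypothetical helper (not in the sample repo): map a phrase to a stable
// file path so that the same text always resolves to the same MP3 file.
function cachedAudioPath(phrase, dir = './temp') {
    const hash = crypto.createHash('sha256').update(phrase, 'utf8').digest('hex');
    // 16 hex characters keep the name short; collisions are unlikely but possible
    return path.join(dir, `tts-${hash.slice(0, 16)}.mp3`);
}

const first = cachedAudioPath('all good men must come to the aid');
const second = cachedAudioPath('all good men must come to the aid');
const other = cachedAudioPath('a different phrase');

console.log(first === second); // true: same phrase, same file
console.log(first === other);  // false: different phrase, different file
```

A server could check whether that path already exists before calling the Speech service, at the cost of managing the lifetime of the cached files.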

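Prerequisites

Node.js 10.1+ and npm - installed on your local machine.

Visual Studio Code - installed on your local machine.

The Azure App Service extension for VS Code (installed from within VS Code).

Git - used to push to GitHub, which triggers the GitHub action.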

Use the Bash environment in Azure Cloud Shell.

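If you prefer, install the Azure CLI to run CLI reference commands.

If you're using a local installation, sign in to the Azure CLI by using the az login command. To finish the authentication process, follow the steps displayed in your terminal. For more sign-in options, see Sign in with the Azure CLI.

When you're prompted, install Azure CLI extensions on first use. For more information about extensions, see Use extensions with the Azure CLI.

Run az version to find the version and dependent libraries that are installed. To upgrade to the latest version, run az upgrade.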

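Download the sample Express.js repo

Using git, clone the Express.js sample repo to your local computer.

git clone https://github.com/Azure-Samples/js-e2e-express-server
Change to the new directory for the sample.
cd js-e2e-express-server
Open the project in Visual Studio Code.
code .
Open a new terminal in Visual Studio Code and install the project dependencies.
npm install
Install the Cognitive Services Speech SDK for JavaScript
From the Visual Studio Code terminal, install the Azure Cognitive Services Speech SDK.
npm install microsoft-cognitiveservices-speech-sdk
Create a Speech module for the Express.js app
To integrate the Speech SDK into the Express.js application, create a file named azure-cognitiveservices-speech.js in the src folder.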
Add the following code to the file to pull in dependencies and create a function that converts text to speech.
// azure-cognitiveservices-speech.js
const sdk = require('microsoft-cognitiveservices-speech-sdk');
const { Buffer } = require('buffer');
const { PassThrough } = require('stream');
const fs = require('fs');

/**
 * Node.js server code to convert text to speech
 * @returns stream
 * @param {*} key your resource key
 * @param {*} region your resource region
 * @param {*} text text to convert to audio/speech
 * @param {*} filename optional - best for long text - temp file for converted speech/audio
 */
const textToSpeech = async (key, region, text, filename)=> {
    // convert callback function to promise
    return new Promise((resolve, reject) => {
        const speechConfig = sdk.SpeechConfig.fromSubscription(key, region);
        speechConfig.speechSynthesisOutputFormat = 5; // mp3
        let audioConfig = null;
        if (filename) {
            // write the synthesized audio to a file on the server
            audioConfig = sdk.AudioConfig.fromAudioFileOutput(filename);
        }
        const synthesizer = new sdk.SpeechSynthesizer(speechConfig, audioConfig);
        synthesizer.speakTextAsync(
            text,
            result => {
                const { audioData } = result;
                synthesizer.close();
                if (filename) {
                    // return stream from file
                    const audioFile = fs.createReadStream(filename);
                    resolve(audioFile);
                } else {
                    // return stream from memory
                    const bufferStream = new PassThrough();
                    bufferStream.end(Buffer.from(audioData));
                    resolve(bufferStream);
                }
            },
            error => {
                synthesizer.close();
                reject(error);
            });
    });
};

module.exports = {
    textToSpeech
};
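Parameters - The file pulls in the dependencies for using the SDK, streams, buffers, and the file system (fs). The textToSpeech function takes four arguments. If a file name with a local path is sent, the text is converted to an audio file. If no file name is sent, an in-memory audio stream is created.

Speech SDK method - The Speech SDK method synthesizer.speakTextAsync returns different types, based on the configuration it receives. The result differs depending on what the method was asked to do: create a file, or create an in-memory stream as an array of buffers.
Audio format - The audio format selected is MP3, but other formats exist, along with other audio configuration methods.
The local method, textToSpeech, wraps the SDK callback function and converts it into a promise.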
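The two ideas in this description, wrapping a callback-style method in a promise and turning a buffer into a readable stream, can be tried without an Azure resource. The following is a minimal sketch under stated assumptions: fakeSpeak is a hypothetical stand-in for a method shaped like synthesizer.speakTextAsync and is not the Speech SDK, and speakToStream and streamToBuffer are illustrative names that do not appear in the sample.

```javascript
const { PassThrough } = require('stream');
const { Buffer } = require('buffer');

// Hypothetical stand-in for a callback-style SDK method such as
// synthesizer.speakTextAsync(text, onSuccess, onError). It is NOT the Speech SDK.
function fakeSpeak(text, onSuccess, onError) {
    if (!text) {
        onError(new Error('text is required'));
        return;
    }
    // Pretend the "audio" is the UTF-8 bytes of the text, returned as an ArrayBuffer
    const bytes = Buffer.from(text, 'utf8');
    const audioData = bytes.buffer.slice(bytes.byteOffset, bytes.byteOffset + bytes.byteLength);
    setImmediate(() => onSuccess({ audioData }));
}

// Wrap the callback API in a promise that resolves to a readable stream
function speakToStream(text) {
    return new Promise((resolve, reject) => {
        fakeSpeak(
            text,
            result => {
                const bufferStream = new PassThrough();
                bufferStream.end(Buffer.from(result.audioData));
                resolve(bufferStream);
            },
            error => reject(error)
        );
    });
}

// Collect a stream back into a single Buffer (used here only to inspect the result)
async function streamToBuffer(stream) {
    const chunks = [];
    for await (const chunk of stream) chunks.push(chunk);
    return Buffer.concat(chunks);
}

speakToStream('hello')
    .then(streamToBuffer)
    .then(buffer => console.log(buffer.toString('utf8'))) // prints "hello"
    .catch(error => console.error(error.message));
```

Because the promise resolves to a stream in both the file and the in-memory case, the caller can pipe the result to an HTTP response without knowing which path produced it.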
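Create a new route for the Express.js app
Open the src/server.js file.
Add the azure-cognitiveservices-speech.js module as a dependency at the top of the file:
const { textToSpeech } = require('./azure-cognitiveservices-speech');
Add a new API route to call the textToSpeech method created in the previous section of this tutorial.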

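// creates a temp file on the server, then streams it to the client
/* eslint-disable no-unused-vars */
app.get('/text-to-speech', async (req, res, next) => {
    const { key, region, phrase, file } = req.query;

    // 400 Bad Request: a required parameter is missing; return so no second response is attempted
    if (!key || !region || !phrase) return res.status(400).send('Invalid query string');

    let fileName = null;

    // stream from file or memory; query string values are strings, so compare to 'true'
    if (file === 'true') {
        // timeStamp() is assumed to be a helper already available in server.js
        fileName = `./temp/stream-from-file-${timeStamp()}.mp3`;
    }

    try {
        const audioStream = await textToSpeech(key, region, phrase, fileName);
        res.set({
            'Content-Type': 'audio/mpeg',
            'Transfer-Encoding': 'chunked'
        });
        audioStream.pipe(res);
    } catch (err) {
        // pass synthesis failures to the Express error handler
        next(err);
    }
});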
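This method takes the required and optional parameters for the textToSpeech method from the query string. If a file needs to be created, a unique file name is generated. The textToSpeech method is called asynchronously, and its result is piped to the response (res) object.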
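One detail of reading parameters from a query string is that every value arrives as a string, so a flag such as file carries the text 'true' rather than the boolean true. The sketch below shows that validation step in isolation, using only the URLSearchParams class built into Node.js. The helper name parseSpeechQuery and the shape of its return value are assumptions made for illustration; they are not part of the sample.

```javascript
// Hypothetical helper (not in the sample repo): validate the query string the
// /text-to-speech route expects and decide whether a temp file is needed.
function parseSpeechQuery(queryString, now = Date.now()) {
    const params = new URLSearchParams(queryString);
    const key = params.get('key');
    const region = params.get('region');
    const phrase = params.get('phrase');

    if (!key || !region || !phrase) {
        return { ok: false, error: 'Invalid query string' };
    }

    // Query values are always strings, so compare against 'true', not true
    const useFile = params.get('file') === 'true';
    const fileName = useFile ? `./temp/stream-from-file-${now}.mp3` : null;

    return { ok: true, key, region, phrase, fileName };
}

const withFile = parseSpeechQuery('key=abc&region=eastus&phrase=hello%20world&file=true', 1700000000000);
console.log(withFile.phrase);   // hello world
console.log(withFile.fileName); // ./temp/stream-from-file-1700000000000.mp3

const missingKey = parseSpeechQuery('region=eastus&phrase=hello');
console.log(missingKey.ok);     // false
```

Keeping the validation in a plain function like this makes it testable without starting the Express server.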
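Update the client web page with a form
Update the client HTML web page with a form that collects the required parameters. The optional parameter is passed in based on which audio control the user selects. Because this tutorial provides a mechanism to call the Azure Speech service from the client, that JavaScript is also provided.
Open the /public/client.html file and replace its contents with the following: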
<!DOCTYPE html>
<html lang="en">
<head>
  <title>Microsoft Cognitive Services Demo</title>
  <meta charset="utf-8" />
</head>
<body>
  <div id="content" style="">
    <h1 style="font-weight:500;">Microsoft Cognitive Services Speech </h1>
    <h2>npm: microsoft-cognitiveservices-speech-sdk</h2>
    <table width="100%">
      <tr>
        <td></td>
        <td>
          <a href="https://docs.microsoft.com/azure/cognitive-services/speech-service/get-started" target="_blank">Azure
            Cognitive Services Speech Documentation</a>
        </td>
      </tr>
      <tr>
        <td align="right">Your Speech Resource Key</td>
        <td>
          <input id="resourceKey" type="text" size="40" placeholder="Your resource key (32 characters)" value=""
            onblur="updateSrc()">
        </td>
      </tr>
      <tr>
        <td align="right">Your Speech Resource region</td>
        <td>
          <input id="resourceRegion" type="text" size="40" placeholder="Your resource region" value="eastus"
            onblur="updateSrc()">
        </td>
      </tr>
      <tr>
        <td align="right" valign="top">Input Text (max 255 char)</td>
        <td><textarea id="phraseDiv" style="display: inline-block;width:500px;height:50px" maxlength="255"
            onblur="updateSrc()">all good men must come to the aid</textarea></td>
      </tr>
      <tr>
        <td align="right">
          Stream directly from Azure Cognitive Services
        </td>
        <td>
            <button id="clientAudioAzure" onclick="getSpeechFromAzure()">Get directly from Azure</button>
        </td>
      </tr>
      <tr>
        <td align="right">
          Stream audio from file on server</td>
        <td>
          <audio id="serverAudioFile" controls preload="none" onerror="DisplayError()">
          </audio>
        </td>
      </tr>
      <tr>
        <td align="right">Stream audio from buffer on server</td>
        <td>
          <audio id="serverAudioStream" controls preload="none" onerror="DisplayError()">
          </audio>
        </td>
      </tr>
    </table>
  </div>
  <!-- Speech SDK reference sdk. -->
  <script
    src="https://cdn.jsdelivr.net/npm/microsoft-cognitiveservices-speech-sdk@latest/distrib/browser/microsoft.cognitiveservices.speech.sdk.bundle-min.js">
    </script>
  <!-- Speech SDK USAGE -->
  <script>
    // status fields and start button in UI
    var phraseDiv;
    var resultDiv;
    // subscription key and region for speech services.
    var resourceKey = null;
    var resourceRegion = "eastus";
    var authorizationToken;
    var SpeechSDK;
    var synthesizer;
    var phrase = "all good men must come to the aid";
    var queryString = null;
    var audioType = "audio/mpeg";
    var serverSrc = "/text-to-speech";
    document.getElementById('serverAudioStream').disabled = true;
    document.getElementById('serverAudioFile').disabled = true;
    document.getElementById('clientAudioAzure').disabled = true;

    // update src URL query string for Express.js server
    function updateSrc() {
      // input values
      resourceKey = document.getElementById('resourceKey').value.trim();
      resourceRegion = document.getElementById('resourceRegion').value.trim();
      phrase = document.getElementById('phraseDiv').value.trim();
      // shared query string; values are encoded so characters such as & or # survive
      queryString = `region=${encodeURIComponent(resourceRegion)}&key=${encodeURIComponent(resourceKey)}&phrase=${encodeURIComponent(phrase)}`;
      // server control - by file
      var serverAudioFileControl = document.getElementById('serverAudioFile');
      serverAudioFileControl.src = `${serverSrc}?file=true&${queryString}`;
      console.log(serverAudioFileControl.src)
      serverAudioFileControl.type = audioType;
      serverAudioFileControl.disabled = false;
      // server control - by stream
      var serverAudioStreamControl = document.getElementById('serverAudioStream');
      serverAudioStreamControl.src = `${serverSrc}?${queryString}`;
      console.log(serverAudioStreamControl.src)
      serverAudioStreamControl.type = audioType;
      serverAudioStreamControl.disabled = false;
      // client control
      var clientAudioAzureControl = document.getElementById('clientAudioAzure');
      clientAudioAzureControl.disabled = false;
    }

    function DisplayError(error) {
      window.alert(JSON.stringify(error));
    }

    // Client-side request directly to Azure Cognitive Services
    function getSpeechFromAzure() {
      // authorization for Speech service
      var speechConfig = SpeechSDK.SpeechConfig.fromSubscription(resourceKey, resourceRegion);
      // new Speech object
      synthesizer = new SpeechSDK.SpeechSynthesizer(speechConfig);
      synthesizer.speakTextAsync(
        phrase,
        function (result) {
          // Success function
          // display status
          if (result.reason === SpeechSDK.ResultReason.SynthesizingAudioCompleted) {
            // load client-side audio control from Azure response
            var audioElement = document.getElementById("clientAudioAzure");
            const blob = new Blob([result.audioData], { type: "audio/mpeg" });
            const url = window.URL.createObjectURL(blob);
          } else if (result.reason === SpeechSDK.ResultReason.Canceled) {
            // display Error (throwing here would skip the clean up below)
            DisplayError(result.errorDetails);
          }
          // clean up
          synthesizer.close();
          synthesizer = undefined;
        },
        function (err) {
          // Error function
          DisplayError(err);
          var audioElement = document.getElementById("clientAudioAzure");
          audioElement.disabled = true;
          // clean up
          synthesizer.close();
          synthesizer = undefined;
        });
    }

    // Initialization
    document.addEventListener("DOMContentLoaded", function () {
      var clientAudioAzureControl = document.getElementById("clientAudioAzure");
      resourceKey = document.getElementById('resourceKey').value;
      resourceRegion = document.getElementById('resourceRegion').value;
      phrase = document.getElementById('phraseDiv').value;
      if (!!window.SpeechSDK) {
        SpeechSDK = window.SpeechSDK;
        clientAudioAzureControl.disabled = false;
        document.getElementById('content').style.display = 'block';
      }
    });
  </script>
</body>
</html>
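Notable parts of the file:
The Speech SDK script tag: pulls the Azure Speech SDK into the client library, using the cdn.jsdelivr.net site to deliver the NPM package.

The updateSrc method: updates the audio controls' src URL with a query string that includes the key, region, and text.
The getSpeechFromAzure function: if the user selects the Get directly from Azure button, the web page calls Azure directly from the client page and processes the result.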
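Building the src URL by hand is fragile when the phrase contains characters such as & or #, because they end the parameter early. As a hedged illustration, the sketch below builds the same kind of URL with URLSearchParams, which percent-encodes each value. The helper name buildAudioSrc is an assumption made for this example and is not part of the sample page; the same class is available in browsers and in Node.js.

```javascript
// Hypothetical helper: build the src URL for an audio control.
// URLSearchParams percent-encodes each value, so phrases containing
// characters such as '&' or '#' do not break the query string.
function buildAudioSrc(serverSrc, { key, region, phrase, file = false }) {
    const params = new URLSearchParams({ region, key, phrase });
    if (file) params.set('file', 'true');
    return `${serverSrc}?${params.toString()}`;
}

const src = buildAudioSrc('/text-to-speech', {
    key: 'YOUR-KEY',            // placeholder, not a real key
    region: 'eastus',
    phrase: 'fish & chips',
    file: true
});
console.log(src);
// /text-to-speech?region=eastus&key=YOUR-KEY&phrase=fish+%26+chips&file=true
```

Parsing the result on the other side restores the original phrase, including the ampersand.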
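Create a Cognitive Services Speech resource
Create the Speech resource with Azure CLI commands in Azure Cloud Shell.
Sign in to Azure Cloud Shell. This requires you to authenticate in a browser with an account that has permission on a valid Azure subscription.
Create a resource group for your Speech resource.
az group create \
    --location eastus \
    --name tutorial-resource-group-eastus
Create a Speech resource in the resource group.
az cognitiveservices account create \
    --kind SpeechServices \
    --location eastus \
    --name tutorial-speech \
    --resource-group tutorial-resource-group-eastus \
    --sku F0
This command fails if your one and only free (F0) Speech resource has already been created.
Use the following command to get the key values for the new Speech resource.
az cognitiveservices account keys list \
    --name tutorial-speech \
    --resource-group tutorial-resource-group-eastus \
    --output table
Copy one of the keys.
You use the key by pasting it into the web form of the Express app, which authenticates to the Azure Speech service.
Run the Express.js app to convert text to speech
Start the app with the following bash command.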
npm start
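Open the web app in a browser.
http://localhost:3000
Paste your Speech key into the Your Speech Resource Key text box.
Optionally, change the text to something new.
Select one of the three controls to begin the conversion to the audio format:
Get directly from Azure - client-side call to Azure
Audio control for audio from a file
Audio control for audio from a buffer
You may notice a small delay between selecting the control and the audio playing.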
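Create a new Azure App Service in Visual Studio Code
From the command palette (Ctrl+Shift+P), type "create web" and select Azure App Service: Create New Web App...Advanced. You use the advanced command to have full control over the deployment, including resource group, App Service plan, and operating system, rather than using the Linux defaults.
Respond to the prompts as follows:
Select your Subscription account.
For Enter a globally unique name, enter a name such as my-text-to-speech-app.
Enter a name that's unique across all of Azure. Use only alphanumeric characters ('A-Z', 'a-z', and '0-9') and hyphens ('-').
Select tutorial-resource-group-eastus for the resource group.
Select a runtime stack of a version that includes Node and LTS.
Select the Linux operating system.
Select Create a new App Service plan and provide a name such as my-text-to-speech-app-plan.
Select the F1 free pricing tier. If your subscription already has a free web app, select the Basic tier.
Select Skip for now for the Application Insights resource.
Select the eastus location.
After a short time, Visual Studio Code notifies you that creation is complete. Close the notification with the X button.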
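Deploy the local Express.js app to the remote App Service in Visual Studio Code
With the web app in place, deploy your code from the local computer. Select the Azure icon to open the Azure App Service explorer, expand your subscription node, right-click the name of the web app you just created, and select Deploy to Web App.
If there are deployment prompts, select the root folder of the Express.js app, select your subscription account again, and then select the name of the web app created earlier, my-text-to-speech-app.
If you're prompted to run npm install when deploying to Linux, select Yes when asked to update your configuration to run npm install on the target server.
Once deployment is complete, select Browse Website in the prompt to view your freshly deployed web app.
(Optional): You can make changes to your code files, then use Deploy to Web App in the Azure App Service extension to update the web app.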
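Stream remote service logs in Visual Studio Code
View (tail) any output that the running app generates through calls to console.log. This output appears in the Output window in Visual Studio Code.
In the Azure App Service explorer, right-click your new app node and choose Start Streaming Logs.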
 Starting Live Log Stream ---
Refresh your web page a few times in the browser to see additional log output.
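Because this tutorial passes the resource key in the query string, any request URL written with console.log would carry the key into the streamed logs. The sketch below shows one way to mask it before logging. It is an illustration only: the helper name redactKey and the REDACTED placeholder are assumptions, and the sample app does not include this step.

```javascript
// Hypothetical helper: mask the `key` query parameter before a URL is logged.
function redactKey(requestUrl) {
    // A base is needed to parse a relative URL; it is dropped again below
    const url = new URL(requestUrl, 'http://localhost');
    if (url.searchParams.has('key')) {
        url.searchParams.set('key', 'REDACTED');
    }
    return `${url.pathname}${url.search}`;
}

console.log(redactKey('/text-to-speech?region=eastus&key=0123456789abcdef&phrase=hello'));
// /text-to-speech?region=eastus&key=REDACTED&phrase=hello
```

In an Express app, a function like this could be applied to req.originalUrl inside a logging middleware before the value is passed to console.log.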
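Clean up resources by removing the resource group
Once you have completed this tutorial, remove the resource group, which contains the resources, to make sure you aren't billed for any further usage.
In Azure Cloud Shell, use the Azure CLI command to delete the resource group: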
az group delete --name tutorial-resource-group-eastus  -y
This command may take a few minutes.