Speech Recognition with TensorFlow.js

Voice assistants like Amazon Alexa and Google Home have become widely popular; they let users get things done quickly by using speech recognition.

Thanks to improvements in speech recognition technology, the TensorFlow.js team released a JavaScript module that enables recognition of spoken commands.

In this article, we will use the pre-trained TensorFlow.js speech-commands model. Let’s build an application that can recognize your speech commands.


Speech recognition demo

Turn on your microphone and start saying one of the words below; see whether your command is recognized and the word is highlighted.


GitHub repository

You can download the complete code for the demo above from the link below:


Implementation

Wow! Isn’t it amazing? TensorFlow.js really brings Artificial Intelligence to the browser. Let’s dive into the code; I’ll show you step by step how to build this speech recognition application with TensorFlow.js.

Folder Structure

This application is simple: all you need are basic HTML, JavaScript and CSS files.

speech recognition folder structure
  • speech_command.js – main application javascript
  • audio.css – css style file
  • index.html – main html file

# Step 1 : Include TensorFlow.js

Simply include the scripts for tfjs and the speech-commands model in the <head> section of the HTML file. I also include the jQuery library.

<html>
  <head>
    <script src="https://code.jquery.com/jquery-3.3.1.min.js"></script>
    <script src="https://unpkg.com/@tensorflow/tfjs"></script>
    <script src="https://unpkg.com/@tensorflow-models/speech-commands"></script>
  </head>
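
If you build the project with a bundler instead of plain script tags, the same packages can also be installed from npm and imported as ES modules. A minimal sketch, assuming the packages are installed with npm install @tensorflow/tfjs @tensorflow-models/speech-commands:

// Alternative to the <script> tags above, for bundler-based setups.
import * as tf from "@tensorflow/tfjs";
import * as speechCommands from "@tensorflow-models/speech-commands";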

# Step 2 : List command words

HTML – index.html

Add a <div> placeholder in the HTML body:

<div id="candidate-words"></div>

Add a checkbox to turn the microphone on and off. I customized the checkbox to look like a mobile toggle switch using pure CSS.

<label class="form-switch">
<input type="checkbox" id="audio-switch">
<i></i> Microphone</label>

At the end of the <body>, include the main JavaScript file speech_command.js:

    <script src="js/speech_command.js"></script>
  </body>
</html>

JavaScript – speech_command.js

Initialize the variables

let recognizer;
let words;
let wordList;
let modelLoaded = false;

This library supports a 20-item vocabulary, consisting of ‘zero’, ‘one’, ‘two’, ‘three’, ‘four’, ‘five’, ‘six’, ‘seven’, ‘eight’, ‘nine’, ‘up’, ‘down’, ‘left’, ‘right’, ‘go’, ‘stop’, ‘yes’ and ‘no’, plus ‘background_noise’ and ‘unknown’. When the page is loaded, append the command words to the list.

$( document ).ready(function() {
    wordList = ["zero","one","two","three","four","five","six","seven","eight","nine", "yes", "no", "up", "down", "left", "right", "stop", "go"];
    $.each(wordList, function( index, word ) {
        if (!word.startsWith('_')){
            $("#candidate-words").append(`<span class='candidate-word col-md-2 col-sm-3 col-3' id='word-${word}'>${word}</span>`);
        }
    });
});

# Step 3 : Load the model

When the microphone switch (checkbox) is turned on (checked), first call loadModel() to load the pre-trained model, then call startListening() to start voice command recognition.

When the microphone switch (checkbox) is turned off (unchecked), call stopListening() to disconnect the microphone.

$("#audio-switch").change(function() {
    if(this.checked){
        if(!modelLoaded){
            loadModel();
        }else{
            startListening();
        }
    }
    else {
        stopListening();
    }   
});

The loadModel() function loads the pre-trained speech-commands model by calling the speechCommands.create and recognizer.ensureModelLoaded APIs.

function loadModel(){
    $(".progress-bar").removeClass('d-none'); 
    // When calling `create()`, you must provide the type of the audio input.
    // The two available options are `BROWSER_FFT` and `SOFT_FFT`.
    // - BROWSER_FFT uses the browser's native Fourier transform.
    // - SOFT_FFT uses JavaScript implementations of Fourier transform (not implemented yet).
    recognizer = speechCommands.create("BROWSER_FFT");  
    Promise.all([
        // Make sure that the underlying model and metadata are loaded via HTTPS requests.
        recognizer.ensureModelLoaded()
      ]).then(function(){
        $(".progress-bar").addClass("d-none");
        words = recognizer.wordLabels();
        $.each(words, function( index, word ) {
            if (!word.startsWith("_") && !wordList.includes(word)){
                $("#candidate-words").append(`<span class='candidate-word' id='word-${word}'>${word}</span>`);
            }
        });
        modelLoaded = true;
        startListening();
      })
}
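
As a side note, create() also accepts an optional vocabulary argument; by default it loads the full 18-word vocabulary used here. A minimal sketch, assuming you only need the directional words and the library’s ‘directional4w’ vocabulary:

// Sketch: load the smaller directional vocabulary instead of the default 18-word one.
recognizer = speechCommands.create("BROWSER_FFT", "directional4w");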

# Step 4 : Start speech recognition

The startListening() function calls the recognizer.listen API to start listening for voice commands.

It takes a probabilityThreshold parameter: the callback function will be invoked if and only if the maximum probability score across all the words is greater than this threshold.

The output of scores contains the probability scores that correspond to recognizer.wordLabels().

function startListening(){
    // `listen()` takes two arguments:
    // 1. A callback function that is invoked anytime a word is recognized.
    // 2. A configuration object with adjustable fields such as
    //    - includeSpectrogram
    //    - probabilityThreshold
    //    - includeEmbedding
    recognizer.listen(({scores}) => {
        // scores contains the probability scores that correspond to recognizer.wordLabels().
        // Turn scores into a list of (score,word) pairs.
        scores = Array.from(scores).map((s, i) => ({score: s, word: words[i]}));
        // Find the most probable word.
        scores.sort((s1, s2) => s2.score - s1.score);
        $("#word-"+scores[0].word).addClass('candidate-word-active');
        setTimeout(() => {
            $("#word-"+scores[0].word).removeClass('candidate-word-active');
        }, 2000);
    }, 
    {
        probabilityThreshold: 0.70
    });
}

function stopListening(){
    // Stop streaming audio from the microphone and end recognition.
    recognizer.stopListening();
}
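
Besides probabilityThreshold, listen() also accepts other configuration fields such as overlapFactor (how much consecutive recognition frames overlap) and includeSpectrogram (return the input spectrogram alongside the scores). Below is a minimal sketch of such a variant, not used in the demo above, shown only to illustrate how you might extend it:

// Sketch: a listen() variant that also requests the input spectrogram.
recognizer.listen(result => {
    // result.scores: probability scores, as before.
    // result.spectrogram: {data, frameSize} of the recognized audio window.
    console.log(result.scores, result.spectrogram.frameSize);
},
{
    probabilityThreshold: 0.70,
    includeSpectrogram: true,
    overlapFactor: 0.50
});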

Finally, testing in the browser

For desktop, point your localhost to the cloned root directory and set up HTTPS (for example in IIS).

Browse to https://localhost/index.html
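
If you don’t use IIS, any static server with HTTPS will do, since the browser only exposes the microphone in a secure context (or on localhost). Below is a minimal sketch using Node’s built-in https module; cert.pem and key.pem are assumed to be a self-signed certificate you generate yourself, and port 8443 is an arbitrary choice:

// serve.js – minimal HTTPS static server sketch (assumes cert.pem and key.pem exist).
const https = require("https");
const fs = require("fs");
const path = require("path");

const types = { ".html": "text/html", ".js": "text/javascript", ".css": "text/css" };

https.createServer(
    { key: fs.readFileSync("key.pem"), cert: fs.readFileSync("cert.pem") },
    (req, res) => {
        const file = req.url === "/" ? "/index.html" : req.url.split("?")[0];
        fs.readFile(path.join(__dirname, file), (err, data) => {
            if (err) { res.writeHead(404); res.end("Not found"); return; }
            res.writeHead(200, { "Content-Type": types[path.extname(file)] || "application/octet-stream" });
            res.end(data);
        });
    }
).listen(8443, () => console.log("Serving on https://localhost:8443"));

With a server like this you would browse to https://localhost:8443/index.html instead.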

For mobile, first connect to the same Wi-Fi network as your desktop, open a command window on your desktop, and type the command below to find your internal IP address, which should look something like 192.168.*.*

> ipconfig

Once you know your internal IP address, replace localhost in the link above with that address and browse to it on your mobile.

Switch on the microphone; the browser will then ask you for permission to access the microphone. Click Allow.

allow microphone access

Start saying one of the command words in the list and see whether your command is recognized and the word is highlighted. It picked up 9 out of 10 correctly for me.


In 2019, TensorFlow.js has become the bread and butter of machine learning JavaScript projects, thanks to its comprehensive linear algebra core and deep learning layers. I hope you had fun with this article, and I encourage you to discover more of this library.

Thank you for reading. If you like this article, please share it on Facebook or Twitter. Let me know in the comments if you have any questions. Follow me on Medium, GitHub and LinkedIn. Support me on Ko-fi.
