In this guide, we will see how speech recognition can be done using Python, Google's Speech API, and ReSpeaker USB Mic.

In this guide, we will see how the speech recognition can be done using Python, Google's Speech API and ReSpeaker USB Mic from Seeed Studio

Speech Recognition

Speech Recognition is a part of Natural Language Processing which is a subfield of Artificial Intelligence. To put it simply, speech recognition is the ability of a computer software to identify words and phrases in spoken language and convert them to human readable text. It is used in several applications such as voice assistant systems, home automation, voice based chatbots, voice interacting robot, artificial intelligence and etc.

There are different APIs(Application Programming Interface) for recognizing speech. They offer services either free or paid. These are:

We will be using Google Speech Recognition here, as it doesn't require any API key. This tutorial aims to provide an introduction on how to use Google Speech Recognition library on Python with the help of external microphone like ReSpeaker USB 4-Mic Array from Seeed Studio. Although it is not mandatory to use external microphone, even built-in microphone of laptop can be used.

ReSpeaker USB 4-Mic Array

The ReSpeaker USB Mic is a quad-microphone device designed for AI and voice applications, which was developed by Seeed Studio. It has 4 high performance, built-in omnidirectional microphones designed to pick up your voice from anywhere in the room and 12 programmable RGB LED indicators. The ReSpeaker USB mic supports Linux, macOS, and Windows operating systems. Details can be found here.

IMG_4290.JPG
IMG_4291.JPG

The ReSpeaker USB Mic comes in a nice package containing the following items:

  1. A user guide
  2.  ReSpeaker USB Mic Array
  3. Micro USB to USB Cable
IMG_4306.JPG

So we're ready to get started.

Install Required Libraries

For this tutorial, I’ll assume you are using Python 3.x.

Let's install the libraries:

  • pip3 install SpeechRecognition

For macOS, first you will need to install PortAudio with Homebrew, and then install PyAudio with pip3:

  • brew install portaudio
  • pip3 install pyaudio

For Linux, you can install PyAudio with apt:

  • sudo apt-get install python-pyaudio python3-pyaudio

For Windows, you can install PyAudio with pip:

  • pip install pyaudio

Create a new python file 

  • nano get_index.py

Paste on get_index.py below code snippet:

        import pyaudio

p = pyaudio.PyAudio()
info = p.get_host_api_info_by_index(0)
numdevices = info.get('deviceCount')

for i in range(0, numdevices):
        if (p.get_device_info_by_host_api_device_index(0, i).get('maxInputChannels')) > 0:
            print ("Input Device id ", i, " - ", p.get_device_info_by_host_api_device_index(0, i).get('name'))
    

Run the following command:

  • python3 get_index.py

In my case, command gives following output to screen :

  • Input Device id 1 - ReSpeaker 4 Mic Array (UAC1.0)
  • Input Device id 2 - MacBook Air Microphone

Change device_index to index number as per your choice in below code snippet.

Run the following command:

  • python3 get_index.py

In my case, command gives following output to screen :

Input Device id 1 - ReSpeaker 4 Mic Array (UAC1.0)

Input Device id 2 - MacBook Air Microphone

        import speech_recognition as sr

r = sr.Recognizer()
speech = sr.Microphone(device_index=1)
with speech as source:
    print("say something!…")
    audio = r.adjust_for_ambient_noise(source)
    audio = r.listen(source)
try:
    recog = r.recognize_google(audio, language = 'en-US')

    print("You said: " + recog)
except sr.UnknownValueError:
    print("Google Speech Recognition could not understand audio")
except sr.RequestError as e:
    print("Could not request results from Google Speech Recognition service; {0}".format(e))
    

Device index was chosen 1 due to ReSpeaker 4 Mic Array will be as a main source.

Text-to-speech in Python With pyttsx3 Library

There are several APIs available to convert text to speech in python. One of such APIs is the pyttsx3, which is the best available text-to-speech package in my opinion. This package works in Windows, Mac, and Linux. Check the official documentation to see how this is done.

Install the package

Use pip to install the package.

  • pip install pyttsx3

If you are in Windows, you will need an additional package, pypiwin32 which it will need to access the native Windows speech API.

  • pip install pypiwin32

Convert text to speech python script

Below is the code snippet for text to speech using pyttsx3 :

        import pyttsx3
engine = pyttsx3.init()
engine.setProperty('rate', 150)    # Speed percent
engine.setProperty('volume', 0.9)  # Volume 0-1
engine.say("Hello, world!")
engine.runAndWait()
    

Putting It All Together

The below code is responsible for recognising human speech using Google Speech Recognition, and converting the text into speech using pyttsx3 library.

        import speech_recognition as sr
import pyttsx3

engine = pyttsx3.init()
engine.setProperty('rate', 200)
engine.setProperty('volume', 0.9)
r = sr.Recognizer()
speech = sr.Microphone(device_index=1)
with speech as source:
    audio = r.adjust_for_ambient_noise(source)
    audio = r.listen(source)
try:
    recog = r.recognize_google(audio, language = 'en-US')
    print("You said: " + recog)
    engine.say("You said: " + recog)
    engine.runAndWait()
except sr.UnknownValueError:
    engine.say("Google Speech Recognition could not understand audio")
    engine.runAndWait()
except sr.RequestError as e:
    engine.say("Could not request results from Google Speech Recognition service; {0}".format(e))
    engine.runAndWait()
    

It prints output on terminal. Also, it will be converted into speech as well.

  • You said: London is the capital of Great Britain

I hope you now have better understanding of how speech recognition works in general and most importantly, how to implement that using Google Speech Recognition API with Python.

If you have any questions or feedback? Leave a comment below. Stay tuned!