Python speech recognition for beginners

Speech recognition has its roots in research done at Bell Labs in the early 1950s. Early systems were limited to a single speaker and had vocabularies of about a dozen words. Modern speech recognition systems have come a long way since those early counterparts. They can recognize speech from multiple speakers and have enormous vocabularies in many languages.

In this article, we will cover the basics of speech recognition with Python. As you may know, our platform's backend is built entirely with Python and Django, so this topic certainly deserves a place on our blog.

Available Libraries for Speech Recognition

A number of speech recognition packages are available on PyPI.

Some of these packages, such as wit and apiai, offer built-in features like natural language processing for identifying a speaker’s intent, which goes beyond basic speech recognition. Others, like google-cloud-speech, focus solely on speech-to-text conversion.

There is one package that stands out in terms of ease-of-use: SpeechRecognition.

The flexibility and ease-of-use of the SpeechRecognition package make it an excellent choice for any Python project. However, support for every feature of each API it wraps is not guaranteed. You will need to spend some time researching the available options to find out if SpeechRecognition will work in your particular case.

How to install SpeechRecognition

You can install SpeechRecognition from a terminal with pip:

pip install SpeechRecognition

We recommend using the package with Python 3, although it is also compatible with older versions of the language.

Once installed, you should verify the installation by opening an interpreter session and typing:

# Show the installed version of SpeechRecognition
# https://pypi.org/project/SpeechRecognition/
import speech_recognition as sr
print(sr.__version__)
# 3.8.1 (your version may differ)
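If you would rather check from a script than from an interpreter session, the standard library can report the installed version without importing the package. The following is a minimal sketch, assuming Python 3.8 or newer (for importlib.metadata); the helper name installed_version is our own and is not part of SpeechRecognition.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("SpeechRecognition")
if version is None:
    print("SpeechRecognition is not installed; run: pip install SpeechRecognition")
else:
    print("SpeechRecognition", version, "is installed")
```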

The Recognizer

All of the magic in SpeechRecognition happens with the Recognizer class.

The primary purpose of a Recognizer instance is, of course, to recognize speech. Each instance comes with a variety of settings and functionality for recognizing speech from an audio source.

Creating a Recognizer instance is easy. In your current interpreter session, just type:

r = sr.Recognizer()

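Each Recognizer instance has seven methods for recognizing speech from an audio source using various APIs. These are:

  • recognize_bing(): Microsoft Bing Speech
  • recognize_google(): Google Web Speech API
  • recognize_google_cloud(): Google Cloud Speech
  • recognize_houndify(): Houndify by SoundHound
  • recognize_ibm(): IBM Speech to Text
  • recognize_sphinx(): CMU Sphinx
  • recognize_wit(): Wit.ai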

Of the seven, only recognize_sphinx() works offline with the CMU Sphinx engine. The other six all require an internet connection.
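The set of recognize_*() methods depends on the version of the package you have installed, so it can be handy to ask the class itself. The helper below is a small sketch of our own (the name recognize_methods is an assumption, not a library function); it simply filters the attribute names of whatever class you pass in.

```python
def recognize_methods(recognizer_class):
    """Return the sorted names of all callable recognize_* attributes on a class."""
    return sorted(
        name
        for name in dir(recognizer_class)
        if name.startswith("recognize_")
        and callable(getattr(recognizer_class, name))
    )
```

In the interpreter session from earlier, calling recognize_methods(sr.Recognizer) returns the names available in your installation.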

Each recognize_*() method will raise a speech_recognition.RequestError exception if the API is unreachable. For recognize_sphinx(), this could happen as the result of a missing, corrupt or incompatible Sphinx installation. For the other six methods, RequestError may be raised if quota limits are reached, the server is unavailable, or there is no internet connection.
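In practice you will usually want to catch this exception rather than let it end your program. The sketch below wraps a call to recognize_google() (an arbitrary choice; any recognize_*() method would do) and also catches speech_recognition.UnknownValueError, which the library raises when the speech cannot be understood. The helper name safe_recognize and the returned messages are our own assumptions; the import sits inside the function only so that the helper can be defined before the package is installed.

```python
def safe_recognize(recognizer, audio):
    """Return the transcription, or a short message describing what went wrong."""
    # Imported here so this helper can be defined even if the package
    # has not been installed yet.
    import speech_recognition as sr

    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        # The API responded, but could not make out any speech.
        return "Could not understand the audio"
    except sr.RequestError as error:
        # The API was unreachable: no connection, quota reached, server down.
        return "API unavailable: {}".format(error)
```

Given a Recognizer instance r and an AudioData instance audio, you would call safe_recognize(r, audio).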

All seven recognize_*() methods of the Recognizer class require an audio_data argument. In each case, audio_data must be an instance of SpeechRecognition’s AudioData class.

Working With Audio Files

SpeechRecognition makes working with audio files easy thanks to its handy AudioFile class. This class can be initialized with the path to an audio file and provides a context manager interface for reading and working with the file’s contents.

Supported File Types

  • WAV: must be in PCM/LPCM format
  • AIFF
  • AIFF-C
  • FLAC: must be native FLAC format; OGG-FLAC is not supported

If you are working on x86-based Linux, macOS or Windows, you should be able to work with FLAC files without a problem. On other platforms, you will need to install a FLAC encoder and ensure you have access to the flac command line tool.
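If you are unsure whether a WAV file meets the PCM requirement listed above, Python's standard wave module offers a quick check, since it only reads uncompressed PCM data and raises wave.Error for anything else. The sketch below is our own illustration and does not use SpeechRecognition at all: it writes one second of silence to a hypothetical file named silence.wav in the system's temporary directory and then inspects it. The function names are assumptions.

```python
import os
import tempfile
import wave


def write_silence(path, seconds=1, sample_rate=16000):
    """Write a mono, 16-bit PCM WAV file containing only silence."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)            # mono
        wav_file.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)  # samples per second
        wav_file.writeframes(b"\x00\x00" * sample_rate * seconds)


def is_pcm_wav(path):
    """Return True if the standard wave module can read the file as PCM."""
    try:
        with wave.open(path, "rb") as wav_file:
            return wav_file.getcomptype() == "NONE"
    except (wave.Error, EOFError):
        # Not a WAV file, or a WAV file in a format wave cannot read.
        return False


path = os.path.join(tempfile.gettempdir(), "silence.wav")
write_silence(path)
print(is_pcm_wav(path))
```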

Using record() to Capture Data From a File

# Open the file and read its entire contents into an AudioData instance
audio_file = sr.AudioFile('audio.wav')
with audio_file as source:
    audio = r.record(source)

You can now invoke recognize_google(), or one of the other recognize_*() methods, to attempt to recognize any speech in the audio. Depending on your internet connection speed, you may have to wait several seconds before seeing the result.

r.recognize_google(audio)
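Putting the steps from this section together, a small script might look like the sketch below. The function name transcribe_file is our own and audio.wav is a placeholder path; recognize_google() is used because it is the method shown above. The import is placed inside the function only so the sketch can be loaded without the package installed.

```python
def transcribe_file(path):
    """Read a whole audio file and return the text recognized in it."""
    # Imported inside the function so the sketch can be loaded even if
    # the package is not installed yet.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    # AudioFile is a context manager: the file is opened on entry
    # and closed again when the block ends.
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # returns an AudioData instance
    # Sends the audio to the Google Web Speech API (needs internet access).
    return recognizer.recognize_google(audio)
```

Calling transcribe_file('audio.wav') then returns the recognized text as a string, or raises one of the library's exceptions if the request fails or the speech cannot be understood.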

Congratulations! If everything is fine with your original WAV file, you should see its content as text.

There are certainly many more topics to cover, but this is enough for a first tutorial. If you want to add speech recognition to your site without much effort, you can also try our service: it is as easy as adding 10 lines of code to your site. Good luck!
