Python speech recognition for beginners


Speech recognition has its roots in research done at Bell Labs in the 1950s. Early systems were limited to a single speaker and had small vocabularies of about a dozen words. Modern speech recognition systems have come a long way since those early predecessors: they can recognize speech from multiple speakers and have enormous vocabularies in many languages.

In this article, we will show the basics of speech recognition with Python. As you may know, our platform's backend is built entirely with Python and Django, so this topic deserves a place on our blog.

Available Libraries for Speech Recognition

PyPI hosts a number of packages for speech recognition. A few of them are:

  • apiai
  • google-cloud-speech
  • pocketsphinx
  • SpeechRecognition
  • wit

Some of these packages, such as wit and apiai, offer built-in features that go beyond basic speech recognition, like natural language processing for identifying a speaker's intent. Others, like google-cloud-speech, focus solely on speech-to-text conversion.

There is one package that stands out in terms of ease of use: SpeechRecognition.

The flexibility and ease of use of the SpeechRecognition package make it an excellent choice for many Python projects. However, it does not necessarily support every feature of each API it wraps, so you will need to spend some time researching the available options to find out whether SpeechRecognition works for your particular case.

How to install SpeechRecognition

You can install SpeechRecognition from a terminal with pip:

pip install SpeechRecognition

We recommend using the package with Python 3, although the release used in this article is also compatible with older versions of the language.

Once installed, you should verify the installation by opening an interpreter session and typing:

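# Show the installed version of SpeechRecognition
import speech_recognition as sr
print(sr.__version__)  # 3.8.1 at the time of writing; newer installs may report a later version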

The Recognizer

All of the magic in SpeechRecognition happens with the Recognizer class.

The primary purpose of a Recognizer instance is, of course, to recognize speech. Each instance comes with a variety of settings and functionality for recognizing speech from an audio source.

Creating a Recognizer instance is easy. In your current interpreter session, type:

r = sr.Recognizer()

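In SpeechRecognition 3.8.1, each Recognizer instance has seven methods for recognizing speech from an audio source using various APIs. These are:

  • recognize_bing(): Microsoft Bing Speech
  • recognize_google(): Google Web Speech API
  • recognize_google_cloud(): Google Cloud Speech (requires the google-cloud-speech package)
  • recognize_houndify(): Houndify by SoundHound
  • recognize_ibm(): IBM Speech to Text
  • recognize_sphinx(): CMU Sphinx (requires PocketSphinx)
  • recognize_wit(): Wit.ai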

Of the seven, only recognize_sphinx() works offline with the CMU Sphinx engine. The other six all require an internet connection.

Each recognize_*() method raises a speech_recognition.RequestError exception if the API is unreachable. For recognize_sphinx(), this can happen because of a missing, corrupt or incompatible Sphinx installation. For the other six methods, RequestError may be raised if quota limits are exceeded, the server is unavailable, or there is no internet connection.
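RequestError can signal a temporary condition, such as an unavailable server or a dropped connection. In that case, one option is to retry the call a few times with a growing pause between attempts. The following is a minimal, library-agnostic sketch. The name `call_with_retries` and its parameters are our own hypothetical choices and are not part of SpeechRecognition.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[..., T],
    *args,
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    delay: float = 1.0,
    backoff: float = 2.0,
    **kwargs,
) -> T:
    """Call func(*args, **kwargs), retrying when one of `retry_on` is raised.

    Sleeps `delay` seconds before the first retry and multiplies the wait by
    `backoff` after each failed attempt. Exceptions not listed in `retry_on`
    propagate immediately; if the final attempt fails, its exception is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    wait = delay
    for _ in range(attempts - 1):
        try:
            return func(*args, **kwargs)
        except retry_on:
            time.sleep(wait)
            wait *= backoff
    # Final attempt: any exception now reaches the caller unchanged.
    return func(*args, **kwargs)
```

With SpeechRecognition installed, you might call it as `call_with_retries(r.recognize_google, audio, retry_on=(sr.RequestError,))`. Retrying will not help once a quota has been exceeded. It is also usually not worth retrying `sr.UnknownValueError`, which the library raises when it cannot understand the speech.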

All seven recognize_*() methods of the Recognizer class require an audio_data argument. In each case, audio_data must be an instance of SpeechRecognition’s AudioData class.

Working With Audio Files

SpeechRecognition makes working with audio files easy thanks to its handy AudioFile class. This class can be initialized with the path to an audio file and provides a context manager interface for reading and working with the file’s contents.

Supported File Types

  • WAV: must be in PCM/LPCM format
  • AIFF
  • AIFF-C
  • FLAC: must be native FLAC format; OGG-FLAC is not supported
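WAV input must be PCM/LPCM, so it can help to check a file before handing it to AudioFile. Here is a minimal standard-library sketch. It assumes that "Python's wave module can open it" is a good enough proxy for PCM, because wave only decodes uncompressed PCM and raises wave.Error for other encodings. It does not check AIFF or FLAC files. `is_pcm_wav` and `write_tone_wav` are hypothetical helper names, not part of SpeechRecognition. The tone writer only exists to produce a small test file.

```python
import math
import os
import struct
import tempfile
import wave


def write_tone_wav(path, seconds=1.0, rate=16000, freq=440.0):
    """Write a mono, 16-bit PCM WAV file containing a sine tone (handy as test input)."""
    n_frames = int(seconds * rate)
    amplitude = 0.3 * 32767  # stay well below the 16-bit sample limit
    frames = b"".join(
        struct.pack("<h", int(amplitude * math.sin(2 * math.pi * freq * i / rate)))
        for i in range(n_frames)
    )
    with wave.open(os.fspath(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(frames)


def is_pcm_wav(path) -> bool:
    """Return True if Python's wave module can open `path` as an uncompressed PCM WAV.

    wave raises wave.Error for non-PCM encodings (for example 32-bit float WAV)
    and for files that are not WAV at all, so a successful open is the proxy.
    """
    try:
        with wave.open(os.fspath(path), "rb") as wav:
            return wav.getcomptype() == "NONE"
    except (wave.Error, EOFError, OSError):
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tone = os.path.join(tmp, "tone.wav")
        write_tone_wav(tone)
        print(is_pcm_wav(tone))  # True

        junk = os.path.join(tmp, "junk.wav")
        with open(junk, "wb") as f:
            f.write(b"this is not RIFF audio")
        print(is_pcm_wav(junk))  # False
```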

If you are working on x86-based Linux, macOS or Windows, you should be able to work with FLAC files without a problem. On other platforms, you will need to install a FLAC encoder and make sure the flac command-line tool is available.
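The rule of thumb above can be turned into a quick environment check. This is a rough sketch that mirrors the guidance in this article. It is not SpeechRecognition's own internal logic. The function names and the set of machine strings treated as "x86" are assumptions made for illustration.

```python
import platform
import shutil

# platform.machine() values treated as x86 here (an assumption, not an exhaustive list).
X86_MACHINES = {"x86", "i386", "i686", "x86_64", "amd64"}
# platform.system() names; "Darwin" is macOS.
DESKTOP_SYSTEMS = {"Linux", "Darwin", "Windows"}


def flac_ready(system: str, machine: str, flac_on_path: bool) -> bool:
    """Heuristic: FLAC should work out of the box on x86 Linux, macOS or Windows;
    elsewhere the `flac` command-line tool must be available on PATH."""
    if flac_on_path:
        return True
    return system in DESKTOP_SYSTEMS and machine.lower() in X86_MACHINES


def flac_ready_here() -> bool:
    """Apply flac_ready() to the machine this script is running on."""
    return flac_ready(platform.system(), platform.machine(), shutil.which("flac") is not None)


if __name__ == "__main__":
    if flac_ready_here():
        print("FLAC files should work without extra setup.")
    else:
        print("Install a FLAC encoder so the flac command is on your PATH.")
```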

Using record() to Capture Data From a File

audio_file = sr.AudioFile('audio.wav')
with audio_file as source:
    audio = r.record(source)  # read the whole file into an AudioData instance

You can now invoke recognize_google(), or one of the other recognize_*() methods, to attempt to recognize any speech in the audio. Depending on your internet connection speed, you may have to wait several seconds before seeing the result.

print(r.recognize_google(audio))

Congratulations! If your original WAV file is in a supported format and contains clear speech, you should see its content printed as text.

There are many more topics to cover, but this is enough for a first tutorial. If you want to add speech recognition to your website easily, you can also try our service: it takes only about 10 lines of code on your site. Good luck!
