Python speech recognition for beginners

Speech recognition has its roots in research done at Bell Labs in the early 1950s. Early systems were limited to a single speaker and had vocabularies of about a dozen words. Modern speech recognition systems have come a long way since then: they can recognize speech from multiple speakers and have enormous vocabularies in numerous languages.

In this article, we will cover the basics of speech recognition with Python. As you may know, our platform's backend is built entirely with Python and Django, so this topic deserves a place on our blog.

Available Libraries for Speech Recognition

PyPI hosts a number of packages for speech recognition. A few of them are apiai, google-cloud-speech, SpeechRecognition and wit.

Some of these packages, such as wit and apiai, offer built-in features like natural language processing for identifying a speaker’s intent, which go beyond basic speech recognition. Others, like google-cloud-speech, focus solely on speech-to-text conversion.

There is one package that stands out in terms of ease-of-use: SpeechRecognition.

The flexibility and ease-of-use of the SpeechRecognition package make it an excellent choice for any Python project. However, support for every feature of each API it wraps is not guaranteed. You will need to spend some time researching the available options to find out if SpeechRecognition will work in your particular case.

How to install SpeechRecognition

You can install SpeechRecognition from a terminal with pip:

pip install SpeechRecognition

We recommend using the package with Python 3, although it is also compatible with older versions of the language.

Once installed, you should verify the installation by opening an interpreter session and typing:

import speech_recognition as sr
# Show the installed version of SpeechRecognition
print(sr.__version__)
# prints the installed version, e.g. 3.8.1
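
If you would rather check the installation from a script than from an interactive session, a minimal sketch along these lines may help. It assumes Python 3.8 or newer (for importlib.metadata), and the helper name installed_version is made up for this illustration.

```python
from importlib import metadata


def installed_version(distribution="SpeechRecognition"):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment
        return None


version = installed_version()
if version is None:
    print("SpeechRecognition is not installed; run: pip install SpeechRecognition")
else:
    print(f"SpeechRecognition {version} is installed")
```

Because the lookup reads package metadata rather than importing the module, it also works in environments where the import itself would fail.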

The Recognizer

All of the magic in SpeechRecognition happens with the Recognizer class.

The primary purpose of a Recognizer instance is, of course, to recognize speech. Each instance comes with a variety of settings and functionality for recognizing speech from an audio source.

Creating a Recognizer instance is easy. In your current interpreter session, you need to type:

r = sr.Recognizer()

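In SpeechRecognition 3.8.1, each Recognizer instance has seven methods for recognizing speech from an audio source using various APIs. These are:

  • recognize_bing(): Microsoft Bing Speech
  • recognize_google(): Google Web Speech API
  • recognize_google_cloud(): Google Cloud Speech
  • recognize_houndify(): Houndify by SoundHound
  • recognize_ibm(): IBM Speech to Text
  • recognize_sphinx(): CMU Sphinx
  • recognize_wit(): Wit.ai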

Of the seven, only recognize_sphinx() works offline with the CMU Sphinx engine. The other six all require an internet connection.

Each recognize_*() method will raise a speech_recognition.RequestError exception if the API is unreachable. For recognize_sphinx(), this could happen as the result of a missing, corrupt or incompatible Sphinx installation. For the other six methods, RequestError may be raised if quota limits are met, the server is unavailable, or there is no internet connection.
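
One way to put this to use is to try an online engine first and fall back to the offline one when a request fails. The sketch below is only an illustration: the helper name transcribe_with_fallback is hypothetical, and it also catches UnknownValueError, an exception the library raises for unintelligible audio that this article does not otherwise cover.

```python
def transcribe_with_fallback(recognizer, audio_data):
    """Try the online Google Web Speech API first, then offline CMU Sphinx.

    Returns the transcript, or None if neither engine produced one.
    """
    import speech_recognition as sr  # third-party: pip install SpeechRecognition

    engines = (
        ("google", recognizer.recognize_google),
        ("sphinx", recognizer.recognize_sphinx),
    )
    for name, recognize in engines:
        try:
            return recognize(audio_data)
        except sr.RequestError as exc:
            # API unreachable, quota reached, or (for Sphinx) a broken installation
            print(f"{name}: request failed: {exc}")
        except sr.UnknownValueError:
            # The engine ran but could not make sense of the audio
            print(f"{name}: speech was unintelligible")
    return None
```

You would call it as transcribe_with_fallback(r, audio), where r is a Recognizer instance and audio is an AudioData instance. The Sphinx fallback only helps if PocketSphinx is installed; otherwise that call raises RequestError as well and the function returns None.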

All seven recognize_*() methods of the Recognizer class require an audio_data argument. In each case, audio_data must be an instance of SpeechRecognition’s AudioData class.

Working With Audio Files

SpeechRecognition makes working with audio files easy thanks to its handy AudioFile class. This class can be initialized with the path to an audio file and provides a context manager interface for reading and working with the file’s contents.

Supported File Types

  • WAV: must be in PCM/LPCM format
  • AIFF
  • AIFF-C
  • FLAC: must be native FLAC format; OGG-FLAC is not supported

If you are working on x86-based Linux, macOS or Windows, you should be able to work with FLAC files without a problem. On other platforms, you will need to install a FLAC encoder and ensure you have access to the flac command line tool.
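
Before handing a WAV file to SpeechRecognition, it can be useful to check that it really is PCM. The following is a rough pre-check that uses only Python's standard wave module, which reads uncompressed PCM files and rejects other encodings. The helper name is_pcm_wav is made up for this example, and the check says nothing about AIFF or FLAC files.

```python
import wave


def is_pcm_wav(path):
    """Return True if the standard-library wave module can parse the file as PCM WAV."""
    try:
        with wave.open(path, "rb") as wav:
            # Reading a header value confirms the file was parsed successfully
            return wav.getnframes() >= 0
    except (wave.Error, EOFError):
        # wave.Error: not a RIFF/WAVE file or not PCM; EOFError: empty or truncated file
        return False
```

A missing file still raises FileNotFoundError, which is deliberate here: a wrong path is a different problem from a wrong format.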

Using record() to Capture Data From a File

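# Open the file through the AudioFile context manager
audio_file = sr.AudioFile('audio.wav')
with audio_file as source:
    # record() reads the file contents into an AudioData instance
    audio = r.record(source)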

You can now invoke recognize_google() or some of the other methods to attempt to recognize any speech in the audio. Depending on your internet connection speed, you may have to wait several seconds before seeing the result.
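
Putting the steps together, a small helper might look like the sketch below. The function name transcribe_file and its parameters are our own choices for illustration. The offset and duration arguments are passed straight to record(), and language to recognize_google(), so that only part of a file can be transcribed if needed.

```python
def transcribe_file(path, language="en-US", offset=None, duration=None):
    """Read an audio file and return the text from the Google Web Speech API.

    offset and duration are in seconds; None means "from the start" and
    "to the end of the file" respectively.
    """
    import speech_recognition as sr  # third-party: pip install SpeechRecognition

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        # record() returns an AudioData instance, as the recognize_*() methods require
        audio = recognizer.record(source, offset=offset, duration=duration)
    return recognizer.recognize_google(audio, language=language)
```

For example, print(transcribe_file('audio.wav')) would transcribe the whole file, while transcribe_file('audio.wav', offset=2, duration=5) would skip the first two seconds and transcribe the next five. Both calls need an internet connection and can raise RequestError as described earlier.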


Congratulations! If everything is fine with your original WAV file, you will probably see its content as text.

There are certainly many more topics to cover, but this is enough for a first tutorial. If you want to add speech recognition to your site without much effort, you can also try our service. It is as easy as adding 10 lines of code to your site. Good luck!

