Speech recognition has its roots in research done at Bell Labs in the early 1950s. Early systems were limited to a single speaker and had vocabularies of about a dozen words. Modern speech recognition systems have come a long way since their ancient counterparts. They can recognize speech from multiple speakers and have enormous vocabularies in numerous languages.
In this article, we will cover the basics of speech recognition with Python. As you may know, our platform's backend is built entirely with Python and Django, so this topic certainly deserves a place on our blog.
Available Libraries for Speech Recognition
PyPI lists a number of packages for speech recognition, among them apiai, google-cloud-speech, SpeechRecognition and wit.
Some of these packages, such as wit and apiai, offer built-in features like natural language processing for identifying a speaker’s intent, which go beyond basic speech recognition. Others, like google-cloud-speech, focus solely on speech-to-text conversion.
There is one package that stands out in terms of ease-of-use: SpeechRecognition.
The flexibility and ease-of-use of the SpeechRecognition package make it an excellent choice for any Python project. However, support for every feature of each API it wraps is not guaranteed. You will need to spend some time researching the available options to find out if SpeechRecognition will work in your particular case.
How to install SpeechRecognition
You can install SpeechRecognition from a terminal with pip:
pip install SpeechRecognition
We recommend using the package with Python 3, although it is also compatible with older versions of the language.
Once installed, you should verify the installation by opening an interpreter session and typing:
# Show version of SpeechRecognition
# https://pypi.org/project/SpeechRecognition/
import speech_recognition as sr
sr.__version__
# 3.8.1
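If you would rather check the installation from a script than from an interactive session, the standard library can report the installed version without importing the package. The following is a minimal sketch, assuming Python 3.8 or newer; the helper name installed_version is ours and not part of SpeechRecognition.

```python
from importlib import metadata


def installed_version(distribution="SpeechRecognition"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("SpeechRecognition is not installed; run: pip install SpeechRecognition")
    else:
        print("SpeechRecognition", version)
```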
Speech recognition engine/API support:
The Recognizer
All of the magic in SpeechRecognition happens with the Recognizer class.
The primary purpose of a Recognizer instance is, of course, to recognize speech. Each instance comes with a variety of settings and functionality for recognizing speech from an audio source.
Creating a Recognizer instance is easy. In your current interpreter session, you need to type:
r = sr.Recognizer()
Each Recognizer instance has seven methods for recognizing speech from an audio source using various APIs. These are:
- recognize_bing(): Microsoft Bing Speech
- recognize_google(): Google Web Speech API
- recognize_google_cloud(): Google Cloud Speech - requires installation of the google-cloud-speech package
- recognize_houndify(): Houndify by SoundHound
- recognize_ibm(): IBM Speech to Text
- recognize_sphinx(): CMU Sphinx - requires installing PocketSphinx
- recognize_wit(): Wit.ai
Of the seven, only recognize_sphinx() works offline with the CMU Sphinx engine. The other six all require an internet connection.
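Because the seven methods share one naming pattern, the engine can also be chosen at run time. The sketch below shows one possible way to do that. The ENGINES table and the recognize_with helper are hypothetical names introduced for this example, and the code only assumes that the recognizer object exposes the methods listed above.

```python
# Engine name -> (Recognizer method name, works offline?), taken from the list above.
ENGINES = {
    "bing": ("recognize_bing", False),
    "google": ("recognize_google", False),
    "google_cloud": ("recognize_google_cloud", False),
    "houndify": ("recognize_houndify", False),
    "ibm": ("recognize_ibm", False),
    "sphinx": ("recognize_sphinx", True),
    "wit": ("recognize_wit", False),
}


def offline_engines():
    """Return the engine names that do not need an internet connection."""
    return sorted(name for name, (_, offline) in ENGINES.items() if offline)


def recognize_with(recognizer, audio_data, engine="google", **options):
    """Call the recognize_*() method that belongs to `engine`.

    `recognizer` is expected to behave like an sr.Recognizer instance. Extra
    keyword arguments (API keys, language, ...) are passed through unchanged.
    """
    if engine not in ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")
    method_name, _ = ENGINES[engine]
    return getattr(recognizer, method_name)(audio_data, **options)
```

With a Recognizer instance r, recognize_with(r, audio_data, engine="sphinx") would simply call r.recognize_sphinx(audio_data). Keep in mind that most of the online engines expect credentials, which you would pass as keyword arguments.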
Each recognize_*() method will raise a speech_recognition.RequestError exception if the API is unreachable. For recognize_sphinx(), this could happen as the result of a missing, corrupt or incompatible Sphinx installation. For the other six methods, RequestError may be raised if quota limits are met, the server is unavailable, or there is no internet connection.
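In practice you will usually want to catch this exception instead of letting the program crash. Here is a minimal sketch of how that might look with recognize_google(). The helper name safe_recognize is hypothetical. The sketch also catches speech_recognition.UnknownValueError, which is not covered in this article: the library raises it when the API responds but cannot make sense of the audio.

```python
def safe_recognize(recognizer, audio_data):
    """Return the transcript, or None if the request or the recognition fails."""
    # Local import keeps the helper importable on machines without the package.
    import speech_recognition as sr

    try:
        return recognizer.recognize_google(audio_data)
    except sr.RequestError as error:
        # API unreachable: no connection, server unavailable, quota exceeded, ...
        print(f"Could not reach the API: {error}")
    except sr.UnknownValueError:
        # The API answered but could not understand the audio.
        print("Speech was unintelligible")
    return None
```

Returning None on failure is just one design choice. Depending on your application, it may be better to retry or to fall back to another engine.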
All seven recognize_*() methods of the Recognizer class require an audio_data argument. In each case, audio_data must be an instance of SpeechRecognition’s AudioData class.
Working With Audio Files
SpeechRecognition makes working with audio files easy thanks to its handy AudioFile class. This class can be initialized with the path to an audio file and provides a context manager interface for reading and working with the file’s contents.
Supported File Types
- WAV: must be in PCM/LPCM format
- AIFF
- AIFF-C
- FLAC: must be native FLAC format; OGG-FLAC is not supported
If you are working on x86-based Linux, macOS or Windows, you should be able to work with FLAC files without a problem. On other platforms, you will need to install a FLAC encoder and ensure you have access to the flac command line tool.
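Before handing a file to SpeechRecognition, you can run a quick sanity check using only the standard library. The sketch below is an illustration rather than part of the package: the helper names and the extension list are our own assumptions, and a file extension is only a hint about the actual encoding.

```python
import shutil
from pathlib import Path

# Extensions commonly used for the formats listed above (an assumption made
# for this example; the extension does not guarantee the encoding).
SUPPORTED_EXTENSIONS = {".wav", ".aiff", ".aif", ".aifc", ".flac"}


def looks_supported(path):
    """Return True if the file extension matches one of the supported formats."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def flac_tool_available():
    """Return True if a `flac` executable can be found on the PATH."""
    return shutil.which("flac") is not None


if __name__ == "__main__":
    print("audio.wav looks supported:", looks_supported("audio.wav"))
    print("flac command line tool found:", flac_tool_available())
```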
Using record() to Capture Data From a File
# Open the file and read its entire contents into an AudioData instance
audio_file = sr.AudioFile('audio.wav')
with audio_file as source:
    audio = r.record(source)
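record() reads the whole file by default, but it also accepts the optional offset and duration arguments (both in seconds) for reading only part of it. The small wrapper below is a sketch of how they could be used. The name record_segment is hypothetical, and the function only assumes that it receives a Recognizer and an AudioFile instance.

```python
def record_segment(recognizer, audio_file, offset=None, duration=None):
    """Read part of an audio file and return the recorded audio data.

    offset:   seconds to skip from the start of the file (None = start)
    duration: seconds to read after the offset (None = until the end)
    """
    with audio_file as source:
        return recognizer.record(source, offset=offset, duration=duration)
```

For example, record_segment(r, sr.AudioFile('audio.wav'), duration=4) would return only the first four seconds of the file.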
You can now invoke recognize_google() or one of the other recognize_*() methods to attempt to recognize any speech in the audio. Depending on your internet connection speed, you may have to wait several seconds before seeing the result.
r.recognize_google(audio)
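recognize_google() also accepts show_all=True, in which case it returns the raw API response instead of a single string. As an assumption based on how the library behaves at the time of writing, that response is a dictionary with an "alternative" list of candidate transcripts, or an empty list when nothing was recognized. The hypothetical helper below sketches how the candidates could be extracted. The sample dictionary is made up for illustration and is not real API output.

```python
def transcripts_from_response(response):
    """Extract candidate transcripts from a show_all=True style response."""
    if not isinstance(response, dict):
        return []  # e.g. an empty list when no speech was recognized
    return [
        alternative["transcript"]
        for alternative in response.get("alternative", [])
        if "transcript" in alternative
    ]


# Made-up sample shaped like the assumed response format.
sample = {
    "alternative": [
        {"transcript": "hello world", "confidence": 0.92},
        {"transcript": "hello word"},
    ],
    "final": True,
}
print(transcripts_from_response(sample))
```

With a real recording, you would pass the result of r.recognize_google(audio, show_all=True) to the helper.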
Congratulations! If everything is fine with your original WAV file, you should see its content as text.
There are certainly many more topics to cover, but this is enough for a first tutorial. If you want to add speech recognition to your site easily, you can also try our service. It is as easy as adding 10 lines of code to your site. Good luck!