Speech To Text Library

for Java/Processing

Download

The Library

This library brought speech recognition to Processing applications. From May 2014 it used the Google Speech API v2. Development was discontinued in December 2014.

Using WebSocket, Google Chrome and Processing, you can get unlimited speech recognition results in your Sketch. View Example

As of May 2014, you need a developer API key to use this library. The new API has a limit of 50 requests/day. Therefore, this library is no longer useful for real projects. I decided to stop development and want to thank everyone who experimented with speech recognition.

Installation

Download the library and install it inside your libraries folder. The library listens to the microphone input of your computer and sends recordings of your voice to Google for further processing. If the transcription is successful, the transcribe method of your sketch is called and you can do whatever you want with the result.
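To illustrate the callback idea, here is a minimal plain-Java sketch of how a library can find and call a transcribe method on the object it was given. Processing libraries commonly look up such callbacks by reflection, but whether STT works exactly this way is an assumption. DemoSketch, CallbackDispatcher and the (String, float) signature are hypothetical names chosen for illustration; none of them are part of the STT library.

```java
/** Hypothetical stand-in for a sketch; the STT library itself is not used here. */
class DemoSketch {
    String lastUtterance = "";
    float lastConfidence = 0f;

    // Callback looked up by name. The (String, float) signature is an assumption.
    public void transcribe(String utterance, float confidence) {
        lastUtterance = utterance;
        lastConfidence = confidence;
    }
}

/** Hypothetical dispatcher that calls transcribe(...) on any object that defines it. */
final class CallbackDispatcher {
    private final Object target;
    private final java.lang.reflect.Method callback; // null if no matching method exists

    CallbackDispatcher(Object target) {
        this.target = target;
        java.lang.reflect.Method found = null;
        try {
            found = target.getClass().getMethod("transcribe", String.class, float.class);
            found.setAccessible(true);
        } catch (NoSuchMethodException e) {
            // The target does not define the callback; results will be dropped.
        }
        this.callback = found;
    }

    /** Delivers one result; returns false if there is no callback or the call failed. */
    boolean deliver(String utterance, float confidence) {
        if (callback == null) return false;
        try {
            callback.invoke(target, utterance, confidence);
            return true;
        } catch (ReflectiveOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        DemoSketch sketch = new DemoSketch();
        CallbackDispatcher dispatcher = new CallbackDispatcher(sketch);
        dispatcher.deliver("hello world", 0.9f);
        System.out.println(sketch.lastUtterance);
    }
}
```

Looking the method up by name means the sketch only has to define it; an object without the method is simply never called.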

Get your API key

Create a Google developer account at http://developers.google.com and enable the Speech API in your project. You may also have to subscribe to the chromium dev list to enable the Speech API. Read more about this here: http://www.chromium.org/developers/how-tos/api-keys

Settings

STT(PApplet p, String key, boolean history, Minim minim) The constructor takes the PApplet instance (usually this), the API key, and an optional boolean history flag, which is false by default. If you set history to true, all recordings are kept in the data folder. You can also pass an existing Minim instance if you have one.

addLanguage() You can test against multiple languages (List of supported languages). If you don’t know what the input will be, add the languages using this method.

begin() starts a recording that runs until end() is called

disableAutoRecord() disables automatic recording

disableAutoThreshold() disables the analysis of the environmental volume after STT initialized

enableDebug() enables console output with relevant information about the transcription process

enableAutoRecord() analyzes the environmental sound level and automatically records when anything louder than the average level is detected

enableAutoRecord(float threshold) automatically records if the given volume threshold is reached

enableAutoThreshold() enables the analysis of the environmental sound level after STT is initialized

end() ends a recording and starts the transcription process

getLineIn() returns the Minim AudioInput that STT is using. This is helpful when you want to do other things with the audio input besides voice recognition (e.g. FFT analysis). Use it instead of opening your own input with in = minim.getLineIn(Minim.MONO).

getMinimInstance() returns the Minim instance that STT is using.

setLanguage(String) sets the recognition language using a language code such as en, de or fr. If the language is not supported, it automatically falls back to English. List of all supported languages

setThreshold(float) sets the volume threshold that is used for speech recognition. When the input volume rises above the threshold, that input is used for recognition
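The threshold settings above can be pictured as a simple volume gate. The following plain-Java sketch is only an illustration under stated assumptions: VolumeGate is a hypothetical class, not part of the STT library, and using the RMS level of a sample buffer as "volume" is an assumption about how loudness is measured. The calibrate step mirrors the idea of deriving a threshold from the average environmental level.

```java
/** Hypothetical helper, not part of the STT library: a volume gate over raw samples. */
final class VolumeGate {
    private float threshold;

    VolumeGate(float threshold) {
        this.threshold = threshold;
    }

    /** Root-mean-square level of one buffer of samples in the range [-1, 1]. */
    static float rms(float[] samples) {
        if (samples.length == 0) return 0f;
        double sumOfSquares = 0.0;
        for (float s : samples) sumOfSquares += s * s;
        return (float) Math.sqrt(sumOfSquares / samples.length);
    }

    /** Sets the threshold to the mean RMS level of some ambient (background) buffers. */
    void calibrate(float[][] ambientBuffers) {
        if (ambientBuffers.length == 0) return; // keep the current threshold
        double sum = 0.0;
        for (float[] buffer : ambientBuffers) sum += rms(buffer);
        threshold = (float) (sum / ambientBuffers.length);
    }

    float threshold() {
        return threshold;
    }

    /** True when the buffer is louder than the threshold, i.e. worth recording. */
    boolean isOpen(float[] samples) {
        return rms(samples) > threshold;
    }

    public static void main(String[] args) {
        VolumeGate gate = new VolumeGate(0.1f);
        // Assumed background noise, measured before anyone speaks.
        gate.calibrate(new float[][] {{0.02f, -0.02f}, {0.04f, -0.04f}});
        System.out.println(gate.isOpen(new float[] {0.025f, -0.025f})); // below the average level
        System.out.println(gate.isOpen(new float[] {0.5f, -0.5f}));     // clearly louder
    }
}
```

A fixed threshold corresponds to passing a value to the constructor and skipping calibrate; a real gate would probably add some margin above the ambient average so that ordinary noise does not trigger recordings.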

Basic Example (Press a Key to Record)

Credits

The library is based on ideas by Mike Pultz, who wrote an article that shows how to use the technology offered by Google without a browser. The library has the following dependencies: Minim, Gson and Java FLAC Encoder.

Contact

Email me or follow me on Twitter. I’m a designer and I’m aware of bugs, errors and bad ways of coding. Anyway, as long as it works for me, I’m happy to share what I’ve got. Feel free to make any changes to the code.

Florian Schulz, June 2011–2014