Open Source Speech Recognition Software in Java

Solution 1

The best way to approach this would be to use an existing recognition toolkit together with the language and acoustic models that come with it. You can then train the models to fit your needs.

CMUSphinx is probably the best FOSS speech recognition toolkit out there. CMUSphinx also provides good Java integration and demo applications.

Solution 2

After evaluating several third-party speech recognition options, I found Google voice recognition on Android to be by far the most accurate. There are two basic approaches to using it. The easiest is to launch an Intent and handle the results accordingly:

    Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);

    intent.addFlags(Intent.FLAG_ACTIVITY_CLEAR_TOP);
    intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);

    startActivityForResult(intent, VOICE_RECOGNITION_REQUEST_CODE );

Then, in your onActivityResult(), handle the matches returned by the service:

    /**
     * Handle the results from the recognition activity.
     */
    @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        super.onActivityResult(requestCode, resultCode, data);
        //Toast.makeText(this, "voice recog result: " + resultCode, Toast.LENGTH_LONG).show();
        if (requestCode == VOICE_RECOGNITION_REQUEST_CODE && resultCode == RESULT_OK) {
            // Collect the strings the recognizer thought it could have heard
            ArrayList<String> matches = data.getStringArrayListExtra(
                    RecognizerIntent.EXTRA_RESULTS);
            // handleResults
            if (matches != null) {
                handleResults(matches);
            }
        }
    }
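The answer leaves handleResults() undefined. Since the goal is to run specific tasks on certain recognized phrases, one possible shape for it is a small phrase-to-action table. The sketch below is plain Java with no Android dependencies; the class name VoiceCommandDispatcher, its register() method, and the example phrases are all hypothetical, and it assumes the recognizer lists its more likely candidates first.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper (not part of the Android API): maps recognized phrases to actions. */
class VoiceCommandDispatcher {
    // Registered commands, keyed by normalized phrase.
    private final Map<String, Runnable> commands = new LinkedHashMap<>();

    /** Registers an action to run when the given phrase is recognized. */
    void register(String phrase, Runnable action) {
        commands.put(normalize(phrase), action);
    }

    /**
     * Runs the action for the first candidate that matches a registered phrase.
     * Returns the matched (normalized) phrase, or null if nothing matched.
     */
    String handleResults(List<String> matches) {
        if (matches == null) {
            return null;
        }
        for (String candidate : matches) {
            String key = normalize(candidate);
            Runnable action = commands.get(key);
            if (action != null) {
                action.run();
                return key;
            }
        }
        return null;
    }

    // Trims and lower-cases so " Lights ON " and "lights on" compare equal.
    private static String normalize(String text) {
        return text == null ? "" : text.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        VoiceCommandDispatcher dispatcher = new VoiceCommandDispatcher();
        dispatcher.register("lights on", () -> System.out.println("Turning the lights on"));
        dispatcher.register("lights off", () -> System.out.println("Turning the lights off"));
        // Candidates as a recognizer might return them, best guess first.
        dispatcher.handleResults(Arrays.asList("light son", "Lights on"));
    }
}
```

Exact matching keeps the sketch short; a real application would probably want looser matching (keywords or a grammar), since recognizers rarely return a phrase verbatim.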

The second approach is more involved but allows better handling of error conditions that can occur while the recognition service is running. With this approach, you create your own recognition listener and implement its callback methods. For example:

Start listening:

    mSpeechRecognizer.startListening(mRecognizerIntent);

where mSpeechRecognizer and mRecognizerIntent are set up as follows:

    mSpeechRecognizer = SpeechRecognizer.createSpeechRecognizer(getBaseContext());
    mSpeechRecognizer.setRecognitionListener(mRecognitionListener);
    mRecognizerIntent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
    mRecognizerIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
            RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
    // Replace "com.your.package" with your application's package name.
    mRecognizerIntent.putExtra(RecognizerIntent.EXTRA_CALLING_PACKAGE, "com.your.package");

Then create your listener:

    private RecognitionListener mRecognitionListener = new RecognitionListener() {
            public void onBufferReceived(byte[] buffer) {
                    // TODO Auto-generated method stub
                    //Log.d(TAG, "onBufferReceived");
            }

            public void onError(int error) {
                    // Handle the error here; the code is one of the
                    // SpeechRecognizer.ERROR_* constants.
                    Log.d(TAG, "onError: " + error);
            }

            public void onEvent(int eventType, Bundle params) {
                    // TODO Auto-generated method stub
                    Log.d(TAG, "onEvent");
            }

            public void onPartialResults(Bundle partialResults) {
                    // TODO Auto-generated method stub
                    Log.d(TAG, "onPartialResults");
            }

            public void onReadyForSpeech(Bundle params) {
                    // TODO Auto-generated method stub
                    Log.d(TAG, "onReadyForSpeech");

            }

            public void onResults(Bundle results) {
                    Log.d(TAG, ">>> onResults");
                    //Toast.makeText(getBaseContext(), "got voice results!", Toast.LENGTH_SHORT).show();

                    ArrayList<String> matches = results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
                    // The list can be null if the recognizer returned nothing.
                    if (matches != null) {
                            handleResults(matches);
                    }
            }

            public void onRmsChanged(float rmsdB) {
                    // TODO Auto-generated method stub
                    //Log.d(TAG, "onRmsChanged");
            }

            public void onBeginningOfSpeech() {
                    // TODO Auto-generated method stub
                    Log.d(TAG, "onBeginningOfSpeech");
            }

            public void onEndOfSpeech() {
                    // TODO Auto-generated method stub
                    Log.d(TAG, "onEndOfSpeech");

            }

    };

You can then implement handleResults() to do whatever you want with the matches.
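The onError callback in the listener receives only an integer code. A minimal sketch of how such a code might be turned into a log message and a retry decision follows. To keep it compilable without the Android SDK, the constants are copied by hand from the values documented for android.speech.SpeechRecognizer; in an app you would reference the SpeechRecognizer.ERROR_* constants directly. The class RecognitionErrors, its message strings, and the choice of which errors count as retryable are assumptions for illustration.

```java
/**
 * Hypothetical helper for interpreting SpeechRecognizer error codes.
 * The constants mirror android.speech.SpeechRecognizer.ERROR_* so this
 * sketch compiles as plain Java; prefer the SDK constants in a real app.
 */
class RecognitionErrors {
    static final int ERROR_NETWORK_TIMEOUT = 1;
    static final int ERROR_NETWORK = 2;
    static final int ERROR_AUDIO = 3;
    static final int ERROR_SERVER = 4;
    static final int ERROR_CLIENT = 5;
    static final int ERROR_SPEECH_TIMEOUT = 6;
    static final int ERROR_NO_MATCH = 7;
    static final int ERROR_RECOGNIZER_BUSY = 8;
    static final int ERROR_INSUFFICIENT_PERMISSIONS = 9;

    /** Returns a short human-readable description of the error code. */
    static String describe(int error) {
        switch (error) {
            case ERROR_NETWORK_TIMEOUT: return "Network operation timed out";
            case ERROR_NETWORK: return "Network error";
            case ERROR_AUDIO: return "Audio recording error";
            case ERROR_SERVER: return "Server error";
            case ERROR_CLIENT: return "Client-side error";
            case ERROR_SPEECH_TIMEOUT: return "No speech input";
            case ERROR_NO_MATCH: return "No recognition result matched";
            case ERROR_RECOGNIZER_BUSY: return "Recognition service busy";
            case ERROR_INSUFFICIENT_PERMISSIONS: return "Insufficient permissions";
            default: return "Unknown error code " + error;
        }
    }

    /**
     * Assumption for this sketch: silence and no-match are treated as
     * harmless, so the caller may simply start listening again.
     */
    static boolean isRetryable(int error) {
        return error == ERROR_SPEECH_TIMEOUT || error == ERROR_NO_MATCH;
    }

    public static void main(String[] args) {
        int error = ERROR_NO_MATCH;
        System.out.println(describe(error) + " (retry: " + isRetryable(error) + ")");
    }
}
```

Inside onError you could log describe(error) and, when isRetryable(error) is true, call mSpeechRecognizer.startListening(mRecognizerIntent) again.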

Solution 3

You can also use the Google Speech API. On Android it is accessible through the SpeechRecognizer class (see the SpeechRecognizer class reference).

There is also a Stack Overflow question that contains some demo code in Java: Speech recognition in Java

Author: LefterisL

Updated on June 04, 2022

Comments

  • LefterisL almost 2 years

    I've been thinking lately about starting an application based on speech recognition, meaning that certain recognized results would trigger specific tasks. I was wondering what the best way to proceed is. I'm considering either PC or Android. Java is my strongest programming language.

    I've done some searching, but I still don't know which is the best way to approach this.

    Should I have open-source software do the speech recognition part for me and work on the rest? Or should I do the whole thing myself? And if so, is that possible in Java?

    Any info will be appreciated.

    Thank you in advance.

  • LefterisL over 10 years
    This isn't the first time I've heard of CMUSphinx while searching. Thank you for the info.