ReconVox is our high-performance speech recognition product. Because it recognizes both isolated words and continuous speech from any speaker without speaker-specific training, it fits a wide range of applications, from controlling electronic devices by voice to accessing telephone-based automatic services driven by full, complex sentences.
This level of technology is called speaker-independent continuous speech recognition: it allows applications to understand full sentences, close to natural language, with large vocabularies and from any speaker.
Like BioVox, ReconVox is not a closed application with a predefined graphical user interface, but an open development platform that exposes all its functionality through a powerful API (Application Programming Interface), designed to be easily integrated into any application or target hardware.
ReconVox has been fully developed in-house by DTec.
ReconVox can work both in live mode, processing utterances as they arrive, and in batch mode, analyzing recordings stored in audio files.
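The two working modes can be pictured with a minimal sketch. Note that the class and method names below (`Recognizer`, `feed`, `recognize_file`) are illustrative assumptions, not the actual ReconVox API:

```python
# Illustrative sketch only: class and method names are hypothetical,
# not the real ReconVox API.

class Recognizer:
    def __init__(self):
        self._buffer = []

    # Live mode: feed audio chunks as they arrive from the microphone.
    def feed(self, chunk):
        self._buffer.append(chunk)
        # A real engine would emit partial results here; we just echo.
        return f"partial after {len(self._buffer)} chunks"

    # Batch mode: analyze a complete, stored recording in one call.
    def recognize_file(self, chunks):
        return f"final transcript from {len(chunks)} chunks"

rec = Recognizer()
for chunk in (b"\x00" * 160, b"\x00" * 160):  # fake audio frames
    print(rec.feed(chunk))
print(rec.recognize_file([b"\x00" * 160] * 3))
```

The key design difference is that live mode must return partial results incrementally, while batch mode sees the whole recording at once.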
In addition, ReconVox provides advanced features that add value and bring you to the state of the art in speech technology. One of them is AutoLearn: the recognition engine learns by itself, dynamically, as it is being used, automatically adapting to the specific features of a given speaker's voice, a dialectal region, or even an acoustic environment with characteristic ambient noise. This way, the more the system is used, the more the recognition accuracy improves.
If maximum accuracy is paramount, it is also possible to work in supervised mode, tutoring the learning process by giving AutoLearn known utterances along with their transcriptions. This accelerates learning and maximizes recognition accuracy.
Another of ReconVox's special features is what we call ConfScore. With this functionality you get, along with the transcription of the utterance, a confidence score for every word, as well as a global score for the whole sentence. These scores indicate whether there has been a high level of uncertainty in the recognition process.
This feature is therefore especially useful when out-of-vocabulary words must be taken into account, or when the acoustic conditions in the final recognition environment are expected to be remarkably noisy and thus more prone to recognition errors.
When the recognition task is meant to extract keywords or specific sentences from free-form utterances with no vocabulary restrictions, where any number of out-of-vocabulary words can arise, a WordSpotting strategy may be the best fit. It makes possible a flexible recognition strategy, not bound to a fixed syntax, capable of listening to a continuous audio stream and spotting just the words of interest. This functionality combines well with ConfScore, although the two are independent and can be used separately.
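The joint use of WordSpotting and ConfScore can be sketched as a post-processing step over recognized words. The output format below, `(word, confidence)` pairs, is an assumption for illustration; the real engine's output may differ:

```python
# Hypothetical post-processing sketch: the engine is assumed to emit
# (word, confidence) pairs; the real ReconVox output format may differ.

def spot_keywords(stream, keywords, min_confidence=0.6):
    """Return the keywords found in the stream with enough confidence."""
    hits = []
    for word, score in stream:
        if word in keywords and score >= min_confidence:
            hits.append((word, score))
    return hits

# A free-form utterance full of out-of-vocabulary words; only the
# keywords "transfer" and "balance" are of interest.
stream = [("uh", 0.3), ("please", 0.8), ("transfer", 0.9),
          ("mumble", 0.2), ("balance", 0.4)]
print(spot_keywords(stream, {"transfer", "balance"}))
# "balance" is dropped: its confidence (0.4) is below the threshold.
```

Filtering spotted keywords by confidence is exactly where the two features reinforce each other: WordSpotting narrows *which* words matter, ConfScore tells you *how much* to trust each hit.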
- Recognition task can be fine-tuned: isolated words or continuous speech.
- Speaker independent: doesn’t need to be retrained for every speaker.
- AutoLearn: automatic adaptation to a specific speaker, dialectal region or noisy environment.
- ConfScore: confidence scoring for recognition results, both word and sentence level.
- WordSpotting: detection of keywords or special sentences among out of vocabulary words.
- Vocabulary can be customized: from a few commands to thousands of words.
- Two different types of language models: fixed syntax or flexible grammar.
- Efficient recognition engine: can be integrated into embedded systems.
- Available in Spanish, US English and UK English. New languages can be incorporated upon request.
- Available both in DLL (Dynamic Link Library) format for Windows and in shared object format for UNIX/Linux. Please ask for other platforms.
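Since the engine ships as a DLL on Windows and a shared object on UNIX/Linux, it can be loaded from Python via the standard `ctypes` module. The library name `"reconvox"` below is a guess; check the actual binary name shipped with the product:

```python
import ctypes
import ctypes.util

# "reconvox" is a guessed library name, used only for illustration;
# the actual binary name is not documented here.
path = ctypes.util.find_library("reconvox")
if path is not None:
    engine = ctypes.CDLL(path)  # exported C functions become attributes
else:
    print("ReconVox library not found on this system")
```

The same pattern works unchanged on Windows and UNIX/Linux, since `ctypes` abstracts over DLLs and shared objects.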
AutoLearn is an exciting new technology that allows ReconVox to learn and improve its accuracy as it is being used. It supports two working modes:
- Dynamic: in this mode AutoLearn manages the learning process entirely by itself. It internally stores the utterances provided by the user, together with the transcriptions produced by recognition. Whenever enough adaptation data have accumulated, it automatically adjusts the acoustic models in a periodic, incremental fashion. As soon as this improvement takes place, the new models are used to recognize the next utterances given to the system, and the process starts again in a new learning iteration that improves on the previous one. All these operations are carried out automatically, without the user's explicit participation: he or she just activates AutoLearn and keeps using ReconVox as usual, because the whole learning process is transparent to the user.
- Supervised: if accuracy must be maximized and the learning process accelerated, it is possible to guide AutoLearn through the process. To do so, the user provides known, clean utterances along with their transcriptions. The utterances used for learning can then be guaranteed to be error-free and phonetically rich, maximizing the accuracy gain, because the user has total control over the number of utterances, their length, vocabulary and recording channel.
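The incremental, periodic adaptation idea behind both modes can be illustrated with a toy model. This is emphatically not the real AutoLearn algorithm: a single stand-in "parameter" is simply nudged toward the average of the adaptation data, one utterance at a time:

```python
# Toy illustration of incremental adaptation, NOT the real AutoLearn
# algorithm: one model "parameter" drifts toward the average feature
# value of the adaptation utterances, update by update.

class ToyAdaptiveModel:
    def __init__(self, mean=0.0):
        self.mean = mean   # stand-in for an acoustic model parameter
        self.count = 0

    def adapt(self, feature_value):
        """Incremental mean update after each adaptation utterance."""
        self.count += 1
        self.mean += (feature_value - self.mean) / self.count

model = ToyAdaptiveModel()
for value in [1.0, 2.0, 3.0]:  # features from transcribed utterances
    model.adapt(value)
print(model.mean)  # running mean after three updates -> 2.0
```

The point of the sketch is the shape of the process: each new utterance refines the model a little, and the refined model is immediately in effect for the next one.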
There are situations where it is extremely useful to get feedback about the confidence ReconVox has in some of the words just recognized in the last utterance; they may be key to understanding the sentence, or they may be surrounded by many unknown, out-of-vocabulary words. In these situations ConfScore can help by providing additional information.
When this feature is enabled, ConfScore returns, along with every recognized word, a confidence score that indicates the estimated probability that the word is actually present in the utterance. In addition, a global confidence score for the whole sentence is returned too.
These confidence scores must be calculated separately, on top of the actual recognition, so there is an associated performance cost. For this reason ConfScore can be enabled or disabled for each individual utterance, which matters in applications where response times are critical.
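The document does not specify how ReconVox combines per-word scores into a sentence score; a geometric mean is one common, assumed choice, sketched here for illustration:

```python
import math

# Assumption: the sentence score is the geometric mean of the word
# scores. The actual ReconVox formula is not documented here.

def sentence_score(word_scores):
    """Combine per-word confidence scores into one sentence score."""
    log_sum = sum(math.log(s) for s in word_scores)
    return math.exp(log_sum / len(word_scores))

scores = [0.9, 0.8, 0.95]           # per-word confidences
print(round(sentence_score(scores), 3))
```

A geometric mean penalizes a single very low-confidence word more than an arithmetic mean would, which matches the intuition that one badly misrecognized word can invalidate a whole sentence.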
- IVR (Interactive Voice Response): conversations close to natural language in automatic call centers.
- Alarms and home automation: electronic devices controlled by voice commands, (de)activation of alarms…
- Assistance for people with disabilities: electronic devices driven by voice commands from the authorized speaker only.
- Voice commands in cars: GPS, hands-free phone calls…
- Automatic search by content: spotting of keywords or sentences in audio/video recordings or streaming audio.
- Education: per-word pronunciation scoring for language learning, or for speech pathologies such as dyslexia or aphasia.
Speech recognition is the technology that automatically provides the transcription of utterances pronounced by a speaker. While voice biometrics systems like BioVox answer the "who?" question, speech recognition answers the "what?" question. Speech recognizers can be classified according to the size of the accepted vocabulary and the word rate in the audio stream:
- Isolated words, where every word is pronounced one at a time, pausing between them.
- Short sentences, for command and control applications, limited to specific sentences that include connected words, without pauses between words.
- Large vocabularies close to natural language, capable of recognizing thousands of words in naturally pronounced sentences.
Focusing on the speaker, there are speaker-independent speech recognizers in contrast to speaker-dependent ones. The latter, typically used in dictation tasks, need to be trained specifically for a given speaker before they can be used, and therefore cannot switch to new, unknown users on the fly as speaker-independent ones can. In exchange, speaker dependence usually yields higher recognition rates.
The underlying Automatic Speech Recognition (ASR) technology in ReconVox is the most widely used and tested in the field, and the one that currently provides the best recognition rates for continuous speech. This technology is based on a stochastic framework called Hidden Markov Models (HMM). These models reflect the internals of speech production by representing the fundamental sounds that make up words as connected states, with transitions between them governed by probabilities. To estimate the optimum values of these probabilities accurately, large numbers of recordings from many different speakers with different vocal features are used. These utterances have to be phonetically balanced and ideally recorded through the same channel that will be used at recognition time, so that the parameters are representative of the task being modeled and can be estimated reliably.
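The classic evaluation step for an HMM, computing the probability of an observation sequence given the model, is the forward algorithm. A minimal version for a discrete two-state toy model (the states could stand for two sub-phone units; the numbers are made up for illustration):

```python
# Minimal forward algorithm for a discrete HMM: computes the total
# probability of an observation sequence under the model.

def forward(pi, A, B, obs):
    """pi: initial state probabilities, A: state transition matrix,
    B: emission probabilities, obs: observation indices."""
    n_states = len(pi)
    # Initialization: probability of starting in each state and
    # emitting the first observation.
    alpha = [pi[s] * B[s][obs[0]] for s in range(n_states)]
    # Induction: sum over all paths reaching each state at time t.
    for t in range(1, len(obs)):
        alpha = [sum(alpha[r] * A[r][s] for r in range(n_states))
                 * B[s][obs[t]]
                 for s in range(n_states)]
    return sum(alpha)

# Two-state toy model with two observation symbols.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
print(forward(pi, A, B, [0, 1]))  # -> 0.2156
```

In a real recognizer the observations are acoustic feature vectors rather than discrete symbols, and the per-model probabilities computed this way are what let the engine pick the most likely word or sentence.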