Sensory, Inc. - Leaders in Speech Technology for Consumer Products

Sensory offers the following speech technologies:

Speech Recognition:

Audio:

Interactive/Robotic:

Voice Recognition for Bluetooth Products:

Technology Matrix by Product

 
 
Audio and Speech Technologies
 

Sensory offers world class speech technologies on both hardware and software platforms. The technologies listed below can be implemented on the platforms depicted by the product symbols here:

 
Software:
FluentSoft Speech Recognition
BlueGenie Voice Interface

Hardware:
NLP-5x Natural Language Processor with FluentChip™ 5 Technology
RSC-4x Family with FluentChip™ 3
SC-6x Speech Synthesis Slave Processor

 
Technology Demo Videos Available:

TrulyHandsfree™ Voice Control 3.0
Speaker Verification Demo
Speaker Identification Demo
NLP-5x Demo-Text-to-Speech
NLP-5x Demo-Math Flash Card
NLP-5x Microwave Oven
Beat Prediction
BlueGenie VUI
LipSync
NanoLock w/Voice Password
Real-Time LipSync
SonicNet
SoundSource
Natural TimeSet

 

Speech Recognition:

  Phrase Spotting
 

TrulyHandsfree™ Voice Control
Phrase spotting of multiple commands or keywords embedded in speech allows FluentSoft, BlueGenie and the NLP-5x to listen continuously for triggers or commands, even in high noise. The number of commands depends on the power of the processor. In phrase spotting mode, the word(s) to be recognized may be spoken in the middle of other speech. TrulyHandsfree™ triggers can be used to alert the recognizer to listen for the product-control commands that follow.

 TrulyHandsfree™ Voice Control 3.0

 
  Speaker Verification
 

Voice Biometrics - Speaker Identification
Speaker Verification checks whether or not a password is spoken by the individual who originally enrolled it. The user trains 1-4 passwords (the more passwords, the better the security) that can provide voice access to any product. Equal error rates (where the probability of an incorrect acceptance equals that of an incorrect rejection) range from 0.01% to 7%, depending on the number of words and whether the passwords are known to the impostor.
On the NLP-5x, up to 10 SV templates can be stored on-chip. The RSC-4x can store 5 SV templates on-chip. With external memory, the number of unique sets for both chips is limited only by programmable memory capacity.
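
As an illustration of the equal error rate mentioned above, the following sketch sweeps a decision threshold over two sets of verification scores and reports the point where false acceptances and false rejections are closest. The scores are made up for the example, and the function names are hypothetical; this is not Sensory's scoring method.

```python
def error_rates(genuine, impostor, threshold):
    """Return (false_accept_rate, false_reject_rate) at a given threshold.

    A score >= threshold is treated as "accept".
    """
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Try every observed score as a threshold and return the operating
    point where FAR and FRR are closest, as (threshold, far, frr)."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = error_rates(genuine, impostor, t)
        if best is None or abs(far - frr) < abs(best[1] - best[2]):
            best = (t, far, frr)
    return best


# Made-up similarity scores in [0, 1]; not measured data.
genuine_scores = [0.91, 0.85, 0.78, 0.88, 0.60, 0.95, 0.82, 0.74, 0.90, 0.87]
impostor_scores = [0.20, 0.35, 0.41, 0.15, 0.65, 0.30, 0.52, 0.28, 0.10, 0.45]

threshold, far, frr = equal_error_rate(genuine_scores, impostor_scores)
print(f"threshold={threshold:.2f}  FAR={far:.0%}  FRR={frr:.0%}")
```

Raising the threshold above this point trades fewer false acceptances for more false rejections, which is the usual tuning decision in a voice-password product.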

FluentSoft offers Speaker Verification as well as Speaker Identification where a user can be identified from an enrolled group - great for personalizing products!

 Speaker Verification Demo

 Speaker Identification Demo

  Language Coverage
    

Wide and Ever Expanding Coverage!
Sensory's speech recognition technologies currently support a wide range of languages covering many countries/regions all over the world.

 Language Coverage Map

We are continuously working to expand our language and country support.
Check back frequently for updated coverage or contact our Sales Department at (971)256-0056 for more information about our language offerings.

  Natural Language Interface

Flexible Grammars!
Sensory's Natural Language Interface for FluentSoft and the NLP-5x understands a user's context-specific commands spoken in the natural way the user would like to speak. Order independence allows flexibility in commands, and speech prompts can request any missing information (form filling). Flexible grammars allow the user to say multiple commands in a single phrase, in any order. This results in the most natural use of speech recognition!
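
The order independence and form filling described above can be sketched in a few lines. This example assumes the recognizer has already produced text; the slot names, patterns and prompts are hypothetical and belong to an imagined microwave-style product, not to Sensory's grammar tools.

```python
import re

# Hypothetical slot definitions for a microwave-style product.
SLOT_PATTERNS = {
    "action": re.compile(r"\b(cook|defrost|reheat)\b"),
    "minutes": re.compile(r"\b(\d+)\s+minutes?\b"),
    "power": re.compile(r"\b(low|medium|high)\s+power\b"),
}

PROMPTS = {
    "action": "What would you like to do?",
    "minutes": "For how many minutes?",
    "power": "At what power level?",
}


def fill_slots(utterance, slots=None):
    """Extract any slots present in the utterance, in whatever order
    they were spoken, and merge them into the partially filled form."""
    slots = dict(slots or {})
    text = utterance.lower()
    for name, pattern in SLOT_PATTERNS.items():
        match = pattern.search(text)
        if match:
            slots[name] = match.group(1)
    return slots


def next_prompt(slots):
    """Return the prompt for the first missing slot, or None if complete."""
    for name in SLOT_PATTERNS:
        if name not in slots:
            return PROMPTS[name]
    return None


# The same request in two different word orders fills the same form.
a = fill_slots("cook for 3 minutes on high power")
b = fill_slots("on high power for 3 minutes please cook")
print(a == b)

# A partial request leaves a slot open, so the product asks a follow-up.
c = fill_slots("defrost on low power")
print(next_prompt(c))
c = fill_slots("2 minutes", c)
print(c, next_prompt(c))
```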

 
 
 Speaker Independent w/T2SI™
    

No Training Required!
Unspotted Speaker-Independent (SI) recognition works right out of the box and requires no end-user training. SI technology is designed for a specific language, and can handle thousands of words in a single set for FluentSoft, 40 words in a single set for the FluentChip/RSC-4x combo, and 75 words in a set for the FluentChip 5/NLP-5x combo. The number of sets is limited only by the amount of memory in your system. With proper design, Sensory's SI technology will yield highly accurate recognition. Sensory's Quick T2SI (text-to-SI) is the first-ever GUI tool that lets product designers create their own speaker-independent set and run recognition on chip within minutes!

 
  Speaker Dependent Speech Recognition
 

Flexible vocabulary, any language, any accent
Speaker dependent (SD) recognition is desirable where user-specific or language-specific vocabularies are required. Each recognition word is trained just once by the user to create voice "templates", each of which requires up to 200 bytes of memory (which can be on-chip or external). Vocabularies in excess of 100 words are possible, although there are often practical reasons for keeping recognition sets under 50 words. The NLP-5x can store up to 10 SD templates in on-chip SRAM. The RSC-4128 can store up to 7 SD templates in on-chip SRAM. With proper design, Sensory's SD technology can yield highly accurate recognition for any user, regardless of language or accent.
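
A quick way to budget template storage from the figures above is sketched below. It assumes, for illustration only, that templates fill on-chip SRAM first and that the remainder goes to external memory at the 200-byte upper bound; the function name is hypothetical.

```python
TEMPLATE_BYTES_MAX = 200  # upper bound per SD template, from the text above
ON_CHIP_TEMPLATES = {"NLP-5x": 10, "RSC-4128": 7}  # on-chip SRAM capacity, from the text


def sd_memory_plan(chip, vocabulary_size):
    """Return (templates_on_chip, templates_external, external_bytes_max).

    Assumes on-chip SRAM is filled first; the rest is stored externally.
    """
    on_chip = min(vocabulary_size, ON_CHIP_TEMPLATES[chip])
    external = vocabulary_size - on_chip
    return on_chip, external, external * TEMPLATE_BYTES_MAX


for chip, words in [("RSC-4128", 50), ("NLP-5x", 100)]:
    on_chip, external, ext_bytes = sd_memory_plan(chip, words)
    print(f"{chip}: {words} words -> {on_chip} on-chip, "
          f"{external} external (up to {ext_bytes} bytes)")
```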

 
  Continuous Digits
   

For entering phone numbers and digit strings
This technology is ideally suited for voice dialing applications such as mobile phones, handsets and hands-free kits. It can also be used anywhere that a string of digits needs to be recognized.

 

Audio

  Text to Speech Synthesis (TTS) with Voice Morphing
 

Text-based Speech Playback
Text-to-speech (TTS) is supported for systems requiring text-based speech playback, and requires as little as 270KB of external memory. TTS works well for names or text/phrase reading and is supported in multiple languages.

 
  Stereo MP3 Decoder

Hi-Fidelity Stereo MP3 Decoder with all standard bitrates and a 5-band equalizer.

 
  Mono or Stereo Music

Sensory's music synthesis technology can produce up to 24 stereo voices simultaneously at a sample rate of 32k samples per second on the NLP-5x. The RSC-4x family supports up to 8 mono voices simultaneously at an 8 kHz sample rate. The music can be played through the on-chip stereo DAC or mono PWM (Pulse Width Modulator). Speech or sound effects encoded with SX, PCM, or ADPCM can be mixed in with the music. The music synthesis technology can "play" MIDI files that are stored in on-chip or off-chip memory, such as serial flash. MIDI files are a memory-efficient way to store music. The music synthesizer requires a database of instrument audio samples, which is typically stored in external parallel flash memory. Sensory currently offers a database of instrument samples for a wide variety of common instruments from the General MIDI melodic instrument set, plus the complete General MIDI percussion set.

 
  Speech Synthesis
  

Perfect for Voice Prompts and/or Speech Output
High-quality speech and sound effects can be played back by Sensory's ICs and software products. Sensory's compression technology uses proprietary time- and frequency-domain approaches that can compress speech and sound effects to as little as 1000 bits per second. Speech output creates the opportunity for natural dialog with a product and can reduce reliance on an instruction manual.

 
  Record and Playback
 

Store messages and play back--voice messaging capability
Compressed digital sound reproduction
Sensory's RSC-4x and NLP-5x processors can record audio to off-chip RAM or Flash at data rates of under 30k bits per second for custom greetings, phones and answering machines, voice pitch changers, and hand-held recording devices. On-chip compression levels can be varied depending on the quantity and quality of playback desired. Automatic silence removal can also be applied to reduce memory requirements. The NLP-5x offers sample rates of 8k and 16k samples per second, while the RSC-4x family offers 8k samples per second. The NLP-5x signal processing provides superior voice quality.
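
To show why silence removal saves memory, here is a minimal sketch that drops low-level frames from a recording. The frame-level threshold approach, the frame length and the threshold value are assumptions chosen for the example; the page does not describe how the on-chip silence removal actually works.

```python
def remove_silence(samples, frame_len=80, threshold=500.0):
    """Drop frames whose mean absolute amplitude is below the threshold.

    samples: list of signed PCM sample values.
    frame_len: samples per frame (80 = 10 ms at 8k samples per second).
    Returns the retained samples as a new list.
    """
    kept = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        level = sum(abs(s) for s in frame) / len(frame)
        if level >= threshold:
            kept.extend(frame)
    return kept


# Synthetic signal at 8k samples per second: 1 s of "speech" (a loud
# square wave) with 0.5 s of near-silence on each side.
quiet = [10, -10] * 2000       # 4000 samples
loud = [4000, -4000] * 4000    # 8000 samples
recording = quiet + loud + quiet

trimmed = remove_silence(recording)
saved = 1 - len(trimmed) / len(recording)
print(f"{len(recording)} -> {len(trimmed)} samples ({saved:.0%} saved)")
```

A real recorder would also need to note where the gaps were if natural pauses should be restored on playback.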

 

Interactive / Robotic

 
  Natural TimeSet
 

Set digital time clocks using natural phrases.
 Natural TimeSet Demo

  LCD Control

LCD control logic and drive - up to 104 icons or pixels. SPI for large array driver interfaces.

 
  Motor Control

Motor control logic - up to 3 bi-directional motors.

 
  Silent SonicNet

Silent SonicNet communicates data via encoded sound at 14 kHz or 18 kHz in short bursts on the NLP-5x. These high frequencies make the short bursts essentially inaudible in practical applications. Silent SonicNet can run coincident with SX or T2SI, allowing data transmission during VR dialogues. Products with integrated speech that already include an NLP-5x, microphone and speaker can implement this at no additional cost, and can interact with each other, potentially doubling demand.

 
  SonicNet
 

SonicNet communicates data at 8 kHz via encoded sound in short bursts on the RSC-4x. SonicNet can run coincident with SX to partially mask the sonic tones. Products with integrated speech that already include an RSC-4x, microphone and speaker can implement this at no additional cost, and can interact with each other, potentially doubling demand.
 SonicNet Demo

Interactive Multimedia Windows Media demo - requires RSC-4x Demo/Eval Board

 
  System Communications

USB1.1, SPI, UART-Lite, I2S and infrared (IR) interfaces combine with voice user interface capabilities, enabling man-machine interface solutions with an unprecedented combination of power and cost-effectiveness.

 
  Real-Time LipSync
 

Allows a product to match robotic mouth movements to speech heard in real time, much like a ventriloquist's dummy.
 Real-Time LipSync Demo

 
  Beat Prediction
 

The chip works out the recurring beat so it knows how to act going forward, which is great for dancing and motion-oriented products.
 Beat Prediction Demo
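
One simple way to predict upcoming beats, shown here only as a sketch, is to take the median gap between detected sound onsets as the beat period and extrapolate from the last onset. The onset times below are hypothetical, and this is not a description of the algorithm running on the chip.

```python
from statistics import median


def predict_beats(onset_times, count=4):
    """Estimate the beat period as the median gap between detected onsets
    and extrapolate the next `count` beat times from the last onset."""
    if len(onset_times) < 2:
        raise ValueError("need at least two onsets to estimate a period")
    gaps = [b - a for a, b in zip(onset_times, onset_times[1:])]
    period = median(gaps)
    last = onset_times[-1]
    return period, [last + period * k for k in range(1, count + 1)]


# Hypothetical onset times in seconds for a roughly 120 BPM rhythm,
# with a little timing jitter.
onsets = [0.00, 0.51, 1.00, 1.49, 2.00, 2.50]
period, upcoming = predict_beats(onsets, count=3)
print(f"period ~ {period:.2f} s ({60 / period:.0f} BPM), next beats: "
      + ", ".join(f"{t:.2f}" for t in upcoming))
```

Using the median rather than the mean keeps a single missed or spurious onset from skewing the estimated tempo.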

 
  LipSync
 

Allows for a product to match robotic mouth movements to pre-recorded speech.
 LipSync Demo

 
  Peak Detection
 

Picks up the amplitude of different sounds in the room as they occur and reacts to them with a movement or display function.
 

 
  Pitch Detection
 

A pitched human voice can be analyzed by the RSC processor to determine the pitches being sung.
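
Pitch can be estimated in several ways; the sketch below uses plain autocorrelation, a common textbook technique, and is not claimed to be the method used on the RSC processor. The frequency search range and the synthetic test tone are assumptions for the example.

```python
import math


def estimate_pitch(samples, sample_rate, f_min=80.0, f_max=500.0):
    """Return the estimated fundamental frequency in Hz, using the lag
    with the highest autocorrelation inside the [f_min, f_max] range."""
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Synthetic "sung note": 50 ms of a 200 Hz sine at 8k samples per second.
tone = [math.sin(2 * math.pi * 200.0 * n / 8000) for n in range(400)]
pitch = estimate_pitch(tone, 8000)
print(f"estimated pitch: {pitch:.1f} Hz")
```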

 
  Sing Back
 

Combining talkback and pitch detection allows a robotic creature or avatar to imitate a person singing.
See the Windows Media demo, which features a dog barking and matching pitches from a human voice.

 
  Sound Sourcing
 

Adding a second microphone allows the NLP-5x or RSC-4x processor to locate the direction of a human voice.
 SoundSource Demo
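
With two microphones, direction can be inferred from the small difference in arrival time between them. The sketch below finds that delay by cross-correlation and converts it to an angle. The microphone spacing, sample rate and test signal are assumptions, and the page does not say which method the processors use.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C


def best_delay(left, right, max_lag):
    """Return the lag (in samples) at which `right` best matches `left`.
    A positive lag means the sound reached the right microphone later."""
    def score(lag):
        start = max(0, -lag)
        stop = min(len(left), len(right) - lag)
        return sum(left[i] * right[i + lag] for i in range(start, stop))
    return max(range(-max_lag, max_lag + 1), key=score)


def bearing_degrees(delay_samples, sample_rate, mic_spacing_m):
    """Convert an inter-microphone delay to an angle from straight ahead
    (0 = straight ahead, positive = toward the left microphone)."""
    ratio = SPEED_OF_SOUND * delay_samples / (sample_rate * mic_spacing_m)
    ratio = max(-1.0, min(1.0, ratio))  # guard against rounding past +/-1
    return math.degrees(math.asin(ratio))


# Hypothetical setup: microphones 10 cm apart, 16k samples per second.
rng = random.Random(0)
left = [rng.uniform(-1.0, 1.0) for _ in range(200)]
right = [0.0, 0.0, 0.0] + left[:-3]  # same sound, 3 samples later

delay = best_delay(left, right, max_lag=5)
angle = bearing_degrees(delay, sample_rate=16000, mic_spacing_m=0.10)
print(f"delay = {delay} samples, bearing ~ {angle:.0f} degrees toward the left mic")
```

The search range of 5 samples covers the largest physically possible delay for this spacing and sample rate (about 4.7 samples).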

 
  Talk Back
 

The RSC can respond to a user's speech or questions with speech that sounds like conversation from a non-human creature.

 
 
  Natural Radio Tuning

Set radio stations using natural phrases on the NLP-5x.

 
  Sensor Interfacing

Sensory and third-party developers provide support for presence detection, touch and position sensors, gesture and motion analysis, etc. USB1.1, SPI, UART-Lite, I2S and infrared (IR) interfaces combine with voice user interface capabilities, enabling man-machine interface solutions with an unprecedented combination of power and cost-effectiveness.

 

Voice Recognition for Bluetooth Products

  BlueGenie™ Voice Interface

Speech Recognition and TTS for Headsets, Music Players, Hands-Free Kits & More
Sensory’s BlueGenie Voice Interface software suite runs on CSR's BC-5 MM Kalimba DSP, and enables manufacturers of Bluetooth products to integrate full voice control and synthetic speech output without the need for visual displays or complex user interfacing. It frees designers to pack functionality onto small form factor Bluetooth devices and answers consumer demand for a truly hands-free experience. TTS allows Caller ID announcement and SMS message playback with speech.
 BlueGenie Voice User Interface Demo


 
Technology Matrix
Platforms: FluentSoft TrulyHandsfree, BlueGenie™ Voice Interface, NLP-5x Natural Language Processor, RSC-4x Family, SC-6x Family
Phrase Spotting Triggers and Commands    
Speaker Verification    
Speaker Identification    
Natural Language Interface
coming soon    
Speaker Independent with T2SI Unspotted Commands
 
Speaker Dependent      
Continuous Digits  
Text-to-Speech Synthesis (TTS) with Voice Morphing    
Stereo MP3 decoder        
Mono or Stereo Music    
Speech Synthesis
Record and Playback      
LCD Control        
Motor Control        
Silent SonicNet        
SonicNet        
System Communications        
RealTime LipSync      
Beat Prediction      
LipSync      
Peak Detection      
Pitch Detection      
Sing Back      
Sound Sourcing      
Talk Back      
Natural TimeSet      
Natural Radio Tuning        
Sensor Interfacing        
Bluetooth Support        
 
Copyright © 1994-2014 Sensory Inc. All Rights Reserved.