Hearing (Perception)

The “Hearing” module configures how the agent processes audio input. Voice detection is its core function.

Configuration Options

Voice Detection

This section configures how the agent detects human voices, along with the related parameters.

Voice Detection Method: This switch selects where voice detection is performed. The system provides two modes:
- Terminal-side Detection (Default): Voice processing is performed locally on the device. This mode responds faster and uses less data traffic, because audio that contains no speech does not need to be uploaded to the server.
- Server-side Detection: Voice processing is performed by the cloud server. This mode typically has stronger recognition capabilities but consumes more data traffic. When it is enabled, you can further configure the following parameters:
  - Sensitivity: Adjusts the sensitivity of voice detection. Lower values make detection more sensitive but also make it more likely that background noise is misidentified as speech. For example, 0.05 is a relatively sensitive setting.
  - Voice Pre-buffer (milliseconds): The duration of audio that is captured and retained from just before voice is detected. Including this audio in the segment keeps the very beginning of speech intact, so initial words are not lost.
  - Silence Duration (milliseconds): The duration of silence required to determine the end of a sentence. When the system detects silence exceeding this duration, it considers the user to have finished speaking and sends that audio segment for recognition.
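
To illustrate how the three server-side parameters interact, here is a minimal Python sketch. It is not the product's actual detection algorithm: it assumes, purely for illustration, that Sensitivity behaves like a loudness threshold applied to fixed-length audio frames. The function name `segment_speech` and the parameters `frame_levels` and `frame_ms` are hypothetical.

```python
from collections import deque


def segment_speech(frame_levels, frame_ms=20, sensitivity=0.05,
                   pre_buffer_ms=40, silence_ms=100):
    """Return (start, end) frame-index pairs (end exclusive), one per utterance.

    Assumption: a frame counts as voice when its loudness level is at or
    above `sensitivity`, so a lower value triggers more easily.
    """
    pre_frames = pre_buffer_ms // frame_ms
    silence_frames = max(1, silence_ms // frame_ms)
    recent = deque(maxlen=pre_frames)  # indices of the latest quiet frames
    segments = []
    start = None     # start index of the utterance in progress, if any
    quiet_run = 0    # consecutive quiet frames inside the utterance

    for i, level in enumerate(frame_levels):
        is_voice = level >= sensitivity
        if start is None:
            if is_voice:
                # Voice pre-buffer: reach back to the retained quiet frames.
                start = recent[0] if recent else i
                quiet_run = 0
                recent.clear()
            else:
                recent.append(i)
        elif is_voice:
            quiet_run = 0
        else:
            quiet_run += 1
            # Silence duration: enough quiet frames close the utterance.
            if quiet_run >= silence_frames:
                segments.append((start, i + 1))
                start = None
                quiet_run = 0

    if start is not None:  # input ended while the user was still speaking
        segments.append((start, len(frame_levels)))
    return segments


levels = [0.0, 0.01, 0.01, 0.2, 0.3, 0.01, 0.2,
          0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4]
print(segment_speech(levels))
print(segment_speech(levels, sensitivity=0.005))
```

In this sketch, the short dip at frame 5 does not end the first utterance because it is shorter than the silence duration. Lowering `sensitivity` to 0.005 makes the faint background level of 0.01 count as voice, so the first segment starts earlier.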

Input Device

This section specifies the source of audio input.

Input Device: This dropdown menu selects the audio device the agent uses to listen to the environment, such as a specific microphone. The menu lists all audio input channels currently recognized by the system. This option currently takes effect only in PC-side testing.

With these hearing settings configured correctly, your agent can respond promptly to voice commands and take part smoothly in conversations.