Invented by Kurt Thomas Soto, Sonos Inc
The Sonos Inc invention works as follows
Systems and methods for optimizing network microphone devices using noise classification are described herein. In one example, the individual microphones of a network microphone device (NMD) detect sound. The sound data is analyzed to detect a trigger event, such as a wake word. Metadata associated with the sound data is stored in a lookback buffer on the NMD. The metadata is then analyzed to classify the noise within the sound data. Based on the noise classification, at least one performance parameter of the NMD is modified.
Background for Optimization using noise classification of network microphone devices
Options for accessing and listening to digital audio out loud were limited until 2003, when SONOS, Inc. filed one of its first patent applications, entitled “Method for Synchronizing Audio Playback between Multiple Networked Devices.” In 2005, SONOS began selling a media playback system. The SONOS Wireless HiFi System allows users to listen to music from a variety of sources through one or more networked playback devices. A software control application installed on a computer, tablet, or smartphone lets users play music in any room that has a networked playback device. The controller can also be used to stream different songs to each room with a playback device, group rooms together for synchronized playback, or play the same song in all rooms simultaneously.
Given the growing interest in digital media, there is a continuing need to develop consumer-accessible technologies that further enhance the listening experience.
Voice control can be beneficial in a “smart” home, that is, a home with smart appliances and devices connected to a communication network. These devices may include wireless audio playback devices, illumination devices, and home-automation devices such as thermostats and door locks. In some cases, network microphone devices may be used to control smart home devices.
A network microphone device (“NMD”) is a networked computing device that includes a microphone array or other arrangement of microphones to detect sound in its environment. The detected sound can be a combination of a person’s speech and background noise, such as music from a playback device or other ambient noise. In practice, an NMD filters detected sound to remove the background noise from the speech, which makes it easier to identify whether the speech contains a voice input indicative of voice control. If it does, the NMD can take action based on that voice input.
A wake-word engine, usually onboard the NMD, determines whether the sound the NMD detects contains a voice input that includes a particular wake word. The wake-word engine can be configured to identify (i.e., “spot”) a particular wake word using one or more identification algorithms. This wake-word identification process is often referred to as “keyword spotting.” To facilitate keyword spotting in practice, the NMD can buffer sound detected by a microphone and then use the wake-word engine to process the buffered sound.
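The buffering step described above can be pictured with a minimal sketch. Everything named here (`SoundBuffer`, `spot_wake_word`, `toy_engine`, the three-frame capacity) is a hypothetical illustration rather than anything taken from the patent, and the "engine" is a loudness placeholder, not a real keyword-spotting algorithm.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List, Optional, Sequence

Frame = Sequence[float]  # one short chunk of microphone samples
WakeWordEngine = Callable[[List[Frame]], bool]


class SoundBuffer:
    """Fixed-capacity buffer that keeps only the most recent frames."""

    def __init__(self, capacity: int) -> None:
        # deque(maxlen=...) silently drops the oldest frame when full.
        self._frames: Deque[Frame] = deque(maxlen=capacity)

    def push(self, frame: Frame) -> None:
        self._frames.append(frame)

    def snapshot(self) -> List[Frame]:
        return list(self._frames)


def spot_wake_word(
    frames: Iterable[Frame], engine: WakeWordEngine, capacity: int
) -> Optional[int]:
    """Buffer frames as they arrive and run the engine on the buffered sound.

    Returns the index of the frame at which the engine first fired, or None.
    """
    buffer = SoundBuffer(capacity)
    for index, frame in enumerate(frames):
        buffer.push(frame)
        if engine(buffer.snapshot()):
            return index
    return None


def toy_engine(window: List[Frame]) -> bool:
    # Placeholder (assumption): fires once the buffer holds three loud frames.
    # A real wake-word engine would run a trained keyword-spotting model here.
    return len(window) == 3 and all(sum(abs(s) for s in f) > 1.0 for f in window)


quiet, loud = [0.0, 0.0], [1.0, 1.0]
trigger_index = spot_wake_word([quiet, quiet, loud, loud, loud], toy_engine, capacity=3)
```

The point of the design is that the engine always sees a bounded window of recent sound, so memory use stays constant no matter how long the device listens.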
When the wake-word engine detects a wake word in the detected sound, the NMD can determine that a wake-word event (i.e., a “wake-word trigger”) has occurred, which indicates that the NMD has detected sound containing a potential voice input. In most cases, the NMD performs additional processes in response to the wake-word event. These can include, for example, generating an alert (e.g., an audible chime or a light indicator) that a wake word has been detected, and extracting the detected-sound data. Extracting the detected sound may involve reading out and packaging a stream of the detected sound according to a particular format, and then transmitting the packaged data to an appropriate voice assistant service (VAS).
In turn, the VAS that corresponds to the wake word identified by the wake-word engine receives the transmitted sound data from the NMD over a communication network. A VAS is typically implemented as a remote service using one or more cloud servers configured to process voice inputs. Certain components and functions of a VAS can be distributed between local and remote devices. A VAS can also be implemented as a local service on an NMD, or in a media playback system that includes the NMD, so that voice inputs, or certain types of voice input (e.g., rudimentary commands), are processed without the intervention of a remote VAS.
In any case, a VAS that receives detected-sound data will process that data. This includes identifying the voice input and determining the intent of the words captured in it. The VAS can then send a response back to the NMD with an instruction based on the determined intent, and the NMD can perform an action based on that instruction. In accordance with a VAS’s instruction, an NMD can cause, for example, a playback device to play a specific song or an illumination device to turn on or off. In some cases, an NMD, or a media system with NMDs (e.g., a media playback system with NMD-equipped playback devices), may be configured to interoperate with multiple VASes. In practice, the NMD can select one VAS over another based on the wake word it identifies in the detected sound.
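As a rough illustration of selecting one VAS over another by wake word, the sketch below uses a simple lookup table. The table contents, the service labels `vas_a` and `vas_b`, and the function name `select_vas` are assumptions made for this example only; the patent does not specify how the mapping is stored.

```python
from typing import Dict, Optional

# Hypothetical mapping from normalized wake word to a VAS label (assumption).
WAKE_WORD_TO_VAS: Dict[str, str] = {
    "alexa": "vas_a",
    "ok google": "vas_b",
}


def normalize(wake_word: str) -> str:
    """Lowercase, drop commas, and collapse whitespace."""
    return " ".join(wake_word.lower().replace(",", " ").split())


def select_vas(
    detected_wake_word: str, table: Optional[Dict[str, str]] = None
) -> Optional[str]:
    """Return the VAS associated with the detected wake word, or None."""
    mapping = WAKE_WORD_TO_VAS if table is None else table
    return mapping.get(normalize(detected_wake_word))


chosen = select_vas("Ok, Google")
```

Returning None for an unrecognized wake word leaves the caller free to ignore the sound rather than forward it anywhere.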
In some implementations, a playback device configured to be part of a distributed media playback system can include the components and functionality of an NMD (i.e., the playback device is “NMD-equipped”). Such a playback device may include a microphone to detect sounds in its environment, such as people talking, audio output by the device itself or by another nearby playback device, or other ambient noises. It may also include components for buffering detected sound to facilitate wake-word identification.
Some NMD-equipped playback devices include an internal power source (e.g., a rechargeable battery) that allows them to operate without being physically connected to an electrical outlet. Such a playback device is referred to here as a “portable playback device.” Playback devices that rely on power from a wall outlet or the like may be called “stationary playback devices,” although such devices can in fact be moved around a home or other environment. In practice, a person may take a portable playback device to and from a home or other environment in which one or more stationary playback devices remain.
In some cases, multiple voice services are configured for an NMD or a system of NMDs. One or more voice services can be configured during a setup procedure, and additional voice services can be added to the system later. The NMD can thus act as an interface to multiple voice services, which may remove the need for a separate NMD from each voice service in order to interact with that service. The NMD may also operate in concert with service-specific NMDs present in the household to process a given voice command.
When two or more voice services are configured for the NMD, a particular voice service can be invoked by speaking the wake word that corresponds to it. For instance, to query AMAZON, a user may speak the wake word “Alexa” followed by a voice command. Other examples include “Ok, Google” for querying GOOGLE and “Hey, Siri” for querying APPLE.
In some cases, a generic wake word can be used to indicate a voice input to an NMD. This may be a manufacturer-specific wake word rather than a wake word tied to any particular voice service (e.g., “Hey, Sonos” where the NMD is a SONOS device). Given such a wake word, the NMD can identify a particular voice service to process the request. If, for example, the voice input that follows the wake word relates to a specific type of command, such as music playback, then the voice input is sent to the particular voice service associated with that type of command (e.g., a streaming music service with voice command capability).
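Routing after a generic wake word might look something like the following sketch. The keyword list, the command-type labels, and the service names are all hypothetical placeholders chosen for illustration; a real system would presumably use a proper intent classifier rather than keyword matching.

```python
from typing import Dict, Tuple

# Hypothetical command types and the services assumed to handle them.
COMMAND_TYPE_TO_SERVICE: Dict[str, str] = {
    "music_playback": "streaming_music_service",
}
DEFAULT_SERVICE = "default_voice_service"

# Crude keyword list standing in for a real intent classifier (assumption).
MUSIC_KEYWORDS: Tuple[str, ...] = ("play", "pause", "skip", "shuffle")


def classify_command(utterance: str) -> str:
    """Guess the command type of the words that follow the wake word."""
    words = utterance.lower().split()
    if any(word in MUSIC_KEYWORDS for word in words):
        return "music_playback"
    return "general"


def route_generic_wake_word(utterance: str) -> str:
    """Pick the voice service associated with the utterance's command type."""
    command_type = classify_command(utterance)
    return COMMAND_TYPE_TO_SERVICE.get(command_type, DEFAULT_SERVICE)


target = route_generic_wake_word("play some jazz in the kitchen")
```

Falling back to a default service keeps the NMD responsive even when the command type is not one it has a dedicated service for.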
An NMD may include a number of individual microphones. The NMD processes the sound data received from the individual microphones to determine whether a wake word has been detected. If a wake word is detected, as noted above, the NMD passes the audio data to the VAS for processing. Noise (e.g., background conversations, a nearby appliance, traffic, or construction) can impair the functionality of the NMD and negatively affect downstream processing. It can increase the false-positive or false-negative rate of wake-word detection, and it can degrade performance at the VAS, for example by leaving the VAS unable to decipher and respond accurately to voice commands.
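To make the two error rates concrete, here is a small sketch that computes them from labeled trials using the standard definitions (false positives over all trials without a wake word, false negatives over all trials with one). The function name and the trial format are assumptions for illustration, and the sample data is invented.

```python
from typing import Iterable, Tuple

Trial = Tuple[bool, bool]  # (wake_word_was_spoken, engine_triggered)


def wake_word_error_rates(trials: Iterable[Trial]) -> Tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate) for the trials."""
    false_pos = false_neg = spoken_count = silent_count = 0
    for spoken, triggered in trials:
        if spoken:
            spoken_count += 1
            if not triggered:
                false_neg += 1  # wake word was spoken but missed
        else:
            silent_count += 1
            if triggered:
                false_pos += 1  # engine fired with no wake word present
    fpr = false_pos / silent_count if silent_count else 0.0
    fnr = false_neg / spoken_count if spoken_count else 0.0
    return fpr, fnr


# Invented sample: 2 trials with a wake word, 4 without.
sample = [(True, True), (True, False), (False, False),
          (False, True), (False, False), (False, False)]
rates = wake_word_error_rates(sample)
```

Comparing these rates in quiet and noisy conditions is one simple way to quantify how much a given noise source hurts detection.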
As described in more detail below, various techniques and devices are disclosed that enhance voice input processing in the presence of noise. In some embodiments, for example, one or more parameters of the NMD may be adjusted to improve its performance. In certain embodiments, the noise can be classified by comparing it to known noise samples, such as samples from the user’s environment or from a larger sample population. Based on the noise classification, the wake-word sensitivity may be adjusted. A frequency band that corresponds to noise from a household appliance could be ignored, or filtered out of the detected sound data before processing. Spatial processing may also be adjusted to suppress noise arriving from a specific direction (for instance, from a stationary appliance). Modifying the performance of the NMD according to the detected characteristics of the noise improves voice detection and downstream processing.
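The classify-then-adjust idea in this paragraph can be sketched as a nearest-match lookup followed by a parameter update. This is only an illustration under stated assumptions: the reference spectra, the four coarse frequency bands, cosine similarity as the comparison, the parameter names, and the specific adjustments are all invented here, and the patent does not say which direction sensitivity should move for a given noise class. Spatial processing is left out.

```python
import math
from dataclasses import dataclass, replace
from typing import Dict, Sequence, Tuple

Spectrum = Sequence[float]  # energy in a few coarse bands, low to high

# Hypothetical reference spectra for known noise samples (assumption).
KNOWN_NOISE: Dict[str, Spectrum] = {
    "fan": (0.9, 0.3, 0.1, 0.05),
    "dishwasher": (0.2, 0.8, 0.6, 0.1),
}


def cosine_similarity(a: Spectrum, b: Spectrum) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify_noise(spectrum: Spectrum) -> Tuple[str, float]:
    """Return the best-matching known noise class and its similarity score."""
    best = max(KNOWN_NOISE, key=lambda name: cosine_similarity(spectrum, KNOWN_NOISE[name]))
    return best, cosine_similarity(spectrum, KNOWN_NOISE[best])


@dataclass(frozen=True)
class NmdParams:
    wake_word_threshold: float = 0.5      # hypothetical sensitivity knob
    ignored_bands: Tuple[int, ...] = ()   # band indices to filter out


def adjust_params(params: NmdParams, noise_class: str) -> NmdParams:
    """Return updated parameters for the classified noise (illustrative rules)."""
    if noise_class == "fan":
        return replace(params, ignored_bands=(0,))
    if noise_class == "dishwasher":
        return replace(params, wake_word_threshold=0.6, ignored_bands=(1,))
    return params  # unknown class: leave parameters unchanged


noise_class, score = classify_noise((0.8, 0.25, 0.1, 0.0))
tuned = adjust_params(NmdParams(), noise_class)
```

Keeping the parameters in an immutable dataclass means each adjustment yields a new settings object, which makes it easy to revert once the noise goes away.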
In some embodiments, the NMD provides sound metadata (e.g., spectral data, signal levels, and direction detection) to a remote computing system for evaluation and noise classification. Relying only on sound metadata, which does not reveal the original audio content, makes it possible to protect user privacy. The NMD can derive the sound metadata from the detected sound data in a way that leaves the original audio signal indecipherable to anyone who has access only to the metadata. For example, by limiting the sound metadata to frequency-domain information averaged over many sampling frames, the NMD can render the original sound data unintelligible. The NMD can collect sound metadata and send it to one or more computing devices of a remote evaluator. The remote evaluator then analyzes the sound metadata to identify features that are indicative of noise or of other factors that may have contributed to poor NMD performance. In this way, in some embodiments, the system can detect and classify noise in an environment without compromising user privacy, because no recorded audio content is sent to the remote evaluator.
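A minimal sketch of frequency-domain averaging follows, assuming fixed-length frames and a plain discrete Fourier transform written with the standard library. The function names are hypothetical, and the patent does not state which transform or frame size the NMD uses. Keeping only magnitudes drops phase, and averaging across frames drops timing, which is why the result alone is not enough to rebuild the audio.

```python
import cmath
import math
from typing import List, Sequence


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Magnitudes of the non-negative-frequency DFT bins of one frame."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def averaged_spectrum(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average the magnitude spectra of equal-length frames, bin by bin."""
    if not frames:
        raise ValueError("at least one frame is required")
    if len({len(frame) for frame in frames}) != 1:
        raise ValueError("all frames must have the same length")
    spectra = [magnitude_spectrum(frame) for frame in frames]
    return [sum(s[k] for s in spectra) / len(spectra) for k in range(len(spectra[0]))]


# Three 8-sample frames of the same tone at different phases. The phase
# differences vanish in the metadata; the energy shows up in bin 1.
example_frames = [
    [math.cos(2 * math.pi * t / 8 + phase) for t in range(8)]
    for phase in (0.0, 1.0, 2.0)
]
metadata = averaged_spectrum(example_frames)
```

The naive transform here is quadratic in the frame length, which is fine for a sketch; a production implementation would use a fast Fourier transform.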
While some embodiments described in this document may refer to functions performed by certain actors, such as “users” or other entities, it should be understood that this description is for purposes of explanation only. The claims should not be interpreted to require action by any such example actor unless the language of the claims explicitly requires it.
II. Example Operating Environment
The figures illustrate an example configuration of a media playback system, the MPS 100, in which one or more embodiments may be implemented. The MPS 100 is shown in an example home environment, which may be referred to as a “smart home” or “environment 101.” The environment 101 comprises a household with several rooms, spaces, and/or playback zones, including a master bathroom 101a and a master bedroom 101b (referred to herein as “Nick’s Room”). While some embodiments are described in the context of a home, the technologies described in this document may be used in other environments. In certain embodiments, the MPS 100 may be implemented in a commercial setting (e.g., a restaurant, mall, airport, or hotel), a vehicle (e.g., a car, sports utility vehicle, bus, ship, boat, or airplane), or multiple environments (e.g., a combination of home and vehicle environments).
The MPS 100 includes one or more computing devices located within these rooms and spaces. These computing devices can include playback devices 102 and NMDs. The home environment may also include other local network devices, such as a smart thermostat 110 (FIG. 1A). In the embodiments described here, some of the playback devices 102 can be configured to be portable, while others may be configured to be stationary. As an example, the headphones 102o are a portable playback device, whereas the playback device on the bookcase may be stationary. The playback device on the patio, as another example, may be battery-powered, allowing it to be moved to different areas within the environment 101, and even outside the environment 101, when not connected to a wall outlet or the like.