JP2013072919

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2013072919
Abstract: A similar sound that interferes with the understanding of context by sound is
distinguished from the sound to be recognized, enabling correct context understanding. The
system comprises a sound characteristic measurement means 101 that measures, using data of
sound collected by one or more sound collection units, a sound characteristic based on the sound
from a sound source, the characteristic being a characteristic of the space in which the sound
collection units are disposed, and a similar sound discriminating means 102 that determines
whether the sound characteristic measured by the sound characteristic measurement means 101
matches the sound characteristic of the recognition target sound, and thereby discriminates
whether the collected sound is the recognition target sound or a similar sound. [Selected figure] Figure 16
Sound determination system, sound determination method and sound determination program
[0001]
The present invention relates to a sound determination system that determines whether a
collected sound is a recognition target sound, a sound determination method, and a sound
determination program in a system that understands context using sounds, and more particularly
to the recognition target sound. The present invention relates to a sound determination system, a
sound determination method, and a sound determination program that distinguishes the similar
sound similar to the recognition target sound and determines whether the collected sound is the
recognition target sound.
[0002]
03-05-2019
1
There are known systems that understand various contexts such as human speech and machine
operation conditions.
One of the systems for understanding context is a sound recognition system that uses sounds to
understand context. An example of the configuration of the sound recognition system is shown in
FIG. 17. Such a sound recognition system is described, for example, in Non-Patent Document 1.
[0003]
The sound recognition system shown in FIG. 17 includes a microphone 901, a preprocessing unit
902, a feature extraction unit 903, an identification unit 904, and an identification dictionary
905. In the sound recognition system shown in FIG. 17, the sound collected by the microphone
901 is converted into an analog signal and input to the pre-processing unit 902. The preprocessing unit 902 converts the collected analog signal into a digital signal, and further
performs processing to suppress noise. Next, the feature extraction unit 903 extracts features
from the noise-suppressed digital signal. Then, the identification unit 904 identifies the collected
sound based on the extracted feature and the identification dictionary 905. In the identification
dictionary 905, the features of the sound and the labels are stored in association with each other.
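As a rough illustration of the pipeline just described, the following sketch reduces each stage to a toy operation; the function names, the single scalar feature, and the dictionary contents are all illustrative assumptions, not the document's actual processing:

```python
def identify(signal, dictionary):
    """Minimal sketch of the FIG. 17 pipeline: a stand-in for the
    preprocessing unit 902, a trivial feature (mean absolute level)
    for the feature extraction unit 903, and a nearest-feature lookup
    in an identification dictionary mapping labels to stored features
    (units 904 and 905)."""
    denoised = list(signal)  # stand-in for noise suppression in unit 902
    feature = sum(abs(s) for s in denoised) / len(denoised)  # stand-in for unit 903
    # Identification unit 904: choose the label whose stored feature
    # is closest to the extracted feature.
    return min(dictionary, key=lambda label: abs(dictionary[label] - feature))

dictionary = {"doorbell": 0.8, "vacuum": 0.3}  # stand-in for dictionary 905
label = identify([0.7, -0.9, 0.8], dictionary)
```

A real system would extract spectral features and use a statistical model rather than a scalar nearest match; the sketch only shows how the four components connect.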
[0004]
It can be said that the sound recognition system shown in FIG. 17 includes a noise suppression
system that suppresses noise, and a sound identification system that extracts features of the
sound and identifies the input sound based on the features of the sound.
[0005]
As an example of a noise suppression system, Patent Document 1 describes a noise suppression
system that suppresses non-stationary noise such as electronic noise and siren noise even in an
environment where stationary noise such as engine noise and air conditioner noise occurs.
In the noise suppression system described in Patent Document 1, frames are generated from the
acquired sound data, and the frame-by-frame sound signal is converted into a spectrum. A
spectral envelope is then calculated based on the transformed spectrum, and the calculated
spectral envelope is subtracted from the spectrum. A spectral peak is then detected using the
spectrum after subtraction of the spectral envelope, and the detected peak is suppressed as
noise. In this way, even in an environment where broad-bandwidth stationary noise such as
engine noise and air conditioner noise is generated, a sharp, narrow-bandwidth peak of
non-stationary noise such as electronic noise and siren noise can be detected and suppressed.
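The envelope-subtraction idea can be sketched as follows; the moving-average envelope, the fixed threshold, and the toy spectrum are illustrative assumptions, and the patent's actual frame and spectrum processing is more elaborate:

```python
def suppress_narrow_peaks(spectrum, window=3, threshold=2.0):
    """Sketch of the Patent Document 1 idea: estimate a smooth spectral
    envelope, and suppress narrow bins whose residual above the envelope
    exceeds a threshold (treated as non-stationary noise peaks)."""
    n = len(spectrum)
    # Moving average as a stand-in for the spectral envelope.
    envelope = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        envelope.append(sum(spectrum[lo:hi]) / (hi - lo))
    # Replace sharp outliers with the envelope value; keep broad noise as-is.
    return [env if (s - env) > threshold else s
            for s, env in zip(spectrum, envelope)]

# A broad noise floor of 1.0 with one sharp siren-like peak at bin 4.
cleaned = suppress_narrow_peaks([1.0] * 4 + [9.0] + [1.0] * 4)
```

The broad floor survives (it tracks the envelope), while the narrow peak is pulled down, which mirrors the distinction the patent draws between stationary and non-stationary noise.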
[0006]
Patent Document 2 describes an example of a sound identification system. In the sound
identification system described in Patent Document 2, various feature quantities related to
sound type are extracted from the signal to be discriminated, and the sound type of the signal is
identified by determining the likelihood of each sound type based on the extracted feature
quantities and learning data stored in advance. Specifically, five features are extracted from
the signal to be determined: a first feature for determining stationarity and non-stationarity, a
second feature representing a pitch component, a third feature serving as a measure of noise
characteristics, a fourth feature representing the slope of the attenuation that defines the
rough shape of the correlation coefficient used as the third feature, and a fifth feature
representing the power of the spectrum.
[0007]
Further, Patent Document 3 describes an example of a sound identification system that
distinguishes a real notification sound, such as a door chime or a telephone bell, from the
sound of a television or radio, and notifies the user only when a real notification occurs. In
the sound identification system described in Patent Document 3, a real notification sound is
recorded in advance and stored as acoustic data. The ambient environmental sound is constantly
monitored by a monitor means through a microphone, and the sound output of a specific device
such as a television or radio is also monitored separately. When the monitoring of the sound
output of the specific device detects that the sound output from the device contains a similar
sound resembling the acoustic data of the notification sound, the acoustic data of the
surrounding environmental sound input to the monitor means is processed to attenuate or mute
the acoustic component of the similar sound from the specific device. In this way, the similar
sound from the specific device is prevented from entering, with its sound component intact, the
surrounding environmental sound input to the monitor means.
[0008]
Further, Patent Document 4 describes an example of a system for understanding context using
information other than sound as well as information on collected sound. The sound and vibration
recognition system described in Patent Document 4 measures the sound pressure level of sound
using a plurality of microphones, and specifies the horizontal angle θ in the sound source
direction based on the measured sound pressure level. On the other hand, the vibration level is
measured using a plurality of vibration sensors, and the horizontal angle θ ′ in the direction of
the vibration source is specified based on the measured vibration level. Then, by comparing the
horizontal angle θ of the specified sound source direction with the horizontal angle θ′ of the
vibration source direction, it is determined whether the observed event is a vibration source
accompanied by the generation of sound, a sound source without vibration, or a vibration source
without sound.
[0009]
Patent Document 1: JP 2008-76676 A. Patent Document 2: JP 2004-240214 A. Patent Document 3: JP 9-26354 A. Patent Document 4: JP 2010-236944 A.
[0010]
Araki Masahiro, "Speech recognition system made with free software", Morikita Press, October
2007, p. 12-52 (in particular, p. 12, 19, 35).
[0011]
A problem in systems that use sounds to understand context is the presence of "similar sounds."
For example, consider a system that recognizes the ringing tone of an interphone installed in the
living room of a certain house and understands the context that there was a visitor from the
outside to the house.
[0012]
In such a system, for example, if a vacuum cleaner makes a loud noise in the living room, or if
a clock emits a time signal at regular intervals, such ambient sounds become noise and can pose
a challenge to the operation of the system.
Therefore, for example, if the noise suppression system described above is applied by utilizing
the fact that the frequency of the vacuum cleaner sound and that of the intercom ringing tone
differ, such ambient noise (in this case, the noise of the vacuum cleaner) could be excluded.
Also, for example, if the above-described sound identification system is applied, even an
ambient sound whose frequency band does not differ much from that of the recognition target
sound (the interphone ringing tone), such as a time alert sound, can be removed as noise because
characteristics such as its melody differ.
[0013]
However, with the above-mentioned methods, which discriminate between the recognition target
sound and ambient noise according to the sound characteristics of the recognition target sound
itself, such as frequency level and melody, it is difficult to distinguish the recognition
target sound when, for example, the ringing tone of the same type of intercom is emitted from a
television installed in the living room of the house. The system may then mistakenly conclude
that there was a visitor from outside the house. In fact, even we humans often incorrectly
recognize an interphone ringing tone or a telephone ringing tone coming from a television as the
actual sound, and react to it.
[0014]
In this way, in a sound identification system that identifies what kind of sound a sound is by
comparing and analyzing the characteristics (such as frequency) of individual sounds extracted
at each event, it is extremely difficult to identify a sound possessing the same characteristics
as the recognition target sound as being different from the recognition target sound.
Consequently, there was a problem that the context was understood incorrectly.
[0015]
In the present invention, a sound to be recognized in order to understand context by utilizing
sounds is referred to as a "recognition target sound", and a sound that is not a recognition
target but is the same as or similar to the recognition target sound is referred to as a
"similar sound". For example, if the ringing tone of a telephone is a bird call and one wishes
to recognize an incoming call from that ringing tone as one of the targets of context
recognition, the ringing tone of this telephone is the recognition target sound and the call of
an actual bird is a similar sound.
[0016]
For example, if a sound identification system as described in Patent Document 2 sufficiently
learns the difference between the recognition target sound and the similar sound, it may be
possible to configure the system to identify them. However, for many of the similar sounds to be
discriminated by the present invention, such as sounds from a television, it is difficult in
practice to collect a sufficient amount of learning data. Therefore, it is not realistic to
distinguish between the recognition target sound and the similar sound using learning means.
[0017]
For example, by applying the technology described in Patent Document 3, it is possible to
identify a similar sound emitted from a specific television as being different from the
recognition target sound by using the characteristics of the sound emitted from that television.
However, the technology described in Patent Document 3 cannot correctly distinguish between the
recognition target sound and similar sounds unless every sound source that may emit similar
sounds is always connected to the monitor input terminal. In reality, it is difficult to predict
in advance all sound sources that can generate similar sounds, and to keep all of them connected
to the monitor input terminal at all times.
[0018]
Patent Document 3 also describes that, based on the degree of correlation between the location
information of the source of a registered notification sound and the location information of the
source of a detected similar sound, the system identifies the notification sound among the
plurality of detected sounds and presents the user with the type of the notification sound and
the position where it was generated, for example when the source of the notification sound has
moved. However, this discrimination based on correlation with the generation position is
performed against the plurality of registered notification sounds after the similar sound from
the television has already been removed; it is not an attempt to distinguish the recognition
target sound from similar sounds based on the positional relationship of the sound sources. In
addition, if the technique of identifying a sound by comparing the position information of its
source, as described in Patent Document 3, were used to discriminate between the recognition
target sound and similar sounds, it would be possible to distinguish them by the difference in
their generation positions without always connecting every sound source that may emit similar
sounds to the monitor input terminal. However, in order to distinguish the recognition target
sound from similar sounds by the difference in their generation sources, at least the generation
position of the recognition target sound must be known in advance. Therefore, this approach
cannot be applied when the generation position of the recognition target sound is unknown or is
not fixed. Furthermore, depending on the positional relationship between the source of the
recognition target sound and the source of the similar sound, the generation position of the
sound must be estimated with high accuracy, and there is a problem that the mechanism for doing
so is complicated.
[0019]
For example, if the sound and vibration recognition system described in Patent Document 4 is
used, even when the recognition target sound and the similar sound cannot be distinguished by
sound information alone, there is a possibility that a context accompanied by vibration can be
correctly understood from the vibration information. However, since vibration sensors are used
in addition to the microphones, there is a problem that the number of parts increases and wiring
for power supply lines and signal lines becomes necessary. In addition, it is difficult to apply
the sound and vibration system described in Patent Document 4 to an event that involves almost
no vibration.
[0020]
SUMMARY OF THE INVENTION In view of these problems, an object of the present invention is to
provide a sound determination system, a sound determination method, and a sound determination
program that easily distinguish similar sounds, which interfere with the understanding of
context by sounds, from the recognition target sound, and that enable correct context
understanding.
[0021]
The sound determination system according to the present invention includes: sound characteristic
measurement means for measuring, using sound data collected by one or more sound collection
means, a sound characteristic based on the sound from a sound source, the characteristic being a
characteristic of the space in which the sound collection means are disposed; and similar sound
discrimination means for determining whether the sound characteristic measured by the sound
characteristic measurement means matches the sound characteristic of the recognition target
sound, and thereby discriminating whether the collected sound is the recognition target sound or
a similar sound.
[0022]
Further, the sound determination method according to the present invention measures, using sound
data collected by one or more sound collection means, a sound characteristic based on the sound
from a sound source, the characteristic being a characteristic of the space in which the sound
collection means are disposed; determines whether the measured sound characteristic matches the
sound characteristic of the recognition target sound; and thereby determines whether the
collected sound is the recognition target sound or a similar sound.
[0023]
Further, the sound determination program according to the present invention causes a computer to
execute: sound characteristic measurement processing for measuring, using sound data collected
by one or more sound collection means, a sound characteristic based on the sound from a sound
source, the characteristic being a characteristic of the space in which the sound collection
means are disposed; and similar sound discrimination processing for determining whether the
sound characteristic measured by the sound characteristic measurement processing matches the
sound characteristic of the recognition target sound, and thereby determining whether the
collected sound is the recognition target sound or a similar sound.
[0024]
According to the present invention, it is possible to easily distinguish between similar sounds,
which hinder the understanding of context by sounds, and recognition target sounds.
Furthermore, according to the present invention, it is possible to provide correct context
recognition results that distinguish similar sounds, which interfere with understanding, from
recognition target sounds.
[0025]
FIG. 1 is a block diagram showing a configuration example of the sound discrimination system of the first embodiment.
FIG. 2 is a flowchart showing an example of the operation of the sound discrimination system of the first embodiment.
FIG. 3 is a block diagram showing a configuration example of a sound determination system combining the first embodiment with a sound recognition system.
FIG. 4 is a block diagram showing a configuration example of a sound determination system combining the first embodiment with a sound detection system.
FIG. 5 is an explanatory diagram showing an example of the installation positions of the microphones 11 and an example of how a generated sound spreads.
FIG. 6 is an explanatory diagram showing an example of the installation positions of the microphones 11 and an example of how a generated sound spreads.
FIG. 7 is an explanatory diagram showing another example of the installation positions of the microphones 11 and an example of how a generated sound spreads.
FIG. 8 is an explanatory diagram showing another example of the installation positions of the microphones 11 and an example of how a generated sound spreads.
FIG. 9 is a block diagram showing a configuration example of the sound discrimination system of the second embodiment.
FIG. 10 is a flowchart showing an example of the event sound accumulation operation according to the second embodiment.
FIG. 11 is a flowchart showing an example of the similar sound discrimination operation according to the second embodiment.
FIG. 12 is an explanatory diagram showing an example of the installation positions of the microphones 11 and an example of the positions of objects serving as sound sources.
FIG. 13 is a block diagram showing a configuration example of the sound discrimination system of the third embodiment.
FIG. 14 is a flowchart showing an example of the operation of the sound discrimination system of the third embodiment.
FIG. 15 is an explanatory diagram showing an example of the analysis result of a frequency distribution.
FIG. 16 is a block diagram showing an outline of the present invention.
FIG. 17 is a block diagram showing an example of the configuration of a sound recognition system.
[0026]
Hereinafter, embodiments of the present invention will be described with reference to the
drawings.
[0027]
Embodiment 1
FIG. 1 is a block diagram showing a configuration example of a sound discrimination system
according to a first embodiment of this invention. The sound determination system shown in FIG.
1 includes a plurality of microphones 11 (microphones 11-1 to 11-n), a sound pressure
comparison unit 21, a microphone position database 22, and a similar sound determination unit
23.
[0028]
The microphones 11-1 to 11-n each collect surrounding sound. For example, it collects sounds
generated by people, objects, devices, etc. in the space where the microphone is installed.
[0029]
The sound pressure comparison unit 21 compares the sound pressures of the sounds collected by
the microphones 11, and creates sound pressure distribution information indicating the
distribution of sound pressure in the space in which the microphones 11-1 to 11-n are disposed.
When creating the sound pressure distribution information, the sound pressure comparison unit 21
acquires position information indicating the position of each microphone 11 from the microphone
position database 22 described later, and creates the sound pressure distribution information
based on the acquired position information of each microphone and the sound collected by each
microphone.
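As a minimal sketch of how such sound pressure distribution information might be assembled (RMS as the sound pressure measure, and all names and the data layout, are illustrative assumptions rather than the patent's implementation):

```python
import math

def sound_pressure_distribution(samples_per_mic, mic_positions):
    """Pair each microphone's position with the RMS sound pressure of
    the samples it collected, as a simple stand-in for the sound
    pressure distribution information created by unit 21."""
    distribution = {}
    for mic_id, samples in samples_per_mic.items():
        rms = math.sqrt(sum(s * s for s in samples) / len(samples))
        distribution[mic_id] = (mic_positions[mic_id], rms)
    return distribution

# Three microphones on a line; the middle one hears the loudest sound.
samples = {"mic1": [0.1, -0.1], "mic2": [0.4, -0.4], "mic3": [0.1, -0.1]}
positions = {"mic1": (0, 0), "mic2": (1, 0), "mic3": (2, 0)}
dist = sound_pressure_distribution(samples, positions)
```

Any per-microphone level measure would do in place of RMS; what matters for the later discrimination step is the pairing of levels with positions.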
[0030]
The microphone position database 22 stores position information indicating where each of the
microphones 11 is installed in the space. The position information of the microphones may be any
information that allows the relative positional relationship of the plurality of microphones to
be known. That is, it is not limited to absolute coordinates such as longitude and latitude, or
to distances and directions such as XY coordinates with a certain point as the origin. For
example, even information unrelated to distance or direction, such as the name of a room,
indicates that a microphone's position differs from those of the other microphones, and is
therefore included in the microphone position information in this embodiment.
[0031]
The similar sound determination unit 23 determines whether the sound collected by the
microphones 11 is the recognition target sound or a similar sound, based on the sound pressure
distribution information created by the sound pressure comparison unit 21. In the present
embodiment, it is assumed that data on what kind of sound pressure distribution is obtained when
the recognition target sound is generated in the space has been prepared in advance. The similar
sound determination unit 23 may, for example, compare the sound pressure distribution
information created by the sound pressure comparison unit 21 with the sound pressure
distribution information of the recognition target sound registered in advance, and determine
whether the collected sound is the recognition target sound based on the difference in how the
sound spreads from its generation position.
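A comparison of this kind might be sketched as follows, assuming distributions are represented as per-microphone pressure maps; the normalization and the tolerance value are illustrative assumptions, not the patent's method:

```python
def matches_registered(measured, registered, tolerance=0.2):
    """Compare a measured sound pressure distribution against the
    distribution registered in advance for the recognition target
    sound. Both dicts map microphone id -> sound pressure. Each is
    normalized by its peak so only the *shape* of the spread matters,
    not the absolute loudness."""
    def normalize(d):
        peak = max(d.values())
        return {k: v / peak for k, v in d.items()}
    m, r = normalize(measured), normalize(registered)
    # Mean absolute deviation between the two normalized distributions.
    deviation = sum(abs(m[k] - r[k]) for k in r) / len(r)
    return deviation <= tolerance

registered = {"mic1": 1.0, "mic2": 0.5, "mic3": 0.5}  # stored for the target sound
measured   = {"mic1": 2.0, "mic2": 1.1, "mic3": 0.9}  # same shape, louder overall
same_shape = matches_registered(measured, registered)
```

A louder but identically shaped distribution still matches, while a flat distribution (as a distant loudspeaker might produce) does not.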
[0032]
In the present embodiment, the sound pressure comparison unit 21 and the similar sound
determination unit 23 are realized by, for example, a CPU operating according to a program.
Further, the microphone position database 22 is realized by, for example, a database system or a
storage device.
[0033]
Next, the operation of this embodiment will be described. FIG. 2 is a flowchart showing an
example of the operation of the sound discrimination system according to this embodiment. In the
example shown in FIG. 2, each microphone 11 collects sound (step S01). When sound is collected
by at least one of the microphones 11, the sound pressure comparison unit 21 creates sound
pressure distribution information based on the sounds collected by the microphones 11 and the
positions of the microphones 11 stored in the microphone position database (step S02). Then, the
similar sound determination unit 23 determines whether the sound collected by the microphones 11
is the recognition target sound or a similar sound, based on the sound pressure distribution
information (step S03).
[0034]
Steps S01 to S03 may be executed in parallel using a pipeline method. Other than these
implementations, any information processing mechanism may be used as long as the desired
effects can be obtained using the configuration and principle of the present invention.
[0035]
As described above, according to the present embodiment, even when the positional relationship
between the sound source emitting the recognition target sound and the sound source emitting the
similar sound is unknown, the similar sound and the recognition target sound can be
distinguished and the context correctly understood. This is because sound is collected by a
plurality of microphones disposed at different positions, sound pressure distribution
information is created using the collected sounds and the position of each microphone, and it is
determined using the created sound pressure distribution information whether the sound is the
recognition target sound or a similar sound.
[0036]
For example, suppose one wishes to distinguish a door opening/closing sound from a similar sound
emitted by a television. Even if the two are difficult to distinguish by their sound
characteristics alone, they can easily be distinguished by judging from the difference in how
the sound spreads: the door opening/closing sound spreads from the position of the door in the
space, whereas the sound of a typical television spreads from the speaker with a certain degree
of directivity.
[0037]
Furthermore, as shown in FIG. 3, the present embodiment can also be implemented as a sound
determination system combined with a sound recognition system.
The sound determination system shown in FIG. 3 further includes a preprocessing unit 31, a
feature extraction unit 32, an identification unit 33, an identification dictionary 34, and a
determination unit 35 in addition to the configuration shown in FIG. 1.
[0038]
The preprocessing unit 31, the feature extraction unit 32, the identification unit 33, and the
identification dictionary 34 may be the same as the components of a general sound recognition
system.
[0039]
The determination unit 35 changes the identification result by the identification unit 33
according to the determination result by the similar sound determination unit 23.
[0040]
With such a configuration, even if the identification result of the identification unit 33 is,
for example, "motor start sound", when the similar sound determination unit 23 determines that
the sound is a similar sound, the determination unit 35 can change the output to "similar sound"
instead of "motor start sound", which further improves context recognition accuracy.
[0041]
For example, as shown in FIG. 4, the present embodiment can also be implemented as a sound
determination system combined with a sound detection system.
The sound determination system shown in FIG. 4 further includes an event sound detection unit
41 and a determination unit 42 in addition to the configuration shown in FIG.
[0042]
The event sound detection unit 41 detects a sound generated by an event based on, for example,
a change in sound pressure.
[0043]
The determination unit 42 changes the detection result by the event sound detection unit 41
according to the determination result by the similar sound determination unit 23.
[0044]
With such a configuration, suppose, for example, that the event sound detection unit 41 detects
a sound generated by an event and outputs the result that an event has occurred, while the
similar sound determination unit 23 determines that the sound uttered by the event is a similar
sound (that is, not the recognition target sound). In such a case, the determination unit 42 can
change the detection result of the event sound detection unit 41 to "no event has occurred" and
output it, so the accuracy of event detection is further improved.
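The override described above amounts to a simple gate; this sketch uses assumed names and boolean inputs purely for illustration:

```python
def final_detection(event_detected, is_similar_sound):
    """Sketch of the determination unit 42: an event reported by the
    event sound detection unit 41 is suppressed when the similar sound
    determination unit 23 judges the triggering sound to be a similar
    sound (i.e., not the recognition target sound)."""
    return event_detected and not is_similar_sound

# A detected event whose sound is judged similar is reported as no event.
suppressed = final_detection(True, True)
```

In a real system the two inputs would be classification results rather than booleans, but the gating relationship between units 41, 23, and 42 is the same.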
[0045]
Next, the operation of this embodiment will be described using a specific example.
FIGS. 5 and 6 are explanatory diagrams showing an example of the installation positions of the
microphones 11 and examples of how a generated sound spreads.
As shown in FIGS. 5 and 6, a plurality of microphones 11a-1 to 11a-9 are arranged in a lattice
in a room. It is assumed that the room is large enough that reflections of the sound can be
ignored or eliminated based on the arrival time of the sound. It is also assumed that there is
no obstruction between the sound source and the microphones; this can easily be realized by
installing the microphones 11a on the ceiling. The microphones are preferably installed densely
enough that the unevenness indicating directivity appears in the sound pressure distribution of
the speaker 122 described later.
[0046]
When a typical vase 121 is dropped in this room, as shown in FIG. 5, the sound generated by
dropping the vase 121 spreads substantially concentrically. Therefore, in the example of FIG. 5,
the microphones 11a-2, 11a-3, 11a-5, and 11a-6 collect sounds with substantially the same sound
pressure, while the sound pressure collected by the microphone 11a-7 is lower than that
collected by the microphones 11a-2, 11a-3, 11a-5, and 11a-6.
[0047]
On the other hand, as shown in FIG. 6, when sound is generated by a typical speaker 122 in the
same room, the sound spreads in front of the speaker 122. For this reason, in the example of
FIG. 6, the microphones 11a-4, 11a-5, 11a-7, and 11a-8 collect sounds with higher sound pressure
than the other microphones.
[0048]
The sound pressure distribution information for the room shown in FIGS. 5 and 6 is created by
combining the sound pressure data with the microphone positions indicating where the microphones
11a-1 to 11a-9 are installed in the room. Then, based on the created sound pressure distribution
information, if the sound from the sound source spreads concentrically, it is determined to be a
sound actually generated in the room (the recognition target sound); if it does not spread
concentrically, it is determined to be a sound reproduced by a speaker (a similar sound).
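One conceivable way to test for concentric spread, assuming a grid of ceiling microphones with known coordinates (the distance grouping and the tolerance are illustrative assumptions, not the patent's criterion):

```python
import math

def spreads_concentrically(distribution, tolerance=0.15):
    """distribution maps (x, y) microphone position -> sound pressure.
    Take the loudest microphone as the approximate source position and
    check that microphones equidistant from it report similar
    pressures, as a concentrically spreading sound (the dropped vase
    of FIG. 5) would produce; a directive loudspeaker (FIG. 6) would
    not."""
    source = max(distribution, key=distribution.get)
    by_distance = {}
    for pos, pressure in distribution.items():
        d = round(math.dist(source, pos), 3)
        by_distance.setdefault(d, []).append(pressure)
    # Every ring of equidistant microphones must agree within tolerance.
    for pressures in by_distance.values():
        if max(pressures) - min(pressures) > tolerance:
            return False
    return True

# 3x3 grid: a vase-like source at the centre gives equal pressure on every
# surrounding microphone; a speaker firing along +x is loud in front only.
vase = {(x, y): 1.0 if (x, y) == (1, 1) else 0.5
        for x in range(3) for y in range(3)}
speaker = dict(vase)
speaker.update({(2, 1): 0.9, (0, 1): 0.3})  # loud in front, quiet behind
```

As the following paragraph notes, real rooms reflect sound, so the tolerance would need to be tuned to the room rather than fixed.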
[0049]
It was assumed above that the room is large enough that sound reflections can be ignored, but in
reality a room is of finite size, and if there are objects other than walls, they also cause
reflections. For this reason, it is difficult to judge strictly from the sound pressure
distribution information whether the sound spreads concentrically, and a certain margin is
required; the appropriate margin depends on the room to which the method is applied.
[0050]
FIGS. 7 and 8 are explanatory diagrams showing another example of the installation positions of
the microphones 11 and examples of how a generated sound spreads. As shown in FIGS. 7 and 8, the
microphones are installed in different spaces: microphones 11b-1 and 11b-2 are installed one
each in a room and a corridor. In this example, a sound emitted by an object that divides
spaces, such as a door, is used as the recognition target sound.
[0051]
When the door separating the room and the corridor is closed, as shown in FIG. 7, the sound
generated by closing the door is collected by the microphone 11b-1 of the room and the
microphone 11b-2 of the corridor.
[0052]
On the other hand, as shown in FIG. 8, when the door opening and closing sound is produced by a
speaker installed in the room, this sound is collected by the microphone 11b-1 of the room, but
either it is not collected by the microphone 11b-2 of the corridor, or it is collected there at
a sound pressure lower than when the sound is actually generated by closing the door.
[0053]
The sound pressure distribution information of the room and the corridor is created by combining the sound pressure data with the microphone position information indicating in which space each of the microphones 11b-1 and 11b-2 is installed.
Then, based on the created sound pressure distribution information, when the sound is collected in the plurality of spaces within a predetermined sound pressure difference, it is determined to be a sound generated by the actual opening and closing of the door (recognition target sound); when the sound is collected with a difference larger than the predetermined sound pressure difference, it is determined to be a sound reproduced by the speaker (similar sound).
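One possible form of this two-space determination, as a hedged sketch: `max_diff_db` plays the role of the "predetermined sound pressure difference", and its value here is illustrative only, since the patent states the threshold should be set according to the space.

```python
def classify_door_sound(p_room_db, p_corridor_db, max_diff_db=6.0):
    """Classify a door sound collected in two spaces (room / corridor).
    If the sound pressure levels differ by at most max_diff_db, the door
    itself (which faces both spaces) likely produced the sound, so it is
    a recognition target sound; a larger difference suggests playback
    from a speaker located in only one space, i.e. a similar sound."""
    if abs(p_room_db - p_corridor_db) <= max_diff_db:
        return "recognition target sound"
    return "similar sound"

print(classify_door_sound(62.0, 58.0))  # levels close in both spaces
print(classify_door_sound(70.0, 40.0))  # heard almost only in the room
```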
[0054]
In this example, the recognition target sound is limited to sounds emitted by an object that divides a plurality of spaces. However, since the shape of the sound's spread is not used as determination material, the method can cope even with cases in which sound reflections must be considered.
Also, as for the position of each microphone, it is sufficient to know in which space it is installed, and accurate coordinates need not be considered. The predetermined sound pressure difference used as the determination threshold may simply be set according to the space in which the present invention is implemented.
[0055]
Embodiment 2. Next, a second embodiment of the present invention will be described. FIG. 9 is a block diagram showing an example of the configuration of a sound discrimination system according to the second embodiment of the present invention. In the sound discrimination system shown in FIG. 9, an event sound recording unit 52, an event sound database 53, and a sound similarity determination unit 54 are added to the first embodiment shown in FIG. 1. Further, a sound source position estimation unit 51 replaces the sound pressure comparison unit 21 of FIG. 1, and a similar sound determination unit 55 replaces the similar sound determination unit 23 of FIG. 1. The similar sound determination unit 55 differs from the similar sound determination unit 23 of the first embodiment in the information it receives and in its determination method.
[0056]
The sound source position estimation unit 51 estimates the position of the sound source based on the data of the sound collected from each of the microphones 11 (microphones 11-1 to 11-n), and determines the estimated sound source position. As the estimation method, for example, a method may be used in which, based on the sound pressure data of the sound collected from each microphone 11, the position of the microphone 11 that detects the highest sound pressure is taken as the approximate position of the sound source. This method requires a large number of microphones 11 to be installed in the space, but substantially the same configuration as the sound pressure comparison unit 21 of the first embodiment can be utilized. There are various other methods for estimating the sound source position; for example, the following methods may be used.
[0057]
For example, a method may be used that estimates the sound source position from the differences in the arrival time of the sound at each microphone. It is also possible to use a method of estimating the sound source position by scanning, using highly directional microphones and changing the direction of each microphone.
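The simplest of the estimation methods just described, taking the loudest microphone as the approximate source position, could look like this. The function name is hypothetical; this is a sketch, not the unit 51 implementation.

```python
def estimate_source_position(mic_positions, pressures):
    """Approximate the sound source position by the position of the
    microphone that detected the highest sound pressure, as in the
    simple method described for the sound source position estimation
    unit 51."""
    loudest = max(range(len(pressures)), key=lambda i: pressures[i])
    return mic_positions[loudest]

mics = [(0, 0), (5, 0), (0, 5)]
print(estimate_source_position(mics, [0.2, 0.9, 0.4]))  # (5, 0)
```

Accuracy improves with microphone density, which is why the text notes that this method needs many microphones in the space.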
[0058]
The event sound recording unit 52 cuts out the sound generated by an event from the data of the sound collected from each of the microphones 11, and records the cut-out sound data in the event sound database 53 in combination with the sound source position (the estimated sound source position) output from the sound source position estimation unit 51. The occurrence of an event can be determined, for example, when the sound pressure changes by a certain level or more. In the event sound database 53, for example, the sound for a predetermined time after the occurrence of the event is recorded. In addition, when sound is temporarily buffered for a predetermined time in order to detect events, the sound for a predetermined time before and after the occurrence of the event may be recorded.
[0059]
The event may be anything that can be detected by the event sound recording unit 52, such as the start of a machine, a person's speech, or the opening and closing of a door. What is detected as an event depends on a predetermined condition; besides the above, examples include the detection of a waveform of a specific pattern, the detection of a steep change equal to or greater than a predetermined level, or a change at a specific rate of change. Further, an event can also be detected using something other than sound (illuminance, temperature, vibration, etc.), for example, the space becoming bright or a vibration occurring. When something other than sound is used, the event sound recording unit 52 includes, for example, a device for detecting such an event, and records the sound generated by the event (the sound before and after the event) based on the sensing result of that device.
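The criterion "occurrence of an event when the sound pressure changes by a certain level or more" can be sketched as a simple threshold on consecutive level samples. The threshold value and function name are illustrative assumptions.

```python
def detect_events(levels, threshold=10.0):
    """Return the indices at which the sound pressure level rises by
    `threshold` or more relative to the previous sample -- one possible
    realization of 'an event occurs when the sound pressure changes by
    a certain level or more'."""
    return [i for i in range(1, len(levels))
            if levels[i] - levels[i - 1] >= threshold]

# Two abrupt rises (40 -> 65 and 42 -> 55) are detected as events.
levels = [40, 41, 40, 65, 64, 42, 55]
print(detect_events(levels))  # [3, 6]
```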
[0060]
The event sound database 53 holds, as history information, the data of the sound cut out by the event sound recording unit 52 and the estimated sound source position information indicating the sound source position estimated for that sound by the sound source position estimation unit 51.
[0061]
The sound similarity determination unit 54 compares the sound collected by the microphones with the past event sounds that have the same estimated sound source position and have been recorded in the event sound database 53, and determines the presence or absence of similarity between them. For the determination, for example, there is a method of searching the event sound database 53 using the estimated sound source position as a key and determining the presence or absence of similarity between the retrieved past event sounds and the sound collected this time. As for the method of determining the similarity of sounds, the determination can be realized, for example, by utilizing pattern matching, but it is not limited to these methods. In addition, the sound similarity determination unit 54 may calculate and output statistics, such as the number of occurrences and the occurrence frequency, of event sounds determined to be similar as a result of the determination.
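The position-keyed search plus pattern matching described above can be sketched as follows. This is a toy stand-in: the similarity measure is a plain normalized correlation of equal-length waveforms, whereas a real system would use more robust acoustic features; all names and the threshold are assumptions.

```python
import math

def similar(a, b, threshold=0.9):
    """Crude pattern match: normalized correlation of two equal-length
    waveforms, used here as a stand-in for the pattern-matching step."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) >= threshold

def find_similar_events(db, position, sound):
    """Search the event-sound history using the estimated sound source
    position as a key, then test each retrieved past sound for similarity."""
    return [past for pos, past in db if pos == position and similar(sound, past)]

db = [("mic3", [0.0, 1.0, 0.0, -1.0]), ("mic6", [1.0, 1.0, 1.0, 1.0])]
print(len(find_similar_events(db, "mic3", [0.0, 0.9, 0.0, -0.9])))  # 1
```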
[0062]
The similar sound determination unit 55 determines, based on the statistics resulting from the sound similarity determination unit 54, whether the generated sound is a recognition target sound or a similar sound. For example, a sound for which similar sounds have been collected a predetermined number of times or more in a predetermined period may be determined to be a recognition target sound, while a sound collected at the same position less than the predetermined number of times in the predetermined period (for example, a sound that has never before been collected by the plurality of microphones) may be determined to be a similar sound.
[0063]
Expressed as a formula, for example, a predetermined period may be observed, and the sound may be determined to be a recognition target sound when the following formula (1) holds and a similar sound when it does not hold. In formula (1), the "same sound" means a sound from the same generation source in the event sound database 53 that the sound similarity determination unit 54 has determined to be similar to the input sound (the current input sound may be included).
[0064]
[Number of times the same sound occurred]> [predetermined number of times] Formula (1)
[0065]
Further, the case where the number of occurrences of an event is small cannot be handled only by the number of times the sound of a specific event is collected in a predetermined period.
In that case, for example, the problem can be solved by considering the occurrence frequency of the specific event sound based on the type of sound (what kind of sound it is). When a specific sound occurs at or above a predetermined ratio of the total including other sounds, it is determined to be a recognition target sound; when the specific sound occurs at less than the predetermined ratio of the total including other sounds, it is determined to be a similar sound. In this case, the observation period need not be specified. The specific sound is the sound to be evaluated; more specifically, it is a sound classified, from the characteristics of the sound itself, as belonging to the recognition target sound when distinguishing the recognition target sound from other sounds. In the actual processing, sounds from the same past generation source in the event sound database 53 that the sound similarity determination unit 54 has determined to be similar to the input sound (the current input sound may be included) may be used. In this case, in the similarity determination by the sound similarity determination unit 54, similarity may also be judged with respect to sounds belonging to the same kind as the input sound under a coarser classification of sounds.
[0066]
As a specific example of the specific sound, consider the "door opening and closing sound". The door opening and closing sound as the recognition target sound (the sound generated by actually opening and closing the door) occurs at the place where the door is installed. Therefore, the door opening/closing sound is considered to occur most often from that place (the position where the door is installed). On the other hand, a door opening and closing sound generated from a television (in fact, a door opening and closing sound produced at a studio, a recording site, or the like) occurs at the place where the television is installed. A television generates many types of sounds in addition to door opening and closing sounds. That is, from the position where the television is placed, many kinds of sounds (for example, a person's speech and a car's running sound, besides the opening and closing of a door) are observed. From this, in the present embodiment, all sounds generated from positions from which many types of sounds are observed are determined to be similar sounds.
[0067]
Expressed as a formula, for example, a predetermined period may be observed, and the sound may be determined to be a recognition target sound when the following formula (2) holds and a similar sound when it does not hold.
[0068]
[Number of occurrences of the specific sound] / ([number of occurrences of the specific sound] + [number of occurrences of other sounds]) > [predetermined ratio] ... Formula (2)
[0069]
In formula (2), "specific sound" means a sound recognized as a recognition target sound from the features of the sound. More specifically, it refers to a sound from the same generation source in the event sound database 53 that the sound similarity determination unit 54 has determined to be similar to the input sound (the current input sound may be included).
[0070]
In addition, the two criteria of the above formulas (1) and (2) may be combined, and various statistical methods other than these may also be adopted.
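The combination of the two criteria of formulas (1) and (2) can be sketched as follows. The threshold values `min_count` and `min_ratio` are illustrative placeholders for the "predetermined number of times" and "predetermined ratio", and requiring both criteria simultaneously is just one of the combinations the text allows.

```python
def judge(same_count, other_count, min_count=5, min_ratio=0.8):
    """Apply formulas (1) and (2) together: the sound is treated as a
    recognition target sound when the same sound occurred more than
    min_count times (formula 1) AND made up more than min_ratio of all
    event sounds from that source position (formula 2); otherwise it is
    treated as a similar sound."""
    total = same_count + other_count
    if same_count > min_count and same_count / total > min_ratio:
        return "recognition target sound"
    return "similar sound"

print(judge(10, 1))   # frequent and dominant at its position
print(judge(2, 30))   # rare among many other sounds (e.g. from a TV)
```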
[0071]
In the present embodiment, the sound source position estimation unit 51, the event sound recording unit 52, the sound similarity determination unit 54, and the similar sound determination unit 55 are realized by, for example, a CPU operating according to a program.
The microphone position database 22 and the event sound database 53 are realized by, for example, a database system or a storage device.
[0072]
Next, the operation of this embodiment will be described.
The operation of this embodiment can be roughly divided into two operations. One is the operation of collecting event sounds and storing them in the event sound database 53 (hereinafter referred to as the event sound accumulation operation). The other is the operation of determining whether a generated sound is a recognition target sound or a similar sound by utilizing the sound data stored in the event sound database 53 (hereinafter referred to as the similar sound determination operation).
[0073]
First, the event sound accumulation operation will be described. FIG. 10 is a flowchart showing an example of the event sound accumulation operation according to the present embodiment. In the example shown in FIG. 10, when each microphone first collects a sound (step S11), the sound source position estimation unit 51 estimates the position of the sound source of the sound based on the data of the sound collected by each microphone and the position information of each microphone stored in the microphone position database 22, and determines the estimated sound source position (step S12).
[0074]
Further, the event sound recording unit 52 cuts out the sound of the event from the sound collected by each microphone (step S13). When the sound of the event is cut out, the cut-out event sound data is associated with the estimated sound source position information indicating the sound source position estimated by the sound source position estimation unit 51 and is recorded in the event sound database 53 (step S14).
[0075]
Steps S11 to S14 may be executed in parallel using a pipeline method. Furthermore, step S12
and step S13 may be implemented by different hardware and executed in parallel.
[0076]
Next, the similar sound determination operation of the present embodiment will be described.
FIG. 11 is a flowchart showing an example of the similar sound determination operation according to the present embodiment. In the example shown in FIG. 11, each microphone 11 first collects a sound (step S21). Next, the sound source position estimation unit 51 estimates the position of the sound source based on the data of the sound collected by each of the microphones 11 and the position information of each of the microphones 11 stored in the microphone position database 22, and determines the estimated sound source position (step S22).
[0077]
Next, the sound similarity determination unit 54 determines the presence or absence of similarity between the collected sound and past sounds at its estimated sound source position, based on the pair of the sound data collected by each of the microphones 11 and its estimated sound source position information, and the pairs of past event sound data and estimated sound source position information stored in the event sound database 53 (step S23). Then, the similar sound determination unit 55 determines, from the determined presence or absence of sound similarity, whether the sound collected by each of the microphones 11 in step S21 is a recognition target sound or a similar sound (step S24).
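One pass of steps S21 to S24 can be sketched end to end under deliberately simplified assumptions: the source position is taken to be the loudest microphone, "similarity" is plain equality of a coarse sound label, and the thresholds are illustrative. All names are hypothetical.

```python
def discriminate(history, mic_positions, pressures, label,
                 min_count=5, min_ratio=0.8):
    """One pass of the similar sound determination operation.
    history: list of (estimated position, sound label) pairs, i.e. a
    toy stand-in for the event sound database 53."""
    # S22: estimate the source position (loudest microphone).
    pos = mic_positions[max(range(len(pressures)), key=lambda i: pressures[i])]
    # S23: gather past event sounds at that position and count similar ones.
    at_pos = [l for p, l in history if p == pos]
    same = sum(1 for l in at_pos if l == label)
    # S24: classify by occurrence count and occurrence ratio.
    if same > min_count and same / max(len(at_pos), 1) > min_ratio:
        return "recognition target sound"
    return "similar sound"

mics = ["11c-3", "11c-6"]
history = [("11c-3", "door")] * 10 + [("11c-6", "door"), ("11c-6", "siren")] * 3
print(discriminate(history, mics, [0.9, 0.1], "door"))  # door position: target
print(discriminate(history, mics, [0.1, 0.9], "door"))  # TV position: similar
```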
[0078]
Steps S21 to S24 may be executed in parallel using a pipeline method.
[0079]
As described above, according to the present embodiment, sound is collected by a plurality of microphones arranged at different positions, and the position of the sound source is estimated from the collected sound and the position of each microphone. Then, based on the pair of the estimated sound source position and the collected sound, the sounds of events collected in the past are searched for similar pairs, and whether the sound is a recognition target sound or a similar sound is determined based on the similarity of sounds at the same position. It is therefore possible to determine whether the sound collected by each microphone 11 is a recognition target sound or a similar sound.
[0080]
Further, according to the present embodiment, even if the positional relationship between the sound source emitting the recognition target sound and the sound source emitting the similar sound is unknown in advance, the recognition target sound and the similar sound can be distinguished by collating the collected sound with the history of event sounds emitted in the past from the same estimated sound source position, as long as the position of each sound source does not change.
[0081]
For example, consider determining whether a sound is the sound of a television (a similar sound) or a recognition target sound. Various event sounds are emitted from the position of the television. Therefore, even if the positional relationship between the sound source emitting the recognition target sound and the sound source emitting the similar sound is unknown in advance, comparing the collected sound with the history of event sounds emitted in the past from the same estimated sound source position makes it possible to determine which sound is that of the television, according to whether various kinds of sounds are emitted from that sound source.
[0082]
Also in this embodiment, it is possible to implement a sound determination system by combining a sound recognition system and a sound detection system as shown in FIGS. 3 and 4.
In such a case, the determination unit 35 may change the identification result of the identification unit 33 according to the determination result of the similar sound determination unit 55.
In the present embodiment, the event sound recording unit 52 may also include a processing unit for identifying what kind of sound the collected sound is.
[0083]
Next, the operation of this embodiment will be described using a specific example.
FIG. 12 is an explanatory view showing an example of the installation positions of the microphones 11 and an example of the positions of objects serving as sound sources.
As shown in FIG. 12, the microphones 11c-1 to 11c-9 are disposed in the same space. There is a door 321 near the microphone 11c-3 and a TV speaker 322 near the microphone 11c-6.
[0084]
When the door 321 is closed in such a room, since the microphone 11c-3 is closest to the door, the sound of the door is collected by the microphone 11c-3 at the largest sound pressure. In this case, the sound source position estimation unit 51 estimates the sound source of the collected sound (the sound of the door) to be in the vicinity of the microphone 11c-3.
[0085]
At this time, assume that the opening/closing sound of the door with the microphone 11c-3 as the estimated sound source position has been collected ten times in the past 24 hours, and further that, among the event sounds with the microphone 11c-3 as the estimated sound source position, the rate at which the opening/closing sound of the door 321 was collected is 80% or more. When the sound of the door with the microphone 11c-3 as the estimated sound source position is collected in this state, the similar sound determination unit 55 determines, based on the similarity determination result of the sound similarity determination unit 54 described above, that the sound of the door is a recognition target sound.
[0086]
On the other hand, when the speaker 322 emits sound in this room, since the microphone 11c-6 is closest, the sound from the speaker 322 is collected at the largest sound pressure by the microphone 11c-6. In this case, the sound source position estimation unit 51 estimates the sound source of the collected sound (the sound emitted from the speaker) to be in the vicinity of the microphone 11c-6.
[0087]
At this time, assume that the opening/closing sound of a door with the microphone 11c-6 as the estimated sound source position has not been collected even once in the past 24 hours. When the sound of a door with the microphone 11c-6 as the estimated sound source position is collected in this state, the similar sound determination unit 55 determines that the sound of the door is a similar sound.
[0088]
Alternatively, assume that the sound of a door with the microphone 11c-6 as the estimated sound source position has been collected several times so far, but that the microphone 11c-6 has also collected explosion sounds and siren sounds besides door sounds, so that among the event sounds from the same estimated sound source position, the rate at which the door sound was collected is less than 80%. When the sound of a door with the microphone 11c-6 as the estimated sound source position is collected in this state, the similar sound determination unit 55 determines that the sound of the door is a similar sound.
[0089]
Embodiment 3. Next, a third embodiment of the present invention will be described. FIG. 13 is a block diagram showing an example of the configuration of a sound discrimination system according to the third embodiment of the present invention. The sound discrimination system shown in FIG. 13 includes a microphone 11, a frequency analysis unit 61, and a similar sound determination unit 62.
[0090]
The microphone 11 collects surrounding sounds. For example, the sound generated by an object
or a person in a space where the microphone 11 is installed is collected. The number of
microphones 11 may be one or more.
[0091]
The frequency analysis unit 61 analyzes the frequency distribution of the sound collected by the microphone 11. For example, the frequency analysis unit 61 may pick up frequency components having a sound pressure of a predetermined level or more and map them onto a predetermined space, thereby creating a spatial frequency distribution of the frequency components present within the effective range of each microphone 11. When there is only one microphone, it suffices to create a frequency distribution of the sound in the space within the effective range of that microphone, represented only by the picked-up frequency components. Alternatively, information simply indicating the detected frequency components may be used as the frequency distribution.
[0092]
The similar sound determination unit 62 determines whether the sound collected by the
microphone 11 is a recognition target sound or a similar sound based on the frequency
distribution of the sound that is the analysis result of the frequency analysis unit 61.
[0093]
In the present embodiment, the frequency analysis unit 61 and the similar sound determination
unit 62 are realized by, for example, a CPU operating according to a program.
[0094]
Next, the operation of this embodiment will be described.
FIG. 14 is a flowchart showing an example of the operation of the sound discrimination system
according to this embodiment.
In the example shown in FIG. 14, the microphone 11 first collects the sound (step S31). Next, the
frequency analysis unit 61 analyzes the distribution of the frequency of the sound collected by
the microphone 11 (step S32). Then, based on the analysis result of the frequency distribution,
the similar sound determination unit 62 determines whether the sound collected by the
microphone 11 is a recognition target sound or a similar sound (step S33).
[0095]
As described above, according to the present embodiment, the recognition target sound and the similar sound can be distinguished even when they cannot be distinguished by the positional relationship of sound generation.
[0096]
For example, typical television speakers do not have very good frequency characteristics and tend to produce low sound pressure, particularly in the low frequency range; in addition, television broadcasting standards limit the maximum frequency that can be transmitted. Therefore, there is a certain limit to the frequency distribution of the sound emitted from a television.
Accordingly, if whether the collected sound is a recognition target sound or a similar sound is determined based on the characteristics of the frequency distribution of the sound from the sound source, the recognition target sound and the similar sound can be distinguished even when they cannot be distinguished by the positional relationship of sound generation or by the sound alone.
[0097]
Also in the present embodiment, it is possible to implement a sound determination system by combining a sound recognition system and a sound detection system as shown in FIGS. 3 and 4.
In such a case, the determination unit 35 may change the identification result of the identification unit 33 according to the determination result of the similar sound determination unit 62.
[0098]
Next, the operation of this embodiment will be described using a specific example. FIG. 15 is an explanatory view showing an example of an analysis result of a frequency distribution. For example, as shown in FIG. 15(a), if, as a result of analyzing the frequency distribution of the collected sound, the frequency distribution of the sound in the space where the microphone 11 is installed has power at 50 Hz or less and at 18,000 Hz or more, the sound is determined to be a recognition target sound. On the other hand, as shown in FIG. 15(b), when the frequency distribution of the sound in the space where the microphone 11 is installed has no power at 50 Hz or less and at 18,000 Hz or more, the sound is determined to be a similar sound.
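The FIG. 15 criterion (power both below 50 Hz and above 18,000 Hz indicates a real, wideband sound) can be sketched with an FFT. This is an illustrative sketch only: the `floor` power threshold, the function name, and the synthetic test signals are all assumptions, not part of the patent.

```python
import numpy as np

def classify_by_spectrum(signal, fs=48000, low_hz=50, high_hz=18000, floor=1e-6):
    """Classify a sound by its frequency distribution, as in FIG. 15: if
    power exists both below low_hz and above high_hz, the spectrum is
    wider than a typical TV speaker / broadcast chain can reproduce, so
    the sound is taken to be a recognition target sound; otherwise it is
    taken to be a similar sound."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    has_low = power[freqs <= low_hz].sum() > floor
    has_high = power[freqs >= high_hz].sum() > floor
    return "recognition target sound" if has_low and has_high else "similar sound"

t = np.arange(4800) / 48000.0                                      # 0.1 s at 48 kHz
real = np.sin(2 * np.pi * 30 * t) + np.sin(2 * np.pi * 19000 * t)  # wideband sound
tv = np.sin(2 * np.pi * 440 * t)                                   # band-limited playback
print(classify_by_spectrum(real))  # recognition target sound
print(classify_by_spectrum(tv))    # similar sound
```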
[0099]
In each of the above-described embodiments, examples have been described in which, as the sound characteristic that appears in the predetermined space where the one or more microphones 11 are arranged and that is attributable to the sound source, the distribution of sound pressure, the presence or absence of sound similarity at the same sound source position, or the distribution of frequencies is used; these sound characteristics may also be used in combination. In such a case, means may be provided that takes the determination results of a plurality of similar sound determination units as input, combines them, and outputs a final determination result.
[0100]
Next, an outline of the present invention will be described. FIG. 16 is a block diagram showing an
outline of the present invention. As shown in FIG. 16, the sound determination system according
to the present invention is characterized by including a sound characteristic measurement means
101 and a similar sound determination means 102.
[0101]
The sound characteristic measurement unit 101 measures, using the data of the sound collected by one or more sound collection units, a sound characteristic based on the sound from the sound source, which is a characteristic in the space in which the sound collection units are disposed.
[0102]
The similar sound determination means 102 determines whether the sound characteristic measured by the sound characteristic measurement means 101 matches the sound characteristic of the recognition target sound, and thereby determines whether the collected sound is the recognition target sound or a similar sound.
[0103]
The above-mentioned sound characteristic may be at least one of sound pressure distribution,
presence or absence of sound similarity at the same sound source position, and frequency
distribution.
[0104]
For example, in the above-described first embodiment, an example is shown in which the distribution of sound pressure is used as the sound characteristic.
Specifically, an example is shown in which, based on sound pressure distribution information obtained by arranging a plurality of sound collection means and indicating the distribution of sound pressure in the predetermined space where the sound collection means are provided, the difference between the recognition target sound and the similar sound in generation position and in spread from the generation position is determined, thereby determining whether the collected sound is the recognition target sound or a similar sound.
That is, in the first embodiment, the sound characteristic measurement unit 101 is realized by the sound pressure comparison unit 21.
[0105]
Further, for example, in the above-described second embodiment, an example is shown in which the presence or absence of sound similarity at the same sound source position is used as the sound characteristic.
More specifically, the "presence or absence of sound similarity at the same sound source position" is the presence or absence of similarity of sound features between sounds generated at the same sound source position. In the second embodiment, a plurality of sound collection means are arranged and event sounds that occurred in the past are accumulated as a history; an example is shown in which, based on the sound source position and on the number of occurrences and occurrence probability for each type of sound, it is determined whether various sounds are emitted from the sound source, thereby determining whether the collected sound is the recognition target sound or a similar sound. That is, in the second embodiment, the sound characteristic measurement unit 101 is realized by the sound source position estimation unit 51, the event sound recording unit 52, and the sound similarity determination unit 54.
[0106]
Also, for example, in the above-described third embodiment, an example is shown in which the distribution of frequencies is used as the sound characteristic. Specifically, an example is shown in which, based on information indicating the frequency distribution of the sound data obtained from the sound collection means, the presence or absence of characteristics that appear only in the recognition target sound or only in the similar sound is determined, thereby determining whether the collected sound is the recognition target sound or a similar sound. That is, in the third embodiment, the sound characteristic measurement unit 101 is realized by the frequency analysis unit 61.
[0107]
The sound determination system according to the present invention may include sound collection position information storage means (for example, the microphone position database 22) for storing sound collection position information indicating the position of each of a plurality of sound collection means (for example, the microphones 11-1 to 11-n); sound pressure distribution information generation means (for example, the sound pressure comparison unit 21) for generating, based on the data of the sound collected by the plurality of sound collection means and the sound collection position information, sound pressure distribution information indicating the distribution of sound pressure in the predetermined space in which the sound collection means are arranged; and similar sound determination means (for example, the similar sound determination unit 23) for determining, based on the sound pressure distribution information generated by the sound pressure distribution information generation means, whether the collected sound is the recognition target sound or a similar sound.
[0108]
According to such a configuration, even if the recognition target sound and the similar sound cannot be distinguished from the frequency band or features of the sound itself, the difference in the generation position of the sound or the difference in the spread of the sound from the generation position can be used to distinguish the recognition target sound from the similar sound.
[0109]
Further, the sound determination system according to the present invention may include: a sound collection position information storage unit (for example, the microphone position database 22) that stores sound collection position information indicating the position of each of a plurality of sound collection units (for example, microphones 11-1 to 11-n); sound source position estimation means (for example, the sound source position estimation unit 51) that estimates the position of the sound source of the collected sound based on the sound data collected by the plurality of sound collection means and the sound collection position information; event sound history storage means (for example, the event sound database 53) that holds, as history information, data of a sound generated by a predetermined event in association with estimated sound source position information indicating the position of the sound source estimated for that sound; event sound recording means (for example, the event sound recording unit 52) that, when the occurrence of a predetermined event is detected, extracts the sound generated by the event from the sounds collected by the plurality of sound collection means and stores its data in the event sound history storage means in association with the estimated sound source position information obtained by the sound source position estimation means; sound similarity determination means (for example, the sound similarity determination unit 54) that determines, based on the set of sound data and estimated sound source position information newly collected by the plurality of sound collection means and the sets of past sound data and estimated sound source position information held in the event sound history storage means, whether the sound collected this time is similar to a past event sound having the same estimated sound source position; and similar sound determination means (for example, the similar sound determination unit 55) that determines, based on the determination result of the sound similarity determination means, whether the collected sound is the recognition target sound or a similar sound.
[0110]
According to such a configuration, even when the positional relationship between the sound source emitting the recognition target sound and the sound source emitting the similar sound is unknown, the similar sound can be distinguished as long as the position of each sound source does not change.
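The event-sound-history mechanism described above can be sketched as follows. This is an illustrative Python sketch, not the patent's implementation: the class stands in conceptually for the event sound recording unit 52 and event sound database 53, keying recorded sounds by a coarsely quantized source position so that a newly collected sound can be compared with past event sounds from the same spot (the grid size, the feature-vector similarity measure, and the threshold are assumptions).

```python
def quantize(pos, grid=0.5):
    """Map an estimated (x, y) source position onto a coarse grid so that
    repeated events from the same spot share one history key."""
    return (round(pos[0] / grid), round(pos[1] / grid))

def similarity(a, b):
    """Crude similarity between two sound feature vectors, in (0, 1]."""
    diff = sum(abs(x - y) for x, y in zip(a, b))
    return 1.0 / (1.0 + diff)

class EventSoundHistory:
    """Holds past event sounds keyed by estimated source position."""

    def __init__(self, threshold=0.8):
        self.history = {}          # quantized position -> list of features
        self.threshold = threshold

    def record_event(self, features, est_pos):
        """Store an event sound together with its estimated source position."""
        self.history.setdefault(quantize(est_pos), []).append(features)

    def matches_known_event(self, features, est_pos):
        """True if a past event sound at the same estimated position is
        similar to the sound collected this time."""
        past = self.history.get(quantize(est_pos), [])
        return any(similarity(features, p) >= self.threshold for p in past)

db = EventSoundHistory()
db.record_event([0.9, 0.1, 0.3], est_pos=(1.0, 2.0))  # e.g. a door closing
print(db.matches_known_event([0.88, 0.12, 0.3], (1.02, 1.98)))  # True
print(db.matches_known_event([0.1, 0.9, 0.8], (1.0, 2.0)))      # False
```

A sound that matches a known event sound at the same position can then be treated as that event's similar sound rather than as a new recognition target, even without knowing the two sources' mutual geometry in advance.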
[0111]
Further, the sound determination system according to the present invention may include: frequency analysis means (for example, the frequency analysis unit 61) that analyzes the frequency distribution of sound in a predetermined space in which the sound collection means is arranged, based on the data of the sound collected by one or more sound collection means (for example, the microphone 11); and similar sound determination means (for example, the similar sound determination unit 62) that determines, based on the analysis result of the frequency analysis means, whether the collected sound is the recognition target sound or a similar sound.
[0112]
According to such a configuration, even when the recognition target sound and the similar sound cannot be distinguished by the positional relationship of their generation alone, they can be distinguished by using a certain limit or the like that appears in the frequency distribution of the sound from one of the sound sources.
[0113]
Further, the sound determination system according to the present invention can be configured by combining the determination methods based on the sound characteristics described in each of the above sound determination systems.
[0114]
Further, the sound determination system according to the present invention may further include sound identification means (for example, the sound identification units of the sound recognition system shown in FIG. 3 and of the sound detection system shown in FIG. 4) that identifies whether or not the collected sound is the recognition target sound based on the feature amount of the sound itself obtained from the collected sound data, and determination means (for example, determination units 35 and 43) that determines whether the collected sound is the recognition target sound based on the identification result of the sound identification means and the determination result of the similar sound determination means.
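The combination of the two results can be stated compactly. The sketch below is only a conceptual illustration of what such a determination unit does, not the patent's implementation: the sound is accepted only when the feature-based identifier labels it as the target and the similar sound determination does not veto it.

```python
def final_determination(identified_as_target, is_similar_sound):
    """Combine the feature-based identification result with the
    similar-sound determination: accept the sound only if it is
    identified as the target AND not judged to be a similar sound."""
    return identified_as_target and not is_similar_sound

print(final_determination(True, False))  # True  -> recognition target sound
print(final_determination(True, True))   # False -> similar sound rejected
```

This AND-combination is what lets the system keep a conventional feature-based recognizer unchanged while the similar sound determination filters out the confusable cases.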
[0115]
Although the present invention has been described above with reference to the embodiments and examples, the present invention is not limited to the above embodiments and examples. Various changes that can be understood by those skilled in the art may be made to the configuration and details of the present invention within the scope of the present invention.
[0116]
According to the present invention, since the recognition target sound and the similar sound can be easily distinguished, the present invention can be suitably applied to, for example, a car navigation system that must recognize voice even while a radio or television is operating, or to an abnormality detection system in a factory or office where the operating condition of a machine must be checked even while a radio or television is operating.
[0117]
11, 11a, 11b, 11c Microphone
21 Sound pressure comparison unit
22 Microphone position database
23, 55, 62 Similar sound determination unit
31 Pre-processing unit
32 Feature extraction unit
33 Identification unit
34 Identification dictionary
35 Determination unit
41 Event sound detection unit
51 Sound source position estimation unit
52 Event sound recording unit
53 Event sound database
54 Sound similarity determination unit
61 Frequency analysis unit
121 Vase
122, 221, 322 Speaker
222, 321 Door
101 Sound characteristic measurement means
102 Similar sound determination means