JP2011109434

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2011109434
When a voice recognition system is mounted on a mobile object such as a robot or a car, a
decrease in the recognition rate of the target voice due to vibration from the mobile object is
suppressed. A microphone array 1 includes microphone elements 111 to 114. The vibration
suppression member 12 and the stays 131 to 134, as an example of the support member, are
attachable to the moving body 80 and configured to support the microphone elements 111 to
114. The support member has a shape determined such that, in a state where the vibration
suppression member 12 is attached to the moving body 80, the arrival time of the vibration
reaching the microphone elements 111 to 114 from the moving body 80 through the vibration
suppression member 12 and the stays 131 to 134 differs between at least two of the microphone
elements 111 to 114. [Selected figure] Figure 1
Sound collecting device, speech recognition system, and mounting structure of microphone array
[0001]
The present invention relates to a sound collection device including a microphone array, a voice
recognition system using the sound collection device, and a mounting structure of the
microphone array to a moving object.
[0002]
A microphone array is used in a voice recognition system that recognizes a target voice (user
voice) by performing signal processing on the signals observed by the microphones.
04-05-2019
1
The microphone array is an essential element for performing sound source separation processing
using independent component analysis (ICA), principal component analysis (PCA), or the like, or
for performing static or dynamic beamforming.
[0003]
A voice recognition system using a microphone array is effective when a user gives hands-free
instructions to a mobile object such as a robot or a car. However, when the voice recognition
system is mounted on such a moving body, vibration from a power source of the moving body, such
as a motor or an engine, or vibration caused by interaction with the outside world (road
surface, obstacles, etc.) reaches the microphone array. As a result, the vibration sound is
mixed as noise into the signal observed by the microphone array, and the recognition rate of
the target voice is lowered.
[0004]
To address the above problem, Patent Document 1 and Non-Patent Documents 1 and 2 disclose voice
recognition systems that observe vibration or vibration sound with a vibration sensor (an
acceleration sensor or the like) or a microphone for vibration sound, generate a vibration
sound signal by signal processing of the observation results, and suppress the vibration sound
by subtracting the vibration sound signal from the signal observed by the microphone array for
the target voice (air-conducted sound).
[0005]
JP 2008-85613 A
[0006]
Atsushi Sawada and 4 others, "Internal Noise Suppression Method for Robot Spoken Dialogue System
Using Semi-Blind Source Separation," Proceedings of the 2009 Acoustical Society of Japan
Annual Meeting, The Acoustical Society of Japan, September 2009, pp. 655-658
Kawabata Naoya and 2 others, "Evaluation of internal noise reduction method based on 2ch
spectral subtraction in remote speech reception," Proceedings of the 2009 Acoustical Society of
Japan Annual Meeting, The Acoustical Society of Japan, September 2009, pp. 147-148
[0007]
According to the voice recognition systems disclosed in the above-mentioned documents, the
vibration sound can be suppressed, so the decrease in the recognition rate of the target voice
can be suppressed.
However, the systems in the above-mentioned documents always require a vibration sensor capable
of observing only the vibration.
The inventors of the present application have found that, by appropriately adjusting the
relationship between the direction of arrival (DOA) of the target voice with respect to the
microphone array and the DOA of the vibration with respect to the microphone array, the
decrease in the recognition rate of the target voice caused by the vibration sound can be
easily suppressed without making a vibration sensor essential.
[0008]
The problem that arises when the relationship between the DOA of the target voice and the DOA
of the vibration is not adjusted will be described below.
FIG. 6 is a diagram for explaining the relationship between the arrangement direction of the
microphone elements and the signal arrival direction in a linear microphone array. The linear
microphone array 9 shown in FIG. 6 has four microphone elements 911 to 914. The arrangement
direction of the microphone elements 911 to 914 lies in the XY plane of the figure. When sound
is collected by the linear microphone array 9, the three-dimensional point sound source S1 is
mapped to a virtual point sound source S2 on the XY plane to which the microphone array 9
belongs. In other words, the sound signal arriving at the microphone array 9 from the
three-dimensional point sound source S1 is equivalent to a sound signal arriving from the sound
source S2 on the XY plane.
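The mapping from S1 to S2 can be illustrated numerically. The following is a minimal sketch,
assuming a far-field (plane-wave) source, a sound speed of 343 m/s, and a hypothetical element
spacing of 5 cm; none of these values, nor the function name `element_delays`, come from this
description. It shows that a source direction out of the XY plane and an in-plane direction
with the same component along the array axis give the same inter-element delays, which is why
the linear array cannot tell the two apart.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air; assumed value, not taken from the description


def element_delays(positions_x, direction):
    """Relative arrival delays (s) of a far-field plane wave at elements on the X axis.

    direction: vector pointing from the array toward the source (any length).
    """
    norm = math.sqrt(sum(c * c for c in direction))
    ux = direction[0] / norm  # only the component along the array axis matters
    # An element displaced by x toward the source hears the wave x * ux / c earlier.
    return [-x * ux / SPEED_OF_SOUND for x in positions_x]


# Hypothetical four-element linear array with 5 cm spacing along X.
positions = [0.00, 0.05, 0.10, 0.15]

# S1: a source direction that points out of the XY plane.
s1 = (0.3, 0.5, 0.8)
ux = s1[0] / math.sqrt(sum(c * c for c in s1))

# S2: the in-plane direction with the same component along the array axis.
s2 = (ux, math.sqrt(1.0 - ux * ux), 0.0)

delays_s1 = element_delays(positions, s1)
delays_s2 = element_delays(positions, s2)
```

Under these assumptions `delays_s1` and `delays_s2` agree to within floating-point rounding, so
any processing based on inter-element delays treats S1 and S2 as the same source.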
[0009]
Given this property of the linear microphone array shown in FIG. 6, the following situation
may arise when the array is mounted on a mobile object such as a robot or a car. In the example
of FIG. 7A, the microphone elements 911 to 914 are fixed to the moving body 80 via the
vibration suppression member 92. The microphone elements 911 to 914 are supported by the stays
931 to 934, respectively, which extend from the vibration suppression member 92. In the example
of FIG. 7A, the length (height) of the vibration suppression member 92 interposed between each
microphone element and the moving body 80 is the same for all the microphone elements 911 to
914. Further, the lengths of the stays 931 to 934 are also the same.
[0010]
In FIG. 7A, the three-dimensional DOA of the target voice is in the Y-axis direction. In this
case, the arrangement direction (X-axis direction) of the microphone elements 911 to 914 is
parallel to the wave front of the target voice. That is, the target voice reaches all of the
microphone elements 911 to 914 in the same phase. Meanwhile, the vibration arriving from the
moving body 80 also reaches all the microphone elements 911 to 914 in the same phase. In other
words, the wave front of the vibration reaching the microphone elements 911 to 914 is also
parallel to the arrangement direction (X-axis direction) of the microphone elements 911 to 914.
This is because the length (height) of the vibration suppression member 92 interposed between
each microphone element and the moving body 80 is the same for all the microphone elements 911
to 914 and the lengths of the stays 931 to 934 are also the same, so that the vibration arrival
times at the microphone elements 911 to 914 are substantially the same.
[0011]
When the wave front of the target voice and the wave front of the vibration reaching the
microphone elements 911 to 914 are both parallel to the arrangement direction of the microphone
elements 911 to 914, the two-dimensional DOA of the target voice and the two-dimensional DOA of
the vibration, each mapped to the two-dimensional plane to which the microphone elements 911 to
914 belong, are in the same direction, as shown in FIG. 7B. If the two-dimensional DOA of the
target voice and the two-dimensional DOA of the vibration were different, mixing of the
vibration sound could be effectively suppressed by using the directivity of the microphone
array 9, that is, by directing the main beam of the microphone array 9 to the two-dimensional
DOA of the target voice and directing a dead angle (null point) to the two-dimensional DOA of
the vibration. However, when the two-dimensional DOA of the target voice and the
two-dimensional DOA of the vibration are the same, as shown in FIG. 7B, the directivity of the
microphone array 9 cannot be used to suppress the mixing of the vibration sound.
[0012]
The present invention has been made based on the above-mentioned findings by the inventors. An
object of the present invention is to provide a sound collection device, a voice recognition
system, and a mounting structure of a microphone array that can contribute to suppressing the
decrease in the recognition rate of the target voice caused by vibration from a mobile object
when the voice recognition system is mounted on a mobile object such as a robot or a car.
[0013]
A sound collection device according to a first aspect of the present invention includes a
microphone array and a support member.
The microphone array includes a plurality of microphone elements. The support member is
attachable to the movable body and configured to support the plurality of microphone elements.
Furthermore, the support member has a shape determined such that, in a state of being attached
to the movable body, the arrival time of the vibration reaching the plurality of microphone
elements from the movable body via the support member differs between at least a first and a
second microphone element included in the plurality of microphone elements.
[0014]
By causing a difference in the vibration arrival time between the first and second microphone
elements, the wave front of the vibration incident on the microphone array can be made
non-parallel to the arrangement direction of the plurality of microphone elements. Therefore,
for example, by arranging the microphone array so that the wave front of the target voice
incident on the plurality of microphone elements is parallel to the arrangement direction of
the plurality of microphone elements, in other words, so that the incident direction of the
target voice is perpendicular to the arrangement direction of the plurality of microphone
elements, the two-dimensional DOA of the target voice and the two-dimensional DOA of the
vibration, each mapped to the two-dimensional plane to which the plurality of microphone
elements belong, can be made to differ in direction. This makes it easy to suppress the
vibration sound by using the directivity of the microphone array.
[0015]
A voice recognition system according to a second aspect of the present invention includes the
sound collection device according to the first aspect described above and a signal processing
unit that extracts a target voice signal by performing signal processing on the voice signal
observed by the sound collection device.
[0016]
A third aspect of the present invention relates to a mounting structure of a microphone array.
The structure includes a moving body, a microphone array including a plurality of microphone
elements, and a support member attached to the moving body and supporting the plurality of
microphone elements. Here, the support member has a shape determined such that the arrival time
of the vibration reaching the plurality of microphone elements from the moving body via the
support member differs between at least a first and a second microphone element included in
the plurality of microphone elements.
[0017]
According to each aspect of the present invention described above, when a voice recognition
system is mounted on a mobile object such as a robot or a car, it is possible to contribute to
suppressing the decrease in the recognition rate of the target voice caused by vibration from
the mobile object.
[0018]
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a diagram showing a specific example of the attachment structure of the sound
collection device according to the first embodiment of the present invention.
FIG. 2 is a diagram showing examples of the directivity pattern of the microphone array
suitable for suppressing vibration sound in the first embodiment of the present invention.
FIG. 3 is a block diagram showing a configuration example of a voice recognition system using
the sound collection device according to the first embodiment of the present invention.
FIG. 4 is a diagram showing another example of an attachment structure of a sound collection
device that can effectively suppress vibration from a moving body.
FIG. 5 is a diagram showing examples of the directivity pattern of the microphone array in the
example shown in FIG. 4.
FIG. 6 is a conceptual diagram for explaining the relationship between the arrangement
direction of the microphone elements and the signal arrival direction in a linear microphone
array.
FIG. 7 is a diagram showing an example of an attachment structure of a sound collection device
that is not suitable for suppressing vibration from a moving body.
[0019]
Hereinafter, specific embodiments to which the present invention is applied will be described in
detail with reference to the drawings. In the drawings, the same components are denoted by the
same reference symbols, and redundant description will be omitted as appropriate for the sake of
clarity of the description.
[0020]
First Embodiment
FIG. 1A shows a specific example of the structure for attaching the sound collection device
according to the present embodiment to the moving body 80. The moving body 80 is, for example,
an autonomous mobile robot or a car. The sound collection device shown in FIG. 1A includes a
microphone array 1, a vibration suppression member 12, and stays 131 to 134. The microphone
array 1 includes four microphone elements 111 to 114 arranged on a straight line. The vibration
suppression member 12 and the stays 131 to 134 are support members for attaching the microphone
elements 111 to 114 to the moving body 80. The vibration suppression member 12 is attached to
the moving body 80 and attenuates the vibration propagating from the moving body 80 to the
microphone elements 111 to 114. The stays 131 to 134 extend from the vibration suppression
member 12 and support the microphone elements 111 to 114. In FIG. 1A, the incident direction
(three-dimensional DOA) of the target voice (user voice, etc.) on the microphone array 1 is set
to the Y-axis direction, which is perpendicular to the arrangement direction (X-axis direction)
of the microphone elements 111 to 114.
[0021]
The difference between the attachment structure shown in FIG. 1A and the attachment structure
shown in FIG. 7A will be described below. In FIG. 1A, the length (height or thickness) of the
vibration suppression member 12 interposed between the microphone element and the moving body
80 differs among the microphone elements 111 to 114. Further, in order to maintain the linear
arrangement of the microphone elements 111 to 114, the lengths of the stays 131 to 134 are also
adjusted to differ from one another.
[0022]
Since the vibration suppression member 12 and the stays 131 to 134 are made of different
materials, the propagation speed of the vibration transmitted from the moving body 80 also
differs between them. Therefore, as shown in FIG. 1A, by making the length (height or
thickness) of the intervening portion of the vibration suppression member 12 different for each
microphone element, the vibration arrival time can be made to differ among the microphone
elements 111 to 114. As a result, the wave front of the vibration on reaching the microphone
elements 111 to 114 can be changed from the wave front of the vibration entering the vibration
suppression member 12 from the moving body 80.
[0023]
As an example, assume that the vibration incident on the vibration suppression member 12 from
the moving body 80 is a plane wave. In this case, even if the wave front of the vibration
entering the vibration suppression member 12 from the moving body 80 is parallel to the plane
(XY plane in FIG. 1) to which the microphone elements 111 to 114 belong, the wave front of the
vibration on reaching the microphone elements 111 to 114 can be made non-parallel to that
plane. In other words, by providing differences in the vibration arrival time among the
microphone elements 111 to 114, the phase of the vibration incident on the microphone elements
111 to 114 can be made to differ between the elements.
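The effect of the differing member lengths on the arrival time can be sketched as follows. This
is a minimal model, assuming that each element's vibration path is a straight run through the
vibration suppression member followed by its stay, with a constant propagation speed in each
material. The lengths, the speeds, the assignment of the longest member portion to element 114,
and the function name are hypothetical illustration values, not figures from this description.

```python
def vibration_arrival_times(member_lengths, stay_lengths, v_member, v_stay):
    """Arrival time (s) at each element: time in the member plus time in the stay."""
    return [h / v_member + s / v_stay
            for h, s in zip(member_lengths, stay_lengths)]


# Hypothetical geometry for elements 111..114 (metres). The member length
# differs per element, and each stay makes up the rest of a fixed total
# height so that the elements remain on one straight line.
TOTAL_HEIGHT = 0.05
member_lengths = [0.01, 0.02, 0.03, 0.04]
stay_lengths = [TOTAL_HEIGHT - h for h in member_lengths]

# Hypothetical propagation speeds (m/s); the member is assumed to be the
# faster medium in this example.
arrival_times = vibration_arrival_times(
    member_lengths, stay_lengths, v_member=2000.0, v_stay=500.0)

# Differences relative to the first element. Non-zero values mean the
# vibration no longer reaches the elements in the same phase.
time_differences = [t - arrival_times[0] for t in arrival_times]
```

With these assumed values the arrival time decreases step by step from the first element to the
last, so the vibration wave front is tilted relative to the line of elements.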
[0024]
By making the phase of the vibration incident on the microphone elements 111 to 114 differ
between the elements, the two-dimensional DOA of the vibration can be made different from the
two-dimensional DOA of the target voice. FIG. 1B is a diagram showing the two-dimensional DOA
of the target voice and the two-dimensional DOA of the vibration, each mapped onto the
two-dimensional plane to which the microphone elements 111 to 114 belong. For example, in FIG.
1A, when the vibration propagation speed in the vibration suppression member 12 is higher than
the vibration propagation speed in the stays 131 to 134, the vibration reaches the element 114
earliest and the element 111 latest. In this case, as shown in FIG. 1B, the two-dimensional DOA
of the vibration is perpendicular to the two-dimensional DOA of the target voice.
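One way to picture the two-dimensional DOA of the vibration is as the angle at which an
airborne plane wave would have to arrive to produce the same delay between adjacent elements.
The sketch below uses the standard uniform-linear-array relation sin(theta) = c * tau / d, with
theta measured from the broadside direction. Treating the vibration in this way is an
interpretive assumption, and the spacing, the sound speed, the clamping of out-of-range delays,
and the function name are all hypothetical.

```python
import math


def apparent_angle_deg(delay_per_element, spacing, c=343.0):
    """Angle from broadside (degrees) whose airborne plane wave would give the
    same delay between adjacent elements: sin(theta) = c * tau / d."""
    s = c * delay_per_element / spacing
    # Delays larger than spacing / c match no real angle; clamp to the array axis.
    s = max(-1.0, min(1.0, s))
    return math.degrees(math.asin(s))


SPACING = 0.05  # m, hypothetical element spacing

# Equal arrival times (the FIG. 7 situation): the vibration looks like a
# broadside source, the same direction as the target voice.
same_time = apparent_angle_deg(0.0, SPACING)

# A delay of spacing / c per element, as a stepped support might produce:
# the vibration looks like a source on the array axis, perpendicular to the
# direction of the target voice.
stepped = apparent_angle_deg(SPACING / 343.0, SPACING)
```

Here `same_time` evaluates to 0 degrees and `stepped` to approximately 90 degrees, which
corresponds to the perpendicular relationship between the two DOAs described for FIG. 1B.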
[0025]
As shown in FIG. 1B, if the two-dimensional DOA of the target voice and the two-dimensional
DOA of the vibration can be adjusted to be in different directions, the mixing of the vibration
sound can be effectively suppressed by using the directivity of the microphone array 1. For
example, the mixing of the vibration sound can be effectively suppressed by directing the main
beam of the microphone array 1 to the two-dimensional DOA of the target voice and directing a
dead angle (null point) to the two-dimensional DOA of the vibration. The directivity of the
microphone array 1 may be set statically by adjusting the element spacing, amplifiers, phase
shifters, digital filters, and the like. Alternatively, the directivity of the microphone array
1 may be adjusted dynamically by adapting the amplifiers, phase shifters, or digital filters
that change the amplitude and phase of the observation signal of each microphone element.
[0026]
FIGS. 2A to 2C show specific examples of the directional beam pattern 200 of the microphone
array 1 that is effective for suppressing the vibration sound. FIG. 2A shows a beam pattern 200
in which the main beam is directed to the two-dimensional DOA of the target voice and a dead
angle is directed to the two-dimensional DOA of the vibration. In the example of FIG. 2A, the
two-dimensional DOA of the target voice is perpendicular to the arrangement direction of the
microphone elements 111 to 114 and the two-dimensional DOA of the vibration is parallel to the
arrangement direction of the microphone elements 111 to 114; in this case, the microphone array
1 may be of a broadside array type.
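As one illustration of a broadside pattern of this kind, the sketch below evaluates the
response of a uniformly weighted four-element linear array at a single frequency. With an
element spacing of half a wavelength (a choice made here for illustration, not stated in this
description), the broadside direction receives full gain while the direction along the array
axis falls on a dead angle. The frequency, the sound speed, and the function name are
assumptions, and because speech is wideband the dead angle would hold only near the design
frequency.

```python
import cmath
import math


def array_gain(angle_deg, n_elements, spacing, wavelength):
    """Normalized magnitude response of a uniformly weighted linear array.

    angle_deg is measured from broadside (perpendicular to the array axis).
    """
    # Phase step between adjacent elements for a plane wave from angle_deg.
    psi = 2.0 * math.pi * spacing / wavelength * math.sin(math.radians(angle_deg))
    total = sum(cmath.exp(1j * n * psi) for n in range(n_elements))
    return abs(total) / n_elements


wavelength = 343.0 / 1000.0  # m, assuming 343 m/s sound speed and a 1 kHz tone
spacing = wavelength / 2.0   # half-wavelength spacing (assumed design choice)

gain_target = array_gain(0.0, 4, spacing, wavelength)      # broadside: target voice
gain_vibration = array_gain(90.0, 4, spacing, wavelength)  # along the axis: vibration
```

Under these assumptions `gain_target` is 1 and `gain_vibration` is zero to within rounding,
which matches the main-beam and dead-angle arrangement described for FIG. 2A.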
[0027]
The propagation speed of vibration in a solid changes with conditions such as the temperature
of the medium (the vibration suppression member 12, the stays 131 to 134, the housings of the
elements 111 to 114, etc.). There are also manufacturing errors in the microphone elements. For
this reason, it is not easy to precisely control the two-dimensional DOA of the vibration, and
the DOA generally differs for each frequency component of the vibration. FIG. 2B shows a case
where the azimuth of the two-dimensional DOA of the vibration arriving from the moving body 80
deviates from the dead angle of the beam pattern 200 due to errors in the microphone elements
or the like. In the case of FIG. 2B, the vibration suppression performance is degraded compared
with the case of FIG. 2A, and vibration sound is mixed into the observation signal of the
microphone array 1. However, the mixing level of the vibration sound in the case of FIG. 2B is
lower than in the case where the DOAs of the target voice and the vibration are the same, as
shown in FIG. 7B. Therefore, as shown in FIG. 2B, if the two-dimensional DOA of the vibration
is set to an azimuth in which the gain of the microphone array 1 is lower than in the
two-dimensional DOA of the target voice, the target voice becomes easier to extract by sound
source separation processing using an algorithm such as ICA, at least compared with the case
where the two DOAs are the same.
[0028]
Furthermore, sound source separation processing by ICA or the like may be combined with
adaptive beamforming. For example, as shown in FIG. 2C, adaptive processing may be performed to
direct the dead angle of the beam pattern 200 to the two-dimensional DOA of the vibration. For
this adaptive processing, known adaptive beamforming techniques and direction-of-arrival
estimation techniques (the minimum variance method, the null beamformer, etc.) may be used.
[0029]
FIG. 3 is a block diagram showing a configuration example of a voice recognition system 100
using the sound collection device shown in FIG. 1A. Four A/D converters 151 to 154 sample the
four observation signals Xj(t) (j = 1, 2, ..., 4) from the microphone array 1. The signal
processing unit 16 receives the sampled observation signals Xj(f, t) and separates the user
voice, which is the target voice, from these signals. FIG. 3 shows an example in which the
signal processing unit 16 separates the user voice using the ICA algorithm. The signal
processing unit 16 includes a user voice emphasis filter unit 161 and an ICA unit 162. The user
voice emphasis filter unit 161 and the ICA unit 162 perform ICA-based emphasis processing of
the user voice on the signals X1 to X4. In FIG. 3, Z1(f, t) is the separated signal estimated
to be the user voice, and Z2(f, t) is the separated signal estimated to be background noise.
[0030]
In the present embodiment, an example has been shown in which the length of the vibration
suppression member 12 interposed between each of the four microphone elements 111 to 114 and
the moving body 80, which is the vibration source, is varied stepwise from one microphone
element to the next. This allows the vibration arrival time to differ stepwise among the four
microphone elements 111 to 114. Therefore, as shown in FIG. 1B, the two-dimensional DOA of the
target voice and the two-dimensional DOA of the vibration can be separated by a large angle.
However, when a small angular difference between the two-dimensional DOA of the target voice
and the two-dimensional DOA of the vibration is acceptable, it is sufficient that a difference
in vibration arrival time is produced between at least two of the microphone elements 111 to
114.
[0031]
Further, although the example in which the number of microphone elements included in the
microphone array 1 is four is shown in the present embodiment, the number of microphone
elements may be any number of two or more.
[0032]
Further, although FIG. 3 shows an example using the ICA algorithm, a method other than ICA may
be used as the signal processing algorithm for separating and emphasizing the target voice.
[0033]
Further, the attachment structure of the sound collection device described in the present
embodiment may be applied to the voice recognition systems disclosed in Patent Document 1 and
Non-Patent Documents 1 and 2.
In other words, even in a voice recognition system that observes vibration or vibration sound
with a vibration sensor (an acceleration sensor or the like) or a microphone for vibration
sound and suppresses the vibration sound by subtracting the vibration sound signal from the
signal observed by the microphone array for the target voice (air-conducted sound), the
vibration sound from the moving body 80 can be more easily suppressed by applying the
attachment structure described in the present embodiment.
[0034]
Other Embodiments
In the first embodiment described above, an example has been shown in which the length of the
vibration suppression member 12 interposed between each of the microphone elements 111 to 114
and the moving body 80 and the lengths of the stays 131 to 134 are made different for each
microphone element, so that a time difference is provided in the vibration arrival time for
each microphone element while the linear arrangement of the microphone elements 111 to 114 is
maintained.
However, a time difference in the vibration arrival time can also be provided for each
microphone element with the microphone array mounting structures described below.
In this way, the two-dimensional DOA of the vibration mapped to the array plane can be made
different from the two-dimensional DOA of the target voice.
[0035]
For example, the intervening length of the vibration suppression member 12 may be the same
among all the microphone elements 111 to 114, and the lengths of the stays 131 to 134 may be
different. At this time, the lengths of the stays 131 to 134 may be made different from each other
while maintaining the linear arrangement of the microphone elements 111 to 114. For example,
the shapes of the stays 131 to 134 may be zigzag or spiral.
[0036]
Further, by making the arrangement direction of the microphone elements 111 to 114 and the
attachment surface of the movable body 80 non-parallel, the lengths of the stays 131 to 134 may
be made different for each of the microphone elements.
[0037]
The vibration suppression member 12 may include a plurality of divided members.
For example, in FIG. 1, the vibration suppression member 12 may be divided into four members
corresponding to the microphone elements 111 to 114, respectively.
[0038]
Furthermore, the present invention is not limited to the above-described embodiment, and it goes
without saying that various modifications can be made without departing from the scope of the
present invention already described.
[0039]
Reference Example
Finally, as a reference, another mounting structure of a sound collection device in which the
two-dimensional DOA of the target voice and the two-dimensional DOA of the vibration can be
adjusted to different directions will be described.
FIG. 4A is a diagram showing the attachment structure of the sound collection device according
to the reference example. The microphone array 2 of FIG. 4A is arranged such that the
arrangement direction of the microphone elements 211 to 214 is parallel to the DOA of the
target voice. That is, the microphone array 2 is of an endfire array type having its main beam
in the direction in which the microphone elements 211 to 214 are arranged. In a typical endfire
array, the spacing between adjacent microphone elements is set to λ/4, where λ is the
wavelength of the target voice.
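The quarter-wavelength spacing can be related to the phase seen by adjacent elements. For a
plane wave travelling along the array axis, the phase difference between adjacent elements is
2πd/λ, so d = λ/4 gives π/2. The sketch below evaluates this relation; the 1 kHz frequency, the
sound speed, and the function name are assumptions made for illustration.

```python
import math


def adjacent_phase_difference(spacing, wavelength, angle_from_axis_deg=0.0):
    """Phase difference (rad) between adjacent elements for a plane wave
    arriving at the given angle from the array axis."""
    path_difference = spacing * math.cos(math.radians(angle_from_axis_deg))
    return 2.0 * math.pi * path_difference / wavelength


wavelength = 343.0 / 1000.0  # m, assuming 343 m/s sound speed and a 1 kHz tone
spacing = wavelength / 4.0   # quarter-wavelength spacing of the endfire array

# Wave travelling along the array axis, as for the target voice in FIG. 4A.
on_axis = adjacent_phase_difference(spacing, wavelength)

# Wave front parallel to the array axis: all elements see the same phase.
broadside = adjacent_phase_difference(spacing, wavelength, 90.0)
```

Here `on_axis` equals π/2 to within rounding, while `broadside` is effectively zero, which is
the in-phase condition.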
[0040]
The vibration suppression member 22 and the stays 231 to 234 are support members when
the microphone elements 211 to 214 are attached to the moving body 80. The vibration
suppression member 22 is attached to the moving body 80, and attenuates the vibration
propagating from the moving body 80 to the microphone elements 211 to 214. The stays 231 to
234 extend from the vibration suppressing member 22 and support the microphone elements
211 to 214.
[0041]
In the example of FIG. 4A, the target voice is observed with a phase difference of π/2 between
adjacent ones of the microphone elements 211 to 214. Meanwhile, the vibration propagating from
the moving body 80 is observed by the microphone elements 211 to 214 in substantially the same
phase. Therefore, the two-dimensional DOA of the vibration mapped to the two-dimensional plane
(the XY plane in FIG. 4) to which the microphone elements 211 to 214 belong is in a direction
different from the two-dimensional DOA of the target voice. FIG. 4B is a diagram showing the
two-dimensional DOA of the target voice and the two-dimensional DOA of the vibration, each
mapped onto the two-dimensional plane to which the microphone elements 211 to 214 belong. As
shown in FIG. 4B, if the two-dimensional DOA of the target voice and the two-dimensional DOA of
the vibration can be adjusted to be in different directions, the mixing of the vibration sound
can be effectively suppressed by using the directivity of the microphone array 2.
[0042]
FIGS. 5A to 5C show specific examples of the directional beam pattern 500 of the microphone
array 2 that is effective for suppressing the vibration sound. FIG. 5A shows a beam pattern 500
in which the main beam is directed to the two-dimensional DOA of the target voice and a dead
angle is directed to the two-dimensional DOA of the vibration. FIG. 5B shows a case where the
azimuth of the two-dimensional DOA of the vibration arriving from the moving body 80 deviates
from the dead angle of the beam pattern 500 due to errors in the microphone elements or the
like. FIG. 5C shows an example in which adaptive processing is performed so that the dead angle
of the beam pattern 500 is directed to the azimuth of the two-dimensional DOA of the vibration.
[0043]
1, 2 microphone array
111 to 114, 211 to 214 microphone element
12, 22 vibration suppression member
131 to 134 stay
151 to 154 A/D converter
16 signal processing unit
161 user voice emphasis filter unit
162 ICA (independent component analysis) unit
80 moving body
200, 500 beam pattern