JP2007235358

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2007235358
The present invention provides a sound pickup device capable of effectively removing
background noise using a small microphone array. SOLUTION: The device comprises first and
second sound collecting units that, from different positions, take an angular region including
the desired sound source position as their sound collection range; third and fourth sound
collecting units that, from the same respective positions, take an angular region not including
the desired sound source position as their sound collection range; a fifth sound collecting unit
that takes, from the midpoint between the sound collecting positions of the first and second
sound collecting units, an angular region including the desired sound source position as its
sound collection range; and a sixth sound collecting unit that takes, from the same position as
the fifth sound collecting unit, an angular region not including the desired sound source as its
sound collection range. The sound pickup signal of each sound collecting unit is converted into
the frequency domain. A gain coefficient determined by the ratio of the desired signal to the
background noise is obtained for each frequency component and multiplied by the corresponding
frequency component of the signal mainly composed of the desired signal. Because the gain
coefficient is approximately 0 at frequency components where the background noise is dominant,
the background noise is greatly attenuated. [Selected figure] Figure 2
Sound collecting device, program and recording medium recording the same
[0001]
The present invention relates to a sound collection device for hands-free sound collection in
applications such as voice communication and voice operation of equipment, and in particular to
a sound collection device used where many noise sources exist in addition to the desired sound
source that emits the desired sound.
04-05-2019
1
[0002]
FIG. 18 shows the configuration of a sound collection device having a noise removal function
described in Non-Patent Document 1.
In this prior art, M microphones M1 to MM are used. The sound emitted from the desired sound
source 1 at coordinates $(p, q)$ is treated as the signal and sound from any other point as
noise, and only the signal is emphasized so that sound is picked up with a high SN ratio. First,
a delay $D_m$ and a gain $g_m$ are applied to the signal $x_m(n)$ ($m = 1, \ldots, M$) received
by the microphones M1 to MM arranged at coordinates $(p_m, q_m)$, as in equation (1), to obtain
the signal $y_m(n)$.
$y_m(n) = g_m x_m(n - D_m)$ (1)
The delay $D_m$ and the gain $g_m$ are derived by equations (2) and (3), respectively, from the
position $(p, q)$ of the desired sound source 1, which is given in advance. Here, $r_m$ and
$r_c$ are the microphone-to-source distance and the critical distance defined by equations (4)
and (5), respectively, $c$ is the speed of sound, and $V$ and $T$ are the room volume and the
reverberation time of the room. Next, the signals $y_m(n)$ are added as in equation (6) to
obtain a signal $z(n)$ in which the sound emitted from the position of the desired sound source
1 is emphasized. This is the conventional noise removal method. To pick up a signal with a
higher SN ratio using this prior art, the number of microphones must be increased, and because
the microphones must be widely spaced from each other, the microphone array becomes large.
Hiroaki Nomura, Yutaka Kanada, Junji Kojima, "Near-field microphone array," Journal of the
Acoustical Society of Japan, Vol. 53, No. 2, pp. 110-116, 1997
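The delay-and-sum processing of equations (1) and (6) can be sketched as follows. This is only a minimal illustration: it assumes integer, non-negative sample delays, it assumes equation (6) is a plain sum over microphones, and it takes the delays $D_m$ and gains $g_m$ as already computed, since equations (2) to (5) are not reproduced in this text. The function name `delay_and_sum` is hypothetical.

```python
import numpy as np

def delay_and_sum(x, delays, gains):
    """Delay-and-sum over M microphone signals.

    x      : (M, N) array, row m is x_m(n)
    delays : length-M integer sample delays D_m (assumed >= 0)
    gains  : length-M gains g_m
    Returns z(n) = sum_m g_m * x_m(n - D_m), with zeros before each signal starts.
    """
    x = np.asarray(x, dtype=float)
    num_mics, num_samples = x.shape
    z = np.zeros(num_samples)
    for m in range(num_mics):
        d = int(delays[m])
        y = np.zeros(num_samples)
        if d < num_samples:
            # Equation (1): y_m(n) = g_m * x_m(n - D_m)
            y[d:] = gains[m] * x[m, :num_samples - d]
        # Assumed form of equation (6): sum of the aligned signals
        z += y
    return z

# An impulse that reaches microphone 1 one sample earlier than microphone 0
x = np.array([[0.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0, 0.0]])
z = delay_and_sum(x, delays=[0, 1], gains=[0.5, 0.5])
```

With the delays chosen to align the two arrivals, the impulses add coherently at one sample index, which is the emphasis effect the prior-art method relies on.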
[0003]
To select, emphasize, and pick up any one of the sounds emitted from sound sources located at
different points in the same direction as seen from the sound collection device using the
prior-art noise removal method, a large-scale microphone array is required, because the
positions of the microphones relative to the desired sound source must differ greatly. The range
of use is therefore limited, since installation and transportation require extensive work.
Further, the SN ratio improvement obtained by the prior-art processing is insufficient for
practical use. On the other hand, prior art using a small-scale microphone array can in
principle discriminate only by direction, so it could not select and pick up just one of the
sounds emitted from sound sources located in the same direction but at different distances.
[0004]
SUMMARY OF THE INVENTION The object of the present invention is to solve the above problems and
to realize a device that is easy to install and transport, that can discriminate not only by
direction but also by distance, and that picks up sound from the desired sound source with a
higher SN ratio than the prior art.
[0005]
A sound collection device according to the present invention comprises: first and second sound
collecting units that use the output signals of a microphone array composed of a plurality of
microphones to pick up, from different positions, sound in an angular region including the
desired sound source position; third and fourth sound collecting units that use the output
signals of the microphone array to pick up, from the different positions, sound in an angular
region not including the desired sound source position; a fifth sound collecting unit that picks
up, from the midpoint between the different positions, sound in an angular region including the
desired sound source position; a sixth sound collecting unit that picks up, from the midpoint,
sound in an angular region not including the desired sound source position; a sound source
signal component estimation unit that estimates, from the signals collected by the respective
sound collecting units, the signal amount of the desired sound source and the signal amounts of
the other sound sources; a gain coefficient calculation unit that obtains a gain coefficient
from the ratio of the signal amount of the desired sound source to the signal amount of all
sound sources including the desired sound source; and a multiplication unit that multiplies the
signal whose main component is the signal of the desired sound source, obtained by the first and
second sound collecting units, by the gain coefficient calculated by the gain coefficient
calculation unit.
[0006]
Furthermore, in the above sound collection device, the microphone array consists of two
microphone arrays arranged at positions approximately equidistant from the desired sound source
position, and the first to sixth sound collecting units perform sound collection using the
output signals of these two microphone arrays.
Furthermore, in the above sound collection device, the signals obtained by the first to sixth
sound collecting units are converted into the frequency domain by frequency domain conversion
means, the gain coefficient is calculated for each frequency component produced by the frequency
domain conversion means, and the gain coefficient calculated for each frequency component is
multiplied by the corresponding frequency component of the signal, obtained by the first and
second sound collecting units, whose main component is the signal of the desired sound source.
[0007]
Furthermore, in the above sound collection device, the sound source signal component estimation
unit calculates the power value of each sound source signal, and the gain coefficient is
estimated from the ratio of these power values.
Furthermore, in the above sound collection device, the sound source signal component estimation
unit may instead calculate the absolute value of each sound source signal to estimate its signal
amount, and the gain coefficient may be estimated from the ratio of these values. Furthermore,
in the above sound collection device, the gain coefficient calculated by the gain coefficient
calculation unit takes a predetermined maximum value when the amounts of the other sound source
signals are small enough to be ignored relative to the amount of the desired sound source
signal, and takes a value close to 0 when the amount of the desired sound source signal is small
enough to be ignored relative to the amounts of the other sound source signals.
[0008]
Furthermore, in the above sound collection device, the gain coefficient calculation unit has a
characteristic whereby the gain coefficient stays at or near the maximum value in the region
where the amount of the other sound source signals is smaller than the amount of the desired
sound source signal, and stays at or near 0 in the region where the amount of the other sound
source signals is larger than the amount of the desired sound source signal.
[0009]
According to the sound collection device of the present invention, the first to sixth sound
collecting units use the signals obtained from the microphone array to form sound collection
characteristics, that is, directivities, that pick up sound in the angular region including the
desired sound source position and sound in the angular region not including it. With this
configuration, the angular region where sound collection is desired can be distinguished even if
the spacing between the microphones is small.
As a result, the microphone array can be made small. Furthermore, the collected signals are
divided into frequency components, the amount of each sound source signal is estimated for each
frequency component, and a gain coefficient corresponding to the SN ratio is calculated for each
frequency component from the estimated amounts. Multiplying each frequency component of the
signal whose main component is the sound of the desired sound source by this gain coefficient
attenuates the other sound source signals contained in that signal. As a result, only the
desired sound source signal can be emphasized and extracted.
[0010]
Furthermore, according to the sound collection device of the present invention, each microphone
array is small, so installation and transportation are easy, and yet when sound sources are
located in the same direction at different distances, any one of them can be emphasized and
picked up, which was impossible with the prior art. To show the SN ratio improvement obtained by
the invention, results of simulation experiments for Example 1 and Example 2, which are
described later, are presented here. The simulation setup is shown in FIG. 14. Each microphone
array consists of five microphones arranged on a straight line at 4 cm intervals, and the arrays
are placed at coordinates (0.4, 0) and (-0.4, 0) (unit: meters). In Case 1, shown in FIG. 14A,
the desired sound source 1 is located at (0, 0.5) and one background noise source 2 is located
at (0, 2.5). In Case 2, shown in FIG. 14B, background noise sources 2 are additionally located
at the two points (-1.6, 2.5) and (1.6, 2.5).
[0011]
FIG. 15A shows the signal of the desired sound source 1 in Case 1, FIG. 15B shows the signal
received by the microphone, and FIG. 15C shows the signal after the processing of the second
embodiment. FIGS. 16A, 16B, and 16C show the same signals for Case 2. In both FIG. 15 and
FIG. 16, the signal processed by the present invention is closer to the sound of the desired
sound source than the signal before processing, which shows that the sound from the desired
sound source 1 is emphasized and collected. Next, FIG. 17 shows the SN ratio improvement of the
processed signal over the unprocessed signal. The SN ratio improvement with the present
invention is about 13 dB, which is larger than that of the prior art by 10 dB or more. Further,
in the second embodiment, the added non-linear processing increases the SN ratio improvement,
which confirms the effect of this processing. As described above, the present invention allows
any one of the sounds emitted by a plurality of sound sources to be selectively emphasized and
collected while keeping the device easy to install and transport. Further, the present invention
raises the SN ratio improvement during sound collection to a practically sufficient level.
[0012]
Although the sound collection device according to the present invention can be implemented
entirely in hardware, the simplest and best mode is to install the program according to the
present invention in a computer and have the computer function as the sound collection device.
To realize the sound collection device by a computer, the sound collection program installed in
the computer constructs at least the first to sixth sound collecting units, a frequency domain
conversion unit, a sound source signal component estimation unit, a gain coefficient calculation
unit, and a multiplication unit within the computer, which then functions as the sound
collection device.
[0013]
FIG. 1 shows an example of use of the present invention. Two small-scale microphone arrays 3L
and 3R are placed some distance apart (for example, at a spacing about equal to the distance
between the microphone arrays 3L and 3R and the desired sound source 1), and the processing
described below is applied to each signal received by the microphones. This processing
emphasizes and collects the sound of the desired sound source 1 and suppresses the sound of the
background noise source 2. FIG. 2 shows the overall configuration of the sound collection device
according to the present invention.
The outline of the sound collection device is described with reference to FIG. 2. In this
example, the signals received by the microphones of the microphone array 3L are input to the
first sound collecting unit 4-1 and the third sound collecting unit 4-3, and the signals
received by the microphones of the microphone array 3R are input to the second sound collecting
unit 4-2 and the fourth sound collecting unit 4-4. The signals of the microphones located at the
centers of the microphone arrays 3L and 3R are input to the fifth sound collecting unit 4-5 and
the sixth sound collecting unit 4-6. The microphone arrays 3L and 3R need not have the same
number of microphones.
[0014]
As shown in FIG. 4, each of the first sound collecting unit 4-1 to the fourth sound collecting
unit 4-4 consists of M filter processing units 41, to which the received signals $x_1$ to $x_M$
of the respective microphones are input, and an addition unit 42 that adds the output signals of
the filter processing units 41. Each filter processing unit 41 is, for example, an FIR filter;
by digitally processing each frequency component of the collected signal, it sets the
directivity of the microphone arrays 3L and 3R. This technique is well known and is described,
for example, in "Sound system and digital processing," co-authored by Oga Juro, Yoshio Yamazaki
and Toyoda Kanada, published March 25, 1995 by The Institute of Electronics, Information and
Communication Engineers. The directivities of the first sound collecting unit 4-1 and the second
sound collecting unit 4-2 are set so that the angular regions $\Theta_L$ and $\Theta_R$ shown in
FIG. 3, which include the position of the desired sound source 1 as seen from the approximate
centers of the microphone arrays 3L and 3R, are the sound collection ranges. The directivities
of the third sound collecting unit 4-3 and the fourth sound collecting unit 4-4 are set so that
the angular regions $\bar{\Theta}_L$ and $\bar{\Theta}_R$ shown in FIG. 3, which do not include
the position of the desired sound source 1, are the sound collection ranges. Further, the
directivity of the fifth sound collecting unit 4-5 is set so that the angular region
$\Theta_C$, which includes the position of the desired sound source 1 as seen from the
approximate midpoint between the microphone arrays 3L and 3R, is the sound collection range. The
directivity of the sixth sound collecting unit 4-6 is set so that the angular region
$\bar{\Theta}_C$, which excludes the position of the desired sound source 1 as seen from the
approximate midpoint between the microphone arrays 3L and 3R, is the sound collection range.
[0015]
The signals collected with the directivities of the first to sixth sound collecting units 4-1 to
4-6 are converted into frequency-domain signals by the frequency domain conversion unit 5. For
this conversion, the input signal is divided into short frames (for example, about 256 samples
at a sampling frequency of 16000 Hz), and a discrete Fourier transform is applied to each frame.
The discrete Fourier transform can be computed with, for example, the fast Fourier transform
(FFT). The transformed signal is thereby divided into a plurality of frequency components. The
frequency-domain signals are input to the addition unit 6 and the sound source signal component
estimation unit 7. The addition unit 6 receives the output signals of the first sound collecting
unit 4-1 and the second sound collecting unit 4-2 and adds them for each frequency component.
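The framing and transform step can be sketched with NumPy as below. This is a minimal sketch under assumptions the text does not state: frames do not overlap, no window function is applied, and the real-input FFT `numpy.fft.rfft` is used. The function name `to_frequency_domain` is hypothetical.

```python
import numpy as np

def to_frequency_domain(signal, frame_len=256):
    """Split a 1-D signal into frames and apply a DFT to each frame.

    Assumes non-overlapping frames and no window (neither is specified in the text).
    Returns an array of shape (num_frames, frame_len // 2 + 1):
    rows are frame indices l, columns are frequency bins.
    """
    signal = np.asarray(signal, dtype=float)
    num_frames = len(signal) // frame_len
    frames = signal[:num_frames * frame_len].reshape(num_frames, frame_len)
    return np.fft.rfft(frames, axis=1)

fs = 16000                                  # sampling frequency from the text
t = np.arange(2 * 256) / fs                 # two frames of 256 samples
tone = np.sin(2 * np.pi * 1000 * t)         # 1 kHz test tone
Y = to_frequency_domain(tone)
peak_bin = int(np.argmax(np.abs(Y[0])))     # bin spacing is fs / 256 = 62.5 Hz
```

At 16000 Hz and 256 samples per frame the bin spacing is 62.5 Hz, so the 1 kHz tone falls exactly on bin 16. The addition unit 6 would then add two such arrays element by element.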
[0016]
The sound source signal component estimation unit 7 receives all output signals of the first
sound collecting unit 4-1 to the sixth sound collecting unit 4-6 and estimates the signal amount
of each sound source for each frequency component. Once the signal amount of each sound source
is estimated, the ratio of the signal amount of the desired sound source 1 to the signal amount
of the other sound sources, that is, the SN ratio, can be obtained. The SN ratio is determined
for each frequency component and used as a gain coefficient: multiplying the signal supplied
from the addition unit 6, whose main component is the signal of the desired sound source 1, by
this gain coefficient for each frequency component suppresses the background noise component
contained in that signal. The output of the multiplication unit 9 is converted into a
time-domain signal by the inverse frequency domain conversion unit 10 and output as the signal
after noise removal. The above is the outline of the present invention.
[0017]
The configuration and operation of each part are described in detail below. FIG. 4 shows the
configuration of the first to fourth sound collecting units 4-1 to 4-4. The first sound
collecting unit 4-1 is described here as an example; the second sound collecting unit 4-2, the
third sound collecting unit 4-3, and the fourth sound collecting unit 4-4 perform the same
processing. Seen from positions on either side of the desired sound source 1, the first to
fourth sound collecting units 4-1 to 4-4 are set either to a sound collection characteristic
whose range is the angular region including the desired sound source position or to one whose
range is the angular region not including it, so they function as side beamformers. The signals
$x_{L m_L}(n)$ ($m_L = 1, 2, \ldots, M_L$) input to the first sound collecting unit 4-1 are fed
to the filter processing units 41. Each filter processing unit 41 convolves the input signal
$x_{L m_L}(n)$ with a filter coefficient $w_{L m_L}(n)$ given in advance (its determination is
described below), as shown in equation (7), and outputs the signal $x'_{L m_L}(n)$. The output
signals of the filter processing units 41 are input to the addition unit 42, which adds them as
shown in equation (8) to obtain the output signal $y_{SL}(n)$ of the first sound collecting unit
4-1. The filter coefficients $w_{L m_L}(n)$ are designed, for example by the least squares
method, so that the directivity $D_{LSPB}(\omega, \theta)$ of the first sound collecting unit
has the characteristic shown in equation (9). Similarly, the second, third, and fourth sound
collecting units are designed to satisfy the conditions of equations (10) to (12). Here,
$\Theta$ and $\bar{\Theta}$ denote, respectively, the directions around the desired signal (for
example, within about ±10° of the desired signal direction) and all other directions, and
$D(\omega, \theta)$ in equations (9) to (12) represents the directivity of each sound collecting
unit. The first sound collecting unit 4-1 emphasizes and collects only sound arriving from the
direction of the desired sound source 1 as seen from the microphone array 3L. The third sound
collecting unit 4-3 emphasizes and collects only sound arriving from directions other than that
of the desired sound source 1 as seen from the microphone array 3L. The second sound collecting
unit 4-2 emphasizes and collects only sound arriving from the direction of the desired sound
source 1 as seen from the microphone array 3R. The fourth sound collecting unit 4-4 emphasizes
and collects only sound arriving from directions other than that of the desired sound source 1
as seen from the microphone array 3R.
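The filter-and-sum structure of the side beamformers (filter processing units 41 followed by the addition unit 42) can be sketched as below. Equations (7) and (8) are not reproduced in this text, so the sketch assumes they are an FIR convolution per microphone followed by a plain sum. The filter coefficients here are placeholders, not a least-squares design, and the name `filter_and_sum` is hypothetical.

```python
import numpy as np

def filter_and_sum(x, w):
    """Filter-and-sum beamformer.

    x : (M, N) array, row m is the microphone signal x_m(n)
    w : (M, K) array, row m holds the FIR coefficients w_m(n)
    Returns y(n) = sum_m (w_m * x_m)(n), truncated to N samples.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    num_mics, num_samples = x.shape
    y = np.zeros(num_samples)
    for m in range(num_mics):
        # Assumed form of equation (7): convolution with the FIR filter
        filtered = np.convolve(x[m], w[m])[:num_samples]
        # Assumed form of equation (8): sum over microphones
        y += filtered
    return y

x = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 0.0]])
w = np.array([[1.0, 0.0],     # placeholder: pass-through filter
              [0.0, 1.0]])    # placeholder: one-sample delay
y = filter_and_sum(x, w)
```

In a real design the rows of `w` would come from fitting the target directivity of equations (9) to (12); the same routine would then serve all four side beamformers with different coefficient sets.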
[0018]
FIG. 5 shows the flow of processing in the fifth sound collecting unit 4-5 and the sixth sound
collecting unit 4-6, which function as front beamformers. In the front beamformer, the signal
$x_{L(M_L/2)}(n)$ received by the microphone at the center of the microphone array 3L and the
signal $x_{R(M_R/2)}(n)$ received by the microphone at the center of the microphone array 3R are
input to the filter processing units 51 and 52, respectively. The filter processing units 51 and
52 convolve the input signals $x_{L(M_L/2)}(n)$ and $x_{R(M_R/2)}(n)$ with the filter
coefficients $w_{C(M_L/2)}(n)$ and $w_{C(M_R/2)}(n)$ given in advance, as shown in equations
(13) and (14), and output $x'_{L(M_L/2)}(n)$ and $x'_{R(M_R/2)}(n)$. The filter coefficients
$w_{C(M_L/2)}(n)$ and $w_{C(M_R/2)}(n)$ should have the same phase characteristics; for example,
a single impulse is used. In the fifth sound collecting unit 4-5, the output signals
$x'_{L(M_L/2)}(n)$ and $x'_{R(M_R/2)}(n)$ of the filter processing units 51 and 52 are input to
the addition unit 53, which adds them as shown in equation (16) and outputs the signal
$y_{SC}(n)$. As a result, the fifth sound collecting unit 4-5 emphasizes and collects only sound
arriving from the direction of the desired sound source 1 as seen from the midpoint between the
microphone array 3L and the microphone array 3R.
[0019]
$y_{SC}(n) = x'_{L(M_L/2)}(n) + x'_{R(M_R/2)}(n)$ (16)
In the sixth sound collecting unit 4-6, the output signals $x'_{L(M_L/2)}(n)$ and
$x'_{R(M_R/2)}(n)$ of the filter processing units 51 and 52 are input to the subtraction unit
54, which subtracts them as shown in equation (17) and outputs the signal $y_{NC}(n)$. The sixth
sound collecting unit 4-6 therefore emphasizes and collects only sound arriving from directions
other than that of the desired sound source 1, as seen from the midpoint between the microphone
array 3L and the microphone array 3R.
$y_{NC}(n) = x'_{L(M_L/2)}(n) - x'_{R(M_R/2)}(n)$ (17)
FIG. 6 shows the flow of processing in the sound source signal component estimation unit 7. The
frequency components $Y_{SL}(\omega, l)$, $Y_{NL}(\omega, l)$, $Y_{SC}(\omega, l)$,
$Y_{NC}(\omega, l)$, $Y_{SR}(\omega, l)$, and $Y_{NR}(\omega, l)$ input to the sound source
signal component estimation unit 7 are fed to the power calculation unit 61, which outputs the
power values $|Y_{SL}(\omega, l)|^2$, $|Y_{NL}(\omega, l)|^2$, $|Y_{SC}(\omega, l)|^2$,
$|Y_{NC}(\omega, l)|^2$, $|Y_{SR}(\omega, l)|^2$, and $|Y_{NR}(\omega, l)|^2$ to the
vectorization unit 62. The vectorization unit 62 collects the power values of the output signals
of the first to sixth sound collecting units 4-1 to 4-6 into a vector as shown in equation (18)
and outputs the power vector $Y^{*}(\omega, l)$. Note that symbols with the suffix * and capital
letters in the expressions represent vectors.
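Equations (16) and (17) and the power calculation can be sketched as follows. The sketch assumes single-impulse filters, so the filtered signals equal the inputs, and the ordering of the six elements in the power vector is an assumption because equation (18) is not reproduced in this text. The function names are hypothetical.

```python
import numpy as np

def front_beamformers(x_left, x_right):
    """Sum and difference of the two centre-microphone signals.

    Assumes the filters w_C are single unit impulses, so x' equals x.
    """
    x_left = np.asarray(x_left, dtype=float)
    x_right = np.asarray(x_right, dtype=float)
    y_sc = x_left + x_right    # equation (16): emphasizes the front direction
    y_nc = x_left - x_right    # equation (17): cancels the front direction
    return y_sc, y_nc

def power_vector(Y_SL, Y_NL, Y_SC, Y_NC, Y_SR, Y_NR):
    """Stack |Y|^2 of the six beamformer outputs for one (omega, l) bin.

    The element order is an assumption for illustration.
    """
    values = np.array([Y_SL, Y_NL, Y_SC, Y_NC, Y_SR, Y_NR], dtype=complex)
    return np.abs(values) ** 2

# A source equidistant from both centre microphones arrives identically at each
s = np.array([1.0, -1.0, 0.5])
y_sc, y_nc = front_beamformers(s, s)
p = power_vector(1 + 1j, 0.5, 2.0, 0.0, 1j, 0.1)
```

Because the arrays are roughly equidistant from the desired source, its signal reaches both centre microphones in phase: it doubles in the sum and vanishes in the difference.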
[0020]
The power vector $Y^{*}(\omega, l)$ is input to the multiplication unit 63. The power estimation
matrix $T^{*+}$, which is the other input of the multiplication unit 63, is the output of the
pseudo-inverse matrix calculation unit 64. The gain matrix $T^{*}$ defined by equation (19) is
input to the pseudo-inverse matrix calculation unit 64, which outputs its pseudo-inverse
$T^{*+}$. Each element of the gain matrix $T^{*}$ is the gain of the directivity set in the
fifth sound collecting unit 4-5, the sixth sound collecting unit 4-6, or the first to fourth
sound collecting units 4-1 to 4-4 with respect to the $\Theta_x$ direction or the
$\bar{\Theta}_x$ direction; for example, it is the average of the directivity over frequency and
direction, as shown in equations (20) to (23). $\alpha_x$ is the average of the directivity set
in the first, second, and fifth sound collecting units 4-1, 4-2, and 4-5 with respect to the
directions around the desired signal. $\beta_x$ is the average of the directivity set in the
first, second, and fifth sound collecting units 4-1, 4-2, and 4-5 with respect to directions
other than those around the desired signal. $\gamma_x$ is the average of the directivity set in
the third, fourth, and sixth sound collecting units 4-3, 4-4, and 4-6 with respect to the
directions around the desired signal. $\delta_x$ is the average of the directivity set in the
third, fourth, and sixth sound collecting units 4-3, 4-4, and 4-6 with respect to directions
other than those around the desired signal. In equations (20) to (23), the subscript x
represents any one of R, C, and L.
[0021]
The multiplication unit 63 multiplies the input beamformer output power vector by the power
estimation matrix for each frequency component, as shown in equation (24), and outputs the
estimated signal power vector $X^{*}_{opt}(\omega, l)$. FIG. 7 shows the flow of processing in
the gain coefficient calculation unit 8. The estimated signal power vector
$X^{*}_{opt}(\omega, l)$ from the sound source signal component estimation unit 7 shown in
FIG. 6 is input to the vector element extraction unit 81. As shown in equation (25), the vector
element extraction unit 81 outputs the first component of the input vector as the estimated
signal power $|S(\omega, l)|^2$, the second component as the estimated left-direction noise
power $|N_L(\omega, l)|^2$, the third component as the estimated front-direction noise power
$|N_C(\omega, l)|^2$, and the fourth component as the estimated right-direction noise power
$|N_R(\omega, l)|^2$. These are input to the SN ratio estimation unit 82, which calculates the
estimated SN ratio $ESNR(\omega, l)$ using equation (26). The estimated SN ratio
$ESNR(\omega, l)$ is output as the gain coefficient $R(\omega, l)$. As shown in FIG. 8, with the
noise component $N_x = |N_L(\omega, l)|^2 + |N_C(\omega, l)|^2 + |N_R(\omega, l)|^2$ and the
desired signal component $S_x = |S(\omega, l)|^2$, the gain coefficient determined by equation
(26) is $R(\omega, l) \approx 0$ when $N_x \gg S_x$ and $R(\omega, l) \approx 1$, that is, the
predetermined maximum value, when $N_x \ll S_x$. The gain coefficient $R(\omega, l)$ is
calculated for each frequency component. Therefore, at frequency components where little noise
is mixed in, $R(\omega, l)$ is close to 1 and the desired signal component is output as it is.
At frequency components where much noise is mixed in, $R(\omega, l)$ is close to 0, and the
signal component is strongly attenuated to suppress the noise. Thus, multiplying the signal
$Y_S(\omega, l)$ supplied from the addition unit 6, whose main component is the desired signal,
by the gain coefficient $R(\omega, l)$ for each frequency component suppresses the noise
component at each frequency and improves the SN ratio of the signal converted back to the time
domain by the inverse frequency domain conversion unit 10.
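The behaviour described for the gain coefficient can be sketched as below. Equation (26) is not reproduced in this text, so the formula is an assumption: the ratio of the desired signal power to the total power of all sources, which matches the limits stated above (near 1 when noise is negligible, near 0 when noise dominates). Clipping negative power estimates to zero and the small constant `eps` are also additions made for this sketch, and the name `gain_coefficient` is hypothetical.

```python
import numpy as np

def gain_coefficient(S, NL, NC, NR, eps=1e-12):
    """Per-bin gain R = S_x / (S_x + N_x), an assumed form of equation (26).

    S, NL, NC, NR : estimated powers |S|^2, |N_L|^2, |N_C|^2, |N_R|^2 per bin.
    Negative estimates are clipped to 0 (an assumption of this sketch), and
    eps avoids division by zero in silent bins.
    """
    S = np.maximum(np.asarray(S, dtype=float), 0.0)
    noise = (np.asarray(NL, dtype=float)
             + np.asarray(NC, dtype=float)
             + np.asarray(NR, dtype=float))
    noise = np.maximum(noise, 0.0)
    return S / (S + noise + eps)

# Three bins: signal only, equal signal and noise, noise only
R = gain_coefficient(S=[1.0, 1.0, 0.0],
                     NL=[0.0, 1.0, 1.0],
                     NC=[0.0, 0.0, 1.0],
                     NR=[0.0, 0.0, 1.0])

# Applying the gain to the summed desired-direction spectrum Y_S
Y_S = np.array([2.0 + 0j, 2.0 + 0j, 2.0 + 0j])
enhanced = R * Y_S
```

The first bin passes almost unchanged, the second is halved, and the third is removed, which mirrors the per-frequency suppression the paragraph describes.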
[0022]
Here, the principle by which the present invention collects sound with the desired sound
selected and emphasized is described. The output power of each collected signal, that is, each
element of the power vector $Y^{*}(\omega, l)$ of the signals output from the sound collecting
units 4-1 to 4-6, can be approximated as shown in equations (27) to (32): the power of the
signal $X_\theta(\omega, l)$ received by the microphone array multiplied by the directivity for
the source direction and frequency of that signal. It is assumed here that the sounds emitted by
the respective sound sources are mutually uncorrelated and that each sound is received at the
same level by all the microphones.
[0023]
Now, as shown in FIG. 3, the sound sources are divided into the desired sound source 1 and the
three background noise sources 2R, 2C, and 2L, and each signal $X_\theta(\omega, l)$ is assumed
to belong to one of $\hat{S}(\omega, l)$, $\hat{N}_L(\omega, l)$, $\hat{N}_C(\omega, l)$, and
$\hat{N}_R(\omega, l)$. If the directivity of each sound collecting unit, designed under the
conditions of equations (9) to (12), is assumed to be uniform within each angular region
$\Theta$ or $\bar{\Theta}$, then $Y^{*}(\omega, l)$ is expressed by equation (33). In this
embodiment, the average values of the directivity given by equations (20) to (23) are used as
the representative values of the directivity for each angular region.
[0024]
From the above relationship, multiplying the beamformer output power vector Y*(ω, l) from the
left by the pseudoinverse matrix T*<+> of T*, which is given in advance, yields the estimated
signal power vector X*opt(ω, l), which is an estimate of X*(ω, l).
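The estimation step can be sketched numerically as follows. This is only an illustration under stated assumptions: the 6 x 4 matrix `T` below contains made-up gain values for a single frequency, the row order (YSL, YNL, YSC, YNC, YSR, YNR) and column order (S, NL, NC, NR) are chosen for the example, and the helper name `estimate_source_powers` is hypothetical. In the device itself, T* follows from the designed directivity characteristics.

```python
import numpy as np

# Hypothetical gain matrix T* for one frequency (illustrative values only).
# Rows: beamformer outputs YSL, YNL, YSC, YNC, YSR, YNR.
# Columns: sources S (desired), NL, NC, NR (left, centre, right noise).
T = np.array([
    [1.00, 0.60, 0.10, 0.05],
    [0.10, 1.00, 0.30, 0.05],
    [1.00, 0.10, 0.10, 0.10],
    [0.10, 0.30, 1.00, 0.30],
    [1.00, 0.05, 0.10, 0.60],
    [0.10, 0.05, 0.30, 1.00],
])

# Pseudoinverse T*+ (4 x 6), computed once in advance.
T_pinv = np.linalg.pinv(T)

def estimate_source_powers(y_power):
    """Return the estimate X*opt = T*+ Y* for one frequency and frame."""
    return T_pinv @ np.asarray(y_power, dtype=float)

# Synthetic check: build Y* from known source powers, then recover them.
x_true = np.array([2.0, 0.5, 0.1, 0.8])   # powers of S, NL, NC, NR
y_power = T @ x_true                      # noise-free beamformer output powers
x_opt = estimate_source_powers(y_power)
```

Because the example matrix has full column rank and the synthetic data are noise-free, the recovered vector matches `x_true`; with measured powers the pseudoinverse gives the least-squares estimate instead.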
[0025]
The second embodiment modifies the procedure in the gain coefficient calculation unit 8 of the
first embodiment.
FIG. 9 shows the processing procedure of the gain coefficient calculation unit 8 used in the
second embodiment. It differs from the gain coefficient calculation unit 8 of the first
embodiment in that a non-linear processing unit 83 is added. To sharpen the distinction between
the desired voice and the background noise, the non-linear processing unit 83 multiplies the
estimated input SN ratio by a non-linear function Z(ω, l) that varies between 0 and 1, and
outputs R(ω, l). The non-linear function Z(ω, l) is given in advance. It holds 1 or a value
close to 1 in regions where the SN ratio ESNR(ω, l) is large, and 0 or a value close to 0 in
regions where ESNR(ω, l) is small. For example, a function built from the hyperbolic tangent
shown in Equation (35) or from the logarithmic function shown in Equation (36) is used. FIG. 10
shows an example of the non-linear function Z(ω, l).
[0026]
Here, ρ and its companion parameter are set arbitrarily and change the characteristics of the
non-linear function. The other parts are the same as those of the first embodiment, so their
description is omitted. With the non-linear characteristic shown in FIG. 10, frequency
components in the frequency regions where the desired voice is dominant can be emphasized, and
frequency components in the frequency regions where the background noise is dominant can be
suppressed. This has the effect of increasing the amount of improvement in the SN ratio.
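As a rough illustration of such a characteristic, the sketch below uses a shifted and scaled hyperbolic tangent. It is not Equation (35) or Equation (36), which are not reproduced in this text; the functional form, the function name `nonlinear_z`, and the parameter names `rho` and `threshold` are assumptions chosen only to show a curve that stays near 0 for a small estimated SN ratio and near 1 for a large one.

```python
import math

def nonlinear_z(esnr, rho=4.0, threshold=1.0):
    """Map an estimated SN ratio to a value between 0 and 1.

    rho controls how steep the transition is; threshold is the SN ratio
    at which the output equals 0.5. Both are hypothetical parameters.
    """
    return 0.5 * (1.0 + math.tanh(rho * (esnr - threshold)))

# Sample the curve at a few estimated SN ratios.
samples = [(esnr, nonlinear_z(esnr)) for esnr in (0.0, 0.5, 1.0, 2.0, 5.0)]
```

A larger `rho` makes the transition between suppression and pass-through more abrupt, which is one way to read the statement that the parameters change the characteristics of the function.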
[0027]
In the third embodiment, the procedures in the sound source signal component estimation unit 7
and the gain coefficient calculation unit 8 of the first embodiment are modified. FIG. 11 shows
the configuration of the sound source signal component estimation unit 7 used in the third
embodiment, and FIG. 12 shows the configuration of the gain coefficient calculation unit 8. The
frequency components YSL(ω, l), YNL(ω, l), YSC(ω, l), YNC(ω, l), YSR(ω, l), and YNR(ω, l) input
to the sound source signal component estimation unit 7 are input to the absolute value
calculator 61′, and the absolute values |YSL(ω, l)|, |YNL(ω, l)|, |YSC(ω, l)|, |YNC(ω, l)|,
|YSR(ω, l)|, and |YNR(ω, l)| of the signals are output to the vectorization unit 62.
The vectorization unit 62 outputs the absolute value vector Y*(ω, l) shown in Equation (37) for
the input signals.
[0028]
The absolute value vector is input to the multiplication unit 63. The absolute value estimation
matrix T*<+>, which is the other input of the multiplication unit 63, is the output signal of
the pseudo inverse matrix calculator 64. The pseudo inverse matrix calculator 64 outputs the
pseudoinverse matrix T*<+> of the input gain matrix T*. The gain matrix T* is defined by
Equation (38) from the gains of the directivity characteristics calculated from the filter
coefficients used in the filter processing units 41 (see FIG. 4) provided in the fifth and
sixth sound collection units 4-5 and 4-6, which function as front beamformers, and in the first
to fourth sound collection units 4-1 to 4-4, which function as side beamformers, and is given
in advance.
[0029]
The multiplication unit 63 multiplies the input absolute value vector by the absolute value
estimation matrix for each frequency component, and outputs the estimated signal absolute value
vector X*opt(ω, l). Next, as shown in Equation (39), the vector element extraction unit 81
extracts, from the input estimated signal absolute value vector, the first component as the
estimated signal absolute value |S(ω, l)|, the second component as the estimated left-direction
noise absolute value |NL(ω, l)|, the third component as the estimated front-direction noise
absolute value |NC(ω, l)|, and the fourth component as the estimated right-direction noise
absolute value |NR(ω, l)|, and outputs them to the SN ratio estimation unit 82. The SN ratio
estimation unit 82 calculates the estimated SN ratio ESNR(ω, l) using Equation (40). The other
parts are the same as those of the first embodiment, so further description is omitted.
According to the third embodiment, the amount of calculation can be reduced compared with the
first embodiment because squaring operations are not required. The third embodiment can also be
applied to the sound source signal component estimation unit 7 and the gain coefficient
calculation unit 8 of the second embodiment. FIG. 13 shows the configuration of the gain
coefficient calculation unit 8 in the case where the change of the third embodiment is added to
the second embodiment.
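The absolute-value variant can be sketched as below, under the same kind of assumptions as any small numerical illustration: the gain matrix values, the row and column ordering, and the function name `estimate_abs_components` are hypothetical, and Equation (40) for the estimated SN ratio is deliberately left out because it is not reproduced in this text. The sketch only shows that magnitudes are taken directly, without squaring, before the pseudoinverse is applied.

```python
import numpy as np

def estimate_abs_components(y_freq, t_abs):
    """Estimate |S|, |NL|, |NC|, |NR| from six complex beamformer outputs."""
    y_abs = np.abs(np.asarray(y_freq))      # absolute values; no squaring needed
    x_opt = np.linalg.pinv(t_abs) @ y_abs   # multiply by the pseudoinverse of T*
    s, n_l, n_c, n_r = x_opt                # split the estimate into its elements
    return {"S": s, "NL": n_l, "NC": n_c, "NR": n_r}

# Hypothetical gain matrix (rows: YSL, YNL, YSC, YNC, YSR, YNR).
t_abs = np.array([
    [1.00, 0.60, 0.10, 0.05],
    [0.10, 1.00, 0.30, 0.05],
    [1.00, 0.10, 0.10, 0.10],
    [0.10, 0.30, 1.00, 0.30],
    [1.00, 0.05, 0.10, 0.60],
    [0.10, 0.05, 0.30, 1.00],
])

# Synthetic outputs with arbitrary phases; only the magnitudes matter here.
x_true = np.array([1.5, 0.4, 0.2, 0.7])
phases = np.array([0.3, -1.2, 2.0, 0.0, 1.1, -0.4])
y_freq = (t_abs @ x_true) * np.exp(1j * phases)

est = estimate_abs_components(y_freq, t_abs)
```

In this noise-free synthetic case the four estimates reproduce `x_true`, regardless of the phases attached to the beamformer outputs.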
[0030]
Although the sound collection device according to the present invention described above can be
configured entirely in hardware, the simplest implementation is to write each of the
procedures described above as a sound collection program in a computer-readable programming
language, install this program on a computer, and have the computer execute it so that the
computer functions as the sound collection device. The sound collection program according to
the present invention is recorded on a computer-readable recording medium such as a magnetic
medium, a CD-ROM, or a semiconductor memory, and is installed on the computer from the
recording medium or through a communication line. The installed sound collection program is
decoded by the CPU provided in the computer, and the computer functions as a sound collection
device.
[0031]
The sound collection device according to the present invention is used, for example, in the field of hands-free calling devices such as teleconferencing systems.
[0032]
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a layout diagram for explaining the outline of the present invention.
FIG. 2 is a block diagram for explaining the whole of the sound collection device according to
the present invention.
FIG. 3 is a top view for explaining the directivity of the first to sixth sound collection
units used in the present invention.
FIG. 4 is a block diagram for explaining the configuration of the first to fourth sound
collection units functioning as side beamformers used in the present invention.
FIG. 5 is a block diagram for explaining the configuration of the fifth and sixth sound
collection units functioning as front beamformers used in the present invention.
FIG. 6 is a block diagram for explaining the configuration of the sound source signal
component estimation unit used in the present invention.
FIG. 7 is a block diagram for explaining the configuration of the gain coefficient calculation
unit used in the present invention.
FIG. 8 is a graph for explaining an example of the gain coefficient calculated by the gain
coefficient calculation unit.
FIG. 9 is a block diagram for explaining a modification of the gain coefficient calculation
unit shown in FIG. 7.
FIG. 10 is a graph for explaining an example of the characteristics of the gain coefficients
obtained by the gain coefficient calculation unit shown in FIG. 7.
FIG. 11 is a block diagram for explaining a modification of the sound source signal component
estimation unit shown in FIG. 6.
FIG. 12 is a block diagram for explaining the configuration of a gain coefficient calculation
unit that calculates a gain coefficient using the estimated value obtained by the sound source
signal component estimation unit shown in FIG. 11.
FIG. 13 is a block diagram for explaining an embodiment in which the gain coefficient
calculation unit shown in FIG. 9 is applied to the gain coefficient calculation unit shown in
FIG. 12.
FIG. 14 shows layout diagrams (A and B) for explaining examples of the simulation used to
confirm the effect of the present invention, covering the case of one background noise source
and the case of three background noise sources.
FIG. 15 explains the results of the simulation shown in FIG. 14A: A is a signal waveform
diagram of the desired sound source, B is a waveform diagram when background noise is
superimposed on the desired sound source signal, and C is a waveform diagram of the result of
sound collection processing by the sound collection device of the present invention.
FIG. 16 explains the results of the simulation shown in FIG. 14B: A is a signal waveform
diagram of the desired sound source, B is a waveform diagram when background noise is
superimposed on the desired sound source signal, and C is a waveform diagram of the result of
sound collection processing by the sound collection device of the present invention.
FIG. 17 is a graph for explaining the effect of the present invention.
FIG. 18 is a block diagram for explaining the prior art.
Explanation of symbols
[0033]
1 desired sound source
2 background noise source
3L, 3R microphone array
4-1 first sound collection unit
4-2 second sound collection unit
4-3 third sound collection unit
4-4 fourth sound collection unit
4-5 fifth sound collection unit
4-6 sixth sound collection unit
5 frequency domain conversion unit
6 adding unit
7 sound source signal component estimation unit
8 gain coefficient calculation unit
9 multiplication unit
10 inverse frequency domain transform unit