JP2009025490

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2009025490
An object of the present invention is to provide a sound collection device having sufficient noise
suppression characteristics. A sound collection device according to the present invention includes
six or more sound collection units, a processing target signal generation unit, a power spectrum
estimation unit, a gain coefficient calculation unit, and a multiplication unit. Each sound pickup
unit picks up sound in different regions using output signals of a microphone array configured
by mounting a plurality of microphones. The processing target signal generation unit generates a
processing target signal from signals from one or more predetermined microphones or a sound
collection unit. The power spectrum estimation unit estimates the signal amount of the desired
sound source and the signal amounts of the other sound sources for each frequency from the
signal amount of each collected sound signal obtained by each sound collection unit. The gain
coefficient calculation unit obtains a gain coefficient for each frequency from the signal amount
of the desired sound source, the signal amounts of all the sound sources including the signal
amount of the desired sound source, and the processing target signal. The multiplication unit
multiplies the processing target signal by the gain coefficient calculated by the gain coefficient
calculation unit. [Selected figure] Figure 8
Sound collecting device, sound collecting method, sound collecting program using the method,
and recording medium
[0001]
The present invention relates to a sound collection device, a sound collection method, a sound
collection program using the method, and a recording medium for hands-free sound collection in
applications such as voice communication and voice operation of equipment. It relates in
particular to the case where there are many noise sources other than the desired sound source.
04-05-2019
1
[0002]
As a method of emphasizing a desired sound source at a specific position with a hands-free
microphone in an environment with much background noise, a method that estimates and
emphasizes the desired sound power from a plurality of beamformer outputs has been
proposed (Non-Patent Document 1).
In this method, the gain coefficient R(ω, l) is calculated using the estimated signal power
|S(ω, l)|^2, the estimated left-direction noise power |NL(ω, l)|^2, the estimated front-direction
noise power |NC(ω, l)|^2, and the estimated right-direction noise power |NR(ω, l)|^2.
[0003]
Then, the signal to be processed is multiplied by the gain coefficient R(ω, l) to obtain a signal in
which the noise component is suppressed for each frequency band.
Non-Patent Document 1: Hirooka Hioka, Kazunori Kobayashi, Kenichi Furuya, Akitoshi Kataoka,
"Enhancing a Sound Source at a Specific Position Using a Small Microphone Array Pair,"
Proceedings of the Spring Meeting of the Acoustical Society of Japan, pp. 621-622, 2006.
[0004]
In the technique of Non-Patent Document 1, the gain coefficient R(ω, l) is a value that varies
between 0 and 1, and in some cases a sufficient noise suppression effect could not be
obtained. The sound collection device of the present invention has been made to solve this
problem, and its object is to improve the noise suppression performance.
[0005]
The sound collection device of the present invention includes six or more sound collection units,
a processing target signal generation unit, a power spectrum estimation unit, a gain coefficient
calculation unit, and a multiplication unit. Each sound pickup unit picks up sound in different
regions using output signals of a microphone array configured by mounting a plurality of
microphones. Here, "different" means that they do not match, and there may be overlapping
parts. The processing target signal generation unit generates a processing target signal from
signals from one or more predetermined microphones or a sound collection unit. The power
spectrum estimation unit estimates the signal amount of the desired sound source and the signal
amounts of the other sound sources for each frequency from the signal amount of each collected
sound signal obtained by each sound collection unit. The gain coefficient calculation unit obtains
a gain coefficient for each frequency from the signal amount of the desired sound source, the
signal amounts of all the sound sources including the signal amount of the desired sound source,
and the processing target signal. The multiplication unit multiplies the processing target signal
by the gain coefficient calculated by the gain coefficient calculation unit.
[0006]
For example, with the processing target signal denoted YS(ω, l), the signal amount of the desired
sound source estimated by the power spectrum estimation unit denoted S(ω, l), and the signal
amounts of the other sound sources denoted N(ω, l), the gain coefficient calculation unit may set
the gain coefficient R(ω, l) to
[0007]
[equation missing from the extracted text].
[0008]
According to the sound collection device of the present invention, the gain coefficient is
determined also in consideration of the processing target signal.
It is therefore possible to obtain a gain coefficient that makes use of the advantages of both the
gain coefficient that does not consider the processing target signal and the gain coefficient that
does consider it.
As a result, noise suppression characteristics can be improved.
[0009]
FIG. 1 shows an example of usage of the present invention. Two small-scale microphone arrays
3L and 3R are placed some distance apart (for example, about the same distance as that
between the microphone arrays 3L and 3R and the desired sound source 1), and the processing
described below is performed on each signal received by the microphones. Through this
processing, the sound of the desired sound source 1 is emphasized and collected, and the sound
of the background noise source 2 is suppressed.
[0010]
Before describing the present invention, first, the technology disclosed in the unpublished patent
application (Japanese Patent Application No. 2006-52502) will be described. The overall
configuration of the sound collection device of Japanese Patent Application No. 2006-52502 is
shown in FIG. 2. The outline of the sound collection device will be described with reference to FIG. 2.
The respective sound receiving signals generated by the respective microphones of the
microphone array 3L are inputted to the first sound collecting unit 4-1 and the third sound
collecting unit 4-3 in this example. Further, the respective sound receiving signals generated by
the respective microphones of the microphone array 3R are input to the second sound collecting
unit 4-2 and the fourth sound collecting unit 4-4 in this example. The signals of the microphones
located at the centers of the microphone arrays 3L and 3R are input to the fifth sound collecting
unit 4-5 and the sixth sound collecting unit 4-6. The number of microphones mounted on both
microphone arrays 3L and 3R is not necessarily the same.
[0011]
As shown in FIG. 4, each of the first sound collecting unit 4-1 to the fourth sound collecting unit
4-4 comprises M filter processing units 41, to which the received signals x1 to xM of the
respective microphones are input, and an addition unit 42 that adds the output signals of the
filter processing units 41. Each filter processing unit 41 is constituted by, for example, an
FIR filter, and digitally processes each frequency component of the collected sound signal to set
the directivity characteristics of the microphone arrays 3L and 3R. Such a technology is
described, for example, in "Sound system and digital processing," co-authored by Oga Juro,
Yoshio Yamazaki and Toyoda Kanada, published by The Institute of Electronics, Information and
Communication Engineers on March 25, 1995, and can be realized by well-known techniques.
[0012]
Here, the directivity characteristics of the first sound collection unit 4-1 and the second sound
collection unit 4-2 are set so that the angular regions ΘL and ΘR shown in FIG. 3, which include
the position of the desired sound source 1 as seen from the approximate center positions of the
microphone arrays 3L and 3R, are the sound collection ranges. The directivity characteristics of
the third sound collecting unit 4-3 and the fourth sound collecting unit 4-4 are set so that the
angular regions Θ̄L and Θ̄R shown in FIG. 3, which do not include the position of the desired
sound source 1, are the sound collection ranges. Furthermore, the directivity of the fifth sound
collecting unit 4-5 is set so that the angular region ΘC, which includes the position of the desired
sound source 1 as seen from the approximate midpoint between the microphone arrays 3L and
3R, is the sound collection range. The directivity of the sixth sound collecting unit 4-6 is set so
that the angular region Θ̄C, which excludes the position of the desired sound source 1 as seen
from the approximate midpoint between the microphone arrays 3L and 3R, is the sound
collection range.
[0013]
The sound collection signal collected by the directional characteristics of the first to sixth sound
collection units 4-1 to 4-6 is converted to a signal in the frequency domain by the frequency
domain conversion unit 5. In the conversion to the frequency domain, the input signal is
decomposed into frames of a short time length (for example, about 256 samples in the case of
sampling frequency 16000 Hz), and a discrete Fourier transform is performed on each frame. For
the discrete Fourier transform, a fast Fourier transform (FFT), for example, can be used.
The signal transformed into the frequency domain is divided into a plurality of
frequency domain components.
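
The framing and per-frame transform described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the hop size of 128 samples, and the absence of a window function are assumptions, since the text only states a frame length of about 256 samples at 16000 Hz.

```python
import cmath

def frames(signal, frame_len=256, hop=128):
    """Split a sequence into frames of frame_len samples (hop size is an assumption)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def dft(frame):
    """Naive discrete Fourier transform of one frame; an FFT computes the same values faster."""
    n = len(frame)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(frame))
            for k in range(n)]

def to_frequency_domain(signal, frame_len=256, hop=128):
    """Return Y[l][k]: frequency component k of frame l."""
    return [dft(f) for f in frames(signal, frame_len, hop)]
```

Each inner list corresponds to one frame index l, and each entry to one frequency component ω, matching the Y(ω, l) notation used in the text.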
[0014]
The collected sound signals converted into the frequency domain are input to the adding unit 6
and the power spectrum estimation unit 7 (described as the "sound source signal component
estimating unit" in the specification of Japanese Patent Application No. 2006-52502). The output
signals of the first sound collecting unit 4-1 and the second sound collecting unit 4-2 are input
to the adding unit 6. The adding unit 6 adds these frequency-domain signals component by
component, that is, for each identical frequency component.
[0015]
The power spectrum estimation unit 7 receives all output signals of the first sound collection unit
4-1 to the sixth sound collection unit 4-6, and estimates the signal amount of each sound source
for each frequency band. If the signal amount of each sound source can be estimated, the ratio
of the signal amount of the desired sound source 1 to the signal amount of the other sound
sources, that is, the SN ratio, can be obtained. This SN ratio is determined for each frequency
band. By multiplying the signal supplied from the adding unit 6, whose main component is the
signal of the desired sound source 1, by this SN ratio as a gain coefficient for each frequency
band, the background noise component contained in that signal can be suppressed. The
multiplication result of the
multiplication unit 9 is converted to a time domain signal by the inverse frequency domain
conversion unit 10, and is output as a signal after noise removal. The above is the outline of the
invention of Japanese Patent Application No. 2006-52502.
[0016]
The configuration and operation of each part will be described in detail below. FIG. 4 shows the
configuration of the first to fourth sound collecting units 4-1 to 4-4. Although the first
sound collecting unit 4-1 is described here as an example, the same processing is performed in
the second sound collecting unit 4-2, the third sound collecting unit 4-3, and the fourth sound
collecting unit 4-4. The first to fourth sound collecting units 4-1 to 4-4 function as side
beamformers: viewed from directions on both sides of the position of the desired sound source 1,
each is set either to a sound collection characteristic whose sound collection range is the angular
region including the desired sound source position or to one whose sound collection range is the
angular region not including it. The signals xLmL(n) (mL = 1, 2, ..., ML) input to the first sound
collection unit 4-1 are input to the filter processing units 41. Each filter processing unit 41
convolves the input signal xLmL(n) with the filter coefficient wLmL(n) given in advance (the
determination method is described later), as shown in equation (1), and outputs the signal
x'LmL(n).
[0017]
The output signal of each filter processing unit 41 is input to the addition unit 42. The addition
unit 42 adds the input signals as shown in equation (2) to obtain the output signal ySL(n) of the
first sound collecting unit 4-1.
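
Equations (1) and (2) are not reproduced in this text. The sketch below assumes they take the usual filter-and-sum form, namely a causal FIR convolution per microphone followed by a sum over microphones. The function names are hypothetical.

```python
def fir_filter(x, w):
    """Causal FIR convolution (assumed form of equation (1)): y(n) = sum_k w(k) * x(n - k)."""
    return [sum(w[k] * x[n - k] for k in range(len(w)) if n - k >= 0)
            for n in range(len(x))]

def filter_and_sum(mic_signals, filters):
    """Filter each microphone signal with its own coefficients, then add the
    filtered signals sample by sample (assumed form of equation (2))."""
    filtered = [fir_filter(x, w) for x, w in zip(mic_signals, filters)]
    return [sum(samples) for samples in zip(*filtered)]
```

For example, `filter_and_sum([[1, 2, 3], [1, 1, 1]], [[1, 0], [0, 1]])` passes the first channel unchanged, delays the second by one sample, and adds them.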
[0018]
Here, the filter coefficient wLmL(n) is designed, for example by the least squares method, so
that the directivity characteristic DLSPB(ω, θ) of the first sound collecting unit has the
characteristic shown in equation (3). Similarly, the second, third, and fourth sound collecting
units are designed to satisfy the conditions of equations (4) to (6). Θ and Θ̄ denote,
respectively, the directions around the desired signal direction (for example, within about ±10°
of it) and all other directions. D(ω, θ) in equations (3) to (6) represents the directivity
characteristic of each sound collecting unit.
[0019]
The first sound collection unit 4-1 emphasizes and collects only the sound emitted in the
direction of the desired sound source 1 when viewed from the microphone array 3L. As viewed
from the microphone array 3L, the third sound collection unit emphasizes and collects only
sounds emitted in directions other than the direction of the desired sound source. As viewed
from the microphone array 3R, the second sound collection unit 4-2 emphasizes and collects
only the sound emitted in the direction of the desired sound source 1. The fourth sound
collecting unit 4-4 emphasizes and collects only sounds emitted in directions other than the
direction of the desired sound source 1 as viewed from the microphone array 3R.
[0020]
FIG. 5 shows the flow of processing in the fifth sound collecting unit 4-5 and the sixth sound
collecting unit 4-6, which function as front beamformers. In the front beamformer, the signal
xL(ML/2)(n) received by the microphone at the center of the microphone array 3L and the signal
xR(MR/2)(n) received by the microphone at the center of the microphone array 3R are input to
the filter processing units 51 and 52, respectively. The filter processing units 51 and 52
convolve the input signals xL(ML/2)(n) and xR(MR/2)(n) with the filter coefficients wC(ML/2)(n)
and wC(MR/2)(n) given in advance, as shown in equations (7) and (8), and output the results
x'L(ML/2)(n) and x'R(MR/2)(n).
[0021]
Here, it is desirable that the filter coefficients wC(ML/2)(n) and wC(MR/2)(n) have the same
phase characteristics. For example, a single impulse signal
[0022]
is used.
The fifth sound collection unit 4-5 inputs the output signals x'L (ML / 2) (n) and x'R (MR / 2) (n)
of the filter processing units 51 and 52 to the addition unit 53. The adding unit 53 adds the
input signals as shown in equation (10), and outputs a signal ySC (n). As a result, in the fifth
sound collecting unit 4-5, only the sound emitted in the direction of the desired sound source 1 is
emphasized and collected as viewed from the midpoint between the microphone array 3L and
the microphone array 3R.
[0023]
ySC(n) = x'L(ML/2)(n) + x'R(MR/2)(n)   (10)
In the sixth sound collection unit 4-6, the output signals x'L(ML/2)(n) and x'R(MR/2)(n) of the
filter processing units 51 and 52 are input to the subtraction unit 54. The subtraction unit 54
subtracts the input signals as shown in equation (11) and outputs a signal yNC(n). Therefore, in
the sixth sound collecting unit 4-6, only the sound emitted in directions other than the direction
of the desired sound source 1 is emphasized and collected, as viewed from the midpoint between
the microphone array 3L and the microphone array 3R.
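
The sum and difference of equations (10) and (11) can be sketched as below. The function name is hypothetical, and the inputs stand for the already filtered center-microphone signals x'L(ML/2)(n) and x'R(MR/2)(n).

```python
def front_beamformers(x_left, x_right):
    """Return (ySC, yNC): the sample-wise sum and difference of the two
    filtered center-microphone signals, as in equations (10) and (11)."""
    y_sc = [a + b for a, b in zip(x_left, x_right)]
    y_nc = [a - b for a, b in zip(x_left, x_right)]
    return y_sc, y_nc
```

When a source reaches both center microphones with identical waveforms, as from a point equidistant from the two arrays, the sum doubles it and the difference cancels it, which is why the fifth unit emphasizes the front direction and the sixth unit the remaining directions.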
[0024]
yNC(n) = x'L(ML/2)(n) - x'R(MR/2)(n)   (11)
FIG. 6 shows the flow of processing in the power spectrum estimation unit 7. The frequency
components YSL(ω, l), YNL(ω, l), YSC(ω, l), YNC(ω, l), YSR(ω, l), YNR(ω, l) input to the power
spectrum estimation unit 7 are input to the power calculation unit 61, which outputs the power
values |YSL(ω, l)|^2, |YNL(ω, l)|^2, |YSC(ω, l)|^2, |YNC(ω, l)|^2, |YSR(ω, l)|^2, |YNR(ω, l)|^2.
These are input to the vectorization unit 62. The vectorization unit 62 collects the power values
of the output signals of the first to sixth sound collection units 4-1 to 4-6 into vector form as in
equation (12) and outputs the power vector Y(ω, l).
[0025]
The power vector Y(ω, l) is input to the multiplier 63. The power estimation matrix T^+, which
is the other input of the multiplier 63, is the output of the pseudo-inverse matrix calculation
unit 64. The gain matrix T defined by equation (13) is input to the pseudo-inverse matrix
calculation unit 64, which outputs the pseudo-inverse matrix T^+.
[0026]
Each element of the gain matrix T is the gain, toward the Θx direction or the Θ̄x direction, of the
directivity characteristic set in the fifth sound collecting unit 4-5, the sixth sound collecting
unit 4-6, and the first to fourth sound collecting units 4-1 to 4-4. For example, an average of
the directivity characteristic over frequency and direction is used, as shown in equations (14)
to (17).
[0027]
αx is the average value of the directivity characteristics set in the first, second, and fifth
sound collecting units 4-1, 4-2, and 4-5 for the directions around the desired signal.
βx is the average value of the directivity characteristics set in the first, second, and fifth
sound collecting units 4-1, 4-2, and 4-5 for directions other than those around the desired
signal. γx is the average value of the directivity characteristics set in the third, fourth, and
sixth sound collecting units 4-3, 4-4, and 4-6 for the directions around the desired signal.
δx is the average value of the directivity characteristics set in the third, fourth, and sixth
sound collecting units 4-3, 4-4, and 4-6 for directions other than those around the desired
signal. In equations (14) to (17), the subscript x stands for any one of R, C, and L.
[0028]
The multiplier 63 multiplies the input beamformer output power vector by the power estimation
matrix for each frequency component, as shown in equation (18), and outputs the estimated
signal power vector Xopt(ω, l).
[0029]
Xopt(ω, l) = T^+ Y(ω, l)   (18)
FIG. 7 shows the flow of processing in the gain coefficient calculation unit 8.
The estimated signal power vector Xopt(ω, l) output by the power spectrum estimation unit 7
shown in FIG. 6 is input to the vector element extraction unit 81. As shown in equation (19), the
vector element extraction unit 81 outputs the first component of the input estimated signal
power vector as the estimated signal power |S(ω, l)|^2, the second component as the estimated
left-direction noise power |NL(ω, l)|^2, the third component as the estimated front-direction
noise power |NC(ω, l)|^2, and the fourth component as the estimated right-direction noise power
|NR(ω, l)|^2. These are input to the SN ratio estimation unit 82.
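
Equation (18) and the element extraction can be sketched for a single frequency component as follows. The gain matrix values in `T_EXAMPLE` are invented for illustration only, because equation (13) is not reproduced in this text, and the row order follows the order in which the six beamformer outputs are listed above, which is an assumption. The sketch solves the least-squares problem through the normal equations, which gives the same result as multiplying by the pseudo-inverse whenever T has full column rank.

```python
def transpose(a):
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [v - f * u for v, u in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def estimate_powers(T, y):
    """Least-squares solution of T x = y; equals pinv(T) y when T has full column rank."""
    Tt = transpose(T)
    rhs = [sum(t * v for t, v in zip(row, y)) for row in Tt]
    return solve(matmul(Tt, T), rhs)

# Hypothetical 6x4 gain matrix: rows are the beamformer output powers
# (YSL, YNL, YSC, YNC, YSR, YNR), columns are the source powers (S, NL, NC, NR).
T_EXAMPLE = [
    [1.0, 0.2, 0.1, 0.1],
    [0.1, 1.0, 0.3, 0.2],
    [1.0, 0.1, 0.2, 0.1],
    [0.1, 0.4, 1.0, 0.4],
    [1.0, 0.1, 0.1, 0.2],
    [0.1, 0.2, 0.3, 1.0],
]

def extract_elements(x_opt):
    """Split the estimated power vector into (|S|^2, |NL|^2, |NC|^2, |NR|^2)."""
    s_pow, nl_pow, nc_pow, nr_pow = x_opt
    return s_pow, nl_pow, nc_pow, nr_pow
```

In practice T would be filled with the averaged directivity gains of equations (14) to (17), and the estimate would be computed independently for every ω and l.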
[0030]
The SN ratio estimation unit 82 calculates the estimated SN ratio ESNR (ω, l) using Equation
(20).
[0031]
The estimated SN ratio ESNR (ω, l), which is the output of the SN ratio estimator 82, is output as
a gain coefficient R (ω, l).
[0032]
The gain coefficient R(ω, l) is calculated for each frequency band.
Therefore, in a frequency band where little noise is mixed in, the gain coefficient R(ω, l) takes a
value close to 1, and the desired signal component is output as it is. In a frequency band where
much noise is mixed in, the gain coefficient R(ω, l) takes a value close to 0, and the signal
component in that band is strongly attenuated, which suppresses the noise. In this way,
multiplying the signal YS(ω, l), whose main component is the desired signal supplied from the
adding unit 6, by the gain coefficient R(ω, l) for each frequency band suppresses the noise
component in each band. This improves the SN ratio of the signal converted to the time domain
by the inverse frequency domain conversion unit 10.
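
Equation (20) is not reproduced in this text. The sketch below assumes the estimated SN ratio is the desired-source power divided by the total power of all sources, which is consistent with the statement that the gain lies between 0 and 1. The function names and the small constant `eps` guarding against division by zero are assumptions.

```python
def snr_gain(s_pow, noise_pows, eps=1e-12):
    """Assumed form of equation (20): |S|^2 / (|S|^2 + |NL|^2 + |NC|^2 + |NR|^2)."""
    total = s_pow + sum(noise_pows)
    return s_pow / max(total, eps)

def apply_gain(spectrum, gains):
    """Multiply each frequency component of one frame by its gain coefficient."""
    return [g * y for g, y in zip(gains, spectrum)]
```

With no estimated noise the gain is 1 and the component passes unchanged. With three noise powers each equal to the signal power the gain falls to 0.25.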
[0033]
First Embodiment
FIG. 8 shows an example of the overall configuration of a sound collection device according to a
first embodiment of the present invention. The gain coefficient calculation unit 130 and the
processing target signal generation unit 140 differ from the overall configuration of the sound
collection device of Japanese Patent Application No. 2006-52502 shown in FIG. 2. FIG. 9 is a
diagram showing a processing flow of the sound collection device of the first embodiment.
[0034]
The first and second sound collecting units 4-1 and 4-2 use the output signals of a microphone
array configured by mounting a plurality of microphones to pick up, from mutually different
positions, the sounds ySL(n) and ySR(n) of angular regions including the desired sound source
position (S4-1, S4-2). The third and fourth sound collecting units 4-3 and 4-4 use the output
signals of the microphone array to pick up, from the mutually different positions, the sounds
yNL(n) and yNR(n) of angular regions not including the desired sound source position (S4-3,
S4-4). The fifth sound collecting unit 4-5 collects the sound ySC(n) of the angular region
including the desired sound source position from the midpoint of the mutually different
positions (S4-5). The sixth sound collecting unit 4-6 collects the sound yNC(n) of the angular
region not including the desired sound source position from the midpoint (S4-6). The frequency
domain conversion unit 5 converts the signals ySL(n), ySR(n), yNL(n), yNR(n), ySC(n), yNC(n)
collected by the sound collection units 4-1 to 4-6 into the frequency domain signals YSL(ω, l),
YSR(ω, l), YNL(ω, l), YNR(ω, l), YSC(ω, l), YNC(ω, l). The frequency domain conversion unit 5
may instead be provided in each of the sound collection units 4-1 to 4-6. The processing target
signal generation unit 140 takes the average of the frequency-domain signal YSL(ω, l) from the
first sound collection unit 4-1 and the frequency-domain signal YSR(ω, l) from the second sound
collection unit 4-2 as the processing target signal YS(ω, l) (S140). The power spectrum
estimation unit 7 estimates, for each frequency, the signal amount of the desired sound source
and the signal amounts of the other sound sources Xopt(ω, l) from the frequency-domain
collected signals YSL(ω, l), YSR(ω, l), YNL(ω, l), YNR(ω, l), YSC(ω, l), YNC(ω, l) obtained by
the sound collection units 4-1 to 4-6 (S7). The gain coefficient calculation unit 130 obtains a
gain coefficient R(ω, l) for each frequency from the signal amount of the desired sound source
and the signal amounts of the other sound sources Xopt(ω, l), and the processing target signal
YS(ω, l) (S130). The multiplication unit 9 multiplies the processing target signal YS(ω, l) by
the gain coefficient R(ω, l) calculated by the gain coefficient calculation unit 130 (S9). The
inverse frequency domain transform unit 10 transforms the processing target signal
R(ω, l)YS(ω, l) multiplied by the gain coefficient into the time domain. The inverse frequency
domain transform unit 10 may be provided in the multiplication unit 9.
[0035]
Next, details of components different from the sound collection device of FIG. 2 will be described.
FIG. 10 is a diagram showing an example of a functional configuration of the processing target
signal generation unit 140. The processing target signal generation unit 140 includes an
addition unit 141 and a division unit 142. The addition unit 141 adds the frequency-domain
signal YSL(ω, l) from the first sound collection unit 4-1 and the frequency-domain signal
YSR(ω, l) from the second sound collection unit 4-2. The division unit 142 divides the added
signal by 2 and outputs the average value as the processing target signal YS(ω, l). In the sound
collection device of FIG. 2, the addition unit 6 adds the frequency-domain signal YSL(ω, l) from
the first sound collection unit 4-1 and the frequency-domain signal YSR(ω, l) from the second
sound collection unit 4-2 and uses the sum as the processing target signal YS(ω, l). The only
difference is whether or not the sum is divided by two. This affects only the overall level of the
signal; since the waveforms are the same, the two are equivalent from the viewpoint of signal
processing. That is, dividing by a value other than 2 is also equivalent processing.
[0036]
FIG. 11 shows a functional configuration example of the gain coefficient calculation unit 130.
The gain coefficient calculation unit 130 includes a vector element extraction unit 81, a first gain
calculation unit 131, a second gain calculation unit 132, and a gain multiplication unit 133. As
shown in equation (19), the vector element extraction unit 81 outputs the first component of the
input estimated signal power vector as the estimated signal power |S(ω, l)|^2, the second
component as the estimated left-direction noise power |NL(ω, l)|^2, the third component as the
estimated front-direction noise power |NC(ω, l)|^2, and the fourth component as the estimated
right-direction noise power |NR(ω, l)|^2. From the estimated signal power |S(ω, l)|^2 and the
processing target signal YS(ω, l), the first gain calculation unit 131 calculates and outputs the
first gain coefficient GS(ω, l) by the following equation.
[0037]
The second gain calculation unit 132 calculates the second gain coefficient GSNR(ω, l) by the
following equation, using the estimated signal power |S(ω, l)|^2, the estimated left-direction
noise power |NL(ω, l)|^2, the estimated front-direction noise power |NC(ω, l)|^2, and the
estimated right-direction noise power |NR(ω, l)|^2, and outputs it.
[0038]
Note that if |NL(ω, l)|^2 + |NC(ω, l)|^2 + |NR(ω, l)|^2 is written as |N(ω, l)|^2, the power of
the signal amount from sound sources other than the desired sound source, then equation (22)
can be expressed as the following equation.
[0039]
The gain multiplication unit 133 outputs the product of the first gain coefficient GS(ω, l) and the
second gain coefficient GSNR(ω, l) as the gain coefficient R(ω, l), as expressed by the following
equation.
R(ω, l) = GS(ω, l) · GSNR(ω, l)   (24)
The processing of the other components is the same as in the sound collection device of FIG. 2.
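
The combined gain of equation (24) can be sketched for one frequency component as below. Equations (21) to (23) are not reproduced in this text, so the forms of GS and GSNR used here are assumptions: GS is taken as the ratio that scales YS(ω, l) so that its power equals the estimated signal power, and GSNR as the desired-source power over the total power. The function names, the clamp on negative power estimates, and `eps` are likewise hypothetical.

```python
import math

def processing_target(y_sl, y_sr):
    """Average of the two side-beamformer spectra at one frequency component (S140)."""
    return (y_sl + y_sr) / 2

def first_gain(s_pow, y_s, eps=1e-12):
    """Assumed GS: sqrt(|S|^2) / |YS|, so that |GS * YS|^2 equals |S|^2."""
    return math.sqrt(max(s_pow, 0.0)) / max(abs(y_s), eps)

def second_gain(s_pow, n_pow, eps=1e-12):
    """Assumed GSNR: |S|^2 / (|S|^2 + |N|^2), a value between 0 and 1."""
    return s_pow / max(s_pow + n_pow, eps)

def combined_gain(s_pow, n_pow, y_s):
    """Equation (24): R = GS * GSNR."""
    return first_gain(s_pow, y_s) * second_gain(s_pow, n_pow)
```

For instance, with |S|^2 = 16, |N|^2 = 9, and YS = 3 + 4j, these assumed forms give GS = 0.8 and GSNR = 0.64, so the processing target signal is scaled by their product.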
[0040]
Next, the principle by which the present invention suppresses noise will be described.
The product of the first gain coefficient GS(ω, l) and the processing target signal YS(ω, l) is a
signal whose power spectrum has the same amplitude as the estimated signal power
|S(ω, l)|^2. The estimated signal power |S(ω, l)|^2 is in principle identical to the power of the
desired sound source. Therefore, multiplying the processing target signal YS(ω, l) by the first
gain coefficient GS(ω, l) can be expected to suppress the noise component. In practice,
however, there are various disturbances such as reverberation and microphone sensitivity
errors, and the estimate contains many errors, so sufficient noise suppression characteristics
cannot always be obtained. On the other hand, the gain coefficient output by the gain
coefficient calculation unit 8 of Japanese Patent Application No. 2006-52502 and the second
gain coefficient GSNR(ω, l) also use the estimated noise power in their calculation. Hence, even
when the estimated signal power |S(ω, l)|^2 contains a large amount of noise, the noise
component can be suppressed if the estimated noise power |N(ω, l)|^2 is accurate. However,
since these gain coefficients are normalized to the range of 0 to 1, their suppression is gentle
and the noise suppression effect is not high. As described above, the first gain coefficient on the
one hand, and the gain coefficient of Japanese Patent Application No. 2006-52502 and the
second gain coefficient on the other, each have advantages and disadvantages. The sound
collection device according to the first embodiment multiplies the two gain coefficients and
thereby obtains a gain coefficient that makes use of the advantages of both. Therefore, noise
suppression characteristics can be improved.
[0041]
Second Embodiment
FIG. 12 shows an example of the overall configuration of a sound collection device according to a
second embodiment of the present invention. The present embodiment differs from the first
embodiment (FIG. 8) in the sound collection units 4'-1 to 4'-6, the processing target signal
generation unit 140', the power spectrum estimation unit 7', and the gain coefficient calculation
unit 130'. Hereinafter, components different from those of the first embodiment will be
described. The processing flow of the sound collection device of the second
embodiment is shown in FIG.
[0042]
FIG. 13 is a diagram showing the areas of sound source positions used to describe the settings
of the sound collection units 4'-1 to 4'-6. FIG. 14 is a diagram showing a functional
configuration example of the first sound collection unit 4'-1. The signals xLmL(n)
(mL = 1, 2, ..., ML) of the microphone array 3L are input to the first sound collection unit 4'-1.
Each filter processing unit 41' convolves the input signal xLmL(n) with a predetermined filter
coefficient wLmL(n) (the determination method is described later), as shown in equation (25),
and outputs the signal x'LmL(n).
[0043]
The output signal of each filter processing unit 41 'is input to the addition unit 42'. The adding
unit 42 'adds the input signals according to the following equation to obtain an output signal yLL
(n) of the first sound collecting unit 4'-1.
[0044]
Here, the filter coefficient wLmL(n) is designed, for example by the least squares method, so
that the directivity characteristic DLSB(ω, θ) of the first sound collecting unit 4'-1 has the
characteristic shown in equation (27). Similarly, the third sound collecting unit and the fifth
sound collecting unit are designed to satisfy the conditions of equations (28) and (29). ΘL1 to
ΘL3 each denote an angular region as viewed from the microphone array 3L, as shown in
FIG. 13.
[0045]
That is, the first sound collecting unit 4'-1 suppresses and collects the sound of the angle region
ΘL1 (S4'-1). The third sound collecting unit 4'-3 suppresses and collects the sound of the angle
region ΘL2 (S4'-3). The fifth sound collecting unit 4'-5 suppresses and collects the sound of the
angle region ΘL3 (S4'-5).
[0046]
Similarly, as shown in equations (30) to (32), the second sound collecting unit 4′-2 of the
microphone array 3R collects sound while suppressing the sound from the angle region ΘR1 (S4′-2).
The fourth sound collecting unit 4′-4 collects sound while suppressing the sound from the angle
region ΘR2 (S4′-4). The sixth sound collecting unit 4′-6 collects sound while suppressing the
sound from the angle region ΘR3 (S4′-6).
[0047]
FIG. 15 is a diagram showing an example of a functional configuration of the processing target
signal generation unit 140′. The processing target signal generation unit 140′ includes an
addition unit 141′ and a division unit 142′. The addition unit 141′ adds, in the frequency
domain, the signal YLL(ω, l) from the first sound collecting unit 4′-1, the signal YLR(ω, l) from
the second sound collecting unit 4′-2, the signal YRL(ω, l) from the fifth sound collecting
unit 4′-5, and the signal YRR(ω, l) from the sixth sound collecting unit 4′-6 as in the following
equation, and outputs the addition result Y′S(ω, l).
[0048]
The division unit 142 'divides the added signal Y'S (ω, l) by 4 as in the following equation, and
outputs the average value as the processing target signal YS (ω, l) (S140').
[0049]
YS(ω, l) = Y′S(ω, l) / 4 (34)
As described in the first embodiment, the waveform is the same regardless of the value by which
the dividing unit 142′ divides, so the processing is equivalent from the viewpoint of signal
processing. That is, dividing by a value other than 4 is an equivalent process.
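As a minimal sketch of the addition and division steps, the following adds four frequency-domain signals bin by bin and divides by a configurable value. The function name `processing_target_signal`, the `divisor` parameter, and the two-bin spectra are hypothetical and chosen only for illustration.

```python
def processing_target_signal(y_ll, y_lr, y_rl, y_rr, divisor=4.0):
    """Add the four frequency-domain signals bin by bin, then divide by `divisor`."""
    return [(a + b + c + d) / divisor for a, b, c, d in zip(y_ll, y_lr, y_rl, y_rr)]


# Hypothetical spectra with two frequency bins each.
y_ll = [1 + 1j, 2 + 0j]
y_lr = [1 - 1j, 0 + 2j]
y_rl = [3 + 0j, 2 + 0j]
y_rr = [3 + 0j, 0 - 2j]

ys = processing_target_signal(y_ll, y_lr, y_rl, y_rr)                    # divide by 4
ys_half = processing_target_signal(y_ll, y_lr, y_rl, y_rr, divisor=2.0)  # divide by 2
```

Changing the divisor only rescales every bin by the same constant (here `ys_half` is exactly twice `ys`), which is why the text treats other divisors as equivalent.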
[0050]
FIG. 16 shows an example of a functional configuration of the power spectrum estimation unit 7′.
The power spectrum estimation unit 7′ includes a power calculation unit 61′, a vectorization
unit 62′, a multiplication unit 63′, and a pseudo inverse matrix calculation unit 64′. The power
calculation unit 61′ calculates and outputs the power values |YLL(ω, l)|^2, |YCL(ω, l)|^2,
|YRL(ω, l)|^2, |YLR(ω, l)|^2, |YCR(ω, l)|^2, and |YRR(ω, l)|^2 of the frequency domain signals
YLL(ω, l), YCL(ω, l), YRL(ω, l), YLR(ω, l), YCR(ω, l), and YRR(ω, l) from the respective sound
collection units. The vectorization unit 62′ outputs a power vector Y(ω, l) in which the
power values are grouped in vector format as in equation (35).
[0051]
Then, the power vector Y(ω, l) is input to the multiplication unit 63′. The power estimation
matrix T^+, which is the other input of the multiplication unit 63′, is the output of the pseudo
inverse matrix calculation unit 64′. The gain matrix T defined by equation (36) is input to the
pseudo inverse matrix calculation unit 64′, which outputs the pseudo inverse matrix T^+.
[0052]
Each element of the gain matrix T(ω) is the gain of the directivity characteristic of each of the
sound collection units 4′-1 to 4′-6 in the Θ1, Θ2, and Θ3 directions. The average value of the
directivity characteristic over each direction is used, as shown in the equations up to (39).
[0053]
αx(ω) is the average value of the directivity characteristics of the first sound collecting
unit 4′-1 and the second sound collecting unit 4′-2 at the frequency ω with respect to the
direction of the angle region Θx. βx(ω) is the average value of the directivity characteristics
of the third sound collecting unit 4′-3 and the fourth sound collecting unit 4′-4 at the
frequency ω with respect to the direction of the angle region Θx. γx(ω) is the average value of
the directivity characteristics of the fifth sound collecting unit 4′-5 and the sixth sound
collecting unit 4′-6 at the frequency ω with respect to the direction of the angle region Θx.
Here, x is any one of L1, L2, L3, R1, R2, and R3. The multiplication unit 63′ multiplies the
power vector Y(ω, l) by the pseudo inverse matrix T^+ as shown in equation (40), and outputs the
estimated signal power vector Xopt(ω, l) (S7′).
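The estimation Xopt = T^+ Y performed by the power calculation, vectorization, pseudo inverse, and multiplication units can be sketched with NumPy as follows. This is a sketch under stated assumptions: the 3 × 3 gain matrix, the source powers, and the function name `estimate_source_powers` are hypothetical (the embodiment itself uses six sound collection units), and `numpy.linalg.pinv` stands in for the pseudo inverse matrix calculation unit.

```python
import numpy as np


def estimate_source_powers(T, spectra):
    """Return Xopt = T^+ Y, where Y stacks the power |Y_i|^2 of each sound collection unit."""
    Y = np.abs(np.asarray(spectra)) ** 2   # power vector for one (omega, l) bin
    T_pinv = np.linalg.pinv(T)             # pseudo inverse of the gain matrix
    return T_pinv @ Y


# Hypothetical gain matrix: rows are sound collection units, columns are source regions.
T = np.array([[1.0, 0.2, 0.1],
              [0.2, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
true_powers = np.array([4.0, 1.0, 0.25])

# Unit outputs whose powers equal T @ true_powers (the phases are arbitrary).
spectra = np.sqrt(T @ true_powers) * np.exp(1j * np.array([0.3, 1.1, 2.0]))
x_opt = estimate_source_powers(T, spectra)
```

With noise-free, consistent inputs as in this toy case, the estimate recovers `true_powers`; with measured data the components are only estimates and, as discussed in the modification below, can come out negative.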
[0054]
Xopt(ω, l) = T^+ Y(ω, l) (40)
FIG. 17 shows a functional configuration example of the gain coefficient calculation unit 130′.
The gain coefficient calculation unit 130′ includes a vector element extraction unit 81′, a first
gain calculation unit 131, a second gain calculation unit 132′, and a gain multiplication
unit 133. The vector element extraction unit 81′ separates the input estimated signal power
vector Xopt(ω, l) into the estimated signal power |S(ω, l)|^2, the estimated left side noise
power |NLL(ω, l)|^2, the estimated left direction noise power |NL(ω, l)|^2, the estimated front
direction noise power |NC(ω, l)|^2, the estimated right direction noise power |NR(ω, l)|^2, and
the estimated right side noise power |NRR(ω, l)|^2, and outputs them. The first gain calculation
unit 131 calculates the first gain coefficient GS(ω, l) from the estimated signal power
|S(ω, l)|^2 and the processing target signal YS(ω, l) as in the following equation, and outputs
it.
[0055]
The second gain calculation unit 132′ calculates the second gain coefficient GSNR(ω, l) according
to the following equation, based on the estimated signal power |S(ω, l)|^2, the estimated left
side noise power |NLL(ω, l)|^2, the estimated left direction noise power |NL(ω, l)|^2, the
estimated front direction noise power |NC(ω, l)|^2, the estimated right direction noise power
|NR(ω, l)|^2, and the estimated right side noise power |NRR(ω, l)|^2, and outputs it.
[0056]
Note that if |NLL(ω, l)|^2 + |NL(ω, l)|^2 + |NC(ω, l)|^2 + |NR(ω, l)|^2 + |NRR(ω, l)|^2 is
written as the power |N(ω, l)|^2 of the signal amount from sound sources other than the desired
sound source, equation (42) can be expressed as the following equation.
[0057]
The gain multiplication unit 133 outputs the product of the first gain coefficient GS(ω, l) and
the second gain coefficient GSNR(ω, l) as the gain coefficient R(ω, l), as in the following
equation (S130′).
[0058]
R(ω, l) = GS(ω, l) GSNR(ω, l) (44)
The processing of the other components is the same as in the sound collection device of the
first embodiment.
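The combination of the two gain coefficients can be sketched as below. Only the summation of the five noise powers into |N|^2 and the product R = GS · GSNR are taken from the text. The Wiener-type ratio |S|^2 / (|S|^2 + |N|^2) used for GSNR is an assumption made for illustration, the first gain GS is simply passed in as a number, and the function name `combined_gain` is hypothetical.

```python
def combined_gain(first_gain, s_power, noise_powers):
    """Return (R, GSNR) for one (omega, l) bin, with R = GS * GSNR."""
    n_power = sum(noise_powers)   # |N|^2: total power of sources other than the desired one
    total = s_power + n_power
    # ASSUMPTION: Wiener-type ratio; the specification's own equations may differ.
    g_snr = s_power / total if total > 0.0 else 0.0
    return first_gain * g_snr, g_snr


# Hypothetical values for a single bin.
noise_powers = [0.25, 0.25, 0.25, 0.125, 0.125]   # NLL, NL, NC, NR, NRR
r, g_snr = combined_gain(first_gain=0.8, s_power=3.0, noise_powers=noise_powers)
```

Because GSNR never exceeds 1 under this assumption, the product R is never larger than GS, so multiplying the two gains can only strengthen the suppression in bins where noise dominates.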
[0059]
With the above-described configuration, the sound collection device of the second embodiment
can improve the noise suppression characteristic as in the first embodiment.
[0060]
[Modification] FIG. 18 shows another configuration example (modification) of the power
spectrum estimation unit of the second embodiment (FIG. 12).
The power spectrum estimation unit 7 ′ ′ includes a power calculation unit 61 ′, a
vectorization unit 62 ′, and a non-negative constrained least squares unit 63 ′ ′.
The power calculation unit 61 'and the vectorization unit 62' are the same as the power spectrum
estimation unit (FIG. 16) of the second embodiment.
The non-negative constrained least squares unit 63′′ receives the power vector Y(ω, l) and the
gain matrix T. Under the constraint that the estimated signal power vector Xopt(ω, l) is
non-negative, as shown in equation (46), it determines and outputs the estimated signal power
vector Xopt(ω, l) that minimizes the squared error between Y(ω, l) and T·Xopt(ω, l), as shown in
equation (45).
[0061]
minimize ‖Y(ω, l) − T·Xopt(ω, l)‖^2 (45)
subject to Xopt(ω, l) ≥ 0 (46)
As a method of calculating this solution, for example, the Non-negative Least Squares method
described in C. L. Lawson and R. J. Hanson, Solving Least Squares Problems, Prentice-Hall, 1974,
can be used. Each component of Xopt(ω, l) should be non-negative because it is a signal power,
but in the processing of Japanese Patent Application No. 2006-52502, the first embodiment, and
the second embodiment, a component may take a negative value that cannot occur in reality. The
inclusion of such components degrades the noise suppression performance. In the processing of
the present modification, each component of the estimated signal power vector Xopt(ω, l) always
has a non-negative value, so that the noise suppression characteristics can be improved.
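The effect of the non-negativity constraint can be illustrated with a small numerical sketch. Note that the solver below is a plain projected gradient iteration, swapped in to keep the example short; it is not the Lawson-Hanson active-set algorithm cited above. The matrix, the power vector, and the function name `nnls_projected_gradient` are hypothetical.

```python
import numpy as np


def nnls_projected_gradient(T, Y, iterations=2000):
    """Minimise ||Y - T x||^2 subject to x >= 0 by projected gradient descent."""
    step = 1.0 / np.linalg.norm(T, 2) ** 2          # 1 / largest eigenvalue of T^T T
    x = np.zeros(T.shape[1])
    for _ in range(iterations):
        gradient = T.T @ (T @ x - Y)
        x = np.maximum(x - step * gradient, 0.0)    # project onto the non-negative orthant
    return x


# Hypothetical gain matrix and power vector for one (omega, l) bin.
T = np.array([[1.0, 0.3, 0.1],
              [0.3, 1.0, 0.3],
              [0.1, 0.3, 1.0]])
Y = np.array([1.0, 0.2, 0.5])

x_pinv = np.linalg.pinv(T) @ Y          # unconstrained estimate; one component is negative
x_nnls = nnls_projected_gradient(T, Y)  # constrained estimate; all components are >= 0
```

For this input the pseudo inverse solution contains a negative power, while the constrained solution sets that component to zero and refits the remaining ones. For practical use, SciPy's `scipy.optimize.nnls` solves the same constrained problem.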
[0062]
Third Embodiment FIG. 19 shows an example of the overall configuration of a sound collection
device according to a third embodiment of the present invention. The power spectrum estimation
unit 110 and the reverberation spectrum estimation unit 120 are different from the second
embodiment (FIG. 12). Further, FIG. 20 shows an example of the processing flow of the entire
sound collection device of the third embodiment. The third embodiment differs from the first
and second embodiments in that the reverberation spectrum is estimated from the estimation
result of the power spectrum and fed back (subtracted). Hereinafter, components different from
those of the second embodiment will be described.
[0063]
FIG. 21 shows an example of a functional configuration of the power spectrum estimation unit
110. The power spectrum estimation unit 110 includes a power calculation unit 61 ′, a
vectorization unit 62 ′, a subtraction unit 111, a multiplication unit 63 ′, and a pseudo inverse
matrix calculation unit 64 ′. The power calculation unit 61 'and the vectorization unit 62' are
the same as the power spectrum estimation unit 7 '(FIG. 16) of the second embodiment. The
vectorization unit 62 'outputs a power vector Y (ω, l) in which the power values are grouped in
vector format as in equation (35).
[0064]
The subtraction unit 111 subtracts the estimated signal amount Z*est(ω, l) of the reverberation
from the vectorized signal Y(ω, l) as in the following equation, and outputs the result
Y′(ω, l) to the multiplication unit 63′.
[0065]
Y′(ω, l) = Y(ω, l) − Z*est(ω, l) (47)
The multiplication unit 63′ and the pseudo inverse matrix calculation unit 64′ are also the same
as those of the power spectrum estimation unit 7′ (FIG. 16) of the second embodiment. The gain
matrix T defined by equation (36) is input to the pseudo inverse matrix calculation unit 64′,
which outputs the pseudo inverse matrix T^+. The multiplication unit 63′ multiplies the signal
Y′(ω, l), from which the reverberation has been subtracted, by the pseudo inverse matrix T^+ as
shown in equation (48), and outputs the estimated signal power vector Xopt(ω, l).
[0066]
Xopt(ω, l) = T^+ Y′(ω, l) (48)
FIG. 22 shows a functional configuration example of the reverberation spectrum estimation
unit 120. The reverberation spectrum estimation unit 120 includes a gain matrix multiplication
unit 125 and a weighted addition unit 126. The gain matrix multiplication unit 125 converts the
signal amount of the desired sound source and the signal amounts of the other sound sources,
Xopt(ω, l), into the signal amount Zest(ω, l) for each sound collection unit. The gain matrix T′
used for this conversion represents the gain of the directivity of each sound collection unit
with respect to the reverberation component, and may be given, for example, by the following
equation.
[0067]
where
[0068]
The weighted addition unit 126 records the signal amount Zest(ω, l) for each sound collection
unit, and performs weighted addition of the signal amounts for each sound collection unit over a
plurality of past frames. Specifically, to perform weighted addition of the signal amounts
Zest(ω, l) for each sound collection unit over the past N frames, N delay units 121_1 to 121_N,
N weight multiplication units 122_1 to 122_N, and N−1 addition units 123_1 to 123_N−1 may be
provided. The first delay unit 121_1 records the signal amount Zest(ω, l) for each sound
collection unit and delays it by one frame. The first weight multiplication unit 122_1
multiplies the output of the first delay unit 121_1 (the signal amount Zest(ω, l) for each sound
collection unit one frame before) by the weight ρ1. The n-th delay unit 121_n records the signal
amount Zest(ω, l) for each sound collection unit of n−1 frames before, and delays it by one
frame. The n-th weight multiplication unit 122_n multiplies the output of the n-th delay
unit 121_n (the signal amount Zest(ω, l) for each sound collection unit n frames before) by the
weight ρn. The n-th addition unit 123_n adds the output of the n-th weight multiplication
unit 122_n to the output of the (n+1)-th addition unit 123_n+1. The first addition unit 123_1
adds the output of the first weight multiplication unit 122_1 to the output of the second
addition unit 123_2 and outputs the signal amount Z*est(ω, l) of the reverberation. With this
processing, a weighted addition is performed in which the signal amount Zest(ω, l) for each
sound collection unit n frames before is multiplied by the weight ρn. Here, the weight ρn is a
parameter representing the power attenuation of the reverberation component over time, and is
given, for example, from the reverberation time T60 by the following equation.
[0069]
Here, LS is the number of samples in one frame, and FS is the sampling frequency.
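The reverberation feedback loop described above (conversion by T′, weighted addition over past frames, subtraction, and re-estimation) can be sketched as follows. The weight formula in `reverberation_weights` is an assumption based on the usual definition of T60 (power decays by 60 dB over T60 seconds, with one frame lasting LS / FS seconds) and is not copied from the specification. All function names and numeric values are hypothetical.

```python
import numpy as np


def reverberation_weights(num_frames, t60, frame_len, fs):
    """ASSUMED decay model: rho_n = 10 ** (-6 * n * frame_len / (fs * t60)), n = 1..num_frames."""
    n = np.arange(1, num_frames + 1)
    return 10.0 ** (-6.0 * n * frame_len / (fs * t60))


def unit_signal_amounts(T_prime, x_opt):
    """Zest = T' Xopt: convert source powers into per-unit signal amounts."""
    return T_prime @ x_opt


def estimate_reverberation(z_history, weights):
    """Z*est(l) = sum_n rho_n * Zest(l - n); z_history[0] is the most recent past frame."""
    z_star = np.zeros_like(z_history[0], dtype=float)
    for rho, z_past in zip(weights, z_history):
        z_star += rho * z_past
    return z_star


def dereverberated_estimate(T, Y, z_star):
    """Xopt = T^+ (Y - Z*est): subtract the reverberation, then apply the pseudo inverse."""
    return np.linalg.pinv(T) @ (Y - z_star)


# Hypothetical settings: T60 = 0.25 s, 400-sample frames at 16 kHz, three past frames.
weights = reverberation_weights(3, t60=0.25, frame_len=400, fs=16000)
z_history = [np.array([1.0, 2.0]), np.array([0.5, 0.0]), np.array([0.0, 1.0])]
z_star = estimate_reverberation(z_history, weights)
x_opt = dereverberated_estimate(np.eye(2), np.array([2.0, 3.0]), z_star)
```

In a running system each new Xopt would be converted with `unit_signal_amounts` and pushed to the front of `z_history`, so that the next frame's subtraction uses it with weight ρ1.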
[0070]
The processing of the other components is the same as in the sound collection device of the
second embodiment.
Therefore, the sound collection device of the third embodiment can also improve the noise
suppression characteristic, as in the first and second embodiments. Furthermore, the sound
collection device of the third embodiment provides the following effect. FIG. 23 shows a model
of noise generation. FIG. 24 shows the influence of reverberation on the power spectrum in each
frame. Relative to the direct sound emitted at a certain time 0 (considered here in units of
time frames), the reverberation reaches the microphone delayed by a time corresponding to the
length of its transmission path, with its magnitude reduced at a constant attenuation rate. For
example, in the example shown in FIG. 23, the same sound as the direct sound emitted at time 0
affects the frames at times 1 to 3 as reverberation. For this reason, as shown in FIG. 24, the
direct sound components contained in past frames are superimposed as reverberation on the
estimated power spectrum in a certain frame l. The attenuation rate at this time corresponds to
the weight ρn of the reverberation spectrum estimation unit 120. The weight ρn is determined by
the acoustic characteristics of the room and can be calculated theoretically according to
equation (56) using, for example, the reverberation time T60, which is one measure of the
acoustic characteristics of the room. In the sound collection device of the present invention,
the past direct sound components are obtained as the past signal amounts Zest(ω, l) for each
sound collection unit. That is, the gain matrix multiplication unit 125 converts the estimated
signal power vector Xopt(ω, l) into the signal amount Zest(ω, l) for each sound collection unit,
and the weighted addition unit 126 records the signal amount Zest(ω, l) for each sound
collection unit and performs weighted addition of the signal amounts for each sound collection
unit over a plurality of past frames. The signal amount Z*est(ω, l) of the reverberation is
thereby determined, and the power spectrum estimation unit 110 subtracts the estimated signal
amount Z*est(ω, l) of the reverberation from the vectorized signal Y(ω, l). Therefore, the sound
collection device of the third embodiment can also reduce the influence of reverberation.
[0071]
EXPERIMENTAL EXAMPLE Next, experimental results for the sound collection device of the third
embodiment are shown. FIG. 25 is a diagram showing the experimental environment. In each
microphone array, four microphones are arranged linearly at equal intervals of 4 cm. The unit
of the coordinates is meters, and the centers of the microphone arrays are located at (0.4, 0)
and (−0.4, 0). The desired sound source (the position of the target speaker) is at (0, 0.5).
Three different background noise sources (positions of other speakers) are placed at
(−1.6, 2.5), (1.6, 1.0), and (0.0, 2.5).
[0072]
FIG. 26 shows examples of the spectral shapes of the desired signal and the noise signal
contained in an input signal with a high signal-to-noise ratio, together with the first gain
coefficient GS(ω, l) and the gain coefficient R(ω, l) obtained by the sound collection device of
the third embodiment. FIG. 27 shows the same quantities for an input signal with a low
signal-to-noise ratio. FIGS. 26A and 27A show the spectral shapes of the desired signal and the
noise signal contained in the input signal. FIGS. 26B and 27B show the first gain coefficient
GS(ω, l) obtained by the sound collection device of the third embodiment. FIGS. 26C and 27C show
the gain coefficient R(ω, l) obtained by the sound collection device of the third embodiment. In
the signal of FIG. 26A, the noise signal is dominant over the desired signal at frequencies near
2000 Hz and 4000 Hz (the frequencies indicated by the dotted lines in the figure). That is, it
is desirable that the gain coefficient to be multiplied be close to 0 near 2000 Hz and 4000 Hz.
In the first gain coefficient GS(ω, l) of FIG. 26B, the coefficient is large even at these
frequencies, whereas in the gain coefficient R(ω, l) of FIG. 26C, the coefficient at these
frequencies is small. This shows that the gain coefficient formed by multiplying the plurality
of gain coefficients obtained by the present invention has an excellent noise suppression
effect. Similarly, in FIG. 27A, the noise signal is dominant over the entire band, so it is
desirable that the gain coefficient to be multiplied be close to 0 over the entire band.
FIGS. 27B and 27C show that the gain coefficient according to the present invention has fewer
bands with large coefficient values, and thus a higher noise suppression effect.
[0073]
FIG. 28 shows the results of measuring the amount of background noise suppression in two
experimental environments with different reverberation intensities. Experimental environment 1
has a reverberation time of 250 ms (reverberation similar to that of a general bedroom), and
experimental environment 2 has a reverberation time of 500 ms (reverberation similar to that of
a general conference room). From these results, it can be seen that the sound collection device
of the present invention has better noise suppression performance than the sound collection
device of Japanese Patent Application No. 2006-52502.
[0074]
FIG. 29 shows an example of the functional configuration of a computer. The sound collection
device of the present invention can be realized by loading, into the recording unit 2020 of the
computer 2000, a program that causes the computer 2000 to operate as each component of the
present invention, and by operating the processing unit 2010, the input unit 2030, the output
unit 2040, and the like. Methods of loading the program into the computer include recording the
program on a computer-readable recording medium and reading it into the computer from that
medium, and reading a program recorded on a server or the like into the computer through a
telecommunication line or the like.
[0075]
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a diagram showing an example of a usage situation of the present invention.
FIG. 2 is a diagram showing the overall configuration of the sound collection device of Japanese Patent Application No. 2006-52502.
FIG. 3 is a plan view for describing the directivity of the first to sixth sound collection units 4-1 to 4-6.
FIG. 4 is a block diagram for describing the configuration of the first to fourth sound collection units 4-1 to 4-4.
FIG. 5 is a diagram showing the configuration of the fifth sound collection unit 4-5 and the sixth sound collection unit 4-6.
FIG. 6 is a diagram showing the configuration of the power spectrum estimation unit 7.
FIG. 7 is a diagram showing the configuration of the gain coefficient calculation unit 8.
FIG. 8 is a diagram showing an example of the overall configuration of the sound collection device of the first embodiment.
FIG. 9 is a diagram showing the processing flow of the sound collection devices of the first and second embodiments.
FIG. 10 is a diagram showing an example of the functional configuration of the processing target signal generation unit 140.
FIG. 11 is a diagram showing an example of the functional configuration of the gain coefficient calculation unit 130.
FIG. 12 is a diagram showing an example of the overall configuration of the sound collection device of the second embodiment.
FIG. 13 is a diagram showing the areas of sound source positions for describing the settings of the sound collection units 4′-1 to 4′-6.
FIG. 14 is a diagram showing an example of the functional configuration of the first sound collection unit 4′-1.
FIG. 15 is a diagram showing an example of the functional configuration of the processing target signal generation unit 140′.
FIG. 16 is a diagram showing an example of the functional configuration of the power spectrum estimation unit 7′.
FIG. 17 is a diagram showing an example of the functional configuration of the gain coefficient calculation unit 130′.
FIG. 18 is a diagram showing a modified configuration example of the power spectrum estimation unit of the second embodiment.
FIG. 19 is a diagram showing an example of the overall configuration of the sound collection device of the third embodiment.
FIG. 20 is a diagram showing an example of the processing flow of the entire sound collection device of the third embodiment.
FIG. 21 is a diagram showing an example of the functional configuration of the power spectrum estimation unit 110.
FIG. 22 is a diagram showing an example of the functional configuration of the reverberation spectrum estimation unit 120.
FIG. 23 is a diagram showing a model of noise generation.
FIG. 24 is a diagram showing the influence of reverberation on the power spectrum in each frame.
FIG. 25 is a diagram showing the experimental environment.
FIG. 26 is a diagram showing examples of the spectral shapes of a desired signal and a noise signal included in an input signal having a high signal-to-noise ratio, and of the first gain coefficient GS(ω, l) and the gain coefficient R(ω, l) obtained by the sound collection device of the third embodiment.
FIG. 27 is a diagram showing examples of the spectral shapes of a desired signal and a noise signal included in an input signal having a low signal-to-noise ratio, and of the first gain coefficient GS(ω, l) and the gain coefficient R(ω, l) obtained by the sound collection device of the third embodiment.
FIG. 28 is a diagram showing the results of measuring the amount of background noise suppression in two experimental environments with different reverberation intensities.
FIG. 29 is a diagram showing an example of the functional configuration of a computer.