Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2013061421
PROBLEM TO BE SOLVED: To improve the accuracy of the adaptive update of Wiener filter coefficients, without burdening the user, by applying coherence to background noise detection.
SOLUTION: The invention relates to an audio signal processing apparatus that uses Wiener filter technology. The input speech signal is subjected to delay-subtraction processing to form first and second directivity signals having blind spots in first and second predetermined azimuths, and coherence is obtained from these two directivity signals. Based on the coherence, it is determined whether the input speech signal is a target speech section arriving from the target azimuth or a non-target speech section. In addition, the difference between the instantaneous coherence value and the coherence long-term average value is obtained and compared with a threshold; the non-target sections are thereby divided into background noise sections and other sections, and the adaptation processing of the Wiener filter coefficients is switched accordingly. The input speech signal is then multiplied by the Wiener filter coefficients after this adaptive processing.
[Selected figure] Figure 1
Audio signal processing apparatus, method and program
[0001]
The present invention relates to an audio signal processing apparatus, method, and program, and can be applied, for example, to communication devices and communication software that handle audio signals, such as telephones and video conference systems.
[0002]
One noise suppression technique is the so-called voice switch.
Using a target voice section detection function, it detects sections in which the speaker is talking (target voice sections) from the input signal, passes the signal through unprocessed in target voice sections, and attenuates the amplitude in non-target voice sections.
For example, as shown in FIG. 11, when the input signal "input" is received, it is determined whether or not the current section is a target voice section (step S100). If it is a target voice section, the gain VS_GAIN is set to 1.0 (step S101); if it is a non-target voice section, VS_GAIN is set to an arbitrary positive number α less than 1.0 (step S102). The input signal is then multiplied by the gain VS_GAIN to obtain the output signal "output" (step S103).
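As an illustration only, the gain selection of steps S100 to S103 can be sketched as follows in Python; the target voice section decision itself and the value of α are placeholders, since they are not specified at this point.

```python
import numpy as np

ALPHA = 0.3  # arbitrary positive attenuation factor alpha < 1.0 (placeholder value)

def voice_switch(frame: np.ndarray, is_target_section: bool) -> np.ndarray:
    """Apply the voice switch gain to one frame of the input signal.

    Steps S100 to S103: the gain VS_GAIN is 1.0 in a target voice section
    and alpha in a non-target voice section, and the input is multiplied
    by VS_GAIN to obtain the output.
    """
    vs_gain = 1.0 if is_target_section else ALPHA
    return vs_gain * frame
```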
[0003]
Another noise suppression technique is a technique called a Wiener filter (see Patent Document
1). As shown in FIG. 12, this detects the noise section from the input signal input (step S150),
estimates the background noise characteristic for each frequency, and calculates the Wiener
filter coefficient according to the background noise characteristic (step S151) The technique is to
suppress the background noise component included in the input signal input by multiplying the
input signal input by the Wiener filter coefficient WF̲COEF (f) (step S153). Note that the
equation of Equation 1 of Patent Document 1 can be applied to the method of estimating
noise characteristics, and the equation of Equation 3 of Patent Document 1 can be applied to
the method of calculating filter coefficients.
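As a rough sketch of this flow (Equations 1 and 3 of Patent Document 1 are not reproduced in this text, so a common recursive noise estimate and a standard Wiener-type gain are used here purely as stand-ins):

```python
import numpy as np

LAMBDA = 0.1  # smoothing constant for the noise estimate (placeholder value)

def update_noise_estimate(noise_psd: np.ndarray, x_f: np.ndarray) -> np.ndarray:
    """Per-frequency background noise power estimate, updated recursively
    (stand-in for Equation 1 of Patent Document 1)."""
    return LAMBDA * np.abs(x_f) ** 2 + (1.0 - LAMBDA) * noise_psd

def wiener_coefficients(noise_psd: np.ndarray, x_f: np.ndarray) -> np.ndarray:
    """Wiener filter coefficients WF_COEF(f) per frequency bin
    (stand-in for Equation 3 of Patent Document 1)."""
    return np.maximum(0.0, 1.0 - noise_psd / (np.abs(x_f) ** 2 + 1e-12))

def apply_wiener(x_f: np.ndarray, wf_coef: np.ndarray) -> np.ndarray:
    """Step S153: multiply the input spectrum by WF_COEF(f)."""
    return wf_coef * x_f
```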
[0004]
By applying the voice switch or the Wiener filter to an audio signal processing apparatus such as a video conference system or a mobile telephone, noise can be suppressed and speech quality can be improved.
[0005]
To apply the voice switch and the Wiener filter, non-target voice sections must be detected, that is, sections of "disturbing voice" (human voices other than the speaker) and sections of "background noise" such as office noise or road noise. One detection method is based on a feature quantity called coherence.
Simply stated, coherence is a feature that reflects the direction of arrival of the input signal. Assuming the device is used as a mobile phone or the like, the speaker's voice (target voice) arrives from the front, whereas among the non-target sounds the disturbing voice arrives from directions other than the front and the background noise has no clear direction of arrival. By focusing on the direction of arrival, the target voice can therefore be distinguished from the non-target sounds.
[0006]
FIG. 13 is a block diagram of a conventional audio signal processing apparatus that uses a voice switch and a Wiener filter in combination, with coherence used for the target voice detection function.
[0007]
Input signals s1(t) and s2(t) are acquired from the pair of microphones m_1 and m_2 via an AD converter (not shown), and the FFT (Fast Fourier Transform) unit 10 converts them into frequency domain signals X1(f) and X2(f).
The first directivity forming unit 11 performs the operation of equation (1) to obtain a signal B1(f) having strong directivity in the right direction, and the second directivity forming unit 12 performs the operation of equation (2) to obtain a signal B2(f) having strong directivity in the left direction. The signals B1(f) and B2(f) are complex-valued.
[0008]
The meaning of these equations will be described with reference to FIG. 14 and FIG. 15, taking equation (1) as an example. Assume that a sound wave arrives from the direction θ shown in FIG. 14(A) and is captured by the pair of microphones m_1 and m_2 set apart by a distance l. A time difference then arises before the sound wave reaches each of the microphones m_1 and m_2. Since the path difference of the sound is d = l × sin θ, this arrival time difference τ is given by equation (3), where c is the speed of sound.
[0009]
τ = l × sin θ / c (3)
The signal s1(t−τ) obtained by delaying the input signal s1(t) by τ is the same signal as the input signal s2(t). Therefore, the signal y(t) = s2(t) − s1(t−τ), obtained by taking the difference between the two, is a signal from which the sound arriving from the θ direction has been removed. As a result, the microphone array m_1, m_2 has the directivity characteristic shown in FIG. 14(B).
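A minimal time-domain sketch of this delay-and-subtract operation, assuming an integer-sample delay for simplicity (a fractional-delay filter would be used in practice):

```python
import numpy as np

SOUND_SPEED = 340.0  # speed of sound c in m/s (assumed value)

def delay_subtract(s1: np.ndarray, s2: np.ndarray, mic_distance: float,
                   theta_deg: float, fs: float) -> np.ndarray:
    """Form a signal with a blind spot in the direction theta:
    y(t) = s2(t) - s1(t - tau), with tau = l * sin(theta) / c (equation (3))."""
    tau = mic_distance * np.sin(np.deg2rad(theta_deg)) / SOUND_SPEED
    delay = int(round(tau * fs))     # delay in samples (integer approximation)
    s1_delayed = np.roll(s1, delay)  # crude delay; wraps at the edges of the frame
    return s2 - s1_delayed
```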
[0010]
Although the calculation above is described in the time domain, the same holds when it is performed in the frequency domain; the equations in that case are equations (1) and (2) above. As an example, assume that the arrival directions θ are ±90 degrees. Then the directivity signal B1(f) from the first directivity forming unit 11 has strong directivity in the right direction as shown in FIG. 15(A), and the directivity signal B2(f) from the second directivity forming unit 12 has strong directivity in the left direction as shown in FIG. 15(B).
[0011]
The coherence COH is obtained by applying operations such as equations (4) and (5) in the coherence calculation unit 13 to the directivity signals B1(f) and B2(f) obtained as described above. In equation (4), B2(f)* is the complex conjugate of B2(f).
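Equations (4) and (5) themselves are not reproduced in this text. The following sketch uses one possible formulation consistent with the description in [0013], a normalised cross-correlation per frequency averaged over all bins, purely as an illustration:

```python
import numpy as np

def coherence(b1_f: np.ndarray, b2_f: np.ndarray):
    """Per-frequency value coef(f) and its average COH (cf. equations (4) and (5)).

    B2(f)* is the complex conjugate of B2(f); the normalisation used here is
    an assumption, chosen so that coef(f) is close to 1.0 when B1 and B2 are
    nearly equal (frontal arrival) and small otherwise.
    """
    eps = 1e-12
    coef = np.real(b1_f * np.conj(b2_f)) / (
        0.5 * (np.abs(b1_f) ** 2 + np.abs(b2_f) ** 2) + eps)
    coh = float(np.mean(coef))  # average over all frequency components
    return coef, coh
```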
[0012]
The target voice section detection unit 14 compares the coherence COH with the target voice section determination threshold Θ, and determines a target voice section if COH is larger than the threshold Θ, and a non-target voice section otherwise.
[0013]
Here, the background for detecting the target voice section based on the magnitude of the coherence will be briefly described.
The concept of coherence can be restated as the correlation between the signal arriving from the right and the signal arriving from the left (equation (4) above calculates this correlation for one frequency component, and equation (5) averages the correlation values over all frequency components). A small coherence COH therefore corresponds to a small correlation between the two directivity signals B1 and B2, and conversely a large COH corresponds to a large correlation. The correlation is small when the arrival direction of the input is strongly biased to either the right or the left, or when the input has little definite regularity, such as noise, even without such a bias. A section in which the coherence COH is small can therefore be regarded as a disturbing voice section or a background noise section (a non-target voice section). Conversely, when the coherence COH is large, the input signal can be regarded as arriving from the front, with no bias in the arrival direction. Since the target voice is assumed to come from the front, a large coherence COH indicates a target voice section.
[0014]
The gain control unit 15 sets the gain VS_GAIN to 1.0 in target voice sections, and to an arbitrary positive value α less than 1.0 in non-target voice sections (disturbing voice, background noise).
[0015]
Further, the WF adaptation unit 16 refers to the determination result of the target voice section detection unit 14 and adapts the Wiener filter coefficients in non-target voice sections, while stopping the adaptation otherwise, thereby obtaining the Wiener filter coefficient WF_COEF(f).
The Wiener filter coefficient WF_COEF(f) is sent to the WF coefficient multiplication unit 17 and multiplied by the FFT-transformed signal X1(f) of the input signal s1(t), as shown in equation (6). A signal P(f) in which the background noise characteristic is suppressed is thereby obtained from the input signal.
[0016]
P(f) = WF_COEF(f) × X1(f) (6)
The background-noise-suppressed signal P(f) is converted to a time domain signal q(t) by an IFFT (Inverse Fast Fourier Transform) unit 18, and the VS gain multiplication unit 19 then multiplies it by the gain VS_GAIN set by the gain control unit 15, as shown in equation (7), to obtain the output signal y(t).
[0017]
y(t) = VS_GAIN × q(t) (7)
As described above, using the voice switch and the Wiener filter in combination makes it possible to obtain both the suppression of non-target voice sections by the voice switch and the suppression of the noise component superimposed on target voice sections by the Wiener filter, so a higher noise suppression effect is obtained than when either technique is used alone.
[0018]
Here, the background for using coherence as the feature quantity for distinguishing target voice sections from non-target voice sections is added.
Ordinary target voice section detection uses the fluctuation of the input signal level as the detection feature, but this cannot distinguish disturbing voice from target voice, so the disturbing voice cannot be suppressed by the voice switch and the suppression effect is inadequate.
Detection based on coherence, on the other hand, distinguishes sections according to the arrival direction of the input signal, so the target voice and a disturbing voice arriving from a different direction can be told apart and the suppression effect of the voice switch is obtained.
[0019]
Patent Document 1: JP-A-2010-532897
[0020]
However, although the voice switch and the Wiener filter are both "noise suppression technologies", the noise sections that must be detected for their optimal operation differ.
The voice switch only needs to detect sections in which disturbing voice, background noise, or both are superimposed, whereas the Wiener filter must detect, within the non-target voice sections, the sections containing only background noise. The reason is that if the coefficients are adapted in a disturbing voice section, the "voice" characteristics of the disturbing voice are also reflected in the Wiener filter coefficients as noise, so characteristic voice components are suppressed even in the target voice and the sound quality is degraded.
[0021]
As described above, when the voice switch and the Wiener filter are used together, each requires its own optimal section to be detected, yet the prior art applies a single uniform criterion to both. The result is the problem that Wiener filter coefficients in which the characteristics of the disturbing voice are reflected are applied, and the target voice is degraded.
[0022]
To solve this problem, several target voice section detection techniques could be used in combination so that the sections suited to the voice switch and to the Wiener filter are each detected; however, this requires the adjustment of a plurality of parameters with different behaviors, which increases the burden on the device user.
[0023]
Therefore, there is a need for an audio signal processing apparatus, method, and program that apply coherence to background noise detection and thereby improve the sound quality by enhancing the accuracy of the adaptive update of the Wiener filter coefficients, without burdening the user.
[0024]
According to a first aspect of the present invention, an audio signal processing apparatus for suppressing noise components in an input audio signal comprises: (1) a first directivity forming unit that applies delay-subtraction processing to the input audio signal to form a first directivity signal having a directivity characteristic with a blind spot in a first predetermined azimuth; (2) a second directivity forming unit that applies delay-subtraction processing to the input audio signal to form a second directivity signal having a directivity characteristic with a blind spot in a second predetermined azimuth different from the first; (3) a coherence calculation unit that obtains the coherence using the first and second directivity signals; (4) a target voice section detection unit that determines, based on the coherence, whether the input audio signal is a section of target voice arriving from the target azimuth or a non-target voice section; (5) a coherence behavior information calculation unit that obtains difference information between the coherence and the average value of the coherence; (6) a WF adaptation unit that compares the difference information with a background noise detection threshold, divides the non-target voice sections into background noise sections, for which the difference information is smaller than the threshold, and the other, non-background-noise sections, and switches the adaptation processing of the Wiener filter coefficients according to whether the section is a background noise section or a non-background-noise section; and (7) a WF coefficient multiplication unit that multiplies the input audio signal by the Wiener filter coefficients from the WF adaptation unit.
[0025]
According to a second aspect of the present invention, in an audio signal processing method for suppressing noise components in an input audio signal: (1) a first directivity forming unit applies delay-subtraction processing to the input audio signal to form a first directivity signal having a directivity characteristic with a blind spot in a first predetermined azimuth; (2) a second directivity forming unit applies delay-subtraction processing to the input audio signal to form a second directivity signal having a directivity characteristic with a blind spot in a second predetermined azimuth different from the first; (3) a coherence calculation unit obtains the coherence using the first and second directivity signals; (4) a target voice section detection unit determines, based on the coherence, whether the input audio signal is a section of target voice arriving from the target azimuth or a non-target voice section; (5) a coherence behavior information calculation unit obtains difference information between the coherence and the average value of the coherence; (6) a WF adaptation unit compares the difference information with a background noise detection threshold, divides the non-target voice sections into background noise sections, for which the difference information is smaller than the threshold, and the other, non-background-noise sections, and switches the adaptation processing of the Wiener filter coefficients according to whether the section is a background noise section or a non-background-noise section; and (7) a WF coefficient multiplication unit multiplies the input audio signal by the Wiener filter coefficients from the WF adaptation unit.
[0026]
An audio signal processing program according to a third aspect of the present invention causes a computer to function as: (1) a first directivity forming unit that applies delay-subtraction processing to an input audio signal to form a first directivity signal having a directivity characteristic with a blind spot in a first predetermined azimuth; (2) a second directivity forming unit that applies delay-subtraction processing to the input audio signal to form a second directivity signal having a directivity characteristic with a blind spot in a second predetermined azimuth different from the first; (3) a coherence calculation unit that obtains the coherence using the first and second directivity signals; (4) a target voice section detection unit that determines, based on the coherence, whether the input audio signal is a section of target voice arriving from the target azimuth or a non-target voice section; (5) a coherence behavior information calculation unit that obtains difference information between the coherence and the average value of the coherence; (6) a WF adaptation unit that compares the difference information with a background noise detection threshold, divides the non-target voice sections into background noise sections, for which the difference information is smaller than the threshold, and the other, non-background-noise sections, and switches the adaptation processing of the Wiener filter coefficients according to whether the section is a background noise section or a non-background-noise section; and (7) a WF coefficient multiplication unit that multiplies the input audio signal by the Wiener filter coefficients from the WF adaptation unit.
[0027]
According to the present invention, it is possible to provide an audio signal processing apparatus, method, and program capable of improving the sound quality by enhancing the accuracy of the adaptive update of the Wiener filter coefficients, applying coherence to background noise detection without burdening the user.
[0028]
FIG. 1 is a block diagram showing the configuration of the audio signal processing apparatus according to a first embodiment.
FIG. 2 is a block diagram showing the detailed configuration of the coherence difference calculation unit in the first embodiment.
FIG. 3 is a block diagram showing the detailed configuration of the WF adaptation unit in the first embodiment.
FIG. 4 is a flowchart showing the operation of the coherence difference calculation unit in the first embodiment.
FIG. 5 is a flowchart showing the operation of the WF adaptation unit in the first embodiment.
FIG. 6 is a block diagram showing the detailed configuration of the WF adaptation unit in a second embodiment.
FIG. 7 is a flowchart showing the operation of the coefficient adaptation speed control unit in the WF adaptation unit in the second embodiment.
FIG. 8 is a block diagram showing the configuration of the audio signal processing apparatus according to a third embodiment.
FIG. 9 is a block diagram showing the configuration of the audio signal processing apparatus according to a fourth embodiment.
FIG. 10 is an explanatory drawing showing the characteristic of the directivity signal from the third directivity forming unit in the fourth embodiment.
FIG. 11 is a processing flowchart of a voice switch.
FIG. 12 is a processing flowchart of a Wiener filter.
FIG. 13 is a block diagram of a conventional audio signal processing apparatus using a voice switch and a Wiener filter in combination, with coherence used for the target voice detection function.
FIG. 14 is an explanatory drawing showing the characteristic of the directivity signal from the directivity forming unit of FIG. 13.
FIG. 15 is an explanatory drawing showing the directivity characteristics of the two directivity forming units of FIG. 13.
[0029]
(A) First Embodiment Hereinafter, a first embodiment of an audio signal processing apparatus, method and program according to the present invention will be described with reference to the drawings. The first embodiment detects the sections optimal for the voice switch and for the Wiener filter solely from the behavior peculiar to the coherence, without running plural kinds of voice section detection and without increasing the burden on the device user.
[0030]
(A-1) Configuration of the First Embodiment FIG. 1 is a block diagram showing the configuration of the audio signal processing apparatus according to the first embodiment; the same reference numerals as in FIG. 13 described above denote the same or corresponding parts. The portion excluding the pair of microphones m_1 and m_2 can be realized as software (an audio signal processing program) executed by a CPU, but can functionally be represented as in FIG. 1.
[0031]
In FIG. 1, the audio signal processing apparatus 1 according to the first embodiment includes, as in the prior art, the microphones m_1 and m_2, the FFT unit 10, the first directivity forming unit 11, the second directivity forming unit 12, the coherence calculation unit 13, the target voice section detection unit 14, the gain control unit 15, the WF adaptation unit 30, the WF coefficient multiplication unit 17, the IFFT unit 18 and the VS gain multiplication unit 19, and in addition a coherence difference calculation unit 20. The processing of the WF adaptation unit 30 differs somewhat from that of the conventional WF adaptation unit 16.
[0032]
The coherence generally takes large values in target voice sections, and fluctuates greatly between the large-amplitude and small-amplitude components of the target voice. In non-target voice sections, by contrast, it shows the distinctive behavior of being generally small with small fluctuation. Furthermore, even within non-target voice sections, where the coherence is small overall, it covers a wide range of values: in sections where the regularity of the waveform (such as the pitch of a voice) is clear, correlation arises easily and the coherence is relatively large, whereas it takes particularly small values in sections where such regularity is sparse. Sections where the regularity is sparse can be regarded as sections of background noise only. Therefore, by controlling the Wiener filter coefficients so that they are adapted only in the sections where the coherence is particularly small among the non-target voice sections, it is possible to prevent the characteristics of the disturbing voice, which are the problem of the prior art, from being reflected in the Wiener filter coefficients and degrading the target voice.
[0033]
In the first embodiment, the coherence difference calculation unit 20 is added on the basis of this observation, and the function of the WF adaptation unit 30, to which its output is supplied, is also changed from the conventional one.
[0034]
The coherence difference calculation unit 20 calculates the difference δ between the instantaneous coherence value COH(t) in non-target voice sections and the coherence long-term average AVE_COH.
The WF adaptation unit 30 of the first embodiment detects the sections containing only background noise using the coherence instantaneous value COH and the difference δ, performs the adaptation operation there, and supplies the resulting WF_COEF(f) to the WF coefficient multiplication unit 17.
[0035]
FIG. 2 is a block diagram showing the detailed configuration of the coherence difference calculation unit 20. In FIG. 2, the coherence difference calculation unit 20 includes a coherence reception unit 21, a coherence long-term average calculation unit 22, a coherence subtraction unit 23, and a coherence difference transmission unit 24.
[0036]
The coherence reception unit 21 takes in the coherence COH(t) calculated by the coherence calculation unit 13 and checks with the target voice section detection unit 14 whether the current processing target (switched, for example, in frame units) belongs to a non-target voice section.
[0037]
If the current processing target belongs to a non-target voice section, the coherence long-term average calculation unit 22 updates the coherence long-term average AVE_COH(t) according to equation (8).
The formula for the coherence long-term average AVE_COH(t) is not limited to equation (8); another formula, such as a simple average of a predetermined number of sample values, may also be applied.
[0038]
AVE_COH(t) = β × COH(t) + (1−β) × AVE_COH(t−1), where 0.0 < β < 1.0 (8)
The coherence subtraction unit 23 then calculates the difference δ between the coherence long-term average AVE_COH(t) and the coherence COH(t), as shown in equation (9).
[0039]
δ = AVE_COH(t) − COH(t) (9)
The coherence difference transmission unit 24 gives the obtained difference δ to the WF adaptation unit 30.
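A minimal sketch of the coherence difference calculation unit 20, directly following equations (8) and (9) (the value of β and the initial value of AVE_COH are placeholders):

```python
BETA = 0.05  # smoothing constant beta, 0.0 < beta < 1.0 (placeholder value)

class CoherenceDifferenceCalculator:
    """Sketch of the coherence difference calculation unit 20."""

    def __init__(self) -> None:
        self.ave_coh = 0.0  # coherence long-term average AVE_COH (initial value assumed)

    def update(self, coh: float, is_non_target_section: bool):
        """Returns the difference delta in non-target voice sections, None otherwise."""
        if not is_non_target_section:
            return None
        # Equation (8): AVE_COH(t) = beta * COH(t) + (1 - beta) * AVE_COH(t-1)
        self.ave_coh = BETA * coh + (1.0 - BETA) * self.ave_coh
        # Equation (9): delta = AVE_COH(t) - COH(t)
        return self.ave_coh - coh
```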
[0040]
FIG. 3 is a block diagram showing the detailed configuration of the WF adaptation unit 30 in the first embodiment.
In FIG. 3, the WF adaptation unit 30 includes a coherence difference reception unit 31, a background noise section determination unit 32, a WF coefficient adaptation unit 33, and a WF coefficient transmission unit 34.
[0041]
The coherence difference reception unit 31 takes in the coherence COH(t) and the coherence difference δ.
[0042]
The background noise section determination unit 32 determines whether or not the current section is a background noise section.
Its determination condition is that the coherence COH(t) is smaller than the target speech determination threshold Θ and the coherence difference δ is smaller than the difference determination threshold Φ (Φ < 0.0). If this condition is satisfied, the section is judged to be a background noise section.
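For illustration, the determination condition as stated above can be written as follows; the threshold values Θ and Φ are not specified numerically in this text, so they are passed as parameters.

```python
def is_background_noise_section(coh: float, delta: float,
                                theta: float, phi: float) -> bool:
    """Background noise section determination unit 32: judged to be a
    background noise section when COH(t) < theta (target speech
    determination threshold) and delta < phi (difference determination
    threshold), as stated in the text above."""
    return coh < theta and delta < phi
```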
[0043]
The WF coefficient adaptation unit 33 executes the adaptation operation of the Wiener filter coefficients if the determination result of the background noise section determination unit 32 is a background noise section, and does not adapt them otherwise.
[0044]
The WF coefficient transmission unit 34 supplies the Wiener filter coefficient obtained by the WF
coefficient adaptation unit 33 to the WF coefficient multiplication unit 17.
[0045]
(A-2) Operation of the First Embodiment Next, the operation of the audio signal processing apparatus 1 of the first embodiment will be described with reference to the drawings: first the overall operation, then the detailed operation of the coherence difference calculation unit 20 and of the WF adaptation unit 30, in that order.
[0046]
The signals input from the pair of microphones m_1 and m_2 are converted from the time domain into the frequency domain signals X1(f) and X2(f) by the FFT unit 10, and the first and second directivity forming units 11 and 12 then generate the directivity signals B1(f) and B2(f), each having a blind spot in a predetermined azimuth.
The coherence calculation unit 13 then applies the calculations of equations (4) and (5) to the directivity signals B1(f) and B2(f) to obtain the coherence COH.
[0047]
The target voice section detection unit 14 then determines whether or not the current section is a target voice section, and based on the determination result the gain control unit 15 sets the gain VS_GAIN.
[0048]
The coherence difference calculation unit 20 calculates the difference δ between the instantaneous coherence value COH(t) in non-target voice sections and the coherence long-term average AVE_COH.
The WF adaptation unit 30 then uses the coherence COH and the difference δ to detect the sections containing only background noise and executes the adaptation operation of the Wiener filter coefficients there. In the WF coefficient multiplication unit 17, the frequency domain input signal X1(f) is multiplied by the obtained Wiener filter coefficient WF_COEF(f); the resulting signal P(f), in other words the signal whose background noise has been suppressed by the Wiener filter technique, is converted into the time domain signal q(t) by the IFFT unit 18.
The VS gain multiplication unit 19 multiplies this signal q(t) by the gain VS_GAIN set by the gain control unit 15 to obtain the output signal y(t).
[0049]
Next, the operation of the coherence difference calculation unit 20 will be described.
FIG. 4 is a flowchart showing the operation of the coherence difference calculation unit 20.
[0050]
The coherence reception unit 21 takes in the coherence COH(t) and checks with the target voice section detection unit 14 whether the processing target is a non-target voice section (step S200).
If it is a non-target voice section, the coherence long-term average calculation unit 22 updates the coherence long-term average AVE_COH(t) according to equation (8) (step S201). The coherence subtraction unit 23 then calculates the difference δ between the coherence long-term average AVE_COH(t) and the coherence COH(t), as shown in equation (9) (step S202). The obtained coherence difference δ is given from the coherence difference transmission unit 24 to the WF adaptation unit 30. This processing is executed while the processing target is sequentially updated (step S203).
[0051]
Next, the operation of the WF adaptation unit 30 will be described. FIG. 5 is a flowchart showing
the operation of the WF adaptation unit 30.
[0052]
When the coherence difference reception unit 31 has taken in the coherence COH and the coherence difference δ (step S250), the background noise section determination unit 32 determines whether COH is smaller than the target speech determination threshold Θ and the coherence difference δ is smaller than the difference determination threshold Φ (Φ < 0.0), that is, whether the section is a background noise section (step S251). The WF coefficient adaptation unit 33 executes the adaptation operation of the Wiener filter coefficients in background noise sections (step S252) and does not execute it otherwise (step S253). The Wiener filter coefficient WF_COEF thus obtained is given from the WF coefficient transmission unit 34 to the WF coefficient multiplication unit 17 (step S254).
[0053]
(A-3) Effects of the First Embodiment As described above, according to the first embodiment, sections containing only background noise are detected from the non-target voice sections, in which disturbing voice and background noise are mixed, on the basis of the behavior that the coherence is particularly small in background-noise-only sections, and these sections are used to calculate the Wiener filter coefficients. This makes it possible to detect the signal sections suited to the voice switch and to the Wiener filter, and to apply both, with only a single parameter (the coherence). As a result, the conventional problem of target voice distortion caused by the characteristics of the disturbing voice being reflected in the Wiener filter coefficients can be prevented; since the optimal sections can be detected without introducing a plurality of voice section detection techniques, an increase in the amount of computation is avoided; and since a plurality of parameters with different characteristics need not be adjusted, an increase in the burden on the device user is also avoided.
[0054]
As a result, an improvement in call sound quality can be expected in communication apparatuses, such as video conference systems and mobile phones, to which the audio signal processing apparatus, method, or program of the first embodiment is applied.
[0055]
(B) Second Embodiment Next, a second embodiment of the audio signal processing device,
method and program according to the present invention will be described with reference to the
drawings.
[0056]
In the first embodiment, the Wiener filter coefficients are estimated only in the background-noise-only sections detected within the non-target voice sections, which allows accurate coefficient estimation but reduces the frequency of the coefficient estimation processing. Because the time until sufficient noise suppression performance is obtained is therefore long, the device user may be exposed to inappropriate sound quality in the meantime.
[0057]
The second embodiment removes this concern of the first embodiment by providing, in the WF adaptation unit, a coefficient adaptation speed control unit that speeds up the filter coefficient estimation immediately after the start of adaptation and then lowers the estimation speed.
[0058]
The audio signal processing apparatus according to the second embodiment differs from the audio signal processing apparatus 1 of the first embodiment only in the detailed configuration and operation of the WF adaptation unit; the rest is the same as in the first embodiment. Accordingly, only the WF adaptation unit 30A of the second embodiment is explained below.
[0059]
FIG. 6 is a block diagram showing a detailed configuration of the WF adaptation unit 30A in the
second embodiment.
In FIG. 6, the WF adaptation unit 30A includes a coherence difference reception unit 31, a
background noise section determination unit 32, a WF coefficient adaptation unit 33A, a WF
coefficient transmission unit 34, and a coefficient adaptation speed control unit 35.
The coherence difference reception unit 31, the background noise section determination unit 32,
and the WF coefficient transmission unit 34 are the same as those in the first embodiment, and
thus the description thereof will be omitted.
[0060]
The coefficient adaptation speed control unit 35 counts the number of times a section has been judged to be background noise, and sets the value of a parameter λ that controls the adaptation speed of the Wiener filter coefficients according to whether this count is smaller than a predetermined threshold.
[0061]
When the determination result of the background noise section determination unit 32 is a section other than background noise, the WF coefficient adaptation unit 33A treats the Wiener filter coefficients as in the first embodiment; when the determination result is a background noise section, it performs the coefficient estimation calculation using the parameter λ received from the coefficient adaptation speed control unit 35.
[0062]
Here, the role of the parameter λ will be briefly described.
The Wiener filter coefficients are obtained by an operation such as Equation 3 of Patent Document 1. Before this, the background noise characteristic must be calculated for each frequency. The background noise is estimated by Equation 1 of Patent Document 1, and this is where the parameter λ is involved. The parameter λ takes a value between 0.0 and 1.0 and controls how strongly the instantaneous input value is reflected in the background noise characteristic: the larger λ is, the stronger the influence of the instantaneous input, and the smaller it is, the weaker that influence. Therefore, when the parameter λ is large, the instantaneous input is strongly reflected in the Wiener filter coefficients and fast coefficient adaptation can be realized, but the influence of the instantaneous input becomes strong, the fluctuation of the coefficient values becomes large, and the naturalness of the sound quality can be reduced. When the parameter λ is small, on the other hand, the adaptation speed is slow, but the obtained coefficients are not strongly influenced by instantaneous characteristics and reflect the past noise characteristics on average, so the naturalness of the sound quality is hard to lose.
[0063]
Because the parameter λ has the above characteristics, fast noise suppression can be realized by increasing λ immediately after the start of adaptation, and natural sound quality can be realized by decreasing λ after a certain amount of time has elapsed.
[0064]
The above is the outline of the operation of the WF adaptation unit 30A in the second
embodiment.
[0065]
Next, the operation of the coefficient adaptation speed control unit 35 will be described.
FIG. 7 is a flowchart showing the operation of the coefficient adaptation speed control unit 35.
[0066]
First, the coefficient adaptation speed control unit 35 determines whether the current section is a background noise section, based on the determination result of the background noise section determination unit 32 (step S300). If it is a background noise section, a variable counter used to know whether it is immediately after the start of adaptation is incremented by one (step S301); otherwise, the counter is left unchanged. The counter is then compared with an initial adaptation period determination threshold T (an integer, T > 0) to determine whether it is immediately after the start of adaptation: if the counter is smaller than the threshold T, it is regarded as immediately after the start of adaptation, and otherwise it is determined not to be (step S302). Immediately after the start of adaptation, a large value is set for the parameter λ in order to speed up the coefficient estimation (step S303); otherwise, a small value is set for λ in order to slow the coefficient estimation (step S304).
[0067]
According to the second embodiment, the adaptation speed of the Wiener filter coefficients can be increased immediately after the start of adaptation, so noise suppression performance is reached faster than in the first embodiment. In addition, after a certain amount of time has passed, the coefficient adaptation speed is slowed, so over-adaptation to instantaneous noise can be prevented and natural sound quality can be realized.
[0068]
As a result, an improvement in call sound quality can be expected in communication apparatuses, such as video conference systems and mobile phones, to which the audio signal processing apparatus, method, or program of the second embodiment is applied.
[0069]
(C) Third Embodiment Next, a third embodiment of the audio signal processing device, method
and program according to the present invention will be described with reference to the drawings.
The audio signal processing device 1B according to the third embodiment is obtained by
introducing a known coherence filter configuration into the configuration of the first
embodiment.
[0070]
The coherence filter is a process that multiplies the input signal X1(f) by the obtained coherence coef(f), and it has the effect of suppressing components whose arrival direction is biased to the left or right.
[0071]
FIG. 8 is a block diagram showing the configuration of the audio signal processing apparatus 1B according to the third embodiment; the same or corresponding parts as in FIG. 1 of the first embodiment are given the same or corresponding reference numerals.
[0072]
In FIG. 8, the audio signal processing apparatus 1B according to the third embodiment includes a coherence filter coefficient multiplication unit 40 in addition to the configuration of the first embodiment, and the processing of the WF coefficient multiplication unit 17B is also slightly changed.
[0073]
The coherence filter coefficient multiplication unit 40 receives the coherence coef (f) from the
coherence calculation unit 13 and also receives one input signal X1 (f) converted to the
frequency domain from the FFT unit 10. As shown in equation (10), the coherence filter
coefficient multiplication unit 40 multiplies them to obtain a coherence filter processing signal
R0 (f).
[0074]
R0(f) = X1(f) × coef(f) (10)
The WF coefficient multiplication unit 17B of the third embodiment multiplies the coherence-filtered signal R0(f) by the Wiener filter coefficient WF_COEF(f) from the WF adaptation unit 30, as shown in equation (11), to obtain the Wiener-filtered signal P(f).
[0075]
P(f) = R0(f) × WF_COEF(f) (11)
The subsequent processing of the IFFT unit 18 and the VS gain multiplication unit 19 is the same as in the first embodiment.
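A minimal sketch of equations (10) and (11) as they are applied per frame:

```python
import numpy as np

def coherence_filter_then_wiener(x1_f: np.ndarray, coef_f: np.ndarray,
                                 wf_coef_f: np.ndarray) -> np.ndarray:
    """Third embodiment: coherence filter followed by the Wiener filter."""
    r0_f = x1_f * coef_f     # equation (10): coherence filter coefficient multiplication unit 40
    return r0_f * wf_coef_f  # equation (11): WF coefficient multiplication unit 17B
```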
[0076]
According to the third embodiment, by adding the coherence filter function, a higher noise suppression effect can be obtained than when the first embodiment is operated alone.
[0077]
(D) Fourth Embodiment Next, a fourth embodiment of the audio signal processing device, method
and program according to the present invention will be described with reference to the drawings.
The audio signal processing apparatus 1C according to the fourth embodiment is the configuration of the first embodiment with a known frequency subtraction configuration introduced.
[0078]
The frequency subtraction technique is a signal processing technique for obtaining a noise
reduction effect by subtracting a noise signal from an input signal.
[0079]
FIG. 9 is a block diagram showing the configuration of the audio signal processing apparatus 1C according to the fourth embodiment; the same or corresponding parts as in FIG. 1 of the first embodiment are given the same or corresponding reference numerals.
[0080]
In FIG. 9, the audio signal processing apparatus 1C according to the fourth embodiment includes a frequency subtraction unit 50 in addition to the configuration of the first embodiment, and the processing of the WF coefficient multiplication unit 17C is also somewhat changed.
The frequency subtraction unit 50 includes a third directivity forming unit 51 and a subtraction unit 52.
[0081]
The third directivity forming unit 51 receives the two input signals X1(f) and X2(f) converted to the frequency domain by the FFT unit 10.
It forms a third directivity signal B3(f) according to a directivity characteristic having a blind spot at the front, as shown in FIG. 10, and gives this directivity signal B3(f) to the subtraction unit 52 as the noise signal to be subtracted.
The subtraction unit 52 also receives the frequency domain input signal X1(f) as the signal to be subtracted from, and subtracts the third directivity signal B3(f) from the input signal X1(f), as shown in equation (12), to obtain the frequency-subtracted signal R1(f).
[0082]
R1(f) = X1(f) − B3(f) (12)
The WF coefficient multiplication unit 17C of the fourth embodiment multiplies the frequency-subtracted signal R1(f) by the Wiener filter coefficient WF_COEF(f) from the WF adaptation unit 30, as shown in equation (13), to obtain the Wiener-filtered signal P(f).
[0083]
P(f) = R1(f) × WF_COEF(f) (13)
The subsequent processing of the IFFT unit 18 and the VS gain multiplication unit 19 is the same as in the first embodiment.
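A minimal sketch of equations (12) and (13); B3(f) is supplied by the third directivity forming unit 51, whose forming equation is not reproduced in this text.

```python
import numpy as np

def frequency_subtraction_then_wiener(x1_f: np.ndarray, b3_f: np.ndarray,
                                      wf_coef_f: np.ndarray) -> np.ndarray:
    """Fourth embodiment: frequency subtraction followed by the Wiener filter."""
    r1_f = x1_f - b3_f       # equation (12): subtraction unit 52
    return r1_f * wf_coef_f  # equation (13): WF coefficient multiplication unit 17C
```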
[0084]
According to the fourth embodiment, by adding the frequency subtraction function, a higher noise suppression effect can be obtained than when the first embodiment is operated alone.
[0085]
(E) Other Embodiments The present invention is not limited to the above embodiments, and may
include modified embodiments as exemplified below.
[0086]
(E-1) As is clear from the description of the above embodiments, each of them uses the two noise suppression techniques of the voice switch and the Wiener filter, but the characteristic feature lies in the configuration and processing that extract the sections containing only background noise on the basis of the behavior of the coherence.
This feature contributes in particular to improving the performance of the Wiener filter.
Therefore, the present invention can also be applied to an audio signal processing apparatus or program having only a Wiener filter as its noise suppression technology.
One example of such a configuration is the configuration of FIG. 1 with the gain control unit 15 and the VS gain multiplication unit 19 removed.
[0087]
(E-2) In the above embodiments, the sections containing only background noise within the determined non-target voice sections are detected based on the difference δ between the instantaneous coherence value COH(t) and the coherence long-term average AVE_COH, but such sections may instead be detected based on the magnitude of the variance (or the standard deviation) of the coherence.
Since the variance of the coherence represents the degree of deviation of the most recent predetermined number of instantaneous values COH(t) from their average, it is a parameter representing the behavior of the coherence in the same way as the coherence difference.
[0088]
(E-3) In the third embodiment, a known coherence filter configuration is added to the first embodiment, and in the fourth embodiment a known frequency subtraction configuration is added to the first embodiment; however, both the coherence filter configuration and the frequency subtraction configuration may be added to the first embodiment together.
[0089]
Also, based on the configuration of the second embodiment, at least one of the coherence filter
configuration and the frequency subtraction configuration may be added.
[0090]
(E-4) In the second embodiment, the adaptation speed is switched in two steps by the value of the parameter λ; however, by providing a plurality of thresholds, the adaptation speed may be switched in three or more steps according to the value of the parameter λ.
[0091]
(E-5) In the above embodiments, although there is a target voice section detection unit, the WF adaptation unit again determines, based on the coherence, whether or not the current section is a target voice section. Alternatively, the WF adaptation unit may use the detection result of the target voice section detection unit instead of performing this determination itself.
Where the WF adaptation unit determines by itself, based on the coherence, whether the section is a target voice section, the WF adaptation unit corresponds to the "target voice section detection unit" in the claims; where the WF adaptation unit uses the detection result of an external target voice section detection unit, that external unit corresponds to it.
[0092]
(E-6) In each of the above embodiments, the voice switch processing is applied after the Wiener filter processing; however, this processing order may be reversed.
[0093]
(E-7) In each of the above embodiments, processing performed on frequency domain signals may instead be performed on time domain signals where possible and, conversely, processing performed on time domain signals may be performed on frequency domain signals where possible.
[0094]
(E-8) In each of the above embodiments, an audio signal processing apparatus and program that immediately process the signals captured by a pair of microphones are shown, but the audio signals to be processed by the present invention are not limited to these. For example, the present invention can also be applied to processing a pair of audio signals read from a recording medium, or a pair of audio signals transmitted from a counterpart apparatus.
[0095]
DESCRIPTION OF SYMBOLS 1 ... audio signal processing apparatus, m_1, m_2 ... microphone, 11 ... first directivity forming unit, 12 ... second directivity forming unit, 13 ... coherence calculation unit, 14 ... target voice section detection unit, 15 ... gain control unit, 16, 30 ... WF adaptation unit, 17 ... WF coefficient multiplication unit, 19 ... VS gain multiplication unit, 20 ... coherence difference calculation unit, 22 ... coherence long-term average calculation unit, 23 ... coherence subtraction unit, 32 ... background noise section determination unit, 33 ... WF coefficient adaptation unit, 40 ... coherence filter coefficient multiplication unit, 50 ... frequency subtraction unit, 51 ... third directivity forming unit, 52 ... subtraction unit.