JP2014075674
Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2014075674
Abstract: To provide an audio signal processing device capable of improving voice quality by
operating a voice switch appropriately. SOLUTION: A delay subtraction process is performed on
the input voice signal to form first and second directivity signals, each having a blind spot in a
first or second predetermined azimuth, and the coherence is obtained using these two directivity
signals. The coherence is then compared with a determination threshold to determine whether the
input voice signal is a section of target voice arriving from the target direction or a non-target
voice section, a gain is set according to the determination result, and the input voice signal is
multiplied by the gain to attenuate non-target voice. Here, the determination threshold is
controlled based on the average value of the coherence in the disturbing voice section.
[Selected figure] Figure 1
Audio signal processing apparatus, method and program
[0001]
The present invention relates to an audio signal processing apparatus, method, and program, and
can be applied to, for example, a communication device or communication software that handles
audio signals such as a telephone and a video conference.
[0002]
As noise suppression techniques, there are techniques called voice switches and techniques
called Wiener filters (see Patent Document 1 and Patent Document 2).
[0003]
03-05-2019
1
The voice switch is a technology that detects, using a target voice section detection function,
the section in which the speaker is talking (the target voice section) from the input signal,
outputs the signal without processing in the target voice section, and attenuates its amplitude
in the non-target voice section.
For example, as shown in FIG. 12, when the input signal "input" is received, it is determined
whether or not the current section is a target voice section (step S51). If it is a target voice
section, 1.0 is set as the gain VS_GAIN (step S52); if it is a non-target voice section, an
arbitrary positive number α less than 1.0 is set as the gain VS_GAIN (step S53). The gain VS_GAIN
is then multiplied by the input signal "input" to obtain the output signal "output" (step S54).
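The flow of FIG. 12 (steps S51 to S54) can be sketched as follows; this is a minimal illustration assuming the per-frame target-voice decision is already available, and the function name and the value of `alpha` are illustrative, not from the patent.

```python
import numpy as np

def voice_switch(frame: np.ndarray, is_target_section: bool, alpha: float = 0.3) -> np.ndarray:
    """Pass target-voice frames unchanged; attenuate non-target frames.

    alpha is an arbitrary positive number less than 1.0 (step S53 of FIG. 12).
    """
    vs_gain = 1.0 if is_target_section else alpha  # steps S51-S53
    return vs_gain * frame                          # step S54
```

A target-voice frame is returned unchanged, while a non-target frame is scaled down by `alpha`, which is what attenuates noise between utterances.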
[0004]
By applying this voice switch technology to a voice communication device such as a video
conference apparatus or a cellular phone, the non-target voice sections (noise) can be suppressed
and the desired target voice can be enhanced.
[0005]
By the way, non-target voice is divided into "disturbing voice," which is human voice other than
the speaker's, and "background noise" such as office noise and road noise.
When the non-target voice consists only of background noise, the ordinary target voice section
detection function can accurately determine whether a section is a target voice section. When
disturbing voice is superimposed on the background noise, however, the target voice section
detection function also treats the disturbing voice as target voice, so erroneous determination
occurs.
As a result, the voice switch cannot suppress the disturbing voice, and the voice quality of the
call cannot be improved sufficiently.
[0006]
This problem can be alleviated by changing the feature quantity referred to by the target voice
section detection unit from the previously used fluctuation of the input signal level to the
coherence. Coherence, simply stated, is a feature that indicates the direction of arrival of the
input signal. Assuming use in a mobile phone or the like, the speaker's voice (target voice)
arrives from the front while disturbing voice tends to arrive from directions other than the
front, so focusing on the direction of arrival makes it possible to distinguish the target voice
from the disturbing voice, which was impossible before.
[0007]
FIG. 13 is a block diagram showing the configuration of the voice switch when coherence is used
for the target voice detection function.
[0008]
The input signals s1(n) and s2(n) are acquired from the pair of microphones m_1 and m_2 via an AD
converter (not shown).
Here, n is an index representing the order of sample input, and is represented by a positive
integer. In the text, it is assumed that the smaller n is the older input sample, and the larger n is
the newer input sample.
[0009]
The FFT unit 10 receives the input signal sequences s1(n) and s2(n) from the microphones m_1 and
m_2 and performs a fast Fourier transform (or discrete Fourier transform) on the input signals s1
and s2. Thereby, the input signals s1 and s2 can be expressed in the frequency domain. Note that,
in carrying out the fast Fourier transform, analysis frames FRAME1(K) and FRAME2(K) consisting of
a predetermined N samples are constructed from the input signals s1(n) and s2(n) and applied.
Equation (1) below shows an example of constructing the analysis frame FRAME1(K) from the input
signal s1(n); the analysis frame FRAME2(K) is constructed in the same way.
[0010]
Here, K is an index representing the order of frames, and is expressed by a positive integer. In the
text, the smaller K is the older analysis frame, and the larger K is the newer analysis frame. In the
following description of the operation, it is assumed that the index representing the latest
analysis frame to be analyzed is K, unless otherwise specified.
[0011]
The FFT unit 10 performs fast Fourier transform processing on each analysis frame to convert it
into the frequency-domain signals X1(f, K) and X2(f, K), and supplies the obtained signals
X1(f, K) and X2(f, K) to the first directivity forming unit 11 and the second directivity forming
unit 12, respectively. Here, f is an index representing frequency. Note that X1(f, K) is not a
single value but, as shown in equation (2), is composed of the spectral components of a plurality
of frequencies f1 to fm. The same applies to X2(f, K) and to B1(f, K) and B2(f, K) described
later.
[0012]
X1(f, K) = {X1(f1, K), X1(f2, K), ..., X1(fm, K)} (2)
The first directivity forming unit 11 forms, from the frequency-domain signals X1(f, K) and
X2(f, K), a signal B1(f, K) given strong directivity in a specific direction, and the second
directivity forming unit 12 forms, from the same frequency-domain signals X1(f, K) and X2(f, K),
a signal B2(f, K) given strong directivity in a specific direction different from the above. An
existing method can be applied to form the signals B1(f, K) and B2(f, K) having strong directivity
in a specific direction. For example, by applying equation (3), B1(f, K) with strong directivity
in the right direction can be formed, and by applying equation (4), B2(f, K) with strong
directivity in the left direction can be formed. In equations (3) and (4), the frame index K is
omitted because it is not involved in the operation.
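Equations (3) and (4) themselves are not reproduced in this translation. A standard frequency-domain delay-and-subtract beamformer of the kind described here can be sketched as follows; the exact normalization and delay used in the patent's equations may differ, so this is an assumed, generic form.

```python
import numpy as np

def form_directivity(X1, X2, freqs, tau):
    """Delay-and-subtract beamforming in the frequency domain.

    A source arriving with inter-microphone delay tau satisfies
    X2(f) = X1(f) * exp(-2j*pi*f*tau); subtracting the delayed partner
    signal therefore places a blind spot (null) toward that source.
    """
    phase = np.exp(-2j * np.pi * freqs * tau)
    b1 = X1 - X2 * phase   # null toward one side (cf. equation (3))
    b2 = X2 - X1 * phase   # null toward the other side (cf. equation (4))
    return b1, b2
```

For a source exactly in the null direction, the corresponding output spectrum vanishes, which is the "blind spot" the text refers to.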
[0013]
The meaning of these equations will be described with reference to FIGS. 14 and 15, taking
equation (3) as an example. Assume that a sound wave arrives from the direction θ shown in
FIG. 14(A) and is captured by a pair of microphones m_1 and m_2 placed a distance l apart. In
this case, a time difference occurs before the sound wave reaches the two microphones m_1 and
m_2. Since the path difference of the sound is d = l × sin θ, this arrival time difference τ is
given by equation (5), where c is the speed of sound.
[0014]
τ = l × sin θ / c (5)
The signal s1(t − τ) obtained by delaying the input signal s1(n) by τ is the same signal as the
input signal s2(t). Therefore, the difference signal y(t) = s2(t) − s1(t − τ) is a signal from
which the sound arriving from the θ direction has been removed. As a result, the microphone array
m_1, m_2 has the directional characteristic shown in FIG. 14(B).
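Equation (5) and the cancellation y(t) = s2(t) − s1(t − τ) can be checked numerically. The microphone spacing, sampling rate, and tone frequency below are illustrative values chosen so that τ is exactly one sample; they are not taken from the patent.

```python
import numpy as np

# Numeric check of equation (5): with microphone spacing l, arrival angle
# theta, and sound speed c, the inter-microphone delay is tau = l*sin(theta)/c.
l, c, fs = 0.05, 340.0, 6800       # spacing [m], sound speed [m/s], rate [Hz]
theta = np.pi / 2                   # sound arrives from the side (90 degrees)
tau = l * np.sin(theta) / c         # = 1/6800 s, i.e. exactly one sample here
d = round(tau * fs)                 # delay in whole samples

t = np.arange(200)
s1 = np.sin(2 * np.pi * 440 * t / fs)          # wave at microphone m_1
s2 = np.concatenate([np.zeros(d), s1[:-d]])    # same wave, d samples later at m_2

# y(t) = s2(t) - s1(t - tau): the sound from direction theta is removed
y = s2[d:] - s1[:-d]
```

The difference signal `y` is identically zero, confirming that the delay subtraction places a null exactly in the arrival direction θ.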
[0015]
Although the calculation above was described in the time domain, the same holds when it is
performed in the frequency domain; the equations in that case are equations (3) and (4) described
above. Now, as an example, assume that the arrival direction θ is ±90 degrees. Then the
directivity signal B1(f) from the first directivity forming unit 11 has strong directivity in the
right direction as shown in FIG. 15(A), and the directivity signal B2(f) from the second
directivity forming unit 12 has strong directivity in the left direction as shown in FIG. 15(B).
[0016]
The coherence COH is obtained by the coherence calculation unit 13 applying operations such as
equations (6) and (7) to the directivity signals B1(f) and B2(f) obtained as described above.
B2(f)* in equation (6) is the complex conjugate of B2(f).
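Equations (6) and (7) are not reproduced in this translation. A common form consistent with the description (a per-frequency normalized cross-correlation using B2(f)*, averaged over all frequency components) can be sketched as follows; the patent's exact normalization may differ, so treat this as an assumed implementation.

```python
import numpy as np

def coherence(b1: np.ndarray, b2: np.ndarray) -> float:
    """Frame coherence COH from two directivity spectra.

    Per-frequency normalized cross-correlation (cf. equation (6)),
    averaged over all frequency components (cf. equation (7)).
    Signals in phase give values near 1; uncorrelated or opposed
    signals give values near 0 or below.
    """
    eps = 1e-12  # guard against division by zero in silent bins
    coef = np.real(b1 * np.conj(b2)) / (np.abs(b1) * np.abs(b2) + eps)
    return float(np.mean(coef))
```

With this form, a frontal source (B1 and B2 nearly identical) yields a coherence near 1, while a strongly lateral source or noise yields a small value, matching the behavior described in paragraph [0018].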
[0017]
The target voice section detection unit 14 compares the coherence COH with the target voice
section determination threshold Θ, determines a target voice section if the coherence is larger
than the threshold Θ and a non-target voice section otherwise, and forms the determination result
VAD_RES(K).
[0018]
Here, the background for detecting the target voice section based on the magnitude of the
coherence will be briefly described.
The concept of coherence can be restated as the correlation between the signal coming from the
right and the signal coming from the left (equation (6) described above calculates the correlation
for a single frequency component, and equation (7) averages the correlation values over all
components). Therefore, the case where the coherence COH is small is the case where the
correlation between the two directivity signals B1 and B2 is small, and conversely, the case where
the coherence COH is large is the case where the correlation is large. An input signal with small
correlation is one whose arrival direction deviates largely to either the right or the left, or
one with little definite regularity, such as noise, even without such a deviation. Therefore, a
section in which the coherence COH is small can be regarded as a disturbing voice section or a
background noise section (a non-target voice section). On the other hand, when the value of the
coherence COH is large, the input signal can be regarded as coming from the front, since its
arrival direction has no deviation. Since the target voice is assumed to come from the front, a
section in which the coherence COH is large can be regarded as a target voice section.
[0019]
The gain control unit 15 sets 1.0 as the gain VS_GAIN in the target voice section and an arbitrary
positive value α less than 1.0 as the gain VS_GAIN in the non-target voice section (disturbing
voice, background noise). The voice switch gain multiplication unit 16 obtains the post-voice-
switch signal y(n) by multiplying the input signal s1(n) by the obtained gain VS_GAIN.
[0020]
JP 2006-333215 A, JP 2010-532879 A
[0021]
By the way, the coherence COH takes a large value overall when the arrival direction is close to
the front, and becomes smaller as the arrival direction shifts toward the side.
FIG. 16 shows the change in the coherence COH when the voice arrival direction is close to the
front (solid line), to the side (dotted line), and midway between the front and the side (dashed
line). The vertical axis represents the coherence COH, and the horizontal axis represents time
(analysis frame K).
[0022]
As shown in FIG. 16, the coherence COH has the characteristic that its range of values changes
greatly according to the direction of arrival. Conventionally, however, the target voice section
determination threshold Θ is a fixed value regardless of the arrival direction, which causes
erroneous determination.
[0023]
For example, when the threshold Θ is large, a target voice section is erroneously determined to be
a non-target voice section in periods where the value of the coherence COH does not rise much even
for target voice, such as the rising edge of an utterance or a consonant part. As a result, the
target voice component is attenuated by the voice switch processing, resulting in an unnatural
sound quality, as if the voice were being cut off.
[0024]
Conversely, when a small value is set as the threshold Θ, the coherence of a disturbing sound
arriving from a direction close to the front exceeds the threshold Θ, and the non-target voice
section is misjudged to be a target voice section. As a result, the non-target voice component is
not attenuated and sufficient suppression performance cannot be obtained. Moreover, if the device
user is in an environment where the arrival direction of the disturbing voice changes from moment
to moment, the frequency of erroneous determination increases.
[0025]
As described above, since the determination threshold Θ of the target voice section is a fixed
value, there is the problem that the voice switch processing cannot be made to operate in the
desired sections, while it operates in sections other than the desired ones and degrades the sound
quality.
[0026]
Therefore, there is a need for an audio signal processing apparatus, method, and program that
can appropriately operate a voice switch to improve sound quality.
[0027]
According to a first aspect of the present invention, an audio signal processing apparatus for
suppressing noise components in an input audio signal comprises: (1) a first directivity forming
unit that performs delay subtraction processing on the input audio signal to form a first
directivity signal given a directivity characteristic having a blind spot in a first predetermined
azimuth; (2) a second directivity forming unit that performs delay subtraction processing on the
input audio signal to form a second directivity signal given a directivity characteristic having a
blind spot in a second predetermined azimuth different from the first; (3) a coherence calculation
unit that obtains the coherence using the first and second directivity signals; (4) a target voice
section detection unit that compares the coherence with a first determination threshold to
determine whether the input audio signal is a section of target voice arriving from the target
direction or a non-target voice section; (5) a target voice section determination threshold
control unit that, based on the coherence, detects the disturbing voice sections within the
non-target voice sections, which include both disturbing voice sections and background noise
sections, obtains a disturbing voice coherence average value, which is the average value of the
coherence in the disturbing voice sections, and controls the first determination threshold based
on that average value; (6) a gain control unit that sets a voice switch gain according to the
determination result of the target voice section detection unit; and (7) a voice switch gain
multiplication unit that multiplies the input audio signal by the voice switch gain obtained by
the gain control unit.
[0028]
According to a second aspect of the present invention, in an audio signal processing method for
suppressing noise components in an input audio signal: (1) a first directivity forming unit
performs delay subtraction processing on the input audio signal to form a first directivity signal
having a directivity characteristic with a blind spot in a first predetermined azimuth; (2) a
second directivity forming unit performs delay subtraction processing on the input audio signal to
form a second directivity signal having a directivity characteristic with a blind spot in a second
predetermined azimuth different from the first; (3) a coherence calculation unit calculates the
coherence using the first and second directivity signals; (4) a target voice section detection
unit compares the coherence with a first determination threshold to determine whether the input
audio signal is a section of target voice arriving from the target direction or a non-target voice
section; (5) a target voice section determination threshold control unit, based on the coherence,
detects the disturbing voice sections within the non-target voice sections, which include both
disturbing voice sections and background noise sections, obtains a disturbing voice coherence
average value, which is the average value of the coherence in the disturbing voice sections, and
controls the first determination threshold based on that average value; (6) a gain control unit
sets a voice switch gain according to the determination result of the target voice section
detection unit; and (7) a voice switch gain multiplication unit multiplies the input audio signal
by the voice switch gain obtained by the gain control unit.
[0029]
An audio signal processing program according to a third aspect of the present invention causes a
computer to function as: (1) a first directivity forming unit that performs delay subtraction
processing on an input audio signal to form a first directivity signal given a directivity
characteristic having a blind spot in a first predetermined azimuth; (2) a second directivity
forming unit that performs delay subtraction processing on the input audio signal to form a second
directivity signal given a directivity characteristic having a blind spot in a second
predetermined azimuth different from the first; (3) a coherence calculation unit that obtains the
coherence using the first and second directivity signals; (4) a target voice section detection
unit that compares the coherence with a first determination threshold to determine whether the
input audio signal is a section of target voice arriving from the target direction or a non-target
voice section; (5) a target voice section determination threshold control unit that, based on the
coherence, detects the disturbing voice sections within the non-target voice sections, which
include both disturbing voice sections and background noise sections, obtains a disturbing voice
coherence average value, which is the average value of the coherence in the disturbing voice
sections, and controls the first determination threshold based on that average value; (6) a gain
control unit that sets a voice switch gain according to the determination result of the target
voice section detection unit; and (7) a voice switch gain multiplication unit that multiplies the
input audio signal by the voice switch gain obtained by the gain control unit.
[0030]
According to the present invention, since the determination threshold applied in determining
whether a section is a target voice section is controlled, the voice switch can be operated
appropriately to improve the sound quality.
[0031]
FIG. 1 is a block diagram showing the configuration of an audio signal processing apparatus
according to a first embodiment.
FIG. 2 is a block diagram showing the detailed configuration of the target voice section
determination threshold control unit in the audio signal processing apparatus of the first
embodiment.
FIG. 3 is an explanatory diagram of the stored contents of the storage unit in the target voice
section determination threshold control unit of the first embodiment.
FIG. 4 is a flowchart showing the operation of the target voice section determination threshold
control unit in the audio signal processing apparatus of the first embodiment.
FIG. 5 is a flowchart showing the operation of the target voice section determination threshold
control unit in the audio signal processing apparatus of a second embodiment.
FIG. 6 is a block diagram showing the detailed configuration of the target voice section
determination threshold control unit in the audio signal processing apparatus of a third
embodiment.
FIG. 7 is a flowchart showing the operation of the target voice section determination threshold
control unit in the audio signal processing apparatus of the third embodiment.
FIG. 8 is a block diagram showing the configuration of a modified embodiment that combines
frequency subtraction with the first embodiment.
FIG. 9 is an explanatory diagram showing the characteristics of the directivity signal from the
third directivity forming unit of FIG. 8.
FIG. 10 is a block diagram showing the configuration of a modified embodiment that combines a
coherence filter with the first embodiment.
FIG. 11 is a block diagram showing the configuration of a modified embodiment that combines a
Wiener filter with the first embodiment.
FIG. 12 is a flowchart showing the flow of voice switch processing.
FIG. 13 is a block diagram showing the configuration of a voice switch in the case of using
coherence for the target voice detection function.
FIG. 14 is an explanatory diagram showing the characteristics of the directivity signal from the
directivity forming unit of FIG. 13.
FIG. 15 is an explanatory diagram showing the directivity characteristics produced by the two
directivity forming units of FIG. 13.
FIG. 16 is an explanatory diagram showing that the change of the coherence varies with the arrival
direction of speech.
[0032]
(A) First Embodiment Hereinafter, a first embodiment of the audio signal processing apparatus,
method, and program according to the present invention will be described with reference to the
drawings. The first embodiment is configured so that an appropriate determination threshold Θ for
the target voice section can be set according to the arrival direction of the disturbing voice,
based on the coherence COH.
[0033]
(A-1) Configuration of the First Embodiment FIG. 1 is a block diagram showing the configuration of
the audio signal processing apparatus according to the first embodiment; the same reference
numerals as in FIG. 13 described above denote corresponding parts. Here, the portion excluding the
pair of microphones m_1 and m_2 can be realized as software (an audio signal processing program)
executed by a CPU, but even then it can be functionally represented as in FIG. 1.
[0034]
In FIG. 1, the audio signal processing apparatus 1 according to the first embodiment includes, as
in the prior art, the microphones m_1 and m_2, the FFT unit 10, the first directivity forming unit
11, the second directivity forming unit 12, the coherence calculation unit 13, the target voice
section detection unit 14, the gain control unit 15, and the voice switch gain multiplication unit
16, and in addition includes a target voice section determination threshold control unit 20.
[0035]
Here, the microphones m_1 and m_2, the FFT unit 10, the first directivity forming unit 11, the
second directivity forming unit 12, the coherence calculation unit 13, the gain control unit 15,
and the voice switch gain multiplication unit 16 have the same functions as in the prior art, so
description of their functions is omitted.
[0036]
The target voice section determination threshold control unit 20 sets the target voice section
determination threshold Θ(K) according to the arrival direction at that time, based on the
coherence COH(K) calculated by the coherence calculation unit 13.
[0037]
The target voice section detection unit 14 of the first embodiment compares the coherence COH(K)
with the variably controlled target voice section determination threshold Θ(K), determines a
target voice section if the coherence is larger than the threshold Θ(K) and a non-target voice
section otherwise, and forms the determination result VAD_RES(K).
[0038]
FIG. 2 is a block diagram showing the detailed configuration of the target voice section
determination threshold control unit 20.
[0039]
The target voice section determination threshold control unit 20 includes a coherence receiving
unit 21, a non-target voice section detection unit 22, a non-target voice coherence averaging unit
23, a difference calculation unit 24, a disturbing voice section detection unit 25, a disturbing
voice coherence averaging unit 26, a target voice section determination threshold matching unit
27, a storage unit 28, and a target voice section determination threshold transmission unit 29.
[0040]
The coherence receiving unit 21 takes in the coherence COH(K) calculated by the coherence
calculation unit 13.
[0041]
The non-target voice section detection unit 22 roughly determines whether the section associated
with the coherence COH(K) is a non-target voice section.
This coarse determination compares the coherence COH(K) with a fixed threshold Ψ, and judges a
non-target voice section when the coherence COH(K) is smaller than the fixed threshold Ψ.
The determination threshold Ψ is a value different from the moment-to-moment controlled target
voice determination threshold Θ used by the target voice section detection unit 14; it suffices if
the non-target voice sections can be roughly detected, so high accuracy is not required and a
fixed value is applied.
[0042]
If the result of the coarse determination indicates a target voice section, the non-target voice
coherence averaging unit 23 applies, as the average value AVE_COH(K) of the coherence in
non-target voice sections, the value AVE_COH(K−1) of the immediately preceding analysis frame K−1
as it is; in the case of a non-target voice section, on the other hand, the average value
AVE_COH(K) of the coherence in non-target voice sections is determined according to equation (8).
The formula for calculating the coherence average value AVE_COH(K) is not limited to equation (8);
another formula, such as a simple average of a predetermined number of sample values, may be
applied.
In equation (8), δ is a value in the range 0.0 < δ < 1.0.
[0043]
AVE_COH(K) = δ × COH(K) + (1 − δ) × AVE_COH(K−1) (8)
Equation (8) calculates the weighted sum of the coherence COH(K) obtained for the current frame
and the average value AVE_COH(K−1) obtained for the previous frame, and by the setting of δ the
degree of contribution of the instantaneous value COH(K) to the average value can be adjusted.
If δ is set to a small value close to 0, the contribution of the instantaneous value to the
average decreases, so fluctuation due to the instantaneous value can be suppressed.
Conversely, if δ is a value close to 1, the contribution of the instantaneous value increases, so
the effect of past averaging can be weakened.
δ may be selected appropriately from this viewpoint.
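The leaky average of equation (8) is a one-line recurrence; a minimal sketch, with the function name chosen here for illustration:

```python
def update_ave_coh(prev_ave: float, coh: float, delta: float) -> float:
    """Equation (8): AVE_COH(K) = delta*COH(K) + (1-delta)*AVE_COH(K-1).

    delta in (0, 1) sets the contribution of the instantaneous value COH(K):
    a small delta suppresses fluctuation, a large delta tracks quickly.
    """
    return delta * coh + (1.0 - delta) * prev_ave
```

Called once per non-target frame, this keeps a smoothed estimate of the coherence level of the non-target voice.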
[0044]
The difference calculation unit 24 calculates the absolute value DIFF(K) of the difference between
the instantaneous coherence COH(K) and the average value AVE_COH(K), as shown in equation (9).
[0045]
DIFF(K) = |COH(K) − AVE_COH(K)| (9)
The disturbing voice section detection unit 25 compares the value DIFF(K) with the disturbing
voice section determination threshold Φ; if DIFF(K) is equal to or larger than the threshold Φ, it
determines a disturbing voice section, and otherwise a section other than a disturbing voice
section (a background noise section).
This determination method uses the property that in a disturbing voice section the instantaneous
value of the coherence is larger than in a background noise section, so the difference from the
average value also becomes large.
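The decision of equation (9) can be sketched directly; the function name is illustrative:

```python
def is_disturbing_section(coh: float, ave_coh: float, phi: float) -> bool:
    """Equation (9): DIFF(K) = |COH(K) - AVE_COH(K)|, compared with the
    disturbing-voice determination threshold phi. In a disturbing-voice
    section the instantaneous coherence sits well above the running
    average, so the deviation is large."""
    diff = abs(coh - ave_coh)
    return diff >= phi
```

A frame whose coherence jumps well above the smoothed non-target level is flagged as disturbing voice rather than background noise.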
[0046]
If the determination result is not a disturbing voice section, the disturbing voice coherence
averaging unit 26 sets, as the average value DIST_COH(K) of the coherence in disturbing voice
sections, the value DIST_COH(K−1) of the immediately preceding analysis frame K−1. In the case of
a disturbing voice section, on the other hand, the average value DIST_COH(K) of the coherence in
disturbing voice sections is determined according to equation (10), which is similar to equation
(8). The formula for calculating the coherence average value DIST_COH(K) is not limited to
equation (10); another formula, such as a simple average of a predetermined number of sample
values, may be applied. In equation (10), ζ is a value in the range 0.0 < ζ < 1.0.
[0047]
DIST_COH(K) = ζ × COH(K) + (1 − ζ) × DIST_COH(K−1) (10)
The storage unit 28 stores correspondence information between ranges of the average value DIST_COH
of the coherence in disturbing voice sections and the target voice determination threshold Θ. For
example, as shown in FIG. 3, the storage unit 28 can be configured in a conversion-table format.
The example of FIG. 3 specifies that when the average value DIST_COH is in the range
A < DIST_COH ≤ B, the value Θ1 corresponds as the target voice determination threshold Θ; when
DIST_COH is in the range B < DIST_COH ≤ C, the value Θ2 corresponds; and when DIST_COH is in the
range C < DIST_COH ≤ D, the value Θ3 corresponds. Here, Θ1 < Θ2 < Θ3.
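The conversion table of FIG. 3 amounts to a range lookup. The boundary values A to D and the thresholds below are illustrative placeholders (the patent gives no concrete numbers); only the ordering Θ1 < Θ2 < Θ3 is from the text.

```python
def select_threshold(dist_coh: float) -> float:
    """Conversion-table lookup in the style of FIG. 3: the larger the
    disturbing-voice coherence average (i.e. the closer the disturbance
    is to the front), the larger the target voice determination threshold."""
    A, B, C, D = 0.1, 0.3, 0.5, 0.9          # assumed range boundaries
    THETA1, THETA2, THETA3 = 0.3, 0.5, 0.7   # assumed thresholds, increasing
    if A < dist_coh <= B:
        return THETA1
    if B < dist_coh <= C:
        return THETA2
    if C < dist_coh <= D:
        return THETA3
    return THETA1                             # fallback outside the table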
[0048]
The target voice section determination threshold matching unit 27 searches the storage unit 28 for
the range of the average value DIST_COH to which the average value DIST_COH(K) obtained by the
disturbing voice coherence averaging unit 26 belongs, and extracts the value of the target voice
determination threshold Θ associated with the found range.
[0049]
The target voice section determination threshold transmission unit 29 sends the value of the
target voice determination threshold extracted by the target voice section determination threshold
matching unit 27 to the target voice section detection unit 14 as the target voice determination
threshold Θ(K) to be applied in the current analysis frame K.
[0050]
(A-2) Operation of the First Embodiment Next, the operation of the audio signal processing
apparatus 1 according to the first embodiment will be described in detail with reference to the
drawings, in the order of the overall operation and then the detailed operation of the target
voice section determination threshold control unit 20.
[0051]
The signals s1(n) and s2(n) input from the pair of microphones m_1 and m_2 are converted from the
time domain into the frequency-domain signals X1(f, K) and X2(f, K) by the FFT unit 10, after
which the first and second directivity forming units 11 and 12 generate the directivity signals
B1(f, K) and B2(f, K), each having a blind spot in a predetermined direction. The coherence
calculation unit 13 then applies the operations of equations (6) and (7) to the directivity
signals B1(f, K) and B2(f, K) to calculate the coherence COH(K).
[0052]
Based on the coherence COH(K), the target voice section determination threshold control unit 20 determines a determination threshold Θ(K) of the target voice section according to the arrival direction of the non-target voice (particularly the disturbing voice) at that time, and supplies it to the target voice section detection unit 14.
Then, the target voice section detection unit 14 determines whether the current section is the target voice section by comparing the coherence COH(K) with the determination threshold Θ(K), and the gain control unit 15 sets the gain VS_GAIN in accordance with the determination result VAD_RES(K).
The voice switch gain multiplication unit 16 multiplies the input signal s1 (n) by the gain
VS̲GAIN set by the gain control unit 15 to obtain an output signal y (n).
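The voice switch step just described can be sketched per frame as follows; the pass and attenuation gain values are assumptions for illustration, not values from this description.

```python
def voice_switch(coh, theta, x, gain_pass=1.0, gain_att=0.1):
    """Compare the coherence with the threshold Theta(K) (unit 14),
    set VS_GAIN accordingly (unit 15), and scale the input sample
    (unit 16). gain_pass/gain_att are illustrative values."""
    vad_res = coh >= theta                    # target voice decision
    vs_gain = gain_pass if vad_res else gain_att
    return vs_gain * x, vad_res
```

A target voice frame passes unchanged, while a non-target frame is attenuated.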
[0053]
Next, the operation of the target voice section determination threshold control unit 20 will be
described. FIG. 4 is a flowchart showing the operation of the target voice section determination
threshold control unit 20.
[0054]
The coherence receiving unit 21 acquires the coherence COH(K) calculated by the coherence calculating unit 13 and input to the target voice section determination threshold control unit 20 (step S101). The acquired coherence COH(K) is compared with the fixed threshold Ψ in the non-target voice section detection unit 22 to determine whether it is a non-target voice section (step S102). If the determination result is the target voice section (if COH(K) ≧ Ψ), the non-target voice coherence averaging processing unit 23 applies, as the average value AVE_COH(K) of the coherence in the non-target voice section, the average value AVE_COH(K-1) in the immediately preceding analysis frame K-1 as it is (step S103). On the other hand, if it is a non-target voice section (if COH(K) < Ψ), the average value AVE_COH(K) of the coherence in the non-target voice section is calculated according to the above-mentioned equation (8) (step S104).
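Assuming that equation (8) is a first-order recursive average with a smoothing constant δ (the exact form of equation (8) is not reproduced in this excerpt, so that form is an assumption), steps S102 to S104 can be sketched as:

```python
def update_ave_coh(coh, ave_prev, psi, delta=0.9):
    """Steps S102-S104: hold the non-target coherence average in
    target voice frames (COH >= Psi), otherwise update it with an
    assumed first-order recursion standing in for eq. (8)."""
    if coh >= psi:                 # target voice section: carry over
        return ave_prev
    # non-target voice section: recursive smoothing (assumed eq. (8) form)
    return delta * ave_prev + (1.0 - delta) * coh
```

Holding the average during target voice frames keeps AVE_COH representative of non-target sections only.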
[0055]
Subsequently, the difference calculating unit 24 calculates the absolute value DIFF(K) of the difference between the instantaneous value COH(K) of the coherence and the average value AVE_COH(K) according to equation (9) (step S105). Then, the value DIFF(K) obtained by the calculation is compared with the disturbing voice section determination threshold Φ in the disturbing voice section detection unit 25; if the value DIFF(K) is equal to or greater than the disturbing voice section determination threshold Φ, it is determined to be a disturbing voice section, and otherwise it is determined to be a section other than the disturbing voice section (a background noise section) (step S106). If this determination result is not a disturbing voice section, the disturbing voice coherence averaging processing unit 26 applies, as the average value DIST_COH(K) of the coherence in the disturbing voice section, the value DIST_COH(K-1) in the immediately preceding analysis frame K-1 as it is (step S108). On the other hand, if it is a disturbing voice section, the average value DIST_COH(K) of the coherence in the disturbing voice section is calculated according to equation (10) (step S107).
[0056]
With the average value DIST_COH(K) of the disturbing voice section obtained as described above as a key, the target voice section determination threshold collating unit 27 executes a search process on the storage unit 28, and the target voice determination threshold Θ associated with the range to which the key average value DIST_COH(K) belongs is extracted; it is transmitted by the target voice section determination threshold transmission unit 29 to the target voice section detection unit 14 as the target voice determination threshold Θ(K) applied in the current analysis frame K (step S109). Thereafter, the parameter K is incremented by 1 (step S110), and the process returns to the process by the coherence receiving unit 21.
[0057]
Next, it will be described that the optimal target voice determination threshold Θ (K) is obtained
by the above-described processing.
[0058]
As shown in FIG. 16, since the coherence COH varies in value range depending on the arrival
direction, the average value of the coherence can be associated with the arrival direction.
This means that the arrival direction can be estimated if the average value of the coherence is
obtained. Also, since the voice switch process is a process of passing the target voice without
processing and attenuating the disturbing voice, it is the direction of arrival of the disturbing
voice that is desired to be detected. Therefore, the disturbing voice section detection unit 25 detects the disturbing voice section, and the disturbing voice coherence average processing unit 26 calculates the average value DIST_COH(K) of the coherence in the disturbing voice section.
[0059]
(A-3) Effects of the First Embodiment According to the first embodiment, since the target voice section determination threshold Θ is controlled according to the arrival direction of the non-target voice (particularly the disturbing voice), the determination accuracy of the target voice section and the non-target voice section can be improved, and the voice switch processing can be prevented from operating erroneously in sections other than the desired section and degrading the sound quality.
[0060]
As a result, it is possible to expect improvement in call sound quality in a communication
apparatus such as a television conference apparatus or a cellular phone, to which the audio
signal processing apparatus, method or program of the first embodiment is applied.
[0061]
(B) Second Embodiment Next, a second embodiment of the audio signal processing device,
method and program according to the present invention will be described with reference to the
drawings.
[0062]
The second embodiment addresses the fact that the disturbing voice section detection method of the first embodiment may, although very rarely, erroneously detect a section as a disturbing voice section even though it is not one, and aims to prevent such erroneous detection.
In the method of the first embodiment, for example, a background noise section immediately after transitioning from a target voice section to a non-target voice section was sometimes detected as a disturbing voice section despite not being one.
When the average value DIST̲COH of the coherence is updated due to such erroneous detection,
an error also occurs in the setting of the target voice section determination threshold Θ (K).
[0063]
The entire configuration of the audio signal processing device 1A according to the second
embodiment can also be represented by FIG. 1 used in the description of the first embodiment.
Further, the internal configuration of the target voice section determination threshold value
control unit 20A according to the second embodiment can also be represented by FIG. 2 used in
the description of the first embodiment.
[0064]
In the case of the second embodiment, the condition under which the disturbing voice section detection unit 25 in the target voice section determination threshold control unit 20A determines a section to be a disturbing voice section differs from that of the first embodiment.
[0065]
While the determination condition of the first embodiment is "the value DIFF(K) is equal to or greater than the disturbing voice section determination threshold Φ", the determination condition of the second embodiment is "the value DIFF(K) is equal to or greater than the disturbing voice section determination threshold Φ, and the coherence COH(K) is larger than the average value AVE_COH(K) of the coherence in the non-target voice section".
[0066]
The background of the change of the determination condition will be described.
The coherence has a small value and a small fluctuation in the background noise section, but in
the disturbing speech section, the value is large but not as large as the target speech section, and
the fluctuation is also large.
Therefore, the coherence instantaneous value COH(K) and the average value AVE_COH(K) often differ greatly in the disturbing voice section. The condition that the value DIFF(K) is equal to or greater than the disturbing voice section determination threshold Φ takes this characteristic into consideration. However, this condition alone may cause the above-described erroneous determination. The cause is that, in the background noise section immediately after the target voice section, the average value AVE_COH(K) of the coherence in the non-target voice section still retains the influence of the coherence of the preceding disturbing voice section, whereas the instantaneous coherence COH(K) is a small value in the background noise section; the difference between the instantaneous value and the average value therefore becomes large, and the value DIFF(K), which is its absolute value, also becomes large. Therefore, in the second embodiment, erroneous determination is prevented by adding the condition COH(K) > AVE_COH(K), which requires the coherence instantaneous value in the disturbing voice section to be larger than the average value.
[0067]
FIG. 5 is a flowchart showing the operation of the target voice section determination threshold control unit 20A of the second embodiment, in which the same or corresponding steps as in FIG. 4 according to the first embodiment are given the same or corresponding step numbers.
[0068]
As described above, in the second embodiment, step S106A, which is the determination step of the disturbing voice section, changes the condition of step S106 of the first embodiment from "DIFF(K) ≧ Φ" to "DIFF(K) ≧ Φ and COH(K) > AVE_COH(K)"; the other processes are the same as in the first embodiment.
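The changed condition of step S106A can be expressed compactly; the numeric values in the usage below are illustrative.

```python
def is_disturbing_section(coh, ave_coh, phi):
    """Step S106A of the second embodiment: the first-embodiment test
    DIFF >= Phi is extended with COH > AVE_COH so that background-noise
    frames right after a target voice section are rejected."""
    diff = abs(coh - ave_coh)
    return diff >= phi and coh > ave_coh
```

In the problem case described above (COH small, AVE_COH still large), DIFF can exceed Φ while COH < AVE_COH, so the added condition correctly rejects the frame.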
[0069]
As described above, according to the second embodiment, even in the background noise section immediately after the end of the target voice section, the coherence average value of the disturbing voice section can be prevented from being erroneously updated; since the target voice section determination threshold can thus be set to an appropriate value, the determination accuracy of the target voice section can be further improved.
[0070]
As a result, it is possible to expect improvement in call sound quality in a communication
apparatus such as a television conference apparatus or a cellular phone, to which the audio
signal processing apparatus, method or program of the second embodiment is applied.
[0071]
(C) Third Embodiment Next, a third embodiment of the audio signal processing device, method
and program according to the present invention will be described with reference to the drawings.
[0072]
In the non-target voice section, the coherence COH sharply increases immediately after switching
from the background noise section to the disturbing voice section.
However, since the coherence average value DIST̲COH (K) of the disturbing voice section is an
average value, even if the coherence COH increases rapidly, it does not appear immediately in the
fluctuation of the coherence average value DIST̲COH (K).
That is, the followability of the coherence average value DIST_COH(K) to a rapid increase of the coherence COH is poor.
As a result, immediately after switching from the background noise section to the disturbing
voice section, the coherence average value DIST̲COH (K) of the disturbing voice section is not
accurate.
The third embodiment is made in view of the above points, and aims to make the coherence average value DIST_COH(K) of the disturbing voice section, which is used to determine the target voice section determination threshold, accurate immediately after switching from the background noise section to the disturbing voice section. Specifically, in the third embodiment, the time constant ζ in equation (10) is controlled immediately after switching from the background noise section to the disturbing voice section.
[0073]
(C-1) Configuration of Third Embodiment The entire configuration of the audio signal processing
device 1B according to the third embodiment can also be represented by FIG. 1 used in the
description of the first embodiment.
[0074]
FIG. 6 is a block diagram showing the detailed configuration of the target voice section determination threshold control unit 20B of the third embodiment, in which the same or corresponding parts as in FIG. 2 according to the second embodiment are given the same or corresponding reference numerals.
[0075]
The target voice section determination threshold control unit 20B of the third embodiment includes, in addition to the coherence receiving unit 21, the non-target voice section detection unit 22, the non-target voice coherence average processing unit 23, the difference calculation unit 24, the disturbing voice section detection unit 25, the disturbing voice coherence average processing unit 26, the target voice section determination threshold collating unit 27, the storage unit 28, and the target voice section determination threshold transmission unit 29 of the second embodiment, an average parameter control unit 30 and a disturbing voice section determination result handover unit 31.
The average parameter control unit 30 is interposed between the disturbing voice section detection unit 25 and the disturbing voice coherence average processing unit 26, and the disturbing voice section determination result handover unit 31 is interposed between the target voice section determination threshold collating unit 27 and the target voice section determination threshold transmission unit 29.
[0076]
The average parameter control unit 30 receives the determination result from the disturbing voice section detection unit 25, stores 0 in the determination result storage variable var_new if it is not a disturbing voice section and 1 if it is a disturbing voice section, and then compares it with the determination result storage variable var_old of the immediately preceding frame.
When the determination result storage variable var_new of the current frame exceeds the determination result storage variable var_old of the previous frame, the average parameter control unit 30 considers that the background noise section has shifted to the disturbing voice section and sets, as the average parameter ζ used for calculating the disturbing voice section coherence average value, a large fixed value close to 1.0 (larger than the initial value described later); when the determination result storage variable var_new of the current frame does not exceed the determination result storage variable var_old of the previous frame, the initial value is set as the average parameter ζ used for calculating the disturbing voice section coherence average value.
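The behavior of the average parameter control unit 30, together with the var_old handover of unit 31, can be sketched as below; ZETA_INIT and ZETA_BOOST are assumed values standing in for the initial value and the "large fixed value close to 1.0".

```python
class AverageParameterControl:
    """Sketch of units 30 and 31: detect the background-noise ->
    disturbing-voice transition via var_new/var_old and boost the
    averaging parameter zeta for that frame. Values are assumptions."""
    ZETA_INIT = 0.1    # assumed initial value
    ZETA_BOOST = 0.9   # assumed "large fixed value close to 1.0"

    def __init__(self):
        self.var_old = 0

    def select_zeta(self, is_disturbing):
        var_new = 1 if is_disturbing else 0
        # transition 0 -> 1 means a disturbing voice section just started
        zeta = self.ZETA_BOOST if var_new > self.var_old else self.ZETA_INIT
        self.var_old = var_new   # handover to the next frame (unit 31)
        return zeta
```

Note that only the first disturbing voice frame after the transition receives the boosted ζ; later frames fall back to the initial value.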
[0077]
The disturbed speech coherence averaging processing unit 26 of the third embodiment applies
the average parameter ζ set by the average parameter control unit 30 to perform the
calculation of the above-mentioned equation (10).
[0078]
When the setting process of the average parameter ζ for the current frame is completed, the disturbing voice section determination result handover unit 31 rewrites the determination result storage variable var_old of the immediately preceding frame with the determination result storage variable var_new of the current frame, handing the result over to the processing of the next frame.
[0079]
(C-2) Operation of Third Embodiment Next, the detailed operation of the target speech section
determination threshold control unit 20B of the audio signal processing device 1B of the third
embodiment will be described with reference to the drawings.
The overall operation of the audio signal processing device 1B of the third embodiment is the
same as the overall operation of the audio signal processing device 1 of the first embodiment,
and the description thereof will be omitted.
[0080]
FIG. 7 is a flowchart showing the operation of the target voice section determination threshold control unit 20B of the third embodiment, in which the same or corresponding steps as in the flowcharts of the preceding embodiments are given the same or corresponding step numbers.
[0081]
The coherence COH(K) calculated by the coherence calculation unit 13 and input to the target voice section determination threshold control unit 20B is acquired by the coherence receiving unit 21 (step S101), and is compared with the fixed threshold Ψ in the non-target voice section detection unit 22 to determine whether it is a non-target voice section (step S102).
If the determination result is the target voice section (if COH(K) ≧ Ψ), the non-target voice coherence averaging processing unit 23 applies, as the average value AVE_COH(K) of the coherence in the non-target voice section, the average value AVE_COH(K-1) in the immediately preceding analysis frame K-1 as it is (step S103); if it is a non-target voice section (if COH(K) < Ψ), the average value AVE_COH(K) of the coherence in the non-target voice section is calculated according to equation (8) (step S104).
[0082]
Subsequently, the difference calculating unit 24 calculates the absolute value DIFF (K) of the
difference between the instantaneous value COH (K) of the coherence and the average value
AVE̲COH (K) according to the equation (9) (step S105).
Then, the disturbing voice section detection unit 25 determines whether the condition of the disturbing voice section, namely "the value DIFF(K) is equal to or greater than the disturbing voice section determination threshold Φ and the coherence COH(K) is larger than the average value AVE_COH(K) of the coherence in the non-target voice section", is satisfied (step S106A).
[0083]
If this condition is not satisfied (if it is not a disturbing voice section), the average parameter
control unit 30 stores 0 in the determination result storage variable var̲new of the current
frame (step S150). Thereafter, the disturbing voice coherence averaging processing unit 26
applies the value DIST̲COH (K-1) in the immediately preceding analysis frame K-1 as the average
value DIST̲COH (K) of the coherence in the disturbing voice section (step S108).
[0084]
On the other hand, when the condition of the disturbing voice section is satisfied (in the case of a disturbing voice section), the average parameter control unit 30 stores 1 in the determination result storage variable var_new of the current frame (step S151), and compares the determination result storage variable var_new of the current frame with the determination result storage variable var_old of the immediately preceding frame (step S152). When the determination result storage variable var_new of the current frame exceeds the determination result storage variable var_old of the previous frame, the average parameter control unit 30 sets a large fixed value close to 1.0 as the average parameter ζ used for calculating the disturbing voice section coherence average value (step S154); when the determination result storage variable var_new of the current frame does not exceed the determination result storage variable var_old of the immediately preceding frame, the average parameter control unit 30 sets the initial value as the average parameter ζ (step S153). After such setting, the disturbing voice coherence averaging processing unit 26 calculates the average value DIST_COH(K) of the coherence in the disturbing voice section according to equation (10) (step S107).
[0085]
With the average value DIST_COH(K) of the disturbing voice section obtained as described above as a key, the target voice section determination threshold collating unit 27 executes a search process on the storage unit 28, and the target voice determination threshold Θ associated with the range to which the key average value DIST_COH(K) belongs is extracted; it is transmitted by the target voice section determination threshold transmission unit 29 to the target voice section detection unit 14 as the target voice determination threshold Θ(K) applied in the current analysis frame K (step S109).
[0086]
Thereafter, the disturbance voice section determination result handover unit 31 rewrites the
determination result storage variable var̲old of the immediately preceding frame into the
determination result storage variable var̲new of the current frame (step S155).
Then, the parameter K is incremented by 1 (step S110), and the process returns to the process by
the coherence receiver 21.
[0087]
Note that the value stored in the determination result storage variable var̲new of the current
frame or the determination result storage variable var̲old of the immediately preceding frame is
not limited to 1 or 0. When different values are stored, the determination condition of step S152
may be changed accordingly.
[0088]
Also, although the case where the average parameter ζ is set to a large value close to 1.0 for only one frame immediately after switching from the background noise section to the disturbing voice section has been described above, the average parameter ζ may instead be kept at a large value close to 1.0 continuously for a predetermined number of frames from the frame immediately after the switching. For example, control may be performed such that the average parameter ζ is set to a large value close to 1.0 continuously for the five frames immediately after the switching, and returned to the initial value in subsequent frames.
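The multi-frame variant just described can be sketched by adding a countdown to the single-frame control; the frame count and ζ values are assumptions for illustration.

```python
class MultiFrameZetaControl:
    """Variant from the text: keep zeta boosted for a fixed number of
    frames after the background-noise -> disturbing-voice switch, then
    return to the initial value. All numeric values are assumed."""
    def __init__(self, boost_frames=5, zeta_init=0.1, zeta_boost=0.9):
        self.boost_frames = boost_frames
        self.zeta_init = zeta_init
        self.zeta_boost = zeta_boost
        self.var_old = 0
        self.remaining = 0      # boosted frames still to emit

    def select_zeta(self, is_disturbing):
        var_new = 1 if is_disturbing else 0
        if var_new > self.var_old:          # switch detected
            self.remaining = self.boost_frames
        self.var_old = var_new
        if is_disturbing and self.remaining > 0:
            self.remaining -= 1
            return self.zeta_boost
        return self.zeta_init
```

With boost_frames=5 this reproduces the five-frame example given above.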
[0089]
(C-3) Effects of the Third Embodiment According to the third embodiment, when it is detected that the background noise section has switched to the disturbing voice section, the parameter in the formula for calculating the coherence average of the disturbing voice section is controlled; therefore, the follow-up delay of the coherence average can be minimized, and the target voice section determination threshold can be set more appropriately.
[0090]
As a result, it is possible to expect improvement in call sound quality in a communication
apparatus such as a television conference apparatus or a cellular phone to which the audio signal
processing apparatus, method or program of the third embodiment is applied.
[0091]
(D) Other Embodiments In the description of each of the above-described embodiments, various
modified embodiments are mentioned, but further, modified embodiments as exemplified below
can be mentioned.
[0092]
In equation (10), the coherence average value DIST_COH(K) in the disturbing voice section is updated based on the coherence COH(K) in the current frame, but depending on the noise characteristics, detection may be more accurate if the influence of instantaneous variation of the coherence COH(K) is slightly relaxed.
In that case, the coherence average value DIST_COH(K) in the disturbing voice section may be updated based on the coherence average value AVE_COH(K) in the non-target voice section.
The following equation (11) is the calculation formula for this modified embodiment.
[0093]
DIST_COH(K) = ζ × AVE_COH(K) + (1 − ζ) × DIST_COH(K−1) (11)

In the above embodiments, it has been shown that the threshold used by the target voice section detection unit is determined based on the coherence average value of the disturbing voice section, but the parameter used to determine the threshold is not limited to the coherence average value.
The parameter may be any parameter that reflects the tendency of the coherence in the immediately preceding period; for example, the threshold may be set based on the peak of the coherence obtained by applying a known peak hold method. Also, the threshold may be set based on statistics of the coherence such as the variance and the standard deviation.
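For the peak hold alternative mentioned above, one common form of peak-hold update (the text does not specify the method, so this form is an assumption) tracks upward jumps immediately and decays slowly otherwise:

```python
def peak_hold(coh, peak_prev, decay=0.99):
    """Assumed peak-hold update: follow the coherence instantly when it
    exceeds the held peak, otherwise let the peak decay slowly. The
    threshold could then be derived from this peak instead of the
    average. decay is an illustrative constant."""
    return coh if coh > peak_prev else decay * peak_prev
```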
[0094]
In each of the above embodiments, the non-target voice section detection unit 22 determines which of the two update methods of the coherence average value is to be applied using the single threshold Ψ, but three or more update methods of the coherence average value may be prepared, and a plurality of thresholds may be provided in accordance with the number of update methods. For example, a plurality of update methods in which δ in equation (8) differs may be prepared.
[0095]
Each of the above-described embodiments may be used in combination with any one, any two, or all of the known frequency subtraction, coherence filter, and Wiener filter techniques; higher noise suppression performance can be realized by such combined use. Hereinafter, the configuration and operation when each of the frequency subtraction, the coherence filter, and the Wiener filter is used together with the first embodiment will be briefly described.
[0096]
FIG. 8 is a block diagram showing the configuration of a modified embodiment in which frequency subtraction and the first embodiment are used in combination, in which the same or corresponding parts as in FIG. 1 according to the first embodiment are given the same or corresponding reference numerals.
[0097]
In FIG. 8, an audio signal processing apparatus 1C according to this modified embodiment
includes a frequency subtracting unit 40 in addition to the configuration of the first embodiment.
The frequency subtraction unit 40 includes a third directivity formation unit 41, a subtraction
unit 42, and an IFFT unit 43.
[0098]
Here, frequency subtraction is a method of performing noise suppression by subtracting a non-target audio signal component from the input signal.
[0099]
The third directivity forming unit 41 receives the two input signals X1(f, K) and X2(f, K) converted into the frequency domain by the FFT unit 10.
The third directivity forming unit 41 executes equation (12) to generate the third directivity signal B3(f, K) according to a directivity characteristic having a dead angle in the front as shown in FIG. This directivity signal B3(f, K) is formed as a noise signal and supplied to the subtraction unit 42 as the subtrahend input. The input signal X1(f, K) converted to the frequency domain is given to the subtraction unit 42 as the minuend input, and the subtraction unit 42 subtracts the third directivity signal B3(f, K) from the input signal X1(f, K) as shown in equation (13) to obtain the frequency-subtraction-processed signal D(f, K). The IFFT unit 43 converts the frequency subtraction processed signal D(f, K) into a time domain signal q(n) and supplies it to the voice switch gain multiplication unit 16.
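A per-bin sketch of equations (12) and (13) follows. Performing the subtraction of equation (13) on magnitude spectra with a zero floor, while reusing the phase of X1, is a common implementation choice assumed here rather than stated in the text.

```python
import cmath

def frequency_subtraction(x1, x2):
    """Sketch of eqs. (12)-(13) for one frame of complex spectra.
    The magnitude-domain subtraction and zero floor are assumptions."""
    d = []
    for a, b in zip(x1, x2):
        b3 = a - b                                  # eq. (12): blind spot in front
        mag = max(abs(a) - abs(b3), 0.0)            # eq. (13) on magnitudes, floored
        d.append(cmath.rect(mag, cmath.phase(a)))   # reuse the phase of X1
    return d
```

The zero floor prevents negative magnitudes when the noise estimate exceeds the input in a bin.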
[0100]
B3(f, K) = X1(f, K) − X2(f, K) (12)
D(f, K) = X1(f, K) − B3(f, K) (13)

FIG. 10 is a block diagram showing the configuration of a modified embodiment in which the coherence filter and the first embodiment are used in combination, in which the same or corresponding parts as in FIG. 1 according to the first embodiment are given the same or corresponding reference numerals.
[0101]
In FIG. 10, an audio signal processing apparatus 1D according to this modified embodiment
includes a coherence filter operation unit 50 in addition to the configuration of the first
embodiment.
The coherence filter operation unit 50 includes a coherence filter coefficient multiplication unit
51 and an IFFT unit 52.
[0102]
Here, the "coherence filter" refers to a noise removal technique that suppresses signal components with a biased direction of arrival by multiplying the input signal, for each frequency, by the coefficient coef(f, K) obtained by the above-mentioned equation (6).
[0103]
The coherence filter coefficient multiplication unit 51 multiplies the input signal X1(f, K) by the coefficient coef(f, K) obtained in the course of the calculation by the coherence calculation unit 13, as shown in equation (14), to obtain the noise-suppressed signal D(f, K).
The IFFT unit 52 converts the noise-suppressed signal D(f, K) into a time domain signal q(n), and supplies the time domain signal q(n) to the voice switch gain multiplication unit 16.
[0104]
D(f, K) = X1(f, K) × coef(f, K) (14)

FIG. 11 is a block diagram showing the configuration of a modified embodiment in which the Wiener filter and the first embodiment are used in combination, in which the same or corresponding parts as in FIG. 1 according to the first embodiment are given the same or corresponding reference numerals.
[0105]
In FIG. 11, an audio signal processing apparatus 1E according to this modified embodiment
includes a Wiener filter operation unit 60 in addition to the configuration of the first
embodiment.
The Wiener filter operation unit 60 includes a Wiener filter coefficient calculation unit 61, a
Wiener filter coefficient multiplication unit 62, and an IFFT unit 63.
[0106]
Here, the "Wiener filter" is a technique for removing noise by multiplying by coefficients obtained by estimating the noise characteristic for each frequency from the signal of the noise section, as described in Patent Document 2.
[0107]
The Wiener filter coefficient calculation unit 61 refers to the detection result of the target voice section detection unit 14 and, if it is a non-target voice section, estimates the Wiener filter coefficient wf_coef(f, K) (see the formula of "Equation 3" of Patent Document 2).
On the other hand, if it is the target voice section, estimation of the Wiener filter coefficient is not performed. The Wiener filter coefficient multiplication unit 62 multiplies the input signal X1(f, K) by the Wiener filter coefficient wf_coef(f, K) as shown in equation (15) to obtain the noise-suppressed signal D(f, K). The IFFT unit 63 converts the noise-suppressed signal D(f, K) into a time domain signal q(n), and supplies the time domain signal q(n) to the voice switch gain multiplication unit 16.
[0108]
D(f, K) = X1(f, K) × wf_coef(f, K) (15)

In the above, configurations in which the voice switch processing is applied after the frequency subtraction processing, the coherence filter processing, or the Wiener filter processing have been shown, but this processing order may be reversed.
In each of the above embodiments, processing performed on signals in the frequency domain may, where possible, be performed on signals in the time domain, and conversely, processing performed on signals in the time domain may, where possible, be performed on signals in the frequency domain.
[0110]
In the above embodiments, the case where the signals captured by the pair of microphones are processed immediately has been shown, but the audio signal to which the present invention is applied is not limited to this.
For example, the present invention can be applied to the case of processing a pair of audio signals read from a recording medium, and also to the case of processing a pair of audio signals transmitted from a counterpart apparatus.
[0111]
1, 1A, 1B … audio signal processing device; m_1, m_2 … microphone; 10 … FFT unit; 11 … first directivity forming unit; 12 … second directivity forming unit; 13 … coherence calculation unit; 14 … target voice section detection unit; 15 … gain control unit; 16 … voice switch gain multiplication unit; 20, 20A, 20B … target voice section determination threshold control unit; 21 … coherence receiving unit; 22 … non-target voice section detection unit; 23 … non-target voice coherence average processing unit; 24 … difference calculation unit; 25 … disturbing voice section detection unit; 26 … disturbing voice coherence average processing unit; 27 … target voice section determination threshold collating unit; 28 … storage unit; 29 … target voice section determination threshold transmission unit; 30 … average parameter control unit; 31 … disturbing voice section determination result handover unit.