Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2009130908
A noise suppression method, apparatus, program, and recording medium therefor are provided that can follow a rapid change in the direction of a target sound, a change in the direction of the target sound within a predetermined time length, or even a switch of the target sound, without degrading the noise suppression performance. SOLUTION: A time difference compensation unit 2 calculates and outputs post-compensation signal sequences yi(k), in which the delay time differences have been compensated, based on the received signal sequences xi(k), the positional information of the microphone elements M1, M2, ..., Mm, and the directions θx(k), θy(k) of the target sound. An error calculation unit 4 outputs the sum of the error components between the post-compensation signal sequences yi(k) as an error e(k). A direction calculation unit 3 calculates the directions θx(k), θy(k) of the target sound s that minimize the error e(k) and outputs them to the time difference compensation unit 2. An adaptive beamforming processing unit 5 combines the input post-compensation signal sequences yi(k) to extract an output signal z(k) of the target sound. [Selected figure] Figure 1
Noise suppression method, apparatus, program and recording medium therefor
[0001]
The present invention relates to a noise suppression method, apparatus, program, and recording medium therefor, and in particular to a noise suppression method, apparatus, program, and recording medium therefor for suppressing noise and extracting only a target sound based on sound reception signals detected by a plurality of microphone elements.
[0002]
In noisy environments such as streets, cars, or train platforms, even when a microphone placed close to the mouth, such as a handset or headset, is used, interfering voices and ambient noise may be mixed in with the desired target sound.
To solve this problem, various noise cancellers and noise suppressors have been proposed. Noise cancellers can be broadly classified into those using a single microphone and those using a microphone array composed of multiple microphones to obtain higher noise suppression performance.
[0003]
In a microphone array, a plurality of microphone elements are spatially arranged, and a time difference or an amplitude difference depending on the spatial positional relationship between each microphone element and the sound source is reflected in the sound reception signal of each microphone element. Non-Patent Document 1 discloses, as a noise suppression method using a microphone array, a technique for selectively collecting only the target sound and for separating the target sound and the interfering sound by using statistical information on the time differences and amplitude differences of the sound reception signals detected by the microphone elements.
[0004]
As noise suppression methods using such a microphone array, methods based on adaptive beamforming, independent component analysis, and time-frequency masking are well known.
[0005]
In the method based on adaptive beamforming, the direction of the target sound is input separately, and noise is suppressed by learning a spatial filter through convergence calculation so that directional nulls are directed toward the other, interfering sounds.
For this reason, when the assumed direction of the target sound deviates from the actual direction, or when the direction of the target sound changes from moment to moment, performance degradation appears notably.
[0006]
To solve such technical problems, Patent Document 1 discloses a technique that uses two adaptive beamformers, one directing a directional null toward the direction of the target sound and the other toward the direction of the interfering sound, estimates each direction from their spatial filters, and makes the directivity follow changes in the direction of the target sound.
[0007]
In the method based on independent component analysis, the directions of the target sound and the interfering sound may be unknown, and the spatial filter is trained so as to direct directional nulls toward the directions of the target sound and the interfering sound.
The learning of the spatial filter takes advantage of the statistical independence of the target sound and the interfering sound, and requires a sound reception signal several seconds long. For this reason, when the direction of the target sound changes from moment to moment, performance degradation appears notably. To solve this problem, Patent Document 2 discloses a technique in which a spatial filter stored in advance in a storage unit is used as the initial spatial filter for learning, so as to follow changes in the direction of the target sound.
[0008]
In addition, when the method based on adaptive beamforming described above is used, giving the beamformer blunt directivity toward the target sound can prevent degradation of the target sound as long as the change in the direction of the target sound stays within a certain range. However, because the directivity toward the target sound is dull, the noise suppression performance is low. To solve this problem, Patent Document 3 discloses a technique in which a noise canceller using a single microphone is connected in series to maintain the noise suppression performance.
[0009]
On the other hand, in the method based on time-frequency masking, the directions of the target sound and the interfering sound may be unknown; to extract the target sound, the inter-channel level difference and inter-channel phase difference of each frequency component are calculated, the frequency components are classified based on these inter-channel differences, and the target sound and the interfering sound are separated.
Calculating the inter-channel level difference and inter-channel phase difference of each frequency component requires frequency analysis over each predetermined time length. For this reason, when the direction of the target sound changes from moment to moment, performance degradation appears. Such a time-frequency masking method is described, for example, in Patent Document 4.
[0010]
As described above, the conventional noise suppression methods realize noise suppression by selectively collecting only the target sound or by separating the target sound and the interfering sound, and they aim to improve the ability to follow changes in the direction of the target sound when that direction changes from moment to moment. JP 2006-217649 A; JP 2007-156300 A; JP 2007-093630 A; JP 10-313497 A; Technical Report of the Institute of Electronics, Information and Communication Engineers, EA 2002-11, "On the use of spatial information in speech enhancement under noise".
[0011]
However, in Patent Document 1, in order to follow the change in direction of the target sound, the spatial filter of the adaptive beamformer is learned by calculation over each predetermined time length, and the direction of the target sound is estimated. When there is no change in the direction of the target sound within the predetermined time length, the direction can be estimated accurately, but there is a problem that a change in the direction of the target sound within the predetermined time length cannot be followed.
[0012]
In Patent Document 2, in order to follow the change in direction of the target sound, an initial spatial filter is selected from a plurality of spatial filters recorded in advance in storage means, and the learning calculation is performed using the initial spatial filter and a sound reception signal of a predetermined time length.
Therefore, when there is no change in the direction of the target sound within the predetermined time length, the direction can be estimated accurately, but there is a problem that a change in the direction of the target sound within the predetermined time length cannot be followed. Moreover, not only are many storage areas required to store the plurality of spatial filters in advance, but it is not always possible to perform the learning calculation with an appropriate initial spatial filter selected from the plurality of candidates. In addition, when the initial spatial filter is selected incorrectly, there is a problem that the target sound is degraded and the learning time is further lengthened.
[0013]
In Patent Document 3, the change in direction of the target sound is assumed to stay within a certain range, so when the direction of the target sound changes beyond that range, there is a problem that the change cannot be followed and the target sound is degraded. In this case, the degradation of the target sound cannot be restored by the noise canceller using a single microphone connected in the subsequent stage.
[0014]
In Patent Document 4, the inter-channel level difference and inter-channel phase difference of each frequency component are calculated over each predetermined time length in order to follow the change in direction of the target sound. Therefore, when there is no change in the direction of the target sound within the predetermined time length, the inter-channel differences can be calculated accurately, but there is a problem that a change in the direction of the target sound within the predetermined time length cannot be followed.
[0015]
As described above, in any of the above-described conventional techniques, the spatial filter must be learned by convergence calculation in consideration of the change in direction of the target sound.
The spatial filter is a vector with a length of several hundred samples or more, and a signal sequence of at least the same number of samples is required to obtain the statistics necessary for its convergence, so the convergence calculation takes time.
[0016]
In addition, since the learning calculation is performed based on statistical information such as correlation and independence over each predetermined time length, there is a problem that a rapid change in the direction of the target sound, a change in the direction of the target sound within the predetermined time length, or a switch of the target sound cannot be followed, and the noise suppression performance deteriorates remarkably. In particular, when the microphone array is attached to a portable device, not only the target sound but also the microphone array moves, so the relative change in the direction of the target sound is even faster.
[0017]
The object of the present invention is to solve the above-mentioned problems of the prior art and to provide a noise suppression method, apparatus, program, and recording medium therefor in which a rapid change in the direction of the target sound, a change in direction within a predetermined time length, or even a switch of the target sound can be followed without deterioration of the noise suppression performance.
[0018]
In order to achieve the above object, the present invention is characterized in that a noise suppressor for suppressing noise components in the sound reception signals detected by a plurality of microphone elements and extracting a target sound includes: delay time calculation means for calculating the delay time included in each sound reception signal based on the positional information of each microphone element and the separately input direction of the target sound; time difference compensation means for outputting, for each microphone element, a post-compensation signal in which the delay time has been compensated, based on the time-series sound reception signal detected by that microphone element and the delay time; error calculation means for calculating the error components between the respective post-compensation signals; direction calculation means for calculating the direction of the target sound that minimizes the error components and outputting it to the delay time calculation means; and target sound extraction means for extracting the target sound based on the post-compensation signals of the respective microphone elements.
[0019]
According to the present invention, even if the direction of the target sound changes, the spatial filter can be learned on the assumption that there is no change in direction, that is, that the target direction is fixed. The time required for the convergence calculation is therefore significantly reduced, and the response to changes in direction is improved.
Consequently, it is possible to follow a rapid change in the direction of the target sound, a change in the direction of the target sound within a predetermined time length, or a switch of the target sound.
[0020]
Preferred embodiments of the present invention will be described in detail below with reference to the drawings.
FIG. 1 is a block diagram showing the configuration of the main part of a noise suppression device to which the present invention is applied. The microphone array 1 is composed of m microphone elements M1, M2, ..., Mm, and outputs, in time series, the sound reception signal sequences xi(k) detected at periodic discrete times k (k = 1, 2, 3, ...).
[0021]
The time difference compensation unit 2 includes a delay time calculation unit 2a that calculates the delay time δi(θx, θy) included in each sound reception signal sequence xi(k) on the basis of the positional information of the microphone elements M1, M2, ..., Mm and the directions θx(k), θy(k) of the target sound output from the direction calculation unit 3. Based on these delay times δi(θx, θy) and the sound reception signal sequences xi(k), the time difference compensation unit 2 calculates and outputs post-compensation signal sequences yi(k) in which the delay time differences occurring between the respective sound reception signal sequences xi(k) have been compensated.
[0022]
FIG. 2 schematically shows how a delay arises in each sound reception signal according to the relative positional relationship between the direction of the target sound and the position of each microphone element Mi when the target sound is detected by the microphone array 1.
Here, all the microphone elements Mi (i = 1, 2, ..., m) are disposed at arbitrary positions on the XY plane, and their position coordinates pi (i = 1, 2, ..., m) are assumed to be known.
[0023]
When a sound wave emitted from a target sound source whose position is unknown arrives at each microphone element Mi as a plane wave, the sound reception signal sequence xi(k) at discrete time (sampling timing) k depends on an absolute delay determined by the distance from the target sound source to the origin of the XY plane and on a delay time δi(θx, θy) that depends on the directions θx and θy of the target sound s as seen from the origin of the XY plane. Therefore, the sound reception signal sequence xi(k) and the signal sequence s(k) of the target sound s have the relationship of the following equation (1).
[0024]
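The equation image that occupied this paragraph was not carried over by the machine translation. Based on the preceding description (an absolute delay τ plus a direction-dependent delay δi(θx, θy)), equation (1) presumably has roughly the following form; this is a reconstruction, not the original figure:

$$x_i(k) = s\bigl(k - \tau - \delta_i(\theta_x, \theta_y)\bigr), \qquad i = 1, 2, \ldots, m \tag{1}$$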
[0025]
Here, the delay time τ is an absolute delay independent of the directions θx and θy of the target sound s. In the present invention, the delay time τ need not be taken into consideration, as long as the received signal sequences xi(k) can be corrected at each discrete time k as if the directions θx and θy of the target sound s were kept constant.
[0026]
Post-compensation signal sequences yi(k) (i = 1, 2, ..., m), in which the differences between the delay times δi(θx, θy) included in the sound reception signals of the microphone elements and dependent on the directions θx and θy of the target sound s are compensated, are obtained, for example as shown by the following equation (2), by applying each sound reception signal sequence xi(k) and each delay time δi(θx, θy) to a sinc function sinc(x) that realizes a time shift, and by optimizing the directions θx(k), θy(k) of the target sound s.
Here, D is a fixed delay for satisfying causality, N is the length of the sinc function sinc(x), and T is the sampling interval.
[0028]
In this embodiment, the time difference compensation unit 2 receives the sound reception signal sequences xi(k) output from the microphone array 1 and the directions θx(k), θy(k) of the target sound s output from the direction calculation unit 3 described later, and, by applying them to the above equations (1) and (2), outputs post-compensation signal sequences yi(k) in which the delay times δi(θx, θy) have been compensated.
[0029]
The sinc function sinc(x) in this embodiment is used as a resampling interpolation kernel for obtaining the post-compensation signal sequence yi(k), in which the time difference δi(θx, θy) has been compensated, from the sound reception signal sequence xi(k).
This is to realize time shifts other than integral multiples of the sampling interval T, that is, fractional delays. If oversampling is performed to reduce the sampling interval T, or if the sampling rate is such that time shifts in integral multiples of T are sufficient, the compensation may instead be realized by time shifts of unit delays.
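As an illustration only, and not the patented implementation, the following sketch shows fractional-delay compensation with a truncated sinc kernel as described above; the plane-wave delay model, the direction-cosine convention, and all parameter values are assumptions introduced here.

```python
import numpy as np

C = 340.0  # assumed speed of sound [m/s]

def delay_times(mic_positions, theta_x, theta_y):
    """Delay delta_i [s] of each microphone relative to the array origin for a
    plane wave arriving from directions (theta_x, theta_y) on the XY plane;
    mic_positions is an (m, 2) array (assumed direction-cosine model)."""
    u = np.array([np.cos(theta_x), np.cos(theta_y)])
    return mic_positions @ u / C

def compensate(x, delta, T, N=32, D=16):
    """Return the reception signal x shifted by -delta seconds using a sinc
    interpolation kernel of length N with a fixed causal delay of D samples."""
    n = np.arange(N)
    h = np.sinc(n - D - delta / T)        # fractional-delay FIR kernel
    return np.convolve(x, h)[:len(x)]     # same length as x, delayed by D samples
```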
[0030]
The error calculation unit 4 receives the post-compensation signal sequences yi(k) of the microphone elements M1, M2, ..., Mm at discrete time k, and outputs the sum of the error components between the post-compensation signal sequences yi(k) at discrete time k as an error e(k).
[0031]
FIG. 3 is a functional block diagram of the error calculation unit 4. In the present embodiment, the squared differences between the post-compensation signal sequences yi(k) at discrete time k are determined, and their sum is calculated as the error e(k).
The error e(k) is an error function that is minimized when the time differences between the post-compensation signal sequences yi(k) are zero, and it is minimized as the directions θx and θy of the target sound s are optimized.
[0032]
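The equation image in this paragraph, presumably equation (3), is missing from the translation. From the description of a sum of squared differences between the post-compensation signal sequences, a plausible reconstruction is:

$$e(k) = \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} \bigl(y_i(k) - y_j(k)\bigr)^2 \tag{3}$$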
[0033]
The direction calculation unit 3 receives as input the error e(k) at discrete time k output from the error calculation unit 4, and calculates and outputs the directions θx(k), θy(k) of the target sound s that minimize the error e(k).
When the steepest descent method is used to calculate the directions θx and θy of the target sound s, the directions θx and θy calculated at discrete time k are applied to the following equation (4) to obtain the directions θx and θy of the target sound s at the next discrete time k+1.
Here, μ is a step size parameter.
[0034]
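The image holding equation (4) is missing here. A steepest-descent update consistent with the surrounding description would be the following (a reconstruction):

$$\theta_x(k+1) = \theta_x(k) - \mu\,\frac{\partial e(k)}{\partial \theta_x(k)}, \qquad \theta_y(k+1) = \theta_y(k) - \mu\,\frac{\partial e(k)}{\partial \theta_y(k)} \tag{4}$$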
[0035]
In the adaptive beamforming processing unit 5, the direction of the target sound is fixed in advance to the front direction (on the z-axis), and the spatial filter is trained, based on the input post-compensation signal sequences yi(k), so as to direct directional nulls toward the interfering sounds other than the target sound, thereby extracting the output signal z(k) of the target sound s.
[0036]
Here, in the present embodiment, even if the direction of the target sound changes, the time differences between the post-compensation signal sequences yi(k) remain zero, so the adaptive beamforming processing unit 5 can learn the spatial filter as if there were no change in the direction of the target sound.
As a result, the time required for the convergence calculation of the spatial filter is significantly shortened and the response to changes in direction is improved, so that a rapid change in the direction of the target sound, a change in direction within a predetermined time length, or even a switch of the target sound can be followed.
[0037]
In this embodiment, by performing the above procedure continuously and sequentially at each discrete time, it becomes possible to selectively pick up only the target sound, or to separate the target sound and the interfering sound precisely, without deteriorating the noise suppression performance.
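A minimal sketch of this per-sample loop is given below, reusing the hypothetical helpers delay_times and compensate from the earlier example. The numerical gradient, the initial direction guess, and the simple averaging used in place of the adaptive beamformer are illustrative assumptions, not the patented processing.

```python
def track_and_extract(x, mic_positions, T, mu=1e-3, eps=1e-4):
    """x: (m, K) array of reception signals xi(k). Returns an estimate z(k) of
    the target sound. Illustrative only: the gradient of e(k) is approximated
    numerically and the compensated channels are simply averaged."""
    m, K = x.shape
    theta = np.array([np.pi / 2, np.pi / 2])   # assumed initial guess (broadside)

    def error(th, k):
        delta = delay_times(mic_positions, th[0], th[1])
        y = np.array([compensate(x[i, :k + 1], delta[i], T)[-1] for i in range(m)])
        e = sum((y[i] - y[j]) ** 2 for i in range(m) for j in range(i + 1, m))
        return e, y

    z = np.zeros(K)
    for k in range(K):
        e, y = error(theta, k)
        grad = np.array([(error(theta + eps * np.eye(2)[d], k)[0] - e) / eps
                         for d in range(2)])   # steepest-descent step, cf. eq. (4)
        theta = theta - mu * grad
        z[k] = y.mean()                        # stand-in for the beamformer output
    return z
```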
[0038]
FIG. 4 is a functional block diagram showing the configuration of the second embodiment of the
present invention, and the same reference numerals as above denote the same or equivalent
parts.
[0039]
The present embodiment is characterized in that a delay-and-sum beamforming processing unit 6 is provided in place of the adaptive beamforming processing unit 5 of the first embodiment described above.
The delay-and-sum beamforming processing unit 6 directs its directivity toward the direction of the target sound and outputs the output signal z(k), which is the target sound.
In this embodiment as well, the direction of the target sound is treated as a fixed value, and even if the direction of the target sound changes, the delay-and-sum beamforming processing unit 6 can learn the spatial filter as if there were no change in direction, so the response to changes in the direction of the target sound is improved.
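For reference only, once the sequences have been time-aligned by the compensation stage, a delay-and-sum stage reduces to a simple channel average; a hedged sketch, assuming the yi(k) samples are collected in an (m, K) NumPy array y:

```python
def delay_and_sum(y):
    """y: (m, K) array of post-compensation sequences yi(k). Because the
    target-sound components are already aligned, delay-and-sum is the mean."""
    return y.mean(axis=0)
```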
[0040]
FIG. 5 is a functional block diagram showing the configuration of the third embodiment of the
present invention, and the same reference numerals as above denote the same or equivalent
parts.
[0041]
The present embodiment is characterized in that an independent component analysis processing unit 7 is provided instead of the adaptive beamforming processing unit 5 of the first embodiment described above.
The independent component analysis processing unit 7 trains the spatial filter so as to direct directional nulls toward the directions of the target sound and the interfering sound, and outputs the target sound and the other interfering sounds as separate output signals zi(k).
In this embodiment as well, even if the direction of the target sound changes, the independent component analysis processing unit 7 can learn the spatial filter as if there were no change in direction, so the responsiveness to changes in the direction of the target sound is improved.
[0042]
FIG. 6 is a functional block diagram showing the configuration of the fourth embodiment of the
present invention, and the same reference numerals as above denote the same or equivalent
parts.
[0043]
The present embodiment is characterized in that a time-frequency masking processing unit 8 is provided in place of the adaptive beamforming processing unit 5 of the first embodiment described above.
In order to extract the target sound, the time-frequency masking processing unit 8 calculates the inter-channel level difference and inter-channel phase difference of each frequency component, classifies each frequency component based on these inter-channel differences, and outputs the target sound and the other interfering sounds as separate output signals zi(k).
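As an illustrative sketch only, and not the patented method, a binary time-frequency mask built from the inter-channel phase difference of two compensated channels might look as follows; the STFT parameters and the thresholding rule are assumptions (NumPy and SciPy are assumed available):

```python
from scipy.signal import stft, istft

def tf_mask_extract(y1, y2, fs, phase_thresh=0.5):
    """Keep time-frequency bins whose inter-channel phase difference is small
    (the target is assumed aligned after compensation) and suppress the rest."""
    _, _, Y1 = stft(y1, fs=fs, nperseg=512)
    _, _, Y2 = stft(y2, fs=fs, nperseg=512)
    phase_diff = np.angle(Y1 * np.conj(Y2))     # inter-channel phase difference
    mask = np.abs(phase_diff) < phase_thresh    # binary mask selecting target bins
    _, z = istft(Y1 * mask, fs=fs, nperseg=512)
    return z
```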
[0044]
In the present embodiment as well, even if the direction of the target sound changes, the time-frequency masking processing unit 8 can perform its processing as if there were no change in direction, so the responsiveness to changes in the direction of the target sound is improved.
[0045]
FIG. 1 is a block diagram showing the configuration of the main part of the noise suppression device to which the present invention is applied.
FIG. 2 is a diagram schematically showing how a delay arises in each sound reception signal according to the relative positional relationship between the target sound and each microphone when the target sound is detected by the microphone array.
FIG. 3 is a functional block diagram of the error calculation unit. FIG. 4 is a functional block diagram showing the configuration of the second embodiment of the present invention. FIG. 5 is a functional block diagram showing the configuration of the third embodiment of the present invention. FIG. 6 is a functional block diagram showing the configuration of the fourth embodiment of the present invention.
Explanation of Reference Numerals
[0046]
DESCRIPTION OF SYMBOLS: 1 ... microphone array; 2 ... time difference compensation unit; 3 ... direction calculation unit; 4 ... error calculation unit; 5 ... adaptive beamforming processing unit; 6 ... delay-and-sum beamforming processing unit; 7 ... independent component analysis processing unit; 8 ... time-frequency masking processing unit