JP2012165273

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2012165273
The present invention provides a sound source signal extraction technique that extracts a sound source signal at a lower computational cost than the prior art. A frequency domain conversion unit converts the signals collected by a microphone array arranged on a two-dimensional plane or along a one-dimensional straight line into frequency-domain signals. A window function unit multiplies the frequency-domain signals by a window function to generate windowed frequency-domain signals. A filter unit filters the windowed frequency-domain signals using a filter based on a back-propagation calculation that ignores inhomogeneous waves. A time domain conversion unit converts the filtered signal into a time-domain signal by an inverse Fourier transform. [Selected figure] Figure 1
Sound source signal extraction device, method and program
[0001]
The present invention relates to a technology for collecting a sound signal with a microphone
array installed in a certain sound field and extracting a sound source signal or the like with a
high SN ratio.
[0002]
Non-Patent Document 1 describes a technique that estimates the sound pressure gradient distribution on a plane parallel to the microphone array plane, so that sound picked up by a large microphone array can be reproduced by a large loudspeaker array at a remote location.
04-05-2019
With the technique of Non-Patent Document 1, a sound source signal can be extracted by obtaining the sound pressure gradient distribution on a plane that is parallel to the microphone array plane and contains the sound source position.
[0003]
Non-Patent Document 1: Shoichi Koyama and four others, "Inverse Wave Propagation in Wave Field Synthesis," AES 40th International Conference, Tokyo, Japan, October 8-10, pp. 2-10.
[0004]
However, obtaining the sound pressure gradient distribution over the entire plane containing the sound source position requires a large computational cost.
[0005]
An object of the present invention is to provide a sound source signal extraction apparatus,
method and program for extracting a sound source signal at a calculation cost smaller than that
of the prior art.
[0006]
To solve the above problem, the sound source signal is extracted by estimating the sound pressure only at the sound source position, using a filter based on a back-propagation calculation that ignores inhomogeneous waves.
[0007]
Because the sound pressure is estimated only at the sound source position rather than as a sound pressure gradient distribution over the entire plane, the sound source signal can be extracted at a lower computational cost than in the prior art.
[0008]
FIG. 1 is a functional block diagram of an example of the sound source signal extraction apparatus according to the first embodiment.
FIG. 2 is a diagram illustrating an example arrangement of the microphone array of the sound source signal extraction apparatus according to the first embodiment.
FIG. 3 is a functional block diagram of an example of the sound source signal extraction apparatus according to the second embodiment.
FIG. 4 is a diagram illustrating an example arrangement of the microphone array of the sound source signal extraction apparatus according to the second embodiment.
FIG. 5 is a flowchart showing an example of the sound source signal extraction method.
FIG. 6 shows an example of a sound source signal.
FIG. 7 shows the sound signal collected by a microphone.
FIG. 8 shows the sound source signal extracted by the sound source signal extraction method of the second embodiment.
FIG. 9 shows the sound source signal extracted by a conventional sound source signal extraction method.
[0009]
Hereinafter, an embodiment of the present invention will be described with reference to the
drawings.
[0010]
First Embodiment
As shown in FIG. 2, the sound source signal extraction apparatus and method of the first embodiment use a two-dimensional microphone array M1-1, M2-1, ..., MNx-Ny consisting of Nx × Ny omnidirectional microphones arranged on the plane z = z0. Using the sound signals collected by this array, they extract the sound signal emitted by the sound source S located at (xs, ys, zs).
[0011]
Nx and Ny are positive integers.
The number of microphones in the microphone array M1-1, M2-1, ..., MNx-Ny is basically arbitrary, but as many microphones as possible are desirable for acquiring the spatial sound pressure distribution. The spacing between adjacent microphones of the array is preferably no more than half the wavelength of the sound signal to be collected.
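The half-wavelength condition can be turned into a quick design check. The sketch below assumes a speed of sound of c = 343 m/s (air at about 20 °C), which the text does not specify; `max_mic_spacing` and `max_alias_free_frequency` are hypothetical helper names.

```python
def max_mic_spacing(f_max_hz: float, c: float = 343.0) -> float:
    """Largest microphone spacing (m) that is at most half a wavelength at f_max_hz."""
    return c / (2.0 * f_max_hz)


def max_alias_free_frequency(spacing_m: float, c: float = 343.0) -> float:
    """Highest frequency (Hz) whose half wavelength is still >= spacing_m."""
    return c / (2.0 * spacing_m)


if __name__ == "__main__":
    # 4 cm spacing, as used in the simulation described later in this document
    print(max_alias_free_frequency(0.04))  # about 4287.5 Hz
    # spacing that would be needed to cover up to 8 kHz
    print(max_mic_spacing(8000.0))  # about 0.0214 m
```

Above this frequency the spatial sampling of the array aliases, so the half-wavelength rule effectively sets the upper frequency limit of the method for a given array.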
[0012]
The positions of the microphones constituting the two-dimensional microphone arrays M1-1,
M2-1,..., MNx-Ny arranged at the position of z = z0 are represented as rij = (xi, yj, z0). It is
assumed that the position (xs, ys, zs) of the sound source S and the positions (xi, yj, z0) of the
microphones constituting the two-dimensional microphone array M1-1, M2-1, ..., MNx-Ny are
known.
[0013]
As shown in FIG. 1, the sound source signal extraction apparatus according to the first embodiment includes, for example, a frequency domain conversion unit 1, a window function unit 2, a filter unit 3, and a time domain conversion unit 4. The apparatus performs each step of the sound source signal extraction method illustrated in FIG. 5.
[0014]
The two-dimensional microphone array M1-1, M2-1, ..., MNx-Ny picks up the sound emitted by the sound source S and generates time-domain sound signals, which are sent to the frequency domain conversion unit 1. The sound signal at time t collected by the microphone Mi-j at rij = (xi, yj, z0) is denoted f(i, j, t).
[0015]
The frequency domain conversion unit 1 converts the sound signals f(i, j, t) collected by the microphone array M1-1, M2-1, ..., MNx-Ny into frequency-domain signals F(i, j, ω) by a Fourier transform (step S1). The generated frequency-domain signals F(i, j, ω) are sent to the window function unit 2. Here ω denotes the angular frequency. For example, F(i, j, ω) is generated by a short-time discrete Fourier transform; of course, any other existing method may be used.
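A minimal NumPy sketch of step S1 follows. It assumes a periodic Hann analysis window, a frame length of 512 samples and 50 % overlap; none of these choices is specified in the text. The hypothetical function `stft` transforms the whole array at once, so an input of shape (Nx, Ny, T) holding f(i, j, t) yields F(i, j, ω) for every frame.

```python
import numpy as np


def stft(f: np.ndarray, frame_len: int = 512, hop: int = 256) -> np.ndarray:
    """Short-time DFT along the last (time) axis.

    f: real array (..., T) with T >= frame_len, e.g. shape (Nx, Ny, T).
    Returns a complex array (..., n_frames, frame_len // 2 + 1).
    """
    # periodic Hann window (assumed; any analysis window could be used)
    win = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(frame_len) / frame_len)
    n_frames = 1 + (f.shape[-1] - frame_len) // hop
    frames = np.stack(
        [f[..., m * hop : m * hop + frame_len] * win for m in range(n_frames)],
        axis=-2,
    )
    # one-sided spectrum of each windowed frame
    return np.fft.rfft(frames, axis=-1)
```

Bin b of the output corresponds to the angular frequency ω = 2π·b·fs/frame_len for a sampling rate fs.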
[0016]
The window function unit 2 multiplies the frequency-domain signals F(i, j, ω) by a window function to generate windowed frequency-domain signals Fw(i, j, ω) (step S2), which are sent to the filter unit 3. As the window function, for example, a Tukey window w(i, j) defined by the following equation is used. Ntpr is the number of points over which the taper is applied, and is an integer satisfying 1 ≤ Ntpr ≤ Nx and 1 ≤ Ntpr ≤ Ny. As described later, the window function unit 2 may be omitted.
[0017]
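The exact window equation is not reproduced in this text. The sketch below assumes one common form of a Tukey (tapered-cosine) window: a raised-cosine rise over the first Ntpr points, a flat top of 1, and a mirrored fall over the last Ntpr points, with the two-dimensional window w(i, j) taken as the outer product wx(i)·wy(j). The separable form and the taper formula are assumptions; `tukey_taper` and `tukey_window_2d` are hypothetical names.

```python
import numpy as np


def tukey_taper(n: int, n_tpr: int) -> np.ndarray:
    """1-D tapered-cosine window of length n with n_tpr taper points per edge (assumed form)."""
    i = np.arange(n)

    def ramp(k):
        # raised-cosine rise over the first n_tpr points, 1 afterwards
        return np.where(k < n_tpr, 0.5 * (1 - np.cos(np.pi * (k + 1) / (n_tpr + 1))), 1.0)

    # taking the minimum of the left and right ramps keeps the window symmetric
    return np.minimum(ramp(i), ramp(n - 1 - i))


def tukey_window_2d(nx: int, ny: int, n_tpr: int) -> np.ndarray:
    """Separable 2-D window w(i, j) = wx(i) * wy(j) (assumed separable)."""
    return np.outer(tukey_taper(nx, n_tpr), tukey_taper(ny, n_tpr))
```

With F of shape (Nx, Ny, n_frames, n_bins), the windowed signal is `Fw = F * w[:, :, None, None]`. The one-dimensional taper alone plays the role of wx(i) in the second embodiment. Tapering the array edges reduces the truncation effects of a finite aperture.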
[0018]
The filter unit 3 applies the filter processing defined by the following equation (Formula (1)) to the windowed frequency-domain signals Fw(i, j, ω) (step S3). This yields the sound pressure at the sound source position, that is, the sound source signal S(ω) in the frequency domain. The filtered signal S(ω) is sent to the time domain conversion unit 4. Here k = ω/c is the wave number, where c is the speed of sound. In the following equation, the j multiplying the wave number k denotes the imaginary unit (not the microphone index j). H(i, j, ω) is the filter.
[0019]
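Because Formula (1) is obtained by discretizing an integral over the microphone plane (see the theoretical background below), this sketch reads it as a weighted sum of the windowed microphone spectra, S(ω) = Σ_i Σ_j H(i, j, ω) Fw(i, j, ω). That reading and the array layout are assumptions, and the closed-form H(i, j, ω) is not reproduced in this text, so the sketch takes H as an input. Once H is known, each frame costs only Nx·Ny complex multiply-adds per frequency, consistent with the cost argument of the invention. `apply_filter` and `wave_numbers` are hypothetical names, and c = 343 m/s is assumed.

```python
import numpy as np


def wave_numbers(frame_len: int, fs: float, c: float = 343.0) -> np.ndarray:
    """Wave number k = ω / c (rad/m) for every one-sided DFT bin."""
    omega = 2 * np.pi * np.fft.rfftfreq(frame_len, d=1.0 / fs)
    return omega / c


def apply_filter(Fw: np.ndarray, H: np.ndarray) -> np.ndarray:
    """S(ω) = sum over i, j of H(i, j, ω) * Fw(i, j, ω), for every frame.

    Fw: complex array (Nx, Ny, n_frames, n_bins), windowed spectra.
    H:  complex array (Nx, Ny, n_bins), one filter value per microphone and bin.
    Returns S: complex array (n_frames, n_bins).
    """
    return np.einsum("ijb,ijmb->mb", H, Fw)
```

For the linear array of the second embodiment the same idea reduces to `np.einsum("ib,imb->mb", H, Fw)`.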
[0020]
The time domain conversion unit 4 converts the filtered signal S(ω) into a time-domain signal s(t) by an inverse Fourier transform (step S4).
Any existing method, such as the short-time inverse discrete Fourier transform, may be used. The time-domain signal s(t) is the signal at the position of the sound source S.
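A matching sketch of step S4 uses weighted overlap-add. It assumes the spectra were produced with a periodic Hann window and the same frame length and hop as in the analysis stage; these parameters are assumptions, not values from the text, and `istft` is a hypothetical name.

```python
import numpy as np


def istft(S: np.ndarray, frame_len: int = 512, hop: int = 256) -> np.ndarray:
    """Weighted overlap-add inverse of a Hann-windowed short-time DFT.

    S: complex array (..., n_frames, frame_len // 2 + 1).
    Returns a real array (..., (n_frames - 1) * hop + frame_len).
    """
    win = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(frame_len) / frame_len)
    frames = np.fft.irfft(S, n=frame_len, axis=-1) * win  # synthesis window
    n_frames = S.shape[-2]
    out_len = (n_frames - 1) * hop + frame_len
    out = np.zeros(S.shape[:-2] + (out_len,))
    norm = np.zeros(out_len)
    for m in range(n_frames):
        out[..., m * hop : m * hop + frame_len] += frames[..., m, :]
        norm[m * hop : m * hop + frame_len] += win**2
    # divide by the summed squared window; leave samples where it is ~0 unscaled
    return out / np.where(norm > 1e-8, norm, 1.0)
```

Dividing by the summed squared window makes analysis followed by synthesis an identity wherever the frames overlap, so any change in s(t) comes from the filter alone.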
[0021]
Second Embodiment
Whereas the sound source signal extraction apparatus and method of the first embodiment use a two-dimensional microphone array, those of the second embodiment use a one-dimensional microphone array. This reduces the number of microphones, that is, the number of channels, which makes implementation relatively easy.
[0022]
As shown in FIG. 4, the sound source signal extraction apparatus and method according to the second embodiment use a microphone array M1-1, M2-1, ..., MNx-1 consisting of Nx omnidirectional microphones arranged along the line y = y0, z = z0. Using the sound signals collected by this array, they extract the sound signal emitted by the sound source S located at (xs, ys, zs).
[0023]
Nx is a positive integer.
The number of microphones in the microphone array M1-1, M2-1, ..., MNx-1 is basically arbitrary, but as many microphones as possible are desirable for acquiring the spatial sound pressure distribution. The spacing between adjacent microphones of the array is preferably no more than half the wavelength of the sound signal to be collected.
[0024]
The positions of the microphones of the microphone array M1-1, M2-1, ..., MNx-1 arranged on the line z = z0, y = y0 are denoted ri = (xi, y0, z0). The position (xs, ys, zs) of the sound source S and the positions (xi, y0, z0) of the microphones are assumed to be known.
[0025]
As shown in FIG. 3, the sound source signal extraction apparatus according to the second embodiment includes, for example, a frequency domain conversion unit 1, a window function unit 2, a filter unit 3, and a time domain conversion unit 4. The apparatus performs each step of the sound source signal extraction method illustrated in FIG. 5.
[0026]
The microphone array M1-1, M2-1, ..., MNx-1 picks up the sound emitted by the sound source S and generates time-domain sound signals, which are sent to the frequency domain conversion unit 1. The sound signal at time t collected by the microphone Mi-1 at ri = (xi, y0, z0) is denoted f(i, t).
[0027]
The frequency domain conversion unit 1 converts the sound signals f(i, t) collected by the microphone array M1-1, M2-1, ..., MNx-1 into frequency-domain signals F(i, ω) by a Fourier transform (step S1). The generated frequency-domain signals F(i, ω) are sent to the window function unit 2. Here ω denotes the angular frequency. For example, F(i, ω) is generated by a short-time discrete Fourier transform; of course, any other existing method may be used.
[0028]
The window function unit 2 multiplies the frequency-domain signals F(i, ω) by a window function to generate windowed frequency-domain signals Fw(i, ω) (step S2), which are sent to the filter unit 3. As the window function, for example, a Tukey window wx(i) defined by the following equation is used. Ntpr is the number of points over which the taper is applied, and is an integer satisfying 1 ≤ Ntpr ≤ Nx.
[0029]
[0030]
The filter unit 3 applies the filter processing defined by the following equation (Formula (2)) to the windowed frequency-domain signals Fw(i, ω) (step S3). This yields the sound pressure at the sound source position, that is, the sound source signal S(ω) in the frequency domain. The filtered signal S(ω) is sent to the time domain conversion unit 4. Here k = ω/c is the wave number, where c is the speed of sound. In the following equation, the j multiplying the wave number k denotes the imaginary unit. H(i, ω) is the filter.
[0031]
[0032]
In the above equation, H_n^{(1)}(·) is the Hankel function of the first kind.
The Hankel function of the first kind H_n^{(1)}(x) is defined as follows using the Bessel function of the first kind J_n(x) and the Bessel function of the second kind Y_n(x).
[0033]
H_n^{(1)}(x) = J_n(x) + j Y_n(x)
[0034]
The time domain conversion unit 4 converts the signal S (ω) after the filtering process into a
time domain signal s (t) by inverse Fourier transform (step S4).
As the inverse Fourier transform, an existing method such as a short time discrete inverse
Fourier transform may be used. This time domain signal s (t) is a signal at the position of the
sound source S.
[0035]
[Theoretical Background]
The following explains why the processing of the filter unit 3 in the first and second embodiments is given by Formula (1) and Formula (2), respectively. Below, the argument ω is omitted where this simplifies the notation.
[0036]
The sound pressure P(rs) at the sound source position rs is estimated from the sound pressure P(r0) at positions r0 = (x0, y0, z0) on the plane z = z0 in FIG. 2 as follows (Formula (3)).
[0037]
[0038]
Here, κ^{-1} is the kernel function.
[0039]
[0040]
kx and ky are the wave number in the x direction and the wave number in the y direction,
respectively.
The kernel function κ^{-1} can be decomposed into the sum of a component κ_1^{-1} corresponding to homogeneous waves and a component κ_2^{-1} corresponding to inhomogeneous waves.
In general, the inhomogeneous-wave components are considered not to contribute to the reproduction of the sound field, so κ^{-1} can be approximated by ignoring them, as follows (Formula (4)).
[0041]
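A short numerical illustration of why the inhomogeneous components can be dropped follows. It assumes the usual angular-spectrum picture, in which a plane-wave component with transverse wave numbers (kx, ky) changes over a propagation distance d by the factor exp(j kz d), with kz = sqrt(k² - kx² - ky²); this sign convention may differ from the one in the patent's equations. For kx² + ky² > k², kz is imaginary and the component decays exponentially, so undoing the propagation would amplify any noise in it by the reciprocal factor. `propagation_gain` is a hypothetical helper, and c = 343 m/s is assumed.

```python
import numpy as np


def propagation_gain(kx, ky, k, d):
    """Magnitude of exp(j * kz * d) for one plane-wave component.

    kz = sqrt(k^2 - kx^2 - ky^2) uses the principal complex square root, so
    inhomogeneous components (kx^2 + ky^2 > k^2) get a positive imaginary kz
    and decay with distance d >= 0.
    """
    kz = np.sqrt(np.asarray(k**2 - kx**2 - ky**2, dtype=complex))
    return np.abs(np.exp(1j * kz * d))


if __name__ == "__main__":
    k = 2 * np.pi * 1000.0 / 343.0  # 1 kHz in air (assumed c = 343 m/s)
    print(propagation_gain(0.5 * k, 0.0, k, 2.0))  # homogeneous: magnitude about 1
    print(propagation_gain(2.0 * k, 0.0, k, 2.0))  # inhomogeneous: about 3e-28
```

At a source distance of 2 m, the inhomogeneous component in this example is attenuated by roughly 28 orders of magnitude, so inverting it would be numerically meaningless. Keeping only the homogeneous part, as in Formula (4), avoids this instability.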
[0042]
Substituting the right side of equation (4) into equation (3) results in the following.
[0043]
[0044]
P(rs) in the above equation corresponds to S(ω) in the first embodiment, and P(r0) corresponds to Fw(i, j, ω).
Discretizing this equation gives Formula (1).
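The closed-form filter of Formula (1) is not reproduced in this text. As a hedged alternative, the same approximation (back-propagation that keeps only homogeneous waves) can be evaluated numerically. The steps are a spatial DFT of the array data, multiplication of each homogeneous component by a backward propagator, discarding of the inhomogeneous ones, and evaluation of the inverse DFT only at (xs, ys). Because every step is linear, they collapse into one weight H(i, j) per microphone and frequency. This is a swapped-in numerical method, not the patent's formula. It assumes equally spaced microphones, a source at distance d = z0 - zs > 0 in front of the array, and NumPy's FFT sign convention for the time transform. Under that convention, signals expand as e^{+jωt}, so outgoing waves vary as e^{-j kz z} and the backward propagator is e^{+j kz d}. `backprop_filter` is a hypothetical name.

```python
import numpy as np


def backprop_filter(x, y, xs, ys, d, k):
    """Hypothetical helper: weights H[i, j] for one wave number k such that
    P(xs, ys, zs) is approximated by sum_ij H[i, j] * P(x[i], y[j], z0).

    x, y   : 1-D arrays of equally spaced microphone coordinates (m)
    xs, ys : lateral source position (m)
    d      : z0 - zs > 0, distance from the source plane to the array (m)
    Assumes numpy's time-FFT convention, so outgoing waves vary as exp(-j kz z).
    """
    kx = 2 * np.pi * np.fft.fftfreq(x.size, d=x[1] - x[0])
    ky = 2 * np.pi * np.fft.fftfreq(y.size, d=y[1] - y[0])
    KX, KY = np.meshgrid(kx, ky, indexing="ij")
    kt2 = KX**2 + KY**2
    homogeneous = kt2 <= k**2  # propagating (homogeneous) components only
    kz = np.sqrt(np.where(homogeneous, k**2 - kt2, 0.0))
    # backward propagator over d; inhomogeneous components are discarded
    G = np.where(homogeneous, np.exp(1j * kz * d), 0.0)
    # inverse spatial DFT evaluated only at (xs, ys), written as per-mic weights:
    # H[i, j] = 1/(Nx Ny) * sum_pq G[p, q] exp(j (kx_p (xs - x_i) + ky_q (ys - y_j)))
    ex = np.exp(1j * np.outer(xs - x, kx))  # shape (Nx, Nx)
    ey = np.exp(1j * np.outer(ys - y, ky))  # shape (Ny, Ny)
    return ex @ G @ ey.T / (x.size * y.size)
```

H is computed once per frequency at a cost of O(Nx²·Ny + Nx·Ny²). Each frame then needs only the Nx·Ny-term sum `np.sum(H * Fw[:, :, m, b])`, instead of reconstructing the whole plane containing the source.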
[0045]
Similarly, in the second embodiment, in which the microphones are arranged on a straight line, Formula (2) is obtained by ignoring the component κ_2^{-1} of the kernel function κ^{-1} that corresponds to inhomogeneous waves, as described above.
[0046]
[Simulation Results]
The following are results of extracting a sound source signal with the sound source signal extraction method of the second embodiment. An ordinary closed-box loudspeaker was placed 2 m from the center of a linear microphone array of 96 channels at 4 cm spacing.
The source signal reproduced by the loudspeaker was 10 seconds of speech (a female voice and a male voice), shown in FIG. 6.
[0047]
FIG. 7 shows the signal picked up by the microphone closest to the sound source.
Its SN ratio with respect to the original speech is 5.55 dB.
In calculating the SN ratio, the time difference was corrected using the cross-correlation, and the amplitude was corrected by the least squares method so as to minimize the error.
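The SN-ratio measurement described above can be sketched as follows. The sketch assumes that the delay is taken at the peak of the cross-correlation magnitude and that the estimate is then shifted circularly, which is adequate when the delay is small relative to the signal length. It also assumes the SN ratio is the reference energy divided by the residual energy after least-squares gain correction. The text does not give these details, and `aligned_snr_db` is a hypothetical name.

```python
import numpy as np


def aligned_snr_db(reference: np.ndarray, estimate: np.ndarray) -> float:
    """SN ratio (dB) of `estimate` against `reference` after correcting the
    time offset (cross-correlation peak) and the gain (least squares)."""
    n = min(reference.size, estimate.size)
    ref, est = reference[:n], estimate[:n]
    # np.correlate 'full' output index idx corresponds to lag idx - (n - 1)
    xcorr = np.correlate(est, ref, mode="full")
    lag = int(np.argmax(np.abs(xcorr))) - (n - 1)  # est[t] ~ g * ref[t - lag]
    est = np.roll(est, -lag)  # circular shift, for brevity
    g = np.dot(est, ref) / np.dot(est, est)  # gain minimizing ||ref - g * est||^2
    err = ref - g * est
    return float(10 * np.log10(np.dot(ref, ref) / np.dot(err, err)))
```

`np.correlate` in `full` mode costs O(n²); for 10-second recordings an FFT-based correlation such as `scipy.signal.correlate(..., method="fft")` is much faster.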
[0048]
FIG. 8 shows the sound source signal extracted by the proposed method.
The sound source position is assumed to be known.
The SN ratio in this case was 20.07 dB.
[0049]
FIG. 9 shows the result of conventional focusing, in which time delays are applied to the microphone array signals and the results are summed. Here, to maximize the SN ratio, a process of dividing the signals by the distance to the focal position was added. The SN ratio in this case was 5.43 dB.
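For comparison, the conventional focusing baseline can be sketched as a time-domain delay-and-sum. The sketch assumes integer-sample delays, circular shifts for brevity, and a division of each aligned channel by its distance to the focal point, following the wording above. The exact weighting and interpolation used in the simulation are not given, `delay_and_sum` is a hypothetical name, and c = 343 m/s is assumed.

```python
import numpy as np


def delay_and_sum(signals, mic_pos, focus, fs, c=343.0):
    """Focus a microphone array on one point by delaying and summing.

    signals: (M, T) time signals; mic_pos: (M, 3) positions in m;
    focus: (3,) focal point in m; fs: sampling rate in Hz.
    Returns the (T,) focused output.
    """
    r = np.linalg.norm(mic_pos - focus, axis=1)  # distance of each mic to the focus
    shifts = np.round((r - r.min()) / c * fs).astype(int)  # extra delay in samples
    out = np.zeros(signals.shape[1])
    for sig, shift, dist in zip(signals, shifts, r):
        aligned = np.roll(sig, -shift)  # advance later arrivals (circular shift)
        out += aligned / dist  # divide by the distance to the focal point
    return out / len(signals)
```

Unlike the proposed filter, this baseline only time-aligns the direct paths. It does not model how the wave field propagates from the source plane to the array.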
[0050]
As described above, by estimating the sound pressure at the sound source position, it is possible
to extract the sound source signal with a higher SN ratio than that of the prior art.
[0051]
Modified Examples
The window function unit 2 may be omitted.
In this case, the filter unit 3 applies the same filter processing as when the window function unit 2 is used, but to the frequency-domain signals F(i, j, ω) or F(i, ω) instead of the windowed signals Fw(i, j, ω) or Fw(i, ω). Specifically, the filter unit 3 of the first embodiment then performs the filter processing defined by the following equation.
[0052]
[0053]
Likewise, the filter unit 3 of the second embodiment then performs the filter processing defined by the following equation.
[0054]
[0055]
The sound source signal extraction apparatus can be implemented by a computer.
In this case, the processing performed by each unit of the apparatus is described by a program, and each unit is realized on the computer by executing this program.
[0056]
The program describing the processing content can be recorded in a computer readable
recording medium.
Further, in this embodiment, these devices are configured by executing a predetermined program
on a computer, but at least a part of the processing contents may be realized as hardware.
[0057]
The present invention is not limited to the above-described embodiment, and various
modifications can be made without departing from the spirit of the present invention.
[0058]
1 Frequency domain conversion unit
2 Window function unit
3 Filter unit
4 Time domain conversion unit