Patent Translate Powered by EPO and Google
04-05-2019

Notice: This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate, complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or financial decisions, should not be based on machine-translation output.

DESCRIPTION JP2012165273

The present invention provides a sound source signal extraction technique that extracts a sound source signal at a lower computational cost than the prior art. A frequency domain conversion unit converts the signals collected by a microphone array, arranged on a two-dimensional plane or along a one-dimensional straight line, into frequency domain signals. A window function unit multiplies the frequency domain signals by a window function to generate windowed frequency domain signals. A filter unit filters the windowed frequency domain signals with a filter based on a back-propagation calculation that ignores inhomogeneous waves. A time domain conversion unit converts the filtered signal into a time domain signal by an inverse Fourier transform.

[Selected figure] Figure 1

Sound source signal extraction device, method and program

[0001] The present invention relates to a technology for picking up sound signals with a microphone array installed in a sound field and extracting a sound source signal or the like with a high SN ratio.

[0002] Non-Patent Document 1 describes a technique that estimates the sound pressure gradient distribution on a plane parallel to the microphone array plane, in order to reproduce sound picked up by a large microphone array with a large loudspeaker array at a remote location. With this technique, a sound source signal can be extracted by obtaining the sound pressure gradient distribution on a plane that is parallel to the microphone array plane and contains the sound source position.

[0003] Non-Patent Document 1: Shoichi Koyama and four others, "Inverse Wave Propagation in Wave Field Synthesis," AES 40th International Conference, Tokyo, Japan, October 8-10, pp. 2-10.

[0004] However, obtaining the sound pressure gradient distribution over the entire plane containing the sound source position requires a large computational cost.

[0005] An object of the present invention is to provide a sound source signal extraction apparatus, method and program that extract a sound source signal at a lower computational cost than the prior art.

[0006] To solve the above problem, the sound source signal is extracted by estimating the sound pressure only at the sound source position, using a filter based on a back-propagation calculation that ignores inhomogeneous waves.

[0007] Estimating the sound pressure only at the sound source position, instead of the sound pressure gradient distribution over the entire plane, makes it possible to extract the sound source signal at a lower computational cost than the conventional approach.

[0008]
FIG. 1 is a functional block diagram of an example of the sound source signal extraction device of the first embodiment.
FIG. 2 is a view for explaining an example of the arrangement of the microphone array of the sound source signal extraction device of the first embodiment.
A functional block diagram of an example of the sound source signal extraction device of the second embodiment.
A view for explaining an example of the arrangement of the microphone array of the sound source signal extraction device of the second embodiment.
A flowchart showing an example of the sound source signal extraction method.
A view showing an example of a sound source signal.
A view showing the sound signal picked up by a microphone.
A view showing the sound source signal extracted by the sound source signal extraction method of the second embodiment.
A view showing the sound source signal extracted by a conventional sound source signal extraction method.

[0009] Embodiments of the present invention are described below with reference to the drawings.

[0010] First Embodiment
As shown in FIG. 2, the sound source signal extraction apparatus and method of the first embodiment extract the sound signal emitted by a sound source S located at (xs, ys, zs). They do this using the sound signals collected by a two-dimensional microphone array M1-1, M2-1, ..., MNx-Ny consisting of Nx × Ny omnidirectional microphones arranged on the plane z = z0.

[0011] Nx and Ny are positive integers. The number of microphones in the microphone array M1-1, M2-1, ..., MNx-Ny is in principle arbitrary, but as many microphones as possible should be used to capture the spatial sound pressure distribution. The interval between adjacent microphones of the array should be at most half the wavelength of the sound signal to be picked up.

[0012] The positions of the microphones of the two-dimensional microphone array M1-1, M2-1, ..., MNx-Ny on the plane z = z0 are denoted rij = (xi, yj, z0). The position (xs, ys, zs) of the sound source S and the microphone positions (xi, yj, z0) are assumed to be known.
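The half-wavelength condition of [0011] becomes a maximum interval d ≤ c / (2 f_max) for the highest frequency f_max to be picked up. The sketch below computes that bound and the positions rij = (xi, yj, z0) of a uniform Nx × Ny grid as in [0012]. The helper names, the uniform grid origin (x0, y0), and the speed of sound c = 343 m/s are assumptions made for illustration.

```python
import numpy as np


def max_spacing(f_max, c=343.0):
    """Largest microphone interval (m) that is at most half a wavelength at f_max (Hz).

    c = 343 m/s is an assumed speed of sound, not a value from the source.
    """
    return c / (2.0 * f_max)


def grid_positions(nx, ny, spacing, z0, x0=0.0, y0=0.0):
    """Hypothetical helper: positions r_ij = (x_i, y_j, z0), shape (nx, ny, 3)."""
    x = x0 + spacing * np.arange(nx)
    y = y0 + spacing * np.arange(ny)
    X, Y = np.meshgrid(x, y, indexing="ij")  # X[i, j] = x_i, Y[i, j] = y_j
    return np.stack([X, Y, np.full_like(X, z0)], axis=-1)
```

Take the 4 cm spacing used in the simulation of [0046] as an example. Under the assumed c, it satisfies the half-wavelength condition up to c / (2 × 0.04 m) ≈ 4.29 kHz.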
[0013] As shown in FIG., the sound source signal extraction device of the first embodiment includes, for example, a frequency domain conversion unit 1, a window function unit 2, a filter unit 3 and a time domain conversion unit 4. The device performs each step of the sound source signal extraction method illustrated in FIG. 5.

[0014] The two-dimensional microphone array M1-1, M2-1, ..., MNx-Ny picks up the sound emitted by the sound source S and generates time domain sound signals. These signals are sent to the frequency domain conversion unit 1. The sound signal at time t picked up by the microphone Mi-j at rij = (xi, yj, z0) is denoted f(i, j, t).

[0015] The frequency domain conversion unit 1 converts the sound signals f(i, j, t) picked up by the microphone array M1-1, M2-1, ..., MNx-Ny into frequency domain signals F(i, j, ω) by a Fourier transform (step S1). The frequency domain signals F(i, j, ω) are sent to the window function unit 2. ω denotes frequency. F(i, j, ω) is generated, for example, by a short-time discrete Fourier transform, although any other existing method may also be used.

[0016] The window function unit 2 multiplies the frequency domain signals F(i, j, ω) by a window function to generate windowed frequency domain signals Fw(i, j, ω) (step S2). These are sent to the filter unit 3. A suitable window function is, for example, the so-called Tukey window w(i, j) defined by the following equation. Ntpr is the number of points over which the taper is applied, an integer not less than 1 and not greater than Nx or Ny. As described later, the window function unit 2 may be omitted.
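The equation for w(i, j) in [0017] is not given here, so the sketch below makes two assumptions. First, it uses a standard raised-cosine taper over Ntpr points at each end of the array. Second, it assumes a separable two-dimensional window w(i, j) = wx(i) wy(j). The window is applied to F(i, j, ω) stored as an array of shape (Nx, Ny, number of frequency bins). The function names and the exact ramp formula are hypothetical.

```python
import numpy as np


def tukey_taper(n_points, n_taper):
    """Hypothetical 1-D Tukey-style taper: raised-cosine ramps over n_taper points per edge."""
    n = np.arange(n_points)
    dist = np.minimum(n, n_points - 1 - n)  # distance to the nearest array edge
    ramp = 0.5 * (1.0 - np.cos(np.pi * (dist + 1) / (n_taper + 1)))
    return np.where(dist < n_taper, ramp, 1.0)


def window_2d(F, n_taper):
    """Fw(i, j, omega) = wx(i) * wy(j) * F(i, j, omega), assuming a separable window."""
    nx, ny = F.shape[:2]
    w = np.outer(tukey_taper(nx, n_taper), tukey_taper(ny, n_taper))
    return F * w[:, :, None]  # broadcast the spatial window over all frequency bins
```

Measuring each microphone's distance to the nearest edge keeps the window symmetric even when Ntpr approaches the array length.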
[0017]

[0018] The filter unit 3 applies the filtering defined by the following equation, formula (1), to the windowed frequency domain signals Fw(i, j, ω) (step S3). This yields the sound pressure at the sound source position and hence the sound source signal S(ω) in the frequency domain. The filtered signal S(ω) is sent to the time domain conversion unit 4. k is the wave number, k = ω/c, where c is the speed of sound. In the following equation, the j to the left of the wave number k denotes the imaginary unit. H(i, j, ω) is the filter.

[0019]

[0020] The time domain conversion unit 4 converts the filtered signal S(ω) into a time domain signal s(t) by an inverse Fourier transform (step S4). An existing method such as the short-time inverse discrete Fourier transform may be used. This time domain signal s(t) is the signal at the position of the sound source S.

[0021] Second Embodiment
The first embodiment uses a two-dimensional microphone array, whereas the sound source signal extraction apparatus and method of the second embodiment use a one-dimensional microphone array. This reduces the number of microphones, that is, the number of channels, which makes implementation comparatively easy.

[0022] The sound source signal extraction apparatus and method of the second embodiment extract the sound signal emitted by the sound source S located at (xs, ys, zs). They use the sound signals collected by a microphone array M1-1, M2-1, ..., MNx-1 consisting of Nx omnidirectional microphones arranged along the line y = y0, z = z0 in the first room shown in FIG.

[0023] Nx is a positive integer. The number of microphones in the microphone array M1-1, M2-1, ..., MNx-1 is in principle arbitrary.
However, as many microphones as possible should be used to capture the spatial sound pressure distribution. The interval between adjacent microphones of the array M1-1, M2-1, ..., MNx-1 should be at most half the wavelength of the sound signal to be picked up.

[0024] The positions of the microphones of the microphone array M1-1, M2-1, ..., MNx-1 on the line z = z0, y = y0 are denoted ri = (xi, y0, z0). The position (xs, ys, zs) of the sound source S and the microphone positions (xi, y0, z0) are assumed to be known.

[0025] As shown in FIG., the sound source signal extraction device of the second embodiment includes, for example, a frequency domain conversion unit 1, a window function unit 2, a filter unit 3 and a time domain conversion unit 4. The device performs each step of the sound source signal extraction method illustrated in FIG. 5.

[0026] The microphone array M1-1, M2-1, ..., MNx-1 picks up the sound emitted by the sound source S and generates time domain sound signals. These signals are sent to the frequency domain conversion unit 1. The sound signal at time t picked up by the microphone Mi-1 at ri = (xi, y0, z0) is denoted f(i, t).

[0027] The frequency domain conversion unit 1 converts the sound signals f(i, t) picked up by the microphone array M1-1, M2-1, ..., MNx-1 into frequency domain signals F(i, ω) by a Fourier transform (step S1). The frequency domain signals F(i, ω) are sent to the window function unit 2. ω denotes frequency. F(i, ω) is generated, for example, by a short-time discrete Fourier transform, although any other existing method may also be used.
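Steps S1 and S4 only require some short-time Fourier transform pair. The minimal NumPy sketch below processes all channels at once: f(i, t) or f(i, j, t) is passed with time on the last axis. It assumes a periodic Hann analysis window of 256 samples with 50% overlap. The frame length, hop and window are illustrative choices, not values from the source.

```python
import numpy as np


def stft(x, n_fft=256, hop=128):
    """Short-time DFT along the last axis: (..., L) -> (..., n_frames, n_fft // 2 + 1)."""
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)  # periodic Hann
    n_frames = 1 + (x.shape[-1] - n_fft) // hop
    # idx[m, n] = m * hop + n selects sample n of frame m.
    idx = hop * np.arange(n_frames)[:, None] + np.arange(n_fft)[None, :]
    return np.fft.rfft(x[..., idx] * window, axis=-1)


def istft(X, length, n_fft=256, hop=128):
    """Overlap-add inverse of stft(); exact where two frames overlap (hop = n_fft // 2)."""
    frames = np.fft.irfft(X, n=n_fft, axis=-1)
    out = np.zeros(X.shape[:-2] + (length,))
    for m in range(X.shape[-2]):
        out[..., m * hop:m * hop + n_fft] += frames[..., m, :]
    return out
```

Periodic Hann windows shifted by half their length sum to one, so plain overlap-add without a synthesis window recovers the signal. This holds everywhere except the first and last half-frame.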
[0028] The window function unit 2 multiplies the frequency domain signals F(i, ω) by a window function to generate windowed frequency domain signals Fw(i, ω) (step S2). These are sent to the filter unit 3. A suitable window function is, for example, the so-called Tukey window wx(i) defined by the following equation. Ntpr is the number of points over which the taper is applied, an integer not less than 1 and not greater than Nx.

[0029]

[0030] The filter unit 3 applies the filtering defined by the following equation, formula (2), to the windowed frequency domain signals Fw(i, ω) (step S3). This yields the sound pressure at the sound source position and hence the sound source signal S(ω) in the frequency domain. The filtered signal S(ω) is sent to the time domain conversion unit 4. k is the wave number, k = ω/c, where c is the speed of sound. In the following equation, the j to the left of the wave number k denotes the imaginary unit. H(i, ω) is the filter.

[0031]

[0032] In the above equation, H_n^{(1)}(·) is the Hankel function of the first kind. It is defined as follows in terms of the Bessel function of the first kind J_n(x) and the Bessel function of the second kind Y_n(x).

[0033] $H_n^{(1)}(x) = J_n(x) + j\,Y_n(x)$

[0034] The time domain conversion unit 4 converts the filtered signal S(ω) into a time domain signal s(t) by an inverse Fourier transform (step S4). An existing method such as the short-time inverse discrete Fourier transform may be used. This time domain signal s(t) is the signal at the position of the sound source S.

[0035] [Theoretical Background] The following explains why the processing of the filter unit 3 is given by formula (1) in the first embodiment and by formula (2) in the second. In what follows, ω is omitted to simplify the notation.
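Formula (1) itself is not reproduced in this text. The sketch below builds, for one frequency, a filter H(i, j, ω) based on back propagation that ignores inhomogeneous waves, as used in the filtering of [0018]. It proceeds in three steps:

1. Take the discrete spatial Fourier transform over a uniform planar array.
2. Apply the inverse propagator exp(+j kz d) only to homogeneous components with kx² + ky² ≤ k², and set the inhomogeneous ones to zero.
3. Evaluate the result only at the source position, so that S(ω) = Σ H(i, j, ω) Fw(i, j, ω).

The function name, the DFT-based discretization and the e^{+jωt} sign convention implied by numpy.fft are all assumptions. This is not the patent's closed-form filter.

```python
import numpy as np


def backprop_filter(x_mics, y_mics, xs, ys, distance, k):
    """Hypothetical filter H[i, j] for one frequency: S = np.sum(H * Fw).

    x_mics, y_mics: uniformly spaced microphone coordinates (m)
    xs, ys:         source coordinates parallel to the array plane (m)
    distance:       z0 - zs, distance from the source plane to the array plane (m)
    k:              wave number omega / c (rad/m)
    """
    x_mics, y_mics = np.asarray(x_mics), np.asarray(y_mics)
    nx, ny = len(x_mics), len(y_mics)
    # Spatial wave numbers of the discrete Fourier transform over the array.
    kx = 2 * np.pi * np.fft.fftfreq(nx, d=x_mics[1] - x_mics[0])
    ky = 2 * np.pi * np.fft.fftfreq(ny, d=y_mics[1] - y_mics[0])
    kxx, kyy = np.meshgrid(kx, ky, indexing="ij")
    kz_sq = k**2 - kxx**2 - kyy**2
    homogeneous = kz_sq >= 0  # propagating (homogeneous) components
    kz = np.sqrt(np.where(homogeneous, kz_sq, 0.0))
    # Inverse propagator exp(+j kz d) under the e^{+j omega t} convention;
    # inhomogeneous (evanescent) components are discarded.
    inv_prop = np.where(homogeneous, np.exp(1j * kz * distance), 0.0)
    # Inverse spatial DFT evaluated only at (xs, ys): ex[m, i] and ey[n, j].
    ex = np.exp(1j * np.outer(kx, xs - x_mics))
    ey = np.exp(1j * np.outer(ky, ys - y_mics))
    return ex.T @ inv_prop @ ey / (nx * ny)
```

H depends only on the geometry and the frequency. It can therefore be computed once per frequency bin and reused for every frame, so extracting one source costs a single weighted sum over the channels. For a plane wave that fits the DFT grid exactly, this sketch recovers the back-propagated value at the source point. Inhomogeneous components sampled at the array contribute nothing.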
[0036] The sound pressure P(rs) at the sound source position rs is estimated from the sound pressure P(r0) at positions r0 = (x0, y0, z0) on the plane z = z0 in FIG. as follows.

[0037]

[0038] Here, κ^{-1} is a kernel function.

[0039]

[0040] kx and ky are the wave numbers in the x and y directions, respectively. The kernel function κ^{-1} can be decomposed into the sum of a component κ_1^{-1} corresponding to homogeneous waves and a component κ_2^{-1} corresponding to inhomogeneous waves. Inhomogeneous waves are generally considered not to contribute to the reproduction of the sound field. Ignoring their component, the kernel function κ^{-1} can be approximated as follows.

[0041]

[0042] Substituting the right-hand side of equation (4) into equation (3) gives the following.

[0043]

[0044] In the above equation, P(rs) corresponds to S(ω) in the first embodiment, and P(r0) corresponds to Fw(i, j, ω). Formula (1) is the discrete form of this equation.

[0045] In the second embodiment, where the microphones lie on a straight line, formula (2) is likewise obtained. As described above, this is done by ignoring the component κ_2^{-1} of the kernel function κ^{-1} corresponding to inhomogeneous waves.

[0046] [Simulation Results] A common closed-box loudspeaker was placed 2 m from the center of a 96-channel linear microphone array with 4 cm spacing. The sound source signal was then extracted with the sound source signal extraction method of the second embodiment, and the results are shown below. The signal reproduced from the loudspeaker was 10 seconds of speech (female and male voices), as shown in FIG.

[0047] The signal picked up by the microphone closest to the sound source is shown in FIG. Its SN ratio relative to the original speech is 5.55 dB.
To calculate the SN ratio, the time difference was corrected using the cross-correlation. The amplitude was then corrected by the least squares method so as to minimize the error.

[0048] FIG. 8 shows the sound source signal extracted by the proposed method, with the sound source position known. The SN ratio in this case was 20.07 dB.

[0049] FIG. 9 shows the result of conventional focusing, in which time delays are applied to the microphone array signals and the delayed signals are summed. Here, a step dividing the signal by the distance to the focal position was added to maximize the SN ratio. The SN ratio in this case was 5.43 dB.

[0050] As described above, estimating the sound pressure at the sound source position extracts the sound source signal with a higher SN ratio than the prior art.

[0051] Modified Examples, etc.
The window function unit 2 may be omitted. In that case, the filter unit 3 applies the same filtering as before. However, it is applied to the frequency domain signal F(i, j, ω) or F(i, ω) instead of the windowed frequency domain signal Fw(i, j, ω) or Fw(i, ω). Specifically, the filter unit 3 of the first embodiment then performs the filtering defined by the following equation.

[0052]

[0053] Likewise, the filter unit 3 of the second embodiment then performs the filtering defined by the following equation.

[0054]

[0055] The sound source signal extraction device can be realized by a computer. In this case, the processing of each unit of the device is described by a program, and each unit is realized on the computer by executing this program.

[0056] The program describing the processing can be recorded on a computer-readable recording medium.
In this embodiment these devices are configured by executing a predetermined program on a computer, but at least part of the processing may be realized in hardware.

[0057] The present invention is not limited to the above embodiments, and various modifications can be made without departing from its spirit.

[0058] Reference signs
1 frequency domain conversion unit
2 window function unit
3 filter unit
4 time domain conversion unit
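The SN-ratio evaluation described in [0047] can be sketched as follows. It corrects the time difference by cross-correlation and then corrects the amplitude by least squares. The source does not state the exact formula, so this sketch assumes SNR = 10 log10(Σ s² / Σ (s − a ŝ)²), with ŝ the time-aligned estimate and a the least-squares gain. It also assumes equal-length signals and a circular shift for the alignment. The function name is hypothetical.

```python
import numpy as np


def aligned_snr_db(reference, estimate):
    """SN ratio (dB) of `estimate` against `reference` after correcting the
    time difference (cross-correlation) and the amplitude (least squares)."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    # Lag that maximizes the cross-correlation; 'full' output index i is lag i - (N - 1).
    xcorr = np.correlate(reference, estimate, mode="full")
    lag = int(np.argmax(xcorr)) - (len(estimate) - 1)
    aligned = np.roll(estimate, lag)  # circular shift: a simplifying assumption
    # Least-squares gain minimizing ||reference - gain * aligned||^2.
    gain = np.dot(reference, aligned) / np.dot(aligned, aligned)
    error = reference - gain * aligned
    return 10 * np.log10(np.sum(reference**2) / np.sum(error**2))
```

For recorded signals, where a delayed copy is not circular, a zero-padded shift would replace np.roll.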
