DESCRIPTION JP2009025490
An object of the present invention is to provide a sound collection device having sufficient noise suppression characteristics. A sound collection device according to the present invention includes six or more sound collection units, a processing target signal generation unit, a power spectrum estimation unit, a gain coefficient calculation unit, and a multiplication unit. Each sound collection unit picks up sound from a different region using the output signals of a microphone array made up of a plurality of microphones. The processing target signal generation unit generates a processing target signal from the signals of one or more predetermined microphones or sound collection units. The power spectrum estimation unit estimates, for each frequency, the signal amount of the desired sound source and the signal amounts of the other sound sources from the signal amounts of the collected sound signals obtained by the sound collection units. The gain coefficient calculation unit obtains a gain coefficient for each frequency from the signal amount of the desired sound source, the signal amount of all the sound sources including the desired sound source, and the processing target signal. The multiplication unit multiplies the processing target signal by the gain coefficient calculated by the gain coefficient calculation unit.
[Selected figure] Figure 8
Sound collecting device, sound collecting method, sound collecting program using the method, and recording medium
[0001] The present invention relates to a sound collection device, a sound collection method, a sound collection program using the method, and a recording medium for hands-free sound collection in applications such as voice communication and the operation of equipment, and relates in particular to the case where many noise sources exist in addition to the desired sound source.
[0002] As a method of emphasizing a desired sound source at a specific position, assuming a hands-free microphone in an environment with much background noise, a method of estimating and emphasizing the desired sound power from a plurality of beamformer outputs has been proposed (Non-Patent Document 1). In this method, the gain coefficient R(ω, l) is calculated using the estimated signal power |S(ω, l)|², the estimated left direction noise power |N_L(ω, l)|², the estimated front direction noise power |N_C(ω, l)|², and the estimated right direction noise power |N_R(ω, l)|².
[0003] The signal to be processed is then multiplied by the gain coefficient R(ω, l) to obtain a signal in which the noise component is suppressed for each frequency domain.
Non-Patent Document 1: Yusuke Hioka, Kazunori Kobayashi, Kenichi Furuya, Akitoshi Kataoka, "Enhancing a Sound Source at a Specific Position Using a Small Microphone Array Pair," Proceedings of the Spring Meeting of the Acoustical Society of Japan, pp. 621-622, 2006.
[0004] In the technique of Non-Patent Document 1, the gain coefficient R(ω, l) is a value that varies between 0 and 1, and in some cases a sufficient noise suppression effect could not be obtained. The sound collection device of the present invention has been made to solve this problem, and its object is to improve noise suppression performance.
[0005] The sound collection device of the present invention includes six or more sound collection units, a processing target signal generation unit, a power spectrum estimation unit, a gain coefficient calculation unit, and a multiplication unit. Each sound collection unit picks up sound from a different region using the output signals of a microphone array made up of a plurality of microphones. Here, "different" means that the regions do not coincide; they may partly overlap. The processing target signal generation unit generates a processing target signal from the signals of one or more predetermined microphones or sound collection units. The power spectrum estimation unit estimates, for each frequency, the signal amount of the desired sound source and the signal amounts of the other sound sources from the signal amounts of the collected sound signals obtained by the sound collection units. The gain coefficient calculation unit obtains a gain coefficient for each frequency from the signal amount of the desired sound source, the signal amount of all the sound sources including the desired sound source, and the processing target signal. The multiplication unit multiplies the processing target signal by the gain coefficient calculated by the gain coefficient calculation unit.
[0006] For example, with the processing target signal denoted Y_S(ω, l), the signal amount of the desired sound source estimated by the power spectrum estimation unit denoted S(ω, l), and the signal amount of the other sound sources denoted N(ω, l), the gain coefficient calculation unit may set the gain coefficient R(ω, l) as
[0007] (equation not reproduced in this text).
[0008] According to the sound collection device of the present invention, the gain coefficient is determined with the processing target signal also taken into consideration.
Therefore, it is possible to obtain a gain coefficient that combines the advantages of a gain coefficient that does not consider the processing target signal and a gain coefficient that does. As a result, noise suppression characteristics can be improved.
[0009] FIG. 1 shows an example of how the present invention is used. Two small-scale microphone arrays 3L and 3R are placed some distance apart (for example, about as far apart as the distance from the microphone arrays 3L and 3R to the desired sound source 1), and the processing described below is applied to the signals received by their microphones. Through this processing, the sound of the desired sound source 1 is emphasized and collected, and the sound of the background noise sources 2 is suppressed.
[0010] Before describing the present invention, the technology disclosed in an unpublished patent application (Japanese Patent Application No. 2006-52502) will first be described. The whole structure of the sound collection apparatus of Japanese Patent Application No. 2006-52502 is shown in FIG. The outline of the sound collection device will be described with reference to FIG. In this example, the sound reception signals generated by the microphones of the microphone array 3L are input to the first sound collecting unit 4-1 and the third sound collecting unit 4-3, and the sound reception signals generated by the microphones of the microphone array 3R are input to the second sound collecting unit 4-2 and the fourth sound collecting unit 4-4. The signals of the microphones located at the centers of the microphone arrays 3L and 3R are input to the fifth sound collecting unit 4-5 and the sixth sound collecting unit 4-6. The microphone arrays 3L and 3R need not have the same number of microphones.
[0011] As shown in FIG.
4, each of the first sound collecting unit 4-1 to the fourth sound collecting unit 4-4 consists of M filter processing units 41, to which the sound reception signals x_1 to x_M of the respective microphones are input, and an addition unit 42 that adds the output signals of the filter processing units 41. Each filter processing unit 41 is constituted by, for example, an FIR filter, and digitally processes each frequency component included in the sound reception signal to set the directivity characteristics of the microphone arrays 3L and 3R. Such a technique is described, for example, in "Sound System and Digital Processing," co-authored by Juro Ohga, Yoshio Yamasaki, and Yutaka Kaneda, published March 25, 1995 by The Institute of Electronics, Information and Communication Engineers, and can be realized by well-known technology.
[0012] Here, the directivity characteristics of the first sound collection unit 4-1 and the second sound collection unit 4-2 are set so that their sound collection ranges are the angle regions Θ_L and Θ_R, shown in FIG. 3, which include the position of the desired sound source 1 as seen from the approximate center positions of the microphone arrays 3L and 3R. The directivity characteristics of the third sound collecting unit 4-3 and the fourth sound collecting unit 4-4 are set so that their sound collection ranges are the angle regions Θ̄_L and Θ̄_R, which do not include the position of the desired sound source 1 shown in FIG. Furthermore, the directivity of the fifth sound collecting unit 4-5 is set so that its sound collection range is the angle region Θ_C, which includes the position of the desired sound source 1 as seen from the approximate midpoint between the microphone arrays 3L and 3R.
The directivity of the sixth sound collecting unit 4-6 is set so that its sound collection range is the angle region Θ̄_C, which excludes the position of the desired sound source 1 as seen from the approximate midpoint between the microphone arrays 3L and 3R.
[0013] The sound collection signals collected with the directivity characteristics of the first to sixth sound collection units 4-1 to 4-6 are converted to frequency-domain signals by the frequency domain conversion unit 5. In this conversion, the input signal is divided into frames of short duration (for example, about 256 samples at a sampling frequency of 16000 Hz), and a discrete Fourier transform is performed on each frame. The discrete Fourier transform can be computed with, for example, the fast Fourier transform (FFT). The signal transformed into the frequency domain is divided into a plurality of frequency domain components.
[0014] The collected sound signals converted into the frequency domain are input to the adding unit 6 and the power spectrum estimating unit 7 (called the "sound source signal component estimating unit" in the specification of Japanese Patent Application No. 2006-52502). The output signals of the first sound collecting unit 4-1 and the second sound collecting unit 4-2 are input to the adding unit 6. The adding unit 6 adds these frequency-domain signals for each identical frequency domain component.
[0015] The power spectrum estimation unit 7 receives all output signals of the first sound collection unit 4-1 to the sixth sound collection unit 4-6, and estimates the signal amount of each sound source for each frequency domain.
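The frame-wise conversion to the frequency domain described in [0013] can be sketched as follows. This is only an illustration: the function name `to_frequency_domain`, the hop size of 128 samples, and the Hann window are assumptions, since the text gives only the frame length (about 256 samples at 16000 Hz) and the use of an FFT.

```python
import numpy as np

def to_frequency_domain(y, frame_len=256, hop=128):
    """Split y into short frames and apply a DFT (via FFT) to each frame.

    Returns Y[l, k] with frame index l and frequency bin k.
    The hop size and the Hann window are assumptions, not from the text.
    """
    y = np.asarray(y, dtype=float)
    n_frames = 1 + (len(y) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([y[l * hop : l * hop + frame_len] * window
                       for l in range(n_frames)])
    # rfft keeps only the non-negative frequencies of a real signal
    return np.fft.rfft(frames, axis=1)

fs = 16000                          # sampling frequency in Hz
t = np.arange(fs) / fs              # one second of samples
y = np.sin(2 * np.pi * 1000 * t)    # 1 kHz test tone
Y = to_frequency_domain(y)
peak_bin = int(np.argmax(np.abs(Y[0])))   # bin spacing is fs / 256 = 62.5 Hz
```

Each row of `Y` corresponds to one frame l and each column to one frequency domain component ω, which is the indexing used by Y(ω, l) in the description.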
If the signal amount of each sound source can be estimated, the ratio of the signal amount of the desired sound source 1 to the signal amounts of the other sound sources, that is, the SN ratio, can be obtained. This SN ratio is obtained for each frequency domain and used as a gain coefficient: multiplying the signal supplied from the adding unit 6, whose main component is the signal of the desired sound source 1, by this gain coefficient for each frequency domain suppresses the background noise component contained in that signal. The multiplication result of the multiplication unit 9 is converted to a time domain signal by the inverse frequency domain conversion unit 10, and is output as the signal after noise removal. The above is the outline of the invention of Japanese Patent Application No. 2006-52502.
[0016] The configuration and operation of each part will be described in detail below. FIG. 4 shows the configuration of the first to fourth sound collecting units 4-1 to 4-4. The first sound collecting unit 4-1 is described here as an example, but the same processing is performed in the second sound collecting unit 4-2, the third sound collecting unit 4-3, and the fourth sound collecting unit 4-4. As seen from the directions on both sides of the position of the desired sound source 1, the first to fourth sound collecting units 4-1 to 4-4 are set to sound collection characteristics whose sound collection range is either an angle range including the desired sound source position or an angle range not including it, and they therefore function as side beamformers. The signals x_LmL(n) (mL = 1, 2, ..., ML) input to the first sound collection unit 4-1 are input to the filter processing units 41.
The filter processing unit 41 applies the convolution operation of equation (1) to the input signal x_LmL(n) and the filter coefficient w_LmL(n) given in advance (how it is determined is described later), and outputs the signal x'_LmL(n).
[0017] The output signal of each filter processing unit 41 is input to the addition unit 42. The addition unit 42 adds the input signals as shown in equation (2) to obtain the output signal y_SL(n) of the first sound collecting unit 4-1.
[0018] Here, the filter coefficient w_LmL(n) is designed using, for example, the least squares method so that the directivity characteristic D_LSPB(ω, θ) of the first sound collecting unit has the characteristic shown in equation (3). Similarly, the second, third, and fourth sound collecting units are designed to satisfy the conditions of equations (4) to (6). Θ and Θ̄ respectively denote the directions around the desired signal (for example, directions within about ±10° of the desired signal direction) and all other directions. D(ω, θ) in equations (3) to (6) represents the directivity characteristic of each sound collecting unit.
[0019] As viewed from the microphone array 3L, the first sound collection unit 4-1 emphasizes and collects only the sound arriving from the direction of the desired sound source 1, and the third sound collection unit 4-3 emphasizes and collects only the sound arriving from directions other than that of the desired sound source 1. As viewed from the microphone array 3R, the second sound collection unit 4-2 emphasizes and collects only the sound arriving from the direction of the desired sound source 1, and the fourth sound collecting unit 4-4 emphasizes and collects only the sound arriving from directions other than that of the desired sound source 1.
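The filter-and-sum structure of [0016] and [0017] (an FIR convolution per microphone followed by a sum over microphones) can be sketched as below. The function name `filter_and_sum` and the toy coefficients are illustrative assumptions; in the description the coefficients w_LmL(n) would come from a least squares design of the directivity, which is not shown here.

```python
import numpy as np

def filter_and_sum(x, w):
    """Side beamformer sketch: convolve each microphone signal with its
    FIR filter (filter processing units 41), then add the filtered
    signals (addition unit 42).

    x: array of shape (M, N), one row per microphone signal
    w: array of shape (M, K), one row of FIR coefficients per microphone
    Returns the beamformer output of length N.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    n_samples = x.shape[1]
    # Causal convolution, truncated to the input length
    filtered = [np.convolve(x_m, w_m)[:n_samples] for x_m, w_m in zip(x, w)]
    return np.sum(filtered, axis=0)

# Toy example with two microphones (values are arbitrary)
x = np.array([[1.0, 2.0, 3.0, 4.0],
              [0.0, 1.0, 0.0, -1.0]])
w = np.array([[1.0, 0.0],    # unit impulse: signal passes unchanged
              [0.0, 1.0]])   # one-sample delay
y = filter_and_sum(x, w)
```

With these toy filters the second microphone signal is delayed by one sample before the addition, which is the basic mechanism by which per-microphone filters steer the directivity.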
[0020] FIG. 5 shows the flow of processing in the fifth sound collecting unit 4-5 and the sixth sound collecting unit 4-6, which function as front beamformers. In the front beamformer, the signal x_L(ML/2)(n) received by the microphone at the center of the microphone array 3L and the signal x_R(MR/2)(n) received by the microphone at the center of the microphone array 3R are input to the filter processing units 51 and 52, respectively. The filter processing units 51 and 52 convolve the input signals x_L(ML/2)(n) and x_R(MR/2)(n) with the filter coefficients w_C(ML/2)(n) and w_C(MR/2)(n) given in advance, as shown in equations (7) and (8), and output x'_L(ML/2)(n) and x'_R(MR/2)(n).
[0021] Here, it is desirable that the filter coefficients w_C(ML/2)(n) and w_C(MR/2)(n) have the same phase characteristics; for example, a single impulse signal
[0022] is used. In the fifth sound collection unit 4-5, the output signals x'_L(ML/2)(n) and x'_R(MR/2)(n) of the filter processing units 51 and 52 are input to the addition unit 53. The addition unit 53 adds the input signals as shown in equation (10) and outputs the signal y_SC(n). As a result, the fifth sound collecting unit 4-5 emphasizes and collects only the sound arriving from the direction of the desired sound source 1 as viewed from the midpoint between the microphone array 3L and the microphone array 3R.
[0023] y_SC(n) = x'_L(ML/2)(n) + x'_R(MR/2)(n)   (10)
In the sixth sound collection unit 4-6, the output signals x'_L(ML/2)(n) and x'_R(MR/2)(n) of the filter processing units 51 and 52 are input to the subtraction unit 54. The subtraction unit 54 subtracts one input signal from the other as shown in equation (11) and outputs the signal y_NC(n).
Therefore, the sixth sound collecting unit 4-6 emphasizes and collects only the sound arriving from directions other than that of the desired sound source 1, as viewed from the midpoint between the microphone array 3L and the microphone array 3R.
[0024] y_NC(n) = x'_L(ML/2)(n) - x'_R(MR/2)(n)   (11)
FIG. 6 shows the flow of processing in the power spectrum estimation unit 7. The frequency components Y_SL(ω, l), Y_NL(ω, l), Y_SC(ω, l), Y_NC(ω, l), Y_SR(ω, l), and Y_NR(ω, l) input to the power spectrum estimation unit 7 are input to the power calculation unit 61, which outputs the power values |Y_SL(ω, l)|², |Y_NL(ω, l)|², |Y_SC(ω, l)|², |Y_NC(ω, l)|², |Y_SR(ω, l)|², and |Y_NR(ω, l)|² to the vectorization unit 62. The vectorization unit 62 collects the power values of the output signals of the first to sixth sound collection units 4-1 to 4-6 in vector form as in equation (12), and outputs the power vector Y(ω, l).
[0025] The power vector Y(ω, l) is input to the multiplier 63. The power estimation matrix T⁺, which is the other input of the multiplier 63, is the output of the pseudo inverse matrix calculator 64. The gain matrix T defined by equation (13) is input to the pseudo inverse matrix calculator 64, which outputs the pseudo inverse matrix T⁺.
[0026] Each element of the gain matrix T is the gain of the directivity characteristic, in the Θ_x direction or the Θ̄_x direction, set in the fifth sound collecting unit 4-5, the sixth sound collecting unit 4-6, and the first to fourth sound collecting units 4-1 to 4-4. For example, the average value of the directivity characteristic over frequency and direction is used, as shown in equations (14) to (17).
[0027] α_x is the average value of the directivity characteristics set in the first, second, and fifth sound collecting units 4-1, 4-2, and 4-5 with respect to the directions around the desired signal. β_x is the average value of the directivity characteristics set in the first, second, and fifth sound collecting units 4-1, 4-2, and 4-5 with respect to directions other than those around the desired signal. γ_x is the average value of the directivity characteristics set in the third, fourth, and sixth sound collecting units 4-3, 4-4, and 4-6 with respect to the directions around the desired signal. δ_x is the average value of the directivity characteristics set in the third, fourth, and sixth sound collecting units 4-3, 4-4, and 4-6 with respect to directions other than those around the desired signal. In equations (14) to (17), the subscript x stands for any one of R, C, and L.
[0028] The multiplier 63 multiplies the input beamformer output power vector by the power estimation matrix for each frequency component, as shown in equation (18), and outputs the estimated signal power vector X_opt(ω, l).
[0029] X_opt(ω, l) = T⁺ Y(ω, l)   (18)
FIG. 7 shows the flow of processing in the gain coefficient calculation unit 8. The estimated signal power vector X_opt(ω, l) from the power spectrum estimation unit 7 shown in FIG. 6 is input to the vector element extraction unit 81. As shown in equation (19), the vector element extraction unit 81 outputs the first component of the input estimated signal power vector as the estimated signal power |S(ω, l)|², the second component as the estimated left direction noise power |N_L(ω, l)|², the third component as the estimated front direction noise power |N_C(ω, l)|², and the fourth component as the estimated right direction noise power |N_R(ω, l)|², and these are input to the SN ratio estimation unit 82.
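The estimation step of equation (18) and the component extraction of [0029] can be sketched as follows. The numerical entries of the gain matrix `T` below are hypothetical placeholders, not values from the document; the row order follows the power vector of [0024] (SL, NL, SC, NC, SR, NR) and the column order follows [0029] (desired source, left noise, front noise, right noise).

```python
import numpy as np

# Hypothetical 6x4 gain matrix T (illustrative values only).
# Rows: sound collection outputs SL, NL, SC, NC, SR, NR.
# Columns: desired source S, left noise NL, front noise NC, right noise NR.
T = np.array([
    [1.0, 0.2, 0.1, 0.1],
    [0.1, 1.0, 0.3, 0.2],
    [1.0, 0.1, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.2],
    [1.0, 0.1, 0.1, 0.2],
    [0.1, 0.2, 0.3, 1.0],
])

def estimate_source_powers(T, Y):
    """Equation (18): X_opt = T^+ Y, with T^+ the pseudo inverse of T."""
    return np.linalg.pinv(T) @ Y

# Simulate one frequency bin: known source powers mixed through T
X_true = np.array([4.0, 1.0, 0.5, 2.0])
Y = T @ X_true                         # six beamformer output powers
X_opt = estimate_source_powers(T, Y)

# Vector element extraction as in [0029]
S2, NL2, NC2, NR2 = X_opt
```

Because there are six observations and four unknowns, the pseudo inverse gives the least squares solution; in this noise-free toy case it recovers the source powers exactly.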
[0030] The SN ratio estimation unit 82 calculates the estimated SN ratio ESNR(ω, l) using equation (20).
[0031] The estimated SN ratio ESNR(ω, l) output by the SN ratio estimation unit 82 is output as the gain coefficient R(ω, l).
[0032] The gain coefficient R(ω, l) is calculated for each frequency domain. In frequency domains where little noise is mixed in, the gain coefficient R(ω, l) has a value close to 1, and the desired signal component is output as it is. In frequency domains where much noise is mixed in, the gain coefficient R(ω, l) has a value close to 0, and the signal component in that frequency domain is strongly attenuated to suppress the noise. As described above, multiplying the signal Y_S(ω, l) supplied from the adding unit 6, whose main component is the desired signal, by the gain coefficient R(ω, l) for each frequency domain suppresses the noise component in each frequency domain. This improves the SN ratio of the signal converted to the time domain by the inverse frequency domain conversion unit 10.
[0033] [First Embodiment] FIG. 8 shows an example of the overall configuration of a sound collection device according to a first embodiment of the present invention. It differs from the overall configuration of the sound collection device of Japanese Patent Application No. 2006-52502 shown in FIG. in the gain coefficient calculation unit 130 and the processing target signal generation unit 140. FIG. 9 is a diagram showing the processing flow of the sound collection device of the first embodiment.
[0034] The first and second sound collecting units 4-1 and 4-2 use the output signals of microphone arrays, each made up of a plurality of microphones, to collect the sounds y_SL(n) and y_SR(n) of angle regions including the desired sound source position from mutually different positions (S4-1, S4-2).
The third and fourth sound collecting units 4-3 and 4-4 use the output signals of the microphone arrays to collect the sounds y_NL(n) and y_NR(n) of angle regions not including the desired sound source position from the mutually different positions (S4-3, S4-4). The fifth sound collecting unit 4-5 collects the sound y_SC(n) of the angle region including the desired sound source position from the midpoint of the mutually different positions (S4-5). The sixth sound collecting unit 4-6 collects the sound y_NC(n) of the angle region not including the desired sound source position from the midpoint (S4-6). The frequency domain conversion unit 5 converts the signals y_SL(n), y_SR(n), y_NL(n), y_NR(n), y_SC(n), and y_NC(n) collected by the sound collection units 4-1 to 4-6 into the frequency domain signals Y_SL(ω, l), Y_SR(ω, l), Y_NL(ω, l), Y_NR(ω, l), Y_SC(ω, l), and Y_NC(ω, l). The frequency domain conversion unit 5 may instead be provided inside each of the sound collection units 4-1 to 4-6. The processing target signal generation unit 140 takes the average of the frequency-domain signal Y_SL(ω, l) from the first sound collection unit 4-1 and the frequency-domain signal Y_SR(ω, l) from the second sound collection unit 4-2 as the processing target signal Y_S(ω, l) (S140). The power spectrum estimation unit 7 estimates, for each frequency, the signal amount of the desired sound source and the signal amounts of the other sound sources, X_opt(ω, l), from the frequency-domain collected signals Y_SL(ω, l), Y_SR(ω, l), Y_NL(ω, l), Y_NR(ω, l), Y_SC(ω, l), and Y_NC(ω, l) obtained by the sound collection units 4-1 to 4-6 (S7). The gain coefficient calculation unit 130 obtains a gain coefficient R(ω, l) for each frequency from the signal amount of the desired sound source and the signal amounts of the other sound sources, X_opt(ω, l), and from the processing target signal Y_S(ω, l) (S130).
The multiplication unit 9 multiplies the processing target signal Y_S(ω, l) by the gain coefficient R(ω, l) calculated by the gain coefficient calculation unit 130 (S9). The inverse frequency domain conversion unit 10 converts the processing target signal multiplied by the gain coefficient, R(ω, l) Y_S(ω, l), into the time domain. The inverse frequency domain conversion unit 10 may instead be provided inside the multiplication unit 9.
[0035] Next, the components that differ from the sound collection device of FIG. 2 are described in detail. FIG. 10 is a diagram showing an example of the functional configuration of the processing target signal generation unit 140. The processing target signal generation unit 140 includes an addition unit 141 and a division unit 142. The addition unit 141 adds the frequency-domain signal Y_SL(ω, l) from the first sound collection unit 4-1 and the frequency-domain signal Y_SR(ω, l) from the second sound collection unit 4-2. The division unit 142 divides the sum by 2 and outputs the average value as the processing target signal Y_S(ω, l). In the sound collection device of FIG. 2, the addition unit 6 adds the frequency-domain signal Y_SL(ω, l) from the first sound collection unit 4-1 and the frequency-domain signal Y_SR(ω, l) from the second sound collection unit 4-2, and the sum is used as the processing target signal Y_S(ω, l). The only difference is whether or not the sum is divided by 2. This affects only the overall volume of the signal; since the waveforms are the same, the two are equivalent from the viewpoint of signal processing. That is, dividing by a value other than 2 is also equivalent processing.
[0036] FIG. 11 shows an example of the functional configuration of the gain coefficient calculation unit 130. The gain coefficient calculation unit 130 includes a vector element extraction unit 81, a first gain calculation unit 131, a second gain calculation unit 132, and a gain multiplication unit 133.
As shown in equation (19), the vector element extraction unit 81 outputs the first component of the input estimated signal power vector as the estimated signal power |S(ω, l)|², the second component as the estimated left direction noise power |N_L(ω, l)|², the third component as the estimated front direction noise power |N_C(ω, l)|², and the fourth component as the estimated right direction noise power |N_R(ω, l)|². From the estimated signal power |S(ω, l)|² and the processing target signal Y_S(ω, l), the first gain calculation unit 131 calculates and outputs the first gain coefficient G_S(ω, l) according to equation (21).
[0037] The second gain calculation unit 132 calculates and outputs the second gain coefficient G_SNR(ω, l) according to equation (22), using the estimated signal power |S(ω, l)|², the estimated left direction noise power |N_L(ω, l)|², the estimated front direction noise power |N_C(ω, l)|², and the estimated right direction noise power |N_R(ω, l)|².
[0038] Note that if |N_L(ω, l)|² + |N_C(ω, l)|² + |N_R(ω, l)|² is written as |N(ω, l)|², the power of the signal amount from sound sources other than the desired sound source, then equation (22) can be expressed as equation (23).
[0039] The gain multiplication unit 133 outputs the product of the first gain coefficient G_S(ω, l) and the second gain coefficient G_SNR(ω, l) as the gain coefficient R(ω, l), as expressed by the following equation.
R(ω, l) = G_S(ω, l) · G_SNR(ω, l)   (24)
The processing of the other components is the same as the sound collection device of FIG.
[0040] Next, the principle by which the present invention suppresses noise will be described. The product of the first gain coefficient G_S(ω, l) and the processing target signal Y_S(ω, l) is a signal whose power spectrum has the same amplitude as the estimated signal power |S(ω, l)|².
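The gain computation of [0036] to [0039] can be sketched as below. Equations (21) to (23) are not reproduced in this text, so the forms used here are assumptions inferred from the description: the first gain is taken as G_S = |S| / |Y_S| (so that G_S · Y_S has power |S|², as stated in [0040]), and the second gain as G_SNR = |S|² / (|S|² + |N|²) (the ratio of the desired power to the power of all sources, as in [0005]). The function name, the clipping of negative power estimates, and the small constant `eps` are also assumptions added for numerical safety.

```python
import numpy as np

def gain_coefficient(S2, NL2, NC2, NR2, YS, eps=1e-12):
    """Sketch of R = G_S * G_SNR (equation (24)) per frequency bin.

    S2, NL2, NC2, NR2: estimated powers |S|^2, |N_L|^2, |N_C|^2, |N_R|^2
    YS: complex processing target signal Y_S
    Assumed forms: G_S = |S| / |Y_S|, G_SNR = |S|^2 / (|S|^2 + |N|^2).
    """
    S2 = np.maximum(S2, 0.0)                   # assumption: clip negative estimates
    N2 = np.maximum(NL2 + NC2 + NR2, 0.0)      # total noise power |N|^2
    G_S = np.sqrt(S2) / (np.abs(YS) + eps)     # first gain coefficient
    G_SNR = S2 / (S2 + N2 + eps)               # second gain coefficient, in [0, 1]
    return G_S * G_SNR

# Two toy frequency bins: one noise-free, one dominated by noise
YS = np.array([2.0 + 0.0j, 0.0 + 4.0j])
S2 = np.array([4.0, 4.0])
NL2 = np.array([0.0, 4.0])
NC2 = np.array([0.0, 4.0])
NR2 = np.array([0.0, 4.0])
R = gain_coefficient(S2, NL2, NC2, NR2, YS)
output = R * YS      # multiplication unit 9
```

In the noise-free bin both factors are 1 and the signal passes unchanged. In the noisy bin the two factors are 0.5 and 0.25, so the product attenuates more strongly than the SN ratio gain alone, which is the effect the description attributes to multiplying the two coefficients.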
The estimated signal power |S(ω, l)|² is in principle identical to the power of the desired sound source. Suppression of the noise component can therefore be expected from multiplying the processing target signal Y_S(ω, l) by the first gain coefficient G_S(ω, l). In practice, however, there are various disturbances such as reverberation and sensitivity errors of the microphones, and because the estimate then contains many errors, sufficient noise suppression characteristics cannot always be obtained. On the other hand, the gain coefficient output by the gain coefficient calculation unit 8 of Japanese Patent Application No. 2006-52502 and the second gain coefficient G_SNR(ω, l) also use the estimated noise power in their calculation, so even when the estimated signal power |S(ω, l)|² contains a large amount of noise, the noise component can be suppressed as long as the estimated noise power |N(ω, l)|² is accurate. However, since these gain coefficients are normalized to the range of 0 to 1, their suppression acts gently and the noise suppression effect is not high. As described above, the first gain coefficient on the one hand, and the gain coefficient of Japanese Patent Application No. 2006-52502 and the second gain coefficient on the other, each have advantages and disadvantages. By multiplying the first and second gain coefficients, the sound collection device according to the first embodiment obtains a gain coefficient that combines the advantages of both. Therefore, noise suppression characteristics can be improved.
[0041] [Second Embodiment] FIG. 12 shows an example of the overall configuration of a sound collection device according to a second embodiment of the present invention. The present embodiment differs from the first embodiment (FIG.
8) in the sound collection units 4'-1 to 4'-6, the processing target signal generation unit 140', the power spectrum estimation unit 7', and the gain coefficient calculation unit 130'. The components that differ from those of the first embodiment are described below. The processing flow of the sound collection device of the second embodiment is shown in FIG.
[0042] FIG. 13 is a diagram showing the regions of sound source positions used to describe the settings of the sound collection units 4'-1 to 4'-6. FIG. 14 is a diagram showing an example of the functional configuration of the first sound collection unit 4'-1. The signals x_LmL(n) (mL = 1, 2, ..., ML) from the microphone array 3L are input. The filter processing unit 41' applies the convolution operation of equation (25) to the input signal x_LmL(n) and a predetermined filter coefficient w_LmL(n) (how it is determined is described later), and outputs the signal x'_LmL(n).
[0043] The output signal of each filter processing unit 41' is input to the addition unit 42'. The addition unit 42' adds the input signals according to equation (26) to obtain the output signal y_LL(n) of the first sound collecting unit 4'-1.
[0044] Here, the filter coefficient w_LmL(n) is designed using, for example, the least squares method so that the directivity characteristic D_LSB(ω, θ) of the first sound collecting unit 4'-1 has the characteristic shown in equation (27). Similarly, the third sound collecting unit and the fifth sound collecting unit are designed to satisfy the conditions of equations (28) and (29). Θ_L1 to Θ_L3 each denote an angle region as viewed from the microphone array 3L shown in FIG.
[0045] That is, the first sound collecting unit 4'-1 collects sound while suppressing the sound from the angle region Θ_L1 (S4'-1). The third sound collecting unit 4'-3 collects sound while suppressing the sound from the angle region Θ_L2 (S4'-3).
The fifth sound collecting unit 4′-5 collects sound while suppressing the sound of the angle region ΘL3 (S4′-5).
[0046] Similarly, as shown in equations (30) to (32), the second sound collecting unit 4′-2, which uses the microphone array 3R, collects sound while suppressing the sound of the angle region ΘR1 (S4′-2). The fourth sound collecting unit 4′-4 collects sound while suppressing the sound of the angle region ΘR2 (S4′-4). The sixth sound collecting unit 4′-6 collects sound while suppressing the sound of the angle region ΘR3 (S4′-6).
[0047] FIG. 15 is a diagram showing an example of a functional configuration of the processing target signal generation unit 140′. The processing target signal generation unit 140′ includes an addition unit 141′ and a division unit 142′. The addition unit 141′ receives, in the frequency domain, the signal YLL(ω, l) from the first sound collecting unit 4′-1, the signal YLR(ω, l) from the second sound collecting unit 4′-2, the signal YRL(ω, l) from the fifth sound collecting unit 4′-5, and the signal YRR(ω, l) from the sixth sound collecting unit 4′-6, adds them as in the following equation, and outputs the addition result Y′S(ω, l).
Y′S(ω, l) = YLL(ω, l) + YLR(ω, l) + YRL(ω, l) + YRR(ω, l) (33)
[0048] The division unit 142′ divides the added signal Y′S(ω, l) by 4 as in the following equation, and outputs the average value as the processing target signal YS(ω, l) (S140′).
[0049] YS(ω, l) = Y′S(ω, l) / 4 (34)
As described in the first embodiment, the waveform is the same whatever value the division unit 142′ divides by, so the processing is equivalent from the viewpoint of signal processing. That is, dividing by a value other than 4 is equivalent processing.
[0050] FIG. 16 shows an example of a functional configuration of the power spectrum estimation unit 7′. The power spectrum estimation unit 7′ includes a power calculation unit 61′, a vectorization unit 62′, a multiplication unit 63′, and a pseudo-inverse matrix calculation unit 64′.
The power calculation unit 61′ calculates and outputs the power values |YLL(ω, l)|², |YCL(ω, l)|², |YRL(ω, l)|², |YLR(ω, l)|², |YCR(ω, l)|², and |YRR(ω, l)|² of the frequency-domain signals YLL(ω, l), YCL(ω, l), YRL(ω, l), YLR(ω, l), YCR(ω, l), and YRR(ω, l) from the respective sound collection units. The vectorization unit 62′ outputs a power vector Y(ω, l) in which these power values are collected in vector form as in equation (35).
[0051] The power vector Y(ω, l) is then input to the multiplication unit 63′. The other input of the multiplication unit 63′, the power estimation matrix T⁺, is the output of the pseudo-inverse matrix calculation unit 64′. The gain matrix T defined by equation (36) is input to the pseudo-inverse matrix calculation unit 64′, which outputs the pseudo-inverse matrix T⁺.
[0052] Each element of the gain matrix T(ω) is the gain of the directivity characteristic of each of the sound collection units 4′-1 to 4′-6 in the Θ1 direction, the Θ2 direction, and the Θ3 direction. The average value of the directivity characteristic over each direction is used, as shown in the equations up to (39).
[0053] αx(ω) is the average value of the directivity characteristics of the first sound collecting unit 4′-1 and the second sound collecting unit 4′-2 at the frequency ω with respect to the direction of the angle region Θx. βx(ω) is the average value of the directivity characteristics of the third sound collecting unit 4′-3 and the fourth sound collecting unit 4′-4 at the frequency ω with respect to the direction of the angle region Θx. γx(ω) is the average value of the directivity characteristics of the fifth sound collecting unit 4′-5 and the sixth sound collecting unit 4′-6 at the frequency ω with respect to the direction of the angle region Θx. Here, x is any one of L1, L2, L3, R1, R2, and R3.
The multiplication unit 63′ multiplies the power vector Y(ω, l) by the pseudo-inverse matrix T⁺ as shown in equation (40), and outputs the estimated signal power vector Xopt(ω, l) (S7′).
[0054] Xopt(ω, l) = T⁺ Y(ω, l) (40)
FIG. 17 shows a functional configuration example of the gain coefficient calculation unit 130′. The gain coefficient calculation unit 130′ includes a vector element extraction unit 81′, a first gain calculation unit 131, a second gain calculation unit 132′, and a gain multiplication unit 133. The vector element extraction unit 81′ separates the input estimated signal power vector Xopt(ω, l) into the estimated signal power |S(ω, l)|², the estimated left side noise power |NLL(ω, l)|², the estimated left direction noise power |NL(ω, l)|², the estimated front direction noise power |NC(ω, l)|², the estimated right direction noise power |NR(ω, l)|², and the estimated right side noise power |NRR(ω, l)|², and outputs them. The first gain calculation unit 131 calculates and outputs the first gain coefficient GS(ω, l) from the estimated signal power |S(ω, l)|² and the processing target signal YS(ω, l) as in the following equation.
[0055] The second gain calculation unit 132′ calculates and outputs the second gain coefficient GSNR(ω, l) according to the following equation, based on the estimated signal power |S(ω, l)|², the estimated left side noise power |NLL(ω, l)|², the estimated left direction noise power |NL(ω, l)|², the estimated front direction noise power |NC(ω, l)|², the estimated right direction noise power |NR(ω, l)|², and the estimated right side noise power |NRR(ω, l)|².
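As an illustration of the estimation in equation (40), the following Python sketch recovers the per-direction powers from a power vector. It is a minimal sketch under stated assumptions, not the implementation of the embodiment: the function names are invented for illustration, the gain matrix values and the ordering of the vector elements in the usage part are hypothetical, and the pseudo-inverse is evaluated through the normal equations (TᵀT)X = TᵀY, which coincides with T⁺Y only when T has full column rank.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Matrix product a @ b for lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def matvec(a, v):
    """Matrix-vector product a @ v."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        # Move the row with the largest pivot candidate into place.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def estimate_powers(T, Y):
    """Least-squares estimate of Xopt from Y, assuming T has full column rank."""
    Tt = transpose(T)
    return solve(matmul(Tt, T), matvec(Tt, Y))


if __name__ == "__main__":
    # Hypothetical gain matrix: unit gain toward the own region, 0.1 leakage elsewhere.
    T = [[1.0 if i == j else 0.1 for j in range(6)] for i in range(6)]
    # Hypothetical ordering: [S, NLL, NL, NC, NR, NRR].
    true_powers = [4.0, 0.5, 1.0, 0.2, 0.8, 0.3]
    Y = matvec(T, true_powers)      # power vector seen by the six beams
    print(estimate_powers(T, Y))    # approximately recovers true_powers
```

Nothing in this estimate constrains the sign of the result, so when Y contains errors some components of the returned vector can come out negative.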
[0056] Note that if |NLL(ω, l)|² + |NL(ω, l)|² + |NC(ω, l)|² + |NR(ω, l)|² + |NRR(ω, l)|² is written as |N(ω, l)|², the power of the signal amount from sound sources other than the desired sound source, equation (42) can be expressed as the following equation.
[0057] The gain multiplication unit 133 outputs the product of the first gain coefficient GS(ω, l) and the second gain coefficient GSNR(ω, l) as the gain coefficient R(ω, l) as in the following equation (S130′).
[0058] R(ω, l) = GS(ω, l) GSNR(ω, l) (44)
The processing of the other components is the same as in the sound collection device of the first embodiment.
[0059] With the above-described configuration, the sound collection device of the second embodiment can improve the noise suppression characteristics as in the first embodiment.
[0060] [Modification] FIG. 18 shows another configuration example (modification) of the power spectrum estimation unit of the second embodiment (FIG. 12). The power spectrum estimation unit 7″ includes a power calculation unit 61′, a vectorization unit 62′, and a non-negative constrained least squares unit 63″. The power calculation unit 61′ and the vectorization unit 62′ are the same as those of the power spectrum estimation unit 7′ (FIG. 16) of the second embodiment. The non-negative constrained least squares unit 63″ receives the power vector Y(ω, l) and the gain matrix T and, under the constraint of equation (46) that the estimated signal power vector Xopt(ω, l) be non-negative, determines and outputs the estimated signal power vector Xopt(ω, l) that minimizes the squared error between Y(ω, l) and T·Xopt(ω, l) shown in equation (45).
[0061] ‖Y(ω, l) − T·Xopt(ω, l)‖² (45)
subject to Xopt(ω, l) ≥ 0 (46)
This solution can be calculated using, for example, the Non-negative Least Squares method described in C. L. Lawson and R. J. Hanson, Solving Least Squares Problems, Prentice-Hall, 1974. Each component of Xopt(ω, l) should be non-negative because it is a signal power, but in the processing of Japanese Patent Application No. 2006-52502, the first embodiment, and the second embodiment, a component may take a negative value, which cannot occur in reality. The inclusion of such components degrades noise suppression performance. In the processing of the present modification, each component of the estimated signal power vector Xopt(ω, l) always has a non-negative value, so the noise suppression characteristics can be improved.
[0062] Third Embodiment FIG. 19 shows an example of the overall configuration of a sound collection device according to a third embodiment of the present invention. The third embodiment differs from the second embodiment (FIG. 12) in the power spectrum estimation unit 110 and the reverberation spectrum estimation unit 120. FIG. 20 shows an example of the processing flow of the entire sound collection device of the third embodiment. It differs from the first embodiment and the second embodiment in that the reverberation spectrum is estimated from the power spectrum estimation result and fed back (subtracted). Hereinafter, components different from those of the second embodiment will be described.
[0063] FIG. 21 shows an example of a functional configuration of the power spectrum estimation unit 110. The power spectrum estimation unit 110 includes a power calculation unit 61′, a vectorization unit 62′, a subtraction unit 111, a multiplication unit 63′, and a pseudo-inverse matrix calculation unit 64′. The power calculation unit 61′ and the vectorization unit 62′ are the same as those of the power spectrum estimation unit 7′ (FIG. 16) of the second embodiment. The vectorization unit 62′ outputs a power vector Y(ω, l) in which the power values are collected in vector form as in equation (35).
[0064] The subtraction unit 111 subtracts the estimated signal amount Z*est(ω, l) of the reverberation from the power vector Y(ω, l) as in the following equation, and outputs the result Y′(ω, l) to the multiplication unit 63′.
[0065] Y′(ω, l) = Y(ω, l) − Z*est(ω, l) (47)
The multiplication unit 63′ and the pseudo-inverse matrix calculation unit 64′ are also the same as those of the power spectrum estimation unit 7′ (FIG. 16) of the second embodiment. The gain matrix T defined by equation (36) is input to the pseudo-inverse matrix calculation unit 64′, which outputs the pseudo-inverse matrix T⁺. The multiplication unit 63′ multiplies the signal Y′(ω, l), from which the reverberation has been subtracted, by the pseudo-inverse matrix T⁺ as shown in equation (48), and outputs the estimated signal power vector Xopt(ω, l).
[0066] Xopt(ω, l) = T⁺ Y′(ω, l) (48)
FIG. 22 shows a functional configuration example of the reverberation spectrum estimation unit 120. The reverberation spectrum estimation unit 120 includes a gain matrix multiplication unit 125 and a weighted addition unit 126. The gain matrix multiplication unit 125 converts the estimated signal power vector Xopt(ω, l), which contains the signal amount of the desired sound source and the signal amounts of the other sound sources, into the signal amount Zest(ω, l) for each sound collection unit. The gain matrix T′ used for this conversion is the gain of the directivity of each sound collection unit with respect to the reverberation component, and may be, for example, given by the following equation.
[0067] where
[0068] holds.
The weighted addition unit 126 records the signal amount Zest(ω, l) of each sound collection unit, and performs weighted addition of the signal amounts for each sound collection unit over a plurality of past frames. Specifically, to perform weighted addition of the signal amount Zest(ω, l) for each sound collection unit over the past N frames, N delay units 121_1 to 121_N, N weight multiplication units 122_1 to 122_N, and N−1 addition units 123_1 to 123_(N−1) may be provided.
The first delay unit 121_1 records the signal amount Zest(ω, l) of each sound collection unit and delays it by one frame. The first weight multiplication unit 122_1 multiplies the output of the first delay unit 121_1 (the signal amount Zest(ω, l) of each sound collection unit one frame before) by the weight ρ1. The n-th delay unit 121_n records the signal amount Zest(ω, l) of each sound collection unit from n−1 frames before, and delays it by one frame. The n-th weight multiplication unit 122_n multiplies the output of the n-th delay unit 121_n (the signal amount Zest(ω, l) for each sound collection unit n frames before) by the weight ρn. The n-th addition unit 123_n adds the output of the n-th weight multiplication unit 122_n to the output of the (n+1)-th addition unit 123_(n+1). The first addition unit 123_1 adds the output of the first weight multiplication unit 122_1 to the output of the second addition unit 123_2 and outputs the signal amount Z*est(ω, l) of the reverberation. In this way, weighted addition is performed in which the weight ρn is applied to the signal amount Zest(ω, l) of each sound collection unit n frames before. Here, the weight ρn is a parameter representing the temporal power attenuation of the reverberation component and is given, for example, from the reverberation time T60 by the following equation.
[0069] Here, LS is the number of samples in one frame, and FS is the sampling frequency.
[0070] The processing of the other components is the same as in the sound collection device of the second embodiment. Therefore, the sound collection device of the third embodiment can also improve the noise suppression characteristics as in the first and second embodiments. Furthermore, the sound collection device of the third embodiment provides the following effect. FIG. 23 shows a model of noise generation. FIG. 24 shows the influence of reverberation on the power spectrum in each frame. Relative to the direct sound emitted at a certain time 0 (considered here in units of time frames), reverberation reaches the microphone delayed by a time corresponding to the length of the transmission path, with its magnitude reduced at a constant attenuation rate. For example, in the example shown in FIG. 23, the same sound as the direct sound emitted at time 0 affects the frames of times 1 to 3 as reverberation. For this reason, as shown in FIG. 24, the direct sound components of past frames are superimposed as reverberation on the estimated power spectrum of a given frame l. The attenuation rate at this time corresponds to the weight ρn of the reverberation spectrum estimation unit 120. The weight ρn is determined by the acoustic characteristics of the room and can be calculated theoretically according to equation (56) using, for example, the reverberation time T60, which is one measure of the acoustic characteristics of the room. In the sound collection device of the present invention, the past direct sound components can be obtained as the past signal amounts Zest(ω, l) for each sound collection unit. Therefore, the gain matrix multiplication unit 125 converts the estimated signal power vector into the signal amount Zest(ω, l) for each sound collection unit, and the weighted addition unit 126 records the signal amount Zest(ω, l) for each sound collection unit and performs weighted addition over the signal amounts of the past frames. The signal amount Z*est(ω, l) of the reverberation is thus determined, and the power spectrum estimation unit 110 subtracts this estimated signal amount Z*est(ω, l) from the power vector Y(ω, l). Therefore, the sound collection device of the third embodiment can also reduce the influence of reverberation.
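The reverberation feedback described above can be sketched in Python as follows. This is an illustrative sketch only: the class and function names are hypothetical, and because the weight equation is not reproduced in this text, the weight ρn is assumed here to follow a 60 dB power decay over the reverberation time T60, that is, ρn = 10^(−6·n·LS / (T60·FS)). The conversion by the gain matrix T′ is omitted, so the caller is assumed to supply Zest(ω, l) for one frequency bin as a plain list.

```python
from collections import deque


def reverb_weight(n, t60, frame_len, fs):
    """Assumed power attenuation rho_n after n frames.

    Assumption (not taken from the source text): reverberant power falls by
    60 dB, i.e. a factor of 1e-6, over t60 seconds, and n frames last
    n * frame_len / fs seconds.
    """
    return 10.0 ** (-6.0 * n * frame_len / (t60 * fs))


class ReverbEstimator:
    """Weighted addition over the last num_frames values of Zest (hypothetical helper)."""

    def __init__(self, num_frames, dim, t60, frame_len, fs):
        self.dim = dim
        # weights[0] is rho_1 (one frame before), weights[1] is rho_2, and so on.
        self.weights = [reverb_weight(n, t60, frame_len, fs)
                        for n in range(1, num_frames + 1)]
        # history[0] holds Zest from one frame before; older frames follow.
        self.history = deque(maxlen=num_frames)

    def estimate(self):
        """Return Z*est for the current frame from the stored past frames."""
        z_star = [0.0] * self.dim
        for rho, z_past in zip(self.weights, self.history):
            for i, value in enumerate(z_past):
                z_star[i] += rho * value
        return z_star

    def push(self, z_est):
        """Store Zest of the frame just processed; the oldest entry is dropped."""
        self.history.appendleft(list(z_est))


def subtract_reverb(power_vector, z_star):
    """Equation (47): Y' = Y - Z*est, element by element."""
    return [y - z for y, z in zip(power_vector, z_star)]


if __name__ == "__main__":
    # Hypothetical settings: 3 past frames, 2 beams, T60 = 250 ms, 256-sample frames at 16 kHz.
    est = ReverbEstimator(num_frames=3, dim=2, t60=0.25, frame_len=256, fs=16000)
    est.push([1.0, 2.0])                                   # Zest of the previous frame
    y_prime = subtract_reverb([5.0, 5.0], est.estimate())  # Y' for the current frame
    print(y_prime)
```

A fixed-length deque plays the role of the chain of delay units: each call to `push` shifts every stored frame one step further into the past, so the weight applied to a given frame grows smaller as it ages.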
[0071] EXPERIMENTAL EXAMPLE Next, experimental results for the sound collection device of the third embodiment are shown. FIG. 25 is a diagram showing the experimental environment. In each microphone array, four microphones are arranged linearly at equal intervals of 4 cm. The unit of the coordinates is meters, and the centers of the arrays are located at (0.4, 0) and (−0.4, 0). The desired sound source (the position of the target speaker) is at (0, 0.5). Three different background noise sources (positions of other speakers) are placed at (−1.6, 2.5), (1.6, 1.0), and (0.0, 2.5).
[0072] FIG. 26 is a diagram showing an example of the spectral shapes of the desired signal and the noise signal contained in an input signal with a high signal-to-noise ratio, and of the first gain coefficient GS(ω, l) and the gain coefficient R(ω, l) obtained by the sound collection device of the third embodiment. FIG. 27 is a diagram showing the same for an input signal with a low signal-to-noise ratio. FIGS. 26A and 27A show the spectral shapes of the desired signal and the noise signal contained in the input signal. FIGS. 26B and 27B show the first gain coefficient GS(ω, l) obtained by the sound collection device of the third embodiment. FIGS. 26C and 27C show the gain coefficient R(ω, l) obtained by the sound collection device of the third embodiment. In the signal of FIG. 26A, the noise signal is dominant over the desired signal at frequencies near 2000 Hz and 4000 Hz (the frequencies indicated by the dotted lines in the figure). That is, it is desirable that the gain coefficient to be multiplied be close to 0 near 2000 Hz and 4000 Hz. In the first gain coefficient GS(ω, l) of FIG. 26B, the coefficient is large even at these frequencies, whereas in the gain coefficient R(ω, l) of FIG. 26C, the coefficient at these frequencies is small. From this, it can be seen that the gain coefficient formed by multiplying the plurality of gain coefficients obtained by the present invention is excellent in noise suppression effect. Similarly, in FIG. 27A the noise signal is dominant over the entire band, so it is desirable that the gain coefficient to be multiplied be close to 0 over the entire band. From FIGS. 27B and 27C, it can be seen that the gain coefficient according to the present invention has fewer bands with a large coefficient value, and that its noise suppression effect is higher.
[0073] FIG. 28 shows the results of measuring the amount of background noise suppression in two experimental environments with different reverberation intensities. Experimental environment 1 has a reverberation time of 250 ms (reverberation similar to that of a typical bedroom), and experimental environment 2 has a reverberation time of 500 ms (reverberation similar to that of a typical conference room). From these results, it can be seen that the sound collection device of the present invention has better noise suppression performance than the sound collection device of Japanese Patent Application No. 2006-52502.
[0074] FIG. 29 shows an example of the functional configuration of a computer. The sound collection device of the present invention can be realized by causing the recording unit 2020 of the computer 2000 to read a program that makes the computer 2000 operate as each component of the present invention, and by operating the processing unit 2010, the input unit 2030, the output unit 2040, and the like. Methods of loading the program into the computer include recording the program on a computer-readable recording medium and reading it into the computer from that medium, and reading a program stored on a server or the like into the computer through a telecommunication line or the like.
[0075] BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a diagram showing an example of how the present invention is used.
FIG. 2 is a diagram showing the overall configuration of the sound collection device of Japanese Patent Application No. 2006-52502.
FIG. 3 is a plan view for explaining the directivity of the first to sixth sound collection units 4-1 to 4-6.
FIG. 4 is a block diagram for explaining the configuration of the first to fourth sound collection units 4-1 to 4-4.
FIG. 5 is a diagram showing the configuration of the fifth sound collection unit 4-5 and the sixth sound collection unit 4-6.
FIG. 6 is a diagram showing the configuration of the power spectrum estimation unit 7.
FIG. 7 is a diagram showing the configuration of the gain coefficient calculation unit 8.
FIG. 8 is a diagram showing a configuration example of the entire sound collection device of the first embodiment.
FIG. 9 is a diagram showing the processing flow of the sound collection devices of the first embodiment and the second embodiment.
FIG. 10 is a diagram showing an example of the functional configuration of the processing target signal generation unit 140.
FIG. 11 is a diagram showing an example of the functional configuration of the gain coefficient calculation unit 130.
FIG. 12 is a diagram showing a configuration example of the entire sound collection device of the second embodiment.
FIG. 13 is a diagram showing the areas of sound source positions used to describe the setting of each of the sound collection units 4′-1 to 4′-6.
FIG. 14 is a diagram showing an example of the functional configuration of the first sound collecting unit 4′-1.
FIG. 15 is a diagram showing an example of the functional configuration of the processing target signal generation unit 140′.
FIG. 16 is a diagram showing an example of the functional configuration of the power spectrum estimation unit 7′.
FIG. 17 is a diagram showing an example of the functional configuration of the gain coefficient calculation unit 130′.
FIG. 18 is a diagram showing a modified configuration example of the power spectrum estimation unit of the second embodiment.
FIG. 19 is a diagram showing a configuration example of the entire sound collection device of the third embodiment.
FIG. 20 is a diagram showing an example of the processing flow of the entire sound collection device of the third embodiment.
FIG. 21 is a diagram showing an example of the functional configuration of the power spectrum estimation unit 110.
FIG. 22 is a diagram showing an example of the functional configuration of the reverberation spectrum estimation unit 120.
FIG. 23 is a diagram showing a model of noise generation.
FIG. 24 is a diagram showing the influence of reverberation on the power spectrum in each frame.
FIG. 25 is a diagram showing the experimental environment.
FIG. 26 is a diagram showing an example of the spectral shapes of the desired signal and the noise signal contained in an input signal with a high signal-to-noise ratio, and of the first gain coefficient GS(ω, l) and the gain coefficient R(ω, l) obtained by the sound collection device of the third embodiment.
FIG. 27 is a diagram showing an example of the spectral shapes of the desired signal and the noise signal contained in an input signal with a low signal-to-noise ratio, and of the first gain coefficient GS(ω, l) and the gain coefficient R(ω, l) obtained by the sound collection device of the third embodiment.
FIG. 28 is a diagram showing the results of measuring the amount of background noise suppression in two experimental environments with different reverberation intensities.
FIG. 29 is a diagram showing an example of the functional configuration of a computer.