DESCRIPTION JPH0667691
[0001] BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a noise removal apparatus, for use in a speech recognition apparatus or the like, that removes noise from speech uttered in a noisy environment.
[0002] 2. Description of the Related Art When speech recognition or speech communication is performed, various noises are present depending on the use environment, and these noises are a major factor that lowers the recognition rate of speech recognition and hinders speech communication.
[0003] Conventionally, there is a method called two-input spectral subtraction. It uses two microphones, a voice microphone that mainly inputs voice and a noise microphone that mainly inputs ambient noise, estimates the noise component contained in the voice microphone signal, and removes the estimated noise from the noisy voice to obtain clear speech.
[0004] For example, a noise removal apparatus using two-input spectral subtraction as described in Shibamura et al., "In-vehicle speech recognition using a two-input noise removal method", Technical Report of IEICE, pp. 41-48 (1989) (hereinafter referred to as cited document [1]) has the configuration shown in FIG. 16. That is, in FIG. 16, two channels are input simultaneously: a voice microphone 201, placed in front of the speaker's mouth and mainly inputting voice, and a noise microphone 202, installed at a position where it receives ambient noise as close as possible to the noise entering the voice microphone while picking up as little voice as possible.
The noisy voice input by the voice microphone 201 is converted into a time-series feature vector of noisy voice by the voice feature extraction unit 203, and the ambient noise input by the noise microphone 202 is converted into a time-series feature vector of noise by the noise feature extraction unit 204. In the two-input subtraction unit 205, the noise component included in the time-series feature vector of the noisy voice obtained from the voice feature extraction unit 203 is first estimated using the time-series feature vector of ambient noise obtained from the noise feature extraction unit 204. For this estimation, for example, the two inputs are compared at a time position that contains no voice, a correction coefficient between the two inputs is calculated in advance, and the entire time-series feature vector of ambient noise obtained from the noise feature extraction unit 204 is multiplied by this correction coefficient. Next, the two-input subtraction unit 205 subtracts the estimated time-series feature vector of noise from the entire time-series feature vector of the noisy voice obtained from the voice feature extraction unit 203, and outputs the time-series feature vector of clear voice after noise removal. Speech recognition performed on this clear time-series feature vector is intended to suffer less degradation of the recognition rate due to noise.
[0005] However, a normal noise environment contains non-stationary noise sources whose characteristics change temporally and spatially, such as the sound of moving objects or human speech, and the noise transfer characteristics change as well. Because noise arrives from directions that vary from moment to moment, in conventional two-input spectral subtraction using a single noise microphone the noise component input to the voice microphone and the noise input to the noise microphone are not always the same. As a result, the estimate of the noise contained in the voice contains errors and the noise removal effect is reduced. Also, in conventional two-input spectral subtraction, depending on how the noise microphone is installed or on the noise environment, the uttered voice may leak into the noise microphone. Because the feature vector of the leaked voice is then subtracted from the feature vector obtained from the voice microphone, voice feature components that should not be removed are removed, which has the disadvantage that the speech recognition rate or the intelligibility of communication is significantly reduced.
[0006] SUMMARY OF THE INVENTION The object of the present invention is to solve the above-mentioned problems: to remove noise efficiently even when the noise is non-stationary and its properties change temporally and spatially, and to provide a stable noise removal apparatus that does not remove a necessary voice signal even when voice leaks into the noise microphones.
[0007] According to a first aspect of the present invention, there are provided: a voice microphone for mainly inputting voice; a plurality of noise microphones arranged around the voice microphone for mainly inputting ambient noise; a voice feature extraction unit that converts the output signal of the voice microphone into a time-series feature vector of voice; a plurality of noise feature extraction units that respectively convert the output signals of the plurality of noise microphones into time-series feature vectors of noise; a noise detection unit that selects, from among the time-series feature vectors of noise obtained from the plurality of noise feature extraction units, the one closest to ambient noise; a selection unit that selects and outputs the time-series feature vector of noise selected by the noise detection unit; and a two-input subtraction unit that subtracts the time-series feature vector of noise output by the selection unit from the time-series feature vector of voice output by the voice feature extraction unit.
[0008] According to a second aspect of the present invention, there are provided: a voice microphone for mainly inputting voice; a plurality of noise microphones arranged around the voice microphone for mainly inputting ambient noise; a voice feature extraction unit that converts the output signal of the voice microphone into a time-series feature vector of voice; a minimum power detection unit that selects, from among the output signals of the plurality of noise microphones, the output signal of the noise microphone with the lowest power; a selection unit that selects and outputs the output signal of the noise microphone selected by the minimum power detection unit; a noise feature extraction unit that converts the output signal of the noise microphone output from the selection unit into a time-series feature vector of noise; and a two-input subtraction unit that subtracts the time-series feature vector of noise output by the noise feature extraction unit from the time-series feature vector of voice output by the voice feature extraction unit.
[0009] According to a third aspect of the present invention, there are provided: a voice microphone for mainly inputting voice; a plurality of noise microphones arranged around the voice microphone for mainly inputting ambient noise; a voice feature extraction unit that converts the output signal of the voice microphone into a time-series feature vector of voice; a plurality of noise feature extraction units that convert the output signals of the plurality of noise microphones into time-series feature vectors of noise; a similarity calculation unit that calculates and outputs the similarity between each time-series feature vector of noise output from the plurality of noise feature extraction units and the time-series feature vector of voice output by the voice feature extraction unit; a maximum value detection unit that selects the highest similarity among the similarities output by the similarity calculation unit; a selection unit that selects and outputs, from among the time-series feature vectors of noise, the one corresponding to the similarity selected by the maximum value detection unit; and a two-input subtraction unit that subtracts the time-series feature vector of noise output by the selection unit from the time-series feature vector of voice output by the voice feature extraction unit.
[0010] According to a fourth aspect of the present invention, the third aspect further comprises a weighting unit that applies a predetermined weight to each similarity output by the similarity calculation unit and outputs a weighted similarity, and the maximum value detection unit selects the highest similarity among the weighted similarities output by the weighting unit.
[0011] According to a fifth aspect of the present invention, there are provided: a voice microphone for mainly inputting voice; a plurality of noise microphones arranged around the voice microphone for mainly inputting ambient noise; a voice feature extraction unit that converts the output signal of the voice microphone into a time-series feature vector of voice; a voice partial feature extraction unit that converts the output signal of the voice microphone into a time-series feature vector of a voice sub-band; a plurality of partial feature extraction units that convert each of the output signals of the plurality of noise microphones into a time-series feature vector of a noise sub-band; a sub-band similarity calculation unit that calculates and outputs the similarity between each time-series feature vector of the noise sub-band output by the plurality of partial feature extraction units and the time-series feature vector of the voice sub-band output by the voice partial feature extraction unit; a maximum value detection unit that selects the highest similarity among the similarities output by the sub-band similarity calculation unit; a selection unit that selects and outputs, from among the output signals of the plurality of noise microphones, the output signal of the noise microphone corresponding to the similarity selected by the maximum value detection unit; a noise feature extraction unit that converts the output signal of the noise microphone output by the selection unit into a time-series feature vector of noise; and a two-input subtraction unit that subtracts the time-series feature vector of noise output by the noise feature extraction unit from the time-series feature vector of voice output by the voice feature extraction unit.
[0012] A sixth invention is characterized in that, in the third or fourth invention, a minimum value detection unit that obtains the lowest similarity among the input similarities is provided instead of the maximum value detection unit.
[0013] According to a seventh aspect of the present invention, there are provided: a voice microphone for mainly inputting voice; a plurality of noise microphones arranged around the voice microphone for mainly inputting ambient noise; a voice feature extraction unit that converts the output signal of the voice microphone into a time-series feature vector of voice; a plurality of noise feature extraction units that convert the output signals of the plurality of noise microphones into time-series feature vectors of noise; an average value combining unit that averages the time-series feature vectors of noise obtained from the plurality of noise feature extraction units and outputs the averaged feature vector as a combined vector of noise; and a two-input subtraction unit that subtracts the combined vector of noise output by the average value combining unit from the time-series feature vector of voice output by the voice feature extraction unit.
[0014] The eighth invention is characterized in that, in the seventh invention, a weighted average value combining unit is provided instead of the average value combining unit; it applies predetermined weights to the time-series feature vectors of noise output from the noise feature extraction units, averages them, and outputs the averaged feature vector as a combined vector of noise.
[0015] According to a ninth aspect of the present invention, there are provided: a voice microphone for mainly inputting voice; a plurality of noise microphones arranged around the voice microphone for mainly inputting ambient noise; a voice feature extraction unit that converts the output signal of the voice microphone into a time-series feature vector of voice; first to Nth noise feature extraction units that convert the output signals of the plurality of noise microphones into first to Nth time-series feature vectors of noise; a division unit that divides each of the time-series feature vectors of noise output by the noise feature extraction units into a plurality of bands and outputs them; a minimum value combining unit that, for each band of the band-divided time-series feature vectors of noise output by the division unit, extracts the component with the minimum power, combines the per-band minimum values, and outputs the result as a combined vector of noise; and a two-input subtraction unit that subtracts the combined vector of noise output by the minimum value combining unit from the time-series feature vector of voice output by the voice feature extraction unit.
[0016] According to a tenth aspect of the present invention, the first aspect further comprises a noise section detection unit that uses the output signal obtained from the voice microphone to detect a section in which no voice exists as a noise section, and the noise detection unit selects the time-series feature vector of noise using the time-series feature vectors of noise in the noise section detected by the noise section detection unit.
[0017] According to an eleventh aspect of the present invention, the second aspect further comprises a noise section detection unit that uses the output signal obtained from the voice microphone to detect a section in which no voice exists as a noise section, and the minimum power detection unit selects the output signal of the noise microphone using the output signals of the noise microphones in the noise section detected by the noise section detection unit.
[0018] According to a twelfth aspect of the present invention, the third or fourth aspect further comprises a noise section detection unit that uses the output signal obtained from the voice microphone to detect a section in which no voice exists as a noise section, and the similarity calculation unit calculates and outputs the similarity using the time-series feature vectors of noise in the noise section detected by the noise section detection unit.
[0019] According to a thirteenth aspect of the present invention, the fifth aspect further comprises a noise section detection unit that uses the output signal obtained from the voice microphone to detect a section in which no voice exists as a noise section, and the sub-band similarity calculation unit calculates and outputs the similarity using the time-series feature vectors of the noise sub-band in the noise section detected by the noise section detection unit.
[0020] A fourteenth invention is characterized in that, in the tenth, eleventh, twelfth or thirteenth invention, the noise section detection unit detects a section in which no voice exists as a noise section using the feature vector output by the two-input subtraction unit.
[0021] A fifteenth invention is characterized in that, in the third, fourth or fifth invention, there are provided a noise section detection unit that detects a section in which no voice exists as a noise section, using either the output signal obtained from the voice microphone or the feature vector output by the two-input subtraction unit, and, instead of the maximum value detection unit, a maximum/minimum value detection unit that selects the highest of the input similarities within a noise section detected by the noise section detection unit and selects the lowest of the input similarities when the noise section detection unit does not detect a noise section.
[0022] The operation of the first invention will be described with reference to FIG. 1. The noisy voice is converted into an electrical signal by the voice microphone 1. At the same time, ambient noise is converted into electrical signals by two or more first to Nth noise microphones 2 placed around the voice microphone 1. The first to Nth noise microphones 2 can be installed in many ways. For example, they may be arranged around the voice microphone at an appropriate distance, arranged radially to cope with noise arriving from all directions, or directed toward a specific noise source. The voice feature extraction unit 3 is a converter that converts the electrical signal obtained from the voice microphone 1 into time-series feature quantities that express acoustic features in time series. It consists of, for example, a DFT (discrete Fourier transform), FFT (fast Fourier transform) or BPF (band-pass filter bank) as described in Furui, "Digital Speech Processing", Tokai University Press, pp. 37-49 (1985) (hereinafter referred to as cited document [2]), and outputs time-series data of feature vectors such as a power spectrum, an amplitude spectrum or BPF outputs.
Further, the first to Nth noise feature extraction units 4 are converters that convert the electrical signals obtained from the two or more first to Nth noise microphones 2 into time-series feature quantities that express acoustic features in time series, and output the first to Nth time-series feature vectors of noise. The first to Nth noise feature extraction units 4 have the same function as the voice feature extraction unit 3. The noise detection unit 5 selects, from the first to Nth time-series feature vectors of noise obtained from the first to Nth noise feature extraction units 4, the nth time-series feature vector of noise that is closest to the ambient noise. Which noise is closest to the ambient noise can be determined, for example, as follows. Let Yi (t) (1 ≦ i ≦ N, t: time) be the first to Nth time-series feature vectors of noise, and let R be a feature vector of ambient noise stored in advance. At time t, n is obtained as n = argmin (i) [‖Yi (t) − R‖], that is, as the index of the time-series feature vector Yi (t) whose distance to the ambient-noise feature vector R is smallest. Here argmin (i) [] is a function that returns the i giving the minimum value of the operation result in []. The determination can also be made using information on the frequency distribution, such as whether the low-frequency power is larger than the high-frequency power. The nth time-series feature vector of noise selected by the noise detection unit 5 is selected and output by the selection unit 6. The two-input subtraction unit 7 performs two-input spectral subtraction by subtracting the nth time-series feature vector of noise output from the selection unit 6 from the time-series feature vector of the noisy voice output from the voice feature extraction unit 3, thereby removing the noise contained in the voice.
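As a rough illustration of the selection made by the noise detection unit 5 and the subtraction made by the two-input subtraction unit 7, the following Python sketch picks the noise vector nearest to a stored reference R and subtracts it from the voice vector. The function names, the sample values, the use of the Euclidean distance, and the clamping of negative results to a floor are assumptions made for illustration and are not taken from the specification. Indices are 0-based here, whereas the text counts from 1.

```python
import math


def select_nearest_noise(noise_vectors, reference):
    """Return the index n that minimises ||Y_i(t) - R|| (Euclidean distance assumed)."""
    def distance(vec):
        return math.sqrt(sum((y - r) ** 2 for y, r in zip(vec, reference)))
    return min(range(len(noise_vectors)), key=lambda i: distance(noise_vectors[i]))


def spectral_subtract(voice, noise, floor=0.0):
    """Subtract the noise vector from the voice vector, band by band.

    Clamping to `floor` is an assumption; the text only states a subtraction.
    """
    return [max(x - y, floor) for x, y in zip(voice, noise)]


# Hypothetical power spectra for one frame: 4 bands, 3 noise microphones.
R = [4.0, 3.0, 1.0, 0.5]          # stored ambient-noise feature vector
Y = [
    [9.0, 1.0, 6.0, 5.0],         # microphone 0: far from R
    [4.5, 2.5, 1.0, 1.0],         # microphone 1: close to R
    [0.5, 0.5, 0.5, 0.5],         # microphone 2: far from R
]
X = [6.0, 5.0, 3.0, 1.5]          # noisy voice feature vector

n = select_nearest_noise(Y, R)
clean = spectral_subtract(X, Y[n])
print(n, clean)
```

In this toy frame the second microphone is selected, and its vector is the one subtracted from the voice vector.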
The two-input subtraction unit 7 has the same function as the two-input subtraction unit 205 shown in FIG. 16, as described in, for example, cited document [1]. That is, according to the first invention, the nth time-series feature vector of noise closest to the ambient noise is selected from the first to Nth time-series feature vectors of noise obtained from the two or more first to Nth noise microphones 2. This has the effect that the output of the noise microphone giving the highest noise removal effect is always selected, even if the noise source moves or the noise transfer characteristic changes temporally and spatially. Also, because the nth time-series feature vector of noise closest to the ambient noise is selected, the output signal from a noise microphone into which a large amount of voice leaks is not selected. This has the effect of preventing a decrease in the speech recognition rate or in the intelligibility of communication.
[0023] The operation of the second invention will be described with reference to FIG. 2. The noisy voice is converted into an electrical signal by the voice microphone 11, and at the same time, ambient noise is converted into electrical signals by two or more first to Nth noise microphones 12 placed around the voice microphone 11. The voice feature extraction unit 13 is a converter that converts the electrical signal obtained from the voice microphone 11 into time-series feature quantities that express acoustic features in time series, and has the same function as the voice feature extraction unit 3 in FIG. 1. The minimum power detection unit 14 selects the output signal of the nth noise microphone, whose power is the smallest among the output signals of the two or more first to Nth noise microphones 12.
That is, assuming that the first to Nth powers obtained from the two or more first to Nth noise microphones 12 are Pi (t) (1 ≦ i ≦ N), the minimum power detection unit 14 performs the operation n = argmin (i) [Pi (t)] to obtain the index n of the smallest power. The power of the output signal used here may be the power of the signal limited to a partial band. The output signal of the nth noise microphone selected by the minimum power detection unit 14 is selected and output by the selection unit 15, and is converted into a time-series feature vector of noise by the noise feature extraction unit 16. The noise feature extraction unit 16 has the same function as the voice feature extraction unit 3 in FIG. 1. The two-input subtraction unit 17 has the same function as the two-input subtraction unit 7 in FIG. 1, and performs two-input spectral subtraction by subtracting the time-series feature vector of noise output by the noise feature extraction unit 16 from the time-series feature vector of voice output by the voice feature extraction unit 13. That is, in the second invention, when there is no noise source in the vicinity of the voice microphone 11 and a noise source is moving in the vicinity of the plurality of noise microphones, input dominated by that specific noise source is excluded. The correlation between the noise input to the voice microphone 11 and the noise obtained from the selection unit 15 is therefore high, and higher noise removal performance is obtained than when two-input spectral subtraction is performed using one conventional noise microphone.
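The minimum-power selection n = argmin (i) [Pi (t)] can be sketched in Python as follows. The definition of power as the mean squared amplitude of a frame, the function names, and the sample frames are assumptions for illustration; the specification does not say how Pi (t) is computed. Indices are 0-based.

```python
def frame_power(samples):
    """Mean squared amplitude of one frame (assumed definition of P_i(t))."""
    return sum(s * s for s in samples) / len(samples)


def select_min_power(noise_frames):
    """Return (n, powers), where n = argmin_i P_i(t) over the microphone frames."""
    powers = [frame_power(frame) for frame in noise_frames]
    n = min(range(len(powers)), key=powers.__getitem__)
    return n, powers


# Hypothetical frames from three noise microphones at the same time t.
frames = [
    [0.5, -0.5, 0.5, -0.5],       # P = 0.25
    [1.0, -1.0, 1.0, -1.0],       # P = 1.0, e.g. a nearby noise source or leaked voice
    [0.25, 0.25, -0.25, -0.25],   # P = 0.0625
]

n, powers = select_min_power(frames)
print(n, powers)
```

The signal of the selected microphone would then be passed to the noise feature extraction unit 16 before subtraction.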
Also, because the output signal from the noise microphone having the minimum power is used, the output signal from a noise microphone into which a large amount of voice leaks is not selected. This has the effect of preventing a decrease in the speech recognition rate or in communication intelligibility caused by voice leaking into the noise microphone.
[0024] The operation of the third invention will be described with reference to FIG. 3. The noisy voice is converted into an electrical signal by the voice microphone 21, and at the same time, ambient noise is converted into electrical signals by two or more first to Nth noise microphones 22 placed around the voice microphone 21. The voice feature extraction unit 23 is a converter that converts the electrical signal obtained from the voice microphone 21 into time-series feature quantities that express acoustic features in time series. The first to Nth noise feature extraction units 24 are converters that convert the electrical signals of the two or more first to Nth noise microphones 22 into time-series feature quantities that express acoustic features in time series, and output the first to Nth time-series feature vectors of noise. The voice feature extraction unit 23 and the first to Nth noise feature extraction units 24 have the same function as the voice feature extraction unit 3 in FIG. 1. The similarity calculation unit 25 calculates and outputs the first to Nth similarities between the first to Nth time-series feature vectors of noise output from the first to Nth noise feature extraction units 24 and the time-series feature vector of voice output from the voice feature extraction unit 23.
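One way to picture the similarity calculation followed by maximum-value selection is the Python sketch below. It assumes a normalized inner product (cosine) as the similarity measure and, for the weighted variant of the fourth invention, assumes that the predetermined weights are applied by multiplication. Both choices, the function names, and the sample vectors are illustrative assumptions rather than the specification's own method. Indices are 0-based.

```python
import math


def cosine_similarity(x, y):
    """Normalized inner product of two feature vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm > 0.0 else 0.0


def select_most_similar(voice, noise_vectors, weights=None):
    """Return (n, sims): index of the largest (optionally weighted) similarity."""
    sims = [cosine_similarity(voice, y) for y in noise_vectors]
    if weights is not None:
        # Multiplicative weighting is an assumption; the text only says
        # that a predetermined weight is applied to each similarity.
        sims = [w * s for w, s in zip(weights, sims)]
    n = max(range(len(sims)), key=sims.__getitem__)
    return n, sims


# Hypothetical two-band feature vectors for one frame.
X = [1.0, 0.0]                             # voice microphone
Y = [[0.0, 2.0], [3.0, 0.0], [1.0, 1.0]]   # three noise microphones

n, sims = select_most_similar(X, Y)
n_weighted, _ = select_most_similar(X, Y, weights=[1.0, 0.5, 1.0])
print(n, n_weighted)
```

With equal weights the second microphone is chosen; halving its weight shifts the choice to the third microphone, which is the kind of emphasis the weighting is meant to allow.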
The first to Nth similarities can be obtained, for example, as follows. Let X (t) be the time-series feature vector of voice obtained from the voice microphone 21, let Yi (t) be the first to Nth time-series feature vectors of noise obtained from the first to Nth noise feature extraction units 24, and let βi (t) be the similarity to be obtained. Then βi (t) can be determined by the following expression.
[0025] [Expression for βi (t) not reproduced in this text.] There are many other ways of determining the similarity; for example, it can also be determined by a method using the inner product of vectors as described in cited document [2]. The maximum value detection unit 26 selects the largest, nth, similarity among the first to Nth similarities output by the similarity calculation unit 25. The selection unit 27 selects and outputs, from the first to Nth time-series feature vectors of noise, the nth time-series feature vector of noise corresponding to the nth similarity selected by the maximum value detection unit 26. The two-input subtraction unit 28 has the same function as the two-input subtraction unit 7 in FIG. 1, and performs two-input spectral subtraction by subtracting the nth time-series feature vector of noise output by the selection unit 27 from the time-series feature vector of voice output by the voice feature extraction unit 23. That is, in the third invention, the time-series feature vector from the nth noise microphone, which receives the noise having the highest correlation with the noise input to the voice microphone 21, is used. The noise removal effect is therefore always the best, and higher noise removal performance is obtained than when two-input spectral subtraction is performed using one conventional noise microphone.
[0026] The operation of the fourth invention will be described with reference to FIG. 4. In FIG. 4, in addition to the configuration of the noise removal apparatus shown in FIG. 3, a weighting unit 29 is provided that applies predetermined weights to the first to Nth similarities output by the similarity calculation unit 25 to obtain weighted first to Nth similarities. The maximum value detection unit 26 is configured to select the largest, nth, similarity among the weighted first to Nth similarities output by the weighting unit 29. That is, according to the fourth invention, weighting the first to Nth similarities makes it possible to place particular emphasis on the input from a specific noise microphone when selecting. For example, a larger weight can be given to the input from a noise microphone 22 installed near the voice microphone 21 and a smaller weight to a noise microphone 22 located far from the voice microphone 21. Emphasis is then placed on the input from the nearby noise microphone 22, which is likely to receive noise highly correlated with the ambient noise input to the voice microphone 21, and higher noise removal performance is obtained than with conventional two-input spectral subtraction. Alternatively, a smaller weight can be given to the input from a noise microphone 22 installed near the voice microphone 21, so that emphasis is placed on the input from a distant noise microphone 22 into which voice is less likely to leak. This has the effect of preventing the deterioration of the recognition rate and the lowering of communication intelligibility caused by voice leaking into the noise microphone.
[0027] The operation of the fifth invention will be described with reference to FIG. 5. The noisy voice is converted into an electrical signal by the voice microphone 41, and at the same time, ambient noise is converted into electrical signals by the two or more first to Nth noise microphones 42.
The voice feature extraction unit 43 is a converter that converts the electrical signal obtained from the voice microphone 41 into time-series feature quantities that express acoustic features in time series, and has the same function as the voice feature extraction unit 3 in FIG. 1. The voice partial feature extraction unit 44 is a converter that converts the electrical signal obtained from the voice microphone 41 into time-series feature quantities that express the acoustic features of a partial band in time series. For example, it outputs a partial frequency band selected from the analysis result of a BPF, DFT or the like as the feature vector of the voice sub-band. The sub-band features may also be other analysis results, such as the cepstrum analysis described in cited document [2], or feature values compressed by the KL transform or the like. The first to Nth partial feature extraction units 45 are converters that convert the electrical signals of the two or more first to Nth noise microphones 42 into time-series feature quantities that express the acoustic features of a partial band in time series, and output the first to Nth time-series feature vectors of the noise sub-band. The first to Nth partial feature extraction units 45 have the same function as the voice partial feature extraction unit 44. The sub-band similarity calculation unit 46 calculates and outputs the first to Nth similarities between the first to Nth time-series feature vectors of the noise sub-band output from the first to Nth partial feature extraction units 45 and the time-series feature vector of the voice sub-band output from the voice partial feature extraction unit 44. The maximum value detection unit 47 selects the largest, nth, similarity among the first to Nth similarities output from the sub-band similarity calculation unit 46.
The selection unit 48 selects and outputs, from among the output signals of the two or more first to Nth noise microphones 42, the output signal of the nth noise microphone corresponding to the nth similarity selected by the maximum value detection unit 47. The output signal of the nth noise microphone obtained from the selection unit 48 is converted into a time-series feature vector of noise by the noise feature extraction unit 49. The noise feature extraction unit 49 has the same function as the voice feature extraction unit 3 in FIG. 1. The two-input subtraction unit 50 has the same function as the two-input subtraction unit 7 in FIG. 1, and performs two-input spectral subtraction by subtracting the time-series feature vector of noise output by the noise feature extraction unit 49 from the time-series feature vector of voice output by the voice feature extraction unit 43. That is, in the fifth invention, two-input spectral subtraction is always performed using the output signal of the noise microphone whose noise sub-band feature vector has the highest correlation with the sub-band feature vector of the signal input to the voice microphone 41. The noise removal effect is therefore the best, and higher noise removal performance is obtained than when two-input spectral subtraction is performed using one conventional noise microphone. In particular, when it is known in advance that noise is present only in a limited band, noise can be removed more accurately by setting the sub-band to that band in advance.
[0028] The operation of the sixth invention will be described with reference to FIG. 6. FIG. 6 has a minimum value detection unit 30 in place of the maximum value detection unit 26 in the noise removal apparatus shown in FIG. 3, and the minimum value detection unit 30 selects the smallest, nth, similarity among the input first to Nth similarities.
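The sub-band comparison of the fifth invention and the switch from maximum to minimum selection in the sixth invention can be sketched together in Python. The cosine similarity, the representation of a sub-band as a slice of the feature vector, the `mode` parameter, and all names and values are assumptions for illustration only. Indices are 0-based.

```python
import math


def similarity(x, y):
    """Normalized inner product (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm > 0.0 else 0.0


def pick_microphone(voice_vec, noise_vecs, band=(0, None), mode="max"):
    """Compare only the bands in `band` and return the chosen microphone index.

    mode="max" mimics a maximum value detection unit,
    mode="min" mimics a minimum value detection unit.
    """
    lo, hi = band
    voice_sub = voice_vec[lo:hi]
    sims = [similarity(voice_sub, y[lo:hi]) for y in noise_vecs]
    chooser = max if mode == "max" else min
    return chooser(range(len(sims)), key=sims.__getitem__)


# Hypothetical four-band feature vectors for one frame.
X = [1.0, 1.0, 5.0, 0.0]
Y = [
    [1.0, 1.0, 0.0, 9.0],   # matches X in the two lowest bands only
    [2.0, 0.0, 5.0, 0.0],   # matches X better over the full band
]

low_band_max = pick_microphone(X, Y, band=(0, 2), mode="max")
low_band_min = pick_microphone(X, Y, band=(0, 2), mode="min")
full_band_max = pick_microphone(X, Y, mode="max")
print(low_band_max, low_band_min, full_band_max)
```

Restricting the comparison to the band where noise is expected can change which microphone is selected, and choosing the minimum instead of the maximum favours the microphone least similar to the voice microphone signal.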
That is, according to the sixth invention, two-input spectral subtraction is performed using the output signal of the noise microphone 22 having the lowest similarity to the input signal of the voice microphone 21. Because this selects the output signal of the noise microphone 22 with the least voice wraparound, it prevents the drop in speech recognition rate or communication intelligibility that results from subtracting the voice itself. Although FIG. 6 shows an example applied to the third invention, the same configuration can be applied to the fourth or fifth invention.

[0029] The operation of the seventh invention will be described with reference to FIG. 7. The voice containing noise is converted into an electrical signal by the voice microphone 61, and at the same time, ambient noise is converted into electrical signals by the two or more first to Nth noise microphones 62. The voice feature extraction unit 63 is a converter that converts the electrical signal obtained from the voice microphone 61 into a time series of feature quantities representing acoustic features.

[0030] The first to Nth noise feature extraction units 64 are converters that convert the electrical signals of the two or more first to Nth noise microphones 62 into time series of feature quantities representing acoustic features, and output the first to Nth time-series feature vectors of noise. The voice feature extraction unit 63 and the first to Nth noise feature extraction units 64 have the same function as the voice feature extraction unit 3 in FIG. 1. The first to Nth time-series feature vectors of noise obtained from the first to Nth noise feature extraction units 64 are averaged by the average value combining unit 65 and output as a combined vector of noise.
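The averaging performed by the average value combining unit 65 can be sketched as follows. This is a minimal illustration assuming a plain arithmetic mean over the noise channels at each frame; the function name `average_combine` and the nested-list data layout are hypothetical choices, not taken from the patent.

```python
def average_combine(noise_channels):
    """Combine N noise channels into one time series of feature vectors.

    noise_channels[i][t] is the feature vector of the i-th noise microphone
    at frame t. The result M satisfies M[t] = mean over i of
    noise_channels[i][t], taken element by element.
    """
    n = len(noise_channels)
    # Use only the frames that every channel provides.
    num_frames = min(len(ch) for ch in noise_channels)
    combined = []
    for t in range(num_frames):
        frames = [ch[t] for ch in noise_channels]
        combined.append([sum(values) / n for values in zip(*frames)])
    return combined
```

A noise component that appears in only one channel contributes with a factor of 1/N to the combined vector, which is the error-reducing property the description attributes to averaging.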
That is, letting Y_i(t) be the time-series feature vector obtained from the i-th of the two or more first to Nth noise microphones 62 and M(t) be the combined vector of noise, the average value combining unit 65 performs the following operation at time t

[0031]

$$M(t) = \frac{1}{N}\sum_{i=1}^{N} Y_i(t)$$

to calculate and output the combined vector M(t) of the time-series feature vectors obtained from the two or more first to Nth noise microphones 62. Besides this calculation, the average may also be obtained as a geometric mean, or as the centroid (pattern center) described in cited document [2]. The 2-input subtraction unit 66 has the same function as the 2-input subtraction unit 7 in FIG. 1, and performs two-input spectral subtraction by subtracting the combined vector of noise output by the average value combining unit 65 from the time-series feature vector of voice output by the voice feature extraction unit 63. That is, the seventh invention performs two-input spectral subtraction using the average vector of the time-series feature vectors obtained from the two or more first to Nth noise microphones 62. Noise that is input to more of the first to Nth noise microphones 62 is reflected more strongly in the combined vector; conversely, noise input only to a specific noise microphone is not strongly reflected in the combined vector because of the averaging operation. This reduces the removal error caused by noise input only to a specific noise microphone.

[0032] The operation of the eighth invention will be described with reference to FIG. 8. The apparatus of FIG. 8 includes a weighted average value combining unit 67 instead of the average value combining unit 65 shown in FIG. 7, and the weighted average value combining unit 67 operates on the first to Nth time-series feature vectors of noise output from the first to Nth noise feature extraction units 64.
A predetermined weight is applied to each of the first to Nth time-series feature vectors, which are then averaged, and the averaged feature vector is output as the combined vector of noise. That is, because the eighth invention can emphasize the input from a specific noise microphone by weighting, it obtains the same effect as the fourth invention; in addition, by using the average vector of the time-series feature vectors obtained from the two or more first to Nth noise microphones 62, it obtains the same effect as the seventh invention.

[0033] The operation of the ninth invention will be described with reference to FIG. 9. The voice containing noise is converted into an electrical signal by the voice microphone 81, and at the same time, ambient noise is converted into electrical signals by the two or more first to Nth noise microphones 82 installed around the voice microphone 81. The voice feature extraction unit 83 is a converter that converts the electrical signal obtained from the voice microphone 81 into a time series of feature quantities representing acoustic features. The first to Nth noise feature extraction units 84 are converters that convert the electrical signals of the two or more first to Nth noise microphones 82 into time series of feature quantities representing acoustic features, and output the first to Nth time-series feature vectors of noise. The voice feature extraction unit 83 and the first to Nth noise feature extraction units 84 have the same function as the voice feature extraction unit 3 in FIG. 1. The first to Nth time-series feature vectors of noise output from the first to Nth noise feature extraction units 84 are each divided into a plurality of bands in the division unit 85 and output.
The minimum value combining unit 86 extracts the minimum power for each band from the band-divided time-series feature vectors of noise output from the division unit 85, combines the minimum values of the bands, and outputs the result as the combined vector of noise. The 2-input subtraction unit 87 has the same function as the 2-input subtraction unit 7 in FIG. 1, and performs two-input spectral subtraction by subtracting the combined vector of noise output by the minimum value combining unit 86 from the time-series feature vector of voice output by the voice feature extraction unit 83. That is, when speech is uttered in an environment in which the transfer characteristics differ from band to band, the amount of voice wraparound into the noise microphones is considered to differ for each band and for each noise microphone. In such a case, the ninth invention divides the first to Nth time-series feature vectors of noise into a plurality of bands, selects the one with the minimum power in each band, and combines and outputs the per-band minimum values. The noise feature vector is thereby synthesized from the feature values of the specific band of the specific noise microphone that has the least voice wraparound at each moment, which prevents the deterioration of the recognition rate and of communication intelligibility.

[0034] The operation of the tenth invention will be described with reference to FIG. 10. FIG. 10 includes, in addition to the configuration of the noise removal apparatus shown in FIG.
1, a noise section detection unit 8 that detects, as a noise section, a section in which no voice is present using the output signal obtained from the voice microphone 1; the noise detection unit 5 is configured to select the n-th time-series feature vector of noise using the first to Nth time-series feature vectors of noise in the noise section detected by the noise section detection unit 8. That is, in addition to the effect of the first invention, the tenth invention selects one of the first to Nth time-series feature vectors of noise using a noise section in which no voice is mixed, so noise can be estimated more correctly and the noise removal effect is enhanced.

[0035] The operation of the eleventh invention will be described with reference to FIG. 11. The apparatus of FIG. 11 includes, in addition to the configuration of the noise removal apparatus shown in FIG. 2, a noise section detection unit 18 that detects, as a noise section, a section in which no voice is present using the output signal obtained from the voice microphone 11; the minimum power detection unit 14 is configured to select the output signal of the n-th noise microphone using the output signals of the first to Nth noise microphones in the noise section detected by the noise section detection unit 18. That is, in addition to the effect of the second invention, the eleventh invention selects one of the outputs of the first to Nth noise microphones using a noise section in which no voice is mixed, so noise can be estimated more correctly and the noise removal effect is enhanced.

[0036] The operation of the twelfth invention will be described with reference to FIG. 12. FIG. 12 has a noise section detection unit 31 that detects, as a noise section, a section in which no voice is present using the output signal obtained from the voice microphone 21, in addition to the configuration of the noise removal apparatus shown in FIG. 3.
The similarity calculation unit 25 is configured to calculate and output the first to Nth similarities using the first to Nth time-series feature vectors of noise in the noise section detected by the noise section detection unit 31. Although FIG. 12 shows an example applied to FIG. 3, the same configuration can be applied to the embodiment shown in FIG. 4. That is, in addition to the effects of the third or fourth invention, the twelfth invention selects one of the first to Nth time-series feature vectors of noise using a noise section in which no voice is mixed, so noise can be estimated more correctly and the noise removal effect is enhanced.

[0037] The operation of the thirteenth invention will be described with reference to FIG. 13. The apparatus of FIG. 13 includes, in addition to the configuration of the noise removal apparatus shown in FIG. 5, a noise section detection unit 51 that detects, as a noise section, a section in which no voice is present using the output signal obtained from the voice microphone 41; the sub-band similarity calculation unit 46 is configured to calculate and output the first to Nth similarities using the first to Nth time-series feature vectors of the noise sub-band in the noise section detected by the noise section detection unit 51. That is, in addition to the effect of the fifth invention, the thirteenth invention selects one of the outputs of the first to Nth noise microphones using a noise section in which no voice is mixed, so noise can be estimated more correctly and the noise removal effect is enhanced.

[0038] The operation of the fourteenth invention will be described with reference to FIG. 14. FIG. 14 is configured such that, in the noise removal apparatus shown in FIG. 10, the noise section detection unit 9 detects, as a noise section, a section in which no voice is present using the feature vector output from the 2-input subtraction unit 7.
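The idea shared by the tenth to fourteenth inventions, restricting the selection to frames in which no voice is present, can be sketched as follows. The text does not say how a noise section is detected, so the fixed power threshold used here is purely an assumption, and the names `frame_power`, `detect_noise_sections` and `keep_noise_sections` are hypothetical.

```python
def frame_power(frame):
    """Sum of squared feature values of one frame."""
    return sum(x * x for x in frame)


def detect_noise_sections(frames, threshold):
    """Return a mask that is True where a frame is treated as noise only.

    Assumption: a frame whose power is below `threshold` contains no voice.
    `frames` may come from the voice microphone path, or from the output of
    the two-input subtraction as in the fourteenth invention.
    """
    return [frame_power(f) < threshold for f in frames]


def keep_noise_sections(channel_frames, noise_mask):
    """Keep only the frames of one noise channel that lie in noise sections."""
    return [f for f, is_noise in zip(channel_frames, noise_mask) if is_noise]
```

The frames kept this way would then be passed to whichever selection rule the apparatus uses (minimum power, or largest similarity), so that the choice of noise microphone is not influenced by frames that contain voice.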
Although FIG. 14 shows an example applied to FIG. 10, the same configuration can be applied to the noise removal apparatus shown in FIG. 11, FIG. 12 or FIG. 13. That is, in addition to the effects of the tenth, eleventh, or thirteenth invention, the fourteenth invention estimates the noise section using the clear time-series feature vector obtained after noise removal, which improves the accuracy of noise section detection and enables more precise noise removal.

[0039] The operation of the fifteenth invention will be described with reference to FIG. 15. FIG. 15 includes, in addition to the configuration of the noise removal apparatus shown in FIG. 3, a noise section detection unit 31 that detects, as a noise section, a section in which no voice is present using the output signal obtained from the voice microphone 21, and, in place of the maximum value detection unit 26, a maximum/minimum value detection unit 32 that selects the largest of the first to Nth similarities in a noise section detected by the noise section detection unit 31 and selects the smallest of the first to Nth similarities when the noise section detection unit 31 has not detected a noise section. The noise section detection unit 31 can also be configured to detect, as a noise section, a section in which no voice is present using the feature vector output from the 2-input subtraction unit 28. Although FIG. 15 shows an example applied to FIG. 3, the same configuration can be applied to the noise removal apparatus shown in FIG. 4 or FIG. 5. That is, in addition to the effects of the third, fourth or fifth invention, the fifteenth invention selects, in sections other than noise sections (that is, sections where voice is present), the output signal of the noise microphone that is least similar to the output signal of the voice microphone.
As a result, the output signal of the noise microphone with the least voice wraparound can be selected, which prevents the deterioration of the recognition rate and of communication intelligibility caused by voice mixing into the noise microphone.

[0040] Next, embodiments of the present invention will be described with reference to the drawings.

[0041] FIG. 1 is a block diagram showing an embodiment of the first invention. The noise removal apparatus shown in FIG. 1 comprises: a voice microphone 1 that mainly receives voice; two or more first to Nth noise microphones 2 that mainly receive ambient noise and are arranged around the voice microphone 1; a voice feature extraction unit 3 that converts the output signal of the voice microphone 1 into a time-series feature vector of voice; first to Nth noise feature extraction units 4 that convert the output signals of the two or more first to Nth noise microphones 2 into first to Nth time-series feature vectors of noise; a noise detection unit 5 that selects the n-th time-series feature vector (n = 1 to N) from among the first to Nth time-series feature vectors of noise obtained from the first to Nth noise feature extraction units 4; a selection unit 6 that selects and outputs the n-th time-series feature vector of noise selected by the noise detection unit 5; and a 2-input subtraction unit 7 that subtracts the n-th time-series feature vector of noise output by the selection unit 6 from the time-series feature vector of voice output by the voice feature extraction unit 3.

[0042] The voice containing noise is converted into an electrical signal by the voice microphone 1. At the same time, ambient noise is converted into electrical signals by the two or more first to Nth noise microphones 2 placed around the voice microphone 1. There are many ways to install the two or more first to Nth noise microphones 2.
For example, they may be arranged around the voice microphone at an appropriate distance, arranged radially to cope with noise arriving from all directions, or directed toward a specific noise source. The electrical signal obtained from the voice microphone 1 is converted into a time-series feature vector of voice in the voice feature extraction unit 3, and the electrical signals obtained from the two or more first to Nth noise microphones 2 are converted into first to Nth time-series feature vectors of noise in the first to Nth noise feature extraction units 4. The noise detection unit 5 selects, from among the first to Nth time-series feature vectors of noise obtained from the first to Nth noise feature extraction units 4, the n-th time-series feature vector of noise that is closest to the ambient noise. The n-th time-series feature vector of noise selected by the noise detection unit 5 is selected and output by the selection unit 6. The 2-input subtraction unit 7 performs two-input spectral subtraction by subtracting the n-th time-series feature vector of noise output from the selection unit 6 from the time-series feature vector of the noise-containing voice output from the voice feature extraction unit 3, thereby removing the noise contained in the voice. The 2-input subtraction unit 7 has the same function as the 2-input subtraction unit 205 shown in FIG. 16.

[0043] FIG. 2 is a block diagram showing an embodiment of the second invention. The noise removal apparatus shown in FIG.
2 includes: a voice microphone 11 that mainly receives voice; two or more first to Nth noise microphones 12 that mainly receive ambient noise and are arranged around the voice microphone 11; a voice feature extraction unit 13 that converts the output signal of the voice microphone 11 into a time-series feature vector of voice; a minimum power detection unit 14 that selects the output signal of the n-th noise microphone having the smallest power among the output signals of the two or more first to Nth noise microphones 12; a selection unit 15 that selects and outputs the output signal of the n-th noise microphone selected by the minimum power detection unit 14; a noise feature extraction unit 16 that converts the output signal of the n-th noise microphone output by the selection unit 15 into a time-series feature vector of noise; and a 2-input subtraction unit 17 that subtracts the time-series feature vector of noise output by the noise feature extraction unit 16 from the time-series feature vector of voice output by the voice feature extraction unit 13.

[0044] The voice containing noise is converted into an electrical signal by the voice microphone 11. At the same time, ambient noise is converted into electrical signals by the two or more first to Nth noise microphones 12 placed around the voice microphone 11. The electrical signal obtained from the voice microphone 11 is converted into a time-series feature vector of voice in the voice feature extraction unit 13. The voice feature extraction unit 13 has the same function as the voice feature extraction unit 3 in FIG. 1. The minimum power detection unit 14 selects the output signal of the n-th noise microphone with the smallest power among the output signals of the two or more first to Nth noise microphones 12. The output signal of the n-th noise microphone selected by the minimum power detection unit 14 is selected and output by the selection unit 15. The output signal of the n-th noise microphone selected by the selection unit 15 is converted into a time-series feature vector of noise in the noise feature extraction unit 16.
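The minimum power selection of the second invention, followed by two-input spectral subtraction, can be sketched as follows. This is a simplified illustration under stated assumptions: the correction coefficient `coeff` (which the related-art description says is computed in advance from a section without voice) is passed in as a given scalar, and clamping negative results to `floor` is a common practice assumed here rather than something the text specifies. The names `select_min_power` and `two_input_subtract` are hypothetical.

```python
def select_min_power(noise_signals):
    """Return (n, powers): index of the noise microphone with the smallest mean power."""
    powers = [sum(s * s for s in sig) / len(sig) for sig in noise_signals]
    n = min(range(len(powers)), key=powers.__getitem__)
    return n, powers


def two_input_subtract(voice_frames, noise_frames, coeff=1.0, floor=0.0):
    """Subtract the scaled noise feature vectors from the voice feature vectors.

    coeff: correction coefficient between the two inputs (assumed given).
    floor: lower bound applied to each result (assumption, not from the patent).
    """
    return [
        [max(v - coeff * n, floor) for v, n in zip(vf, nf)]
        for vf, nf in zip(voice_frames, noise_frames)
    ]
```

In the apparatus of FIG. 2 the selection operates on the microphone output signals before feature extraction, whereas the subtraction operates on feature vectors; the two functions are kept separate here for that reason.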
The 2-input subtraction unit 17 has the same function as the 2-input subtraction unit 7 in FIG. 1, and performs two-input spectral subtraction by subtracting the time-series feature vector of noise output from the noise feature extraction unit 16 from the time-series feature vector of voice output from the voice feature extraction unit 13.

[0045] FIG. 3 is a block diagram showing an embodiment of the third invention. The noise removal apparatus shown in FIG. 3 includes: a voice microphone 21 that mainly receives voice; two or more first to Nth noise microphones 22 that mainly receive ambient noise and are arranged around the voice microphone 21; a voice feature extraction unit 23 that converts the output signal of the voice microphone 21 into a time-series feature vector of voice; first to Nth noise feature extraction units 24 that convert the output signals of the two or more first to Nth noise microphones 22 into first to Nth time-series feature vectors of noise, respectively; a similarity calculation unit 25 that calculates and outputs the first to Nth similarities between the first to Nth time-series feature vectors of noise output by the first to Nth noise feature extraction units 24 and the time-series feature vector of voice output by the voice feature extraction unit 23; a maximum value detection unit 26 that selects the largest, n-th, similarity among the first to Nth similarities output by the similarity calculation unit 25; a selection unit 27 that selects and outputs, from among the first to Nth time-series feature vectors of noise, the n-th time-series feature vector of noise corresponding to the n-th similarity selected by the maximum value detection unit 26; and a 2-input subtraction unit 28 that subtracts the n-th time-series feature vector of noise output by the selection unit 27 from the time-series feature vector of voice output by the voice feature extraction unit 23.

[0046] The voice containing noise is converted into an electrical signal by the voice microphone 21.
At the same time, ambient noise is converted into electrical signals by the two or more first to Nth noise microphones 22 placed around the voice microphone 21. The electrical signal obtained from the voice microphone 21 is converted into a time-series feature vector of voice in the voice feature extraction unit 23, and the output signals of the two or more first to Nth noise microphones 22 are converted into first to Nth time-series feature vectors of noise in the first to Nth noise feature extraction units 24, respectively.

[0047] The voice feature extraction unit 23 and the first to Nth noise feature extraction units 24 have the same function as the voice feature extraction unit 3 in FIG. 1. The similarity calculation unit 25 calculates and outputs the first to Nth similarities between the first to Nth time-series feature vectors of noise output from the first to Nth noise feature extraction units 24 and the time-series feature vector of voice output from the voice feature extraction unit 23.

[0048] The maximum value detection unit 26 selects the largest, n-th, similarity among the first to Nth similarities output by the similarity calculation unit 25. The selection unit 27 selects and outputs, from among the first to Nth time-series feature vectors of noise, the n-th time-series feature vector of noise corresponding to the n-th similarity selected by the maximum value detection unit 26. The 2-input subtraction unit 28 has the same function as the 2-input subtraction unit 7 in FIG. 1, and performs two-input spectral subtraction by subtracting the n-th time-series feature vector of noise output by the selection unit 27 from the time-series feature vector of voice output by the voice feature extraction unit 23.

[0049] FIG. 4 is a block diagram showing an embodiment of the fourth invention. In addition to the configuration of the embodiment shown in FIG. 3, the noise removal apparatus shown in FIG.
4 has a weight addition unit 29 that adds a predetermined weight to the first to Nth similarities output by the similarity calculation unit 25, and the maximum value detection unit 26 is configured to select the largest, n-th, similarity among the weighted first to Nth similarities output from the weight addition unit 29.

[0050] FIG. 5 is a block diagram showing an embodiment of the fifth invention. The noise removal apparatus shown in FIG. 5 includes: a voice microphone 41 that mainly receives voice; two or more first to Nth noise microphones 42 that mainly receive ambient noise and are arranged around the voice microphone 41; a voice feature extraction unit 43 that converts the output signal of the voice microphone 41 into a time-series feature vector of voice; a voice partial feature extraction unit 44 that converts the output signal of the voice microphone 41 into a time-series feature vector of a voice sub-band; first to Nth partial feature extraction units 45 that convert the output signals of the two or more first to Nth noise microphones 42 into first to Nth time-series feature vectors of a noise sub-band; a sub-band similarity calculation unit 46 that calculates and outputs the first to Nth similarities between the first to Nth time-series feature vectors of the noise sub-band output by the first to Nth partial feature extraction units 45 and the time-series feature vector of the voice sub-band output by the voice partial feature extraction unit 44; a maximum value detection unit 47 that selects the largest, n-th, similarity among the first to Nth similarities output from the sub-band similarity calculation unit 46; a selection unit 48 that selects and outputs, from among the output signals of the two or more first to Nth noise microphones 42, the output signal of the n-th noise microphone 42 corresponding to the n-th similarity selected by the maximum value detection unit 47; a noise feature extraction unit 49 that converts the output signal of the n-th noise microphone 42 output by the selection unit 48 into a
time-series feature vector of noise; and a 2-input subtraction unit 50 that subtracts the time-series feature vector of noise output by the noise feature extraction unit 49 from the time-series feature vector of voice output by the voice feature extraction unit 43.

[0051] The voice containing noise is converted into an electrical signal by the voice microphone 41. At the same time, ambient noise is converted into electrical signals by the two or more first to Nth noise microphones 42. The output signal obtained from the voice microphone 41 is converted into a time-series feature vector of voice in the voice feature extraction unit 43, and at the same time it is converted into a time-series feature vector of the voice sub-band in the voice partial feature extraction unit 44. The voice feature extraction unit 43 has the same function as the voice feature extraction unit 3 in FIG. 1. The output signals of the two or more first to Nth noise microphones 42 are converted into first to Nth time-series feature vectors of the noise sub-band in the first to Nth partial feature extraction units 45, respectively. The sub-band similarity calculation unit 46 calculates and outputs the first to Nth similarities between the first to Nth time-series feature vectors of the noise sub-band output from the first to Nth partial feature extraction units 45 and the time-series feature vector of the voice sub-band output from the voice partial feature extraction unit 44. The maximum value detection unit 47 selects the largest, n-th, similarity among the first to Nth similarities output from the sub-band similarity calculation unit 46. The selection unit 48 selects, from among the output signals of the two or more first to Nth noise microphones 42, the output signal of the n-th noise microphone 42 corresponding to the n-th similarity selected by the maximum value detection unit 47.
The selection unit 48 outputs the selected signal. The output signal of the n-th noise microphone 42 obtained from the selection unit 48 is converted into a time-series feature vector of noise in the noise feature extraction unit 49. The noise feature extraction unit 49 has the same function as the voice feature extraction unit 3 in FIG. 1. The 2-input subtraction unit 50 has the same function as the 2-input subtraction unit 7 in FIG. 1, and performs two-input spectral subtraction by subtracting the time-series feature vector of noise output by the noise feature extraction unit 49 from the time-series feature vector of voice output by the voice feature extraction unit 43.

[0052] FIG. 6 is a block diagram showing an embodiment of the sixth invention. The noise removal apparatus shown in FIG. 6 has, instead of the maximum value detection unit 26 of the embodiment shown in FIG. 3, a minimum value detection unit 30 that obtains the smallest, n-th, similarity among the input first to Nth similarities. Although the present embodiment shows an example applied to FIG. 3, the same configuration can be applied to the embodiment shown in FIG. 4 or FIG. 5.

[0053] FIG. 7 is a block diagram showing an embodiment of the seventh invention. The noise removal apparatus shown in FIG.
7 includes: a voice microphone 61 that mainly receives voice; two or more first to Nth noise microphones 62 that mainly receive ambient noise and are arranged around the voice microphone 61; a voice feature extraction unit 63 that converts the output signal of the voice microphone 61 into a time-series feature vector of voice; first to Nth noise feature extraction units 64 that convert the output signals of the two or more first to Nth noise microphones 62 into first to Nth time-series feature vectors of noise, respectively; an average value combining unit 65 that averages the first to Nth time-series feature vectors of noise obtained from the first to Nth noise feature extraction units 64 and outputs the result as a combined vector of noise; and a 2-input subtraction unit 66 that subtracts the combined vector of noise output by the average value combining unit 65 from the time-series feature vector of voice output by the voice feature extraction unit 63.

[0054] The voice containing noise is converted into an electrical signal by the voice microphone 61. At the same time, ambient noise is converted into electrical signals by the two or more first to Nth noise microphones 62. The output signal of the voice microphone 61 is converted into a time-series feature vector of voice in the voice feature extraction unit 63, and the output signals of the two or more first to Nth noise microphones 62 are converted into first to Nth time-series feature vectors of noise in the first to Nth noise feature extraction units 64. The voice feature extraction unit 63 and the first to Nth noise feature extraction units 64 have the same function as the voice feature extraction unit 3 in FIG. 1. The first to Nth time-series feature vectors of noise obtained from the first to Nth noise feature extraction units 64 are averaged by the average value combining unit 65 and output as a combined vector of noise.
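A weighted form of this combination, corresponding to the weighted average value combining unit 67 of the eighth invention, can be sketched as follows. The text only says that a predetermined weight is applied before averaging; normalizing by the sum of the weights is an assumption made here so that equal weights reduce to the plain average, and the name `weighted_average_combine` is hypothetical.

```python
def weighted_average_combine(noise_channels, weights):
    """Weighted average of N noise channels, frame by frame.

    noise_channels[i][t] is the feature vector of the i-th noise microphone
    at frame t, and weights[i] is its predetermined weight. Dividing by the
    sum of the weights is an assumption, not stated in the patent.
    """
    if len(noise_channels) != len(weights):
        raise ValueError("one weight is required per noise channel")
    total = float(sum(weights))
    num_frames = min(len(ch) for ch in noise_channels)
    combined = []
    for t in range(num_frames):
        frames = [ch[t] for ch in noise_channels]
        combined.append([
            sum(w * x for w, x in zip(weights, values)) / total
            for values in zip(*frames)
        ])
    return combined
```

A larger weight on one channel pulls the combined vector toward that microphone, which is how a microphone aimed at a known noise source could be emphasized.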
The 2-input subtraction unit 66 has the same function as the 2-input subtraction unit 7 in FIG. 1, and performs two-input spectral subtraction by subtracting the combined vector of noise output from the average value combining unit 65 from the time-series feature vector of voice output from the voice feature extraction unit 63.

[0055] FIG. 8 is a block diagram showing an embodiment of the eighth invention. The noise removal apparatus shown in FIG. 8 has, instead of the average value combining unit 65 of the embodiment shown in FIG. 7, a weighted average value combining unit 67 that applies a predetermined weight to the first to Nth time-series feature vectors of noise output from the first to Nth noise feature extraction units, averages them, and outputs the averaged feature vector as a combined vector of noise.

[0056] FIG. 9 is a block diagram showing an embodiment of the ninth invention. The noise removal apparatus shown in FIG. 9 includes: a voice microphone 81 that mainly receives voice; two or more first to Nth noise microphones 82 that mainly receive ambient noise and are arranged around the voice microphone 81; a voice feature extraction unit 83 that converts the output signal of the voice microphone 81 into a time-series feature vector of voice; first to Nth noise feature extraction units 84 that convert the output signals of the two or more first to Nth noise microphones 82 into first to Nth time-series feature vectors of noise, respectively; and a division unit 85 that divides each of the first to Nth time-series feature vectors of noise output from the first to Nth noise feature extraction units 84 into a plurality of bands.
It further includes a minimum value combining unit 86 that extracts the minimum power for each band from the band-divided time-series feature vectors of noise output by the division unit 85 and combines the per-band minimum values into a combined vector of noise, and a 2-input subtraction unit 87 that subtracts the combined vector of noise output from the minimum value combining unit 86 from the time-series feature vector of voice output from the voice feature extraction unit 83.

[0057] The voice containing noise is converted into an electrical signal by the voice microphone 81. At the same time, ambient noise is converted into electrical signals by the two or more first to Nth noise microphones 82 installed around the voice microphone 81. The output signal of the voice microphone 81 is converted into a time-series feature vector of voice in the voice feature extraction unit 83, and the output signals of the two or more first to Nth noise microphones 82 are converted into first to Nth time-series feature vectors of noise in the first to Nth noise feature extraction units 84, respectively. The voice feature extraction unit 83 and the first to Nth noise feature extraction units 84 have the same function as the voice feature extraction unit 3 in FIG. 1. The first to Nth time-series feature vectors of noise output from the first to Nth noise feature extraction units 84 are each divided into a plurality of bands in the division unit 85 and output. The minimum value combining unit 86 extracts the minimum power for each band from the band-divided time-series feature vectors of noise output from the division unit 85, combines the per-band minimum values, and outputs the result as a combined vector of noise. The 2-input subtraction unit 87 has the same function as that of the 2-input subtraction unit 7 in FIG.
1 and is a synthesized vector of noise output from the minimum value synthesis unit 86 from the time-series feature vector of speech output by the speech feature extraction unit 83 Perform two-input spectral subtraction by subtracting. [0058] FIG. 10 is a block diagram showing an embodiment of the tenth invention. In addition to the configuration of the embodiment shown in FIG. 1, the noise removal apparatus shown in FIG. 10 includes a noise section detection unit 8 for detecting a section in which no voice exists using a feature vector obtained from the voice microphone 1 as a noise section. And the noise detection unit 5 is configured to select the nth time-series feature vector of noise using the first to N-th time-series feature vectors of noise of the noise period detected by the noise period detection unit 8. ing. [0059] 03-05-2019 25 FIG. 11 is a block diagram showing an eleventh embodiment of the present invention. In addition to the configuration of the embodiment shown in FIG. 2, the noise removal apparatus shown in FIG. 11 includes a noise section detection unit 18 for detecting a section in which no voice exists using an output signal obtained from the voice microphone 11 as a noise section. And the minimum power detection unit 14 is configured to select the output signal of the nth noise microphone using the output signals of the first to Nth noise microphones of the noise period detected by the noise period detection unit 18. . [0060] FIG. 12 is a block diagram showing an embodiment of the twelfth invention. In addition to the configuration of the embodiment shown in FIG. 3, the noise removal apparatus shown in FIG. 12 includes a noise section detection unit 31 that detects a section in which no voice exists using an output signal obtained from the voice microphone 21 as a noise section. 
The similarity calculation unit 25 is configured to calculate and output the first to Nth similarities using the first to Nth time-series feature vectors of noise in the noise section detected by the noise section detection unit 31. Although the present embodiment shows an example applied to FIG. 3, the same configuration can be applied to the embodiment shown in FIG.

[0061] FIG. 13 is a block diagram showing an embodiment of the thirteenth invention. In addition to the configuration of the embodiment shown in FIG. 5, the noise removal apparatus shown in FIG. 13 includes a noise section detection unit 51 that detects, as a noise section, a section in which no voice is present, using the output signal obtained from the voice microphone 41. The subband similarity calculation unit 46 is configured to calculate and output the similarities using the first to Nth time-series feature vectors of the noise subbands in the noise section detected by the noise section detection unit 51.

[0062] FIG. 14 is a block diagram showing an embodiment of the fourteenth invention. The noise removal apparatus shown in FIG. 14 is based on the configuration of the embodiment shown in FIG. 10, with the noise section detection unit 9 configured to detect a section without voice as a noise section using the feature vector output by the 2-input subtraction unit 7. In the present embodiment, an example applied to FIG. 10 is shown, but the same configuration can be applied to the embodiment shown in FIG. 11, FIG. 12 or FIG.

[0063] FIG. 15 is a block diagram showing an embodiment of the fifteenth invention. In addition to the configuration of the embodiment shown in FIG. 3, the noise removal apparatus shown in FIG. 15 includes a noise section detection unit 31 that detects a section without voice as a noise section using the output signal obtained from the voice microphone 21. Instead of the maximum value detection unit 26, it has a maximum/minimum value detection unit 32. In a noise section detected by the noise section detection unit 31, this unit selects the largest of the first to Nth similarities; when the noise section detection unit 31 does not detect a noise section, it selects the smallest of the first to Nth similarities. The noise section detection unit 31 can also be configured to detect a section in which no voice is present as a noise section using the feature vector output from the 2-input subtraction unit 28. Although the present embodiment shows an example applied to FIG. 3, the same configuration can be applied to the embodiment shown in FIG. 4 or FIG.

[0064] As described above, the noise removal apparatus of the present invention uses a plurality of noise microphones to estimate the noise component contained in the voice microphone input and removes that component. It can therefore remove noise efficiently even for non-stationary noise whose characteristics change temporally and spatially, and it can perform stable noise removal without removing the necessary voice signal even when voice leaks into a noise microphone.

[0065] Brief description of the drawings

[0066] FIG. 1 is a block diagram showing an embodiment of the first invention.
[0067] FIG. 2 is a block diagram showing an embodiment of the second invention.
[0068] FIG. 3 is a block diagram showing an embodiment of the third invention.
[0069] FIG. 4 is a block diagram showing an embodiment of the fourth invention.
[0070] FIG. 5 is a block diagram showing an embodiment of the fifth invention.
[0071] FIG. 6 is a block diagram showing an embodiment of the sixth invention.
[0072] FIG. 7 is a block diagram showing an embodiment of the seventh invention.
[0073] FIG. 8 is a block diagram showing an embodiment of the eighth invention.
[0074] FIG. 9 is a block diagram showing an embodiment of the ninth invention.
[0075] FIG. 10 is a block diagram showing an embodiment of the tenth invention.
[0076] FIG. 11 is a block diagram showing an embodiment of the eleventh invention.
[0077] FIG. 12 is a block diagram showing an embodiment of the twelfth invention.
[0078] FIG. 13 is a block diagram showing an embodiment of the thirteenth invention.
[0079] FIG. 14 is a block diagram showing an embodiment of the fourteenth invention.
[0080] FIG. 15 is a block diagram showing an embodiment of the fifteenth invention.
[0081] FIG. 16 is a block diagram showing a conventional noise removal apparatus using two-input spectral subtraction.

[0082] Explanation of signs

[0083]
1, 11, 21, 41, 61, 81, 201: voice microphone
2, 12, 22, 42, 62, 82, 202: noise microphone
3, 13, 23, 43, 63, 83, 203: voice feature extraction unit
4, 16, 24, 49, 64, 84: noise feature extraction unit
5: noise detection unit
6, 15, 27, 48: selection unit
7, 17, 28, 50, 66, 87, 205: 2-input subtraction unit
8, 9, 18, 31, 51: noise section detection unit
14: minimum power detection unit
25: similarity calculation unit
26, 47: maximum value detection unit
29: weight addition unit
30: minimum value detection unit
32: maximum/minimum value detection unit
44: audio partial feature extraction unit
45: partial feature extraction unit
46: subband similarity calculation unit
65: average value combining unit
67: weighted average value combining unit
85: dividing unit
86: minimum value combining unit
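As an illustration only, the three ways of combining the noise feature vectors described for the embodiments of FIG. 7 to FIG. 9 (average value combining unit 65, weighted average value combining unit 67, and dividing unit 85 with minimum value combining unit 86), followed by the 2-input subtraction, can be sketched in Python for a single frame. This is a minimal sketch under stated assumptions, not the implementation of the apparatus: each feature vector is assumed to be a list of per-bin power values, the weighted average is assumed to be normalized by the sum of the weights, the "minimum power for each band" is assumed to mean taking, for each band, the band of the microphone whose summed power in that band is smallest, and negative results of the subtraction are assumed to be clamped to zero. All function names and the band_size parameter are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def average_combine(noise: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of the N noise feature vectors (one frame each)."""
    n = len(noise)
    return [sum(column) / n for column in zip(*noise)]


def weighted_average_combine(noise: Sequence[Sequence[float]],
                             weights: Sequence[float]) -> Vector:
    """Weighted element-wise mean; normalizing by sum(weights) is an assumption."""
    total = sum(weights)
    return [sum(w * x for w, x in zip(weights, column)) / total
            for column in zip(*noise)]


def band_minimum_combine(noise: Sequence[Sequence[float]],
                         band_size: int) -> Vector:
    """Split every vector into equal-width bands and, for each band, keep the
    band of the microphone whose summed power in that band is smallest.
    Equal-width bands and the 'summed power' criterion are assumptions."""
    length = len(noise[0])
    combined: Vector = []
    for start in range(0, length, band_size):
        bands = [list(vec[start:start + band_size]) for vec in noise]
        combined.extend(min(bands, key=sum))
    return combined


def two_input_subtract(speech: Sequence[float],
                       noise_estimate: Sequence[float]) -> Vector:
    """Subtract the combined noise vector from the speech feature vector.
    Clamping negative values to zero is an assumption."""
    return [max(s - n, 0.0) for s, n in zip(speech, noise_estimate)]


# Hypothetical single-frame data: two noise microphones, four frequency bins.
noise_vectors = [[4.0, 2.0, 1.0, 2.0],
                 [2.0, 2.0, 3.0, 4.0]]
speech_vector = [5.0, 1.0, 4.0, 6.0]

combined_noise = band_minimum_combine(noise_vectors, band_size=2)
clean_speech = two_input_subtract(speech_vector, combined_noise)
```

In this sketch the band-wise minimum tends to pick, for each band, the noise microphone least affected by voice leaking into it, which is consistent with the effect stated in paragraph [0064]; the correction coefficient between the two inputs described for the conventional apparatus is omitted for brevity.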