JP2001100774

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2001100774
[0001]
BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a
speech processing apparatus that performs processing to improve the intelligibility of speech
that is amplified and reproduced through a loudspeaker indoors.
[0002]
2. Description of the Related Art When a lecture or similar talk is given in a room such as a
lecture hall, a multipurpose hall, a classroom, or a church, the voice of the talker is detected by
a microphone and subjected to electrical processing such as amplification. It is then radiated into
the room as sound from a loudspeaker installed in the hall and finally reaches the ears of the
audience.
[0003]
Under such circumstances, reverberation in the room usually reduces the intelligibility of the
sound emitted from the loudspeaker.
The effect is particularly severe for people with age-related or other hearing loss, for whom the
speech becomes very difficult to hear.
[0004]
08-05-2019
1
As described above, when a talker's speech is detected by a microphone in a room, amplified, and
radiated into the room from a loudspeaker, there has been a problem that the intelligibility of
the speech reaching the ears of the audience is reduced under the influence of reverberation.
[0005]
An object of the present invention is to provide a speech processing device that can improve the
intelligibility of the sound emitted from the loudspeaker and reaching the ears of the audience by
applying specific processing to the speech signal detected by the microphone before it is output
to the loudspeaker.
[0006]
SUMMARY OF THE INVENTION In order to solve the above-mentioned problems, the present
invention provides a speech processing apparatus that processes an input speech signal before it
is output to a loudspeaker. Its basic feature is processing that emphasizes specific frequency
components of the modulation spectrum of the speech signal.
[0007]
It is known that there is a strong correlation between the shape of the modulation spectrum of
speech (the spectrum of the temporal envelope of the speech signal) and the intelligibility of the
speech.
When reverberation is added to speech indoors, the intelligibility is reduced according to the
degree of reverberation, because the reverberation changes the modulation spectrum of the speech.
[0008]
The way in which the modulation spectrum is changed in this manner is generally described by the
MTF (modulation transfer function).
In reverberant rooms, the MTF has a low-pass characteristic.
The peak of the modulation spectrum of speech originally lies at about 4 Hz, but reverberation
shifts the peak to a lower frequency and at the same time reduces the modulation index, resulting
in a loss of speech intelligibility.
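The low-pass behaviour described above can be illustrated with a standard room-acoustics
idealization that is not taken from this document: for an ideally exponential reverberant decay
with reverberation time T60, the MTF is often approximated as
m(F) = 1 / sqrt(1 + (2*pi*F*T60/13.8)^2). The Python sketch below evaluates this expression. The
function name and the two reverberation times are illustrative assumptions.

```python
import math

def mtf_exponential_decay(mod_freq_hz, t60_s):
    """Modulation transfer factor for an idealized exponential reverberant decay.

    Assumption (not from the patent text):
    m(F) = 1 / sqrt(1 + (2*pi*F*T60/13.8)**2).
    """
    x = 2.0 * math.pi * mod_freq_hz * t60_s / 13.8
    return 1.0 / math.sqrt(1.0 + x * x)

# Hypothetical reverberation times, chosen only for illustration.
for t60 in (0.5, 2.0):
    row = [round(mtf_exponential_decay(f, t60), 3) for f in (1, 2, 4, 8, 16)]
    print(f"T60 = {t60} s:", row)
```

Under this approximation a longer reverberation time attenuates the modulation components near
4 Hz more strongly, which is consistent with the loss of modulation index described in the text.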
[0009]
From this consideration, suppose that processing is applied to the input speech signal before
reverberation is added, so that the modulation spectrum of the speech after reverberation is
closer to that of the original speech than it would be without the processing. It is then expected
that the reduction in intelligibility of the speech that is actually amplified and reaches the
ears of the audience can be prevented.
[0010]
According to the inventors' study, it has been confirmed that an effective form of such
pre-processing is to emphasize specific frequency components of the modulation spectrum of the
speech signal, for example components near 4 Hz, where the modulation spectrum peaks, and
specifically low-frequency components in the range of 2 Hz to 8 Hz.
[0011]
A speech processing apparatus according to one aspect of the present invention includes: a filter
bank that divides an input speech signal into a plurality of bands; a plurality of envelope
extractors that extract envelope information from the speech signal of each band divided by the
filter bank; a plurality of filters that emphasize specific frequency components of the envelope
information extracted by the envelope extractors; a plurality of multipliers that multiply the
output signals of the filters by the phase information of the speech signal of the corresponding
band divided by the filter bank; and an adder that adds the output signals of the multipliers.
[0012]
A speech processing apparatus according to another aspect of the present invention includes: a
windowing processing unit that applies a window to an input speech signal to divide it into a
plurality of frames; a fast Fourier transformer that performs a fast Fourier transform on the
speech signal of each frame to obtain amplitude information and phase information for each frame;
a plurality of filters that emphasize specific frequency components of the amplitude information
for each frame obtained by the fast Fourier transformer; an inverse fast Fourier transformer that
performs an inverse fast Fourier transform on the output signals of the filters for each frame,
using the phase information obtained by the fast Fourier transformer; and an overlap-add unit that
adds the output signals of the inverse fast Fourier transformer for each frame while partially
overlapping them.
[0013]
BEST MODE FOR CARRYING OUT THE INVENTION Embodiments of the present invention will be
described below with reference to the drawings.
FIG. 1 shows an example of a voice amplification system to which the present invention is
applied.
In a room 1 such as a lecture hall, a multipurpose hall, a classroom, or a church, a microphone 3
detects the voice of a speaker (talker) 2 who is giving a lecture or similar talk.
The audio signal output from the microphone 3 as an electrical signal is amplified by the
preamplifier 4 and then input to the audio processing device 5 according to the present
invention.
[0014]
The audio processing device 5 applies signal processing for improving the intelligibility of the
speech to the input audio signal, that is, processing for emphasizing a specific frequency
component of the modulation spectrum of the audio signal, as will be described in detail later.
The audio signal processed by the audio processing device 5 is amplified by the power amplifier
6 and then supplied to the loudspeaker 7 installed in the room 1, from which it is emitted as
sound and finally reaches the ears of the audience.
[0015]
First Embodiment Next, the voice processing device 5 will be described in detail. FIG. 2 is a block
diagram showing a first embodiment of the speech processing device 5. In FIG. 2, the audio
signal amplified by the preamplifier 4 of FIG. 1 is input to the input terminal 10. This input audio
signal is sampled by the A / D converter 11 at a sampling frequency of 16 kHz, for example, and
converted into a digital signal of about 16 bits.
[0016]
The digitized speech signal output from the A / D converter 11 is divided into a plurality of (n)
bands by the filter bank 12, which is composed of the band pass filters 12-1, 12-2, ..., 12-n.
Each band pass filter has a Q equivalent to 1/3 octave, as commonly used in speech processing to
approximate, in engineering terms, the critical bands of human hearing. The number of bands n is
not limited to any particular value; for example, n = 16. FIG. 3 shows an example of the time
waveform of the output signal of one band pass filter 12-1.
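As a rough illustration of such a band layout, the Python sketch below lists the band edges of 16
contiguous 1/3-octave bands and the resulting constant Q. The lowest centre frequency of 200 Hz is
a hypothetical choice made here so that all bands fit below the Nyquist frequency of the 16 kHz
sampling rate; the source states only the 1/3-octave Q and n = 16.

```python
def third_octave_bands(n_bands=16, lowest_center_hz=200.0):
    """Return (low_edge, center, high_edge) in Hz for n_bands 1/3-octave bands.

    lowest_center_hz is a hypothetical choice; the source gives only n = 16.
    """
    bands = []
    for k in range(n_bands):
        center = lowest_center_hz * 2.0 ** (k / 3.0)
        low_edge = center * 2.0 ** (-1.0 / 6.0)
        high_edge = center * 2.0 ** (1.0 / 6.0)
        bands.append((low_edge, center, high_edge))
    return bands

bands = third_octave_bands()
# Constant-Q property: center / bandwidth is the same for every band.
q_values = [c / (hi - lo) for lo, c, hi in bands]
nyquist_hz = 16000 / 2  # sampling frequency of 16 kHz from the text
print(f"Q = {q_values[0]:.2f}, top edge = {bands[-1][2]:.0f} Hz, Nyquist = {nyquist_hz:.0f} Hz")
```

Each band's upper edge coincides with the next band's lower edge, so the bands tile the frequency
axis without gaps on a logarithmic scale.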
[0017]
The speech signals of each band divided by the filter bank 12 are input to n processing blocks
13-1, 13-2, ..., 13-n. The processing blocks 13-1, 13-2, ..., 13-n basically have the same
configuration, and thus only one processing block 13-1 will be described.
[0018]
In the processing block 13-1, the audio signal which has been band-limited by the band pass
filter 12-1 is first input to the envelope extractor 14. The envelope extractor 14 extracts
envelope (strictly, temporal envelope) information of the input audio signal, in other words,
amplitude information, and is specifically realized by, for example, a Hilbert transformer. The
envelope extractor 14 extracts the phase information of the input audio signal separately from
the envelope information.
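The split into envelope and phase information can be sketched in Python as follows. This is a
minimal illustration, assuming the Hilbert transformer is realized through the analytic signal
computed with a direct DFT; the function names and the amplitude-modulated test tone are
hypothetical and are not taken from the source.

```python
import cmath
import math

def analytic_signal(x):
    """Analytic signal of a real sequence via a direct DFT (O(N^2), for short inputs)."""
    n = len(x)
    spectrum = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            gain = 1.0   # keep DC and Nyquist bins unchanged
        elif k < (n + 1) // 2:
            gain = 2.0   # double positive frequencies
        else:
            gain = 0.0   # remove negative frequencies
        spectrum[k] *= gain
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def split_envelope_and_carrier(x):
    """Return (temporal envelope, cosine of the instantaneous phase)."""
    z = analytic_signal(x)
    envelope = [abs(v) for v in z]
    carrier = [math.cos(cmath.phase(v)) for v in z]
    return envelope, carrier

# Hypothetical test tone: a carrier at DFT bin 16 modulated at bin 2.
n = 128
expected_env = [1.0 + 0.5 * math.cos(2 * math.pi * 2 * t / n) for t in range(n)]
x = [expected_env[t] * math.cos(2 * math.pi * 16 * t / n) for t in range(n)]
envelope, carrier = split_envelope_and_carrier(x)
rebuilt = [e * c for e, c in zip(envelope, carrier)]
```

Multiplying the envelope by the phase carrier gives back the band signal, which is why an
envelope modified by a later stage can be recombined with the unmodified phase information.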
[0019]
The envelope information extracted by the envelope extractor 14 is input to the down sampler
16 through the low pass filter 15 and downsampled to 1 / M to simplify the subsequent filter
processing. Because of the structure of the modulation spectrum of speech, modulation spectrum
components of 50 Hz or more are not very important. The downsampling ratio M is therefore chosen,
for example, so that the highest frequency after downsampling is 50 Hz, that is, so that the
sampling frequency after downsampling is 100 Hz, which gives M = 160. The low pass filter 15
removes unnecessary high-frequency components generated by the Hilbert transformer serving as the
envelope extractor 14 and also prevents aliasing distortion during downsampling by the down
sampler 16. Its cut-off frequency is set to, for example, 40 Hz.
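The arithmetic behind these example values can be checked with a few lines of Python. The
variable names and the `downsample` helper are illustrative; the numbers are those given in the
text.

```python
fs_in_hz = 16000          # A/D sampling frequency given in the text
fs_env_hz = 100           # envelope sampling frequency after downsampling
lowpass_cutoff_hz = 40    # cut-off of low pass filter 15 given in the text

m = fs_in_hz // fs_env_hz           # downsampling ratio M
nyquist_env_hz = fs_env_hz / 2      # highest representable modulation frequency

def downsample(samples, factor):
    """Keep every factor-th sample (input assumed to be low-pass filtered already)."""
    return samples[::factor]

one_second = [0.0] * fs_in_hz       # one second of placeholder envelope samples
print(m, nyquist_env_hz, len(downsample(one_second, m)))
```

The 40 Hz cut-off lies below the 50 Hz Nyquist frequency of the downsampled envelope, which is
what allows the low pass filter to act as the anti-aliasing filter.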
[0020]
The down-sampled envelope information output from the down sampler 16 is input to the
modulation spectrum filter 17 according to the present invention. In FIG. 2, the modulation
spectrum filter is described as a modulation filter for the sake of simplicity.
[0021]
FIG. 4 shows an example of the time waveform of the output signal of the down sampler 16 that is
input to the modulation spectrum filter 17. The modulation spectrum filter 17 has a frequency
characteristic such as those shown in FIGS. 5 (a), (b), (c) and (d). It improves speech
intelligibility by emphasizing specific frequency components of the spectrum of the input envelope
information (the modulation spectrum), for example components of preferably 1 Hz to 10 Hz, more
preferably 3 Hz to 8 Hz.
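The exact filter shapes of FIG. 5 are not reproduced in this text. As a stand-in, the Python
sketch below uses a second-order peaking filter with the coefficient formulas of the widely used
Audio EQ Cookbook, centred at 4 Hz and running at the 100 Hz envelope sampling rate. The 6 dB
gain, the Q of 1 and all function names are hypothetical choices made only to show how a band
around 4 Hz can be emphasized while the mean level of the envelope is left unchanged.

```python
import cmath
import math

def peaking_biquad(fs_hz, f0_hz, gain_db, q):
    """Normalized (b, a) coefficients of an Audio EQ Cookbook peaking filter."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0_hz / fs_hz
    alpha = math.sin(w0) / (2.0 * q)
    b = [1.0 + alpha * a_lin, -2.0 * math.cos(w0), 1.0 - alpha * a_lin]
    a = [1.0 + alpha / a_lin, -2.0 * math.cos(w0), 1.0 - alpha / a_lin]
    return [v / a[0] for v in b], [v / a[0] for v in a]

def magnitude_at(b, a, freq_hz, fs_hz):
    """Magnitude of the filter's frequency response at freq_hz."""
    z1 = cmath.exp(-2j * math.pi * freq_hz / fs_hz)
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return abs(num / den)

def apply_biquad(b, a, x):
    """Filter the sequence x (direct form I, zero initial state)."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

# Hypothetical settings: +6 dB peak at 4 Hz, Q = 1, envelope sampled at 100 Hz.
b, a = peaking_biquad(fs_hz=100.0, f0_hz=4.0, gain_db=6.0, q=1.0)
for f in (0.0, 1.0, 4.0, 10.0, 40.0):
    print(f"{f:5.1f} Hz: gain {magnitude_at(b, a, f, 100.0):.2f}")

# A 4 Hz modulated envelope: the modulation depth grows, the mean level does not.
envelope = [1.0 + 0.3 * math.sin(2 * math.pi * 4.0 * t / 100.0) for t in range(400)]
emphasized = apply_biquad(b, a, envelope)
```

Because the gain at 0 Hz is 1, only the fluctuation of the envelope around its mean is enlarged.
With a larger gain the emphasized envelope can dip below zero, which is one reason a rectifier is
useful before the envelope is recombined with the phase information.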
[0022]
The characteristic of the modulation spectrum filter 17 may be fixed, or a filter with a variable
characteristic may be used and adjusted to the optimum characteristic for the MTF of the room 1.
Alternatively, a plurality of filters having different characteristics may be prepared as the
modulation spectrum filter 17, and the optimum filter for each channel may be selected from among
them according to the MTF of the room 1. In other words, the characteristic of the modulation
spectrum filter 17 may be the same for every channel or may differ between channels.
[0023]
FIG. 6 shows the time waveform of the output signal of the modulation spectrum filter 17, and
FIG. 7 shows an example of its frequency characteristic. This is an example in which a filter
having the characteristic of FIG. 5A is used as the modulation spectrum filter 17. As is apparent
from comparison with the frequency characteristic of the input signal of the modulation spectrum
filter 17 shown in FIG., the peak around 4 Hz is emphasized.
[0024]
The output signal of the modulation spectrum filter 17 is up-sampled M times by the up-sampler
18 to restore the sampling frequency used before down-sampling by the down-sampler 16. It is then
input to the multiplier 20 through the half-wave rectifier 19 and multiplied by the phase
information separated by the envelope extractor 14. The time waveform of the output signal of the
multiplier 20 is shown in FIG. The band pass filter 21 then removes unnecessary components
generated in the processing from the output signal of the multiplier 20, and the result becomes
the output of the processing block 13-1.
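The chain of up-sampler, half-wave rectifier and multiplier can be sketched in Python as follows.
The source does not say how the up-sampler interpolates, so linear interpolation is assumed here.
The function names, the small factor of 4 (instead of M = 160) and the sample values are
hypothetical, and the final band pass filter 21 is omitted.

```python
def upsample_linear(x, factor):
    """Upsample by an integer factor using linear interpolation (assumed interpolator)."""
    out = []
    for i in range(len(x)):
        nxt = x[i + 1] if i + 1 < len(x) else x[i]
        for j in range(factor):
            out.append(x[i] + (nxt - x[i]) * j / factor)
    return out

def half_wave_rectify(x):
    """Clip negative envelope values to zero."""
    return [max(0.0, v) for v in x]

def remodulate(filtered_envelope, carrier, factor):
    """Up-sample, rectify, then multiply by the phase carrier of the band signal."""
    env = half_wave_rectify(upsample_linear(filtered_envelope, factor))
    return [e * c for e, c in zip(env, carrier)]

# Hypothetical values: one negative envelope sample, as emphasis filtering can produce.
filtered_envelope = [0.8, 1.6, 0.4, -0.2, 0.6]
factor = 4  # small stand-in for M, for readability
carrier = [1.0, -1.0] * (len(filtered_envelope) * factor // 2)
out = remodulate(filtered_envelope, carrier, factor)
```

Rectifying before the multiplication keeps the envelope non-negative, so the sign of each output
sample is determined by the phase information alone.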
[0025]
The output signals of the processing blocks 13-1, 13-2, ..., 13-n are combined into one audio
signal by the adder 22, converted from a digital signal to an analog signal by the D / A converter
23, and output from the output terminal 24. The audio signal output from the output
terminal 24 is input to the power amplifier 6 of FIG. 1 and emitted from the speaker 7 as sound.
[0026]
Next, the effects of the voice processing device 5 will be specifically described. The following
experiment was conducted in a reverberant church. It compared speech processed by the speech
processing device 5 of this embodiment with the unprocessed original speech, using each of the
filters having the characteristics shown in FIGS. 5 (a), (b), (c) and (d) as the modulation
spectrum filter 17. FIGS. 5A, 5B, and 5C are characteristics that mainly emphasize the vicinity of
4 Hz, but the peak value and shape of the frequency response differ between them. FIG. 5D is a
characteristic that mainly emphasizes the vicinity of 6 Hz.
[0028]
Table 1 shows the results of asking four hearing-impaired persons who participated as subjects to
select which of the processed speech and the original speech was easier to hear. The numerical
values in Table 1 represent the proportion of subjects who answered that the processed speech was
easier to hear than the original speech when the filters of FIGS. 5 (a), (b), (c) and (d) were
used, respectively. In particular, when the filter of FIG. 5 (a) was used, all four subjects
answered that the processed speech was easier to hear than the original speech, and even when the
filter of FIG. 5 (d) was used, three of the four answered that the processed speech was easier to
hear. When a person with normal hearing took part in the same experiment for reference, that
person's impression was that the processed speech was almost the same as the original speech.
[0029]
On the other hand, when the filters of FIGS. 5 (b) and 5 (c) were used, the evaluation was split:
half of the subjects answered that the processed speech was easier to hear. The two subjects who
answered that the processed speech was easier to hear both had a relatively high degree of hearing
loss.
[0030]
From the above results, it has been confirmed that the speech processing device according to the
present invention is particularly effective for hearing-impaired listeners in preventing the
reduction in intelligibility caused by reverberation.
[0031]
Second Embodiment FIG. 9 is a block diagram showing a second embodiment of the speech
processing apparatus 5 of the present invention.
As in the first embodiment shown in FIG. 2, the audio signal amplified by the preamplifier 4 of
FIG. 1 is input to the input terminal 10, sampled by the A / D converter 11 at a sampling
frequency of, for example, 16 kHz, and converted into a digital signal of about 16 bits.
[0032]
The windowing processing unit 31 first applies a window such as a Hamming window to the digitized
input audio signal output from the A / D converter 11.
That is, the windowing processing unit 31 divides the input voice signal, which is a time
waveform, into a plurality of frames such that adjacent frames overlap by a half or a quarter of
the frame period. The time length of a frame is, for example, 16 msec.
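The framing and its inverse can be sketched in Python as follows, assuming the half-frame overlap
option and a periodic Hamming window. The 440 Hz test tone and the function names are
hypothetical. At 16 kHz a 16 msec frame is 256 samples, so the hop between frames is 128 samples.

```python
import math

fs_hz = 16000
frame_len = fs_hz * 16 // 1000      # 16 ms frame
hop = frame_len // 2                # half-frame overlap (one of the two options in the text)

# Periodic Hamming window; with a hop of half a frame, shifted copies sum to a constant.
window = [0.54 - 0.46 * math.cos(2 * math.pi * i / frame_len) for i in range(frame_len)]

def split_into_frames(x, frame_len, hop, window):
    """Cut x into overlapping frames and apply the window to each frame."""
    return [[x[s + i] * window[i] for i in range(frame_len)]
            for s in range(0, len(x) - frame_len + 1, hop)]

def overlap_add(frames, hop):
    """Add the frames back together at their original positions."""
    out = [0.0] * (hop * (len(frames) - 1) + len(frames[0]))
    for f, frame in enumerate(frames):
        for i, v in enumerate(frame):
            out[f * hop + i] += v
    return out

# Hypothetical test tone, four frames long.
x = [math.sin(2 * math.pi * 440 * t / fs_hz) for t in range(frame_len * 4)]
frames = split_into_frames(x, frame_len, hop, window)
y = overlap_add(frames, hop)
window_sum = window[0] + window[hop]   # constant overlap gain of this window and hop
```

Away from the first and last half frame, dividing `y` by `window_sum` recovers the input, so any
change in the output comes only from what is done to the frames between these two steps.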
[0033]
The signal of each frame from the windowing processing unit 31 is subjected to a fast Fourier
transform by the fast Fourier transformer (FFT) 32. That is, the input voice signal, which is a
time-domain signal, is converted into a frequency-domain signal, and amplitude information and
phase information are output for each frame. The phase information for each frame is held in
sequence for use in the inverse fast Fourier transformer described later. The amplitude
information for each frame output from the fast Fourier transformer 32 is weighted according to
auditory characteristics by the auditory weighting unit 33 as needed, so that a critical band
characteristic is given, and is then input to the modulation spectrum filters 34-1, 34-2, ...,
34-n. In FIG. 9 also, the modulation spectrum filter is described as a modulation filter for the
sake of simplicity.
[0034]
Like the modulation spectrum filter 17 in the first embodiment, the modulation spectrum filters
34-1, 34-2, ..., 34-n improve the intelligibility of speech by emphasizing a specific frequency
component of the modulation spectrum, for example 2 Hz to 8 Hz.
[0035]
The output signals of the modulation spectrum filters 34-1, 34-2, ..., 34-n are input to an
inverse fast Fourier transformer (IFFT) 36 through half wave rectifiers 35-1, 35-2, ..., 35-n.
Here, the inverse fast Fourier transform, that is, the conversion from a frequency-domain signal
to a time-domain signal, is performed using the held phase information output from the fast
Fourier transformer 32. The overlap and addition unit (OLA) 37 then performs the reverse of the
processing of the windowing processing unit 31.
That is, in the overlap and addition unit 37, the signals subjected to the inverse fast Fourier
transform for each frame are added in sequence while overlapping one another by a half or a
quarter of the frame period, to synthesize one audio signal.
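For a single frame, the split into amplitude and phase, the half-wave rectification and the
recombination with the held phase can be sketched in Python as follows. A direct DFT stands in for
the FFT to keep the example dependency-free, and `modify_magnitudes` is a hypothetical placeholder
for the modulation spectrum filters, which in the apparatus described above operate on the
amplitude of each frequency component from frame to frame. Windowing and overlap-add are omitted.

```python
import cmath
import math

def dft(x):
    """Direct discrete Fourier transform (O(N^2), for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Direct inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def process_frame(frame, modify_magnitudes):
    """Modify the amplitudes of one frame and resynthesize with the original phase."""
    spectrum = dft(frame)
    magnitudes = [abs(v) for v in spectrum]
    phases = [cmath.phase(v) for v in spectrum]          # held for resynthesis
    new_magnitudes = [max(0.0, m) for m in modify_magnitudes(magnitudes)]  # half-wave rectify
    rebuilt = [cmath.rect(m, p) for m, p in zip(new_magnitudes, phases)]
    return [v.real for v in idft(rebuilt)]

# Hypothetical 32-sample frame with two sinusoidal components.
frame = [math.sin(2 * math.pi * 3 * t / 32) + 0.5 * math.cos(2 * math.pi * 5 * t / 32)
         for t in range(32)]
unchanged = process_frame(frame, lambda mags: mags)
doubled = process_frame(frame, lambda mags: [2.0 * m for m in mags])
```

Passing the amplitudes through unchanged reproduces the frame, which confirms that the held phase
information is sufficient for resynthesis once the amplitudes have been filtered.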
[0036]
The audio signal output from the overlap and addition unit 37 is converted from a digital signal
to an analog signal by the D / A converter 23 and output from the output terminal 24. The audio
signal output from the output terminal 24 is input to the power amplifier 6 of FIG. 1 and emitted
from the speaker 7 as sound.
[0037]
It is apparent that the same effects as in the first embodiment can be obtained by the
configuration of the second embodiment described above. The present invention can be variously
modified and implemented.
[0038]
As described above, according to the present invention, an audio signal detected by a microphone
or the like is subjected to processing that emphasizes a specific frequency component in the
vicinity of the peak of its modulation spectrum. This effectively improves the intelligibility of
the sound emitted from the loudspeaker and is particularly effective for hearing-impaired and
elderly listeners.