JPH0243893

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JPH0243893
[0001]
The present invention relates to a speech recognition apparatus. 2. Related Art: In recent years, speech recognition devices have been actively developed, and both speaker-dependent and speaker-independent systems have been put to practical use. However, the recognition rate of such a device changes significantly when the usage environment changes. For example, when speech is picked up with a reflecting surface such as a conference desk or a blackboard near the microphone, the transmission frequency characteristic takes on a comb-like shape with many dips, and the level characteristic fluctuates greatly depending on the positional relationship among the speaker, the microphone, and the reflector. This is one of the factors that lower the recognition rate when speech recognition is performed by DP matching or a similar method.

To study how the transmission frequency characteristic affects recognition performance, we conducted recognition evaluation experiments on similar-sounding words, using speech recorded at different speaker-to-microphone distances, with and without a reflector. The results are reported in the Proceedings of the Acoustical Society Conference, March 1988, pp. 269-270. In that experiment, the height of the sound source (speaker) above the reflector and the height of the microphones above the reflector were both 30 cm, and speech was recorded simultaneously with three microphones placed at distances L = 10, 50, and 90 cm from the speaker's mouth. The following was found:
(1) Without a reflector, and apart from intra-speaker variation (the variation in speech that occurs each time the user speaks), the recognition rate shows almost no difference across microphone pickup distances.
(2) Intra-speaker variation alone causes the recognition rate to fluctuate by about 20%.
(3) With a reflector, the recognition rate decreases as the speaker-to-microphone distance increases, markedly so at L = 90 cm, and the amount of the decrease differs from speaker to speaker.
(4) Even with a reflector, the recognition rate at L = 50 cm stays within the range of the speaker's intra-individual variation, but it is several percent lower than at L = 10 cm under the same conditions.

Thus the recognition rate differs depending on whether or not a reflector is nearby. This is especially noticeable when a speaker-independent recognition system is used in a car. In a speaker-dependent system the problem can be avoided to some extent by creating the reference patterns in the environment of use, but in a speaker-independent system no such countermeasure can be prepared, because the environment in which the device will be used is not known in advance. In a confined space such as a car interior, sound reflections often change the frequency characteristic at the microphone and lower the recognition rate.
08-05-2019
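The comb-like characteristic described above can be reproduced with a simple two-path model. The model is an assumption of this sketch, not something stated in the source: the microphone receives the direct sound plus a single reflection from a desk or blackboard, scaled by a reflection factor r and delayed by τ = Δd / c, where Δd is the extra path length of the reflection. The magnitude |1 + r·e^(−j2πfτ)| then dips wherever the reflection arrives in antiphase, at f = (2k + 1) / (2τ). The function names and the geometry in the example are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def two_path_gain_db(freq_hz: float, path_diff_m: float, r: float) -> float:
    """Level (dB) at the microphone for direct sound plus one reflection.

    Hypothetical model: H(f) = 1 + r * exp(-j*2*pi*f*tau), with tau = path_diff / c.
    """
    tau = path_diff_m / SPEED_OF_SOUND
    h = 1 + r * cmath.exp(-2j * math.pi * freq_hz * tau)
    return 20 * math.log10(abs(h))


def notch_frequencies(path_diff_m: float, f_max: float) -> list[float]:
    """Frequencies up to f_max where the reflection is in antiphase: f = (2k+1) / (2*tau)."""
    tau = path_diff_m / SPEED_OF_SOUND
    notches = []
    k = 0
    while (f := (2 * k + 1) / (2 * tau)) <= f_max:
        notches.append(f)
        k += 1
    return notches


if __name__ == "__main__":
    # Illustrative geometry only: reflected path 0.343 m longer than the direct path.
    for f in notch_frequencies(0.343, 4000.0):
        print(f"{f:7.1f} Hz  {two_path_gain_db(f, 0.343, 0.8):6.1f} dB")
```

With a 0.343 m path difference (τ of about 1 ms) the dips fall near 500, 1500, 2500 and 3500 Hz. Moving the speaker or the microphone changes Δd and shifts every dip, which is one way to see why the level in each analysis channel depends on the geometry.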
The present invention has been made in view of the above circumstances, and its object is to provide a speech recognition apparatus whose recognition rate does not decrease even in places with different acoustic characteristics, such as the inside of a car. To achieve this object, the present invention provides a speech recognition apparatus having an acoustic-to-electrical transducer that converts speech into an electrical signal, a filter group that analyzes the electrical signal, and a pattern comparison unit that compares the analysis results, characterized by comprising: means for reproducing one or more sounds at the center frequency of each filter; means for calculating and storing the sum or the average of the outputs of the filters; means for obtaining the difference between the output of each filter and the stored value; and means for increasing or decreasing each filter output according to the magnitude of that difference.

The present invention will now be described based on an embodiment. FIG. 1 is a block diagram explaining one embodiment of the present invention, in which 1 is a ROM, 2 is a D/A conversion circuit, 3 is an amplifier, 4 is a loudspeaker, 5 is a microphone, 6 is a microphone amplifier, 7 is a filter group, 8 is a subtraction unit, 9 is a register, 10 is an adder, 11 is a bit shift unit, 12 is a register, 13 is a difference operation unit, 14 is a switch, 15 is a comparison unit, 16 is a speech dictionary, 17 is a maximum similarity calculation unit, and 18 is a recognition result output unit.

In the illustrated embodiment, filter group 7 has eight filters. The sound picked up by the voice-input microphone 5 is amplified by microphone amplifier 6 and analyzed by filter group 7. The analysis results are rectified and quantized by an A/D converter (not shown), passed through subtraction unit 8, and stored in register 9. It is also usual for the output of microphone amplifier 6 to be converted to logarithmic form, although this is not shown in FIG. 1. As shown in FIG. 2, subtraction unit 8 subtracts set values (thresholds) from the outputs of filters 7₁, 7₂, …; these thresholds are supplied to 8₁, 8₂, … and are initialized to 0.

Next, the eight values stored in register 9 are summed by adder 10, and bit shift unit 11 shifts the sum right by three bits, dividing it by 8; the resulting average is stored in register 12. With switch 14 set to side A, each of the values 1 to 8 in register 9 is subtracted from the average in register 12, and the results are substituted as the thresholds 8₁, 8₂, … of FIG. 2, that is, as the values that subtraction unit 8 subtracts from the filter outputs. Accordingly, using the average $\bar{X}$ of the eight values $X_i$ (i = 1 to 8) in register 9, the value $Y_i$ set as threshold i is

$$Y_i = \bar{X} - X_i \qquad (1)$$

where $\bar{X} = \frac{1}{8}\sum_{i=1}^{8} X_i$.
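The calibration arithmetic (adder 10, the three-bit shift in bit shift unit 11, and equation (1)) can be sketched with integers standing in for the quantized, log-scale filter outputs. The function names and sample values are hypothetical. One point needs an explicit assumption: for the corrected outputs to come out flat, as the description of FIG. 3 requires, a channel that was below the average during calibration must be raised and one that was above it lowered. This sketch therefore applies the correction as X_i + Y_i, which equals subtracting X_i − X̄; it does not attempt to reproduce the exact sign convention of subtraction unit 8.

```python
NUM_CHANNELS = 8  # filter group 7 has eight band-pass filters in this embodiment


def compute_thresholds(register9: list[int]) -> list[int]:
    """Switch 14 on side A: derive one threshold per channel from the test-tone levels.

    register9 holds X_1..X_8, the quantized filter outputs measured while the
    test tone plays and every threshold is still 0.
    """
    if len(register9) != NUM_CHANNELS:
        raise ValueError(f"expected {NUM_CHANNELS} channel values, got {len(register9)}")
    total = sum(register9)  # adder 10
    average = total >> 3    # bit shift unit 11: three right shifts divide by 8, rounding down
    # Equation (1): Y_i = Xbar - X_i for each channel
    return [average - x for x in register9]


def apply_correction(filter_outputs: list[int], thresholds: list[int]) -> list[int]:
    """Switch 14 on side B: correct each channel during recognition.

    Assumed sign convention: adding Y_i raises channels that were below the
    calibration average and lowers those that were above it.
    """
    if len(filter_outputs) != len(thresholds):
        raise ValueError("one threshold is needed per filter output")
    return [x + y for x, y in zip(filter_outputs, thresholds)]


if __name__ == "__main__":
    measured = [40, 36, 44, 30, 42, 38, 34, 48]  # hypothetical levels recorded in a car
    y = compute_thresholds(measured)
    print(y)                              # [-1, 3, -5, 9, -3, 1, 5, -9]
    print(apply_correction(measured, y))  # [39, 39, 39, 39, 39, 39, 39, 39]
```

Dividing by a power of two with a shift is what makes an eight-channel filter bank convenient here: the average needs no divider, only an adder and a wiring shift.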
On the other hand, ROM 1 stores a signal composed of sine waves at the center frequencies of filters 7₁, 7₂, …. This signal is converted to analog form by D/A conversion circuit 2, amplified by amplifier 3, and reproduced through the electroacoustic transducer (loudspeaker) 4. The amplitude of each frequency component stored in ROM 1 must be set so that every component is reproduced at the same output level. The adjustment described above is carried out while this sound is being reproduced.

FIG. 3 illustrates this adjustment; the horizontal axis is the channel number of each band-pass filter, representing frequency, and the vertical axis is the level. FIG. 3(a) shows the characteristic in free space, the condition under which the speech dictionary for speaker-independent recognition is created. When the recognition device is brought into a confined space such as a car, the characteristic becomes as shown in FIG. 3(b). The average level calculated from these eight points is shown by the broken line in the figure. Subtracting each value in FIG. 3(b) from this average according to equation (1) gives FIG. 3(c), and these values are used as the thresholds in FIG. 2. After this adjustment, the filter outputs become as shown in FIG. 3(d), restoring the original characteristic of FIG. 3(a) and thereby preventing the drop in recognition rate caused by use in a confined space. During recognition, switch 14 is set to side B, and the frequency characteristic is corrected.

FIG. 1 shows a comparison unit and a maximum similarity calculation unit as the recognition section. Such a section is required whatever pattern matching method is used, and any method may be employed, for example the dynamic programming method known as DP matching.
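The test signal held in ROM 1 can be sketched as a sum of sine waves, one at each filter's center frequency, with a per-channel amplitude chosen so that every component is reproduced at the same level. The source gives neither the sampling rate nor the center frequencies, so `SAMPLE_RATE` and `CENTER_FREQS` below are hypothetical, as is the signed 16-bit sample format assumed for D/A conversion circuit 2.

```python
import math

# Hypothetical parameters: the source specifies neither of these.
SAMPLE_RATE = 8000  # Hz
CENTER_FREQS = [250, 400, 630, 1000, 1250, 1600, 2000, 2500]  # Hz, one per filter 7_1..7_8


def build_tone_table(amplitudes: list[float], n_samples: int) -> list[int]:
    """Return signed 16-bit samples of the summed test tone for ROM 1.

    amplitudes[i] scales the sine at CENTER_FREQS[i]; in practice it would be
    set so that each component is reproduced at the same acoustic level.
    """
    if len(amplitudes) != len(CENTER_FREQS):
        raise ValueError("need one amplitude per filter channel")
    peak = sum(abs(a) for a in amplitudes)  # largest magnitude the sum could ever reach
    if peak == 0:
        raise ValueError("at least one amplitude must be non-zero")
    table = []
    for n in range(n_samples):
        t = n / SAMPLE_RATE
        s = sum(a * math.sin(2 * math.pi * f * t) for a, f in zip(amplitudes, CENTER_FREQS))
        table.append(round(32767 * s / peak))  # scale so the table can never clip
    return table


if __name__ == "__main__":
    # Every center frequency here is a multiple of 10 Hz, so 800 samples (0.1 s) hold
    # a whole number of cycles of each component and the table can be looped seamlessly.
    rom = build_tone_table([1.0] * len(CENTER_FREQS), 800)
    print(len(rom), min(rom), max(rom))
```

Normalizing by the sum of the amplitudes is conservative (the actual peak of the summed sines is usually lower), but it guarantees that no sample exceeds the D/A range whatever amplitudes are chosen.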
Effect: As is apparent from the above description, the present invention can correct the influence of sound reflections even in a confined space such as a room and flatten the frequency characteristic of the speech input from the microphone. As a result, the recognition rate can be improved.
[0002]
Brief description of the drawings
[0003]
FIG. 1 is a block diagram explaining an embodiment of the present invention, FIG. 2 is a detailed view of the subtraction unit shown in FIG. 1, and FIG. 3 is a set of graphs explaining the operation of the present invention.
DESCRIPTION OF SYMBOLS 1 ... ROM, 2 ... D/A conversion circuit, 3 ... amplifier, 4 ... loudspeaker, 5 ... microphone, 6 ... microphone amplifier, 7 ... filter group, 8 ... subtraction unit, 9 ... register, 10 ... adder, 11 ... bit shift unit, 12 ... register, 13 ... difference operation unit, 14 ... switch, 15 ... comparison unit, 16 ... speech dictionary, 17 ... maximum similarity calculation unit, 18 ... recognition result output unit.