JPH03131198

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JPH03131198
[0001]
BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a
speech recognition apparatus for recognizing input speech, and more particularly to a speech
recognition apparatus capable of detecting that the relative position between a speech source
and a microphone is inappropriate. <Prior Art> When a word processor, personal computer, or
similar device incorporating a speech recognition device is operated by voice, whether the
relative position between the microphone and the voice source (the speaker's mouth) is
appropriate greatly affects the recognition rate. Misrecognition due to a shift in the relative
position between the voice source and the microphone is particularly likely in recognition by a
specific speaker when there is an interval of several days between the date of voice
registration and the date of voice recognition, because the position of the microphone with
respect to the voice source at the time of voice registration differs from its position at the
time of voice recognition. Most conventional speech recognition apparatuses do not take the
relative position between the microphone and the voice source into account. There are only a
few, such as that of Japanese Patent Application Laid-Open No. 63-131194, that detect a shift in
the relative position between the microphone and the voice source from the spectral fluctuation
of the input speech and instruct the speaker to restore the position. <Problems to be Solved by
the Invention> However, since the above conventional method of detecting a relative position
shift between the microphone and the voice source uses the spectral fluctuation of the input
speech, a spectral fluctuation due to a change in vocalization may be mistaken for a change in
microphone position, or the reverse may occur, so the recognition rate does not improve much.
It may therefore be considered to use the spectral variation of vowels only, which show little
spectral variation due to changes in the utterance. However, in such a case, detection of the
relative position shift between the microphone and the speech
03-05-2019
1
source must be executed based on the result of the speech recognition processing itself, which
is a problem. Therefore, an object of the present invention is to provide a speech recognition
apparatus capable of stably detecting that the relative position between the microphone and the
speech source is inappropriate. <Means for Solving the Problems> In order to achieve the above
object, according to a first aspect of the present invention, there is provided a speech
recognition apparatus for recognizing input speech based on features of the speech, comprising:
a first microphone arranged in the vicinity of a speech source; a second microphone arranged at
a fixed distance from the speech source; a power ratio calculation unit that determines, based
on the power of the output from the first microphone and the power of the output from the second
microphone, the power ratio of the two powers; and a microphone position determination unit that
determines whether or not the value of the power ratio obtained by the power ratio calculation
unit satisfies a predetermined condition and, when the condition is not satisfied, determines
that the position of the first microphone with respect to the speech source is inappropriate.
In the second invention, the speech recognition apparatus of the first invention further
comprises a memory that stores an average power ratio, which is the average value of the power
ratios obtained by the power ratio calculation unit at the time of voice registration. The
microphone position determination unit compares the average power ratio stored in the memory at
the time of voice registration with the value of the power ratio at the time of speech
recognition and, when the value of the power ratio at the time of speech recognition does not
fall within a predetermined range based on the value of the average power ratio at the time of
speech registration, determines that the position of the first microphone with respect to the
sound source is inappropriate. According to a third aspect of the present invention, in each of
the above speech recognition apparatuses, the position of the first microphone with respect to
the sound source is changeable, and the apparatus includes a display unit that displays an
instruction to change the position of the first microphone when the microphone position
determination unit determines that the position of the first microphone with respect to the
sound source is inappropriate. In the first aspect of the invention, the voice emitted from the
voice source is input to the first microphone provided near the voice source and to the second
microphone placed at a fixed distance from the voice source. Then, based on the power of the
output from
the first microphone and the power of the output from the second microphone, the power ratio
calculator determines the power ratio of the two powers. Then, the microphone position
determination unit determines that the position of the first microphone with respect to the sound
source is inappropriate when the value of the power ratio obtained by the power ratio calculation
unit does not satisfy a predetermined condition. Therefore, it can always be monitored whether the
position of the first microphone with respect to the sound source is inappropriate. In the second
aspect of the invention, the value of the average power ratio obtained at the time of voice
registration by the power ratio calculation unit in the voice recognition device is stored in the
memory. Then, at the time of voice recognition, the value of the average power ratio at the time
of voice registration stored in the memory is compared with the value of the power ratio at the
time of voice recognition by the microphone position determination unit. Then, when the value of
the power ratio at the time of speech recognition does not fall within the predetermined area
based on the value of the average power ratio at the time of speech registration, it is determined
that the position of the first microphone relative to the speech source is inappropriate.
Therefore, in voice recognition by a specific speaker, voice input at the time of voice
recognition can be performed in the same state as at the time of voice registration. In the third
aspect of the invention, when the microphone position determination unit of each of the above
voice recognition devices determines that the position of the first microphone relative to the
voice source is inappropriate, the display unit displays an instruction to change the position
of the first microphone.
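The power ratio on which all three aspects rely can be sketched as follows. This is only an illustrative sketch, not the apparatus's actual circuit or algorithm: the text does not define how power is computed, so mean-square power over one frame of samples is assumed here, and the function names and sample values are hypothetical.

```python
from typing import Sequence


def mean_power(samples: Sequence[float]) -> float:
    """Mean-square power of one frame (an assumed definition of 'power')."""
    if not samples:
        raise ValueError("frame must contain at least one sample")
    return sum(s * s for s in samples) / len(samples)


def power_ratio(first_mic: Sequence[float], second_mic: Sequence[float]) -> float:
    """Ratio of the first microphone's power to the second microphone's power."""
    p_first = mean_power(first_mic)
    p_second = mean_power(second_mic)
    if p_second == 0.0:
        # A silent reference channel makes the ratio undefined.
        raise ValueError("second microphone frame is silent; ratio undefined")
    return p_first / p_second


# Hypothetical frames: the first microphone picks up twice the amplitude,
# so the power ratio comes out at about 4.
near = [0.2, -0.4, 0.4, -0.2]
far = [0.1, -0.2, 0.2, -0.1]
ratio = power_ratio(near, far)
```

Because the second microphone sits at a fixed distance from the voice source, dividing by its power normalizes out how loudly the speaker talks, leaving a value that mainly reflects where the first microphone is.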
Then, the position of the first microphone with respect to the sound source is changed by the
operator according to the display content of the display unit. Therefore, the position of the first
microphone with respect to the sound source is always maintained at the optimum position. The
present invention will be described in detail by way of the illustrated embodiments. FIG. 1 is a
block diagram showing an embodiment of the present invention. The voice uttered from the
voice source 1 is input to the first microphone 2 disposed in the vicinity of the voice source
1 and to the second microphone 3 disposed at a fixed distance from the voice source 1. The
voices from the voice source 1 input to the first microphone 2 and the second microphone 3 are
each converted into acoustic signals and input to the amplification unit 4. The amplification
unit 4 passes and amplifies only the voice band and sends the result to the feature extraction
unit 5. The feature extraction unit 5 calculates feature amounts (for example, power, power
ratio, and cepstrum) from the waveforms of the acoustic signals input from the first microphone 2
and the second microphone 3, and obtains the feature pattern of the input voice. Standard
patterns of various voices are registered in advance in the standard pattern storage unit 7, and
the voice recognition unit 8 recognizes the input voice by, for example, pattern matching
between the feature pattern of the input voice sent from the feature extraction unit 5 and the
standard patterns registered in the standard pattern storage unit 7 to find the most likely
standard pattern. The recognition result is then displayed on the display unit 10. The memory 9
is a working memory used in the voice recognition operation described above, and the keyboard
11 is for inputting instructions for the voice recognition operation. The control unit 6
controls the feature extraction unit 5, the standard pattern storage unit 7, the voice
recognition unit 8, the memory 9, the display unit 10, and the keyboard 11 so that the entire
voice recognition device executes the voice recognition operation. At the same time, it
performs a process of determining whether or not the position of the first microphone 2 with
respect to the voice source 1 is inappropriate, as described in detail later. The first
microphone 2 and the second microphone 3 are configured, for example, as a headset microphone as
shown in FIG. 2. At both ends
of the band 21, which is curved along the head of the speaker, head mounting members 22 and 23
are attached, each having a pad on the surface in contact with the head. An arm holding member 24
is attached to the outside of one head mounting member 23, and an arm 25 is mounted through the
arm holding member 24 so as to be slidable in the penetrating direction and rotatable about the
arm holding member 24.
In addition, an earphone 26 is attached to the arm holding member 24 outside the arm 25. The
position of the earphone 26 is set so that it lies at the ear when the headset microphone is
worn on the head. The first microphone 2 is attached to the front end of the arm 25 and the
second microphone 3 is attached to the top of the band 21. With this arrangement, when the
headset microphone is worn on the head, it is fixed at a predetermined position on the head by
the head mounting members 22 and 23, so the distance between the mouth and the second microphone
3 is constant. The first microphone 2, on the other hand, is disposed in the vicinity of the
mouth, and its position relative to the mouth can be changed freely by sliding it back and forth
or rotating it by hand. The voice
recognition apparatus of the above configuration detects whether the relative position between
the voice source 1 and the first microphone 2 is inappropriate as follows. When the relative
position between the voice source 1 and the first microphone 2 is ideal, let the power of the
output from the first microphone 2 calculated by the feature extraction unit 5 be $P_a$, the
power of the output from the second microphone 3 be $P_b$, and the power ratio of the two
($P_a / P_b$) be $C_0$. Suppose that, at the time of speech recognition, the position of the
first microphone 2 moves and the power of its output changes to $P_a'$. Then the power ratio
after the movement of the first microphone 2 calculated by the feature extraction unit 5 is
$P_a' / P_b$. When the first microphone 2 has moved away from the voice source 1, the power
ratio is $P_a' / P_b < C_0$, and when the first microphone 2 has moved closer to the voice
source 1, the power ratio is $P_a' / P_b > C_0$. Therefore, when the power ratio ($P_a' / P_b$)
at the time of speech recognition and the power ratio $C_0$ under the ideal condition are in
the relationship of the following equation (1), the control unit 6 determines that the position
of the first microphone 2 is farther from the voice source 1 than the ideal position. On the
other hand, when they are in the relationship of equation (2), it determines that the position
of the first microphone 2 is closer to the voice source 1 than the ideal position.
$$P_a' / P_b < \alpha \times C_0 \quad (1)$$
$$P_a' / P_b > \beta \times C_0 \quad (2)$$
where $\alpha < 1 < \beta$, and $\alpha$ and $\beta$ are constants determined by experiment.
Then, when the power ratio at the time of speech recognition and the power ratio $C_0$ for the
ideal relative position between the voice source 1 and the first microphone 2 are, for example,
in the relationship of equation (1), the speaker is prompted with "Please move the microphone
closer to your mouth."
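The threshold test of equations (1) and (2) can be sketched as follows. This is a hedged illustration rather than the control unit's actual logic: the text says only that the constants are determined by experiment, so the default values `alpha=0.5` and `beta=2.0` are placeholders, and the names `MicPosition` and `judge_position` are hypothetical. The two messages follow the prompts the text associates with each equation.

```python
from enum import Enum


class MicPosition(Enum):
    """Outcome of the position check; each value is the prompt to display."""
    TOO_FAR = "Please move the microphone closer to your mouth."
    TOO_CLOSE = "Please move the microphone away from your mouth."
    OK = ""


def judge_position(ratio: float, c0: float,
                   alpha: float = 0.5, beta: float = 2.0) -> MicPosition:
    """Classify the recognition-time ratio Pa'/Pb against the ideal ratio C0.

    alpha and beta are placeholder constants; the source determines them
    experimentally and gives no values.
    """
    if not alpha < 1.0 < beta:
        raise ValueError("expected alpha < 1 < beta")
    if ratio < alpha * c0:   # equation (1): first microphone too far away
        return MicPosition.TOO_FAR
    if ratio > beta * c0:    # equation (2): first microphone too close
        return MicPosition.TOO_CLOSE
    return MicPosition.OK


# Hypothetical usage with an ideal ratio of 4.0.
prompt = judge_position(1.0, 4.0).value
```

Keeping a dead band between `alpha * c0` and `beta * c0` means small, harmless movements of the first microphone do not trigger a prompt.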
An instruction to change the position of the first microphone 2 is thus displayed on the display
unit 10. By doing this, the speaker can know from the display on the display unit 10 whether or
not the position of the first microphone 2 with respect to the mouth is appropriate, and even
when a word processor or personal computer is operated by voice for a long time, the relative
position between the voice source 1 and the first microphone 2 can always be maintained at the
ideal position. Usually, when operating a voice recognition device, the voice is uttered with
substantially constant strength, unlike in conversation. Therefore, even if the utterance
content changes, the average power is stable. Consequently, the determination in the present
embodiment as to whether or not the position of the microphone is inappropriate can be made
more stably than a determination based on spectral fluctuation. Thus, in the present embodiment,
the feature extraction unit 5 obtains the respective powers from the voice signal from the first
microphone 2 disposed in the vicinity of the voice source 1 and the voice signal from the second
microphone 3 disposed at a constant distance from the voice source 1, and calculates the power
ratio of the two. Then, based on the power ratio $C_0$ when the relative position between the
voice source 1 and the first microphone 2 is ideal and the power ratio at the time of voice
recognition, the control unit 6 determines by equations (1) and (2) whether the position of the
first microphone 2 is inappropriate. If it is determined that the position of the first
microphone 2 is inappropriate, that fact is displayed on the display unit 10. Therefore, the
speaker can always keep the position of the first microphone 2 with respect to the voice source 1
at the optimum position according to the display content of the display unit 10. That is,
misrecognition caused by an inappropriate position of the first microphone 2 can be eliminated.
Further, by using the above-described method, it is possible to reduce misrecognition at the
time of speech recognition by a specific speaker as follows. First, the average value of the
power ratio of the specific speaker at the time of voice registration is calculated by the
feature extraction unit 5 and stored in the standard pattern storage unit 7. Then, the control
unit 6 uses the value of the average power ratio at the time of voice registration stored in the
standard pattern storage unit 7 as the value $C_0$ of the power ratio when the position of the
first microphone 2 is ideal in the above embodiment, and determines whether the position of the
first microphone 2 at the time of speech recognition is appropriate based on the above equations
(1) and (2). By doing this, it is possible to detect whether the relative position of the voice
source 1 and the first microphone 2 at the time of voice recognition is the same as the relative
position of the voice source 1 and the first microphone 2 at the time of voice registration.
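The registration-based variant can be sketched as follows: the average power ratio over the registration utterances stands in for the ideal ratio, and the recognition-time ratio is accepted only if it stays within a range around that average. This is a minimal sketch under stated assumptions; the function names, the sample powers, and the default constants `alpha=0.5` and `beta=2.0` are all hypothetical, since the text gives no numeric values.

```python
from statistics import fmean
from typing import Iterable, Tuple


def average_power_ratio(powers: Iterable[Tuple[float, float]]) -> float:
    """Average of Pa/Pb over registration utterances; each item is (Pa, Pb)."""
    ratios = [pa / pb for pa, pb in powers]
    if not ratios:
        raise ValueError("need at least one registration utterance")
    return fmean(ratios)


def matches_registration(ratio: float, registered_avg: float,
                         alpha: float = 0.5, beta: float = 2.0) -> bool:
    """True when the recognition-time ratio lies in [alpha*avg, beta*avg].

    alpha and beta are placeholder constants, not values from the source.
    """
    return alpha * registered_avg <= ratio <= beta * registered_avg


# Hypothetical (Pa, Pb) pairs measured while registering three words.
registration = [(0.40, 0.10), (0.36, 0.10), (0.44, 0.10)]
stored_average = average_power_ratio(registration)

# At recognition time, compare the current ratio with the stored average.
same_state = matches_registration(3.0, stored_average)
```

Averaging the per-utterance ratios, rather than taking a single measurement, follows the text's description of an average value of power ratios and smooths out utterance-to-utterance variation.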
Then, as in the above embodiment, when the relationship of the power ratio becomes, for example,
the relationship of the above-mentioned equation (2), the message "Please move the microphone
away from your mouth." is displayed on the display unit 10. Therefore, the speaker can perform
voice input in the same state as at the time of voice registration according to the display on
the display unit 10 at the time of voice recognition. Thus, in the present embodiment, the value
of the average power ratio is calculated from the power of the output from the first microphone
2 and the
power of the output from the second microphone 3, and this value is stored in the standard
pattern storage unit 7. Then, based on the value of the average power ratio stored in the
standard pattern storage unit 7 and the value of the power ratio at the time of speech
recognition, the control unit 6 determines by the above equations (1) and (2) whether the
position of the first microphone 2 is inappropriate. When it is determined that the position of
the first microphone 2 is inappropriate, the display unit 10 displays that fact. Thus, according
to the display content of the display unit 10, the speaker can keep the position of the first
microphone 2 with respect to the sound source 1 at the time of speech recognition always in the
same optimum position as at the time of speech registration. That is, even when there is an
interval of several days between voice registration and voice recognition in recognition by a
specific speaker, voice input can be performed in the same state as at voice registration, and
misrecognition caused by fluctuation of the voice input state can be eliminated. In the present
embodiment, the optimum position of the first microphone 2 at the time of speech recognition is
determined based on the power. That is, the present embodiment does not keep the distance from
the first microphone 2 to the sound source 1 optimum, but keeps the input state from the first
microphone 2 optimum. Therefore, this embodiment can also be applied to the case where the
microphone at the time of voice registration and the microphone at the time of voice recognition
are different. Further, in the present embodiment, a stable value of the average power ratio can
be obtained by using, at the time of voice registration, the method described in the above
embodiment of keeping the relative position between the voice source and the first microphone at
the ideal position. The display on the display unit 10 in each of the above embodiments may be a
screen display on a CRT (Cathode Ray Tube) display, or may be an audio output from the earphone
26 shown in FIG. 2. The algorithm for determining whether or not the position of the first
microphone 2 is inappropriate in the present invention is not limited to the algorithm in each
of the above embodiments.
As is apparent from the above, in the speech recognition apparatus according to the first aspect
of the present invention, the power ratio calculation unit determines the power ratio based on
the power of the output from the first microphone in the vicinity of the speech source and the
power of the output from the second microphone located at a fixed distance from the speech
source, and the microphone position determination unit determines, based on the value of the
power ratio, whether the position of the first microphone relative to the sound source is
appropriate. It can therefore be stably detected that the relative position between the
microphone and the sound source is inappropriate, and the position of the microphone can be
constantly monitored. A voice recognition apparatus according to a second aspect of the present
invention is the voice recognition apparatus according to the first aspect, further comprising a
memory for storing the value of the average power ratio calculated by the power ratio
calculation unit at the time of voice registration. Since the microphone position determination
unit determines whether the position of the first microphone is appropriate based on the value
of the power ratio at the time of speech recognition and the value of the average power ratio at
the time of speech registration, the relative position of the microphone and
speech source at the time of speech recognition can be monitored: it is possible to stably
detect that the relative position between them is inappropriate and to keep the speech input
state at the time of speech recognition always the same as at the time of speech registration.
Further, in the speech recognition apparatus according to the third invention, the position of
the first microphone with respect to the speech source in each of the above speech recognition
apparatuses can be changed, and when the microphone position determination unit determines that
the position of the first microphone is inappropriate, an instruction to change the position of
the first microphone is displayed on the display unit. It is therefore possible to stably
determine whether the position of the first microphone is appropriate, and also to keep the
position of the first microphone always optimal according to the display content of the display
unit.
[0002]
Brief description of the drawings
[0003]
FIG. 1 is a block diagram of an embodiment of the speech recognition apparatus according to the
present invention, and FIG. 2 is a diagram showing an embodiment of the configuration of the
microphones in FIG. 1.
DESCRIPTION OF SYMBOLS 1 ... voice source, 2 ... first microphone, 3 ... second microphone, 5
... feature extraction unit, 6 ... control unit, 7 ... standard pattern storage unit, 8 ... voice
recognition unit, 10 ... display unit.