JP2004233794

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2004233794
An object of the present invention is to make it possible to reliably control the start and stop of
speech recognition processing by a simple operation. A voice recognition start request receiving
unit 14 accepts an instruction input from a voice recognition control button 5 while the voice
recognition process is not being executed as a start request for the voice recognition process. The
control unit 10 starts the speech recognition process in response to the received start request.
Further, the voice recognition interruption request receiving unit 16 accepts an instruction input
from the voice recognition control button 5 while the voice recognition process is being executed as
a request to stop the voice recognition process. The control unit 10 stops the speech recognition
process in response to the received stop request. [Selected figure] Figure 2
Speech recognition apparatus and speech recognition method
BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a
speech recognition apparatus for recognizing input speech and a speech recognition method.
[0002] Conventionally, as a device for inputting a command (operation command) for operation
control by voice, for example, there is a music playback device which wirelessly connects an
audio player and headphones (for example, Patent Document 1) . The music player uses the
headphone microphone and voice recognition technology to operate the headphone as a remote
control of the audio player. When the headphone is operated as a remote control, predetermined
operation commands (commands) such as "play" and "next music" are uttered and input from the
microphone, and the input audio data is transmitted to the audio player. The audio player
interprets audio data sent from headphones in an audio command interpretation unit, and
controls the player's operation according to the interpreted content. Generally, as a method of
performing voice recognition processing based on a voice recognition start request by user
operation, there are, for example, button push type and push talk type voice recognition. [0005]
10-05-2019
1
Button push type voice recognition is adopted, for example, in commercially available car
navigation systems. In this method, the user presses a voice recognition start button and then
speaks, and recognition processing is executed on the input voice. To stop the speech recognition
process, the user presses a speech recognition stop button provided separately from the start
button. Push-to-talk type voice recognition, in contrast, is a method in which the user keeps a
button pressed while speaking, and voice recognition is performed on the voice input while the
button is held down. In this method, the voice recognition process is
stopped when the button is released. [Patent Document 1] Japanese Patent
Application Laid-Open No. 2002-112383 (FIG. 9, paragraphs 0046 to 0049) As described above,
in the related art, instructions to start and stop speech recognition processing (speech input)
are input by button operation. However, when the voice recognition device is mounted on a portable
device such as headphones (for example, Patent Document 1), the user interface elements that can be
mounted, such as buttons, are limited by constraints on the size and performance of the device.
Therefore, it may be difficult to implement a button for instructing the start of the speech
recognition process (speech input) and the stop of the speech recognition process. Further, even
if the voice recognition device is configured as a headphone and a button is mounted, the button
is mounted on a main body case (a part to be in contact with the ear of the headphone) in which
the speaker is housed. In this case, depending on the mounting position of the button, it may not
be in the field of view of the user, and the user must operate the button by groping. Therefore,
erroneous operation by the user is likely to occur. Furthermore, even if the button operation is
performed, the user cannot confirm that the voice recognition process has been activated according
to the instruction input from the button and that the input voice can be correctly recognized. In
addition, the user may mistake the timing at which to start speaking. In particular, in the case of
push-to-talk voice recognition, the user should speak after pressing the button, but in practice
the user may start speaking immediately before pressing the button or at the same time as pressing
it. In such a case, the voice recognition process cannot correctly recognize the voice uttered by the
user. The present invention has been made in consideration of the above circumstances, and its
object is to provide a voice recognition device and a voice recognition method capable of reliably
controlling the start and stop of voice recognition processing by a simple operation. According to
the present invention, a speech recognition apparatus for performing speech recognition processing
on input speech comprises: instruction input means for inputting an instruction to control the
speech recognition processing; voice recognition start request receiving means for receiving an
instruction input from the instruction input means while the speech recognition processing is not
being executed as a start request for the speech recognition processing; first control means for
starting the speech recognition processing in response to the start request received by the voice
recognition start request receiving means; voice recognition stop request receiving means for
receiving an instruction input from the instruction input means while the speech recognition
processing started by the first control means is being executed as a stop request for the speech
recognition processing; and second control means for stopping the speech recognition processing in
response to the stop request received by the voice recognition stop request receiving means.
Further, according to the present invention, a voice recognition apparatus for performing voice
recognition processing on input voice comprises: processing means for executing processing other
than the voice recognition processing; process execution instruction input means for inputting an
instruction to execute processing by the processing means; speech recognition stop request receiving
means for receiving an instruction input from the process execution instruction input means while
the speech recognition processing is being executed as a stop request for the speech recognition
processing; and control means for stopping the speech recognition processing in response to the stop
request received by the speech recognition stop request receiving means.
DETAILED DESCRIPTION OF THE INVENTION Embodiments of the present invention will be
described below with reference to the drawings. FIG. 1 is a view showing an example of an
appearance configuration in the case where a voice recognition device in the present
embodiment is configured as a headset 1. FIG. 1 shows a state in which the headset 1 is attached
to the head of a person. The headset 1 has two speaker units 2, for the right side and the left
side, each housed in, for example, a circular main body case, and the two main body cases are
connected by a connecting member that is curved to follow the shape of the human head. The
connecting member also houses, for example, lines connecting the various components provided in the
left and right main body cases. A microphone support member, with the microphone 3 attached at its
tip, is attached to the main body case housing the speaker unit 2 for the left side. The microphone
support member has a length and a shape such that the microphone 3 at its tip is located near the
mouth when the headset 1 is worn. The main body case housing the speaker unit 2 for the right side
is provided with a power switch 4 for turning the power of the device on and off, and a voice
recognition control button 5 for inputting an instruction to control voice recognition processing.
In the headset 1 (speech recognition device) of the present embodiment,
the start and stop of the speech recognition process are controlled by an instruction input from
one speech recognition control button 5. Further, in the main body case accommodating the left
speaker unit 2, an LED (Light Emitting Diode) 6 is provided for notifying the user that the voice
recognition process has started by, for example, blinking. In addition, an antenna 7 used for
wireless communication with another device (for example, an audio player) is provided on the
connecting member that connects the left and right main body cases, for example, near its center
(near the top of the head when the headset 1 is worn). Although only the power switch 4 is provided
in the main body case in addition to the voice recognition control button 5, buttons for instructing
execution of processes other than the voice recognition process can also be provided. For example,
when the headset 1 is used as a remote controller of an audio player connected by wireless
communication, buttons (not shown) such as fast forward (FF), rewind (REQ), play (PLAY), and stop
(STOP) can be provided.
FIG. 2 is a block diagram showing a functional configuration of the headset 1 shown in FIG. 1. As
shown in FIG. 2, the headset 1 (voice recognition device) includes, in addition to the speaker unit
2, the microphone 3, the power switch 4, the voice recognition control button 5, the LED 6 and the
antenna 7 shown in FIG. 1, a control unit 10, a voice recognition processing unit 12, a voice
recognition start request receiving unit 14, a voice recognition interruption request receiving
unit 16, and a wireless communication unit 18. The control unit 10 controls the headset 1 (speech
recognition device) as a whole; its CPU executes various programs stored in memory to control the
respective units and realize various functions. The control unit 10 causes the voice recognition
processing unit 12 to start the speech recognition process in response to the start request accepted
by the voice recognition start request receiving unit 14, and to stop the speech recognition process
in response to the stop request accepted by the voice recognition interruption request receiving
unit 16. In addition, after causing the voice recognition processing unit 12 to start voice
recognition processing, when it becomes possible to perform voice recognition on the input voice,
the control unit 10 notifies the user of this by a voice (message) output from the speaker unit 2 or
by blinking (or lighting) the LED 6. The voice recognition processing unit 12 executes voice recognition
processing under the control of the control unit 10, recognizes a voice input from the
microphone 3, and converts the contents of the user's utterance into data. For example, when the
headset 1 is used as a remote controller of an audio player and a control command is input by
voice, predetermined command voices (for example, fast forward, rewind, play, and stop) can be
recognized. The voice recognition start request receiving unit 14 has a function of receiving an
instruction input from the voice recognition control button 5 while voice recognition processing is
not being executed by the voice recognition processing unit 12 as a start request for voice
recognition processing, and notifying the control unit 10. The voice recognition interruption
request receiving unit 16 has a function of receiving an instruction input from the voice
recognition control button 5 while voice recognition processing is being executed by the voice
recognition processing unit 12 as a request to stop voice recognition processing, and notifying the
control unit 10. Further, while voice recognition processing is being executed, the voice
recognition interruption request receiving unit 16 receives an instruction to execute processing
other than the voice recognition processing, for example, an instruction to turn off the power from
the power switch 4, or an instruction to fast forward, rewind, play, or stop when the headset 1 is
used as a remote controller, as a request to stop voice recognition processing and notifies the
control unit 10.
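As an illustration of how a recognized utterance might be converted into command data, the following
Python sketch maps recognized text to one of the command voices named above. The `Command` enum, the
`to_command` function, and the text normalization are assumptions made for this example; the
description does not specify any data format.

```python
from enum import Enum
from typing import Optional


class Command(Enum):
    """Command voices named in the description; the enum itself is hypothetical."""
    FAST_FORWARD = "fast forward"
    REWIND = "rewind"
    PLAY = "play"
    STOP = "stop"


def to_command(recognized_text: str) -> Optional[Command]:
    """Map recognized utterance text to a command, or None if it is not a command voice."""
    # Lower-case and collapse whitespace so "  Fast   Forward " still matches.
    normalized = " ".join(recognized_text.lower().split())
    for command in Command:
        if command.value == normalized:
            return command
    return None
```

Returning `None` for anything outside the predetermined set keeps unrelated speech from being
treated as a control command.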
The wireless communication unit 18 controls communication with other devices under the control of
the control unit 10, and realizes wireless communication by transmitting and receiving wireless
signals through the antenna 7, for example. When the headset 1 is used as a remote controller of an
audio player, the wireless communication unit 18 wirelessly receives music data from the audio
player, and transmits to the audio player control data indicating an instruction input from an
operation control button or an instruction given by a command voice recognized by the voice
recognition processing. The wireless communication unit 18 performs wireless communication using,
for example, Bluetooth. Bluetooth is a short-range wireless communication standard that realizes
wireless communication within 10 meters using the 2.4 GHz ISM (Industrial, Scientific and Medical)
band. Bluetooth uses frequency hopping as its spread spectrum technology, and up to eight devices
can be connected by time division multiplexing. Next, the control operation (start / stop) of the
speech recognition process of the
headset 1 (speech recognition device) according to the present embodiment will be described
with reference to the flowchart shown in FIG. 3. Here, a case will be described as an example in
which the headset 1 receives music data from an audio player (not shown) by wireless communication,
reproduces it and outputs it from the speaker unit 2, and is also used as a remote controller for
controlling the operation of the audio player. When controlling the operation of the audio player,
the headset 1 can accept a control command by voice, for example, a predetermined command voice
(fast forward, rewind, play, stop, etc.), and recognize it by voice recognition processing. To input
a control command by voice, the user presses the voice recognition control button 5 (step A1). When
the voice recognition control button 5 is pressed, the voice recognition start request receiving
unit 14 and the voice recognition interruption request receiving unit 16 determine whether voice
recognition processing is being executed by the voice recognition processing unit 12 (step A2). If
it is determined that the speech recognition process is not being executed (No at step A2), the
voice recognition start request receiving unit 14 determines whether the input is a valid voice
recognition start request (step A4). That is, the voice recognition start request receiving unit 14
determines whether the voice recognition start request has continued for a predetermined time or
more, in other words, whether the voice recognition control button 5 has been kept pressed for a
predetermined time or more.
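The validity check of step A4 can be sketched as follows. This is a minimal Python illustration
under stated assumptions, not the implementation of the embodiment: the class name `HoldDetector`,
its method names, and the 1.0 second threshold are hypothetical, since the description only speaks
of "a predetermined time".

```python
from typing import Optional


class HoldDetector:
    """Decides whether a button press lasted long enough to count as a start request."""

    def __init__(self, min_hold_s: float = 1.0) -> None:
        # Assumed threshold; the source only says "a predetermined time".
        self.min_hold_s = min_hold_s
        self._pressed_at: Optional[float] = None

    def press(self, now_s: float) -> None:
        """Record the time at which the control button went down."""
        self._pressed_at = now_s

    def release(self, now_s: float) -> bool:
        """Return True if the press that just ended was held for at least min_hold_s."""
        if self._pressed_at is None:
            return False  # release without a matching press
        held_s = now_s - self._pressed_at
        self._pressed_at = None
        return held_s >= self.min_hold_s
```

Evaluating the hold time at release is only one possible design; a device could equally treat the
request as valid as soon as the threshold is reached while the button is still held.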
Here, when it is determined that the request is a valid voice recognition start request (Yes in
step A4), the voice recognition start request receiving unit 14 accepts the instruction input from
the voice recognition control button 5 as a voice recognition start request and notifies the control
unit 10. The control unit 10 activates the voice recognition processing unit 12 in response to the
notification from the voice recognition start request receiving unit 14 to start voice recognition
processing. Further, when the voice recognition processing unit 12 becomes ready to perform voice
recognition processing, the control unit 10 notifies the user of this using the speaker unit 2 or
the LED 6. For example, a predetermined sound or a message voice is output from the speaker unit 2,
and the LED 6 blinks (step A5). The notification may be given by only one of the speaker unit 2 and
the LED 6. In this case, the control unit 10 allows the user to set in advance, by an instruction,
which notification is to be used. It is also possible to use another display form for the LED 6,
such as lighting it instead of blinking it, or changing the blinking interval. On the other hand, if
the duration of the voice
recognition process start request from the user is shorter than the predetermined time, that is, if
the time for which the user keeps pressing the voice recognition control button 5 is shorter than
the predetermined time, the voice recognition start request receiving unit 14 invalidates the start
request and does not start the voice recognition process (No in step A4). This prevents the speech
recognition process from being activated by an erroneous operation by the user. The user presses the
voice recognition control button 5 to make a voice recognition start request, but since the headset
1 is worn on the head, the voice recognition control button 5 is not in the user's field of view,
and the user must find it by touch. Even in such a situation, the voice recognition process is
started only when the start request from the user continues for the predetermined time or more and
is therefore treated as a valid voice recognition start request, so erroneous operation of the voice
recognition control button 5 can be avoided. Further, the user can recognize that the voice
recognition process has been started, that is, that it has become possible to start speaking, from
the sound output from the speaker unit 2 or the blinking of the LED 6.
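The user-selectable notification of step A5 could be modeled as in the Python sketch below. The
`NotifyMode` flag, the `notify_ready` function, and the callback parameters are hypothetical names
introduced for illustration; the description only states that the speaker unit 2, the LED 6, or
both may be used, according to a setting made in advance.

```python
from enum import Flag, auto
from typing import Callable, List


class NotifyMode(Flag):
    """Which notification channels are enabled (hypothetical names)."""
    SPEAKER = auto()
    LED = auto()


def notify_ready(mode: NotifyMode,
                 play_sound: Callable[[], None],
                 blink_led: Callable[[], None]) -> List[str]:
    """Fire each enabled channel and return the names of the channels used."""
    used = []
    if NotifyMode.SPEAKER in mode:
        play_sound()  # e.g. a predetermined sound or message voice
        used.append("speaker")
    if NotifyMode.LED in mode:
        blink_led()  # e.g. blinking or lighting the LED
        used.append("led")
    return used
```

Passing `NotifyMode.SPEAKER | NotifyMode.LED` enables both channels, while a single member selects
only one of them.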
Because the timing at which the user makes the voice recognition start request differs from the
timing at which the user is notified that speaking can start, the user can clearly distinguish the
start request from the start of speaking without confusion. As a result, the speech recognition
process can correctly recognize the speech content (command speech) that the user intended. The user
starts speaking after receiving the notification that speaking can start. The user's speech is input
through the microphone 3, and speech recognition processing is performed by the voice recognition
processing unit 12, which is in an operable state. The voice recognition processing unit 12 converts
the input speech into data representing a control command as a result of the speech recognition
processing and notifies the control unit 10 of the data. The control unit 10 controls the operation
of the audio player by transmitting the data representing the control command obtained by the voice
recognition processing unit 12 to the audio player through the wireless communication unit 18.
When the voice recognition control button 5 is pressed (step A1) after the voice recognition
processing unit 12 has started voice recognition processing in this way, the voice recognition start
request receiving unit 14 and the voice recognition interruption request receiving unit 16 determine
whether the speech recognition process is being executed by the voice recognition processing unit 12
(step A2). When it is determined that the speech recognition process is being executed (Yes at step
A2), the voice recognition interruption request receiving unit 16 accepts the instruction input from
the voice recognition control button 5 as a voice recognition stop request and notifies the control
unit 10. The control unit 10 stops the speech recognition processing by the voice recognition
processing unit 12 in response to the notification from the voice recognition interruption request
receiving unit 16 (step A3). Thus, the single voice recognition control button 5 can be used not
only to start speech recognition processing but also to stop it. As a result, even when the user
interface is limited by constraints on the size and performance of the headset 1 (voice recognition
device), an instruction input from the voice recognition control button 5 while voice recognition
processing is being executed can be accepted as a speech recognition stop request. Since the user
only needs to remember the position of the voice recognition control button 5 as the button for
controlling voice recognition processing, operability is improved. Next, operation control for
stopping the speech recognition process by an operation other than one for speech recognition will
be described with reference to the flowchart shown in FIG. 4.
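Before turning to that second flowchart, the single-button flow of steps A1 to A5 described above
can be summarized in a short Python sketch. It is a simplified model under stated assumptions: the
class `RecognitionController`, the method `on_control_button`, the returned strings, and the 1.0
second threshold are hypothetical, and notification (step A5) is reduced to a comment.

```python
class RecognitionController:
    """Toggles recognition on and off from one control button (steps A1 to A5)."""

    def __init__(self, min_hold_s: float = 1.0) -> None:
        # Assumed threshold; the source only says "a predetermined time".
        self.min_hold_s = min_hold_s
        self.recognizing = False

    def on_control_button(self, hold_s: float) -> str:
        """Handle one press of the control button (step A1) held for hold_s seconds."""
        # Step A2: is recognition currently being executed?
        if self.recognizing:
            # Step A3: the press is accepted as a stop request.
            self.recognizing = False
            return "stopped"
        # Step A4: a start request is valid only if the press lasted long enough.
        if hold_s >= self.min_hold_s:
            # Step A5: start recognition; the user would be notified here.
            self.recognizing = True
            return "started"
        # Too short: treated as an erroneous operation and ignored.
        return "ignored"
```

Note that, following the description, only the start request is subject to the hold-time check; any
press while recognition is running is accepted as a stop request.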
In the operation control shown in the flowchart of FIG. 3, the speech recognition process is stopped
by the speech recognition control button 5, but the speech recognition process can also be stopped
by an instruction to execute a process other than the speech recognition process. The voice
recognition interruption request receiving unit 16 receives an instruction to execute some process
from a source other than the voice recognition control button 5 when the user operates a button or
the like provided on the headset 1 (step B1). Examples of such instructions are an instruction to
turn off the power from the power switch 4 and an instruction to fast forward, rewind, play, or stop
when the headset 1 is used as a remote controller. In this case, it is determined whether the speech
recognition process is being executed by the voice recognition processing unit 12 (step B2). When
the speech recognition process is not being executed (step B2, No), the voice recognition
interruption request receiving unit 16 does nothing, and the control unit 10 executes the process
corresponding to the operation. For example, when the power switch 4 is operated, the control unit
10 performs the process for turning off the power, and when a button such as fast forward, rewind,
play, or stop is operated, it transmits control data corresponding to the command to the audio
player (step B4). On the other hand, when an instruction to execute processing other than the voice
recognition processing is input from a button other than the voice recognition control button 5
while the voice recognition processing is being executed (step B2, Yes), the voice recognition
interruption request receiving unit 16 accepts the input instruction as a request to stop the speech
recognition process and notifies the control unit 10. The control unit 10 stops the speech
recognition process by the voice recognition processing unit 12 in response to the notification from
the voice recognition interruption request receiving unit 16 (step B3). After that, the control unit
10 executes the process originally assigned to the operation according to the input instruction
(step B4). In this manner, the voice recognition process can be stopped not only by operating the
voice recognition control button 5 but also by an operation for executing another process. When the
user wants to stop the voice recognition process in progress and perform another operation, simply
performing that operation stops voice recognition automatically, without a separate voice
recognition stop request. This reduces the time and effort required of the user. Although the above
description gives an example in which the headset 1 is used as a remote controller for an audio
player, an audio reproduction function may instead be mounted in the main body of the headset 1, and
speech recognition processing may be used to control the operation of that audio reproduction
function.
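The flow of steps B1 to B4 described above, in which an operation other than the voice recognition
control button implicitly stops recognition, can be sketched in Python as follows. The `Headset`
class, the `on_other_operation` method, and the log strings are hypothetical stand-ins used only to
make the order of the steps visible.

```python
from typing import Callable, List


class Headset:
    """Minimal stand-in for the headset state (hypothetical)."""

    def __init__(self) -> None:
        self.recognizing = False
        self.log: List[str] = []

    def on_other_operation(self, name: str, action: Callable[[], None]) -> None:
        """Handle an instruction from a button other than the control button (step B1)."""
        # Step B2: is recognition currently being executed?
        if self.recognizing:
            # Step B3: the instruction doubles as a stop request.
            self.recognizing = False
            self.log.append("stop recognition")
        # Step B4: execute the process originally assigned to the operation.
        action()
        self.log.append(name)
```

For example, pressing a play button while recognition is running first records "stop recognition"
and then "play"; pressing it when recognition is idle records only "play".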
The audio reproduction function is provided with, for example, a slot for mounting an SD (Secure
Digital) card, and reads and reproduces music data recorded on the SD card inserted in the slot.
In addition, although the case where voice recognition processing is performed on command voices for
controlling the operation of the audio player has been described, the present invention can of
course also be applied to the case where voice recognition processing is performed on input voice
other than command voices. Further, in the above-described embodiment, when the voice recognition
process is started, the user is notified by blinking of the LED 6 that input of a command voice has
become possible. However, a display device using an LCD (Liquid Crystal Display) or the like may be
provided in the same main body case as the speaker unit 2, and the notification may be given by
displaying a predetermined message on this display device. In addition, the voice recognition device
according to the present invention may be configured as a head mounted display provided with a
microphone, and its display device (display) may display various messages in addition to a message
notifying that voice recognition processing has started. In the above description, the LED 6 for
notifying that the voice recognition process has started is provided on the main body case (for
example, the one for the left side), but it may instead be provided at the tip of the microphone
support member together with the microphone 3. As a result, the LED 6 can be reliably placed in the
user's field of view, and blinking (or lighting) the LED 6 reliably notifies the user that the voice
recognition process has started. Further, the method described in the above-described embodiment can
be written, as a program that can be executed by a computer, to a recording medium such as a
magnetic disk (flexible disk, hard disk, etc.), an optical disk (CD-ROM, DVD, etc.), or a
semiconductor memory, and provided to various devices. It can also be transmitted over a
communication medium and provided to various devices. A computer for realizing the present apparatus
reads a program recorded in a recording medium, or receives a program via a communication
medium, and executes the above-mentioned processing by controlling the operation by this
program. The present invention is not limited to the embodiments described above, and can be
variously modified in the implementation stage without departing from the scope of the
invention.
Further, the embodiments include inventions at various stages, and various inventions can be
extracted by appropriate combinations of a plurality of disclosed constituent requirements. For
example, in the case where some configuration requirements are removed from all the
configuration requirements shown in the embodiment, a configuration from which this
configuration requirement is removed can be extracted as the invention. As described above in
detail, according to the present invention, the speech recognition process is started or stopped
only by the input from the instruction input means for inputting the instruction for controlling
the speech recognition process, so the speech recognition process can be controlled by a simple
operation. Further, since the start request is accepted only when the instruction is input
continuously from the instruction input means for a predetermined time, reliable control is
realized.
BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 is a view showing an example of an appearance configuration
in the case where a voice recognition device in the present embodiment is configured as a headset.
FIG. 2 is a block diagram showing a functional configuration of the speech recognition device in the
present embodiment. FIG. 3 is a flowchart explaining the operation of speech recognition start /
stop. FIG. 4 is a flowchart explaining the operation of stopping speech recognition by an operation
other than one for speech recognition. [Description of reference numerals] 1 ... headset, 3 ...
microphone, 4 ... power switch, 5 ... voice recognition control button, 6 ... LED, 7 ... antenna,
10 ... control unit, 12 ... voice recognition processing unit, 14 ... voice recognition start
request receiving unit, 16 ... voice recognition interruption request receiving unit, 18 ...
wireless communication unit.