JP2007228135

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2007228135
PROBLEM TO BE SOLVED: To provide a motor-driven wheelchair that accurately picks up the user's operating voice even in the presence of noise and is controlled on the basis of that voice. SOLUTION: The electric wheelchair comprises voice input means having a plurality of sound receiving means, arranged separately from one another, for receiving the user's voice over multiple channels; utterance position estimation means for estimating the user's utterance position from the multi-channel voice data received by the sound receiving means and outputting an utterance position estimation signal; and control means for controlling a drive source of the wheels on the basis of the utterance position estimation signal. [Selected figure] Figure 1
VOICE POSITION ESTIMATING METHOD AND VOICE POSITION ESTIMATING DEVICE USING
THE SAME
[0001]
Independent movement, going where one wants to go by one's own intention, is an important life function. When physical function is limited by a severe disability, the realization of independent movement has a tremendous effect on both daily life and mental well-being. To support such independent movement, the present invention relates to an utterance position estimation method for a disabled person who can to some extent emit sounds such as a voice, a fricative "sue" sound, or whistling, and who can move the sound source position, such as the tip of the mouth, in a desired direction using the neck or upper body; the method estimates the utterance position and operates a motorized wheelchair on the basis of that information. The invention also relates to an utterance position estimation apparatus using the method, and to an electrically operated wheelchair.
[0002]
Patent Document 1 and Patent Document 2, among others, exist as prior art concerning electric wheelchairs that can be controlled by voice, but all of them presuppose the use of a single microphone as the voice input device. With such prior art it is therefore impossible to detect the position of the user's voice, and voice recognition technology must be combined with it in order to control the motor-driven wheelchair. JP 2003-310665 A JP 6225910 A
[0003]
In a real environment where various environmental noises exist, robustness against noise is indispensable when an electric wheelchair is operated through an interface using sound and voice. In conventional electric wheelchairs controlled by voice input from a single microphone, a close-talking microphone such as a headset is widely used to suppress the mixing-in of noise. However, a headset microphone must be put on each time the power wheelchair is used, and if it shifts during use it must be readjusted by the user. This is not always practical, for example for disabled people who have difficulty moving their hands freely. Furthermore, prior art that uses voice recognition for wheelchair control presupposes that the user can utter distinguishable words, yet some severely disabled people find it difficult to produce clear speech. In view of these drawbacks of the prior art, an object of the present invention is to provide an electric wheelchair that accurately estimates the user's utterance position even in the presence of noise and is controlled on the basis of that position.
[0004]
The present invention adopts the following solutions in order to solve the above problems. (A) The electric wheelchair comprises voice input means for receiving the user's voice with a microphone array; utterance position estimation means, robust against ambient noise, for estimating the position of the user's voice from multi-channel voice data; auxiliary operation means such as a joystick and an emergency stop button; display means for visually indicating the utterance position estimation result and the state of the wheelchair; drive means for driving the wheels of the wheelchair; and control means for controlling the drive means on the basis of the estimated utterance position. (B) The utterance position estimation method and the utterance position estimation apparatus for user speech are characterized by the use of a parallel microphone array. (C) The electric wheelchair estimates the utterance position of the user's voice using the microphone array and is operated and controlled on that basis. Specifically, the following means are adopted.
[0005]
(1) The utterance position estimation method: a correlation matrix R(ω) is determined from observation vectors whose elements are the short-time Fourier transform values of consecutive frames of the microphone inputs; the noise subspace correlation matrix Rn(ω) of R(ω) is calculated; using the position vector a(ω, P) of a sound source at coordinate P = (Px, Py, Pz), the function F(P) given below is evaluated, and the coordinate P0 = (Px0, Py0, Pz0) that maximizes F(P) is obtained together with the maximum value F(P0); the power of the sound arriving from that coordinate is obtained by the function P(P0) given below; the obtained F(P0) and P(P0) are each compared with a predetermined threshold, and it is determined that a sound was generated at coordinate P0 when both values are equal to or greater than their respective thresholds. (2) In the utterance position estimation method of (1), when the position vector a(ω, P) of the sound source is obtained, a function g(r) that depends only on the distance r between the sound source and the microphone is used, given by the following equation. (3) The utterance position estimation method of (1) is characterized in that the number S of sound sources needed to obtain the noise subspace correlation matrix Rn(ω) is obtained by the following equation. (4) The utterance position estimation method of (1) is characterized in that the coordinate P0 = (Px0, Py0, Pz0) is displayed as an image. (5) The utterance position estimation apparatus: a correlation matrix R(ω) is determined from observation vectors whose elements are the short-time Fourier transform values of consecutive frames of the microphone inputs; the noise subspace correlation matrix Rn(ω) of R(ω) is calculated; using the position vector a(ω, P) of a sound source at coordinate P = (Px, Py, Pz), the function F(P) given below is evaluated, and the coordinate P0 = (Px0, Py0, Pz0) that maximizes F(P) is obtained together with the maximum value F(P0); the power of the sound arriving from that coordinate is obtained by the function P(P0) given below; the obtained F(P0) and P(P0) are each compared with a predetermined threshold, and it is determined that a sound was generated at coordinate P0 when both values are equal to or greater than their respective thresholds. (6) In the utterance position estimation apparatus of (5), when the position vector a(ω, P) of the sound source is obtained, a function g(r) that depends only on the distance r between the sound source and the microphone is used, given by the following equation. (7) The utterance position estimation apparatus of (5) is characterized in that the number S of sound sources needed to obtain the noise subspace correlation matrix Rn(ω) is obtained by the following equation.
(8) The utterance position estimation apparatus of (5) is characterized in that the coordinate P0 = (Px0, Py0, Pz0) is displayed as an image. (9) The electric wheelchair comprises: voice input means comprising a plurality of sound receiving means, arranged separately from one another, for receiving the user's voice over multiple channels; utterance position estimation means for estimating the utterance position of the user from the multi-channel sound data received by the sound receiving means and outputting an utterance position estimation signal; and control means for controlling a drive source of the wheels on the basis of the utterance position estimation signal. (10) The electric wheelchair of (9) is characterized in that the voice input means includes microphones disposed on both sides of the user when the user is seated on the seat. (11) The motor-driven wheelchair of (10) is characterized in that the microphones are arranged in parallel. (12) The electric wheelchair of any one of (9) to (11) comprises: voice input means provided with sound receiving means comprising a plurality of microphone arrays, arranged separately from one another, for receiving the user's voice; utterance position estimation means for estimating the utterance position of the user on the basis of the multi-channel voice data received by the sound receiving means and outputting an utterance position estimation signal; auxiliary operation means, comprising coordinate position designation means and a stop button, for outputting an auxiliary operation signal; image display means for visually indicating the utterance position estimation signal and the state of the wheelchair; drive means for driving and controlling a drive source of the wheels of the wheelchair; and control means for controlling the drive means on the basis of the utterance position estimation signal and the auxiliary operation signal.
[0006]
The use of a microphone array fixed to the wheelchair eliminates the need for the user to wear any cords or equipment and makes the powered wheelchair easier to use. Even for a disabled person who cannot move the hands freely, a practical electric wheelchair is therefore realized that requires no procedures such as putting on a microphone or correcting its position. By using a parallel microphone array voice input device, the user's utterance position can be estimated even when multiple disturbing noises are present in the surroundings, and the electric wheelchair can be controlled using that position. The user therefore does not have to utter clear speech. The electric wheelchair of the present invention can be used even with an unclear voice, a fricative "sue" sound, or whistling, as long as such a sound can be emitted to some extent and the sound source position, such as the mouth, can be moved in the desired direction using the neck and upper body.
[0007]
Embodiments of the present invention will now be described in detail with reference to the drawings. An embodiment of the electric wheelchair of the present invention is described below; the electric wheelchair shown is one embodiment of the invention, and the invention is not limited to it. FIG. 1 is an external view of the electric wheelchair according to the present embodiment, and FIG. 2 is a functional block diagram of the electric wheelchair shown in FIG. 1. As shown in FIG. 1, the electric wheelchair includes, for example, two rear wheels 36a (not shown) and 36b, two front wheels 35a and 35b, a seat 37, a backrest 40 installed above the rear wheels 36a and 36b, armrests 33a and 33b installed on both sides of the backrest, and footrests 41a and 41b installed in front of the front wheels 35a and 35b. The display 31 is fixed to the armrest 33a, and the joystick 32 and the emergency stop button 34 are fixed to the armrest 33b. An adjustment bar 39 provided with mounting brackets is fitted to the backrest 40 via posts 42a and 42b; the posts 42a and 42b and the adjustment bar 39 with its mounting brackets constitute a stand. Microphone mounting bodies 30c and 30d, provided with microphone arrays 30a and 30b at their tip portions, are slidably attached to the mounting brackets on the adjustment bar 39. The pair of microphone mounts 30c and 30d are arranged in parallel and are long enough to extend from behind the user, over both shoulders, and past the user's mouth, allowing microphones to be placed along them. As shown in FIG. 2, in the electric wheelchair, for example, the parallel microphone array composed of the two microphone arrays 30a and 30b together with the microphone amplifier and ADC (analog/digital converter) 61 corresponds to the voice input means of the present invention; the display 31 corresponds to the display means; the CPU (central processing unit) board 63 and the storage device 64 correspond to the control means; the drive control 65 and the drive motor 67 correspond to the drive means; and the operation switches 66, such as the joystick 32 and the emergency stop button 34, correspond to the operation means of the present invention. The CPU 63 and the drive control 65 are connected by a serial cable 69.
[0008]
(Parallel Microphone Array Voice Input Device) The voice input means comprises sound receiving means consisting of a plurality of microphone arrays, spaced apart from one another, for receiving the user's voice. The configuration of the parallel microphone array voice input device shown in FIGS. 1 and 2 is described below. As shown in FIG. 1, the two microphone mounting fittings 30a and 30b each have one end fixed to the fitting 39 of the stand, run parallel at an arbitrary distance, for example 37 cm, and are long enough to extend from behind the user, over both shoulders, and past the mouth; an arbitrary number of microphones, for example four per bracket (eight in total), are arranged at arbitrary intervals, for example 3 cm, on each of the left and right brackets. Against vibration while traveling, a vibration-damping mechanism is used: shock absorbers are inserted under the two posts 42a and 42b, a vertically movable bracket is attached at the center of the bracket connecting the two posts, and a spring placed in it absorbs vertical vibration. If necessary, springs inserted on both the left and right sides of the vertically movable bracket absorb lateral vibration as well. The height and width of the microphone stand and the positions of the microphones can be adjusted for each user. As shown in FIG. 2, the voice input means includes the parallel microphone arrays 30a and 30b and the microphone amplifier and ADC (analog/digital converter) 61. The sound receiving means comprises at least a plurality of microphones, preferably a microphone array in which many microphones are arranged in a row. The microphones are positioned at least far enough apart that the vectors from the sound source to them differ; more preferably, the microphones are arranged on both sides of the user. Such placement on both sides of the user makes the user's voice input easy and clear.
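As a rough illustration of the array geometry described above, the following Python sketch generates microphone coordinates for two parallel bars separated by 37 cm, each carrying four microphones spaced 3 cm apart. The coordinate frame, the function name, and the placement of the origin are assumptions made for illustration; the patent only gives the separation and spacing as example values.

```python
import numpy as np

def parallel_array_positions(bar_separation=0.37, mic_spacing=0.03,
                             mics_per_bar=4, height=0.0):
    """Return an (8, 3) array of microphone coordinates in metres.

    Two bars run along the y axis (front/back of the user) and are separated
    by `bar_separation` along x (left/right); each bar carries `mics_per_bar`
    microphones spaced `mic_spacing` apart.  The origin and axes are chosen
    here for convenience and are not specified in the patent.
    """
    positions = []
    for x in (+bar_separation / 2, -bar_separation / 2):  # right bar, left bar
        for i in range(mics_per_bar):
            positions.append((x, -i * mic_spacing, height))
    return np.asarray(positions)

if __name__ == "__main__":
    print(parallel_array_positions())   # eight (x, y, z) rows
```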
[0009]
(Utterance Position Estimation Means and Control Means) The CPU (central processing unit) board 68 is a board on which the CPU is mounted and constitutes the utterance position estimation means and the control means. The utterance position estimation means and the control means also include a storage device 64 connected to the CPU board 68. The utterance position estimation means estimates the user's utterance position on the basis of the multi-channel voice data received by the sound receiving means and outputs an utterance position estimation signal. The control means controls the drive means on the basis of the utterance position estimation signal and the auxiliary operation signal. The ADC 61 and the CPU board 63 are connected via the USB cable 68, and power for the microphone amplifier and the ADC 61 is supplied from the CPU board 63. The sampling rate can be set arbitrarily, for example to 8 kHz, and the number of quantization bits can also be set arbitrarily, for example to 16 bits. To increase processing accuracy, the sampling rate and the number of quantization bits are increased.
[0010]
(Auxiliary Operation Means) The auxiliary operation means is represented by the operation switch 66 and outputs an auxiliary operation signal, for example by means of coordinate position designation means comprising a joystick (not shown) and an emergency stop button (not shown).
[0011]
(Image Display Means) The image display means has the display 31 and visually indicates the utterance position estimation signal and the state of the wheelchair.
[0012]
(Drive Means) The drive means includes the drive control device 65 and drives and controls the drive motor 67, which is the drive source of the wheels of the wheelchair.
[0013]
(Utterance Position Estimation) The utterance position estimation process performed by the utterance position estimation means, using the input signals from the voice input device provided with the plurality of sound receiving means, is described below.
A sound signal emitted from a point source placed at an arbitrary position in three-dimensional space is received by Q microphones arranged at arbitrary positions in the same space.
The distance Rq between the point sound source and each microphone is obtained by the following equation.
The propagation time τq from the point sound source to each microphone is obtained by the following equation, where v is the speed of sound.
[0014]
The gain gq, at each microphone, of the narrow-band signal of center frequency ω emitted by the point source is in general defined as a function of the distance Rq between the point source and the microphone and of the center frequency ω. The transfer characteristic between the point sound source and each microphone for the narrow-band signal of center frequency ω is expressed accordingly. A position vector a(ω, P0), representing the sound source at position P0, is then defined as a complex vector whose elements are the transfer characteristics between the point sound source and each microphone for that narrow-band signal, as in the following equation.
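The position vector is therefore a gain times a phase delay per microphone. The sketch below builds such a vector; the exact gain model is introduced later in the text (paragraph [0019]), so here any callable of the distance stands in for it, and the function name is an assumption.

```python
import numpy as np

def position_vector(omega, R, tau, gain):
    """Complex position vector a(omega, P) of paragraph [0014].

    Each element models the transfer characteristic between the point
    source and microphone q for a narrow-band signal of centre frequency
    omega: a gain g(R_q) multiplied by a phase delay exp(-j*omega*tau_q).
    `R` and `tau` are the per-microphone distances and delays; `gain` is
    any callable g(r) of the source-microphone distance.
    """
    g = np.asarray([gain(r) for r in R])
    return g * np.exp(-1j * omega * np.asarray(tau))
```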
[0015]
In the utterance position estimation, the MUSIC method is used: the signal subspace and the noise subspace are obtained by eigenvalue decomposition of the correlation matrix, and the inverse of the inner product between an arbitrary sound source position vector and the noise subspace is examined. The following procedure is performed. The short-time Fourier transform of the q-th microphone input is taken, and with these values as elements an observation vector is defined as follows. Here, n is the index of the frame time. The correlation matrix is determined by the following equation from N consecutive observation vectors.
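For one frequency bin, the correlation matrix is the average of x(n) x(n)^H over the N observation vectors of a block. A minimal sketch, assuming the STFT values are already available as an (N, Q) array:

```python
import numpy as np

def correlation_matrix(stft_frames):
    """Correlation matrix R(omega) for one frequency bin ([0015]).

    `stft_frames` is a complex (N, Q) array: the short-time Fourier
    transform value of each of the Q microphone inputs at one frequency,
    for N consecutive frames.  Each row is an observation vector x(n);
    averaging x(n) x(n)^H over the block gives R(omega).
    """
    X = np.asarray(stft_frames)
    N = X.shape[0]
    return X.T @ X.conj() / N   # (Q, Q) Hermitian matrix
```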
[0016]
Let the eigenvalues of this correlation matrix, arranged in descending order, and the eigenvectors corresponding to them be denoted as in the following. The number of sound sources S is then estimated by the following equation, and the noise subspace correlation matrix Rn(ω) is defined as follows.
[0017]
Over the chosen frequency band and utterance region U, the function F(P) below is calculated, and the coordinate that maximizes F(P) is determined.
[0018]
Next, the power of the sound source coming from the above coordinates is estimated by
Then, two threshold values Fthr and Pthr are prepared, and when the following condition is
satisfied, it is determined that an utterance has occurred at coordinates P = (Px, Py, Pz) in N
consecutive frame times. The utterance position estimation process processes N consecutive
frames as one block. In order to perform the voicing position estimation more stably, it is
04-05-2019
8
determined that there is a voicing if the number of frames N is increased and / or all the
consecutive Nb blocks satisfy the condition of equation 20. The number of blocks is set
arbitrarily. As the number of blocks increases, the accuracy generally tends to improve.
[0019]
As a specific example, the case where eight microphones are arranged in parallel on a plane, as shown in FIG. 3, is described below. The position vector a(ω, Px, Py) of a sound source at coordinate (Px, Py) is expressed by the following equation, where m is the microphone array number (right = 1, left = 2) and i is the microphone number. As the gain function g(r), which depends only on the distance r between the sound source and the microphone, an experimentally obtained function such as the following equation is used, for example.
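Because the experimentally fitted g(r) is not reproduced in this translation, the sketch below substitutes a simple inverse-distance (spherical spreading) law purely for illustration, and builds the planar two-dimensional position vector for the eight-microphone layout of FIG. 3; the function names and the fallback gain law are assumptions.

```python
import numpy as np

def gain_inverse_distance(r, r_min=0.01):
    """Stand-in for the experimentally obtained gain g(r) of [0019]
    (assumption): a simple inverse-distance law, floored at r_min."""
    return 1.0 / np.maximum(r, r_min)

def position_vector_2d(omega, Pxy, mic_xy, gain=gain_inverse_distance, v=340.0):
    """Position vector a(omega, Px, Py) for a planar microphone layout:
    per-microphone gain times phase delay exp(-j*omega*R/v)."""
    R = np.linalg.norm(np.asarray(mic_xy) - np.asarray(Pxy), axis=1)
    return gain(R) * np.exp(-1j * omega * R / v)
```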
[0020]
The above function can be used even when the microphone arrays are not parallel. Utterance position estimation is performed according to the following procedure. The short-time Fourier transform of the (m, i)-th microphone input is taken, and with these values as elements the observation vector is defined as follows.
[0021]
Here, n is the index of the frame time. The correlation matrix is determined by the following equation from N consecutive observation vectors. Let the eigenvalues of this correlation matrix, arranged in descending order, and the eigenvectors corresponding to them be denoted as in the following. The number of sound sources S is then estimated by the following equation.
[0022]
The matrix Rn(ω) is defined as follows, the following is calculated over the frequency band and the utterance region, and the coordinate that maximizes the function F(Px, Py) is obtained.
[0023]
Next, the power of the sound source coming from the above coordinates is estimated by
[0024]
Two threshold values Fthr and Pthr are prepared, and when the following condition is satisfied it is determined that an utterance occurred at coordinate (Px0, Py0) during the N consecutive frame times.
The utterance position estimation process treats N consecutive frames as one block.
To make the estimation more stable, an utterance is declared only if the number of frames N is increased and/or all of Nb consecutive blocks satisfy the condition of Equation 35. The number of blocks is set arbitrarily; as it increases, the accuracy generally tends to improve.
[0025]
(Control Method of the Electric Wheelchair) A method of controlling the electric wheelchair using the utterance position estimation means of the above parallel microphone array voice input device is described below. FIG. 4 is a flowchart explaining an operation example, and FIG. 5 is an example layout of the electric wheelchair control interface displayed on the display 31. In this embodiment, the utterance region given by Equation 12 above is divided into four areas as shown in FIG. 5, and four wheelchair motions, namely forward 100, right turn 101, left turn 102, and stop 103, are assigned to the respective areas. The user takes a posture, using the upper body such as the neck, so that the position of the sound source, for example the mouth, lies in the area corresponding to the desired motion of the wheelchair, and instructs the wheelchair by emitting various sounds, not only a voice but also whistling, fricative noise, and the like.
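The mapping from an estimated utterance position to one of the four motions can be sketched as below. The patent defines the region boundaries with two functions whose equations are not reproduced in this translation, so the axis-aligned splits, the threshold values, and all names here are assumptions chosen only to illustrate the idea of FIG. 5.

```python
from enum import Enum

class Command(Enum):
    FORWARD = "forward"       # region 100
    RIGHT_TURN = "right"      # region 101
    LEFT_TURN = "left"        # region 102
    STOP = "stop"             # region 103

def region_to_command(px, py, x_split=0.0, y_front=0.30, y_back=0.0):
    """Map an estimated utterance position (px, py) to a wheelchair command.

    Illustrative boundaries (in metres, assumed): leaning the mouth forward
    selects FORWARD, to the right RIGHT_TURN, to the left LEFT_TURN, and
    pulling back selects STOP.
    """
    if py > y_front:
        return Command.FORWARD
    if py < y_back:
        return Command.STOP
    return Command.RIGHT_TURN if px > x_split else Command.LEFT_TURN
```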
[0026]
The apparatus according to the present invention controls the electric wheelchair according to the following procedure in response to the user's utterance. Step 1 (80): Data of N frames (one block) of the sound emitted by the user are input from the parallel microphone array voice input device. Step 2 (81): The user's utterance position (Px0, Py0) is determined by the above utterance position estimation means. Step 3: If the condition of Equation 16 is satisfied, it is determined that there is an utterance and the process proceeds to Step 4; otherwise it is determined that there is no utterance and the process returns to Step 1. Step 4 (83): The following procedure checks which region in FIG. 5 the utterance position (Px0, Py0) falls into. First, two functions are defined as follows.
[0027]
If the following condition is satisfied, forward motion 100 is determined. If the following condition is satisfied, right turn 101 is determined. If the following condition is satisfied, left turn 102 is determined.
[0028]
If the following condition is satisfied, stop 103 is determined. Step 5 (84): In the layout of FIG. 5 shown on the display 31, the result of the operation selection is indicated by inverting the color of the area corresponding to the motion of the motor-driven wheelchair identified from the utterance position. A control signal is then transmitted from the CPU 63 to the drive control 65, and the electric wheelchair is controlled to perform the intended motion. The process then returns to Step 1. In the electric wheelchair of the present embodiment, the control means of the present invention, composed of the CPU 68 and the storage device 64, transmits the control signal to the drive control 65, and the drive control 65 can also be controlled directly from the operation switch 66, which includes the joystick 32 and the emergency stop button 34.
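Steps 1 to 5 form a repeating control cycle. The sketch below writes that cycle with the concrete components injected as callables; all names are illustrative rather than taken from the patent, and the real pacing would come from the audio block reader.

```python
def control_loop(read_block, estimate_position, detected, classify_region,
                 display, drive):
    """Control cycle of Steps 1-5 in paragraphs [0026]-[0028] (sketch):

      Step 1  read_block()              -> one block (N frames) of array audio
      Step 2  estimate_position(block)  -> (Px0, Py0), F(P0), P(P0)
      Step 3  detected(F0, P0)          -> utterance present or not
      Step 4  classify_region(Px0, Py0) -> forward / right turn / left turn / stop
      Step 5  display(cmd), drive(cmd)  -> highlight the region on the display
                                           and send the control signal
    """
    while True:
        block = read_block()                         # Step 1
        (px, py), F0, P0 = estimate_position(block)  # Step 2
        if not detected(F0, P0):                     # Step 3: no utterance,
            continue                                 #         return to Step 1
        cmd = classify_region(px, py)                # Step 4
        display(cmd)                                 # Step 5: show the selection
        drive(cmd)                                   #         and command the motors
```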
[0029]
An experiment was performed in which a subject sat on the seat of the electric wheelchair of the present embodiment equipped with the parallel microphone array, generated fricative noise while moving the tip of the mouth so as to draw a circle twice within the microphone array, and the movement of the mouth tip was estimated with the device of the present invention. The utterance position was estimated on a grid of 1 cm intervals, with the frequency band limited to 2 to 4 kHz and the utterance region set as [cm]. The FFT frame width is 64 ms, the frame period is 12.5 ms, and the correlation matrix is obtained from data of N = 15 frames. FIG. 6 shows the result of utterance position estimation in the absence of disturbing noise in the surroundings. The illustrated region indicates the utterance region of the experimental condition, the intersections of the horizontal and vertical lines indicate the grid of the utterance position estimation, and a dot placed on the grid indicates a detected utterance position. Next, a loudspeaker was placed 1.2 m from the center of the microphone array, at 60 degrees to the right of the traveling direction and facing the array center, a disturbance sound (television sound) was emitted, and the disturbance sound alone was recorded. The disturbance sound, with its level adjusted so that the SNR of the signal received by the microphones is approximately 0 dB, was then added on the computer to the fricative noise previously recorded in the absence of disturbance, to artificially generate data as if recorded in a noisy environment. FIG. 7 shows the result of utterance position estimation from the data in the noise environment. Comparing the detection result without disturbing noise in FIG. 6 with the estimation result with disturbing noise in FIG. 7, it can be seen that, as the user intended, the trajectory of the moved mouth tip is circular. It follows that the device of the present invention can estimate the utterance position with sufficient accuracy to control the electric wheelchair. The vertical and horizontal axes in FIGS. 6 and 7 represent the distance (in cm) from the tips of the microphone arrays.
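For reference, the experimental parameters stated above can be collected as constants. The sample counts in the comments assume the 8 kHz sampling rate given as an example in paragraph [0009]; the constant names are assumptions.

```python
# Parameters of the experiment in paragraph [0029].
SAMPLE_RATE_HZ   = 8000             # example rate from paragraph [0009]
FFT_FRAME_MS     = 64               # 512 samples at 8 kHz
FRAME_PERIOD_MS  = 12.5             # 100 samples at 8 kHz
FRAMES_PER_BLOCK = 15               # N, frames per correlation matrix
BAND_HZ          = (2000, 4000)     # frequency band used for estimation
GRID_STEP_CM     = 1                # spacing of the search grid
```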
Industrial Applicability
[0030]
The invention is applicable not only to electric wheelchairs. By installing microphone arrays, it can also be used, for example, on heavy equipment such as cranes and shovels, which are noisy and complicated to operate, or on a living room sofa as an interface for operating home appliances such as televisions and video recorders. In general, it can be used in any situation that requires operation by sound and voice in an environment where various other noises, and also vibration, are present.
[0031]
FIG. 1 is an overview of a motorized wheelchair equipped with a parallel microphone array voice input device. FIG. 2 is a functional block diagram of the electric wheelchair shown in FIG. 1. FIG. 3 is a layout of the parallel microphone array voice input device. FIG. 4 is a flowchart explaining an operation example. FIG. 5 is an example layout of the electric wheelchair control interface. FIG. 6 shows the result of utterance position detection with no disturbing noise in the surroundings. FIG. 7 shows the result of utterance position detection with disturbing noise in the surroundings.
Explanation of Reference Numerals
[0032]
30a, 30b: microphone array; 30c, 30d: microphone mounting body; 31: display; 32: joystick; 33a, 33b: armrest; 34: emergency stop button; 35a, 35b: front wheel; 36a, 36b: rear wheel; 39: adjustment bar; 40: backrest; 41a, 41b: footrest; 42a, 42b: post; 61: microphone amplifier and ADC; 63: CPU; 65: drive control means; 66: operation switch; 67: drive motor