Patent Translate
Powered by EPO and Google
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to the three-dimensional reproduction of sound in a stereoscopic three-dimensional image display device, as typified by a computer graphics (hereinafter referred to as CG) display device.
2. Description of the Related Art

In the field of virtual reality (VR), technology for giving a user a VR experience by means of stereoscopic three-dimensional images created by CG or the like has been a topic for some time. In such VR experience machines, however, although the image can be displayed stereoscopically, the audio is, for example, monaural audio in the case of a personal computer, or stereo reproduction through headphones in the case of a glasses-type display; in either case only flat sound is obtained, and there is the problem that a sound-source position corresponding to the three-dimensional position of an object on the screen cannot be reproduced.
As for the prior literature, JP-A-4-56500 (H04S 1/00) merely discloses an apparatus that gives a good sound-image position equally to a large number of listeners by using a large number of speakers.
SUMMARY OF THE INVENTION

The present invention has been made in view of the above-mentioned problems of the prior art, and provides an apparatus for generating sound from the position in three-dimensional space that corresponds to the three-dimensional position of an image.
According to the present invention, there is provided a three-dimensional image display apparatus which displays an image on a monitor screen based on image information including position information and audio signal information, and sound-image control means which receives the image information from the image display apparatus and controls at least two sound generators so that the sound of an arbitrary image is emitted from the position in the sound space corresponding to the three-dimensional position of that image.
[Operation] Sounds of different directions and strengths are generated from each sound generator. For example, suppose the image displayed on the screen shows a small bird singing in the distance to the right, a puppy barking to the rear left, and a river flowing directly in front; the audio then also sounds to the user as if the bird's song comes from the front right, the dog's bark from the rear left, and the sound of the river from close in front.

[Embodiment] An embodiment of a reproducing apparatus according to the present invention will now be described with reference to the drawings.
FIG. 2 is a diagram explaining the principle of sound-image control in the two-channel (2ch) system that preceded the present invention.
In the figure, O is a sound source; KL and KR are the transfer functions from the sound source O to the left and right ears of the dummy head U; HLL, HLR, HRL and HRR are the transfer functions from the left and right speakers SL and SR to the left and right ears; EL and ER are the microphone signals at the left and right ears of the dummy head U; TL and TR are the left and right correction filters (digital filters); and Input is the speaker control signal input terminal.
Here, the audio is assumed to be monaural, and the transfer function HLL from the left speaker SL to the left ear means the time response of the left microphone of the dummy head U when a unit impulse is applied to the left speaker at time t = 0.
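As an illustration of what this transfer function means, the sketch below (Python with NumPy; the 3-tap impulse response is a hypothetical stand-in for a measured HLL) convolves a unit impulse applied to the left speaker with HLL, which yields the impulse response itself at the left ear:

```python
import numpy as np

# Hypothetical 3-tap impulse response standing in for the transfer
# function HLL (left speaker -> left ear of the dummy head U).
h_ll = np.array([0.5, 0.3, 0.1])

# A unit impulse applied to the left speaker at t = 0 ...
impulse = np.zeros(3)
impulse[0] = 1.0

# ... produces at the left microphone the time response HLL: the ear
# signal is the convolution of the speaker signal with HLL.
ear_signal = np.convolve(impulse, h_ll)[: len(h_ll)]
```

Any other speaker signal would likewise reach the ear as its convolution with HLL.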
Now, compare the state in which both ears are listening to the sound of the sound source O with the state in which they are listening to the sound from the two speakers SL and SR.
When the sound of a signal S is emitted from the position of the sound source O, the signals at the left and right ears are

EL = KL·S, ER = KR·S (Equation 1)
When the input signal S is reproduced from the two speakers SL and SR through the filters TL and TR, the signals at the left and right ears are

EL = (HLL·TL + HRL·TR)·S, ER = (HLR·TL + HRR·TR)·S (Equation 2)
Equating Equations 1 and 2 and solving for the coefficients of the filters TL and TR gives

TL = (HRR·KL - HRL·KR) / (HLL·HRR - HLR·HRL), TR = (HLL·KR - HLR·KL) / (HLL·HRR - HLR·HRL) (Equation 3)
From this, when sound is generated from the speakers SL and SR using filters TL and TR having the coefficients of Equation 3, it sounds as if a sound source were present at the position O.
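This derivation can be checked numerically. The following sketch (Python/NumPy; the transfer functions are random complex frequency responses standing in for measured data) solves the 2x2 system per frequency bin and verifies that the resulting TL and TR make the speaker paths reproduce the source-to-ear paths KL and KR:

```python
import numpy as np

rng = np.random.default_rng(0)
n_bins = 8  # toy number of frequency bins

# Random complex frequency responses standing in for the measured
# transfer functions of FIG. 2 (one value per frequency bin).
H_LL, H_LR, H_RL, H_RR, K_L, K_R = (
    rng.standard_normal(n_bins) + 1j * rng.standard_normal(n_bins)
    for _ in range(6)
)

# Per frequency bin, setting Equation 2 equal to Equation 1 gives
#   HLL*TL + HRL*TR = KL
#   HLR*TL + HRR*TR = KR
# whose closed-form solution is Equation 3:
det = H_LL * H_RR - H_LR * H_RL
T_L = (H_RR * K_L - H_RL * K_R) / det
T_R = (H_LL * K_R - H_LR * K_L) / det

# The corrected speaker paths now reproduce the source-to-ear paths.
assert np.allclose(H_LL * T_L + H_RL * T_R, K_L)
assert np.allclose(H_LR * T_L + H_RR * T_R, K_R)
```

In practice the time-domain filter coefficients would be obtained from these frequency responses, e.g. by an inverse FFT.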
Next, a system configuration of an embodiment in which the present invention is applied to CG is
shown in FIG.
In the figure, 1 is a CG display, 2 is a central processing unit (hereinafter referred to as a CPU) which creates the image signal output to the display 1, and 3 is sound-image control means which filters the audio signals received from the CPU 2 and drives the left and right speakers SL and SR.
As shown in FIG. 2, the sound-image control means consists of: coefficient setting means 31, which, based on the information signal 21 of the image sent from the CPU, selects and sets the coefficients of a filter 32 (described later) corresponding to each information signal (position information and audio signal) from a table stored in advance in a filter coefficient memory 33; an FIR (Finite Impulse Response) filter 32, which filters (performs convolution processing on) the information signal 21 using the coefficients set by the coefficient setting means; and addition circuits 34 and 35, the output of the addition circuit 34 becoming the output signal of the speaker SL and the output of the addition circuit 35 becoming the output of the speaker SR.
The information signal 21 consists of the three-dimensional position information of each image ([θ1, r1] to [θn, rn] in polar coordinates; for example, the small bird, the dog and the river in the example of the screen shown in FIG. ) and the audio signals 1 to n, which are data input together with the pixel information at the time of image creation by CG.
The coefficient setting means 31 comprises n coefficient setting means (1) to (n) in accordance with the composition of the information signal 21, and the filter 32 comprises FIR (11) and FIR (12) through FIR (n1) and FIR (n2), prepared for each pair of position information and audio signal in the information signal 21.
For each of the audio signals 1 to n, the position information is input at intervals of several tens to several hundreds of msec, and the coefficients of the filter 32 are rewritten by the coefficient setting means 31.
For example, fs = 44.1 kHz or fs = 48 kHz may be used as the sampling frequency fs of the audio signals 1 to n.
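As a worked example of these rates (the 100 ms interval is an assumed value within the stated range of several tens to several hundreds of msec):

```python
fs = 48_000        # audio sampling frequency in Hz
update_ms = 100    # assumed position-information interval in ms

# Number of audio samples processed between coefficient rewrites.
samples_per_update = fs * update_ms // 1000  # 4800 samples
```

So the FIR coefficients change only once per several thousand audio samples, which keeps the coefficient-setting load small compared with the sample-rate convolution.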
Although the filter coefficient memory 33 is used as the means for setting the coefficients of the filter 32 in the above configuration, the coefficients may instead be obtained by calculation from the position information using an arithmetic circuit.
As an example of setting the coefficients of the filter 32: first, as shown in FIG. 6, filter coefficients for a sound source O located 1 m from the dummy head U are stored in the coefficient memory 33 at every 10°, as shown in Table 1.
Then, as described below, interpolation is performed for the direction, and the amplitude is adjusted for the distance:
i) interpolation for the direction;
ii) adjustment of the amplitude for the distance.
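One way these two adjustments could be realized is sketched below (Python/NumPy; the coefficient table, the linear interpolation law and the 1/r amplitude rule are illustrative assumptions, not the patent's actual formulas):

```python
import numpy as np

# Hypothetical coefficient table: one 3-tap FIR per 10-degree step,
# stored for the reference distance of 1 m (stand-in for Table 1).
table = {deg: np.array([np.cos(np.radians(deg)), 0.5, 0.25])
         for deg in range(0, 360, 10)}

def coefficients(theta_deg, r_m):
    """Filter coefficients for direction theta (degrees) and distance r (m).

    i)  direction: linear interpolation between the two neighbouring
        10-degree table entries (assumed interpolation law);
    ii) distance: amplitude scaled by 1/r relative to the 1 m set
        (assumed distance law).
    """
    theta_deg %= 360
    lo = int(theta_deg // 10) * 10        # nearest stored angle below
    hi = (lo + 10) % 360                  # nearest stored angle above
    w = (theta_deg - lo) / 10.0           # interpolation weight
    interpolated = (1 - w) * table[lo] + w * table[hi]
    return interpolated / r_m             # 1/r amplitude adjustment
```

At an exact 10° multiple and r = 1 m this returns the stored set unchanged; doubling the distance halves the amplitude.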
Furthermore, the addition circuits 34 and 35 are simple adders. Letting the left and right filtered outputs for each sound source be Sn1 and Sn2 (n = 1 to n), the output of the addition circuit 34 be Out1 and the output of 35 be Out2, then

Out1 = S11 + S21 + ... + Sn1, Out2 = S12 + S22 + ... + Sn2
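The per-source filter bank and the two adders can be sketched as follows (Python/NumPy; the audio signals and FIR coefficients are random placeholders for the n sources and the FIR (n1)/FIR (n2) filters):

```python
import numpy as np

rng = np.random.default_rng(1)
n_src, n_samp, n_taps = 3, 16, 4   # toy sizes: 3 sources (bird, dog, river)

# Placeholder audio signals 1..n and per-source left/right FIR coefficients.
audio = rng.standard_normal((n_src, n_samp))
fir_l = rng.standard_normal((n_src, n_taps))
fir_r = rng.standard_normal((n_src, n_taps))

# Each source is convolved with its own left and right FIR ...
s1 = [np.convolve(audio[k], fir_l[k])[:n_samp] for k in range(n_src)]
s2 = [np.convolve(audio[k], fir_r[k])[:n_samp] for k in range(n_src)]

# ... and the addition circuits 34 and 35 simply sum the per-source outputs.
out1 = np.sum(s1, axis=0)   # drive signal for speaker SL
out2 = np.sum(s2, axis=0)   # drive signal for speaker SR
```

Because the adders are linear, each virtual source is positioned independently and the mixes simply superpose at the two speakers.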
Next, the operation of the sound-image control means having this configuration will be described.
First, the position information [θn, rn] and an audio signal n are received from the CPU 2 for each independent image (the small bird, the dog and the river in the above example).

Then, for each sound source, coefficients are set by the coefficient setting means from the position information; the FIR (n1) and FIR (n2) corresponding to that position information and audio signal are driven with the set coefficients, and the filtered control signals for the left and right speakers SL and SR are output.
The output speaker control signals are divided between the left and right speakers, and the addition circuit 34 or 35 performs the addition to produce the drive signal of each speaker SL, SR.
The position information and the audio signal are received from the CPU 2 at a fixed sampling cycle for each image, and therefore the outputs from the speakers SL and SR are likewise produced for each image at the same sampling cycle.
FIG. 4 shows the virtual sound images of the sound output in this way: the bird's song is heard from point A, the sound of the river from point B, and the dog's bark from point C, in agreement with the actual screen of FIG.
Since the present invention is configured as described above, sound can be reproduced as if it were heard from the position in the user's space corresponding to the position on the image of each object in a three-dimensional image such as CG, and the effect of making the VR experience more realistic can be expected.