JP2015170926

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2015170926
Abstract: PROBLEM TO BE SOLVED: To provide a technology that prevents the sound from becoming loud and unpleasant even when the listener moves out of the sweet spot during transaural reproduction. SOLUTION: A binaural acoustic signal is acquired. Crosstalk cancellation processing is performed on the binaural acoustic signal to generate a crosstalk-canceled binaural acoustic signal. The binaural acoustic signal is also delayed to generate a delayed binaural acoustic signal. If both of the listener's ears are located in the sweet spot, the crosstalk-canceled binaural acoustic signal is output; if at least one ear is outside the sweet spot, the delayed binaural acoustic signal is output. [Selected figure] Figure 1
Sound reproduction apparatus and sound reproduction method
[0001]
The present invention relates to sound reproduction technology.
[0002]
A signal recorded by a dummy head microphone, or a signal obtained by convolving a head-related transfer function (HRTF) for the sound source direction, is called a binaural signal.
Three-dimensional sound reproduction technology reproduces the state in which a human listens to sound with both ears by playing back a binaural signal, thereby reproducing a realistic three-dimensional sound field. It includes binaural reproduction technology, which uses headphones as the reproduction equipment, and transaural reproduction technology, which uses speakers.
[0003]
In binaural reproduction, there is a problem that a sound image that should be in front is localized inside the head.
In transaural reproduction this problem is solved, but in order to deliver separate signals to each of the listener's ears, processing is needed to eliminate the effects of the transfer functions between the multiple speakers used for reproduction and both ears. In particular, because the output signals of the plurality of speakers mix in the transmission paths to the ears and strong crosstalk occurs, this processing is called crosstalk cancellation, since it cancels that crosstalk.
[0004]
In transaural reproduction technology, the listener cannot correctly perceive stereophonic sound unless crosstalk cancellation is performed correctly. To perform crosstalk cancellation correctly, the phases of the signals output from the speakers used for reproduction must match exactly at the listening position. Therefore, although it depends on the configuration and arrangement of the speakers used for reproduction, there is a problem that the region where the listener can perceive the three-dimensional sound effect, the so-called sweet spot, is narrow. Consequently, if the listener moves even a little, the stereophonic sound can no longer be heard.
[0005]
To solve this problem, there is prior art in which the position of the listener is continuously detected and the crosstalk cancellation processing is adjusted according to that position. For example, in the technique disclosed in Patent Document 1, the position of the listener is continuously detected, and the transfer function used for the cancellation processing is retrieved from a transfer function database based on that position and applied to the processing. Thereby, a stereophonic sound effect can be obtained regardless of the position of the listener.
[0006]
Further, in the technology disclosed in Patent Document 2, the frequency at which a phase shift occurs due to the position shift is calculated, the signal is divided into a low band and a high band at that frequency, and the high-band component is delayed. As a result, sound image localization based only on the low-frequency component is emphasized by the Haas effect, which stabilizes the localization of the virtual sound image.
[0007]
In addition, such three-dimensional sound technology is commonly applied as virtual surround reproduction technology, which virtually reproduces a surround sound signal with fewer speakers than the number of channels.
[0008]
JP 2000-295698 A, JP 2009-171144 A
[0009]
In transaural reproduction, when the listening position deviates from the sweet spot, there is a problem that not only can stereophonic sound no longer be heard, but the sound also feels extremely loud.
FIG. 15 shows the frequency characteristics of a two-channel crosstalk cancellation filter for transaural reproduction when the two speakers are placed 5 degrees to the left and right of the listener's front direction.
[0010]
Since the crosstalk cancellation filter corrects the interference between a plurality of speakers, a strong peak appears in the high-frequency region, as shown in the figure.
In two-channel transaural reproduction, the frequency at which this peak occurs is determined by the difference between the path lengths from the two speakers to each ear. In the example of FIG. 15, a peak occurs near 10 kHz. Assuming a sound velocity of 340 m/s, the wavelength at this frequency is about 5.7 cm. Therefore, when the path difference from the two speakers to an ear deviates from the reference path difference by about 2.8 cm, half of this wavelength, the strong peak may be heard by the listener and may be unpleasantly loud.
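For reference, the relationship underlying this half-wavelength criterion can be written explicitly (a restatement of the reasoning above, with sound velocity c and filter peak frequency f_p):

```latex
\lambda = \frac{c}{f_p}, \qquad \Delta d_{\max} = \frac{\lambda}{2}
```

where Δd_max is the allowable deviation of the path difference before the boosted peak component arrives at the ear in opposite phase.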
[0011]
Also, crosstalk cancellation is fundamentally a process of canceling the signals from both speakers at the sweet spot. Therefore, when the volume has been adjusted appropriately at the sweet spot, there is a problem that uncanceled residual components appear when the head moves away from the sweet spot, and the overall volume itself becomes large.
[0012]
According to the technique disclosed in Patent Document 1, when the listener moves, crosstalk cancellation can still be performed correctly by designing the crosstalk cancellation filter using a transfer function corresponding to the new position. However, with such a technique there is a time lag between the detection of head movement and the filter design, and during that time the listener will perceive the sound as loud. In addition, there is a limit to the range over which the position of the listener can be tracked.
[0013]
The present invention has been made in view of such problems, and provides a technique for preventing the sound from becoming loud and unpleasant even when the listener moves out of the sweet spot during transaural reproduction.
[0014]
One aspect of the present invention comprises: acquisition means for acquiring a binaural acoustic signal; processing means for performing crosstalk cancellation processing on the binaural acoustic signal to generate a crosstalk-canceled binaural acoustic signal; delay means for delaying the binaural acoustic signal by the time required for the crosstalk cancellation processing to generate a delayed binaural acoustic signal; determination means for determining whether or not both of the listener's ears are positioned in a sweet spot; and output means for outputting the crosstalk-canceled binaural acoustic signal if both ears are located in the sweet spot, and outputting the delayed binaural acoustic signal if at least one of the ears is located outside the sweet spot.
[0015]
According to the configuration of the present invention, even when the listener moves out of the sweet spot during transaural reproduction, the sound can be prevented from becoming loud and unpleasant.
[0016]
FIG. 1 is a block diagram showing a configuration example of a sound reproduction device. FIG. 2 is a flowchart of the processing performed by the sound reproduction apparatus. FIG. 3 is a flowchart showing the details of the processing in step S3. FIG. 4 is a diagram for explaining the processing in step S3. FIGS. 5, 6, and 7 are block diagrams each showing a configuration example of a sound reproduction device. FIG. 8 is a flowchart of the processing performed by the sound reproduction apparatus. FIG. 9 is a block diagram showing a configuration example of a sound reproduction device. FIG. 10 is a flowchart of the processing performed by the sound reproduction apparatus. FIG. 11 is a block diagram showing a configuration example of a sound reproduction device. FIGS. 12 and 13 are flowcharts of the processing performed by the sound reproduction apparatus. FIG. 14 is a block diagram showing a configuration example of a sound reproduction device. FIG. 15 is a diagram showing the frequency characteristics of a two-channel crosstalk cancellation filter. FIG. 16 is a diagram explaining general crosstalk cancellation processing.
[0017]
Hereinafter, preferred embodiments of the present invention will be described with reference to the accompanying drawings. The embodiments described below are examples of specific implementations of the present invention, and are specific examples of the configurations described in the claims.
[0018]
First Embodiment: First, a configuration example of a sound reproduction device according to the present embodiment will be described using the block diagram of FIG. 1. Note that the configuration shown in FIG. 1 is only one example of a configuration capable of realizing each process described below, and any configuration may be adopted as long as it can realize the processing described below.
[0019]
The dummy head microphone 1 picks up (acquires) an acoustic signal for one ear and an acoustic signal for the other ear in which the sound wraparound due to the head is naturally convolved, converts the collected binaural acoustic signals into analog electrical signals, and outputs them.
[0020]
The microphone amplifiers 2a and 2b are a microphone amplifier for one ear of the listener and
a microphone amplifier for the other ear.
The microphone amplifiers 2a and 2b appropriately amplify and output a weak acoustic signal
for one ear and a weak acoustic signal for the other ear output from the dummy head
microphone 1, respectively.
[0021]
The ADCs (A / D converters) 3a and 3b convert analog binaural acoustic signals amplified by the
microphone amplifiers 2a and 2b into digital binaural acoustic signals and output the digital
binaural acoustic signals.
[0022]
The crosstalk canceller 5 performs crosstalk cancellation processing on the binaural acoustic
signals output from the ADCs 3a and 3b, and generates and outputs a crosstalk cancellation
processed binaural acoustic signal.
[0023]
The delay unit 4 delays the binaural acoustic signal output from the ADCs 3a and 3b by the time
required for the crosstalk cancellation process to generate a delayed binaural acoustic signal.
[0024]
The video camera 12 captures a moving image of the area around the sound reproducing apparatus according to the present embodiment, including the sweet spot (indicated by ロ), and the captured image of each frame is sequentially sent to the listener state detector 7.
[0025]
The listener state detector 7 analyzes the image of each frame sent from the video camera 12 and, if the listener (indicated by "A") appears in the image, estimates the positions of both of the listener's ears.
[0026]
The sweet spot determiner 6 determines whether or not both of the listener's ears are located in the sweet spot from the positions of both ears estimated by the listener state detector 7.
When the sweet spot determiner 6 determines that both of the listener's ears are located in the sweet spot, it instructs the output signal switches 8a and 8b to select the crosstalk-canceled binaural acoustic signal.
On the other hand, when the sweet spot determiner 6 determines that at least one of the listener's ears is located outside the sweet spot, it instructs the output signal switches 8a and 8b to select the delayed binaural acoustic signal.
[0027]
The output signal switches 8a and 8b select either the output from the delay unit 4 or the output from the crosstalk canceller 5 in accordance with the instruction from the sweet spot determiner 6.
[0028]
That is, when the sweet spot determiner 6 instructs the output signal switches 8a and 8b to select the crosstalk-canceled binaural acoustic signal, each switch selects and outputs the crosstalk-canceled binaural acoustic signal (the signal for one speaker and the signal for the other speaker).
On the other hand, when the sweet spot determiner 6 instructs the output signal switches 8a and 8b to select the delayed binaural acoustic signal, each switch selects and outputs the delayed binaural acoustic signal (the signal for one speaker and the signal for the other speaker).
[0029]
The DACs (D / A converters) 9a and 9b convert digital acoustic signals output from the output
signal switches 8a and 8b into analog acoustic signals and output the analog acoustic signals.
The amplifiers 10a and 10b appropriately amplify analog acoustic signals output from the DACs
9a and 9b, respectively, and then output the amplified signals.
The speakers 11a and 11b output sounds based on the analog sound signals output from the
amplifiers 10a and 10b, respectively.
[0030]
The operation unit 13 is an input interface, such as a touch panel screen, hard keys, a keyboard, or a mouse, that the user can operate to input various instructions to the apparatus.
The controller 14 controls the operation of each part constituting the sound reproduction apparatus according to the present embodiment.
[0031]
Next, the processing performed by the sound reproducing apparatus to switch between outputting the sound based on the delayed binaural acoustic signal and the sound based on the crosstalk-canceled binaural acoustic signal, based on the image captured by the video camera 12, will be described using the flowchart of FIG. 2.
[0032]
<Step S1> The video camera 12 captures a moving image of the surrounding area including the sweet spot (indicated by ロ), and sequentially sends the captured image of each frame to the listener state detector 7.
[0033]
In addition, the dummy head microphone 1 picks up the acoustic signal for one ear and the acoustic signal for the other ear in which the sound wraparound due to the head is naturally convolved, and converts the collected binaural acoustic signals into analog electrical signals.
The microphone amplifiers 2a and 2b appropriately amplify and output the weak acoustic signals for one ear and for the other ear output from the dummy head microphone 1, respectively.
The ADCs 3a and 3b convert the analog binaural acoustic signals amplified by the microphone amplifiers 2a and 2b into digital binaural acoustic signals and output them.
[0034]
<Step S2> The crosstalk cancellation unit 5 performs crosstalk cancellation processing on the
binaural acoustic signals output from the ADCs 3a and 3b, and generates and outputs a crosstalk
cancellation processed binaural acoustic signal.
[0035]
On the other hand, the delay unit 4 delays the binaural acoustic signal output from the ADCs 3a
and 3b by the time required for the crosstalk cancellation process to generate and output a
delayed binaural acoustic signal.
[0036]
Here, general crosstalk cancellation processing will be described with reference to FIG.
FIG. 16 is a schematic diagram for explaining general crosstalk cancellation processing in a case
where two speakers are used, that is, in a two-channel reproduction environment.
[0037]
In a two-channel reproduction environment, there are a total of four sound transmission paths between the two left and right speakers and the listener's ears (the listener is indicated by "A").
As shown in FIG. 16, let the transfer function from the left speaker 16b to the left ear be HLL, and the transfer function from the left speaker 16b to the right ear be HLR. Likewise, let the transfer function from the right speaker 16a to the left ear be HRL, and the transfer function from the right speaker 16a to the right ear be HRR. When sound is reproduced directly from the speakers 16a and 16b without crosstalk cancellation processing, the relationship shown in the following equation (1) holds between the input signals (Lin, Rin) to the left and right speakers (16b, 16a) and the listening signals (Lear, Rear) reaching the listener's left and right ears.
[0038]
[0039]
Here, let A shown in the following equation (2) be a transfer function matrix.
[0040]
[0041]
Since the crosstalk cancellation processing is processing for making the listening signals identical to the input signals, the filter may be designed as the inverse matrix X of the transfer function matrix A corresponding to the reproduction environment, as shown in the following equation (3).
[0042]
[0043]
When both sides of the above equation (1) are multiplied from the left by the inverse matrix X, the input signals and the listening signals become identical, as shown in the following equation (4).
[0044]
[0045]
Therefore, the crosstalk cancellation process can be accurately performed by designing the filters
X1, X2, X3, and X4 in the crosstalk canceller 5 shown in FIG. 16 so as to satisfy the transfer
function of the equation (3).
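The images of equations (1) through (4) are not reproduced in this translation. A plausible reconstruction from the definitions above is the following (the ordering of the matrix entries is inferred from the transfer-function definitions and is an assumption):

```latex
\begin{pmatrix} L_{ear} \\ R_{ear} \end{pmatrix}
 = \begin{pmatrix} H_{LL} & H_{RL} \\ H_{LR} & H_{RR} \end{pmatrix}
   \begin{pmatrix} L_{in} \\ R_{in} \end{pmatrix} \quad (1)
\qquad
A = \begin{pmatrix} H_{LL} & H_{RL} \\ H_{LR} & H_{RR} \end{pmatrix} \quad (2)
\qquad
X = A^{-1} \quad (3)
\qquad
A\,X \begin{pmatrix} L_{in} \\ R_{in} \end{pmatrix}
 = \begin{pmatrix} L_{in} \\ R_{in} \end{pmatrix} \quad (4)
```

Equation (4) expresses that, with the filter X applied, the listening signals coincide with the input signals; whether the original writes the product as AX or XA is not recoverable from the translation, but both products are the identity since X = A⁻¹.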
[0046]
In the present embodiment, the case where the output has two channels has been described, but the crosstalk cancellation filter can be designed in the same manner even when the output has three or more channels.
In that case, since the solution of the inverse filter is underdetermined, it can be obtained by using, for example, the Moore-Penrose generalized inverse matrix, which gives the minimum-norm solution.
Since such processing is common and well known in this field, its detailed description is omitted.
[0047]
<Step S3> The listener state detector 7 analyzes the image of each frame sent from the video camera 12, and if the listener appears in the image, estimates the positions of both of the listener's ears.
Details of the processing in this step will be described later using the flowchart of FIG. 3.
[0048]
<Step S4> The sweet spot determiner 6 calculates, for each of the listener's ears, the distance to each of the speakers 11a and 11b, using the positions of both ears estimated by the listener state detector 7 and the positions of the speakers 11a and 11b stored in advance in a memory managed by the sweet spot determiner 6.
That is, the sweet spot determiner 6 obtains the distance between the left ear and the speaker 11a, the distance between the left ear and the speaker 11b, the distance between the right ear and the speaker 11a, and the distance between the right ear and the speaker 11b.
[0049]
<Step S5> The sweet spot determiner 6 calculates the absolute value of the difference between the distance from the left ear to the speaker 11a and the distance from the left ear to the speaker 11b (the left path difference), and the absolute value of the difference between the distance from the right ear to the speaker 11a and the distance from the right ear to the speaker 11b (the right path difference).
[0050]
For example, assuming that the distance between the left ear and the speaker 11b is Lsl, and the
distance between the left ear and the speaker 11a is Lsr, the path difference Dle for the left ear is
calculated according to the following equation (5).
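The image of equation (5) is not reproduced here; from the definitions above, it is presumably the absolute difference of the two distances:

```latex
D_{le} = \left| L_{sl} - L_{sr} \right| \qquad (5)
```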
[0051]
[0052]
<Step S6> The sweet spot determiner 6 calculates how far each of the left path difference and the right path difference deviates from the prescribed path difference at the reference listening position, and determines whether each calculated deviation is within a prescribed allowable range.
In the present embodiment, the reference listening position is predetermined as a point in front of the left and right speakers, on the center line equidistant from them.
[0053]
In crosstalk cancellation technology, the phases of the signals from the left and right speakers must match, so cancellation does not work well if the path difference deviates.
In particular, at the frequency where the left and right signals interfere, the filter is designed to boost that frequency component, so the sound becomes very loud when the phase at this frequency shifts by half a wavelength.
Therefore, in the present embodiment, the path-difference deviation corresponding to a half-wavelength shift at this frequency is set as the threshold value.
For example, in the present embodiment, assuming that the speakers are installed in the ±5° directions, the characteristic of the crosstalk cancellation filter is as shown in FIG. 15, and a peak occurs in the vicinity of 10 kHz.
Assuming a sound velocity of 340 m/s, the wavelength at this frequency is about 5.7 cm.
Therefore, when the path difference from the two speakers to an ear deviates from the reference path difference by about 2.8 cm, half of this wavelength, the strong peak may be heard by the listener and may be unpleasantly loud.
This threshold is therefore set to 2.8 cm. If the deviation of the path difference is within this threshold, the listener is in a range where stereophonic sound can be heard. On the other hand, if it exceeds the threshold, the listener is in a region where not only can stereophonic sound not be heard, but the sound is also very loud and unpleasant. In this way, it is possible to determine whether the listener is within the sweet spot, that is, within a range where three-dimensional sound can be heard and the sound is not unpleasantly loud.
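As an illustration only, the decision made in steps S4 to S6 could be sketched as follows (a minimal sketch assuming 2-D coordinates in centimeters; the function and variable names are hypothetical and not taken from the patent):

```python
import math

THRESHOLD_CM = 2.8  # half wavelength of the filter peak, as described above


def path_difference(ear, spk_a, spk_b):
    """Absolute difference of the distances from one ear to the two speakers."""
    return abs(math.dist(ear, spk_a) - math.dist(ear, spk_b))


def in_sweet_spot(left_ear, right_ear, spk_a, spk_b, ref_left, ref_right):
    """True if both path differences stay within the allowable deviation
    from the reference path differences (all lengths in cm)."""
    dev_left = abs(path_difference(left_ear, spk_a, spk_b) - ref_left)
    dev_right = abs(path_difference(right_ear, spk_a, spk_b) - ref_right)
    return dev_left <= THRESHOLD_CM and dev_right <= THRESHOLD_CM


# Example: speakers on the Y axis, reference head centered 100 cm in front
spk_a, spk_b = (0.0, 30.0), (0.0, -30.0)
ref_left_ear, ref_right_ear = (100.0, 8.0), (100.0, -8.0)
ref_l = path_difference(ref_left_ear, spk_a, spk_b)
ref_r = path_difference(ref_right_ear, spk_a, spk_b)
# Listener shifted 2 cm sideways: still inside the sweet spot?
print(in_sweet_spot((100.0, 10.0), (100.0, -6.0), spk_a, spk_b, ref_l, ref_r))
```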
[0054]
As a result of the above determination, if both the left path difference and the right path difference are within the allowable range of deviation from the prescribed path differences, the process proceeds to step S8; if at least one of the left path difference and the right path difference is outside the allowable range, the process proceeds to step S7.
[0055]
<Step S7> The sweet spot determiner 6 instructs the output signal switches 8a and 8b to select the delayed binaural acoustic signal.
The output signal switches 8a and 8b each select and output the delayed binaural acoustic signal.
[0056]
The DACs 9a and 9b convert the digital acoustic signals output from the output signal switches 8a and 8b into analog acoustic signals, and the amplifiers 10a and 10b appropriately amplify the analog acoustic signals output from the DACs 9a and 9b, respectively, and then output them. The speakers 11a and 11b output sounds based on the analog acoustic signals output from the amplifiers 10a and 10b, respectively.
[0057]
<Step S8> The sweet spot determiner 6 instructs the output signal switches 8a and 8b to select the crosstalk-canceled binaural acoustic signal. The output signal switches 8a and 8b each select and output the crosstalk-canceled binaural acoustic signal.
[0058]
The DACs 9a and 9b convert the digital acoustic signals output from the output signal switches 8a and 8b into analog acoustic signals, and the amplifiers 10a and 10b appropriately amplify the analog acoustic signals output from the DACs 9a and 9b, respectively, and then output them. The speakers 11a and 11b output sounds based on the analog acoustic signals output from the amplifiers 10a and 10b, respectively.
[0059]
<Step S9> The controller 14 determines whether the termination condition of the processing according to the flowchart of FIG. 2 is satisfied. For example, when the controller 14 detects that the user has operated the operation unit 13 to input a processing end instruction, it determines that the termination condition is satisfied. When the controller 14 determines that the termination condition is satisfied, the processing according to the flowchart of FIG. 2 ends; when it determines that the condition is not satisfied, the processing returns to step S1.
[0060]
Next, the processing in step S3 described above, in which the positions of the listener's ears are estimated from the image of each frame sent from the video camera 12 when the listener appears in the image, will be described with reference to the flowchart of FIG. 3. In the following, for convenience of explanation, only the processing for calculating the position and orientation in the horizontal plane will be described. Also, the flowchart of FIG. 3 shows the processing for the image of one frame; in practice, the processing of the flowchart of FIG. 3 is performed for the image of each frame sent from the video camera 12.
[0061]
<Step S101> The listener state detector 7 detects the area occupied by the listener's face in the image (target image) sent from the video camera 12. In this embodiment, instead of recognizing the face of a specific person, the detector simply detects whether there is an area judged to be a human face. Techniques for recognizing a face in an image and detecting the area it occupies are well known, so their description is omitted.
[0062]
<Step S102> The listener state detector 7 controls the video camera 12 to focus on the face in
the area detected in step S101. Since this process is common in the camera field and known, the
description of this technique is omitted.
[0063]
<Step S103> The listener state detector 7 obtains the distance L from the principal point of the camera to the face of the listener, who is the subject, using the focal length f of the lens of the video camera 12 and the distance a from the principal point to the imaging plane. The focal length f and the distance a from the principal point to the imaging plane are transmitted to the listener state detector 7 together with the target image as parameters of the video camera 12.
According to the lens formula, the distance L between the principal point of the video camera 12 and the face of the listener can be determined by solving the following equation (6).
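The equation image is not reproduced in this translation; assuming the standard thin-lens formula referred to above, equation (6) is presumably:

```latex
\frac{1}{f} = \frac{1}{a} + \frac{1}{L}
\quad\Longrightarrow\quad
L = \frac{a f}{a - f} \qquad (6)
```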
[0064]
[0065]
<Step S104> The listener state detector 7 obtains the angle θb, in the horizontal plane, between the front direction of the video camera 12 and the direction of the listener, using the angle of view of the video camera 12 and the horizontal position of the listener in the target image.
The width d of the imaging surface and the focal length f are transmitted to the listener state detector 7 together with the target image as parameters of the video camera 12. First, the horizontal angle of view α is obtained by calculating the following equation (7).
[0066]
[0067]
Next, the angle θb between the front direction of the video camera 12 and the direction of the listener is calculated by the following equation (8), using the horizontal angle of view α calculated with equation (7), the horizontal pixel distance p between the central pixel position of the area detected in step S101 and the central pixel position of the target image, and the number of horizontal pixels H of the target image.
[0068]
[0069]
The horizontal pixel distance p is positive when the central pixel position of the area detected in
step S101 is on the left side of the central pixel position of the target image, and negative when it
is on the right side.
Therefore, θb is also a positive value when the center pixel position of the area detected in step
S101 is on the left side of the center pixel position of the target image, and a negative value
when it is on the right side.
In the example illustrated in FIG. 4, θb has a negative value.
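The images of equations (7) and (8) are not reproduced. Under a simple pinhole-camera model with the quantities defined above, a plausible reconstruction is the following (the exact form of (8) in the original cannot be recovered; a small-angle approximation θb ≈ αp/H is equally possible):

```latex
\alpha = 2\arctan\!\left(\frac{d}{2f}\right) \qquad (7)
\qquad\qquad
\theta_b = \arctan\!\left(\frac{2p}{H}\tan\frac{\alpha}{2}\right) \qquad (8)
```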
[0070]
<Step S105> The listener state detector 7 obtains the position coordinates of the listener using the distance L obtained in step S103 and the angle θb obtained in step S104.
First, the coordinates handled in the present embodiment are defined. As shown in FIG. 4, the Y axis is set on the straight line connecting the two speakers, and the video camera 12 is installed on the Y axis. The position of the video camera 12 is set as the origin, and the X axis is set perpendicular to the Y axis, with the listener side as the positive direction. The angle θa between the X axis and the imaging direction of the video camera 12 is a predetermined angle and is stored in advance in a memory managed by the listener state detector 7. In this coordinate system, the angle between the X axis and the direction of the listener is θa + θb, so the coordinates (Lx, Ly) of the listener can be obtained by calculating the following equation (9).
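The image of equation (9) is not reproduced; given the coordinate system just defined, it is presumably:

```latex
L_x = L \cos(\theta_a + \theta_b), \qquad L_y = L \sin(\theta_a + \theta_b) \qquad (9)
```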
[0071]
[0072]
<Step S106> The listener state detector 7 detects the face orientation θc of the listener in the target image.
Techniques for detecting the orientation of a face in an image are well known, so their description is omitted.
[0073]
In the present embodiment, when the listener is facing left from the front with respect to the
video camera 12, θc is a positive value, and θc is a negative value when the listener is facing
the right. In the example illustrated in FIG. 4, θc has a negative value.
[0074]
<Step S107> The listener state detector 7 obtains the orientation θ of the listener's face with respect to the above-mentioned coordinate system, using θa, θb, and the θc obtained in step S106. From FIG. 4, θ can be obtained by calculating the following equation (10).
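The image of equation (10) is not reproduced. Given the angle definitions above, it is presumably the sum of the three angles (a possible constant offset, depending on how the face direction is drawn in FIG. 4, cannot be ruled out):

```latex
\theta = \theta_a + \theta_b + \theta_c \qquad (10)
```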
[0075]
[0076]
<Step S108> The listener state detector 7 obtains the coordinates of both of the listener's ears using the coordinates of the listener determined in step S105 and the face orientation θ determined in step S107.
As illustrated in FIG. 4, assuming that the horizontal cross section of a human head is a circle with a diameter of 16 cm, and that the left and right ear holes are at ±90° with respect to the front of the face, the coordinates of the left ear (Elx, Ely) and the right ear (Erx, Ery) can be obtained by calculating the following equation (11).
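The image of equation (11) is not reproduced; with a head radius of r = 8 cm and ear holes at ±90° from the face direction θ, a plausible reconstruction is:

```latex
\begin{aligned}
E_{lx} &= L_x + r\cos(\theta + 90^\circ), & E_{ly} &= L_y + r\sin(\theta + 90^\circ),\\
E_{rx} &= L_x + r\cos(\theta - 90^\circ), & E_{ry} &= L_y + r\sin(\theta - 90^\circ),
\end{aligned}
\qquad r = 8\ \mathrm{cm} \qquad (11)
```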
[0077]
[0078]
As described above, according to the present embodiment, the positions of the listener's ears are constantly monitored, and when it is detected that at least one of the ears is out of the sweet spot, the output is switched to a binaural signal that has not undergone crosstalk cancellation processing, which can prevent the listener from perceiving the sound as loud.
[0079]
<Modification 1> In the first embodiment, an example has been described in which the binaural signal recorded using the dummy head microphone is split into two paths, and the output is switched between the delayed binaural signal and the crosstalk-canceled signal.
[0080]
However, for example, as shown in FIG. 5, stereophonic recording may be performed simultaneously by the microphones 15a and 15b, and the same effect can be obtained by switching between the delayed stereo sound signal and the crosstalk-canceled binaural acoustic signal depending on whether the listener is in the sweet spot.
[0081]
In FIG. 5, the microphones 15a and 15b respectively collect sounds for the left and right
channels and output corresponding stereo sound signals.
The microphone amplifiers 2c and 2d are the same as the microphone amplifiers 2a and 2b,
respectively, and the ADCs 3c and 3d are the same as the ADCs 3a and 3b, respectively.
[0082]
The delay unit 4 delays the stereo sound signals output from the ADCs 3c and 3d by the time required for the crosstalk cancellation processing to generate delayed stereo sound signals.
[0083]
If the sweet spot determiner 6 determines that both of the listener's ears are located in the sweet spot, it instructs the output signal switches 8a and 8b to select the crosstalk-canceled binaural acoustic signal.
On the other hand, if the sweet spot determiner 6 determines that at least one of the listener's ears is located outside the sweet spot, it instructs the output signal switches 8a and 8b to select the delayed stereo sound signal.
[0084]
Each of the output signal switches 8a and 8b selects and outputs the crosstalk-canceled binaural acoustic signal when the sweet spot determiner 6 instructs it to select the crosstalk-canceled binaural acoustic signal.
On the other hand, each of the output signal switches 8a and 8b selects and outputs the delayed stereo sound signal when the sweet spot determiner 6 instructs it to select the delayed stereo sound signal.
[0085]
<Modification 2> Further, as shown in FIG. 6, a configuration may be adopted in which the outputs of the ADCs 3a and 3b are temporarily stored as data in the storage unit 22, and the extractor 23 then reads this data from the storage unit 22 at an arbitrary timing and supplies it to the delay unit 4 and the crosstalk canceller 5.
The storage unit 22 is a suitable memory such as a hard disk drive or a RAM.
Also in such a configuration, the output signal can be switched depending on whether the listener is in the sweet spot, and the same effect can be obtained.
[0086]
Second Embodiment: In this embodiment, an example will be described in which the output is switched between a virtual surround signal and a stereo downmix signal when performing surround sound reproduction. A configuration example of the sound reproduction device according to the present embodiment will be described using the block diagram of FIG. 7. In the following, differences from the first embodiment are mainly described, and everything else is the same as in the first embodiment unless otherwise specified.
[0087]
The stereo downmixer 31 mixes the 5.1-channel surround sound signal, which is the input signal, using the input downmix coefficients, converts it into stereo downmix signals (a stereo downmix signal L for the left channel and a stereo downmix signal R for the right channel), and outputs them.
[0088]
The virtual surround signal generator 32 convolves a head-related transfer function (HRTF) for the direction corresponding to the standard speaker arrangement of each channel with each channel signal, except the LFE, of the input 5.1-channel surround sound signal.
The virtual surround signal generator 32 then mixes in the LFE to convert the result into a binaural signal, generating and outputting virtual surround signals (a virtual surround signal for the left ear and a virtual surround signal for the right ear).
[0089]
The delay unit 4 delays each stereo downmix signal output from the stereo downmixer 31 by the
time required for the crosstalk cancellation process to generate a delayed stereo downmix signal.
[0090]
The crosstalk canceller 5 performs crosstalk cancellation processing on the virtual surround signals output from the virtual surround signal generator 32, and generates and outputs crosstalk-canceled virtual surround signals.
[0091]
If the sweet spot determiner 6 determines that both of the listener's ears are located in the sweet spot, it instructs the output signal switches 8a and 8b to select the crosstalk-canceled virtual surround signal.
On the other hand, if the sweet spot determiner 6 determines that at least one of the listener's ears is located outside the sweet spot, it instructs the output signal switches 8a and 8b to select the delayed stereo downmix signal.
[0092]
Each of the output signal switches 8a and 8b selects and outputs the crosstalk-canceled virtual surround signal when the sweet spot determiner 6 instructs it to select the crosstalk-canceled virtual surround signal.
On the other hand, each of the output signal switches 8a and 8b selects and outputs the delayed stereo downmix signal when the sweet spot determiner 6 instructs it to select the delayed stereo downmix signal.
[0093]
Next, a process performed by the sound reproduction apparatus according to the present
embodiment will be described using FIG. 8 showing a flowchart of the process.
[0094]
<Step S201> The stereo downmixer 31 mixes the 5.1-channel surround sound signal, which is the input signal, using the input downmix coefficients, converts it into a stereo downmix signal L for the left channel and a stereo downmix signal R for the right channel, and outputs them.
This processing is performed according to the following equation (12).
[0095]
[0096]
Here, kc is the downmix coefficient for the center channel signal and ks is the downmix coefficient for the surround channel signals; a value such as 1/√2 or 0.5 is usually used.
Also, kLFE is the downmix coefficient for the LFE, which is usually 0 in many cases. These coefficients are specified by the content producer or content distributor and transmitted together with the surround sound signal.
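The image of equation (12) is not reproduced; a standard 5.1-to-stereo downmix consistent with the coefficients described above would be the following (the channel naming FL, FR, C, SL, SR, LFE is an assumption):

```latex
\begin{aligned}
L &= FL + k_c\,C + k_s\,SL + k_{LFE}\,LFE\\
R &= FR + k_c\,C + k_s\,SR + k_{LFE}\,LFE
\end{aligned} \qquad (12)
```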
[0097]
<Step S202> The virtual surround signal generator 32 generates a virtual surround signal for the left ear and a virtual surround signal for the right ear from the five channel signals, excluding the LFE, of the input 5.1-channel surround sound signal. First, for each channel signal to be processed, a binaural signal is created by convolving the head-related transfer functions for the direction of that channel in the standard speaker arrangement. Next, for each ear, the signals created for the five channels are added together to generate the virtual surround signal for that ear.
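As an illustration, the per-channel HRTF convolution and summation described above could be sketched as follows (a minimal sketch; the HRTF data, sampling, and channel layout are hypothetical and not part of the patent):

```python
import numpy as np

def virtual_surround(channels, hrtfs):
    """channels: dict of channel name -> mono signal (np.ndarray), excluding LFE.
    hrtfs: dict of channel name -> (left-ear impulse response, right-ear impulse response).
    Returns (left_ear, right_ear) binaural virtual surround signals."""
    max_ir = max(len(h) for pair in hrtfs.values() for h in pair)
    length = max(len(x) for x in channels.values()) + max_ir - 1
    left = np.zeros(length)
    right = np.zeros(length)
    for name, sig in channels.items():
        h_l, h_r = hrtfs[name]
        yl = np.convolve(sig, h_l)   # binaural signal for this channel, left ear
        yr = np.convolve(sig, h_r)   # binaural signal for this channel, right ear
        left[:len(yl)] += yl         # sum over the five channels per ear
        right[:len(yr)] += yr
    return left, right
```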
[0098]
The processing in steps S2 to S6 is as described above. Since the virtual surround signal is also a kind of binaural acoustic signal, crosstalk cancellation processing is necessary for transaural reproduction. As a result, the sweet spot is similarly limited, and the phenomenon that the listener perceives the sound as loud when moving out of the sweet spot occurs in the same way. Therefore, in this case as well, it is detected that the listener has moved out of the sweet spot, and the output is switched to the downmix signal without crosstalk cancellation.
[0099]
If it is determined in step S6 that both the left path difference and the right path difference are within the allowable range of deviation from the prescribed path differences, the process proceeds to step S8; if at least one of them is outside the allowable range, the process proceeds to step S203.
[0100]
<Step S203> The sweet spot determiner 6 instructs the output signal switches 8a and 8b to select the delayed stereo downmix signal.
The output signal switches 8a and 8b each select and output the delayed stereo downmix signal.
[0101]
As described above, according to the present embodiment, when the listener deviates from the sweet spot, the output is switched to the stereo downmix signal, so that the sound can be prevented from being perceived as loud.
[0102]
Third Embodiment: In this embodiment, in addition to the first embodiment, the delayed binaural acoustic signal is output when a plurality of listeners are detected from the image captured by the video camera 12.
A configuration example of the sound reproduction device according to the present embodiment will be described using the block diagram of FIG. 9. The configuration itself is the same as that of the sound reproducing apparatus according to the first embodiment shown in FIG. 1, but this embodiment differs from the first embodiment in that another listener (indicated by "ha") is present within the imaging range of the video camera 12. In the following, differences from the first embodiment are mainly described, and everything else is the same as in the first embodiment unless otherwise specified.
[0103]
A process performed by the sound reproduction apparatus according to the present embodiment
will be described with reference to FIG. 10 showing a flowchart of the process.
[0104]
<Step S301> The listener state detector 7 detects the areas occupied by human faces in the image (target image) sent from the video camera 12, and counts the number of detected areas (the number of people).
Since this technology is well known, its description is omitted. In the present embodiment, all human faces recognized in the target image are regarded as listeners' faces, and their number is counted.
[0105]
<Step S302> The listener state detector 7 determines whether the number of areas counted in step S301 is one. As a result of this determination, if the number of areas counted in step S301 is other than one (zero, or two or more), the process proceeds to step S7; if it is one, the process proceeds to step S3.
[0106]
As described above, according to the present embodiment, when there are a plurality of listeners, the output is switched from the crosstalk-canceled signal to the normal binaural signal, so that listeners located away from the sweet spot can be prevented from perceiving the sound as loud.
Also, even when there is no listener, switching to the binaural signal can suppress the overall output volume and the influence of the sound on the surroundings.
[0107]
Fourth Embodiment: In the present embodiment, the position of the listener is tracked; if the listener is within the tracking range, the crosstalk cancellation filter is successively updated according to the positions of both of the listener's ears, and if the listener is outside the tracking range, the output is switched to the delayed binaural acoustic signal. In the following, differences from the first embodiment are mainly described, and everything else is the same as in the first embodiment unless otherwise specified.
[0108]
A configuration example of the sound reproduction device according to the present embodiment will be described using the block diagram of FIG. 11. The range indicated by "d" is the range in which the listener can be tracked.
[0109]
The listener state detector 41 analyzes the image of each frame sent from the video camera 12 to determine the position of the listener, and outputs the obtained position to the tracking range determiner 42.
[0110]
The tracking range determiner 42 determines whether the position determined by the listener state detector 41 is within the trackable range, and, in accordance with this determination, causes the output signal switches 8a and 8b to select either the output from the delay unit 4 or the output from the crosstalk canceller 44.
[0111]
The crosstalk cancellation filter design unit 43 estimates the transfer functions between the left and right speakers and both of the listener's ears, and redesigns the crosstalk cancellation filter coefficients using these transfer functions.
The crosstalk cancellation filter design unit 43 then supplies the crosstalk cancellation filter coefficients to the crosstalk canceller 44.
[0112]
The crosstalk canceller 44 performs crosstalk cancellation processing on the binaural acoustic signals output from the ADCs 3a and 3b using the crosstalk cancellation filter coefficients supplied from the crosstalk cancellation filter design unit 43, and generates and outputs crosstalk-canceled binaural acoustic signals.
[0113]
Next, the process performed by the sound reproduction apparatus according to the present
embodiment will be described using FIG. 12 showing a flowchart of the process.
[0114]
<Step S401> The listener state detector 41 analyzes the image of each frame sent from the video camera 12 and, when the listener appears in the image, estimates the position of the listener. This estimation can be realized by executing the processing of steps S101 to S105 in the flowchart of FIG. 3.
The listener state detector 41 then sends the estimated listener position to the tracking range determiner 42.
[0115]
<Step S402> The tracking range determiner 42 determines whether the listener position received from the listener state detector 41 is within the trackable range (a range of coordinate positions) created in advance and registered as data in a memory managed by the tracking range determiner 42.
As a result of this determination, if the listener position received from the listener state detector 41 is within the trackable range (within the prescribed area), the process proceeds to step S403; if it is not within the trackable range (outside the prescribed area), the process proceeds to step S407.
[0116]
<Step S403> The listener state detector 41 estimates the positions of both of the listener's ears using the position obtained in step S401. This processing can be realized by executing the processing of steps S106 to S108 in the flowchart of FIG. 3. The listener state detector 41 then sends the obtained positions of both ears to the crosstalk cancellation filter design unit 43.
[0117]
<Step S404> The crosstalk cancellation filter design unit 43 calculates, for each of the listener's ears, the transfer function to each of the speakers 11a and 11b, using the positions of both ears received from the listener state detector 41 and the positions of the speakers 11a and 11b stored in advance in a memory managed by the crosstalk cancellation filter design unit 43. That is, the crosstalk cancellation filter design unit 43 obtains the transfer function between the left ear and the speaker 11a, the transfer function between the left ear and the speaker 11b, the transfer function between the right ear and the speaker 11a, and the transfer function between the right ear and the speaker 11b.
[0118]
In the configuration of FIG. 11, a total of four transfer functions are estimated, one for each combination of the two speakers and the two ears. In this embodiment, since the distance between each speaker and each ear can be calculated from their coordinates, a transfer function that reflects only the delay based on the distance difference is estimated. Alternatively, a typical room response may be stored in memory in advance and additionally convolved.
[0119]
<Step S405> The crosstalk cancellation filter design unit 43 calculates the crosstalk cancellation filter coefficients using the transfer functions estimated in step S404, and supplies the calculated coefficients to the crosstalk canceller 44. In general, a crosstalk cancellation filter can be designed by calculating the inverse filter of the transfer functions. This processing is commonly performed in this field and is well known, so its detailed description is omitted.
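As an illustration only, a frequency-domain inverse-filter design consistent with the description above could be sketched as follows (a minimal sketch with regularization added for numerical stability; the function names, FFT length, and regularization are assumptions, not part of the patent):

```python
import numpy as np

def design_crosstalk_canceller(h_ll, h_lr, h_rl, h_rr, n_fft=4096, reg=1e-3):
    """Design 2x2 crosstalk cancellation filters from four impulse responses
    (speaker L/R to ear L/R) by inverting the transfer-function matrix at
    each frequency bin. Returns four FIR filters approximating X = A^-1."""
    H = np.array([[np.fft.rfft(h, n_fft) for h in (h_ll, h_rl)],
                  [np.fft.rfft(h, n_fft) for h in (h_lr, h_rr)]])
    n_bins = H.shape[-1]
    X = np.zeros((2, 2, n_bins), dtype=complex)
    for k in range(n_bins):
        A = H[:, :, k]
        # Regularized (Tikhonov) inverse to avoid huge gains where A is nearly singular
        X[:, :, k] = np.linalg.inv(A.conj().T @ A + reg * np.eye(2)) @ A.conj().T
    # Back to the time domain: one FIR filter per matrix entry
    return [np.fft.irfft(X[i, j], n_fft) for i in (0, 1) for j in (0, 1)]
```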
[0120]
<Step S406> The crosstalk canceller 44 performs crosstalk cancellation processing on the binaural acoustic signals output from the ADCs 3a and 3b using the crosstalk cancellation filter coefficients supplied from the crosstalk cancellation filter design unit 43 in step S405, and generates and outputs crosstalk-canceled binaural acoustic signals.
[0121]
<Step S407> The delay unit 4 delays the binaural acoustic signal output from each of the ADCs
3a and 3b by the time required for the crosstalk cancellation process, and generates and outputs
a delayed binaural acoustic signal.
[0122]
As described above, according to the present embodiment, even when transaural reproduction is performed while tracking the position of the listener and applying the crosstalk cancellation filter, it is possible to prevent the listener from perceiving the sound as loud when the listener moves out of the tracking range.
[0123]
[Fifth Embodiment] In the above embodiments, when the listener moves out of or into the sweet spot while searching for a listening position, the output signal is switched each time, which may make the sound difficult to listen to.
Therefore, whether the listener is in the sweet spot may be determined, and the output signal switched, only after it has been determined that the listener has remained still for a predetermined time.
In that case, for example, the processing of FIG. 13 is performed instead of the processing shown in FIG. 2.
[0124]
In step S11, the listener state detector 7 determines whether the position of the listener has remained unchanged for a prescribed time or more (whether the listener is stationary).
Various processes can be considered for this determination, and any of them may be adopted. For example, if the change in the position of the listener determined from the frame images remains equal to or less than a specified amount continuously over N frames (N is an integer of 2 or more), it is determined that the position of the listener has not changed for the prescribed time (N frames).
Then, as a result of this determination, if it is determined that the position of the listener has not changed for the prescribed time or more, the process proceeds to step S4; if this cannot be determined, the process proceeds to step S9.
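As an illustration only, the stationary check described above could be sketched as follows (a minimal sketch; the names, the frame count N, and the movement threshold are hypothetical):

```python
from collections import deque
import math

N_FRAMES = 30        # prescribed time expressed as a number of frames (assumed)
MAX_MOVE_CM = 2.0    # maximum per-frame movement still regarded as stationary (assumed)

class StationaryDetector:
    def __init__(self):
        self.history = deque(maxlen=N_FRAMES)

    def update(self, position):
        """position: (x, y) listener coordinates for the current frame.
        Returns True once the listener has stayed within MAX_MOVE_CM per
        frame for N_FRAMES consecutive frames."""
        if self.history and math.dist(self.history[-1], position) > MAX_MOVE_CM:
            self.history.clear()   # movement detected: restart the count
        self.history.append(position)
        return len(self.history) == N_FRAMES
```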
[0125]
By performing such processing, the output signal is not switched while the listener is moving, so it is possible to prevent frequent switching of the output signal from making the sound difficult to listen to.
[0126]
Sixth Embodiment: In the first embodiment, whether both of the listener's ears are in the sweet spot is determined by detecting the position and orientation of the listener from the image captured by the video camera 12.
This determination may instead be made by picking up the sound at the listener's position and comparing it with the input signal, as in the configuration of FIG. 14.
[0127]
The transfer function superimposer 51 convolves the transfer functions between the speakers 11a and 11b and both of the listener's ears at the reference listening position used when the crosstalk cancellation filter was designed (these are managed in a memory) with the binaural acoustic signals output from the ADCs 3a and 3b. As a result, it reproduces the acoustic signals that would reach the listener's ears if the binaural acoustic signal were played from the speakers as it is, with the listener located at the reference position.
[0128]
The binaural microphones 52a and 52b are a binaural microphone for the right ear and a
binaural microphone for the left ear, respectively. The binaural microphones 52a and 52b are
attached to the right and left ears of the listener and pick up sounds caught by the ears. A sound
signal collected by the binaural microphone 52a is amplified by the microphone amplifier 2p and
converted into a digital binaural acoustic signal by the ADC 3p. A sound signal collected by the
binaural microphone 52b is amplified by the microphone amplifier 2q and converted into a
digital binaural acoustic signal by the ADC 3q.
[0129]
The sweet spot determiner 53 determines whether the listener is located at the sweet spot by determining whether the signals selected by the output signal switches 54a and 54b substantially match the binaural acoustic signals from the ADCs 3p and 3q.
[0130]
For example, suppose the output signal switches 8a and 8b are currently selecting the output from the crosstalk canceller 5.
In this case, the sweet spot determiner 53 instructs the output signal switches 54a and 54b to select the output from the delay unit 4, and determines whether the output from the delay unit 4 and the binaural acoustic signals from the ADCs 3p and 3q are approximately equal. On the other hand, suppose the output signal switches 8a and 8b are currently selecting the output from the delay unit 4. In this case, the sweet spot determiner 53 instructs the output signal switches 54a and 54b to select the output from the transfer function superimposer 51, and determines whether the output from the transfer function superimposer 51 and the binaural acoustic signals from the ADCs 3p and 3q are substantially equal.
[0131]
As a result of this determination, when the signals are judged to be substantially equal, it is determined that both of the listener's ears are located in the sweet spot; when they are judged not to be substantially equal, it is determined that the listener's ears are located outside the sweet spot.
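As an illustration only, the "substantially equal" comparison could be implemented, for example, as a normalized error check over a short block of samples (the metric and the threshold are assumptions, not specified in the patent):

```python
import numpy as np

def substantially_equal(reference, measured, tol=0.1):
    """Compare a block of the expected signal with the signal picked up at
    the ears; True if the relative RMS error is below the tolerance."""
    reference = np.asarray(reference, dtype=float)
    measured = np.asarray(measured, dtype=float)
    n = min(len(reference), len(measured))
    err = np.sqrt(np.mean((reference[:n] - measured[:n]) ** 2))
    ref = np.sqrt(np.mean(reference[:n] ** 2)) + 1e-12
    return err / ref < tol
```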
[0132]
When the sweet spot determiner 53 determines that both of the listener's ears are located in the sweet spot, it controls the output signal switches 8a and 8b to select the output of the crosstalk canceller 5, and controls the output signal switches 54a and 54b to select the output from the delay unit 4.
[0133]
On the other hand, when the sweet spot determiner 53 determines that the listener's ears are located outside the sweet spot, it controls the output signal switches 8a and 8b to select the output of the delay unit 4, and controls the output signal switches 54a and 54b to select the output from the transfer function superimposer 51.
[0134]
Such a configuration makes it possible to determine whether the listener is in the sweet spot. Based on this determination, switching to a signal without crosstalk cancellation when the listener leaves the sweet spot prevents the listener from perceiving the sound as loud.
[0135]
In the first embodiment, the image from the video camera 12 is analyzed to detect the state of the listener. Alternatively, a sensor for detecting the position and orientation of the listener's head may be attached directly to the listener's head, and the state of the listener may be detected based on information from the sensor.
[0136]
Moreover, the various embodiments and modifications described above may be used in suitable combination, in part or in whole, and any other configuration equivalent to them may be adopted.
[0137]
Furthermore, for example, all of the configurations shown in FIGS. 1, 5, 6, 7, 9, 11, and 14 may be implemented in hardware, or part of them may be implemented in software (a computer program).
In that case, the software is stored in a memory managed by the controller 14 and executed by the controller 14.
[0138]
Other Embodiments: The present invention can also be realized by the following processing. That is, software (a program) for realizing the functions of the above-described embodiments is supplied to a system or apparatus via a network or various storage media, and a computer (or CPU, MPU, etc.) of the system or apparatus reads and executes the program.
[0139]
DESCRIPTION OF SYMBOLS: 1 dummy head microphone; 2a, 2b microphone amplifier; 4 delay unit; 5 crosstalk canceller; 6 sweet spot determiner; 8a, 8b output signal switch; 10a, 10b amplifier; 11a, 11b speaker