Patent Translate
Powered by EPO and Google
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
An apparatus is provided for creating the impression of a sound source spatially separated from the region between a pair of headphones, using a pair of oppositely facing headphone speakers. The apparatus comprises: (a) a series of audio inputs representing an audio signal projected from an idealized sound source localized at a spatial location relative to an idealized listener; (b) first matrix mixing means, interconnected to the audio inputs and to a series of feedback inputs, for outputting predetermined combinations of them as intermediate output signals; (c) a filter system that filters the intermediate output signals and outputs the series of feedback inputs, the filter system including feedback response filtering for generating the feedback inputs, and separate filters for direct-response and short-time-response filtering and for approximating the reverberation response; and (d) second matrix mixing means for combining the filtered intermediate output signals. [Selected figure] Figure 3
Using Filter Effects in a Stereo Headphone Device to Enhance the Spatial Spread of Sound
Sources Around a Listener
The invention relates in particular to the field of audio signal processing and audio reproduction over headphones, and discloses a sound reproduction technique that produces enhanced effects, such as a sense of spatial expansion of objects surrounding the listener, in a computationally efficient manner.
It would be desirable to provide a more comfortable listening experience on a pair of headphones. Preferably, the listening experience should reproduce the intended atmosphere of the original recording. In particular, desirable aspects of a comfortable listening experience include the listener's feeling that the sound is occurring outside of his head, or more particularly that it does not originate from the headphones themselves. This effect is hereinafter referred to as out-of-head (OOH) localization. Additionally, and somewhat relatedly, the listener should ideally be able to close his eyes and have the feeling of listening to a set of external speakers, either in the same room as the performers or placed at some distance.
It may often be desirable to create the feel of a three-dimensional surround sound environment for headphone listeners in particular environments. For example, one common environment for headphone use is a long-haul flight on which in-flight movies and videos are screened. Another common use of headphones is in crowded environments where the listener wishes to listen privately to the headphone signal without disturbing the people nearby. In such environments, it would be desirable to provide full surround sound over headphones.
Unfortunately, when standard headphones are used, out-of-head perception is lost and sounds are perceived as originating from a location inside the listener's head and are substantially …
Other sound formats face similar problems when played over headphones.
For example, another common format, Dolby AC-3, is designed for a certain number of speakers placed around the listener to create a substantially richer sound environment. Again, if headphones are used in such an environment, the intended spatial positioning of the sound is lost, and the sound again seems to emanate from inside the listener's head.
Convolution of audio signals with appropriate head-related transfer functions (HRTFs) is known in the art. However, such full convolution techniques often require excessive computational resources and cannot easily be implemented unless adequate resources are available.
It is an object of the present invention to provide an efficient method and apparatus for simulating an acoustic space, for example over headphones.
According to one aspect of the invention, there is provided an apparatus for creating the impression of a sound source spatially separated from the region between a pair of headphones, using a pair of oppositely facing headphone speakers, the apparatus comprising: (a) a series of audio inputs representing an audio signal projected from an idealized sound source localized at a spatial location relative to an idealized listener; (b) first mixing matrix means, interconnected to the audio inputs and to a series of feedback inputs, for outputting predetermined combinations of them as intermediate output signals; (c) a filter system that filters the intermediate output signals and outputs the series of feedback inputs, the filter system including feedback response filtering to generate the feedback inputs, and separate filters for filtering the direct response and the short-time response and for approximating the reverberation response; and (d) second matrix mixing means for combining the filtered intermediate output signals to produce left- and right-channel stereo outputs.
The system of the present invention includes improvements that reduce the computational requirements of existing systems and improve the realism of virtual speaker systems.
Preferably, a predetermined number of the feedback inputs are also input to the second matrix mixing means.
The feedback response filtering may include a reverberation filter.
The reverberation filter may be a sparse-tap FIR filter, a recursive algorithmic filter, or a full-convolution FIR filter, and the audio inputs may comprise a surround sound set of audio input signals.
Furthermore, in one embodiment, the feedback input is mixed with only the forward portion of
the audio input.
The filter system can include a forward sum filter that filters the sum of the audio inputs located in front of the idealized listener, the forward sum filter substantially comprising an approximation of the sum of the direct and shadow head-related transfer functions for the forward inputs.
Additionally, the filter system may include a forward difference filter that filters the difference of the audio inputs located in front of the idealized listener, the forward difference filter substantially comprising an approximation of the difference between the direct and shadow head-related transfer functions for the forward inputs.
Additionally, the filter system may include a rear sum filter that filters the sum of the audio inputs located behind the idealized listener, the rear sum filter substantially comprising an approximation of the sum of the direct and shadow head-related transfer functions for the rear inputs. In addition, the filter system may include a rear difference filter that filters the difference of the audio inputs located behind the idealized listener, the rear difference filter substantially comprising an approximation of the difference between the direct and shadow head-related transfer functions for the rear inputs. The filter system further includes a reverberation filter interconnected to the sum of the audio inputs.
According to yet another aspect of the invention, there is provided a binauralization unit for binauralizing at least one input signal, comprising: a first series of filters for simulating the direct sound and early echoes; and a binaural reverberation processor for simulating late reflections, the processor further comprising at least one recursive filter structure and a series of finite impulse response filters interconnected to the at least one recursive filter structure.
The binaural reverberation processor can include at least two recursive filter structures, each having left- and right-channel finite impulse response filters interconnected at its output, wherein the first recursive filter structure has a longer reverberation decay time than the second recursive filter structure.
The binaural reverberation processor can further include a series of recursive filter structures
interconnected with sum and difference filters that themselves output to the left and right
channel outputs.
In one embodiment, a portion of the output from one of the finite impulse response filters may be fed back to one of the recursive filter structures.
According to a further aspect of the invention, there is provided a method of processing, in compact form, a series of sound output signals for output as a stereo signal on a pair of headphones, the method comprising convolving a predefined binaural room response with the sound output signals in real time to generate the stereo signal.
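Real-time convolution with a fixed binaural room response is usually done block by block. The following is a minimal overlap-add sketch in Python; overlap-add is a generic technique assumed here for illustration and is not necessarily the convolution method the embodiments rely on. The function names and the short stand-in response are hypothetical. Each input block yields one output block, so the stream can be processed as it arrives.

```python
def convolve(x, h):
    """Full linear convolution, used here as a building block and reference."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            y[i + k] += xi * hk
    return y


def overlap_add_stream(blocks, h):
    """Convolve a stream of input blocks with a fixed response h.

    Each input block yields one output block of the same size. The part of a
    block's convolution that spills past the block end (the tail) is carried
    over and added to the start of the next block's result.
    """
    tail = [0.0] * (len(h) - 1)
    for block in blocks:
        y = convolve(block, h)          # length len(block) + len(h) - 1
        for i, t in enumerate(tail):    # add the overlap from earlier blocks
            y[i] += t
        n = len(block)
        yield y[:n]                     # samples that are now complete
        tail = y[n:]                    # len(h) - 1 samples carried forward


# Usage: a hypothetical 3-tap stand-in for a (much longer) binaural room response.
signal = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0]
h = [0.5, 0.25, 0.125]
blocks = [signal[i:i + 2] for i in range(0, len(signal), 2)]
streamed = [s for out in overlap_add_stream(blocks, h) for s in out]
```

The streamed output matches the first `len(signal)` samples of a one-shot convolution. A practical implementation would use FFT-based or partitioned convolution for responses thousands of taps long; the direct inner loop only keeps the sketch short.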
In one embodiment, the convolution is performed utilizing a skip protection processor unit
located inside the CD-ROM player unit.
In another embodiment, convolution is performed utilizing a dedicated integrated circuit that
includes a modified form of digital to analog converter.
In another embodiment, the convolution is implemented using a dedicated or programmable
digital signal processor.
In another embodiment, convolution is performed on the analog input by a DSP processor
interconnected between the analog to digital converter and the digital to analog converter.
In another embodiment, the convolution is performed on the stereo output signal in a separately removable external device connected between the sound output signal generator and the headphones, the sound output signal being output in digital form for processing by the …
In another embodiment, the convolution is implemented on the stereo output signal in a separately removable external device connected between the sound output signal generator and the headphones, the sound output signal being output in analog form.
To facilitate the discussion of the preferred embodiments, the terms used herein are defined below.
System: A system for the virtual presentation of sound sources over headphones.
In abstract form, it consists of a device with a fixed number of inputs (one for each speaker position) and two outputs (for the left and right headphone ears).
Transfer Function: The signal mapping from a given input to a given output. If the system has M inputs and N outputs, there are M × N possible transfer functions. If the system is linear and time-invariant, these transfer functions will be static and independent. They are often referred to individually by their input and output (e.g., left-left, left-back-right).
Filter Characteristics, HRTF: Each transfer function has an initial part of the response that represents an approximation of a particular HRTF. The length of this part is usually up to 100 …
HRTF Symmetry: If the virtual sources of the inputs have some symmetry around the listener, the HRTFs can reflect this symmetry. For example, if there are virtual speakers localized at 30 degrees to the left and right of the listener, then the HRTF, i.e., the initial part of the left-left transfer function, will be identical to the initial part of the right-right transfer function. Likewise, the initial parts of the left-right and right-left transfer functions will show similarity or equivalence.
Sparse Reverb: After the initial HRTF, each transfer function contains an approximation of the reverberant field. This approximation is very sparse. A sparse transfer function is one whose distinguishable degrees of freedom form a much smaller set than the full set of filter taps across the length of the filter; in this sense the filter is somewhat degenerate.
There are several possible forms of this sparse property:
* True sparse taps. The transfer function is roughly zero except for a fixed number of non-zero taps. These taps are discrete and equivalent in all respects except amplitude and sign.
* Filtered sparse taps. The transfer function exhibits a repeated pattern at temporally sparse locations. This results from passing a true sparse-tap filter through a further filter, which spreads each tap into that filter's pattern. The patterns are equivalent in all respects except amplitude and sign. They may also overlap, in which case the presence of filtered sparse taps may not be obvious to an observer.
* Compound filtered sparse taps. Multiple distinct sparse-tap sections can be created and passed through different filters. This can be identified as a plurality of different filter patterns repeated in time, each identical in all respects except amplitude and sign. The filter patterns used correspond to the initial HRTFs of some or all of the system's transfer functions.
* Recursive sparse taps. Sparse taps with recursive elements. These sparse taps continue indefinitely in time and decay geometrically.
* Recursively filtered sparse taps. The result of filtering a recursive sparse-tap implementation through a specific filter and/or HRTF. Algorithmic reverberation built from such differently filtered sparse taps yields an apparently complex response over time. The filter may correspond to the initial HRTF of some or all of the system transfer functions.
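The recursive and recursively filtered sparse-tap forms can be illustrated with a feedback comb filter. This Python sketch assumes a single feedback delay and gain (hypothetical parameters `delay` and `gain`, with |gain| < 1 so the taps decay), and uses a 2-tap filter as a hypothetical stand-in for an initial HRTF.

```python
def recursive_sparse_taps(x, delay, gain, length):
    """Feedback comb filter: y[n] = x[n] + gain * y[n - delay].

    An impulse produces non-zero taps only at 0, delay, 2*delay, ...,
    with amplitudes 1, gain, gain**2, ... (geometric decay).
    """
    y = [0.0] * length
    for n in range(length):
        xn = x[n] if n < len(x) else 0.0
        feedback = gain * y[n - delay] if n >= delay else 0.0
        y[n] = xn + feedback
    return y


def convolve(x, h):
    """Full linear convolution."""
    out = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            out[i + k] += xi * hk
    return out


# Recursive sparse taps: the impulse response of the comb filter.
h_sparse = recursive_sparse_taps([1.0], delay=4, gain=0.5, length=13)

# Recursively filtered sparse taps: the same response passed through a short
# filter, so every tap becomes a scaled copy of that filter's pattern.
h_filtered = convolve(h_sparse, [1.0, 0.5])
```

`h_sparse` has taps of 1.0, 0.5, 0.25 and 0.125 at delays 0, 4, 8 and 12; in `h_filtered` each of those taps is followed by a copy at half its amplitude.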
Mono Reverberation: The reverberation portion of the transfer function may be derived from a mono or combined sound source. This is apparent when the transfer functions from all inputs to a particular output are equivalent. For example, in the stereo virtual speaker example, the left-left and right-left transfer functions will exhibit very similar characteristics in the later part of the response. Any differences between the responses are due to time shifting, scaling or simple filtering operations.
Referring initially to FIG. 1, a schematic illustration of the operation of a first implementation is provided. In this embodiment, a series of audio inputs 11 is provided to a mechanism 12, normally of conventional (prior-art) design, which takes audio signals as input and produces a series of loudspeaker inputs 13. These speaker inputs may be provided for various output formats, such as a stereo output format or an AC-3 output format. The operation of the part within dashed line 14 is entirely conventional. The speaker inputs are forwarded to the headphone processing system 15, which outputs to a set of standard headphones 16 so as to simulate, for the user of headphones 16, the presence of a fixed number of speakers around the listener.
FIG. 1 shows an example in which the headphone processing system 15 simulates the presence of two virtual speakers 17, 18 in front of the user of headphones 16, as in a normal stereo setup. The arrangement of FIG. 1 can be incorporated into any commonly used stereo audio reproduction system. Because system 15 processes the normal signals 13 intended for playback on loudspeakers, it is compatible with, and can be used in combination with, any other system designed to enhance audio playback on loudspeakers.
In a first example implementation, the headphone processing system 15 is a filter arrangement in which each of the intended loudspeaker inputs is passed through two filters, one for each ear. The sum of the filter outputs for each ear is the signal sent to that ear's headphone channel. In alternative embodiments, the filters may or may not be updated to reflect changes in the orientation of the listener's head within the virtual speaker array. Updating the filters based on the physical orientation of the listener's head creates a more faithful environment that follows the head, but requires tracking of head movement. Various implementations may be variants on this theme that reduce the computational requirements. Non-linear, active or adaptive components can also be added to the structure to further improve performance.
An example of the general structure of a headphone processing system in more complex form is illustrated in FIG. 2. Implementation 20 includes a series of loudspeaker inputs, e.g., 21, each of which is applied to different desired impulse response filters, e.g., 22, 23, with one filter, e.g., 22, for the left channel and another, e.g., 23, for the right channel. The filters represent the HRTFs from each sound source to the corresponding ear. The filter outputs are summed (e.g., 24) to form the final output 25.
The arrangement of FIG. 2 leads to an undue burden of complexity in that a large number of
filters (e.g. 22) must be provided, which is likely to significantly increase the computational cost.
A first technique for significantly reducing computational requirements by exploiting symmetry is "shuffling". For a pair of channels, this means applying filters to the sum and difference of the channels before recombining them. If the filters are arranged symmetrically in stereo (i.e., left-left filter = right-right filter and left-right filter = right-left filter), this can reduce the computational requirements by 50%. The technique can be represented by inserting a linear matrix mix before and after the filter bank.
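The shuffling identity can be checked numerically. In this Python sketch (function names hypothetical), `h_direct` stands for the left-left = right-right filter and `h_shadow` for the left-right = right-left filter. The sum and difference paths use (direct + shadow)/2 and (direct - shadow)/2, so the output shuffle needs no further scaling. Both inputs, and both filters, are assumed to have equal lengths.

```python
def convolve(x, h):
    """Full linear convolution."""
    out = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            out[i + k] += xi * hk
    return out


def add(a, b):
    return [p + q for p, q in zip(a, b)]


def sub(a, b):
    return [p - q for p, q in zip(a, b)]


def four_filter_network(left, right, h_direct, h_shadow):
    """Direct form: four convolutions (L->L, R->L, L->R, R->R)."""
    out_l = add(convolve(left, h_direct), convolve(right, h_shadow))
    out_r = add(convolve(left, h_shadow), convolve(right, h_direct))
    return out_l, out_r


def shuffled_network(left, right, h_direct, h_shadow):
    """Shuffled form: two convolutions on the sum and difference signals."""
    h_sum = [0.5 * (d + s) for d, s in zip(h_direct, h_shadow)]
    h_diff = [0.5 * (d - s) for d, s in zip(h_direct, h_shadow)]
    f_sum = convolve(add(left, right), h_sum)
    f_diff = convolve(sub(left, right), h_diff)
    # Output shuffle: L = sum + diff, R = sum - diff.
    return add(f_sum, f_diff), sub(f_sum, f_diff)


left, right = [1.0, 0.0, 2.0], [0.0, 1.0, -1.0]
h_direct, h_shadow = [1.0, 0.25], [0.5, 0.125]
direct_l, direct_r = four_filter_network(left, right, h_direct, h_shadow)
shuf_l, shuf_r = shuffled_network(left, right, h_direct, h_shadow)
```

Both networks produce the same left and right outputs, but the shuffled form performs two convolutions instead of four, which is the 50% saving described above.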
More generally, as shown in FIG. 3, the realization structure 30 may consist of:
* a fixed number of inputs 31;
* a first mixing matrix producing a set of intermediate signals, each a linear combination of the input signals (note that the intermediate signal set may include the input signals themselves and may include duplicate signals); in an alternative embodiment, the matrix gains may change over time;
* a series of filters (e.g., 33), one for each of the intermediate signals; these filters may be independent and may therefore have different structures, lengths and delays (e.g., IIR, FIR, sparse-tap IR and low-latency convolution);
* a mixing matrix 35 that combines the filtered intermediate signals to produce two headphone output signals 36.
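A minimal Python sketch of the FIG. 3 structure, assuming static matrix gains and FIR filters only (the text also allows time-varying gains and IIR, sparse-tap or low-latency convolution filters). The names `mix`, `fir` and `matrix_filter_matrix` are hypothetical. The usage example configures the structure as the two-channel shuffler, which is one special case of it.

```python
def mix(matrix, signals):
    """Each output row is a gain-weighted sum of the input signals."""
    n = len(signals[0])
    return [[sum(g * sig[t] for g, sig in zip(row, signals)) for t in range(n)]
            for row in matrix]


def fir(x, h):
    """Causal FIR filter; output has the same length as x (streaming style)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def matrix_filter_matrix(inputs, m_in, filters, m_out):
    intermediate = mix(m_in, inputs)                                    # first mixing matrix
    filtered = [fir(sig, h) for sig, h in zip(intermediate, filters)]  # filters, e.g. 33
    return mix(m_out, filtered)                                         # matrix 35 -> outputs 36


# Shuffler as a special case: sum/difference in, sum/difference out.
shuffle = [[1.0, 1.0], [1.0, -1.0]]
filters = [[0.5, 0.25], [0.5, -0.25]]   # hypothetical sum and difference filters
left, right = [1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]
out_l, out_r = matrix_filter_matrix([left, right], shuffle, filters, shuffle)
```

With these filters, an impulse on the left input reaches the left ear unchanged at time 0 (direct path) and the right ear at half amplitude one sample later (shadow path).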
A number of specific implementations encompassed by the general system of FIG. 3 are as follows.
High-End AC-3 Decoder: As illustrated in FIG. 4, the Dolby® AC-3® standard defines a set of 5.1 channels to be used as the speaker inputs 41.
These channels may be derived from an AC-3 bitstream data source using an AC-3 decoder. Once decoded, the speaker inputs are suitable for use as the inputs 41 to the device 40 of FIG. 4, which produces the headphone output 42. Each of the five speaker inputs is passed through a filter (e.g., 43, 44) for each ear, and the results are summed (e.g., 45) to produce the headphone signals, for a total of 10 filters.
Filters 43, 44 are provided to simulate corresponding virtual speaker arrays in the room utilizing
the techniques described above.
In order to achieve high levels of quality in simulations of virtual speaker arrays, rather long
filters are needed to take into account the spatial geometry of the listening environment.
With an appropriate filter set (incorporating equalization for the headphones and an appropriate head-related transfer function), the result is a near-perfect illusion that a set of external speakers is being used. However, depending on the application environment, the processing requirements may be excessive.
The 10-filter design can be improved to reduce computational load, without excessive quality degradation, by using 10 shorter filters and only 2 full-length filters. The two longer filters 47, 48 can be binaural simulations of the average room response tail. A combination of all five speaker inputs is provided via adder 49 to the binaural tail filters 47, 48 to approximate the real room response. Each of the short filters (e.g., 43, 44) may be the initial part of the response from that particular speaker to the listener's ear.
The filter lengths used in the prototype implementation were typically 2000 taps for the short filters (e.g., 43, 44) and 32000 taps for the longer filters 47, 48, at a sampling rate of 48 kHz. The long filters usually require lower bandwidth and can be implemented with greater delay, so they can take advantage of reduced-sample-rate processing to reduce computational requirements. The filters can be realized using a low-latency convolution algorithm, such as that disclosed in commonly assigned US Pat. No. 5,502,747, to reduce system latency and computational requirements.
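The saving from the hybrid design can be estimated with simple arithmetic. This sketch assumes that the unreduced 10-filter design would use the 32000-tap length for every filter, and it counts direct-form multiply-accumulates (MACs); the low-latency convolution and reduced-rate tails mentioned above would lower both figures.

```python
FS_HZ = 48_000          # sampling rate quoted for the prototype
SPEAKERS, EARS = 5, 2   # five main AC-3 channels, two headphone ears
SHORT_TAPS = 2_000      # initial-response filters, e.g. 43, 44
LONG_TAPS = 32_000      # binaural tail filters 47, 48

# Assumption: ten full-length filters (one per speaker per ear).
full_taps = SPEAKERS * EARS * LONG_TAPS
# Ten short filters plus two shared tail filters fed by the summed inputs (adder 49).
hybrid_taps = SPEAKERS * EARS * SHORT_TAPS + EARS * LONG_TAPS

print(f"taps per output sample: {full_taps} vs {hybrid_taps} "
      f"({100 * hybrid_taps / full_taps:.2f}%)")
print(f"direct-form MACs per second: {full_taps * FS_HZ:,} vs {hybrid_taps * FS_HZ:,}")
```

Under these assumptions the hybrid design needs 84000 taps per output sample instead of 320000, about a quarter (26.25%) of the full design.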
In the simplest case, no approximation is used, and the filters can be obtained by simulating a virtual loudspeaker setup with an acoustic modeling package such as CATT-Acoustic, or by using a real or synthetic head placed inside a real loudspeaker array.
High-end AC-3 decoders provide fairly accurate simulations of a virtual speaker array over headphones, but at the same time require a large amount of computational resources.
Low-End Stereo Decoder: A low-end stereo decoder, such as that illustrated at 50 in FIG. 5, is an apparatus that uses only a portion of the features of the high-end system. Its main purpose is to process a stereo input source for playback on headphones 52 so as to simulate the experience of listening to a well-configured stereo system, giving the impression of sound emitted from around the listener.
The system of FIG. 5 is designed to be suitable for mass production at low cost; the more
important design issue is to reduce computational complexity.
As mentioned above, the low-end stereo decoder 50 has two inputs 51 for conventional stereo and two outputs for the headphone signals. A bank of two filters is used: a first filter 53 operates on the sum of the left and right signals output from adder 55, and a second filter 54 operates on the difference signal output from difference unit 56.
The low-end stereo decoder 50 is another example of the general implementation described above. In this case, the matrix operation is a two-channel shuffle into the sum 55 and difference 56. Applying filters to the sum and difference signals halves the computational requirements when the desired result has loudspeaker symmetry (i.e., L → L = R → R and L → R = R → L).
The performance of this system depends on the choice of filter coefficients. To reduce the computational requirements, short filters are ideally used. It has been found that the difference filter can be made somewhat shorter than the sum filter and still give reasonable results.
The preferred form combines a head-related transfer function for a speaker position of 30° in the horizontal plane with a filter that forms a semi-reverberant but quite sparse tail. The filters are constructed from the following impulse responses:
D: direct-ear response normalized to unit energy;
S: shadow-ear response scaled in proportion to D;
R: reverberation response of unit energy;
and the following parameter:
α: the presence or amount of reverberation in the mix.
The following pre-calculated filters can then be applied to the sum and difference signals to generate new Sum' and Diff' signals:
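The pre-calculated filter equations themselves are not reproduced in this text. As a hedged reconstruction only: assume that the sum path carries direct plus shadow and the difference path direct minus shadow (as for filters 101 and 102 of FIG. 8), and that the reverberation term R enters only the sum path with weight α, which the short 30-tap delay line of the difference filter suggests. Up to an overall scale factor, with * denoting convolution, the filters would then read:

```latex
\mathrm{Sum} = L + R_{\mathrm{in}}, \qquad \mathrm{Diff} = L - R_{\mathrm{in}}
\\
\mathrm{Sum}' = \left(D + S + \alpha R\right) * \mathrm{Sum},
\qquad
\mathrm{Diff}' = \left(D - S\right) * \mathrm{Diff}
```

Here L and R_in denote the left and right stereo inputs, renamed to keep them distinct from the reverberation response R.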
To further reduce the required throughput, a number of approximations can be made to the filter set.
The shadow-ear response may be approximated by a 5-tap FIR filter, applied to the direct-ear response, that matches the frequency response and group delay of the exact filter obtained by deconvolving the direct-ear response from the shadowed response. About 20 sparse taps from a 5-10 ms delay line can approximate the reverberation response.
With this approach, it has been found that the coefficients can be coarsely quantized while maintaining adequate performance. The sum filter may be implemented as a set of 25 taps from a 256-tap delay line (at 48 kHz), while the difference filter may use as few as 6 taps from a 30-tap delay line and still give adequate results. The system can thus be realized with about 3 million instructions per second (MIPS), making it suitable for low-cost mass production and for incorporation into other audio products that use headphones.
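A sparse-tap filter stores the full delay line but multiplies only at the chosen taps, so its cost depends on the tap count rather than the line length. The Python class below is a minimal sketch of that idea (the class name and tap positions are hypothetical); the last lines count the multiply-accumulates (MACs) for the quoted 25-tap sum filter and 6-tap difference filter.

```python
class SparseTapFIR:
    """FIR filter that reads a few taps from a longer circular delay line."""

    def __init__(self, line_length, taps):
        # taps: mapping of delay in samples -> coefficient
        if any(not 0 <= d < line_length for d in taps):
            raise ValueError("every tap delay must lie inside the delay line")
        self.line = [0.0] * line_length
        self.pos = 0
        self.taps = list(taps.items())

    def process(self, sample):
        """Push one input sample and return one output sample."""
        n = len(self.line)
        self.line[self.pos] = sample
        out = 0.0
        for delay, coeff in self.taps:
            out += coeff * self.line[(self.pos - delay) % n]
        self.pos = (self.pos + 1) % n
        return out


# Usage: an 8-sample line with taps at delays 0 and 3 (hypothetical values).
fir = SparseTapFIR(8, {0: 1.0, 3: 0.5})
response = [fir.process(x) for x in [1.0, 0.0, 0.0, 0.0, 0.0]]

# Cost of the quoted configuration: 25 taps (sum) + 6 taps (difference) per sample.
FS_HZ = 48_000
macs_per_second = (25 + 6) * FS_HZ
```

About 1.5 million MACs per second for the filtering alone is plausibly consistent with the quoted figure of about 3 MIPS once shuffling, additions and delay-line bookkeeping are included; that breakdown is an estimate, not a figure from the text.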
Further extensions to implementation 50 may include:
* use of low-delay convolution to allow longer filters;
* addition of further inputs, with processing on a similar budget, to enable simulation of "surround sound" formats; for example, surround channels can be added that simulate the presence of sounds behind or around the listener;
* addition of asymmetric components to give better performance when the stereo mix contains a large amount of mono content;
* addition of non-linear components to enhance performance (e.g., a dynamic range compressor to improve listening quality in noisy environments).
Thus, the first set of embodiments uses a unique combination of input mix processing, filtering and output mix processing to create a sensory impression of three-dimensional sound on headphones. The disclosed apparatus includes variations that reduce computational complexity and memory requirements, resulting in significantly reduced implementation costs. The filter structures and coefficients improve the directionality and depth of sound while minimizing the increase in computational complexity. A simple HRTF approximation requires only a small amount of processing power, significantly less than the usual 50-60 filter taps.
Significant features of the HRTF approximations include:
(a) a large principal energy component of the direct response (short-time approximation) and an approximation of the convolution mapping from the direct response to the shadow or reflection response;
(b) the use of filter coefficients that include 5 to 10 ms of sparse taps after approximately 50 to 100 … The use of reverberation filters enhances the performance of HRTF approximations, normal HRTFs and room impulse responses by increasing the localization and depth of sound;
(c) in a variant, the HRTF approximation can include coefficients that add anti-phase components to the shadow response to improve localization behind the listener;
(d) the filters of the various embodiments can include a first portion that provides directivity and localization, and a second portion that provides minimal atmosphere, or one that provides room acoustics.
The delivery format of these embodiments provides great flexibility in trading off computation and memory usage against performance.
One extension of the system 50 of FIG. 5 to Dolby AC-3 input may be as shown at 60 in FIG. … A center channel 61 is added (62, 63) to the left front and right front channels, respectively. The output signals are provided to delay units 64, 65, which may be 5-10 ms delay lines, before being provided to HRTFs 67-69, which provide outputs for summing (70, 71) to the left and right ears.
The rear signals 73, 74 are used to form sum and difference signals 76, 77, which are supplied to HRTFs 79, 80. The output of the sum HRTF 79 is supplied to both the left and right summing units 70, 71, while the output of the difference HRTF 80 is supplied to summing units 70, 71 in opposite phase.
Further variants are also possible. Turning now to FIG. 7, a first variation 90 of the general configuration described above with reference to FIG. 3 is illustrated. The device of FIG. 7 includes filters 91, 92 and a feedback path 93. The mixing matrix 94 remains a simple linear matrix, able to invert, scale, add and re-derive its input signals as required for a particular implementation. In another implementation, the outputs 93 of the feedback filters 91, 92 are also input into the second mixing matrix (not shown) and contribute directly to the output 98. In a more general arrangement, all filter outputs are fed back to the first mixing matrix 94, where they are included in or excluded from the mix. In general, however, it is preferable to minimize the size of the mixing matrix 94.
The modified general configuration 90 allows feedback paths 93 in addition to the recursive elements within each filter. By passing the output of the reverberation filter built as part of filters 91, 92 through a filter array, e.g., 96, 97, more realistic reverberation can be built up. The filtered signal can be added to the filter feed signals prior to HRTF filtering. This is likely to impart more relevant spatial components to the reverberation and improve the listening experience.
The reverberation generation filters 91, 92 may be sparse-tap FIR filters, recursive algorithmic filters or full-convolution FIR filters. In all these cases, it may be beneficial to feed the reverberation output back into the virtual speaker inputs. The benefit is most significant in low-resource systems where a sparse-tap FIR is used to simulate reverberation. In that case, the simulated sparse-tap reflections are assumed to be emitted from sources outside the listener, not from the …
Referring now to FIG. 8, a further modified embodiment 100, similar to embodiment 50 of FIG. 5, is shown. This arrangement includes sum and difference filters 101, 102, which are short-time FIR approximations to the direct-plus-shadow and direct-minus-shadow HRTFs of two speakers localized at approximately 30° on either side of the listener. However, in the apparatus 100 of FIG. 8, an additional signal is derived as the sum 103 of the two inputs and provided to a single sparse-tap reverberation FIR delay line 104. Two sparse-tap outputs 105, 106 are derived from coefficients in the FIR 104. This signal pair 105, 106 is then added (107, 108) to the input stereo signal prior to the shuffling process 109. In this way, the stereo sparse-tap reverberation is "binauralized".
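The signal flow of FIG. 8 can be sketched in Python as follows. The reference numerals in the comments map each step to the description above; function names, tap positions and filter coefficients are hypothetical, and the filters run in a streaming style (output truncated to the input length).

```python
def fir(x, h):
    """Causal FIR filter; output has the same length as x."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def sparse_taps(x, taps):
    """Read a few taps (delay in samples -> gain) from a delay line fed by x."""
    return [sum(g * x[n - d] for d, g in taps.items() if n - d >= 0)
            for n in range(len(x))]


def binauralized_sparse_reverb(left, right, h_sum, h_diff, taps_left, taps_right):
    mono = [a + b for a, b in zip(left, right)]           # sum 103
    rev_l = sparse_taps(mono, taps_left)                  # output 105 of delay line 104
    rev_r = sparse_taps(mono, taps_right)                 # output 106 of delay line 104
    in_l = [a + b for a, b in zip(left, rev_l)]           # adder 107
    in_r = [a + b for a, b in zip(right, rev_r)]          # adder 108
    s = fir([a + b for a, b in zip(in_l, in_r)], h_sum)   # shuffle 109, sum filter 101
    d = fir([a - b for a, b in zip(in_l, in_r)], h_diff)  # difference filter 102
    return [a + b for a, b in zip(s, d)], [a - b for a, b in zip(s, d)]


out_l, out_r = binauralized_sparse_reverb(
    [1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0],
    h_sum=[0.5], h_diff=[0.5],
    taps_left={2: 0.5}, taps_right={3: 0.25},
)
```

With `h_sum = h_diff = [0.5]` the shuffle is transparent, which makes the routing easy to check: the reverberant taps derived from the mono sum appear in the left and right outputs at their own delays, and the real sum and difference HRTF approximations would then spatialize them like the dry signal.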
The arrangement of FIG. 8 can be extended to a surround sound decoder similar to the arrangement of FIG. … Such an extension is illustrated in FIG. 9, in which portion 111 is similar to that of FIG. … The arrangement of FIG. 9 provides a center speaker input 112, which should be rendered as a virtual speaker panned midway between the left front and right front speakers. This is achieved by adding (113, 114) the center feed input 112 to the left front and right front speaker inputs. The rear speaker inputs 116, 117 are processed by a shuffler 118 with a sum filter 119 and a difference filter 120 that approximate the HRTF response for speakers localized at 120° on either side of the listener's head. The outputs are then mixed (122, 123) and fed into a single shuffler 124 to form the binaural output. All of the inputs are summed (126) to form a single monaural signal for reverberation processing by the sparse-tap reverberation FIR filter 127. The output of the reverberation filter is then added to the front speaker inputs (113, 114). Although additional reverberant signals could be added to the rear speaker inputs, it is generally advantageous for the system to throw images forward, to overcome psychoacoustic front-back confusion and image elevation. Using only the front speaker positions for reverberation helps to throw the image forward and gives a more convincing frontal sound.
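A sketch of the FIG. 9 routing, in the same style. It assumes one reading of the description: that mixes 122 and 123 combine the front and rear sum outputs and the front and rear difference outputs respectively, ahead of the single output shuffler 124. Dictionary keys, function names and coefficients are hypothetical.

```python
def fir(x, h):
    """Causal FIR filter; output has the same length as x."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def sparse_taps(x, taps):
    """Read a few taps (delay in samples -> gain) from a delay line fed by x."""
    return [sum(g * x[n - d] for d, g in taps.items() if n - d >= 0)
            for n in range(len(x))]


def surround_decoder_sketch(fl, fr, c, rl, rr, filters, taps_l, taps_r):
    """filters: dict with hypothetical keys front_sum, front_diff, rear_sum, rear_diff."""
    mono = [sum(v) for v in zip(fl, fr, c, rl, rr)]          # sum 126
    rev_l = sparse_taps(mono, taps_l)                         # FIR 127, first tap set
    rev_r = sparse_taps(mono, taps_r)                         # FIR 127, second tap set
    # Center panned midway; reverberation added to the front feeds only (113, 114).
    in_l = [a + b + r for a, b, r in zip(fl, c, rev_l)]
    in_r = [a + b + r for a, b, r in zip(fr, c, rev_r)]
    front_s = fir([a + b for a, b in zip(in_l, in_r)], filters["front_sum"])   # filter 128
    front_d = fir([a - b for a, b in zip(in_l, in_r)], filters["front_diff"])
    rear_s = fir([a + b for a, b in zip(rl, rr)], filters["rear_sum"])   # shuffler 118, filter 119
    rear_d = fir([a - b for a, b in zip(rl, rr)], filters["rear_diff"])  # filter 120
    s = [a + b for a, b in zip(front_s, rear_s)]                          # mix 122 (assumed)
    d = [a + b for a, b in zip(front_d, rear_d)]                          # mix 123 (assumed)
    return [a + b for a, b in zip(s, d)], [a - b for a, b in zip(s, d)]   # shuffler 124


half = [0.5]   # transparent placeholder filters for checking the routing
filters = {"front_sum": half, "front_diff": half, "rear_sum": half, "rear_diff": half}
zero = [0.0, 0.0]
out_l, out_r = surround_decoder_sketch(zero, zero, [1.0, 0.0], zero, zero, filters, {}, {})
```

With placeholder filters, a center impulse reaches both ears equally. In this sketch the reverberation is added only to the front feeds, matching the preference stated above for throwing the image forward.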
Referring now to FIG. 10, several terms are defined to better describe the derivation of the filter values for the sparse-filter reverberation FIR 127 of FIG. 9. First, the direct HRTF is defined as the transfer function from the virtual loudspeaker locations 130, 131 to the ear 132 on the same side of the head. The shadowed HRTF is defined as the transfer function from the virtual speaker locations, e.g., 130 and 131, to the ear 133 on the other side of the head. Actual HRTF measurement sets can be used to approximate the filters.
The forward HRTFs can be measured from speakers placed 30° on either side of the front of the listener. The backward HRTFs can be measured from speakers placed at 120° on either side of the listener. Preferably, the HRTFs are equalized to maximize sound quality, with excellent voicing.
The forward sum filter 128 of FIG. 9 is an approximation of the sum of the direct and shadow forward HRTFs. The filter is implemented as a direct-form transfer function with FIR and IIR parts, the substantial FIR component allowing non-minimum-phase transfer functions. The order of the system can be selected by calculating the approximation error over a grid of FIR and IIR orders: the sum and difference filters are approximated with the orders set at each point of the grid, and the resulting errors in the direct and shadowed HRTFs are plotted. This is illustrated in FIGS. 11 and 12 for the forward direct and shadow responses, respectively. Prony analysis was used for the approximation. The plots show a "knee" characteristic, with the error becoming markedly worse below certain orders. The orders for the two forward filters may be selected based on this information; the result is an FIR order of 14 and an IIR order of 4.
The forward difference filter of FIG. 9 may be an approximation of forward direct HRTF minus
forward shadow HRTF. This approximation can be performed as described in the previous
paragraph, resulting in an FIR order of 14 and an IIR order of 4.
The backward sum filter 119 is an approximation of backward direct HRTF plus backward
shadow HRTF. The approximation can be performed as described for the forward filter. An FIR
order of 25 and an IIR order of 4 were selected.
The backward difference filter 120 is an approximation of backward direct HRTF minus
backward shadow HRTF. The approximation can be performed as described for the forward filter.
An FIR order of 25 and an IIR order of 4 were selected.
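The order-selection procedure above (fit the sum and difference filters at each point of an FIR/IIR order grid, then measure the resulting direct and shadow errors) can be sketched as follows. This is an illustrative pure-Python version of the classical least-squares Prony method, not the author's code; the helper names, the normalized squared-error metric and the use of toy responses in place of measured HRTFs are all assumptions.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def prony(h, q, p):
    """Least-squares Prony fit of B(z)/A(z) to the impulse response h.

    q is the numerator (FIR) order, p the denominator (IIR) order.
    Returns (b, a) with a[0] == 1.
    """
    N = len(h)

    def hv(n):
        return h[n] if 0 <= n < N else 0.0

    a = [1.0]
    if p > 0:
        # Denominator: h[n] + sum_k a[k] h[n-k] ~ 0 for n > q (normal equations).
        rows = [[hv(n - k) for k in range(1, p + 1)] for n in range(q + 1, N)]
        rhs = [-hv(n) for n in range(q + 1, N)]
        ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
        atb = [sum(r[i] * y for r, y in zip(rows, rhs)) for i in range(p)]
        a += solve(ata, atb)
    # Numerator: first q + 1 samples of a convolved with h.
    b = [sum(a[k] * hv(n - k) for k in range(p + 1)) for n in range(q + 1)]
    return b, a


def impulse_response(b, a, n):
    """First n samples of the impulse response of B(z)/A(z), assuming a[0] == 1."""
    y = []
    for i in range(n):
        acc = b[i] if i < len(b) else 0.0
        acc -= sum(a[k] * y[i - k] for k in range(1, len(a)) if i - k >= 0)
        y.append(acc)
    return y


def rel_error(h, g):
    """Normalized squared error ||h - g||^2 / ||h||^2."""
    return sum((x - y) ** 2 for x, y in zip(h, g)) / sum(x * x for x in h)


def direct_shadow_errors(h_dir, h_sh, q, p):
    """Fit sum and difference filters at orders (q, p) and return the
    relative errors of the direct and shadow HRTFs they reconstruct."""
    n = len(h_dir)
    h_sum = [x + y for x, y in zip(h_dir, h_sh)]
    h_diff = [x - y for x, y in zip(h_dir, h_sh)]
    g_sum = impulse_response(*prony(h_sum, q, p), n)
    g_diff = impulse_response(*prony(h_diff, q, p), n)
    g_dir = [(x + y) / 2 for x, y in zip(g_sum, g_diff)]
    g_sh = [(x - y) / 2 for x, y in zip(g_sum, g_diff)]
    return rel_error(h_dir, g_dir), rel_error(h_sh, g_sh)


def order_grid(h_dir, h_sh, fir_orders, iir_orders):
    """Map each (FIR order, IIR order) pair to its (direct, shadow) error."""
    return {(q, p): direct_shadow_errors(h_dir, h_sh, q, p)
            for q in fir_orders for p in iir_orders}
```

With measured direct and shadow HRTFs substituted for the toy inputs, plotting the grid errors gives curves of the kind described for FIGS. 11 and 12; one would then pick the smallest orders beyond which the error stops improving noticeably.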
The long delay line of reverberation filter 127 is fed with the sum 126 (a monaural signal) of all the inputs. Two sets of sparse tap coefficients are used to produce two outputs from this delay line. The delay line 127 may be as long or as short as memory allows; for reasonable results, a minimum length of about 300-400 taps is preferred. The two sets of sparse tap coefficients are similar in character but significantly different in value. In the first example, the taps actually used were generated by a random process subject to the following constraints:
* There are no taps in the first 300-400 taps. This creates a gap between the initial HRTF response and the first early echo, so that the echoes do not overlap the spatial localization established by the initial HRTF response.
* The tap magnitude decreases over time. This models attenuation during transmission through air and losses at reflections. The reduction was dithered to introduce some randomness; this level of detail is not essential, but for longer filters it produces a much more natural-sounding reverberation.
* The density of taps increases over time. This models the increasing density of early echoes as the path length grows and the number of possible paths to the listener increases.
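A minimal sketch of a random tap generator obeying the three constraints above follows. The parameter values (a 4000-sample line, a 350-sample gap, 60 taps, the per-sample decay factor and the dither depth) and the random tap signs are assumptions for illustration only; as the text notes, several candidate sets would be generated and one chosen by listening.

```python
import random


def make_sparse_taps(length=4000, gap=350, n_taps=60, decay_per_sample=0.9995,
                     dither=0.2, seed=0):
    """Return sorted (delay, coefficient) pairs for one sparse reverberation tap set.

    Hypothetical parameters; the constraints follow the text:
    no taps in the first `gap` samples, magnitudes decaying with delay
    (with random dither), and taps growing denser as the delay increases.
    """
    rng = random.Random(seed)
    taps = {}
    for _ in range(n_taps):
        # sqrt(u) has density 2t on [0, 1), so taps cluster toward later delays.
        pos = gap + int((length - gap) * rng.random() ** 0.5)
        envelope = decay_per_sample ** (pos - gap)
        magnitude = envelope * (1.0 + dither * rng.uniform(-1.0, 1.0))
        sign = rng.choice((-1.0, 1.0))  # assumption: random signs aid decorrelation
        taps[pos] = sign * magnitude     # a repeated position simply overwrites
    return sorted(taps.items())
```

Calling it with two different seeds gives the two tap sets, which share the same statistics but differ in value.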
Under these constraints, several sets of random coefficients were created, and the set chosen was one that appeared evenly spread (not overly concentrated) and produced a good sound. An example of such a sparse tap filter is shown in FIG.
Although other methods and approximations can be used to derive the sparse tap coefficients, experiments have shown this method to be suitable.
The essential property of the reverberation filter 127 is that it produces two uncorrelated outputs that have no significant frequency coloration and that contain temporally dispersed information from the monaural input.
Provided it has this property, the filter may instead be a recursive filter running at a low sample rate, or it may involve more sophisticated processing if memory and computing power allow.
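Because most coefficients are zero, the reverberator only has to multiply at the tap positions. The sketch below runs one circular delay line on the monaural sum and reads two outputs from it through two sparse tap sets given as (delay, coefficient) pairs; the function name and tap format are assumptions for illustration.

```python
def sparse_reverb(mono, taps_a, taps_b):
    """Feed one delay line with `mono` and read two outputs through two
    sparse tap sets, each a list of (delay_samples, coefficient) pairs.
    Only the non-zero taps are evaluated."""
    size = max(d for d, _ in taps_a + taps_b) + 1
    buf = [0.0] * size  # circular buffer holding the last `size` input samples
    pos = 0
    out_a, out_b = [], []
    for x in mono:
        buf[pos] = x
        # buf[(pos - d) % size] holds the input from d samples ago.
        out_a.append(sum(c * buf[(pos - d) % size] for d, c in taps_a))
        out_b.append(sum(c * buf[(pos - d) % size] for d, c in taps_b))
        pos = (pos + 1) % size
    return out_a, out_b
```

The cost per output sample is one multiply-add per non-zero tap, independent of the delay-line length, which is why a long line with a few dozen taps stays cheap.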
As an example, FIGS. 14 and 15 show the left and right impulse responses, respectively, of the reverberation filter after passing through the forward HRTF. A large amount of detail is obtained in the output for a relatively small amount of computation and memory.
As mentioned above, very long FIR filters generally allow a very accurate simulation of a three-dimensional acoustic space, but they require a large amount of memory to store the audio data and filter coefficients. In contrast, recursive (IIR) filter structures require much less memory, and often relatively less processing power, and can be used to implement reverberation-like filter responses. Unfortunately, significantly reducing the memory used within an IIR reverberator can produce a much less satisfactory three-dimensional acoustic impression.
One approach taken in creating 3D binaural audio signals is to apply higher-quality processing (using higher-order filter structures) to the initial part of the simulated acoustic response. In this approach, the direct sound (the simulated signal path from the virtual loudspeaker directly to the listener) and some early reflections are processed using a separate filter pair for each sound arrival. In each pair, one filter generates the left-ear response and the other generates the right-ear response.
FIG. 16 shows a further example of the implementation. In this example system, all head related transfer functions (HRTFs) are implemented using 50-tap FIR filter pairs. The two top filters 152, 153 in FIG. 16 process the input audio to simulate the direct sound reaching the listener's two ears. Pairs of FIR filters (e.g., 5 pairs) attached to delay line 160 process delayed input audio to simulate the arrival of early echoes from the virtual room at the listener's two ears. Finally, reverberators such as 156, 157 generate a plurality of uncorrelated reverberant signals, each of which is individually binauralized by an FIR filter pair, e.g., 158, 159, that takes its input from that reverberator.
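The direct-sound and early-echo part of FIG. 16 amounts to one HRTF pair on the undelayed input plus one HRTF pair per tap of delay line 160. A minimal sketch follows, assuming short FIR pairs given as coefficient lists (the text uses 50-tap pairs) and hypothetical function names; the reverberators 156, 157 and their filter pairs 158, 159 are omitted.

```python
def fir(x, h):
    """FIR filter with output truncated to len(x)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def delayed(x, d):
    """Delay x by d samples, keeping the original length."""
    return ([0.0] * d + list(x))[:len(x)]


def direct_and_early(x, direct_pair, reflections):
    """Binaural direct sound plus early echoes.

    direct_pair: (h_left, h_right) HRTF pair for the direct sound.
    reflections: list of (delay_samples, (h_left, h_right)), one HRTF pair
    per early echo tapped from a shared delay line.
    """
    left = fir(x, direct_pair[0])
    right = fir(x, direct_pair[1])
    for d, (h_l, h_r) in reflections:
        xd = delayed(x, d)
        left = [a + b for a, b in zip(left, fir(xd, h_l))]
        right = [a + b for a, b in zip(right, fir(xd, h_r))]
    return left, right
```

Each arrival costs one full FIR pair, which is why the text suggests limiting the number of early echoes (e.g., 5 pairs) and, where possible, shortening the HRTFs.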
In this example, the diffuse 3D reverberant field is created by processing each reverberator output through a different HRTF FIR filter pair, e.g., 158, 159, arranged so that the set of HRTF FIR filters covers a wide spread of incident angles around the listener. This is achieved by using a number of reverberators (generally implemented with recursive filter structures), e.g., 156, 157.
In practice, a system implementation such as that shown in FIG. 16 may use a different length for each FIR filter.
Most of the overall processing load can be consumed by these FIR filter implementations, so using shorter, approximated HRTFs wherever possible is one way to improve the efficiency of the algorithm.
The HRTF filters need not have a duration of about 4 ms or more; the use of a 50-tap filter (assuming a sample rate of 48 kHz) is merely an example.
FIG. 17 shows another implementation 170 of the three-dimensional sound processing system, in which the late reverberation portion is implemented using a pair of long FIR filters 171. In this example (assuming a sample rate of 48 kHz), a 32k-tap FIR filter allows simulation of acoustic spaces with reverberation times of up to about 670 ms.
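The durations quoted above follow from T = N / f_s for an N-tap FIR filter at sample rate f_s (48 kHz, as assumed in the text):

```latex
T = \frac{N}{f_s}:\qquad
\frac{50}{48\,000\ \text{Hz}} \approx 1.04\ \text{ms},\qquad
\frac{32\,000}{48\,000\ \text{Hz}} \approx 667\ \text{ms},\qquad
4\ \text{ms} \times 48\,000\ \text{Hz} = 192\ \text{taps}.
```

Reading "32k" as 32,768 taps gives about 683 ms instead; either way the result is close to the figure quoted.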
Because they use real, measured binaural acoustic responses, the reverberation FIR filters of FIG. 17 can provide a much more accurate three-dimensional acoustic impression than the recursive reverberation structure used in FIG.
The long FIR filters used for reverberation in FIG. 17 may be implemented efficiently using techniques such as those described in commonly assigned US Pat. No. 5,502,747. Although such techniques reduce the computation required by these filter implementations, the memory requirements remain very high.
Yet another embodiment describes a class of reverberator intended for generating binaural reverberation, in which a recursive filter is used to create a long impulse response and the binaural properties are applied by a pair of medium-length FIR filters.
FIG. 18 shows the general structure of yet another embodiment 180.
As described above, FIR filters such as 181, delay lines 182 and summing elements 183 are included to simulate the direct sound and early echoes. The middle-to-late reverberation portion of the three-dimensional acoustic response is provided by the binaural reverberation processor 185.
Some desirable features of the binaural reverberation processor 185 are as follows:
* The cross-correlation between the left- and right-channel impulse responses of the binaural reverberation processor 185 should show characteristics similar to those of a real (measured) binaural room response. Preferably, this includes time-varying cross-correlation, as occurs when the lateral energy components of the reverberation grow in the late part of the room response of some acoustic spaces.
* The spectral density of the reverberation response should follow approximately the same time contour as that of the real (measured) binaural room response. In recursive reverberation processors this is generally already addressed, because the recursive filter loops are designed to attenuate high frequencies more rapidly than low frequencies, for example to simulate absorption by air and other effects.
Several other structures have been proposed for implementation of the binaural reverberation
processor 185. FIG. 19 shows one preferred embodiment.
In principle, a single recursive filter can be used to generate the desired decaying reverberation profile of the acoustic space, and a single pair of FIR filters can be used to add the diffuse binaural property to the left and right outputs. In practice, however, any persistent significant inter-channel amplitude imbalance or frequency-response irregularity within the FIR filters will be noticeable at the output of the system. For this reason, multiple recursive filter structures (e.g., 190, 191), each with its own binaural FIR filter pair (e.g., 192, 193), are used to provide a more random binaural response.
In yet another embodiment of the present invention, the two recursive filter structures of FIG. 19 are adapted so that the upper recursive filter structure 190 has a longer reverberation decay time than the lower recursive filter structure 191. In this case, the binaural properties of the lower FIR filter pair 194, 195 govern the response of the system in the early part of the reverberation decay, while the binaural properties of the upper filters 192, 193 dominate the response in the later part.
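The structure of FIG. 19 and this variant can be sketched as a set of recursive branches, each binauralized by its own FIR pair, with each branch's decay time set independently. The text does not specify the recursive filter structure, so this sketch assumes a simple feedback comb filter whose loop gain is chosen for a target 60 dB decay time (T60); all names and parameter values are hypothetical.

```python
def comb_gain(delay, t60_s, fs=48_000):
    """Loop gain that makes a feedback comb decay by 60 dB in t60_s seconds.

    The signal circulates t60_s * fs / delay times in T60, so
    gain ** (t60_s * fs / delay) == 10 ** (-60 / 20).
    """
    return 10.0 ** (-3.0 * delay / (t60_s * fs))


def feedback_comb(x, delay, gain, n_out):
    """y[n] = x[n] + gain * y[n - delay], computed for n_out samples."""
    y = [0.0] * n_out
    for n in range(n_out):
        xn = x[n] if n < len(x) else 0.0
        y[n] = xn + (gain * y[n - delay] if n >= delay else 0.0)
    return y


def fir(x, h):
    """FIR filter with output truncated to len(x)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def binaural_reverb(x, branches, n_out, fs=48_000):
    """Sum of recursive branches, each binauralized by its own FIR pair.

    branches: list of (delay_samples, t60_s, h_left, h_right).
    """
    left = [0.0] * n_out
    right = [0.0] * n_out
    for delay, t60_s, h_l, h_r in branches:
        r = feedback_comb(x, delay, comb_gain(delay, t60_s, fs), n_out)
        left = [a + b for a, b in zip(left, fir(r, h_l))]
        right = [a + b for a, b in zip(right, fir(r, h_r))]
    return left, right
```

Giving one branch a longer T60 than the other reproduces the arrangement described above: late in the decay only the longer-decaying branch remains significant, so its FIR pair dominates the binaural character there.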
FIG. 20 illustrates another embodiment 200, which has more recursive filter structures, 201-204. In the system 200 shown in FIG. 20, any imbalance between the left and right filter coefficients used in the FIR filters is corrected by using each binaural filter pair together with its mirror image, i.e., the same pair with the left and right transfer functions exchanged.
In a further arrangement 210 shown in FIG. 21, two mirror-image pairs of FIR filters are realized using a single pair of sum (e.g., 211) and difference (212) filters, which significantly reduces the FIR computation.
Yet another alternative embodiment 220 is shown in FIG. 22, in which the output 221 of one FIR filter is fed back into one or more of the recursive filter structures. This feedback path 221 also makes it possible to realize a denser reverberation response.
As mentioned above, the embodiments discussed take a stereo input signal or, where available, a digital or surround sound input signal such as Dolby Pro Logic, Dolby Digital (AC-3) or DTS, and use one or more headphone sets for output. The input signal is binaurally processed to improve the listening experience of various source material over headphones, making the sound appear to come from "out of the head" or providing enhanced surround sound listening.
With knowledge of such processing techniques for producing out-of-head effects, the processing can be provided by systems in a number of different forms. For example, numerous different physical embodiments are possible, and the final result may be achieved using analog or digital signal processing techniques, or a combination of both.
In a purely digital implementation, the input data will be obtained in digital, time-sampled form. If the embodiment is implemented as part of a digital audio device such as a compact disc (CD), minidisc, digital video disc (DVD) or digital audio tape (DAT) player, the input data will already be available in this form. If the unit is implemented as a stand-alone physical device, it can include a digital receiver (S/PDIF or a similar optical or electrical interface). If the invention is implemented such that only an analog input signal is available, this analog signal must be digitized using an analog-to-digital converter (ADC).
This digital input signal is then processed by a digital signal processor (DSP) programmed to implement the selected filters and mixing effects. Examples of usable DSPs include:
1. semi-custom or full-custom integrated circuits designed as application-specific DSPs;
2. a programmable DSP chip, e.g., the Motorola DSP56002;
3. one or more programmable logic devices.
In a standard implementation, the processing may include the following major building blocks:
1. Filtering with different filter characteristics derived from measured or synthesized head related transfer functions (HRTFs), using a low-latency technique such as that described in commonly assigned US Pat. No. 5,502,747.
2. Recursive (infinite impulse response, IIR) filtering for all or part of the impulse response derived from the measured or synthesized HRTF.
3. A "sparse tap" finite impulse response (FIR) or IIR reverberation filter to simulate the late reflections present in a standard listening environment with speakers. A sparse tap FIR filter is one in which most of the coefficients are zero and therefore need not be calculated.
4. If the embodiment is to be used with a specific set of headphones, filters to compensate for any undesirable frequency response characteristics of those headphones.
After processing, the stereo digital output signal is converted to an analog signal using a digital-to-analog converter (DAC), amplified if necessary, and passed to the stereo headphone output, possibly through other circuits. This last step may be performed inside the audio device if the embodiment is embedded in it, or elsewhere if the embodiment is implemented as part of another device.
The ADCs and/or DACs may likewise be incorporated on the same integrated circuit as the processor. Embodiments can also be implemented so that some or all of the processing occurs in the analog domain. Embodiments preferably provide some means of switching the "binauralization" effect on and off and of switching between equalizer settings for different headphone sets, and may incorporate controls for other aspects of the processing, possibly including output volume.
In one embodiment, the processing steps are incorporated into a CD or DVD player in place of the skip protection IC. Many currently available CD players incorporate a "skip protection" feature that buffers data read from the CD in random access memory (RAM). If a "skip" is detected, i.e., the audio stream is interrupted because the unit's mechanism has gone off track, the unit can play data from the RAM while the data is read from the CD again. Skip protection is often implemented as a dedicated DSP, with the RAM either on-chip or off-chip.
This embodiment is designed to be usable as a replacement for the skip protection processor while minimizing changes to existing designs. In this implementation, it would be realized as a full-custom integrated circuit that performs the functions of both the existing skip protection processor and the out-of-head processing. Part of the RAM already included for skip protection can be used to implement the out-of-head algorithm for HRTF-type processing, and a number of the building blocks of the skip protection processor can be used for the processing described in connection with the present invention. An example of such an arrangement is shown in FIG.
In yet another embodiment illustrated in FIG. 24, the processing is incorporated into a digital audio device (such as a CD, minidisc, DVD or DAT player) in place of the DAC. In this implementation, signal processing is performed by a dedicated integrated circuit that incorporates a DAC. Because this integrated circuit can be made virtually pin-compatible with existing DACs, it can easily be incorporated into digital audio devices with only minor modifications to existing designs.
In yet another embodiment illustrated in FIG. 25, the processing is incorporated into a digital audio device (e.g., a CD, minidisc, DVD or DAT player) as an additional stage in the digital signal chain. In this implementation, the signal processing is performed by a dedicated or programmable DSP inside the digital audio device, inserted into the stereo digital signal chain ahead of the DAC.
In a further embodiment illustrated in FIG. 26, the processing is incorporated into an audio device (e.g., a personal cassette player or a stereo radio receiver) as an additional stage in the analog signal processing chain. In this embodiment, an ADC is used to digitize the analog input signal. This embodiment would be fabricated on a single integrated circuit incorporating the ADC, DSP and DAC, and possibly some analog processing as well. It can easily be added to the analog signal chain of existing cassette player designs and similar devices.
In a further embodiment illustrated in FIG. 27, the processing is implemented as an external device for use with stereo input in digital form. This embodiment may be a stand-alone physical unit, or may be integrated into the headphone set as described above. It can be powered from a battery, with the option of receiving power from an external DC plug-pack power supply. The device takes digital stereo input in optical or electrical form, as available on some CD and DVD players and similar equipment. The input format may be S/PDIF or similar, and the unit may support surround sound formats such as Dolby Digital (AC-3) and DTS. It can also have an analog input, as described below. The processing is performed by some form of DSP, followed by the DAC. If the DAC cannot drive the headphones directly, an additional amplifier is added after the DAC. This embodiment of the invention can be implemented on a custom integrated circuit incorporating a DSP, a DAC and possibly a headphone amplifier.
Alternatively, this embodiment can be implemented as a stand-alone physical unit or integrated into a headphone set, powered from a battery with the option of accepting power from an external DC plug-pack power supply. This device takes an analog stereo input, which is converted to digital data by an ADC. The data is then processed by the DSP and returned to analog via the DAC. Part or all of the processing can instead be carried out in the analog domain. This implementation can be manufactured on a custom integrated circuit incorporating an ADC, DSP, DAC and possibly a headphone amplifier, as well as any required analog processing circuitry. This embodiment may incorporate a distance or "zoom" control that allows the listener to vary the perceived distance of the sound source.
In yet another embodiment, this control is implemented as a slider. At its minimum position, the sound seems to come from very close to the ears, and may in fact be flat, non-binauralized stereo. At its maximum setting, the sound is perceived as coming from some distance away. The control can be varied between these limits to adjust how "out of head" the sound is perceived to be. Starting the control at the minimum position and sliding it to the maximum allows the user to adapt to the binaural environment more quickly than with a simple binaural on/off switch.
Such a control can be implemented by storing different sets of filter responses measured with the sound source placed at different distances, with the processor changing the current set of filter coefficients according to the current zoom control position or setting.
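A minimal sketch of the zoom control follows, assuming the stored filter sets are ordered from nearest to farthest and the slider reports a value between 0 and 1. `select_filter_set` implements the switching described above; `interpolate_fir_sets`, which crossfades the FIR coefficients of neighbouring sets for a smoother change, is an additional assumption not described in the text. Both names are hypothetical.

```python
def clamp01(v):
    """Limit a slider value to the range [0, 1]."""
    return min(max(v, 0.0), 1.0)


def select_filter_set(zoom, filter_sets):
    """Pick the stored filter set nearest to the slider position.

    zoom: 0.0 (closest, possibly flat stereo) .. 1.0 (farthest).
    filter_sets: ordered from nearest to farthest measured distance.
    """
    idx = round(clamp01(zoom) * (len(filter_sets) - 1))
    return filter_sets[idx]


def interpolate_fir_sets(zoom, filter_sets):
    """Assumed alternative: linearly crossfade the coefficients of the two
    neighbouring FIR sets (needs at least two sets of equal length)."""
    pos = clamp01(zoom) * (len(filter_sets) - 1)
    i = min(int(pos), len(filter_sets) - 2)
    frac = pos - i
    lo, hi = filter_sets[i], filter_sets[i + 1]
    return [(1.0 - frac) * a + frac * b for a, b in zip(lo, hi)]
```

For FIR filters, crossfading coefficients is equivalent to crossfading the two filtered outputs, so it avoids audible jumps as the slider moves; switching whole sets is cheaper but changes the sound in steps.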
As a further variation, one embodiment can be implemented as a comprehensive integrated circuit solution suited to a wide range of applications, including those described above. This integrated circuit may incorporate some or all of the building blocks mentioned in the implementations above, and it can be incorporated into virtually any audio product with a headphone output.
It can also be the basic building block of any physical unit manufactured specifically as an implementation of the invention. Such an integrated circuit would include control pins allowing the device to operate in different modes (e.g., analog or digital input), as well as some or all of the following: ADCs, DSPs, DACs, memory, I2S stereo digital audio inputs, S/PDIF digital audio inputs, and headphone amplifiers.
Those skilled in the art will appreciate that numerous additional changes and/or modifications can be made to the invention as shown in the specific embodiments without departing from the spirit or scope of the invention as broadly described. These embodiments should therefore be considered in all respects as illustrative and not restrictive.
Although other forms may fall within the scope of the present invention, a preferred form of the invention will now be described, by way of example only, with reference to the accompanying drawings, which illustrate the operation of the system of the invention. In order, the drawings show:
* a generalized form of one embodiment;
* a more detailed schematic form of an embodiment;
* a schematic of a converter from Dolby AC-3 to stereo headphones;
* a schematic of an embodiment from stereo input to stereo output;
* one form of Dolby AC-3 input to stereo output conversion in accordance with the present invention;
* a modified general embodiment;
* a schematic of stereo mixing in a modified form;
* surround sound mixing in a modified form;
* the calculation of the direct and shadow responses;
* the resulting direct response;
* the resulting shadow response;
* an example of a suitable reverberation sparse tap set;
* an example of a suitable reverberation filter;
* an example of a suitable reverberation filter;
* a known method of realizing binauralization;
* a second known method of realizing binauralization;
* the basic overall structure of a further embodiment;
* a first implementation of the binaural reverberation processing of FIG. 18;
* another implementation of a binaural reverberation processor;
* yet another implementation of a binaural reverberation processor;
* the use of feedback in yet another implementation of a binaural reverberation processor;
* an embodiment including a binauralization device that replaces the skip protection DSP in a CD or DVD player;
* an embodiment including a binauralization device that replaces the digital-to-analog converter in a digital audio device;
* an embodiment incorporating a binauralization device into a digital audio device;
* an embodiment incorporating a binauralization device into an analog audio device;
* a stand-alone binauralization device;
* various possible physical implementations of a stand-alone binauralization device.
Description of reference numerals
11 audio input
13 speaker input
15 headphone processing system
16 headphone
17, 18 virtual speaker
22, 23 impulse response filter
32, 35 mixing matrix
33 filter
43, 44, 47, 48 filter
49 analog adder
50 low-end stereo decoder
52 headphone
53, 54 filter
55 analog adder
56 difference unit
91, 92 filter
93 feedback path
94, 95 mixing matrix
96, 97 filter array
101 sum filter
102 difference filter
104 single sparse-tap reverberation FIR delay line
105, 106 sparse tap output
109 shuffling process
112 center speaker input
116, 117 rear speaker input
118 shuffler
119 sum filter
120 difference filter
124 shuffler
127 sparse-tap reverberation FIR filter
130, 131 virtual speaker
132, 133 ear
152, 153 filter
156, 157 reverberator
158, 159 filter
160 delay line
171 FIR filter
181 FIR filter
182 delay line
183 addition element
185 binaural reverberation processor
190, 191 recursive filter structure
192, 193 upper FIR filter
194, 195 lower FIR filter
201, 202, 203, 204 recursive filter structure
211 sum filter
212 difference filter
221 feedback path