Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2012147413
PROBLEM TO BE SOLVED: To provide a narrow-directivity audio reproduction processing technique whose directivity is sharper than in the prior art, while reproducing sound with a sufficient SN ratio and being able to reproduce sound in any direction.
SOLUTION: A filter is determined for the direction that is the target of audio reproduction, using the transfer characteristic a_φ of the sound from each speaker toward each direction φ included in one or more directions assumed as sound traveling directions. A sound that is emitted from the speaker array and reflected by a reflecting object so that the traveling direction of the reflected sound is parallel to the direct-sound direction φ is called a dual sound; each transfer characteristic a_φ is represented by the sum of the transfer characteristic of the direct sound toward the direction φ and the transfer characteristics of one or more dual sounds. The filter is applied, for each frequency, to the frequency domain signal S obtained by converting the source signal into the frequency domain, to obtain the M-channel frequency domain signal X. [Selected figure] Figure 3
Narrow-directivity sound reproduction processing method, apparatus, and program
[0001]
The present invention relates to a signal processing technology (narrow directional sound
reproduction processing technology) for reproducing sound in a narrow range including a
desired direction.
[0002]
In sound reproduction using speakers, there are situations where it is desirable to reproduce sound at a sufficient volume only in a specific direction.
Examples include playing a recorded explanation of an exhibit only in a limited area in front of that exhibit in an exhibition hall, or playing a warning announcement only in a limited area such as the foot of a staircase or the edge of a station platform. Signal processing technology for reproducing sound in a narrow range that includes a desired direction (target direction) viewed from the speaker, i.e., narrow-directivity sound reproduction processing technology, has long been researched and developed. The spatial distribution of the sound pressure of the sound emitted from the speaker (the sound pressure distribution around the speaker) is called directivity; the sharper the directivity in a certain direction, the more the sound can be confined to a narrow range including that direction while its sound pressure is suppressed outside that range. Three conventional techniques related to narrow-directivity sound reproduction processing are described below. Note that, in this specification, "voice" is not limited to the human voice but refers to sounds in general, including musical tones and environmental noise as well as human and animal voices.
[0003]
[1] Narrow-directivity sound reproduction processing technology using physical characteristics: Representative examples of this category are horn speakers and parabolic speakers. A horn speaker is a speaker in which a horn, whose cross-sectional area gradually widens toward the open end, is attached in front of the driver. The longer the horn, the sharper the directivity of the horn speaker. A parabolic speaker has a configuration in which a speaker is placed at the focal point of a parabolic reflector (paraboloid); by radiating sound from the speaker toward the reflector, the sound is transmitted in the direction of the straight line connecting the apex of the parabolic reflector and its focal point.
[0004]
[2] Narrow-directivity sound reproduction processing technology using ultrasonic waves: A representative example of this category is the parametric speaker (see, for example, Patent Document 1). A parametric speaker uses an ultrasonic wave, which propagates with high straightness, as a carrier, and emits at high sound pressure a modulated wave whose amplitude is modulated by the sound source signal. As the modulated wave propagates through air, the nonlinear characteristics of air produce a distortion component, and this distortion component, together with human auditory characteristics, causes sound in the audible band to be perceived.
[0005]
[3] Narrow-directivity sound reproduction processing technology using signal processing: A typical example of this category is the phased speaker array (see, for example, Non-Patent Document 1). A phased speaker array is a speaker array composed of a plurality of speakers; a filter containing time-difference and level-difference information is applied to the sound source signal, each speaker radiates the resulting signal, and the radiated sounds superimpose in space so that the sound is reproduced in the target direction.
[0006]
Japanese Unexamined Patent Application Publication No. 2010-258938
[0007]
Haneda Yoichi, Kataoka Akitoshi, "Real Space Performance of Small-sized Speaker Array Based on
Multipoint Control Using Free Space Transfer Function", Proceedings of the Spring Meeting of
the Acoustical Society of Japan, pp. 631-632, 2008.
[0008]
According to the narrow-directivity sound reproduction processing technology of category [1], as can be understood from the examples of the horn speaker and the parabolic speaker, sound cannot be reproduced in the target direction unless the speaker itself is pointed in that direction.
That is, when the target direction changes, a drive control means for changing the orientation of the horn speaker or parabolic speaker itself becomes necessary unless one relies on a person physically moving it.
In addition, it is difficult for either the horn speaker or the parabolic speaker to realize narrow directivity, for example a sharp directivity of about ±5° to ±10° with respect to the target direction.
[0009]
According to the narrow-directivity sound reproduction processing technology of category [2], although it is excellent in terms of narrow directivity, sound cannot be reproduced in the target direction unless the parametric speaker itself is pointed in that direction. That is, when the target direction changes, a drive control means for changing the orientation of the parametric speaker itself becomes necessary unless one relies on a person physically moving it. There is also the still-debated issue of ultrasound exposure (whether high levels of ultrasound pose any health problems).
[0010]
According to the narrow directional sound reproduction processing technology described in
category [3], in order to realize narrow directivity, it is necessary to increase the number of
speakers and to increase the array size (total length of the array). It is not realistic to increase the
array size indefinitely from the viewpoint of space restrictions for installing a phased speaker
array, cost, the number of speakers that can perform real-time processing, and the like. For
example, although the maximum value of signals that can be processed in real time with a
commercially available speaker is about 100, the directivity that can be realized with a phased
speaker array using about 100 speakers is ± 30 for the target direction It is difficult to
reproduce voice in the target direction with a sharp directivity of about ± 5 ° to about ± 10 °,
for example. Further, in the prior art of category [3], it is difficult to reproduce voice with a high
SN ratio toward the target direction so as not to be buried in voice in directions other than the
target direction.
[0011]
In view of this situation, it is an object of the present invention to provide a narrow-directivity sound reproduction processing technique that reproduces sound with a sufficient SN ratio, can reproduce sound in any direction without requiring physical movement of the speakers, and has directivity toward the desired direction that is sharper than in the prior art.
[0012]
A filter is determined for the direction that is the target of sound reproduction, using the transfer characteristics a_φ of the sound from the M speakers toward each direction φ included in one or more directions assumed as sound traveling directions [filter design processing].
M is an integer of 2 or more, and the M speakers constitute a speaker array. A sound that (1) is emitted from the speaker array and (2) is reflected by a reflecting object so that the traveling direction of the reflected sound is the direction φ is called a dual sound. Each transfer characteristic a_φ is represented by the sum of the transfer characteristic of the direct sound toward the direction φ and the transfer characteristics of one or more dual sounds. The filter converts, for each frequency, the frequency domain signal S obtained by converting the sound source signal into the frequency domain into an M-channel frequency domain signal X. The filter determined in the filter design processing is applied to the frequency domain signal S for each frequency to obtain the M-channel frequency domain signal X [filter application processing]. The M-channel time domain signal x obtained by converting the M-channel frequency domain signal X into the time domain is reproduced by the speaker array in the usual manner.
[0013]
As a specific example, each transfer characteristic a_φ may be the sum of the steering vector of the direct sound and the steering vectors of one or more dual sounds, each dual-sound steering vector being corrected for the attenuation of the sound due to reflection and for the time difference of the reflected sound relative to the direct sound; alternatively, each transfer characteristic a_φ may be obtained by actual measurement in the real environment.
[0014]
In the filter design processing, a filter may be determined for each frequency so that the power of the sound in directions other than the target direction of sound reproduction is minimized. Alternatively, a filter may be determined for each frequency so that the SN ratio in the target direction of sound reproduction is maximized. Alternatively, a filter may be determined for each frequency so that, with the filter coefficient for one of the M speakers fixed at a constant value, the power of the sound in the one or more directions assumed as sound traveling directions is minimized.
[0015]
Alternatively, in the filter design processing, a filter may be determined for each frequency so that, under the constraints of (1) full-band passage of the sound toward the target direction of sound reproduction and (2) full-band suppression of the sound toward one or more dead angles (blind spots), the power of the sound in directions other than the target direction and the dead angles is minimized. Alternatively, the filter may be determined for each frequency by normalizing the transfer characteristic a_s for the direction φ = s that is the target of sound reproduction. Alternatively, the filter may be determined for each frequency using a spatial correlation matrix expressed with the transfer characteristics a_φ corresponding to the directions other than the target direction of sound reproduction. Alternatively, a filter may be determined for each frequency so that, under the condition that the degradation of the sound toward the target direction is no more than a predetermined amount, the power of the sound in directions other than the target direction is minimized. Alternatively, a filter may be determined for each frequency using a spatial correlation matrix expressed with frequency domain signals obtained by converting signals observed with a microphone array into the frequency domain.
[0016]
According to the present invention, not only the direct sound toward the target direction of sound reproduction but also reflected sound is used, so reproduction with a sufficiently large SN ratio in that direction is possible, and because the sound reproduction toward that direction is achieved by signal processing, sound can be reproduced in any direction without requiring physical movement of the speakers. Furthermore, as detailed in the Principle section below, each transfer characteristic a_φ is expressed as the sum of the transfer characteristic of the direct sound toward the direction φ and the transfer characteristics of one or more dual sounds; when a filter is designed on this basis under ordinary filter design criteria, the coherence that determines how wide or narrow the directivity toward the target direction becomes can be suppressed more strongly. That is, the directivity toward the target direction of sound reproduction is sharper than in the prior art.
[0017]
FIG. 1 (a) schematically shows that narrow directivity cannot be realized sufficiently when only the direct sound is considered; FIG. 1 (b) schematically shows that narrow directivity can be realized sufficiently when the direct sound and the reflected sound are considered.
FIG. 2 illustrates the direction dependency of the coherence according to the prior art and according to the principle of the present invention.
FIG. 3 is a block diagram showing the functional configuration of a narrow-directivity sound reproduction processing device according to the first embodiment.
FIG. 4 shows the processing procedure of the narrow-directivity sound reproduction processing method according to the first embodiment.
FIG. 5 shows the configuration of the first example.
FIGS. 6 and 7 show experimental results of the first example.
FIG. 8 shows the directivity obtained with the filter W(ω, θ) in the first example.
FIG. 9 shows the configuration of the second example.
FIGS. 10 and 11 show experimental results of the second example.
FIG. 12 shows an implementation example of the present invention: (a) top view, (b) front view, (c) side view.
FIG. 13 (a) and (b) are side views showing other implementation examples of the present invention.
FIG. 14 shows a form of use of the implementation example shown in FIG. 13 (b).
FIG. 15 shows an implementation example of the present invention: (a) top view, (b) front view, (c) side view.
FIG. 16 is a side view showing an implementation example of the present invention.
FIG. 17 is a block diagram showing the functional configuration of a narrow-directivity sound reproduction processing device according to the second embodiment.
FIG. 18 shows the processing procedure of the narrow-directivity sound reproduction processing method according to the second embodiment.
[0018]
<< Principle >> The principle of the present invention will now be described. One feature of the present invention is that it combines the essence of speaker array technology, which can reproduce sound in any direction by signal processing, with reproduction of sound at a high SN ratio through active use of reflected sound, and with signal processing that enables sharp directivity.
[0019]
Since signal processing in the frequency domain is mainly described, symbols are defined before the description. Because the discrete frequency index ω is related to the frequency f and the angular frequency ω by ω = 2πf, the discrete frequency index ω may be identified with the angular frequency ω; below, the discrete frequency index ω is simply called the frequency. Let k be the frame number index. Let S(ω, k) be the frequency domain representation of the k-th frame of the single-channel source signal, let the direction θs be the target direction of sound reproduction viewed from the center of the speaker array, let W(ω, θs) be the filter that converts the frequency domain signal S(ω, k) of the source signal into an M-channel frequency domain signal, and let X(ω, k) = [X_1(ω, k), ..., X_M(ω, k)] be the M-channel frequency domain signal (hereinafter, the reproduction signal) obtained by applying the filter W(ω, θs) to S(ω, k). M is an integer of 2 or more. The reproduction signal X(ω, k) = [X_1(ω, k), ..., X_M(ω, k)] of the k-th frame is then given by equation (1), where H denotes the Hermitian transpose. The reproduction signal X(ω, k) = [X_1(ω, k), ..., X_M(ω, k)] is converted into time domain signals, which are reproduced by the speakers corresponding to the respective channels (details are described later). The number of speakers is M.
[0020]
Although "the center of the speaker array" can be arbitrarily determined, generally, the geometric
center of the arrangement of M speakers is "the center of the speaker array", for example, a
linear speaker array (M speakers In the case of a speaker array arranged in a straight line, the
middle point of the speakers at both ends is taken as the center of the speaker array , and is a
flat speaker arranged in a square matrix of m × m (m <2> = M) In the case of an array, the
position at which the diagonals of the speakers at the four corners meet is taken as the "center of
the speaker array".
[0021]
There are various design methods for the filter W(ω, θs); here, a design based on the minimum variance distortionless response (MVDR) method is described first.
In the minimum variance distortionless response method, the filter W(ω, θs) is designed, using the spatial correlation matrix Q(ω) and under the constraint of equation (3), so that the power at the frequency ω of the sound in directions other than the target direction θs (hereinafter, the power of the sound in directions other than the target direction θs is also called the leakage sound) is minimized (see equation (2)). Here a(ω, θs) = [a_1(ω, θs), ..., a_M(ω, θs)]^T is the transfer characteristic at the frequency ω between the M speakers and a listening position assumed to lie in the direction θs, where T denotes transposition. In other words, a(ω, θs) = [a_1(ω, θs), ..., a_M(ω, θs)]^T is the transfer characteristic at the frequency ω of the sound from each speaker included in the speaker array toward the direction θs. The spatial correlation matrix Q(ω) can be expressed using frequency domain signals obtained by converting into the frequency domain the signals observed with a microphone array composed of M microphones (preferably a microphone array in which each speaker of the speaker array is replaced with a microphone), but it can also be expressed using transfer characteristics. For the time being, the case where the spatial correlation matrix Q(ω) is expressed using transfer characteristics is described.
[0022]
It is known that the filter W(ω, θs) that is the optimal solution of equation (2) is given by equation (4) (Reference 1). (Reference 1) Simon Haykin (Japanese translation by H. Suzuki et al.), "Adaptive Filter Theory", 1st edition, Science and Technology Publishing, 2001, pp. 66-73, 248-255.
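As a point of reference, the standard MVDR solution under the constraint of equation (3), W(ω, θs) = Q^{-1}(ω) a(ω, θs) / (a^H(ω, θs) Q^{-1}(ω) a(ω, θs)), which equation (4) presumably expresses, can be sketched as follows. This is a minimal NumPy illustration; the function and variable names are not from the patent.

```python
import numpy as np

def mvdr_filter(a_s, Q):
    """Minimum variance distortionless response filter.

    a_s : (M,) complex transfer characteristic a(omega, theta_s) toward the
          target direction.
    Q   : (M, M) Hermitian spatial correlation matrix Q(omega).
    Returns the (M,) filter W(omega, theta_s) minimizing W^H Q W subject to
    the distortionless constraint W^H a_s = 1 (Eq. (3)).
    """
    q_inv_a = np.linalg.solve(Q, a_s)         # Q^{-1} a
    return q_inv_a / (a_s.conj() @ q_inv_a)   # scale so that W^H a = 1
```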
[0023]
The presence of the inverse of the spatial correlation matrix Q(ω) in equation (4) shows that the structure of the spatial correlation matrix Q(ω) is important for achieving sharp directivity. It can also be seen from equation (2) that the power of the leakage sound depends on the structure of the spatial correlation matrix Q(ω).
[0024]
Let the set to which the index p of the propagation directions of the leakage sound belongs be {1, 2, ..., P−1}, and assume that the index s of the target direction θs does not belong to the set {1, 2, ..., P−1}. The spatial correlation matrix Q(ω) is then given by equation (5a). P is an integer satisfying P ≦ M, although from the viewpoint of obtaining a filter that realizes narrow directivity it is preferable that P be relatively large. Here the target direction θs has been described as if it were one specific direction, in order to explain the principle of the invention clearly (and accordingly any direction other than the target direction θs is a direction of leakage sound). As will become clear in the embodiments described later, however, the target direction θs is in practice an arbitrary direction that can become the target of sound reproduction, and a plurality of directions are generally assumed as candidates for the target direction θs. From this point of view the distinction between the target direction θs and the directions of leakage sound is only a matter of convention; it is more accurate to understand that P different directions are predetermined as directions assumed as sound traveling directions, without distinguishing between reproduced sound and leakage sound, and that one of the P directions is selected as the target direction while the remaining directions are directions of leakage sound. Letting Φ be the union of the set {1, 2, ..., P−1} and the set {s}, the spatial correlation matrix Q(ω) is the matrix of equation (5b), expressed with the transfer characteristics a(ω, θφ) = [a_1(ω, θφ), ..., a_M(ω, θφ)]^T (φ ∈ Φ) of the sound from each speaker toward each direction θφ included in the plurality of directions assumed as sound traveling directions. Note that |Φ| = P, where |Φ| denotes the number of elements of the set Φ.
[0025]
Here it is assumed that the transfer characteristic a(ω, θs) of the sound toward the target direction θs and the transfer characteristics a(ω, θp) = [a_1(ω, θp), ..., a_M(ω, θp)]^T toward the directions p ∈ {1, 2, ..., P−1} are mutually orthogonal, that is, that there exists a set of P orthogonal basis vectors satisfying the condition expressed by equation (6). The symbol ⊥ denotes orthogonality: A ⊥ B means that the inner product of the vector A and the vector B is zero. Here it is assumed that P ≦ M holds. If the condition of equation (6) is relaxed so that it suffices for the P vectors to be regarded as an approximately orthogonal basis set, it is preferable that P be about M, or somewhat larger than M.
[0026]
In this case, the spatial correlation matrix Q(ω) can be expanded as in equation (7). Equation (7) means that the spatial correlation matrix Q(ω) can be decomposed using the matrix V(ω) = [a(ω, θs), a(ω, θ1), ..., a(ω, θP−1)]^T and the matrix Λ(ω). ρ is an eigenvalue, with respect to the spatial correlation matrix Q(ω), of the transfer characteristic a(ω, θφ) satisfying equation (6), and is a real number.
[0027]
At this time, the inverse matrix of the spatial correlation matrix Q (ω) is given by equation (8).
[0028]
Substituting equation (8) into equation (2) shows that the power of the leakage sound is minimized.
When the power of the leakage sound is minimized, directivity toward the target direction θs is realized. Therefore, establishing orthogonality between the transfer characteristics of different directions is an important condition for achieving directivity toward the target direction θs.
[0029]
Hereinafter, the reason why it is difficult to achieve sharp directivity with respect to the target
direction θs in the prior art will be discussed.
[0030]
In the prior art, filters were designed on the assumption that the transfer characteristics consist of the direct sound alone.
In reality, the sound emitted from the speakers is reflected by walls, ceilings, and so on, so reflected sound exists; however, the reflected sound was regarded as a factor that degrades the directivity, and its presence was ignored. Denoting the steering vector of the direct sound alone toward the direction θ by h_d(ω, θ) = [h_d1(ω, θ), ..., h_dM(ω, θ)]^T, the transfer characteristic in the prior art is a_conv(ω, θ) = [a_1(ω, θ), ..., a_M(ω, θ)]^T = h_d(ω, θ). The steering vector is the complex vector whose elements are the phase response characteristics at the frequency ω of the respective speakers, relative to a reference point, for a sound wave traveling in the direction θ viewed from the center of the speaker array.
[0031]
Assuming that sound is radiated from a linear speaker array as a plane wave, the m-th element h_dm(ω, θ) of the steering vector h_d(ω, θ) of the direct sound is given, for example, by equation (9a). m is an integer satisfying 1 ≦ m ≦ M, c denotes the speed of sound, u denotes the distance between adjacent speakers, and j is the imaginary unit. The reference point is the midpoint of the linear speaker array (the center of the linear speaker array). The direction θ is defined as the angle between the direction of the direct sound and the direction along which the speakers of the linear speaker array are arranged, viewed from the center of the linear speaker array (see FIG. 5). Note that there are various ways of expressing a steering vector; for example, if the reference point is taken to be the position of the speaker at one end of the linear speaker array, the m-th element h_dm(ω, θ) of the steering vector h_d(ω, θ) of the direct sound is given, for example, by equation (9b). In the following, the m-th element h_dm(ω, θ) of the steering vector h_d(ω, θ) of the direct sound is assumed to be given by equation (9a).
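The equations themselves are not reproduced in this text, so the following is a minimal sketch of a plane-wave steering vector consistent with the verbal description above; the exact phase convention of equations (9a)/(9b) is an assumption, not a quotation.

```python
import numpy as np

def steering_vector_direct(omega, theta, M, u, c=343.0, reference="center"):
    """Plane-wave steering vector h_d(omega, theta) of a linear speaker array.

    omega : angular frequency [rad/s]; theta : direction of the direct sound
    measured from the array axis [rad]; M : number of speakers; u : spacing
    between adjacent speakers [m]; c : speed of sound [m/s].
    """
    m = np.arange(1, M + 1)
    if reference == "center":                     # Eq. (9a): reference at the array center
        offset = (m - (M + 1) / 2.0) * u
    else:                                         # Eq. (9b): reference at one end speaker
        offset = (m - 1) * u
    return np.exp(-1j * omega * offset * np.cos(theta) / c)
```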
[0032]
The inner product value γconv (ω, θ) of the transfer characteristic in the direction θ and the
transfer characteristic in the target direction θs is expressed by equation (10). Note that θ ≠
θs.
[0033]
Hereinafter, γconv (ω, θ) is referred to as coherence. The direction θ in which the coherence
γconv (ω, θ) becomes 0 is given by equation (11). q is any integer except 0. Further, since 0
<θ <π / 2, the range of q is limited for each frequency band.
[0034]
In equation (11), only the parameters related to the size of the speaker array (M and u) can be changed. Therefore, when the direction difference (angle difference) |θ−θs| is small, it is difficult to reduce the coherence γ_conv(ω, θ) without changing the parameters related to the array size. In that case the power of the leakage sound does not become sufficiently small and, as schematically shown in FIG. 1(a), the directivity toward the target direction θs becomes broad, with a wide beam width.
[0035]
In contrast, based on this consideration, the present invention is characterized in that, unlike the prior art, the reflected sound is actively taken into account, based on the finding that, for a filter design to have sharp directivity toward the target direction θs, it is important to be able to reduce the coherence sufficiently even when the direction difference (angle difference) |θ−θs| is small.
[0036]
Here, "dual tone" is defined.
(1) A sound radiated from a speaker array, (2) a sound that satisfies the condition that the sound
is reflected by a reflecting object and the traveling direction of the reflected sound becomes a
target direction I call it ".
[0037]
Assuming that the sound waves are plane waves, for any direction θ two kinds of plane waves are considered: the direct sound, that is, the sound from each speaker of the speaker array that travels without reflection, and the dual sound, that is, the sound reflected by the reflector 300. The number of reflected sounds (dual sounds) is denoted by Ξ, a predetermined integer of 1 or more. The transfer characteristic a(ω, θ) = [a_1(ω, θ), ..., a_M(ω, θ)]^T is then the sum of the transfer characteristic of the direct sound from the speaker array toward the direction θ and the transfer characteristics of the one or more dual sounds corresponding to it. Specifically, letting τξ(θ) be the time difference between the direct sound and the ξ-th (1 ≦ ξ ≦ Ξ) reflected sound and αξ (1 ≦ ξ ≦ Ξ) be a coefficient that accounts for the attenuation of the sound due to reflection, it can be expressed, as in equation (12a), as the sum of the steering vector of the direct sound and the steering vectors of the dual sounds corrected for the attenuation due to reflection and for the time difference relative to the direct sound. h_rξ(ω, θ) = [h_r1ξ(ω, θ), ..., h_rMξ(ω, θ)]^T denotes the steering vector of the ξ-th dual sound corresponding to the direct sound toward the direction θ. Usually αξ ≦ 1 (1 ≦ ξ ≦ Ξ). If each sound radiated from the speaker array (each dual sound) is reflected by a reflector only once, αξ can be regarded as the reflectance of the object on which the ξ-th dual sound is reflected.
[0038]
For a speaker array composed of M speakers it is desirable that one or more reflected sounds exist, so preferably one or more reflectors are present. From this point of view, assuming a listening position in the target direction, it is preferable that each reflector be arranged so that, in the positional relationship among the listening position, the speaker array, and the one or more reflectors, the sound from the speaker array (the dual sound) is reflected by at least one reflector and reaches the listening position. The shape of each reflector may be two-dimensional (for example, a flat plate) or three-dimensional (for example, a paraboloid). The size of each reflector is preferably equal to or somewhat larger than that of the speaker array (roughly one to two times). To make effective use of the reflected sound, the reflectance αξ (1 ≦ ξ ≦ Ξ) of each reflector must at least be greater than 0, and it is further desirable that the amplitude of the reflected sound reaching the listening position be, for example, at least 0.2 times the amplitude of the direct sound; for example, each reflector is made of a rigid solid. A reflector may be a movable object (e.g., a reflecting board) or an immovable object (a floor, wall, or ceiling). Note that if an immovable object is used as a reflector, the steering vectors of the dual sounds must be changed whenever the installation position of the speaker array is changed (see the functions Ψ(θ) and Ψξ(θ) described later), and consequently the filters must be recalculated (reset). Therefore, in order to be robust against environmental changes, it is preferable that each reflector be an accessory of the speaker array (in this case the assumed number of reflected sounds is the number due to these reflectors). Here, an "accessory of the speaker array" is a tangible object that can follow changes in the position, orientation, and so on of the speaker array while maintaining its geometric relationship with the speaker array; a simple example is a configuration in which each reflector is fixed to the speaker array.
[0039]
In the following, in order to explain the advantages of the present invention concretely, it is assumed that Ξ = 1, that the dual sound is reflected only once, and that there is one reflector at a distance of L meters from the center of the speaker array. The reflector is a thick rigid body. In this case, since Ξ = 1, the subscript ξ is omitted and equation (12a) can be written as equation (12b).
[0040]
The m-th element of the steering vector h_r(ω, θ) = [h_r1(ω, θ), ..., h_rM(ω, θ)]^T of the dual sound is given by equation (13), in the same manner as the steering vector of the direct sound (see equation (9a)). The function Ψ(θ) outputs the traveling direction of the dual sound as viewed from the center of the speaker array. When the steering vector of the direct sound is expressed by equation (9b), the m-th element of the steering vector h_r(ω, θ) = [h_r1(ω, θ), ..., h_rM(ω, θ)]^T of the dual sound is expressed by equation (13b). In general, the m-th element of the steering vector h_rξ(ω, θ) = [h_r1ξ(ω, θ), ..., h_rMξ(ω, θ)]^T of the ξ-th dual sound is expressed by equation (13c) or equation (13d). The function Ψξ(θ) outputs the traveling direction of the ξ-th (1 ≦ ξ ≦ Ξ) dual sound as viewed from the speaker array.
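To make the construction concrete, the transfer characteristic of equation (12a) can be sketched as follows, reusing the steering_vector_direct helper from the earlier sketch. Modeling each dual sound as a direct-sound steering vector evaluated in the direction Ψξ(θ), delayed by τξ(θ), and scaled by αξ follows the verbal description above; the exact form of equations (13)-(13d) is an assumption.

```python
import numpy as np

def transfer_characteristic(omega, theta, M, u, alphas, taus, psis, c=343.0):
    """Transfer characteristic a(omega, theta) = direct sound + dual sounds (Eq. (12a)).

    alphas : reflection coefficients alpha_xi; taus : time differences tau_xi(theta) [s];
    psis   : traveling directions Psi_xi(theta) of the dual sounds [rad].
    """
    a = steering_vector_direct(omega, theta, M, u, c)       # direct-sound steering vector
    for alpha, tau, psi in zip(alphas, taus, psis):
        h_r = steering_vector_direct(omega, psi, M, u, c)   # dual sound toward Psi_xi(theta)
        a = a + alpha * np.exp(-1j * omega * tau) * h_r     # attenuation and delay correction
    return a
```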
[0041]
Since the position of the reflector can be set appropriately, the traveling direction of the dual
tone can be treated as a changeable parameter.
[0042]
Assuming that a flat reflector is in the vicinity of the speaker array (the distance L is not
extremely large compared to the size of the speaker array), the coherence γ (ω, θ) is expressed
by equation (14).
Note that θ ≠ θs.
[0043]
From equation (14) it can be seen that the coherence γ(ω, θ) of equation (14) can be smaller than the conventional coherence γconv(ω, θ). Since the second to fourth terms of equation (14) contain parameters (Ψ(θ) and L) that can be changed by how the reflector is placed, the contribution of the first term h_d^H(ω, θ) h_d(ω, θ) can be cancelled.
[0044]
For example, when a flat reflector is arranged for a linear speaker array such that the direction along which the speakers are arranged is the normal of the reflector, Ψ(θ) = π − θ holds for the function Ψ(θ), and equation (15) holds for the time difference τ(θ) between the direct sound and the reflected sound, so the conditions of equations (16) and (17) arise among the elements constituting equation (14). The symbol * is the operator denoting the complex conjugate.
[0045]
Since the absolute value of h_d^H(ω, θ) h_r(ω, θ) is sufficiently smaller than h_d^H(ω, θ) h_d(ω, θ), the second and third terms of equation (14) can be neglected and the coherence γ(ω, θ) can be approximated as in equation (18).
[0046]
Even where h_d^H(ω, θ) h_d(ω, θ) ≠ 0, the approximate coherence γ˜(ω, θ) has minima at the directions θ given by equation (19).
q is an arbitrary positive integer, and the range of q is limited for each frequency band.
[0047]
That is, the coherence can be suppressed not only in the direction given by equation (11) but also
in the direction given by equation (19). If the coherence can be suppressed, the power of the
leaked speech can be further reduced, so that sharp directivity can be realized as schematically
shown in FIG. 1 (b).
[0048]
While FIG. 1 schematically shows the difference in directivity between the principle of the present invention and the prior art, FIG. 2 concretely shows the difference between the directions θ given by equation (11) and those given by equation (19), for ω = 2π × 1000 [rad/s], L = 0.70 [m], and θs = π/4 [rad]. In FIG. 2 the direction dependency of the normalized coherence is shown for comparison; the directions marked with the symbol ○ are the θ given by equation (11), and the directions marked with the symbol + are the θ given by equation (19). As is apparent from FIG. 2, in the prior art the only directions in which the coherence becomes zero with respect to θs = π/4 [rad] are those marked ○, whereas according to the principle of the present invention the directions with zero coherence with respect to θs = π/4 [rad] are the many directions marked +; in particular, since directions marked + exist much closer to θs = π/4 [rad] than the directions marked ○, it can be seen that a sharper directivity than in the prior art is achieved.
[0049]
As is apparent from the above description, the essence of the feature of the present invention is that the transfer characteristic a(ω, θ) = [a_1(ω, θ), ..., a_M(ω, θ)]^T is expressed, as in equation (12a) for example, as the sum of the steering vector of the direct sound and the steering vectors of the Ξ dual sounds. The filter design concept itself is not affected, so the filter W(ω, θs) can also be designed by methods other than the minimum variance distortionless response method.
[0050]
As methods other than the minimum variance distortionless response method described above, the following are described: <1> a filter design method based on the SN ratio maximization criterion, <2> a filter design method based on power inversion, <3> a filter design method based on the minimum variance distortionless response method with one or more dead angles (directions in which the gain of the leakage sound is suppressed) as constraints, <4> a filter design method based on the delay-and-sum beamforming method, <5> a filter design method based on the maximum likelihood method, and <6> a filter design method based on the AMNOR (Adaptive Microphone-array for NOise Reduction) method. For the filter design method based on the SN ratio maximization criterion <1> and the filter design method based on power inversion <2>, see Reference 2. For the filter design method based on the minimum variance distortionless response method with one or more dead angles as constraints <3>, see Reference 3. For the filter design method based on the AMNOR method <6>, see Reference 4. (Reference 2) Nobuyoshi Kikuma, "Adaptive Antenna Technology", 1st edition, Ohmsha, 2003, pp. 35-90. (Reference 3) Futoshi Asano, "Acoustical Society of Japan, Acoustic Techno Series 16: Array Signal Processing of Sound - Localization, Tracking and Separation of Sound Sources", 1st edition, Corona Publishing, pp. 88-89, 259-261. (Reference 4) Yutaka Kaneda, "Adaptive Noise Suppression Microphone Array (AMNOR)", Journal of the Acoustical Society of Japan, Vol. 44, No. 1 (1988), pp. 23-30.
[0051]
<1> Filter design method based on the SN ratio maximization criterion: In this method, the filter W(ω, θs) is determined under the criterion of maximizing the SN ratio (SNR) in the target direction θs. Let Rss(ω) be the spatial correlation matrix of the sound toward the target direction θs and Rnn(ω) be the spatial correlation matrix of the sound toward directions other than the target direction θs. The SNR is then expressed by equation (20), Rss(ω) is expressed by equation (21), and Rnn(ω) is expressed by equation (22). The transfer characteristic a(ω, θs) = [a_1(ω, θs), ..., a_M(ω, θs)]^T is given by equation (12a) (precisely, with θ of equation (12a) read as θs).
[0052]
The filter W(ω, θs) that maximizes the SNR of equation (20) can be obtained by setting the gradient with respect to the filter W(ω, θs) to zero, that is, by solving equation (23).
[0053]
Thus, the filter W(ω, θs) that maximizes the SNR of equation (20) is given by equation (24).
[0054]
Equation (24) contains the inverse of the spatial correlation matrix Rnn(ω) of the sound in directions other than the target direction θs, but it is known that the inverse of the spatial correlation matrix Rxx(ω) of the entire input, including the sound in directions other than the target direction θs, may be substituted for the inverse of Rnn(ω).
Note that Rxx(ω) = Rss(ω) + Rnn(ω) = Q(ω) (see equations (5a), (21), and (22)).
That is, the filter W(ω, θs) that maximizes the SNR of equation (20) may also be determined by equation (25).
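Under the rank-one target model Rss(ω) = a(ω, θs) a^H(ω, θs) of equation (21), the SNR-maximizing filter is proportional to Rnn^{-1}(ω) a(ω, θs) (or, by the substitution above, Rxx^{-1}(ω) a(ω, θs)). A minimal sketch follows; the distortionless normalization is an added assumption, since the scaling is left to equations (24)/(25).

```python
import numpy as np

def max_snr_filter(a_s, R_nn):
    """SNR-maximizing filter: proportional to R_nn^{-1} a_s.

    a_s : (M,) transfer characteristic toward the target direction.
    R_nn : (M, M) spatial correlation matrix of the leakage sound
           (R_xx = Q may be passed instead, as noted in the text).
    """
    w = np.linalg.solve(R_nn, a_s)      # R_nn^{-1} a_s
    return w / (a_s.conj() @ w)         # assumed normalization: unit gain toward theta_s
```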
[0055]
<2> Filter design method based on power inversion: In this method, the filter W(ω, θs) is determined under the criterion of minimizing the average output power of the beamformer while the filter coefficient for one speaker is fixed at a constant value. Here, as an example, the filter coefficient for the first of the M speakers is fixed. In this design method, the filter W(ω, θs) is designed, using the spatial correlation matrix Rxx(ω) and under the constraint of equation (27), so that the power of the sound in all of the one or more directions assumed as sound traveling directions from the speaker array is minimized (see equation (26)). The transfer characteristic a(ω, θs) = [a_1(ω, θs), ..., a_M(ω, θs)]^T is given by equation (12a) (precisely, with θ of equation (12a) read as θs). Note that Rxx(ω) = Q(ω) (see equations (5a), (21), and (22)).
[0056]
It is known that the filter W(ω, θs) that is the optimal solution of equation (26) is given by equation (28) (see Reference 2).
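A minimal sketch of the power-inversion solution, assuming the constrained coefficient is the first one and is held at 1, i.e. the constraint vector is e1 = [1, 0, ..., 0]^T (which equation (28) presumably expresses):

```python
import numpy as np

def power_inversion_filter(R_xx):
    """Power-inversion filter: minimize W^H R_xx W with the first coefficient fixed at 1."""
    M = R_xx.shape[0]
    e1 = np.zeros(M, dtype=complex)
    e1[0] = 1.0                           # constraint vector selecting the first speaker
    w = np.linalg.solve(R_xx, e1)         # R_xx^{-1} e1
    return w / (e1.conj() @ w)            # scale so that the first coefficient equals 1
```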
[0057]
<3> Filter design method based on the minimum variance distortionless response method with one or more dead angles as constraints: In the minimum variance distortionless response method described above, the filter W(ω, θs) was designed under the single constraint of full-band passage of the sound toward the target direction θs, expressed by equation (3), with the criterion of minimizing the average output power of the beamformer expressed by equation (2) (that is, minimizing the power of the leakage sound, i.e., the sound in directions other than the target direction).
With this method the power of the leakage sound can be suppressed as a whole, but it is not necessarily preferable when it is desired to strongly suppress sound propagation in one or more specific directions. In such a case, a filter that strongly suppresses one or more known specific directions (dead angles) is required. Therefore, in the filter design method described here, the average output power of the beamformer expressed by equation (2) is minimized (that is, the power of the sound in directions other than the target direction and the dead angles is minimized) under the constraints of (1) full-band passage of the sound toward the target direction θs and (2) full-band suppression of the sound toward B known dead angles θN1, θN2, ..., θNB (B is a predetermined integer of 1 or more). As described above, letting the set to which the index φ of the sound propagation directions belongs be {1, 2, ..., P}, we have Nj ∈ {1, 2, ..., P} (j ∈ {1, 2, ..., B}) and B ≦ P−1.
[0058]
Assuming that the listening position is in the direction θs and that the dead angles are in the directions θNj (j ∈ {1, 2, ..., B}), let a(ω, θi) = [a_1(ω, θi), ..., a_M(ω, θi)]^T be the transfer characteristic at the frequency ω between the M speakers and the direction θi (i ∈ {s, N1, N2, ..., NB}). In other words, a(ω, θi) = [a_1(ω, θi), ..., a_M(ω, θi)]^T is the transfer characteristic at the frequency ω of the sound from each speaker included in the speaker array toward the direction θi. The constraint condition is then expressed by equation (29), where, for each index i ∈ {s, N1, N2, ..., NB}, the transfer characteristic a(ω, θi) = [a_1(ω, θi), ..., a_M(ω, θi)]^T is given by equation (12a) (precisely, with θ of equation (12a) read as θi). fi(ω) denotes the pass characteristic at the frequency ω for the direction θi.
[0059]
When equation (29) is written in matrix form, it can be expressed, for example, as equation (30), where A(ω, θs) = [a(ω, θs), a(ω, θN1), ..., a(ω, θNB)].
[0060]
Considering the constraints of (1) full-band passage of the sound toward the target direction θs and (2) full-band suppression of the sound toward the B known dead angles θN1, θN2, ..., θNB, ideally one would set fs(ω) = 1.0 and fi(ω) = 0.0 (i ∈ {N1, N2, ..., NB}); this represents complete full-band passage of the sound toward the target direction θs and complete full-band rejection of the sound toward the B known dead angles θN1, θN2, ..., θNB. In reality, however, complete full-band passage or complete full-band rejection may be difficult to control. In such a case, the absolute value of fs(ω) may be set to a value close to 1.0 and the absolute values of fi(ω) (i ∈ {N1, N2, ..., NB}) may be set to values close to 0.0. Of course, fi(ω) and fj(ω) (i ≠ j, i, j ∈ {N1, N2, ..., NB}) may be equal or different.
[0061]
According to the filter design method described here, the filter W(ω, θs) that is the optimal solution of equation (2) under the constraint expressed by equation (29) is given by equation (31) (see Reference 3).
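The constrained solution has the standard linearly-constrained minimum-variance form W = Q^{-1} A (A^H Q^{-1} A)^{-1} f, which equation (31) presumably expresses; a minimal sketch:

```python
import numpy as np

def constrained_mv_filter(A, Q, f):
    """Minimum-variance filter with a target direction and B dead angles.

    A : (M, 1+B) matrix whose columns are the transfer characteristics for the
        target direction and the B dead angles (Eq. (30)).
    Q : (M, M) spatial correlation matrix.
    f : (1+B,) desired pass characteristics, e.g. [1, 0, ..., 0].
    Returns W = Q^{-1} A (A^H Q^{-1} A)^{-1} f.
    """
    Q_inv_A = np.linalg.solve(Q, A)                              # Q^{-1} A
    return Q_inv_A @ np.linalg.solve(A.conj().T @ Q_inv_A, f)    # apply (A^H Q^{-1} A)^{-1} f
```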
[0062]
<4> Filter design method based on the delay-and-sum method: According to the delay-and-sum method, assuming that the direct sound and the reflected sound propagate as plane waves, the filter W(ω, θs) is given by equation (32).
That is, the filter W(ω, θs) is obtained by normalizing the transfer characteristic a(ω, θs). The transfer characteristic a(ω, θs) = [a_1(ω, θs), ..., a_M(ω, θs)]^T is given by equation (12a) (precisely, with θ of equation (12a) read as θs). With this design method the filter accuracy is not always good, but the amount of computation can be small.
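A one-line sketch of this design; the text only states that the transfer characteristic is normalized, so the a^H a normalization used here is an assumed convention:

```python
def delay_and_sum_filter(a_s):
    """Delay-and-sum filter: the transfer characteristic toward the target
    direction, normalized (here by a^H a, an assumed convention for Eq. (32))."""
    return a_s / (a_s.conj() @ a_s)
```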
[0063]
<5> Filter design method based on the maximum likelihood method: Compared with the minimum variance distortionless response method described above, not including the spatial information of the sound toward the target direction in the spatial correlation matrix Q(ω) increases the degree of freedom available for suppressing the leakage sound, so the power of the leakage sound can be suppressed further. Therefore, in the filter design method described here, the spatial correlation matrix Q(ω) is expressed by the second term on the right-hand side of equation (5a), that is, by equation (5c). The filter W(ω, θs) is given by equation (4) or equation (31); in this case, Q(ω) in equation (4) or (31), or Rxx(ω) = Q(ω) in equation (25) or (28), is the spatial correlation matrix expressed by equation (5c).
[0064]
<6> Filter design method based on the AMNOR method: The AMNOR method is a method that, based on the trade-off between the amount of degradation D of the sound in the target direction and the power of the noise remaining in the filter output signal, tolerates a certain amount of degradation D of the sound in the target direction (for example, keeps the degradation D below a threshold D^), and obtains the filter that best reproduces a virtual target signal, in the least-squares sense (that is, the power of the noise contained in the filter output signal is minimized), when the filter input is the mixture of [a] a virtual signal in the target direction (hereinafter, the virtual target signal) to which the transfer characteristics between the sound source and the microphones have been applied and [b] noise (obtained, for example, by observation with M microphones in a noise environment in which no sound arrives from the target direction).
[0065]
The filter design method described here can be regarded as the same as the AMNOR method except that the input and output of the filter are interchanged.
That is, based on the trade-off between the amount of degradation D of the sound in the target direction and the power of the leakage sound remaining in the filter output signal, the degradation D of the sound in the target direction is tolerated to some extent (for example, the degradation D is kept below a threshold D^), and the filter that best reproduces, in the least-squares sense, the listening signal when the frequency domain signal S(ω, k) of the sound source signal is input is obtained (that is, the power of the leakage sound contained in the filter output signal is minimized). Here the listening signal is the signal obtained by applying to the frequency domain signal S(ω, k) the transfer characteristic, at the frequency ω, of the sound from the speakers included in the speaker array toward the target direction θs, and the noise [b] is, for example, obtained by observation with M microphones in a noise environment.
[0066]
According to the filter design method described here, the filter W(ω, θs) is given by equation (33), as in the AMNOR method (see Reference 4). Rss(ω) is expressed by equation (21) and Rnn(ω) by equation (22). The transfer characteristic a(ω, θs) = [a_1(ω, θs), ..., a_M(ω, θs)]^T is given by equation (12a) (precisely, with θ of equation (12a) read as θs).
[0067]
Ps is a coefficient that weights the level of the listening signal and is called the listening signal level. The listening signal level Ps is a constant independent of frequency. The listening signal level Ps may be determined by a rule of thumb, or it may be determined so that the difference between the degradation amount D of the sound in the target direction and the threshold D^ falls within an arbitrarily defined error range. An example of the latter is described. At the frequency ω, the frequency response F(ω) of the filter W(ω, θs) for the sound in the target direction θs is expressed by equation (34). Denoting by D(Ps) the degradation amount D when the filter W(ω, θs) given by equation (33) is used, the degradation amount D(Ps) is defined by equation (35). ω0 denotes the upper limit of the target frequencies ω (usually, the frequency adjacent on the high-frequency side to the discrete frequencies ω considered).
The degradation amount D(Ps) is a monotonically decreasing function of Ps. Therefore, owing to the monotonicity of D(Ps), a listening signal level Ps for which the difference between the degradation amount D(Ps) and the threshold D^ falls within an arbitrarily defined error range can be obtained by repeatedly evaluating D(Ps) while changing Ps.
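Since the text only states that D(Ps) decreases monotonically in Ps, any one-dimensional search will do; the following bisection-style sketch is purely illustrative (the callable degradation, the initial bracket, and the tolerance are assumptions, not values from the patent).

```python
def find_listening_level(degradation, D_hat, ps_lo=1e-6, ps_hi=1e6, tol=1e-3):
    """Search for a listening signal level Ps such that |D(Ps) - D_hat| <= tol.

    degradation : callable Ps -> D(Ps), evaluating Eq. (35) for the filter of
    Eq. (33) built with that Ps; assumed monotonically decreasing in Ps.
    """
    ps_mid = (ps_lo * ps_hi) ** 0.5
    for _ in range(100):
        ps_mid = (ps_lo * ps_hi) ** 0.5     # geometric midpoint over a wide range
        D = degradation(ps_mid)
        if abs(D - D_hat) <= tol:
            return ps_mid
        if D > D_hat:
            ps_lo = ps_mid                  # degradation too large: raise Ps (D decreases in Ps)
        else:
            ps_hi = ps_mid                  # degradation smaller than needed: lower Ps
    return ps_mid
```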
[0068]
<Modifications> In the above description, the spatial correlation matrices Q(ω), Rss(ω), and Rnn(ω) are expressed using transfer characteristics. As mentioned earlier, however, the spatial correlation matrices Q(ω), Rss(ω), and Rnn(ω) can also be expressed using frequency domain signals obtained by converting, into the frequency domain, the signals obtained by observation with a microphone array. The spatial correlation matrix Q(ω) is described below, but the same applies to Rss(ω) and Rnn(ω) (Q(ω) may be replaced with Rss(ω) or Rnn(ω)). The spatial correlation matrix Rss(ω) is obtained from the frequency domain representation of signals observed with a microphone array (comprising M microphones) in an environment in which only the sound from the target direction exists, and the spatial correlation matrix Rnn(ω) is obtained from the frequency domain representation of signals observed with a microphone array (comprising M microphones) in an environment without sound from the target direction (that is, a noise environment).
[0069]
The spatial correlation matrix Q(ω) expressed with the frequency domain signal U(ω, k) = [U_1(ω, k), ..., U_M(ω, k)]^T is given by equation (36). The operator E[·] denotes a statistical averaging operation. If the discrete time series of the signals received by the M microphones is regarded as a stochastic process, the operator E[·] becomes an arithmetic mean (expected value) operation when the process is wide-sense stationary (second-order stationary). In this case the spatial correlation matrix Q(ω) can be expressed by equation (37) using, for example, the frequency domain signals U(ω, k−i) (i = 0, 1, ..., ζ−1) of a total of ζ current and past frames stored in a memory or the like; i = 0, that is, the k-th frame, corresponds to the current frame. The spatial correlation matrix Q(ω) according to equations (36) and (37) may be recalculated for each frame, may be recalculated at regular or irregular intervals, or may be calculated before carrying out the embodiments described later (in particular, when Rss(ω) or Rnn(ω) is used for the filter design, it is preferable to calculate the spatial correlation matrix in advance using frequency domain signals acquired before carrying out the embodiments). When the spatial correlation matrix Q(ω) is recalculated for each frame, it depends on the current and past frames, so it is written explicitly as Q(ω, k), as in equation (36a) or equation (37a).
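A minimal sketch of the frame-averaged estimate of equation (37); the array layout is an assumption, only the averaging of the outer products is taken from the text.

```python
import numpy as np

def spatial_correlation(U_frames):
    """Estimate Q(omega) from observed frequency-domain frames (Eq. (37)).

    U_frames : (zeta, M) complex array; row i holds U(omega, k - i), the
    M-microphone spectrum at one frequency for the current (i = 0) and past frames.
    Returns the (M, M) average of the outer products U U^H over the zeta frames.
    """
    zeta = U_frames.shape[0]
    return U_frames.T @ U_frames.conj() / zeta
```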
[0070]
When the spatial correlation matrix Q(ω, k) expressed by equation (36a) or (37a) is used, the filter W(ω, θs) also depends on the current and past frames, so the filter W is written as W(ω, θs, k). In that case, the filter W(ω, θs) given by any of equations (4), (24), (25), (28), (31), and (33) in the various filter design methods described above is rewritten as equation (4m), (24m), (25m), (28m), (31m), or (33m), respectively.
[0071]
First Embodiment: The functional configuration and processing flow of the first embodiment of the present invention are shown in FIGS. 3 and 4. The narrow-directivity sound reproduction processing device 1 according to the first embodiment includes an AD conversion unit 210, a frame generation unit 220, a frequency domain conversion unit 230, a filter application unit 240, a time domain conversion unit 250, a filter design unit 260, and a storage unit 290.
[0072]
[Step S1] In advance, the filter design unit 260 calculates a filter W(ω, θi) for each frequency for each of the discrete directions that can be targets of sound reproduction.
Letting I be the total number of discrete directions that can be targets of sound reproduction (I is a predetermined integer of 1 or more satisfying I ≦ P), the filters W(ω, θ1), ..., W(ω, θi), ..., W(ω, θI) (1 ≦ i ≦ I, ω ∈ Ω; i is an integer and Ω is the set of frequencies ω) are calculated in advance.
[0073]
For this purpose, except in the case described in <Modifications> above, the transfer characteristics a(ω, θi) = [a_1(ω, θi), ..., a_M(ω, θi)]^T (1 ≦ i ≦ I, ω ∈ Ω) must be determined; they can be calculated concretely by equation (12a) (precisely, with θ of equation (12a) read as θi), based on environmental information such as the arrangement of the speakers in the speaker array, the positions of the reflecting objects (reflectors, floor, walls, ceiling) relative to the speaker array, the time differences between the direct sound and the ξ-th (1 ≦ ξ ≦ Ξ) reflected sound, and the reflectances of the reflecting objects. Note that in the case of the filter design method based on the minimum variance distortionless response method with one or more dead angles as constraints (<3> above), it is desirable that the direction indices i for which the transfer characteristics a(ω, θi) (1 ≦ i ≦ I, ω ∈ Ω) are obtained cover at least all of the indices N1, N2, ..., NB of the B dead angle directions. In other words, the indices N1, N2, ..., NB of the B dead angle directions are set to mutually different integers from 1 to I.
[0074]
The number Ξ of reflected sounds (dual sounds) is set to an integer satisfying 1 ≦ Ξ; there is no particular limit on the value of Ξ, and it may be set appropriately according to the available computing capability and the like. When one reflector is placed in the vicinity of the speaker array, the transfer characteristic a(ω, θi) can be calculated concretely by equation (12b) (precisely, with θ of equation (12b) read as θi).
[0075]
For the calculation of the steering vector, for example, the equations (9a), (9b), (13a), (13b),
(13c), and (13d) can be used. As the transfer characteristics used for filter design, for example,
transfer characteristics obtained by actual measurement in a real environment may be used
without depending on the equation (12a) or the equation (12b).
[0076]
Then, except for the case described in the above-mentioned <Modification>, the transfer
characteristic a <→> (ω, θi) is used to, for example, the equation (4), the equation (24), the
equation (25) and the equation (28) W <→> (ω, θi) (1 ≦ i ≦ I) is obtained by any of the
equation (31), the equation (32) and the equation (33). When the equation (4), the equation (25),
equation (28), or equation (31) is used, other than in the case of the filter design method by the <5> maximum likelihood method described above, the spatial correlation matrix Q(ω) (or Rxx(ω)) can be calculated by equation (5b). When equation (4), equation (25), equation (28), or equation (31) is used according to the filter design method by the above-mentioned <5> maximum likelihood method, the spatial correlation matrix Q(ω) (or Rxx(ω)) can be calculated by equation (5c). When equation (24) is used, the spatial correlation matrix Rnn(ω) can be calculated by equation (22). The filters W<→>(ω, θi) (1 ≦ i ≦ I, ω ∈ Ω) are stored in the storage unit 290. |Ω| represents the number of elements of the set Ω.
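A rough sketch of this precomputation, assuming a single-constraint minimum-variance design: the expressions (4), (24), (25), (28), (31)–(33) and the correlation formulas (5b)/(5c)/(22) are not reproduced here, so `transfer(omega, theta)` and `corr(omega)` are hypothetical callbacks standing in for them.

```python
import numpy as np

def mvdr_filter(Q, a, reg=1e-6):
    """Single-constraint minimum-variance filter: minimize W^H Q W subject to a^H W = 1."""
    Q_inv = np.linalg.inv(Q + reg * np.eye(Q.shape[0]))
    return (Q_inv @ a) / (a.conj() @ Q_inv @ a)

def precompute_filters(omegas, thetas, transfer, corr):
    """Step S1: tabulate W(omega, theta_i) for every candidate direction and frequency;
    the resulting table corresponds to what is stored in the storage unit 290."""
    table = {}
    for omega in omegas:
        Q = corr(omega)                      # stand-in for eq. (5b), (5c), or (22)
        for theta in thetas:
            table[(omega, theta)] = mvdr_filter(Q, transfer(omega, theta))
    return table
```

The same `mvdr_filter` helper is reused in the later sketches; a blind-spot-constrained design would replace it with the `constrained_mv_filter` sketch shown earlier.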
[0077]
[Step S2] The sound source 200 outputs a sound source signal ss (t). In the first embodiment, the
sound source signal ss (t) from the sound source 200 is assumed to be an analog signal.
However, digital signals can also be used as sound source signals.
[0078]
[Step S3] The AD conversion unit 210 AD-converts the sound source signal ss(t) into a digital signal s(t). Here, t represents an index of discrete time. When the sound source signal is already a digital signal, the process of step S3 need not be performed, and the sound source signal can be regarded as s(t), the output signal of the AD conversion unit 210.
[0079]
[Step S4] The frame generation unit 220 receives the digital signal s (t) output from the AD
conversion unit 210, stores N samples in a buffer, and outputs a digital signal s (k) in frame units.
k is an index of a frame number. s (k) = [s ((k-1) N + 1),..., s (kN)]. N depends on the sampling
frequency, but in the case of 16 kHz sampling, around 512 points are appropriate.
[0080]
[Step S5] The frequency domain conversion unit 230 converts the digital signal s (k) of each
frame into a signal S (ω, k) in the frequency domain and outputs it. ω is the index of the discrete
frequency. Although there is a fast discrete Fourier transform as one of the methods for
converting time domain signals into frequency domain signals, the present invention is not
limited to this, and other methods for converting into frequency domain signals may be used. The
frequency domain signal S (ω, k) is output for each frequency ω and frame k.
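Steps S4 and S5 can be sketched as follows; the non-overlapping rectangular framing and the real FFT are assumptions, since the text only fixes N at around 512 points for 16 kHz sampling and allows any transform into the frequency domain.

```python
import numpy as np

N = 512  # frame length; around 512 points is appropriate for 16 kHz sampling

def make_frames(s):
    """Step S4: s(k) = [s((k-1)N+1), ..., s(kN)], returned as an array of shape (K, N)."""
    K = len(s) // N
    return np.reshape(np.asarray(s)[:K * N], (K, N))

def to_frequency_domain(frame):
    """Step S5: convert one frame into frequency-domain bins S(omega, k)."""
    return np.fft.rfft(frame)
```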
[0081]
[Step S6] For each frame k, the filter application unit 240 applies the filter W<→>(ω, θs) corresponding to the target direction θs to be reproduced to the frequency domain signal S(ω, k) for each frequency ω ∈ Ω, and outputs the reproduction signal X<→>(ω, k) = [X1(ω, k), ..., XM(ω, k)] (see equation (38)). Since the index s of the target direction θs satisfies s ∈ {1, ..., I} and the filters W<→>(ω, θs) are stored in the storage unit 290, the filter application unit 240 may, for example, obtain the filter W<→>(ω, θs) corresponding to the target direction θs from the storage unit 290 each time the process of step S6 is performed. When the index s of the target direction θs does not belong to the set {1, ..., I}, that is, when the filter W<→>(ω, θs) corresponding to the target direction θs was not calculated in the process of step S1, the filter design unit 260 may calculate the filter W<→>(ω, θs) corresponding to the target direction θs on the spot, or the filter W<→>(ω, θs′) corresponding to a direction θs′ close to the target direction θs may be used.
[0082]
[Step S7] The time domain conversion unit 250 converts the reproduction signal X<→>(ω, k) = [X1(ω, k), ..., XM(ω, k)] of each frequency ω ∈ Ω of the k-th frame into the time domain to obtain the frame-unit time domain signal x<→>(k) = [x1(k), ..., xM(k)] of the k-th frame, connects the obtained frame-unit time domain signals x<→>(k) = [x1(k), ..., xM(k)] in the order of the frame-number index, and outputs the time domain signal x<→>(t) = [x1(t), ..., xM(t)] in which the voice is emphasized toward the target direction θs, which is the reproduction direction. The method of converting the frequency domain signal into the time domain signal is the inverse transform corresponding to the conversion method used in the process of step S5, for example, a fast discrete inverse Fourier transform.
[0083]
[Step S8] The M-channel time domain signals x1(t), ..., xM(t) are reproduced by the speakers corresponding to their respective channels among the M speakers 280-1, ..., 280-M. That is, the time domain signal xm(t) of the m-th (1 ≦ m ≦ M) channel is reproduced by the m-th speaker 280-m.
[0084]
There is no limitation on the arrangement of the M speakers. It may be an array configuration in
which the speakers are linearly arranged as in a linear speaker array, or may be an array
configuration in which M speakers are two-dimensionally or three-dimensionally arranged. Also,
in order to widen the range of directions that can be set as the reproduction direction, each speaker should have directivity capable of reproducing audio with a certain sound pressure in any direction that can become the target direction θs, that is, the reproduction direction. Therefore, a speaker having relatively moderate directivity, such as a nondirectional speaker or a unidirectional speaker, is preferable.
[0085]
Here, the first embodiment has been described as one in which the filters W<→>(ω, θi) are calculated in advance in the process of step S1; however, depending on the calculation processing capability of the narrow directional audio reproduction processing device 1, it is also possible to adopt an embodiment in which the filter design unit 260 calculates the filter W<→>(ω, θs) for each frequency after the target direction θs, which is the reproduction direction, has been determined.
[0086]
Second Embodiment The functional configuration and processing flow of a second embodiment
of the present invention are shown in FIGS. 17 and 18.
The narrow directional audio reproduction processing device 2 according to the second
embodiment includes an AD conversion unit 210, a frame generation unit 220, a frequency
domain conversion unit 230, a filter application unit 240, a time domain conversion unit 250, a
filter calculation unit 261, a storage unit 290, an AD conversion unit 310, a frame generation unit 320, and a frequency domain conversion unit 330.
[0087]
[Step S11] The sound source 200 outputs a sound source signal ss (t). In the second embodiment,
the sound source signal ss (t) from the sound source 200 is assumed to be an analog signal.
However, digital signals can also be used as sound source signals.
[0088]
[Step S12] The AD conversion unit 210 AD-converts the sound source signal ss(t) into a digital signal s(t). Here, t represents an index of discrete time. When the sound source signal is already a digital signal, the process of step S12 need not be performed, and the sound source signal can be regarded as s(t), the output signal of the AD conversion unit 210.
[0089]
[Step S13] The frame generation unit 220 receives the digital signal s (t) output from the AD
conversion unit 210, stores N samples in a buffer, and outputs a digital signal s (k) in frame units.
k is an index of a frame number. s (k) = [s ((k-1) N + 1),..., s (kN)]. N depends on the sampling
frequency, but in the case of 16 kHz sampling, around 512 points are appropriate.
[0090]
[Step S14] The frequency domain conversion unit 230 converts the digital signal s (k) of each
frame into a signal S (ω, k) in the frequency domain and outputs it. ω is the index of the discrete
frequency. Although there is a fast discrete Fourier transform as one of the methods for
converting time domain signals into frequency domain signals, the present invention is not
limited to this, and other methods for converting into frequency domain signals may be used. The
frequency domain signal S (ω, k) is output for each frequency ω and frame k.
[0091]
[Step S15] The filter calculation unit 261 calculates, for each frequency, the filter W<→>(ω, θs, k) (ω ∈ Ω; Ω is the set of frequencies ω) corresponding to the target direction θs, which is used in the current k-th frame.
[0092]
For this purpose, it is necessary to prepare the transfer characteristics a<→>(ω, θs) = [a1(ω, θs), ..., aM(ω, θs)]<T> (ω ∈ Ω). These can be calculated specifically by equation (12a) (precisely, with θ of equation (12a) set to θs) based on environmental information such as the arrangement of the speakers in the speaker array, the relative positions of reflectors such as the reflection plate, floor, wall, and ceiling with respect to the speaker array, the time difference between the direct sound and the ξ-th (1 ≦ ξ ≦ Ξ) reflected sound, and the sound reflectance of the reflector.
Note that, in the case of the filter design method based on the minimum variance distortionless response method with <3> one or more blind spots as constraint conditions described above, it is also necessary to determine the transfer characteristics a<→>(ω, θNj) (1 ≦ j ≦ B, ω ∈ Ω); these too can be calculated specifically by equation (12a) (precisely, with θ of equation (12a) set to θNj) based on environmental information such as the arrangement of the speakers in the speaker array, the relative positions of reflectors such as the reflection plate, floor, wall, and ceiling with respect to the speaker array, the time difference between the direct sound and the ξ-th (1 ≦ ξ ≦ Ξ) reflected sound, and the sound reflectance of the reflector.
[0093]
The number Ξ of reflected sounds is set to an integer satisfying 1 ≦ Ξ; there is no particular limitation on the value of Ξ, and it may be set appropriately according to the available calculation capability and the like. When one reflection plate is placed in the vicinity of the speaker array, the transfer characteristic a<→>(ω, θs) can be specifically calculated by equation (12b) (precisely, with θ of equation (12b) set to θs). In this case, similarly, the transfer characteristics a<→>(ω, θNj) (1 ≦ j ≦ B, ω ∈ Ω) can be specifically calculated by equation (12b) (precisely, with θ of equation (12b) set to θNj).
[0094]
For the calculation of the steering vector, for example, the equations (9a), (9b), (13a), (13b),
(13c), and (13d) can be used. As the transfer characteristic used for the filter design, the transfer
characteristic obtained by measurement in an actual environment may be used, for example,
without depending on the equation (12a) or the equation (12b).
[0095]
Then, using the transfer characteristic a<→>(ω, θs) (ω ∈ Ω) and, if necessary, the transfer characteristics a<→>(ω, θNj) (1 ≦ j ≦ B, ω ∈ Ω), the filter calculation unit 261 obtains W<→>(ω, θs, k) (ω ∈ Ω) according to any of equation (4m), equation (24m), equation (25m), equation (28m), equation (31m), and equation (33m). The spatial correlation matrix Q(ω) (or Rxx(ω)) can be calculated by, for example, equation (36a) or equation (37a). For the calculation of the spatial correlation matrix Q(ω), the frequency domain signals X<→>(ω, k−i) (i = 0, 1, ..., ζ−1) of the current frame and the past frames, accumulated in the storage unit 290, are used.
[0096]
The frequency domain signal X <→> (ω, k) is accumulated in the storage unit 290 as follows.
Sound is collected using M microphones 300-1,..., 300-M constituting a microphone array. The
arrangement of the M microphones is preferably the same as that of the speaker array. The AD conversion unit 310 converts the analog signals (pickup signals) collected by the M microphones 300-1, ..., 300-M into a digital signal x<→>(t) = [x1(t), ..., xM(t)]. t represents an index of discrete time. The frame generation unit 320 receives the digital signal x<→>(t) = [x1(t), ..., xM(t)] output from the AD conversion unit 310, and stores N samples in a buffer for each
channel. The digital signal x <→> (k) = [x <→> 1 (k),..., X <→> M (k)] in frame units is output. k is
an index of a frame number. x <→> m (k) = [x m ((k−1) N + 1),..., x m (k N)] (1 ≦ m ≦ M). N
depends on the sampling frequency, but in the case of 16 kHz sampling, around 512 points are
appropriate. The frequency domain conversion unit 330 converts the digital signal x<→>(k) of each frame into a frequency domain signal X<→>(ω, k) = [X1(ω, k), ..., XM(ω, k)] and outputs it. ω is the index of the discrete frequency. Although there is a fast discrete
Fourier transform as one of the methods for converting time domain signals into frequency
domain signals, the present invention is not limited to this, and other methods for converting into
frequency domain signals may be used. The frequency domain signal X <→> (ω, k) is output for
each frequency ω and frame k and stored in the storage unit 290.
[0097]
[Step S16] For each frame k, the filter application unit 240 applies the filter W<→>(ω, θs, k) corresponding to the target direction θs to be reproduced to the frequency domain signal S(ω, k) for each frequency ω ∈ Ω, and outputs the reproduction signal X<→>(ω, k) = [X1(ω, k), ..., XM(ω, k)] (see equation (39)).
[0098]
[Step S17] The time domain conversion unit 250 converts the reproduction signal X<→>(ω, k) = [X1(ω, k), ..., XM(ω, k)] of each frequency ω ∈ Ω of the k-th frame into the time domain to obtain the frame-unit time domain signal x<→>(k) = [x1(k), ..., xM(k)] of the k-th frame, connects the obtained frame-unit time domain signals x<→>(k) = [x1(k), ..., xM(k)] in the order of the frame-number index, and outputs the time domain signal x<→>(t) = [x1(t), ..., xM(t)] in which the voice is emphasized toward the target direction θs, which is the reproduction direction.
The method of converting the frequency domain signal into the time domain signal is the inverse transform corresponding to the conversion method used in the process of step S14, for example, a fast discrete inverse Fourier transform.
[0099]
[Step S18] The M-channel time domain signals x1(t), ..., xM(t) are reproduced by the speakers corresponding to their respective channels among the M speakers 280-1, ..., 280-M. That is, the time domain signal xm(t) of the m-th (1 ≦ m ≦ M) channel is reproduced by the m-th speaker 280-m.
[0100]
Experimental results according to Embodiment 1 of the present invention (the minimum variance distortionless response method under a single constraint condition) will be described. As shown in FIG. 5, 24 nondirectional speakers are linearly arranged, and the reflection plate 300 is arranged such that the arrangement direction of the linear speaker array is the normal to the reflection plate 300. Although the shape of the reflection plate 300 is not limited, a flat reflection plate having a flat reflecting surface, a size of 1.0 m × 1.0 m, a suitable thickness, and sufficient rigidity is used. The distance between adjacent speakers was 4 cm, and the reflectance α of the reflection plate was 0.8. The target direction θs was set to 45 degrees. Assuming that the speech is radiated as a plane wave from the linear speaker array, the transfer characteristic was calculated by equation (12b) (see equation (9a) and equation (13a)) to verify the directivity of the generated filter. As a comparative object, the conventional method (the minimum variance distortionless response method without a reflector) described in the above-mentioned Non-Patent Document 1 was used.
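The directivity verification described in this experiment (sweeping the look direction and evaluating the filter response) can be sketched as follows. The stated geometry is used (24 nondirectional speakers at 4 cm spacing, reflectance 0.8, target direction 45 degrees), but the transfer model and the identity correlation matrix are deliberate simplifications, so the sketch only indicates the shape of the computation and does not reproduce the reported curves.

```python
import numpy as np

C_SOUND = 340.0                                  # assumed speed of sound [m/s]
M, d, alpha = 24, 0.04, 0.8                      # speakers, spacing [m], reflectance
positions = np.arange(M) * d
omega = 2.0 * np.pi * 1000.0                     # evaluate at 1 kHz, as an example
theta_s = np.deg2rad(45.0)                       # target direction

def a_vec(theta, tau=0.005):
    """Simplified direct-plus-reflected transfer vector (plane-wave model)."""
    direct = np.exp(-1j * omega * positions * np.sin(theta) / C_SOUND)
    return direct + alpha * np.exp(-1j * omega * tau) * direct

Q_inv = np.linalg.inv(np.eye(M, dtype=complex))  # identity stand-in for the correlation matrix
a_s = a_vec(theta_s)
W = (Q_inv @ a_s) / (a_s.conj() @ Q_inv @ a_s)   # single-constraint minimum variance

angles = np.deg2rad(np.arange(-90, 91))
pattern_db = [20.0 * np.log10(abs(W.conj() @ a_vec(th)) + 1e-12) for th in angles]
```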
[0101]
The experimental results are shown in FIG. 6 and FIG. 7. It can be seen that, compared with the conventional method, Embodiment 1 of the present invention achieves sharper directivity in the target direction in every frequency band. In particular, the utility of the present invention is evident in the lower frequency bands (the human voice contains many frequency components of about 100 Hz to about 2 kHz). Further, FIG. 8 shows the directivity of the filter W<→>(ω, θ) generated according to the first embodiment of the present invention. It can be seen from FIG. 8 that sound is transmitted not only as direct sound in the target direction θs = 45 degrees but also in the direction in which the reflection plate 300 is placed.
[0102]
Further, as shown in FIG. 9, an experiment similar to the above-described one was conducted with the reflection plate 300 disposed such that the angle between the arrangement direction of the speakers included in the linear speaker array and the plane of the reflection plate 300 is 45 degrees. The target direction θs was set to 22.5 degrees, and the other experimental conditions were the same as in the case where the reflection plate 300 was arranged such that the arrangement direction of the speakers included in the linear speaker array was the normal to the reflection plate 300.
[0103]
The experimental results are shown in FIG. 10 and FIG. 11. It can be seen that, compared with the conventional method, Embodiment 1 of the present invention achieves sharper directivity in the target direction in every frequency band. In particular, the utility of the present invention is evident in the lower frequency bands.
[0104]
Next, an example of the implementation configuration of the present invention will be described
with reference to FIGS. 12 to 16. Although the speaker array configuration is illustrated as a
linear speaker array in these examples, it is not limited to the linear speaker array configuration.
[0105]
In the embodiment shown in FIG. 12, the M speakers 280-1, ..., 280-M constituting the linear speaker array are fixed to the rectangular flat support member 400, and in this state the loudspeaker holes of the respective speakers are arranged in a plane (hereinafter referred to as an opening surface) of the support member 400 (M = 13 in the illustrated example). The wiring connected to each speaker 280-1, ..., 280-M is not shown in the figure. The reflection plate 300 is fixed to the end of the support member 400 such that the arrangement direction of the speakers 280-1, ..., 280-M is the normal to the rectangular flat reflection plate 300. The opening surface of the support member 400 is a surface that makes an angle of 90 degrees with the reflection plate 300. In the embodiment shown in FIG. 12, the preferable properties of the reflection plate 300 are the same as the properties of the reflecting material described above; the properties of the support member 400 are not particularly limited, and it is sufficient if it has enough rigidity to firmly fix the speakers 280-1, ..., 280-M.
[0106]
In the embodiment shown in FIG. 13A, the shaft portion 410 is fixed to the end portion of the
support member 400, and the reflection plate 300 is rotatably attached to the shaft portion 410.
According to this implementation, it is possible to change the geometry of the reflector 300
relative to the loudspeaker array.
[0107]
In the embodiment shown in FIG. 13B, two more reflection plates 310 and 320 are added to the embodiment shown in FIG. 13A. The properties of the two added reflection plates 310 and 320 may be the same as or different from the properties of the reflection plate 300, and the properties of the reflection plate 310 may be the same as or different from the properties of the reflection plate 320. Hereinafter, the reflection plate 300 is referred to as the fixed reflection plate 300. The shaft 510 is fixed to the end of the fixed reflection plate 300 (the end opposite to the end of the fixed reflection plate 300 fixed to the support member 400), and the reflection plate 310 is rotatably attached to the shaft 510. The shaft 520 is fixed to the end of the support member 400 (the end opposite to the end of the support member 400 to which the fixed reflection plate 300 is fixed), and the reflection plate 320 is rotatably attached to the shaft 520. The reflection plates 310 and 320 are hereinafter referred to as the movable reflection plates 310 and 320, respectively. According to the embodiment shown in FIG. 13B, for example, when the position of the movable reflection plate 310 is set so that the reflecting surface of the fixed reflection plate 300 and the reflecting surface of the movable reflection plate 310 coincide, the combination of the fixed reflection plate 300 and the movable reflection plate 310 can function as a reflection plate having a larger reflecting surface than the fixed reflection plate 300 alone. Further, according to the embodiment shown in FIG. 13B, by setting the movable reflection plates 310 and 320 at appropriate positions, for example as shown in FIG. 14, sound can be reflected many times in the space surrounded by the support member 400, the fixed reflection plate 300, and the movable reflection plates 310 and 320, so the number of reflected sounds can be controlled. In the case of the embodiment shown in FIG. 13B, since the support member 400 also plays a role as a reflector, it preferably has the same properties as those of the reflector described above.
[0108]
The embodiment shown in FIG. 15 differs from the embodiment shown in FIG. 12 in that the reflection plate 300 is also provided with a speaker array (a linear speaker array in the illustrated example). In the embodiment shown in FIG. 15, the arrangement direction of the M speakers fixed to the support member 400 and the arrangement direction of the M′ speakers fixed to the reflection plate 300 lie on the same plane, but the arrangement is not limited to this configuration (M′ = 13 in the illustrated example). For example, the M′ speakers may be fixed to the reflection plate 300 so as to have an arrangement direction orthogonal to the arrangement direction of the M speakers fixed to the support member 400. According to the embodiment shown in FIG. 15, the present invention can be implemented with the combination of the speaker array provided on the support member 400 and the reflection plate 300 (using the reflection plate 300 as a reflector without using the speaker array provided on the reflection plate 300), or with the combination of the support member 400 (using the support member 400 as a reflector without using the speaker array provided on the support member 400) and the speaker array provided on the reflection plate 300.
[0109]
Further, as an extended implementation configuration example of the implementation configuration example shown in FIG. 15, a configuration is possible in which, similarly to the implementation configuration example shown in FIG. 13B, two more reflection plates 310 and 320 are added to the implementation configuration example shown in FIG. 15 (see FIG. 16). Further, although not shown, at least one of the movable reflection plates 310 and 320 may be provided with a speaker array.
The loudspeaker holes of the speakers constituting the speaker array provided in the movable
reflection plate 310 are disposed, for example, in the plane (opening surface) of the movable
reflection plate 310 that can face the opening surface of the support member 400. The
loudspeaker holes of the speakers constituting the speaker array provided in the movable
reflection plate 320 are disposed, for example, in the plane (opening surface) of the movable
reflection plate 320 which can form the same plane as the opening surface of the support
member 400. Even in such an implementation configuration example, the same usage form as
the implementation configuration example shown in FIG. 13B is possible. Further, according to
this embodiment, for example, when the position of the movable reflection plate 320 is set so that the opening surface of the support member 400 and the opening surface of the movable reflection plate 320 coincide with each other, the combination of the speaker array provided on the support member 400 and the speaker array provided on the movable reflection plate 320 can function as a speaker array larger than the speaker array provided on the support member 400 alone. Also in the embodiment shown in FIG. 16, in the embodiment
in which the speaker array is provided on at least one of the movable reflecting plates 310 and
320, the same usage as the embodiment shown in FIG. 14 is possible. Further, in the embodiment
shown in FIG. 16 as well, in the embodiment in which the speaker array is provided on at least one of the movable reflectors 310 and 320, it is also possible, for example, to use the movable reflectors 310 and 320 as ordinary reflectors and to use the speaker array provided on the support
member 400 and the speaker array provided on the fixed reflection plate 300 as an integral
speaker array. In this case, this embodiment is equivalent to an embodiment using a speaker
array composed of (M + M ') speakers and two reflectors.
[0110]
When a speaker array is provided on the movable reflection plate 310, the loudspeaker holes of
the speakers constituting the speaker array provided on the movable reflection plate 310 are
opposite to the plane of the movable reflection plate 310 which can face the opening surface of
the support member 400. The movable reflection plate 310 may be provided with a speaker
array so as to be disposed in a plane (opening surface). When the movable reflector plate 320 is
provided with a speaker array, the loudspeaker holes of the speakers constituting the speaker
array provided in the movable reflector plate 320 can form the same plane as the opening
surface of the support member 400. The movable reflector 320 may be provided with a speaker
array so as to be disposed in a plane (opening plane) opposite to the plane. Of course, at least one
of the movable reflecting plates 310 and 320 may be provided with a speaker array on the
movable reflecting plate so as to have an opening on both sides thereof.
[0111]
[A] In the case where the speaker array is provided on at least one of the movable reflection plates 310 and 320, assume that the opening surface of the movable reflection plate 310 is the plane that can face the opening surface of the support member 400 and that the opening surface of the movable reflection plate 320 is the plane that can lie in the same plane as the opening surface of the support member 400. By arranging the movable reflection plate 310 and/or the movable reflection plate 320 so that their opening surfaces are not visible from the viewing direction in the illustrated use configuration, the apparent array size in the sight-line direction is reduced; by nevertheless using the speaker arrays provided on the movable reflection plate 310 and/or the movable reflection plate 320, the same effect as increasing the array size can be obtained.
[0112]
[B] In the case where the speaker array is provided on at least one of the movable reflection plates 310 and 320, assume that the opening surface of the movable reflection plate 310 is the plane opposite to the plane that can face the opening surface of the support member 400, and that the opening surface of the movable reflection plate 320 is the plane opposite to the plane that can lie in the same plane as the opening surface of the support member 400. In the illustrated usage form, the same effect as increasing the array size can then be obtained while maintaining the size.
[0113]
When at least one of the movable reflection plates 310 and 320 is provided with speaker arrays so as to have opening surfaces on both of its sides, it is also possible to obtain the effects of both [A] and [B].
[0114]
<Application Example> Hereinafter, service examples in which the narrow directional voice reproduction technology according to the present invention is useful will be described.
[0115]
A first example is audio reproduction with digital signage.
According to the present invention, since the voice can be provided in a narrower range in a specific direction than in the past, the advertisement can be conveyed only to people in that range without disturbing the surroundings.
[0116]
A second example is application to a video conference system (which may be a voice conference system).
According to the present invention, audio can be provided only to a narrow range in a specific direction when a conference is held in a situation where a room dedicated to video conferencing cannot be prepared, so that the conference can still be carried out.
[0117]
<Hardware Configuration Example of the Narrow Directional Audio Reproduction Processing Device> The narrow directional audio reproduction processing device according to the above-described embodiments includes an input unit to which a keyboard or the like can be connected, an output unit to which a liquid crystal display or the like can be connected, a CPU (Central Processing Unit) [which may be provided with a cache memory or the like], a RAM (Random Access Memory) and a ROM (Read Only Memory) as memories, an external storage device such as a hard disk, and a bus that connects the input unit, the output unit, the CPU, the RAM, the ROM, and the external storage device so that data can be exchanged among them.
In addition, if necessary, a device (drive) capable of reading and writing a storage medium such
as a CD-ROM may be provided in the narrow directional audio reproduction processing device.
Examples of physical entities provided with such hardware resources include general purpose
computers.
[0118]
The external storage device of the narrow directional audio reproduction processing device stores a program for reproducing voice toward a narrow range including the target direction, data required for processing the program, and the like [the storage is not limited to the external storage device; for example, the program may be stored in the ROM, which is a read-only storage device]. Data and the like obtained by the processing of these programs are appropriately stored in the RAM, the external storage device, and the like. Hereinafter, a storage device that stores data, addresses of storage areas, and the like will be simply referred to as the storage unit.
[0119]
In the storage unit of the narrow directional audio reproduction processing device are stored a program for obtaining, for a direction to be a target of audio reproduction, a filter for each frequency using a spatial correlation matrix expressed by equations (5a) to (5b), a program for AD-converting an analog signal, a program for performing frame generation processing, a program for converting the digital signal of each frame into a frequency domain signal, a program for applying the filter corresponding to the direction of audio reproduction to the frequency domain signal for each frequency to obtain a reproduction signal, and a program for converting the reproduction signal into a time domain signal.
[0120]
In the narrow directional audio reproduction processing device, each program stored in the storage unit and the data necessary for processing each program are read into the RAM as needed, and are interpreted and executed by the CPU.
As a result, the CPU realizes predetermined functions (the filter design unit, AD conversion unit, frame generation unit, frequency domain conversion unit, filter application unit, and time domain conversion unit), whereby narrow directional audio reproduction is realized.
[0121]
<Supplement> The present invention is not limited to the above-described embodiments, and various modifications can be made without departing from the spirit of the present invention. For example, in the above-described embodiments it is assumed that the sound wave travels as a plane wave, but the sound wave may travel as a spherical wave; in this case, the steering vector is changed to a representation corresponding to the spherical wave. Further, the processing described in the above embodiments may be performed not only in chronological order according to the order of description, but also in parallel or individually, depending on the processing capability of the device that executes the processing or as necessary.
[0122]
Further, when the processing function in the hardware entity (narrow directional audio
reproduction processing device) described in the above embodiment is realized by a computer,
the processing content of the function that the hardware entity should have is described by a
program. Then, by executing this program on a computer, the processing function of the
hardware entity is realized on the computer.
[0123]
The program describing the processing content can be recorded in a computer readable
recording medium. As the computer readable recording medium, any medium such as a magnetic
recording device, an optical disc, a magneto-optical recording medium, a semiconductor memory,
etc. may be used. Specifically, for example, a hard disk device, a flexible disk, a magnetic tape, or the like can be used as the magnetic recording device; a DVD (Digital Versatile Disc), a DVD-RAM (Random Access Memory), a CD-ROM (Compact Disc Read Only Memory), a CD-R (Recordable)/RW (Rewritable), or the like as the optical disc; an MO (Magneto-Optical disc) or the like as the magneto-optical recording medium; and an EEP-ROM (Electronically Erasable and Programmable-Read Only Memory) or the like as the semiconductor memory.
[0124]
Further, this program is distributed, for example, by selling, transferring, lending, etc. a portable
recording medium such as a DVD, a CD-ROM or the like in which the program is recorded.
Furthermore, this program may be stored in a storage device of a server computer, and the
program may be distributed by transferring the program from the server computer to another
computer via a network.
[0125]
For example, a computer that executes such a program first temporarily stores a program
recorded on a portable recording medium or a program transferred from a server computer in its
own storage device. Then, at the time of execution of the process, the computer reads the
program stored in its own recording medium and executes the process according to the read
program. Further, as another execution form of this program, the computer may read the program directly from the portable recording medium and execute processing according to the program; furthermore, each time the program is transferred from the server computer to this computer, processing according to the received program may be executed sequentially. In addition, a configuration may be adopted in which the above-described processing is executed by a so-called ASP (Application Service Provider) type service that realizes the processing functions only by issuing execution instructions to the server computer and acquiring the results, without transferring the program to this computer. Note that the program in the present embodiment includes information
provided for processing by a computer that conforms to the program (such as data that is not a
direct command to the computer but has a property that defines the processing of the computer).
[0126]
Further, in this embodiment, the hardware entity is configured by executing a predetermined
program on a computer, but at least a part of the processing content may be realized as
hardware.