JP2014187685

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2014187685
Abstract: To provide a sound collection device whose device size for a given directivity performance is smaller than that of the prior art. That is, for the same device scale as a prior-art sound collection device, a sound collection device with higher directivity performance is provided, and for the same directivity performance, a smaller sound collection device is provided. A sound collection device includes a plurality of microphones and a reflection unit made of a material capable of reflecting sound. The sound collection device further includes a movable control unit that changes the orientation or arrangement of the reflection unit according to the correlation between the plurality of microphones, thereby changing the transfer characteristics between the plurality of microphones and a sound source. [Selected figure] Figure 1
Sound pickup device
[0001]
The present invention relates to a beamforming technique using an array device composed of a plurality of microphones and speakers. In particular, the present invention relates to a beamforming technique based on diffusion sensing, which focuses on the property of the transfer characteristics between the microphones and the sound source that is optimal for beamforming.
[0002]
03-05-2019
Non-Patent Documents 1 and 2 are known as prior art for speech enhancement based on diffusion sensing using a microphone array. In Non-Patent Documents 1 and 2, a pseudo-diffuse sound field is generated by a reflecting structure, and a microphone array is installed in it to realize diffusion sensing.
[0003]
Non-Patent Document 1: K. Niwa, S. Sakauchi, K. Furuya, M. Okamoto, and Y. Haneda, "Diffused sensing for sharp directivity microphone array", ICASSP 2012, 2012, pp. 225-228.
Non-Patent Document 2: K. Niwa, Y. Hioka, K. Furuya, and Y. Haneda, "Telescopic microphone array using reflector for segmentation target from noises in the same direction", ICASSP 2012, 2012, pp. 5457-5460.
[0004]
In the prior art, however, a reflecting structure is placed near the microphone array to generate a pseudo-diffuse sound field, so the apparatus tends to be large. This is because the reverberation time is strongly correlated with the volume of the reflecting structure, and the longer the reverberation time, the closer the sound field comes to a diffuse sound field. For example, Non-Patent Document 2 uses a reflecting structure sized to fit within a sphere 1 meter in diameter. However, the device scale is often limited in advance by the intended application. If the volume is limited, the correlation between the transfer characteristics becomes high and the directivity performance degrades.
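As a rough illustration of why reverberation time ties the prior-art design to a large volume, Sabine's formula T60 = 0.161 V / (S α), a standard room-acoustics approximation that this document does not itself use, gives a reverberation time proportional to V/S, that is, to the linear size of the enclosure. The sketch below assumes a closed sphere and a mean absorption coefficient α = 0.2 chosen only for illustration; a small enclosure open on one side is far from Sabine's assumptions, so only the trend is meaningful. The function names are illustrative.

```python
import math


def sabine_t60(volume_m3: float, surface_m2: float, mean_absorption: float) -> float:
    """Sabine reverberation time in seconds: T60 = 0.161 * V / (S * alpha)."""
    return 0.161 * volume_m3 / (surface_m2 * mean_absorption)


def sphere_t60(diameter_m: float, mean_absorption: float) -> float:
    """T60 of a closed sphere; since V/S = d/6, T60 grows linearly with d."""
    radius = diameter_m / 2.0
    volume = 4.0 / 3.0 * math.pi * radius ** 3
    surface = 4.0 * math.pi * radius ** 2
    return sabine_t60(volume, surface, mean_absorption)


if __name__ == "__main__":
    # alpha = 0.2 is an illustrative assumption, not a value from this document.
    for d in (1.0, 0.5, 0.25):
        print(f"diameter {d:.2f} m -> T60 approx. {sphere_t60(d, 0.2) * 1e3:.0f} ms")
```

Halving the diameter halves the estimated T60, which is consistent with the statement that the volume of the reflecting structure and the reverberation time are strongly correlated.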
[0005]
A first aspect of the present invention aims to provide a sound collection device whose device size for a given directivity performance is smaller than that of the prior art. That is, for the same device scale as a prior-art sound collection device, it provides a sound collection device with higher directivity performance, and for the same directivity performance, it provides a smaller sound collection device. In this specification, the term "sound" is not limited to human voices; it refers to sounds in general, including musical tones and environmental noise as well as human and animal voices.
[0006]
Furthermore, reducing the correlation of the transfer characteristics within a limited volume requires careful design of the device configuration. We consider that this can be realized by means such as devising the shape of the reflecting structure, attaching a mechanism that induces diffusion to the reflecting structure, attaching a movable part to the microphones or the reflecting structure and moving it according to the state of the sound field, or using an array that combines different types of microphones. However, these approaches allow a very large number of device patterns, so it is difficult to determine which pattern is appropriate.
[0007]
Another aspect of the present invention aims to introduce a quantity for evaluating the correlation of the transfer characteristics in each device pattern, and to provide a sound collection device whose configuration is determined on the basis of that quantity.
[0008]
To solve the above problems, according to a first aspect of the present invention, a sound collection device includes a plurality of microphones and a reflection unit made of a material capable of reflecting sound. The sound collection device further includes a movable control unit that changes the orientation or arrangement of the reflection unit according to the correlation between the plurality of microphones, so as to change the transfer characteristics between the plurality of microphones and the sound source.
[0009]
To solve the above problems, according to another aspect of the present invention, a sound collection device includes a plurality of microphones. The sound collection device further includes a movable control unit that changes the orientation or arrangement of at least one of the plurality of microphones according to the correlation between the plurality of microphones, so as to change the transfer characteristics between the plurality of microphones and the sound source.
[0010]
To solve the above problems, according to another aspect of the present invention, where N is an integer of 3 or more, a sound collection device includes N microphones and a reflection unit made of a material capable of reflecting sound. The sound collection device further includes an inter-sensor correlation calculation unit that calculates a control amount Z that minimizes the correlation between the microphones and, where M is an integer of at least 2 and at most N, a selection unit that selects M microphones from the N microphones based on the control amount Z.
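The selection unit in this aspect can be pictured as a search over microphone subsets. The sketch below is a hypothetical illustration, not the procedure of this document: it assumes each microphone is described by a complex vector of its transfer characteristics to K′ control points at one frequency, scores a subset by the mean absolute normalized inner product between its members' vectors (a stand-in for the undisclosed correlation measure and control amount Z), and picks the M-microphone subset with the lowest score by exhaustive search, which is practical only for small N. All names are illustrative.

```python
import itertools
import math
from typing import Sequence, Tuple


def pair_correlation(p: Sequence[complex], q: Sequence[complex]) -> float:
    """|p^H q| / (||p|| ||q||): 1 for parallel vectors, 0 for orthogonal ones."""
    inner = sum(x.conjugate() * y for x, y in zip(p, q))
    norm = math.sqrt(sum(abs(x) ** 2 for x in p) * sum(abs(y) ** 2 for y in q))
    return abs(inner) / norm


def subset_score(rows: Sequence[Sequence[complex]], idx: Tuple[int, ...]) -> float:
    """Mean pairwise correlation among the selected microphones."""
    pairs = list(itertools.combinations(idx, 2))
    return sum(pair_correlation(rows[i], rows[j]) for i, j in pairs) / len(pairs)


def select_microphones(rows: Sequence[Sequence[complex]], m: int) -> Tuple[int, ...]:
    """Return indices of the M microphones (2 <= M <= N) with the least mutual correlation."""
    if not 2 <= m <= len(rows):
        raise ValueError("M must satisfy 2 <= M <= N")
    return min(itertools.combinations(range(len(rows)), m),
               key=lambda idx: subset_score(rows, idx))


if __name__ == "__main__":
    # Hypothetical transfer vectors: N = 3 microphones, K' = 3 control points.
    rows = [[1 + 0j, 0j, 0j],   # microphone 0
            [1 + 0j, 0j, 0j],   # microphone 1: same response as microphone 0
            [0j, 1 + 0j, 0j]]   # microphone 2
    print(select_microphones(rows, 2))
```

Microphones 0 and 1 carry identical information, so the search keeps one of them together with microphone 2.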
[0011]
To solve the above problems, according to another aspect of the present invention, where Q is an integer of 2 or more, a sound collection device includes a plurality of microphones and Q reflection units made of a material capable of reflecting sound. The sound collection device further includes an inter-sensor correlation calculation unit that calculates a control amount Z that minimizes the correlation between the microphones and, where P is an integer of at least 1 and at most Q, a selection unit that selects P reflection units from the Q reflection units based on the control amount Z.
[0012]
To solve the above problems, according to another aspect of the present invention, where N is an integer of 3 or more and Q is an integer of 2 or more, a sound collection device includes N microphones and Q reflection units made of a material capable of reflecting sound. The sound collection device further includes an inter-sensor correlation calculation unit that calculates a control amount Z that minimizes the correlation between the microphones and, where M is an integer of at least 2 and at most N and P is an integer of at least 1 and at most Q, a selection unit that, based on the control amount Z, selects M microphones from the N microphones and P reflection units from the Q reflection units.
[0013]
To solve the above problems, according to another aspect of the present invention, where S is an integer of 2 or more, a sound collection device includes S sound pickup units, each including a plurality of microphones and a reflection unit made of a material capable of reflecting sound. The sound collection device further includes an inter-sensor correlation calculation unit that calculates a control amount Z that minimizes the correlation between the microphones and, where R is an integer of at least 1 and at most S, a selection unit that selects R sound pickup units from the S sound pickup units based on the control amount Z.
[0014]
According to the first aspect of the present invention, the device scale required for a given directivity performance can be made smaller than in the prior art.
[0015]
According to the other aspects of the present invention, a device configuration that reduces the correlation of the transfer characteristics can be identified.
[0016]
FIG. 1 is a diagram for explaining conditions of the sound collection device of the present invention.
FIG. 2 is a diagram for explaining conditions of the sound collection device of the present invention.
FIG. 3 is a diagram for explaining a case where a reflecting structure is combined with the sound collection device of the present invention.
FIG. 4 is a diagram for explaining a case where a diffusion structure is combined with the sound collection device of the present invention.
FIG. 5 is a diagram showing an installation example of a diffusion structure that increases the number of reflected sounds included in the transfer characteristic between a control point and a microphone.
FIG. 6 is a diagram showing an installation example of a diffusion structure that blocks the opening of a reflecting structure.
FIG. 7 is a diagram showing an example in which the diffusion structure is a three-dimensional structure having a convex surface.
FIG. 8 is a perspective view of a sound collection device according to a first embodiment.
FIG. 9 is a front view of the sound collection device according to the first embodiment.
FIG. 10 is a side view of the sound collection device according to the first embodiment.
FIG. 11 is a conceptual view showing a cross section taken along line XI-XI of FIG. 9.
FIG. 12 is a conceptual view showing the XII-XII cross section of FIG. 9.
FIG. 13 is a diagram showing the functional configuration of the sound collection device according to the first embodiment.
FIG. 14 is a diagram showing the processing flow of the sound collection device according to the first embodiment.
FIG. 15 is a diagram showing the functional configuration of the sound collection device according to a second embodiment.
FIG. 16 is a diagram showing the processing flow of the sound collection device according to the second embodiment.
FIG. 17 is a diagram showing an example of the shape of a reflection unit.
FIG. 18 is a diagram showing the functional configuration of the sound collection device according to a third embodiment.
FIG. 19 is a diagram showing the processing flow of the sound collection device according to the third embodiment.
FIG. 20 is a diagram showing the functional configuration of the sound collection device according to a fourth embodiment.
FIG. 21 is a diagram showing the processing flow of the sound collection device according to the fourth embodiment.
[0017]
Embodiments of the present invention are described below. In the drawings used in the following description, components having the same function and steps performing the same processing are given the same reference numerals, and redundant description is omitted. In the following text, symbols such as <→> and ^ should properly be written directly above the preceding character, but because of the limitations of text notation they are written immediately after that character. In the formulas, these symbols are written at their proper positions. Unless otherwise noted, processing performed on each element of a vector or matrix is applied to all elements of that vector or matrix.
[0018]
<First Embodiment>
The present embodiment relates to a sound collection device that physically modulates transfer characteristics based on diffusion sensing.
[0019]
First, the sound collection processing based on diffusion sensing, as described in Non-Patent Document 1, is explained.
[0020]
[Modeling of Observation Signal] Consider a situation in which one target sound and K (≧ 1) noise sources are received by M (≧ 2) microphones. The purpose is directivity control that emphasizes the target sound at an arbitrary position in the presence of many noise sources; this is achieved by suppressing the K noise sources and emphasizing the target sound. Let a_m(i) and b_{k,m}(i) denote the impulse responses between the m-th microphone (m = 1, 2, ..., M) and the target sound, and between the m-th microphone and the k-th noise (k = 1, 2, ..., K), respectively, where L is the impulse response length and i = 0, 1, ..., L − 1. The impulse response length L may be determined experimentally according to the reverberation time, which depends on the size and structure of the device and on the conditions of the room in which it is installed. Denoting the source signals of the target sound and the k-th noise by s(t) and n_k(t), respectively, the observed signal x_m(t) at the m-th microphone is modeled by the following equation.
[0021]
[0022]
Here, t represents an index of time.
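Equation (1) itself did not survive in this text; from the definitions above it presumably has the standard convolutive form x_m(t) = Σ_i a_m(i) s(t − i) + Σ_k Σ_i b_{k,m}(i) n_k(t − i), with i running from 0 to L − 1. The plain-Python sketch below simulates that assumed model, treating all signals as zero before t = 0 and assuming every source signal has the same length; the function names are illustrative.

```python
from typing import List, Sequence


def convolve(h: Sequence[float], x: Sequence[float]) -> List[float]:
    """Causal convolution sum_i h(i) x(t - i), truncated to len(x) samples."""
    y = [0.0] * len(x)
    for t in range(len(x)):
        for i, hi in enumerate(h):
            if t - i < 0:
                break  # x(t - i) is taken as 0 before the start of the signal
            y[t] += hi * x[t - i]
    return y


def observe(a: Sequence[Sequence[float]],
            b: Sequence[Sequence[Sequence[float]]],
            s: Sequence[float],
            n: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return x[m][t] under the assumed model of equation (1).

    a[m]    : impulse response a_m(i) from the target sound to microphone m
    b[k][m] : impulse response b_{k,m}(i) from noise source k to microphone m
    s       : target source signal s(t)
    n[k]    : source signal n_k(t) of the k-th noise
    """
    x = []
    for m, a_m in enumerate(a):
        x_m = convolve(a_m, s)
        for k, n_k in enumerate(n):
            noise_part = convolve(b[k][m], n_k)
            x_m = [u + v for u, v in zip(x_m, noise_part)]
        x.append(x_m)
    return x


if __name__ == "__main__":
    a = [[1.0, 0.5], [0.0, 1.0]]        # M = 2 microphones, L = 2 taps
    b = [[[0.2], [0.3]]]                # K = 1 noise source
    s = [1.0, 0.0, 0.0, 0.0]            # target: impulse at t = 0
    n = [[0.0, 1.0, 0.0, 0.0]]          # noise: impulse at t = 1
    print(observe(a, b, s, n))
```

Each microphone sees a differently filtered copy of every source; it is exactly these filters (the transfer characteristics) that the rest of the document tries to make mutually uncorrelated.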
[0023]
Applying the short-time Fourier transform to x_m(t) approximates the convolutive mixture of equation (1) as an instantaneous mixture in the frequency domain, as in the following equation.
[0024]
[0025]
Here, ω and τ denote the frequency and the frame index, respectively. For example, the sampling frequency is 48 kHz and the number of taps is 2048. X_m(ω, τ), S(ω, τ), and N_k(ω, τ) are the time-frequency representations of the observed signal x_m(t), the target source signal s(t), and the k-th noise source signal n_k(t), respectively. a_m(ω) and b_{k,m}(ω) represent the frequency characteristics between the target sound and the m-th microphone and between the k-th noise and the m-th microphone, respectively; these are hereinafter called transfer characteristics.
Expressing equation (2) in matrix form gives the following.
[0026]
[0027]
Here, <T> denotes transposition.
[0028]
[Beamforming] The output signal y(t) after beamforming is obtained by convolving the observed signal x_m(t) with a filter w_m(t) designed to emphasize the target sound, as in the following equation.
[0029]
[0030]
Here, J denotes the filter length, which may be approximately equal to the impulse response length L.
The time-frequency representation Y(ω, τ) of y(t) can be approximated by the following equation.
[0031]
[0032]
Here, <H> represents a conjugate transpose, and the complex conjugate of W <→> m (ω)
corresponds to the frequency response of wm (j).
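In the time domain, the beamforming equation above is presumably the filter-and-sum operation y(t) = Σ_m Σ_j w_m(j) x_m(t − j) with j = 0, ..., J − 1 (the equation is not reproduced here, so this form is an assumption). The sketch implements that assumed operation; the example uses hand-picked one- and two-tap filters that delay one channel by a sample so that a pulse arriving one sample apart at two microphones adds in phase. The function name is illustrative.

```python
from typing import List, Sequence


def filter_and_sum(w: Sequence[Sequence[float]],
                   x: Sequence[Sequence[float]]) -> List[float]:
    """y(t) = sum_m sum_j w_m(j) * x_m(t - j), with x_m(t) = 0 for t < 0."""
    num_samples = len(x[0])
    y = [0.0] * num_samples
    for wm, xm in zip(w, x):              # loop over microphones m
        for t in range(num_samples):
            for j, wmj in enumerate(wm):  # loop over filter taps j
                if t - j < 0:
                    break
                y[t] += wmj * xm[t - j]
    return y


if __name__ == "__main__":
    # The pulse reaches microphone 2 one sample before microphone 1,
    # so microphone 2 is delayed by one tap before summing.
    x = [[0.0, 1.0, 0.0, 0.0],   # microphone 1
         [1.0, 0.0, 0.0, 0.0]]   # microphone 2
    w = [[0.5],                  # w_1(j): gain only
         [0.0, 0.5]]             # w_2(j): one-sample delay with gain 0.5
    print(filter_and_sum(w, x))
```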
[0033]
[0034]
Denoting the noise component contained in the output signal Y(ω, τ) by Y_N(ω, τ), the power of the noise component p_N(ω) is defined by the following equation.
[0035]
[0036]
Here, E_T denotes the temporal expectation.
Assuming that the source signals are mutually uncorrelated, the power p_N(ω) can be calculated from the transfer characteristics b <→> k (ω) and the filter W <→> (ω) alone.
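Under the stated assumption of mutually uncorrelated sources, the noise power presumably reduces to p_N(ω) = Σ_k σ_k²(ω) |W<→>^H(ω) b<→>_k(ω)|², where σ_k²(ω) = E_T[|N_k(ω, τ)|²]. In this sketch the per-source powers σ_k² are an explicit, assumed parameter (they default to 1 for the unit-power case); it also shows the frequency-domain output Y = W^H X for a single bin and frame. Function names are illustrative.

```python
from typing import Optional, Sequence


def beamformer_output(w: Sequence[complex], x: Sequence[complex]) -> complex:
    """Y(omega, tau) = W^H X = sum_m conj(W_m) * X_m for one bin and frame."""
    return sum(wm.conjugate() * xm for wm, xm in zip(w, x))


def noise_power(w: Sequence[complex],
                b: Sequence[Sequence[complex]],
                noise_var: Optional[Sequence[float]] = None) -> float:
    """p_N(omega) = sum_k var_k * |W^H b_k|^2, assuming uncorrelated sources."""
    if noise_var is None:
        noise_var = [1.0] * len(b)   # unit-power noise sources
    return sum(v * abs(beamformer_output(w, bk)) ** 2 for bk, v in zip(b, noise_var))


if __name__ == "__main__":
    w = [0.5 + 0j, 0.5 + 0j]                  # two-microphone averaging filter
    b = [[1 + 0j, 1 + 0j],                    # noise 1: identical at both microphones
         [1 + 0j, -1 + 0j]]                   # noise 2: opposite sign at the microphones
    print(noise_power(w, b, [2.0, 3.0]))      # noise 2 falls in a null of w
```

Only noise 1 contributes, because W^H b_2 = 0: the noise power depends on how strongly each noise transfer vector overlaps with the filter, which is why decorrelated transfer characteristics make suppression easier.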
[0037]
[0038]
In the field of array signal processing, various filter design methods for minimizing p_N(ω) have been described.
The delay-and-sum method and the maximum likelihood method are described below as representative examples (see Reference 1).
[Reference 1] F. Asano, "Array Signal Processing of Sound: Localization, Tracking and Separation of Sound Sources", Corona Publishing, 2011.
[0039]
In the delay-and-sum method, the filter W <→> DS is designed by the following equation so as to emphasize the direct sound of the target sound.
[0040]
[0041]
Here, h <→> (ω) represents the array manifold vector of the direct sound of the target sound.
Its element h_m(ω) represents the transfer coefficient of the direct sound path from the target sound to the m-th microphone; with d_m the distance between the target sound and the m-th microphone, c the speed of sound, and j the imaginary unit, it can be calculated, for example, by the following equation.
[0042]
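A common form of the direct-path transfer coefficient, assumed here because the equation itself is not reproduced, is h_m(ω) = exp(−jωd_m/c); some formulations also include a 1/d_m amplitude factor. The sketch builds h<→>(ω) from hypothetical distances, assumes c = 343 m/s, and forms a delay-and-sum filter normalized as W<→>_DS = h<→>/(h<→>^H h<→>), which equals h<→>/M for unit-magnitude elements and passes the direct sound with unit gain. Function names are illustrative.

```python
import cmath
import math
from typing import List, Sequence

SPEED_OF_SOUND = 343.0  # m/s; assumed value at about 20 degrees C


def manifold(omega: float, distances: Sequence[float],
             c: float = SPEED_OF_SOUND) -> List[complex]:
    """h_m(omega) = exp(-j * omega * d_m / c) for each source-microphone distance d_m."""
    return [cmath.exp(-1j * omega * d / c) for d in distances]


def delay_and_sum(h: Sequence[complex]) -> List[complex]:
    """W_DS = h / (h^H h), so that W_DS^H h = 1 (unit gain on the direct sound)."""
    energy = sum(abs(hm) ** 2 for hm in h)
    return [hm / energy for hm in h]


def response(w: Sequence[complex], v: Sequence[complex]) -> complex:
    """Array response W^H v."""
    return sum(wm.conjugate() * vm for wm, vm in zip(w, v))


if __name__ == "__main__":
    omega = 2 * math.pi * 1000.0                      # 1 kHz
    h_target = manifold(omega, [1.00, 1.05, 1.10])    # hypothetical distances (m)
    h_other = manifold(omega, [1.10, 1.05, 1.00])     # a source on the other side
    w = delay_and_sum(h_target)
    print(abs(response(w, h_target)), abs(response(w, h_other)))
```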
[0043]
Further, in the maximum likelihood method, the filter W <→> ML is designed to emphasize the
direct sound of the target sound and minimize the power pN (ω) according to the following
equation.
[0044]
[0045]
Here, R (ω) represents a spatial correlation matrix of noise.
For example, assuming that there is no correlation between sound source signals, the spatial
correlation matrix R (ω) of noise is calculated using only the transfer characteristic b <→> k (ω)
as in the following equation.
[0046]
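The maximum likelihood filter is commonly written in the minimum-variance distortionless-response form W<→>_ML = R^{-1}(ω) h<→>(ω) / (h<→>^H(ω) R^{-1}(ω) h<→>(ω)), with R(ω) = Σ_k b<→>_k(ω) b<→>_k^H(ω) for unit-power, uncorrelated noise; we assume that is what the omitted equations express. Because R(ω) built from fewer noise sources than microphones is singular, the sketch adds a small diagonal loading term, an assumption not stated in the text, and solves R x = h by Gaussian elimination in plain Python. Function names are illustrative.

```python
from typing import List, Sequence

Matrix = List[List[complex]]


def noise_covariance(b: Sequence[Sequence[complex]], loading: float = 1e-3) -> Matrix:
    """R = sum_k b_k b_k^H + loading * I (the diagonal loading is an assumption)."""
    m = len(b[0])
    r = [[complex(loading) if i == j else 0j for j in range(m)] for i in range(m)]
    for bk in b:
        for i in range(m):
            for j in range(m):
                r[i][j] += bk[i] * bk[j].conjugate()
    return r


def solve(a: Matrix, y: Sequence[complex]) -> List[complex]:
    """Solve a x = y with Gaussian elimination and partial pivoting."""
    n = len(y)
    aug = [list(row) + [y[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0j] * n
    for row in range(n - 1, -1, -1):
        acc = aug[row][n] - sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = acc / aug[row][row]
    return x


def ml_filter(h: Sequence[complex], b: Sequence[Sequence[complex]],
              loading: float = 1e-3) -> List[complex]:
    """W_ML = R^-1 h / (h^H R^-1 h): unit gain on h, minimum noise power otherwise."""
    r_inv_h = solve(noise_covariance(b, loading), h)
    denom = sum(hm.conjugate() * v for hm, v in zip(h, r_inv_h))
    return [v / denom for v in r_inv_h]


def response(w: Sequence[complex], v: Sequence[complex]) -> complex:
    """Array response W^H v."""
    return sum(wm.conjugate() * vm for wm, vm in zip(w, v))


if __name__ == "__main__":
    h = [1 + 0j, 1 + 0j, 1 + 0j]      # hypothetical target manifold
    b = [[1 + 0j, 0j, 0j]]            # one noise source heard only at microphone 1
    w = ml_filter(h, b)
    print(abs(response(w, h)), abs(response(w, b[0])))
```

For comparison, the delay-and-sum filter h/3 would pass this noise with gain 1/3, while the filter above keeps unit gain on h and drives the noise response close to zero.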
[0047]
In classical array signal processing as described in Reference 1, much attention has been paid to how the microphones should be spaced.
However, the correlation between microphones is often high except at specific frequencies.
Two representative problems are known.
The first is that in low frequency bands, where the wavelength is long, the correlation between transfer characteristics tends to be high, which makes sharp directivity control difficult.
The second is that in high frequency bands, where the wavelength is short, spatial aliasing (the emphasis of sounds other than the intended target sound) occurs unless the microphones are spaced no more than half a wavelength apart.
For these two reasons, it has been considered difficult to reduce the power p_N(ω) over a wide band.
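A short worked example of the two problems, assuming c = 343 m/s: the half-wavelength rule gives a maximum spacing d_max = c/(2f) for alias-free operation up to frequency f, while at low frequencies the wavelength λ = c/f is far larger than a compact array, so the microphones observe nearly identical signals. Function names are illustrative.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed


def wavelength(freq_hz: float, c: float = SPEED_OF_SOUND) -> float:
    """Acoustic wavelength lambda = c / f in metres."""
    return c / freq_hz


def max_alias_free_spacing(freq_hz: float, c: float = SPEED_OF_SOUND) -> float:
    """Largest microphone spacing that avoids spatial aliasing up to freq_hz (d <= lambda / 2)."""
    return wavelength(freq_hz, c) / 2.0


if __name__ == "__main__":
    for f in (200.0, 1000.0, 8000.0, 24000.0):  # 24 kHz is the Nyquist frequency at 48 kHz
        print(f"{f:7.0f} Hz: wavelength {wavelength(f):.4f} m, "
              f"max spacing {max_alias_free_spacing(f) * 100:.2f} cm")
```

At 200 Hz the wavelength (about 1.7 m) dwarfs an array a few tens of centimetres wide, which is the low-frequency correlation problem; alias-free operation up to 24 kHz would require spacings of roughly 7 mm, which is the high-frequency problem.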
[0048]
[Diffusion sensing] Non-Patent Document 1 argues that, to reduce the power p_N(ω) over a wide band, the nature of the transfer characteristics themselves should be examined, and it summarizes the basic theory of diffusion sensing.
[0049]
The concept of diffusion sensing is to "uncorrelate transfer characteristics across a wide band" by
"physical modulation of transfer characteristics" as in the following equation.
[0050]
[0051]
Here, physical modulation of the transfer characteristic refers to any physical means that changes the nature of the transfer characteristic itself, such as a reflecting structure placed near the microphones.
The method proposed in Non-Patent Document 1 generates, through many repeated reflections, a sound field in which reflected sounds arrive isotropically (a diffuse sound field), and installs a microphone array in it.
For example, if a reflecting structure is shaped to surround the microphone array and is open on only one side, sound entering the structure is naturally reflected repeatedly, generating a pseudo-diffuse sound field.
[0052]
We briefly explain why the transfer characteristics become uncorrelated when the microphone array is installed in a diffuse sound field.
Denoting the correlation between transfer characteristics by γ(ω), it is known that γ(ω) in a diffuse sound field is given by the following equation.
[0053]
[0054]
Here, E_S denotes the spatial expectation and p <→> denotes the position vector between the microphones.
If the distance ||p <→>|| between the microphones is sufficiently large, the expected value of the correlation γ(ω) between transfer characteristics in a diffuse sound field is approximately zero.
[0055]
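For two omnidirectional microphones separated by distance d in an ideal, spherically isotropic diffuse field, the well-known closed form of this spatial expectation is γ(ω) = sin(ωd/c)/(ωd/c). We assume, but cannot confirm from this text, that the omitted equation is equivalent. The sketch evaluates it with an assumed c = 343 m/s; since |γ| is bounded by c/(ωd), the correlation tends to zero as the spacing grows. The function name is illustrative.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def diffuse_coherence(freq_hz: float, spacing_m: float,
                      c: float = SPEED_OF_SOUND) -> float:
    """sin(kd)/(kd) with k = 2*pi*f/c: coherence of a spherically isotropic diffuse field."""
    kd = 2.0 * math.pi * freq_hz * spacing_m / c
    return 1.0 if kd == 0.0 else math.sin(kd) / kd


if __name__ == "__main__":
    for d in (0.01, 0.05, 0.2, 1.0):   # microphone spacings in metres
        print(f"d = {d:4.2f} m: gamma(1 kHz) = {diffuse_coherence(1000.0, d):+.3f}")
```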
[0056]
Therefore, in the prior art, a pseudo diffuse sound field is physically generated by a reflecting
structure, and a microphone array is installed therein (see Non-Patent Documents 1 and 2).
[0057]
In addition, to reduce the power p_N(ω), filter design methods that use transfer characteristics prepared in advance by simulation or measurement have been studied.
Put simply, conventional filters are designed to emphasize only the (direct sound of the) target sound, whereas control based on diffusion sensing designs the filter to emphasize the transfer characteristic itself.
[0058]
When the delay-and-sum method is used as the basis, the filter W <→> DS1 (ω) can be designed by replacing the array manifold vector h <→> (ω) with the transfer characteristic a <→> (ω) of the target sound, as in the following equation.
[0059]
[0060]
In this case, it is necessary to prepare a <→> (ω) in advance by simulation or measurement.
[0061]
Likewise, when the maximum likelihood method is used as the basis, the filter W <→> DS2 (ω) can be designed by the following equation.
[0062]
[0063]
In this case as well, a <→> (ω) and R(ω) must be prepared in advance by simulation or measurement.
When a pseudo-diffuse sound field is generated and sound is picked up with the means described above, the transfer characteristics are expected to be naturally decorrelated, so the power p_N(ω) can be reduced over a wide band.
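The benefit of replacing h<→>(ω) with the prepared transfer characteristic a<→>(ω) can be seen in a toy case. Assuming the filter is normalized as W<→>_DS1 = a<→>/(a<→>^H a<→>) (the exact normalization of the omitted equation is not given here), it passes the full target response, reflections included, with unit gain, whereas the direct-path filter built from h<→>(ω) does not. The vectors and function names below are invented for illustration.

```python
from typing import List, Sequence


def matched_filter(v: Sequence[complex]) -> List[complex]:
    """W = v / (v^H v), giving unit response W^H v = 1."""
    energy = sum(abs(x) ** 2 for x in v)
    return [x / energy for x in v]


def response(w: Sequence[complex], v: Sequence[complex]) -> complex:
    """Array response W^H v."""
    return sum(wi.conjugate() * vi for wi, vi in zip(w, v))


if __name__ == "__main__":
    h = [1 + 0j, 1 + 0j]               # direct sound only (hypothetical)
    a = [1 + 0.6j, 1 - 0.8j]           # direct sound plus reflections (hypothetical)
    w_ds = matched_filter(h)           # conventional delay-and-sum, W_DS
    w_ds1 = matched_filter(a)          # transfer-characteristic variant, W_DS1
    print(response(w_ds, a), response(w_ds1, a))
```

The direct-path filter responds to the true target with gain 1 − 0.1j, a small distortion caused by the reflections it ignores, while the filter built from a<→> responds with gain 1.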
[0064]
<Point of First Embodiment> However, in the prior art, as described above, the scale of the
apparatus tends to be large.
[0065]
Therefore, in the present embodiment, to decorrelate the transfer characteristics over a wide band, the orientation or position of the reflector, or the orientation or position of the microphones, is changed as physical modulation of the transfer characteristics according to the properties of the observed signal (the correlation between microphones). This reduces the correlation of the transfer characteristics even when the volume of the reflecting structure is limited.
In other words, the reflector or the microphones are moved so as to reduce the correlation of the transfer characteristics.
[0066]
Hereinafter, conditions of the sound collection device defined in the present embodiment will be
described using FIGS. 1 and 2.
[0067]
[Required Conditions] (1) Plurality of microphones and filtering unit: The device includes two or more microphones 112 and a filtering unit 160 capable of independent filtering.
[0068]
(2) Inter-sensor correlation calculation unit: The inter-sensor correlation calculation unit 210 calculates the correlation between the microphones (for example, the correlation between observed signals) and determines the movement of the reflection unit 180 and the microphones 112 described later.
[0069]
(3-1) Movable control unit that changes the orientation or arrangement of the reflection unit: One or more reflection units 180 are installed near the microphones 112, and the device includes one or more movable control units 200 that change the orientation or arrangement of the reflection units 180 according to the correlation between the microphones (see FIG. 1).
The reflection unit 180 is made of a material capable of reflecting sound.
Its shape need only produce one or more reflected sounds.
For example, it may be plate-like as shown in FIG.
[0070]
(3-2) Movable control unit that changes the orientation or arrangement of the microphones: The device includes one or more movable control units 200 that change the orientation or arrangement of the microphones 112 according to the correlation between the microphones (see FIG. 2).
[0071]
It suffices that either condition (3-1) or condition (3-2) is satisfied.
The configurations of (3-1) and (3-2) may also be combined.
In other words, the movable control unit 200 may change the orientation or arrangement of the microphones 112 and of the reflection unit 180 simultaneously or separately.
[0072]
For example, the movable control unit 200 consists of a motor or the like that rotates according to the control amount Z determined by the inter-sensor correlation calculation unit 210; by rotating a disk mounted perpendicular to the rotation axis, it changes the arrangement of the microphones 112 installed on the disk (see FIG. 2).
It can also rotate the reflection unit 180 mounted on the rotation shaft to change its orientation (see FIG. 1).
Before use, for each control amount ε, the transfer characteristics A <→> (ω, ε) = [a <→> 1 (ω, ε), a <→> 2 (ω, ε), ..., a <→> K′ (ω, ε)] between each microphone and K′ points obtained by finely dividing the control target area are measured and stored in the transfer characteristic storage unit 140 described later.
The inter-sensor correlation calculation unit 210 described later then calculates the correlation between the microphones based on the transfer characteristics A <→> (ω, ε) and the observed signal X <→> (ω, τ) = [X_1(ω, τ), ..., X_M(ω, τ)]^T, calculates by the following equation the control amount Z that minimizes the correlation between the transfer characteristics, and outputs Z to the movable control unit 200.
[0073]
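The equation for the control amount Z is not reproduced in this text. As a hypothetical reading only, Z can be taken as the candidate control amount ε (for example, a motor angle) whose stored transfer characteristics A<→>(ω, ε) give the lowest inter-microphone correlation averaged over frequency. The sketch uses the mean absolute normalized inner product between the microphone rows of A<→>(ω, ε) as that correlation; the data layout, the metric, the function names, and the omission of the observed signal X<→>(ω, τ) are all simplifying assumptions.

```python
import itertools
import math
from typing import Dict, Hashable, Sequence

# Hypothetical layout: table[eps][f][m][k'] is the transfer characteristic between
# control point k' and microphone m at frequency bin f for control amount eps.
Bins = Sequence[Sequence[Sequence[complex]]]


def inter_mic_correlation(a_f: Sequence[Sequence[complex]]) -> float:
    """Mean |normalized inner product| over all microphone pairs at one frequency."""
    scores = []
    for p, q in itertools.combinations(a_f, 2):
        inner = sum(x.conjugate() * y for x, y in zip(p, q))
        norm = math.sqrt(sum(abs(x) ** 2 for x in p) * sum(abs(y) ** 2 for y in q))
        scores.append(abs(inner) / norm)
    return sum(scores) / len(scores)


def control_amount(table: Dict[Hashable, Bins]) -> Hashable:
    """Z = argmin over eps of the frequency-averaged inter-microphone correlation."""
    def cost(eps: Hashable) -> float:
        bins = table[eps]
        return sum(inter_mic_correlation(a_f) for a_f in bins) / len(bins)
    return min(table, key=cost)


if __name__ == "__main__":
    # Two candidate disk angles, one frequency bin, M = 2 microphones, K' = 2 points.
    table = {
        0: [[[1 + 0j, 1 + 0j], [1 + 0j, 1 + 0j]]],   # both microphones respond alike
        90: [[[1 + 0j, 0j], [0j, 1 + 0j]]],          # responses are orthogonal
    }
    print(control_amount(table))   # the angle that would be sent to the motor
```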
[0074]
Since the transfer characteristics change with the orientation and arrangement of the microphones 112 and the reflection unit 180, the orientation and arrangement of the microphones 112 and the reflection unit 180 are changed so that the correlation between the transfer characteristics is reduced.
When several microphones 112 and several reflection units 180 are used, some of them may be fixed while the rest are moved by the movable control unit 200 (see FIG. 2).
[0075]
[Optional but Preferable Conditions] To further decorrelate the transfer characteristics, the following conditions may also be combined.
[0076]
(4) Reflecting structure: The device includes a reflecting structure 190 made of a material that reflects and diffracts sound and shaped to surround the microphones 112 while leaving an opening (in other words, shaped to form a three-dimensional space) (see FIG. 3).
[0077]
(5) Installation of diffusion structure: One or more diffusion structures 181 are installed so that the number of reflection paths between the control point A and the microphone 112 is large.
For example, in combination with condition (4), one or more diffusion structures 181 are provided on the inner wall surface of, or inside, the reflecting structure 190 (see FIG. 4).
[0078]
The reflection paths between the control point A and the microphone 112 are shown in FIG.
In addition to the reflection paths determined only by the reflecting structure 190 (broken lines), installing the diffusion structure 181 adds further reflection paths (dash-dotted lines).
The diffusion structure 181 therefore modulates the transfer characteristic relative to the case without the diffusion structure 181.
Because increasing the number of reflection paths raises the diffuseness of the sound field even when the volume of the sound collection device is limited, the correlation between the transfer characteristics can be expected to decrease.
The shape and position of the diffusion structure 181 are not limited; for example, it may have a curved surface with irregularities.
However, as shown in FIG. 6, if a plate that blocks the opening of the reflecting structure 190 is used as the diffusion structure 181, the reflection paths between the control point A and the microphone 112 decrease, so such a shape or arrangement is not suitable for the diffusion structure 181.
Accordingly, the diffusion structure 181 is arranged so that sound entering the sound collection device undergoes more reflections than it would without the diffusion structure 181.
[0079]
FIG. 4 and FIG. 7 are cross-sectional views showing example shapes in which the diffusion structure 181 is a three-dimensional structure having a curved surface.
In these examples, a diffusion structure 181 projecting toward the opening is provided on the inner wall facing the opening surface of the reflecting structure 190, and it has a surface that is concave in the cross section shown in FIG. It is desirable for the diffusion structure 181 to have a structure that guides sound entering through the opening of the reflecting structure 190 to the microphones inside the sound collection device. For example, in the case of FIG. 7, sound is reflected out of the sound collection device at the ends of the diffusion structure 181, so the shape shown in FIG. 4 is considered more desirable.
[0080]
(6) Use of microphones with different directivity: Mixing microphones with various directivity patterns reduces the correlation between transfer characteristics and thereby achieves decorrelation.
The directivity types are not limited; for example, microphones with various patterns such as omnidirectional, unidirectional, bidirectional, and hypercardioid are used in combination.
If microphones with different directivity are placed at the same position, their transfer characteristics from the same control point differ. For example, when an omnidirectional microphone and a unidirectional microphone are placed at the same position, the transfer characteristic between control point A and the omnidirectional microphone differs from that between control point A and the unidirectional microphone. Under these conditions, the change in transfer characteristics caused by the difference in directivity is therefore used to further reduce the correlation between the transfer characteristics, thereby achieving decorrelation.
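One way to see why co-located microphones of different directivity yield different transfer characteristics is the standard first-order polar pattern g(θ) = α + (1 − α) cos θ, with α = 1 (omnidirectional), 0.5 (cardioid, a common unidirectional type), about 0.25 (hypercardioid) and 0 (bidirectional). The sketch models only this idealized direct-path gain, ignores frequency dependence, and evaluates the gains for one arrival angle; the α values are textbook idealizations, not specifications from this document, and the names are illustrative.

```python
import math

# Idealized first-order patterns g(theta) = alpha + (1 - alpha) * cos(theta).
PATTERN_ALPHA = {
    "omnidirectional": 1.0,
    "cardioid": 0.5,
    "hypercardioid": 0.25,
    "bidirectional": 0.0,
}


def directional_gain(pattern: str, theta_rad: float) -> float:
    """Direct-path gain for a source at angle theta from the microphone axis."""
    alpha = PATTERN_ALPHA[pattern]
    return alpha + (1.0 - alpha) * math.cos(theta_rad)


if __name__ == "__main__":
    theta = math.radians(120.0)   # hypothetical direction of control point A
    for name in PATTERN_ALPHA:
        print(f"{name:16s} gain {directional_gain(name, theta):+.3f}")
```

At 120 degrees the omnidirectional gain is 1, the cardioid gain 0.25, the hypercardioid gain −0.125, and the bidirectional gain −0.5, so the same control point produces different transfer characteristics at the same microphone position.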
[0081]
<Sound Collection Device 10 According to First Embodiment> FIG. 8 is a perspective view of the sound collection device 10, FIG. 9 is a front view of it, and FIG. 10 is a side view of it. FIG. 11 is a conceptual view showing a cross section taken along line XI-XI of FIG. 9, and FIG. 12 is a conceptual view showing a cross section taken along line XII-XII shown in FIG.
[0082]
As shown in FIG. 12, eleven disks 201 are arranged in a line in the three-dimensional space formed by the reflecting structure 190, and eleven microphones 212 are arranged on the disks 201. In addition, although not shown, eleven microphones 211 are arranged in a line outside the three-dimensional space formed by the reflecting structure 190, on the outer wall surface of the upper wall (see FIG. 11). The shape of the reflecting structure 190 is not limited as long as it has one or more openings; in this embodiment, it is based on a horizontally long rectangular parallelepiped whose front face is open. The reflecting structure 190 consists of flat reflecting plates with flat reflecting surfaces and appropriate thickness and rigidity (for example, a reflectance α of 0.8). The reflecting surfaces of the reflecting structure 190 need not be flat; they may, for example, be plates with irregularities.
Furthermore, in the present embodiment, a horn 191 is provided on the opening surface to make it easier for sound to enter the reflecting structure. The horn 191 is shaped so that the opening area seen from outside the reflecting structure 190 is large and the opening area seen from inside is small, so that sound that has entered does not easily escape from the reflecting structure 190. The shape and number of opening surfaces are not limited as long as there is at least one. The horn may or may not be provided. In the present embodiment, a horn 191 is provided on the opening surface for each of the diffusion structures 181.
[0083]
The reflecting structure 190 forms a three-dimensional space, and the diffusion structures 181 are disposed in it. Each diffusion structure 181 has a concave curved surface, a shape intended to reflect sound arriving through the opening so that it undergoes multiple reflections inside the reflecting structure 190. The number of diffusion structures 181 may be Q (Q ≧ 1); in the present embodiment, ten diffusion structures 181 are provided (see FIG. 12).
[0084]
The microphone 212 can be installed inside the three-dimensional space formed by the reflecting
structure 190. In addition, the microphone 211 can be installed on the outer wall surface of the
upper wall of the reflective structure 190.
[0085]
The microphone 211 is covered with an acoustically transparent sound transmission cover 192. "Acoustically transparent" means that reflection and diffraction do not occur, or hardly occur. For example, the sound transmission cover 192 is made of punched metal. The sound transmission cover 192 protects the microphone 211 from impact and the like, and need not be provided.
[0086]
The microphone 211 installed on the outside is less affected by reflection and diffraction from the reflective structure 190 and can observe the direct sound at a strong amplitude. The microphone 212, by contrast, is installed inside the reflective structure 190. Because it is strongly affected by reflection and diffraction from the reflective structure 190, its transfer characteristics differ clearly from those of the outside microphone 211. The correlation between the transfer characteristics of the inside microphone 212 and those of the outside microphone 211 is therefore expected to be low. Because of the reflected sound, the transfer characteristic from the control point to the microphone 212 is readily modulated by changes in the position of the control point and in the sound collection environment (for example, a reflector outside the sound collection device). The transfer characteristic from the control point to the microphone 211, by contrast, is difficult to modulate.
[0087]
A movable control unit (motor) 200 is installed on the inner bottom surface of the reflective structure. Either the movable reflection unit 180 or the microphone 212 is attached to the movable control unit 200. In the present embodiment, the microphone 212 is attached. Based on the observation signal, the movable control unit 200 moves the movable reflection unit or the microphone so as to reduce the correlation of the transfer characteristics.
[0088]
[Signal Processing of Sound Collection Device 10] The functional configuration and processing flow of the sound collection device 10 according to the first embodiment are shown in FIG. 13 and FIG. The sound collection device 10 according to the first embodiment includes M1 microphones 211-m1, M2 microphones 212-m2, an AD conversion unit 120, a frequency domain conversion unit 130, a filtering unit 160, a time domain conversion unit 170, a filter calculation unit 150, a transfer characteristic storage unit 140, the movable control unit 200, and an inter-sensor correlation calculation unit 210. Here, m1 = 1, 2, ..., M1, m2 = 1, 2, ..., M2, M1 ≥ 1, M2 ≥ 1, and M1 + M2 = M.
[0089]
<Microphone 211-m1, Microphone 212-m2> Sound is collected by the M1 microphones 211-m1 and the M2 microphones 212-m2 (s1), and the resulting analog signals (sound collection signals) are output to the AD conversion unit 120. The M1 microphones 211-m1 are installed outside the reflective structure 190, and the M2 microphones 212-m2 are installed inside it.
[0090]
<AD Conversion Unit 120> The AD conversion unit 120 converts the M analog signals collected by the M1 microphones 211-m1 and the M2 microphones 212-m2 into digital signals $\vec{x}(t) = [x_1(t), \dots, x_M(t)]^T$. It then outputs them to the frequency domain conversion unit 130 (s2). Here, t is the discrete-time index.
[0091]
<Frequency Domain Conversion Unit 130> The frequency domain conversion unit 130 first takes as input the digital signal $\vec{x}(t) = [x_1(t), \dots, x_M(t)]^T$ output from the AD conversion unit 120. It stores N samples per channel in a buffer and generates the frame-unit digital signal $\vec{x}(\tau) = [\vec{x}_1(\tau), \dots, \vec{x}_M(\tau)]^T$. Here, τ is the frame index and $\vec{x}_m(\tau) = [x_m((\tau-1)N+1), \dots, x_m(\tau N)]$ (1 ≤ m ≤ M). N depends on the sampling frequency; 2048 points are appropriate for 48 kHz sampling. Next, the frequency domain conversion unit 130 converts the digital signal $\vec{x}(\tau)$ of each frame into the frequency domain signal $\vec{X}(\omega, \tau) = [X_1(\omega, \tau), \dots, X_M(\omega, \tau)]^T$ (s3) and outputs it. Here, ω is the discrete-frequency index. The fast discrete Fourier transform is one way to convert a time domain signal into a frequency domain signal. However, the present invention is not limited to it, and other conversion methods may be used. The frequency domain signal $\vec{X}(\omega, \tau)$ is output for each frequency ω and frame τ.
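The following Python sketch illustrates the framing and transform just described. It assumes NumPy, non-overlapping rectangular frames indexed as above, and the real-input FFT `numpy.fft.rfft` as the frequency domain conversion. The function name `to_frequency_domain` is hypothetical. No analysis window is applied; this is an assumption of the sketch, not a requirement stated in the text.

```python
import numpy as np


def to_frequency_domain(x: np.ndarray, N: int = 2048) -> np.ndarray:
    """Split an (M, T) signal into non-overlapping N-sample frames and FFT each.

    Returns X with shape (n_frames, N // 2 + 1, M), indexed as X[tau, omega, m].
    Trailing samples that do not fill a whole frame are discarded.
    """
    M, T = x.shape
    n_frames = T // N
    # Frame tau (0-based) holds samples tau*N .. (tau+1)*N - 1 of every channel.
    frames = x[:, : n_frames * N].reshape(M, n_frames, N)
    X = np.fft.rfft(frames, axis=-1)   # shape (M, n_frames, N // 2 + 1)
    return np.transpose(X, (1, 2, 0))  # reorder to (tau, omega, m)


# Usage: three channels of noise, one second at 48 kHz, N = 2048 as suggested above.
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 48000))
X = to_frequency_domain(x)
print(X.shape)  # (23, 1025, 3)
```

With 48000 samples and N = 2048, 23 full frames fit, and each real frame yields N/2 + 1 = 1025 frequency bins.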
[0092]
<Transfer Characteristic Storage Unit 140> The transfer characteristic storage unit 140 stores the transfer characteristics $\vec{A}(\omega, \varepsilon) = [\vec{a}_1(\omega, \varepsilon), \dots, \vec{a}_{K'}(\omega, \varepsilon)]$, measured in advance using the sound collection device 10. Here, ε is the control amount of the movable control unit 200. The control target area is finely divided into K' points. For k = 1, 2, ..., K', $\vec{a}_k(\omega, \varepsilon) = [a_1(\omega, \varepsilon), a_2(\omega, \varepsilon), \dots, a_M(\omega, \varepsilon)]^T$ is the transfer characteristic at frequency ω from point k to each of the M microphones in the microphone array when the movable control unit 200 is controlled by ε. The transfer characteristics $\vec{A}(\omega, \varepsilon)$ may also be prepared in advance by theoretical equations or simulations instead of prior measurement.
[0093]
<Inter-Sensor Correlation Calculation Unit 210> The inter-sensor correlation calculation unit 210 retrieves the transfer characteristics $\vec{A}(\omega, \varepsilon)$ from the transfer characteristic storage unit 140. At a predetermined interval, it receives the frequency domain signal $\vec{X}(\omega, \tau)$ (s20). The interval may be every frame, but in view of the operation of the movable control unit 200 it may be, for example, every few minutes. The unit then calculates the inter-sensor correlation for each frequency ω ∈ Ω (s21) and obtains and outputs the control amount Z of the movable control unit 200.
[0094]
For example, the directions or positions of the target sound and of $\hat{K}$ noises are predicted from the frequency domain signal $\vec{X}(\omega, \tau)$. The inter-sensor correlation is then calculated for the predicted directions or positions to determine the control amount Z.
[0095]
[0096]
The power of the input transfer characteristics $\vec{A}(\omega, \varepsilon)$ is not necessarily normalized for each sound collection device, so the transfer characteristics may be normalized.
Two examples of normalization methods are given below.
[0097]
(i) When the power of the transfer characteristics is normalized for each sound collection device, the following equation is used.
[0098]
[0099]
(ii) When the power of the transfer characteristics is normalized for each direction, the following equation is used.
[0100]
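The normalization equations themselves are not reproduced in this text. The Python sketch below is hedged and rests on two assumptions. For (i), it scales all transfer characteristics of a device by a single factor so that their mean power is 1. For (ii), it scales each direction's vector $\vec{a}_k(\omega, \varepsilon)$ to unit norm. The function names are hypothetical, and A holds one (ω, ε) slice with shape (M, K').

```python
import numpy as np


def normalize_per_device(A: np.ndarray) -> np.ndarray:
    """(i) One scale factor for the whole device: mean |a_mk|^2 becomes 1."""
    return A / np.sqrt(np.mean(np.abs(A) ** 2))


def normalize_per_direction(A: np.ndarray) -> np.ndarray:
    """(ii) Each column a_k (one direction or control point) gets unit Euclidean norm."""
    return A / np.linalg.norm(A, axis=0, keepdims=True)


# Usage on a toy (M = 2, K' = 2) slice of A(omega, eps).
A = np.array([[1 + 1j, 2.0], [0.0, 2j]])
print(np.linalg.norm(normalize_per_direction(A), axis=0))  # [1. 1.]
```

Option (ii) removes level differences between directions, so the correlation measures that follow depend only on the shape of each $\vec{a}_k$. Option (i) keeps those relative levels.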
[0101]
There are various ways to calculate the inter-sensor correlation. Four are shown here: (i) using the power average C1(ω, ε) of the correlation of the transfer characteristics, (ii) using the channel capacity C2(ω, ε), (iii) using the condition number C3(ω, ε), and (iv) using the determinant C4(ω, ε).
[0102]
(i) First, the method of calculating the power average C1(ω, ε) of the correlation of the transfer characteristics is shown.
The power of the correlation between transfer characteristics is calculated by the following equation and averaged over all combinations of control points.
[0103]
[0104]
The higher the orthogonality of the transfer characteristics, the smaller the value of C1 (ω, ε),
and C1 (ω, ε) = 0 when the transfer characteristics are completely uncorrelated.
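The C1 equation is not reproduced here. The minimal Python sketch below assumes that C1 is the mean of the squared normalized inner products $|\vec{a}_k^H \vec{a}_{k'}|^2 / (\|\vec{a}_k\|^2 \|\vec{a}_{k'}\|^2)$ over all pairs k ≠ k'. This normalization is chosen so that C1 = 0 for uncorrelated transfer characteristics and C1 = 1 for identical ones. The function name is hypothetical.

```python
import numpy as np


def power_average_correlation(A: np.ndarray) -> float:
    """C1 sketch: mean squared normalized inner product over all pairs k != k'.

    A has shape (M, K'); column k is the transfer characteristic a_k(omega, eps).
    """
    G = A.conj().T @ A                      # Gram matrix, G[k, l] = a_k^H a_l
    d = np.real(np.diag(G))                 # squared norms ||a_k||^2
    coh = np.abs(G) ** 2 / np.outer(d, d)   # normalized squared correlation in [0, 1]
    K = A.shape[1]
    off_diag = coh[~np.eye(K, dtype=bool)]  # drop the k == l terms
    return float(off_diag.mean())


print(power_average_correlation(np.eye(3)))                           # 0.0
print(power_average_correlation(np.array([[1.0, 1.0], [0.0, 1.0]])))  # 0.5
```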
[0105]
(ii) Next, the method using the channel capacity is shown.
The channel capacity is a measure commonly used in wireless MIMO systems. If the paths from the sound sources to the microphones are regarded as a transmission channel, the maximum amount of information that can be sent through that channel is called the channel capacity (see Reference 2).
[Reference 2] G. J. Foschini et al., "On limits of wireless communications in a fading environment when using multi-element antennas", Wireless Personal Communications, 1998, vol. 6, no. 3, pp. 311-335
[0106]
The channel capacity C2 (ω, ε) can be calculated by the following equation.
[0107]
[0108]
Here, PSNR(ω, ε) is the average SN ratio between the sound source signals and the sensor noise at control amount ε. Λm(ω, ε) is the m-th eigenvalue of the spatial correlation matrix R(ω, ε) at control amount ε, with the eigenvalues sorted so that $\Lambda_1(\omega, \varepsilon) \ge \dots \ge \Lambda_M(\omega, \varepsilon) \ge 0$.
If the source signals are assumed to be mutually uncorrelated, the spatial correlation matrix R(ω, ε) can be approximated from the transfer characteristics by the following equation.
[0109]
[0110]
The higher the orthogonality of the transfer characteristics, the larger the value of C2(ω, ε).
When the transfer characteristics are completely uncorrelated, the eigenvalues are equalized: $\Lambda_1(\omega, \varepsilon) \approx \dots \approx \Lambda_M(\omega, \varepsilon)$. For a fixed trace of the spatial correlation matrix R(ω, ε), this equalization maximizes the channel capacity C2(ω, ε).
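The C2 equation is not reproduced either. The sketch below assumes the usual MIMO form $C_2 = \sum_{m=1}^{M} \log_2\left(1 + \frac{P_{SNR}}{M}\Lambda_m\right)$. It also approximates the spatial correlation matrix as $R \approx A A^H$, which corresponds to uncorrelated, unit-power sources. Both are assumptions, not the patent's stated equations. The function name is hypothetical.

```python
import numpy as np


def channel_capacity(A: np.ndarray, psnr: float) -> float:
    """C2 sketch: sum over m of log2(1 + psnr / M * lambda_m), with R ~= A A^H."""
    M = A.shape[0]
    R = A @ A.conj().T                                # uncorrelated unit-power sources
    lam = np.clip(np.linalg.eigvalsh(R), 0.0, None)   # Hermitian, so real eigenvalues
    return float(np.sum(np.log2(1.0 + psnr / M * lam)))


# Same trace of R (= 2) in both cases; equal eigenvalues give the larger capacity.
print(channel_capacity(np.eye(2), psnr=2.0))                           # 2.0
print(channel_capacity(np.array([[1.0, 1.0], [0.0, 0.0]]), psnr=2.0))  # log2(3), about 1.585
```

Both examples have the same trace of R, so the comparison illustrates the fixed-trace statement above.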
[0111]
(iii) Next, the method using the condition number C3(ω, ε) is shown.
The condition number is the ratio of the maximum eigenvalue to the minimum eigenvalue of the spatial correlation matrix R(ω, ε), as in the following equation.
[0112]
[0113]
The higher the orthogonality of the transfer characteristics, the smaller the value of C3(ω, ε).
If the transfer characteristics are completely uncorrelated, C3(ω, ε) = 1.
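Under the same assumption $R \approx A A^H$, the condition number can be sketched as follows. If R is rank-deficient (for example when K' < M), the minimum eigenvalue is zero and the ratio is not finite. The sketch does not guard against this case. The function name is hypothetical.

```python
import numpy as np


def condition_number(A: np.ndarray) -> float:
    """C3 sketch: lambda_max / lambda_min of R ~= A A^H; 1.0 when fully decorrelated."""
    lam = np.linalg.eigvalsh(A @ A.conj().T)  # real eigenvalues in ascending order
    return float(lam[-1] / lam[0])


print(condition_number(np.eye(3)))                            # 1.0
print(condition_number(np.array([[1.0, 1.0], [0.0, 1.0]])))  # about 6.85
```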
[0114]
(iv) Finally, the method using the determinant C4(ω, ε) is shown.
The determinant is an evaluation function that measures how flat (uniform) the eigenvalue distribution is.
[0115]
[0116]
The higher the orthogonality of the transfer characteristics, the larger the value of C4(ω, ε).
If the transfer characteristics are completely uncorrelated, C4(ω, ε) = 1.
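The C4 equation is not reproduced here. One normalization consistent with the stated properties is $C_4 = \det R / (\operatorname{tr} R / M)^M$. It grows as the transfer characteristics become more orthogonal, and by the AM-GM inequality it is at most 1, reaching 1 only when all eigenvalues are equal. The following sketch assumes that form and $R \approx A A^H$. The function name is hypothetical.

```python
import numpy as np


def normalized_determinant(A: np.ndarray) -> float:
    """C4 sketch: det(R) / (tr(R)/M)^M with R ~= A A^H.

    This equals (geometric mean / arithmetic mean of the eigenvalues)^M, which is at
    most 1 and reaches 1 only when all eigenvalues are equal.
    """
    M = A.shape[0]
    lam = np.clip(np.linalg.eigvalsh(A @ A.conj().T), 0.0, None)
    return float(np.prod(lam) / np.mean(lam) ** M)


print(normalized_determinant(np.eye(3)))                            # 1.0
print(normalized_determinant(np.array([[1.0, 1.0], [0.0, 1.0]])))  # about 0.444
```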
[0117]
The inter-sensor correlation calculation unit 210 calculates the correlation of the transfer characteristics using any of these measures.
The costs Ci(ω, ε) (i = 1, 2, 3, 4) calculated for each frequency are then averaged over frequency.
[0118]
[0119]
Here, Ω is the set of frequency indexes to be averaged, and |Ω| is the number of its elements.
g(ω) is a weight for each frequency.
If the speech is assumed to be white, g(ω) = 1 may be used.
Finally, the control amount Z is determined from the frequency-averaged cost Ĉ(ε).
Z is the value of ε that minimizes the correlation between the transfer characteristics.
For example, when the power average C1(ω, ε) or the condition number C3(ω, ε) is used, the ε giving the minimum cost Ĉ1 or Ĉ3 is taken as Z. When the channel capacity C2(ω, ε) or the determinant C4(ω, ε) is used, the ε giving the maximum cost Ĉ2 or Ĉ4 is taken as Z.
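The frequency averaging and the choice of Z can be sketched as below. The averaged cost is assumed to be $\hat{C}(\varepsilon) = \frac{1}{|\Omega|}\sum_{\omega \in \Omega} g(\omega)\, C_i(\omega, \varepsilon)$; the equation itself is not reproduced in this text. The sketch also assumes that ε ranges over a finite set of candidate control amounts. `choose_control_amount` and the candidate values are hypothetical.

```python
import numpy as np


def choose_control_amount(costs, eps_values, g=None, minimize=True):
    """Return Z and the frequency-averaged costs C^(eps).

    costs: shape (n_eps, |Omega|), costs[e, w] = C_i(omega_w, eps_e).
    g: optional per-frequency weights g(omega); defaults to 1 (white speech).
    minimize: True for C1 and C3, False for C2 and C4.
    """
    costs = np.asarray(costs, dtype=float)
    n_freq = costs.shape[1]
    g = np.ones(n_freq) if g is None else np.asarray(g, dtype=float)
    c_hat = (costs * g).sum(axis=1) / n_freq  # weighted average over Omega
    idx = int(np.argmin(c_hat) if minimize else np.argmax(c_hat))
    return eps_values[idx], c_hat


# Hypothetical example: three candidate motor angles (degrees), two frequencies.
costs = [[0.9, 0.7], [0.2, 0.4], [0.5, 0.5]]
Z, c_hat = choose_control_amount(costs, [0.0, 15.0, 30.0])
print(Z)  # 15.0
```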
[0120]
An existing sound source localization technique may be used to predict the directions or positions of the target sound and the $\hat{K}$ noises from the frequency domain signal $\vec{X}(\omega, \tau)$. Known examples include a) the GCC-PHAT method, b) the MUSIC method, and c) the beamformer method.
[0121]
a) GCC-PHAT method (for details, see Reference 2)
[Reference 2] C. H. Knapp et al., "The generalized correlation method for estimation of time delay", IEEE Trans. ASSP, 1976, vol. 24, no. 4, pp. 320-327
[0122]
The GCC-PHAT method obtains the direction of arrival of a sound source from the time difference that arises between two microphones (a microphone pair) when speech is observed. Here, the inter-sensor correlation calculation unit 210 uses the frequency domain signal $\vec{X}(\omega, \tau)$ to calculate the generalized cross-correlation $\vec{Q}(\omega, \tau, r_{\vec{r}_j}) = [Q_1(\omega, \tau, r_{\vec{r}_j}), \dots, Q_U(\omega, \tau, r_{\vec{r}_j})]$. U is the total number of microphone pairs and can be at most ${}_M C_2$. The u-th microphone pair (u = 1, 2, ..., U) consists of the $m_{u_1}$-th and $m_{u_2}$-th microphones. Let $r_{\vec{r}_j}$ be the delay that arises when sound propagates from position $\vec{r}_j$. $Q_u(\omega, \tau, r_{\vec{r}_j})$ is the correlation value obtained when the phase of the frequency domain signal $X_{m_{u_2}}(\omega, \tau)$ from the $m_{u_2}$-th microphone is delayed by $r_{\vec{r}_j}$ relative to the phase of $X_{m_{u_1}}(\omega, \tau)$ from the $m_{u_1}$-th microphone. The generalized cross-correlation $Q_u(\omega, \tau, r_{\vec{r}_j})$ is calculated by the following equation.
[0123]
[0124]
Here, * denotes the complex conjugate.
[0125]
The inter-sensor correlation calculation unit 210 then uses the generalized cross-correlation $Q_u(\omega, \tau, r_{\vec{r}_j})$ to calculate the sound source positions $\vec{r}(\tau) = [\vec{r}_S(\tau), \vec{r}_1(\tau), \dots, \vec{r}_{\hat{K}}(\tau)]$.
The larger the value of $Q_u(\omega, \tau, r_{\vec{r}_j})$ at a position $\vec{r}_j$, the more likely it is that a sound source exists there.
It is therefore sufficient to extract the $\hat{K}+1$ positions at which $Q_u(\omega, \tau, r_{\vec{r}_j})$ is large, for example the $\hat{K}+1$ positions $\vec{r}_j$ at which the cost $C_{GCC}$ below is high.
[0126]
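The following Python sketch illustrates GCC-PHAT scoring for one microphone pair and one frame. It uses the textbook PHAT-weighted cross-spectrum summed over frequency, which is assumed here because the patent's own equation is not reproduced. The candidate delays are given directly in seconds instead of being derived from positions $\vec{r}_j$. The function name and test signal are hypothetical.

```python
import numpy as np


def gcc_phat_scores(x1, x2, delays, fs):
    """GCC-PHAT score for each candidate delay (seconds) of x2 relative to x1.

    A larger score means x2 is more likely to be x1 delayed by that amount.
    """
    n = len(x1)
    X1, X2 = np.fft.rfft(x1), np.fft.rfft(x2)
    cross = X1 * np.conj(X2)
    phat = cross / np.maximum(np.abs(cross), 1e-12)     # PHAT weighting keeps phase only
    omega = 2 * np.pi * np.fft.rfftfreq(n, d=1.0 / fs)  # bin frequencies in rad/s
    # Compensate each candidate delay and sum the whitened cross-spectrum over frequency.
    steer = np.exp(-1j * np.outer(delays, omega))
    return np.real(steer @ phat)


fs = 16000
rng = np.random.default_rng(1)
x1 = rng.standard_normal(1024)
x2 = np.roll(x1, 5)               # x2 lags x1 by 5 samples (circular shift)
delays = np.arange(-10, 11) / fs  # candidate delays of -10 to 10 samples
best = delays[np.argmax(gcc_phat_scores(x1, x2, delays, fs))]
print(round(best * fs))  # 5
```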
[0127]
b) MUSIC method (for details, see Reference 3)
[Reference 3] R. O. Schmidt, "Multiple emitter location and signal parameter estimation", IEEE Transactions on Antennas and Propagation, 1986, vol. 34, no. 3, pp. 276-280
[0128]
The MUSIC method estimates the sound source positions $\vec{r} = [\vec{r}_S, \vec{r}_1, \dots, \vec{r}_{\hat{K}}]$ contained in the observation signal. It uses at least as many microphones as there are sound sources ($\hat{K}+1$) in the sound field.
Thus, $M \ge \hat{K}+1$.
The total number of noises $\hat{K}$ is given in advance or estimated from the observed signals.
[0129]
The inter-sensor correlation calculation unit 210 calculates a spatial correlation matrix $R_N(\omega, \tau)$ of the target sound and noise from the observation signal $\vec{X}(\omega, \tau)$. First, the spatial correlation matrix $R(\omega, \tau)$ is calculated from the observation signal $\vec{X}(\omega, \tau)$.
[0130]
[0131]
Here, E[·] denotes the expectation operator, which may be replaced by, for example, temporal averaging.
Next, to generate a spatial correlation matrix in the noise space, $R(\omega, \tau)$ is eigendecomposed.
[0132]
[0133]
Here, $V(\omega, \tau) = [\vec{v}_1(\omega, \tau), \dots, \vec{v}_M(\omega, \tau)]$ is the eigenvector matrix, and $\vec{v}_m(\omega, \tau)$ is its m-th eigenvector.
$\Lambda(\omega, \tau) = \mathrm{diag}([\Lambda_1(\omega, \tau), \dots, \Lambda_M(\omega, \tau)])$ is the eigenvalue matrix of the M eigenvalues. The 1st through $(\hat{K}+1)$-th eigenvectors contain components due to the sound sources. Only stationary noise is therefore present in the space spanned by the $(\hat{K}+2)$-th through M-th eigenvectors $\vec{v}_{\hat{K}+2}(\omega, \tau), \dots, \vec{v}_M(\omega, \tau)$. This property is used to generate the spatial correlation matrix of the target sound and (non-stationary) noise.
[0134]
[0135]
The inter-sensor correlation calculation unit 210 then calculates the MUSIC spectrum $P_{MUSIC}(\omega, \tau, \vec{r}_j)$ from the spatial correlation matrix $R_N(\omega, \tau)$ of the target sound and (non-stationary) noise.
[0136]
[0137]
Here, $\vec{h}(\omega, \vec{r}_j)$ is the transfer characteristic from position $\vec{r}_j$ to the M microphones. It is usually calculated by modeling only the direct sound.
[0138]
Finally, the inter-sensor correlation calculation unit 210 uses $P_{MUSIC}(\omega, \tau, \vec{r}_j)$ to calculate the sound source positions $\vec{r} = [\vec{r}_S, \vec{r}_1, \dots, \vec{r}_{\hat{K}}]$.
The larger the value of $P_{MUSIC}(\omega, \tau, \vec{r}_j)$ at a position $\vec{r}_j$, the more likely it is that a sound source exists there.
It is therefore sufficient to extract the $\hat{K}+1$ positions at which $P_{MUSIC}(\omega, \tau, \vec{r}_j)$ is large.
For example, the $\hat{K}+1$ positions $\vec{r}_j$ at which the cost $C_{MUSIC}$ is high may be extracted.
[0139]
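A compact sketch of the MUSIC steps above at a single frequency follows. It assumes the standard pseudo-spectrum $P_{MUSIC} = 1 / \|E_N^H \vec{h}\|^2$, where $E_N$ holds the $(\hat{K}+2)$-th to M-th eigenvectors. It also assumes a direct-sound-only (plane-wave) transfer characteristic for a hypothetical six-element uniform linear array. The array geometry, frequency, source angles, and function names are illustrative, not taken from the patent.

```python
import numpy as np


def music_spectrum(X, n_sources, steering):
    """MUSIC pseudo-spectrum at one frequency.

    X: (M, n_frames) observations X(omega, tau) collected over frames.
    n_sources: K^ + 1 (target sound plus noises).
    steering: (M, J) candidate transfer characteristics h(omega, r_j), one column per r_j.
    """
    R = X @ X.conj().T / X.shape[1]  # E[X X^H] replaced by a time average
    _, V = np.linalg.eigh(R)         # eigenvectors, eigenvalues ascending
    V = V[:, ::-1]                   # reorder so v_1 has the largest eigenvalue
    En = V[:, n_sources:]            # v_{K^+2} to v_M span the noise space
    proj = En.conj().T @ steering    # components of h in the noise space
    return 1.0 / np.sum(np.abs(proj) ** 2, axis=0)


# Hypothetical 6-mic uniform linear array, plane waves (direct sound only).
M, d, f, c = 6, 0.04, 2000.0, 340.0


def ula_steering(theta_deg):
    m = np.arange(M)[:, None]
    return np.exp(-2j * np.pi * f * m * d * np.sin(np.deg2rad(theta_deg)) / c)


rng = np.random.default_rng(2)
S = rng.standard_normal((2, 200)) + 1j * rng.standard_normal((2, 200))
noise = 0.05 * (rng.standard_normal((M, 200)) + 1j * rng.standard_normal((M, 200)))
X = ula_steering(np.array([-30.0, 20.0])) @ S + noise
grid = np.arange(-90, 91)
P = music_spectrum(X, n_sources=2, steering=ula_steering(grid))
```

The two largest local maxima of P over `grid` should lie near -30 and 20 degrees. These correspond to the $\hat{K}+1$ positions with high $C_{MUSIC}$ in the text.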
[0140]
c) Beamformer method (for details, see Reference 4)
[Reference 4] D. H. Johnson et al., Array Signal Processing, Prentice-Hall, Englewood Cliffs, NJ, USA, 1993
[0141]
The beamformer method estimates sound source positions by preparing a large number of beamformers and scanning the space with them.
[0142]
For each scanning position, the inter-sensor correlation calculation unit 210 prepares a filter $\vec{w}(\omega, \vec{r}_j) = [W_1(\omega, \vec{r}_j), \dots, W_M(\omega, \vec{r}_j)]^T$ for scanning the space.
There are various filter design methods. The delay-and-sum method and the minimum variance method are described here.
[0143]
The delay-and-sum method is designed with the criterion of emphasizing the target sound at position $\vec{r}_j$, as follows.
[0144]
[0145]
The minimum variance method is designed with the criterion of minimizing the noise energy while emphasizing the target sound. It can therefore be calculated as follows.
[0146]
[0147]
Various other filter design methods exist, and the filters may be designed by any of them.
[0148]
The inter-sensor correlation calculation unit 210 then applies the filter $\vec{w}(\omega, \vec{r}_j)$ to the frequency domain signal $\vec{X}(\omega, \tau)$ as in the following equation to calculate the spatial spectrum $P_{BF}(\omega, \tau, \vec{r}_j)$.
[0149]
[0150]
Finally, the inter-sensor correlation calculation unit 210 calculates the sound source positions $\vec{r}(\tau)$ from the spatial spectrum $P_{BF}(\omega, \tau, \vec{r}_j)$.
The larger the value of $P_{BF}(\omega, \tau, \vec{r}_j)$ at a position $\vec{r}_j$, the more likely it is that a sound source exists there.
It is therefore sufficient to extract the $\hat{K}+1$ positions at which $P_{BF}(\omega, \tau, \vec{r}_j)$ is large.
For example, the $\hat{K}+1$ positions $\vec{r}_j$ at which the cost $C_{BF}$ is high may be extracted.
[0151]
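The beamformer scan can be sketched as follows under several assumptions. The delay-and-sum filter is taken as the textbook $\vec{w} = \vec{h} / (\vec{h}^H\vec{h})$, and the minimum variance filter as $\vec{w} = R^{-1}\vec{h} / (\vec{h}^H R^{-1}\vec{h})$. $P_{BF} = \mathrm{E}[|\vec{w}^H\vec{X}|^2]$ is approximated by a frame average. The small diagonal loading, the hypothetical six-element uniform linear array, and all names are also assumptions of the sketch.

```python
import numpy as np


def bf_spectrum(X, steering, method="ds", loading=1e-3):
    """Spatial spectrum P_BF(r_j) = mean over frames of |w(r_j)^H X(tau)|^2.

    X: (M, n_frames) observations at one frequency; steering: (M, J) vectors h(r_j).
    """
    M = X.shape[0]
    R = X @ X.conj().T / X.shape[1]
    if method == "ds":
        # Delay-and-sum: w = h / (h^H h), unit gain toward r_j.
        W = steering / np.sum(np.abs(steering) ** 2, axis=0)
    else:
        # Minimum variance: w = R^-1 h / (h^H R^-1 h), with light diagonal loading.
        Rl = R + loading * np.real(np.trace(R)) / M * np.eye(M)
        RiH = np.linalg.solve(Rl, steering)
        W = RiH / np.sum(steering.conj() * RiH, axis=0)
    Y = W.conj().T @ X  # beamformer outputs, shape (J, n_frames)
    return np.mean(np.abs(Y) ** 2, axis=1)


# Hypothetical 6-mic uniform linear array with one source at 40 degrees.
M, d, f, c = 6, 0.04, 2000.0, 340.0


def ula_steering(theta_deg):
    m = np.arange(M)[:, None]
    return np.exp(-2j * np.pi * f * m * d * np.sin(np.deg2rad(theta_deg)) / c)


rng = np.random.default_rng(3)
s = rng.standard_normal((1, 200)) + 1j * rng.standard_normal((1, 200))
X = ula_steering(np.array([40.0])) @ s + 0.05 * (
    rng.standard_normal((M, 200)) + 1j * rng.standard_normal((M, 200)))
grid = np.arange(-90, 91)
for method in ("ds", "mv"):
    P = bf_spectrum(X, ula_steering(grid), method)
    print(method, grid[np.argmax(P)])
```

Both scans should peak near 40 degrees. The minimum variance spectrum has a much narrower peak, which is why it is preferred when several sources are close together.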
[0152]
The inter-sensor correlation calculation unit 210 predicts the directions or positions of the target sound and the $\hat{K}$ noises from the frequency domain signal $\vec{X}(\omega, \tau)$, for example by one of the methods described above.
The transfer characteristic storage unit 140 stores in advance the transfer characteristics $\vec{A}(\omega, \varepsilon) = [\vec{a}_1(\omega, \varepsilon), \vec{a}_2(\omega, \varepsilon), \dots, \vec{a}_{K'}(\omega, \varepsilon)]$. These are the transfer characteristics between each point of the finely divided control target area and each microphone when the movable control unit 200 is controlled by ε. The values for the predicted positions are therefore retrieved from storage. The control amount Z that minimizes the correlation between the transfer characteristics is then determined by the following equation (for details, see equations (20) to (24)) and output to the movable control unit 200.
[0153]
[0154]
<Movable Control Unit 200> The movable control unit 200 receives the control amount Z and moves the movable reflection unit 180 or the microphones 212-m2 (in this embodiment, the M2 microphones 212-m2) (s22).
[0155]
If the received control amount Z differs from the previously received Z by more than a predetermined threshold, the transfer characteristics to the microphones are regarded as having changed. The movable reflection unit 180 or the microphones 212-m2 (in this embodiment, the M2 microphones 212-m2) may be moved only when such a change is detected.
[0156]
<Filter Calculation Unit 150> The filter calculation unit 150 retrieves the transfer characteristics $\vec{A}(\omega, \varepsilon)$ from the transfer characteristic storage unit 140 and calculates the filters $\vec{W}(\omega, \varepsilon)$.
It then receives the control amount Z. Each time Z changes, it outputs the filter $\vec{W}(\omega, Z)$ corresponding to Z to the filtering unit 160.
For example, it calculates a filter $\vec{W}(\omega, \varepsilon)$ for signal processing that suppresses an acoustic signal from a specific position or direction.
[0157]
The gist of the beamforming technique of the present invention is to decorrelate the transfer characteristics over a wide band by changing the direction or position of the diffusion structure or the microphone according to the properties of the observed signal (the correlation between microphones).
The filter design concept itself is unaffected, so the filter $\vec{W}(\omega, \varepsilon)$ can be designed by the same methods as in the prior art.
For example, the filter $\vec{W}(\omega, \varepsilon)$ can be designed by any of the following methods:
<1> a method based on the SN-ratio maximization criterion described in Reference 5;
<2> a method based on power inversion;
<3> the minimum variance distortionless response method with one or more blind spots (noise gain suppression) as constraint conditions;
<4> the delay-and-sum beamforming method;
<5> the maximum likelihood method;
<6> the AMNOR (Adaptive Microphone-array for Noise Reduction) method;
or the like.
[Reference 5] International Publication No. WO 2012/086834 Pamphlet
[0158]
For example, when the delay-and-sum method is used as the basis, the filter $\vec{W}_{DS1}(\omega, \varepsilon)$ is calculated by Expression (16).
[0159]
[0160]
Further, for example, when the maximum likelihood method is used as the basis, the filter $\vec{W}_{DS2}(\omega, \varepsilon)$ is calculated by Expression (17).
[0161]
[0162]
Further, for example, the minimum variance distortionless response method can be used with one or more blind spots as constraint conditions. In that case, the filter $\vec{W}_{DS3}(\omega, \varepsilon)$ is calculated by the following equation.
[0163]
[0164]
Here, fS(ω, ε) and fk(ω, ε) are the pass characteristics at frequency ω for the target sound and for noise k (k = 1, 2, ..., K), respectively.
For example, the transfer characteristic $\vec{a}(\omega, \varepsilon)$ in equation (26) may be prepared in advance as a direction-dependent transfer characteristic $\vec{a}(\omega, \varepsilon, \theta)$. The filter $\vec{W}(\omega, \varepsilon, \theta)$ is then calculated from $\vec{a}(\omega, \varepsilon, \theta)$, and the filtering unit 160 can perform signal processing for a specific direction θs.
Likewise, the transfer characteristic may be prepared in advance as $\vec{a}(\omega, \varepsilon, \theta, D)$, depending on the direction θ and the distance D. The filter $\vec{W}(\omega, \varepsilon, \theta, D)$ is then calculated from it, and the filtering unit 160 can perform signal processing for a specific position, identified by the direction θs and the distance DH.
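The equation for the constrained filter is not reproduced here. The minimal sketch below assumes the standard linearly constrained minimum variance solution $\vec{W} = R^{-1}C\,(C^H R^{-1} C)^{-1}\vec{f}$. The constraint matrix is $C = [\vec{a}_S, \vec{a}_1, \dots, \vec{a}_K]$, and the pass characteristics are $\vec{f} = [f_S, f_1, \dots, f_K]^T$. Setting $f_S = 1$ and $f_k = 0$ places a blind spot on each noise k. The function name and the random constraint vectors are hypothetical.

```python
import numpy as np


def lcmv_filter(R, C, f):
    """Minimize w^H R w subject to C^H w = f.

    R: (M, M) correlation matrix whose output power is minimized.
    C: (M, K+1) constraint vectors [a_S, a_1, ..., a_K].
    f: (K+1,) pass characteristics [f_S, f_1, ..., f_K].
    """
    RiC = np.linalg.solve(R, C)  # R^-1 C without forming R^-1
    return RiC @ np.linalg.solve(C.conj().T @ RiC, f)


# Hypothetical M = 4 array: pass the target a_S, place a blind spot on noise a_1.
rng = np.random.default_rng(4)
C = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))
w = lcmv_filter(np.eye(4), C, np.array([1.0, 0.0]))
print(np.round(np.abs(C.conj().T @ w), 6))  # [1. 0.]
```

With R = I this reduces to the minimum-norm filter that satisfies the constraints exactly. An observed or noise correlation matrix can be passed instead to also minimize residual noise power.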
[0165]
<Filtering Unit 160> The filtering unit 160 receives the filter $\vec{W}(\omega, Z)$ from the filter calculation unit 150 each time the control amount Z changes. It also receives the frequency domain signal $\vec{X}(\omega, \tau)$ for each frame. For each frame τ and each frequency ω ∈ Ω, it applies the filter $\vec{W}(\omega, Z)$ to $\vec{X}(\omega, \tau) = [X_1(\omega, \tau), \dots, X_M(\omega, \tau)]^T$ (see equation (5); s4) and outputs the output signal Y(ω, τ).
[0166]
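Equation (5) is not reproduced in this text. Assuming the common convention $Y(\omega,\tau) = \vec{W}^H(\omega, Z)\vec{X}(\omega,\tau)$, the filtering step for all frames and frequencies is a single contraction. It is sketched here with `numpy.einsum`, using arrays laid out as (frame, frequency, microphone), a layout chosen for illustration. The function name is hypothetical.

```python
import numpy as np


def apply_filter(W, X):
    """Y(omega, tau) = W(omega, Z)^H X(omega, tau) for every frame and frequency.

    W: (n_freq, M) filter for the current control amount Z.
    X: (n_frames, n_freq, M) frequency domain observations.
    Returns Y with shape (n_frames, n_freq).
    """
    return np.einsum("fm,tfm->tf", W.conj(), X)


rng = np.random.default_rng(5)
X = rng.standard_normal((2, 3, 4)) + 1j * rng.standard_normal((2, 3, 4))
W = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))
Y = apply_filter(W, X)
# Each entry is the inner product W^H X at one (tau, omega).
print(np.isclose(Y[1, 2], np.vdot(W[2], X[1, 2])))  # True
```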
[0167]
For example, the filtering unit may be any unit that works from the signals collected by the M1 microphones 211-m1 and the M2 microphones 212-m2 and gives different sound collection characteristics to sound signals emitted from at least a plurality of positions or directions in space.
Making the sound collection characteristics different means, for example, collecting a sound signal emitted at a specific position locally while suppressing, as far as possible, the collection of sound signals emitted at other positions. Conversely, it can mean suppressing (silencing) the sound signal emitted at a specific position so as to pick up only the sound signals emitted at other positions.
[0168]
<Time Domain Conversion Unit 170> The time domain conversion unit 170 converts the output signal Y(ω, τ) of each frequency ω ∈ Ω of frame τ into the time domain (s5), obtaining the frame-unit time domain signal y(τ) of frame τ. It then concatenates the frame-unit signals y(τ) in frame-index order and outputs the time domain signal y(t).
The method of converting the frequency domain signal into the time domain signal is the inverse of the transform used in s3, for example the inverse fast discrete Fourier transform.
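Because the frames described for s3 do not overlap, the inverse step reduces to an inverse FFT per frame followed by concatenation. The sketch below assumes non-overlapping, unwindowed frames and `numpy.fft.irfft`. With overlapping windowed frames, overlap-add would be needed instead. The function name is hypothetical.

```python
import numpy as np


def to_time_domain(Y: np.ndarray, N: int = 2048) -> np.ndarray:
    """Inverse FFT each frame of Y (shape (n_frames, N // 2 + 1)) and join by frame index."""
    frames = np.fft.irfft(Y, n=N, axis=-1)  # (n_frames, N) frame-unit signals y(tau)
    return frames.reshape(-1)               # concatenate in frame order -> y(t)


# Round trip on a toy signal with N = 256 and four frames.
rng = np.random.default_rng(6)
y = rng.standard_normal(4 * 256)
Y = np.fft.rfft(y.reshape(4, 256), axis=-1)
print(np.allclose(to_time_domain(Y, N=256), y))  # True
```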
[0169]
<Effects> With such a configuration, the device scale for a given directivity performance can be made smaller than in the prior art.
The observation signal then contains cues for distinguishing the target sound from the noise. Appropriate signal processing, for example with a filter based on transfer characteristics prepared in advance, therefore enables arbitrary directivity control over a wide band.
In the present embodiment, the filter $\vec{W}(\omega, \varepsilon)$ is calculated in advance. Depending on the computing capability of the sound collection device 10 and the like, however, the filter calculation unit 150 may instead calculate $\vec{W}(\omega, \varepsilon)$ for each frequency after the predetermined directivity performance is determined.
[0170]
Second Embodiment: The description below focuses on the parts that differ from the first embodiment.
[0171]
<Point of Second Embodiment> In the present embodiment, microphones are selected so as to reduce the correlation of the transfer characteristics.
[Required conditions] (1) A part that evaluates the correlation of the transfer characteristics is provided.
(2) Based on the evaluation value, effective microphones are selected so as to reduce the correlation of the transfer characteristics.
Here, the evaluation value corresponds to the control amount Z obtained in the first embodiment.
[0172]
<Sound Collection Device 20 According to Second Embodiment> (1) The sound collection device 20 has N microphones, where N is an integer of 3 or more.
(2) M microphones are selected from the N microphones, where N ≥ M > 1.
(Pattern 1) The N microphones are installed at a plurality of different predetermined positions. Based on the control amount Z, the microphones at the positions where the correlation of the transfer characteristics decreases are selected.
(Pattern 2) The N microphones have different directivities and are installed at the same position. Based on the control amount Z, the directional microphones for which the correlation of the transfer characteristics is small are selected.
(Pattern 3) A combination of patterns 1 and 2. Some of the N microphones are installed at a plurality of different predetermined positions, and some have different directivities and are installed at the same position. Based on the control amount Z, the microphones that decrease the correlation of the transfer characteristics are selected; any combination may be used as long as it decreases that correlation.
[0173]
[Signal Processing of Sound Collection Device 20] The functional configuration and processing flow of the sound collection device 20 according to the second embodiment are shown in FIG. 15 and FIG. The sound collection device 20 according to the second embodiment includes N microphones 211-n, an AD conversion unit 120, a frequency domain conversion unit 130, a filtering unit 160, a time domain conversion unit 170, a filter calculation unit 150, a transfer characteristic storage unit 140, an inter-sensor correlation calculation unit 210, and a selection unit 220. Here, n = 1, 2, ..., N, and N ≥ 3.
[0174]
<Transfer Characteristic Storage Unit 140> The transfer characteristic storage unit 140 stores the transfer characteristics $\vec{A}_{n'}(\omega) = [\vec{a}_{n',1}(\omega), \dots, \vec{a}_{n',K'}(\omega)]$, measured in advance using the sound collection device 20. Suppose M microphones are selected from the N microphones 211-n, and the control target area is finely divided into K' points. Then $\vec{a}_{n',k}(\omega) = [a_{n',1}(\omega), a_{n',2}(\omega), \dots, a_{n',M}(\omega)]^T$ (n' = 1, 2, ..., ${}_N C_M$; k = 1, 2, ..., K') is the transfer characteristic at frequency ω from point k to each of the M selected microphones. M is an integer of 2 or more and N or less. The transfer characteristics $\vec{A}_{n'}(\omega)$ may also be prepared in advance by theoretical formulas or simulation instead of prior measurement. As described above, n' may index all combinations of M microphones selected from the N microphones 211-n (n' = 1, 2, ..., ${}_N C_M$). Alternatively, it may index only the combinations in which the correlation of the transfer characteristics is likely to be small (n' = 1, 2, ..., N', where N' is the total number of such appropriately set combinations).
[0175]
<Inter-Sensor Correlation Calculation Unit 210> The inter-sensor correlation calculation unit 210 uses the transfer characteristics $\vec{A}_{n'}(\omega)$ instead of $\vec{A}(\omega, \varepsilon)$.
[0176]
Accordingly, the control amount Z is obtained as follows.
[0177]
[0178]
Four measures of the correlation of the transfer characteristics are obtained by equations (20'), (21'), (23'), and (24'), respectively: the power average $C_{n',1}(\omega)$, the channel capacity $C_{n',2}(\omega)$, the condition number $C_{n',3}(\omega)$, and the determinant $C_{n',4}(\omega)$.
[0179]
[0180]
[0181]
Here, Λm(ω) is the m-th eigenvalue of the spatial correlation matrix R(ω), which can be approximated by the following equation.
[0182]
[0183]
[0184]
[0185]
The inter-sensor correlation calculation unit 210 calculates the correlation of the transfer characteristics using any of these measures.
The costs $C_{n',i}(\omega)$ (i = 1, 2, 3, 4) calculated for each frequency are then averaged over frequency.
[0186]
[0187]
Finally, the control amount Z is obtained from the frequency-averaged costs $\hat{C}_{n',i}$.
[0188]
<Selection Unit 220> The selection unit 220 receives the control amount Z and selects M of the N microphones based on Z (s23).
Specifically, it selects the M microphones corresponding to the index n' that gives the control amount Z. Here, n' is the index of a combination of M microphones selected from the N microphones 211-n.
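Selecting M of N microphones by exhaustive search over all ${}_N C_M$ combinations can be sketched as follows. The sketch evaluates a single frequency and takes the cost function as a parameter, for example a condition number to be minimized. The function names and the toy transfer matrix are hypothetical, and the frequency averaging described above is omitted for brevity.

```python
import itertools

import numpy as np


def select_microphones(A_full, M, cost, minimize=True):
    """Evaluate every M-of-N subset (n' = 1 .. NCM) and return the best one.

    A_full: (N, K') transfer characteristics from K' control points to all N mics.
    cost: maps an (M, K') matrix to a scalar, e.g. a condition number.
    """
    best_subset, best_cost = None, None
    for subset in itertools.combinations(range(A_full.shape[0]), M):
        c = cost(A_full[list(subset), :])
        if best_cost is None or (c < best_cost if minimize else c > best_cost):
            best_subset, best_cost = subset, c
    return best_subset, best_cost


def condition_number(A):
    lam = np.linalg.eigvalsh(A @ A.conj().T)  # ascending real eigenvalues
    return lam[-1] / lam[0]


# Hypothetical N = 3 mics, K' = 2 points; mics 0 and 1 see nearly the same thing.
A_full = np.array([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]])
subset, c = select_microphones(A_full, 2, condition_number)
print(subset)  # (0, 2)
```

${}_N C_M$ grows quickly with N. Restricting n' to the N' promising combinations, as the text allows, keeps this search tractable.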
[0189]
The selection unit 220 outputs a control signal instructing the M microphones corresponding to the n' that gives Z to output their collected signals to the AD conversion unit 120.
It outputs a control signal instructing the other microphones not to output their collected signals to the AD conversion unit 120.
Alternatively, the control signal may instruct the AD conversion unit 120 to process only the collected signals from the M microphones corresponding to the n' that gives Z.
[0190]
<Effects> With such a configuration, a device configuration that reduces the correlation of the transfer characteristics can be identified.
The configurations of the first and second embodiments may also be combined.
That is, the configuration may include both the selection unit 220, which selects microphones, and the movable control unit 200, which moves the microphones or the reflection unit.
M need not be a constant. It may be a variable taking an integer value of 2 or more and N or less.
[0191]
Third Embodiment The description below focuses mainly on the differences from the second embodiment.
[0192]
<Point of Third Embodiment> In this embodiment, reflection units are selected so as to reduce the correlation of the transfer characteristics.
[Required conditions] (1) The device has a part that evaluates the correlation of the transfer characteristics.
(2) Based on the resulting evaluation value, reflection units that are effective in reducing the correlation of the transfer characteristics are selected.
[0193]
<Sound Collection Device 30 According to Third Embodiment> (1) The sound collection device 30 has Q reflection units, where Q is an integer of 2 or more.
(2) P reflection units are selected from the Q reflection units.
Here, Q ≥ P ≥ 1.
(Pattern 1) The Q reflection units are installed at a plurality of different predetermined positions, and, based on the control amount Z, the reflection units at the positions where the correlation of the transfer characteristics becomes low are selected.
(Pattern 2) The Q reflection units are provided at the same position but differ in shape and material, and, based on the control amount Z, the reflection unit whose shape and material make the correlation of the transfer characteristics low is selected. Each reflection unit is made of a material capable of reflecting sound, and its shape need only be one that produces one or more reflected sounds. For example, it may be plate-like as shown in FIG. 1, or it may have another shape, such as that of the diffusion structure 181 of FIG. Examples of shapes of the reflection unit are shown in FIG. Viewed from the front, the reflection unit may be formed as a rectangle, an ellipse, a rounded rectangle, a rhombus, a regular octagon, a triangle, or the like. Viewed from the side, it may be formed with a concave surface, a convex surface, a second shape, a pentagon, a hexagon, a vertical triangle, or an isosceles triangle.
(Pattern 3) A combination of Patterns 1 and 2. That is, some of the Q reflection units are installed at different predetermined positions, while others are chosen from among units installed at the same position that differ in shape and material. Based on the control amount Z, reflection units are selected so that the correlation of the transfer characteristics becomes low, whatever the resulting combination.
[0194]
The selected reflection units may be installed by a movable part such as a motor, or they may be installed by hand.
[0195]
[Signal Processing of Sound Collection Device 30] The functional configuration and processing flow of the sound collection device 30 according to the third embodiment are shown in FIG. 18 and FIG.
The sound collection device 30 according to the third embodiment includes Q reflection units 180-q, M microphones 211-m, an AD conversion unit 120, a frequency domain conversion unit 130, a filtering unit 160, a time domain conversion unit 170, a filter calculation unit 150, a transfer characteristic storage unit 140, an inter-sensor correlation calculation unit 210, a selection unit 220, and a display unit 230, where q = 1, 2, ..., Q with Q ≥ 2, and m = 1, 2, ..., M with M ≥ 2.
[0196]
<Transfer Characteristic Storage Unit 140> The transfer characteristic storage unit 140 stores the transfer characteristics $\vec{A}_{q'}(\omega) = [\vec{a}_{q',1}(\omega), \ldots, \vec{a}_{q',K'}(\omega)]$ measured in advance using the sound collection device 30. Here, $\vec{a}_{q',k}(\omega) = [a_{q',1}(\omega), a_{q',2}(\omega), \ldots, a_{q',M}(\omega)]^T$ (q' = 1, 2, ..., ${}_Q C_P$; k = 1, 2, ..., K') is the transfer characteristic at frequency ω from point k of the K' points to each of the M microphones in the microphone array when P reflection units are selected from the Q reflection units 180-q, where P is an integer from 1 to Q. The transfer characteristic $\vec{A}_{q'}(\omega)$ may also be prepared in advance from a theoretical formula or a simulation instead of being measured. As described above, q' may be an index (q' = 1, 2, ..., ${}_Q C_P$) covering all combinations of P reflection units selected from the Q reflection units 180-q. Alternatively, it may be an index covering only the combinations for which the correlation of the transfer characteristics is likely to be low (q' = 1, 2, ..., Q', where Q' is the total number of such combinations).
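The index q' can be pictured as a mapping from integers to reflection-unit combinations. The hypothetical helper below lists all ${}_Q C_P$ combinations (6 for Q = 4, P = 2), or keeps only a pre-screened subset of Q' combinations when one is supplied; how such a subset is chosen is not specified in the text, so the `allowed` argument is an assumption.

```python
import itertools


def reflector_candidates(Q, P, allowed=None):
    """Map q' = 1, 2, ... to tuples of 0-based reflection-unit indices.

    Without `allowed`, all QCP combinations of P out of Q units are listed.
    `allowed` is a hypothetical pre-screened collection of combinations judged
    likely to lower the correlation; if given, only those Q' entries are kept.
    """
    if not 1 <= P <= Q:
        raise ValueError("need 1 <= P <= Q")
    combos = list(itertools.combinations(range(Q), P))
    if allowed is not None:
        keep = {tuple(sorted(c)) for c in allowed}   # order-insensitive matching
        combos = [c for c in combos if c in keep]
    return {q: c for q, c in enumerate(combos, start=1)}
```

Storing one transfer-characteristic set per key of this mapping keeps the storage size proportional to ${}_Q C_P$ (or Q'), which is one reason a pre-screened index set can be attractive.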
[0197]
<Inter-Sensor Correlation Calculation Unit 210> The inter-sensor correlation calculation unit 210 obtains the control amount Z using the transfer characteristic $\vec{A}_{q'}(\omega)$ in place of the transfer characteristic $\vec{A}_{n'}(\omega)$.
[0198]
<Selection Unit 220> The selection unit 220 receives the control amount Z and, based on it, selects P reflection units from the Q reflection units 180-q (s33).
That is, it selects the P reflection units corresponding to the index q' that yields the control amount Z, where q' indexes the combinations of P reflection units chosen from the Q reflection units 180-q. In this embodiment, the selected reflection units are assumed to be displayed on the display unit 230 and the P reflection units installed by hand; however, they may instead be installed by a movable part such as a motor.
[0199]
<Effects> With such a configuration, a device configuration that reduces the correlation of the transfer characteristics can be identified. The configurations of the first and second embodiments may also be combined with that of the third embodiment. That is, the device may include at least one of (1) the selection unit 220 for selecting microphones and (2) the movable control unit 200 for moving the microphones or the reflection unit, with the selection unit 220 also selecting the reflection units. P need not be a constant; it may be a variable taking any integer value from 1 to Q.
[0200]
Fourth Embodiment The description below focuses on the parts that differ from the third embodiment.
[0201]
<Point of Fourth Embodiment> From S sound collection units, each including a plurality of microphones and a reflection unit made of a material capable of reflecting sound, a sound collection unit whose transfer characteristics have low correlation is selected.
Here, S is an integer of 2 or more. [Required conditions] (1) The device has a part that evaluates the correlation of the transfer characteristics. (2) Based on the evaluation value, a sound collection unit that is effective in reducing the correlation of the transfer characteristics is selected from the plurality of sound collection units. Here, the evaluation value corresponds to the control amount Z obtained in the first embodiment.
[0202]
<Sound Collection Device 40 According to Fourth Embodiment> (1) The sound collection device 40 according to the fourth embodiment has S sound collection units, where S is an integer of 2 or more. (2) R sound collection units are selected from the S sound collection units, where S ≥ R ≥ 1.
[0203]
[Signal Processing of Sound Collection Device 40] The functional configuration and processing flow of the sound collection device 40 according to the fourth embodiment are shown in FIG. 20 and FIG. The sound collection device 40 according to the fourth embodiment includes S sound collection units 410-s, an AD conversion unit 120, a frequency domain conversion unit 130, a filtering unit 160, a time domain conversion unit 170, a filter calculation unit 150, a transfer characteristic storage unit 140, an inter-sensor correlation calculation unit 210, and a selection unit 220, where s = 1, 2, ..., S and S ≥ 2. Each sound collection unit 410-s includes $M_s$ microphones 211-s-$m_s$ ($m_s$ = 1, 2, ..., $M_s$) and a reflection unit 490-s made of a material capable of reflecting sound. In this embodiment, the reflection unit is shaped like the reflection structure 190 of FIG. 3 (a shape that has an opening and surrounds the microphone 112), but it may instead be shaped like the diffusion structure 181 or the reflection unit 180 of FIG., and one sound collection unit may include a plurality of reflection units. The reflection unit is made of a material capable of reflecting sound, and its shape need only be one that produces one or more reflected sounds.
[0204]
<Transfer Characteristic Storage Unit 140> The transfer characteristic storage unit 140 stores the transfer characteristics $\vec{A}_s(\omega) = [\vec{a}_{s,1}(\omega), \ldots, \vec{a}_{s,K'}(\omega)]$. Here, $\vec{a}_{s,k}(\omega) = [a_{s,1}(\omega), a_{s,2}(\omega), \ldots, a_{s,M_s}(\omega)]^T$ (k = 1, 2, ..., K') is the transfer characteristic at frequency ω from point k, among the K' points obtained by finely dividing the control target area, to each of the $M_s$ microphones in the microphone array when the sound collection unit 410-s is selected. The transfer characteristic $\vec{A}_s(\omega)$ may also be prepared in advance from a theoretical formula or a simulation instead of being measured beforehand.
[0205]
<Inter-Sensor Correlation Calculation Unit 210> The inter-sensor correlation calculation unit 210 obtains the control amount Z using the transfer characteristic $\vec{A}_s(\omega)$ instead of the transfer characteristic $\vec{A}_{n'}(\omega)$.
[0206]
<Selection Unit 220> The selection unit 220 receives the control amount Z and, based on it, selects R sound collection units from the S sound collection units 410-s (s43).
That is, the sound collection unit 410-s corresponding to the s that yields the control amount Z is selected.
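Because the units may contain different numbers of microphones $M_s$, a cost that is already normalized per microphone pair is convenient for comparing them. The sketch below assumes the mean pairwise coherence (with $R \approx A A^H / K'$) as that cost and simply keeps the R units with the lowest frequency-averaged value; the cost choice, array layout, and function names are assumptions rather than the patent's specification.

```python
import numpy as np


def mean_pairwise_coherence(A):
    """Average |R_ij|^2 / (R_ii R_jj) over microphone pairs i != j for one bin.

    A: (Ms, K') transfer characteristics of one unit at one frequency; needs Ms >= 2.
    """
    R = A @ A.conj().T / A.shape[1]
    d = np.real(np.diag(R))
    coh = np.abs(R) ** 2 / np.outer(d, d)
    off_diagonal = ~np.eye(R.shape[0], dtype=bool)
    return float(coh[off_diagonal].mean())


def select_units(A_units, R_count):
    """Choose R_count of the S sound collection units with the lowest cost.

    A_units[s] has shape (F, Ms, K'); Ms may differ between units.
    Returns the selected 0-based unit indices (ascending) and all averaged costs.
    """
    costs = [
        float(np.mean([mean_pairwise_coherence(A[f]) for f in range(A.shape[0])]))
        for A in A_units
    ]
    order = np.argsort(costs, kind="stable")   # lowest cost first, ties keep unit order
    return sorted(int(s) for s in order[:R_count]), costs
```

Since the coherence lies between 0 and 1 regardless of $M_s$, a two-microphone unit and a three-microphone unit can be ranked on the same scale.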
[0207]
For the sound collection unit 410-s corresponding to the s that yields the control amount Z, the selection unit 220 outputs a control signal instructing it to output its sound collection signal to the AD conversion unit 120. For the other sound collection units 410-s'' (s'' ≠ s), it outputs a control signal instructing them not to output their sound collection signals to the AD conversion unit 120. Alternatively, the control signal may instruct the AD conversion unit 120 to process only the sound collection signal from the sound collection unit 410-s corresponding to the s that yields the control amount Z.
[0208]
<Effects> With such a configuration, a configuration that reduces the correlation of the transfer characteristics can be identified. The configurations of the first, second, third, and fourth embodiments may also be combined.
[0209]
<Other Modifications> The present invention is not limited to the above embodiments and modifications. For example, the various processes described above may be executed not only in time series in the described order but also in parallel or individually, depending on the processing capability of the executing device or as needed. Other changes may be made as appropriate without departing from the spirit of the present invention.
For example, in the first embodiment, the inter-sensor correlation calculation unit 210 calculates the inter-sensor correlation (s21) and obtains the control amount Z of the movable control unit 200. Alternatively, the inter-sensor correlation and the control amount Z of the movable control unit 200 may be calculated for specific positions or directions, and when the user inputs a specific position or direction, the corresponding control amount Z may be output.
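This modification amounts to a table of control amounts keyed by position or direction. A minimal sketch, assuming directions are given in degrees and that the stored entry nearest to the user's input (by circular angular distance) is returned; `compute_z` is a hypothetical stand-in for the inter-sensor correlation calculation and the derivation of Z, and nearest-neighbor matching is an assumption, not something the text specifies.

```python
def build_control_table(directions, compute_z):
    """Evaluate the (hypothetical) compute_z once per stored direction in degrees."""
    return {d: compute_z(d) for d in directions}


def angular_distance(a, b):
    """Smallest absolute difference between two angles in degrees (wraps at 360)."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def lookup_control_amount(table, user_direction):
    """Return the stored control amount Z whose direction is closest to the input."""
    nearest = min(table, key=lambda d: angular_distance(d, user_direction))
    return table[nearest]
```

For instance, with entries at 0, 90, 180, and 270 degrees, an input of 350 degrees returns the entry stored for 0 degrees because of the wrap-around distance.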
[0210]
<Program and Recording Medium> The sound collection device described above can also be implemented by a computer. In this case, a program that causes the computer to function as the device (having the functional configuration shown in the embodiments), or a program that causes the computer to execute each step of the processing procedure (shown in the embodiments), is executed by the computer. The program can be recorded on a computer-readable recording medium such as a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory. To run the program, the computer may read it from a recording medium, or download it via a communication line from a server or similar device on which the program is stored.
[0211]
The present invention can be used in narrow-directivity speech enhancement and spot speech enhancement technologies. It can also be used in AGC (Auto Gain Control) technology and in area sound collection and reproduction technology.