JP2018170718

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2018170718
PROBLEM TO BE SOLVED: To provide a sound collection device, program and method that emphasize only the target area sound with little distortion. SOLUTION: The present invention relates to a sound collection device. For each of a plurality of microphone arrays, each consisting of two microphones, the sound collection device calculates an arrival direction feature quantity that changes according to the direction of arrival of sound, taking a large value for sound coming from the direction of the target area and a small value for sound coming from directions other than the target area direction. For each frequency component, an area feature quantity is obtained by integrating the arrival direction feature quantities of the microphone arrays. Using the area feature quantity, the target area sound is extracted from a signal based on the captured signals output from the microphone arrays. [Selected figure] Figure 1
Sound collection device, program and method
[0001]
The present invention relates to a sound collection device, program and method, and can be applied, for example, to emphasizing only the sound from a specific area while suppressing sounds from other areas.
[0002]
A beamformer using a microphone array is a known technology for emphasizing sounds arriving from a specific direction (voice and sound; hereinafter, voice and sound may be collectively referred to as "sound") while suppressing other sounds.
A beamformer is a technology that forms directivity and a blind spot using the time differences of the signal reaching each microphone (see Non-Patent Literature 1 and Non-Patent Literature 2).
03-05-2019
[0003]
However, if the directivity of the beamformer is simply pointed at the area from which sound is to be collected (hereinafter referred to as the "target area") and noise sources exist around the target area, there is a problem that not only the sound from the sound source inside the target area (hereinafter referred to as the "target area sound") but also the sound from noise sources outside the target area (hereinafter referred to as "non-target area sound") is picked up at the same time.
[0004]
In order to solve this problem, a method has conventionally been proposed in which the directivities of a plurality of microphone arrays are crossed at the target area from different directions to pick up the target area sound (Patent Document 1).
In the method described in Patent Document 1, the target area sound is extracted by simultaneously processing the beamformer outputs of the microphone arrays.
[0005]
FIG. 6 is an explanatory view showing an example of conventional sound collection processing using a plurality of microphone arrays.
[0006]
FIG. 6 shows an example in which the directivities of two microphone arrays MA (MA1, MA2) are directed at the target area.
[0007]
FIG. 6A shows the positional relationship between the microphone arrays MA and the sound source of the target area sound when the directivities of the two microphone arrays MA1 and MA2 are directed at the target area.
FIG. 6A also illustrates the directivities (beamformer directivities) Z1 and Z2 corresponding to the microphone arrays MA1 and MA2.
Furthermore, in the example of FIG. 6A, sound sources of non-target area sound exist around the sound source of the target area sound. Therefore, in the state of FIG. 6A, the beamformer outputs of the microphone arrays MA1 and MA2 include not only the target area sound from the sound source in the target area but also the non-target area sounds from sound sources lying in the same directivity directions outside the target area.
[0008]
FIGS. 6B and 6C show the frequency components of the beamformer outputs of the two microphone arrays MA1 and MA2, respectively. Assuming speech sparsity, each frequency component contains only one sound source (either the target area sound or a non-target area sound), as shown in FIGS. 6B and 6C. Since the target area lies within the directivity of every microphone array, the frequency components of the target area sound are included in all beamformer outputs in the same proportion and with the same distribution. In contrast, the frequency components of the non-target area sounds differ between the beamformer outputs. From these features, the frequency components commonly included in all beamformer outputs can be estimated to belong to the target area sound, and the conventional target area sound collection method described in Patent Document 1 and elsewhere is realized on this basis.
[0009]
FIG. 7 is a block diagram showing a functional configuration of the sound collection device 10 to
which the conventional sound collection method is applied.
[0010]
A conventional sound collection device 10 shown in FIG. 7 includes a data input unit 2, a frequency domain conversion unit 3, a directivity forming unit 4, a propagation delay difference correction unit 5, a power correction unit 6, a first subtraction unit 7, and a second subtraction unit 8.
[0011]
The captured signals from the microphone arrays MA1 and MA2 are converted from analog signals to digital signals (data) in the data input unit 2, and from the time domain to the frequency domain in the frequency domain conversion unit 3, yielding captured signal groups X1 and X2.
Then, beamformers having directivities such as the directivities Z1 and Z2 in FIG. 6A are applied in the directivity forming unit 4 to obtain the beamformer output signals Xma1(f) and Xma2(f).
Then, the propagation delay difference correction unit 5 delays one of the beamformer output signals Xma1(f) and Xma2(f) based on the distance (known information) between each microphone array and the target area so as to align their timing, yielding the delay-corrected signals X'ma1(f) and X'ma2(f).
[0012]
The power correction unit 6 calculates the amplitude correction coefficient αma1 (alpha) according to equation (1), in order to adapt to the orientation of the speaker in the target area in addition to the difference in amplitude due to the distance between each microphone array and the target area. The operator modef(A(f)) in equation (1) returns the value that appears most frequently (the mode) among the function values A(f), whose value changes with the variable f. A median may be used instead of the mode, as in equation (2). The operator medianf(A(f)) in equation (2) returns the median of the function values A(f), whose value changes with the variable f.
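As a sketch, the mode/median-based power correction above could look as follows in Python. Since equation (1) itself is not reproduced in this text, the choice of A(f) as the per-bin amplitude ratio |X'ma1(f)| / |X'ma2(f)| is an assumption, and the function name is illustrative.

```python
from statistics import median

def amplitude_correction_coefficient(X1, X2, use_median=False, ndigits=2):
    """Sketch of the power correction step: alpha_ma1 is the mode (or
    median) over frequency bins f of a per-bin quantity A(f). The form
    A(f) = |X'ma1(f)| / |X'ma2(f)| is an assumption here, since
    equation (1) itself is not reproduced in the text."""
    ratios = [abs(a) / abs(b) for a, b in zip(X1, X2) if abs(b) > 0.0]
    if use_median:                      # equation (2): median_f A(f)
        return median(ratios)
    # equation (1): mode_f A(f); real-valued ratios are rounded into
    # bins so that "most frequent value" is well defined.
    binned = [round(r, ndigits) for r in ratios]
    return max(set(binned), key=binned.count)
```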
[0013]
Then, in the first subtraction unit 7, the spectrum of the delay-corrected signal X'ma2(f) of the microphone array MA2, with its amplitude corrected by the amplitude correction coefficient αma1, is subtracted from the delay-corrected signal X'ma1(f) of the microphone array MA1. This cancels the target area sound component common to both beamformer outputs and extracts the non-target area sound component Nma1(f) contained in the delay-corrected signal X'ma1(f) of the microphone array MA1. Equation (3) is a calculation that broadly follows this concept.
Nma1 = X'ma1 - αma1 · X'ma2 (3)
[0014]
Then, in the second subtraction unit 8, the target area sound Yma1(f) is extracted by spectral subtraction of the non-target area sound component Nma1(f) from the delay-corrected signal X'ma1(f) of the microphone array MA1. Equation (4) is a calculation that broadly follows this concept. Here, βma1 (beta) in equation (4) is a constant coefficient that defines the removal intensity of the non-target area sound.
Yma1 = X'ma1 - βma1 · Nma1 (4)
[0015]
As described above, with the conventional sound collection method, only the target area sound can be collected even when non-target area sound sources exist around the target area.
[0016]
JP 2014-72708 A
[0017]
Tadashi Asano, "Array signal processing of sound: localization, tracking and separation of sound sources", The Acoustical Society of Japan, Corona Publishing, February 25, 2011. Takashi Yato, Makoto Morito, Kei Yamada, Tetsuji Ogawa, "Sound source separation technology by square microphone array (<Special feature> Efforts to commercialize speech recognition technology)", Information Processing Society of Japan, Information Processing 51(11), pp. 1410-1416, 2010.
[0018]
However, in the conventional sound collection method, spectral subtraction is performed twice in order to collect only the target area sound, so the sound quality of the extracted target area sound can be degraded.
[0019]
Spectral subtraction is a method that, given an observation signal in which a target sound component and a noise component are mixed, together with a noise component estimated by some appropriate method, estimates the amplitude or power of the target sound for each frequency component from the amplitude or power of the observation signal.
In a real environment, the estimated noise component always contains an estimation error.
Consequently, spectral subtraction attenuates even the target sound component at frequency components where the noise component is over-estimated, distorting the target sound, and cannot fully remove the noise at frequency components where the noise component is under-estimated, so a noise component remains.
Furthermore, since the sum of the amplitude or power of the true target sound and that of the true noise does not necessarily coincide with the amplitude or power of the observation signal at each frequency component, spectral subtraction distorts the target sound and leaves a residual noise component even if the estimated noise component happens to contain no estimation error.
[0020]
The residual noise component is generally regarded as the greatest problem of spectral subtraction because it is perceived as an extremely unpleasant artifact known as musical noise.
Musical noise is noise in which the residual noise component is strongly distorted.
[0021]
In the conventional sound collection method, the emphasized target area sound may be distorted because spectral subtraction, with the problems described above, is applied twice.
[0022]
Therefore, there is a need for a sound collection device, program and method that emphasize only the target area sound with less distortion.
[0023]
A sound collection device according to a first aspect of the present invention comprises: (1) feature quantity calculation means for calculating, for each of a plurality of microphone arrays each consisting of two microphones, an arrival direction feature quantity that changes according to the direction of arrival of sound and has the property of taking a large value for sound coming from the direction of the target area and a small value for sound coming from directions other than the target area direction; (2) feature quantity integration means for obtaining, for each frequency component, an area feature quantity that integrates the arrival direction feature quantities of the microphone arrays; and (3) target area sound extraction means for extracting the target area sound, using the area feature quantity, from a signal based on the captured signals output from the microphone arrays.
[0024]
A sound collection program according to a second aspect of the present invention causes a computer to function as: (1) feature quantity calculation means for calculating, for each of a plurality of microphone arrays each consisting of two microphones, an arrival direction feature quantity that changes according to the direction of arrival of sound and takes a large value for sound coming from the direction of the target area and a small value for sound coming from directions other than the target area direction; (2) feature quantity integration means for obtaining, for each frequency component, an area feature quantity that integrates the arrival direction feature quantities of the microphone arrays; and (3) target area sound extraction means for extracting the target area sound, using the area feature quantity, from a signal based on the captured signals output from the microphone arrays.
[0025]
A sound collection method according to a third aspect of the present invention uses feature quantity calculation means, feature quantity integration means, and target area sound extraction means, wherein: (1) the feature quantity calculation means calculates, for each of a plurality of microphone arrays each consisting of two microphones, an arrival direction feature quantity that changes according to the direction of arrival of sound and takes a large value for sound coming from the direction of the target area and a small value for sound coming from directions other than the target area direction; (2) the feature quantity integration means obtains, for each frequency component, an area feature quantity that integrates the arrival direction feature quantities of the microphone arrays; and (3) the target area sound extraction means extracts the target area sound, using the area feature quantity, from a signal based on the captured signals output from the microphone arrays.
[0026]
According to the present invention, it is possible to provide a sound collection device, program and method that emphasize only the target area sound with less distortion.
[0027]
FIG. 1 is a block diagram showing the functional configuration of a sound collection device according to an embodiment.
FIG. 2 is an explanatory view showing an example of a first arrival direction feature quantity according to the embodiment.
FIG. 3 is an explanatory view showing an example of a second arrival direction feature quantity according to the embodiment.
FIG. 4 is an explanatory view showing an example of an area feature quantity according to the embodiment.
FIG. 5 is an explanatory view showing an example of the target area determination result obtained by the sound collection device according to the embodiment.
FIG. 6 is an explanatory view showing an example of a conventional sound collection method.
FIG. 7 is a block diagram showing the functional configuration of a conventional sound collection device.
[0028]
(A) Main Embodiment Hereinafter, one embodiment of a sound collection device, program and
method according to the present invention will be described in detail with reference to the
drawings.
[0029]
(A-1) Configuration of Embodiment FIG. 1 is a block diagram showing a functional configuration
of the sound collection device 100 of this embodiment.
[0030]
The sound collection device 100 performs target area sound collection processing, collecting the target area sound from a sound source in the target area using the acoustic signals supplied from M microphone arrays MA (MA1 to MAM).
[0031]
Each microphone array MA is disposed, in the space containing the target area, at a position from which it can be directed at the target area.
Each microphone array MA consists of two microphones 1 (11, 12).
In each microphone array MA, acoustic signals based on the sound captured by the two microphones 11 and 12 are supplied to the data input unit 102.
[0032]
Next, the internal configuration of the sound collection device 100 will be described with
reference to FIG.
[0033]
As shown in FIG. 1, the sound collection device 100 according to this embodiment has a data input unit 102, a frequency domain conversion unit 103, a feature quantity calculation unit 104, a feature quantity integration unit 105, and a target area sound extraction unit 106.
Details of each component of the sound collection device 100 will be described later.
[0034]
In the sound collection device 100, the processing that follows conversion into digital signals may be realized by executing a program (including the sound collection program according to the embodiment) on a computer provided with a processor, memory and the like; even in that case, the functional configuration can still be represented as in FIG. 1.
[0035]
(A-2) Operation of Embodiment Next, the operation of the sound collection device 100 of this
embodiment having the configuration as described above (the sound collection method of this
embodiment) will be described.
[0036]
The data input unit 102 converts the acoustic signals captured by the microphone arrays MA1 to MAM from analog signals to digital signals (data), one conversion per microphone 1.
The data input unit 102 supplies the obtained captured signal to the frequency domain
conversion unit 103.
[0037]
In the following, the captured signals from the microphones 11 of the microphone arrays MA1 to MAM are denoted x1,1(t) to xM,1(t), respectively, and the captured signals from the microphones 12 of the microphone arrays MA1 to MAM are denoted x1,2(t) to xM,2(t), respectively.
[0038]
The frequency domain conversion unit 103 converts the captured signals x1,1(t) to xM,1(t) and x1,2(t) to xM,2(t) from the time domain to the frequency domain.
[0039]
In the following, the signals obtained by converting the captured signals x1,1(t) to xM,1(t) and x1,2(t) to xM,2(t) into the frequency domain are denoted X1,1(f) to XM,1(f) and X1,2(f) to XM,2(f), respectively.
[0040]
The frequency domain conversion unit 103 supplies the obtained frequency-domain captured signals X1,1(f) to XM,1(f) and X1,2(f) to XM,2(f) to the feature quantity calculation unit 104 and the target area sound extraction unit 106.
[0041]
For the transformation performed by the frequency domain conversion unit 103, the fast Fourier transform (FFT), wavelet transform, a filter bank or the like can be used, with the FFT being most preferable.
When performing the FFT, various window functions such as a Hamming window may be used.
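A minimal sketch of this conversion for a single frame of one microphone's signal, using a Hamming window and a plain DFT (the FFT computes the same transform, only faster), might look like the following; the function name is illustrative.

```python
import cmath
import math

def frame_to_frequency_domain(frame):
    """Sketch of the frequency domain conversion unit 103 for one
    frame of one microphone's captured signal: apply a Hamming
    window, then a DFT. A plain DFT is used here for clarity; in
    practice the FFT would compute the same transform in
    O(N log N)."""
    N = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)))
                for n, x in enumerate(frame)]
    return [sum(windowed[n] * cmath.exp(-2j * math.pi * k * n / N)
                for n in range(N))
            for k in range(N)]
```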
[0042]
The feature quantity calculation unit 104 calculates the arrival direction feature quantities D1(f) to DM(f), one per microphone array MA, from the captured signals X1,1(f) to XM,1(f) and X1,2(f) to XM,2(f).
The feature quantity calculation unit 104 supplies the obtained arrival direction feature quantities D1(f) to DM(f) to the feature quantity integration unit 105.
[0043]
In the feature quantity calculation unit 104, the arrival direction feature quantities D1(f) to DM(f) are calculated from the captured signals X1,1(f) to XM,1(f) and X1,2(f) to XM,2(f) by the same calculation method for every microphone array MA.
The captured signals Xi,1(f) and Xi,2(f) and the arrival direction feature quantity Di(f) of the i-th microphone array MAi (i is any of 1 to M) are described below.
[0044]
The arrival direction feature quantity Di(f) preferably has the property of taking a large value for the direction of the target area and a small value for directions other than the target area direction.
Any calculation method may be used as long as the arrival direction feature quantity Di(f) has this property.
When the target area is located in the front direction of all the microphone arrays MA, it is preferable to use, for example, equation (5).
[0045]
The captured signals Xi,1(f) and Xi,2(f) are signals in which the target area sound and non-target area sound are mixed, but assuming speech sparsity, each frequency component contains only one of the target area sound and the non-target area sound.
Therefore, if the angle at which a given sound source arrives at a microphone array MA is denoted θ (theta), equation (5) can be expanded as equation (6). In equation (6), c is the speed of sound, and d is the distance between the two microphones 11 and 12 constituting the microphone array. Under the same speech sparsity assumption, a calculation that explicitly determines the direction of arrival, as in equation (7), can also be used to calculate the arrival direction feature quantity Di(f). The expression inside the absolute value in equation (7) is the sine of the arrival direction θ (sin θ).
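Since equations (5) to (7) are not reproduced in this text, the following is only an illustrative sketch of a feature with the required qualitative shape: the inter-microphone phase difference yields an estimate of sin θ as described above, and the mapping from |sin θ| to the feature value (here 1 − 2|sin θ|) is a hypothetical stand-in chosen so that the feature peaks at 1 in the front direction and falls off off-axis.

```python
import cmath
import math

C = 340.0  # speed of sound c [m/s]

def arrival_direction_feature(Xi1, Xi2, f, d):
    """Illustrative arrival direction feature Di(f) for one microphone
    array at one frequency bin. Xi1 and Xi2 are the complex spectra of
    the two microphones at frequency f [Hz]; d [m] is the microphone
    spacing. The phase difference gives sin(theta) of the arrival
    direction, as in the discussion of equations (6) and (7); the
    mapping 1 - 2*|sin(theta)| is an assumed stand-in, since
    equations (5)-(7) are not reproduced in the text."""
    phase_diff = cmath.phase(Xi1 * Xi2.conjugate())
    sin_theta = C * phase_diff / (2 * math.pi * f * d)
    sin_theta = max(-1.0, min(1.0, sin_theta))  # clamp numerical noise
    return 1.0 - 2.0 * abs(sin_theta)
```

For a sound arriving from the front (zero phase difference) this returns the peak value 1, matching the qualitative behavior shown in FIGS. 2 and 3.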
[0046]
Next, a specific example of the arrival direction feature quantity Di (f) will be described using
FIGS. 2 and 3.
[0047]
FIGS. 2A and 3A show, as three-dimensional (vertical position, horizontal position, height) graphs, an example in which the arrival direction feature quantities D1(f) and D2(f) corresponding to the microphone arrays MA1 and MA2, respectively, are obtained using equation (5).
[0048]
In the graphs of FIGS. 2A and 3A, the distance from the microphone array MA1 is plotted as the vertical position (vertical axis of the graph), the distance from the microphone array MA2 as the horizontal position (horizontal axis of the graph), and the values of the arrival direction feature quantities D1(f) and D2(f) as the height (height axis of the graph).
The graphs of FIGS. 2A and 3A show the values of the arrival direction feature quantities D1(f) and D2(f) when the target area sound and non-target area sound arrive from various vertical and horizontal positions, for f = 3 kHz.
[0049]
FIG. 2B shows the values of the arrival direction feature quantity D1(f) at the positions P411 to P416 shown in FIG. 2A.
As shown in FIG. 2B, the values of the arrival direction feature quantity D1(f) at the positions P411 to P416 are -0.13, 1, -0.13, 0.72, 1 and 0.72, respectively.
[0050]
FIG. 2A is a graph of the arrival direction feature quantity D1(f) at f = 3 kHz for the microphone array MA1 installed at a horizontal position of 1.5 m and a vertical position of 0 m. As FIGS. 2A and 2B show, the arrival direction feature quantity D1(f) peaks in the front direction of the microphone array MA1 (at a horizontal position of 1.5 m).
[0051]
FIG. 3B shows the values of the arrival direction feature quantity D2(f) at the positions P421 to P426 shown in FIG. 3A. As shown in FIG. 3B, the values of the arrival direction feature quantity D2(f) at the positions P421 to P426 are 0.72, -0.13, 1, -0.13, 0.72 and 1, respectively.
[0052]
FIG. 3A is a graph of the arrival direction feature quantity D2(f) at f = 3 kHz for the microphone array MA2 installed at a horizontal position of 0 m and a vertical position of 1.5 m. As FIGS. 3A and 3B show, the arrival direction feature quantity D2(f) peaks in the front direction of the microphone array MA2 (at a vertical position of 1.5 m).
[0053]
The feature quantity integration unit 105 integrates the arrival direction feature quantities D1(f) to DM(f) for each frequency component to calculate an area feature quantity E(f). The obtained area feature quantity E(f) is given to the target area sound extraction unit 106.
[0054]
Any calculation method (integration method) may be used for the area feature quantity E(f), as long as E(f) becomes large when all of the arrival direction feature quantities D1(f) to DM(f) are large. For example, as shown in equation (8), the smallest of the arrival direction feature quantities D1(f), ..., DM(f) over all the microphone arrays may be selected for each frequency component as the area feature quantity E(f).
E(f) = min[D1(f), ..., DM(f)] (8)
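Equation (8) amounts to a per-frequency-bin minimum across the arrays, for example:

```python
def area_feature(D):
    """Equation (8): for each frequency bin, the area feature E(f) is
    the minimum of the arrival direction features D1(f), ..., DM(f)
    over all M microphone arrays. D is a list of M lists, one feature
    sequence per microphone array, indexed by frequency bin."""
    return [min(values) for values in zip(*D)]
```

With the two feature surfaces of FIGS. 2 and 3, this keeps a large value only where both arrays point, i.e. at the target area.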
[0055]
Next, a specific example of the area feature quantity E(f) will be described with reference to FIG. 4.
[0056]
FIG. 4A shows, as a three-dimensional (vertical position, horizontal position, height) graph, an example in which the area feature quantity E(f) is obtained using equation (8).
[0057]
FIG. 4A shows an example in which the arrival direction feature quantities D1(f) and D2(f) are calculated by equation (5) with the microphone arrays MA1 and MA2 (each of two microphones 1) arranged as illustrated, and the area feature quantity E(f) is then calculated by applying the obtained D1(f) and D2(f) to equation (8).
That is, FIG. 4A shows the area feature quantity E(f) obtained by integrating, through equation (8), the arrival direction feature quantities D1(f) and D2(f) shown in FIGS. 2A and 3A.
[0058]
In the graph of FIG. 4A, the distance from the microphone array MA1 is plotted as the vertical position (vertical axis of the graph), the distance from the microphone array MA2 as the horizontal position (horizontal axis of the graph), and the value of E(f) as the height (height axis of the graph).
FIG. 4A shows the values of the area feature quantity E(f) at various vertical and horizontal positions for f = 3 kHz.
[0059]
FIG. 4B shows the values of the area feature quantity E(f) at the positions P51 to P59 shown in FIG. 4A. As shown in FIG. 4B, the values of the area feature quantity E(f) at the positions P51 to P59 are -0.13, 0.36, -0.13, 0.36, -0.13, 0.36, 0.72, 0.36 and 1, respectively.
[0060]
As shown in FIGS. 4A and 4B, the area feature quantity E(f) takes large values in the front directions of the microphone arrays MA1 and MA2, in the vicinity of the point where the horizontal and vertical positions are both 1.5 m.
[0061]
The target area sound extraction unit 106 calculates a target area emphasis sound Y(f) based on the captured signals X1,1(f) to XM,1(f) and X1,2(f) to XM,2(f) and the area feature quantity E(f).
The target area sound extraction unit 106 then supplies (outputs) the obtained target area emphasis sound Y(f) to the next stage.
[0062]
In the target area sound extraction unit 106, the choice of captured signal (from X1,1(f) to XM,1(f) and X1,2(f) to XM,2(f)) from which the target area sound is extracted (emphasized) is arbitrary: it may be, for example, the first signal X1,1(f), the captured signal of the microphone closest to the target area, or a signal obtained by applying a delay-and-sum beamformer to the captured signal group of the microphone array MA closest to the target area so that the target area sound is slightly emphasized (referred to as an integrated captured signal). Hereinafter, the selected captured signal or integrated captured signal is referred to as the extraction target signal X'(f).
[0063]
In the target area sound extraction unit 106, extraction (emphasis) of the target area sound is achieved by attenuating, among the frequency components of the extraction target signal X'(f), those other than the target area sound. Since the area feature quantity E(f) takes a larger value the closer the sound source is to the target area, the target area sound extraction unit 106 can extract (emphasize) the target area sound by attenuating the extraction target signal X'(f) according to the magnitude of the area feature quantity E(f). In the target area sound extraction unit 106, for example as in equation (9), a threshold F(f) is predetermined for each frequency component, and if the area feature quantity E(f) is smaller than the threshold F(f), the corresponding frequency component of the extraction target signal X'(f) is attenuated (for example, set to zero), yielding a target area emphasis sound Y(f) in which only the frequency components of the target area sound remain.
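The threshold test just described can be sketched per frequency bin as follows; attenuation to exactly zero follows the "for example, setting as zero" option in the text.

```python
def extract_target_area_sound(X, E, F):
    """Sketch of equation (9)-style extraction: for each frequency
    bin, keep the extraction target signal X'(f) where the area
    feature E(f) reaches the threshold F(f), and attenuate it (here:
    set it to zero) where E(f) < F(f). X, E and F are per-bin
    sequences of equal length."""
    return [x if e >= thr else 0.0 for x, e, thr in zip(X, E, F)]
```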
[0064]
In the target area sound extraction unit 106, the threshold F(f) may be a constant value independent of the frequency component, but in that case the spatial range extracted (emphasized) varies with frequency: the effective target area is wider at lower frequencies and narrower at higher frequencies. Therefore, by setting the threshold F(f) for each frequency component as in, for example, equation (10), the target area sound extraction unit 106 can keep the target area (the range in which frequency components are not attenuated) constant regardless of frequency. In equation (10), φ (phi) is the apparent size (angle) of the target area as seen from each microphone array.
[0065]
FIG. 5 shows, in the same manner as FIGS. 2 to 4, the range determined to be the target area when φ = π/10 and the microphone arrays MA1 and MA2 are arranged as in FIG. 6.
[0066]
In FIG. 5, the blackened region indicates the range determined, based on the threshold, not to be the target area, and the remaining region (not filled in black) indicates the range determined, based on the threshold, to be the target area.
[0067]
As shown in FIG. 5, the range of approximately 1 to 2 m in both the vertical and horizontal directions is determined to be the target area based on the threshold.
[0068]
(A-3) Effects of the Embodiment According to this embodiment, the following effects can be
achieved.
[0069]
In the sound collection device 100 of this embodiment, since no spectral subtraction is performed, only the target area sound can be emphasized, with little distortion, even in a situation where the target area is surrounded by non-target area sound sources.
[0070]
(B) Other Embodiments The present invention is not limited to the above embodiments, and may
include modified embodiments as exemplified below.
[0071]
(B-1) The feature quantity calculation unit 104 can also use equation (11) or equation (12) as the method of calculating the arrival direction feature quantity Di(f).
[0072]
Further, in the feature quantity integration unit 105, equations (13) and (14) can also be applied as the calculation method of the area feature quantity E(f) (the method of integrating the arrival direction feature quantities D1(f) to DM(f)).
[0073]
(B-2) In the target area sound extraction unit 106, the threshold F(f) may be a constant value for frequency components below a certain frequency (for example, 250 Hz).
For example, if the value of F(f) at 250 Hz is used as F(f) below 250 Hz, the range determined to be the target area widens for frequency components below 250 Hz, the low-frequency components become less likely to be distorted, and a target area emphasis sound Y(f) with less distortion can be obtained.
[0074]
Further, the target area sound extraction unit 106 may prepare two thresholds F1(f) and F2(f), calculate an area emphasis gain G(f), and calculate the target area emphasis sound Y(f) by multiplying the extraction target signal X'(f) by the obtained area emphasis gain G(f).
For example, the two thresholds F1(f) and F2(f) may be calculated according to equation (15) with φ1 = π/9 and φ2 = π/11, and the area emphasis gain calculated by equation (16).
As a result, among the frequency components of the extraction target signal X'(f), the attenuation of components originating from sound sources near the boundary between the target area and the non-target area becomes gradual, so a target area emphasis sound Y(f) with less distortion is obtained.
[0075]
100 ... sound collection device, 102 ... data input unit, 103 ... frequency domain conversion unit, 104 ... feature quantity calculation unit, 105 ... feature quantity integration unit, 106 ... target area sound extraction unit.