JP2011182292

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2011182292
An object of the present invention is to provide a sound pickup technology that monitors the
sound of the other party (reception signal) sent from a network and controls the operation of a
microphone array accordingly, so that acoustic echo can be suppressed, the positions of the
talker and the loudspeaker can be chosen arbitrarily, and the amount of calculation can be
reduced. A reception signal is received from a reception end, and a reception section is
determined from the reception signal. When it is determined not to be a reception section, the
sound source position is detected, the sound collection level from the sound source is
estimated, and directivity is formed. When it is determined to be a reception section, the
calculation of the filter coefficient is stopped in the directivity formation filter calculation
unit, and the transmission signal to be output to the transmission end is set to 0.
[Selected figure] Figure 5
Sound collecting device, sound collecting method and sound collecting program
[0001]
The present invention relates to sound collecting techniques for video conferencing, audio
conferencing, telephony, remote lectures and the like, and more particularly to sound collecting
techniques used when a plurality of sound collecting and reproducing apparatuses are connected
in cascade.
[0002]
When a hands-free call is made, the distance between the microphone and the talker is
generally large, so it is necessary to increase the sensitivity of the microphone in order to make
04-05-2019
1
the volume sufficiently audible.
In particular, when there are a plurality of talkers, the sensitivity must be changed for each
talker because the distance to the microphone differs from talker to talker. As a conventional
technique for this purpose, the technique shown in Example 1 of Patent Document 1 is known. In
this prior art, a microphone array consisting of a plurality of microphones is used to form a
directivity pattern whose sensitivity differs for each talker direction. That is, as shown in
FIG. 1, directivity is formed so that the sensitivity is lowered for the talker A located in the
vicinity of the microphone array 11 consisting of M microphones 11m, and raised for the talker B
located away from the microphone array 11. Here, M is a natural number of 2 or more, and
m = 1, 2, ..., M.
[0003]
In the prior art, the direction of the talker is detected automatically to adjust the sensitivity.
However, when the sound pickup device according to the prior art is used in hands-free
communication, the sensitivity is also increased for the sound reproduced by the loudspeaker
(the sound sent from the other party), so acoustic echo is generated, and in the worst case
howling may occur. To address this problem, the method of the third embodiment of Patent
Document 1 directs a directional null (dead angle) toward the loudspeaker that emits the acoustic
echo, so that the sensitivity to the talker can be increased while the acoustic echo is
suppressed. That is, as shown in FIG. 2, when the directions of the talker A and the loudspeaker 2
are sufficiently different as viewed from the microphone array 11, directivity can be formed such
that the sensitivity to the talker is increased while the acoustic echo is suppressed. Further,
the method of Non-Patent Document 1 is known as a directivity forming method that performs
acoustic echo suppression and volume adjustment simultaneously.
[0004]
Patent Document 1: Japanese Patent No. 4104626
[0005]
Non-Patent Document 1: Kobayashi Kazunori, Furuya Kenichi, Haneda Yoichi, Kataoka Akitoshi,
"Directive Automatic Volume Control Microphone Array", Transactions of the Institute of
Electronics, Information and Communication Engineers, The Institute of Electronics, Information
and Communication Engineers, 2004, Vol. J87-A, No. 12, pp. 1491-1501
[0006]
However, the prior art places restrictions on the positions of the talker and the loudspeaker.
That is, as shown in FIG. 3, when the talker A and the loudspeaker 2 are located in the same
direction or in nearby directions, appropriate directivity cannot be formed because the two
cannot be distinguished. Therefore, the prior art cannot be used in a device whose loudspeaker
position can be chosen arbitrarily.
[0007]
In particular, as shown in FIG. 4, when a plurality of sound pickup / reproduction devices 101,
102 and 103 are connected in cascade and the sound transmitted from the far-end talker Z (the
other party) is reproduced by all of the sound pickup / reproduction devices, the directions of
the talker and the loudspeaker tend to be close to each other, and the position of each sound
pickup / reproduction device is chosen arbitrarily by the user. In such a configuration this
problem becomes apparent.
[0008]
In order to solve the above problems, the sound collection technology according to the present
invention generates a transmission signal using sound reception signals that are emitted from
each sound source and collected by microphones of a plurality of channels arranged in an
acoustic space, and outputs it to the transmission end.
A reception signal is received from the reception end, and a reception section is determined
from this reception signal. Filter coefficients of a plurality of channels are calculated so that
the transmission signal level for each sound source becomes a desired level, the sound reception
signals of the microphones are filtered with the filter coefficients of the respective channels,
and the output signals of the filters of the plurality of channels are added and output as the
transmission signal. If it is determined not to be a reception section, the sound source position
is detected and the sound collection level from the sound source is estimated to form
directivity. If it is determined to be a reception section, the calculation of the filter
coefficient is stopped in the directivity formation filter calculation unit, and the
transmission signal output to the transmission end is set to 0.
[0009]
The present invention observes the sound of the other party (reception signal) sent from the
network and controls the operation of the microphone array accordingly. As a result, acoustic
echo can be suppressed, the positions of the talker and the loudspeaker can be chosen
arbitrarily, and the amount of calculation can be reduced.
[0010]
FIG. 1 is a diagram showing an example of the positional relationship of talkers and the
directivity pattern of the microphone array 11.
FIG. 2 is a diagram showing an example of the positional relationship of a talker and a
loudspeaker, and the directivity pattern of the microphone array 11.
FIG. 3 is a diagram showing an example of the directivity pattern when a talker and a
loudspeaker are located in the same direction as seen from the microphone array 11.
FIG. 4(A) is a diagram showing a state in which a plurality of sound pickup and reproduction
apparatuses 101, 102 and 103 are connected in cascade and the sound sent from the far-end
talker Z (the other party) is reproduced from the loudspeakers 2 provided in all the sound pickup
and reproduction apparatuses; FIG. 4(B) is a diagram showing an example of the configuration of
the sound collection and reproduction apparatus 10.
FIG. 5 is a diagram showing an exemplary configuration of the sound collection devices 100 and
200.
FIG. 6 is a diagram showing the processing flow of the sound collection device 100.
FIG. 7 is a diagram showing an example of the configuration of the reception determination unit
110.
FIG. 8 is a diagram showing how processing changes with the speech and reception states in the
sound collection device of the prior art and in that of Example 1.
FIG. 9 is a diagram showing an example of the configuration of the directivity formation filter
calculation unit 120.
FIG. 10 is a diagram showing the processing flow of the directivity formation filter calculation
unit 120.
FIG. 11 is a diagram showing an example of the configuration of the reception determination unit
210.
[0011]
Hereinafter, embodiments of the present invention will be described in detail.
[0012]
<Sound Collection Device 100> The sound collection device 100 according to the first
embodiment will be described with reference to FIGS. 5 and 6.
The sound collection device 100 includes two or more filters 121, 122, ..., 12M, a reception
determination unit 110, a directivity formation filter calculation unit 120, and an addition unit
13.
[0013]
In addition, the sound collection device 100 is provided with, for example, the microphone array
11 including M microphones 111, 112, ..., 11M, or an input unit (not shown) that receives the
output of the microphone array 11. In the present embodiment, the sound collection device 100
includes the input unit (not shown). Any one of the switches SW 131 to SW 133 may also be
provided. In the present embodiment, the SW 133 is provided.
[0014]
The sound collection device 100 generates a transmission signal using the sound reception
signals emitted from the respective sound sources 91, 92, ..., 9K and collected by the
microphones 111, 112, ..., 11M of a plurality of channels arranged in the acoustic space, and
outputs it to the transmission end 4. Although not shown in the drawings, it is assumed that the
output signals of the microphones 111 to 11M are digital sound reception signals obtained by
conversion into digital values by an analog-to-digital converter at a predetermined sampling
frequency.
[0015]
<Reception Determination Unit 110> The reception determination unit 110 receives a reception
signal from a reception end (not shown) connected to the network 1, and determines a reception
section from the reception signal (s110). When it determines that the section is a reception
section, the reception determination unit 110 outputs a control signal so as to stop the
calculation of the filter coefficient in the directivity formation filter calculation unit 120
described later. Also, in order to set the transmission signal output to the transmission end 4
to 0, it outputs a control signal to the SW 133 so that the switch is turned off and the
transmission signal is not output.
[0016]
A method of determining a reception interval will be illustrated using FIG. The reception judging
unit 110 includes, for example, a short time average power calculating unit 110B, a long time
average power calculating unit 110C, a dividing unit 110D, and a judging unit 110G.
[0017]
The short time average power calculation unit 110B calculates and outputs the short time average
power PavSR of the reception signal (for example, an average power over about 0.1 to 1 s). The
long time average power calculation unit 110C calculates and outputs the long time average power
PavLR of the reception signal (for example, an average power over about 1 to 100 s). The division
unit 110D receives the short time average power and the long time average power as input,
obtains the ratio RpR = PavSR / PavLR, and outputs it. The determination unit 110G receives the
ratio RpR obtained by the division unit 110D and a predetermined threshold RthUR, compares them,
determines that the section is a reception section when the ratio RpR exceeds the threshold
RthUR, and outputs the above-described control signal for stopping the calculation of the filter
coefficient and a control signal for turning off the SW 133. In other cases, a control signal for
instructing calculation of the filter coefficient or a control signal for turning on the SW 133
may be output. The threshold RthUR is determined empirically and is a value of 1 or more, for
example, about 5 to 100.
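
The power-ratio test described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the class name `ReceptionDetector`, the use of sliding windows measured in samples, and the handling of an all-silent history are assumptions made for the example.

```python
from collections import deque


class ReceptionDetector:
    """Flags a reception section when PavSR / PavLR exceeds RthUR (hypothetical helper)."""

    def __init__(self, short_len, long_len, threshold):
        self.short = deque(maxlen=short_len)  # squared samples in the short window
        self.long = deque(maxlen=long_len)    # squared samples in the long window
        self.threshold = threshold            # RthUR in the text

    def update(self, sample):
        """Feed one reception-signal sample; return True for a reception section."""
        power = sample * sample
        self.short.append(power)
        self.long.append(power)
        pav_sr = sum(self.short) / len(self.short)  # short time average power
        pav_lr = sum(self.long) / len(self.long)    # long time average power
        if pav_lr == 0.0:
            return False  # assumption: an all-silent history is not a reception section
        return pav_sr / pav_lr > self.threshold     # RpR > RthUR
```

Because the test is a ratio of two averages of the same signal, it responds to a rise in power relative to the recent past; a signal that stays at a constant level drives the ratio back toward 1.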
[0018]
FIG. 8 shows how the update state of the filters and the output of the sound collection device
change over time. In the figure, "on" indicates that the filters are updated and the transmission
signal is output, and "off" indicates that the filters are not updated and the transmission
signal is not output. In the case of the third embodiment of Patent Document 1, learning
continues in the reception state (section A) so as to direct a directional null toward the
loudspeaker. However, when the speech section and the reception section alternate frequently
(section B), the filter is not learned effectively. In the method according to the present
invention, on the other hand, the filter is not learned in the reception section and the output
signal is set to 0, so that even in the case of section B the acoustic echo can be suppressed
without impairing the effect of the volume adjustment.
[0019]
<Directivity formation filter calculation unit 120> When the reception determination unit 110
determines that the section is not a reception section, the directivity formation filter
calculation unit 120 receives the sound reception signals of the microphone array 11, calculates
filter coefficients of a plurality of channels so that the transmission signal levels for the
sound sources 91, 92, ..., 9K each become desired levels (s120), and outputs the filter
coefficients to the filters 121, 122, ..., 12M. For example, the sound source position is
detected by the method described in Patent Document 1, the sound collection level from the sound
source is estimated to form directivity, and the filter coefficient is calculated. Details will
be described later. When the reception determination unit 110 determines that the section is a
reception section, the directivity formation filter calculation unit 120 stops the calculation
of the filter coefficient in accordance with the control signal received from the reception
determination unit 110.
[0020]
<Filters 121, 122, ..., 12M> The filters 121, 122, ..., 12M receive the filter coefficients from the
directivity formation filter calculation unit 120 and set them. Furthermore, the sound reception
signals of the microphones 111, 112,..., 11M are respectively filtered with filter coefficients of a
plurality of channels (s125).
[0021]
<Adding unit 13> The adding unit 13 receives output signals of filters of a plurality of channels,
adds them all, and outputs the obtained value as a transmission signal to the transmission end 4
(s130).
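
The filtering (s125), addition (s130) and output gating described above can be sketched as follows. This is a minimal time-domain illustration; the function names, the FIR filter form, and the `reception_section` flag standing in for the SW 133 are assumptions made for the example.

```python
def fir_filter(signal, coeffs):
    """Causal FIR filter: y[n] = sum over k of coeffs[k] * signal[n - k]."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(coeffs):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


def filter_and_sum(mic_signals, filter_coeffs, reception_section=False):
    """Filter each channel with its own coefficients, add the channels, gate the output."""
    if len(mic_signals) != len(filter_coeffs):
        raise ValueError("one coefficient set per microphone channel is required")
    length = len(mic_signals[0])
    if reception_section:
        return [0.0] * length  # switch off: the transmission signal is set to 0
    filtered = [fir_filter(x, h) for x, h in zip(mic_signals, filter_coeffs)]
    return [sum(ch[n] for ch in filtered) for n in range(length)]
```

For example, `filter_and_sum([[1.0, 2.0, 3.0], [1.0, 1.0, 1.0]], [[0.5], [0.0, 1.0]])` scales the first channel by 0.5, delays the second channel by one sample, and adds the two.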
[0022]
<Outline of Processing of Directivity Formation Filter Calculation Unit 120> The directivity
formation filter calculation unit 120 detects the sound source position by, for example, the same
method as in Patent Document 1, estimates the sound collection level from the sound source, and
forms directivity.
The outline of the directivity formation filter calculation unit 120 will be described with
reference to FIGS. 9 and 10. The directivity formation filter calculation unit 120 includes a
state determination unit 14, a sound source position detection unit 15, a frequency domain
conversion unit 16, a covariance matrix calculation unit 17, a covariance matrix storage unit 18,
a sound collection level estimation unit 19, and a filter coefficient calculation unit 21.
[0023]
As shown in FIG. 10, first, in step S1, the number of sound sources K is initialized to K = 0.
Next, in step S2, speech detection is performed periodically by the state determination unit 14,
and when speech is detected, the sound source position is detected by the sound source position
detection unit 15 in step S3. In step S4, it is determined whether the detected sound source
position matches any of the previously detected sound source positions. If there is a match, the
covariance matrix calculation unit 17 newly calculates the covariance matrix RXX(ω)
corresponding to that sound source position in step S5, and the covariance matrix in the
corresponding area of the covariance matrix storage unit 18 is updated in step S6.
[0024]
If the detected position does not match any previously detected sound source position in step
S4, K is increased by 1 in step S7, the covariance matrix RXX(ω) corresponding to the sound
source position is calculated by the covariance matrix calculation unit 17 in step S8, and the
covariance matrix is stored in a new area of the covariance matrix storage unit 18 in step S9.
[0025]
Next, in step S10, the sound collection level is estimated by the sound collection level
estimation unit 19 from the stored covariance matrices. In step S11, the filter coefficient
calculation unit 21 calculates the filter coefficients using the estimated sound collection
levels and the covariance matrices, and in step S12, the filter coefficients set in the filters
121 to 12M are updated.
Note that other conventional techniques may be used as a method of forming directivity.
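
The bookkeeping of steps S4 to S9 can be sketched as follows for a single frequency bin. This is a minimal illustration under stated assumptions: the class name `CovarianceStore`, the representation of a sound source position as a single angle, and the `tolerance` used to decide a match are hypothetical; the actual matching criterion is that of Patent Document 1 and is not reproduced here.

```python
def covariance(x):
    """M x M matrix with entries x[i] * conj(x[j]) for one frequency bin."""
    return [[xi * xj.conjugate() for xj in x] for xi in x]


class CovarianceStore:
    """Keeps one covariance matrix per detected sound source (hypothetical helper)."""

    def __init__(self, tolerance):
        self.positions = []         # detected positions; K equals len(self.positions)
        self.matrices = []          # one stored covariance matrix per sound source
        self.tolerance = tolerance  # assumed matching distance between positions

    def observe(self, position, x):
        """Update a matching source or register a new one; return its index."""
        r = covariance(x)
        for k, known in enumerate(self.positions):
            if abs(known - position) <= self.tolerance:  # S4: position matches
                self.matrices[k] = r                     # S5, S6: recalculate and overwrite
                return k
        self.positions.append(position)                  # S7: K is increased by 1
        self.matrices.append(r)                          # S8, S9: calculate, store in new area
        return len(self.positions) - 1
```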
[0026]
Here, the details of step S8 will be described. The processing of the other steps is described
in detail in Patent Document 1. The covariance matrix calculation unit 17 obtains the covariances
of the sound reception signals of the microphones to generate a covariance matrix. For each sound
source 9k (where k = 1, 2, ..., K, and K is the number of sound sources), assuming that the
frequency domain conversion signals of the sound reception signals of the microphones obtained
by the frequency domain conversion unit 16 are X1(ω) to XM(ω), the M × M covariance matrix
RXX(ω) of these signals is generally expressed by the following equation.
[0027]
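The equation is not reproduced in this text. A minimal reconstruction, assuming only the definitions in the surrounding paragraphs (frequency domain signals X1(ω) to XM(ω), an M × M result, and * as the complex conjugate), would be the outer product below; any time averaging applied in practice is omitted here.

```latex
R_{XX}(\omega) =
\begin{pmatrix} X_1(\omega) \\ \vdots \\ X_M(\omega) \end{pmatrix}
\begin{pmatrix} X_1(\omega)^{*} & \cdots & X_M(\omega)^{*} \end{pmatrix}
```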
[0028]
Here, * represents the complex conjugate.
In the method according to the third embodiment of Patent Document 1, the covariance matrix
RXX(ω) is obtained in order to form directivity that directs a null toward the loudspeaker while
raising the sensitivity to the talker. To perform the time averaging this requires, signal data
of both a speech section and a reception section are needed for a predetermined time or more.
However, as shown in section B of FIG. 8, the talker generally changes frequently during a
conversation, so the talker position often changes before enough signal data has been obtained
to form the desired directivity. As a result, a sufficient acoustic echo suppression effect may
not be obtained even though a large amount of calculation is required to obtain the covariance
matrix.
[0029]
On the other hand, in the present embodiment, when it is determined that the section is a
reception section, the directivity formation filter calculation unit 120 receives the control
signal instructing it to stop the calculation of the filter coefficient and stops the processing
of each of its units. The calculation of the covariance matrix RXX(ω) is therefore also stopped.
When it is subsequently determined not to be a reception section, the filter coefficients are
calculated on the basis of the information (covariance matrix etc.) that was being used for the
calculation of the filter coefficients at the time of the stop. It is therefore possible not only
to reduce the amount of calculation but also to follow quickly the utterance that comes after the
reception section.
<Effect> With this configuration, the present invention observes the sound of the other party
(reception signal) sent from the network and controls the operation of the microphone array, so
that acoustic echo can be suppressed. In other words, the directivity of the microphone array is
learned only when the observed reception signal is sufficiently small, and the sound collection
process is performed using the data obtained in this way. When the reception signal is large, on
the other hand, learning of the directivity of the microphone array is stopped, and the output of
the microphone array is reduced. As a result, even when the talker and the loudspeaker are close
to each other, as discussed for the prior art, the volume can be adjusted by forming directivity
without losing acoustic echo suppression performance. Furthermore, the positions of the talker
and the loudspeaker can be chosen arbitrarily, and the amount of calculation can be reduced. In
addition, even when the talkers change frequently, the acoustic echo suppression effect can be
obtained without requiring a large amount of calculation.
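
The stop-and-resume behavior described above can be sketched as follows. This is a minimal illustration, assuming that the stored information is a covariance matrix averaged over frames by exponential smoothing; the class name `FrozenAverage` and the `forgetting` factor are hypothetical and are not taken from the embodiment.

```python
class FrozenAverage:
    """Time-averaged covariance that is left untouched during reception sections."""

    def __init__(self, num_channels, forgetting=0.5):
        self.forgetting = forgetting  # assumed smoothing factor between 0 and 1
        self.r = [[0j] * num_channels for _ in range(num_channels)]
        self.frames_used = 0          # number of frames that contributed to self.r

    def update(self, x, reception_section):
        """x: one complex value per channel. Returns True if the average was updated."""
        if reception_section:
            return False  # calculation stopped; the state at the time of stop is kept
        f = self.forgetting
        for i, xi in enumerate(x):
            for j, xj in enumerate(x):
                self.r[i][j] = f * self.r[i][j] + (1 - f) * xi * xj.conjugate()
        self.frames_used += 1
        return True
```

When updates resume after a reception section, the average continues from the retained matrix rather than starting from zero, which is the property the paragraph above relies on to follow the next utterance quickly.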
[0030]
The sound collection device 100 described above can also be implemented by a computer. In this
case, a program for causing the computer to function as the target device, or a program for
causing the computer to execute each step of the processing procedure, may be loaded into the
computer from a recording medium such as a CD-ROM, a magnetic disk, or a semiconductor storage
device, or downloaded into the computer via a communication line, and then executed.
[0031]
[Modification] The microphone array 11 may be included in the sound collection device 100.
Instead of the SW 133, the SW 131 or the SW 132 may be provided. Each SW is turned on or off by
the control signal of the reception determination unit 110. When the SW 132 is used, the addition
process performed by the adding unit 13 can be omitted, and when the SW 131 is used, the
filtering process performed by the filters can be omitted as well.
[0032]
The reception judging unit 110 may judge the reception section by another method. For example,
the reception judging unit 110 includes a short time average power calculation unit 110B and a
determination unit 110G, and the short time average power calculation unit 110B calculates the
short time average power PavSR of the reception signal (for example, an average power over about
0.1 to 1 s). The determination unit 110G compares the short time average power PavSR calculated
by the short time average power calculation unit 110B with a predetermined threshold RthUR, and
when PavSR exceeds the threshold, determines that the section is a reception section and outputs
the control signal for stopping the update of the filter coefficient and the control signal for
stopping the output of the transmission signal. When the reception section can be estimated
adequately from the short-time average power alone, this configuration obtains the same effect as
the first embodiment with a sound collection device that requires a smaller amount of
calculation.
[0033]
Alternatively, none of the SW 131 to SW 133 may be provided, and 0 may instead be set as the
filter coefficients of the filters 121, 122, ..., 12M. In this case, filter coefficients may be
transmitted instead of the control signal. For example, when the reception judging unit 110
judges that the section is a reception section, it outputs the control signal to the directivity
formation filter calculation unit 120, and the directivity formation filter calculation unit 120
stops calculating the filter coefficient and stores the information used for calculating the
filter coefficient at the time of the stop (the covariance matrix RXX(ω) and the like) in a
storage unit (not shown). The reception judging unit 110 then outputs a control signal so that
the filter coefficients of the filters 121, 122, ..., 12M all become 0, or sends all-zero filter
coefficients to the filters 121, 122, ..., 12M. The filters 121, 122, ..., 12M perform filtering
using these filter coefficients and the sound reception signals of the microphones 111, 112, ...,
11M. When the reception judging unit 110 later judges that the section is not a reception
section, the directivity formation filter calculation unit 120 retrieves from the storage unit
the information used for the calculation of the filter coefficient at the time of the stop and
updates the filter coefficients. Such a configuration also solves the problem that the talker
position changes before sufficient signal data can be obtained, so that a sufficient acoustic
echo suppression effect is not obtained even though a large amount of calculation is spent on
obtaining the covariance matrix.
[0034]
<Sound Collection Device 200> The sound collection device 200 according to the second
embodiment will be described with reference to FIGS. Only differences from the first embodiment
will be described. The sound collection device 200 is different from the sound collection device
100 according to the first embodiment in the configuration of the reception determination unit
210.
[0035]
<Reception Determination Unit 210> The reception determination unit 210 receives the sound
reception signal of at least one of the microphones 111, 112, ..., 11M (for example, the sound
reception signal of the microphone 111, indicated by a broken line in FIG. 5) and a reception
signal from a reception end (not shown) connected to the network 1.
[0036]
When it is determined that the section is a reception section and that the sound reception
signal and the reception signal are similar to each other, the reception determination unit 210
outputs a control signal so that the directivity formation filter calculation unit 120 stops the
calculation of the filter coefficient. Also, in order to set the transmission signal output to
the transmission end 4 to 0, it outputs a control signal to the SW 133 so that the switch is
turned off and the transmission signal is not output.
[0037]
A method of determining a reception interval is illustrated using FIG. The reception judging unit
210 includes, for example, a short time average power calculating unit 110B, a long time average
power calculating unit 110C, a dividing unit 110D, a similarity degree judging unit 210H, and a
judging unit 210G.
[0038]
The processes in the short time average power calculation unit 110B, the long time average
power calculation unit 110C, and the division unit 110D are the same as in the first embodiment.
[0039]
The similarity determination unit 210H receives as input the sound reception signal collected by
the microphone 111 and the reception signal received from the reception end, and determines
whether the two signals are similar.
For example, the degree of similarity is determined using the cross correlation of the sound
reception signal and the reception signal, the correlation of their amplitude spectra, or the
like. The similarity is normalized so as to take values between 0 and 1, where 0 indicates no
similarity and 1 indicates a perfect match. The similarity is then compared with a predetermined
threshold RthUS, and when the similarity exceeds the threshold RthUS, a similarity determination
result indicating similarity is output. The threshold RthUS is a value between 0 and 1 that is
determined empirically, and takes, for example, a value of about 0.5.
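
One way to obtain a similarity between 0 and 1 is a normalized correlation, sketched below. This is a minimal illustration, assuming a zero-lag time-domain correlation; the function names are hypothetical, and a practical detector would also need to account for the acoustic delay between the loudspeaker and the microphone, which this sketch ignores.

```python
import math


def similarity(mic_signal, reception_signal):
    """Absolute normalized correlation at zero lag, in the range 0 to 1."""
    n = min(len(mic_signal), len(reception_signal))
    a, b = mic_signal[:n], reception_signal[:n]
    dot = sum(x * y for x, y in zip(a, b))
    energy = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if energy == 0.0:
        return 0.0  # at least one signal is silent: treated as not similar
    return min(1.0, abs(dot) / energy)


def is_similar(mic_signal, reception_signal, threshold=0.5):
    """Compare the similarity with the threshold RthUS (0.5 is the example value)."""
    return similarity(mic_signal, reception_signal) > threshold
```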
[0040]
The determination unit 210G receives the ratio RpR obtained by the division unit 110D, a
predetermined threshold RthUR, and the similarity determination result. It compares the ratio
RpR with the threshold RthUR, and when the ratio RpR exceeds the threshold RthUR and a similarity
determination result indicating similarity has been received, it outputs the control signal for
stopping the calculation of the filter coefficient and the control signal for turning off the SW
133. In other cases, a control signal for instructing calculation of the filter coefficient or a
control signal for turning on the SW 133 may be output.
<Effects> With such a configuration, even when the loudspeaker that reproduces the reception
signal is in a direction close to the talker, the volume adjustment can be performed in the
presence of the reception signal as long as the acoustic echo has little influence, for example
when the loudspeaker is positioned far from the microphones or when there is a large shield
between the loudspeaker and the microphones. Such a configuration is effective because it is not
necessary to stop the volume adjustment by directivity formation when there is almost no
influence of the acoustic echo.
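
The decision rule of the determination unit 210G can be sketched as follows. This is a minimal illustration; the function name and the dictionary keys used for the two control signals are hypothetical.

```python
def control_signals(rp_r, similar, rth_ur):
    """Stop learning and mute the output only when both conditions hold."""
    stop = rp_r > rth_ur and similar
    return {
        "calculate_filter": not stop,  # control signal to the filter calculation unit 120
        "switch_on": not stop,         # control signal to the SW 133
    }
```

Compared with the first embodiment, the extra `similar` condition keeps learning and output enabled when a reception signal is present but is not actually picked up by the microphone.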
[0041]
The present invention can be used for sound collection technology and the like when using a
plurality of sound collection and reproduction devices connected in cascade.
[0042]
100, 200: sound collection device
12: filter
13: addition unit
110: reception judgment unit
120: directivity formation filter calculation unit