Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2008177802
An audio conference system capable of realizing a realistic conference according to the positions of the conferees present at each other's audio conference apparatus. An audio conference device 1A in a conference room 100A generates voice communication data composed of an output sound collection beam signal MBS and speaker direction information Pm in response to the speech of a conferee 201A, and transmits it to the audio conference device 1B in a conference room 100B. The audio conference device 1B acquires a sound emission audio signal and speaker direction information Py from the received voice communication data, and sets a virtual sound source 900 from the speaker direction information Py. Based on the virtual sound source 900, the audio conference device 1B sets a delay adjustment amount D and a gain adjustment amount G for each of the SP audio signals SPD1 to SPD4 to be given to the speakers SP1 to SP4. The audio conference device 1B performs digital-to-analog conversion on the SP audio signals SPD1 to SPD4 adjusted by these adjustment amounts, amplifies them, and emits the sound from the speakers SP. [Selected figure] Figure 7
Audio conference system and audio conference apparatus
[0001]
The present invention relates to an audio conference system for connecting two audio conference
devices arranged at mutually separated positions to perform an audio conference, and an audio
conference device used for the audio conference system.
[0002]
Conventionally, when an audio conference is held between two mutually distant sites, an audio conference apparatus such as that of Patent Document 1 or Patent Document 2 is placed at each site, and the conferees sit around the apparatus to hold the meeting.
04-05-2019
1
[0003]
In the audio conference apparatuses of Patent Document 1 and Patent Document 2, a single speaker is disposed at the center of the housing so that sound is emitted outward from the top surface, and a plurality of microphones are disposed at the corner portions of the side surfaces so as to collect sound from different directions.
[0004]
In such a conventional audio conference apparatus, each microphone picks up speech arriving from a different azimuth, and an audio signal is transmitted to the audio conference apparatus on the other party's side.
Conversely, when the audio conference apparatus receives the audio signal collected by the other party's apparatus, it emits that signal from the speaker as it is.
JP-A-8-298696, JP-A-8-204803
[0005]
In the above-described conventional audio conference system, the audio conference apparatus on the sound collection side (transmission side) picks up sound from all directions around the housing with all of its microphones and transmits it as a single audio signal for sound emission.
The audio conference apparatus on the sound emission side (reception side) then gives the received sound emission audio signal to its speaker and emits the sound uniformly in all directions.
[0006]
In such a configuration, even if a plurality of conferees are present at the other party's audio conference device, the sound emission audio signal received from the other party is emitted uniformly in all directions, so the positional relationship of the conferees present at the other party's device cannot be perceived. For this reason, the conference cannot be given a sense of realism.
[0007]
Therefore, an object of the present invention is to provide an audio conference system capable of realizing a conference full of a sense of realism according to the positions of the conferees present at each other's audio conference apparatus, and an audio conference apparatus used for that system.
[0008]
The audio conference system of the present invention comprises a plurality of audio conference devices, each provided with a disk-shaped housing, a plurality of unidirectional microphones arranged circumferentially in the housing, and a plurality of speakers arranged circumferentially in the housing, and connection means for connecting at least two of the plurality of audio conference devices.
The audio conference device on the transmission side forms sound collection beam signals of different sound collection directions from the collected signals of the plurality of unidirectional microphones, selects the sound collection beam signal based on a conferee's speech, detects the sound collection direction, among all outward directions of the disk-shaped housing, corresponding to the selected sound collection beam signal, and generates speaker direction information in response to the selected sound collection beam signal. It is characterized in that it transmits the sound emission audio signal based on the selected beam signal in association with the corresponding speaker direction information. The audio conference device on the reception side receives the sound emission audio signal and the corresponding speaker direction information from the transmission-side device, sets a virtual sound source in the direction indicated by the speaker direction information, and controls the sounds emitted from the plurality of speakers so that they are heard as if emitted from the virtual sound source.
[0009]
In this configuration, the audio conference device on the transmission side picks up a conferee's speech and acquires the azimuth of that conferee with respect to the housing (the sound collection azimuth). It then transmits the sound emission audio signal based on the collected sound in association with the speaker direction information. The audio conference device on the reception side sets the direction of the virtual sound source with respect to its housing from the received speaker direction information, and controls the plurality of speakers so that the sound based on the sound emission audio signal is heard as arriving from that direction. As a result, a conferee present at the reception-side audio conference apparatus hears the voice as if it were emitted from the same direction as the talker's direction at the transmission-side apparatus.
[0010]
Further, the audio conference device on the reception side of the present invention is characterized in that amplitude control and delay control of the sounds emitted from the speakers are performed based on the set virtual sound source position and the positional relationship among the speakers.
[0011]
In this configuration, the sound emitted from each speaker is subjected to amplitude control and delay control according to the positional relationship between that speaker and the virtual sound source position.
More specifically, the amplitude of the emitted sound is attenuated according to the distance from the virtual sound source position, and the emission timing is delayed according to that distance. As a result, a conferee present at the sound emission side (reception side) apparatus hears the sound as if it were emitted from the virtual sound source position, regardless of the conferee's position around the audio conference apparatus.
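The distance-dependent attenuation and delay described here can be modeled very simply: gain falls with the speaker's distance from the virtual sound source, and delay is that distance divided by the speed of sound. The following is a minimal sketch; the 1/d attenuation law, the speed-of-sound constant, and the speaker coordinates are illustrative assumptions, not values from the patent.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def gain_and_delay(speaker_pos, virtual_source_pos, ref_dist=1.0):
    """Return (gain, delay_seconds) for one speaker relative to a virtual source.

    Gain follows a simple 1/d attenuation law normalized at ref_dist;
    delay is the acoustic travel time over the distance.
    """
    dx = speaker_pos[0] - virtual_source_pos[0]
    dy = speaker_pos[1] - virtual_source_pos[1]
    d = math.hypot(dx, dy)
    gain = ref_dist / max(d, ref_dist)  # clamp so gain never exceeds 1
    delay = d / SPEED_OF_SOUND
    return gain, delay

# Example: virtual source 2 m out along the SP1 direction (theta = 0)
virtual_source = (2.0, 0.0)
speakers = {"SP1": (0.15, 0.0), "SP2": (0.0, 0.15),
            "SP3": (-0.15, 0.0), "SP4": (0.0, -0.15)}
for name, pos in speakers.items():
    g, dly = gain_and_delay(pos, virtual_source)
    print(f"{name}: gain={g:.3f} delay={dly * 1000:.2f} ms")
```

With this model, SP1 (nearest the virtual source) receives the largest gain and smallest delay, and SP3 (farthest) the smallest gain and largest delay, matching the behavior the paragraph describes.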
[0012]
In addition, the audio conference device on the transmission side of the present invention calculates, together with the talker azimuth, the distance between the utterance position and the side of the housing closest to that position, and generates the speaker direction information from the azimuth and the distance. The audio conference device on the reception side is characterized in that it sets the virtual sound source based on the azimuth and the distance obtained from the received speaker direction information.
[0013]
In this configuration, not only the azimuth of the talker but also the distance from the housing to the talker is calculated in setting the virtual sound source position. As a result, the utterance position of a conferee at the sound collection side device can be reproduced more accurately by the sound emission side device.
[0014]
An audio conference apparatus according to the present invention is an audio conference apparatus used for the above-described audio conference system. The plurality of speakers of this apparatus are installed on the lower surface side of the housing with their sound emission directions facing outward from the housing, and the plurality of unidirectional microphones are installed on the upper surface side of the housing with their sound collection directions facing the center of the housing as viewed in plan.
[0015]
In this configuration, the sound collection directivity of each microphone and the sound emission directivity of the speaker closest to it point in opposite directions, so the microphones hardly pick up the wraparound sound from the speakers. It therefore becomes possible to detect the target talker azimuth with high accuracy in all directions.
[0016]
According to the present invention, since sound is emitted from a position corresponding to that of the conferee at the partner apparatus, the position of each conferee in the partner's conference room, which cannot be seen, can be known, and by listening to the sound emitted from that position a voice conference full of a sense of realism can be realized.
[0017]
A voice conference system according to an embodiment of the present invention will be
described with reference to the drawings.
FIG. 1 is a block diagram of the audio conference system according to the present embodiment.
FIG. 2 is an outline view of the audio conference apparatus used for the audio conference system
according to the present embodiment, in which (A) is a plan view and (B) is a side view. In FIG. 2,
θ indicates an angle that increases in the counterclockwise direction, with the direction of the
microphone MC1 and the speaker SP1 being 0 °, with the center of the audio conference device
1 viewed in plan as the rotation center. FIG. 3 is a functional block diagram of the audio
conference apparatus shown in FIG. As shown in FIG. 1, the audio conference system includes
audio conference devices 1A and 1B respectively disposed in the separated conference rooms
100A and 100B, and a network 500 connecting these audio conference devices 1A and 1B.
Conference tables 101A and 101B are respectively installed substantially at the centers of the
conference rooms 100A and 100B, and audio conference devices 1A and 1B are disposed on the
respective conference tables 101A and 101B. These audio conference apparatuses 1A and 1B are
provided with input / output I / F, and are connected to the network through these input / output
I / F. For example, in the conference room 100A, the conferees 201A and 203A are seated facing each other across the audio conference device 1A, the conferee 201A on the speaker SP1 side of the device 1A and the conferee 203A on the speaker SP3 side. Likewise, in the conference room 100B, the conferees 202B and 204B are seated facing each other across the audio conference device 1B, the conferee 202B on the speaker SP2 side of the device 1B and the conferee 204B on the speaker SP4 side.
[0018]
The audio conference devices 1A and 1B have identical specifications, and each includes a disk-shaped housing 11. Specifically, the housing 11 is circular in plan view, the areas of the top and bottom surfaces are smaller than the area of the midsection in the vertical direction, and in side view the housing widens from the top surface down to a point partway along its height and narrows again from that point to the bottom surface; that is, it has an inclined surface above and below that point. A recess 12 of a predetermined depth, smaller in area than the top surface, is formed in the top surface of the housing 11 so that the center of the recess 12 in plan view coincides with the center of the top surface.
[0019]
Sixteen microphones MC1 to MC16 are installed inside the top surface side of the housing 11 along the side surface of the recess 12, arranged at an equal angular pitch (here approximately 22.5° apart) about the center of the audio conference device 1 in plan view. The microphones are arranged so that the microphone MC1 lies in the θ = 0° direction and θ increases by 22.5° for each subsequent microphone. For example, the microphone MC5 is disposed in the θ = 90° direction, the microphone MC9 in the θ = 180° direction, and the microphone MC13 in the θ = 270° direction. Each of the microphones MC1 to MC16 is unidirectional, and each is arranged so as to have strong directivity toward the center in plan view. For example, the microphone MC1 has the θ = 180° direction as its directivity center, the microphone MC5 the θ = 270° direction, the microphone MC9 the θ = 0° (360°) direction, and the microphone MC13 the θ = 90° direction. The number of microphones is not limited to sixteen and may be set appropriately according to the specification.
[0020]
The four speakers SP1 to SP4 are installed so that their sound emission surfaces coincide with the inclined surface on the lower side of the housing 11, arranged at an equal angular pitch (here approximately 90° apart) about the center of the audio conference device 1 in plan view. The speaker SP1 is disposed in the same θ = 0° direction as the microphone MC1, the speaker SP2 in the same θ = 90° direction as the microphone MC5, the speaker SP3 in the same θ = 180° direction as the microphone MC9, and the speaker SP4 in the same θ = 270° direction as the microphone MC13. Each of the speakers SP1 to SP4 has strong directivity in the front direction of its sound emission surface: the speaker SP1 emits sound centered on the θ = 0° direction, the speaker SP2 on the θ = 90° direction, the speaker SP3 on the θ = 180° direction, and the speaker SP4 on the θ = 270° direction.
[0021]
Thus, the speakers SP1 to SP4 are disposed on the lower side of the housing 11, the microphones MC1 to MC16 on the upper side, and the sound collection direction of the microphones MC1 to MC16 points toward the center of the housing 11. This makes it difficult for each of the microphones MC1 to MC16 to pick up the wraparound sound from the speakers SP1 to SP4. For this reason, the talker position detection described later is less influenced by wraparound speech and can be performed with higher accuracy.
[0022]
The operation unit 29 is installed on the inclined surface on the upper side of the housing 11 and includes various operation buttons and a liquid crystal display panel (not shown). The input / output I / F is installed on the inclined surface on the lower side of the housing 11, at a position where the speakers SP1 to SP4 are not installed, and is equipped with a network connection terminal, a digital audio terminal, an analog audio terminal, and the like (not shown). A network cable is connected to the network connection terminal to connect to the aforementioned network 500.
[0023]
The audio conference device 1 has the functional configuration shown in FIG. 3 in addition to the structural configuration described above. The control unit 20 performs overall control of the settings, sound collection, and sound emission of the audio conference device 1, and applies each operation instruction input from the operation unit 29 to the relevant portion of the device.
[0024]
(1) Sound Emission The communication control unit 21 acquires audio data from the voice communication data received from the other party's audio conference device through the input / output I / F, and outputs it to the channels CH1 to CH3 as the sound emission audio signals S1 to S3. At this time, the communication control unit 21 acquires the partner apparatus ID from the voice communication data and assigns a channel CH to each partner apparatus ID. For example, when there is one connected partner device, the audio data from that device is assigned to the channel CH1 as the sound emission audio signal S1. When there are two connected partner devices, the audio data from the two devices are assigned individually to the channels CH1 and CH2 as the sound emission audio signals S1 and S2. Similarly, when there are three connected partner devices, the audio data from the three devices are assigned individually to the channels CH1, CH2, and CH3 as the sound emission audio signals S1, S2, and S3. The channels CH1 to CH3 are connected to the sound emission control unit 22 via the echo cancellation unit 28. The communication control unit 21 also extracts the speaker direction information Py (Pm) of the other party's audio conference apparatus associated with each piece of audio data, and supplies it to the sound emission control unit 22 together with the channel information.
[0025]
The sound emission control unit 22 generates the speaker output signals SPD1 to SPD4 to be given to the speakers SP1 to SP4 based on the sound emission audio signals S1 to S3 and the speaker direction information Py.
[0026]
FIG. 4 is a block diagram showing the main configuration of the sound emission control unit 22 of the present embodiment.
FIG. 5A is a view showing the arrangement of the virtual sound sources set by the sound emission specification control unit 220. FIG. 5B is a diagram showing the contents of the sound emission specification table 2281.
[0027]
As shown in FIG. 4, the sound emission control unit 22 includes individual sound emission signal generation units 221 to 223 corresponding to the sound emission audio signals S1 to S3, a sound emission specification control unit 220, signal synthesis units 224 to 227 that generate the speaker output signals SPD1 to SPD4, and a memory 228 storing a sound emission specification table 2281.
[0028]
The sound emission specification control unit 220 sets a virtual sound source based on the speaker direction information Py from the communication control unit 21.
As shown in FIG. 5A, the virtual sound sources are set at 45° intervals at a predetermined distance outward from the housing 11 in the horizontal direction, about the center of the housing 11. More specifically, the virtual sound source 901 is set in the θ = 0° direction, the direction in which the speaker SP1 and the microphone MC1 are arranged as seen from the center of the housing 11, and the virtual sound sources 902 to 908 are set in order at 45° intervals counterclockwise from the virtual sound source 901. The number of virtual sound sources is not limited to eight and may be set appropriately according to the device specification.
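The layout of the eight virtual sound sources can be written down as coordinates. A short sketch follows; the radius value is a placeholder, since the patent only specifies "a predetermined distance":

```python
import math

def virtual_source_positions(radius=2.0, count=8):
    """Virtual sound sources 901..908 placed at equal angular steps,
    counterclockwise from theta = 0 (the SP1/MC1 direction).
    Returns a dict mapping source number to (x, y) coordinates."""
    step = 360.0 / count
    return {901 + i: (radius * math.cos(math.radians(i * step)),
                      radius * math.sin(math.radians(i * step)))
            for i in range(count)}

positions = virtual_source_positions()
# Virtual source 901 lies on the theta = 0 axis; 903 lies on theta = 90 degrees.
```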
[0029]
The sound emission specification control unit 220 reads, from the sound emission specification table 2281 stored in the memory 228, the delay adjustment amounts D and gain adjustment amounts G for realizing the set virtual sound source. For example, if the speaker direction information Py indicates the θ = 0° direction, the sound emission specification control unit 220 sets the virtual sound source 901 and reads out the delay adjustment amount D11 and gain adjustment amount G11 for SP1, the delay adjustment amount D21 and gain adjustment amount G21 for SP2, the delay adjustment amount D31 and gain adjustment amount G31 for SP3, and the delay adjustment amount D41 and gain adjustment amount G41 for SP4. The delay adjustment amounts D and gain adjustment amounts G are preset according to the distance between the virtual sound source concerned and each of the speakers SP1 to SP4. These adjustment amounts may also be set by measuring the sound emission and collection environment on the spot after the audio conference device is installed.
[0030]
When the sound emission specification control unit 220 has read out the delay adjustment amount D and gain adjustment amount G for each of the speakers SP1 to SP4, it outputs them, based on the channel information given together with the speaker direction information Py, to whichever of the individual sound emission signal generation units 221 to 223 corresponds to that channel. For example, if the virtual sound source 901 is set and the channel information indicates CH1, the sound emission specification control unit 220 gives the delay adjustment amount D11 and gain adjustment amount G11 for SP1, the delay adjustment amount D21 and gain adjustment amount G21 for SP2, the delay adjustment amount D31 and gain adjustment amount G31 for SP3, and the delay adjustment amount D41 and gain adjustment amount G41 for SP4 to the individual sound emission signal generation unit 221.
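The read-out described in the two paragraphs above amounts to indexing a precomputed table by virtual sound source and speaker, then handing the four (D, G) pairs to the generation unit for the channel. A sketch follows; the table values are made up purely for illustration, since the patent leaves them to the designer or to on-site measurement:

```python
# Hypothetical sound emission specification table: keyed by virtual source
# index (1..8), then speaker index (1..4); values are (delay_samples, gain).
# The formulas below just produce plausible-looking placeholder entries.
EMISSION_TABLE = {
    vs: {sp: (5 * abs(sp - 1) + vs, round(1.0 / (1 + 0.2 * abs(sp - 1)), 3))
         for sp in range(1, 5)}
    for vs in range(1, 9)
}

def read_adjustments(virtual_source_index):
    """Return {speaker: (D, G)} for one virtual source, as the sound
    emission specification control unit 220 would hand to a generation unit."""
    return EMISSION_TABLE[virtual_source_index]

adjustments = read_adjustments(1)  # virtual sound source 901
# adjustments[1] -> (D11, G11) for SP1, adjustments[2] -> (D21, G21) for SP2, ...
```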
[0031]
Based on the delay adjustment amounts D and gain adjustment amounts G for the speakers SP1 to SP4 given from the sound emission specification control unit 220, the individual sound emission signal generation units 221 to 223 apply delay processing and gain control to the sound emission audio signals S1 to S3 input from the corresponding channels CH1 to CH3, and output the results to the signal synthesis units 224 to 227. More specifically, the individual sound emission signal generation unit 221 applies delay processing and gain control to the sound emission audio signal S1 with the delay adjustment amount D1* and the gain adjustment amount G1* (*: 1 to 8, corresponding to the virtual sound sources 901 to 908) and outputs the result to the signal synthesis unit 224. At the same time, it applies delay processing and gain control to the sound emission audio signal S1 with the delay adjustment amount D2* and the gain adjustment amount G2* (*: 1 to 8) and outputs the result to the signal synthesis unit 225, with the delay adjustment amount D3* and the gain adjustment amount G3* (*: 1 to 8) and outputs the result to the signal synthesis unit 226, and with the delay adjustment amount D4* and the gain adjustment amount G4* (*: 1 to 8) and outputs the result to the signal synthesis unit 227. The individual sound emission signal generation units 222 and 223 likewise apply delay processing and gain control to the sound emission audio signals S2 and S3 in the same manner as the unit 221, and output the results to the signal synthesis units 224 to 227.
[0032]
The signal synthesis unit 224 synthesizes (adds) the signals for SP1 output from the individual sound emission signal generation units 221 to 223 and outputs the result as the SP1 audio signal SPD1. Similarly, the signal synthesis unit 225 synthesizes (adds) the signals for SP2 and outputs the result as the SP2 audio signal SPD2, the signal synthesis unit 226 synthesizes (adds) the signals for SP3 and outputs the result as the SP3 audio signal SPD3, and the signal synthesis unit 227 synthesizes (adds) the signals for SP4 and outputs the result as the SP4 audio signal SPD4.
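Paragraphs [0031] and [0032] together describe, per channel, four delayed-and-scaled copies of the channel signal (one per speaker), which the synthesis units then sum across channels. A minimal sketch on plain Python lists, assuming integer sample delays:

```python
def delay_and_gain(signal, delay_samples, gain):
    """One individual sound emission path: prepend silence for the delay
    and scale every sample by the gain."""
    return [0.0] * delay_samples + [gain * s for s in signal]

def synthesize(paths):
    """Signal synthesis unit: sample-wise sum of the per-channel paths
    destined for one speaker (shorter paths are zero-padded)."""
    n = max(len(p) for p in paths)
    return [sum(p[i] if i < len(p) else 0.0 for p in paths) for i in range(n)]

# Channel S1 routed to SP1 with (D=2, G=0.8), channel S2 with (D=0, G=0.5)
s1, s2 = [1.0, 1.0], [1.0, 1.0]
spd1 = synthesize([delay_and_gain(s1, 2, 0.8), delay_and_gain(s2, 0, 0.5)])
# spd1 == [0.5, 0.5, 0.8, 0.8]
```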
[0033]
The D / A converter 23 performs digital-to-analog conversion on the SP audio signals SPD1 to SPD4, and the sound emission AMP (amplifier) 24 amplifies them at a constant amplification factor and gives them to the respective speakers SP1 to SP4.
[0034]
The speakers SP1 to SP4 convert the given SP audio signals SPD1 to SPD4 into sound and emit it.
[0035]
By performing such sound emission processing, the sounds emitted from the respective speakers SP1 to SP4 have the predetermined delay and amplitude relationships, so the conferees can be given the feeling that the voice is emitted from the set virtual sound source.
[0036]
(2) Sound Collection The microphones MC1 to MC16 described above pick up external sounds such as conferees' speech and generate the sound collection signals MS1 to MS16.
Each sound collection AMP (amplifier) 25 amplifies the corresponding sound collection signals MS1 to MS16 at a predetermined amplification factor, and the A / D converter 26 converts the amplified sound collection signals MS1 to MS16 from analog to digital and outputs them to the sound collection control unit 27.
[0037]
FIG. 6 is a block diagram showing the main configuration of the sound collection control unit 27.
As shown in FIG. 6, the sound collection control unit 27 includes an azimuthal sound collection beam generation unit 271 and an output data determination unit 272.
The azimuthal sound collection beam generation unit 271 applies delay/addition processing and the like to appropriate combinations of the sound collection signals MS1 to MS16 (digital data), and generates the sound collection beam signals MB1 to MB8 with sound collection directions corresponding to eight different azimuths, corresponding to the virtual sound sources 901 to 908 respectively.
[0038]
The azimuthal sound collection beam generation unit 271 includes adders 2711 to 2718. The
adder 2711 adds the collected signals MS1, MS2 and MS16 to generate a collected beam signal
MB1 having strong directivity in the θ = 180 ° direction (corresponding to the virtual sound
source 905). The adder 2712 adds the collected signals MS2, MS3 and MS4 to generate a
collected beam signal MB2 having strong directivity in the θ = 225 ° direction (corresponding
to the virtual sound source 906). The adder 2713 adds the collected sound signals MS4, MS5,
MS6 to generate a collected sound beam signal MB3 having strong directivity in the θ = 270 °
direction (corresponding to the virtual sound source 907). The adder 2714 adds the collected
signals MS6, MS7, MS8 to generate a collected beam signal MB4 having strong directivity in the
θ = 315 ° direction (corresponding to the virtual sound source 908). The adder 2715 adds the
collected signals MS8, MS9, MS10 to generate a collected beam signal MB5 having strong
directivity in the θ = 0 direction (corresponding to the virtual sound source 901). The adder
2716 adds the collected signals MS10, MS11 and MS12 to generate a collected beam signal MB6
having strong directivity in the θ = 45 ° direction (corresponding to the virtual sound source
902). The adder 2717 adds the collected sound signals MS12, MS13 and MS14 to generate a
collected sound beam signal MB7 having strong directivity in the θ = 90 ° direction
(corresponding to the virtual sound source 903). The adder 2718 adds the collected sound
signals MS14, MS15, MS16 to generate a collected sound beam signal MB8 having strong
directivity in the θ = 135 ° direction (corresponding to the virtual sound source 904).
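Each adder above forms one beam by summing three adjacent microphone signals, so the adder assignments can be written as a table. A sketch follows; it performs the summation only, omitting the per-microphone delays that the delay/addition processing would also apply:

```python
# Microphone groups per beam, taken from the adder assignments above
# (beams MB1..MB8 formed by adders 2711..2718).
BEAM_MICS = {
    1: (1, 2, 16), 2: (2, 3, 4), 3: (4, 5, 6), 4: (6, 7, 8),
    5: (8, 9, 10), 6: (10, 11, 12), 7: (12, 13, 14), 8: (14, 15, 16),
}

def collect_beams(mic_signals):
    """mic_signals: {mic_number: list of samples}. Returns the beam signals
    MB1..MB8 as sample-wise sums over each beam's microphone group."""
    beams = {}
    for beam, mics in BEAM_MICS.items():
        length = len(mic_signals[mics[0]])
        beams[beam] = [sum(mic_signals[m][i] for m in mics)
                       for i in range(length)]
    return beams
```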
[0039]
As described above, in this embodiment the sound collection beam signal MB1 corresponds to the virtual sound source 905, the beam signal MB2 to the virtual sound source 906, the beam signal MB3 to the virtual sound source 907, and the beam signal MB4 to the virtual sound source 908. Further, the beam signal MB5 corresponds to the virtual sound source 901, the beam signal MB6 to the virtual sound source 902, the beam signal MB7 to the virtual sound source 903, and the beam signal MB8 to the virtual sound source 904. The number of sound collection beam signals to be generated is not limited to eight and can be set appropriately according to the specification.
[0040]
The azimuthal sound collection beam generation unit 271 outputs the generated sound collection
beam signals MB1 to MB8 to the output data determination unit 272.
[0041]
The output data determination unit 272 includes a maximum signal detection unit 2721 and a
Select / Mix circuit 2722.
[0042]
The maximum signal detection unit 2721 compares the signal levels of the collected sound beam
signals MB1 to MB8 and selects the collected sound beam signal having the maximum signal
level.
The maximum signal detection unit 2721 outputs selected beam information MBM indicating the
selected collected beam signal to the Select / Mix circuit 2722.
Further, the maximum signal detection unit 2721 outputs the direction information
corresponding to the selected sound collection beam signal to the communication control unit 21
as the speaker direction information Pm.
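The maximum signal detection in [0042] can be sketched as a level comparison over a short frame. RMS is used here as the level measure, which is an assumption; the patent only says "signal level":

```python
import math

def detect_max_beam(beams):
    """beams: {beam_number: list of samples}. Returns (selected beam number,
    its RMS level); the beam number plays the role of the selected beam
    information MBM and, via the beam-to-azimuth mapping, of the speaker
    direction information Pm."""
    def rms(x):
        return math.sqrt(sum(s * s for s in x) / len(x))
    levels = {b: rms(sig) for b, sig in beams.items()}
    best = max(levels, key=levels.get)
    return best, levels[best]

beam, level = detect_max_beam({1: [0.1, -0.1], 5: [0.8, -0.7], 7: [0.2, 0.2]})
# beam 5 carries the loudest signal in this toy example
```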
[0043]
The Select / Mix circuit 2722 selects the sound collection beam signal MB designated by the selected beam information MBM from the maximum signal detection unit 2721 and outputs it as the output sound collection beam signal MBS. The Select / Mix circuit 2722 need not output only the sound collection beam signal MB specified by the selected beam information MBM; it may also mix that sound collection beam signal MB with neighboring sound collection beam signals and output the result as the output sound collection beam signal MBS.
[0044]
The echo cancellation unit 28 includes an adaptive filter that generates a pseudo-regression sound signal for the input output sound collection beam signal MBS based on the sound emission sound signals S1 to S3, and a post processor that subtracts the pseudo-regression sound signal from the output sound collection beam signal MBS. By subtracting the pseudo-regression sound signal from the output sound collection beam signal MBS while sequentially optimizing the filter coefficients of the adaptive filter, the echo cancellation unit removes the components contained in the output sound collection beam signal MBS that wrap around from the speakers SP1 to SP4 to the microphones MC1 to MC16. The output sound collection beam signal MBS′ from which the wraparound components have been removed is output to the communication control unit 21.
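The adaptive filter and post processor described above can be sketched with a normalized LMS (NLMS) echo canceller. The filter length, step size, and single far-end channel below are illustrative assumptions; the actual unit operates on the sound emission sound signals S1 to S3:

```python
FILTER_LEN = 128   # assumed adaptive filter length
MU = 0.5           # assumed NLMS step size
EPS = 1e-8         # regularization to avoid division by zero

class EchoCanceller:
    def __init__(self, n=FILTER_LEN):
        self.w = [0.0] * n   # adaptive filter coefficients
        self.x = [0.0] * n   # recent far-end (emitted) samples

    def process(self, far_sample, mic_sample):
        # Shift the far-end history and estimate the wraparound
        # (pseudo-regression) component.
        self.x = [far_sample] + self.x[:-1]
        echo_est = sum(wi * xi for wi, xi in zip(self.w, self.x))
        # Post processor: subtract the estimate from the beam signal sample.
        e = mic_sample - echo_est
        # Sequential optimization of the filter coefficients (NLMS update).
        norm = sum(xi * xi for xi in self.x) + EPS
        self.w = [wi + MU * e * xi / norm for wi, xi in zip(self.w, self.x)]
        return e   # one sample of MBS' with the wraparound removed
```

Feeding the same far-end signal that the speakers emit drives the residual `e` toward the near-end speech alone.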
[0045]
The communication control unit 21 generates voice communication data by associating the output sound collection beam signal MBS′, from which the return sound has been removed by the echo cancellation unit 28, with the speaker direction information Pm from the sound collection control unit 27, and outputs the data to the input / output I/F. The voice communication data generated in this manner is transmitted to the destination voice conference apparatus via the input / output I/F and the network 500.
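The specification does not define a wire format for the voice communication data; purely as an illustration, the association of MBS′ samples with the speaker direction information Pm could be serialized as follows (a JSON header plus 16-bit PCM payload is an assumption, not the patent's format):

```python
import json
import struct

def pack_voice_data(pm_deg, samples):
    """Bundle speaker direction Pm (degrees) with a block of MBS' samples."""
    header = json.dumps({"Pm": pm_deg}).encode("utf-8")
    payload = struct.pack("<%dh" % len(samples), *samples)  # 16-bit PCM
    return struct.pack("<I", len(header)) + header + payload

def unpack_voice_data(blob):
    """Recover (Pm, samples) from a packed voice communication datagram."""
    (hlen,) = struct.unpack_from("<I", blob, 0)
    header = json.loads(blob[4:4 + hlen].decode("utf-8"))
    n = (len(blob) - 4 - hlen) // 2
    samples = list(struct.unpack_from("<%dh" % n, blob, 4 + hlen))
    return header["Pm"], samples
```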
[0046]
With such a configuration, sound is emitted at the sound emitting side audio conference device from a position corresponding to the position of the speaker relative to the sound collecting side audio conference device. This gives each conferee seated at the sound emitting side audio conference device the impression that the remote speaker is actually present at that device and speaking to them, making a remote conference full of a sense of realism possible. At this time, regardless of the position of the virtual sound source, all the speakers SP1 to SP4 emit sounds whose delay and amplitude relationships are individually controlled, rather than sound simply being emitted from the speaker nearest the virtual sound source position; this realizes more natural sound source localization. For example, if sound were emitted only from the speaker closest to the virtual sound source position, a conferee at a position symmetrical to the virtual sound source with respect to the audio conference device 1 could hardly hear it, because the sound emission direction points toward the virtual sound source. By emitting sound from all the speakers as in the configuration of the present embodiment, at least one speaker emits sound toward each conferee, even if not from directly in front, so every conferee can hear clear sound.
[0047]
Next, a specific usage example will be described with reference to the drawings. FIG. 7 is a diagram explaining the sound emitting and collecting state when the conferee 201A speaks in the situation shown in FIG. 1. In FIGS. 1 and 7, in the conference room 100A, the conferee 201A is present in the θ = 0° direction of the audio conference device 1A, and the conferee 203A is present in the θ = 180° direction of the audio conference device 1A. In the conference room 100B, the conferee 202B is present in the θ = 90° direction of the audio conference device 1B, and the conferee 204B is present in the θ = 270° direction of the audio conference device 1B.
[0048]
When the conferee 201A of the conference room 100A speaks, the voice 301A is collected by the voice conference apparatus 1A. At this time, since the voice 301A is collected mainly by the microphones MC8, MC9 and MC10, the sound collection beam signal MB5, formed from the collected sound signals of these microphones, reaches or exceeds the predetermined threshold. The output sound collection beam signal MBS consisting of the sound collection beam signal MB5 is echo-cancelled and transmitted to the voice conference apparatus 1B as voice communication data, together with the speaker orientation information Pm of θ = 0°.
[0049]
When receiving the voice communication data from the voice conference device 1A, the voice conference device 1B of the conference room 100B extracts the voice data, assigns it to, for example, the channel CH1, and converts it into the sound emission voice signal S1. Further, the voice conference device 1B extracts the speaker orientation information Pm (= Py) from the voice communication data. Since the speaker orientation information Py is θ = 0°, the audio conference apparatus 1B sets the virtual sound source 901 in the θ = 0° direction and reads out the delay adjustment amounts D and the gain adjustment amounts G for realizing the virtual sound source 901. At this time, for the SP audio signals SPD1 to SPD4 corresponding to the respective speakers SP1 to SP4, the gain adjustment amounts G are set so that the amplitudes satisfy SP audio signal SPD1 > SP audio signal SPD2 = SP audio signal SPD4 > SP audio signal SPD3, and the delay adjustment amounts D are set so that the delay times satisfy SP audio signal SPD1 < SP audio signal SPD2 = SP audio signal SPD4 < SP audio signal SPD3. The audio conference device 1B emits the SP audio signals SPD1 to SPD4 adjusted in this way from the corresponding speakers SP1 to SP4. By emitting the sound in this manner, the emitted sound 401A corresponding to the SP audio signal SPD1 is louder than the emitted sounds 402A and 404A corresponding to the SP audio signals SPD2 and SPD4, which in turn are louder than the emitted sound 403A corresponding to the SP audio signal SPD3. Also, the emitted sound 401A is emitted first, then the emitted sounds 402A and 404A, and finally the emitted sound 403A. As a result, the conferees 202B and 204B hear the sound as if it were emitted from the virtual sound source 901. Consequently, to the conferees 202B and 204B present at the audio conference device 1B, it feels as if the conferee 201A present at the audio conference device 1A were in the θ = 0° direction and speaking.
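The delay/gain relationships of this paragraph can be reproduced with a simple distance-based sketch. The speaker coordinates, sample rate, and 1/r gain law below are assumptions for illustration; the actual device reads the adjustment amounts D and G from the sound emission specification table 2281:

```python
import math

SPEED_OF_SOUND = 343.0   # m/s
FS = 16000               # sample rate in Hz (assumed)
SPEAKERS = {             # assumed speaker positions on the housing, in meters
    "SP1": (0.1, 0.0),   # theta = 0 deg
    "SP2": (0.0, 0.1),   # theta = 90 deg
    "SP3": (-0.1, 0.0),  # theta = 180 deg
    "SP4": (0.0, -0.1),  # theta = 270 deg
}

def render_params(virtual_src_xy):
    """Return {speaker: (delay_samples D, gain G)} realizing a virtual source:
    speakers farther from the source get more delay and less gain."""
    dists = {sp: math.dist(virtual_src_xy, pos) for sp, pos in SPEAKERS.items()}
    dmin = min(dists.values())
    params = {}
    for sp, d in dists.items():
        delay = round((d - dmin) / SPEED_OF_SOUND * FS)  # farther -> later
        gain = dmin / d                                  # farther -> quieter
        params[sp] = (delay, gain)
    return params
```

For a virtual source in the θ = 0° direction, e.g. at (0.5, 0), this yields gains SPD1 > SPD2 = SPD4 > SPD3 and delays SPD1 < SPD2 = SPD4 < SPD3, matching the relationships stated above.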
[0050]
In the above description, one person speaking was taken as an example, but the present invention can also be applied when two or more people speak simultaneously. In this case, the voice conference device on the sound collection side may pick up the voice signal of each speaker and attach speaker direction information to each individually, and the voice conference device on the sound emission side may set multiple virtual sound sources based on the acquired speaker direction information.
[0051]
In the above description, the case where communication is performed by two audio conference devices has been described, but the above-mentioned setting of virtual sound sources can also be applied when communication is performed among a plurality of audio conference devices, such as three or four. In this case, a channel may be individually assigned to each audio conference apparatus serving as a communication partner, and a virtual sound source may be set, with delay processing and gain control performed, for the sound emission audio signal of each channel. More specifically, the sound emission voice signal S1 allocated to the channel CH1 is subjected to delay processing and gain control based on the speaker direction information given by the first audio conference device corresponding to the channel CH1, and an audio signal for each of the speakers SP1 to SP4 is generated. Similarly, the sound emission voice signal S2 allocated to the channel CH2 is subjected to delay processing and gain control based on the speaker direction information given by the second audio conference device corresponding to the channel CH2, and an audio signal for each of the speakers SP1 to SP4 is generated. Further, the sound emission voice signal S3 allocated to the channel CH3 is subjected to delay processing and gain control based on the speaker direction information given by the third audio conference device corresponding to the channel CH3, and an audio signal for each of the speakers SP1 to SP4 is generated. By combining, for each of the speakers SP1 to SP4, the audio signals generated in this way, the SP audio signals SPD1 to SPD4 are generated and emitted from the speakers SP1 to SP4. As a result, a conferee present at the sound emitting audio conference device can hear the speech as if the conferees present at the first to third voice conference devices were actually there.
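The per-channel combination described above amounts to summing, for each speaker, the audio signals generated for channels CH1 to CH3. A minimal sketch, with the data layout assumed for illustration:

```python
def combine_channels(per_channel):
    """per_channel: one dict {speaker_name: list of samples} per channel
    (CH1, CH2, ...). Returns the SP audio signals SPD as the per-speaker
    sum of all channels' delay/gain-processed signals."""
    speakers = per_channel[0].keys()
    n = len(next(iter(per_channel[0].values())))
    return {sp: [sum(ch[sp][i] for ch in per_channel) for i in range(n)]
            for sp in speakers}
```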
[0052]
Further, in the above description, the virtual sound source was set from the azimuth alone, with the distance from the housing fixed; however, the virtual sound source may also be set using both the azimuth and the distance. In this case, the voice conference device on the sound collection side calculates the distance between each microphone and the speech position from the delay relationship of the sound signals collected by the microphones, and can specify the speech position by using at least three such distances. Then, the voice conference device on the sound collection side generates speaker direction information including distance information together with the azimuth information, and transmits it to the voice conference device on the sound emission side. The voice conference device on the sound emission side sets a virtual sound source obtained from the azimuth and the distance based on the received speaker direction information, and performs delay processing and gain control to realize that virtual sound source. This makes it possible to reproduce the speaking position (the position of the conferee who made the statement) more realistically.
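Specifying the speech position from at least three microphone distances is a plane trilateration problem. A sketch under the assumption of known, non-collinear microphone coordinates (not a procedure given in this specification):

```python
def trilaterate(p1, r1, p2, r2, p3, r3):
    """Solve |p - pi| = ri (i = 1..3) for p = (x, y).
    Subtracting the circle equations pairwise yields two linear equations."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    a2, b2 = 2 * (x3 - x2), 2 * (y3 - y2)
    c2 = r2**2 - r3**2 + x3**2 - x2**2 + y3**2 - y2**2
    det = a1 * b2 - a2 * b1          # nonzero if mics are not collinear
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)
```

With more than three microphones, as in the sixteen-microphone array of this embodiment, a least-squares variant of the same idea can average out distance-estimation error.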
[0053]
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of the audio conference system of the embodiment of the present invention. FIG. 2 is an external view of the audio conference apparatus used in the audio conference system of the embodiment. FIG. 3 is a functional block diagram of the audio conference device shown in FIG. 2. FIG. 4 is a block diagram showing the main configuration of the sound emission control unit 22 of the embodiment. FIG. 5 shows the distribution of the virtual sound sources set by the sound emission specification control unit 220 and the contents of the sound emission specification table 2281. FIG. 6 is a block diagram showing the main configuration of the sound collection control unit 27. FIG. 7 is a diagram explaining the sound emission state when the conferee 201A speaks in the situation shown in FIG. 1.
Explanation of Reference Numerals
[0054]
1, 1A, 1B: voice conference device; 11: case; 12: recess; 21: communication control unit; 22: sound emission control unit; 221-223: individual sound emission signal generating unit; 224-227: signal combining unit; 23: D/A converter; 24: sound emission amplifier; 25: sound collection amplifier; 26: A/D converter; 27: sound collection control unit; 271: azimuthal sound collection beam generation unit; 272: output data determination unit; 28: echo cancellation unit; 29: operation unit; 100A, 100B: conference room; 101A, 101B: conference table; 201A, 203A, 202B, 204B: conferee; 301A: voice (collected voice); 401A, 402A, 403A, 404A: voice (emitted voice); 500: network; 901-908: virtual sound source; SP1-SP4: speaker; MC1-MC16: microphone