Patent Translate, powered by EPO and Google

Notice: This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate, complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or financial decisions, should not be based on machine-translation output.

DESCRIPTION JP2008177802

[Abstract] An audio conference system capable of realizing a realistic conference that reflects the positions of the conferees seated at each other's audio conference apparatus. An audio conference device 1A in a conference room 100A generates voice communication data consisting of an output sound collection beam signal MBS and speaker direction information Pm from the voice of a speaking conferee 201A, and transmits it to the audio conference device 1B in a conference room 100B. The audio conference device 1B extracts a sound emission audio signal and speaker direction information Py from the received voice communication data, and sets a virtual sound source 900 from the speaker direction information Py. Based on the virtual sound source 900, the audio conference device 1B sets a delay adjustment amount D and a gain adjustment amount G for each of the SP audio signals SPD1 to SPD4 supplied to the speakers SP1 to SP4. The audio conference device 1B performs digital-to-analog conversion on the SP audio signals SPD1 to SPD4 adjusted by these amounts, amplifies them, and emits the sound from the speakers SP1 to SP4. [Selected figure] Figure 7

Audio conference system and audio conference apparatus

[0001] The present invention relates to an audio conference system in which two audio conference devices arranged at mutually separated locations are connected to hold an audio conference, and to an audio conference device used in such a system.
[0002] Conventionally, when an audio conference is held between two mutually distant points, an audio conference apparatus such as that of Patent Document 1 or 2 is placed at each point, and the conferees hold the meeting seated around the apparatus.

04-05-2019 1

[0003] In the audio conference apparatuses of Patent Documents 1 and 2, a single speaker is placed at the center of the housing so that sound is emitted outward from the top surface, and a plurality of microphones facing different directions are placed at the corner portions of the side surfaces.

[0004] In such a conventional audio conference apparatus, each microphone picks up the uttered sound from a different azimuth, and an audio signal is transmitted to the audio conference apparatus on the other side. Conversely, when the audio conference apparatus receives the audio signal collected by the apparatus on the other side, it emits that sound from its speaker as it is. Patent Document 1: JP-A-8-298696. Patent Document 2: JP-A-8-204803.

[0005] In the conventional audio conference system described above, the audio conference device on the sound collection side (transmission side) picks up omnidirectional audio around the housing with all of its microphones and transmits it as a single sound emission audio signal. The audio conference device on the sound emission side (reception side) then feeds the received sound emission audio signal to its speaker and emits the sound uniformly in all directions.

[0006] In such a configuration, even if a plurality of conferees are present at the other party's audio conference device, the sound emission audio signal received from the other party is emitted uniformly in all directions, so the positional relationship of the conferees seated at the other party's device cannot be perceived. For this reason, the conference lacks a sense of realism.
[0007] An object of the present invention is therefore to provide an audio conference system capable of realizing a conference with a full sense of realism according to the positions of the conferees seated at each other's audio conference apparatus, and an audio conference apparatus used in such a system.

[0008] The audio conference system of the present invention comprises a plurality of audio conference devices, each having a disk-shaped housing, a plurality of unidirectional microphones arranged circumferentially in the housing, and a plurality of speakers arranged circumferentially in the housing, and connection means for connecting at least two of the plurality of audio conference devices. The audio conference device on the transmission side forms sound collection beam signals with different sound collection directions from the collected signals of the plurality of unidirectional microphones, selects the sound collection beam signal based on the uttered sound of a conferee, detects the sound collection direction among all outward directions of the disk-shaped housing corresponding to the selected sound collection beam signal, generates speaker direction information for the selected sound collection beam signal, and transmits the sound emission audio signal based on the selected beam signal in association with the corresponding speaker direction information. The audio conference device on the reception side receives the sound emission audio signal and the corresponding speaker direction information from the transmission side, sets a virtual sound source in the same direction as that indicated by the speaker direction information, and controls the sounds emitted from its plurality of speakers so that they are heard as if emitted from the virtual sound source.
[0009] In this configuration, the audio conference device on the transmission side picks up a conferee's speech and acquires the azimuth of that conferee with respect to the housing (the sound collection azimuth). It then transmits the sound emission audio signal based on the collected sound in association with the speaker direction information. The audio conference device on the reception side sets the direction of a virtual sound source with respect to its own housing from the received speaker direction information, and controls its plurality of speakers so that the sound based on the sound emission audio signal is heard as coming from that direction. As a result, a conferee seated at the reception-side apparatus hears the voice as if it were emitted from the same direction as the speaker's direction at the transmission-side apparatus.

[0010] Further, the audio conference device on the reception side of the present invention performs amplitude and delay control of the sounds emitted from the speakers based on the set virtual sound source position and the positional relationship between the speakers.

[0011] In this configuration, the sound emitted from each speaker undergoes amplitude control and delay control according to the positional relationship between that speaker and the virtual sound source position. More specifically, the amplitude of the emitted sound is attenuated according to the distance from the virtual sound source position, and the emission timing is delayed according to that distance. As a result, a conferee seated at the sound emission side (reception side) apparatus hears the sound as if it were emitted from the virtual sound source position, regardless of the orientation of the audio conference device.
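The amplitude and delay control of [0011] can be sketched as follows. This is a minimal illustrative model, not the patent's implementation: the speaker radius, the 1/distance gain law, and the speed-of-sound delay are all assumptions; the text only states that amplitude is attenuated and emission timing delayed according to the distance from the virtual sound source.

```python
import math

SPEED_OF_SOUND = 343.0                      # m/s, assumed
SPEAKER_ANGLES = [0.0, 90.0, 180.0, 270.0]  # SP1-SP4, per the embodiment
SPEAKER_RADIUS = 0.15                       # m, assumed housing radius

def per_speaker_adjustments(source_angle_deg, source_dist_m=1.0):
    """Return a (delay_seconds, gain) pair for each of SP1-SP4.

    The virtual sound source sits source_dist_m from the housing centre
    at source_angle_deg; gain falls off as 1/distance and delay grows
    with distance, both taken relative to the nearest speaker."""
    a = math.radians(source_angle_deg)
    src = (source_dist_m * math.cos(a), source_dist_m * math.sin(a))
    dists = []
    for ang in SPEAKER_ANGLES:
        r = math.radians(ang)
        sp = (SPEAKER_RADIUS * math.cos(r), SPEAKER_RADIUS * math.sin(r))
        dists.append(math.dist(src, sp))
    d_min = min(dists)
    return [((d - d_min) / SPEED_OF_SOUND, d_min / d) for d in dists]
```

Under this model, a virtual sound source in the θ = 0° direction yields exactly the ordering the embodiment describes later: SP1 loudest and earliest, SP2 and SP4 equal, SP3 quietest and latest.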
[0012] In addition, the audio conference device on the transmission side of the present invention calculates, together with the speaker azimuth, the distance between the utterance position and the side of the housing closest to that position, and generates the speaker direction information from the azimuth and the distance. The audio conference device on the reception side sets the virtual sound source based on the azimuth and the distance obtained from the received speaker direction information.

[0013] In this configuration, not only the speaker azimuth but also the distance from the housing to the speaker is used to set the virtual sound source position. As a result, the utterance position of a conferee at the sound collection side device can be reproduced more accurately by the sound emission side device.

[0014] An audio conference apparatus according to the present invention is an audio conference apparatus used in the audio conference system described above. The plurality of speakers of this apparatus are installed on the lower surface side of the housing with their sound emission directions facing outward from the housing, and the plurality of unidirectional microphones are installed on the upper surface side of the housing with their sound collection directions facing the center of the housing in plan view.

[0015] In this configuration, the sound collection directivity of each microphone and the sound emission directivity of the speakers nearest to it point in opposite directions, so it is difficult for the microphones to pick up wraparound sound from the speakers. This makes it possible to detect the target speaker azimuth with high accuracy in all directions.
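Paragraph [0012] transmits both an azimuth and a speaker-to-housing distance. A sketch of how the receiving side could place the virtual sound source from that pair; the housing radius and the outward-offset model are assumptions, since the text only says the virtual sound source is set from the azimuth and the distance:

```python
import math

def virtual_source_position(azimuth_deg, distance_m, housing_radius_m=0.15):
    """Place the virtual sound source for received speaker direction
    information carrying azimuth and distance: the source is put on the
    reported azimuth, offset outward from the housing edge by the
    reported distance (assumed model)."""
    a = math.radians(azimuth_deg)
    r = housing_radius_m + distance_m
    return (r * math.cos(a), r * math.sin(a))
```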
[0016] According to the present invention, sound is emitted from the position corresponding to the conferee at the partner apparatus, so the position of each conferee in the partner conference room, though not visible, can be known, and by listening to the sound emitted from that position a voice conference full of realism can be achieved.

[0017] An audio conference system according to an embodiment of the present invention will be described with reference to the drawings. FIG. 1 is a block diagram of the audio conference system according to the present embodiment. FIG. 2 is an external view of the audio conference apparatus used in the system, in which (A) is a plan view and (B) is a side view. In FIG. 2, θ indicates an angle that increases counterclockwise, with the direction of the microphone MC1 and the speaker SP1 taken as 0° and the center of the audio conference device 1 in plan view taken as the rotation center. FIG. 3 is a functional block diagram of the audio conference apparatus shown in FIG. 2. As shown in FIG. 1, the audio conference system includes audio conference devices 1A and 1B disposed in the separate conference rooms 100A and 100B, respectively, and a network 500 connecting these audio conference devices 1A and 1B. Conference tables 101A and 101B are installed approximately at the centers of the conference rooms 100A and 100B, and the audio conference devices 1A and 1B are placed on the respective tables. The audio conference devices 1A and 1B are each provided with an input/output I/F and are connected to the network through it.
For example, in conference room 100A, the conferees 201A and 203A are seated facing each other across the audio conference device 1A, conferee 201A on the speaker SP1 side and conferee 203A on the speaker SP3 side of the device. Likewise, in conference room 100B, the conferees 202B and 204B are seated facing each other across the audio conference device 1B, conferee 202B on the speaker SP2 side and conferee 204B on the speaker SP4 side.

[0018] The audio conference devices 1A and 1B have the same specifications and each include a disk-shaped housing 11. Specifically, the housing 11 is circular in plan view, the areas of the top and bottom surfaces are smaller than the area of the middle portion in the height direction, and in side view the housing widens from the top surface down to a point partway along its height and narrows again from that point toward the bottom; that is, it has an inclined surface above and below that point. A recess 12 of a predetermined depth, smaller in area than the top surface, is formed in the top surface of the housing 11 so that the center of the recess 12 in plan view coincides with the center of the top surface.

[0019] Sixteen microphones MC1 to MC16 are installed on the top surface side of the housing 11 along the side surface of the recess 12, arranged at an equal angular pitch (here, approximately 22.5° apart) about the center of the audio conference device 1 in plan view.
The microphones are arranged so that MC1 lies in the θ = 0° direction and θ increases by 22.5° for each subsequent microphone; for example, MC5 lies in the θ = 90° direction, MC9 in the θ = 180° direction, and MC13 in the θ = 270° direction. Each of the microphones MC1 to MC16 is unidirectional and is oriented with strong directivity toward the center of the device in plan view. For example, MC1 has the θ = 180° direction as its directivity center, MC5 the θ = 270° direction, MC9 the θ = 0° (360°) direction, and MC13 the θ = 90° direction. The number of microphones is not limited to sixteen and may be set appropriately according to the specification.

[0020] The four speakers SP1 to SP4 are each installed so that their sound emitting surfaces coincide with the inclined surface on the lower side of the housing 11, arranged at an equal angular pitch (here, approximately 90° apart) about the center of the audio conference device 1 in plan view. The speaker SP1 is disposed in the same θ = 0° direction as the microphone MC1, SP2 in the same θ = 90° direction as MC5, SP3 in the same θ = 180° direction as MC9, and SP4 in the same θ = 270° direction as MC13.
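The angular layout just described follows one simple rule (MC1 at θ = 0°, 22.5° pitch, every microphone aimed at the housing centre), which can be stated compactly; the function below is only a restatement of that rule for illustration:

```python
def mic_layout(n_mics=16, pitch_deg=22.5):
    """Return {name: (position angle, directivity-centre angle)}: MC1
    sits at theta = 0 degrees, theta grows by pitch_deg per microphone,
    and each microphone points at the housing centre, i.e. its position
    angle plus 180 degrees, modulo 360."""
    return {
        f"MC{i + 1}": ((i * pitch_deg) % 360.0, (i * pitch_deg + 180.0) % 360.0)
        for i in range(n_mics)
    }
```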
Each of the speakers SP1 to SP4 has strong directivity in the front direction of its sound emitting surface: SP1 emits sound centered on the θ = 0° direction, SP2 on the θ = 90° direction, SP3 on the θ = 180° direction, and SP4 on the θ = 270° direction.

[0021] Because the speakers SP1 to SP4 are disposed on the lower side of the housing 11, the microphones MC1 to MC16 on the upper side, and the sound collection direction of the microphones MC1 to MC16 points toward the center of the housing 11, it is difficult for each microphone to pick up wraparound sound from the speakers. The speaker position detection described later is therefore less affected by wraparound sound and can be performed with higher accuracy.

[0022] The operation unit 29 is installed on the inclined surface on the upper side of the housing 11 and includes various operation buttons and a liquid crystal display panel (not shown). The input/output I/F is installed on the inclined surface on the lower side of the housing 11, at a position where no speaker is installed, and comprises a network connection terminal, a digital audio terminal, an analog audio terminal, and the like (not shown). A network cable is connected to this network connection terminal to connect to the aforementioned network 500.

[0023] In addition to this structural configuration, the audio conference device 1 has the functional configuration shown in FIG. 3. The control unit 20 performs overall control of the settings, sound collection, sound emission, and the like of the audio conference device 1, and passes the contents of operation instructions input through the operation unit 29 to each part of the device.
[0024] (1) Sound Emission: The communication control unit 21 extracts audio data from the voice communication data received from the partner audio conference device through the input/output I/F and outputs it to the channels CH1 to CH3 as the sound emission audio signals S1 to S3. In doing so, the communication control unit 21 obtains the partner apparatus ID from the voice communication data and assigns a channel CH to each partner apparatus ID. For example, when one partner device is connected, its audio data is assigned to channel CH1 as the sound emission audio signal S1. When two partner devices are connected, their audio data are individually assigned to channels CH1 and CH2 as the sound emission audio signals S1 and S2, respectively. Similarly, when three partner devices are connected, their audio data are individually assigned to channels CH1, CH2, and CH3 as the sound emission audio signals S1, S2, and S3, respectively. The channels CH1 to CH3 are connected to the sound emission control unit 22 via the echo cancellation unit 28. The communication control unit 21 also extracts the speaker direction information Py (Pm) of the partner audio conference apparatus associated with each audio data item of the voice communication data, and supplies it to the sound emission control unit 22 together with the channel information.

[0025] The sound emission control unit 22 generates the speaker output signals SPD1 to SPD4 to be supplied to the speakers SP1 to SP4 based on the sound emission audio signals S1 to S3 and the speaker direction information Py.

[0026] FIG. 4 is a block diagram showing the main configuration of the sound emission control unit 22 of the present embodiment. FIG. 5A is a view showing the distribution of virtual sound sources set by the sound emission specification control unit 220. FIG. 5B is a diagram showing the contents of the sound emission specification table 2281.
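The channel assignment in [0024] amounts to giving each connected partner apparatus ID its own channel in order. A minimal sketch; the three-channel limit comes from the text, while the ID values and the error handling are illustrative assumptions:

```python
def assign_channels(partner_ids, n_channels=3):
    """Map each connected partner apparatus ID to an emission channel:
    one partner uses CH1, two use CH1/CH2, three use CH1/CH2/CH3."""
    if len(partner_ids) > n_channels:
        raise ValueError("more partner devices than available channels")
    return {pid: f"CH{i + 1}" for i, pid in enumerate(partner_ids)}
```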
[0027] As shown in FIG. 4, the sound emission control unit 22 includes individual sound emission signal generation units 221 to 223 corresponding to the sound emission audio signals S1 to S3, a sound emission specification control unit 220, signal synthesis units 224 to 227 that produce the speaker output signals SPD1 to SPD4, and a memory 228 that stores a sound emission specification table 2281.

[0028] The sound emission specification control unit 220 sets a virtual sound source based on the speaker direction information Py from the communication control unit 21. As shown in FIG. 5A, the virtual sound sources are set at 45° intervals at a predetermined distance outward from the housing 11 in the horizontal direction, with the center of the housing 11 as the rotation center. More specifically, the virtual sound source 901 is set in the θ = 0° direction, the direction from the center of the housing 11 in which the speaker SP1 and the microphone MC1 are arranged, and the virtual sound sources 902 to 908 are set in order at 45° intervals counterclockwise from the virtual sound source 901. The number of virtual sound sources is not limited to eight and may be set appropriately according to the device specification.

[0029] Based on the set virtual sound source, the sound emission specification control unit 220 reads from the sound emission specification table 2281 stored in the memory 228 the delay adjustment amount D and the gain adjustment amount G that realize that virtual sound source. For example, if the speaker direction information Py indicates the θ = 0° direction, the sound emission specification control unit 220 sets the virtual sound source 901 and reads out the delay adjustment amount D11 and the gain adjustment amount G11 for SP1, the delay adjustment amount D21 and the gain adjustment amount G21 for SP2, the delay adjustment amount D31 and the gain adjustment amount G31 for SP3, and the delay adjustment amount D41 and the gain adjustment amount G41 for SP4. The delay adjustment amount D and the gain adjustment amount G are preset according to the distance between the virtual sound source to be set and each of the speakers SP1 to SP4. Alternatively, these adjustment amounts may be set by measuring the sound emission and collection environment on site after the audio conference device is installed.

[0030] Having read out the delay adjustment amount D and the gain adjustment amount G for each of the speakers SP1 to SP4, the sound emission specification control unit 220 outputs them, according to the channel information given together with the speaker direction information Py, to the individual sound emission signal generation unit corresponding to that channel among the units 221 to 223. For example, if the virtual sound source 901 is set and the channel information indicates CH1, the sound emission specification control unit 220 gives the delay adjustment amount D11 and the gain adjustment amount G11 for SP1, the delay adjustment amount D21 and the gain adjustment amount G21 for SP2, the delay adjustment amount D31 and the gain adjustment amount G31 for SP3, and the delay adjustment amount D41 and the gain adjustment amount G41 for SP4 to the individual sound emission signal generation unit 221.

[0031] Based on the delay adjustment amounts D and the gain adjustment amounts G for the speakers SP1 to SP4 given from the sound emission specification control unit 220, the individual sound emission signal generation units 221 to 223 process the signals input from the corresponding channels CH1 to CH3.
They perform delay processing and gain control of the sound emission audio signals S1 to S3 and output the results to the signal synthesis units 224 to 227. More specifically, the individual sound emission signal generation unit 221 performs delay processing and gain control of the sound emission audio signal S1 with the delay adjustment amount D1* and the gain adjustment amount G1* (*: 1 to 8, corresponding to the virtual sound sources 901 to 908) and outputs the result to the signal synthesis unit 224. At the same time, it performs delay processing and gain control of the signal S1 with the delay adjustment amount D2* and the gain adjustment amount G2* and outputs the result to the signal synthesis unit 225, with the delay adjustment amount D3* and the gain adjustment amount G3* and outputs the result to the signal synthesis unit 226, and with the delay adjustment amount D4* and the gain adjustment amount G4* and outputs the result to the signal synthesis unit 227. The individual sound emission signal generation units 222 and 223 likewise perform delay processing and gain control of the sound emission audio signals S2 and S3 and output the results to the signal synthesis units 224 to 227.

[0032] The signal synthesis unit 224 synthesizes (adds) the SP1 signals output from the individual sound emission signal generation units 221 to 223 and outputs the sum as the SP1 audio signal SPD1. Similarly, the signal synthesis unit 225 outputs the sum of the SP2 signals as the SP2 audio signal SPD2, the signal synthesis unit 226 outputs the sum of the SP3 signals as the SP3 audio signal SPD3, and the signal synthesis unit 227 outputs the sum of the SP4 signals as the SP4 audio signal SPD4.

[0033] The D/A converter 23 performs digital-to-analog conversion on the SP audio signals SPD1 to SPD4, and the sound emission AMP (amplifier) 24 amplifies them at a constant amplification factor and supplies them to the speakers SP1 to SP4, respectively.

[0034] The speakers SP1 to SP4 convert the received SP audio signals SPD1 to SPD4 into sound and emit it.
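Paragraphs [0028] to [0032] describe a pipeline: map Py to one of the eight virtual sound sources, look up per-speaker delay and gain, apply them per channel, and sum per speaker. The sketch below is a simplified model working on integer sample delays and plain lists; the rounding rule in the direction mapping and the rendering details are assumptions, not the contents of table 2281:

```python
def virtual_source_for_direction(py_deg):
    """Map speaker direction information Py to a virtual sound source
    901-908 (45-degree pitch starting from theta = 0 degrees)."""
    return 901 + round((py_deg % 360.0) / 45.0) % 8

def render_channel(signal, delays_samples, gains):
    """Apply one channel's per-speaker delay (in samples) and gain
    (units 221-223), returning one feed per speaker."""
    n = len(signal) + max(delays_samples)
    feeds = []
    for d, g in zip(delays_samples, gains):
        out = [0.0] * n
        for i, x in enumerate(signal):
            out[d + i] = g * x
        feeds.append(out)
    return feeds

def synthesize(per_channel_feeds):
    """Sum the speaker-k feeds of every channel (units 224-227); all
    channels must have been rendered to the same length."""
    return [[sum(samples) for samples in zip(*speaker_feeds)]
            for speaker_feeds in zip(*per_channel_feeds)]
```

A D/A converter and amplifier stage would follow, as in [0033]; they are outside the scope of this sketch.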
[0035] Through such sound emission processing, the sounds emitted from the speakers SP1 to SP4 have a predetermined delay relationship and amplitude relationship, giving the conferees the impression that the voice is emitted from the set virtual sound source.

[0036] (2) Sound Collection: The microphones MC1 to MC16 described above pick up external sound, such as the uttered sound of a conferee, and generate the sound collection signals MS1 to MS16. Each sound collection AMP (amplifier) 25 amplifies the corresponding sound collection signals MS1 to MS16 at a predetermined amplification factor, and the A/D converter 26 converts the amplified sound collection signals MS1 to MS16 from analog to digital and outputs them to the sound collection control unit 27.

[0037] FIG. 6 is a block diagram showing the main configuration of the sound collection control unit 27. As shown in FIG. 6, the sound collection control unit 27 includes an azimuthal sound collection beam generation unit 271 and an output data determination unit 272. The azimuthal sound collection beam generation unit 271 combines appropriate sets of the sound collection signals MS1 to MS16 (digital data) and performs delay and addition processing on them, generating sound collection beam signals MB1 to MB8 whose sound collection directions correspond to eight different azimuths, one for each of the virtual sound sources 901 to 908.

[0038] The azimuthal sound collection beam generation unit 271 includes adders 2711 to 2718. The adder 2711 adds the sound collection signals MS1, MS2, and MS16 to generate the sound collection beam signal MB1 having strong directivity in the θ = 180° direction (corresponding to the virtual sound source 905).
The adder 2712 adds the sound collection signals MS2, MS3, and MS4 to generate the sound collection beam signal MB2 having strong directivity in the θ = 225° direction (corresponding to the virtual sound source 906). The adder 2713 adds MS4, MS5, and MS6 to generate MB3 with strong directivity in the θ = 270° direction (corresponding to the virtual sound source 907). The adder 2714 adds MS6, MS7, and MS8 to generate MB4 with strong directivity in the θ = 315° direction (corresponding to the virtual sound source 908). The adder 2715 adds MS8, MS9, and MS10 to generate MB5 with strong directivity in the θ = 0° direction (corresponding to the virtual sound source 901). The adder 2716 adds MS10, MS11, and MS12 to generate MB6 with strong directivity in the θ = 45° direction (corresponding to the virtual sound source 902). The adder 2717 adds MS12, MS13, and MS14 to generate MB7 with strong directivity in the θ = 90° direction (corresponding to the virtual sound source 903). The adder 2718 adds MS14, MS15, and MS16 to generate MB8 with strong directivity in the θ = 135° direction (corresponding to the virtual sound source 904).

[0039] As described above, in this embodiment the sound collection beam signal MB1 corresponds to the virtual sound source 905, MB2 to the virtual sound source 906, MB3 to the virtual sound source 907, and MB4 to the virtual sound source 908.
Further, the sound collection beam signal MB5 corresponds to the virtual sound source 901, MB6 to the virtual sound source 902, MB7 to the virtual sound source 903, and MB8 to the virtual sound source 904. The number of sound collection beam signals to be generated is not limited to eight and can be set appropriately according to the specification.

[0040] The azimuthal sound collection beam generation unit 271 outputs the generated sound collection beam signals MB1 to MB8 to the output data determination unit 272.

[0041] The output data determination unit 272 includes a maximum signal detection unit 2721 and a Select/Mix circuit 2722.

[0042] The maximum signal detection unit 2721 compares the signal levels of the sound collection beam signals MB1 to MB8 and selects the sound collection beam signal with the maximum signal level. It outputs selected beam information MBM indicating the selected beam signal to the Select/Mix circuit 2722, and outputs the direction corresponding to the selected beam signal to the communication control unit 21 as the speaker direction information Pm.

[0043] The Select/Mix circuit 2722 selects the sound collection beam signal MB designated by the selected beam information MBM from the maximum signal detection unit 2721 and outputs it as the output sound collection beam signal MBS. Instead of outputting only the designated sound collection beam signal MB, the Select/Mix circuit 2722 may also mix it with other sound collection beam signals and output the result as the output sound collection beam signal MBS.
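The sound collection side of [0038] to [0042] can be sketched end to end: each adder sums three sound collection signals, and the maximum signal detection unit picks the strongest beam and reports its direction as Pm. The beam-to-microphone wiring and the beam directions below are copied from the text; comparing one signal level per beam is a simplification of whatever level measure unit 2721 actually uses:

```python
# Adder inputs (2711-2718) and beam directions, as listed in the text.
BEAM_INPUTS = {
    "MB1": ("MS1", "MS2", "MS16"),   "MB2": ("MS2", "MS3", "MS4"),
    "MB3": ("MS4", "MS5", "MS6"),    "MB4": ("MS6", "MS7", "MS8"),
    "MB5": ("MS8", "MS9", "MS10"),   "MB6": ("MS10", "MS11", "MS12"),
    "MB7": ("MS12", "MS13", "MS14"), "MB8": ("MS14", "MS15", "MS16"),
}
BEAM_DIRECTIONS = {"MB1": 180.0, "MB2": 225.0, "MB3": 270.0, "MB4": 315.0,
                   "MB5": 0.0, "MB6": 45.0, "MB7": 90.0, "MB8": 135.0}

def form_beams(mic_levels):
    """Sum each beam's three microphone signal levels (one frame)."""
    return {b: sum(mic_levels[m] for m in mics)
            for b, mics in BEAM_INPUTS.items()}

def detect_speaker(beam_levels):
    """Pick the maximum-level beam (unit 2721): returns the selected
    beam information MBM and the speaker direction information Pm."""
    mbm = max(beam_levels, key=beam_levels.get)
    return mbm, BEAM_DIRECTIONS[mbm]
```

With energy concentrated around MC9 (a talker in the θ = 0° direction), MB5 dominates and Pm = 0°, matching the usage example at the end of this description.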
[0044] The echo cancellation unit 28 comprises an adaptive filter that generates a pseudo-regression sound signal from the sound emission audio signals S1 to S3 for the input output sound collection beam signal MBS, and a post processor that subtracts the pseudo-regression sound signal from the output sound collection beam signal MBS. By subtracting the pseudo-regression sound signal while sequentially optimizing the filter coefficients of the adaptive filter, the echo cancellation unit removes the component of the output sound collection beam signal MBS that wraps around from the speakers SP1 to SP4 into the microphones MC1 to MC16. The output sound collection beam signal MBS′ from which the wraparound component has been removed is output to the communication control unit 21.

[0045] The communication control unit 21 generates voice communication data by associating the output sound collection beam signal MBS′, from which the wraparound sound has been removed by the echo cancellation unit 28, with the speaker direction information Pm from the sound collection control unit 27, and outputs it to the input/output I/F. The voice communication data generated in this way is transmitted to the destination audio conference apparatus via the input/output I/F and the network 500.

[0046] With this configuration, sound is emitted at the position of the sound emission side audio conference device corresponding to the position of the speaker relative to the sound collection side audio conference device. Each conferee seated at the sound emission side device therefore feels as if the speaking conferee were present at that device and speaking to them, which makes a remote conference full of realism possible.
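Paragraph [0044] only says that the adaptive filter's coefficients are sequentially optimized. The sketch below uses NLMS as one common choice of that optimization, with a toy single-channel far-end signal; the real unit 28 works from the sound emission audio signals S1 to S3, and the tap count and step size here are arbitrary:

```python
def nlms_echo_cancel(far, mic, n_taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive-filter estimate of the speaker-to-microphone
    wraparound (the pseudo-regression sound signal) from the mic signal,
    as the adaptive filter / post processor pair of unit 28 does.
    NLMS is an assumed choice of coefficient update."""
    w = [0.0] * n_taps              # adaptive filter coefficients
    buf = [0.0] * n_taps            # recent far-end samples, newest first
    out = []
    for x, d in zip(far, mic):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))   # pseudo-regression sound
        e = d - y                                     # echo-cancelled output
        norm = sum(b * b for b in buf) + eps
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        out.append(e)
    return out
```

Fed a mic signal that is purely a scaled copy of the far-end signal, the residual shrinks toward zero as the filter converges on the wraparound path.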
At this time, regardless of the position of the virtual sound source, all the speakers SP1 to SP4 emit sound whose delay-amplitude relationships are individually controlled, rather than sound simply being emitted from the speaker nearest the virtual sound source position; this realizes more natural sound source localization. For example, if sound were emitted only from the speaker closest to the virtual sound source position, the sound emission direction would be limited to the virtual sound source direction, so a conferee located symmetrically opposite the virtual sound source with respect to the audio conference device 1 could not hear it well. By emitting sound from all the speakers as in the configuration of the present embodiment, at least one speaker emits sound toward each conferee, even if not from directly in front, so clear sound can be heard. [0047] Next, a specific usage example will be described with reference to the drawings. FIG. 7 is a diagram explaining the sound emitting and collecting state when the conferee 201A speaks in the situation shown in FIG. 1. In FIGS. 1 and 7, in the conference room 100A, the conferee 201A is present in the θ = 0° direction of the audio conference device 1A, and the conferee 203A is present in the θ = 180° direction of the audio conference device 1A. In the conference room 100B, the conferee 202B is present in the θ = 90° direction of the audio conference device 1B, and the conferee 204B is present in the θ = 270° direction of the audio conference device 1B. [0048] When the conferee 201A in the conference room 100A speaks, the voice 301A is collected by the audio conference device 1A. Since the voice 301A is mainly collected by the microphones MC8, MC9 and MC10, the collected sound beam signal MB5 formed from the collected sound signals of these microphones MC8, MC9 and MC10 becomes equal to or greater than the predetermined threshold.
The output sound collection beam signal MBS consisting of the sound collection beam signal MB5 is echo-cancelled and transmitted to the audio conference device 1B as voice communication data together with the speaker direction information Pm of θ = 0°. [0049] On receiving the voice communication data from the audio conference device 1A, the audio conference device 1B of the conference room 100B extracts the voice data, assigns it to, for example, the channel CH1, and converts it to the sound emission audio signal S1. The audio conference device 1B also extracts the speaker direction information Pm (= Py) from the voice communication data. Since the speaker direction information Py is θ = 0°, the audio conference device 1B sets the virtual sound source 901 in the θ = 0° direction, and reads out the delay adjustment amount D and the gain adjustment amount G for realizing the virtual sound source 901. At this time, for the SP audio signals SPD1 to SPD4 corresponding to the respective speakers SP1 to SP4, the gain adjustment amount G is set such that the amplitudes satisfy SPD1 > SPD2 = SPD4 > SPD3, and the delay adjustment amount D is set such that the delay times satisfy SPD1 < SPD2 = SPD4 < SPD3. The audio conference device 1B emits the SP audio signals SPD1 to SPD4 adjusted in this way from the corresponding speakers SP1 to SP4. As a result of this sound emission, the emitted sound 401A corresponding to the SP audio signal SPD1 is louder than the emitted sounds 402A and 404A corresponding to the SP audio signals SPD2 and SPD4, which in turn are louder than the emitted sound 403A corresponding to the SP audio signal SPD3. Also, the emitted sound 401A is emitted first, then the emitted sounds 402A and 404A, and finally the emitted sound 403A.
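The delay and gain relationships above follow directly from the geometry of a virtual sound source behind the nearest speaker. A sketch of how such adjustment amounts could be derived, assuming speakers SP1 to SP4 at 0°, 90°, 180° and 270° on the housing, a 1/r amplitude law, and illustrative radii and sample rate (none of these constants are specified in the patent):

```python
import numpy as np

C = 343.0      # speed of sound [m/s]
FS = 48000     # sample rate [Hz], assumed
SPEAKER_AZ = np.radians([0, 90, 180, 270])  # SP1..SP4, assumed layout
R_SP = 0.05    # speaker radius on the housing [m], illustrative
R_VS = 0.5     # virtual sound source radius [m], illustrative

def panning_params(theta_deg):
    """Per-speaker delay (samples) and gain realizing a virtual source at theta."""
    th = np.radians(theta_deg)
    vs = R_VS * np.array([np.cos(th), np.sin(th)])
    sp = R_SP * np.stack([np.cos(SPEAKER_AZ), np.sin(SPEAKER_AZ)], axis=1)
    d = np.linalg.norm(sp - vs, axis=1)       # virtual source -> speaker distances
    delay = np.round((d - d.min()) / C * FS).astype(int)  # delay adjustment D
    gain = d.min() / d                                    # gain adjustment G
    return delay, gain

delay, gain = panning_params(0.0)
# For theta = 0 deg: SP1 is closest (no delay, largest gain), SP2 and SP4 are
# symmetric (equal delay and gain), SP3 is farthest (largest delay, smallest gain),
# matching the ordering SPD1 > SPD2 = SPD4 > SPD3 described in paragraph [0049].
```

In practice the patent reads these adjustment amounts out of a sound emission specification table (table 2281) rather than computing them online, but the stored values would follow the same ordering.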
As a result, the conferees 202B and 204B hear the sound as if it were emitted from the virtual sound source 901. The conferees 202B and 204B present at the audio conference device 1B can thus feel as if the conferee 201A, who is present in the θ = 0° direction of the audio conference device 1A, were present there and speaking. [0050] In the above description, one person speaking is taken as an example, but the present invention can also be applied when two or more people speak simultaneously. In this case, the audio conference device on the sound collection side picks up the audio signal of each speaker and attaches speaker direction information to each individually, and the audio conference device on the sound emission side sets multiple virtual sound sources based on the acquired speaker direction information. [0051] In the above description, the case where two audio conference devices communicate has been described, but the above-mentioned setting of virtual sound sources can also be applied when a plurality of audio conference devices, such as three or four, communicate. In this case, a channel may be assigned individually to each audio conference device that is a communication partner, virtual sound sources may be set for the sound emission audio signals of the respective channels, and delay processing and gain control may be performed. More specifically, the sound emission audio signal S1 assigned to the channel CH1 is subjected to delay processing and gain control based on the speaker direction information given by the first audio conference device corresponding to the channel CH1, and an audio signal for each of the speakers SP1 to SP4 is generated.
Similarly, the sound emission audio signal S2 assigned to the channel CH2 is subjected to delay processing and gain control based on the speaker direction information given by the second audio conference device corresponding to the channel CH2, and an audio signal for each of the speakers SP1 to SP4 is generated. Further, the sound emission audio signal S3 assigned to the channel CH3 is subjected to delay processing and gain control based on the speaker direction information given by the third audio conference device corresponding to the channel CH3, and an audio signal for each of the speakers SP1 to SP4 is generated. The audio signals generated in this way for each of the speakers SP1 to SP4 are combined to produce the SP audio signals SPD1 to SPD4, which are emitted from the speakers SP1 to SP4. As a result, a conferee present at the sound-emitting audio conference device can hear the speech as if the conferees present at the other first to third audio conference devices were present there. [0052] Further, in the above description, the virtual sound source was set from the azimuth alone, with the distance from the housing kept constant; however, the virtual sound source may be set using both the azimuth and the distance. In this case, the audio conference device on the sound collection side calculates the distance between each microphone and the speech position from the delay relationships of the sound signals collected by the microphones, and can specify the speech position by using at least three such distances. Then, the audio conference device on the sound collection side generates speaker direction information including distance information together with the azimuth information, and transmits it to the audio conference device on the sound emission side.
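Paragraph [0052] states that the speech position can be specified from at least three microphone-to-speaker distances, but does not spell out the computation. One standard approach is to linearize the range equations against a reference microphone and solve by least squares; a sketch under that assumption:

```python
import numpy as np

def locate_speaker(mic_pos, dists):
    """Estimate a 2-D speech position from >= 3 microphone positions and the
    distances derived from inter-microphone delay relationships.

    Subtracting the range equation of the first microphone from the others
    removes the quadratic terms, leaving a linear system in the position.
    """
    p0, d0 = mic_pos[0], dists[0]
    A = 2 * (mic_pos[1:] - p0)
    b = (d0 ** 2 - dists[1:] ** 2
         + np.sum(mic_pos[1:] ** 2, axis=1) - np.sum(p0 ** 2))
    pos, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pos

# Example: four microphones on the housing, speaker at a known test position.
mics = np.array([[0.1, 0.0], [0.0, 0.1], [-0.1, 0.0], [0.0, -0.1]])
true_pos = np.array([0.8, 0.6])
dists = np.linalg.norm(mics - true_pos, axis=1)
est = locate_speaker(mics, dists)  # recovers approximately (0.8, 0.6)
```

The recovered position directly yields both the azimuth and the distance that the sound-collection-side device would include in the speaker direction information.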
The audio conference device on the sound emission side sets a virtual sound source determined from the azimuth and the distance based on the received speaker direction information, and performs delay processing and gain control to realize that virtual sound source. This makes it possible to reproduce the speech position (the position of the speaker who made the statement) more realistically. [0053] BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 is a block diagram of the audio conference system of the embodiment of the present invention. FIG. 2 is an external view of the audio conference apparatus used in the audio conference system of the embodiment of the present invention. FIG. 3 is a functional block diagram of the audio conference device shown in FIG. 2. FIG. 4 is a block diagram showing the main configuration of the sound emission control unit 22 of the embodiment of the present invention. FIG. 5 shows the distribution of the virtual sound sources set by the sound emission specification control unit 220 and the contents of the sound emission specification table 2281. FIG. 6 is a block diagram showing the main configuration of the sound collection control unit 27. FIG. 7 is a diagram explaining the sound emission state when the conferee 201A speaks in the situation shown in FIG. 1.
EXPLANATION OF REFERENCE NUMERALS [0054] 1, 1A, 1B: audio conference device; 11: case; 12: recess; 21: communication control unit; 22: sound emission control unit; 221-223: individual sound emission signal generating unit; 224-227: signal combining unit; 23: D/A converter; 24: sound emission amplifier; 25: sound collection amplifier; 26: A/D converter; 27: sound collection control unit; 271: azimuthal sound collection beam generation unit; 272: output data determination unit; 28: echo cancellation unit; 29: operation unit; 100A, 100B: conference room; 101A, 101B: conference table; 201A, 203A, 202B, 204B: conferee; 301A: voice (collected voice); 401A, 402A, 403A, 404A: voice (emitted voice); 500: network; 901-908: virtual sound source; SP1-SP4: speaker; MC1-MC16: microphone