Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JPWO2014017134
Abstract: To provide an information processing system and a storage medium capable of providing a sense of immersion in a third space when the space around a user interacts with another space. The information processing system includes: a recognition unit that recognizes a first object and a second object based on signals detected by a plurality of sensors arranged around a specific user; an identification unit that identifies the first and second objects recognized by the recognition unit; an estimation unit that estimates the position of the specific user according to a signal detected by any of the plurality of sensors; and a signal processing unit that processes each signal acquired from sensors around the first and second objects identified by the identification unit so that, when output from a plurality of actuators arranged around the specific user, the signals are localized near the position of the specific user estimated by the estimation unit. [Selected figure] Figure 1
Information processing system and storage medium
[0001]
The present disclosure relates to an information processing system and a storage medium.
[0002]
In recent years, various techniques have been proposed in the field of data communication.
For example, Patent Document 1 below proposes a technique related to a machine-to-machine (M2M) solution. Specifically, the remote management system described in Patent Document 1 uses the Internet Protocol (IP) Multimedia Subsystem (IMS) platform to publish presence information of devices and, through instant messaging between the user and the device, realizes interaction between an authorized user client (UC) and a machine client (DC).
[0003]
Meanwhile, various array speakers capable of forming an acoustic beam have been developed in the field of acoustic technology. For example, Patent Document 2 below describes an array speaker in which a plurality of speakers sharing a common wavefront are attached to a single cabinet, and the delay amount and level of the sound emitted from each speaker are controlled. Patent Document 2 also notes that array microphones based on the same principle have been developed: by adjusting the level and delay amount of the output signal of each microphone, the sound collection point can be set arbitrarily, which enables efficient sound collection.
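As a concrete illustration of the delay-and-sum principle behind such array speakers, the following is a minimal Python sketch (not taken from either patent document; the function name, array layout, and the 343 m/s sound-speed default are illustrative assumptions). Each speaker feed is delayed so that the emitted wavefronts arrive at a chosen focus point simultaneously:

```python
import numpy as np

def delay_and_sum_feeds(signal, speaker_positions, focus_point, fs, c=343.0):
    """Compute per-speaker feeds that steer an acoustic beam toward a focus point.

    `signal` is a mono NumPy array, `speaker_positions` an (N, 3) array of
    coordinates in meters, `fs` the sample rate in Hz.
    """
    dists = np.linalg.norm(speaker_positions - focus_point, axis=1)
    # Delay each speaker relative to the farthest one so all delays are >= 0
    # and every wavefront arrives at the focus point at the same instant.
    delays = (dists.max() - dists) / c                    # seconds
    delay_samples = np.round(delays * fs).astype(int)
    feeds = np.zeros((len(speaker_positions), len(signal) + delay_samples.max()))
    for i, d in enumerate(delay_samples):
        feeds[i, d:d + len(signal)] = signal / len(speaker_positions)
    return feeds
```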
[0004]
JP 2006-279565 A JP 2008-543137 A
[0005]
However, Patent Documents 1 and 2 described above make no reference to techniques or communication methods for arranging a large number of image sensors, microphones, speakers, and the like over a wide area as a means of realizing an extension of the user's body.
[0006]
Therefore, the present disclosure proposes a new and improved information processing system and storage medium capable of providing a sense of immersion in a third space when the space around the user interacts with another space.
[0007]
According to the present disclosure, an information processing system is proposed that includes: a recognition unit that recognizes a first object and a second object based on signals detected by a plurality of sensors arranged around a specific user; an identification unit that identifies the first object and the second object recognized by the recognition unit; an estimation unit that estimates the position of the specific user according to a signal detected by any of the plurality of sensors; and a signal processing unit that processes each signal acquired from sensors around the first and second objects identified by the identification unit so that, when output from a plurality of actuators arranged around the specific user, the signals are localized near the position of the specific user estimated by the estimation unit.
[0008]
According to the present disclosure, an information processing system is proposed that includes: a recognition unit that recognizes a first object and a second object based on signals detected by sensors in the vicinity of a specific user; an identification unit that identifies the first object and the second object recognized by the recognition unit; and a signal processing unit that generates, based on signals acquired from a plurality of sensors arranged around the first and second objects identified by the identification unit, a signal to be output from actuators around the specific user.
[0009]
According to the present disclosure, a storage medium is proposed that stores a program for causing a computer to function as: a recognition unit that recognizes a first object and a second object based on signals detected by a plurality of sensors arranged around a specific user; an identification unit that identifies the first and second objects recognized by the recognition unit; an estimation unit that estimates the position of the specific user according to a signal detected by any of the plurality of sensors; and a signal processing unit that processes each signal acquired from sensors around the first and second objects identified by the identification unit so that, when output from a plurality of actuators arranged around the specific user, the signals are localized near the position of the specific user estimated by the estimation unit.
[0010]
According to the present disclosure, a storage medium is proposed that stores a program for causing a computer to function as: a recognition unit that recognizes a first object and a second object based on signals detected by sensors in the vicinity of a specific user; an identification unit that identifies the first and second objects recognized by the recognition unit; and a signal processing unit that generates, based on signals acquired from a plurality of sensors arranged around the first and second objects identified by the identification unit, a signal to be output from actuators around the specific user.
[0011]
As described above, according to the present disclosure, it is possible to provide a sense of immersion in a third space when the space around the user interacts with another space.
[0012]
A diagram for describing an overview of an acoustic system according to an embodiment of the present disclosure.
A diagram showing the system configuration of the acoustic system according to the embodiment.
A block diagram showing the configuration of a signal processing device according to the embodiment.
A diagram for describing the shape of the acoustic closed surface according to the embodiment.
A block diagram showing the configuration of a management server according to the embodiment.
A flowchart showing the basic processing of the sound system according to the embodiment.
A flowchart showing the command recognition processing according to the embodiment.
A flowchart showing the sound collection processing according to the embodiment.
A diagram for describing the sound field construction of the third space according to the embodiment.
A diagram for describing the sound field construction methods of site C.
A block diagram showing another configuration of the management server according to the embodiment.
A diagram for describing the measurement of acoustic parameters.
A diagram comparing the arrangement of a plurality of microphones in a measurement environment with the arrangement of a plurality of speakers in a listening environment.
A diagram for describing the shape of the closed surface in the measurement environment according to the embodiment.
A block diagram showing the configuration of a sound field reproduction signal processing unit that constructs a sound field so as to provide a sense of immersion in site C.
A diagram for describing the measurement of the impulse response at site C.
A diagram for describing the calculation using the impulse response group by the Matrix Convolution unit according to the embodiment.
A flowchart showing the sound field reproduction processing according to the embodiment.
A diagram for describing the case where the sound field constructed at site B is fixed.
A diagram for describing the case where the sound field constructed at site B moves.
A diagram for describing the measurement in the measurement target space.
A diagram for describing the measurement in an anechoic room.
A diagram for describing the reproduction in the reproduction target space.
A diagram showing another system configuration of the acoustic system according to the embodiment.
A diagram showing an example of the system configuration of an autonomous acoustic system according to the embodiment.
A block diagram showing the configuration of a device of the autonomous acoustic system according to the embodiment.
A flowchart showing the operation processing of the autonomous acoustic system according to the embodiment.
A diagram for describing the change of the operating device according to the movement of the user in the autonomous acoustic system according to the embodiment.
A diagram for describing the case where services are provided to a plurality of users in the autonomous acoustic system according to the embodiment.
[0013]
Hereinafter, preferred embodiments of the present disclosure will be described in detail with
reference to the accompanying drawings. In the present specification and the drawings,
components having substantially the same functional configuration will be assigned the same
reference numerals and redundant description will be omitted.
[0014]
The description will be made in the following order.
1. Overview of the sound system according to an embodiment of the present disclosure
2. Basic configuration
2-1. System configuration
2-2. Signal processing device
2-3. Management server
3. Operation processing
3-1. Basic processing
3-2. Command recognition processing
3-3. Sound collection processing
4. Sound field construction of the third space
4-1. Configuration of the management server
4-2. Configuration of the sound field reproduction signal processing unit
4-3. Sound field reproduction processing
5. Supplement
6. Conclusion
[0015]
<1. Overview of Sound System According to One Embodiment of the Present Disclosure>
First, an overview of the sound system (information processing system) according to one embodiment of the present disclosure will be described with reference to FIG. 1. FIG. 1 is a diagram for describing an overview of the acoustic system according to an embodiment of the present disclosure. As shown in FIG. 1, the acoustic system according to the present embodiment assumes a situation in which a large number of sensors and actuators, such as microphones 10, image sensors (not shown), and speakers 20, are arranged everywhere in the world: in rooms, houses, and buildings, outdoors, and across regions and countries.
[0016]
In the example shown in FIG. 1, a plurality of microphones 10A are arranged, as an example of a plurality of sensors, on roads and the like in the outdoor area "site A" where user A is currently present, and a plurality of speakers 20A are arranged as an example of a plurality of actuators. Further, in the indoor area "site B" where user B is currently present, a plurality of microphones 10B and a plurality of speakers 20B are arranged on the walls, floor, ceiling, and the like. At sites A and B, human sensors and image sensors (not shown) may further be arranged as examples of sensors.
[0017]
Here, site A and site B can be connected via a network, and the signals input and output by each microphone and speaker at site A and the signals input and output by each microphone and speaker at site B are transmitted and received between the sites.
[0018]
Thereby, the sound system according to the present embodiment can reproduce, in real time, the voice and image corresponding to a predetermined target (a person, place, building, etc.) through the plurality of speakers and displays arranged around the user.
The sound system according to the present embodiment can also collect the user's voice with the plurality of microphones arranged around the user and reproduce it in real time around the predetermined target. In this way, the acoustic system according to the present embodiment allows the space around the user to interact with another space.
[0019]
In addition, by using the microphones 10, speakers 20, image sensors, and the like arranged everywhere indoors and outdoors, the user's bodily functions (mouth, eyes, ears, etc.) can be substantially extended over a wide area, and a new communication method can be realized.
[0020]
Furthermore, since microphones, image sensors, and the like are arranged everywhere in the sound system according to the present embodiment, the user does not need to carry a smartphone or mobile phone terminal; the user can indicate a predetermined target by voice or gesture and be connected to the space around that target.
Hereinafter, the application of the sound system according to the present embodiment to a case where user A at site A wants to talk with user B at site B will be briefly described.
[0021]
(Data Collection Processing) At site A, data collection processing is continuously performed by the plurality of microphones 10A, image sensors (not shown), human sensors (not shown), and the like. Specifically, the sound system according to the present embodiment collects the voices picked up by the plurality of microphones 10A, the captured images from the image sensors, and the detection results of the human sensors, and uses them to estimate the position of the user.
[0022]
The sound system according to the present embodiment may select a microphone group capable of sufficiently collecting the user's voice, based on the position information of the plurality of microphones 10A registered in advance and the estimated position of the user. The sound system then performs microphone array processing on the group of audio signal streams collected by the selected microphones. In particular, the sound system may apply a delay-and-sum array whose sound collection point is focused on the mouth of user A, thereby forming the superdirectivity of the array microphone. Therefore, even a quiet utterance such as a murmur by user A can be collected.
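On the pickup side, the same delay-and-sum idea runs in reverse: each microphone channel is time-aligned to the propagation delay from the focus point before summation, so speech from the mouth adds coherently while off-focus sound adds incoherently. A minimal Python sketch follows (names are illustrative; it assumes a hypothetical setup in which all channels are sample-synchronized):

```python
import numpy as np

def focus_microphones(mic_signals, mic_positions, mouth_position, fs, c=343.0):
    """Delay-and-sum beamforming toward a sound collection point.

    `mic_signals` is an (N, T) array of synchronized microphone recordings,
    `mic_positions` an (N, 3) array of coordinates in meters.
    """
    dists = np.linalg.norm(mic_positions - mouth_position, axis=1)
    # Advance each channel by its extra path delay relative to the nearest mic.
    delays = np.round((dists - dists.min()) / c * fs).astype(int)
    n = mic_signals.shape[1] - delays.max()
    aligned = np.stack([sig[d:d + n] for sig, d in zip(mic_signals, delays)])
    return aligned.mean(axis=0)   # coherent sum = focused pickup
```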
[0023]
Further, the sound system according to the present embodiment recognizes a command based on the collected voice of user A and executes operation processing according to the command. For example, when user A at site A murmurs "I want to talk to Mr. B", a "call request for user B" is recognized as a command. In this case, the sound system identifies the current position of user B and connects the site B where user B currently is with the site A where user A currently is. Thereby, user A can talk with user B.
[0024]
(Object Decomposition Processing) During a call, the audio signals (stream data) collected by the plurality of microphones at site A undergo object decomposition processing such as sound source separation (separating the noise components around user A and the conversations of people around user A), dereverberation, and noise/echo processing. As a result, stream data with a good S/N ratio and reduced reverberation is sent to site B.
[0025]
Even if user A talks while moving, the acoustic system according to the present embodiment can cope with this by continuously performing the data collection described above. Specifically, the sound system continuously collects data from the plurality of microphones, image sensors, human sensors, and the like, and tracks user A's movement path and direction. The sound system then continuously updates the selection of an appropriate microphone group arranged around the moving user A, and continuously performs array microphone processing so that the sound collection point always stays at the mouth of the moving user A. Thereby, the sound system can handle the case where user A speaks while moving.
[0026]
In addition to the audio stream data, user A's moving direction, orientation, and the like are converted into metadata and sent to site B together with the stream data.
[0027]
Then, the stream data sent to site B is reproduced from the speakers arranged around user B at site B.
At this time, the sound system according to the present embodiment collects data at site B using a plurality of microphones, image sensors, and human sensors, estimates the position of user B based on the collected data, and selects an appropriate speaker group that surrounds user B with an acoustic closed surface. The stream data sent to site B is reproduced from the speaker group thus selected, and the area inside the acoustic closed surface is controlled as an appropriate sound field. In this specification, a surface formed by connecting the positions of a plurality of adjacent speakers or microphones so as to surround a certain target (for example, a user) is conceptually called an "acoustic closed surface". The acoustic closed surface does not necessarily constitute a completely closed surface, and may have any shape that approximately surrounds the target (for example, the user).
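The speaker-group selection just described can be illustrated with a small sketch (hypothetical; the radius, minimum count, and all names are assumptions, since the text does not specify a selection rule):

```python
import numpy as np

def select_enclosing_speakers(speaker_positions, user_position,
                              radius=2.0, minimum=4):
    """Pick the speakers whose positions surround the user.

    Takes every speaker within `radius` meters of the estimated user
    position; if too few are found, falls back to the `minimum` nearest
    ones so some approximately closed surface can still be formed.
    """
    dists = np.linalg.norm(speaker_positions - user_position, axis=1)
    chosen = np.where(dists <= radius)[0]
    if len(chosen) < minimum:
        chosen = np.argsort(dists)[:minimum]
    return chosen  # indices into speaker_positions
```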
[0028]
The sound field here may be arbitrarily selectable by user B. For example, in the sound system according to the present embodiment, when user B designates site A as the sound field, the environment of site A is reproduced at site B. Specifically, the environment of site A is reproduced at site B based on, for example, sound information collected in real time as ambient sound, and meta-information about site A acquired in advance.
[0029]
The sound system according to the present embodiment can also control the sound image of user A by using the plurality of speakers 20B arranged around user B at site B. That is, by forming an array speaker (beamforming), the acoustic system can reproduce the voice (sound image) of user A at user B's ear or outside the acoustic closed surface. In addition, by using the metadata of user A's movement path and direction, the sound system may move the sound image of user A around user B at site B in accordance with the actual movement of user A.
[0030]
The outline of the voice communication from site A to site B has been described above, divided into the steps of data collection processing, object decomposition processing, and object synthesis processing; naturally, the same processing is performed for voice communication from site B to site A. This enables two-way voice communication between site A and site B.
[0031]
The outline of the sound system (information processing system) according to an embodiment of
the present disclosure has been described above. Subsequently, the configuration of the acoustic
system according to the present embodiment will be described in detail with reference to FIGS.
[0032]
<2. Basic configuration> [2-1. System Configuration] FIG. 2 is a view showing the overall
configuration of an acoustic system according to the present embodiment. As shown in FIG. 2,
the audio system includes a signal processing device 1A, a signal processing device 1B, and a
management server 3.
[0033]
The signal processing device 1A and the signal processing device 1B are connected to the network 5 by wired or wireless communication, and can transmit and receive data to and from each other via the network 5. The management server 3 is also connected to the network 5, and the signal processing devices 1A and 1B can transmit and receive data to and from the management server 3 as well.
[0034]
The signal processing device 1A processes the signals input and output by the plurality of microphones 10A and the plurality of speakers 20A arranged at site A. The signal processing device 1B processes the signals input and output by the plurality of microphones 10B and the plurality of speakers 20B arranged at site B. When it is not necessary to distinguish between the signal processing devices 1A and 1B, they are collectively referred to as the signal processing device 1.
[0035]
The management server 3 has functions of authenticating users and managing each user's absolute position (current position). The management server 3 may further manage information (such as an IP address) indicating the position of a place or a building.
[0036]
Thereby, the signal processing device 1 can query the management server 3 and acquire the connection destination information (IP address, etc.) of a predetermined target (a person, place, building, etc.) specified by the user.
[0037]
[2-2. Signal Processing Device] Next, the configuration of the signal processing device 1 according to the present embodiment will be described in detail. FIG. 3 is a block diagram showing the configuration of the signal processing device 1 according to the present embodiment. As shown in FIG. 3, the signal processing device 1 according to the present embodiment has a plurality of microphones 10 (array microphone), an amplifier/ADC (analog-to-digital converter) unit 11, a signal processing unit 13, a microphone position information DB (database) 15, a user position estimation unit 16, a recognition unit 17, an identification unit 18, a communication I/F (interface) 19, a speaker position information DB 21, a DAC (digital-to-analog converter)/amplifier unit 23, and a plurality of speakers 20 (array speaker). Each component will be described below.
[0038]
(Array Microphone) As described above, the plurality of microphones 10 are arranged throughout an area (site). For example, outdoors they are placed on roads, telephone poles, streetlights, the outer walls of houses and buildings, and the like; indoors they are placed on floors, walls, ceilings, and the like. The plurality of microphones 10 pick up the surrounding sound and output it to the amplifier/ADC unit 11.
[0039]
(Amplifier/ADC Unit) The amplifier/ADC unit 11 has a function of amplifying the sound waves output from each of the plurality of microphones 10 (amplifier) and a function of converting the sound waves (analog data) into audio signals (digital data) (analog-to-digital converter). The amplifier/ADC unit 11 outputs the converted audio signals to the signal processing unit 13.
[0040]
(Signal Processing Unit) The signal processing unit 13 has a function of processing the audio signals collected by the microphones 10 and sent through the amplifier/ADC unit 11, and the audio signals to be reproduced from the speakers 20 through the DAC/amplifier unit 23. The signal processing unit 13 according to the present embodiment functions as a microphone array processing unit 131, a high S/N processing unit 133, and a sound field reproduction signal processing unit 135.
[0041]
Microphone array processing unit The microphone array processing unit 131 performs directivity control, as microphone array processing on the plurality of audio signals output from the amplifier/ADC unit 11, so as to focus on the user's voice (so that the sound collection position is at the user's mouth).
[0042]
At this time, the microphone array processing unit 131 may select a microphone group that forms an acoustic closed surface containing the user and is optimal for collecting the user's voice, based on the position of the user estimated by the user position estimation unit 16 and the position of each microphone 10 registered in the microphone position information DB 15.
The microphone array processing unit 131 then performs directivity control on the audio signals acquired by the selected microphone group. The microphone array processing unit 131 may also form the superdirectivity of the array microphone by delay-and-sum array processing and null generation processing.
[0043]
High S/N processing unit The high S/N processing unit 133 has a function of processing the plurality of audio signals output from the amplifier/ADC unit 11 into a monaural signal with high clarity and a good S/N ratio. Specifically, the high S/N processing unit 133 separates the sound sources and performs dereverberation and noise reduction.
[0044]
The high S/N processing unit 133 may be provided downstream of the microphone array processing unit 131. The audio signal (stream data) processed by the high S/N processing unit 133 is used for speech recognition by the recognition unit 17 and is transmitted to the outside via the communication I/F 19.
[0045]
Sound field reproduction signal processing unit The sound field reproduction signal processing unit 135 performs signal processing on the audio signals to be reproduced from the plurality of speakers 20, and controls the sound field so that it is localized near the position of the user. Specifically, for example, the sound field reproduction signal processing unit 135 selects the optimal speaker group forming an acoustic closed surface containing the user, based on the position of the user estimated by the user position estimation unit 16 and the position of each speaker 20 registered in the speaker position information DB 21. The sound field reproduction signal processing unit 135 then writes the signal-processed audio signal to the output buffers of the plurality of channels corresponding to the selected speaker group.
[0046]
Further, the sound field reproduction signal processing unit 135 controls the area inside the acoustic closed surface as an appropriate sound field. Known methods for controlling such a sound field include, for example, the Kirchhoff-Helmholtz integral and the Rayleigh integral, and the wave field synthesis (WFS) method based on them is generally known. The sound field reproduction signal processing unit 135 may also apply the signal processing techniques described in Japanese Patent No. 4674505 and Japanese Patent No. 4735108.
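As a rough illustration of the idea behind such wavefront-based rendering (a simplified sketch, not the WFS driving functions of the cited patents; all names and the 1/r gain model are assumptions), each speaker can reproduce the source signal with the delay and attenuation it would observe from a virtual source position, so the wavefront inside the array approximates that of a real source at that point:

```python
import numpy as np

def virtual_source_feeds(signal, speaker_positions, source_position, fs, c=343.0):
    """Render a virtual point source at `source_position` through an array.

    Each speaker plays the signal delayed by its distance to the virtual
    source and scaled by 1/r, approximating the spherical wavefront a real
    source at that position would produce.
    """
    r = np.linalg.norm(speaker_positions - source_position, axis=1)
    delay_samples = np.round(r / c * fs).astype(int)
    gains = 1.0 / np.maximum(r, 0.1)          # avoid blow-up very close by
    out = np.zeros((len(speaker_positions), len(signal) + delay_samples.max()))
    for i, (d, g) in enumerate(zip(delay_samples, gains)):
        out[i, d:d + len(signal)] = g * signal
    return out
```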
[0047]
The shape of the acoustic closed surface formed by the above-described microphones or speakers is not particularly limited as long as it is a three-dimensional shape surrounding the user; for example, it may be an elliptical acoustic closed surface 40-1 as shown in FIG. 4, a cylindrical acoustic closed surface 40-2, or a polygonal acoustic closed surface 40-3. The example shown in FIG. 4 illustrates the shape of the acoustic closed surface formed by the plurality of speakers 20B-1 to 20B-12 arranged around user B at site B; the same applies to the shape of the acoustic closed surface formed by the plurality of microphones 10.
[0048]
(Microphone Position Information DB) The microphone position information DB 15 is a storage
unit that stores position information of a plurality of microphones 10 disposed at a site. Position
information of the plurality of microphones 10 may be registered in advance.
[0049]
(User Position Estimation Unit) The user position estimation unit 16 has a function of estimating the position of the user. Specifically, the user position estimation unit 16 estimates the relative position of the user with respect to the plurality of microphones 10 or the plurality of speakers 20, based on the analysis results of the sound collected by the plurality of microphones 10, the analysis results of the captured images from the image sensors, or the detection results of the human sensors. The user position estimation unit 16 may also acquire GPS (Global Positioning System) information and estimate the absolute position (current position information) of the user.
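One crude way to realize such an estimate from the microphone signals alone is an energy-weighted centroid; this is a sketch under the assumption that microphones nearer the talker receive more energy (a real system would refine it with arrival-time differences, image sensors, or human sensors, as the text describes):

```python
import numpy as np

def estimate_user_position(mic_signals, mic_positions):
    """Rough user position from microphone energies.

    `mic_signals` is an (N, T) array of recordings, `mic_positions` an
    (N, 3) array of coordinates.  Returns the RMS-weighted centroid of the
    microphone positions as a coarse location estimate.
    """
    rms = np.sqrt((np.asarray(mic_signals, dtype=float) ** 2).mean(axis=1))
    weights = rms / rms.sum()
    return (mic_positions * weights[:, None]).sum(axis=0)
```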
[0050]
(Recognition Unit) The recognition unit 17 analyzes the user's voice based on the audio signals collected by the plurality of microphones 10 and processed by the signal processing unit 13, and recognizes a command. For example, the recognition unit 17 morphologically analyzes the user's utterance "I want to talk to Mr. B" and recognizes a call request command based on the predetermined target "B" designated by the user and the request "talk".
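The mapping from a recognized utterance to a command can be as simple as comparing against registered request patterns; the following is a hypothetical sketch (the patterns and names are illustrative assumptions, not the patent's method):

```python
import re

# Hypothetical registered request patterns; the text leaves the matching open.
COMMAND_PATTERNS = [
    (re.compile(r"i want to (talk|speak) (to|with) (?P<target>.+)", re.I),
     "call_request"),
]

def recognize_command(transcript):
    """Map a recognized utterance to a (command, target) pair.

    Example: "I want to talk to Mr. B" -> ("call_request", "Mr. B").
    Returns None when no registered pattern matches.
    """
    for pattern, command in COMMAND_PATTERNS:
        m = pattern.search(transcript)
        if m:
            return command, m.group("target").strip()
    return None
```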
[0051]
(Identification Unit) The identification unit 18 has a function of identifying the predetermined target recognized by the recognition unit 17. Specifically, the identification unit 18 may, for example, determine the connection destination information for acquiring the voice or image corresponding to the predetermined target. The identification unit 18 may, for example, transmit information indicating the predetermined target from the communication I/F 19 to the management server 3, and acquire the connection destination information (IP address or the like) corresponding to the predetermined target from the management server 3.
[0052]
(Communication I/F) The communication I/F 19 is a communication module for transmitting and receiving data to and from another signal processing device or the management server 3 through the network 5. For example, the communication I/F 19 according to the present embodiment queries the management server 3 for the connection destination information corresponding to the predetermined target, and transmits the audio signal that was collected by the microphones 10 and processed by the signal processing unit 13 to the other signal processing device serving as the connection destination.
[0053]
(Speaker Position Information DB) The speaker position information DB 21 is a storage unit that
stores position information of a plurality of speakers 20 disposed at a site. Position information
of the plurality of speakers 20 may be registered in advance.
[0054]
(DAC/Amplifier Unit) The DAC/amplifier unit 23 has a function of converting the audio signals (digital data) written in the output buffer of each channel into sound waves (analog data) for reproduction from the plurality of speakers 20 (digital-to-analog converter).
[0055]
Further, the DAC/amplifier unit 23 amplifies the converted sound waves and reproduces (outputs) them from each of the plurality of speakers 20.
[0056]
(Array Speakers) As described above, the plurality of speakers 20 are arranged throughout an area (site).
For example, outdoors they are placed on roads, telephone poles, streetlights, the outer walls of houses and buildings, and the like; indoors they are placed on floors, walls, ceilings, and the like. The plurality of speakers 20 reproduce the sound waves (sounds) output from the DAC/amplifier unit 23.
[0057]
The configuration of the signal processing device 1 according to the present embodiment has
been described above in detail. Subsequently, the configuration of the management server 3
according to the present embodiment will be described with reference to FIG.
[0058]
[2-3. Management Server] FIG. 5 is a block diagram showing the configuration of the
management server 3 according to the present embodiment. As illustrated in FIG. 5, the
management server 3 includes a management unit 32, a search unit 33, a user position
information DB 35, and a communication I / F 39. Each component will be described below.
[0059]
(Management Unit) The management unit 32 manages information on the place (site) where each user currently is, based on the user ID (identification) or the like transmitted from the signal processing device 1. For example, the management unit 32 identifies the user based on the user ID, and stores the IP address or the like of the transmitting signal processing device 1 in the user position information DB 35 as connection destination information, in association with the identified user name or the like. The user ID may include a name, a personal identification number, or biometric information. The management unit 32 may also perform user authentication processing based on the transmitted user ID.
[0060]
(User Position Information DB) The user position information DB 35 is a storage unit that stores information on the place where each user currently is, in accordance with the management by the management unit 32. Specifically, the user position information DB 35 stores the user's ID and the connection destination information (such as the IP address of the signal processing device corresponding to the site where the user is present) in association with each other. The current position information of each user may be updated continuously.
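A minimal in-memory sketch of such a table follows (hypothetical class and field names; a real implementation would persist the table and tie registration to the authentication processing described above):

```python
from dataclasses import dataclass

@dataclass
class ConnectionInfo:
    signal_processor_ip: str   # IP address of the site's signal processing device
    site: str                  # site where the user currently is

class UserPositionDB:
    """Maps a user ID to the connection destination of the user's current site.

    Entries are overwritten whenever the user moves to another site.
    """
    def __init__(self):
        self._table = {}

    def register(self, user_id, info: ConnectionInfo):
        self._table[user_id] = info      # updated as the user moves

    def lookup(self, user_id):
        return self._table.get(user_id)  # None if the user is unknown

# Example: register user B at site B, then resolve a call to user B.
db = UserPositionDB()
db.register("user_b", ConnectionInfo("192.0.2.10", "site_b"))
print(db.lookup("user_b"))
```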
[0061]
(Search Unit) The search unit 33 searches the connection destination information by referring to
the user position information DB 35 in response to the connection destination (calling
destination) inquiry from the signal processing device 1. Specifically, the search unit 33 searches
the user position information DB 35 for the associated connection destination information based
on the name or the like of the target user included in the connection destination inquiry.
[0062]
(Communication I / F) The communication I / F 39 is a communication module for transmitting
and receiving data to and from the signal processing device 1 through the network 5. For
example, the communication I / F 39 according to the present embodiment receives the ID of the
user from the signal processing device 1 and receives the connection destination inquiry. Also,
the communication I / F 39 transmits connection destination information of the target user in
response to the connection destination inquiry.
[0063]
Hereinabove, each configuration of the acoustic system according to the embodiment of the
present disclosure has been described in detail. Next, the operation processing of the sound
system according to the present embodiment will be described in detail with reference to FIGS.
[0064]
<3. Operation processing> [3-1. Basic Processing] FIG. 6 is a flowchart showing the basic processing of the sound system according to the present embodiment. As shown in FIG. 6, first, in step S103, the signal processing device 1A transmits the ID of user A, who is at site A, to the management server 3. The signal processing device 1A may acquire user A's ID from a tag such as an RFID (radio frequency identification) tag carried by user A, or may recognize it from user A's voice. The signal processing device 1A may also read biometric information from user A's body (face, eyes, hands, etc.) and acquire it as the ID.
[0065]
On the other hand, in step S106, the signal processing apparatus 1B similarly transmits the ID of
the user B who is at the site B to the management server 3.
[0066]
Next, in step S109, the management server 3 identifies each user based on the user ID transmitted from each signal processing device 1, and registers the IP address or the like of the transmitting signal processing device 1 as connection destination information, in association with the user name or the like.
[0067]
Next, in step S112, the signal processing device 1B estimates the position of the user B who is at
the site B.
Specifically, the signal processing device 1B estimates the relative position of the user B with
respect to the plurality of microphones disposed at the site B.
[0068]
Next, in step S115, the signal processing device 1B performs microphone array processing on the audio signals collected by the plurality of microphones arranged at site B, based on the estimated relative position of user B, so that the sound collection position is focused on user B's mouth.
In this way, the signal processing device 1B is prepared for user B to speak.
[0069]
On the other hand, in step S118, the signal processing device 1A similarly performs microphone array processing on the audio signals collected by the plurality of microphones arranged at site A so that the sound collection position is focused on user A's mouth, and prepares for user A to make an utterance. The signal processing device 1A then recognizes a command based on user A's voice (utterance). Here, as an example, the description continues for the case where user A utters "I want to talk to Mr. B" and the signal processing device 1A recognizes it as a "call request for user B" command. The command recognition processing according to the present embodiment will be described in detail in [3-2. Command Recognition Processing] below.
[0070]
Next, in step S121, the signal processing device 1A sends a connection destination inquiry to the management server 3. As described above, when the command is a "call request for user B", the signal processing device 1A inquires about the connection destination information of user B.
[0071]
Next, in step S125, the management server 3 searches for connection destination information of
the user B in response to the connection destination inquiry from the signal processing device
1A, and transmits the search result to the signal processing device 1A in subsequent step S126.
[0072]
Next, in step S127, the signal processing apparatus 1A identifies (determines) the connection
destination based on the connection destination information of the user B received from the
management server 3.
[0073]
Next, in step S128, the signal processing device 1A performs call processing to the signal processing device 1B based on the identified connection destination information of user B, for example, the IP address of the signal processing device 1B corresponding to the site B where user B is currently present.
[0074]
Next, in step S131, the signal processing device 1B outputs a message asking user B whether to answer the call from user A (call notification).
Specifically, for example, the signal processing device 1B may reproduce the message from a speaker arranged around user B.
The signal processing device 1B also recognizes user B's answer to the call notification based on user B's voice collected by the plurality of microphones arranged around user B.
[0075]
Next, in step S134, the signal processing device 1B transmits user B's answer to the signal processing device 1A.
Here, user B gives an OK answer, and two-way communication between user A (the signal processing device 1A side) and user B (the signal processing device 1B side) is started.
[0076]
Specifically, in step S137, in order to start communication with the signal processing device 1B, the signal processing device 1A performs sound collection processing in which user A's voice is collected at site A and the audio stream (audio signal) is transmitted to the site B side (the signal processing device 1B side). The sound collection processing according to the present embodiment will be described in detail in [3-3. Sound Collection Processing] below.
[0077]
Then, in step S140, the signal processing device 1B forms an acoustic closed surface containing user B with the plurality of speakers arranged around user B, and performs sound field reproduction processing based on the audio stream transmitted from the signal processing device 1A. In the sound field reproduction processing according to the present embodiment, the sound field of a third space (site C) can further be constructed to provide a sense of immersion in the third space to a user who is in another space and in communication with another user. Such sound field reproduction processing will be described in detail in "4. Sound Field Construction in Third Space" below.
[0078]
Steps S137 to S140 above show one-way communication as an example, but two-way communication is possible in the present embodiment; the signal processing device 1B may likewise perform sound collection processing, and the signal processing device 1A may perform sound field reproduction processing.
[0079]
The basic processing of the sound system according to the present embodiment has been described above.
As a result, user A does not need to carry a mobile phone terminal, smartphone, or the like; simply by uttering "I want to talk to Mr. B", user A can talk with user B, who is in another place, using the microphones and speakers located nearby. Subsequently, the command recognition processing shown in step S118 will be described in detail with reference to FIG. 7.
[0080]
[3-2. Command Recognition Processing] FIG. 7 is a flowchart showing the command recognition processing according to the present embodiment. As shown in FIG. 7, first, in step S203, the user position estimation unit 16 of the signal processing device 1 estimates the position of the user. For example, the user position estimation unit 16 may estimate the user's relative position, orientation, and mouth position with respect to each microphone, based on the sound collected by the plurality of microphones 10, the captured images from the image sensor, and the microphone arrangement stored in the microphone position information DB 15.
[0081]
Next, in step S206, the signal processing unit 13 selects a microphone group forming an acoustic closed surface containing the user, according to the estimated relative position, orientation, and mouth position of the user.
[0082]
Next, in step S209, the microphone array processing unit 131 of the signal processing unit 13 performs microphone array processing on the audio signals collected by the selected microphone group, and controls the directivity of the microphones to focus on the user's mouth.
Thereby, the signal processing device 1 is prepared for the user to speak.
[0083]
Next, in step S212, the high S/N processing unit 133 performs processing such as dereverberation and noise suppression on the audio signal processed by the microphone array processing unit 131 to improve the S/N ratio.
[0084]
Next, in step S215, the recognition unit 17 performs speech recognition (voice analysis) based on
the audio signal output from the high S / N processing unit 133.
[0085]
Then, in step S218, the recognition unit 17 performs command recognition processing based on the recognized voice (audio signal).
The specific content of the command recognition processing is not particularly limited; for example, the recognition unit 17 may recognize a command by comparing the recognized voice with previously registered (learned) request patterns.
[0086]
If no command is recognized in step S218 (S218/No), the signal processing device 1 repeats the processing of steps S203 to S215.
At this time, since steps S203 and S206 are also repeated, the signal processing unit 13 can update the microphone group forming the acoustic closed surface containing the user according to the user's movement.
[0087]
[3-3. Sound Collection Processing] Next, the sound collection processing shown in step S137 of FIG. 6 will be described in detail with reference to FIG. 8. FIG. 8 is a flowchart showing the sound collection processing according to the present embodiment. As shown in FIG. 8, first, in step S308, the microphone array processing unit 131 of the signal processing unit 13 performs microphone array processing on the audio signals collected by the selected/updated microphone group, and controls the directivity of the microphones to focus on the user's mouth.
[0088]
Next, in step S312, the high S/N processing unit 133 performs processing such as dereverberation and noise suppression on the audio signal processed by the microphone array processing unit 131 to improve the S/N ratio.
[0089]
Then, in step S315, the communication I/F 19 transmits the audio signal output from the high S/N processing unit 133 to the connection destination indicated by the connection destination information of the target user identified in step S126 above (see FIG. 6), for example, the signal processing device 1B.
Thereby, the voice uttered by user A at site A is collected by the plurality of microphones arranged around user A and transmitted to the site B side.
[0090]
The command recognition process and the sound collection process according to the present
embodiment have been described above. Subsequently, the sound field reproduction processing
according to the present embodiment will be described in detail.
[0091]
<4. Sound Field Construction in Third Space> As described above, in the sound field reproduction processing according to the present embodiment (step S140 in FIG. 6), the sound field of a third space (site C) is constructed, and it is possible to provide a sense of immersion in the third space to a user who is in another space and in a call with another user. Hereinafter, an outline of the sound field construction for providing such a sense of immersion in the third space will be described with reference to FIG. 9.
[0092]
FIG. 9 is a diagram for explaining the sound field construction of the third space according to the present embodiment. As shown in FIG. 9, when user A at site A and user B at site B talk, the sound system according to the present embodiment constructs the sound field 42 of site C, the third space, at each of sites A and B. Here, as an example, site A, site B, and site C are mutually remote places. Thus, for example, user B in Tokyo (site B) can talk with user A in the United States (site A) while immersed in the space of Italy (site C), where the two are scheduled to travel.
[0093]
Specifically, the sound system according to the present embodiment may construct the sound field 42 of site C by using acoustic information parameters (characteristics such as impulse responses) measured in advance at site C and acoustic content (environmental sound) collected at site C. Such acoustic information parameters and acoustic content of the third space may be acquired in advance in the third space and stored in the management server.
[0094]
(Methods of Sound Field Construction of Site C) Here, the methods for constructing the sound field of site C at each site when user A at site A and user B at site B are talking will be described with reference to FIG. 10. FIG. 10 is a diagram for explaining the sound field construction methods of site C. In the example shown in FIG. 10, as an example, the case where the sound field of site C is constructed (a sense of immersion in site C is provided) at site B, where user B talking with user A is present, will be described.
[0095]
As shown in FIG. 10, in method 1, the voice of user A is processed using the acoustic information parameters so that the sound image of user A is localized outside the acoustic closed surface 40B formed by the plurality of speakers 20B containing user B, and so that user A's voice is felt as if echoing at site C.
[0096]
Here, when the sound image of user A is localized outside the acoustic closed surface 40B at site B, as shown in FIG. 10, the wavefront that the voice emitted by user A outside the acoustic closed surface 40B would have when crossing the acoustic closed surface 40B is assumed. The sound image is then localized by reproducing sound from the plurality of speakers 20B so as to recreate the assumed wavefront inside the acoustic closed surface 40B.
[0097]
Also, assuming that user A emits a voice at site C, user A's voice would reach the acoustic closed surface 40B together with reflected sound (reflections that differ depending on the materials and structures) caused by the structures and obstacles at site C. Therefore, the audio system according to the present embodiment constructs at site B the sound field 42, in which user A's voice is felt as echoing at site C, by processing user A's voice using the acoustic information parameters (impulse responses) measured in advance at site C. Thereby, user B can gain a stronger sense of immersion in site C.
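In signal terms, processing the voice with such an acoustic information parameter amounts to convolving the dry voice with the impulse responses measured at site C. A minimal sketch (function and variable names are illustrative; the per-channel responses are assumed to be of equal length):

```python
import numpy as np
from scipy.signal import fftconvolve

def apply_site_acoustics(voice, impulse_responses):
    """Convolve a dry voice with impulse responses measured at site C.

    `voice` is a mono NumPy array; `impulse_responses` is an
    (n_channels, L) array holding one measured response per reproduction
    channel.  The convolution adds site C's reflections to the voice, so
    it is heard as if echoing at site C.
    """
    return np.stack([fftconvolve(voice, ir) for ir in impulse_responses])
```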
[0098]
In method 2, the voice of user B, who is inside the acoustic closed surface 40B, is picked up, processed using the acoustic information parameters of site C, and reproduced from the plurality of speakers 20B forming the acoustic closed surface 40B. That is, for user B inside the acoustic closed surface 40B to experience the sound field of site C with a sense of reality, to feel more deeply immersed in site C, and to sense the size of the space of site C, not only the processing of the other party's voice (method 1) but also the change of the voice emitted by user B himself or herself is important (echolocation). Therefore, in method 2, the sound field 42 in which the sound emitted by user B is felt as reflected at site C is constructed at site B. Thereby, user B can further obtain the sense of reality of site C and the sense of immersion in site C. A specific way to realize method 2 will be described later with reference to FIGS. 16A and 16B.
[0099]
In method 3, acoustic content such as the background murmur and environmental sound of site C is reproduced from the plurality of speakers 20B forming the acoustic closed surface 40B containing user B, thereby enhancing the sense of presence of site C and the sense of immersion in site C. The acoustic content of site C may be recorded in advance or collected in real time.
[0100]
Three methods of sound field construction for providing a sense of immersion in site C have been described above with reference to FIG. 10. In the sound system according to the present embodiment, the sound field may be constructed by one of the above three methods, or by combining two or more of them.
[0101]
(Designation of Site C) Further, in the present embodiment, the third space (site C) may be arbitrarily designated by the user, or may be a preset location. For example, when user A at site A utters "I want to talk with user B (first target) at site C (second target)", the utterance is collected by the plurality of microphones 10A arranged in the vicinity (see FIG. 1) and recognized as a command by the signal processing device 1A.
[0102]
Next, the signal processing device 1A requests from the management server the connection destination information for calling user B and the sound field construction data of the designated place. The management server then transmits to the signal processing device 1A the connection destination information (here, the IP address or the like of the signal processing device 1B of site B, where user B is present) and the sound field construction data (here, the acoustic information parameters and acoustic content of site C).
[0103]
Further, when communication is started between the signal processing device 1A and the signal processing device 1B (when user B gives an OK answer to the call from user A), the sound field construction data is also sent to the signal processing device 1B. Thereby, the sound field of site C is constructed at both site A and site B, and user A and user B, who are at different sites, can share a sense of immersion in the same place.
[0104]
The outline of the sound field construction for providing a sense of immersion in the third space has been described above. Subsequently, the configuration of a management server that stores the acoustic information parameters and acoustic content of the third space will be described with reference to FIG. 11.
[0105]
[4-1. Configuration of Management Server] FIG. 11 is a block diagram showing another configuration of the management server according to the present embodiment. As shown in FIG. 11, the management server 3' has a management unit 32, a search unit 34, a user position information DB 35, a communication I/F 39, an acoustic information parameter DB 36, and an acoustic content DB 37. Since the management unit 32, the user position information DB 35, and the communication I/F 39 are as described above with reference to FIG. 5, their description is omitted here.
[0106]
(Search Unit) First, like the search unit 33 described above, the search unit 34 searches for connection destination information by referring to the user position information DB 35 in response to a connection destination (call destination) inquiry from the signal processing device 1. Specifically, the search unit 34 searches the user position information DB 35 for the associated connection destination information based on the name or the like of the target user included in the connection destination inquiry.
[0107]
Further, in response to a request for sound field construction data from the signal processing device 1, the search unit 34 searches the acoustic information parameter DB 36 and extracts the acoustic information parameters of the designated site, and likewise searches the acoustic content DB 37 and extracts the acoustic content of the designated site.
[0108]
(Acoustic Information Parameter DB) The acoustic information parameter DB 36 is a storage unit that stores acoustic information parameters measured in advance at each site. The acoustic parameters may be obtained by measuring the impulse response from any one or more points (positions where a sound image is to be localized) at each site. When measuring an impulse response, the S/N ratio can be improved by using a TSP (time-stretched pulse) response, the swept-sine method, an M-sequence response, or the like.
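The following sketch shows the flavor of such a measurement with the swept-sine method (a minimal illustration; `play` and `record` are hypothetical audio I/O stand-ins, and the sweep band and length are assumptions):

```python
import numpy as np
from scipy.signal import chirp, fftconvolve

def measure_impulse_response(play, record, fs, duration=5.0):
    """Estimate an impulse response with a logarithmic swept sine.

    `play(signal)` should emit the signal from the measurement speaker;
    `record(n_samples)` should return the microphone capture.  The response
    is recovered by convolving the capture with the inverse sweep: the
    time-reversed sweep with an amplitude tilt compensating the log sweep's
    excess low-frequency energy.
    """
    t = np.arange(int(fs * duration)) / fs
    sweep = chirp(t, f0=20.0, f1=fs / 2.2, t1=duration, method="logarithmic")
    # Inverse filter of a log sweep: time reversal plus exponential decay.
    inverse = sweep[::-1] * np.exp(-t * np.log((fs / 2.2) / 20.0) / duration)
    play(sweep)
    captured = record(len(sweep) * 2)
    return fftconvolve(captured, inverse)   # impulse response estimate
```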
[0109]
Here, the measurement of acoustic information parameters will be described with reference to FIG. 12. Measurement 1 shown in FIG. 12 concerns the measurement of the acoustic information parameters (first acoustic information parameters) used in method 1 described with reference to FIG. 10 to process the voice of the other party of the call, localized at an arbitrary position outside the acoustic closed surface 40. As shown in FIG. 12, a sound source (speaker 20C) is installed at an arbitrary position outside the closed surface 43 formed by the plurality of outward-facing directional microphones 10C arranged at site C, and how the sound is transmitted from the speaker 20C to each microphone 10C (the impulse response) is measured.
[0110]
In the example shown in FIG. 12, one speaker 20C is arranged in measurement 1, but the present embodiment is not limited to this: a plurality of speakers 20C may be arranged outside the closed surface 43, and the transmission from each speaker 20C to each microphone 10C may be measured. Thereby, in method 1 above, the number of locations where the sound source of user A can be localized can be increased.
[0111]
Measurement 2 shown in FIG. 12 concerns the measurement of the acoustic information parameters (second acoustic information parameters) used in method 2 described with reference to FIG. 10 to process the voice of the user located inside the acoustic closed surface 40. As shown in FIG. 12, how the measurement signal (impulse) output from a sound source (speaker 20C) installed inside the closed surface 43 formed by the plurality of outward-facing directional microphones 10C arranged at site C is transmitted to each microphone 10C under the influence of the reflections and echoes of site C (the impulse response) is measured. In the example shown in FIG. 12, one speaker 20C is arranged in measurement 2 as an example, but the present embodiment is not limited to this: a plurality of speakers 20C may be arranged inside the closed surface 43, and the transmission from each of them to each microphone 10C may be measured.
[0112]
(Acoustic content) The acoustic content DB 37 is a storage unit that stores the acoustic content
collected at each site. The acoustic content is, for example, ambient sound (environmental sounds, background chatter, and the like) recorded (measured) at each site.
[0113]
In the measurement of acoustic content, for example, as shown in Measurement 3 of FIG. 12, the ambient sound is measured (recorded) by the plurality of outward-facing directional microphones 10C arranged at site C. The measurement of the ambient sound may be performed separately by time of day or by weekday/holiday. As a result, the sound system according to the present embodiment can construct the sound field of site C by time of day and by weekday/holiday. In addition, at site B, which is the reproduction environment, it is also possible to reproduce the acoustic content closest to the current time.
[0114]
The closed surface 43 formed by the plurality of microphones 10C shown in FIG. 12 may be formed larger than the acoustic closed surface of the listening environment (reproduction environment). This will be described below with reference to FIG. 13, which compares the arrangement of the plurality of microphones 10C in the measurement environment (here, site C) with the arrangement of the plurality of speakers 20B in the listening environment (here, site B).
[0115]
As shown in FIG. 13, relative to the acoustic closed surface 40 formed by the plurality of speakers 20B so as to include user B, the plurality of microphones 10C used at the time of measurement at site C are arranged so as to form a closed surface 43 larger than the acoustic closed surface 40.
[0116]
Further, as described above with reference to FIG. 4, at site B in the listening environment (reproduction environment), the plurality of speakers 20B-1 to 20B-12 form the three-dimensional acoustic closed surfaces 40-1, 40-2, and 40-3.
Accordingly, as shown in FIG. 14, at site C in the measurement environment, the outward-facing directional microphones 10C-1 to 10C-12 may likewise form three-dimensional closed surfaces 43-1, 43-2, and 43-3.
[0117]
The respective configurations of the management server 3 'according to the present embodiment
have been described in detail above. Subsequently, the control on the site B side of the listening
environment (reproduction environment) for constructing the sound field of the site C by using
the method 1 to the method 3 (see FIG. 12) will be described. On the site B side, an optimal
sound field is formed by the sound field reproduction signal processing unit 135 (see FIG. 3) of
the signal processing device 1B. Hereinafter, with reference to FIG. 15, the configuration of the
sound field reproduction signal processing unit 135 for realizing the above methods 1 to 3 and
constructing a sound field will be specifically described.
[0118]
[4−2. Configuration of Sound Field Reproduction Signal Processing Unit] FIG. 15 is a block diagram for describing the configuration of the sound field reproduction signal processing unit 135, which constructs a sound field so as to provide a sense of immersion in site C. In FIG. 15, only the main components of the signal processing device 1B related to the description here are shown; other components are omitted.
[0119]
As shown in FIG. 15, the sound field reproduction signal processing unit 135 functions as a Convolution (convolution integration) unit 136, howling suppression units 137 and 139, and a Matrix Convolution unit 138.
[0120]
(Convolution Unit) The Convolution unit 136 has a function to realize the above-described Method 1 (sound image localization and reverberation processing of the voice of user A). Specifically, the Convolution unit 136 renders the audio signal b (the voice of user A), acquired (received) from the signal processing device 1A of site A via the communication I/F 19, separately for each output speaker, using the acoustic information parameter c of site C (the first acoustic information parameter). At this time, the Convolution unit 136 may use the acoustic information parameter c (impulse response) of site C according to the localization position, in consideration of the parameter a specifying the position at which the sound image of user A is to be localized. The parameter a may be transmitted from the signal processing device 1A or the management server 3' via the communication I/F 19, or may be calculated by the signal processing device 1B based on an instruction from user B. Also, the Convolution unit 136 may acquire the acoustic information parameter c (impulse response) of site C from the management server 3' via the communication I/F 19.
[0121]
Then, as shown in FIG. 15, the Convolution unit 136 writes the audio signals resulting from the above signal processing into the output buffers of the respective output speakers (the plurality of speakers 20B forming the acoustic closed surface 40B including user B).
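As an illustration only (not part of the original disclosure), the rendering step just described could look like the following sketch: the received voice is convolved, per output speaker, with the impulse response of site C measured for the desired localization position. The function and variable names are assumptions.

```python
# Minimal sketch of the Convolution unit's rendering step (Method 1).
import numpy as np

def render_for_speakers(audio_b, impulse_responses_c):
    """Convolve one audio signal with one impulse response per speaker.

    audio_b             : 1-D array, received voice of user A
    impulse_responses_c : list of 1-D arrays, first acoustic information
                          parameters of site C, one per output speaker 20B
    returns             : list of per-speaker output-buffer signals
    """
    outputs = []
    n = len(audio_b)
    for ir in impulse_responses_c:
        m = n + len(ir) - 1
        spec = np.fft.rfft(audio_b, m) * np.fft.rfft(ir, m)
        outputs.append(np.fft.irfft(spec, m))
    return outputs

# Example: 12 speakers forming the acoustic closed surface 40B.
voice_a = np.random.randn(48000)                    # 1 s of received audio
irs_site_c = [np.random.randn(4800) * 0.01 for _ in range(12)]
buffers = render_for_speakers(voice_a, irs_site_c)  # written to output buffers
```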
[0122]
(Howling Suppression Unit) As shown in FIG. 15, the howling suppression units 137 and 139 are provided as necessary at the rear stage of the amplifier/ADC unit 11 for the microphones and at the front stage of the DAC/amplifier unit 23 for the speakers, respectively, and the two operate cooperatively.
As described above, in Method 2, the sounds collected by the plurality of microphones 10B arranged around user B are rendered using the acoustic information parameter (impulse response) and reproduced from the plurality of speakers 20B arranged around user B. At this time, since the positions of the microphones and the speakers are close to each other, excessive oscillation (howling) may occur through the interaction of the two. For this reason, in the example shown in FIG. 15, the howling suppression units 137 and 139 are provided and howling suppression processing is performed. The sound field reproduction signal processing unit 135 may also include an echo canceler in addition to the howling suppression units 137 and 139 in order to prevent such excessive oscillation.
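The patent does not specify a suppression algorithm, so the following is only one plausible sketch of a howling suppressor: a persistent narrow spectral peak (a feedback candidate) is detected and pulled down with a notch. All names and thresholds are illustrative assumptions; a real system would combine this with the echo canceler mentioned above.

```python
# Minimal sketch of one possible howling-suppression strategy.
import numpy as np

def suppress_howling(frame, fs, threshold_db=20.0, notch_width_hz=50.0):
    """Attenuate any bin towering `threshold_db` above the median level."""
    spec = np.fft.rfft(frame * np.hanning(len(frame)))
    mag_db = 20 * np.log10(np.abs(spec) + 1e-12)
    median_db = np.median(mag_db)
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    for k in np.where(mag_db > median_db + threshold_db)[0]:
        notch = np.abs(freqs - freqs[k]) < notch_width_hz / 2
        spec[notch] *= 10 ** (-threshold_db / 20.0)  # pull the peak down
    return np.fft.irfft(spec, len(frame))

fs = 48000
t = np.arange(1024) / fs
frame = np.random.randn(1024) + 5 * np.sin(2 * np.pi * 3000 * t)  # fake squeal
clean = suppress_howling(frame, fs)
```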
[0123]
(Matrix Convolution Unit) The Matrix Convolution unit 138 has a function to realize the above-mentioned Method 2 (echo processing of the voice of user B). Specifically, the Matrix Convolution unit 138 renders the audio signals collected by the plurality of microphones 10B disposed at site B (the sounds generated inside the acoustic closed surface 40B) separately for each output speaker, using the acoustic information parameter c of site C (the second acoustic information parameter: the impulse response group). Thereby, an audio signal is output for constructing, at site B, a sound field in which a sound generated inside the acoustic closed surface 40B of site B, for example user B's own voice, is felt as if it were reflected at site C.
[0124]
Here, a method of realizing Method 2 according to the present embodiment will be specifically described with reference to FIGS. 16A and 16B. FIG. 16A is a diagram for explaining the measurement of impulse responses at site C. As shown in FIG. 16A, first, the impulse responses from each speaker 20C, arranged at site C and facing the outside of the closed surface 43, to each microphone 10C, likewise arranged at site C and facing the outside of the closed surface 43, are measured.
[0125]
Specifically, the impulse responses from a single speaker on the closed surface 43 to the group of microphones on the closed surface 43 are measured. On the frequency axis, each impulse response can also be regarded as a transfer function influenced by the spatial acoustics of site C, such as its structure and obstacles.
[0126]
Here, in the example shown in FIG. 16A, the positions of the microphones and speakers on the closed surface 43 are denoted R1, R2, ..., RN. Then, as shown in FIG. 16A, the transfer functions from the speaker (SP) disposed at R1 to the microphone disposed at R1, the microphone disposed at R2, ..., and the microphone disposed at RN are measured. Next, the transfer functions from the speaker disposed at R2 to the microphone disposed at R1, the microphone disposed at R2, ..., and the microphone disposed at RN are measured, and so on for all N speaker positions.
[0127]
Next, let the transfer function from the speaker located at R1 to the microphone located at R1 be denoted R11, the transfer function from the speaker located at R1 to the microphone located at R2 be denoted R12, and so on. Then, as shown in Equation (1), a matrix representation using the transfer function matrix R is possible.
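Equation (1) itself appears only in the figure, which is not reproduced here; the structure implied by the text, however, is the standard N × N transfer-function matrix, which can plausibly be reconstructed as:

```latex
\mathbf{R} =
\begin{pmatrix}
R_{11} & R_{12} & \cdots & R_{1N}\\
R_{21} & R_{22} & \cdots & R_{2N}\\
\vdots & \vdots & \ddots & \vdots\\
R_{N1} & R_{N2} & \cdots & R_{NN}
\end{pmatrix},
\qquad
R_{ij}:\ \text{transfer function from the speaker at } R_i
\text{ to the microphone at } R_j .
```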
[0128]
This matrix data is stored as an acoustic information parameter in the management server 3' or the like, and is used when constructing the sound field of site C at site B.
Next, the case of constructing the sound field of site C at site B using this matrix data will be described with reference to FIG. 16B.
[0129]
FIG. 16B is a diagram for describing the operation performed by the Matrix Convolution unit 138 using the impulse response group. In the example shown in FIG. 16B, a closed surface having the same size and shape as that used for the measurement at site C is assumed on the site B (reproduction environment) side. The numbers of microphones 10B and speakers 20B arranged at site B are also assumed to be the same as at the time of measurement at site C, and their arrangement positions are likewise assumed to be R1, R2, ..., RN. However, as shown in FIG. 16B, the plurality of microphones 10B and the plurality of speakers 20B face the inside of the acoustic closed surface 40B.
[0130]
Also, as shown in FIG. 16B, let V1, V2, ..., VN be the frequency-axis representations of the signals collected by the microphones at positions R1, R2, ..., RN. Further, let W1, W2, ..., WN be the output signals (audio signals) output (reproduced) from the speakers at positions R1, R2, ..., RN at site B.
[0131]
In this case, the wavefront of a sound generated inside the acoustic closed surface 40B of site B (for example, the voice of user B) reaches the acoustic closed surface 40B and is collected by the inward-facing microphones 10B at positions R1, R2, ..., RN, and the collected sound signals V1, V2, ..., VN are acquired by the respective microphones 10B.
[0132]
The Matrix Convolution unit 138 then executes the operation of Equation (2), using the signal group V1, V2, ..., VN (the microphone inputs) and the transfer function matrix of Equation (1) described with reference to FIG. 16A, to calculate the outputs W1, W2, ..., WN of the respective speakers 20B.
[0133]
As described above, the Matrix Convolution unit 138 performs signal processing on the audio signals (V1, V2, ..., VN) collected by the plurality of microphones 10B, using the acoustic information parameters (the transfer function group) of site C.
Also, as shown in FIG. 15, the Matrix Convolution unit 138 adds the processed audio signals (W1, W2, ..., WN) to the output buffer of each output speaker.
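As an illustration only, the per-frequency operation of Equation (2) could be sketched as below. A hypothetical frame-based, frequency-domain formulation is assumed; the names and array shapes are not taken from the patent.

```python
# Minimal sketch of the Matrix Convolution operation: at each frequency
# bin, the speaker outputs W are obtained by multiplying the measured
# transfer-function matrix R of site C with the microphone inputs V.
import numpy as np

def matrix_convolve(mic_frames, transfer_matrix):
    """Apply W(f) = R(f) @ V(f) for every frequency bin.

    mic_frames      : array (N, frame_len), time-domain inputs V1..VN
    transfer_matrix : array (n_bins, N, N), R measured at site C
    returns         : array (N, frame_len), speaker outputs W1..WN
    """
    v = np.fft.rfft(mic_frames, axis=1)            # (N, n_bins)
    w = np.empty_like(v)
    for f in range(v.shape[1]):
        w[:, f] = transfer_matrix[f] @ v[:, f]     # Equation (2) per bin
    return np.fft.irfft(w, mic_frames.shape[1], axis=1)

n_mics, frame_len = 12, 1024
v_frames = np.random.randn(n_mics, frame_len)
r_matrix = np.random.randn(frame_len // 2 + 1, n_mics, n_mics) * 0.01
w_frames = matrix_convolve(v_frames, r_matrix)     # added to output buffers
```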
[0134]
(Addition of Acoustic Content) As shown in FIG. 15, the sound field reproduction signal processing unit 135 realizes Method 3 described above by adding the acoustic content d of site C, received from the management server 3' via the communication I/F 19, to the output buffer of each output speaker.
[0135]
Heretofore, the configuration of the sound field reproduction signal processing unit 135 of the
signal processing device 1B according to the present embodiment has been described in detail.
Next, the sound field reproduction processing performed when constructing the sound field of site C at site B will be specifically described with reference to FIG. 17.
[0136]
[4−3.
Sound Field Reproduction Processing] FIG. 17 is a flowchart showing the sound field reproduction processing according to the present embodiment. As shown in FIG. 17, first, in step S403, the user position estimation unit 16 (see FIG. 3) of the signal processing device 1B estimates the position of user B. For example, the user position estimation unit 16 may estimate the relative position, orientation, mouth position, and ear position of user B with respect to each speaker 20B, based on the sound collected by the plurality of microphones 10B, the captured images acquired by the image sensors, the arrangement of each speaker stored in the speaker position information DB 21, and the like.
[0137]
Next, in step S406, the signal processing unit 13 selects a group of speakers forming an acoustic closed surface including the user, according to the estimated relative position and orientation of user B and the positions of the mouth and ears.
[0138]
Next, in step S407, the sound field reproduction signal processing unit 135 of the signal processing unit 13 causes the Convolution unit 136 to perform the processing of Method 1 described above on the received audio signal b (the voice of user A collected at site A). Specifically, as shown in FIG. 15, the Convolution unit 136 renders the audio signal b received from the signal processing device 1A of site A separately for each selected output speaker, using the acoustic information parameter c (the first acoustic information parameter) of site C. Then, the Convolution unit 136 writes the audio signal subjected to the processing of Method 1 into the output buffer of each selected output speaker.
[0139]
Next, in step S409, the sound field reproduction signal processing unit 135 causes the Matrix Convolution unit 138 to perform the processing of Method 2 shown in FIG. 10 on the voice of user B collected at site B by the selected microphone group. Specifically, the Matrix Convolution unit 138 renders the audio signals picked up by the microphone group forming the acoustic closed surface including user B (the plurality of microphones 10B) separately for each output speaker, using the acoustic information parameter c of site C (the second acoustic information parameter). Then, the Matrix Convolution unit 138 adds the audio signal subjected to the processing of Method 2 to the output buffer of each selected output speaker.
[0140]
Next, in step S411, the sound field reproduction signal processing unit 135 adds the acoustic content d of site C to the output buffer of each selected output speaker, as the processing of Method 3 described above.
[0141]
Then, in step S415, the signal processing device 1B outputs the contents of each output buffer
from the speaker group selected in step S406 through the DAC / amplifier unit 23.
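Purely as an illustration of how steps S403 to S415 fit together, the following self-contained sketch runs one reproduction cycle on synthetic data. Every helper here is a hypothetical stand-in for the corresponding unit of the signal processing device 1B, and Method 2 is collapsed to a per-channel convolution for brevity (the full matrix form is sketched earlier).

```python
# Minimal, self-contained sketch of the flow in FIG. 17 (steps S403-S415).
import numpy as np

FS, FRAME = 48000, 1024
SPEAKER_POSITIONS = np.array([[np.cos(a), np.sin(a)] for a in
                              np.linspace(0, 2 * np.pi, 12, endpoint=False)])

def fft_convolve(x, h):
    n = len(x) + len(h) - 1
    return np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(h, n), n)[:len(x)]

def reproduction_cycle(user_pos, voice_a, mic_frames, ir1, ir2, content_c):
    # S406: choose the speakers nearest the user to form the closed surface.
    dists = np.linalg.norm(SPEAKER_POSITIONS - user_pos, axis=1)
    chosen = np.argsort(dists)[:8]
    buffers = np.zeros((len(chosen), FRAME))
    for i, sp in enumerate(chosen):
        buffers[i] += fft_convolve(voice_a, ir1[sp])        # S407: Method 1
        buffers[i] += fft_convolve(mic_frames[i], ir2[sp])  # S409: Method 2
        buffers[i] += content_c[sp][:FRAME]                 # S411: Method 3
    return chosen, buffers  # S415: buffers go to the DAC/amplifier unit 23

user_pos = np.array([0.1, -0.2])                            # S403 (estimated)
voice_a = np.random.randn(FRAME)
mics = np.random.randn(8, FRAME)
ir1 = np.random.randn(12, 256) * 0.01
ir2 = np.random.randn(12, 256) * 0.01
content = np.random.randn(12, FRAME) * 0.05
chosen, buffers = reproduction_cycle(user_pos, voice_a, mics, ir1, ir2, content)
```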
[0142]
As described above, in the sound system according to the present embodiment, the voice of user A collected at site A is rendered using the first acoustic information parameter measured at site C and reproduced from the plurality of speakers 20B at site B together with the reverberation of site C.
Likewise, the voice of user B picked up at site B is rendered using the second acoustic information parameter measured at site C and reproduced from the plurality of speakers 20B at site B with the reverberation of site C.
Further, the acoustic content collected at site C is reproduced from the plurality of speakers 20B at site B.
[0143]
Thereby, when one site (here, site B) interacts (for example, through a call) with another site (here, site A), the sound system according to the present embodiment can provide a sense of immersion in a third space (here, site C). User B can obtain the acoustic sensation of being at site C together with user A and can be immersed in a richer sense of reality.
[0144]
The sound field reproduction signal processing unit 135 can also control the sound image of the received audio signal (the voice of user A) by using the speaker group disposed around user B. For example, by forming an array speaker with a plurality of speakers (beamforming), the sound field reproduction signal processing unit 135 can reproduce the voice of user A at user B's ear, or reconstruct the sound image of user A outside the acoustic closed surface including user B.
[0145]
By continuously performing the above-described S403 and S406, the signal processing unit 13
can update the speaker group forming the acoustic closed surface including the user B according
to the movement of the user B. Hereinafter, this will be specifically described with reference to
FIGS. 18A and 18B.
[0146]
FIG. 18A is a diagram describing the case where the sound field 42 constructed at site B is fixed. As shown in FIG. 18A, assume first that a plurality of speakers 20B are selected so as to form an acoustic closed surface 40 including user B (steps S403 and S406), and that a sound field 42 providing a sense of immersion in site C is constructed. In this case, when user B moves around the room or leaves the room, user B leaves the sound field 42 and no longer obtains the sense of immersion in site C.
[0147]
Therefore, as described above, steps S403 and S406 are performed continuously, and the speaker group forming the acoustic closed surface including user B is updated according to the movement of user B. FIG. 18B is a diagram describing the case where the sound field 42 constructed at site B follows the user.
[0148]
As shown in FIG. 18B, according to the movement of user B, a plurality of speakers are newly selected (updated) as the speaker group (speakers 20B') forming an acoustic closed surface 40' including user B, and a sound field 42' is newly constructed by the speakers 20B'.
[0149]
The operation processing of the acoustic system according to the present embodiment has been described in detail above.
Next, supplements to the present embodiment will be described.
[0150]
<5. Supplement> [5-1. Modification of Command Input] Although commands are input by voice in the above embodiment, the command input method of the sound system according to the present disclosure is not limited to voice input and may be another input method. For example, the signal processing device 1 according to the present embodiment may detect a user operation on switches (an example of the operation input unit) disposed around the user, and may recognize a command such as a call request. Further, in this case, the signal processing device 1 can also accept designation of a call destination (the name of the target user, etc.) and designation of the immersion target place (the place name, etc.) via a touch panel or the like (an example of the operation input unit) arranged around the user.
[0151]
Further, the recognition unit 17 of the signal processing device 1 may analyze a gesture of the user based on images captured by imaging units disposed around the user and on detection results acquired by infrared/thermal sensors, and recognize the gesture as a command. For example, when the user makes a gesture of placing a call, the recognition unit 17 recognizes it as a call request command. In this case, the signal processing device 1 may receive designation of a call destination (the name of the target user, etc.) and designation of the immersion target place (the place name, etc.) from a touch panel or the like arranged around the user, or may determine them based on voice analysis.
[0152]
If the user finds it difficult to hear the sound while talking with another user at another site (while the other user's voice is being reproduced from the plurality of speakers 20 around the user), the user may request control of the reproduced sound by gesture. Specifically, for example, the recognition unit 17 may recognize, as a volume-up command, a gesture of bringing an open hand close to the ear, or a gesture of bringing both hands close to the head to imitate a rabbit's ears.
[0153]
As described above, the command input method of the sound system according to the present
disclosure is not limited to voice input, and may be switch operation or gesture input.
[0154]
[5−2.
Another Example of Command] In the above embodiment, the case where a person is designated as the predetermined target and a call request is recognized as the command has been described; however, the command of the sound system according to the present disclosure is not limited to a call request and may be another command. For example, the recognition unit 17 of the signal processing device 1 may recognize a command for reproducing, in the space where the user is present, a place, a building, a program, a song, or the like designated as the predetermined target.
[0155]
In addition, the sound system according to the present embodiment may reproduce, in the space where the user is present, another space in real time, or a past state of a designated place or building (for example, a famous concert performed in the past at a renowned opera house).
[0156]
[5−3.
Conversion from Large Space to Small Space] In the embodiment described above, it is assumed that the closed surface on the site B (reproduction environment) side and the closed surface on the site C (measurement environment) side have approximately the same size and shape; however, the present embodiment is not limited to this. For example, even when the closed surface on the reproduction environment side is smaller than the closed surface on the measurement environment side, it is possible to reproduce the sound field (the spatial spread) of the measurement environment in the reproduction environment.
[0157]
Such large-space-to-small-space conversion processing may be performed on the audio signal received by the signal processing device 1 (the audio signal of user A, the acoustic content, etc.) before the implementation of Methods 1 to 3 described above. In addition, by performing such conversion processing in real time, the acoustic system according to the present embodiment can resolve the mismatch in correspondence between the positions of the speakers and microphones on the measurement environment side and the reproduction environment side.
[0158]
Specifically, for example, the sound field reproduction signal processing unit 135 may use the signal processing based on transfer functions disclosed in Japanese Patent No. 4775487. In Japanese Patent No. 4775487, a transfer function (impulse response measurement data) is obtained in the sound field of the measurement environment, an audio signal arithmetically processed based on that transfer function is reproduced in the reproduction environment, and the sound field of the measurement environment (reverberation, sound image localization, etc.) is thereby reproduced in the reproduction environment. Hereinafter, the signal processing using a transfer function (impulse response measurement data) will be described with reference to FIGS. 19A to 19C.
[0159]
FIG. 19A is a diagram for describing measurement in the measurement target space. First, as shown in FIG. 19A, in the measurement target space (large space), M microphones forming a large closed surface P are disposed, and measurement for M channels (for M speaker output channels) is performed. Let the positions of the M microphones be P1, P2, ..., PM. Then, a measurement signal is output from a speaker (SP) disposed outside the closed surface P, and the impulse responses from the speaker to the microphones disposed at P1, P2, ..., PM are measured. The impulse responses (transfer functions) thus measured are shown in equation (3) of FIG. 19A.
[0160]
Next, measurement in an anechoic chamber will be described with reference to FIG. 19B. As shown in FIG. 19B, in the anechoic chamber, M speakers forming the large closed surface P are disposed, and inside the closed surface P, N microphones forming a small closed surface Q are disposed; measurement for N channels (for N speaker output channels) is performed. Here, the positions of the M speakers are the same positions P1, P2, ..., PM as in FIG. 19A. Also, let the positions of the N microphones be Q1, Q2, ..., QN.
[0161]
Then, the sound (measurement signal) collected by the microphone placed at P1 in FIG. 19A is output from the speaker placed at P1, and the impulse responses to the microphones placed at Q1, Q2, ..., QN are measured. Next, the sound (measurement signal) collected by the microphone placed at P2 in FIG. 19A is output from the speaker placed at P2, and the impulse responses to the microphones placed at Q1, Q2, ..., QN are measured. In this way, all impulse responses from each of the M speakers to each of the microphones placed at Q1, Q2, ..., QN are measured.
[0162]
The M channel signals thus measured can be converted into N outputs by performing an M × N matrix operation. That is, by arranging the measured impulse responses (transfer functions) into a matrix (matrix generation of the transfer function group), as shown in equation (4) of FIG. 19B, the conversion from the large space (coefficients for M channels) to the small space (coefficients for N channels) is implemented.
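Purely as an illustration of the conversion implied by equations (3) to (5), the following sketch maps M measured channels on the closed surface P to N speaker-drive channels on the closed surface Q through the anechoic transfer matrix, per frequency bin. The names and array shapes are assumptions.

```python
# Minimal sketch of the large-space to small-space conversion.
import numpy as np

def convert_m_to_n(signals_p, h_pq):
    """signals_p : (M, n_bins) spectra measured at P1..PM (equation (3))
    h_pq      : (n_bins, M, N) anechoic transfer functions (equation (4))
    returns   : (N, n_bins) speaker-drive spectra for Q1..QN (equation (5))
    """
    m, n_bins = signals_p.shape
    n = h_pq.shape[2]
    out = np.zeros((n, n_bins), dtype=complex)
    for f in range(n_bins):
        out[:, f] = h_pq[f].T @ signals_p[:, f]   # N outputs from M inputs
    return out

M, N, n_bins = 16, 8, 513
p_spectra = np.random.randn(M, n_bins) + 1j * np.random.randn(M, n_bins)
h = (np.random.randn(n_bins, M, N) + 1j * np.random.randn(n_bins, M, N)) * 0.01
q_spectra = convert_m_to_n(p_spectra, h)   # reproduced from the N speakers
```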
[0163]
Next, reproduction in the reproduction target space (small space) will be described with reference to FIG. 19C. As shown in FIG. 19C, in the reproduction target space, N speakers forming a small closed surface Q including user B are disposed. Here, the positions of the N speakers are the same positions Q1, Q2, ..., QN as in FIG. 19B.
[0164]
In this case, when the received audio signal (for example, the voice of user A; the audio signal S) is output from the speakers arranged at Q1, Q2, ..., QN, the output of each speaker is obtained by equation (5) shown in FIG. 19C. Equation (5) is an operation using the impulse responses (transfer functions) shown in equations (3) and (4).
[0165]
Thus, for example, when the sound image of user A is localized outside the closed surface Q, as shown in FIG. 19C, the wavefront that the voice emitted by user A outside the closed surface Q would produce when crossing the closed surface Q is estimated, and that wavefront is recreated inside the closed surface Q. At this time, by absorbing the mismatch between the number of microphones in the measurement target space and the number of speakers in the reproduction target space according to equation (5) above, the acoustic system according to the present embodiment can reproduce the sound field of the large closed surface P inside the small closed surface Q.
[0166]
[5−4. Image Construction] Furthermore, in the above embodiment, the provision of a sense of immersion in the third space is realized by sound field construction (sound field reproduction processing); however, the acoustic system according to the present disclosure is not limited to this, and video construction may also be used.
[0167]
For example, when reproducing at site B the sound collected by the plurality of microphones disposed at site A, the signal processing device 1 may receive, from a predetermined server, video captured by a plurality of image sensors disposed in the third space (site C) and reproduce that video, thereby reproducing the space of site C.
[0168]
The reproduction of the video may be, for example, spatial projection by hologram reproduction, or reproduction on a television in the room, a display, or a head-mounted display worn by the user.
As described above, by performing video construction in addition to sound field construction, the user can obtain a sense of immersion in the third space and a stronger sense of presence.
[0169]
[5−5. Other System Configuration Example] In the system configuration of the audio system according to the embodiment described above with reference to FIGS. 1 and 2, both the calling side (site A) and the called side (site B) have a plurality of microphones and speakers arranged around the user, and the signal processing devices 1A and 1B process the signals. However, the system configuration of the sound system according to the present embodiment is not limited to the configuration shown in FIGS. 1 and 2, and may be, for example, the configuration shown in FIG. 20.
[0170]
FIG. 20 is a diagram showing another system configuration of the sound system according to the present embodiment. As shown in FIG. 20, in this configuration, the signal processing device 1, the communication terminal 7, and the management server 3 are connected via the network 5.
[0171]
The communication terminal 7 is a mobile phone terminal, a smartphone, or the like having an ordinary single microphone and single speaker, and is a legacy interface to the highly functional interface space according to the present embodiment, in which a plurality of microphones and a plurality of speakers are arranged.
[0172]
The signal processing apparatus 1 according to the present embodiment can be connected to a
normal communication terminal 7 and can reproduce voices received from the communication
terminal 7 from a plurality of speakers disposed around the user.
In addition, the signal processing device 1 according to the present embodiment can transmit the
voice of the user collected from the plurality of microphones disposed around the user to the
communication terminal 7.
[0173]
As described above, according to the sound system of the present embodiment, it is possible to realize a call between a first user who is in a space where a plurality of microphones and a plurality of speakers are arranged and a second user who possesses an ordinary communication terminal 7. That is, in the configuration of the sound system according to the present embodiment, it is sufficient that one of the calling side and the called side is the highly functional interface space in which a plurality of microphones and a plurality of speakers according to the present embodiment are disposed.
[0174]
[5−6. Autonomous Acoustic System] In the above embodiment, as described with reference to FIGS. 1 to 3, the signal processing device 1 controls the input/output of the plurality of microphones 10 and the plurality of speakers 20 disposed around the user; however, the configuration of the acoustic system according to the present disclosure is not limited to this. For example, a plurality of autonomous microphone and speaker devices may be arranged around the user, and these devices may communicate with each other and form an acoustic closed surface including the user based on their own determinations, thereby realizing the sound field construction described above. Hereinafter, such an autonomous acoustic system will be specifically described with reference to FIGS. 21 to 24. Here, as an example, the case where a plurality of devices 100 each having one microphone 10 and one speaker 20 are disposed around the user will be described.
[0175]
(System Configuration) FIG. 21 is a diagram showing an example of a system configuration of an
autonomous acoustic system according to the present disclosure. As shown in FIG. 21, the
autonomous sound system according to the present disclosure includes a plurality of devices 100
(100-1 to 100-4), a management server 3, a user ID DB 6, a service log DB 8, and a user personal
DB 9. Further, as shown in FIG. 21, the management server 3, the user ID DB 6, the service log
DB 8, and the user personal DB 9 are connected via the network 5.
[0176]
(Device) The plurality of devices 100 (100-1 to 100-4) are distributed everywhere in the world: in rooms, houses, buildings, outdoors, in regions, and across countries. The example shown in FIG. 21 shows the case where a plurality of devices 100 are disposed on the walls and floor of a public place such as a department store or a station. The plurality of devices 100 (100-1 to 100-4) can communicate with each other wirelessly or by wire and notify each other of their capabilities. Further, at least one device 100 among the plurality of devices 100 (100-1 to 100-4) (for example, the device 100-1) can access the network 5. Each device 100 has a microphone 10 and a speaker 20. The configuration of the device according to the present embodiment will be described later with reference to FIG. 22.
[0177]
(Information Notification) As described above, the plurality of devices 100 (100-1 to 100-4) notify each other of the capability (characteristic information) of their own device. The characteristic information to be notified includes the device ID, the services that the device can provide, the owner ID of the device, and the lending attribute of the device. Here, the owner ID is the ID of the owner (installer) of the device 100; each device 100 (100-1 to 100-4) shown in FIG. 21 is assumed to be installed by an individual or a corporation. The lending attribute of a device is information indicating the attributes of services for which lending (use) is permitted in advance by the individual or corporation who installed the device.
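By way of illustration only, the characteristic information just listed could be represented as follows. The field set follows the text above, but the concrete structure and values are assumptions, not defined in the patent.

```python
# Illustrative sketch of the characteristic information exchanged
# between devices 100.
from dataclasses import dataclass
from typing import List

@dataclass
class DeviceCharacteristics:
    device_id: str                 # unique ID of the device 100
    services: List[str]            # services the device can provide
    owner_id: str                  # ID of the owner (installer)
    lending_attribute: List[str]   # services pre-approved for lending

    def permits(self, service: str) -> bool:
        """True if the installer allows lending the device for `service`."""
        return service in self.lending_attribute

info = DeviceCharacteristics(
    device_id="device-100-1",
    services=["audio_capture", "audio_playback"],
    owner_id="installer-42",
    lending_attribute=["audio_playback"],
)
assert info.permits("audio_playback") and not info.permits("audio_capture")
```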
[0178]
The notification of this information is performed autonomously and in a decentralized manner, either periodically or on demand. Further, the information notification method according to the present embodiment may use a procedure generally known as a mesh network configuration method (beaconing of IEEE 802.11s, etc.).
[0179]
Also, the device 100 may be provided with a plurality of types of communication I/F (interfaces). In this case, each device 100 periodically checks which devices it can communicate with via which communication I/F, and preferentially starts up the communication I/F that can directly communicate with the largest number of devices.
[0180]
Also, each device 100 may transfer broadcast information from devices in its vicinity to a device several hops away using a wireless I/F, or may transmit it to another device via the network 5.
[0181]
(Management Server) The management server 3 manages the absolute position (current position) of each user, as described with reference to FIGS. 2 and 5. Alternatively, the management server 3 may be the management server 3' that stores the acoustic information parameters and the like of the third space described with reference to FIG. 11.
[0182]
(Service Log DB) The service log DB 8 is a storage unit that stores, in association with each other, the service content, the devices 100 that contributed to providing the service, and the user to whom the service was provided. As a result, it is possible to determine from the service log DB 8 which devices were used to provide which service, which user received what service, and the like.
[0183]
Also, the service log stored in the service log DB 8 may later be used as billing information for the user who used the service, or as kickback information for the installers (individuals/corporations) of the devices 100 that contributed to providing the service. Here, the kickback information is information used when returning a part of the usage fee to the owner (installer) of a device 100 according to the device's contribution rate (frequency of use) in providing services. In addition, the service log stored in the service log DB 8 may be transmitted to the user personal DB 9 as metadata of the user's actions.
[0184]
(User Personal DB) The user personal DB 9 stores, as user-owned data, the user's action metadata transmitted from the service log DB 8. The data stored in the user personal DB 9 can be used in various personalization services and the like.
[0185]
(User ID DB) The user ID DB 6 is a storage unit that stores the IDs of registered users (name, password, biometric information, etc.) in association with the services permitted to be provided to each user. The user ID DB 6 is used when the device 100 performs user authentication.
[0186]
The system configuration of the autonomous acoustic system according to the present embodiment has been described above with reference to FIG. 21. Next, the configuration of the device 100 (signal processing device) according to the present embodiment will be described with reference to FIG. 22.
[0187]
(Device Configuration) FIG. 22 is a block diagram showing the configuration of the device 100 according to the present embodiment. As shown in FIG. 22, the device 100 includes a microphone 10, an amplifier/ADC unit 11, a signal processing unit 200, a recognition unit 17, an identification unit 18, a communication I/F 19, a user authentication unit 25, a user position estimation unit 16, a DAC/amplifier unit 23, and a speaker 20. Since the microphone 10, the amplifier/ADC unit 11, the recognition unit 17, the identification unit 18, the communication I/F 19, the user position estimation unit 16, the DAC/amplifier unit 23, and the speaker 20 have already been described with reference to FIG. 3, their explanation is omitted here.
[0188]
(Signal Processing Unit) The signal processing unit 200 includes a high S/N processing unit 210 and a sound field reproduction signal processing unit 220. Like the high S/N processing unit 133 shown in FIG. 3, the high S/N processing unit 210 has a function of processing the audio signal output from the amplifier/ADC unit 11 into a monaural signal with high clarity and a good S/N ratio. Specifically, the high S/N processing unit 210 separates sound sources and performs dereverberation and noise reduction. The audio signal processed by the high S/N processing unit 210 is output to the recognition unit 17, where it is subjected to voice analysis for command recognition, or is transmitted to an external device via the communication I/F 19.
[0189]
The sound field reproduction signal processing unit 220 performs signal processing on the audio
signal reproduced from the speaker 20 and performs control so that the sound field is localized
near the position of the user. In addition, the sound field reproduction signal processing unit 220
cooperates with another device 100 adjacent thereto to control the output content (audio signal)
from the speaker 20 so as to form an acoustic closed surface including the user.
[0190]
(User Authentication Unit) The user authentication unit 25 queries the user ID DB 6 on the network 5 via the communication I/F 19, based on the user ID acquired from a tag such as an RFID tag possessed by the user, and performs user authentication. For example, when the acquired user ID matches an ID registered in advance in the user ID DB 6, the user authentication unit 25 authenticates the user as one permitted to receive services.
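As an illustration only, the authentication flow just described could look like the following sketch. The DB lookup is mocked with a dict; in the real system it would be a query to the user ID DB 6 over the network 5, and all names are assumptions.

```python
# Minimal sketch of the user authentication flow of the user
# authentication unit 25.
USER_ID_DB = {
    "user-0001": {"permitted_services": {"call", "space_reproduction"}},
}

def authenticate(tag_user_id, requested_service):
    record = USER_ID_DB.get(tag_user_id)
    if record is None:
        return False                      # unknown user: no service
    return requested_service in record["permitted_services"]

assert authenticate("user-0001", "call")
assert not authenticate("user-9999", "call")
```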
[0191]
The configuration of the device 100 according to the present embodiment has been described
above in detail. Subsequently, operation processing of the autonomous acoustic system according
to the present embodiment will be described with reference to FIG.
[0192]
(Operation Process) FIG. 23 is a flowchart showing the operation process of the autonomous acoustic system according to the present embodiment. As shown in FIG. 23, first, in step S503, the device 100 performs a preparation process. Specifically, the device 100 exchanges the above-described characteristic information with the other devices 100 by broadcast, and confirms which devices are compatible (reliable).
[0193]
For example, the device 100-1 may check whether the adjacent device 100-2 is reliable based on the owner ID, the lending attribute, and the like included in the characteristic information received from the device 100-2. Devices confirmed to be reliable can, for example, operate their own actuators and capture the output results with the sensors of adjacent devices, thereby determining in what ways the characteristics of the devices can be combined for cooperation. Such confirmation procedures may be performed periodically. Through these procedures, each device 100 can gradually grasp what kinds of services can be provided in the space in which the plurality of adjacent devices 100 (100-1 to 100-4) are arranged.
[0194]
Next, in step S506, when a user appears in the space where the plurality of devices 100 (100-1 to 100-4) are disposed, the devices 100 perform user authentication. For example, as shown in FIG. 21, when the user carries a tag 60 such as an RFID tag, the surrounding devices 100-1 to 100-4 may detect the appearance of the user by receiving the characteristic information notified from the tag 60. Then, when the appearance of the user is detected, each device 100 queries the user ID DB 6 on the network 5 based on the user ID included in the characteristic information notified from the tag 60, and authenticates whether or not services may be provided to the user.
[0195]
When the user does not possess the tag 60, the devices 100-1 to 100-4 may detect the appearance of the user with various sensors (a microphone, a camera, a human sensor, a heat sensor, and the like). The devices 100-1 to 100-4 may also extract the user's ID (e.g., biometric information) by analyzing the detection results of the various sensors.
[0196]
Further, in the example illustrated in FIG. 21, the device 100-1 among the devices 100-1 to 100-4 has an access path to the user ID DB 6. In this case, the device 100-2, 100-3, or 100-4 that has acquired the user ID may transmit the user ID to the device 100-1, and the device 100-1 may query the user ID DB 6 to perform the user authentication. As described above, it is not necessary for all of the devices 100-1 to 100-4 to hold an access path to the user ID DB 6.
Also, the result of user authentication performed by one of the devices 100-1 to 100-4 is shared
with other devices 100 in the vicinity, and the devices 100-1 to 100-4 serve the user Understand
that you can provide
09-05-2019
52
[0198]
Next, in step S509, the device 100 recognizes a command (service request) from the user.
Here, the device 100 may notify the tag 60 of information on services that can be provided to the
authenticated user. The tag 60 can notify the user what kind of service can be received at this
location by each output means (not shown) such as a speaker or a display unit. Further, the tag
60 specifies a service currently desired by the user by command input (microphone, gyro, key
touch, etc.) from the user, and notifies the peripheral devices 100-1 to 100-4 of the service.
[0199]
When the user does not possess the tag 60, the devices 100-1 to 100-4 analyze voices and
gestures of the user with various sensors (microphone, camera, human sensor, heat sensor, etc.),
and the user desires You may be aware of the services you
[0200]
Here, the command recognized by the device 100 may be a command or the like for requesting
reproduction of a place designated as a predetermined target, a building, a program, a song, etc.
in addition to the call request (call request) as described above. Good.
[0201]
Next, in step S512, when the requested service is a service that is permitted to the user, the
devices 100-1 to 100-4 start provision of the service.
Specifically, for example, the devices 100-1 to 100-4 start the operation of the sensor (for
example, the microphone 10) or the actuator (for example, the speaker 20), and further bring the
communication paths between the devices into an operating state.
The devices 100-1 to 100-4 may cooperate with each other to determine the operation of the
own device based on the type of service to be provided, the amount of available communication
resources, and the like.
09-05-2019
53
[0202]
In addition, when the device 100 has a plurality of types of communication I / Fs, the
communication I / F used to transmit traffic may be operated as needed based on the amount of
information to be provided. In addition, the device 100 may partially release the power saving
mode by increasing the operation duty cycle by the necessary amount. Furthermore, the device
100 may transition to a state in which stable supply of a band is possible by setting a
transmission / reception time zone used for communication among a plurality of devices, etc.
Raising etc.).
[0203]
Next, in step S515, the device 100 ends the provision of the service when the end of the service
is instructed by the user. Specifically, for example, the device 100 ends the operation of the
sensor (for example, the microphone 10) or the actuator (for example, the speaker 20), and stops
the communication path between the devices.
[0204]
Next, in step S518, the device 100 notifies the service log DB 8 of the content that the own
device has contributed to the service provision this time. The device 100 may also notify the
service log DB 8 of information of the user who provided the service (authenticated user).
[0205]
Heretofore, the operation processing of the autonomous acoustic system according to the
present embodiment has been specifically described with reference to FIG. Hereinafter, additional
description of the autonomous acoustic system according to the present embodiment will be
made.
[0206]
09-05-2019
54
(Continuing of Service) The autonomous acoustic system according to the present embodiment is
continued by changing the device 100 providing (operating) the service even when the
authenticated user walks or moves the place. It is possible to provide services to users. The
change processing is performed based on, for example, the radio wave intensity from the tag 60
owned by the user, an input signal from a sensor (a microphone, a camera, a human sensor or
the like) included in each device, or the like. Hereinafter, this will be specifically described with
reference to FIG.
[0207]
FIG. 24 is a diagram for describing the change of the operating device according to the
movement of the user in the autonomous acoustic system according to the present embodiment.
As shown in FIG. 24, in this case, the device 100-1 and the device 100-2 which the user operates
to perform service provision are separated from the device 100-5 and 100 not performing the
service provision operation. Let's assume the case of approaching -6.
[0208]
In this case, in the devices 100-5 and 100-6, the user has moved based on the radio wave
intensity from the tag 60 possessed by the user, the input signal from the sensor in the devices
100-5 and 100-6, and the like. Detect that. Then, the devices 100-5 and 100-6 receive, from the
adjacent device 100-2 and the like, information such as the ID of the user and the service that
may be provided.
[0209]
Then, the devices 100-5 and 100-6 start service provision to the user based on the received
information. On the other hand, when the devices 100-1 and 100-2 which have been providing
the service determine that the user has deviated from the service available range by the sensor
and actuator of the own device, the service providing is ended, and the device operation and
communication are performed. Bring down the road.
09-05-2019
55
[0210]
As described above, even if the user moves while receiving a service, the device 100 disposed
around the destination can take over the user ID and the content of the service and can
continuously provide the same.
[0211]
(Access Path to Network 5) In the autonomous acoustic system according to the present
embodiment described with reference to FIG. 21, at least one of the devices 100-1 to 100-4
(here, the device 100-1). ) Had an access path to the network 5.
However, the configuration of the autonomous acoustic system according to the present
disclosure is not limited to the example illustrated in FIG. 21, and is a network in which the
devices 100-1 to 100-4 are closed and does not have an access path to the outside (network 5).
The case is also conceivable.
[0212]
In such a case, the devices 100-1 to 100-4 may use, for example, the holding tag 60 as an access
gateway to the outside by the user. That is, the devices 100-1 to 100-4 inquire of the user ID DB
6 on the network 5 via the tag 60 when the tag 60 appears in a state where the specific
information is broadcasted to each other, and the user is authenticated. .
[0213]
(Provision of Service to Plural Users) Next, provision of services when plural users appear in the
space where the devices 100-1 to 100-4 are arranged will be described with reference to FIG.
[0214]
FIG. 25 is a diagram for describing the case where services are provided to a plurality of users in
the autonomous acoustic system according to the present embodiment.
09-05-2019
56
As shown in FIG. 25, when a plurality of users appear in the space where the devices 100-1 to
100-4 are arranged and each of them makes a service request, each of the devices 100-1 to 1004 Implement multiple service offerings.
[0215]
In this case, the operation of the devices 100-1 to 100-4 for each user is as described with
reference to FIGS. 21 to 24, but for the tag 60 possessed by the user 1, the tag possessed by the
user 2 65 is regarded as one of the devices arranged in the periphery. Also for the tag 65
possessed by the user 2, the tag 60 possessed by the user 1 is regarded as one of the devices
arranged in the vicinity.
[0216]
Therefore, the devices 100-1 to 100-4 notify the characteristic information of both the tag 60
and the tag 65, and confirm whether the device is a reliable device, thereby providing the
characteristic of the tag 60 or the tag 65 as a service. You may use it.
[0217]
For example, when the devices 100-1 to 100-4 are closed networks, the devices 100-1 to 100-4
notify the tag 65 of the access route to the external network 5 by notification of the
characteristic information with the tag 65. Understand what you are holding.
The devices 100-1 to 100-4 can use the tag 65 possessed by the user 2 as one device to connect
to the external network 5 in the service provision to the user 1 possessing the tag 60. .
[0218]
As described above, the tag 65 possessed by the user 2 located in the vicinity is not limited to the
devices 100-1 to 100-4 arranged around the user 1, to the external network for the user 1
possessing the tag 60 The case of providing the access of
[0219]
In such a case, the service content provided by the tag 65 is written to the service log DB 8, and may later be used, for example, to give a kickback to user 2, who holds the tag 65, based on the contribution to the service provision to user 1.
[0220]
<6. Summary> As described above, in the sound system according to the present embodiment, when the space around the user is made to interact with another space, a sense of immersion in a third space can be provided. Specifically, the sound system according to the present embodiment can reproduce voices and images corresponding to a first predetermined target (a person, place, building, etc.) from the plurality of speakers and displays arranged around the user. Further, at this time, the sound system according to the present embodiment can reproduce the space of a second predetermined target (a place or the like) and provide a sense of immersion in, and the presence of, the second predetermined target. In this manner, by using the microphones 10, the speakers 20, the image sensors, and the like disposed everywhere indoors and outdoors, the user's mouth, eyes, ears, and other faculties can be substantially extended over a wide area, and a new way of communication can be realized.
[0221]
Furthermore, in the sound system according to the present embodiment, since microphones, image sensors, and the like are disposed everywhere, the user does not need to possess a smartphone or mobile phone terminal: the user can indicate a predetermined target by voice or gesture and be connected to the space around that target.
[0222]
Also, such a new communication method may be implemented by a signal processing device that controls a plurality of microphones, a plurality of speakers, and the like.
Alternatively, the sound system according to the present embodiment may be realized by autonomous devices, such as individual microphones and speakers, cooperating with other adjacent devices.
[0223]
The preferred embodiments of the present disclosure have been described above in detail with reference to the accompanying drawings, but the present technology is not limited to these examples. It is obvious that a person with ordinary skill in the art of the present disclosure can conceive of various modifications and alterations within the scope of the technical idea described in the claims, and it is understood that these also naturally fall within the technical scope of the present disclosure.
[0224]
For example, the configuration of the signal processing device 1 is not limited to the configuration shown in FIG. 3. For example, the recognition unit 17 and the identification unit 18 shown in FIG. 3 may be provided not in the signal processing device 1 but on a server side connected via a network. In this case, the signal processing device 1 transmits the audio signal output from the signal processing unit 13 to the server via the communication I/F 19. The server then performs command recognition and identification of the predetermined target (person, place, building, program, song, etc.) based on the received audio signal, and transmits the recognition result and the connection destination information corresponding to the identified predetermined target to the signal processing device 1.
[0225]
Note that the present technology can also have the following configurations. (1) A recognition
unit for recognizing a first object and a second object based on signals detected by a plurality of
sensors disposed around a specific user, and the first and the recognition units recognized by the
recognition unit. An identification unit for identifying a second object; an estimation unit for
estimating the position of the specific user according to a signal detected by any of the plurality
of sensors; Each signal acquired from sensors around the first and second objects identified by
the identification unit to be localized near the position of the specific user estimated by the
estimation unit when being output from an actuator An information processing system
comprising: (2) The first object is a predetermined person, the second object is a predetermined
place, and the signal processing unit is a signal acquired by a sensor in the vicinity of the
predetermined person, and the predetermined place The information processing system
according to (1), wherein a signal acquired by a peripheral sensor is processed. (3) The first
object is a predetermined person, the second object is a predetermined place, and the signal
processing unit is a signal acquired in real time by a sensor around the predetermined person,
and the predetermined object. The information processing system according to (1), wherein a
signal already acquired and accumulated by a sensor around the location of is processed. (4) The
sensor in the vicinity of the first object and the sensor in the vicinity of the second object are
respectively disposed in remote locations, as described in any one of (1) to (3). Information
processing system. (5) The plurality of sensors disposed around the specific user is a
microphone, and the recognition unit recognizes the first and second objects based on an audio
signal detected by the microphone. The information processing system according to any one of
(1) to (4). (6) The plurality of sensors disposed around the specific user is an image sensor, and
the recognition unit determines the first and second objects based on the captured image
acquired by the image sensor. The information processing system according to any one of (1) to
(4), which recognizes. (7) The information processing according to any one of (1) to (6), wherein
the sensor in the vicinity of the first object and the sensor in the vicinity of the second object are
different types of sensors. system. (8) The signal processing unit processes the signal acquired by
the sensor in the vicinity of the first target based on the characteristic of the parameter
corresponding to the second object, and the periphery of the second target is processed. The
information processing apparatus according to any one of (1) to (7), which performs processing
of adding to a signal acquired by a sensor.
(9) The signal processing unit processes each signal acquired from sensors in the vicinity of the
first and second objects so as to be localized near the sense organ of the specific user. The
information processing apparatus according to any one of the above. (10) The respective sensors
around the first and second objects are microphones, and the plurality of actuators arranged
around the particular user are a plurality of speakers, and the signal processing unit is The first
and second based on the positions of the plurality of speakers and the estimated position of the
user so as to form a sound field near the position of the specific user when output from the
plurality of speakers. The information processing system according to any one of (1) to (9), which
processes each audio signal collected by the microphone in the vicinity of a target. (11) The
estimation unit continuously estimates the position of the specific user, and the signal processing
unit forms a sound field near the position of the specific user according to a change in the
position of the specific user. The information processing system according to (10), which
processes each of the audio signals. (12) A recognition unit for recognizing a first object and a
second object based on signals detected by sensors in the vicinity of a specific user, and the first
and second objects recognized by the recognition unit Based on an identification unit to be
identified and signals acquired from a plurality of sensors disposed around the first and second
objects identified by the identification unit, signals output from actuators around the specific
09-05-2019
60
user are extracted. An information processing system including a signal processing unit to
generate. (13) The first object is a predetermined person, the second object is a predetermined
place, and the signal processing unit is a signal acquired by a plurality of sensors disposed
around the predetermined person. And the information processing system according to (12),
which processes signals acquired from a plurality of sensors disposed around the predetermined
place. (14) The first object is a predetermined person, the second object is a predetermined place,
and the signal processing unit is acquired in real time by a plurality of sensors disposed around
the predetermined person. The information processing system according to (12), wherein the
signal and the signal already acquired and accumulated by sensors in the vicinity of the
predetermined place are processed. (15) The computer recognizes the first object and the second
object based on the signals detected by the plurality of sensors disposed around the specific user,
and the computer recognizes the first object and the second object. An identification unit that
identifies a first object and a second object, an estimation unit that estimates a position of the
specific user according to a signal detected by any of the plurality of sensors, and a periphery of
the specific user Acquired from sensors around the first and second objects identified by the
identification unit so as to be localized near the position of the specific user estimated by the
estimation unit when being output from a plurality of actuators A storage medium storing a
program for causing it to function as a signal processing unit that processes each signal.
(16) A recognition unit for recognizing a first object and a second object based on signals
detected by sensors in the vicinity of a specific user, and the first and second devices recognized
by the recognition unit. Output from actuators around the specific user based on an identification
unit identifying the target of the object and signals acquired from a plurality of sensors disposed
around the first and second objects identified by the identification unit A storage medium storing
a program for causing it to function as a signal processing unit that generates a signal.
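For configuration (8), one natural reading, consistent with the acoustic information parameter DB that appears in the reference signs list below, is that the signal from the first object (for example, a person's voice) is shaped by an acoustic parameter of the second object (for example, an impulse response measured at the predetermined place) and then mixed with the signal recorded at that place. The following is a minimal sketch under that assumption only; the specification does not fix the parameter type, and all names here are illustrative.

```python
# Minimal sketch of configuration (8): shape the person's voice with an
# acoustic parameter of the place (assumed here to be an impulse response)
# and add the result to the ambient signal of that place. Names illustrative.
import numpy as np

def apply_place_character(voice: np.ndarray,
                          place_impulse_response: np.ndarray,
                          place_ambient: np.ndarray,
                          wet_gain: float = 0.8) -> np.ndarray:
    """Convolve the voice with the place's impulse response and add the
    result to the ambient signal recorded around the place."""
    wet = np.convolve(voice, place_impulse_response)
    # Trim or pad so the processed voice lines up with the ambient bed.
    if len(wet) >= len(place_ambient):
        wet = wet[: len(place_ambient)]
    else:
        wet = np.pad(wet, (0, len(place_ambient) - len(wet)))
    return place_ambient + wet_gain * wet
```

Convolution with a measured impulse response is a standard way to impart the reverberation character of one space onto a signal recorded elsewhere.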
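Configurations (10) and (11) call for processing the collected audio, given the speaker positions and the estimated (and possibly changing) user position, so that a sound field forms near the user. One well-known way to realize this is delay-and-sum focusing: each speaker feed is delayed so that all wavefronts arrive simultaneously at a focal point near the user, with gains compensating for distance. The sketch below shows only that generic technique under free-field assumptions; the specification does not commit to a particular rendering method, and every name here is hypothetical.

```python
# Minimal delay-and-sum focusing sketch for configurations (10)-(11).
# Assumptions: free-field propagation, point-source speakers, fixed speed
# of sound; all names are hypothetical.
import numpy as np

SPEED_OF_SOUND = 343.0  # metres per second, roughly at room temperature

def render_to_speakers(source: np.ndarray,
                       sample_rate: int,
                       speaker_positions: np.ndarray,  # shape (n_speakers, 3)
                       user_position: np.ndarray       # shape (3,)
                       ) -> np.ndarray:
    """Derive one feed per speaker so the wavefronts coincide near the
    estimated user position."""
    distances = np.linalg.norm(speaker_positions - user_position, axis=1)
    # Nearer speakers are delayed more, so every wavefront reaches the
    # focal point at the same instant.
    delays = (distances.max() - distances) / SPEED_OF_SOUND
    # Compensate 1/r spreading so each speaker contributes comparably
    # at the focal point.
    gains = distances / distances.max()
    max_shift = int(np.ceil(delays.max() * sample_rate))
    out = np.zeros((len(speaker_positions), len(source) + max_shift))
    for i, (delay, gain) in enumerate(zip(delays, gains)):
        shift = int(round(delay * sample_rate))
        out[i, shift:shift + len(source)] = gain * source
    return out
```

For configuration (11), this rendering would simply be recomputed (or smoothly cross-faded) each time the estimation unit reports a new user position, so that the focal point tracks the user as they move.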
[0226]
1, 1A, 1B Signal processing device
3, 3' Management server
5 Network
6 User ID DB
7 Communication terminal
8 Service log DB
9 User personal DB
10, 10A, 10B, 10C Microphone
11 Amplifier/ADC (analog-to-digital converter)
13, 200 Signal processing unit
15 Microphone position information DB (database)
16 User position estimation unit
17 Recognition unit
18 Identification unit
19 Communication I/F (interface)
20, 20A, 20B, 20C Speaker
23 DAC (digital-to-analog converter)/amplifier unit
25 User authentication unit
32 Management unit
33, 34 Search unit
35 User position information DB
36 Acoustic information parameter DB
37 Acoustic content DB
40, 40-1, 40-2, 40-3 Acoustic closed surface
42 Sound field
43, 43-1, 43-2, 43-3 Closed surface
60, 65 Tag
100, 100-1 to 100-4 Device
131 Microphone array processing unit
133, 210 High S/N processing unit
135, 220 Sound field reproduction signal processing unit
136 Convolution unit
137, 139 Howling suppression unit
138 Matrix convolution unit