JP2016092529

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2016092529
Abstract: In a sound space in which many speakers and environmental sounds, including noise, are mixed, a specific sound source is extracted from the collected sound, and the extracted high-quality voice is transmitted to the user's own terminal or to the call partner of that terminal. A sound transmission system (100) includes one or more sound collection units (M1 to M4) that collect sound in a sound space AR containing a plurality of sounds, a storage unit that stores the sound data collected by the one or more sound collection units, a sound source separation unit PR2 that separates the sound data stored in the storage unit into as many pieces of sound data as there are sound collection units, and a first transmission unit that transmits any one of the separated voice data to communication terminals T1 and T2. [Selected figure] Figure 1
Voice transmission system and voice transmission method
[0001]
The present invention relates to an audio transmission system and an audio transmission method that perform predetermined signal processing on audio collected in a sound space in which a plurality of sounds are mixed, and transmit the processed audio.
[0002]
In a space where a plurality of voices are mixed (hereinafter referred to as a "sound space"), sound source separation techniques have been studied in which voices are picked up using a plurality of microphones and a sound source (voice signal) in a specific direction is extracted from the collected voice signals (see, for example, Patent Document 1).
03-05-2019
1
[0003]
The sound source separation apparatus of Patent Document 1 performs, in parallel, independent component analysis, which separates sound without using information on the microphone arrangement and accounts for the influence of sound diffraction, and a filtering process, which can extract sound in a specified target direction but is weak against sound diffraction. Using the power spectrum of the filtered sound as a reference signal, it regards a sound with high temporal correlation to the reference signal as a sound with high noise-suppression performance, and selects such a sound as the sound source separation signal.
As a result, the sound source separation apparatus can separate sound in a short time even when the sound is diffracted around a housing, such as that of a mobile phone.
[0004]
JP 2011-199474 A
[0005]
According to Patent Document 1, a voice in an arbitrary direction can be extracted from the sound waveform information (voice signals) collected by a microphone array having a plurality of microphones mounted on a housing (for example, a mobile phone) whose product size is defined in advance.
[0006]
However, since the configuration of Patent Document 1 presupposes a number and arrangement of microphones determined by the size of the product, there is a problem that it is difficult to extract a specific sound source from the sound collected by a plurality of microphones in an actual sound space in which many speakers and environmental sounds, including noise, are mixed.
[0007]
In order to solve the problems described above, it is an object of the present invention to provide a voice transmission system and a voice transmission method that extract a specific sound source from the sound collected in an actual sound space in which many speakers and environmental sounds, including noise, are mixed, and that transmit the extracted high-quality voice to the user's own terminal or to the call partner of that terminal.
[0008]
The present invention is an audio transmission system including a communication terminal, the system comprising: one or more sound pickup units that collect sound in a sound space containing a plurality of sounds; a storage unit that stores the sound data collected by the one or more sound pickup units; a separation unit that separates the sound data stored in the storage unit into as many pieces of sound data as there are sound pickup units; and a first transmission unit that transmits any one of the separated voice data generated by the separation unit to the communication terminal.
[0009]
Furthermore, the present invention is a voice transmission system including a first communication terminal and a second communication terminal, the system comprising: one or more sound collection units that collect sound in a sound space containing a plurality of sounds; a storage unit that stores the sound data collected by the one or more sound collection units; a separation unit that separates the sound data stored in the storage unit into as many pieces of sound data as there are sound collection units; and a second transmission unit that transmits, to the second communication terminal, either the voice data transmitted from the first communication terminal or any one of the separated voice data generated by the separation unit.
[0010]
Further, the present invention is a voice transmission method in a voice transmission system including a communication terminal, the method comprising the steps of: collecting sound with one or more sound collection devices arranged in a sound space containing a plurality of sounds; storing the sound data collected by the one or more sound collection devices; separating the sound data into as many pieces of sound data as there are sound collection devices; and transmitting any one of the separated voice data to the communication terminal.
[0011]
Furthermore, the present invention is a voice transmission method in a voice transmission system including a first communication terminal and a second communication terminal, the method comprising the steps of: collecting sound with one or more sound pickup devices arranged in a sound space containing a plurality of sounds; storing the sound data collected by the one or more sound pickup devices; separating the sound data into as many pieces of sound data as there are sound pickup devices; and transmitting, to the second communication terminal, either the voice data transmitted from the first communication terminal or any one of the separated voice data.
[0012]
According to the present invention, a specific sound source can be extracted from the sound collected in an actual sound space in which many speakers and environmental sounds, including noise, are mixed, so that the extracted high-quality voice can be transmitted to the user's own terminal or to the call partner of that terminal.
[0013]
A block diagram showing an example of the system configuration of the voice transmission system of the present embodiment. A block diagram showing an example of the internal configuration of the communication terminals of the present embodiment. An explanatory drawing schematically showing the first use case example of the voice transmission system of the present embodiment. An explanatory drawing schematically showing the second use case example of the voice transmission system of the present embodiment. An explanatory drawing schematically showing the third use case example of the voice transmission system of the present embodiment. A flowchart explaining an example of the operation outline, along a time series, of the voice transmission system in the third use case example shown in FIG. 5. An explanatory drawing schematically showing the fourth use case example of the voice transmission system of the present embodiment. A flowchart explaining the operation procedure of the first operation example of the voice transmission system of the present embodiment. A flowchart explaining the operation procedure of the second operation example of the voice transmission system of the present embodiment.
[0014]
Hereinafter, an embodiment (hereinafter referred to as "the present embodiment") that specifically discloses a voice transmission system and a voice transmission method according to the present invention will be described in detail with reference to the drawings.
[0015]
In the first operation example of the present embodiment, the voice transmission system picks up sound with one or more microphones disposed in a sound space AR in which a plurality of sounds (for example, the voices of a plurality of people) are mixed, and stores the collected sound data in a place different from the sound space AR (for example, a server apparatus on a cloud network).
In response to a predetermined input operation by the user of the communication terminal, the voice transmission system performs sound source separation, splitting the stored sound data into as many pieces of sound data as there are microphones (hereinafter referred to as "separated voice data"), and transmits any one of the separated voice data to the communication terminal.
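The per-input switching described above can be sketched as a simple selector that cycles through the separated streams. This is a minimal, hypothetical model for illustration only; the class and stream names do not appear in the patent.

```python
class SeparatedStreamSelector:
    """Hypothetical sketch: each predetermined input operation by the user
    advances to the next separated voice stream, one stream per microphone."""

    def __init__(self, stream_ids):
        if not stream_ids:
            raise ValueError("at least one separated stream is required")
        self._streams = list(stream_ids)
        self._index = 0

    @property
    def current(self):
        # The separated voice data currently being transmitted to the terminal.
        return self._streams[self._index]

    def next_on_input(self):
        # A predetermined input operation switches to the next separated stream,
        # wrapping around after the last one.
        self._index = (self._index + 1) % len(self._streams)
        return self.current


# One separated stream per microphone M1-M4.
selector = SeparatedStreamSelector(["M1", "M2", "M3", "M4"])
print(selector.current)          # M1
print(selector.next_on_input())  # M2
```

Wrapping around after the last stream matches the "sequentially switches" behavior described for the communication terminal later in the text.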
[0016]
Further, in the second operation example of the present embodiment, the voice transmission system picks up sound with one or more microphones arranged in a sound space AR in which a plurality of sounds (for example, the voices of a plurality of people) are mixed, and stores the collected sound data in a place different from the sound space AR (for example, a server apparatus on a cloud network).
In response to a predetermined input operation by the user of the first communication terminal, the voice transmission system performs sound source separation, splitting the stored sound data into as many pieces of sound data as there are microphones (hereinafter referred to as "separated voice data"), and transmits any one of the separated voice data to the second communication terminal, which is the call partner of the first communication terminal.
[0017]
Hereinafter, the details of the voice transmission system according to the present embodiment will be specifically described.
A user who uses the communication terminal in the first operation example, or the first communication terminal (see the communication terminal T1 shown in FIG. 1) in the second operation example, is referred to as "user A", and a user who uses the second communication terminal (see the communication terminal T2 shown in FIG. 1) in the second operation example is referred to as "user B".
[0018]
FIG. 1 is a block diagram showing an example of the system configuration of the voice transmission system 100 of the present embodiment.
For the first operation example described above, the voice transmission system 100 shown in FIG. 1 includes the communication terminal T1, one or more microphones M1, M2, M3, and M4 disposed in the sound space AR, the microphone signal multiplex transmission apparatus MX, a server apparatus CDS provided on the cloud network NW1, and a first selection unit SL1 provided on the LTE network NW2.
For the second operation example described above, the voice transmission system 100 further includes, in addition to the configuration for the first operation example, the communication terminal T2 and a second selection unit SL2 provided on the LTE network NW2.
[0019]
The microphones M1 to M4, as examples of sound collection units or sound collection devices, are disposed at predetermined positions in the sound space AR, pick up the sound in which a plurality of voices are mixed in the sound space AR, and output it to the microphone signal multiplex transmission apparatus MX.
[0020]
The microphone signal multiplex transmission apparatus MX aggregates and multiplexes the sound data collected by the microphones M1 to M4, and transmits the multiplexed sound data (audio signal) to the server apparatus CDS provided on the cloud network NW1.
The microphone signal multiplex transmission apparatus MX may be disposed in the sound space AR, or at a place other than the sound space AR, as long as it is electrically connected to the microphones M1 to M4.
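The aggregation and later demultiplexing can be illustrated with a minimal frame-interleaving sketch. The format here is hypothetical; a real apparatus would use a transport protocol with clocking and timestamps.

```python
def multiplex(frames_per_mic):
    """Interleave per-microphone frames into one stream, tagging each frame
    with its microphone index so the receiver can demultiplex it."""
    stream = []
    for frames_at_t in zip(*frames_per_mic):
        for mic_id, frame in enumerate(frames_at_t):
            stream.append((mic_id, frame))
    return stream


def demultiplex(stream, n_mics):
    """Rebuild the per-microphone frame lists from the multiplexed stream."""
    per_mic = [[] for _ in range(n_mics)]
    for mic_id, frame in stream:
        per_mic[mic_id].append(frame)
    return per_mic


# Two frames from each of four microphones M1-M4; a round trip is lossless.
mics = [["a1", "a2"], ["b1", "b2"], ["c1", "c2"], ["d1", "d2"]]
assert demultiplex(multiplex(mics), 4) == mics
```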
[0021]
The server apparatus CDS is provided on the cloud network NW1, which is an example of an Internet or intranet network, and includes at least a sound signal processing transmission unit PR.
In FIG. 1, only the sound signal processing transmission unit PR is illustrated, as the minimum configuration necessary to explain the operation of the server apparatus CDS according to the present embodiment; the receiving unit that receives the audio signal transmitted from the microphone signal multiplex transmission apparatus MX is not shown.
[0022]
The sound signal processing transmission unit PR includes at least a control unit PR1, a sound source separation unit PR2, and a memory (not shown) as an example of a storage unit.
The control unit PR1 and the sound source separation unit PR2 are configured using, for example, a central processing unit (CPU), a micro processing unit (MPU), or a digital signal processor (DSP).
The sound signal processing transmission unit PR demultiplexes the audio signal multiplexed by the microphone signal multiplex transmission apparatus MX.
[0023]
A control signal (see below) for sequentially switching the separated voice data, generated after sound source separation by the sound source separation unit PR2, is input to the control unit PR1 each time user A or user B performs a predetermined input operation on the communication terminal T1 or T2.
In response to the input of the control signal, the control unit PR1 outputs to the sound source separation unit PR2 an instruction to sequentially switch the separated voice data transmitted from the server apparatus CDS.
The sound signal processing transmission unit PR transmits the separated voice data output from the sound source separation unit PR2 to the first selection unit SL1 or the second selection unit SL2 in accordance with the switching instruction of the control unit PR1.
[0024]
The sound source separation unit PR2 performs sound source separation on the demultiplexed audio data stored in the memory, thereby generating as many pieces of separated voice data as there are microphones (for example, microphones M1 to M4); that is, when there are as many sound sources in the sound space AR as there are microphones, the voice data of the sound output from each sound source is generated.
The sound source separation processing in the sound source separation unit PR2 is a known technique, for example the method disclosed in Patent Document 1 described above or a method disclosed in other known literature, so its details are not explained here.
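Since the patent defers to known separation techniques, only the core idea — N microphone mixtures can yield N separated signals — is sketched here, using a known 2x2 mixing matrix rather than a blind method. All names and values below are illustrative assumptions.

```python
def unmix_two_sources(mix1, mix2, mixing):
    """Recover two source signals from two microphone mixtures by inverting
    a known 2x2 instantaneous mixing matrix. Blind methods such as the
    independent component analysis of Patent Document 1 estimate the
    unmixing without knowing the matrix; this only shows why N microphones
    can yield N separated signals."""
    (a, b), (c, d) = mixing
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("mixing matrix is not invertible")
    # Inverse of [[a, b], [c, d]] is (1/det) * [[d, -b], [-c, a]].
    s1 = [(d * u - b * v) / det for u, v in zip(mix1, mix2)]
    s2 = [(-c * u + a * v) / det for u, v in zip(mix1, mix2)]
    return s1, s2


# Two sources mixed into two "microphone" signals, then recovered.
src1, src2 = [1.0, 0.0, -1.0], [0.5, 0.5, 0.5]
A = ((1.0, 0.3), (0.2, 1.0))
mix1 = [A[0][0] * x + A[0][1] * y for x, y in zip(src1, src2)]
mix2 = [A[1][0] * x + A[1][1] * y for x, y in zip(src1, src2)]
rec1, rec2 = unmix_two_sources(mix1, mix2, A)
```

Each microphone hears a different weighted mixture of the sources; as long as the mixing is invertible, as many sources as microphones can be recovered, which is the premise the text states.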
[0025]
The memory functions as a work memory when the control unit PR1 and the sound source separation unit PR2 operate, and also stores the audio data of the audio signal after it has been demultiplexed by the sound signal processing transmission unit PR.
[0026]
The first selection unit SL1 and the second selection unit SL2 are configured by, for example, radio base stations provided on the LTE network NW2, which is an example of a public telephone network.
The network in which the first selection unit SL1 and the second selection unit SL2 are arranged is not limited to LTE (Long Term Evolution), and may be a 3G (3rd Generation) network or an HSPA (High Speed Packet Access) network.
[0027]
Of the separated voice data corresponding to the one or more sound sources generated by the sound source separation unit PR2, the first selection unit SL1 receives the separated voice data c1 designated by the control unit PR1.
The first selection unit SL1 transmits the separated voice data c1 to the communication terminal T1 via the LTE network NW2.
[0028]
In addition, when users A and B are talking using the communication terminals T1 and T2, respectively, the first selection unit SL1 receives the call voice of user B transmitted from the communication terminal T2 (in other words, voice data b of the raw call voice that has not undergone sound source separation in the server apparatus CDS), and transmits either the separated voice data c1 or the voice data b of user B's call voice to the communication terminal T1.
[0029]
Of the separated voice data corresponding to the one or more sound sources generated by the sound source separation unit PR2, the second selection unit SL2 receives the separated voice data c2 designated by the control unit PR1.
The second selection unit SL2 transmits the separated voice data c2 to the communication terminal T2 via the LTE network NW2.
[0030]
In addition, when users A and B are talking using the communication terminals T1 and T2, respectively, the second selection unit SL2 receives the call voice of user A transmitted from the communication terminal T1 (in other words, voice data a of the raw call voice that has not undergone sound source separation in the server apparatus CDS), and transmits either the separated voice data c2 or the voice data a of user A's call voice to the communication terminal T2.
[0031]
The communication terminals T1 and T2 are terminals for using the sound source separation service of the voice transmission system 100 (that is, a service that separates the sound data collected in the sound space AR into as many pieces of sound data as there are microphones); they are operated by users A and B, respectively, and have at least a wireless communication function and a call function.
The communication terminals T1 and T2 are, for example, mobile phones, smartphones, or cordless handsets.
[0032]
Further, applications AP1 and AP2 for using the sound source separation service (in FIG. 1, simply abbreviated as "application") are installed in advance on the communication terminals T1 and T2. In other words, as described later with reference to FIG. 2, the programs and data of the applications AP1 and AP2 for using the sound source separation service are stored in advance in the communication terminals T1 and T2, and when users A and B use the sound source separation service, the applications AP1 and AP2 are activated in response to their input operations.
[0033]
FIG. 2 is a block diagram showing an example of the internal configuration of the communication terminals T1 and T2 of the present embodiment. The communication terminal T1 shown in FIG. 2 includes a sound collection unit 11, a CPU 12, a memory 13, an input unit 14, a display unit 15, an audio output unit 16, and a wireless communication unit 17 to which an antenna 18 is connected. Similarly, the communication terminal T2 shown in FIG. 2 includes a sound collection unit 21, a CPU 22, a memory 23, an input unit 24, a display unit 25, an audio output unit 26, and a wireless communication unit 27 to which an antenna 28 is connected. Since the configurations of the communication terminals T1 and T2 are the same, in the description of each part shown in FIG. 2 the communication terminal T1 is taken as an example, and the operation of the communication terminal T2 is explained only where it differs from that of the communication terminal T1.
[0034]
The sound collection units 11 and 21 collect the call voice when users A and B make a call while holding the communication terminals T1 and T2, respectively, or make a call without holding them (for example, a hands-free call), and output it to the CPUs 12 and 22. The voice data collected by the sound collection units 11 and 21 is an analog signal, which is converted into a digital signal by an ADC (Analog-to-Digital Converter, not shown) included in the sound collection units 11 and 21 and then input to the CPUs 12 and 22.
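The ADC step can be illustrated by quantizing full-scale analog-style samples to signed 16-bit PCM. This is a simplified sketch under the assumption of a 16-bit converter; a real ADC also samples the continuous signal in time.

```python
def to_pcm16(samples):
    """Quantize float samples in [-1.0, 1.0] to signed 16-bit integers,
    clipping anything outside the full-scale range first."""
    pcm = []
    for s in samples:
        s = max(-1.0, min(1.0, s))         # clip to the ADC's input range
        pcm.append(int(round(s * 32767)))  # scale to the 16-bit range
    return pcm


print(to_pcm16([0.0, 0.25, 1.0, -1.0, 2.0]))  # [0, 8192, 32767, -32767, 32767]
```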
[0035]
The CPUs 12 and 22 perform control processing for the overall control of the operation of each unit of the communication terminals T1 and T2, data input/output processing with the other units, data calculation processing, and data storage processing.
When the CPUs 12 and 22 receive, for example, a signal (notification signal) announcing the sound source separation service at the wireless communication units 17 and 27 via the antennas 18 and 28, they activate the applications AP1 and AP2 for using the sound source separation service.
Further, after activating the applications AP1 and AP2, the CPUs 12 and 22 display the map information of the sound space AR on the display units 15 and 25 when, for example, that map information is stored in advance in the memories 13 and 23.
[0036]
The memories 13 and 23 are configured using, for example, a random access memory (RAM), a read-only memory (ROM), and a nonvolatile or volatile semiconductor memory; they function as work memory when the CPUs 12 and 22 operate, and store predetermined programs (for example, the applications AP1 and AP2) and data for operating the CPUs 12 and 22.
[0037]
The input units 14 and 24 are user interfaces (UIs) that receive the input operations of users A and B and notify the CPUs 12 and 22; they are configured using, for example, a touch panel or touch pad arranged in correspondence with the screens of the display units 15 and 25 and operable with a finger or a stylus pen.
[0038]
The display units 15 and 25 are configured using, for example, an LCD (Liquid Crystal Display) or organic EL (Electroluminescence) display, and display the screens of the applications AP1 and AP2 for using the sound source separation service in response to predetermined input operations by users A and B on the input units 14 and 24.
[0039]
The voice output units (for example, speakers) 16 and 26 output, in accordance with the input operations of users A and B, the sound of, for example, the separated voice data c1 transmitted from the first selection unit SL1 or the voice data b of user B's call voice.
[0040]
The wireless communication unit 17 is connected to the antenna 18 for wireless communication (for example, LTE, 3G, HSPA, Wi-Fi (registered trademark), or Bluetooth (registered trademark)), and receives the separated voice data c1 or the voice data b of user B's call voice transmitted from the first selection unit SL1.
The wireless communication unit 27 is connected to the antenna 28 for wireless communication (for example, LTE, 3G, HSPA, Wi-Fi (registered trademark), or Bluetooth (registered trademark)), and receives the separated voice data c2 or the voice data a of user A's call voice transmitted from the second selection unit SL2.
[0041]
The wireless communication unit 17 also transmits, to the server apparatus CDS via the antenna 18, the control signal generated by the CPU 12 for sequentially switching the separated voice data generated after sound source separation by the sound source separation unit PR2, in response to an input operation on the input unit 14 by user A.
The wireless communication unit 27 transmits, to the server apparatus CDS via the antenna 28, the control signal generated by the CPU 22 for switching between the separated voice data c2 transmitted from the second selection unit SL2 and the voice data a of user A's call voice, in response to an input operation on the input unit 24 by user B.
[0042]
Next, various use case examples of the sound source separation service in the voice transmission system according to the present embodiment will be described with reference to FIGS. 3 to 7.
To simplify the description of the use case examples, the microphone signal multiplex transmission apparatus MX shown in FIG. 1 and the LTE network NW2 including the first selection unit SL1 and the second selection unit SL2 are omitted from FIGS. 3, 4, 5, and 7.
[0043]
(First Use Case Example) FIG. 3 is an explanatory drawing schematically showing a first use case example of the voice transmission system 100 of the present embodiment.
In the first use case example, a plurality of microphones M1a, M2a, M3a, M4a, and M5a are arranged in a sound space AR (for example, an interview room), and it is assumed that the speech of a Japanese-speaking speaker Ps1a is simultaneously interpreted into multiple languages by an English interpreter Tr1, a French interpreter Tr2, and a Chinese interpreter Tr3 around the speaker. In this case, the server apparatus CDS stores sound data obtained by collecting at least the Japanese voice of the speaker Ps1a, the English voice of the interpreter Tr1, the French voice of the interpreter Tr2, and the Chinese voice of the interpreter Tr3.
[0044]
In the first use case example shown in FIG. 3, the user Ps2a of the communication terminal T1, on which the application AP1 for using the sound source separation service is installed, may or may not be in the sound space AR.
[0045]
In response to a predetermined input operation on the communication terminal T1 of the user Ps2a, the server apparatus CDS separates the sound data collected by the microphones M1a to M5a in the sound space AR into as many sound sources as there are microphones (for example, the Japanese voice, the English voice, the French voice, the Chinese voice, and other sounds), and transmits one of the separated voice data (for example, the English voice) to the communication terminal T1.
The communication terminal T1 sequentially switches and outputs the separated voice data transmitted from the server apparatus CDS each time the user Ps2a performs a predetermined input operation. Thus, the user Ps2a can listen, in high-quality speech, to the content of the speaker Ps1a, who is speaking in Japanese in the interview room that is the sound space AR, in a language the user can understand (for example, English).
[0046]
(Second Use Case Example) FIG. 4 is an explanatory drawing schematically showing a second use case example of the voice transmission system 100 of the present embodiment. In the second use case example, a plurality of microphones M1b, M2b, M3b, M4b, M5b, M6b, M7b, and M8b are arranged in a sound space AR (for example, a concert hall in Tokyo), and it is assumed that an orchestra including a singer, a piano player, a violin player, a clarinet player, and a flute player is performing. In this case, the server apparatus CDS stores sound data of at least the voice of the singer's song, the sound of the piano by the piano player, the sound of the violin by the violin player, the sound of the clarinet by the clarinet player, the sound of the flute by the flute player, and the noise produced by a noise source.
[0047]
In the second use case example shown in FIG. 4, the users Ps1b and Ps2b of communication terminals T1, on which the application AP1 for using the sound source separation service is installed, may be, for example, in Osaka, far from the concert hall, or near the concert hall. Further, on the communication terminals T1 of the users Ps1b and Ps2b, the sound space AR (for example, a sketch of the concert hall) is displayed on the display unit 15 as a screen example of the application AP1 for using the sound source separation service.
[0048]
In response to a predetermined input operation on the communication terminal T1 of the user Ps1b in Osaka, the server apparatus CDS separates the sound data collected by the microphones M1b to M8b in the sound space AR into as many sound sources as there are microphones (for example, the voice of the singer's song, the sound of the piano, the sound of the violin, the sound of the clarinet, the sound of the flute, the noise produced by the noise source, and other sounds), and transmits one of the separated voice data (for example, the sound of the piano) to the communication terminal T1. The communication terminal T1 sequentially switches and outputs the separated voice data transmitted from the server apparatus CDS each time the user Ps1b performs a predetermined input operation. Thereby, the user Ps1b can listen to only the high-quality sound, after sound source separation, of the instrument (for example, the piano) that he or she has designated in the orchestra performing in the Tokyo concert hall that is the sound space AR.
[0049]
Similarly, in response to a predetermined input operation on the communication terminal T1 of the user Ps2b near the concert hall, the server apparatus CDS separates the sound data collected by the microphones M1b to M8b in the sound space AR into as many sound sources as there are microphones (for example, the voice of the singer's song, the sound of the piano, the sound of the violin, the sound of the clarinet, the sound of the flute, the noise produced by the noise source, and other sounds), and transmits one of the separated voice data (for example, the sound of the flute) to the communication terminal T1. The communication terminal T1 sequentially switches and outputs the separated voice data transmitted from the server apparatus CDS each time the user Ps2b performs a predetermined input operation. As a result, the user Ps2b can listen to only the high-quality sound, after sound source separation, of the instrument (for example, the flute) that he or she has specified in the orchestra performing in the Tokyo concert hall that is the sound space AR.
[0050]
(Third Use Case Example) FIG. 5 is an explanatory drawing schematically showing a third use case example of the voice transmission system of the present embodiment. FIG. 6 is a flowchart explaining an example of the operation outline, along a time series, of the voice transmission system in the third use case example shown in FIG. 5. In the third use case example, a plurality of microphones M1c, M2c, M3c, M4c, M5c, M6c, M7c, M8c, and M9c are arranged in a sound space AR (for example, a shopping mall SML in which a plurality of stores stand side by side), and it is assumed that a user Ps1c has come to the shopping mall for shopping. In this case, the server apparatus CDS stores sound data obtained by collecting at least the promotional announcements for the products or services of each store (specifically, a fish store, a general store, a book store, a shoe store, a sushi restaurant, a ramen restaurant, an information desk, and a clothing store).
[0051]
In the third use case example shown in FIG. 5, map information of each store in the sound space AR (for example, the shopping mall SML) is displayed on the display unit 15 of the communication terminal T1 of the user Ps1c as a screen example of the application AP1 for using the sound source separation service.
[0052]
In response to a predetermined input operation on the communication terminal T1 of the user Ps1c, the server apparatus CDS separates the sound data collected by the microphones M1c to M9c in the sound space AR (for example, the shopping mall SML) into as many types of sound sources as there are microphones (for example, the voice from the fish store, the voice from the general store, the voice from the book store, the voice from the shoe store, the voice from the sushi restaurant, the voice from the ramen restaurant, the voice from the information desk, the voice from the clothing store, and other sounds), and transmits one of the separated voice data (for example, the voice from the book store) to the communication terminal T1.
The communication terminal T1 sequentially switches and outputs the separated voice data transmitted from the server apparatus CDS each time the user Ps1c performs a predetermined input operation. Thereby, the user Ps1c can listen to the desired high-quality voice (for example, the voice from the book store) out of the mixture of sounds from the plurality of stores heard in the shopping mall SML that is the sound space AR.
[0053]
In FIG. 6, when the user Ps1c, who is a shopper, enters the shopping mall SML (ST1), the
communication terminal T1 possessed by the user Ps1c receives, for example by Bluetooth
(registered trademark), a notification signal indicating that it is positioned in the target area
of the sound source separation service that can be realized by the voice transmission system 100
of this embodiment (ST2). The notification signal is transmitted from, for example, a
predetermined sound source separation service providing apparatus (not shown) installed by a
shopping mall operator who provides the sound source separation service.
[0054]
Upon receiving the notification signal in step ST2, the communication terminal T1 activates the
application AP1 for using the sound source separation service (ST3) and displays, on the screen
of the display unit 15, a sketch (see FIG. 5) of the sound space AR (for example, the shopping
mall SML) where the user Ps1c is currently located (ST4). The process of step ST4 may be omitted
in the communication terminal T1.
[0055]
As described above, the server apparatus CDS stores voice data in which at least the PR speech
for the products or services of each store (specifically, a fish store, a general store, a book
store, a shoe store, a sushi store, a ramen restaurant, an information store, and a clothes
store) is collected. By a predetermined input operation on the communication terminal T1 of the
user Ps1c, the server apparatus CDS separates the voice data of the voices collected by the
microphones M1c to M9c in the sound space AR (for example, the shopping mall SML) into sound
sources of the above-mentioned types, equal in number to the number of microphones, and transmits
any one of the separated voice data (for example, the voice from the book store) to the
communication terminal T1.
[0056]
The communication terminal T1 sequentially switches and outputs the separated voice data
transmitted from the server apparatus CDS at each predetermined input operation of the user Ps1c
(ST5). After the separated voice data desired by the user Ps1c has been output from the
communication terminal T1, the application AP1 for using the sound source separation service is
ended by an input operation of the user Ps1c (ST6).
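The client-side flow of steps ST1 to ST6 can be sketched as a small state machine. Note that the class `ShoppingMallApp`, its method names, and the store list below are illustrative assumptions for this sketch, not names from the patent.

```python
# Hypothetical sketch of the client-side app flow (ST1-ST6) in the third use case.
class ShoppingMallApp:
    """Models the sound source separation app lifecycle on the terminal T1."""

    def __init__(self, stores):
        self.stores = stores          # labels of the separated sources
        self.active = False
        self.index = 0                # which separated stream is playing

    def on_notification(self):
        # ST2/ST3: a notification signal from the service area activates the app.
        self.active = True
        return "map displayed"        # ST4 (optional in the terminal)

    def on_switch(self):
        # ST5: each input operation switches to the next separated stream.
        if not self.active:
            raise RuntimeError("app not active")
        self.index = (self.index + 1) % len(self.stores)
        return self.stores[self.index]

    def on_end(self):
        # ST6: the user ends the app once the desired stream has been heard.
        self.active = False


app = ShoppingMallApp(["fish store", "book store", "shoe store"])
app.on_notification()
print(app.on_switch())   # "book store"
print(app.on_switch())   # "shoe store"
app.on_end()
```

The wrap-around when the last stream is reached is a design choice of this sketch; the patent only states that the streams are switched sequentially.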
[0057]
Fourth Example of Use Case FIG. 7 is an explanatory view schematically showing a fourth example
of a use case of the voice transmission system 100 of the present embodiment. In the fourth use
case example, a plurality of microphones M1d, M2d, M3d, M4d, and M5d are arranged in a sound
space AR (for example, an office room), and it is assumed that, in a negotiation space Dk
arranged around the loud speaker Ps1, the speakers Ps2 and Ps3, who are having a business talk
with each other, hold a conversation. In this case, the server apparatus CDS stores voice data in
which at least the voice of the speaker Ps1, the voice of the speaker Ps2, and the voice of the
speaker Ps3 are collected.
[0058]
In the fourth use case example shown in FIG. 7, the speaker Ps2, who is the user of the
communication terminal T1 in which the application AP1 for using the sound source separation
service is installed, is talking with the other party Ps4 of the communication terminal T2, who
is not in the sound space AR. An application AP2 for using the sound source separation service is
installed in the communication terminal T2. When the speaker Ps2 and the other party Ps4 have not
activated the applications AP1 and AP2 for using the sound source separation service, the
communication terminals T1 and T2 transmit and receive the voice of the speaker Ps2 and the voice
of the other party Ps4 (that is, the raw call voices picked up by the communication terminals T1
and T2) via, for example, the LTE network NW2.
[0059]
By a predetermined input operation on the communication terminal T1 of the speaker Ps2, the
server apparatus CDS separates the voice data of the voices collected by the microphones M1d to
M5d in the sound space AR into the same number of sound sources as the number of microphones (for
example, the voice of the speaker Ps1, the voice of the speaker Ps2, the voice of the speaker
Ps3, and other voice), and transmits any one of the separated voice data (for example, the voice
of the speaker Ps3) to the communication terminal T1. The communication terminal T1 sequentially
switches and outputs the separated voice data transmitted from the server apparatus CDS at each
predetermined input operation of the speaker Ps2. When the speaker Ps2 designates one of the
separated voice data (for example, the separated voice data of the speaker Ps3) by operating the
communication terminal T1, the communication terminal T2 receives the designated separated voice
data via the server apparatus CDS, switches the voice of the speaker Ps2 (the received voice) to
the separated voice data of the speaker Ps3, and outputs it. As a result, the speaker Ps2 can let
the other party Ps4, who is far away from the office room, listen to high-quality separated voice
data of the conversation of the speaker Ps3, who is in conversation with the speaker Ps2 in the
office room that is the sound space AR.
[0060]
Next, an operation procedure of the first operation example of the voice transmission system of
the present embodiment will be described with reference to FIGS. 1 and 8. FIG. 8 is a flow chart
for explaining the operation procedure of the first operation example of the voice transmission
system 100 of the present embodiment. In the following description, to simplify the explanation,
the operations of the communication terminals T1 and T2 and the server apparatus CDS are mainly
described; the operations of the first selection unit SL1 and the second selection unit SL2 are
described as necessary, and description of the operation of the microphone signal multiplex
transmission apparatus MX is omitted.
[0061]
In FIG. 8, the user A performs an operation of activating the application AP1 for using the sound
source separation service on the communication terminal T1. Thus, the communication terminal T1
activates the application AP1 for using the sound source separation service (ST11). By a
predetermined input operation of the user A, the communication terminal T1 transmits to the
server apparatus CDS a start signal requesting the start of sound source separation in the sound
space AR, which is a target area of the sound source separation service, and the listening of the
separated voice data obtained by the sound source separation (ST12).
[0062]
The server apparatus CDS reads out, from a memory (not shown) in the server apparatus CDS, voice
data in which a plurality of types of voices collected in the sound space AR are mixed, and
performs sound source separation using the voice data after demultiplexing, thereby generating
separated voice data of the same number as the one or more microphones (for example, the
microphones M1 to M4) (that is, assuming that the same number of sound sources as the number of
arranged microphones are present in the sound space AR) (ST21).
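The separation of step ST21 can be illustrated with a toy two-microphone example. The patent does not specify a separation algorithm, so this sketch assumes a known 2 x 2 mixing matrix and demixes by inverting it; a real system would instead estimate the demixing (for example by independent component analysis or beamforming) from the microphone signals alone. All function names here are illustrative.

```python
def mix(s1, s2, a11, a12, a21, a22):
    # Two microphones each record a weighted sum of the two sources.
    x1 = [a11 * u + a12 * v for u, v in zip(s1, s2)]
    x2 = [a21 * u + a22 * v for u, v in zip(s1, s2)]
    return x1, x2

def demix(x1, x2, a11, a12, a21, a22):
    # Invert the 2 x 2 mixing matrix: one separated stream per microphone,
    # matching the patent's "same number of separated data as microphones".
    det = a11 * a22 - a12 * a21
    s1 = [( a22 * u - a12 * v) / det for u, v in zip(x1, x2)]
    s2 = [(-a21 * u + a11 * v) / det for u, v in zip(x1, x2)]
    return s1, s2

speech = [0.0, 1.0, -1.0, 0.5]      # a "voice" source (sample values)
noise  = [0.2, 0.2, 0.2, 0.2]       # an "environmental sound" source
x1, x2 = mix(speech, noise, 1.0, 0.5, 0.3, 1.0)
s1, s2 = demix(x1, x2, 1.0, 0.5, 0.3, 1.0)
print([round(v, 6) for v in s1])    # recovers the speech source
```

With the mixing matrix known, recovery is exact up to floating-point error; the hard part a real separation unit solves is estimating that matrix blindly.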
[0063]
In the control unit PR1, the server apparatus CDS initializes a parameter k indicating the
ordinal number of the one or more separated voice data generated in step ST21 (ST22), and
transmits the separated voice data corresponding to k = 1 to the communication terminal T1 via
the first selection unit SL1 (ST23).
[0064]
The communication terminal T1 receives the separated voice data transmitted in step ST23 and
outputs it from the voice output unit 16 (ST13, temporary listening by the user A).
At the time of step ST13, any one of the separated voice data c1 generated by the server
apparatus CDS is output (reproduced) at the communication terminal T1.
[0065]
Here, when the user A does not like the separated voice data output from the voice output unit 16
(ST14, NO), the user A performs an input operation to that effect (for example, an operation
requesting separated voice data other than that currently being output), and the communication
terminal T1 transmits to the server apparatus CDS a next selection request signal for switching
to separated voice data other than the currently output separated voice data.
[0066]
In response to the next selection request signal transmitted from the communication terminal T1,
the server apparatus CDS increments, in the control unit PR1, the parameter k indicating the
ordinal number of the one or more separated voice data generated in step ST21 (ST24), and
transmits the separated voice data corresponding to the incremented parameter k (for example,
k = 2) to the communication terminal T1 via the first selection unit SL1 (ST23).
As described above, in the first operation example of the voice transmission system 100
according to the present embodiment, transmission and reception of separated voice data are
repeated between the communication terminal T1 and the server apparatus CDS until the separated
voice data that the user A wants to hear is found. After finding the separated voice data to be
listened to, the communication terminal T1 can continue listening to that separated voice data.
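The k-parameter cycling of steps ST22 to ST24 can be sketched as server-side selection logic. The class `SeparationServer` and its method names are assumptions of this sketch, and the wrap-around when k passes the last stream is a design choice (the patent only states that k is incremented per next selection request).

```python
class SeparationServer:
    """Sketch of the control unit PR1's handling of the ordinal parameter k."""

    def __init__(self, separated):
        self.separated = separated   # separated voice data streams from ST21
        self.k = 0                   # ST22: k initialised (k = 1 in the text;
                                     # 0-based here for Python indexing)

    def current(self):
        # ST23: transmit the separated data for the current ordinal k
        # via the first selection unit SL1.
        return self.separated[self.k]

    def next_selection(self):
        # ST24: a next selection request increments k, wrapping around so the
        # user can keep cycling until the desired stream is found.
        self.k = (self.k + 1) % len(self.separated)
        return self.current()


server = SeparationServer(["c1-source1", "c1-source2", "c1-source3"])
print(server.current())          # "c1-source1"
print(server.next_selection())   # "c1-source2"
```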
[0067]
On the other hand, when the user A likes the separated voice data output from the voice output
unit 16 (ST14, YES), the user A performs an input operation to that effect (for example, an
operation continuing the output (reproduction) of the currently output separated voice data), and
the communication terminal T1 continues the output (reproduction) of the separated voice data
currently being output (ST15, main listening by the user A).
When the user A performs an input operation for instructing the end of the application AP1
(ST16), the communication terminal T1 ends the application AP1.
[0068]
Next, the operation procedure of the second operation example of the voice transmission system of
the present embodiment will be described with reference to FIGS. 1, 9, and 10. FIG. 9 is a flow
chart for explaining the operation procedure of the second operation example of the voice
transmission system 100 of the present embodiment. FIG. 10 is a flow chart for explaining the
operation procedure continued from FIG. 9. In FIGS. 9 and 10, the communication terminal T1 used
by the user A is the transmitting side, and the communication terminal T2 used by the user B is
the receiving side. In the description of FIG. 9, operations overlapping those of FIG. 8 are
assigned the same step numbers and their description is omitted; only the differing contents are
described.
[0069]
In FIG. 9, the communication terminal T1 starts transmission and reception of a call voice with
the communication terminal T2 by a predetermined input operation of the user A for starting a
call between the user A and the user B; similarly, the communication terminal T2 starts
transmission and reception of a call voice with the communication terminal T1 by a predetermined
input operation of the user B for starting the call between the user A and the user B (ST51).
Accordingly, the voice data transmitted from the communication terminal T1 to the communication
terminal T2 is the voice data a of the call voice of the user A, and the voice data transmitted
from the communication terminal T2 to the communication terminal T1 is the voice data b of the
call voice of the user B.
[0070]
The user A performs an operation of activating the application AP1 for using the sound source
separation service as the sender mode on the communication terminal T1. Thus, the
communication terminal T1 activates the application AP1 for using the sound source separation
service as the sender mode (ST32). Similarly, the user B performs an operation of activating the
application AP2 for using the sound source separation service as the receiver mode on the
communication terminal T2. Thus, the communication terminal T2 activates the application AP2
for using the sound source separation service as the receiver mode (ST52).
[0071]
The communication terminal T1 receives the separated voice data corresponding to the parameter
k = 1 transmitted from the server apparatus CDS in step ST23 and outputs it from the voice output
unit 16 (ST13, temporary listening by the user A). At the time of step ST13, any one of the
separated voice data c1 generated by the server apparatus CDS is output (reproduced) at the
communication terminal T1.
[0072]
When the user A likes the separated voice data output from the voice output unit 16 (ST14, YES),
the user A performs an operation indicating that the voice data (separated voice data) to be
transmitted to the user B, who is the call partner, has been determined. Thereby, in accordance
with the determination operation of the user A, the communication terminal T1 transmits to the
server apparatus CDS and the communication terminal T2 a determination signal indicating that the
separated voice data to be transmitted to the communication terminal T2 has been determined
(ST33). The determination signal includes information indicating the type of the separated voice
data determined by the user A.
[0073]
In response to the determination signal transmitted from the communication terminal T1 in step
ST33, the server apparatus CDS transmits any one of separated voice data corresponding to the
determination signal to the communication terminal T2 (ST41).
[0074]
On the other hand, when the communication terminal T2 receives the determination signal
transmitted from the communication terminal T1 in step ST33 (ST53, YES), the communication
terminal T2 switches the received voice output from the voice output unit 26 from the voice data
a of the call voice of the user A transmitted from the communication terminal T1 to the separated
voice data c2 transmitted by the server apparatus CDS via the second selection unit SL2 in step
ST41 (ST54).
[0075]
In addition, when the user B performs a switching operation of the listening destination (ST55,
YES), the communication terminal T2 switches the received voice output from the voice output unit
26 between the voice data a of the call voice of the user A transmitted from the communication
terminal T1 and the separated voice data c2 transmitted by the server apparatus CDS via the
second selection unit SL2 in step ST41 (ST56).
[0076]
After step ST54, when the user B does not perform the switching operation of the listening
destination (ST55, NO), the communication terminal T2 continues the transmission and reception of
the call voice between the users A and B with the communication terminal T1 (ST57).
Similarly, the communication terminal T1 continues the transmission and reception of the call
voice between the users A and B with the communication terminal T2 (ST34).
The voice data transmitted from the communication terminal T2 to the communication terminal T1 is
the voice data b of the call voice of the user B, while the voice data transmitted from the
communication terminal T1 to the communication terminal T2 is either the voice data a of the call
voice of the user A or the separated voice data c2 transmitted by the server apparatus CDS via
the second selection unit SL2.
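The receiving terminal's listening-destination switch of steps ST54 to ST56 can be sketched as a simple toggle. The class `ReceiverT2` and its method names are illustrative assumptions; only the two listening destinations (voice data a and separated voice data c2) come from the patent.

```python
class ReceiverT2:
    """Sketch of terminal T2 switching its received-voice output."""

    def __init__(self, raw_voice, separated):
        self.sources = {"raw": raw_voice, "separated": separated}
        self.listening = "raw"       # initially the raw call voice a

    def on_determination_signal(self):
        # ST54: the determination signal from T1 switches the output to the
        # separated voice data c2 delivered via the second selection unit SL2.
        self.listening = "separated"

    def on_switch_operation(self):
        # ST55/ST56: user B can toggle between voice data a and c2 at will.
        self.listening = "raw" if self.listening == "separated" else "separated"

    def output(self):
        return self.sources[self.listening]


t2 = ReceiverT2(raw_voice="voice-data-a", separated="separated-c2")
t2.on_determination_signal()
print(t2.output())        # "separated-c2"
t2.on_switch_operation()
print(t2.output())        # "voice-data-a"
```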
[0077]
After step ST34, when the user A performs an input operation for instructing the end of the
application AP1 (ST35), the communication terminal T1 ends the application AP1 and transmits an
application end notification signal for notifying the end of the application AP1 to the server
apparatus CDS and the communication terminal T2.
When the server apparatus CDS receives the application end notification signal from the
communication terminal T1 (ST42, YES), it ends the operation of the sound source separation
service.
[0078]
Further, when the communication terminal T2 receives the application end notification signal from
the communication terminal T1, the communication terminal T2 displays that effect on the screen
of the display unit 25; when the user B performs an input operation for instructing the end of
the application AP2 (ST58), the communication terminal T2 switches the received voice output from
the voice output unit 26 from the separated voice data c2, transmitted by the server apparatus
CDS via the second selection unit SL2 in step ST41, back to the voice data a of the call voice of
the user A transmitted from the communication terminal T1 (ST59).
[0079]
After step ST35, the communication terminal T1 continues the transmission and reception of the
call voice between the users A and B with the communication terminal T2 (ST36).
Similarly, the communication terminal T2 continues the transmission and reception of the call
voice between the users A and B with the communication terminal T1 (ST60). In addition, since the
applications AP1 and AP2 of the sound source separation service have been ended at the
communication terminals T1 and T2, the voice data transmitted from the communication terminal T1
to the communication terminal T2 is the voice data a of the call voice of the user A, and the
voice data transmitted from the communication terminal T2 to the communication terminal T1 is the
voice data b of the call voice of the user B.
[0080]
When the call between the users A and B ends, the communication terminals T1 and T2 end the
transmission and reception of the call voice (ST37 and ST61).
[0081]
As described above, in the voice transmission system 100 according to the present embodiment, the
plurality of microphones M1 to M4 arranged in the sound space AR containing a plurality of sounds
collect the sound, and the voice data of the sounds collected by the microphones M1 to M4 is
stored in the server apparatus CDS on the cloud network NW1.
The server apparatus CDS separates the stored voice data, in the sound source separation unit
PR2, into voice data of the same number as the microphones M1 to M4 (separated voice data), and,
by a predetermined input operation on the communication terminal T1, outputs any one of the
separated voice data to the first selection unit SL1 on the LTE network NW2. The first selection
unit SL1 transmits to the communication terminal T1 any one of the separated voice data generated
by the sound source separation unit PR2.
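The pipeline summarised above (collect at the microphones, store on the server, separate, then forward one stream through the first selection unit) can be sketched end to end. All function names and the string placeholders for audio channels are assumptions of this sketch, not the patent's API.

```python
def collect(microphones):
    # Each microphone contributes one channel of the mixed sound space AR.
    return [f"mixed@{m}" for m in microphones]

def separate(stored_channels):
    # Stand-in for the sound source separation unit PR2: one separated
    # stream per stored microphone channel.
    return [ch.replace("mixed", "separated") for ch in stored_channels]

def first_selection_unit(separated, k):
    # SL1 forwards exactly one of the separated streams to terminal T1.
    return separated[k]


channels = collect(["M1", "M2", "M3", "M4"])     # microphones -> server CDS
separated = separate(channels)                    # PR2 output
print(first_selection_unit(separated, 0))         # "separated@M1" sent to T1
```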
[0082]
Thus, the voice transmission system 100 can extract separated voice data as a specific sound
source from the collected voice in an actual sound space AR in which a large number of speakers
and environmental sounds including noise are mixed. Therefore, by a predetermined input operation
on the communication terminal T1, the high-quality separated voice data desired by the user A of
the communication terminal T1 can be transmitted to the communication terminal T1, which is the
user's own terminal.
[0083]
In addition, in the voice transmission system 100, at each predetermined input operation on the
communication terminal T1, the control unit PR1 of the server apparatus CDS causes the sound
source separation unit PR2 to sequentially switch the separated voice data and transmit it to the
first selection unit SL1.
The communication terminal T1 receives and outputs any one of the separated voice data
transmitted from the first selection unit SL1 in accordance with the predetermined input
operation of the user A.
[0084]
Thereby, by the simple input operation of the user A on the communication terminal T1, the voice
transmission system 100 can selectively switch among the separated voice data separated as one or
more sound sources in the server apparatus CDS and transmit the high-quality separated voice data
desired by the user A to the communication terminal T1.
[0085]
Further, in the voice transmission system 100, the server apparatus CDS separates the stored
voice data, in the sound source separation unit PR2, into voice data (separated voice data) of
the same number as the microphones M1 to M4, and, by a predetermined input operation on the
communication terminal T2, outputs any one of the separated voice data to the second selection
unit SL2 on the LTE network NW2.
The second selection unit SL2 transmits to the communication terminal T2 either the voice data
transmitted from the communication terminal T1 (for example, the raw call voice of the user A
picked up by the communication terminal T1) or any one of the separated voice data generated by
the sound source separation unit PR2.
[0086]
Thus, the voice transmission system 100 can extract separated voice data as a specific sound
source from the collected voice in an actual sound space AR in which a large number of speakers
and environmental sounds including noise are mixed. Therefore, by a predetermined input operation
on the communication terminal T2, which is the call partner of the communication terminal T1,
either the voice data of the voice of the user A, who is the call partner, or the high-quality
separated voice data generated by the server apparatus CDS can be selectively transmitted to the
communication terminal T2 according to the preference of the user B.
[0087]
Further, the voice transmission system 100 transmits any one of the separated voice data
generated by the sound source separation unit PR2 to the communication terminal T1, and, at each
predetermined input operation of the user A on the communication terminal T1, sequentially
switches the separated voice data and transmits it from the sound source separation unit PR2 to
the first selection unit SL1.
The communication terminal T1 receives and outputs any one of the separated voice data
transmitted from the first selection unit SL1 in accordance with the predetermined input
operation of the user A.
[0088]
Thereby, by the simple input operation of the user A on the communication terminal T1, the voice
transmission system 100 can transmit to the communication terminal T1, from among the separated
voice data separated as one or more sound sources in the server apparatus CDS, the high-quality
separated voice data desired by the user B, who is the call partner. The high-quality separated
voice data designated at the communication terminal T1, after being listened to in advance at the
communication terminal T1, can therefore be transmitted to the communication terminal T2.
[0089]
Further, in the voice transmission system 100, in accordance with a predetermined input operation
on the communication terminal T1, the control unit PR1 causes the sound source separation unit
PR2 to output any one of the separated voice data to the second selection unit SL2.
The second selection unit SL2 transmits that separated voice data to the communication terminal
T2.
[0090]
As a result, the voice transmission system 100 can transmit to the communication terminal T2 the
separated voice data selected after the user A has listened to the separated voice data
transmitted to the communication terminal T1.
[0091]
Further, in the voice transmission system 100, the communication terminal T1 receives a
notification signal of the voice separation service, which separates the voice data of the voice
collected in the sound space AR into voice data of the same number as the microphones M1 to M4.
In response to a service start operation for receiving the voice separation service, the
communication terminal T1 receives one of the separated voice data transmitted from the first
selection unit SL1 on the LTE network NW2.
[0092]
As a result, even if the user A of the communication terminal T1 does not know of the existence
of the voice separation service, once the communication terminal T1 receives the notification
signal of the voice separation service, the voice transmission system 100 can, by a simple
service start operation of the user A, receive at the communication terminal T1 the separated
voice data generated by the sound source separation and let the user A listen to it, or transmit
the separated voice data to the communication terminal T2.
[0093]
Further, in the voice transmission system 100, the communication terminal T1 receives a
notification signal of information on the target area of the voice separation service, which
separates the voice data of the voice collected in the sound space AR into voice data of the same
number as the microphones M1 to M4.
[0094]
As a result, even if the user A of the communication terminal T1 does not know the information on
the target area of the voice separation service, once the communication terminal T1 receives the
information on the target area, the voice transmission system 100 can, by an input operation of
the user A, receive at the communication terminal T1 the separated voice data generated by sound
source separation corresponding to any target area designated by the user A and let the user A
listen to it, or transmit the separated voice data to the communication terminal T2.
[0095]
Further, in the voice transmission system 100, the sound source separation unit PR2 separates the
voice data of the voice collected by one or more microphones (for example, the microphones M1 and
M2) selected, according to a selection signal from the communication terminal T1, from among the
one or more microphones M1 to M4, into the same number of types of voice data as the selected one
or more microphones (for example, the microphones M1 and M2).
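The microphone selection described above can be sketched as follows: a selection signal names a subset of the microphones, and separation then yields exactly as many types of voice data as microphones selected. The function name, the dictionary shapes, and the string placeholders are assumptions of this sketch.

```python
def separate_selected(all_channels, selection_signal):
    """all_channels: dict of microphone name -> recorded channel.
    selection_signal: list of microphone names chosen from terminal T1."""
    selected = {m: all_channels[m] for m in selection_signal}
    # One separated type per selected microphone (stand-in for PR2's output);
    # unselected microphones, e.g. near a noise source, are simply excluded.
    return {m: f"separated({ch})" for m, ch in selected.items()}


channels = {"M1": "ch1", "M2": "ch2", "M3": "ch3", "M4": "ch4"}
out = separate_selected(channels, ["M1", "M2"])
print(sorted(out))        # ['M1', 'M2']
```

Excluding distant microphones before separation is what lets the system avoid collecting a far-away noise source such as a motor sound.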
[0096]
Thus, in the voice transmission system 100, by selectively operating the microphones around the
other party (for example, a speaker) to whom the user A wants to listen in the sound space AR,
the collection of a noise source (for example, a motor sound) at a position far away from that
party can be excluded, and the voice of the party (for example, the speaker) can be collected
efficiently.
[0097]
Although various embodiments have been described above with reference to the drawings, it goes
without saying that the present invention is not limited to these examples.
It will be apparent to those skilled in the art that various changes and modifications can be
made within the scope of the appended claims, and it is understood that these also fall within
the technical scope of the present invention.
[0098]
The present invention is useful as a voice transmission system and a voice transmission method
that extract a specific sound source from the collected voice in an actual sound space in which
many speakers and environmental sounds including noise are mixed, and transmit the extracted
high-quality voice to the user's own terminal or to the call partner of that terminal.
[0099]
11, 21 sound collecting unit; 12, 22 CPU; 13, 23 memory; 14, 24 input unit; 15, 25 display unit;
16, 26 voice output unit; 17, 27 radio communication unit; 18, 28 antenna; a, b voice data of
call voice; c1, c2 separated voice data; AP1, AP2 application; CDS server apparatus; M1, M2, M3,
M4 microphone; MX microphone signal multiplex transmission device; NW1 cloud network; NW2 LTE
network; PR sound signal processing transfer unit; PR1 control unit; PR2 sound source separation
unit; SL1 first selection unit; SL2 second selection unit; T1, T2 communication terminal