JP2013164468

DESCRIPTION JP2013164468
Abstract: To provide a voice analysis apparatus and the like capable of grasping the synchrony of voices between wearers while suppressing the influence of environmental sound, from information on the sound pressure of the voices acquired by the voice acquisition means of a plurality of wearers. A host device 20 includes a data receiving unit (21) that receives information on the audio signals of voices acquired by a plurality of microphones (11, 12) arranged at positions differing in distance from each wearer's mouth, and a tuneability determination unit that, for a voice identified as the speech of one of the wearers based on the result of discriminating, for each wearer from the comparison of the audio signals acquired by the microphones 11 and 12, whether an acquired voice is the speech of the wearer or the speech of another person, determines the synchrony of the voice from information related to the audio signal of the voice. [Selected figure] Figure 1
Speech analysis device, speech analysis system and program
[0001]
The present invention relates to a voice analysis device, a voice analysis system, and a program.
[0002]
Patent Document 1 describes a synchrony detection device in which, when subjects A and B in a room converse, the speech of subject A and the speech of subject B are captured by two respective microphones, a CPU calculates the speed of each captured speech, compares the difference between the two with a threshold, and turns a lamp outside the room on or off according to the comparison result. If the difference is greater than or equal to the threshold, it is determined that the dialogue is not synchronized and the lamp is turned off; if the difference is less than the threshold, it is determined that the dialogue is synchronized and the lamp is turned on.
[0003]
JP 2005-265982 A
[0004]
An object of the present invention is to grasp the synchrony of voices between wearers while suppressing the influence of environmental sound, from information on the sound pressure of the voices acquired by the voice acquisition means of a plurality of wearers.
[0005]
The invention according to claim 1 is a voice analysis apparatus comprising: a voice information receiving unit that receives information on the audio signals of voices acquired by a plurality of voice acquisition means arranged at positions differing in distance from a wearer's mouth; and a tuneability determination unit that determines the synchrony of a voice from the information on its audio signal, for a voice identified as the speech of one of the wearers based on the result of discriminating, for each wearer from the comparison of the audio signals acquired by the plurality of voice acquisition means, whether an acquired voice is the speech of the wearer or the speech of a person other than the wearer.
[0006]
The invention according to claim 2 is the voice analysis apparatus according to claim 1, further comprising a self/other identification unit that obtains the discrimination result as to whether an acquired voice is the speech of the wearer provided with the voice acquisition means or the speech of another person.
The invention according to claim 3 is the voice analysis apparatus according to claim 1 or 2, further comprising a grouping unit that groups the wearers based on the synchrony of the voices determined by the tuneability determination unit.
The invention according to claim 4 is the voice analysis apparatus according to claim 3, characterized in that, when a sound acquired by the voice acquisition means is determined to be the speech of none of the wearers, the grouping unit does not use that sound in the operation of grouping the wearers.
[0007]
The invention according to claim 5 is a voice analysis system comprising: a plurality of voice acquisition means arranged at positions differing in distance from a wearer's mouth; a self/other identification unit that identifies, from the comparison result of the audio signals of the voices acquired by the voice acquisition means, whether an acquired voice is the speech of the wearer provided with the voice acquisition means or the speech of a person other than the wearer; and a tuneability determination unit that, for a voice identified as the speech of one of the wearers from the identification result, determines the synchrony of the voice from information related to the audio signal of the voice.
[0008]
The invention according to claim 6 is the voice analysis system according to claim 5, further comprising: an audio information transmission unit that transmits information related to the audio signal of a voice acquired by the voice acquisition means; and a voice information receiving unit that receives the information related to the audio signal transmitted by the audio information transmission unit.
[0009]
The invention according to claim 7 is a program that causes a computer to realize: a function of receiving information on the audio signals of voices acquired by a plurality of voice acquisition means arranged at positions differing in distance from a wearer's mouth; and a function of determining the synchrony of a voice from the information on its audio signal, for a voice identified as the speech of one of the wearers based on the result of discriminating, for each wearer from the comparison of the audio signals acquired by the plurality of voice acquisition means, whether an acquired voice is the speech of the wearer or the speech of another person.
[0010]
According to the invention of claim 1, compared with the case where this configuration is not adopted, it is possible to provide a voice analysis apparatus capable of grasping the synchrony of voices between wearers while suppressing the influence of environmental sound, from information on the sound pressure of the voices acquired by the voice acquisition means of a plurality of wearers.
According to the invention of claim 2, the invention can also be applied to the case where the information received by the voice information receiving unit does not include information on the speaker of an uttered voice.
According to the invention of claim 3, compared with the case where this configuration is not adopted, the wearers can be grouped while suppressing the influence of environmental sound, from information on the sound pressure of the voices acquired by the voice acquisition means of the plurality of wearers.
According to the invention of claim 4, compared with the case where this configuration is not adopted, the accuracy of grouping the wearers is further improved.
According to the invention of claim 5, compared with the case where this configuration is not adopted, it is possible to construct a voice analysis system capable of grasping the synchrony of voices while suppressing the influence of environmental sound, from information on the sound pressure of the voices acquired by the voice acquisition means of the plurality of wearers.
According to the invention of claim 6, the voice acquisition means can be easily worn by the wearer, and the voice analysis processing can be performed centrally.
According to the invention of claim 7, compared with the case where this configuration is not adopted, a computer can realize a function capable of grasping the synchrony of voices while suppressing the influence of environmental sound, from information on the sound pressure of the voices acquired by the voice acquisition means of the plurality of wearers.
[0011]
FIG. 1 is a diagram showing a configuration example of the voice analysis system according to the present embodiment. FIG. 2 is a diagram showing a configuration example of the terminal device in the present embodiment. FIG. 3 is a diagram showing the positional relationship between the mouths (speaking parts) of the wearer and another person and the microphones. FIG. 4 is a diagram showing the relationship between the microphone-to-sound-source distance and the sound pressure (input volume). FIG. 5 is a flowchart showing the operation of the terminal device in the present embodiment. FIG. 6 is a block diagram of the data analysis unit in the present embodiment. FIG. 7 is a flowchart showing the operation of the host device in the present embodiment. FIG. 8 is a diagram showing a situation in which a plurality of wearers, each wearing the terminal device of the present embodiment, are in conversation. FIG. 9 is a diagram showing an example of the utterance information of each terminal device in the conversation situation of FIG. 8. FIG. 10 is a table explaining the parameters A1, A2, B1 and B2 and the distance relationship between wearer A and wearer B. FIGS. 11(a) and 11(b) are diagrams showing specific examples of the terminal device as actually used.
[0012]
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. <System Configuration Example> FIG. 1 is a diagram showing a configuration example of the voice analysis system according to the present embodiment. As shown in FIG. 1, the voice analysis system 1 of the present embodiment comprises a terminal device 10 and a host device 20, the latter being an example of a voice analysis apparatus. The terminal device 10 and the host device 20 are connected via a wireless communication line. As the wireless communication line, a line conforming to an existing standard such as Wi-Fi (Wireless Fidelity), Bluetooth (registered trademark), ZigBee or UWB (Ultra Wideband) may be used. Although only one terminal device 10 is shown in the illustrated example, a terminal device 10 is, as described in detail later, worn and used by each user, so in practice as many terminal devices 10 as there are users are prepared. Hereinafter, a user wearing a terminal device 10 is referred to as a wearer.
[0013]
The terminal device 10 includes a plurality of microphones (a first microphone 11 and a second microphone 12) and amplifiers (a first amplifier 13 and a second amplifier 14) as voice acquisition means. The terminal device 10 further includes a voice analysis unit 15 that analyzes the acquired voice, a data transmission unit 16 that transmits the analysis result to the host device 20, and a power supply unit 17.
[0014]
The first microphone 11 and the second microphone 12 are arranged at positions differing in distance from the wearer's mouth (speaking part). Here, the first microphone 11 is arranged at a position far from the wearer's mouth (for example, about 35 cm), and the second microphone 12 at a position near it (for example, about 10 cm). As the first microphone 11 and the second microphone 12 of the present embodiment, various existing types of microphone, such as dynamic or condenser microphones, may be used. In particular, an omnidirectional MEMS (Micro Electro Mechanical Systems) microphone is preferable.
[0015]
The first amplifier 13 and the second amplifier 14 amplify the electrical signals (audio signals) output by the first microphone 11 and the second microphone 12 according to the acquired voice. Existing operational amplifiers or the like may be used as the first amplifier 13 and the second amplifier 14 of the present embodiment.
[0016]
The voice analysis unit 15 analyzes the audio signals output from the first amplifier 13 and the second amplifier 14. Although the details will be described later, the voice analysis unit 15 functions as a self/other identification unit that discriminates, based on the comparison result of the audio signals of the voices acquired by the first microphone 11 and the second microphone 12, whether an acquired voice is the speech of the wearer provided with the microphones or the speech of a person other than the wearer.
[0017]
The data transmission unit 16 transmits the acquired data, including the analysis result by the voice analysis unit 15 and the ID of the terminal, to the host device 20 via the wireless communication line described above. Depending on the processing performed in the host device 20, the information transmitted to the host device 20 may include, in addition to the analysis result, information such as the acquisition times and sound pressures of the voices acquired by the first microphone 11 and the second microphone 12. The terminal device 10 may also be provided with a data storage unit that stores the analysis results of the voice analysis unit 15, and the data stored over a fixed period may be transmitted in a batch. The transmission may also be performed over a wired line. In the present embodiment, the data transmission unit 16 functions as an audio information transmission unit that transmits information related to the audio signal of a voice.
[0018]
The power supply unit 17 supplies power to the first microphone 11, the second microphone 12,
the first amplifier 13, the second amplifier 14, the voice analysis unit 15, and the data
transmission unit 16 described above. As a power supply, for example, an existing power supply
such as a dry battery or a rechargeable battery is used. Further, the power supply unit 17
includes known circuits such as a voltage conversion circuit and a charge control circuit, as
necessary.
[0019]
The host device 20 includes a data receiving unit 21 that receives the data transmitted from the terminal devices 10, a data storage unit 22 that stores the received data, a data analysis unit 23 that analyzes the stored data, and an output unit 24 that outputs the analysis result. The host device 20 is realized by, for example, an information processing device such as a personal computer. As described above, a plurality of terminal devices 10 are used in the present embodiment, and the host device 20 receives data from each of the plurality of terminal devices 10.
[0020]
The data receiving unit 21 corresponds to the wireless communication line described above; it receives data from each terminal device 10 and sends the data to the data storage unit 22. In the present embodiment, the data receiving unit 21 functions as a voice information receiving unit that receives the information related to the audio signal transmitted by the data transmission unit 16. The data storage unit 22 stores the received data acquired from the data receiving unit 21 for each speaker. Here, the speaker is identified by collating the terminal ID transmitted from the terminal device 10 with a speaker name and terminal ID registered in advance in the host device 20. Instead of the terminal ID, the wearer's name may be transmitted from the terminal device 10.
[0021]
The data analysis unit 23 is realized by, for example, the program-controlled CPU of a personal computer, and analyzes the data stored in the data storage unit 22. The specific analysis contents and methods can vary according to the purpose and mode of use of the system of the present embodiment. For example, the frequency of interaction between the wearers of the terminal devices 10 and each wearer's tendency regarding conversation partners may be analyzed, or the relationship between interlocutors may be inferred from information on the length and sound pressure of each utterance in a dialogue.
[0022]
The output unit 24 outputs the analysis result of the data analysis unit 23, or performs output based on the analysis result. The means of outputting the analysis result can take various forms, such as screen display, print output by a printer, or audio output, depending on the purpose and mode of use of the system and on the contents and format of the analysis result.
[0023]
<Example of Configuration of Terminal Device> FIG. 2 is a diagram showing an example of the configuration of the terminal device 10. As described above, a terminal device 10 is worn and used by each user. To make it wearable, as shown in FIG. 2, the terminal device 10 of the present embodiment comprises an apparatus main body 30 and a strap 40 connected to the apparatus main body 30. In the illustrated configuration, the user puts the strap 40 around the neck and hangs the apparatus main body 30 from the neck.
[0024]
The apparatus main body 30 is configured by housing, in a thin rectangular parallelepiped case 31 formed of metal, resin or the like, at least the circuits realizing the first amplifier 13, the second amplifier 14, the voice analysis unit 15, the data transmission unit 16 and the power supply unit 17, together with the power supply (battery) of the power supply unit 17. The case 31 may be provided with a pocket into which an ID card displaying ID information such as the name and affiliation of the wearer is inserted. Such ID information may also be printed on the surface of the case 31 itself, or a sticker on which the ID information is written may be attached.
[0025]
The strap 40 is provided with the first microphone 11 and the second microphone 12 (hereinafter referred to as the microphones 11 and 12 when the two are not distinguished). The microphones 11 and 12 are connected to the first amplifier 13 and the second amplifier 14 housed in the apparatus main body 30 by cables (electric wires or the like) passing through the inside of the strap 40. As the material of the strap 40, various existing materials may be used, such as leather, synthetic leather, natural fibers such as cotton, synthetic fibers such as resin, and metal. A coating using a silicone resin, a fluororesin or the like may also be applied.
[0026]
The strap 40 has a tubular structure, and the microphones 11 and 12 are housed inside it. By providing the microphones 11 and 12 inside the strap 40, damage to and soiling of the microphones can be prevented, and the conversation partner is less likely to be conscious of their presence. The first microphone 11, which is arranged at a position far from the wearer's mouth (speaking part), may instead be provided in the apparatus main body 30. In the present embodiment, the case where the first microphone 11 is provided on the strap 40 is described as an example.
[0027]
Referring to FIG. 2, the first microphone 11 is provided at the end of the strap 40 connected to the apparatus main body 30 (for example, at a position within 10 cm of the connection site). As a result, with the strap 40 around the wearer's neck and the apparatus main body 30 hanging down, the first microphone 11 is located approximately 30 cm to 40 cm away from the wearer's mouth (speaking part). When the first microphone 11 is provided in the apparatus main body 30 instead, the distance from the wearer's mouth to the first microphone 11 is approximately the same.
[0028]
The second microphone 12 is provided at a position away from the end of the strap 40 connected to the apparatus main body 30 (for example, about 20 cm to 30 cm from the connection site). Thus, with the strap 40 around the wearer's neck and the apparatus main body 30 hanging down, the second microphone 12 is located at the wearer's neck (for example, at a position on the collarbone), at a distance of about 10 cm to 20 cm from the wearer's mouth (speaking part).
[0029]
The terminal device 10 of the present embodiment is not limited to the configuration shown in FIG. 2. For example, the positional relationship between the first microphone 11 and the second microphone 12 may be specified so that the distance of the sound-wave path from the first microphone 11 to the wearer's mouth (speaking part) is several times the distance of the sound-wave path from the second microphone 12 to the mouth. Accordingly, the first microphone 11 may be provided on the strap 40 behind the neck. The microphones 11 and 12 are also not limited to the configuration of being provided on the strap 40 as described above, and may be attached to the wearer by various methods. For example, the first microphone 11 and the second microphone 12 may each be individually fixed to clothing with a pin or the like. Alternatively, a dedicated attachment designed to fix the positional relationship between the first microphone 11 and the second microphone 12 at desired positions may be prepared and worn.
[0030]
Further, the apparatus main body 30 is not limited to the configuration shown in FIG. 2 of being connected to the strap 40 and hung from the wearer's neck, as long as it can be easily carried. For example, instead of the strap of the present embodiment, it may be attached to clothing or the body by a clip or a belt, or simply carried in a pocket or the like. Furthermore, the functions of receiving, amplifying and analyzing the audio signals from the microphones 11 and 12 may be realized by a mobile phone or another existing portable electronic information terminal.
[0031]
Furthermore, the microphones 11 and 12 and the apparatus main body 30 (or the voice analysis unit 15) may be connected by wireless communication rather than by wire. Although the first amplifier 13, the second amplifier 14, the voice analysis unit 15, the data transmission unit 16 and the power supply unit 17 are housed in a single case 31 in the above configuration example, they may be divided among a plurality of individual units. For example, the power supply unit 17 need not be housed in the case 31, and the device may instead be connected to and used with an external power supply.
[0032]
<Identification of Speakers (Self/Other) Based on Non-Linguistic Information of Acquired Voice> Next, the method of identifying speakers in the present embodiment will be described. The system of the present embodiment uses the information of the voices acquired by the two microphones 11 and 12 provided in the terminal device 10 to discriminate between the speech of the wearer of the terminal device 10 and the speech of another person. In other words, the present embodiment performs self/other identification with respect to the speaker of an acquired voice. Furthermore, in the present embodiment, the speaker is identified not from linguistic information obtained by morphological analysis or dictionary information, but from non-linguistic information such as sound pressure (the input volume to the microphones 11 and 12). In other words, the speaker of a voice is identified from the utterance situation specified by non-linguistic information, not from the utterance content specified by linguistic information.
[0033]
As described with reference to FIGS. 1 and 2, in the present embodiment, the first microphone 11 of the terminal device 10 is arranged at a position far from the wearer's mouth (speaking part), and the second microphone 12 at a position close to it. That is, when the wearer's mouth is regarded as the sound source, the distance between the first microphone 11 and the sound source and the distance between the second microphone 12 and the sound source differ greatly. Specifically, the distance between the first microphone 11 and the sound source is about 1.5 to 4 times the distance between the second microphone 12 and the sound source. Here, the sound pressure of a voice acquired at the microphones 11 and 12 attenuates as the distance between the microphones and the sound source increases (distance attenuation). Therefore, for the wearer's own speech, the sound pressure acquired at the first microphone 11 and the sound pressure acquired at the second microphone 12 differ greatly.
[0034]
On the other hand, when the mouth (speaking part) of a person other than the wearer (another person) is regarded as the sound source, the other person is at some distance from the wearer, so the distance between the first microphone 11 and the sound source and the distance between the second microphone 12 and the sound source do not differ greatly. A difference between the two may occur depending on the position of the other person with respect to the wearer, but the distance between the first microphone 11 and the sound source never becomes several times the distance between the second microphone 12 and the sound source, as it does when the wearer's own mouth is the sound source. Therefore, for the other person's speech, the sound pressure acquired at the first microphone 11 and the sound pressure acquired at the second microphone 12 do not differ as greatly as for the wearer's own speech.
[0035]
FIG. 3 is a diagram showing the positional relationship between the mouths (speaking parts) of the wearer and another person and the microphones 11 and 12. In the relationship shown in FIG. 3, the distance between the sound source a, which is the wearer's mouth, and the first microphone 11 is La1, and the distance between the sound source a and the second microphone 12 is La2. The distance between the sound source b, which is the other person's mouth, and the first microphone 11 is Lb1, and the distance between the sound source b and the second microphone 12 is Lb2. In this case, the following relationships hold:
La1 > La2 (La1 ≒ 1.5 × La2 to 4 × La2)
Lb1 ≒ Lb2
[0036]
FIG. 4 is a diagram showing the relationship between the distance from the microphones 11 and 12 to the sound source and the sound pressure (input volume). As described above, the sound pressure attenuates in accordance with the distance between the microphones 11 and 12 and the sound source. In FIG. 4, comparing the sound pressure β at the distance La1 with the sound pressure α at the distance La2, the sound pressure α is about four times the sound pressure β. On the other hand, since the distances Lb1 and Lb2 are close to each other, the sound pressure β at the distance Lb1 and the sound pressure α at the distance Lb2 are substantially equal. Therefore, in the present embodiment, this sound pressure difference is used to discriminate between the wearer's own speech and another person's speech among the acquired voices. Although the distances Lb1 and Lb2 are 60 cm in the example shown in FIG. 4, what matters is that the sound pressure α and the sound pressure β are almost equal, and the distances Lb1 and Lb2 are not limited to the values shown in the figure.
[0037]
As described with reference to FIG. 4, for the wearer's own speech, the sound pressure α at the second microphone 12 is several times (for example, about four times) the sound pressure β at the first microphone 11. For another person's speech, the sound pressure α at the second microphone 12 is substantially equal to (about one times) the sound pressure β at the first microphone 11. Therefore, in the present embodiment, a threshold is set for the difference between the sound pressure α at the second microphone 12 and the sound pressure β at the first microphone 11 (the sound pressure difference α − β). A voice whose sound pressure difference is larger than the threshold is judged to be the wearer's own speech, and a voice whose sound pressure difference is smaller than the threshold is judged to be another person's speech.
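This threshold rule can be written compactly. The sketch below is a minimal illustration assuming the average sound pressures α and β are already available as numbers; the specific threshold value is a hypothetical tuning parameter, not one fixed by the embodiment.

```python
# Sketch of the self/other decision on the sound-pressure difference.
# THRESHOLD is a hypothetical tuning parameter, not a value from the text.
THRESHOLD = 2.0

def identify_speaker(alpha: float, beta: float) -> str:
    """Classify a detected utterance from the two average sound pressures.

    alpha: average sound pressure at the second (near) microphone
    beta:  average sound pressure at the first (far) microphone
    """
    if alpha - beta > THRESHOLD:
        return "self"    # "state 1": the wearer's own speech
    return "other"       # "state 2": another person's speech
```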
[0038]
The voices acquired by the microphones 11 and 12 include not only speech but also so-called noise, such as environmental sound. The distance relationship between a noise source and the microphones 11 and 12 resembles that for another person's speech. That is, in the example shown in FIG. 4, when the distance between a noise source c and the first microphone 11 is Lc1 and the distance between the noise source c and the second microphone 12 is Lc2, the distances Lc1 and Lc2 are close to each other, and the sound pressure difference α − β in the sound acquired at the microphones 11 and 12 is smaller than the threshold. However, such noise is separated and removed from the speech by filtering with existing techniques using a band-pass filter, a gain filter and the like.
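As one way to realize such filtering, the sketch below applies a Butterworth band-pass filter that keeps a typical speech band; the 100–4000 Hz passband, filter order and sampling rate are illustrative assumptions, since the text only names band-pass and gain filtering generically.

```python
# Sketch: removing out-of-band noise with a band-pass filter.
# The passband (100-4000 Hz), order and sampling rate are assumptions;
# the embodiment only calls for existing band-pass/gain filtering.
import numpy as np
from scipy.signal import butter, lfilter

def bandpass_speech(signal: np.ndarray, fs: int = 16000) -> np.ndarray:
    """Keep a rough speech band, attenuating low rumble and high hiss."""
    b, a = butter(4, [100, 4000], btype="bandpass", fs=fs)
    return lfilter(b, a, signal)
```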
[0039]
<Operation Example of Terminal Device> FIG. 5 is a flowchart showing the operation of the terminal device 10 in the present embodiment. As shown in FIG. 5, when the microphones 11 and 12 of the terminal device 10 acquire a voice, the electrical signals (audio signals) corresponding to the acquired voice are sent from the microphones 11 and 12 to the first amplifier 13 and the second amplifier 14 (step 101). When the first amplifier 13 and the second amplifier 14 acquire the audio signals from the microphones 11 and 12, they amplify the signals and send them to the voice analysis unit 15 (step 102).
[0040]
The voice analysis unit 15 applies filtering to the signals amplified by the first amplifier 13 and the second amplifier 14 to remove noise components such as environmental sound from the signals (step 103). Next, for the signals from which the noise components have been removed, the voice analysis unit 15 obtains the average sound pressure of the voice acquired at each of the microphones 11 and 12 in fixed time units (for example, several tenths of a second to several hundredths of a second) (step 104).
[0041]
Next, when there is a gain in the average sound pressures at the microphones 11 and 12 obtained in step 104 (Yes in step 105), the voice analysis unit 15 determines that there is an utterance (that an utterance has been made). On the other hand, when there is no gain in the average sound pressures obtained in step 104 (No in step 105), the voice analysis unit 15 determines that there is no utterance (step 110). When it is determined that there is an utterance, the voice analysis unit 15 obtains the difference (sound pressure difference) α − β between the average sound pressure α at the second microphone 12 and the average sound pressure β at the first microphone 11 (step 106). If the sound pressure difference α − β obtained in step 106 is larger than the threshold (Yes in step 107), the voice analysis unit 15 determines that the utterance is the wearer's own speech and sets a parameter to that effect; for convenience, this is referred to as "state 1" (step 108). On the other hand, when the sound pressure difference obtained in step 106 is smaller than the threshold (No in step 107), the voice analysis unit 15 determines that the utterance is another person's speech and sets a parameter to that effect; for convenience, this is referred to as "state 2" (step 109).
[0042]
The determination in step 105 may take into account the possibility that noise not removed by the filtering in step 103 remains in the signal, and may judge that a gain is present only when the value of the average sound pressure gain is equal to or greater than a certain value.
[0043]
Thereafter, the voice analysis unit 15 transmits, via the data transmission unit 16, the information obtained by the processing of steps 104 to 110 (the presence or absence of an utterance and the speaker information, i.e. the "state 1" or "state 2" result of the self/other identification) to the data receiving unit 21 of the host device 20 as the analysis result (step 111).
At this time, additional information such as the length of the utterance time of each speaker (the wearer or another person) and the value of the average sound pressure gain may be transmitted to the data receiving unit 21 of the host device 20 together with the analysis result.
[0044]
Then, in the host device 20, the synchrony of the voices is determined based on the voice information, which includes the self/other identification result for each voice. In the present embodiment, the data analysis unit 23 of the host device 20 thereby performs the function of deriving the distance relationship between a plurality of wearers.
[0045]
<Description of Data Analysis Unit 23> FIG. 6 is a block diagram of the data analysis unit 23 in the present embodiment. As shown in FIG. 6, the data analysis unit 23 includes a tuneability determination unit 231 that determines the synchrony of voices from the information on the audio signals of the voices acquired from the plurality of wearers by the data receiving unit 21, and a grouping unit 232 that groups the wearers based on the voice synchrony determined by the tuneability determination unit 231.
[0046]
<Operation Example of Host Device> FIG. 7 is a flowchart showing the operation of the host device 20 in the present embodiment. Hereinafter, the operation of the host device 20 according to the present embodiment will be described with reference to FIGS. 1, 6 and 7. First, the data receiving unit 21 receives various information, including the voice information and the self/other identification results, from the plurality of terminal devices 10 (step 201). These pieces of information are temporarily stored in the data storage unit 22 (step 202).
[0047]
Next, this information is sent to the data analysis unit 23, which determines the synchrony of the voices sent from the plurality of terminal devices 10 (step 203).
[0048]
Hereinafter, the method of determining the synchrony of voice information will be described.
FIG. 8 is a diagram showing a situation in which a plurality of wearers wearing the terminal devices 10 of the present embodiment are in conversation. FIG. 9 is a diagram showing an example of the utterance information of the terminal devices 10A and 10B in the conversation situation of FIG. 8. As shown in FIG. 8, consider a case where two wearers A and B, each wearing a terminal device 10, are in conversation. At this time, a voice recognized as the wearer's utterance by the terminal device 10A of wearer A is recognized as another person's utterance by the terminal device 10B of wearer B. Conversely, a voice recognized as the wearer's utterance by the terminal device 10B is recognized as another person's utterance by the terminal device 10A.
[0049]
Utterance information is sent to the host device 20 independently from the terminal device 10A and the terminal device 10B. At this time, as shown in FIG. 9, the utterance information acquired from the terminal device 10A and that acquired from the terminal device 10B are opposite to each other in the identification result of the speaker (the wearer or the other person), but the information indicating the utterance situation, such as the length of the utterance time and the timing at which the speaker changes, is similar. Therefore, the host device 20 of this application example compares the information acquired from the terminal device 10A with that acquired from the terminal device 10B, determines that these pieces of information indicate the same utterance situation, and recognizes that wearer A and wearer B are in conversation. By thus determining the synchrony between the voice of wearer A and the voice of wearer B, it can be determined that wearer A and wearer B are in conversation. Here, as the information indicating the utterance situation, time information on the utterances is used: at least the length of the utterance time of each utterance of each speaker mentioned above, the start and end times of each utterance, the time (timing) at which the speaker changes, and the like. Only part of this time information may be used to determine the utterance situation of a particular conversation, or other information may be used in addition.
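A minimal sketch of such a comparison follows, assuming each terminal reports a sequence of (start, end, speaker) records for a window and that "the same utterance situation" is judged from speaker-change times matching within a tolerance; both the record format and the tolerance are illustrative assumptions, not taken from the text.

```python
# Sketch: judging whether two terminals observed the same utterance
# situation. Each record is (start_s, end_s, speaker), where speaker is
# "self" or "other" from that terminal's viewpoint. The record format
# and TOLERANCE_S are assumptions for illustration.
TOLERANCE_S = 0.5

def change_times(records):
    """Times at which the speaker changes, one of the timing cues above."""
    return [records[i][0] for i in range(1, len(records))
            if records[i][2] != records[i - 1][2]]

def in_sync(records_a, records_b) -> bool:
    """True if the two terminals saw speaker changes at matching times."""
    ta, tb = change_times(records_a), change_times(records_b)
    if len(ta) != len(tb) or not ta:
        return False
    return all(abs(x - y) <= TOLERANCE_S for x, y in zip(ta, tb))

# Terminal 10A hears its wearer speak, then the partner; 10B sees the
# mirror image, but the change timing matches -> same conversation.
a = [(0.0, 2.1, "self"), (2.3, 4.0, "other")]
b = [(0.0, 2.2, "other"), (2.3, 4.1, "self")]
print(in_sync(a, b))  # True
```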
[0050]
In the present embodiment, when a voice is identified by one terminal device 10 as the speech of the person wearing that terminal device 10 (the wearer), the synchrony between that voice and the voices identified as another person's speech among the voices acquired by the other terminal devices 10 is determined. In other words, the tuneability determination unit 231 determines the synchrony of a voice from the information on its audio signal, for a voice identified as the speech of one of the wearers based on the identification result, obtained for each wearer from the comparison of the audio signals acquired by the microphones 11 and 12, as to whether an acquired voice is the speech of the wearer or the speech of a person other than the wearer.
[0051]
This matter will be described in more detail below. Here, the "state 1" (the wearer's own utterance) and "state 2" (another person's utterance) described above are used. In this case, the following four parameters can be set for wearer A and wearer B.
[0052]
For wearer A: when α − β > (predetermined threshold) ("state 1"), the parameter A1 is set; when α − β < (predetermined threshold) ("state 2"), the parameter A2 is set.
[0053]
For wearer B: when α − β > (predetermined threshold) ("state 1"), the parameter B1 is set; when α − β < (predetermined threshold) ("state 2"), the parameter B2 is set.
[0054]
FIG. 10 is a table explaining the relationship between the parameters A1, A2, B1 and B2 and the distance between wearer A and wearer B, applied to the example described with reference to FIGS. 8 and 9.
[0055]
(1) When the parameters A1 and B2 are set for wearer A and wearer B in a certain time zone, it is determined that the voice of wearer A and the voice of wearer B have synchrony, and it can be determined that wearer A and wearer B are in a conversational relationship.
In other words, wearer B is within a distance at which wearer A's speech can be heard.
[0056]
(2) Likewise, when the parameters A2 and B1 are set in a certain time zone, it is determined that the voice of wearer A and the voice of wearer B have synchrony, and it can be determined that wearer A and wearer B are in a conversational relationship. In other words, wearer A is within a distance at which wearer B's speech can be heard.
[0057]
(3) In a certain time zone, the parameters A1 and B1 may be set. In this case, wearer A and wearer B are in close proximity to each other. In this case too, it may be determined that the voice of wearer A and the voice of wearer B have synchrony and that wearer A and wearer B are in a conversational relationship.
[0058]
(4) The parameters A2 and B2 may be set in a certain time zone. In this case, wearer A and wearer B are in a space where the same sound can be heard, but it can be determined that the distance between them is relatively large. It can also be determined that this sound is not a voice uttered by wearer A or wearer B. More specifically, it can be judged that this sound is either the speech of a wearer other than wearer A and wearer B, or an environmental sound such as the sound of an air conditioner or of construction work.
[0059]
Summarizing the above, the distance between wearer A and wearer B is shortest in case (3), next shortest in cases (1) and (2), and longest in case (4) ((3) < ((1) and (2)) < (4)). In this manner, the proximity of a plurality of wearers can be determined based on the self/other identification results.
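The mapping from the parameter pair to the distance relationship can be written as a small lookup; the sketch below simply restates cases (1) to (4) above, with the label strings chosen for illustration.

```python
# Sketch: the distance relationship implied by the pair of parameters
# set for wearers A and B in the same time zone (cases (1)-(4) above).
# The label strings are illustrative.
DISTANCE_RELATION = {
    ("A1", "B2"): "conversing; B within earshot of A",   # case (1)
    ("A2", "B1"): "conversing; A within earshot of B",   # case (2)
    ("A1", "B1"): "in close proximity",                  # case (3)
    ("A2", "B2"): "same space, relatively far apart; "
                  "sound not uttered by A or B",         # case (4)
}

print(DISTANCE_RELATION[("A1", "B2")])
```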
[0060]
If it is determined that no synchrony exists for any of the plurality of wearers (No in step 203), the process returns to step 201. On the other hand, when it is determined that synchrony exists for some of the plurality of wearers (Yes in step 203), the grouping unit 232 next groups the wearers (step 204). This is done by selecting the wearers who are in conversation based on the voice synchrony between each pair of the plurality of wearers. For example, when it is determined that only the two wearers A and B have voice synchrony, these two persons are selected to form one group. When, in addition to wearer A and wearer B, another wearer C is determined to have voice synchrony with at least one of wearer A and wearer B, these three persons are selected to form one group. A group can be formed in the same manner even when four or more wearers have voice synchrony. Grouping of the wearers can thus be performed by determining voice synchrony for all the wearers.
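Treating each synchrony determination as a link between two wearers, the grouping described here behaves like finding connected components; the sketch below illustrates this reading with a simple union-find, which is one plausible realization rather than an algorithm stated in the embodiment.

```python
# Sketch: grouping wearers from pairwise voice synchrony, read as
# connected components over "has synchrony with" links. Union-find is
# one plausible realization, not the embodiment's stated algorithm.
def group_wearers(wearers, synchrony_pairs):
    parent = {w: w for w in wearers}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path compression
            w = parent[w]
        return w

    for a, b in synchrony_pairs:
        parent[find(a)] = find(b)          # merge the two groups

    groups = {}
    for w in wearers:
        groups.setdefault(find(w), []).append(w)
    return list(groups.values())

# A-B and B-C are in sync, so A, B and C form one group; D is alone.
print(group_wearers(["A", "B", "C", "D"], [("A", "B"), ("B", "C")]))
```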
[0061]
At this time, in the case of (4) above, in which the grouping unit 232 determines from the self/other identification results that a sound acquired by the microphones 11 and 12 is the speech of none of the wearers, that sound is not used for grouping the wearers.
[0062]
By grouping the wearers of the terminal devices 10 as described above, the communication tendency of each wearer can be analyzed.
Furthermore, it can be determined whether a specific wearer is present in a predetermined space. If this space is, for example, a predetermined room, it can be determined whether or not this particular wearer is in the room. That is, the voice analysis system 1 according to the present embodiment can also be used as an entry/exit determination system for the wearers.
[0063]
In the example described above, the self/other identification of a voice is performed using the sound pressure difference α − β, but the invention is not limited to this. For example, the sound pressure ratio α/β may be considered, with the case where this value is larger than a predetermined threshold set as "state 1" and the case where it is smaller set as "state 2". Furthermore, although the terminal device 10 performs the self/other identification of a voice in the example described above, the invention is not limited to this, and the host device 20 may perform the identification. In such a voice analysis system 1, in contrast to the system of FIG. 1, the data analysis unit 23 of the host device 20 performs the self/other identification of voices that the voice analysis unit 15 performs there; in this case, the data analysis unit 23 functions as the self/other identification unit described above.
[0064]
<Specific Examples of Terminal Device> FIGS. 11(a) and 11(b) are diagrams showing specific examples of the terminal device 10 as actually used. The device of FIG. 11(a) has substantially the same configuration as the terminal device 10 shown in FIG. 2, with two microphones, the first microphone 11 and the second microphone 12; however, the first microphone 11 is arranged in the apparatus main body 30. The distance between the first microphone 11 and the second microphone 12 is 35 cm.
[0065]
In FIG. 11(b), three microphones are arranged: a third microphone 18 in addition to the first microphone 11 and the second microphone 12. The distance between the first microphone 11 and the second microphone 12 and the distance between the third microphone 18 and the first microphone 11 are both 35 cm. The distance between the second microphone 12 and the third microphone 18 is 10 cm.
[0066]
By using a terminal device 10 in which microphones are arranged at three or more locations as shown in FIG. 11(b), the distance relationship and interaction relationship described above can be determined using different pairs of microphones. In the terminal device 10 shown in FIG. 11(b), the pair of the first microphone 11 and the second microphone 12 and the pair of the third microphone 18 and the first microphone 11 can be selected. By using a plurality of pairs of microphones in this manner, the data analysis unit 23 can determine the distance relationship and interaction relationship of a plurality of wearers more accurately.
[0067]
<Description of Program> The processing performed by the host device 20 in the present embodiment, described with reference to FIG. 7, is realized by the cooperation of software and hardware resources. That is, a CPU (not shown) in the control computer provided in the host device 20 executes a program realizing each function of the host device 20, thereby realizing these functions.
[0068]
Therefore, the processing performed by the host device 20 described with reference to FIG. 7 can also be understood as a program that causes a computer to realize: a function of receiving information on the voices acquired by the plurality of microphones 11 and 12 arranged at positions differing in distance from a wearer's mouth; and a function of determining the synchrony of a voice from the information on its audio signal, for a voice identified as the speech of one of the wearers based on the identification result, obtained for each wearer from the comparison of the audio signals acquired by the plurality of microphones 11 and 12, as to whether an acquired voice is the speech of the wearer or the speech of another person.
[0069]
DESCRIPTION OF SYMBOLS 1: voice analysis system, 10: terminal device, 11: first microphone, 12: second microphone, 15: voice analysis unit, 16: data transmission unit, 18: third microphone, 20: host device, 21: data receiving unit, 23: data analysis unit, 30: apparatus main body, 40: strap, 231: tuneability determination unit, 232: grouping unit