JP2009109868

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2009109868
A sound source localization apparatus advantageous in hardware implementation is provided.
SOLUTION: A sound source localization apparatus 1 comprises: two microphones 2 and 3, each having
forward directivity, disposed at an interval to the left and right, one directed forward and the
other directed backward; a time difference detection unit 8 for detecting, by a pulse neuron
model, time difference information of the sounds collected by the microphones 2 and 3; a sound
pressure difference detection unit 9 for detecting, by a pulse neuron model, sound pressure
difference information of the sounds collected by the microphones 2 and 3; a left-right direction
detection unit 10 for detecting, by a pulse neuron model, direction information of the sound
source in the left-right direction based on the time difference information; and a front-back
direction detection unit 11 for detecting, by a pulse neuron model, direction information of the
sound source in the front-back direction based on the sound pressure difference information.
[Selected figure] Figure 5
Sound source localization device
[0001]
The present invention relates to a sound source localization apparatus for identifying a sound
source direction, and in particular to a sound source localization apparatus using a pulse neuron
model, which is a neuron model that inputs and outputs pulse signals (also simply referred to as
"pulses").
04-05-2019
1
[0002]
Techniques that use a pulse neuron model (hereinafter also referred to as "PN model" or "neuron")
for sound source localization are disclosed in Patent Document 1 and Non-Patent Documents 1 to 3
below. Patent Document 1 discloses a time difference detector for sound source localization, and
Non-Patent Document 1 discloses a sound source direction perception model that uses a PN model to
extract the time difference and the sound pressure difference between the sounds arriving at the
two ears. Non-Patent Document 2 discloses a competitive learning neural network using a PN model
for an auditory information processing system, and Non-Patent Document 3 discloses a method of
implementing the PN model in hardware.
[0003]
FIG. 1 shows a schematic view of a PN model. The same PN model is also shown in Non-Patent
Documents 1 to 3 and is therefore not described in detail here. In the PN model, when a pulse
i_n(t) = 1 is input from the n-th input channel to the n-th synapse at time t (t is a discrete
value; here dt = 1), the local membrane potential p_n(t) of the n-th synapse increases by the
connection weight (also simply called the "weight") w_n and then decays to the resting potential
with a time constant τ. The internal potential I(t) of the PN model at time t is the sum of the
local membrane potentials p_n(t) at that time. The PN model fires (that is, generates an output
pulse "1") when the internal potential I(t) becomes equal to or higher than the threshold θ.
However, since a nerve cell has a refractory period RP after firing, the PN model likewise does
not fire for the period RP after a firing, even if the internal potential is at or above the
threshold.
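The PN model behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not the circuit of the cited documents: the class name `PulseNeuron`, the per-step decay factor exp(-1/τ), a resting potential of 0, and the absence of any potential reset on firing are all assumptions made for the sketch.

```python
import math


class PulseNeuron:
    """Discrete-time pulse neuron (PN) sketch with dt = 1 (hypothetical helper)."""

    def __init__(self, weights, tau, theta, refractory):
        self.weights = list(weights)        # connection weight w_n per synapse
        self.decay = math.exp(-1.0 / tau)   # assumed exponential decay per step
        self.theta = theta                  # firing threshold
        self.refractory = refractory        # refractory period RP, in steps
        self.p = [0.0] * len(self.weights)  # local membrane potentials p_n(t)
        self.since_fire = refractory + 1    # steps since the last firing

    def step(self, pulses):
        """Advance one time step; pulses[n] is 0 or 1. Returns the output pulse."""
        for n, x in enumerate(pulses):
            # Decay toward the resting potential (taken as 0), then add w_n on a pulse.
            self.p[n] = self.p[n] * self.decay + self.weights[n] * x
        internal = sum(self.p)              # I(t) is the sum of the local potentials
        self.since_fire += 1
        if internal >= self.theta and self.since_fire > self.refractory:
            self.since_fire = 0
            return 1
        return 0


neuron = PulseNeuron([1.0, 1.0], tau=5.0, theta=1.5, refractory=2)
outputs = [neuron.step(p) for p in [(1, 0), (0, 1), (0, 0), (1, 1)]]
```

In this run the neuron fires on the second step, when the decayed first pulse and the fresh second pulse together cross the threshold. The fourth step does not fire, even though the internal potential is well above the threshold, because it falls inside the refractory period.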
[0004]
The PN model can be implemented in hardware by digital circuits. FIG. 2 shows a configuration
example of a PN model digital circuit. A similar configuration example is described in Non-Patent
Document 3 and is therefore not described in detail here. In this configuration example, addition
is performed in the normal case, and after every several additions a value obtained by
bit-shifting the register value is subtracted; that is, attenuation is realized approximately by a
bit shift and complement representation. Because no multiplier is used for the attenuation
mechanism, the model is well suited to realization by digital circuits.
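The multiplier-free attenuation can be illustrated with integer arithmetic. The sketch below is only an assumption about how such a step might look: the function name `decay_by_shift` and the shift amount of 3 are hypothetical, and the subtraction stands in for adding the complement of the shifted value in hardware.

```python
def decay_by_shift(potential: int, shift: int) -> int:
    """Attenuate a register value by potential / 2**shift using only a shift and a subtraction."""
    return potential - (potential >> shift)


value = 1024
trace = []
for _ in range(4):
    value = decay_by_shift(value, 3)  # each step keeps roughly 7/8 of the value
    trace.append(value)
```

Repeating the step gives an approximately exponential decay (1024, 896, 784, 686, 601), which is why the scheme can stand in for the time constant τ without a multiplier.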
[0005]
As shown in FIG. 3, suppose that two microphones 101 and 102 corresponding to the two ears are
arranged to the left and right, facing the same direction, and that the direction of a sound
source is to be identified from the sound collected by the microphone 101 and the sound collected
by the microphone 102. The left-right direction can be identified from the time difference that
arises from the difference in distance from the sound source to the microphones 101 and 102.
However, it is not possible to distinguish front from back across the line connecting the sound
collecting units of the microphones 101 and 102 (the one-dot chain line in the figure). In FIG. 3,
the same time difference occurs whether the sound source is at position P or at position P',
which is symmetrical to P with respect to the one-dot chain line.
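The front-back ambiguity can be checked numerically. The following sketch assumes a 2-D plane, a speed of sound of 343 m/s, and microphone and source coordinates chosen purely for illustration; none of these values come from the source text.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def arrival_time_difference(source, left_mic, right_mic):
    """Return (travel time to left mic) minus (travel time to right mic), in seconds."""
    return (math.dist(source, left_mic) - math.dist(source, right_mic)) / SPEED_OF_SOUND


left_mic, right_mic = (-0.1, 0.0), (0.1, 0.0)  # microphones on the x axis
p_front = (1.0, 2.0)    # position P, in front of the microphone axis
p_back = (1.0, -2.0)    # mirror image P', behind the axis

dt_front = arrival_time_difference(p_front, left_mic, right_mic)
dt_back = arrival_time_difference(p_back, left_mic, right_mic)
```

Both calls return the same positive value (the source is nearer the right microphone), so the time difference alone cannot separate P from P'.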
[0006]
For this reason, in the sound source localization system described in Non-Patent Document 4
below, as described in Section 3.4 of that document, the right microphone is directed forward and
the left microphone is directed backward. Because sound from the rear is picked up differently
from sound from the front, the system calculates the center of gravity of the frequency spectrum
of the sound arriving at each of the left and right microphones. If the center of gravity on the
right is larger than that on the left, the sound source is judged to be in front; if the center of
gravity on the left is larger than that on the right, it is judged to be behind.
Patent Document 1: JP 2007-164027 A
Non-Patent Document 1: Noboru Kuroyanagi, Akira Iwata, "Sound source direction perception by pulse
transfer type auditory neural network model - Extraction of time difference and sound pressure
difference -", IEICE Technical Report, The Institute of Electronics, Information and Communication
Engineers, March 1993, NC 92-149, p. 163-170
Non-Patent Document 2: Susumu Kuroyanagi, Akira Iwata, "Competitive Learning Neural Network Using
Pulsed Neuron Model for Auditory Information Processing System," Transactions of the Institute of
Electronics, Information and Communication Engineers (D-II), July 2004, Vol. J87-D-II, No. 7,
p. 1496-1504
Non-Patent Document 3: Nobutori Niijima, Sho Kuroyanagi, Akira Iwata, "Implementation Method of
Pulsed Neuron Model for FPGA", IEICE Technical Report, The Institute of Electronics, Information
and Communication Engineers, 2002, NC2001-211, p. 121-128
Non-Patent Document 4: Schauer, Gross, "Model and Application of a Binaural 360° Sound
Localization System", Proceedings of the International Joint Conference on Neural Networks, IEEE
Computer Society, 2001, Volume 2, p. 1132-1137
[0007]
However, the sound source localization system of Non-Patent Document 4 performs an FFT (Fast
Fourier Transform) to calculate the center of gravity of the frequency spectrum and therefore
requires an FFT processor, which is disadvantageous for hardware implementation.
[0008]
SUMMARY OF THE INVENTION The present invention solves the above-mentioned problems,
and an object thereof is to provide a sound source localization apparatus which is advantageous
in hardware implementation.
[0009]
The sound source localization apparatus according to the present invention comprises: two
microphones, each having forward directivity, arranged at an interval in the left-right direction,
one directed forward and the other directed backward; time difference detection means for
detecting, by a pulse neuron model, time difference information of the sounds collected by the
microphones; sound pressure difference detection means for detecting, by a pulse neuron model,
sound pressure difference information of the sounds collected by the microphones; left-right
direction detection means for detecting, by a pulse neuron model, direction information of the
sound source in the left-right direction based on the time difference information detected by the
time difference detection means; and front-back direction detection means for detecting, by a
pulse neuron model, direction information of the sound source in the front-back direction based
on the sound pressure difference information detected by the sound pressure difference detection
means.
[0010]
Further, it is preferable to provide surrounding direction detection means for detecting, by a
pulse neuron model, direction information of the sound source in the surrounding directions,
based on the direction information of the sound source in the left-right direction detected by
the left-right direction detection means and the direction information of the sound source in the
front-back direction detected by the front-back direction detection means.
[0011]
In the sound source localization apparatus of the present invention, the time difference
detection means detects time difference information of sound by a pulse neuron model, the sound
pressure difference detection means detects sound pressure difference information by a pulse
neuron model, the left-right direction detection means detects direction information of the sound
source in the left-right direction by a pulse neuron model, and the front-back direction detection
means detects direction information of the sound source in the front-back direction by a pulse
neuron model. Since the pulse neuron model can be realized by a simple digital circuit, the
apparatus is advantageous for hardware implementation.
[0012]
Hereinafter, an embodiment of the present invention will be described based on the drawings.
[0013]
As shown in FIG. 4, the sound source localization device 1 includes left and right microphones 2
and 3, and a main unit 4 to which the microphones 2 and 3 are connected.
The main unit 4 is connected to a display device 5 that displays the localization result.
[0014]
The microphones 2 and 3 have forward directivity (that is, unidirectionality with good sensitivity
on the forward side).
The microphones 2 and 3 are arranged such that their sound collecting units 2a and 3a are aligned
in the left-right direction with a space between them, with the left microphone 2 directed
forward and the right microphone 3 directed backward.
The microphones 2 and 3 convert the sound collected by the sound collecting units 2a and 3a into
an electric signal.
The higher the volume, that is, the higher the sound pressure, the higher the voltage value of the
converted electric signal.
The sound collecting units 2a and 3a need not necessarily be arranged in a straight line in the
left-right direction.
In such a case, an appropriate correction may be applied to the obtained data.
[0015]
As shown in FIG. 5, the main unit 4 includes: a left input signal processing unit 6 connected to
the left microphone 2; a right input signal processing unit 7 connected to the right microphone 3;
a time difference detection unit 8 (corresponding to the time difference detection means)
connected to both input signal processing units 6 and 7; a sound pressure difference detection
unit 9 (corresponding to the sound pressure difference detection means) connected to both input
signal processing units 6 and 7; a left-right direction detection unit 10 (corresponding to the
left-right direction detection means) connected to the time difference detection unit 8; a
front-back direction detection unit 11 (corresponding to the front-back direction detection means)
connected to the sound pressure difference detection unit 9; and an eight-direction detection
unit 12 (corresponding to the surrounding direction detection means) connected to both the
left-right direction detection unit 10 and the front-back direction detection unit 11.
[0016]
The input signal processing units 6 and 7 convert the left and right input signals, for each
frequency band, into pulse trains having a pulse frequency corresponding to the signal strength,
that is, to the sound pressure. The time difference detection unit 8 detects time difference
information of the left and right sounds (the sounds input from the left and right microphones 2
and 3) from the left and right pulse trains output from the input signal processing units 6 and 7.
The sound pressure difference detection unit 9 detects sound pressure difference information of
the left and right sounds from the left and right pulse trains. The left-right direction detection
unit 10 detects direction information of the sound source in the left-right direction from the
time difference information detected by the time difference detection unit 8, using the fact that
sound reaches the left microphone 2 earlier than the right microphone 3 when the sound source is
on the left side, and reaches the right microphone 3 earlier than the left microphone 2 when the
sound source is on the right side. The front-back direction detection unit 11 detects direction
information of the sound source in the front-back direction from the sound pressure difference
information detected by the sound pressure difference detection unit 9, using the fact that the
sound pressure of the sound entering the microphone not directed toward the sound source is
smaller than that of the sound entering the microphone directed toward the sound source. Based on
the information output from the left-right direction detection unit 10 and the front-back
direction detection unit 11, the eight-direction detection unit 12 detects direction information
of the sound source among eight surrounding directions (information indicating in which of the
eight directions the sound source lies).
[0017]
The display device 5 displays the direction of the sound source based on the direction information
output from the eight-direction detection unit 12.
[0018]
As shown in FIG. 6, the input signal processing units 6 and 7 include AD conversion units 14L and
14R, frequency decomposition units 15L and 15R corresponding to the cochlea of the human auditory
system, non-linear conversion units 16L and 16R corresponding to the hair cells, and pulse
conversion units 17L and 17R corresponding to the cochlear nerve.
The AD conversion units 14L and 14R perform AD conversion on the signals input from the
microphones 2 and 3. The frequency decomposition units 15L and 15R are each configured by a group
of band-pass filters (BPFs) and decompose the AD-converted signal into a plurality of (N)
frequency bands (frequency channels), spaced on a logarithmic scale over a predetermined
frequency range. The non-linear conversion units 16L and 16R apply a non-linear conversion to the
signal of each frequency band input from the frequency decomposition units 15L and 15R to extract
only its positive component, and then perform envelope detection with a low-pass filter (LPF).
The pulse conversion units 17L and 17R convert the signal of each frequency band input from the
non-linear conversion units 16L and 16R into a pulse train having a pulse frequency proportional
to the signal strength. Through these processes, the input signal processing units 6 and 7 convert
each of the left and right input signals into a pulse train having a pulse frequency
corresponding to the signal strength for each frequency band.
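The last three stages for a single frequency channel (positive-component extraction, envelope detection, pulse conversion) can be sketched as below. The text does not specify the filter or the pulse generator, so the one-pole low-pass filter, the accumulate-and-fire pulse generator, and the function names and parameters (`smoothing`, `gain`) are all assumptions chosen for illustration.

```python
def half_wave_rectify(samples):
    """Keep only the positive component of the band-passed signal."""
    return [max(s, 0.0) for s in samples]


def envelope(samples, smoothing=0.1):
    """Envelope detection with a one-pole low-pass filter (assumed filter type)."""
    out, state = [], 0.0
    for s in samples:
        state += smoothing * (s - state)
        out.append(state)
    return out


def to_pulses(strengths, gain=1.0):
    """Emit a pulse whenever the accumulated strength reaches 1 (assumed scheme).

    The resulting pulse rate is proportional to the signal strength.
    """
    acc, pulses = 0.0, []
    for s in strengths:
        acc += gain * s
        if acc >= 1.0:
            pulses.append(1)
            acc -= 1.0
        else:
            pulses.append(0)
    return pulses


loud = to_pulses([0.5] * 8)    # stronger signal
quiet = to_pulses([0.25] * 8)  # weaker signal
```

With these inputs the stronger signal produces twice as many pulses as the weaker one over the same interval, which is the property the later detection stages rely on.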
[0019]
When the input signal processing units 6 and 7 are implemented in hardware, the AD conversion
units 14L and 14R are AD conversion circuits, and the frequency decomposition units 15L and 15R,
the non-linear conversion units 16L and 16R, and the pulse conversion units 17L and 17R can each
be configured by a digital circuit.
[0020]
The detection of time difference information in the time difference detection unit 8, the
detection of sound pressure difference information in the sound pressure difference detection
unit 9, the detection of direction information in the left-right direction detection unit 10, the
detection of direction information in the front-back direction detection unit 11, and the
detection of direction information in the eight-direction detection unit 12 are all performed by
a pulse neural network composed of a plurality of PN models.
The pulse neural network is realized by implementing, as electronic circuits, a plurality of PN
models that can operate independently, asynchronously, and in parallel, and it can therefore
perform high-speed processing.
[0021]
The time difference detection unit 8 is composed of a time difference detection model consisting
of PN models, as shown in FIG. 7, and a row of time delay elements 19 (see FIG. 8) that feed the
pulse trains into the time difference detection model while shifting them. The time difference
detection model is the same as that described in Non-Patent Document 1 and elsewhere and is
therefore not described in detail here. As shown in that figure, an MSO neuron row in which a
plurality (an odd number) of time difference detection neurons (hereinafter referred to as "MSO
neurons") 20 are arranged is provided for each frequency channel. Each MSO neuron 20 has a left
input terminal 21 to which the left pulse signal is input, a right input terminal 22 to which the
right pulse signal is input, and an output terminal 23. In all the MSO neurons 20, the weight for
the left and right input signals is set to a common fixed value, and the threshold is set to
twice the weight, or to twice the weight plus the reference value of the internal potential, so
that a pulse signal is output from the output terminal 23 when pulse signals are input
substantially simultaneously from the left and right. "Substantially simultaneously" of course
includes the case of exactly simultaneous input.
[0022]
The time difference detection unit 8 then uses the time delay elements 19 to shift the left pulse
train to the right and the right pulse train to the left by one position per clock (unit time),
and inputs the left and right pulse trains to the MSO neuron row of the corresponding frequency
channel. That is, the left pulse signal is input to each MSO neuron 20 in turn while being shifted
every unit time from one end (the left end in FIG. 8) of the MSO neuron row to the other end (the
right end in FIG. 8), and the right pulse signal is input to each MSO neuron 20 in turn while
being shifted every unit time from the other end (the right end) to the one end (the left end).
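The counter-propagating delay lines can be sketched as follows. To keep the example small it reduces each MSO neuron to a pure coincidence detector (it fires only when both pulses sit at its position in the same clock), ignoring the decaying membrane potential; the function name `mso_coincidences` and the row length of 5 are assumptions for illustration.

```python
def mso_coincidences(left, right, num_neurons):
    """Return, for each clock, the indices of neurons that see a pulse from both sides."""
    left_line = [0] * num_neurons    # left pulses enter at index 0 and move right
    right_line = [0] * num_neurons   # right pulses enter at the last index and move left
    fired = []
    for l, r in zip(left, right):
        left_line = [l] + left_line[:-1]     # shift right by one neuron per clock
        right_line = right_line[1:] + [r]    # shift left by one neuron per clock
        fired.append([j for j in range(num_neurons) if left_line[j] and right_line[j]])
    return fired


simultaneous = mso_coincidences([1, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0], 5)
left_leads = mso_coincidences([1, 0, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0], 5)
```

With simultaneous input the pulses meet at the centre neuron (index 2). When the left pulse leads by two clocks, it has travelled further before the right pulse arrives, so they meet one neuron to the right of centre (index 3).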
[0023]
For example, if the number of MSO neurons 20 in each MSO neuron row is 2J + 1 and the MSO neurons
20 are numbered −J to J, then at time t each MSO neuron 20 calculates its internal potential
I^MSO_{j,i}(t), and outputs y_{j,i}(t) = 1 when the internal potential exceeds a predetermined
threshold and y_{j,i}(t) = 0 when it does not. Here, j is the number of the MSO neuron 20, and i
is the number of the frequency channel (i = 1 to N). In the following [Equation 1],
p^left_{j,i}(t) is the local membrane potential for the left input signal, p^right_{j,i}(t) is the
local membrane potential for the right input signal, w is the connection weight common to all
neurons 20, and τ is the decay time constant.
[0024]
As a result, in the time difference detection model, the firing pattern changes according to the
time difference between the left and right input signals: the neurons 20 near the center of the
MSO neuron row fire when pulse signals arrive from the left and right substantially
simultaneously, the neurons 20 on the right side of the MSO neuron row fire when the pulse signal
arrives from the left earlier than from the right, and the neurons 20 on the left side of the MSO
neuron row fire when the pulse signal arrives from the right earlier than from the left. This
firing pattern is output as the time difference information of the sound.
[0025]
As described above, when the MSO neurons 20 in each MSO neuron row are numbered −J to J, the MSO
neuron row corresponding to frequency channel i outputs the pulse train (y_{-J,i}(t), ...,
y_{0,i}(t), ..., y_{J,i}(t)) at time t, and the time difference detection model as a whole outputs
the following vector y_MSO(t) as time difference information.
[0026]
y_MSO(t) = (y_{-J,1}(t), ..., y_{0,1}(t), ..., y_{J,1}(t), y_{-J,2}(t), ..., y_{0,2}(t), ...,
y_{J,2}(t), ..., y_{-J,N}(t), ..., y_{0,N}(t), ..., y_{J,N}(t))
The time difference detection unit 8 can be configured by a digital circuit as shown in the
figure.
FIG. 9A is a diagram for explaining the operation in the first half of one clock, and FIG. 9B is a
diagram for explaining the operation in the second half.
This example is also described in Chapter 5 of Non-Patent Document 3 and is therefore not
described in detail here. Each MSO neuron 20 of the time difference detection unit 8 includes AND
circuits 24L and 24R, adders 25 and 26, a register 27, a comparator 28, and an attenuation
generation unit 29. As shown in FIG. 9A, the attenuation generation unit 29 generates the
attenuation amount of the internal potential by applying a bit shift and complement
representation to the internal potential and inputs it to the adder 26. The adder 26 adds the
internal potential and its attenuation amount to update the internal potential in the register
27. Then, as shown in FIG. 9B, the AND circuits 24L and 24R input the connection weight to the
adder 25 only when the input signal is "1", and the adder 25 adds the input connection weights to
the internal potential held in the register 27 to update the internal potential in the register
27. The comparator 28 compares the threshold it holds with the internal potential held in the
register 27 and outputs the signal "1" if the internal potential is equal to or higher than the
threshold, and the signal "0" otherwise. The refractory period can be implemented by providing a
counter that counts the refractory period, suppressing firing while the refractory period lasts,
and resetting the counter at each firing.
[0027]
The sound pressure difference detection unit 9 is composed of a sound pressure difference
detection model consisting of PN models, as shown in the figure. The sound pressure difference
detection model is the same as that described in Non-Patent Document 1 above and elsewhere and is
therefore not described in detail here. As shown in the figure, a row of LSO neurons in which a
plurality (an odd number) of sound pressure difference detection neurons (hereinafter also
referred to as "LSO neurons") 40 are arranged is provided for each frequency channel. Each LSO
neuron 40 has a left input terminal 41 to which the left pulse signal is input, a right input
terminal 42 to which the right pulse signal is input, and an output terminal 43.
[0028]
As shown in FIG. 11, at time t, the left pulse signal x^left_i(t) and the right pulse signal
x^right_i(t) of the corresponding frequency channel i, output from the input signal processing
units 6 and 7, are input to the left input terminal 41 and the right input terminal 42 of each LSO
neuron 40 in each LSO neuron row. Here, k is the number given to each neuron 40 in each LSO neuron
row, with −K ≦ k ≦ K.
[0029]
Then, each LSO neuron 40 calculates its internal potential I^LSO_{k,i}(t) according to the
following [Equation 2], and outputs y_{k,i}(t) = 1 if this internal potential exceeds a
predetermined threshold and y_{k,i}(t) = 0 if it is below the threshold. The threshold is a value
common to all LSO neurons 40. In the following [Equation 2], p^left_{k,i}(t) is the local membrane
potential for the left input signal, p^right_{k,i}(t) is the local membrane potential for the
right input signal, w^left_{k,i} is the connection weight for the left input signal, and
w^right_{k,i} is the connection weight for the right input signal. Also, τ is the decay time
constant, and b, α and β are constants.
[0030]
As shown in [Equation 2] above, in the sound pressure difference detection model the connection
weights for the left and right input signals change gradually along the neuron row. When the left
and right sound pressures are substantially equal, the neurons in the central part (numbers −b to
b) fire; the neuron of number 0, however, does not fire. If the sound pressure on the left is
greater than that on the right, the neurons 40 from the central part out toward the left fire, and
if the sound pressure on the right is greater than that on the left, the neurons 40 from the
central part out toward the right fire. The larger the left-right sound pressure difference, the
farther from the central part the firing extends. Note that, by setting the connection weights
appropriately, the central LSO neuron 40 in each frequency channel may be made never to fire, as
described above, and some of the neurons 40 on both sides of the central LSO neuron 40 may be made
to fire at all times.
[0031]
As described above, when the LSO neurons 40 in each LSO neuron row are numbered −K to K, the LSO
neuron row corresponding to frequency channel i outputs the pulse train (y_{-K,i}(t), ...,
y_{0,i}(t), ..., y_{K,i}(t)) at time t, and the sound pressure difference detection model as a
whole outputs the following vector y_LSO(t) as sound pressure difference information.
[0032]
y_LSO(t) = (y_{-K,1}(t), ..., y_{0,1}(t), ..., y_{K,1}(t), y_{-K,2}(t), ..., y_{0,2}(t), ...,
y_{K,2}(t), ..., y_{-K,N}(t), ..., y_{0,N}(t), ..., y_{K,N}(t))
The LSO neuron 40 can be realized by a digital circuit with the same configuration as the MSO
neuron 20, with appropriate changes.
[0033]
The left-right direction detection unit 10, the front-back direction detection unit 11, and the
eight-direction detection unit 12 are each composed of the competitive learning neural network
(hereinafter referred to as "CONP") described in Non-Patent Document 2 above.
The CONP is a pulse neural network configured so that only one competitive learning neuron fires
at any time, by adjusting the threshold of each competitive learning neuron, and its purpose is to
quantize the input vector. The configuration of CONP is shown in the figure. The CONP includes a
competitive learning neuron group 50 and a control neuron group 60. The competitive learning
neuron group 50 consists of a plurality of competitive learning neurons (hereinafter also
referred to as "CL neurons") 51. The control neuron group 60 consists of a no-firing detection
neuron (hereinafter also referred to as "NFD neuron") 61, which fires when none of the CL neurons
51 fires, and a multiple-firing detection neuron (hereinafter also referred to as "MFD neuron")
62, which fires when a plurality of CL neurons 51 fire.
[0034]
The NFD neuron 61 and the MFD neuron 62 are PN models that uniformly change the threshold of every
CL neuron 51 according to the firing state of the CL neurons 51 (in practice, the internal
potential of every CL neuron 51 is changed uniformly), thereby maintaining a state in which only
one CL neuron 51 in the CL neuron group 50 fires. The NFD neuron 61 and the MFD neuron 62 each
have an output terminal and as many input terminals as there are CL neurons 51 in the CL neuron
group 50, and receive the pulse signals output from the CL neurons 51 at the respective input
terminals. The NFD neuron 61 outputs "1" from its output terminal only when the signals from all
CL neurons 51 are "0", and the MFD neuron 62 outputs "1" from its output terminal only when it
receives the signal "1" from a plurality of CL neurons 51.
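The input-output behaviour of the two control neurons amounts to two simple predicates over the CL neuron outputs. The sketch below captures only that logic; in the text both are PN models with weights and thresholds, and the function names are hypothetical.

```python
def nfd_output(cl_outputs):
    """No-firing detection: 1 only when every CL neuron output is 0."""
    return 1 if sum(cl_outputs) == 0 else 0


def mfd_output(cl_outputs):
    """Multiple-firing detection: 1 only when two or more CL neurons output 1."""
    return 1 if sum(cl_outputs) >= 2 else 0


for outputs in ([0, 0, 0], [0, 1, 0], [1, 1, 0]):
    print(outputs, nfd_output(outputs), mfd_output(outputs))
```

Both control outputs are 0 exactly when a single CL neuron fires, which is the state the network tries to maintain.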
[0035]
As shown in the figure, each CL neuron 51 has input terminals 55_1, 55_2, ..., 55_i, ..., 55_n to
which the input pulses x_1(t), x_2(t), ..., x_i(t), ..., x_n(t) are input, and input terminals,
including terminal 58, that receive the pulse signals y_nfd(t) and y_mfd(t) output from the NFD
neuron 61 and the MFD neuron 62, respectively. Each input terminal 55_i (i = 1 to n) branches in
two, with one branch connected to a synapse 53i having a variable connection weight w_hi and the
other connected to a synapse 54i having a fixed connection weight "1". Here, h is the number
assigned to each CL neuron 51 in the CL neuron group 50, with h = 1 to M.
[0036]
The operation of CONP will be described based on FIGS. 14-1 and 14-2. In each unit time, an input
vector x(t) = (x_1(t), x_2(t), ..., x_i(t), ..., x_n(t)) (t: time) consisting of n input pulses is
input to each CL neuron 51 in the CL neuron group 50 (S101). Then, the NFD neuron 61 and the MFD
neuron 62 calculate their output values y_nfd(t) and y_mfd(t) at time t based on the stored
outputs y_h(t−1) of the CL neurons 51 at time (t−1), and output them to each CL neuron 51 (S102,
S103). Alternatively, the NFD neuron 61 and the MFD neuron 62 may calculate and hold y_nfd(t) and
y_mfd(t) at time (t−1) using the outputs y_h(t−1) of the CL neurons 51, and output y_nfd(t) and
y_mfd(t) to each CL neuron 51 at time t.
[0037]
Next, each CL neuron 51 calculates its internal potential I_h(t) (h = 1 to M) (S104) (see
[Equation 6] below). It outputs y_h(t) = 1 if the internal potential I_h(t) is equal to or higher
than the threshold TH and the refractory period has elapsed since the previous firing, and outputs
y_h(t) = 0 otherwise (S105).
[0038]
Then, at the time of learning, for the CL neuron 51 that has output "1", the connection weights
w_hi are updated using the local membrane potentials pcw_i at the synapses 54i (S106), and the
connection weights of the CL neurons 51 around that CL neuron 51 are updated in the same way
(S107).
As a method of determining the surrounding CL neurons 51 (that is, the update range of the
connection weights), for example, all CL neurons 51 may initially be set as the update range, and
the range may then be reduced linearly until finally only the connection weights of the winner
neuron are updated. Then, for each CL neuron 51 whose connection weights have been updated, the
norm of the connection weights (the norm of the reference vector) is normalized to 1 (S108). That
is, in this CONP, the self-organizing map (SOM) algorithm is realized by training not only the
winner neuron but also the surrounding neurons.
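The shrinking update range and the normalization of step S108 can be sketched as below. The weight update rule itself is not given in the text and is not reproduced here. The function names, the interpretation of the range as "number of neurons other than the winner", and the rounding are assumptions for illustration.

```python
import math


def update_range(step, total_steps, num_neurons):
    """Linearly shrink the update range from all other neurons down to the winner only."""
    remaining = 1.0 - step / total_steps
    return round((num_neurons - 1) * remaining)


def normalize(weights):
    """Scale a connection weight vector so that its Euclidean norm is 1 (step S108)."""
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm for w in weights] if norm > 0 else list(weights)


ranges = [update_range(s, 10, 9) for s in (0, 5, 10)]
unit = normalize([3.0, 4.0])
```

With 9 neurons and 10 learning steps, the range starts at 8 neighbours, reaches 4 at the midpoint, and ends at 0, at which point only the winner is updated.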
[0039]
On the other hand, when not learning (that is, at the time of recognition), the connection
weights are not updated. Then, the coefficient α for updating the connection weights is updated by
multiplying it by a constant γ (0 ≦ γ) (S109), and the processing of steps S101 to S108 is
performed on the next input vector.
[0040]
Here, a method of calculating the internal potential I_h(t) in CONP will be described. First, a
function F is introduced that takes four arguments, namely the time t, the decay time constant τ,
the connection weight w, and the input signal x(t) at time t, and it is defined as in the
following [Equation 3]. Note that Δt = 1 / Fs (Fs: sampling frequency).
[0041]
Then, the internal potential I(t) of the PN model at time t can be described as the sum of the
local membrane potentials p_i(t) (i = 1 to n), as in the following [Equation 4]. Here, τ is the
decay time constant of p_i(t).
[0042]
Assuming that the refractory period of the PN model is RP, that the elapsed time since the
previous firing at time t is ET(t), and that ET(0) > RP, the output value y(t) of the PN model is
calculated by the following algorithm.
[0043]
if I(t) ≧ TH and ET(t) > RP then y(t) = 1, ET(t) = 0
else y(t) = 0, ET(t) = ET(t−Δt) + Δt
The parameters τ, w_1, w_2, ..., w_n and TH take different values for each PN model, and the
operation of each PN model is determined by their combination.
[0044]
Here, let the outputs of the NFD neuron 61 and the MFD neuron 62 at time t be y_nfd(t) and
y_mfd(t), respectively, and let the connection weights of each CL neuron 51 for the NFD neuron 61
and the MFD neuron 62 be w_fd and −w_fd (where w_fd > 0). Then the internal potential I_h(t) of
the CL neuron 51 of number h at time t can be described as in the following [Equation 5] using the
function F described above.
CONP treats p_nfd(t) and p_mfd(t) as the dynamic change amount of the threshold (in practice,
instead of changing the threshold TH, the internal potential I_h(t) that is compared with the
threshold TH is adjusted by p_nfd(t) and p_mfd(t)), and thereby maintains the state in which only
one CL neuron 51 fires.
For this reason, the decay time constant τ_fd is assumed to be sufficiently larger than the time
constant τ.
[0045]
When the total internal potential generated by the input pulse train fluctuates significantly, the
threshold changes so as to absorb that fluctuation, and the threshold change may then be unable
to follow changes in the direction of the input vector.
Therefore, in CONP, the sum of the local membrane potentials pcw_i(t) at the synapse portions 54i
(i = 1 to n), computed with the coupling weight fixed to 1, is multiplied by a constant ratio
β_pcw (where 0 ≦ β_pcw ≦ 1) and subtracted from the internal potential in advance. This
suppresses the change of the internal potential caused by fluctuation of the norm of the input
signal. As a result, I_h(t) of [Equation 5] above is corrected as in [Equation 6] below, and each
CL neuron 51 calculates its internal potential I_h(t) according to [Equation 6]. Note that
pcw_i(t) = F(t, τ, 1, x_i(t)).
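[Equation 5] and [Equation 6] are not reproduced in this text. The sketch below shows one form consistent with the description: the weighted local potentials of the CL neuron, plus the contributions of the NFD and MFD neurons (with weights w_fd and −w_fd), minus β_pcw times the sum of the unit-weight local potentials. The exact form, the function name `cl_internal_potential`, and all numeric values are assumptions.

```python
def cl_internal_potential(p_syn, p_nfd, p_mfd, pcw, beta_pcw):
    """ASSUMED form of the corrected internal potential of one CL neuron.

    p_syn:    local membrane potentials F(t, tau, w_hi, x_i(t)) of neuron h
    p_nfd:    F(t, tau_fd, w_fd, y_nfd(t)), non-negative, raises the potential
    p_mfd:    F(t, tau_fd, -w_fd, y_mfd(t)), non-positive, lowers the potential
    pcw:      local potentials F(t, tau, 1, x_i(t)) with the weight fixed to 1
    beta_pcw: suppression ratio, 0 <= beta_pcw <= 1
    """
    if not 0.0 <= beta_pcw <= 1.0:
        raise ValueError("beta_pcw must lie in [0, 1]")
    return sum(p_syn) + p_nfd + p_mfd - beta_pcw * sum(pcw)

# Hypothetical values: doubling the input norm doubles p_syn and pcw alike,
# so the subtraction cancels part of the resulting change in potential.
print(cl_internal_potential([0.5, 0.25], 0.0, 0.0, [1.0, 1.0], 0.25))  # 0.25
print(cl_internal_potential([1.0, 0.5], 0.0, 0.0, [2.0, 2.0], 0.25))   # 0.5
```

Without the suppression term, the same doubling would move the potential from 0.75 to 1.5; with it, the change is from 0.25 to 0.5.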
[0046]
CONP can also be implemented in hardware with a simple digital circuit, an example of which is
shown in FIG. 15. In this example, CONP includes M CL neuron units 51H corresponding to the CL
neurons 51, one NFD neuron unit 61H corresponding to the NFD neuron 61, and one MFD
neuron unit 62H corresponding to the MFD neuron 62, and further includes threshold change
amount generation units 63 and 64 and an internal potential suppression amount generation
unit 65, one of each.
[0047]
Each CL neuron unit 51H includes: n input terminals corresponding to the input terminals 511, ...,
51n of the CL neuron 51; n AND circuits 71 that multiply the n input pulses x_1(t), x_2(t), ...,
x_n(t) received at these input terminals by their respective weights; an adder 72 that adds the
output of each AND circuit 71 to the internal potential; an attenuation generation unit 73 that
attenuates the internal potential by bit shift and complement representation and outputs the
result to the adder 72; and a comparator 74 that compares the internal potential output from the
adder 72 with a threshold. The comparator 74 outputs y_h(t) = 1 when the internal potential
equals or exceeds the threshold and the refractory period has elapsed since the previous firing,
and outputs y_h(t) = 0 otherwise. As described later, the comparator 74 is also supplied with
p_nfd(t) and p_mfd(t) as dynamic threshold changes and with S_pcw(t) as the suppression amount
of the internal potential. The comparator 74 adjusts the internal potential with these values as
in [Equation 6] and then compares it with the threshold.
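The adder and attenuation path of such a unit can be imitated with integer arithmetic. The sketch below assumes that "bit shift and complement representation" means subtracting a right-shifted copy of the potential (a subtraction carried out in two's complement), and that attenuation is applied before the weighted pulses are added. Both points, the function names, and the weights are assumptions, not details taken from the circuit of FIG. 15.

```python
def and_weight(weight, pulse):
    """Multiply an integer weight by a 1-bit pulse using a bitwise AND.

    The pulse is expanded to an all-zeros or all-ones mask, as a bank of
    AND gates would do for each weight bit.
    """
    mask = -pulse              # 0 -> 0, 1 -> -1 (all ones in two's complement)
    return weight & mask

def attenuate(potential, shift):
    """Decay by a factor of about (1 - 2**-shift) by subtracting a shifted copy."""
    return potential - (potential >> shift)

def cl_unit_step(potential, weights, pulses, shift):
    """One clock step: attenuate the stored potential, then add weighted pulses."""
    return attenuate(potential, shift) + sum(
        and_weight(w, x) for w, x in zip(weights, pulses)
    )

# Hypothetical weights and pulse pattern for a unit with three inputs.
potential = 0
for pulses in [(1, 0, 1), (0, 0, 0), (0, 1, 0)]:
    potential = cl_unit_step(potential, (64, 32, 16), pulses, shift=2)
print(potential)  # 77
```

Because the pulses are single bits, no multiplier is needed: each weight is either passed through or masked to zero.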
[0048]
The NFD neuron unit 61H includes: M input terminals connected to the output terminals of the M
CL neuron units 51H; M AND circuits 76 that multiply the M input pulses y_1(t), y_2(t), ...,
y_M(t) received at these input terminals by their respective weights; an adder 77 that adds the
output of each AND circuit 76 to the internal potential; an attenuation generation unit 78 that
attenuates the internal potential by bit shift and complement representation and outputs the
result to the adder 77; and a comparator 79 that compares the internal potential output from the
adder 77 with a threshold and outputs 1 when the internal potential exceeds the threshold and the
refractory period has elapsed since the previous firing, and 0 otherwise. The NFD neuron unit 61H
is configured to fire when all M input pulses are 0.
[0049]
The MFD neuron unit 62H has a configuration similar to that of the NFD neuron unit 61H, but by
using different weights and a different threshold it is configured to fire when two or more of
the M input pulses are 1.
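The two firing conditions can be expressed as weighted sums compared with a threshold. The sketch below is a memoryless simplification (no decay and no refractory period), and the weights of −1 and +1 and the thresholds 0 and 2 are hypothetical values chosen only to reproduce the stated behavior.

```python
def nfd_fires(cl_outputs):
    """NFD condition: with weights of -1 and threshold 0, the weighted sum
    reaches the threshold only when no CL neuron fired."""
    return int(sum(-y for y in cl_outputs) >= 0)

def mfd_fires(cl_outputs):
    """MFD condition: with weights of +1 and threshold 2, the weighted sum
    reaches the threshold only when two or more CL neurons fired."""
    return int(sum(cl_outputs) >= 2)

# Outputs y_1(t) ... y_M(t) of M = 4 CL neurons in three hypothetical cases.
for outputs in ([0, 0, 0, 0], [0, 1, 0, 0], [1, 1, 0, 0]):
    print(outputs, nfd_fires(outputs), mfd_fires(outputs))
```

When exactly one CL neuron fires, neither detector responds, which is the state that CONP tries to maintain.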
[0050]
The threshold change amount generation units 63 and 64 generate the local membrane potentials
p_nfd(t) and p_mfd(t) for the outputs of the NFD neuron unit 61H and the MFD neuron unit 62H,
respectively. Such a portion would otherwise be provided in every CL neuron unit 51H, but since
its weight and decay time constant do not differ between the CL neuron units 51H, it is taken out
of the individual CL neuron units 51H and provided only once.
[0051]
The threshold change amount generation unit 63 includes an AND circuit 81 that multiplies the
output from the NFD neuron unit 61H by the weight w_fd, an adder 82 that adds the output from
the AND circuit 81 to the local membrane potential, and an attenuation generation unit 83 that
attenuates the local membrane potential by bit shift and complement representation and outputs
the result to the adder 82. The local membrane potential p_nfd(t) is output from the adder 82 to
the comparator 74 of each CL neuron unit 51H as the dynamic change amount of the threshold.
[0052]
The threshold change amount generation unit 64 has the same configuration as the threshold
change amount generation unit 63. It generates the local membrane potential p_mfd(t) for the
output from the MFD neuron unit 62H and outputs it to the comparator 74 of each CL neuron
unit 51H.
[0053]
The internal potential suppression amount generation unit 65 generates the suppression amount
S_pcw(t) that suppresses the change of the internal potential caused by the fluctuation of the
norm of the input signal described above. S_pcw(t) is generated by multiplying the sum of the
local membrane potentials pcw_i(t) at the synapse portions 54i by the fixed ratio β_pcw. Since the
weight and the decay time constant do not differ between the CL neuron units 51H, this portion is
also taken out of the individual CL neuron units 51H and provided only once.
The internal potential suppression amount generation unit 65 includes AND circuits 86 that
multiply the n input pulses by a fixed weight, an adder 87 that adds the outputs from the AND
circuits 86 to the internal potential, and an attenuation generation unit 88 that attenuates the
internal potential by bit shift and complement representation and outputs the result to the
adder 87. The internal potential is output from the adder 87 to the comparator 74 of each CL
neuron unit 51H as the suppression amount S_pcw(t).
Note that the multiplication by the ratio β_pcw is realized by setting the weight of each AND
circuit 86 to β_pcw.
[0054]
In the example of the hardware configuration of CONP shown in FIG. 15, the learning mechanism
(the mechanism for updating the weights of each CL neuron unit 51H) is not implemented.
This is because learning is performed by software simulation, as described later, to determine
the weights, and the resulting weights only need to be set in the hardware.
The learning mechanism can of course also be implemented in hardware, but to simplify the circuit
configuration and reduce the circuit size, it is preferable to perform learning in software to
determine the weights.
[0055]
As shown in FIG. 16, the left-right direction detection unit 10 is composed of a CONP having a
plurality of (here, 16) CL neurons 51.
The sixteen CL neurons 51 are arranged in a line, numbered 1 to 16, and neurons with closer
numbers are regarded as being closer together. The time difference information (here, the vector
y_MSO(t)) output from the time difference detection unit 8 is input to each CL neuron 51. As a
result of learning, the left-right direction detection unit 10 can quantize the input vector
y_MSO(t) while preserving similarity relations. That is, at the time of recognition, CL neurons 51
that are close to each other fire when mutually similar vectors are input, and CL neurons 51 that
are far apart fire when dissimilar vectors are input. Thus, at the time of recognition, the
left-right direction detection unit 10 indicates the direction information of the sound source in
the left-right direction by which CL neuron 51 fires.
[0056]
As shown in FIG. 16, the front-rear direction detection unit 11 is, like the left-right direction
detection unit 10, composed of a CONP having 16 CL neurons 51 arranged in a line, and each CL
neuron 51 receives the sound pressure difference information (here, the vector y_LSO(t)) output
from the sound pressure difference detection unit 9. Like the left-right direction detection unit
10, the front-rear direction detection unit 11 can, as a result of learning, quantize the input
vector y_LSO(t) while preserving similarity relations, and at the time of recognition the
direction information of the sound source in the front-rear direction is indicated by which CL
neuron 51 fires.
[0057]
As shown in FIG. 16, the eight-direction detection unit 12 is composed of a CONP having a
plurality of CL neurons 51 (here, eight, matching the number of directions to be identified). The
eight CL neurons 51 are arranged in a line, numbered 1 to 8, and neurons with closer numbers are
regarded as being closer together. Each CL neuron 51 receives a vector whose elements are 32
pulses: the 16 pulses output from the left-right direction detection unit 10 and the 16 pulses
output from the front-rear direction detection unit 11. As a result of learning, the
eight-direction detection unit 12 can quantize the input vector while preserving similarity
relations, and at the time of recognition the direction information of the sound source in the
eight surrounding directions is indicated by which CL neuron 51 fires.
[0058]
As described above, since CONP can be realized by a simple digital circuit, the left-right direction
detection unit 10, the front-rear direction detection unit 11, and the eight-direction detection unit
12 can also be realized by a simple digital circuit.
[0059]
In the following, the results of a simulation in which the main unit 4 of the sound source
localization device 1 was realized by software on a computer are described.
The experiment was performed in an anechoic room, with a speaker serving as the sound source S
arranged as shown in FIG. 17 relative to the microphones 2 and 3, which have forward directivity.
The distance between the microphones 2 and 3 and the sound source S was 100 cm, and the
direction of the arrow in the figure was the forward direction of the sound source localization
device 1. White noise generated by the computer was emitted from the sound source S at each of
the positions obtained by changing the position of the sound source S relative to the microphones
2 and 3 in steps of 45°, as shown in FIGS. 17(a) to 17(h), and the sound collected by the
microphones 2 and 3 was input to the main unit 4. The experimental parameters are shown in
Tables 1 to 5.
[0060]
[0061]
[0062]
[0063]
[0064]
The output results of the time difference detection unit 8 when the sound source was placed at
each position from 0° to 315° are shown in FIGS. 18-1 to 18-8, and the output results of the
sound pressure difference detection unit 9 are shown in FIGS. 19-1 to 19-8.
In these output results, the shading of each cell represents the firing frequency of the neuron.
[0065]
The learning in the left-right direction detection unit 10 and the front-rear direction detection
unit 11 is unsupervised learning based on SOM, as shown in FIG. 14B, whereas the learning in the
eight-direction detection unit 12 is based on LVQ, a common supervised learning method.
That is, in the eight-direction detection unit 12, the CL neurons 51 of No. 1 to No. 8 are assigned
in advance to the directions 0° to 315°, respectively. When data is input, if the CL neuron 51
that fires is the correct one (for example, if the CL neuron 51 that fires for 0° data is No. 1),
the reference vector of that CL neuron 51 is moved closer to the input vector; if it is the wrong
one, the reference vector of that CL neuron 51 is moved away from the input vector.
Learning was applied only to the winner neuron.
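The winner-only update described above can be sketched as follows. This is a minimal illustration of an LVQ-style step, assuming a learning coefficient `alpha` and a final rescaling of the reference vector to unit norm (mirroring the normalization step of the CONP learning flow); the function name `lvq_update` and the numeric values are hypothetical.

```python
import math

def lvq_update(reference, x, correct, alpha):
    """Move the winner's reference vector toward x if correct, else away from x.

    The result is rescaled to unit norm (an assumption made here to match
    the normalization of reference vectors in the CONP learning flow).
    """
    sign = 1.0 if correct else -1.0
    moved = [r + sign * alpha * (xi - r) for r, xi in zip(reference, x)]
    norm = math.sqrt(sum(v * v for v in moved))
    return [v / norm for v in moved] if norm > 0 else moved

reference = [1.0, 0.0]
x = [0.0, 1.0]
closer = lvq_update(reference, x, correct=True, alpha=0.5)
farther = lvq_update(reference, x, correct=False, alpha=0.5)
print(closer)   # both components equal: the vector turned toward x
print(farther)  # second component negative: the vector turned away from x
```

Only the neuron that fired is passed to this function; the reference vectors of all other neurons are left unchanged.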
[0066]
After learning, white noise was emitted from the sound source S at each of the positions in
FIGS. 17(a) to 17(h), and the sound source localization device 1 was made to perform recognition.
The output results from the left-right direction detection unit 10, the front-rear direction
detection unit 11, and the eight-direction detection unit 12 are shown in Tables 6, 7, and 8,
respectively.
[0067]
[0068]
[0069]
Tables 6 to 8 show the firing rates of the CL neurons 51 when the input signal (white noise) is
emitted continuously for a certain time from each direction.
According to Tables 6 and 7, it may appear that identification over the entire circumference is
possible with the output of only the left-right direction detection unit 10 or only the
front-rear direction detection unit 11. However, this is because artificially generated white
noise was used as the input sound; with real sounds, identification from only one of the outputs
is difficult.
[0070]
In Table 8, No. 1 to No. 8 are the numbers of the CL neurons 51 of the eight-direction detection
unit 12.
For example, when the input signal is emitted from the 0° direction (the position shown in FIG.
17A), the firing rate of the CL neurons 51 is 99.4% for No. 1, 0.0% for No. 2 to No. 7, and 0.6%
for No. 8.
Similarly, the firing rate of No. 2 is 86.1% for the 45° direction (the position shown in FIG.
17B), and the firing rate of No. 3 is 92.6% for the 90° direction (the position shown in FIG.
17C). In the eight-direction detection unit 12, the CL neuron 51 corresponding to the direction of
the input signal thus fires most frequently, so the direction from which the input signal arrives
can be identified from which CL neuron 51 fires.
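Reading a direction out of such firing rates amounts to picking the most active neuron. The sketch below uses the firing rates quoted above for the 0° input; the function name `decode_direction` is hypothetical, and the mapping of neuron No. k to the direction (k − 1) × 45° follows the assignment of No. 1 to No. 8 to 0° to 315° described for the learning.

```python
def decode_direction(firing_rates, step_deg=45):
    """Return the direction (degrees) of the CL neuron with the highest rate.

    firing_rates[k] is the firing rate of CL neuron No. k + 1, which is
    taken to represent the direction k * step_deg.
    """
    winner = max(range(len(firing_rates)), key=lambda k: firing_rates[k])
    return winner * step_deg

# Firing rates (%) quoted for an input from 0 degrees, No. 1 to No. 8.
rates_0deg = [99.4, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.6]
print(decode_direction(rates_0deg))  # 0
```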
[0071]
As described above, in the sound source localization device 1, the time difference detection unit 8
detects the time difference of the sound, and the sound pressure difference detection unit 9
detects the sound pressure difference. The left-right direction detection unit 10 vector-quantizes
the time difference information and outputs it as direction information in the left-right
direction, the front-rear direction detection unit 11 vector-quantizes the sound pressure
difference information and outputs it as direction information in the front-rear direction, and
the eight-direction detection unit 12 vector-quantizes the direction information in the
left-right and front-rear directions and outputs it as direction information in the eight
surrounding directions.
These detections are performed by the PN model, and the vector quantization is performed by CONP,
which consists of PN models. Since the PN model can be realized by a simple digital circuit as
described above, the time difference detection unit 8, the sound pressure difference detection
unit 9, the left-right direction detection unit 10, the front-rear direction detection unit 11,
and the eight-direction detection unit 12 can also be realized by simple digital circuits and can
easily be mounted on an FPGA. The sound source localization device 1 is therefore advantageous
for hardware implementation. Moreover, if the time difference detection unit 8 and the other
units are realized by digital circuits, the calculations of the individual PN models are executed
in parallel on those circuits, so that a practical calculation speed can be achieved.
[0072]
In the sound source localization device 1, the eight-direction detection unit 12 is provided, the
outputs of the left-right direction detection unit 10 and the front-rear direction detection unit
11 are input to it, and it outputs the direction information of the sound source in the eight
surrounding directions. However, providing the eight-direction detection unit 12 is optional. For
example, without the eight-direction detection unit 12, the direction in the left-right direction
may be estimated from the direction information output by the left-right direction detection
unit 10, the direction in the front-rear direction may be estimated from the direction
information output by the front-rear direction detection unit 11, and the estimation results may
be output to and displayed on the display device 5 as they are. Nevertheless, providing a
surrounding direction detection means such as the eight-direction detection unit 12 makes it
easier to identify the direction of the sound source.
[0073]
Schematic diagram of the PN model.
Example in which the PN model is configured by a digital circuit.
Diagram for explaining a conventional sound source localization method.
Top view of a sound source localization device according to one embodiment of the present invention.
Block diagram showing the configuration of the sound source localization device.
Block diagram showing the configuration of the input signal processing unit.
Schematic diagram of the time difference detection model.
Diagram showing the configuration of the MSO neuron row.
Example in which the time difference detection unit is configured by a digital circuit; (a) is a diagram for explaining the operation in the first half of one clock, and (b) is a diagram for explaining the operation in the second half.
Schematic diagram of the sound pressure difference detection model.
Diagram showing the configuration of the LSO neuron row.
Schematic diagram of CONP.
Schematic diagram of a CL neuron in CONP.
Flowchart showing the operation of CONP.
Flowchart showing the operation of CONP.
Example in which CONP is configured by a digital circuit.
Schematic diagram of the time difference detection unit, the sound pressure difference detection unit, the left-right direction detection unit, the front-rear direction detection unit, and the eight-direction detection unit.
Diagram showing the positions of the sound source in the experiment.
Diagram showing the output of the time difference detection unit with the sound source at the 0° position.
Diagram showing the output of the time difference detection unit with the sound source at the 45° position.
Diagram showing the output of the time difference detection unit with the sound source at the 90° position.
Diagram showing the output of the time difference detection unit with the sound source at the 135° position.
Diagram showing the output of the time difference detection unit with the sound source at the 180° position.
Diagram showing the output of the time difference detection unit with the sound source at the 225° position.
Diagram showing the output of the time difference detection unit with the sound source at the 270° position.
Diagram showing the output of the time difference detection unit with the sound source at the 315° position.
Diagram showing the output of the sound pressure difference detection unit with the sound source at the 0° position.
Diagram showing the output of the sound pressure difference detection unit with the sound source at the 45° position.
Diagram showing the output of the sound pressure difference detection unit with the sound source at the 90° position.
Diagram showing the output of the sound pressure difference detection unit with the sound source at the 135° position.
Diagram showing the output of the sound pressure difference detection unit with the sound source at the 180° position.
Diagram showing the output of the sound pressure difference detection unit with the sound source at the 225° position.
Diagram showing the output of the sound pressure difference detection unit with the sound source at the 270° position.
Diagram showing the output of the sound pressure difference detection unit with the sound source at the 315° position.
Explanation of reference signs
[0074]
1 ... Sound source localization device
2, 3 ... Microphones
8 ... Time difference detection unit
9 ... Sound pressure difference detection unit
10 ... Left-right direction detection unit
11 ... Front-rear direction detection unit
12 ... Eight-direction detection unit