Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JPWO2014020921
Abstract: An object arrangement estimation apparatus estimates the arrangement of M (M is an integer of 2 or more) objects in real space. A feature vector generation unit generates, for each of the M objects, a feature vector whose components are measures of the object on N scales (N is an integer of 3 or more), each scale representing the closeness to one of N reference points in real space. A dissimilarity matrix deriving unit obtains the norm between the feature vectors of every combination of two of the M objects, and derives an M-row, M-column dissimilarity matrix having the obtained norms as its elements. An estimation unit estimates the arrangement of the M objects in real space based on the dissimilarity matrix and outputs the result as an arrangement estimation result.
Object arrangement estimation device
[0001]
The present invention relates to an object placement estimation apparatus for estimating the
placement of an object, and more particularly to an object placement estimation apparatus for
estimating the placement of a plurality of objects.
[0002]
In recent years, a system that presents a realistic sound field using a multi-channel sound field
sound collection system and reproduction system has attracted attention.
09-05-2019
1
Such systems range from systems with a relatively small number of channels, such as 2ch stereo, binaural, and 5.1ch surround systems, to systems that extend these principles to a large number of channels, such as the 22.2 multi-channel sound system, Ambisonics, and a system using a 121ch microphone array and a 157ch speaker array; various such systems have already been proposed.
[0003]
When such a system is used to pick up a sound field, it is necessary to check, at the sound collection site, the arrangement of dozens of microphones and the cable connections between the microphones and the recording equipment. Similarly, when such a system is used to reproduce a sound field, it is necessary to check, at the playback site, the arrangement of dozens of speakers and the cable connections between the speakers and the playback equipment.
[0004]
Therefore, there is a need for a device that can easily check the arrangement and cable
connection of a large number of microphones (or a large number of speakers).
[0005]
Patent Document 1 (US Patent Application Publication No. 2010/0195444) discloses a method
of estimating the arrangement of a plurality of speakers.
According to the method of Patent Document 1, the distance between every pair of loudspeakers whose arrangement is to be estimated is first measured, and a distance matrix whose elements are the distances between the loudspeaker pairs in real space is derived from the measurement results. The method of Patent Document 1 then obtains the arrangement of the loudspeakers in real space by applying multidimensional scaling to the distance matrix derived in this way.
[0006]
US Patent Application Publication No. 2010/0195444
[0007]
Vikas C. Raykar, Igor Kozintsev, Rainer Lienhart, Position Calibration of Microphones and Loudspeakers in Distributed Computing Platforms, IEEE Transactions on Speech and Audio Processing, p. 1-12
Stanley T. Birchfield, Amarnag Subramanya, Microphone Array Position Calibration by Basis-Point Classical Multidimensional Scaling, IEEE Transactions on Speech and Audio Processing, September 2005, Vol. 13, No. 5, p. 1025-1034
Alessandro Redondi, Marco Tagliasacchi, Fabio Antonacci, Augusto Sarti, Geometric Calibration of Distributed Microphone Arrays, MMSP '09, October 5-7, Rio de Janeiro, Brazil, IEEE, 2009
Kazunori Kobayashi, Ken'ichi Furuya, Akitoshi Kataoka, A Blind Source Localization by Using Freely Positioned Microphones, Transactions of the Institute of Electronics, Information and Communication Engineers A, June 2003, Vol. J86-A, No. 6, p. 619-627
[0008]
However, in the prior art, the distance in real space is measured for every pair of arrangement estimation objects (for example, for every pair of the plurality of speakers in Patent Document 1), and a distance matrix whose elements are the distances between the pairs in real space is derived from the measurement results.
The distance matrix derived in this way is then regarded as the dissimilarity matrix of the arrangement estimation objects, and multidimensional scaling is applied to it to obtain the arrangement of the plurality of speakers in real space.
Therefore, as the number of arrangement estimation objects increases, the number of distances to be measured becomes enormous, making it difficult to estimate the arrangement easily. In addition, the possibility of estimation errors caused by measurement errors also increases with the number of arrangement estimation objects. Furthermore, in some cases, such as that described in Non-Patent Document 1, it may be difficult to accurately measure the distance between arrangement estimation objects; with the conventional methods it was thus difficult to estimate the arrangement of the objects easily and accurately.
[0009]
The embodiments of the present invention have been made in view of the above problems, and their purpose is to provide an apparatus capable of estimating the arrangement of a plurality of objects in real space more simply and accurately than the prior art.
[0010]
A first aspect of the present invention is an object arrangement estimation apparatus for estimating the arrangement of M (M is an integer of 2 or more) objects in real space, comprising: a feature vector generation unit that generates, for each of the M objects, a feature vector whose components are measures of the object on N scales, each representing the closeness to one of N (N is an integer of 3 or more) reference points in real space; a dissimilarity matrix deriving unit that obtains the norm between the feature vectors of every combination of two objects included in the M objects, and derives an M-row, M-column dissimilarity matrix having the obtained norms as its elements; and an estimation unit that estimates the arrangement of the M objects in real space based on the dissimilarity matrix and outputs the result as an arrangement estimation result.
[0011]
A second aspect of the present invention is an object arrangement estimation method for estimating the arrangement of M (M is an integer of 2 or more) objects in real space, comprising: a feature vector generation step of generating, for each of the M objects, a feature vector whose components are measures of the object on N scales, each representing the closeness to one of N (N is an integer of 3 or more) reference points in real space; a dissimilarity matrix deriving step of obtaining the norm between the feature vectors of every combination of two objects included in the M objects and deriving an M-row, M-column dissimilarity matrix having the obtained norms as its elements; and an estimation step of estimating the arrangement of the M objects in real space based on the dissimilarity matrix and outputting the result as an arrangement estimation result.
[0012]
A third aspect of the present invention is an object arrangement estimation program for causing a computer to function as an object arrangement estimation apparatus for estimating the arrangement of M (M is an integer of 2 or more) objects in real space, the program causing the computer to function as: a feature vector generation unit that generates, for each of the M objects, a feature vector whose components are measures of the object on N scales, each representing the closeness to one of N (N is an integer of 3 or more) reference points in real space; a dissimilarity matrix deriving unit that obtains the norm between the feature vectors of every combination of two objects included in the M objects, and derives an M-row, M-column dissimilarity matrix having the obtained norms as its elements; and an estimation unit that estimates the arrangement of the M objects in real space based on the dissimilarity matrix and outputs the result as an arrangement estimation result.
[0013]
The object arrangement estimation apparatus according to each embodiment of the present invention generates, for each of the M objects, a feature vector whose components are measures of the object on N scales, each representing the proximity to one of N (N is an integer of 3 or more) reference points in real space; obtains the norm between the feature vectors of every combination of two objects; derives an M-row, M-column dissimilarity matrix having the obtained norms as its elements; estimates the arrangement of the M objects in real space based on the dissimilarity matrix; and outputs the result as an arrangement estimation result.
By doing so, the object placement estimation apparatus according to each embodiment of the present invention can estimate the placement of a plurality of objects in real space more simply and accurately than before.
[0014]
A block diagram showing the hardware configuration of the object arrangement estimation apparatus according to the first embodiment.
A block diagram showing the configuration of the object arrangement estimation apparatus according to the first embodiment.
A schematic diagram showing the relationship of the various quantities used for microphone arrangement estimation.
A flowchart showing the flow of the processing performed.
A figure showing the difference in sound collection time at each microphone when the time-stretched pulse signal (TSP signal) emitted from the i-th speaker is collected by multiple microphones.
A figure showing the experimental environment of the microphone arrangement estimation experiment and the speaker arrangement estimation experiment according to the first embodiment.
Figures showing the experimental results of the microphone arrangement estimation experiment according to the first embodiment.
A figure showing the relationship between the number of speakers used and the accuracy of the estimation result in microphone arrangement estimation according to the first embodiment.
A schematic diagram showing the relationship of the various quantities used for speaker arrangement estimation.
Figures showing the experimental results of the speaker arrangement estimation experiment according to the first embodiment.
A block diagram showing the hardware configuration of the object arrangement estimation apparatus according to the second embodiment.
A block diagram showing the configuration of the object arrangement estimation apparatus according to the second embodiment.
A figure showing the experimental environment of the microphone arrangement estimation experiment according to the second embodiment.
Figures showing the experimental results of the microphone arrangement estimation experiment according to the second embodiment.
A figure showing the relationship between the frequency of the talker's utterances and the accuracy of the estimation result in microphone arrangement estimation according to the second embodiment.
A block diagram showing the hardware configuration of object arrangement estimation apparatus modification 1.
A block diagram showing the configuration of object arrangement estimation apparatus modification 1.
A schematic diagram showing the relationship between a group of microphones arranged in a room and people.
A flowchart showing the flow of the processing performed by object arrangement estimation apparatus modification 1.
An illustration of the frequency-amplitude characteristics of the output signals of the microphones when a plurality of microphones collect sound including a human voice.
Illustrations of the distribution of object arrangement point candidates in object arrangement estimation apparatus modification 2.
[0015]
Hereinafter, embodiments of the present invention will be described in detail with reference to
the attached drawings.
[0016]
1. Overview
An embodiment of the present invention is an object placement estimation apparatus for estimating the placement of M (M is an integer of 2 or more) objects in real space.
The object arrangement estimation apparatus has, as one of its features: a feature vector generation unit that generates, for each of the M objects, a feature vector whose components are measures of the object on N scales, each indicating the proximity to one of N (N is an integer of 3 or more) reference points in real space; a dissimilarity matrix deriving unit that obtains the norm between the feature vectors of every combination of two objects included in the M objects, and derives an M-row, M-column dissimilarity matrix having the obtained norms as its elements; and an estimation unit that estimates the arrangement of the M objects in real space based on the dissimilarity matrix and outputs the result as an arrangement estimation result.
[0017]
Here, the M objects are, for example, M microphones (M may be an integer of 2 or more as
described above).
[0018]
In that case, in order to estimate the arrangement of M microphones, N speakers are arranged in
the space (N may be an integer of 3 or more, as described above).
The positions where the N speakers are arranged correspond to the above-mentioned reference
points.
[0019]
The object arrangement estimation apparatus causes the speakers to output a predetermined acoustic wave (for example, by supplying a time-stretched pulse signal (TSP signal) to a speaker in order to measure the impulse response at each microphone), and identifies, for each microphone, the time at which the acoustic wave emitted from each speaker first arrives, for example the time at which the waveform corresponding to the acoustic wave (for example, the impulse response waveform) first appears in the output of the microphone (the acoustic wave arrival time).
That is, in this case, the N scales representing the closeness to each of the N reference points are the time coordinate axes used when identifying the arrival time of the acoustic wave from each speaker, and the measure for the object is the arrival time of the acoustic wave at each microphone.
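The arrival-time identification described above can be sketched as follows (an illustrative Python sketch, not part of the patent; the simple threshold rule and all numeric values are assumptions for illustration):

```python
def arrival_time(response, fs, threshold=0.5):
    """Return the time (in seconds) at which `response` first reaches
    `threshold` times its peak magnitude -- a simple stand-in for the
    acoustic-wave arrival time at one microphone."""
    peak = max(abs(x) for x in response)
    for n, x in enumerate(response):
        if abs(x) >= threshold * peak:
            return n / fs
    return None

# An impulse-like response whose main peak arrives at sample 8 (fs = 1 kHz).
resp = [0.0] * 8 + [1.0, 0.4, 0.1]
print(arrival_time(resp, fs=1000))  # -> 0.008
```

In practice the response waveform would be obtained from the microphone output (for example, an impulse response measured with the TSP signal), and a more robust detection rule could be substituted.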
[0020]
Then, based on the identified acoustic wave arrival times, an N-dimensional feature vector is generated for each microphone (object), having as its components the times at which the acoustic waves arrived from the N reference points.
Here, the time at which the acoustic wave emitted at each reference point reaches the microphone is regarded as a measure on the scale (the time coordinate axis mentioned above) representing the closeness in real space between the microphone and that reference point, and the feature vector generation unit generates, for each microphone, an N-dimensional feature vector whose dimensionality equals the number N of reference points.
That is, the feature vector is a vector expression of the position of the microphone in real space, expressed by measures (acoustic wave arrival times) on N scales (time coordinate axes) representing the closeness between the microphone and the N reference points (speakers) in real space.
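The construction of a feature vector from arrival times, and the norm between two such vectors, can be sketched as follows (illustrative only; the arrival times are hypothetical, and the Euclidean norm is used as one possible choice of norm):

```python
import math

def norm(p, q):
    """Euclidean norm between two N-dimensional feature vectors; used as
    the dissimilarity between the positions of two microphones."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical arrival times (seconds) from N = 4 speakers at two microphones;
# each list is that microphone's feature vector.
p_mc1 = [0.010, 0.012, 0.015, 0.011]
p_mc2 = [0.011, 0.013, 0.014, 0.012]
print(round(norm(p_mc1, p_mc2), 6))  # -> 0.002
```

Note that no coordinate information about the speakers is needed: the feature vector characterizes the microphone's position purely through its closeness to the N reference points.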
[0021]
Alternatively, in order to generate the feature vectors, the object arrangement estimation apparatus may collect the ambient sound, including a human voice, using the M microphones, and calculate the frequency-amplitude characteristic of the output signal of each microphone. With this method, it is not necessary to output a predetermined acoustic wave from speakers arranged at the N reference points as described above. In the frequency-amplitude characteristic of ambient sound including a human voice, the formants of the voice appear superimposed on the components of noise that is ubiquitous in the sound collection environment (such as indoor reverberation and outdoor noise). As the position of a microphone moves away from the talker, the influence of noise increases and the shape of the frequency-amplitude characteristic departs from the shape of the frequency characteristic of the original formants. Therefore, by comparing the shapes of the frequency-amplitude characteristics of the output signals of the microphones, the relative proximity of the multiple microphones to the talker can be known. For example, the integral over the frequency axis of the difference between the frequency-amplitude characteristics of the output signals of two microphones can be regarded as the difference between the measures of the two microphones on a scale defining the proximity to the talker (that is, as a dissimilarity with respect to the talker). This is because, even if the positions of the two microphones differ, the amplitude components derived from noise that appear in the frequency-amplitude characteristics of their output signals are almost the same; the noise-derived amplitude component can therefore be canceled by taking the difference between the frequency-amplitude characteristics. The difference in frequency-amplitude characteristics thus contains information on the difference in proximity to the talker. Naturally, from the differences between the frequency-amplitude characteristics of any two microphones obtained in this way, the object arrangement estimation apparatus can also determine the measure of each microphone on the scale defining the proximity to the talker, that is, the component of the feature vector with respect to the talker.
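The integral of the difference between two frequency-amplitude characteristics can be sketched as follows (an illustrative sketch; the patent does not fix an integration scheme, so a simple rectangle-rule sum over amplitudes sampled every `df` Hz is assumed, and the amplitude values are hypothetical):

```python
def spectral_dissimilarity(amp_a, amp_b, df):
    """Rectangle-rule approximation of the integral, over frequency, of the
    absolute difference between two microphones' frequency-amplitude
    characteristics sampled every `df` Hz."""
    return sum(abs(a - b) for a, b in zip(amp_a, amp_b)) * df

# Hypothetical amplitude samples for two microphone output signals.
mic_a = [1.0, 2.0, 3.0, 2.0]
mic_b = [1.0, 1.0, 1.0, 1.0]
print(spectral_dissimilarity(mic_a, mic_b, df=10.0))  # -> 40.0
```

Because the noise-derived component is nearly common to both characteristics, it largely cancels in the difference, leaving mainly the formant-related disparity that reflects proximity to the talker.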
[0022]
Alternatively, for example, it is known that when the distance between the talker and the microphone doubles, the formant amplitude appearing in the frequency-amplitude characteristic decreases by approximately 6 dB. Based on this relationship, the feature vector generation unit of the object arrangement estimation apparatus may identify the formant components of the talker's voice from the frequency-amplitude characteristic of the output signal of each microphone, and derive from the identified formant amplitudes a measure, for each microphone, on a scale of the proximity to the talker's position (corresponding to a reference point), thereby determining the components of the feature vector. As described above, three or more reference points are required; therefore, the object arrangement estimation apparatus collects the voice uttered by the talker at N (N is an integer of 3 or more) different positions with the microphones and generates N-dimensional feature vectors.
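The approximately 6 dB decrease per doubling of distance corresponds to a 20·log10 attenuation law; the distance ratio implied by a measured amplitude drop can be sketched as follows (illustrative only; real formant-amplitude measurements would be affected by noise and reverberation):

```python
def distance_ratio(db_drop):
    """Distance ratio implied by a formant-amplitude drop of `db_drop` dB
    under the 20*log10 (about 6 dB per doubling) attenuation law."""
    return 10.0 ** (db_drop / 20.0)

print(distance_ratio(20.0))  # a 20 dB drop implies 10x the distance
```

For example, `distance_ratio(6.02)` evaluates to approximately 2, consistent with the stated doubling relationship.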
[0023]
Then, the dissimilarity matrix deriving unit derives an M-row, M-column matrix (dissimilarity matrix) whose elements are the norms between the feature vectors, for every pair of the M microphones.
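The derivation of the dissimilarity matrix can be sketched as follows (illustrative; the Euclidean norm and the example feature vectors are assumptions):

```python
import math

def dissimilarity_matrix(vectors):
    """Derive the M x M dissimilarity matrix whose (j, k) element is the
    Euclidean norm between the feature vectors of microphones j and k."""
    m = len(vectors)
    return [[math.sqrt(sum((a - b) ** 2 for a, b in zip(vectors[j], vectors[k])))
             for k in range(m)]
            for j in range(m)]

# Three microphones with hypothetical 3-dimensional feature vectors.
d = dissimilarity_matrix([[0.0, 0.0, 0.0],
                          [3.0, 0.0, 0.0],
                          [0.0, 4.0, 0.0]])
print(d[0][1], d[0][2], d[1][2])  # -> 3.0 4.0 5.0
```

The resulting matrix is symmetric with a zero diagonal, as expected of a dissimilarity matrix.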
[0024]
Finally, the estimation unit estimates the arrangement of the M microphones in real space based on the dissimilarity matrix, and outputs the result as an arrangement estimation result.
For example, the estimation unit applies multidimensional scaling (MDS: MultiDimensional Scaling) to the dissimilarity matrix to obtain a placement of the M microphones, estimates the placement of the M microphones in real space from the obtained placement, and outputs it.
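One standard way to realize the MDS step is classical multidimensional scaling; the following sketch (the patent does not prescribe a specific MDS variant, so classical MDS is assumed here) recovers coordinates up to rotation, reflection, and translation:

```python
import numpy as np

def classical_mds(d, dim=2):
    """Classical multidimensional scaling: recover `dim`-dimensional
    coordinates (up to rotation, reflection, and translation) from an
    M x M matrix of pairwise dissimilarities `d`."""
    d = np.asarray(d, dtype=float)
    m = d.shape[0]
    j = np.eye(m) - np.ones((m, m)) / m        # centering matrix
    b = -0.5 * j @ (d ** 2) @ j                # double-centered Gram matrix
    w, v = np.linalg.eigh(b)                   # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:dim]            # keep the `dim` largest
    return v[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

# Demo: pairwise distances among three points are reproduced by the layout.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
x = classical_mds(d)
d2 = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
print(np.allclose(d, d2))  # -> True
```

Because MDS output is determined only up to a similarity transform, a subsequent linear conversion (scaling, rotation) is needed to map it into real-space coordinates.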
[0025]
Alternatively, instead of applying MDS to the dissimilarity matrix, the estimation unit of the object arrangement estimation apparatus may numerically approximate the placement solution by full search or local search, and estimate and output the arrangement of the M microphones in real space from the obtained approximate solution. In this case, for each approximate solution candidate for the placement of the M microphones, the estimation unit may derive a distance matrix whose elements are the distances between the microphones, evaluate the degree of matching of the candidate by comparing the derived distance matrix with the dissimilarity matrix, and adopt as the approximate solution the candidate showing the highest degree of matching among the evaluated candidates.
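The full-search alternative can be sketched as follows (illustrative; the sum-of-squared-differences score, and the assumption that the dissimilarities are already expressed in distance units, are simplifications for illustration — a real implementation would also normalize scale):

```python
import math

def distance_matrix(points):
    """Pairwise distance matrix of a candidate placement."""
    return [[math.dist(p, q) for q in points] for p in points]

def best_placement(candidates, dissimilarity):
    """Full search over candidate placements: score each candidate by the
    sum of squared differences between its inter-microphone distance
    matrix and the dissimilarity matrix, and keep the best match."""
    m = len(dissimilarity)
    def score(points):
        d = distance_matrix(points)
        return sum((d[j][k] - dissimilarity[j][k]) ** 2
                   for j in range(m) for k in range(m))
    return min(candidates, key=score)

# Target dissimilarity: two microphones 1.0 apart.
target = [[0.0, 1.0], [1.0, 0.0]]
cands = [[(0.0, 0.0), (2.0, 0.0)],
         [(0.0, 0.0), (1.0, 0.0)],
         [(0.0, 0.0), (5.0, 0.0)]]
print(best_placement(cands, target))  # -> [(0.0, 0.0), (1.0, 0.0)]
```

A local search would iterate the same scoring over perturbations of a current best candidate instead of enumerating a fixed candidate set.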
[0026]
In the object arrangement estimation apparatus according to the embodiment of the present invention, the N reference points described above may be any N points; in order to estimate the arrangement of the M objects, information on the positions of the reference points in real space (for example, the coordinate values of the reference points) is also unnecessary. The object arrangement estimation apparatus according to the embodiment of the present invention can therefore estimate the arrangement of the M objects without measuring the distance between any pair of arrangement estimation objects, so the arrangement of the M objects can be estimated extremely simply. Furthermore, in the object arrangement estimation according to the embodiment of the present invention, the feature of the position of each object in real space is first expressed as an N-dimensional vector quantity (feature vector) whose dimensionality equals the number N of reference points (for example, the number of speakers), and the dissimilarity of the positions of the objects in real space is derived from the generated N-dimensional feature vectors. Therefore, in the object arrangement estimation according to the embodiment of the present invention, the arrangement estimation accuracy for the M objects improves as the number N of reference points increases.
[0027]
In the object placement estimation apparatus according to the embodiment of the present invention, in order to estimate the arrangement of the M microphones corresponding to the M placement estimation objects, predetermined acoustic waves must be output from speakers at positions corresponding to the N reference points; however, this does not necessarily mean that N speaker units need to be prepared. In the object arrangement estimation apparatus according to the embodiment of the present invention, the predetermined acoustic waves may be output at the N positions using fewer than N speaker units (for example, one speaker unit).
[0028]
In addition, instead of microphones, the M objects may be, for example, M speakers (M may be an integer of 2 or more, as described above). In this case, N microphones are arranged, the position of each microphone is regarded as one of the above-mentioned reference points, a feature vector is generated for each speaker, the dissimilarity matrix of the positions in real space is derived from the M feature vectors of the M speakers, and the arrangement of the speakers in real space is estimated from the dissimilarity matrix.
[0029]
According to the following description of the object arrangement estimation apparatus according to the embodiment of the present invention, a person skilled in the art can also understand well a method of estimating the arrangement of M (M is an integer of 2 or more) objects in real space.
[0030]
Also, according to the following description, a person skilled in the art can understand well the configuration of a computer program for causing a computer to function as an object arrangement estimation apparatus for estimating the arrangement of M (M is an integer of 2 or more) objects in real space.
[0031]
2. First Embodiment
2-1. Configuration
FIG. 1 is a block diagram showing the configuration of the object placement estimation apparatus according to the first embodiment. The object arrangement estimation apparatus according to the first embodiment includes: a central processing unit (CPU) 11 capable of performing predetermined data processing by executing a program; a read-only memory (ROM) 12 storing the program; a random access memory (RAM) 13 for storing various data; a hard disk drive (HDD) 21 as an auxiliary storage device; a display 31 as an output device; a keyboard 32 and a mouse 33 as input devices; a time measuring unit 41 that measures time; and an audio interface unit 50, comprising an audio output unit 51 and an audio input unit 52, which serves as the input/output interface with external audio devices (speakers and microphones). To the audio interface unit 50 are connected a speaker array SPa consisting of N external speakers (SP 1, SP 2, ..., SP N) and a microphone array MCa consisting of M external microphones (MC 1, MC 2, ..., MC M).
[0032]
The CPU 11, the ROM 12, and the RAM 13 constitute a computer main unit 10.
[0033]
The display 31, the keyboard 32 and the mouse 33 constitute a user interface unit 30.
The user interface unit 30 may be configured by a display panel with a touch panel function or
the like.
[0034]
FIG. 2 is a block diagram showing the functional blocks implemented by the computer main unit 10 of the object placement estimation apparatus 100 according to the first embodiment. By reading out and executing the object arrangement estimation program stored in the ROM 12, the CPU 11 of the computer main unit 10 can operate as the control unit 1, the impulse generation unit (TSP generation unit) 2, the response detection unit 3, the feature vector generation unit 4, the dissimilarity matrix deriving unit 5, the placement derivation unit (MDS unit) 6, and the arrangement estimation result output unit 7. The placement derivation unit (MDS unit) 6 and the arrangement estimation result output unit 7 constitute an estimation unit 8.
[0035]
The object arrangement estimation program does not necessarily have to be stored in the ROM 12. The object arrangement estimation program may be stored in the HDD 21 (FIG. 1), read by the CPU 11 as appropriate, and executed. Further, the object arrangement estimation program may be downloaded as appropriate from an external storage device (not shown) via a network (not shown) and executed by the CPU 11. Alternatively, the object arrangement estimation program may be stored in a portable storage medium (not shown) such as a flexible disk, an optical disc, or a flash memory. In that case, the program stored in the portable storage medium may be read from that medium by the CPU 11 and executed, or it may first be installed on the HDD 21 or the like prior to execution.
[0036]
The control unit 1 is realized by the CPU 11 executing an object placement estimation program.
The control unit 1 monitors the progress of the operation concerning the object arrangement
estimation and controls the entire apparatus 100.
[0037]
The impulse generation unit (TSP generation unit) 2 is realized by the CPU 11 executing the object arrangement estimation program. The impulse generation unit (TSP generation unit) 2 generates and outputs a signal for causing one or more speakers of the speaker array SPa connected to the audio output unit 51 to selectively output a predetermined acoustic wave. The signal is, for example, a signal having a pulse-shaped waveform (a time-stretched pulse waveform (TSP waveform)), that is, a TSP signal.
[0038]
The response detection unit 3 is realized by the CPU 11 executing the object arrangement estimation program. For each of the M inputs from the M microphones of the microphone array MCa connected to the audio input unit 52, the response detection unit 3 detects the response waveform to the predetermined acoustic wave (for example, the impulse response waveform to the acoustic TSP wave output from the speaker according to the TSP signal), and identifies the time at which the response waveform is detected at each of the M microphones (the acoustic wave arrival time).
[0039]
The feature vector generation unit 4 is realized by the CPU 11 executing an object placement
estimation program. The feature vector generation unit 4 receives the acoustic wave arrival times identified by the response detection unit 3 and generates an N-dimensional feature vector for each of the M microphones (objects).
[0040]
The dissimilarity matrix deriving unit 5 is realized by the CPU 11 executing an object placement
estimation program. The dissimilarity matrix deriving unit 5 obtains the norm between the feature vectors of the two microphones for every combination of two of the M microphones. Then, the dissimilarity matrix deriving unit 5 derives an M-by-M dissimilarity matrix having the obtained norms as its elements.
[0041]
The placement derivation unit (MDS unit) 6 is realized by the CPU 11 executing the object arrangement estimation program. The placement derivation unit (MDS unit) 6 derives the placement of the M microphones in real space based on the dissimilarity matrix, for example by applying multidimensional scaling (MDS) to the dissimilarity matrix.
[0042]
The placement estimation result output unit 7 is realized by the CPU 11 executing the object arrangement estimation program. The placement estimation result output unit 7 applies linear conversion operations such as enlargement, reduction, and rotation to the placement obtained by the placement derivation unit 6, estimates the placement of the M microphones in real space, and outputs it as the placement estimation result. The placement derivation unit (MDS unit) 6 and the arrangement estimation result output unit 7 constitute the estimation unit 8 of the object arrangement estimation apparatus according to the present embodiment.
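One simple example of such a linear conversion is uniform scaling so that one inter-microphone distance matches a known real-world value (the helper below and its arguments are hypothetical, for illustration only; rotation and reflection corrections would be handled similarly):

```python
import math

def scale_placement(points, pair, known_distance):
    """Uniformly scale a derived placement so that the distance between
    the two points indexed by `pair` equals the measured real-world
    `known_distance` -- one example of the linear conversions
    (enlargement, reduction, rotation) applied to the derived placement."""
    j, k = pair
    s = known_distance / math.dist(points[j], points[k])
    return [tuple(s * c for c in p) for p in points]

# Derived placement in arbitrary units; mics 0 and 1 are known to be 4 m apart.
layout = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
print(scale_placement(layout, (0, 1), 4.0))  # -> [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
```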
[0043]
At least one of the control unit 1, the impulse generation unit (TSP generation unit) 2, the response detection unit 3, the feature vector generation unit 4, the dissimilarity matrix derivation unit 5, the placement derivation unit (MDS unit) 6, and the placement estimation result output unit 7 may be realized by a dedicated hardware circuit.
[0044]
2−2. Operation in Microphone Placement Estimation
Now, with reference to FIG. 3, FIG. 4, and FIG. 5, the microphone placement estimation performed by the object placement estimation apparatus according to the present embodiment will be described.
[0045]
FIG. 3 is a schematic view showing the relationship among M microphones as the objects of placement estimation, N speakers arranged at positions corresponding to reference points, and various quantities. In the drawing, only two microphones (MC 1, MC 2) and four speakers (SP 1, SP 2, SP 3, SP 4) are shown for simplicity. Here, p ij indicates the time (acoustic wave arrival time) at which the acoustic wave (TSP wave) emitted from the i-th speaker SP i reached the j-th microphone MC j. d MC12 indicates the norm between the N-dimensional feature vector p MC1 of the first microphone MC 1, whose components are the N acoustic wave arrival times p i1 (i: 1 to N) at the first microphone MC 1, and the N-dimensional feature vector p MC2 of the second microphone MC 2, whose components are the N acoustic wave arrival times p i2 (i: 1 to N). The norm here is, for example, the Euclidean norm.
[0046]
FIG. 4 is a flowchart of processing for microphone arrangement estimation performed by the
object arrangement estimation apparatus.
[0047]
The control unit 1 (CPU 11) of the object placement estimation apparatus sets a variable i to 1 as
an initial setting operation and stores the variable i in the RAM 13 (S1).
[0048]
Next, the impulse generation unit 2 (CPU 11) reads the value of the variable i and the TSP waveform stored in the RAM 13, and outputs an acoustic wave signal carrying the TSP waveform to the i-th speaker SP i connected via the audio output unit 51 (S2). Thereby, an acoustic TSP wave is output from the i-th speaker SP i.
[0049]
FIG. 5 is a chart showing how the acoustic TSP wave emitted from the i-th speaker SP i is collected by each microphone (MC 1, MC 2,..., MC j,..., MC M-1, MC M). The time chart on the side of the i-th speaker SP i shows the acoustic TSP wave output from the i-th speaker SP i, and the time chart next to each microphone (MC 1, MC 2,..., MC j,..., MC M-1, MC M) shows the signal output from that microphone.
[0050]
In step S2 described above, when the acoustic wave signal is input to the i-th speaker SP i, a predetermined acoustic TSP wave is output from that speaker into the air. The acoustic wave propagates in the air at the speed of sound and is collected by each of the microphones
(MC 1, MC 2,..., MC j,..., MC M-1, MC M). For example, in the output from the first microphone MC
1, a response waveform R i1 to the acoustic wave appears in the vicinity of the time point p i1 on
the time coordinate T i. Further, in the output from the j-th microphone MC j, a response
waveform R ij for the acoustic wave appears in the vicinity of the time p ij. The outputs from the
microphones (MC 1, MC 2,..., MC j,..., MC M−1, MC M) are stored in the RAM 13.
[0051]
Returning to FIG. 4, the response detection unit 3 (CPU 11) receives the output of each microphone (MC 1, MC 2,..., MC j,..., MC M−1, MC M) via the audio input unit 52, or reads it from the RAM 13, and specifies, for each microphone MC j (j: 1 to M), the time at which the peak of the response waveform appears in its output as the acoustic wave arrival time p ij on the time coordinate axis T i (S3). The acoustic wave arrival time may instead be determined based on other characteristics of the response waveform (such as the timing of its rise, or the timing at which it exceeds a predetermined sound pressure level). The identified acoustic wave arrival times are stored in the RAM 13.
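The peak-picking in step S3 can be sketched as follows. The function name and sampling rate are illustrative; as noted later in [0054], only a consistent time axis per speaker is required, not absolute time.

```python
import numpy as np

def arrival_time(mic_output, fs=48000):
    """Return the arrival time p_ij [s] on the recording's own time axis,
    taken as the sample index at which the response waveform peaks.
    Using the absolute value also catches a negative-going peak."""
    samples = np.asarray(mic_output, dtype=float)
    peak_index = int(np.argmax(np.abs(samples)))
    return peak_index / fs
```

A detector based on rise timing or a sound-pressure threshold, as mentioned above, could be substituted without changing the rest of the processing.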
[0052]
Next, the controller 1 (CPU 11) determines whether or not the value of the variable i is N or
more. If the value of i is less than N, the process returns to step S2 via step S5. On the other
hand, if the value of i is N or more, the process proceeds to step S6.
[0053]
In step S5, the value of the variable i is incremented by 1 (i → i + 1), and the new value of the variable i is stored in the RAM 13. Therefore, in the next execution of step S2, an acoustic TSP wave is emitted from the speaker SP i+1, the speaker numbered next after the one from which the acoustic wave was emitted in the previous step S2; it is collected by each microphone (MC 1, MC 2,..., MC j,..., MC M−1, MC M) and appears in each output as a response waveform. Then, in step S3, the response detection unit 3 identifies the acoustic wave arrival time p i+1,j of this acoustic TSP wave at each microphone MC j (j: 1 to M) on the time coordinate axis T i+1. Here, the time coordinate axis T i, which is the scale used to specify the arrival times of the acoustic wave from the i-th speaker SP i, and the time coordinate axis T i+1, which is the scale used to specify the arrival times of the acoustic wave from the (i+1)-th speaker SP i+1, may be the same or may be different from each other.
[0054]
In this manner, by repeating the processing of steps S2 to S5 N times, the object placement estimation apparatus specifies, on some time coordinate axis, the times p ij (i: 1 to N, j: 1 to M) at which the acoustic waves emitted from the speakers (SP 1, SP 2,..., SP N-1, SP N) reach each microphone (MC 1, MC 2,..., MC j,..., MC M−1, MC M). Here, it should be noted that, in the embodiment of the present invention, the response detection unit 3 only needs to specify, on an arbitrary time coordinate axis, the time at which the acoustic wave reached each microphone (MC 1, MC 2,..., MC j,..., MC M−1, MC M); it does not need to obtain the actual time span the acoustic wave took to travel from each speaker (SP 1, SP 2,..., SP N-1, SP N) to each microphone (MC 1, MC 2,..., MC j,..., MC M−1, MC M). Therefore, in the object placement estimation apparatus according to the embodiment of the present invention, it is not necessary to specify the time at which the acoustic wave is emitted from each speaker (SP 1, SP 2,..., SP N-1, SP N), and consequently errors in identifying those emission times cannot introduce errors into the placement estimation results.
[0055]
Next, the feature vector generation unit 4 (CPU 11) receives the acoustic wave arrival times (p ij (i: 1 to N, j: 1 to M)) specified by the response detection unit 3 and generates an N-dimensional feature vector p MCj for each of the M microphones MC j (j: 1 to M) (S6). The generated feature vectors p MCj are stored in the RAM 13.
[0056]
The N-dimensional feature vector p MCj represents the feature of the position of the j-th microphone MC j in the real space by N scales, each representing the closeness to one of the N speakers SP i (i: 1 to N). Specifically, the feature vector is

p MCj = (p 1j, p 2j, ..., p Nj)^T

That is, the scale indicating the proximity to the i-th speaker SP i (i: 1 to N) is here the time coordinate axis T i (FIG. 5) used by the response detection unit 3 in specifying the time at which the acoustic wave from the i-th speaker SP i arrived at each microphone MC j (j: 1 to M), and the measure of the j-th microphone MC j on each scale is the acoustic wave arrival time p ij on that time coordinate axis T i (FIG. 5).
[0057]
The N scales used to construct the N-dimensional feature vector need not be time coordinate axes. For example, the measure may be a distance in real space. Also, for example, the measure may be the peak level of the response waveform detected at each microphone. Also, for example, the measure may be an amount that characterizes the shape of the response waveform detected at each microphone. Also, for example, the measure may be an amount that characterizes the non-direct sound (reverberant component) detected at each microphone.
[0058]
Next, the dissimilarity matrix deriving unit 5 (CPU 11) generates the dissimilarity matrix D based on the N-dimensional feature vectors p MCj for the M microphones that were generated by the feature vector generation unit 4 and stored in the RAM 13 (S7). The generated dissimilarity matrix D is stored in the RAM 13.
[0059]
The dissimilarity matrix D is a matrix of M rows and M columns whose elements are, for every combination of two of the M microphones MC j (j: 1 to M) that are the objects of placement estimation (for example, microphone MC k and microphone MC l), the norm d MCkl between their feature vectors (p MCk and p MCl).
[0060]
That is, each element d MCkl is

d MCkl = || p MCk − p MCl ||

(for the Euclidean norm, d MCkl = (Σ i=1..N (p ik − p il)^2)^(1/2)). Therefore, the dissimilarity matrix is obtained by determining, based on the N-dimensional feature vectors p MCj (j: 1 to M), the dissimilarity between the positions of every two microphones in the real space; it is a matrix which shows the dissimilarity of the positions in the real space.
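The derivation of D from the N×M table of arrival times can be sketched as follows, assuming the Euclidean norm mentioned in [0045]; function and variable names are illustrative.

```python
import numpy as np

def dissimilarity_matrix(arrival_times):
    """arrival_times: (N, M) array with p[i, j] = time at which the wave from
    the i-th speaker reached the j-th microphone. The j-th column is the
    N-dimensional feature vector p_MCj; the returned (M, M) matrix D holds
    the Euclidean norm between every pair of feature vectors."""
    p = np.asarray(arrival_times, dtype=float)
    feats = p.T                                   # (M, N): one row per microphone
    diff = feats[:, None, :] - feats[None, :, :]  # pairwise component differences
    return np.sqrt((diff**2).sum(axis=2))         # d_MCkl = ||p_MCk - p_MCl||
```

By construction D is symmetric with a zero diagonal, as required of a dissimilarity matrix.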
[0061]
Next, the placement derivation unit (MDS unit) 6 (CPU 11) derives the placement of the M microphones by applying multidimensional scaling (MDS) to the dissimilarity matrix D. The derived placement is stored in the RAM 13.
[0062]
The placement derivation unit (MDS unit) 6 first obtains an M × M matrix D^(2) having d MCkl^2 as its elements.
[0063]
Next, using the M × M centering matrix H whose elements h kl are given, with the Kronecker delta δ kl, by

h kl = δ kl − 1/M,

the placement derivation unit (MDS unit) 6 obtains the M × M matrix B represented by

B = −(1/2) H D^(2) H.
[0064]
Finally, the placement derivation unit (MDS unit) 6 solves the eigenvalue problem

B x r = λ r x r

for B to obtain the placement of the M microphones along the axis of each dimension r, and from the vectors x r (r = 1, 2, 3) for the three largest eigenvalues derives a placement matrix X of M rows and 3 columns. Thereby, the placement of the M microphones MC j (j: 1 to M) in the real space (three-dimensional space) is obtained.
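The procedure in [0062]–[0064] is classical multidimensional scaling. A minimal numpy sketch under the standard definitions (centering matrix h kl = δ kl − 1/M, B = −(1/2) H D^(2) H, top eigenvectors scaled by the square roots of their eigenvalues); names are chosen for illustration:

```python
import numpy as np

def classical_mds(D, dims=3):
    """Classical MDS: recover an (M, dims) placement matrix X from an
    (M, M) dissimilarity matrix D, up to translation/rotation/reflection."""
    D = np.asarray(D, dtype=float)
    M = D.shape[0]
    D2 = D**2                               # matrix of squared norms d_kl^2
    H = np.eye(M) - np.ones((M, M)) / M     # centering matrix, h_kl = delta_kl - 1/M
    B = -0.5 * H @ D2 @ H                   # double-centered matrix
    eigval, eigvec = np.linalg.eigh(B)      # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1][:dims] # keep the top `dims` axes
    lam = np.clip(eigval[order], 0.0, None) # guard tiny negative eigenvalues
    return eigvec[:, order] * np.sqrt(lam)
```

When D contains exact Euclidean distances, the recovered configuration reproduces all pairwise distances; measurement noise in the arrival times degrades this only gracefully.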
[0065]
The placement matrix X derived by the placement derivation unit (MDS unit) 6 corresponds to the actual placement of the M microphones with some linear transformation (enlargement / reduction, rotation, inversion (mirroring), etc.) applied. Therefore, the placement estimation result output unit 7 reads out the placement matrix X derived by the placement derivation unit (MDS unit) 6 from the RAM 13 and applies an appropriate linear transformation to it to determine the actual placement of the M microphones. The determined placement is stored in the RAM 13.
[0066]
If the variance of the coordinates of the placement of the M microphones in the real space is known, the placement estimation result output unit 7 obtains the variance of the placement in the placement matrix X along each of its coordinate axes, and scales the values of the three coordinates of the placement matrix so that the variance along one of the three coordinate axes matches the known variance described above.
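A sketch of this variance-matching scaling; which axis is matched, and the uniform scaling of all three coordinates, are assumptions consistent with the description above.

```python
import numpy as np

def scale_to_known_variance(X, known_variance, axis=0):
    """Scale the placement matrix X uniformly so that the variance of its
    coordinates along the given axis matches a known real-space variance.
    All three coordinate axes are scaled by the same factor, so the shape
    of the placement is preserved."""
    X = np.asarray(X, dtype=float)
    factor = np.sqrt(known_variance / X[:, axis].var())
    return X * factor
```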
[0067]
Alternatively, regarding the placement of the M microphones in the real space, when, for example, the distance between the two microphones farthest apart along a certain coordinate axis is known, the placement estimation result output unit 7 scales the values of the three coordinate axes of the placement matrix so that the separation, along that coordinate axis, of the two microphones whose placement values are furthest apart matches the known distance described above.
[0068]
Thus, based on information known about the positions in the real space of the objects of placement estimation (for example, the real-space positions of arbitrary three of the M objects), the placement estimation result output unit 7 can perform a linear transformation on the placement matrix X to estimate and output the placement of the objects of placement estimation in the real space. There are cases where the placement indicated by the placement matrix X and the coordinates in the real space have a mirror-image relationship. In that case, the placement estimation result output unit 7 may make the placement of the placement matrix X coincide with the coordinates in the real space by inverting the sign of any one coordinate axis of the placement matrix X.
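The mirror-image correction can be sketched as follows, assuming the known real-space positions of a few reference points are available to decide whether the flip is needed; the decision rule (keep the flip only if it reduces the total squared deviation) is an illustrative assumption.

```python
import numpy as np

def fix_mirror(X, reference, axis=0):
    """Invert the sign of one coordinate axis of the placement X if that
    brings it closer to known real-space reference positions of the same
    points; otherwise return X unchanged."""
    X = np.asarray(X, dtype=float)
    reference = np.asarray(reference, dtype=float)
    flipped = X.copy()
    flipped[:, axis] = -flipped[:, axis]
    if ((flipped - reference)**2).sum() < ((X - reference)**2).sum():
        return flipped
    return X
```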
[0069]
2−3. Results of Microphone Placement Estimation Experiment
Hereinafter, the results of a plurality of microphone placement estimation experiments by the object placement estimation apparatus according to the present embodiment will be described.
[0070]
In this experiment, as shown in FIG. 6, an 80-channel microphone array MCa was arranged in a sound field reproduction environment in which 96-channel speaker arrays (SPa1, SPa2, SPa3 and SPa4) were arranged. The microphone array MCa has one omnidirectional microphone (DPA 4060-BM) disposed at each node of a frame structure, about 46 centimeters in diameter, having a C80 fullerene structure. The sound field reproduction environment configured by the 96-channel speaker system consists of 90 speakers in rectangular parallelepiped enclosures (Fostex FE103En) mounted on the walls of a room with a regular hexagonal cross section, and six mounted on the ceiling.
[0071]
In such an experimental environment, an experiment was conducted to estimate the arrangement
of 80 microphones using the object arrangement estimation apparatus according to the present
embodiment. In this experiment, the conditions for outputting and detecting the acoustic wave
were TSP length 8192 [pnt], TSP response length 32768 [pnt], sampling frequency 48000 [Hz],
and quantization bit rate 16 [bit].
[0072]
The results of the experiment are shown in FIGS. 7A, 7B and 7C, which show the result viewed from directly above, from directly in front, and from directly beside (a direction rotated 90 degrees horizontally from directly in front), respectively. In each figure, the actual microphone position is indicated by a cross, and the placement estimation result is indicated by a circle.
[0073]
Further, for each microphone, the deviation between the actual position and the position of the
estimation result was determined, and the average value was determined as an error evaluation
value [mm]. In this experiment, the error evaluation value was 4.8746 [mm]. From the
experimental results, it was found that the object placement estimation apparatus according to
the present embodiment can output estimation results with sufficient accuracy to determine the
placement of microphones and the correctness of cable connection.
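The error evaluation value described above can be sketched as follows, assuming both position sets are expressed in millimetres in the same aligned coordinate frame.

```python
import numpy as np

def error_evaluation_value(actual, estimated):
    """Mean over all microphones of the Euclidean deviation between the
    actual position and the estimated position (the error evaluation
    value [mm] used in the experiments)."""
    actual = np.asarray(actual, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    return float(np.linalg.norm(actual - estimated, axis=1).mean())
```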
[0074]
2−4. Relationship Between the Number of Loudspeakers and Estimation Error in Microphone Placement Estimation
Hereinafter, the relationship between the number of loudspeakers (sources of the impulse response waveform (TSP waveform)) used in the method of object placement estimation according to the present embodiment and the accuracy of the estimation results will be described.
[0075]
In order to investigate the relationship between the number of speakers and the accuracy of the object placement estimation, the experiment was performed multiple times while changing the number of speakers. FIG. 8 is a graph in which the results are plotted with the number of loudspeakers on the horizontal axis and the above-mentioned error evaluation value for each estimation result on the vertical axis. From FIG. 8, it can be seen that in the present embodiment the accuracy of the object placement estimation improves monotonically as the number of speakers (the number of reference points described above) used for the estimation increases. In particular, the accuracy improves significantly until the number of speakers exceeds 10. From this, it can be seen that in the object placement estimation according to the present embodiment, placement estimation results of good accuracy can be obtained by setting the number of speakers (the number of reference points described above) to about 10 or more.
[0076]
2−5. Operation in Speaker Placement Estimation
As described above, the object placement estimation apparatus according to the present embodiment can also estimate the placement of M speakers using N microphones arranged at positions corresponding to the N reference points. Hereinafter, with reference to FIG. 9, FIG. 10A, FIG. 10B, and FIG. 10C, the principle of the speaker placement estimation and its experimental results will be described.
[0077]
FIG. 9 is a schematic view showing the relationship among M speakers as the objects of placement estimation, N microphones arranged at positions corresponding to the reference points, and various quantities. In the drawing, only two microphones (MC 1, MC 2) and four speakers (SP 1, SP 2, SP 3, SP 4) are shown for simplicity. Here, p ij indicates the time (acoustic wave arrival time) at which the acoustic wave (TSP wave) emitted from the i-th speaker SP i reached the j-th microphone MC j. d SP12 indicates the norm between the N-dimensional feature vector p SP1 of the first speaker SP 1, whose components are the times (acoustic wave arrival times) p 1j (j: 1 to N) at which the acoustic wave emitted from the first speaker SP 1 reached each of the N microphones MC j (j: 1 to N), and the N-dimensional feature vector p SP2 of the second speaker SP 2, whose components are the times (acoustic wave arrival times) p 2j (j: 1 to N) at which the acoustic wave emitted from the second speaker SP 2 reached each of the N microphones MC j (j: 1 to N). Similarly, d SP23 and d SP34 indicate, respectively, the norm between the N-dimensional feature vector p SP2 of the second speaker SP 2 and the N-dimensional feature vector p SP3 of the third speaker SP 3, and the norm between the N-dimensional feature vector p SP3 of the third speaker SP 3 and the N-dimensional feature vector p SP4 of the fourth speaker SP 4.
[0078]
In the speaker placement estimation, the feature vector generation unit 4 regards the positions at which the N microphones MC j (j: 1 to N) are arranged as the above-mentioned reference points, generates a feature vector p SPi (i: 1 to M) for each of the M speakers SP i (i: 1 to M) that are the objects of placement estimation, derives from the M feature vectors a dissimilarity matrix of the real-space positions of the M speakers, and estimates the placement of the speakers in the real space from the dissimilarity matrix.
[0079]
Therefore, in this case, the N-dimensional feature vector p SPi represents the feature of the position of the i-th speaker SP i in the real space by N scales representing the closeness to each of the N microphones MC j (j: 1 to N). Specifically, the feature vector is

p SPi = (p i1, p i2, ..., p iN)^T
[0080]
Next, the dissimilarity matrix deriving unit 5 obtains, for every combination of two of the M loudspeakers, the norm between the feature vectors of those two loudspeakers. Then, the dissimilarity matrix deriving unit 5 derives an M-by-M dissimilarity matrix having the obtained norms as its elements.
[0081]
Specifically, the dissimilarity matrix deriving unit 5 (CPU 11) generates the dissimilarity matrix D
based on the N-dimensional feature vector p SPi.
[0082]
Therefore, each element d SPkl of the dissimilarity matrix D is

d SPkl = || p SPk − p SPl ||

Therefore, the dissimilarity matrix is obtained by determining, based on the N-dimensional feature vectors p SPi (i: 1 to M), the dissimilarity between the positions of every two speakers in the real space; it is a matrix which shows the dissimilarity of the positions in the real space.
[0083]
Then, the placement derivation unit (MDS unit) 6 (CPU 11) derives the placement of the M speakers by applying multidimensional scaling (MDS) to the dissimilarity matrix D.
[0084]
Furthermore, the placement estimation result output unit 7 applies an appropriate linear transformation to the placement matrix X derived by the placement derivation unit (MDS unit) 6 to determine the actual placement of the M speakers.
[0085]
2−6. Results of Speaker Placement Estimation Experiment
Hereinafter, the results of a placement estimation experiment on a plurality of speakers by the object placement estimation apparatus according to the present embodiment will be described. The experimental environment is the same as in the previous microphone placement estimation experiment, so its description is omitted.
[0086]
The results of the experiment are shown in FIGS. 10A, 10B and 10C, which show the result viewed from directly above, from directly in front, and from directly beside (a direction rotated 90 degrees horizontally from directly in front), respectively. In each figure, the actual position of the speaker is indicated by a cross, and the placement estimation result is indicated by a circle.
[0087]
Moreover, for each speaker, the deviation between the actual position and the position of the estimation result was determined, and the average value was determined as an error evaluation value [mm]. In this experiment, the error evaluation value was 23.5486 [mm]. This error is larger than the error evaluation value of 4.8746 [mm] in the microphone placement estimation experiment performed in the same experimental environment. However, in consideration of the size of the speaker unit (for example, the size of the diaphragm), the arrangement interval of the speaker units, the size of the speaker array, and so on, it can be said that the result has sufficient accuracy to determine whether the cable connections are correct or not.
[0088]
3. Second Embodiment
3−1. Configuration
The second embodiment of the present invention is an object placement estimation apparatus that has improved portability compared to the first embodiment and that can check the placement of a microphone array and its cable connections easily and accurately at various sound collecting sites.
[0089]
FIGS. 11 and 12 are block diagrams showing the configuration of the object placement estimation apparatus according to the second embodiment. The object placement estimation apparatus according to the second embodiment has the same configuration as the object placement estimation apparatus according to the first embodiment, but differs in that one external speaker SP (SP 1) is connected to the audio interface unit 250 including the audio output unit 251. Here, the speaker SP is, for example, a small, highly portable speaker (for example, an Audio-Technica AT-SPG50).
[0090]
3−2. Operation of Microphone Placement Estimation
In this embodiment, a predetermined acoustic wave is output using the single speaker SP 1; after each output, the speaker SP 1 is moved, so that the predetermined acoustic wave is output from a plurality of positions. For each acoustic wave, the response waveform in each of the M microphones MC j (j: 1 to M) is detected, and the acoustic wave arrival time is measured. In this manner, in the present embodiment, by outputting acoustic waves from the speaker SP 1 at N positions, an N-dimensional feature vector is generated for each of the M microphones MC j (j: 1 to M), as in the first embodiment, using measures that represent the closeness to N reference points. However, the number of speakers in the present embodiment is not limited to one, and may be plural.
[0091]
The object placement estimation apparatus according to the present embodiment measures, at each microphone MC j (j: 1 to M), the arrival time of the predetermined acoustic wave output from the single speaker SP 1, repeating the measurement for outputs from N positions. The N positions here correspond to the reference points described above. Then, the feature vector generation unit 4 generates the feature vectors p MCj (j: 1 to M) for the microphones MC j (j: 1 to M) as in the first embodiment.
[0092]
As in the first embodiment, the dissimilarity matrix deriving unit 5 derives the dissimilarity matrix D from the generated feature vectors p MCj (j: 1 to M), and the estimation unit 8 (the placement derivation unit 6 and the placement estimation result output unit 7) estimates the placement of the M microphones in the real space from the dissimilarity matrix D and outputs the result.
[0093]
As described above, the object placement estimation apparatus according to the second embodiment is superior in portability to the object placement estimation apparatus according to the first embodiment in that it does not use the large-scale speaker array SPa, and it is advantageous in that microphone placement estimation can be performed at various sound collection sites.
[0094]
3−3. Results of Microphone Placement Estimation Experiment
Hereinafter, the results of a plurality of microphone placement estimation experiments by the object placement estimation apparatus according to the present embodiment will be described.
[0095]
In this experiment, as shown in FIG. 13, the 80-channel microphone array MCa was arranged near the lower part of St. Mary's Cathedral, Tokyo, and the acoustic wave was output while the speaker SP 1 (Audio-Technica AT-SPG50, not shown) was moved by hand to various positions. In this experiment, the conditions for outputting and detecting the acoustic wave were TSP length 8192 [pnt], TSP response length 105600 [pnt], sampling frequency 48000 [Hz], and quantization bit rate 16 [bit].
[0096]
The results of the experiment are shown in FIGS. 14A, 14B and 14C, which show the result viewed from directly above, from directly in front, and from directly beside (a direction rotated 90 degrees horizontally from directly in front), respectively. In each figure, the actual microphone position is indicated by a cross, and the placement estimation result is indicated by a circle.
[0097]
Further, for each microphone, the deviation between the actual position and the position of the
estimation result was determined, and the average value was determined as an error evaluation
value [mm]. In this experiment, the error evaluation value was 13.5148 [mm] as an average value
of a plurality of experiments. From this experimental result, it was found that the object
placement estimation apparatus according to the present embodiment can also output the
estimation result with sufficient accuracy for determining the placement of the microphone and
the correctness of the cable connection.
[0098]
3−4. Relationship Between the Number of Acoustic Wave Outputs and Estimation Error in Microphone Placement Estimation
Hereinafter, the relationship between the number of times the acoustic wave (impulse response waveform (TSP waveform)) is output in the method of object placement estimation according to the present embodiment (i.e., the number of reference points described above) and the accuracy of the object placement estimation results will be described.
[0099]
In order to investigate the relationship between the number of times the acoustic wave was output from the speaker and the accuracy of the object placement estimation, a plurality of experiments were conducted while changing the number of outputs. The position from which the acoustic wave is output changes with every output; that is, the number of outputs from the speaker corresponds to the number of reference points described above. FIG. 15 is a graph in which the results are plotted with the number of acoustic wave outputs on the horizontal axis and the above-mentioned error evaluation value for each estimation result on the vertical axis. From FIG. 15, it can be seen that, also in the present embodiment, the accuracy of the object placement estimation improves monotonically as the number of acoustic wave outputs used for the estimation (the number of reference points described above) increases. In particular, the accuracy improves significantly until the number of outputs exceeds 10. From this it was found that, even at a sound collection site where content is actually created, placement estimation results of good accuracy can be obtained with the object placement estimation according to this embodiment by setting the number of acoustic wave outputs (i.e., the number of reference points described above) to about 10 or more.
[0100]
4. Modifications of the Object Placement Estimation Apparatus
Hereinafter, modifications of the object placement estimation apparatus according to the first and second embodiments will be described. The first modification relates to another example of the feature vector generation method. The second modification relates to another example of the method of estimating the placement of an object from the dissimilarity matrix. Modification 1 and Modification 2 can be applied, individually or simultaneously, to the object placement estimation apparatuses of both Embodiments 1 and 2.
[0101]
4−1. Modification 1 (Another Example of the Feature Vector Generation Method)
Here, another example of the method of generating a feature vector will be described. In the embodiments described above, the feature vector is generated based on the time at which the acoustic wave emitted from the speaker located at a reference point reached the microphone (the acoustic wave arrival time). In the present method, by contrast, the feature vector is determined based on the frequency amplitude characteristic of the output signal of the microphone.
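As a sketch of one plausible frequency amplitude characteristic computation (a single-frame magnitude FFT; the FFT length and the absence of windowing or frame averaging are assumptions, since the document does not specify how unit 303 computes it):

```python
import numpy as np

def frequency_amplitude_characteristic(signal, n_fft=1024):
    """Magnitude spectrum of one frame of a microphone output signal,
    as one possible realization of what the frequency amplitude
    characteristic calculation unit 303 computes."""
    frame = np.asarray(signal[:n_fft], dtype=float)
    return np.abs(np.fft.rfft(frame, n=n_fft))
```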
[0102]
FIG. 16 is a block diagram showing the configuration of a modified example of the object placement estimation apparatus. Components that are the same as those shown in FIG. 1 and elsewhere are given the same reference numbers, and their description is omitted.
[0103]
The modified example of the object placement estimation apparatus has a configuration in which the timekeeping unit 41 and the audio output unit 51 are omitted from the object placement estimation apparatus shown in FIG. A speaker array SPa consisting of external speakers (SP1, SP2,..., SPN) need not be connected to this apparatus.
[0104]
FIG. 17 is a block diagram clearly showing the functional blocks implemented by the computer main unit 10 of the object placement estimation apparatus 300. By reading out and executing the object placement estimation program stored in the ROM 12, the CPU 11 of the computer main unit 10 can operate as the control unit 1, the frequency amplitude characteristic calculation unit 303, the feature vector generation unit 304, the dissimilarity matrix derivation unit 5, the placement derivation unit (MDS unit) 6, and the placement estimation result output unit 7. The placement derivation unit (MDS unit) 6 and the placement estimation result output unit 7 constitute the estimation unit 8. The operations of the control unit 1, the dissimilarity matrix derivation unit 5, the placement derivation unit (MDS unit) 6, and the estimation unit 8 may be the same as those described in the first and second embodiments, so their explanation is omitted.
[0105]
The frequency amplitude characteristic calculation unit 303 is realized by the CPU 11 executing
an object placement estimation program. The frequency amplitude characteristic calculation unit
303 calculates frequency amplitude characteristics of output signals of the microphones (MC 1
to MC M) included in the microphone array MCa.
[0106]
The feature vector generation unit 304 is realized by the CPU 11 executing the object placement
estimation program. The feature vector generation unit 304 receives the frequency amplitude
characteristics calculated by the frequency amplitude characteristic calculation unit 303 and
can generate an N-dimensional feature vector for each of the M microphones (objects). In the
following, the case in which the feature vector generation unit 304 determines, based on the
frequency amplitude characteristics, the difference between corresponding components of the
feature vectors of any two of the M microphones (objects) (the difference p i,j - p i,k between
feature vector components in equation (1), where k ≠ j and i is an arbitrary integer from 1 to
N) will be described in detail; from this description, those skilled in the art will also
understand how the feature vector generation unit 304 can determine the feature vector of each
microphone itself.
[0107]
Note that at least one of the control unit 1, the frequency amplitude characteristic calculation
unit 303, the feature vector generation unit 304, the dissimilarity matrix derivation unit 5, the
placement derivation unit (MDS unit) 6, and the arrangement estimation result output unit 7 may
be realized by a dedicated hardware circuit.
[0108]
FIG. 18 is a schematic diagram showing three people hmn1 to hmn3 having a meeting indoors.
M microphones (MC 1 to MC M) are disposed in the room. The M microphones (MC 1 to MC M)
are connected, via the audio interface unit 350 (see FIG. 17), to the object arrangement
estimation apparatus 300, neither of which is shown in FIG. 18.
[0109]
FIG. 19 is a flowchart of processing for microphone arrangement estimation performed by the
object arrangement estimation apparatus 300.
[0110]
The frequency amplitude characteristic calculation unit 303 of the object placement estimation
apparatus 300 receives the output signals of the M microphones (MC 1 to MC M) through the audio
interface unit 350.
These output signals are the responses of the microphones to the ambient environmental sound in
the room. The frequency amplitude characteristic calculation unit 303 extracts, from each output
signal, a section in which the ambient sound includes a human voice (for example, the voice
"Hi!" of the speaker hmn1 in FIG. 18), converts each of the extracted output signals (time
domain) of the M microphones (MC 1 to MC M) to the frequency domain, and calculates the
frequency amplitude characteristic from each output signal (frequency domain) (step S101).
Information on the frequency amplitude characteristic of the output signal of each of the
microphones (MC 1 to MC M) is sent from the frequency amplitude characteristic calculation
unit 303 to the feature vector generation unit 304.
[0111]
Based on the information on the frequency amplitude characteristics sent from the frequency
amplitude characteristic calculation unit 303, the feature vector generation unit 304 calculates
the difference between the frequency amplitude characteristics of the output signals for every
combination of two microphones (MC j and MC k) (step S102).
[0112]
Based on the integral value obtained by integrating the difference between the two frequency
amplitude characteristics thus obtained along the frequency axis, the feature vector generation
unit 304 determines the dissimilarity of the positions of the two microphones relative to the
speaker (reference point), that is, the difference between the measures of the two microphones
on a scale representing the closeness to the reference point (p i,j - p i,k in equation (1),
where k ≠ j and i is an arbitrary integer from 1 to N).
[0113]
FIG. 20 is a schematic view showing the frequency amplitude characteristics of the output
signals of the microphones (MC 1 to MC M). FIG. 20(a) shows the frequency amplitude
characteristic of the output signal from the microphone MC 1 with respect to the ambient
environmental sound, including the voice uttered by the person hmn1, in the room shown in
FIG. 18. Similarly, FIGS. 20(b) and 20(c) show the frequency amplitude characteristics of the
output signals from the microphones MC j and MC M for the same ambient sound including the same
voice. In each frequency amplitude characteristic, the formants of the voice emitted by the
person hmn1 appear superimposed on a noise component BG that is ubiquitously present in the
sound collection environment (such as reverberation in a room or the noise of a crowd outdoors).
Here, the center frequency of the first formant F1 is f1, and the center frequencies of the
second and subsequent formants are shown as f2, f3, and f4, respectively.
[0114]
As can be seen from FIGS. 20(a) and 20(b), the noise component BG shows almost the same profile
in each output signal, while the formant components of the human voice deviate from the shape
they have in the original frequency amplitude characteristic as the microphone moves away from
the person. The feature vector generation unit 304 can therefore obtain the difference in
proximity to the speaker (reference point) between two microphones from the difference in the
shapes of the frequency amplitude characteristics of their output signals.
[0115]
The feature vector generation unit 304 integrates the difference between the frequency
amplitude characteristics of the output signals of the two microphones (MC j and MC k, k ≠ j)
along the frequency axis (step S103). The integral value obtained here is the difference between
the proximity of the microphone MC j and that of the microphone MC k to the reference point
(speaker), that is, the difference between the components, relating to that reference point, of
the feature vectors of the two microphones (p i,j - p i,k in equation (1), where k ≠ j and i is
an arbitrary integer from 1 to N).
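The integration of step S103 can be sketched as follows. This is a minimal illustration, not the
patented implementation: the function names, the FFT length, and the use of NumPy's `rfft` for
obtaining the frequency amplitude characteristic are all assumptions.

```python
import numpy as np

def amplitude_characteristic(signal, n_fft=1024):
    """Frequency amplitude characteristic: magnitude of the signal's spectrum."""
    return np.abs(np.fft.rfft(signal, n=n_fft))

def proximity_difference(sig_j, sig_k, sample_rate, n_fft=1024):
    """Integrate the difference between the frequency amplitude characteristics
    of two microphone output signals along the frequency axis (step S103).
    The signed integral quantifies how differently close microphones j and k
    are to the speaker (reference point)."""
    a_j = amplitude_characteristic(sig_j, n_fft)
    a_k = amplitude_characteristic(sig_k, n_fft)
    df = sample_rate / n_fft              # width of one frequency bin in Hz
    return float(np.sum(a_j - a_k) * df)  # rectangle-rule integral over frequency
```

A microphone nearer the speaker receives a stronger voice component, so its amplitude
characteristic dominates and the integral comes out positive for that pair ordering.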
[0116]
As a matter of course, the feature vector generation unit 304 can also obtain the components of
each feature vector themselves from the differences, thus obtained, between the components of
the feature vectors of two microphones relating to the speaker (reference point).
[0117]
As described above, in step S103, the feature vector generation unit 304 determines, for every
pair of the microphones (MC 1 to MC M), the dissimilarity of the positions of the two
microphones with respect to each reference point (the difference between the corresponding
components of their feature vectors).
[0118]
Then, in step S104, the dissimilarity matrix derivation unit 5 derives the dissimilarity
matrix D (equation (3)) based on the differences between the corresponding components of every
pair of feature vectors obtained by the feature vector generation unit 304.
[0119]
The feature vector generation unit 304 may instead obtain the feature vector of each microphone
from the integral values obtained in step S103 and output the feature vectors to the
dissimilarity matrix derivation unit 5.
In that case, the dissimilarity matrix derivation unit 5 may derive the dissimilarity matrix in
step S104 in the same manner as in step S7 of the previous embodiment.
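When the feature vectors themselves are available, the derivation of the dissimilarity matrix D
in step S104 can be sketched as below. The element-wise use of the Euclidean norm follows the
definition of D as the norm of the difference between two feature vectors; the function name and
the (M, N) array layout are illustrative assumptions.

```python
import numpy as np

def dissimilarity_matrix(features):
    """features: (M, N) array; row m is the N-dimensional feature vector of
    microphone m.  Returns the M x M dissimilarity matrix D whose (j, k)
    element is the norm of the difference between the feature vectors of
    microphones j and k."""
    diff = features[:, None, :] - features[None, :, :]  # (M, M, N) pairwise differences
    return np.linalg.norm(diff, axis=2)
```

D is symmetric with a zero diagonal, as required for the MDS-based estimation that follows.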
[0120]
The processes in steps S105 and S106 are the same as those described in the previous
embodiment (steps S8 and S9 in FIG. 4), and thus the description thereof is omitted here.
[0121]
As in the previous embodiment, three or more reference points are required.
Therefore, the object arrangement estimation apparatus collects, with the M microphones, voices
uttered by a speaker at N (N is 3 or more) different positions, and derives the dissimilarity
matrix D using the output signals produced by the microphones collecting those voices (step
S104). The people who speak at the N (N is 3 or more) positions need not be the same person.
[0122]
The feature vector generation unit 304 may also generate the feature vectors from the
information on the frequency amplitude characteristics sent from the frequency amplitude
characteristic calculation unit 303 as follows.
First, the feature vector generation unit 304 identifies the formants of the speaker in the
output signal of each of the microphones (MC 1 to MC M) and obtains the amplitude of a specified
formant (for example, the first formant F1). Then, from the ratio (expressed in dB, for example)
of the peak amplitude of a specific formant (for example, the first formant F1 having the center
frequency f1) appearing in the frequency amplitude characteristic of the output signal of each
of the microphones (MC 1 to MC M) to the peak amplitude of that formant appearing in the
frequency amplitude characteristic of the output signal of any one microphone (for example, the
amplitude A1f1 of MC 1 shown in FIG. 20(a)), the feature vector generation unit 304 may
determine a measure for each microphone (MC 1 to MC M) on a scale representing the closeness to
the reference point (person hmn1).
[0123]
For example, if the ratio of the peak amplitude AMf1 of the first formant F1 in the frequency
amplitude characteristic of the output signal of the microphone MC M to the peak amplitude A1f1
of the first formant F1 in the frequency amplitude characteristic of the output signal of the
microphone MC 1, taken as the arbitrary one microphone, is -6 dB, then the measure for the
microphone MC 1 on the scale representing the closeness to the person hmn1 as a reference point
may be, for example, 1, and the measure for the microphone MC M on that scale may be 2.
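The dB-ratio-to-measure mapping in the example above can be sketched as follows. The linear rule
of one measure unit per 6 dB of attenuation is an assumption chosen only so that the code
reproduces the stated example (0 dB gives measure 1, -6 dB gives measure 2); the text does not
prescribe a particular mapping.

```python
import numpy as np

def formant_measure(peak_amp, ref_peak_amp):
    """Measure of closeness to the reference point (the person hmn1) from the
    ratio, in dB, between a microphone's first-formant peak amplitude and
    that of an arbitrarily chosen reference microphone (MC 1 in the text).
    The mapping of one measure unit per 6 dB of attenuation is an
    illustrative assumption consistent with the -6 dB example."""
    ratio_db = 20.0 * np.log10(peak_amp / ref_peak_amp)
    return 1.0 - ratio_db / 6.0   # 0 dB -> 1, -6 dB -> 2, -12 dB -> 3
```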
[0124]
As described above, the feature vector generation unit 304 can also determine the feature vector
of each of the microphones (MC 1 to MC M) based on the specific frequency component of the
frequency amplitude characteristic.
[0125]
As described above, in the present modification, the object placement estimation apparatus 300
does not have to output a specific acoustic wave.
In addition, the present modification is particularly suitable for estimating the arrangement of
objects in a room whose acoustic characteristics provide rich reverberation, or in a crowd.
[0126]
In this modification, too, as in the previous embodiment, the positional relationship of a
plurality of persons can be estimated in the same way as the arrangement of the loudspeakers is
estimated. That is, also in this modification, a person who has uttered a voice can be treated
as an arrangement estimation target and his or her arrangement estimated.
[0127]
4−2. Modification 2 (Another Example of the Object Placement Estimation Method) Here,
another example of the object placement estimation method based on the dissimilarity matrix
will be described. In the embodiment described above, the estimation unit 8 (FIG. 2 and the
like) including the placement derivation unit 6 estimates the placement of the objects by
applying the MDS method to the dissimilarity matrix. However, the placement of the objects can
also be estimated by methods other than the MDS method.
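One concrete way to apply the MDS method to the dissimilarity matrix is classical (Torgerson)
MDS, sketched below. The text does not specify which MDS variant the placement derivation unit 6
uses, so this is one common choice, not the patented implementation.

```python
import numpy as np

def classical_mds(D, dim=3):
    """Estimate object placement from an M x M dissimilarity matrix D by
    classical multidimensional scaling.  The returned coordinates are
    determined only up to translation, rotation, and reflection."""
    M = D.shape[0]
    J = np.eye(M) - np.ones((M, M)) / M   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # double-centered squared dissimilarities
    w, v = np.linalg.eigh(B)
    order = np.argsort(w)[::-1][:dim]     # keep the 'dim' largest eigenvalues
    return v[:, order] * np.sqrt(np.clip(w[order], 0.0, None))
```

When D happens to be an exact Euclidean distance matrix, the pairwise distances of the returned
coordinates reproduce D; for a real dissimilarity matrix the result is a least-squares-style
approximation of the placement.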
[0128]
The placement derivation unit 6 (FIG. 2 and the like) may instead obtain the placement (an
approximate solution) by numerically solving a so-called combinatorial optimization problem with
a full search method. That is, the placement derivation unit 6 (FIG. 2 and the like) may
evaluate, based on the dissimilarity matrix, the fitness as a placement approximate solution of
every possible placement (placement approximate solution candidate) of the plurality of objects
(for example, M microphones), and output the placement approximate solution candidate with the
highest rating as the placement estimation result.
[0129]
Alternatively, the placement derivation unit 6 (FIG. 2 and the like) may obtain the placement
(an approximate solution) by numerically solving the combinatorial optimization problem with a
local search method using, for example, a so-called genetic algorithm. That is, the placement
derivation unit 6 (FIG. 2 and the like) may evaluate, based on the dissimilarity matrix, the
fitness as a placement approximate solution of some of the possible placements (placement
approximate solution candidates) of the plurality of objects (for example, M microphones), and
output the highest-rated of the evaluated placement approximate solution candidates as the
placement estimation result.
[0130]
As in the above-described embodiment, in the present modification, too, information on the
positions of the arrangement estimation targets and the positions of the reference points is not
essential for estimating the arrangement of the objects. However, when the arrangement of the
objects is estimated by a full search or local search method as in the present modification,
setting in advance conditions on the positions where the arrangement estimation targets or the
reference points may exist, and reducing the number of possible object placements (placement
approximate solution candidates) in accordance with those conditions, allows the placement
derivation unit 6 to derive the placement approximate solution based on the dissimilarity matrix
more quickly.
[0131]
In the following, an effective method will be described for the case where the approximate
solution is obtained numerically, based on the dissimilarity matrix, using a full search method
or a local search method.
[0132]
Setting a minimum distance between two adjacent arrangement estimation targets makes it possible
to discretize the positions where the objects may exist.
By setting the minimum interval d min as a condition on the positions where the arrangement
estimation targets may exist, the number of possible placement approximate solution candidates
is reduced, and the derivation of the placement approximate solution can be accelerated. In
addition, using information on any one reference point, the distance to the object closest to
it, and the distance to the object farthest from it to limit the spatial range in which the
objects may exist can dramatically reduce the number of candidate solutions.
[0133]
In Embodiments 1 and 2 described above, the time at which the acoustic wave emitted from the
speaker located at the reference point reached each microphone (acoustic wave arrival time) was
specified and a feature vector was generated. In this modification, the time at which the
acoustic wave was emitted from the speaker located at the reference point is additionally
specified, so that the time taken for the acoustic wave to reach each microphone (acoustic wave
propagation time) may be obtained.
[0134]
Of the acoustic wave propagation times from the speaker at a certain reference point to the
objects (microphones), the microphone recording the shortest acoustic wave propagation time is
the microphone closest to that reference point, and the microphone recording the longest
acoustic wave propagation time is the microphone farthest from it. Here, letting the product of
the shortest acoustic wave propagation time and the speed of sound be the minimum distance
R min, and the product of the longest acoustic wave propagation time and the speed of sound be
the maximum distance R max, the positions where all the placement targets (microphones) may
exist are limited to the range of distances from the reference point of at least R min and at
most R max.
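The constraints d min, R min, and R max can be turned into a discrete set of candidate positions,
for instance with a cubic grid of spacing d min clipped to the spherical shell around the
reference point. The grid layout and the assumed speed of sound below are illustrative choices,
not part of the described method.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; an assumed value for room temperature

def candidate_points(t_shortest, t_longest, d_min, reference=None):
    """Object position candidate points: a cubic grid of spacing d_min
    (the minimum inter-object distance), kept only inside the shell
    R_min <= r <= R_max around the reference point, where
    R_min = c * t_shortest and R_max = c * t_longest come from the
    shortest and longest acoustic wave propagation times."""
    if reference is None:
        reference = np.zeros(3)
    r_min = SPEED_OF_SOUND * t_shortest
    r_max = SPEED_OF_SOUND * t_longest
    axis = np.arange(-r_max, r_max + d_min / 2, d_min)
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    pts = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3) + reference
    r = np.linalg.norm(pts - reference, axis=1)
    return pts[(r >= r_min) & (r <= r_max)]  # keep only the shell
```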
[0135]
FIG. 21 is a diagram showing the object position candidate points CD (x marks in the figure) in
the case where the minimum inter-object distance d min, the minimum distance R min from a
reference point, and the maximum distance R max are given as the conditions on the positions
where the arrangement estimation targets may exist. The object position candidate points CD are
distributed at the minimum interval d min outside a sphere sph1 of radius R min centered on a
certain reference point (the speaker in the figure) and inside a sphere sph2 of radius R max
centered on the same reference point. In this case, the placement derivation unit 6 (FIG. 2 and
the like) may evaluate, based on the dissimilarity matrix, the fitness as a placement
approximate solution of each placement approximate solution candidate formed by selecting as
many candidate points as there are objects (M) from among these object position candidate
points CD, and take a placement approximate solution candidate for which a good evaluation is
obtained as the placement approximate solution. When the full search method is used, the fitness
may be evaluated for every possible placement approximate solution candidate. When a local
search method is used, it is sufficient to select the placement approximate solution candidates
to be evaluated according to a known algorithm (a genetic algorithm or the like).
[0136]
The fitness may be evaluated as follows. First, for the placement approximate solution candidate
to be evaluated, the distances between the objects are obtained by calculation, and from the
calculation results a distance matrix whose elements are the inter-object distances is derived.
Next, the fitness can be evaluated by evaluating the similarity between the distance matrix thus
calculated and the dissimilarity matrix. In other words, by rating more highly a distance matrix
whose relation to the dissimilarity matrix is closer to a proportional one, the fitness of the
placement approximate solution candidate can be evaluated.
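The fitness evaluation just described can be sketched as follows. Scoring the closeness to a
proportional relation by the correlation of the upper-triangular elements is one possible
concrete choice assumed here; the text only requires that near-proportional distance matrices
be rated more highly.

```python
import numpy as np

def fitness(candidate_positions, dissimilarity):
    """Fitness of a placement approximate-solution candidate: compute the
    candidate's inter-object distance matrix and score how close it is to
    being proportional to the dissimilarity matrix, here via the
    correlation of the upper-triangular elements (1.0 means perfectly
    proportional)."""
    dist = np.linalg.norm(
        candidate_positions[:, None] - candidate_positions[None, :], axis=2)
    iu = np.triu_indices(dist.shape[0], k=1)       # one copy of each pair
    return float(np.corrcoef(dist[iu], dissimilarity[iu])[0, 1])
```

Because only proportionality is scored, uniformly scaled copies of the true placement receive
the same (maximal) fitness, which matches the fact that the dissimilarities fix the geometry
only up to scale.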
[0137]
A condition on the arrangement form of the objects can be added as a condition on the positions
where the arrangement estimation targets may exist. FIG. 22 is a diagram showing the object
position candidate points CD (x marks in the figure) when the condition that the microphones,
which are the objects, constitute a linear microphone array is added. In this case, the object
position candidate points CD are distributed only on the straight line L tangent to the sphere
sph1 at the candidate point CD near. Further, it is extremely likely that the microphone with
the shortest acoustic wave propagation time and the microphone with the longest acoustic wave
propagation time are located at the candidate point CD near and at the candidate point CD far on
the surface of the sphere sph2, respectively. Therefore, the derivation of the placement
approximate solution can be sped up by selecting placement approximate solution candidates
having such a microphone arrangement and performing the local search. In addition, a microphone
whose measure on the scale representing the closeness to the reference point is similar to that
of the microphone at the candidate point CD near is very likely to be located at a candidate
point near CD near; the derivation of the placement approximate solution can therefore be
further sped up by selecting candidates that place such a microphone there and performing the
local search. The same applies to candidate points near the candidate point CD far.
[0138]
FIG. 23 is a diagram showing the object position candidate points CD (x marks in the figure)
when the condition that the microphones, which are the objects, form a planar microphone array
is added. In this case, the object position candidate points CD are distributed only on the
circle C tangent to the sphere sph1 at the candidate point CD near. Further, it is extremely
likely that the microphone with the shortest acoustic wave propagation time and the microphone
with the longest acoustic wave propagation time are located at the candidate point CD near and
at the candidate point CD far on the surface of the sphere sph2, respectively. Therefore, the
derivation of the placement approximate solution can be sped up by selecting placement
approximate solution candidates having such a microphone arrangement and performing the local
search.
[0139]
FIG. 24 is a diagram showing the object position candidate points CD (x marks in the figure)
when the condition that the microphones, which are the objects, form a square microphone array
is added. In this case, the object position candidate points CD are distributed only on the
square SQ inscribed in the circle C tangent to the sphere sph1 at the candidate point CD near.
Further, it is extremely likely that the microphone with the shortest acoustic wave propagation
time and the microphone with the longest acoustic wave propagation time are located at the
candidate point CD near and at the candidate point CD far on the surface of the sphere sph2,
respectively. Therefore, the derivation of the placement approximate solution can be sped up by
selecting placement approximate solution candidates having such a microphone arrangement and
performing the local search.
[0140]
FIG. 25 is a diagram showing the object position candidate points CD (x marks in the figure)
when the condition that the microphones, which are the objects, constitute a spherical
microphone array is added. In this case, the object position candidate points CD are distributed
only on the surface of the sphere sph3, which circumscribes the sphere sph1 at the candidate
point CD near and is inscribed in the sphere sph2 at the candidate point CD far. Further, it is
extremely likely that the microphone with the shortest acoustic wave propagation time and the
microphone with the longest acoustic wave propagation time are located at the candidate point
CD near and at the candidate point CD far, respectively. Therefore, the derivation of the
placement approximate solution can be sped up by selecting placement approximate solution
candidates having such a microphone arrangement and performing the local search.
[0141]
5. Conclusion The object placement estimation apparatus according to the embodiments of the
present invention can estimate the placement of objects without measuring the distances between
the arrangement estimation targets. Instead of using the distances between the arrangement
estimation targets, the object arrangement estimation apparatus according to the embodiments of
the present invention generates, based on measures obtained with respect to N reference points
(N: an integer of 3 or more) that can be selected arbitrarily and independently of the positions
of the objects, an N-dimensional feature vector representing the feature of the position of each
object in the real space; derives the dissimilarity matrix from the feature vectors; and derives
the arrangement of the objects in the (three-dimensional) real space from the dissimilarity
matrix. Therefore, in the embodiments of the present invention, since it is not necessary to
measure the distances between the arrangement estimation targets, the placement of the objects
can be estimated easily and accurately in various situations. In the embodiments of the present
invention, increasing the number N of reference points (N: an integer of 3 or more), which can
be selected arbitrarily and independently of the positions of the objects, increases the number
of dimensions of the feature vector representing the feature of the position of each object in
the real space, and with the increase in the number of dimensions, the accuracy of the
arrangement estimation can be improved.
[0142]
The embodiment of the present invention is useful, for example, as a device for easily and
accurately confirming the arrangement and cable connection of microphones in a multi-channel
sound collection system.
[0143]
The embodiment of the present invention is useful, for example, as a device for easily and
accurately confirming the speaker arrangement and cable connection in a multi-channel sound
field reproduction system.
[0144]
The embodiment of the present invention can also estimate the layout of a plurality of notebook
PCs using the microphone and speaker built into each notebook PC.
[0145]
Embodiments of the present invention can also be used as an apparatus for simply and
accurately confirming the arrangement and cable connection of each microphone of a
microphone array for speech recognition.
[0146]
In the above embodiments, each component of the feature vector representing the feature of the
position of an object in the real space is generated as the time at which the acoustic wave
arrives from a given reference point.
That is, in the embodiments, each component of the feature vector is a quantity having the
dimension of time.
However, the feature vectors can also be constructed using observables having dimensions other
than time.
For example, the feature vector can be configured based on a quantity reflecting the shape of
the reverberation component of the response waveform detected at the microphone, that is, a
quantity representing the relative relationship between the direct sound and the non-direct
sound in the response waveform. In this case, the dissimilarity matrix is constructed using data
representing the (dis)similarity of the response waveforms detected at each pair of microphones.
The object placement estimation apparatus may, for example, obtain the cross-correlation for
each element of the dissimilarity matrix and estimate the placement of the arrangement
estimation targets in the real space based on the obtained cross-correlations.
[0147]
In addition, the object placement estimation apparatus may collect ambient environmental sound
including a human voice using the M microphones and generate the feature vectors based on the
frequency amplitude characteristics of the output signals output from the microphones. By
comparing the shapes of the frequency amplitude characteristics of the output signals of a
plurality of microphones (for example, by integrating the difference between the frequency
amplitude characteristics along the frequency axis), the difference in the relative proximity of
the microphones to the speaker can be quantified. Alternatively, the components of the feature
vector may be determined based on, for example, the ratio of the amplitudes of a specific
frequency component (a frequency at which a formant of the human voice appears) of the frequency
amplitude characteristics of the output signals output from the microphones. That is, based on
the amplitude of the formant of the human voice extracted from the frequency amplitude
characteristic of each output signal, the object placement estimation apparatus can determine
the components of the feature vectors by relatively evaluating, among the M microphones, the
proximity between the microphone that output the output signal and the person who uttered the
voice. Such a feature vector generation method is convenient for estimating the arrangement in a
room with rich reverberation characteristics or in a crowd. It is also advantageous when the M
microphones are arranged over a relatively wide area in the room.
[0148]
In addition, the object placement estimation apparatus according to the embodiments of the
present invention may estimate the object placement using waves such as light or electromagnetic
waves instead of acoustic waves. In that case, the object placement estimation apparatus
includes, for example, a light emitting element array and a light receiving element array, or
two sets of antenna arrays, and can estimate the arrangement of the light emitting element array
or the light receiving element array (or of one set of antenna arrays) by detecting the waves
from the light emitting element array (or from one set of antenna arrays) at the light receiving
element array (or at the other set of antenna arrays).
[0149]
Further, the object placement estimation apparatus according to the embodiments of the present
invention may estimate the object placement using surface waves propagating on the surface of an
object instead of acoustic waves. In that case, the object arrangement estimation apparatus
comprises, for example, two sets of transducer arrays that convert electrical energy into
vibrational energy, and can estimate the arrangement of one set of transducer arrays by
detecting the surface waves from one transducer array at the other transducer array.
[0150]
The present invention can be used, for example, to confirm the arrangement and cable
connection of a plurality of microphones at a sound collection site.
[0151]
Further, the estimation unit of the object placement estimation apparatus may, instead of
applying the MDS method to the dissimilarity matrix, obtain an approximate solution of the
placement numerically by a full search or local search method, and estimate and output the
arrangement of the M microphones in the real space from the obtained placement approximate
solution.
[0152]
Reference Signs List
1 ... Control unit
2 ... Impulse generation unit (TSP generation unit)
3 ... Response detection unit
4 ... Feature vector generation unit
5 ... Dissimilarity matrix derivation unit
6 ... Placement derivation unit (MDS unit)
7 ... Arrangement estimation result output unit
8 ... Estimation unit
10 ... Computer main unit
11 ... CPU
12 ... ROM
13 ... RAM
21 ... HDD
30 ... User interface unit
31 ... Display
32 ... Keyboard
33 ... Mouse
41 ... Timekeeping unit
50 ... Audio interface unit
51 ... Audio output unit
52 ... Audio input unit
250 ... Audio interface unit
251 ... Audio output unit
303 ... Frequency amplitude characteristic calculation unit
304 ... Feature vector generation unit (modification)
350 ... Audio interface unit
MCa ... Microphone array
MCj ... j-th microphone
SPa ... Speaker array
SPi ... i-th speaker