Patent Translate Powered by EPO and Google. Notice: This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate, complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or financial decisions, should not be based on machine-translation output.

DESCRIPTION JPWO2014020921

Abstract: An object arrangement estimation apparatus for estimating the arrangement of M objects (M is an integer of 2 or more) in real space. A feature vector generation unit generates, for each of the M objects, a feature vector whose components are measures of the object on N scales (N is an integer of 3 or more), each scale representing the closeness to one of N reference points in real space. A dissimilarity matrix deriving unit obtains, for every combination of two of the M objects, the norm between the feature vectors of the two objects, and derives an M-by-M dissimilarity matrix having the obtained norms as elements. An estimation unit estimates the arrangement of the M objects in real space based on the dissimilarity matrix and outputs the result as an arrangement estimation result.

Object arrangement estimation device

[0001] The present invention relates to an object placement estimation apparatus for estimating the placement of an object, and more particularly to an object placement estimation apparatus for estimating the placement of a plurality of objects.

[0002] In recent years, systems that present a realistic sound field using multi-channel sound field recording and reproduction systems have attracted attention.
09-05-2019 1

Various such systems have already been proposed, ranging from systems with a relatively small number of channels, such as 2ch stereo, binaural, and 5.1ch surround systems, to systems that develop these principles further and use a large number of channels, such as the 22.2 multi-channel sound system, ambisonics, and a system using a 121ch microphone array / 157ch speaker array.

[0003] When such a system is used to record a sound field, it is necessary to check, at the recording site, the arrangement of dozens of microphones and the cable connections between the microphones and the recording equipment. Similarly, when such a system is used to reproduce a sound field, it is necessary to check, at the reproduction site, the arrangement of dozens of speakers and the cable connections between the speakers and the playback equipment.

[0004] Therefore, there is a need for a device that can easily check the arrangement and cable connections of a large number of microphones (or a large number of speakers).

[0005] Patent Document 1 (US Patent Application Publication No. 2010/0195444) discloses a method of estimating the arrangement of a plurality of speakers. According to the method of Patent Document 1, the distance between every pair of speakers whose arrangement is to be estimated is first measured, and a distance matrix whose elements are the distances between the speaker pairs in real space is derived from the measurement results. The method of Patent Document 1 then obtains the arrangement of the speakers in real space by applying multidimensional scaling to the distance matrix derived in this way.

[0006] US Patent Application Publication No. 2010/0195444

[0007] Vikas C.
Raykar, Igor Kozintsev, Rainer Lienhart, "Position Calibration of Microphones and Loudspeakers in Distributed Computing Platforms", IEEE Transactions on Speech and Audio Processing, p. 1-12. Stanley T. Birchfield, Amarnag Subramanya, "Microphone Array Position Calibration by Basis-Point Classical Multidimensional Scaling", IEEE Transactions on Speech and Audio Processing, September 2005, Vol. 13, No. 5, p. 1025-1034. Alessandro Redondi, Marco Tagliasacchi, Fabio Antonacci, Augusto Sarti, "Geometric Calibration of Distributed Microphone Arrays", MMSP '09, October 5-7, Rio de Janeiro, Brazil, IEEE, 2009. Kazunori Kobayashi, Ken'ichi Furuya, Akitoshi Kataoka, "A Blind Source Localization by Using Freely Positioned Microphones", Transactions of the Institute of Electronics, Information and Communication Engineers A, June 2003, Vol. J86-A, No. 6, p. 619-627.

[0008] In the prior art, however, the distance in real space is measured for every pair of placement estimation targets (for example, every pair of speakers in Patent Document 1), and a distance matrix whose elements are the distances between the speaker pairs in real space is derived from the measurement results. The distance matrix thus derived is then regarded as the dissimilarity matrix of the placement estimation targets, and multidimensional scaling is applied to it to determine the arrangement of the plurality of speakers in real space.
Therefore, as the number of placement estimation targets increases, the number of distances to be measured becomes enormous, and it becomes difficult to estimate the placement easily. In addition, the likelihood of estimation errors caused by measurement errors also increases with the number of placement estimation targets. Furthermore, in some cases it may be difficult to accurately measure the distance between placement estimation targets, as in the case described in Non-Patent Document 1. With the conventional methods, it was therefore difficult to estimate the placement of the objects easily and accurately.

[0009] The embodiments of the present invention are made in view of the above problems, and their purpose is to provide an apparatus capable of estimating the arrangement of a plurality of objects in real space more simply and accurately than the prior art.

[0010] A first aspect of the present invention is an object arrangement estimation apparatus for estimating the arrangement of M objects (M is an integer of 2 or more) in real space, comprising: a feature vector generation unit that generates, for each of the M objects, a feature vector whose components are measures of the object on N scales (N is an integer of 3 or more), each scale representing the closeness to one of N reference points in real space; a dissimilarity matrix deriving unit that obtains, for every combination of two objects included in the M objects, the norm between the feature vectors of the two objects, and derives an M-by-M dissimilarity matrix having the obtained norms as elements; and an estimation unit that estimates the arrangement of the M objects in real space based on the dissimilarity matrix and outputs the result as an arrangement estimation result.
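As a concrete illustration of the dissimilarity matrix derivation described in the first aspect, the step from feature vectors to the M-by-M matrix can be sketched as follows. This is a minimal sketch, not the patent's implementation; it assumes the Euclidean norm (the example norm mentioned later in the description), and the variable names are illustrative:

```python
import math

def dissimilarity_matrix(feature_vectors):
    """Derive the M-by-M dissimilarity matrix whose (j, k) element is the
    norm (here: Euclidean) between the feature vectors of objects j and k."""
    M = len(feature_vectors)
    D = [[0.0] * M for _ in range(M)]
    for j in range(M):
        for k in range(j + 1, M):
            d = math.sqrt(sum((a - b) ** 2
                              for a, b in zip(feature_vectors[j], feature_vectors[k])))
            D[j][k] = D[k][j] = d  # symmetric, with a zero diagonal
    return D

# Example: M = 3 objects, N = 3 scales (one measure per reference point).
vectors = [[0.0, 1.0, 2.0],
           [0.0, 1.0, 4.0],
           [3.0, 5.0, 2.0]]
D = dissimilarity_matrix(vectors)
```

The matrix D is what the estimation unit of the first aspect then consumes to estimate the arrangement.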
[0011] A second aspect of the present invention is an object arrangement estimation method for estimating the arrangement of M objects (M is an integer of 2 or more) in real space, comprising: a feature vector generation step of generating, for each of the M objects, a feature vector whose components are measures of the object on N scales (N is an integer of 3 or more), each scale representing the closeness to one of N reference points in real space; a dissimilarity matrix deriving step of obtaining, for every combination of two objects included in the M objects, the norm between the feature vectors of the two objects, and deriving an M-by-M dissimilarity matrix having the obtained norms as elements; and an estimation step of estimating the arrangement of the M objects in real space based on the dissimilarity matrix and outputting the result as an arrangement estimation result.

[0012] A third aspect of the present invention is an object arrangement estimation program for causing a computer to function as an object arrangement estimation apparatus for estimating the arrangement of M objects (M is an integer of 2 or more) in real space, the program causing the computer to function as: a feature vector generation unit that generates, for each of the M objects, a feature vector whose components are measures of the object on N scales (N is an integer of 3 or more), each scale representing the closeness to one of N reference points in real space; a dissimilarity matrix deriving unit that obtains, for every combination of two objects included in the M objects, the norm between the feature vectors of the two objects, and derives an M-by-M dissimilarity matrix having the obtained norms as elements; and an estimation unit that estimates the arrangement of the M objects in real space based on the dissimilarity matrix and outputs the result as an arrangement estimation result.
[0013] The object arrangement estimation apparatus according to each embodiment of the present invention generates, for each of the M objects, a feature vector whose components are measures of the object on N scales (N is an integer of 3 or more), each scale representing the closeness to one of N reference points in real space; obtains the norm between the feature vectors of the two objects for every combination of two objects; derives an M-by-M dissimilarity matrix having the obtained norms as elements; estimates the arrangement of the M objects in real space based on the dissimilarity matrix; and outputs the result as an arrangement estimation result. By doing so, the object placement estimation apparatus according to each embodiment of the present invention can estimate the placement of a plurality of objects in real space more simply and accurately than the prior art.

[0014] The drawings show: a block diagram of the hardware configuration of the object arrangement estimation apparatus according to the first embodiment; a block diagram of the configuration of the object arrangement estimation apparatus according to the first embodiment; a schematic diagram of the relation of the various quantities used for microphone arrangement estimation; a flowchart of the processing to be carried out; a figure showing the difference in sound collection time at each microphone when the time-stretched pulse signal (TSP signal) emitted from the i-th speaker is collected by multiple microphones; a figure showing the experimental environment of the microphone arrangement estimation experiment and the speaker arrangement estimation experiment according to the first embodiment; figures showing the experimental results of the microphone arrangement estimation experiment according to the first embodiment; a figure showing the relationship between the number of speakers used and the estimation accuracy in the microphone arrangement estimation; a schematic diagram of the relation of the various quantities used for speaker arrangement estimation; figures showing the experimental results of the speaker arrangement estimation experiment according to the first embodiment; a block diagram of the hardware configuration of the object arrangement estimation apparatus according to the second embodiment; a block diagram of the configuration of the object arrangement estimation apparatus according to the second embodiment; a figure showing the experimental environment of the microphone arrangement estimation experiment according to the second embodiment; figures showing the experimental results of the microphone arrangement estimation experiment according to the second embodiment; a figure showing the relationship between the speech production frequency and the estimation accuracy in the microphone arrangement estimation according to the second embodiment; a block diagram of the hardware configuration of object arrangement estimation apparatus modification 1; a block diagram of the configuration of object arrangement estimation apparatus modification 1; a schematic diagram of the relationship between a microphone group arranged in a room and a person; a flowchart of the processing performed by object arrangement estimation apparatus modification 1; an illustration of the frequency-amplitude characteristics of the microphone output signals output from the microphones when a plurality of microphones collect sound including a human voice; and illustrations of the distributions of object arrangement point candidates in object arrangement estimation apparatus modification 2.

[0015] Hereinafter, embodiments of the present invention will be described in detail with reference to the attached drawings.

[0016] 1. Overview

An embodiment of the present invention is an object placement estimation apparatus for estimating the placement of M objects (M is an integer of 2 or more) in real space. The object arrangement estimation apparatus has, as one of its features: a feature vector generation unit that generates, for each of the M objects, a feature vector whose components are measures of the object on N scales (N is an integer of 3 or more), each scale indicating the proximity to one of N reference points in real space; a dissimilarity matrix deriving unit that obtains the norm between the feature vectors of the two objects for every combination of two objects included in the M objects, and derives an M-by-M dissimilarity matrix having the obtained norms as elements; and an estimation unit that estimates the arrangement of the M objects in real space based on the dissimilarity matrix and outputs the result as an arrangement estimation result.

[0017] Here, the M objects are, for example, M microphones (M may be an integer of 2 or more, as described above).

[0018] In that case, in order to estimate the arrangement of the M microphones, N speakers are arranged in the space (N may be an integer of 3 or more, as described above). The positions where the N speakers are arranged correspond to the above-mentioned reference points.

[0019] The object arrangement estimation apparatus causes each speaker to output a predetermined acoustic wave (for example, by supplying a time-stretched pulse signal (TSP signal) to the speaker in order to measure the impulse response at each microphone), and identifies, for each microphone, the time at which the acoustic wave emitted from the speaker first arrives, for example the time at which the waveform corresponding to the acoustic wave (for example, the impulse response waveform) first appears in the output from the microphone (the acoustic wave arrival time). That is, in this case, the N scales representing the closeness to each of the N reference points are the time coordinate axes used when identifying the arrival time of the acoustic wave from each speaker, and the measure for the object is the arrival time of the acoustic wave at each microphone.
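The arrival-time construction of [0019] can be illustrated with a small simulation. This is a hedged sketch under assumptions not in the patent: speaker and microphone positions in a 2-D plane, a constant speed of sound, and emission at t = 0 on every time coordinate axis; all names are illustrative:

```python
# Simulate N speakers (reference points) and M microphones in a 2-D plane.
# Each microphone's feature vector has, as its N components, the acoustic
# wave arrival times from the N speakers (one per time coordinate axis).
SPEED_OF_SOUND = 343.0  # m/s, assumed constant

def arrival_time(speaker, mic):
    dist = ((speaker[0] - mic[0]) ** 2 + (speaker[1] - mic[1]) ** 2) ** 0.5
    return dist / SPEED_OF_SOUND  # emission assumed at t = 0 on each axis

def feature_vector(speakers, mic):
    return [arrival_time(sp, mic) for sp in speakers]

speakers = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]   # N = 3 reference points
mics = [(1.0, 1.0), (3.0, 2.0)]                   # M = 2 objects
vectors = [feature_vector(speakers, mc) for mc in mics]
```

Each resulting vector is N-dimensional, and a microphone closer to a given speaker has a smaller measure on that speaker's axis, as [0019] describes.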
[0020] Then, based on the identified acoustic wave arrival times, an N-dimensional feature vector is generated for each microphone (object), having as its components the times at which the acoustic waves from the N reference points arrived. Here, the time at which the acoustic wave emitted at each reference point reaches the microphone is regarded as a measure on the scale (the above-mentioned time coordinate axis) representing the closeness of the microphone to that reference point in real space, and the feature vector generation unit generates, for each microphone, an N-dimensional feature vector whose dimensionality equals the number N of reference points. That is, the feature vector is a vector expression of the features of the position of the microphone in real space, expressed as measures (acoustic wave arrival times) on N scales (time coordinate axes) representing the closeness of the microphone to the N reference points (speakers) in real space.

[0021] Alternatively, in order to generate the feature vectors, the object arrangement estimation apparatus may collect ambient environment sound including a human voice using the M microphones, and calculate the frequency-amplitude characteristic of the output signal output from each microphone. With this method, it is not necessary to output a predetermined acoustic wave from speakers placed at the N reference points as described above. In the frequency-amplitude characteristic of ambient environment sound including a human voice, formants appear as components of the human voice superimposed on the components of noise (such as indoor reverberation and outdoor noises) that are ubiquitous in the sound collection environment.
As the position of a microphone moves away from the human speaker, the influence of noise on the shape of the frequency-amplitude characteristic increases, and the shape moves away from that of the original formants. Therefore, by comparing the shapes of the frequency-amplitude characteristics of the output signals of the microphones, it is possible to know the relative proximity of the multiple microphones to the human speaker. For example, the integral over the frequency axis of the difference between the frequency-amplitude characteristics of the output signals of two microphones can be regarded as the difference between the measures of the two microphones on a scale defining the proximity to the human speaker (i.e., their dissimilarity with respect to that speaker). This is because, even if the positions of the two microphones differ, the amplitude components derived from noise appearing in the frequency-amplitude characteristics of their output signals are almost the same; the noise-derived amplitude component can therefore be canceled by taking the difference between the frequency-amplitude characteristics. Thus, the difference in frequency-amplitude characteristics carries information on the difference in proximity to the human speaker. Naturally, from the differences between the frequency-amplitude characteristics of any two microphones determined in this way, the object placement estimation apparatus can also determine, for each microphone, a measure on a scale defining the proximity to the human speaker, that is, the component of the feature vector with respect to that speaker.

[0022] Alternatively, for example, it is known that when the distance between the human speaker and the microphone is doubled, the formant amplitude appearing in the frequency-amplitude characteristic decreases by approximately 6 dB.
Based on this relationship, the feature vector generation unit of the object placement estimation apparatus may identify the formant components of the human speaker from the frequency-amplitude characteristic of the output signal of each microphone, derive from each identified formant amplitude a measure of the proximity of the microphone to the speaker's position (corresponding to a reference point), and thereby determine the components of the feature vector. As described above, three or more reference points are required. Therefore, the object placement estimation apparatus collects the voice uttered by the human speaker at N different positions (N is 3 or more) using the microphones, and generates an N-dimensional feature vector.

[0023] Then, the dissimilarity matrix deriving unit derives an M-by-M matrix (dissimilarity matrix) whose elements are the norms between the feature vectors for every pair of the M microphones.

[0024] Finally, the estimation unit estimates the arrangement of the M microphones in real space based on the dissimilarity matrix, and outputs the result as an arrangement estimation result. The estimation unit applies, for example, multidimensional scaling (MDS: MultiDimensional Scaling) to the dissimilarity matrix to obtain a placement of the M microphones, estimates the arrangement of the M microphones in real space from the obtained placement, and outputs it.

[0025] Alternatively, instead of applying MDS to the dissimilarity matrix, the estimation unit of the object placement estimation apparatus may numerically approximate the placement solution by full search or local search, and estimate and output the arrangement of the M microphones in real space from the obtained approximate solution.
In this case, for each approximate solution candidate for the placement of the M microphones, the estimation unit may derive a distance matrix whose elements are the distances between the microphones, evaluate the degree of matching of the candidate by comparing the derived distance matrix with the dissimilarity matrix, and take as the approximate solution the candidate showing the highest degree of matching among the evaluated candidates.

[0026] In the object arrangement estimation apparatus according to the embodiment of the present invention, the N reference points may be any N points; to estimate the arrangement of the M objects, information on the positions of the reference points in real space (for example, their coordinate values) is unnecessary. Therefore, the object placement estimation apparatus according to the embodiment of the present invention can estimate the placement of the M objects without measuring the distances between pairs of placement estimation targets, and it is thus possible to estimate the arrangement of the M objects extremely simply. Furthermore, in the object arrangement estimation according to the embodiment of the present invention, the features of the position of each object in real space are first expressed as an N-dimensional vector quantity (feature vector) whose dimensionality equals the number N of reference points (for example, the number of speakers), and the dissimilarity of the positions of the objects in real space is derived from the N-dimensional feature vectors thus generated. Therefore, in the object arrangement estimation according to the embodiment of the present invention, the arrangement estimation accuracy for the M objects improves as the number N of reference points increases.
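The MDS step mentioned in [0024] can be sketched with classical multidimensional scaling: double-center the squared dissimilarities and take the leading eigenvectors of the result. This is a minimal sketch using NumPy, not the patent's implementation; note that the recovered placement is determined only up to translation, rotation, and reflection, which is why [0042] applies linear conversion operations afterwards:

```python
import numpy as np

def classical_mds(D, dim=2):
    """Recover a `dim`-dimensional placement from an M-by-M dissimilarity
    matrix D so that pairwise distances approximate the entries of D."""
    D = np.asarray(D, dtype=float)
    M = D.shape[0]
    J = np.eye(M) - np.ones((M, M)) / M      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)     # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:dim]    # keep the `dim` largest
    scale = np.sqrt(np.maximum(eigvals[idx], 0.0))
    return eigvecs[:, idx] * scale           # M-by-dim coordinates

# Sanity check on known points: pairwise distances of the recovered
# placement should reproduce the input dissimilarities.
pts = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
X = classical_mds(D, dim=2)
D_rec = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
```

When the dissimilarities are exact Euclidean distances, as in this sanity check, the recovery is exact; with the arrival-time-based dissimilarity matrix of the embodiments, the output is an approximation that the estimation unit then maps into real space.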
[0027] In the object placement estimation apparatus according to the embodiment of the present invention, it is necessary to output predetermined acoustic waves from speakers at the N positions corresponding to the N reference points in order to estimate the arrangement of the M microphones corresponding to the M placement estimation targets; however, this does not necessarily mean that N speaker units need to be prepared. In the object arrangement estimation apparatus according to the embodiment of the present invention, the predetermined acoustic waves may be output at the N positions using fewer than N speaker units (for example, one speaker unit).

[0028] In addition, instead of microphones, the M objects may be, for example, M speakers (M may be an integer of 2 or more, as described above). In this case, N microphones are arranged, the position of each microphone is regarded as a reference point, a feature vector is generated for each speaker, a dissimilarity matrix of the positions in space is derived from the M feature vectors of the M speakers, and the arrangement of the speakers in real space is estimated from the dissimilarity matrix.

[0029] From the following description, a person skilled in the art can also understand well a method of estimating, in an object arrangement estimation apparatus, the arrangement of M objects (M is an integer of 2 or more) in real space.

[0030] Also, from the following description, a person skilled in the art can understand well the configuration of a computer program for causing a computer to function as an object arrangement estimation apparatus for estimating the arrangement of M objects (M is an integer of 2 or more) in real space.

[0031] 2. First Embodiment

2-1. Configuration

FIG.
1 is a block diagram showing the configuration of the object placement estimation apparatus according to the first embodiment. The object arrangement estimation apparatus according to the first embodiment includes a central processing unit (CPU) 11 capable of performing predetermined data processing by executing a program, a read-only memory (ROM) 12 storing the program, a random access memory (RAM) 13 for storing various data, a hard disk drive (HDD) 21 as an auxiliary storage device, a display 31 as an output device, a keyboard 32 and a mouse 33 as input devices, a time measuring unit 41 that measures time, and an audio interface unit 50 including an audio output unit 51 and an audio input unit 52, which are input/output interfaces with external audio devices (speakers and microphones). To the audio interface unit 50, a speaker array SPa consisting of N external speakers (SP 1, SP 2, ..., SP N) and a microphone array MCa consisting of M external microphones (MC 1, MC 2, ..., MC M) are connected.

[0032] The CPU 11, the ROM 12, and the RAM 13 constitute a computer main unit 10.

[0033] The display 31, the keyboard 32, and the mouse 33 constitute a user interface unit 30. The user interface unit 30 may instead be configured as a display panel with a touch-panel function or the like.

[0034] FIG. 2 is a block diagram showing the functional blocks implemented by the computer main unit 10 of the object placement estimation apparatus 100 according to the first embodiment. By reading out and executing the object arrangement estimation program stored in the ROM 12, the CPU 11 of the computer main unit 10 can operate as the control unit 1, the impulse generation unit (TSP generation unit) 2, the response detection unit 3, the feature vector generation unit 4, the dissimilarity matrix deriving unit 5, the placement deriving unit (MDS unit) 6, and the arrangement estimation result output unit 7.
The placement deriving unit (MDS unit) 6 and the arrangement estimation result output unit 7 constitute an estimation unit 8.

[0035] The object arrangement estimation program does not necessarily have to be stored in the ROM 12. The object arrangement estimation program may be stored in the HDD 21 (FIG. 1), read by the CPU 11 as appropriate, and executed. Further, the object arrangement estimation program may be downloaded as appropriate from an external storage device (not shown) via a network (not shown) and executed by the CPU 11. Alternatively, the object arrangement estimation program may be stored in a portable storage medium (not shown) such as a flexible disk, an optical disk, or a flash memory. In that case, the program stored in the portable storage medium may be read from the medium by the CPU 11 and executed, or may first be installed on the HDD 21 or the like prior to execution.

[0036] The control unit 1 is realized by the CPU 11 executing the object placement estimation program. The control unit 1 monitors the progress of the operations concerning object arrangement estimation and controls the entire apparatus 100.

[0037] The impulse generation unit (TSP generation unit) 2 is realized by the CPU 11 executing the object arrangement estimation program. The impulse generation unit (TSP generation unit) 2 generates and outputs a signal for selectively outputting a predetermined acoustic wave from one or more speakers of the speaker array SPa connected to the audio output unit 51. The signal is, for example, a signal having a pulse-shaped waveform, namely a time-stretched pulse waveform (TSP waveform), i.e., a TSP signal.

[0038] The response detection unit 3 is realized by the CPU 11 executing the object placement estimation program.
The response detection unit 3 detects, in each of the M inputs from the M microphones of the microphone array MCa connected to the audio input unit 52, the response waveform to the predetermined acoustic wave (for example, the impulse response waveform to the acoustic TSP wave output from a speaker according to the TSP signal), and identifies the time (acoustic wave arrival time) at which the response waveform is detected at each of the M microphones.

[0039] The feature vector generation unit 4 is realized by the CPU 11 executing the object placement estimation program. The feature vector generation unit 4 receives the acoustic wave arrival times identified by the response detection unit 3 and generates an N-dimensional feature vector for each of the M microphones (objects).

[0040] The dissimilarity matrix deriving unit 5 is realized by the CPU 11 executing the object placement estimation program. The dissimilarity matrix deriving unit 5 obtains the norm between the feature vectors of two microphones for every combination of two objects (microphones) among the M microphones. Then, the dissimilarity matrix deriving unit 5 derives an M-by-M dissimilarity matrix having the obtained norms as elements.

[0041] The placement deriving unit (MDS unit) 6 is realized by the CPU 11 executing the object placement estimation program. The placement deriving unit (MDS unit) 6 derives the placement of the M microphones in real space based on the dissimilarity matrix, for example by applying multidimensional scaling (MDS) to the dissimilarity matrix.

[0042] The placement estimation result output unit 7 is realized by the CPU 11 executing the object placement estimation program. The placement estimation result output unit 7 performs linear conversion operations such as enlargement, reduction, rotation, etc.
on the placement obtained by the placement deriving unit 6, estimates the placement of the M microphones in real space, and outputs it as the placement estimation result. The placement deriving unit (MDS unit) 6 and the arrangement estimation result output unit 7 constitute the estimation unit 8 of the object arrangement estimation apparatus according to the present embodiment.

[0043] At least any one of the control unit 1, the impulse generation unit (TSP generation unit) 2, the response detection unit 3, the feature vector generation unit 4, the dissimilarity matrix deriving unit 5, the placement deriving unit (MDS unit) 6, and the arrangement estimation result output unit 7 may be realized by a dedicated hardware circuit.

[0044] 2-2. Operation in Microphone Placement Estimation

Now, with reference to FIG. 3, FIG. 4, and FIG. 5, the microphone placement estimation performed by the object placement estimation apparatus according to the present embodiment will be described.

[0045] FIG. 3 is a schematic view showing the relationship among the M microphones that are the objects of arrangement estimation, the N speakers arranged at positions corresponding to the reference points, and the various quantities involved. In the drawing, only two microphones (MC 1, MC 2) and four speakers (SP 1, SP 2, SP 3, SP 4) are shown for simplicity. Here, p ij denotes the time (acoustic wave arrival time) at which the acoustic wave (TSP wave) emitted from the i-th speaker SP i reaches the j-th microphone MC j. d MC12 denotes the norm between the N-dimensional feature vector p MC1 of the first microphone MC 1, whose components are the N acoustic wave arrival times p i1 (i: 1 to N) at the first microphone MC 1, and the N-dimensional feature vector p MC2 of the second microphone MC 2, whose components are the N acoustic wave arrival times p i2 (i: 1 to N). The norm here is, for example, the Euclidean norm.

[0046] FIG.
4 is a flowchart of the processing for microphone placement estimation performed by the object placement estimation apparatus. [0047] As an initial setting operation, the control unit 1 (CPU 11) of the object placement estimation apparatus sets a variable i to 1 and stores the variable i in the RAM 13 (S1). [0048] Next, the impulse generation unit 2 (CPU 11) reads the value of the variable i and the TSP waveform stored in the RAM 13, and outputs an acoustic wave signal having the TSP waveform to the i-th speaker SP i connected via the audio output unit 51 (S2). Thereby, an acoustic TSP wave is output from the i-th speaker SP i. [0049] FIG. 5 is a chart showing how the acoustic TSP wave emitted from the i-th speaker SP i is collected by each microphone (MC 1, MC 2, ..., MC j, ..., MC M-1, MC M). The time chart on the side of the i-th speaker SP i shows the acoustic TSP wave output from the i-th speaker SP i, and the time charts next to the microphones (MC 1, MC 2, ..., MC j, ..., MC M-1, MC M) show the signals output from each of them. [0050] In step S2 described above, when the acoustic wave signal is input to the i-th speaker SP i, a predetermined acoustic wave TSP is output from that speaker into the air. The acoustic wave propagates through the air at the speed of sound and is collected by each of the microphones (MC 1, MC 2, ..., MC j, ..., MC M-1, MC M). For example, in the output from the first microphone MC 1, a response waveform R i1 to the acoustic wave appears in the vicinity of the time point p i1 on the time coordinate axis T i. Likewise, in the output from the j-th microphone MC j, a response waveform R ij to the acoustic wave appears in the vicinity of the time p ij. The outputs from the microphones (MC 1, MC 2, ..., MC j, ..., MC M−1, MC M) are stored in the RAM 13. [0051] Returning to FIG.
4, the response detection unit 3 (CPU 11) receives the output of each microphone (MC 1, MC 2, ..., MC j, ..., MC M−1, MC M) from the audio input unit 52, or reads it from the RAM 13, and specifies, as the acoustic wave arrival time p ij of the microphone MC j (j: 1 to M) on the time coordinate axis T i, the time at which the peak of the response waveform appears in each output (S3). The acoustic wave arrival time may instead be determined based on other characteristics of the response waveform (such as the rising timing or the timing at which a predetermined sound pressure level is exceeded). The identified acoustic wave arrival times are stored in the RAM 13. [0052] Next, the control unit 1 (CPU 11) determines whether or not the value of the variable i is N or more. If the value of i is less than N, the process returns to step S2 via step S5. On the other hand, if the value of i is N or more, the process proceeds to step S6. [0053] In step S5, the value of the variable i is incremented by 1 (i → i + 1), and the new value of the variable i is stored in the RAM 13. Therefore, in the next execution of step S2, an acoustic TSP wave is emitted from the speaker SP i+1, that is, the speaker numbered next after the speaker from which the acoustic wave was emitted in the previous step S2; it is collected by each microphone (MC 1, MC 2, ..., MC j, ..., MC M−1, MC M) and output as a response waveform. Then, in step S3, the response detection unit 3 identifies the acoustic wave arrival time p i+1,j of the acoustic TSP wave emitted from that speaker at each microphone MC j (j: 1 to M) on the time coordinate axis T i+1. Here, the time coordinate axis T i, which is the scale used to specify the arrival times of the acoustic wave from the i-th speaker SP i, and the time coordinate axis T i+1, which is the scale used to specify the arrival times of the acoustic wave from the (i + 1)-th speaker SP i+1, may be the same or may be different from each other.
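The peak detection of step S3 can be sketched as follows. This is a minimal illustration, assuming the M microphone outputs for one speaker are available as rows of a NumPy array; the function name `detect_arrival_times` is hypothetical, not taken from the patent:

```python
import numpy as np

def detect_arrival_times(responses, fs=48000):
    """For each microphone output (one row per microphone), return the time
    at which the response-waveform peak appears, in seconds on an arbitrary
    time coordinate axis whose origin is the start of the recording."""
    responses = np.asarray(responses)            # shape (M, n_samples)
    peak_idx = np.argmax(np.abs(responses), axis=1)
    return peak_idx / fs                         # arrival times p_ij for one speaker i

# Example: two synthetic "impulse responses" peaking at different delays
fs = 48000
r = np.zeros((2, fs))
r[0, 1200] = 1.0    # microphone 1: peak at sample 1200 -> 0.025 s
r[1, 3000] = -0.8   # microphone 2: peak at sample 3000 (sign is irrelevant)
print(detect_arrival_times(r, fs))
```

As the text notes, only relative times on a common axis matter, so no synchronization with the speaker's emission time is needed here.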
[0054] In this manner, by repeating the processing of steps S2 to S5 N times, the object placement estimation apparatus specifies, on some time coordinate axis, the times p ij (i: 1 to N, j: 1 to M) at which the acoustic waves emitted from the speakers (SP 1, SP 2, ..., SP N-1, SP N) reach each microphone (MC 1, MC 2, ..., MC j, ..., MC M−1, MC M). Here, it should be noted that, in the embodiment of the present invention, the response detection unit 3 only needs to specify, on an arbitrary time coordinate axis, the time at which the acoustic wave reached each microphone (MC 1, MC 2, ..., MC j, ..., MC M−1, MC M); it is not necessary to obtain the actual time width required for the acoustic wave to travel from each speaker (SP 1, SP 2, ..., SP N-1, SP N) to each microphone (MC 1, MC 2, ..., MC j, ..., MC M−1, MC M). Therefore, in the object placement estimation apparatus according to the embodiment of the present invention, it is not necessary to specify the time at which the acoustic wave is emitted from each speaker (SP 1, SP 2, ..., SP N-1, SP N), and consequently no error arises in the placement estimation result due to errors in identifying the emission times of the acoustic waves from the speakers. [0055] Next, the feature vector generation unit 4 (CPU 11) receives the acoustic wave arrival times (p ij (i: 1 to N, j: 1 to M)) specified by the response detection unit 3 and generates an N-dimensional feature vector p MCj for each of the M microphones MC j (j: 1 to M) (S6). The generated feature vectors p MCj are stored in the RAM 13. [0056] The N-dimensional feature vector p MCj represents the feature of the position of the j-th microphone MC j in real space by N-dimensional scales that represent the closeness to each of the N speakers SP i (i: 1 to N).
Specifically, the feature vector p MCj is p MCj = (p 1j, p 2j, ..., p Nj)^T ... (1). That is, the scale indicating the closeness to the i-th speaker SP i (i: 1 to N) is here the time coordinate axis T i (FIG. 5) used by the response detection unit 3 in specifying the time at which the acoustic wave arrived from the i-th speaker SP i at each microphone MC j (j: 1 to M), and the measure of the j-th microphone MC j on each scale is the acoustic wave arrival time p ij on that time coordinate axis T i (FIG. 5). [0057] The N scales used to construct the N-dimensional feature vector need not be time coordinate axes. For example, the measure may be a distance in real space. The measure may also be, for example, the peak level of the response waveform detected at each microphone, a quantity that characterizes the shape of the response waveform detected at each microphone, or a quantity that characterizes the non-direct sound (reverberation component) detected at each microphone. [0058] Next, the dissimilarity matrix deriving unit 5 (CPU 11) generates the dissimilarity matrix D based on the N-dimensional feature vectors p MCj for the M microphones generated by the feature vector generating unit 4 and stored in the RAM 13 (S7). The generated dissimilarity matrix D is stored in the RAM 13. [0059] The dissimilarity matrix D is an M-row, M-column matrix whose elements are the norms d MCkl between the feature vectors (p MCk and p MCl) for every combination of two of the M microphones MC j (j: 1 to M) that are the objects of placement estimation (for example, the microphones MC k and MC l). [0060] That is, each element d MCkl is d MCkl = ||p MCk − p MCl||. Therefore, the dissimilarity matrix is obtained by determining the dissimilarity of the positions of two microphones in real space based on the N-dimensional feature vectors p MCj (j: 1 to M).
That is, it is a matrix representing the dissimilarity of the positions in real space. [0061] Next, the placement derivation unit (MDS unit) 6 (CPU 11) derives the placement of the M microphones by applying multidimensional scaling (MDS: MultiDimensional Scaling) to the dissimilarity matrix D. The derived placement is stored in the RAM 13. [0062] The placement derivation unit (MDS unit) 6 first obtains an M × M matrix D^(2) having d MCkl^2 as its elements. [0063] Next, using the M × M centering matrix H whose elements h kl are expressed, with the Kronecker delta δ kl, as h kl = δ kl − 1/M, the placement derivation unit (MDS unit) 6 obtains the M × M matrix B represented by B = −(1/2) H D^(2) H. [0064] Finally, the placement derivation unit (MDS unit) 6 solves the eigenvalue problem B x r = λ r x r for B to obtain the placement of the M microphones along the axis of the r-th dimension, and derives an M-row, 3-column placement matrix X from the vectors x r (r = 1, 2, 3) for the axes of the first to third dimensions. Thereby, the placement of the M microphones MC j (j: 1 to M) in real space (three-dimensional space) is obtained. [0065] The placement matrix X derived by the placement derivation unit (MDS unit) 6 represents the actual arrangement of the M microphones up to linear transformations (enlargement/reduction, rotation, inversion (mirroring), etc.). Therefore, the placement estimation result output unit 7 reads the placement matrix X derived by the placement derivation unit (MDS unit) 6 from the RAM 13 and applies an appropriate linear transformation to it to determine the actual placement of the M microphones. The determined placement is stored in the RAM 13.
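Taken together, step S7 and paragraphs [0061]–[0064] amount to pairwise Euclidean norms followed by classical MDS (double centering plus an eigendecomposition). The following is a minimal sketch, assuming the N × M arrival-time table is available as a NumPy array; the function names and the use of `numpy.linalg.eigh` are my assumptions, not the patent's implementation:

```python
import numpy as np

def dissimilarity_matrix(P):
    """P: N x M arrival times p_ij (speaker i, microphone j); column j is the
    N-dimensional feature vector p_MCj.  Returns the M x M matrix D whose
    element D[k, l] is the Euclidean norm ||p_MCk - p_MCl||."""
    F = np.asarray(P, dtype=float).T              # M x N: one feature vector per row
    diff = F[:, None, :] - F[None, :, :]          # pairwise component differences
    return np.sqrt((diff ** 2).sum(axis=2))

def classical_mds(D, dims=3):
    """M x dims placement matrix X from D, recovered up to rotation,
    reflection, and translation (the procedure of paragraphs [0062]-[0064])."""
    M = D.shape[0]
    H = np.eye(M) - np.ones((M, M)) / M           # centering matrix, h_kl = delta_kl - 1/M
    B = -0.5 * H @ (D ** 2) @ H                   # double-centered matrix B
    w, v = np.linalg.eigh(B)                      # eigenvalues in ascending order
    order = np.argsort(w)[::-1][:dims]            # keep the `dims` largest eigenvalues
    return v[:, order] * np.sqrt(np.clip(w[order], 0.0, None))

# Round trip on toy data: the distances of the estimated placement
# match the input dissimilarities exactly (up to floating-point error).
rng = np.random.default_rng(0)
X_true = rng.normal(size=(8, 3))                  # 8 objects at random 3-D positions
D = dissimilarity_matrix(X_true.T)                # coordinates used as 3-dim feature vectors
X_est = classical_mds(D)
D_est = dissimilarity_matrix(X_est.T)
print(np.allclose(D, D_est))                      # → True
```

The round trip illustrates why a subsequent linear transformation (paragraph [0065]) is still needed: the recovered `X_est` matches the true configuration only up to rotation, reflection, and translation.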
[0066] If the variance of the coordinates of the placement of the M microphones in real space is known, the placement estimation result output unit 7 obtains the variance of the placement along each coordinate axis of the placement matrix X, and scales the values of the three coordinates of the placement matrix so that the variance of the placement along one of the three coordinate axes matches the known variance described above. [0067] Alternatively, regarding the placement of the M microphones in real space, when, for example, the distance between the two microphones farthest apart along a certain coordinate axis is known, the placement estimation result output unit 7 scales the values of the three coordinate axes of the placement matrix so that the distance between the two microphones whose placement values for that coordinate axis are farthest apart matches the known distance described above. [0068] Thus, based on information known about the positions in real space of the objects of placement estimation (for example, information on the positions of any three of the M objects in real space), the placement estimation result output unit 7 can perform a linear transformation on the placement matrix X to estimate and output the arrangement of the objects of placement estimation in real space. There are cases where the placement indicated by the placement matrix X and the coordinates in real space have a mirror-image relationship. In that case, the placement estimation result output unit 7 may make the placement of the placement matrix X coincide with the coordinates in real space by inverting the sign of the values of any one coordinate axis of the placement matrix X. [0069] 2−3.
Results of Microphone Placement Estimation Experiment Hereinafter, the results of a placement estimation experiment on a plurality of microphones by the object placement estimation apparatus according to the present embodiment will be described. [0070] In this experiment, as shown in FIG. 6, an 80-channel microphone array MCa was arranged in a sound field reproduction environment in which 96-channel speaker arrays (SPa1, SPa2, SPa3, and SPa4) were arranged. The microphone array MCa has one omnidirectional microphone (DPA 4060-BM) disposed at each node of a frame structure about 46 centimeters in diameter having a C80 fullerene structure. The sound field reproduction environment configured by the 96-channel speaker system consists of 90 speakers in rectangular parallelepiped enclosures (Fostex FE103En) mounted on the walls of a room with a regular hexagonal cross section, and six mounted on the ceiling. [0071] In such an experimental environment, an experiment was conducted to estimate the arrangement of the 80 microphones using the object placement estimation apparatus according to the present embodiment. In this experiment, the conditions for outputting and detecting the acoustic wave were a TSP length of 8192 [pnt], a TSP response length of 32768 [pnt], a sampling frequency of 48000 [Hz], and a quantization bit depth of 16 [bit]. [0072] The results of the experiment are shown in FIGS. 7A, 7B, and 7C. They show the result as viewed from directly above, from directly in front, and from directly beside (a direction rotated 90 degrees horizontally from directly in front), respectively. In each figure, the actual microphone position is indicated by a cross, and the placement estimation result is indicated by a circle. [0073] Further, for each microphone, the deviation between the actual position and the estimated position was determined, and the average value was taken as the error evaluation value [mm].
In this experiment, the error evaluation value was 4.8746 [mm]. From the experimental results, it was found that the object placement estimation apparatus according to the present embodiment can output estimation results with sufficient accuracy for determining the placement of the microphones and the correctness of the cable connections. [0074] 2−4. Relationship between the Number of Speakers and the Estimation Error in Microphone Placement Estimation The relationship between the number of speakers used (that is, the number of impulse response waveforms (TSP waveforms) output) in the object placement estimation method according to the present embodiment and the accuracy of the placement estimation result will now be described. [0075] In order to investigate the relationship between the number of speakers and the accuracy of the object placement estimation, the experiment was performed multiple times while changing the number of speakers. FIG. 8 is a graph in which the results are plotted with the number of speakers on the horizontal axis and the above-mentioned error evaluation value of each estimation result on the vertical axis. From FIG. 8, it can be seen that in the present embodiment the accuracy of the object placement estimation improves monotonically as the number of speakers (the number of reference points described above) used for the object placement estimation increases. In particular, the accuracy of the object placement estimation improves significantly until the number of speakers exceeds 10. From this, it can be seen that in the object placement estimation according to the present embodiment, the placement estimation result can be obtained with good accuracy by setting the number of speakers (the number of reference points described above) to about 10 or more. [0076] 2−5.
Operation in Speaker Placement Estimation As described above, the object placement estimation apparatus according to the present embodiment can also estimate the placement of M speakers using N microphones arranged at positions corresponding to the N reference points. Hereinafter, with reference to FIG. 9, FIG. 10A, FIG. 10B, and FIG. 10C, the principle of speaker placement estimation and the experimental results will be described. [0077] FIG. 9 is a schematic view showing the relationship among the M speakers that are the objects of placement estimation, the N microphones arranged at positions corresponding to the reference points, and various quantities. In the drawing, only two microphones (MC 1, MC 2) and four speakers (SP 1, SP 2, SP 3, SP 4) are shown for simplicity. Here, p ij denotes the time (acoustic wave arrival time) at which the acoustic wave (TSP wave) emitted from the i-th speaker SP i reaches the j-th microphone MC j. d SP12 denotes the norm between the N-dimensional feature vector p SP1 of the first speaker SP 1, whose components are the times (acoustic wave arrival times) p 1j (j: 1 to N) at which the acoustic wave emitted from the first speaker SP 1 reaches each of the N microphones MC j (j: 1 to N), and the N-dimensional feature vector p SP2 of the second speaker SP 2, whose components are the times (acoustic wave arrival times) p 2j (j: 1 to N) at which the acoustic wave emitted from the second speaker SP 2 reaches each of the N microphones MC j (j: 1 to N). Similarly, d SP23 and d SP34 denote, respectively, the norm between the N-dimensional feature vector p SP2 of the second speaker SP 2 and the N-dimensional feature vector p SP3 of the third speaker SP 3, and the norm between the N-dimensional feature vector p SP3 of the third speaker SP 3 and the N-dimensional feature vector p SP4 of the fourth speaker SP 4.
[0078] In the speaker placement estimation, the feature vector generation unit 4 regards the positions at which the microphones MC j (j: 1 to N) are arranged as the reference points described above for the M speakers SP i (i: 1 to M) that are the objects of placement estimation, generates a feature vector p SPi (i: 1 to M) for each speaker SP i (i: 1 to M), derives from the M feature vectors the dissimilarity matrix of the positions of the M speakers in real space, and estimates the placement of the speakers in real space from the dissimilarity matrix. [0079] Therefore, in this case, the N-dimensional feature vector p SPi represents the feature of the position of the i-th speaker SP i in real space by N-dimensional scales representing the closeness to the N microphones MC j (j: 1 to N). Specifically, the feature vector p SPi is p SPi = (p i1, p i2, ..., p iN)^T. [0080] Next, the dissimilarity matrix deriving unit 5 obtains the norm between the feature vectors of two speakers for every combination of two objects among the M speakers. Then, the dissimilarity matrix deriving unit 5 derives an M-by-M dissimilarity matrix having the obtained norms as its elements. [0081] Specifically, the dissimilarity matrix deriving unit 5 (CPU 11) generates the dissimilarity matrix D based on the N-dimensional feature vectors p SPi. [0082] Each element d SPkl of the dissimilarity matrix D is therefore d SPkl = ||p SPk − p SPl||, so that the dissimilarity matrix is obtained by determining the dissimilarity between the positions of two speakers in real space based on the N-dimensional feature vectors p SPi (i: 1 to M); it is a matrix representing the dissimilarity of the positions in real space. [0083] Then, the placement derivation unit (MDS unit) 6 (CPU 11) derives the placement of the M speakers by applying multidimensional scaling (MDS: MultiDimensional Scaling) to the dissimilarity matrix D.
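In short, speaker placement estimation reuses the same pipeline with the roles of the rows and columns of the arrival-time table p ij exchanged: microphone feature vectors are the columns of the table, speaker feature vectors are its rows. A small sketch of this duality (the function name and the toy numbers are hypothetical):

```python
import numpy as np

def feature_vectors(P, objects="microphones"):
    """P: arrival times p_ij with speakers on the rows and microphones on the
    columns.  Returns one feature vector per row of the result: the columns
    of P for microphone estimation, the rows of P for speaker estimation."""
    P = np.asarray(P)
    return P.T if objects == "microphones" else P

P = np.array([[0.1, 0.2],      # speaker 1 -> microphones 1, 2
              [0.3, 0.4]])     # speaker 2 -> microphones 1, 2
print(feature_vectors(P, "speakers")[0])      # p_SP1 = (p_11, p_12)
print(feature_vectors(P, "microphones")[0])   # p_MC1 = (p_11, p_21)
```

Everything downstream (norms, dissimilarity matrix, MDS) is unchanged; only the choice of which axis of the table supplies the feature vectors differs.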
[0084] Furthermore, the placement estimation result output unit 7 applies an appropriate linear transformation to the placement matrix X derived by the placement derivation unit (MDS unit) 6 to determine the actual placement of the M speakers. [0085] 2−6. Results of Speaker Placement Estimation Experiment Hereinafter, the results of a placement estimation experiment on a plurality of speakers by the object placement estimation apparatus according to the present embodiment will be described. Note that the experimental environment is the same as in the previous microphone placement estimation experiment, so its description is omitted. [0086] The results of the experiment are shown in FIGS. 10A, 10B, and 10C. They show the result as viewed from directly above, from directly in front, and from directly beside (a direction rotated 90 degrees horizontally from directly in front), respectively. In each figure, the actual position of the speaker is indicated by a cross, and the placement estimation result is indicated by a circle. [0087] Further, for each speaker, the deviation between the actual position and the estimated position was determined, and the average value was taken as the error evaluation value [mm]. In this experiment, the error evaluation value was 23.5486 [mm]. This error is larger than the error evaluation value of 4.8746 [mm] in the microphone placement estimation experiment performed in the same experimental environment. However, in consideration of the size of the speaker units (for example, the size of the diaphragm), the spacing of the speaker units, the size of the speaker array, and so on, it can be said that the result has sufficient accuracy for determining the placement of the speakers and the correctness of the cable connections. [0088] 3. Second Embodiment 3-1.
Configuration The second embodiment of the present invention is an object placement estimation apparatus that improves portability compared with the first embodiment and that can check the arrangement of a microphone array and its cable connections easily and accurately at various sound collection sites. [0089] FIGS. 11 and 12 are block diagrams showing the configuration of the object placement estimation apparatus according to the second embodiment. The object placement estimation apparatus according to the second embodiment has the same configuration as the object placement estimation apparatus according to the first embodiment, but differs in that only one external speaker SP (SP 1) is connected to the audio interface unit 250 including the audio output unit 251. Here, the speaker SP is, for example, a small, highly portable speaker (for example, an Audio-Technica AT-SPG50). [0090] 3−2. Operation in Microphone Placement Estimation In this embodiment, a predetermined acoustic wave is output using the single speaker SP 1; after the acoustic wave is output, the speaker SP 1 is moved, and predetermined acoustic waves are output at a plurality of positions. For each acoustic wave, the response waveform at each of the M microphones MC j (j: 1 to M) is detected, and the acoustic wave arrival time is measured. In this manner, in the present embodiment, by outputting acoustic waves from the speaker SP 1 at N positions, an N-dimensional feature vector is generated for each of the M microphones MC j (j: 1 to M) using scales that represent the closeness to the N reference points, as in the first embodiment. However, the number of speakers in the present embodiment is not limited to one, and may be plural.
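The measurement loop of the second embodiment differs from the first only in that the N rows of the arrival-time table come from N hand-held positions of one speaker rather than from N fixed speakers. A schematic sketch of that accumulation (the measurement function is a hypothetical placeholder that returns fake times instead of driving real hardware):

```python
import numpy as np

def measure_arrival_times(position, n_mics):
    """Hypothetical placeholder: drive the single speaker SP1 at `position`,
    record the M microphones, and return one row of arrival times.
    Here constant fake times stand in for a real measurement."""
    return np.full(n_mics, 0.01 * (position + 1))

N, M = 3, 4                        # N hand-held positions (reference points), M microphones
rows = [measure_arrival_times(pos, M) for pos in range(N)]
P = np.vstack(rows)                # same N x M arrival-time table as in the first embodiment
print(P.shape)                     # → (3, 4)
```

Once the table P is assembled, the feature vector, dissimilarity matrix, and MDS steps proceed exactly as in the first embodiment.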
[0091] The object placement estimation apparatus according to the present embodiment measures the arrival time of a predetermined acoustic wave from the single speaker SP 1 at each microphone MC j (j: 1 to M), and in this way measures the arrival times of the predetermined acoustic waves output from N positions at each microphone MC j (j: 1 to M). The N positions here correspond to the reference points described above. Then, the feature vector generation unit 4 generates the feature vectors p MCj (j: 1 to M) for the microphones MC j (j: 1 to M) as in the first embodiment. [0092] As in the first embodiment, the dissimilarity matrix deriving unit 5 derives the dissimilarity matrix D from the generated feature vectors p MCj (j: 1 to M), and the estimation unit 8 (the placement derivation unit 6 and the placement estimation result output unit 7) estimates the placement of the M microphones in real space from the dissimilarity matrix D and outputs the result. [0093] As described above, the object placement estimation apparatus according to the second embodiment is superior in portability to the object placement estimation apparatus according to the first embodiment in that it does not use the large-scale speaker array SPa, and it is also advantageous in that microphone placement estimation can be performed at various sound collection sites. [0094] 3−3. Results of Microphone Placement Estimation Experiment Hereinafter, the results of a placement estimation experiment on a plurality of microphones by the object placement estimation apparatus according to the present embodiment will be described. [0095] In this experiment, as shown in FIG. 13, the 80-channel microphone array MCa was arranged near the lower part of St. Mary's Cathedral, Tokyo, and a speaker SP 1 (Audio-Technica AT-SPG50), not shown, was held in the hand and moved to various positions while the acoustic wave was output.
In this experiment, the conditions for outputting and detecting the acoustic wave were a TSP length of 8192 [pnt], a TSP response length of 105600 [pnt], a sampling frequency of 48000 [Hz], and a quantization bit depth of 16 [bit]. [0096] The results of the experiment are shown in FIGS. 14A, 14B, and 14C. They show the result as viewed from directly above, from directly in front, and from directly beside (a direction rotated 90 degrees horizontally from directly in front), respectively. In each figure, the actual microphone position is indicated by a cross, and the placement estimation result is indicated by a circle. [0097] Further, for each microphone, the deviation between the actual position and the estimated position was determined, and the average value was taken as the error evaluation value [mm]. In this experiment, the error evaluation value, averaged over a plurality of experiments, was 13.5148 [mm]. From this experimental result, it was found that the object placement estimation apparatus according to the present embodiment can also output estimation results with sufficient accuracy for determining the placement of the microphones and the correctness of the cable connections. [0098] 3−4. Relationship between the Number of Acoustic Wave Outputs and the Estimation Error in Microphone Placement Estimation The relationship between the number of times the acoustic wave (impulse response waveform (TSP waveform)) is output (that is, the number of reference points described above) and the accuracy of the object placement estimation result in the object placement estimation method according to the present embodiment will now be described. [0099] In order to investigate the relationship between the number of times the acoustic wave was output from the speaker and the accuracy of the object placement estimation, a plurality of experiments was conducted while changing the number of outputs.
The position from which the acoustic wave is output changes with every output; that is, the number of outputs from the speaker corresponds to the number of reference points described above. FIG. 15 is a graph in which the results are plotted with the number of acoustic wave outputs on the horizontal axis and the above-mentioned error evaluation value of each estimation result on the vertical axis. From FIG. 15, it can be seen that also in the present embodiment the accuracy of the object placement estimation improves monotonically as the number of acoustic wave outputs used for the object placement estimation (the number of reference points described above) increases. In particular, the accuracy of the object placement estimation improves significantly until the number of outputs exceeds 10. From this, it was found that even at the sound collection site where content is actually created, the placement estimation result can be obtained with good accuracy according to the object placement estimation of this embodiment by setting the number of acoustic wave outputs (that is, the number of reference points described above) to about 10 or more. [0100] 4. Modifications of the Object Placement Estimation Apparatus Hereinafter, modifications of the object placement estimation apparatuses according to the first and second embodiments will be described. The first modification relates to another example of the feature vector generation method. The second modification relates to another example of the method of estimating the placement of an object from the dissimilarity matrix. Modification 1 and Modification 2 can be applied individually or simultaneously to the object placement estimation apparatuses of both the first and second embodiments. [0101] 4−1.
Modification 1 (Another Example of the Feature Vector Generation Method) Here, another example of the method of generating a feature vector will be described. In the embodiments described above, the feature vector is generated based on the times at which the acoustic wave emitted from the speaker located at a reference point reached the microphones (acoustic wave arrival times). In the present method, by contrast, the feature vector is determined based on the frequency amplitude characteristics of the output signals output from the microphones. [0102] FIG. 16 is a block diagram showing the configuration of a modified example of the object placement estimation apparatus. Components identical to those shown in FIG. 1 and elsewhere are given the same reference numbers, and their description is omitted. [0103] The modified example of the object placement estimation apparatus has a configuration in which the timekeeping unit 41 and the audio output unit 51 are omitted from the object placement estimation apparatus shown in FIG. 1. A speaker array SPa consisting of external speakers (SP 1, SP 2, ..., SP N) need not be connected to this apparatus. [0104] FIG. 17 is a block diagram clearly showing the functional blocks implemented by the computer main unit 10 of the object placement estimation apparatus 300. By reading out and executing the object placement estimation program stored in the ROM 12, the CPU 11 of the computer main unit 10 can operate as the control unit 1, the frequency amplitude characteristic calculation unit 303, the feature vector generation unit 304, the dissimilarity matrix derivation unit 5, the placement derivation unit (MDS unit) 6, and the placement estimation result output unit 7. The placement derivation unit (MDS unit) 6 and the placement estimation result output unit 7 constitute an estimation unit 8.
The operations of the control unit 1, the dissimilarity matrix derivation unit 5, the placement derivation unit (MDS unit) 6, and the estimation unit 8 may be the same as those described in the first and second embodiments, so their explanation is omitted. [0105] The frequency amplitude characteristic calculation unit 303 is realized by the CPU 11 executing the object placement estimation program. The frequency amplitude characteristic calculation unit 303 calculates the frequency amplitude characteristics of the output signals of the microphones (MC 1 to MC M) included in the microphone array MCa. [0106] The feature vector generation unit 304 is realized by the CPU 11 executing the object placement estimation program. The feature vector generation unit 304 receives the frequency amplitude characteristics calculated by the frequency amplitude characteristic calculation unit 303 and can generate an N-dimensional feature vector for each of the M microphones (objects). In the following, a detailed description will be given of how the feature vector generation unit 304 determines, based on the frequency amplitude characteristics, the difference between corresponding components of the feature vectors of any two of the M microphones (objects) (the difference p i,j − p i,k between feature vector components in equation (1), where k ≠ j and i is an arbitrary integer from 1 to N); from this description, those skilled in the art will also understand how the feature vector generation unit 304 can determine the feature vector of each microphone itself. [0107] Note that at least one of the control unit 1, the frequency amplitude characteristic calculation unit 303, the feature vector generation unit 304, the dissimilarity matrix derivation unit 5, the placement derivation unit (MDS unit) 6, and the placement estimation result output unit 7 may be realized by a dedicated hardware circuit. [0108] FIG. 18 is a schematic diagram showing three people hmn1 to hmn3 having a meeting indoors.
M microphones (MC 1 to MC M) are disposed in the room. The M microphones (MC 1 to MC M) are connected to the object arrangement estimation apparatus 300 via the audio interface unit 350 (neither shown; see FIG. 17). [0109] FIG. 19 is a flowchart of the processing for microphone arrangement estimation performed by the object arrangement estimation apparatus 300. [0110] The frequency amplitude characteristic calculation unit 303 of the object placement estimation apparatus 300 receives the output signals of the M microphones (MC 1 to MC M) through the audio interface unit 350. These output signals are the response signals of the microphones to the ambient environmental sound in the room. The frequency amplitude characteristic calculation unit 303 extracts from each output signal a portion in which the ambient sound includes a human voice (for example, a portion containing the voice "Hi!" of the speaker hmn1 in FIG. 18), converts each of the extracted output signals (time domain) of the M microphones (MC 1 to MC M) to the frequency domain, and calculates the frequency amplitude characteristic from each output signal (frequency domain) (step S101). Information on the frequency amplitude characteristic of the output signal of each of the microphones (MC 1 to MC M) is sent from the frequency amplitude characteristic calculation unit 303 to the feature vector generation unit 304. [0111] Based on the information on the frequency amplitude characteristics sent from the frequency amplitude characteristic calculation unit 303, the feature vector generation unit 304 calculates the difference between the frequency amplitude characteristics of the output signals for any combination of two microphones (MC j and MC k) (step S102).
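The description does not specify a concrete signal representation for steps S101 and S102. As a minimal sketch only, the frequency amplitude characteristics and their pairwise difference might be computed from dB magnitude spectra as follows (the function names and the FFT length are assumptions, not part of the disclosure):

```python
import numpy as np

def frequency_amplitude_characteristic(signal, n_fft=4096):
    """Frequency amplitude characteristic: magnitude spectrum (in dB) of a
    voiced segment extracted from one microphone's output signal."""
    spectrum = np.abs(np.fft.rfft(signal, n_fft))
    return 20.0 * np.log10(spectrum + 1e-12)  # guard against log(0)

def characteristic_difference(sig_j, sig_k, n_fft=4096):
    """Difference between the characteristics of microphones MC j and MC k
    (step S102), evaluated bin by bin on the frequency axis."""
    return (frequency_amplitude_characteristic(sig_j, n_fft)
            - frequency_amplitude_characteristic(sig_k, n_fft))
```

Working in dB turns the pairwise difference into a log-ratio of amplitudes, which is a common way to compare spectral shapes recorded at different overall levels.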
[0112] Based on the integral value obtained by integrating, over the frequency axis, the difference between the two frequency amplitude characteristics thus obtained, the feature vector generation unit 304 determines the dissimilarity of the positions of the two microphones relative to the speaker (reference point), that is, the difference between the measures of the two microphones on a scale that represents the closeness to the reference point (p i,j - p i,k in equation (1), where k: k ≠ j and i is an arbitrary integer from 1 to N). [0113] FIG. 20 is a schematic view showing the frequency amplitude characteristics of the output signals of the microphones (MC 1 to MC M). FIG. 20(a) shows the frequency amplitude characteristic of the output signal from the microphone MC 1 with respect to the ambient environmental sound, including the voice uttered by the person hmn1, in the room shown in FIG. 18. Similarly, FIGS. 20(b) and 20(c) show the frequency amplitude characteristics of the output signals from the microphones MC j and MC M for the same ambient sound including the same voice. In each frequency amplitude characteristic, the formants of the voice uttered by the person hmn1 appear superimposed on the component BG of the noise ubiquitously present in the sound collection environment (such as reverberation noise in a room or the noise of a crowd outdoors). Here, the center frequency of the first formant F1 is f1, and the center frequencies of the formants after the second formant are shown as f2, f3, and f4, respectively. [0114] As can be seen from FIGS. 20(a) and 20(b), the noise component BG shows almost the same profile in each output signal, while the formant components of the human voice deviate further from the shape of the original formants in the frequency amplitude characteristic as the microphone moves away from the person.
The feature vector generation unit 304 can thus obtain the difference in proximity to the speaker (reference point) for two microphones from the difference in the shapes of the frequency amplitude characteristics of their output signals. [0115] The feature vector generation unit 304 integrates the difference between the frequency amplitude characteristics of the output signals of the two microphones (MC j, MC k, k: k ≠ j) over the frequency axis (step S103). The integral value obtained here is the difference between the proximities of the microphone MC j and the microphone MC k to the reference point (speaker), that is, the difference between the corresponding components of the feature vectors of the two microphones (MC j, MC k) with respect to that reference point (p i,j - p i,k in equation (1), where k: k ≠ j and i is an arbitrary integer from 1 to N). [0116] As a matter of course, the feature vector generation unit 304 can also obtain the components of each feature vector themselves from the differences, obtained in this way, between the components of the feature vectors of the two microphones related to the speaker (reference point). [0117] As described above, in step S103, the feature vector generation unit 304 determines, for every pair of the microphones (MC 1 to MC M), the dissimilarity of the positions of the two microphones with respect to each reference point (the difference between the corresponding components of their feature vectors). [0118] Then, in step S104, the dissimilarity matrix deriving unit 5 derives the dissimilarity matrix D (equation (3)) based on the differences between corresponding components of every pair of feature vectors obtained by the feature vector generation unit 304. [0119] The feature vector generation unit 304 may instead obtain the feature vector of each microphone from the integral values obtained in step S103 and output the feature vectors to the dissimilarity matrix derivation unit 5.
In that case, the dissimilarity matrix deriving unit 5 may derive the dissimilarity matrix in step S104 in the same manner as in step S7 of the previous embodiment. [0120] The processes in steps S105 and S106 are the same as those described in the previous embodiment (steps S8 and S9 in FIG. 4), and their description is therefore omitted here. [0121] As in the previous embodiment, three or more reference points are required. Therefore, the object arrangement estimation apparatus collects, with the M microphones, voices uttered by a speaker at N (N is 3 or more) different positions, and derives the dissimilarity matrix D using the output signals produced by the microphones (step S104). The persons who speak at the N (N is 3 or more) positions need not be the same person. [0122] The feature vector generation unit 304 may also generate the feature vectors based on the information on the frequency amplitude characteristics sent from the frequency amplitude characteristic calculation unit 303 as follows. First, the feature vector generation unit 304 identifies the formants of the speaker in the output signal of each of the microphones (MC 1 to MC M). Then, from the ratio (expressed in dB, for example) of the amplitude of the peak of a specific formant (for example, the first formant F1 with center frequency f1) appearing in the frequency amplitude characteristic of the output signal of each of the microphones (MC 1 to MC M) to the amplitude of the peak of that formant appearing in the frequency amplitude characteristic of the output signal of any one microphone (for example, the amplitude A1f1 of the microphone MC 1 shown in FIG. 20(a)), a measure may be determined for each of the microphones (MC 1 to MC M) on a scale that represents the closeness to the reference point (person hmn1).
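As an illustrative sketch of this formant-ratio measure: the text gives only the example that a -6 dB ratio maps the reference microphone MC 1 to measure 1 and microphone MC M to measure 2, so the 6 dB step per unit of the scale and the function names below are assumptions:

```python
import numpy as np

def formant_ratio_db(char_m, char_ref, f1_bin):
    """Ratio (in dB) of the formant peak in microphone m's frequency amplitude
    characteristic to the same peak in the reference microphone's
    characteristic; both characteristics are assumed to be given in dB."""
    return char_m[f1_bin] - char_ref[f1_bin]

def closeness_measure(ratio_db, step_db=6.0):
    """Map the dB ratio to a measure on the scale of closeness to the
    reference point: 0 dB -> 1, -6 dB -> 2, -12 dB -> 3, ... (assumed step)."""
    return 1 + int(round(-ratio_db / step_db))
```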
[0123] For example, if the ratio of the peak amplitude AMf1 of the first formant F1 in the frequency amplitude characteristic of the output signal of the microphone MC M to the peak amplitude A1f1 of the first formant F1 in the frequency amplitude characteristic of the output signal of the microphone MC 1, taken as the arbitrary reference microphone, is -6 dB, then the measure for the microphone MC 1 on the scale representing the closeness to the person hmn1 as a reference point may be set to, for example, 1, and the measure for the microphone MC M on that scale to 2. [0124] As described above, the feature vector generation unit 304 can also determine the feature vector of each of the microphones (MC 1 to MC M) based on a specific frequency component of the frequency amplitude characteristics. [0125] As described above, in the present modification, the object placement estimation apparatus 300 does not have to output a specific acoustic wave. In addition, the present modification is particularly suitable for estimating the arrangement of objects in a room, or in a crowd, whose acoustic characteristics provide rich reverberation. [0126] Also in this modification, as in the previous embodiment, the positional relationship of a plurality of persons can be estimated in the same way that the arrangement of loudspeakers is estimated. That is, also in this modification, the persons who have uttered voices can be taken as the arrangement estimation targets. [0127] 4-2. Modification 2 (Another Example of the Object Placement Estimation Method) Here, another example of the object placement estimation method based on the dissimilarity matrix will be described. In the embodiments described above, the estimation unit 8 (FIG. 2 and the like) including the placement derivation unit 6 estimates the placement of the objects by applying the MDS method to the dissimilarity matrix.
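A minimal sketch of this step, assuming classical (Torgerson) MDS with NumPy; the text prescribes the MDS method but not a particular variant:

```python
import numpy as np

def classical_mds(D, dims=3):
    """Embed M objects in `dims`-dimensional space from an M x M
    dissimilarity matrix D via classical multidimensional scaling."""
    M = D.shape[0]
    J = np.eye(M) - np.ones((M, M)) / M       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)      # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:dims]  # keep the largest `dims`
    scale = np.sqrt(np.maximum(eigvals[order], 0.0))
    return eigvecs[:, order] * scale          # M x dims coordinates
```

The recovered coordinates are determined only up to translation, rotation, and reflection, which suffices here because the apparatus estimates the relative arrangement of the objects.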
However, the placement of the objects can also be estimated by methods other than the MDS method. [0128] The placement derivation unit 6 (FIG. 2 and the like) may obtain the placement (an approximate solution) by, for example, numerically solving a so-called combinatorial optimization problem with a full search method. That is, the placement derivation unit 6 (FIG. 2 and the like) may evaluate, based on the dissimilarity matrix, the fitness as an approximate placement solution of every possible placement of the plurality of objects (for example, the M microphones) (the approximate placement solution candidates), and output the approximate placement solution candidate with the highest rating as the placement estimation result. [0129] Alternatively, the placement derivation unit 6 (FIG. 2 and the like) may obtain the placement (an approximate solution) by numerically solving the combinatorial optimization problem with a local search method using, for example, a so-called genetic algorithm. That is, the placement derivation unit 6 (FIG. 2 and the like) may evaluate, based on the dissimilarity matrix, the fitness as an approximate placement solution of some of the possible placements of the plurality of objects (for example, the M microphones) (the approximate placement solution candidates), and output, as the placement estimation result, the candidate with the highest rating among the evaluated candidates. [0130] As in the above-described embodiments, also in the present modification, information on the positions of the arrangement estimation targets and the positions of the reference points is not essential for estimating the arrangement of the objects.
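A toy sketch of the full search variant for a small M, assuming the fitness of a candidate placement is scored by how close its distance matrix is to being proportional to the dissimilarity matrix (the correlation coefficient is used here as the rating; the text does not fix the exact criterion):

```python
import itertools
import numpy as np

def fitness(candidate, D):
    """Rate a candidate placement (M x 3 coordinates): correlation between its
    pairwise distance matrix and the dissimilarity matrix D."""
    d = np.linalg.norm(candidate[:, None, :] - candidate[None, :, :], axis=-1)
    iu = np.triu_indices(len(candidate), k=1)  # upper-triangle pairs only
    return np.corrcoef(d[iu], D[iu])[0, 1]

def full_search(candidate_points, M, D):
    """Exhaustively assign the M objects to candidate points and keep the
    assignment with the highest fitness."""
    best, best_fit = None, -np.inf
    for combo in itertools.permutations(candidate_points, M):
        c = np.asarray(combo)
        f = fitness(c, D)
        if f > best_fit:
            best, best_fit = c, f
    return best
```

A local search would evaluate the same fitness, but only over candidates proposed step by step by, for example, a genetic algorithm, instead of over every permutation.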
However, when the arrangement of the objects is estimated with a full search or local search method as in the present modification, conditions on the positions where the arrangement estimation targets or the reference points may exist can be set in advance. By reducing the number of possible object placements (approximate placement solution candidates) in accordance with these conditions, the placement derivation unit 6 can speed up the derivation of the approximate placement solution based on the dissimilarity matrix. [0131] In the following, methods that are effective when the approximate solution is obtained numerically from the dissimilarity matrix with a full search or local search method will be described. [0132] By setting a minimum distance between two adjacent arrangement estimation targets, the positions where the objects may exist can be discretized. Setting this minimum interval d min as a condition on the positions where the arrangement estimation targets may exist reduces the number of possible approximate placement solution candidates and accelerates the derivation of the approximate placement solution. In addition, by using, for any one reference point, the distance to the object closest to it and the distance to the object farthest from it to limit the spatial range in which the objects may exist, the number of candidate solutions can be reduced dramatically. [0133] In Embodiments 1 and 2 described above, the time at which the acoustic wave emitted from the speaker located at a reference point reached each microphone (the acoustic wave arrival time) is specified, and a feature vector is generated from it.
In this modification, the time at which the acoustic wave was emitted from the speaker located at the reference point is additionally specified, so that the time taken for the acoustic wave to reach each microphone (the acoustic wave propagation time) can be obtained. [0134] Among the acoustic wave propagation times from the speaker at a certain reference point to the respective objects (microphones), the microphone that recorded the shortest propagation time is the microphone closest to that reference point, and the microphone that recorded the longest propagation time is the microphone farthest from it. Here, taking the product of the shortest acoustic wave propagation time and the speed of sound as the minimum distance R min, and the product of the longest acoustic wave propagation time and the speed of sound as the maximum distance R max, the positions where the placement targets (microphones) may exist are limited to the range of distances from the reference point of at least R min and at most R max. [0135] FIG. 21 is a diagram showing the object position candidate points CD (x marks in the figure) in the case where the minimum distance d min between objects, the minimum distance R min from a reference point, and the maximum distance R max are given as the conditions on the positions where the arrangement estimation targets may exist. The object position candidate points CD are distributed, with the minimum interval d min, outside a sphere sph1 of radius R min centered on a certain reference point (the speaker in the figure) and inside a sphere sph2 of radius R max centered on the same reference point. In this case, the placement derivation unit 6 (FIG. 2 and the like) evaluates, based on the dissimilarity matrix, the fitness as an approximate placement solution of each approximate placement solution candidate formed by selecting as many candidate points as there are objects (M) from among these object position candidate points CD, and may take the candidate for which a good evaluation is obtained as the approximate placement solution. When the full search method is used, the fitness may be evaluated for all possible approximate placement solution candidates. When a local search method is used, it suffices to select the approximate solution candidates to be evaluated according to a known algorithm (a genetic algorithm or the like). [0136] The fitness may be evaluated as follows. First, for the approximate placement solution candidate to be evaluated, the distances between the objects are obtained by calculation, and a distance matrix whose elements are the distances between the objects is derived from the calculation results. Next, the fitness can be evaluated by evaluating the similarity between the distance matrix thus calculated and the dissimilarity matrix. In other words, by rating more highly a distance matrix whose relation to the dissimilarity matrix is closer to a proportional one, the fitness of the approximate placement solution candidate can be evaluated. [0137] As a condition on the positions where the arrangement estimation targets may exist, a condition on the arrangement form of the objects can be added. FIG. 22 is a diagram showing the object position candidate points CD (x marks in the figure) in the case where the condition that the microphones, which are the objects, constitute a linear microphone array is added. In this case, the object position candidate points CD are distributed only on the straight line L tangent to the sphere sph1 at the candidate point CD near.
Further, it is extremely likely that the microphone with the shortest acoustic wave propagation time and the microphone with the longest acoustic wave propagation time are located at the candidate point CD near and at the candidate point CD far on the sphere sph2, respectively. Therefore, the derivation of the approximate placement solution can be sped up by selecting approximate placement solution candidates having such a microphone arrangement and performing the local search. In addition, the derivation of the approximate placement solution can be sped up further by placing, at candidate points near the candidate point CD near, other microphones whose measures on the scale representing the closeness to the reference point are similar to that of the microphone at the candidate point CD near, and then performing the local search. The same applies to candidate points near the candidate point CD far. [0138] FIG. 23 is a diagram showing the object position candidate points CD (x marks in the figure) in the case where the condition that the microphones, which are the objects, constitute a planar microphone array is added. In this case, the object position candidate points CD are distributed only on the circle C tangent to the sphere sph1 at the candidate point CD near. Further, it is extremely likely that the microphone with the shortest acoustic wave propagation time and the microphone with the longest acoustic wave propagation time are located at the candidate point CD near and at the candidate point CD far on the sphere sph2, respectively. Therefore, the derivation of the approximate placement solution can be sped up by selecting approximate placement solution candidates having such a microphone arrangement and performing the local search. [0139] FIG. 24 is a diagram showing the object position candidate points CD (x marks in the figure) in the case where the condition that the microphones, which are the objects, constitute a square microphone array is added. In this case, the object position candidate points CD are distributed only on the square SQ inscribed in the circle C tangent to the sphere sph1 at the candidate point CD near. Further, it is extremely likely that the microphone with the shortest acoustic wave propagation time and the microphone with the longest acoustic wave propagation time are located at the candidate point CD near and at the candidate point CD far on the sphere sph2, respectively. Therefore, the derivation of the approximate placement solution can be sped up by selecting approximate placement solution candidates having such a microphone arrangement and performing the local search. [0140] FIG. 25 is a diagram showing the object position candidate points CD (x marks in the figure) in the case where the condition that the microphones, which are the objects, constitute a spherical microphone array is added. In this case, the object position candidate points CD are distributed only on the surface of the sphere sph3 that circumscribes the sphere sph1 at the candidate point CD near and is inscribed in the sphere sph2 at the candidate point CD far. Further, it is extremely likely that the microphone with the shortest acoustic wave propagation time and the microphone with the longest acoustic wave propagation time are located at the candidate point CD near and at the candidate point CD far, respectively. Therefore, the derivation of the approximate placement solution can be sped up by selecting approximate placement solution candidates having such a microphone arrangement and performing the local search. [0141] 5.
Conclusion The object placement estimation apparatus according to the embodiments of the present invention can estimate the placement of objects without measuring the distances between the placement estimation targets. Instead of using the distances between the placement estimation targets, the object arrangement estimation apparatus according to the embodiments of the present invention generates, for each object, an N-dimensional feature vector representing the feature of the position of that object in the real space, based on measures obtained with respect to N (N: an integer of 3 or more) reference points that can be selected arbitrarily and independently of the positions of the objects. A dissimilarity matrix is then derived from the feature vectors, and the arrangement of the objects in the (three-dimensional) real space is derived from the dissimilarity matrix. Therefore, in the embodiments of the present invention, since it is not necessary to measure the distances between the placement estimation targets, the placement of the objects can be estimated easily and accurately in various situations. In the embodiments of the present invention, by increasing the number N (N: an integer of 3 or more) of reference points, which can be selected arbitrarily and independently of the positions of the objects, the number of dimensions of the feature vector representing the feature of the position of each object in the real space can be increased, and with the increase in the number of dimensions, the accuracy of the arrangement estimation can be improved. [0142] The embodiments of the present invention are useful, for example, as a device for easily and accurately confirming the arrangement and cable connection of the microphones in a multi-channel sound collection system.
[0143] The embodiments of the present invention are useful, for example, as a device for easily and accurately confirming the speaker arrangement and cable connection in a multi-channel sound field reproduction system. [0144] The embodiments of the present invention can also estimate the layout of a plurality of notebook PCs using the microphones and speakers built into the notebook PCs. [0145] The embodiments of the present invention can also be used as an apparatus for simply and accurately confirming the arrangement and cable connection of each microphone of a microphone array for speech recognition. [0146] In the above embodiments, each component of the feature vector representing the feature of the position of an object in the real space is generated as the time at which the acoustic wave arrives from a given reference point. That is, in those embodiments, each component of the feature vector is a quantity having the dimension of time. However, feature vectors can also be constructed using observables having dimensions other than time. For example, the feature vector can be configured based on a quantity reflecting the shape of the reverberation component of the response waveform detected at each microphone, that is, a quantity representing the relative relationship between the direct sound and the non-direct sound in the response waveform. In this case, the dissimilarity matrix is constructed using data representing the (dis)similarity of the response waveforms detected at each pair of microphones. The object placement estimation apparatus may obtain a cross-correlation for each element of the dissimilarity matrix, and estimate the placement of the placement estimation targets in the real space based on the obtained cross-correlations.
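The exact waveform-similarity measure is not specified in the text; one plausible sketch scores the dissimilarity of two response waveforms from the peak of their normalized cross-correlation (the normalization and the 1-minus-peak mapping are assumptions):

```python
import numpy as np

def waveform_dissimilarity(resp_j, resp_k):
    """Dissimilarity of two microphone response waveforms: 1 minus the peak of
    their normalized cross-correlation (identical waveforms score ~0)."""
    rj = (resp_j - resp_j.mean()) / (resp_j.std() + 1e-12)
    rk = (resp_k - resp_k.mean()) / (resp_k.std() + 1e-12)
    xcorr = np.correlate(rj, rk, mode="full") / len(rj)
    return 1.0 - xcorr.max()
```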
[0147] In addition, the object placement estimation apparatus may collect ambient environmental sound including human voices using the M microphones and generate the feature vectors based on the frequency amplitude characteristics of the output signals from the respective microphones. By comparing the shapes of the frequency amplitude characteristics of the output signals of the microphones (for example, by integrating the difference of the frequency amplitude characteristics over the frequency axis), the difference in the relative proximity of the microphones to the speaker can be quantified. Alternatively, the components of the feature vector may be determined based on, for example, the ratio of the amplitudes of a specific frequency component (a frequency at which a formant of the human voice appears) of the frequency amplitude characteristics of the output signals from the respective microphones. That is, based on the amplitude of the formant of the human voice extracted from the frequency amplitude characteristic of each output signal, the object placement estimation apparatus can determine the components of the feature vector by relatively evaluating, among the M microphones, the proximity between the microphone that produced the output signal and the person who uttered the voice. Such a feature vector generation method is convenient for estimating an arrangement in a room with rich reverberation characteristics or in a crowd. It is also advantageous when the M microphones are arranged over a relatively wide area in a room. [0148] In addition, the object placement estimation apparatus according to the embodiments of the present invention may estimate the object placement using waves such as light or electromagnetic waves instead of acoustic waves.
In that case, the object placement estimation apparatus includes, for example, a light emitting element array and a light receiving element array, or two sets of antenna arrays, and can estimate the arrangement of the light emitting element array (or of the light receiving element array, or of one set of antenna arrays) by detecting the waves from the light emitting element array (or from one set of antenna arrays) at the light receiving element array (or at the other set of antenna arrays). [0149] Further, the object placement estimation apparatus according to the embodiments of the present invention may estimate the object placement using surface waves propagating on the surface of an object instead of acoustic waves. In that case, the object arrangement estimation apparatus comprises, for example, two sets of transducer arrays for converting electrical energy into vibrational energy, and can estimate the arrangement of one set of transducer arrays by detecting the surface wave from one transducer array at the other. [0150] The present invention can be used, for example, to confirm the arrangement and cable connection of a plurality of microphones at a sound collection site. [0151] Further, the estimation unit of the object placement estimation apparatus may numerically obtain an approximate placement solution with a full search or local search method instead of applying the MDS method to the dissimilarity matrix, and may estimate and output the arrangement of the M microphones in the real space from the obtained approximate solution.
[0152] Reference Signs List 1: Control unit; 2: Impulse generation unit (TSP generation unit); 3: Response detection unit; 4: Feature vector generation unit; 5: Dissimilarity matrix derivation unit; 6: Placement derivation unit (MDS unit); 7: Arrangement estimation result output unit; 8: Estimation unit; 10: Main computer unit; 11: CPU; 12: ROM; 13: RAM; 21: HDD; 30: User interface unit; 31: Display; 32: Keyboard; 33: Mouse; 41: Timekeeping unit; 50: Audio interface unit; 51: Audio output unit; 52: Audio input unit; 250: Audio interface unit; 251: Audio output unit; 303: Frequency amplitude characteristic calculation unit; 304: Feature vector generation unit (modification); 350: Audio interface unit; MCa: Microphone array; MCj: j-th microphone; SPa: Speaker array; SPi: i-th speaker