Patent Translate, powered by EPO and Google

Notice: This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate, complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or financial decisions, should not be based on machine-translation output.

DESCRIPTION JP2008164747

To tune the voice acquisition unit of a voice recognition robot efficiently. A voice recognition robot 100 includes a microphone array 180 functioning as a voice acquisition unit, a speaker 170 functioning as a voice output unit, a robot arm 160 on which the speaker 170 is mounted, and a tuning unit that performs tuning on the microphone array 180. The tuning unit has a tuning control unit that causes the speaker 170 to output a reference voice at the time of tuning, and a tuning execution unit that executes tuning using the response of the voice acquisition unit to the reference voice. [Selected figure] FIG. 1

Voice recognition robot

[0001] The present invention relates to a voice recognition robot, and more particularly to a voice recognition robot having a function of tuning its voice acquisition unit.

[0002] Robots are expanding their scope beyond industrial and production sites into offices and homes. In industrial and production settings, they substitute for or support various difficult tasks; in the home, they emulate the movement mechanisms and emotional expression of relatively intelligent walking animals such as humans and pets, and coexist with humans. Many of these robots do not merely operate in a predetermined pattern as conventional robots do, but are so-called autonomous robots that analyze the situation and act accordingly. One example of an autonomous robot is a voice recognition robot, which has a voice recognition function, analyzes voice instructions, and performs the instructed operation.
[0003] In the environment where a voice recognition robot is placed, other sounds (noise) are usually mixed with the voice of the instruction. To prevent the voice recognition robot from malfunctioning, the voice of the instruction must be received correctly, and various attempts have been made to achieve this.

[0004] Patent Document 1 discloses a technique for making the target voice easier to receive than noise. According to this technique, when a voice instruction is given, the voice recognition robot estimates the direction of the sound source of the instruction and moves in the estimated direction to approach the sound source, so that the voice of the instruction is received more strongly than other sounds.

[0005] Patent Document 2 discloses a technique in which the robot itself diagnoses whether the part that acquires voice (hereinafter referred to as the voice acquisition unit) is operating properly. Specifically, the left and right robot arms are used to generate a contact sound or striking sound near the voice acquisition unit, for example a microphone, and it is checked whether the microphone can pick up the sound. If the check fails, the robot notifies the user that there is a problem with the microphone by a gesture such as pointing at the microphone with a robot arm while shaking its head.

Patent Document 1: JP 2006-181651 A. Patent Document 2: JP 2002-144260 A.

[0006] The technique disclosed in Patent Document 1 presupposes that the voice acquisition unit is operating correctly. Whether the voice acquisition unit is operating correctly can be expressed, for example, by whether it exhibits the ideal directivity pattern as designed. Tuning the voice acquisition unit so that it exhibits the ideal directivity pattern is usually performed by an engineer.
[0007] The technique disclosed in Patent Document 2 can notify the user that there is a problem, but the tuning needed to solve the problem is still left to the user or an engineer.

[0008] At the development stage of a voice recognition robot, the number of units is small, so tuning the voice acquisition unit does not burden engineers much. At the mass-production stage, however, automation is required because of man-hours and cost.

[0009] Tuning of the voice acquisition unit is also necessary after the voice recognition robot reaches the user. If an engineer visits the site of use every time tuning is performed, the user bears a burden in money and time. Users therefore also want the voice acquisition unit of the voice recognition robot to be tuned automatically.

[0010] Tuning a voice acquisition unit such as a microphone requires a reference sound source that outputs a reference voice. It is no exaggeration to say that whether tuning can be automated depends on whether the reference sound source can be provided effectively. Consider a method in which a reference sound source is installed at a predetermined place and the voice recognition robot is moved close to it at the time of tuning. If a tuning function is implemented in the main body of the voice recognition robot, the voice acquisition unit can be tuned automatically once the robot has moved close to the reference sound source. However, because this method involves moving the robot, it increases the number of steps at the production stage and is not efficient. It also requires a place to install the reference sound source.

[0011] For tuning after delivery to the user, installing and maintaining the reference sound source in the actual use environment is also a burden on the user.
[0012] The present invention has been made in view of the above circumstances, and realizes automatic tuning of the voice acquisition unit of a voice recognition robot efficiently and conveniently.

[0013] A voice recognition robot according to the present invention includes a voice acquisition unit, a voice output unit, a mounting unit on which the voice output unit is mounted, and a tuning unit that performs tuning on the voice acquisition unit. The tuning unit has a tuning control unit that causes the voice output unit to output a reference voice when tuning is performed, and a tuning execution unit that performs tuning using the response of the voice acquisition unit to the reference voice. According to the present invention, because the voice recognition robot is provided with the voice output unit and the tuning unit, at the time of tuning the tuning unit can cause the voice output unit to output the reference voice and can tune the voice acquisition unit using its response to the reference voice. There is therefore no need to secure an installation place for a reference sound source or to move the voice recognition robot for tuning, which is convenient and efficient.

[0014] The present invention can be applied to a voice recognition robot in which the voice acquisition unit is a microphone array including a plurality of microphones. In this case, the tuning execution unit executes calibration to eliminate variations in the sensitivity characteristics of the plurality of microphones.
[0015] Preferably, the mounting unit in the voice recognition robot according to the present invention includes a displacement unit configured so that the voice output unit mounted on it can be displaced relative to the voice acquisition unit, and the tuning control unit causes the voice output unit to output the reference voice after displacing the displacement unit so that the relative positional relationship between the voice acquisition unit and the voice output unit becomes a predetermined positional relationship. With such a configuration, the voice output unit can, for example, be moved to a position suitable for tuning the voice acquisition unit.

[0016] The displacement unit preferably comprises a robot arm having one or more joints and a drive unit that drives the robot arm. A voice recognition robot usually has a robot arm, and if the voice output unit is attached to it, there is no need to provide a separate mechanism for mounting the voice output unit.

[0017] Preferably, the tuning control unit displaces the displacement unit a plurality of times so that the relative positional relationship between the voice acquisition unit and the voice output unit takes each of a plurality of different predetermined positional relationships, and causes the voice output unit to output the reference voice each time the relative positional relationship reaches one of the predetermined positional relationships. The tuning execution unit performs tuning each time the reference voice is output from the voice output unit and combines the results of the multiple tunings. With such a configuration, a better tuning effect can be obtained, particularly when the voice acquisition unit is a microphone array.
[0018] Furthermore, it is preferable that the tuning control unit causes the voice output unit to sequentially output a plurality of reference voices having mutually different frequencies, and that the tuning execution unit executes tuning for each frequency of the reference voice. Because characteristics such as the sensitivity of the voice acquisition unit may differ depending on the frequency of the received voice signal, performing tuning for each frequency in this way improves the tuning accuracy.

[0019] Expressions of the apparatus described above as a method, a system, or a program are also within the scope of the present invention.

[0020] According to the technique of the present invention, automatic tuning of the voice acquisition unit of a voice recognition robot can be realized efficiently and conveniently.

[0021] Hereinafter, embodiments of the present invention will be described with reference to the drawings. FIG. 1 shows a voice recognition robot 100 according to an embodiment of the present invention. As shown, the voice recognition robot 100 includes a head 110, a body 120 (hereinafter referred to as the main body 120), wheels 150, and a robot arm 160. The main body 120 is provided with a microphone array 180. A speaker 170 for outputting sound is provided at the tip of the robot arm 160.

[0022] The head 110 is equipped with, for example, a CCD camera that functions as the eye of the voice recognition robot 100. The wheels 150 move the voice recognition robot 100 by rotating. The robot arm 160 has a shoulder joint. The main body 120 incorporates a drive unit for driving the robot arm 160, and the robot arm 160 can be rotated by the drive unit. The operation of these functional blocks is controlled by a control unit (not shown) incorporated in the main body 120.
Hereinafter, this control unit is referred to as the normal control unit to distinguish it from the tuning control unit described later.

[0023] The microphone array 180 functions as the "ear" of the voice recognition robot 100, that is, as the voice acquisition unit, and acquires voice signals.

[0024] The main body 120 also includes a tuning unit for tuning the microphone array 180. In the following description and drawings, this tuning unit is given reference numeral 130.

[0025] FIG. 2 shows the microphone array 180 and the tuning unit 130. As illustrated, the microphone array 180 includes a voice input unit 182, formed by arranging a plurality of microphones (three in the illustrated example: 182a, 182b, and 182c), and a microphone array processing unit 184. The microphone array processing unit 184 includes: an AD converter 186 that obtains a digital voice signal by performing analog-to-digital (AD) conversion on the voice signal input by each microphone of the voice input unit 182; a correction unit 188 that corrects the digital voice signal of each microphone obtained by the AD converter 186; a frequency conversion unit 190 that converts the digital voice signal of each microphone corrected by the correction unit 188 to the frequency domain by FFT; a voice enhancement unit 192 that performs voice enhancement processing on the frequency-converted digital voice signal of each microphone; a noise estimation unit 194 that obtains a noise signal by performing noise estimation using the frequency-converted digital voice signal of each microphone; and a feature amount acquisition unit 196 that obtains the target voice signal by subtracting the noise signal estimated by the noise estimation unit 194 from the signal obtained by the voice enhancement unit 192, and extracts and outputs a feature amount for voice recognition from the target voice signal.
The feature amount obtained by the feature amount acquisition unit 196 is input to a voice recognition processing unit (not shown) provided in the main body 120 of the voice recognition robot 100, where voice recognition processing is performed. The result of the voice recognition processing is output to the normal control unit described above, which causes one or more corresponding functional blocks of the voice recognition robot 100 to perform the corresponding operation.

[0026] The voice enhancement unit 192 mainly performs DS (Delay-and-Sum) processing. Because the voice input unit 182 includes a plurality of microphones, the time at which the voice signal from the sound source arrives differs from microphone to microphone. This is explained using the schematic diagram of FIG. 3.

[0027] As shown in FIG. 3, three microphones 20a, 20b, and 20c are arranged at an interval d. The angle between the sound source 10 and each microphone is θ. In the case shown, the times at which the voice reaches adjacent microphones differ by L / (speed of sound), where L = d × sin θ. The voice signals acquired by the three microphones are therefore out of phase, and their phases must be aligned.

[0028] In the DS processing by the voice enhancement unit 192, the voice signals acquired earlier are successively delayed according to the timing differences among the voice signals obtained from the microphones of the voice input unit 182, so that all the voice signals have the same phase, and the phase-aligned voice signals are then added. Because the DS processing determines the amount by which to delay the voice signal of each microphone from the estimated position of the sound source, directional noise is removed and the target voice is obtained.
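The delay-and-sum step of [0026] to [0028] can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the speed of sound (343 m/s), the microphone spacing, the sampling rate, the test tone, and the function names `arrival_delays` and `delay_and_sum` are all hypothetical, and the delays are applied as frequency-domain phase shifts (a circular shift), which is exact only for signals periodic in the analysis window.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not given in the source


def arrival_delays(num_mics, spacing, theta):
    """Arrival delay of each microphone relative to microphone 0: i * d * sin(theta) / c."""
    return np.arange(num_mics) * spacing * np.sin(theta) / SPEED_OF_SOUND


def delay_and_sum(signals, delays, sample_rate):
    """Delay the earlier channels so all match the latest one, then average them."""
    n = signals.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
    # Extra delay each channel needs to line up with the last-arriving channel.
    compensation = delays.max() - delays
    spectra = np.fft.rfft(signals, axis=1)
    aligned = spectra * np.exp(-2j * np.pi * freqs * compensation[:, None])
    return np.fft.irfft(aligned.mean(axis=0), n=n)


# Three microphones 5 cm apart, source at 30 degrees, 2 kHz tone (all assumed).
mic_delays = arrival_delays(3, 0.05, np.deg2rad(30.0))
t = np.arange(1024) / 16000.0
signals = np.array([np.sin(2 * np.pi * 2000.0 * (t - tau)) for tau in mic_delays])

steered = delay_and_sum(signals, mic_delays, 16000.0)  # phases aligned before summing
plain = signals.mean(axis=0)                           # summed without alignment
```

With the phases aligned, the averaged output keeps the full amplitude of the tone, whereas the unaligned average is attenuated because the channels partially cancel.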
The DS processing thus has the effect of emphasizing the target voice.

[0029] The noise estimation unit 194 mainly performs NBF (Null-Beam-Former) processing. NBF processing emphasizes signals from directions other than the direction of the target voice, forming a null (blind spot) in the direction of the target voice. The noise estimation unit 194 thereby obtains the diffuse noise.

[0030] The feature amount acquisition unit 196 then subtracts the noise signal obtained by the noise estimation unit 194 from the voice signal obtained by the voice enhancement unit 192, thereby removing the diffuse noise and obtaining the target voice.

[0031] FIG. 4 shows an example of an ideal directivity pattern of a microphone array. The angle in the figure corresponds to θ in FIG. 3. To obtain the target voice signal correctly and improve the accuracy of voice recognition, the microphone array is designed to come as close as possible to this ideal directivity pattern.

[0032] However, the sensitivity of a microphone always contains some error. Inexpensive microphones currently on the market are known to have a sensitivity error of 2 decibels (dB) or more. Therefore, even if a microphone array is built from microphones of the same model, variation in sensitivity among the microphones cannot be avoided.

[0033] Variation in sensitivity among the microphones particularly affects the noise estimation processing. If the accuracy of the noise estimation processing decreases, the directivity pattern of the microphone array is distorted, and the accuracy of voice recognition also decreases.

[0034] The correction unit 188 is for eliminating variations in the sensitivity characteristics of the microphones of the voice input unit 182, and corrects the voice signal from each microphone with a correction filter set for that microphone.
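Why the sensitivity variation of [0032] and [0033] matters for null forming can be illustrated with a small calculation. The model below is a deliberately simplified assumption and not the patent's NBF: two phase-aligned channels are subtracted to cancel the target, and one channel has a gain error of `mismatch_db`. The function name `null_leakage_db` is hypothetical.

```python
import math


def null_leakage_db(mismatch_db):
    """Level of the target that survives subtraction of two aligned channels
    whose gains differ by mismatch_db, relative to the target itself."""
    gain = 10 ** (mismatch_db / 20)
    residual = abs(1.0 - gain)
    return 20 * math.log10(residual) if residual > 0 else float("-inf")


leak_2db = null_leakage_db(2.0)    # uncorrected 2 dB sensitivity error
leak_01db = null_leakage_db(0.1)   # after correction to within 0.1 dB
```

Under this model a 2 dB mismatch leaves the target only about 11.7 dB below its original level in the supposed null, while a residual mismatch of 0.1 dB deepens the null to below -38 dB, which is consistent with the source's point that sensitivity variation degrades noise estimation.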
In addition, because the sound-receiving characteristics of each microphone may change with the passage of time or with changes in the environment in which the voice recognition robot 100 is placed, the microphone array 180 must be tuned from time to time.

[0035] The tuning unit 130 is for tuning the microphone array 180. At the time of tuning, the voice signals acquired by the microphones of the voice input unit 182 are input to the tuning unit 130, which uses them to set a correction filter for each microphone and supplies the filters to the correction unit 188.

[0036] FIG. 5 shows the tuning unit 130. The tuning unit 130 has a tuning control unit 135 and a tuning execution unit 140. The tuning control unit 135 determines whether to perform tuning on the microphone array 180 and, when it determines that tuning is to be performed, controls the robot arm 160 and the speaker 170 of the voice recognition robot 100. Each element shown in the figure as a functional block that performs the various processes of the tuning unit 130 can be implemented in hardware by a CPU, a memory, and other LSIs, and in software by a program loaded into the memory. Those skilled in the art will therefore understand that these functional blocks can be realized in various forms, by hardware only, by software only, or by a combination of the two, and are not limited to any one of them.

[0037] Whether to perform tuning is determined based on whether the pattern of variation in the sound-receiving characteristics of the microphones of the voice input unit 182 in the microphone array 180 may have changed.
For example, it is determined that tuning is required when a predetermined time, such as 24 hours, has passed since the last tuning, when the result of sound environment clustering indicates that the environmental sound has changed, or when the normal control unit of the voice recognition robot 100 has performed control to move the wheels 150. The conditions for this determination are of course not limited to those described here.

[0038] When it is determined that tuning is to be performed, the tuning control unit 135 controls the operation of the robot arm 160 and the speaker 170. Control of the robot arm 160 is described first. The robot arm 160 is controlled through control of the drive unit (not shown) incorporated in the main body 120; in the following description, control of the robot arm 160 and control of the drive unit are used in the same sense. The robot arm 160 is controlled by the tuning control unit 135 at the time of tuning, and by the normal control unit during normal operation other than tuning.

[0039] FIG. 6 is a top view of the voice recognition robot 100. For clarity, only the main body 120 and the robot arm 160 (including the speaker 170 attached to it) are illustrated, and the head 110 and the wheels 150 are omitted. The tuning control unit 135 rotates the robot arm 160 so that the angle formed by the robot arm 160 and the voice input unit 182 of the microphone array 180 provided in the main body 120, or more precisely the angle α formed by the robot arm 160 and the direction in which the microphones of the voice input unit 182 are arranged, takes a predetermined value. As the robot arm 160 rotates, the direction of the speaker 170 mounted at its tip relative to the voice input unit 182 changes.
That is, by rotating the robot arm 160, the tuning control unit 135 sets the angle θ shown in the schematic diagram of FIG. 3 to a value corresponding to the predetermined value of the angle α.

[0040] After the robot arm 160 has rotated, the tuning control unit 135 causes the speaker 170 attached to the robot arm 160 to output a reference voice. In this embodiment, a TSP (Time-Stretched-Pulse) signal is used as the reference voice.

[0041] When each microphone of the voice input unit 182 receives the TSP signal from the speaker 170, it outputs a response signal (hereinafter referred to as a TSP response signal). These TSP response signals are input to the tuning execution unit 140 of the tuning unit 130.

[0042] The tuning execution unit 140 time-reverses the TSP signal to obtain a TSP time-reversed signal, and convolves the TSP time-reversed signal with the TSP response signal of each microphone. This yields the impulse response signal of each microphone.

[0043] As described above, the correction unit 188 is for eliminating variations in sensitivity among the microphones of the voice input unit 182, and performs correction using a correction filter for each microphone. The tuning execution unit 140 sets these correction filters so that the sensitivities of the microphones become uniform.

[0044] In the present embodiment, the tuning execution unit 140 sets the correction filters of the other microphones so that they have the same sensitivity as a reference microphone in the voice input unit 182. Specifically, the power spectrum of the target microphone is calculated from its impulse response signal, and the correction filter A of this microphone is determined according to the following equation (1).
[0045]

A = P / P0 ... (1)

where A is the correction filter, P is the power spectrum of the target microphone, and P0 is the power spectrum of the reference microphone. Which of the microphones included in the voice input unit 182 is used as the reference microphone is left to the designer.

[0046] Furthermore, because the voice recognition robot 100 must receive voice signals from sound sources in various directions in the actual use environment, the tuning unit 130 in the present embodiment performs tuning at a plurality of different angles α. Specifically, for each microphone, a correction filter A is obtained for each angle α, and these correction filters A are integrated to obtain an integrated correction filter.

[0047] The tuning execution unit 140 obtains the integrated correction filters for the microphones other than the reference microphone and outputs them to the correction unit 188. The correction unit 188 updates the correction filters of these microphones to the respective integrated correction filters output from the tuning execution unit 140.

[0048] In addition, because it is preferable to set a correction filter for each frequency of the voice signal in order to obtain a better correction effect, in the present embodiment the tuning control unit 135 causes the speaker 170 to sequentially output a plurality of TSP signals having different frequencies (bins) at the time of tuning. The tuning execution unit 140 obtains the correction filter A for each frequency (bin) for each microphone other than the reference microphone and supplies it to the correction unit 188.

[0049] FIG. 7 is a flowchart showing the flow of processing of the tuning unit 130 in the voice recognition robot 100 of the present embodiment. In the standby state, the tuning control unit 135 in the tuning unit 130 determines whether to perform tuning (S10).
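The determination of step S10, using the example conditions given in [0037], might be sketched as follows. The class `RobotState`, its field names, and the function `should_tune` are hypothetical, and the source notes that the conditions are not limited to these three.

```python
from dataclasses import dataclass

TUNING_INTERVAL_HOURS = 24.0  # example interval mentioned in [0037]


@dataclass
class RobotState:
    hours_since_last_tuning: float
    environment_sound_changed: bool  # outcome of sound environment clustering
    wheels_were_driven: bool         # normal control unit moved the wheels 150


def should_tune(state: RobotState) -> bool:
    """Return True if any example condition suggests the microphones'
    sound-receiving characteristics may have changed."""
    return (
        state.hours_since_last_tuning >= TUNING_INTERVAL_HOURS
        or state.environment_sound_changed
        or state.wheels_were_driven
    )
```

Any one of the conditions is treated as sufficient here, which matches the "or" reading of the examples in the source.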
If the determination is "do not execute" (S10: No), the tuning unit 130 remains in the standby state (S20). If the determination is "execute" (S10: Yes), the tuning control unit 135 rotates the robot arm 160 of the voice recognition robot 100 (S30) and, after the rotation, causes the speaker 170 to output a TSP signal (S40).

[0050] The tuning execution unit 140 then executes tuning (S50). Specifically, the tuning execution unit 140 obtains the correction filter A of each microphone other than the reference microphone using the TSP response signal of each microphone to the TSP signal and the time-reversed signal of the TSP signal. The tuning control unit 135 then rotates the robot arm 160 further to change the angle α, and causes the speaker 170 to output a TSP signal of the same frequency. For the changed angle α, the tuning execution unit 140 again obtains the correction filter A for each microphone other than the reference microphone. The change of the angle α, the output of the TSP signal, and the calculation of the correction filter A are repeated a plurality of times, so that for each microphone as many correction filters A are obtained as there are angles α. The tuning execution unit 140 integrates the correction filters A of each microphone other than the reference microphone to obtain an integrated correction filter, and outputs it to the correction unit 188.

[0051] When the processing for obtaining the integrated correction filter for one frequency (bin) ends, the tuning control unit 135 causes the speaker 170 to output a TSP signal of a different frequency (bin), and the tuning execution unit 140 performs the above processing for obtaining the integrated correction filter for that frequency (bin).
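The computation in steps S40 and S50 can be sketched as follows. This is a sketch under several assumptions that the source does not state: the TSP is generated from a flat-magnitude, quadratic-phase spectrum (for which circular convolution with the time-reversed signal recovers the impulse response exactly); each microphone is simulated as a pure gain plus delay; the three delays stand in for three arm angles α; and the "integration" of the per-angle filters is taken to be an arithmetic mean. All function names are hypothetical.

```python
import numpy as np


def make_tsp(n=1024, m=256):
    """Time-stretched pulse built from a unit-magnitude, quadratic-phase spectrum."""
    k = np.arange(n // 2 + 1)
    spectrum = np.exp(-1j * 4 * np.pi * m * k**2 / n**2)
    return np.fft.irfft(spectrum, n=n)


def impulse_response(tsp, response):
    """Circularly convolve the TSP response with the time-reversed TSP ([0042])."""
    tsp_reversed = np.roll(tsp[::-1], 1)  # sample n becomes sample (-n) mod N
    spectrum = np.fft.rfft(response) * np.fft.rfft(tsp_reversed)
    return np.fft.irfft(spectrum, n=len(tsp))


def correction_filter(h_target, h_reference):
    """Equation (1), A = P / P0, evaluated for every frequency bin."""
    p = np.abs(np.fft.rfft(h_target)) ** 2
    p0 = np.abs(np.fft.rfft(h_reference)) ** 2
    return p / p0


def integrate_filters(filters):
    """Assumed integration rule: mean of the per-angle filters."""
    return np.mean(filters, axis=0)


def simulate_response(tsp, gain, delay):
    """Hypothetical microphone: scales the TSP and delays it circularly."""
    return gain * np.roll(tsp, delay)


tsp = make_tsp()
gains = {"reference": 1.0, "target": 10 ** (2 / 20)}  # target is 2 dB more sensitive
per_angle = []
for delay in (3, 5, 8):  # stand-ins for three arm angles
    h = {name: impulse_response(tsp, simulate_response(tsp, g, delay))
         for name, g in gains.items()}
    per_angle.append(correction_filter(h["target"], h["reference"]))

integrated = integrate_filters(per_angle)
correction_db = 10 * np.log10(integrated.mean())
```

In this simulation the integrated filter reports the 2 dB excess of the target microphone in every bin. How the correction unit 188 applies A is not specified in the source; with A defined as P / P0, dividing the target microphone's power spectrum by A would bring it to the reference level.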
[0052] In this way, for each frequency (bin) of each microphone other than the reference microphone, the integrated correction filter combining the correction filters A obtained at the plurality of angles α is supplied from the tuning execution unit 140 to the correction unit 188.

[0053] The voice recognition robot 100 of the present embodiment has a tuning function, and the speaker 170 mounted on the robot arm 160 can output a reference voice for tuning under the control of the tuning control unit 135. Therefore, when tuning is performed, there is no need to move the robot to the installation place of a reference sound source; man-hours and cost can be kept down at the production site, and the user gains convenience in actual use. A further advantage is that no installation place for a reference sound source for tuning is required.

[0054] In the present embodiment, the speaker 170 is attached to the robot arm 160 of the voice recognition robot 100. A robot is generally provided with a robot arm, and using the robot arm as the mounting unit of the voice output unit simplifies the configuration of the voice recognition robot.

[0055] Furthermore, because the robot arm is a displacement unit capable of displacing the voice output unit (speaker) attached to it relative to the microphone array, rotating the robot arm changes the relative positional relationship between the voice output unit and the voice acquisition unit fixed to the robot. The voice output unit can thus easily be moved at the time of tuning to a relative position suitable for tuning. This is also convenient when, as in the voice recognition robot 100, the voice acquisition unit needs to be tuned at each of several different relative positions.
[0056] Furthermore, because the reach of a robot arm is normally fixed mechanically, there is little movement error when the relative position of the voice output unit and the voice acquisition unit is changed by rotating the arm. As a result, the error in the relative position of the voice output unit and the voice acquisition unit can be reduced.

[0057] The present invention has been described above based on the embodiment. The embodiment is an example, and various modifications, additions, and omissions may be made without departing from the spirit of the present invention. Those skilled in the art will understand that variations incorporating such modifications, additions, and omissions are also within the scope of the present invention.

[0058] For example, in the voice recognition robot 100 according to the embodiment described above, the speaker 170 functioning as the voice output unit is provided on only one robot arm, but voice output units may be provided on both robot arms.

[0059] Although the voice recognition robot 100 has a microphone array as its voice acquisition unit, the present invention can be applied to a voice recognition robot having any kind of voice acquisition unit that requires a reference voice at the time of tuning.

[0060] The tuning unit 130 in the voice recognition robot 100 uses the angle α as the relative position between the speaker 170 and the voice input unit 182 and changes it at the time of tuning. The relative position between the voice output unit and the voice acquisition unit is not limited to the angle they form, and may also include the distance between them. Therefore, both the angle and the distance may be changed, tuning may be performed for each angle and each distance, and the results may be integrated.
The distance may be changed, for example, by using a robot arm that can not only rotate but also extend and retract, and extending or retracting it at the time of tuning.

[0061] In the present embodiment, a robot arm having a shoulder joint is used as an example. However, as long as the relative positional relationship between the voice output unit attached to the arm and the voice acquisition unit can be changed, the arm is not limited to one having only a shoulder joint; for example, a robot arm including one or more of a shoulder joint, an elbow joint, and an arm joint may be used.

[0062] Brief description of the drawings
FIG. 1 is a diagram showing a voice recognition robot according to an embodiment of the present invention.
FIG. 2 is a diagram showing the microphone array and the tuning unit in the voice recognition robot shown in FIG. 1.
FIG. 3 is a diagram for explaining the processing of the voice enhancement unit in the microphone array shown in FIG. 2.
FIG. 4 is a diagram showing an example of an ideal directivity pattern of the microphone array shown in FIG. 2.
FIG. 5 is a diagram showing details of the tuning unit.
FIG. 6 is a diagram for explaining the tuning control unit in the tuning unit shown in FIG. 5.
FIG. 7 is a flowchart showing the flow of processing of the tuning unit shown in FIG. 5.

[0063] Description of symbols
10 sound source
20a, 20b, 20c microphone
100 voice recognition robot
110 head
120 main body
130 tuning unit
135 tuning control unit
140 tuning execution unit
150 wheel
160 robot arm
170 speaker
180 microphone array
182 voice input unit
184 microphone array processing unit
186 AD converter
188 correction unit
190 frequency conversion unit
192 voice enhancement unit
194 noise estimation unit
196 feature amount acquisition unit