JP2004180197

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2004180197
When data recorded by a plurality of microphones (stereo) is reproduced, specifying a point of
interest in the reproduced image causes reproduction to focus on the sound from that direction.
Sound focusing uses microphone array technology. This makes it possible to focus the sound in an
arbitrary direction, not only the direction in which the image is focused. The apparatus has a
microphone array comprising a plurality of microphones, a plurality of holding means for
holding, for each microphone, the input sound signals from the individual microphones
constituting the microphone array, an input means for inputting position information, and
focusing means for performing acoustic focusing in the direction of the acquired position using
the held acoustic signals of the plurality of channels. [Selected figure] Figure 1
INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND
RECORDING MEDIUM
TECHNICAL FIELD [0001] The present invention is generally applicable to information recording
apparatuses capable of recording both sound information and image information. In particular,
the present invention relates to an information recording apparatus that uses a microphone array
for sound collection.
Description of the Related Art In recent years, technology has been developed that can control
directivity characteristics at the time of sound collection using a microphone array. In the
past, directivity was obtained by analog means, such as unidirectional microphones that exploit
sound pressure gradients. More recently, as digital signal processing has become faster,
directivity control using digital control technology has become known. For specific directivity
control using digital control technology, see, for example, Journal of the Acoustical Society of
Japan, Vol. 51, No. 5, pp. 390-394, "Directivity control by microphone array". Various inventions
have also been proposed in which a microphone array is used with a video camera. For example, in
04-05-2019
1
Japanese Patent Application Laid-Open No. 5-308553, a video camera is disclosed that is
characterized by comprising a microphone whose directivity can be controlled and an audio signal
focus processor that focuses the directivity of the microphone on the subject in synchronization
with the focusing of the video camera on the subject. According to this technology, when the
image is focused on a certain subject, the sound is focused on that subject as well. Further,
Japanese Patent Application Laid-Open No. 2000-196941 discloses a technique for a system
including a video camera and a microphone in which, if the user indicates on the monitor screen
during recording a point on which the sound should be focused, the sound is focused on that
point. In the above-mentioned Japanese Patent Application Laid-Open No. 5-308553, the sound can
be focused in the direction in which the camera faces, but it cannot be focused in a direction
different from the direction in which the camera faces. In JP-A-2000-196941, the user can direct
the focus of the sound in a direction different from the camera direction by operating on the
image output to the monitor at the time of recording, but it was impossible to focus the sound
after recording, during playback, on an arbitrary point specified by the user. It is an object
of the present invention to make this possible. That is, the present invention provides an
apparatus capable of focusing sound in an arbitrary direction instructed by the user afterwards,
during reproduction rather than during recording.
An information processing apparatus according to the present invention for achieving the above
object has the following configuration. That is, the information processing apparatus according to
claim 1 comprises: a microphone array consisting of a plurality of microphones; a plurality of
holding means for holding, for each microphone, the input acoustic signals from the individual
microphones constituting the microphone array; input means for inputting position information;
and focusing means for performing acoustic focusing in the direction of the acquired position
using the held acoustic signals of a plurality of channels. The information processing apparatus
according to the second aspect of the present invention is the apparatus of the first aspect,
further comprising feedback means for feeding back the acoustically focused signal, means for
varying the focus direction and determining, from the fed-back signal, the direction in which
the maximum output is obtained, and focus changing means for changing the focus direction of the
sound based on the determined direction. The information processing apparatus according to the
third aspect is the apparatus of the second aspect, further comprising limiting means for
limiting the range over which the focus direction is varied to obtain the maximum output to the
vicinity of the position information acquired by the position information input means. The
information processing apparatus according to a fourth aspect of the present invention is the
information processing apparatus according to any one of the first to third aspects, further
comprising processing means for processing the acoustic signal after focusing to give it an
acoustic effect. An information processing apparatus according to a fifth aspect of the present
invention is the information processing apparatus according to any one of the first to third
aspects, further comprising speech recognition means for recognizing the acoustic signal after
focusing. The information processing apparatus according to a sixth aspect of the present
invention is the information processing apparatus
according to any one of the first to fifth aspects, further comprising display means for
displaying the input position on a display when a position is input through the means for
inputting position information. An information processing apparatus according to a seventh aspect
is the apparatus of any one of the first to sixth aspects, operating as a video camera or a
digital camera.
DETAILED DESCRIPTION OF THE INVENTION Preferred embodiments of the present invention will be
described below with reference to the attached drawings.
First Embodiment FIG. 1 is a view showing the configuration of an information output apparatus
according to an embodiment of the present invention. The flow of the operation is shown in the
flowcharts of FIG. 2 and FIG. 3. The embodiment is described below using these figures. In
FIG. 1, reference numeral 100 denotes a microphone array for collecting acoustic signals, which
comprises a set of n identical microphones 101 to 10n.
Reference numeral 150 denotes an AD conversion unit comprising a set of AD converters 151 to
15n, which convert the sound signals picked up by the microphones into digital signals.
Reference numeral 110 denotes a memory unit comprising n memories 111 to 11n. Each memory
corresponds one-to-one with a microphone of the microphone array and with an individual AD
converter, and they are connected in the order microphone, AD converter, memory. Reference
numeral 120 denotes an interlocking switch in which n switches turn on and off together.
Reference numeral 130 denotes a set of delay devices that electrically generate delays.
Reference numeral 140 denotes an adder that adds the delayed signals. Reference numeral 142
denotes a DA converter and amplifier that converts the acoustic signal to analog and amplifies
it. Reference numeral 145 denotes a speaker for outputting sound. Reference numeral 170 denotes
a display for displaying video information, combined with a pointing device with which the user
inputs a position. In this embodiment, 170 is realized by a display such as an LCD with a touch
panel on it. Reference numeral 180 denotes a delay control unit that controls the delay amounts
of the delay devices 130. Reference numeral 190 denotes a camera for capturing a moving image or
a still image. Reference numeral 192 denotes an AD converter that converts the video information
acquired by the camera into a digital signal. Reference numeral 195 denotes a memory for
recording the video information converted into a digital signal by the AD converter 192.
Reference numeral 197 denotes a switch, which may be interlocked with the switch 120.
Next, the operation will be described using the flowcharts of FIG. 2 and FIG. 3. FIG. 2 shows the
operation when video and audio signals are recorded, and FIG. 3 shows the operation when the
recorded video and audio signals are reproduced. First, the operation at the time of recording
will be described. In S100, a moving image and an acoustic signal are acquired. The moving image
is acquired by the camera 190, and the acoustic signal is acquired by the microphone array 100.
In S101, the video and audio signals are each converted from analog to digital. The acoustic
signal is converted by the AD conversion unit 150, and the moving image signal by the AD
converter 192. In S102, the digitized moving image and audio signals are stored. The digitized
acoustic signals are stored in the respective memories 111 to 11n belonging to the memory unit
110. The digitized moving image is stored in the memory 195. The memory may be a magnetic tape, a
RAM, a hard disk, or any other storage device commonly used in computers.
Next, the operation at the time of reproduction will be described. In S200, the switch 197 for
moving image information and the switch 120 for acoustic information are turned on, and the
moving image and the sound signals are input to the display 170 and to the set 130 of delay
devices, respectively. In S201, information from the pointing device 170 is acquired. Here, it
is assumed that the display surface on which the reproduction is shown is a touch panel, so that
the pointing device can obtain the position on the screen indicated by the user. In this case,
for example, video information is displayed as shown in FIG. 4. Here, 200 indicates a display.
The user indicates a position by pressing a point on the display 200. This point is shown at
210. In S201, the position of 210 is acquired, for example, in the form of X, Y coordinates. Of
the following steps, S202 and S203 are performed by the delay control unit 180. First, in S202,
direction information is calculated from the acquired position information. In the present
embodiment, for the sake of simplicity, the case where the microphones of the array are arranged
in a single line in a plane will be described. The same approach can be realized when the
microphones are arranged in three dimensions, but that description is omitted. First, the value
of x in FIG. 4, that is, the distance from the center direction of imaging, is obtained from the
coordinate information of 210. Further, d can be obtained from the focal length of the imaging
device. From these values, sin θ is determined. Next, in S203, the delay time is controlled. In
this embodiment, a delay-and-sum array is described as an example of the directivity control
method for acoustic information. However, the delay-and-sum array is not essential; any
technology usable for directivity control of acoustic information may be used. In the
delay-and-sum array, a suitable delay time is given to each of a plurality of delay elements so
that the target signal is brought into phase, after which the signals are added to emphasize the
target signal. For this technology, see Journal of the Acoustical Society of Japan, Vol. 51,
No. 5, pp. 390-394, "Directivity control by microphone array". In this method, assuming that the
velocity of sound is c, the microphone spacing is s, and the incident angle of the sound wave is
θ, the delay time τ between adjacent microphones can be calculated as τ = s · (sin θ) / c. The
target signal is brought into phase by applying this delay. In S203, the delay time is
calculated based on the angle information (sin θ) obtained in S202, and the delay time is given
to each delay element belonging to the set 130 of delay elements. In S204, the adder 140 adds
the acoustic signals that were brought into phase by the set 130 of delay devices, to emphasize
the target signal.
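Steps S202 to S204 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the circuit of FIG. 1: the function names, the value used for the speed of sound, the geometry sin θ = x / sqrt(x² + d²) (x and d in the same units), the sign convention for which microphone hears the wavefront first, and the rounding of delays to whole samples are all assumptions introduced here.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for c (air at roughly 20 degrees C)


def sin_theta_from_pointer(x, d):
    """S202: sin(theta) for a point at offset x from the image centre.

    Assumes theta is the angle between the optical axis and the ray through
    the pointed position, with d derived from the focal length and expressed
    in the same units as x.
    """
    return x / math.hypot(x, d)


def adjacent_delay(s, sin_theta, c=SPEED_OF_SOUND):
    """S203: delay between adjacent microphones, tau = s * sin(theta) / c."""
    return s * sin_theta / c


def delay_and_sum(channels, s, sin_theta, sample_rate, c=SPEED_OF_SOUND):
    """S203-S204: align the stored channels for one direction and add them.

    channels[i] is the list of samples held for microphone i; all lists have
    equal length. Assumed convention: the wavefront reaches microphone i at
    time i * tau relative to microphone 0.
    """
    n = len(channels)
    length = len(channels[0])
    tau = adjacent_delay(s, sin_theta, c)
    arrivals = [i * tau for i in range(n)]
    latest = max(arrivals)
    out = [0.0] * length
    for samples, arrival in zip(channels, arrivals):
        # Delay each channel until it lines up with the latest arrival.
        shift = round((latest - arrival) * sample_rate)
        for t in range(shift, length):
            out[t] += samples[t - shift]
    # Normalise so an in-phase signal keeps its original amplitude.
    return [v / n for v in out]
```

Because the per-microphone signals are kept in memory, `delay_and_sum` can be called again with a different `sin_theta` whenever the user points somewhere else. Rounding to whole samples limits the angular resolution; a real implementation would probably use fractional delays.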
The in-phase target signal grows in output because its values add up, while the other signals
are relatively reduced in output, so only the signal from the target direction, that is, the
signal from the sin θ direction, is emphasized. In S205, the signal added by the adder 140 is
converted into an analog signal and amplified by the DA converter and amplifier 142, and output
as sound by the speaker 145. As described above, it is possible to focus the sound in the
direction indicated by the user with the pointing device at the time of reproduction. Further,
since the audio signal is stored in the memory for each channel, it is possible to change the
focus direction
each time the user changes the pointing direction.
Second Embodiment The first embodiment disclosed a method in which the user designates the
position on which the sound should be focused, but the sound source position cannot always be
designated accurately. This embodiment therefore discloses a method in which the system searches
for the accurate position and focuses on it. FIG. 5 is an explanatory view of the search range
according to the present embodiment. The user specifies the position 310 on the display 300 as
in the first embodiment. However, even if a sound source is present in this vicinity, there may
be a deviation between the position designated by the user and the sound source position. The
present embodiment therefore describes a method of searching the vicinity of the user's
designated position, indicated by 320, and focusing accurately on the sound source position.
FIG. 6 is a block diagram according to this embodiment. The operation is the same as that of the
first embodiment up to the addition at 440, so its description is omitted. The signal obtained
at 440 is fed back to the delay shift unit 497. In order to search the range 320, the delay
shift unit 497 sets new values within the range sin θ ± α, and delay and addition are performed
again to bring the signals into phase. This is repeated, the point within the range sin θ ± α at
which the in-phase signal is maximum is taken as the sound source direction, and the signal is
output through 442 and 445. If the user designates a new point, the maximum value is searched
for again in the same manner to determine the output.
Third Embodiment In the first and second embodiments, acoustic processing is performed after the
sound has been focused at 140 or 440. The processing may be performed on the digital signal
before the DA conversion at 142 or 442, or may be performed after conversion to analog. As the
type of processing, any commonly used acoustic processing such as echo, vibrato and distortion
can be selected.
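The search of the second embodiment (vary the focus within sin θ ± α, feed the summed output back, keep the direction that gives the maximum) can be sketched as follows. This is only an illustration under assumptions: the source does not say how "maximum" is measured or how finely the range is scanned, so the use of signal power, the `steps` parameter, the default speed of sound, and all function names are hypothetical choices made here.

```python
def focus(channels, shift_per_mic):
    """Delay-and-sum with a whole-sample shift between adjacent microphones."""
    length = len(channels[0])
    offsets = [i * shift_per_mic for i in range(len(channels))]
    latest = max(offsets)
    out = [0.0] * length
    for samples, offset in zip(channels, offsets):
        delay = latest - offset  # always >= 0
        for t in range(delay, length):
            out[t] += samples[t - delay]
    return out


def output_power(signal):
    """Assumed measure of 'maximum output': sum of squared samples."""
    return sum(v * v for v in signal)


def search_focus(channels, s, sin_theta, alpha, sample_rate, c=343.0, steps=21):
    """Scan sin(theta) - alpha .. sin(theta) + alpha and return the best value.

    s is the microphone spacing, c the speed of sound (assumed default),
    steps the number of candidate directions tried (hypothetical parameter).
    """
    best_power = None
    best_candidate = sin_theta
    for k in range(steps):
        candidate = sin_theta - alpha + 2.0 * alpha * k / (steps - 1)
        candidate = max(-1.0, min(1.0, candidate))  # keep a valid sine value
        shift = round(s * candidate / c * sample_rate)
        power = output_power(focus(channels, shift))
        if best_power is None or power > best_power:
            best_power = power
            best_candidate = candidate
    return best_candidate
```

Limiting the scan to ± α around the pointed direction corresponds to the restriction described in the third aspect: a loud source elsewhere in the scene cannot pull the focus away from the neighbourhood the user indicated.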
Fourth Embodiment In the first and second embodiments, when the focused audio signal is voice,
recognition may be performed using voice recognition technology after the voice has been focused
at 140 or 440. The recognition result may be displayed on a display, or an application may be
operated based on the recognition result. In this case, noise coming from other directions is
suppressed. As described above, according to the present invention, it is possible to focus
sound on an arbitrary point indicated by the user at the time of reproduction. Since the focus
of the sound is processed afterwards, if the user wants to focus on another point during
playback, the focus point can be changed at that time. Further, according to the second
embodiment, when the user intends to focus on a certain sound source and points at it, a slight
deviation is corrected automatically and the focus is directed to the sound source, so the user
only needs to specify an approximate position. Further, according to the third embodiment,
various sound effects can be obtained by processing the obtained sound. Furthermore, according
to the fourth embodiment, when the focused acoustic signal is voice, voice recognition can be
performed to operate an application. Since sounds from outside the focused position are
suppressed, erroneous operation of speech recognition under noise can be prevented. Furthermore,
when a plurality of people are speaking, the user can freely specify which speaker's speech is
to be recognized.
BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 is a diagram
showing the configuration of the information output apparatus in Embodiment 1 of the present
invention. FIG. 2 is a flowchart showing the operation at the time of recording of the
information output apparatus in Embodiment 1 of the present invention. FIG. 3 is a flowchart
showing the operation at the time of reproduction of the information output apparatus in
Embodiment 1 of the present invention. FIG. 4 is a view for explaining the operation of the
display and the pointing device in Embodiment 1 of the present invention. FIG. 5 is a view for
explaining an example of the display, the pointing device and the sound source search range in
Embodiment 2 of the present invention. FIG. 6 is a diagram showing the configuration of the
information output apparatus in Embodiment 2 of the present invention.
<Explanation of reference numerals> 100 microphone array, 110 memory section, 120 interlocking
switch, 130 delay device