Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2018148436
Abstract: PROBLEM TO BE SOLVED: To provide an apparatus, system, method, and program capable of adding a sense of reality desired by a user and a user-specific expression. SOLUTION: An apparatus includes an audio acquisition unit 401 for acquiring audio signals from a plurality of microphones, a reception means for receiving an input for emphasizing directivity in a predetermined direction among the audio signals, and an audio file generation unit 407 that generates an audio file according to the input. A directivity setting unit 403 is further provided, which sets directivity selection information based on the input of the reception means. The audio file generation unit 407 converts the audio signal acquired by the audio acquisition unit 401 based on the directivity selection information to generate a stereophonic audio file. [Selected figure] Figure 4
Device, system, method and program
[0001]
The present invention relates to an apparatus, a system, a method and a program.
[0002]
With the spread of omnidirectional cameras, techniques for capturing omnidirectional video have
been developed.
When viewing such an omnidirectional video, there is known a stereophonic sound technology that reproduces stereophonic sound in accordance with the direction of the line of sight.
03-05-2019
[0003]
For example, Japanese Patent No. 5777185 (Patent Document 1) discloses a technique for
reproducing three-dimensional sound by recording with a plurality of microphones. That is, in
Patent Document 1, by synchronizing the image to be reproduced and the stereophonic sound, it
is possible to output stereophonic sound data according to the viewpoint position and the gaze
direction of the user.
[0004]
However, in the prior art including Patent Document 1, when acquiring or reproducing sound data such as voice, it was not possible to synthesize or convert stereophonic sound as desired by the user. Therefore, there has been a demand for a technique for adding the sense of reality desired by the user and an expression unique to the user.
[0005]
The present invention has been made in view of the problems in the above-mentioned prior art, and it is an object of the present invention to provide a system, an apparatus, a method, and a program capable of adding a sense of reality desired by a user.
[0006]
That is, according to the present invention, there is provided an apparatus comprising: voice acquisition means for obtaining voice signals from a plurality of microphones; reception means for receiving an input for emphasizing directivity in a predetermined direction of the voice signals; and generation means for generating a voice file according to the input.
[0007]
As described above, according to the present invention, there are provided an apparatus, system, method, and program capable of adding the sense of reality desired by the user and a user-specific expression.
[0008]
BRIEF DESCRIPTION OF THE DRAWINGS
A diagram showing the schematic hardware configuration of the entire system in an embodiment of the present invention.
A diagram showing a user wearing a head mounted display.
A diagram showing the hardware configuration included in the omnidirectional camera and the user terminal of the present embodiment.
A software block diagram of the omnidirectional camera of the present embodiment.
A diagram showing the blocks of the processing that generates three-dimensional sound data at the time of shooting.
A diagram showing the blocks of the processing that generates three-dimensional sound data at the time of reproduction.
A diagram explaining an example of the positional relationship of the built-in microphone and the external microphone included in the omnidirectional camera.
A diagram explaining an example of the directivity of each directional component contained in a stereophonic sound file of the ambisonics format.
A diagram showing an example of a screen for performing an operation of changing the directivity of the sensitivity characteristic in the present embodiment.
A diagram explaining directivity when the attitude of the omnidirectional camera system changes in the present embodiment.
A flowchart of processing for shooting a video including stereoscopic audio in the present embodiment.
A flowchart of processing for setting a voice acquisition mode in the present embodiment.
[0009]
Hereinafter, the present invention will be described by way of embodiments, but the present
invention is not limited to the embodiments described later. In the drawings referred to below,
the same reference numerals are used for the common elements, and the description thereof will
be omitted as appropriate. Further, in the following specification, the term "voice" is not limited to voices emitted by people, but refers generically to music, mechanical sounds, motion sounds, and other sounds transmitted by the vibration of air.
[0010]
FIG. 1 is a diagram showing a schematic configuration of hardware of an entire system according
to an embodiment of the present invention. FIG. 1 exemplarily shows an environment configured to include the omnidirectional camera system 110, in which the external microphone 110b is connected to the omnidirectional camera 110a, the user terminal 120, and the head mounted display 130. The hardware can be connected to each other by wireless or wired communication, and can transmit and receive various data such as setting data and shooting data. Further, the number of hardware units included in the system is not limited to that shown in FIG. 1.
[0011]
The omnidirectional camera 110a according to the present embodiment is configured to include a plurality of imaging optical systems, and can capture an omnidirectional image covering a solid angle of 4π steradians by combining the images captured by the imaging optical systems. In addition, the omnidirectional camera 110a can capture omnidirectional images continuously in time, and can thereby capture omnidirectional moving images. When shooting an omnidirectional video, the microphone unit of the omnidirectional camera system 110 can acquire audio around the imaging environment.
[0012]
Note that the audio acquired by the omnidirectional camera system 110 can provide the user with a realistic image as stereophonic audio. In addition, when acquiring three-dimensional sound, the user can adjust the sensitivity characteristics of each microphone unit to emphasize and acquire sound in a desired direction. As described above, by adjusting the directivity of the microphone unit, it is possible to add a greater sense of reality and a user-specific expression. The microphone unit provided in the omnidirectional camera system 110 may be built into the omnidirectional camera 110a, may be the connected external microphone 110b, or may be a combination of both.
[0013]
Examples of the user terminal 120 according to the present embodiment include a smartphone
terminal, a tablet terminal, and a personal computer. The user terminal 120 can communicate
with the omnidirectional camera system 110 by wire or wirelessly, and is a device that displays a
setting for shooting and a captured image. The setting of the omnidirectional camera system 110 and the display of an image captured by the omnidirectional camera 110a can be operated by installing an application on the user terminal 120 in advance. In the following description of the present embodiment, the function of setting the omnidirectional camera system 110 is described as being held by the user terminal 120, but the embodiment is not limited to this. For example, the omnidirectional camera system 110 may itself include a screen for performing various operations.
[0014]
The head mounted display 130 of the present embodiment is a device for viewing omnidirectional images and omnidirectional videos. In the above description, an example in which the image captured by the omnidirectional camera 110a is displayed on the user terminal 120 has been described, but in order to provide a more realistic viewing environment, the image may be displayed on a playback device such as the head mounted display 130. The head mounted display 130 is configured to include a monitor and speakers, and is a device worn on the head of the user. FIG. 2 is a view showing how the user wears the head mounted display 130.
[0015]
As shown in FIG. 2, the monitor of the head mounted display 130 is provided near the eyes, and the speakers are in contact with both ears. The monitor can display a wide-angle image corresponding to the user's field of view cut out from the omnidirectional image. In addition, the speakers can output the sound recorded at the time of shooting the omnidirectional moving image; in particular, the output sound can be stereophonic sound.
[0016]
The head mounted display 130 according to this embodiment includes a sensor for detecting its attitude, such as a motion sensor. For example, as shown by the broken-line arrow in FIG. 2, the displayed image can be changed by following the movement of the user's head. As a result, the user can feel as if he or she were at the place where the image was actually taken. In addition, the stereophonic sound output from the speakers of the head mounted display 130 can also be reproduced in synchronization with the user's field of view. For example, when the user changes the direction of the line of sight by moving the head, the sound from the sound source in the direction of the line of sight can be emphasized and output. As a result, the user can view the image and hear the sound according to the change in the direction of the line of sight, and can view a realistic moving image.
[0017]
As shown in FIGS. 1 and 2, in the following description, the longitudinal direction of the omnidirectional camera 110a and the user is described as the x-axis, the lateral direction as the y-axis, and the vertical direction as the z-axis. In addition, the vertical direction independent of the orientation axes of the omnidirectional camera 110a and the user is referred to as the zenith direction. Specifically, the zenith direction indicates the direction directly above the user on the celestial sphere, and coincides with the direction opposite to gravity. In the present embodiment, the inclination angle of the omnidirectional camera 110a with respect to the zenith direction indicates the inclination, relative to the zenith direction, of the surface facing each imaging optical system in the omnidirectional camera 110a. Therefore, when the omnidirectional camera 110a is used in the default posture without tilting, the zenith direction coincides with the z-axis direction.
[0018]
The schematic configuration of the hardware according to the embodiment of the present
invention has been described above. Next, the detailed hardware configuration of each device will
be described. FIG. 3 is a diagram showing the hardware configuration included in the omnidirectional camera 110a and the user terminal 120 of the present embodiment. The omnidirectional camera 110a includes a CPU 311, a RAM 312, a ROM 313, a storage device 314, a communication I/F 315, a voice input I/F 316, an imaging device 318, and an attitude sensor 319, and this hardware is connected via a bus. Further, the user terminal 120 includes a CPU 321, a RAM 322, a ROM 323, a storage device 324, a communication I/F 325, a display device 326, and an input device 327, which are likewise connected via a bus.
[0019]
First, the omnidirectional camera 110a will be described. The CPU 311 is a device that executes a
program that controls the operation of the omnidirectional camera 110a. The RAM 312 is a
volatile storage device for providing an execution space of a program executed by the
omnidirectional camera 110a, and is used for storing and expanding programs and data. The
ROM 313 is a non-volatile storage device for storing programs, data, etc. executed by the
omnidirectional camera 110a.
[0020]
The storage device 314 is a readable and writable non-volatile storage device that stores the operating system (OS) and applications that cause the omnidirectional camera 110a to function, as well as various setting information, captured image data, audio data, and the like. The communication I/F 315 is an interface that communicates with other devices such as the user terminal 120 and the head mounted display 130 according to a predetermined communication protocol to transmit and receive various data.
[0021]
The voice input I/F 316 is an interface connected to a microphone unit for acquiring and recording voice when shooting a moving image. The microphone unit connected to the voice input I/F 316 is a nondirectional microphone 317a having no directivity of sensitivity characteristics in a specific direction, a directional microphone 317b having directivity of sensitivity characteristics in a specific direction, or both. In addition to the microphone unit built into the omnidirectional camera 110a (hereinafter referred to as the "built-in microphone"), an external microphone 110b can also be connected to the omnidirectional camera 110a via the voice input I/F 316.
[0022]
The omnidirectional camera system 110 of the present embodiment can emphasize and acquire voice in a desired direction by adjusting the directivity of the built-in microphone of the omnidirectional camera 110a and the external microphone 110b. In addition, the microphone unit of the present embodiment is configured to include at least four microphones in one device, whereby the directivity of the sensitivity characteristic of the entire microphone unit is determined. The details of the acquisition of three-dimensional sound are described later.
[0023]
The imaging device 318 is configured to include at least two sets of imaging optical systems, and
is a device that captures an omnidirectional image in the present embodiment. The imaging
device 318 can generate an omnidirectional image by combining the images captured by the
imaging optical systems. The posture sensor 319 is, for example, an angular velocity sensor such
as a gyro sensor, detects the tilt of the omnidirectional camera 110a, and outputs it as posture
data. Further, the posture sensor 319 can calculate the vertical direction based on the detected
tilt information, and perform zenith correction of the omnidirectional image.
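The tilt computation underlying such zenith correction can be illustrated with a small sketch. Assuming, hypothetically, that the posture sensor output is reduced to an up-direction vector in the camera's coordinate frame (the patent does not specify this representation), the camera's inclination from the zenith is the angle between that vector and the z-axis:

```python
import math

def tilt_from_gravity(gx: float, gy: float, gz: float) -> float:
    """Return the camera's inclination from the zenith (radians),
    taken as the angle between the sensed up-direction vector
    (gx, gy, gz) and the camera's z-axis."""
    norm = math.sqrt(gx * gx + gy * gy + gz * gz)
    # Clamp to [-1, 1] to guard against floating-point rounding.
    c = max(-1.0, min(1.0, gz / norm))
    return math.acos(c)

print(tilt_from_gravity(0.0, 0.0, 1.0))  # upright camera: tilt 0
print(tilt_from_gravity(1.0, 0.0, 0.0))  # camera on its side: pi/2
```

The zenith correction itself would then rotate the image (and, as described later, the audio directivity) by this angle about the appropriate axis.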
[0024]
The omnidirectional camera 110a can store image data, audio data, and posture data in
association with each other when shooting. With these various data, when viewing an image on
the head mounted display 130, it is possible to reproduce a video according to the user's
operation.
[0025]
Next, the user terminal 120 will be described. The CPU 321, the RAM 322, the ROM 323, the storage device 324, and the communication I/F 325 included in the user terminal 120 correspond to the CPU 311, the RAM 312, the ROM 313, the storage device 314, and the communication I/F 315 of the omnidirectional camera 110a described above, and have the same functions, so their description is omitted.
[0026]
The display device 326 is a display unit that displays the state of the user terminal 120, operation screens, and the like to the user; an example is an LCD (Liquid Crystal Display). The input device 327 is an input means for the user to operate the user terminal 120; examples include a keyboard, a mouse, and a stylus pen. In addition, the input device 327 may be a touch panel display combined with the function of the display device 326. Furthermore, although the user terminal 120 of this embodiment is described using the example of a smartphone terminal provided with a touch panel display, this does not limit the embodiment.
[0027]
The hardware configuration included in the omnidirectional camera 110a and the user terminal
120 according to the present embodiment has been described above. Next, functional means
executed by each hardware in the present embodiment will be described with reference to FIG.
FIG. 4 is a software block diagram included in the omnidirectional camera 110a of the present
embodiment.
[0028]
The omnidirectional camera 110a includes the functional means of an audio acquisition unit 401, an external microphone connection determination unit 402, a directivity setting unit 403, a signal processing unit 404, a device attitude acquisition unit 405, a zenith information recording unit 406, an audio file generation unit 407, and an audio file storage unit 408. Each functional means is described below.
[0029]
The sound acquisition unit 401 constitutes sound acquisition means in the present embodiment, and outputs the sound acquired by the built-in microphone and the external microphone 110b as sound data. Further, the sound acquisition unit 401 can perform various processes on the acquired sound and output the result as sound data. The audio data output from the audio acquisition unit 401 is provided to the signal processing unit 404.
[0030]
The external microphone connection determination unit 402 constitutes external microphone connection determination means in the present embodiment, and determines whether or not the external microphone 110b is connected to the omnidirectional camera 110a. The result of this determination is output to the voice acquisition unit 401. When the external microphone 110b is connected to the omnidirectional camera 110a, the audio acquisition unit 401 acquires audio data by synchronizing the external microphone 110b with the built-in microphone.
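One way such synchronization could be performed, sketched here under the assumption that both microphones capture a common reference sound (the patent does not specify the mechanism, and `best_lag` is an illustrative name), is to estimate the sample offset between the two streams by cross-correlation:

```python
def best_lag(ref, sig, max_lag=32):
    """Estimate the delay (in samples) of `sig` relative to `ref`
    by maximizing the cross-correlation over a window of lags."""
    def corr(lag):
        return sum(ref[i] * sig[i + lag]
                   for i in range(len(ref))
                   if 0 <= i + lag < len(sig))
    return max(range(-max_lag, max_lag + 1), key=corr)

# A click recorded at sample 5 by the built-in microphone and at
# sample 8 by the external one implies a 3-sample offset.
ref = [0.0] * 20; ref[5] = 1.0
sig = [0.0] * 20; sig[8] = 1.0
print(best_lag(ref, sig))  # 3
```

Once the offset is known, one stream can be shifted by that many samples so both microphone units contribute aligned audio data.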
[0031]
The directivity setting unit 403 configures directivity setting means in the present embodiment,
and sets the directivity of the sensitivity characteristic of the built-in microphone and the
external microphone 110b. The setting of the directivity can be performed, for example, by
receiving an input from an application installed in the user terminal 120. As an example, it can
be set by changing the shape of the polar pattern on the operation screen so as to emphasize the
directivity in a predetermined direction. The directivity setting unit 403 outputs the set directivity of the sensitivity characteristic as directivity selection information and supplies it to the signal processing unit 404.
[0032]
The signal processing unit 404 constitutes signal processing means in the present embodiment,
performs processing such as various corrections on the audio data output from the audio
acquisition unit 401, and outputs the processed data to the audio file generation unit 407.
Further, the signal processing unit 404 can perform directivity synthesis or conversion using the
directivity selection information output from the directivity setting unit 403 as a parameter.
Furthermore, the signal processing unit 404 can perform directivity synthesis and conversion in consideration of the tilt of the omnidirectional camera 110a and the like, based on the posture data output from the device attitude acquisition unit 405 and the zenith information recording unit 406.
[0033]
The device attitude acquisition unit 405 constitutes device attitude acquisition means in the present embodiment, and acquires the inclination of the omnidirectional camera 110a detected by the attitude sensor 319 as posture data. The zenith information recording unit 406 constitutes zenith information recording means in the present embodiment, and records the tilt of the omnidirectional camera 110a based on the posture data acquired by the device attitude acquisition unit 405. As described above, since the device attitude acquisition unit 405 and the zenith information recording unit 406 can appropriately correct the omnidirectional image by acquiring the orientation of the omnidirectional camera 110a, even when the omnidirectional camera 110a is tilted or rotated at the time of shooting, it is possible to reduce the discomfort of the user at the time of image reproduction. Furthermore, similar correction can be performed when acquiring audio data. For example, even when the omnidirectional camera 110a is rotated at the time of recording, the directivity of the sensitivity characteristic can be maintained with respect to the direction of the sound source desired by the user.
[0034]
The audio file generation unit 407 constitutes audio file generation means in the present embodiment, and generates the audio data processed by the signal processing unit 404 as an audio file in a format that can be reproduced by various reproduction apparatuses. The audio file generated by the audio file generation unit 407 can be output as a stereophonic audio file. The audio file storage unit 408 constitutes audio file storage means in the present embodiment, and stores the audio file generated by the audio file generation unit 407 in the storage device 314.
[0035]
The above-described software block corresponds to functional means realized by the CPU 311
executing the program of the present embodiment to cause each hardware to function. Also, the
functional means shown in each embodiment may be entirely realized as software, or part or all
of them may be implemented as hardware that provides equivalent functions.
[0036]
So far, the functional configuration of the omnidirectional camera 110a in the present embodiment has been described. Next, the functional blocks that perform the specific processing of generating three-dimensional audio data from the acquired audio will be described. FIG. 5 is a block diagram showing the process of generating stereophonic audio data at the time of shooting.
[0037]
The functional blocks shown in FIG. 5 show the voice acquisition unit 401, the signal processing unit 404, and the voice file generation unit 407 of FIG. 4 in detail. FIG. 5 exemplifies the case where a directional microphone is connected as the external microphone 110b to the omnidirectional camera 110a whose built-in microphone is a nondirectional microphone. That is, the built-in microphone is a nondirectional microphone unit including the microphones of CH1 to CH4 (upper part of FIG. 5), and the external microphone 110b is a directional microphone unit including the microphones of CH5 to CH8 (lower part of FIG. 5). Although FIG. 5 shows the built-in microphone as a nondirectional microphone and the external microphone 110b as a directional microphone, this is only an example; other combinations are possible, and the external microphone 110b need not be connected.
[0038]
First, processing of an audio signal output from the built-in microphone will be described with
reference to the upper part of FIG. The audio signal input from each of the microphones (MICs)
of CH1 to CH4 has its signal level amplified by a preamplifier (Pre AMP). Generally, since the level of the signal from a microphone is small, amplification to a predetermined gain by the preamplifier makes the signal easier to handle in the circuits that perform the subsequent processing. The preamplifier may also perform impedance conversion.
[0039]
The audio signal amplified by the preamplifier is then digitized by an ADC (Analog to Digital Converter). After that, frequency separation and the like are performed on the digitized audio signal by various filters such as an HPF (High Pass Filter), an LPF (Low Pass Filter), an IIR (Infinite Impulse Response) filter, and an FIR (Finite Impulse Response) filter.
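As an illustrative sketch of this chain (the gain value, bit depth, and filter coefficient below are arbitrary choices, not taken from the patent), a preamplifier stage, a quantizing ADC stage, and a one-pole high-pass filter could look like:

```python
def preamp(samples, gain_db=20.0):
    """Amplify the (normalized) microphone signal by a fixed gain."""
    g = 10.0 ** (gain_db / 20.0)
    return [s * g for s in samples]

def adc(samples, bits=16):
    """Quantize samples in [-1, 1] to signed integer codes,
    clipping anything outside the full-scale range."""
    full = 2 ** (bits - 1) - 1
    return [max(-full, min(full, round(s * full))) for s in samples]

def highpass(samples, alpha=0.95):
    """One-pole high-pass filter: removes DC and low frequencies."""
    out, prev_x, prev_y = [], 0.0, 0.0
    for x in samples:
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out

dc = [0.5] * 200                     # a constant (0 Hz) offset
print(abs(highpass(dc)[-1]) < 1e-3)  # True: the HPF rejects DC
```

A real pipeline would of course use properly designed IIR/FIR filters; the sketch only shows where each stage sits in the chain.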
[0040]
Next, in the sensitivity correction block, the sensitivity of the audio signal input from each microphone and processed as above is corrected. Then, the compressor corrects the signal level. The correction processing by the sensitivity correction block and the compressor can reduce the signal gap between the channels of the microphones.
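A minimal sketch of such inter-channel gap reduction, assuming the correction simply scales each channel to a common RMS level (the reference level here is arbitrary and the approach is illustrative, not the patent's stated method):

```python
import math

def rms(channel):
    """Root-mean-square level of one channel."""
    return math.sqrt(sum(s * s for s in channel) / len(channel))

def match_sensitivity(channels, reference=0.1):
    """Scale every channel so its RMS equals `reference`, reducing
    level gaps caused by capsule-to-capsule sensitivity spread."""
    out = []
    for ch in channels:
        level = rms(ch)
        gain = reference / level if level > 0 else 1.0
        out.append([s * gain for s in ch])
    return out

# Two capsules recording the same tone at different sensitivities:
chans = [[0.2, -0.2, 0.2, -0.2], [0.05, -0.05, 0.05, -0.05]]
matched = match_sensitivity(chans)
print([round(rms(c), 3) for c in matched])  # [0.1, 0.1]
```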
[0041]
After that, in the directivity synthesis block, voice data is synthesized with the sensitivity characteristic of the directivity set by the user via the directivity setting unit 403. That is, when the microphone unit is a nondirectional microphone, the directivity synthesis block adjusts the parameters of the audio data output from the microphone unit based on the directivity selection information, and synthesizes voice data having directivity in the direction desired by the user.
[0042]
The speech data synthesized by the directivity synthesis block is subjected to various correction processes in the correction block. An example of the correction processing is correction of timing deviations or frequency characteristics caused by the frequency separation in the preceding filters. The audio data corrected by the correction block is output as a built-in microphone audio file and stored in the audio file storage unit 408 as stereophonic audio data.
[0043]
As an example, audio files containing stereophonic audio data can be stored in the ambisonics format. An ambisonics audio file includes audio data having the directivity of each of a nondirectional W component, an X component with directivity in the x-axis direction, a Y component with directivity in the y-axis direction, and a Z component with directivity in the z-axis direction. The audio file format is not limited to the ambisonics format, and the file may be generated and stored as a stereophonic audio file in another format.
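The relationship between four tetrahedral capsule signals and the W/X/Y/Z components can be sketched with the conventional first-order A-format-to-B-format conversion (the capsule labels and the 0.5 scaling are common conventions, assumed here rather than specified by the patent):

```python
def a_to_b(flu, frd, bld, bru):
    """Convert four A-format capsule signals from a tetrahedral array
    (front-left-up, front-right-down, back-left-down, back-right-up)
    into first-order B-format components W, X, Y, Z."""
    w = [0.5 * (a + b + c + d) for a, b, c, d in zip(flu, frd, bld, bru)]
    x = [0.5 * (a + b - c - d) for a, b, c, d in zip(flu, frd, bld, bru)]
    y = [0.5 * (a - b + c - d) for a, b, c, d in zip(flu, frd, bld, bru)]
    z = [0.5 * (a - b - c + d) for a, b, c, d in zip(flu, frd, bld, bru)]
    return w, x, y, z

# A sound reaching all four capsules identically has no direction:
# it appears only in the omnidirectional W component.
w, x, y, z = a_to_b([1.0], [1.0], [1.0], [1.0])
print(w, x, y, z)  # [2.0] [0.0] [0.0] [0.0]
```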
[0044]
Next, the processing of the audio signal output from the external microphone 110b will be described with reference to the lower part of FIG. 5. The presence or absence of the external microphone 110b is determined by the external microphone connection determination unit 402. If it is determined that the external microphone 110b is not connected, the following processing is not performed. On the other hand, when it is determined that the external microphone 110b is connected, the following processing is performed. The audio input from each of the microphones (MICs) of CH5 to CH8 included in the external microphone 110b is subjected to various signal processing by a preamplifier, an ADC, an HPF/LPF, an IIR/FIR filter, a sensitivity correction block, and a compressor. Since these various signal processes are the same as in the case of the built-in microphone, detailed description is omitted.
[0045]
After the above-described signal processing is performed, the audio data is input to the directivity conversion block. In the directivity conversion block, the voice data is converted with the sensitivity characteristic of the directivity set by the user via the directivity setting unit 403. That is, when the microphone unit is a directional microphone, the directivity conversion block adjusts the parameters of the audio data output from the four microphones constituting the microphone unit based on the directivity selection information, and converts it into voice data having directivity in the desired direction.
[0046]
The audio data converted by the directivity conversion block is subjected to various correction processes in a correction block. The correction processing is the same as that performed in the correction block for the built-in microphone. The audio data corrected by the correction block is output as an external microphone audio file and stored in the audio file storage unit 408 as stereophonic audio data. Like the built-in microphone audio file, the external microphone audio file can also be stored as three-dimensional audio data in various formats.
[0047]
The built-in microphone audio file and external microphone audio file generated and stored as
described above are transferred to various playback devices. For example, it can be reproduced
by a reproduction device such as the head mounted display 130, and can be viewed as
stereophonic sound.
[0048]
In another embodiment, at the time of reproduction of a captured moving image, stereoscopic
audio data having directivity in the direction desired by the user can be generated. FIG. 6 is a
diagram showing a block of a process of generating stereoscopic audio data at the time of
reproduction in the present embodiment.
[0049]
In the embodiment shown in FIG. 6, the built-in microphone audio file is generated in the same manner by the microphone, preamplifier, ADC, HPF/LPF, IIR/FIR filter, sensitivity correction block, and compressor described with reference to FIG. 5. When the external microphone 110b is connected to the omnidirectional camera 110a, an external microphone audio file is also generated in the same manner. These generated built-in microphone audio files and external microphone audio files do not have directivity of sensitivity characteristics at the generation stage.
[0050]
Next, each generated audio file is input to the directivity synthesis block. In addition, the directivity selection information set by the user in the directivity setting unit 403 is also input to the directivity synthesis block. The directivity synthesis block adjusts the parameters of the audio data contained in the audio files based on the directivity selection information, and synthesizes audio data having directivity in the direction desired by the user.
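A sketch of such playback-time synthesis, assuming the stored file is ambisonics B-format and the desired direction is rendered as a virtual cardioid steered toward an azimuth (the function name and 0.5 weighting are illustrative assumptions):

```python
import math

def virtual_mic(w, x, y, azimuth):
    """Steer a virtual cardioid toward `azimuth` (radians; 0 = +x axis)
    using the horizontal B-format components, emphasizing sound
    arriving from that direction."""
    cx, cy = math.cos(azimuth), math.sin(azimuth)
    return [0.5 * (wi + cx * xi + cy * yi)
            for wi, xi, yi in zip(w, x, y)]

# A unit plane wave from the +x direction is encoded (under one
# common first-order convention) as W = 1, X = 1, Y = 0 per sample.
w, x, y = [1.0], [1.0], [0.0]
print(virtual_mic(w, x, y, 0.0))      # [1.0]: gaze toward the source
print(virtual_mic(w, x, y, math.pi))  # [0.0]: gaze away, rejected
```

When the azimuth follows the head mounted display's gaze direction, this reproduces the behavior described in paragraph [0016].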
[0051]
Thereafter, the audio data synthesized by the directivity synthesis block is subjected to correction processing, such as for timing deviations and frequency characteristics, in the correction block. The audio data corrected by the correction block is output as a stereophonic audio playback file to a playback device such as the head mounted display 130 and can be heard as stereophonic sound.
[0052]
In addition to the directivity selection information, the posture data of the omnidirectional camera 110a at the time of shooting can be input to the directivity synthesis block and the directivity conversion block described with reference to FIGS. 5 and 6. By synthesizing or converting the directivity of the sensitivity characteristics together with the posture data, the directivity toward the direction of the sound source desired by the user can be maintained even when the omnidirectional camera 110a is tilted or rotated at the time of recording.
[0053]
The functional blocks that perform the specific processing of generating three-dimensional audio data from the acquired audio have been described above with reference to FIGS. 5 and 6. Next, the acquisition of three-dimensional audio in this embodiment is described. FIG. 7 is a view for explaining an example of the positional relationship between the built-in microphone included in the omnidirectional camera 110a and the external microphone 110b.
[0054]
FIG. 7A is a diagram showing the definition of the x-axis, y-axis, and z-axis when the omnidirectional camera system 110 is in its normal posture: the front-rear direction of the omnidirectional camera system 110 is defined as the x-axis, the left-right direction as the y-axis, and the up-down direction as the z-axis. The omnidirectional camera system 110 shown in FIG. 7A is provided with a built-in microphone, and an external microphone 110b is connected to the omnidirectional camera 110a. Below, the case where four microphones are included in each of the microphone units of the built-in microphone and the external microphone 110b is described as an example.
[0055]
In order to efficiently obtain three-dimensional sound data using four microphones, it is preferable that the microphones not be arranged on the same plane. In particular, for sound collection in the ambisonics format, the microphones are generally disposed at positions corresponding to the vertices of a regular tetrahedron, as shown in FIG. 7(b). In the ambisonics format, an audio signal collected with such an arrangement is also called the A format. Therefore, it is preferable that the built-in microphone included in the omnidirectional camera 110a of the present embodiment and the external microphone 110b also be arranged in a positional relationship corresponding to a regular tetrahedron, as shown in FIG. 7(b). The arrangement of the microphones described in this embodiment is an example, and does not limit the embodiment.
[0056]
The audio signal picked up in this way can be synthesized or converted by the signal processing unit 404 into a signal representation, called the B format, corresponding to sound collection with specific pick-up directivity characteristics, and a stereophonic audio file can be generated as shown in FIGS. 5 and 6. FIG. 8 is a diagram for explaining an example of the directivity of each directional component included in a stereophonic sound file of the ambisonics format.
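The synthesis from the four A-format capsule signals into B-format components can be sketched as follows. This is a minimal illustration of the standard first-order conversion for a tetrahedral array; the capsule naming (LFU, RFD, LBD, RBU) and the function name are assumptions for illustration, not taken from the embodiment.

```python
def a_to_b_format(lfu, rfd, lbd, rbu):
    # Standard first-order A-format -> B-format conversion for a
    # tetrahedral capsule arrangement (capsule order is an assumption):
    #   lfu: left-front-up, rfd: right-front-down,
    #   lbd: left-back-down, rbu: right-back-up
    w = 0.5 * (lfu + rfd + lbd + rbu)   # omnidirectional component
    x = 0.5 * (lfu + rfd - lbd - rbu)   # front-back (x-axis) component
    y = 0.5 * (lfu - rfd + lbd - rbu)   # left-right (y-axis) component
    z = 0.5 * (lfu - rfd - lbd + rbu)   # up-down (z-axis) component
    return w, x, y, z
```

A signal arriving equally at all four capsules contributes only to W, while the signed sums cancel it out of X, Y, and Z.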
[0057]
The spheres shown in FIG. 8 schematically represent the directivity of sound collection in the default state. FIG. 8(a) represents the directivity by one sphere centered on the origin, indicating that it is omnidirectional. FIG. 8(b) represents the directivity by two spheres centered at (x, 0, 0) and (-x, 0, 0), indicating directivity in the x-axis direction. Similarly, FIG. 8(c) represents the directivity by two spheres centered at (0, y, 0) and (0, -y, 0), indicating directivity in the y-axis direction, and FIG. 8(d) represents the directivity by two spheres centered at (0, 0, z) and (0, 0, -z), indicating directivity in the z-axis direction. That is, FIGS. 8(a) to 8(d) respectively correspond to the W, X, Y, and Z directivity components of the stereophonic audio file shown in FIGS. 5 and 6.
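The directivity of each component in FIG. 8 can also be expressed as a gain that depends on the direction of arrival. The sketch below uses the common first-order ambisonics convention (some conventions additionally scale W by 1/√2); the function and parameter names are assumptions for illustration.

```python
import math

def b_format_gains(azimuth_deg, elevation_deg):
    # Gain of each first-order component for a plane wave arriving from
    # (azimuth, elevation), following the usual ambisonics convention:
    # W is omnidirectional (FIG. 8(a)); X, Y, Z are figure-of-eight
    # patterns along the x, y, z axes (FIGS. 8(b)-(d)).
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = 1.0                            # same gain in every direction
    x = math.cos(az) * math.cos(el)    # peaks toward front/back
    y = math.sin(az) * math.cos(el)    # peaks toward left/right
    z = math.sin(el)                   # peaks toward up/down
    return w, x, y, z
```

A source directly in front (azimuth 0°, elevation 0°) excites only W and X; a source at the zenith excites only W and Z.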
[0058]
In the present embodiment, the user can change the directivity of the sensitivity characteristics, and the changed directivity is output as directivity selection information. The directivity selection information, which has directivity in the direction desired by the user, is used by the directivity synthesis block and the directivity conversion block as a parameter for synthesizing or converting the acquired sound. Next, how the user changes the directivity of the sensitivity characteristics will be described. FIG. 9 is a view showing an example of a screen for performing an operation of changing the directivity of the sensitivity characteristics in the present embodiment.
[0059]
FIG. 9 shows an example of the screen of the user terminal 120 for changing the directivity of the sensitivity characteristics of the omnidirectional camera system 110. The left view of each part of FIG. 9 is a plan view showing an example of the positional relationship between the omnidirectional camera system 110 and the sound source. The middle view of FIG. 9 shows the user operating the screen of the user terminal 120, on which a polar pattern diagram of the directivity of the sensitivity characteristics of the omnidirectional camera system 110 in the default state is displayed. The right view of FIG. 9 displays a polar pattern diagram of the changed directivity of the sensitivity characteristics, as modified by the user operation shown in the middle view. In the following, input operations that emphasize a specific directivity by changing the directivity of the sensitivity characteristics will be described, taking as examples the various situations shown in FIGS. 9(a) to 9(d).
[0060]
FIG. 9A shows an example in which there is a sound source in the front-rear direction of the omnidirectional camera system 110 and the directivity toward the direction of the sound source is selected. A polar pattern diagram of the xy plane is displayed on the screen in the middle view of FIG. 9A, and the user performs an operation of spreading two fingers touching the screen up and down. By this operation, the polar pattern narrows in the y-axis direction as shown in the right view of FIG. 9A, and a sensitivity characteristic having directivity in the x-axis direction can be set.
[0061]
FIG. 9B shows an example in which there is a sound source above the omnidirectional camera system 110 and the directivity toward the direction of the sound source is selected. A polar pattern diagram of the zx plane is displayed on the screen in the middle view of FIG. 9B, and the user performs an operation of moving two fingers touching the screen upward. By this operation, the polar pattern spreads in the positive direction of the z-axis as shown in the right view of FIG. 9B, and a sensitivity characteristic having directivity in one direction along the z-axis can be set.
[0062]
FIG. 9C shows an example in which there are sound sources in the lower left and upper right directions as viewed from the front of the omnidirectional camera system 110 and the directivity toward the directions of the sound sources is selected. A polar pattern diagram of the yz plane is displayed on the screen in the middle view of FIG. 9C, and the user performs an operation of spreading two fingers touching the screen toward the lower left and the upper right. By this operation, the polar pattern can be changed as shown in the right view of FIG. 9C, and a sensitivity characteristic having directivity in the direction from the upper right to the lower left in the yz plane can be set.
[0063]
FIG. 9D shows an example in which there is a sound source to the front right of the omnidirectional camera system 110 and an operation of selecting the directivity toward the direction of the sound source is performed. A polar pattern diagram of the xy plane is displayed on the screen in the middle view of FIG. 9D, and the user performs an operation of moving a finger touching the screen in the upper right direction. By this operation, the polar pattern can be changed to have directivity in the upper right direction of the xy plane as shown in the right view of FIG. 9D, and a sensitivity characteristic with sharp directivity toward the direction of the sound source can be set.
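One way to model the sharpened directivity selected by these operations is a first-order virtual microphone steered toward the sound source. The sketch below is an illustrative assumption, not the embodiment's actual directivity synthesis block: the `alpha` parameter blends the omnidirectional and figure-of-eight parts (1.0 gives an omnidirectional pattern, 0.5 a cardioid, 0.0 a figure-of-eight), so smaller values yield the sharper polar patterns of FIG. 9(d).

```python
import math

def virtual_mic_gain(source_az_deg, steer_az_deg, alpha=0.5):
    # First-order polar pattern in the xy plane:
    #   gain = alpha + (1 - alpha) * cos(angle between source and steering)
    # alpha = 1.0 -> omnidirectional, 0.5 -> cardioid, 0.0 -> figure-of-eight.
    diff = math.radians(source_az_deg - steer_az_deg)
    return alpha + (1.0 - alpha) * math.cos(diff)
```

Steering at 45° (the upper-right direction of FIG. 9(d)) gives full gain toward the source there and, for a cardioid, a null in the opposite direction.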
[0064]
As described above, the user sets the directivity of the sensitivity characteristics, and the directivity setting unit 403 outputs directivity selection information corresponding to the changed polar pattern. In the present embodiment, by operating the polar pattern diagram displayed on the screen, the user can change the directivity of the sensitivity characteristics in a visually intuitive manner. Although operation via a touch-panel display is illustrated in the example of FIG. 9, the operation is not limited to this; other methods, such as mouse operation, may be used. Further, the operation of changing the directivity of the sensitivity characteristics is not limited to those shown in FIG. 9, and directivity selection information having directivity in the direction desired by the user can be generated by various operations.
[0065]
Further, in the present embodiment, by acquiring the attitude of the omnidirectional camera system 110 and recording zenith information, the directivity of the sensitivity characteristics desired by the user can be maintained even when the shooting attitude changes. FIG. 10 is a diagram for explaining the directivity when the attitude of the omnidirectional camera system 110 changes in the present embodiment. In FIG. 10, the directivity of the sensitivity characteristics shown in the right view of FIG. 9(b) is used as an example.
[0066]
FIG. 10A shows the case where the omnidirectional camera system 110 is in the default normal posture state, the same posture as shown in FIG. 9B. At this time, the user selects directivity as in the polar pattern shown in the right view of FIG. 9B and selects a mode for recording with the zenith direction fixed. Therefore, the directivity of the sensitivity characteristics shown on the right of FIG. 10(a) is the same as that of FIG. 9(b).
[0067]
It is assumed that, after performing the operation of recording the zenith direction, the user changes the attitude of the omnidirectional camera system 110 as shown in FIGS. 10B and 10C. For example, as shown in the left view of FIG. 10(b), even if the omnidirectional camera system 110 is turned upside down, the zenith direction remains fixed, so the polar pattern keeps a shape with directivity extending in the negative direction of the z-axis as shown in the right view of FIG. 10(b), and sound can still be collected from a sound source in the zenith direction.
[0068]
Further, as shown in the left view of FIG. 10C, when the omnidirectional camera system 110 is tilted by 90° in the lateral direction, the x-axis direction becomes the zenith direction. Therefore, the polar pattern in this case has directivity spreading in the positive direction of the x-axis as shown in the right view of FIG. 10C, and sound can be collected from the sound source in the zenith direction as in the case of FIG. 10B.
[0069]
In the present embodiment, the attitude data of the omnidirectional camera system 110 is acquired in this manner, and recording is performed with the zenith direction fixed. Therefore, even when the attitude of the omnidirectional camera system 110 changes during shooting, the directivity of the sensitivity characteristics with respect to the direction of the sound source can be maintained, and sound can be collected from the direction desired by the user. In the description of FIG. 10, the cases where the omnidirectional camera system 110 is inclined 90° and 180° with respect to the normal orientation have been described as examples, but the attitude of the omnidirectional camera system 110 can take any angle.
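Keeping the directivity fixed toward the zenith when the camera tilts amounts to counter-rotating the X, Y, and Z components by the camera's attitude while leaving W untouched, since first-order B-format components rotate like a vector. The sketch below handles only a tilt about the y-axis (the lateral 90° tilt of FIG. 10(c)) and is an illustrative assumption rather than the embodiment's conversion block; function and parameter names are hypothetical.

```python
import math

def rotate_b_format_about_y(w, x, y, z, tilt_deg):
    # W is rotation-invariant; (X, Z) rotate in the zx plane by the
    # camera tilt angle so the pattern stays fixed in world coordinates.
    a = math.radians(tilt_deg)
    x_rot = x * math.cos(a) + z * math.sin(a)
    z_rot = -x * math.sin(a) + z * math.cos(a)
    return w, x_rot, y, z_rot
```

A pattern pointing at the zenith (Z component only) moves entirely into the X component after a 90° lateral tilt, matching the polar pattern on the right of FIG. 10(c).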
[0070]
Up to this point, the change of the directivity of the sensitivity characteristics and the attitude of the omnidirectional camera system 110 at the time of shooting have been described. Next, specific processing executed in the present embodiment will be described with reference to FIG. 11. FIG. 11 is a flowchart of the process of capturing an image including stereophonic sound in the present embodiment.
[0071]
In the present embodiment, the process starts from step S1000, and in step S1001 the audio acquisition mode is set. The settings performed in step S1001 include the presence or absence of a connection to the external microphone 110b, the setting of directivity selection information, and the like; the details of these settings will be described later.
[0072]
In addition, the omnidirectional camera 110a can acquire surrounding sound at start-up or during various settings, compare the signals from the microphones included in the microphone units, and alert the user when a defect is detected. For example, in defect detection, suppose that audio signals are output from three of the four microphones included in a microphone unit while the signal level from the remaining microphone is low; in that case, it is determined that this microphone has a defect. When the signal output of some microphones is lowered or the microphones are blocked in this way, directivity conversion and synthesis cannot be performed appropriately, and there is a risk that suitable stereophonic audio data cannot be generated. Therefore, when a defect in the signal of a microphone is detected as described above, an alert notifying the user of the occurrence of the defect is displayed on the user terminal 120 to prompt the user to take action. The above-described process may also be performed during shooting.
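The defect check described above can be sketched as a comparison of per-capsule signal levels. The use of RMS level and the threshold ratio are assumptions for illustration; the embodiment only specifies that a markedly low level from one microphone is treated as a defect.

```python
import math

def detect_faulty_mics(channels, ratio=0.1):
    # channels: list of sample lists, one per microphone capsule.
    # A capsule is flagged when its RMS level falls below `ratio` times
    # the median level of all capsules (assumed threshold).
    rms = [math.sqrt(sum(s * s for s in ch) / len(ch)) for ch in channels]
    median = sorted(rms)[len(rms) // 2]
    return [i for i, level in enumerate(rms) if level < ratio * median]
```

When three of the four capsules output comparable levels and one is nearly silent, only that capsule is reported, and the application can then raise the alert on the user terminal 120.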
[0073]
Thereafter, the user inputs an instruction to start shooting in step S1002. The input in step S1002 may be performed, for example, by pressing a shooting button provided on the omnidirectional camera 110a. Alternatively, an instruction to start shooting may be transmitted to the omnidirectional camera 110a via an application installed on the user terminal 120.
[0074]
When the start of shooting is input in step S1002, in step S1003 the omnidirectional camera 110a acquires attitude data, defines information on the zenith direction, and records it. By defining the zenith information in step S1003, even if the attitude of the omnidirectional camera system 110 changes during shooting, the sound in the direction desired by the user can be acquired.
[0075]
Thereafter, in step S1004, the mode set in step S1001 is referred to, and it is determined whether the directivity of the sensitivity characteristics has been set. If the directivity has been set (YES), the process branches to step S1005 to call the set directivity selection information and then proceeds to step S1006. If the directivity has not been set (NO), the process branches directly to step S1006.
[0076]
In step S1006, image shooting and audio recording are performed in the set mode, and in step S1007 it is determined whether an instruction to end shooting has been input. As with the shooting start input in step S1002, the shooting end instruction is issued by pressing the shooting button of the omnidirectional camera 110a or the like. If the end of shooting has not been input (NO), the process returns to step S1006 to continue shooting and recording. If it is determined in step S1007 that the end of shooting has been input (YES), the process proceeds to step S1008.
[0077]
In step S1008, the image data and the audio data are stored in the storage device 314 of the omnidirectional camera 110a, and the process ends in step S1009. In particular, the audio data can be subjected to directivity synthesis or directivity conversion and stored in the audio file storage unit 408 as stereophonic audio data.
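The flow of steps S1002 to S1008 can be sketched as a simple loop. The dictionary-based camera interface below (keys such as `posture`, `shoot`, `record`, `stop`) is purely a hypothetical stand-in for the omnidirectional camera's API, used to make the control flow concrete.

```python
def run_capture(camera):
    # S1003: acquire attitude data and record the zenith information.
    zenith = camera["posture"]()
    # S1004/S1005: call the directivity selection information if set.
    directivity = camera["audio_mode"].get("directivity")
    frames, audio = [], []
    # S1006/S1007: shoot and record until the end of shooting is input.
    while not camera["stop"]():
        frames.append(camera["shoot"]())
        audio.append(camera["record"]())
    # S1008: return the data to be stored in the storage device.
    return {"zenith": zenith, "directivity": directivity,
            "frames": frames, "audio": audio}
```

Because the zenith is captured once before the loop, a later attitude change does not alter the recorded zenith information, mirroring the behavior described for FIG. 10.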
[0078]
By the processing described above, the omnidirectional camera system 110 can acquire images and sound. Next, the details of the setting of the audio acquisition mode in step S1001 will be described. FIG. 12 is a flowchart of the process of setting the audio acquisition mode in the present embodiment, corresponding to the process of step S1001 in FIG. 11.
[0079]
The setting of the audio acquisition mode starts from step S2000. In step S2001, it is selected whether the recording mode is a mode in which the sensitivity characteristics of the microphones are given directivity in a specific direction to acquire stereophonic sound, or a mode in which normal stereophonic sound is acquired. If the mode for specifying the sensitivity characteristics in a specific direction and acquiring stereophonic sound is selected (YES), the process branches to step S2002; if the mode for acquiring normal stereophonic sound is selected (NO), the process branches to step S2006.
[0080]
In step S2002, the input of directivity selection information is accepted. The directivity selection information can be set, for example, by changing the polar pattern of the directivity of the sensitivity characteristics through operation of the user terminal 120, as shown in FIG. 9. By the operation of step S2002, the user can give directivity toward a specific sound source and can easily set the directivity.
[0081]
Thereafter, in step S2003, the external microphone connection determination unit 402 determines whether the external microphone 110b is connected to the omnidirectional camera 110a. If the external microphone 110b is connected (YES), the process proceeds to step S2004; if it is not connected (NO), the process proceeds to step S2005.
[0082]
In step S2004, the audio acquisition mode is set as a mode for acquiring stereophonic sound with directivity in the selected direction by using the built-in microphone and the external microphone 110b in combination, and the process ends in step S2009.
[0083]
Also, in step S2005, the audio acquisition mode is set as a mode for acquiring stereophonic sound with directivity in the selected direction using only the built-in microphone, and the process ends in step S2009.
[0084]
Next, the case where the mode for acquiring normal stereophonic sound is selected in step S2001 (NO) will be described. When the process branches from step S2001 to step S2006, the external microphone connection determination unit 402 determines in step S2006 whether the external microphone 110b is connected to the omnidirectional camera 110a. Note that the process of step S2006 can be performed in the same manner as the process of step S2003. If an external microphone is connected (YES), the process proceeds to step S2007; if not (NO), the process proceeds to step S2008.
[0085]
In step S2007, the audio acquisition mode is set as a mode for acquiring normal stereophonic sound by using the built-in microphone and the external microphone 110b in combination, and the process ends in step S2009.
[0086]
Also, in step S2008, the audio acquisition mode is set as a mode for acquiring normal stereophonic sound using only the built-in microphone, and the process ends in step S2009.
[0087]
The audio acquisition mode can be set by the processing described above. The set audio acquisition mode is used as the criterion for the determination process in step S1004 of FIG. 11. Also, the directivity selection information input in step S2002 is called as a set value in step S1005 and is used as a parameter when acquiring stereophonic sound.
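The branching of FIG. 12 reduces to two independent choices: whether directivity is specified (S2001) and whether the external microphone is connected (S2003/S2006). A minimal sketch, with assumed function and field names:

```python
def set_audio_mode(use_directivity, directivity_info, external_connected):
    # S2003/S2006: result of the external microphone connection determination.
    mics = "built-in + external" if external_connected else "built-in only"
    if use_directivity:
        # S2002/S2004/S2005: stereophonic sound with selected directivity.
        return {"mics": mics, "directivity": directivity_info}
    # S2007/S2008: normal stereophonic sound.
    return {"mics": mics, "directivity": None}
```

The returned dictionary plays the role of the mode referred to in step S1004: a non-None `directivity` entry corresponds to the YES branch of that determination.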
[0088]
As described above, according to the embodiment of the present invention, it is possible to provide an apparatus, system, method, and program capable of adding a sense of reality desired by a user and a user-specific expression.
[0089]
Each function of the embodiment of the present invention described above can be realized by a device-executable program written in C, C++, C#, Java (registered trademark), or the like. The program of this embodiment can be stored and distributed on a device-readable recording medium such as a hard disk drive, CD-ROM, MO, DVD, flexible disk, EEPROM, or EPROM, and can be transmitted via a network in a format usable by other devices.
[0090]
Although the present invention has been described with reference to the embodiment, the present invention is not limited to the embodiment described above; other embodiments that those skilled in the art may conceive are included in the scope of the present invention as long as the operations and effects of the present invention are exhibited.
[0091]
DESCRIPTION OF SYMBOLS 110: omnidirectional camera system, 110a: omnidirectional camera, 110b: external microphone, 120: user terminal, 130: head mounted display, 311, 321: CPU, 312, 322: RAM, 313, 323: ROM, 314, 324: storage device, 315, 325: communication I/F, 316: audio input I/F, 317a: non-directional microphone, 317b: directional microphone, 318: imaging device, 319: posture sensor, 326: display device, 327: input device, 401: audio acquisition unit, 402: external microphone connection determination unit, 403: directivity setting unit, 404: signal processing unit, 405: device attitude acquisition unit, 406: zenith information recording unit, 407: audio file generation unit, 408: audio file storage unit
[0092]
Japanese Patent No. 5777185 (gazette)