JP2009296274
Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2009296274
An audio/video signal processing apparatus is provided that can automatically adjust the audio
to suit the image according to the video scene. In the video/audio signal processing device, a
video scene update detection unit detects a video scene update based on decode information
obtained from the video decoder during decoding, a video scene feature determination unit
determines the feature of the new video scene from the decoded image output from the video
decoder 11, a sound field control information generation unit 15 generates sound field control
information for producing a sound field suited to the video scene according to that feature, and
a sound field adjustment unit 16 adjusts the sound field of the decoded audio output from the
audio decoder 12 based on the sound field control information. [Selected figure] Figure 1
Video and audio signal processing device
[0001]
The present invention relates to a video and audio signal processing apparatus.
[0002]
Content stored in media such as digital television broadcast or online distributed moving image
content and DVD has a stream data format in which image data and audio data compressed and
encoded respectively are multiplexed.
[0003]
08-05-2019
1
Therefore, in the video / audio signal processing apparatus to which these contents are input,
first, the input stream data is separated into a video stream and an audio stream by a Demux
(multiplex signal separator).
[0004]
Thereafter, the video stream is decoded by the video decoder, and the decoded image is output to
the video output device after being image-adjusted by the video filter.
[0005]
On the other hand, the audio stream is decoded by an audio decoder, and the decoded audio is
output to an audio output device after being subjected to audio adjustment by an audio filter.
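The demultiplex-decode-filter flow described in paragraphs [0003] to [0005] can be sketched as follows. This is a minimal illustrative sketch, not code from the patent; all function names and the packet format are assumptions.

```python
# Minimal sketch of the input pipeline: Demux separates the multiplexed
# stream, each stream is decoded, then image/audio adjustment is applied.
# Function names and the dict-based packet format are illustrative only.

def demux(stream_data):
    """Separate multiplexed stream data into a video stream and an audio stream."""
    video_stream = [pkt for pkt in stream_data if pkt["type"] == "video"]
    audio_stream = [pkt for pkt in stream_data if pkt["type"] == "audio"]
    return video_stream, audio_stream

def process(stream_data, video_decoder, audio_decoder, video_filter, audio_filter):
    """Decode each stream, then apply the video/audio filter before output."""
    video_stream, audio_stream = demux(stream_data)
    video_out = [video_filter(video_decoder(pkt)) for pkt in video_stream]
    audio_out = [audio_filter(audio_decoder(pkt)) for pkt in audio_stream]
    return video_out, audio_out
```

The decoders and filters are passed in as callables here, mirroring how the later embodiments swap in different filter behavior without changing the pipeline.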
[0006]
Conventionally, when outputting such video and audio, in addition to simply reproducing the
input video and audio data as they are, some processing may be applied to the video or audio.
[0007]
For example, a digital broadcast receiving apparatus has been proposed that simultaneously
highlights the subtitles and audio output to notify the user of a scene switch when a specific
scene matching the user's preference is broadcast (see, for example, Patent Document 1).
[0008]
The proposed digital broadcast receiver meets the user's desire not to miss a favorite scene.
[0009]
Incidentally, one user request is to have the audio automatically adjusted to suit the image
according to the image scene.
For example, in a scene in which performers talk in a talk program, there is a demand for
automatically adjusting the sound so that human conversation can be heard easily.
[0010]
However, the proposed device described above has a problem in that the audio is merely
enhanced at a scene switch; it is not adjusted to audio matched to the switched scene.
JP 2005-109925 A (pages 3-4, FIG. 1)
[0011]
Therefore, an object of the present invention is to provide a video and audio signal processing
apparatus capable of automatically adjusting to a sound suitable for a video scene according to
the video scene.
[0012]
According to one aspect of the present invention, there is provided a video and audio signal
processing apparatus comprising: a video decoder for decoding a video stream; an audio decoder
for decoding an audio stream; video scene update detection means for detecting a video scene
update based on decode information obtained from the video decoder during decoding; video
scene feature judging means for judging, when the video scene update detection means detects
the start of a new video scene, the feature of the video scene from the decoded image output
from the video decoder; sound field control information generating means for generating sound
field control information for controlling a sound field suited to the video scene according to the
feature judged by the video scene feature judging means; and sound field adjusting means for
adjusting the sound field of the decoded audio output from the audio decoder based on the
sound field control information output from the sound field control information generating
means.
[0013]
According to the present invention, the audio can be adjusted automatically to suit the video
according to the video scene.
[0014]
Hereinafter, embodiments of the present invention will be described with reference to the
drawings.
[0015]
In the present embodiment, the video and audio content is assumed to be content in which
performers talk on a talk program.
In this content, it is assumed that the image contains the appearance of a performer, in
particular a view centered on the face, and that the sound mainly contains the performers'
voices.
[0016]
When the video and audio streams of the above-described video and audio content are input, the
video and audio signal processing apparatus according to the present embodiment adjusts and
outputs the audio so that the performers' conversation can be heard easily.
[0017]
FIG. 1 is a block diagram showing an example of the configuration of a video and audio signal
processing apparatus according to a first embodiment of the present invention.
[0018]
The video and audio signal processing apparatus 1 according to the present embodiment
includes: a video decoder 11 for decoding an input video stream; an audio decoder 12 for
decoding an input audio stream; a video scene update detection unit 13 that detects a video
scene update based on decode information obtained from the video decoder 11 during decoding;
a video scene feature determination unit 14 that, when the video scene update detection unit 13
detects the start of a new video scene, determines the feature of the video scene from the
decoded image output from the video decoder 11; a sound field control information generation
unit 15 that generates, according to the feature determined by the video scene feature
determination unit 14, sound field control information for controlling a sound field suited to the
video scene; a sound field adjustment unit 16 that adjusts the sound field of the decoded audio
output from the audio decoder 12 based on the sound field control information output from the
sound field control information generation unit 15; and a video filter 17 that performs a
predetermined filtering process on the decoded image output from the video decoder 11.
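The control flow among units 13, 14, 15 and 16 can be sketched as a per-frame loop. This is an illustrative sketch only; the class and parameter names are assumptions, and each unit is modeled as a callable so the embodiments can plug in their own logic.

```python
# Illustrative sketch of the Fig. 1 control flow: on each scene update the
# scene feature is re-judged and new sound field control information is
# generated; otherwise the current control information keeps being applied.
class VideoAudioProcessor:
    def __init__(self, detect_update, judge_feature, make_control_info, adjust_field):
        self.detect_update = detect_update          # video scene update detection unit 13
        self.judge_feature = judge_feature          # video scene feature determination unit 14
        self.make_control_info = make_control_info  # sound field control info generation unit 15
        self.adjust_field = adjust_field            # sound field adjustment unit 16
        self.control_info = "standard"

    def step(self, decode_info, decoded_image, decoded_audio):
        if self.detect_update(decode_info):
            feature = self.judge_feature(decoded_image)
            self.control_info = self.make_control_info(feature)
        # The current control information persists until the next scene update.
        return self.adjust_field(decoded_audio, self.control_info)
```

Note that the control information is sticky: as stated in paragraph [0028], a filter setting remains in effect until a later scene update changes it.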
[0019]
The video/audio signal processing device 1 determines, every time the video scene changes,
whether the new video scene is a performers' conversation scene.
For that purpose, the video scene update detection unit 13 detects video scene updates.
[0020]
The video scene update detection unit 13 detects a video scene update based on the decode
information on the scene change obtained from the video decoder 11 during the decoding
process.
[0021]
Decode information relating to a scene change is, for example, in the H.264 moving image
compression coding standard, information indicating that the picture type has become I type, or
information indicating that the values of the motion vectors are widely scattered across
macroblocks.
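The two cues named above can be combined into a simple detector. This is a sketch under assumptions: the patent does not specify a dispersion measure or threshold, so the variance computation and the threshold value here are illustrative.

```python
# Hypothetical scene-change test based on the two cues in paragraph [0021]:
# an I picture, or per-macroblock motion vectors that are widely scattered.
def scene_updated(picture_type, motion_vectors, variance_threshold=100.0):
    if picture_type == "I":
        return True
    if not motion_vectors:
        return False
    # Dispersion of the (x, y) motion vectors across macroblocks
    n = len(motion_vectors)
    mean_x = sum(v[0] for v in motion_vectors) / n
    mean_y = sum(v[1] for v in motion_vectors) / n
    variance = sum((v[0] - mean_x) ** 2 + (v[1] - mean_y) ** 2
                   for v in motion_vectors) / n
    return variance > variance_threshold
```

In practice the threshold would have to be tuned to the encoder and content; uniform motion (e.g. a pan) yields low variance and is correctly not treated as a scene change.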
[0022]
The video scene feature determination unit 14 includes a face detection unit 141 that detects a
person's face from the decoded image output from the video decoder 11, and a speech
determination unit 142 that detects the movement of the mouth in the face detected by the face
detection unit 141 and determines whether or not the person is speaking.
[0023]
The face detection unit 141 detects whether the face of a person is included in the decoded
image using face recognition technology.
[0024]
The utterance determination unit 142 pays attention to the movement of the mouth in the face
detected by the face detection unit 141; when the mouth shows movement such as opening and
closing, it determines that the detected face is speaking.
[0025]
When the speech determination unit 142 determines that the person is speaking, the video scene
feature determination unit 14 determines that the feature of the current video scene is a
"person's conversation scene".
[0026]
When the video scene feature determination unit 14 determines that the scene is a person's
conversation scene, the sound field control information generation unit 15 generates, as the
sound field control information, "sound filter information of a frequency characteristic suitable
for listening to a person's conversation".
[0027]
The sound field adjustment unit 16 sets the frequency characteristic of its built-in sound filter
according to the "sound filter information of the frequency characteristic suitable for listening to
the person's conversation" output from the sound field control information generation unit 15,
and filters the decoded audio output from the audio decoder 12.
As a result, the sound field adjustment unit 16 outputs audio adjusted so that the person's
conversation can be heard easily.
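One plausible form of such a frequency characteristic is a boost over the typical speech band. This is a sketch under assumptions: the patent does not specify the characteristic, so the band limits (roughly the 300-3400 Hz telephony speech range) and the boost amount are illustrative.

```python
# Hypothetical "frequency characteristic suitable for listening to
# conversation": boost bands in the typical speech range, leave others flat.
# The band edges and boost value are assumptions, not from the patent.
def speech_emphasis_gains(band_centers_hz, boost_db=6.0, lo=300.0, hi=3400.0):
    """Return a per-band gain (in dB) for an equalizer-style sound filter."""
    return [boost_db if lo <= f <= hi else 0.0 for f in band_centers_hz]
```

A real implementation would realize these gains with IIR/FIR filter stages on the decoded PCM samples; the per-band representation keeps the sketch self-contained.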
[0028]
Note that this audio filter process continues until the video scene update detection unit 13
detects a new video scene update and the video scene feature determination unit 14 determines
that the new video scene is not a person's conversation scene.
[0029]
When the video scene feature determination unit 14 determines that the new video scene is not
a person's conversation scene, the sound field control information generation unit 15 generates
"sound filter information of standard frequency characteristics" as the sound field control
information.
Thereby, the sound field adjustment unit 16 performs standard filter processing on the decoded
audio output from the audio decoder 12.
[0030]
According to the present embodiment, it is determined whether or not a person's conversation
scene is included in the decoded image output from the video decoder, and when a person's
conversation scene is detected, audio filter processing of a frequency characteristic suitable for
listening to a person's conversation can be performed automatically on the decoded audio
output from the audio decoder.
Thereby, the conversation of the person shown in the video can automatically be made easier to
hear.
[0031]
In this embodiment, the video and audio content is assumed to be content in which a moving
object, such as a car in a car race, moves across the screen, and the audio is monaural.
[0032]
When the video stream and the audio stream of the above-described video and audio content are
input, the video/audio signal processing apparatus according to the present embodiment adjusts
the audio so as to emphasize the features of the moving object, moves the sound in accordance
with the motion of the moving object, and outputs immersive audio.
[0033]
FIG. 2 is a block diagram showing an example of the configuration of a video and audio signal
processing apparatus according to a second embodiment of the present invention.
[0034]
The video and audio signal processing apparatus 2 according to the present embodiment
includes: a video decoder 11 for decoding an input video stream; an audio decoder 12 for
decoding an input audio stream; a video scene update detection unit 13 that detects a video
scene update based on decode information obtained from the video decoder 11 during decoding;
a video scene feature determination unit 24 that, when the video scene update detection unit 13
detects the start of a new video scene, determines the feature of the video scene from the
decoded image output from the video decoder 11; a sound field control information generation
unit 25 that generates, according to the feature determined by the video scene feature
determination unit 24, sound field control information for controlling a sound field suited to the
video scene; a sound field adjustment unit 16 that adjusts the sound field of the decoded audio
output from the audio decoder 12 based on the sound field control information output from the
sound field control information generation unit 25; and a video filter 17 that performs a
predetermined filtering process on the decoded image output from the video decoder 11.
[0035]
In FIG. 2, the blocks having the same functions as in the first embodiment are given the same
reference numerals as in FIG. 1, and the detailed description thereof is omitted here.
[0036]
The video scene feature determination unit 24 of the present embodiment includes a moving
object detection unit 241 that detects a moving object from the decoded image output from the
video decoder 11, and a position information generation unit 242 that, when the moving object
detection unit 241 detects a moving object, generates position information of the moving object
based on the motion vector data included in the decode information output from the video
decoder 11.
[0037]
The moving object detection unit 241 compares a pattern image extracted from the decoded
image with reference patterns of a car, a train, an aircraft and the like registered in advance;
when a reference pattern with a high degree of coincidence is found, it determines that a moving
object of that type has been detected.
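The reference-pattern comparison might be sketched as follows. This is illustrative only: the patent does not define the pattern representation or the matching score, so the element-wise coincidence measure and the threshold are assumptions.

```python
# Hypothetical pattern match: compare an extracted pattern against
# pre-registered references (car, train, aircraft, ...) and return the type
# with the highest degree of coincidence, if it is high enough.
def classify_moving_object(pattern, reference_patterns, min_score=0.8):
    def score(a, b):
        matches = sum(1 for x, y in zip(a, b) if x == y)
        return matches / max(len(a), len(b))

    best_type, best_score = None, 0.0
    for obj_type, ref in reference_patterns.items():
        s = score(pattern, ref)
        if s > best_score:
            best_type, best_score = obj_type, s
    return best_type if best_score >= min_score else None
```

Returning `None` below the threshold corresponds to "no moving object detected", which later drives the fall-back to standard filtering in paragraph [0045].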
[0038]
The moving object detection unit 241 generates moving object information on the type of the
detected moving object.
[0039]
When the moving object detection unit 241 detects a moving object, the position information
generation unit 242 generates position information of the moving object based on the position
of the object in the image and the motion vector data included in the decode information output
from the video decoder 11.
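A rough illustration of how position information might be advanced from motion vector data follows. The patent gives no formula, so averaging the object's per-macroblock motion vectors is an assumption made for the sketch.

```python
# Hypothetical position tracking: advance the moving object's (x, y) position
# by the average motion vector of the macroblocks covering the object.
def update_position(position, motion_vectors):
    if not motion_vectors:
        return position
    n = len(motion_vectors)
    dx = sum(v[0] for v in motion_vectors) / n
    dy = sum(v[1] for v in motion_vectors) / n
    return (position[0] + dx, position[1] + dy)
```

Using the decoder's motion vectors avoids a separate optical-flow computation, which matches the patent's stated reliance on decode information.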
[0040]
When the moving object detection unit 241 detects a moving object, the video scene feature
determination unit 24 determines that the feature of the current video scene is a moving scene
of a moving object, and outputs the moving object information generated by the moving object
detection unit 241 and the position information generated by the position information
generation unit 242 to the sound field control information generation unit 25.
[0041]
The sound field control information generation unit 25 generates, based on the moving object
information generated by the moving object detection unit 241, sound filter information for
emphasizing the features of the detected moving object. For example, if the moving object is a
car, it generates sound filter information that emphasizes engine sounds and the like.
[0042]
In addition, the sound field control information generation unit 25 generates sound intensity
information for changing the balance of the sound intensity on the left and right based on the
position information generated by the position information generation unit 242.
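The left/right balance of paragraph [0042] can be sketched as a pan computed from the object's horizontal position. The patent only says the balance is changed; mapping position to constant-power pan gains is an assumption chosen for the sketch because it keeps perceived loudness roughly constant as the sound moves.

```python
import math

# Hypothetical constant-power pan: map the object's horizontal position
# (0 .. frame_width) to left/right speaker gains for a monaural source.
def left_right_gains(x_position, frame_width):
    pan = max(0.0, min(1.0, x_position / frame_width))  # 0 = left, 1 = right
    angle = pan * math.pi / 2
    return math.cos(angle), math.sin(angle)  # (left_gain, right_gain)
```

With constant-power panning, left_gain² + right_gain² = 1 at every position, so the sound neither dips nor swells as the moving object crosses the screen.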
[0043]
The sound field adjustment unit 16 sets the frequency characteristic of its built-in sound filter
according to the "sound filter information emphasizing the features of the moving object" output
from the sound field control information generation unit 25, and performs filter processing on
the decoded audio output from the audio decoder 12.
[0044]
In addition, the sound field adjustment unit 16 changes the left and right sound intensity of the
sound output device, such as a speaker, according to the left and right sound intensity
information output from the sound field control information generation unit 25.
[0045]
Also in this embodiment, when the video scene update detection unit 13 detects a new video
scene update and the video scene feature determination unit 24 determines that no moving
object is detected in the new video scene, the sound field control information generation unit 25
changes the sound field control information to "sound filter information of standard frequency
characteristics".
Thereby, the processing by the sound field adjustment unit 16 on the decoded audio output from
the audio decoder 12 is changed to standard filter processing.
In addition, the balance of the left and right sound intensity is returned to the standard state.
[0046]
According to the present embodiment, it is determined whether or not a moving object is
included in the decoded image output from the video decoder; when a moving object is detected,
audio filter processing that emphasizes the features of the detected moving object can be
performed automatically on the decoded audio output from the audio decoder, and the sound
can be moved in accordance with the movement of the moving object on the screen.
As a result, even with monaural audio content, it is possible to enjoy realistic audio in which the
sound moves in accordance with the movement of the moving object shown in the video.
[0047]
FIG. 1 is a block diagram showing an example of the configuration of a video and audio signal
processing apparatus according to a first embodiment of the present invention.
FIG. 2 is a block diagram showing an example of the configuration of a video and audio signal
processing apparatus according to a second embodiment of the present invention.
Explanation of signs
[0048]
1, 2 Video/audio signal processing device
11 Video decoder
12 Audio decoder
13 Video scene update detection unit
14, 24 Video scene feature determination unit
15, 25 Sound field control information generation unit
16 Sound field adjustment unit
17 Video filter
141 Face detection unit
142 Speech determination unit
241 Moving object detection unit
242 Position information generation unit