JP2004185514

код для вставкиСкачать
Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2004185514
PROBLEM TO BE SOLVED: To provide an audio recording device that lets participants create a display through a simple operation that does not interrupt the meeting. On this display they can share a list of the proceedings, and it can also serve as an accurate record.
SOLUTION: The system comprises a CPU 1 that controls the entire system, a RAM 2 serving as a work memory, a hard disk 3 that stores programs and data, a keyboard 6 and a mouse 7 serving as input devices, a monitor 4 composed of a CRT, LCD, or the like for displaying images, an audio input interface 8 for inputting audio data from a microphone 9, an audio output interface 10 for outputting audio data to a speaker 11, and a system bus 5 connecting these components. [Selected figure] Figure 1
Voice recording apparatus, voice recording method, voice recording program and recording
medium
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to an audio recording apparatus, and more particularly to techniques for recording and retrieving video and audio.
2. Description of the Related Art
Meetings with several participants often use a whiteboard or a similar surface on which everyone can view written and posted content at the same time. Participants write or post on it the presenter's explanations, their own opinions, the contents of reports, and the progress of the discussion, so this information is shared among everyone present. Spoken remarks cannot be referred to once they have passed, so the whiteboard serves two functions. First, it records what was said, so participants can refer to it during and after the meeting. Second, it shows everyone what has been discussed so far and what remains to be discussed, which helps move the meeting forward. For recording purposes, detailed content on the board is desirable. However, transcribing detailed content by hand takes time, which
04-05-2019
hinders the progress of the meeting. In addition, the more detailed each item becomes, the more text there is, and the harder it becomes to take in the list at a glance. A means of recording spoken discussion accurately and without effort, while supporting the task of organizing and listing items on the board, would therefore make meetings more efficient.
Along these lines, Japanese Patent Laid-Open No. 11-53385 discloses a system that records speech and the like and plays the recording back after the meeting. In that system, retrieval omissions are reduced by two measures: the structure of the remarks is displayed graphically, and remarks matching a user-specified remark are indicated. Extracting a semantic structure from a series of utterances, however, is a high-level intellectual task, and it is the participants who should perform it during the meeting. Humans can take charge of extracting structure and writing the displayed content, while machines handle accurate recording and reproduction. This division achieves both an accurate record and an advanced structural display while reducing the users' burden during the meeting. From this point of view, the same applicant has proposed two systems. In the first, a multimedia document creation system, a user designates and cuts out part of video and audio data recorded over a long period. The second reproduces the cut-out part. To help the user find the cut-out position (time range), these systems record temporal changes in quantities such as the sound source direction and retouch information and display them spatially. The clipped data can also be pasted into any application by copying it to a clipboard or the like. Japanese Patent Application Laid-Open No. 2002-247489 discloses a device that combines a microphone array and a video camera. The device estimates the sound source direction with the microphones while recording an image of the whole scene. A name is then entered at the position on the image that corresponds to the sound source direction, which associates each utterance with a name.
[Patent Document 1] Japanese Patent Application Laid-Open No. 11-53385
[Patent Document 2] Japanese Patent Application Laid-Open No. 2002-247489
[Problem to be Solved by the Invention]
Patent Document 1, however, gives only one example of the remark structure used for the graphic display: a list of each speaker's remarks in order of speaking time. It does not show how a more complex semantic structure could be used. Patent Document 2 shows one way of using the extracted data, attaching it to an e-mail, but does not clarify any other use. In view of these problems, an object of the present invention is to provide a voice recording device that makes meetings more efficient. With a simple operation that does not interfere with the meeting, participants create a display that lists the progress of the proceedings, can be shared during the meeting, and can also serve as an accurate record.
SUMMARY OF THE INVENTION
To solve these problems, the invention according to a first aspect comprises:
- audio data recording means for recording audio data input through an audio input interface;
- audio display means for graphically displaying, on a monitor, the audio data recorded by the audio data recording means; and
- image display means for displaying an image in a display area in response to operation of a pointing device or the keyboard via a user input interface.
Based on an input event received through the user input interface, the apparatus performs one of the following: a file recording process, a file input process, an audio data pasting process, a graphic drawing process, or an audio reproduction process. A feature of the present invention is that various processing programs are executed according to input events received through the user input interface. The results are shown in the display area of the display screen. Characters and images appear in the whiteboard area, and audio data is displayed graphically in the audio display area. According to this invention, participants can share the progress of the meeting while it is under way, through a simple operation that does not interfere with it. They can also create a display that serves as an accurate record, so the meeting proceeds efficiently.
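The event-driven structure summarized above can be illustrated with the following rough sketch, which maps input events to the five processes. The key bindings come from the embodiment described later in this document: Q ends the program after saving, R loads a file, and V pastes. The `(kind, detail)` event tuple and all names in the code are hypothetical stand-ins for whatever the operating system's user input interface actually delivers.

```python
from enum import Enum, auto


class Process(Enum):
    FILE_RECORD = auto()   # also triggered on exit, before the program ends
    FILE_INPUT = auto()
    AUDIO_PASTE = auto()
    GRAPHIC_DRAW = auto()
    AUDIO_PLAY = auto()


# Key assignments taken from the embodiment; the event format is hypothetical.
KEY_BINDINGS = {
    "q": Process.FILE_RECORD,  # "end" operation: save, then quit
    "r": Process.FILE_INPUT,   # load a previously recorded file
    "v": Process.AUDIO_PASTE,  # paste the designated audio section
}


def dispatch(event):
    """Return the process triggered by one input event, or None if it is ignored."""
    kind, detail = event
    if kind == "key":
        return KEY_BINDINGS.get(detail.lower())
    if kind == "click":
        return Process.GRAPHIC_DRAW
    if kind == "double_click":
        return Process.AUDIO_PLAY
    return None
```

In a real application, the caller would loop, reading one event at a time and running the returned process, until the end key is pressed.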
A second aspect is characterized by the behavior of the audio data pasting process. The process first determines whether a time range has been designated in the audio data that the audio display means shows on the monitor. If one has been designated, the process cuts out the audio data for that range to generate new audio data and stores it in a file. Minutes are more accurate if the speech at the time can be consulted alongside the corresponding entry. The user therefore designates a time range on the audio data displayed on the monitor, and the audio data in that range is cut out and stored in a file. According to this invention, the stored audio of the designated range can be reproduced and checked later. A third aspect is characterized in that the audio data displayed on the monitor is volume data for each point in time, and this volume data is updated at predetermined intervals. Because the displayed audio data is volume data at each time, it is preferable to refresh the display with new volume data at regular intervals.
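The source does not say how the "volume data at each time" is computed. This sketch assumes the root-mean-square amplitude of each block of 16-bit samples, at the 8 kHz rate that the embodiment gives for its PCM audio. One value per second matches the embodiment's display scale of 1 second per pixel. The function name and the block length are illustrative assumptions.

```python
import math
from typing import Sequence

SAMPLE_RATE = 8000  # Hz; 8 kHz PCM as stated in the embodiment


def volume_series(samples: Sequence[int], block_seconds: float = 1.0) -> list[float]:
    """Return one RMS volume value per block of `block_seconds` of 16-bit samples.

    RMS is an assumed measure of "volume"; the source does not specify one.
    A trailing partial block is ignored until enough samples have arrived.
    """
    block = int(SAMPLE_RATE * block_seconds)
    values = []
    for start in range(0, len(samples) - block + 1, block):
        chunk = samples[start:start + block]
        values.append(math.sqrt(sum(s * s for s in chunk) / block))
    return values
```

On each periodic update, the display would recompute the series over the newly recorded samples and append the results to the graph.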
According to this invention, the displayed volume data is updated at predetermined intervals, so the latest volume data is always shown.
A fourth aspect has the same structure as the first aspect, except that the audio data recording means records each of several audio data streams, one from each of a plurality of audio input interfaces. The audio display means graphically displays these streams on a monitor, and the image display means works as in the first aspect. As before, one of the file recording, file input, audio data pasting, graphic drawing, or audio reproduction processes is performed according to the input event received through the user input interface. In claim 1, only one audio data stream is input through the audio input interface. A meeting is generally held by several people, however, so in this aspect a plurality of microphones are connected to the audio input interface and several audio data streams are recorded. According to this invention, each participant has a dedicated microphone, and a volume graph is displayed for each microphone. In addition, sections are referenced by pointers into the whole recording instead of being stored as separate section audio data. This saves file space and makes the time range easy to adjust later.
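A "pointer" into the whole recording can be as simple as a pair of times converted to byte offsets. The embodiment states that the audio is uncompressed 8 kHz, 16-bit PCM, which is why a cut-out reduces to a seek. This sketch additionally assumes a mono recording. The `header_bytes` parameter, for skipping a file header if one exists, is a hypothetical convenience, as are the function names.

```python
SAMPLE_RATE = 8000    # Hz, from the embodiment
BYTES_PER_SAMPLE = 2  # 16-bit PCM, from the embodiment
CHANNELS = 1          # assumption: mono recording


def byte_range(start_s: float, end_s: float) -> tuple[int, int]:
    """Byte offsets of [start_s, end_s) within raw PCM sample data.

    Offsets are aligned to whole sample frames, so a cut never splits a sample.
    """
    frame_bytes = BYTES_PER_SAMPLE * CHANNELS
    start = int(start_s * SAMPLE_RATE) * frame_bytes
    end = int(end_s * SAMPLE_RATE) * frame_bytes
    return start, end


def read_section(path: str, start_s: float, end_s: float, header_bytes: int = 0) -> bytes:
    """Read one section by seeking into the full recording; no section file is needed.

    `header_bytes` skips a file header, if any; 0 means headerless raw PCM.
    """
    start, end = byte_range(start_s, end_s)
    with open(path, "rb") as f:
        f.seek(header_bytes + start)
        return f.read(end - start)
```

At these settings, one second of audio occupies 16,000 bytes. Adjusting a section later therefore means changing two numbers rather than rewriting a file.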
A fifth aspect is characterized in that the audio display means measures and displays the volume of each audio data stream input through the plurality of audio input interfaces, and the streams are added together and recorded as a single audio data stream. The audio display means shows the volume from each microphone separately and simultaneously, while the streams themselves are summed and recorded as one. According to this invention, the volume of each stream is displayed individually while the audio is stored as a single summed stream. The displayed content can thus be shared while the required storage capacity is reduced.
A sixth aspect comprises:
- audio data recording means for recording each audio data stream input through two audio input interfaces;
- audio display means for graphically displaying, on a monitor, the audio data recorded by the audio data recording means;
- image display means for displaying an image in the display area in response to operation of a pointing device or the keyboard via a user input interface; and
- image data recording means for recording an image input through an image input interface.
Based on an input event received through the user input interface, the file recording, file input, audio data pasting, graphic drawing, or audio reproduction process is performed. If the position of the speaker during the meeting, that is, the direction of the sound source, is known, it becomes easier to confirm accurately what was said and who said it. This aspect therefore provides two audio streams for detecting the sound source direction and means for recording images of the speakers. According to this invention, the speaker can then be identified accurately.
A seventh aspect further comprises sound source measuring means, which estimates the direction of the sound source from the phase difference between the audio data input through the two audio input interfaces. The estimated direction is displayed as a graph in the display area, and the image captured by the imaging device is displayed on the same screen. Measuring and displaying the sound source direction from the phase difference between the signals of a two-microphone array makes it easier to specify a section. Because the captured images are also recorded, the image is reproduced together with the section audio, which makes the content easier to understand.
In addition, an image taken in the sound source direction is used for the section audio icon, so the content of the icon is easy to grasp. According to this invention, detecting the sound source direction while recording images makes sections of audio data easy to specify and their content easy to understand.
An eighth aspect concerns the audio data pasting process. The process determines whether a time range has been designated in the audio data that the audio display means shows on the monitor. If so, the process computes the average sound source direction within that range. It then cuts out of the recorded image data a partial image in that direction, taken at the start time of the range, and displays it at a predetermined position in the display area. Because the image in the sound source direction of the designated section is cut out, the speaker's face appears automatically next to the comment, and the display is easier to understand. Further, during audio reproduction, the video data is played in the image display area together with the audio of the designated section recorded in the link data.
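The eighth aspect's "average direction, then cut out the image in that direction" step could look like the following sketch. It assumes a pinhole camera whose optical axis coincides with the microphone array's broadside. It also assumes a horizontal field of view and crop width, both of which are hypothetical values not given in the source. A plain arithmetic mean is adequate here because a two-microphone array yields angles in [-90, 90] degrees, so no wrap-around occurs.

```python
import math
from typing import Sequence


def mean_direction(directions_deg: Sequence[float]) -> float:
    """Arithmetic mean of the direction estimates within the designated section."""
    return sum(directions_deg) / len(directions_deg)


def crop_box(direction_deg: float, image_width: int, image_height: int,
             hfov_deg: float = 60.0, crop_width: int = 160) -> tuple[int, int, int, int]:
    """(left, top, right, bottom) of a full-height crop centred on the direction.

    Assumes a pinhole camera aligned with the array broadside; positive angles
    map to the right half of the image. `hfov_deg` and `crop_width` are hypothetical.
    """
    focal_px = (image_width / 2) / math.tan(math.radians(hfov_deg / 2))
    cx = image_width / 2 + focal_px * math.tan(math.radians(direction_deg))
    left = int(round(cx - crop_width / 2))
    left = max(0, min(image_width - crop_width, left))  # keep the crop inside the frame
    return left, 0, left + crop_width, image_height
```

Directions outside the camera's field of view are clamped to the nearest image edge, which is one simple choice among several.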
According to this invention, both the image and the audio are reproduced as information related to each item on the whiteboard, so the content is easier to understand.
A ninth aspect comprises:
- audio data recording means for recording each audio data stream input through a plurality of audio input interfaces;
- audio display means for graphically displaying, on a monitor, the audio data recorded by the audio data recording means;
- an image display unit for displaying an image in the display area in response to operation of a pointing device or the keyboard via a user input interface; and
- an image data recording unit for recording images input through a plurality of image input interfaces.
Based on an input event received through the user input interface, one of the file recording, file input, audio data pasting, graphic drawing, or audio reproduction processes is performed. In claim 6, two microphones are attached to one camera to record the sound source direction and the speaker's image. That arrangement, however, requires the sound source direction to be calculated, which is cumbersome. This aspect therefore uses several microphone cameras, each integrating a camera and a microphone, and assigns them one-to-one to the speakers. Each microphone camera corresponds to exactly one speaker, so no sound source direction needs to be calculated, and the audio data can be associated with the image immediately. A tenth aspect is characterized in that the audio display means displays, on the same screen, the volume graph of each audio data stream and the image data corresponding to that stream.
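For these aspects, the icon image is taken from the camera whose microphone has the highest average sound level over the section. Under that rule, choosing the camera amounts to an argmax over per-microphone means. The following minimal sketch assumes the volume data is held as one list per microphone, indexed by display time step, and that the section lies within the recorded data. The names are hypothetical.

```python
from typing import Sequence


def loudest_camera(levels: Sequence[Sequence[float]], start: int, end: int) -> int:
    """Index of the microphone camera with the highest mean level over steps [start, end).

    `levels[k][t]` is the volume of microphone k at display time step t.
    Ties resolve to the lowest index.
    """
    if not 0 <= start < end:
        raise ValueError("empty or invalid section")
    means = [sum(track[start:end]) / (end - start) for track in levels]
    return max(range(len(means)), key=means.__getitem__)
```

Because each camera is paired with one speaker's microphone, no direction estimate is needed: the loudest microphone directly identifies the camera to use.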
In this aspect, the image from the camera attached to each microphone appears in the audio display area at the left end of that microphone's volume graph. The icon also shows an image of the corresponding time. This is not a cut-out image, however, but the image from the camera attached to the microphone whose average sound level over the section is highest. According to this invention, the volume graphs of the microphones are displayed together with the corresponding images, so the necessary audio data is easy to find from the images.
An eleventh aspect is characterized in that the file recording process records three items in a file with a given name: the image data drawn in the display area, the icons pasted in the display area, and link data representing their relationship to the audio data. When data is recorded in a file, the related pieces of information must be recorded together. For that purpose, link data recording these relationships is stored along with the file name. According to this invention, the link data is recorded with the file, so related file data can be found quickly.
A twelfth aspect is characterized in that the file input process reads the image data and the link data from the file with the given name. It then displays, in the display area, the image from the image data and the icons associated with the link data. When a file is loaded, the icons must be displayed together with the loaded image. According to this invention, the image and its associated icons appear in the display area as soon as the file is read, which improves operability and convenience.
A thirteenth aspect is characterized in that the graphic drawing process draws a polyline on the screen by mouse dragging in the display area. By responding to mouse events in the display area, the process draws a polyline by dragging, as in an ordinary drawing tool. According to this invention, drawing works by mouse dragging, which is consistent with ordinary computer operation and improves operability.
A fourteenth aspect is characterized in that the audio reproduction process reads the positions of the pasted audio icons from the link data. It then reads the audio file of the icon that coincides with the pointer position and reproduces the audio. To find and reproduce recorded audio data, the icon corresponding to each audio data item is read.
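The icon lookup in this audio reproduction process is a hit test of the pointer position against the icon positions stored in the link data. The entry fields follow the link-data layout described in the embodiment (x, y, file number, comment). The following sketch assumes that (x, y) is the icon's top-left corner and that icons are square with a hypothetical `ICON_SIZE`; neither detail is given in the source.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

ICON_SIZE = 32  # pixels; hypothetical, not specified in the source


@dataclass
class LinkEntry:
    x: int          # icon position on the whiteboard, in pixels (assumed top-left)
    y: int
    file_no: int    # number used as the name of the section audio file
    comment: str


def icon_at(entries: Sequence[LinkEntry], px: int, py: int) -> Optional[LinkEntry]:
    """Entry whose icon contains the pointer, or None.

    The embodiment stores a None result's position as the next paste position.
    Later entries are checked first because they are drawn on top.
    """
    for e in reversed(entries):
        if e.x <= px < e.x + ICON_SIZE and e.y <= py < e.y + ICON_SIZE:
            return e
    return None
```

On a hit, the caller would open the section audio file named by `file_no` and send it to the audio output interface.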
When one of the icons is selected with the pointer, the audio data corresponding to that icon is reproduced. According to this invention, the icon of the audio to be reproduced is designated with the pointer, so the desired audio data can be reproduced with a simple operation.
A fifteenth aspect is an audio recording method comprising:
- an audio data recording step of recording audio data input through an audio input interface;
- an audio display step of graphically displaying, on a monitor, the audio data recorded in the audio data recording step; and
- an image display step of displaying an image in a display area in response to operation of a pointing device or the keyboard via a user input interface.
Based on an input event received through the user input interface, one of the file recording, file input, audio data pasting, graphic drawing, or audio reproduction processes is performed. According to this invention, the same functions and effects as in claim 1 are obtained. A sixteenth aspect is characterized in that the file recording process records, in a file with a given name, the image data drawn in the display area, the icons pasted in the display area, and the link data representing their relationship to the audio data. According to this invention, the same functions and effects as in claim 11 are obtained. A seventeenth aspect is characterized in that the file input process reads the image data and the link data from the file with the given name and displays, in the display area, the image from the image data and the icons associated with the link data. According to this invention, the same functions and effects as in claim 12 are obtained. An eighteenth aspect is characterized in that the graphic drawing process draws a polyline on the screen by mouse dragging in the display area. According to this invention, the same functions and effects as in claim 13 are obtained.
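The polyline drawing of the thirteenth and eighteenth aspects can be sketched as three mouse handlers acting on the 640 × 480, 1-bit monochrome whiteboard raster described in the embodiment. The sketch makes three assumptions. Straight segments are rasterized with Bresenham's algorithm. Pixels are packed most-significant-bit first. The stored position is cleared on button-up. None of these choices comes from the source, and all class and method names are hypothetical.

```python
WIDTH, HEIGHT = 640, 480  # whiteboard raster size from the embodiment


class Whiteboard:
    """1-bit monochrome whiteboard raster plus polyline-drawing mouse handlers."""

    def __init__(self) -> None:
        # One bit per pixel, rows packed MSB-first (assumed layout); 0 = white.
        self.bits = bytearray(WIDTH * HEIGHT // 8)
        self.last = None  # pointer position stored at button-down / last drag

    def set_pixel(self, x: int, y: int) -> None:
        if 0 <= x < WIDTH and 0 <= y < HEIGHT:
            i = y * WIDTH + x
            self.bits[i // 8] |= 0x80 >> (i % 8)

    def get_pixel(self, x: int, y: int) -> bool:
        i = y * WIDTH + x
        return bool(self.bits[i // 8] & (0x80 >> (i % 8)))

    def line(self, x0: int, y0: int, x1: int, y1: int) -> None:
        """Bresenham line between two integer points, both endpoints included."""
        dx, dy = abs(x1 - x0), -abs(y1 - y0)
        sx = 1 if x0 < x1 else -1
        sy = 1 if y0 < y1 else -1
        err = dx + dy
        while True:
            self.set_pixel(x0, y0)
            if x0 == x1 and y0 == y1:
                break
            e2 = 2 * err
            if e2 >= dy:
                err += dy
                x0 += sx
            if e2 <= dx:
                err += dx
                y0 += sy

    def button_down(self, x: int, y: int) -> None:
        self.last = (x, y)  # remember where the stroke starts

    def drag(self, x: int, y: int) -> None:
        if self.last is not None:
            self.line(*self.last, x, y)
            self.last = (x, y)  # the next segment starts here

    def button_up(self, x: int, y: int) -> None:
        if self.last is not None:
            self.line(*self.last, x, y)
            self.last = None  # stroke finished
```

The packed raster occupies 640 × 480 / 8 = 38,400 bytes, which is what the file recording process would write as the whiteboard image data.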
A nineteenth aspect provides a computer-executable program implementing the audio recording method of any one of the fifteenth to eighteenth aspects. According to this invention, the audio recording method is programmed for an operating system that runs on a computer. Any computer equipped with that operating system can then be controlled by the same processing method. A twentieth aspect is characterized in that the audio recording program of the nineteenth aspect is recorded on a recording medium in computer-readable form. According to this invention, the program can be run anywhere by carrying the recording medium.
BEST MODE FOR CARRYING OUT THE INVENTION
The present invention is described in detail below with reference to the embodiments shown in the drawings. Unless specifically stated otherwise, the components, types, combinations, shapes, relative arrangements, and the like described in these embodiments are merely illustrative and do not limit the scope of the invention. FIG. 1 shows the system configuration according to a first embodiment of the present invention. The system comprises:
- a CPU 1 that controls the entire system;
- a RAM 2 that serves as a work memory;
- a hard disk 3 that stores programs and data;
- a keyboard 6 and a mouse 7 that serve as input devices;
- a monitor 4, composed of a CRT, an LCD, or the like, for displaying images;
- an audio input interface 8 for inputting audio data from a microphone 9;
- an audio output interface 10 for outputting audio data to a speaker 11; and
- a system bus 5 connecting these components.
A large-screen display device such as a projector is effective as the monitor 4, because it lets the meeting participants view the PC display at the same time. A CRT, an LCD, or the like can also be used. In particular, when the main purpose is to record the meeting, a single recording person can use the system without sharing the display.
FIG. 2 shows the display screen of the program of this embodiment. The display screen 15 consists of two parts, a whiteboard area 16 and an audio display area 17. In this configuration, the whiteboard area 16 is placed at the top, but either arrangement may be used. FIG. 3 shows the configuration of the program according to this embodiment. Components identical to those already described carry the same reference numerals, and duplicate descriptions are omitted. The program consists of three subprograms: a whiteboard display subprogram 22, an audio recording subprogram 25, and an audio display subprogram 21. The three run concurrently by time division. The user input IF 20 is a functional block provided by a general operating system. It notifies the application, as events, of keyboard 6 input and of mouse 7 button-down, button-up, and drag operations. The audio input IF 8 supplies the application with the digital signal obtained by A/D conversion of the electrical signal from the microphone 9 connected to the computer. The audio output IF 10 D/A-converts the digital signal output by the application and outputs audio from the speaker connected to the computer.
The configuration and operation of each subprogram are described individually below.
1. The whiteboard display subprogram 22 simulates the functions of an ordinary whiteboard. It accepts user input from a pointing device such as a mouse or tablet, or from the keyboard, and displays the result in the whiteboard display area 16. It also receives audio data from the audio display subprogram 21, whose graphic display appears in the audio display area 17. FIG. 4 is a flowchart of this subprogram.
After the program starts, the whiteboard display area 16 of the display screen is first filled with white to initialize it (S1). Next, the program waits for an event and reads user input from the keyboard or mouse (S2). The input is then handled as follows:
- If it is a key event for the "end" operation (YES in S3), the file recording process described later is executed (S9), and the whole program ends. The end key is the "Q" key.
- If it is a key event for the "file input" operation (YES in S4), the file input process is executed (S10), and the whole program ends. The file input key is the "R" key.
- If it is a key event for the "paste" operation (YES in S5), the audio data pasting process described later is performed (S12), and the program waits for user input again.
- If it is a mouse-button "click" event (YES in S6), the graphic drawing process described later is performed (S11), and the program waits for user input again.
- If it is a mouse-button "double-click" event (YES in S7), the audio reproduction process described later is performed, and the program waits for user input again.
Each of these processes is described individually below. In the graphic drawing process (S11), the program responds to mouse events in the whiteboard display area 16 as follows, so that dragging draws a polyline on the screen as in an ordinary drawing tool:
Button down: store the pointer position.
Drag: draw a straight line from the stored pointer position to the current pointer position, then update the stored position to the current pointer position.
Button up: draw a straight line from the stored pointer position to the current pointer position.
The audio data pasting process (S12) displays on the whiteboard a mark associated with the audio data of the time range that the user designated in the audio display subprogram, described later.
(1) The process checks whether a time range has been designated in the audio display subprogram. If not, the paste process ends.
(2) If a time range has been designated, audio data covering only that range is cut out of the recorded audio data and stored in a file named with a sequential number. The recorded audio is uncompressed PCM, as described later, so the cut-out can be done with a simple seek.
(3) A prompt is displayed, and the user enters a comment on the section audio from the keyboard.
(4) An icon is displayed next to the comment, at the position recorded as the audio paste position (described later).
(5) A new entry is added to the stored audio link data.
In the audio reproduction process (S8), the position of the double-clicked pointer is read, and the positions of the audio icons pasted so far are loaded from the audio link data. If the pointer is on one of the audio icons, the program reads the file with the corresponding number recorded in the audio link data and plays the audio from the speaker through the audio output IF. If the pointer is not on any audio icon, the position is stored as the next audio paste position. In the file recording process (S9), the whiteboard image data and the link data, described later, are recorded in files with given names. In the file input process (S10), the whiteboard image data and link data are read from the files with the given names, and the whiteboard image and the audio link icons are displayed in the whiteboard display area.
The audio recording subprogram inputs audio data from the system's audio input IF and records it on an external storage device such as a hard disk.
2. The audio display subprogram graphically displays on the monitor the audio data recorded by the audio recording subprogram. It also receives the user's mouse operations and outputs the audio data of the user-designated time range to the whiteboard display subprogram. The subprogram runs two execution units concurrently: a waveform display unit that updates the screen at predetermined intervals, and a section designation unit that receives section information designated by the user. In the waveform display, as shown in FIG. 6, the audio display area 17 is updated at predetermined intervals. The area shows a volume graph 35 for each point in time along a time axis that runs to the right. The right end represents the current time, and positions further left represent earlier times.
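For this embodiment, the audio display area is 640 pixels wide, the time scale is 1 second per pixel, and the display is refreshed every 5 seconds. With these figures, converting between pointer positions and recording times is simple arithmetic relative to the time at the area's left edge. The sketch below shows that conversion, including how a drag becomes a designated time range. Function names are illustrative, and accepting drags in either direction is an assumption.

```python
AREA_WIDTH_PX = 640      # width of the audio display area
SECONDS_PER_PIXEL = 1.0  # time scale
REFRESH_SECONDS = 5      # display update interval


def left_edge_time(now_s: float) -> float:
    """Recording time at the left end of the area when the right end is `now_s`."""
    return now_s - AREA_WIDTH_PX * SECONDS_PER_PIXEL


def scroll(left_s: float) -> float:
    """Left-edge time after one refresh; the graph shifts 5 pixels to the left."""
    return left_s + REFRESH_SECONDS


def pixel_to_time(x_px: int, left_s: float) -> float:
    return left_s + x_px * SECONDS_PER_PIXEL


def time_to_pixel(t_s: float, left_s: float) -> int:
    return int((t_s - left_s) / SECONDS_PER_PIXEL)


def section_from_drag(x_down: int, x_up: int, left_s: float) -> tuple[float, float]:
    """Designated time range (start, end) from button-down and button-up positions."""
    a, b = pixel_to_time(x_down, left_s), pixel_to_time(x_up, left_s)
    return (min(a, b), max(a, b))
```

Because a designated section is stored as times rather than pixels, its on-screen rectangle can be redrawn after every refresh and moves left with the graph, as the description requires.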
The display is updated every 5 seconds, and the graph moves to the left. Any designated section display 36, described later, moves left together with the graph. The audio display area 17 is 640 pixels wide, and the time scale (the time represented by one pixel) is 1 second per pixel, so the screen shows the recording status of the past 640 seconds.
In section designation, the unit reads the section that the user designates with the mouse in the audio display area 17, stores it, and superimposes it on the audio display. The unit responds to mouse events in the audio display area as follows:
Button down: read and store the pointer position, and store the time corresponding to that position as the first end of the time interval. The time corresponding to a screen position is obtained from the pointer position, the time scale, and the time shown at the left end of the audio display area.
Drag: draw a rectangle whose left and right edges are at the button-down pointer position and the current pointer position, to show the range.
Button up: read the pointer position, and store the time corresponding to that position as the second end of the time interval. When the audio data pasting process of the whiteboard display subprogram requests it, this end and the first end are sent together as the designated time range.
Next, the data structures of the program of this embodiment are described. The whiteboard image data is a 640 × 480 pixel, 1-bit monochrome raster image of the figures drawn in the whiteboard display area. The audio link data records the association between the icons pasted in the whiteboard display area and the audio data. FIG. 5 shows the structure of the audio link data. It is a repetition of text lines 33, each consisting of the x and y coordinates 30 and 31 of the icon on the whiteboard (integers, in pixels), a section audio data file name 32 (a number), and a comment entered from the keyboard. The audio data is 8 kHz, 16-bit PCM. Next, the overall operation of this embodiment is described as an
example of use viewed from the user. 1. Start 2. A recorder operating the device of the
invention launches the application with the start of the conference. The screen 15 as shown in
FIG. 2 is displayed on the screen. 3. Comment 4. As in a regular meeting, each participant
speaks verbally. 5. Cutting out 6. When the recording person judges that the message should
be displayed on the board, the recording person designates the time interval 36 of the message
04-05-2019
9
by mouse dragging while looking at the volume graph 35 displayed in the voice display area 17.
7. コピー 8. The recorder double-clicks an arbitrary position on the whiteboard display area
16 to specify the paste position, and then presses the V key to paste the voice icon.
Following the prompt displayed, a comment summarizing the contents of the message is entered
from the keyboard 6. 9. Addition Drag the mouse 7 on the whiteboard to write lines
connecting icons, other characters, figures, etc. 10. Replay During the discussion, the
participant views the whiteboard pasted and repeated, and understands the content. Among
those items, if there is an item for which the content can not be accurately recalled by the
comment alone, the recorder double-clicks the icon. Then, the voice of the time section
associated with the icon is reproduced from the speaker 11, and the conference participant can
remember the exact content. 11. Exit When the meeting is over, the recorder presses the
Q key to exit the application. 12. After playback If you want to review the content of the
meeting again after the meeting is over, start the application and press the R key to load the
recorded file. The same display as at the end of the previous conference is reproduced in the
whiteboard display area 16. You can also start from this state if you want to hold a previous
meeting. In this embodiment, the keyboard 6 is used for copy and paste operation of voice cutout
data, but Drag and drop to move the pointer while pressing the mouse button or a context
menu is popped up by clicking the right button It is also preferable that a general "copy and
paste" operation can be used, such as selecting "copy" or "paste" from among them. Also, in the
present embodiment, only the paste processing has been described for the voice link icon, but it
is also effective to correct the position and the comment after pasting as in a general graphic
editing program called a draw tool. is there. FIG. 7 is a diagram showing a system configuration
according to a second embodiment of the present invention. Since the same reference numerals
are given to the same components, duplicate descriptions will be omitted. FIG. 7 differs from FIG.
1 in that participant-specific microphones are connected to a plurality (here four) of voice input
interfaces 8. In the present embodiment, microphones 9a to 9d specific to the participants are
used to display volume graphs for the respective microphones. Also, a pointer to a part of the
entire speech data is used without using individual section speech data.
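The pointer-based link data of the second embodiment (the FIG. 10 fields: x and y coordinates, start time, duration, comment) and the double-click playback path can be sketched as below. This is a minimal sketch, not the patent's implementation: the tab-separated line layout, the 32 × 32 pixel icon size, treating (x, y) as the icon's top-left corner, and a headerless raw PCM recording are all assumptions, and every function and class name is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

SAMPLE_RATE = 8000        # 8 kHz
BYTES_PER_SAMPLE = 2      # 16-bit mono PCM
ICON_W, ICON_H = 32, 32   # icon size in pixels (hypothetical)


@dataclass
class SectionLink:
    """One line of voice link data that points into the full recording."""
    x: int             # x coordinate 50 on the whiteboard, pixels
    y: int             # y coordinate 51, pixels
    start_s: float     # start time 52, seconds from the start of recording
    duration_s: float  # duration 53, seconds
    comment: str       # comment 54 typed on the keyboard


def parse_link_line(line: str) -> SectionLink:
    """Parse one text line; a tab-separated layout is assumed."""
    x, y, start, dur, comment = line.rstrip("\n").split("\t", 4)
    return SectionLink(int(x), int(y), float(start), float(dur), comment)


def icon_at(links: List[SectionLink], px: int, py: int) -> Optional[SectionLink]:
    """Return the link whose icon contains the double-click point, if any.

    The most recently pasted icon is checked first because it is drawn on top.
    """
    for link in reversed(links):
        if link.x <= px < link.x + ICON_W and link.y <= py < link.y + ICON_H:
            return link
    return None


def read_section(pcm: bytes, link: SectionLink) -> bytes:
    """Slice only the designated section out of the complete raw PCM recording."""
    first = round(link.start_s * SAMPLE_RATE) * BYTES_PER_SAMPLE
    count = round(link.duration_s * SAMPLE_RATE) * BYTES_PER_SAMPLE
    return pcm[first:first + count]
```

A double-click at (px, py) that hits an icon would then play `read_section(pcm, hit)`. Because the link stores times rather than audio, correcting a section later only means editing the start and duration in its line.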
Using a pointer into the complete recording instead of separate section files saves file space and also makes it easy to adjust the time range later. The volume of the voice data from each microphone is measured and displayed separately, and the sum of all four voice signals is recorded as a single voice data stream. FIG. 8 is a diagram showing the configuration of a program according to this embodiment. Components identical to those of the first embodiment are given the same reference numerals, and duplicate descriptions are omitted. FIG. 9 is a view showing an example of a display screen according to this embodiment. The audio display area 40 is divided into four areas, in which the volumes of the microphones 9a to 9d are displayed as graphs. Only the differences from the first embodiment are described below. In the audio data pasting process of the whiteboard display subprogram 42, when a time range has been designated in the audio display subprogram, the corresponding time range data is read, a comment is entered from the keyboard, the icon is displayed, and the link data is updated.
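The per-microphone volume display and the summing of the four channels into one recording can be sketched as follows. The patent does not say how volume is measured, so an RMS level per fixed-length frame is assumed here, as is clipping the sum to the 16-bit range so that the mix remains 16-bit PCM; the function names are hypothetical.

```python
import math
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def frame_rms(samples: Sequence[int], frame_len: int) -> List[float]:
    """RMS level of each complete frame: one value per point of a volume graph."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        levels.append(math.sqrt(sum(s * s for s in frame) / frame_len))
    return levels


def mix_channels(channels: Sequence[Sequence[int]]) -> List[int]:
    """Add the microphone channels sample by sample into one signal.

    The sum is clipped to the 16-bit range (an assumption; the text only
    says the channels are added). Extra samples in longer channels are dropped.
    """
    n = min(len(ch) for ch in channels)
    mixed = []
    for i in range(n):
        total = sum(ch[i] for ch in channels)
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed
```

With 8 kHz audio, `frame_len=800` would give one graph point every 0.1 s for each of the four microphones, while `mix_channels` produces the single stream that is stored.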
The voice reproduction process reads the start time and duration of the section from the link data of the icon at the double-clicked position, and reproduces only the designated section of the complete recorded audio data. Next, the data structure will be described. The voice link data represents the association between the icons pasted in the whiteboard display area and the audio data. FIG. 10 shows its structure: a repetition of text lines, each consisting of the x and y coordinates 50, 51 on the whiteboard (in pixels, integers), the start time 52 of the section (relative to the recording start time, in seconds), the duration 53 (in seconds), and a comment 54 entered from the keyboard. Compared with the first embodiment, this embodiment has no section audio data, so the total amount of recorded data can be reduced. In addition, the time interval corresponding to each utterance can easily be corrected by editing the voice link data. The first embodiment, on the other hand, has the advantage that each utterance can be reproduced from its section voice data alone, without the complete voice data.

FIG. 11 is a diagram showing a system configuration according to a third embodiment of the present invention. This embodiment differs from the second embodiment in that a camera 58 is connected via an image input interface (IF) 57. As in the prior art, the sound source direction is measured from the phase difference between the audio signals input to the microphone array and displayed, which makes it easy to specify a section. Further, because the captured images are recorded, the image is reproduced together with the section voice, which makes the content easier to understand.
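The direction measurement described above can be illustrated with two microphones. The patent works from the phase difference between the channels; this sketch swaps in a time-domain cross-correlation, which estimates the same inter-microphone delay for a single dominant source, and converts it with the far-field relation sin θ = c·τ/d. The speed of sound, the 0.2 m spacing and the sign convention are assumptions, and the function names are hypothetical.

```python
import math
from typing import Sequence

SAMPLE_RATE = 8000       # Hz, the recording rate
SPEED_OF_SOUND = 343.0   # m/s in room-temperature air (assumed)
MIC_SPACING = 0.2        # metres between the two microphones (hypothetical)


def best_lag(left: Sequence[float], right: Sequence[float], max_lag: int) -> int:
    """Lag in samples that maximises sum(left[i] * right[i + lag]).

    A positive lag means the right channel is a delayed copy of the left,
    i.e. the sound reached the left microphone first.
    """
    best, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        lo = max(0, -lag)
        hi = min(len(left), len(right) - lag)
        score = sum(left[i] * right[i + lag] for i in range(lo, hi))
        if score > best_score:
            best, best_score = lag, score
    return best


def direction_deg(left: Sequence[float], right: Sequence[float]) -> float:
    """Far-field direction of arrival in degrees (0 = straight ahead).

    Positive angles lie on the left microphone's side under the convention above.
    """
    # The delay can never exceed the travel time across the microphone spacing.
    max_lag = int(MIC_SPACING / SPEED_OF_SOUND * SAMPLE_RATE)
    tau = best_lag(left, right, max_lag) / SAMPLE_RATE
    s = max(-1.0, min(1.0, SPEED_OF_SOUND * tau / MIC_SPACING))
    return math.degrees(math.asin(s))
```

At 8 kHz with 0.2 m spacing the delay is limited to ±4 samples, so the angle resolution is coarse; practical systems interpolate the correlation peak or compare phases per frequency band.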
Furthermore, an image taken in the sound source direction is used for the section voice icon, so that the content of the icon is easy to grasp. FIG. 12 is an external view of the microphone array and the video camera connected to the recording PC. Since the microphones 9a and 9b and the camera 58 are fixed, the correspondence between the sound source direction estimated from the microphone array and the horizontal position in the image captured by the camera is measured in advance.
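The correspondence measured in advance between direction and horizontal image position could be stored as a small table and interpolated, as sketched below. The calibration values are invented for illustration; only the 320-pixel width matches the image size given for this embodiment's recorded frames, and `angle_to_column` is a hypothetical name.

```python
from bisect import bisect_left
from typing import List, Tuple

# Hypothetical calibration pairs measured with the fixed microphones and camera:
# (direction in degrees, image column in pixels), sorted by direction.
CALIBRATION: List[Tuple[float, float]] = [
    (-60.0, 0.0), (-20.0, 110.0), (0.0, 160.0), (20.0, 210.0), (60.0, 319.0),
]


def angle_to_column(angle: float,
                    table: List[Tuple[float, float]] = CALIBRATION) -> int:
    """Map a sound source direction to a horizontal image position.

    Interpolates linearly between neighbouring calibration points and clamps
    directions outside the measured range to the nearest edge.
    """
    angles = [a for a, _ in table]
    if angle <= angles[0]:
        return round(table[0][1])
    if angle >= angles[-1]:
        return round(table[-1][1])
    k = bisect_left(angles, angle)  # index of the first calibration angle >= angle
    (a0, c0), (a1, c1) = table[k - 1], table[k]
    return round(c0 + (c1 - c0) * (angle - a0) / (a1 - a0))
```

A table keeps the mapping correct even if the camera lens is not perfectly linear across its field of view, which a single scale factor would not capture.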
FIG. 13 is a diagram showing the configuration of the display screen of the present embodiment.
The screen consists of a whiteboard area 60 on the left and an audio display area 63 on the right, whose upper part is a video display area 61 and whose lower part is a sound source direction display area 62.
FIG. 14 is a view showing an example of a display screen according to the present embodiment.
The audio display area 63 runs vertically along the right side of the screen and is divided into an upper video display area 61 and a lower sound source direction display area 62. The video image and the sound source directions 62a, 62b and 62c corresponding to the image are displayed in the video display area 61 and the sound source direction display area 62, respectively, as in the prior art (Japanese Patent Application No. 2001-45838). In the present invention, however, the lower end of the area indicates the current time. FIG. 15 is a diagram showing the configuration of a program according to this embodiment. Components identical to those of the earlier embodiments are given the same reference numerals, and duplicate descriptions are omitted. The differences from the second embodiment are described below. The moving image recording subprogram 70 sequentially receives image data from the image input IF 73, compresses it, and records the compressed data on the hard disk 3. The compression method is the well-known Motion JPEG method, and the image size is 320 × 240 pixels at 1 frame/s. The audio display subprogram 71 receives two channels of audio data from the audio recording subprogram 25, estimates the sound source direction, and displays the result as a graph. It also receives image data from the moving image recording subprogram 70 and displays it in the image display area 61. In the audio data pasting process of the whiteboard display subprogram, if a time range has been designated in the audio display area, the time range is read and the average sound source direction within that range is determined. Then, as shown in FIG. 16, a partial image 81 (80 × 120 pixels) in the sound source direction 82 at the section start time is cut out of the recorded image data, and the cut-out image is displayed at the designated position on the whiteboard. Because this processing cuts out the image in the sound source direction of the section, the speaker's face is displayed automatically alongside the comment, giving a display that is easier to understand.
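The cut-out step might be sketched as follows: average the direction readings inside the selected section, turn the average into an image column using the correspondence measured in advance (not shown here), pick the frame at the section start time, and take an 80 × 120 window from the 320 × 240 frame, shifted so it stays inside the frame. Reading 80 × 120 as width × height, representing a decoded frame as a list of pixel rows, and centring the window vertically are assumptions; Motion JPEG decoding is outside the sketch, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple

FRAME_W, FRAME_H = 320, 240   # size of the recorded frames
CROP_W, CROP_H = 80, 120      # partial image 81, read as width x height (assumed)

Frame = List[List[int]]       # a decoded frame as rows of pixel values (assumed layout)


def average_direction(readings: Sequence[Tuple[float, float]],
                      t_start: float, t_end: float) -> float:
    """Mean of the (time_s, angle_deg) readings with t_start <= time < t_end."""
    inside = [angle for t, angle in readings if t_start <= t < t_end]
    if not inside:
        raise ValueError("no direction readings in the selected section")
    return sum(inside) / len(inside)


def frame_index(t_start: float, fps: float = 1.0) -> int:
    """Index of the recorded frame shown at the section start time."""
    return int(t_start * fps)


def crop_window(column: int) -> Tuple[int, int, int, int]:
    """(left, top, right, bottom) of a CROP_W x CROP_H window centred on column.

    The window is shifted sideways so it never leaves the frame; vertical
    centring is an assumption, since the text does not say where faces appear.
    """
    left = min(max(column - CROP_W // 2, 0), FRAME_W - CROP_W)
    top = (FRAME_H - CROP_H) // 2
    return left, top, left + CROP_W, top + CROP_H


def crop_speaker(frame: Frame, column: int) -> Frame:
    """Cut the partial image for the icon out of one decoded frame."""
    left, top, right, bottom = crop_window(column)
    return [row[left:right] for row in frame[top:bottom]]
```

Clamping the window instead of shrinking it keeps every icon the same size, which suits a fixed-size icon on the whiteboard.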
Further, during audio reproduction, the moving image data is reproduced in the image display area together with the audio of the designated section recorded in the link data. Since images as well as voice are reproduced as the related information for each item on the whiteboard, the content is easier to understand. FIG. 17 is a diagram showing a system configuration according to a fourth embodiment of the present invention. Components identical to those of the earlier embodiments are given the same reference numerals, and duplicate descriptions are omitted. FIG. 17 differs from FIG. 7 in that a plurality of devices, each combining one microphone 86 and one camera 85 as shown in FIG. 19, are connected to the voice input interface 8 and the image input interface 57, respectively. FIG. 18 is a view showing an example of a display screen of this embodiment. In the audio display area 93, the images 91a to 91d from the cameras 58a to 58d attached to the respective microphones are displayed at the left ends of the volume graphs 92a to 92d of the microphones 9a to 9d. As in the third embodiment, the icon displays an image from the corresponding time; however, instead of a cut-out image, it uses the image from the camera attached to the microphone with the highest average sound level in that section.

According to the inventions of claims 1 and 15 described above, the participants can, by a simple operation and without interrupting the conference, create a display of the proceedings that they share during the conference and that serves as an accurate record, so the conference can proceed efficiently. In claims 2 and 16, the voice data of a designated time range is cut out and stored in a file, so the voice data of that time can be reproduced and checked later. In the third and seventeenth aspects of the present invention, the displayed volume data is updated at predetermined intervals, so the latest volume data is always displayed. In claims 4 and 18, participant-specific microphones are used to display a volume graph for each microphone. Also, since a pointer to part of the complete audio data is used instead of individual section audio data, file space is saved, and the time range can easily be adjusted later. According to the fifth aspect of the present invention, the volume of each audio signal is displayed separately while the signals are added together and stored as one, so the display contents can be shared and the storage capacity needed for the data can be reduced. According to the sixth aspect of the present invention, two channels of voice data for detecting the sound source direction and a means for recording the image of the speaker are provided, so the speaker can be identified accurately.
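For the fourth embodiment described above, where each microphone has its own camera, choosing the icon image reduces to picking the microphone whose average level over the section is highest. A minimal sketch, assuming per-microphone lists of 16-bit samples at 8 kHz and mean absolute amplitude as the "average sound level" (the text does not define the measure); `loudest_microphone` is a hypothetical name.

```python
from typing import Sequence

SAMPLE_RATE = 8000  # Hz


def loudest_microphone(channels: Sequence[Sequence[int]],
                       t_start: float, t_end: float) -> int:
    """Index of the microphone (and hence its attached camera) with the highest
    mean absolute amplitude between t_start and t_end seconds."""
    first = int(t_start * SAMPLE_RATE)
    last = int(t_end * SAMPLE_RATE)
    if last <= first:
        raise ValueError("empty section")

    def mean_level(ch: Sequence[int]) -> float:
        seg = ch[first:last]
        return sum(abs(s) for s in seg) / len(seg) if seg else 0.0

    levels = [mean_level(ch) for ch in channels]
    # max() returns the first index on ties, i.e. the lowest-numbered microphone.
    return max(range(len(levels)), key=levels.__getitem__)
```

Because each camera is tied to one microphone, no direction estimation is needed: the returned index directly selects which camera's frame at the section start time becomes the icon image.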
According to the seventh aspect of the present invention, since the sound source direction is detected and images are recorded at the same time, it is easy to specify a section of audio data and to understand its content. According to the eighth aspect of the invention, images as well as voice are reproduced as the related information for each item on the whiteboard, so the content is easier to understand. In the ninth aspect, each microphone camera corresponds one-to-one to a speaker, so there is no need to calculate the sound source direction, and voice data can be associated with images immediately. According to the tenth aspect of the present invention, since the volume graphs of the plurality of microphones are displayed in association with the images, it becomes easy to find the necessary audio data from the images. In the eleventh aspect, since the link data is recorded together with the file, related file data can be searched quickly. In the twelfth aspect, since the icons associated with the image data and the link data are displayed in the display area when a file is read, operability and convenience are improved.
According to the thirteenth aspect of the present invention, since polylines are drawn on the screen by mouse dragging, the operation is consistent with ordinary computer use, which improves operability. In the fourteenth aspect, since the icon of the audio data to be reproduced is designated with the pointer, the desired audio data can be reproduced by a simple operation.
According to the nineteenth aspect, by implementing the voice recording method of the present invention as a program for an OS running on a computer, any computer equipped with that OS can be controlled by the same processing method. According to the twentieth aspect, by recording the program on a computer-readable recording medium, the program can be run anywhere by carrying the recording medium.

BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a diagram showing a system configuration according to a first embodiment of the present invention.
FIG. 2 is a view showing the display screen configuration of a program according to the first embodiment of the present invention.
FIG. 3 is a diagram showing the configuration of a program according to the first embodiment of the present invention.
FIG. 4 is a flowchart of a subprogram of the present invention.
FIG. 5 is a diagram showing the structure of voice link data according to the present invention.
FIG. 6 is a view showing an example of a display screen according to the first embodiment of the present invention.
FIG. 7 is a diagram showing a system configuration according to a second embodiment of the present invention.
FIG. 8 is a diagram showing the configuration of a program according to the second embodiment of the present invention.
FIG. 9 is a view showing an example of a display screen according to the second embodiment of the present invention.
FIG. 10 is a diagram showing the structure of voice link data according to the present invention.
FIG. 11 is a diagram showing a system configuration according to a third embodiment of the present invention.
FIG. 12 is an external view of a microphone array and a video camera connected to the recording PC of the present invention.
FIG. 13 is a diagram showing the configuration of a display screen according to the third embodiment of the present invention.
FIG. 14 is a view showing an example of a display screen according to the third embodiment of the present invention.
FIG. 15 is a diagram showing the configuration of a program according to the third embodiment of the present invention.
FIG. 16 is a view showing a clipped image of the present invention.
FIG. 17 is a diagram showing a system configuration according to a fourth embodiment of the present invention.
FIG. 18 is a view showing an example of a display screen according to the fourth embodiment of the present invention.
FIG. 19 is an external view of a microphone camera of the present invention.

[Description of Reference Numerals]
1 CPU, 2 RAM, 3 hard disk, 4 monitor, 5 system bus, 6 keyboard, 7 mouse, 8 voice input interface, 9 microphone, 10 voice output interface, 11 speaker