JP2013239938

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2013239938
Abstract: To realize, at low cost and low weight, both detection of an external emergency vehicle using a plurality of microphones and confirmation of outside-vehicle video using a plurality of cameras connected to a network. In a sound source detection system in which a plurality of camera units and an ECU unit are connected via a network, each camera unit includes a first time synchronization processing unit that performs time synchronization with the ECU unit via the network, a microphone that detects audio information, and an audio encoding processing unit that encodes the audio information in association with the time information held by the first time synchronization processing unit and outputs the result as encoded audio information via the network. The ECU unit includes a second time synchronization processing unit that performs time synchronization with the plurality of camera units via the network, an audio decoding processing unit that decodes the encoded audio information, and a detection processing unit that, based on the time information carried by the encoded audio information transmitted from the plurality of camera units, calculates the direction of the sound source from the differences between the times at which a predetermined sound reached the respective microphones. [Selected figure] Figure 1
Sound source detection system
[0001]
The present invention relates to a sound source detection system.
[0002]
Patent Document 1 and Patent Document 2, for example, disclose techniques for detecting and notifying the presence of an external emergency vehicle from a motor vehicle.
These techniques detect the direction of the emergency vehicle based on a plurality of sounds detected by a plurality of microphones installed outside the vehicle.
[0003]
On the other hand, outside-vehicle video confirmation apparatuses have been put into practical use in which a plurality of cameras installed outside the vehicle are connected to an image processing apparatus, and the outside-vehicle images captured by the cameras are combined and displayed by the image processing apparatus. As means for connecting the cameras and the image processing apparatus, there are analog methods and methods that digitize the signal and transmit it over a network such as Ethernet (registered trademark). When the camera resolution is around standard television quality, an analog method using a composite signal is often used because it reduces the number of wires. In recent years, however, higher camera resolution has increasingly been required to improve visibility and the like. High-resolution camera images cannot be transmitted in analog form using a composite method, but with a network such as Ethernet (registered trademark), digitized video signals can be compressed and transmitted over the network, which is effective as a low-cost system for transmitting high-resolution video.
[0004]
Furthermore, if a network is used, information other than the camera video signal can also be transmitted. For example, an audio signal collected by a microphone attached to the camera, and other auxiliary information, can be transmitted as needed.
[0005]
Japanese Patent Laid-Open No. 6-328980
[0006]
If one wishes to mount both an emergency vehicle detection system using multiple microphones connected by analog means and an outside-vehicle video confirmation apparatus using multiple cameras connected to a network, it is necessary to install both audio cables and a camera network, which raises the problem of increased cost and weight.
[0007]
On the other hand, as described in the background art, it is also possible to digitally transmit an audio signal together with the camera video information over the camera network.
However, when transmission is performed over a network, delays arise from digitization, compression processing, packet processing, buffer processing, and the like.
In a system using a plurality of cameras, these processing and transmission delay times differ slightly for each camera, so there is the problem that the transmission timing may shift.
[0008]
In order to solve the above problems, for example, the configuration described in the claims is adopted. The present application includes a plurality of means for solving the above problems. As one example, in a sound source detection system in which a plurality of camera units and an ECU unit are connected via a network, each of the camera units includes a first time synchronization processing unit that performs time synchronization with the ECU unit via the network, a microphone that detects audio information, and an audio encoding processing unit that encodes the audio information in association with the time information held by the first time synchronization processing unit and outputs the result as encoded audio information via the network; and the ECU unit includes a second time synchronization processing unit that performs time synchronization with the plurality of camera units via the network, an audio decoding processing unit that decodes the encoded audio information, and a detection processing unit that, based on the time information carried by the encoded audio information transmitted from the plurality of camera units, calculates the direction of the sound source from the differences between the times at which a predetermined sound reached the respective microphones.
[0009]
According to the present invention, it is possible to realize an easy-to-use detection device for an
emergency vehicle.
[0010]
FIG. 1 is an example of an overall view of an emergency vehicle detection system.
FIG. 2 is an example of a diagram showing details of the cameras 1a to 1d.
FIG. 3 is an example of a diagram showing details of the ECU 2.
FIG. 4 is an example of a diagram showing the audio reaching the cameras 1a to 1d.
FIG. 5 is an example of a display screen showing the direction of an emergency vehicle.
FIG. 6 is an example of a diagram showing details of the cameras 1a to 1d (second embodiment).
FIG. 7 is an example of a diagram showing details of the audio encoding unit 12.
FIG. 8 is an example of a diagram showing details of the audio decoding units 21a to 21d.
FIG. 9 is an example of a diagram showing details of the method of calculating the direction of a sound source from the difference in the arrival times of sound.
FIG. 10 is an example of a diagram showing details of the feature information analysis unit 17.
[0011]
Examples will be described below with reference to the drawings.
[0012]
[System Configuration] FIG. 1 is an overall view of an emergency vehicle detection system.
4 is a vehicle, 1a to 1d are cameras, 2 is an electronic control unit for video and audio
(hereinafter referred to as ECU), and 3a to 3d are network cables.
[0013]
The cameras 1a to 1d installed outside the vehicle 4 shoot images and collect sound, and the
signals are sent to the ECU 2 via the network cables 3a to 3d.
[0014]
[Configuration of Camera] An example of the cameras 1a to 1d will be described with reference
to FIG.
11 is an audio input unit, 12 is an audio encoding unit, 13 is a video input unit, 14 is a video
encoding unit, 15 is a network processing unit, and 16 is a time synchronization processing unit.
[0015]
The audio input unit 11 includes a microphone, an amplifier, an A/D converter, and the like, and converts the collected sound into a digital signal. The digitized audio signal is input to the audio encoding unit 12.
[0016]
The audio encoding unit 12 encodes the audio signal into a format suitable for network transmission. The encoded audio signal is input to the network processing unit 15.
[0017]
The video input unit 13 includes a lens, an imaging sensor, a video processing unit, and the like, and converts the captured video into a digital signal. The digitized video signal is input to the video encoding unit 14.
[0018]
The video encoding unit 14 encodes the input video signal into a format suitable for network
transmission. The encoded video signal is input to the network processing unit 15.
[0019]
The network processing unit 15 performs the packetization processing (for example, IP packetization) necessary for network transmission on the input encoded audio signal and encoded video signal, and sends the packetized data to the network.
[0020]
The time synchronization processing unit 16 exchanges time synchronization packets containing time information with the ECU 2 through the network processing unit 15, and corrects the time based on that information.
The ECU 2 and the cameras 1a to 1d thereby maintain the same time. The time information maintained by the time synchronization processing unit 16 is supplied to each unit, including the audio encoding unit 12 and the video encoding unit 14.
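The patent does not specify a particular synchronization protocol. As an illustrative sketch only, an NTP-style two-way packet exchange could estimate the camera-to-ECU clock offset as follows; the function name, the protocol choice, and the timestamp convention are assumptions, not part of the disclosure.

    # Minimal NTP-style offset estimate (illustrative sketch, Python).
    # t1: request sent (camera clock), t2: request received (ECU clock),
    # t3: response sent (ECU clock), t4: response received (camera clock).
    def estimate_clock_offset(t1, t2, t3, t4):
        round_trip = (t4 - t1) - (t3 - t2)      # time spent on the network
        offset = ((t2 - t1) + (t3 - t4)) / 2.0  # camera clock minus ECU clock
        return offset, round_trip

    # The camera would subtract `offset` from its local clock so that the
    # time stamps it places in packet headers agree with the ECU's time base.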
[0021]
The audio encoding unit 12 adds time information corresponding to the input digital audio signal and encodes it.
[0022]
The processing of the audio encoding unit 12 will be described in more detail with reference to FIG. 7.
FIG. 7 is an example of a diagram showing details of the audio encoding unit 12. The audio encoding unit 12 includes a compression/encoding unit 121 and a packetization processing unit 122. The packetization processing unit 122 in turn includes a division processing unit 1221, a header generation unit 1222, a header addition unit 1223, and a padding processing unit 1224. The digital audio signal input to the audio encoding unit 12 is input to the compression/encoding unit 121.
[0023]
The compression/encoding unit 121 performs compression/encoding processing of the audio signal. Examples of such processing include G.711 and G.729. The encoded audio signal output from the compression/encoding unit 121 is input to the packetization processing unit 122.
[0024]
The packetization processing unit 122 performs packetization processing on the input encoded audio signal. The packetization referred to here is not low-layer packetization such as the IP packetization in the network processing unit 15 described above, but higher-layer packetization closer to the application; RTP packetization is one example. The encoded audio signal input to the packetization processing unit 122 is first input to the division processing unit 1221.
[0025]
The division processing unit 1221 divides the input encoded audio signal into pieces of a predetermined size suitable for packetization, and outputs them sequentially to the header addition unit 1223.
[0026]
The header generation unit 1222 generates the data of the header portion to be added at packetization.
The header generation unit 1222 receives the time information supplied from the time synchronization processing unit 16 described above, generates a time stamp from it, and includes the time stamp in the header data. The header generation unit 1222 then outputs the generated header data sequentially to the header addition unit 1223.
[0027]
The header addition unit 1223 receives the divided encoded audio signal from the division processing unit 1221 and prepends the header data received from the header generation unit 1222. The processing result of the header addition unit 1223 is output to the padding processing unit 1224.
[0028]
The padding processing unit 1224 inserts padding data at a predetermined position in the packet as appropriate, in order to adjust the overall packet length. The data to which padding has been added by the padding processing unit 1224 is output from the packetization processing unit 122 as a packet.
[0029]
Through the processing of the packetization processing unit 122 described above, each encoded audio signal divided into predetermined units is bundled with a header carrying the corresponding time information and output as a packet. Although video differs from audio, the video signal input from the video input unit 13 is processed in the same manner as described for FIG. 7, and is output from the video encoding unit 14 as an encoded video signal whose headers carry time information.
[0030]
[Configuration of ECU] Next, the ECU 2 will be described with reference to FIG. 3. FIG. 3 is an example of a diagram showing details of the ECU 2. 21a to 21d are audio decoding units, 22 is a detection processing unit, 23 is a time synchronization processing unit, 24 is a video decoding unit, 25 is a display processing unit, 26 is a monitor, and 27 is a network processing unit.
[0031]
The time synchronization processing unit 23 exchanges time synchronization packets containing time information with the cameras 1a to 1d via the network processing unit 27, and corrects the time based on that information. The ECU 2 and the cameras 1a to 1d thereby maintain the same time. The time information maintained by the time synchronization processing unit 23 is supplied to each unit inside the ECU 2, including the audio decoding units 21a to 21d and the detection processing unit 22.
[0032]
The video decoding unit 24 receives the encoded video signals of the cameras 1a to 1d and decodes each video. The decoded video signals are output to the display processing unit 25.
[0033]
The display processing unit 25 performs processing such as selection and combination on the
decoded video signal, and shapes the video signal into a form suitable for display. The processing
result of the display processing unit 25 is output to the monitor 26 and displayed to the user.
[0034]
The network processing unit 27 receives, via the network, the encoded audio and video signals transmitted from each of the cameras 1a to 1d, and depacketizes them. This processing is exactly the reverse of the packetization processing of the network processing unit 15 in the cameras 1a to 1d described above. The resulting encoded audio signals are output to the audio decoding units 21a to 21d, and the encoded video signals are output to the video decoding unit 24.
[0035]
The audio decoding units 21a to 21d receive the encoded audio signals corresponding to the cameras 1a to 1d, respectively, and perform audio decoding. The decoded audio signals are output to the detection processing unit 22. The processing of the audio decoding units 21a to 21d will be described in more detail with reference to FIG. 8.
[0036]
FIG. 8 is an internal block diagram common to the audio decoding units 21a to 21d. The audio decoding units 21a to 21d each include a depacketization processing unit 211, a decompression/decoding unit 212, a buffer unit 213, and a transmission control unit 214. The depacketization processing unit 211 in turn includes a padding deletion unit 2111, a header separation unit 2112, and a header analysis unit 2113. The encoded audio signals input to the audio decoding units 21a to 21d are input to the depacketization processing unit 211.
[0037]
The depacketization processing unit 211 depacketizes the input encoded audio signal. This processing is exactly the reverse of that of the packetization processing unit 122 in the cameras 1a to 1d described above. The encoded audio signal input to the depacketization processing unit 211 is first input to the padding deletion unit 2111.
[0038]
When padding data has been added to the encoded audio signal, the padding deletion unit 2111 deletes it. The packet with the padding removed is output to the header separation unit 2112.
[0039]
The header separation unit 2112 separates the header data of the input packet from the remaining encoded audio signal. The separated header data is output to the header analysis unit 2113, and the remaining encoded audio signal is output to the decompression/decoding unit 212.
[0040]
The header analysis unit 2113 analyzes the input header data and extracts the parameters it contains, including the time stamp. The extracted time stamp is output to the transmission control unit 214.
[0041]
The decompression/decoding unit 212 performs decompression/decoding of the encoded audio signal input from the header separation unit 2112. This processing is exactly the reverse of that of the compression/encoding unit 121 in the cameras 1a to 1d described above. The digital audio signal output from the decompression/decoding unit 212 is input to the buffer unit 213.
[0042]
The transmission control unit 214 receives the time information output from the time synchronization processing unit 23 in addition to the time stamp input from the header analysis unit 2113. The transmission control unit 214 compares the value of the time stamp with the time information, and issues a data output instruction to the buffer unit 213 at the timing when the two coincide.
[0043]
The buffer unit 213 temporarily accumulates the digital audio signal output from the decompression/decoding unit 212, and outputs the accumulated signal upon receiving the data output instruction from the transmission control unit 214.
[0044]
As described above, by outputting each digital audio signal at the timing when its time stamp and the time information coincide, the digital audio signals output from the audio decoding units 21a to 21d can be kept mutually synchronized.
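As an illustrative sketch only (the names, the queue structure, and the fixed offset value are assumptions), the buffer unit 213 and transmission control unit 214 could behave as follows:

    import collections

    class TransmissionControl:
        """Release buffered audio once the synchronized clock reaches its
        time stamp plus a fixed offset (cf. paragraph [0045])."""
        def __init__(self, offset: float = 0.050):  # 50 ms, an assumed value
            self.buffer = collections.deque()       # plays the role of buffer unit 213
            self.offset = offset

        def push(self, timestamp: float, samples: bytes):
            self.buffer.append((timestamp, samples))

        def poll(self, now: float):
            """`now` is the time supplied by the time synchronization processing unit 23."""
            out = []
            while self.buffer and self.buffer[0][0] + self.offset <= now:
                out.append(self.buffer.popleft()[1])
            return out

Because every decoder releases its data against the same synchronized clock, the four audio streams leave the decoders in a mutually aligned state.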
[0045]
In practice, a predetermined offset may be added to the time stamp and/or the time information, in consideration of the finite time required for network transmission and reception, decompression/decoding processing, and the like.
Even in that case, the relative output timing of the digital audio signals output from the audio decoding units 21a to 21d can be kept constant.
[0046]
[Vehicle detection processing] Next, the operation when detecting an emergency vehicle will be described.
When the siren sound emitted by the emergency vehicle 5 reaches the microphones of the cameras 1a to 1d as shown in FIG. 1, the time at which the siren sound reaches each microphone depends on the distance from the emergency vehicle, as shown in FIG. 4.
[0047]
FIG. 4 is an example of a diagram showing the audio reaching the cameras 1a to 1d. In FIG. 4, the horizontal axis indicates time t, advancing to the right. FIG. 4 shows an example in which the siren of an ambulance is detected by the microphones of the cameras 1a to 1d. Since an ambulance siren alternates between a high-frequency tone and a low-frequency tone, "H" (high tone) and "L" (low tone) are detected alternately along the time series in the figure. As described above, the audio encoding unit 12 of each of the cameras 1a to 1d encodes the input digital audio signal together with its corresponding time information. Therefore, even if the transmission time over the network fluctuates due to jitter or the like, the audio decoding units 21a to 21d of the ECU 2 can correctly restore the temporal relationships among the audio signals of the respective microphones based on the reference time information supplied from the time synchronization processing unit 23. The detection processing unit 22 calculates the direction of the emergency vehicle based on the differences in the arrival time of the siren sound at the respective microphones. In that calculation, the change points between the high tone and the low tone can be used as the objects of comparison.
[0048]
The method of calculating the direction of the sound source from the difference in the arrival times of the sound will be described in more detail with reference to FIG. 9. In FIG. 9(a), consider a situation in which two microphones A and B are placed a distance s apart, and sound from the sound source arrives at them from the direction of angle θ. If the distance from the sound source to the microphones is sufficiently large, the sound arriving at microphone A and the sound arriving at microphone B can be regarded as parallel, so the path difference d can be expressed as d = s · cos θ (Equation 1). On the other hand, if the sound velocity is v and the sound from the source reaches microphone A with a delay of time T relative to microphone B, the path difference d is expressed as d = T · v (Equation 2). From (Equation 1) and (Equation 2), cos θ = T · v / s (Equation 3), so the direction θ of the sound source can be determined once the arrival time difference T is known.
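A short worked example of (Equation 3) with illustrative numbers; the spacing, sound speed, and measured delay below are assumptions, not values from the patent.

    import math

    s = 1.0    # microphone spacing in metres (assumed)
    v = 340.0  # speed of sound in m/s (approximate)
    T = 0.002  # measured arrival-time difference in seconds (assumed)

    cos_theta = T * v / s                  # Equation 3
    theta = math.degrees(math.acos(cos_theta))
    print(f"theta = {theta:.1f} degrees")  # about 47.2 degrees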
[0049]
As shown in FIG. 9(b), a sound from the upper right direction (sound source 1) and a sound from the lower right direction (sound source 2) make the same angle θ, so they cannot be distinguished with only two microphones. The direction can, however, be determined by combining the results obtained from multiple pairs of microphones. For example, the results obtained from the pair of microphones of the cameras 1a and 1b and the pair of microphones of the cameras 1a and 1c in FIG. 1 may be combined, and the common result adopted.
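As a sketch of this combination step only: the tolerance, and the assumption that both pairs' candidate bearings have already been converted into a common vehicle frame, are illustrative simplifications.

    def resolve_direction(cands_pair1, cands_pair2, tol=5.0):
        """Each pair yields two mirror-image candidate bearings (degrees);
        adopt the bearing common to both pairs, if any."""
        for a in cands_pair1:
            for b in cands_pair2:
                if abs(a - b) <= tol:
                    return (a + b) / 2.0  # the common result
        return None                       # no consistent direction found

    # e.g. pair (1a, 1b) gives {47, -47}; pair (1a, 1c) gives {47, 133}
    print(resolve_direction([47.0, -47.0], [47.0, 133.0]))  # -> 47.0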
[0050]
As shown in FIG. 3, the information on the direction of the emergency vehicle calculated by the detection processing unit 22 is input to the display processing unit 25. The display processing unit 25 generates screen information indicating the direction of the emergency vehicle based on that information. FIG. 5 shows examples of the display screen. FIG. 5(a) is an example in which the direction is indicated by an illustration, and FIG. 5(b) is an example in which it is indicated by text. FIG. 5(c) is an example in which an illustration showing the direction of the emergency vehicle is superimposed on a monitor screen showing the situation around the vehicle as photographed by the cameras. Displaying in this manner makes the direction of the emergency vehicle easy for the user to understand and improves the user's convenience.
[0051]
As described above, according to the present invention, it is possible to detect an emergency vehicle using a plurality of cameras connected via a network, and thus to realize an emergency vehicle detection device while reducing its cost and weight.
[0052]
Although four microphones are shown in the above embodiment, the number of microphones is
not limited to four and may be three or more.
[0053]
In addition, the detection processing unit 22 may estimate the distance to the emergency vehicle based on the strength of the siren sound, and may additionally present that information on the screen.
For the siren sound of an emergency vehicle, a standard sound volume is defined by the relevant regulations and the like.
Therefore, if the relationship between the distance from the emergency vehicle and the received volume is examined in advance based on that standard, the distance to the emergency vehicle can be estimated from the strength of the received siren sound. This can further improve convenience for the user.
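As an illustration of such an estimate only: assuming free-field spherical spreading (about 6 dB of loss per doubling of distance) and an assumed reference level, the distance could be computed as follows. The patent does not specify the attenuation model or any of these values.

    REF_SPL_DB = 90.0  # assumed siren level at the reference distance
    REF_DIST_M = 20.0  # assumed reference distance in metres

    def estimate_distance(received_spl_db: float) -> float:
        # SPL(r) = REF_SPL_DB - 20*log10(r / REF_DIST_M), solved for r
        return REF_DIST_M * 10 ** ((REF_SPL_DB - received_spl_db) / 20.0)

    print(estimate_distance(78.0))  # about 79.6 m under these assumptions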
[0054]
In addition, the detection processing unit 22 may detect the frequency shift of the siren sound caused by the Doppler effect, determine on that basis whether the emergency vehicle is approaching or receding, and also present that information on the screen. For the siren sound of an emergency vehicle, standards such as timbre and frequency are defined by the relevant regulations and the like. In general, let the frequency of the sound source be fs, the observed frequency f1, the speed of sound v, the speed at which the sound source moves toward the microphone vs, and the speed at which the microphone (mounted on the own vehicle) moves toward the sound source v1. The Doppler effect is then expressed as f1 = fs · (v + v1) / (v − vs) (Equation 4), so the speed vs at which the sound source moves toward the microphone can be obtained from fs, f1, and v1.
[0055]
The speed v1 at which the microphone moves toward the sound source is expressed as v1 = v0 · cos φ (Equation 5), where φ is the angle between the direction of the sound source determined by the method described above and the direction of vehicle travel, and v0 is the vehicle's speed. This makes it possible to further improve convenience for the user.
[0056]
Further, although an example has been shown in which information such as the direction of the emergency vehicle is presented visually, it may instead, or additionally, be announced by audio.
[0057]
Further, although the detection of an emergency vehicle has been described above as an example, the present invention is not limited to this. For example, another sound, such as the warning sound of a railroad crossing, may be detected.
[0058]
A second embodiment of the present invention will now be described.
Descriptions of parts common to the first embodiment will be omitted.
[0059]
FIG. 6 shows a camera according to this embodiment.
The difference from the first embodiment is that a feature information analysis unit 17 is provided. The feature information analysis unit 17 analyzes the siren sound input to the microphone of each of the cameras 1a to 1d as shown in FIG. 4, and detects the audio pattern specific to the emergency vehicle and the change points between the high tone and the low tone. Furthermore, the time at which each change point occurred is extracted, and this change point time information is transmitted to the ECU 2 via the audio encoding unit 12 and the network processing unit 15.
[0060]
The detection processing in the feature information analysis unit 17 will be described in more detail with reference to FIG. 10. FIG. 10 is an example of a diagram showing details of the feature information analysis unit 17. The feature information analysis unit 17 includes an audio pattern storage unit 171, a pattern comparison unit 172, and a pattern change point detection unit 173.
[0061]
The audio pattern storage unit 171 stores audio patterns specific to emergency vehicles (for example, the high tone and low tone of an ambulance siren).
[0062]
The pattern comparison unit 172 compares the input digital audio signal with the audio patterns stored in the audio pattern storage unit 171, and outputs the comparison result (match/mismatch, and in the case of a match, which pattern matched) to the pattern change point detection unit 173.
[0063]
The pattern change point detection unit 173 receives the time information output from the time synchronization processing unit 16, and when the comparison result input from the pattern comparison unit 172 changes (that is, a change from match to mismatch, a change from mismatch to match, or a change of the matched pattern), outputs the time information at that moment to the audio encoding unit 12 as change point time information.
In this way, the audio pattern specific to the emergency vehicle and the change points between the high tone and the low tone can be detected, and the time at which each change point occurred can be extracted.
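As an illustrative sketch of this detection only (the 'H'/'L'/None labels follow FIG. 4; the data layout and names are assumptions):

    def change_points(results):
        """results: iterable of (sync_time, matched_pattern_or_None).
        Emit the synchronized time whenever the comparison result changes."""
        out = []
        prev = object()  # sentinel so the first sample also registers
        for t, pattern in results:
            if pattern != prev:
                out.append((t, pattern))  # change point time information
                prev = pattern
        return out

    samples = [(0.00, 'H'), (0.01, 'H'), (0.65, 'L'), (1.30, 'H')]
    print(change_points(samples))  # -> [(0.0, 'H'), (0.65, 'L'), (1.3, 'H')]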
[0064]
The detection processing unit 22 of the ECU 2 receives the change point time information from each of the cameras 1a to 1d, and calculates the direction of the emergency vehicle based on that information.
[0065]
As described above, this embodiment also obtains the same effect as the first embodiment.
Furthermore, since the analysis of the audio pattern is performed on the camera side in this embodiment, the processing load on the ECU 2 side can be reduced. In addition, since only the change point time information needs to be transmitted instead of the audio information acquired by the audio input unit 11, the amount of communication between the cameras 1a to 1d and the ECU 2 can be reduced.
[0066]
1a to 1d: camera
2: electronic control unit (ECU)
3a to 3d: network cable
4: own vehicle
5: emergency vehicle
11: audio input unit
12: audio encoding unit
13: video input unit
14: video encoding unit
15: network processing unit
16: time synchronization processing unit
17: feature information analysis unit
21a to 21d: audio decoding unit
22: detection processing unit
23: time synchronization processing unit
24: video decoding unit
25: display processing unit
26: monitor
27: network processing unit