JP2010232862
Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2010232862
Provided are an audio processing apparatus, an audio processing method, and a program that suitably reduce noise in sound signals input from microphones in a plurality of environments. A voice processing apparatus 100 includes a position pattern detection unit 102 that detects an index of the relative position between a sound source and a plurality of microphones, a process determination unit 103 that determines, based on the index, the voice processing to be applied to the sound signals input from each of the microphones at that relative position, and a signal processing unit 104 that executes the determined audio processing on the sound signals. [Selected figure] Figure 1
Voice processing apparatus, voice processing method, and program
[0001]
The present invention relates to a voice processing apparatus, and more particularly to a voice processing apparatus capable of obtaining a target sound with a high SNR by classifying the positions of a sound source and microphones into N patterns and performing processing corresponding to each position pattern.
[0002]
Conventionally, techniques are known in which voice is collected using a plurality of microphones called a microphone array and signal processing is applied to the collected signals, in order to estimate the direction of a target sound source, suppress noise, and extract the signal from the target sound source at a high SNR.
[0003]
For example, Non-Patent Document 1 describes a so-called delay-and-sum method in which a target sound is received by a microphone array, the difference in arrival time of the target sound at each microphone is corrected for each signal received by each microphone, and the corrected signals are then summed, thereby obtaining a signal in which the target sound is emphasized.
The invention disclosed in Non-Patent Document 1 is based on the assumption that a signal in which the target sound and noise are mixed is input to every microphone.
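To make the delay-and-sum idea concrete, here is a minimal Python sketch (not taken from the patent or Non-Patent Document 1; the integer-sample delays and function name are illustrative assumptions):

```python
import numpy as np

def delay_and_sum(signals, delays):
    """Align each channel by its arrival-time delay (in samples) and average.

    signals: list of equal-length 1-D arrays, one per microphone.
    delays:  per-channel arrival delay of the target sound, in samples.
    """
    out = np.zeros(len(signals[0]))
    for x, d in zip(signals, delays):
        out += np.roll(x, -d)  # advance the late channel; circular shift is adequate for a sketch
    return out / len(signals)
```

After alignment the target components add coherently while uncorrelated noise averages down, which is the emphasis effect described above.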
[0004]
Also, as a method of using a plurality of microphones, a method is known in which, of two microphones, one is a noise-collecting microphone and the other collects the target sound mixed with noise; noise is reduced by subtracting the output of the noise-collecting microphone from the signal of the microphone collecting the target sound, so that the target sound is extracted more clearly.
[0005]
As an example, Japanese Patent Laid-Open No. 2004-226656 (Patent Document 1) discloses a technique, such as a speaker distance detection device, in which two microphones are used, the distance between the lips and a reference microphone selected in advance is calculated from the difference in signal level between the reference microphone and the other microphone, and the amount of subtraction applied when subtracting the signal of the other microphone from the signal of the reference microphone is adjusted according to that distance.
[0006]
The invention disclosed in Patent Document 1 is based on the premise that one microphone receives a signal in which the target sound and noise are mixed, while the other microphone receives only noise, or at most a relatively small amount of the target sound.
[0007]
Japanese Patent Laid-Open No. 2004-226656
[0008]
J. L. Flanagan, J. D. Johnston, R. Zahn and G. W. Elko, "Computer-steered microphone arrays for sound transduction in large rooms," J. Acoust. Soc. Am., vol. 78, no. 5, pp. 1508-1518, 1985.
[0009]
However, when processing sound signals obtained using a plurality of microphones, there are both environments in which the target sound and noise are mixed at every microphone and environments in which the target sound mainly enters one microphone while noise mainly enters the others. If the same processing is performed in both environments, the target sound may not be processed properly.
This is not taken into consideration in the inventions disclosed in Non-Patent Document 1 and Patent Document 1 above.
[0010]
The present invention has been made in view of the above points to solve these problems, and it is an object of the present invention to suitably reduce noise in sound signals input from microphones in a plurality of environments.
[0011]
In order to solve the problems described above and achieve the object, an aspect of the present invention provides a position pattern detection unit that detects an index of the relative position between a sound source and a plurality of microphones, a process determination unit that determines, based on the index of the relative position, the audio processing to be applied to the sound signals input from each of the plurality of microphones, and a signal processing unit that executes the determined audio processing on the sound signals.
[0012]
According to the present invention, it is possible to suitably reduce noise in sound signals input from microphones in a plurality of environments.
[0013]
FIG. 1 is a block diagram showing an audio processing apparatus according to a first embodiment.
FIG. 2 is a flowchart showing processing in the speech processing apparatus according to the first embodiment.
FIG. 3 is a diagram (part 1) showing a position pattern.
FIG. 4 is a diagram (part 2) showing a position pattern.
FIG. 5 is a diagram (part 3) showing a position pattern.
FIG. 6 is a diagram showing an example of the speech processing apparatus when classified as position pattern (2).
FIG. 7 is a diagram showing an example of the speech processing apparatus when classified as position pattern (1).
FIG. 8 is a block diagram showing a speech processing apparatus according to a second embodiment.
FIG. 9 is a flowchart showing the operation of the speech processing apparatus according to the second embodiment.
FIG. 10 is a block diagram showing a speech processing apparatus according to a third embodiment.
FIG. 11 is a flowchart showing the operation of the speech processing apparatus according to the third embodiment.
FIG. 12 is a diagram explaining an example in which an angle sensor is provided in a mobile phone.
FIG. 13 is an explanatory view showing the hardware configuration of the speech processing apparatus according to the embodiments.
[0014]
First Embodiment FIG. 1 is a block diagram showing an audio processing apparatus according to a first embodiment. The speech processing apparatus 100 according to the first embodiment collates the input sound signal against position patterns of the sound source held in advance, and executes speech processing corresponding to each position pattern. The voice processing apparatus 100 includes a sound input unit 101, a position pattern detection unit 102, a process determination unit 103, a signal processing unit 104, and a pattern database (hereinafter referred to as a pattern DB) 109.
[0015]
The sound input unit 101 converts input sounds from a plurality of microphones into digitized
sound signals, and detects the start and end of sound. The position pattern detection unit 102
detects an index of the position pattern of the sound source and the microphone from the sound
signal. The process determining unit 103 determines the process to be performed on the sound
signal by comparing the index of the position pattern with the position pattern held in advance.
The signal processing unit 104 performs processing in accordance with the determination of the
processing determination unit 103.
[0016]
The pattern DB 109 holds information relating to position patterns of a plurality of microphones
and sound sources. The position pattern represents the relative position (relative position) of the
plurality of microphones and the sound source. In the pattern DB 109, indexes of patterns of
sound signals input from a plurality of microphones are stored in association with each position
pattern. The pattern stored in the pattern DB 109 is called by the process determination unit 103
and is compared with the index detected by the position pattern detection unit 102.
[0017]
FIG. 2 is a flowchart showing processing in the speech processing apparatus 100 according to the first embodiment. Here, an example in which the position patterns of a sound source and microphones are acquired using two microphones will be described. The two microphones are referred to as the microphone 1 and the microphone 2.
[0018]
In step S101, the sound input unit 101 converts the sound input to the microphone from an
analog signal to a digital signal using an AD converter.
[0019]
In step S102, in order to detect the start and end of the speech to be subjected to noise processing, voice section detection is performed using, for example, the zero-crossing count.
This voice section detection is performed using the microphone outputs of the microphone 1 and the microphone 2.
[0020]
More specifically, the number of zero crossings is calculated for the sound signal acquired by the microphone 1 and for the sound signal acquired by the microphone 2, and if a voice section is detected by either microphone, the sound from that detection point onward is treated as speech.
[0021]
Here, the start point information detected by the microphone 1 and the microphone 2 is held as S1 and S2, respectively.
The processing of FIG. 2 ends after the end of the voice is determined in the output of the microphone at which the voice start was detected latest. The section detection method is not limited to this, and various section detection methods can be applied; for example, a section detection method specific to a plurality of microphones may be used.
[0022]
In step S103, the position pattern detection unit 102 detects the index of the position pattern of the sound source and the microphones using the audio signal detected by the sound input unit 101. As the index, for example, the difference in voice arrival time between the microphones and the signal level ratio are used.
[0023]
More specifically, for example, when the microphone 1 is used as a reference, the larger the difference in the arrival time of the voice at the microphone 2, the closer the sound source is to the microphone 1. Likewise, the higher the signal level on the microphone 1 side relative to the microphone 2, the closer the sound source is to the microphone 1.
[0024]
When calculating these two indexes, the initial voice section of the target voice is used. The initial sound is the sound in a certain section after the voice is detected. The voice arrival time difference between the microphones is calculated using the cross-correlation. With the start time of the microphone at which the voice start was detected earliest taken as time 0, let x1 be the voice signal input to the microphone 1, let the correlation calculation section of x1 be from time ts to time te (S1 < ts < te), and let x1' be the waveform in that section normalized by its power. Also, letting x2 be the audio signal input to the microphone 2, the length T of the section [0, T] over which the arrival time difference is obtained is set so as to cover at least the interval from the detection of the start at the microphone at which the voice start was detected latest up to ts. For example, when S1 < S2, T is set according to the following equation (1).
[0025]
[0026]
Using equation (1), the voice arrival time difference td of the output can be expressed by equation (5) using the following equations (2) to (4).
[0027]
The arrival time difference td is one of the indexes for determining the position pattern of the sound source and each microphone.
In the case of two microphones, once the arrival time difference td from the microphone 1 to the microphone 2 is determined, the arrival time difference from the microphone 2 to the microphone 1 can be obtained by reversing its sign.
The signal level ratio dd between the microphone 1 and the microphone 2 can be obtained by the following equation (6) using the td obtained earlier.
[0028]
The signal level ratio dd in equation (6) is another index for determining the position pattern of the sound source and each microphone. The position pattern determination indexes are not limited to those described above, and various criteria can be applied. For example, the maximum value of the correlation calculated previously can be included. If the maximum correlation value is higher than a certain reference, it can be derived that the sound source is equidistant from the two microphones; if the maximum correlation value is lower than the reference, it can be derived that the sound source is near one microphone and far from the other. The maximum correlation value rmax is calculated by the following equation (7).
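Since equations (1) to (7) are not reproduced in this translation, the following Python sketch gives plausible stand-ins for the three indexes (td, dd, rmax) under the definitions above; the patent's exact formulas may differ:

```python
import numpy as np

def position_indexes(x1, x2, max_lag):
    """Estimate the arrival time difference td (in samples), the signal
    level ratio dd, and the maximum correlation rmax for two channels."""
    x1n = x1 / (np.linalg.norm(x1) + 1e-12)  # power-normalised, cf. x1'
    x2n = x2 / (np.linalg.norm(x2) + 1e-12)
    lags = np.arange(-max_lag, max_lag + 1)
    corr = np.array([np.dot(x1n, np.roll(x2n, -k)) for k in lags])
    best = int(np.argmax(corr))
    td = int(lags[best])      # positive when the sound reaches microphone 1 first
    rmax = float(corr[best])  # cf. equation (7)
    dd = float(np.mean(x1 ** 2) / (np.mean(x2 ** 2) + 1e-12))  # cf. equation (6)
    return td, dd, rmax
```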
[0029]
In step S104, the process determination unit 103 uses the position pattern determination index calculated by the position pattern detection unit 102 to check which of the following three position patterns (1) to (3) applies. FIGS. 3 to 5 show the three position patterns.
[0030]
(1) The sound source is close to the microphone 1 (FIG. 3). (2) The sound source is close to the microphone 2 (FIG. 4). (3) The sound source is not close to either of the microphones (FIG. 5).
[0031]
Assuming that the arrival time difference determination threshold tthre and the signal level difference determination thresholds ddthre1 and ddthre2 are constants (where tthre > 0 and ddthre1 > ddthre2 > 0), when td > 0, the position pattern is classified into (1) when the following equations (8) and (9) hold.
[0032]
Further, when td <= 0, the position pattern is classified into (2) when the following equations (10) and (11) hold.
If it is classified into neither (1) nor (2), the position pattern is classified into (3).
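A Python sketch of this classification follows; because equations (8) to (11) are not shown in the translation, the comparisons below are an assumed reading consistent with the stated sign conventions:

```python
def classify_pattern(td, dd, tthre, ddthre1, ddthre2):
    """Classify into position pattern 1, 2, or 3 (assumed forms of
    equations (8)-(11); tthre > 0 and ddthre1 > ddthre2 > 0)."""
    if td > 0 and td > tthre and dd > ddthre1:
        return 1  # sound source close to microphone 1
    if td <= 0 and -td > tthre and dd < ddthre2:
        return 2  # sound source close to microphone 2
    return 3      # sound source not close to either microphone
```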
[0033]
In step S105, the signal processing unit 104 performs predetermined processing in accordance with the classified position pattern. FIGS. 6 and 7 are diagrams showing the operation when the signal processing unit 104 switches signal processing. FIG. 6 shows an example of the speech processing apparatus when classified as position pattern (2), and FIG. 7 shows an example of the speech processing apparatus when classified as position pattern (1).
[0034]
Hereinafter, switching of signal processing will be described.
[0035]
When the position pattern is classified into (1), the voice input to the microphone 1 is treated as the target voice, and the sound input to the microphone 2 is processed as noise.
Specifically, assuming that α is a constant (0 ≦ α), the output speech o of the speech processing apparatus 100 can be expressed by the following equation (12) using the delay time td calculated earlier.
[0036]
At this time, the signal may be converted to the frequency domain and spectral subtraction performed. Alternatively, noise components can be removed from x1 using x2 as a reference signal with an adaptive filter of the kind often used in echo cancellers.
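Equation (12) is not reproduced in this translation; a plausible time-domain form, together with the frequency-domain spectral-subtraction variant mentioned above, is sketched below as an illustration (an assumption, not the patent's exact formula):

```python
import numpy as np

def subtract_noise_time(x1, x2, td, alpha):
    """Assumed reading of equation (12): o[n] = x1[n] - alpha * x2[n - td],
    i.e. the noise channel is aligned by td and scaled before subtraction."""
    return x1 - alpha * np.roll(x2, td)

def subtract_noise_spectral(x1, x2, alpha):
    """Frequency-domain variant: subtract the scaled noise magnitude
    spectrum, floor at zero, and keep the target channel's phase."""
    X1, X2 = np.fft.rfft(x1), np.fft.rfft(x2)
    mag = np.maximum(np.abs(X1) - alpha * np.abs(X2), 0.0)
    return np.fft.irfft(mag * np.exp(1j * np.angle(X1)), n=len(x1))
```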
[0037]
When the position pattern is classified into (2), the voice input to the microphone 2 is treated as the target voice, and the sound input to the microphone 1 is processed as noise. The specific processing is the same as that for position pattern (1) with the microphone 1 and the microphone 2 interchanged. In this case, the output speech o is expressed by the following equation (13).
[0038]
As described above, the processing when classified as a position pattern in which the sound source approaches a specific microphone can also be realized in other ways. For example, α may be made a function of the maximum correlation value to adjust the subtraction amount. In this case, the value of α can be controlled by the following equation (14), a linear function with constants a and b.
[0039]
By expressing α as in equation (14), the amount of subtraction can be reduced when the maximum correlation value is high, and increased when the maximum correlation value is low.
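A small sketch of this adjustment; the linear form below is an assumed reading of equation (14):

```python
def alpha_from_rmax(rmax, a, b):
    """Assumed form of equation (14): alpha = a * rmax + b. With a < 0, a
    high maximum correlation gives a small subtraction amount, as described
    above; the result is clipped so that alpha >= 0."""
    return max(0.0, a * rmax + b)
```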
[0040]
When the position pattern is classified into (3), delay-and-sum array processing is performed using both the voice input to the microphone 1 and the voice input to the microphone 2.
When the delay-and-sum array is used, the output speech o is expressed by the following equation (15).
[0041]
Note that the array processing is not limited to the above method. For example, by applying Griffiths-Jim type array processing, the two microphones can form a null toward noise arriving from a certain angle, making it possible to extract the target voice o with a high SNR. Returning to FIG. 2, in step S106 the sound input unit 101 detects the end of the voice, and the audio processing is terminated.
[0042]
Although the embodiments of the present invention have been described taking two microphones as an example, using exactly two microphones is not essential in practicing the present invention, and the invention can also be extended to three or more microphones. In the case of three microphones, with the microphones denoted microphone 1, microphone 2, and microphone 3, the following seven position patterns are prepared.
[0043]
(1') The sound source is close to the microphone 1. (2') The sound source is close to the microphone 2. (3') The sound source is close to the microphone 3. (4') The sound source is close to the microphones 1 and 2. (5') The sound source is close to the microphones 2 and 3. (6') The sound source is close to the microphones 1 and 3. (7') The sound source is not close to any microphone.
[0044]
For an input sound signal, it is determined which position pattern it should be classified into using the above-mentioned arrival time differences and signal level differences. More specifically, the arrival time difference from the microphone 1 to the microphone 2 is td12, calculated by equations (2) to (5). Similarly, the arrival time differences among the other microphones are calculated as td13, td21, td23, td31, and td32, respectively. Further, the signal level difference between the microphone 1 and the microphone 2 is dd12, calculated by equation (6). Similarly, the signal level differences among the other microphones are calculated as dd13, dd21, dd23, dd31, and dd32, respectively.
[0045]
At this time, the arrival time difference between the microphone n1 closest to the sound source
and the other two microphones becomes a positive value. The arrival time difference of the
second microphone n2 closest to the sound source is positive with respect to the remaining one
microphone, and is negative with respect to the microphone n1. The microphone n3 farthest
from the sound source has a negative arrival time difference with the other two microphones.
Therefore, based on this characteristic, it is first determined which microphones are the microphone n1, the microphone n2, and the microphone n3.
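The ordering step can be illustrated with the following Python sketch, under the sign convention above (tdij > 0 when the sound arrives at microphone i before microphone j); the helper name is hypothetical:

```python
def order_microphones(td):
    """td: dict mapping (i, j) -> arrival time difference tdij, i != j.
    The microphone with two positive differences is n1 (closest), with
    one is n2, and with none is n3 (farthest)."""
    mics = sorted({i for pair in td for i in pair})
    positives = {m: sum(1 for (i, j), v in td.items() if i == m and v > 0)
                 for m in mics}
    return sorted(mics, key=lambda m: -positives[m])  # [n1, n2, n3]
```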
[0046]
Let tdn1n2 be the arrival time difference between the microphone n1 and the microphone n2, tdthre1 the threshold for the arrival time difference, ddn1n2 the signal level difference between the microphone n1 and the microphone n2, and ddthre1 the threshold for the signal level difference. When the following equations (16) and (17) hold, the input is classified into position pattern (1') if the microphone 1 is the microphone n1, (2') if the microphone 2 is the microphone n1, and (3') if the microphone 3 is the microphone n1.
[0047]
Next, let tdn1n2 be the arrival time difference between the microphone n1 and the microphone n2 with threshold tdthre1, tdn2n3 the arrival time difference between the microphone n2 and the microphone n3 with threshold tdthre2, ddn1n2 the signal level difference between the microphone n1 and the microphone n2 with threshold ddthre1, and ddn2n3 the signal level difference between the microphone n2 and the microphone n3 with threshold ddthre2. When the following equations (18) to (21) hold, the input is classified into position pattern (4') if the microphone 3 is the microphone n3, (5') if the microphone 1 is the microphone n3, and (6') if the microphone 2 is the microphone n3.
[0048]
Also, if the input is not classified into any of the position patterns (1') to (6'), the sound source is considered to be far from all the microphones, and the input is classified into the position pattern (7').
[0049]
After being classified in this way, processing is switched according to each pattern.
More specifically, in the case of (1'), (2'), and (3'), the processing of equation (22) is performed to subtract noise from the target sound of the microphone close to the sound source. Here, α1 and α2 are constants with α1 ≥ 0 and α2 ≥ 0.
[0050]
Moreover, in the case of (4'), (5'), and (6'), the processing of equation (23) is performed.
As a result, the voices of the two microphones near the sound source are emphasized by the delay-and-sum array, and the output of the microphone farthest from the sound source is used for noise subtraction.
[0051]
Further, in the case of (7'), the processing of equation (24) is performed. As a result, speech is emphasized by the delay-and-sum array using all the microphones. In this way, the method can easily be extended to three microphones.
[0052]
Also, three or more microphones may be used to estimate the sound source position in three-dimensional space. When the sound source position can be estimated, the distance from each microphone to the sound source can be calculated. Let the distances between each microphone and the sound source obtained by this processing be ld1, ld2, and ld3, respectively.
[0053]
At this time, with the distance threshold ldthre a constant, the input is classified into (1') when the following expression (25) is satisfied. Classification into the position patterns (2') to (7') can be realized similarly.
[0054]
Second Embodiment FIG. 8 is a block diagram showing an audio processing apparatus according to a second embodiment. The speech processing apparatus 100a according to the second embodiment selects and performs, on the sound signal, processing corresponding to the position of the sound source acquired by a position sensor. The voice processing apparatus 100a includes a sound input unit 101, a position pattern detection unit 102a, a process determination unit 103a, a signal processing unit 104, and a pattern DB 109a.
[0055]
The sound input unit 101 detects the start and end of voice from the input sound. The position
pattern detection unit 102a detects an index of the position pattern of the sound source and the
microphone based on the signal from the position sensor. The process determination unit 103a
determines the process to be performed by collating the index of the position pattern with the
position pattern held in advance. The signal processing unit 104 performs processing in
accordance with the determination of the processing determination unit.
[0056]
The pattern DB 109a holds position patterns of the sound source and the microphones. In the pattern DB 109a, the index of the signal input from the position sensor is associated with each position pattern of the relative position between the sound source and the microphones. The position pattern stored in the pattern DB 109a is read by the position pattern detection unit 102a and collated with the input from the position sensor.
[0057]
FIG. 9 is a flowchart showing the operation of the speech processing apparatus according to the second embodiment. Description will be made using an example in which two microphones (microphone 1 and microphone 2) are used to process a target voice. Note that having exactly two microphones is not essential; the embodiment can be implemented with two or more microphones. It is also not essential that the target sound be voice. The operation of the present embodiment is the same as that of the first embodiment except for the operations of the position sensor, the position pattern detection unit 102a, and the process determination unit 103a, and description of the common operations is omitted.
[0058]
In step S203, the output of a distance sensor attached near each microphone is used as the position pattern determination index. Specifically, the distance sensor is, for example, an infrared sensor capable of measuring the distance from each microphone to the object serving as the sound source, and the distance from each microphone to the sound source is measured. Two microphones are used, and the distances from the microphone 1 and the microphone 2 to the sound source are ld1 and ld2, respectively.
[0059]
In step S204, the process determination unit 103a uses the position pattern determination index calculated by the position pattern detection unit 102a to classify the input into one of the following three position patterns. (1A) The sound source is close to the microphone 1. (2A) The sound source is close to the microphone 2. (3A) The sound source is not close to either of the microphones.
[0060]
At this time, with the distance threshold ldthre a constant, the input is classified into (1A) if the following equation (26) holds, and into (2A) if the following equation (27) holds.
[0061]
If neither holds, the input is classified into the position pattern (3A).
The processing in steps S205 and S206 after the position pattern classification is the same as that in steps S105 and S106 in FIG. 2, and the description thereof is omitted here.
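Equations (26) and (27) are not reproduced in this translation; a natural reading (an assumption for illustration) is that the source is close to a microphone when its measured distance is below ldthre and smaller than the other microphone's distance:

```python
def classify_by_distance(ld1, ld2, ldthre):
    """Assumed reading of equations (26) and (27)."""
    if ld1 < ldthre and ld1 < ld2:
        return "1A"  # sound source close to microphone 1
    if ld2 < ldthre and ld2 < ld1:
        return "2A"  # sound source close to microphone 2
    return "3A"      # sound source not close to either microphone
```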
[0062]
Third Embodiment FIG. 10 is a block diagram showing a speech processing apparatus according
to a third embodiment. The voice processing apparatus 100b according to the third embodiment
detects a position pattern of a sound source based on an input from a position sensor and a
sound signal, and executes voice processing corresponding to each position pattern.
[0063]
The voice processing apparatus 100b includes a sound input unit 101, a position pattern detection unit 102b, a process determination unit 103b, a signal processing unit 104, and a pattern DB 109b. The sound input unit 101 converts the input sound from the microphones into a digitized sound signal, and detects the start and end of the voice. The position pattern detection unit 102b detects an index of the position pattern of the sound source and the microphones from the input from the position sensor and the voice. The process determination unit 103b determines the process to be performed by collating the index of the position pattern with the position patterns held in advance. The signal processing unit 104 performs processing in accordance with the determination of the process determination unit.
[0064]
The pattern DB 109b holds position patterns of the microphones and the sound source. In the pattern DB 109b, combinations of the index of the signal input from the position sensor and the index of the sound signal are associated with each position pattern of the relative position between the microphones and the sound source. The pattern stored in the pattern DB 109b is called by the position pattern detection unit 102b and collated with the sound signal acquired by the sound input unit 101 and the input from the position sensor.
[0065]
FIG. 11 is a flowchart showing the operation of the speech processing apparatus according to the third embodiment. Here, an example in which two microphones (microphone 1 and microphone 2) are used to process a target voice will be described. Note that having exactly two microphones is not essential; two or more microphones suffice. It is also not essential that the target sound be voice. The operation of the present embodiment is the same as that of the second embodiment except for the operations of the position pattern detection unit 102b and the process determination unit 103b, so description of the common operations is omitted.
[0066]
The voice processing device 100b may use, for example, a distance sensor as a position sensor.
In step S303, the position pattern detection unit 102b acquires the measurement result by the
distance sensor and the voice information as a position pattern determination index.
[0067]
More specifically, an infrared sensor or the like is used as the position sensor to measure the distance from the apparatus to the sound source. Two microphones are used to acquire the sound signal, and the distance from the sound processing apparatus 100b to the sound source, acquired using the sensor, is denoted ld. Further, as in the first embodiment, the voice arrival time difference td and the signal level ratio dd are also determined.
[0068]
In step S304, the process determination unit 103b uses the position pattern determination index calculated by the position pattern detection unit to classify the input into one of the following three position patterns. (1B) The sound source is close to the microphone 1. (2B) The sound source is close to the microphone 2. (3B) The sound source is not close to either of the microphones.
[0069]
The arrival time difference determination threshold tthre, the signal level difference determination thresholds ddthre1 and ddthre2, and the distance determination threshold ldthre are constants (where tthre > 0, ddthre1 > ddthre2 > 0, and ldthre > 0). When td > 0, the position pattern is classified into (1B) when all of the following expressions (28) hold.
[0070]
Further, when td <= 0, the position pattern is classified into (2B) when all of the following expressions (29) hold. Position patterns that are neither (1B) nor (2B) are classified into (3B). Processing similar to that for (1), (2), and (3) of the first embodiment is performed for each of the three position patterns.
[0071]
The output from an angle sensor can also be used as a position pattern determination index. FIG. 12 is a diagram explaining an example in which an angle sensor is provided in a mobile phone. In the example of FIG. 12, the mobile phone is used horizontally during operation and vertically during a call. In such a device, the angle is detected using an angle sensor attached to the device body. An example of the detected angle θ is shown in FIG. 12. The angle θ is, for example, 0 degrees when the line connecting the two microphones is horizontal to the ground. Further, as in the first embodiment, the voice arrival time difference td and the signal level ratio dd are also determined.
[0072]
In the example of FIG. 12, the position pattern is classified into (1B), (2B), or (3B) using the following equations (30) and (31), with the arrival time difference determination threshold tthre, the signal level difference determination thresholds ddthre1 and ddthre2, and the angle determination threshold θthre as constants (where tthre > 0, ddthre1 > ddthre2 > 0, and θthre ≥ 0).
[0073]
When td > 0, the position pattern is classified into (1B) when the following equation (30) holds.
[0074]
When td <= 0, the position pattern is classified into (2B) when the following equation (31) holds.
If the position pattern is neither (1B) nor (2B), it is classified into (3B).
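Equations (30) and (31) are likewise not reproduced; the sketch below combines the three indexes (td, dd, and the angle θ) as a conjunction of threshold tests, which is one plausible reading for a phone held upright during a call (an illustrative assumption):

```python
def classify_with_angle(td, dd, theta, tthre, ddthre1, ddthre2, theta_thre):
    """Assumed forms of equations (30) and (31): the acoustic tests of the
    first embodiment combined with an angle test against theta_thre."""
    if td > 0 and td > tthre and dd > ddthre1 and abs(theta) >= theta_thre:
        return "1B"  # sound source close to microphone 1
    if td <= 0 and -td > tthre and dd < ddthre2 and abs(theta) >= theta_thre:
        return "2B"  # sound source close to microphone 2
    return "3B"
```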
[0075]
Next, the hardware configuration of the speech processing apparatus according to the present
embodiment will be described with reference to FIG.
FIG. 13 is an explanatory view showing a hardware configuration of the speech processing
apparatus according to the present embodiment.
[0076]
The voice processing apparatus according to the present embodiment includes a control device such as a central processing unit (CPU) 51, storage devices such as a read only memory (ROM) 52 and a random access memory (RAM) 53, a communication I/F 54 for communicating over a network, and a bus 61 connecting the respective units.
[0077]
The program executed by the voice processing apparatus according to the present embodiment
is provided by being incorporated in advance in the ROM 52 or the like.
[0078]
The program executed by the voice processing apparatus according to the present embodiment may instead be provided as a file in an installable or executable format recorded on a computer-readable recording medium such as a compact disc read only memory (CD-ROM), a flexible disk (FD), a compact disc recordable (CD-R), or a digital versatile disc (DVD).
[0079]
Furthermore, the program executed by the voice processing apparatus according to the present
embodiment may be stored on a computer connected to a network such as the Internet and
provided by being downloaded via the network.
Further, the program executed by the voice processing apparatus according to the present
embodiment may be provided or distributed via a network such as the Internet.
[0080]
The program executed by the voice processing apparatus according to the present embodiment has a module configuration including the above-described units. As actual hardware, the CPU 51 reads the program from the ROM 52 and executes it, whereby the program is loaded onto the main storage device and each unit is generated on the main storage device.
[0081]
The present invention is not limited to the above embodiments as they are; at the implementation stage, the constituent elements can be modified and embodied without departing from the scope of the invention.
In addition, various inventions can be formed by appropriate combinations of a plurality of
constituent elements disclosed in the above embodiment.
For example, some components may be deleted from all the components shown in the
embodiment.
Furthermore, components in different embodiments may be combined as appropriate.
[0082]
As described above, the voice processing apparatus according to the embodiment of the present
invention is useful for noise removal, and is particularly suitable for processing a sound signal
input from a microphone array.
[0083]
1, 2, 3: microphone; 100, 100a, 100b: speech processing apparatus; 101: sound input unit; 102, 102a, 102b: position pattern detection unit; 103, 103a, 103b: process determination unit; 104: signal processing unit