JP2006005807

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2006005807
PROBLEM TO BE SOLVED: To provide an acoustic signal processing method, an acoustic signal
processing device, an acoustic signal processing system, and a computer program that can
independently increase or decrease a predetermined sound component of an inharmonic structure
included in an acoustic signal without affecting other sound components.
SOLUTION: A predetermined sound component of an inharmonic structure included in
an acoustic signal is extracted, and the extracted predetermined sound component is increased or
decreased. At that time, the spectrum of the acoustic signal is calculated by frequency analysis,
and the spectrum corresponding to the predetermined sound component of the inharmonic
structure is extracted and increased or decreased. Further, the predetermined sound component
of the inharmonic structure is extracted based on the sound component of a template stored in
advance. At this time, the sound component of the template is corrected such that the difference
between the extracted sound component and the sound component of the template is equal to or
less than a predetermined value. [Selected figure] Figure 1
Acoustic signal processing method, acoustic signal processing device, acoustic signal processing
system, and computer program
[0001]
The present invention relates to an acoustic signal processing method, an acoustic signal
processing device, and an acoustic signal processing system for increasing or decreasing a
predetermined sound component of an inharmonic structure included in an acoustic signal, and
to a computer program that causes a computer to increase or decrease such a predetermined
sound component.
08-05-2019
1
[0002]
A graphic equalizer (hereinafter, referred to as an equalizer) is widely used as means for
adjusting an audio signal such as music output from a speaker (see, for example, Patent
Document 1).
By using an equalizer, for example, an acoustic signal reproduced from a CD (Compact Disc) can
be frequency analyzed to increase or decrease the spectrum of a specific frequency region. For
example, when emphasizing the bass drum sound included in the acoustic signal output from the
speaker, the spectrum of the low frequency region is increased.
[Patent Document 1] Japanese Unexamined Patent Publication No. 5-175773
[0003]
However, music performance is often performed using a plurality of musical instruments, and an
acoustic signal often includes a plurality of musical instrument sounds. Therefore, when the
spectrum of the specific frequency region of the acoustic signal is increased or decreased, a
plurality of musical instrument sounds having the spectrum in the specific frequency region
often increase or decrease. For example, if the spectrum of the low frequency region is increased
to emphasize the bass drum, not only the bass drum sound but also the sounds of other
instruments having a spectrum in the low frequency region, such as the bass guitar sound, are
increased.
[0004]
As described above, since the equalizer increases or decreases the spectrum of the specific
frequency region of the acoustic signal, all the instrument sounds having the spectrum in the
specific frequency region are increased or decreased. Therefore, there is a problem that it is not
possible to increase or decrease a specific musical instrument sound without affecting other
musical instrument sounds, such as increasing or decreasing bass drum sounds without affecting
the bass guitar sound.
[0005]
The present invention has been made in view of such circumstances. An object of the present
invention is to provide an acoustic signal processing method, an acoustic signal processing
device, and a computer program that, by extracting a predetermined sound component of an
inharmonic structure included in an acoustic signal and increasing or decreasing the extracted
component, can increase or decrease that sound component independently without affecting
other sound components.
[0006]
Another object of the present invention is to provide an acoustic signal processing method, an
acoustic signal processing device, and a computer program that, by calculating the spectrum of
the acoustic signal by frequency analysis, can extract an inharmonic structure sound such as a
drum sound from the acoustic signal based on the spectral distribution.
[0007]
Another object of the present invention is to provide an acoustic signal processing method, an
acoustic signal processing device, and a computer program that, by correcting the sound
component of the template so that the difference between the extracted sound component and
the sound component of the template is equal to or less than a predetermined value, can improve
the extraction accuracy of the predetermined sound component.
[0008]
Another object of the present invention is to provide an acoustic signal processing method, an
acoustic signal processing device, and a computer program that, by selecting a predetermined
number of sound components in ascending order of the difference between each extracted sound
component and the sound component of the template, and updating the sound component of the
template to the median value of the selected sound components, can obtain a template in which
the spectrum of sound components that are not of the inharmonic structure is suppressed.
[0009]
Another object of the present invention is to provide an acoustic signal processing method, an
acoustic signal processing device, and a computer program that, by quantizing the extracted
sound component and the sound component of the template at the time of the first correction of
the sound component of the template, can prevent a large difference from being calculated when
the two are similar.
[0010]
Another object of the present invention is to provide an acoustic signal processing method, an
acoustic signal processing device, and a computer program that, by increasing or decreasing the
extracted predetermined sound component according to a received amount of increase or
decrease, can adjust the volume of the extracted predetermined sound component independently
of the volume of the acoustic signal.
[0011]
Another object of the present invention is to provide an acoustic signal processing method, an
acoustic signal processing device, an acoustic signal processing system, and a computer program
that, by performing the extraction processing of the predetermined sound component of the
inharmonic structure and the increase/decrease processing with different devices, can distribute
the load efficiently.
[0012]
An acoustic signal processing method according to a first aspect of the invention includes the
steps of extracting a predetermined sound component of an inharmonic structure included in an
acoustic signal, and increasing or decreasing the extracted predetermined sound component.
[0013]
An acoustic signal processing method according to a second aspect of the present invention is
the acoustic signal processing method according to the first aspect, further comprising the step
of calculating the spectrum of the acoustic signal by frequency analysis, wherein the extracting
step extracts a spectrum corresponding to the predetermined sound component of the inharmonic
structure.
[0014]
In the acoustic signal processing method according to the third aspect of the present invention,
in the first or second aspect, the predetermined sound component of the inharmonic structure is
extracted with reference to the sound component of a template stored in advance, and the method
further comprises the step of correcting the sound component of the template so that the
difference between the extracted sound component and the sound component of the template is
equal to or less than a predetermined value.
[0015]
An acoustic signal processing method according to a fourth aspect of the present invention is an
acoustic signal processing method for extracting a predetermined sound component of an
inharmonic structure included in an acoustic signal with reference to the sound component of a
template stored in advance, the method comprising the step of correcting the sound component
of the template so that the difference between the extracted sound component and the sound
component of the template is equal to or less than a predetermined value.
[0016]
An acoustic signal processing method according to a fifth aspect of the present invention is the
acoustic signal processing method according to the third or fourth aspect, wherein, when there
are a plurality of extracted sound components, the correcting step comprises the steps of:
calculating a difference between each of the extracted sound components and the sound
component of the template; selecting a predetermined number of sound components in
ascending order of the calculated difference; and updating the sound component of the template
to the median value of the predetermined number of selected sound components.
[0017]
An acoustic signal processing method according to a sixth aspect of the present invention is the
acoustic signal processing method according to the fifth aspect, further comprising the step of
quantizing the extracted sound components and the sound component of the template at the time
of the first correction of the sound component of the template, wherein the calculating step
calculates a difference between each of the quantized extracted sound components and the
quantized sound component of the template.
[0018]
An acoustic signal processing method according to a seventh aspect of the present invention is
the acoustic signal processing method according to any one of the first to sixth aspects, further
comprising the step of receiving an amount of increase or decrease of the predetermined sound
component, wherein the increasing or decreasing step increases or decreases the extracted
predetermined sound component according to the received amount.
[0019]
An acoustic signal processing method according to an eighth aspect of the present invention
comprises the steps of: extracting a predetermined sound component of an inharmonic structure
included in an acoustic signal; outputting time information at which the predetermined sound
component of the inharmonic structure was extracted from the acoustic signal, the predetermined
sound component, and the acoustic signal; receiving the output time information, predetermined
sound component, and acoustic signal; and increasing or decreasing, based on the received time
information, the received sound component included in the received acoustic signal.
[0020]
An acoustic signal processing apparatus according to a ninth aspect of the present invention
comprises extraction means for extracting a predetermined sound component of an inharmonic
structure included in an acoustic signal, and increase/decrease means for increasing or
decreasing the predetermined sound component extracted by the extraction means.
[0021]
An acoustic signal processing apparatus according to a tenth aspect of the present invention is
the acoustic signal processing apparatus according to the ninth aspect, further comprising
calculation means for calculating a spectrum of the acoustic signal by frequency analysis,
wherein the extraction means is configured to extract a spectrum corresponding to the
predetermined sound component of the inharmonic structure.
[0022]
An acoustic signal processing apparatus according to an eleventh aspect of the invention is the
acoustic signal processing apparatus according to the ninth or tenth aspect, wherein the
predetermined sound component of the inharmonic structure is extracted with reference to the
sound component of a template stored in advance in a storage unit, the apparatus further
comprising correction means for correcting the sound component of the template so that the
difference between the extracted sound component and the sound component of the template is
equal to or less than a predetermined value.
[0023]
An acoustic signal processing apparatus according to a twelfth aspect of the invention is an
acoustic signal processing apparatus that extracts a predetermined sound component of an
inharmonic structure included in an acoustic signal with reference to the sound component of a
template stored in advance in a storage unit, the apparatus comprising correction means for
correcting the sound component of the template so that the difference between the extracted
sound component and the sound component of the template is equal to or less than a
predetermined value.
[0024]
The acoustic signal processing apparatus according to a thirteenth aspect of the present
invention is the acoustic signal processing apparatus according to the eleventh or twelfth aspect,
wherein the correction means comprises: subtraction means for obtaining, when there are a
plurality of extracted sound components, a difference between each of the extracted sound
components and the sound component of the template; selection means for selecting a
predetermined number of sound components in ascending order of the difference obtained by
the subtraction means; and updating means for updating the sound component of the template
to the median value of the predetermined number of sound components selected by the
selection means.
[0025]
An acoustic signal processing apparatus according to a fourteenth aspect of the present
invention is the acoustic signal processing apparatus according to the thirteenth aspect, further
comprising quantizing means for quantizing the extracted sound components and the sound
component of the template at the time of the first correction of the sound component of the
template, wherein the subtraction means is configured to obtain a difference between each of the
quantized extracted sound components and the quantized sound component of the template.
[0026]
An acoustic signal processing apparatus according to a fifteenth aspect of the invention is the
acoustic signal processing apparatus according to any one of the ninth to fourteenth aspects,
further comprising reception means for receiving an amount of increase or decrease of the
predetermined sound component, wherein the increase/decrease means is configured to increase
or decrease the extracted predetermined sound component according to the received amount of
increase or decrease.
[0027]
According to a sixteenth aspect of the present invention, an acoustic signal processing system
comprises: a first acoustic signal processing device having extraction means for extracting a
predetermined sound component of an inharmonic structure included in an acoustic signal, and
output means for outputting time information at which the extraction means extracted the
predetermined sound component of the inharmonic structure from the acoustic signal, the
predetermined sound component, and the acoustic signal; and a second acoustic signal
processing device having reception means for receiving the time information, the predetermined
sound component, and the acoustic signal output from the first acoustic signal processing device,
and increase/decrease means for increasing or decreasing the received sound component
included in the received acoustic signal based on the time information received by the
reception means.
[0028]
An acoustic signal processing apparatus according to a seventeenth aspect of the present
invention comprises: extraction means for extracting a predetermined sound component of an
inharmonic structure included in an acoustic signal; and output means for outputting time
information at which the extraction means extracted the predetermined sound component of the
inharmonic structure from the acoustic signal, the predetermined sound component, and the
acoustic signal.
[0029]
According to an eighteenth aspect of the present invention, there is provided an acoustic signal
processing apparatus comprising: reception means for receiving time information at which a
predetermined sound component of an inharmonic structure was extracted from an acoustic
signal, the predetermined sound component, and the acoustic signal; and increase/decrease
means for increasing or decreasing the received sound component included in the received
acoustic signal based on the time information received by the reception means.
[0030]
A computer program according to a nineteenth aspect of the invention includes a procedure for
causing a computer to extract a predetermined sound component of an inharmonic structure
included in an acoustic signal, and a procedure for causing the computer to increase or decrease
the extracted predetermined sound component.
[0031]
The computer program according to the twentieth aspect of the invention is the computer
program according to the nineteenth aspect, further including a procedure for causing the
computer to calculate the spectrum of the acoustic signal by frequency analysis, wherein the
extracting procedure causes the computer to extract a spectrum corresponding to the
predetermined sound component of the inharmonic structure.
[0032]
The computer program according to the twenty-first aspect of the invention is the computer
program according to the nineteenth or twentieth aspect, wherein the predetermined sound
component of the inharmonic structure is extracted with reference to the sound component of a
template stored in advance, the program further including a procedure for causing the computer
to correct the sound component of the template so that the difference between the extracted
sound component and the sound component of the template is equal to or less than a
predetermined value.
[0033]
A computer program according to a twenty-second aspect of the present invention is a computer
program that causes a computer to extract a predetermined sound component of an
inharmonic structure included in an acoustic signal with reference to the sound component of a
template stored in advance, the program including a procedure for causing the computer to
correct the sound component of the template so that the difference between the extracted sound
component and the sound component of the template is equal to or less than a predetermined
value.
[0034]
A computer program according to a twenty-third aspect of the present invention is the computer
program according to the twenty-first or twenty-second aspect, wherein the correcting procedure
includes: a procedure for causing the computer to calculate, when there are a plurality of
extracted sound components, a difference between each extracted sound component and the
sound component of the template; a procedure for causing the computer to select a
predetermined number of sound components in ascending order of the calculated difference;
and a procedure for causing the computer to update the sound component of the template to the
median value of the predetermined number of selected sound components.
[0035]
A computer program according to a twenty-fourth aspect of the present invention is the
computer program according to the twenty-third aspect, further including a procedure for causing
the computer to quantize the extracted sound components and the sound component of the
template at the time of the first correction of the sound component of the template, wherein the
calculating procedure causes the computer to calculate a difference between each of the
quantized extracted sound components and the quantized sound component of the template.
[0036]
The computer program according to the twenty-fifth aspect of the invention is the computer
program according to any one of the nineteenth to twenty-fourth aspects, further including a
procedure for causing the computer to receive an amount of increase or decrease of the
predetermined sound component, wherein the increasing or decreasing procedure causes the
computer to increase or decrease the extracted predetermined sound component according to
the received amount.
[0037]
A computer program according to a twenty-sixth aspect of the present invention includes a
procedure for causing a computer to extract a predetermined sound component of an inharmonic
structure included in an acoustic signal, and a procedure for causing the computer to output time
information at which the predetermined sound component of the inharmonic structure was
extracted from the acoustic signal, the predetermined sound component, and the acoustic signal.
[0038]
A computer program according to a twenty-seventh aspect of the present invention includes a
procedure for causing a computer to receive time information at which a predetermined sound
component of an inharmonic structure was extracted from an acoustic signal, the predetermined
sound component, and the acoustic signal, and a procedure for causing the computer to increase
or decrease the received sound component included in the received acoustic signal based on the
received time information.
[0039]
In the first, ninth and nineteenth aspects of the present invention, the predetermined tonal
component of the inharmonic structure included in the acoustic signal is extracted.
The sound of the inharmonic structure is, for example, the sound of a percussion instrument such
as a drum.
Then, the extracted predetermined sound component is increased or decreased for the acoustic
signal.
For example, when the sound component of the extracted drum is increased, the drum sound can
be emphasized, and when the sound component of the extracted drum is decreased, the drum
sound can be canceled.
The predetermined sound component included in the sound signal can be extracted and
independently increased or decreased without affecting the other sound components.
[0040]
In the second, tenth and twentieth inventions, the spectrum of the acoustic signal is calculated by
frequency analysis.
The sounds of percussion instruments such as drums are non-harmonic structures that have little
harmonic structure, while the sounds of other musical instruments are harmonic structures.
Therefore, it is possible to distinguish the sounds of the nonharmonic structure of percussion
instruments such as drums from the sounds of harmonic structures of other musical instruments
based on the spectral distribution.
Thus, based on the spectral distribution, it is possible to extract from the acoustic signal the
sound of the non-harmonic structure of the percussion instrument such as a drum.
[0041]
In the third, fourth, eleventh, twelfth, twenty-first, and twenty-second aspects of the invention,
extraction of the predetermined sound component of the nonharmonic structure is performed
based on the sound component of the template stored in advance.
For example, when extracting a drum sound, a template of the drum sound is stored in advance.
However, the drum sound included in the acoustic signal and the drum sound of the template
stored in advance are unlikely to be identical, and are often slightly different.
Therefore, the sound component of the template is corrected so that the difference between the
extracted sound component and the sound component of the template is equal to or less than a
predetermined value.
As a result, the drum sound included in the acoustic signal and the drum sound of the template
stored in advance become substantially the same, so that the extraction accuracy of the drum
sound is improved and the extracted drum sound can be increased or decreased accurately.
In addition, it becomes possible to extract various drum sounds based on one template.
[0042]
In the fifth, thirteenth, and twenty-third aspects, when there are a plurality of extracted sound
components, the difference between each extracted sound component and the sound component
of the template is calculated, and a predetermined number of sound components are selected in
ascending order of the calculated difference.
Then, the sound component of the template is updated to the median value of the predetermined
number of selected sound components to correct the template.
The spectral structure of the tonal component of the inharmonic structure is likely to appear at
the same location of the selected tonal component.
On the other hand, the spectral structure of the tonal component of the harmonic structure is
unlikely to appear at the same position of the selected tonal component.
Therefore, when the median is taken, the spectral structure of the inharmonic structure is
likely to be retained, whereas the spectra of harmonic instrument sounds other than percussion
sounds such as drums are unlikely to be retained. It is thus possible to suppress the spectrum of
sound components that are not of the inharmonic structure.
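The select-then-median update described in this paragraph can be illustrated with a minimal Python sketch. It is not the implementation of the invention: the function name update_template, the array layout, and the squared Euclidean distance used as the "difference" are all assumptions made for illustration.

```python
import numpy as np

def update_template(template, segments, n_select):
    """Return the element-wise median of the n_select segments closest to template.

    template: array of shape (frames, bins), a power spectrogram.
    segments: array of shape (n_segments, frames, bins), one per extracted component.
    The squared Euclidean distance used here is an assumed difference measure.
    """
    template = np.asarray(template, dtype=float)
    segments = np.asarray(segments, dtype=float)
    # Difference between each extracted segment and the template.
    diffs = ((segments - template) ** 2).sum(axis=(1, 2))
    # Indices of the n_select segments with the smallest difference.
    nearest = np.argsort(diffs)[:n_select]
    # Element-wise median keeps energy that recurs at the same position.
    return np.median(segments[nearest], axis=0), nearest

# Toy data (hypothetical): one frame, four frequency bins.
drum = np.array([[4.0, 2.0, 0.0, 0.0]])
segments = np.array([
    [[4.0, 2.0, 3.0, 0.0]],   # drum plus a harmonic peak in bin 2
    [[4.0, 2.0, 0.0, 3.0]],   # drum plus a harmonic peak in bin 3
    [[4.0, 2.0, 0.0, 0.0]],   # drum only
    [[0.0, 0.0, 9.0, 9.0]],   # not a drum; has the largest difference
])
new_template, chosen = update_template(drum, segments, n_select=3)
```

In this toy case the drum energy in bins 0 and 1 appears in every selected segment and survives the median, while the harmonic peaks fall in different bins and are removed.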
[0043]
In the sixth, fourteenth, and twenty-fourth aspects of the present invention, at the time of the
first correction of the sound component of the template, the extracted sound components and the
sound component of the template are quantized, and the difference between each quantized
extracted sound component and the quantized sound component of the template is calculated.
For example, the drum sound included in the acoustic signal and the drum sound of the template
are unlikely to be exactly the same, and while the template has not yet been corrected, a large
difference tends to be calculated even if they are similar.
By quantizing the extracted sound component and the sound component of the template, the
difference is obtained using a representative value such as a median value, so that a large
difference is not calculated when the two are similar.
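A small Python sketch can show why a coarser representation helps here. The text above does not specify how the quantization is done, so everything in this example is an assumption: the block-wise grid, the block sizes, the use of the block median as the representative value, and the squared Euclidean difference.

```python
import numpy as np

def quantize(spec, block_frames, block_bins):
    """Replace each time-frequency block with its median (assumed representative value)."""
    spec = np.asarray(spec, dtype=float)
    frames, bins = spec.shape
    # Drop any incomplete trailing block so the grid divides evenly.
    frames -= frames % block_frames
    bins -= bins % block_bins
    blocks = spec[:frames, :bins].reshape(
        frames // block_frames, block_frames, bins // block_bins, block_bins)
    return np.median(blocks, axis=(1, 3))

def difference(a, b):
    """Squared Euclidean difference (an assumed measure)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(((a - b) ** 2).sum())

# Two similar spectra (hypothetical): the same peak, shifted by one bin.
template = np.array([[0.0, 5.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0]])
observed = np.array([[0.0, 0.0, 5.0, 5.0, 0.0, 0.0, 0.0, 0.0]])

raw = difference(template, observed)
coarse = difference(quantize(template, 1, 4), quantize(observed, 1, 4))
```

At full resolution the one-bin shift produces a large difference, whereas on the coarse grid both spectra map to the same representative values and the difference vanishes.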
[0044]
In the seventh, fifteenth, and twenty-fifth aspects, an amount of increase or decrease of the
predetermined sound component is received, and the extracted predetermined sound component
is increased or decreased in accordance with the received amount.
For example, the amount of increase or decrease can be received through a dedicated volume
control, in the same way as the volume of an acoustic signal.
By operating this control, the user can adjust the volume of the extracted predetermined sound
component independently of the volume of the acoustic signal.
[0045]
In the eighth, sixteenth, seventeenth, eighteenth, twenty-sixth, and twenty-seventh aspects of the
present invention, the first acoustic signal processing device extracts a predetermined sound
component of the inharmonic structure included in the acoustic signal, and outputs time
information at which the predetermined sound component of the inharmonic structure was
extracted from the acoustic signal, the predetermined sound component, and the acoustic signal.
The output can be recorded on a recording medium or transmitted to a communication network.
Then, the second acoustic signal processing device receives the time information, the
predetermined sound component, and the acoustic signal, and, based on the received time
information, increases or decreases the received sound component included in the received
acoustic signal.
The input can be read from a recording medium or received from a communication network.
Since extraction of the predetermined tonal component of the inharmonic structure is heavy, it is
preferable to process it with a high-performance computer or the like.
On the other hand, since increase and decrease of the predetermined sound component is small
in load, it can be processed by a general audio device or the like.
In this way, the load can be distributed efficiently, and even a low-performance audio device can
increase / decrease the predetermined sound component of the non-harmonic structure.
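The division of work between the two devices might look like the following Python sketch. It is only an illustration under stated assumptions: the class ExtractionResult, its field names, and the function apply_gain are hypothetical, and the component is handled as a time-domain waveform for simplicity, whereas the description above works with spectra.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ExtractionResult:
    """What the first device outputs (field names are hypothetical)."""
    onset_samples: list      # time information: sample index of each extraction
    component: np.ndarray    # the extracted predetermined sound component
    signal: np.ndarray       # the original acoustic signal

def apply_gain(result, gain):
    """Second device: scale the component at each onset; gain 1.0 changes nothing."""
    out = result.signal.astype(float).copy()
    n = len(result.component)
    for start in result.onset_samples:
        end = min(start + n, len(out))
        # The mix already holds 1 x component, so add (gain - 1) x component.
        out[start:end] += (gain - 1.0) * result.component[: end - start]
    return out

# Toy signal (hypothetical): constant background plus the component at two onsets.
background = np.full(10, 0.5)
component = np.array([1.0, -1.0])
signal = background.copy()
for onset in (2, 6):
    signal[onset:onset + 2] += component
result = ExtractionResult(onset_samples=[2, 6], component=component, signal=signal)

cancelled = apply_gain(result, 0.0)    # component removed
unchanged = apply_gain(result, 1.0)    # identical to the input signal
emphasized = apply_gain(result, 2.0)   # component doubled
```

The second stage only performs a few additions per onset, which is consistent with the statement that the increase/decrease processing is light enough for a general audio device.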
[0046]
According to the first, ninth and nineteenth aspects of the present invention, it is possible to
independently increase or decrease the predetermined sound component of the inharmonic
structure included in the sound signal without affecting the other sound components.
[0047]
According to the second, tenth, and twentieth aspects of the present invention, it is possible to
extract a non-harmonic sound such as a drum sound from the acoustic signal based on the
spectral distribution.
[0048]
According to the third, fourth, eleventh, twelfth, twenty-first, and twenty-second aspects of the
present invention, the extraction accuracy of an inharmonic structure sound such as a drum
sound is improved, and the extracted drum sound can be increased or decreased accurately.
In addition, it becomes possible to extract non-harmonic sound such as various drum sounds with
one template.
[0049]
According to the fifth, thirteenth, and twenty-third aspects of the present invention, it is possible
to obtain a template in which the spectrum of the sound component that is not the inharmonic
structure is suppressed.
[0050]
According to the sixth, fourteenth, and twenty-fourth aspects of the invention, it is possible to
prevent a large difference from being calculated when the extracted sound component and the
sound component of the template are similar.
[0051]
According to the seventh, fifteenth, and twenty-fifth aspects of the invention, the volume of the
extracted predetermined sound component can be independently adjusted separately from the
volume of the acoustic signal.
[0052]
According to the eighth, sixteenth, seventeenth, eighteenth, twenty-sixth, and twenty-seventh
aspects of the present invention, the load can be distributed efficiently by performing the
extraction processing of the predetermined sound component of the inharmonic structure and the
increase/decrease processing with different devices, and the predetermined sound component of
the inharmonic structure can be increased or decreased even with a general audio device or the
like.
[0053]
Hereinafter, the present invention will be specifically described based on the drawings showing
the embodiments thereof.
FIG. 1 is a block diagram showing a configuration example of a computer (sound signal
processing apparatus) according to the present invention.
The computer 10 includes a central processing unit (CPU) 11, a random access memory (RAM)
12 such as a DRAM, a hard disk drive (hereinafter referred to as a hard disk) 13, an external
storage unit 14 such as a flexible disk drive or a CD-ROM drive, and a communication unit 17
that communicates with a communication network 20 such as a LAN (Local Area Network) or the
Internet.
The computer 10 further includes an input unit 15 such as a keyboard or a mouse, and a display
unit 16 such as a CRT display or a liquid crystal display.
[0054]
The CPU 11 controls the respective units 12 to 17 described above.
The CPU 11 stores in the RAM 12 a program or data received from the input unit 15 or the
communication unit 17, or a program or data read from the hard disk 13 or the external
storage unit 14, executes the program stored in the RAM 12 or performs various processing such
as data calculation, and stores the processing results or temporary data used for the processing
in the RAM 12.
The data such as the calculation result stored in the RAM 12 is stored in the hard disk 13 by the
CPU 11 or output from the display unit 16 or the communication unit 17.
[0055]
An acoustic signal (sound data) received from the outside by the computer 10 is stored on the
hard disk 13. The computer 10 extracts an inharmonic structure sound (sound component), such
as the sound of a percussion instrument (for example, a drum sound), included in the acoustic
signal, and increases or decreases the extracted sound.
The amount of increase or decrease of the extracted sound is received by the input unit
(reception means) 15.
The inharmonic structure sound is a sound having substantially no harmonic structure, but it
may contain a harmonic structure that is negligible compared with that of a general musical
instrument sound having a harmonic structure.
[0056]
The CPU 11 operates as means (calculation means) for calculating the power spectrum P (t, f) of
the sound signal at the frame t and the frequency f.
The acoustic signal is sampled at 44.1 kHz, for example, and P (t, f) is obtained by calculating
an STFT (Short Time Fourier Transform) using a Hanning window with a window width of 4096 points
(frequency resolution 10.8 Hz) and a window shift length of 441 points (time resolution 10 ms).
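The analysis parameters above can be illustrated with a short sketch. This is a minimal pure-Python illustration under stated assumptions, not the implementation of the embodiment: the function names (`resolutions`, `hann_window`, `stft_power`) are hypothetical, the periodic form of the Hanning window is assumed, and a naive DFT is used only for readability where a practical implementation would use an FFT.

```python
import cmath
import math

SAMPLE_RATE = 44100   # Hz, from the text
WINDOW = 4096         # window width in samples, from the text
HOP = 441             # window shift in samples, from the text

def resolutions(sample_rate=SAMPLE_RATE, window=WINDOW, hop=HOP):
    """Return (frequency resolution in Hz, time resolution in ms)."""
    return sample_rate / window, 1000.0 * hop / sample_rate

def hann_window(n):
    """Hanning window of length n (periodic form, an assumption)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / n) for k in range(n)]

def power_spectrum_frame(samples, start, window):
    """Power spectrum of one windowed frame, by a naive O(n^2) DFT."""
    w = hann_window(window)
    frame = [samples[start + k] * w[k] for k in range(window)]
    spectrum = []
    for f in range(window // 2):          # keep window/2 frequency bins
        acc = 0j
        for k in range(window):
            acc += frame[k] * cmath.exp(-2j * math.pi * f * k / window)
        spectrum.append(abs(acc) ** 2)
    return spectrum

def stft_power(samples, window=WINDOW, hop=HOP):
    """Return P[t][f] for every frame that fits entirely in the signal."""
    starts = range(0, len(samples) - window + 1, hop)
    return [power_spectrum_frame(samples, s, window) for s in starts]
```

With the values in the text, `resolutions()` gives about 10.77 Hz and 10 ms, and a 4096-point window yields 2048 frequency bins per frame.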
[0057]
The CPU 11 operates as means for detecting the tone generation time candidate oi of the drum.
As the drum sound generation time candidate oi, the CPU 11 detects, for example, a time (frame)
at which the rise of the power spectrum is large.
The CPU 11 calculates the differential Q (t, f) = ∂P (t, f) / ∂t of P (t, f) with respect to
time (frame). If Q (t, f) > 0 is satisfied in three consecutive frames (t = a − 1, a, a + 1) in
the time direction, the differential Q (a, f) in the frame a is kept as calculated.
On the other hand, if Q (t, f) > 0 is not satisfied in three consecutive frames, Q (a, f) = 0.
Next, in each frame t, the CPU 11 multiplies Q (t, f) by a low-pass filter function F (f) based
on a typical frequency characteristic of the drum, and calculates the sum S (t) in the frequency
direction:
[0058]
S (t) = Σ_f F (f) Q (t, f)
FIG. 2 is a diagram showing an example of F (f); the horizontal axis is frequency f and the
vertical axis is F (f). F (f) is stored in advance in the hard disk 13. The CPU 11 finds each
time at which S (t) takes a local maximum value, and sets it as a generation time candidate oi.
Preferably, the CPU 11 smooths S (t) over 11 frames by the method of Savitzky and Golay before
detecting the local maxima.
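The candidate detection described above can be sketched as follows. This is a hedged illustration only: the function names are hypothetical, a forward difference between adjacent frames stands in for the time differential of P (t, f), and the optional Savitzky-Golay smoothing is omitted for brevity.

```python
def rising_differential(P):
    """Q[t][f]: time difference of P, kept only where it is positive in
    three consecutive frames (t-1, t, t+1); otherwise 0."""
    n_frames, n_bins = len(P), len(P[0])
    # Forward difference as a discrete stand-in for dP/dt (an assumption).
    d = [[(P[t + 1][f] - P[t][f]) if t + 1 < n_frames else 0.0
          for f in range(n_bins)]
         for t in range(n_frames)]
    Q = [[0.0] * n_bins for _ in range(n_frames)]
    for t in range(1, n_frames - 1):
        for f in range(n_bins):
            if d[t - 1][f] > 0 and d[t][f] > 0 and d[t + 1][f] > 0:
                Q[t][f] = d[t][f]
    return Q

def onset_strength(Q, lowpass):
    """S(t): sum over f of F(f) * Q(t, f)."""
    return [sum(w * q for w, q in zip(lowpass, row)) for row in Q]

def local_maxima(S):
    """Frames t where S(t) is positive and a local maximum."""
    return [t for t in range(1, len(S) - 1)
            if S[t] > 0 and S[t] > S[t - 1] and S[t] >= S[t + 1]]
```

Each index returned by `local_maxima` corresponds to one candidate oi, expressed in frames (10 ms each with the parameters of the text).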
[0059]
The hard disk (storage unit) 13 stores a seed template TS created based on the single tone signal
of the drum. TS is a power spectrum of a fixed time length, obtained by STFT, that starts at the
onset time of the single tone. TS is a matrix in which rows correspond to time and columns
correspond to frequency, and each element can be represented by TS (t, f) (where 1 ≦ t ≦ 15,
1 ≦ f ≦ 2048).
[0060]
The CPU 11 operates as means (correction means) for adapting the seed template TS to the
acoustic signal to be analyzed. The CPU 11 updates the seed template TS as described later, and
repeats updating of the template thereafter. Hereinafter, the template after the g-th update is
represented by Tg. Since TS is the template initially input (g = 0), T0 = TS. The CPU 11
operates as means (calculation means) for extracting spectrum fragments Pi (i = 1,..., N, where
N is the total number of detected sounding time candidates). Each spectrum fragment Pi is a
power spectrum of a fixed time length starting from the pronunciation time candidate oi [ms]
detected from the acoustic signal to be analyzed. The spectral fragment Pi is a matrix of the
same size as the template Tg.
[0061]
Although spectral fragments are extracted in this way, the temporal resolution of 10 ms is not
sufficient for performing template adaptation with high accuracy, so it is preferable to correct
the pronunciation time candidate oi. For example, the CPU 11 operates as means for correcting
the pronunciation time candidate oi [ms] to oi' [ms], and extracts the spectrum fragment Pi
starting from the corrected pronunciation time candidate oi' [ms]. For example, if the spectral
fragment extracted from oi' = oi − 5 [ms] or oi + 5 [ms] is of higher quality than the spectral
fragment extracted from oi [ms], the power spectrum extracted with oi' [ms] as the start time is
taken as the spectral fragment Pi.
[0062]
For example, the CPU 11 extracts spectrum fragments Pi,j whose start times are (oi + j) [ms]
(where j = −5, 0, 5 [ms]). Next, the CPU 11 calculates a correlation value Corr (j) between the
template Tg′ and each spectral fragment Pi,j.
[0063]
(The equation for Corr (j) is not reproduced in this translation.) Next, the CPU 11 obtains the
offset value J that maximizes Corr (j), and takes the fragment Pi,J at the obtained offset value
J as Pi.
[0064]
The CPU 11 further calculates a template Tg′ and a spectrum fragment Pi′ obtained by multiplying
the template Tg and the spectrum fragment Pi, respectively, by the low-pass filter function
F (f): Tg′ (t, f) = F (f) Tg (t, f) and Pi′ (t, f) = F (f) Pi (t, f).
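The refinement of the start time by ±5 ms can be sketched as below. Because the equation for Corr (j) is not available in this text, the sketch assumes a plain sum of element-wise products between the low-pass-weighted template and the low-pass-weighted fragment; the function names and the dictionary keyed by offset are likewise hypothetical.

```python
def apply_lowpass(X, lowpass):
    """X'(t, f) = F(f) * X(t, f) for a time-by-frequency matrix X."""
    return [[w * x for w, x in zip(lowpass, row)] for row in X]

def correlation(A, B):
    """Sum of element-wise products (an assumed form of Corr(j))."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def refine_onset(template, fragments_by_offset, lowpass):
    """Pick the offset J (in ms) whose fragment correlates best with the
    low-pass-weighted template; return (J, fragment at J)."""
    t_lp = apply_lowpass(template, lowpass)

    def score(j):
        return correlation(t_lp, apply_lowpass(fragments_by_offset[j], lowpass))

    best = max(fragments_by_offset, key=score)
    return best, fragments_by_offset[best]
```

A caller would pass the three fragments extracted at offsets −5, 0, and +5 ms, for example `refine_onset(Tg, {-5: Pa, 0: Pb, 5: Pc}, F)`.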
[0065]
The CPU 11 operates as means (selection means) for selecting a predetermined number M of
spectrum fragments similar to the template Tg in the middle of adaptation.
The predetermined number M is set to a fixed ratio (0.1 in the present description) of the
total number of spectrum fragments (the number of detected pronunciation time candidates).
The CPU (subtraction means) 11 calculates the distance (difference) Di between the template Tg
and each spectral fragment Pi, and selects the predetermined number M of spectral fragments in
ascending order of the calculated distance. The distance Di can be calculated by the following
equation.
[0066]
(The equation for Di is not reproduced in this translation.) However, when the distance Di is
calculated by the above equation, even a slight difference between the power peak positions of
the spectral fragment Pi and the template Tg may cause the distance between them to be
calculated as very large. FIG. 3 is a view showing an example of the distance between the
template Tg and the spectral fragment Pi; the horizontal axis is frequency f, the vertical axis
is power P, the solid line is Pi, and the broken line is Tg. As shown in FIG. 3 (a), the
distance between the two can be calculated to be very large merely because the power peak
position has shifted slightly.
[0067]
Therefore, in the present invention, in the first adaptation, the distance Di is calculated
after the seed template T0 and the spectral fragment Pi are subjected to quantization processing
with lower time-frequency resolution, as shown in FIGS. 3 (b) and 3 (c). For example, the time
resolution after quantization is 2 [frames] (20 [ms]), and the frequency resolution is 5 [bins]
(54 [Hz]). The CPU (quantization means) 11 quantizes the seed template T0 and the spectrum
fragment Pi to calculate the quantized spectra T0″ (t″, f″) and Pi″ (t″, f″).
[0068]
[0069]
(The quantization equations are not reproduced in this translation.)
Next, the CPU 11 calculates the distance Di between the quantized seed template T0 (TS) and the
quantized spectral fragment Pi.
[0070]
(The equation for this distance is not reproduced in this translation.)
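The effect of quantizing before measuring distance can be illustrated with a small sketch. Neither the quantization operator nor the distance equation is available in this text, so the sketch assumes block averaging over 2 frames by 5 bins and a Euclidean distance; both choices, and the function names, are assumptions made for illustration.

```python
import math

def quantize(X, dt=2, df=5):
    """Average X over blocks of dt frames by df bins (assumed operator)."""
    n_frames, n_bins = len(X), len(X[0])
    out = []
    for t0 in range(0, n_frames, dt):
        row = []
        for f0 in range(0, n_bins, df):
            cells = [X[t][f]
                     for t in range(t0, min(t0 + dt, n_frames))
                     for f in range(f0, min(f0 + df, n_bins))]
            row.append(sum(cells) / len(cells))
        out.append(row)
    return out

def distance(A, B):
    """Euclidean distance between two equally sized matrices (assumed)."""
    return math.sqrt(sum((a - b) ** 2
                         for ra, rb in zip(A, B)
                         for a, b in zip(ra, rb)))
```

In this sketch, two spectra whose peaks differ by one bin have a large distance at full resolution but a distance of zero once both fall into the same quantization block, which is the behavior FIG. 3 is described as showing.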
[0071]
The CPU 11 operates as means (updating means) for updating the template Tg to a new template
Tg + 1 based on the selected predetermined number M of spectral fragments Ps (s = 1,..., M).
The spectral structure of the drum sound is likely to appear at the same position in each
spectral fragment Ps. On the other hand, spectral components of musical instrument sounds other
than the drum are unlikely to appear at the same position in each spectral fragment Ps.
Therefore, the CPU 11 takes the median of the selected spectral fragments Ps at each time and
frequency as the new template Tg + 1: Tg + 1 (t, f) = median_s Ps (t, f). Taking the median is
likely to retain the spectral structure of the drum sound, while the spectral components of
musical instrument sounds other than the drum are unlikely to be retained and are highly likely
to be suppressed. Therefore, the seed template T0 of the drum sound can be adapted to the drum
sound in a sound signal that includes a plurality of types of musical instrument sounds.
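The selection and median update can be sketched as follows. The function names are hypothetical, and forcing M to be at least one is an assumption added so that the sketch also works for very short inputs.

```python
import statistics

def select_closest(fragments, distances, ratio=0.1):
    """Return the M = ratio * N fragments with the smallest distances."""
    m = max(1, int(ratio * len(fragments)))   # at least one (assumption)
    order = sorted(range(len(fragments)), key=lambda i: distances[i])
    return [fragments[i] for i in order[:m]]

def median_template(selected):
    """New template: element-wise median over the selected fragments."""
    n_frames, n_bins = len(selected[0]), len(selected[0][0])
    return [[statistics.median(p[t][f] for p in selected)
             for f in range(n_bins)]
            for t in range(n_frames)]
```

In the small case where three fragments share one component and each has a different extra component, the median keeps the shared component and discards the extras, which mirrors the argument made in the text for suppressing sounds other than the drum.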
[0072]
By repeating the determination of the new template Tg + 1, the drum sound of the template
approaches the drum sound included in the acoustic signal, and the template adaptation
proceeds. As this determination is repeated, the amount of change in the template decreases and
the adaptation converges. The CPU 11 operates as means for comparing the template Tg with the
new template Tg + 1 and, if the difference is less than a predetermined value, determining that
the adaptation has converged; in that case it takes the new template Tg + 1 as the
post-adaptation template TA.
[0073]
The CPU 11 performs template matching based on the post-adaptation template TA, and operates as
a unit (extraction unit) that determines whether the drum is sounding at the sounding time
candidate oi. The CPU 11 first multiplies the post-adaptation template TA by the above-described
low-pass filter function F (f) to calculate a weighting function ω (t, f) = F (f) TA (t, f),
which represents the size of the feature on the spectrum at each frame t and each frequency f of
the post-adaptation template TA.
[0074]
Here, if the volume of each spectral fragment is different from the volume of the template, it
may not be possible to determine correctly whether the template is included in the spectral
fragment. To perform template matching accurately, it is therefore preferable to correct the
volume of each spectral fragment to match the volume of the template. In each frame t of the
template TA, the CPU 11 selects the frequencies ft,k (k = 1,..., 15) of the feature points,
where ft,k is the frequency with the k-th largest value of ω (t, f), and calculates the power
difference ηi (t, ft,k) = Pi (t, ft,k) − TA (t, ft,k). After that, the CPU 11 selects the first
quartile of ηi (t, ft,k) (the value at the position 25% of the number of samples, counted from
the smallest, when the samples are listed in ascending order) and takes it as the power
difference δi (t) at frame t. The CPU 11 determines that TA is not included in Pi if the number
of frames not satisfying δi (t) ≧ Ψ (Ψ is a negative constant) is larger than a threshold R.
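The per-frame power difference can be sketched as below. This is an illustration under assumptions: the function names are hypothetical, the value at the 25% position is taken with a simple index rule (other quartile conventions exist), and the frame test assumes the comparison is "δi (t) is at least Ψ", which is how the flowchart description of FIG. 9 counts frames.

```python
def feature_bins(weight_row, k=15):
    """Indices of the k largest weights in one frame of omega(t, f)."""
    order = sorted(range(len(weight_row)),
                   key=lambda f: weight_row[f], reverse=True)
    return order[:k]

def value_at_25_percent(values):
    """Value at the 25% position of the ascending list (simple index rule)."""
    s = sorted(values)
    return s[int(0.25 * (len(s) - 1))]

def frame_power_differences(P, TA, omega, k=15):
    """delta_i(t): 25% value of eta_i(t, f) = P(t, f) - TA(t, f), taken
    over the k feature bins of each frame."""
    deltas = []
    for p_row, t_row, w_row in zip(P, TA, omega):
        eta = [p_row[f] - t_row[f] for f in feature_bins(w_row, k)]
        deltas.append(value_at_25_percent(eta))
    return deltas

def template_absent(deltas, psi, r):
    """True if more than r frames fail delta_i(t) >= psi."""
    return sum(1 for d in deltas if not d >= psi) > r
```

Using a low quantile rather than the mean makes δi (t) insensitive to feature bins where other instruments add power on top of the drum.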
[0075]
The CPU 11 calculates the final power difference Δi (the spectral fragment correction value is
−Δi).
[0076]
(The equation for Δi is not reproduced in this translation.)
The CPU 11 determines that TA is not included in Pi if Δi ≦ Θ (Θ is a constant), and determines
that TA is included in Pi if Δi ≦ Θ is not satisfied. In the latter case, the CPU 11 calculates
the corrected spectral fragment Pi′ (t, f) = Pi (t, f) − Δi.
[0077]
The CPU 11 operates as means for calculating the distance between the post-adaptation template
TA and the corrected spectrum fragment Pi′. When calculating the distance, the CPU 11 determines
whether or not the spectrum of TA is included in the spectrum of Pi′. FIG. 4 is a diagram
showing an example of determination as to whether or not a spectrum is included. The horizontal
axis is frequency f, the vertical axis is power P, the solid line is Pi′, and the broken line is
TA. For example, as shown in FIG. 4 (a), when Pi′ (t, f) is larger than TA (t, f), Pi′ (t, f) is
regarded as containing not only the spectral component of the drum sound but also the spectral
components of other instruments, and it is determined that TA (t, f) is included in Pi′ (t, f).
In other cases, as shown in FIG. 4 (b), it is determined that TA (t, f) is not included in
Pi′ (t, f). The CPU 11 calculates the local distance measure γi (t, f) between TA and Pi′ at
each frame t and frequency f.
[0078]
(The equation for γi (t, f) is not reproduced in this translation.) The threshold used in that
equation is a negative constant; using a nonzero negative value absorbs small variations in
spectral components. The CPU 11 multiplies the distance measure γi by the weighting function ω
and sums over the time-frequency domain to obtain an overall distance Γi.
[0079]
(The equation for Γi is not reproduced in this translation.) The CPU 11 operates as means for
determining whether or not the target drum has sounded in the fragment Pi′. When Γi < θ is
satisfied, the CPU 11 determines that the target drum has sounded, and confirms the sounding
time candidate oi as the sounding time.
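The matching decision can be sketched as follows. The equations for the local and overall distances are not available in this text, so the forms used here are assumptions chosen to agree with the prose: the local distance is 0 where the fragment power reaches the template power within a negative margin and 1 otherwise, and the overall distance is the ω-weighted sum of the local distances. All names, including `margin`, are hypothetical.

```python
def local_distance(p, t, margin):
    """gamma(t, f): 0 if the fragment power reaches the template power
    within a negative margin, else 1 (assumed form)."""
    return 0.0 if p - t >= margin else 1.0

def overall_distance(P, TA, omega, margin):
    """Gamma_i: sum of omega(t, f) * gamma_i(t, f) over time and frequency."""
    return sum(w * local_distance(p, t, margin)
               for p_row, t_row, w_row in zip(P, TA, omega)
               for p, t, w in zip(p_row, t_row, w_row))

def drum_sounded(P, TA, omega, margin, theta):
    """True if Gamma_i < theta, so the candidate is accepted as an onset."""
    return overall_distance(P, TA, omega, margin) < theta
```

Because the local distance only penalizes points where the fragment falls short of the template, extra power from other instruments does not count against a match.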
[0080]
The CPU 11 operates as means (increase / decrease means) for increasing or decreasing the
drum sound at the sounding time. FIG. 5 is a diagram showing an example of increase and decrease
of the drum sound at the sounding time; the horizontal axis is frequency f, the vertical axis is
power P, and t represents time (frame). The CPU 11 multiplies the spectrum Px corresponding to
the post-adaptation template TA by r (0 ≦ r ≦ 1) as shown in FIG. 5 (b) (the broken line in
FIG. 5 (b) is the spectrum before the multiplication by r, and the solid line is the spectrum
after it), and calculates the sound signal P′ shown in FIG. 5 (c) by subtracting r · Px from the
spectrum P of the acoustic signal shown in FIG. 5 (a). When the drum sound is to be increased,
r · Px is added to the spectrum P of the acoustic signal.
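The increase or decrease step can be sketched as below. The function name and argument layout are hypothetical, Px is taken as given, and clipping negative results to zero is an assumption added so that the output remains a valid power spectrum.

```python
def scale_drum(P, Px, onset_frame, r, increase=False):
    """Return P' in which r * Px is subtracted from (or added to) P,
    starting at onset_frame. Clipping at zero is an assumption."""
    out = [row[:] for row in P]           # leave the input untouched
    sign = 1.0 if increase else -1.0
    for dt, x_row in enumerate(Px):
        t = onset_frame + dt
        if t >= len(out):                 # template runs past the signal
            break
        out[t] = [max(0.0, p + sign * r * x) for p, x in zip(out[t], x_row)]
    return out
```

Here r = 0 leaves the signal unchanged and r = 1 with `increase=False` removes the full template spectrum at the confirmed sounding time.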
[0081]
As described above, the CPU 11 calculates various numerical values, and the calculated values
are stored in the RAM 12 or the hard disk 13. When a new numerical value is calculated from
previously calculated values, the CPU 11 reads the necessary values into the RAM 12 and
calculates the new value.
[0082]
The computer program recorded on the recording medium 19 such as a CD-ROM can be read by the
external storage unit 14, stored in the hard disk 13 or the RAM 12, and executed by the CPU 11,
thereby causing the CPU 11 to operate as the above-described means. It is also possible to
receive a computer program from another device connected to the communication network 20 by the
communication unit 17, store the computer program in the hard disk 13 or the RAM 12, and
execute the program by the CPU 11.
[0083]
Next, increase and decrease of drum sound using the computer (sound signal processing
apparatus) according to the present invention will be described. FIG. 6 is a flowchart showing
an example of the procedure for increasing and decreasing the drum sound when template
adaptation is performed. The computer 10, for example, reads an acoustic signal (sound data)
from the recording medium 19 with the external storage unit 14 and stores it in the hard disk
13, or receives an acoustic signal through a sound card (not shown), converts it into sound
data (hereinafter referred to as an acoustic signal), and stores the converted data in the hard
disk 13. Further, the computer 10 reads a drum sound template (seed template TS) from the
recording medium 19 with the external storage unit 14 and stores it in the hard disk 13.
[0084]
The CPU 11 performs frequency analysis of the acoustic signal, calculates the power spectrum P,
and stores the data of the calculated power spectrum P in the hard disk 13. Next, the CPU 11
detects the pronunciation time candidates oi using the calculated power spectrum P stored in the
hard disk 13 (S10), and stores the detected pronunciation time candidates oi in the hard disk
13. The CPU 11 extracts (calculates) the spectral fragment Pi based on each pronunciation time
candidate oi (S12), and stores the data of the extracted spectral fragment Pi in the hard disk
13. Thereafter, the CPU 11 performs template adaptation (template correction) (S14), updating
the template Tg stored in the hard disk 13 until it converges to the post-adaptation template
TA.
[0085]
Thereafter, the CPU 11 performs template matching using the post-adaptation template TA to
determine the sound generation time (extract a drum sound) (S16), and stores the determined
sound generation time in the hard disk 13. Using the post-adaptation template TA, the CPU 11
increases or decreases the power spectrum around the determined sounding time (S18), creates an
audio signal for output, and stores it in the hard disk 13. The increase or decrease is
performed according to the amount of increase or decrease received by the input unit 15. The
output audio signal (sound data) can, for example, be written to the recording medium 19 by the
external storage unit 14, or output from a sound card (not shown).
[0086]
FIG. 7 is a flowchart showing an example of the detailed procedure of template adaptation (S14)
shown in FIG. 6. The CPU 11 calculates the distance Di between the spectrum fragment Pi and the
template Tg (S20), and stores the calculated distance Di in the hard disk 13. In the first
iteration, the distance Di is calculated after quantization is performed. The CPU 11 selects the
spectrum fragments Ps having a small calculated distance Di (S22), and updates the template
(S24) using the median of the selected spectrum fragments. If the amount of change in the
template before and after updating becomes equal to or less than a predetermined value, that is,
if the adaptation has converged (S26: YES), the CPU 11 ends the template adaptation processing.
If the adaptation has not converged (S26: NO), the same process (S20, S22, S24) is repeated.
[0087]
FIG. 8 is a flowchart showing an example of the detailed procedure of template matching (S16)
shown in FIG. 6. The CPU 11 corrects the spectral fragment Pi to fit the template (S30), and
stores the corrected spectral fragment Pi′ in the hard disk 13. The CPU 11 obtains the change
amount (correction value Δi) of the spectrum fragment before and after correction, stores it in
the RAM 12, and compares it with the threshold Θ stored in advance in the hard disk 13 (S32).
If the result of the comparison is YES (S32: YES), the template matching process is ended. If
the correction value Δi is smaller than the threshold Θ (S32: NO), the CPU 11 calculates the
distance Γi between the template and the corrected spectrum fragment (S34), and stores the
calculated distance Γi in the hard disk 13. The CPU 11 compares the calculated distance Γi with
the threshold value θ stored in advance in the hard disk 13. If the distance Γi is equal to or
larger than the threshold value θ (S36: YES), the template matching process is ended. If the
distance Γi is smaller than the threshold θ (S36: NO), the CPU 11 determines the pronunciation
time candidate oi as the pronunciation time (S38), and stores the determined pronunciation time
in the hard disk 13.
[0088]
FIG. 9 is a flowchart showing an example of the detailed procedure of the spectrum fragment
correction (S30) shown in FIG. 8. The CPU 11 calculates the power difference ηi between the
template TA and the spectrum fragment Pi at the feature frequencies of each time (frame) (S40)
and stores it in the RAM 12 or the hard disk 13. Based on the calculated power difference ηi at
the feature frequencies, the CPU 11 calculates the power difference δi at each time (S42) and
stores it in the RAM 12 or the hard disk 13. The CPU 11 compares the power difference δi at each
time with the threshold Ψ previously stored in the hard disk 13, counts the number of frames in
which the power difference δi is equal to or more than the threshold Ψ, and stores the count in
the RAM 12 or the hard disk 13. The CPU 11 compares this number of frames with the threshold R
stored in advance in the hard disk 13 (S44), and if the number of frames is equal to or less
than the threshold R (S44: YES), the correction process ends. If the number of frames is larger
than the threshold R (S44: NO), the CPU 11 integrates the power difference δi at each time to
calculate the power difference (correction value Δi) (S46), and stores it in the hard disk 13.
The CPU 11 compares the calculated power difference Δi with the threshold Θ stored in advance in
the hard disk 13. If the power difference Δi is less than or equal to the threshold Θ (S48:
YES), the correction process of the spectral fragment Pi ends. If the power difference Δi is
larger than the threshold Θ (S48: NO), the CPU 11 subtracts the power difference Δi from the
spectrum fragment Pi (S50) to obtain the corrected spectrum fragment Pi′, and stores the
corrected spectrum fragment Pi′ in the hard disk 13.
[0089]
In the embodiment described above, a computer has been described as an example of the acoustic
signal processing apparatus, but the acoustic signal processing apparatus is not limited to a
computer. The present invention can be applied to any device that outputs an acoustic signal,
such as recording devices, electronic musical instruments, audio devices, portable audio
devices, and mobile phones.
[0090]
FIG. 10 is a block diagram showing a configuration example of an audio apparatus (sound signal
processing apparatus) according to the present invention.
The audio device 30 includes an operation unit 35 for receiving various operations such as a
reproduction operation; a display unit 36 such as a liquid crystal panel for displaying an
operation state such as "during playback"; a reproduction unit 34 for reading data from a
recording medium, such as an MD (Mini Disc) or other disk or a flash memory, and reproducing an
acoustic signal from the read data; an output unit 37 for outputting the acoustic signal
reproduced by the reproduction unit 34 to a headphone or a speaker; a control unit (CPU) 31
that controls each component such as the reproduction unit 34 and the output unit 37; and a
RAM 32 and a flash memory 33 connected to the control unit 31. The control unit 31 controls
each component such as the reproduction unit 34 and the output unit 37 according to the
operation received from the operation unit 35, and causes the output unit 37 to output an
acoustic signal.
[0091]
The control unit 31 operates as means for extracting a predetermined sound component of an
inharmonic structure, such as drum sound, included in the sound signal, and as means for
increasing or decreasing the extracted predetermined sound component. Further, the control unit
31 operates as means for calculating the spectrum of the acoustic signal by frequency analysis,
and extracts a spectrum corresponding to a predetermined sound component of the nonharmonic
structure. The extraction of the predetermined sound component of the inharmonic structure is
performed with reference to the sound component of the template stored in advance in the flash
memory 33 (storage unit), and the control unit 31 operates as means for correcting the sound
component of the template such that the difference between the extracted sound component and
the sound component of the template is equal to or less than a predetermined value. More
specifically, when there are a plurality of extracted sound components, the control unit 31
operates as means for obtaining the difference between each of the extracted sound components
and the sound component of the template, as means for selecting a predetermined number of sound
components in ascending order of the obtained difference, and as means for updating the sound
component of the template to the median value of the predetermined number of selected sound
components, thereby correcting the sound component of the template.
[0092]
Further, at the time of the first correction of the sound component of the template, the
control unit 31 operates as means for quantizing the extracted sound components and the sound
component of the template, and the difference between each quantized extracted sound component
and the quantized sound component of the template is determined. In addition, the operation
unit 35 operates as means for receiving the increase / decrease amount of the predetermined
sound component, and the control unit 31 increases / decreases the extracted predetermined sound
component according to the received increase / decrease amount. The operation unit 35 includes,
for example, a volume control for a bass drum in addition to the volume control for the entire
sound signal.
[0093]
The audio device 30 shown in FIG. 10 extracts and increases or decreases a predetermined sound
component of an inharmonic structure such as a drum sound according to the present invention,
similarly to the computer shown in FIG. 1. For example, the control unit 31, the RAM 32, the
flash memory 33, the reproduction unit 34, the operation unit 35, the display unit 36, and the
output unit 37 of the audio device 30 correspond respectively to the CPU 11, the RAM 12, the
hard disk 13, the external storage unit 14, the input unit 15, the display unit 16, and the
sound card (not shown) of the computer 10, and perform extraction and increase / decrease of
drum sounds and the like according to the present invention in the same way.
[0094]
In the example of FIG. 10, the control unit (CPU) 31 extracts and increases and decreases drum
sounds and the like according to the present invention, but it is also possible to provide a
dedicated LSI for this purpose, so that extraction and increase / decrease of a predetermined
sound component of inharmonic structure such as drum sound are performed not by the control
unit 31 but by the dedicated LSI. Also, the present invention can be applied to any audio
device, for example one in which a communication unit for communicating with the outside is
added to the audio device 30, or one in which the reproduction unit 34 is able to record in
addition to reproducing. Further, the present invention can be applied to an acoustic signal
processing unit of any device that handles an acoustic signal, such as the acoustic signal
processing unit of a mobile phone.
[0095]
In the embodiment described above, extraction and increase / decrease of the drum sound are
described as an example of the non-harmonic structure sound, but the present invention is not
limited to drum sounds. It is also possible to extract and increase or decrease other sounds of
nonharmonic structure, including nonharmonic structure sounds output from other sound sources.
Further, it is also possible to extract and increase or decrease bass drum sounds or snare drum
sounds from among the drum sounds.
[0096]
Further, the acoustic signal to be processed according to the present invention may include a
voice signal. For example, a predetermined sound component of the inharmonic structure can be
extracted from an acoustic signal of music including vocals, and the extracted sound component
can be increased or decreased. It is likewise possible to extract a predetermined sound
component of the inharmonic structure from an acoustic signal containing a voice for speech
recognition, and to increase or decrease the extracted sound component. Therefore, in the speech
recognition process, it is possible to extract and reduce a predetermined sound component of the
inharmonic structure included in the speech data. The sound component of the inharmonic
structure included in such a signal is often a noise component, and this noise component can be
extracted and reduced to cancel it. This can improve the accuracy of speech recognition.
[0097]
In the above description, the power spectrum around the sounding time is increased or decreased
immediately after the sounding time is determined (S16, S18 in FIG. 6), but it is also possible
to perform the determination of the sounding time and the increase or decrease of the power
spectrum around the sounding time as separate processes. For example, after the sounding time of
the drum in the sound signal is determined, the sound signal (sound data), the sounding time
(sounding position data), and the post-adaptation template can be sent to another computer or
audio device via a recording medium or a network, and the power spectrum around the sounding
time can be increased or decreased on that other computer or audio device. For example, the
communication unit (output means) 17 of the computer (first acoustic signal processing
apparatus) shown in FIG. 1 can transmit the acoustic signal, the sounding time, and the
post-adaptation template, or the external storage unit (output means) 14 can write them on a
recording medium. Also, for example, the reproduction unit (reception means) 34 of the audio
apparatus (second acoustic signal processing apparatus) shown in FIG. 10 can read out the
acoustic signal, the sounding time, and the post-adaptation template from the recording medium,
and the control unit 31 can increase or decrease the power spectrum corresponding to the
post-adaptation template at the sounding time of the acoustic signal. Similarly, the
communication unit (accepting means) 17 of the computer (second acoustic signal processing
apparatus) shown in FIG. 1 can receive the acoustic signal, the sounding time, and the
post-adaptation template, or the external storage unit (accepting means) 14 can read them out
from the recording medium, and the CPU 11 can then increase or decrease the power spectrum
corresponding to the post-adaptation template at the sounding time of the acoustic signal.
Moreover, it is also possible to perform template adaptation (template correction) separately
on another acoustic signal processing apparatus such as another computer.
[0098]
FIG. 1 is a block diagram showing a configuration example of a computer (acoustic signal
processing apparatus) according to the present invention.
FIG. 2 is a diagram showing an example of F (f).
FIG. 3 is a diagram showing an example of the distance between the template Tg and the spectrum
fragment Pi.
FIG. 4 is a diagram showing an example of determination as to whether or not a spectrum is
included.
FIG. 5 is a diagram showing an example of increase and decrease of the drum sound at the
sounding time.
FIG. 6 is a flowchart showing an example of the procedure for increasing and decreasing the drum
sound when template adaptation is performed.
FIG. 7 is a flowchart showing an example of the detailed procedure of template adaptation (S14)
shown in FIG. 6.
FIG. 8 is a flowchart showing an example of the detailed procedure of template matching (S16)
shown in FIG. 6.
FIG. 9 is a flowchart showing an example of the detailed procedure of spectrum fragment
correction (S30) shown in FIG. 8.
FIG. 10 is a block diagram showing a configuration example of an audio device (acoustic signal
processing apparatus) according to the present invention.
Explanation of symbols
[0099]
10 Computer
11 CPU
12, 32 RAM
13 Hard disk
14 External storage unit
15 Input unit
16 Display unit
17 Communication unit
19 Recording medium
20 Communication network
30 Audio device
31 Control unit (CPU)
33 Flash memory
34 Reproduction unit
35 Operation unit
36 Display unit
37 Output unit