Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2014093577
Abstract: The present invention provides a technique for easily setting audio processing parameters suited to the situation. A plurality of audio processing parameters usable by each of a plurality of terminal devices are defined in advance. The input unit 32 receives, from the server device, a recommendation of one of the audio processing parameters. The audio processing unit 36 executes audio processing with the parameter corresponding to the received recommendation set. The reception unit 40 receives the user's evaluation of the audio processing, and the output unit 38 reports the received evaluation to the server device. The reported evaluation indicates at least "good" or "bad". When the reported evaluation indicates "bad", the input unit 32 newly receives a recommendation of another audio processing parameter from the server device, and the audio processing unit 36 uses the other audio processing parameter corresponding to the newly received recommendation. [Selected figure] Figure 2
Terminal device, server device, audio processing method, setting method, and audio processing system
[0001]
The present invention relates to audio processing technology, and more particularly to a terminal device, a server device, an audio processing method, a setting method, and an audio processing system for setting parameters for audio processing.
[0002]
Content such as movies, dramas, sports broadcasts, and music is distributed unidirectionally by content providers, and the preference information of individual viewers is generally not taken into consideration. However, it is desirable that viewer preferences for the content be reflected. To address this, preference data including operation information at the time of content reproduction is transferred to a server via a network; the server statistically processes the preference data to generate added-value information for each viewer attribute, and the content is then reproduced with the added-value information applied (see, for example, Patent Document 1).
[0003]
Japanese Unexamined Patent Application Publication No. 2002-232823
[0004]
When music or the like is played back, audio processing is performed. Executing audio processing requires setting parameters such as equalizer tap coefficients (hereinafter, "audio processing parameters"). It is desirable that the listening conditions and the user's preferences be reflected in the audio processing parameters. On the other hand, since setting audio processing parameters is generally not easy, simplification of the setting is also desired.
[0005]
The present invention has been made in view of such circumstances, and an object thereof is to provide a technique for easily setting audio processing parameters suited to the situation.
[0006]
In order to solve the above problems, a terminal device according to one aspect of the present invention comprises: an input unit that receives, from a server device, a recommendation of one of a plurality of audio processing parameters defined as usable by each of a plurality of terminal devices; an audio processing unit that executes audio processing by setting the audio processing parameter corresponding to the recommendation received by the input unit; a reception unit that receives the user's evaluation of the audio processing executed by the audio processing unit; and an output unit that reports the evaluation received by the reception unit to the server device. The evaluation reported from the output unit indicates at least "good" or "bad". When the reported evaluation indicates "bad", the input unit newly receives a recommendation of another audio processing parameter from the server device, and the audio processing unit uses the other audio processing parameter corresponding to the newly received recommendation.
[0007]
Another aspect of the present invention is a server device. In this device, a plurality of audio processing parameters usable by each of a plurality of terminal devices are defined, and the device comprises: an analysis unit that selects one of the audio processing parameters to be used by one terminal device; an output unit that recommends the selected audio processing parameter to that terminal device; and an input unit that receives the user's evaluation of the audio processing from the terminal device that has executed audio processing using the parameter corresponding to the recommendation by the output unit. The evaluation receivable by the input unit indicates at least "good" or "bad". When the evaluation received by the input unit indicates "bad", the analysis unit newly selects another audio processing parameter, and the output unit recommends the newly selected parameter to the terminal device.
[0008]
Yet another aspect of the present invention is an audio processing method. In this method, a plurality of audio processing parameters usable by each of a plurality of terminal devices are defined, and the method comprises the steps of: receiving, from a server device, a recommendation of one of the audio processing parameters; executing audio processing by setting the audio processing parameter corresponding to the received recommendation; receiving the user's evaluation of the audio processing; and reporting the received evaluation to the server device. The evaluation reported in the reporting step indicates at least "good" or "bad". When the reported evaluation indicates "bad", the receiving step newly receives a recommendation of another audio processing parameter from the server device, and the step of executing audio processing uses the other audio processing parameter corresponding to the newly received recommendation.
[0009]
Yet another aspect of the present invention is a setting method. In this method, a plurality of audio processing parameters usable by each of a plurality of terminal devices are defined, and the method comprises the steps of: selecting one of the audio processing parameters to be used by one terminal device; recommending the selected audio processing parameter to that terminal device; and receiving the user's evaluation of the audio processing from the terminal device that has executed audio processing using the parameter corresponding to the recommendation. The evaluation receivable in the receiving step indicates at least "good" or "bad". When the received evaluation indicates "bad", the selecting step newly selects another audio processing parameter, and the newly selected parameter is recommended to the terminal device.
[0010]
Yet another aspect of the present invention is an audio processing system. This audio processing system comprises: a server device in which a plurality of audio processing parameters usable by each of a plurality of terminal devices are defined, and which selects one of the audio processing parameters and recommends the selected parameter; and a terminal device that executes audio processing using the audio processing parameter corresponding to the recommendation by the server device and, on receiving the user's evaluation of the audio processing, reports the evaluation to the server device. The evaluation reported from the terminal device indicates at least "good" or "bad". When the received evaluation indicates "bad", the server device newly selects another audio processing parameter and recommends the newly selected parameter to the terminal device.
[0011]
Note that arbitrary combinations of the above-described components, as well as conversions of the expression of the present invention among methods, apparatuses, systems, recording media, computer programs, and the like, are also effective as aspects of the present invention.
[0012]
According to the present invention, audio processing parameters suited to the situation can be set easily.
[0013]
FIG. 1 is a diagram showing the configuration of an audio processing system according to Embodiment 1 of the present invention.
FIG. 2 is a diagram showing the configuration of the terminal device of FIG. 1.
FIG. 3 is a diagram showing the data structure of the table stored in the audio processing parameter storage unit of FIG. 2.
FIG. 4 is a diagram showing the configuration of the server device of FIG. 1.
FIG. 5 is a diagram showing the data structure of the terminal information database of FIG. 4.
FIG. 6 is a diagram showing another data structure of the terminal information database of FIG. 4.
FIG. 7 is a sequence diagram showing the reproduction procedure of the audio processing system of FIG. 1.
FIG. 8 is a diagram showing the configuration of a terminal device according to Embodiment 2 of the present invention.
FIG. 9 is a diagram showing the configuration of a vehicle according to Embodiment 3 of the present invention.
FIG. 10 is a diagram showing the configuration of a terminal device according to Embodiment 3 of the present invention.
FIG. 11 is a diagram showing the configuration of a server device according to Embodiment 3 of the present invention.
FIG. 12 is a sequence diagram showing the reproduction procedure of an audio processing system according to Embodiment 3 of the present invention.
[0014]
(Embodiment 1) Before describing Embodiment 1 of the present invention in detail, the knowledge on which it is based will be outlined. Embodiment 1 relates to an audio processing system that performs audio tuning for an audio reproduction device installed in a vehicle such as a car. Audio tuning is the optimization of audio playback and recording according to the surrounding environment; it is also called audio calibration, equalization, compensation, correction, or optimization. By tuning, audio that is more natural, more realistic, better balanced, noise-reduced, or matched to the user's preference is reproduced. Such tuning is generally performed at the factory. It may also be done at a store that sells reproduction devices, such as a car audio shop. However, audio tuning is usually not done at the user's home, as it is difficult for a general user.
[0015]
Audio tuning is generally performed in the following steps. (1) Hardware configuration/calibration: the speakers, amplifier, and cabling are changed, for example by adding a speaker or changing a speaker's position. (2) Crossover: a specific frequency band is assigned to each speaker. (3) Gain setting: each channel level of the amplifier is adjusted to prevent clipping and keep the channels balanced. (4) Time alignment: the delay of the sound from each speaker is adjusted to correct phase mismatch. (5) Equalization: the frequency response is controlled to adjust the tone. Equalization is mainly classified into two types. The first is equalization for compensation (hereinafter, "compensation equalization"). Owing to device or environmental influences, the audio changes before reaching the user; compensation equalization manipulates the frequency response to reduce those influences. The second is artificial equalization. Even with such compensation, the audio may not match the user's preference; artificial equalization adjusts the frequency response to match the audio to the user's preference.
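As a minimal sketch of step (5): an equalizer of the kind used throughout this description is parameterized by tap coefficients, so applying such an "audio processing parameter" to one block of samples can be written as an FIR filter. The coefficient values and block size below are purely illustrative assumptions, not values given in this description.

import numpy as np

def apply_equalizer(samples: np.ndarray, taps: np.ndarray) -> np.ndarray:
    # The tap coefficients play the role of an audio processing parameter:
    # swapping the taps changes the frequency response, i.e. the tone.
    return np.convolve(samples, taps, mode="same")

# Illustrative values only: a smoothing tap set and one block of PCM samples.
taps = np.array([0.05, 0.2, 0.5, 0.2, 0.05])
block = np.random.randn(1024)
processed = apply_equalizer(block, taps)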
[0016]
The problems relating to audio tuning are, for example, as follows. The first problem is the difficulty of the tuning itself. Audio tuning requires advanced skills and equipment: a high level of skill is needed to check balance, tone, delay, and noise levels, and to reproduce audio that satisfies the user. The equipment includes an impulse generator, microphones, a dummy-head recording device with microphones at the binaural positions, and a real-time audio analyzer. The second problem is that the characteristics of the environment are not learned: tuning is performed only at the factory before shipping, and is rarely performed afterwards. The third problem is that user characteristics such as race, nationality, age, and gender are not distinguished. For example, elderly users are less sensitive to high frequencies, so high-frequency amplification is required, whereas young users tend to prefer emphasized bass. In regions such as India, where drivers sound their horns loudly on the road, compensating the affected frequency range is desirable. In VIP vehicles, the tuning should be optimized for the VIP seat.
[0017]
Next, an outline of this embodiment will be described. In this embodiment, the reproduction device and the terminal device are mounted in the vehicle, and the server device is installed outside the vehicle; the terminal device and the server device are connected over a wireless link. The server device receives the user profile and the like from the terminal device, analyzes the user's preference based on them, and selects an audio processing parameter. A plurality of audio processing parameters are prepared in advance by specialists. The server device recommends the use of the selected parameter to the terminal device. The terminal device executes audio processing using the recommended parameter, and the reproduction device reproduces the result. The user who listens to it inputs "good" or "bad" to the terminal device as an evaluation of the audio processing, and the terminal device reports the evaluation to the server device. If the evaluation is "bad", the server device selects another audio processing parameter and re-recommends it to the terminal device.
[0018]
Here, to analyze preference tendencies, the server device uses the user profile, the evaluations, and data on the audio content. The user profile is acquired from the terminal device at an initial stage and indicates, for example, the user's nationality, race, age, and vehicle type. The evaluations reflect the past results and history of the audio processing parameters. The audio content data includes genre, artist, and title. The analysis may be done fully automatically by software or partly manually. Collaborative filtering, content-based filtering, Bayesian networks, and the like are applied to the trend analysis.
[0019]
The recommended audio processing parameters are held in a ranked list. If the user evaluates the top-ranked parameter as "bad", the evaluation is returned to the server device, which then recommends the next parameter in the ranking to the terminal device. The audio processing system repeats this process until the user obtains an audio processing parameter that he or she likes. The returned ratings are used to learn user preferences, and the database is updated.
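The retry behavior described here can be sketched as a short loop over the server's ranked candidates. In this sketch, the ranked list of identification information and the evaluation callback are assumed stand-ins; the description does not prescribe an implementation.

def recommend_until_good(ranked_ids, evaluate):
    # Walk the ranking; stop at the first parameter the user rates "good".
    # The (id, rating) history is what the server would learn preferences from.
    history = []
    for param_id in ranked_ids:
        rating = evaluate(param_id)  # returns "good" or "bad"
        history.append((param_id, rating))
        if rating == "good":
            return param_id, history
    return None, history

# Usage with a canned evaluator; IDs such as "7-1" follow the examples given later.
chosen, history = recommend_until_good(
    ["7-1", "9-5", "3-2"],
    lambda pid: "good" if pid == "9-5" else "bad",
)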
[0020]
For the first problem, highly skilled experts generate a plurality of audio processing parameters for various environments and preferences, and these parameters are shared by many users. In particular, a group is formed from users with similar preference tendencies, and audio processing parameters are shared within the group. As a result, audio processing parameters suited to the environment or preference can be provided even to users without advanced skills. To keep the user's part simple, the user only inputs "good" or "bad"; the audio processing parameters are thus set easily and quickly. For the second problem, audio processing parameters are added over time by highly skilled experts, so audio tuning continues even after shipment. For the third problem, forming the above-described groups means that audio processing parameters are set according to the group's preference tendency, which simplifies settings that distinguish user characteristics.
[0021]
FIG. 1 shows the configuration of an audio processing system 100 according to Embodiment 1 of the present invention. The audio processing system 100 includes a server device 10, a network 12, a base station device 14, a terminal device 20, and a reproduction device 18. The terminal device 20 and the reproduction device 18 are mounted on a vehicle 16. Although one reproduction device 18 and one terminal device 20 are shown to keep the drawing clear, there may be a plurality of each. The vehicle 16 is, for example, a car. The reproduction device 18 is mounted on the vehicle 16 and reproduces audio data. In the following, "audio" and "audio data" are used interchangeably, and both include music. The reproduction device 18 is, for example, a car audio system or a navigation device. The reproduction device 18 and the terminal device 20 are connected by a cable or the like, and the audio data reproduced by the reproduction device 18 is acquired from the terminal device 20.
[0022]
The terminal device 20 transmits information on the conditions of the audio processing (hereinafter, "condition information") to the base station device 14. The condition information includes, for example, the user profile of the user who listens to the audio in the vehicle 16. The condition information is ultimately delivered to the server device 10. The terminal device 20 receives a recommendation of an audio processing parameter from the server device 10 via the base station device 14, executes audio processing using the parameter corresponding to the recommendation, and causes the reproduction device 18 to reproduce the result. When the user's evaluation of the audio processing is received, the terminal device 20 reports the evaluation to the server device 10. As mentioned above, the evaluation indicates at least "good" or "bad". If the evaluation is "bad", the terminal device 20 receives a recommendation of another audio processing parameter from the server device 10.
[0023]
The base station device 14 is connected at one end to the server device 10 via the network 12, and at the other end to the terminal device 20 via a wireless link. The base station device 14 corresponds to, for example, a base station of a mobile phone system. The server device 10 receives the condition information from the terminal device 20 and analyzes it to select one of the audio processing parameters. Here, a plurality of audio processing parameters usable by each of the plurality of terminal devices 20 are defined. The server device 10 recommends the selected parameter to the terminal device 20. After the recommendation, the server device 10 receives an evaluation from the terminal device 20. If the received evaluation indicates "bad", the server device 10 newly selects another audio processing parameter and recommends the newly selected parameter to the terminal device 20.
[0024]
FIG. 2 shows the configuration of the terminal device 20. The terminal device 20 includes a communication unit 30, an input unit 32, an audio processing parameter storage unit 34, an audio processing unit 36, an output unit 38, a reception unit 40, and a profile storage unit 42. The communication unit 30 communicates with the server device 10 by wireless communication with the base station device 14 (not shown). Known techniques may be used for the wireless communication, for example a mobile phone communication system, a wireless LAN (Local Area Network) system, or a MAN (Metropolitan Area Network) system. As the mobile phone communication system, a third-generation system or LTE (Long Term Evolution) may be used.
[0025]
The output unit 38 acquires the condition information, generates a user information stream in which the condition information is stored, and outputs it to the communication unit 30, which transmits the stream. One item of the condition information is the user profile, which includes the name, age, gender, race, nationality, address, vehicle type, size of the cabin space, and the like; this information is stored in advance in the profile storage unit 42. Another item of the condition information is the song title, artist, and genre of the audio to be reproduced; this information is extracted from the audio input from the outside. The audio input from the outside is stored on a disc such as a CD (Compact Disc), a DVD (Digital Versatile Disc), or a Blu-ray disc.
[0026]
The input unit 32 receives an analysis data stream from the server device 10 via the communication unit 30. This corresponds to receiving a recommendation of one of the audio processing parameters from the server device 10. The recommendation is indicated by identification information identifying the audio processing parameter, so the input unit 32 acquires the identification information and outputs it to the audio processing parameter storage unit 34.
[0027]
The audio processing parameter storage unit 34 stores a plurality of audio processing parameters. An example of an audio processing parameter is a set of tap coefficients for the filter forming the equalizer. FIG. 3 shows the data structure of the table stored in the audio processing parameter storage unit 34. As shown, it contains an identification information field 200 and an audio processing parameter field 202; that is, each audio processing parameter is associated with identification information. The audio processing parameters may be classified according to the user profile, genre, and the like. The audio processing parameter storage unit 34 may also store audio processing parameters corresponding to basic genres, to be used at the initial stage of processing; such parameters are set at the factory. Returning to FIG. 2.
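A minimal in-memory stand-in for the table of FIG. 3 might look like the following; the identification information values and tap coefficients are assumptions made for illustration.

# Identification information field 200 -> audio processing parameter field 202.
parameter_table = {
    "1-1": [0.05, 0.2, 0.5, 0.2, 0.05],   # e.g. a factory default for a basic genre
    "7-1": [0.10, 0.3, 0.2, 0.3, 0.10],
    "9-5": [0.02, 0.1, 0.76, 0.1, 0.02],
}

def look_up_parameter(identification_info):
    # What the audio processing unit 36 does with an ID received by the
    # input unit 32: fetch the associated tap coefficients.
    return parameter_table[identification_info]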
[0028]
The audio processing unit 36 receives from the audio processing parameter storage unit 34 the audio processing parameter corresponding to the identification information received by the input unit 32. That is, from the storage unit, in which a plurality of parameters associated with identification information are stored, the audio processing unit 36 selects the parameter corresponding to the received identification information. The audio processing unit 36 sets the parameter, executes audio processing on audio input from the outside, and outputs the result to the reproduction device 18. The reproduction device 18 receives the result of the audio processing from the audio processing unit 36, reproduces it, and outputs the reproduced audio from the speakers.
[0029]
The reception unit 40 receives information input by the user from an interface (not shown). The interface consists of buttons or the like provided on the dashboard or the steering wheel. If the reception unit 40 has a speech recognition function, the interface may be a microphone; if it has an image recognition function, the interface may be a camera. The information input by the user is the user's evaluation of the audio processing executed by the audio processing unit 36. As mentioned above, the evaluation is indicated by "good" or "bad"; separate buttons corresponding to "good" and "bad" may be provided. The reception unit 40 outputs the evaluation to the output unit 38.
[0030]
When the output unit 38 receives an evaluation from the reception unit 40, it generates a user information stream in which the evaluation is stored and outputs it to the communication unit 30, which transmits the stream. That is, the output unit 38 reports the evaluation received by the reception unit 40 to the server device 10. After this, if the evaluation indicated "bad", the input unit 32 receives from the server device 10 an analysis data stream including identification information identifying another audio processing parameter. This corresponds to newly receiving a recommendation of another audio processing parameter. The audio processing unit 36 then uses the other audio processing parameter corresponding to the newly received identification information.
[0031]
In terms of hardware, the functional blocks described here can be realized by the CPU, memory, and other LSIs of an arbitrary computer; in terms of software, they are realized by a program loaded into memory and the like. The drawings depict functional blocks realized by their cooperation. Those skilled in the art will therefore understand that these functional blocks can be realized in various forms: by hardware alone, or by a combination of hardware and software.
[0032]
FIG. 4 shows the configuration of the server device 10. The server device 10 includes a communication unit 50, an input unit 52, a terminal information database 54, an analysis unit 56, an analysis result database 58, and an output unit 60. The communication unit 50 communicates with the terminal device 20 through its connection to the network 12 and the base station device 14, and can communicate with a plurality of terminal devices 20. The communication unit 50 receives the user information stream from the terminal device 20. The input unit 52 extracts the condition information from the user information stream and stores the condition information corresponding to each of the plurality of terminal devices 20 in the terminal information database 54.
[0033]
The terminal information database 54 stores the condition information and the like for each of the plurality of terminal devices 20. It is configured as a large-scale relational database or a NoSQL (Not Only SQL) database. Terminal devices 20 whose condition information has many items in common are grouped together, and the terminal information database 54 is organized by these groups. FIG. 5 shows the data structure of the terminal information database 54. As shown, it includes a group column 210 and a terminal device column 212. The terminal device column 212 stores the condition information and the like for each terminal device 20, and the condition information for two or more terminal devices 20 is grouped as shown in the group column 210.
[0034]
FIG. 6 shows another data structure of the terminal information database 54, corresponding to the condition information for one terminal device 20 stored in the terminal device column 212 of FIG. 5. As shown, it includes an item column 220 and a content column 222. The user profile comprises age, gender, race, nationality, vehicle type, and size. The engine conditions include long-distance driving and speeding. The environmental conditions include silence and noise. The position indicates the driver's seat or a rear seat. The music information comprises artist, genre, album name, and song name. The music excerpt is a portion of the audio actually reproduced by the reproduction device 18. Feedback corresponds to the evaluation described above. The user profile generally changes less often than the music information, so only the music information may be updated as appropriate by notification from the terminal device 20. Returning to FIG. 4.
[0035]
The analysis unit 56 analyzes the terminal information database 54 to extract the information needed to continually improve and personalize the audio processing parameters. The analysis may be done partly manually by experts or fully automatically. When done automatically, the analysis unit 56 uses the song title or an excerpt of the audio to analyze the features of the audio content. The analysis unit 56 applies techniques from recommendation, data mining, machine learning, pattern recognition, and statistics. For example, distance measures, similarity measures, sampling techniques, and dimensionality reduction are used for data processing. For classification, k-nearest neighbors, decision trees, rule-based methods, Bayesian networks, naive Bayes, artificial neural networks, and support vector machines are used. For clustering, k-means, density-based, message-passing, and hierarchical methods are used. For recommendation, collaborative filtering and content-based filtering are used. Known techniques may be used for all of these, so their description is omitted here.
[0036]
Here, two specific examples of the analysis will be described. In the first example, the first terminal device 20a, the sixth terminal device 20f, and the tenth terminal device 20j each select the audio processing parameter with identification information 7-1 for songs A, G, and Y, so they are placed in one group. When the first terminal device 20a and the sixth terminal device 20f then select the audio processing parameter with identification information 9-5 for song X, the analysis unit 56 selects identification information 9-5 as the audio processing parameter for song X for the tenth terminal device 20j as well. In the second example, when users of a particular nationality tend to prefer an emphasis on bass, an audio processing parameter reflecting that tendency is selected, and a similar parameter is selected for the terminal device 20 of a new user of the same nationality.
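The first example is essentially one step of collaborative filtering. A minimal sketch follows, with the data hard-coded to mirror the scenario above and the group membership assumed to have been established already.

from collections import Counter

# (terminal, song) -> chosen identification information, mirroring the example.
choices = {
    ("20a", "A"): "7-1", ("20f", "G"): "7-1", ("20j", "Y"): "7-1",
    ("20a", "X"): "9-5", ("20f", "X"): "9-5",
}

def recommend_for(target, song, group):
    # Recommend the parameter most often chosen for this song by the
    # other members of the target terminal's preference group.
    votes = Counter(
        choices[(peer, song)]
        for peer in group
        if peer != target and (peer, song) in choices
    )
    return votes.most_common(1)[0][0] if votes else None

# Terminals 20a, 20f, and 20j form one group; for song X, 20j is recommended "9-5".
assert recommend_for("20j", "X", ["20a", "20f", "20j"]) == "9-5"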
[0037]
As described above, the analysis unit 56 selects the audio processing parameter to be used by one terminal device 20 by referring to the data stored in the terminal information database 54, that is, the evaluations and the condition information. The analysis reflects the evaluations and audio processing conditions from two or more terminal devices 20 in the same group. The correspondence between the content of the data and the audio processing parameter to be selected is stored in the terminal information database 54 in advance; this correspondence is determined by experiments, simulations, and the like. Using the correspondence, the analysis unit 56 determines the audio processing parameter from the data and outputs the identification information corresponding to the selected parameter to the analysis result database 58.
[0038]
The analysis result database 58 stores the identification information selected by the analysis unit 56, and may also store the analysis results of the analysis unit 56. The analysis results are used to generate market information covering user behavior, preferences, and trends, which is output to the outside. The output unit 60 generates an analysis data stream storing the identification information selected by the analysis unit 56 and outputs it to the communication unit 50. The communication unit 50 transmits the analysis data stream to the terminal device 20 via the network 12 and the base station device 14. This corresponds to recommending the audio processing parameter selected by the analysis unit 56 to the terminal device 20.
[0039]
After the above processing, the input unit 52 receives, via the communication unit 50, the user's evaluation of the audio processing from the terminal device 20 that has executed audio processing using the parameter corresponding to the recommendation. The evaluation is also carried in the user information stream. The input unit 52 stores the evaluation in the terminal information database 54. When the evaluation received by the input unit 52 indicates "bad", the analysis unit 56 newly selects another audio processing parameter, using a process similar to that described above; the alternative parameter is determined in advance by experiments and simulations. The output unit 60 generates an analysis data stream storing the identification information corresponding to the newly selected parameter and outputs it to the communication unit 50.
[0040]
In the processing so far, a plurality of audio processing parameters are defined in advance, and the server device 10 selects one of them according to the information from the terminal device 20. Audio processing parameters may also be added after processing has started. The server device 10 receives new audio processing parameters derived by experts. The terminal information database 54 stores each new audio processing parameter together with the identification information associated with it, and also stores correspondences that include the new parameters. Since a new audio processing parameter is not yet stored in the terminal device 20, when the analysis unit 56 selects one, the output unit 60 generates an analysis data stream that carries the new parameter itself together with its identification information. The terminal device 20 of FIG. 2 stores the new audio processing parameter and identification information in the audio processing parameter storage unit 34, and the audio processing unit 36 sets the new parameter and executes audio processing.
[0041]
The operation of the audio processing system 100 with the above configuration will now be described. FIG. 7 is a sequence diagram showing the reproduction procedure of the audio processing system 100. The terminal device 20 reports the condition information to the server device 10 (S10). The server device 10 selects an audio processing parameter (S12) and notifies the terminal device 20 of the corresponding identification information (S14). The terminal device 20 sets the audio processing parameter (S16) and executes audio processing (S18). The terminal device 20 receives an evaluation (S20) and reports it to the server device 10 (S22). The server device 10 selects another audio processing parameter (S24) and notifies the terminal device 20 of the corresponding identification information (S26). The terminal device 20 sets the other audio processing parameter (S28) and executes audio processing (S30).
[0042]
According to this embodiment, since one of a plurality of audio processing parameters is selected, even a user with little audio tuning skill can easily set audio processing parameters suited to the situation. Even if the user does not like the parameter that was set, the next parameter is set automatically, which keeps the user's part simple. Since the audio processing parameters are prepared by experts, audio reproduction suited to the situation is enabled, and because these expert parameters are shared, highly accurate parameters can be used easily. Since parameters can be added, the parameter database can be kept up to date, and since the parameters used for audio processing are not fixed, they can be updated to suit the situation. Further, since parameters are shared within groups generated from the condition information, the number of condition information samples available for determining a parameter increases, and with more samples the accuracy improves.
[0043]
(Embodiment 2) Like Embodiment 1, Embodiment 2 of the present invention relates to an audio processing system that performs audio tuning for an audio reproduction device. In Embodiment 2, in addition to the processing of Embodiment 1, an audio processing parameter suited to the traveling situation of the vehicle is selected. For this purpose, the terminal device is connected to a positioning device such as a GPS (Global Positioning System) receiver and acquires position information from it. The terminal device reports the position information to the server device. The server device stores identification information of audio processing parameters in association with position information, and selects an audio processing parameter based on the received position information. The audio processing system 100 and the server device 10 according to Embodiment 2 are of the same type as those in FIGS. 1 and 4; the description here focuses on the differences.
[0044]
FIG. 8 shows the configuration of the terminal device 20 according to Embodiment 2 of the present invention. The terminal device 20 includes a communication unit 30, an input unit 32, an audio processing parameter storage unit 34, an audio processing unit 36, an output unit 38, a first reception unit 46, a second reception unit 48, and a profile storage unit 42. The terminal device 20 is also connected to a positioning device 44. The first reception unit 46 corresponds to the reception unit 40 of FIG. 2. The positioning device 44 acquires position information through its GPS reception function and outputs the position information to the second reception unit 48. The positioning device 44 may instead be a gyroscope, an accelerometer, a pressure sensor, a brain-activity sensor, or a vital-sign sensor, in which case information on the surrounding environment is acquired.
[0045]
The second reception unit 48 is connected to the positioning device 44 and receives the information acquired by it, for example the position information. For the connection between the positioning device 44 and the second reception unit 48, for example USB (Universal Serial Bus), a serial bus, a parallel bus, HDMI (High-Definition Multimedia Interface), or a telephone jack is used. The output unit 38 generates a user information stream storing the position information received by the second reception unit 48 and outputs it to the communication unit 30. The input unit 32 receives identification information from the server device 10 via the communication unit 30; the position information is reflected in this identification information.
[0046]
In the server device 10 of FIG. 4, the position information is carried in the user information stream received by the input unit 52 from the communication unit 50. The input unit 52 outputs the position information to the terminal information database 54, which stores it. Furthermore, the terminal information database 54 stores audio processing parameters for predetermined areas. For example, a predetermined area may correspond to a school route, in which case the terminal information database 54 stores the position information of the school route and the corresponding audio processing parameter in association with each other. When the vehicle 16 is traveling on a school route, there is a high risk of a student running into the road, so an audio processing parameter is defined in advance that helps the driver stay aware of the surroundings. A predetermined area may also be a residential area; for driving through a residential area in the middle of the night, an audio processing parameter that suppresses the bass is defined in advance.
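A sketch of this area-based override follows. Representing the predetermined areas as latitude/longitude bounding boxes, and the time-of-day condition attached to the residential rule, are assumptions made for illustration; the description does not specify how areas are encoded.

# Hypothetical area table: bounding box -> identification information, optional condition.
AREA_PARAMETERS = [
    ((35.600, 35.610, 139.700, 139.710), "school-route", None),         # any hour
    ((35.650, 35.660, 139.720, 139.730), "residential-bass-cut", "night"),
]

def area_parameter(lat, lon, hour):
    # Return the area-specific parameter ID when the vehicle is inside a
    # predetermined area, else None (fall back to preference-based selection).
    night = hour >= 22 or hour < 6
    for (lat0, lat1, lon0, lon1), param_id, condition in AREA_PARAMETERS:
        if lat0 <= lat <= lat1 and lon0 <= lon <= lon1:
            if condition == "night" and not night:
                continue
            return param_id
    return None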
[0047]
When the position information acquired by the input unit 52 falls within a predetermined area stored in the terminal information database 54, the analysis unit 56 selects the audio processing parameter corresponding to that area; as before, the selection is made via the corresponding identification information. That is, when the vehicle 16 travels in a predetermined area, the analysis unit 56 selects the audio processing parameter associated with the position information regardless of the condition information and the evaluations. Alternatively, the analysis unit 56 may correct, according to the position information, the audio processing parameter selected based on the condition information and the evaluations. Specifically, the analysis unit 56 provisionally selects identification information based on the condition information and the evaluations; the terminal information database 54 stores, for each item of position information, a correspondence between provisionally selected identification information and final identification information; and the analysis unit 56 selects the final identification information from the provisional one based on this correspondence. In this way, the analysis unit 56 reflects the position information in selecting the audio processing parameter. The subsequent processing is the same as in Embodiment 1, so its description is omitted here.
[0048]
According to this embodiment, since an audio processing parameter corresponding to the position information is set, audio suited to the reproduction device and terminal device being mounted on a vehicle can be reproduced. Moreover, because the reproduced audio is suited to the vehicle environment, safe driving is supported.
[0049]
(Embodiment 3) Embodiment 3 of the present invention also relates to an audio processing system that performs audio tuning for an audio reproduction device. In Embodiment 3, in addition to the processing of Embodiment 1, audio tuning is performed in consideration of the position within the vehicle of the user listening to the audio. For this purpose, the terminal device is connected to a sensor that measures the position of the user who is to listen, for example the driver or a passenger, and acquires the user's position information from the sensor. The terminal device reports the user's position information to the server device. Based on the position information, the server device derives the distance between the user and each of the plurality of speakers provided in the vehicle, and from the derived distances derives the delay time for the sound output from each speaker.
[0050]
Adjusting the output timing of the sound from each speaker by these delay times corresponds to time alignment. Although a delay time is itself an audio processing parameter, it is here called an initial processing parameter to distinguish it from the audio processing parameters discussed so far. The initial processing parameters are not limited to delay times and may also include tap coefficients as before. The terminal device executes audio processing using the initial processing parameters. The audio processing system 100 according to Embodiment 3 is of the same type as that of FIG. 1; the description here focuses on the differences.
[0051]
FIG. 9 shows the configuration of the vehicle 16 according to Embodiment 3 of the present invention, corresponding to a top view of the vehicle 16. The vehicle 16 includes a front left speaker 110, a front right speaker 112, a rear left speaker 114, a rear right speaker 116, a front right seat 120, a front left seat 122, a steering wheel 124, a dashboard 126, a rearview mirror 128, and the terminal device 20. In the vehicle 16, the terminal device 20 is connected to the reproduction device 18 (not shown).
[0052]
Time alignment is audio tuning that adjusts the delays of the audio output from the front left speaker 110, the front right speaker 112, the rear left speaker 114, and the rear right speaker 116, and is done to correct phase mismatch. When the driver 118 is seated in the front right seat 120, the distances from the four speakers to the driver 118 all differ. If the terminal device 20 and the reproduction device 18 reproduce sound from the four speakers simultaneously, the arrival times from the speakers differ, so the sound image is incorrect. Moreover, when the front right seat 120 is moved forward or backward or up or down, the position of the driver 118 changes; the position of the front right seat 120 thus affects the sound image, as does a driver 118 with a high sitting height. To deal with this, the vehicle type information of the vehicle 16 and the detection results of a sensor are received by the server device 10. The sensor is installed, for example, on the front right seat 120, and detects the front-rear and vertical positions of the seat.
[0053]
When the primary listening position is set to the front left seat 122 instead of the front right seat 120, the fact that the front left seat 122 is the listening position is input to the terminal device 20 and transmitted to the server device 10. Alternatively, only the vehicle type information may be transmitted to the server device 10; in that case, the calibration parameters for the seat position, the listening position, and the seat height may be measured by the terminal device 20.
[0054]
In FIG. 9, compensation equalization may also be the target of the audio tuning. Compensation equalization is audio tuning that corrects the frequency characteristics and flattens the frequency response in the frequency domain. When the driver 118 sits in the front right seat 120, the reflection from the right window is dominant, so the frequency characteristics at the front right seat 120 and the front left seat 122 differ greatly. In-vehicle fittings such as the steering wheel 124, the dashboard 126, and the rearview mirror 128 also affect the frequency characteristics. Since the frequency characteristics change with the seat position, the seat position information is automatically detected by the sensor and transmitted to the server device 10. When a passenger rather than the driver 118 listens to the audio, that fact is manually input to the terminal device 20 by the driver 118 and transmitted to the server device 10.
[0055]
When another passenger is sitting on a seat, the frequency response is affected by reflection, absorption, and diffraction by that passenger's body. Therefore, the seat position of the other passenger is also automatically detected by the sensor and reported to the server device 10. Alternatively, only the vehicle type information may be transmitted to the server device 10, and the calibration parameters for the various passenger positions measured by the terminal device 20. Such a terminal device 20 has a function to calculate the initial processing parameters from these measurements.
[0056]
FIG. 10 shows the configuration of the terminal device 20 according to Embodiment 3 of the present invention. The terminal device 20 includes a communication unit 30, an input unit 32, an audio processing parameter storage unit 34, an audio processing unit 36, an output unit 38, a first reception unit 46, a second reception unit 48, a profile storage unit 42, and an initial processing parameter storage unit 70. The terminal device 20 is also connected to a sensor 72. The first reception unit 46 corresponds to the reception unit 40 of FIG. 2. The sensor 72 is installed on the front right seat 120 of FIG. 9; known techniques may be used for the sensor 72, so its description is omitted here. The sensor 72 outputs its detection result, which corresponds to the position information of the front right seat 120, to the second reception unit 48.
[0057]
The second reception unit 48 is connected to the sensor 72 and receives its detection result. This corresponds to receiving environmental information about the environment in which the driver 118, that is, the user, listens to the result of the audio processing executed by the audio processing unit 36. The second reception unit 48 outputs the environmental information. The output unit 38 generates a user information stream storing the environmental information received by the second reception unit 48 and outputs it to the communication unit 30. This corresponds to reporting the environmental information to the server device 10.
[0058]
The input unit 32 receives initial processing parameters from the server device 10 via the communication unit 30. The initial processing parameters are generated in the server device 10 based on the environmental information. The input unit 32 outputs them to the initial processing parameter storage unit 70, which stores them. The audio processing unit 36 executes audio processing with the initial processing parameters set at the initial stage; this processing includes time alignment, compensation equalization, and crossover. After the audio to which the initial processing parameters have been applied is output from the reproduction device 18, the processing described earlier is executed to update the audio processing parameters.
[0059]
FIG. 11 shows the configuration of the server device 10 according to Embodiment 3 of the present invention. The server device 10 includes a communication unit 50, an input unit 52, a terminal information database 54, a first analysis unit 62, a second analysis unit 64, an initial processing parameter database 66, an analysis result database 58, and an output unit 60. The input unit 52 receives the user information stream from the terminal device 20 via the communication unit 50 and extracts from it the environmental information acquired by the sensor 72, that is, the information about the environment in which the user listens to the result of the audio processing. The input unit 52 stores the environmental information corresponding to each of the plurality of terminal devices 20 in the terminal information database 54.
[0060]
The first analysis unit 62 corresponds to the analysis unit 56 of FIG. 4. The second analysis unit 64 generates the initial processing parameters for the terminal device 20 based on the environmental information from the terminal device 20. If the initial processing parameter is a delay time for time alignment, the second analysis unit 64 first derives the distance from each speaker to the user and then derives a delay time that decreases as the distance increases. This corresponds to deriving delay times according to the distances from the front left speaker 110, the front right speaker 112, the rear left speaker 114, and the rear right speaker 116 to the driver 118 in FIG. 9. If the initial processing parameter is a set of tap coefficients, a plurality of tap coefficient sets acquired in advance by experiments or the like are stored, and the second analysis unit 64 selects one of them based on the environmental information.
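The delay-time derivation can be written down directly: each speaker is delayed by its travel-time difference relative to the farthest speaker, so the delay decreases as the distance grows and the farthest speaker gets zero delay. A minimal sketch, with hypothetical cabin coordinates:

import math

SPEED_OF_SOUND = 343.0  # m/s at roughly 20 degrees C (assumed constant)

def time_alignment_delays(speakers, listener):
    # Distance from each speaker to the listener, then a delay per speaker so
    # that all wavefronts arrive together: farther speaker -> smaller delay.
    dist = {name: math.dist(pos, listener) for name, pos in speakers.items()}
    d_max = max(dist.values())
    return {name: (d_max - d) / SPEED_OF_SOUND for name, d in dist.items()}

# Hypothetical layout (metres), listener at the front right seat.
delays = time_alignment_delays(
    {"front_left_110": (0.0, 1.2), "front_right_112": (1.3, 1.2),
     "rear_left_114": (0.0, -1.0), "rear_right_116": (1.3, -1.0)},
    listener=(1.0, 0.8),
)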
[0061]
The initial processing parameter database 66 stores the initial processing parameters derived by the second analysis unit 64. The output unit 60 generates an analysis data stream storing the initial processing parameters held in the initial processing parameter database 66 and outputs it to the communication unit 50.
[0062]
FIG. 12 is a sequence diagram showing the reproduction procedure of the audio processing system 100 according to Embodiment 3 of the present invention. The terminal device 20 acquires sensor data (S50); the sensor data corresponds to the environmental information described above. The terminal device 20 reports the sensor data and the condition information to the server device 10 (S52). The server device 10 calculates the initial processing parameters (S54) and notifies the terminal device 20 of them (S56). The terminal device 20 sets the initial processing parameters (S58) and executes audio processing (S60).
[0063]
According to this embodiment, since the initial processing parameters are set in accordance with the seat position, time alignment of the sound from each speaker can be realized. Moreover, not only time alignment but also compensation equalization can be realized by the initial processing parameters, so the influence of sound distortion in the frequency domain can be reduced.
[0064]
The present invention has been described above based on embodiments. Those skilled in the art will understand that these embodiments are exemplifications, that various modifications can be made to the combinations of their components and processing steps, and that such modifications are also within the scope of the present invention.
[0065]
In the embodiments of the present invention, the reproduction device 18 and the terminal device 20 are mounted in the vehicle 16. However, the invention is not limited to this; the reproduction device 18 and the terminal device 20 may be installed elsewhere than the vehicle 16, for example in a room. This modification expands the application range of the present invention.
[0066]
Any combination of Embodiments 1 to 3 of the present invention is also effective. Such a combination provides the combined effects of the embodiments concerned.
[0067]
DESCRIPTION OF REFERENCE NUMERALS: 10 server device, 12 network, 14 base station device, 16 vehicle, 18 reproduction device, 20 terminal device, 30 communication unit, 32 input unit, 34 audio processing parameter storage unit, 36 audio processing unit, 38 output unit, 40 reception unit, 42 profile storage unit, 50 communication unit, 52 input unit, 54 terminal information database, 56 analysis unit, 58 analysis result database, 60 output unit, 100 audio processing system.