DESCRIPTION JP2016107978
Abstract: A method of using face recognition to adjust the orientation of a movable speaker. The method comprises receiving a user's location based on image data captured by one or more cameras; identifying one or more coordinates in space based on the user's location; and generating, using one or more computer processors, a control signal that adjusts at least one actuator coupled to the speaker to change the direction of the speaker and achieve an acoustic environment determined with respect to the one or more coordinates, wherein the one or more cameras remain fixed in space while the orientation of the speaker changes.
[Selected figure] Figure 3
Adjusting a speaker using face recognition
[0001]
The present disclosure relates to movable speakers, and more particularly to using face recognition to adjust the orientation of movable speakers.
[0002]
Fixed speakers are generally used to output sound in a vehicle. For example, in a vehicle such as a car, a plurality of speakers is typically mounted in fixed positions in order to output sounds (for example, music, audiobooks, radio programs, and the like) to the driver and passengers. However, fixed speakers cannot be adjusted for passengers at different locations within the listening environment. A particular speaker arrangement may be optimal for passengers of a certain height but suboptimal for taller or shorter passengers. Moreover, the direction of the audio output by fixed speakers does not change according to the number of passengers in the vehicle. For example, if the speaker arrangement is designed to provide optimal performance when four passengers are present in the vehicle, its performance with fewer than four passengers may fall short of what other speaker arrangements could achieve.
[0003]
A method according to an embodiment of the present disclosure includes receiving a user's location based on image data captured by a camera and identifying one or more coordinates in space based on the user's location. The method also includes generating a control signal for adjusting at least one actuator based on the one or more coordinates, the control signal changing the direction of the speaker such that the audio output area of the speaker includes the one or more coordinates.
[0004]
Another embodiment of the present disclosure is a system comprising a movable speaker, an actuator mechanically coupled to the movable speaker, and a computing device. The computing device is configured to receive a user's location based on image data captured by a camera and to identify one or more coordinates in space based on the user's location. The computing device is also configured to generate a control signal to adjust the actuator based on the one or more coordinates. The control signal is configured to redirect the movable speaker such that the audio output area of the movable speaker includes the one or more coordinates.
[0005]
Another embodiment of the present disclosure is a computer program product for adjusting a speaker, the computer program product comprising computer readable program code executable by one or more computer processors. The program code is configured to receive a user's location based on image data captured by a camera and to identify one or more coordinates in space based on the user's location. The program code is also configured to generate a control signal for adjusting at least one actuator based on the one or more coordinates, the control signal changing the direction of the movable speaker such that the audio output area of the movable speaker includes the one or more coordinates.
[0006]
The present invention provides, for example, the following items.
(Item 1) A method comprising: receiving a user's location based on image data captured by one or more cameras; identifying one or more coordinates in space based on the user's location; and generating, using one or more computer processors, a control signal that adjusts at least one actuator coupled to a speaker to change the orientation of the speaker to achieve an acoustic environment determined with respect to the one or more coordinates, wherein the one or more cameras remain fixed in space while the orientation of the speaker changes.
(Item 2) The method of Item 1, wherein the user's location is identified by the user's face in one of a two-dimensional or three-dimensional space, the method further comprising: identifying a preference associated with the user using the identified face of the user; and changing a parameter of a vehicle in which the user is located based on the preference of the user.
(Item 3) The method of any of the above items, wherein the user's location comprises a depth measurement based on the distance between the user and the one or more cameras.
(Item 4) The method of any of the above items, wherein generating the control signal further comprises converting the one or more coordinates into a control signal using a predetermined function.
(Item 5) The method of any of the above items, wherein the control signal is configured to adjust the speaker such that the speaker is directed at the one or more coordinates.
(Item 6) The method of any of the above items, wherein the one or more coordinates define a path in three-dimensional space, and the control signal is configured to adjust the actuator such that the audio output area tracks the path.
(Item 7) The method of any of the above items, further comprising: receiving a plurality of locations corresponding to a plurality of users based on the image data captured by the one or more cameras; and generating a plurality of control signals for a plurality of actuators based on the plurality of locations, each actuator adjusting a respective one of a plurality of speakers.
(Item 8) The method of any of the above items, wherein the plurality of speakers is in a vehicle and the plurality of locations identifies the location of each of the plurality of users in the vehicle.
(Item 9) A system comprising: a movable speaker; an actuator mechanically coupled to the movable speaker; and a computing device configured to receive a user's location based on image data captured by one or more cameras, identify one or more coordinates in space based on the user's location, and generate a control signal that adjusts the actuator to change the orientation of the movable speaker to achieve an acoustic environment determined with respect to the one or more coordinates, wherein the one or more cameras remain fixed in space while the orientation of the movable speaker changes.
(Item 10) The system of Item 9, wherein the user's location is identified by the user's face in one of a two-dimensional or three-dimensional space.
(Item 11) The system of any of the above items, wherein the movable speaker is provided in a vehicle that accommodates the user.
(Item 12) The system of any of the above items, further comprising at least two actuators mechanically coupled to the movable speaker, the computing device being configured to determine a control signal for each of the at least two actuators such that the audio output area comprises the one or more coordinates.
(Item 13) The system of any of the above items, wherein each control signal is configured to adjust the movable speaker such that the movable speaker points at the one or more coordinates.
(Item 14) The system of any of the above items, wherein the one or more coordinates correspond to a predicted location of a body part of the user in space.
(Item 15) The system of any of the above items, further comprising a plurality of actuators and a plurality of speakers, wherein the computing device is configured to receive a plurality of locations corresponding to a plurality of users based on the image data captured by the one or more cameras and to generate control signals for the plurality of actuators based on the plurality of locations, each actuator adjusting a respective one of the plurality of speakers.
(Item 16) A computer program product for adjusting a speaker, the computer program product comprising a computer readable storage medium having computer readable program code executable by one or more computer processors, the program code configured to: receive a user's location based on image data captured by one or more cameras; identify one or more coordinates in space based on the user's location; and generate a control signal that adjusts at least one actuator coupled to the speaker to change the direction of the speaker to achieve an acoustic environment determined with respect to the one or more coordinates, wherein the one or more cameras remain fixed in space while the direction of the speaker changes.
(Item 17) The computer program product of Item 16, wherein the user's location is identified by the user's face in one of a two-dimensional or three-dimensional space.
(Item 18) The computer program product of any of the above items, wherein the user's location comprises a depth measurement based on the distance between the user and the one or more cameras.
(Item 19) The computer program product of any of the above items, wherein generating the control signal comprises converting the one or more coordinates into a control signal using a predetermined function, the depth measurement being at least one input to the predetermined function.
(Item 20) The computer program product of any of the above items, wherein the control signal is configured to adjust the speaker to direct the speaker at the one or more coordinates.
Summary: The embodiments herein describe an audio system that makes adjustments based on the location of a person. That is, instead of relying on fixed speakers, the audio system adjusts the direction of the audio output of one or more speakers to optimize the performance of the audio system based on the user's location or the number of users. To do so, the audio system comprises a camera and a tracking application that identifies the location of the user and/or the number of users in front of the camera. Using this information, the audio system adjusts one or more actuators coupled to the speaker to redirect the speaker's audio output. As the user moves, the audio system continually adjusts the speakers to optimize system performance.
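In pseudocode terms, this sense-identify-actuate loop could look like the sketch below. It is a minimal illustration rather than the disclosed implementation: the camera, face recognition application, and actuator controller objects and their method names (capture_frame, detect_faces, to_control_signals, apply) are hypothetical placeholders.

```python
# Minimal sketch of the track-and-aim loop described in the summary.
import time

def tracking_loop(camera, face_app, actuator_controller, period_s=0.1):
    """Continually redirect the speaker toward the detected user(s)."""
    while True:
        frame = camera.capture_frame()          # 2D or 3D image data
        faces = face_app.detect_faces(frame)    # list of (x, y, z) locations
        if faces:
            # One user: aim at that user; several users: aim between them.
            target = faces[0] if len(faces) == 1 else centroid(faces)
            signals = actuator_controller.to_control_signals(target)
            actuator_controller.apply(signals)  # move the actuators
        time.sleep(period_s)                    # camera stays fixed throughout

def centroid(points):
    """Average a list of coordinate tuples component-wise."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))
```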
[0007]
FIG. 1 illustrates a system for adjusting a movable speaker based on a user's location. FIG. 2 is a block diagram of a system for adjusting a speaker based on face recognition. FIG. 3 illustrates a method of adjusting a speaker based on face recognition. FIGS. 4A-4C illustrate identifying the location toward which a speaker is turned based on face recognition. FIGS. 5A and 5B illustrate adjusting the placement of speakers based on the number of passengers in a vehicle. FIGS. 6A and 6B illustrate adjusting the placement of speakers based on the number of passengers in a vehicle. FIG. 7 illustrates a system for identifying a path for adjusting a movable speaker based on face recognition.
[0008]
For ease of understanding, identical reference numbers are used, where possible, to designate identical elements that are common to the figures. It is contemplated that elements disclosed in one embodiment may be utilized in other embodiments without specific recitation. The drawings referred to herein should not be understood as being drawn to scale unless specifically noted. Also, the drawings are often simplified, and details or components are omitted for clarity of presentation and explanation. The drawings and discussion serve to explain the principles discussed below, where like designations denote like elements.
[0009]
In the embodiments herein, an audio system is described that adjusts based on the user's location. Instead of relying on fixed speakers that cannot change the direction of their audio output, the audio system described herein can adjust the direction of the audio output of one or more speakers depending on the location and the number of users. To do this, the audio system may include a camera and a face recognition application that identifies the user's location and/or the number of users in front of the camera. Using this information, the audio system adjusts one or more actuators coupled to the speaker to turn the direction of the speaker's audio output, i.e., the direction in which the speaker is facing. For example, the face recognition application may identify the user's location in 3D space, and in response, the audio system adjusts the speakers to point at that location. As the user moves and changes location, the audio system can adjust the speakers on an ongoing basis to optimize system performance.
[0010]
In one aspect, the face recognition application detects multiple users in front of the camera. The audio system can adjust the speakers based on the various locations of the users. For example, a speaker can be moved so that the direction of the audio output is between two users to achieve optimal performance. Alternatively, the audio system may include a plurality of adjustable speakers, with one speaker facing one of the users and another speaker facing another user. Regardless of the number or location of users, the audio system may be pre-programmed to change the orientation of the speakers in order to optimize (i.e., improve) audio performance.
[0011]
FIG. 1 illustrates an audio system 100 for adjusting a movable speaker 105 based on the location of a user. System 100 includes the speaker 105, actuators 110, a camera 115, and a user 120. The speaker 105 may be any device that produces sound in response to an electrical input signal. The speaker 105 is coupled to the actuators 110, which change the audio output direction of the speaker 105 along one or more axes, such as up and down, left and right, diagonally, or in a circular motion. The actuators 110 shown here are piston actuators that can be contracted or extended to adjust the direction in which the speaker 105 is pointing. By controlling the two actuators 110A and 110B, the system 100 moves the speaker 105 to point at a particular point or area. For example, the area in front of the speaker 105 can be divided into a 2D or 3D grid, and by adjusting the actuators 110 the system 100 can move the speaker 105 to point at a point or area in that grid. Furthermore, in one embodiment, the orientation of the speaker 105 changes to point at a point or area while the orientation of the camera 115 remains fixed.
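As a rough illustration of what pointing at a grid point involves, the sketch below converts a target coordinate into pan and tilt angles for the speaker axis. It assumes a speaker at the origin with a simple axis convention; how such angles would map onto the extensions of piston actuators 110A and 110B is hardware-specific and is not specified by this disclosure.

```python
import math

def aim_angles(target, speaker_pos=(0.0, 0.0, 0.0)):
    """Return (pan, tilt) in radians that point the speaker axis at target.

    Assumes x is right, y is up, and z is forward from the speaker.
    Mapping these angles to actual piston extensions would be calibrated
    per installation.
    """
    dx = target[0] - speaker_pos[0]
    dy = target[1] - speaker_pos[1]
    dz = target[2] - speaker_pos[2]
    pan = math.atan2(dx, dz)                   # left/right rotation
    tilt = math.atan2(dy, math.hypot(dx, dz))  # up/down rotation
    return pan, tilt
```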
[0012]
The piston actuators 110 shown here are just one example of suitable actuators. The actuators 110 can use ball-and-socket joints, screws, gear systems, chains, and the like to adjust the orientation of the speaker 105. Furthermore, the actuators 110 can use any type of drive system, such as a mechanical, electrical, hydraulic, or pneumatic system, to generate movement. Although FIG. 1 shows two actuators 110, in other embodiments the speaker 105 can be moved by only one actuator 110. In one aspect, the actuator 110 may not be directly coupled to the speaker 105. For example, a cable can be used to transfer the force generated by a remote actuator 110 to the speaker 105. Doing so makes it possible to reduce the size of the speaker assembly so that the speaker 105 can fit in a confined area such as the dashboard of a vehicle or a pillar separating the windshield and the door.
[0013]
The camera 115 can include one or more sensors for capturing an image based on received electromagnetic signals (e.g., infrared or visible light signals). For example, the camera 115 may comprise a visible light sensor for detecting electromagnetic signals of approximately 390-700 nanometers (i.e., visible light), a ranging system using an infrared projector and a sensor for capturing an image in three-dimensional space, or a combination of the two. The information captured by the camera 115 may be either 2D or 3D. In one aspect, the depth (i.e., the distance between the user 120 and the camera 115) may be known. For example, the audio system may be designed for a room in which the user 120 sits on a couch at a predetermined distance from the camera 115 and the speaker 105. Using only 2D information, the audio system 100 adjusts the speaker 105 based on the location of the user 120 on the couch. Alternatively, the depth may not be known, in which case the camera 115 captures 3D information to determine the distance between the user 120 and the camera 115.
[0014]
Using the information captured by the camera 115, the audio system 100 tracks the movement of the user 120 in 1D, 2D, or 3D space. Based on the location of the user 120 (e.g., the location of the user's face or ear), the audio system 100 instructs the actuators 110 to turn the speaker 105 to optimize the performance of the audio system 100. For example, optimal performance may be obtained when the speaker 105 faces the user 120. As the user 120 moves, the actuators 110 redirect the speaker 105 so that it continues to point at the location of the user's ear in 3D space.
[0015]
FIG. 2 is a block diagram of a system 200 for adjusting the speaker 105 based on face recognition. System 200 comprises the camera 115, a computing device 210, and a speaker system 235. The camera 115 comprises a depth sensor 205 for collecting depth information to determine the distance between the camera 115 and the user. However, as noted above, in other examples the camera 115 may not collect depth information.
[0016]
The camera 115 is coupled to a computing device 210 comprising a processor 215 and a memory 220. Computing device 210 may be a general purpose computing device, such as a laptop computer, tablet, server, or desktop computer, or a dedicated computing device for implementing the aspects and implementations described herein. Processor 215 may be any processing element suitable for performing the functions described herein, and can represent a single processing element or multiple processing elements, each comprising one or more processing cores. The memory 220 may be volatile or non-volatile memory, and may comprise a hard disk, RAM, flash memory, and the like. As shown here, memory 220 comprises a face recognition application 225 and an actuator controller 230. The face recognition application 225 receives 2D or 3D data captured by the camera 115 and identifies users in the area in front of the camera 115. Face recognition application 225 may generate one or more coordinates that identify the user's location (e.g., the location of the user's face) in 2D or 3D space. Using these coordinates, actuator controller 230 determines control signals for the actuators 110 to move the speaker 105 and optimize the performance of system 200. For example, if performance is improved when the speaker 105 is directed at the user's ear, the actuator controller 230 determines the control signals so that the speaker 105 is directed at the user's ear.
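The disclosure does not tie the face recognition application 225 to any particular algorithm. As one illustration, a stock Haar-cascade face detector, such as the one shipped with OpenCV, could supply the bounding-box coordinates that the actuator controller 230 consumes:

```python
import cv2

# Load a stock frontal-face detector shipped with OpenCV.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_face_boxes(frame_bgr):
    """Return a list of (x, y, w, h) bounding boxes for faces in the frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return list(boxes)
```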
[0017]
In one aspect, actuator controller 230 may comprise a transformation function or algorithm for converting the coordinates provided by face recognition application 225 into control signals for the actuators 110. For example, application 225 may return one or more of the x, y, and z coordinates that identify the user's location in front of camera 115. The transformation function takes the x, y, and z coordinates as input and outputs control signals for the actuators 110 so that the speaker 105 points toward the user. The transformation function may be generated during a setup phase in which one or more points in free space are mapped to specific settings of the actuators 110. These mappings can then be generalized to form a transformation function that maps any set of free-space coordinates to corresponding actuator settings. However, this is just one non-limiting way of generating a transformation function that converts 2D or 3D coordinates into actuator signals that cause the speaker 105 to point in the direction of the received coordinates.
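One plausible reading of this setup phase is a least-squares fit from calibration points to actuator settings. The affine model in the sketch below is an assumption made for illustration; the disclosure only requires that the calibration mappings generalize to unseen coordinates.

```python
import numpy as np

def fit_transform(points, settings):
    """Fit an affine map from free-space points to actuator settings.

    points:   (N, 3) array of calibration coordinates collected in the
              setup phase.
    settings: (N, M) array of the actuator settings that aimed the
              speaker at each point (M actuators).
    Returns a function mapping a coordinate to actuator settings.
    """
    P = np.hstack([np.asarray(points), np.ones((len(points), 1))])
    W, *_ = np.linalg.lstsq(P, np.asarray(settings), rcond=None)

    def transform(coord):
        return np.append(coord, 1.0) @ W

    return transform

# Usage: calibrate with a few measured (point, setting) pairs, then
# convert any detected face coordinate into actuator commands:
# transform = fit_transform(calib_points, calib_settings)
# control = transform((0.3, 1.1, 2.0))
```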
[0018]
In one aspect, the actuator controller 230 can use the coordinates provided by the face recognition application 225 to identify different coordinates. For example, the face recognition application 225 may return the coordinates of the user's nose in 3D space. However, to direct the speaker 105 at the user's ear, the actuator controller 230 may use predefined adjustment parameters to estimate where the user's ear is likely to be. The adjustment parameters may change based on the user's distance from the camera 115; for example, an adjustment parameter increases when the user is close to the camera 115. By applying the adjustment parameters to the coordinates, the actuator controller 230 can generate coordinates corresponding to the user's ear, which can be used, for example, as input to the transformation function to determine the actuator control signals.
[0019]
In another example, the actuator controller 230 changes the coordinates provided by the face recognition application 225 depending on how many users have been detected. For example, if the application 225 outputs coordinates for three different users, the actuator controller 230 can average the coordinates to specify a location between the users. In this manner, regardless of the number or location of users in system 200, actuator controller 230 can be designed to change the coordinates provided by face recognition application 225 in order to adjust speaker 105.
[0020]
The speaker system 235 includes the actuators 110 described in FIG. 1 and the speaker 105. The speaker system 235 can comprise a single body that encapsulates both of these components, or a support structure for the components. In one example, the actuators 110 may be separate from the speaker 105, and the speaker system 235 may include mechanical elements such as cables, chains, or pneumatic hoses for transmitting power from the actuators 110 to the speaker 105.
[0021]
FIG. 3 illustrates a method 300 of adjusting a speaker using face recognition. To aid understanding, the blocks of method 300 are described in conjunction with the systems shown in FIGS. 4A-4C. At block 305, the face recognition application 225 uses the data captured by the camera 115 to identify a face. As shown in system 400 of FIG. 4A, face recognition application 225 identifies a bounding box 405 around user 401. The examples provided herein are not limited to any particular algorithm for identifying user 401 based on data captured by camera 115. In this example, the application 225 identifies a bounding box 405 centered on the head of the user 401, while in other examples the face recognition application 225 may return the coordinates of the geometric center of the user's face, or multiple coordinates corresponding to different features of the user 401, such as the eyes, ears, or mouth.
[0022]
The face recognition application 225 sends the coordinates of the bounding box 405 to the actuator controller 230. At block 310 of method 300, actuator controller 230 uses the coordinates of bounding box 405 to identify a point or area. For example, when only one user 401 is identified by the face recognition application 225, the actuator controller 230 can direct the speaker 105 at the user's ear. In one example, the face recognition application 225 can identify the coordinates of the user's ear and provide them to the actuator controller 230. In the example shown in FIG. 4A, however, the actuator controller 230 uses the coordinates of the bounding box 405 to calculate the location of the user's ear.
[0023]
As shown in system 420 of FIG. 4B, actuator controller 230 determines the ear location 425 using adjustment parameters that may differ depending on the distance between user 401 and the camera 115 or speaker 105. For example, the adjustment parameter may be a predetermined value subtracted from the middle coordinate on the left side of the bounding box 405 to give the coordinates of the ear location 425. Of course, the type and value of the adjustment parameter may change depending on the coordinates provided by the face recognition application 225. That is, if the application 225 outputs the coordinates of the user's nose rather than the bounding box 405, additional adjustment parameters are required. Further, as noted above, when the face recognition application 225 identifies multiple users, the actuator controller 230 can change the coordinates differently. For example, instead of estimating the location 425 of the user's ear, the actuator controller 230 may use the coordinates provided by the face recognition application 225 to identify an area or point between the multiple users.
[0024]
At block 315, the actuator controller 230 uses the coordinates from the application 225 to convert the specific point or region into actuator control signals. The controller 230 can use a transformation function or algorithm that maps a point (i.e., the location 425 of the user's ear) to control signals that turn the speaker 105 to point at that point. The actuators 110 receive these signals and change the direction in which the speaker 105 is pointing. Area 410 indicates the area in front of the speaker 105 in which the sound output of the speaker is at maximum volume (for example, the area containing 90% of the sound output from the speaker 105). As shown, user 401 is outside area 410, and may therefore experience inferior audio performance relative to being inside region 410.
[0025]
At block 320, in response to the control signals, the actuators 110 adjust the orientation in which the speaker 105 is pointing so that the audio output area defined by region 410 includes the location 425 of the user's ear, as shown in FIG. 4C. In one example, the orientation of the speaker 105 is adjusted such that the location 425 of the user's ear is at least within the area 410. That is, instead of moving the speaker 105 until the direction in which it is pointing intersects the location 425, it is only necessary that the location 425 be within the area 410. By not requiring precise alignment, system 450 can improve the experience of user 401 while using a less expensive camera 115, which may output lower-accuracy coordinates and save processing time when executing application 225. Nevertheless, these coordinates are accurate enough to derive a control signal that ensures the location 425 is within the area 410, even if the speaker 105 is not directly facing the user's ear. In addition, since the camera 115 is physically separate from the speaker 105, the direction of the camera 115 can remain fixed while the audio output area 410 is changed (that is, while the direction of the speaker 105 is changed). In other words, the camera 115 continues to point in the same direction while the audio output area 410 of the speaker 105 changes.
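The "location 425 within area 410" criterion could be modeled as a cone test around the speaker's pointing axis, as sketched below. The 15-degree half-angle is an assumed stand-in for whatever beam width captures, e.g., 90% of the speaker's output.

```python
import math

def in_output_area(speaker_pos, speaker_dir, point, half_angle_deg=15.0):
    """Check whether a point lies inside the speaker's output cone.

    Models region 410 as a cone around the speaker's pointing direction.
    Exact alignment is not required: the ear location only needs to fall
    somewhere inside the cone.
    """
    v = [p - s for p, s in zip(point, speaker_pos)]
    norm_v = math.sqrt(sum(c * c for c in v))
    norm_d = math.sqrt(sum(c * c for c in speaker_dir))
    cos_angle = sum(a * b for a, b in zip(v, speaker_dir)) / (norm_v * norm_d)
    angle_deg = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    return angle_deg <= half_angle_deg
```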
[0026]
FIGS. 5A and 5B illustrate adjusting speaker placement based on the passengers of a vehicle 500. Specifically, FIGS. 5A and 5B show the front half of the vehicle 500; the rear seats and rear speakers (if any) are omitted. As shown in the top views of FIGS. 5A and 5B, the vehicle 500 comprises a camera 115 and two speakers 505 mounted on the dashboard or on a pillar of the vehicle 500. For clarity, the actuators used to move the speakers 505 and the computing device used to process the data captured by the camera 115 and determine the control signals for the actuators are omitted. In one embodiment, the computing device is incorporated into an on-board computer used to operate the vehicle or into an infotainment system incorporated into the vehicle 500.
[0027]
Based on the data captured by the camera 115, the face recognition application on the computing device determines the number of passengers in the vehicle 500 and the locations of the passengers in 2D or 3D space. In FIG. 5A, the computing device determines that there is only one passenger in the vehicle 500, located at location 510 (i.e., the driver). In response, the actuator controller in the computing device identifies a point or area toward which the speakers 505 are to be directed.
[0028]
In one aspect, the point or area may be the same for both speaker 505A and speaker 505B. For example, both speakers may point at the same 3D point. Alternatively, the computing device can calculate a different point or area for each speaker 505. For example, the speaker 505A can face the driver's left ear while the speaker 505B faces the driver's right ear. The audio system can then output different sounds through the speakers 505, or surround sound can be used to provide a more immersive experience for the driver. However, if the sound output of both speakers 505 is the same, it may be better to point both speakers 505 at a common point in front of the user. If the speaker 505A is directed at the driver's left ear and the speaker 505B at the driver's right ear, the different distances between the driver and the two speakers cause different sound levels, which may be uncomfortable for the driver. Of course, in one embodiment, the computing device can compensate for the difference in distance by increasing the audio output of the speaker 505B (or decreasing the output of the speaker 505A) when each speaker is directed at a different ear. In any case, by tracking the user's location 510, the audio system can adjust the speakers 505 to optimize the audio performance.
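The distance compensation mentioned here could, as one illustrative assumption, follow free-field level falloff of roughly 6 dB per doubling of distance, boosting the farther speaker relative to the nearer one; real cabin acoustics will differ.

```python
import math

def balance_gains(dist_left_m, dist_right_m):
    """Return relative gains (dB) so both ears hear a similar level.

    Assumes 1/r pressure falloff (about 6 dB per doubling of distance);
    this is only a sketch of the compensation scheme described above.
    """
    ref = min(dist_left_m, dist_right_m)
    gain_left_db = 20.0 * math.log10(dist_left_m / ref)
    gain_right_db = 20.0 * math.log10(dist_right_m / ref)
    return gain_left_db, gain_right_db

# Example: left speaker 0.6 m from its ear, right speaker 1.2 m away.
# The farther speaker gets about 6 dB of boost relative to the nearer one:
# balance_gains(0.6, 1.2) -> (0.0, 6.02...)
```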
[0029]
In FIG. 5B, the computing device determines that the vehicle 500 has two passengers: one at location 510 and one at location 515. In order to identify the passengers, the camera 115 is arranged so that both the driver's seat and the front passenger seat are within its field of view. The face recognition application scans the image data generated by the camera 115 to locate the passengers in the vehicle 500. In this example, because there are two passengers instead of one, the audio system adjusts the speakers differently than in FIG. 5A. For example, the computing device may adjust the speaker 505A to face the driver while the speaker 505B is directed at the passenger. In one embodiment, the speaker 505A can be directed at a central location on the face, such as the driver's nose, so that the audio output of the speaker 505A is heard equally by both of the driver's ears. Similarly, the speaker 505B can be adjusted to point at a central location on the passenger's face, such as location 515, so that its output is heard equally by both ears. However, another optimized solution is to direct the speakers 505A and 505B directly at one of the driver's or the passenger's ears. The particular solution used may vary depending on user preferences, the type of speakers used, the particular acoustics of the vehicle 500, and so on.
[0030]
FIGS. 5A and 5B show that the computing device changes the way in which the speakers are adjusted based on the number of passengers in the vehicle. More generally, the techniques for improving the performance of the audio system can be modified based on the number of users in view of the camera 115. For example, rather than a vehicle, the audio system may be a home entertainment system that uses different speaker arrangements depending on the number of users present in the room.
[0031]
In addition to considering the location of the user and/or the number of identified users, the computing device can optimize the performance of the system according to the preferences of an identified user. For example, in addition to recognizing the location of the user's face, the face recognition application can identify the user based on the user's facial features. Once the user is identified, the computing device can retrieve preferences associated with the user. For example, user A may prefer more bass and lower volume than user B. Or, while user A prefers stereo sound, user B prefers surround sound. The computing device can take these settings into account when optimizing performance. For example, if user A is the driver, the computing device may change the ratio of bass to treble of the speakers 505. In one aspect, the system can make other electrical changes to the audio output by the speakers 505, such as modifying the manner in which the audio signal is processed (e.g., equalization or changing delays).
[0032]
Although the vehicle 500 is shown as a car, the embodiments described herein can be applied to other types of vehicles, such as boats, motorcycles, airplanes, and the like. Vehicle 500 may also include any number of speakers and cameras for identifying users and optimizing the performance of the audio system.
[0033]
FIGS. 6A and 6B illustrate adjusting the placement of speakers based on the number of passengers in a vehicle 600. As shown in the top views of FIGS. 6A and 6B, the vehicle 600 includes cameras 115A and 115B and four speakers 605. The speakers 605A and 605B are mounted at the front of the vehicle 600, and the speakers 605C and 605D are mounted at the rear of the vehicle 600. For clarity, the actuators that move the speakers 605 and the computing device used to process the data captured by the cameras 115 and determine control signals for the actuators are omitted. In one embodiment, the computing device is part of an on-board computer for operating the vehicle or an infotainment system incorporated into the vehicle 600.
[0034]
Based on the data captured by the cameras 115A and 115B, the face recognition application on the computing device determines the number of passengers present in the vehicle 600 and their locations in 2D or 3D space. For that purpose, the camera 115A is disposed at the front of the vehicle 600, and the camera 115B is attached at the rear of the vehicle 600. Two cameras are preferred because the rear of the vehicle 600 is largely blocked from the view of camera 115A by the two front seats. However, in other embodiments, the vehicle 600 may use only a single camera 115 to identify passengers in both the front and rear of the vehicle 600.
[0035]
In FIG. 6A, the computing device determines that only one passenger is present in the vehicle 600, located at location 610 (i.e., the driver's seat). That is, based on the image data provided by the rear camera 115B, the face recognition application determines that no passenger is present in the rear seats of the vehicle 600, while the image data provided by the front camera 115A indicates that the driver is at location 610. In response, the computing device identifies the point or area at which the speakers 605 should point to optimize the performance of the audio system. In one embodiment, all four speakers 605 point at the same 3D point associated with location 610. For example, the computing device may calculate a central location relative to the user and generate respective actuator signals such that the four speakers 605 are directed at this location. Alternatively, the two left speakers (speakers 605A and 605C) may face the driver's left ear, while the two right speakers (speakers 605B and 605D) face the driver's right ear. As another alternative, the computing device may direct each of the speakers 605 at a different 3D point or area in order to provide the driver with a surround sound experience.
[0036]
As discussed above, the computing device may use the face recognition application to optimize the audio system by uniquely identifying the user. For example, the computing device may use the face recognition application to determine whether the user prefers more bass or more treble, and change the parameters to suit the user's preferences. In one embodiment, the computing device may comprise an I/O interface for inputting preferences to the computing device. Alternatively, the computing device may be coupled to the infotainment system of the vehicle 600, which shares the user's preferences with the computing device in order to change the audio or video parameters. Alternatively or additionally, the computing device may use historical information to learn user preferences. For example, when only user A is a passenger in the vehicle, the computing device may initially direct all four speakers 605 at a central location. Using the I/O interface, user A may inform the computing device that she likes surround sound when she is the only passenger in the vehicle. In this way, the computing device can learn and adjust audio/video parameters or other parameters (e.g., seat and steering wheel adjustments) for a particular user or group.
[0037]
In FIG. 6B, the computing device determines that passengers of the vehicle 600 are sitting at locations 610, 615, 620, and 625. For the rear seat passengers at locations 620 and 625, the computing device adjusts the rear right speaker 605D to face toward location 625 and the rear left speaker 605C to face toward location 620. Both speakers 605C and 605D may face one of the ears of the passengers at these locations. In contrast, for the front seat passengers at locations 610 and 615, the computing device adjusts the front right speaker 605B and the front left speaker 605A to point toward a location 630 between locations 610 and 615. To do so, the computing device may average the coordinates of locations 610 and 615 to identify location 630. Thus, FIG. 6B illustrates that the optimal arrangement of the speakers 605 in the front half of the vehicle 600 differs from the optimal arrangement of the speakers 605 in the rear half of the vehicle 600. Stated differently, the computing device may use different speaker arrangements for different locations of users in the vehicle 600 in order to provide optimal performance to those users. Different locations have different acoustic properties, so even if the same user moves to a different location, the computing device may use a different speaker arrangement to provide improved performance. For example, if user A is in the driver's seat, the computing device may adjust speaker 605A to point directly at the user's ear. However, if user A is in the rear of the vehicle, the computing device may instruct the speaker 605C to point at the center of the back of the user's head.
[0038]
The different embodiments and aspects described herein contemplate adjusting the speaker placement to optimize the audio experience of one or more users. The use of "optimal" is not intended to imply that the speaker arrangement is the best possible arrangement; rather, an optimal or optimized arrangement is one that improves the user's experience relative to keeping the speaker fixed. In other words, the embodiments described herein use an actuator to change the direction in which the speaker is pointing and adapt to the current position of one or more users, thereby improving the listening experience.
[0039]
FIG. 7 illustrates a system 700 that identifies a path 715 along which to direct the movable speaker 105 based on face recognition. In some cases, it is desirable to generate a sound experience that simulates the movement of an object along a path. To do so, system 700 includes a user 701, the camera 115, the face recognition application 225, the actuator controller 230, and the speaker 105. The camera 115 captures image data including the user 701, and the data is transmitted to the face recognition application 225. Application 225 uses a face recognition algorithm that identifies a bounding box 705 defining the location of the user's face in 2D or 3D space. Of course, other face recognition algorithms may use different means than bounding box 705 to identify the user's face.
[0040]
The face recognition application 225 sends the coordinates of the bounding box 705 to the actuator controller 230. In this example, instead of using the coordinates to identify a point or region at which the speaker 105 points, the actuator controller 230 determines a path 715. By instructing the audio output of the speaker 105 to track the path 715, the speaker 105 can be used to simulate the sound generated by a moving sound source (e.g., a bird flying over the user 701, an aircraft, or a person running past the user 701). In one aspect, actuator controller 230 may determine path 715 in response to receiving a command from the audio system to simulate the sound generated by a moving sound source. For example, actuator controller 230 may wait until an audio controller (e.g., a movie or video game controller) sends an instruction to actuator controller 230 to determine a particular sound path 715. In one embodiment, the audio controller and actuator controller 230 are synchronized, so that as the actuator controller 230 moves the speaker 105 to track the path 715, the audio controller outputs a sound corresponding to the moving sound source. For example, as the output area 710 of the speaker 105 moves along the path 715, the speaker 105 outputs the sound of a bird singing.
[0041]
To determine path 715, the audio controller may signal to actuator controller 230 the type of movement of the sound or audio output. Although path 715 of FIG. 7 is linear, in other embodiments path 715 may include one or more curves, loops, and the like. For example, path 715 may simulate a bird circling around the head of the user 701 or a mosquito buzzing around the user's ear. According to this information, actuator controller 230 uses the coordinates of bounding box 705 to identify path 715. In the illustrated embodiment, the actuator controller 230 may use a first predetermined offset that identifies a first point in 3D space relative to the upper left corner of the bounding box 705 and a second predetermined offset that identifies a second point in 3D space relative to the upper right corner of the bounding box 705. The actuator controller 230 then generates the path 715 by drawing a line between the first and second points.
[0042]
Actuator controller 230 may calculate path 715 differently according to the sound to be simulated. For example, for a mosquito buzzing around the ear of user 701, actuator controller 230 uses the coordinates of bounding box 705 to estimate the position of the ear and uses a random number generator to determine a random path near the ear. Alternatively, actuator controller 230 may use a predetermined vertical offset that identifies a point above the head of user 701. The actuator controller 230 then calculates a circle centered at that point to use as the path 715. In this manner, actuator controller 230 may be configured to use different approaches to calculate path 715 in order to simulate different moving sound sources.
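Sketches of the two path styles described here, assuming the center point and ear estimate come from the bounding-box offsets discussed above; the radius, jitter, and step counts are illustrative assumptions:

```python
import math
import random

def circular_path(center, radius_m=0.4, steps=60):
    """Waypoints on a horizontal circle around a point above the head,
    e.g., to simulate a bird circling the user."""
    return [(center[0] + radius_m * math.cos(2 * math.pi * k / steps),
             center[1],
             center[2] + radius_m * math.sin(2 * math.pi * k / steps))
            for k in range(steps)]

def random_path(ear, jitter_m=0.1, steps=40):
    """Random waypoints near the estimated ear position, e.g., to
    simulate a mosquito buzzing around the user's ear."""
    return [tuple(c + random.uniform(-jitter_m, jitter_m) for c in ear)
            for _ in range(steps)]
```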
[0043]
System 700 may be used with audio/video presentations such as movies, television shows, video games, and the like. For example, system 700 may be installed in a theater to identify the locations of one or more users and provide a customized audio experience to each user or group. In one embodiment, system 700 includes a plurality of speakers 105 (e.g., a speaker for each user in a theater) with respective actuators that move the respective output areas 710 of the speakers 105 along different individual paths 715. Alternatively, multiple speakers 105 may be used to mimic different sound sources near the user 701. For example, one speaker 105 may track a path that mimics the sound of bullets and arrows passing by the user 701 with a humming sound, while another speaker 105 tracks a path that mimics the song of a bird passing over the user 701. System 700 may also be used for audio presentations without a corresponding video presentation. For example, system 700 may be used for an animatronic/puppet show, or to provide a more immersive environment for user 701 during the stage performance of a live actor.
[0044]
Although the description of the various embodiments has been presented for the purpose of illustration, it is not intended to be exhaustive or to limit the disclosed embodiments. It will be apparent to those skilled in the art that numerous modifications and variations can be made without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best describe the principles of the embodiments, their practical applications, and improvements over technologies found in the marketplace, or to enable others skilled in the art to understand the embodiments disclosed herein.
[0045]
In the above description, reference is made to the embodiments presented in the present disclosure. However, the scope of the present disclosure is not limited to the specifically described embodiments. Instead, any combination of the foregoing features and elements is contemplated for implementing and practicing the contemplated embodiments, regardless of whether it relates to a different embodiment. Furthermore, although the embodiments disclosed herein may achieve advantages over other possible solutions and the prior art, whether or not a particular advantage is achieved by a given embodiment does not limit the scope of the present disclosure. Thus, the described aspects, features, embodiments, and advantages are merely illustrative and are not considered elements or limitations of the appended claims except where explicitly recited in the claims.
[0046]
Aspects of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, microcode, etc.), or an embodiment combining software and hardware aspects, all of which may generally be referred to herein as a "circuit," "module," or "system."
[0047]
The present disclosure may be a system, a method, and/or a computer program product.
The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present disclosure.
[0048]
The computer readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of computer readable storage media includes: a portable computer diskette, a hard disk, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read only memory (CD-ROM), digital versatile disc (DVD), memory stick, floppy disk, a mechanically encoded device such as a punch card or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (e.g., light pulses passing through a fiber optic cable), or an electrical signal transmitted through a wire.
[0049]
The computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium, or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network, and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
[0050]
Computer readable program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present disclosure.
[0051]
Aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the present disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
[0052]
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
[0053]
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other device, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
[0054]
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure.
In this regard, each block of the flowcharts or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. Each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or by combinations of special purpose hardware and computer instructions.
[0055]
While the foregoing is directed to embodiments of the present disclosure, other and further embodiments of the disclosure may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.