Oracle Exadata Management
Deep Dive with Oracle Enterprise Manager 12c

Deba Chatterjee
Principal Product Manager, Oracle

Joe Cornell
Database Administrator, Land O'Lakes
Safe Harbor

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.
Program Agenda

- Exadata Component Monitoring
- Common Performance Issues
- Customer Experience from the Real World
  - Presentation by Land O'Lakes
Exadata Monitoring

- Database
- Storage Server
- Infiniband Network
- KVM, PDU, ILOM, Cisco Switch
Monitoring Architecture
How is the Exadata Database Machine monitored?

- OEM Agents with the Exadata Plug-in are deployed on each Compute Node.
- Storage Servers are monitored internally by their ILOMs and MS (Management Server).
- The Agent uses SSH and SNMP to monitor the Storage Servers.
- The Agent uses SSH to collect monitoring information from the IB switches.
- The Agent subscribes to SNMP traps to monitor the other DBM components such as ILOM, PDU, KVM, etc. (subscription sketch below).

[Architecture diagram: the Oracle Enterprise Manager 12c OMS talks to an Agent with the Exadata Plug-in on Compute Node #1 (database server); the Agent uses ssh and SNMP to monitor the Exadata Storage Servers, the Exadata Infiniband switches and network, and the other DBM devices (PDU, KVM, ILOM, Cisco switch).]
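The storage cells deliver their alerts to the monitoring agents as SNMP traps. As a rough sketch of how such a subscription is registered on a cell (the host name and port below are placeholders, not values from this presentation):

CellCLI> ALTER CELL snmpSubscriber = ((host='emagent01.example.com', port=3872, community='public'))

The agent then surfaces the incoming traps as the "Cell Generated Alert" metric events referenced later in this deck.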
Exadata Monitoring
Integrated view of Hardware and Software

- Hardware view
  - Schematic of cells, compute nodes and switches
  - Hardware component alerts
  - Integrated resource utilization views
- The Exadata Plug-in 12.1.0.4 release adds support for
  - SPARC Supercluster
  - Multi-Rack
  - Storage Expansion Rack
- Configuration view
  - Version summary and configuration information of all components
Exadata Monitoring
Storage Cell Management

- Storage Cell monitoring and administration support
  - Cell Home page and performance pages
  - Execute CellCLI commands on a set of cells or on all cells (see the dcli sketch below)
  - Performance and workload distribution charts help analyze performance contention
- Management by Cell Group
  - All cells used by a database are automatically placed in a group, e.g. the cellsys target
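Outside Enterprise Manager, the same "run a command against many cells" pattern is commonly scripted with dcli from a compute node. A minimal sketch, assuming a cell_group file that lists the storage cell host names (the file name and chosen metric are illustrative, not taken from this deck):

$ dcli -g ~/cell_group -l celladmin cellcli -e "list cell attributes name, status"
$ dcli -g ~/cell_group -l celladmin cellcli -e "list metriccurrent CL_CPUT"

The first command confirms every cell is online; the second pulls the current cell CPU utilization metric from each cell for a quick load comparison.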
Exadata Monitoring
Infiniband Network Management

- Infiniband network and switches are discovered as part of the Database Machine target
- Network home page and performance page
  - Real-time and historical usage information
- Topology view of the network with switch and port level details
Exadata Monitoring
Monitoring other hardware components

Common metrics monitored
- Power supply failure
- Fan failure
- Temperature out of range

Specific metrics monitored
- Cisco Switch
  - Configuration change tracking and reporting
  - Unauthorized SNMP access
- Keyboard, Video, Mouse (KVM) for X2
  - Server connected to KVM added/removed, powered on/off
Exadata Monitoring
Exadata Service Dashboard

- Service Dashboard for a single-pane-of-glass view of all Exadata components
- Out-of-the-box job for creating a dashboard to monitor performance and usage metrics for
  - Database Machine System
  - Database Machine components
  - Database Systems on Exadata
Exadata Configuration Monitoring
Health Check Plug-in

- The Exadata health check plug-in consumes the exachk output (an exachk invocation sketch follows)
- Evaluates the output against pre-defined health check templates
- Generates relevant alerts

[Flow diagram: Execute exachk (2.1.3 and above) -> execution output (XML files) -> EM 12c Agent with the Exadata Health Checks Plug-in -> metric evaluation in the OMS -> Console.]
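Running exachk itself remains a command-line step on a compute node; the plug-in only consumes the XML it produces. A minimal sketch, run from the directory where exachk is installed (the -a option runs the full set of checks):

$ ./exachk -a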
Exadata Configuration Monitoring
Health Check Plug-in Page

- Review the failures on the Health Check Plug-in page
- Sort by Status to easily detect the failures
Program Agenda

- Exadata Component Monitoring
- Common Performance Issues
- Customer Experience from the Real World
  - Presentation by Land O'Lakes
Classifying the Performance Problems
Types of performance problems

- Hardware problems
  - Network
  - Disk
- Software problems
  - SQL performance issues
  - Database system issues
Hardware Problems
Network Issues
Hardware Problems
Network Issues

- A bad port or a loose cable can impact the performance of the database
- Ports with errors are marked red
- Details of the problem can be found in Open Metric Events
Hardware Problems
Resolving Network Issues

- Perform Infiniband administration tasks to disable a bad port
- Other tasks that can be performed:
  - Enable Port
  - Clear Performance Counters
  - Clear Error Counters
Hardware Problems
Disk Issues
Hardware Problems
Types of Disk Issues

- Disk failures
- Over-utilized hard disks
- Under-utilized flash disks
Hardware Problems
Disk Failure: Cell Health

- Hard disk or flash disk failures lead to bad database performance
- Cell Health is determined by any open, critical, unsuppressed alerts from the "Cell Generated Alert" SNMP metric (a CellCLI cross-check sketch follows)
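The same alerts can be cross-checked directly on a cell. A rough CellCLI sketch (the filter on unexamined critical alerts is illustrative; ALERTHISTORY is the standard cell object):

CellCLI> LIST ALERTHISTORY WHERE severity = 'critical' AND examinedBy = '' DETAIL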
Hardware Problems
Disk Failure: Load Imbalance

- A bad disk causes I/O imbalance
- The load imbalance metric gives an indication of the percentage of maximum average I/O load from the cell disk
- Metric thresholds need to be set at the Storage Server level
Hardware Problems
Exadata System Health

Exadata System Health is computed using information collected from the network, disks and ASM:

- Network
  - Infiniband switch: degraded port / port with errors
- Disk
  - Disk failures
  - Configuration issues
  - Load imbalance
- Software Setup
  - ASM disk group issues
Exadata System Health
Integration with Database Performance Page

- Drill down from the database Performance page
- Provides a composite view of all health indicators
- The Week, Day or default 2-hour view can be used to analyze trends of the various issues
Over Utilization of Hard Disks
Cell Performance Page

- The Performance page of the Exadata Storage Server target provides real-time and historical utilization information
- Exadata Cell Utilization Limit Lines were introduced in Exadata Plug-in 12.1.0.5
- Helps to determine at what time of day the I/O bandwidth is exhausted
Hard Disks Utilized
Correlate with Database Workload Distribution

- Identify the database which caused the increased I/O usage
- Identify what caused the increased I/O activity
- Make sure a single database is not running away with all the I/O bandwidth
I/O Resource Utilization
Quiz: Do you see any problems with the I/O utilization pattern?

- One database is running away with all the I/O bandwidth
- How do you prevent this from happening?
IO Resource Management
Goal: Ensure I/O bandwidth to all databases

Exadata IORM
- Makes sure one database is not running away with all the I/O bandwidth
- Keeps disks well utilized
- Keeps I/O latencies low
- Prioritizes log writes and control file I/Os
- Controls how much disk bandwidth each DB, Category or Consumer Group uses

[Diagram: across databases, IORM applies an inter-database resource plan and a category resource plan; within one database, an intra-database resource plan.]
Common IORM Setup
Inter-database IORM

- Based on the name of the database initiating the request
- Useful when you need to manage I/O priorities across these databases
- Allocate I/O resources across databases by means of an IORM plan configured on each storage cell
- IORM plans should be identically configured on each storage cell

Example plan (DBM 60% at level 1, CRM 80% at level 2, OTHER 100% at level 3):

CellCLI> ALTER IORMPLAN -
         dbPlan = ((name=DBM, level=1, allocation=60), -
                   (name=CRM, level=2, allocation=80), -
                   (name=other, level=3, allocation=100))
Disk Objectives

- IORM distinguishes between small (less than 128 KB in size) and large I/O requests
- Low-latency OLTP-type requests are usually small requests
- High-throughput DW-type requests are usually large requests
- Comparing large (LG) and small (SM) I/O requests in the IORM metrics helps to determine the type of workload

Example "Day Time" plan allocations: OLTP 80, Reports 10, Low-Priority 10.

Objective         Description
LOW_LATENCY       For applications that are extremely sensitive to I/O latency; DW applications are impacted
HIGH_THROUGHPUT   Best possible throughput for DW transactions
BALANCED          Strikes a balance between low latency and high throughput
AUTO              IORM determines the best optimization objective
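The objective itself is a single setting per cell. A minimal sketch of choosing one and confirming it (the value auto here is just an example):

CellCLI> ALTER IORMPLAN objective = auto
CellCLI> LIST IORMPLAN DETAIL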
Setup IORM using EM 12c

Navigation: Exadata Storage Server > Administration > Manage IO Resource
Under Utilized Flash
Improve Flash Utilization

- Common DW problem scenario:
  - Hard disks are busy but flash is idle, because large reads issued by smart scans bypass the flash cache
- Solution: use flash for KEEP objects so large reads can be offloaded to flash. Execute the following steps:
  1. Run the I/O intensity report @?/rdbms/admin/spawrio (invocation sketch below)
  2. Ensure the total size of KEEP objects does not overwhelm the flash cache size
     - Be aware that the allowed KEEP size is restricted to 80% of the flash cache size
     - Target small tables with lots of reads for KEEP
  3. Mark each candidate table as KEEP
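Step 1 is an ordinary SQL*Plus script run; a minimal sketch (the script typically prompts for the DBID, instance and AWR snapshot range to report on):

SQL> @?/rdbms/admin/spawrio.sql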
Under Utilized Flash
I/O Intensity Report - Spawrio.sql
Under Utilized Flash
Evaluate Total Keep Size

- Analyze the flash cache utilization rate prior to marking objects KEEP
- Ensure that newly marked KEEP objects do not trump other critical workloads that are effectively utilizing the flash cache

From the previous example:

Id                         Type         Space GB   IO Intensity
EDW_ATS.TECS_PHC(P2011)    TABLE PART   67.8       1,284.9
EDW_ATS.TECS_PHC(P2011)    TABLE PART   67.8       1,284.9
EDW_ATS.ENTITY_ADDR        TABLE        83.6       408.1

Total KEEP size = 67.8 + 67.8 + 83.6 = 219.2 GB
Default Flash Cache size per cell: X3 = 1609.14 GB, X2 = 364.75 GB
Under Utilized Flash
Mark Objects as KEEP

Run the following SQL (a quick dictionary check is sketched after the statements):

ALTER TABLE TECS_PHC MODIFY PARTITION P2011 STORAGE (CELL_FLASH_CACHE KEEP);

ALTER TABLE ENTITY_ADDR STORAGE (CELL_FLASH_CACHE KEEP);
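To confirm the attribute took effect, the CELL_FLASH_CACHE setting can be read back from the data dictionary. A sketch, assuming the objects live in the connected schema:

SELECT table_name, cell_flash_cache
FROM   user_tables
WHERE  table_name = 'ENTITY_ADDR';

SELECT table_name, partition_name, cell_flash_cache
FROM   user_tab_partitions
WHERE  table_name = 'TECS_PHC' AND partition_name = 'P2011';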
Database Problems
SQL Performance Issues
Exadata Aware SQL Monitoring

- Real-time monitoring of application SQL
- I/O performance graphs with Exadata information (see the V$SQL sketch below)
  - Cell offload efficiency
  - Cell smart scan
- Rich metric data
  - CPU
  - I/O requests
  - I/O throughput
  - PGA usage
  - Temp usage
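SQL Monitor presents cell offload efficiency graphically; roughly the same information can be pulled from V$SQL, where the database records how many bytes were eligible for offload and how many actually crossed the interconnect. A sketch (interpreting the ratio of eligible to returned bytes as "offload efficiency" is one common approximation, not a definition from this deck):

SELECT sql_id,
       io_cell_offload_eligible_bytes  AS eligible_bytes,
       io_cell_offload_returned_bytes  AS returned_bytes,
       io_interconnect_bytes           AS interconnect_bytes
FROM   v$sql
WHERE  io_cell_offload_eligible_bytes > 0
ORDER  BY io_cell_offload_eligible_bytes DESC;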
Database Problems
Database System Issues
Database System Issue
Parallel Downgrades

- What do you see in the Parallel column?
- Use Parallel Queuing for consistent parallel execution
Database System Issue
Parallel Statement Queuing

- Enable by setting parallel_degree_policy = 'auto' (see the sketch below)
  - Automatic setting of DOP
  - Parallel statement queuing
- Availability
  - Introduced: 11.2.0.1
  - Integrated with Resource Manager: 11.2.0.2
- Objective
  - Run enough parallel statements to keep the system very busy
  - Queue any subsequent parallel statements to avoid DOP downgrades
- Configure by a Resource Plan
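A minimal sketch of enabling it system-wide (trialling it first with ALTER SESSION is a common, lower-risk alternative):

SQL> ALTER SYSTEM SET parallel_degree_policy = AUTO SCOPE=BOTH;

With this setting, statements queue once the number of active parallel servers reaches PARALLEL_SERVERS_TARGET instead of being downgraded.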
Summary
Top things to remember

- Cell Health Indicator
- Exadata System Health
- How to identify over-utilization of hard disks
- How to identify objects for Flash KEEP
- How to identify SQL and database system issues with SQL Monitoring
Program Agenda

- Exadata Component Monitoring
- Common Performance Issues
- Customer Experience from the Real World
  - Presentation by Land O'Lakes
Customer Case Study
Managing Exadata with EM12c
Joe Cornell
Database Administrator
Topics

- Who is Land O'Lakes?
- Land O'Lakes datacenter overview
- Monitoring infrastructure
- Challenges in managing Exadata
- How Land O'Lakes used Enterprise Manager to solve Exadata management challenges
- Best practice recommendations
Land O'Lakes, Inc. today

- ~10,000 employees
- 3 diversified businesses
- 3,200 direct producer-members and 1,000 member-cooperatives
- Annual revenue +$14 billion
- Serve +300,000 agricultural producers
- 300+ facilities in the U.S.
- Goal to double revenues and increase international growth in the next 10 years
Data Center Overview

- Formerly HP-UX (PA-RISC and IA64)
- Migrating to all Linux (OEK)
- 253 databases
- 74 database servers (17 Linux, 57 HP)
- Two Exadata X2-2 half racks
  - Production
  - Non-production
- Support JDE ERP, OTM, SOA, OBI EE Analytics
EM12c Monitoring Infrastructure

- OMS and Repository share a blade server
  - File System (GB): 719.61
  - Memory Size (MB): 48294
  - Address Length: 64-bit
  - Model: ProLiant BL460c G6
  - CPU: Intel(R) Xeon(R) CPU E5530 @ 2.40GHz
  - CPU: Sockets = 2 / Cores = 8
EM12c Monitoring Infrastructure - HA

- Onsite HA via a second blade
- No load balancer
  - Manually assign / move
  - Availability goal allows an hour of downtime
- Physical standby
  - Local
  - In Alpharetta for disaster recovery
- This set of three blades also supports our RMAN catalog
EM12c Monitoring Infrastructure - Targets

- 7868 targets in total
- By operating system:
  - Linux (7387)
  - HP-UX (124)
  - Windows (74)
Exadata Management Challenges

- PGA bug for OBI EE
- Lack of offloading OLTP systems
- SQL performance
PGA Bug

- In August 2012, we had an unexpected node eviction on our Exadata production system
- Symptoms were related to extremely fast consumption of memory by PGA
- Memory used was far in excess of the PGA_AGGREGATE_TARGET setting
- This was associated with several specific but critical OBIEE reports
EM12c versus PGA Bug

- We created a Metric Extension that sampled PGA allocations (see the sketch below)
- It alerted the on-call DBA if any session took over 10 GB
- DBAs were able to prevent several crashes by killing sessions before excessive memory was consumed
- This gave us time to find the issue and deploy a workaround
  - We disabled query rewrite since it was an acceptable workaround for bug 14574904 in our environment
  - A patch is available
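The deck does not show the Metric Extension query itself. As a rough sketch of the kind of SQL such an extension could sample (the column names come from the standard GV$ views; the 10 GB threshold mirrors the alert rule above):

-- sessions currently holding more than 10 GB of allocated PGA
SELECT s.inst_id, s.sid, s.serial#, s.username,
       ROUND(p.pga_alloc_mem / 1024 / 1024 / 1024, 2) AS pga_gb
FROM   gv$session s
       JOIN gv$process p
         ON p.addr = s.paddr AND p.inst_id = s.inst_id
WHERE  p.pga_alloc_mem > 10 * 1024 * 1024 * 1024;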
Exadata Performance

- SQL
- I/O System
- Repository Queries
SQL Performance on Exadata

- SQL Monitor shows the offloading percentage
- This is very useful in getting full use of all the features in Exadata
SQL Performance - Resources over time

- SQL Monitor can show when each resource was consumed
- This proved very useful in tracking down the PGA bug
Exadata I/O Performance

- The "All Metrics" page has some great charts that let you see where resources are being consumed
- We added a metric extension to collect flash cache usage
  - It uses cellcli to gather the information daily via an EM12c job (see the sketch below)
- We wanted to make sure the OLTP databases were getting all the flash cache they need
- We use SQL Developer to pull the data out for analysis
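The exact extension is not included in the deck. A rough sketch of the cellcli call such a job could wrap, assuming a cell_group file of storage cell host names (the attributes come from the standard FLASHCACHECONTENT object):

$ dcli -g ~/cell_group -l celladmin \
    cellcli -e "list flashcachecontent attributes dbUniqueName, cachedSize, cachedKeepSize"

Summing cachedSize and cachedKeepSize per dbUniqueName across all cells yields per-database figures like those in the table below.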
Exadata I/O - Flash Cache usage

- DBP1 is the OBI EE data warehouse
  - The DW does not need as much flash cache
- ENFM3PRD, FDJD1PRD and ENEV2PRD are OLTP databases
  - This shows they have all the cache they need
- Of the 2.5 TB available, 2.1 TB is typically in use
- This shows appropriate use of cache
  - If we run low, DW flash cache will be scaled back
Collection Day   DB Unique Name   Cache MB   Cache KEEP MB
20130910         DBFS             40697      0
20130910         DBP1             383858     206112
20130910         ENDW3PSB         196968     0
20130910         ENEV2PRD         43017      37628
20130910         ENFM3PRD         166100     0
20130910         FDJD1PRD         917284     0
20130910         OBJD1PRD         362619     569
Aggregate IOPS for Production Exadata

- We recently hit the expected maximum IOPS
Storage Cell CPU Consumption

- The I/O was evenly distributed across all 7 cells and nearly saturated CPU resources
Leveraging the EM12c Repository
Repository Queries for Exadata

- You can find several EM12c repository queries we developed to leverage EM12c data for managing Exadata at the following location:
  - https://www.dropbox.com/s/tug18zknn0drv3n/CON9852_Exadata_EM12c.zip
Best Practices

- OMS Heap
- Sharing the EM12c environment with other teams
OMS Heap Issue

- Started getting "Out of Memory" errors
- OMS performance got very slow
- MOS Note 794165.1 explains how to increase the memory available to the OMS
- When we increased memory from 1 GB to 4 GB, performance vastly improved
Sharing an EM12c Environment with others
Best Practices for Sharing

- Don't use the SYSMAN account for objects owned by either team
  - Create a shared superuser account to be used by each team
- Use groups
- Use names showing lifecycle, team ownership and target types; examples:
  - LOL_DBA_Production_Databases
  - LOL_MW_Nonproduction_SOA
- Use metric templates with names that mimic the groups
- Be nice :)
Questions?