Tuesday, July 7, 2009

How to reduce Oracle infrastructure costs (and keep your job)

How to reduce Oracle infrastructure costs (and keep your job)

A step-by-step process, with expert advice, tips and tricks for reducing IT Infrastructure costs.
By Robert Geier (Robert.Geier@ContractOracle.com)

If you have ever had to negotiate with a builder you may have heard the saying "I can build it Fast, Good, Cheap, choose two.". The same rule applies to IT projects. When the economy is growing companies are competing for market share and IT Managers and DBAs will be told "build it Fast and Good". As a result, DBAs focus on build time, build quality, and uptime, allowing unnecessary expenses and waste to creep into the systsm. When the economy is shrinking companies focus on cost cuts and efficiency and DBAs will be told "now make it Cheap".

In the current economy the best way to keep your job is to be smarter than the DBA sitting next to you, and be proactive about cost reduction. Your first thought may be that your systems are well run, and there is no room for improvement, but if you step back and look at the efficiency, you may be disappointed. In the past I worked for a large multinational company with a good reputation that hired the best DBAs, and bought the best servers. They followed industry "best practice" for maximum uptime, but when I did an efficiency survey across their systems the waste became obvious. For example :-

* they bought the best servers for DEV, TEST, PROD, DR, but the average CPU usage on the servers was only 5%. (95% wasted hardware, Unix SA, Data center ...)
* they bought the best SAN storage, but they kept 10% free space in the filesystem, 20% free space in tablespaces, and 10% free space in blocks. (40% wasted SAN, SAN Admin ...)
* they didn't trust autoextend, so they monitored millions of datafiles across thousands of databases, and manually extended them. (Many man-years of expensive DBA wasted.)

Of course there is no way to get to 100% hardware efficiency without impacting system availability, but in a large company even a 1% improvement can save millions of dollars and justify your salary. There are lots of ways for a smart DBA to reduce IT costs, and sometimes even small changes in policy can have large benefits. Just a simple change like turning on autoextend could reduce disk usage by 20% and save thousands of hours of DBA time, and millions of dollars a year. A simple rule like no new server purchases until average CPU usage reaches 50% can save many millions of dollars. Following is a rough plan along with a list of easy wins that I have used on previous cost reduction projects that may also help you :-

1. Start with a full accurate inventory and centralised monitoring. List all databases, servers, disk, SLA, owner, and metrics including disk usage, maximum and average cpu usage, cost, depreciation etc. Talk to your manager and your accountant to understand your costs and get a better idea of how they can help you. Ask your accountant what the company policy is on hardware depreciation, and leasing vs purchase. Identify cost reduction opportunities, document them, and set up a project plan with reasonable timelines for delivery. Start with easy wins, and keep the project timeline short, and the goals achievable.
2. Talk to your users. Be honest about the costs and SLA requirements for each database and suggest ways they can help you to reduce costs. Explain to developers how their decisions impact costs (poor sizing estimates, unused development systems, poorly written code, old data kept in the database etc). Consider moving low priority data to cheap servers, storage, network, data centers, and reduce frequency of backup and monitoring. Consider if DR, backup, or archival are needed based on the SLA.
3. Compare the costs of different Oracle licensing models. Options include CPU/Core vs Named User vs Site. Compare different CPU models and processor core pricing factors. Remember that license costs are negotiable, so talk to your Oracle Sales rep.
4. Consolidate databases. Patch all databases to the same level, standardise the characterset, and use DNS alias and DB Service for easier service relocation (not IP, hostname, SID). Use NCHAR instead of CHAR fields to avoid characterset conversion issues. Audit for security before consolidating, and avoid grants to PUBLIC or granting DBA or ANY roles (select any table etc). Consolidation means fewer sets of SYSTEM, TEMP, UNDO, REDO, ORACLE_HOME, and more efficient use of memory. The result is reduced disk space, tape space, network traffic, license and support costs, upgrades, patching, and DBA work. There is also potential for better performance by direct data access, and lower system integration costs. Try to combine applications with similar SLA and get signoff from the application owners that they understand the SLA. Stop adding more applications to the database before it becomes impossible to arrange downtime. If a user requests a new database, ask why it can't be just a new schema in an existing database.
5. Consolidate servers. If you have databases which cannot be consolidated (due to vendor support, version requirements, security etc), move them onto the same server. This can result in reduced hardware costs, and reduced license and support fees for databases, monitoring, filesystem, and backup software.
6. Use N+1 Active/Passive clusters instead of 1+1 clusters. Less wasted hardware.
7. Use Active/Active RAC clusters instead of Active/Passive clusters. Less wasted hardware and higher uptime, but be aware that the shared datafiles are still a single point of failure.
8. Compress large tables and indexes. Up to 80% reduction in disk space and tape usage, with lower network traffic and potential for performance improvement due to fewer blocks to read.
9. Use large ASM disk groups instead of many small filesystems. More efficient use of disk space, striping and mirroring, fewer hot spots, possible to grow and shrink storage with no database downtime, and no expensive filesystem. ASM is not just for RAC.
10. Turn on autoextend for permanent tablespaces. Monitor filesystem space and file MAXSIZE. Use ASSM, and limit file INCREMENT_BY to reduce transaction wait times during file growth. Avoid autoextend on UNDO, TEMP tablespaces to limit runaway transactions. Turning on autoextend can result in a 20% reduction in disk space, lower network usage, less manual work by DBAs, with minimal performance impact. Every GB saved on PROD can be multiplied by disk mirrors, DEV, TEST, and DR environments.
11. Use compressed database and archivelog backups. This will greatly reduce tape and/or disk usage, and potentially result in faster backups due to lower network traffic to tape unit and fewer blocks to write. Consider backup from standby to reduce load on the primary database.
12. Use incremental backups with change tracking logs. No need to backup unchanged blocks every day, means reduced tape and/or disk usage.
13. Use larger block sizes for larger segments. More efficient reads, less wasted disk space, larger files, less manual work by DBAs.
14. Set PCT_FREE 0 for WORM tables. Save 10% of disk space, tape space, network traffic. (WORM = Write Once Read Many - only inserts, no updates or deletes)
15. Delete old data or archive to cheap storage. Use partitions and/or transportable tablespaces for easy removal.
16. Set inactive tablespaces read only. Read only tablespaces can be on cheaper disk, are easily transportable, no need to replicate, no need to restore to dev and test, no need to backup every day.
17. Use Dataguard or Streams instead of SAN level replication. Dataguard can easily switch the direction of replication, and streams can be Active/Active multi-master for more efficient use of DR hardware.
18. Avoid keeping a backup on disk. Dataguard failover is much faster than database restore.
19. Frequently backup and delete archivelogs to reduce disk space. If you do need to keep a copy on disk, a compressed backup piece is more efficient than uncompressed archive logs.
20. Clone the ORACLE_HOME instead of installing every time. Save time on install, patching, upgrades and be certain that the builds are identical.
21. Use DBCA templates to reduce DBA workload when creating databases. Save time on post install tasks like setup of monitoring users security, and be certain builds are identical.
22. Turn on auditing to identify inactive users, then lock and drop them. Potential to reduce security risks, identify data to drop, and reduce named user license costs.
23. Monitor or audit for inactive tables and indexes to be dropped. Dropping unused segments can lead to disk and tape savings and potential performance improvements on inserts, updates, deletes.
24. Empty the recycle bin. The recycle bin is useful for restoring an accidentally dropped table, but not forever.
25. Use profiles. Limit the number of connections a user can have to the database, or the amount of CPU they can use. Also useful for enforcing password complexity and expiry.
26. Reorganize sparse tables and rebuild indexes to lower high water mark. Reduce disk usage and improve performance.
27. Use external tables instead of permanent staging tables. Less disk space needed, less redo/archive log to replicate and backup, and fewer disk writes.
28. Use global temporary tables for temporary data. Transactions are not written to permanent datafiles, are not written to redo/archive, and are not backed up or replicated.
29. Use nologging and bulk transactions. Reduce archivelogs on disk and tape.
30. Move less data. Instead of moving data around and having copies in every database try keeping it in a single location and access it via Database Links, Heterogeneous Services, SOA, ODBC, JDBC, or external tables.
31. Reduce processing peaks. Reduces hardware, improves performance, and reduces licence costs. Reduce the frequency of scheduled jobs, reduce the size of peak period transactions and increase the size of off-peak transactions using streams, materialized views, parallel processing etc.
32. Move low SLA processing like batch jobs off high SLA databases. Use Logical Dataguard, Streams, or materialised views for near-realtime data on low SLA machines.
33. Identify high resource transactions and tune them (top 10 CPU, READS, SORTS, REDO etc)
34. Identify which applications only need Oracle Standard Edition. Considerable license and support cost savings are possible if you don't need Enterprise Edition options like partitioning.
35. Virtualize DEV and TEST servers. This can be done either at server level (VM) or disk level (disk snaps). Make more efficient use of servers and storage.
36. Use cheaper PC software. E.g Linux, Cygwin, SQL Developer instead of Windows, Exceed, Toad.
37. Just in time provisioning. Why buy the production server and software licences on day 1 of a 2 year development project ? Why have 2 years of disk growth online ? Delay the purchase, lower your costs, and lower your workload. Read and understand the software licence, and if you have any doubt follow up with the vendor or company counsel.
38. Standardise on one monitoring product. Reduce software license costs, monitoring servers, email alerts, and server CPU usage.
39. Implement centralised password authentication. If a user only has one username and password for every PC, Server, Database, or Application login, think about how many thousands of Helpdesk calls for changing passwords can be avoided. If a person leaves the company, only one account needs to be locked.
40. End with accurate metrics listing databases, servers, disk usage, maximum and average cpu usage, costs etc. Calculate the cost savings, make sure your boss knows about them, and start again. You may not have reduced the total spend, but you should at least have lowered the growth rate.

It does take time for the cost reduction to show, so start ASAP, identify the easiest wins, add the improvements back into the new build standards and monitoring systems, then repeat the cycle. Remember that it is easy to plan cost reductions, but change introduces risk and fear, and getting agreement from the developers, users, application owners, and other DBAs will take persuasion and coercion. Other employees may be more concerned with uptime than cost reduction, and you won't get agreement for every change you propose, so choose your battles carefully. Put together a cost/benefit proposal for your manager detailing estimated savings, along with risks, and if you can convince them, their political support will be invaluable for removing roadblocks.

The methods of cost reduction listed in this document are only examples and may not work in every company, but I hope they will start you on the right path. Feedback is welcome.