Oracle’s RMAN (Recovery Manager) is the Oracle-recommended way to create and manage Oracle Database backups. It’s not always clear, however, which RMAN settings are ideal for a given environment. The trade-off depends on three factors: backup size, how much CPU you are willing to dedicate to the RMAN backup, and how long you can let it run.
Test System Configuration
Although these RMAN tests were run in an Exadata environment, the results are applicable to any well-tuned Oracle server since almost all RMAN operations occur at the database layer instead of the storage layer (cell servers):
- Exadata X6-8:
  - All cores enabled on 2 servers (288 total available)
  - 14 storage servers
  - DATA disk group, high redundancy
  - RECO disk group, normal redundancy
  - All backups from DATA => RECO
- Source database:
  - 2.3 TB total size
  - ~5 application users
  - Single large application schema with most tables/indexes in a single BIGFILE tablespace
Only two non-default RMAN settings are changed between these tests:
CONFIGURE DEVICE TYPE DISK PARALLELISM 32 BACKUP TYPE TO COMPRESSED BACKUPSET;
CONFIGURE COMPRESSION ALGORITHM 'low' OPTIMIZE FOR LOAD TRUE AS OF RELEASE 'default';
The PARALLELISM parameter varies from 8 to 64, and the COMPRESSED keyword is omitted for the uncompressed backup test. The BACKUP command looks like this:
BACKUP SECTION SIZE 32G PLUGGABLE DATABASE DEV;
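Putting those two pieces together, a single test iteration looks roughly like the following RMAN session (PARALLELISM 32 with 'medium' compression is just one of the tested combinations; adjust to the combination under test):

```
CONFIGURE DEVICE TYPE DISK PARALLELISM 32 BACKUP TYPE TO COMPRESSED BACKUPSET;
CONFIGURE COMPRESSION ALGORITHM 'medium' OPTIMIZE FOR LOAD TRUE AS OF RELEASE 'default';
BACKUP SECTION SIZE 32G PLUGGABLE DATABASE DEV;
```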
With BIGFILE tablespaces, the SECTION SIZE parameter is critical — it allows multiple channels to back up a single datafile (every BIGFILE tablespace is one datafile). For all of these tests, I used the standard calculation recommended by Oracle to maximize throughput of an RMAN backup of a BIGFILE tablespace. With the size of the largest BIGFILE tablespace, I used this calculation:
SECTION_SIZE = BIGFILE_SIZE / #CHANNELS / 2
This ensures that each channel will be used twice for a subset of the largest BIGFILE tablespace.
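As a quick sketch of that arithmetic (the 2 TB tablespace size and the helper name are illustrative, not measured values from the tests):

```python
def section_size_gb(bigfile_size_gb: float, channels: int) -> float:
    """SECTION_SIZE = BIGFILE_SIZE / #CHANNELS / 2 -- sized so each
    channel processes two sections of the largest BIGFILE tablespace."""
    return bigfile_size_gb / channels / 2

# A 2 TB (2048 GB) BIGFILE tablespace backed up with 32 channels:
print(section_size_gb(2048, 32))  # -> 32.0, i.e. SECTION SIZE 32G
```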
A few assumptions were made when testing RMAN; for example, no extensive testing was done with BACKUP AS COPY, but its performance characteristics are almost identical to those of uncompressed RMAN BACKUPSETs. The tests varied two independent variables:
- Number of RMAN channels (DOP): 8, 16, 32, 64
- Compression level: none, basic, low, medium, high

and measured three dependent variables:
- CPU seconds used
- Elapsed time
- Backup size
The dependent variables change based on the combination of the two independent variables (compression level and DOP), so the graphs are 3-D to accurately reflect the relationship between the independent and dependent variables.
The following graphs show the results when backing up the 2.3 TB database with every combination of DOP and compression type.
The first test reflects how many total CPU seconds were used by each combination of DOP and compression type:
The one obvious takeaway is that “basic” compression (the default) carries far more CPU overhead than the other methods. The CPU seconds used are proportional to the number of CPUs you allocate to the backup, which is expected. But how “bad” is “basic” compression and how “good” are the others? Read on.
If you have plenty of disk space, you might be able to avoid much compression and avoid dedicating too many CPUs to an RMAN backup:
When looking at compression levels, it’s clear that a higher DOP doesn’t reduce the backup size; the backup just finishes faster. The “basic” compression type seems to do a good job — but at what cost? Read on.
For many customers, elapsed time is the most critical factor when configuring RMAN. You don’t want to run RMAN during any ETL jobs nor do you want to impact report batches or ad-hoc usage during the day:
As you might expect, elapsed time is higher when you use a lower DOP, but it appears that the “high” compression type becomes less efficient at lower DOPs with a larger SECTION SIZE.
The term “compression efficiency” tells us how much compression we get per CPU second. The smaller compressed size of an RMAN backup comes at an increasingly higher cost (diminishing returns).
The results are comparable at any DOP: some compression methods are more efficient than others and the previous charts hinted at that. The “low” compression method seems to provide the best compression per CPU second. The “basic” (default) compression method is eclipsed in inefficiency only by “high”, but if you want the absolute smallest backup size, “high” is the way to go (but it sure seems not to be worth it!).
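To make “compression per CPU second” concrete, it can be computed directly from the measured metrics. The numbers below are hypothetical placeholders to show the shape of the comparison, not the actual test results:

```python
def compression_efficiency(uncompressed_gb: float,
                           compressed_gb: float,
                           cpu_seconds: float) -> float:
    """GB of backup size saved per CPU second spent compressing."""
    return (uncompressed_gb - compressed_gb) / cpu_seconds

# Hypothetical: 'high' saves somewhat more space than 'low', but burns
# many times the CPU, so its GB-saved-per-CPU-second figure is far lower.
low_eff = compression_efficiency(2300, 900, 1000)
high_eff = compression_efficiency(2300, 700, 8000)
print(low_eff > high_eff)  # 'low' wins on efficiency
```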
Restore and Recover
Ideally, you will only ever back up your database and never have to restore it. You do, however, want to be ready if the worst happens, so you should test a full database restore and recovery every 3-6 months; the time it takes to restore from an RMAN backup therefore matters. In this set of tests, not all RMAN backups were restored, but with the default OPTIMIZE FOR LOAD TRUE setting the expectation is that a restore operation will take less time than the corresponding backup. Across the board, this held true: with “low” compression, a backup that took 5 minutes to create used about the same amount of CPU and took about 3 minutes and 30 seconds to restore and recover the database.
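A periodic restore test of the same pluggable database could be sketched like this in RMAN (the PDB name DEV comes from the earlier BACKUP command; whether you restore in place or into a scratch environment depends on your setup):

```
RUN {
  # Restore the PDB's datafiles from the most recent backup...
  RESTORE PLUGGABLE DATABASE DEV;
  # ...then apply redo to bring the PDB to a consistent point.
  RECOVER PLUGGABLE DATABASE DEV;
}
# Open the PDB once restore and recovery complete.
ALTER PLUGGABLE DATABASE DEV OPEN;
```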
A few conclusions can be drawn from these tests.
First, use as many CPUs as you can to back up your database — elapsed time drops almost linearly as you add channels, while total CPU time and backup size stay roughly constant for a given compression level. With newer Intel Xeon processors, the compression capabilities get even better, with some of those capabilities now in hardware at the microcode level.
Second, the “free” compression level of “basic” (without the Advanced Compression license) is free for a reason — it’s not very efficient. If you have the Advanced Compression license, use “low”. It gives you the best “bang for the buck” when considering the amount of CPU used to compress X bytes down to Y bytes.