We did some diagnostics on the databases, and worked out that only about 10% of data blocks in a clone were ever updated, so deduplicating at filesystem or storage block level could result in 90% saving. The client was a large conservative bank, so the preference was to use existing products from large vendors. We evaluated products from Delphix, NetApp SnapManager for Oracle, and Oracle ZFS appliance, and ended up choosing the NetApp product due to a combination of functionality, cost, and in-house skills.
Since then I have been wondering if we could have done it cheaper, and have done a quick evaluation of the BTRFS filesystem which does allow deduplication at filesystem level.
BTRFS is a "copy on write" filesystem and is listed as EXPERIMENTAL, so I don't think it is ready to use in a PROD environment, but is worth testing to support DEV and TEST environments depending on the cost of downtime.
You can find more information here :- http://btrfs.wiki.kernel.org
The following test was done using Oracle VirtualBox running Oracle Linux 6 2.6.39-300.28.1.el6uek.x86_64. BTRFS is included in the Linux kernel, but I needed to install package btrfs-progs :-
yum install btrfs-progs
Format the storage device using mkfs.btrfs :-
[root@btrfs ~]# mkfs.btrfs /dev/sdb WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL WARNING! - see http://btrfs.wiki.kernel.org before using fs created label (null) on /dev/sdb nodesize 4096 leafsize 4096 sectorsize 4096 size 12.00GB Btrfs Btrfs v0.19
You can see that we formatted a 12GB device with devault 4K block size. Block size can impact performance, so if the BTRFS filesystem will only contain database files with 8K blocks, I suspect you would see better performance if the filesystem was also formatted to use 8K blocks.
Now mount the filesystem :-
[root@btrfs ~]# mount /dev/sdb /btrfs1
And add an entry to /etc/fstab :-
/dev/sdb /btrfs1 btrfs defaults 0 0
Using the "btrfs filesystem show" and "btrfs filesystem df" commands we can see the usage. Note that the filesystem has reserved 2GB out of 12GB.
[root@btrfs /]# btrfs filesystem show /dev/sdb
Label: none uuid: f6f7d6c7-14ca-4c01-8c33-3066fc7f4ad3
Total devices 1 FS bytes used 92.00KB
devid 1 size 12.00GB used 2.04GB path /dev/sdb
Btrfs Btrfs v0.19
[root@btrfs /]# btrfs filesystem df /btrfs1
Data: total=8.00MB, used=64.00KB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.00GB, used=24.00KB
Metadata: total=8.00MB, used=0.00
We can also see that usage at OS level.
[root@btrfs /]# df -k /btrfs1
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sdb 12582912 120 10457024 1% /btrfs1
Create some directories for the Oracle binaries and datafiles :-
[root@btrfs ~]# cd /btrfs1
[root@btrfs u01]# mkdir app
[root@btrfs u01]# mkdir data
[root@btrfs u01]# chown oracle:dba app data
And now we can start using some of the BTRFS functionality. Create subvolumes for the source database binaries and datafiles. BTRFS allows us to take snapshots of the subvolumes at a later time, so it is best to plan subvolume layout before installing files.
Create subvolumes using "btrfs subvolume create" :-
Create subvolumes using "btrfs subvolume create" :-
[root@btrfs btrfs1]# su - oracle
[oracle@btrfs app]$ cd /btrfs1/app
[oracle@btrfs app]$ btrfs subvolume create source
Create subvolume './source'
[oracle@btrfs app]$ cd /btrfs1/data
[oracle@btrfs data]$ btrfs subvolume create source
Create subvolume './source'
Install the Oracle binaries and database in the subvolumes.
Oracle binaries in /btrfs1/app/source and create database "source" in /btrfs1/data/source.
Oracle binaries in /btrfs1/app/source and create database "source" in /btrfs1/data/source.
This install used about 5.6GB of disk.
[oracle@btrfs source]$ df /btrfs1
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sdb 12582912 5661424 4938480 54% /btrfs1
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sdb 12582912 5661424 4938480 54% /btrfs1
Database source currently has 1.1GB of datafiles, using total 1.3GB in the filesystem.
SQL> select sum(bytes)/1024/1024/1024 from dba_data_files;
SUM(BYTES)/1024/1024/1024
-------------------------
1.11816406
[oracle@btrfs source]$ pwd
/btrfs1/data/source
[oracle@btrfs source]$ du -k .
1346188 .
[oracle@btrfs source]$ ls
control01.ctl redo01.log redo03.log system01.dbf undotbs01.dbf
control02.ctl redo02.log sysaux01.dbf temp01.dbf users01.dbf
If we want to take a cold backup of the source database before we run some tests, all we need to do is shutdown and create a snapshot. First check how much space is available in the filesystem and how much is used by the source database.
[oracle@btrfs data]$ pwd
/btrfs1/data
[oracle@btrfs data]$ df -k .
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sdb 12582912 5661820 4938124 54% /btrfs1
[oracle@btrfs data]$ du -sk *
1346188 source
Create the snapshot using "btrfs subvolume snapshot".[oracle@btrfs data]$ btrfs subvolume snapshot source backup1 Create a snapshot of 'source' in './backup1'
And we can see that the snapshot did not use any additional disk, but we now have a second set of datafiles in directory backup1. [oracle@btrfs data]$ df -k . Filesystem 1K-blocks Used Available Use% Mounted on /dev/sdb 12582912 5661820 4938124 54% /btrfs1 [oracle@btrfs data]$ du -sk * 1346188 source
1346188 backup1
We can now start the database and run our tests, knowing we have a backup available. Because BTRFS is a "copy on write" filesystem, any block that is changed in the source database will be copied to a new location, and the original left in place as part of the backup1 snapshot. Since the binaries are also installed in a BTRFS subvolume, we could also take a snapshot of them in case we wanted to test patching or relinking.
If we needed additional test environments we can use the files from a snapshot, recreate the controlfile, and open resetlogs.
To create a new clone database called "target", take a snapshot :-
[oracle@btrfs data]$ btrfs subvolume snapshot source target
Create a snapshot of 'source' in './target'
Then create the parameter file inittarget.ora, audit directories, and script to recreate the controlfile.
[oracle@btrfs data]$ export ORACLE_SID=target
[oracle@btrfs data]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.1.0 Production on Thu Feb 14 11:44:17 2013
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to an idle instance.
SQL> @/home/oracle/recrctl_target.sql
ORACLE instance started.
Total System Global Area 1570009088 bytes
Fixed Size 2213696 bytes
Variable Size 922749120 bytes
Database Buffers 637534208 bytes
Redo Buffers 7512064 bytes
Control file created.
Database altered.
Tablespace altered.
SQL> select name from v$database;
NAME
---------
TARGET
SQL> select name from v$datafile;
NAME
-----------------------------------------------------------------
/btrfs1/data/target/system01.dbf
/btrfs1/data/target/sysaux01.dbf
/btrfs1/data/target/undotbs01.dbf
/btrfs1/data/target/users01.dbf
At the OS level we can see both databases running, against their own files, with very little additional storage consumed.
[oracle@btrfs data]$ ps -ef | grep pmon
oracle 5581 1 0 11:44 ? 00:00:00 ora_pmon_target
oracle 5708 1 0 11:42 ? 00:00:00 ora_pmon_source
[oracle@btrfs data]$ df /btrfs1
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sdb 12582912 5792420 4808972 55% /btrfs1
[oracle@btrfs data]$ du -sk *
1346188 source
1346188 backup1
1346828 target
The result :-
Using BTRFS snapshots, we now have two running databases, and one cold backup, with deduplication at filesystem block level. If we had created the backup and clone in a normal filesystem (ext3 etc) we would have used almost 2.7GB of disk, but with deduplication we have only used about 0.1GB. We now have 4.8GB free on the filesystem, so could create many more databases to keep the developers happy without making the accountants unhappy. Disk usage will increase as blocks are updated, but unless 100% of the blocks are updated we will still see an advantage from deduplication.
Obviously this was a small example to demonstrate the use of BTRFS snapshots and deduplication, but the same will apply with large databases, with considerable savings. In the example above we shutdown the source database to take a snap, but if the database is in archivelog mode we could just take a hot backup, or we could run a Dataguard Standby, and defer recovery to take a snapshot. My tests were only run on a small VM which is not suitable for performance testing, so additional testing on physical hardware would be useful if database performance is important.
Using BTRFS snapshots, we now have two running databases, and one cold backup, with deduplication at filesystem block level. If we had created the backup and clone in a normal filesystem (ext3 etc) we would have used almost 2.7GB of disk, but with deduplication we have only used about 0.1GB. We now have 4.8GB free on the filesystem, so could create many more databases to keep the developers happy without making the accountants unhappy. Disk usage will increase as blocks are updated, but unless 100% of the blocks are updated we will still see an advantage from deduplication.
Obviously this was a small example to demonstrate the use of BTRFS snapshots and deduplication, but the same will apply with large databases, with considerable savings. In the example above we shutdown the source database to take a snap, but if the database is in archivelog mode we could just take a hot backup, or we could run a Dataguard Standby, and defer recovery to take a snapshot. My tests were only run on a small VM which is not suitable for performance testing, so additional testing on physical hardware would be useful if database performance is important.
The benefits of block level deduplication are obvious, with significant cost savings on storage, but we should also consider the downside. There is an overhead to deduplication, so database users are likely to see a degrade in performance. It should also be noted that BTRFS is currently listed as experimental, so a bit more work needs to be done before it is ready for important systems, but it is worth testing to see what is possible. In addition to the deduplication features offered by BTRFS, the filesystem also supports compression which would further reduce disk usage.
No comments:
Post a Comment