HBase Performance - 2: Compression

Compression in HBase

Disk performance is always a big issue in a Hadoop cluster. Because you are working with a tremendous amount of data, you need a tremendous amount of disk space with good I/O performance. Compressed data needs less disk space, so less data has to be read from and written to disk, which may improve read and write performance. It may also reduce performance, because more time is spent on compression and decompression (CPU load increases). In HBase, however, Snappy compression had a positive effect on performance in the tests below.
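
The space versus CPU trade-off can be seen in a small sketch. It uses Python's standard zlib module (the DEFLATE algorithm behind Gzip) only as a stand-in, because Snappy and LZO are not in the standard library. The sample data and the measure helper are hypothetical, and the timings will differ from machine to machine.

```python
import time
import zlib


def measure(data: bytes, level: int) -> dict:
    """Compress and decompress `data` once; report size and elapsed time."""
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    compress_s = time.perf_counter() - start

    start = time.perf_counter()
    restored = zlib.decompress(packed)
    decompress_s = time.perf_counter() - start

    assert restored == data  # compression must be lossless
    return {
        "level": level,
        "size": len(packed),
        "ratio": len(data) / len(packed),
        "compress_s": compress_s,
        "decompress_s": decompress_s,
    }


# Hypothetical sample: repetitive, row-like text loosely resembling key/value cells.
sample = b"".join(
    b"row%08d,cf1:col%d,value-%d\n" % (i, i % 10, i % 97) for i in range(50000)
)

# Level 1 favours speed, level 9 favours compression ratio.
results = [measure(sample, level) for level in (1, 6, 9)]

for r in results:
    print(
        f"level {r['level']}: {len(sample)} -> {r['size']} bytes "
        f"(ratio {r['ratio']:.1f}x), "
        f"compress {r['compress_s'] * 1000:.1f} ms, "
        f"decompress {r['decompress_s'] * 1000:.1f} ms"
    )
```

Higher levels usually give smaller output at the cost of more CPU time, which is the same trade-off that separates Gzip from the faster codecs.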

Basic types of compression in HBase
• Snappy
• LZO
• Gzip

Gzip - is expensive in terms of CPU resources. It is suited to data that is not frequently accessed.
LZO and Snappy - are good for data that is accessed frequently, because they trade some compression ratio for a much lower CPU cost.

Performance result with Snappy Compression
Case - 0
Map-only job (no reducer).
Job - Read the data from HBase and store it in HDFS using Pig.
The 27.9 GB of data present in HBase was exported and stored in another table with Snappy compression.


Metric                           | Normal   | Snappy
Online Regions                   | 128      | 31
No. of Mappers Running at a Time | 34       | 34
Job Time                         |          |
Average Mapper Time              | 34 sec   | 1 min 15 sec
Max Mapper Time                  | 55 sec   | 1 min 38 sec
No. of Rows                      | 12348704 | 12348704
No. of Versions Configured       | 10000    | 10000
Size in HDFS                     | 27.9 GB  | 5.0 GB
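
The size figures for Case 0 translate into a compression ratio and a space saving. The short sketch below does only that arithmetic; the function name compression_stats is made up for illustration.

```python
def compression_stats(uncompressed_gb: float, compressed_gb: float) -> tuple:
    """Return (compression ratio, fraction of disk space saved)."""
    ratio = uncompressed_gb / compressed_gb
    saved = 1 - compressed_gb / uncompressed_gb
    return ratio, saved


# Sizes in HDFS reported for Case 0: 27.9 GB uncompressed, 5.0 GB with Snappy.
ratio, saved = compression_stats(27.9, 5.0)

print(f"compression ratio: {ratio:.2f}x")  # compression ratio: 5.58x
print(f"disk space saved: {saved:.1%}")    # disk space saved: 82.1%
```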



Case - 1
Note - for 100,000 rows (with 10 columns and 2 column families)
In Snappy - increased versions (1K) for 5 columns
In Normal - increased versions (1K) for 2 columns

Metric              | Normal for 2 Columns | Snappy for 5 Columns | Snappy for 2 Columns
Regions             | 199                  | 46                   | 35
Job Time            | 2 min 37 sec         | 1 min 41 sec         | 1 min 36 sec
Max Mapper Time     | 36 sec               | 1 min 11 sec         | 1 min 32 sec
Average Mapper Time | 23 sec               | 51 sec               | 1 min 7 sec

Result - The average mapper time with Snappy increases, but because there are fewer regions, fewer mappers are needed than in the normal case, so compression has an overall positive impact on job time.
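
A rough way to check this result is to multiply the number of regions by the average mapper time. This is a sketch that assumes one mapper per region, as the result implies, and that every mapper takes the average time. The name total_mapper_seconds is hypothetical.

```python
# Case 1 figures from the table above: (regions, average mapper time in seconds).
case_1 = {
    "Normal, 2 columns": (199, 23),
    "Snappy, 5 columns": (46, 51),
    "Snappy, 2 columns": (35, 67),
}


def total_mapper_seconds(regions: int, avg_mapper_s: int) -> int:
    # Assumption: one mapper per region, each running for the average mapper time.
    return regions * avg_mapper_s


work = {name: total_mapper_seconds(*figures) for name, figures in case_1.items()}

for name, seconds in work.items():
    print(f"{name}: {seconds} mapper-seconds")
# Normal, 2 columns: 4577 mapper-seconds
# Snappy, 5 columns: 2346 mapper-seconds
# Snappy, 2 columns: 2345 mapper-seconds
```

Under these assumptions, the Snappy tables need about half the total mapper time, even though each individual mapper runs longer.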


Case - 2
5K versions for 24,000 keys



Metric              | Normal 2 Columns | Snappy 2 Columns
Regions             | 252              | 41
Job Time            | 3 min 7 sec      | 1 min 47 sec
Max Mapper Time     | 35 sec           | 1 min 22 sec
Average Mapper Time | 22 sec           | 58 sec
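
The Case 2 job times can be compared with a simple "wave" estimate: mappers run in batches limited by the number of concurrent slots, and each batch takes roughly the average mapper time. This is only a sketch. It assumes one mapper per region and the same 34 concurrent mappers reported for Case 0, which was not stated for this case. The name estimated_job_seconds is hypothetical.

```python
import math

CONCURRENT_MAPPERS = 34  # assumption: same concurrency as reported in Case 0


def estimated_job_seconds(regions: int, avg_mapper_s: int,
                          slots: int = CONCURRENT_MAPPERS) -> int:
    # Assumption: one mapper per region, executed in waves of `slots` mappers.
    waves = math.ceil(regions / slots)
    return waves * avg_mapper_s


# Case 2 figures from the table above; job times converted to seconds.
case_2 = {
    "Normal": {"regions": 252, "avg_mapper_s": 22, "job_s": 3 * 60 + 7},
    "Snappy": {"regions": 41, "avg_mapper_s": 58, "job_s": 1 * 60 + 47},
}

estimates = {
    name: estimated_job_seconds(c["regions"], c["avg_mapper_s"])
    for name, c in case_2.items()
}
speedup = case_2["Normal"]["job_s"] / case_2["Snappy"]["job_s"]

for name, c in case_2.items():
    print(f"{name}: estimated {estimates[name]} s, measured {c['job_s']} s")
# Normal: estimated 176 s, measured 187 s
# Snappy: estimated 116 s, measured 107 s
print(f"measured speedup with Snappy: {speedup:.2f}x")  # 1.75x
```

The estimate lands within about 10% of the measured job times. That fits the explanation that fewer regions mean fewer mapper waves, which outweighs the longer time per mapper.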
