Compression in HBase
Disk performance is always a big issue in a Hadoop cluster. Because you are working with a tremendous amount of data, you need a tremendous amount of disk space with good I/O performance. Compressed data requires less disk space, which can improve read and write performance because less data moves to and from disk. It can also reduce performance, because extra time is spent on compression and decompression (higher CPU load). In HBase, Snappy compression had a positive effect on performance in the tests described here.
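The space versus CPU trade-off described above can be seen on a small scale with a few lines of Python. This is only a rough sketch: Snappy is not part of the Python standard library, so zlib (the algorithm behind Gzip) stands in for it at a fast and a slow level, and the sample data and the `measure` helper are made up for illustration.

```python
import time
import zlib

def measure(data: bytes, level: int):
    """Compress data at the given zlib level; return (compressed size, seconds spent)."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    # Round-trip check: decompression must give back the original bytes.
    assert zlib.decompress(compressed) == data
    return len(compressed), elapsed

# Hypothetical sample: repetitive row-like text, loosely resembling many versions of a cell.
sample = b"".join(b"row-%06d|cf1:col1|value-%d\n" % (i, i % 50) for i in range(200_000))

for level in (1, 9):  # 1 = fastest, 9 = smallest output
    size, seconds = measure(sample, level)
    print(f"level={level} size={size} bytes "
          f"ratio={len(sample) / size:.1f}x time={seconds:.3f}s")
```

On most machines the higher level produces a smaller output but takes longer, which is the same balance that decides whether compression helps or hurts a read-heavy job.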
Basic types of compression in HBase
• LZO
• Gzip
• Snappy
Gzip is expensive in terms of CPU resources, so it suits data that is not accessed frequently.
LZO and Snappy use less CPU and are a better fit for data that is accessed frequently.
Performance result with Snappy Compression
Case 0
Mapper-only job (no reducer).
Job: read the data from HBase and store it in HDFS using Pig.
Data: 27.9 GB of data present in HBase (Normal), also stored in another table after exporting with Snappy compression.
| | Normal | Snappy |
|---|---|---|
| Online regions | 128 | 31 |
| No. of mappers running at a time | 34 | 34 |
| Job time | | |
| Average mapper time | 34 sec | 1 min 15 sec |
| Max mapper time | 55 sec | 1 min 38 sec |
| No. of rows | 12348704 | 12348704 |
| No. of versions configured | 10000 | 10000 |
| Size in HDFS | 27.9 G | 5.0 G |
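As a quick check on the Case 0 numbers, the HDFS sizes and region counts can be turned into a compression ratio and a space saving. This is a minimal sketch; the function name `compression_summary` is only illustrative, and the inputs are the values reported for Case 0.

```python
def compression_summary(raw_gb: float, compressed_gb: float):
    """Return (compression ratio, percent of disk space saved)."""
    ratio = raw_gb / compressed_gb
    saved_pct = (1 - compressed_gb / raw_gb) * 100
    return ratio, saved_pct

# Values from Case 0: size in HDFS without and with Snappy.
ratio, saved_pct = compression_summary(27.9, 5.0)
print(f"compression ratio: {ratio:.2f}x, space saved: {saved_pct:.1f}%")

# Online regions without and with Snappy (128 vs 31).
print(f"region count reduced by a factor of {128 / 31:.2f}")
```

For this data set Snappy stores the same rows in roughly a fifth of the space and about a quarter of the regions.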
Case 1
Note: for 100000 rows (10 columns and 2 column families)
Snappy: 1K version increase for 5 columns
Normal: 1K version increase for 2 columns
| | Normal, 2 columns | Snappy, 5 columns | Snappy, 2 columns |
|---|---|---|---|
| Regions | 199 | 46 | 35 |
| Job time | 2 min 37 sec | 1 min 41 sec | 1 min 36 sec |
| Max mapper time | 36 sec | 1 min 11 sec | 1 min 32 sec |
| Average mapper time | 23 sec | 51 sec | 1 min 7 sec |
Result: the average mapper time with Snappy increases, but because there are fewer regions, fewer mappers are needed than in the normal case, and the overall impact of compression on job time is positive.
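The reasoning in this result can be sketched with some arithmetic on the Case 1 numbers. The sketch assumes one map task per region (the usual default when scanning an HBase table) and 34 mappers running at a time, a figure borrowed from Case 0 because Case 1 does not state it. The helper names `to_seconds` and `mapper_waves` are hypothetical.

```python
import math
import re

def to_seconds(text: str) -> int:
    """Parse durations written like '2 min 37 sec', '2mins, 37sec' or '36sec'."""
    minutes = re.search(r"(\d+)\s*mins?", text)
    seconds = re.search(r"(\d+)\s*sec", text)
    total = int(minutes.group(1)) * 60 if minutes else 0
    total += int(seconds.group(1)) if seconds else 0
    return total

def mapper_waves(regions: int, slots: int) -> int:
    """Number of rounds needed to run one mapper per region with `slots` running at once."""
    return math.ceil(regions / slots)

SLOTS = 34  # assumption: same concurrency as Case 0

# (regions, job time) as reported for Case 1.
cases = {
    "Normal, 2 columns": (199, "2 min 37 sec"),
    "Snappy, 5 columns": (46, "1 min 41 sec"),
    "Snappy, 2 columns": (35, "1 min 36 sec"),
}

baseline = to_seconds(cases["Normal, 2 columns"][1])
for name, (regions, job_time) in cases.items():
    secs = to_seconds(job_time)
    print(f"{name}: {mapper_waves(regions, SLOTS)} waves, "
          f"job {secs}s, speedup {baseline / secs:.2f}x")
```

Under these assumptions the uncompressed table needs about three times as many mapper waves, which is consistent with the shorter job time even though each Snappy mapper runs longer.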
Case 2
5K versions for 24000 keys

| | Normal, 2 columns | Snappy, 2 columns |
|---|---|---|
| Regions | 252 | 41 |
| Job time | 3 min 7 sec | 1 min 47 sec |
| Max mapper time | 35 sec | 1 min 22 sec |
| Avg mapper time | 22 sec | 58 sec |