LZO Compression
LZO is a block compression algorithm: it splits the file, compresses it into individual blocks, and writes those blocks to HDFS. On read, the blocks do not have to be merged in one place first. Each block can be read at the node level, decompressed, and fetched on its own.
LZ4 is a newer algorithm from the same family, optimized for speed at the cost of compression ratio.
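To make the block idea concrete, here is a minimal sketch in Python. It assumes zlib as a stand-in codec, because LZO bindings are not part of the standard library. The helper names compress_blocks and read_block are made up for illustration and are not part of any Hadoop API.

```python
import zlib

def compress_blocks(data, block_size):
    """Compress data as a list of independently decodable blocks."""
    return [
        zlib.compress(data[start:start + block_size])
        for start in range(0, len(data), block_size)
    ]

def read_block(blocks, index):
    """Decompress a single block without touching any of the others."""
    return zlib.decompress(blocks[index])

# 1000 fixed-width records of 12 bytes each (hypothetical sample data).
records = b"".join(b"record %04d\n" % i for i in range(1000))
blocks = compress_blocks(records, block_size=4096)

# Any block can be decoded in isolation, e.g. by whichever node holds it.
last_block = read_block(blocks, len(blocks) - 1)
print(len(blocks), len(last_block))
```

The point is only that each block carries everything needed to decode it, so readers can work in parallel. Real LZO block sizes and framing differ from this toy layout.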
Snappy Compression
Snappy is a compression library developed by Google (https://github.com/google/snappy). It belongs to the Lempel-Ziv family of compression algorithms, as does LZO.
Applied directly to a file, Snappy takes the whole file, compresses it as a single stream, and loads the result into HDFS blocks. The drawback shows up on read: the data can only be decompressed at the file level, not at the node level. All the blocks have to be brought to one place and then read from the file header through to the end of the file. The reason is that a Snappy-compressed file is not splittable: the file is compressed as-is before it is loaded into HDFS, so later processing has no independent split points to work from.
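The following Python sketch illustrates why a single compressed stream cannot be picked up in the middle. It again assumes zlib as a stand-in for Snappy, which is not in the standard library. The helper can_decode_from is a hypothetical name used only for this example.

```python
import zlib

def can_decode_from(stream, offset):
    """Return True if decompression can start at the given byte offset."""
    try:
        zlib.decompress(stream[offset:])
        return True
    except zlib.error:
        # The decoder has no header or history to work from at this position.
        return False

# Hypothetical sample data compressed as one stream covering the whole file.
records = b"".join(b"record %04d\n" % i for i in range(1000))
stream = zlib.compress(records)

print(can_decode_from(stream, 0))                 # start of the file
print(can_decode_from(stream, len(stream) // 2))  # arbitrary mid-file offset
```

A reader handed only the second half of the stream has nothing to decode against. This is the same situation a mapper is in when it is given one HDFS block of a file that was compressed as a whole.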
Benefits of Snappy Over LZO Compression
In Context of Apache Hadoop
- Snappy is faster than LZO at decompression and comparable at compression, so over a full compress-and-decompress round trip Snappy comes out ahead.
- Snappy is released under a BSD license, so it can be shipped with Hadoop. LZO is GPL-licensed, so it has to be downloaded and installed separately. (Cloudera's HBase installation includes Snappy.)
Compressing map output alone justifies installing Snappy, but there are other places where Snappy can be used within Hadoop applications. For example, Snappy can be used for block compression in all the commonly used Hadoop file formats, including Sequence Files, Avro Data Files, and HBase tables.
One thing to note is that Snappy is intended to be used with a container format, such as Sequence Files or Avro Data Files, rather than directly on plain text. A plain text file compressed with Snappy is not splittable and can't be processed in parallel using MapReduce. This differs from LZO, where it is possible to index LZO-compressed files to determine split points, so that they can be processed efficiently in subsequent processing.
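The idea of indexing a compressed file to find split points can be sketched in a few lines of Python. This is a toy analogue under stated assumptions: zlib stands in for the real codec, and the index is simply a list of byte offsets. It is not the actual LZO index format or a real container format, and write_indexed and read_from_split are hypothetical helper names.

```python
import zlib

def write_indexed(chunks):
    """Concatenate compressed chunks and record where each one starts."""
    payload, index = bytearray(), []
    for chunk in chunks:
        index.append(len(payload))       # byte offset that is a valid split point
        payload += zlib.compress(chunk)
    return bytes(payload), index

def read_from_split(payload, offset):
    """Decode exactly one chunk starting at an indexed offset."""
    decoder = zlib.decompressobj()
    # The decoder stops at the end of the first stream it sees; the bytes of
    # any later chunks are left untouched in decoder.unused_data.
    return decoder.decompress(payload[offset:])

chunks = [b"alpha\n" * 100, b"beta\n" * 100, b"gamma\n" * 100]
payload, index = write_indexed(chunks)

# A worker given only index[1] can start there without reading chunk 0.
print(read_from_split(payload, index[1]) == chunks[1])
```

Container formats apply the same principle internally: compression happens per block inside the container, and the container records where blocks begin, so the file stays splittable whichever codec compresses the blocks.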
How to use Snappy with Hadoop
Snappy support was added to Hadoop in HADOOP-7206, which will be available in the forthcoming 0.23.0 Apache release. Enabling map output compression is as simple as adding the following to mapred-site.xml:
<property>
<name>mapred.compress.map.output</name>
<value>true</value>
</property>
<property>
<name>mapred.map.output.compression.codec</name>
<value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
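As a quick sanity check, a short Python script can confirm that both properties are present in a site file. This is only a sketch: it assumes the properties sit inside the usual configuration root element, and it reads from an inline string rather than a real mapred-site.xml path. The helper names read_properties and snappy_map_output_enabled are made up for this example.

```python
import xml.etree.ElementTree as ET

SNAPPY = "org.apache.hadoop.io.compress.SnappyCodec"

def read_properties(xml_text):
    """Collect name/value pairs from a Hadoop-style configuration document."""
    root = ET.fromstring(xml_text.strip())
    return {
        prop.findtext("name", default="").strip():
            prop.findtext("value", default="").strip()
        for prop in root.iter("property")
    }

def snappy_map_output_enabled(props):
    """True only if map output compression is on and the codec is Snappy."""
    enabled = props.get("mapred.compress.map.output", "false").lower() == "true"
    return enabled and props.get("mapred.map.output.compression.codec") == SNAPPY

# Inline copy of the two properties above, wrapped in an assumed root element.
site_xml = """
<configuration>
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>
  <property>
    <name>mapred.map.output.compression.codec</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>
</configuration>
"""

print(snappy_map_output_enabled(read_properties(site_xml)))
```

The property names here are the ones given above. Later Hadoop releases renamed them to mapreduce.map.output.compress and mapreduce.map.output.compress.codec, so check the names against the version you run.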