What are the different types of compression techniques in Sqoop? || Gzip, Snappy and Bzip2 Compression

Compression techniques in Sqoop:
When saving imported data to HDFS, you can reduce the space it occupies by compressing it. Three compression techniques are mainly used with Sqoop:
1. Gzip Compression
2. Snappy Compression
3. Bzip2 Compression

--compress enables compression.
--compression-codec selects a specific compression algorithm.
When only the --compress parameter is given in the sqoop command, output files are compressed with the Gzip codec and all files end with a .gz extension.

Comparison of the codecs:
Gzip: file extension .gz, compression speed medium, degree of compression medium.
Bzip2: file extension .bz2, compression speed slow, degree of compression high.
Snappy: file extension .snappy, compression speed fast, degree of compression medium.

Configure compression in Hadoop:
Sqoop runs the import as a MapReduce job, so make sure compressed map output is allowed in your Hadoop configuration file.
In mapred-site.xml:
name: mapreduce.map.output.compress
value: true
name: mapreduce.map.output.compress.codec
value: one of the following (the property holds a single codec at a time):
org.apache.hadoop.io.compress.GzipCodec
org.apache.hadoop.io.compress.SnappyCodec
org.apache.hadoop.io.compress.BZip2Codec

Gzip compression technique:
Compress imported data with Gzip. Sqoop uses GzipCodec by default, so --compress alone is enough.

1.1. Gzip compression using --compress
Example:
$ sqoop import --connect jdbc:mysql://localhost/database-name --username root --password mypassword --table cities --target-dir /user/YT/cities_gz --compress

1.2. Gzip compression using --compression-codec GzipCodec
Example (-m 1 runs the import with a single mapper; --compress is kept because it is the switch that enables compression):
$ sqoop import --connect jdbc:mysql://localhost/database-name --username root --password mypassword --table cities --target-dir /user/YT/cities_gz --compress --compression-codec org.apache.hadoop.io.compress.GzipCodec -m 1

Snappy compression technique:
Compress imported data with Snappy.
Example:
$ sqoop import --connect jdbc:mysql://localhost/database-name --username root --password mypassword --table cities --target-dir /user/YT/cities_snappy --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec

Bzip2 compression technique:
Compress imported data with Bzip2.
Example:
$ sqoop import --connect jdbc:mysql://localhost/database-name --username root --password mypassword --table cities --target-dir /user/YT/cities_bz2 --compress --compression-codec org.apache.hadoop.io.compress.BZip2Codec
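
The three import commands differ only in the codec class and the target directory, so they can be generated from one small helper. The sketch below is only an illustration: the function names codec_class, codec_ext and build_import_cmd are hypothetical (they are not part of Sqoop), the connection string, credentials and table are the placeholder values from the examples in the description, and the helper only prints the command instead of running it, so it can be tried on a machine without Sqoop installed.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Sqoop): pick a codec by short name.

# Map a short codec name to the Hadoop codec class used in the description.
codec_class() {
  case "$1" in
    gzip)   echo "org.apache.hadoop.io.compress.GzipCodec" ;;
    snappy) echo "org.apache.hadoop.io.compress.SnappyCodec" ;;
    bzip2)  echo "org.apache.hadoop.io.compress.BZip2Codec" ;;
    *)      echo "unknown codec: $1" >&2; return 1 ;;
  esac
}

# Map a short codec name to the extension of the files it produces.
codec_ext() {
  case "$1" in
    gzip)   echo ".gz" ;;
    snappy) echo ".snappy" ;;
    bzip2)  echo ".bz2" ;;
    *)      echo "unknown codec: $1" >&2; return 1 ;;
  esac
}

# Print (do not run) a sqoop import command for the given codec and target dir.
# Connection details and table name are the placeholders from the description.
build_import_cmd() {
  cls=$(codec_class "$1") || return 1
  echo "sqoop import --connect jdbc:mysql://localhost/database-name" \
       "--username root --password mypassword --table cities" \
       "--target-dir $2 --compress --compression-codec $cls"
}

# Usage: print the Snappy variant of the import command.
build_import_cmd snappy /user/YT/cities_snappy
```

Printing the command first makes it easy to review before executing it against a real cluster. After an actual import, listing the target directory (for example with hdfs dfs -ls) should show files ending in the extension that codec_ext reports for the chosen codec.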


