Package org.apache.druid.utils
Class CompressionUtils
- java.lang.Object
-
- org.apache.druid.utils.CompressionUtils
-
public class CompressionUtils extends Object
-
-
Nested Class Summary
static class CompressionUtils.Format
-
Field Summary
static long COMPRESSED_TEXT_WEIGHT_FACTOR
-
Constructor Summary
CompressionUtils()
-
Method Summary
static InputStream decompress(InputStream in, String fileName)
- Decompress an input stream from a file, based on the filename.
static String getGzBaseName(String fname)
- Get the file name without the .gz extension.
static FileUtils.FileCopyResult gunzip(com.google.common.io.ByteSource in, File outFile)
- Gunzip from the input stream to the output file.
static FileUtils.FileCopyResult gunzip(com.google.common.io.ByteSource in, File outFile, com.google.common.base.Predicate<Throwable> shouldRetry)
- A gunzip function to store locally.
static FileUtils.FileCopyResult gunzip(File pulledFile, File outFile)
- Gunzip the file to the output file.
static FileUtils.FileCopyResult gunzip(InputStream in, File outFile)
- Unzips the input stream via a gzip filter.
static long gunzip(InputStream in, OutputStream out)
- Gunzip from the source stream to the destination stream.
static long gzip(com.google.common.io.ByteSource in, com.google.common.io.ByteSink out, com.google.common.base.Predicate<Throwable> shouldRetry)
static FileUtils.FileCopyResult gzip(File inFile, File outFile)
- GZip compress the contents of inFile into outFile.
static FileUtils.FileCopyResult gzip(File inFile, File outFile, com.google.common.base.Predicate<Throwable> shouldRetry)
- Gzips the input file to the output.
static long gzip(InputStream inputStream, OutputStream out)
- Copy inputStream to out while wrapping out in a GZIPOutputStream. Closes both input and output.
static GZIPInputStream gzipInputStream(InputStream in)
- Fixes Java bug 7036144 (http://bugs.java.com/bugdatabase/view_bug.do?bug_id=7036144), which affects concatenated GZip streams.
static boolean isGz(String fName)
- Checks to see if fName is a valid name for a "*.gz" file.
static boolean isZip(String fName)
- Checks to see if fName is a valid name for a "*.zip" file.
static FileUtils.FileCopyResult unzip(com.google.common.io.ByteSource byteSource, File outDir, com.google.common.base.Predicate<Throwable> shouldRetry, boolean cacheLocally)
- Unzip the byteSource to the output directory.
static FileUtils.FileCopyResult unzip(File pulledFile, File outDir)
- Unzip the pulled file to an output directory.
static FileUtils.FileCopyResult unzip(InputStream in, File outDir)
- Unzip from the input stream to the output directory, using the entry's file name as the file name in the output directory.
static void validateZipOutputFile(String sourceFilename, File outFile, File outDir)
static long zip(File directory, File outputZipFile)
- Zip the contents of directory into the file indicated by outputZipFile.
static long zip(File directory, File outputZipFile, boolean fsync)
- Zip the contents of directory into the file indicated by outputZipFile.
static long zip(File directory, OutputStream out)
- Zips the contents of the input directory to the output stream.
-
-
-
Field Detail
-
COMPRESSED_TEXT_WEIGHT_FACTOR
public static final long COMPRESSED_TEXT_WEIGHT_FACTOR
- See Also:
- Constant Field Values
-
-
Method Detail
-
zip
public static long zip(File directory, File outputZipFile, boolean fsync) throws IOException
Zip the contents of directory into the file indicated by outputZipFile. Subdirectories are skipped.
- Parameters:
directory
- The directory whose contents should be added to the zip in the output stream.
outputZipFile
- The output file to write the zipped data to.
fsync
- True if the output file should be fsynced to disk.
- Returns:
- The number of bytes (uncompressed) read from the input directory.
- Throws:
IOException
-
zip
public static long zip(File directory, File outputZipFile) throws IOException
Zip the contents of directory into the file indicated by outputZipFile. Subdirectories are skipped.
- Parameters:
directory
- The directory whose contents should be added to the zip in the output stream.
outputZipFile
- The output file to write the zipped data to.
- Returns:
- The number of bytes (uncompressed) read from the input directory.
- Throws:
IOException
-
zip
public static long zip(File directory, OutputStream out) throws IOException
Zips the contents of the input directory to the output stream. Subdirectories are skipped.
- Parameters:
directory
- The directory whose contents should be added to the zip in the output stream.
out
- The output stream to write the zip data to. Caller is responsible for closing this stream.
- Returns:
- The number of bytes (uncompressed) read from the input directory.
- Throws:
IOException
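The three zip overloads share one contract: only the files directly inside the directory are added, subdirectories are skipped, and the return value counts uncompressed bytes read. The following is a minimal JDK-only sketch of that contract, not Druid's implementation. The class name FlatZipSketch and the method name zipFlat are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration of the documented contract; not the Druid source.
final class FlatZipSketch {
    /** Zips the regular files directly inside directory and returns the uncompressed bytes read. */
    static long zipFlat(File directory, OutputStream out) throws IOException {
        File[] children = directory.listFiles();
        if (children == null) {
            throw new IOException("Not a readable directory: " + directory);
        }
        long totalBytes = 0;
        // Finished but never closed here, so the caller remains responsible for closing out.
        ZipOutputStream zipOut = new ZipOutputStream(out);
        for (File child : children) {
            if (!child.isFile()) {
                continue; // subdirectories are skipped
            }
            zipOut.putNextEntry(new ZipEntry(child.getName()));
            totalBytes += Files.copy(child.toPath(), zipOut);
            zipOut.closeEntry();
        }
        zipOut.finish(); // writes the central directory without closing out
        zipOut.flush();
        return totalBytes;
    }
}
```

Calling finish() instead of close() is what lets the sketch honor the "Caller is responsible for closing this stream" note on the OutputStream overload.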
-
unzip
public static FileUtils.FileCopyResult unzip(com.google.common.io.ByteSource byteSource, File outDir, com.google.common.base.Predicate<Throwable> shouldRetry, boolean cacheLocally) throws IOException
Unzip the byteSource to the output directory. If cacheLocally is true, the byteSource is cached to local disk before unzipping. This may give more predictable behavior than, for example, unzipping a large file directly off a network stream.
- Parameters:
byteSource
- The ByteSource which supplies the zip data.
outDir
- The output directory to put the contents of the zip.
shouldRetry
- A predicate expression to determine if a new InputStream should be acquired from ByteSource and the copy attempted again. If you want to retry on any exception, use FileUtils.IS_EXCEPTION.
cacheLocally
- A boolean flag to indicate if the data should be cached locally.
- Returns:
- A FileCopyResult containing the result of writing the zip entries to disk
- Throws:
IOException
-
unzip
public static FileUtils.FileCopyResult unzip(File pulledFile, File outDir) throws IOException
Unzip the pulled file to an output directory. This is only expected to work on zips with lone files, and is not intended for zips with directory structures.
- Parameters:
pulledFile
- The file to unzip.
outDir
- The directory to store the contents of the file.
- Returns:
- a FileCopyResult of the files which were written to disk
- Throws:
IOException
-
validateZipOutputFile
public static void validateZipOutputFile(String sourceFilename, File outFile, File outDir) throws IOException
- Throws:
IOException
-
unzip
public static FileUtils.FileCopyResult unzip(InputStream in, File outDir) throws IOException
Unzip from the input stream to the output directory, using the entry's file name as the file name in the output directory. The behavior of directories in the input stream's zip is undefined. If possible, use unzip(ByteSource, File, Predicate, boolean) instead.
- Parameters:
in
- The input stream of the zip data. This stream is closed.
outDir
- The directory to copy the unzipped data to.
- Returns:
- The FileUtils.FileCopyResult containing information on all the files which were written
- Throws:
IOException
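To make the "entry's file name as the file name in the output directory" behavior concrete, here is a minimal JDK-only sketch of a flat unzip. It is not Druid's code. The class FlatUnzipSketch is hypothetical, it returns a plain List<File> rather than a FileUtils.FileCopyResult, and it skips directory entries, which is only one possible reading of "undefined". The check that each target stays inside outDir is an added safeguard in this sketch; this page does not state what validateZipOutputFile verifies.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical illustration; not the Druid source.
final class FlatUnzipSketch {
    /** Writes each file entry into outDir under the entry's own name and returns the files written. */
    static List<File> unzip(InputStream in, File outDir) throws IOException {
        List<File> written = new ArrayList<>();
        String dirPrefix = outDir.getCanonicalPath() + File.separator;
        // Closing the ZipInputStream also closes the caller's stream.
        try (ZipInputStream zipIn = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zipIn.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // assumption: directory entries are ignored
                }
                File outFile = new File(outDir, entry.getName());
                // Assumed safeguard: reject names such as "../x" that resolve outside outDir.
                if (!outFile.getCanonicalPath().startsWith(dirPrefix)) {
                    throw new IOException("Entry escapes output directory: " + entry.getName());
                }
                Files.copy(zipIn, outFile.toPath(), StandardCopyOption.REPLACE_EXISTING);
                written.add(outFile);
            }
        }
        return written;
    }
}
```

An entry whose name contains a directory component fails in this sketch because the parent directory is never created, which is consistent with the note that these methods target zips of lone files.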
-
gunzip
public static FileUtils.FileCopyResult gunzip(File pulledFile, File outFile)
Gunzip the file to the output file.
- Parameters:
pulledFile
- The source of the gz data.
outFile
- A target file to put the contents.
- Returns:
- The result of the file copy
- Throws:
IOException
-
gunzip
public static FileUtils.FileCopyResult gunzip(InputStream in, File outFile) throws IOException
Unzips the input stream via a gzip filter. Use gunzip(ByteSource, File, Predicate) if possible.
- Parameters:
in
- The input stream to run through the gunzip filter. This stream is closed.
outFile
- The file to output to.
- Throws:
IOException
-
gzipInputStream
public static GZIPInputStream gzipInputStream(InputStream in) throws IOException
Fixes Java bug 7036144 (http://bugs.java.com/bugdatabase/view_bug.do?bug_id=7036144), which affects concatenated GZip streams.
- Parameters:
in
- The raw input stream.
- Returns:
- A GZIPInputStream that can handle concatenated gzip streams in the input
- Throws:
IOException
- See Also:
decompress(InputStream, String), which should be used instead for streams coming from files
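As background for the bug reference: java.util.zip.GZIPInputStream consults available() on the underlying stream when deciding whether another gzip member follows the one it just finished, so a stream that reports 0 can make it stop early. One common workaround, sketched below with JDK classes only, is to wrap the input so that available() never returns 0. Whether CompressionUtils uses exactly this approach is an assumption. The class ConcatenatedGzipSketch, its helper methods, and the 1024 estimate are illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration of an available()-based workaround; not the Druid source.
final class ConcatenatedGzipSketch {
    /** Opens a GZIPInputStream over in, hiding available() == 0 from the decoder. */
    static GZIPInputStream open(InputStream in) throws IOException {
        return new GZIPInputStream(new FilterInputStream(in) {
            @Override
            public int available() throws IOException {
                int n = super.available();
                // available() is only an estimate, so report a nonzero guess instead of 0.
                return n == 0 ? 1024 : n;
            }
        });
    }

    /** Helper: compresses text into a single, complete gzip member. */
    static byte[] gzipMember(String text) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buf.toByteArray();
    }

    /** Helper: drains in and decodes the bytes as UTF-8. */
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toString("UTF-8");
    }
}
```

Concatenating two gzipMember results and reading them back through open yields the text of both members, even when the underlying stream always reports zero available bytes.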
-
gunzip
public static long gunzip(InputStream in, OutputStream out) throws IOException
Gunzip from the source stream to the destination stream.
- Parameters:
in
- The input stream which is to be decompressed. This stream is closed.
out
- The output stream to write to. This stream is closed.
- Returns:
- The number of bytes written to the output stream.
- Throws:
IOException
-
gunzip
public static FileUtils.FileCopyResult gunzip(com.google.common.io.ByteSource in, File outFile, com.google.common.base.Predicate<Throwable> shouldRetry)
A gunzip function to store locally.
- Parameters:
in
- The factory to produce input streams.
outFile
- The file to store the result into.
shouldRetry
- A predicate to indicate if the Throwable is recoverable.
- Returns:
- A FileCopyResult describing the file written to outFile
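The shouldRetry overloads take a factory of input streams rather than a single stream so that a failed copy can be restarted from a fresh stream. The sketch below shows that retry pattern with JDK types only and makes several assumptions: StreamFactory is a hypothetical stand-in for Guava's ByteSource, java.util.function.Predicate replaces Guava's Predicate, the maxTries parameter is invented for the example, and the method returns a byte count instead of a FileUtils.FileCopyResult.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import java.util.function.Predicate;
import java.util.zip.GZIPInputStream;

// Hypothetical illustration of retrying a gunzip from a stream factory; not the Druid source.
final class RetryingGunzipSketch {
    /** Stand-in for a ByteSource: every call must open a fresh stream over the same data. */
    interface StreamFactory {
        InputStream openStream() throws IOException;
    }

    /** Gunzips into outFile, reopening the source while shouldRetry accepts the failure. */
    static long gunzip(StreamFactory in, File outFile, Predicate<Throwable> shouldRetry, int maxTries)
            throws IOException {
        for (int attempt = 1; ; attempt++) {
            try (InputStream raw = in.openStream();
                 InputStream gz = new GZIPInputStream(raw)) {
                // REPLACE_EXISTING discards any partial output left by an earlier attempt.
                return Files.copy(gz, outFile.toPath(), StandardCopyOption.REPLACE_EXISTING);
            } catch (IOException e) {
                if (attempt >= maxTries || !shouldRetry.test(e)) {
                    throw e; // not recoverable, or out of attempts
                }
            }
        }
    }
}
```

Passing a predicate such as `t -> t instanceof IOException` retries on any I/O failure, which is roughly the role this page assigns to FileUtils.IS_EXCEPTION.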
-
gunzip
public static FileUtils.FileCopyResult gunzip(com.google.common.io.ByteSource in, File outFile)
Gunzip from the input stream to the output file.
- Parameters:
in
- The compressed input stream to read from.
outFile
- The file to write the uncompressed results to.
- Returns:
- A FileCopyResult of the file written
-
gzip
public static long gzip(InputStream inputStream, OutputStream out) throws IOException
Copy inputStream to out while wrapping out in a GZIPOutputStream. Closes both input and output.
- Parameters:
inputStream
- The input stream to copy data from. This stream is closed.
out
- The output stream to wrap in a GZIPOutputStream before copying. This stream is closed.
- Returns:
- The size of the data copied
- Throws:
IOException
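The stream-to-stream gzip and gunzip methods both close their arguments, which matters when the caller still holds references to them. A minimal JDK-only sketch of that contract follows. The class GzipCopySketch is hypothetical, and reading "the size of the data copied" as the count of uncompressed bytes is an assumption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration of the close-both-streams contract; not the Druid source.
final class GzipCopySketch {
    /** Compresses in into out, closes both, and returns the uncompressed bytes read. */
    static long gzip(InputStream in, OutputStream out) throws IOException {
        try (InputStream src = in;
             OutputStream dst = out;
             GZIPOutputStream gz = new GZIPOutputStream(dst)) {
            return copy(src, gz);
        }
    }

    /** Decompresses in into out, closes both, and returns the bytes written to out. */
    static long gunzip(InputStream in, OutputStream out) throws IOException {
        try (InputStream src = in;
             OutputStream dst = out;
             GZIPInputStream gz = new GZIPInputStream(src)) {
            return copy(gz, dst);
        }
    }

    private static long copy(InputStream from, OutputStream to) throws IOException {
        byte[] chunk = new byte[8192];
        long total = 0;
        int n;
        while ((n = from.read(chunk)) != -1) {
            to.write(chunk, 0, n);
            total += n;
        }
        return total;
    }
}
```

Declaring the raw streams as resources ahead of the gzip wrapper means they are closed even if constructing the wrapper throws.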
-
gzip
public static FileUtils.FileCopyResult gzip(File inFile, File outFile, com.google.common.base.Predicate<Throwable> shouldRetry)
Gzips the input file to the output file.
- Parameters:
inFile
- The file to gzip.
outFile
- The target file that receives the gzip-compressed contents of inFile.
shouldRetry
- Predicate on a potential throwable to determine if the copy should be attempted again.
- Returns:
- The result of the file copy
- Throws:
IOException
-
gzip
public static long gzip(com.google.common.io.ByteSource in, com.google.common.io.ByteSink out, com.google.common.base.Predicate<Throwable> shouldRetry)
-
gzip
public static FileUtils.FileCopyResult gzip(File inFile, File outFile)
GZip compress the contents of inFile into outFile.
- Parameters:
inFile
- The source of data.
outFile
- The destination for compressed data.
- Returns:
- A FileCopyResult of the resulting file at outFile
- Throws:
IOException
-
isZip
public static boolean isZip(String fName)
Checks to see if fName is a valid name for a "*.zip" file.
- Parameters:
fName
- The name of the file in question.
- Returns:
- True if fName is properly named for a .zip file, false otherwise
-
isGz
public static boolean isGz(String fName)
Checks to see if fName is a valid name for a "*.gz" file.
- Parameters:
fName
- The name of the file in question.
- Returns:
- True if fName is a properly named .gz file, false otherwise
-
getGzBaseName
public static String getGzBaseName(String fname)
Get the file name without the .gz extension.
- Parameters:
fname
- The name of the gzip file.
- Returns:
- fname without the ".gz" extension
- Throws:
IAE
- if fname is not a valid "*.gz" file name
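The name checks and getGzBaseName fit together: a base name can only be extracted from a name that passes the "*.gz" check. The sketch below is one plausible reading of those rules, not Druid's code. It assumes that "valid" means the name ends in ".gz" and has at least one character before the suffix, it uses IllegalArgumentException in place of IAE, and it leaves any directory portion of the name untouched. The class GzNameSketch is hypothetical.

```java
// Hypothetical illustration of the "*.gz" naming rules; not the Druid source.
final class GzNameSketch {
    private static final String GZ_SUFFIX = ".gz";

    /** Assumed rule: ends with ".gz" and has a non-empty part before the suffix. */
    static boolean isGz(String fName) {
        return fName != null
                && fName.endsWith(GZ_SUFFIX)
                && fName.length() > GZ_SUFFIX.length();
    }

    /** Returns fname without the trailing ".gz", rejecting names that fail isGz. */
    static String getGzBaseName(String fname) {
        if (!isGz(fname)) {
            throw new IllegalArgumentException("Not a valid \"*.gz\" file name: " + fname);
        }
        return fname.substring(0, fname.length() - GZ_SUFFIX.length());
    }
}
```

Under these assumptions, "data.json.gz" maps to "data.json", while a bare ".gz" or "data.json" is rejected.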
-
decompress
public static InputStream decompress(InputStream in, String fileName) throws IOException
Decompress an input stream from a file, based on the filename.
- Throws:
IOException
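Choosing a decoder "based on the filename" amounts to dispatching on the file extension. This page does not list which extensions decompress recognizes, so the sketch below handles only ".gz" with JDK classes and passes every other name through unchanged. Both choices are assumptions, and the class DecompressByNameSketch is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical illustration of extension-based dispatch; not the Druid source.
final class DecompressByNameSketch {
    /** Wraps in with a decoder chosen from fileName; unknown extensions are returned as-is. */
    static InputStream decompress(InputStream in, String fileName) throws IOException {
        if (fileName.endsWith(".gz")) {
            return new GZIPInputStream(in);
        }
        // Assumption: an unrecognized extension means the data is not compressed.
        return in;
    }
}
```

A caller can then treat compressed and uncompressed inputs uniformly, for example by passing each file's stream and name through this method before parsing.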
-
-