public interface DataSegmentPusher
| Modifier and Type | Field and Description |
|---|---|
| static com.google.common.base.Joiner | JOINER |
| Modifier and Type | Method and Description |
|---|---|
| static String | generateUniquePath() |
| default List<String> | getAllowedPropertyPrefixesForHadoop(). Property prefixes that should be added to the "allowedHadoopPrefix" config for passing down to Hadoop jobs. |
| static String | getDefaultStorageDir(DataSegment segment, boolean useUniquePath) |
| static String | getDefaultStorageDirWithExistingUniquePath(DataSegment segment, String uniquePath) |
| String | getPathForHadoop() |
| String | getPathForHadoop(String dataSource). Deprecated. |
| default String | getStorageDir(DataSegment dataSegment). Deprecated: backward-compatibility shim that should be removed on the next major release; use getStorageDir(DataSegment, boolean) instead. |
| default String | getStorageDir(DataSegment dataSegment, boolean useUniquePath) |
| default String | makeIndexPathName(DataSegment dataSegment, String indexName) |
| Map<String,Object> | makeLoadSpec(URI finalIndexZipFilePath) |
| DataSegment | push(File file, DataSegment segment, boolean useUniquePath). Pushes index files and segment descriptor to deep storage. |
| default DataSegment | pushToPath(File indexFilesDir, DataSegment segment, String storageDirSuffix) |
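The storage-directory helpers above build segment paths from segment metadata. The sketch below assumes the conventional dataSource/interval/version/partitionNum layout with components joined by "/" (which is what the JOINER field suggests); the class and method names are illustrative stand-ins, not the actual Druid implementation:

```java
import java.util.UUID;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StorageDirSketch {
    // Hypothetical stand-in for getDefaultStorageDir; real code takes a DataSegment.
    static String storageDir(String dataSource, String intervalStart, String intervalEnd,
                             String version, int partitionNum, boolean useUniquePath) {
        // Join the path components with "/", mirroring the JOINER field's apparent role.
        String base = Stream.of(dataSource,
                                intervalStart + "_" + intervalEnd,
                                version,
                                String.valueOf(partitionNum))
                            .collect(Collectors.joining("/"));
        // With useUniquePath, a random suffix (cf. generateUniquePath) keeps retried
        // pushes from overwriting earlier ones.
        return useUniquePath ? base + "/" + UUID.randomUUID() : base;
    }

    public static void main(String[] args) {
        System.out.println(storageDir("wikipedia",
                "2023-01-01T00:00:00.000Z", "2023-01-02T00:00:00.000Z",
                "v1", 0, false));
        // prints wikipedia/2023-01-01T00:00:00.000Z_2023-01-02T00:00:00.000Z/v1/0
    }
}
```

getDefaultStorageDirWithExistingUniquePath would correspond to passing an already-generated suffix instead of creating a fresh one.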
@Deprecated String getPathForHadoop(String dataSource)
Deprecated.

String getPathForHadoop()

DataSegment push(File file, DataSegment segment, boolean useUniquePath) throws IOException
Pushes index files and segment descriptor to deep storage.
Parameters:
file - directory containing index files
segment - segment descriptor
useUniquePath - if true, pushes to a unique file path. This prevents situations where task failures or replica tasks can either overwrite or fail to overwrite existing segments, leading to the possibility of different versions of the same segment ID containing different data. As an example, a Kafka indexing task starting at offset A and ending at offset B may push a segment to deep storage and then fail before writing the loadSpec to the metadata table, resulting in a replacement task being spawned. This replacement will also start at offset A but will read to offset C, and will then push a segment to deep storage and write the loadSpec metadata. Without unique file paths, this can only work correctly if new segments overwrite existing segments. Suppose that at this point the task then fails, so that the supervisor retries again from offset A. This third attempt will overwrite the segments in deep storage before failing to write the loadSpec metadata, resulting in inconsistencies between the segment data now in deep storage and the copies of the segment already loaded by historicals. If unique paths are used, the caller is responsible for cleaning up segments that were pushed but not written to the metadata table (for example, when using replica tasks).
Throws:
IOException

default DataSegment pushToPath(File indexFilesDir, DataSegment segment, String storageDirSuffix) throws IOException
Throws:
IOException

@Deprecated default String getStorageDir(DataSegment dataSegment)
Deprecated. Backward-compatibility shim that should be removed on the next major release; use getStorageDir(DataSegment, boolean) instead.

default String getStorageDir(DataSegment dataSegment, boolean useUniquePath)

default String makeIndexPathName(DataSegment dataSegment, String indexName)

default List<String> getAllowedPropertyPrefixesForHadoop()
Property prefixes that should be added to the "allowedHadoopPrefix" config for passing down to Hadoop jobs.

static String getDefaultStorageDir(DataSegment segment, boolean useUniquePath)

static String getDefaultStorageDirWithExistingUniquePath(DataSegment segment, String uniquePath)

static String generateUniquePath()
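The retry scenario described under push(File, DataSegment, boolean) can be illustrated with a toy model of deep storage. All names below are hypothetical stand-ins (a plain map in place of deep storage, strings in place of segments), not Druid APIs; the point is only that unique paths let a failed attempt and its replacement coexist instead of overwriting each other:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class UniquePathSketch {
    // Toy "deep storage": path -> segment contents.
    static final Map<String, String> deepStorage = new HashMap<>();

    // Toy push: without a unique path, the same segment ID always maps to one location.
    static String push(String segmentId, String data, boolean useUniquePath) {
        String path = useUniquePath ? segmentId + "/" + UUID.randomUUID() : segmentId;
        deepStorage.put(path, data);
        return path;
    }

    public static void main(String[] args) {
        // Task 1 reads offsets A..B, pushes, then dies before committing its loadSpec.
        String p1 = push("seg-v1", "offsets A..B", true);
        // The replacement task reads A..C and pushes; p1's copy is left untouched,
        // so only the path recorded in the committed loadSpec is ever loaded.
        String p2 = push("seg-v1", "offsets A..C", true);
        System.out.println(!p1.equals(p2)); // prints true
    }
}
```

With useUniquePath set to false, the second push would land on the same path and silently replace the first copy, which is exactly the inconsistency the documentation warns about; the cost of unique paths is that orphaned copies like p1 must be cleaned up by the caller.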
Copyright © 2011–2023 The Apache Software Foundation. All rights reserved.