public class TableTunnel.UploadSession extends Object
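UploadSession represents a session for uploading data to an ODPS table. It is usually created through TableTunnel.
An upload session has INSERT INTO semantics: multiple upload sessions on the same table or partition, whether concurrent or successive, do not affect one another.
The session ID uniquely identifies a session and can be obtained with getId().
An UploadSession writes data by creating RecordWriter instances.
Each RecordWriter corresponds to one HTTP request, and a single UploadSession can create multiple RecordWriters.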
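When creating a RecordWriter you must specify a block ID. The block ID uniquely identifies the RecordWriter and must lie in the range [0, 20000). A single block can hold at most 100 GB of uploaded data.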
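The two limits above can be expressed as a small pre-flight check. This is a minimal sketch and not SDK code. The class BlockLimits and its methods are hypothetical. Reading "100 GB" as 100 × 1024³ bytes is an assumption.

```java
// Hypothetical helper (not part of the ODPS SDK) encoding the block limits above.
final class BlockLimits {
    // Block IDs must lie in the half-open range [0, 20000).
    static final long MIN_BLOCK_ID = 0L;
    static final long MAX_BLOCK_ID_EXCLUSIVE = 20_000L;
    // Assumption: "100 GB" is read as 100 * 1024^3 bytes.
    static final long MAX_BLOCK_BYTES = 100L * 1024 * 1024 * 1024;

    private BlockLimits() {}

    static boolean isValidBlockId(long blockId) {
        return blockId >= MIN_BLOCK_ID && blockId < MAX_BLOCK_ID_EXCLUSIVE;
    }

    static boolean fitsInOneBlock(long bytes) {
        return bytes >= 0 && bytes <= MAX_BLOCK_BYTES;
    }

    // Fails fast before a RecordWriter would be opened for an out-of-range ID.
    static void requireValidBlockId(long blockId) {
        if (!isValidBlockId(blockId)) {
            throw new IllegalArgumentException(
                "block ID " + blockId + " is outside [0, 20000)");
        }
    }
}
```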
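Within one UploadSession, opening a RecordWriter more than once with the same block ID causes an overwrite. Only the data uploaded by the RecordWriter whose close() is called last is kept. close() must not be called more than once on the same RecordWriter instance.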
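The overwrite rule can be modeled in memory to make the "last close() wins" behavior concrete. The sketch below is a hypothetical model and does not talk to ODPS; BlockTableModel and its Writer are invented for illustration. It assumes that a block's data is replaced at the moment a writer for that block ID is closed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model (no ODPS calls) of the per-block overwrite rule.
final class BlockTableModel {
    // Data currently kept for each block ID.
    private final Map<Long, List<String>> keptData = new HashMap<>();

    // Stand-in for a RecordWriter bound to one block ID.
    final class Writer {
        private final long blockId;
        private final List<String> buffered = new ArrayList<>();
        private boolean closed;

        private Writer(long blockId) {
            this.blockId = blockId;
        }

        void write(String record) {
            if (closed) {
                throw new IllegalStateException("writer already closed");
            }
            buffered.add(record);
        }

        // Publishes this writer's data, replacing whatever an earlier close() kept.
        void close() {
            if (closed) {
                throw new IllegalStateException("close() called twice on the same writer");
            }
            closed = true;
            keptData.put(blockId, new ArrayList<>(buffered));
        }
    }

    Writer open(long blockId) {
        return new Writer(blockId);
    }

    List<String> dataFor(long blockId) {
        return Collections.unmodifiableList(
            keptData.getOrDefault(blockId, Collections.emptyList()));
    }
}
```

In this model, two writers opened on block 7 can interleave their writes. Whichever one is closed second determines what block 7 holds.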
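The HTTP request behind a RecordWriter times out after 120 s: if no data is transferred for 120 s, the server closes the connection. Note in particular that the HTTP layer itself has an 8 KB buffer.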
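A client that writes slowly may want to know whether the 120 s idle window has elapsed. The following is a hypothetical helper, and IdleTimeoutTracker is not an SDK class. It assumes the timeout is measured from the last moment bytes actually left the client. Because of the 8 KB buffer, that moment is not necessarily the last time the caller wrote a record. It also assumes that exactly 120 s without a transfer already counts as expired.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical client-side helper, not an SDK class. It tracks when bytes last
// left the client on a writer's HTTP request and reports whether the server's
// 120 s idle timeout would already have closed the connection.
final class IdleTimeoutTracker {
    static final Duration SERVER_IDLE_TIMEOUT = Duration.ofSeconds(120);

    private Instant lastTransfer;

    IdleTimeoutTracker(Instant openedAt) {
        this.lastTransfer = openedAt;
    }

    // Call when bytes are actually sent on the wire, not merely buffered.
    void onBytesSent(Instant at) {
        lastTransfer = at;
    }

    // Assumption: 120 s or more without a transfer counts as expired.
    boolean isExpired(Instant now) {
        return Duration.between(lastTransfer, now).compareTo(SERVER_IDLE_TIMEOUT) >= 0;
    }
}
```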
Finally, call commit(Long[]) to commit all data blocks uploaded in this session.
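commit(Long[]) expects the list of uploaded blocks, so callers typically keep their own record of which block IDs finished. This is a minimal sketch that assumes a block counts as uploaded once its writer closed without an exception. UploadedBlocks is a hypothetical helper, not an SDK class.

```java
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical helper (not an SDK class): records block IDs whose writers
// closed successfully and builds the array passed to commit(Long[]).
final class UploadedBlocks {
    // Thread-safe and sorted, so several writer threads can report completions.
    private final ConcurrentSkipListSet<Long> ids = new ConcurrentSkipListSet<>();

    // Call only after the writer for blockId closed without throwing.
    void markUploaded(long blockId) {
        ids.add(blockId);
    }

    // Sorted, duplicate-free copy suitable as the blocks argument.
    Long[] toCommitArgument() {
        return ids.toArray(new Long[0]);
    }
}
```

With a real UploadSession, the array would then be passed as session.commit(uploaded.toCommitArgument()), where session and uploaded are illustrative variable names. The sorted order is a convenience of ConcurrentSkipListSet, not a requirement stated above.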
The commit operation can be retried unless one of the following exceptions occurs:
Modifier and Type | Method and Description |
---|---|
void | commit(): Commits the session without verification. |
void | commit(Long[] blocks): Commits all data blocks uploaded in this session. |
org.apache.arrow.vector.types.pojo.Schema | getArrowSchema() |
Long | getAvailBlockId(): TunnelBufferedWriter instances running in multiple threads obtain the blockId to write to through this method. To prevent the same blockId from being allocated twice, access to curBlockId must be locked. |
Long[] | getBlockList(): Returns the list of data blocks uploaded successfully in the current session. |
Configuration | getConfig() |
String | getId(): Returns the session ID. |
String | getQuotaName() |
String | getResource() |
TableSchema | getSchema(): Returns the table schema. |
TableTunnel.UploadStatus | getStatus(): Returns the session status. |
boolean | isShouldTransform() |
Record | newRecord(): Creates a temporary Record object. |
RecordPack | newRecordPack() |
RecordPack | newRecordPack(CompressOption option): Creates a new ProtobufRecordPack; option sets the data compression method. |
RecordPack | newRecordPack(int capacity, CompressOption option): Creates a new ProtobufRecordPack with a preset stream buffer size of capacity; option sets the data compression method. |
ArrowRecordWriter | openArrowRecordWriter(long blockId) |
ArrowRecordWriter | openArrowRecordWriter(long blockId, CompressOption option) |
ArrowRecordWriter | openArrowRecordWriter(long blockId, CompressOption option, long blockVersion) |
RecordWriter | openBufferedWriter(): Opens an uncompressed TunnelBufferedWriter for writing data. |
RecordWriter | openBufferedWriter(boolean compress): Opens a TunnelBufferedWriter for writing data. |
RecordWriter | openBufferedWriter(CompressOption compressOption): Opens a TunnelBufferedWriter for writing data. |
RecordWriter | openBufferedWriter(CompressOption compressOption, long timeout): Opens a TunnelBufferedWriter for writing data. |
RecordWriter | openBufferedWriter(CompressOption compressOption, long timeout, TableTunnel.BlockVersionProvider versionProvider): Opens a TunnelBufferedWriter for writing data. |
RecordWriter | openRecordWriter(long blockId): Opens a RecordWriter for writing data. |
RecordWriter | openRecordWriter(long blockId, boolean compress): Opens a RecordWriter for writing data. |
RecordWriter | openRecordWriter(long blockId, CompressOption compress): Opens a RecordWriter for writing data. |
RecordWriter | openRecordWriter(long blockId, CompressOption compress, long blockVersion) |
void | writeBlock(long blockId, RecordPack pack): Opens an HTTP connection, writes the pack data, then closes the connection. Writing to the same block more than once overwrites the earlier data. |
void | writeBlock(long blockId, RecordPack pack, long timeout): Opens an HTTP connection, writes the pack data, then closes the connection. Writing to the same block more than once overwrites the earlier data. |
void | writeBlock(long blockId, RecordPack pack, long timeout, long blockVersion) |
public boolean isShouldTransform()
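public Long getAvailBlockId()
TunnelBufferedWriter instances running in multiple threads obtain the blockId to write to through this method.
To prevent the same blockId from being allocated twice, access to curBlockId must be locked.
Throws:
TunnelException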
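The locking requirement can be illustrated with a lock-guarded counter. This is a hypothetical sketch of the idea, not the SDK's implementation of getAvailBlockId(). The class name and the use of ReentrantLock are assumptions. Throwing IllegalStateException when IDs run out is also an assumption; the real method declares TunnelException.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of lock-guarded block ID allocation in the spirit of
// getAvailBlockId(); not the SDK's code.
final class BlockIdAllocator {
    private final ReentrantLock lock = new ReentrantLock();
    private final long limitExclusive; // e.g. 20000, the block ID upper bound
    private long curBlockId;           // next ID to hand out, guarded by lock

    BlockIdAllocator(long firstId, long limitExclusive) {
        this.curBlockId = firstId;
        this.limitExclusive = limitExclusive;
    }

    // Check and increment happen under the lock, so no two callers get the same ID.
    Long next() {
        lock.lock();
        try {
            if (curBlockId >= limitExclusive) {
                throw new IllegalStateException("no block IDs left below " + limitExclusive);
            }
            return curBlockId++;
        } finally {
            lock.unlock();
        }
    }
}
```

Without the lock, two threads could both read the same curBlockId before either increments it. Both buffered writers would then write to the same block, and one would silently overwrite the other.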
public void commit() throws TunnelException, IOException
Throws:
TunnelException
IOException
public void writeBlock(long blockId, RecordPack pack) throws IOException
Parameters:
blockId - block identifier
pack - pack data to write
Throws:
IOException
public void writeBlock(long blockId, RecordPack pack, long timeout) throws IOException
Parameters:
blockId - block identifier
pack - pack data to write
timeout - timeout in milliseconds; applies only to ProtobufRecordPack; a value <= 0 means no timeout
Throws:
IOException
public void writeBlock(long blockId, RecordPack pack, long timeout, long blockVersion) throws IOException, TunnelException
Throws:
IOException
TunnelException
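public RecordWriter openRecordWriter(long blockId) throws TunnelException, IOException
Opens a RecordWriter for writing data.
The block ID is a user-chosen value from 0 to 19999 that identifies the data block uploaded by this writer.
Parameters:
blockId - block identifier
Throws:
TunnelException
IOException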
public RecordWriter openRecordWriter(long blockId, boolean compress) throws TunnelException, IOException
Opens a RecordWriter for writing data.
Parameters:
blockId - block identifier
compress - whether to compress data in transit
Throws:
TunnelException
IOException
public RecordWriter openRecordWriter(long blockId, CompressOption compress) throws TunnelException, IOException
Opens a RecordWriter for writing data.
Parameters:
blockId - block identifier
compress - compression option for data in transit
Throws:
TunnelException
IOException
public RecordWriter openRecordWriter(long blockId, CompressOption compress, long blockVersion) throws TunnelException, IOException
Throws:
TunnelException
IOException
public RecordWriter openBufferedWriter() throws TunnelException
Opens a TunnelBufferedWriter for writing data.
Throws:
TunnelException
public RecordWriter openBufferedWriter(boolean compress) throws TunnelException
Opens a TunnelBufferedWriter for writing data.
Parameters:
compress - whether to compress data in transit
Throws:
TunnelException
public RecordWriter openBufferedWriter(CompressOption compressOption) throws TunnelException
Opens a TunnelBufferedWriter for writing data.
Parameters:
compressOption - compression option for data in transit
Throws:
TunnelException
public RecordWriter openBufferedWriter(CompressOption compressOption, long timeout) throws TunnelException
Opens a TunnelBufferedWriter for writing data.
Parameters:
compressOption - compression option for data in transit
timeout - timeout in milliseconds; a value <= 0 means no timeout. Recommended value: (BufferSizeInMB / UploadBandwidthInMB) * 1000 * 120%
Throws:
TunnelException
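Here is the recommended formula with concrete numbers. With a 64 MB buffer and 8 MB/s of upload bandwidth, the buffer drains in about 8 s, so the recommendation is 8 × 1000 × 1.2 = 9600 ms. The helper below is hypothetical, not an SDK method. It assumes bandwidth is measured in MB per second.

```java
// Hypothetical helper that evaluates the recommended timeout
// (BufferSizeInMB / UploadBandwidthInMB) * 1000 * 120%.
// Assumption: bandwidth is in MB per second, so the ratio is in seconds.
final class BufferedWriterTimeout {
    private BufferedWriterTimeout() {}

    static long recommendedTimeoutMs(double bufferSizeInMB, double uploadBandwidthInMBps) {
        if (bufferSizeInMB <= 0 || uploadBandwidthInMBps <= 0) {
            throw new IllegalArgumentException("buffer size and bandwidth must be positive");
        }
        double secondsToDrainBuffer = bufferSizeInMB / uploadBandwidthInMBps;
        // Convert to milliseconds and add the 20% safety margin; round away float noise.
        return Math.round(secondsToDrainBuffer * 1000 * 1.2);
    }
}
```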
public RecordWriter openBufferedWriter(CompressOption compressOption, long timeout, TableTunnel.BlockVersionProvider versionProvider) throws TunnelException
Opens a TunnelBufferedWriter for writing data.
Parameters:
compressOption - compression option for data in transit
timeout - timeout in milliseconds; a value <= 0 means no timeout. Recommended value: (BufferSizeInMB / UploadBandwidthInMB) * 1000 * 120%
versionProvider - BlockVersion provider that assigns a block version to each internally generated blockId; null disables this feature
Throws:
TunnelException
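The contract of versionProvider can be sketched with a plain functional interface: each generated blockId gets one version, and null disables the feature. This is a hypothetical model. LongUnaryOperator stands in for TableTunnel.BlockVersionProvider, whose actual method signature is not documented here.

```java
import java.util.function.LongUnaryOperator;

// Hypothetical model, not the SDK: a LongUnaryOperator stands in for
// TableTunnel.BlockVersionProvider and maps each generated blockId to a version.
final class BlockVersionModel {
    private final LongUnaryOperator versionProvider; // null disables block versions

    BlockVersionModel(LongUnaryOperator versionProvider) {
        this.versionProvider = versionProvider;
    }

    // Returns the version to use for a newly generated blockId, or null when disabled.
    Long versionFor(long blockId) {
        if (versionProvider == null) {
            return null;
        }
        return versionProvider.applyAsLong(blockId);
    }
}
```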
public org.apache.arrow.vector.types.pojo.Schema getArrowSchema()
public ArrowRecordWriter openArrowRecordWriter(long blockId) throws TunnelException, IOException
Throws:
TunnelException
IOException
public ArrowRecordWriter openArrowRecordWriter(long blockId, CompressOption option) throws TunnelException, IOException
Throws:
TunnelException
IOException
public ArrowRecordWriter openArrowRecordWriter(long blockId, CompressOption option, long blockVersion) throws TunnelException, IOException
Throws:
TunnelException
IOException
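public void commit(Long[] blocks) throws TunnelException, IOException
blocks is the user's own record of the data blocks that have been uploaded successfully. It is used for an integrity check against the server.
Parameters:
blocks - list of data blocks the user expects to have been uploaded successfully
Throws:
TunnelException - if the provided block list does not match the blocks that exist on the server
IOException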
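The integrity check can be modeled as a set comparison. This hypothetical sketch assumes the check is exact equality of the two block sets and ignores duplicates in the client list. It throws IllegalStateException where the SDK throws TunnelException.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the commit(Long[]) integrity check; not SDK code.
final class CommitCheck {
    private CommitCheck() {}

    // Throws if the client's block list and the server's block set differ.
    static void verify(Long[] clientBlocks, Set<Long> serverBlocks) {
        Set<Long> client = new HashSet<>(Arrays.asList(clientBlocks));
        if (client.equals(serverBlocks)) {
            return;
        }
        Set<Long> missingOnServer = new HashSet<>(client);
        missingOnServer.removeAll(serverBlocks);
        Set<Long> notListedByClient = new HashSet<>(serverBlocks);
        notListedByClient.removeAll(client);
        throw new IllegalStateException("block list mismatch: missing on server "
            + missingOnServer + ", not listed by client " + notListedByClient);
    }
}
```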
public String getId()
public TableSchema getSchema()
public String getQuotaName()
public TableTunnel.UploadStatus getStatus() throws TunnelException, IOException
Throws:
TunnelException
IOException
public Configuration getConfig()
public Record newRecord()
public RecordPack newRecordPack() throws IOException
Throws:
IOException
public RecordPack newRecordPack(CompressOption option) throws IOException
Parameters:
option - data compression method
Throws:
IOException
public RecordPack newRecordPack(int capacity, CompressOption option) throws IOException
Parameters:
capacity - preset stream buffer size
option - data compression method
Throws:
IOException
public Long[] getBlockList() throws TunnelException, IOException
Throws:
TunnelException
IOException
public String getResource()
Copyright © 2024 Alibaba Cloud Computing. All rights reserved.