org.apache.hadoop.hbase.master
Class SplitLogManager

java.lang.Object
  extended by org.apache.hadoop.hbase.zookeeper.ZooKeeperListener
      extended by org.apache.hadoop.hbase.master.SplitLogManager

@InterfaceAudience.Private
public class SplitLogManager
extends ZooKeeperListener

Distributes the task of log splitting to the available region servers. Coordination happens via ZooKeeper. For every log file that has to be split, a znode is created under /hbase/splitlog. SplitLogWorkers race to grab a task.

SplitLogManager monitors the task znodes that it creates using the timeoutMonitor thread. If a task's progress is slow, resubmit(String, Task, ResubmitDirective) takes the task away from the owner SplitLogWorker and the task is up for grabs again. When the task is done, SplitLogManager deletes the task's znode.
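
The monitor-and-resubmit cycle described above can be pictured with a small in-memory model. This is only a sketch under stated assumptions: the class TimeoutMonitorSketch, its Task fields and its methods are hypothetical names invented for illustration, there is no ZooKeeper involved, and the way the resubmit limit is applied here is a guess at how a limit such as DEFAULT_MAX_RESUBMIT might be used, not the HBase implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model; none of these names come from HBase.
class TimeoutMonitorSketch {
    // Per-task state tracked by the manager.
    static final class Task {
        String owner;       // worker holding the task, or null when up for grabs
        long lastUpdateMs;  // last time the owner made progress
        int resubmits;      // times the task has been taken away so far
    }

    private final Map<String, Task> tasks = new HashMap<>();
    private final long timeoutMs;
    private final int maxResubmits;

    TimeoutMonitorSketch(long timeoutMs, int maxResubmits) {
        this.timeoutMs = timeoutMs;
        this.maxResubmits = maxResubmits;
    }

    // The manager creates one task per log file.
    void submit(String path) {
        tasks.put(path, new Task());
    }

    // A worker tries to grab a task; only an unowned task can be grabbed.
    boolean grab(String path, String worker, long nowMs) {
        Task t = tasks.get(path);
        if (t == null || t.owner != null) {
            return false;
        }
        t.owner = worker;
        t.lastUpdateMs = nowMs;
        return true;
    }

    // One pass of the monitor: take slow tasks away from their owners.
    int checkTimeouts(long nowMs) {
        int resubmitted = 0;
        for (Task t : tasks.values()) {
            boolean slow = t.owner != null && nowMs - t.lastUpdateMs > timeoutMs;
            if (slow && t.resubmits < maxResubmits) {
                t.owner = null;   // the task is up for grabs again
                t.resubmits++;
                resubmitted++;
            }
        }
        return resubmitted;
    }

    // When a task is done, the manager deletes its entry.
    void done(String path) {
        tasks.remove(path);
    }

    int size() {
        return tasks.size();
    }
}
```

In this model a second worker can only grab a task after a monitor pass has cleared its owner, which mirrors the "up for grabs again" wording above.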

Clients call splitLogDistributed(Path) to split a region server's log files. The caller thread waits in this method until all the log files have been split.
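
One way a caller thread can wait until every task of a call has reached a final state is a counter guarded by a monitor. The sketch below is an assumption-laden illustration: TaskBatchSketch and its methods are hypothetical names, and the timeout parameter is added only so the example terminates on its own; nothing here is taken from the HBase source.

```java
// Hypothetical sketch; names are illustrative, not the HBase implementation.
class TaskBatchSketch {
    private int installed; // tasks submitted for this call
    private int done;      // tasks that finished successfully
    private int error;     // tasks that finished with an error

    synchronized void taskInstalled() {
        installed++;
    }

    // Called from a callback thread when a task reaches a final state.
    synchronized void taskFinished(boolean success) {
        if (success) {
            done++;
        } else {
            error++;
        }
        notifyAll(); // wake the waiting caller so it can re-check the counters
    }

    // Blocks until every installed task has finished, or the timeout elapses.
    synchronized boolean awaitAll(long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (done + error < installed) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false;
            }
            try {
                wait(remaining);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt
                return false;
            }
        }
        return true;
    }

    synchronized boolean allSucceeded() {
        return error == 0 && done == installed;
    }
}
```

A caller would install one entry per log file, hand the work out, and then call awaitAll; an error count above zero would correspond to the IOException case documented for splitLogDistributed.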

All the zookeeper calls made by this class are asynchronous. This is mainly to help reduce response time seen by the callers.

There is a race in this design between the SplitLogManager and the SplitLogWorker. SplitLogManager might re-queue a task that has in reality already been completed by a SplitLogWorker. We rely on the idempotency of the log splitting task for correctness.
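
To see why idempotency makes a duplicate run harmless, consider a toy splitter whose output location depends only on its input. Everything in this sketch is hypothetical: the "region:edit" entry format, the IdempotentSplitSketch class and the key layout are invented for illustration and say nothing about how HBase actually lays out recovered edits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; the entry format "region:edit" is an assumption.
class IdempotentSplitSketch {
    // Groups the entries of one log by region and stores each group under a
    // key derived only from the region and the log name.
    static void split(String logName, List<String> entries,
                      Map<String, List<String>> recovered) {
        Map<String, List<String>> byRegion = new TreeMap<>();
        for (String entry : entries) {
            int sep = entry.indexOf(':');
            String region = entry.substring(0, sep);
            String edit = entry.substring(sep + 1);
            byRegion.computeIfAbsent(region, k -> new ArrayList<>()).add(edit);
        }
        for (Map.Entry<String, List<String>> e : byRegion.entrySet()) {
            // put() overwrites, so a repeated run leaves the same final state.
            recovered.put(e.getKey() + "/" + logName, e.getValue());
        }
    }
}
```

Because the second run overwrites rather than appends, a task that is re-queued after it already completed produces the same result as a single run.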

It is also assumed that every log splitting task is unique and that once completed (either with success or with error) it will not be submitted again. If a task is resubmitted, there is a risk that an old "delete task" can delete the re-submission.


Nested Class Summary
static interface SplitLogManager.TaskFinisher
          SplitLogManager can use objects implementing this interface to finish off a task that was partially done by a SplitLogWorker.
 
Field Summary
static int DEFAULT_MAX_RESUBMIT
           
static int DEFAULT_TIMEOUT
           
static int DEFAULT_UNASSIGNED_TIMEOUT
           
static int DEFAULT_ZK_RETRIES
           
 boolean ignoreZKDeleteForTesting
           
protected  ReentrantLock recoveringRegionLock
          In distributedLogReplay mode, we need to touch both splitlog and recovering-regions znodes in one operation.
 
Fields inherited from class org.apache.hadoop.hbase.zookeeper.ZooKeeperListener
watcher
 
Constructor Summary
SplitLogManager(ZooKeeperWatcher zkw, org.apache.hadoop.conf.Configuration conf, Stoppable stopper, MasterServices master, ServerName serverName, boolean masterRecovery)
          Wrapper around SplitLogManager(ZooKeeperWatcher zkw, Configuration conf, Stoppable stopper, MasterServices master, ServerName serverName, boolean masterRecovery, TaskFinisher tf) that provides a task finisher for copying recovered edits to their final destination.
SplitLogManager(ZooKeeperWatcher zkw, org.apache.hadoop.conf.Configuration conf, Stoppable stopper, MasterServices master, ServerName serverName, boolean masterRecovery, SplitLogManager.TaskFinisher tf)
          It's OK to construct this object even when region-servers are not online.
 
Method Summary
static void deleteRecoveringRegionZNodes(ZooKeeperWatcher watcher, List<String> regions)
           
 org.apache.hadoop.hbase.protobuf.generated.ZooKeeperProtos.SplitLogTask.RecoveryMode getRecoveryMode()
           
static org.apache.hadoop.hbase.protobuf.generated.ZooKeeperProtos.RegionStoreSequenceIds getRegionFlushedSequenceId(ZooKeeperWatcher zkw, String serverName, String encodedRegionName)
          This function is used in distributedLogReplay to fetch the last flushed sequence id from ZK.
static boolean isRegionMarkedRecoveringInZK(ZooKeeperWatcher zkw, String regionEncodedName)
          Checks whether the znode for the given region exists under /hbase/recovering-regions/.
 void nodeDataChanged(String path)
           
static long parseLastFlushedSequenceIdFrom(byte[] bytes)
           
 void setRecoveryMode(boolean isForInitialization)
          Sets the recovery mode, either from outstanding split log tasks left over from before or from the current configuration setting.
 long splitLogDistributed(List<org.apache.hadoop.fs.Path> logDirs)
          The caller will block until all the log files of the given region server have been processed - successfully split or an error is encountered - by an available worker region server.
 long splitLogDistributed(org.apache.hadoop.fs.Path logDir)
           
 long splitLogDistributed(Set<ServerName> serverNames, List<org.apache.hadoop.fs.Path> logDirs, org.apache.hadoop.fs.PathFilter filter)
          The caller will block until all the hbase:meta log files of the given region server have been processed - successfully split or an error is encountered - by an available worker region server.
 void stop()
           
 
Methods inherited from class org.apache.hadoop.hbase.zookeeper.ZooKeeperListener
getWatcher, nodeChildrenChanged, nodeCreated, nodeDeleted
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_TIMEOUT

public static final int DEFAULT_TIMEOUT
See Also:
Constant Field Values

DEFAULT_ZK_RETRIES

public static final int DEFAULT_ZK_RETRIES
See Also:
Constant Field Values

DEFAULT_MAX_RESUBMIT

public static final int DEFAULT_MAX_RESUBMIT
See Also:
Constant Field Values

DEFAULT_UNASSIGNED_TIMEOUT

public static final int DEFAULT_UNASSIGNED_TIMEOUT
See Also:
Constant Field Values

ignoreZKDeleteForTesting

public boolean ignoreZKDeleteForTesting

recoveringRegionLock

protected final ReentrantLock recoveringRegionLock
In distributedLogReplay mode, we need to touch both splitlog and recovering-regions znodes in one operation. So the lock is used to guard such cases.
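
The guarding pattern can be sketched with java.util.concurrent.locks.ReentrantLock and two in-memory sets standing in for the two znode trees. The class RecoveringRegionLockSketch, its sets and its methods are hypothetical and only illustrate the lock discipline; they are not how SplitLogManager stores or updates znodes.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; the two sets stand in for the two znode trees.
class RecoveringRegionLockSketch {
    private final ReentrantLock recoveringRegionLock = new ReentrantLock();
    private final Set<String> splitLogTasks = new HashSet<>();
    private final Set<String> recoveringRegions = new HashSet<>();

    // Adds an entry to both sets as one guarded operation.
    void markRecovering(String task, String region) {
        recoveringRegionLock.lock();
        try {
            splitLogTasks.add(task);
            recoveringRegions.add(region);
        } finally {
            recoveringRegionLock.unlock(); // always release, even on failure
        }
    }

    // Removes the matching entries from both sets as one guarded operation.
    void finish(String task, String region) {
        recoveringRegionLock.lock();
        try {
            splitLogTasks.remove(task);
            recoveringRegions.remove(region);
        } finally {
            recoveringRegionLock.unlock();
        }
    }

    // Readers take the same lock, so they never observe a half-done update.
    int[] sizes() {
        recoveringRegionLock.lock();
        try {
            return new int[] {splitLogTasks.size(), recoveringRegions.size()};
        } finally {
            recoveringRegionLock.unlock();
        }
    }
}
```

Releasing the lock in a finally block keeps a failed update from leaving the lock held.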

Constructor Detail

SplitLogManager

public SplitLogManager(ZooKeeperWatcher zkw,
                       org.apache.hadoop.conf.Configuration conf,
                       Stoppable stopper,
                       MasterServices master,
                       ServerName serverName,
                       boolean masterRecovery)
                throws InterruptedIOException,
                       org.apache.zookeeper.KeeperException
Wrapper around SplitLogManager(ZooKeeperWatcher zkw, Configuration conf, Stoppable stopper, MasterServices master, ServerName serverName, boolean masterRecovery, TaskFinisher tf) that provides a task finisher for copying recovered edits to their final destination. The task finisher has to be robust because it can be arbitrarily restarted or called multiple times.
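
A finisher that tolerates restarts and repeated calls has to treat "already finished" as success. The sketch below shows one way to get that property. The Finisher interface, the Status enum and the two maps are stand-ins declared inside the sketch; their shapes are assumptions and not the signature of SplitLogManager.TaskFinisher.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; Finisher and Status are stand-ins, not HBase types.
class FinisherSketch {
    enum Status { DONE, ERR }

    interface Finisher {
        Status finish(String workerName, String taskName);
    }

    // Moves a task's output from a staging area to its final destination.
    static final class MoveRecoveredEdits implements Finisher {
        final Map<String, String> staging = new HashMap<>();
        final Map<String, String> finalDir = new HashMap<>();

        @Override
        public Status finish(String workerName, String taskName) {
            String edits = staging.remove(taskName);
            if (edits != null) {
                finalDir.put(taskName, edits);
            }
            // A repeated call finds nothing staged but still reports DONE,
            // because the output is already at its destination.
            return finalDir.containsKey(taskName) ? Status.DONE : Status.ERR;
        }
    }
}
```

Calling finish twice for the same task returns DONE both times, while a task with no staged or final output reports ERR.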

Parameters:
zkw - the ZK watcher
conf - the HBase configuration
stopper - the stoppable in case anything is wrong
master - the master services
serverName - the master server name
masterRecovery - an indication if the master is in recovery
Throws:
org.apache.zookeeper.KeeperException
InterruptedIOException

SplitLogManager

public SplitLogManager(ZooKeeperWatcher zkw,
                       org.apache.hadoop.conf.Configuration conf,
                       Stoppable stopper,
                       MasterServices master,
                       ServerName serverName,
                       boolean masterRecovery,
                       SplitLogManager.TaskFinisher tf)
                throws InterruptedIOException,
                       org.apache.zookeeper.KeeperException
It's OK to construct this object even when region-servers are not online. It does look up the orphan tasks in ZooKeeper, but it doesn't block waiting for them to be done.

Parameters:
zkw - the ZK watcher
conf - the HBase configuration
stopper - the stoppable in case anything is wrong
master - the master services
serverName - the master server name
masterRecovery - an indication if the master is in recovery
tf - task finisher
Throws:
org.apache.zookeeper.KeeperException
InterruptedIOException
Method Detail

splitLogDistributed

public long splitLogDistributed(org.apache.hadoop.fs.Path logDir)
                         throws IOException
Parameters:
logDir - one region server hlog dir path in .logs
Returns:
cumulative size of the logfiles split
Throws:
IOException - if there was an error while splitting any log file

splitLogDistributed

public long splitLogDistributed(List<org.apache.hadoop.fs.Path> logDirs)
                         throws IOException
The caller will block until all the log files of the given region server have been processed - successfully split or an error is encountered - by an available worker region server. This method must only be called after the region servers have been brought online.

Parameters:
logDirs - List of log dirs to split
Returns:
cumulative size of the logfiles split
Throws:
IOException - If there was an error while splitting any log file

splitLogDistributed

public long splitLogDistributed(Set<ServerName> serverNames,
                                List<org.apache.hadoop.fs.Path> logDirs,
                                org.apache.hadoop.fs.PathFilter filter)
                         throws IOException
The caller will block until all the hbase:meta log files of the given region server have been processed - successfully split or an error is encountered - by an available worker region server. This method must only be called after the region servers have been brought online.

Parameters:
logDirs - List of log dirs to split
filter - the Path filter to select specific files for considering
Returns:
cumulative size of the logfiles split
Throws:
IOException - If there was an error while splitting any log file

deleteRecoveringRegionZNodes

public static void deleteRecoveringRegionZNodes(ZooKeeperWatcher watcher,
                                                List<String> regions)

nodeDataChanged

public void nodeDataChanged(String path)
Overrides:
nodeDataChanged in class ZooKeeperListener

stop

public void stop()

parseLastFlushedSequenceIdFrom

public static long parseLastFlushedSequenceIdFrom(byte[] bytes)
Parameters:
bytes - Content of a failed region server or recovering region znode.
Returns:
long - The last flushed sequence Id for the region server

isRegionMarkedRecoveringInZK

public static boolean isRegionMarkedRecoveringInZK(ZooKeeperWatcher zkw,
                                                   String regionEncodedName)
                                            throws org.apache.zookeeper.KeeperException
Checks whether the znode for the given region exists under /hbase/recovering-regions/. Returns true if it exists, and sets a watcher on it as well.

Parameters:
zkw - the ZK watcher
regionEncodedName - encoded region name
Returns:
true when the znode for the region exists under /hbase/recovering-regions/
Throws:
org.apache.zookeeper.KeeperException

getRegionFlushedSequenceId

public static org.apache.hadoop.hbase.protobuf.generated.ZooKeeperProtos.RegionStoreSequenceIds getRegionFlushedSequenceId(ZooKeeperWatcher zkw,
                                                                                                                           String serverName,
                                                                                                                           String encodedRegionName)
                                                                                                                    throws IOException
This function is used in distributedLogReplay to fetch the last flushed sequence id from ZK.

Parameters:
zkw - the ZK watcher
serverName - the server name
encodedRegionName - the encoded region name
Returns:
the last flushed sequence ids recorded in ZK of the region for serverName
Throws:
IOException

setRecoveryMode

public void setRecoveryMode(boolean isForInitialization)
                     throws org.apache.zookeeper.KeeperException
Sets the recovery mode, either from outstanding split log tasks left over from before or from the current configuration setting.

Parameters:
isForInitialization -
Throws:
org.apache.zookeeper.KeeperException
InterruptedIOException

getRecoveryMode

public org.apache.hadoop.hbase.protobuf.generated.ZooKeeperProtos.SplitLogTask.RecoveryMode getRecoveryMode()


Copyright © 2007-2016 The Apache Software Foundation. All Rights Reserved.