java.lang.Object

org.archive.crawler.framework.Engine

public class Engine extends Object

Implementation for Engine. Jobs and profiles are stored in a directory called the jobsDir. The jobs are contained as subdirectories of jobsDir.

Author: pjack, gojomo

Field Summary

Fields (modifier and type, field, description):

- protected HashMap<String,CrawlJob> jobConfigs: map of job short names -> CrawlJob instances
- protected File jobsDir: directory where job directories are expected
- static final String LOGS_DIR_NAME
- protected String profileCxmlPath
- static final String REPORTS_DIR_NAME

Constructor Summary

Constructors (constructor, description):

- Engine(File jobsDir)

Method Summary

Methods (modifier and type, method, description):

- boolean addJobDirectory(File dir): Adds a job directory to the Engine's known jobConfigs if not already present.
- void copy(CrawlJob orig, File destDir, boolean asProfile): Copy a job to a new location, possibly making a job a profile or a profile a runnable job.
- void copy(CrawlJob cj, String copyTo, boolean asProfile): Copy a job to a new location, possibly making a job a profile or a profile a runnable job.
- boolean createNewJobWithDefaults(File newJobDir): Creates a new job directory and copies the profile CXML into it as a non-profile CXML.
- void deleteJob(CrawlJob job)
- void findJobConfigs(): Find all job configurations in the usual places: subdirectories of the jobs directory with files ending '.cxml', and job directories referenced by jobPathFiles (previously added by the user) found in the jobs directory.
- String getHeritrixVersion()
- CrawlJob getJob(String shortName)
- Map<String,CrawlJob> getJobConfigs()
- protected File getJobDirectoryFrom(File jobPathFile): Return the job directory File read from the supplied ".jobpath" file, or null on any error.
- File getJobsDir()
- protected InputStream getProfileCxmlResource()
- String heapReport()
- Map<String,Object> heapReportData()
- void requestLaunch(String shortName)
- void shutdown()
- boolean waitForNoRunningJobs(long timeout): Wait for all jobs to be in non-running state, or until timeout (given in ms) elapses.
- void writeJobPathFile(CrawlJob job): Writes a .jobpath file for the new CrawlJob, whose directory is outside the main Engine jobs directory.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- LOGS_DIR_NAME
  
  public static final String LOGS_DIR_NAME
  See Also:
  
  Constant Field Values
- REPORTS_DIR_NAME
  
  public static final String REPORTS_DIR_NAME
  See Also:
  
  Constant Field Values
- jobsDir
  
  protected File jobsDir
  
  directory where job directories are expected
- jobConfigs
  
  protected HashMap<String,CrawlJob> jobConfigs
  
  map of job short names -> CrawlJob instances
- profileCxmlPath
  
  protected String profileCxmlPath
Constructor Details
- Engine
  
  public Engine(File jobsDir)
Method Details
- findJobConfigs
  
  public void findJobConfigs()
  
  Find all job configurations in the usual places: subdirectories of the jobs directory with files ending '.cxml', and job directories referenced by jobPathFiles (previously added by the user) found in the jobs directory.
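The directory scan described for findJobConfigs can be pictured with a minimal standalone sketch. It does not use Heritrix classes and is not the Engine implementation: the class name JobScanSketch, the method findJobDirs, and the choice to key results by subdirectory name are assumptions made for illustration, and .jobpath handling is left out.

```java
import java.io.File;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Heritrix.
class JobScanSketch {
    /**
     * Returns every immediate subdirectory of jobsDir that contains at least
     * one file ending in ".cxml", keyed by the subdirectory's name.
     */
    static Map<String, File> findJobDirs(File jobsDir) {
        Map<String, File> found = new TreeMap<>();
        File[] children = jobsDir.listFiles(File::isDirectory);
        if (children == null) {
            return found; // jobsDir is missing or unreadable
        }
        for (File dir : children) {
            // Keep only directories holding at least one '.cxml' file.
            File[] cxmls = dir.listFiles((d, name) -> name.endsWith(".cxml"));
            if (cxmls != null && cxmls.length > 0) {
                found.put(dir.getName(), dir);
            }
        }
        return found;
    }
}
```

Subdirectories without any '.cxml' file are simply skipped, which mirrors the "usual place" wording above.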
- getJobDirectoryFrom
  
  protected File getJobDirectoryFrom(File jobPathFile)
  
  Return the job directory File read from the supplied ".jobpath" file, or null on any error.
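As an illustration of the "null on any error" contract, here is a minimal standalone sketch. It assumes the ".jobpath" file holds the job directory path on its first line, which this page does not state; JobPathReadSketch and readJobDir are hypothetical names, not Heritrix API.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.List;

// Hypothetical helper, not part of Heritrix.
class JobPathReadSketch {
    /** Returns the directory named on the first line of jobPathFile, or null on any error. */
    static File readJobDir(File jobPathFile) {
        try {
            List<String> lines = Files.readAllLines(jobPathFile.toPath(), StandardCharsets.UTF_8);
            if (lines.isEmpty()) {
                return null; // empty file: no path to read
            }
            File jobDir = new File(lines.get(0).trim());
            // Assumption: a path that is not an existing directory counts as an error.
            return jobDir.isDirectory() ? jobDir : null;
        } catch (IOException | RuntimeException e) {
            return null; // any failure is reported as null rather than thrown
        }
    }
}
```

Returning null instead of throwing lets a caller scanning many such files skip a bad one and continue.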
- addJobDirectory
  
  public boolean addJobDirectory(File dir)
  
  Adds a job directory to the Engine's known jobConfigs if not already present.
  
  Parameters:
  
  dir - directory to be added
  
  Returns:
  
  true if directory successfully added, false for any failure
- getJobConfigs
  
  public Map<String,CrawlJob> getJobConfigs()
- copy
  
  public void copy(CrawlJob orig, File destDir, boolean asProfile) throws IOException
  
  Copy a job to a new location, possibly making a job a profile or a profile a runnable job.
  
  Parameters:
  
  orig - CrawlJob representing source
  
  destDir - File location destination
  
  asProfile - true if destination should become a profile
  
  Throws:
  
  IOException
- copy
  
  public void copy(CrawlJob cj, String copyTo, boolean asProfile) throws IOException
  
  Copy a job to a new location, possibly making a job a profile or a profile a runnable job.
  
  Parameters:
  
  cj - CrawlJob representing source
  
  copyTo - String location destination; interpreted relative to jobsDir
  
  asProfile - true if destination should become a profile
  
  Throws:
  
  IOException
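The phrase "interpreted relative to jobsDir" can be pictured with a short standalone sketch. Treating an absolute copyTo as-is is an assumption here, not something this page documents; CopyTargetSketch and resolve are hypothetical names.

```java
import java.io.File;

// Hypothetical helper, not part of Heritrix.
class CopyTargetSketch {
    /** Resolves copyTo against jobsDir unless it is already absolute (assumed behavior). */
    static File resolve(File jobsDir, String copyTo) {
        File dest = new File(copyTo);
        return dest.isAbsolute() ? dest : new File(jobsDir, copyTo);
    }
}
```

Under this sketch, a copyTo of "copy1" with a jobsDir of "jobs" resolves to the "copy1" entry inside "jobs".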
- getHeritrixVersion
  
  public String getHeritrixVersion()
- deleteJob
  
  public void deleteJob(CrawlJob job) throws IOException
  
  Throws:
  
  IOException
- requestLaunch
  
  public void requestLaunch(String shortName)
- getJob
  
  public CrawlJob getJob(String shortName)
- getJobsDir
  
  public File getJobsDir()
- heapReportData
  
  public Map<String,Object> heapReportData()
- heapReport
  
  public String heapReport()
- shutdown
  
  public void shutdown()
- waitForNoRunningJobs
  
  public boolean waitForNoRunningJobs(long timeout)
  
  Wait for all jobs to be in non-running state, or until timeout (given in ms) elapses. Use '0' for no timeout (wait as long as necessary).
  
  Parameters:
  
  timeout - maximum time to wait, in ms; '0' for no timeout
  
  Returns:
  
  true if timeout occurred and a job is (possibly) still running
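The timeout semantics (0 means wait indefinitely, true means the timeout elapsed) can be sketched as a simple polling loop. This is a standalone illustration, not the Engine implementation: WaitSketch, waitForNoneRunning, the BooleanSupplier parameter, the 10 ms poll interval, and the interrupt handling are all assumptions.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Heritrix.
class WaitSketch {
    /**
     * Polls anyRunning until it reports false or timeoutMs elapses.
     * A timeoutMs of 0 means no timeout. Returns true if the wait ended
     * while something was (possibly) still running.
     */
    static boolean waitForNoneRunning(BooleanSupplier anyRunning, long timeoutMs) {
        long start = System.currentTimeMillis();
        while (anyRunning.getAsBoolean()) {
            if (timeoutMs > 0 && System.currentTimeMillis() - start >= timeoutMs) {
                return true; // timed out with work still running
            }
            try {
                Thread.sleep(10); // assumed poll interval
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return anyRunning.getAsBoolean();   // stop waiting, report current state
            }
        }
        return false; // everything reached a non-running state
    }
}
```

Note the inverted feel of the return value: false is the "all clear" result, so callers would typically branch on true to handle the still-running case.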
- getProfileCxmlResource
  
  protected InputStream getProfileCxmlResource()
  
  Returns:
  
  InputStream resource from defined profile CXML path
- createNewJobWithDefaults
  
  public boolean createNewJobWithDefaults(File newJobDir)
  
  Creates a new job directory and copies the profile CXML into it as a non-profile CXML.
  
  Parameters:
  
  newJobDir - new job directory
- writeJobPathFile
  
  public void writeJobPathFile(CrawlJob job) throws IOException
  
  Writes a .jobpath file for the new CrawlJob, whose directory is outside the main Engine jobs directory.
  
  Parameters:
  
  job - CrawlJob whose main directory the .jobpath should point to
  
  Throws:
  
  IOException - for any IO error
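A rough standalone sketch of writing such a pointer file follows. The file name (short name plus ".jobpath", placed in the jobs directory) and the content (the job directory's absolute path on one line) are assumptions for illustration; JobPathWriteSketch and writeJobPath are hypothetical names, not Heritrix API.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical helper, not part of Heritrix.
class JobPathWriteSketch {
    /** Writes jobsDir/<shortName>.jobpath containing jobDir's absolute path. */
    static File writeJobPath(File jobsDir, String shortName, File jobDir) throws IOException {
        File jobPathFile = new File(jobsDir, shortName + ".jobpath");
        String content = jobDir.getAbsolutePath() + System.lineSeparator();
        Files.write(jobPathFile.toPath(), content.getBytes(StandardCharsets.UTF_8));
        return jobPathFile;
    }
}
```

Keeping only a path in the file means the job directory itself can live anywhere on disk while still being discoverable from the jobs directory.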
