Class Engine

java.lang.Object
org.archive.crawler.framework.Engine

public class Engine extends Object
Implementation for Engine. Jobs and profiles are stored in a directory called the jobsDir; each job is a subdirectory of jobsDir.
Author:
pjack, gojomo
  • Constructor Details

    • Engine

      public Engine(File jobsDir)
  • Method Details

    • findJobConfigs

      public void findJobConfigs()
      Finds all job configurations in the usual places: subdirectories of the jobs directory that contain files ending in '.cxml', and job directories referenced by jobPathFiles (previously added by the user) found in the jobs directory.
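
      The first half of that search (subdirectories holding '.cxml' files) can be pictured with the minimal sketch below. It is not the Heritrix implementation: the class JobConfigScan and the method findCxmlFiles are hypothetical names, and the sketch only looks one level deep and ignores jobPathFiles entirely.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Heritrix: illustrates the directory scan only.
class JobConfigScan {
    /** Lists files ending in ".cxml" found directly inside each subdirectory of jobsDir. */
    static List<File> findCxmlFiles(File jobsDir) {
        List<File> found = new ArrayList<>();
        File[] children = jobsDir.listFiles();
        if (children == null) {
            return found; // jobsDir is not a directory, or could not be read
        }
        for (File child : children) {
            if (!child.isDirectory()) {
                continue; // only subdirectories are treated as candidate job directories
            }
            File[] configs = child.listFiles((dir, name) -> name.endsWith(".cxml"));
            if (configs == null) {
                continue; // subdirectory could not be read
            }
            for (File config : configs) {
                found.add(config);
            }
        }
        return found;
    }
}
```

      A real scan would also follow the ".jobpath" files described under getJobDirectoryFrom; that step is left out here because the file format is not given on this page.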
    • getJobDirectoryFrom

      protected File getJobDirectoryFrom(File jobPathFile)
      Return the job directory File read from the supplied ".jobpath" file, or null on any error.
    • addJobDirectory

      public boolean addJobDirectory(File dir)
      Adds a job directory to the Engine's known jobConfigs if it is not already present.
      Parameters:
      dir - directory to be added
      Returns:
      true if directory successfully added, false for any failure
    • getJobConfigs

      public Map<String,CrawlJob> getJobConfigs()
    • copy

      public void copy(CrawlJob orig, File destDir, boolean asProfile) throws IOException
      Copy a job to a new location, possibly making a job a profile or a profile a runnable job.
      Parameters:
      orig - CrawlJob representing source
      destDir - File location destination
      asProfile - true if destination should become a profile
      Throws:
      IOException
    • copy

      public void copy(CrawlJob cj, String copyTo, boolean asProfile) throws IOException
      Copy a job to a new location, possibly making a job a profile or a profile a runnable job.
      Parameters:
      cj - CrawlJob representing source
      copyTo - String location destination; interpreted relative to jobsDir
      asProfile - true if destination should become a profile
      Throws:
      IOException
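
      The phrase "interpreted relative to jobsDir" can be illustrated with a small sketch of the path resolution. The class CopyDestination and the method resolve are hypothetical, and the handling of an already-absolute copyTo is an assumption; this page does not say what Engine does in that case.

```java
import java.io.File;

// Hypothetical helper, not part of Heritrix: shows one way to resolve copyTo.
class CopyDestination {
    /**
     * Resolves copyTo against jobsDir.
     * Assumption: an absolute copyTo is used as-is rather than re-rooted.
     */
    static File resolve(File jobsDir, String copyTo) {
        File dest = new File(copyTo);
        return dest.isAbsolute() ? dest : new File(jobsDir, copyTo);
    }
}
```

      Under this sketch, resolve(new File("jobs"), "copy-of-job") yields the same File as new File(new File("jobs"), "copy-of-job").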
    • getHeritrixVersion

      public String getHeritrixVersion()
    • deleteJob

      public void deleteJob(CrawlJob job) throws IOException
      Throws:
      IOException
    • requestLaunch

      public void requestLaunch(String shortName)
    • getJob

      public CrawlJob getJob(String shortName)
    • getJobsDir

      public File getJobsDir()
    • heapReportData

      public Map<String,Object> heapReportData()
    • heapReport

      public String heapReport()
    • shutdown

      public void shutdown()
    • waitForNoRunningJobs

      public boolean waitForNoRunningJobs(long timeout)
      Waits for all jobs to be in a non-running state, or until the timeout (given in ms) elapses. Use '0' for no timeout (wait as long as necessary).
      Parameters:
      timeout - maximum time to wait, in milliseconds; 0 waits indefinitely
      Returns:
      true if timeout occurred and a job is (possibly) still running
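
      Note the inverted sense of the return value: true means the wait gave up, not that the jobs finished. The sketch below reproduces that contract with a simple polling loop. It is an illustration, not the Heritrix code: WaitForIdle, waitForNoRunning, the anyRunning supplier and the pollMs parameter are all hypothetical, and treating an interrupt like a timeout is an assumption.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Heritrix: mirrors the documented timeout contract.
class WaitForIdle {
    /**
     * Polls anyRunning until it reports false or timeoutMs elapses.
     * A timeoutMs of 0 disables the timeout (wait as long as necessary).
     * Returns true if the wait gave up while something may still be running,
     * false if nothing was running.
     */
    static boolean waitForNoRunning(BooleanSupplier anyRunning, long timeoutMs, long pollMs) {
        long start = System.currentTimeMillis();
        while (anyRunning.getAsBoolean()) {
            if (timeoutMs > 0 && System.currentTimeMillis() - start >= timeoutMs) {
                return true; // timed out; a job is (possibly) still running
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return true; // assumption: an interrupt is reported like a timeout
            }
        }
        return false;
    }
}
```

      A caller shutting down would typically check the result before exiting, for example by logging a warning when waitForNoRunning(...) returns true.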
    • getProfileCxmlResource

      protected InputStream getProfileCxmlResource()
      Returns:
      InputStream resource from defined profile CXML path
    • createNewJobWithDefaults

      public boolean createNewJobWithDefaults(File newJobDir)
      Creates a new job directory and copies the profile CXML into it as a non-profile CXML.
      Parameters:
      newJobDir - new job directory
    • writeJobPathFile

      public void writeJobPathFile(CrawlJob job) throws IOException
      Writes a .jobpath file for the new CrawlJob, whose directory is outside the main Engine jobs directory.
      Parameters:
      job - CrawlJob whose main directory the .jobpath should point to
      Throws:
      IOException - for any IO error