Package org.archive.crawler.framework
Class Engine
java.lang.Object
org.archive.crawler.framework.Engine
Implementation for Engine. Jobs and profiles are stored in a
directory called the jobsDir. The jobs are contained as subdirectories of
jobsDir.
- Author:
- pjack, gojomo
-
Field Summary
Fields -
Constructor Summary
Constructors -
Method Summary
Modifier and Type | Method | Description
boolean | addJobDirectory(File dir) | Adds a job directory to the Engine's known jobConfigs if not already present.
void | copy(CrawlJob orig, File destDir, boolean asProfile) | Copy a job to a new location, possibly making a job a profile or a profile a runnable job.
void | copy(CrawlJob cj, String copyTo, boolean asProfile) | Copy a job to a new location, possibly making a job a profile or a profile a runnable job.
boolean | createNewJobWithDefaults(File newJobDir) | Create a new job directory and copy the profile CXML into it as a non-profile CXML.
void | deleteJob |
void | findJobConfigs() | Find all job configurations in the usual places: subdirectories of the jobs directory that contain files ending in '.cxml', and directories named by .jobpath files (previously added by the user) found in the jobs directory.
protected File | getJobDirectoryFrom(File jobPathFile) | Return the job directory File read from the supplied ".jobpath" file, or null on any error.
protected InputStream | getProfileCxmlResource |
void | requestLaunch(String shortName) |
void | shutdown() |
boolean | waitForNoRunningJobs(long timeout) | Wait for all jobs to be in non-running state, or until timeout (given in ms) elapses.
void | writeJobPathFile(CrawlJob job) | Writes a .jobpath file for the new CrawlJob, whose directory is outside the main Engine jobs directory.
-
-
Field Details
-
LOGS_DIR_NAME
-
REPORTS_DIR_NAME
-
jobsDir
Directory where job directories are expected.
-
jobConfigs
Map of job short names -> CrawlJob instances.
-
profileCxmlPath
-
-
Constructor Details
-
Engine
-
-
Method Details
-
findJobConfigs
public void findJobConfigs()
Find all job configurations in the usual places: subdirectories of the jobs directory that contain files ending in '.cxml', and directories named by .jobpath files (previously added by the user) found in the jobs directory.
-
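The scan described for findJobConfigs can be sketched with the standard library alone. This is a minimal illustration, not the Heritrix implementation: the class JobScanSketch and its methods hasCxml and scan are hypothetical names, and the sketch only collects paths rather than building CrawlJob instances.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Heritrix: lists candidate job directories.
class JobScanSketch {

    /** Returns true if dir directly contains at least one regular file ending in ".cxml". */
    static boolean hasCxml(Path dir) throws IOException {
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.cxml")) {
            for (Path f : files) {
                if (Files.isRegularFile(f)) {
                    return true;
                }
            }
            return false;
        }
    }

    /**
     * Returns the subdirectories of jobsDir that hold a .cxml file.
     * Any ".jobpath" files found directly in jobsDir are appended to jobPathFilesOut.
     */
    static List<Path> scan(Path jobsDir, List<Path> jobPathFilesOut) throws IOException {
        List<Path> jobDirs = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(jobsDir)) {
            for (Path entry : entries) {
                if (Files.isDirectory(entry) && hasCxml(entry)) {
                    jobDirs.add(entry);
                } else if (Files.isRegularFile(entry)
                        && entry.getFileName().toString().endsWith(".jobpath")) {
                    jobPathFilesOut.add(entry);
                }
            }
        }
        return jobDirs;
    }
}
```

A caller would pass the jobs directory and an empty list, then resolve each collected .jobpath file to its external job directory in a second step.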
getJobDirectoryFrom
Return the job directory File read from the supplied ".jobpath" file, or null on any error.
-
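The "null on any error" contract can be illustrated as follows. This is a sketch under a stated assumption: the on-disk format of a .jobpath file is not documented on this page, so the code assumes the directory path is stored as plain text on the first line. JobPathReadSketch and readJobDirectory are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.List;

// Hypothetical helper, not part of Heritrix.
class JobPathReadSketch {

    /**
     * Returns the directory named on the first line of jobPathFile, or null if
     * the file is unreadable, empty, or names something that is not a directory.
     * ASSUMPTION: the path is stored as plain text on the first line.
     */
    static File readJobDirectory(File jobPathFile) {
        try {
            List<String> lines =
                    Files.readAllLines(jobPathFile.toPath(), StandardCharsets.UTF_8);
            if (lines.isEmpty() || lines.get(0).trim().isEmpty()) {
                return null;
            }
            File dir = new File(lines.get(0).trim());
            return dir.isDirectory() ? dir : null;
        } catch (IOException | RuntimeException e) {
            return null; // mirror the documented "null on any error" behaviour
        }
    }
}
```

Returning null rather than throwing lets a directory scan skip a stale or corrupt .jobpath file and continue with the remaining entries.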
addJobDirectory
Adds a job directory to the Engine's known jobConfigs if it is not already present.
- Parameters:
dir
- directory to be added
- Returns:
- true if directory successfully added, false for any failure
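The add-if-absent behaviour with a boolean result might look like the sketch below. It rests on two assumptions not confirmed by this page: the short name is taken to be the directory name, and an already-known directory counts as a failure. JobRegistrySketch is a hypothetical class whose map stores File values as a stand-in for CrawlJob instances.

```java
import java.io.File;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the Engine's job registry, not part of Heritrix.
class JobRegistrySketch {

    // Stand-in for jobConfigs: short name -> job directory.
    private final Map<String, File> jobConfigs = new ConcurrentHashMap<>();

    /** Registers dir under its directory name; returns false if invalid or already known. */
    boolean addJobDirectory(File dir) {
        if (dir == null || !dir.isDirectory()) {
            return false;
        }
        // putIfAbsent returns null only when no earlier mapping existed.
        return jobConfigs.putIfAbsent(dir.getName(), dir) == null;
    }

    /** Number of registered job directories. */
    int size() {
        return jobConfigs.size();
    }
}
```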
-
getJobConfigs
-
copy
Copy a job to a new location, possibly making a job a profile or a profile a runnable job.
- Parameters:
orig
- CrawlJob representing source
destDir
- File location destination
asProfile
- true if destination should become a profile
- Throws:
IOException
-
copy
Copy a job to a new location, possibly making a job a profile or a profile a runnable job.
- Parameters:
cj
- CrawlJob representing source
copyTo
- String location destination; interpreted relative to jobsDir
asProfile
- true if destination should become a profile
- Throws:
IOException
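Resolving a destination string relative to jobsDir and copying a directory tree can be sketched as below. This is an illustration only: JobCopySketch, resolveDestination, and copyTree are hypothetical names, and the asProfile conversion is not modelled because this page does not say how a profile differs from a runnable job on disk.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Heritrix.
class JobCopySketch {

    /** Resolves copyTo against jobsDir; Path.resolve returns an absolute copyTo unchanged. */
    static Path resolveDestination(Path jobsDir, String copyTo) {
        return jobsDir.resolve(copyTo).normalize();
    }

    /** Recursively copies every file and directory under src into dest. */
    static void copyTree(Path src, Path dest) throws IOException {
        List<Path> sources;
        try (Stream<Path> walk = Files.walk(src)) {
            // Files.walk visits a directory before its contents.
            sources = walk.collect(Collectors.toList());
        }
        for (Path p : sources) {
            Path target = dest.resolve(src.relativize(p).toString());
            if (Files.isDirectory(p)) {
                Files.createDirectories(target);
            } else {
                Files.copy(p, target); // throws if target already exists
            }
        }
    }
}
```

Collecting the walk into a list before copying keeps the traversal from picking up newly created files when the destination lies inside the source tree.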
-
getHeritrixVersion
-
deleteJob
- Throws:
IOException
-
requestLaunch
-
getJob
-
getJobsDir
-
heapReportData
-
heapReport
-
shutdown
public void shutdown()
-
waitForNoRunningJobs
public boolean waitForNoRunningJobs(long timeout)
Wait for all jobs to be in non-running state, or until timeout (given in ms) elapses. Use '0' for no timeout (wait as long as necessary).
- Parameters:
timeout
- maximum time to wait, in ms; 0 for no timeout
- Returns:
- true if timeout occurred and a job is (possibly) still running
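The timeout semantics (0 means wait indefinitely, and a true result means the timeout expired) can be sketched with a simple polling loop. This is an illustrative sketch, not the Engine's code: WaitSketch, waitForNoRunning, the BooleanSupplier parameter, and the 20 ms polling interval are all assumptions.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Heritrix.
class WaitSketch {

    /**
     * Polls anyRunning until it reports false or timeoutMs elapses.
     * A timeoutMs of 0 means wait as long as necessary.
     * Returns true if the timeout expired while a job was still reported running.
     */
    static boolean waitForNoRunning(BooleanSupplier anyRunning, long timeoutMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (anyRunning.getAsBoolean()) {
            if (timeoutMs > 0 && System.nanoTime() - deadline >= 0) {
                return true; // timed out with a job (possibly) still running
            }
            Thread.sleep(20); // hypothetical polling interval
        }
        return false; // no job was running when last checked
    }
}
```

Note the inverted sense of the result: callers treat false as the success case, for example before proceeding with a shutdown.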
-
getProfileCxmlResource
- Returns:
- InputStream resource from defined profile CXML path
-
createNewJobWithDefaults
Create a new job directory and copy the profile CXML into it as a non-profile CXML.
- Parameters:
newJobDir
- new job directory
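Creating a directory and copying a profile stream into it can be sketched as follows. The page does not document the boolean result or the target file name, so this sketch assumes that true means success and takes the file name as a parameter. NewJobSketch, createJob, and cxmlName are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;

// Hypothetical helper, not part of Heritrix.
class NewJobSketch {

    /**
     * Creates newJobDir (and any missing parents) and writes profileCxml into it
     * as cxmlName. Returns false instead of throwing on any failure.
     * ASSUMPTION: true signals success; the real return contract is undocumented here.
     */
    static boolean createJob(File newJobDir, InputStream profileCxml, String cxmlName) {
        try (InputStream in = profileCxml) {
            Files.createDirectories(newJobDir.toPath());
            // Files.copy refuses to overwrite, so an existing job file is a failure.
            Files.copy(in, new File(newJobDir, cxmlName).toPath());
            return true;
        } catch (IOException | RuntimeException e) {
            return false;
        }
    }
}
```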
-
writeJobPathFile
Writes a .jobpath file for the new CrawlJob, whose directory is outside the main Engine jobs directory.
- Parameters:
job
- CrawlJob whose main directory the .jobpath should point to
- Throws:
IOException
- for any IO error
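A pointer file for a job stored outside the jobs directory might be written as in the sketch below. Both the file name (short name plus ".jobpath") and the contents (the absolute path as plain text) are assumptions, since this page does not specify the format. JobPathWriteSketch and writeJobPath are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical helper, not part of Heritrix.
class JobPathWriteSketch {

    /**
     * Writes jobsDir/<shortName>.jobpath containing the absolute path of jobDir.
     * ASSUMPTION: both the file name and the plain-text contents are illustrative.
     */
    static File writeJobPath(File jobsDir, String shortName, File jobDir)
            throws IOException {
        File jobPathFile = new File(jobsDir, shortName + ".jobpath");
        Files.write(jobPathFile.toPath(),
                jobDir.getAbsolutePath().getBytes(StandardCharsets.UTF_8));
        return jobPathFile;
    }
}
```

Keeping the pointer inside the jobs directory lets a later scan of that directory rediscover the external job.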
-