Class DiskSpaceMonitor

java.lang.Object
org.archive.crawler.monitor.DiskSpaceMonitor
All Implemented Interfaces:
EventListener, org.springframework.context.ApplicationListener<org.springframework.context.ApplicationEvent>

public class DiskSpaceMonitor extends Object implements org.springframework.context.ApplicationListener<org.springframework.context.ApplicationEvent>
Monitors the available space on the configured paths. If the available space on any monitored path drops below a specified threshold, a crawl pause is requested.

Monitoring is done via the java.io.File.getUsableSpace() method. This method can fail on network-attached storage, returning 0 bytes available even when that is not actually the case.

Paths that do not resolve to actual filesystem folders or files will not be evaluated (i.e. if java.io.File.exists() returns false no further processing is carried out on that File).

Paths are checked for available space whenever a StatSnapshotEvent occurs.
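
The probing rules described above can be sketched in plain Java. This is an illustration only, not the Heritrix source: the class name UsableSpaceProbe and the method usableMiB are hypothetical. It relies solely on java.io.File.exists() and java.io.File.getUsableSpace().

```java
import java.io.File;
import java.util.OptionalLong;

// Illustrative sketch only; the class and method names are hypothetical
// and are not taken from the Heritrix source.
final class UsableSpaceProbe {

    static final long BYTES_PER_MIB = 1024L * 1024L;

    /**
     * Returns the usable space of the given path in whole MiB, or an empty
     * result if the path does not exist and is therefore skipped.
     */
    static OptionalLong usableMiB(File path) {
        if (!path.exists()) {
            return OptionalLong.empty(); // non-existent paths are not evaluated
        }
        // May be 0 on some network-attached storage even when space is free.
        return OptionalLong.of(path.getUsableSpace() / BYTES_PER_MIB);
    }

    public static void main(String[] args) {
        // Prints one line per path given on the command line.
        for (String name : args) {
            OptionalLong mib = usableMiB(new File(name));
            System.out.println(name + ": "
                    + (mib.isPresent() ? mib.getAsLong() + " MiB usable"
                                       : "skipped (does not exist)"));
        }
    }
}
```

Running the sketch against a network mount before adding that mount to the monitored paths is one way to see whether it reports 0 usable bytes.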

Author:
Kristinn Sigurðsson
  • Field Details

    • monitorPaths

      protected List<String> monitorPaths
    • pauseThresholdMiB

      protected long pauseThresholdMiB
    • controller

      protected CrawlController controller
    • configPathConfigurer

      protected org.archive.spring.ConfigPathConfigurer configPathConfigurer
    • monitorConfigPaths

      protected boolean monitorConfigPaths
  • Constructor Details

    • DiskSpaceMonitor

      public DiskSpaceMonitor()
  • Method Details

    • setMonitorPaths

      public void setMonitorPaths(List<String> monitorPaths)
      Parameters:
      monitorPaths - List of filesystem paths that should be monitored for available space.
    • getMonitorPaths

      public List<String> getMonitorPaths()
    • setPauseThresholdMiB

      public void setPauseThresholdMiB(long pauseThresholdMiB)
      Sets the minimum amount of space that must be available on all monitored paths. If the usable space on any path falls below this pause threshold, the crawl will be paused.
      Parameters:
      pauseThresholdMiB - The desired pause threshold value. Specified in mebibytes (MiB), where 1 MiB = 1,048,576 bytes.
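
Because File.getUsableSpace() reports bytes while the threshold is given in MiB, a conversion is needed somewhere. A minimal sketch of that arithmetic follows; the helper MiBConversion is hypothetical, and how the monitor itself performs the conversion is not stated here.

```java
// Hypothetical helper illustrating the MiB-to-bytes arithmetic.
final class MiBConversion {

    // 1 MiB = 2^20 bytes = 1,048,576 bytes (not 1,000,000).
    static final long BYTES_PER_MIB = 1024L * 1024L;

    /** Converts MiB to bytes, throwing ArithmeticException on overflow. */
    static long mibToBytes(long mib) {
        return Math.multiplyExact(mib, BYTES_PER_MIB);
    }
}
```

For example, a threshold of 500 MiB corresponds to 524,288,000 bytes, and 8192 MiB (8 GiB) to 8,589,934,592 bytes.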
    • getPauseThresholdMiB

      public long getPauseThresholdMiB()
    • setMonitorConfigPaths

      public void setMonitorConfigPaths(boolean monitorConfigPaths)
      If enabled, all the paths returned by ConfigPathConfigurer.getAllConfigPaths() will be monitored in addition to any paths explicitly specified via setMonitorPaths(List).

      true by default.

      Note: This is not guaranteed to contain all paths that Heritrix writes to. It is the responsibility of modules that write to disk to register their activity with the ConfigPathConfigurer and some may not do so.

      Parameters:
      monitorConfigPaths - If config paths should be monitored for usable space.
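
The effect of this flag on the set of checked paths can be sketched as follows. Everything here is an assumption made for illustration: the class MonitoredPathsSketch is hypothetical, config paths are represented as plain strings, and duplicates are collapsed, which the actual monitor may or may not do.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; not the Heritrix implementation.
final class MonitoredPathsSketch {

    /**
     * Combines explicitly configured paths with registered config paths.
     * Config paths are included only when monitorConfigPaths is true.
     */
    static Set<String> pathsToCheck(List<String> explicitPaths,
                                    Collection<String> configPaths,
                                    boolean monitorConfigPaths) {
        // LinkedHashSet keeps insertion order and drops duplicates (an assumption).
        Set<String> result = new LinkedHashSet<>(explicitPaths);
        if (monitorConfigPaths) {
            result.addAll(configPaths);
        }
        return result;
    }
}
```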
    • getMonitorConfigPaths

      public boolean getMonitorConfigPaths()
    • setCrawlController

      @Autowired public void setCrawlController(CrawlController controller)
      Autowire access to CrawlController
    • getCrawlController

      public CrawlController getCrawlController()
    • setConfigPathConfigurer

      @Autowired public void setConfigPathConfigurer(org.archive.spring.ConfigPathConfigurer configPathConfigurer)
      Autowire access to ConfigPathConfigurer
    • getConfigPathConfigurer

      public org.archive.spring.ConfigPathConfigurer getConfigPathConfigurer()
    • onApplicationEvent

      public void onApplicationEvent(org.springframework.context.ApplicationEvent event)
      Checks available space on StatSnapshotEvents.
      Specified by:
      onApplicationEvent in interface org.springframework.context.ApplicationListener<org.springframework.context.ApplicationEvent>
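
A listener registered for the general ApplicationEvent type receives every event and has to filter for the one it cares about. The sketch below shows that pattern using java.util.EventObject stand-ins rather than the Spring and Heritrix classes; SnapshotEvent, OtherEvent and SnapshotListenerSketch are hypothetical names, and the counter merely marks where the path checks would run.

```java
import java.util.EventObject;

// Hypothetical sketch of type-based event filtering; not the Heritrix source.
final class SnapshotListenerSketch {

    /** Stand-in for StatSnapshotEvent. */
    static class SnapshotEvent extends EventObject {
        SnapshotEvent(Object source) { super(source); }
    }

    /** Stand-in for any unrelated application event. */
    static class OtherEvent extends EventObject {
        OtherEvent(Object source) { super(source); }
    }

    int checksRun = 0;

    /** Reacts only to snapshot events; all other events are ignored. */
    void onEvent(EventObject event) {
        if (event instanceof SnapshotEvent) {
            // A real monitor would check each monitored path here.
            checksRun++;
        }
    }
}
```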
    • checkAvailableSpace

      protected void checkAvailableSpace(File path)
      Probes the given path via File.getUsableSpace() to see whether its usable space has fallen below the pause threshold. If so, requests a crawl pause.
      Parameters:
      path - The filesystem path to check for usable space
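
A minimal sketch of such a check, under stated assumptions: the pause request is modelled as a Runnable callback instead of a call on CrawlController, the comparison is strictly "less than", and the class PauseCheckSketch is hypothetical. The actual method returns void; the boolean return here exists only to make the outcome observable.

```java
import java.io.File;

// Hypothetical sketch; not the Heritrix implementation.
final class PauseCheckSketch {

    static final long BYTES_PER_MIB = 1024L * 1024L;

    /**
     * Runs requestPause and returns true if the path exists and its usable
     * space is below the threshold; otherwise returns false.
     */
    static boolean checkAvailableSpace(File path, long pauseThresholdMiB,
                                       Runnable requestPause) {
        if (!path.exists()) {
            return false; // paths that do not exist are not evaluated
        }
        long thresholdBytes = pauseThresholdMiB * BYTES_PER_MIB;
        if (path.getUsableSpace() < thresholdBytes) {
            requestPause.run(); // stand-in for requesting a crawl pause
            return true;
        }
        return false;
    }
}
```

Note that a path reporting 0 usable bytes, as can happen on network-attached storage, would trigger the callback for any positive threshold.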