Class BuildHostMap

java.lang.Object
it.unimi.dsi.webgraph.BuildHostMap

public class BuildHostMap
extends java.lang.Object
A class computing host-related data given a list of URLs (usually, the URLs of the nodes of a web graph). All processing is performed by the static utility method run(BufferedReader, PrintStream, DataOutputStream, DataOutputStream, boolean, ProgressLogger).

Warning: the main method of this class writes the host list to standard output and also does some logging, so make sure that logging is not directed to standard output.

Author:
Sebastiano Vigna
  • Field Summary

    Fields 
    Modifier and Type Field Description
    static java.util.regex.Pattern DOTTED_ADDRESS  
  • Constructor Summary

    Constructors 
    Constructor Description
    BuildHostMap()  
  • Method Summary

    Modifier and Type Method Description
    static void main(java.lang.String[] arg)
    static void run(java.io.BufferedReader br, java.io.PrintStream hosts, java.io.DataOutputStream mapDos, java.io.DataOutputStream countDos, boolean topPrivateDomain, ProgressLogger pl)
    This method reads URLs and writes hosts (or, possibly, top private domains), together with a map from URLs to hosts and a host count.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • DOTTED_ADDRESS

      public static final java.util.regex.Pattern DOTTED_ADDRESS
  • Constructor Details

  • Method Details

    • run

      public static void run(java.io.BufferedReader br, java.io.PrintStream hosts, java.io.DataOutputStream mapDos, java.io.DataOutputStream countDos, boolean topPrivateDomain, ProgressLogger pl) throws java.io.IOException, java.net.URISyntaxException
      This method reads URLs and writes hosts (or, possibly, top private domains), together with a map from URLs to hosts and a host count.
      Parameters:
      br - the buffered reader returning the list of URLs.
      hosts - the print stream where hosts will be printed.
      mapDos - the data output stream where the map from URLs to hosts will be written (one integer per URL).
      countDos - the data output stream where the host counts will be written (one integer per host).
      topPrivateDomain - if true, we use InternetDomainName.topPrivateDomain() to map to top private domains, rather than hosts.
      pl - a progress logger, or null.
      Throws:
      java.io.IOException
      java.net.URISyntaxException
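
The three outputs described above can be illustrated with a small stand-alone sketch. This is not the library's implementation: HostMapSketch and its simplified four-argument run method are hypothetical names that use only the JDK, they omit the topPrivateDomain and ProgressLogger arguments, and they assume that hosts are numbered in order of first appearance and that the count of a host is the number of URLs mapped to it.

```java
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.io.StringReader;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; NOT BuildHostMap itself.
final class HostMapSketch {

    // Reads one URL per line; prints each distinct host once, writes one
    // host index per URL to mapDos and one URL count per host to countDos.
    // Assumption: hosts are indexed in order of first appearance.
    static void run(BufferedReader br, PrintStream hosts,
            DataOutputStream mapDos, DataOutputStream countDos)
            throws IOException, URISyntaxException {
        Map<String, Integer> hostIndex = new HashMap<>();
        List<Integer> counts = new ArrayList<>();
        String line;
        while ((line = br.readLine()) != null) {
            String host = new URI(line).getHost();
            if (host == null) {
                throw new IllegalArgumentException("No host in URL: " + line);
            }
            Integer index = hostIndex.get(host);
            if (index == null) {
                // First time we see this host: assign the next index.
                index = hostIndex.size();
                hostIndex.put(host, index);
                counts.add(0);
                hosts.println(host);
            }
            counts.set(index, counts.get(index) + 1);
            mapDos.writeInt(index); // one integer per URL
        }
        for (int count : counts) {
            countDos.writeInt(count); // one integer per host
        }
        hosts.flush();
        mapDos.flush();
        countDos.flush();
    }

    public static void main(String[] args) throws Exception {
        String urls = "http://a.example/1\nhttp://b.example/\nhttp://a.example/2\n";
        ByteArrayOutputStream mapBytes = new ByteArrayOutputStream();
        ByteArrayOutputStream countBytes = new ByteArrayOutputStream();
        // The host list goes to standard output; map and counts stay in memory.
        run(new BufferedReader(new StringReader(urls)), System.out,
                new DataOutputStream(mapBytes), new DataOutputStream(countBytes));
        System.err.println("map bytes: " + mapBytes.size()
                + ", count bytes: " + countBytes.size());
    }
}
```

In this sketch the integers are written with DataOutputStream.writeInt, so they can be read back with DataInputStream.readInt. The diagnostic line goes to standard error so that it does not mix with the host list on standard output.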
    • main

      public static void main(java.lang.String[] arg) throws java.io.IOException, JSAPException, java.net.URISyntaxException
      Throws:
      java.io.IOException
      JSAPException
      java.net.URISyntaxException