Class URLUtil
java.lang.Object
    com.digitalpebble.stormcrawler.util.URLUtil

public class URLUtil extends Object
Utility class for URL analysis
Method Summary
static String getHost(String url)
    Returns the lowercased hostname for the URL, or null if the URL is not well formed.
static String[] getHostSegments(String url)
    Splits the hostname of the URL into segments at each ".".
static String[] getHostSegments(URL url)
    Splits the hostname of the URL into segments at each ".".
static String getPage(String url)
    Returns the page for the URL.
static URL resolveURL(URL base, String target)
    Resolves relative URLs and fixes a few java.net.URL errors in the handling of URLs with embedded params and pure query targets.
static String toASCII(String url)
static String toUNICODE(String url)
Method Detail
-
resolveURL
public static URL resolveURL(URL base, String target) throws MalformedURLException
Resolves relative URLs and fixes a few java.net.URL errors in the handling of URLs with embedded params and pure query targets.
Parameters:
base - base URL
target - target URL (may be relative)
Returns:
resolved absolute URL
Throws:
MalformedURLException
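
The pure-query case mentioned above can be illustrated with a minimal sketch. It is not the library's implementation: the class ResolveSketch and its resolve helper are hypothetical names, and the approach (re-attaching the last path segment of the base when the target starts with "?") is only one plausible way to work around java.net.URL, whose relative resolution may otherwise drop that segment.

```java
import java.net.MalformedURLException;
import java.net.URL;

class ResolveSketch {
    // Hypothetical helper, not the library's code: resolves target against base,
    // re-attaching the base's last path segment when target is only a query.
    static URL resolve(URL base, String target) throws MalformedURLException {
        target = target.trim();
        if (target.startsWith("?")) {
            String path = base.getPath();
            // Everything after the final '/' (the whole path if there is no '/').
            String lastSegment = path.substring(path.lastIndexOf('/') + 1);
            target = lastSegment + target;
        }
        return new URL(base, target);
    }

    public static void main(String[] args) throws MalformedURLException {
        URL base = new URL("http://example.com/a/b.html");
        System.out.println(resolve(base, "c.html")); // http://example.com/a/c.html
        System.out.println(resolve(base, "?x=1"));   // http://example.com/a/b.html?x=1
    }
}
```

With the real class, the equivalent call would be URLUtil.resolveURL(base, target), which returns a URL and declares MalformedURLException as documented above.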
-
getHostSegments
public static String[] getHostSegments(URL url)
Splits the hostname of the URL into segments at each ".".
-
getHostSegments
public static String[] getHostSegments(String url) throws MalformedURLException
Splits the hostname of the URL into segments at each ".".
Throws:
MalformedURLException
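
As a rough illustration of what splitting a hostname at each "." looks like, the sketch below uses only java.net.URL and String.split. HostSegmentsSketch and hostSegments are hypothetical names; the library's own handling of special cases (for example IP addresses) is not described on this page and is not modelled here.

```java
import java.net.MalformedURLException;
import java.net.URL;

class HostSegmentsSketch {
    // Splits the host part of an already parsed URL at each literal dot.
    static String[] hostSegments(URL url) {
        return url.getHost().split("\\.");
    }

    // String overload: parsing may fail, hence the checked exception.
    static String[] hostSegments(String url) throws MalformedURLException {
        return hostSegments(new URL(url));
    }

    public static void main(String[] args) throws MalformedURLException {
        String[] segments = hostSegments("http://www.example.com/index.html");
        System.out.println(String.join(" | ", segments)); // www | example | com
    }
}
```

The two overloads mirror the documented signatures: only the String variant declares MalformedURLException, because the URL variant receives an already parsed URL.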
-
getHost
public static String getHost(String url)
Returns the lowercased hostname for the URL, or null if the URL is not well formed.
Parameters:
url - the URL to check
Returns:
the hostname for the URL
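
The documented contract (lowercased hostname, or null for a malformed URL) can be sketched as follows. HostSketch is a hypothetical stand-in built on java.net.URL, assuming "not well formed" means that URL parsing fails; it is not the library's code.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Locale;

class HostSketch {
    // Returns the lowercased host, or null when the string cannot be parsed as a URL.
    static String getHost(String url) {
        try {
            return new URL(url).getHost().toLowerCase(Locale.ROOT);
        } catch (MalformedURLException e) {
            return null; // assumption: parse failure maps to the documented null result
        }
    }

    public static void main(String[] args) {
        System.out.println(getHost("http://WWW.Example.COM/Path")); // www.example.com
        System.out.println(getHost("not a url"));                   // null
    }
}
```

Because null signals a malformed input rather than an exception, callers of the real URLUtil.getHost should check the result before using it.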
-
getPage
public static String getPage(String url)
Returns the page for the URL. The page consists of the protocol, host, and path, but does not include the query string. The host is lowercased but the path is not.
Parameters:
url - the URL to check
Returns:
the page for the URL
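
A minimal sketch of the description above, assembling protocol, lowercased host, and unmodified path while leaving out the query string. PageSketch is a hypothetical name, and two choices are assumptions not stated on this page: the port is omitted, and a malformed URL yields null.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Locale;

class PageSketch {
    // Builds protocol://host/path with a lowercased host and the path left as-is.
    static String getPage(String url) {
        try {
            URL u = new URL(url);
            String host = u.getHost().toLowerCase(Locale.ROOT);
            // getPath() excludes the query string; the port is deliberately left out.
            return u.getProtocol() + "://" + host + u.getPath();
        } catch (MalformedURLException e) {
            return null; // assumption: the page does not document this case
        }
    }

    public static void main(String[] args) {
        // Prints http://example.com/Some/Path.html
        System.out.println(getPage("http://Example.COM/Some/Path.html?q=1"));
    }
}
```

Keeping the path's case intact matters because paths are case-sensitive on many servers, while hostnames are not.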