Class RobotsTags
java.lang.Object
    com.digitalpebble.stormcrawler.util.RobotsTags

public class RobotsTags extends Object
Normalises the robots instructions provided by the HTML meta tags or the HTTP X-Robots-Tag headers.
-
-
Field Summary
Fields (Modifier and Type, Field, Description)
static String ROBOTS_NO_CACHE
static String ROBOTS_NO_FOLLOW
static String ROBOTS_NO_FOLLOW_STRICT
Whether to interpret the noFollow directive strictly (remove links) or not (remove anchor and do not track original URL).
static String ROBOTS_NO_INDEX
-
Constructor Summary
Constructors (Constructor, Description)
RobotsTags()
RobotsTags(Metadata metadata, String protocolMDprefix)
Get the values from the fetch metadata
-
Method Summary
Methods (Modifier and Type, Method, Description)
void extractMetaTags(String content)
Extracts meta tags based on the value of the content attribute
void extractMetaTags(DocumentFragment doc)
boolean isNoCache()
boolean isNoFollow()
boolean isNoIndex()
void normaliseToMetadata(Metadata metadata)
Adds a normalised representation of the directives in the metadata
-
-
-
Field Detail
-
ROBOTS_NO_INDEX
public static final String ROBOTS_NO_INDEX
- See Also:
- Constant Field Values
-
ROBOTS_NO_FOLLOW
public static final String ROBOTS_NO_FOLLOW
- See Also:
- Constant Field Values
-
ROBOTS_NO_FOLLOW_STRICT
public static final String ROBOTS_NO_FOLLOW_STRICT
Whether to interpret the noFollow directive strictly (remove links) or not (remove anchor and do not track original URL). True by default.
- See Also:
- Constant Field Values
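The two interpretations described for ROBOTS_NO_FOLLOW_STRICT can be illustrated with a small stand-alone sketch. It does not use StormCrawler types: the Link holder and the applyNoFollow method are hypothetical names introduced only for this example, and the real handling of outlinks may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative sketch only; Link and applyNoFollow are hypothetical, not StormCrawler API. */
final class NoFollowSketch {

    /** Hypothetical outlink holder. */
    static final class Link {
        final String targetUrl;
        final String anchor;    // anchor text, may be null
        final String sourceUrl; // URL of the page the link was found on, may be null

        Link(String targetUrl, String anchor, String sourceUrl) {
            this.targetUrl = targetUrl;
            this.anchor = anchor;
            this.sourceUrl = sourceUrl;
        }
    }

    /**
     * Strict mode drops every link; non-strict mode keeps the targets but
     * removes the anchor text and the reference to the original URL.
     */
    static List<Link> applyNoFollow(List<Link> links, boolean noFollow, boolean strict) {
        if (!noFollow) {
            return links; // directive absent: nothing to change
        }
        if (strict) {
            return Collections.emptyList(); // strict: remove links
        }
        List<Link> stripped = new ArrayList<>();
        for (Link link : links) {
            stripped.add(new Link(link.targetUrl, null, null));
        }
        return stripped;
    }
}
```

Because the setting is documented as true by default, a caller following this sketch would pass strict = true unless configured otherwise.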
-
ROBOTS_NO_CACHE
public static final String ROBOTS_NO_CACHE
- See Also:
- Constant Field Values
-
-
Method Detail
-
extractMetaTags
public void extractMetaTags(DocumentFragment doc) throws XPathExpressionException
- Throws:
XPathExpressionException
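As a rough illustration of why this overload can throw XPathExpressionException, the sketch below locates a robots meta element in a DocumentFragment using the JDK's javax.xml.xpath API. The class name, the ".//meta" expression and the sample fragment are assumptions made for this example; the expression actually used by RobotsTags is not stated on this page, and HTML parsers may emit element names in a different case.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Illustrative sketch only; not the StormCrawler implementation. */
final class MetaRobotsXPathSketch {

    /** Returns the content attribute of the first meta element named "robots", or null. */
    static String robotsContent(DocumentFragment fragment) throws XPathExpressionException {
        NodeList metas = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(".//meta", fragment, XPathConstants.NODESET);
        for (int i = 0; i < metas.getLength(); i++) {
            Element meta = (Element) metas.item(i);
            // The name attribute is compared case-insensitively in Java.
            if ("robots".equalsIgnoreCase(meta.getAttribute("name"))) {
                return meta.getAttribute("content");
            }
        }
        return null;
    }

    /** Builds a tiny html/head/meta fragment to exercise the method. */
    static DocumentFragment sampleFragment() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        DocumentFragment fragment = doc.createDocumentFragment();
        Element html = doc.createElement("html");
        Element head = doc.createElement("head");
        Element meta = doc.createElement("meta");
        meta.setAttribute("name", "robots");
        meta.setAttribute("content", "noindex, nofollow");
        head.appendChild(meta);
        html.appendChild(head);
        fragment.appendChild(html);
        return fragment;
    }
}
```

The returned string is the kind of value that extractMetaTags(String content) is documented to work from.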
-
extractMetaTags
public void extractMetaTags(String content)
Extracts meta tags based on the value of the content attribute
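A minimal sketch of how a content attribute value such as "noindex, nofollow" might be turned into the three flags exposed by isNoIndex(), isNoFollow() and isNoCache(). This is not the library's code: the class name is hypothetical, and the mapping of "noarchive" to the no-cache flag is an assumption of this example.

```java
import java.util.Locale;

/** Illustrative sketch only; a hypothetical stand-in, not the StormCrawler class. */
final class RobotsDirectiveSketch {
    boolean noIndex;
    boolean noFollow;
    boolean noCache;

    /** Parses a comma-separated robots value, ignoring case and surrounding spaces. */
    void parse(String content) {
        if (content == null) {
            return;
        }
        for (String raw : content.split(",")) {
            String directive = raw.trim().toLowerCase(Locale.ROOT);
            switch (directive) {
                case "noindex":
                    noIndex = true;
                    break;
                case "nofollow":
                    noFollow = true;
                    break;
                case "noarchive": // assumption: treated as "no cache" in this sketch
                    noCache = true;
                    break;
                case "none": // commonly documented shorthand for "noindex, nofollow"
                    noIndex = true;
                    noFollow = true;
                    break;
                default:
                    break; // unknown directives are ignored
            }
        }
    }
}
```

Flags are only ever switched on, so calling parse more than once (for example for a header value and then a meta tag) accumulates the directives seen.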
-
normaliseToMetadata
public void normaliseToMetadata(Metadata metadata)
Adds a normalised representation of the directives in the metadata
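The idea of a normalised representation can be sketched as writing each directive under a fixed key with a boolean string value. The example uses a plain Map in place of the Metadata class, and the key strings are placeholders: the real keys are the ROBOTS_* constants, whose values are listed under Constant Field Values and are not reproduced here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only; a Map stands in for Metadata and the keys are placeholders. */
final class NormaliseSketch {
    // Placeholder keys, not the actual constant values.
    static final String NO_INDEX_KEY = "example.noIndex";
    static final String NO_FOLLOW_KEY = "example.noFollow";
    static final String NO_CACHE_KEY = "example.noCache";

    /** Stores one "true"/"false" entry per directive, whatever form the source took. */
    static Map<String, String> normalise(boolean noIndex, boolean noFollow, boolean noCache) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put(NO_INDEX_KEY, Boolean.toString(noIndex));
        metadata.put(NO_FOLLOW_KEY, Boolean.toString(noFollow));
        metadata.put(NO_CACHE_KEY, Boolean.toString(noCache));
        return metadata;
    }
}
```

Downstream components can then check a single key per directive instead of re-parsing headers or meta tags.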
-
isNoIndex
public boolean isNoIndex()
-
isNoFollow
public boolean isNoFollow()
-
isNoCache
public boolean isNoCache()
-
-