java.lang.Object
  it.unimi.dsi.parser.callback.DefaultCallback
    it.unimi.dsi.parser.callback.LinkExtractor
public class LinkExtractor
extends DefaultCallback
A callback extracting links.

This callback extracts the links found in a web page. The links are then accessible in urls (a set of Strings). The iteration order of the set is guaranteed to be exactly the order in which the links were encountered, although duplicates appear only once.
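The ordering guarantee on urls matches the behaviour of an insertion-ordered set. When you add a link that is already present, it keeps its original position. The sketch below shows this behaviour with java.util.LinkedHashSet. The class name and link values are hypothetical, and the sketch makes no claim about which set implementation LinkExtractor uses internally.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical demo of the iteration-order guarantee documented for urls.
class InsertionOrderDemo {
    // Adds links in the order they are met. A LinkedHashSet ignores duplicates
    // but remembers the position of each link's first insertion.
    static Set<String> collect(String... linksInPageOrder) {
        Set<String> urls = new LinkedHashSet<>();
        for (String link : linksInPageOrder) {
            urls.add(link);
        }
        return urls;
    }

    public static void main(String[] args) {
        Set<String> urls = collect("b.html", "a.html", "b.html", "c.html");
        System.out.println(urls); // prints [b.html, a.html, c.html]
    }
}
```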
Field Summary | |
---|---|
Set<String> | urls: The URLs resulting from the parsing process. |
Fields inherited from interface it.unimi.dsi.parser.callback.Callback |
---|
EMPTY_CALLBACK_ARRAY |
Constructor Summary | |
---|---|
LinkExtractor() | |
Method Summary | |
---|---|
String | base(): Returns the URL specified by the BASE element. |
void | configure(BulletParser parser): Configure the parser to parse elements and certain attributes. |
String | metaLocation(): Returns the URL specified by META HTTP-EQUIV elements of location type. |
String | metaRefresh(): Returns the URL specified by META HTTP-EQUIV elements of refresh type. |
void | startDocument(): Receive notification of the beginning of the document. |
boolean | startElement(Element element, Map<Attribute,MutableString> attrMap): Receive notification of the start of an element. |
Methods inherited from class it.unimi.dsi.parser.callback.DefaultCallback |
---|
cdata, characters, endDocument, endElement, getInstance |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public final Set<String> urls
Constructor Detail |
---|
public LinkExtractor()
Method Detail |
---|
public void configure(BulletParser parser)
Configure the parser to parse elements and certain attributes. The required attributes are SRC, HREF, HTTP-EQUIV, and CONTENT.
Specified by: configure in interface Callback
Overrides: configure in class DefaultCallback
public void startDocument()
Description copied from interface: Callback
Receive notification of the beginning of the document. The callback must use this method to reset its internal state so that it can be reused. It must be safe to invoke this method several times.
Specified by: startDocument in interface Callback
Overrides: startDocument in class DefaultCallback
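The urls field is declared final. A reset in startDocument() therefore has to clear the existing set in place rather than assign a new one, and clearing is naturally idempotent. The following hypothetical skeleton illustrates that contract. The class ResettableLinkCollector and its base and metaRefresh fields are assumptions; this is not the actual LinkExtractor source.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical callback skeleton illustrating the reset contract of
// startDocument(). It is not the real LinkExtractor implementation.
class ResettableLinkCollector {
    final Set<String> urls = new LinkedHashSet<>(); // final, like LinkExtractor.urls
    String base;        // assumed field: first BASE URL seen, or null
    String metaRefresh; // assumed field: first refresh URL seen, or null

    // Clearing state is idempotent: two calls in a row leave the object
    // in the same state as a single call.
    void startDocument() {
        urls.clear();
        base = null;
        metaRefresh = null;
    }

    public static void main(String[] args) {
        ResettableLinkCollector c = new ResettableLinkCollector();
        c.urls.add("old.html");
        c.base = "http://example.com/";
        c.startDocument();
        c.startDocument(); // safe: the second call changes nothing
        System.out.println(c.urls.isEmpty() && c.base == null); // prints true
    }
}
```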
public boolean startElement(Element element, Map<Attribute,MutableString> attrMap)
Description copied from interface: Callback
Receive notification of the start of an element. For simple elements, this is the only notification that the callback will ever receive.
Specified by: startElement in interface Callback
Overrides: startElement in class DefaultCallback
Parameters:
element - the element whose opening tag was found.
attrMap - a map from Attribute objects to MutableString objects.
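The sketch below gives a rough idea of what can happen on each opening tag. It collects HREF and SRC values and records the first BASE URL. It is a simplified, hypothetical stand-in with several differences from the real method:

* Element and attribute names are plain upper-case Strings instead of the Element, Attribute and MutableString types in the real signature.
* The method returns nothing.
* The exact set of elements LinkExtractor inspects is not specified on this page.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for startElement(). Names are plain
// Strings here, not dsiutils Element/Attribute/MutableString objects.
class SimpleLinkCallback {
    final Set<String> urls = new LinkedHashSet<>();
    String base; // first BASE HREF seen, or null

    void startElement(String element, Map<String, String> attrs) {
        String href = attrs.get("HREF");
        String src = attrs.get("SRC");
        if (element.equals("BASE")) {
            // In this sketch only the first BASE element is kept,
            // mirroring the behaviour documented for base().
            if (base == null && href != null) base = href;
            return;
        }
        if (href != null) urls.add(href);
        if (src != null) urls.add(src);
    }

    public static void main(String[] args) {
        SimpleLinkCallback cb = new SimpleLinkCallback();
        cb.startElement("BASE", Map.of("HREF", "http://example.com/dir/"));
        cb.startElement("A", Map.of("HREF", "page.html"));
        cb.startElement("IMG", Map.of("SRC", "logo.png"));
        cb.startElement("A", Map.of("HREF", "page.html")); // duplicate, ignored
        // prints [page.html, logo.png] http://example.com/dir/
        System.out.println(cb.urls + " " + cb.base);
    }
}
```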
public String metaLocation()
Returns the URL specified by META HTTP-EQUIV elements of location type.
Returns: a non-null result iff there is at least one META HTTP-EQUIV element specifying a location URL (if there is more than one, the first one is kept); null otherwise.
public String base()
Returns the URL specified by the BASE element.
Returns: a non-null result iff there is at least one BASE element specifying a derelativisation URL (if there is more than one, the first one is kept); null otherwise.
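The BASE URL is meant for derelativisation, that is, for resolving relative links against it. A minimal sketch using java.net.URI follows. The following are all assumptions made for illustration: the class Derelativise, the fallback to the page URL when base() returns null, and the sample links.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing how links could be derelativised.
class Derelativise {
    // Chooses the URL that relative links are resolved against: the BASE
    // element's URL when present, otherwise the page's own URL (assumption).
    static String effectiveBase(String baseElementUrl, String pageUrl) {
        return baseElementUrl != null ? baseElementUrl : pageUrl;
    }

    // Resolves every link against the given base. URI.resolve returns
    // already-absolute links unchanged and normalizes "../" segments.
    static List<String> resolveAll(String base, Iterable<String> links) {
        URI baseUri = URI.create(base);
        List<String> out = new ArrayList<>();
        for (String link : links) {
            out.add(baseUri.resolve(link).toString());
        }
        return out;
    }

    public static void main(String[] args) {
        String base = effectiveBase("http://example.com/dir/", "http://example.com/index.html");
        List<String> links = List.of("page.html", "../up.html", "http://other.org/x");
        // prints [http://example.com/dir/page.html, http://example.com/up.html, http://other.org/x]
        System.out.println(resolveAll(base, links));
    }
}
```

URI.create throws IllegalArgumentException for strings that are not valid URIs, for example strings with unescaped spaces. Links taken from real pages may therefore need escaping or error handling before resolution.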
public String metaRefresh()
Returns the URL specified by META HTTP-EQUIV elements of refresh type.
Returns: a non-null result iff there is at least one META HTTP-EQUIV element specifying a refresh URL (if there is more than one, the first one is kept); null otherwise.
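A META HTTP-EQUIV refresh element usually carries its target in the CONTENT attribute, in the form delay; URL=target. The sketch below extracts the target from such a value. It is an independent, hypothetical helper (RefreshContent.refreshUrl), and this page does not document the exact parsing rules LinkExtractor applies.

```java
import java.util.Locale;

// Hypothetical helper: extracts the target URL from a META refresh CONTENT
// value such as "5; URL=http://example.com/". Returns null if none is present.
class RefreshContent {
    static String refreshUrl(String content) {
        int semi = content.indexOf(';');
        if (semi < 0) return null; // e.g. "30": reload the same page, no URL
        String rest = content.substring(semi + 1).trim();
        // Strip an optional "url=" key (case-insensitive, spaces allowed around '=').
        if (rest.toLowerCase(Locale.ROOT).startsWith("url")) {
            String afterKey = rest.substring(3).trim();
            if (afterKey.startsWith("=")) rest = afterKey.substring(1).trim();
        }
        // Strip optional matching single or double quotes.
        if (rest.length() >= 2
                && (rest.charAt(0) == '\'' || rest.charAt(0) == '"')
                && rest.charAt(rest.length() - 1) == rest.charAt(0)) {
            rest = rest.substring(1, rest.length() - 1);
        }
        return rest.isEmpty() ? null : rest;
    }

    public static void main(String[] args) {
        System.out.println(refreshUrl("5; URL=http://example.com/")); // prints http://example.com/
        System.out.println(refreshUrl("0;url='next.html'"));         // prints next.html
        System.out.println(refreshUrl("30"));                        // prints null
    }
}
```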