Class URLGenerator
java.lang.Object
org.graphstream.stream.SourceBase
org.graphstream.algorithm.generator.BaseGenerator
org.graphstream.algorithm.generator.URLGenerator
- All Implemented Interfaces:
Generator, org.graphstream.stream.Source
- Direct Known Subclasses:
WikipediaGenerator
public class URLGenerator extends BaseGenerator
Generates a graph from the web. The generator starts from one or more given
URLs and extracts the links found on those pages. Each URL is a node, and an
edge connects two URLs when one links to the other. Links are extracted from
the "href" attribute of HTML elements.
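
To make the node and edge model concrete, here is a minimal sketch of href extraction using only the Java standard library. It is not the code URLGenerator runs: the class name HrefSketch, the simplified regular expression, and the "source -> target" edge label are all assumptions made for illustration, and real HTML generally calls for a proper parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefSketch {
    // Simplified, assumed pattern: href="..." or href='...' (case-insensitive).
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);

    /** Returns the raw href values found in the given HTML, in document order. */
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    /** Builds one "source -> target" label per link: each URL is a node, each link an edge. */
    static List<String> edgesFrom(String pageUrl, String html) {
        List<String> edges = new ArrayList<>();
        for (String target : extractLinks(html)) {
            edges.add(pageUrl + " -> " + target);
        }
        return edges;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://example.org/a\">A</a> <a HREF='http://example.org/b'>B</a>";
        System.out.println(edgesFrom("http://example.org/", html));
    }
}
```

The sketch does not resolve relative links or deduplicate targets; whether and how the generator does so is not stated on this page.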
-
Nested Class Summary
Nested Classes

| Modifier and Type | Class | Description |
|---|---|---|
| static class | URLGenerator.Mode | |
| static interface | URLGenerator.URLFilter | Defines url filter. |

-
Constructor Summary
Constructors

| Constructor | Description |
|---|---|
| URLGenerator(String... startFrom) | |

-
Method Summary
| Modifier and Type | Method | Description |
|---|---|---|
| void | acceptOnlyMatchingURL(String regex) | Can be used to filter url. |
| void | addHostFilter(String... hosts) | Can be used to filter url according to the host. |
| void | addURL(String url) | Add an url to process. |
| void | begin() | Begin the graph generation. |
| void | declineMatchingURL(String regex) | Can be used to filter url. |
| void | enableProgression(boolean on) | |
| boolean | nextEvents() | Perform the next step in generating the graph. |
| void | setDepthLimit(int depthLimit) | Set the maximum steps before stop. |
| void | setDirected(boolean on) | Create directed edges. |
| void | setEdgeWeightAttribute(String attribute) | Set the attribute key used to store weight of edges. |
| void | setMode(URLGenerator.Mode mode) | Set the way that url are converted to node id. |
| void | setNodeWeightAttribute(String attribute) | Set the attribute key used to store weight of nodes. |
| void | setThreadCount(int count) | Set the amount of threads used to parse urls. |

Methods inherited from class org.graphstream.algorithm.generator.BaseGenerator
addEdgeAttribute, addEdgeAttribute, addEdgeAttribute, addEdgeLabels, addNodeAttribute, addNodeAttribute, addNodeAttribute, addNodeLabels, end, isUsingInternalGraph, removeEdgeAttribute, removeNodeAttribute, setDirectedEdges, setRandomSeed, setUseInternalGraph

Methods inherited from class org.graphstream.stream.SourceBase
addAttributeSink, addElementSink, addSink, attributeSinks, clearAttributeSinks, clearElementSinks, clearSinks, elementSinks, removeAttributeSink, removeElementSink, removeSink, sendAttributeChangedEvent, sendAttributeChangedEvent, sendEdgeAdded, sendEdgeAdded, sendEdgeAttributeAdded, sendEdgeAttributeAdded, sendEdgeAttributeChanged, sendEdgeAttributeChanged, sendEdgeAttributeRemoved, sendEdgeAttributeRemoved, sendEdgeRemoved, sendEdgeRemoved, sendGraphAttributeAdded, sendGraphAttributeAdded, sendGraphAttributeChanged, sendGraphAttributeChanged, sendGraphAttributeRemoved, sendGraphAttributeRemoved, sendGraphCleared, sendGraphCleared, sendNodeAdded, sendNodeAdded, sendNodeAttributeAdded, sendNodeAttributeAdded, sendNodeAttributeChanged, sendNodeAttributeChanged, sendNodeAttributeRemoved, sendNodeAttributeRemoved, sendNodeRemoved, sendNodeRemoved, sendStepBegins, sendStepBegins
-
Constructor Details
-
Method Details
-
begin
public void begin()
Description copied from interface: Generator
Begin the graph generation. This is usually the place for initialization of the generator. After calling this method, call the Generator.nextEvents() method to add elements to the graph.
-
nextEvents
public boolean nextEvents()
Description copied from interface: Generator
Perform the next step in generating the graph. While this method returns true, there are still more elements to add to the graph. Be aware that some generators never return false here, since they can generate graphs of arbitrary size. For such generators, simply stop calling this method once enough elements have been generated. A single call can produce an undetermined number of nodes and edges, so checking the node count while generating is advisable to avoid an unexpectedly large graph.
- Returns:
- true while there are elements to add to the graph.
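
The begin/nextEvents contract and the advice about checking the node count can be sketched as a bounded driving loop. The StepGenerator interface and ToyGenerator class here are local stand-ins, invented so the example compiles without GraphStream; they mirror only the begin(), nextEvents() and end() methods named on this page. The node-count supplier is likewise an assumption: with a real graph it would typically read the graph's current node count.

```java
import java.util.function.IntSupplier;

public class GenerationLoopSketch {
    /** Stand-in for the generator methods used here; not the GraphStream interface itself. */
    interface StepGenerator {
        void begin();
        boolean nextEvents();
        void end();
    }

    /** Toy generator that adds a fixed number of nodes per step and never finishes by itself. */
    static final class ToyGenerator implements StepGenerator {
        int nodes;
        private final int perStep;

        ToyGenerator(int perStep) { this.perStep = perStep; }

        public void begin() { nodes = 1; }
        public boolean nextEvents() { nodes += perStep; return true; }
        public void end() { }
    }

    /**
     * Calls nextEvents() until it returns false or the node count reaches maxNodes.
     * Returns the number of nextEvents() calls made.
     */
    static int run(StepGenerator gen, IntSupplier nodeCount, int maxNodes) {
        int steps = 0;
        gen.begin();
        while (nodeCount.getAsInt() < maxNodes) {
            steps++;
            if (!gen.nextEvents()) {
                break; // the generator reported that nothing is left to add
            }
        }
        gen.end();
        return steps;
    }

    public static void main(String[] args) {
        ToyGenerator gen = new ToyGenerator(3);
        int steps = run(gen, () -> gen.nodes, 10);
        System.out.println(steps + " steps, " + gen.nodes + " nodes");
    }
}
```

Because one step may add several nodes, the cap is checked between steps and the final count can overshoot maxNodes by up to one step's worth of elements.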
-
addURL
Add a URL to process.
- Parameters:
url - a new URL
-
setDirected
public void setDirected(boolean on)
Create directed edges.
- Parameters:
on - true to create directed edges
-
setNodeWeightAttribute
Set the attribute key used to store the weight of nodes. Whenever a node is reached, its weight is increased by one.
- Parameters:
attribute - attribute key of the weight of nodes
-
setEdgeWeightAttribute
Set the attribute key used to store the weight of edges. Whenever an edge is reached, its weight is increased by one.
- Parameters:
attribute - attribute key of the weight of edges
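
The "increased by one each time it is reached" rule for node and edge weights amounts to a visit counter per element. The sketch below keeps that counter in a plain map. The class name WeightSketch and the starting value (an unseen element counts as weight 0, so its first visit yields 1) are assumptions; the generator itself stores the value under the configured attribute key on the graph element.

```java
import java.util.HashMap;
import java.util.Map;

public class WeightSketch {
    // Element id (node or edge) mapped to the number of times it has been reached.
    private final Map<String, Integer> weights = new HashMap<>();

    /** Records one more visit to the element and returns its updated weight. */
    int reach(String elementId) {
        return weights.merge(elementId, 1, Integer::sum);
    }

    public static void main(String[] args) {
        WeightSketch w = new WeightSketch();
        w.reach("http://example.org/");
        System.out.println(w.reach("http://example.org/"));
    }
}
```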
-
setMode
Set how URLs are converted to node ids. With Mode.FULL, the id is the raw URL. With Mode.PATH, the query part of the URL is removed, so http://host/path?what=xxx becomes http://host/path. With Mode.HOST, the URL is reduced to its host, so http://host/path becomes http://host.
- Parameters:
mode - mode specifying how URLs are converted to node ids
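
The three conversions can be approximated with java.net.URI. This is a sketch under assumptions, not the generator's own code: the enum below is a local stand-in for URLGenerator.Mode, the helper name nodeId is hypothetical, and dropping the fragment along with the query in PATH mode is a guess.

```java
import java.net.URI;

public class NodeIdSketch {
    /** Local stand-in for URLGenerator.Mode so the sketch compiles without GraphStream. */
    enum Mode { FULL, PATH, HOST }

    /** Converts a URL to a node id following the rules described for each mode. */
    static String nodeId(String url, Mode mode) {
        URI uri = URI.create(url);
        switch (mode) {
            case PATH:
                // Keep scheme, authority and path; drop the query (and, by assumption, the fragment).
                return uri.getScheme() + "://" + uri.getRawAuthority() + uri.getRawPath();
            case HOST:
                // Keep only scheme and host.
                return uri.getScheme() + "://" + uri.getHost();
            default:
                // FULL: the raw URL is the id.
                return url;
        }
    }

    public static void main(String[] args) {
        String url = "http://host/path?what=xxx";
        for (Mode mode : Mode.values()) {
            System.out.println(mode + ": " + nodeId(url, mode));
        }
    }
}
```

Coarser modes merge more pages into a single node: in HOST mode every page of a site maps to the same id, so the resulting graph describes links between sites rather than between pages.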
-
setThreadCount
public void setThreadCount(int count)
Set the number of threads used to parse URLs. Threads are created in the nextEvents() step. By the end of that step, all worker threads have stopped.
- Parameters:
count - number of threads
-
setDepthLimit
public void setDepthLimit(int depthLimit)
Set the maximum number of steps before the generator stops. If 0 or less, the limit is disabled.
- Parameters:
depthLimit - maximum number of steps before stopping
-
enableProgression
public void enableProgression(boolean on)
-
acceptOnlyMatchingURL
Can be used to filter URLs. URLs that do not match this regex are discarded.
- Parameters:
regex - regex used to filter URLs
-
declineMatchingURL
Can be used to filter URLs. URLs that match this regex are discarded.
- Parameters:
regex - regex used to filter URLs
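
The difference between the accepting and declining filters can be illustrated with java.util.regex. This page does not say whether the regex must match the whole URL or only part of it; the sketch below assumes a whole-string match (Matcher.matches()), and the class and method names are hypothetical helpers rather than the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class RegexFilterSketch {
    /** Keeps only the URLs that match the regex (whole-string match, by assumption). */
    static List<String> acceptOnlyMatching(String regex, List<String> urls) {
        Pattern pattern = Pattern.compile(regex);
        List<String> kept = new ArrayList<>();
        for (String url : urls) {
            if (pattern.matcher(url).matches()) {
                kept.add(url);
            }
        }
        return kept;
    }

    /** Keeps only the URLs that do NOT match the regex. */
    static List<String> declineMatching(String regex, List<String> urls) {
        Pattern pattern = Pattern.compile(regex);
        List<String> kept = new ArrayList<>();
        for (String url : urls) {
            if (!pattern.matcher(url).matches()) {
                kept.add(url);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> urls = List.of(
                "http://example.org/a", "http://example.org/b.pdf", "http://other.net/");
        System.out.println(acceptOnlyMatching("http://example\\.org/.*", urls));
        System.out.println(declineMatching(".*\\.pdf", urls));
    }
}
```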
-
addHostFilter
Can be used to filter URLs according to their host. Note that calling this method several times may cause all URLs to be discarded. All hosts should be given in a single call.
- Parameters:
hosts - list of accepted hosts
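
One plausible reading of the warning about multiple calls is that each call registers a separate filter and a URL must pass every registered filter. The sketch below models that reading with standard-library types only; the AND-composition is an assumption made to explain the warning, and HostFilterSketch is a hypothetical class, not GraphStream code.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class HostFilterSketch {
    private final List<Predicate<String>> filters = new ArrayList<>();

    /** Registers one filter that accepts only URLs whose host is among the given hosts. */
    void addHostFilter(String... hosts) {
        Set<String> accepted = new HashSet<>(Arrays.asList(hosts));
        filters.add(url -> accepted.contains(URI.create(url).getHost()));
    }

    /** A URL is kept only if every registered filter accepts it (assumed AND-composition). */
    boolean accepts(String url) {
        for (Predicate<String> filter : filters) {
            if (!filter.test(url)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        HostFilterSketch oneCall = new HostFilterSketch();
        oneCall.addHostFilter("a.org", "b.org");
        System.out.println(oneCall.accepts("http://a.org/x"));

        HostFilterSketch twoCalls = new HostFilterSketch();
        twoCalls.addHostFilter("a.org");
        twoCalls.addHostFilter("b.org");
        System.out.println(twoCalls.accepts("http://a.org/x"));
    }
}
```

Under this model, splitting the hosts across two calls rejects every URL, because no URL can have host a.org and host b.org at the same time; passing both hosts in one call accepts either.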
-