Index a single file into Elasticsearch.
to be indexed
to communicate with the Elasticsearch instance
Index a single segment into Elasticsearch.
to be indexed
also describes the format of the segment
name of source for reference
index of segment in file (for deduplication)
to communicate with the Elasticsearch instance
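One way a segment index can support deduplication is to derive the document ID from the source name and the segment's position, so that indexing the same segment twice overwrites the earlier copy instead of adding a second one (Elasticsearch replaces a document when another is indexed under the same ID). The sketch below illustrates only that idea with an in-memory map standing in for the index; the names `segmentId`, `indexSegment`, and the `#` separator are hypothetical and are not taken from the documented code.

```scala
import scala.collection.mutable

object SegmentIdSketch {
  // Hypothetical ID scheme: source name plus the segment's position in the file.
  def segmentId(source: String, segmentIndex: Int): String =
    s"$source#$segmentIndex"

  // In-memory stand-in for an index keyed by document ID (not a real client).
  val index: mutable.Map[String, String] = mutable.Map.empty

  // Indexing under a deterministic ID makes re-indexing an overwrite.
  def indexSegment(segment: String, source: String, segmentIndex: Int): Unit =
    index(segmentId(source, segmentIndex)) = segment
}
```

Calling `indexSegment` twice with the same source and segment index leaves a single entry in the map, which is the behavior the deduplication note suggests.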
Index a file tree into the Elasticsearch instance.
Divides the work into nThreads*4 Futures. Each Future synchronizes on currentFile (a variable used for logging) and then takes the next file from the stream, if the stream is not empty.
file stream to be indexed
a sequence of Futures, each representing the work done by one thread on this file tree.
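The work-sharing pattern described for the file tree can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual implementation: `indexAll`, `indexFile`, and the dedicated `lock` object are hypothetical names (the documented code synchronizes on currentFile instead), and each Future here returns the number of files it handled.

```scala
import scala.concurrent.{ExecutionContext, Future}

object FileTreeWorkSketch {
  // Creates nThreads * 4 Futures that all pull from one shared iterator.
  def indexAll(files: Iterator[String], nThreads: Int)(indexFile: String => Unit)(
      implicit ec: ExecutionContext): Seq[Future[Int]] = {
    // Hypothetical lock guarding the shared iterator.
    val lock = new Object

    // Take the next file under the lock, or None once the stream is exhausted.
    def next(): Option[String] = lock.synchronized {
      if (files.hasNext) Some(files.next()) else None
    }

    // Seq.fill re-evaluates its argument, so this starts nThreads * 4 Futures.
    Seq.fill(nThreads * 4) {
      Future {
        var handled = 0
        var current = next()
        while (current.isDefined) {
          indexFile(current.get)
          handled += 1
          current = next()
        }
        handled
      }
    }
  }
}
```

Because every Future keeps pulling until the stream is empty, a slow file delays only the Future that holds it, and the remaining files are picked up by whichever Futures finish first.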
Index a file into the Elasticsearch instance, following the convention of the Waterloo corpus.
Sentences are enclosed in <SENT> ... </SENT> tags.
path to the input directory
to communicate with the Elasticsearch instance
Build an index in Elasticsearch using the corpora specified in config.
On failure, dump serialized requests to this path.
Get the index name and index type.
Take the config for a corpus, resolve paths, and return a simple object containing information about the corpus.
Regex used to split sentences in the Waterloo corpus.
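As an illustration of handling the <SENT> ... </SENT> convention, the sketch below extracts the text between the tags. The pattern is an assumption written for this example and is not the regex used by the documented code (which splits rather than extracts); `SentSplitSketch` and `sentences` are hypothetical names.

```scala
object SentSplitSketch {
  // Assumed pattern: (?s) lets '.' match newlines, and the lazy quantifier
  // stops at the first closing tag so adjacent sentences stay separate.
  private val sentence = """(?s)<SENT>(.*?)</SENT>""".r

  // Returns the trimmed contents of every <SENT> element, skipping empty ones.
  def sentences(text: String): List[String] =
    sentence.findAllMatchIn(text).map(_.group(1).trim).filter(_.nonEmpty).toList
}
```

For example, `SentSplitSketch.sentences("<SENT> A b. </SENT><SENT>C d.</SENT>")` yields the two sentences with surrounding whitespace removed.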
CLI to build an Elasticsearch index on Aristo corpora. Building the index requires a running Elasticsearch instance: download the latest version of Elasticsearch, go to its 'bin' folder, and run ./elasticsearch. Refer to http://joelabrahamsson.com/elasticsearch-101/ to get started. Takes a Config object containing the corpus and other information necessary to build the index.