public interface PepperImporter extends PepperModule
A mapping task in the Pepper workflow is not a monolithic block. It consists of several smaller steps.
public MyModule() {
    super("Name of the module");
    setSupplierContact(URI.createURI("Contact address of the module's supplier"));
    setSupplierHomepage(URI.createURI("homepage of the module"));
    setDesc("A short description of what is the intention of this module, for instance which formats are importable. ");
    this.addSupportedFormat("The name of a format which is importable e.g. txt", "The version corresponding to the format name", null);
}
public boolean isReadyToStart() { return (true); }
public Double isImportable(URI corpusPath) { return null; }
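The dummy above returns null, which is what callers see when isImportable is not overridden. As a rough illustration of a non-trivial override, the following standalone sketch scores a corpus folder by the share of its files that carry a known ending. It is not part of the Pepper API: the class ImportableSketch, the use of java.nio.file.Path in place of org.eclipse.emf.common.util.URI, the endings parameter, and the choice to return null for an unreadable path are all assumptions made for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Standalone sketch; a hypothetical helper, not a Pepper class. */
final class ImportableSketch {

    /**
     * Returns the share of regular files below corpusPath whose name ends
     * with one of the given endings: 1.0 if all match, 0.0 if none match.
     * Returns null if the path cannot be read (assumption: no statement).
     */
    static Double isImportable(Path corpusPath, Set<String> endings) {
        try (Stream<Path> files = Files.walk(corpusPath)) {
            List<Path> regular = files.filter(Files::isRegularFile)
                    .collect(Collectors.toList());
            if (regular.isEmpty()) {
                return 0.0;
            }
            long matching = regular.stream()
                    .filter(p -> hasEnding(p, endings))
                    .count();
            return (double) matching / regular.size();
        } catch (IOException | UncheckedIOException e) {
            return null;
        }
    }

    /** True if the file name ends with "." followed by one of the endings. */
    private static boolean hasEnding(Path file, Set<String> endings) {
        String name = file.getFileName().toString();
        return endings.stream().anyMatch(e -> name.endsWith("." + e));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("corpus");
        Files.createFile(dir.resolve("a.txt"));
        Files.createFile(dir.resolve("b.xml"));
        // One of two files matches the ending "txt".
        System.out.println(isImportable(dir, Set.of("txt")));
    }
}
```

A real importer would more likely inspect file contents than file names; the ratio is only meant to show how a value between 0 and 1 can express uncertainty.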
PepperImporterImpl and PepperExporterImpl provide an automatic mechanism to import or export the corpus-structure. This mechanism can be adapted step by step to your specific purpose. Since many formats only encode the document-structure and do not care about the corpus-structure, the corpus-structure is assumed to mirror the file structure of a corpus. Pepper's default mapping maps the root-folder to a root-corpus (SCorpus object). A sub-folder then corresponds to a sub-corpus (SCorpus object). The relation between super- and sub-corpus is represented as a SCorpusRelation object. Following the assumption that files contain the document-structure, there is one SDocument corresponding to each file in a sub-folder. The SCorpus and the SDocument objects are linked with a SCorpusDocumentRelation. The correspondence between these objects and their files or folders is stored in the table returned by getIdentifier2ResourceTable(). To adapt which files are imported as documents, add file endings to the collection returned by getDocumentEndings():
this.getDocumentEndings().add("file ending");
You can also add the value PepperModule.ENDING_LEAF_FOLDER to import not files but leaf folders as SDocument objects. Another possibility is to add the value PepperModule.ENDING_ALL_FILES to import all files no matter their ending.
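How these endings influence the classification of a resource can be sketched as follows. This is a standalone approximation of the rules documented for setTypeOfResource, not the library's implementation: the class ResourceTypeSketch, the enum Type, the placeholder constant values, and the order in which the rules are checked are assumptions made for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.stream.Stream;

/** Standalone sketch; a hypothetical helper, not a Pepper class. */
final class ResourceTypeSketch {

    /** Stand-in for SALT_TYPE.SCORPUS and SALT_TYPE.SDOCUMENT. */
    enum Type { SCORPUS, SDOCUMENT }

    // Placeholder values; the real constants are defined in PepperModule.
    static final String ENDING_ALL_FILES = "ALL_FILES";
    static final String ENDING_LEAF_FOLDER = "LEAF_FOLDER";
    static final String ENDING_FOLDER = "FOLDER";

    /** Returns the type of the resource, or null if it shall be ignored. */
    static Type typeOf(Path resource, Set<String> documentEndings,
            Set<String> corpusEndings) {
        if (Files.isDirectory(resource)) {
            if (documentEndings.contains(ENDING_LEAF_FOLDER) && isLeafFolder(resource)) {
                return Type.SDOCUMENT;
            }
            return corpusEndings.contains(ENDING_FOLDER) ? Type.SCORPUS : null;
        }
        if (documentEndings.contains(ENDING_ALL_FILES)) {
            return Type.SDOCUMENT;
        }
        String ending = ending(resource);
        if (!ending.isEmpty() && documentEndings.contains(ending)) {
            return Type.SDOCUMENT;
        }
        if (!ending.isEmpty() && corpusEndings.contains(ending)) {
            return Type.SCORPUS;
        }
        return null;
    }

    /** A leaf folder is a folder without sub-folders. */
    static boolean isLeafFolder(Path folder) {
        try (Stream<Path> children = Files.list(folder)) {
            return children.noneMatch(Files::isDirectory);
        } catch (IOException e) {
            return false;
        }
    }

    /** Returns the text after the last dot of the file name, or "". */
    static String ending(Path file) {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("corpus");
        Path doc = Files.createFile(root.resolve("document1.xml"));
        System.out.println(typeOf(doc, Set.of("xml"), Set.of(ENDING_FOLDER)));
        System.out.println(typeOf(root, Set.of("xml"), Set.of(ENDING_FOLDER)));
    }
}
```

The sketch leaves out getIgnoreEndings(); an importer that uses ignore endings would presumably check them before any of the rules above.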
In the method PepperModule.createPepperMapper(Identifier), a PepperMapper object needs to be initialized and returned. The PepperMapper is the major part doing the mapping. It provides the method PepperMapper.mapSCorpus() to handle the mapping of a single SCorpus object and the method PepperMapper.mapSDocument() to handle a single SDocument object. Both methods are invoked by the Pepper framework. PepperMapper.getResourceURI() offers the mapper the file or folder of the current SCorpus or SDocument object; for this to work, the field needs to be set in the PepperModule.createPepperMapper(Identifier) method. The following snippet shows a dummy of that method:
public PepperMapper createPepperMapper(Identifier sElementId) {
    PepperMapper mapper = new PepperMapperImpl() {
        @Override
        public DOCUMENT_STATUS mapSCorpus() {
            // handling the mapping of a single corpus

            // accessing the current file or folder
            getResourceURI();

            // returning, that the corpus was mapped successfully
            return (DOCUMENT_STATUS.COMPLETED);
        }

        @Override
        public DOCUMENT_STATUS mapSDocument() {
            // handling the mapping of a single document

            // accessing the current file or folder
            getResourceURI();

            // returning, that the document was mapped successfully
            return (DOCUMENT_STATUS.COMPLETED);
        }
    };
    // pass current file or folder to mapper. When using
    // PepperImporter.importCorpusStructure or
    // PepperExporter.exportCorpusStructure, the mapping between file or
    // folder and SCorpus or SDocument was stored here
    mapper.setResourceURI(getIdentifier2ResourceTable().get(sElementId));
    return (mapper);
}
public void end() {
    super.end();
    // do some clean up like closing of streams etc.
}
Modifier and Type | Field and Description |
---|---|
static String | NEGATIVE_FILE_EXTENSION_MARKER: A character or character sequence to mark a file extension as not to be one of the imported ones. |
Fields inherited from PepperModule: ENDING_ALL_FILES, ENDING_FOLDER, ENDING_LEAF_FOLDER, ENDING_TAB, ENDING_TXT, ENDING_XML
Modifier and Type | Method and Description |
---|---|
FormatDesc | addSupportedFormat(String formatName, String formatVersion, org.eclipse.emf.common.util.URI formatReference) |
CorpusDesc | getCorpusDesc(): TODO docu |
Collection<String> | getCorpusEndings(): Returns a collection of all file endings for a SCorpus object. |
Collection<String> | getDocumentEndings(): Returns a list containing all format endings for files which are importable and could be mapped to SDocument or SDocumentGraph objects by this Pepper module. |
Map<org.corpus_tools.salt.graph.Identifier,org.eclipse.emf.common.util.URI> | getIdentifier2ResourceTable(): Stores Identifier objects corresponding to either a SDocument or a SCorpus object, which have been created during the run of importCorpusStructure(SCorpusGraph). |
Collection<String> | getIgnoreEndings(): Returns a collection of filenames, not to be imported. |
List<FormatDesc> | getSupportedFormats(): Returns a list of formats, which are importable by this PepperImporter object. |
void | importCorpusStructure(org.corpus_tools.salt.common.SCorpusGraph corpusGraph): This method is called by Pepper at the start of a conversion process to create the corpus-structure. |
Double | isImportable(org.eclipse.emf.common.util.URI corpusPath): This method is called by Pepper and returns whether a corpus located at the given URI is importable by this importer. |
void | setCorpusDesc(CorpusDesc corpusDesc): TODO docu |
org.corpus_tools.salt.SALT_TYPE | setTypeOfResource(org.eclipse.emf.common.util.URI resource): This method is a callback and can be overridden by derived importers. |
Methods inherited from PepperModule: createPepperMapper, done, done, end, getComponentContext, getCorpusGraph, getDesc, getFingerprint, getModuleController, getModuleType, getName, getProgress, getProgress, getProperties, getResources, getSaltProject, getSelfTestDesc, getStartProblems, getSupplierContact, getSupplierHomepage, getSymbolicName, getTemproraries, getVersion, isMultithreaded, isReadyToStart, proposeImportOrder, setCorpusGraph, setDesc, setIsMultithreaded, setPepperModuleController_basic, setPepperModuleController, setProperties, setResources, setSaltProject, setSupplierContact, setSupplierHomepage, setSymbolicName, setTemproraries, setVersion, start, start
static final String NEGATIVE_FILE_EXTENSION_MARKER
List<FormatDesc> getSupportedFormats()
Returns a list of formats, which are importable by this PepperImporter object.
CorpusDesc getCorpusDesc()
void setCorpusDesc(CorpusDesc corpusDesc)
Map<org.corpus_tools.salt.graph.Identifier,org.eclipse.emf.common.util.URI> getIdentifier2ResourceTable()
Stores Identifier objects corresponding to either a SDocument or a SCorpus object, which have been created during the run of importCorpusStructure(SCorpusGraph). For each Identifier object, this table stores the resource from which the element shall be imported. For instance:
Identifier | Resource |
---|---|
corpus_1 | /home/me/corpora/myCorpus |
corpus_2 | /home/me/corpora/myCorpus/subcorpus |
doc_1 | /home/me/corpora/myCorpus/subcorpus/document1.xml |
doc_2 | /home/me/corpora/myCorpus/subcorpus/document2.xml |
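Collection<String> getDocumentEndings()
Returns a list containing all format endings for files which are importable and could be mapped to SDocument or SDocumentGraph objects by this Pepper module.
Collection<String> getCorpusEndings()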
Returns a collection of all file endings for a SCorpus object. By default, this list contains a default value. To remove the default value, call Collection.remove(Object) on getCorpusEndings(). To add endings to the collection, call Collection.add(E); to remove endings from the collection, call Collection.remove(Object).
Collection<String> getIgnoreEndings()
Returns a collection of filenames, not to be imported. To add endings to the collection, call Collection.add(E); to remove endings from the collection, call Collection.remove(Object).
org.corpus_tools.salt.SALT_TYPE setTypeOfResource(org.eclipse.emf.common.util.URI resource)
This method is a callback and can be overridden by derived importers. It is called during the import of the corpus-structure (importCorpusStructure(SCorpusGraph)). During the traversal of the file-structure, the method importCorpusStructure(SCorpusGraph) calls this method for each resource to determine whether the resource represents a SCorpus object, a SDocument object, or shall be ignored. The result is determined as follows:
If the ending of the resource is contained in getDocumentEndings(), SALT_TYPE.SDOCUMENT is returned.
If the ending of the resource is contained in getCorpusEndings(), SALT_TYPE.SCORPUS is returned.
If getDocumentEndings() contains PepperModule.ENDING_ALL_FILES, for each file (which is not a folder) SALT_TYPE.SDOCUMENT is returned.
If getDocumentEndings() contains PepperModule.ENDING_LEAF_FOLDER, for each leaf folder SALT_TYPE.SDOCUMENT is returned.
If getCorpusEndings() contains PepperModule.ENDING_FOLDER, for each folder SALT_TYPE.SCORPUS is returned.
Parameters:
resource - URI resource to be specified
Returns:
SALT_TYPE.SCORPUS if resource represents a SCorpus object, SALT_TYPE.SDOCUMENT if resource represents a SDocument object, or null if it shall be ignored.
void importCorpusStructure(org.corpus_tools.salt.common.SCorpusGraph corpusGraph) throws PepperModuleException
This method is called by Pepper at the start of a conversion process to create the corpus-structure. A corpus-structure consists of corpora (represented via the Salt element SCorpus), documents (represented via the Salt element SDocument) and a linking between corpora, and between a corpus and a document (represented via the Salt elements SCorpusRelation and SCorpusDocumentRelation). Each corpus can contain 0..* subcorpora and 0..* documents, but a corpus cannot contain both documents and corpora.
For each resource, setTypeOfResource(URI) is called to set the type of the resource. If the type is SALT_TYPE.SDOCUMENT, a SDocument object is created for the resource; if the type is SALT_TYPE.SCORPUS, a SCorpus object is created; if the type is null, the resource is ignored.
Parameters:
corpusGraph - an empty graph given by Pepper, which shall contain the corpus structure
Throws:
PepperModuleException
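The traversal described above can be approximated with plain Java file APIs. The following standalone sketch walks a folder, treats folders as corpora and files with a given ending as documents, and records each element in a table similar to the one returned by getIdentifier2ResourceTable(). It is an illustration under stated assumptions, not Pepper's implementation: the class CorpusStructureSketch, the String identifiers such as corpus_1 and doc_1, the single documentEnding parameter, and the use of java.nio.file.Path instead of Salt and EMF types are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Standalone sketch; a hypothetical helper, not a Pepper class. */
final class CorpusStructureSketch {

    /** Maps illustrative identifiers ("corpus_1", "doc_1") to resources. */
    static Map<String, Path> importCorpusStructure(Path root, String documentEnding)
            throws IOException {
        Map<String, Path> table = new LinkedHashMap<>();
        int[] counters = new int[2]; // [0] counts corpora, [1] counts documents
        visit(root, documentEnding, table, counters);
        return table;
    }

    private static void visit(Path resource, String documentEnding,
            Map<String, Path> table, int[] counters) throws IOException {
        if (Files.isDirectory(resource)) {
            // A folder becomes a corpus; its children are visited in name order.
            table.put("corpus_" + (++counters[0]), resource);
            List<Path> children;
            try (Stream<Path> stream = Files.list(resource)) {
                children = stream.sorted().collect(Collectors.toList());
            }
            for (Path child : children) {
                visit(child, documentEnding, table, counters);
            }
        } else if (resource.getFileName().toString().endsWith("." + documentEnding)) {
            // A file with the document ending becomes a document.
            table.put("doc_" + (++counters[1]), resource);
        }
        // Any other resource corresponds to type null and is ignored.
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("myCorpus");
        Path sub = Files.createDirectory(root.resolve("subcorpus"));
        Files.createFile(sub.resolve("document1.xml"));
        Files.createFile(sub.resolve("document2.xml"));
        importCorpusStructure(root, "xml")
                .forEach((id, resource) -> System.out.println(id + " | " + resource));
    }
}
```

The sketch does not create relations between corpora and documents and does not enforce the constraint that a corpus holds either documents or sub-corpora; both would be needed in a real corpus graph.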
FormatDesc addSupportedFormat(String formatName, String formatVersion, org.eclipse.emf.common.util.URI formatReference)
Double isImportable(org.eclipse.emf.common.util.URI corpusPath)
This method is called by Pepper and returns whether a corpus located at the given URI is importable by this importer. If yes, 1 must be returned; if no, 0 must be returned. If it is not certain whether the given corpus is importable by this importer, any value between 0 and 1 can be returned. If this method is not overridden, null is returned.
Copyright © 2009–2019 Humboldt-Universität zu Berlin, INRIA. All rights reserved.