org.carrot2.source.xml
Class XmlDocumentSource

java.lang.Object
  extended by org.carrot2.core.ProcessingComponentBase
      extended by org.carrot2.source.xml.XmlDocumentSource
All Implemented Interfaces:
IDocumentSource, IProcessingComponent

public class XmlDocumentSource
extends ProcessingComponentBase
implements IDocumentSource

Fetches documents from XML files and streams. For additional flexibility, an XSLT stylesheet can be applied to the XML stream before it is deserialized into Carrot2 data.

See Also:
xml

Field Summary
 List<Document> documents
          Documents read from the XML data.
 String query
          After processing this field may hold the query read from the XML data, if any.
 boolean readAll
          If true, all documents are read from the input XML stream, regardless of the limit set by results.
 int results
          The maximum number of documents to read from the XML data if readAll is false.
 String title
          The title (file name or query attribute, if present) for the search result fetched from the resource.
 IResource xml
          The resource to load XML data from.
 Map<String,String> xmlParameters
          Values for custom placeholders in the XML URL.
 IResource xslt
          The resource to load XSLT stylesheet from.
 Map<String,String> xsltParameters
          Parameters to be passed to the XSLT transformer.
 
Constructor Summary
XmlDocumentSource()
           
 
Method Summary
 void init(IControllerContext context)
          Invoked after component's attributes marked with Init and Input annotations have been bound, but before calls to any other methods of this component.
 void process()
          Performs the processing required to fulfill the request.
 
Methods inherited from class org.carrot2.core.ProcessingComponentBase
afterProcessing, beforeProcessing, dispose, getContext, getSharedExecutor
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.carrot2.core.IProcessingComponent
afterProcessing, beforeProcessing, dispose
 

Field Detail

xml

public IResource xml
The resource to load XML data from. You can either create instances of IResource implementations directly or use ResourceLookup to look up IResource instances from a variety of locations.

One special IResource implementation you can use is URLResourceWithParams. It allows you to specify attribute placeholders in the URL that will be replaced with actual values at runtime. The placeholder format is ${attribute}. The following common attributes will be substituted:

Additionally, custom placeholders can be used. Values for the custom placeholders should be provided in the xmlParameters attribute.

Attribute label:
XML Resource
Attribute level:
Basic
Attribute group:
XML data

xslt

public IResource xslt
The resource to load XSLT stylesheet from. The XSLT stylesheet is optional and is useful when the source XML stream does not follow the Carrot2 format. The XSLT transformation is applied to the source XML stream, and the transformed XML stream is then deserialized into Documents.

The XSLT IResource can be provided at initialization time, at processing time, or both. A stylesheet provided at initialization time is cached for the lifetime of the component, while a processing-time stylesheet is compiled every time processing is requested and overrides the initialization-time stylesheet.

To pass additional parameters to the XSLT transformer, use the xsltParameters attribute.

Attribute label:
XSLT Stylesheet
Attribute level:
Medium
Attribute group:
XML transformation
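
The difference between an initialization-time and a processing-time stylesheet described for xslt can be illustrated with the standard javax.xml.transform API. This is a minimal sketch, not the component's actual implementation: the class StylesheetCachingSketch, its methods, and the sample stylesheets are hypothetical, and stylesheets are passed as strings rather than IResource instances to keep the example self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical sketch: cache the init-time stylesheet, recompile processing-time ones. */
public class StylesheetCachingSketch {
    private final TransformerFactory factory = TransformerFactory.newInstance();

    // Compiled once in init() and reused by every later request.
    private Templates initTimeTemplates;

    void init(String initXslt) {
        initTimeTemplates = (initXslt != null) ? compile(initXslt) : null;
    }

    String process(String xml, String processingXslt) {
        // A processing-time stylesheet is compiled on each call and takes precedence.
        Templates templates =
            (processingXslt != null) ? compile(processingXslt) : initTimeTemplates;
        if (templates == null) {
            return xml; // no stylesheet at all: the XML is used unchanged
        }
        try {
            StringWriter out = new StringWriter();
            templates.newTransformer().transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    private Templates compile(String xslt) {
        try {
            return factory.newTemplates(new StreamSource(new StringReader(xslt)));
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not compile stylesheet", e);
        }
    }

    /** Builds a tiny text-output stylesheet that prefixes the content of /doc with a label. */
    static String stylesheet(String label) {
        return "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"/\">" + label
            + "<xsl:value-of select=\"/doc\"/></xsl:template>"
            + "</xsl:stylesheet>";
    }

    public static void main(String[] args) {
        StylesheetCachingSketch source = new StylesheetCachingSketch();
        source.init(stylesheet("init:"));
        System.out.println(source.process("<doc>hello</doc>", null));
        System.out.println(source.process("<doc>hello</doc>", stylesheet("proc:")));
    }
}
```

The first call uses the cached initialization-time stylesheet; the second supplies its own stylesheet, which is compiled for that request only and overrides the cached one.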

xmlParameters

public Map<String,String> xmlParameters
Values for custom placeholders in the XML URL. If the resource provided in the xml attribute is a URLResourceWithParams, this map provides values for custom placeholders found in the XML URL. Keys of the map correspond to placeholder names, and values of the map replace the corresponding placeholders. See xml for the placeholder syntax.

Attribute label:
XML Parameters
Attribute level:
Advanced
Attribute group:
XML data
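
The ${attribute} placeholder replacement described for xml and xmlParameters can be pictured as follows. This is a minimal sketch written against the JDK only; it does not use or reproduce URLResourceWithParams. The class name, the example URL, and the choice to leave unknown placeholders untouched are assumptions made for illustration, and the sketch performs no URL encoding.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of ${name} placeholder substitution driven by a map. */
public class PlaceholderSketch {
    // Matches ${name} and captures the name between the braces.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /**
     * Replaces each ${key} with the value stored under key.
     * Assumption: placeholders without a value are left as they are.
     */
    static String substitute(String url, Map<String, String> parameters) {
        Matcher matcher = PLACEHOLDER.matcher(url);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = parameters.get(matcher.group(1));
            String replacement = (value != null) ? value : matcher.group();
            // quoteReplacement keeps '$' and '\' in the value literal.
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> xmlParameters = new LinkedHashMap<>();
        xmlParameters.put("lang", "en");
        xmlParameters.put("apiKey", "demo");
        // Prints http://example.com/feed?lang=en&key=demo
        System.out.println(
            substitute("http://example.com/feed?lang=${lang}&key=${apiKey}", xmlParameters));
    }
}
```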

xsltParameters

public Map<String,String> xsltParameters
Parameters to be passed to the XSLT transformer. Keys of the map will be used as parameter names, values of the map as parameter values.

Attribute label:
XSLT Parameters
Attribute level:
Advanced
Attribute group:
XML transformation
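
How a map of names and values reaches a stylesheet can be shown with the standard javax.xml.transform API, where each map entry becomes one Transformer.setParameter call. This is a sketch under assumptions, not the component's own code: the class XsltParametersSketch, the parameter name prefix, and the sample stylesheet are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Collections;
import java.util.Map;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical sketch: map keys become XSLT parameter names, map values their values. */
public class XsltParametersSketch {
    // Declares a top-level parameter "prefix" with a default and prints "<prefix>:<doc text>".
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:param name=\"prefix\" select=\"'none'\"/>"
        + "<xsl:template match=\"/\">"
        + "<xsl:value-of select=\"concat($prefix, ':', /doc)\"/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String transform(String xml, String xslt, Map<String, String> xsltParameters) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            // One setParameter call per map entry.
            for (Map.Entry<String, String> entry : xsltParameters.entrySet()) {
                transformer.setParameter(entry.getKey(), entry.getValue());
            }
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<doc>hello</doc>";
        System.out.println(transform(xml, STYLESHEET, Collections.singletonMap("prefix", "news")));
        System.out.println(transform(xml, STYLESHEET, Collections.<String, String>emptyMap()));
    }
}
```

When the map has no entry for a parameter, the default declared in the stylesheet's xsl:param element applies.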

query

public String query
After processing this field may hold the query read from the XML data, if any. For the semantics of this field on input, see xml.


results

public int results
The maximum number of documents to read from the XML data if readAll is false.


readAll

public boolean readAll
If true, all documents are read from the input XML stream, regardless of the limit set by results.

Attribute label:
Read all documents
Attribute level:
Basic
Attribute group:
Search query
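
The interaction between readAll and results amounts to a simple rule: the limit applies only when readAll is false. The following sketch shows that rule on an in-memory list; the class ReadLimitSketch and its select method are hypothetical and say nothing about how the component actually reads the stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the readAll / results rule. */
public class ReadLimitSketch {
    /**
     * Returns every parsed item when readAll is true,
     * otherwise at most the first `results` items.
     */
    static <T> List<T> select(List<T> parsed, int results, boolean readAll) {
        if (readAll || results >= parsed.size()) {
            return new ArrayList<>(parsed);
        }
        // Guard against a negative limit.
        return new ArrayList<>(parsed.subList(0, Math.max(0, results)));
    }

    public static void main(String[] args) {
        List<String> parsed = Arrays.asList("doc1", "doc2", "doc3");
        System.out.println(select(parsed, 2, false)); // prints [doc1, doc2]
        System.out.println(select(parsed, 2, true));  // prints [doc1, doc2, doc3]
    }
}
```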

title

public String title
The title (file name or query attribute, if present) for the search result fetched from the resource.


documents

public List<Document> documents
Documents read from the XML data.

Constructor Detail

XmlDocumentSource

public XmlDocumentSource()
Method Detail

init

public void init(IControllerContext context)
Description copied from interface: IProcessingComponent
Invoked after component's attributes marked with Init and Input annotations have been bound, but before calls to any other methods of this component. After a call to this method completes without an exception, attributes marked with Init Output will be collected. In this method, components should perform initializations based on the initialization-time attributes. This method is called once in the lifetime of a processing component instance.

Specified by:
init in interface IProcessingComponent
Overrides:
init in class ProcessingComponentBase
Parameters:
context - An instance of IControllerContext of the controller to which this component instance will be bound.

process

public void process()
             throws ProcessingException
Description copied from interface: IProcessingComponent
Performs the processing required to fulfill the request. This method is called once per request.

Specified by:
process in interface IProcessingComponent
Overrides:
process in class ProcessingComponentBase
Throws:
ProcessingException - when processing failed. If thrown, the IProcessingComponent.afterProcessing() method will be called and the component will be ready to accept further requests or to be disposed of. Finally, the exception will be rethrown from the controller method that caused the component to perform processing.
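
The lifecycle described for init and process (one initialization per instance, one process call per request, afterProcessing invoked even when processing fails, and the exception rethrown to the caller) can be sketched with plain Java. Everything below is a hypothetical stand-in: Component and MiniController are not the Carrot2 interfaces, and IllegalStateException is used in place of ProcessingException to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the component lifecycle; not the Carrot2 controller. */
public class LifecycleSketch {
    /** Stand-in for a processing component. */
    interface Component {
        void init();
        void beforeProcessing();
        void process();
        void afterProcessing();
    }

    /** Stand-in for a controller bound to a single component instance. */
    static class MiniController {
        private final Component component;
        private boolean initialized;

        MiniController(Component component) {
            this.component = component;
        }

        void handleRequest() {
            if (!initialized) {
                component.init(); // once per component instance
                initialized = true;
            }
            component.beforeProcessing();
            try {
                component.process(); // once per request
            } finally {
                // Runs on success and on failure; a failure still propagates to the caller.
                component.afterProcessing();
            }
        }
    }

    /** Runs two failing requests and returns the order of lifecycle calls. */
    static List<String> demo() {
        final List<String> log = new ArrayList<>();
        Component failing = new Component() {
            public void init() { log.add("init"); }
            public void beforeProcessing() { log.add("beforeProcessing"); }
            public void process() {
                log.add("process");
                throw new IllegalStateException("simulated processing failure");
            }
            public void afterProcessing() { log.add("afterProcessing"); }
        };
        MiniController controller = new MiniController(failing);
        for (int request = 0; request < 2; request++) {
            try {
                controller.handleRequest();
            } catch (IllegalStateException e) {
                log.add("rethrown");
            }
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In the recorded call order, init appears once, while the remaining steps repeat for each request, which shows the component accepting a second request after the first one failed.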


Copyright (c) Dawid Weiss, Stanislaw Osinski