Tokenize

Since Camel 2.0

The tokenizer language is a built-in language in camel-core, which is most often used with the Split EIP to split a message using a token-based strategy.

The tokenizer language is intended to tokenize text documents using a specified delimiter pattern. It can also tokenize XML documents, with limited capability. For truly XML-aware tokenization, use the XML Tokenize language, which offers faster, more efficient tokenization designed specifically for XML documents.

Tokenize Options

The Tokenize language supports 11 options, which are listed below.

| Name | Default | Java Type | Description |
|---|---|---|---|
| token | | String | Required. The (start) token to use as tokenizer, for example the new line token. You can use the Simple language as the token to support dynamic tokens. |
| endToken | | String | The end token to use as tokenizer when using start/end token pairs. You can use the Simple language as the token to support dynamic tokens. |
| inheritNamespaceTagName | | String | To inherit namespaces from a root/parent tag name when using XML. You can use the Simple language as the tag name to support dynamic names. |
| headerName | | String | Name of the header to tokenize instead of the message body. |
| regex | | Boolean | Whether the token is a regular expression pattern. The default value is false. |
| xml | | Boolean | Whether the input is XML messages. This option must be set to true if working with XML payloads. |
| includeTokens | | Boolean | Whether to include the tokens in the parts when using pairs. The default value is false. |
| group | | String | To group N parts together, for example to split big files into chunks of 1000 lines. You can use the Simple language as the group to support dynamic group sizes. |
| groupDelimiter | | String | Sets the delimiter to use when grouping. If this has not been set, then token is used as the delimiter. |
| skipFirst | | Boolean | To skip the very first element. |
| trim | true | Boolean | Whether to trim the value to remove leading and trailing whitespace and line breaks. |
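
To make the interaction of the group, groupDelimiter and skipFirst options more concrete, here is a minimal plain-Java sketch of the behaviour as the descriptions above read. It does not use Camel and is not Camel's implementation; the class name TokenizeSketch, the method name and its parameter order are hypothetical, and the token is always treated literally (the regex=false case).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Illustration only: models the option descriptions, not Camel's actual code. */
final class TokenizeSketch {

    /**
     * Splits text on a literal token, optionally skips the first part,
     * and optionally groups the remaining parts into chunks.
     *
     * @param group          parts per chunk; 0 or less disables grouping
     * @param groupDelimiter delimiter between grouped parts; null falls back to token
     * @param skipFirst      whether to drop the very first part (e.g. a header line)
     */
    static List<String> tokenize(String text, String token, int group,
                                 String groupDelimiter, boolean skipFirst) {
        // Pattern.quote makes String.split treat the token as a literal string.
        String[] raw = text.split(Pattern.quote(token));

        List<String> parts = new ArrayList<>();
        for (int i = skipFirst ? 1 : 0; i < raw.length; i++) {
            parts.add(raw[i]);
        }
        if (group <= 0) {
            return parts;
        }

        // Assumption taken from the table: token is the delimiter unless groupDelimiter is set.
        String delimiter = groupDelimiter != null ? groupDelimiter : token;
        List<String> chunks = new ArrayList<>();
        for (int start = 0; start < parts.size(); start += group) {
            int end = Math.min(start + group, parts.size());
            chunks.add(String.join(delimiter, parts.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        String csv = "id,name\n1,Ann\n2,Bob\n3,Cy";
        // Skip the header line, then group two lines per chunk, joined with ';'.
        System.out.println(tokenize(csv, "\n", 2, ";", true));
    }
}
```

With the sample input in main, the header line is dropped and the remaining three lines become two chunks, "1,Ann;2,Bob" and "3,Cy". Passing null as groupDelimiter joins the grouped lines with the new line token instead.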

Example

The following example takes a request from the direct:a endpoint, splits it into pieces using a tokenize expression with the new line token, and forwards each piece to direct:b:

<route>
  <from uri="direct:a"/>
  <split>
    <tokenize token="\n"/>
    <to uri="direct:b"/>
  </split>
</route>

And in Java DSL:

from("direct:a")
    .split(body().tokenize("\n"))
        .to("direct:b");
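
Several of the options listed earlier also have Java DSL counterparts. The following is a sketch rather than a definitive reference: it assumes camel-core is on the classpath and that the tokenize, tokenizePair and tokenizeXML builder methods have the signatures shown, which may differ between Camel releases. The class name and all endpoint names other than the direct: scheme are hypothetical placeholders.

```java
import org.apache.camel.builder.RouteBuilder;

// Hypothetical route class; endpoint names are placeholders for illustration.
class TokenizeVariantsRoute extends RouteBuilder {

    @Override
    public void configure() {
        // group: emit chunks of 1000 lines instead of single lines
        from("direct:lines")
            .split().tokenize("\n", 1000)
                .to("direct:chunk");

        // regex: treat the token as a regular expression (comma or semicolon)
        from("direct:fields")
            .split().tokenize("[,;]", true)
                .to("direct:field");

        // token + endToken + includeTokens: cut out each <order>...</order> block,
        // keeping the start and end tokens in every part
        from("direct:orders")
            .split().tokenizePair("<order>", "</order>", true)
                .to("direct:pairedOrder");

        // xml: split an XML payload on the order tag name
        from("direct:xml")
            .split().tokenizeXML("order")
                .to("direct:xmlOrder");
    }
}
```

Each route uses the split().tokenize...() form, which is equivalent in intent to passing body().tokenize(...) to split(...) as in the example above.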

For more examples, see the Split EIP.