Tokenize
Since Camel 2.0
The tokenizer language is a built-in language in camel-core, which is
most often used only with the Splitter EIP to split
a message using a token-based strategy.
The tokenizer language is intended to tokenize text documents using a
specified delimiter pattern. It can also be used to tokenize XML
documents with some limited capability. For a truly XML-aware
tokenization, the use of the XMLTokenizer
language is recommended as it offers a faster, more efficient
tokenization specifically for XML documents. For more details
see Splitter.
Tokenize Options
The Tokenize language supports 10 options, which are listed below.
Name | Default | Java Type | Description |
---|---|---|---|
token |
|
The (start) token to use as tokenizer, for example you can use the new line token. You can use simple language as the token to support dynamic tokens. |
|
endToken |
|
The end token to use as tokenizer if using start/end token pairs. You can use simple language as the token to support dynamic tokens. |
|
inheritNamespaceTagName |
|
To inherit namespaces from a root/parent tag name when using XML You can use simple language as the tag name to support dynamic names. |
|
headerName |
|
Name of header to tokenize instead of using the message body. |
|
regex |
|
|
If the token is a regular expression pattern. The default value is false |
xml |
|
|
Whether the input is XML messages. This option must be set to true if working with XML payloads. |
includeTokens |
|
|
Whether to include the tokens in the parts when using pairs The default value is false |
group |
|
To group N parts together, for example to split big files into chunks of 1000 lines. You can use simple language as the group to support dynamic group sizes. |
|
skipFirst |
|
|
To skip the very first element |
trim |
|
|
Whether to trim the value to remove leading and trailing whitespaces and line breaks |