NLTKTextSplitter
| Property | Pattern | Type | Deprecated | Definition | Title/Description |
|---|---|---|---|---|---|
| + implementation | No | const | No | - | Implementation |
| - chunk_size | No | integer | No | - | Chunk Size |
| - chunk_overlap | No | integer | No | - | Chunk Overlap |
| - keep_separator | No | boolean | No | - | Keep Separator |
| - strip_whitespace | No | boolean | No | - | Strip Whitespace |
| - separator | No | string | No | - | Separator |
| - language | No | string | No | - | Language |
1. Property implementation
Title: Implementation
Type | const |
Required | Yes |
Specific value: "NLTKTextSplitter"
2. Property chunk_size
Title: Chunk Size
Type | integer |
Required | No |
Default | 4000 |
Description: Maximum size of chunks to return
3. Property chunk_overlap
Title: Chunk Overlap
Type | integer |
Required | No |
Default | 200 |
Description: Overlap in characters between chunks
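
The interaction between chunk_size and chunk_overlap can be sketched as follows. This is a simplified illustration, not the splitter's actual implementation: the helper `merge_pieces` is hypothetical, and it assumes the text has already been split into pieces (for example, sentences) that are then packed greedily into chunks, with trailing pieces carried over as overlap.

```python
def merge_pieces(pieces, chunk_size, chunk_overlap, separator="\n\n"):
    """Hypothetical helper: pack pre-split pieces into chunks.

    Each chunk holds at most chunk_size characters (unless a single piece
    is itself longer), and up to chunk_overlap trailing characters' worth
    of whole pieces are repeated at the start of the next chunk.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    def joined_length(items):
        return len(separator.join(items))

    chunks = []
    current = []
    for piece in pieces:
        if current and joined_length(current + [piece]) > chunk_size:
            # The next piece does not fit: emit the current chunk.
            chunks.append(separator.join(current))
            # Drop leading pieces until what remains fits in the overlap
            # budget and leaves room for the incoming piece.
            while current and (
                joined_length(current) > chunk_overlap
                or joined_length(current + [piece]) > chunk_size
            ):
                current.pop(0)
        current.append(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks


chunks = merge_pieces(
    ["aaaa", "bbbb", "cccc", "dddd"],
    chunk_size=10,
    chunk_overlap=4,
    separator=" ",
)
print(chunks)  # ['aaaa bbbb', 'bbbb cccc', 'cccc dddd']
```

In this sketch, overlap is carried in whole pieces, so the repeated text may be shorter than chunk_overlap but never longer.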
4. Property keep_separator
Title: Keep Separator
Type | boolean |
Required | No |
Default | false |
Description: Whether to keep the separator in the chunks
5. Property strip_whitespace
Title: Strip Whitespace
Type | boolean |
Required | No |
Default | true |
Description: If True, strips whitespace from the start and end of every document
6. Property separator
Title: Separator
Type | string |
Required | No |
Default | "\n\n" |
Description: Separator to split on
7. Property language
Title: Language
Type | string |
Required | No |
Default | "english" |
Description: Language to use for tokenization
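
As a minimal sketch, the properties above could be assembled into a configuration object like this. It assumes the schema maps onto a plain dictionary; the `DEFAULTS` table and the `build_config` helper are hypothetical names introduced for illustration, and only the property names and default values come from the reference above.

```python
# Defaults copied from the property reference above.
DEFAULTS = {
    "chunk_size": 4000,
    "chunk_overlap": 200,
    "keep_separator": False,
    "strip_whitespace": True,
    "separator": "\n\n",
    "language": "english",
}


def build_config(**overrides):
    """Hypothetical helper: merge overrides into the documented defaults."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown properties: {sorted(unknown)}")

    # "implementation" is the only required property and has a fixed value.
    config = {"implementation": "NLTKTextSplitter", **DEFAULTS, **overrides}

    for key, default in DEFAULTS.items():
        # Compare exact types so that a boolean is not accepted as an integer.
        if type(config[key]) is not type(default):
            raise TypeError(f"{key} must be of type {type(default).__name__}")
    return config


config = build_config(chunk_size=1000, chunk_overlap=100)
print(config["implementation"], config["chunk_size"], config["language"])
```

Omitted properties fall back to their documented defaults, so only the values that differ need to be passed.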