NLTKTextSplitter

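| Property | Pattern | Type | Deprecated | Definition | Title/Description |
| --- | --- | --- | --- | --- | --- |
| + implementation | No | const | No | - | Implementation |
| - chunk_size | No | integer | No | - | Chunk Size |
| - chunk_overlap | No | integer | No | - | Chunk Overlap |
| - keep_separator | No | boolean | No | - | Keep Separator |
| - strip_whitespace | No | boolean | No | - | Strip Whitespace |
| - separator | No | string | No | - | Separator |
| - language | No | string | No | - | Language |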

1. Property implementation

Title: Implementation

Type: const
Required: Yes

Specific value: "NLTKTextSplitter"

2. Property chunk_size

Title: Chunk Size

Type: integer
Required: No
Default: 4000

Description: Maximum size of chunks to return

3. Property chunk_overlap

Title: Chunk Overlap

Type: integer
Required: No
Default: 200

Description: Overlap in characters between chunks
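
To make the relationship between chunk_size and chunk_overlap concrete, here is a minimal sketch using fixed-width character windows. It is an illustration only: the function name `window_chunks` is hypothetical, and this is not the splitter's actual algorithm, which works on tokenized sentences rather than raw character offsets. The sketch assumes both values are measured in characters and that chunk_overlap must be smaller than chunk_size.

```python
def window_chunks(text: str, chunk_size: int = 4000, chunk_overlap: int = 200) -> list[str]:
    """Cut text into fixed-width windows that share chunk_overlap characters.

    Hypothetical helper for illustration; not the NLTKTextSplitter algorithm.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in the range [0, chunk_size)")

    # Each window starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk repeats the last two characters of the one before it, which is the effect chunk_overlap has on neighbouring chunks.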

4. Property keep_separator

Title: Keep Separator

Type: boolean
Required: No
Default: false

Description: Whether to keep the separator in the chunks

5. Property strip_whitespace

Title: Strip Whitespace

Type: boolean
Required: No
Default: true

Description: If True, strips whitespace from the start and end of every document

6. Property separator

Title: Separator

Type: string
Required: No
Default: "\n\n"

Description: Separator to split on

7. Property language

Title: Language

Type: string
Required: No
Default: "english"

Description: Language to use for tokenization
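
The seven properties can be gathered into a single configuration object. The following is a minimal sketch, assuming the configuration is handled as a plain Python dict. The names `DEFAULTS`, `EXPECTED_TYPES`, and `build_config` are hypothetical and not part of any library, and rejecting unknown keys is an assumption of this sketch rather than something the schema states.

```python
# Defaults as documented for each optional property.
DEFAULTS = {
    "chunk_size": 4000,
    "chunk_overlap": 200,
    "keep_separator": False,
    "strip_whitespace": True,
    "separator": "\n\n",
    "language": "english",
}

# Python types corresponding to the documented schema types.
EXPECTED_TYPES = {
    "chunk_size": int,
    "chunk_overlap": int,
    "keep_separator": bool,
    "strip_whitespace": bool,
    "separator": str,
    "language": str,
}


def build_config(overrides=None):
    """Return a full config: the required const plus defaults and overrides."""
    config = {"implementation": "NLTKTextSplitter", **DEFAULTS}
    for key, value in (overrides or {}).items():
        if key == "implementation":
            # The schema fixes this property to a single constant value.
            if value != "NLTKTextSplitter":
                raise ValueError('implementation must be "NLTKTextSplitter"')
            continue
        if key not in EXPECTED_TYPES:
            # Assumption of this sketch: unknown properties are rejected.
            raise KeyError(f"unknown property: {key}")
        expected = EXPECTED_TYPES[key]
        # bool is a subclass of int in Python, so exclude it for integer fields.
        is_bool_for_int = expected is int and isinstance(value, bool)
        if is_bool_for_int or not isinstance(value, expected):
            raise TypeError(f"{key} must be of type {expected.__name__}")
        config[key] = value
    return config


config = build_config({"chunk_size": 1000, "language": "german"})
print(config["chunk_size"], config["chunk_overlap"], config["language"])
# 1000 200 german
```

Only implementation is required, so calling `build_config()` with no arguments yields a configuration made up entirely of the documented defaults.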