Today, on 8 November 2025, Isaacus’ semchunk library hit one million monthly downloads for the first time, cementing its status as the world’s most popular semantic chunking algorithm.
Using a complex set of heuristics, semchunk splits long documents into smaller chunks while preserving as much context as possible. Those high-information chunks can then be passed to AI models that either do not support long context windows or whose performance degrades when their context is overloaded.
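For readers unfamiliar with heuristic chunking, here is a minimal sketch of the general idea, assuming a simplified rule set. It splits text on the most meaningful separator present (paragraph breaks, then line breaks, sentence ends, clause commas, and finally spaces). It recursively re-splits any piece that is still too long, and it greedily merges adjacent pieces back up to the chunk size. This is not semchunk's actual implementation or API. The `SEPARATORS` list and the `chunk` and `word_count` functions are hypothetical, and a whitespace word count stands in for a real tokenizer.

```python
from typing import Callable, List

# Hypothetical separator hierarchy, from most to least semantically meaningful.
# A simplified illustration only, not semchunk's real rule set.
SEPARATORS = ["\n\n", "\n", ". ", ", ", " "]


def word_count(text: str) -> int:
    """Stand-in token counter: counts whitespace-delimited words."""
    return len(text.split())


def _split_keep(text: str, sep: str) -> List[str]:
    """Split text on sep, keeping sep attached to the end of each piece."""
    if sep == "":
        return list(text)  # last resort: individual characters
    parts = text.split(sep)
    return [p + sep for p in parts[:-1]] + [parts[-1]]


def _split(text: str, chunk_size: int,
           counter: Callable[[str], int], level: int) -> List[str]:
    if counter(text) <= chunk_size:
        return [text]
    # Use the most meaningful separator (at or below this level) found in the text.
    sep, next_level = "", len(SEPARATORS)
    for i in range(level, len(SEPARATORS)):
        if SEPARATORS[i] in text:
            sep, next_level = SEPARATORS[i], i + 1
            break
    chunks: List[str] = []
    current = ""
    for piece in _split_keep(text, sep):
        # Greedily grow the current chunk while it stays within the limit.
        if counter(current + piece) <= chunk_size:
            current += piece
            continue
        if current:
            chunks.append(current)
            current = ""
        if counter(piece) <= chunk_size:
            current = piece
        else:
            # Piece is still too long: re-split it with finer separators.
            chunks.extend(_split(piece, chunk_size, counter, next_level))
    if current:
        chunks.append(current)
    return chunks


def chunk(text: str, chunk_size: int,
          token_counter: Callable[[str], int] = word_count) -> List[str]:
    """Split text into chunks of at most chunk_size tokens.

    Assumes any single character fits within chunk_size tokens.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    # Strip surrounding whitespace and drop chunks that end up empty.
    return [c.strip() for c in _split(text, chunk_size, token_counter, 0) if c.strip()]


doc = ("Semantic chunking keeps related sentences together. It splits on paragraphs first.\n\n"
       "Then it falls back to sentences, clauses, and words.")
print(chunk(doc, chunk_size=8))
# ['Semantic chunking keeps related sentences together.', 'It splits on paragraphs first.',
#  'Then it falls back to sentences, clauses,', 'and words.']
```

Attaching each separator to the preceding piece means no text is lost when chunks are joined back together. Trying coarse separators first keeps whole paragraphs and sentences intact whenever they fit.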
semchunk has become an integral component of retrieval-augmented generation (RAG) pipelines and is used in Microsoft’s Intelligence Toolkit and IBM’s Docling.
Over time, Isaacus has continued to improve the accuracy and efficiency of semchunk, to the point where a standard consumer PC can now chunk the entire Gutenberg Corpus distributed with NLTK (18 books comprising roughly 3 million tokens) in about three seconds.
Isaacus remains committed to delivering superior chunking and RAG performance to the AI community, particularly in the legal domain.
We recently released Kanon 2 Embedder, an embedding model that currently ranks first for legal information retrieval on the Massive Legal Embedding Benchmark. We are now working on Kanon 2 Enricher, an entirely new type of AI model that will perform hierarchical document segmentation, mapping the divisions, articles, clauses, paragraphs, and items within legal documents to their exact character offsets. Kanon 2 Enricher will also be able to extract and link named entities, including people, organizations, locations, key dates, citations, and more.
To stay on top of our latest news, you can follow us on LinkedIn and Reddit.