Arborium: Tree-sitter code highlighting with Native and WASM targets
Finding good tree-sitter grammars is hard. In arborium, every grammar:
- Is generated with tree-sitter 0.26
- Builds for WASM & native via cargo
- Has working highlight queries
We hand-picked grammars, added missing highlight queries, and updated them
to the latest tree-sitter. Tree-sitter parsers compiled to WASM need libc
symbols (especially a C allocator), so we provide arborium-sysroot, which
re-exports dlmalloc and other essentials for wasm32-unknown-unknown.
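To make the allocator requirement concrete, here is a minimal sketch of C-style allocation entry points written in Rust. It is not arborium-sysroot's code: the names sketch_malloc and sketch_free and the 16-byte size header are assumptions made for illustration, and the sketch sits on Rust's global allocator rather than dlmalloc. A real sysroot would presumably export such functions under the C names the parser's C code links against.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::ptr;

/// Bytes reserved in front of every block to remember its total size.
/// 16 also serves as the alignment, so the returned pointer stays 16-byte aligned.
const HEADER: usize = 16;

/// Hypothetical stand-in for C `malloc`: allocate `size` usable bytes.
pub unsafe extern "C" fn sketch_malloc(size: usize) -> *mut u8 {
    let Some(total) = size.checked_add(HEADER) else {
        return ptr::null_mut();
    };
    let Ok(layout) = Layout::from_size_align(total, HEADER) else {
        return ptr::null_mut();
    };
    unsafe {
        let base = alloc(layout);
        if base.is_null() {
            return ptr::null_mut();
        }
        // Store the total size so that free can rebuild the layout later.
        base.cast::<usize>().write(total);
        base.add(HEADER)
    }
}

/// Hypothetical stand-in for C `free`: release a block from `sketch_malloc`.
pub unsafe extern "C" fn sketch_free(block: *mut u8) {
    if block.is_null() {
        return; // free(NULL) is a no-op in C
    }
    unsafe {
        let base = block.sub(HEADER);
        let total = base.cast::<usize>().read();
        dealloc(base, Layout::from_size_align_unchecked(total, HEADER));
    }
}

fn main() {
    unsafe {
        let block = sketch_malloc(32);
        assert!(!block.is_null());
        block.write_bytes(0xAB, 32);
        sketch_free(block);
    }
}
```

C's free receives only a pointer, while Rust's allocator needs the size back on deallocation, which is why the sketch keeps a small header in front of each block.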
Output formats
HTML: custom elements in place of traditional markup, which keeps the
output more compact. No JavaScript required.
ANSI: 24-bit true color for terminal applications.
Platforms
macOS, Linux, Windows — tree-sitter handles
generating native crates for these platforms. Just add the
dependency and go.
WebAssembly — that one’s hard. Compiling Rust to
WASM with C code that assumes a standard library is tricky. We
provide a sysroot that makes this work, enabling
Rust-on-the-frontend scenarios like this demo.
Get Started
Rust (native or WASM)
Add to your Cargo.toml:
arborium = { version = "2", features = ["lang-rust"] }
Then highlight code:
let html = arborium::highlight("rust", source)?;
Script tag (zero config)
Add this to your HTML and all code blocks get highlighted automatically:
Your code blocks should look like this:
fn main() {}
Configure via data attributes:
With data-manual, call window.arborium.highlightAll() when ready.
npm (ESM)
For bundlers or manual control:
import { loadGrammar, highlight } from '@arborium/arborium';
const html = await highlight('rust', sourceCode);
Grammars are loaded on-demand from jsDelivr (configurable).
Integrations
Your crate docs
Highlight TOML, shell, and other languages in your rustdoc. Create
arborium-header.html:
Then in Cargo.toml:
[package.metadata.docs.rs]
rustdoc-args = ["--html-in-header", "arborium-header.html"]
docs.rs team
If you maintain docs.rs or rustdoc, you could integrate arborium
directly! Either merge this PR
for native rustdoc support, or use arborium-rustdoc as a
post-processing step:
# Process rustdoc output in-place
arborium-rustdoc ./target/doc ./target/doc-highlighted
It streams through the HTML, finds code blocks, and highlights them
in-place. Works with rustdoc's theme system.
miette-arborium
Syntax highlighting for miette error diagnostics. Beautiful, accurate highlighting in your CLI error messages.
use miette::GraphicalReportHandler;
use miette_arborium::ArboriumHighlighter;
let handler = GraphicalReportHandler::new()
    .with_syntax_highlighting(ArboriumHighlighter::new());
dodeca
An incremental static site generator with zero-reload live updates via
WASM DOM patching, Sass/SCSS, image processing, font subsetting, and
arborium-powered syntax highlighting.
Nothing to configure—it just works. Arborium is built in
and automatically highlights all code blocks.
Languages
96 languages included, each behind a
feature flag. Enable only what you need, or use
all-languages for everything.
Each feature flag comment includes the grammar's license, so you always know
what you're shipping.
Theme support
The highlighter supports themes for both HTML and
ANSI output.
Bundled themes:
- Alabaster
- Ayu Dark
- Ayu Light
- Catppuccin Frappé
- Catppuccin Latte
- Catppuccin Macchiato
- Catppuccin Mocha
- Cobalt2
- Dayfox
- Desert256
- Dracula
- EF Melissa Dark
- GitHub Dark
- GitHub Light
- Gruvbox Dark
- Gruvbox Light
- Kanagawa Dragon
- Light Owl
- Lucius Light
- Melange Dark
- Melange Light
- Monokai
- Nord
- One Dark
- Rosé Pine Moon
- Rustdoc Ayu
- Rustdoc Dark
- Rustdoc Light
- Solarized Dark
- Solarized Light
- Tokyo Night
- Zenburn
Custom themes can be defined programmatically using RGB colors and style
attributes (bold, italic, underline, strikethrough).
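As a rough illustration of what a style built from an RGB color plus attributes amounts to for ANSI output, the sketch below renders one to a 24-bit escape sequence. The Style struct, its field names, the paint method, and the colors are all hypothetical; this is not arborium's theme API, only the general idea.

```rust
/// Hypothetical style record: an optional RGB foreground plus text attributes.
#[derive(Clone, Copy, Default)]
struct Style {
    fg: Option<(u8, u8, u8)>,
    bold: bool,
    italic: bool,
    underline: bool,
    strikethrough: bool,
}

impl Style {
    /// Wrap `text` in ANSI SGR codes, using 24-bit color for the foreground.
    fn paint(&self, text: &str) -> String {
        let mut codes: Vec<String> = Vec::new();
        if self.bold {
            codes.push("1".to_string());
        }
        if self.italic {
            codes.push("3".to_string());
        }
        if self.underline {
            codes.push("4".to_string());
        }
        if self.strikethrough {
            codes.push("9".to_string());
        }
        if let Some((r, g, b)) = self.fg {
            // 38;2;R;G;B selects a true-color foreground.
            codes.push(format!("38;2;{r};{g};{b}"));
        }
        if codes.is_empty() {
            return text.to_string();
        }
        // \x1b[0m resets all attributes after the styled text.
        format!("\x1b[{}m{}\x1b[0m", codes.join(";"), text)
    }
}

fn main() {
    // Colors chosen arbitrarily for the example.
    let keyword = Style { fg: Some((255, 121, 198)), bold: true, ..Style::default() };
    let comment = Style { fg: Some((98, 114, 164)), italic: true, ..Style::default() };
    println!("{} main() {{}} {}", keyword.paint("fn"), comment.paint("// entry point"));
}
```

An HTML backend could map the same record to CSS properties instead of escape codes, which is one way a single theme definition can serve both output formats.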
Grammar Sizes
Each grammar includes the full tree-sitter runtime embedded in its WASM module.
This adds a fixed overhead to every grammar bundle, on top of the grammar-specific parser tables.
WASM Build Pipeline
Every grammar is compiled to WASM with aggressive size optimizations. Here's the complete build pipeline:
1. cargo build
We compile with nightly Rust using -Zbuild-std to rebuild the standard library with our optimization flags:
- -Cpanic=immediate-abort: Skip unwinding machinery
- -Copt-level=s: Optimize for size, not speed
- -Clto=fat: Full link-time optimization across all crates
- -Ccodegen-units=1: Single codegen unit for maximum optimization
- -Cstrip=symbols: Remove debug symbols
2. wasm-bindgen
Generate JavaScript bindings with --target web for ES module output.
3. wasm-opt
Final size optimization pass with Binaryen's optimizer:
- -Oz: Aggressive size optimization
- --enable-bulk-memory: Faster memory operations
- --enable-mutable-globals: Required for wasm-bindgen
- --enable-simd: SIMD instructions where applicable
Despite all these optimizations, WASM bundles are still large because each one embeds the full tree-sitter runtime.
We're exploring ways to share the runtime across grammars, but that's the architecture trade-off for now.
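For orientation, the three steps can be sketched as a small Rust helper that assembles the command lines. The flags are the ones listed in this section; everything else is an assumption of the sketch rather than arborium's actual build tooling: passing the compiler flags through RUSTFLAGS, the --release profile, the file paths, and the hypothetical crate name my_grammar and output directory pkg.

```rust
// Flags copied from the pipeline description. How they are handed to each
// tool (RUSTFLAGS, paths, output directory) is an assumption of this sketch.
const RUSTC_FLAGS: [&str; 5] = [
    "-Cpanic=immediate-abort",
    "-Copt-level=s",
    "-Clto=fat",
    "-Ccodegen-units=1",
    "-Cstrip=symbols",
];

const WASM_OPT_FLAGS: [&str; 4] = [
    "-Oz",
    "--enable-bulk-memory",
    "--enable-mutable-globals",
    "--enable-simd",
];

/// Join the compiler flags into one space-separated string.
fn rustflags() -> String {
    RUSTC_FLAGS.join(" ")
}

/// Build the three command lines for one grammar crate.
/// `crate_name` and `out_dir` are hypothetical placeholders.
fn pipeline(crate_name: &str, out_dir: &str) -> Vec<String> {
    let built = format!("target/wasm32-unknown-unknown/release/{crate_name}.wasm");
    let bound = format!("{out_dir}/{crate_name}_bg.wasm");
    vec![
        // 1. Compile with nightly, rebuilding std with the same flags.
        format!(
            "RUSTFLAGS=\"{}\" cargo +nightly build --release --target wasm32-unknown-unknown -Zbuild-std",
            rustflags()
        ),
        // 2. Generate ES-module JavaScript bindings.
        format!("wasm-bindgen --target web --out-dir {out_dir} {built}"),
        // 3. Final size pass over the bound module.
        format!("wasm-opt {} {bound} -o {bound}", WASM_OPT_FLAGS.join(" ")),
    ]
}

fn main() {
    for step in pipeline("my_grammar", "pkg") {
        println!("{step}");
    }
}
```

The helper only prints the commands. Treat it as a map of how the pieces fit together and check each invocation against your own toolchain before relying on it.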
FAQ
Why not highlight.js or Shiki?
Those use regex-based tokenization: Shiki runs TextMate grammars, and
highlight.js uses its own regex-based language definitions. Regexes
can't count brackets, track scope, or understand structure; they just
pattern-match.
Tree-sitter actually parses your code into a syntax tree,
so it knows that fn is a keyword only in the right
context, handles deeply nested structures correctly, and recovers
gracefully from syntax errors.
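The point about counting brackets can be shown with a small sketch. Checking arbitrarily deep nesting needs a stack, which a regular expression in the strict sense does not have. The function below is a hypothetical helper written for this illustration, not part of arborium or tree-sitter, and it deliberately ignores brackets inside strings and comments, which a real parser would handle.

```rust
/// Return the deepest bracket nesting level if every bracket is matched,
/// or `None` if the brackets are unbalanced or mismatched.
fn max_nesting_depth(source: &str) -> Option<usize> {
    let mut stack: Vec<char> = Vec::new();
    let mut deepest = 0;
    for ch in source.chars() {
        match ch {
            '(' | '[' | '{' => {
                stack.push(ch);
                deepest = deepest.max(stack.len());
            }
            ')' | ']' | '}' => {
                let expected = match ch {
                    ')' => '(',
                    ']' => '[',
                    _ => '{',
                };
                // A closer must match the most recently opened bracket.
                if stack.pop() != Some(expected) {
                    return None;
                }
            }
            _ => {}
        }
    }
    if stack.is_empty() {
        Some(deepest)
    } else {
        None
    }
}

fn main() {
    let snippet = "fn main() { let v = vec![(1, 2)]; }";
    println!("{:?}", max_nesting_depth(snippet));
    println!("{:?}", max_nesting_depth("foo(]"));
}
```

A parser's state stack plays the same role as the explicit Vec here, which is what lets it keep track of structure at any depth.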
IDEs with LSP support (like rust-analyzer) can do even better with
semantic highlighting, because they understand types and
dependencies across files. Tree-sitter still gets you most of the way
there without needing a full language server.
Why the name "arborium"?
Arbor is Latin for tree (as in tree-sitter), and
-ium denotes a place or collection (like aquarium,
arboretum).
It's a place where tree-sitter grammars live.
I have a grammar that's not included. Can you add it?
Yes! Open an issue on the repo with a link to the grammar.
We'll review it and add it if the grammar and highlight queries are
in good shape.
Why not use the WASM builds from tree-sitter CLI?
When doing full-stack Rust, it's nice to have exactly the
same code on the frontend and the backend.
Rust crates compile to both native and WASM, so you get one
dependency that works everywhere.
Why are tree-sitter parsers so large?
Tree-sitter uses table-driven LR parsing. The grammar compiles down
to massive state transition tables—every possible parser state and
every possible token gets an entry.
These tables are optimized for O(1) lookup speed, not size. A
complex grammar like TypeScript can have tens of thousands of
states.
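To give a feel for what table-driven means, here is a toy LR parser for the grammar S -> ( S ) | x, with a hand-built table of six states and four token columns. It is a teaching sketch only: tree-sitter's generated tables, its encoding, and its GLR and error-recovery machinery are far more elaborate, and all names here are made up for the example.

```rust
#[derive(Clone, Copy)]
enum Action {
    Shift(usize),  // push this state and consume the token
    Reduce(usize), // apply this grammar rule
    Accept,
    Error,
}
use Action::*;

// Rules: 1) S -> ( S )   2) S -> x
// Columns: '(' , ')' , 'x' , end of input. One row per parser state.
const ACTION: [[Action; 4]; 6] = [
    [Shift(2), Error, Shift(3), Error],    // 0: start
    [Error, Error, Error, Accept],         // 1: whole input is an S
    [Shift(2), Error, Shift(3), Error],    // 2: after '('
    [Error, Reduce(2), Error, Reduce(2)],  // 3: after 'x'
    [Error, Shift(5), Error, Error],       // 4: after '(' S
    [Error, Reduce(1), Error, Reduce(1)],  // 5: after '(' S ')'
];

// Where to go after reducing to S, indexed by the state left on top.
const GOTO_S: [Option<usize>; 6] = [Some(1), None, Some(4), None, None, None];

// Right-hand-side length of each rule (index 0 is unused).
const RULE_LEN: [usize; 3] = [0, 3, 1];

/// Return true if `input` is derivable from S.
fn accepts(input: &str) -> bool {
    let mut tokens: Vec<usize> = Vec::new();
    for ch in input.chars() {
        match ch {
            '(' => tokens.push(0),
            ')' => tokens.push(1),
            'x' => tokens.push(2),
            _ => return false,
        }
    }
    tokens.push(3); // end-of-input marker

    let mut stack = vec![0usize];
    let mut pos = 0;
    loop {
        let state = *stack.last().unwrap();
        // One array lookup decides the next step.
        match ACTION[state][tokens[pos]] {
            Shift(next) => {
                stack.push(next);
                pos += 1;
            }
            Reduce(rule) => {
                stack.truncate(stack.len() - RULE_LEN[rule]);
                match GOTO_S[*stack.last().unwrap()] {
                    Some(next) => stack.push(next),
                    None => return false,
                }
            }
            Accept => return true,
            Error => return false,
        }
    }
}

fn main() {
    for sample in ["x", "((x))", "(x", "()"] {
        println!("{sample:?} -> {}", accepts(sample));
    }
}
```

Even this two-rule grammar needs a 6 by 4 table, which hints at how quickly the tables grow for a full programming language.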
The tradeoff is worth it: you get real parsing (not regex hacks)
that handles edge cases correctly and recovers gracefully from
syntax errors.