Using WebAssembly to Accelerate Markdown Rendering

Markdown rendering is very important to the performance of Semaphor - every message you send and read is a Markdown document - so we're always looking for ways to make it faster. A couple of months ago Jonathan Moore and I wondered how easy it would be to integrate WebAssembly into a React component, replacing the render() function, and we thought that moving Markdown parsing into Rust would be a great way to test this idea out.

What we came up with is react-wasm-bridge, an experimental component that passes props into a Rust WebAssembly module and provides an interface to build React render trees (and more!).

Rust was a natural choice because it has moved very quickly towards supporting WebAssembly. Originally it supported compilation through Emscripten, which produced really large, bloated binaries. Emscripten was designed to support existing C/C++ code, and includes basic POSIX support, including a virtual filesystem. It's great when you want to port DOSBox to the web, but it's a bit much when all you want to do is some calculation. Thankfully, WebAssembly was eventually supported as a direct target, which allowed for much slimmer modules.

One of the standout efforts in Rust's WebAssembly support is wasm-bindgen. WebAssembly implements a very basic machine akin to low-level physical hardware. The only data types it really understands are numbers. Programmers of course understand that everything is just numbers, but the translation between JavaScript's high-level concepts and WebAssembly's low-level concepts creates a pain point. It's not just tedious - it's very easy to get wrong! My very first implementation allocated and copied strings manually, and because I forgot to null-terminate the string, it would crash. Wasm-bindgen creates cross-language bindings that perform this drudgery for you; all you have to do is add the #[wasm_bindgen] attribute to your function. You really shouldn't be writing a Rust/WebAssembly project without it.
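To give a feel for the drudgery involved, here is a minimal sketch of passing a string across a numbers-only boundary by hand. It is not code from react-wasm-bridge and not what wasm-bindgen actually generates: the function names word_count and call_from_host are made up for illustration, and the sketch passes an explicit byte length instead of relying on a null terminator.

```rust
/// Hypothetical export (not part of react-wasm-bridge): counts the words in
/// a UTF-8 string that the host has copied into the module's memory.
/// Only numbers cross the boundary: an address, a byte length, an i32 result.
/// Returns -1 for a null pointer or for bytes that are not valid UTF-8.
/// A real export would also need a stable symbol name.
///
/// # Safety
/// `ptr` must point to `len` readable bytes.
pub unsafe extern "C" fn word_count(ptr: *const u8, len: usize) -> i32 {
    if ptr.is_null() {
        return -1;
    }
    // Rebuild a byte slice from the raw address and length.
    let bytes = unsafe { std::slice::from_raw_parts(ptr, len) };
    match std::str::from_utf8(bytes) {
        Ok(text) => text.split_whitespace().count() as i32,
        Err(_) => -1,
    }
}

/// Stand-in for the JavaScript side: hands over a pointer and a length,
/// which is the bookkeeping that generated bindings would otherwise do.
pub fn call_from_host(bytes: &[u8]) -> i32 {
    // Sound because the pointer and length come from the same live slice.
    unsafe { word_count(bytes.as_ptr(), bytes.len()) }
}
```

Every exported function that takes or returns a string needs some version of this dance, plus allocation and cleanup on the host side, which is why having it generated for you is such a relief.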

We wanted a couple of things out of a decent Markdown renderer:

  1. It should be safe - parsing user input is a dangerous game, and the more we can do to isolate it, the better
  2. It should be fast - rendering messages is most of what Semaphor does

Semaphor's current Markdown renderer is markdown-it. It's a very robust and surprisingly fast implementation, but using it with React is not entirely safe. Since markdown-it outputs an HTML string, we have to inject it into a <div> with dangerouslySetInnerHTML. We've never really been happy with that solution.

So one of the goals of this new implementation is that it wouldn't involve any HTML injection. It would create elements (or element representations) directly. To this end we created a Builder class (again, using very cool wasm-bindgen features) that allows Rust code to construct a React element tree through a stateful procedural interface. We do want to create a declarative interface, but the procedural one was easier for the proof of concept and maps especially well to Markdown parsing. The fun thing about this Builder interface is that it can, in theory, be used to build any kind of tree, like a JS object or DOM nodes (more on that later). And for the security conscious, you can make a restricted Builder that refuses to output certain elements or attributes.
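The idea of a stateful procedural Builder, and of a restricted one layered on top, can be sketched in plain Rust. This is only an illustration under assumptions: the trait, the method names (open, text, close), and the types below are invented for this example and are not the react-wasm-bridge API, and the tree is an in-memory stand-in for React elements or DOM nodes.

```rust
/// A node in the output tree; a stand-in for a React element or DOM node.
#[derive(Debug, PartialEq)]
pub enum Node {
    Element { tag: String, children: Vec<Node> },
    Text(String),
}

/// Stateful, procedural interface that a parser drives as it walks a document.
pub trait Builder {
    fn open(&mut self, tag: &str);
    fn text(&mut self, content: &str);
    fn close(&mut self);
}

/// Builds an in-memory tree, keeping a stack of elements that are still open.
pub struct TreeBuilder {
    stack: Vec<(String, Vec<Node>)>,
}

impl TreeBuilder {
    pub fn new() -> Self {
        // A synthetic root collects the top-level nodes.
        TreeBuilder { stack: vec![("root".to_string(), Vec::new())] }
    }

    pub fn finish(mut self) -> Node {
        // Close anything the caller left open, then return the root.
        while self.stack.len() > 1 {
            self.close();
        }
        let (tag, children) = self.stack.pop().expect("root is always present");
        Node::Element { tag, children }
    }
}

impl Builder for TreeBuilder {
    fn open(&mut self, tag: &str) {
        self.stack.push((tag.to_string(), Vec::new()));
    }
    fn text(&mut self, content: &str) {
        if let Some((_, children)) = self.stack.last_mut() {
            children.push(Node::Text(content.to_string()));
        }
    }
    fn close(&mut self) {
        // Never pop the synthetic root.
        if self.stack.len() > 1 {
            let (tag, children) = self.stack.pop().expect("length checked above");
            let parent = self.stack.last_mut().expect("root remains");
            parent.1.push(Node::Element { tag, children });
        }
    }
}

/// Wraps another Builder and drops any element (with its whole subtree)
/// whose tag is not on the allow list.
pub struct RestrictedBuilder<B: Builder> {
    inner: B,
    allowed: Vec<&'static str>,
    /// Open elements inside a rejected subtree; 0 means nothing is suppressed.
    suppressed_depth: usize,
}

impl<B: Builder> RestrictedBuilder<B> {
    pub fn new(inner: B, allowed: Vec<&'static str>) -> Self {
        RestrictedBuilder { inner, allowed, suppressed_depth: 0 }
    }
    pub fn into_inner(self) -> B {
        self.inner
    }
}

impl<B: Builder> Builder for RestrictedBuilder<B> {
    fn open(&mut self, tag: &str) {
        let permitted = self.allowed.iter().any(|a| *a == tag);
        if self.suppressed_depth > 0 || !permitted {
            self.suppressed_depth += 1;
        } else {
            self.inner.open(tag);
        }
    }
    fn text(&mut self, content: &str) {
        if self.suppressed_depth == 0 {
            self.inner.text(content);
        }
    }
    fn close(&mut self) {
        if self.suppressed_depth > 0 {
            self.suppressed_depth -= 1;
        } else {
            self.inner.close();
        }
    }
}

/// Drives a Builder the way a parser might for `Hello *world*`, followed by
/// a script element that a restricted builder should refuse.
pub fn emit_sample<B: Builder>(b: &mut B) {
    b.open("p");
    b.text("Hello ");
    b.open("em");
    b.text("world");
    b.close();
    b.close();
    b.open("script");
    b.text("alert(1)");
    b.close();
}

/// Compact text form of a tree, for inspection only.
pub fn describe(node: &Node) -> String {
    match node {
        Node::Text(t) => format!("{:?}", t),
        Node::Element { tag, children } => {
            let inner: Vec<String> = children.iter().map(describe).collect();
            format!("{}({})", tag, inner.join(","))
        }
    }
}
```

Running emit_sample against a plain TreeBuilder keeps the script element, while wrapping the same TreeBuilder in RestrictedBuilder::new(TreeBuilder::new(), vec!["p", "em"]) yields a tree with only the paragraph. Because the parser only ever sees the Builder trait, swapping the output target or tightening the allow list requires no change to the parsing code.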

In addition to the safety of more restrictive element generation, the WebAssembly environment acts as a sandbox. It has no access to JavaScript except via functions exported to it, making any code execution exploit in the parsing code far less useful to an attacker.

And of course we wanted it to be faster. It seems like you could gain speed by removing the HTML parsing step. But you must always bench it. How much faster is it really? Well, the first time I benchmarked it, it wasn't faster at all. And even after working on it for a bit, the answer is still "It depends."

The first problem is that loading and instantiating WebAssembly isn't as fast as it could be. Browsers are making strides in streamlining this process, but the initial load will still take some time compared to JS, which is very well optimized. If you only want to render one Markdown document, this would be a very poor approach.

The second problem is that the way React builds the DOM is slow. The original Builder called React.createElement to make a tree suitable for returning from a React component's render function. But this turned out to be about 50% slower than the markdown-it solution. We were excited about the potential security advantages, but half again slower is a bitter pill to swallow.

After some discussion, we decided to try taking React out of the loop and creating a Builder that outputs DOM nodes directly. After all, Semaphor's messages are immutable, so there's never a need to re-render them. And it's a slightly fairer comparison - our markdown-it approach also skips React. Adopting that approach made it far more competitive.

The final problem is that WebAssembly is still on the bleeding edge. Initially I was only able to test in Firefox because wasm-bindgen and Webpack didn't yet support asynchronous loading and Chrome prohibited synchronous loading of WASM modules over 4KB. But when that was fixed, the results were surprising. In Firefox, markdown-it is still slightly faster. In Chrome, our WASM approach came out way ahead. All of these results are measured from componentWillMount to componentDidMount, in production/release mode, rendering 100 test documents.

Browser    markdown-it    WASM
Firefox    162ms          175ms
Chrome     197ms          84ms

As you can see, it's not a clear win if you need broad browser support. But the technology is improving every day and I expect this will change rapidly. And there are still improvements to be made to the bridge.
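To put the table above in relative terms, the arithmetic can be sketched as a pair of small helpers. The function names are made up for this example; the only inputs are the four timings from the table and the 100-document batch size.

```rust
/// How many times faster the WASM renderer was than markdown-it.
/// Values above 1.0 favor WASM; values below 1.0 favor markdown-it.
pub fn wasm_speedup(markdown_it_ms: f64, wasm_ms: f64) -> f64 {
    markdown_it_ms / wasm_ms
}

/// Average time per document for a batch measurement.
pub fn per_document_ms(total_ms: f64, documents: u32) -> f64 {
    total_ms / f64::from(documents)
}
```

With the numbers from the table, wasm_speedup(197.0, 84.0) is roughly 2.3 in Chrome, while wasm_speedup(162.0, 175.0) is about 0.93 in Firefox, meaning WASM was around 8% slower there. Per document, that is just under 2ms versus 0.84ms in Chrome.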

You can take a look at the Markdown implementation at https://github.com/SpiderOak/react-markdown-wasm, which is of course built on react-wasm-bridge at https://github.com/SpiderOak/react-wasm-bridge.

Lessons learned:

  1. Understand your problem. You can make useful optimizations when you nail down your use case.
  2. Don't assume WebAssembly is faster. Bench it. And bench it tomorrow, because it'll probably be different.
  3. WebAssembly can provide useful isolation for security purposes.
  4. React is still not fast at rendering deep trees of trivial HTML elements.

I hope this was interesting and thank you for reading!
