Tree Diffing

This page summarises some of the other tree diffing tools available.

If you're in a hurry, start by looking at Autochrome. It's extremely capable, and has an excellent description of the design.

If you're interested in a summary of the academic literature, this blog post (and its accompanying paper -- mirrored under a CC BY-NC license) are great resources.

json-diff (2012)

Languages: JSON
Algorithm: Pairwise comparison
Output: CLI colours

json-diff performs a structural diff of JSON files. It considers subtrees to be different if they don't match exactly, so e.g. "foo" and ["foo"] are entirely different.

json-diff is also noteworthy for its extremely readable display of results.

GumTree (2014)

Languages: ~10 programming languages
Parser: Several, including srcML
Algorithm: Top-down, then bottom-up
Ouput: HTML, Swing GUI, or text

GumTree can parse several programming languages and then performs a tree-based diff, outputting an HTML display.

The GumTree algorithm is described in the associated paper 'Fine-grained and accurate source code differencing' by Falleri et al (DOI, PDF). It performs a greedy top-down search for identical subtrees, then performs a bottom-up search to match up the rest.

Tree Diff (2017)

Languages: S-expression data format
Algorithm: A* search
Output: Merged s-expression file

Tristan Hume wrote a tree diffing algorithm during his 2017 internship and Jane Street. The source code is not available, but he has a blog post discussing the design in depth.

This project finds minimal diffs between s-expression files used as configuration by Jane Street. It uses A* search to find the minimal diff between them, and builds a new s-expression with a section marked with :date-switch for the differing parts.

(Jane Street also has patdiff, but that seems to be a line-oriented diff with some whitespace/integer display polish. It doesn't understand that e.g. whitespace in "foo " is meaningful).

Autochrome (2017)

Languages: Clojure
Parser: Custom, preserves comments
Algorithm: Dijkstra (previously A* search)
Output: HTML

Autochrome parses Clojure with a custom parser that preserves comments. Autochrome uses Dijkstra's algorithm to compare syntax trees.

Autochrome's webpage includes worked examples of the algorithm and a discussion of design tradeoffs. It's a really great resource for understanding tree diffing techniques in general.

graphtage (2020)

Languages: JSON, XML, HTML, YAML, plist, and CSS
Parser: json5, pyYAML, ignores comments
Algorithm: Levenshtein distance
Output: CLI colours

graphtage compares structured data by parsing into a generic file format, then displaying a diff. It even allows things like diffing JSON against YAML.

As with json-diff, it does not consider ["foo"] and "foo" to have any similarities.

Diffsitter (2020)

Parser: Tree-sitter
Algorithm: Longest-common-subsequence
Output: CLI colours

Diffsitter is another tree-sitter based diff tool. It uses LCS diffing on the leaves of the syntax tree.

sdiff (2021)

Languages: Scheme
Parser: Scheme's built-in read, ignores comments
Algorithm: MH-Diff from the Chawathe paper
Output: CLI colours

Semantically meaningful S-expression diff: Tree-diff for lisp source code was presented at FOSDEM 2021.