Natural Graph Networks: A Fresh Take on Graph Neural Networks
Written by: Claude Sonnet 4
Directed by:
- Chaohui Guo
- TC Chen
- Daniel Daza
Graph Neural Networks (GNNs) have become a cornerstone of machine learning on graphs, but they suffer from fundamental limitations that restrict their expressiveness. A 2020 NeurIPS paper by Pim de Haan, Taco Cohen, and Max Welling introduces Natural Graph Networks (NGNs), a novel approach that addresses these limitations by thinking about graph learning in terms of local message passing on edge neighborhoods rather than traditional node-centric approaches.
The Problem with Traditional GNNs
To understand why NGNs matter, let’s first examine what’s wrong with standard Graph Neural Networks like GCNs (Graph Convolutional Networks) and GraphSAGE.
Traditional GNNs follow a simple recipe:
- Each node aggregates features from its neighbors
- The aggregated features are transformed through a neural network
- This process repeats for multiple layers
Mathematically, a standard GNN layer can be written as:
\[h_i^{(l+1)} = \sigma\left(W^{(l)} \cdot \text{AGG}\left(\{h_j^{(l)} : j \in \mathcal{N}(i)\}\right)\right)\]
where $h_i^{(l)}$ is the feature vector of node $i$ at layer $l$, $\mathcal{N}(i)$ is the set of neighbors of node $i$, and AGG is an aggregation function such as mean or sum.
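To make this recipe concrete, here is a minimal sketch of one such layer in PyTorch, using mean aggregation over a dense adjacency matrix; the class name and tensor layout are illustrative choices, not code from the paper or the repository.
import torch
import torch.nn as nn
class SimpleGNNLayer(nn.Module):
    """One mean-aggregation GNN layer: h_i <- relu(W * mean of neighbor features)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
    def forward(self, X, A):
        # X: (n, in_dim) node features, A: (n, n) dense adjacency matrix
        deg = A.sum(dim=1, keepdim=True).clamp(min=1)  # guard against isolated nodes
        agg = (A @ X) / deg                            # mean over each node's neighbors
        return torch.relu(self.linear(agg))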
The fundamental problem: This approach has limited expressive power. The seminal work by Xu et al. (2019) proved that standard GNNs cannot distinguish between certain graph structures that are fundamentally different. In particular, they’re limited by the 1-Weisfeiler-Lehman (1-WL) test - they can only distinguish graphs that the 1-WL algorithm can distinguish.
Enter Natural Graph Networks: A Local Perspective
Natural Graph Networks take a radically different approach. Instead of thinking about message passing between individual nodes, NGNs focus on edge neighborhoods and use local subgraph processing.
The Core Idea: Edge-Centric Message Passing
The key insight is this: for every directed edge $p \to q$ in a graph, NGNs:
- Extract a local neighborhood: Collect all nodes within $k$ hops of either $p$ or $q$
- Create markers: Add special “marker” features that identify which nodes are $p$ and $q$ in this subgraph
- Process locally: Run a message network (typically a small GNN) on this marked subgraph
- Extract the message: Take the output features at node $q$ as the message $m_{p \to q}$
- Aggregate: Sum all incoming messages at each node
Mathematically, the message from node $p$ to node $q$ is:
\[m_{p \to q} = \text{MessageNet}\left(\mathcal{G}_{pq}^{(k)}, \mathbf{X}_{pq} \oplus \mathbf{M}_{pq}\right)[q]\]
where:
- $\mathcal{G}_{pq}^{(k)}$ is the $k$-hop neighborhood around edge $(p,q)$
- $\mathbf{X}_{pq}$ are the node features in this subgraph
- $\mathbf{M}_{pq}$ are binary marker features indicating nodes $p$ and $q$
- $[q]$ extracts the features at position $q$ in the subgraph
Why This Works: Higher-Order Structure Awareness
The magic of this approach lies in its ability to capture higher-order graph structure. While traditional GNNs only see immediate neighbors at each step, NGNs see the entire $k$-hop neighborhood structure around each edge in a single operation.
Consider two graphs where traditional GNNs produce identical node embeddings but NGNs do not. The classic example is a pair of 2-regular graphs that the 1-WL test cannot separate: Graph A consists of two disjoint triangles, while Graph B is a single 6-cycle.
Graph A (two triangles):      Graph B (one 6-cycle):
1---2     4---5               1---2---3
 \ /       \ /                |       |
  3         6                 6---5---4
Every node in both graphs has degree 2, so with identical initial features the 1-WL refinement never separates them, and a traditional GNN assigns the same embedding to every node in both graphs. An NGN processing edge $1 \to 2$, however, sees different 1-hop edge neighborhoods: in Graph A, nodes 1 and 2 share the common neighbor 3, so the extracted subgraph is a triangle; in Graph B, they have no common neighbor, and the subgraph is just a path. Different subgraphs produce different messages, so the two graphs become distinguishable.
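To see this in code, we can extract the 1-hop edge neighborhood of edge (1, 2) in both graphs with networkx. The small helper below is a hypothetical illustration written for this example, not the repository's implementation.
import networkx as nx
def edge_neighborhood(G, p, q, k=1):
    """Induced subgraph on all nodes within k hops of p or q."""
    nodes = set(nx.single_source_shortest_path_length(G, p, cutoff=k))
    nodes |= set(nx.single_source_shortest_path_length(G, q, cutoff=k))
    return G.subgraph(nodes)
# Graph A: two disjoint triangles; Graph B: a single 6-cycle
A = nx.Graph([(1, 2), (2, 3), (3, 1), (4, 5), (5, 6), (6, 4)])
B = nx.cycle_graph([1, 2, 3, 4, 5, 6])
sub_A = edge_neighborhood(A, 1, 2)
sub_B = edge_neighborhood(B, 1, 2)
print(sorted(tuple(sorted(e)) for e in sub_A.edges()))  # [(1, 2), (1, 3), (2, 3)] -- a triangle
print(sorted(tuple(sorted(e)) for e in sub_B.edges()))  # [(1, 2), (1, 6), (2, 3)] -- a path
Any reasonable message network maps these two subgraphs to different messages, which is exactly the structural signal a standard GNN never receives.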
Implementation: How It Works in Practice
Let’s look at how this is implemented in practice, based on the code in the repository.
Edge Neighborhood Construction
The first step is building the $k$-hop edge neighborhood. For an edge $p \to q$, this means finding all nodes that are within $k$ steps of either $p$ or $q$:
def edge_neighborhood_nodes(G, p, q, k=1):
    """Find all nodes within k hops of p or q"""
    def within_k(start):
        visited = {start}
        frontier = {start}
        for _ in range(k):
            next_nodes = set()
            for u in frontier:
                next_nodes.update(G.neighbors(u))
            next_nodes -= visited
            visited |= next_nodes
            frontier = next_nodes
        return visited
    return within_k(p) | within_k(q)  # Union of both neighborhoods
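As a quick usage example on a networkx graph (Graph A from the example above; the repository may store graphs differently), the 1-hop neighborhood of edge (1, 2) is the whole triangle, and increasing k cannot pull in the other, disconnected triangle:
import networkx as nx
G = nx.Graph([(1, 2), (2, 3), (3, 1), (4, 5), (5, 6), (6, 4)])  # Graph A from above
print(edge_neighborhood_nodes(G, 1, 2, k=1))  # {1, 2, 3}
print(edge_neighborhood_nodes(G, 1, 2, k=2))  # still {1, 2, 3}: the second triangle is unreachable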
Marker Features: Telling the Network What to Focus On
The crucial innovation is the marker features. After extracting the edge neighborhood subgraph, NGNs append two binary channels to the node features:
- Channel 1: Set to 1 for node $p$, 0 elsewhere
- Channel 2: Set to 1 for node $q$, 0 elsewhere
import numpy as np
def add_marker_features(X, p_sub, q_sub):
    """Add binary marker channels for p and q (given as their positions in the subgraph)"""
    m, d = X.shape
    markers = np.zeros((m, 2))
    markers[p_sub, 0] = 1.0  # Mark p
    markers[q_sub, 1] = 1.0  # Mark q
    return np.concatenate([X, markers], axis=1)
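A quick sanity check with made-up numbers (not taken from the repository) shows that the markers simply append two extra columns:
X_sub = np.ones((4, 3))                           # 4 subgraph nodes, 3 features each
X_marked = add_marker_features(X_sub, p_sub=0, q_sub=2)
print(X_marked.shape)                             # (4, 5): 3 original channels + 2 marker channels
print(X_marked[:, 3:].tolist())                   # row 0 marks p, row 2 marks q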
These markers are critical - they tell the message network “pay attention to the relationship between these two specific nodes” in the context of their neighborhood structure.
The Message Network
The message network itself can be any graph neural network. In the paper’s implementation, it’s a simple 2-layer GCN:
import torch.nn as nn
import torch.nn.functional as F
class MessageGCN(nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.gcn1 = GCNLayer(in_dim, hidden_dim)   # in_dim includes +2 for markers
        self.gcn2 = GCNLayer(hidden_dim, out_dim)  # GCNLayer: the repository's basic GCN layer
    def forward(self, x, adj):
        h = F.relu(self.gcn1(x, adj))
        return self.gcn2(h, adj)
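Instantiating the message network then just means accounting for the two marker channels; the dimensions below are illustrative, not taken from the repository's configuration.
node_dim, hidden_dim, msg_dim = 16, 32, 16
message_net = MessageGCN(in_dim=node_dim + 2, hidden_dim=hidden_dim, out_dim=msg_dim)  # +2 for the markers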
Putting It All Together
The complete NGN layer processes each directed edge in the graph:
def forward(self, G, X):
    n, d = X.shape
    output = torch.zeros((n, self.output_dim))
    for p, q in G.edges():
        # Build p->q neighborhood
        subgraph, p_sub, q_sub = build_subgraph_for_edge(G, p, q, k=self.k)
        X_sub = add_marker_features(X[subgraph], p_sub, q_sub)
        A_sub = adjacency_matrix(subgraph)
        # Run message network
        H_sub = self.message_network(X_sub, A_sub)
        # Extract message at q's position and add to q's features
        message_pq = H_sub[q_sub]
        output[q] += message_pq
        # Process reverse direction q->p
        # ... (similar process)
    return output
Theoretical Foundations: Why NGNs Are More Powerful
The power of NGNs comes from their ability to capture local graph structure in a way that traditional GNNs cannot.
Connection to Higher-Order Graph Properties
Traditional GNNs are fundamentally limited by how they propagate information. At each layer, a node receives only a pooled summary of its immediate neighbors' features; even with many layers, everything a node learns about its surroundings is squeezed through these per-node summaries, which discard structural details such as how its neighbors are connected to one another.
NGNs, on the other hand, look at the complete local structure around each edge. This allows them to detect (two concrete checks are shown in the snippet after this list):
- Triangular patterns: Is edge $(p,q)$ part of triangles? How many?
- Path diversity: Are there multiple paths between $p$ and $q$ through their neighborhood?
- Structural roles: What role do $p$ and $q$ play in their local neighborhood?
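For instance, triangle membership and path diversity, two of the patterns listed above, can be read off directly with networkx; the graph below is a hypothetical toy example, not one of the benchmark datasets.
import networkx as nx
G = nx.Graph([(1, 2), (2, 3), (3, 1), (1, 4), (4, 2)])  # edge (1, 2) lies on two triangles
# Triangle membership: each common neighbor of p and q closes a triangle with edge (p, q)
print(sorted(nx.common_neighbors(G, 1, 2)))  # [3, 4] -> two triangles through edge (1, 2)
# Path diversity: number of edge-disjoint paths between p and q
print(nx.edge_connectivity(G, 1, 2))         # 3: the direct edge plus the two triangle detours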
Permutation Equivariance and Natural Transformations
The “natural” in Natural Graph Networks refers to their connection to category theory and natural transformations. However, you don’t need to understand category theory to appreciate the core insight: NGNs respect the symmetries inherent in graph data.
More concretely, NGNs are permutation equivariant: if you relabel the nodes of a graph, the output features are relabeled in exactly the same way, so isomorphic graphs always receive equivalent representations. Standard GNNs offer the same guarantee, but only by sharing one aggregation rule across all nodes; the naturality condition is what lets NGNs use more expressive, edge-dependent message functions while preserving it.
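To make "equivariant" concrete, here is a toy numerical check on a generic mean-aggregation layer (a stand-in for illustration, not the NGN construction itself): permuting the input graph's nodes permutes the output rows in exactly the same way.
import torch
def mp_layer(X, A, W):
    """Toy mean-aggregation layer: relu of the neighbor mean followed by a linear map."""
    deg = A.sum(dim=1, keepdim=True).clamp(min=1)
    return torch.relu((A @ X / deg) @ W)
n, d = 6, 4
A = torch.bernoulli(torch.full((n, n), 0.5))
A = ((A + A.T) > 0).float()
A.fill_diagonal_(0)                                       # random undirected graph, no self-loops
X, W = torch.randn(n, d), torch.randn(d, d)
P = torch.eye(n)[torch.randperm(n)]                       # random permutation matrix
out = mp_layer(X, A, W)
out_relabeled = mp_layer(P @ X, P @ A @ P.T, W)           # relabel the nodes, rerun the layer
print(torch.allclose(P @ out, out_relabeled, atol=1e-5))  # True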
Experimental Results: Do They Actually Work?
The paper demonstrates NGNs’ effectiveness on several benchmarks:
Graph Classification on TU Datasets
The implementation in this repository tests NGNs on standard graph classification datasets from the TU collection, including:
- MUTAG: Chemical compounds, mutagenic activity prediction
- PROTEINS: Protein structures, enzyme classification
- NCI1/NCI109: Chemical compounds, cancer activity
- IMDB-BINARY/MULTI: Movie collaboration networks
Key Results: NGNs consistently outperform standard GNNs, with particularly strong improvements on datasets where structural properties matter most.
Computational Complexity Trade-offs
The main drawback of NGNs is computational cost. For each edge $(p,q)$, they must:
- Extract a $k$-hop neighborhood (can be expensive for large $k$)
- Run a message network on this subgraph
- Repeat this for every directed edge, which means $O(|E|)$ subgraph computations per layer
In practice, this means NGNs are:
- More expensive than traditional GNNs
- More powerful at capturing structural patterns
- Better suited for smaller graphs where structure matters more than scale
When to Use NGNs
Based on the theoretical insights and experimental results, NGNs are particularly valuable when:
- Structure matters: Your task requires understanding complex topological relationships
- Graphs are reasonably small: The computational overhead is manageable
- Standard GNNs fail: You’ve hit the expressiveness limitations of traditional approaches
- Local patterns are important: Your problem involves detecting specific motifs or structural roles
Implementation Tips and Practical Considerations
If you’re thinking about using NGNs in your own work, here are some practical insights from the codebase:
Neighborhood Size Selection
The choice of $k$ (neighborhood size) is crucial:
- k=1: Captures immediate neighborhood structure, computationally efficient
- k=2: Includes triangles and 4-cycles, good balance of power and efficiency
- k≥3: Very expensive, may suffer from over-smoothing
The paper uses $k=1$ for most experiments, suggesting this is often sufficient.
Message Network Design
The message network doesn’t need to be complex. A simple 2-layer GCN works well:
- Layer 1: Process marked features with neighborhood structure
- Layer 2: Compute final message representation
More complex message networks don’t necessarily help and add computational cost.
Training Considerations
NGNs require some special handling during training:
- Memory usage: Each edge processes its own subgraph, increasing memory requirements
- Batch processing: Hard to batch efficiently due to variable subgraph sizes
- Gradient flow: Messages aggregate additively, so high-degree nodes accumulate larger sums, which can cause gradient scaling issues (a simple mitigation is sketched below)
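One simple mitigation for the scaling issue, not specific to this repository, is to divide each node's accumulated messages by its degree before the next layer; a minimal sketch, assuming networkx-style graphs with nodes labeled 0..n-1:
import torch
def normalize_messages(output, G):
    """Divide each node's summed messages by its degree so magnitudes stay comparable."""
    n = output.shape[0]
    deg = torch.tensor([max(G.degree(v), 1) for v in range(n)], dtype=output.dtype)
    return output / deg.unsqueeze(1)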
Conclusion: A New Tool for Graph Learning
Natural Graph Networks represent a significant conceptual advance in graph neural networks. By shifting focus from node-centric to edge-centric processing, they unlock new levels of expressiveness that can capture structural patterns invisible to traditional GNNs.
Key Takeaways:
- Higher expressiveness: NGNs can distinguish graph structures that confuse traditional GNNs
- Local structure awareness: They see complete $k$-hop neighborhoods around edges
- Principled design: Based on solid theoretical foundations (natural transformations)
- Practical trade-offs: More powerful but more expensive than standard GNNs
When to consider NGNs:
- Your graphs have rich structural patterns that matter for your task
- Standard GNNs aren’t capturing the relationships you need
- You can afford the computational overhead for better expressiveness
- You’re working with molecular graphs, social networks, or other structure-rich domains
As graph learning continues to evolve, NGNs offer a valuable perspective: sometimes the best way to understand a graph isn’t to think about individual nodes, but to focus on the local structural relationships that give the graph its meaning. In many real-world applications, it’s these local patterns - not global connectivity - that carry the most important information.
The code in this repository provides a clean, educational implementation that makes it easy to experiment with these ideas. Whether you’re a researcher pushing the boundaries of graph learning or a practitioner looking for better tools, Natural Graph Networks deserve a place in your toolkit.