Simplify Network — simplify_network • flownet

Spatially simplify a network graph using shortest paths or node clustering methods. This further simplifies the network topology but does not preserve full connectivity. It should ideally be called after consolidate_graph() if the network is still too large/complex.

Usage

simplify_network(
  graph_df,
  nodes = NULL,
  method = c("shortest-paths", "cluster"),
  directed = FALSE,
  cost.column = "cost",
  by = NULL,
  radius_km = list(nodes = 7, cluster = 20),
  verbose = TRUE,
  ...
)

Arguments

graph_df: A data.frame with columns from and to representing the graph edges. For the cluster method, the graph must also have columns FX, FY, TX, TY representing node coordinates.
nodes: For method = "shortest-paths": either an atomic vector of node IDs, or a data.frame with columns from and to specifying origin-destination pairs. For method = "cluster": an (optional) atomic vector of node IDs to preserve. These nodes will be kept as cluster centroids, and nearby nodes (within radius_km$nodes) will be assigned to their clusters. Remaining nodes are clustered using leaderCluster.
method: Character string (default: "shortest-paths"). Method to use for simplification: "shortest-paths" computes shortest paths between nodes and keeps only traversed edges; "cluster" clusters nodes using the leaderCluster algorithm and contracts the graph.
directed: Logical (default: FALSE). Whether the graph is directed. For method = "shortest-paths": controls path computation direction. For method = "cluster": if TRUE, A->B and B->A remain as separate edges after contraction; if FALSE, edges are normalized so that from < to before grouping.
cost.column: Character string (default: "cost"). Name of the cost column in graph_df. Alternatively, a numeric vector of edge costs with length equal to nrow(graph_df). With method = "cluster", a numeric vector of node weights matching nodes_from_graph(graph_df) can be provided.
by: Link characteristics to preserve/not simplify across, passed as a one-sided formula or character vector of column names. Typically includes attributes like mode, type, or capacity. For method = "shortest-paths": paths are computed separately for each group defined by by, with edges not in the current group penalized (cost multiplied by 100) to compel mode-specific routes. For method = "cluster": edges are grouped by from, to, AND by columns, preventing consolidation across different modes/types.
radius_km: Named list with elements nodes (default: 7) and cluster (default: 20). Only used for method = "cluster". nodes: radius in kilometers around preserved nodes. Graph nodes within this radius will be assigned to the nearest preserved node's cluster. cluster: radius in kilometers for clustering remaining nodes using leaderCluster.
verbose: Logical (default: TRUE). Whether to print progress messages/bars.
...: For method = "cluster": additional arguments passed to collap for edge attribute aggregation.

Value

A data.frame containing the simplified graph with:

For method = "shortest-paths":
- All columns from the input graph_df (for edges that were kept)
- Attribute "edges": integer vector of edge indices from the original graph
- Attribute "edge_counts": integer vector indicating how many times each edge was traversed
For method = "cluster":
- edge - New edge identifier
- from, to - Cluster centroid node IDs
- FX, FY, TX, TY - Coordinates of cluster centroid nodes
- Aggregated edge attributes from the original graph
- Attribute "group.id": mapping from original edges to simplified edges
- Attribute "group.starts": start indices of each group
- Attribute "group.sizes": number of original edges per simplified edge

Details

simplify_network() provides two methods to simplify large transport networks:

Method: "shortest-paths"

Validates that all origin and destination nodes exist in the network
Computes shortest paths from each origin to all destinations using igraph
Marks all edges that are traversed by at least one shortest path
Returns only the subset of edges that were traversed
If nodes is a data frame with from and to columns, paths are computed from each unique origin to its specified destinations

Method: "cluster"

Requires the graph to have spatial coordinates (FX, FY, TX, TY)
If nodes is provided, these nodes are preserved as cluster centroids
Nearby nodes (within radius_km$nodes km) are assigned to the nearest preserved node
Remaining nodes are clustered using leaderCluster with radius_km$cluster as the clustering radius
For each cluster, the node closest to the cluster centroid is selected as representative
The graph is contracted by mapping all nodes to their cluster representatives
Self-loops (edges where both endpoints map to the same cluster) are dropped
For undirected graphs (directed = FALSE), edges are normalized so from < to, merging opposite-direction edges; for directed graphs, A->B and B->A remain separate
Edge attributes are aggregated using collap (default: mean for numeric, mode for categorical); customize via ...

Examples

library(flownet)
library(sf)

# Convert segments to undirected graph
graph <- africa_segments |>
  linestrings_from_graph() |>
  linestrings_to_graph() |>
  create_undirected_graph(FUN = "fsum")

# Get city/port nodes to preserve
nodes_df <- nodes_from_graph(graph, sf = TRUE)
nearest_nodes <- nodes_df$node[st_nearest_feature(africa_cities_ports, nodes_df)]

# Initial consolidation
graph <- consolidate_graph(graph, keep = nearest_nodes, w = ~ passes)
#> Consolidating undirected graph graph with 11385 edges using full recursion
#> Initial node degrees:
#>    1    2    3    4    5    6    7    8    9   10 
#>  125 5293 3240  395  102   31    4    2    1    1 
#> 
#> Dropped 44 loop edges
#> Dropped 11 edges leading to singleton nodes
#> Oriented 3431 undirected intermediate edges
#> Contracted 5079 intermediate nodes
#> Oriented 10 undirected intermediate edges
#> Contracted 10 intermediate nodes
#> Aggregated 11330 edges down to 6597 edges
#> Oriented 361 undirected intermediate edges
#> Contracted 375 intermediate nodes
#> Oriented 8 undirected intermediate edges
#> Contracted 8 intermediate nodes
#> Aggregated 6597 edges down to 6264 edges
#> Oriented 41 undirected intermediate edges
#> Contracted 43 intermediate nodes
#> Oriented 1 undirected intermediate edges
#> Contracted 1 intermediate nodes
#> Aggregated 6264 edges down to 6224 edges
#> Oriented 2 undirected intermediate edges
#> Contracted 3 intermediate nodes
#> Aggregated 6224 edges down to 6221 edges
#> No nodes to contract, returning graph
#> 
#> Consolidated undirected graph graph from 11385 edges to 6221 edges (54.6%)
#> Final node degrees:
#>    1    2    3    4    5    6    7    8 
#>  114  236 3246  392   84   18    2    1 

# Method 1: Shortest-paths simplification (keeps only traversed edges)
graph_simple <- simplify_network(graph, nearest_nodes,
                                 method = "shortest-paths",
                                 cost.column = ".length")
#> Created graph with 4093 nodes and 6221 edges...
#> Retained 4422/6221 edges traversed by shortest paths (71.1%)
nrow(graph_simple)  # Reduced number of edges
#> [1] 4422

# \donttest{
# Method 2: Cluster-based simplification (contracts graph spatially)
# Compute node weights for clustering
node_weights <- collapse::rowbind(
  collapse::fselect(graph, node = from, gravity_rd),
  collapse::fselect(graph, to, gravity_rd),
  use.names = FALSE) |>
  collapse::collap(~ node, "fsum")

graph_cluster <- simplify_network(graph, nearest_nodes,
                                  method = "cluster",
                                  cost.column = node_weights$gravity_rd,
                                  radius_km = list(nodes = 30, cluster = 27),
                                  w = ~ passes)
#> Clustering nodes close to 'keep' nodes using a radius of 30km
#> Clustering the remaining nodes with the leaderCluster algorithm using a radius of 27km
#> leaderCluster algorithm converged in 7 iterations
#> Dropped 3540 self-loop edges (following clustering)
#> Oriented 637 undirected edges
#> Contracting 2681 edges down to 2430 edges
nrow(graph_cluster)
#> [1] 2430
# }