-
Notifications
You must be signed in to change notification settings - Fork 1
/
README.Rmd
314 lines (213 loc) · 15.2 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# midlines
<!-- badges: start -->
<!-- badges: end -->
The goal of `midlines` is to estimate the midline of one or more polygons, taking inputs in [`sf`](https://github.com/r-spatial/sf) formats.
`midlines` wraps around several functions from other packages, predominantly [`sf`](https://github.com/r-spatial/sf). It's development has been a learning experience for the author, but hopefully some users will find it useful. Comments, suggestions and contributions are welcome.
## Installation
You can install the current version of the package from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
# library(devtools)
devtools::install_github("RichardPatterson/midlines")
```
N.B. to install midlines from GitHub, you will need to install and attach the devtools package if you have not already done so.
## An example - finding the midline of a river
The package contains an example polygon representing a short section of the River Thames in central London. This was derived from [OpenStreetMap](https://www.openstreetmap.org/) data, downloaded using the [`OSMdata`](https://docs.ropensci.org/osmdata/) package and as such, copyright is retained by OpenStreetMap contributors.
```{r parameters, include = FALSE}
#par(mar = c(4, 4, 0.1, 0.1))
par(mar = c(0,0,0,0))
knitr::opts_knit$set(global.device = TRUE)
```
```{r, thames, message = FALSE}
# Load libraries
library(midlines)
library(sf)
library(units)
plot(thames$geometry)
```
To estimate the midline of this stretch of the Thames, use the `midline_draw` function.
```{r draw}
# create a sf collection for the midline
m1 = midlines_draw(thames)
# plot midline with a different colour for each segment (line_id)
plot(m1$geometry, col = m1$line_id, add = TRUE)
```
We can see the midline has been drawn but there are many unwanted short side branches, especially where the banks of the river are not smooth. To understand why, we need to know what `midline_draw` has done. `midlines_draw` used the [`sf`](https://github.com/r-spatial/sf) function `st_voronoi` to perform an [`Voronoi`](https://en.wikipedia.org/wiki/Voronoi_diagram) tessellation and then removed all line segments not entirely within the polygon. `midlines` can help remove these unwanted side branches, but it is possible you can prevent their creation (see sections on [border lines](boarder-line) [gaps](zig-zags-and-gaps) and [zig-zagging](#zig-zagging)).
## Cleaning the midline
To remove the unwanted branches, the line segments at the end of a line can be identified using a binary variable (`removed_flag`) added to the dataset with the `midlines_clean` function. The flagged line segments can be explored and removed as required. Specifying the `n_removed` option will flag that number of line segments from the end of each line. Previous experimentation indicated that the unwanted branches in this example are made up of ≤ 4 line segments.
```{r device_off1, include = FALSE}
knitr::opts_knit$set(global.device = FALSE)
```
```{r clean1}
# remove 4 line segments from the end of each line
m2 = midlines_clean(m1, n_removed = 4)
# plot flagged (red) and unflagged (blue) line segments
plot(m2$geometry, col = c("BLUE", "RED")[m2$removed_flag])
# plot only the cleaned midline
plot(m2$geometry[m2$removed_flag==0], col = "BLUE")
```
## Retaining the ends of the midline
In addition to the unwanted side branches, `midlines_clean` flagged for removal the line segments at the end of the midline itself. If losing the ends of the midline is a problem, `midlines_check` can unflag some line segments.
As the unwanted side branches are short, we can unflag longer groups of flagged line segments. We know that the unwanted side branches contain ≤ 4 line segments, whereas the midlines itself is much longer. Therefore, if we remove up to 5 line segments, we know that any removed lines of 5 or more line segments are not unwanted side chains but part of the midline. `midlines_check` takes as an input the output from `midlines_clean` and adds another binary variable (`removed_flag2`), which should flag the unwanted side chains but not the ends of the midlines.
```{r n_removed}
m2 = midlines_clean(m1, n_removed = 5)
m3 = midlines_check(m2, n_removed = 5)
# plot initially flagged in red and remainder in blue
plot(m3$geometry, col = c("BLUE", "RED")[m3$removed_flag2])
# overlay those initially flagged but identifed as having 5+ segments in green
plot(m3$geometry[m3$removed_flag==1 & m3$removed_flag2==0], col = "GREEN", add = TRUE)
```
The ends of the midline are restored (green) but the unwanted side branches are still flagged for removal (red). In addition to adding back longer lines based on the number of line segments, the `length` option can be used to specify a length (in [Units](https://r-quantities.github.io/units/)). In this example, specifying `length = set_units(230,"m")` gives the same results as `n_removed = 5`
There is a trade off between removing all unwanted side branches and retaining the desired midlines. Some experimentation might be required to get the desired results. In this case we've unflagged the forked midline on the right end of our river segment and also some small bits of unconnected line (although see [removing unwanted small bits of midlines](#removing-unwanted-small-bits-of-midlines).
## Border line
If there is a specific area of interest, this can be used to help control the estimation and cleaning of the midlines. This works best when the polygon used for estimation is larger than the area of actual interest. The border of the area of interest is specified as an sf linestring, as in this example.
```{r border_line}
# generate a linestring marking the border
bbox_line = st_cast(st_as_sfc(st_bbox(c(xmin = 535070, ymin = 177800, xmax = 542560, ymax = 181550), crs = st_crs(thames))), "LINESTRING")
m1 = midlines_draw(thames, border_line = bbox_line)
plot(thames$geometry)
plot(m1$geometry, col = "BLUE", add = TRUE)
plot(bbox_line, add = TRUE)
```
Specifying the boarder_line option in `midlines_clean` and `midlines_check` can also ensure the line ends that contact the boarder line are not flagged for removal.
```{r border_line1}
m2 = midlines_clean(m1, n_removed = 5)
m3 = midlines_check(m2, border_line = bbox_line)
plot(m3$geometry[m3$removed_flag2 == 0], col = "BLUE"[m3$removed_flag2])
plot(bbox_line, add = TRUE)
```
It is also possible to specify the border_line option with midlines_draw and midlines_clean, which crops out midline segments outside the area of interest at the outset and safes time by not flagging and unflagging those lines.
## Gaps
So far we've ignored the side branches north of the the Thames that join the main channel. Attempts to estimate the midline of the left channel has been entirely unsuccessful and the right channel midline has a marked zig-zag pattern. The [zig-zagging](#zig-zagging) section below covers that issue but here we look more closely at the lack of any midline in the left of the two side branches.
```{r gaps1}
# generate a polygon to focus on the left side channel(s)
bbox_poly = st_as_sfc(st_bbox(c(xmin = 536070, ymin = 180700, xmax = 537800, ymax = 181850), crs = st_crs(thames)))
# a slightly smaller area to use as a border line
bbox_line_s = st_cast(st_as_sfc(st_bbox(c(xmin = 536120, ymin = 180760, xmax = 537750, ymax = 181800), crs = st_crs(thames))),"LINESTRING")
plot(thames$geometry)
plot(bbox_poly, add = TRUE)
```
```{r gaps2}
# crop the larger thames polygon to the area we want
side1 = st_intersection(thames, bbox_poly)
plot(side1$geometry)
plot(bbox_line_s, add = TRUE)
# estimating midline of the original polygon
m_s1 = midlines_draw(side1, border_line = bbox_line_s)
plot(m_s1$geometry, add = TRUE, col = "RED")
```
The lack of lines is due to the narrowness of the channel, or more precisely, the gaps between points in the polygon perimeter relative to the width of the polygon. The gap between points that was sufficient for the wider main River Thames is insufficient to allow successful midline estimation in the narrow channel. The `midlines_draw` function has a `dfMaxLength` option specify the maximum distance between points and to add additional points as required. This argument is passed to sf::st_segmentize so see that help file for more details.
```{r gaps3}
# some trial and error revealed 8 meters to be the optimal max length between points
plot(side1$geometry)
plot(bbox_line_s, add = TRUE)
m_s1 = midlines_draw(side1, dfMaxLength = set_units(8,"m"), border_line = bbox_line_s)
#plot(ml2$geometry, add = TRUE, col = "BLUE")
# using the border_line to prevent removal of lines of interest
m_s2 = midlines_clean(m_s1, n_removed = 20, border_line = bbox_line_s)
plot(m_s2$geometry, col = c("BLUE", "RED")[m_s2$removed_flag], add = TRUE)
```
Conversely, if the points on the perimeter are too dense and result in a very complicated tessellation or slow computation times, then sf::st_line_sample might also be useful.
## Zig-zagging
A zig-zagging midline results from the offset between points on either side of the polygon when the Voronoi tessellation is done. To illustrate this, there are a series of midlines drawn with very simple oblong polygons.
```{r zig-zag1}
mat1 = matrix(c(0,0,2,0,4,0,6,0,8,0,10,0,12,0,12,2,11,2,9,2,7,2,5,2,3,2,1,2,0,2,0,0),ncol=2, byrow=TRUE)
pts1 = st_multipoint(mat1)
pol1 = st_polygon(list(mat1))
ml1 = midlines_draw(pol1)
plot(pol1)
plot(pts1, add = TRUE)
plot(ml1$geometry, col = ml1$line_id, add = TRUE)
```
When the points on the polygon perimeter are offset, this results in the zig-zag pattern. One way to resolve this is to ensure that points are directly opposite one another on the polygon.
```{r zig-zag2}
mat2 = matrix(c(0,0,2,0,4,0,6,0,8,0,10,0,12,0,12,2,12,2,10,2,8,2,6,2,4,2,2,2,0,2,0,0),ncol=2, byrow=TRUE)
pts2 = st_multipoint(mat2)
pol2 = st_polygon(list(mat2))
ml2 = midlines_draw(pol2)
plot(pol2)
plot(pts2, add = TRUE)
plot(ml2$geometry, col = ml2$line_id, add = TRUE)
```
However, its not always possible to control the points that make up the polygons of interest. Another option is to reduce the gap between the points so even though they are offset the zig-zag is less pronounced.
```{r zig-zag3}
mat3 = matrix(c(0,0,1,0,2,0,3,0,4,0,5,0,6,0,7,0,8,0,9,0,10,0,11,0,12,0,
12,2,11.5,2,10.5,2,9.5,2,8.5,2,7.5,2,6.5,2,5.5,2,4.5,2,3.5,2,2.5,2,1.5,2,0.5,2,0,2,0,0),ncol=2, byrow=TRUE)
pts3 = st_multipoint(mat3)
pol3 = st_polygon(list(mat3))
ml3 = midlines_draw(pol3)
plot(pol3)
plot(pts3, add = TRUE)
plot(ml3$geometry, col = ml3$line_id, add = TRUE)
```
Returning to the river example and looking more closely at the zig-zag midline of the right side channel.
```{r zig-zag4}
# generate a polygon to focus on the right side channel(s)
bbox_poly = st_as_sfc(st_bbox(c(xmin = 538700, ymin = 180750, xmax = 539800, ymax = 181850), crs = st_crs(thames)))
# a slightly smaller area to use as a border line
bbox_line_s = st_cast(st_as_sfc(st_bbox(c(xmin = 538750, ymin = 180800, xmax = 539750, ymax = 181800), crs = st_crs(thames))),"LINESTRING")
plot(thames$geometry)
plot(bbox_poly, add = TRUE)
# crop the larger thames polygon to the area we want
side2 = st_intersection(thames,bbox_poly)
plot(side2$geometry)
plot(bbox_line_s, add = TRUE)
# estimating midline of the original polygon
m_s1 = midlines_draw(side2, border_line = bbox_line_s)
plot(ml1$geometry, add = TRUE, col = "RED")
```
One option would be to use the dfMaxLength option with midlines_draw like we did on the left side channel.
```{r smooth}
plot(side2$geometry)
m_s1a = midlines_draw(side2, border_line = bbox_line_s, dfMaxLength = set_units(8,"m"))
# clean this to remove extraneous lines
m_s2a = midlines_clean(m_s1a, border_line = bbox_line_s, n_removed = 5)
plot(m_s2a$geometry[m_s2a$removed_flag==0], add = TRUE, col = "RED")
#plot(ml2$geometry, add = TRUE, col = "RED")
```
An alternative is to smooth the midline using a rolling average - basically a wrapper round the rollapply function in the excellent [Zoo](https://cran.r-project.org/web/packages/zoo/index.html) package.
```{r}
# smooth with a rolling average
m_s2b = midlines_smooth(m_s1)
plot(side2$geometry)
plot(m_s2b$geometry, add = TRUE, col = "BLUE")
```
`midlines_smooth` will smooth lines in a network of midlines but it does not move the point where two or more lines join. The `width` option can be specified for `midlines_smooth`, this is passed to rollapply so see documentation there for more details. Beware that the wider the window the greater tenancy to cut the corners and therefore it becomes less like a midline.
## And finally
### Removing unwanted small bits of midlines
Sometime it might be useful to remove small bits of midline that are not connected the to main midline. For example, in our earlier example
```{r debit}
plot(m2$geometry)
# remove the short bits of line
m2 = midlines_debit(m2, length = set_units(500, "m") )
plot(m2$geometry)
```
### De-densifying midlines
In addition to manipulating the density of points on the perimeter of the initial polygon, it is sometimes desirable to manipulate the density of points that form the midline. This results in fewer longer line segments, rather than many shorter line segments. This can be done using `midlines_dedensify` either before or after cleaning. Like with the `midlines_smooth` function, the points where lines join are not affected but between line joins the number of points can be reduced. This function calls sf::st_line_sample so see this for the relevant options.
## Voronoi polygons
The estimation of midlines is done using [`Voronoi`](https://en.wikipedia.org/wiki/Voronoi_diagram) tessellation via the sf::st_voronoi function. Some understanding of this process can help to understand how the attributes of the polygon will influence it's estimated midline. For a given set of points, Voronoi tessellation identifies the area around each point, i.e., that are closer to that point than any other. Below is an demonstration using the points in the perimeter of the river Thames polygon as the starting point.
```{r voronoi, echo = FALSE}
j = st_line_sample(st_cast(thames, "LINESTRING"), density = 1/250)
pols = st_collection_extract(st_voronoi(do.call(c, st_geometry(j))))
plot(pols)
```
## Another use case
An alternative use case is to merge two versions of a road network that are measured differently and therefore do not exactly match. Use buffers to creating a polygon around both networks and then estimate the midline of this polygon to give a combined network against which each can be compared. See this [blog](https://richardpatterson.github.io/2023/08/06/network_merge.html) post for more details.
## Citation
If you use this package in your work it would be helpful if you would cite it. A suggested citation is:
_Patterson, R. midlines: Estimate Polygon Midlines. R package version 0.0.0.9002. DOI: 10.17863/CAM.100113_
## Alternatives
* The very comprehensive R package [`cmgo`](https://github.com/AntoniusGolly/cmgo) does many of the same things as this package and much more. If your aim to to estimate the midline of a river, you should probably start with [`cmgo`](https://github.com/AntoniusGolly/cmgo)
* [centerline](https://centerline.readthedocs.io/en/latest/) is a Python library with a similar aim