\documentclass[a4paper]{article}
\usepackage[T1]{fontenc}
\usepackage[american]{babel}
\usepackage[utf8]{inputenc}
\usepackage [autostyle,english=american]{csquotes}
\MakeOuterQuote{"}
\usepackage[maxbibnames=99,style=numeric,sorting=none,alldates=edtf]{biblatex}
\addbibresource{bib.bib}
\usepackage[
pdfusetitle,
pdfkeywords={Line Generalization,Line Simplification,Wang--Mueller},
pdfborderstyle={/S/U/W 0} % /S/U/W 1 to enable reasonable decorations
]{hyperref}
\usepackage{enumitem}
\usepackage[toc,page,title]{appendix}
\usepackage{caption}
\usepackage{subcaption}
\usepackage{dcolumn}
\usepackage{gensymb}
\usepackage{units}
\usepackage{varwidth}
\usepackage{tabularx}
\usepackage{float}
\usepackage{tikz}
\usepackage{fancyvrb}
\usepackage{layouts}
%\usepackage{charter}
%\usepackage{setspace}
%\doublespacing

\input{version.inc}
\input{vars.inc}
\IfFileExists{./editorial-version}{\def \mjEditorial {}}{}
\ifx \mjEditorial \undefined
\usepackage{minted}
\newcommand{\inputcode}[2]{\inputminted[fontsize=\small]{#1}{#2}}
\else
\usepackage{verbatim}
\newcommand{\inputcode}[2]{\verbatiminput{#2}}
\fi
\newcommand{\onpage}[1]{\ref{#1} on page~\pageref{#1}}
\newcommand{\titlecite}[1]{\citetitle{#1}\cite{#1}}
\newcommand{\DP}{Douglas \& Peucker}
\newcommand{\VW}{Visvalingam--Whyatt}
\newcommand{\WM}{Wang--M{\"u}ller}
\newcommand{\WnM}{Wang and M{\"u}ller}
% {\WM} algoritmo realizacija kartografinei upių generalizacijai
\newcommand{\MYTITLE}{{\WM} algorithm realization for cartographic line generalization}
\newcommand{\MYTITLENOCAPS}{wang--m{\"u}ller algorithm realization for cartographic line generalization}
\newcommand{\MYAUTHOR}{Motiejus Jakštys}

\title{\MYTITLE}
\author{\MYAUTHOR}
\date{\VCDescribe}
\begin{document}
\begin{titlepage}
\begin{center}
\includegraphics[width=0.2\textwidth]{vu.pdf} \\[4ex]

\large
\textbf{\textsc{
vilnius university \\
faculty of chemistry and geosciences \\
department of cartography and geoinformatics
}} \\[8ex]
\textbf{\MYAUTHOR} \\[8ex]
\normalsize
A thesis presented for the degree of Master in Cartography \\[8ex]
\LARGE
\textbf{\textsc{\MYTITLENOCAPS}}

\vfill
\normalsize
Supervisor Dr. Andrius Balčiūnas \\[16ex]
\VCDescribe
\end{center}
\end{titlepage}
\begin{abstract}
\label{sec:abstract}
Currently available line simplification algorithms are rooted in mathematics
and geometry, and are unfit for bendy map features like rivers and
coastlines. {\WnM} observed how cartographers simplify these natural
features and created an algorithm. We implemented this algorithm and
documented it in great detail. Our implementation makes the {\WM} algorithm
freely available in PostGIS, and this paper explains it.
\end{abstract}
\newpage
\tableofcontents
\newpage
\listoffigures
\newpage
\section{Introduction}
\label{sec:introduction}
\iffalse
NOTICE: this value should be copied to layer2img.py:TEXTWIDTH, so dimensions
of inline images are reasonable.
Textwidth in cm: {\printinunitsof{cm}\prntlen{\textwidth}}
\fi

When creating small-scale maps, the detail of the data source is often greater
than desired for the map. While many features can be removed or simplified, it
is trickier with natural features that have many bends, like coastlines,
rivers or forest boundaries.

To create a small-scale map from a large-scale data source, features need to be
generalized, i.e. detail should be reduced. While performing the generalization,
it is important to retain the "defining" shape of the original feature.
Otherwise, if the generalized feature looks too different from the original,
the result will look unrealistic.

For example, if a river is nearly straight, it should be nearly straight after
generalization. An overly straightened river will look like a canal, and an
overly curvy one will not reflect the natural shape. Conversely, if the river
is highly wiggly, the number of bends should be reduced, but not removed
altogether.

The generalization problem for other objects can often be solved by other,
non-geometric means:
\begin{itemize}
    \item Towns and cities can be filtered and generalized by number of
        inhabitants.
    \item Roads can be eliminated by the road length, number of lanes, or
        classification of the road (local, regional, international).
\end{itemize}

To sum up, the natural line generalization problem can be viewed as a task of
finding a delicate balance between two competing goals:
\begin{itemize}
    \item Reduce detail by removing or simplifying "less important" features.
    \item Retain enough detail, so the original is still recognizable.
\end{itemize}

Given the discussed complexities, a fine line between under-generalization
(leaving the object as-is) and over-generalization (making a straight line)
needs to be found. Therein lies the complexity of generalization algorithms:
all have different trade-offs.

\section{Literature review and problem statement}
\label{sec:literature-review}

A number of cartographic line generalization algorithms have been researched.
The "classical" ones are {\DP}\cite{douglas1973algorithms} and
{\VW}\cite{visvalingam1993line} in combination with
Chaikin's\cite{chaikin1974algorithm}.

This section reviews the classical ones, which, besides being around for a long
time, offer easily accessible implementations, as well as more modern ones,
which are described only in theory, without an available implementation.

\subsection{Available algorithms}
\subsubsection{{\DP}, {\VW} and Chaikin's}

{\DP}\cite{douglas1973algorithms} and {\VW}\cite{visvalingam1993line} are
"classical" line generalization computer graphics algorithms. They are
relatively simple to implement and require few runtime resources. Both of them
accept only a single parameter, based on the desired scale of the map, which
makes them straightforward to adjust for different scales.

Both algorithms are part of PostGIS, a free-software GIS suite:
\begin{itemize}
    \item {\DP} via
        \href{https://postgis.net/docs/ST_Simplify.html}{PostGIS \texttt{ST\_Simplify}}.

    \item {\VW} via
        \href{https://postgis.net/docs/ST_SimplifyVW.html}{PostGIS \texttt{ST\_SimplifyVW}}.
\end{itemize}

It may be worthwhile to post-process those through the widely available
Chaikin's line smoothing algorithm\cite{chaikin1974algorithm} via
\href{https://postgis.net/docs/ST_ChaikinSmoothing.html}{PostGIS
\texttt{ST\_ChaikinSmoothing}}.

The generalization examples use two rivers: Šalčia and Visinčia.
Figure~\ref{fig:salvis-25} illustrates the original two rivers without any
processing.

These rivers were chosen because they have both large and small bends, and
are thus convenient to analyze for both small-scale and large-scale
generalization.
\begin{figure}[h]
\centering
\includegraphics[width=\textwidth]{salvis-25k}
\caption{Example rivers for visual tests (1:25000).}
\label{fig:salvis-25}
\end{figure}
\begin{figure}[h]
\centering
\begin{subfigure}[b]{.49\textwidth}
\includegraphics[width=\textwidth]{salvis-50k}
\caption{Example scaled 1:50000.}
\end{subfigure}
\hfill
\begin{subfigure}[b]{.49\textwidth}
\centering
\includegraphics[width=.2\textwidth]{salvis-250k}
\caption{Example scaled 1:250000.}
\end{subfigure}
\caption{Down-scaled original river.}
\label{fig:salvis-50-250}
\end{figure}
The same rivers, unprocessed, but drawn at smaller scales (1:50000 and
1:250000), are depicted in figure~\onpage{fig:salvis-50-250}. Some river
features are so compact that a reasonably thin line depicting the river touches
itself, creating a thicker line. These rivers are therefore worth generalizing
for smaller scales.
\begin{figure}[h]
\centering
\begin{subfigure}[b]{.49\textwidth}
\includegraphics[width=\textwidth]{salvis-douglas-64-50k}
\caption{Using {\DP}}
\end{subfigure}
\hfill
\begin{subfigure}[b]{.49\textwidth}
\includegraphics[width=\textwidth]{salvis-visvalingam-64-50k}
\caption{Using {\VW}}
\end{subfigure}
\caption{Generalized using classical algorithms (1:50000).}
\label{fig:salvis-generalized-50k}
\end{figure}
Figure~\onpage{fig:salvis-generalized-50k} illustrates the same river bend, but
generalized using the {\DP} and {\VW} algorithms. The resulting lines are
jagged and therefore look unlike a real river. To smooth the jaggedness,
Chaikin's algorithm\cite{chaikin1974algorithm} is traditionally applied after
generalization, as illustrated in
figure~\onpage{fig:salvis-generalized-chaikin-50k}.
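
As an illustration of the smoothing step, the corner cutting at the heart of
Chaikin's algorithm can be sketched in a few lines of Python. This is a minimal
sketch and not the PostGIS implementation: the function name
\texttt{chaikin}, the representation of vertices as coordinate tuples, and the
choice to keep both end vertices in place are assumptions made for this
example.

\begin{Verbatim}[fontsize=\small]
def chaikin(points, iterations=1):
    # Chaikin's corner cutting for an open polyline of (x, y) tuples.
    # Each pass replaces every segment (p, q) with two points located
    # at 1/4 and 3/4 of the segment; the end vertices stay in place.
    for _ in range(iterations):
        if len(points) < 3:
            break  # nothing to smooth on a single segment
        smoothed = [points[0]]
        for (px, py), (qx, qy) in zip(points, points[1:]):
            smoothed.append((0.75 * px + 0.25 * qx,
                             0.75 * py + 0.25 * qy))
            smoothed.append((0.25 * px + 0.75 * qx,
                             0.25 * py + 0.75 * qy))
        smoothed.append(points[-1])
        points = smoothed
    return points
\end{Verbatim}

Every pass roughly doubles the number of vertices, so the number of iterations
is best kept small.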
\begin{figure}[h]
\centering
\begin{subfigure}[b]{.49\textwidth}
\includegraphics[width=\textwidth]{salvis-douglas-64-chaikin-50k}
\caption{{\DP} + Chaikin's}
\end{subfigure}
\hfill
\begin{subfigure}[b]{.49\textwidth}
\includegraphics[width=\textwidth]{salvis-visvalingam-64-chaikin-50k}
\caption{{\VW} + Chaikin's}
\end{subfigure}
\caption{Generalized and smoothened river (1:50000).}
\label{fig:salvis-generalized-chaikin-50k}
\end{figure}
\begin{figure}[h]
\centering
\begin{subfigure}[b]{.49\textwidth}
\includegraphics[width=\textwidth]{salvis-overlaid-douglas-64-chaikin-50k}
\caption{{\DP} + Chaikin's}
\end{subfigure}
\hfill
\begin{subfigure}[b]{.49\textwidth}
\includegraphics[width=\textwidth]{salvis-overlaid-visvalingam-64-chaikin-50k}
\caption{{\VW} + Chaikin's}
\end{subfigure}
\caption{Zoomed-in generalized and smoothened river + original.}
\label{fig:salvis-overlaid-generalized-chaikin-50k}
\end{figure}
The resulting generalized and smoothened example
(figure~\onpage{fig:salvis-generalized-chaikin-50k}) yields a more
aesthetically pleasing result; however, it obscures natural river features.

Given the absence of rocks, the only natural features that influence the river
direction are topographic:
\begin{itemize}
    \item A relatively straight river (completely straight or with
        small-angled bends over a relatively long distance) implies a greater
        slope, more water, and/or faster flow.
    \item A bendy river, on the contrary, implies slower flow, a gentler
        slope, and/or less water.
\end{itemize}
Both {\VW} and {\DP} have a tendency to remove the small bends altogether,
losing a valuable characteristic of the river.

Sometimes low-water rivers on gentle slopes have many bends next to each
other. At low resolutions (on low-DPI screens or paper, when the river is
sufficiently zoomed out, or both), the small bends will amalgamate into
an unintelligible blob. Figure~\onpage{fig:pixel-amalgamation} illustrates two
real-world examples where a bendy river, normally 1 or 2 pixels wide, creates a
wide area in which the shapes of the bends are unintelligible. In this example,
classical algorithms would remove these bends altogether. A cartographer would
retain a few of those distinctive bends, but would increase the distance
between the bends, remove some of the bends, or both.
\begin{figure}[h]
\includegraphics[width=\textwidth]{amalgamate1}
\caption{Narrow bends amalgamating into large unintelligible blobs}
\label{fig:pixel-amalgamation}
\end{figure}
For the reasons discussed in this section, the "classical" {\DP} and {\VW} are
not well suited for natural river generalization, and a more robust line
generalization algorithm is worth looking for.

\subsubsection{Modern approaches}

% TODO:
% https://pdfs.semanticscholar.org/e80b/1c64345583eb8f7a6c53834d1d40852595d5.pdf
% A New Algorithm for Cartographic Simplification of Streams and Lakes Using
% Deviation Angles and Error Bands
Due to their simplicity and ubiquity, {\DP} and {\VW} have been established as
go-to algorithms for line generalization. In recent years, alternatives
have emerged. These modern replacements fall into roughly two categories:

\begin{itemize}

    \item Cartographic knowledge encoded into an algorithm (bottom-up
        approach). One of these is \titlecite{wang1998line}, also known
        as the {\WM} algorithm.

    \item Mathematical shape transformation which yields a more cartographic
        result. E.g. \titlecite{jiang2003line},
        \titlecite{dyken2009simultaneous}, \titlecite{mustafa2006dynamic},
        \titlecite{nollenburg2008morphing}.

\end{itemize}

Authors of most of the aforementioned articles have implemented the
generalization algorithm, at least to generate the illustrations in the
articles. However, code is not available for evaluation with a desired data
set, much less for use as a basis for creating new maps. To the author's
knowledge, {\WM}\cite{wang1998line} is available in a commercial product, but
requires purchasing the whole product suite, without a way to license the
standalone algorithm.

The lack of robust, openly available generalization algorithm implementations
poses a problem for map creation with free software: there is no comparable
high-quality simplification algorithm to create down-scaled maps, so any
cartographic work which uses line generalization as part of its processing
will be of sub-par quality. We believe that the availability of high-quality
open-source tools is an important foundation for future cartographic
experimentation and development, and thus benefits the cartographic community
as a whole.

{\WM}'s commercial availability signals something about the value of the
algorithm: at least the authors of the commercial software suite deemed it
worthwhile to include it. However, not everyone has access to the commercial
software suite, access to funds to buy the commercial suite, or access to the
operating system required to run the commercial suite. PostGIS, in contrast, is
itself free, and runs on free platforms. Therefore, algorithm
implementations that run on PostGIS or other free platforms are useful to a
wider cartographic community than proprietary ones.

\subsection{Problems with generalization of rivers}

\section{Methodology}
\label{sec:methodology}

The original {\WM} algorithm\cite{wang1998line} leaves something to be
desired for a practical implementation: it is not straightforward to implement
the algorithm from the paper alone.

Explanations in this document are meant to expand, rather than substitute, the
original description in {\WM}. Therefore familiarity with the original paper is
assumed, and, for some sections, having the original close by is necessary to
meaningfully follow this document.

This paper describes {\WM} at a level of detail that is useful for anyone who
wishes to follow the algorithm implementation closely: each section is
expanded with additional commentary and richer illustrations for non-obvious
steps. In many cases, corner cases are discussed and clarified.

Assume Euclidean geometry throughout this document, unless noted otherwise.

\subsection{Vocabulary and terminology}
\label{sec:vocab}

This section defines vocabulary and terms as used in the rest of the paper.

\begin{description}

    \item[Vertex] is a point on a plane; it can be expressed by a pair of
        $(x,y)$ coordinates.

    \item[Line Segment] or \textsc{segment} joins two vertices by a straight
        line. A segment can be expressed by two coordinate pairs: $(x_1, y_1)$
        and $(x_2, y_2)$. Line Segment and Segment are used interchangeably
        throughout the paper.

    \item[Line] or \textsc{linestring}, represents a single linear feature in
        the real world, for example, a river or a coastline.

        Geometrically, a line is a series of connected line segments, or,
        equivalently, a series of connected vertices. Each vertex connects to
        two other vertices, except the vertices at either end of the line:
        these two connect to a single other vertex.

    \item[Bend] is a subset of a line that humans perceive as a curve. The
        geometric definition is complex and is discussed in
        section~\ref{sec:definition-of-a-bend}.

    \item[Baseline] is a line between the bend's first and last vertices.

    \item[Sum of inner angles] TBD.

    \item[Algorithmic Complexity] also called \textsc{big o notation}, is a
        relative measure of how long an algorithm runs depending on the size
        of its input. It is widely used in computing science when discussing
        the efficiency of a given algorithm.

        For example, given $n$ objects and time complexity of $O(\log n)$, the
        time it takes to execute the algorithm grows logarithmically with $n$.
        Conversely, if complexity is $O(n^2)$, then the time it takes to
        execute the algorithm grows quadratically with the input size: if the
        input size doubles, the time it takes to run the algorithm quadruples.

        $O$ notation was first suggested by
        Bachmann\cite{bachmann1894analytische} and
        Landau\cite{landau1911} in the late 19th and early 20th centuries, and
        clarified and popularized for computing science by Donald
        Knuth\cite{knuth1976big} in the 1970s.

\end{description}
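
As a worked illustration of the complexity examples above (a sketch under the
simplifying assumption that the running time is exactly $T(n) = c n^2$ for
some constant $c$, which real implementations only approximate), doubling the
input size gives
\[
    \frac{T(2n)}{T(n)} = \frac{c (2n)^2}{c n^2} = 4,
\]
whereas for an assumed logarithmic running time $T(n) = c \log_2 n$, doubling
the input adds only a constant amount of time:
\[
    T(2n) - T(n) = c \log_2 (2n) - c \log_2 n = c.
\]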

\subsection{Automated tests}
\label{sec:automated-tests}

As part of the algorithm realization, an automated test suite has been
developed. Shapes to test each function have been hand-crafted and expected
results have been manually calculated. The test suite executes parts of the
algorithm against a predefined set of geometries, and asserts that the output
matches the expected hand-calculated geometry.

The full set of test geometries is visualized in
figure~\ref{fig:test-figures}.

\begin{figure}[h]
\centering
\includegraphics[width=\textwidth]{test-figures}
\caption{Geometries for automated test cases.}
\label{fig:test-figures}
\end{figure}
The full test suite can be executed with a single command, and completes in a
few seconds. Having an easily accessible test suite boosts confidence that no
unexpected bugs have snuck in while modifying the algorithm.

\subsection{Reproducing generalizations in this paper}
\label{sec:reproducing-the-paper}

It is widely believed that the ability to reproduce the results of a published
study is important to the scientific community. In practice, however, it is
often hard or impossible: research methodologies, as well as algorithms
themselves, are explained in prose, which, due to the nature of the non-machine
language, lends itself to inexact interpretations.

This article, besides explaining the algorithm in prose, \emph{includes} the
program of the algorithm in a way that can be executed on the reader's
workstation. On top of that, all the illustrations in this paper are generated
using that algorithm, from a predefined list of test geometries (test
geometries were explained in section~\ref{sec:automated-tests}).

Instructions on how to re-generate all the visualizations are found in
appendix~\ref{sec:code-regenerate}. The visualization code serves as a good
example reference for anyone willing to start using the algorithm.

\section{Description of the implementation}

As alluded to in section~\ref{sec:methodology}, the {\WM} paper skims over
certain details, which are important to implement the algorithm. This section
goes through each algorithm stage, illustrating the intermediate steps and
explaining the author's desiderata for a more detailed description.

Illustrations of the following sections are extracted from the automated test
cases, which were written during the algorithm implementation (as discussed in
section~\onpage{sec:automated-tests}).

Illustrated lines are black. Bends themselves are linear features.
Discriminating between bends in illustrations might be tricky, because
sometimes a single \textsc{line segment} can belong to two bends.

Given that, there is another way to highlight bends in a schematic drawing: by
converting them to polygons and by altering their background colors. It works
as follows:
\begin{itemize}
    \item Join the first and last vertices of the bend, creating a polygon.
    \item Color the polygons using distinct colors.
\end{itemize}
This type of illustration works quite well, since polygons created from bends
almost never overlap, and discriminating different backgrounds is
easier than discriminating different line shapes or colors.
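
The conversion of a bend into a polygon can be sketched as follows. This is a
minimal Python sketch for illustration only, not the code that generates the
figures: the function names \texttt{bend\_to\_polygon} and
\texttt{polygon\_area}, and the representation of a bend as a list of
coordinate tuples, are assumptions. The area helper is an addition that uses
the standard shoelace formula; it is included because bend polygons are also
convenient for measuring the size of a bend.

\begin{Verbatim}[fontsize=\small]
def bend_to_polygon(bend):
    # Close the bend by joining its last vertex back to its first one.
    return list(bend) + [bend[0]]


def polygon_area(ring):
    # Shoelace formula; ring is a closed list of (x, y) vertices.
    twice_area = sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(twice_area) / 2
\end{Verbatim}

For a triangular bend with a baseline of 2 units and a height of 1 unit, the
resulting polygon has an area of 1 square unit.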

\subsection{Debugging}

NOTE: this will explain how intermediate debugging tables (\texttt{wm\_debug})
work. This is not related to the algorithm, but only to the implementation
itself (probably should come together with paper's regeneration and unit
tests).

\subsection{Merging pieces of the river into one}

NOTE: explain how different river segments are merged into a single line. This
is not explained in the {\WM} paper, but is a necessary prerequisite. This is
implemented in \texttt{aggregate-rivers.sql}.

\subsection{Bend scaling and dimensions}
\label{sec:bend-scaling-and-dimensions}
{\WM} accepts a single input parameter: the diameter of a half-circle. If the
bend's adjusted size (explained in detail in
section~\onpage{sec:shape-of-a-bend}) is greater than the area of the
half-circle, then the bend will be left untouched. If the bend's adjusted size
is smaller than the area of the provided half-circle, the bend will be
simplified: either exaggerated, combined or eliminated.
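
The threshold in this comparison can be written out explicitly. As a sketch,
assuming the bend's adjusted size is compared against the plain area of a
half-circle with diameter $D$, expressed in squared map units:
\[
    A_{\mathrm{half}} = \frac{1}{2} \pi \left(\frac{D}{2}\right)^2
                      = \frac{\pi D^2}{8}.
\]
For example, $D = 37.5$ meters gives a threshold of approximately 552 square
meters.
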
The half-circle's diameter depends on the desired scale of the target map: it
should be small enough to retain small but visible bends.

The extent of line simplification depends on the desired target scale.
Simplification should be more aggressive for smaller target scales, and
less aggressive for larger scales. This section goes through the process
of finding the correct value of the parameter for the {\WM} algorithm.

What is the minimal, but still legible figure that should be displayed on
the map?

According to \titlecite{cartoucheMinimalDimensions}, the map is typically held
at a distance of 30cm. The recommended minimum symbol size, given a viewing
distance of 45cm (1.5 feet), is 1.5mm, as analyzed in
\titlecite{mappingunits}.

In our case, the target is a line bend, rather than a symbol. Assume 1.5mm is
the diameter of the bend. A semi-circle of 1.5mm diameter is depicted in
figure~\ref{fig:half-circle}. In other words, a bend of this size or larger,
when adjusted to scale, will not be generalized.
\begin{figure}[h]
\centering
\begin{tikzpicture}[x=1mm,y=1mm]
\draw[] (-10, 0) -- (-.75,0) arc (180:0:.75) -- (10, 0);
\end{tikzpicture}
\caption{Smallest feature that will not be generalized (to scale).}
\label{fig:half-circle}
\end{figure}
The {\WM} algorithm does not have a notion of scale, but it does have a notion
of distance: it accepts a single parameter $D$, the half-circle's diameter.
Assuming the measurement units of the projected coordinate system are meters
(for example, \titlecite{epsg3857}), the conversion is given in
table~\ref{table:scale-halfcirlce-diameter}.
\newcolumntype{d}[1]{D{.}{.}{#1} }
\begin{table}[h]
\centering
\begin{tabular}{|c @{:}r | D{.}{.}{1} |}
\hline
\multicolumn{2}{|c|}{Scale} & \multicolumn{1}{c|}{$D$ (m)} \\ \hline
1 & 10000 & 15 \\ \hline
1 & 15000 & 22.5 \\ \hline
1 & 25000 & 37.5 \\ \hline
1 & 50000 & 75 \\ \hline
1 & 250000 & 375 \\ \hline
\end{tabular}
\caption{{\WM} parameter $D$ (half-circle diameter) for popular scales.}
\label{table:scale-halfcirlce-diameter}
\end{table}
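
The conversion from scale to $D$ can be written out explicitly. As a sketch,
assuming the 1.5mm minimum bend diameter discussed above, a target scale of
$1:s$, and map units in meters:
\[
    D = \frac{1.5\,\mathrm{mm} \times s}{1000\,\mathrm{mm}/\mathrm{m}},
\]
so that, for example, a target scale of 1:25000 yields
\[
    D = \frac{1.5 \times 25000}{1000}\,\mathrm{m} = 37.5\,\mathrm{m}.
\]
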
\subsection{Definition of a Bend}
\label{sec:definition-of-a-bend}

The original article describes a bend as:

\begin{displaycquote}{wang1998line}
A bend can be defined as that part of a line which contains a number of
subsequent vertices, with the inflection angles on all vertices included in
the bend being either positive or negative and the inflection of the bend's
two end vertices being in opposite signs.
\end{displaycquote}

While it gives a good intuitive understanding of what a bend is, this section
provides more technical details. Here are some non-obvious characteristics that
are necessary when writing code to detect the bends:
\begin{itemize}
    \item End segments of each line should also belong to bends. That way, all
        segments belong to 1 or 2 bends.
    \item The last segment of each bend is also the first segment of the next
        bend (except for the two segments at the ends of the line).
\end{itemize}
The properties above may be apparent from the illustrations in this article,
but they are far from obvious when reading the original article.
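
The two properties above can be illustrated with a short Python sketch. It is
not the implementation described in this paper: the function names
\texttt{turn\_sign} and \texttt{detect\_bends} are hypothetical, the sign of
the inflection angle is assumed to be computed from the cross product of two
consecutive segments, and collinear vertices are assumed to stay in the
current bend.

\begin{Verbatim}[fontsize=\small]
def turn_sign(a, b, c):
    # Sign of the cross product of segments a->b and b->c:
    # +1 for a left turn, -1 for a right turn, 0 if collinear.
    cross = ((b[0] - a[0]) * (c[1] - b[1])
             - (b[1] - a[1]) * (c[0] - b[0]))
    return (cross > 0) - (cross < 0)


def detect_bends(line):
    # Split a line (list of (x, y) vertices) into bends. Neighboring
    # bends share one segment: the last segment of a bend is also the
    # first segment of the next bend.
    if len(line) < 3:
        return [list(line)]
    bends = []
    start = 0    # index of the first vertex of the current bend
    current = 0  # turn sign of the current bend (0: not yet known)
    for i in range(1, len(line) - 1):
        sign = turn_sign(line[i - 1], line[i], line[i + 1])
        if sign == 0:
            continue  # assumption: collinear vertices stay in the bend
        if current != 0 and sign != current:
            # Vertex i turns the other way: the bend ends with segment
            # (i-1, i), and the next bend starts with that same segment.
            bends.append(line[start:i + 1])
            start = i - 1
        current = sign
    bends.append(line[start:])
    return bends
\end{Verbatim}

With this splitting, the two end segments of the line belong to exactly one
bend, and every segment shared by two neighboring bends belongs to two.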

Figure~\ref{fig:fig8-definition-of-a-bend} illustrates the original article's
figure 8, but with bends colored as polygons: each color is a distinct bend.
\begin{figure}[h]
\centering
\includegraphics[width=\textwidth]{fig8-definition-of-a-bend}
\caption{Originally figure 8: detected bends are highlighted.}
\label{fig:fig8-definition-of-a-bend}
\end{figure}
\subsection{Gentle Inflection at End of a Bend}

The gist of the section is in the original article:

\begin{displaycquote}{wang1998line}
But if the inflection that marks the end of a bend is quite small, people
would not recognize this as the bend point of a bend
\end{displaycquote}

Figure~\ref{fig:fig5-gentle-inflection} visualizes the original paper's
figure 5, where a single vertex is moved outwards at the end of the bend.
\begin{figure}[h]
\centering
\begin{subfigure}[b]{.49\textwidth}
\includegraphics[width=\textwidth]{fig5-gentle-inflection-before}
\caption{Before applying the inflection rule.}
\end{subfigure}
\hfill
\begin{subfigure}[b]{.49\textwidth}
\includegraphics[width=\textwidth]{fig5-gentle-inflection-after}
\caption{After applying the inflection rule.}
\end{subfigure}
\caption{Originally figure 5: gentle inflections at the ends of the bend.}
\label{fig:fig5-gentle-inflection}
\end{figure}
2021-04-12 16:58:44 +03:00
The illustration for this section was clear, but insufficient: it does not
specify how many vertices should be included when calculating the end-of-bend
2021-04-24 17:56:13 +03:00
inflection. The iterative approach was chosen --- as long as the angle is "right"
2021-04-12 14:14:35 +03:00
and the distance is decreasing, the algorithm should keep re-assigning vertices
to different bends; practically not having an upper bound on the number of
iterations.
2021-04-12 10:10:39 +03:00
2021-04-12 16:58:44 +03:00
To verify that the implementation handles multiple vertices correctly, an
additional example was created and is illustrated in
figure~\ref{fig:inflection-1-gentle-inflection}: the rule re-assigns two
vertices to the next bend.

\begin{figure}[h]
  \centering
  \begin{subfigure}[b]{.49\textwidth}
    \includegraphics[width=\textwidth]{inflection-1-gentle-inflection-before}
    \caption{Before applying the inflection rule.}
  \end{subfigure}
  \hfill
  \begin{subfigure}[b]{.49\textwidth}
    \includegraphics[width=\textwidth]{inflection-1-gentle-inflection-after}
    \caption{After applying the inflection rule.}
  \end{subfigure}
  \caption{Gentle inflection at the end of the bend when multiple vertices
    are moved.}
  \label{fig:inflection-1-gentle-inflection}
\end{figure}

Note that to find and fix gentle inflections, the algorithm must run twice,
once in each direction. If it is executed in only one direction, it will miss
some bends that should be adjusted. The current implementation works as
follows:
\begin{enumerate}
  \item Run the algorithm from the beginning to the end.
  \item \label{rev1} Reverse the line and each bend.
  \item Run the algorithm again.
  \item \label{rev2} Reverse the line and each bend.
  \item Return the result.
\end{enumerate}

Reversing the line and its bends is straightforward to implement, but costly:
the two reversal steps take additional time and memory. The algorithm could be
made more efficient by adding a variant that traverses the line backwards.
Steps \ref{rev1} and \ref{rev2} could then be skipped, saving memory and
computation time.

The "quite small angle" was arbitrarily chosen to be $\smallAngle$.

\subsection{Self-line Crossing When Cutting a Bend}
When a bend's baseline crosses another bend, it is called self-crossing.
Self-crossings are undesirable for the upcoming bend manipulation operators and
should therefore be removed. There are a few rules on when and how they should
be removed; this section explains them in more detail and discusses their time
complexity and the optimizations applied. Figure~\ref{fig:fig6-selfcrossing} is
copied from the original article.
\begin{figure}[h]
  \centering
  \begin{subfigure}[b]{.49\textwidth}
    \includegraphics[width=\textwidth]{fig6-selfcrossing-before}
    \caption{Bend's baseline (dotted) is crossing a neighboring bend.}
  \end{subfigure}
  \hfill
  \begin{subfigure}[b]{.49\textwidth}
    \includegraphics[width=\textwidth]{fig6-selfcrossing-after}
    \caption{Self-crossing removed.}
  \end{subfigure}
  \caption{Originally figure 6: simple case of self-line crossing.}
  \label{fig:fig6-selfcrossing}
\end{figure}

\begin{figure}[h]
  \centering
  \begin{subfigure}[b]{.49\textwidth}
    \includegraphics[width=\textwidth]{selfcrossing-1-before}
    \caption{Bend's baseline (dotted) is crossing a non-neighboring bend.}
  \end{subfigure}
  \hfill
  \begin{subfigure}[b]{.49\textwidth}
    \includegraphics[width=\textwidth]{selfcrossing-1-after}
    \caption{Self-crossing removed.}
  \end{subfigure}
  \caption{Self-crossing with non-neighboring bend.}
  \label{fig:selfcrossing-1-non-neighbor}
\end{figure}

Looking at the {\WM} paper alone, it may seem that self-crossing can happen
only with a neighboring bend. This would allow an efficient $O(n)$
implementation\footnote{Where $n$ is the number of bends in a line. See the
explanation of \textsc{algorithmic complexity} in section~\ref{sec:vocab}.}.
However, as figure~\ref{fig:selfcrossing-1-non-neighbor} shows, this is not the
case: any other bend in the line may cross the baseline.
Translated to code in a straightforward way, this requirement is
computationally expensive: naively implemented, checking every bend against
every other bend has complexity $O(n^2)$. In other words, the time it takes to
run the algorithm grows quadratically with the number of bends.

It is possible to optimize this step and skip checking a large number of bends.
Only bends whose sum of inner angles is larger than $180^\circ$ can ever
self-cross, so only a fraction of bends needs to be checked. The worst-case
complexity is still $O(n^2)$, which occurs when the sum of inner angles of
every bend is larger than $180^\circ$. With this optimization, the number of
checks (and, as a result, the time it takes to execute the algorithm) drops by
the fraction of bends whose sum of inner angles is smaller than $180^\circ$.
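
The saving can be estimated with a rough counting model. The model is an
assumption made here for illustration and is not taken from the original
article: suppose the baseline of each candidate bend is tested against every
other bend in the line, and let $k$ be the number of bends whose sum of inner
angles is larger than $180^\circ$. The number of bend-to-bend checks is then:
\[
  T_{\mathit{naive}} = n(n-1), \qquad
  T_{\mathit{opt}} = k(n-1), \qquad
  \frac{T_{\mathit{opt}}}{T_{\mathit{naive}}} = \frac{k}{n}
\]
For a hypothetical line with $n = 1000$ bends, of which $k = 50$ qualify, this
gives $999\,000$ checks without the optimization and $49\,950$ with it.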
\subsection{Attributes of a Single Bend}
\textsc{Compactness Index} is "the ratio of the area of the polygon over the
circle whose circumference length is the same as the length of the
circumference of the polygon" \cite{wang1998line}. Given a bend, its
compactness index is calculated as follows:
\begin{enumerate}
  \item Construct a polygon by joining the first and last vertices of the bend.

  \item Calculate the area $A_p$ of the polygon.

  \item Calculate the perimeter $P$ of the polygon. The circle has the same
    circumference: $C = P$.

  \item Given the circle's circumference $C$, the circle's area $A_c$ is:
    \[
      A_{c} = \frac{C^2}{4\pi}
    \]

  \item The compactness index is $\frac{A_p}{A_c}$:
    \[
      cmp = \frac{A_p}{A_c} =
            \frac{A_p}{ \frac{C^2}{4\pi} } =
            \frac{4\pi A_p}{C^2}
    \]
\end{enumerate}
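
To illustrate the steps above, consider a hypothetical bend that approximates
a half-circle of radius $r$. This example is chosen here for convenience and
does not come from the original article. Joining the first and last vertices
closes the polygon along the diameter, so:
\[
  A_p = \frac{\pi r^2}{2}, \qquad C = P = \pi r + 2r
\]
\[
  cmp = \frac{4\pi A_p}{C^2} =
        \frac{2\pi^2 r^2}{r^2 (\pi + 2)^2} =
        \frac{2\pi^2}{(\pi + 2)^2} \approx 0.747
\]
The result does not depend on $r$: the compactness index describes the shape of
a bend, not its size. For comparison, a polygon approximating a full circle
approaches $cmp = 1$, which is the upper bound.
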
Once this section is implemented, each bend has a list of properties upon
which actions are later performed.

\subsection{Shape of a Bend}
\label{sec:shape-of-a-bend}
This section introduces \textsc{adjusted size}, which is derived directly from
the \textsc{compactness index} $cmp$ and the shape's area $A$:
\[
  adjsize = \frac{0.75 A}{cmp}
\]
Adjusted size is needed later to compare bends with each other and to find
similar ones.
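
Substituting the definition of the compactness index makes the formula easier
to interpret. The short derivation below assumes that the shape's area $A$ is
the same polygon area $A_p$ that is used in $cmp = \frac{4\pi A_p}{C^2}$, where
$C$ is the perimeter of the polygon:
\[
  adjsize = \frac{0.75 A_p}{cmp} =
            0.75 A_p \cdot \frac{C^2}{4\pi A_p} =
            \frac{3 C^2}{16\pi}
\]
Under that assumption, adjusted size equals $0.75$ of the area of the circle
whose circumference equals the polygon's perimeter. For a hypothetical
half-circle bend of radius $r = 1$, the perimeter is $C = \pi + 2$ and
$adjsize \approx 1.578$.
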
\subsection{Isolated Bend}

A bend and its "isolation" can be described by \textsc{average curvature},
which is \textcquote{wang1998line}{geometrically defined as the ratio of
inflection over the length of a curve.}
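
One possible way to write this definition as a formula is sketched below. The
symbols are introduced here for illustration only; the original article does
not use them. If $\alpha_1, \dots, \alpha_m$ are the turning angles (in
radians) at the vertices of the bend and $L$ is the length of the bend, then:
\[
  c_{\mathit{avg}} = \frac{\sum_{i=1}^{m} |\alpha_i|}{L}
\]
For example, a half-circle bend of radius $r$ turns by $\pi$ in total over a
length of $\pi r$, giving $c_{\mathit{avg}} = \frac{\pi}{\pi r} = \frac{1}{r}$:
the tighter the bend, the larger its average curvature.
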
Two conditions must be true to claim that a bend is isolated:
\begin{enumerate}
  \item \textsc{average curvature} of the neighboring bends should be larger
    than the "candidate" bend's curvature. The article does not offer a
    value; this implementation arbitrarily chose $\isolationThreshold$.

  \item Bends on both sides of the "candidate" should be longer than a
    certain value. This implementation does not (yet) define such a
    constraint and only follows the average curvature constraint above.
\end{enumerate}

\subsection{The Context of a Bend: Isolated and Similar Bends}
To find out whether two bends are similar, they are compared by three
components:
\begin{enumerate}
  \item \textsc{adjusted size}
  \item \textsc{compactness index}
  \item Baseline length
\end{enumerate}
These three components represent each bend as a point in 3-dimensional space.
The Euclidean distance $d$ between two such points is calculated to
differentiate between bends $p$ and $q$:
\[
  d(p,q) = \sqrt{(adjsize_p-adjsize_q)^2 +
                 (cmp_p-cmp_q)^2 +
                 (baseline_p-baseline_q)^2}
\]

The smaller the distance $d$, the more similar the bends are.
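
As a worked example with made-up values (not measured from any real line),
take bend $p$ with $adjsize_p = 1.5$, $cmp_p = 0.75$, $baseline_p = 2.0$ and
bend $q$ with $adjsize_q = 1.2$, $cmp_q = 0.75$, $baseline_q = 1.6$:
\[
  d(p,q) = \sqrt{(1.5 - 1.2)^2 + (0.75 - 0.75)^2 + (2.0 - 1.6)^2}
         = \sqrt{0.09 + 0 + 0.16}
         = 0.5
\]
Note that the compactness index never exceeds $1$, while adjusted size and
baseline length grow with the size of the bend, so for large bends the latter
two components are likely to dominate $d$.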
\subsection{Elimination Operator}

NOTE: not implemented.

\subsection{Combination Operator}

NOTE: not implemented.

\subsection{Exaggeration Operator}

NOTE: not implemented.

\section{Program Implementation}

NOTE: this should provide a higher-level overview of the written code:
\begin{itemize}
  \item State machine (which functions call when).
  \item Algorithmic complexity.
  \item Expected runtime given the number of bends/vertices, some performance
    experiments.
\end{itemize}

\section{Results of Experiments}

NOTE: this can only be filled after the algorithm implementation is complete.

\section{Conclusions}
\label{sec:conclusions}

NOTE: write when all the sections before this are complete.

\section{Related Work and Future Suggestions}
\label{sec:related_work}

NOTE: write after section~\ref{sec:conclusions} is complete.

\printbibliography
\begin{appendices}

\section{Code listings}

This section contains code listings of a subset of files tightly related to the
{\WM} algorithm.

\subsection{Re-generating this paper}
\label{sec:code-regenerate}

As explained in section~\ref{sec:reproducing-the-paper}, the illustrations in
this paper are generated from a small list of sample geometries. To inspect
the source geometries or regenerate this paper, run this script (assuming the
name of this document is {\tt mj-msc-full.pdf}):

\inputcode{bash}{extract-and-generate}

\subsection{Function \texttt{ST\_SimplifyWV}}
\inputcode{postgresql}{wm.sql}

\subsection{Function \texttt{aggregate\_rivers}}
\inputcode{postgresql}{aggregate-rivers.sql}

\end{appendices}
\end{document}