\documentclass[a4paper]{article}
\usepackage[T1]{fontenc}
\usepackage[english]{babel}
\usepackage[utf8]{inputenc}
\usepackage{a4wide}
\usepackage[autostyle, english=american]{csquotes}
\MakeOuterQuote{"}
\usepackage[maxbibnames=99,style=numeric,sorting=none]{biblatex}
\addbibresource{bib.bib}
\usepackage[pdfusetitle]{hyperref}
\usepackage{enumitem}
\usepackage[toc,page,title]{appendix}
\usepackage{caption}
\usepackage{subcaption}
\usepackage{gensymb}
\usepackage{units}
\usepackage{varwidth}
\usepackage{tabularx}
\usepackage{float}
\usepackage{tikz}
\usepackage{minted}
\usepackage{fancyvrb}
\input{version.inc}
\input{vars.inc}
\newcommand{\onpage}[1]{\ref{#1} on page~\pageref{#1}}
\newcommand{\titlecite}[1]{\citetitle{#1} \cite{#1}}
\newcommand{\DP}{Douglas \& Peucker}
\newcommand{\VW}{Visvalingam--Whyatt}
\newcommand{\WM}{Wang--M{\"u}ller}
\newcommand{\MYTITLE}{Cartographic Generalization of Lines using free software (example of rivers)}
\newcommand{\MYAUTHOR}{Motiejus Jakštys}
\title{\MYTITLE}
\author{\MYAUTHOR}
\date{\VCDescribe}
\begin{document}
\begin{titlepage}
\begin{center}
\includegraphics[width=0.4\textwidth]{vu}
\huge
\textbf{\MYTITLE} \\[4ex]
\LARGE
\textbf{\MYAUTHOR} \\[8ex]
\vfill
A thesis presented for the degree of\\
Master in Cartography \\[3ex]
\large
\VCDescribe
\end{center}
\end{titlepage}
\begin{abstract}
\label{sec:abstract}
Current open-source line generalization solutions have their roots in
mathematics and geometry, and are not fit for natural objects like rivers
and coastlines. This paper discusses our implementation of the {\WM} algorithm
under an open-source license, explains details that we would have
appreciated in the original paper, and compares our results to other
generalization algorithms.
\end{abstract}
\newpage
\tableofcontents
\listoffigures
\newpage
\section{Introduction}
\label{sec:introduction}
When creating small-scale maps, often the detail of the data source is greater
than desired for the map. This becomes especially acute for natural features
that have many bends, like coastlines, rivers and forest boundaries.
To create a small-scale map from a large-scale data source, these features need
to be generalized: detail should be reduced. However, while doing so, it is
important to preserve the "defining" shape of the original feature, otherwise
the result will look unrealistic.
For example, a river that is nearly straight should remain nearly straight
after generalization, while a river that is straightened too much will look
like a canal. Conversely, if the river is highly wiggly, the number of bends
should be reduced, but the bends should not be removed altogether.
The generalization problem for other objects can often be solved by other,
non-geometric means:
\begin{itemize}
\item Towns and cities can be filtered and generalized by number of
inhabitants.
\item Roads can be eliminated by the road length, number of lanes, or
classification of the road (local, regional, international).
\end{itemize}
The natural line generalization problem can be viewed as having two competing
goals:
\begin{itemize}
\item Reduce detail by removing or simplifying "less important" features.
\item Retain enough detail, so the original is still recognizable.
\end{itemize}
Given the discussed complexities, a fine line between under-generalization
(leaving the object as-is) and over-generalization (making a straight line) must be
found. Therein lies the complexity of generalization algorithms: all have
different trade-offs.
\section{Literature review}
\label{sec:literature-review}
A number of cartographic line generalization algorithms have been researched.
The "classical" ones are {\DP} and {\VW}.
\subsection{{\DP} and {\VW}}
{\DP} \cite{douglas1973algorithms} and {\VW} \cite{visvalingam1993line} are
"classical" line generalization computer graphics algorithms. They are
relatively simple to implement and require few runtime resources. Both of them
accept only a single parameter, based on the desired scale of the map, which
makes them very simple to adjust for different scales.
Both algorithms are part of PostGIS, a free-software GIS suite:
\begin{itemize}
\item {\DP} via
\href{https://postgis.net/docs/ST_Simplify.html}{PostGIS Simplify}.
\item {\VW} via
\href{https://postgis.net/docs/ST_SimplifyVW.html}{PostGIS SimplifyVW}.
\end{itemize}
Since both algorithms produce jagged output lines, it is worthwhile to process
their output with the widely available Chaikin's line smoothing
algorithm \cite{chaikin1974algorithm} via
\href{https://postgis.net/docs/ST_ChaikinSmoothing.html}{PostGIS
ChaikinSmoothing}.
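Chaikin's corner-cutting scheme is short enough to sketch in a few lines. The
following Python sketch is only an illustration and is not the PostGIS
implementation; the function name {\tt chaikin} and the decision to keep both
end points of the line fixed are assumptions made for this example. Each
segment is replaced by two points, at one quarter and at three quarters of its
length:
\begin{minted}[fontsize=\small]{python}
def chaikin(line, iterations=1):
    # line: list of (x, y) vertices of an open line.
    # Assumption of this sketch: the two end points are kept fixed.
    pts = list(line)
    for _ in range(iterations):
        if len(pts) < 3:
            break  # nothing to smooth without an interior vertex
        smoothed = [pts[0]]
        for (x1, y1), (x2, y2) in zip(pts, pts[1:]):
            # Cut the corner: points at 1/4 and 3/4 of each segment.
            smoothed.append((0.75 * x1 + 0.25 * x2, 0.75 * y1 + 0.25 * y2))
            smoothed.append((0.25 * x1 + 0.75 * x2, 0.25 * y1 + 0.75 * y2))
        smoothed.append(pts[-1])
        pts = smoothed
    return pts
\end{minted}
Applied once to a right-angle corner, the sketch replaces the sharp vertex
with two vertices on the adjacent segments; more iterations yield a smoother,
but denser, line.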
Even though {\DP} and {\VW} are simple to understand and computationally
efficient, they have serious deficiencies for cartographic natural line
generalization.
<TODO: expand on deficiencies>
\subsection{Modern approaches}
Due to their simplicity and ubiquity, {\DP} and {\VW} have been established as
go-to algorithms for line generalization. During recent years, alternatives
have emerged. These modern replacements fall into roughly two categories:
\begin{itemize}
\item Cartographic knowledge encoded into an algorithm (bottom-up
approach). One of these is \titlecite{wang1998line}, also known
as the {\WM} algorithm.
\item Mathematical shape transformation which yields a more cartographic
result. E.g. \titlecite{jiang2003line},
\titlecite{dyken2009simultaneous}, \titlecite{mustafa2006dynamic},
\titlecite{nollenburg2008morphing}.
\end{itemize}
The authors of most of the aforementioned articles have implemented the
generalization algorithm, at least to generate the visuals in the articles.
However, we were unable to find code for any of them to evaluate with our
desired data set, or to use as a basis for our own maps. {\WM} \cite{wang1998line}
is available in a commercial product.
The lack of robust, openly available generalization algorithm implementations
poses a problem for map creation with free software: there is no comparably
high-quality simplification algorithm to create down-scaled maps, so any
cartographic work that uses line generalization as part of its processing
will be of sub-par quality. We believe that the availability of high-quality
open-source tools is an important foundation for future cartographic
experimentation and development, and thus benefits the cartographic community
as a whole.
<TODO: expand on each paper>
\section{Methodology}
\label{sec:methodology}
The original {\WM}'s algorithm \cite{wang1998line} leaves something to be
desired for a practical implementation: it is not straightforward to implement
the algorithm from the paper alone.
Explanations in this document are meant to expand, rather than substitute, the
original description in {\WM}. Therefore familiarity with the original paper is
assumed, and, for some sections, having it close by is necessary to
meaningfully follow this document.
In this paper we describe {\WM} at a level of detail that is more useful for
implementing the algorithm: each section is expanded, with more elaborate and
exact illustrations for every step of the algorithm.
Algorithms discussed in this paper assume Euclidean geometry.
\subsection{Vocabulary and terminology}
This section defines vocabulary and terms as used in the rest of the paper.
\begin{description}
\item[Vertex] is a point on a plane; it can be expressed by a pair of $(x,y)$
coordinates.
\item[Line Segment (or Segment)] joins two vertices by a straight line. A
segment can be expressed by two coordinate pairs: $(x_1, y_1)$ and
$(x_2, y_2)$. Line Segment and Segment are used interchangeably
throughout the paper.
\item[Line] represents a single linear feature in the real world, for
example, a river or a coastline ({\tt LINESTRING} in GIS terms).
Geometrically, a line is a series of connected line segments, or,
equivalently, a series of connected vertices. Each vertex connects to
two other vertices, except the vertices at either end of the line:
these two connect to a single other vertex.
\item[Bend] is a subset of a line that humans perceive as a curve. The
geometric definition is complex and is discussed in
section~\onpage{sec:definition-of-a-bend}.
\item[Baseline] is a line between a bend's first and last vertex.
\item[Sum of inner angles] TBD.
\end{description}
\subsection{Radians and Degrees}
This document contains a few constant angles expressed in radians.
Table~\ref{table:radians} summarizes some of the values used in this document
and the implementation.
\begin{table}[h]
\centering
\begin{tabular}{|c|c|c|c|c|c|}
\hline
Degrees & $30^\circ$ & $45^\circ$ & $90^\circ$ & $180^\circ$ & $360^\circ$ \\
\hline
Radians & $\nicefrac{\pi}{6}$ & $\nicefrac{\pi}{4}$ & $\nicefrac{\pi}{2}$ & $\pi$ & $2\pi$ \\
\hline
\end{tabular}
\caption{Popular degree and radian values}
\label{table:radians}
\end{table}
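For values not listed in the table, the usual conversion applies. The symbols
$\alpha_{\mathrm{rad}}$ and $\alpha_{\mathrm{deg}}$ are introduced here only
for illustration:
\[
\alpha_{\mathrm{rad}} = \alpha_{\mathrm{deg}} \cdot \frac{\pi}{180^\circ}
\]
For example, $30^\circ \cdot \nicefrac{\pi}{180^\circ} = \nicefrac{\pi}{6}
\approx 0.524$.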
\subsection{Automated tests}
\label{sec:automated-tests}
As part of the algorithm realization, an automated test suite has been
developed. Shapes to test each function have been hand-crafted and expected
results have been manually calculated. The test suite executes parts of the
algorithm against a predefined set of geometries, and asserts that the output
matches the resulting hand-calculated geometry.
The full set of test geometries is visualized in
figure~\onpage{fig:test-figures}. The figure includes arrows depicting line
direction.
\begin{figure}[h]
\centering
\includegraphics[width=\linewidth]{test-figures}
\caption{Line geometries for automated test cases}
\label{fig:test-figures}
\end{figure}
The full test suite can be executed with a single command, and completes in a
few seconds. Having an easily accessible test suite boosts confidence that no
unexpected bugs have crept in while modifying the algorithm.
\section{Description of the implementation}
As alluded to in section~\onpage{sec:methodology}, the {\WM} paper skims over
certain details that are important for implementing the algorithm. This section
goes through each algorithm stage, illustrating the intermediate steps and
explaining the author's desiderata for a more detailed description.
Illustrations of the following sections are extracted from the automated test
cases, which were written during the algorithm implementation (as discussed in
section~\onpage{sec:automated-tests}).
Lines in illustrations are black, and bends are heavily colored after
converting them to polygons. Bends are converted to polygons (for illustration
purposes) using the following algorithm:
\begin{itemize}
\item Join the first and last vertices of the bend, creating a polygon.
\item Color the polygons using distinct colors.
\end{itemize}
\subsection{Definition of a Bend}
\label{sec:definition-of-a-bend}
The original article describes a bend as:
\begin{displaycquote}{wang1998line}
A bend can be defined as that part of a line which contains a number of
subsequent vertices, with the inflection angles on all vertices included in
the bend being either positive or negative and the inflection of the bend's
two end vertices being in opposite signs.
\end{displaycquote}
While it gives a good intuitive understanding of what the bend is, this section
provides more technical details. Here are some non-obvious characteristics that
are necessary when writing code to detect the bends:
\begin{itemize}
\item End segments of each line should also belong to bends. That way, all
segments belong to 1 or 2 bends.
\item The first and last segments of each bend (except for the two end-line
segments) are shared with the neighboring bends: the last segment of a bend
is also the first segment of the next bend.
\end{itemize}
The properties above may be apparent from the illustrations in this document,
but they are far from obvious when reading the original article.
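The two properties can be made concrete with a small sketch. The Python code
below is an illustration only and not the implementation accompanying this
paper; the function names {\tt turn\_sign} and {\tt detect\_bends} are
hypothetical. It assumes that the sign of the inflection at a vertex is taken
as the sign of the cross product of the two adjacent segments, and that
collinear vertices stay in the current bend:
\begin{minted}[fontsize=\small]{python}
def turn_sign(a, b, c):
    # Sign of the z-component of the cross product (b - a) x (c - b):
    # +1 for a left turn at b, -1 for a right turn, 0 if collinear.
    cross = (b[0] - a[0]) * (c[1] - b[1]) - (b[1] - a[1]) * (c[0] - b[0])
    return (cross > 0) - (cross < 0)


def detect_bends(line):
    # line: list of (x, y) vertices; returns a list of bends,
    # each bend being a list of consecutive vertices of the line.
    bends = []
    start = 0  # index of the first vertex of the current bend
    sign = 0   # turn direction of the current bend, 0 while unknown
    for i in range(1, len(line) - 1):
        s = turn_sign(line[i - 1], line[i], line[i + 1])
        if s == 0:
            continue  # collinear vertex: stays in the current bend
        if sign != 0 and s != sign:
            # The turn direction flips at vertex i: close the bend with
            # segment (i-1, i) and start the next bend with that same
            # segment, so that neighboring bends share it.
            bends.append(line[start:i + 1])
            start = i - 1
        sign = s
    bends.append(line[start:])  # the end segment belongs to the last bend
    return bends
\end{minted}
For a zigzag of five vertices the sketch returns three bends of three vertices
each; the last segment of every bend is the first segment of the next one, and
both end segments of the line belong to exactly one bend.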
Figure~\ref{fig:fig8-definition-of-a-bend} reproduces Figure 8 of the original
article, but with bends colored as polygons: each color marks a distinct bend.
\begin{figure}[h]
\centering
\includegraphics[width=\linewidth]{fig8-definition-of-a-bend}
\caption{Originally Figure 8: detected bends are highlighted}
\label{fig:fig8-definition-of-a-bend}
\end{figure}
\subsection{Gentle Inflection at End of a Bend}
The gist of the section is in the original article:
\begin{displaycquote}{wang1998line}
But if the inflection that marks the end of a bend is quite small, people
would not recognize this as the bend point of a bend
\end{displaycquote}
Figure~\ref{fig:fig5-gentle-inflection} visualizes the original paper's Figure 5,
in which a single vertex at the end of the bend is moved to the neighboring bend.
\begin{figure}[h]
\centering
\begin{subfigure}[b]{.49\textwidth}
\includegraphics[width=\textwidth]{fig5-gentle-inflection-before}
\caption{Before applying the inflection rule}
\end{subfigure}
\hfill
\begin{subfigure}[b]{.49\textwidth}
\includegraphics[width=\textwidth]{fig5-gentle-inflection-after}
\caption{After applying the inflection rule}
\end{subfigure}
\caption{Originally Figure 5: gentle inflections at the ends of the bend}
\label{fig:fig5-gentle-inflection}
\end{figure}
The illustration for this section was clear, but insufficient: it does not
specify how many vertices should be included when calculating the end-of-bend
inflection. We chose the iterative approach: as long as the angle is "right"
and the distance is decreasing, the algorithm keeps re-assigning vertices
to different bends, with practically no upper bound on the number of
iterations.
To verify that the algorithm implementation is correct for multiple vertices,
an additional example was created. It is illustrated in
figure~\ref{fig:inflection-1-gentle-inflection}: the rule re-assigns two
vertices to the next bend instead of one.
\begin{figure}[h]
\centering
\begin{subfigure}[b]{.45\textwidth}
\includegraphics[width=\textwidth]{inflection-1-gentle-inflection-before}
\caption{Before applying the inflection rule}
\end{subfigure}
\hfill
\begin{subfigure}[b]{.45\textwidth}
\includegraphics[width=\textwidth]{inflection-1-gentle-inflection-after}
\caption{After applying the inflection rule}
\end{subfigure}
\caption{Gentle inflection at the end of the bend when multiple vertices are moved}
\label{fig:inflection-1-gentle-inflection}
\end{figure}
Finding and fixing gentle inflections requires running the algorithm in
both directions; if implemented exactly as documented, the steps will fail to
match some bends that should be mutated. This implementation does it in the
following way:
\begin{enumerate}
\item Run the algorithm from beginning to the end.
\item \label{rev1} Reverse the line and each bend.
\item Run the algorithm again.
\item \label{rev2} Reverse the line and each bend.
\item Return result.
\end{enumerate}
The current implementation is the most straightforward, but not optimal:
reversing the lines and bends could be avoided by walking the lines backwards.
In this case, steps \ref{rev1} and \ref{rev2} could be skipped, thus saving
memory and computation time.
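The two-pass procedure can be sketched as follows. This Python code is an
illustration only: {\tt reverse\_bends} and {\tt run\_both\_directions} are
hypothetical names, and the single-direction pass itself is not shown but
passed in as the {\tt forward\_pass} argument:
\begin{minted}[fontsize=\small]{python}
def reverse_bends(bends):
    # Reverse the order of the bends and the vertex order within each
    # bend, which is equivalent to reversing the whole line.
    return [bend[::-1] for bend in bends[::-1]]


def run_both_directions(bends, forward_pass):
    # forward_pass: any function that takes a list of bends, walks it
    # from beginning to end and returns the adjusted list of bends.
    bends = forward_pass(bends)    # first pass, beginning to end
    bends = reverse_bends(bends)   # reverse the line and each bend
    bends = forward_pass(bends)    # second pass, on the reversed line
    return reverse_bends(bends)    # restore the original direction
\end{minted}
Since reversing twice restores the original order, a pass that changes nothing
leaves the bends untouched.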
The "quite small angle" was arbitrarily chosen to be $\smallAngle$.
\subsection{Self-line Crossing When Cutting a Bend}
When a bend's baseline crosses another bend, it is called self-crossing. This is
undesirable for the upcoming operators, so self-crossings should be removed
following the rules of the article.
\begin{figure}[h]
\centering
\begin{subfigure}[b]{.4\textwidth}
\includegraphics[width=\textwidth]{fig6-self-crossing-before}
\caption{Bend's baseline is crossing another bend}
\end{subfigure}
\hfill
\begin{subfigure}[b]{.4\textwidth}
\includegraphics[width=\textwidth]{fig6-self-crossing-after}
\caption{Self-crossing removed}
\end{subfigure}
\caption{Originally Figure 6: self-line crossing}
\label{fig:fig6-self-crossing}
\end{figure}
The baseline may be crossed not only by a neighboring bend, but by any other
bend in the line. For example, the baseline of the bend $(A, B)$ may cross
different bends in between, as depicted in figure~\onpage{fig:ascii-selfcross}.
\begin{figure}[h]
\centering
\begin{BVerbatim}
\ \
B\ | _ __
| | / \ / \
| |___/ \___/A |
\_________________|
\end{BVerbatim}
\caption{A baseline crossing non-neighboring in-between bends}
\label{fig:ascii-selfcross}
\end{figure}
Naively implemented, checking every bend against every other bend costs
$O(n^2)$. It is possible to optimize this step and skip checking some of the
bends: only bends whose sum of inner angles is at least $\pi$ can ever
self-cross. If the value is less than $\pi$, the bend cannot cross other bends.
That way, only a fraction of bends need to be checked.
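The naive pairwise check can be sketched as below. The Python code is an
illustration under stated assumptions, not the implementation accompanying
this paper: all function names are hypothetical, only proper crossings are
reported (touching or shared end points are ignored), and the inner-angle
shortcut is not applied:
\begin{minted}[fontsize=\small]{python}
def orientation(a, b, c):
    # Twice the signed area of triangle (a, b, c): positive if c lies
    # to the left of the directed line a -> b, negative if to the right.
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segments_cross(p1, p2, q1, q2):
    # True only for a proper crossing; touching or shared end points
    # (as between neighboring bends) do not count.
    return (orientation(p1, p2, q1) * orientation(p1, p2, q2) < 0
            and orientation(q1, q2, p1) * orientation(q1, q2, p2) < 0)


def baseline_crossings(bends):
    # Checks every bend against every other bend. Returns pairs (i, j)
    # where the baseline of bend i properly crosses a segment of bend j.
    pairs = []
    for i, bend in enumerate(bends):
        first, last = bend[0], bend[-1]
        for j, other in enumerate(bends):
            if i == j:
                continue
            if any(segments_cross(first, last, a, b)
                   for a, b in zip(other, other[1:])):
                pairs.append((i, j))
    return pairs
\end{minted}
The returned pairs identify which baselines need the self-crossing rules of
the article applied; removing the crossing itself is outside the scope of this
sketch.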
\subsection{Attributes of a Single Bend}
\textsc{Compactness Index} is "the ratio of the area of the polygon over the
circle whose circumference length is the same as the length of the
circumference of the polygon" \cite{wang1998line}. Given a bend, its
compactness index is calculated as follows:
\begin{enumerate}
\item Construct a polygon by joining first and last vertices of the bend.
\item Calculate area of the polygon $P$.
\item Calculate perimeter of the polygon $u$. The same value is the
circumference of the circle.
\item Given circle's perimeter $u$, circle's area $A$ is:
\[
A = \frac{u^2}{4\pi}
\]
\item Compactness index is $\nicefrac{P}{A}$:
\[
cmp = \frac{P}{A} = \frac{P}{ \frac{u^2}{4\pi} } = \frac{4\pi P}{u^2}
\]
\end{enumerate}
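The steps above translate directly into code. The following Python sketch is
an illustration only; the function names are hypothetical, and the polygon
area is computed with the shoelace formula, which is one possible choice:
\begin{minted}[fontsize=\small]{python}
import math


def polygon_area(vertices):
    # Shoelace formula; the polygon is closed implicitly by joining
    # the last vertex back to the first (the baseline of the bend).
    n = len(vertices)
    total = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(vertices):
    # Length of all segments of the bend plus its baseline.
    n = len(vertices)
    return sum(math.dist(vertices[i], vertices[(i + 1) % n])
               for i in range(n))


def compactness_index(bend):
    # cmp = 4 * pi * P / u^2, with P the polygon area and u its perimeter.
    area = polygon_area(bend)
    perimeter = polygon_perimeter(bend)
    return 4.0 * math.pi * area / perimeter ** 2
\end{minted}
For a bend whose polygon is a unit square, the area is $1$ and the perimeter
is $4$, so the sketch yields $\nicefrac{\pi}{4} \approx 0.785$. A degenerate
bend with zero perimeter would need separate handling.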
Once this section is implemented, each bend has a list of properties on which
later steps of the algorithm operate.
\subsection{Shape of a Bend}
This section introduces \textsc{adjusted size}, which trivially derives from
\textsc{compactness index} $cmp$ and the area $P$ of the bend's polygon:
\[
adjsize = \frac{0.75 P}{cmp}
\]
Adjusted size becomes necessary later to compare bends with each other and to
find similar ones.
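As a worked example (the shape is chosen here purely for illustration),
consider a bend whose polygon is a unit square, with area $1$ and perimeter
$4$:
\[
cmp = \frac{4\pi \cdot 1}{4^2} = \frac{\pi}{4} \approx 0.785,
\qquad
adjsize = \frac{0.75 \cdot 1}{\nicefrac{\pi}{4}} = \frac{3}{\pi} \approx 0.955
\]
For a fixed area, the less compact the bend, the larger its adjusted size.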
\subsection{Isolated Bend}
A bend itself and its extensions can be described by \textsc{average curvature},
which is \textcquote{wang1998line}{geometrically defined as the ratio of
inflection over the length of a curve.}
\subsection{The Context of a Bend: Isolated and Similar Bends}
To find out whether two bends are similar, they are compared by 3 components:
\begin{enumerate}
\item \textsc{adjusted size}
\item \textsc{compactness index}
\item Baseline length
\end{enumerate}
These 3 components represent a point in the 3-dimensional space, and Euclidean
distance $d$ between those is calculated to differentiate between bends $p$ and
$q$:
\[
d(p,q) = \sqrt{(adjsize_p-adjsize_q)^2 +
(cmp_p-cmp_q)^2 +
(baseline_p-baseline_q)^2}
\]
The smaller the distance $d$, the more similar the bends are.
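The distance can be computed as in the following Python sketch. The
{\tt BendAttributes} type and the {\tt bend\_distance} function are
hypothetical names used only for this illustration, and the three components
are used as-is, without any scaling:
\begin{minted}[fontsize=\small]{python}
import math
from typing import NamedTuple


class BendAttributes(NamedTuple):
    adjsize: float   # adjusted size of the bend
    cmp: float       # compactness index of the bend
    baseline: float  # length of the baseline of the bend


def bend_distance(p, q):
    # Euclidean distance between two bends in the 3-dimensional
    # (adjsize, cmp, baseline) space; smaller means more similar.
    return math.sqrt((p.adjsize - q.adjsize) ** 2
                     + (p.cmp - q.cmp) ** 2
                     + (p.baseline - q.baseline) ** 2)
\end{minted}
Two bends with identical attributes have a distance of zero.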
\subsection{Elimination Operator}
\subsection{Combination Operator}
\subsection{Exaggeration Operator}
\section{Program Implementation}
\section{Results of Experiments}
\section{Conclusions}
\label{sec:conclusions}
\section{Related Work and future suggestions}
\label{sec:related_work}
\printbibliography
\begin{appendices}
\section{Code listings}
\subsection{Reproducing the generalizations in this paper}
We strongly believe that the ability to reproduce the results is critical for any
scientific work. To make it possible for this paper, all source files and
accompanying scripts have been attached to the PDF. To re-generate this
document and its accompanying graphics, run this script (assuming name of
this document is {\tt mj-msc-full.pdf}):
\inputminted[fontsize=\small]{bash}{extract-and-generate}
This was tested on Debian GNU/Linux 11 with upstream packages only.
%\subsection{Algorithm code listings}
%\inputminted[fontsize=\small]{postgresql}{wm.sql}
\end{appendices}
\end{document}