
It Began with the Sea Kayakers

Whilst the modern-day paddlesports community in Switzerland embraces a wide variety of paddlecraft types – everything from Stand Up Paddleboards, through Packrafts, to White Water Kayaks – all paddlesports can trace their roots to ancient hunting canoes, and in particular to the ocean-going kayaks of the Inuit people. So it’s perhaps fitting that this project too began with the sea kayakers!

Sea kayaking has grown in popularity in Switzerland in recent years as it overcomes the cultural stigma of being historically labelled “a sport for old people”. This growth led to a roundtable meeting of paddlers active in sea kayaking circles from across Switzerland in the summer of 2021, where one significant impediment to the development of sea kayaking, and indeed all paddlesports, within Switzerland was identified: a lack of logistical information.

Whilst many people, and even the governing body Swiss Canoe, had previously built websites to track the geospatial data needed to plan paddlesports activities, no solution had seen widespread adoption. Many people knew a lot about the paddling environment around the country, but no-one had ever tried to consolidate it all in one place.

So I volunteered to attempt this! Which, somehow, reminded me of this xkcd classic:

[xkcd: Standards]

A fool’s errand? Perhaps, but nothing ventured, nothing gained. Besides, Cloudreach kindly agreed to donate some of my working time to the project!

Problem Analysis: Why have other solutions floundered?

When one is foolhardy enough to try where others have fallen, it’s perhaps wise to start by investigating why the previous attempts were unsuccessful. In reviewing the plethora of prior Swiss paddlesports mapping websites, I concluded that there were three primary reasons for their limited usage:

  • Limited information: previous attempts tended to focus on documenting launching spots – places where a paddler can launch their craft. Whilst valuable information, this alone is not sufficient to plan a day out on the water. Paddlers need a range of other logistical, and crucially safety & navigational, information.
  • Poor visual accessibility: all existing solutions utilised a standard base map from providers such as Google Maps or OpenStreetMap (OSM). The visual presentation of these maps is, naturally, designed for the most common use cases, which tend to involve people navigating in a car or other road vehicle. As cars generally don’t float, these visual styles de-emphasised waterways while highlighting a range of information irrelevant to paddlers, leading to a cluttered visual presentation ill-suited to the use case.
  • Inflexible data model: being based on mapping tools from Google and OSM, previous attempts were largely limited to the “markers-on-a-map” data model provided by these services. Users were, essentially, shown a semi-interactive picture of a map with information embedded within it. This data model inherently limited what these solutions could achieve, and was perhaps the main blocker to capturing additional geospatial data and presenting it with a wider range of interactivity.

Overall, I concluded that this project needed to eschew the conventional mapping tools and seek out alternative approaches to building a Geospatial Information System (GIS).

Data Modelling: What should the data contain?

Now that I’d decided not to use the standard online mapping tools, a major question had to be answered: what data, exactly, needed to be stored and visualised?

It was immediately apparent that a key change from previous solutions would be to introduce a relational data model, where the connections between different pieces of information could be modelled and queried. However, a full-blown RDBMS would have blown the budget of exactly CHF 0. So I settled on using a GraphQL-based headless Content Management System (CMS) called GraphCMS.

GraphQL would allow simple, single queries to traverse the relationships between entities and retrieve all data related to a subject. For example, all of the information related to a given lake or river could be retrieved in one consolidated dataset, even where that data was spread across multiple entity types and multiple layers of relationships. This would be a key enabler of frontend design and implementation later on.

Why a Headless CMS?

Choosing a headless CMS offered distinct advantages for this project when compared to either a SQL database or a conventional CMS.

A conventional CMS, such as WordPress, combines both the content and its presentation into a single entity. Effectively you’re managing the pages of a website directly, by selecting a page layout and storing the content in that layout. This is great where the people maintaining the website aren’t programmers, however for my purposes it limited the flexibility of presentation too much.

I felt certain that I’d need to iterate on the data visualisation and page layouts over time, as it seemed unlikely I’d nail the perfect layout on the first go. Additionally it was clear that the same data would need to be reused in multiple pages, for example the detail pages for a particular launching spot and for the related waterway would both need to display the same information about protected areas, like nature reserves, where watercraft are prohibited. Consequently, any conventional CMS would be unsuitable.

SQL databases are great for flexibility in data modelling, and naturally support relational data, however they also come with a lot of baggage. The database server needs to be maintained to some degree, even with PaaS options such as AWS RDS – a major issue for a community project with no funding!

Furthermore, the query model of SQL was suboptimal for this use case. Retrieving relational data via SQL requires a multi-step process of first retrieving the foreign key from one table, then looking up the record with that key in the related table. Whilst these steps can be nested as subqueries, or expressed as joins, within a larger query, this approach would offer no obvious advantages for this project over a more graph-like query approach, where connections between entities can be traversed without the need for foreign key lookups, whilst adding complexity to the queries.

GraphCMS offered one more major plus: it would enable the project to provide a public GraphQL API for the data, unlocking it from whatever app this project created and affording other members of the paddlesports community the opportunity to reuse and remix the data to suit their needs. Think the Paddel Buch site’s terrible? Great! Here’s all the data, please feel free to build something better!
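As an illustration, consuming such an API takes only a few lines of Python. This is a minimal sketch, not the project’s actual client code: the endpoint URL is a placeholder, and it assumes the Waterway model is exposed through an auto-generated waterways query with name and slug fields.

import requests

# Placeholder endpoint URL – the real Paddel Buch endpoint is project-specific
GRAPHQL_ENDPOINT = "https://api-eu-central-1.graphcms.com/v2/<project-id>/master"

# Retrieve every waterway's name and URL slug in a single request
query = """
query {
  waterways {
    name
    slug
  }
}
"""

response = requests.post(GRAPHQL_ENDPOINT, json={"query": query})
response.raise_for_status()

for waterway in response.json()["data"]["waterways"]:
  print(f'{waterway["name"]} -> /wasserlaeufe/{waterway["slug"]}')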

Designing the Data Model

With GraphQL selected as the query language, it was time to define the data model. Considering the most common context within which paddlers ask logistical questions is either a lake or a river – a waterway – it was clear that waterways should be the central “hub” of the data model. All other entity types should be related to one or more waterways to best answer the most common questions paddlers may ask of the data.

Additionally, I decided that every entity should have a URL slug. This is a URL-safe string which can be used to build URLs for unique pages for each item. Whilst some entity types were unlikely to have unique pages for each item, I decided that the low effort addition of unique URL slugs would open up more possibilities for website structures in the future. It was also consistent with the ethos of the public GraphQL API; it would enable others to build unique pages for any entries they wished, even if I couldn’t see a use for that myself!
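For illustration, a slug can be derived from an item’s name with a simple normalisation step. The helper below is a sketch of the idea, not the exact rules used in the CMS:

import re
import unicodedata

def slugify(name):
  """Turn a display name into a URL-safe slug, e.g. "Zürichsee" -> "zurichsee"."""
  # Strip accents and umlauts down to plain ASCII characters
  ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
  # Lower-case, replace runs of non-alphanumeric characters with hyphens, and trim stray hyphens
  return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")

print(slugify("Brienzersee"))  # brienzersee
print(slugify("Zürichsee"))    # zurichsee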

For each waterway there was a clear set of entities that needed to be included within the model:

  • Spots – the actual launching spots, from where paddlers can enter and exit the waterway, or just stop for a picnic
  • Protected Areas – the areas where watercraft are prohibited, such as nature reserves, industrial areas, and designated swimming areas
  • Obstacles – natural or artificial objects, mainly on rivers, which block the passage of paddlecraft, such as dams
  • Paddling Environments – the type of environment which the paddler would be entering, such as lakes, rivers, and white water, as this plays a major role in planning a safe journey

With these entity types identified, the following data model quickly took shape:

[Diagram: the Paddel Buch data model]

This model covered all of the entities identified whilst allowing entries for any of them to be easily queried via their related waterway. However, for anyone who’s designed a data model before, one question may immediately jump to mind: why are the “Type” entities entities, instead of enumerations?

The answer lies in one other major advantage of GraphCMS for this project: localisation. Switzerland is a highly linguistically diverse country, with four official languages and many regional variations of these in daily use. Providing the information in multiple languages would be essential to the project’s long-term success.

GraphCMS supports localising entries on a field-by-field basis, perfect for the requirements. However it doesn’t support localisation of enumerations. With the names of the “types” also needing to be presented to the user in their own language, this led naturally to using entities instead of enumerations.

This design would be the foundation for addressing two of the three issues identified during the problem analysis: the limited range of information available and the inflexibility of the data models previously used.

Data Engineering: Where’s all the information going to come from?

Of course any data model is pretty much worthless without the data to fill it! Whilst many members of the paddlesports community, myself included, had kept notes of launching spots we’d used, other types of data would be a little harder to acquire.

In order for the app, and more broadly the public API, to offer the long-term value I was aiming for, the data needed to be genuinely geospatial in nature. That is to say, the CMS couldn’t just record that a particular lake existed, it had to also know precisely where that lake was, along with its size and shape. This geospatial data would permit the real-world geographic relationships between entries to be derived in the future, such as looking at whether one item is geospatially proximate to another.

I’d already selected GeoJSON as the geospatial data format for its broad compatibility with development languages and tools; now I needed an initial source of GeoJSON data for the waterway, obstacle, and protected area entity types. For this I turned to OSM’s Overpass API, which allows geospatial data to be programmatically extracted from OSM.
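For context, the geometry stored for each item is just a standard GeoJSON structure. A river centre line, for example, might look like the following (the coordinates here are purely illustrative):

# Illustrative GeoJSON geometry for a river centre line (longitude/latitude pairs)
river_geometry = {
  "type": "LineString",
  "coordinates": [
    [7.868, 46.684],
    [7.855, 46.686],
    [7.842, 46.688],
  ],
}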

Using Overpass Turbo I was able to work backwards from OSM’s visual interface to derive the Overpass QL queries needed to extract large datasets covering all instances of each of these entity types in Switzerland. Manual review showed that these datasets weren’t fully aligned with ground truth, particularly for protected areas and obstacles, however they were far more comprehensive than any of the official data sources I could find.
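The extraction itself can be scripted against the public Overpass endpoint. The query below is an illustrative sketch rather than the exact one I used, and assumes the standard overpass-api.de interpreter:

import requests

OVERPASS_URL = "https://overpass-api.de/api/interpreter"

# Illustrative Overpass QL: every lake within Switzerland, with full geometry
overpass_query = """
[out:json][timeout:180];
area["ISO3166-1"="CH"][admin_level=2]->.ch;
(
  way["natural"="water"]["water"="lake"](area.ch);
  relation["natural"="water"]["water"="lake"](area.ch);
);
out geom;
"""

response = requests.post(OVERPASS_URL, data={"data": overpass_query})
response.raise_for_status()
elements = response.json()["elements"]
print(f"Retrieved {len(elements)} lake geometries from OSM")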

So now I simply had to process each of them into records that conformed with my data model, and then into GraphQL mutations against GraphCMS. To handle this data engineering I went with Databricks Community Edition, firstly because I already had Databricks and PySpark experience from my work at Cloudreach, but also because its price tag of “free” fit with my budget of “zero”!
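Loading the processed records then comes down to sending mutations to the GraphCMS API. The snippet below is a hedged sketch rather than my actual loader: it assumes a Waterway model (for which GraphCMS auto-generates a createWaterway mutation), a Json geometry field, a token with mutation permissions, and placeholder endpoint values.

import requests

# Placeholder values – the real endpoint and token are project-specific
GRAPHQL_ENDPOINT = "https://api-eu-central-1.graphcms.com/v2/<project-id>/master"
API_TOKEN = "<mutation-token>"

create_waterway = """
mutation CreateWaterway($name: String!, $slug: String!, $geometry: Json!) {
  createWaterway(data: {name: $name, slug: $slug, geometry: $geometry}) {
    id
  }
}
"""

def push_waterway(record):
  """Send one processed waterway record to GraphCMS and return the new item's id."""
  response = requests.post(
    GRAPHQL_ENDPOINT,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json={"query": create_waterway, "variables": record},
  )
  response.raise_for_status()
  return response.json()["data"]["createWaterway"]["id"]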

Along the way I encountered a couple of unexpected challenges…

Unique Naming is Hard

Did you know that Switzerland has, according to OSM, at least 8 lakes named “Schwarzsee”? Kanton Graubünden alone apparently has 3!

I’d originally planned to derive the URL slugs for each item from the German name of the waterway, obstacle, etc., choosing German both for its wide usage in Switzerland and the fact I could speak it myself (well, a bit!). However, this fell apart when, while investigating mysterious GraphQL mutation errors, I realised I had unique items with duplicate names.

Initially I tried to simply add the name of the Kanton (federal state) to the lake or river name, leading to the aforementioned discovery of three Schwarzsees within Kanton Graubünden, so a little more engineering was required.

I ultimately settled on a solution utilising the Nominatim API to perform a reverse geolocation lookup from the first set of coordinates in the waterway’s GeoJSON to get an address, then extracting parts of that address to add to the waterway’s name. The GeoPy library provided a straightforward way to implement this in Python code:

from geopy.geocoders import Nominatim
from pyspark.sql.functions import udf
import time

def get_kanton(row):
  # Wait 1.5 seconds; needed to fit within the Nominatim geocoding API's 1 req/sec rate limit
  time.sleep(1.5)
  # Configure the geolocator object
  geolocator = Nominatim(user_agent="paddelbuch")

  # Extract the first coordinates for the waterway
  coords = row["coordinates"][0][0]

  # Query the Nominatim geocoding API (GeoJSON stores lon/lat, Nominatim expects lat/lon)
  geocode_result = geolocator.reverse([coords[1], coords[0]]).raw

  # In Switzerland some addresses return the Kanton name in all four official languages
  # If this is the case, take the German (first) Kanton name
  if "/" in geocode_result["address"]["state"]:
    kanton = geocode_result["address"]["state"].split("/")[0]
  else:
    kanton = geocode_result["address"]["state"]

  # To address cases where multiple waterways with the same name exist in the same Kanton
  # Add a prefix to the Kanton name based on what's available in the geocoding results
  prefix = ""
  if "postcode" in geocode_result["address"]:
    prefix = geocode_result["address"]["postcode"]
  elif "city" in geocode_result["address"]:
    prefix = geocode_result["address"]["city"] + ","
  elif "village" in geocode_result["address"]:
    prefix = geocode_result["address"]["village"] + ","

  # Return the processed result
  if prefix:
    return f"{prefix} {kanton}"
  else:
    return kanton

kantonUDF = udf(get_kanton)

There were two key steps in using the Nominatim geolocator with GeoPy:

1. To initialise the geolocator with a user agent string before attempting to call it

2. To implement rate limiting to remain within the API’s 1 request per second limit, which I accomplished by limiting the parallelism of the Spark cluster to “1” and adding a 1.5 second sleep to the start of the function. In a corporate environment this could be overcome by utilising a commercial geocoding API, however for this community project cost was a much bigger concern than speed

Finally, registering the Python function as a Spark User Defined Function (UDF) enabled me to call the function from within a .withColumn() call on Spark dataframes later on.
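For example, applying the UDF might look like the following sketch. The dataframe and column names here are hypothetical, but it shows both the .withColumn() call and one way of keeping the work on a single partition (here via coalesce) so the rate limit is respected:

from pyspark.sql.functions import col, concat_ws, struct

# Hypothetical dataframe of raw waterway records with "name" and "coordinates" columns
df_waterways = spark.table("waterways_raw")

df_disambiguated = (
  df_waterways
  # A single partition means the UDF runs serially, keeping within Nominatim's rate limit
  .coalesce(1)
  # Pass the coordinates in as a struct so get_kanton() can read row["coordinates"]
  .withColumn("kanton", kantonUDF(struct(col("coordinates"))))
  # Append the derived location to the name to disambiguate duplicates
  .withColumn("unique_name", concat_ws(" ", col("name"), col("kanton")))
)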

Spatial Joins can get Messy

As I mentioned earlier, one of the main reasons for building a system with true geospatial data was to enable geographic relationships to be modelled and analysed. This became necessary when handling the obstacle entity type.

Just extracting the obvious obstacles, dams and weirs, through the Overpass API yielded over 800 items, all of which needed to be mapped to a waterway. Naturally this was too big a job to be done manually; thankfully, Geopandas came to the rescue. Geopandas is a project which aims to do for geospatial data in Python what the venerable Pandas tool did for general tabular data, and includes the handy .sjoin() – spatial join – function.

Using .sjoin() I was able to spatially join a dataframe of obstacles to a dataframe of waterways, automating the process of mapping these two entity types:

import geopandas as gpd

# Convert the coordinate system to a metre-based projection
gdf_obstacles.to_crs(epsg=3763, inplace=True)

# Generate buffered geometries
gdf_obstacles["match_radius"] = gdf_obstacles.geometry.buffer(200)

# Convert coordinate systems back to lat/long degrees
gdf_obstacles.to_crs(epsg=4326, inplace=True)
gdf_obstacles.set_geometry("match_radius", inplace=True)
gdf_obstacles.to_crs(epsg=4326, inplace=True)

# sjoin the obstacles dataframe to the waterways dataframe, producing a mapping of obstacles to waterways, then clean up the dataframe for export.
gdf_obstacles_mapped = gpd.sjoin(gdf_obstacles, gdf_waterways, how="left", op="intersects")
gdf_obstacles_mapped.drop(columns=["match_radius"], inplace=True)
gdf_obstacles_mapped.set_geometry("geometry", inplace=True)

As the OSM data I used for rivers consisted of just the centre line of each river, rather than data about the positions of the banks, and some obstacle data consisted only of a GeoJSON point, some additional steps were needed to successfully spatially join the datasets.

I determined it would be easiest to add a 200 metre “buffer” to the obstacle data, sufficient to encompass more than half the width of the widest river in Switzerland, the Rhein. To simplify generating the 200 metre buffer I first had to convert the coordinate system for the dataframe to metres, then back to degrees. With this done, it was a simple matter to spatially join the dataframes.

This did create some data quality issues though, which could only be resolved manually – most notably wherever a dam or weir was near the intersection of two rivers. Even with match radii as low as 50 metres there was a fairly consistent number of duplicates in the joined dataframe, with no clear way to programmatically determine which duplicate represented reality. In the end, the only solution was to manually compare each duplicate against a map to determine which was correct. For me, this illustrated that extracting ground truth from ambiguous data is often a job for human judgement.
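Finding the candidates for that manual review is straightforward with Pandas-style operations. This is a small sketch, assuming the joined dataframe from above and hypothetical osm_id, name_left, and name_right columns:

# Obstacles that matched more than one waterway need a human decision
duplicate_mask = gdf_obstacles_mapped.duplicated(subset=["osm_id"], keep=False)
duplicates = gdf_obstacles_mapped[duplicate_mask]

# Review list: one row per ambiguous obstacle-to-waterway pairing
print(duplicates[["osm_id", "name_left", "name_right"]].sort_values("osm_id"))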

Given the goal of modelling geographic relationships between items, spatial joins were heavily used throughout the data engineering phase. 

Frontend Development: How are people going to see all this data?

Having defined a data model and filled the CMS with content, the data was technically already publicly accessible via the GraphQL API. However as most people in the Swiss paddlesports community don’t happen to be fluent in GraphQL, this wasn’t very useful! A website visualising the data was required.

The zero budget reality of the project became a critical driver for the technology selection here. I needed a tech stack that would provide:

  • Minimal, or better yet no, ongoing hosting costs
  • A deployment requiring no ongoing operations effort – a.k.a. “NoOps”
  • A simple and low-effort development path, given I’m not a web developer by trade

All of which pointed to a natural conclusion; a statically generated website.

Static Site Generators, or “SSGs”, are a class of tools which generate static frontends from dynamic backends. Instead of frontend pages being generated by server-side code from database or CMS queries when a user requests them, SSGs pre-generate all possible pages in a single compilation action. This leads to production deployments containing no active server-side components, meaning they can be hosted from simple storage services, like AWS’ Simple Storage Service (a.k.a. S3).

This removes basically all operations effort from the production deployment – it’s just a bunch of static files after all – and eliminates the possibility of a frontend security vulnerability resulting in servers being breached or database content being altered. In my case, this removed the need for any linkage between the CMS and frontend in production.

Selecting an SSG

There are many SSG tools available, and conventionally this would be where we’d talk about performing a detailed analysis comparing each to the project’s requirements and… I didn’t do that! Instead, I made my selection based on the availability of templates and “how to” articles which would get me as close as possible to my goal as fast as possible. And in this vein I started with the great article How to Create a Travel Bucket List Map with Gatsby, React Leaflet, & GraphCMS.

Gatsby is an SSG tool for React which bases its data layer on GraphQL and has a wide range of helpful plugins for common React packages. It also provides a File System Route API, which offers a simple way to template pages based on data retrieved via GraphQL. For example, a Gatsby template with the file path:

/src/pages/wasserlaeufe/{graphCmsWaterway.slug}.js

uses the URL slugs added to each waterway item in GraphCMS to auto-create a unique page for each waterway, such as:

wasserlaeufe/brienzersee

wasserlaeufe/zurichsee

And so forth. This would be vital for the project as by this point I had over 1000 items in the CMS, and with two page languages for each item that would be over 2000 unique pages to create. Far too much to even consider doing by hand! 

Within the template, a simple GraphQL query template is automatically filled out with the URL slug value at compilation time to retrieve all items related to the waterway in one go:

export const pageQuery = graphql`
 query WaterwayPageQuery($slug: String!) {
   graphCmsWaterway(slug: {eq: $slug}) {
     name
     geometry
     spots {
       name
       description {
         raw
       }
       location {
         latitude
         longitude
       }
       spotType {
         slug
       }
       slug
     }
     protectedAreas {
       name
       geometry
       slug
       protectedAreaType {
         name
         slug
       }
       isAreaMarked
     }
     obstacles {
       slug
       portageRoute
       geometry
       name
       description {
         raw
       }
       isPortageNecessary
       isPortagePossible
       obstacleType {
         name
       }
     }
   }
 }
`;

NB: the above code is subject to the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license

Even as someone without a software engineering or web developer background I found Gatsby to be highly approachable. With the help of the article, I was able to build a rough prototype of the site in just a couple of days!

However the full, first version of the website took about a month’s worth of effort to complete, working alone in short bursts. A fair chunk of this time was spent working out a couple of pesky issues I encountered…

There’s no DOM at Build Time

An early lesson learnt for me was a key way in which the “develop” environment in Gatsby differs from the build time environment.

When you’re developing a website with Gatsby, the gatsby develop command allows you to spin up a local web server which builds pages on the fly, via which the effect of your code changes can be tested. This is important for SSGs in particular as you’re often writing templates which get processed into HTML at build time, making it sometimes challenging to determine if your tweaks are having the desired effect. In this mode, as the pages are being built only when you access them, the full browser Document Object Model (DOM) is available to each page’s Javascript components.

When it comes to deploying a Gatsby website, the gatsby build command is used to, unsurprisingly, build the final HTML, CSS, and Javascript for each and every page in the site. This process does not involve a browser though, so during the build no DOM is available to the Javascript components in the pages. For many Javascript libraries this causes them to fail, resulting in the overall gatsby build process failing.

Fortunately this can be easily solved by having Gatsby check if the DOM is available before running the affected code. In my case, I use a helper function that performs this check:

export function isDomAvailable() {
 return (
   typeof window !== "undefined" &&
   !!window.document &&
   !!window.document.createElement
 );
}

NB: the above code is subject to the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license

Which is then called in the page templates to determine whether or not to execute affected code, such as zooming a map to a specific boundary in Leaflet:

if (isDomAvailable()) {
 const geometry = L.geoJSON(graphCmsWaterway.geometry)
 const mapBounds = geometry.getBounds()
 mapSettings = {
   bounds: mapBounds
 };
}

NB: the above code is subject to the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license

And with this check in place on all affected pages the build process can complete successfully.

A Simple Little Data Type Disagreement

There’s a great Gatsby plugin for GraphCMS which simplifies the retrieval of items from GraphCMS, most especially where multiple languages are used as it enables the language of the results to be specified for the entire query, rather than for each localised field individually. Initially I built a single-language version of the full site using this plugin, thinking that this would make localisation of the pages nice and easy when the time came.

When the time came, after a little research I selected i18next as the localisation library in no small part due to the neat integration of its Gatsby plugin with the File System Route API. This should also make the auto-generation of multiple language pages for each item straightforward, I thought.

And then the error messages began! Complaints about a data type mismatch on the language and locale fields respectively between the gatsby-plugin-react-i18next plugin and the gatsby-source-graphcms plugin.

These error messages come about in part due to the way page queries work in Gatsby. Each template must have a single page query, which must return all of the data needed to build the page. Whilst both plugins accepted a variable to define the language of the page being built, they disagreed about the data type of that variable. I18next expected a simple string, whilst GraphCMS wanted a custom locale data type unique to its product. Furthermore, i18next would only fill a single variable, which must be named language within the page query. So to fix this I had to somehow get a single variable to “be” two different data types simultaneously!

That’s impossible of course, and I quickly found that no-one on Stack Overflow, GraphCMS’ Slack channels, or elsewhere online was able to offer much help in solving this conundrum. Eventually, however, I happened upon a solution:

exports.createSchemaCustomization = ({
   actions: { createTypes, printTypeDefinitions }
 }) => {
   createTypes(`
   type Locale implements Node {
     language: GraphCMS_Locale
   }
   `);
 };

NB: the above code is subject to the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license

This code, placed in the gatsby-node.js file, utilises Gatsby’s ability to alter the schema of a data source at build time to change i18next’s schema for the language variable to match GraphCMS’ custom locale data type. This works as at build time Gatsby builds a kind of “virtual GraphQL server” from the data sources – the query layer – and from there can not just query the data, but also customise its schema on the fly without changing the original data sources. This allowed me to make i18next act as if it used the same data type as GraphCMS, and as both data types were, ultimately, just strings, this bypassed the issue entirely.

Designing the Map Style

You may recall that there was one point identified during the problem analysis phase that had yet to be addressed: the visual presentation of the data. With existing solutions using the default OSM and Google Maps styles, the obvious way to differentiate Paddel Buch and improve upon these was to design a custom map style specifically for paddlesports. And for this, I turned to Mapbox.

Mapbox provides a range of mapping, geolocation, and navigation services, including their Mapbox Studio product which allows users to produce custom maps and map styles for use in their websites and applications. I elected to use Mapbox Studio to design a new map style for Paddel Buch, but what should this style be like?

If we look at the commonly used OSM and Google Maps default styles for the Interlaken region in Switzerland, a prime stomping ground for paddlers, we can see exactly what their issues are for paddlesports maps:

[Maps: the default OSM and Google Maps styles of the Interlaken region]

Whilst the Google Maps style is less busy than the OSM one, both include a lot of information which isn’t relevant to paddlesports, such as ski resorts, mountain peaks, differing road types, etc. This has two major impacts on the legibility of these map styles for Paddel Buch’s purpose:

  • The overall “visual clutter” effect makes it harder for the user to identify what information is relevant. It requires the user to expend greater cognitive effort to filter the information in their head, leading to a less enjoyable experience
  • It consumes a lot of colour space, leaving less available for visualising the data we’re actually interested in

The second point is especially important, and perhaps non-obvious. According to data visualisation expert Andy Kirk, the maximum number of different colours that can practically be used to differentiate categories of data in a single visualisation is ten. Beyond this point, the colours become too similar to each other for people to visually distinguish the categories.

Counting the colours used on the Google Maps example, five distinct colours are used just for text labels. Not counting different shades of the same colour, arguably over ten colours are used in total, leaving no colour space for visualising our own data!

So our map style needs to use far fewer colours, leaving as much space as possible for our data. Considering this, I settled on the following design guidelines for the custom map style:

  • All land shall be a single colour with shading of this colour used to illustrate terrain. This should be a lighter colour to provide greater contrast for data overlays
  • Water shall be a bright blue colour, in order that it “pops”, making rivers and lakes highly distinguishable at all zoom levels
  • All road types shall be shown in white. Roads remain important landmarks for navigation and must be shown, however they should not be a visual distraction from the data overlays
  • Exactly two colours will be used for labels; a dark colour for contrast on lighter backgrounds, and a light colour to contrast on darker backgrounds
  • All extraneous information, such as points of interest, public transport, mountain peaks, etc., shall be removed completely

Applying these design guidelines resulted in the following custom Paddel Buch map style:

[Map: the custom Paddel Buch map style for the Interlaken region]

And this is as far as the data visualisation design has progressed between building version 1 of the website and writing this article. Since version 1 was released I’ve been focused on data collection, which has been a superb excuse to take my own sea kayak out on the waterways of Switzerland and establish some ground truth!

Rendering the Maps

With a custom map style designed and a dataset collected to display on it, the next logical question was clear: how to render all of this in the visitor’s browser?

I mentioned the popular Leaflet Javascript library a couple of times earlier, and indeed it’s the means I chose to tackle this point. Leaflet has a large and highly active community behind it, as well as strong support for use with both Gatsby and React. Importantly for this project, it also nicely supports displaying GeoJSON data as layers on a map.

In order to display all of the items returned by the GraphQL page query it was necessary to utilise a .map() call on the returned data structure to iterate through all of the returned items for each entity type. Combining this with a .filter() call to select only the subset of items which matched specific criteria provided the basis for using different visualisations for items with differing characteristics:

{ graphCmsWaterway.obstacles.map(obstacle => {
 const { name, geometry, obstacleType, portageRoute, slug } = obstacle;
 return (
   <div>
     <GeoJSON data={geometry} style={layerStyle.obstacleStyle}>
       <Popup>
         <b>{name}</b>
         <br />{obstacleType.name}
         <p><Link to={`/hindernisse/${slug}`}><Trans>More details</Trans></Link></p>
       </Popup>
       </GeoJSON>
       <GeoJSON data={portageRoute} style={layerStyle.portageStyle}>
         <Popup><b><Trans>Portage route for</Trans> {name}</b></Popup>
       </GeoJSON>
   </div>
   )
})}

{ graphCmsWaterway.spots
 .filter(spot => spot.spotType.slug === "einsteig-aufsteig")
 .map(spot => {
 const { name, location, description, slug } = spot;
 const position = [location.latitude, location.longitude];
 return (
   <Marker key={slug} position={position} icon={(!!spotEinsteigAufsteigIcon) ? spotEinsteigAufsteigIcon : null}>
     {<Popup>
       <b>{name}</b>
       <RichText content={description.raw} />
       <p><Link to={`/einsteigsorte/${slug}`}><Trans>More details</Trans></Link></p>
     </Popup>}
   </Marker>
   );
})}

NB: the above code is subject to the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license

The above code example also shows how Leaflet is integrated with React & Gatsby. By utilising tags like <Marker> and <Popup>, Leaflet maps can be semantically described, with the actual Leaflet code being generated by the gatsby build process. The <GeoJSON> tag can accept either a full GeoJSON object, or the simpler GeoJSON geometry data structure. This enabled me to utilise the simpler geometry data structures in GraphCMS items, avoiding the data overhead of a full GeoJSON object.

Additionally, the <Trans> tag also shows how i18next integrates with React & Gatsby in a similar manner. Here the <Trans> tag indicates text which must be translated when localised pages are generated. 

Deployment: How are people going to access this website?

With the first version of the website built it was time to decide on a deployment strategy, ideally one that utilised a Continuous Integration / Continuous Deployment (CI/CD) approach to automate as much of this process as possible and make future development easier. Luckily a ready made and zero budget-friendly option was at hand.

Having built the frontend with Gatsby, using the free tier of Gatsby Cloud was a logical choice. This service provides both a managed CI/CD platform for testing and building Gatsby frontends, and a hosting service.

I integrated Gatsby Cloud with both the GitHub repository for the frontend code and GraphCMS. On the CMS side this ensures that any time a new item is created or an existing one updated, the frontend is automatically re-built to incorporate the new data – a critical step for a statically generated website.

In GitHub, the integration listens for pull requests on the main branch, performing two functions:

  • Performing an automatic integration test when the pull request is created, to verify that the changes don’t cause build failures
  • Rebuilding the production frontend when a pull request is merged, automatically deploying the changes into production

Next came that perennial ops troublemaker, DNS. For this I turned to AWS Route 53, creating a hosted zone and recordsets for the required host names directing traffic towards Gatsby Cloud’s hosting.

Finally, I had to tackle a classic “only occurs in the production environment” problem. I had, of course, secured my Mapbox token with URL restrictions to ensure that it couldn’t be misused. To enforce URL restrictions, Mapbox utilises the referer header of the tile requests, which worked fine in the development environment. However, in production the browsers weren’t sending a referer header in the tile requests!

After some digging, I learnt that Gatsby Cloud hosting sets the Referrer-Policy value to same-origin by default. This policy means that the visitor’s browser is directed to only include the referer header in requests to URLs in the site’s own domain. Fortunately, Gatsby provides the handy gatsby-plugin-gatsby-cloud plugin to modify the hosting configuration from within the site’s code at deployment time. Using this plugin the default security headers can be disabled and replaced with custom ones in gatsby-config.js:

{
 resolve: `gatsby-plugin-gatsby-cloud`,
 options: {
   mergeSecurityHeaders: false,
   allPageHeaders: [
     "X-Frame-Options: DENY",
     "X-XSS-Protection: 1; mode=block",
     "X-Content-Type-Options: nosniff",
     "Referrer-Policy: strict-origin-when-cross-origin"
   ]
 }
}

The strict-origin-when-cross-origin referrer policy ensures that the referer header is included in all requests, with only the origin (i.e. the server portion of the URL) included in the header when the request is to a different domain from the site, such as to Mapbox’s tile server.

And… fertig, version 1 was live at www.paddelbuch.ch!

The Big Picture

The overall solution involved pulling together services from six distinct vendors with code and integrations to rapidly build a custom mini Geospatial Information System (GIS) from scratch, as shown in this high-level architectural diagram of the solution:

For me, this neatly illustrates the kind of speed of value creation which can be obtained when we leverage higher-order cloud services, beyond basic Infrastructure-as-a-Service (IaaS) offerings, to create business solutions. Using these Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS) products it is possible for a small team, or even an individual, to deliver at pace solutions that would have been prohibitively difficult to build in the not too distant past.


This community project illustrates just part of the range of cloud-related services Cloudreach can offer…
