Using GIS Cloud With External Databases
The cloud-hosted database that is included with every Map Editor license is very convenient so that your data is kept secure, but access to the actual database server is not available. The limitations imposed by a lack of that access can be circumvented by configuring GIS Cloud to retrieve data directly from a server that you host and manage, such as AWS RDS for PostGIS.
Some reasons why you would go through all this trouble are:
- you don’t want to migrate or mirror an existing spatial database of data to the GIS Cloud database
- your organization uses an existing spatial database for other applications such as QGIS or Carto (perhaps consider the QGIS plugin or the GIS Cloud)
- you need to have direct control over where the data is hosted, such as for compliance reasons
- you will use direct database access to facilitate integration or reporting
- your spatial data has a data model that is too complex to work with GIS Cloud
- you want to use Triggers/Views/Materialized Views to interact with your data
If you are considering working with an external database, this article can provide you with a few hints on making the process easier.
Your first consideration is where your external database will be hosted. Every time two devices need to communicate there is a short delay known as latency. The further away each device is, the more likely you will experience slower performance.
Global GIS Cloud servers are hosted on AWS in the “us-east-1” availability region. Therefore, if you can host your database there, it will have the best chance for low latency because the two servers are “close” to each other.
This diagram shows the impact that path distance can have on a user.
If your existing database is stored on-premise, then you may wish to consider hosting a mirror on AWS to shorten the path that GIS Cloud needs to take to retrieve the data while maintaining your “primary” database on-premise. Rather than this, it may be better to mirror your data directly to GIS Cloud’s database for optimal performance.
If you need to host your data elsewhere for compliance reasons, check with GIS Cloud as there are some locally deployed instances in other countries.
We were able to improve the MDC access speed by a factor of 10 by hosting the data in the same AWS availability zone as GIS Cloud servers.
When GIS Cloud accesses tables via an external connection, it will try to treat the tables the same as tables in the GIS Cloud database. Because there is no way to know the details about how your table is designed in GIS Cloud, it is better to copy the table structure that GIS Cloud uses so that you do not experience any unusual behavior.
The first consideration is recognizing that geometry columns should be created as one of the primary types (point, linestring, polygon, or the multi versions of these). Similar to most GIS applications, GIS Cloud layers are limited to a single geometry type per layer.
Following from this would be setting the name of your geometry column. If your geometry column is using, say the PostGIS default of “geom”, the column will show up in your layer definition as if it were an attribute. If you use “__geometry” as the name, this will be hidden automatically by GIS Cloud as it recognizes it as the geometry.
Once you have taken care of geometry, there are a few other columns to consider for your database table. The most important is the Primary Key which is how databases look up records efficiently. This column should be named “ogc_fid” to hide it from the user automatically. Also, GIS Cloud defers the generation of unique feature IDs to the database so if you plan to collect new features, the Primary Key must autoincrementing (e.g. serial type or as appropriate to your SQL dialect) and the primary key must not be a composite of multiple columns.
Next is choosing data types. The GIS Cloud REST API and forms are quite permissive with respect to what kind of data they accept. What I mean by this is that if you pass a number to GIS Cloud and the destination is a string, it will coerce the value to match, minimizing errors. However, when configuring your table schema you may be tempted to be very strict with your column types. This can cause unexpected conflicts as the GIS Cloud API has done its best to work with your data, but the database has rejected the value entered into a Mobile Data Collection form.
A classic example of this might be a datetime which can take manydifferent formats, but GIS Cloud will happily work with dates as strings which eliminates the need to debug mystery errors.
Related to data types is the question of adding a “NOT NULL” constraint to your columns. This is often sensible in schema design, but can result in unexpected failures if your GIS Cloud form attached to the layer doesn’t align with your database constraints. To minimize unexpected issues, it is better to impose required fields at the Form level rather than at the Database level.
Finally, the most critical table requirement if you intend to collect new features into your external table. GIS Cloud is not able to easily determine the Primary Key requirements from your external table and therefore adopts a hands-off approach. Therefore, your Primary Key column should be defined as auto-incrementing/generating eliminating as appropriate to your SQL dialect.
Triggers/Views/Materialized Views, oh my!
GIS Cloud supports the use of Views to access spatial tables which can be a great way to “flatten” a more complex data model using a set of SELECT JOIN ON statements. This means you might have one spatial table that has a Foreign Key to join to another SQL table, that itself might have a Foreign Key to join to another table; with a view you can create one “table” that has columns from all three tables to display in GIS Cloud.
PostgreSQL, which is the basis of PostGIS, has a somewhat unique feature of Materialized Views. These are just like a View but the result of the SQL query is cached within the database and must be manually triggered to update. The advantage of this approach is that you can define spatial and non-spatial indexes on the View results which can speed up searching/retrieval.
It also offers a way to separate a “working” copy of your GIS data and a “published” copy. However, GIS Cloud doesn’t support these tables yet, so to get around this you need to create a View that selects from the Materialized View.
Lastly, a classic feature of databases is Triggers. If you have your own database then you have no restriction in applying triggers. But do be careful to consider potential performance impacts as GIS Cloud still needs to read and write to your database and will not be aware of these triggers. Anything that delays responding to a SELECT or UPDATE statement may negatively impact application performance.
Jon Stanger is the Head Solution Architect at Spatial Partners
At Spatial Partners we discover and deliver full solutions for any complicated data challenge. We are your problem-solving partner, and we thrive on complexity.