Scale in a Time of Web Maps

Scale has taken on a completely new meaning for me. In my training and early career, scale referred to a conversion measurement indicating a comparison between a measurement on a paper map and a measurement in the real world. The big ‘thing’ about GIS was that it was scale-less; you could zoom in as much as you wanted and the map changed accordingly, amazing!

The word scale for me is now a combination of a number of concepts, some old and some new. The idea of the conversion between a screen measurement and a real-world measurement is still pertinent, though in web mapping parlance this has somewhat devolved into the term “zoom” or “zoom level”, which on reflection is a horrible degradation, though usefully user-centric.
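To make the link between “zoom level” and old-fashioned scale concrete, here is a minimal sketch of the usual Web Mercator arithmetic. It assumes 256-pixel tiles and a 96 DPI screen; both are assumptions, real devices vary, and the function names are mine, for illustration only.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis, as used by Web Mercator
TILE_SIZE_PX = 256          # assumed tile size
ASSUMED_DPI = 96            # assumed screen density; real screens differ


def ground_resolution(zoom, latitude_deg=0.0):
    """Metres on the ground covered by one screen pixel."""
    circumference = 2 * math.pi * EARTH_RADIUS_M
    pixels_around_world = TILE_SIZE_PX * 2 ** zoom
    # Web Mercator stretches the map away from the equator, so a pixel
    # covers less ground at higher latitudes.
    return circumference * math.cos(math.radians(latitude_deg)) / pixels_around_world


def scale_denominator(zoom, latitude_deg=0.0, dpi=ASSUMED_DPI):
    """The N in a traditional 1:N map scale, for the assumed screen DPI."""
    metres_per_inch = 0.0254
    return ground_resolution(zoom, latitude_deg) * dpi / metres_per_inch


if __name__ == "__main__":
    for z in (0, 10, 18):
        print(z, round(ground_resolution(z), 3), round(scale_denominator(z)))
```

At zoom 0 on the equator a pixel covers roughly 156,543 metres, and each zoom level halves that. Note the scale also depends on latitude, which the word “zoom” quietly hides.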

In general, the term scale for me is now more about data than it is about display. In web terms, when we talk about scale we are referring to the size of the data: the enormity of a repository, database or storage engine. If one ‘gets to scale’ then you receive your badge of honour, and it’s implied you have figured out how to manage ever larger amounts of data and can do something useful with it. Of course, ‘doing something useful with it’ typically means you have competence around display or management.

In web mapping terms, scale here can be about how to draw gazillions of features on a map, though not necessarily how to usefully draw gazillions of points on a map.

Oh, great lots of data, thanks…

At Sparkgeo we have worked with numerous companies who deal with scale regularly. What I have discovered is one of the great conceits of our modern web mapping life:

Just because you can draw a gazillion points on a map, does not at all mean that you should.

In fact, the decision to draw anything on a map needs to take into consideration the traditional understanding of scale: at what geographic density does it make sense to draw the features on the map? But it also needs to balance that with the scale at which the data is relevant. This characterization of data scale having an effect on an analytical outcome has always been a central feature of traditional GIS analysis. In our modern life of geospatial applications, it is very easy to forget we are still applying traditional GIS concepts, albeit within a different purview and with new technologies.
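One common way to respect geographic density is to aggregate points into grid cells sized for the current zoom and draw one symbol per cell. The following is only a rough sketch of that idea, not a recommendation of a particular method: the grid is in plain degrees, `cells_per_tile` is an arbitrary assumed value, and the function names are hypothetical.

```python
import math
from collections import Counter


def cell_size_for_zoom(zoom, cells_per_tile=4):
    """Grid cell width in degrees; a zoom-z tile spans 360 / 2**z degrees of longitude."""
    return 360.0 / (2 ** zoom) / cells_per_tile


def bin_points(points, cell_deg):
    """Count (lon, lat) points per grid cell, keyed by (column, row) index."""
    counts = Counter()
    for lon, lat in points:
        cell = (math.floor(lon / cell_deg), math.floor(lat / cell_deg))
        counts[cell] += 1
    return counts


if __name__ == "__main__":
    # Two points a few metres apart in Vancouver, one in Paris.
    points = [(-123.1, 49.2), (-123.1001, 49.2001), (2.35, 48.85)]
    bins = bin_points(points, cell_size_for_zoom(3))
    # At a continental zoom the two Vancouver points collapse into one cell.
    print(dict(bins))
```

The design choice here is deliberately crude: a degree-based grid distorts with latitude, so a production system would more likely bin in projected coordinates.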

As such, it’s easy to forget that data scale and indeed data quality have a direct impact on the algorithmic quality of whatever we are doing. It took until version 5.6 for MySQL to consider that geographic analysis should go beyond the Minimum Bounding Rectangle; we’re only on 5.7 now.

So wait, our technology does matter to scale then? Yes it does, especially when it constrains your data’s ability to be functional at a certain scale, even if it is to meet the demands of scale. If your technology constrains your data’s ability to perform, then your data is defined by your technology. So your scale is limited by your scale.

Yup, it’s getting pretty murky, I agree.

But instead of clarifying, because frankly there is no clarity here, consider this: what scale is your crowdsourced geospatial data? This question is beautifully complex. Your scale will be determined by a mix of context, application, device, and storage technology variables. Most interestingly, this mix can exist within a single data source. An example of this complexity is the ease with which one can programmatically switch from device GPS to GeoIP depending on GPS signal availability. This means that within a single table the geographic accuracy varies between 5 meters and city or regional level. Again, just because you can collect data in this manner does not mean you should; there is a significant risk of breaking algorithmic expectations. Your local search is useless if you are basing user location on GeoIP; it’s not very local.
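One way to keep that mixed-accuracy table honest is to store the source and an estimated accuracy alongside every position, then let each algorithm decide what it can tolerate. A minimal sketch follows; the `LocationFix` type, the function names, and the 25 km GeoIP figure are all hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed, illustrative error radius for a GeoIP lookup (city / regional level).
GEOIP_ACCURACY_M = 25_000.0


@dataclass(frozen=True)
class LocationFix:
    lat: float
    lon: float
    source: str        # "gps" or "geoip"
    accuracy_m: float  # estimated horizontal error radius in metres


def best_fix(gps: Optional[LocationFix], geoip: Optional[LocationFix]) -> Optional[LocationFix]:
    """Prefer GPS when available, fall back to GeoIP, and keep the provenance."""
    return gps if gps is not None else geoip


def usable_for_local_search(fix: LocationFix, max_error_m: float = 100.0) -> bool:
    """Reject fixes that are too coarse for a 'local' query."""
    return fix.accuracy_m <= max_error_m


if __name__ == "__main__":
    geoip = LocationFix(49.28, -123.12, "geoip", GEOIP_ACCURACY_M)
    fix = best_fix(None, geoip)  # no GPS signal, so we fall back
    print(fix.source, usable_for_local_search(fix))
```

The fallback still happens, but the row now carries enough information for a downstream local search to refuse a city-level position rather than silently treating it as a 5 meter one.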

This variety of seemingly structured but hugely variable data is a new feature of our industry getting to scale. Tread carefully, however. We have enormous opportunity to build geospatial applications which can change lives. It is very easy to get tied up in the joy of solving our scale problem, whilst forgetting that we are trying to capture, manipulate and display that data at entirely the wrong scale.

Old problems are new again.

This post was originally published as a LinkedIn post