Thursday, December 19, 2013

Lidar data for Tiger Mountain, King County

An orienteering map of the Tiger Mountain "middle earth" area is being considered for homebrew oc, but a major pain point is the LIDAR/elevation data available for the region. King County does have lidar data that is available to the general public in some form, but it is old (from around 2001) and for some reason extremely horrible, especially under tree cover. This blog post explains what "unusable lidar data" means and why lidar data is not always perfect, especially in King County.

How LIDAR data works

If you know what LIDAR is and how it works, skip this section. If you don't, this explains it in yet another way and most likely includes some imprecision. Ask in the comments if something is unclear.

A plane whose position is known extremely precisely fires a very short laser pulse at the earth and notes when it sees the "red" point on the ground. Sometimes the pulse hits trees, sometimes bushes, houses or rocks, and sometimes bare earth. All of these are called reflections. Sometimes the "red" laser spot is seen on both a tree and some bushes; these are called first and second reflections.

The machinery notes when it saw the "red" dot on the ground and how it was acquired: the plane's position, the angle of the laser beam and the intensity of the reflection. From this data one can generate (x, y, z) points, where x and y are coordinates on the ground and z is an altitude.
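The geometry described above can be sketched roughly as follows. This is a simplified illustration, not how a real LIDAR processing chain works: it assumes the distance comes from the pulse's round-trip time and ignores the atmosphere, aircraft attitude and the curvature of the earth. The function name `return_to_xyz` and all the numbers are made up for this example.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def return_to_xyz(plane_xyz, beam_dir, round_trip_s):
    """Locate one reflection from plane position, beam direction and echo time.

    plane_xyz    -- (x, y, z) of the plane in metres (assumed known exactly)
    beam_dir     -- vector pointing along the laser beam (any length)
    round_trip_s -- time between sending the pulse and seeing the reflection
    """
    distance = SPEED_OF_LIGHT * round_trip_s / 2.0  # the pulse travels there and back
    length = math.sqrt(sum(c * c for c in beam_dir))
    return tuple(p + distance * c / length for p, c in zip(plane_xyz, beam_dir))

# Hypothetical numbers: plane 1000 m up, beam pointing straight down,
# echo time chosen so the reflection comes from 800 m below the plane.
point = return_to_xyz((0.0, 0.0, 1000.0), (0.0, 0.0, -1.0), 2 * 800.0 / SPEED_OF_LIGHT)
```

With these made-up inputs the reflection lands directly under the plane at an altitude of about 200 m.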

The generated data is LARGE. For example, some data sets contain 1 point per square meter, which yields 1 million points per square kilometer. Stored as (x, y, z) in a somewhat efficient format, that would take 12 bytes per triplet, or about 12 megabytes per square kilometer. In an inefficient format (like human-readable text) it could be as much as 30 bytes per point and 30 megabytes per square kilometer.
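The arithmetic above, written out as a small sketch. The helper `dataset_megabytes` is a made-up name, a megabyte is taken as 10**6 bytes, and reading "12 bytes per triplet" as three 4-byte values is my assumption.

```python
def dataset_megabytes(points_per_m2, bytes_per_point, area_km2=1.0):
    """Estimate raw point-cloud size in megabytes (1 MB taken as 10**6 bytes)."""
    points = points_per_m2 * 1_000_000 * area_km2  # 1 km^2 = 1,000,000 m^2
    return points * bytes_per_point / 1_000_000

binary_mb = dataset_megabytes(1, 12)  # e.g. three 4-byte values per point
text_mb = dataset_megabytes(1, 30)    # roughly 30 characters per text line
```

Denser data sets scale the same way: doubling the point density or the area doubles the size.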

From this cloud of data, clever algorithms then pick out different sets of points. One example is the bare earth points, which represent the surface of the earth with trees, rocks and houses removed. In some cases this is hard to do, especially in data sets where points are few. Another interesting point set is the top surface data. Subtract bare earth from that and you get vegetation, rock and building heights. All useful for mapping.
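The subtraction step can be sketched like this, assuming both surfaces have already been resampled onto the same grid. The dictionary layout, the function name `height_above_ground` and the elevations are all invented for illustration.

```python
def height_above_ground(top_surface, bare_earth):
    """Subtract bare-earth elevation from top-surface elevation, cell by cell.

    Both arguments map a grid cell (col, row) to an elevation. Cells missing
    from either grid are skipped, since nothing can be said about them.
    """
    return {
        cell: top_surface[cell] - bare_earth[cell]
        for cell in top_surface
        if cell in bare_earth
    }

# Made-up elevations in metres for three cells.
top = {(0, 0): 152.0, (1, 0): 131.5, (2, 0): 140.0}
ground = {(0, 0): 130.0, (1, 0): 131.5}
heights = height_above_ground(top, ground)
# (0, 0) is 22 m above ground (probably a tree), (1, 0) is open ground,
# and (2, 0) is dropped because it has no bare-earth value.
```

The result is only as good as the bare-earth classification: where the ground points are wrong or missing, the heights are too.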

King County LIDAR data

Lidar data for King County can be downloaded from several places, but as far as I can see it all has the same origin. The easiest place to download it is this: http://viewer.nationalmap.gov/viewer/ . Play around a bit with it. It can provide quite a few different data sets. It does not label the most precise data as LIDAR, but you can find it by looking for the densest elevation data.

King County LIDAR data can also be downloaded from the Puget Sound LIDAR Consortium. They also share full return data (which includes all reflections seen by the plane) and some more.

Tiger Mountain data

The area we are interested in is here: http://www.gmap-pedometer.com/?r=6160161
It is a bit less than half a square kilometer.
This is how it looks on the old USGS topo map:
Note the nice 20-foot contours, the road, the shape of the hills. The contours here are a bit questionable, since they were derived by analyzing aerial photographs. Did the magician who created them really see this?

This first "LIDAR data" image shows the individual bare earth points as vertices. Everything between them is an approximation derived from those points. This data was downloaded from the USGS website mentioned above. The exact file name is ned19_n47x50_w122x00_wa_puget_sound_2000.zip.

You can clearly see the road. The algorithm run on the original LIDAR data has correctly identified those points as bare earth, and the road shows up cleanly. The rest are large triangulated planes, which are completely unhelpful. Why? Well, this is what 5 meter contours would look like on it:
OMG. What the hell is that? You can, of course, smooth them or what not, but the hills are gone. There is nothing really that matches what was seen. The LIDAR data presented here is USELESS for orienteering contours.

Can we get better data?

One solution would be to re-fly this area and probably get better data, since LIDAR technology has had 10 more years to mature since this data was created. This is unlikely to happen unless someone with money cares. I just don't see it happening.

A solution which I have been thinking of is to go back to the original data and try to extract more information from it. There are three reasons why it might work:
1) The algorithms were unlikely to have been sophisticated. The data was acquired close to the beginning of LIDAR.
2) When the data was acquired, the projects included both cities and forests. I find it unlikely that an algorithm good for towns is also usable under tree cover.
3) My home computer's power is probably a good match for what the company had 10 years ago. Also, I have lots of time and only a limited amount of data I care about.

Here is a picture of All-returns data (trees, rocks, grass - everything):
There are lots of points there (around half a million)! There are also some holes (and I have no idea why I don't have any data in that square there, but I just did not get it). 
The question is how we can get rid of some of them in a way that still gives us something meaningful. 
Just for kicks I generated 5 meter contour lines, but I will not share the image - they were way too dense to have any meaningful value. And it makes sense - after all we are looking at individual trees here. 
One interesting observation: in the lower part the density of the points is clearly higher. It could be that different original datasets were used (the LIDAR data may have been acquired in several passes).

Experiments

The original data is in feet, in State Plane Coordinate System projection, Washington North Zone, NAD83 datum. We will use feet for this chapter. 

Experiment 1

Minimum values in 10x10 feet cells. The original data has approximately 1 point per 10 square feet, so each 100 square foot cell holds about 10 points and we keep roughly 1 point out of every 10 original ones.
The data set for the small rectangle gets some 13 times smaller (15 to 1.2 megabytes).
It does not help. The minimum points, even with their precise values, are not helpful.
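A minimal sketch of this kind of thinning, assuming the points are already loaded as (x, y, z) tuples in feet. The function name `grid_minimums` and the sample points are made up; this is not the exact code used for the experiment.

```python
import math

def grid_minimums(points, cell_size):
    """Keep only the lowest point in each cell_size x cell_size cell.

    points    -- iterable of (x, y, z) tuples, all in the same unit (feet here)
    cell_size -- cell edge length in that unit
    """
    lowest = {}
    for x, y, z in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        if cell not in lowest or z < lowest[cell][2]:
            lowest[cell] = (x, y, z)
    return sorted(lowest.values())

# Made-up points: the first two share a 10x10 ft cell, the third is alone.
sample = [(1.0, 2.0, 510.0), (8.0, 9.0, 498.0), (14.0, 3.0, 505.0)]
thinned = grid_minimums(sample, 10.0)
```

The surviving points keep their original coordinates rather than being snapped to cell centers, so each one is a real measured return.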

Experiment 2

How about minimum points in 100x100 feet cells? This translates to a 30 meter grid, which is coarser than the USGS topo map (which I don't trust too much - we have been out in the terrain there and the small detail in it is close to imaginary).
There are very few points left, but it is extremely likely that the points found are real points that actually exist in nature.

Experiment 3

How about we grow from experiment 2 by adding points which belong to the same cells but are not much higher than the minimum point? For example, if a point is 100 feet away, it is unlikely to be more than some 50 feet higher. So I experiment with a ~30% maximum grade. To be precise: dh*dh*10<dx*dx+dy*dy for all pairs of points in each cell. Note that the lower part starts looking very nice.


How about dh*dh*20<dx*dx+dy*dy? 
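Both thresholds can be sketched with one function. The inequality is the one given above; everything else is my assumption about how "all pairs of points in each cell" might be enforced: points are visited from lowest to highest and a point is kept only if it passes the test against every point already kept in its cell. The name `grow_from_minimums` and the sample points are made up.

```python
import math
from collections import defaultdict

def grow_from_minimums(points, cell_size, factor):
    """Per cell, keep points that are not too steep relative to kept points.

    A candidate is kept only if dh*dh*factor < dx*dx + dy*dy holds against
    every point already kept in its cell. factor=10 allows a grade of about
    32%, factor=20 about 22%.
    """
    cells = defaultdict(list)
    for x, y, z in points:
        cells[(math.floor(x / cell_size), math.floor(y / cell_size))].append((x, y, z))

    kept = []
    for cell_points in cells.values():
        cell_points.sort(key=lambda p: p[2])  # lowest first
        accepted = [cell_points[0]]           # the cell minimum is always kept
        for x, y, z in cell_points[1:]:
            if all((z - az) ** 2 * factor < (x - ax) ** 2 + (y - ay) ** 2
                   for ax, ay, az in accepted):
                accepted.append((x, y, z))
        kept.extend(accepted)
    return sorted(kept)

# Made-up points in one 100x100 ft cell: ground, gently higher ground, a treetop.
sample = [(10.0, 10.0, 500.0), (60.0, 10.0, 510.0), (20.0, 10.0, 560.0)]
result = grow_from_minimums(sample, 100.0, 10)
```

In this sample the treetop at 560 ft is rejected, because it is 60 ft higher than a ground point only 10 ft away, while the point 10 ft higher and 50 ft away survives. Since the grade limit is 1/sqrt(factor), raising the factor from 10 to 20 only tightens it from about 32% to about 22%.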


Not much of a difference, but could it be somewhat usable?
I will merge this data with the original data from the place. This will fill holes and probably provide some other points.  

Update

Google has started to use some lidar data (some time last year) in the Tiger Mt. region: google maps link. The same horrible source!!!