visualizing a month of lightning

23 Mar 2015

Weather Decision Technologies has access to some fascinating weather data. One of the datasets I enjoy playing around with is lightning. We ingest around 2,000 lightning strikes per minute. While the processing of that data is interesting, I wanted to attempt to visualize the sheer volume of lightning strikes. I pulled down an archive of lightning strikes from May 2013 and created a map of 80,305,421 lightning strikes.

In October of last year, MapBox open-sourced a tool called tippecanoe. Tippecanoe builds vector tilesets from large collections of GeoJSON features. This was one of the tools behind the most detailed tweet map ever.

So I have a large dataset of lightning strikes combined with a tool that takes large collections of GeoJSON features. Match made in heaven, right? Our lightning provider has an archive available to us, broken out into gzipped CSV files. These daily archive files range in size from around 200MB to 2GB. I pulled down a couple of months’ worth of archive files (~50GB) and started to tinker.

My first step was to take those files and create some GeoJSON. I took the quick and dirty approach and brute-force converted ~180M strikes from compressed CSV to single-day GeoJSON files. The GeoJSON was not valid (to spec), but tippecanoe requires an array of features.
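The conversion step could look something like the sketch below. This is not the script used for the map, just a minimal illustration. The column names `latitude` and `longitude` are hypothetical, since the provider's CSV layout isn't described here, and the function names are made up for the example. It writes a bare JSON array of Point features, one file per day.

```python
import csv
import gzip
import json


def strikes_to_features(csv_path, lat_col="latitude", lon_col="longitude"):
    """Yield one GeoJSON Point feature per row of a gzipped CSV file.

    The column names are hypothetical; adjust them to the real archive format.
    """
    with gzip.open(csv_path, mode="rt", newline="") as handle:
        for row in csv.DictReader(handle):
            yield {
                "type": "Feature",
                "geometry": {
                    "type": "Point",
                    # GeoJSON orders coordinates as [longitude, latitude].
                    "coordinates": [float(row[lon_col]), float(row[lat_col])],
                },
                # Properties are left empty in this sketch.
                "properties": {},
            }


def write_feature_array(csv_path, json_path):
    """Stream features into a bare JSON array and return the feature count."""
    count = 0
    with open(json_path, "w") as out:
        out.write("[\n")
        for feature in strikes_to_features(csv_path):
            if count:
                out.write(",\n")
            out.write(json.dumps(feature))
            count += 1
        out.write("\n]\n")
    return count
```

Streaming row by row keeps memory use flat, which matters when a single day can be a couple of gigabytes.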

So here comes the work of tippecanoe. It has plenty of other options available, but the following command is what I settled on to create this dataset:

cat json/201305* | tippecanoe -z15 -g2 -o may-2013-ltg.mbtiles

So the waiting game began. It doesn’t take too long (5 or 6 hours) to process with 8GB of memory, but of course the more you have, the faster it is. I would not recommend limping in using 4GB of memory (it’s just not going to happen).

So that worked, and I tweeted out a picture of it:

But there were some weird things that I couldn’t get past. The data was concentrated in the US Midwest, but that was not the only area in the world with a large number of lightning strikes during May 2013. Tippecanoe’s progressive disclosure was too aggressive for areas where the lightning was less concentrated.

ltg-diff

I started tweaking the settings and ran a single day (~4M strikes) through tippecanoe to make sure I could correct the problem. I landed on a pretty good combination that lets the features still come out at low zoom levels without going over some limits that MapBox has put in place (mbtiles size limit, number of features within a vector tile, and number of composite datasets, to name a few).

cat json/201305* | tippecanoe -z14 -g3 -r1.25 -X -o may-2013-ltg.mbtiles
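To get a feel for why a lower `-r` value helps the sparse areas, here is a rough sketch of the arithmetic. It assumes a simplified model in which each zoom level below the maximum keeps 1/rate of the points from the level above it, and that the first run used tippecanoe's documented default rate of 2.5 (no `-r` was passed). It ignores the `-g` gamma thinning entirely, so treat the numbers as indicative only. The function names are made up for this example.

```python
def kept_fraction(zoom, max_zoom, rate):
    """Approximate fraction of points retained at `zoom` under the simple model."""
    return 1.0 / (rate ** (max_zoom - zoom))


def expected_points(total, zoom, max_zoom, rate):
    """Approximate number of points that survive at `zoom`."""
    return total * kept_fraction(zoom, max_zoom, rate)


TOTAL_STRIKES = 80_305_421  # strike count quoted at the top of the post

for zoom in (0, 4, 8, 12):
    # First run: -z15 with the assumed default drop rate of 2.5.
    first_run = expected_points(TOTAL_STRIKES, zoom, max_zoom=15, rate=2.5)
    # Second run: -z14 with -r1.25.
    second_run = expected_points(TOTAL_STRIKES, zoom, max_zoom=14, rate=1.25)
    print(f"z{zoom}: first run ~{first_run:,.0f} points, second run ~{second_run:,.0f} points")
```

Under this model the first run keeps roughly one point in 930,000 at zoom 0, while `-r1.25` keeps about one in 23, which is why strikes outside the densest regions stop disappearing at low zoom.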

Now I have a huge .mbtiles file sitting here that needs to get itself to MapBox. The folks at MapBox created a nice tool to help me along my way: mapbox-upload. It comes with both a JavaScript API and a CLI.

export MapboxAccessToken={sk.secret-key-from-your-mb-account}

mapbox-upload {mb-account-name}.{mapid} {filename}.mbtiles
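Before running an upload like the one above, it can be handy to check the file against the size limits mentioned earlier. An .mbtiles file is a SQLite database, and the MBTiles spec defines a `tiles` table (or view) with `zoom_level` and `tile_data` columns, so a small sketch like this can report the tile count and the largest tile per zoom level. The function name is made up for this example, and no specific MapBox limit is assumed.

```python
import sqlite3


def tile_stats_by_zoom(mbtiles_path):
    """Return {zoom: (tile_count, largest_tile_bytes)} for an MBTiles file."""
    conn = sqlite3.connect(mbtiles_path)
    try:
        rows = conn.execute(
            # LENGTH() on a BLOB returns its size in bytes.
            "SELECT zoom_level, COUNT(*), MAX(LENGTH(tile_data)) "
            "FROM tiles GROUP BY zoom_level ORDER BY zoom_level"
        ).fetchall()
    finally:
        conn.close()
    return {zoom: (count, largest) for zoom, count, largest in rows}
```

Calling `tile_stats_by_zoom("may-2013-ltg.mbtiles")` and scanning for unusually large tiles at low zoom levels is a quick way to spot trouble before waiting on a long upload.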

After that, I moved on to styling. In taking Eric Fischer’s advice from his tweet map blog post, I used the colorize-alpha image-filter. Then it was just small tweaks until I was happy with the output for each zoom level. My weekend involved a lot of this view:

ltg-studio-style

Once it was all said and done, it made for a beautiful map to highlight the scale of just one single dataset that Weather Decision Technologies provides to its customers. If you’re interested in lightning or any other weather data, hit Renny Vandewege up at rvandewege@wdtinc.com

lightning in West Africa: null-island-ltg

lightning in the Mediterranean: med-ltg

lightning in Southeast Asia: se-asia-ltg