Avalanche Data Part I: Collecting and Analyzing Public Information for Patterns

It has been a goal of mine for a while to collect and analyze publicly available avalanche data to discover patterns and raise awareness. My home state of Colorado is a disastrous combination of climate, tourists, newcomers, and testosterone junkies of varying IQ who give little to no thought before jumping onto our Continental slopes. The result can be as many as 20 fatalities each winter season. While the availability of public information is appalling, I did manage to wrangle together a large database with hundreds of incidents, millions of weather records, and a variety of locations across many different states.

As of today, this data is available via POST request or by visiting my website. Be kind, as I will ban anyone bogging down my small pool of resources and may even retaliate. Use Wireshark or Firebug to decipher the request format. The port in the URL will hopefully go away once I set up Apache; port forwarding is not allowed by my hosting service, and I had to use a bizarre Tomcat install, missing its configuration file, to work with authbind.

My goals for this project were simple: establish an automated pipeline for data collection, ETL, and combining the data into a relational database. That database is then analyzed with a mix of open source tools and custom code, using Apache Commons Math for clustering and other statistical analysis.
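Since the moving parts are mostly glue, here is a rough sketch of what a single scrape-transform-load pass looks like. The report URL, the JDBC connection string, and the table schema below are placeholders rather than my actual configuration.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class CollectionPipelineSketch {

        public static void main(String[] args) throws Exception {
            // Extract: pull one page of raw incident data (the URL is a placeholder).
            StringBuilder raw = new StringBuilder();
            BufferedReader in = new BufferedReader(new InputStreamReader(
                    new URL("http://example.org/avalanche-report.csv").openStream()));
            String line;
            while ((line = in.readLine()) != null) {
                raw.append(line).append('\n');
            }
            in.close();

            // Transform and load: split each CSV row, drop malformed rows, and insert
            // the rest into a relational table (connection string and schema are placeholders).
            Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost/avy", "user", "pass");
            PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO avalanche (occurred_on, location, size) VALUES (?, ?, ?)");
            for (String row : raw.toString().split("\n")) {
                String[] fields = row.split(",");
                if (fields.length < 3) {
                    continue; // simple ETL filter for mis-reported rows
                }
                ps.setString(1, fields[0]);
                ps.setString(2, fields[1]);
                ps.setString(3, fields[2]);
                ps.executeUpdate();
            }
            ps.close();
            conn.close();
        }
    }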

Attributes I Needed and What I Found

I wanted prediction, so I needed everything from crystal type to weather patterns. Avalanche type, crown, base layer type, depth, path size, destructive force, terrain traps, and a variety of other factors are important. Running regressions on what I did receive showed previous storms, terrain traps, and the base layer to be the most important factors for size and destructive force.
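The regressions themselves are nothing fancy. Below is a rough sketch of the kind of ordinary least squares fit involved, using Commons Math's OLSMultipleLinearRegression; the handful of rows shown are made up purely to illustrate the input format, not values from my database.

    import org.apache.commons.math3.stat.regression.OLSMultipleLinearRegression;

    public class FactorRegressionSketch {

        public static void main(String[] args) {
            // Each row: recent storm total (in), terrain trap present (0/1), weak base layer (0/1).
            // These rows are illustrative only.
            double[][] predictors = {
                {12.0, 1, 1},
                { 3.0, 0, 0},
                { 8.0, 1, 0},
                {15.0, 0, 1},
                { 5.0, 1, 1},
                {20.0, 1, 1}
            };
            // Destructive size for each row, again illustrative.
            double[] size = {2.5, 1.0, 2.0, 3.0, 1.5, 3.5};

            OLSMultipleLinearRegression ols = new OLSMultipleLinearRegression();
            ols.newSampleData(size, predictors);

            // beta[0] is the intercept; the rest line up with the predictor columns.
            double[] beta = ols.estimateRegressionParameters();
            System.out.println("storm total:  " + beta[1]);
            System.out.println("terrain trap: " + beta[2]);
            System.out.println("base layer:   " + beta[3]);
        }
    }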

However, this data was dirty, cleanable only at great expense, and somewhat unreliable. Only two sites reliably reported crown depth, width, or even base layer. Agencies are likely not forthcoming with this information since it relates directly to sales revenue.

Only the weather data, which can be acquired from many government sites, was forthcoming.

Web Framework

I decided to deploy my WAR to Tomcat, with Spring handling the web layer. This is only my second attempt at Spring. Tomcat is an incredibly fast servlet container, as evidenced by my site. Spring is a decent framework for handling requests, requiring much less code once set up than most other frameworks. In particular, the request mapping is of significant benefit. POST requests are handled with Gson; a short sketch of that follows the mapping below.

Below is a basic request mapping:

    @RequestMapping(value = "/", method = RequestMethod.GET)
    public String home(Locale locale, Model model) {
        // Grab the caller's IP address from the current request.
        ServletRequestAttributes requestAttributes =
                (ServletRequestAttributes) RequestContextHolder.currentRequestAttributes();
        String ip = requestAttributes.getRequest().getRemoteAddr();

        // Localhost shows up as ::1 or 127.0.0.1, so substitute a usable address here.
        if (ip.contains("0:0:0:0:0:0:0:1") || ip.contains("127.0.0.1")) {
            // ip detection
        }

        // Wire up the beans defined in avy.xml.
        ClassPathXmlApplicationContext ctx = new ClassPathXmlApplicationContext("avy.xml");

        // Resolve the caller's coordinates from the GeoIP database.
        GeoDBPath gpath = (GeoDBPath) ctx.getBean("GPS");
        GeoServicesDB geodb = GeoServicesDB.getInstance(gpath.getFpath());
        ArrayList<Double> coords = geodb.getGPS(ip);

        double latitude = coords.get(0);
        double longitude = coords.get(1);

        // Look up the avalanches near those coordinates.
        AvyDAO avyobjs = (AvyDAO) ctx.getBean("AvyDAO");
        List<Avalanche> avs = avyobjs.findAll(latitude, longitude);

        // Build the HTML table of incidents, alternating row styles.
        String outhtml = "";
        // table head
        int i = 0;
        for (Avalanche av : avs) {
            if ((i % 2) == 0) {
                // even table row
            } else {
                // odd table row
            }
            i++;
        }
        // end table

        model.addAttribute("avyTable", outhtml.replaceAll("null", ""));
        return "home";
    }
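The POST side is not shown above, so here is a minimal sketch of how a Gson-backed POST mapping might look. The /avalanches path and the AvyQuery holder are illustrative stand-ins, not the actual endpoints on my site.

    // Illustrative holder for the fields expected in the JSON body.
    public class AvyQuery {
        double latitude;
        double longitude;
    }

    @RequestMapping(value = "/avalanches", method = RequestMethod.POST)
    @ResponseBody
    public String avalanches(@RequestBody String body) {
        // Deserialize the JSON request body with Gson.
        Gson gson = new Gson();
        AvyQuery query = gson.fromJson(body, AvyQuery.class);

        ClassPathXmlApplicationContext ctx = new ClassPathXmlApplicationContext("avy.xml");
        AvyDAO avyobjs = (AvyDAO) ctx.getBean("AvyDAO");
        List<Avalanche> avs = avyobjs.findAll(query.latitude, query.longitude);

        // Serialize the matching incidents straight back to JSON for the caller.
        return gson.toJson(avs);
    }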

The Tools That Worked

Standard deviations and other elementary statistics can be handled with custom code. Fast clustering algorithms, and more complex math that benefits from an efficient implementation, are well covered by Apache Commons Math.
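As an example of the custom side, a small helper like the following covers most of the elementary statistics I need; anything heavier goes to Commons Math.

    // Sample standard deviation of a set of values, the kind of small
    // routine that is easier to write by hand than to pull in a library for.
    public static double standardDeviation(double[] values) {
        double mean = 0;
        for (double v : values) {
            mean += v;
        }
        mean /= values.length;

        double sumSquares = 0;
        for (double v : values) {
            sumSquares += (v - mean) * (v - mean);
        }
        return Math.sqrt(sumSquares / (values.length - 1));
    }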

Clustering is of particular interest. Commons Math does not include affinity propagation, but it does have a quick k-means clusterer, which is a downer when you want to discover patterns without knowing the relations in advance. However, a reasonable cluster count can be estimated using sqrt(n/2) centroids, which is roughly the number affinity propagation often chooses. With this method, it is possible to obtain decent relations in the time it takes to process a simple POST request.
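Here is roughly how the sqrt(n/2) trick looks with the Commons Math k-means++ clusterer. This sketch assumes the ml.clustering package from Commons Math 3.2, and the feature rows are whatever attributes are being clustered.

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.commons.math3.ml.clustering.CentroidCluster;
    import org.apache.commons.math3.ml.clustering.DoublePoint;
    import org.apache.commons.math3.ml.clustering.KMeansPlusPlusClusterer;

    public class AvalancheClusteringSketch {

        // Cluster feature rows (e.g., latitude, longitude, crown depth) using
        // k = sqrt(n / 2) centroids as a stand-in for affinity propagation's choice.
        public static List<CentroidCluster<DoublePoint>> cluster(double[][] rows) {
            List<DoublePoint> points = new ArrayList<DoublePoint>();
            for (double[] row : rows) {
                points.add(new DoublePoint(row));
            }

            int k = Math.max(1, (int) Math.sqrt(points.size() / 2.0));
            KMeansPlusPlusClusterer<DoublePoint> clusterer =
                    new KMeansPlusPlusClusterer<DoublePoint>(k, 100);
            return clusterer.cluster(points);
        }
    }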

The Collection Process

Data collection resulted in an automated set of interrelated scrapers, ETL processes, and triggers. Scrapers were set up for nearly every reporting site available, which meant the American Northwest, Alaska, California, and British Columbia were the only regions available for collection. The Colorado Avalanche Information Center and Utah's avalanche center were the best in terms of data, with Utah providing a wide range of attributes. This data was fed to weather-collecting scrapers and finally to an ETL process. I wrapped the entire process in a Spring program and configuration file.

The Final Product and v2.0

My final product is a site that delivers reports on incidents, weather, and other key factors, along with the ability to cluster what little complete data there is in your region. A heat map and a Google map show the incident locations. I hope to include relevant date filters and, eventually, graphs and predictions as the data grows more complete and more numerous. Weather is currently averaged over the two weeks before an avalanche event, though this could grow to accommodate historical patterns. Weather data may be the only solidly available data at present, so weather features will likely arrive sooner than predictive ones.
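For clarity, the averaging is nothing more than a windowed mean over the readings from the 14 days leading up to the event. A trimmed-down sketch, with illustrative field names:

    import java.util.List;

    public class WeatherWindowSketch {

        // One weather reading: how many days before the avalanche it was taken,
        // and the value (snowfall, temperature, wind) being averaged.
        public static class Reading {
            final int daysBefore;
            final double value;

            public Reading(int daysBefore, double value) {
                this.daysBefore = daysBefore;
                this.value = value;
            }
        }

        // Mean of every reading that falls within the two weeks before the event.
        public static double twoWeekAverage(List<Reading> readings) {
            double sum = 0;
            int count = 0;
            for (Reading r : readings) {
                if (r.daysBefore >= 1 && r.daysBefore <= 14) {
                    sum += r.value;
                    count++;
                }
            }
            return count == 0 ? 0 : sum / count;
        }
    }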

A Plea for Better Data From Avalanche Centers and Why No Predictions are Available

In the end, I was appalled by the lack of data. People die because they know nothing of the conditions generating avalanches. I myself have felt the force of a concussion wave rippling my clothes from half a mile away. This must end. Selling data should not take precedence over safety. My site suffers at the moment from poor reporting, a lack of publicly available clean data, and simple mis-reportings not caught in ETL. My data set actually shrank in cleaning from 1,000 to 300 avalanches across the entire Northwest.

Still, weather data was incredibly public. The Natural Resources Conservation Service, which sells data, is a powerful tool when combined with the data sets from the National Oceanic and Atmospheric Administration and Air Force weather stations.

Overall, I can only provide public clustering because of this poor data. Over time, this may change as clusters become more distinct and patterns and predictions more powerful. However, at this point I would feel personally responsible for someone's untimely end. I have tried running multiple regression on this topic before, but the results were dismal. While better than in 2011, the data sets still need improvement.

The Future

I have no intention of stopping collection, and I will document my development work on the project here. I also plan to document any attempts to develop a device that uses the data it collects, along with my weather and/or other data, to make predictions about the snowpack.
