Friday, November 7, 2014

Data Normalization, Geocoding, and Error Assessment

Goals and Objectives:

The purpose of this lab was to become familiar with normalizing data tables. Data about frac sand mines were obtained from the Wisconsin DNR. The addresses were not recorded in a consistent format: different address types were listed, and some mines had no address at all. The goal was to normalize the table so that the locations of the selected mines could be geocoded.

Methods:

Geocoding the mines was a two-step process. First, the tables had to be normalized; once they were normalized, geocoding could take place. Many of the addresses had the information needed for geocoding to succeed on the first attempt. The few locations that failed required further searching. The PLSS data provided for each of those mines was used to find the general area of the mine. Once the area was located, aerial photographs were used to locate the mine on a map. A nearby street could then be identified and an address written for that location.

Results:

Each student was given a selected set of mines to geocode. Each student's mapped mines were then compared with the results from the rest of the class. The purpose of this exercise was to show the variation in where the points were mapped, based on the data the students received. Emphasis was placed on the importance of data quality and standard formatting to eliminate or reduce error.

The table below (Figure 1.1) shows the data table that the students received from the Wisconsin DNR.
This figure illustrates the inconsistent format of the mine addresses. Without normalizing the table, ArcGIS would not be able to geocode these locations. Each student had to manipulate the data manually, creating new fields to hold the separate parts of each address.

Figure 1.1 shows the data table before it is normalized. 


The table below (Figure 2.1) shows the data table after it has been normalized. The address field from Figure 1.1 was broken down into the separate fields needed to geocode the addresses: PLSS, State, City, Zip Code, and Street. These new fields allowed geocoding to occur.

Figure 2.1 shows the data table after normalization has occurred.
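The lab did this normalization by hand in ArcGIS, but the idea of splitting one address field into separate fields can be sketched in Python. This is only an illustration under stated assumptions: the function name `normalize_address`, the comma-separated "street, city, state zip" layout, and the PLSS pattern (such as "T21N R8W") are all hypothetical and are not taken from the actual DNR table.

```python
import re

# Assumed patterns, for illustration only.
ZIP_RE = re.compile(r"\b(\d{5})(?:-\d{4})?\b")               # 5-digit zip, optional +4
PLSS_RE = re.compile(r"\bT\d+N\s*R\d+[EW]\b", re.IGNORECASE)  # e.g. "T21N R8W"


def normalize_address(raw):
    """Split one free-form address string into separate fields."""
    fields = {"Street": "", "City": "", "State": "", "Zip": "", "PLSS": ""}
    text = " ".join(raw.split())  # collapse repeated whitespace

    # A PLSS description is not a street address, so keep it in its own field.
    if PLSS_RE.search(text):
        fields["PLSS"] = text
        return fields

    parts = [p.strip() for p in text.split(",") if p.strip()]
    if len(parts) >= 3:
        fields["Street"] = parts[0]
        fields["City"] = parts[1]
        tail = parts[2]
        zip_match = ZIP_RE.search(tail)
        if zip_match:
            fields["Zip"] = zip_match.group(1)
            tail = tail[:zip_match.start()].strip()
        fields["State"] = tail.upper()
    else:
        # Too little structure to split safely; leave for manual review.
        fields["Street"] = text
    return fields
```

Records that do not fit the assumed layout are left in the Street field unchanged, which mirrors the lab: those rows still needed to be checked and fixed by hand.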


Figure 3.1 shows the final map of frac sand mines.




The map (Figure 3.1) shows where the other students' mines fall in relation to where my mines were mapped. Because the mines do not completely overlap, error clearly occurred in the process of normalizing the data table. Since there was no single consistent format, the same mines ended up in different locations.







The table below (Figure 4.1) shows the distances between my mines and the mines that the other students mapped. The Point Distance tool was used to measure the distance between the different mines. As the table shows, the distances between them were large. This illustrates just how inaccurate data can be without consistent formats.

Figure 4.1 shows the final table of distances between the mines mapped.
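The kind of comparison summarized in Figure 4.1 can be sketched outside ArcGIS as a straight-line distance between two versions of the same mine. This is a simplified stand-in, not the Point Distance tool itself: it assumes both point sets are already in a projected coordinate system with linear units, and that the points can be paired by a shared mine ID, which is a hypothetical field here.

```python
import math


def point_distances(my_points, other_points):
    """Return the straight-line distance, in map units, for each shared mine ID.

    Both arguments map a mine ID to an (x, y) pair in the same projected
    coordinate system. IDs missing from other_points are skipped.
    """
    distances = {}
    for mine_id, (x1, y1) in my_points.items():
        if mine_id in other_points:
            x2, y2 = other_points[mine_id]
            distances[mine_id] = math.hypot(x2 - x1, y2 - y1)
    return distances
```

A distance of zero would mean two students placed the mine at exactly the same spot; larger values point to differences in how the address was normalized or located.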

Discussion:

There are two types of error associated with geographic data: inherent and operational. Inherent error arises from the nature of what geographic data represents. No representation is ever perfect, so there will always be a slight amount of inherent error. This type of error can also occur when data is transferred between storage devices or when the coordinate system is changed. This applies directly to this project. The data was stored in a coordinate system with units of decimal degrees, which are not a convenient unit for measuring distance. The data had to be projected into a new coordinate system so that the distances could be measured in a more recognizable unit.
The second type of error is operational. These errors typically occur while geographic data is being collected or used. They are also sometimes called processing errors because they appear while the data is being worked with.
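The point about decimal degrees in the discussion of inherent error can be illustrated with a short calculation. A degree of longitude does not cover a fixed distance on the ground; it shrinks toward the poles. The sketch below assumes a spherical Earth with a mean radius of 6,371,000 m, which is a simplification, and the function name is only illustrative.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean radius of a spherical Earth


def meters_per_degree_longitude(latitude_deg):
    """Approximate ground length of one degree of longitude at a latitude."""
    return math.radians(1) * EARTH_RADIUS_M * math.cos(math.radians(latitude_deg))


# One degree of longitude at the equator versus at roughly 44 degrees north,
# an assumed latitude for western Wisconsin.
equator = meters_per_degree_longitude(0.0)
wisconsin = meters_per_degree_longitude(44.0)
print(f"equator: {equator:.0f} m, 44 N: {wisconsin:.0f} m")
```

Because the same difference in degrees corresponds to different ground distances depending on latitude, distances are easier to interpret once the data is projected into a coordinate system with linear units such as meters.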

Conclusion:

This lab showed the importance of normalizing a data table and formatting the data properly. If there is no standard procedure for storing data, it can create problems. People who access the data may not realize how accurate or inaccurate it is. Without standards to follow, it will be hard to eliminate error or reduce it to a minimum.


Sources:

Wisconsin DNR
Trempealeau County Data


