Data Quality - Garbage In Garbage Out GIGO

Zinhle35 16 views 11 slides Mar 01, 2025

About This Presentation

A presentation about GIS data quality


Slide Content

GiGo “Fast is fine, but accuracy is everything.”

Data Quality: How good is our data? Importance of Data Quality

How Good is our Data?
  Scale: ratio of distance on a map to the equivalent distance on the earth's surface
    Primarily an output issue; at what scale do I wish to display?
  Precision or Resolution: the exactness of measurement or description
    Determined by input; can output at lower (but not higher) resolution
  Accuracy: the degree of correspondence between data and the real world
    Fundamentally controlled by the quality of the input
  Lineage: the original sources for the data and the processing steps it has undergone
  Currency: the degree to which data represents the world at the present moment in time
  Documentation or Metadata: data about data, recording all of the above
  Standards: common or “agreed-to” ways of doing things
    Data built to standards is more valuable since it’s more easily shareable

Accuracy
  Positional Accuracy (sometimes called Quantitative accuracy)
    Spatial
      horizontal accuracy: distance from true location
      vertical accuracy: difference from true height
    Temporal
      difference from actual time and/or date
  Attribute Accuracy or Consistency (the validity concept in experimental design/statistical inference)
    a feature is what the GIS/map purports it to be
      a railroad is a railroad, and not a road
      a soil sample agrees with the type mapped
  Completeness (the reliability concept from experimental design/statistical inference)
    Are all instances of a feature the GIS/map claims to include, in fact, there?
    Partially a function of the criteria for including features: when does a road become a track?
    Simply put, how much data is missing?
  Logical Consistency: whether contradictory relationships are present in the database
    Non-Spatial
      Some crimes recorded at place of occurrence, others at place where report taken
      Data for one country is for 2000, for another it’s for 2001
      Annual data series not taken on same day/month etc. (sometimes called lineage error)
      Data uses different source or estimation technique for different years (again, lineage)
    Spatial
      Overshoots and gaps in road networks or parcel polygons
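As an illustration of the positional accuracy and completeness ideas above, here is a minimal Python sketch. It assumes planar coordinates in a single projected system and a set of trusted reference points. The function names and sample values are hypothetical and are not part of the original slides.

```python
import math

def horizontal_errors(measured, reference):
    """Distance of each measured (x, y) point from its true (reference) location."""
    return [
        math.hypot(mx - rx, my - ry)
        for (mx, my), (rx, ry) in zip(measured, reference)
    ]

def rmse(errors):
    """Root-mean-square error, a common single-number summary of positional error."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def completeness(recorded_ids, expected_ids):
    """Fraction of the features that should be present which are actually recorded."""
    expected = set(expected_ids)
    return len(expected & set(recorded_ids)) / len(expected)

# Hypothetical sample: two surveyed points compared with their true positions (metres).
errors = horizontal_errors([(3.0, 4.0), (10.0, 10.0)], [(0.0, 0.0), (10.0, 10.0)])
print(errors, rmse(errors))

# Hypothetical sample: two of four expected features are present in the database.
print(completeness({"a", "b"}, {"a", "b", "c", "d"}))
```

The same pattern extends to vertical accuracy by comparing heights instead of planar coordinates.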

Sources of Error
  Error is the inverse of accuracy. It is a discrepancy between the coded and actual values.
  Sources
    Inherent instability of the phenomenon itself
      E.g. random variation of most phenomena (e.g. leaf size)
    Measurement
      E.g. surveyor or instrument error
    Model used to represent data
      E.g. choice of spheroid, or classification systems
    Data encoding and entry
      E.g. keying or digitizing errors
    Data processing
      E.g. single versus double precision; algorithms used
    Propagation or cascading from one data set to another
      E.g. using inaccurate layer as source for another layer
  Example for Positional Accuracy
    choice of spheroid and datum
    choice of map projection and its parameters
    accuracy of measured locations (surveying) of features on earth
    media stability (stretching, folding, wrinkling of maps, photos)
    human drafting, digitizing or interpretation error
    resolution and/or accuracy of drafting/digitising equipment
      Thinnest visible line: 0.1-0.2 millimeters
      At scale of 1:20,000 = 6.6 - 13.1 feet (20,000 x 0.2 mm = 4,000 mm = 4 m = 13.1 feet)
    registration accuracy of tics
    machine precision: coordinate rounding error in storage and manipulation
    other unknown
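The line-width arithmetic above can be checked with a short calculation. This is a minimal Python sketch; the function names are hypothetical, and it assumes the usual conversion of 3.28084 feet per metre.

```python
MM_PER_METER = 1000.0
FEET_PER_METER = 3.28084  # standard conversion factor

def ground_distance_m(map_mm, scale_denominator):
    """Ground distance in metres covered by map_mm millimetres on a 1:scale_denominator map."""
    return map_mm * scale_denominator / MM_PER_METER

def ground_distance_ft(map_mm, scale_denominator):
    """Same ground distance expressed in feet."""
    return ground_distance_m(map_mm, scale_denominator) * FEET_PER_METER

# Thinnest visible line (0.1 to 0.2 mm) drawn on a 1:20,000 map.
for width_mm in (0.1, 0.2):
    metres = ground_distance_m(width_mm, 20_000)
    feet = ground_distance_ft(width_mm, 20_000)
    print(f"{width_mm} mm at 1:20,000 covers {metres:.1f} m ({feet:.1f} ft)")
```

A 0.2 mm line at 1:20,000 therefore hides about 4 m (roughly 13.1 ft) of ground, so no feature digitised from such a map can be located more precisely than that.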

Currency: Is my data “up-to-date”?
  data is always relative to a specific point in time, which must be documented
  there are important applications for historical data (e.g. analysing trends), so don’t necessarily trash old data
  “current” data requires a specific plan for on-going maintenance
    may be continuous, or at pre-defined points in time
    otherwise, data becomes outdated very quickly
  currency is not really an independent quality dimension; it is simply a factor contributing to lack of accuracy regarding
    consistency: some GIS features do not match those in the real world today
    completeness: some real world features are missing from the GIS database
  Many organisations spend substantial amounts acquiring a data set without giving any thought to how it will be maintained.
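One way to make a maintenance plan checkable is to record the date each data set was last updated and compare it with an agreed maximum age. The sketch below is a hypothetical Python illustration; the function name, the 365-day threshold and the sample dates are assumptions, not part of the original slides.

```python
from datetime import date

def is_stale(last_updated, max_age_days, today):
    """True if the data set is older than the agreed maintenance interval."""
    return (today - last_updated).days > max_age_days

# Hypothetical layers with their documented "as of" dates.
layers = {
    "water_mains": date(2024, 1, 1),
    "valves": date(2025, 1, 1),
}

check_date = date(2025, 3, 1)
for name, last_updated in layers.items():
    status = "needs update" if is_stale(last_updated, 365, check_date) else "current"
    print(f"{name}: last updated {last_updated.isoformat()}, {status}")
```

The check only works if the "as of" date is documented for every data set, which is why currency belongs in the metadata.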

Standards: common “agreed-to” ways of doing things
  May exist for:
    Data itself [including process (the way it’s produced) and product (the outcome)]
      Utilities Data Content Standard, FGDC-STD-010-2000
    Accuracy of data
      Geospatial Positioning Accuracy Standard, Part 3, National Standard for Spatial Data Accuracy, FGDC-STD-007.3-1998
    Documentation about the data (metadata)
      Content Standard for Digital Geospatial Metadata (version 2.0), FGDC-STD-001-1998
    Transfer of data and its documentation
      Spatial Data Transfer Standard (SDTS), FGDC-STD-002
    Symbology and presentation
      Digital Geologic Map Symbolization
  May address:
    Content (what is recorded)
    Format (how it’s recorded: file format, .tif, shapefile, etc.)
  May be a product of:
    An organization’s internal actions [private or organization standards]
    An external government body (Federal Geographic Data Committee) or third sector body (Open GIS Consortium) [public or de jure standards]
    Laissez-faire market-place forces leading to one dominant approach, e.g. “Wintel standard” [industry or de facto standards]
  http://www.fgdc.gov/standards/standards.html

Adopting Standards: What you should do
  Data quality is achieved by adoption and use of standards: Do it!
    Common ways of doing things are essential for using & sharing data internally and externally
    only federal agencies are required to use FGDC standards; it’s optional for any others (e.g. state, local)
    power of feds often results in adoption by everybody, although there are some noted failures (e.g. the OSI, GOSIP, & POSIX standards in computing in the 1980s failed and were withdrawn)
    FGDC or ISO standards provide an excellent starting point for local standards, and should be adopted unless there are compelling reasons otherwise
  Standards for metadata (“documenting your data”) are the most important and should be first priority.
    Content Standard for Digital Geospatial Metadata (version 2.0), FGDC-STD-001-1998
    ISO Document 19115, Geographic Information: Metadata (content), and 19139, Geographic Information: Metadata: Implementation Specification (format for storing ISO 19115 metadata in XML)
    If not one of these standards for metadata, adopt some standard!

Importance of Data Quality - Water Utilities
  Data Access: decision making
  Data Integration: customer information, billing, hydraulic modelling
  Data Usage
  Data Content

QA/QC
  Do not confuse the two
  Introduced into Workflows
  Ensure proactiveness

Quality Assurance Techniques
  Geodatabase (multiuser): pros over shapefile
    Encapsulates data and business rules to prevent editing mistakes, e.g. by not allowing invalid attributes and by enforcing network connectivity and relationships (topology)
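To illustrate what "not allowing invalid attributes" means in practice, here is a minimal Python sketch of attribute rules checked before a record is accepted. It is not a geodatabase API: the field names, allowed materials and diameter range are hypothetical stand-ins for the coded-value and range rules a geodatabase would enforce.

```python
# Hypothetical business rules for a water pipe feature class.
ALLOWED_MATERIALS = {"PVC", "steel", "ductile iron"}  # assumed list of valid values
DIAMETER_RANGE_MM = (15, 2000)                        # assumed valid range

def validate_pipe(record):
    """Return a list of rule violations; an empty list means the record passes."""
    problems = []
    material = record.get("material")
    if material not in ALLOWED_MATERIALS:
        problems.append(f"invalid material: {material!r}")
    diameter = record.get("diameter_mm")
    low, high = DIAMETER_RANGE_MM
    if not isinstance(diameter, (int, float)) or not (low <= diameter <= high):
        problems.append(f"diameter out of range: {diameter!r}")
    return problems

print(validate_pipe({"material": "PVC", "diameter_mm": 110}))
print(validate_pipe({"material": "wood", "diameter_mm": 5}))
```

Rejecting a record at edit time is quality assurance; finding the same problem later in an audit is quality control.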