Database Structures – Relational, Object Oriented – ER diagram - spatial data models – Raster Data Structures – Raster Data Compression - Vector Data Structures - Raster vs Vector Models - TIN and GRID data models - OGC standards - Data Quality.
Added: Sep 09, 2021
Slides: 137 pages
OCE 552 - Geographic Information System
UNIT II SPATIAL DATA MODELS 9 Database Structures – Relational, Object Oriented – ER diagram - spatial data models – Raster Data Structures – Raster Data Compression - Vector Data Structures - Raster vs Vector Models - TIN and GRID data models - OGC standards - Data Quality.
Database Structures
A geodatabase can be designed for single or multiple users. A single-user database can be a personal geodatabase or a file geodatabase. A personal geodatabase stores data as tables in a Microsoft Access database. A file geodatabase, on the other hand, stores data as many small binary files in a folder.
The geodatabase organizes vector data sets into feature classes and feature datasets. In a geodatabase, feature classes can be standalone feature classes or members of a feature dataset.
The presence of feature attribute and nonspatial data tables means that a GIS requires a database management system (DBMS) to manage these tables. A DBMS is a software package that enables us to build and manipulate a database. A DBMS provides tools for data input, search, retrieval, manipulation, and output. For example, ArcGIS uses Microsoft Access for managing personal geodatabases.
Many GIS packages also have database connection capabilities to access remote databases. This is important for GIS users who routinely access data from centralized databases. For example, GIS users at a ranger district office may regularly retrieve data maintained at the headquarters office of a national forest. This scenario represents a client-server distributed database system.
THE RELATIONAL MODEL: A database is a collection of interrelated tables in digital format. At least four types of database designs have been proposed in the literature: flat file, hierarchical, network, and relational.
A flat file contains all data in a large table. A feature attribute table is like a flat file. A hierarchical database organizes its data at different levels and uses only the one-to-many association between levels. A network database builds connections across tables. A common problem with both the hierarchical and the network database designs is that the linkages (i.e., access paths) between tables must be known in advance and built into the database at design time.
GIS packages, both commercial and open source, typically use the relational model for database management. A relational database is a collection of tables, also called relations, that can be connected to each other by keys. A primary key represents one or more attributes whose values can uniquely identify a record in a table. A foreign key is one or more attributes that refer to a primary key in another table.
In GIS, the primary key and the foreign key often have the same name, such as the feature ID. In that case, the feature ID is also called the common field. In the figure, Zonecode is the common field connecting zoning and parcel, and PIN (parcel ID number) is the common field connecting parcel and owner. When used together, the two fields can relate zoning and owner.
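The way common fields link tables can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table and field names Zonecode and PIN follow the text, while the column names Zoning, Acres and Owner_name and all the row values are hypothetical.

```python
import sqlite3

# In-memory database with three tables linked by common fields.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE zoning (Zonecode INTEGER PRIMARY KEY, Zoning TEXT);
CREATE TABLE parcel (PIN TEXT PRIMARY KEY, Acres REAL,
                     Zonecode INTEGER REFERENCES zoning(Zonecode));
CREATE TABLE owner  (PIN TEXT REFERENCES parcel(PIN), Owner_name TEXT);
""")

# Hypothetical sample records.
conn.executemany("INSERT INTO zoning VALUES (?, ?)",
                 [(1, "Residential"), (2, "Commercial")])
conn.executemany("INSERT INTO parcel VALUES (?, ?, ?)",
                 [("P101", 1.0, 1), ("P102", 3.0, 2)])
conn.executemany("INSERT INTO owner VALUES (?, ?)",
                 [("P101", "Owner A"), ("P102", "Owner B")])

# PIN relates owner to parcel; Zonecode relates parcel to zoning.
rows = conn.execute("""
    SELECT owner.Owner_name, parcel.PIN, zoning.Zoning
    FROM owner
    JOIN parcel ON owner.PIN = parcel.PIN
    JOIN zoning ON parcel.Zonecode = zoning.Zonecode
    ORDER BY parcel.PIN
""").fetchall()
print(rows)
```

Using both keys in one query relates zoning to owner even though those two tables share no field directly.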
Normalization: Normalization is a process of decomposition, taking a table with all the attribute data and breaking it down into small tables while maintaining the necessary linkages between them. Normalization is designed to achieve the following objectives:
• To avoid redundant data in tables
• To ensure that attribute data in separate tables can be maintained and updated separately and can be linked whenever necessary
• To facilitate a distributed database
The map shows four land parcels with the PINs P101, P102, P103, and P104.
Table 2.1 Unnormalised Table
Table 2.2 First Normalisation
Fig 2.4 Second Normalisation
Fig 2.5 Final Normalised Table
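The decomposition can be sketched in a few lines of Python. The contents of the tables above are not reproduced in the text, so the records below are hypothetical; only the PINs P101 to P104 come from the source. The sketch splits one redundant table into three smaller ones and then checks that joining them recovers the original records.

```python
# Hypothetical unnormalized records: one row per parcel-owner pair,
# so zoning text and zone codes are repeated.
unnormalized = [
    {"PIN": "P101", "Owner": "Owner A", "Zonecode": 1, "Zoning": "Residential"},
    {"PIN": "P101", "Owner": "Owner B", "Zonecode": 1, "Zoning": "Residential"},
    {"PIN": "P102", "Owner": "Owner C", "Zonecode": 2, "Zoning": "Commercial"},
    {"PIN": "P103", "Owner": "Owner C", "Zonecode": 2, "Zoning": "Commercial"},
    {"PIN": "P104", "Owner": "Owner A", "Zonecode": 1, "Zoning": "Residential"},
]

# Decompose into three small tables linked by PIN and Zonecode.
zoning = {r["Zonecode"]: r["Zoning"] for r in unnormalized}       # Zonecode -> Zoning
parcel = {r["PIN"]: r["Zonecode"] for r in unnormalized}          # PIN -> Zonecode
parcel_owner = sorted({(r["PIN"], r["Owner"]) for r in unnormalized})

# Re-link the tables through the keys to rebuild the original rows.
rejoined = [
    {"PIN": pin, "Owner": name, "Zonecode": parcel[pin], "Zoning": zoning[parcel[pin]]}
    for pin, name in parcel_owner
]
```

Each zoning description is now stored once, so it can be updated in a single place.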
Types of Relationship: A relational database may contain four types of relationships, or cardinalities, between tables or, more precisely, between records in tables: one-to-one, one-to-many, many-to-one, and many-to-many.
Four types of data relationships between tables
OBJECT ORIENTED DATABASE STRUCTURE: An object-based spatial database is a spatial database that stores locations as objects. The object-based spatial model treats the world as a surface littered with recognizable objects (e.g. cities, rivers), which exist independent of their locations. Objects can be as simple as polygons and lines, or more complex, to represent cities.
A field-based data model, by contrast, sees the world as a continuous surface over which features (e.g. elevation) vary. With an object-based spatial database, it is easier to store additional attributes with the objects, such as direction and speed. The geodatabase model supports an object-oriented vector data model. In this model, entities are represented as objects with properties, behaviour, and relationships. These object types include simple objects, geographic features (objects with location), network features (objects with geometric integration with other features), annotation features, and other more specialized feature types.
Classes, Methods and Relationships: Each data model object is essentially an instance of a class. Classes are object-oriented constructs that group objects sharing the same set of attributes and methods. Methods are the functions that define the interaction of objects with the outside world. In addition to describing objects, their attributes and their behaviours, a data model also explains the relationships between classes.
An example of a class is a Line feature, and one of its instances might be a river. Attribute fields of the river line are an integer identifier, the number of line segments, and the start and end points of each segment. Calculating total flow volume from the river dimension attributes is an example of a method for the river object. In order to account for flow and interactions between each river segment and the watershed, and also to streamline query and storage, (topological) relationships between classes need to be defined.
The three main relationships between classes that have been implemented in the design of the hydrologic data model are Generalization, Association and Aggregation. A generalization relationship between any two classes means that one of the classes (child class) is derived from the other (base class). Association shows the relationship between instances of classes. Aggregation is a whole-part relationship in which an instance of one class is made up of instances of other classes.
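A minimal Python sketch of these ideas follows. The class names Feature, LineFeature and Watershed, and the length method, are assumptions chosen for illustration; they are not the classes of the hydrologic data model itself, and segment length stands in for the flow-volume method mentioned in the text.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Feature:
    """Base class: every feature carries an integer identifier."""
    feature_id: int


@dataclass
class LineFeature(Feature):
    """Generalization: LineFeature is a child class derived from Feature."""
    vertices: List[Tuple[float, float]]

    def num_segments(self) -> int:
        return len(self.vertices) - 1

    def length(self) -> float:
        # Method: behaviour computed from the object's own attributes.
        return sum(math.dist(a, b) for a, b in zip(self.vertices, self.vertices[1:]))


@dataclass
class Watershed(Feature):
    """Aggregation: a watershed is made up of river line features."""
    rivers: List[LineFeature]

    def total_river_length(self) -> float:
        return sum(river.length() for river in self.rivers)


# A river is one instance of the LineFeature class (coordinates are made up).
river = LineFeature(feature_id=1, vertices=[(0, 0), (3, 4), (3, 10)])
basin = Watershed(feature_id=100, rivers=[river])
```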
Spatial object Class Inheritance Hierarchy
ENTITY RELATIONSHIP MODEL (ER MODEL): The entity relationship (ER) model represents the conceptual design of a database. The ER diagram helps in understanding the components of a database and the relationships among them. Entity Record: An entity is a real-world item that exists on its own. The set of all possible values for an entity is the entity type. For example, a particular student such as 'Ravi Kumar' is an entity record. Student is the entity type in this case. In an ER diagram, an entity type is shown as a rectangle containing the type name.
Attribute: Properties that describe an entity are known as its attributes. The value of an attribute can be expressed in numbers or in text. In an ER diagram, attributes are represented by ovals attached to the entity by a line.
Attributes can be classified as:
Key attributes: Attributes whose values are distinct for each individual entity record, and which are used to identify an individual entity record, are known as key attributes. For example, in the student entity type, StudentID is the key attribute since no two students can have the same StudentID. A key attribute is underlined in an ER diagram.
Non-key attributes: Attributes that are not unique but are used to describe the entities are known as non-key attributes. The name, age and address of a student are non-key attributes.
Simple: Attributes that cannot be divided into subparts are called simple attributes. For example, StudentID, which is just a number, is a simple attribute.
Composite: Attributes that can be divided into subparts, each with its own independent meaning, are composite attributes. For example, the name of a student can be divided into two parts, first name and last name. This can be illustrated by branching off the components of the attribute.
Single valued: Attributes that can hold only a single value at a time are called single valued attributes. The age of a student cannot have more than one value, and hence it is a single valued attribute.
Multiple valued: Attributes that can have more than one value are called multiple valued attributes. For example, the contact number of a student can hold two or more phone numbers. In an ER diagram, a multiple valued attribute is shown as a double oval.
Derived attributes: Attributes that are derived using a mathematical formula and operations on other attributes are called derived attributes.
Stored attributes: Attributes from which other attributes can be derived are called stored attributes. The age of a student can be calculated by counting the number of years from the date of birth to the present date. In this case, age is the derived attribute and date of birth is the stored attribute. In an ER diagram, a derived attribute is represented with a dotted oval and a line.
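The attribute categories can be illustrated with a small Python class. This is a sketch under assumptions: the field names are hypothetical, and the date of birth and phone numbers are invented for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Student:
    student_id: int             # key attribute, simple
    first_name: str             # composite attribute "name", part 1
    last_name: str              # composite attribute "name", part 2
    date_of_birth: date         # stored attribute
    contact_numbers: List[str]  # multiple valued attribute

    def age(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth, never stored."""
        years = today.year - self.date_of_birth.year
        # Subtract one year if this year's birthday has not happened yet.
        if (today.month, today.day) < (self.date_of_birth.month, self.date_of_birth.day):
            years -= 1
        return years


student = Student(1, "Ravi", "Kumar", date(2000, 6, 15), ["555-0100", "555-0101"])
```

Because age is computed on demand, it cannot drift out of step with the stored date of birth.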
Relationship: A relationship is an association among entity types. It is represented as a diamond in an ER diagram. For example, an entity 'student' can be associated with another entity 'class', with 'Attends' as the relationship between the two entities. The degree of a relationship type is the number of participating entity types. This example has degree 2 and is therefore a binary relationship.
Cardinality: Cardinality denotes the occurrences of data on either side of a relation. The cardinality ratio for a binary relationship specifies the maximum number of relationship instances an entity can participate in. A one-to-one relationship indicates that a single instance of one entity is associated with a single instance of the related entity.
A one-to-many or a many-to-one relationship indicates that a single instance of one entity is associated with one or more instances of the related entity.
A many-to-many relationship indicates that either entity participating in the relationship may have many instances.
Example: The diagram shown below represents the academic functioning of a college. There are five entities, viz. Department, Faculty, Student, Course, and Hostel. All five entities have their own attributes. DNumber, FacultyID, StudentID, CourseID, and HostelID are the key attributes of Department, Faculty, Student, Course and Hostel respectively.
ER-Diagram showing academic functioning of a college
Spatial Data Model
Vector Data Structures
Vector data structure: Geographic entities encoded using the vector data model are often called features. The features can be divided into two classes: a. Simple features b. Topological features
a. Simple features: These are easy to create and store, and are rendered on screen very quickly. They lack connectivity relationships and so are inefficient for modeling phenomena conceptualized as fields.
Point entities: These represent all geographical entities that are positioned by a single XY coordinate pair. Along with the XY coordinates, the point must store other information, such as what the point represents.
Line entities: Linear features made by tracing two or more XY coordinate pairs.
Simple line: It requires a start and an end point.
Arc: A set of XY coordinate pairs describing a continuous complex line. The shorter the line segments and the higher the number of coordinate pairs, the more closely the chain approximates a complex curve.
Simple polygons: Enclosed structures formed by joining a set of XY coordinate pairs.
b. Topological features: A topology is a mathematical procedure that describes how features are spatially related and ensures data quality of the spatial relationships. Topological relationships include the following three basic elements:
I. Connectivity: Information about linkages among spatial objects
II. Contiguity: Information about neighbouring spatial objects
III. Containment: Information about inclusion of one spatial object within another spatial object
Connectivity: Arc-node topology defines connectivity.
1. Arcs are connected to each other if they share a common node. This is the basis for many network tracing and path finding operations.
2. Arcs represent linear features and the borders of area features.
3. Every arc has a from-node, which is the first vertex in the arc, and a to-node, which is the last vertex.
Arc-node Topology
Nodes can, however, be used to represent point features which connect segments of a linear feature (e.g., intersections connecting street segments, valves connecting pipe segments).
Node showing intersection
Arc-Node Topology with list: Arc-node topology is supported through an arc-node list. For each arc in the list there is a from-node and a to-node. Connected arcs are determined by common node numbers.
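Finding connected arcs from an arc-node list can be sketched as follows. The arc and node numbers are hypothetical, and the function name connected_arcs is an assumption made for this illustration.

```python
from collections import defaultdict

# Hypothetical arc-node list: arc id -> (from-node, to-node).
arc_node_list = {
    1: (11, 12),
    2: (12, 13),
    3: (12, 14),
    4: (15, 16),
}


def connected_arcs(arc_node_list):
    """Return, for each arc, the set of arcs sharing a node with it."""
    arcs_at_node = defaultdict(set)
    for arc, (from_node, to_node) in arc_node_list.items():
        arcs_at_node[from_node].add(arc)
        arcs_at_node[to_node].add(arc)

    neighbours = {arc: set() for arc in arc_node_list}
    for arcs in arcs_at_node.values():
        for arc in arcs:
            neighbours[arc] |= arcs - {arc}
    return neighbours


connectivity = connected_arcs(arc_node_list)
```

Here arcs 1, 2 and 3 all meet at node 12, while arc 4 shares no node with the others. Network tracing operations build on exactly this lookup.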
Contiguity: Polygon topology defines contiguity. Polygons are said to be contiguous if they share a common arc. Contiguity allows the vector data model to determine adjacency.
The from-node and to-node of an arc indicate its direction, which helps determine the polygons on its left and right sides. In the illustration, polygon B is on the left and polygon C is on the right of arc 4. Polygon A is outside the boundary of the area covered by polygons B, C and D. It is called the external or universe polygon.
Containment Geographic features cover distinguishable area on the surface of the earth. The polygons can be simple or they can be complex with a hole or island in the middle. In the illustration given below assume a lake with an island in the middle. The lake actually has two boundaries, one which defines its outer edge and the other (island) which defines its inner edge.
Polygon D is made up of arcs 5, 6 and 7. The 0 before the 7 indicates that arc 7 creates an island in the polygon.
Polygons are represented as an ordered list of arcs and not in terms of X, Y coordinates. This is called polygon-arc topology. Since arcs define the boundary of a polygon, arc coordinates are stored only once, thereby reducing the amount of data and ensuring that the boundaries of adjacent polygons do not overlap.
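Adjacency can be derived from a left-right list, as in the sketch below. Only the entry for arc 4 (polygon B on the left, polygon C on the right) and the universe polygon A come from the text; the other arcs and the function name adjacent_polygons are hypothetical.

```python
# Left-right list: arc id -> (left polygon, right polygon).
# Arc 4 follows the text; the remaining entries are made up for illustration.
left_right = {
    1: ("A", "B"),
    2: ("A", "C"),
    3: ("A", "D"),
    4: ("B", "C"),
    5: ("C", "D"),
}


def adjacent_polygons(left_right, universe="A"):
    """Return the pairs of polygons that share an arc, ignoring the universe polygon."""
    pairs = set()
    for left, right in left_right.values():
        if universe not in (left, right) and left != right:
            pairs.add(frozenset((left, right)))
    return pairs


adjacency = adjacent_polygons(left_right)
```

No coordinate comparison is needed: two polygons are contiguous as soon as they appear on opposite sides of the same arc.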
Polygon as a topological feature
Raster Data Structures
Raster Data Compression
Raster Vs Vector Models
Comparison between Vector and Raster Data Models
Raster, advantages:
• Simple data structure
• Compatible with remote sensing or scanned data
• Spatial analysis is easier
• Simulation is easy because each unit has the same size and shape
Raster, disadvantages:
• Cell size determines the resolution at which the data is represented
• Requires a lot of storage space
• Projection transformations are time consuming
• Network linkages are difficult to establish
Vector, advantages:
• Data is represented at its original resolution and form without generalization
• Requires less storage space
• Editing is faster and more convenient
• Network analysis is fast
• Projection transformations are easier
Vector, disadvantages:
• The location of each vertex has to be stored explicitly
• Overlay based on criteria is difficult
• Spatial analysis is cumbersome
• Simulation is difficult because each unit has a different topological form
Raster Data Compression: Data compression refers to the reduction of data volume, a topic particularly important for data delivery and Web mapping. Data compression is related to how raster data are encoded. Quadtree and run-length encoding (RLE), because of their efficiency in data encoding, can also be considered data compression methods.
A variety of techniques are available for data compression. They can be lossless or lossy. A lossless compression preserves the cell or pixel values and allows the original raster or image to be precisely reconstructed. RLE is an example of lossless compression.
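A minimal sketch of run-length encoding shows why it is lossless. It assumes the simple (value, run length) form applied to one raster row; GIS packages may instead record the start and end columns of each run. The function names are hypothetical.

```python
def rle_encode(row):
    """Encode one raster row as a list of (value, run_length) pairs."""
    runs = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([value, 1])   # start a new run
    return [tuple(run) for run in runs]


def rle_decode(runs):
    """Rebuild the original row exactly from its runs."""
    row = []
    for value, count in runs:
        row.extend([value] * count)
    return row


row = [0, 0, 0, 1, 1, 0]
encoded = rle_encode(row)      # [(0, 3), (1, 2), (0, 1)]
decoded = rle_decode(encoded)  # identical to row
```

The saving grows with the length of the runs, so RLE works best on rasters with large uniform areas.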
A lossy compression cannot fully reconstruct the original image but can achieve higher compression ratios than a lossless compression. Lossy compression is therefore useful for raster data that are used as background images rather than for analysis.
Newer image compression techniques can be both lossless and lossy. An example is MrSID (Multi-resolution Seamless Image Database), patented by LizardTech Inc.
MrSID uses the wavelet transform for data compression. Wavelet-based compression is also used by JPEG 2000 and ECW (Enhanced Compressed Wavelet). The wavelet transform treats an image as a wave and progressively decomposes the wave into simpler wavelets.
Using a wavelet (mathematical) function, the transform repetitively averages groups of adjacent pixels (e.g., 2, 4, 6, 8, or more) and, at the same time, records the differences between the original pixel values and the average. The differences, also called wavelet coefficients, can be 0, greater than 0, or less than 0.
Using the Haar function, we take the average of each pair of adjacent pixels. For an original pixel string of (1, 3, 7, 9, 8, 8, 6, 2), the averaging results in the string (2, 8, 8, 4) and retains the quality of the original image at a lower resolution. But if the process continues, the averaging results in the string (5, 6) and loses the darker center in the original image.
Suppose that the process stops at the string (2, 8, 8, 4). The wavelet coefficients, each computed as the first pixel of a pair minus the pair's average, will be −1 (1 − 2), −1 (7 − 8), 0 (8 − 8), and 2 (6 − 4). Discarding the coefficients gives a lossy result. If, however, a lossless compression is needed, we can use the coefficients to reconstruct the original image: adding a coefficient to its average gives the first pixel of the pair, and subtracting it gives the second. For example, 2 + (−1) = 1 (the first pixel), 2 − (−1) = 3 (the second pixel), and so on.
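The averaging and reconstruction steps can be checked with a short Python sketch. The pixel string (1, 3, 7, 9, 8, 8, 6, 2) is inferred from the averages and coefficients quoted above, and the function names haar_step and haar_inverse are hypothetical. This is a one-dimensional illustration, not the full transform used by MrSID or JPEG 2000.

```python
def haar_step(pixels):
    """One Haar step: pairwise averages plus (first pixel - average) coefficients."""
    firsts, seconds = pixels[0::2], pixels[1::2]
    averages = [(a + b) / 2 for a, b in zip(firsts, seconds)]
    coefficients = [a - avg for a, avg in zip(firsts, averages)]
    return averages, coefficients


def haar_inverse(averages, coefficients):
    """Rebuild each pixel pair as (average + coefficient, average - coefficient)."""
    pixels = []
    for avg, coef in zip(averages, coefficients):
        pixels.extend([avg + coef, avg - coef])
    return pixels


original = [1, 3, 7, 9, 8, 8, 6, 2]
averages, coefficients = haar_step(original)   # (2, 8, 8, 4) and (-1, -1, 0, 2)
coarser, _ = haar_step(averages)               # (5, 6)
restored = haar_inverse(averages, coefficients)
```

Keeping the coefficients makes the step reversible; dropping or rounding them trades exact reconstruction for a smaller file.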
The UTM (Universal Transverse Mercator) system is a system of coordinates that describes position on a map
TIN and GRID Models
Triangulated Irregular Network (TIN): A surface representation derived from irregularly spaced points and breakline features. Each sample point has an x, y coordinate and a z-value or surface value. A TIN can be created using the following triangulation methods:
• Delaunay triangulation method
• Important points method
• Adaptive densification
Delaunay Triangulation Method: A TIN represents a surface as contiguous, non-overlapping triangles created by performing Delaunay triangulation. These triangles have the unique property that the circumcircle passing through the vertices of a triangle contains no other point inside it. This topological data structure manages information about the nodes that form each triangle and the neighbors of each triangle.
Delaunay Triangulation Method
Advantages of Delaunay triangulation:
• The triangles are as equiangular as possible, thus reducing potential numerical precision problems created by long skinny triangles
• The triangulation is independent of the order in which the points are processed
• It ensures that any point on the surface is as close as possible to a node
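The empty-circumcircle property can be tested directly. The sketch below uses the standard determinant form of the in-circle test. The four points and both candidate triangulations are hypothetical, and the function names are assumptions. It checks a given triangulation but does not construct one.

```python
def in_circumcircle(a, b, c, p):
    """True if point p lies strictly inside the circle through a, b and c."""
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    # The sign of det depends on vertex order; multiply by the orientation
    # so the test works for clockwise and counterclockwise triangles.
    orientation = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
    return det * orientation > 0


def is_delaunay(points, triangles):
    """Check that no triangle's circumcircle contains another point."""
    for i, j, k in triangles:
        for n, p in enumerate(points):
            if n not in (i, j, k) and in_circumcircle(points[i], points[j], points[k], p):
                return False
    return True


# Four hypothetical points forming a kite; triangles are index triples.
points = [(0, 0), (2, -1), (4, 0), (2, 1)]
short_diagonal = [(0, 1, 3), (1, 2, 3)]   # split along the short diagonal
long_diagonal = [(0, 1, 2), (0, 2, 3)]    # split along the long diagonal
```

Splitting the kite along its short diagonal satisfies the property, while the long diagonal produces skinny triangles whose circumcircles swallow the opposite vertex.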
The TIN model is a vector data model which is stored using relational attribute tables. A TIN dataset contains three basic attribute tables:
• Arc attribute table, which contains the length, from-node and to-node of all the edges of all the triangles
• Node attribute table, which contains the x, y coordinates and z (elevation) of the vertices
• Polygon attribute table, which contains the areas of the triangles, the identification numbers of the edges and the identifiers of the adjacent polygons
As a TIN stores topological relationships, the datasets can be applied to vector-based geoprocessing such as automatic contouring, 3D landscape visualization, volumetric design, and surface characterization.
A triangulated irregular network (TIN) approximates the terrain with a set of non-overlapping triangles. Each triangle in the TIN assumes a constant gradient. Flat areas of the land surface have fewer but larger triangles, whereas areas with higher variability in elevation have denser but smaller triangles. The TIN is commonly used for terrain mapping and analysis, especially for 3-D display.
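Because each triangle has a constant gradient, the elevation at any location inside it lies on the plane through its three nodes. A minimal sketch using barycentric weights follows; the node coordinates and the function name interpolate_z are hypothetical.

```python
def interpolate_z(triangle, x, y):
    """Elevation at (x, y) on the plane through three (x, y, z) nodes."""
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = triangle
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    # Barycentric weights: each node's share of the interpolated value.
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1 - w1 - w2
    if min(w1, w2, w3) < 0:
        raise ValueError("point lies outside the triangle")
    return w1 * z1 + w2 * z2 + w3 * z3


# Hypothetical triangle whose surface is the plane z = 10 + x + 2y.
triangle = [(0, 0, 10), (10, 0, 20), (0, 10, 30)]
z_mid = interpolate_z(triangle, 5, 5)   # 25.0
```

A full TIN query would first locate the triangle containing the point and then apply this interpolation.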
Important Points Method: The Extract Important Points method creates vector points from raster elevation data. Points are created automatically for cell values at regular grid intersections or at cells that mark significant changes in surface elevation, depending on the chosen point extraction method.
Adaptive Densification Method: It is used to create TIN objects using raster surface data as the input object. This process iteratively inserts nodes inside existing triangles at the location of maximum surface deviation from the plane of the triangle.
Grid vs TIN
Features:
• TIN: Represents features more accurately. Flow directions can be arbitrary.
• Grid: Flow directions are restricted to grid points. There are only 8 possible flow directions.
Advantages:
• TIN: Ability to describe the surface at different levels of resolution. Efficiency in storing data.
• Grid: Easy to store and manipulate. Easy integration with raster databases. Smoother, more natural appearance of derived terrain features.
Disadvantages:
• TIN: In many cases requires visual inspection and manual control of the network.
• Grid: Inability to use different grid sizes to reflect areas of different complexity of relief.
OGC Standards: The Open Geospatial Consortium (OGC) is a not-for-profit organisation focused on developing and defining open standards for the geospatial community to allow interoperability between various software and data services.
OGC Interoperable Sectors
Data Quality: In GIS, data quality is used to give an indication of how good data are. It describes the overall fitness or suitability of data for a specific purpose, or is used to indicate data free from errors and other problems. Examining issues such as error, accuracy, precision and bias can help to assess the quality of individual data sets.
Data sets used for analysis need to be complete, compatible and consistent, and applicable for the analysis being performed. Flaws in data are usually referred to as errors. Error is the physical difference between the real world and the GIS facsimile. A more systematic error would have occurred if the co-ordinates for all the ski lift stations in the data set had been entered in (y, x) order instead of (x, y).
Accuracy is the extent to which an estimated data value approaches its true value. If a GIS database is accurate, it is a true representation of reality. It is impossible for a GIS database to be 100 per cent accurate, though it is possible to have data that are accurate to within specified tolerances. For example, a ski lift station co-ordinate may be accurate to within plus or minus 10 metres.
Precision is the recorded level of detail of your data. A co-ordinate in metres given to 12 decimal places is more precise than one given to three decimal places. Computers store data with a high level of precision, though a high level of precision does not imply a high level of accuracy.
The difference between accuracy and precision is important and is explained in the box, which shows the results produced by four contestants in a shooting competition.
Bias in GIS data is the systematic variation of data from reality. Bias is a consistent error throughout a data set. A consistent overshoot in digitized data caused by a badly calibrated digitizer, or the consistent truncation of the decimal points from data values by a software program, are possible examples.
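The distinction between bias, precision and accuracy can be put into numbers. In the sketch below, repeated measurements of one ski lift station co-ordinate are compared with its true value. All the values and the function name error_summary are hypothetical. Bias is taken as the mean error, precision as the spread of the measurements, and overall accuracy as the root mean square error (RMSE).

```python
import statistics


def error_summary(measured, true_value):
    """Return (bias, spread, rmse) for repeated measurements of one value."""
    errors = [m - true_value for m in measured]
    bias = statistics.mean(errors)          # systematic offset from reality
    spread = statistics.pstdev(measured)    # low spread = high precision
    rmse = (sum(e * e for e in errors) / len(errors)) ** 0.5
    return bias, spread, rmse


true_x = 100.0  # hypothetical true co-ordinate, in metres

# Precise but biased: every reading overshoots by the same 10 m.
biased = error_summary([110.0, 110.0, 110.0, 110.0], true_x)

# Unbiased but less precise: readings scatter around the true value.
scattered = error_summary([95.0, 105.0, 98.0, 102.0], true_x)
```

The first data set agrees perfectly with itself yet is 10 m wrong everywhere, which is the pattern a badly calibrated digitizer would produce.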
Resolution and generalization are two important issues that may affect the representation of features in a GIS database. In raster GIS, resolution is determined by cell size. For example, for a raster data set with a 20-metre cell size, only those features that are 20 × 20 metres or larger can be distinguished.
The figure allows comparison of a 25-metre resolution vegetation map with a 5-metre resolution aerial photograph of the same area.
Resolution is dependent on the scale of the original map, the point size and line width of the features represented thereon and the precision of digitizing. Generalization is the process of simplifying the complexities of the real world to produce scale models and maps. Cartographic generalization is a subject in itself and is the cause of many errors in GIS data derived from maps.