Unit 4 Data Editing.pptx

e20ag004, 72 slides, Dec 13, 2023

About This Presentation

Remote sensing


Slide Content

AI8702 REMOTE SENSING AND GEOGRAPHICAL INFORMATION SYSTEM Unit 4 Data Input and Analysis

Data Editing in GIS Editing geographic data is the process of creating, modifying, or deleting features and related data on layers in a map. Each layer is connected to a data source that defines and stores the features; this is typically a geodatabase feature class or a feature service.

Data Editing in GIS A Geographic Information System (GIS) represents real-world conditions with the aid of a computer and is a tool for analyzing spatial problems. To do this it needs data, which may be spatial or non-spatial. These data may contain errors, either inherited from the original source or introduced during encoding. Before the data are processed, it is essential to identify and eliminate these errors; otherwise they will contaminate the GIS database.

Data Editing in GIS The pre-processing of GIS data, i.e. data editing, can be grouped into the following:
1. Detecting and correcting errors
2. Reprojection, transformation and generalization
3. Edge matching and rubber sheeting

1. Detecting and Correcting Errors Errors affect the quality of GIS data. Once data have been collected and prepared for visualization and analysis, they must be checked for errors. Errors in input data derive from three main sources:
(a) Errors in the sources of data: for example, errors in the maps used by surveyors, or printing errors.
(b) Errors resulting from original measurements (while encoding): for example, scanning, digitizing or typing errors.
(c) Errors arising through processing (at the time of transfer and conversion): converting data between different formats can introduce errors and data loss.

Common sources of Errors Old data sources: The data sources used for a GIS project may be too old to use; data collected in the past may not be acceptable for current projects. Lack of data: The data for a given area may be incomplete or entirely lacking. For example, a land-use map for border regions may not be available.

Common sources of Errors Map scale: The detail shown on a map depends on its scale. The project must use maps or data at the scale at which the details are required; using the wrong scale makes the analysis erroneous. Observation: A high density of observations in an area increases the reliability of the data. Insufficient observations may not provide the resolution required for adequate spatial analysis.

Errors resulting from original measurements Positional accuracy: How correctly the positions of geographic features are represented on a map depends on the data being used. Biased fieldwork, improper digitization and scanning errors result in inaccuracies in GIS projects. Content accuracy: Maps must be labeled correctly. Incorrect labeling can introduce errors that may go unnoticed by the user, and any omission from a map or spatial database may result in inaccurate analysis.

Errors arising through processing Numerical errors: Different computers have different capabilities for mathematical operations. Processing errors occur in rounding operations and are subject to the inherent limits of number representation in the processor. Topological errors: The spatial relationships in the data may not be maintained; errors such as dangles, slivers and overlaps are often found in GIS data layers.

Errors arising through processing Dangle: An arc is a dangling arc if it either is not properly connected to another arc (undershoot) or is digitized past its intersection with another arc (overshoot).
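One simple way to find dangles is to count how many arcs meet at each endpoint. An endpoint used by only one arc is a dangling node. The sketch below makes three assumptions: arcs are lists of (x, y) tuples, and endpoints that should connect have exactly equal coordinates (real editing software compares within a snapping tolerance). The name `find_dangles` is a hypothetical helper. Legitimate line ends, such as a cul-de-sac, are also reported and still need manual review.

```python
from collections import Counter

def find_dangles(arcs):
    """Return endpoints that belong to exactly one arc (candidate dangles).

    arcs: list of arcs, each a list of (x, y) tuples.
    Assumes connected arcs share identical endpoint coordinates.
    """
    counts = Counter()
    for arc in arcs:
        counts[arc[0]] += 1   # from-node
        counts[arc[-1]] += 1  # to-node
    return sorted(pt for pt, n in counts.items() if n == 1)

arcs = [
    [(0, 0), (5, 0)],
    [(5, 0), (5, 5)],
    [(5, 5), (0, 5)],
    [(0, 5), (0, 0)],
    [(5, 2), (8, 2)],   # spur starting on an edge but sharing no node with it
]
print(find_dangles(arcs))  # [(5, 2), (8, 2)]
```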

Errors arising through processing Sliver: A sliver is the thin gap (or overlap) created between two adjacent polygons when snapping is not used while creating them. Such errors can be corrected using the constraints or rules defined for the layers. Topology rules define the permissible spatial relationships between features.
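Snapping is what prevents slivers. Each vertex of a newly digitized boundary is moved onto an existing vertex whenever it lies within a tolerance. The sketch below assumes snapping to vertices only (real tools also snap to edges). The helper `snap_vertices` and the tolerance value are hypothetical.

```python
import math

def snap_vertices(vertices, targets, tolerance):
    """Move each vertex onto the nearest target vertex if it lies within
    `tolerance`; otherwise leave it unchanged."""
    snapped = []
    for v in vertices:
        nearest = min(targets, key=lambda t: math.dist(v, t))
        snapped.append(nearest if math.dist(v, nearest) <= tolerance else v)
    return snapped

# Shared boundary of polygon A, and the same boundary re-digitized for polygon B
boundary_a = [(10.0, 0.0), (10.0, 5.0), (10.0, 10.0)]
boundary_b = [(10.04, 0.0), (9.97, 5.02), (10.0, 10.0)]
print(snap_vertices(boundary_b, boundary_a, tolerance=0.1))
# [(10.0, 0.0), (10.0, 5.0), (10.0, 10.0)]
```

Without snapping, the slightly offset copy of the boundary would leave thin sliver polygons between A and B.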

Errors arising through processing Digitizing and geocoding: Many errors arise during digitization, geocoding, overlaying or rasterizing. Errors associated with damaged source maps and digitizing mistakes can be corrected by comparing the original maps with the digitized versions.

Errors arising through processing Raster data editing is concerned with correcting the specific contents of raster images rather than their general geometric characteristics. The objective of editing is to produce an image suitable for raster geoprocessing.
1. Filling holes and gaps: to fill holes and gaps that appear in the raster image.
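Holes in a binary raster can be filled by flood-filling the background from the image border. Any 0-valued cell the flood cannot reach is enclosed by the entity and is therefore a hole. This is a minimal sketch that assumes a binary raster stored as a list of lists and 4-connectivity. `fill_holes` is a hypothetical name; GIS packages may instead use morphological operations for small gaps.

```python
from collections import deque

def fill_holes(grid):
    """Fill enclosed 0-valued holes in a binary raster (list of lists of 0/1).

    Background cells (0) connected to the image border via 4-neighbours
    are kept; every other 0 is an enclosed hole and is set to 1.
    """
    rows, cols = len(grid), len(grid[0])
    outside = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Seed the flood fill with every background cell on the border
    for r in range(rows):
        for c in range(cols):
            if (r in (0, rows - 1) or c in (0, cols - 1)) and grid[r][c] == 0:
                outside[r][c] = True
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and not outside[nr][nc] and grid[nr][nc] == 0):
                outside[nr][nc] = True
                queue.append((nr, nc))
    # Only background reachable from the border stays 0
    return [[0 if outside[r][c] else 1 for c in range(cols)] for r in range(rows)]

raster = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],   # the centre cell is a hole
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
filled = fill_holes(raster)
```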

Errors arising through processing 2. Edge smoothing: to remove or fill single-pixel irregularities in the foreground and background pixels along lines.

Errors arising through processing 3. Deskewing: to rotate the image by a small angle so that it is aligned orthogonally to the x and y axes of the computer screen.

Errors arising through processing 4. Filtering: to remove speckles, i.e. random high- or low-valued pixels, from the image.
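The slide does not name a specific filter. A 3×3 median filter is one common way to remove isolated speckles, because a single outlier never becomes the median of its neighbourhood. This is a minimal pure-Python sketch with a hypothetical function name. It uses `statistics.median_low` so that every output value is an actual input value, which also keeps class codes valid in a classified raster. Border cells use only the neighbours that fall inside the raster.

```python
from statistics import median_low

def median_filter(grid):
    """Apply a 3x3 median filter; border cells use only in-raster neighbours."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            window = [grid[i][j]
                      for i in range(max(0, r - 1), min(rows, r + 2))
                      for j in range(max(0, c - 1), min(cols, c + 2))]
            row.append(median_low(window))
        out.append(row)
    return out

image = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],   # a single bright speckle
    [10, 10, 10, 10],
]
print(median_filter(image))  # every cell becomes 10
```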

Errors arising through processing 5. Clipping and deletion: to create a subset of an image or to remove unwanted pixels.

Errors arising through processing Vector data editing is a post-digitizing process that ensures the data are free from errors. It checks that:
1. Lines intersect properly without any undershoots or overshoots.

Errors arising through processing
2. Nodes are created at all points where lines intersect.
3. All polygons are closed and each contains a label point.
4. The topology of the layer is built.
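Check 3 can be sketched in two steps. First, a ring is closed when its first and last vertices coincide. Second, a ray-casting point-in-polygon test confirms that the label point lies inside the ring. The sketch assumes each polygon is stored as a list of (x, y) vertices with a separately stored label point. The helper names are hypothetical, and points lying exactly on the boundary are not handled specially.

```python
def is_closed(ring):
    """A polygon ring is closed when its first and last vertices coincide."""
    return len(ring) >= 4 and ring[0] == ring[-1]

def contains(ring, point):
    """Ray-casting point-in-polygon test for a closed ring."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through point
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def check_polygon(ring, label_point):
    """Return a list of problems found (an empty list means the checks pass)."""
    problems = []
    if not is_closed(ring):
        problems.append("polygon not closed")
    elif not contains(ring, label_point):
        problems.append("label point outside polygon")
    return problems

square = [(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)]
print(check_polygon(square, (2, 2)))       # []
print(check_polygon(square[:-1], (2, 2)))  # ['polygon not closed']
```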

2. Reprojection, Transformation and Generalization Once spatial and attribute data have been encoded and edited, the data must be processed geometrically to provide a common reference. Data derived from various sources should be converted into a common projection before they are combined and analyzed. If they are not reprojected, data derived from a source map in one projection will not plot at the same locations as data derived from another source in a different projection. Data from different sources may also use different coordinate systems, with different origins, units of measurement and orientations, so they must be transformed into a common grid system. This involves mathematical transformations such as translation, rotation and scaling.
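When only the origin, units and orientation differ, the change of grid system can be sketched as a scale, a rotation and a translation. This applies to planar coordinates only. True reprojection between map projections needs non-linear formulas and is normally done with a library such as PROJ. The function name and the example parameters below are assumptions chosen for illustration.

```python
import math

def transform(points, scale, angle_deg, dx, dy):
    """Scale, rotate (counter-clockwise) and translate (x, y) coordinates.

    x' = s * (x cos a - y sin a) + dx
    y' = s * (x sin a + y cos a) + dy
    """
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    return [(scale * (x * cos_a - y * sin_a) + dx,
             scale * (x * sin_a + y * cos_a) + dy) for x, y in points]

# Hypothetical case: digitizer units are cm on a 1:50,000 sheet (1 cm = 500 m),
# sheet axes rotated 90 degrees from the grid, sheet origin at (400000, 1200000).
pts = transform([(0, 0), (2, 0)], scale=500, angle_deg=90, dx=400000, dy=1200000)
print(pts)
```

In practice the parameters are not known in advance. They are estimated by least squares from control points whose coordinates are known in both systems.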

2. Reprojection, Transformation and Generalization Data may be derived from maps at different scales. Generalization should be applied when comparing data at large and small scales; it also saves processing time and reduces storage space. The simplest method of generalization is to delete the points that lie within a specified interval of the previously retained point, but this does not preserve the shape of the object. Data loss is the main problem when a map is generalized, yet generalization is necessary when comparing maps at different scales. Where data loss is not acceptable, a compaction technique can be used instead to reduce storage space without any loss of data.
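The simplest generalization described above is sketched below, assuming a line stored as a list of (x, y) vertices. Vertices closer than `interval` to the last retained vertex are dropped, and the two end points are always kept. `thin_by_distance` is a hypothetical name. Shape-preserving methods such as Douglas-Peucker simplification exist precisely because this approach ignores shape.

```python
import math

def thin_by_distance(line, interval):
    """Drop vertices closer than `interval` to the previously kept vertex.
    The first and last vertices are always kept."""
    if len(line) <= 2:
        return list(line)
    kept = [line[0]]
    for pt in line[1:-1]:
        if math.dist(pt, kept[-1]) >= interval:
            kept.append(pt)
    kept.append(line[-1])
    return kept

line = [(0, 0), (0.5, 0.2), (1, 0), (1.4, 0.3), (2, 1), (3, 1)]
print(thin_by_distance(line, interval=1.0))  # [(0, 0), (1, 0), (2, 1), (3, 1)]
```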

Methods of generalization

3. Edge Matching and Rubber Sheeting When a study area extends across two or more map sheets, small differences and mismatches may occur. Normally, each map sheet is digitized separately, and adjacent sheets are joined after editing, projection, transformation and generalization. This joining process is known as edge matching. It involves three basic steps: Mismatches at sheet boundaries must be resolved. When the maps are joined, adjacent lines and polygons may not meet; they should be corrected to complete the features and to ensure that the data are topologically correct.


TO AN INTEGRATED GIS DATA BASE An integrated GIS database is prepared from the edited and reprojected data from various sources.

Raster data models The raster data model appears very simple, but even raster data can be stored inside a computer in many different ways. Take the example of a GIS containing two layers: a land-use layer depicting a relatively small number of land uses, each represented by a land-use code (e.g. 1 = Urban, 2 = Forest, 3 = Village), and a transport network layer (e.g. 0 = None, 1 = Road, 2 = Railway). The data could be organized in the computer in any of the following ways.

Raster data models By location (grid cell): lists the data values of every layer for the first grid cell, then for the second cell, and so on. By coverage: stores all the data values for the first coverage (i.e. land use) as a 2D matrix, followed by all the data values for the second coverage. By binary coverage: represents each land use as a separate layer in which 1 indicates presence and 0 indicates absence of that land use. By data value
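The first three organizations can be compared on two tiny layers that use the slide's codes. The 2 × 2 layer values and the helper names are hypothetical. The same information appears in every case; only the storage order differs.

```python
# Two hypothetical 2 x 2 layers using the slide's codes
land_use = [[1, 2], [3, 1]]    # 1 = Urban, 2 = Forest, 3 = Village
transport = [[0, 1], [2, 0]]   # 0 = None, 1 = Road, 2 = Railway

def by_location(*layers):
    """All layer values for cell 1, then all layer values for cell 2, ..."""
    rows, cols = len(layers[0]), len(layers[0][0])
    return [layer[r][c] for r in range(rows) for c in range(cols) for layer in layers]

def by_coverage(*layers):
    """The whole first layer (row by row), then the whole second layer, ..."""
    return [value for layer in layers for row in layer for value in row]

def binary_coverage(layer, code):
    """One presence (1) / absence (0) layer for a single class code."""
    return [[1 if value == code else 0 for value in row] for row in layer]

print(by_location(land_use, transport))  # [1, 0, 2, 1, 3, 2, 1, 0]
print(by_coverage(land_use, transport))  # [1, 2, 3, 1, 0, 1, 2, 0]
print(binary_coverage(land_use, 1))      # [[1, 0], [0, 1]]
```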

Raster data structure In a simple raster data structure, geographical entities are stored in a matrix of rectangular cells. Each cell is given a code that tells users which entity is present in it.

Raster data structure The simplest way of encoding raster data in a computer can be understood as follows: (a) Entity model: represents the whole raster data. Assume that the raster data cover an area where land is surrounded by water. The entity (land) is shown in green, and the area where land is absent is shown in white.

Raster data structure (b) Pixel values: the pixel values for the full image are shown. Cells containing part of the land are encoded as 1, and cells where land is absent are encoded as 0.

Raster data structure (c) File structure: shows how the raster data are coded. The first row of the file states that the image has 5 rows and 5 columns and that 1 is the maximum pixel value. The subsequent rows contain cell values of either 0 or 1 (the same as the pixel values).

Raster data structure The large size of the data is a major problem with raster data. An image with twenty different land-use classes takes the same storage space as a similar raster map showing the location of a single forest. To address this problem, many data compaction methods have been developed, which are discussed below:

Raster data Compression/ compaction 1. Run-length Encoding Reduces data on a row-by-row basis by storing a single value for a group of cells rather than a value for each individual cell. The first line gives the dimensions of the matrix (5×5) and the number of entities (1) present. In the second and subsequent lines, the first number in each pair represents absence (0) or presence (1) of the entity, and the second number gives the number of cells in that run.
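Row-by-row run-length encoding into (value, count) pairs is sketched below, together with the matching decoder. The slide's own figure is not reproduced in the text, so the 5×5 raster below is a made-up example. The header line ("5 5 1": rows, columns, entities) is left out for brevity.

```python
def run_length_encode(grid):
    """Encode each raster row as a list of (value, run_length) pairs."""
    encoded = []
    for row in grid:
        runs = []
        for value in row:
            if runs and runs[-1][0] == value:
                runs[-1][1] += 1          # extend the current run
            else:
                runs.append([value, 1])   # start a new run
        encoded.append([tuple(run) for run in runs])
    return encoded

def run_length_decode(encoded):
    """Expand (value, run_length) pairs back into full rows."""
    return [[value for value, count in runs for _ in range(count)] for runs in encoded]

raster = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
for runs in run_length_encode(raster):
    print(runs)   # first row: [(0, 5)]; second row: [(0, 1), (1, 3), (0, 1)]
```

The 25 cell values shrink to 11 pairs here. The saving grows with the length of homogeneous runs and disappears for noisy, highly varied rows.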


Raster data Compression/ compaction 2. Block Encoding Data are stored as blocks of the raster matrix. The entity is subdivided into hierarchical blocks, and the blocks are located using coordinates. The first cell at the top left-hand corner is used as the origin for locating the blocks.

Raster data Compression/ compaction 3. Chain coding Works by defining the boundary of the entity, i.e. the sequence of cells starting from and returning to a given origin. The direction of travel is specified using numbers (0 = North, 1 = East, 2 = South, 3 = West). The first line states that coding started at cell (4, 2) and that there is only one chain. In the second line, the first number in each pair gives the direction and the second number gives the number of cells lying in that direction.
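The sketch below uses the slide's direction codes to encode a boundary path into (direction, count) pairs and decode it back. It assumes cells are given as (row, column) with rows increasing southwards, and that consecutive boundary cells are 4-adjacent. The slide does not state its coordinate convention. The square chain starting at (4, 2) is hypothetical, since the original figure is not in the text.

```python
# Direction codes from the slide: 0 = North, 1 = East, 2 = South, 3 = West.
# Assumption: cells are (row, col) with rows increasing southwards.
STEPS = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}

def encode_chain(cells):
    """Turn a path of 4-adjacent cells into (direction, count) runs.
    Raises StopIteration if two consecutive cells are not 4-adjacent."""
    runs = []
    for (r1, c1), (r2, c2) in zip(cells, cells[1:]):
        d = next(k for k, step in STEPS.items() if step == (r2 - r1, c2 - c1))
        if runs and runs[-1][0] == d:
            runs[-1][1] += 1
        else:
            runs.append([d, 1])
    return [tuple(run) for run in runs]

def decode_chain(start, runs):
    """Rebuild the visited cells from a start cell and (direction, count) runs."""
    cells = [start]
    for d, count in runs:
        dr, dc = STEPS[d]
        for _ in range(count):
            r, c = cells[-1]
            cells.append((r + dr, c + dc))
    return cells

chain = [(0, 2), (1, 2), (2, 2), (3, 2)]   # N 2, E 2, S 2, W 2
cells = decode_chain((4, 2), chain)
print(cells[-1] == cells[0])  # True: the chain returns to its origin
```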


Raster data Compression/ compaction 4. Quadtrees A raster is divided into a hierarchy of quadrants that are subdivided according to whether their pixels have the same value. Division stops when a quadrant consists entirely of cells with the same value. A quadrant that cannot be subdivided further is called a leaf node.
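The quadtree rule can be sketched recursively. A uniform quadrant becomes a leaf holding its value; otherwise it splits into four children. This is a minimal sketch assuming a square raster whose side is a power of two. The function names and the NW, NE, SW, SE child order are choices made for the example.

```python
def build_quadtree(grid, row=0, col=0, size=None):
    """Recursively split a square 2^n x 2^n raster into quadrants.

    Returns the cell value for a uniform (leaf) quadrant, otherwise a list of
    four children in the order NW, NE, SW, SE.
    """
    if size is None:
        size = len(grid)
    values = {grid[r][c] for r in range(row, row + size) for c in range(col, col + size)}
    if len(values) == 1:
        return values.pop()  # leaf node: quadrant is homogeneous
    half = size // 2
    return [build_quadtree(grid, row, col, half),                # NW
            build_quadtree(grid, row, col + half, half),         # NE
            build_quadtree(grid, row + half, col, half),         # SW
            build_quadtree(grid, row + half, col + half, half)]  # SE

def count_leaves(node):
    """Number of leaf nodes, i.e. values that actually need to be stored."""
    return sum(count_leaves(child) for child in node) if isinstance(node, list) else 1

raster = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
tree = build_quadtree(raster)
print(tree, count_leaves(tree))  # [1, 0, 0, [0, 1, 1, 1]] 7
```

Here 16 cells are represented by 7 leaves. Only the mixed south-east quadrant has to be subdivided down to single cells.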

Raster data Compression/ compaction A satellite or remote sensing image is raster data in which each cell has a value, and together these values form a layer. A raster may have a single layer or multiple layers. In a multi-layer (multi-band) raster, each layer is congruent with all other layers: the layers have identical numbers of rows and columns and the same locations in the plane. A digital elevation model (DEM) is an example of a single-band raster dataset, each cell of which contains one value representing surface elevation.

Raster data Compression/ compaction A single-layer raster can be represented using: a. Two colors (binary): the raster is represented as a binary image with cell values of either 0 or 1, appearing black and white respectively.

Raster data Compression/ compaction Grayscale: Typical remote sensing images are recorded in an 8-bit digital system. A grayscale image is therefore represented in 256 shades of gray, ranging from 0 (black) to 255 (white). However, the human eye cannot distinguish between 256 different shades; it can only interpret about 8 to 16 shades of gray.

Raster data Compression/ compaction A satellite image can have multiple bands, i.e. the scene is captured at different wavelengths (the ultraviolet, visible and infrared portions) of the electromagnetic spectrum. While creating a map, we can display a single band of data or form a color composite from multiple bands. Any three of the available bands can be combined to create an RGB composite. Such composites present more information than a single-band raster.

Raster data Compression/ compaction
Data model: Raster
Advantages | Disadvantages
Simple data structure | Cell size determines the resolution at which the data are represented
Compatible with remote sensing or scanned data | Requires a lot of storage space
Spatial analysis is easier | Projection transformations are time-consuming
Simulation is easy because each unit has the same size and shape | Network linkages are difficult to establish

Raster File Formats
BMP – Bitmap graphics (MS Windows applications)
TIFF – Tagged Image File Format
GeoTIFF – Geographic Tagged Image File Format
GIF – Graphics Interchange Format
JPEG – Joint Photographic Experts Group
PNG – Portable Network Graphics
GRID – Esri (ArcInfo) grid raster format
MrSID – Multi-resolution Seamless Image Database

Vector data model The vector data model is closely linked with the discrete object view. In the vector data model, geographical phenomena are represented in three forms: point, line and polygon. Point: a location described by a single (x, y) coordinate pair at the scale of abstraction. Wells in a village, electricity poles in a town and cities on a world map are examples of spatial features described by points. Note: a city can be marked as a single point on a world map but as a polygon on a state map; scale plays an important role in deciding the geometry of a geographical feature.

Vector data model Line/Arc: an ordered set of (x, y) coordinate pairs forming a linear feature. Roads, railways and telephone cables are examples of spatial features described by lines. Polygon: a set of (x, y) coordinate pairs enclosing a homogeneous area. Land parcels, agricultural farms and water bodies are examples of spatial features described by polygons.

Vector data structure Geographic entities encoded using the vector data model are often called features. Features can be divided into two classes: a. Simple features: these are easy to create and store and are rendered on screen very quickly. They lack connectivity relationships and are therefore inefficient for modeling phenomena conceptualized as fields.

Vector data structure b. Topological features: Topology is a mathematical procedure that describes how features are spatially related and ensures the quality of the spatial relationships in the data. Topological relationships include the following three basic elements:
I. Connectivity: information about linkages among spatial objects.
II. Contiguity: information about neighboring spatial objects.
III. Containment: information about the inclusion of one spatial object within another.

Vector data structure Connectivity Arc-node topology defines connectivity: arcs are connected to each other if they share a common node. This is the basis for many network tracing and path-finding operations. Arcs represent linear features and the borders of area features. Every arc has a from-node, which is its first vertex, and a to-node, which is its last vertex; these two nodes define the direction of the arc. Nodes indicate the endpoints and intersections of arcs. They do not exist independently and therefore cannot be added or deleted except by adding or deleting arcs.

Vector data structure Nodes can, however, be used to represent point features that connect segments of a linear feature (e.g. intersections connecting street segments, valves connecting pipe segments).
(Figures: Arc-node topology; node showing an intersection.)

Vector data structure Arc-node topology is supported through an arc-node list. For each arc in the list there is a from-node and a to-node. Connected arcs are determined by common node numbers.
(Figure: Arc-node topology with list.)
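Connected arcs can be derived from an arc-node list by grouping arcs under each node they touch. The arc-node list below is hypothetical (the slide's figure is not reproduced), and so is the helper name. Arcs 1, 2 and 3 meet at node 2, while arcs 3 and 4 meet at node 4.

```python
from collections import defaultdict

# Hypothetical arc-node list: arc id -> (from-node, to-node)
arc_node_list = {
    1: (1, 2),
    2: (2, 3),
    3: (2, 4),
    4: (4, 5),
}

def connected_arcs(arc_nodes):
    """Return, for each arc, the set of other arcs sharing a node with it."""
    arcs_at_node = defaultdict(set)
    for arc, (from_node, to_node) in arc_nodes.items():
        arcs_at_node[from_node].add(arc)
        arcs_at_node[to_node].add(arc)
    return {arc: (arcs_at_node[f] | arcs_at_node[t]) - {arc}
            for arc, (f, t) in arc_nodes.items()}

print(connected_arcs(arc_node_list))
# {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
```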

Vector data structure Contiguity Polygon topology defines contiguity. Polygons are contiguous if they share a common arc. Contiguity allows the vector data model to determine adjacency.
(Figure: Polygon topology.)

Vector data structure The from-node and to-node of an arc indicate its direction, which helps determine the polygons on its left and right sides. Left-right topology refers to the polygons on the left and right sides of an arc. In the illustration above, polygon B is on the left and polygon C is on the right of arc 4. Polygon A lies outside the area covered by polygons B, C and D. It is called the external or universe polygon and represents the world outside the study area. The universe polygon ensures that every arc always has a defined left and right side.
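Contiguity follows directly from a left-right table: two polygons are neighbours if some arc has one on each side. Only arc 4 (B on the left, C on the right) comes from the slide. The remaining rows are a hypothetical layout in which B, C and D all touch each other and the universe polygon A.

```python
# Hypothetical left-right table: arc id -> (left polygon, right polygon).
# "A" is the universe polygon outside the study area; arc 4 matches the slide.
left_right = {
    1: ("A", "B"),
    2: ("A", "C"),
    3: ("A", "D"),
    4: ("B", "C"),
    5: ("C", "D"),
    6: ("B", "D"),
}

def neighbours(table):
    """Build polygon adjacency: polygons on either side of an arc are contiguous."""
    adj = {}
    for left, right in table.values():
        adj.setdefault(left, set()).add(right)
        adj.setdefault(right, set()).add(left)
    return adj

adj = neighbours(left_right)
print(sorted(adj["B"]))  # ['A', 'C', 'D']
```

A useful side effect is that the neighbours of the universe polygon, `adj["A"]`, are exactly the polygons lying on the edge of the study area.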

Vector data structure Containment Geographic features cover distinguishable areas on the surface of the earth. An area is represented by one or more boundaries defining a polygon. Polygons can be simple, or complex with a hole or island in the middle. In the illustration below, assume a lake with an island in the middle. The lake has two boundaries: one defining its outer edge and the other (the island) defining its inner edge. An island defines the inner boundary of a polygon. Polygon D is made up of arcs 5, 6 and 7; the 0 before the 7 indicates that arc 7 forms an island in the polygon.

Vector data structure Containment Polygon arc topology

Vector data structure Polygons are represented as an ordered list of arcs rather than as X, Y coordinates. This is called polygon-arc topology. Since arcs define the boundaries of polygons, arc coordinates are stored only once, which reduces the amount of data and ensures that the boundaries of adjacent polygons do not overlap.

Vector data structure Simple Features Point entities: represent all geographical entities that are positioned by a single XY coordinate pair. Along with the XY coordinates, a point must store other information, such as what the point represents. Line entities: linear features made by tracing two or more XY coordinate pairs. Simple line: requires a start point and an end point. Arc: a set of XY coordinate pairs describing a continuous complex line. The shorter the line segments and the greater the number of coordinate pairs, the more closely the chain approximates a complex curve.

Vector data structure Simple Polygons: enclosed structures formed by joining a set of XY coordinate pairs. The structure is simple but has a few disadvantages:
Lines between adjacent polygons must be digitized and stored twice, and improper digitization gives rise to slivers and gaps.
They convey no information about neighbors.
Creating islands is not possible.

Vector data structure Topologic Features Networks: A network is a topological feature model defined as a line graph composed of links, representing linear channels of flow, and nodes, representing their connections. The topological relationships between features are maintained in a connectivity table. By consulting the connectivity table, it is possible to trace flow through the network.
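Tracing through a network amounts to a graph search over the connectivity table. This sketch assumes the table maps each node to the nodes reachable along one directed link, for example downstream in a pipe network. It uses breadth-first search, which is one standard choice rather than a method named in the source. The node names and table contents are hypothetical.

```python
from collections import deque

# Hypothetical connectivity table: node -> nodes reachable along one link
connectivity = {
    "source": ["n1"],
    "n1": ["n2", "n3"],
    "n2": ["n4"],
    "n3": [],
    "n4": [],
    "n5": ["n4"],   # feeds n4 but is not reachable from the source
}

def trace(table, start):
    """Breadth-first trace of every node reachable from `start`."""
    seen, queue = {start}, deque([start])
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in table.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order

print(trace(connectivity, "source"))  # ['source', 'n1', 'n2', 'n3', 'n4']
```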

Vector data structure Polygons with explicit topological structures: Introducing explicit topological relationships handles both islands and neighbors. The topological structures are built either by creating topological links during data input or by using software. The Dual Independent Map Encoding (DIME) system of the US Bureau of the Census was one of the first attempts to create topology in geographic data.

Vector data structure Polygons with explicit topological structures   Polygon as a topological feature

Vector data structure Polygons with explicit topological structures Polygons are formed from lines and their nodes. Once formed, each polygon is identified by a unique identification number. The topological information among polygons is computed and stored using the adjacency information stored with the lines (the nodes of each line and the identifiers of the polygons to its left and right).

Vector data structure Fully topological polygon network structure A fully topological polygon network structure is built from boundary chains that may be digitized in any direction. It handles islands and lakes and allows automatic checks for improper polygons. Neighborhood searches are fully supported. These structures are edited by moving the coordinates of individual points and nodes, by changing polygon attributes, and by cutting out or adding sections of lines or whole polygons. Changing coordinates requires no modification to the topology, but cutting out or adding lines and polygons requires recalculating the topology and rebuilding the database.

Vector data structure Triangular Irregular Network (TIN) A TIN represents a surface as contiguous, non-overlapping triangles created by Delaunay triangulation. These triangles have the property that the circumcircle passing through the vertices of any triangle contains no other point. A TIN is created from a set of mass points with x, y and z coordinate values. This topological data structure manages information about the nodes that form each triangle and the neighbors of each triangle.
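The empty-circumcircle property can be checked with the standard in-circle determinant. After the triangle is put in counter-clockwise order, a point lies strictly inside the circumcircle exactly when the determinant is positive. The sketch below only verifies an existing triangulation; building one requires an algorithm such as Bowyer-Watson. The helper names are hypothetical. Integer coordinates keep the arithmetic exact, whereas production code would use robust floating-point predicates.

```python
def orientation(a, b, c):
    """Twice the signed area of triangle abc (> 0 when counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def in_circumcircle(a, b, c, d):
    """True if point d lies strictly inside the circumcircle of triangle abc."""
    if orientation(a, b, c) < 0:   # make abc counter-clockwise
        b, c = c, b
    rows = []
    for p in (a, b, c):
        dx, dy = p[0] - d[0], p[1] - d[1]
        rows.append((dx, dy, dx * dx + dy * dy))
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = rows
    det = (a1 * (b2 * c3 - b3 * c2)
           - a2 * (b1 * c3 - b3 * c1)
           + a3 * (b1 * c2 - b2 * c1))
    return det > 0

def is_delaunay(triangles, points):
    """Check that no point lies inside the circumcircle of any triangle."""
    return all(not in_circumcircle(*tri, p)
               for tri in triangles for p in points if p not in tri)

pts = [(0, 0), (4, 0), (0, 4), (5, 5)]
good = [((0, 0), (4, 0), (0, 4)), ((4, 0), (5, 5), (0, 4))]   # diagonal (4,0)-(0,4)
bad = [((0, 0), (4, 0), (5, 5)), ((0, 0), (5, 5), (0, 4))]    # diagonal (0,0)-(5,5)
print(is_delaunay(good, pts), is_delaunay(bad, pts))  # True False
```

The two triangulations cover the same four points. The second fails because (0, 4) falls inside the circumcircle of triangle (0, 0), (4, 0), (5, 5). This is the kind of long, thin triangle that Delaunay triangulation avoids.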

Vector data structure Triangular Irregular Network (TIN) Delaunay Triangulation

Vector data structure Advantages of Delaunay triangulation:
The triangles are as equiangular as possible, which reduces the numerical precision problems created by long, skinny triangles.
The triangulation is independent of the order in which the points are processed.
It ensures that any point on the surface is as close as possible to a node.


Vector data structure The TIN model is a vector data model stored using relational attribute tables. A TIN dataset contains three basic attribute tables:
An arc attribute table containing the length, from-node and to-node of every edge of every triangle.
A node attribute table containing the x, y coordinates and z (elevation) of the vertices.
A polygon attribute table containing the areas of the triangles, the identification numbers of their edges and the identifiers of the adjacent polygons.

Vector data structure Storing data in this manner eliminates redundancy, as every vertex and edge is stored only once even if it is used by more than one triangle. Because a TIN stores topological relationships, the dataset can be used in vector-based geoprocessing such as automatic contouring, 3D landscape visualization, volumetric design and surface characterization.

Vector data model
Data model: Vector
Advantages | Disadvantages
Data are represented at their original resolution and form without generalization | The location of each vertex must be stored explicitly
Requires less storage space | Overlay based on criteria is difficult
Editing is faster and more convenient | Spatial analysis is cumbersome
Network analysis is fast | Simulation is difficult because each unit has a different topological form
Projection transformations are easier |