Unit - 4 Data input and Analysis

Spatial Data
Definition: Spatial data refers to any data that is associated with a geographic location on the Earth's surface.
Types of Spatial Data:
Vector Data: Represents geographic features as points, lines, and polygons. Examples include roads, rivers, administrative boundaries, and parcels.
Raster Data: Organized into a grid of cells, each cell representing a specific area on the Earth's surface. Raster data is commonly used to represent continuous phenomena such as elevation, land cover, and temperature.
Attribute Data: Information associated with spatial features, stored in tabular format. Attributes provide additional context and characteristics about spatial features (e.g., population density, land use type, elevation values).

Analysis in GIS
Spatial Analysis: Involves examining patterns, relationships, and trends within spatial data to derive meaningful insights and make informed decisions.
Types of Spatial Analysis:
Overlay Analysis: Combining multiple layers of spatial data to identify areas of overlap or proximity. This technique is used for tasks such as site suitability analysis, habitat mapping, and land-use planning.
Buffering: Creating a buffer zone around a specific geographic feature or point of interest. Buffer analysis is useful for assessing proximity and accessibility, such as determining the impact zone around a hazardous facility or the service area of a retail store.
Spatial Querying: Selecting features from a dataset based on their spatial relationship with other features or geographic criteria.
Network Analysis: Analyzing the connectivity and flow within spatial networks, such as road networks, utility networks, and transportation systems. Network analysis enables tasks like route optimization, service area delineation, and emergency response planning.
Interpolation: Estimating values at unsampled locations based on known values at nearby locations. Interpolation methods are commonly used in environmental modeling, hydrology, and resource management to predict variables like temperature, rainfall, and pollutant concentrations.
Geoprocessing: Refers to a set of operations and tools used to manipulate and analyze spatial data within GIS software. Geoprocessing tasks include data conversion, data transformation, spatial joins, and data aggregation.
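The interpolation idea described above (estimating a value at an unsampled location from nearby known values) can be sketched with inverse distance weighting (IDW). IDW is only one of several interpolation methods and is not named in these notes; the function name `idw`, the `power` parameter and the sample values are assumptions made for illustration.

```python
import math


def idw(samples, target, power=2.0):
    """Estimate the value at `target` from (x, y, value) samples.

    Each sample is weighted by 1 / distance**power, so nearer samples
    influence the estimate more than distant ones.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in samples:
        d = math.dist((x, y), target)
        if d == 0:
            # The target coincides with a sample: return the known value.
            return value
        weight = 1.0 / d ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical rainfall readings (x, y, mm) at two stations.
stations = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
midpoint_estimate = idw(stations, (1.0, 0.0))
```

Because the target in this example is equally far from both stations, the two readings receive equal weight and the estimate is their average.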

Integration of Data and Analysis
Effective GIS analysis relies on the integration of spatial data from multiple sources and the application of appropriate analytical techniques to extract meaningful information. GIS enables users to perform complex spatial analyses by combining different types of spatial data (vector, raster, and attribute data) and applying a variety of analytical methods to solve spatial problems. The outputs of GIS analysis, such as maps, charts, and reports, provide valuable insights for decision-making, planning, and resource management across various domains, including urban planning, environmental science, public health, and business intelligence.

DATA
The terms data and information are often used interchangeably, but they convey very distinct concepts. Data is defined as a body of facts or figures which have been gathered systematically for one or more specific purposes. "Data" is a plural noun and, in a broad sense, it can include things such as pictures (binary images), programmes and rules. Data can exist in the form of:
linguistic expressions (e.g., name, age, address, date, ownership)
symbolic expressions (e.g., traffic signs)
mathematical expressions (e.g., E = mc²)
signals (e.g., electromagnetic waves)

INFORMATION
Information is defined as data which have been processed into a form that is meaningful to a recipient and is of perceived value in current or prospective decision making. Data that make useful information for one person may not be useful to another person. To be useful, information should be:
relevant (i.e. has intended purposes and an appropriate level of required detail)
reliable, accurate and verifiable (by independent means)
up-to-date and timely (depending on purposes)
complete (in terms of attribute, spatial and temporal coverage)
intelligible (i.e. comprehensible by its recipients)
consistent (with other sources of information)
convenient and easy to handle, and adequately protected

Functions of an Information System
Conversion: The process of transforming data from one format to another. The transformation may be, for example, from one unit of measurement to another (such as km to cm) or from one feature classification to another.
Organisation: The process of organising or re-organising data according to database management rules and procedures so that they can be accessed cost-effectively.
Structuring: The formatting or re-formatting of data so that they are acceptable to a particular software application or information system.
Modelling: The statistical analysis and visualisation of data, which improves the user's knowledge base and intelligence in decision-making.

DATABASE
The concept of a database is the approach to information organisation in computer-based data processing. A database is defined as an automated, formally defined and centrally controlled collection of persistent data used and shared by different users in an enterprise. The term ‘centrally controlled’ does not mean that a database is physically centralised: databases tend to be physically distributed across different computer systems, at the same or at different locations. A database is set up to serve the information needs of an organisation. The sharing of data is the key to the concept of a database. Data in a database are described as ‘permanent’ in the sense that they are different from ‘transient’ data such as input to and output from an information system. The data usually remain in the database for a considerable length of time, although the actual content of the data can change very frequently.

Creating a database
Step 1: Data investigation. This is the ‘fact finding’ stage of database creation. The task is to consider the type, quantity and qualities of the data to be included in the database.
Step 2: Data modelling. This is the process of forming a conceptual model of the data by examining the relationships between entities and the characteristics of entities and attributes.
Step 3: Database design. This is the process of creating a practical design for the database. The design will depend on the database software being used and its data model.
Step 4: Database implementation. This is the procedure of populating the database with attribute data. It is always followed by monitoring and upkeep, including fine tuning, modification and updating.

Relationship Perspective of Database
Relationships represent an important concept in database management. A relationship describes the logical association between entities. Relationships can be categorical or spatial, depending on whether they describe location or other characteristics.
Categorical Relationships: These relationships describe the association among individual features in a classification system. The classification of data is based on the concept of scale of measurement. There are four scales of measurement: nominal, ordinal, interval and ratio.

Spatial Relationships: These relationships describe the association among different features in space. Spatial relationships are visually displayed when data are presented in graphical form. There are numerous types of spatial relationships possible among features. Recording spatial relationships explicitly demands considerable storage space, while computing them on-the-fly slows down data processing, particularly if relationship information is required frequently. There are two types of spatial relationships: topological and proximal.

DATABASE MODELS
A database model is the theoretical foundation of a database and fundamentally determines the way data can be stored, organised and manipulated in a database system. It thereby defines the infrastructure offered by a particular database system. Databases can be organised in different ways, known as database models:
Hierarchical: data are organised by records in parent-child, one-to-many relations.
Network: data are organised by records which are classified into record types, with pointers linking associated records.
Relational: data are organised by records in relations, which resemble tables.
Object-oriented: data are uniquely identified as individual objects that are classified into object types or classes according to the characteristics (attributes and operations) of the object.

Hierarchical Model
Hierarchical Database Management Systems (DBMSs) were popular from the late 1960s, with the introduction of IBM’s Information Management System (IMS) DBMS, through the 1970s. The hierarchical model organises data in a tree structure, with a hierarchy of parent and child data segments. This structure implies that a record can have repeating information, generally in the child data segments. The data are stored as a series of records, each with a set of field values attached to it. All the instances of a specific record are collected together as a record type. These record types are the equivalent of tables in the relational model, with the individual records being the equivalent of rows. To create links between these record types, the hierarchical model uses parent-child relationships, which are a 1:N mapping between record types. This is done by using trees, a structure borrowed from mathematics, much as the relational model borrows set theory.

For example, an organisation might store information about the population of a city, such as ward name, locality name, street number, house number, resident’s name, etc. The organisation might also store information about each resident’s children, such as name and date of birth. The resident and children data form a hierarchy in which the resident data represent the parent segment and the children data represent the child segments. If a resident has four children, there would be four child segments associated with one resident segment. In a hierarchical database, the parent-child relationship is one-to-many. This restricts a child segment to having only one parent segment. In the hierarchical model, the links established by the pointers are permanent and cannot be modified. This makes the hierarchical model rigid and inflexible, causing difficulties in the expansion or modification of databases.

Network Model
The popularity of the network data model coincided with the popularity of the hierarchical data model. Some data are more naturally modelled with more than one parent per child, and the network model permits the modelling of such many-to-many relationships. The basic data modelling construct in the network model is the set construct. A set consists of an owner record type, a set name and a member record type. A member record type can have that role in more than one set; hence the multi-parent concept is supported. An owner record type can also be a member or owner in another set. The data model is a simple network, and link and intersection record types may exist as well as sets between them. The network model becomes complex as the size of the database increases. This model also suffers from inflexibility, but to a lesser degree than the hierarchical model because it supports multi-parent relationships.

Relational Model
The relational model is based on the concept proposed by Codd (1970) and is popular among GIS users. A relational database allows the definition of data structures, storage and retrieval operations and integrity constraints. In such a database, data and the relations between them are organised in tables. A table is a collection of records, and each record in a table contains the same fields. The properties of relational tables are:
values are atomic
each row is unique
the sequence of rows is insignificant
each column has a unique name
column values are of the same kind
the sequence of columns is insignificant

Because relationships between tables are only specified at retrieval time, relational databases are classed as dynamic database management systems. A Relational Database Management System (RDBMS) is a database management system based on the relational model. One disadvantage is that the terminology of relational databases can be confusing, because different users use different terms. Although the relational model is more flexible than the hierarchical and network models, it still suffers from data redundancy and can be slow and difficult to implement. Its efficiency is reduced when handling the complex data formats of GIS because of its limited range of data types.
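The point that relationships are specified only at retrieval time can be sketched with Python's built-in `sqlite3` module. The table names (`parcel`, `zone`), column names and rows below are hypothetical and chosen only for illustration; nothing links the two tables until the query joins them on a shared key.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE zone   (zone_id INTEGER PRIMARY KEY, land_use TEXT);
    CREATE TABLE parcel (parcel_id INTEGER PRIMARY KEY, owner TEXT, zone_id INTEGER);
    INSERT INTO zone   VALUES (1, 'residential'), (2, 'commercial');
    INSERT INTO parcel VALUES (101, 'Owner A', 1), (102, 'Owner B', 2), (103, 'Owner C', 1);
""")

# The relationship between parcels and zones is stated here, in the query,
# by matching zone_id values in the two tables.
rows = conn.execute("""
    SELECT parcel.parcel_id, zone.land_use
    FROM parcel JOIN zone ON parcel.zone_id = zone.zone_id
    WHERE zone.land_use = 'residential'
    ORDER BY parcel.parcel_id
""").fetchall()
```

Each table keeps atomic values in uniquely named columns, and a different query could relate the same tables in a different way without any change to the stored data.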

Object-Oriented Model
The Object-Oriented Database (OODB) paradigm is the combination of Object-Oriented Programming Language (OOPL) systems and persistent systems. The power of the OODB comes from the seamless treatment of both persistent data, as found in databases, and transient data, as found in executing programmes. Object DBMSs add database functionality to object programming languages. They bring much more than persistent storage of programming language objects and provide full-featured database programming capability. A major benefit of this approach is the unification of application and database development into a seamless data model and language environment.

As a result, applications require less code, use more natural data modelling, and their code bases are easier to maintain. This is in contrast to a relational DBMS, where a complex data structure must be flattened out to fit into tables or joined together from those tables to form the in-memory structure. The object approach has two benefits: firstly, it provides higher-performance management of objects, and secondly, it enables better management of the complex interrelationships between objects. This makes object DBMSs better suited to support applications that have complex relationships between data, such as risk analysis systems, telecommunications service applications, World Wide Web (WWW) document structures, and design and manufacturing systems. The main problem of the object-oriented model is the implicit uncertainty of geographical ideas, which makes it difficult to represent them in rigidly bounded datasets. There is also no theoretical base or standard query language for the object-oriented model.

SPATIAL DEFINITION
Spatial data describe the location and shape of geographic features and their spatial relationships to other features. The information contained in the spatial database is held in the form of digital coordinates that describe the spatial features; these are based mainly on the latitude and longitude of each feature. Spatial data can be represented using the following spatial entities:
1. The Point
2. The Line
3. The Area
4. The Network
5. The Surface

SPATIAL DATA MODEL
The GIS model can be split into two parts: a model of spatial form and a model of spatial process. The model of spatial form represents the structure and distribution of features in geographical space. Computers require unambiguous instructions on how to turn data about spatial entities into graphical representations. This process is the second stage in designing and implementing a data model. There are two spatial data models:
1. Raster Data Model
2. Vector Data Model

Raster data model
The raster data model is one of the important spatial data models and is described as a tessellation. In the raster world, individual cells are used as the building blocks for creating images of point, line, area, network and surface entities. Each area is divided into rows and columns, which form a regular grid structure. Each cell must be rectangular in shape, but not necessarily square. Each cell within this matrix contains location coordinates as well as an attribute value. The origin of the rows and columns is at the upper left corner of the grid. Rows function as the "y" coordinate and columns as the "x" coordinate in a two-dimensional system. A cell is defined by its location in terms of rows and columns.
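A minimal sketch of the row/column addressing described above, assuming an upper-left origin given in map units and a fixed cell width and height. The function names, the grid values and the coordinates are illustrative assumptions, not part of any particular GIS package.

```python
def cell_to_xy(row, col, x_origin, y_origin, cell_width, cell_height):
    """Return the map coordinates of the centre of cell (row, col).

    The grid origin is the upper-left corner, so x grows with the column
    index while y shrinks as the row index grows.
    """
    x = x_origin + (col + 0.5) * cell_width
    y = y_origin - (row + 0.5) * cell_height
    return x, y


def xy_to_cell(x, y, x_origin, y_origin, cell_width, cell_height):
    """Return the (row, col) of the cell that contains map point (x, y)."""
    col = int((x - x_origin) // cell_width)
    row = int((y_origin - y) // cell_height)
    return row, col


# Hypothetical 2 x 3 land-cover grid; each cell stores one attribute value.
grid = [
    [1, 1, 2],
    [1, 2, 2],
]
row, col = xy_to_cell(25.0, 85.0, x_origin=0.0, y_origin=100.0,
                      cell_width=10.0, cell_height=10.0)
value = grid[row][col]
```

Here the point (25, 85) falls in row 1, column 2, so the attribute value is read directly from that position in the matrix.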

Vector data model
In the vector data model, spatial data are represented using points. The point is the basic building block from which all spatial entities are constructed. The simplest spatial entity, the point, is represented by a single coordinate pair. Line and area entities are constructed by connecting a series of points into chains and polygons. The more complex the shape of a line or area feature, the greater the number of points required to represent it.
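The construction of lines and areas from points can be sketched with plain coordinate lists. The coordinates and the helper names `line_length` and `polygon_area` are assumptions for illustration; the area uses the standard shoelace formula, which these notes do not mention.

```python
import math

# A point is one coordinate pair; a line is a chain of points;
# a polygon is a closed chain (the last vertex joins back to the first).
point = (3.0, 4.0)
line = [(0.0, 0.0), (3.0, 4.0), (6.0, 4.0)]
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]


def line_length(vertices):
    """Sum the straight-line distances between consecutive vertices."""
    return sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))


def polygon_area(vertices):
    """Area of a simple polygon by the shoelace formula."""
    total = 0.0
    closed = vertices[1:] + vertices[:1]
    for (x1, y1), (x2, y2) in zip(vertices, closed):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


length = line_length(line)
area = polygon_area(polygon)
```

The line above needs three points and the rectangle four; a more sinuous river or an irregular parcel would need many more vertices to capture its shape.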

RASTER DATA COMPRESSION METHODS
One of the major problems with raster datasets is their size, because a value must be recorded and stored for each cell in an image. A complex image made up of a mosaic of different features requires the same amount of storage space as a similar raster map showing only a single feature. To overcome this problem, the following raster representation methods can be adopted:
1. Run Length Encoding Method
2. Block Coding Method
3. Chain Coding Method
4. Quadtree Method

Run Length Encoding Method
This is a raster image compression (compaction) method. The technique reduces data volume on a row-by-row basis. Where there are a number of cells of a given type in a group, it stores a single value together with the length of the run, rather than storing a value for each individual cell.
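The row-by-row idea can be sketched as follows. Storing each run as a `(value, count)` pair is one common layout and is an assumption here; the function names and the sample raster are illustrative.

```python
def rle_encode_row(row):
    """Compress one raster row into a list of (value, run_length) pairs."""
    runs = []
    for value in row:
        if runs and runs[-1][0] == value:
            # Same value as the previous cell: extend the current run.
            runs[-1] = (value, runs[-1][1] + 1)
        else:
            # A new value starts a new run of length 1.
            runs.append((value, 1))
    return runs


def rle_decode_row(runs):
    """Expand (value, run_length) pairs back into the original row."""
    row = []
    for value, count in runs:
        row.extend([value] * count)
    return row


# Hypothetical raster: 1 marks the entity, 0 marks background.
raster = [
    [0, 0, 0, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
encoded = [rle_encode_row(row) for row in raster]
```

The last row shrinks from six stored values to a single pair, while a row that alternates values on every cell would gain nothing, which is why the saving depends on how homogeneous the image is.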

Block Coding Method
This approach extends the run length encoding idea to two dimensions by using a series of square blocks to store data. An entity might, for example, be stored as seven unit cells, two four-cell squares and one nine-cell square. Coordinates are required to locate the blocks in the raster matrix.

Chain Coding Method
The chain coding method is an important technique for encoding raster data. It reduces the data volume by defining only the boundary of the entity. The boundary is defined as a sequence of unit cells starting from and returning to a given origin. The direction of travel around the boundary is usually given using a numbering system.
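A minimal sketch of a chain code, assuming the numbering 0 = east, 1 = north, 2 = west, 3 = south and a boundary stored as `(direction, number_of_cells)` pairs. Both the numbering and the storage layout are assumptions for illustration; real systems may number directions differently.

```python
# Assumed direction numbering: unit steps in (x, y).
STEPS = {0: (1, 0), 1: (0, 1), 2: (-1, 0), 3: (0, -1)}


def trace_chain(origin, chain):
    """Follow a chain code from `origin` and return every cell visited."""
    x, y = origin
    path = [origin]
    for direction, count in chain:
        dx, dy = STEPS[direction]
        for _ in range(count):
            x, y = x + dx, y + dy
            path.append((x, y))
    return path


# Boundary of a rectangular entity: 3 east, 2 north, 3 west, 2 south.
chain = [(0, 3), (1, 2), (2, 3), (3, 2)]
path = trace_chain((4, 4), chain)
is_closed = path[0] == path[-1]
```

Only the origin and four pairs are stored, yet the full boundary of ten steps can be rebuilt, and the check that the path returns to its origin confirms that the boundary is closed.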

Quadtree Method
One of the advantages of the raster data model is that each cell can be subdivided into smaller cells of the same shape and orientation (Peuquet, 1990). This property has produced a range of innovative data storage and data reduction methods that are based on regularly subdividing space. The quadtree works on the principle of recursively subdividing the cells in a raster image into quads (quarters). The subdivision process continues until each cell in the image can be classed as having the spatial entity either present or absent.
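The recursive subdivision can be sketched as below, assuming a square grid whose side is a power of two and a tree stored as nested lists in the order NW, NE, SW, SE. The function names, the storage layout and the sample grid are illustrative assumptions.

```python
def build_quadtree(grid, row=0, col=0, size=None):
    """Return a leaf value if the block is uniform, else four sub-blocks."""
    if size is None:
        size = len(grid)
    first = grid[row][col]
    uniform = all(
        grid[r][c] == first
        for r in range(row, row + size)
        for c in range(col, col + size)
    )
    if uniform:
        # Entity entirely present (1) or entirely absent (0): stop here.
        return first
    half = size // 2
    return [
        build_quadtree(grid, row, col, half),                # NW quadrant
        build_quadtree(grid, row, col + half, half),         # NE quadrant
        build_quadtree(grid, row + half, col, half),         # SW quadrant
        build_quadtree(grid, row + half, col + half, half),  # SE quadrant
    ]


def count_leaves(node):
    """Count the stored values in a quadtree built by build_quadtree."""
    if isinstance(node, list):
        return sum(count_leaves(child) for child in node)
    return 1


# Hypothetical 4 x 4 raster: 1 = entity present, 0 = absent.
grid = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
tree = build_quadtree(grid)
leaves = count_leaves(tree)
```

Three of the four quadrants are uniform and are stored as single values; only the south-east quadrant has to be split again, so 7 values are stored instead of 16.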

COMPARISON OF RASTER AND VECTOR DATA MODELS
The vector data model defines boundaries; there are no boundaries defined in the raster data model. The vector model represents location as x,y coordinates in a Cartesian coordinate system. The raster model represents location as cells, also in a Cartesian coordinate system; raster data store rows and columns of cell values. The vector model represents features with well-defined boundaries; the raster model represents a more generalized view. The primary focus of the vector data model is the geographic feature; the primary focus of the raster data model is location.

The vector model represents feature shape accurately; the raster model represents rectangular areas and thus is more generalized and less accurate. The vector model is used for high-quality cartography and where accuracy and precision are important, such as for cadastral (property) applications. The raster data model is useful for image/picture storage and is well suited to many spatial modeling operations such as modeling surface storm runoff and forest fire spread. The overlay operation examines two datasets to determine what geographic features exist at the same location. Overlaying vectors is a complex operation; the nature of the raster data model allows simple and fast overlays.
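The remark that raster overlays are simple and fast can be sketched as a cell-by-cell comparison of two grids with the same dimensions. The layer names and values are hypothetical, and a Boolean AND is just one possible overlay rule.

```python
def overlay_and(layer_a, layer_b):
    """Return a grid that is 1 only where both input grids are 1."""
    return [
        [1 if a == 1 and b == 1 else 0 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(layer_a, layer_b)
    ]


# Hypothetical suitability layers: 1 = suitable, 0 = unsuitable.
suitable_slope = [
    [1, 1, 0],
    [0, 1, 1],
]
suitable_soil = [
    [1, 0, 0],
    [1, 1, 0],
]
suitable_both = overlay_and(suitable_slope, suitable_soil)
```

Because both layers share the same grid, no geometry has to be intersected: the cells at the same row and column already describe the same location.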

METHODS OF DATA INPUT
Data in analogue or digital form need to be encoded to be compatible with the GIS being used. This would be a relatively straightforward task if all GIS packages used the same spatial and attribute data models. All data in analogue form need to be converted to digital form before they can be input into a GIS. The main methods of data input are:
a) Keyboard entry
b) Automatic digitizing
c) Manual digitizing
d) Scanning

Keyboard Entry
Keyboard entry is often referred to as key coding. Key coding is the entry of data into a file at a computer terminal. This technique is used for attribute data that are only available on paper, e.g. entering both the attribute and spatial data (location, in terms of postal codes) about the hospitals within the Trivandrum city limit. Keyboard entry has been simplified by the introduction of optical character recognition (OCR): a text scanner and OCR software can be used to read characters automatically.

Digitizing
The most common method of encoding spatial features from paper maps is digitizing. There are two main methods of digitizing:
Manual digitizing
Automatic digitizing, which includes scanning and automatic line following

Manual digitizing
Most manual digitizers may be used in one of two modes: point mode or stream mode. In point mode, the user begins digitizing each line segment with a start node, records each change in direction of the line with a digitized point, and finishes the segment with an end node. Thus a straight line can be digitized with just two points, the start and end nodes. Smooth curves are problematic, since they require an infinite number of points to record their true shape. In practice, the user must choose a sensible number of points to represent a curve. The manual digitizing of paper maps is one of the main sources of positional error in GIS.
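The trade-off in choosing a "sensible number of points" for a curve can be illustrated numerically. This sketch samples a quarter circle at evenly spaced points, as an idealised stand-in for a user digitizing in point mode, and compares the length of the resulting polyline with the true arc length. The radius, the point counts and the function names are assumptions for illustration.

```python
import math


def digitize_arc(radius, n_points):
    """Sample n_points evenly along a quarter circle of the given radius."""
    step = (math.pi / 2) / (n_points - 1)
    return [
        (radius * math.cos(i * step), radius * math.sin(i * step))
        for i in range(n_points)
    ]


def polyline_length(points):
    """Sum the straight-line distances between consecutive points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


true_length = 100.0 * math.pi / 2  # exact length of the quarter circle
lengths = {n: polyline_length(digitize_arc(100.0, n)) for n in (2, 5, 20)}
errors = {n: true_length - length for n, length in lengths.items()}
```

With only the start and end nodes the curve collapses to a single chord; adding points brings the digitized length closer to the true length, but it never quite reaches it with a finite number of points.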

The accuracy of data generated by this method of encoding depends on many factors, including the scale and resolution of the source map and the quality of the equipment and software being used. Errors can be introduced during the digitizing process by incorrect registration of the map document on the digitizing table or by hand wobble. A shaky hand will produce differences between the line on the map and its digitized facsimile.
Five procedures to be followed while using a manual digitizer:
1. Registration
2. Digitizing point features
3. Digitizing line features
4. Digitizing area features
5. Adding attribute information

Automatic digitizing
Manual digitizing is a time-consuming and tedious process. If a large number of complex maps need to be digitized, it is worth considering the alternative, although often more expensive, methods. Two automatic digitizing methods are considered here: scanning and automatic line following.
Scanning
Scanning is the most commonly used method of automatic digitizing. It is an appropriate method of data encoding when raster data are required, since this is the automatic output format of most scanning software. A scanner is a piece of hardware for converting an analogue source document into digital raster format.

The cheapest scanners are small flatbed scanners, a common PC peripheral. High-quality and large-format scanners require the source document to be placed on a rotating drum, and a sensor moves along the axis of rotation. There are some practical problems faced when scanning source documents:
The possibility of optical distortion when using flatbed scanners.
The automatic scanning of unwanted information.
The selection of an appropriate scanning tolerance to ensure that important data are encoded and background data are ignored.
The format of the files produced and the input of the data to GIS software.
The amount of editing required to produce data suitable for analysis.

There are three types of scanner in widespread use: flatbed scanners, rotating drum scanners and large-format feed scanners. All scanners work on the same principle. A scanner has a light source, a background and a lens. During scanning, the absence or presence of light is detected as one of the three components moves past the other two. Quality can be influenced by the setting of a threshold above which all values are translated as white and below which all values are translated as black. Setting the brightness and contrast levels can also affect the quality of the images obtained.
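The effect of the threshold setting can be sketched on a small grid of grey-level readings. The 0 to 255 value range, the function name and the treatment of a value exactly equal to the threshold (classed as black here) are assumptions for illustration.

```python
def apply_threshold(image, threshold):
    """Translate grey levels to 1 (white) above the threshold, else 0 (black)."""
    return [
        [1 if pixel > threshold else 0 for pixel in row]
        for row in image
    ]


# Hypothetical grey-level readings from a scanned map (0 = dark, 255 = light).
scan = [
    [200, 90],
    [128, 129],
]
binary_mid = apply_threshold(scan, 128)
binary_low = apply_threshold(scan, 80)
```

Lowering the threshold from 128 to 80 turns every cell in this example white, so faint linework that registered as grey level 90 would be lost; this is the kind of quality effect the threshold setting controls.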

Automatic line follower
Another type of automatic digitizer is the automatic line follower. This method mimics manual digitizing and uses a laser and a light-sensitive device to follow the lines on the map. Whereas scanners are raster devices, the automatic line follower is a vector device and produces output as (x, y) coordinate strings. Difficulties may be faced when digitizing features such as dashed lines or contour lines, so considerable editing and checking of the data is necessary.

DATA EDITING
Errors can be expected both from the original source and from the encoding process. Before the data are processed, it is essential to identify and eliminate these errors; otherwise they will contaminate the GIS database. The pre-processing of GIS data, i.e. data editing, can be grouped into the following:
Detecting and correcting errors
Reprojection, transformation and generalization
Edge matching and rubber sheeting

ATTRIBUTE DATA EDITING
Attribute data may also contain errors. These can often be identified manually by comparing the entered values with the original data. There are many methods for checking and correcting attribute data. Some of them are:
Impossible values: When the valid range of the data is known, values outside that range can be flagged as errors.
Extreme values: Values that are unusually high or low compared with the rest of the data can indicate errors.
Internal consistency: Totals and averages can be tallied against the individual values to check for errors.
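The three checks listed above can be sketched as small functions. The function names, the sample values and the rule used for "extreme" (more than `k` times the median absolute deviation from the median) are assumptions for illustration; other definitions of an extreme value are equally valid.

```python
import statistics


def impossible_values(values, low, high):
    """Return the positions of values outside the known valid range."""
    return [i for i, v in enumerate(values) if not (low <= v <= high)]


def extreme_values(values, k=5.0):
    """Return positions of values far from the median (assumed rule)."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    return [i for i, v in enumerate(values) if abs(v - median) > k * mad]


def totals_consistent(parts, recorded_total, tolerance=0):
    """Check that the recorded total tallies with the sum of its parts."""
    return abs(sum(parts) - recorded_total) <= tolerance


# Hypothetical attribute columns.
percent_forest = [35, -4, 120, 61]        # valid range is 0 to 100
household_size = [12, 14, 13, 15, 140]    # 140 is probably a keying error
ward_population = [120, 80, 50]           # should sum to the city total

bad_range = impossible_values(percent_forest, 0, 100)
suspects = extreme_values(household_size)
tallies = totals_consistent(ward_population, 250)
```

A flagged value is only a candidate error: an extreme value can be genuine, so each flag should be checked against the original source before it is corrected.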

SPATIAL DATA FILE
Spatial data files are somewhat like other files you work with on a computer. However, that is where the similarities end. Spatial data files are unique in that they store "georeferenced" information. In addition, descriptive information about the georeferenced information is stored in each spatial data file. Thus, rather than just text (like a word processing document) or numbers (like a spreadsheet), an individual spatial data file is a digital representation of a similar group of geographic features on the surface of the earth. The geographic features can be actual physical entities or events, or they can be conceptual features. Examples of individual spatial data files representing conceptual geographic features are census tract boundaries, zoning boundaries, or parcel boundaries.

FORMATS OF SPATIAL DATA FILES
The Shapefile spatial data file format
This is a very common format for spatial data files in the vector category. In this format, geographic features can be represented in one of three ways:
i) Points
ii) Lines (aka arcs, polylines)
iii) Polygons (aka areas)
The Coverage spatial data file format
This was the original spatial data file format used in GIS software. Along with shapefiles, the coverage format is being superseded by the newer geodatabase format.

However, many web sites still offer spatial data for download in the coverage format. Just as in shapefiles, geographic features are represented as points, lines, or polygons, and many factors come into play when deciding which representation is best. Unlike a shapefile, a single coverage actually comprises two folders, each containing a multitude of other files. If either folder is missing, or if files from within either folder are missing, the coverage will be "corrupt" and not usable. Coverages and shapefiles are often used almost interchangeably in GIS. They can each represent the same geographic features; it is only the internal file structure that is different.

The Grid spatial data file format
In most respects, grids are very different from either shapefiles or coverages. Grids fall into the raster category. Like coverages, however, grids comprise two folders, each containing files that the software "puts together" for display and manipulation. Grids can be either:
i) Integer Grids
ii) Floating Point Grids

Images as Spatial Data Files
Many different image formats can be used in GIS. All image formats fall within the raster category of spatial data. In some cases, images are not used specifically as "spatial data", but are used to enhance spatial data by providing a digital photograph of a place or object. In other cases, the images themselves are spatial data. Data provided by the Landsat satellite is an example of imagery that is spatial. An image is "georeferenced" when information is embedded within the image that describes its position on the surface of the earth in real-world coordinates. In addition to being georeferenced, many images may also be "orthorectified". Images that are both georeferenced and orthorectified are frequently called "orthophotographs" or just "orthos".

DATA STRUCTURES
There are two basic types of structures used to represent features or objects, namely raster and vector data. Rasters are digital aerial photographs, imagery from satellites, digital pictures, or even scanned maps. Data stored in a raster format represent real-world phenomena, such as:
Thematic data
Continuous data
Pictures
Thematic and continuous rasters may be displayed as data layers along with other geographic data on your map, but are often used as the source data for spatial analysis with the ArcGIS Spatial Analyst extension.

RASTER DATA STRUCTURES

VECTOR DATA STRUCTURE

Advantages of Raster Data Structures
Simple data structure
Easy to generate (e.g. from remote sensing or scan-digitizing)
Easy workflows and analysis
Technology is cheap and is being energetically developed
Overlay and combination of mapped data with remotely sensed data is easy
Various kinds of analytical (spatial) operations are easy
Simulation is very easy as each spatial unit has the same shape and size
The same set of grid cells is used for several variables
Simple when doing your own programming

Disadvantages of Raster Data Structures
Non-adaptive data structure
Tends to generate huge files, depending on resolution
Cell arrangement is usually arbitrary and does not respect natural borders
Limited interactivity and more primitive analysis algorithms
Errors in evaluating the perimeter of shapes
Topology or network linkages are difficult to establish
Geometric transformations are difficult to handle
Use of large cells to reduce data volumes results in loss of information

Advantages of Vector Data Structures
Small amount of data
Logical data structure
Attributes are combined with objects
Preserves quality after interactivity (e.g. scaling)
More sophisticated in spatial analysis
Topology described with network linkages
Retrieval, updating and generalization of graphics and attributes are possible
Widely used to describe administrative zones

Disadvantages of Vector Data Structures
Complex data structure
Continuous data is not represented effectively
Spatial variability is not implicitly represented
Spatial analysis and filtering within polygons is impossible
Needs a lot of manual editing to get good quality
It always introduces hard boundaries
Simulation is difficult as each unit has a different topological form
Overlaying of several polygon maps, or of polygon and raster maps, is difficult
Display and plotting can be expensive

INTRODUCTION TO ANALYSIS
The analysis and modelling subsystem is the very heart of the GIS. It is, however, also the most abused subsystem of GIS. The abuses range from attempts to compare non-comparable nominal spatial data with highly precise ratio spatial data, to statements about the causative nature of spatially corresponding phenomena made without testing alternative causes. Preprocessing procedures are used to convert a dataset into a form suitable for permanent storage within the GIS database for application development. Often, a large proportion of the data entered into a GIS requires some kind of processing and manipulation in order to make it conform to a data type, georeferencing system, and data structure that is compatible with the system. The end result of the preprocessing phase is a coordinated set of thematic data layers.

SPATIAL MEASUREMENT METHODS Measurements allow the user to produce ratios of lengths to widths and of perimeters to areas. The GIS user needs to describe not only what objects are, how many objects exist, and where they are, but also how large they are, how far apart they are, and what the distance between them is like. Calculating lengths, perimeters, and areas is a common application of GIS. In a raster GIS, lengths are calculated using Pythagorean geometry.
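The Pythagorean length calculation in a raster GIS can be sketched as follows. This is a minimal illustration, not a specific package's method: it assumes a line is stored as an ordered list of (row, column) cells and that distance is measured between cell centres. The function name, the 30 m cell size and the sample stream are all hypothetical.

```python
import math

def raster_path_length(cells, cell_size):
    """Sum the straight-line distances between consecutive cell centres.

    cells: ordered list of (row, col) indices along the line (assumed layout).
    cell_size: ground length of one cell side, in map units.
    """
    total = 0.0
    for (r0, c0), (r1, c1) in zip(cells, cells[1:]):
        # Pythagoras: an orthogonal step is 1 cell, a diagonal step is sqrt(2) cells.
        total += math.hypot(r1 - r0, c1 - c0) * cell_size
    return total

# Hypothetical stream on a 30 m grid: two orthogonal steps and one diagonal step.
stream = [(0, 0), (0, 1), (0, 2), (1, 3)]
length_m = raster_path_length(stream, 30.0)
print(round(length_m, 2))
```

The diagonal step contributes 30 × √2 m rather than 30 m, which is why raster lengths depend on how the line crosses the grid.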

QUERIES Querying the GIS database to retrieve data is an essential part of most GIS projects. Queries offer a method of data retrieval and can be performed on any data that are part of the GIS database. Queries are useful at all stages of GIS analysis, for example for checking the quality of raster GIS measurements.

BUFFERING TECHNIQUES Buffering is the creation of polygons that surround other points, lines or polygons. The user may wish to create buffers to exclude a certain amount of area around a point, line or polygon, or to include only the buffer area in a study. A buffer is a polygon created as a zone of influence around an entity, whether a single object or multiple objects. The creation of a buffer is based on the location, shape, and orientation of an existing object and on the characteristics of the influencing parameters. A buffer can, however, be more than just a measured distance from a two-dimensional object: it is controlled to some degree by the presence of friction surfaces, topography, barriers, and so on.

The creation of a point buffer is conceptually very simple but can involve complex computational operations. A buffer zone around a point feature is a circle drawn with the point as its centre and the distance of influence under study as its radius. Line buffers are created by measuring a specified distance in all directions from the target line object. Plate shows an example of a line buffer for the alignment of a pipeline. An area buffer is created by measuring a specified distance outward from the outer perimeter of the area.
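Point and line buffers can be illustrated with a simple membership test: is a location inside the buffer or not? The sketch below assumes planar coordinates and a fixed buffer distance, and it ignores friction surfaces and barriers. All function names and the sample pipeline coordinates are hypothetical.

```python
import math

def in_point_buffer(p, centre, radius):
    """True if p lies within the circular buffer of the given radius around centre."""
    return math.dist(p, centre) <= radius

def dist_to_segment(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:                      # a and b coincide
        return math.dist(p, a)
    # Project p onto the segment and clamp the projection to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.dist(p, (ax + t * dx, ay + t * dy))

def in_line_buffer(p, line, distance):
    """True if p lies within `distance` of any segment of the polyline."""
    return any(dist_to_segment(p, a, b) <= distance
               for a, b in zip(line, line[1:]))

# Hypothetical pipeline alignment with a 2-unit buffer on either side.
pipeline = [(0, 0), (10, 0), (10, 10)]
print(in_point_buffer((3, 4), (0, 0), 5))    # point lies exactly on the buffer edge
print(in_line_buffer((5, 2), pipeline, 2))
print(in_line_buffer((5, 3), pipeline, 2))
```

Clamping the projection to the segment ends gives the rounded caps that a distance measured "in all directions" from the line implies.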

OVERLAY ANALYSIS Map overlay is an important technique for integrating data derived from various sources and is perhaps the key function in GIS data analysis and modelling. Map overlay is a process in which two or more different thematic map layers of the same area are laid one on top of the other to form a new composite layer. There are some fundamental differences between the raster and vector worlds in the way map overlays are performed. In vector-based systems map overlay is time-consuming, complex and computationally expensive. In raster-based systems it is quick, straightforward and efficient.

VECTOR OVERLAY CAPABILITIES Vector GIS displays the locations of all objects stored using points and arcs. Attributes and entity types can be displayed by varying colours, line patterns and point symbols. Using vector GIS, one may display only a subset of the data. Relational query is an important concept in vector overlay analysis. Different systems use different ways of formulating queries. Structured Query Language (SQL) is used by many systems.
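A relational query that selects a subset of features by attribute might look like the sketch below. It uses Python's built-in sqlite3 module purely for illustration. The table name, column names and parcel records are hypothetical, and a real GIS would link each selected row back to its geometry.

```python
import sqlite3

# In-memory attribute table for a hypothetical parcel layer.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parcels (id INTEGER PRIMARY KEY, land_use TEXT, area_ha REAL)"
)
conn.executemany(
    "INSERT INTO parcels VALUES (?, ?, ?)",
    [(1, "forest", 12.5), (2, "urban", 3.0), (3, "forest", 4.2), (4, "cropland", 20.1)],
)

# Relational query: forest parcels larger than 5 hectares.
rows = conn.execute(
    "SELECT id FROM parcels WHERE land_use = ? AND area_ha > ? ORDER BY id",
    ("forest", 5.0),
).fetchall()
selected_ids = [row[0] for row in rows]
print(selected_ids)   # only these features would be displayed
conn.close()
```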

Suppose we wish to produce a map of major soil types from a layer whose polygons are based on a much more finely defined classification scheme. To do this, we process the data in three steps: (i) Reclassify areas by a single attribute or some combination of attributes; for instance, reclassify soil areas by soil type only. (ii) Dissolve boundaries between areas of the same type by deleting the arc between two polygons if the relevant attributes are the same in both polygons. (iii) Merge polygons into larger objects by recording the sequence of line segments that connect to form the boundary, that is, rebuild topology and assign new ID numbers to each new object.
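The reclassify, dissolve and merge steps above can be sketched on the attribute and adjacency information alone. This simplified illustration assumes each polygon has an ID and a fine class, and that shared arcs are given as pairs of adjacent polygon IDs. It does not rebuild boundary geometry, and the class codes and function name are hypothetical.

```python
def dissolve(fine_class, shared_arcs, reclass_table):
    """Return {old polygon id: (new object id, major type)}."""
    # Step (i): reclassify every polygon to its major type.
    major = {pid: reclass_table[cls] for pid, cls in fine_class.items()}

    # Step (ii): dissolve. Polygons joined by an arc with equal attributes
    # on both sides are grouped with a small union-find structure.
    parent = {pid: pid for pid in fine_class}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x

    for a, b in shared_arcs:
        if major[a] == major[b]:            # the arc between a and b is deleted
            parent[find(a)] = find(b)

    # Step (iii): merge. Each group becomes one object with a new ID.
    new_ids, result = {}, {}
    for pid in sorted(fine_class):
        root = find(pid)
        new_ids.setdefault(root, len(new_ids) + 1)
        result[pid] = (new_ids[root], major[pid])
    return result

# Hypothetical layer: four polygons in a row, fine classes A1, A2, B1, A1.
fine_class = {1: "A1", 2: "A2", 3: "B1", 4: "A1"}
shared_arcs = [(1, 2), (2, 3), (3, 4)]
reclass_table = {"A1": "A", "A2": "A", "B1": "B"}
merged = dissolve(fine_class, shared_arcs, reclass_table)
print(merged)
```

Polygon 4 has the same major type as polygons 1 and 2 but shares no arc with them, so it remains a separate object.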

RASTER OVERLAY In the raster data structure everything is represented by grid cells. A point is represented by a single cell, a line by a string of cells and an area by a group of cells. The method of overlaying thematic layers is therefore different from vector overlay. Raster overlay can be performed using map algebra (map mathematics). This is the most important function in raster overlay: it is a cell-by-cell operation on the input data layers, and it depends on appropriate coding of the point, line and area features in those layers.
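A cell-by-cell map algebra operation can be sketched with plain nested lists. The example assumes both layers cover the same area at the same resolution. The layer names and the coding scheme (1 for road cells, 2 for lake cells) are hypothetical, chosen so that the sum identifies each combination uniquely.

```python
def local_overlay(layer_a, layer_b, op):
    """Apply op(a, b) to each pair of corresponding cells in two raster layers."""
    same_shape = len(layer_a) == len(layer_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(layer_a, layer_b)
    )
    if not same_shape:
        raise ValueError("layers must share the same grid dimensions")
    return [
        [op(a, b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(layer_a, layer_b)
    ]

# A line feature stored as a string of cells (road = 1).
roads = [[0, 1, 0],
         [0, 1, 0],
         [0, 1, 1]]
# An area feature stored as a group of cells (lake = 2).
lake = [[0, 0, 0],
        [2, 2, 0],
        [2, 2, 0]]

composite = local_overlay(roads, lake, lambda a, b: a + b)
for row in composite:
    print(row)
```

With this coding, a composite value of 3 marks cells where the road crosses the lake. If both layers used the code 1, a sum of 2 would still mark the crossing, but a sum of 1 could no longer tell a road cell from a lake cell, which is why appropriate coding of the input layers matters.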

Two issues are specifically considered in performing raster overlay: resolution and scales/levels of measurement. These two parameters of digital data affect the results of raster overlay modelling. Considering them is very useful in reducing the degree of uncertainty and improving the accuracy and precision of GIS data analysis.

MODELLING IN GIS A model is a simplified version of a concept. It is a simplified representation of a phenomenon or a complex system as a simple and understandable concept of the real world. It may be a graphical, mathematical, physical, or verbal representation of a concept, phenomenon, relationship, structure, system, or an aspect of the real world. A model may have the following objectives: a) To facilitate understanding by eliminating unnecessary components, b) To aid in decision making by simulating 'what if' scenarios, c) To explain, control, and predict events on the basis of past observations.

General types of Models: Structural Model: A structural model focuses on the composition and construction of things. There are two types of structural models: Object Model: This type of model forms a visual representation of an item. Characteristics include scaled, 2- or 3-dimensional, symbolic representation. For example: an architect's blueprint of a building. Action Model: It tracks the space/time relationships of items. Characteristics include change detection, transition statistics and animation. For example: a model train along its track.

Relational Model: A relational model focuses on the interdependence and relationships among factors. There are two types of relational models: Functional Model: This model is based on the input/output method. It tracks relationships among variables, such as storm runoff prediction. Characteristics include cause/effect linkages and sensitivity analysis. Conceptual Model: It is perception-based. It incorporates both fact interpretation and value weights, such as suitability for outdoor recreation. Characteristics include heuristics (expert rules) and scenarios.

GIS MODELS When a Geographical Information System (GIS) is used in the process of building models with spatial data, it is called GIS modelling. GIS modelling involves symbolic representation of locational properties (Where?), as well as thematic (What?) and temporal (When?) attributes describing characteristics and conditions with reference to space and time. There are two types of GIS model: Cartographic Model: It is the automation of manual techniques that traditionally use drafting aids and transparent overlays, such as a map identifying locations of productive soils and gentle slopes using binary logic expressed as a geo-query. Spatial Model: A spatial model is the expression of mathematical relationships among mapped variables, such as a map of crop yield throughout a field based on relative amounts of phosphorus, potassium, nitrogen and pH levels, using multi-value logic expressed as variables, parameters and relationships.
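The difference between the two GIS model types can be made concrete with a small sketch. The first part expresses the productive-soils-and-gentle-slopes example as binary logic. The second part expresses a yield map as a mathematical relationship among mapped variables. The slope threshold, the linear form of the yield relationship and its coefficients are all hypothetical values chosen for illustration, not an agronomic model.

```python
# Cartographic model: binary logic expressed as a geo-query.
soil_productive = [[1, 1, 0],
                   [0, 1, 1]]      # 1 = productive soil, 0 = otherwise
slope_percent = [[3, 12, 4],
                 [2, 5, 20]]
GENTLE_MAX = 8                     # hypothetical limit for a "gentle" slope (%)

suitable = [
    [int(soil == 1 and slope <= GENTLE_MAX)
     for soil, slope in zip(soil_row, slope_row)]
    for soil_row, slope_row in zip(soil_productive, slope_percent)
]

# Spatial model: a mathematical relationship among mapped variables.
def predicted_yield(nitrogen, phosphorus, intercept=1.0, b_n=0.5, b_p=0.25):
    """Hypothetical linear relationship; the parameters are illustrative only."""
    return intercept + b_n * nitrogen + b_p * phosphorus

nitrogen = [[4, 8]]
phosphorus = [[4, 0]]
yield_map = [
    [predicted_yield(n, p) for n, p in zip(n_row, p_row)]
    for n_row, p_row in zip(nitrogen, phosphorus)
]

print(suitable)    # 1 where both conditions hold
print(yield_map)   # a continuous value per cell
```

The cartographic model answers yes or no for each cell, while the spatial model returns a value that changes when its variables or parameters change.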

Elements of GIS Modelling: A GIS model must have the following elements: a set of selected spatial variables, and a functional/mathematical relationship between the variables. A model is related to exploratory data analysis, data visualization and database management. A GIS model can be vector-based or raster-based.

Digital Elevation Model (DEM) A Digital Elevation Model (DEM) is the digital representation of the land surface elevation with respect to a reference datum. The term DEM is frequently used to refer to any digital representation of a topographic surface, and it is the simplest form of digital representation of topography. DEMs are used to determine terrain attributes such as elevation at any point, slope and aspect. Terrain features such as drainage basins and channel networks can also be identified from DEMs. DEMs are widely used in hydrologic and geologic analyses, hazard monitoring, natural resources exploration, agricultural management, etc. The three main types of structure used are: a) Regular square grids, b) Triangulated irregular networks (TIN), c) Contours.
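Deriving slope and aspect from a regular-grid DEM can be sketched with simple central differences. This is one of several possible estimators and is shown only as an illustration. The sketch assumes row 0 is the northern edge of the grid, columns increase eastward, and aspect is reported as the compass bearing of the downslope direction. The function name and sample grid are hypothetical.

```python
import math

def slope_aspect(dem, row, col, cell_size):
    """Slope (degrees) and downslope aspect (compass degrees) at an interior cell."""
    # Rate of change of elevation toward the east and toward the north.
    dz_dx = (dem[row][col + 1] - dem[row][col - 1]) / (2 * cell_size)
    dz_dy = (dem[row - 1][col] - dem[row + 1][col]) / (2 * cell_size)

    slope_deg = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    if dz_dx == 0 and dz_dy == 0:
        return slope_deg, None            # flat cell: aspect is undefined

    # Downslope vector is the negative gradient; bearing = atan2(east, north).
    aspect_deg = math.degrees(math.atan2(-dz_dx, -dz_dy)) % 360
    return slope_deg, aspect_deg

# Hypothetical 10 m grid: the surface rises 1 m per cell toward the east.
dem = [[0, 1, 2],
       [0, 1, 2],
       [0, 1, 2]]
slope, aspect = slope_aspect(dem, 1, 1, 10.0)
print(round(slope, 2), aspect)   # the slope faces west, bearing 270
```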

Creation of DEMs Several methods are available to create a DEM. a) Conversion of printed contour lines: The first method is to convert printed contour lines to digital form and use them in raster or vector form. The elevation contours are "tagged" with elevations. Any additional elevation data are created from the hydrography layer. Finally, an algorithm is used to interpolate elevations at every grid point from the contour data. b) Photogrammetry: This can be done manually or automatically: (i) Manually, an operator looks at a pair of stereo photos through a stereo plotter and must move two dots together until they appear to be one dot lying just at the surface of the ground. (ii) Automatically, an instrument calculates the parallax displacement of a large number of points.
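The interpolation step in method (a) can be illustrated with inverse distance weighting (IDW). The text does not name the algorithm used, so IDW is an assumption chosen here for its simplicity. The sample points, standing in for vertices digitized along tagged contours, and the grid spacing are hypothetical.

```python
import math

def idw(x, y, samples, power=2.0):
    """Inverse-distance-weighted elevation at (x, y).

    samples: list of (x, y, z) points, e.g. vertices along tagged contours.
    """
    weighted_sum = weight_total = 0.0
    for sx, sy, sz in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0:
            return float(sz)              # exactly on a sample: use its elevation
        w = 1.0 / d ** power              # nearer samples get larger weights
        weighted_sum += w * sz
        weight_total += w
    return weighted_sum / weight_total

# Hypothetical vertices on the 100 m and 120 m contours.
contour_points = [(0, 0, 100), (0, 10, 100), (10, 0, 120), (10, 10, 120)]

# Interpolate an elevation at every grid point (5-unit spacing).
grid = [[idw(col * 5, row * 5, contour_points) for col in range(3)]
        for row in range(3)]
for row in grid:
    print([round(z, 1) for z in row])
```

Grid points that coincide with a contour vertex keep the tagged elevation, and points between the contours receive intermediate values.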

Types of DEM: A DEM can be represented as a raster (a grid of squares, also known as a heightmap when representing elevation) or as a vector-based triangular irregular network (TIN). The TIN DEM dataset is also referred to as a primary (measured) DEM, whereas the Raster DEM is referred to as a secondary (computed) DEM.

Methods for obtaining elevation data used to create DEMs: Lidar (Light Detection And Ranging, sometimes Light Imaging, Detection, And Ranging); stereo photogrammetry from aerial surveys; multi-view stereo applied to aerial photography; interferometry from radar data; Real Time Kinematic GPS; topographic maps; theodolite or total station; Doppler radar; surveying and mapping drones; range imaging.

Gridded structure: A gridded DEM (GDEM) consists of regularly placed, uniform grid cells, each holding the elevation information for that cell. The GDEM thus gives a readily usable dataset that represents the elevation of the surface as a function of geographic location at regularly spaced horizontal (square) grid cells. Measure of quality: The quality of a DEM depends on its horizontal and vertical accuracy. The accuracy of the GDEM and the size of the data both depend on the grid size.
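Two properties of the gridded structure can be shown in a short sketch: elevation is looked up directly from a location, and data volume grows quickly as the grid size shrinks. The sketch assumes a north-up grid whose origin is its top-left corner. The function names, extent and elevations are hypothetical.

```python
import math

def gdem_cell_count(width_m, height_m, cell_size_m):
    """Number of cells needed to cover a rectangular extent."""
    return math.ceil(width_m / cell_size_m) * math.ceil(height_m / cell_size_m)

def elevation_at(grid, x, y, x_min, y_max, cell_size):
    """Elevation of the cell containing (x, y); row 0 is the northern edge."""
    col = int((x - x_min) // cell_size)
    row = int((y_max - y) // cell_size)
    if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
        raise ValueError("location is outside the grid")
    return grid[row][col]

# A 9 km x 9 km extent at two grid sizes.
coarse = gdem_cell_count(9000, 9000, 90)
fine = gdem_cell_count(9000, 9000, 30)
print(coarse, fine, fine // coarse)

# Hypothetical 2 x 2 grid of 30 m cells with its top-left corner at (0, 60).
grid = [[120, 125],
        [110, 115]]
print(elevation_at(grid, 45, 50, x_min=0, y_max=60, cell_size=30))
```

Reducing the cell size from 90 m to 30 m multiplies the number of stored elevations by nine, which is the trade-off between accuracy and data size noted above.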

Triangular Irregular Network (TIN) structure: A TIN is a more robust way of storing spatially varying information. It uses irregularly spaced sampling points connected to form non-overlapping triangles. Each triangle vertex carries the surface elevation of its sampling point, and the triangles (facets) represent the planes connecting the points.
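Because each TIN facet is a plane through three sampled points, the elevation anywhere inside a triangle can be found by linear interpolation. The sketch below uses barycentric weights for this. The function name and the sample triangle are hypothetical, and locating which triangle contains a point is left out.

```python
def tin_elevation(p, triangle):
    """Elevation at p = (x, y) on the planar facet defined by three (x, y, z) vertices.

    Returns None if p lies outside the triangle.
    """
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = triangle
    x, y = p
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle (vertices are collinear)")
    # Barycentric weights: each vertex's share of influence at p.
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < 0:
        return None
    return w1 * z1 + w2 * z2 + w3 * z3

# Hypothetical facet with vertex elevations of 100, 110 and 120 m.
facet = [(0, 0, 100), (10, 0, 110), (0, 10, 120)]
print(tin_elevation((2, 3), facet))
print(tin_elevation((10, 10), facet))   # outside this facet
```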

Contour-based structure : Contours represent points having equal heights/ elevations with respect to a particular datum such as Mean Sea Level (MSL). In the contour-based structure, the contour lines are traced from the topographic maps and are stored with their location (x, y) and elevation information. These digital contours are used to generate polygons, and each polygon is tagged with the elevation information from the bounding contour.

Common uses of DEMs: Extracting terrain parameters. Modeling water flow or mass movement (for example, landslides). Creation of relief maps. Rendering of 3D visualizations. Creation of physical models (including raised-relief maps). Rectification of aerial photography or satellite imagery. Reduction (terrain correction) of gravity measurements (gravimetry, physical geodesy).

EXPERT SYSTEMS (ES) Expert systems (ES) are computer systems that advise on or help solve real-world problems which would normally require a human expert's interpretation. Such systems work through problems using a computer model of expert human reasoning. Geographic data input is one area where expert systems can be used to extract features from imagery, exploit the potential of automatic scanning of manuscript maps, manage the editing of geographic data at the same time data are being captured, and assess the quality of data being entered into the system. Development of intelligent user interfaces will make GIS responsive to user needs because the user will no longer have to become an expert in the use of GIS in addition to their own field of specialization.

Characteristics of expert systems Expert systems are unique in their ability to "explain" their line of reasoning or justify the conclusions reached. The ability to "explain" is considered one of the fundamental criteria in defining a cartographic expert system. The ability to explain the reasoning behind a conclusion implies a certain level of self-knowledge. Thus, a cartographic expert system should not only be able to make excellent maps, but also explain why specific decisions were made.

Organization of expert systems Expert systems differ from conventional computer programs in their organization. Ordinary computer programs organize knowledge on two levels: data and program. Most expert systems organize knowledge on three levels: facts, rules, and inference. The knowledge base holds declarative knowledge (i.e., facts and rules) about the particular problem being solved. In conventional computer programs, rules are embedded in the procedural knowledge coded as the program. Hence, it is difficult to separate the rules from the procedural, or control, mechanism of program execution (i.e., inference). A knowledge-based system separates domain-specific rules from the procedural language used for controlling program execution. This organization makes it much easier to encode and maintain facts and rules. In fact, PROLOG, a commonly used language for building expert systems, has been described as a database programming language.
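The three-level organization (facts, rules, inference) can be sketched with a tiny forward-chaining engine. This illustration is written in Python rather than PROLOG and is not a real cartographic expert system. The facts and rules are hypothetical. The recorded trace shows how such a system could "explain" each conclusion.

```python
# Level 1: facts (declarative knowledge about the current problem).
facts = {"scale_small", "feature_is_road"}

# Level 2: rules, kept as data and separate from the control mechanism.
# Each rule is (set of conditions, conclusion).
rules = [
    ({"scale_small", "feature_is_road"}, "symbolize_as_single_line"),
    ({"symbolize_as_single_line"}, "omit_lane_markings"),
]

# Level 3: inference (a generic procedure that knows nothing about maps).
def infer(facts, rules):
    """Forward chaining: fire rules until no new conclusion can be added."""
    known = set(facts)
    trace = []                                 # why each conclusion was reached
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                trace.append((conclusion, sorted(conditions)))
                changed = True
    return known, trace

known, trace = infer(facts, rules)
for conclusion, because in trace:
    print(f"{conclusion} because {', '.join(because)}")
```

Adding or changing a rule only touches the `rules` list and leaves `infer` unchanged, which is the maintenance benefit of separating domain rules from program control.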
