UNIT 3: INPUT, STORAGE AND EDITING AND GIS
GEO 382: GEOGRAPHICAL INFORMATION SYSTEMS
GEOLOGY AND GEOPHYSICS DEPARTMENT, KING SAUD UNIVERSITY

SOURCES OF DATA

As previously identified, two types of data are input into a GIS: spatial and attribute. The data input process is the operation of encoding both types of data into the GIS database formats.

The creation of a clean digital database is the most important and time-consuming task, and the usefulness of the GIS depends on it. The establishment and maintenance of a robust spatial database is the cornerstone of a successful GIS implementation.

The digital data is the most expensive part of the GIS. Generally, 60 to 80% of the cost incurred during implementation of GIS technology lies in data acquisition, data compilation and database development.

A wide variety of data sources exist for both spatial and attribute data. The most common general sources for spatial data are:

- hard copy maps;
- aerial photographs;
- remotely sensed imagery;
- point data samples from surveys; and
- existing digital data files.

Existing hard copy maps, sometimes referred to as analogue maps, provide the most popular source for any GIS project. Potential users should be aware that, while many private sector firms specialize in providing digital data, government agencies are also an excellent source of data.

Attribute data has an even wider variety of data sources. Any textual or tabular data that can be referenced to a geographic feature, e.g. a point, line, or area, can be input into a GIS. Attribute data is usually input by manual keying or via a bulk loading utility of the DBMS software. ASCII format is a de facto standard for the transfer and conversion of attribute information.

DATA INPUT TECHNIQUES

Since the input of attribute data is usually quite simple, the discussion of data input techniques will be limited to spatial data only. There is no single method of entering spatial data into a GIS. Rather, there are several mutually compatible methods that can be used singly or in combination. The choice of data input method is governed largely by the application, the available budget, and the type and complexity of the data being input.

There are at least five basic procedures for inputting spatial data into a GIS. These are:

1. Keyboard entry;
2. Manual digitizing;
3. Automatic scanning;
4. Entry of coordinates using coordinate geometry (COGO); and
5. Conversion of existing digital data.

DATA INPUT TECHNIQUES (KEYBOARD ENTRY)

Keyboard entry is the first technique and involves manually entering the data at a computer terminal. Attribute data are commonly input by keyboard, whereas spatial data are rarely input this way. Keyboard entry may also be used during manual digitizing to enter the attribute information.

DATA INPUT TECHNIQUES (DIGITIZING)

The second technique used for the entry of spatial data is manual digitizing. A digitizer is an electronic device consisting of a table upon which the map or drawing is placed. The user traces the spatial features with a hand-held magnetic pen, often called a mouse or cursor. While the features are traced, the coordinates of selected points, e.g. vertices, are sent to the computer and stored. All recorded points are registered against positional control points, usually the map corners, that are keyed in by the user at the beginning of the digitizing session.

The coordinates are recorded in a user-defined coordinate system or map projection. Latitude and longitude and UTM are most often used. The ability to adjust or transform data during digitizing from one projection to another is a desirable function of the GIS software. Numerous functional techniques exist to aid the operator in the digitizing process.

Digitizing can be done in point mode, where single points are recorded one at a time, or in stream mode, where a point is collected at regular intervals of time or distance, measured by an X and Y movement, e.g. every 3 metres. Digitizing can also be done blindly or with a graphics terminal. Blind digitizing means that the graphic result is not immediately viewable to the person digitizing. Most systems display the digitized linework as it is being digitized on an accompanying graphics terminal.

Manual digitizing has many advantages. These include:

- low capital cost, e.g. digitizing tables are cheap;
- low cost of labour;
- flexibility and adaptability to different data types and sources;
- it is easily taught in a short amount of time, i.e. it is an easily mastered skill;
- generally high quality of data;
- digitizing devices are very reliable and most often offer greater precision than the data warrants; and
- the ability to easily register and update existing data.

DATA INPUT TECHNIQUES (AUTOMATIC SCANNING)

The third technique for the input of spatial data is automatic scanning. A variety of scanning devices exist for the automatic capture of spatial data. While several different technical approaches exist in scanning technology, all have the advantage of being able to capture spatial features from a map at a rapid rate. However, scanning has not yet proven to be a viable alternative for most GIS implementations. Scanners are generally expensive to acquire and operate. As well, most scanning devices have limitations with respect to the capture of selected features, e.g. text and symbol recognition. Experience has shown that most scanned data requires a substantial amount of manual editing to create a clean data layer.

Given these basic constraints, some other practical limitations of scanners should be identified. These include:

- hard copy maps often cannot be taken to where a scanning device is available, e.g. most companies or agencies cannot afford their own scanning device and therefore must send their maps to a private firm for scanning;
- hard copy data may not be in a form that is viable for effective scanning, e.g. maps are of poor quality or in poor condition;
- geographic features may be too few on a single map to make it practical, or cost-justifiable, to scan;
- on busy maps a scanner is often unable to distinguish the features to be captured from the surrounding graphic information, e.g. dense contours with labels;
- with raster scanning it is difficult to read unique labels (text) for a geographic feature effectively; and
- scanning is much more expensive than manual digitizing, considering all the cost/performance issues.

Consensus within the GIS community indicates that scanners work best when the information on a map is kept very clean, very simple, and uncluttered with graphic symbology. The current general consensus is that the quality of data captured by scanning devices is not high enough to justify the cost of using scanning technology.

DATA INPUT TECHNIQUES (COORDINATE GEOMETRY)

A fourth technique for the input of spatial data involves the calculation and entry of coordinates using coordinate geometry (COGO) procedures.
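As a rough illustration of a coordinate geometry calculation, the sketch below computes the corner coordinates of a parcel from a known starting point and a series of surveyed bearings and distances. The function name `traverse`, the sample values, and the convention that bearings are measured clockwise from north are assumptions made for this example; real COGO packages also handle curves, closure adjustment and units.

```python
import math

def traverse(start, legs):
    """Compute coordinates along a survey traverse.

    start: (easting, northing) of the known starting point.
    legs:  list of (bearing_degrees, distance) pairs. Bearings are assumed
           to be measured clockwise from north (an assumption of this sketch).
    """
    x, y = start
    points = [(x, y)]
    for bearing_deg, distance in legs:
        theta = math.radians(bearing_deg)
        x += distance * math.sin(theta)  # change in easting
        y += distance * math.cos(theta)  # change in northing
        points.append((x, y))
    return points

# Hypothetical rectangular parcel: 50 m east, 30 m south, 50 m west, 30 m north.
corners = traverse(
    (1000.0, 2000.0),
    [(90.0, 50.0), (180.0, 30.0), (270.0, 50.0), (0.0, 30.0)],
)
```

In this example the last computed point should coincide with the starting point, up to floating-point rounding; with real survey measurements, any gap between the two is the closure error of the traverse.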

COGO involves entering survey data using a keyboard. From these data the coordinates of spatial features are calculated. This produces a very high level of precision and accuracy, which is needed in a cadastral system (land record map). However, this technique is very costly and labour intensive.

DATA INPUT TECHNIQUES (CONVERSION OF EXISTING DIGITAL DATA)

A fifth technique that is becoming increasingly popular for data input is the conversion of existing digital data.

A variety of spatial data, including digital maps, are openly available from a wide range of government and private sources. The most common digital data to be used in a GIS is data from CAD systems. A number of data conversion programs exist, mostly from GIS software vendors, to transform data from CAD formats to a raster or topological GIS data format. Most GIS software vendors also provide an ASCII data exchange format specific to their product, and a programming subroutine library that allows users to write their own data conversion routines to fulfil their own specific needs.

Government agencies are usually a good source for technical information on data conversion requirements. Some of the data formats common to the GIS marketplace are:

- IGDS - Interactive Graphics Design System (Intergraph / MicroStation)
- DLG - Digital Line Graph (US Geological Survey)
- DXF - Drawing Exchange Format (AutoCAD)
- GENERATE - ARC/INFO Graphic Exchange Format
- EXPORT - ARC/INFO Export Format

Please note that most of these formats are used only for graphic data. Attribute data is usually handled as ASCII text files.
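To show what handling attribute data as an ASCII text file can look like in practice, here is a minimal sketch that reads a comma-delimited file into a lookup table keyed by the feature's unique identifier. The file contents, the column names (`stand_id`, `species`, `age`) and the function `load_attributes` are hypothetical and are not taken from any particular GIS product.

```python
import csv
import io

# Hypothetical comma-delimited ASCII export; the column names are assumptions.
ASCII_TEXT = """stand_id,species,age
101,pine,45
102,spruce,60
103,fir,30
"""

def load_attributes(text, key_field):
    """Read delimited ASCII text into a dict keyed by the unique identifier."""
    table = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = row[key_field]
        if key in table:
            # A repeated identifier would make the later link to features ambiguous.
            raise ValueError(f"duplicate identifier: {key}")
        table[key] = row
    return table

attributes = load_attributes(ASCII_TEXT, "stand_id")
```

All values are read as text here; a real loader would also convert numeric fields and validate them against the expected schema.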

DATA EDITING AND QUALITY ASSURANCE

Data editing and verification respond to the errors that arise during the encoding of spatial and non-spatial data. The editing of spatial data is a time-consuming, interactive process that can take as long as, if not longer than, the data input process itself.

ERRORS DURING DATA INPUT

Incompleteness of the spatial data
This includes missing points, line segments, and/or polygons.

Locational placement errors of spatial data
These types of errors are usually the result of careless digitizing or poor quality of the original data source.

Distortion of the spatial data
This kind of error is usually caused by base maps that are not scale-correct over the whole image, e.g. aerial photographs, or by material stretch, e.g. paper documents.

Incorrect linkages between spatial and attribute data
This type of error is commonly the result of incorrect unique identifiers (labels) being assigned during manual key-in or digitizing. This may involve assigning an entirely wrong label to a feature, or assigning more than one label to a feature.

Attribute data is wrong or incomplete
Often the attribute data does not match exactly with the spatial data. This is because they are frequently from independent sources and often from different time periods. Missing data records or too many data records are the most common problems.

SPATIAL DATA ERRORS

A variety of common data problems occur in converting data into a topological structure. These originate from the original quality of the source data and the characteristics of the data capture process. Usually data is input by digitizing. Digitizing allows a user to trace spatial data from a hard copy product, e.g. a map, and have it recorded by the computer software. Most GIS software has utilities to clean the data and build a topological structure. The cleaning process can be very lengthy.
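One cleaning operation of this kind can be sketched as follows: line endpoints that fall within a small tolerance of an already known node are snapped onto that node, so that lines which were meant to meet actually share a coordinate. This is only an illustrative assumption about what a cleaning utility might do. The function `snap_endpoints`, the tolerance value and the sample lines are hypothetical, and each line is assumed to have at least two vertices.

```python
import math

def snap_endpoints(lines, tolerance):
    """Snap line endpoints that lie within `tolerance` of an earlier node.

    lines: list of lines, each a list of at least two (x, y) tuples.
    Returns new lines; interior vertices are left unchanged.
    """
    nodes = []  # node positions accepted so far

    def snapped(point):
        for node in nodes:
            if math.hypot(point[0] - node[0], point[1] - node[1]) <= tolerance:
                return node  # reuse the existing node
        nodes.append(point)  # otherwise this endpoint becomes a new node
        return point

    cleaned = []
    for line in lines:
        start = snapped(line[0])
        end = snapped(line[-1])
        cleaned.append([start] + list(line[1:-1]) + [end])
    return cleaned

# Hypothetical digitized lines: the second line starts just off the end of the first.
lines = [
    [(0.0, 0.0), (10.0, 0.0)],
    [(10.02, 0.01), (10.0, 10.0)],
    [(0.0, 5.0), (5.0, 5.0)],
]
cleaned = snap_endpoints(lines, tolerance=0.05)
```

The choice of tolerance matters: too small and gaps remain, too large and distinct features are merged, which is one reason cleaning usually ends with interactive review.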

Experience indicates that in the course of any GIS project, 60 to 80% of the time required to complete the project is spent on the input, cleaning, linking, and verification of the data.

ATTRIBUTE DATA ERRORS

The identification of attribute data errors is usually not as simple as that of spatial errors. This is especially true if the errors stem from the quality or reliability of the data. Such errors usually do not surface until later in the GIS processing. Solutions to these types of problems are much more complex and often do not exist entirely. Simple errors of linkage, e.g. missing or duplicate records, become evident during the linking operation between spatial and attribute data. Again, most GIS software contains functions that check for and clearly identify problems of linkage during attempted operations. This is also an area of consideration when evaluating GIS software.

DATA VERIFICATION

Six clear steps stand out in the data editing and verification process for spatial data. These are:

1) Visual review: This is usually done by check plotting.
2) Cleanup of lines and junctions: This process is usually done by software first and interactive editing second.
3) Weeding of excess coordinates: This process involves the removal of redundant vertices by the software for linear and/or polygonal features.
4) Correction for distortion and warping: Most GIS software has functions for scale correction and rubber sheeting. However, the particular rubber sheeting algorithm used will vary depending on the spatial data model, vector or raster, employed by the GIS. Some raster techniques may be more computationally intensive than vector-based algorithms.
5) Construction of polygons: Since the majority of data used in GIS is polygonal, the construction of polygon features from lines/arcs is necessary. Usually this is done in conjunction with the topological building process.
6) Addition of unique identifiers or labels: Often this process is manual. However, some systems do provide the capability to automatically build labels for a data layer.

These data verification steps occur after the data input stage and prior to or during the linkage of the spatial data to the attributes. Data verification ensures the integrity between the spatial and attribute data. Verification should include some brief querying of attributes and cross-checking against known values.
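The integrity check between spatial and attribute data can be sketched as a comparison of two lists of identifiers: the labels attached to the spatial features and the keys of the attribute records. The function `check_linkage` and the sample identifiers below are hypothetical. The sketch reports the simple linkage errors mentioned earlier, namely missing and duplicate records.

```python
from collections import Counter

def check_linkage(feature_ids, record_ids):
    """Compare spatial feature labels with attribute record keys."""
    feature_counts = Counter(feature_ids)
    record_counts = Counter(record_ids)
    return {
        # Features whose label has no matching attribute record.
        "features_without_record": sorted(set(feature_counts) - set(record_counts)),
        # Attribute records that no feature refers to.
        "records_without_feature": sorted(set(record_counts) - set(feature_counts)),
        # Labels assigned to more than one feature.
        "duplicate_feature_labels": sorted(k for k, n in feature_counts.items() if n > 1),
        # Keys that appear on more than one attribute record.
        "duplicate_records": sorted(k for k, n in record_counts.items() if n > 1),
    }

# Hypothetical identifiers from a digitized layer and its attribute table.
report = check_linkage([101, 102, 103, 103, 105], [101, 102, 104, 104])
```

A check like this only finds structural mismatches. Attribute values that are present but wrong still have to be caught by querying and cross-checking against known values.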

ORGANIZING DATA FOR ANALYSIS

Most GIS software organizes spatial data in a thematic approach that categorizes data in vertical layers. Typical layers used in natural resource management agencies or companies include forest cover, soil classification, elevation, road network (access), ecological areas, hydrology, etc. Spatial data layers are commonly input one at a time, e.g. forest cover. Accordingly, attribute data is entered one layer at a time.

Most often, the spatial and attribute data are entered at different times and linked together later. The requirements for any GIS project must be clearly identified before any data input procedures and/or layer definitions take place. It is mandatory that GIS users fully understand their needs before undertaking a GIS project.

VERTICAL DATA ORGANIZATION: SPATIAL DATA LAYERS

In most GIS software, data is organized in themes as data layers. This approach allows data to be input as separate themes and overlaid based on analysis requirements. This can be conceptualized as vertical layering of the characteristics of the earth's surface.

In any GIS project a variety of data layers will be required. These must be identified before the project is started, and a priority given to the input or digitizing of the spatial data layers. When considering the physical requirements of the GIS software, it is important to understand that two types of data are required for each layer: attribute and spatial data.

EDITING AND UPDATING OF DATA

Perhaps the primary function in the data storage and retrieval subsystem involves the editing and updating of data. Frequently, the following data editing capabilities are required:

- interactive editing of spatial data;
- interactive editing of attribute data;
- the ability to add, manipulate, modify, and delete both spatial features and attributes (independently or simultaneously); and
- the ability to edit selected features in a batch processing mode.
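Editing selected features in a batch processing mode can be sketched as selecting features by a condition and applying the same change to each of them. The in-memory layout below, where each feature in a layer carries both its geometry and its attributes, is a hypothetical simplification rather than the storage format of any real GIS. The function `batch_update` and the road data are likewise assumptions.

```python
def batch_update(features, predicate, changes):
    """Apply the same attribute changes to every feature matching predicate.

    features: list of dicts with "id", "geometry" and "attributes" keys
              (a hypothetical in-memory layout, not a real GIS format).
    Returns the ids of the features that were edited.
    """
    edited = []
    for feature in features:
        if predicate(feature):
            feature["attributes"].update(changes)
            edited.append(feature["id"])
    return edited

# Hypothetical road network layer: spatial data plus attributes for each feature.
roads = [
    {"id": 1, "geometry": [(0, 0), (5, 0)], "attributes": {"surface": "gravel", "status": "open"}},
    {"id": 2, "geometry": [(5, 0), (5, 8)], "attributes": {"surface": "paved", "status": "open"}},
    {"id": 3, "geometry": [(5, 8), (9, 8)], "attributes": {"surface": "gravel", "status": "open"}},
]

# Mark every gravel road as seasonal in one pass.
edited_ids = batch_update(
    roads,
    lambda f: f["attributes"]["surface"] == "gravel",
    {"status": "seasonal"},
)
```

Returning the list of edited identifiers gives the operator something to review after the batch run, which is useful because batch edits are harder to inspect than interactive ones.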

Updating implies the resurvey and processing of new information. The updating function is of great importance during any GIS project. Periodic data updates are often required, and these frequently involve an increase in the accuracy and/or detail of the data layer. Depending on the particular GIS, the update process may involve some data manipulation and analysis functions.

DATA RETRIEVAL AND QUERYING

The ability to query and retrieve data based on some user-defined criteria is a necessary feature of the data storage and retrieval subsystem. Data retrieval involves the capability to easily select data for graphic or attribute editing, updating, querying, analysis and/or display. The ability to retrieve data is based on the unique structure of the DBMS, and command interfaces are commonly provided with the software. Most GIS software also provides a programming subroutine library, or macro language, so that users can write their own specific data retrieval routines if required.

Querying is the capability to retrieve data, usually a data subset, based on some user-defined formula. Many GIS software offerings have attempted to standardize their querying capability by use of Structured Query Language (SQL).
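As a small illustration of an SQL-based attribute query, the sketch below uses SQLite through Python's standard `sqlite3` module to retrieve a subset of records that satisfy a user-defined condition. The table name `forest_cover`, its columns and the sample rows are hypothetical, and SQLite here simply stands in for whatever DBMS a particular GIS uses.

```python
import sqlite3

# In-memory database standing in for the attribute side of a GIS layer.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE forest_cover (stand_id INTEGER PRIMARY KEY, species TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO forest_cover VALUES (?, ?, ?)",
    [(101, "pine", 45), (102, "spruce", 60), (103, "pine", 80)],
)

def select_stands(connection, species, min_age):
    """Return the ids of stands of a given species at or above a minimum age."""
    rows = connection.execute(
        "SELECT stand_id FROM forest_cover "
        "WHERE species = ? AND age >= ? ORDER BY stand_id",
        (species, min_age),
    )
    return [row[0] for row in rows]

old_pine = select_stands(conn, "pine", 50)
```

The returned identifiers are the link back to the spatial data: the GIS would use them to select the matching features for display, editing or analysis.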
