Lecture 01 - Data, Information, knowledge, and Data Models.pptx
louisnguyenn25
54 slides
May 08, 2024
About This Presentation
MN404
Slide Content
MN405 Data and Information Management
Lecture 1: Data, Information, Knowledge, and Data Models
Acknowledgment
The content of these lecture slides is adapted from the following book: C. Coronel and S. Morris, Database Systems: Design, Implementation, & Management, 14th ed., Cengage Learning, 2023, Chapters 1 and 2.
Lecture 1 Objectives
By the end of this lecture, you should be able to:
- Define the difference between data and information
- Describe what a database is and how it is important for decision making
- Explain the importance of database design
- Understand flaws in file system data management
- Outline the main components of the database system
- Describe the main functions of a database management system (DBMS)
- Discuss data modeling and the basic data-modeling building blocks
- Define what business rules are and how they influence database design
- List emerging alternative data models and the needs they fulfill
Why Databases?
Data versus Information (1 of 2)
- Data: raw facts
- Information: the result of processing raw data to reveal its context and meaning
- Knowledge: implies familiarity, awareness, and understanding of information
- Accurate, relevant, and timely information is the key to good decision making
- Data management: the discipline concerned with the proper generation, storage, and retrieval of data
Data versus Information (2 of 2)
Figure 1.2: Transforming Raw Data into Information
Compiled by: Dr Md Waliur Rahman Miah
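The step from raw data to information can be illustrated with a small sketch. The sales records, region names, and the `summarize_by_region` helper below are hypothetical and are not taken from Figure 1.2; they only show raw facts being processed into a summary that carries meaning.

```python
from collections import defaultdict

# Hypothetical raw data: (region, sale amount) facts with little meaning on their own.
raw_sales = [
    ("North", 120.0),
    ("South", 80.0),
    ("North", 200.0),
    ("South", 40.0),
]


def summarize_by_region(rows):
    """Process raw facts into information: total and average sale per region."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for region, amount in rows:
        totals[region] += amount
        counts[region] += 1
    return {
        region: {"total": totals[region], "average": totals[region] / counts[region]}
        for region in totals
    }


summary = summarize_by_region(raw_sales)
for region, stats in sorted(summary.items()):
    print(region, stats)
```

A manager can act on the per-region summary directly, which is the sense in which processed data becomes information for decision making.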
Introducing the Database
- Database: a shared, integrated computer structure that stores a collection of:
  - user data
  - metadata
- Database management system (DBMS): a collection of programs that:
  - manages the database structure
  - controls access to the data stored in the database
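The difference between user data and metadata can be seen in a minimal sketch using Python's built-in `sqlite3` module. SQLite is used here only as a convenient stand-in for a DBMS; the `customer` table and its column names are assumptions made for illustration.

```python
import sqlite3

# An in-memory SQLite database stands in for a DBMS (illustrative choice only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (cus_code INTEGER PRIMARY KEY, cus_name TEXT NOT NULL)"
)
conn.execute("INSERT INTO customer (cus_code, cus_name) VALUES (1, 'Ada')")

# User data: the facts the end user cares about.
user_data = conn.execute("SELECT cus_code, cus_name FROM customer").fetchall()

# Metadata: data about the data. PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk).
metadata = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(customer)")]

print("user data:", user_data)
print("metadata:", metadata)
```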
Role and Advantages of DBMS (1 of 2)
A DBMS presents a single, integrated view of the data and provides the following advantages:
- Improved data sharing
- Improved data security
- Better data integration
- Minimized data inconsistency
- Improved data access
- Improved decision making
- Increased end-user productivity
Role and Advantages of DBMS (2 of 2)
Figure 1.4: The DBMS Manages the Interaction Between the End User and the Database
Types of Databases (1 of 3)
Classification by users:
- Single-user database: a stand-alone database on a personal computer
- Multiuser database:
  - workgroup database
  - enterprise database
Classification by location:
- centralized database
- distributed database
- cloud database
Types of Databases (2 of 3)
Classification by data type:
- General-purpose databases
- Discipline-specific databases
- Operational database
- Analytical database, with two main components:
  - data warehouse
  - online analytical processing (OLAP)
Types of Databases (3 of 3)
Classification by data structure:
- Unstructured data
- Structured data
- Semistructured data
- XML database
You may also find:
- Social media database
- NoSQL database
Why Database Design is Important (1 of 3)
- Database design involves planning the database structure used to store and manage data
- Data must be properly decomposed into its constituent parts before it is stored
- A well-designed database:
  - facilitates data management
  - generates accurate and valuable information
- A poorly designed database:
  - causes difficult-to-trace errors
  - leads to poor decision making
Why Database Design is Important (2 of 3)
Figure 1.5: Employee Skills Certification in a Poor Design
Why Database Design is Important (3 of 3)
Figure 1.6: Employee Skills Certification in a Good Design
Evolution of File System Data Processing (1 of 2)
- Manual file systems: file folders and filing cabinets
- Computerized file systems: a data processing (DP) specialist created a computer-based filing system to track data and produce required reports
- File system redux in modern end-user productivity tools:
  - Business users widely use spreadsheet programs such as Microsoft Excel to enter data in a series of rows and columns and to manipulate that data
  - A common misuse of spreadsheets is as a substitute for a database
  - A spreadsheet "file system" suffers from the same problems as the early file systems
Evolution of File System Data Processing (2 of 2)
Table 1.2: Basic File Terminology
Term | Definition
Data | Raw facts, such as a telephone number, a birth date, a customer name, and a year-to-date (YTD) sales value. Data has little meaning unless it has been organized in some logical manner.
Field | A character or group of characters (alphabetic or numeric) that has a specific meaning. A field is used to define and store data.
Record | A logically connected set of one or more fields that describes a person, place, or thing. For example, the fields that constitute a record for a customer might consist of the customer's name, address, phone number, date of birth, credit limit, and unpaid balance.
File | A collection of related records. For example, a file might contain data about the students currently enrolled at Gigantic University.
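The terms in Table 1.2 map directly onto a simple text file. The sketch below uses Python's standard `csv` module on a hypothetical customer file; the field names and values are invented for illustration.

```python
import csv
import io

# A hypothetical customer "file": the header line names the fields,
# and each following line is one record.
file_text = (
    "cus_name,cus_phone,cus_balance\n"
    "Ramas,555-0101,0.00\n"
    "Dunne,555-0144,120.50\n"
)

# io.StringIO lets the text be read as if it were a file on disk.
records = list(csv.DictReader(io.StringIO(file_text)))
fields = list(records[0].keys())

print("fields:", fields)          # the named fields
print("first record:", records[0])  # one record: a connected set of fields
print("records in file:", len(records))
```

Note that every value is read back as plain text: the file itself carries no data types, which is one reason application programs in a file system must know the file's structure.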
Problems with File System Data Processing
- Lengthy development times
- Difficulty of getting quick answers
- Complex system administration
- Lack of security and limited data sharing
- Extensive programming
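Structural and Data Dependence
A file system exhibits:
- Structural dependence: access to a file depends on its structure, so a change to the file structure requires changes to every program that uses the file
- Data dependence: data access changes when the data storage characteristics change
Structural and data dependence make the file system cumbersome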
Data Redundancy
- Data redundancy: the same data is stored unnecessarily in different places
- Scattered data locations create islands of information, which may hold different versions of the same data
- Possible results of uncontrolled data redundancy:
  - Poor data security
  - Data inconsistency
  - Data-entry errors
  - Data integrity problems
Data Anomalies
- Data anomaly: develops when not all of the required changes to redundant data are made successfully
- Three types of anomalies:
  - Update anomalies
  - Insertion anomalies
  - Deletion anomalies
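A small sketch can make two of these anomalies concrete. The flat customer "file" below is hypothetical: it repeats each sales agent's phone number in every customer row, which is the kind of redundancy that lets anomalies develop.

```python
# Hypothetical flat "file": the agent's phone number is stored redundantly
# in every customer row handled by that agent.
customers = [
    {"customer": "Ramas", "agent": "Leah", "agent_phone": "555-0101"},
    {"customer": "Dunne", "agent": "Leah", "agent_phone": "555-0101"},
    {"customer": "Smith", "agent": "Alex", "agent_phone": "555-0202"},
]


def phones_for(agent, rows):
    """Return every distinct phone number recorded for an agent."""
    return {row["agent_phone"] for row in rows if row["agent"] == agent}


# Update anomaly: Leah's phone changes, but only one of her rows is updated.
customers[0]["agent_phone"] = "555-0999"
leah_phones = phones_for("Leah", customers)  # now two conflicting versions

# Deletion anomaly: deleting Alex's only customer also deletes Alex's phone.
remaining = [row for row in customers if row["customer"] != "Smith"]
alex_phones = phones_for("Alex", remaining)  # nothing left about Alex

print("Leah:", sorted(leah_phones))
print("Alex:", sorted(alex_phones))
```

An insertion anomaly is the mirror image of the deletion case: in this layout a new agent cannot be recorded until that agent has at least one customer.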
Database Systems (1 of 2)
- Database system: logically related data stored in a single logical data repository
  - the data might be physically distributed among multiple storage facilities
- The DBMS eliminates most of the file system's problems: data inconsistency, data anomalies, data dependence, and structural dependence
- DBMS software:
  - stores data structures, the relationships between those structures, and the access paths to those structures
  - defines, stores, and manages all access paths and components
Database Systems (2 of 2)
Figure 1.9: Contrasting Database and File Systems
The Database System Environment (1 of 2)
- Database system: an organization of components that define and regulate the collection, storage, management, and use of data
- The database system is composed of the following five components:
  - Hardware
  - Software
  - People
  - Procedures
  - Data
- Database solutions must be cost-effective as well as tactically and strategically effective
The Database System Environment (2 of 2)
Figure 1.10: The Database System Environment
DBMS Functions (1 of 3)
A DBMS performs the following important functions:
- Data dictionary management: stores the definitions of data elements and their relationships
- Data storage management: manages storage structures, with performance tuning
- Data transformation and presentation: transforms data to conform to required data structures and to the user's logical expectations
- Security management: enforces user security and data privacy
DBMS Functions (2 of 3)
A DBMS performs the following important functions (continued):
- Multiuser access control: allows concurrent access without compromising integrity
- Backup and recovery management: ensures data safety and integrity
- Data integrity management: enforces integrity rules, thus minimizing redundancy and maximizing data consistency
DBMS Functions (3 of 3)
A DBMS performs the following important functions (continued):
- Database access languages and application programming interfaces: e.g., SQL to access data, and APIs for Python, Java, and C to use the database
- Database communication interfaces: e.g., JDBC, mysql.connector, etc.
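As a rough illustration of a database access language used through an application programming interface, the sketch below sends SQL to a database from Python. It uses the standard-library `sqlite3` module as a stand-in rather than the `mysql.connector` driver named on the slide; the `product` table, its columns, and the `products_cheaper_than` helper are assumptions made for the example.

```python
import sqlite3

# In-memory SQLite database used as a stand-in DBMS (illustrative choice only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (p_code TEXT PRIMARY KEY, p_price REAL NOT NULL)")
conn.executemany(
    "INSERT INTO product (p_code, p_price) VALUES (?, ?)",
    [("A1", 9.5), ("B2", 20.0), ("C3", 4.25)],
)
conn.commit()


def products_cheaper_than(connection, limit):
    """Run a parameterized SQL query through the Python database API."""
    cursor = connection.execute(
        "SELECT p_code FROM product WHERE p_price < ? ORDER BY p_code",
        (limit,),
    )
    return [row[0] for row in cursor]


cheap = products_cheaper_than(conn, 10.0)
print(cheap)
```

The `?` placeholders let the driver pass values separately from the SQL text, which is the usual way to avoid building queries by string concatenation.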
Disadvantages of Database Systems
- Increased costs
- Management complexity
- Maintaining currency (keeping the system up to date)
- Vendor dependence
- Frequent upgrade/replacement cycles
Data Modeling
Data Modeling and Data Models
- Data modeling: the process of creating a specific data model for a problem domain
  - Data modeling is an iterative, progressive process
- Data model: a relatively simple representation of complex real-world data structures
- Database designers:
  - use data-modeling constructs and powerful database design tools
  - reduce errors in database modeling
Data Model Basic Building Blocks (1 of 2)
- Entity
- Attribute
- Relationship
- Constraints
The following are three different types of relationships:
- One-to-many (1:M or 1..*) relationship
- Many-to-many (M:N or *..*) relationship
- One-to-one (1:1 or 1..1) relationship
Data Model Basic Building Blocks (2 of 2)
- Constraint: a restriction placed on data
  - helps ensure data integrity
  - is expressed in the form of rules, e.g.:
    - an employee's salary must have values that are between 6,000 and 350,000
    - a student's GPA must be between 0.00 and 4.00
    - each class must have one and only one teacher
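One way a DBMS can enforce a rule such as the GPA range is a CHECK constraint. The sketch below is a minimal illustration using Python's `sqlite3` module; the `student` table, its column names, and the `try_insert` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK clause expresses the rule "a student's GPA must be between 0.00 and 4.00".
conn.execute(
    """
    CREATE TABLE student (
        stu_id  INTEGER PRIMARY KEY,
        stu_gpa REAL NOT NULL CHECK (stu_gpa BETWEEN 0.00 AND 4.00)
    )
    """
)


def try_insert(stu_id, gpa):
    """Attempt an insert; return True if accepted, False if the constraint rejects it."""
    try:
        conn.execute("INSERT INTO student (stu_id, stu_gpa) VALUES (?, ?)", (stu_id, gpa))
        return True
    except sqlite3.IntegrityError:
        return False


valid_accepted = try_insert(1, 3.5)    # within range
invalid_accepted = try_insert(2, 4.7)  # violates the CHECK constraint
row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

print(valid_accepted, invalid_accepted, row_count)
```

Because the rule lives in the table definition, every program that writes to the table is subject to it, rather than each program having to repeat the check.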
Business Rules
- Business rule: a precise and unambiguous description of a policy, procedure, or principle within an organization
- Used to define entities, attributes, relationships, and constraints
- Examples of business rules:
  - A customer may generate many invoices
  - An invoice is generated by only one customer
- Main sources of business rules: company managers, policy makers, department managers, and written documentation
Translating Business Rules into Data Model Components
"A customer may generate many invoices."
From this business rule, deduce the following:
- Customer and invoice are objects of interest for the business and should be represented by their respective entities
- There is a "generate" relationship between customer and invoice
The rule above is complemented by another business rule: "An invoice is generated by only one customer."
The relationship then is one-to-many (1:M)
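The two business rules can be sketched as a pair of tables in which each invoice carries a foreign key to exactly one customer. This is an illustrative sketch using Python's `sqlite3` module; the table names, column names, and sample rows are assumptions, and SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE customer (
        cus_code INTEGER PRIMARY KEY,
        cus_name TEXT NOT NULL
    );
    -- Each invoice references exactly one customer: the "many" side of 1:M.
    CREATE TABLE invoice (
        inv_number INTEGER PRIMARY KEY,
        cus_code   INTEGER NOT NULL REFERENCES customer (cus_code)
    );
    INSERT INTO customer VALUES (10, 'Ramas');
    INSERT INTO invoice VALUES (1001, 10);
    INSERT INTO invoice VALUES (1002, 10);
    """
)

# "A customer may generate many invoices."
invoices_for_customer = [
    row[0]
    for row in conn.execute(
        "SELECT inv_number FROM invoice WHERE cus_code = ? ORDER BY inv_number", (10,)
    )
]

# "An invoice is generated by only one customer": an invoice that points at a
# customer who does not exist is rejected.
try:
    conn.execute("INSERT INTO invoice VALUES (1003, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(invoices_for_customer, orphan_rejected)
```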
Naming Conventions
- Entity names: descriptive of the objects
- Attribute names: descriptive of the data represented
  - prefix the name of an attribute with the name or abbreviation of the entity in which it occurs
  - for example, in the CUSTOMER entity, the customer's credit limit may be called CUS_CREDIT_LIMIT
- A proper naming convention can help make your model self-documenting
Different Data Models
- Hierarchical and Network Models
- Relational Model
- Entity Relationship Model
- Object-Oriented Model
- Object/Relational and XML
- Emerging Data Models: Big Data and NoSQL
Hierarchical and Network Models (1 of 2)
- Hierarchical model: developed in the 1960s to manage large amounts of data for complex manufacturing projects
- The hierarchical structure contains levels, or segments
- Segment: the equivalent of a file system's record type
  - a segment in a higher layer is the parent of the segment directly beneath it, which is called the child
- The network model was created to represent complex data relationships more effectively than the hierarchical model
Hierarchical and Network Models (2 of 2)
Concepts that emerged with the network model and are still used by modern data models:
- Schema
- Subschema
- Data manipulation language (DML)
- Data definition language (DDL)
Relational Model
- The foundation of this model is a mathematical concept known as a relation
- Relation: a two-dimensional structure composed of rows and columns
  - a row represents a tuple (record), and a column represents an attribute
- Implemented through a sophisticated relational database management system (RDBMS)
- The RDBMS manages all the underlying details, while the user sees a collection of tables in which the data is stored
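The idea of a relation as a set of tuples over named attributes can be sketched in a few lines of plain Python. The `student` relation, its attribute names, and the `select` and `project` helpers are hypothetical; they mimic two basic relational operations and do not represent how an RDBMS is implemented.

```python
# Attribute names (columns) of a hypothetical STUDENT relation.
attributes = ("stu_id", "stu_name", "stu_gpa")

# The relation itself: a set of tuples (rows), one value per attribute.
student = {
    (1, "Ana", 3.6),
    (2, "Ben", 2.9),
    (3, "Chi", 3.9),
}


def select(relation, predicate):
    """Keep only the tuples (rows) for which the predicate holds."""
    return {t for t in relation if predicate(dict(zip(attributes, t)))}


def project(relation, names):
    """Keep only the named attributes (columns) of every tuple."""
    positions = [attributes.index(name) for name in names]
    return {tuple(t[i] for i in positions) for t in relation}


high_gpa_names = project(select(student, lambda row: row["stu_gpa"] >= 3.5), ["stu_name"])
print(sorted(high_gpa_names))
```

In SQL the same request would read roughly as `SELECT stu_name FROM student WHERE stu_gpa >= 3.5`, with the RDBMS handling storage and access paths behind the table view.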
Entity Relationship Model (1 of 3)
- Database designers prefer to use simple graphical tools:
  - to visualize entities and their relationships
  - to simplify the representation of complex database designs
  - the entity relationship (ER) model (ERM) was developed to do this
- The relational data model and the ERM combine to provide the foundation for complex, tightly structured database design
- An entity relationship diagram (ERD) uses graphical representations to model database components
Entity Relationship Model (2 of 3)
ER model components:
- Entity: a rectangular box
- Attributes: text inside the box
- Relationships: association lines between entities (or a diamond box)
Three ER notations:
- Chen notation
- Crow's Foot notation
- Class diagram notation (part of the Unified Modeling Language (UML))
Entity Relationship Model (3 of 3)
Figure 2.3: The ER Model Different Notations
Object-Oriented Model (1 of 3)
- Object-oriented data model (OODM): both data and its relationships are contained in a single structure known as an object
- The OODM is the basis for the object-oriented database management system (OODBMS)
- The OODM is a semantic data model: it indicates meaning
OODM components (continued on the next slide):
- Object: an abstraction of a real-world entity
- Attributes: properties of an object
- Class
Object-Oriented Model (2 of 3)
OODM components (continued):
- Class: a collection of objects with similar structure and behavior
- Method: a real-world action such as finding a selected PERSON's name, changing a PERSON's name, or printing a PERSON's address
- Class hierarchy: an upside-down tree where each class has only one parent
- Inheritance: the ability of an object to inherit the attributes and methods of parent classes
Object-oriented data models are typically depicted using Unified Modeling Language (UML) class diagrams
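The OODM components listed above correspond closely to constructs in an object-oriented programming language. The sketch below uses Python classes purely as an analogy; the `Person` and `Student` classes, their attributes, and the sample values are hypothetical, and an OODBMS would additionally make such objects persistent.

```python
class Person:
    """Class: a collection of objects with similar structure and behavior."""

    def __init__(self, name, address):
        # Attributes: properties of the object.
        self.name = name
        self.address = address

    def change_name(self, new_name):
        """Method: changing a PERSON's name."""
        self.name = new_name

    def format_address(self):
        """Method: producing a PERSON's address for printing."""
        return f"{self.name}, {self.address}"


class Student(Person):
    """Child class in the hierarchy: its single parent is Person."""

    def __init__(self, name, address, gpa):
        super().__init__(name, address)
        self.gpa = gpa


# Object: an abstraction of one real-world entity.
student = Student("Ana Lee", "1 Example St", 3.6)
student.change_name("Ana Tran")    # inherited from Person
label = student.format_address()   # inherited from Person
print(label)
```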
Object-Oriented Model (3 of 3)
Figure 2.4: A Comparison of the OO, UML, and ER Models
Object/Relational DBMS
- Extended relational data model (ERDM): adds the OO model's features within the relational database structure
- A DBMS based on the ERDM is often called an object/relational database management system (O/R DBMS)
- Advantages of O/R DBMSs:
  - the model's conceptual simplicity
  - data integrity
  - an easy-to-use query language
  - high transaction performance
  - high availability, security, scalability, and expandability
- Example applications: OLTP and OLAP databases
Emerging Data Models: Big Data and NoSQL (1 of 3)
- The Internet of Things (IoT) has accelerated the rate of data growth: about 2.5 quintillion bytes of data are created daily
- Big Data: a movement to find new and better ways to:
  - manage large amounts of web- and sensor-generated data
  - derive business insight from it
- Basic characteristics of Big Data databases: volume, velocity, and variety, or the 3 Vs
Emerging Data Models: Big Data and NoSQL (2 of 3)
Examples of Big Data technologies:
- Hadoop: a Java-based, open-source, high-speed, fault-tolerant distributed storage and computational framework
  - Hadoop Distributed File System (HDFS)
  - MapReduce
- NoSQL: a large-scale distributed database system that stores structured and unstructured data in efficient ways
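The MapReduce idea can be sketched in a few lines of single-process Python. This is a toy word count under stated assumptions, not Hadoop's API: the `documents` list and the `map_phase`, `shuffle`, and `reduce_phase` helpers are hypothetical names, and a real framework would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict

# Hypothetical input: each string stands for a document stored on some node.
documents = ["big data big insight", "data models and big data"]


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


mapped = [pair for document in documents for pair in map_phase(document)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

Because each document is mapped independently and each key is reduced independently, both phases can be distributed, which is what makes the approach suitable for very large data volumes.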
Emerging Data Models: Big Data and NoSQL (3 of 3)
General characteristics of NoSQL databases:
- not based on the relational model and SQL
- support highly distributed database architectures
- provide high scalability, high availability, and fault tolerance
- support very large amounts of sparse data
- geared toward performance rather than transaction consistency
Data Models: Summary
Figure 2.5: The Evolution of Data Models
Summary (1 of 2)
- Define the difference between data and information
- Describe what a database is, the various types of databases, and why they are valuable assets for decision making
- Explain the importance of database design
- Understand flaws in file system data management
- Outline the main components of the database system
- Describe the main functions of a database management system (DBMS)
Summary (2 of 2)
- Discuss data modeling and why data models are important
- Describe the basic data-modeling building blocks
- Define what business rules are and how they influence database design
- Understand how the major data models evolved
- List emerging alternative data models and the needs they fulfill
Dr Md Waliur Rahman Miah
[email protected]
MIT Melbourne, 288 La Trobe Street, Melbourne, VIC 3000, Australia
Phone: +61 03 8600 6700