CSV stands for Comma-Separated Values.

logeswarisaravanan 6 views 42 slides Nov 01, 2025

About This Presentation

CSV stands for Comma-Separated Values. It’s a plain text file used to store tabular data (like spreadsheets


Slide Content

UNIT - II
Data Extraction Fundamentals
Introduction to Tabular Formats
Parsing CSV
Parsing XLS with XLRD
Parsing XML
Introduction to JSON
Getting Data into MongoDB
MongoDB - CRUD: Database Creation, Update, Delete, Read using MongoDB
Operators - $gt, $lt, $exists, $regex
Querying Arrays and using $in and $all operators
Changing Entries: update(), $set, $unset

Data Extraction Fundamentals - ETL
Extraction gathers data from one or more sources. Extracting data includes locating and identifying the relevant data, then preparing it to be transformed and loaded.
Transformation is where data is sorted and organized. Cleansing, such as removing missing values, also happens during this step. Depending on the destination you choose, transformation can also cover data typing, JSON structures, object names, and time zones, to ensure compatibility with the data destination.
Loading is the last step, where the transformed data is delivered to a central repository for immediate or future analysis.
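The three ETL steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records in RAW_ROWS, the function names, and the use of a plain list as the "central repository" are all hypothetical and not taken from the slides.

```python
# Hypothetical raw records, standing in for rows pulled from a source system.
RAW_ROWS = [
    {"name": "Marsh", "age": "6", "species": "Dog"},
    {"name": "Kitana", "age": "", "species": "Cat"},  # missing age
    {"name": "Rex", "age": "3", "species": "Dog"},
]


def extract(source):
    """Extraction: gather the relevant records from the source."""
    return list(source)


def transform(rows):
    """Transformation: drop rows with missing values and convert age to int."""
    cleaned = []
    for row in rows:
        if any(value == "" for value in row.values()):
            continue  # cleansing step: skip incomplete records
        cleaned.append({**row, "age": int(row["age"])})
    return cleaned


def load(rows, repository):
    """Loading: deliver the transformed rows to the (in-memory) repository."""
    repository.extend(rows)
    return repository


repository = load(transform(extract(RAW_ROWS)), [])
```

In a real pipeline the repository would be a database or data warehouse rather than a list; the list is used here only to keep the sketch self-contained.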

Data Extraction Fundamentals
Introduction to Tabular Format
One of the simplest ways to analyze and display data is in tabular form. A table is a systematic arrangement of rows and columns. The first row usually holds the column titles, and the first column often holds the row labels. It is a precise and easy way to display data.
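Since CSV is the most common plain-text tabular format, a short parsing sketch may help here. It uses Python's standard csv module; the CSV text itself is made up for illustration, and in practice the text would come from a file opened with open("file.csv", newline="").

```python
import csv
import io

# Hypothetical CSV text: first row holds the column titles.
CSV_TEXT = "name,species,age\nMarsh,Dog,6\nKitana,Cat,4\n"


def parse_csv(text):
    """Return (header, rows): the first row as titles, the rest as data rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = [row for row in reader]
    return header, rows


header, rows = parse_csv(CSV_TEXT)

# csv.DictReader does the same split and keys each row by the header titles.
records = list(csv.DictReader(io.StringIO(CSV_TEXT)))
```

Note that every value comes back as a string; converting "6" to a number is part of the transformation step.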

Data Extraction Fundamentals
XML is designed to be both human- and machine-readable. The design goals of XML emphasize simplicity, generality, and usability across the Internet.
Parse XML File
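One possible way to parse XML in Python is the standard library's xml.etree.ElementTree module. The sketch below is an assumption about what such parsing might look like, not the method shown on the original slide; the XML snippet and the parse_pets helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document used only for this illustration.
XML_TEXT = """<pets>
  <pet chipped="true"><name>Marsh</name><species>Dog</species></pet>
  <pet chipped="true"><name>Kitana</name><species>Cat</species></pet>
</pets>"""


def parse_pets(xml_text):
    """Turn each <pet> element into a dictionary of its child values."""
    root = ET.fromstring(xml_text)
    pets = []
    for pet in root.findall("pet"):
        pets.append({
            "name": pet.findtext("name"),
            "species": pet.findtext("species"),
            # Attributes are strings, so compare against "true" explicitly.
            "chipped": pet.get("chipped") == "true",
        })
    return pets


pets = parse_pets(XML_TEXT)
```

To read from a file instead of a string, ET.parse(path).getroot() returns the same kind of root element.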

Data Extraction Fundamentals
Parse XLS with XLRD
Excel files are spreadsheet documents in which data is stored in the form of tables. These files can be read in Python by using the xlrd module. The retrieved data can also be filtered according to the user's choice. Note that xlrd 2.0 and later read only the legacy .xls format, not .xlsx.
    import xlrd
    workbook = xlrd.open_workbook('filename.xls')
    worksheet = workbook.sheet_by_index(sheet_index)
    worksheet.cell_value(row, column)

Data Extraction Fundamentals

CRUD - MongoDB
The Create operation is used to insert new documents in the MongoDB database.
The Read operation is used to query a document in the database.
The Update operation is used to modify existing documents in the database.
The Delete operation is used to remove documents in the database.

CRUD - MongoDB
Create operation
    db.collection.insertOne()
    db.collection.insertMany()

CRUD - MongoDB
Create operation
    db.RecordsDB.insertOne({
      name: "Marsh",
      age: "6 years",
      species: "Dog",
      ownerAddress: "380 W. Fir Ave",
      chipped: true
    })

CRUD - MongoDB

CRUD - MongoDB
Create operation
    db.RecordsDB.insertMany([
      { name: "Marsh", age: "6 years", species: "Dog", ownerAddress: "380 W. Fir Ave", chipped: true },
      { name: "Kitana", age: "4 years", species: "Cat", ownerAddress: "521 E. Cortland", chipped: true }
    ])

CRUD - MongoDB

CRUD - MongoDB
Read operation
Read operations let you supply query filters and criteria that specify which documents you want.
    db.collection.find()
    db.collection.findOne()

CRUD - MongoDB
Read operation
    db.RecordsDB.find()
    db.RecordsDB.find({ "species": "Cat" })
    db.collection.findOne(query, projection)

CRUD - MongoDB
Update operation
Update operations act on a single collection, and they are atomic at the level of a single document. An update operation takes filters and criteria to select the documents you want to update.
db.collection.updateOne() - updates a single existing document that matches the filter.
db.collection.updateMany() - updates every document that matches the filter.
db.collection.replaceOne() - replaces a single document in the specified collection.

CRUD - MongoDB
Update operation
    db.RecordsDB.updateOne({ name: "Marsh" }, { $set: { ownerAddress: "451 W. Coffee St. A204" } })
    db.RecordsDB.updateMany({ species: "Dog" }, { $set: { age: "5" } })
    db.RecordsDB.replaceOne({ name: "Kevin" }, { name: "Maki" })

CRUD - MongoDB
Delete operation
db.collection.deleteOne() - removes a single document from the specified collection on the MongoDB server.
db.collection.deleteMany() - deletes multiple documents from the specified collection with a single delete operation.
EG:
    db.RecordsDB.deleteOne({ name: "Maki" })
    db.RecordsDB.deleteMany({ species: "Dog" })

MongoDB - OPERATORS
MongoDB offers different types of operators that can be used to interact with the database. Operators are special symbols or keywords that tell a compiler or an interpreter to carry out mathematical or logical operations. Query operators extend MongoDB by letting developers build complex queries that select the data their applications need.

MongoDB - OPERATORS
Operators add conditions:
    { <fieldName1>: { <operator1>: <value1> }, ... }
EG:
    { "favfruit": { "$ne": "apple" } }
    { "age": { "$gt": 25 } }
    { "eyeColor": { "$in": ["blue", "green"] } }

Comparison - OPERATORS
$gt   Matches values that are greater than the given value.
$lt   Matches values that are less than the given value.
$gte  Matches values that are greater than or equal to the given value.
$lte  Matches values that are less than or equal to the given value.
$in   Matches any of the values in an array.
$nin  Matches none of the values specified in an array.
$eq   Matches values that are equal to the given value.
$ne   Matches values that are not equal to the given value.

Comparison - OPERATORS
$gt  Matches values that are greater than the given value.
     Syntax: { field: { $gt: value } }
$lt  Matches values that are less than the given value.
     Syntax: { field: { $lt: value } }

MongoDB - OPERATORS
    db.inventory.insertMany([
      { "item": "nuts", "quantity": 30, "carrier": { "name": "Shipit", "fee": 3 } },
      { "item": "bolts", "quantity": 50, "carrier": { "name": "Shipit", "fee": 4 } },
      { "item": "washers", "quantity": 10, "carrier": { "name": "Shipit", "fee": 1 } }
    ])

MongoDB - OPERATORS
    db.inventory.find({ quantity: { $gt: 20 } })
Example output:
    { _id: ObjectId("61ba25cbfe687fce2f042414"), item: 'nuts', quantity: 30, carrier: { name: 'Shipit', fee: 3 } },
    { _id: ObjectId("61ba25cbfe687fce2f042415"), item: 'bolts', quantity: 50, carrier: { name: 'Shipit', fee: 4 } }

MongoDB - OPERATORS
    db.inventory.find({ quantity: { $lt: 20 } })
Example output:
    { _id: ObjectId("61ba634dfe687fce2f04241f"), item: 'washers', quantity: 10, carrier: { name: 'Shipit', fee: 1 } }
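The filtering logic of $gt and $lt can be modelled in plain Python to make the semantics concrete. This is a rough sketch of the idea only: it does not talk to MongoDB, the find helper and COMPARATORS table are hypothetical names, and the carrier sub-documents are left out for brevity.

```python
import operator

# Only the two operators discussed above are modelled here.
COMPARATORS = {"$gt": operator.gt, "$lt": operator.lt}

INVENTORY = [
    {"item": "nuts", "quantity": 30},
    {"item": "bolts", "quantity": 50},
    {"item": "washers", "quantity": 10},
]


def find(documents, field, op, value):
    """Return documents whose field satisfies the comparison operator."""
    compare = COMPARATORS[op]
    return [
        doc for doc in documents
        if field in doc and compare(doc[field], value)
    ]


more_than_20 = [doc["item"] for doc in find(INVENTORY, "quantity", "$gt", 20)]
less_than_20 = [doc["item"] for doc in find(INVENTORY, "quantity", "$lt", 20)]
```

The two result lists line up with the example outputs shown for the corresponding db.inventory.find() queries.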

ELEMENT - OPERATORS
The element query operators are used to identify documents using the fields of the document.
$exists  Matches documents that have the specified field.
Syntax: { field: { $exists: <boolean> } }
When <boolean> is true, $exists matches the documents that contain the field, including documents where the field value is null. If <boolean> is false, the query returns only the documents that do not contain the field.

MongoDB - OPERATORS
    { name: { $exists: true } }
    { name: { $exists: false } }
    db.inventory.find({ qty: { $exists: true } }).count()
    db.inventory.find({ qty: { $exists: true, $nin: [5, 15] } }).count()

MongoDB - OPERATORS
Documents in the records collection:
    { a: 5, b: 5, c: null }
    { a: 3, b: null, c: 8 }
    { a: null, b: 3, c: 9 }
    { a: 1, b: 2, c: 3 }
    { a: 2, c: 5 }
    { a: 3, b: 2 }
    { a: 4 }
    { b: 2, c: 4 }
    { b: 2 }
    { c: 6 }
Queries:
    db.records.find({ a: { $exists: true } })
    db.records.find({ b: { $exists: false } })

MongoDB - OPERATORS
Output 1:
    { a: 5, b: 5, c: null }
    { a: 3, b: null, c: 8 }
    { a: null, b: 3, c: 9 }
    { a: 1, b: 2, c: 3 }
    { a: 2, c: 5 }
    { a: 3, b: 2 }
    { a: 4 }
Output 2:
    { a: 2, c: 5 }
    { a: 4 }
    { c: 6 }
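The point that $exists tests for the presence of a field, not for a non-null value, can be checked with a small Python model. The sketch below is an approximation of the semantics using dictionaries (with None standing in for null); the exists helper is a hypothetical name and no MongoDB server is involved.

```python
# The ten documents from the records collection, with None standing in for null.
RECORDS = [
    {"a": 5, "b": 5, "c": None},
    {"a": 3, "b": None, "c": 8},
    {"a": None, "b": 3, "c": 9},
    {"a": 1, "b": 2, "c": 3},
    {"a": 2, "c": 5},
    {"a": 3, "b": 2},
    {"a": 4},
    {"b": 2, "c": 4},
    {"b": 2},
    {"c": 6},
]


def exists(documents, field, flag):
    """Model of { field: { $exists: flag } }: test key presence only."""
    return [doc for doc in documents if (field in doc) == flag]


has_a = exists(RECORDS, "a", True)     # includes the document where a is None
lacks_b = exists(RECORDS, "b", False)  # only documents with no b key at all
```

Because the check is on the key and not the value, the document with a set to null still counts as having the field.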

EVALUATION - OPERATORS
The MongoDB evaluation operators can evaluate the overall data structure or an individual field in a document.
$regex  Selects documents where the field matches the given regular expression.
Syntax: { <fieldname>: { $regex: /pattern/<options> } }
This operator provides regular expression capabilities for pattern-matching strings in queries. It is used to search for a given string pattern in the specified collection.

EVALUATION - OPERATORS
Collection: employee
    { name: "Tony", position: "Backend developer" }
    { name: "Bruce", position: "frontend developer" }
    { name: "Nick", position: "HR Manager" }
Query:
    db.employee.find({ position: { $regex: "developer" } })

EVALUATION - OPERATORS
Collection: employee
    { name: "Tony", position: "Backend developer" }
    { name: "Bruce", position: "frontend developer" }
    { name: "Nick", position: "HR Manager" }
Queries:
    db.employee.find({ position: { $regex: /ger/i } })
    db.employee.find({ name: { $regex: /Bru/ } })
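As a rough analogy, an unanchored $regex match behaves much like Python's re.search: the pattern may match anywhere in the string. The sketch below applies the three patterns from the slides to the employee documents; find_regex is a hypothetical helper, and Python's regex dialect is only assumed to agree with MongoDB's for these simple patterns.

```python
import re

EMPLOYEES = [
    {"name": "Tony", "position": "Backend developer"},
    {"name": "Bruce", "position": "frontend developer"},
    {"name": "Nick", "position": "HR Manager"},
]


def find_regex(documents, field, pattern, flags=0):
    """Return names of documents whose string field contains a pattern match."""
    compiled = re.compile(pattern, flags)
    return [
        doc["name"] for doc in documents
        if isinstance(doc.get(field), str) and compiled.search(doc[field])
    ]


# { position: { $regex: "developer" } }
developers = find_regex(EMPLOYEES, "position", "developer")
# { position: { $regex: /ger/i } }  (the i option maps to re.IGNORECASE)
ger_matches = find_regex(EMPLOYEES, "position", "ger", re.IGNORECASE)
# { name: { $regex: /Bru/ } }
bru_matches = find_regex(EMPLOYEES, "name", "Bru")
```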

MongoDB - OPERATORS

Querying Arrays - Using $in and $all operators
MongoDB’s flexible structure makes it a popular choice for storing and managing diverse data. In MongoDB, data is stored in collections, and collections hold documents that support data types such as strings, numbers, objects, and, most importantly, arrays. Arrays in MongoDB let users store data in an ordered form. Querying array elements efficiently is crucial for developers who need to extract meaningful information from the database.

$in operators
The $in operator selects the documents where the value of a field equals any value in the specified array.
    { field: { $in: [<value1>, <value2>, ... <valueN>] } }
If the field holds an array, then the $in operator selects the documents whose field holds an array that contains at least one element that matches a value in the specified array (for example, <value1>, <value2>, and so on).

$in operators
    db.inventory.insertMany([
      { "item": "Pens", "quantity": 350, "tags": ["school", "office"] },
      { "item": "Erasers", "quantity": 15, "tags": ["school", "home"] },
      { "item": "Maps", "tags": ["office", "storage"] },
      { "item": "Books", "quantity": 5, "tags": ["school", "storage", "home"] }
    ])
    db.inventory.find({ quantity: { $in: [5, 15] } })
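The two cases of $in, a scalar field and an array field, can be illustrated with a simplified Python model. This sketch covers only those two cases (it ignores special handling such as null values), matches_in is a hypothetical helper, and the tag query is an extra example not present on the slides.

```python
INVENTORY = [
    {"item": "Pens", "quantity": 350, "tags": ["school", "office"]},
    {"item": "Erasers", "quantity": 15, "tags": ["school", "home"]},
    {"item": "Maps", "tags": ["office", "storage"]},
    {"item": "Books", "quantity": 5, "tags": ["school", "storage", "home"]},
]


def matches_in(doc, field, candidates):
    """Simplified model of { field: { $in: candidates } }."""
    if field not in doc:
        return False
    value = doc[field]
    if isinstance(value, list):
        # Array field: at least one element must be among the candidates.
        return any(element in candidates for element in value)
    # Scalar field: the value itself must be among the candidates.
    return value in candidates


by_quantity = [d["item"] for d in INVENTORY if matches_in(d, "quantity", [5, 15])]
by_tag = [d["item"] for d in INVENTORY if matches_in(d, "tags", ["home", "storage"])]
```

Here by_quantity mirrors the db.inventory.find() query above, while by_tag shows the array case: a document matches as soon as one of its tags is in the list.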

$all operators
The $all operator selects the documents where the value of a field is an array that contains all the specified elements.
Syntax:
    { <field>: { $all: [<value1>, <value2>, ...] } }
    { arrayField: { $all: [element1, element2] } }

$all operators
    db.products.insertMany([
      { "_id": 1, "name": "xPhone", "price": 799, "color": ["white", "black"] },
      { "_id": 2, "name": "xTablet", "price": 899, "color": ["white", "black", "purple"] },
      { "_id": 3, "name": "SmartTablet", "price": 899, "color": ["blue"] },
      { "_id": 4, "name": "SmartPad", "price": 699, "color": ["white", "orange", "gold", "gray"] },
      { "_id": 5, "name": "SmartPhone", "price": 599, "color": ["white", "orange", "gold", "gray"] }
    ])
    db.products.find({ color: { $all: ["black", "white"] } })

$all operators - output
    db.products.find({ color: { $all: ["black", "white"] } })
Matching documents:
    { "_id": 1, "name": "xPhone", "price": 799, "color": ["white", "black"] },
    { "_id": 2, "name": "xTablet", "price": 899, "color": ["white", "black", "purple"] }
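Where $in needs only one matching element, $all needs every listed element to be present, in any order. The Python sketch below models that difference for array fields only; matches_all is a hypothetical helper and the price field is omitted for brevity.

```python
PRODUCTS = [
    {"_id": 1, "name": "xPhone", "color": ["white", "black"]},
    {"_id": 2, "name": "xTablet", "color": ["white", "black", "purple"]},
    {"_id": 3, "name": "SmartTablet", "color": ["blue"]},
    {"_id": 4, "name": "SmartPad", "color": ["white", "orange", "gold", "gray"]},
    {"_id": 5, "name": "SmartPhone", "color": ["white", "orange", "gold", "gray"]},
]


def matches_all(doc, field, required):
    """Simplified model of { field: { $all: required } } for array fields."""
    value = doc.get(field)
    return isinstance(value, list) and all(element in value for element in required)


# Order of the required elements does not matter, only that all are present.
black_and_white = [d["_id"] for d in PRODUCTS if matches_all(d, "color", ["black", "white"])]
```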

Changing Entries
Update(), $set & $unset operators
The $set operator sets the value of the specified field(s) in a document that matches the search criteria.
Syntax:
    { $set: { <field1>: <value1>, ... } }
The $unset operator performs the inverse operation of the $set operator. That is, it removes the specified field(s) from the document that matches the search criteria. The value given for each field is ignored, so an empty string is conventionally used.
Syntax:
    { $unset: { <field1>: "", ... } }

$set operators
EG:
    db.RecordsDB.updateOne({ Designation: "Architect" }, { $set: { salary: 76000 } })
    db.RecordsDB.updateMany({ Designation: "FullStack Developer" }, { $set: { salary: 88000 } })

$unset operators
    db.products.insertMany([
      { "item": "chisel", "pid": "C001", "quantity": 4, "instock": true },
      { "item": "hammer", "pid": "unknown", "quantity": 3, "instock": true },
      { "item": "nails", "pid": "unknown", "quantity": 100, "instock": true }
    ])
    db.products.updateOne({ pid: "unknown" }, { $unset: { quantity: "", instock: "" } })
    db.products.updateMany({ pid: "unknown" }, { $unset: { quantity: "", instock: "" } })

$unset operators - output
After updateOne, only the first matching document (hammer) loses the two fields:
    { item: 'chisel', pid: 'C001', quantity: 4, instock: true },
    { item: 'hammer', pid: 'unknown' },
    { item: 'nails', pid: 'unknown', quantity: 100, instock: true }
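The effect of $set and $unset, and the "first match only" behaviour of updateOne, can be modelled on plain dictionaries. This is a minimal sketch under stated assumptions: apply_update and update_one are hypothetical helpers, the query is a simple equality match, and nothing here calls MongoDB.

```python
import copy


def apply_update(doc, update):
    """Return a copy of doc with $set fields written and $unset fields removed."""
    updated = copy.deepcopy(doc)
    for field, value in update.get("$set", {}).items():
        updated[field] = value
    for field in update.get("$unset", {}):
        updated.pop(field, None)  # the value given to $unset is ignored
    return updated


def update_one(documents, query, update):
    """Update only the first document whose fields equal the query values."""
    for index, doc in enumerate(documents):
        if all(doc.get(key) == value for key, value in query.items()):
            documents[index] = apply_update(doc, update)
            return 1  # number of documents modified
    return 0


products = [
    {"item": "chisel", "pid": "C001", "quantity": 4, "instock": True},
    {"item": "hammer", "pid": "unknown", "quantity": 3, "instock": True},
    {"item": "nails", "pid": "unknown", "quantity": 100, "instock": True},
]

modified = update_one(products, {"pid": "unknown"}, {"$unset": {"quantity": "", "instock": ""}})
raised = apply_update({"Designation": "Architect"}, {"$set": {"salary": 76000}})
```

An updateMany equivalent would simply keep looping instead of returning after the first match, which would also strip the fields from the nails document.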