3 Data Structure in R

NishaArora1 647 views 36 slides Apr 04, 2021


Dr Nisha Arora
Data Structure in R
https://www.linkedin.com/in/drnishaarora/

Contents
2
Data Structure in R
Vector
List
Factor
Matrix
Data Frame
Array
Subsetting & Basic Operations

3
Coercion
All elements of a vector must be the same type, so when we
attempt to combine different types they will be coerced to the
most flexible type.
Types from least to most flexible are:
Logical
Integer
Double/Numeric
Character
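
The coercion order above can be checked with a small sketch. The variable names are made up for illustration; only c() and typeof() from base R are used.

```r
# Combining types in c() coerces every element to the most flexible type present.
lgl_int <- c(TRUE, 2L)    # logical + integer  -> integer
int_dbl <- c(2L, 3.5)     # integer + double   -> double
dbl_chr <- c(3.5, "a")    # double + character -> character

typeof(lgl_int)   # "integer"
typeof(int_dbl)   # "double"
typeof(dbl_chr)   # "character"
```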

4
Coercion
When a logical vector is coerced to an integer or double, TRUE
becomes 1 and FALSE becomes 0
x <- c(FALSE, FALSE, TRUE); as.numeric(x)
Total number of TRUEs
sum(x)
Proportion that are TRUE
mean(x)
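
A common use of this rule is counting how many values satisfy a condition. The sketch below uses a hypothetical vector of scores and an assumed pass mark of 60; neither comes from the slides.

```r
# Hypothetical exam scores; the pass mark of 60 is an assumption for illustration.
scores <- c(45, 72, 88, 59, 91)
passed <- scores >= 60          # logical vector: FALSE TRUE TRUE FALSE TRUE

n_passed    <- sum(passed)      # TRUE counts as 1, so this is the number of passes: 3
prop_passed <- mean(passed)     # proportion of passes: 0.6
```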

5
Coercion in R
✓ To explicitly coerce a variable from one class to another, the
following functions are used:
as.numeric(), as.logical(), etc.
In Python, this is called 'typecasting'
https://youtu.be/FJ6IkFycCdA
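
A minimal sketch of explicit coercion, using only base R functions. The input values are illustrative. Note that a value that cannot be converted becomes NA (with a warning, suppressed here).

```r
num_from_chr <- as.numeric("3.14")           # character -> double: 3.14
int_from_dbl <- as.integer(3.9)              # truncates toward zero: 3
lgl_from_int <- as.logical(0:2)              # 0 is FALSE, non-zero is TRUE
lgl_from_chr <- as.logical(c("TRUE", "no"))  # unrecognised strings give NA

# "abc" has no numeric interpretation, so the result is NA
bad_num <- suppressWarnings(as.numeric("abc"))
```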

6
Data Structure in R
✓ Vector
The basic one-dimensional data structure in R is the vector
✓ List
Lists are generic vectors which can contain any data type or
data structure
✓ Matrix
The basic two-dimensional data structure in R is the matrix
Note: A variable with a single value is known as a scalar. In R, a
scalar is a vector of length 1

7
Data Structure in R
✓ Factor
A factor is a vector that can contain only predefined values, and
is used to store categorical data
✓ Data Frame
A data frame is a list of equal-length vectors. This makes it a
2-dimensional structure, so it shares properties of both the matrix
and the list.
✓ Array
An array is an n-dimensional data structure. A matrix is a special
case of an array with 2 dimensions.
We will discuss 'tibble' from the tidyverse in the next lesson
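
The relationships between these structures can be checked directly. This is a small sketch with made-up objects (s, df, m, arr), using only base R predicates.

```r
s <- 5
scalar_is_vector <- is.vector(s) && length(s) == 1   # a scalar is a length-1 vector

df <- data.frame(id = 1:2, label = c("a", "b"))
df_is_list <- is.list(df)        # a data frame is a list of equal-length columns
df_dims    <- dim(df)            # ...that also has 2 dimensions: 2 rows, 2 columns

m   <- matrix(1:6, nrow = 2)
arr <- array(1:24, dim = c(2, 3, 4))
m_is_array    <- is.array(m)     # a matrix is a 2-dimensional array
arr_is_matrix <- is.matrix(arr)  # a 3-dimensional array is not a matrix
```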

8
Vectors in R
To create vectors in R, use the concatenation function c()
num_var <- c(1, 2, 4.5)
Use the L suffix to get an integer rather than a double
int_var <- c(13L, 0L, 10L)
Use TRUE and FALSE (or T and F) to create logical vectors
log_var <- c(TRUE, FALSE, T, F)
Use double or single quotation marks to create a character vector
chr_var <- c("abc", "123")
Vectors can also be created by using the sequence or scan functions
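
A short sketch of the alternatives mentioned above, assuming "sequence" refers to base R's seq() and the colon operator. scan() normally reads from a file or the console; the text argument is used here so the example runs without input.

```r
num_type <- typeof(c(1, 2, 4.5))       # "double"
int_type <- typeof(c(13L, 0L, 10L))    # "integer"

seq_var   <- seq(from = 1, to = 9, by = 2)   # 1 3 5 7 9
colon_var <- 1:5                             # 1 2 3 4 5 (integer)

# scan() parses whitespace-separated numbers; quiet = TRUE hides the "Read n items" message
scan_var <- scan(text = "10 20 30", quiet = TRUE)
```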

9
Vectors in R
To name a vector
# Assigning names directly
x <- c(Mon = 37, Tue = 41.4, Wed = 43.2)
# Using names() function
x <- c(78, 86, 89); names(x) <- c("chem", "phy", "math")
# Using setNames() function
x <- setNames(1:3, c("a", "b", "c"))

10
Vector Subsetting
x = c(11, 42, 23, 14, 55);
names(x) = c('ajay', 'ravi', 'john', 'anjali', 'namrata'); x
x[2]; x[1:3]; x[5]; x[7]
# x[n] gives the nth element of vector x; there are only 5 elements,
# so x[7] is NA
x['ajay']; x[c('ravi', 'namrata')]
# To select elements by names
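
The same vector can be used to sketch what each kind of index returns. Negative and logical indexing are not covered on the slide; they are added here as two further standard base R forms.

```r
x <- c(11, 42, 23, 14, 55)
names(x) <- c("ajay", "ravi", "john", "anjali", "namrata")

second    <- x[2]                     # by position: ravi = 42
out_range <- x[7]                     # beyond the 5 elements: NA
by_name   <- x[c("ravi", "namrata")]  # by name: 42 and 55

# Additional forms (not on the slide)
without_first <- x[-1]                # negative index drops an element
above_20      <- x[x > 20]            # logical index keeps elements where TRUE
```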

11
List in R
Lists are different from vectors because their elements can be of
any type, including lists.
We can construct lists by using list() instead of c()
x <- list(1:4, "abc", c(T, T, F), c(2.3, 5.9))

12
List Subsetting
https://stackoverflow.com/a/49699955/5114585
Three ways:
Using single square bracket '[]'
Using double square bracket '[[]]'
Calling '$' by using names
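
The difference between the three forms is easiest to see on a named list. The list below and its element names are made up for illustration.

```r
lst <- list(nums = 1:4, txt = "abc", flags = c(TRUE, TRUE, FALSE))

sub_single <- lst["nums"]     # single bracket: returns a list containing the element
sub_double <- lst[["nums"]]   # double bracket: returns the element itself
sub_dollar <- lst$nums        # $ with a name: same result as [["nums"]]

class(sub_single)   # "list"
class(sub_double)   # "integer"
```

Single brackets keep the list wrapper, which is useful when selecting several elements at once; double brackets and $ extract one element's contents.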

13
Matrix in R
To create a matrix in R
x = matrix(1:9, nrow = 3, ncol = 3)
x = matrix(1:9, 3, 3) # Alternate way
To create a matrix by using byrow
z = matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)
# By default byrow is FALSE, so the matrix is filled by column
a <- matrix(1:9, byrow = TRUE, nrow = 3) # Alternate way
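
A quick sketch of how byrow changes the fill order. The names by_col and by_row are illustrative.

```r
by_col <- matrix(1:9, nrow = 3)                 # default: filled column by column
by_row <- matrix(1:9, nrow = 3, byrow = TRUE)   # filled row by row

first_row_by_col <- by_col[1, ]   # 1 4 7
first_row_by_row <- by_row[1, ]   # 1 2 3

# Filling by row gives the transpose of filling by column
same_as_transpose <- identical(by_row, t(by_col))
```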

14
Matrix in R
To create a matrix by using the cbind() command
one <- c(1, 0, 0)
two <- c(0, 1, 0)
three <- c(0, 0, 1)
b <- cbind(one, two, three)
To create a matrix by using the rbind() command
c <- rbind(one, two, three)
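
Because the three vectors above form an identity matrix, cbind() and rbind() give the same values. A sketch with two hypothetical, non-symmetric vectors makes the difference in shape visible.

```r
first  <- c(1, 2, 3)
second <- c(4, 5, 6)

by_columns <- cbind(first, second)   # each vector becomes a column: 3 x 2
by_rows    <- rbind(first, second)   # each vector becomes a row:    2 x 3

dim(by_columns)        # 3 2
dim(by_rows)           # 2 3
colnames(by_columns)   # "first" "second" (taken from the variable names)
rownames(by_rows)      # "first" "second"
```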

15
Matrix in R
To assign names to columns and rows of a matrix
x = cbind(c(78, 85, 95), c(99, 91, 85), c(67, 62, 63))
colnames(x) = c("Jan", "Feb", "Mar")
rownames(x) = c("product1", "product2", "product3")
Other useful commands
dim(x); head(x); nrow(x); ncol(x); attributes(x)
rowSums(x); colSums(x)
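
Once a matrix has row and column names, they carry through to the summaries and can be used for indexing. A sketch using the same values as above:

```r
x <- cbind(c(78, 85, 95), c(99, 91, 85), c(67, 62, 63))
colnames(x) <- c("Jan", "Feb", "Mar")
rownames(x) <- c("product1", "product2", "product3")

per_product <- rowSums(x)   # product1 = 244, product2 = 238, product3 = 243
per_month   <- colSums(x)   # Jan = 258, Feb = 275, Mar = 192

feb_product2 <- x["product2", "Feb"]   # index by names instead of positions: 91
```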

16
Matrix Subsetting
To find submatrices of a given matrix
x <- matrix(1:6, 2, 3)
x[1, 2] # Element of first row, second column [single element]
x[2, 1] # Element of second row, first column [single element]
x[2, ] # All the elements of the second row [returned as a vector by default]
x[, 1] # All the elements of the first column [returned as a vector by default]
x[1:2, 3] # Elements of first & second row for third column only

17
Matrix Subsetting
To find submatrices of a given matrix
x <- matrix(1:6, 2, 3)
By default, when a single element of a matrix is retrieved, it is returned
as a vector of length 1 rather than a 1×1 matrix.
This behaviour can be turned off by setting drop = FALSE.
x[1, 2] # Single element
x[1, 2, drop = FALSE] # Matrix of one row & one column

18
Matrix Subsetting
To find submatrices of a given matrix
x <- matrix(1:6, 2, 3)
Similarly, subsetting a single column or a single row results in a
vector, not a matrix (by default).
This behaviour can be turned off by setting drop = FALSE.
x[1, ] # Single row, returned as a vector
x[1, , drop = FALSE] # Matrix of one row & three columns
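
The effect of drop can be confirmed by inspecting dim(), which is NULL for a plain vector. A minimal sketch on the same matrix:

```r
x <- matrix(1:6, 2, 3)

row_vec <- x[1, ]                  # dimensions dropped: plain vector 1 3 5
row_mat <- x[1, , drop = FALSE]    # dimensions kept: 1 x 3 matrix
one_mat <- x[1, 2, drop = FALSE]   # single element kept as a 1 x 1 matrix

is.null(dim(row_vec))   # TRUE
dim(row_mat)            # 1 3
dim(one_mat)            # 1 1
```

Keeping the dimensions matters when later code expects a matrix, for example when the number of selected rows can vary and may happen to be one.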

19
Matrix Subsetting
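To find submatrices of a given matrix
x = cbind(c(78, 85, 95), c(99, 91, 85), c(67, 62, 63))
x[, 2] # Second column
x[, 2:3] # Second and third columns
x[2, 3] # Element of second row, third column
x[1:2, 3] # First & second rows of the third column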

20
Factors in R
Factors are used for handling categorical variables, whether
nominal or ordinal.
For example,
Male, Female: nominal categorical
Low, Medium, High: ordinal categorical

21
Factors in R
To create a factor in R, use factor()
gender_vector <- c("Male", "Female", "Female", "Male", "Male")
factor_gender_vector <- factor(gender_vector)
Also, try levels(factor_gender_vector)
To change the levels of a factor
levels(factor_gender_vector) = c("F", "M")
Other useful commands
summary(factor_gender_vector); table(factor_gender_vector)
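
A short sketch of both kinds of factor. The nominal part repeats the example above; the ordinal part uses a made-up vector and the ordered = TRUE argument of factor(), which is not shown on the slide.

```r
# Nominal factor: levels are sorted alphabetically by default
gender <- factor(c("Male", "Female", "Female", "Male", "Male"))
gender_levels <- levels(gender)   # "Female" "Male"
gender_counts <- table(gender)    # Female: 2, Male: 3

# Ordinal factor: the order of levels is given explicitly
rating <- factor(c("Low", "High", "Medium", "Low"),
                 levels = c("Low", "Medium", "High"),
                 ordered = TRUE)
rating_codes  <- as.integer(rating)     # 1 3 2 1 (position of each value in levels)
high_gt_low   <- rating[2] > rating[1]  # comparisons work on ordered factors: TRUE
```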

22
Data frames in R
A data frame is the most common way of storing data in R, and if used
systematically makes data analysis easier.
✓ Similar to tables (databases), datasets (Excel/SAS/SPSS) etc.
✓ Consists of columns of different types; more general than a matrix
✓ Columns – Variables; Rows – Observations
✓ Convenient to hold all the data required for a data analysis
✓ They are represented as a special type of list where every element of
the list has to have the same length
✓ Data frames also have a special attribute called row.names

23
Data frames in R
✓ Data frames are, well, tables (like in any spreadsheet program).
✓ In data frames, variables are typically in the columns, and cases in
the rows.
✓ Columns can have mixed types of data; some can contain
numeric data, others text.
✓ If all columns contain only character or only numerical data,
then the data can also be saved in a matrix (matrices are faster to
operate on).
We will also discuss 'tibble' in the course.

24
Data frames in R
To create a data frame in R
Example_1:
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
Example_2:
height <- c(180, 175, 190)
weight <- c(75, 82, 88)
name <- c("Anil", "Ankit", "Sunil")
data <- data.frame(name, height, weight)
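
A sketch of inspecting and subsetting such a data frame. The object name people and the 176 cm threshold are made up for illustration. stringsAsFactors = FALSE is set explicitly so the name column stays character in R versions before 4.0 as well.

```r
people <- data.frame(
  name   = c("Anil", "Ankit", "Sunil"),
  height = c(180, 175, 190),
  weight = c(75, 82, 88),
  stringsAsFactors = FALSE
)

n_rows    <- nrow(people)              # 3 observations
n_cols    <- ncol(people)              # 3 variables
col_types <- sapply(people, class)     # one class per column

heights   <- people$height                           # a column is an ordinary vector
tall_ones <- people[people$height > 176, "name"]     # rows by condition, column by name
```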

25
Data frames in R
To combine data frames in R
Example_1: using cbind()
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
cbind(df, data.frame(z = 3:1))
Example_2: using rbind()
rbind(df, data.frame(x = 10, y = "z"))
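
Storing the results shows what each combination produces. This is a sketch; the names wider and longer are illustrative, and stringsAsFactors = FALSE is added so the behaviour is the same across R versions.

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)

# cbind() adds columns; both data frames need the same number of rows
wider <- cbind(df, data.frame(z = 3:1))

# rbind() adds rows; both data frames need the same column names
longer <- rbind(df, data.frame(x = 10, y = "z", stringsAsFactors = FALSE))

dim(wider)    # 3 3
dim(longer)   # 4 2
```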

27
Data Type Conversions
Use is.foo() to test for data type foo.
Returns TRUE or FALSE
Use as.foo() to explicitly convert it.
Examples:
is.numeric(), is.character(), is.vector(), is.matrix(), is.data.frame()
as.numeric(), as.character(), as.vector(), as.matrix(), as.data.frame()
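
The is.*/as.* pairs also work between data structures, not only between atomic types. A small sketch with a made-up matrix m:

```r
m <- matrix(1:4, nrow = 2)

m_is_matrix <- is.matrix(m)           # TRUE
m_is_df     <- is.data.frame(m)       # FALSE

as_df   <- as.data.frame(m)           # columns are named V1, V2 by default
back    <- as.matrix(as_df)           # and back to a matrix
flat    <- as.vector(m)               # dimensions removed: 1 2 3 4
as_text <- as.character(flat)         # "1" "2" "3" "4"
```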

28
Handling of missing values
X <- c(1:8, NA)
✓ Removing missing values when computing the mean
mean(X, na.rm = T) or mean(X, na.rm = TRUE)
✓ To check for the location of missing values within a vector
which(is.na(X))
✓ To replace the missing values with a large number, say, 999
X[which(is.na(X))] = 999
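
A sketch of what these calls return, using the same vector. The intermediate names are illustrative. Indexing with is.na(X) directly, without which(), is an equivalent base R form.

```r
X <- c(1:8, NA)

mean_with_na <- mean(X)                # NA: any missing value makes the result missing
mean_clean   <- mean(X, na.rm = TRUE)  # 4.5: the mean of 1 to 8
na_position  <- which(is.na(X))        # 9

X[is.na(X)] <- 999                     # logical index, same effect as using which()
```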
For more code: follow me on GitHub

29
Handling of missing values
x <- c(1, 2, NA, 4, NA, 5)
✓ Identify missing values
bad <- is.na(x)
✓ To remove missing values
x[!bad]

30
Handling of missing values
x <- c(1, 2, NA, 4, NA, 5); y <- c("a", "b", NA, "d", "e", NA)
df = data.frame(x, y)
✓ To identify the positions where neither x nor y is missing
good = complete.cases(x, y); good
✓ To take the subset of vector x with no missing value
x[good]
✓ To take the subset of vector y with no missing value
y[good]
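
The same logical vector can filter the data frame itself. The sketch below also uses na.omit(), a base R shortcut not shown on the slide that drops incomplete rows in one step.

```r
x <- c(1, 2, NA, 4, NA, 5)
y <- c("a", "b", NA, "d", "e", NA)
df <- data.frame(x, y, stringsAsFactors = FALSE)

good <- complete.cases(x, y)   # TRUE TRUE FALSE TRUE FALSE FALSE

complete_rows <- df[good, ]    # keeps rows 1, 2 and 4
omitted       <- na.omit(df)   # same rows, via na.omit()
```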

Books
31
✓ Crawley, M. J. (2007). The R Book. Chichester, England:
John Wiley & Sons, Ltd.
✓ An Introduction to R by W. N. Venables, D. M. Smith and
the R Core Team
✓ R in a Nutshell by Joseph Adler: O'Reilly
✓ Teetor, P. (2011). R Cookbook. Sebastopol, CA: O'Reilly
Media Inc.

Books
32
✓ BioStatistics - https://www.middleprofessor.com/files/applied-biostatistics_bookdown/_book/
✓ Advanced R - https://adv-r.hadley.nz/
✓ Data Visualization - https://rkabacoff.github.io/datavis/
✓ R for Data Science - https://r4ds.had.co.nz/index.html
✓ Data Exploration & Analysis - https://bookdown.org/mikemahoney218/IDEAR/
✓ https://bookdown.org/mikemahoney218/LectureBook/

Blogs & Communities
33
http://www.r-bloggers.com/
http://www.inside-r.org/blogs
https://blog.rstudio.org/
http://www.statmethods.net/
http://stats.stackexchange.com
https://www.researchgate.net
https://www.quora.com
https://github.com

Learn To Code
34
https://www.datacamp.com/
https://www.dataquest.io/
https://www.codeschool.com/
https://guide.freecodecamp.org/r/
https://www.hackerrank.com/contests/co/
https://www.hackerearth.com/practice/
https://hackernoon.com/tagged/r
https://rpubs.com/

35
Reach Out to Me
http://stats.stackexchange.com/users/79100/learner
https://www.researchgate.net/profile/Nisha_Arora2/contributions
https://www.quora.com/profile/Nisha-Arora-9
https://github.com/arora123/
https://www.youtube.com/channel/UCniyhvrD_8AM2jXki3eEErw
https://www.linkedin.com/in/drnishaarora/

Thank You