R WorkShop Part-1 | R language | how to use R in datamining

rajanudeep 0 views 49 slides Oct 03, 2025


R Workshop
Part I
CIT, UPES

Introduction
What is R?
R refers to two things:
R, the programming language
R, the piece of software that you use to run programs written in R
R (the language) was created in the early 1990s by Ross Ihaka and Robert
Gentleman, then both working at the University of Auckland. It is based
upon the S language that was developed at Bell Laboratories in the 1970s,
primarily by John Chambers. R (the software) is a GNU project, reflecting
its status as important free and open source software. Both the language
and the software are now developed by a group of (currently) 21 people
known as the R Core Team.

R Language
R is an interpreted language, which means that your code doesn't need to
be compiled before you run it.
R is a mixture of the following programming paradigms:
imperative programming (you write a script that does one calculation after another)
object-oriented programming (data and functions are combined inside classes)
functional programming (functions are first-class objects; you treat them like any other variable, and you can call them recursively)

Installing R
You can download R from http://www.r-project.org.
On this page, click the link that says "download R" in the Getting Started
pane at the bottom of the page.
IDEs:
Emacs + ESS (Emacs Speaks Statistics)
Eclipse/Architect (http://www.openanalytics.eu/downloads/architect)
RStudio (http://www.rstudio.org)
Live-R (http://live-analytics.com/)

Basic Data Types
R has five basic data types:
Numeric
Integer
Complex
Logical
Character

Numeric Data Type
Decimal values are called numerics in R. It is the default computational
data type.
> x = 10.5 # assign a decimal value
> x # print the value of x
[1] 10.5
> class(x) # print the class name of x
[1] "numeric"

Numeric data Type
Even if we assign a whole number to a variable k, it is still stored as a
numeric value.
> k = 1
> k # print the value of k
[1] 1
> class(k) # print the class name of k
[1] "numeric"

Numeric data Type
That k is not an integer can be confirmed with the is.integer function:
> is.integer(k) # is k an integer?
[1] FALSE

Integer data type
> y = as.integer(3)
> y # print the value of y
[1] 3
> class(y) # print the class name of y
[1] "integer"
> is.integer(y) # is y an integer?
[1] TRUE
# coercion
> as.integer("5.27") # coerce a decimal string
[1] 5
> as.integer("Joe") # coerce a non-decimal string
[1] NA
Warning message:
NAs introduced by coercion
> as.integer(TRUE) # the numeric value of TRUE
[1] 1
> as.integer(FALSE) # the numeric value of FALSE
[1] 0
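As a companion to the as.integer examples above, an integer can also be written directly with the L suffix. The sketch below is illustrative only; the variable names k, n and truncated are not from the slides.

```r
k <- 1    # a plain number is stored as a double ("numeric")
n <- 1L   # the L suffix creates an integer directly

class(k)
class(n)
is.integer(n)

# as.integer truncates toward zero; it does not round
truncated <- as.integer(c(5.27, -5.27, 5.99))
truncated
```

Note that as.integer drops the fractional part, so 5.99 becomes 5 and -5.27 becomes -5.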

Complex Number
Complex Numbers:
> z = 1 + 2i # create a complex number
> z # print the value of z
[1] 1+2i
> class(z) # print the class name of z
[1] "complex"
> sqrt(-1) # square root of -1
[1] NaN
Warning message:
In sqrt(-1) : NaNs produced
> sqrt(-1+0i) # square root of -1+0i
[1] 0+1i

Character
A character object is used to represent string values in R. We convert
objects into character values with the as.character() function.
> x = as.character(3.14)
> x # print the character string
[1] "3.14"
> class(x) # print the class name of x
[1] "character"
Two character values can be concatenated with the paste function.
> fname = "Joe"; lname = "Smith"
> paste(fname, lname)
[1] "Joe Smith"
A substring can be extracted with the substr function.
> substr("Mary has a little lamb.", start=3, stop=12)
[1] "ry has a l"

Character
sub and gsub replace the first match and all matches, respectively.
> sub("little", "big", "Mary has a little lamb.")
[1] "Mary has a big lamb."
> help("sub")
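The difference between sub and gsub only shows up when the pattern occurs more than once. A small sketch, using a made-up sentence with two matches:

```r
# Illustrative sentence (not from the slides) containing "little" twice
sentence <- "Mary has a little lamb, a very little lamb."

first_only  <- sub("little", "big", sentence)    # replaces the first match only
all_matches <- gsub("little", "big", sentence)   # replaces every match

first_only
all_matches
```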

Attributes
R objects can have attributes, such as:
names, dimnames
dimensions (e.g. matrices, arrays)
class
length
other user-defined attributes/metadata
Attributes of an object can be accessed using the attributes() function.
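A minimal sketch of inspecting and setting attributes. The matrix m and the attribute name "source" are assumptions made for illustration.

```r
# A small matrix carries dim and dimnames attributes
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))
attributes(m)

# attr() reads or sets a single attribute, including user-defined ones
attr(m, "source") <- "illustrative data"
attribute_names <- names(attributes(m))
attribute_names
```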

Vector
A vector is a sequence of data elements of the same basic type. Members
in a vector are officially called components.
> c(2, 3, 5)
[1] 2 3 5
> 8.5:4.5 # sequence of numbers from 8.5 down to 4.5
[1] 8.5 7.5 6.5 5.5 4.5
> c(1, 1:3, c(5, 8), 13) # values concatenated into a single vector
[1]  1  1  2  3  5  8 13
> c(TRUE, FALSE, TRUE, FALSE, FALSE)
[1]  TRUE FALSE  TRUE FALSE FALSE
> c("aa", "bb", "cc", "dd", "ee")
[1] "aa" "bb" "cc" "dd" "ee"
> length(c("aa", "bb", "cc", "dd", "ee"))
[1] 5
Special values:
NA, NaN, Inf, -Inf, NULL
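The special values listed above behave differently under the standard test functions. A short sketch (the vector names are illustrative):

```r
values <- c(1, NA, NaN, Inf, -Inf)

na_flags     <- is.na(values)      # TRUE for both NA and NaN
nan_flags    <- is.nan(values)     # TRUE for NaN only
finite_flags <- is.finite(values)  # FALSE for NA, NaN, Inf and -Inf

# Division by zero produces Inf, -Inf or NaN rather than an error
zero_division <- c(1 / 0, -1 / 0, 0 / 0)

# NULL is an empty object; it disappears when concatenated
length(NULL)
without_null <- c(1, NULL, 2)
```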

Vectors
The vector function creates a vector of a specified type and length. Each of
the values in the result is zero, FALSE, or an empty string:
vector("numeric", 5)
## [1] 0 0 0 0 0
vector("complex", 5)
## [1] 0+0i 0+0i 0+0i 0+0i 0+0i
vector("logical", 5)
## [1] FALSE FALSE FALSE FALSE FALSE
vector("character", 5)
## [1] "" "" "" "" ""

Sequence
The seq function creates a sequence.
seq(3, 12) #same as 3:12
## [1] 3 4 5 6 7 8 9 10 11 12
seq(3, 12, 2)
## [1] 3 5 7 9 11
seq(0.1, 0.01, -0.01)
## [1] 0.10 0.09 0.08 0.07 0.06 0.05 0.04 0.03 0.02 0.01
seq_len creates a sequence from 1 up to its input:
seq_len(7)
## [1] 1 2 3 4 5 6 7

Sequence
seq_along creates a sequence from 1 up to the length of its input:
pp <- c("Peter", "Piper", "picked", "a", "peck",
"of", "pickled", "peppers")
seq_along(pp)
[1] 1 2 3 4 5 6 7 8
The length function returns the number of elements in a vector:
length(1:5)
## [1] 5
length(c(TRUE, FALSE, NA))
## [1] 3

Name
A great feature of R's vectors is that each element can be given a name.
Labeling the elements can often make your code much more readable. You
can specify names when you create a vector in the form name = value.
c(apple = 1, banana = 2, "kiwi fruit" = 3, 4)
##      apple     banana kiwi fruit
##          1          2          3          4
You can add element names to a vector after its creation using the names function:
x <- 1:4
names(x) <- c("apple", "bananas", "kiwi fruit", "")
x
##      apple    bananas kiwi fruit
##          1          2          3          4

Name
The names function can also be used to retrieve the names of a vector:
names(x)
## [1] "apple" "bananas" "kiwi fruit" ""
If a vector has no element names, then the names function returns NULL:
names(1:4)
## NULL

Vector Recycling
If we try to add a single number to a vector, then that number is added to
each element of the vector:
1:5 + 1
## [1] 2 3 4 5 6
1 + 1:5
## [1] 2 3 4 5 6
When adding two vectors together, R will recycle elements in the
shorter vector to match the longer one:
1:5 + 1:15
##  [1]  2  4  6  8 10  7  9 11 13 15 12 14 16 18 20
What happens if the length of the longer vector isn't a multiple of the
length of the shorter one?
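One way to answer the question is to try it. In the sketch below (variable names are illustrative), R still recycles the shorter vector but issues a warning; the handler captures the warning text so it can be printed alongside the result.

```r
short <- 1:5
long  <- 1:7   # 7 is not a multiple of 5

# Catch the warning so the script can show it, then carry on with the result
result <- withCallingHandlers(
  short + long,
  warning = function(w) {
    message("R warned: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# The first two elements of `short` were reused for positions 6 and 7
result
```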

Vector Repetition
The rep function can be used to repeat the elements of a vector:
rep(1:5, 3)
## [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
rep(1:5, each = 3)
## [1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5
rep(1:5, times = 1:5)
## [1] 1 2 2 3 3 3 4 4 4 4 5 5 5 5 5
rep(1:5, length.out = 7)
## [1] 1 2 3 4 5 1 2

List
An R list is an ordered collection of objects. A list can be created by using
the list function:
l <- list(1, 2, 3, 4, 5)
parcel <- list(destination="New York", dimensions=c(2,6,9),
               price=12.95)
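Unlike a vector, a list can hold elements of different types and lengths. A short sketch that reuses the parcel list from above; the names element_classes and volume are added for illustration.

```r
parcel <- list(destination = "New York", dimensions = c(2, 6, 9), price = 12.95)

length(parcel)   # number of elements in the list
names(parcel)

# Each element keeps its own type
element_classes <- sapply(parcel, class)
element_classes

# Elements can be pulled out by name and used like any other object
volume <- prod(parcel$dimensions)
volume
```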

Matrix
A matrix is an extension of a vector to two dimensions. A matrix is used to
represent two-dimensional data of a single type. To create a new matrix,
we use the matrix function.
> m <- matrix(data=1:12,nrow=4,ncol=3,
+ dimnames=list(c("r1","r2","r3","r4"),
+ c("c1","c2","c3")))
> m
   c1 c2 c3
r1  1  5  9
r2  2  6 10
r3  3  7 11
r4  4  8 12

Data Frame
Data frames are used to store tabular data.
They are represented as a special type of list where every element of the list has to have the same length.
Each element of the list can be thought of as a column, and the length of each element of the list is the number of rows.
Unlike matrices, data frames can store different classes of objects in each column (just like lists); matrices must have every element be the same class.
Data frames also have a special attribute called row.names.
Data frames are usually created by calling read.table() or read.csv().
They can be converted to a matrix by calling data.matrix().
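The points above can be sketched with a small data frame built by hand with data.frame(). The students data is made up for illustration.

```r
# Columns of different classes, all of the same length
students <- data.frame(
  name   = c("Asha", "Ben", "Chen"),
  score  = c(91.5, 45, 84),
  passed = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

dim(students)                              # rows, columns
column_classes <- sapply(students, class)  # one class per column
row.names(students)                        # default row names "1", "2", "3"

# data.matrix converts the chosen columns to a single numeric matrix
numeric_part <- data.matrix(students[, c("score", "passed")])
numeric_part
```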

Factors
Factors are used to represent categorical data. Factors can be unordered or
ordered. One can think of a factor as an integer vector where each integer
has a label.
> x <- factor(c("yes", "yes", "no", "yes", "no"))
> x
[1] yes yes no yes no
Levels: no yes
> table(x)
x
 no yes
  2   3

Factors
The order of the levels can be set using the levels argument to factor().
x <- factor(c("yes", "yes", "no", "yes", "no"),
levels = c("yes", "no"))
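A sketch of what setting the levels changes: the level order determines the underlying integer codes and the order of table() output. The ordered factor sizes is an extra illustration, not from the slides.

```r
x <- factor(c("yes", "yes", "no", "yes", "no"),
            levels = c("yes", "no"))

levels(x)               # "yes" now comes first
codes <- as.integer(x)  # the integer code behind each label
counts <- table(x)      # counts follow the level order

# An ordered factor additionally supports comparisons between levels
sizes <- factor(c("small", "large", "medium"),
                levels = c("small", "medium", "large"),
                ordered = TRUE)
below_large <- sizes < "large"
```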

Extracting elements from data structures
There are a number of operators that can be used to extract subsets of R
objects.
[ always returns an object of the same class as the original; it can be used to select more than one element (there is one exception)
[[ is used to extract elements of a list or a data frame; it can only be used to extract a single element, and the class of the returned object will not necessarily be a list or data frame
$ is used to extract elements of a list or data frame by name; its semantics are similar to that of [[.

Subsetting a Vector
> x <- c("a", "b", "c", "c", "d", "a")
> x[1]
[1] "a"
> x[2]
[1] "b"
> x[1:4]
[1] "a" "b" "c" "c"
> x[x > "a"]
[1] "b" "c" "c" "d"
> u <- x > "a"
> u
[1] FALSE TRUE TRUE TRUE TRUE FALSE
> x[u]
[1] "b" "c" "c" "d"

Subsetting a Matrix
Similarly, subsetting a single column or a single row will give you a vector, not a matrix (by default).
> x <- matrix(1:6, 2, 3)
> x[1, ]
[1] 1 3 5
> x[1, , drop = FALSE]
     [,1] [,2] [,3]
[1,]    1    3    5

Subsetting List
> x <- list(foo = 1:4, bar = 0.6)
> x[1]
$foo
[1] 1 2 3 4
> x[[1]]
[1] 1 2 3 4
> x$bar
[1] 0.6
> x[["bar"]]
[1] 0.6
> x["bar"]
$bar
[1] 0.6

Subsetting List
The [[ operator can take an integer sequence, which is applied recursively:
> x <- list(a = list(10, 12, 14), b = c(3.14, 2.81))
> x[[c(1, 3)]]
[1] 14
> x[[1]][[3]]
[1] 14
> x[[c(2, 1)]]
[1] 3.14

Data Frame Slicing
Column slicing
by name
by index
Row slicing
by name
by index
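The four kinds of slicing in the outline above can be sketched as follows. The data frame df and its row names are made up for illustration.

```r
df <- data.frame(id = 1:3,
                 score = c(91.5, 78, 84),
                 row.names = c("asha", "ben", "chen"))

# Column slicing
by_col_name  <- df[, "score"]   # by name, returns a vector
by_col_index <- df[, 2]         # by index, returns the same vector
one_col_df   <- df["score"]     # single bracket keeps a data frame

# Row slicing
by_row_name  <- df["ben", ]     # by name
by_row_index <- df[2, ]         # by index

# Rows can also be selected with a logical condition
high_scores <- df[df$score > 80, ]
```

Passing drop = FALSE, as in df[, "score", drop = FALSE], keeps a one-column data frame instead of a vector.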

Vectorized Operations
Many operations in R are vectorized.
> x <- 1:4; y <- 6:9
> x + y
[1] 7 9 11 13
> x > 2
[1] FALSE FALSE TRUE TRUE
> x >= 2
[1] FALSE TRUE TRUE TRUE
> y == 8
[1] FALSE FALSE TRUE FALSE
> x * y
[1] 6 14 24 36
> x / y
[1] 0.1666667 0.2857143 0.3750000 0.4444444

Vectorized Operations
Many operations in R are vectorized.
> x <- matrix(1:4, 2, 2); y <- matrix(rep(10, 4), 2, 2)
> x * y ## element-wise multiplication
     [,1] [,2]
[1,]   10   30
[2,]   20   40
> x / y
     [,1] [,2]
[1,]  0.1  0.3
[2,]  0.2  0.4
> x %*% y ## true matrix multiplication
     [,1] [,2]
[1,]   40   40
[2,]   60   60

Package
A package is a collection of previously programmed functions, often
including functions for specific tasks.
Loading a package
library(packageName)
Installing a package
install.packages("packageName")
Default packages
getOption("defaultPackages")
Currently attached packages
(.packages())
All installed packages
.packages(all.available = TRUE)
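A minimal sketch of the package commands above, using the tools package because it ships with R and needs no download. The install.packages line is left commented out; "somePackage" is a hypothetical name, and installing requires network access to a CRAN mirror.

```r
library(tools)   # attach a package that is already installed

# TRUE once the package is attached
attached <- "tools" %in% (.packages())

# requireNamespace checks availability without attaching
tools_available <- requireNamespace("tools", quietly = TRUE)

# Names of every installed package, attached or not
installed <- .packages(all.available = TRUE)

# install.packages("somePackage")   # hypothetical package name

# Functions can also be called with the package prefix
ext <- tools::file_ext("report.csv")
```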

Reading Data
There are a few principal functions for reading data into R.
read.table, read.csv, for reading tabular data
readLines, for reading lines of a text file
source, for reading in R code files (inverse of dump)
dget, for reading in R code files (inverse of dput)
load, for reading in saved workspaces
unserialize, for reading single R objects in binary form

Writing Data
There are analogous functions for writing data to files:
write.table
writeLines
dump
dput
save
serialize
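The reading and writing functions come in pairs. The sketch below round-trips a small made-up data frame through write.table/read.table and through dput/dget, using temporary files so nothing is left behind; all variable names are illustrative.

```r
scores <- data.frame(name = c("Asha", "Ben"),
                     score = c(91.5, 78),
                     stringsAsFactors = FALSE)

# Tabular text: write.table out, read.table back in
csv_path <- tempfile(fileext = ".csv")
write.table(scores, csv_path, sep = ",", row.names = FALSE)
scores_back <- read.table(csv_path, header = TRUE, sep = ",",
                          stringsAsFactors = FALSE)

# R code representation: dput out, dget back in
dput_path <- tempfile(fileext = ".R")
dput(scores, file = dput_path)
scores_dget <- dget(dput_path)

unlink(c(csv_path, dput_path))   # clean up the temporary files
```

dput keeps the column classes in the file itself, while the text table relies on read.table guessing them again.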

Reading data by using read.table
The read.table function is one of the most commonly used functions for
reading data. It has a few important arguments:
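file, the name of a file, or a connection
header, logical indicating whether the file has a header line
sep, a string indicating how the columns are separated
colClasses, a character vector indicating the class of each column in the dataset
nrows, the number of rows in the dataset
comment.char, a character string indicating the comment character
skip, the number of lines to skip from the beginning
stringsAsFactors, should character variables be coded as factors?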
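Several of these arguments can be tried without a file on disk by passing the data through read.table's text argument. The data below is made up, and the first line stands in for a title line that some export tools add.

```r
raw_text <- "exported by a hypothetical tool
city,temp
Delhi,31.5
Pune,27.0
Leh,4.2"

weather <- read.table(
  text = raw_text,
  skip = 1,                                # skip the title line
  header = TRUE,                           # next line holds column names
  sep = ",",                               # comma-separated columns
  nrows = 2,                               # read only the first two data rows
  colClasses = c("character", "numeric"),  # one class per column
  stringsAsFactors = FALSE
)
weather
```

Giving colClasses and nrows up front saves read.table from guessing, which helps most on large files.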

Example
For small to moderately sized datasets, you can usually call read.table
without specifying any other arguments:
data <- read.table("foo.txt")
read.csv works like read.table, except that its defaults suit comma-separated
files: the default separator is a comma and header defaults to TRUE.

Reading Data from an .xls file
> library(gdata) # load gdata package
> help(read.xls) # documentation
> mydata = read.xls("mydata.xls") # read from first sheet
##or
> library(XLConnect) # load XLConnect package
> wk = loadWorkbook("mydata.xls")
> df = readWorksheet(wk, sheet="Sheet1")

Control Structure
A conditional expression in R has the following form:
if (condition) {
  expressions
}
### OR
if (condition) {
  trueExpressions
} else {
  falseExpressions
}

Control Structure
In R, conditional statements are not vector operations. If the condition
is a vector of more than one logical value, versions of R before 4.2.0 use
only the first item and issue a warning; R 4.2.0 and later signal an error
instead. For example, in an older version:
> x <- 10
> y <- c(8, 10, 12, 3, 17)
> if (x < y) x else y
[1] 8 10 12 3 17
Warning message:
In if (x < y) x else y :
the condition has length > 1 and only the first element will be used
If you would like a vector operation, use the ifelse function instead:
> a <- c("a","a","a","a","a")
> b <- c("b","b","b","b","b")
> ifelse(c(TRUE,FALSE,TRUE,FALSE,TRUE),a,b)
[1] "a" "b" "a" "b" "a"
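Applied to the x and y from the earlier example, ifelse evaluates the condition element by element. A short sketch (the names smaller and first_only are illustrative):

```r
x <- 10
y <- c(8, 10, 12, 3, 17)

# Element-wise choice: x where x < y, otherwise the element of y
smaller <- ifelse(x < y, x, y)
smaller

# A plain if needs a single TRUE or FALSE, so index one element explicitly
first_only <- if (x < y[1]) x else y[1]
first_only
```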

Functions
Functions are R objects that evaluate a set of input arguments and
return an output value.
In R, function objects are defined with this syntax:
function(arguments) body

Functions
> f <- function(x,y) {x+y}
> f(1,2)
[1] 3
> g <- function(x,y=10) {x+y}
> g(1)
[1] 11
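Because functions are ordinary objects, they can be passed to other functions, written without a name, and called recursively. A minimal sketch; the names add, apply_twice and factorial_rec are made up for illustration.

```r
add <- function(x, y = 10) {
  x + y   # the value of the last expression is returned
}

add(1)              # y falls back to its default
add(y = 5, x = 2)   # arguments can be matched by name in any order

# An anonymous function passed to sapply
squares <- sapply(1:4, function(n) n^2)

# A function that takes another function as an argument
apply_twice <- function(f, value) f(f(value))
result <- apply_twice(add, 1)   # add(add(1)): 1 + 10, then 11 + 10

# A recursive function
factorial_rec <- function(n) if (n <= 1) 1 else n * factorial_rec(n - 1)
```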

References
Learning R by Richard Cotton
R in a Nutshell by Joseph Adler
Roger Peng's notes