Introduction to r language programming.ppt

dinasaif3 13 views 32 slides Jun 16, 2024

Introduction to R

http://www.r-project.org/
R Project for Statistical Computing

What is R?
•R is an integrated suite of software facilities for data
manipulation, calculation, and graphical display. It
includes:
–An effective data handling and storage facility,
–A suite of operators for calculations on arrays (matrices),
–A large coherent integrated collection of intermediate
tools for data analysis,
–Graphical facilities for data analysis and display either on-
screen or on hardcopy, and
–A well-developed, simple, and effective programming
language which includes conditionals, loops, user-defined
recursive functions and input and output facilities.

Installing R
Guidance on downloading and
installing R can be found on
the R home page under the
FAQ link.

CRAN Mirrors
CRAN stands for “Comprehensive R
Archive Network”
Usually you pick a CRAN mirror to
use as the source for
downloading your R code and
packages.
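
A mirror can also be chosen from the R prompt rather than from a menu. The sketch below assumes the https://cloud.r-project.org mirror, which is not named on the slide; any other CRAN mirror URL could be substituted.

```r
# Assumption: https://cloud.r-project.org is used as the mirror (not from the slides).
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Show which repository install.packages() will download from in this session.
getOption("repos")
```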

What is an R “package”?
•An R package is data analysis code that is written
so that it can be executed in the R environment
•Several basic statistics packages are supplied with
the basic R distribution
•Additional packages can be obtained from the
CRAN sites
–You will likely need to download additional packages
that deal specifically with microarray analysis for this
course

R Packages
•There are many
contributed packages that
can be used to extend R.
•These packages are created
and maintained by their
authors.

Let’s Get Started…
–There should be an R icon on your
desktop

When you launch R on a
PC you should get an
interface that looks like
this screenshot. Things
may look slightly different
on a Mac.

At the prompt (>) type the
command library() to see
which packages are
available in your R
installation.
A library is a location where
R goes to find packages.
What is listed for you may differ from
what is shown here.

•The caret “>” is the command prompt
•‘library()’ is the command
•The parentheses hold the arguments passed to the
function; here none are given.
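
As a rough illustration of the difference between a package and a library, the sketch below lists the installed packages and the library locations R searches. The variable name pkgs is only illustrative, and the output will differ between installations.

```r
# library() with no arguments returns a listing of the installed packages.
pkgs <- library()
head(pkgs$results[, c("Package", "LibPath")])

# .libPaths() shows the libraries (directories) where R looks for packages.
.libPaths()

# With an argument, library() attaches that package instead of listing.
library(stats)
```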

Running this command will
load all of the data sets that
come with your R
installation.

In this series of commands, we
load the BOD (Biological
Oxygen Demand) data set and
then print out the data to the
screen.
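
The screenshots for these two slides are not reproduced here. Assuming the commands were calls to data(), a minimal sketch might look like the following. Note that data() without a data set name only lists what is available; naming a data set loads it. The variable name available is illustrative.

```r
# List the data sets shipped in the 'datasets' package (this only lists them).
available <- data(package = "datasets")
head(available$results[, c("Item", "Title")])

# Load BOD into the workspace and print it to the screen.
data(BOD)
BOD
```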

Data Frames in R
•Many of the data sets in R are objects called data
frames
–You can think of a data frame as a table where the
columns are variables and rows are observations

Types of Objects in R
•Vector
–One-dimensional array of arbitrary length. All members of a vector must be of the
same type (numeric, character, logical, etc.)
•Matrix
–Two-dimensional array with an arbitrary number of rows and columns. All elements
in a matrix must be of the same type.
•Array
–Similar to a matrix but with an arbitrary number of dimensions
•Data frame
–Organized similarly to a matrix, except that each column can hold its own type of
data. All values within a single column must still share one type.
•Function
–A type of R object that performs a specific operation. R contains many built-in
functions.
•List
–An arbitrary collection of R objects
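
The object types above can be sketched in a few lines. All names and values here are made up for illustration and do not come from the slides.

```r
v <- c(1.5, 2, 3)                       # vector: all numeric
m <- matrix(1:6, nrow = 2, ncol = 3)    # matrix: 2 rows, 3 columns
a <- array(1:24, dim = c(2, 3, 4))      # array: three dimensions
df <- data.frame(id = 1:3,              # data frame: columns of different types
                 label = c("a", "b", "c"))
square <- function(x) x^2               # function: a user-defined operation
l <- list(v, m, df, square)             # list: arbitrary collection of objects

# Mixing types in a vector coerces everything to one type (here character).
mixed <- c(1, "two", 3)
class(mixed)

# Inspect the objects.
class(m)
dim(a)
sapply(df, class)
square(4)
```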

Create the BOD Data Frame from Scratch
1. Create a vector object for time using the c() function.
2. Create a vector object for demand.
3. Use the data.frame() function to create the data frame object
> time=c(1,2,3,4,5,7)
> time
[1] 1 2 3 4 5 7
> demand=c(8.3,10.3,19.0,16.0,15.6,19.8)
> demand
[1] 8.3 10.3 19.0 16.0 15.6 19.8
> MyBOD=data.frame(time=time, demand=demand)
> MyBOD
  time demand
1    1    8.3
2    2   10.3
3    3   19.0
4    4   16.0
5    5   15.6
6    7   19.8

Adding columns to a Data Frame
You can add and delete columns
from a data frame.
Here we add a column for the sex of
whatever it is we are measuring
oxygen demand for.
Oops! We have a data entry error. The values for
sex should all be female (F). How would you fix
this?
> sex=c('F','F','F','M','M','M')
> sex
[1] "F" "F" "F" "M" "M" "M"
> MyBOD$sex <- sex
> MyBOD
  time demand sex
1    1    8.3   F
2    2   10.3   F
3    3   19.0   F
4    4   16.0   M
5    5   15.6   M
6    7   19.8   M
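
One possible way to correct the entry error, offered as a sketch rather than as the presenter's intended answer, is to overwrite the wrong values in place. The data frame is rebuilt first so the example runs on its own.

```r
# Rebuild the data frame from the slide so the example is self-contained.
MyBOD <- data.frame(time = c(1, 2, 3, 4, 5, 7),
                    demand = c(8.3, 10.3, 19.0, 16.0, 15.6, 19.8))
MyBOD$sex <- c("F", "F", "F", "M", "M", "M")

# Option 1: overwrite only the wrong entries using a logical index.
MyBOD$sex[MyBOD$sex == "M"] <- "F"

# Option 2 (same result here): a single value is recycled down the column.
MyBOD$sex <- "F"

MyBOD
```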

Deleting columns from a Data Frame
You can delete columns from a data
frame.
Here we delete the sex column
that we just added to the
MyBOD data frame.
> MyBOD$sex <- NULL
> MyBOD
  time demand
1    1    8.3
2    2   10.3
3    3   19.0
4    4   16.0
5    5   15.6
6    7   19.8

Displaying Data in R
R comes with an incredible array of
built in data analysis tools for
exploring, analyzing, and visualizing
data.
Here is a plot of the Time and Demand
variables for the MyBOD2 data frame
using the plot() command.
Note that because we “attached” this
data frame we can just use the names
of the variables to access the
observation data.
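
The plot itself is not reproduced here. A minimal sketch of the idea, assuming the MyBOD data frame built from scratch earlier (with lowercase column names), shows the same scatter plot with and without attaching the data frame.

```r
MyBOD <- data.frame(time = c(1, 2, 3, 4, 5, 7),
                    demand = c(8.3, 10.3, 19.0, 16.0, 15.6, 19.8))

# Without attaching, columns are reached through the data frame.
plot(MyBOD$time, MyBOD$demand, xlab = "time", ylab = "demand")

# After attach(), the column names can be used directly.
attach(MyBOD)
plot(time, demand)
detach(MyBOD)   # detach again so the names do not linger on the search path
```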

Displaying Data in R
Here is a box plot of the Demand
variable for the MyBOD2 data frame
using the boxplot() command.
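
As a sketch of what boxplot() does, again assuming the MyBOD data frame built earlier, the call below draws the plot and also keeps the numbers behind it. The variable name bp is illustrative.

```r
MyBOD <- data.frame(time = c(1, 2, 3, 4, 5, 7),
                    demand = c(8.3, 10.3, 19.0, 16.0, 15.6, 19.8))

# boxplot() draws the plot and invisibly returns the numbers behind it.
bp <- boxplot(MyBOD$demand, ylab = "demand")

# Rows of bp$stats: lower whisker, lower hinge, median, upper hinge, upper whisker.
bp$stats
```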

Analyzing Data
The summary() command provides
summary statistics for a data frame.
> summary(MyBOD)
      time           demand
 Min.   :1.000   Min.   : 8.30
 1st Qu.:2.250   1st Qu.:11.62
 Median :3.500   Median :15.80
 Mean   :3.667   Mean   :14.83
 3rd Qu.:4.750   3rd Qu.:18.25
 Max.   :7.000   Max.   :19.80

Analyzing Data
Here is a series of commands to
generate some basic statistics for
the Demand variable in the MyBOD
data frame.
The data frame has been attached
so that the variable names can be
used directly.

Examples of stats functions in R
•mean()
•median()
•table() –there is no built-in function to find the statistical mode of a data set,
but the table() function will show how many times each value is
observed.
•max()
•min()
•There is no built-in function for midrange, so you have to
construct a formula to calculate this based on the values from
the max() and min() functions.
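
The functions above can be tried on the demand values from the BOD example. The midrange and mode calculations are one possible construction, and grades is a made-up vector, since every demand value occurs only once.

```r
demand <- c(8.3, 10.3, 19.0, 16.0, 15.6, 19.8)

mean(demand)
median(demand)
max(demand)
min(demand)

# Midrange: halfway between the smallest and largest value.
midrange <- (max(demand) + min(demand)) / 2

# Mode via table(): count each value, then keep the most frequent one(s).
# 'grades' is a hypothetical vector used only for illustration.
grades <- c("B", "A", "B", "C", "B", "A")
counts <- table(grades)
stat_mode <- names(counts)[as.vector(counts) == max(counts)]
stat_mode
```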

Measuring data spread
Here is a series of commands to generate some
basic statistics related to the spread of
measurements for the Demand variable in the
MyBOD data frame.
The data frame has been attached so that the
variable names can be used directly.

More examples of stats functions in R
•var()
•sd()
•There is no built-in function for calculating the standard error
of the mean (sem), so you have to create a formula to
calculate this.
•The built-in range() function returns the minimum and maximum
rather than their difference, so to get the range as a single number you
construct a formula based on the values from the max() and min() functions.
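
A minimal sketch of these spread measures on the demand values follows. The names sem and demand_range are illustrative, and the formulas are the usual textbook ones rather than anything shown on the slides.

```r
demand <- c(8.3, 10.3, 19.0, 16.0, 15.6, 19.8)

var(demand)   # sample variance (divides by n - 1)
sd(demand)    # sample standard deviation, the square root of var()

# Standard error of the mean: sd divided by the square root of the sample size.
sem <- sd(demand) / sqrt(length(demand))
sem

# Range as a single number: largest value minus smallest value.
demand_range <- max(demand) - min(demand)
demand_range
```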

What is meant by mode?
What do the variance,
standard deviation and
standard error of the mean tell
us about a data set?

Your Turn
Create a data frame for age and frequency using
the data on this slide.
Plot age versus frequency.
What are the mean and median age?
What are the variance, standard deviation, and
standard error of the mean for frequency?

Tip of the Day: The edit() function

Installing R Packages
1.In R, choose the menu item Packages -> Install Packages
2.Choose a CRAN site
3.You will see a list of Packages
4.Choose the aplpack package
5.You should see a message about accessing the package and
then the message
“package ‘aplpack’ successfully installed and MD5 sums checked”
6.To load the package, type library(aplpack)
7.Run the following command:
stem.leaf(rnorm(50))
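
The same steps can be carried out from the prompt instead of the menu. This is a sketch only: the interactive() guard and the variable names are assumptions added here so that nothing is downloaded unexpectedly when the code is run as a script.

```r
pkg <- "aplpack"

# Install only when missing, and only in an interactive session (this
# guard is an assumption of the sketch, to avoid surprise downloads).
if (!requireNamespace(pkg, quietly = TRUE) && interactive()) {
  install.packages(pkg)
}

# Use the package if it is available.
have_pkg <- requireNamespace(pkg, quietly = TRUE)
if (have_pkg) {
  library(aplpack)
  stem.leaf(rnorm(50))   # stem-and-leaf display of 50 random normal values
}
```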

What is this command doing?
stem(rnorm(50))
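
One way to explore the question is to split the command into its two parts. The set.seed() call and the variable x are additions for this sketch, so that the random draw can be repeated.

```r
set.seed(1)        # fix the random seed so the display is repeatable
x <- rnorm(50)     # 50 draws from a normal distribution with mean 0 and sd 1
stem(x)            # print a stem-and-leaf plot of x in the console
```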