Basics: data manipulation/massaging

Miscellaneous
- R is case sensitive.
- Names can contain alphanumeric characters, underscores (_), and periods (.).
- "#" starts a comment, so everything after "#" on a line is ignored by R.
- Each command is terminated by a semicolon (;) or a newline.
- The number of spaces between the parts of a command doesn't matter.
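The points above can be tried out with a minimal sketch like the following. All object names (weight, body.mass, and so on) are made up for illustration.

```r
# Names are case sensitive: weight and Weight are two different objects.
weight <- 70
Weight <- 82
weight == Weight          # FALSE

# Periods and underscores are both allowed inside names.
body.mass <- 70
body_mass <- 70

# Two commands on one line, separated by a semicolon.
a <- 1; b <- 2

# Extra spaces between the parts of a command are ignored.
total1 <- a+b
total2 <-    a   +   b
total1 == total2          # TRUE
```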
Assignment of variables (objects) and expression
A basic command is either an assignment or an expression.
```
> d <- 6.2    # Assignment
> d           # Expression: display the value assigned to variable d
> 6.2   ->  d # another way of assignment
```
Once you assign values to variables, R keeps them in memory for the rest of the session. But I bet that your memory isn't as good as R's. When you forget which variables you have defined, type:
```
> ls()                # list
```
If you want R to forget the definition of variables, type:
```
> rm(d)                # remove d
```
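A short sketch of ls() and rm() working together; x.tmp and y.tmp are hypothetical names chosen only for this example.

```r
x.tmp <- 1
y.tmp <- 2
"x.tmp" %in% ls()                  # TRUE: x.tmp is listed among the defined objects

rm(x.tmp)                          # forget x.tmp
"x.tmp" %in% ls()                  # FALSE: it is gone
"y.tmp" %in% ls()                  # TRUE: y.tmp is untouched

# rm(list = ls())                  # would remove every object; use with care
```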
Getting help

R comes with great online documentation. Since R is extremely feature-rich, nobody can memorize all the details, so it's essential to know how to find the documentation.
```
> help("rm")                 # help page for the function rm()
> help.search("bootstrap")   # keyword search
> help.start()               # open the HTML help in a web browser
```
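Two more helpers can be handy when you only half remember a function. This is a small sketch, not an exhaustive list: args() prints a function's arguments and apropos() searches object names by pattern. At the interactive prompt, ?rm is a shorthand for help("rm").

```r
args(read.table)               # show the arguments that read.table() accepts

found <- apropos("^read")      # names of visible objects that start with "read"
found
"read.table" %in% found        # TRUE
```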

Vectors
"c" means concatenate: c() combines its arguments into one vector.

> d <- c(2.5, 2, 4, 5)   # a vector of 4 elements
> d
> e <- c(1,1,1)          # a vector of 3 elements
> e
> f <- c(d, e)           # f formed by concatenating d and e, 7 elements
> f
> f[4]                   # extract the 4-th element
> g <- f[3:6]            # create a new vector with 3rd to 6-th elements

Note well that () and [] are not interchangeable.

() is used to indicate functions or arithmetic groupings.
[] is used to indicate the indices of a vector.
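A small sketch that puts both kinds of brackets side by side; the vector v is made up for this example.

```r
v <- c(10, 20, 30, 40)

length(v)             # () calls the function length(): 4
(v[1] + v[2]) * 2     # () groups arithmetic, [] picks elements: 60
v[c(1, 4)]            # c() builds an index vector, [] uses it: 10 40
v[-1]                 # a negative index drops that element: 20 30 40
```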

Other ways to create vectors

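> a <- rep(2, 10)     # repeat 2 ten times
> a
> a <- seq(3, 9)      # sequence from 3 to 9 in steps of 1
> a
> 1:10                # shorthand for seq(1, 10)
> seq(3, 4, 0.1)      # sequence from 3 to 4 in steps of 0.1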
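rep() and seq() take named arguments that make the intent explicit. The following sketch shows a few common variations.

```r
rep(c(1, 2), times = 3)       # repeat the whole vector: 1 2 1 2 1 2
rep(c(1, 2), each = 3)        # repeat each element: 1 1 1 2 2 2

seq(0, 1, by = 0.25)          # fixed step size: 0.00 0.25 0.50 0.75 1.00
seq(0, 1, length.out = 5)     # fixed number of points: same five values
seq(10, 1, by = -3)           # descending sequence: 10 7 4 1
```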

Arithmetic operations
```
> 5.1 / 3 + 2 * ((3 + 4.1)^2 - 5)
> d^2 - 3
```
Operations on vectors: vectorized operations are one of the quirks/strengths of R.
Operations are performed element by element.
```
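> g <- c(1,2,3)
> g + e          # e is c(1,1,1) from above: 2 3 4
> e / g          # 1/1, 1/2, 1/3
> g + 2          # 2 is added to every element: 3 4 5
> g * (e + 1)    # 2 4 6
> 1:9 / g        # g is recycled three times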
```
Note that the two vectors can have different lengths. The shorter vector is recycled as often as needed; R gives a warning if the longer length is not a multiple of the shorter one.
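Recycling is easiest to see with small numbers. In this sketch, long.vec and short.vec are made-up names, and tryCatch() is used only to capture the warning text instead of letting it print.

```r
long.vec  <- 1:6
short.vec <- c(10, 20)

# short.vec is reused three times: 10 20 10 20 10 20
long.vec + short.vec          # 11 22 13 24 15 26

# 5 is not a multiple of 2, so R warns about the partial recycling.
msg <- tryCatch(1:5 + short.vec,
                warning = function(w) conditionMessage(w))
msg                           # the text of the warning
```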

Math functions
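Functions such as sqrt(), exp(), and log() are applied to each element. Functions such as sum(), prod(), and mean() summarize the whole vector into a single value.

> a <- c(1, 2, 3, 4, 5, 6)
> sqrt(a)      # square root of each element
> exp(a)       # e raised to the power of each element
> log(a)       # natural logarithm of each element
> a^2          # square of each element
> sum(a)       # sum of all elements
> prod(a)      # product of all elements
> mean(a)      # average of all elements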
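The difference between element-wise functions and summary functions shows up in the length of the result, as this short sketch illustrates.

```r
a <- c(1, 2, 3, 4, 5, 6)

length(sqrt(a))          # 6: one result per element
length(sum(a))           # 1: a single summary value

sum(a) / length(a)       # 3.5, the same as mean(a)
cumsum(a)                # running total: 1 3 6 10 15 21
range(a)                 # smallest and largest value: 1 6
```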

Types of vectors
Numeric vectors/variables:

Examples: c(1.0, 2.1, -0.3), 2:10

Character vectors/variables:
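Each element is a character string. In the example below, a character vector supplies the names for the elements of a numeric vector.
```
> pets <- c(2,11,4)                         # a numeric vector
> names(pets) <- c("cat","fish","shrimp")   # a character vector used as element names
> num.fur.balls <- pets["cat"]              # extract an element by its name
```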
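Character vectors can also be manipulated directly. A brief sketch, reusing the pets example; the variable species is a made-up name.

```r
pets <- c(cat = 2, fish = 11, shrimp = 4)   # names given at creation
species <- names(pets)                      # a character vector

species                            # "cat" "fish" "shrimp"
nchar(species)                     # number of characters in each string: 3 4 6
toupper(species)                   # "CAT" "FISH" "SHRIMP"
paste(species, pets, sep = ": ")   # "cat: 2" "fish: 11" "shrimp: 4"
pets[c("cat", "shrimp")]           # extract several elements by name
```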
Logical vectors/variables:
Each element is TRUE or FALSE.
- R allows manipulation of logical quantities: TRUE or FALSE.
- Additionally, a logical vector can contain NA (not available) for missing data.
- Generally, logical vectors are created by conditions.
- Comparison operators: <, <=, >, >=, ==, !=
```
> a <- 1:5
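> t.or.f <- c(T, F, F, F, T)   # T and F are shorthand for TRUE and FALSE
> gt3 <- a > 3                 # TRUE where an element is greater than 3
> even <- a %% 2 == 0          # TRUE where an element is even
> a[t.or.f]                    # keep the elements where t.or.f is TRUE: 1 5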
> b <- c(0.5, 0.2, NA, 0.1)
> b <- b[! is.na(b)]         # eliminate the missing data
```
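Logical vectors are useful for counting and for handling missing data as well as for indexing. The following is a sketch with made-up values.

```r
a <- c(7, 2, 9, 4, 6)
gt5 <- a > 5                 # TRUE FALSE TRUE FALSE TRUE

a[gt5]                       # the values greater than 5: 7 9 6
which(gt5)                   # their positions: 1 3 5
sum(gt5)                     # TRUE counts as 1, so this counts them: 3
any(a > 8)                   # TRUE: at least one element is greater than 8
all(a > 1)                   # TRUE: every element is greater than 1

b <- c(0.5, 0.2, NA, 0.1)
b > 0.15                     # TRUE TRUE NA FALSE: NA propagates
mean(b)                      # NA, because one value is missing
mean(b, na.rm = TRUE)        # average of the three non-missing values
```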

Data frames

Data frames contain rows and columns, similar to spreadsheets. Each column is a vector, and all columns have the same length.

> x <- c(1, 2, 3, 4)
> y <- c(5, 6, 7, 8)
> z <- c(9, 10, 11, 12)
> dat <- data.frame(x, y, z)       # creates a data frame with 4 rows and 3 columns
> dat
> names(dat)                       # each column has a name
> names(dat) <- c("c1", "c2", "z") # changing the column names

Extraction of elements

> dat[2,3]         # element of row 2, column 3
> dat[2, "z"]      # same thing, but using the column name
> dat[1,]          # first row
> dat[,2]          # 2nd column
> dat[2:4,c(1,3)]  # subset: rows 2-4, columns 1 and 3
> dat$c1           # extracting the c1 column by name
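A few more extraction forms are worth knowing. This sketch rebuilds a data frame like the one above so that it runs on its own.

```r
dat <- data.frame(c1 = c(1, 2, 3, 4),
                  c2 = c(5, 6, 7, 8),
                  z  = c(9, 10, 11, 12))

dat[, 2]                  # a single column comes back as a plain vector
dat[, 2, drop = FALSE]    # drop = FALSE keeps it as a one-column data frame
dat[["c2"]]               # same as dat$c2
dat[-1, ]                 # a negative index drops the first row
head(dat, 2)              # the first two rows
```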

Attach/detach
Frequently, you need to access the columns of data frames.
attach() makes the column names visible as variables until you call detach().

> attach(dat)
> newVect <- c1 + c2   # exactly the same as newVect <- dat$c1 + dat$c2
> z <- c1 * c2         # Note dat$z is not changed; this assigns the variable z in the workspace
> dat$z <- c1 * c2     # This changes dat$z.
> dat$modded <- z + c2 # This adds a new column named "modded"
> detach(dat)          # Stop the attach
> c1                   # Error: c1 is no longer visible after detach()
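As the z example shows, it is easy to lose track of which copy of a variable is being changed while a data frame is attached. One alternative, not covered in the text above, is with(), which makes the columns visible only inside a single expression; transform() similarly adds columns computed from existing ones. A minimal sketch:

```r
dat <- data.frame(c1 = c(1, 2, 3, 4),
                  c2 = c(5, 6, 7, 8))

newVect <- with(dat, c1 + c2)             # 6 8 10 12
dat$z   <- with(dat, c1 * c2)             # new column z: 5 12 21 32
dat     <- transform(dat, modded = z + c2) # new column modded: 10 18 28 40
dat
```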

Manipulating data frames
cbind(): column bind, combines data frames or vectors by columns.
rbind(): row bind, combines them by rows.

> dim(dat)              # shows the number of rows and columns
> a <- c(13,14,15,16)
> dat <- cbind(dat, a)  # add a as a new column
> dat
> rbind(dat, dat[3:4,]) # extract rows 3-4 and append them at the end
> dat[dat[,1] > 2,]     # select the rows whose 1st column is > 2

For comparisons, you can use >, <, >=, <=, ==, !=.
You can also combine conditions with & (and), | (or), and ! (not).
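Combined conditions can be used to select rows. The sketch below rebuilds a small data frame so that it runs on its own; subset() is an alternative to bracket indexing.

```r
dat <- data.frame(c1 = c(1, 2, 3, 4),
                  c2 = c(5, 6, 7, 8),
                  z  = c(9, 10, 11, 12))

dat[dat$c1 > 1 & dat$z < 12, ]      # both conditions hold: rows 2 and 3
dat[dat$c1 == 1 | dat$c2 == 8, ]    # either condition holds: rows 1 and 4
dat[!(dat$c1 > 2), ]                # condition negated: rows 1 and 2
subset(dat, c1 > 1 & z < 12)        # same rows as the first line
```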

Getting information about variables/objects

> dim(dat)    # dimensions of the object (rows, columns)
> ncol(dat)   # number of columns
> nrow(dat)   # number of rows
> length(a)   # length of a vector

Importing data
If you have data stored in a spreadsheet, export the data in tab-delimited text format (any other reasonable text format, such as comma-separated text, also works).
If you want to try this, download an example data file here
```
> dat.in  <- read.table("data.txt", header=T, sep="\t")
```
- Use "header=T" if the 1st row of the text file contains the column names.
- Use "header=F" if not. Either way, read.table() creates a data frame.
- You may need to specify the full path:
```
> dat.in <- read.table("/Users/naoki/doc/analysis/data.txt", header=T)
```
  Or use setwd() to set the current working directory of the R process.
```
> getwd()                          # print out the current working directory
> setwd("/Users/naoki/doc/analysis/")
> getwd()
> dat.in <- read.table("data.txt", header=T)
```
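If you don't have a data file at hand, the following sketch writes a small made-up table to a temporary file and reads it back, so the whole round trip can be tried without downloading anything. The data and the names example.dat, tmp.txt, and tmp.csv are invented for illustration.

```r
example.dat <- data.frame(id      = 1:3,
                          species = c("cat", "fish", "shrimp"),
                          count   = c(2, 11, 4))

# Tab-delimited text: write it, then read it back with read.table().
tmp.txt <- tempfile(fileext = ".txt")
write.table(example.dat, file = tmp.txt, sep = "\t",
            row.names = FALSE, quote = FALSE)
dat.in <- read.table(tmp.txt, header = TRUE, sep = "\t")
dat.in

# Comma-separated text: read.csv() assumes a header row and sep = ",".
tmp.csv <- tempfile(fileext = ".csv")
write.csv(example.dat, file = tmp.csv, row.names = FALSE)
dat.csv <- read.csv(tmp.csv)
dat.csv
```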
Other types of objects
- matrices or more generally arrays: multi-dimensional generalizations of vectors.
- factors: handle categorical data (e.g. sex: female, male, or hermaphrodite).
- lists: a general form of vector in which the various elements need not be of the same type.
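A minimal sketch of each of these object types; the values and names (m, sex, rec) are made up for illustration.

```r
# Matrix: a vector with two dimensions, filled column by column.
m <- matrix(1:6, nrow = 2)
dim(m)                         # 2 3
m[2, 3]                        # row 2, column 3: 6

# Factor: categorical data with a fixed set of levels.
sex <- factor(c("female", "male", "female", "hermaphrodite"))
levels(sex)                    # "female" "hermaphrodite" "male"
table(sex)                     # counts per category

# List: elements can have different types and lengths.
rec <- list(name = "shrimp", count = 4, tags = c("small", "pink"))
rec$count                      # 4
rec[["tags"]][2]               # "pink"
```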
More advanced but useful functions for data manipulation
- apply(), lapply(), sapply(), tapply()
- is.na(), any(), all()
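To give a flavor of these functions, here is a brief sketch on a small made-up data frame; the grouping vector group is hypothetical.

```r
dat <- data.frame(c1 = c(1, 2, 3, 4),
                  c2 = c(5, 6, 7, 8),
                  z  = c(9, 10, 11, 12))

apply(dat, 1, sum)             # apply sum() to each row: 15 18 21 24
apply(dat, 2, mean)            # apply mean() to each column: 2.5 6.5 10.5
sapply(dat, max)               # one value per column, as a vector: 4 8 12
lapply(dat, range)             # one result per column, as a list

group <- c("a", "b", "a", "b")
tapply(dat$c1, group, mean)    # mean of c1 within each group: a = 2, b = 3

any(is.na(dat))                # FALSE: no missing values
all(dat$c1 > 0)                # TRUE: every c1 value is positive
```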

Naoki Takebayashi 2009-03-27