Next: Quick tour of R Up: R-tutorial Previous: Installation

Basics: Importing data

From a spreadsheet (e.g. OpenOffice.org or Excel)

The easiest way to transfer the data is to convert it into a simple text file. For example, a spreadsheet can export the data to a comma-delimited (CSV) or tab-delimited text file. Tab-delimited is probably the better choice: if some cells contain a comma as part of the data and the cell is not quoted properly, the columns can get misaligned with CSV.
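If R is already running, the alignment problem can be seen with a minimal sketch like the one below. The two sample rows and the temporary files are made up for illustration; count.fields() is a standard R function that reports how many fields R finds on each line.

```r
# Made-up example rows: the site name itself contains a comma
csv_lines <- c("site,weight", "Fairbanks, AK,1.2")
tab_lines <- c("site\tweight", "Fairbanks, AK\t1.2")

# Write each version to a temporary file (hypothetical stand-ins for exported files)
csv_file <- tempfile(fileext = ".csv")
tab_file <- tempfile(fileext = ".txt")
writeLines(csv_lines, csv_file)
writeLines(tab_lines, tab_file)

# Count the fields on each line, using the separator that matches the file
csv_fields <- count.fields(csv_file, sep = ",")
tab_fields <- count.fields(tab_file, sep = "\t")

print(csv_fields)  # the header has 2 fields, but the data row splits into 3
print(tab_fields)  # both lines have 2 fields
```

Spreadsheets normally put quotes around such cells when they export CSV, which avoids the problem, but a tab-delimited file sidesteps it altogether.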
Steps in the spreadsheet
1. Select the sheet which contains the data
2. Clean up the data
  - If your data has column headers, put them in the first row.
  - Simplify the column headers.
    - Use only alphanumeric characters (A-Z, a-z, 0-9), and start each header with a letter. Underscores (_) and periods (.) are OK to use.
      Remove other characters such as #, comma (,), ;, :, and ?.
    - Remove spaces.
      E.g. use "dryWeight" instead of "dry weight".
    - Each header should be unique.
      Two columns cannot both be named "weight". Rename them, for example to "weight.1" and "weight.2".
    - R is case-sensitive.
      "dryWeight" and "dryweight" are different names.
    - The import usually still works even if you don't follow the above rules. By default, read.table() converts spaces and other invalid characters in column names to '.' and makes duplicate names unique. One exception is '#', which read.table() treats as the start of a comment by default, so a header line containing it gets cut off.
  - Check the contents of data
    - Missing data can be left as empty cells or entered as 'NA' (not available).
      Empty cells in numeric columns are automatically converted to 'NA', which is how R represents missing data. In text columns, an empty cell is read as an empty string unless you tell read.table() otherwise.
3. Save as a tab-delimited (or comma-delimited) text file
  File -> Save As
  In "Format:", select "Text (Tab delimited)" (or "CSV")
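To see what R does with headers and empty cells that were not cleaned up, here is a small sketch. The data are made up, and the text= argument of read.table() is used only so that the example runs without a file on disk.

```r
# Made-up spreadsheet export: messy headers, a duplicate name, and an empty cell
raw <- paste(
  "dry weight\tleaf:count\tweight\tweight",
  "2.5\t4\t10\t11",
  "3.1\t\t12\tNA",
  sep = "\n"
)

# text= makes read.table() read from a string instead of a file
dat <- read.table(text = raw, header = TRUE, sep = "\t")

print(names(dat))             # "dry.weight" "leaf.count" "weight" "weight.1"
print(is.na(dat$leaf.count))  # FALSE TRUE: the empty cell became NA

# make.names() is the function R uses to repair the names
print(make.names(c("dry weight", "weight", "weight"), unique = TRUE))
```

The repaired names are usable but not always pretty, which is why it still pays to simplify the headers in the spreadsheet first.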

Into the R environment

OK, we now have a data file in a simple text format (either from a simulation or from Excel). After starting up R, type the following commands at the R prompt ('>').

Steps in R
1. read.table()
```
dat.in <- read.table("data.txt", header=TRUE, sep="\t")
```
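  - read.table() is a command (function) that reads a spreadsheet-like text file.
    NOTE: All commands have the format commandName(arguments). Arguments are separated by commas.
  - Use "header=TRUE" if the first row of the text file contains the column names.
  - Use "header=FALSE" if not; R then names the columns V1, V2, and so on. (T and F are shorthand for TRUE and FALSE.)
  - sep="\t" tells R that the columns are separated by tabs. Use sep="," for a CSV file.
  - Either way, read.table() returns a data frame.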
  - You may need to specify the full path:
```
dat.in <- read.table("/Users/naoki/doc/analysis/data.txt", header=TRUE, sep="\t")
```
    Or use setwd() to set the current working directory of the R process.
```
getwd()                               # print the current working directory
setwd("/Users/naoki/doc/analysis/")   # change to the directory that holds data.txt
getwd()                               # confirm the change
dat.in <- read.table("data.txt", header=TRUE, sep="\t")
```
    Or, if you are using the R GUI, you can do the same thing from the menu.
    Select
```
Misc -> Change Working Directory...
```
    Then navigate the file system to the directory which CONTAINS the data file.
2. Check that the data were imported successfully.
```
names(dat.in)
```
  This will show you the list of column names.
```
dim(dat.in)
```
  This will show you the dimensions (size) of the data set: the number of rows and the number of columns.
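The two steps above can be tried end to end with the sketch below. It is only an illustration: the file contents are made up, and the file is written to a temporary directory (an assumption, so that the example does not depend on any particular path on your machine).

```r
# Made-up data written to a temporary directory (stand-in for your own folder)
data_dir <- tempdir()
writeLines(c("id\tdryWeight", "1\t2.5", "2\t3.1", "3\t4.0"),
           file.path(data_dir, "data.txt"))

old_dir <- setwd(data_dir)       # setwd() returns the previous working directory
print(file.exists("data.txt"))   # TRUE means R can see the file from here

# Step 1: read the file, first with and then without a header row
dat.in  <- read.table("data.txt", header = TRUE,  sep = "\t")
dat.raw <- read.table("data.txt", header = FALSE, sep = "\t")
setwd(old_dir)                   # go back to where we started

# Step 2: check the import
print(names(dat.in))    # "id" "dryWeight"
print(dim(dat.in))      # 3 rows, 2 columns
print(names(dat.raw))   # "V1" "V2": default names when header = FALSE
print(dim(dat.raw))     # 4 rows, because the header line was read as data
print(head(dat.in))     # first few rows
str(dat.in)             # type of each column
```

If dim() reports one row more than you expect and the columns are called V1, V2, and so on, the header row was probably read as data.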
A little bit of explanation and summary:
Above, we used the command (function) read.table(), and its result (the data, read into a form R can understand) was stored in a container called dat.in.
<- is the assignment operator (it looks like an arrow). The result of read.table() is assigned to dat.in.
You can give this storage container any valid name (I chose dat.in). R can also hold several imported files at once. For example, if you have two files to read in, you can store one set of data in simDat and the other set in obsDat.
This kind of storage container for spreadsheet-like data is called a data frame, which we will discuss further in the next section.
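As a sketch of keeping two imported files side by side, the example below writes two small made-up files to temporary locations (hypothetical stand-ins for a simulation output and an observed data set) and reads each one into its own data frame.

```r
# Hypothetical files standing in for simulated and observed data
sim_file <- tempfile(fileext = ".txt")
obs_file <- tempfile(fileext = ".txt")
writeLines(c("rep\tweight", "1\t2.4", "2\t2.6"), sim_file)
writeLines(c("plant\tweight", "A\t2.5", "B\t2.9", "C\t3.0"), obs_file)

# Each call to read.table() returns a new data frame; <- gives it a name
simDat <- read.table(sim_file, header = TRUE, sep = "\t")
obsDat <- read.table(obs_file, header = TRUE, sep = "\t")

print(class(simDat))   # "data.frame"
print(dim(simDat))     # 2 rows, 2 columns
print(dim(obsDat))     # 3 rows, 2 columns
```

The two containers are independent, so changing or re-reading one does not affect the other.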

Exercise: importing data

Download the example data here, open it in OpenOffice.org or Excel, and go through the process above.
[Image: spreadsheet-data]

Naoki Takebayashi 2009-03-27