Next: Quick tour of R
- The easiest way to transfer the data is to convert it into a
simple text file. For example, a spreadsheet program can export the data as
comma-delimited (CSV) or tab-delimited text files. Tab-delimited
is probably better: if some cells contain commas as part of
the data, the columns may get misaligned with CSV.
- Steps in the spreadsheet
- Select the sheet which contains the data
- Clean up the data
- If your data has column headers, make sure they are in the first row.
- Simplify the column headers.
- Use only alphanumeric characters (A-Z, a-z,
0-9), and start each name with a letter. Underscores (_) and periods (.) are
OK to use.
Remove characters such as #, comma
(,), ;, :, and ?.
- Remove spaces.
E.g., use ``dryWeight'' instead of ``dry weight''.
- Each header should be unique.
You can't have two columns
named ``weight''; rename them, e.g., to
``weight.1'' and ``weight.2''.
- R is case-sensitive.
``dryWeight'' and ``dryweight'' are different.
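- Actually, it will probably work even if you don't follow the
above rules. During the import, read.table() passes the header through
make.names(): spaces and other invalid characters are converted to '.',
and duplicate names are made unique (a second ``weight'' becomes ``weight.1'').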
- Check the contents of the data.
- Missing data can be left as empty cells or entered as 'NA' (not available).
In numeric columns, empty cells are automatically converted to NA, which is
how R represents missing data.
- Save as a tab-delimited (or comma-delimited) text file:
File -> Save As
In ``Format:'', select ``Text (Tab delimited)'' (or ``CSV'').
OK, we now have a data file in simple text format (either from a
simulation or from Excel). After starting R, type the following
commands at the R prompt ('>').
- Steps in R
> dat.in <- read.table("data.txt", header=T, sep="\t")
- read.table() is a command (function) that reads a
spreadsheet-like text file.
NOTE: All commands have the format
commandName(arguments). Arguments are separated by commas.
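- Use ``header=T'' if the first row of the text file contains the column names.
- Use ``header=F'' if not (the columns are then named V1, V2, ...). Either way, read.table() returns a data.frame.
- ``sep="\t"'' tells R that the columns are separated by tabs; for a comma-delimited (CSV) file, use ``sep=","'' instead.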
- You may need to specify the full path:
> dat.in <- read.table("/Users/naoki/doc/analysis/data.txt", header=T, sep="\t")
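- Alternatively, use setwd() to set the current working directory of the R process, and then give just the file name:
> setwd("/Users/naoki/doc/analysis")
> getwd()     # print out the current working directory
> dat.in <- read.table("data.txt", header=T, sep="\t")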
Or, if you are using the R GUI, you can do the same thing from the menu:
Misc -> Change Working Directory...
and navigate the file system to find the directory that
CONTAINS the data file.
- Check whether you have successfully imported the data.
> names(dat.in)   # shows the list of column names
> dim(dat.in)     # shows the dimensions (size) of the data set: number of rows and columns
- A little explanation and summary:
What we did above was use the command (function)
read.table(); its result (the data, read in a form
R can understand) is stored in a container called dat.in.
<- is called the assignment operator (it looks like an arrow). The result
of read.table() is assigned to dat.in.
You can use any name for this storage container (I chose
dat.in). R can also import multiple files. For example, if
you have two files to read in, you can store one data set in
simDat and the other in obsDat.
This storage container for spreadsheet-like data is called a
data frame, which we will discuss further in the next section.
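The import-and-check steps above can be tried without a spreadsheet. This sketch writes two small tab-delimited files to R's temporary directory (the file names sim.txt and obs.txt and their contents are made up for illustration), reads them into the two containers simDat and obsDat with read.table(), and checks each import with names(), dim() and class().

```r
# Hypothetical example files, written to R's temporary directory.
sim.file <- file.path(tempdir(), "sim.txt")
obs.file <- file.path(tempdir(), "obs.txt")
writeLines(c("time\tvalue", "1\t0.5", "2\t0.9"), sim.file)
writeLines(c("time\tvalue", "1\t0.4", "2\t1.1", "3\t1.3"), obs.file)

# Read each file and assign (<-) the result to its own container.
simDat <- read.table(sim.file, header = TRUE, sep = "\t")
obsDat <- read.table(obs.file, header = TRUE, sep = "\t")

# Check the imports.
print(names(simDat))  # "time"  "value"
print(dim(simDat))    # 2 2  (rows, columns)
print(dim(obsDat))    # 3 2
print(class(obsDat))  # "data.frame"
```

Each call to read.table() is independent, so reading more files is just a matter of choosing another container name for each one.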
Download an example
data file here, open it in OpenOffice.org
or Excel, and go through the process.