R Club

Data Types

posted by: Julin Maloof

This document provides a brief introduction to Data Types: how data is represented in R. A separate document discusses Data Structures: the kinds of objects that data can be stored in.

R classifies data into several types: numeric, character, factor, logical (Boolean), date, and others.
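As a quick illustration, the sketch below calls class() on one value of each type listed above. The variable names and the date are made up for this example.

```r
# one example value per type (names and values are illustrative only)
a.number <- 3.14
a.string <- "three"
a.factor <- factor("wildtype")
a.logical <- TRUE
a.date <- as.Date("2014-01-15")  # hypothetical date in the default YYYY-MM-DD format

class(a.number)
## [1] "numeric"
class(a.string)
## [1] "character"
class(a.factor)
## [1] "factor"
class(a.logical)
## [1] "logical"
class(a.date)
## [1] "Date"
```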

Numeric

The numeric data type is for numbers, which can be used in calculations. Some relevant functions are:

my.numbers <- c(4, 5)  # a numeric vector
sum(my.numbers)
## [1] 9
is.numeric(my.numbers)
## [1] TRUE
my.text.numbers <- c("4", "5")  # a character vector
sum(my.text.numbers)
## Error: invalid 'type' (character) of argument
is.numeric(my.text.numbers)
## [1] FALSE
as.numeric(my.text.numbers)
## [1] 4 5
sum(as.numeric(my.text.numbers))
## [1] 9
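The conversion above works because every element looks like a number. The sketch below shows what happens when one does not: as.numeric() returns NA for that element and issues a warning (suppressed here so the block runs quietly). The vector mixed.text is a hypothetical input.

```r
mixed.text <- c("4", "5", "five")  # hypothetical input with one non-number
# as.numeric() would warn that NAs were introduced by coercion
converted <- suppressWarnings(as.numeric(mixed.text))
converted
## [1]  4  5 NA
sum(converted)  # any NA makes the sum NA
## [1] NA
sum(converted, na.rm = TRUE)  # drop the NA before summing
## [1] 9
```

Checking for NA values with is.na() after a conversion is a simple way to catch text that was not really numeric.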

Character

The character data type represents text. You can recognize character data in R because the values are enclosed in quotes.

my.text <- c("Julin", "Maloof")
my.text
## [1] "Julin"  "Maloof"
is.character(my.text)
## [1] TRUE
# converting numbers to characters
my.numbers
## [1] 4 5
as.character(my.numbers)  # note the quotation marks
## [1] "4" "5"
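A few base R functions that operate on character data are sketched below, as a small addition to the examples above. The vector is recreated so the block runs on its own.

```r
my.text <- c("Julin", "Maloof")
nchar(my.text)  # number of characters in each element
## [1] 5 6
toupper(my.text)
## [1] "JULIN"  "MALOOF"
paste(my.text, collapse = " ")  # join the elements into a single string
## [1] "Julin Maloof"
# the quotes decide the type, even when the content looks like a number
class("4")
## [1] "character"
class(4)
## [1] "numeric"
```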

Factor

The factor data type is used to designate groups. Each factor has a set of levels, one for each possible group.

genotypes <- factor(c("wildtype", "mutant1", "mutant2", "wildtype"))
genotypes  #note that the levels are listed and the values are not in quotes
## [1] wildtype mutant1  mutant2  wildtype
## Levels: mutant1 mutant2 wildtype
class(genotypes)
## [1] "factor"
is.factor(genotypes)
## [1] TRUE
is.character(genotypes)
## [1] FALSE
as.character(genotypes)  # note the quotes
## [1] "wildtype" "mutant1"  "mutant2"  "wildtype"
levels(genotypes)  # alphabetical by default
## [1] "mutant1"  "mutant2"  "wildtype"
nlevels(genotypes)
## [1] 3
# often you want wildtype to be the first level
genotypes <- relevel(genotypes, ref = "wildtype")
levels(genotypes)
## [1] "wildtype" "mutant1"  "mutant2"
# or maybe you want to have a custom order for everything
genotypes <- factor(genotypes, levels = c("mutant2", "wildtype", "mutant1"))
genotypes
## [1] wildtype mutant1  mutant2  wildtype
## Levels: mutant2 wildtype mutant1
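One conversion that often causes confusion is turning a factor whose labels look like numbers back into numeric data. The sketch below uses a hypothetical factor named doses. Calling as.numeric() directly returns the internal level codes, so the factor is converted to character first.

```r
# hypothetical factor whose labels look like numbers
doses <- factor(c("10", "50", "10", "100"))
levels(doses)  # sorted as text, not by numeric value
## [1] "10"  "100" "50"
as.numeric(doses)  # the internal integer codes, not the doses
## [1] 1 3 1 2
as.numeric(as.character(doses))  # convert to text first to recover the values
## [1]  10  50  10 100
```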

Logical

The logical data type is used for true/false values.

my.boolean <- c(F, T, T, F)  # T and F are shorthand for TRUE and FALSE; the full names are safer because T and F can be reassigned
my.boolean
## [1] FALSE  TRUE  TRUE FALSE
is.logical(my.boolean)
## [1] TRUE
# The '!' reverses the values (logical NOT)
!my.boolean
## [1]  TRUE FALSE FALSE  TRUE
# logicals can be used in extraction
genotypes[my.boolean]
## [1] mutant1 mutant2
## Levels: mutant2 wildtype mutant1
genotypes[genotypes == "wildtype"]  # the comparison inside the square brackets creates a logical vector
## [1] wildtype wildtype
## Levels: mutant2 wildtype mutant1
genotypes == "wildtype"
## [1]  TRUE FALSE FALSE  TRUE
# conversions:
as.numeric(my.boolean)  # TRUE becomes 1 and FALSE becomes 0, which is useful for counting (see sum() below)
## [1] 0 1 1 0
sum(genotypes == "wildtype")
## [1] 2
as.logical(c(1, 0, 1, 0))
## [1]  TRUE FALSE  TRUE FALSE
as.character(my.boolean)
## [1] "FALSE" "TRUE"  "TRUE"  "FALSE"
# converting from text
as.logical(c("T", "True", "true", "TRUE", "F", "False", "FALSE", "false", "not logical"))
## [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE    NA
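Because TRUE counts as 1 and FALSE as 0, a few other base R functions are handy for summarizing logical vectors. The sketch below recreates the genotype factor under the hypothetical name geno, with default level order, so that it runs on its own without overwriting genotypes.

```r
geno <- factor(c("wildtype", "mutant1", "mutant2", "wildtype"))
is.wt <- geno == "wildtype"
mean(is.wt)  # proportion of TRUE values
## [1] 0.5
which(is.wt)  # positions of the TRUE values
## [1] 1 4
any(is.wt)  # is at least one value TRUE?
## [1] TRUE
all(is.wt)  # are all values TRUE?
## [1] FALSE
# combine conditions with & (and) and | (or)
geno[geno == "mutant1" | geno == "mutant2"]
## [1] mutant1 mutant2
## Levels: mutant1 mutant2 wildtype
```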
