Data Types
This document provides a brief introduction to Data Types: how data is represented in R. A separate document discusses Data Structures: the kinds of objects that data can be stored in.
R categorizes data as different types: numeric, character, factor, boolean/logical, date, etc.
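The class() and typeof() functions can confirm how R treats a value. Below is a minimal sketch; the list name examples and its values are made up for illustration. Note that the two functions can disagree: a factor is stored as integers and a date as a number of days since 1970-01-01, so R treats factors and dates as classes built on top of the basic types.

```r
# One illustrative value of each kind listed above (names and values are made up)
examples <- list(
  numeric   = 3.14,
  character = "hello",
  factor    = factor("a"),
  logical   = TRUE,
  date      = as.Date("2024-01-15")
)
# class() reports how R treats each value; sapply() applies it to every element
sapply(examples, class)   # numeric, character, factor, logical, Date
# typeof() reports the underlying storage type
sapply(examples, typeof)  # double, character, integer, logical, double
```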
Numeric
The numeric data type is for numbers that you can calculate with. Some relevant functions are:
my.numbers <- c(4, 5) # a numeric vector
sum(my.numbers)
## [1] 9
is.numeric(my.numbers)
## [1] TRUE
my.text.numbers <- c("4", "5") # a character vector
sum(my.text.numbers)
## Error: invalid 'type' (character) of argument
is.numeric(my.text.numbers)
## [1] FALSE
as.numeric(my.text.numbers)
## [1] 4 5
sum(as.numeric(my.text.numbers))
## [1] 9
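as.numeric() only works when the text actually looks like a number. Here is a short sketch of what happens otherwise (the vector mixed is a made-up example): the entry that cannot be parsed becomes NA, and R normally warns "NAs introduced by coercion". suppressWarnings() is used only to keep the output quiet.

```r
# Text that cannot be read as a number becomes NA
mixed <- c("4", "5", "five")
converted <- suppressWarnings(as.numeric(mixed))
converted                       # 4 5 NA
# sum() returns NA if any value is missing, unless na.rm = TRUE
sum(converted)                  # NA
sum(converted, na.rm = TRUE)    # 9
```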
Character
The character data type is for representing text. You can tell that something is of character type in R because it is displayed in quotes when printed.
my.text <- c("Julin", "Maloof")
my.text
## [1] "Julin" "Maloof"
is.character(my.text)
## [1] TRUE
# converting numbers to characters
my.numbers
## [1] 4 5
as.character(my.numbers) # note the quotation marks
## [1] "4" "5"
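Character values cannot be used in arithmetic, but they can be handled with text functions. A minimal sketch using the base functions paste() and nchar(); the name mixed is illustrative. The last lines show a common surprise: c() forces all of its inputs to a single type, so mixing a number with text turns the number into text.

```r
my.text <- c("Julin", "Maloof")
# paste() joins character values; collapse = " " merges a vector into one string
paste(my.text, collapse = " ")   # "Julin Maloof"
# nchar() counts the characters in each element
nchar(my.text)                   # 5 6
# c() coerces mixed input to one type; here everything becomes character
mixed <- c(4, "five")
class(mixed)                     # "character"
mixed                            # "4" "five"
```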
Factor
The factor data type is used to designate groups. Each factor has a set of levels, one for each possible group.
genotypes <- factor(c("wildtype", "mutant1", "mutant2", "wildtype"))
genotypes #note that the levels are listed and the values are not in quotes
## [1] wildtype mutant1 mutant2 wildtype
## Levels: mutant1 mutant2 wildtype
class(genotypes)
## [1] "factor"
is.factor(genotypes)
## [1] TRUE
is.character(genotypes)
## [1] FALSE
as.character(genotypes) # note the quotes
## [1] "wildtype" "mutant1" "mutant2" "wildtype"
levels(genotypes) # alphabetical by default
## [1] "mutant1" "mutant2" "wildtype"
nlevels(genotypes)
## [1] 3
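Behind the scenes a factor is an integer vector of codes pointing into its levels. This sketch shows why that matters: as.numeric() on a factor returns the codes, not the values, which is a frequent trap when numbers have been read in as a factor. The doses vector is a made-up example; converting through as.character() first recovers the actual numbers.

```r
genotypes <- factor(c("wildtype", "mutant1", "mutant2", "wildtype"))
# internally a factor is stored as integer level codes
typeof(genotypes)        # "integer"
as.integer(genotypes)    # 3 1 2 3 (positions in levels mutant1, mutant2, wildtype)
# pitfall: a factor of numbers converts to codes, not values
doses <- factor(c("10", "5", "20"))  # levels sort as text: "10" "20" "5"
as.numeric(doses)                    # 1 3 2
as.numeric(as.character(doses))      # 10 5 20
```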
# often you want wildtype to be the first level
genotypes <- relevel(genotypes, ref = "wildtype")
levels(genotypes)
## [1] "wildtype" "mutant1" "mutant2"
# or maybe you want to have a custom order for everything
genotypes <- factor(genotypes, levels = c("mutant2", "wildtype", "mutant1"))
genotypes
## [1] wildtype mutant1 mutant2 wildtype
## Levels: mutant2 wildtype mutant1
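Level order is not only cosmetic: functions such as table() report groups in level order. A minimal sketch with the same genotypes factor; the mutant3 value and the name wt.only are illustrative. It also shows that values not listed in levels become NA, and that subsetting keeps unused levels until droplevels() removes them.

```r
genotypes <- factor(c("wildtype", "mutant1", "mutant2", "wildtype"),
                    levels = c("mutant2", "wildtype", "mutant1"))
# table() counts observations per level, in level order
table(genotypes)             # mutant2 1, wildtype 2, mutant1 1
# values that are not among the levels become NA
factor(c("wildtype", "mutant3"), levels = c("wildtype", "mutant1"))  # wildtype <NA>
# subsetting keeps all levels; droplevels() removes the unused ones
wt.only <- genotypes[genotypes == "wildtype"]
levels(wt.only)              # "mutant2" "wildtype" "mutant1"
levels(droplevels(wt.only))  # "wildtype"
```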
Logical
The logical data type is used for TRUE/FALSE values.
my.boolean <- c(F, T, T, F) # T and F are shorthand; TRUE and FALSE are safer because T and F can be reassigned
my.boolean
## [1] FALSE TRUE TRUE FALSE
is.logical(my.boolean)
## [1] TRUE
# The '!' reverses the values (logical NOT)
!my.boolean
## [1] TRUE FALSE FALSE TRUE
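Besides !, logical vectors can be combined element by element with & (AND) and | (OR), and summarized with any() and all(). A short sketch; the vector other is a made-up second example.

```r
my.boolean <- c(FALSE, TRUE, TRUE, FALSE)
other <- c(TRUE, TRUE, FALSE, FALSE)  # a second, made-up logical vector
my.boolean & other   # FALSE TRUE FALSE FALSE (TRUE only where both are TRUE)
my.boolean | other   # TRUE TRUE TRUE FALSE (TRUE where either is TRUE)
any(my.boolean)      # TRUE, at least one value is TRUE
all(my.boolean)      # FALSE, not every value is TRUE
```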
# logicals can be used in extraction
genotypes[my.boolean]
## [1] mutant1 mutant2
## Levels: mutant2 wildtype mutant1
genotypes[genotypes == "wildtype"] # here you are creating a logical vector inside the square brackets
## [1] wildtype wildtype
## Levels: mutant2 wildtype mutant1
genotypes == "wildtype"
## [1] TRUE FALSE FALSE TRUE
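Two related helpers are often useful when building these logical tests. This sketch uses the base functions which(), which converts a logical vector into the positions that are TRUE, and %in%, which tests each element for membership in a set of values. The name is.mutant is illustrative.

```r
genotypes <- factor(c("wildtype", "mutant1", "mutant2", "wildtype"),
                    levels = c("mutant2", "wildtype", "mutant1"))
# which() returns the positions where the condition is TRUE
which(genotypes == "wildtype")          # 1 4
# %in% checks each element against a set of values
is.mutant <- genotypes %in% c("mutant1", "mutant2")
is.mutant                               # FALSE TRUE TRUE FALSE
genotypes[is.mutant]                    # mutant1 mutant2 (all levels are kept)
```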
# conversions:
as.numeric(my.boolean) # TRUE becomes 1 and FALSE becomes 0, which makes it easy to count TRUE values with sum()
## [1] 0 1 1 0
sum(genotypes == "wildtype")
## [1] 2
as.logical(c(1, 0, 1, 0))
## [1] TRUE FALSE TRUE FALSE
as.character(my.boolean)
## [1] "FALSE" "TRUE" "TRUE" "FALSE"
# converting from text
as.logical(c("T", "True", "true", "TRUE", "F", "False", "FALSE", "false", "not logical"))
## [1] TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE NA
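Text that is not recognized becomes NA, and that NA then affects any counting. A minimal sketch (the name flags is illustrative): sum() and mean() return NA when a value is missing unless na.rm = TRUE is given, and mean() of a logical vector is the proportion of TRUE values.

```r
flags <- as.logical(c("TRUE", "false", "maybe"))
flags                        # TRUE FALSE NA
sum(flags)                   # NA, because one value is missing
sum(flags, na.rm = TRUE)     # 1, the number of TRUE values
mean(flags, na.rm = TRUE)    # 0.5, the proportion of TRUE among non-missing values
```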
Posted by Julin Maloof