Factors
Factors are used to store categorical variables, that is, variables whose value can only be one of a well-defined, discrete set of values. For example, factor_gender is a factor whose elements can only take the values "male" and "female".
To construct a factor variable out of a vector of values, just wrap the vector using factor(). For example:
> gender_vector = c("Male", "Female", "Female", "Male", "Male")
> factor_gender_vector = factor(gender_vector)
> factor_gender_vector
[1] Male   Female Female Male   Male
Levels: Female Male
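To see why the output lists "Levels: Female Male" even though "Male" appears first in the data, it can help to look inside the factor. The following is a minimal sketch using the base R functions levels(), nlevels() and as.integer(); it assumes the default behaviour of factor(), which sorts the distinct values to form the levels.

```r
gender_vector = c("Male", "Female", "Female", "Male", "Male")
factor_gender_vector = factor(gender_vector)

# The distinct categories; by default they are sorted, not kept in data order
levels(factor_gender_vector)      # "Female" "Male"

# Number of categories
nlevels(factor_gender_vector)     # 2

# Underlying integer codes: each element's position in levels()
as.integer(factor_gender_vector)  # 2 1 1 2 2
```

So a factor is essentially a vector of integer codes plus a lookup table of level names.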
Categorical variables are of two types: nominal and ordinal. factor_gender would be nominal, as there is no grading from lower to higher between male and female unless you are a sexist asshole. factor_bondratings would be ordinal, as there is a natural grading, where we know:
AAA > AA > A > BBB > BB > B > CCC > CC > C > D
In R, factor() creates a nominal factor by default. To create an ordinal factor, pass ordered = TRUE (R's partial argument matching also accepts the abbreviation order = TRUE) together with levels listed from lowest to highest:
> temperature_vector = c("High","Low","High","Low","Medium")
> factor_temperature_vector = factor(temperature_vector, order=TRUE, levels=c("Low","Medium","High"))
> factor_temperature_vector
[1] High   Low    High   Low    Medium
Levels: Low < Medium < High
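The same pattern applies to the bond ratings mentioned earlier. Here is a small sketch, assuming a made-up sample of ratings (bondratings_vector is invented for illustration) and, for brevity, only the four highest grades of the scale.

```r
# Hypothetical sample of ratings, invented for illustration
bondratings_vector = c("A", "AAA", "BBB", "AA", "AAA")

# Levels are listed from lowest to highest
factor_bondratings = factor(bondratings_vector, ordered = TRUE,
                            levels = c("BBB", "A", "AA", "AAA"))

is.ordered(factor_bondratings)                 # TRUE
is.ordered(factor(bondratings_vector))         # FALSE: nominal by default

# AAA (element 2) ranks above A (element 1)
factor_bondratings[2] > factor_bondratings[1]  # TRUE

# The highest rating present in the sample
as.character(max(factor_bondratings))          # "AAA"
```

Listing the levels explicitly matters here: left to its default, R would sort them alphabetically, which is not the rating order.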
Renaming the elements of a factor variable
Use the levels() function to do this.
> survey_vector = c("M", "F", "F", "M", "M")
> factor_survey_vector = factor(survey_vector)
> factor_survey_vector
[1] M F F M M
Levels: F M
> levels(factor_survey_vector) = c("Female", "Male")
> factor_survey_vector
[1] Male   Female Female Male   Male
Levels: Female Male
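Because levels() assigns the new names purely by position, one way to make the pairing explicit is to give factor() both the levels and the labels arguments when the factor is built. This is an alternative sketch, not the approach used above; it relies only on the documented levels and labels arguments of factor().

```r
survey_vector = c("M", "F", "F", "M", "M")

# Each entry in labels is paired with the entry at the same position in levels,
# so "F" becomes "Female" and "M" becomes "Male"
factor_survey_vector = factor(survey_vector,
                              levels = c("F", "M"),
                              labels = c("Female", "Male"))

as.character(factor_survey_vector)
# "Male" "Female" "Female" "Male" "Male"
```

With the old and new names written side by side, a mix-up in the order is easier to spot.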
Note that it is important to follow the correct order while naming. Using levels(factor_survey_vector) = c("Male", "Female") would have been incorrect, since I had run the code earlier and seen the unnamed output "Levels: F M", so the first new name replaces F and the second replaces M.
Using summary()
summary() is a general R function but it's very useful with factors. For example:
> summary(factor_survey_vector)
Female   Male
     2      3
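To see what the factor adds here, compare summary() on the raw character vector with summary() on the factor. A short sketch using the original, unrenamed survey data:

```r
survey_vector = c("M", "F", "F", "M", "M")

# On a character vector, summary() only reports length, class and mode
char_summary = summary(survey_vector)
names(char_summary)            # "Length" "Class" "Mode"

# On a factor, summary() counts the observations in each level
factor_summary = summary(factor(survey_vector))
factor_summary                 # F: 2, M: 3

# table() gives the same counts and also works on the character vector
table(survey_vector)
```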
If a factor is nominal, comparison operators such as > are not meaningful: R issues a warning and returns NA. See the following code (a continuation of the session above) for my favorite proof of the equality of the sexes:
> # Battle of the sexes:
> # Male
> factor_survey_vector[1]
[1] Male
Levels: Female Male
> # Female
> factor_survey_vector[2]
[1] Female
Levels: Female Male
> # Male larger than female?
> factor_survey_vector[1] > factor_survey_vector[2]
'>' not meaningful for factors
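The message above is a warning rather than an error, so the comparison still returns a value. The sketch below captures that value with suppressWarnings() and also shows that equality tests remain valid for nominal factors.

```r
factor_survey_vector = factor(c("M", "F", "F", "M", "M"))
levels(factor_survey_vector) = c("Female", "Male")

# Ordering comparisons on a nominal factor yield NA (plus a warning)
result = suppressWarnings(factor_survey_vector[1] > factor_survey_vector[2])
is.na(result)                                       # TRUE

# Equality is still well defined
factor_survey_vector[1] == factor_survey_vector[2]  # FALSE
factor_survey_vector[1] == "Male"                   # TRUE
```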
Comparison operators are meaningful for ordinal categorical variables. See:
> speed_vector = c("Fast", "Slow", "Slow", "Fast", "Ultra-fast")
> # Add your code below
> factor_speed_vector = factor(speed_vector, order = TRUE, levels = c("Slow", "Fast", "Ultra-fast"))
> # Print
> factor_speed_vector
[1] Fast       Slow       Slow       Fast       Ultra-fast
Levels: Slow < Fast < Ultra-fast
> # R prints automagically in the right order
> summary(factor_speed_vector)
      Slow       Fast Ultra-fast
         2          2          1
> compare_them = factor_speed_vector[2] > factor_speed_vector[5]
> # Is data analyst 2 faster than data analyst 5?
> compare_them
[1] FALSE
So Analyst 2 is not faster than Analyst 5.
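Once a factor is ordered, the same comparison works on the whole vector at once, which makes filtering and sorting straightforward. A brief sketch on the same speed data:

```r
speed_vector = c("Fast", "Slow", "Slow", "Fast", "Ultra-fast")
factor_speed_vector = factor(speed_vector, ordered = TRUE,
                             levels = c("Slow", "Fast", "Ultra-fast"))

# Which analysts are at least "Fast"? The string is matched against the levels
at_least_fast = factor_speed_vector >= "Fast"
at_least_fast                  # TRUE FALSE FALSE TRUE TRUE

# Position of the fastest analyst
fastest = which(factor_speed_vector == max(factor_speed_vector))
fastest                        # 5

# Sorting follows the level order, not alphabetical order
sorted_speeds = as.character(sort(factor_speed_vector))
sorted_speeds                  # "Slow" "Slow" "Fast" "Fast" "Ultra-fast"
```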