**Factors**

Factors store categorical variables: variables whose value can only be one of a well-defined, discrete set of values. For example, `factor_gender` is a factor whose elements can only be "male" or "female".

**To construct a factor variable** out of a vector of values, just wrap the vector in `factor()`. For example:

```
> gender_vector = c("Male", "Female", "Female", "Male", "Male")
> factor_gender_vector = factor(gender_vector)
> factor_gender_vector
[1] Male   Female Female Male   Male
Levels: Female Male
```

Categorical variables are of two types: nominal and ordinal. `factor_gender` would be nominal, as there is no grading from lower to higher between male and female. `factor_bondratings` would be ordinal, as there is a natural grading, where we know:

```
AAA > AA > A > BBB > BB > B > CCC > CC > C > D
```
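As a quick check on the gender example, the sketch below inspects what `factor()` stores. It rebuilds the same vector; the inspection calls (`is.ordered()`, `levels()`, `as.integer()`, `nlevels()`) are base R functions that the post itself does not use.

```r
# Same data as the gender example in the post
gender_vector <- c("Male", "Female", "Female", "Male", "Male")
factor_gender_vector <- factor(gender_vector)

# A plain factor() call yields an unordered (nominal) factor
is.ordered(factor_gender_vector)    # FALSE

# Levels are the distinct values, sorted alphabetically
levels(factor_gender_vector)        # "Female" "Male"
nlevels(factor_gender_vector)       # 2

# Internally each element is the position of its value in levels()
as.integer(factor_gender_vector)    # 2 1 1 2 2
```

The alphabetical level order is why the printed output reads `Levels: Female Male` even though "Male" comes first in the data.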

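By default, R treats a factor as nominal. To make it ordinal, pass `levels` (listed from lowest to highest) and `ordered = TRUE` to `factor()`. The examples here abbreviate the latter to `order = TRUE`, which works because R partially matches argument names: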

```
> temperature_vector = c("High", "Low", "High", "Low", "Medium")
> factor_temperature_vector = factor(temperature_vector, order = TRUE, levels = c("Low", "Medium", "High"))
> factor_temperature_vector
[1] High   Low    High   Low    Medium
Levels: Low < Medium < High
```
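To see what the level order changes in practice, here is a small sketch, assuming the same temperature data, that compares sorting the raw strings with sorting the ordered factor. It spells the argument out as `ordered = TRUE`.

```r
temperature_vector <- c("High", "Low", "High", "Low", "Medium")
factor_temperature_vector <- factor(temperature_vector,
                                    ordered = TRUE,
                                    levels = c("Low", "Medium", "High"))

# Character vectors sort alphabetically
sort(temperature_vector)                # "High" "High" "Low" "Low" "Medium"

# Ordered factors sort by their position in levels
sort(factor_temperature_vector)         # Low Low Medium High High

# The integer codes reflect the declared order: Low = 1, Medium = 2, High = 3
as.integer(factor_temperature_vector)   # 3 1 3 1 2
```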

**Renaming the elements** of a factor variable: use the `levels()` function to do this.

```
> survey_vector = c("M", "F", "F", "M", "M")
> factor_survey_vector = factor(survey_vector)
> factor_survey_vector
[1] M F F M M
Levels: F M
> levels(factor_survey_vector) = c("Female", "Male")
> factor_survey_vector
[1] Male   Female Female Male   Male
Levels: Female Male
```
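Positional renaming relies on knowing the order R chose for the existing levels. A sketch of two alternatives that state the mapping explicitly, offered as an option rather than as the approach the post uses: the `labels` argument of `factor()`, and the named-list form of `levels<-`.

```r
survey_vector <- c("M", "F", "F", "M", "M")

# Option 1: pair each raw value with its label when building the factor
factor_survey_vector <- factor(survey_vector,
                               levels = c("F", "M"),
                               labels = c("Female", "Male"))
factor_survey_vector

# Option 2: rename afterwards with a named list (new name = old level),
# so the result does not depend on the order of the existing levels
renamed_vector <- factor(survey_vector)
levels(renamed_vector) <- list(Female = "F", Male = "M")
renamed_vector
```

Both variants should print the same elements and levels as the positional version.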

Note that the new names must follow the order of the existing levels. Since I had run the code earlier and seen the unrenamed output `Levels: F M`, the correct assignment is `levels(factor_survey_vector) = c("Female", "Male")`. Using `c("Male", "Female")` would have been incorrect: it would silently swap the labels.

**Using summary()** `summary()` is a general R function, but it is very useful with factors. For example:

```
> summary(factor_survey_vector)
Female   Male
     2      3
```
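To show why `summary()` is more informative on a factor, the sketch below calls it on both the raw character vector and the factor. The factor is rebuilt here with `labels` so the block runs on its own; `table()` is added as a comparison and is not used in the post.

```r
survey_vector <- c("M", "F", "F", "M", "M")
factor_survey_vector <- factor(survey_vector,
                               levels = c("F", "M"),
                               labels = c("Female", "Male"))

# On a character vector, summary() only reports length, class and mode
summary(survey_vector)

# On a factor, summary() counts the elements in each level
level_counts <- summary(factor_survey_vector)
level_counts                   # Female 2, Male 3

# table() gives the same counts as a table object
table(factor_survey_vector)
```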

If a factor is nominal, the comparison operator `>` is not meaningful: R issues a warning and returns `NA`. See the following (continuation) code for my favorite proof of the equality of the sexes:

```
> # Battle of the sexes:
> # Male
> factor_survey_vector[1]
[1] Male
Levels: Female Male
> # Female
> factor_survey_vector[2]
[1] Female
Levels: Female Male
> # Male larger than female?
> factor_survey_vector[1] > factor_survey_vector[2]
'>' not meaningful for factors
```
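A short sketch of which comparisons a nominal factor does support, assuming the renamed survey factor from above. `suppressWarnings()` is used only to capture the result of `>` without printing the warning.

```r
factor_survey_vector <- factor(c("Male", "Female", "Female", "Male", "Male"))

# Equality tests are still defined for nominal factors
factor_survey_vector[1] == factor_survey_vector[2]   # FALSE (Male vs Female)
factor_survey_vector[1] == factor_survey_vector[4]   # TRUE  (Male vs Male)

# Ordering tests are not: the result is NA, along with a warning
comparison <- suppressWarnings(factor_survey_vector[1] > factor_survey_vector[2])
is.na(comparison)                                     # TRUE
```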

Comparison operators are meaningful for ordinal categorical variables. See:

```
> speed_vector = c("Fast", "Slow", "Slow", "Fast", "Ultra-fast")
> # Convert speed_vector to an ordered factor
> factor_speed_vector = factor(speed_vector, order = TRUE, levels = c("Slow", "Fast", "Ultra-fast"))
> # Print
> factor_speed_vector
[1] Fast       Slow       Slow       Fast       Ultra-fast
Levels: Slow < Fast < Ultra-fast
> # R prints automagically in the right order
> summary(factor_speed_vector)
      Slow       Fast Ultra-fast
         2          2          1
> compare_them = factor_speed_vector[2] > factor_speed_vector[5]
> # Is data analyst 2 faster than data analyst 5?
> compare_them
[1] FALSE
```

So Analyst 2 is not faster than Analyst 5.
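The same idea can be sketched for the `factor_bondratings` example mentioned earlier. The sample ratings below are made up for illustration, and the level list is an assumption based on the rating scale quoted earlier (written from lowest to highest, with a B tier between CCC and BB).

```r
# Assumed rating scale, lowest to highest
rating_levels <- c("D", "C", "CC", "CCC", "B", "BB", "BBB", "A", "AA", "AAA")

# Hypothetical sample of bond ratings
bond_ratings <- c("AA", "BBB", "AAA", "D", "BB")
factor_bondratings <- factor(bond_ratings, ordered = TRUE, levels = rating_levels)

# Element-wise comparison between two bonds
factor_bondratings[1] > factor_bondratings[2]   # TRUE: AA ranks above BBB

# Comparison against a single level, e.g. "rated BBB or better"
factor_bondratings >= "BBB"                      # TRUE TRUE TRUE FALSE FALSE

# max() and min() also respect the level order
max(factor_bondratings)                          # AAA
```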
