Statisticians use the term “factors” to describe categorical variables, or enums. They are so essential that, before version 4.0.0, R coerced all character strings to factors by default when building a data frame.
Why do we need factor variables to begin with? Because of modeling functions like “lm()” and “glm()”. Modeling functions need to expand categorical variables into individual dummy variables, so that a categorical variable with 5 levels will be expanded into 4 different columns in your model matrix (with the default contrasts and an intercept, the remaining level serves as the baseline). There’s no way for R to know it should do this unless it has some extra information in the form of the factor class. From this point of view, setting “stringsAsFactors = TRUE” when reading in tabular data makes total sense. If the data is just going to go into a regression model, then R is doing the right thing.
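To make the expansion concrete, here is a minimal sketch using “model.matrix()”, which builds the same design matrix that “lm()” constructs internally. The data frame, the column names “group” and “y”, and the level labels are all made up for illustration, and the sketch assumes R's default treatment contrasts with an intercept in the model.

```r
# Hypothetical data: one categorical variable with 5 levels, one numeric response.
set.seed(1)
df <- data.frame(
  group = factor(rep(c("a", "b", "c", "d", "e"), each = 4)),
  y     = rnorm(20)
)

# Build the design matrix for the formula y ~ group.
X <- model.matrix(y ~ group, data = df)

nlevels(df$group)  # 5 levels in the factor
colnames(X)        # "(Intercept)" plus one dummy column per non-baseline level
ncol(X) - 1        # 4 dummy columns; level "a" is absorbed into the intercept
```

The first level, “a”, gets no column of its own because it acts as the baseline that the other four dummy columns are compared against.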