Recoding data using catToFactor function

Catriona · February 8, 2024, 10:43am

Hey!

I am analysing the Network Canvas data for my project. There are a number of variables that I need to recode so that each becomes a factor variable in a single column, for example ethnicity or gender.

The code (as printed on this site) ran without errors. I defined the catToFactor function, created the variable list and applied the function to it. This created columns with the relevant factor structure (e.g. a “Gender” column with levels 0 male, 1 female, 2 transgender etc.), but the data has not been pulled through for each case: it simply records an NA for each person.

Is there an update to this code, or is there something I have done wrong?

Pat · February 8, 2024, 3:34pm

Hi Catriona,

Yes, apologies! Below is the updated code, which we’ll be correcting on our documentation site:

catToFactor <- function(dataframe, variableName) {
    # Columns for this variable are expected to be named <variableName>_<value>
    fullVariableName <- paste0(variableName, "_")
    catVariables <- grep(fullVariableName, names(dataframe), value = TRUE)
    # Check if variable exists
    if (identical(catVariables, character(0))) {
      stop(paste0("Cannot find variable named -", variableName, "- in the data"))
    # Check if "TRUE" in multiple columns of a single row
    } else if (sum(apply(dataframe[, catVariables, drop = FALSE], 1, function(x) sum(x %in% "TRUE") > 1)) > 0) {
      stop(paste0("Your variable -", variableName, "- appears to take multiple values."))
    }
    # Strip the prefix to recover the category values
    catValues <- sub(paste0(".*", fullVariableName), "", catVariables)
    # Start with NA for every row; rows with no selected category stay NA
    factorVariable <- rep(NA_character_, nrow(dataframe))
    for (i in seq_along(catVariables)) {
      factorVariable[dataframe[[catVariables[i]]] %in% "TRUE"] <- catValues[i]
    }
    return(factor(factorVariable, levels = catValues))
}
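
For anyone wondering why the case of the comparison string matters, here is a minimal sketch. It assumes the category columns are read into R as logical values, which the thread does not confirm. The `toy` data frame and its `gender_0` / `gender_1` columns are hypothetical stand-ins, not real export names.

```r
# Hypothetical toy data: one logical column per category (an assumption
# about how the export is read into R, not taken from real data)
toy <- data.frame(
  gender_0 = c(TRUE, FALSE, FALSE),
  gender_1 = c(FALSE, TRUE, FALSE)
)

# R coerces a logical TRUE to the string "TRUE" before comparing it with
# a character value, so a lowercase "true" never matches.
lower_match <- toy$gender_0 == "true"
upper_match <- toy$gender_0 == "TRUE"

# The same recoding idea as the loop in catToFactor, written out for toy
cat_values <- sub("gender_", "", names(toy))
gender <- rep(NA_character_, nrow(toy))
for (i in seq_along(cat_values)) {
  gender[toy[[i]] == "TRUE"] <- cat_values[i]
}
gender <- factor(gender, levels = cat_values)
print(gender)
```

Under that assumption, comparing against "true" leaves every row unmatched, which would produce a column of NAs, while "TRUE" picks up the selected category. The third row has no category selected, so it stays NA in either case.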

Best,
Pat

Catriona · February 8, 2024, 4:33pm

Brilliant - thank you Pat - issue resolved