Error message during Fresco import to R (data cleaning/analysis)

Hi all!

I am working on importing my team’s data into R using this tutorial. I kept running into this error message when trying to define the egor object (screenshot below):


This seemed to suggest a problem with those two columns, which initially confused me, since I’d seen that those columns were populated properly in the attributeList_Person data files. So I started comparing our exported data with the demo data, and found that we seem to have an extra type of file? (Side-by-side below)

It turns out those extra files only contain the headers mentioned in the error message and are otherwise blank. So I went ahead and deleted them, hoping that would resolve the error.

The same error still came up, so I started looking more closely at our exported data. I realized something weird is happening with one participant’s file cluster: the attributeList_Person file is missing completely, and the file they have instead is named just attributeList. Plus, that file is totally blank for some reason. Maybe it’s an export error, or the participant didn’t answer all the questions in the protocol?

Either way, I identified which participant it was and deleted their other blank file (edgeList). I tried again but still ran into the same error. Checking the other participants’ files didn’t turn up anything either: the columns named in the error message appear to be properly populated. Removing this participant altogether also didn’t change the error message.
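One way to find incomplete file clusters like this without opening every CSV is a small R check. This is a minimal sketch, not part of the tutorial: the helper name `find_incomplete_exports` is hypothetical, and it assumes each interview’s files share a common prefix and end in `_ego.csv`, `_attributeList_Person.csv` and `_edgeList_Connection.csv` (adjust the suffixes to your protocol’s node and edge types). It reports any expected file that is missing or has a header but no data rows.

```r
# Hypothetical helper (not part of egor or the tutorial): report expected
# export files that are missing or contain only a header row.
# Assumes every interview's files share a prefix ending just before "_ego.csv".
find_incomplete_exports <- function(folder,
                                    suffixes = c("_ego.csv",
                                                 "_attributeList_Person.csv",
                                                 "_edgeList_Connection.csv")) {
  egoFiles <- list.files(folder, pattern = "_ego\\.csv$")
  if (length(egoFiles) == 0) stop("No *_ego.csv files found in ", folder)
  prefixes <- sub("_ego\\.csv$", "", egoFiles)
  rows <- lapply(prefixes, function(prefix) {
    paths <- file.path(folder, paste0(prefix, suffixes))
    exists <- file.exists(paths)
    # Count lines in each existing file; a header-only file has 1 line
    nLines <- vapply(paths, function(f) {
      if (file.exists(f)) length(readLines(f, warn = FALSE)) else NA_integer_
    }, integer(1), USE.NAMES = FALSE)
    data.frame(prefix = prefix, file = suffixes, exists = exists,
               dataRows = pmax(nLines - 1L, 0L), stringsAsFactors = FALSE)
  })
  report <- do.call(rbind, rows)
  # Keep rows where the file is missing (dataRows is NA) or has no data rows
  report[!report$exists | report$dataRows == 0, , drop = FALSE]
}
```

For example, `find_incomplete_exports("path/to/your/export")` returns one row per problem file, so a participant whose person-attribute file is missing or blank shows up immediately.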

Perhaps I’m barking up the wrong tree with checking the exported files, but the code is identical to the tutorial, save for the segment I replaced when I was cross-comparing the data between my team and the tutorial:

  • tutorial: head(alterData[,c("nodeID","networkCanvasEgoUUID","networkCanvasUUID","Close","Drugs","Sex","Age")])
  • mine: head(alterData[,c("nodeID","networkCanvasEgoUUID","networkCanvasUUID","Connection")])

Any pointers would be greatly appreciated & I’m happy to clarify anything!

The “undefined columns selected” error usually means you’re trying to reference a column that is not in your data frame. I’d suggest looking both at 1) the names of the columns in that data frame (e.g., colnames(alterData)) and 2) looking at the entire data frame (e.g., head(alterData)) to troubleshoot.
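That advice can be turned into a check that names the missing columns directly instead of R’s generic message. A minimal sketch; the helper `select_columns` is hypothetical and not part of base R or the tutorial:

```r
# Hypothetical helper: subset columns, but fail with a message that lists
# exactly which requested columns are absent from the data frame.
select_columns <- function(df, cols) {
  missing <- setdiff(cols, colnames(df))
  if (length(missing) > 0) {
    stop("Columns not found: ", paste(missing, collapse = ", "),
         call. = FALSE)
  }
  # drop = FALSE keeps a data frame even when only one column is selected
  df[, cols, drop = FALSE]
}
```

For example, `head(select_columns(alterData, c("nodeID", "networkCanvasEgoUUID", "networkCanvasUUID", "Connection")))` would stop with an error naming `Connection` if that column isn’t in `alterData`.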

Best,
Pat

Were these interviews all exported at the same time, and by the same person? The reason I ask is that the suffix is added to the file name when you export normally, but it is removed if you select the “merge networks” export option. Is it possible someone used this option during export by mistake?

If not, and you’re able to send me your protocol file (joshua@northwestern.edu) and let me know which version of Interviewer you used and on which platform (Windows/Mac/iPad/Android), I will investigate whether there is an issue.

Hi Pat, thank you for the troubleshooting advice! Trying these made me realize that I was indeed referencing a column that didn’t exist, albeit a different one than the error was referencing (the nonexistent column was Connections, not nodeID or networkCanvasEgoUUID). Thank you again for your help!

Hi Joshua! Yes, these interviews were all exported at the same time by the same person. I don’t believe we selected the “merge networks” option for this particular export, though it may be relevant that we exported this from Fresco and not Interviewer?

Thanks for the reply. I think it’s definitely relevant that this was exported from Fresco, since it is so much newer. Are you able to confirm which version was used to export? It should appear as a column in the ego attribute files.

Please feel free to send over your protocol file to the email address in my last reply if you believe there is an issue with the data, and I’d be very happy to look into it for you.

The version is 1.0.0-beta.4, according to the ego file.

I’ll definitely be sending an email over just in case! Thanks so much 🙂

Hi again everyone, I’ve been able to get past defining the egor object but ran into a new issue with the start of the data visualization process. What is this error referring to?

At first, I thought it had to do with the recoding categorical variables step (converting variables to a single factor column), since the error mentions the cause is in “factor.” But skipping the recoding step still pulls up the same error, so that doesn’t appear to be the issue.
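Since the recode expects exactly one TRUE per row across a categorical variable’s option columns, one quick way to check the input before blaming `factor` is to count the TRUE values in each row. A minimal sketch: the helper name is hypothetical, `"relationship"` is just the example variable name from the protocol discussed later in this thread, and matching only columns that *start* with `<variable>_` is an assumption (the tutorial’s catToFactor matches the prefix anywhere in the name).

```r
# Hypothetical diagnostic: for a categorical variable exported as one
# TRUE/FALSE column per option (named <variable>_<option>), count how many
# options are TRUE in each row.
count_true_per_row <- function(df, variableName) {
  cols <- grep(paste0("^", variableName, "_"), names(df), value = TRUE)
  if (length(cols) == 0) stop("No columns start with '", variableName, "_'")
  # Compare as upper-case text so logical TRUE and the strings "true"/"TRUE" all count
  isTrue <- vapply(df[cols], function(x) toupper(as.character(x)) %in% "TRUE",
                   logical(nrow(df)))
  # vapply() returns a plain vector when there is one row; reshape to rows x options
  rowSums(matrix(isTrue, nrow = nrow(df)))
}
```

For example, `table(count_true_per_row(alterData, "relationship"))` shows how many rows have zero options selected (these become NA in the factor) and whether any row has more than one.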

Trying rlang::last_trace() yielded this:


Any pointers/advice would be greatly appreciated!!

It looks like there is something in the egoR object that isn’t working well with the conversion to a network object using the as_network function. This step is only necessary if you want an individual network from the egoR object. If that is the case, I’d start troubleshooting by taking a look at the egoR object (i.e., egorNetworkCanvas) to see if any aspect of the three data frames within that object look unusual.
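To make “anything unusual” concrete, a small base-R helper can tabulate each column’s class, its missing-value count, and whether it is entirely blank. This is a minimal sketch with a hypothetical helper name, not part of egor. It can be run on `alterData`, `egoData` and `edgelistData_Connection` before calling `egor()`, or, assuming the egor object exposes its tables as `$ego`, `$alter` and `$aatie` (as current egor versions do), on those directly.

```r
# Hypothetical helper: one row per column with its class, number of NAs,
# and whether every value is NA or an empty string.
summarise_columns <- function(df) {
  isBlank <- function(x) all(is.na(x) | trimws(as.character(x)) == "")
  data.frame(
    column   = names(df),
    class    = unname(vapply(df, function(x) class(x)[1], character(1))),
    nMissing = unname(vapply(df, function(x) sum(is.na(x)), integer(1))),
    allBlank = unname(vapply(df, isBlank, logical(1))),
    stringsAsFactors = FALSE
  )
}
```

For example, `summarise_columns(egoData)` would flag an ID column such as `networkCanvasCaseID` that is NA in every row, or a stray column with no values at all.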

Best,
Pat

As an additional suggestion, we found that there is a circumstance where you might have an empty column in your csv with the header “null”. We are working on a fix for this issue in the software itself, but on the off chance that it is causing your error please try manually deleting any columns called “null” in your data file.
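If you’d rather not edit the CSVs by hand, the same cleanup can be done in R after reading the files. A minimal sketch, assuming the empty column’s header is exactly `null`; since `read.csv()` renames repeated headers to `null.1`, `null.2`, and so on, the pattern covers those too. The helper name is hypothetical.

```r
# Hypothetical helper: drop columns headed "null" (and read.csv()'s
# de-duplicated variants "null.1", "null.2", ...), keeping everything else.
drop_null_columns <- function(df) {
  df[, !grepl("^null(\\.[0-9]+)?$", names(df)), drop = FALSE]
}
```

For example, `egoData <- drop_null_columns(egoData)` (and the same for the alter and edge data frames) before calling `egor()`.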


Hi Pat, thank you for this pointer! What would stand out as unusual in this instance? Here’s an example of what comes up when I look at the egoR object:

The only potential issue I can see is that the networkCanvasCaseID values appear to be NA. This was caused by a bug in Fresco 1.0.0-beta.4 that we fixed in 1.0.0-beta.5. There are a couple of things you could try to test whether this is the problem:

  1. If the interviews are still in your Fresco instance and haven’t been deleted, upgrade your instance and re-export them.

  2. If they’ve been deleted, you can try programmatically assigning values in your data cleaning process. This value would be the participant label or “Anonymous Participant.”

This may not be the cause of your issue, but it’s still a good idea to use the most up-to-date version.
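For the second option, here is a minimal sketch of filling the blanks programmatically. It assumes the ego data frame has `networkCanvasEgoUUID` and `networkCanvasCaseID` columns, and that you build a named vector `labels` (hypothetical) mapping each ego UUID to the participant label entered in Fresco; any ego without a label gets “Anonymous Participant”. The helper name is also hypothetical.

```r
# Hypothetical helper: replace NA or empty networkCanvasCaseID values with the
# participant label (looked up by ego UUID) or "Anonymous Participant".
fill_case_ids <- function(egoData, labels = character(0)) {
  ids <- as.character(egoData$networkCanvasCaseID)
  missing <- is.na(ids) | trimws(ids) == ""
  # Look up a label for every ego; unknown UUIDs give NA
  fromLabel <- unname(labels[as.character(egoData$networkCanvasEgoUUID)])
  ids[missing] <- ifelse(is.na(fromLabel[missing]),
                         "Anonymous Participant", fromLabel[missing])
  egoData$networkCanvasCaseID <- ids
  egoData
}
```

For example, `egoData <- fill_case_ids(egoData)` before calling `egor()` fills every blank case ID with “Anonymous Participant”.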


Thank you for this pointer Caden! (Meant to reply to you specifically, but it seems I clicked the wrong “reply” button!)

I’ve upgraded my instance and re-exported everything, but the networkCanvasCaseID column is still blank in every participant’s ego file except one. As you mentioned in your second suggestion, the value in that participant’s column comes from the “label” I’d given them in Fresco.

Regardless, I’ll go through our files to manually input a value in this column before retrying data visualization; will provide an update on the results!

Hi Caden! Unfortunately updating and re-exporting + manually editing the networkCanvasCaseID column is still running into the same error; screenshots of the error & subsequent attempts at troubleshooting below.

In addition to fixing the blank networkCanvasCaseID column, I removed a few extraneous blank columns as well in the hopes of getting the conversion to network object to work. Is there anything else I could adjust manually to help the conversion?
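One more thing worth checking before `as_network()` is whether every tie in the edge list points at an alter belonging to the same ego. A tie whose source or target UUID has no matching alter (for example, after rows or files were deleted by hand) is a plausible, though unconfirmed, cause of conversion errors. A minimal sketch using the ID column names from the tutorial’s `egor()` call; the helper name is hypothetical.

```r
# Hypothetical check: return edge rows whose source or target UUID does not
# match any alter of the same ego.
find_dangling_ties <- function(alterData, edgeData,
                               ego = "networkCanvasEgoUUID",
                               alter = "networkCanvasUUID",
                               source = "networkCanvasSourceUUID",
                               target = "networkCanvasTargetUUID") {
  # Build "ego|alter" keys so an alter only counts within its own ego network
  known <- paste(alterData[[ego]], alterData[[alter]], sep = "|")
  srcOk <- paste(edgeData[[ego]], edgeData[[source]], sep = "|") %in% known
  tgtOk <- paste(edgeData[[ego]], edgeData[[target]], sep = "|") %in% known
  edgeData[!(srcOk & tgtOk), , drop = FALSE]
}
```

For example, `find_dangling_ties(alterData, edgelistData_Connection)` should return zero rows if every tie resolves to an alter in the same interview.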

Thank you again for your continued help!!

Hi Danielle!

I unfortunately haven’t been able to reproduce the networkCanvasCaseID null issue on my end. Just to be thorough, would you please try one more time and send me a screenshot of the Ego file that shows the missing networkCanvasCaseID as well as the COMMIT_HASH column? Feel free to blur out any other columns if there is sensitive data.

If we still can’t reproduce after that, would you be willing to hop on a quick call with me at your convenience to help get to the bottom of the issue?

As for the data analysis issues, we are still trying to troubleshoot on our end and will get back to you as soon as possible.

Thanks,
Caden

Just to add that I ran through your protocol, exported data, and then ran through the data tutorial without errors just now. This all but confirms that any remaining issues you have are either to do with not having upgraded to beta 5, or having not modified some aspect of the tutorial’s example code correctly to suit your data.

If the latter is the case, make sure you consult the documentation for the egor and ggplot packages. I’ve provided the full code I ran, including modifications based on your protocol, below.

```r
# Load packages. %>% and bind_rows() come from dplyr, which is loaded
# explicitly here in case it isn't already attached by egor.
library(dplyr)
library(egor)
library(sna)
library(ggplot2)

# Remember to update this path!
folderPath <- paste0('~/Desktop/daniell2-export/')

# Read each type of file into a list, then combine into a single data frame
alterData <- folderPath %>%
  list.files(full.names=TRUE,pattern="attributeList_Person.csv") %>%
  lapply(read.csv) %>%
  bind_rows()

edgelistData_Connection <- folderPath %>%
  list.files(full.names=TRUE,pattern="edgeList_Connection.csv") %>%
  lapply(read.csv) %>%
  bind_rows()

egoData <- folderPath %>%
  list.files(full.names=TRUE,pattern="ego.csv") %>%
  lapply(read.csv) %>%
  bind_rows()

# Convert a set of TRUE/FALSE option columns (named <variable>_<option>)
# into a single factor column
catToFactor <- function(dataframe,variableName) {
    fullVariableName <- paste0(variableName,"_")
    catVariables <- grep(fullVariableName, names(dataframe), value=TRUE)
    # Check if variable exists
    if (length(catVariables) == 0) {
      stop(paste0("Cannot find variable named -",variableName,"- in the data"))
    }
    # read.csv() may yield logical TRUE/FALSE or the strings "true"/"false",
    # so compare case-insensitively as text (rows x options matrix)
    isTrue <- sapply(dataframe[catVariables],
                     function(x) toupper(as.character(x)) %in% "TRUE")
    isTrue <- matrix(isTrue, nrow = nrow(dataframe))
    # Check if "true" in multiple columns of a single row
    if (any(rowSums(isTrue) > 1)) {
      stop(paste0("Your variable -",variableName,"- appears to take multiple values."))
    }
    catValues <- sub(paste0('.*',fullVariableName), '', catVariables)
    # Start with one NA per row so the result always matches nrow(dataframe)
    factorVariable <- rep(NA_character_, nrow(dataframe))
    for (i in seq_along(catVariables)) {
      factorVariable[isTrue[, i]] <- catValues[i]
    }
    return(factor(factorVariable,levels=catValues))
}

# List of categorical variables in our protocol to convert into factors
categoricalVariablesList <- list('relationship')

# Iterate the list and call our catToFactor function, assigning the result to a new column in our dataframe
for (variable in categoricalVariablesList) {
  alterData[variable] <- catToFactor(alterData, variable)
}

# Load the file into R
egorNetworkCanvas <- egor(alters = alterData,
           egos = egoData,
           aaties = edgelistData_Connection,
           ID.vars = list(
             ego = "networkCanvasEgoUUID",
             alter = "networkCanvasUUID",
             source = "networkCanvasSourceUUID",
             target = "networkCanvasTargetUUID"))

# Convert to a list of network objects (one per ego) and take the first one
oneEgoNet <- as_network(egorNetworkCanvas)[[1]]
oneEgoNet%v%"vertex.names" <- oneEgoNet%v%"name"

# Use the saved sociogram positions as coordinates; y is negated so the
# plot isn't flipped vertically relative to the sociogram
gplot(oneEgoNet,
       usearrows = FALSE,
       label = oneEgoNet%v%"name",
       displaylabels = TRUE,
       vertex.col="#CC6677",
       edge.col="gray",
       coord = cbind(as.numeric(oneEgoNet%v%"sociogram_layout_x"),
                     -as.numeric(oneEgoNet%v%"sociogram_layout_y")))
```

Hi Caden!

All participant files are missing info in the networkCanvasCaseID column. Here’s an example from after I made sure our Fresco was up to date and re-exported all data (let me know if you need screenshots from all the other participants as well):

I’m absolutely open to hopping onto a call! And thank you again for your help – we have really appreciated you + your team’s responsiveness through all this!

Hi Joshua!

Thank you for the code! I changed the folderPath line to folderPath <- paste0("C:/Users/daniell2/Downloads/network_canvas_7_18") so it points to where the files are on my computer. The error I’m still running into happens at this step, and it seems to be the same error I ran into back in my June 20th post:

Like before, I’ve manually edited the data files so that the networkCanvasCaseID columns have a value in them and removed columns with the “null” header.

Thank you for your help!!