Could someone give a bit more information about how the “networkCanvasEgoUUID”, “networkCanvasCaseID”, “networkCanvasSessionID” and “networkCanvasUUID” are generated?
Are they completely generated at random?
Im asking this because we are noticing the following aspects when working on the data management of our incoming data:
“networkCanvasSessionID” don’t seem to be repeated in other csv files but the ego. What is the purpose of this then?
“networkCanvasUUID” appear more than once in our attribute files, in rows with a different “networkCanvasEgoUUID”. What is the explanation for this (I thought they were unique identifiers initially)? Can you confirm that the combination of “networkCanvasUUID” + “networkCanvasEgoUUID” gives a unique identifier?
One issue we are experiencing is when data collectors modify a session that was already uploaded in our database. The agreement we had is that when doing so, they edit the case ID to include a number - e.g. 231002_M758_2 (where “2” means version 2 basically). This makes it straightforward later to remove the previous versions of one session with a same interviewee in the ego csv, but the issue is with attribute csvs. Since no date or time information is included in these files, it’s impossible to now which nodes were registered the first time, and which ones were in the second version that was uploaded later and has the exact same networkCanvasUUID and egoUUID as the ones from the first version. I’m not sure if my explanation is clear. Please let me know if I need to clarify something.
networkCanvasEgoUUID, networkCanvasSessionID, and networkCanvasUUID are randomly generated to assist with data management. These IDs are meant to be long enough to ensure uniqueness and are also primarily designed to be read by computers, rather than being human readable. However, the networkCanvasCaseID is assigned by the researcher/interviewer after selecting a protocol in response to the pop up “Enter a Case ID”. This ID is entirely defined by the researcher but is often used to provide a more human readable ID (e.g., to quickly identify which interview needs to be restarted when an interview is interrupted). Because of this, there is no guarantee these IDs will be unique.
networkCanvasSessionID is a randomly generated ID that is associated with session meta-data (e.g., sessionStart, sessionFinish, sessionExported) and is also used to define the file names (e.g., alter attribute data is saved as “caseID_sessionID_alterType.csv”. This provides some redundant information as “networkCanvasEgoUUID.” However, the “networkCanvasEgoUUID” is the primary ID variable that is used to match a given participants responses across ego data, alter attribute data, and edge list data, as it is stored within each of these datasets.
In alter attribute files, networkCanvasUUID is the unique ID for each alter that can be linked to edge list data for that ego (i.e., networkCanvasSourceUUID and networkCanvasTargetUUID). Because these are randomly generated and quite long, I believe networkCanvasUUID’s should be unique even across egos, although @Joshua will be able to confirm this (It’s also possible this is a bug if they’re not unique?). Therefore, “networkCanvasUUID” + “networkCanvasEgoUUID” should absolutely be unique but, I believe, should also be unnecessary (i.e., because networkCanvasUUID for alter attributes should already be unique).
This one is a little more challenging without knowing more about your export and database processes. For example, it sounds like you’re exporting via CSV and merging sessions? If so, one possible solution would be exporting and storing observations as separate files by deselecting the merge sessions option, each file would be then be named with the new Case ID (e.g., caseID_sessionID_ego.csv, caseID_sessionID_alterType.csv, etc.) which could help you track new versus old data. If you can provide a little more information on your export and management process, I’d be happy to try and provide some additional guidance.
So I’ll share a screenshot hereunder where you can see duplicate “networkCanvasUUID” in pink and their corresponding “networkCanvasEgoUUID”. I had a better and here is was I noticed:
we have name generator questions using roster data, ans it seems that the same names selected lead to the same “networkCanvasUUID” in some cases.
However, not all the time, as you can see in rows 97-103, where several identical alters found in the roster data (having a unique “barcode”) do not have a same “networkCanvasUUID”. I checked and all the data entered in these lines have been entered by a same interviewer, thus coming from the same device.
On the other hand, if you look at the highlighted rows (107-108), these correspond to two alters registered in 2 different interviews coming from 2 different interviewers and who identified the same person (with same barcode) from the roster data.
We are indeed exporting via CSV and merging sessions coming from several tablets (as you can see in the screenshot in the previous post). This also all happens offline. Your suggested alternative does not sound feasible considering the amount of data we need to merge, in addition I’m not sure how it would help identifying the new versus old data of a same session since even when exporting as separate files, and then bringing all of this in one same database (in excel for instance), you would still not see any specific information about time or the case ID in the attribute files. Unless I’m missing something?
Any other ideas welcome otherwise.
Sorry! I hadn’t considered you were using a roster to nominate alters. The way the creation of the alter UUID’s works is the random numbers are generated using data from alter attributes. Consequently, alters nominated from a roster will share the networkCanvasUUID across participants/egos if they retain all the same alter attributes while the alter networkCanvasUUID will differ if attributes are modified across participants/egos. In this case, you’d want to combine the EgoUUID and alter attribute UUID if you wanted to guarantee unique alter/ego combinations.
So, we frequently save our files separately and import then into R to combine (e.g., see our example workflow here). Using software like R, it would be somewhat trivial to read the file names and assign the sessionID as a variable in the alter attribute and edge list data. This is what I had in mind, although that would take a bit of bespoke coding.
Ultimately, the software isn’t really designed to accommodate the situation where data is exported, the Case ID is changed, and then updated data from the same participant is exported again with the same EgoUUID. If you’re manually managing data in excel, I’d suggest using operating system level functionality for files (e.g., like “Date Modified” meta data) to identify the older excel file and removing the duplicate alter data with the same EgoUUID before merging in the newer data.
Is there a specific reason why not more (meta) data is standardly included in attribute files? Having indeed finish or export time or the “networkCanvasCaseID” in there would have solved our issue. And this might help other users in the future?
But maybe there’s a good reason why it’s only in the ego files?
Again, many thanks for your prompt answers!
The reason is that putting that data in the attribute files would be redundant, since the attribute files are linked to the ego file that already contains them. It would also add a significant amount of repetitive data, which would waste storage space.