help to interpret JSON data privacy download (.gz / JSON ) file | XM Community
Skip to main content

Hi All,
Just testing out the data privacy request process where it prepares a compressed .gz file that you have to download, then extract (using 7zip) and then you get the json file.

The problem is what to do with the JSON file as it's a specific structure / schema that I can't work out. Can anyone help?

I've got as far as trying to use Power Query in excel to convert it but I'm stuck at the point of drilling down / expanding the records into tables. The responses are for one person but across multiple surveys and so merging them into one columns won't work.

To be honest I'm not that interested in the response but I'm more interested in just seeing at top level what surveys the person was found in but I guess it's all or nothing.

Hope someone can help on this. Really surprised that on this page - https://www.qualtrics.com/support/survey-platform/sp-administration/data-privacy-tab/right-to-erasure/#Download
Qualtrics tell you that it'll be a JSON file and then leave you to it! There is no help in how you interpret the data file!!

Hope someone can help

Thanks

Rod Pestell

2 years ago - coming back round to this especially now that Qualtrics have seemed to change / limit the interactions you see in the XM directory timeline.  Does or has anyone found a solution for this?

 

Thanks

 

Rod Pestell


Hi ​@Rod_Pestell 

For survey responses, the structure will look like this. If you need access to the survey id directly in the response, you will have to add the ED SurveyID to your existing survey flows.

{
" user_email]": {
"responses": "
{
// Metadata
"StartDate": { "Label": string, "Value": string },
"EndDate": { "Label": string, "Value": string },
"Status": { "Label": string, "Value": string },
"IPAddress": { "Label": string, "Value": string },
"Progress": { "Label": string, "Value": number },
"Duration (in seconds)": { "Label": string, "Value": number },
"Finished": { "Label": string, "Value": string },
"RecordedDate": { "Label": string, "Value": string },
"ResponseId": { "Label": string, "Value": string },
"RecipientLastName": { "Label": string, "Value": string },
"RecipientFirstName": { "Label": string, "Value": string },
"RecipientEmail": { "Label": string, "Value": string },
"ExternalReference": { "Label": string, "Value": string },
"LocationLatitude": { "Label": string, "Value": number },
"LocationLongitude": { "Label": string, "Value": number },
"DistributionChannel": { "Label": string, "Value": string },
"UserLanguage": { "Label": string, "Value": string },
// Questions
"Q1_STR": { "Label": string, "Value": string },
"Q1_NUM": { "Label": string, "Value": number },
...
// ED Fields
"SurveyID": { "Label": string, "Value": string },
...
},
...
]
}
}

JSONPath for extracting the survey ids would then be:

$..responses
  • .SurveyID.Value
  •  


    Hi ​@vgayraud , thanks very much for this guidance.  Looking at an example JSON file I have, is there an easy way to determine when the Metadata, Question and Embedded Data stops / starts.  So far I’ve used the logic if it starts with a Q and the next character is a number 0-9.  The covers most things but some of my embedded data fields also start with Q followed by a number.  I’ve narrowed it down further by the logic that ‘most’ (but again not all) questions will have a display order.

     

    For instance, below is an example of one of our surveys that I’ve pushed through some powerquery scripts and returned a lift.  It needs some work still but the screenshot below is showing the last few questions of which some of their ID’s have been renamed from Q… to TT…. (an exception to the rule that I’m willing to gloss over!) but then TITLE and FirstName and LastName is the start of all things embedded (except TextiQ stuff which I’ve filtered out based on their suffixes (eg. ‘- Topics’, ‘- Sentiment Score’).

     

     

    My logic works here but further down where I’ve use Q<0-9> in the embedded fields, my logic will change them to a question. eg.  these all should be embedded data.

     

    I could hardcode a certain field or index number but this won’t work for other surveys so it would be good to understand if there is a logic that will work across all surveys.

     

    The JSON file or at least what I’ve seen doesn’t seem to show a split between the types so any advise on that part would be most helpful. (using jsonformatter.org - see below))

     

    Thanks

     

    Rod Pestell

     

     

     


    Hi ​@Rod_Pestell 

    I haven’t noticed anything allowing to truly separate metadata, questions and EDs in the data privacy JSON export, other than that the 3 sections always seem to be in that order. That doesn’t really help of course since the number of fields for each section will vary depending on the survey.

    My best bet would be to leverage the Retrieve Survey JSON Schema API for your surveys and match the exportTag values against the data privacy export. The survey JSON schema will contain an associated dataType field specifying if it’s a metadata, a question or embeddedData.


    Thanks ​@vgayraud , I’ll have to look into API call from excel.  I’ve done some basic VBA API calls to qualtrics in the past (editing embedded fields) but not survey schemas.  Have you done anything like this within excel or maybe I should be thinking more about giving python within excel ago?  I guess for the meantime I will just plug in a field name where embedded fields start and run the query twice for now.  At least it does stay in order from my observations too.

     

    Also do you know how ticket data might come out in the JSON extract?  I have yet to find someone who has this and so probably need to cater for this.  Currently we use ananonymous form for creating tickets and so have to connect the dots (ie email address) manually when doing various subject access requests. 

     

    Thanks for your help and I will continue to persevere

     

    Rod Pestell


    I’ve done API calls in VBA using WinHttp as well, but nothing overly complex. It’s perfect for basic stuff.

    If you have a 365 sub and access to python in excel and more complex needs, I’d certainly give it a go. It’ll be way easier to parse JSON, process data and build advanced API integrations. Although at that point, Excel might just not be the ideal tool to use anymore…

    I haven’t had to explore ticket data in the data privacy either, I’ll let you know if I come upon something.

     

    Best of luck,


    Leave a Reply