API Extract Issue | XM Community

API Extract Issue


Hello,

 

I’m running into an issue with what I’m trying to achieve. To summarize, I want to export my survey data in Excel format in an automated way (a weekly email), without having to go to the survey and manually export it. I am trying to achieve this through Python using a Qualtrics API token. Here is the problematic part of my code:

 

import requests

API_TOKEN = 'MY_API_TOKEN'
SURVEY_ID = 'MY_SURVEY_ID'
DATA_CENTER = 'MY_DATA_CENTER'

BASE_URL = f'https://{DATA_CENTER}.qualtrics.com/API/v3/surveys/{SURVEY_ID}/export-responses/'

headers = {
    "Content-Type": "application/json",
    "X-API-TOKEN": API_TOKEN
}

def test_baseurl():
    try:
        response = requests.get(BASE_URL, headers=headers)
        response.raise_for_status()
        print("URL is valid")
        print("Response:", response.json())
    except requests.exceptions.RequestException as e:
        print(f"Error: {e}")

if __name__ == "__main__":
    test_baseurl()

 

 

The error I get is SSLCertVerificationError, which indicates an issue with SSL certificate verification when attempting to connect to the Qualtrics API. Can you please help me resolve this issue?

 

Thank you,

Valentina

10 replies

Badge +1

Hi Valentina, are you behind a proxy server by chance? I run this same thing through my employer’s network and have to account for proxy. If you are, or might be, try adding the following to your code. Obviously you’ll need to replace {your_proxy_url} with whatever it might be for your network.

 

proxies = {
    'http': f"http://{your_proxy_url}:8080",
    'https': f"http://{your_proxy_url}:8080",
}

response = requests.get(BASE_URL, headers=headers, proxies=proxies)
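If the proxy also intercepts TLS (a common cause of SSLCertVerificationError on corporate networks), you may additionally need to point requests at your company's CA bundle. A minimal sketch; the bundle path and proxy URL are placeholder assumptions:

```python
import requests


def make_session(ca_bundle_path=None, proxies=None):
    """Build a requests Session that routes through a proxy and
    trusts a corporate CA bundle for TLS verification."""
    session = requests.Session()
    if proxies:
        session.proxies.update(proxies)
    if ca_bundle_path:
        # requests also honours the REQUESTS_CA_BUNDLE environment
        # variable if you prefer not to set this in code.
        session.verify = ca_bundle_path
    return session


# Hypothetical values -- replace with your network's proxy and bundle.
session = make_session(
    ca_bundle_path="/etc/ssl/certs/corp-ca-bundle.pem",
    proxies={"http": "http://proxy.example.com:8080",
             "https": "http://proxy.example.com:8080"},
)
```

Every request made through `session` then reuses the proxy and verification settings, so you only configure them once.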

 

I’ll also add that I don’t think you can export directly to Excel at this point, or maybe I’m just behind, but the code I have exports the CSV via a ZIP file and I have to extract the ZIP and then I read the CSV back in and load it to our data warehouse.
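For reference, the ZIP-based flow described above follows the usual Qualtrics export-responses pattern: start an export job, poll until it completes, then download and extract the ZIP. A sketch under those assumptions; the credentials are placeholders and the payload-builder helper is my own illustrative addition:

```python
import io
import time
import zipfile

import requests

# Placeholders -- substitute your own values.
API_TOKEN = "MY_API_TOKEN"
SURVEY_ID = "MY_SURVEY_ID"
DATA_CENTER = "MY_DATA_CENTER"

BASE_URL = f"https://{DATA_CENTER}.qualtrics.com/API/v3/surveys/{SURVEY_ID}/export-responses"
HEADERS = {"Content-Type": "application/json", "X-API-TOKEN": API_TOKEN}


def build_export_payload(fmt="csv", use_labels=True):
    """Body for the POST that starts the export job."""
    return {"format": fmt, "useLabels": use_labels}


def export_survey(out_dir="qualtrics_export"):
    """Start a CSV export, poll until complete, then download and unzip it."""
    # Step 1: ask Qualtrics to start building the export file.
    start = requests.post(BASE_URL, json=build_export_payload(), headers=HEADERS)
    start.raise_for_status()
    progress_id = start.json()["result"]["progressId"]

    # Step 2: poll the progress endpoint until the job reports "complete".
    while True:
        check = requests.get(f"{BASE_URL}/{progress_id}", headers=HEADERS)
        check.raise_for_status()
        result = check.json()["result"]
        if result["status"] == "complete":
            file_id = result["fileId"]
            break
        if result["status"] == "failed":
            raise RuntimeError("Qualtrics export failed")
        time.sleep(2)

    # Step 3: download the ZIP and extract the CSV(s) inside it.
    download = requests.get(f"{BASE_URL}/{file_id}/file", headers=HEADERS)
    download.raise_for_status()
    with zipfile.ZipFile(io.BytesIO(download.content)) as archive:
        archive.extractall(out_dir)
        return [info.filename for info in archive.infolist()]
```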

Badge +1

Hi @jupshaw ,

 

Thanks a lot for your answer! I will try to incorporate what you have suggested. Would you be so kind as to share the code you use to export to CSV? It seems you’re already doing what I’m trying to achieve.

 

Thanks a lot!

Userlevel 6
Badge +31

Quick suggestion:

  1. Try ‘Response Export Automation’ if you are looking for raw data.
  2. If not, set up a dashboard and add all the fields you want to see in the data file. Then set up an automation within the dashboard to email you the data file as an attachment in whatever format you need.

Hope this helps!

Badge +1

Hi @Appzk ,

 

Unfortunately I don’t think that would work, as I don’t have the option of using an SFTP server. Is there another way to do “Response Export Automation”?

Userlevel 6
Badge +31

Does the 2nd approach look unfeasible to you?

Badge +1

Hi @valconst, this is the code that I use. The majority of the script came from an official Qualtrics example that I can’t seem to find again, but that’s where I got my start. I’ve added a few things, like the proxies, as well as a bit of data cleaning at the end in step 5. Since it is downloaded as a ZIP, you have to extract it and read the CSV in as a dataframe if you want to do more with it.

I read it in, remove extra columns if I don’t want everything in the file, then remove the extra rows at the top where Qualtrics has multiple headers for the survey output, and finally rename columns. I do all this because the next part of my process is to parameterize the data and load them to our data warehouse each morning.
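That read-and-clean step can be sketched with pandas; the helper name and arguments below are illustrative, and it assumes the usual Qualtrics CSV layout with two extra header rows under the real column names:

```python
import pandas as pd


def clean_qualtrics_csv(path, keep_columns=None, rename_map=None):
    """Read a Qualtrics CSV export into a dataframe.

    Qualtrics CSV exports carry two extra header rows beneath the real
    column names (question text and ImportId metadata), so skip them.
    """
    df = pd.read_csv(path, skiprows=[1, 2])
    if keep_columns is not None:
        df = df[keep_columns]               # drop columns we don't need
    if rename_map is not None:
        df = df.rename(columns=rename_map)  # friendlier column names
    return df
```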

Userlevel 5
Badge +11

Hi @jupshaw 

Thanks for sharing the Python script. I have been struggling to export a specific set of columns using the workflows and tasks Qualtrics offers, and I have been met with a number of problems:

 

-My initial thought was to select the columns I wanted in the D&A part and export using a filter. But it won’t allow more than 200 fields in one view (I need more) and it won’t apply a filter.

-My second attempt was to use a workflow with the extract responses task, which allowed a filter to be selected. I ran into an undocumented problem: filter names must be no more than 100 characters long (for finding that Easter egg I really think I should get a prize!! Qualtrics, are you listening?!). After fixing that, I was able to continue, but I then realised it showed just the question text in the field list (not the QID or question labels), making it very hard to identify the fields if you use generic question text. Qualtrics said this was by design and won’t change it!?!

-My third attempt was to use the above extract task to export all fields and then use a basic transformation task to remove the fields I didn’t need. But with 1200 fields (the survey is quite heavy on matrix questions!) needing to be cut down to about 250, the task is so badly written that it is just not responsive at all: it takes more than 30 seconds to delete one field. I’d likely spend 12 solid hours clicking delete every 30 seconds. I don’t even want to contemplate doing that!

 

As you can probably understand, I am at my wits’ end with this simple process I’m trying to do. For the time being I’ve created 3 views in the D&A, split the fields using the ResponseId as a primary key, manually downloaded them, and joined them together in Excel.

 

I don’t have python installed on my PC but I’m interested in your script as it looks like it has the ability to select specific fields and embedded data.  Is that right?

 

I also read that the API can be used to only download responses that haven’t yet been downloaded, if you use the same process ID or something. That feature is a bit like the extract responses task above, and it’s a function I’ll need once this is up and running.

 

The final step is to send the file to an SFTP server (this was another reason to keep this process server side), so my main question is: do you know if there is a way to run something like this from a code task that will give me the flexibility to choose the fields I need?

 

If you have any other suggestions that too would be great.

 

Many thanks in advance

 

Rod Pestell

Badge +1

@Rod_Pestell I think the API calls and using Python might be more flexible and speedy for you. To answer a couple of your questions:

  1. You can select fields manually and you have to reference them by the QID. In this code snippet you’ll see that you can specify field names in a list and then you put the list in the dictionary for the HTTP POST request. You can also select embedded data and if I recall correctly, that is using the name you gave it when uploading, so something like “CustomerID”.

    questionArray = ["QID2", "QID3", "QID4", "QID26", "QID12", "QID13", "QID14", "QID15", "QID17", "QID18", "QID19",
                     "QID25", "QID24"]
    embeddedArray = []

    '''Step 1: Creating Data Export'''
    data = {
        # https://api.qualtrics.com/sdks/b3A6NjEwMzk-start-response-export
        "format": "csv",
        "breakoutSets": False,
        "useLabels": True,
        # "limit": 10,
        "startDate": start_date,
        "questionIds": questionArray,
    }

  2. You can talk to SFTP in Python. In step 4 the code says where to extract the ZIP file to; in my code here that is the file_path variable. If you add a bit more to this, you should be able to push the extracted file to the SFTP server. I don’t have it in this script, but I have used the paramiko package to work with SFTP servers and have read from and written to them using it.

    '''Step 4: Unzipping the file'''
    with zipfile.ZipFile(io.BytesIO(requestDownload.content), mode="r") as archive:
        archive.extractall(file_path)
        for info in archive.infolist():
            print(f"\tDownload complete: {info.filename}")
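The paramiko hand-off mentioned above could be sketched like this. The host, credentials, and paths are placeholders, and `paramiko` must be installed separately (`pip install paramiko`); the path helper is my own illustrative addition:

```python
import os
import posixpath


def remote_path_for(remote_dir, local_path):
    """Compose the remote destination path (SFTP paths use '/')."""
    return posixpath.join(remote_dir, os.path.basename(local_path))


def upload_via_sftp(local_path, remote_dir, host, username, password, port=22):
    """Upload one file to an SFTP server using paramiko."""
    import paramiko  # imported here so the rest of the module works without it

    transport = paramiko.Transport((host, port))
    try:
        transport.connect(username=username, password=password)
        sftp = paramiko.SFTPClient.from_transport(transport)
        sftp.put(local_path, remote_path_for(remote_dir, local_path))
        sftp.close()
    finally:
        transport.close()
```

Usage would be something like `upload_via_sftp("qualtrics_export/survey.csv", "/uploads", "sftp.example.com", "user", "secret")`, with all of those values swapped for your own.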

Userlevel 5
Badge +11

Hi @jupshaw,

 

Thanks for the guidance. We have about 200 fields to include. Do you know if there might be some sort of limit on fields for the API call? (Apologies if I’m not using the right terminology; I am new to Python and APIs!)

 

Also, I see in your example you set the limit to 10 responses

 

# "limit": 10,

 

But there is a hash at the front.  Does that mean it’s commented out and therefore ignored?

 

My plan would be to use a pre-determined filter (set to the last 7 days, up to the previous day) so it will just export as much as there is. We could, however, think about using the continuation token, but then I assume you have to keep a record of it, which most probably complicates the process somewhat?
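For what it’s worth, that record-keeping can be as simple as a small state file: ask for a continuation token on the first export, store the token Qualtrics returns when the export finishes, and send it back on the next run. A sketch, with the payload field names taken from my reading of the export-responses docs and the file-based storage being just one option:

```python
import json
import os


def load_token(state_path):
    """Read the last continuation token, or None on the first run."""
    if os.path.exists(state_path):
        with open(state_path) as f:
            return json.load(f).get("continuationToken")
    return None


def save_token(state_path, token):
    """Persist the token Qualtrics returned, for the next run."""
    with open(state_path, "w") as f:
        json.dump({"continuationToken": token}, f)


def build_payload(token=None):
    """Export request body: ask for a token the first time, reuse it after."""
    payload = {"format": "csv", "useLabels": True}
    if token is None:
        payload["allowContinuation"] = True
    else:
        payload["continuationToken"] = token
    return payload
```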

 

One other thing to check: in the data variable, you’ve included but commented out the link to the API documentation. Is that OK to do? (From my VBA days I would normally just place that outside of the variable as a note above. Again, my Python programming knowledge is zero!)

 

data = {
    # https://api.qualtrics.com/sdks/b3A6NjEwMzk-start-response-export
    "format": "csv",
    "breakoutSets": False,
    "useLabels": True,
    # "limit": 10,
    "startDate": start_date,
    # "questionIds": questionArray,
    }

 

Finally, I’m aware of jsfiddle and htmlfiddle websites, which let you test things out without needing to install various apps. I came across pythonfiddle and wondered if this would be a safe place to test out scripts (for instance, I’m aware that I will need to share a token). Or would you suggest I install an app (perhaps I need to bite the bullet and request Visual Studio Code)?

 

Thanks

 

Rod

Badge +1

@Rod_Pestell I’m not aware of field limitations in the API, however none of our surveys are that large so I wouldn’t be able to test that.

In Python the hash # is used to comment out code. Limit is an available option that I use to test new calls, but have it commented out in my production code. The API documentation link is just in there for my documentation purposes and has no effect on the code since it is in a comment.

In my full code it should show that I’m pulling things based on the last 10 days. With the way that it is written it will also pull through today, which means I have the possibility of pulling some duplicate records. But when I load the responses to our data warehouse, I’m pulling the ResponseIDs from the past X number of days and removing anything from the Python call that is already in the warehouse.
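That dedup step amounts to a simple anti-join on ResponseId (the column name Qualtrics uses in its exports); the warehouse lookup that produces the known IDs is assumed to happen elsewhere:

```python
import pandas as pd


def drop_already_loaded(new_responses, existing_ids):
    """Keep only responses whose ResponseId is not already in the warehouse."""
    return new_responses[~new_responses["ResponseId"].isin(set(existing_ids))]
```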

I have not used any online Python editors but I would be wary about putting secrets like API tokens in them, strictly from a cyber security perspective.
