Implementing and Creating Datasets for Download or Reporting – Learnosity Product & Developer Help

This article describes the steps needed to generate and retrieve a dataset. The dataset generation process is a series of steps using Data API endpoints. Once generated, the dataset can be downloaded. Some datasets can be used with Reports API to render a report.

Introducing datasets

In Learnosity Analytics, we have the concept of "datasets", which are a set of data files (sometimes many thousands of files) containing aggregated results and analysis in JSON and CSV formats.

Datasets give you access to raw data from Learnosity, so that you can perform your own custom data analysis and reporting.

The exact contents and format of a dataset will vary depending on the dataset type. The dataset is generated asynchronously using Data API. Once completed, the dataset can be retrieved via Data API or rendered in-browser using Reports API, if the dataset type supports it.

Implementation overview

Dataset generation is performed asynchronously, as follows:

1. Initialize a new dataset using Data API (see Step 1, below). This involves creating and configuring a new dataset, for which you must specify a dataset_id in UUID format. The endpoint returns one or more URLs to which input files need to be uploaded.

If no input files need to be uploaded, the response contains the dataset_type and a job_reference instead, and you can proceed directly to Step 3, below.

2. Upload input files and commence the job:

a. Upload input files (see Step 2(a), below). Upload the input files required for the dataset to the URL(s) returned in Step 1.

b. Commence the dataset generation job using Data API (see Step 2(b), below). Notify Data API that the input files have been uploaded, which kicks off the dataset generation job.

3. Poll for job completion using Data API (see Step 3, below). The status of the dataset generation job can be obtained by polling the /jobs endpoint.

4. Retrieve results via Data API, or via Reports API where applicable (see Step 4, below). On completion, your server application can retrieve the raw report data using Data API. If the dataset type has a corresponding report type in Reports API, that report can render the new dataset, and the raw data can also be accessed through Reports API methods.
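The four steps above can be sketched as data, which makes the branch around Step 2 explicit. This is a minimal Python sketch and is not part of any Learnosity SDK. The function name plan_steps and the tuple layout are hypothetical; the methods and endpoints are the ones named in this article.

```python
from typing import List, Tuple

# (step name, method as named in this article, endpoint or upload target)
Step = Tuple[str, str, str]


def plan_steps(needs_input_files: bool) -> List[Step]:
    """Return the ordered steps for generating one dataset (hypothetical helper)."""
    steps: List[Step] = [("initialize", "SET", "/reports/datasets")]
    if needs_input_files:
        # Step 2 applies only when Step 1 returned input_files URLs.
        steps.append(("upload", "PUT", "signed input_files URL(s) from Step 1"))
        steps.append(("commence", "SET", "/jobs/reports/datasets"))
    steps.append(("poll", "GET", "/jobs"))
    steps.append(("retrieve", "GET", "/reports/datasets"))
    return steps


if __name__ == "__main__":
    for name, method, target in plan_steps(needs_input_files=True):
        print(f"{name:<10} {method:<4} {target}")
```

When no input files are required, the upload and commence steps drop out, and polling starts from the job_reference returned in Step 1.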

Step 1: Initialize dataset

Initialize a new report dataset using the Data API SET /reports/datasets endpoint. The request data specifies the configuration for the dataset, including which fields to calculate.

The specific request and response format depends on the type of dataset being generated.

Note: you must specify a dataset_id for the new dataset, in UUID format.
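Any standard UUID generator can produce the dataset_id. The following minimal Python sketch uses the standard library uuid module. The helper names are hypothetical, and the check assumes Data API expects the canonical lowercase, hyphenated form used in this article's samples.

```python
import uuid


def new_dataset_id() -> str:
    """Generate a random (version 4) UUID to use as a dataset_id."""
    return str(uuid.uuid4())


def is_canonical_uuid(value: str) -> bool:
    """True if value parses as a UUID and is already in canonical form.

    Assumption: the canonical form is lowercase and hyphenated (8-4-4-4-12),
    matching the dataset_id values in this article's samples.
    """
    try:
        return str(uuid.UUID(value)) == value
    except ValueError:
        return False
```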

The relevant attributes differ for each dataset type. See the documentation for the dataset type you are creating.

The endpoint returns the dataset_id of the new dataset.

If your request requires input files to be uploaded, as indicated by the file_count parameter, the response also provides a list of input_files URLs, which are used in Step 2.

If you are not providing input files, the response will contain a job_reference, and you can proceed to Step 3.
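The choice between Step 2 and Step 3 can be made from the response body. The following minimal sketch makes two assumptions: the response has already been decoded into a Python dict with the meta/data shape of this section's sample response, and meta.status is false when the request fails (this article does not confirm that). next_step is a hypothetical helper name.

```python
from typing import Any, Dict


def next_step(init_response: Dict[str, Any]) -> str:
    """Return "upload" or "poll" based on a SET /reports/datasets response."""
    # Assumption: meta.status is false when the request was rejected.
    if not init_response.get("meta", {}).get("status"):
        raise RuntimeError("Dataset initialization failed: %r" % init_response)
    data = init_response.get("data", {})
    if data.get("input_files"):
        # Step 2: PUT each input file, then commence the job.
        return "upload"
    if data.get("job_reference"):
        # No input files required: go straight to polling (Step 3).
        return "poll"
    raise ValueError("Response contains neither input_files nor job_reference")
```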

Refresh an existing dataset

You can refresh an existing dataset with new group data and then regenerate it based on the latest score data. To do so, include the existing dataset's dataset_id in your request.
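A single request builder can cover both cases: pass an existing dataset_id to refresh that dataset, or omit it to create a new one. This is a minimal sketch, and build_init_request is a hypothetical helper. The type-specific keys (file_count, filters, options, groups) are passed through unchanged because they vary by dataset type.

```python
import uuid
from typing import Any, Dict, Optional


def build_init_request(dataset_type: str, dataset_id: Optional[str] = None,
                       **config: Any) -> Dict[str, Any]:
    """Build the request data for SET /reports/datasets.

    Pass an existing dataset_id to refresh that dataset; omit it to
    generate a new random UUID for a new dataset.
    """
    request: Dict[str, Any] = {
        "dataset_id": dataset_id if dataset_id is not None else str(uuid.uuid4()),
        "dataset_type": dataset_type,
    }
    # Type-specific configuration (file_count, filters, options, groups, ...)
    request.update(config)
    return request


# Refresh the dataset from the sample request by reusing its dataset_id.
refresh = build_init_request(
    "activity-summary-by-group",
    dataset_id="ce6c3842-5366-486d-a68f-ab9e6160e9de",
    file_count=1,
    filters={"activity_id": ["20170106b_ELA_comprehension"]},
)
```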

Sample request

{
    "dataset_id": "ce6c3842-5366-486d-a68f-ab9e6160e9de",
    "dataset_type": "activity-summary-by-group",
    "file_count": 1,
    "filters": {
        "activity_id": [
            "20170106b_ELA_comprehension"
        ]
    },
    "options": {
        "default_sort_field": "mean_percent",
        "default_sort_order": "desc",
        "fields": [
            "group_count",
            "population",
            "lowest_score",
            "highest_score",
            "mean_seen_activities",
            "mean_attempted_percent",
            "mean_score",
            "median_score",
            "p25_score",
            "p75_score",
            "p90_score",
            "stddev_score",
            "lowest_percent",
            "highest_percent",
            "mean_percent",
            "median_percent",
            "p25_percent",
            "p75_percent",
            "p90_percent",
            "stddev_percent"
        ],
        "default_users_sort_field": "score",
        "default_users_sort_order": "desc",
        "user_fields": [
            "score",
            "attempted_max_score",
            "unscored_max_score",
            "max_score",
            "seen_activities"
        ]
    },
    "groups": [
        {
            "key": "district",
            "label": "District"
        },
        {
            "key": "school",
            "label": "School"
        },
        {
            "key": "class",
            "label": "Class"
        }
    ]
}

Sample response

{
  "meta": {
    "status": true,
    "timestamp": 1483585276
  },
  "data": {
    "dataset_id": "ce6c3842-5366-486d-a68f-ab9e6160e9de",
    "input_files": [
      "https://learnosity-reportdatasets-va.s3.amazonaws.com/reports/0034/activity-summary-by-group/ce6c3842-5366-486d-a68f-ab9e6160e9de/infiles/0.ndjson?AWSAccessKeyId=AKIAJB5XQL2VQTD4KG6Q&Expires=1483588877&Signature=QUXjR0beW83bB7l1OpiTMOKRsKc%3D"
    ]
  }
}

Step 2(a): Upload input files

This step involves uploading the list of users and sessions that should be included in this report, and their group information, if applicable. The data is uploaded as one or more NDJSON files.

Upload the NDJSON files with an HTTP PUT request to the input_files URL(s) returned in Step 1.
Set the Content-Type: application/x-www-form-urlencoded header on the PUT request.
Set the Expect: 100-continue header if your HTTP client supports it, to avoid transmitting unnecessary NDJSON data in certain error cases.

The signed input_files URLs expire 60 minutes after they're issued. If one or more signed URLs have expired, start again from Step 1 to initialize a new dataset_id.
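Because the signed URLs expire, it can help to check them before a long upload. In this article's samples, the expiry is carried in the Expires query parameter as a Unix timestamp. The Step 1 sample response was issued at 1483585276, and its URL expires at 1483588877, about 60 minutes later. The following minimal sketch assumes that URL format; a URL that uses a different signing scheme simply yields None here. The helper names and the 60-second safety margin are hypothetical.

```python
import time
from typing import Optional
from urllib.parse import parse_qs, urlparse


def url_expiry(signed_url: str) -> Optional[int]:
    """Return the Unix timestamp from the URL's Expires parameter, or None."""
    values = parse_qs(urlparse(signed_url).query).get("Expires")
    return int(values[0]) if values else None


def is_expired(signed_url: str, now: Optional[float] = None,
               margin_seconds: int = 60) -> bool:
    """True if the URL expires within margin_seconds of now.

    Returns False when no Expires parameter is found, leaving the upload
    itself to report any failure.
    """
    expires = url_expiry(signed_url)
    if expires is None:
        return False
    current = time.time() if now is None else now
    return current + margin_seconds >= expires
```

If is_expired returns True for any URL, start again from Step 1 with a new dataset_id, as described above.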

The expected format of your NDJSON file can be found in the documentation for the dataset type you are creating.

Example PUT request

The data can be uploaded using curl, or the HTTP client of your choice. For example, the following command will upload the users.ndjson file from a curl-enabled terminal:

$ curl --request PUT --header "Expect: 100-continue" --header "Content-Type: application/x-www-form-urlencoded" --data-binary "@users.ndjson" "https://learnosity-reportdatasets-va.s3.amazonaws.com/reports/0034/activity-summary-by-group/a951cc14-0316-4c12-bb13-1f386338a094/infiles/0.ndjson?AWSAccessKeyId=AKIAJB5XQL2VQTD4KG6Q&Expires=1484208440&Signature=lEANv4k3aIK1CelRhXsYHgXNTDQ%3D"
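The same upload can be made from Python using only the standard library. This is a minimal sketch with hypothetical helper names. urllib does not implement the Expect: 100-continue handshake, so the sketch omits that optional header and always sends the body.

```python
import urllib.request


def build_upload_request(signed_url: str, body: bytes) -> urllib.request.Request:
    """Build the PUT request for one signed input_files URL."""
    return urllib.request.Request(
        signed_url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def upload_ndjson(signed_url: str, path: str, timeout: float = 60.0) -> int:
    """Upload an NDJSON file and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for non-2xx responses, for
    example when the signed URL has expired.
    """
    with open(path, "rb") as f:
        body = f.read()
    request = build_upload_request(signed_url, body)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Usage (performs a network request):
# upload_ndjson(input_files[0], "users.ndjson")
```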

To debug issues with constructing a valid PUT request, inspect the raw request content and compare it to the following sample request produced by curl:

PUT https://learnosity-reportdatasets-va.s3.amazonaws.com/reports/0034/sessions-summary-by-group/a951cc14-0316-4c12-bb13-1f386338a094/infiles/0.ndjson?AWSAccessKeyId=AKIAJB5XQL2VQTD4KG6Q&Expires=1484204382&Signature=3zqRDJvtif%2B5wYmdCsWXHZLCkrM%3D HTTP/1.1
Host: learnosity-reportdatasets-va.s3.amazonaws.com
User-Agent: curl/7.43.0
Accept: */*
Expect: 100-continue
Content-Type: application/x-www-form-urlencoded
Content-Length: 10376

{"user_id":"ANONYMIZED_USER_aeee19f1", "group_path":["Simpson","Springfield High","7_krabappel"]}
{"user_id":"ANONYMIZED_USER_aeee19f2", "group_path":["Simpson","Springfield High","7_krabappel"]}
{"user_id":"ANONYMIZED_USER_aeee19f3", "group_path":["Simpson","Springfield High","7_krabappel"]}
{"user_id":"ANONYMIZED_USER_aeee19f4", "group_path":["Simpson","Springfield High","7_krabappel"]}
{"user_id":"ANONYMIZED_USER_aeee19f5", "group_path":["Simpson","Springfield High","7_krabappel"]}
{"user_id":"ANONYMIZED_USER_aeee19f6", "group_path":["Simpson","Springfield High","7_krabappel"]}
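An input file in the shape of the sample above can be generated from your own roster data. This minimal sketch assumes the user_id/group_path fields shown in that sample; other dataset types may expect other fields. It can optionally check that each group_path has one entry per group level configured in Step 1. That check is an assumption based on the three-level district/school/class sample, not a documented rule. The helper names are hypothetical.

```python
import json
from typing import Iterable, List, Optional, Sequence, Tuple


def to_ndjson(users: Iterable[Tuple[str, Sequence[str]]],
              depth: Optional[int] = None) -> str:
    """Serialize (user_id, group_path) pairs as NDJSON, one object per line."""
    lines: List[str] = []
    for user_id, group_path in users:
        # Optional check that every path has the expected number of levels.
        if depth is not None and len(group_path) != depth:
            raise ValueError(
                f"{user_id}: expected {depth} group levels, got {len(group_path)}")
        record = {"user_id": user_id, "group_path": list(group_path)}
        lines.append(json.dumps(record, separators=(",", ":")))
    # Terminate every record with a newline, as NDJSON expects.
    return "".join(line + "\n" for line in lines)


def write_ndjson(path: str, users: Iterable[Tuple[str, Sequence[str]]],
                 depth: Optional[int] = None) -> None:
    """Write the NDJSON text to a UTF-8 file ready for the PUT upload."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_ndjson(users, depth))
```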

Step 2(b): Commence dataset job

Once the input files are uploaded, commence the dataset generation job using the Data API SET /jobs/reports/datasets endpoint. This call validates the report configuration and the input files, and then commences the dataset aggregation processing. The request object contains the dataset_type and dataset_id from Step 1.

Sample request

{
    "dataset_type": "activity-summary-by-group",
    "dataset_id": "e63de7cf-8a11-4b87-9a98-aba1af4e5340"
}

Sample response

{
    "meta": {
        "status": true,
        "timestamp": 1474336936
    },
    "data": {
        "job_reference": "402ead85-5a27-4d40-b68b-7b1e77924ed4",
        "dataset_type": "activity-summary-by-group",
        "dataset_id": "e63de7cf-8a11-4b87-9a98-aba1af4e5340"
    }
}
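The commence request and the handling of its response can be sketched as follows. build_commence_request and job_reference_from are hypothetical helpers. Checking that the response echoes the same dataset_id is a defensive choice, not a documented requirement, and a false meta.status on failure is assumed.

```python
from typing import Any, Dict


def build_commence_request(dataset_type: str, dataset_id: str) -> Dict[str, str]:
    """Request data for SET /jobs/reports/datasets (values from Step 1)."""
    return {"dataset_type": dataset_type, "dataset_id": dataset_id}


def job_reference_from(response: Dict[str, Any], dataset_id: str) -> str:
    """Extract the job_reference to poll in Step 3."""
    # Assumption: meta.status is false when the request was rejected.
    if not response.get("meta", {}).get("status"):
        raise RuntimeError("Could not commence dataset job: %r" % response)
    data = response["data"]
    # Defensive check: the response should refer to the dataset we started.
    if data.get("dataset_id") != dataset_id:
        raise ValueError("Response refers to a different dataset_id")
    return data["job_reference"]
```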

Step 3: Poll for job completion

Use the Data API GET /jobs endpoint to poll the status of the dataset generation job. Pass the job_reference returned in Step 2(b), or in Step 1 if no input files were required.

A status of "completed" indicates that the dataset has been compiled successfully and can now be retrieved using Data API and, where the dataset type supports it, Reports API. A status of "halted" indicates that the job did not complete successfully; consult the job's error property for further information.
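A polling loop with capped exponential backoff might look like the following. This sketch does not call Data API itself. Instead, fetch_status is a hypothetical callable you supply (for example, wrapping your signed GET /jobs request) that returns the job's status and error for a job_reference. Statuses other than "completed" and "halted" are treated as still in progress, and the delay and timeout values are arbitrary defaults.

```python
import time
from typing import Any, Callable, Tuple

# Hypothetical: returns (status, error) for a job_reference via GET /jobs.
FetchStatus = Callable[[str], Tuple[str, Any]]


def wait_for_job(job_reference: str, fetch_status: FetchStatus,
                 initial_delay: float = 2.0, max_delay: float = 60.0,
                 timeout: float = 3600.0,
                 sleep: Callable[[float], None] = time.sleep) -> None:
    """Block until the job completes; raise if it halts or times out."""
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        status, error = fetch_status(job_reference)
        if status == "completed":
            return
        if status == "halted":
            raise RuntimeError(f"Job {job_reference} halted: {error}")
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"Job {job_reference} still '{status}' after {timeout}s")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # back off, capped at max_delay
```

The sleep parameter exists so the loop can be exercised without waiting; in production, leave it as time.sleep.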

For sample request and response objects, visit the GET /jobs documentation.

Step 4: Retrieve data

Raw data from compiled datasets is accessible via the Data API GET /reports/datasets endpoint. Some dataset types have a corresponding Reports API report, allowing the dataset to be retrieved and explored interactively using the in-browser interface in Reports API.

Retrieve datasets using Data API

The raw dataset files can be retrieved via Data API using the GET /reports/datasets endpoint. The endpoint returns pre-signed URLs; send an HTTP GET request to each pre-signed URL to retrieve the corresponding JSON or CSV file. See the documentation for the dataset type you created for more details.
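First extract the pre-signed URLs from the GET /reports/datasets response; its exact shape depends on the dataset type and is not reproduced here. Each file can then be fetched with a plain HTTP GET. The following is a minimal standard-library sketch with hypothetical helper names. It names each local file after the last segment of its URL path, which is an assumption: two files with the same name would overwrite each other.

```python
import os
import posixpath
import shutil
import urllib.request
from typing import Iterable, List
from urllib.parse import unquote, urlparse


def local_name(presigned_url: str) -> str:
    """File name taken from the URL path, ignoring the signature query string."""
    return posixpath.basename(unquote(urlparse(presigned_url).path))


def download_all(urls: Iterable[str], dest_dir: str,
                 timeout: float = 60.0) -> List[str]:
    """GET each pre-signed URL and save the body under dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    paths: List[str] = []
    for url in urls:
        path = os.path.join(dest_dir, local_name(url))
        # Stream the response to disk rather than reading it all into memory.
        with urllib.request.urlopen(url, timeout=timeout) as response, \
                open(path, "wb") as out:
            shutil.copyfileobj(response, out)
        paths.append(path)
    return paths
```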

Retrieve datasets using Reports API

See the Reports API documentation for the report types that correspond to these dataset types.
