
Bulk upload schema

JupiterOne lets you upload asset data in bulk, either through the JupiterOne UI or via the API.

Defining scopes for bulk uploads

When performing a bulk upload through the API, the scope will need to be provided as a parameter when starting the synchronization job. This allows you to define the specific scope that the bulk upload should apply to. If using bulk upload within the UI, you will be prompted to choose a scope when uploading a file. You can find the bulk upload tool in the JupiterOne dashboard by navigating to Assets > Add new asset (the plus sign in the top-right).

Bulk uploads trigger a data synchronization process that automatically updates or deletes entities and relationships as needed. Previously existing entities and relationships within the same _scope that no longer exist in the latest upload will be marked for deletion.

To avoid unintended deletions, always include the complete set of entities and relationships for the _scope of the upload. The _scope determines which entities and relationships the bulk upload can affect; all data outside that scope remains unaffected.

For example:
POST /persister/synchronization/jobs
{
"source": "api",
"scope": "my-sync-job"
}
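Here is a minimal Python sketch of the request above. It uses only the standard library and builds the request without sending it. The base URL `https://api.us.jupiterone.io`, the `Authorization: Bearer` header, and the helper name `build_start_job_request` are assumptions. The `JupiterOne-Account` header is the one named in the validation errors on this page.

```python
import json
import urllib.request

# Assumption: US region base URL; adjust for your JupiterOne environment.
BASE_URL = "https://api.us.jupiterone.io"


def build_start_job_request(account_id: str, api_token: str, scope: str) -> urllib.request.Request:
    """Build (but do not send) a POST that starts a sync job for `scope`."""
    body = {"source": "api", "scope": scope}
    return urllib.request.Request(
        url=f"{BASE_URL}/persister/synchronization/jobs",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "JupiterOne-Account": account_id,
            # Assumption: API tokens are passed as bearer tokens.
            "Authorization": f"Bearer {api_token}",
        },
    )


req = build_start_job_request("my-account-id", "my-api-token", "my-sync-job")
print(req.get_method(), req.full_url)
```

To send the request, pass `req` to `urllib.request.urlopen`. The `job.id` in the JSON response identifies the job in every later call.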

To successfully upload entity and relationship data, follow the schema outlined below:

{
"entities": [
{
"_key": "1",
"_type": "bulk_upload_entity",
"_class": "EntityClass",
"displayName": "Entity's displayName to show in UI",
"owner": "Owner's name"
// ...any other properties defined for the given type/class
},
{
"_key": "2",
"_type": "bulk_upload_entity",
"_class": "EntityClass",
"displayName": "Entity's displayName to show in UI",
"owner": "Owner's name"
// ...any other properties defined for the given type/class
}
],
"relationships": [
{
"_key": "a",
"_type": "bulk_upload_relationship",
"_class": "VERB",
"_fromEntityKey": "1",
"_toEntityKey": "2"
},
{
"_key": "b",
"_type": "bulk_upload_relationship",
"_class": "VERB",
"_fromEntityKey": "2",
"_toEntityKey": "1"
}
]
}
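The schema above uses `//` comments for illustration. JSON itself does not allow comments, so a real request body omits them. The Python sketch below builds a similar payload as plain data and flags relationship endpoints that no entity in the same payload defines. In the example, relationship "b" deliberately points at a key "3" so the check has something to flag. The helper name `dangling_entity_keys` is hypothetical. A flagged key is not necessarily an error, because the entity may arrive in another upload batch of the same job.

```python
import json


def dangling_entity_keys(payload: dict) -> set:
    """Return _fromEntityKey/_toEntityKey values not defined by any entity in `payload`."""
    entity_keys = {e["_key"] for e in payload.get("entities", [])}
    referenced = set()
    for rel in payload.get("relationships", []):
        for field in ("_fromEntityKey", "_toEntityKey"):
            if field in rel:
                referenced.add(rel[field])
    return referenced - entity_keys


payload = {
    "entities": [
        {"_key": "1", "_type": "bulk_upload_entity", "_class": "EntityClass",
         "displayName": "Entity one", "owner": "Owner's name"},
        {"_key": "2", "_type": "bulk_upload_entity", "_class": "EntityClass",
         "displayName": "Entity two", "owner": "Owner's name"},
    ],
    "relationships": [
        {"_key": "a", "_type": "bulk_upload_relationship", "_class": "VERB",
         "_fromEntityKey": "1", "_toEntityKey": "2"},
        {"_key": "b", "_type": "bulk_upload_relationship", "_class": "VERB",
         "_fromEntityKey": "2", "_toEntityKey": "3"},
    ],
}
body = json.dumps(payload)  # the text sent as the request body
print(dangling_entity_keys(payload))  # {'3'}
```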

Entity Properties

Properties with _ prefix are reserved as JupiterOne system internal metadata properties. Other than _key, _type, _class as listed above, any other property beginning with _ will be ignored when processing the upload.

Property | Type | Description
_key | string | A unique identifier/key for this entity within the scope defined by _scope.
_type | string | User-defined type for this entity. Value should be in snake_case.
_class | string or string[] | The defined class for this entity. Value should be in TitleCase.
owner | string | Identifier for the person/thing responsible for this entity.

Relationship Properties

Properties with _ prefix are reserved as JupiterOne system internal metadata properties. Other than _key, _type, _class, _fromEntityKey, and _toEntityKey as listed above, any other property beginning with _ will be ignored when processing the upload.

Property | Type | Description
_key | string | A unique identifier/key for this relationship within the defined _scope.
_type | string | User-defined type for this relationship. Value should be in snake_case.
_class | string | Relationship class. Typically a third-person singular verb such as HAS, MANAGES, or ALLOWS. Value should be in uppercase.
_fromEntityKey | string | The unique key for the entity on the "from" side of this relationship. Use this for in-scope entities.
_toEntityKey | string | The unique key for the entity on the "to" side of this relationship. Use this for in-scope entities.
_fromEntityId | string | The unique _id for the entity on the "from" side of this relationship. Use this for out-of-scope entities. Cannot be included when using DIFF syncMode.
_toEntityId | string | The unique _id for the entity on the "to" side of this relationship. Use this for out-of-scope entities. Cannot be included when using DIFF syncMode.
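The naming conventions in these tables (snake_case `_type`, TitleCase entity `_class`, uppercase relationship `_class`) can be checked before upload. The Python sketch below does this. The regular expressions approximate the conventions and are assumptions, not rules JupiterOne is known to enforce. The helper name `naming_warnings` is hypothetical.

```python
import re

# Approximations of the conventions above (assumption: not JupiterOne's exact rules).
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")
TITLE_CASE = re.compile(r"^[A-Z][A-Za-z0-9]*$")
UPPER_CASE = re.compile(r"^[A-Z][A-Z0-9_]*$")


def naming_warnings(obj: dict, is_relationship: bool) -> list:
    """Return convention warnings for an entity or relationship."""
    warnings = []
    type_name = obj.get("_type", "")
    if not SNAKE_CASE.match(type_name):
        warnings.append(f"_type {type_name!r} is not snake_case")
    classes = obj.get("_class", [])
    if isinstance(classes, str):
        classes = [classes]  # entity _class may be a string or a list of strings
    pattern, label = (UPPER_CASE, "uppercase") if is_relationship else (TITLE_CASE, "TitleCase")
    for cls in classes:
        if not pattern.match(cls):
            warnings.append(f"_class {cls!r} is not {label}")
    return warnings


print(naming_warnings({"_type": "user", "_class": ["User", "Person"]}, is_relationship=False))
print(naming_warnings({"_type": "UserGroup", "_class": "has"}, is_relationship=True))
```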

Read more about creating relationships between entities.

Validation

The synchronization API performs comprehensive validation on all uploaded data to ensure data integrity. Understanding these validation rules will help you successfully upload entities and relationships.

Entity Validation

Required Fields

The required fields for entities depend on the synchronization mode:

DIFF Mode (Full Replacement):

  • _key - Required
  • _type - Required
  • _class - Required, either:
    • Single string, OR
    • Array of strings (max 5 items)

PATCH Mode (Partial Update):

  • When scope is defined: Either _key OR _id is required
  • When scope is not defined: _id is required
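The mode-dependent required fields above can be expressed as a small client-side check. The Python sketch below is illustrative only, and the helper name `entity_field_problems` is hypothetical. The server remains the authority on validation.

```python
CLASS_ERROR = "_class must be a string or an array of at most 5 strings"


def entity_field_problems(entity: dict, sync_mode: str = "DIFF", scoped: bool = True) -> list:
    """Check an entity's required fields for the given syncMode."""
    problems = []
    if sync_mode == "DIFF":
        for field in ("_key", "_type", "_class"):
            if field not in entity:
                problems.append(f"{field} is required")
        cls = entity.get("_class")
        if isinstance(cls, list):
            if len(cls) > 5 or not all(isinstance(c, str) for c in cls):
                problems.append(CLASS_ERROR)
        elif cls is not None and not isinstance(cls, str):
            problems.append(CLASS_ERROR)
    elif sync_mode == "PATCH":
        # With a scope, _key or _id identifies the entity; without one, only _id does.
        if scoped and "_key" not in entity and "_id" not in entity:
            problems.append("either _key or _id is required")
        if not scoped and "_id" not in entity:
            problems.append("_id is required when no scope is defined")
    else:
        raise ValueError(f"unsupported syncMode: {sync_mode}")
    return problems


print(entity_field_problems({"_type": "user"}))
print(entity_field_problems({"_key": "1"}, "PATCH", scoped=False))
```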

Property Constraints

Custom Properties:

  • Cannot start with underscore (_) - these are reserved for system properties
  • Must have values of the following types:
    • string
    • boolean
    • number
    • null
    • Homogeneous arrays (all elements the same type):
      • Array of strings
      • Array of booleans
      • Array of numbers

Invalid property values:

  • Mixed-type arrays (e.g., ["string", 123, true])
  • Nested objects (except _rawData)
  • Undefined values
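The value rules above can be checked in Python as sketched below. One detail matters here: Python's `bool` is a subclass of `int`, so booleans must be tested before numbers or `[True, 1]` would look homogeneous. Two behaviors are assumptions, not stated on this page: empty arrays pass, and `null` inside an array is rejected. The helper name `property_problems` is hypothetical.

```python
def _kind(value):
    """Classify an array element; bool is checked first because it subclasses int."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    return None  # nulls, objects, nested arrays: treated as invalid elements here


def property_problems(entity: dict) -> list:
    """Check custom property values against the allowed types listed above."""
    problems = []
    for name, value in entity.items():
        if name.startswith("_"):
            continue  # reserved system properties (including _rawData) are not checked here
        if value is None or isinstance(value, (str, bool, int, float)):
            continue
        if isinstance(value, list):
            kinds = {_kind(v) for v in value}
            if None in kinds or len(kinds) > 1:
                problems.append(f"{name}: arrays must hold strings, booleans, or numbers of a single type")
            continue
        problems.append(f"{name}: unsupported value type {type(value).__name__}")
    return problems


print(property_problems({"mixed": ["string", 123, True], "nested": {"a": 1}}))
```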

Relationship Validation

Required Fields

Standard Relationships (By Key):

  • _key - Required
  • _type - Required
  • _class - Required
  • _fromEntityKey - Required
  • _toEntityKey - Required

Mapped Relationships:

  • _key - Required
  • _type - Required
  • _class - Required
  • _mapping - Required object with the following structure:
    • sourceEntityKey - Required, the key of the source entity
    • targetFilterKeys - Required, array of arrays for filtering target entities
    • targetEntity - Required, object describing the target entity
    • skipTargetCreation - Optional boolean
    • relationshipDirection - Optional, either "FORWARD" or "REVERSE"
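A mapped relationship carries its target lookup in `_mapping` instead of `_toEntityKey`. The Python builder below assembles one from the fields listed above and includes optional fields only when set. It is a sketch: the helper name `mapped_relationship` and the example values are hypothetical. This page does not define the meaning of `targetFilterKeys`, so the comment describing it is an assumption.

```python
def mapped_relationship(key, rel_type, rel_class, source_entity_key, target_filter_keys,
                        target_entity, skip_target_creation=None, direction=None):
    """Build a mapped relationship object with the _mapping fields listed above."""
    if direction not in (None, "FORWARD", "REVERSE"):
        raise ValueError("relationshipDirection must be 'FORWARD' or 'REVERSE'")
    if not isinstance(target_filter_keys, list) or not all(
        isinstance(keys, list) for keys in target_filter_keys
    ):
        raise ValueError("targetFilterKeys must be an array of arrays")
    mapping = {
        "sourceEntityKey": source_entity_key,
        "targetFilterKeys": target_filter_keys,
        "targetEntity": target_entity,
    }
    # Optional fields are only included when set.
    if skip_target_creation is not None:
        mapping["skipTargetCreation"] = skip_target_creation
    if direction is not None:
        mapping["relationshipDirection"] = direction
    return {"_key": key, "_type": rel_type, "_class": rel_class, "_mapping": mapping}


rel = mapped_relationship(
    key="user:john@example.com|has|vendor:acme",
    rel_type="user_has_vendor",
    rel_class="HAS",
    source_entity_key="user:john@example.com",
    # Assumption: each inner array names targetEntity properties used to find the target.
    target_filter_keys=[["_type", "name"]],
    target_entity={"_type": "vendor", "_class": "Vendor", "name": "Acme"},
    direction="FORWARD",
)
```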

Sync Mode Constraints

Warning: Relationships cannot be uploaded in PATCH mode synchronization jobs. If you attempt to upload relationships to a PATCH mode job, you will receive the error: Relationships are not allowed in PATCH jobs

Upload Size Constraints

Each upload request:

  • Must contain at least 1 entity OR at least 1 relationship
  • Can include both the entities and relationships arrays in the same request
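The request-level rules above, together with the PATCH restriction in the warning, fit in one pre-flight check. The Python sketch below reuses the error wording from this page. The helper name `upload_problems` is hypothetical.

```python
def upload_problems(payload: dict, sync_mode: str = "DIFF") -> list:
    """Check the request-level rules for one upload body."""
    problems = []
    entities = payload.get("entities")
    relationships = payload.get("relationships")
    if entities == []:
        # Matches the "entities must have minimum 1 item" error listed on this page.
        problems.append("entities must have minimum 1 item")
    if not entities and not relationships:
        problems.append("include at least 1 entity or at least 1 relationship")
    if sync_mode == "PATCH" and relationships:
        problems.append("Relationships are not allowed in PATCH jobs")
    return problems


print(upload_problems({"relationships": [{"_key": "a"}]}, sync_mode="PATCH"))
```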

Common Validation Errors

When validation fails, you will receive a 400 Bad Request response with an error message. Here are common validation errors and how to resolve them:

Error Message | Cause | Resolution
Missing JupiterOne-Account header | Required header not provided | Include the jupiterone-account header in your request
/entities/0/_key is required | Missing required field | Add the _key field to the entity (in DIFF mode)
/entities/0 (entity key: "key-1") has invalid property name '_internal' | Property name starts with underscore | Remove the underscore prefix or rename the property
/entities/0/_class (entity key: "key-1") has invalid type. Valid types are string or array of strings. | Invalid _class type | Ensure _class is either a string or an array of strings
/entities/0/_key: maximum length exceeded | Field exceeds character limit | Reduce the length of the _key to 7000 characters or less
Relationships are not allowed in PATCH jobs | Attempting to upload relationships in PATCH mode | Use DIFF mode if you need to upload relationships
entities must have minimum 1 item | Empty entities array provided | Include at least one entity in the array, or remove the empty array

Example: Valid Entity Upload

{
"entities": [
{
"_key": "user:john@example.com",
"_type": "user",
"_class": ["User", "Person"],
"name": "John Doe",
"email": "john@example.com",
"active": true,
"tags": ["employee", "developer"]
}
]
}

Example: Valid Relationship Upload

{
"relationships": [
{
"_key": "user:john@example.com|has|group:engineers",
"_type": "user_has_group",
"_class": "HAS",
"_fromEntityKey": "user:john@example.com",
"_toEntityKey": "group:engineers"
}
]
}

Entity and Relationship Synchronization (Bulk Upload)

An integration job sends all of the latest entities and relationships to the persister. The persister then compares the new state to the old state and automatically applies the changes to the graph.

The persister exposes a public REST API for developing, testing, and running integrations outside the JupiterOne cloud infrastructure.

The synchronization API also supports synchronizing a group of entities and relationships from an API source by using a scope property. Entities and relationships can be grouped logically under an arbitrary scope value and uploaded to the persister through the synchronization API. The persister then determines the create, update, and delete operations within that scope. The scope value is stored on each entity and relationship in the _scope property.

Integration Job Bookkeeping

While an integration job is running, the persister will need to keep track of data as the job progresses.

This information will be tracked:

  • New entities
  • New relationships
  • Raw data associated with entities (including Content-Type)
  • Job status and progress counters
  • Job metadata (start time, source, etc.)

Phases of Synchronization

  1. Data Collection: An integration job or other tool runs, collects all data, and stores it temporarily on the filesystem.

  2. Data Upload: All data that represents "new state" is uploaded to the persister and associated with an integration job identifier. The "new state" will consist of entities, relationships, and raw data.

  3. Finalization: After an integration job has uploaded all data to the persister, "finalization" is triggered. During the "finalization" phase, the persister compares the "new state" with the "old state" and determines changes. The persister applies any detected changes immediately while the finalization task runs (they are not queued on a Kinesis stream).

    Entities are finalized first and relationships are finalized afterward (because relationships might reference new entities).
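The Finalization phase can be pictured as a keyed comparison. The Python sketch below is conceptual and is not the persister's implementation. It makes three simplifications: objects are keyed by `_key` within a single scope, any property change counts as an update, and keys missing from the new upload are marked for deletion. That last rule is the DIFF behavior; PATCH mode does not delete. Entities are compared before relationships, following the order described above. The names `Changes`, `diff_scope`, and `finalize` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Changes:
    create: list
    update: list
    delete: list


def diff_scope(old: dict, new: dict) -> Changes:
    """Compare old and new state for one scope; both map _key -> properties."""
    return Changes(
        create=sorted(k for k in new if k not in old),
        update=sorted(k for k in new if k in old and new[k] != old[k]),
        delete=sorted(k for k in old if k not in new),  # absent from the upload => deleted
    )


def finalize(old_entities: dict, new_entities: dict, old_rels: dict, new_rels: dict):
    """Diff entities first, then relationships (which may reference new entities)."""
    entity_changes = diff_scope(old_entities, new_entities)
    relationship_changes = diff_scope(old_rels, new_rels)
    return entity_changes, relationship_changes


old = {"1": {"name": "a"}, "2": {"name": "b"}, "4": {"name": "d"}}
new = {"1": {"name": "a"}, "2": {"name": "B"}, "3": {"name": "c"}}
print(diff_scope(old, new))
```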

Synchronization API Usage

Request Flags

ignoreDuplicates: Instructs the system not to throw an error when graph objects have duplicate keys. Instead, the latest graph object with a duplicated key is created.

Request Body Properties

source:

  • api for ad hoc data.
  • integration-external for custom integrations.

scope: The scope value can be any string. Use the same value in future uploads to update entities, relationships, and properties within that scope. Additionally:

  • Scope is required when the syncMode is DIFF.
  • Scope can only be used when the source is api.

syncMode:

  • DIFF is the default value when a syncMode is not specified. This mode will update/replace all of the entities/relationships within a specified scope. The full dataset should be provided, otherwise entities and relationships may be unintentionally deleted.
  • PATCH mode will only update or create the entities that are provided in the upload. Existing entities that are not included in the upload will remain unchanged.
Note: The CREATE_OR_UPDATE mode is deprecated and no longer functions.

integrationInstanceId:

  • Required when referencing a custom integration (when the source is integration-external).
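The body rules above can be combined into one builder, sketched here in Python. The page does not say whether the DIFF scope requirement also applies to integration-external jobs, which cannot use a scope. This sketch therefore assumes the requirement applies only to api-sourced jobs. It also accepts only the two source values documented above. The helper name `start_job_body` is hypothetical.

```python
from typing import Optional

SOURCES = ("api", "integration-external")  # the source values documented above


def start_job_body(source: str, scope: Optional[str] = None, sync_mode: str = "DIFF",
                   integration_instance_id: Optional[str] = None) -> dict:
    """Build a start-job body, enforcing the rules listed above."""
    if sync_mode not in ("DIFF", "PATCH"):
        raise ValueError(f"unsupported syncMode {sync_mode!r} (CREATE_OR_UPDATE is deprecated)")
    if source not in SOURCES:
        raise ValueError(f"unsupported source {source!r}")
    if scope is not None and source != "api":
        raise ValueError("scope can only be used when source is 'api'")
    # Assumption: the DIFF scope requirement applies to api-sourced jobs.
    if source == "api" and sync_mode == "DIFF" and scope is None:
        raise ValueError("scope is required when syncMode is DIFF")
    if source == "integration-external" and not integration_instance_id:
        raise ValueError("integrationInstanceId is required for custom integrations")
    body = {"source": source, "syncMode": sync_mode}
    if scope is not None:
        body["scope"] = scope
    if integration_instance_id:
        body["integrationInstanceId"] = integration_instance_id
    return body


print(start_job_body("api", scope="my-sync-job"))
```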

Start a synchronization job

Sample request:

POST /persister/synchronization/jobs
{
"source": "api",
"scope": "my-sync-job"
}

Sample request:

POST /persister/synchronization/jobs
{
"source": "integration-external",
"integrationInstanceId": "5465397d-8491-4a12-806a-04792839abe3"
}

Sample response:

{
"job": {
"source": "api",
"scope": "my-sync-job",
"id": "f445397d-8491-4a12-806a-04792839abe3",
"status": "AWAITING_UPLOADS",
"startTimestamp": 1586915139427,
"numEntitiesUploaded": 0,
"numEntitiesCreated": 0,
"numEntitiesUpdated": 0,
"numEntitiesDeleted": 0,
"numRelationshipsUploaded": 0,
"numRelationshipsCreated": 0,
"numRelationshipsUpdated": 0,
"numRelationshipsDeleted": 0
}
}

Get status of synchronization job

Sample request:

GET /persister/synchronization/jobs/f445397d-8491-4a12-806a-04792839abe3

Sample response:

{
"job": {
"source": "api",
"scope": "my-sync-job",
"id": "f445397d-8491-4a12-806a-04792839abe3",
"status": "AWAITING_UPLOADS",
"startTimestamp": 1586915139427,
"numEntitiesUploaded": 0,
"numEntitiesCreated": 0,
"numEntitiesUpdated": 0,
"numEntitiesDeleted": 0,
"numRelationshipsUploaded": 0,
"numRelationshipsCreated": 0,
"numRelationshipsUpdated": 0,
"numRelationshipsDeleted": 0
}
}

Note: numRelationshipCreateErrors indicates the number of relationships that could not be created when one or both entities are not found or have been soft deleted.

Upload batch of entities and relationships

Batch upload accepts JSON, CSV, and YAML, sent as text in the request body. Set the Content-Type request header to match the format:

Format | Content-Type
json | application/json
yaml | application/yaml
csv | text/csv

In the case of a CSV, the type of each graph object ("entity" or "relationship") is inferred from the presence of one or more of the following columns: _fromEntityKey, _fromEntityId, _toEntityKey, _toEntityId.
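A client can reproduce this inference to check a mixed CSV before uploading it. The Python sketch below assumes a row counts as a relationship when any of these columns has a non-empty value. That reading fits the mixed sample on this page, where entity rows leave those cells empty. JupiterOne's exact rule may differ, and the helper name `classify_rows` is hypothetical.

```python
import csv
import io

RELATIONSHIP_COLUMNS = ("_fromEntityKey", "_fromEntityId", "_toEntityKey", "_toEntityId")


def classify_rows(csv_text: str) -> list:
    """Label each CSV row as an entity or a relationship, paired with its _key."""
    labels = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: a row is a relationship when any endpoint column has a value.
        is_relationship = any(row.get(col) for col in RELATIONSHIP_COLUMNS)
        labels.append(("relationship" if is_relationship else "entity", row["_key"]))
    return labels


sample = '''"_type","_class","_key","displayName","_fromEntityKey","_toEntityKey"
"user_has_group","HAS","user-1|has|group-1","my_relationship_name","user-1","group-1"
"user","User","user-1","my_entity_name",,
'''
print(classify_rows(sample))
```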

Sample request:

POST /persister/synchronization/jobs/f445397d-8491-4a12-806a-04792839abe3/upload

Entity / Relationship JSON:

{
"entities": [
{
"_key": "1",
"_class": "DataStore",
"_type": "fake_entity",
"displayName": "my_datastore"
},
{
"_key": "2",
"_class": "Database",
"_type": "fake_entity",
"displayName": "my_database"
},
{
"_key": "3",
"_class": "Domain",
"_type": "fake_entity",
"displayName": "my_domain"
}
],
"relationships": [
{
"_key": "a",
"_type": "fake_relationship",
"_class": "IS",
"_fromEntityKey": "1",
"_toEntityKey": "2"
},
{
"_key": "b",
"_type": "fake_relationship",
"_class": "MANAGES",
"_fromEntityKey": "2",
"_toEntityKey": "3"
}
]
}

Entity / Relationship CSV

"_type","_class","_key","displayName","_fromEntityKey","_toEntityKey"
"<a relationship type>","<a relationship class>","<a relationship key>","my_relationship_name","<an entity key>","<an entity key>"
"<an entity type>","<an entity class>","<an entity key>","my_entity_name",,

Sample response:

{
"job": {
"source": "api",
"scope": "my-sync-job",
"id": "f445397d-8491-4a12-806a-04792839abe3",
"status": "AWAITING_UPLOADS",
"startTimestamp": 1586915752483,
"numEntitiesUploaded": 3,
"numEntitiesCreated": 0,
"numEntitiesUpdated": 0,
"numEntitiesDeleted": 0,
"numRelationshipsUploaded": 2,
"numRelationshipsCreated": 0,
"numRelationshipsUpdated": 0,
"numRelationshipsDeleted": 0
}
}
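The format table above translates directly into a request builder. The Python sketch below targets the upload, entities, or relationships endpoint on this page and sets the matching Content-Type. It builds the request without sending it. The base URL and the bearer-token `Authorization` header are assumptions, and the helper name `build_upload_request` is hypothetical.

```python
import urllib.request

BASE_URL = "https://api.us.jupiterone.io"  # assumption: adjust for your environment
CONTENT_TYPES = {"json": "application/json", "yaml": "application/yaml", "csv": "text/csv"}
ENDPOINTS = ("upload", "entities", "relationships")


def build_upload_request(job_id: str, body_text: str, fmt: str, account_id: str,
                         api_token: str, endpoint: str = "upload") -> urllib.request.Request:
    """Build (but do not send) a batch upload POST with the matching Content-Type."""
    if fmt not in CONTENT_TYPES:
        raise ValueError(f"unsupported format {fmt!r}")
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown upload endpoint {endpoint!r}")
    return urllib.request.Request(
        url=f"{BASE_URL}/persister/synchronization/jobs/{job_id}/{endpoint}",
        data=body_text.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": CONTENT_TYPES[fmt],
            "JupiterOne-Account": account_id,
            "Authorization": f"Bearer {api_token}",  # assumption: bearer token auth
        },
    )


csv_body = '"_type","_class","_key","displayName"\n"user","User","user-1","John Doe"\n'
req = build_upload_request("f445397d-8491-4a12-806a-04792839abe3", csv_body, "csv",
                           "my-account-id", "my-api-token", endpoint="entities")
```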

Upload batch of entities

Batch upload accepts JSON, CSV, and YAML, sent as text in the request body. Set the Content-Type request header to match the format:

Format | Content-Type
json | application/json
yaml | application/yaml
csv | text/csv

Sample request:

POST /persister/synchronization/jobs/f445397d-8491-4a12-806a-04792839abe3/entities

Upload Entity JSON

{
"entities": [
{
"_key": "1",
"_type": "fake_entity"
},
{
"_key": "2",
"_type": "fake_entity"
},
{
"_key": "3",
"_type": "fake_entity"
}
]
}

Upload Entity CSV

"_type","_class","_key","displayName"
"<an entity type>","<an entity class>","<an entity key>","my_entity_name"

Sample response:

{
"job": {
"source": "api",
"scope": "my-sync-job",
"id": "f445397d-8491-4a12-806a-04792839abe3",
"status": "AWAITING_UPLOADS",
"startTimestamp": 1586915752483,
"numEntitiesUploaded": 3,
"numEntitiesCreated": 0,
"numEntitiesUpdated": 0,
"numEntitiesDeleted": 0,
"numRelationshipsUploaded": 0,
"numRelationshipsCreated": 0,
"numRelationshipsUpdated": 0,
"numRelationshipsDeleted": 0
}
}

Upload batch of relationships

Batch upload accepts JSON, CSV, and YAML, sent as text in the request body. Set the Content-Type request header to match the format:

Format | Content-Type
json | application/json
yaml | application/yaml
csv | text/csv

Sample request:

POST /persister/synchronization/jobs/f445397d-8491-4a12-806a-04792839abe3/relationships

Upload Relationship JSON:

{
"relationships": [
{
"_key": "a",
"_type": "fake_relationship",
"_class": "HAS",
"_fromEntityKey": "1",
"_toEntityKey": "2"
},
{
"_key": "b",
"_type": "fake_relationship",
"_class": "HAS",
"_fromEntityKey": "2",
"_toEntityKey": "3"
}
]
}

Upload Relationship CSV:

"_type","_class","_key","displayName","_fromEntityKey","_toEntityKey"
"<a relationship type>","<a relationship class>","<a relationship key>","my_relationship_name","<an entity key>","<an entity key>"

Sample response:

{
"job": {
"source": "api",
"scope": "my-sync-job",
"id": "f445397d-8491-4a12-806a-04792839abe3",
"status": "AWAITING_UPLOADS",
"startTimestamp": 1586915752483,
"numEntitiesUploaded": 3,
"numEntitiesCreated": 0,
"numEntitiesUpdated": 0,
"numEntitiesDeleted": 0,
"numRelationshipsUploaded": 2,
"numRelationshipsCreated": 0,
"numRelationshipsUpdated": 0,
"numRelationshipsDeleted": 0
}
}

CSV Upload Data Types

JupiterOne will infer primitive types (e.g. strings, numbers, booleans) within columns automatically. If the value can be converted to a number or boolean, it will be converted during the upload process.

To include JSON arrays within a csv column, there are two acceptable ways to express these structures:

Double Quote Format

Use double quotes "" to escape quotes within a JSON array. This is the most common way to express and escape quote characters when embedding JSON in a CSV column.

JSON Array:

"_type","_class","_key","_id","custom"
"my_type","my_class","my_key","my_id","[""my_value"",100,true]"

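Python's `csv` module produces and reads this double-quote format natively. The sketch below shows both directions. Writing serializes the array with `json.dumps` and lets `csv.QUOTE_ALL` double the inner quotes. Reading reverses this with `json.loads`. The helper names `to_csv` and `parse_json_columns` are hypothetical.

```python
import csv
import io
import json


def to_csv(rows: list, fieldnames: list, json_columns: set) -> str:
    """Write rows as CSV, embedding the named columns as compact JSON."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        out = dict(row)
        for column in json_columns:
            out[column] = json.dumps(out[column], separators=(",", ":"))
        writer.writerow(out)  # the csv module escapes " as "" inside quoted fields
    return buf.getvalue()


def parse_json_columns(csv_text: str, json_columns: set) -> list:
    """Read CSV rows and decode the named columns as embedded JSON."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column in json_columns:
            row[column] = json.loads(row[column])  # "" has already become " here
        rows.append(row)
    return rows


record = {"_type": "my_type", "_class": "my_class", "_key": "my_key", "_id": "my_id",
          "custom": ["my_value", 100, True]}
text = to_csv([record], list(record), {"custom"})
print(text)
```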
Column Dot Notation

JSON arrays can also be described by using the value's JSON path (in dot notation) as the column name. Each element of the array then gets its own column, suffixed with its zero-based index in the array.

JSON Array:

"_type","_class","_key","_id","custom.0","custom.1","custom.2"
"my_type","my_class","my_key","my_id","my_value","100","true"

Parsed result (the same for both formats):

[
{
"_type": "my_type",
"_class": "my_class",
"_key": "my_key",
"_id": "my_id",
"custom": ["my_value", 100, true]
}
]
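The dot-notation format can be folded back into arrays as sketched below in Python. The type inference only approximates the rule described above. It converts `true`/`false`, then integers, then floats, and otherwise keeps the string. It also assumes that reserved `_` columns stay strings. JupiterOne's exact rules may differ. The names `infer` and `parse_dot_notation` are hypothetical.

```python
import csv
import io
import re

INDEXED_COLUMN = re.compile(r"^(?P<name>[^.]+)\.(?P<index>\d+)$")


def infer(value):
    """Approximate primitive inference: booleans, then numbers, else the string."""
    if value is None or value == "":
        return value
    if value in ("true", "false"):
        return value == "true"
    for convert in (int, float):
        try:
            return convert(value)
        except ValueError:
            pass
    return value


def parse_dot_notation(csv_text: str) -> list:
    """Fold columns such as custom.0, custom.1 into a `custom` array."""
    objects = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        obj, arrays = {}, {}
        for column, raw in row.items():
            # Assumption: reserved _ columns stay strings; others are inferred.
            value = raw if column.startswith("_") else infer(raw)
            match = INDEXED_COLUMN.match(column)
            if match:
                arrays.setdefault(match["name"], {})[int(match["index"])] = value
            else:
                obj[column] = value
        for name, items in arrays.items():
            obj[name] = [items[i] for i in sorted(items)]  # order by zero-based index
        objects.append(obj)
    return objects


text = (
    '"_type","_class","_key","_id","custom.0","custom.1","custom.2"\n'
    '"my_type","my_class","my_key","my_id","my_value","100","true"\n'
)
print(parse_dot_notation(text))
```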
Note: Arrays are valid on entity properties only. Relationships with array properties return an error.

Getting Bulk Upload URLs for Synchronization Jobs

You can use a bulk upload URL to upload a file that has the same structure as the body of a normal upload request. The persister processes this file during finalization.

Currently, the persister only allows one bulk upload per synchronization job. If you request a bulk upload URL more than once, the persister returns the same URL until it expires. Upload URLs expire in one hour.

Sample request:

POST /persister/synchronization/jobs/f445397d-8491-4a12-806a-04792839abe3/uploadUrl

Sample response:

{
"uploadUrl": "{a very long signed S3 URL}",
"expiresAt": 1631198730000
}
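The `expiresAt` value in the sample response is epoch milliseconds, and an upload URL expires after one hour. A client should therefore check it before reuse. The Python sketch below does that check and builds the file upload without sending it. Sending the file as an HTTP PUT is an assumption about the signed S3 URL, not something this page states. The helper names and the 60-second safety margin are hypothetical.

```python
import time
import urllib.request
from typing import Optional


def upload_url_is_usable(expires_at_ms: int, now_ms: Optional[int] = None,
                         margin_ms: int = 60_000) -> bool:
    """True if the URL will not expire within `margin_ms` (expiresAt is epoch millis)."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return now_ms + margin_ms < expires_at_ms


def build_file_upload(upload_url: str, body_text: str) -> urllib.request.Request:
    """Build (but do not send) the file upload to the signed URL."""
    # Assumption: the signed S3 URL expects an HTTP PUT of the raw file body.
    return urllib.request.Request(upload_url, data=body_text.encode("utf-8"), method="PUT")
```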

Finalize synchronization job

Sample request:

POST /persister/synchronization/jobs/f445397d-8491-4a12-806a-04792839abe3/finalize

Sample response (when running locally):

{
"job": {
"source": "api",
"scope": "my-sync-job",
"id": "f445397d-8491-4a12-806a-04792839abe3",
"status": "FINISHED",
"startTimestamp": 1586915752483,
"numEntitiesUploaded": 3,
"numEntitiesCreated": 3,
"numEntitiesUpdated": 0,
"numEntitiesDeleted": 0,
"numRelationshipsUploaded": 2,
"numRelationshipsCreated": 2,
"numRelationshipsUpdated": 0,
"numRelationshipsDeleted": 0
}
}

Sample response (when running in AWS):

{
"job": {
"source": "api",
"scope": "my-sync-job",
"id": "f445397d-8491-4a12-806a-04792839abe3",
"status": "FINALIZE_PENDING",
"startTimestamp": 1586915752483,
"numEntitiesUploaded": 3,
"numEntitiesCreated": 0,
"numEntitiesUpdated": 0,
"numEntitiesDeleted": 0,
"numRelationshipsUploaded": 2,
"numRelationshipsCreated": 0,
"numRelationshipsUpdated": 0,
"numRelationshipsDeleted": 0
}
}
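When finalization returns `FINALIZE_PENDING`, a client has to poll the job status endpoint until the job finishes. The Python sketch below is a polling loop with a timeout. It treats only the statuses shown on this page as pending and returns any other status to the caller. The status fetch is injected as a callable, which would wrap `GET /persister/synchronization/jobs/{id}` and return `job.status`. Sleep and clock are injectable for testing. The helper name and default timings are hypothetical.

```python
import time
from typing import Callable

PENDING_STATUSES = ("AWAITING_UPLOADS", "FINALIZE_PENDING")


def wait_for_finish(get_status: Callable[[], str], timeout_s: float = 600.0,
                    interval_s: float = 5.0, sleep=time.sleep, clock=time.monotonic) -> str:
    """Poll `get_status` until the job leaves the pending statuses shown on this page."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status not in PENDING_STATUSES:
            return status  # FINISHED, or a status not documented here
        if clock() >= deadline:
            raise TimeoutError(f"job still {status} after {timeout_s} seconds")
        sleep(interval_s)
```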

Synchronization job status upon completion

Sample request:

GET /persister/synchronization/jobs/f445397d-8491-4a12-806a-04792839abe3

Sample response:

{
"job": {
"source": "api",
"scope": "my-sync-job",
"id": "f445397d-8491-4a12-806a-04792839abe3",
"status": "FINISHED",
"startTimestamp": 1586915752483,
"numEntitiesUploaded": 3,
"numEntitiesCreated": 3,
"numEntitiesUpdated": 0,
"numEntitiesDeleted": 0,
"numRelationshipsUploaded": 2,
"numRelationshipsCreated": 2,
"numRelationshipsUpdated": 0,
"numRelationshipsDeleted": 0
}
}