Bulk upload schema

JupiterOne lets you upload asset data in bulk, either in the JupiterOne UI or via the API.

Defining scopes for bulk uploads

When performing a bulk upload through the API, provide the scope as a parameter when starting the synchronization job. The scope determines which set of data the bulk upload applies to. If you use bulk upload within the UI, you are prompted to choose a scope when uploading a file. You can find the bulk upload tool in the JupiterOne dashboard by navigating to Assets > Add new asset (the plus sign in the top-right).

Bulk uploads trigger a data synchronization process that automatically updates or deletes entities and relationships as needed. Previously existing entities and relationships within the same _scope that no longer exist in the latest upload will be marked for deletion.

To avoid unintended deletions, always include the complete set of entities and relationships within the defined _scope of the upload. The _scope defines which entities and relationships are affected by the bulk upload, so deletions are limited to that scope. All data outside of that scope remains unaffected.

For example:

POST /persister/synchronization/jobs
{
"source": "api",
"scope": "my-sync-job"
}
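
The deletion behavior described above can be pictured as a set comparison within one scope. The following is a minimal sketch of that idea, not JupiterOne's implementation; `plan_scope_sync` and its arguments are hypothetical names used only for illustration.

```python
def plan_scope_sync(existing_keys, uploaded_keys):
    """Classify keys within a single scope the way a full (DIFF-style) sync would.

    This is an illustrative model only: keys present in the upload but not yet
    in the scope are created, keys present in both are update candidates, and
    keys missing from the upload are marked for deletion.
    """
    existing = set(existing_keys)
    uploaded = set(uploaded_keys)
    return {
        "create": sorted(uploaded - existing),
        "update": sorted(uploaded & existing),
        "delete": sorted(existing - uploaded),
    }


# The scope currently holds keys 1-3; the new upload omits key "1".
plan = plan_scope_sync(existing_keys=["1", "2", "3"], uploaded_keys=["2", "3", "4"])
print(plan)
```

Because key "1" is absent from the upload, it lands in the `delete` bucket, which is why a partial upload into an existing scope is risky.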

To successfully upload entity and relationship data, follow the schema outlined below:

{
"entities": [
{
"_key": "1",
"_type": "bulk_upload_entity",
"_class": "EntityClass",
"displayName": "Entity's displayName to show in UI",
"owner": "Owner's name"
// ...any other properties defined for the given type/class
},
{
"_key": "2",
"_type": "bulk_upload_entity",
"_class": "EntityClass",
"displayName": "Entity's displayName to show in UI",
"owner": "Owner's name"
// ...any other properties defined for the given type/class
}
],
"relationships": [
{
"_key": "a",
"_type": "bulk_upload_relationship",
"_class": "VERB",
"_fromEntityKey": "1",
"_toEntityKey": "2"
},
{
"_key": "b",
"_type": "bulk_upload_relationship",
"_class": "VERB",
"_fromEntityKey": "2",
"_toEntityKey": "1"
}
]
}
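
Before uploading a payload shaped like the schema above, it can help to confirm that every relationship points at an entity included in the same payload. The sketch below is a local pre-check only; `find_dangling_relationships` is a hypothetical helper, and it assumes all referenced entities are in scope (referenced by key rather than by `_id`).

```python
def find_dangling_relationships(payload):
    """Return (relationship key, side, entity key) for unresolved references."""
    entity_keys = {entity["_key"] for entity in payload.get("entities", [])}
    dangling = []
    for rel in payload.get("relationships", []):
        for side in ("_fromEntityKey", "_toEntityKey"):
            key = rel.get(side)
            # Only key-based references can be checked against the payload.
            if key is not None and key not in entity_keys:
                dangling.append((rel["_key"], side, key))
    return dangling


payload = {
    "entities": [
        {"_key": "1", "_type": "bulk_upload_entity", "_class": "EntityClass"},
        {"_key": "2", "_type": "bulk_upload_entity", "_class": "EntityClass"},
    ],
    "relationships": [
        {"_key": "a", "_type": "bulk_upload_relationship", "_class": "VERB",
         "_fromEntityKey": "1", "_toEntityKey": "2"},
        {"_key": "b", "_type": "bulk_upload_relationship", "_class": "VERB",
         "_fromEntityKey": "2", "_toEntityKey": "9"},
    ],
}
problems = find_dangling_relationships(payload)
print(problems)
```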

Entity Properties

Properties with _ prefix are reserved as JupiterOne system internal metadata properties. Other than _key, _type, _class as listed above, any other property beginning with _ will be ignored when processing the upload.

| Property | Type | Description |
| _key | string | A unique identifier/key for this entity within the scope defined by _scope. |
| _type | string | User-defined type for this entity. Value should be in snake_case. |
| _class | string or string[] | The defined class for this entity. Value should be in TitleCase. |
| owner | string | Identifier for the person/thing responsible for this entity. |
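
The property rules above can be checked locally before an upload. This is a minimal sketch under stated assumptions: `review_entity` is a hypothetical helper, and the snake_case pattern is one reasonable reading of the convention, not a rule published by JupiterOne.

```python
import re

# System properties the section says are accepted on uploaded entities.
ACCEPTED_SYSTEM_PROPERTIES = {"_key", "_type", "_class"}
# Assumed definition of snake_case: lowercase words joined by underscores.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def review_entity(entity):
    """Return human-readable notes about properties that may not behave as intended."""
    notes = []
    for name in entity:
        if name.startswith("_") and name not in ACCEPTED_SYSTEM_PROPERTIES:
            notes.append(f"{name}: ignored on upload")
    entity_type = entity.get("_type", "")
    if not SNAKE_CASE.match(entity_type):
        notes.append(f"_type {entity_type!r} is not snake_case")
    return notes


print(review_entity({"_key": "1", "_type": "BulkUpload", "_internal": 1}))
```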

Relationship Properties

Properties with _ prefix are reserved as JupiterOne system internal metadata properties. Other than _key, _type, _class, _fromEntityKey, and _toEntityKey as listed above, any other property beginning with _ will be ignored when processing the upload.

| Property | Type | Description |
| _key | string | A unique identifier/key for this relationship within the defined _scope. |
| _type | string | User-defined type for this relationship. Value should be in snake_case. |
| _class | string | Relationship class. Typically a third-person singular verb such as HAS or MANAGES or ALLOWS. Value should be in CAPS. |
| _fromEntityKey | string | The unique key for the entity on the "from" side of this relationship. Use this for in-scope entities. |
| _toEntityKey | string | The unique key for the entity on the "to" side of this relationship. Use this for in-scope entities. |
| _fromEntityId | string | The unique _id for the entity on the "from" side of this relationship. Use this for out-of-scope entities. |
| _toEntityId | string | The unique _id for the entity on the "to" side of this relationship. Use this for out-of-scope entities. |

Read more about creating relationships between entities.
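
The choice between the key-based and id-based endpoint properties can be sketched as follows. The helper names `endpoint` and `build_relationship` are hypothetical; the property names come from the table above, and the `_id` value is a placeholder.

```python
def endpoint(side, entity, in_scope):
    """Return the endpoint property for one side ('from' or 'to') of a relationship."""
    if side not in ("from", "to"):
        raise ValueError("side must be 'from' or 'to'")
    if in_scope:
        # In-scope entities are referenced by their _key.
        return {f"_{side}EntityKey": entity["_key"]}
    # Out-of-scope entities are referenced by their _id.
    return {f"_{side}EntityId": entity["_id"]}


def build_relationship(key, rel_type, rel_class, source, target):
    """source and target are (entity, in_scope) pairs."""
    rel = {"_key": key, "_type": rel_type, "_class": rel_class}
    rel.update(endpoint("from", *source))
    rel.update(endpoint("to", *target))
    return rel


rel = build_relationship(
    "a", "bulk_upload_relationship", "HAS",
    source=({"_key": "1"}, True),                  # uploaded in this scope
    target=({"_id": "example-uuid-01"}, False),    # already exists elsewhere
)
print(rel)
```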

Entity and Relationship Synchronization (Bulk Upload)

An integration job is responsible for sending all of the latest entities and relationships to the persister. The persister compares the new state to the old state and automatically applies the changes to the graph.

The persister exposes a public REST API that will be used when developing, testing, and running integrations outside the JupiterOne cloud infrastructure.

The synchronization API also supports synchronizing a group of entities and relationships from an API source by using a scope property. Entities and relationships can be logically grouped under an arbitrary scope value and uploaded to the persister via the synchronization API. The create, update, and delete operations are then determined automatically within the given scope. The scope value is stored on the entities and relationships in the _scope property.

Integration Job Bookkeeping

While an integration job is running, the persister will need to keep track of data as the job progresses.

This information will be tracked:

  • New entities
  • New relationships
  • Raw data associated with entities (including Content-Type)
  • Job status and progress counters
  • Job metadata (start time, source, etc.)

Phases of Synchronization

  1. Data Collection: An integration job or other tool runs, collects all data, and stores it temporarily on the filesystem.

  2. Data Upload: All data that represents "new state" is uploaded to the persister and associated with an integration job identifier. The "new state" will consist of entities, relationships, and raw data.

  3. Finalization: After an integration has uploaded all data to the persister, "finalization" is triggered. During the "finalization" phase, the persister compares the "new state" with the "old state" and determines changes. The persister immediately performs any changes that are detected during the run of the finalization task (they are not queued on a Kinesis stream).

    Entities are finalized first and relationships are finalized afterward (because relationships might reference new entities).

Synchronization API Usage

Request Flags

ignoreDuplicates: Instructs the system not to throw an error if there are graph objects with duplicate keys. This allows the latest graph object to be created even when its key is already in use.
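
To see whether a payload would rely on this flag, duplicate keys can be detected locally before uploading. The sketch below is illustrative only: `duplicate_keys` and `keep_latest` are hypothetical helpers, and treating "latest" as "last in upload order" is an assumption, not documented behavior.

```python
from collections import Counter


def duplicate_keys(graph_objects):
    """Return the _key values that occur more than once."""
    counts = Counter(obj["_key"] for obj in graph_objects)
    return sorted(key for key, count in counts.items() if count > 1)


def keep_latest(graph_objects):
    """Keep one object per _key, letting later objects overwrite earlier ones."""
    latest = {}
    for obj in graph_objects:
        latest[obj["_key"]] = obj
    return list(latest.values())


entities = [
    {"_key": "1", "displayName": "first"},
    {"_key": "2", "displayName": "other"},
    {"_key": "1", "displayName": "second"},
]
print(duplicate_keys(entities))
print(keep_latest(entities))
```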

Request Body Properties

source:

  • api for ad hoc data.
  • integration-external for custom integrations.

scope: The scope value can be set to any string. Use the same value in the future when updating entities, relationships, or properties within that scope. Additionally:

  • Scope is required when the syncMode is DIFF.
  • Scope can only be used when the source is api.

syncMode:

  • DIFF is the default value when a syncMode is not specified. This mode will update/replace all of the entities/relationships within a specified scope. The full dataset should be provided, otherwise entities and relationships may be unintentionally deleted.
  • CREATE_OR_UPDATE should be used when you are editing an existing scope of data. Use this mode when you want to add, update, or delete a subset of entities/relationships.

integrationInstanceId:

  • Required when referencing a custom integration (the source is equal to integration-external).
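
The rules for the request body properties can be encoded in a small builder that fails early on invalid combinations. This is a sketch, not an official client: `build_job_request` is a hypothetical name, and it assumes the "scope is required when the syncMode is DIFF" rule applies to jobs whose source is api, since scope cannot be used with other sources.

```python
def build_job_request(source, scope=None, sync_mode=None, integration_instance_id=None):
    """Build the JSON body for starting a synchronization job."""
    effective_mode = sync_mode or "DIFF"  # DIFF is the documented default
    if scope is not None and source != "api":
        raise ValueError("scope can only be used when source is 'api'")
    if source == "api" and effective_mode == "DIFF" and not scope:
        raise ValueError("scope is required when syncMode is DIFF")
    if source == "integration-external" and not integration_instance_id:
        raise ValueError("integrationInstanceId is required for integration-external")

    body = {"source": source}
    if scope:
        body["scope"] = scope
    if sync_mode:
        body["syncMode"] = sync_mode
    if integration_instance_id:
        body["integrationInstanceId"] = integration_instance_id
    return body


print(build_job_request("api", scope="my-sync-job"))
print(build_job_request("api", sync_mode="CREATE_OR_UPDATE"))
```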

Start a synchronization job

Sample request:

POST /persister/synchronization/jobs
{
"source": "api",
"scope": "my-sync-job"
}

Sample request:

POST /persister/synchronization/jobs
{
"source": "integration-managed",
"integrationInstanceId": "5465397d-8491-4a12-806a-04792839abe3"
}

Sample response:

{
"job": {
"source": "api",
"scope": "my-sync-job",
"id": "f445397d-8491-4a12-806a-04792839abe3",
"status": "AWAITING_UPLOADS",
"startTimestamp": 1586915139427,
"numEntitiesUploaded": 0,
"numEntitiesCreated": 0,
"numEntitiesUpdated": 0,
"numEntitiesDeleted": 0,
"numRelationshipsUploaded": 0,
"numRelationshipsCreated": 0,
"numRelationshipsUpdated": 0,
"numRelationshipsDeleted": 0
}
}

Get status of synchronization job

Sample request:

GET /persister/synchronization/jobs/f445397d-8491-4a12-806a-04792839abe3

Sample response:

{
"job": {
"source": "api",
"scope": "my-sync-job",
"id": "f445397d-8491-4a12-806a-04792839abe3",
"status": "AWAITING_UPLOADS",
"startTimestamp": 1586915139427,
"numEntitiesUploaded": 0,
"numEntitiesCreated": 0,
"numEntitiesUpdated": 0,
"numEntitiesDeleted": 0,
"numRelationshipsUploaded": 0,
"numRelationshipsCreated": 0,
"numRelationshipsUpdated": 0,
"numRelationshipsDeleted": 0
}
}

Note: numRelationshipCreateErrors indicates the number of relationships that could not be created when one or both entities are not found or have been soft deleted.

Upload batch of entities and relationships

Batch upload accepts the formats: json, csv, and yaml sent as text in the request body. The following Content-Type request headers should be set according to the intended type:

| Format | Content-Type |
| json | application/json |
| yaml | application/yaml |
| csv | text/csv |

In the case of a csv, the type of graph object ("entity" or "relationship") is inferred by the presence of one or more of the following columns: _fromEntityKey, _fromEntityId, _toEntityKey, _toEntityId.
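
The inference rule can be approximated per row with Python's standard `csv` module. This is a sketch under an assumption: a row counts as a relationship when at least one of the four endpoint columns holds a non-empty value, which is how a mixed file with empty endpoint cells for entities would be read. `classify_rows` is a hypothetical helper, and the sample values are illustrative.

```python
import csv
import io

RELATIONSHIP_COLUMNS = ("_fromEntityKey", "_fromEntityId", "_toEntityKey", "_toEntityId")


def classify_rows(csv_text):
    """Return (kind, _key) for each data row, where kind is 'entity' or 'relationship'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    result = []
    for row in reader:
        is_relationship = any(row.get(column) for column in RELATIONSHIP_COLUMNS)
        result.append(("relationship" if is_relationship else "entity", row["_key"]))
    return result


sample = "\n".join([
    '"_type","_class","_key","displayName","_fromEntityKey","_toEntityKey"',
    '"fake_relationship","HAS","a","my_relationship_name","1","2"',
    '"fake_entity","DataStore","1","my_entity_name",,',
]) + "\n"
print(classify_rows(sample))
```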

Sample request:

POST /persister/synchronization/jobs/f445397d-8491-4a12-806a-04792839abe3/upload

Entity / Relationship JSON:

{
"entities": [
{
"_key": "1",
"_class": "DataStore",
"_type": "fake_entity",
"displayName": "my_datastore"
},
{
"_key": "2",
"_class": "Database",
"_type": "fake_entity",
"displayName": "my_database"
},
{
"_key": "3",
"_class": "Domain",
"_type": "fake_entity",
"displayName": "my_domain"
}
],
"relationships": [
{
"_key": "a",
"_type": "fake_relationship",
"_fromEntityKey": "1",
"_toEntityKey": "2"
},
{
"_key": "b",
"_type": "fake_relationship",
"_fromEntityKey": "2",
"_toEntityKey": "3"
}
]
}

Entity / Relationship CSV:

"_type","_class","_key","displayName","_fromEntityKey","_toEntityKey"
"<a relationship type>","<a relationship class>","<a relationship key>","my_relationship_name","<an entity key>","<an entity key>"
"<an entity type>","<an entity class>","<an entity key>","my_entity_name",,

Sample response:

{
"job": {
"source": "api",
"scope": "my-sync-job",
"id": "f445397d-8491-4a12-806a-04792839abe3",
"status": "AWAITING_UPLOADS",
"startTimestamp": 1586915752483,
"numEntitiesUploaded": 3,
"numEntitiesCreated": 0,
"numEntitiesUpdated": 0,
"numEntitiesDeleted": 0,
"numRelationshipsUploaded": 2,
"numRelationshipsCreated": 0,
"numRelationshipsUpdated": 0,
"numRelationshipsDeleted": 0
}
}

Upload batch of entities

Batch upload accepts the formats: json, csv, and yaml sent as text in the request body. The following Content-Type request headers should be set according to the intended type:

| Format | Content-Type |
| json | application/json |
| yaml | application/yaml |
| csv | text/csv |
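
A request for this endpoint can be assembled with the standard library so that the Content-Type always matches the body format. The sketch below only builds the request object and does not send it. `build_entities_upload` is a hypothetical helper, the base URL is a placeholder, and authentication headers are omitted because this section does not describe them.

```python
import urllib.request

CONTENT_TYPES = {
    "json": "application/json",
    "yaml": "application/yaml",
    "csv": "text/csv",
}


def build_entities_upload(base_url, job_id, body_text, fmt):
    """Build (but do not send) a POST request for the entities upload endpoint."""
    if fmt not in CONTENT_TYPES:
        raise ValueError(f"unsupported format: {fmt}")
    url = f"{base_url}/persister/synchronization/jobs/{job_id}/entities"
    return urllib.request.Request(
        url,
        data=body_text.encode("utf-8"),  # the body is sent as text
        headers={"Content-Type": CONTENT_TYPES[fmt]},
        method="POST",
    )


request = build_entities_upload(
    "https://api.example.invalid",  # placeholder host, not a real endpoint
    "f445397d-8491-4a12-806a-04792839abe3",
    '"_type","_class","_key","displayName"\n"fake_entity","DataStore","1","my_datastore"\n',
    "csv",
)
print(request.get_method(), request.full_url)
```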

Sample request:

POST /persister/synchronization/jobs/f445397d-8491-4a12-806a-04792839abe3/entities

Upload Entity JSON:

{
"entities": [
{
"_key": "1",
"_type": "fake_entity"
},
{
"_key": "2",
"_type": "fake_entity"
},
{
"_key": "3",
"_type": "fake_entity"
}
]
}

Upload Entity CSV:

"_type","_class","_key","displayName"
"<an entity type>","<an entity class>","<an entity key>","my_entity_name"

Sample response:

{
"job": {
"source": "api",
"scope": "my-sync-job",
"id": "f445397d-8491-4a12-806a-04792839abe3",
"status": "AWAITING_UPLOADS",
"startTimestamp": 1586915752483,
"numEntitiesUploaded": 3,
"numEntitiesCreated": 0,
"numEntitiesUpdated": 0,
"numEntitiesDeleted": 0,
"numRelationshipsUploaded": 0,
"numRelationshipsCreated": 0,
"numRelationshipsUpdated": 0,
"numRelationshipsDeleted": 0
}
}

Upload batch of relationships

Batch upload accepts the formats: json, csv, and yaml sent as text in the request body. The following Content-Type request headers should be set according to the intended type:

| Format | Content-Type |
| json | application/json |
| yaml | application/yaml |
| csv | text/csv |

Sample request:

POST /persister/synchronization/jobs/f445397d-8491-4a12-806a-04792839abe3/relationships

Upload Relationship JSON:

{
"relationships": [
{
"_key": "a",
"_type": "fake_relationship",
"_fromEntityKey": "1",
"_toEntityKey": "2"
},
{
"_key": "b",
"_type": "fake_relationship",
"_fromEntityKey": "2",
"_toEntityKey": "3"
}
]
}

Upload Relationship CSV:

"_type","_class","_key","displayName","_fromEntityKey","_toEntityKey"
"<a relationship type>","<a relationship class>","<a relationship key>","my_relationship_name","<an entity key>","<an entity key>"

Sample response:

{
"job": {
"source": "api",
"scope": "my-sync-job",
"id": "f445397d-8491-4a12-806a-04792839abe3",
"status": "AWAITING_UPLOADS",
"startTimestamp": 1586915752483,
"numEntitiesUploaded": 3,
"numEntitiesCreated": 0,
"numEntitiesUpdated": 0,
"numEntitiesDeleted": 0,
"numRelationshipsUploaded": 2,
"numRelationshipsCreated": 0,
"numRelationshipsUpdated": 0,
"numRelationshipsDeleted": 0
}
}

CSV Upload Data Types

JupiterOne will infer primitive types (e.g. strings, numbers, booleans) within columns automatically. If the value can be converted to a number or boolean, it will be converted during the upload process.
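
The kind of conversion described here can be illustrated with a small function. The exact rules JupiterOne applies are not specified in this section, so `infer_primitive` is only a plausible approximation (booleans first, then integers, then floats, otherwise the original string).

```python
def infer_primitive(cell):
    """Convert a CSV cell to bool, int, or float when possible; otherwise keep the string."""
    lowered = cell.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    try:
        return int(cell)
    except ValueError:
        pass
    try:
        return float(cell)
    except ValueError:
        return cell


for cell in ["my_value", "100", "1.5", "true"]:
    print(repr(cell), "->", repr(infer_primitive(cell)))
```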

To include JSON arrays within a csv column, there are two acceptable ways to express these structures:

Double Quote Format

Use double quotes "" to escape quotes within a JSON array. This format is the most common way to express and escape quote characters when embedding JSON within a csv column.

JSON Array:

"_type","_class","_key","_id","custom"
"my_type","my_class","my_key","my_id","[""my_value"",100,true]"
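
The doubled quotes are standard CSV escaping, so a standard CSV parser recovers the embedded JSON text unchanged. The sketch below shows this with Python's `csv` and `json` modules; it demonstrates the escaping convention only and makes no claim about how the persister parses the file.

```python
import csv
import io
import json

text = (
    '"_type","_class","_key","_id","custom"\n'
    '"my_type","my_class","my_key","my_id","[""my_value"",100,true]"\n'
)

row = next(csv.DictReader(io.StringIO(text)))
# After CSV unescaping, the cell holds plain JSON text.
raw_cell = row["custom"]
custom = json.loads(raw_cell)
print(raw_cell)
print(custom)
```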

Column Dot Notation

JSON arrays can also be described by using the value's JSON path (via dot notation) within the name of the column. Each element of that JSON array would then receive its own column with a zero indexed number specifying its location in the array.

JSON Array:

"_type","_class","_key","_id","custom.0","custom.1","custom.2"
"my_type","my_class","my_key","my_id","my_value","100","true"

Both formats describe the same JSON array:

[
{
"_type": "my_type",
"_class": "my_class",
"_key": "my_key",
"_id": "my_id",
"custom": ["my_value", 100, true]
}
]

Note: Arrays are valid on entity properties only. Relationships with array properties will return an error.
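
The column dot notation can be folded back into arrays with a few lines of code. This is a sketch of the idea only: `fold_indexed_columns` is a hypothetical helper, it handles a single level of `name.index` columns, and it leaves values as strings (type conversion is a separate step).

```python
import re

# Matches column names such as "custom.0" or "custom.12".
INDEXED_COLUMN = re.compile(r"^(?P<name>.+)\.(?P<index>\d+)$")


def fold_indexed_columns(row):
    """Collapse columns named '<name>.<n>' into a single list under '<name>'."""
    result = {}
    arrays = {}
    for column, value in row.items():
        match = INDEXED_COLUMN.match(column)
        if match:
            arrays.setdefault(match.group("name"), {})[int(match.group("index"))] = value
        else:
            result[column] = value
    for name, items in arrays.items():
        # Order elements by their zero-based index, not by column position.
        result[name] = [items[index] for index in sorted(items)]
    return result


row = {"_key": "my_key", "custom.1": "100", "custom.0": "my_value", "custom.2": "true"}
print(fold_indexed_columns(row))
```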

Getting Bulk Upload URLs for Synchronization Jobs

You can use a bulk upload URL to upload a file that has the same structure as the body of a normal upload request. The persister processes this file during finalization.

Currently, the persister only allows one bulk upload per synchronization job. If you request a bulk upload URL more than once, the persister returns the same URL until it expires. Upload URLs expire in one hour.

Sample request:

POST /persister/synchronization/jobs/f445397d-8491-4a12-806a-04792839abe3/uploadUrl

Sample response:

{
"uploadUrl": "{a very long signed S3 URL}",
"expiresAt": 1631198730000
}
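
Because upload URLs expire, a client may want to check the remaining lifetime before starting a large upload. The sketch below assumes `expiresAt` is a Unix timestamp in milliseconds, which matches the magnitude of the sample value but is not stated in this section. `upload_url_is_usable` and the safety margin are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def upload_url_is_usable(response, now=None, safety_margin_s=60):
    """Return True if the upload URL has more than safety_margin_s seconds left."""
    now = now or datetime.now(timezone.utc)
    # Assumption: expiresAt is milliseconds since the Unix epoch.
    expires = datetime.fromtimestamp(response["expiresAt"] / 1000, tz=timezone.utc)
    return (expires - now).total_seconds() > safety_margin_s


response = {"uploadUrl": "{a very long signed S3 URL}", "expiresAt": 1631198730000}
expiry = datetime.fromtimestamp(1631198730, tz=timezone.utc)

print(upload_url_is_usable(response, now=expiry - timedelta(minutes=30)))
print(upload_url_is_usable(response, now=expiry - timedelta(seconds=30)))
```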

Finalize synchronization job

Sample request:

POST /persister/synchronization/jobs/f445397d-8491-4a12-806a-04792839abe3/finalize

Sample response (when running locally):

{
"job": {
"source": "api",
"scope": "my-sync-job",
"id": "f445397d-8491-4a12-806a-04792839abe3",
"status": "FINISHED",
"startTimestamp": 1586915752483,
"numEntitiesUploaded": 3,
"numEntitiesCreated": 3,
"numEntitiesUpdated": 0,
"numEntitiesDeleted": 0,
"numRelationshipsUploaded": 2,
"numRelationshipsCreated": 2,
"numRelationshipsUpdated": 0,
"numRelationshipsDeleted": 0
}
}

Sample response (when running in AWS):

{
"job": {
"source": "api",
"scope": "my-sync-job",
"id": "f445397d-8491-4a12-806a-04792839abe3",
"status": "FINALIZE_PENDING",
"startTimestamp": 1586915752483,
"numEntitiesUploaded": 3,
"numEntitiesCreated": 0,
"numEntitiesUpdated": 0,
"numEntitiesDeleted": 0,
"numRelationshipsUploaded": 2,
"numRelationshipsCreated": 0,
"numRelationshipsUpdated": 0,
"numRelationshipsDeleted": 0
}
}

Synchronization job status upon completion

Sample request:

GET /persister/synchronization/jobs/f445397d-8491-4a12-806a-04792839abe3

Sample response:

{
"job": {
"source": "api",
"scope": "my-sync-job",
"id": "f445397d-8491-4a12-806a-04792839abe3",
"status": "FINISHED",
"startTimestamp": 1586915752483,
"numEntitiesUploaded": 3,
"numEntitiesCreated": 3,
"numEntitiesUpdated": 0,
"numEntitiesDeleted": 0,
"numRelationshipsUploaded": 2,
"numRelationshipsCreated": 2,
"numRelationshipsUpdated": 0,
"numRelationshipsDeleted": 0
}
}
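
Since finalization may return FINALIZE_PENDING before the job reaches FINISHED, a client typically polls the status endpoint. The following sketch keeps the HTTP call out of scope by accepting a `fetch_job` callable (hypothetical) that returns the parsed status response. Only the statuses shown in this document are handled; any failure statuses would need their own handling.

```python
import time


def wait_for_job(fetch_job, job_id, max_polls=30, interval_s=2.0, sleep=time.sleep):
    """Poll until the job status is FINISHED, or raise TimeoutError."""
    for attempt in range(max_polls):
        job = fetch_job(job_id)["job"]
        if job["status"] == "FINISHED":
            return job
        if attempt < max_polls - 1:
            sleep(interval_s)
    raise TimeoutError(f"job {job_id} did not reach FINISHED after {max_polls} polls")


# Stand-in for GET /persister/synchronization/jobs/<jobId>, for demonstration only.
job_id = "f445397d-8491-4a12-806a-04792839abe3"
responses = iter([
    {"job": {"id": job_id, "status": "FINALIZE_PENDING", "numEntitiesCreated": 0}},
    {"job": {"id": job_id, "status": "FINISHED", "numEntitiesCreated": 3}},
])
final = wait_for_job(lambda _id: next(responses), job_id, sleep=lambda seconds: None)
print(final["status"], final["numEntitiesCreated"])
```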

Bulk Update

Start a new synchronization job:

Sample request:

POST /persister/synchronization/jobs
{
"source": "api",
"syncMode": "CREATE_OR_UPDATE"
}

Once the synchronization job is running, use the job id from the response to send a request to update existing entities/relationships.

Sample request:

POST /persister/synchronization/jobs/<jobId>/upload
{
"entities": [
{
"_key": "some_entity_id",
"_type": "fake_entity",
"_class": "MyEntity0",
"property0": "value0"
},
{
"_key": "some_other_entity_id",
"_type": "fake_entity",
"_class": "MyEntity1",
"property1": "value1"
}
],
"relationships": [
{
"_key": "a",
"_type": "new_relationship",
"_fromEntityKey": "some_entity_id",
"_toEntityKey": "some_other_entity_id"
}
]
}

Last, finalize the job.

POST /persister/synchronization/jobs/<jobId>/finalize

Bulk Delete

Start a new synchronization job:

Sample request:

POST /persister/synchronization/jobs
{
"source": "api",
"syncMode": "CREATE_OR_UPDATE"
}

Once the synchronization job is running, use the job id from the response to send a request to delete existing entities/relationships.

Sample request:

POST /persister/synchronization/jobs/<jobId>/upload
{
"deleteEntities": [
{
"_id": "example-uuid-01"
},
{
"_id": "example-uuid-02"
},
{
"_id": "example-uuid-03"
}
],
"deleteRelationships": [
{
"_id": "example-uuid-04"
},
{
"_id": "example-uuid-05"
},
{
"_id": "example-uuid-06"
}
]
}

Lastly, finalize the job.

POST /persister/synchronization/jobs/<jobId>/finalize

Note:
  • When you delete an entity, all of the associated relationships will also be deleted. You do not need to list both unless you are deleting unrelated relationships.
  • You can delete by either _id or _key. We recommend deleting entities by _id because the _id is unique across all entities.
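
The delete payload is simple enough to generate from lists of ids. This is a minimal sketch; `build_delete_payload` is a hypothetical helper, and it uses `_id` only, following the recommendation above.

```python
def build_delete_payload(entity_ids=(), relationship_ids=()):
    """Build the upload body for deleting entities and relationships by _id."""
    payload = {}
    if entity_ids:
        payload["deleteEntities"] = [{"_id": entity_id} for entity_id in entity_ids]
    if relationship_ids:
        payload["deleteRelationships"] = [{"_id": rel_id} for rel_id in relationship_ids]
    return payload


payload = build_delete_payload(
    entity_ids=["example-uuid-01", "example-uuid-02"],
    relationship_ids=["example-uuid-04"],
)
print(payload)
```

Relationships attached to a deleted entity do not need to be listed, so `relationship_ids` is only needed for relationships between entities that are being kept.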