Bulk upload schema

JupiterOne lets you upload asset data in bulk, either through the JupiterOne UI or through the API.

Defining scopes for bulk uploads

When performing a bulk upload through the API, provide the scope as a parameter when starting the synchronization job. The scope defines the set of data that the bulk upload applies to. When using bulk upload in the UI, you are prompted to choose a scope when uploading a file. You can find the bulk upload tool in the JupiterOne dashboard by navigating to Assets > Add new asset (the plus sign in the top-right).

Bulk uploads trigger a data synchronization process that automatically updates or deletes entities and relationships as needed. Previously existing entities and relationships within the same _scope that no longer exist in the latest upload will be marked for deletion.

To avoid unintended deletions, always include the complete set of entities and relationships for the given _scope in each upload. The _scope determines which entities and relationships the bulk upload can affect. All data outside of that scope remains unaffected.

For example

POST /persister/synchronization/jobs

{
  "source": "api",
  "scope": "my-sync-job"
}

To successfully upload entity and relationship data, follow the schema outlined below:

{
  "entities": [
    {
      "_key": "1",
      "_type": "bulk_upload_entity",
      "_class": "EntityClass",
      "displayName": "Entity's displayName to show in UI",
      "owner": "Owner's name"
      // ...any other properties defined for the given type/class
    },
    {
      "_key": "2",
      "_type": "bulk_upload_entity",
      "_class": "EntityClass",
      "displayName": "Entity's displayName to show in UI",
      "owner": "Owner's name"
      // ...any other properties defined for the given type/class
    }
  ],
  "relationships": [
    {
      "_key": "a",
      "_type": "bulk_upload_relationship",
      "_class": "VERB",
      "_fromEntityKey": "1",
      "_toEntityKey": "2"
    },
    {
      "_key": "b",
      "_type": "bulk_upload_relationship",
      "_class": "VERB",
      "_fromEntityKey": "2",
      "_toEntityKey": "1"
    }
  ]
}
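As a minimal sketch of assembling a body in this shape, the helper below combines entity and relationship lists and rejects repeated `_key` values before anything is sent. The function name `build_payload` is hypothetical and is not part of any JupiterOne client; the check only reflects the requirement that keys be unique within a scope.

```python
from collections import Counter


def build_payload(entities, relationships):
    """Assemble an upload body and reject duplicate _key values (hypothetical helper)."""
    for label, items in (("entity", entities), ("relationship", relationships)):
        counts = Counter(item["_key"] for item in items)
        duplicates = sorted(key for key, n in counts.items() if n > 1)
        if duplicates:
            raise ValueError(f"duplicate {label} _key values: {duplicates}")
    return {"entities": list(entities), "relationships": list(relationships)}


payload = build_payload(
    entities=[
        {"_key": "1", "_type": "bulk_upload_entity", "_class": "EntityClass"},
        {"_key": "2", "_type": "bulk_upload_entity", "_class": "EntityClass"},
    ],
    relationships=[
        {
            "_key": "a",
            "_type": "bulk_upload_relationship",
            "_class": "VERB",
            "_fromEntityKey": "1",
            "_toEntityKey": "2",
        }
    ],
)
```

The resulting `payload` dict can be serialized with `json.dumps` and used as the request body of an upload call.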

Entity Properties

Properties with _ prefix are reserved as JupiterOne system internal metadata properties. Other than _key, _type, _class as listed above, any other property beginning with _ will be ignored when processing the upload.

Property	Type	Description
`_key`	`string`	A unique identifier/key for this entity within the scope defined by `_scope`.
`_type`	`string`	User defined type for this entity. Value should be in `snake_case`.
`_class`	`string` or `string[]`	The defined class for this entity. Value should be in `TitleCase`.
`owner`	`string`	Identifier for the person/thing responsible for this entity.
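The rules in this table can be checked locally before an upload. The sketch below drops underscore-prefixed properties other than `_key`, `_type`, and `_class` (since they would be ignored anyway) and reports naming-convention warnings. The function name and the exact regular expressions are assumptions made for illustration; JupiterOne may apply different or looser rules.

```python
import re

ALLOWED_SYSTEM_PROPERTIES = {"_key", "_type", "_class"}
# Assumed patterns for the documented conventions; not taken from JupiterOne.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")
TITLE_CASE = re.compile(r"^([A-Z][a-z0-9]*)+$")


def prepare_entity(entity):
    """Return (cleaned_entity, warnings) for one entity dict."""
    cleaned = {
        name: value
        for name, value in entity.items()
        if not name.startswith("_") or name in ALLOWED_SYSTEM_PROPERTIES
    }
    warnings = []
    if not SNAKE_CASE.match(str(cleaned.get("_type", ""))):
        warnings.append("_type should be snake_case")
    classes = cleaned.get("_class", [])
    if isinstance(classes, str):
        classes = [classes]  # _class may be a string or a list of strings
    if not classes or not all(TITLE_CASE.match(c) for c in classes):
        warnings.append("_class should be TitleCase")
    return cleaned, warnings
```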

Relationship Properties

Properties with _ prefix are reserved as JupiterOne system internal metadata properties. Other than _key, _type, _class, _fromEntityKey, and _toEntityKey as listed above, any other property beginning with _ will be ignored when processing the upload.

Property	Type	Description
`_key`	`string`	A unique identifier/key for this relationship within the defined `_scope`.
`_type`	`string`	User defined type for this relationship. Value should be in `snake_case`.
`_class`	`string`	Relationship class. Typically a third-person singular verb such as `HAS` or `MANAGES` or `ALLOWS`. Value should be in `CAPS`.
`_fromEntityKey`	`string`	The unique key for the entity on the "from" side of this relationship. Use this for in-scope entities.
`_toEntityKey`	`string`	The unique key for the entity on the "to" side of this relationship. Use this for in-scope entities.
`_fromEntityId`	`string`	The unique _id for the entity on the "from" side of this relationship. Use this for out-of-scope entities.
`_toEntityId`	`string`	The unique _id for the entity on the "to" side of this relationship. Use this for out-of-scope entities.
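The choice between the Key and Id fields can be sketched as follows. The helper `make_relationship` is hypothetical: it assumes the caller knows the set of entity keys in the current upload, and it treats any other reference as the `_id` of an existing entity outside the scope.

```python
def make_relationship(key, rel_type, rel_class, from_ref, to_ref, in_scope_keys):
    """Build a relationship dict, choosing Key or Id fields per endpoint.

    in_scope_keys: set of entity _key values included in this upload's scope.
    Any reference not in that set is treated as the _id of an existing,
    out-of-scope entity (an assumption of this sketch).
    """
    relationship = {"_key": key, "_type": rel_type, "_class": rel_class.upper()}
    for side, ref in (("from", from_ref), ("to", to_ref)):
        suffix = "Key" if ref in in_scope_keys else "Id"
        relationship[f"_{side}Entity{suffix}"] = ref
    return relationship
```

For example, `make_relationship("a", "bulk_upload_relationship", "has", "1", "<existing entity _id>", {"1", "2"})` sets `_fromEntityKey` for the uploaded entity and `_toEntityId` for the entity that already exists elsewhere in the graph.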

Read more about creating relationships between entities.

Entity and Relationship Synchronization (Bulk Upload)

An integration job sends all of the latest entities and relationships to the persister. The persister compares the new state to the old state and automatically applies the changes to the graph.

The persister exposes a public REST API that is used when developing, testing, and running integrations outside the JupiterOne cloud infrastructure.

The synchronization API also supports synchronizing a group of entities and relationships from an API source by using a scope property. A set of entities and relationships can be grouped under an arbitrary scope value and uploaded to the persister through the synchronization API. The create, update, and delete operations are then determined automatically within that scope. The scope value is stored on the entities and relationships in the _scope property.
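The scoped comparison described above can be illustrated with a small model. This is not the persister's implementation; it is a sketch that assumes each state is a mapping from `_key` to a properties dict for a single scope value, and that any difference in properties counts as an update.

```python
def plan_changes(old_state, new_state):
    """Classify keys within one scope as created, updated, or deleted.

    Both arguments map _key -> properties dict for a single _scope value.
    """
    created = sorted(k for k in new_state if k not in old_state)
    deleted = sorted(k for k in old_state if k not in new_state)
    updated = sorted(
        k for k in new_state if k in old_state and new_state[k] != old_state[k]
    )
    return {"created": created, "updated": updated, "deleted": deleted}


old = {"1": {"owner": "alice"}, "2": {"owner": "bob"}, "3": {"owner": "carol"}}
new = {"1": {"owner": "alice"}, "2": {"owner": "dave"}, "4": {"owner": "erin"}}
changes = plan_changes(old, new)
```

Here key `3` lands in `deleted` only because it is absent from the new upload, which is why a partial upload within an existing scope removes data.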

Integration Job Bookkeeping

While an integration job is running, the persister will need to keep track of data as the job progresses.

This information will be tracked:

New entities
New relationships
Raw data associated with entities (including Content-Type)
Job status and progress counters
Job metadata (start time, source, etc.)

Phases of Synchronization

Data Collection: An integration job or other tool runs, collects all data, and stores it temporarily on the filesystem.
Data Upload: All data that represents "new state" is uploaded to the persister and associated with an integration job identifier. The "new state" will consist of entities, relationships, and raw data.
Finalization: After an integration has uploaded all data to the persister, "finalization" is triggered. During the "finalization" phase, the persister compares the "new state" with the "old state" and determines changes. The persister immediately performs any changes that are detected during the run of the finalization task (they are not queued on a Kinesis stream).

Entities are finalized first and relationships are finalized afterward (because relationships might reference new entities).

Synchronization API Usage

Request Flags

ignoreDuplicates: Instructs the system not to throw an error if there are graph objects with duplicate keys. This allows the latest graph object to be created even if its key is already in use.

Request Body Properties

source:

api for ad hoc data.
integration-external for custom integrations.

scope: The scope value can be set to any string. Use the same value in later uploads to update entities, relationships, or properties within that scope. Additionally:

Scope is required when the syncMode is DIFF.
Scope can only be used when the source is api.

syncMode:

DIFF is the default value when a syncMode is not specified. This mode updates or replaces all of the entities and relationships within the specified scope. Provide the full dataset; otherwise, entities and relationships may be unintentionally deleted.

Note: The CREATE_OR_UPDATE mode is deprecated and no longer functions.

integrationInstanceId:

Required when referencing a custom integration (the source is equal to integration-external).
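The constraints on these properties can be checked before a request is sent. The sketch below is one interpretation of the rules listed in this section: it assumes the scope requirement applies to the api source (since scope cannot be combined with other sources), and `build_job_request` is a hypothetical helper name.

```python
def build_job_request(source, scope=None, sync_mode=None, integration_instance_id=None):
    """Build the body for starting a synchronization job (hypothetical helper)."""
    effective_mode = sync_mode or "DIFF"  # DIFF is the documented default
    if scope is not None and source != "api":
        raise ValueError("scope can only be used when source is 'api'")
    # Assumption: the "scope is required for DIFF" rule is applied to api jobs only.
    if source == "api" and effective_mode == "DIFF" and not scope:
        raise ValueError("scope is required when syncMode is DIFF")
    if source == "integration-external" and not integration_instance_id:
        raise ValueError("integrationInstanceId is required for custom integrations")
    body = {"source": source}
    if scope:
        body["scope"] = scope
    if sync_mode:
        body["syncMode"] = sync_mode
    if integration_instance_id:
        body["integrationInstanceId"] = integration_instance_id
    return body
```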

Start a synchronization job

Sample request (api source):

POST /persister/synchronization/jobs

{
  "source": "api",
  "scope": "my-sync-job"
}

Sample request (integration source):

POST /persister/synchronization/jobs

{
  "source": "integration-managed",
  "integrationInstanceId": "5465397d-8491-4a12-806a-04792839abe3"
}

Sample response:

{
  "job": {
    "source": "api",
    "scope": "my-sync-job",
    "id": "f445397d-8491-4a12-806a-04792839abe3",
    "status": "AWAITING_UPLOADS",
    "startTimestamp": 1586915139427,
    "numEntitiesUploaded": 0,
    "numEntitiesCreated": 0,
    "numEntitiesUpdated": 0,
    "numEntitiesDeleted": 0,
    "numRelationshipsUploaded": 0,
    "numRelationshipsCreated": 0,
    "numRelationshipsUpdated": 0,
    "numRelationshipsDeleted": 0
  }
}
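A minimal sketch of preparing this request with the Python standard library follows. The base URL is a placeholder, and the bearer-token and account headers are assumptions about authentication that should be checked against your account's API settings. The request is built but not sent.

```python
import json
import urllib.request

# Assumptions: placeholder base URL; bearer-token auth and an account header.
BASE_URL = "https://api.example.invalid"


def start_job_request(api_token, account_id, scope):
    """Prepare (but do not send) the POST that starts a synchronization job."""
    body = json.dumps({"source": "api", "scope": scope}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/persister/synchronization/jobs",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
            "JupiterOne-Account": account_id,
        },
    )


def job_id_from_response(response_text):
    """Extract the job id from a response shaped like the sample response."""
    return json.loads(response_text)["job"]["id"]


request = start_job_request("<api token>", "<account id>", "my-sync-job")
# To send it: urllib.request.urlopen(request).read().decode("utf-8")
```

The job id returned by `job_id_from_response` is the value that appears in the path of the later upload, finalize, and status requests.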

Get status of synchronization job

Sample request:

GET /persister/synchronization/jobs/f445397d-8491-4a12-806a-04792839abe3

Sample response:

{
  "job": {
    "source": "api",
    "scope": "my-sync-job",
    "id": "f445397d-8491-4a12-806a-04792839abe3",
    "status": "AWAITING_UPLOADS",
    "startTimestamp": 1586915139427,
    "numEntitiesUploaded": 0,
    "numEntitiesCreated": 0,
    "numEntitiesUpdated": 0,
    "numEntitiesDeleted": 0,
    "numRelationshipsUploaded": 0,
    "numRelationshipsCreated": 0,
    "numRelationshipsUpdated": 0,
    "numRelationshipsDeleted": 0
  }
}

Note: numRelationshipCreateErrors indicates the number of relationships that could not be created because one or both entities were not found or have been soft deleted.
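One way to reduce such errors is to check a payload locally before uploading it. The sketch below is a hypothetical pre-flight check that lists relationships whose Key endpoints do not match any entity in the same payload. It cannot see entities uploaded in earlier batches of the same job or entities that already exist in the graph, so a reported key is a hint rather than a guaranteed failure.

```python
def find_dangling_relationships(payload):
    """Return _key values of relationships whose Key endpoints are not in the payload."""
    entity_keys = {entity["_key"] for entity in payload.get("entities", [])}
    dangling = []
    for relationship in payload.get("relationships", []):
        endpoints = (
            relationship.get("_fromEntityKey"),
            relationship.get("_toEntityKey"),
        )
        # Endpoints given by _fromEntityId/_toEntityId point outside the upload
        # and cannot be checked locally, so only Key endpoints are examined.
        if any(key is not None and key not in entity_keys for key in endpoints):
            dangling.append(relationship["_key"])
    return dangling
```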

Upload batch of entities and relationships

Batch upload accepts JSON, CSV, and YAML, sent as text in the request body. Set the Content-Type request header according to the format:

Format	Content-Type
json	`'application/json'`
yaml	`'application/yaml'`
csv	`'text/csv'`

In the case of CSV, the type of graph object ("entity" or "relationship") is inferred from the presence of one or more of the following columns: _fromEntityKey, _fromEntityId, _toEntityKey, _toEntityId.
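The inference rule can be mimicked locally to preview how rows would be classified. This sketch assumes that a row counts as a relationship when at least one of the four endpoint columns has a non-empty value in that row; how the service treats empty cells is not specified here, so that detail is a guess.

```python
import csv
import io

RELATIONSHIP_COLUMNS = {"_fromEntityKey", "_fromEntityId", "_toEntityKey", "_toEntityId"}


def classify_rows(csv_text):
    """Split CSV rows into entities and relationships by endpoint columns."""
    entities, relationships = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Drop empty cells so an entity row with blank endpoint columns stays an entity.
        record = {name: value for name, value in row.items() if value}
        if RELATIONSHIP_COLUMNS.intersection(record):
            relationships.append(record)
        else:
            entities.append(record)
    return entities, relationships
```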

Sample request:

POST /persister/synchronization/jobs/f445397d-8491-4a12-806a-04792839abe3/upload

Entity / Relationship JSON:

{
  "entities": [
    {
      "_key": "1",
      "_class": "DataStore",
      "_type": "fake_entity",
      "displayName": "my_datastore"
    },
    {
      "_key": "2",
      "_class": "Database",
      "_type": "fake_entity",
      "displayName": "my_database"
    },
    {
      "_key": "3",
      "_class": "Domain",
      "_type": "fake_entity",
      "displayName": "my_domain"
    }
  ],
  "relationships": [
    {
      "_key": "a",
      "_type": "fake_relationship",
      "_fromEntityKey": "1",
      "_toEntityKey": "2"
    },
    {
      "_key": "b",
      "_type": "fake_relationship",
      "_fromEntityKey": "2",
      "_toEntityKey": "3"
    }
  ]
}

Entity / Relationship CSV:

"_type","_class","_key","displayName","_fromEntityKey","_toEntityKey"
"<a relationship type>","<a relationship class>","<a relationship key>","my_relationship_name","<an entity key>","<an entity key>"
"<an entity type>","<an entity class>","<an entity key>","my_entity_name",,

Sample response:

{
  "job": {
    "source": "api",
    "scope": "my-sync-job",
    "id": "f445397d-8491-4a12-806a-04792839abe3",
    "status": "AWAITING_UPLOADS",
    "startTimestamp": 1586915752483,
    "numEntitiesUploaded": 3,
    "numEntitiesCreated": 0,
    "numEntitiesUpdated": 0,
    "numEntitiesDeleted": 0,
    "numRelationshipsUploaded": 2,
    "numRelationshipsCreated": 0,
    "numRelationshipsUpdated": 0,
    "numRelationshipsDeleted": 0
  }
}

Upload batch of entities

Batch upload accepts JSON, CSV, and YAML, sent as text in the request body. Set the Content-Type request header according to the format:

Format	Content-Type
json	`'application/json'`
yaml	`'application/yaml'`
csv	`'text/csv'`

Sample request:

POST /persister/synchronization/jobs/f445397d-8491-4a12-806a-04792839abe3/entities

Upload Entity JSON:

{
  "entities": [
    {
      "_key": "1",
      "_type": "fake_entity"
    },
    {
      "_key": "2",
      "_type": "fake_entity"
    },
    {
      "_key": "3",
      "_type": "fake_entity"
    }
  ]
}

Upload Entity CSV:

"_type","_class","_key","displayName"
"<an entity type>","<an entity class>","<an entity key>","my_entity_name"

Sample response:

{
  "job": {
    "source": "api",
    "scope": "my-sync-job",
    "id": "f445397d-8491-4a12-806a-04792839abe3",
    "status": "AWAITING_UPLOADS",
    "startTimestamp": 1586915752483,
    "numEntitiesUploaded": 3,
    "numEntitiesCreated": 0,
    "numEntitiesUpdated": 0,
    "numEntitiesDeleted": 0,
    "numRelationshipsUploaded": 0,
    "numRelationshipsCreated": 0,
    "numRelationshipsUpdated": 0,
    "numRelationshipsDeleted": 0
  }
}

Upload batch of relationships

Batch upload accepts JSON, CSV, and YAML, sent as text in the request body. Set the Content-Type request header according to the format:

Format	Content-Type
json	`'application/json'`
yaml	`'application/yaml'`
csv	`'text/csv'`

Sample request:

POST /persister/synchronization/jobs/f445397d-8491-4a12-806a-04792839abe3/relationships

Upload Relationship JSON:

{
  "relationships": [
    {
      "_key": "a",
      "_type": "fake_relationship",
      "_fromEntityKey": "1",
      "_toEntityKey": "2"
    },
    {
      "_key": "b",
      "_type": "fake_relationship",
      "_fromEntityKey": "2",
      "_toEntityKey": "3"
    }
  ]
}

Upload Relationship CSV:

"_type","_class","_key","displayName","_fromEntityKey","_toEntityKey"
"<a relationship type>","<a relationship class>","<a relationship key>","my_relationship_name","<an entity key>","<an entity key>"

Sample response:

{
  "job": {
    "source": "api",
    "scope": "my-sync-job",
    "id": "f445397d-8491-4a12-806a-04792839abe3",
    "status": "AWAITING_UPLOADS",
    "startTimestamp": 1586915752483,
    "numEntitiesUploaded": 3,
    "numEntitiesCreated": 0,
    "numEntitiesUpdated": 0,
    "numEntitiesDeleted": 0,
    "numRelationshipsUploaded": 2,
    "numRelationshipsCreated": 0,
    "numRelationshipsUpdated": 0,
    "numRelationshipsDeleted": 0
  }
}

CSV Upload Data Types

JupiterOne will infer primitive types (e.g. strings, numbers, booleans) within columns automatically. If the value can be converted to a number or boolean, it will be converted during the upload process.
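The kind of conversion described here can be approximated as follows. The exact rules JupiterOne applies are not documented in this section, so this is only a sketch of one plausible inference order (boolean, then integer, then float, else string).

```python
import math


def infer_primitive(text):
    """Convert a CSV cell to bool, int, or float when possible; else keep the string."""
    lowered = text.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    try:
        return int(text)
    except ValueError:
        pass
    try:
        number = float(text)
    except ValueError:
        return text
    # Keep words like "nan" or "inf" as strings rather than special floats.
    return number if math.isfinite(number) else text
```

If a value that looks numeric must stay a string (for example, an identifier such as `00123`), check how it arrives in the graph after a test upload, because automatic conversion may change it.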

To include JSON arrays within a csv column, there are two acceptable ways to express these structures:

Double Quote Format

Use doubled double quotes ("") to escape quotes within a JSON array. This format is the most common way to express and escape quote characters when embedding JSON within a CSV column.

JSON Array:

"_type","_class","_key","_id","custom"
"my_type","my_class","my_key","my_id","[""my_value"",100,true]"

Column Dot Notation

JSON arrays can also be described by using the value's JSON path (via dot notation) within the name of the column. Each element of the JSON array then receives its own column, with a zero-indexed number specifying its position in the array.

JSON Array:

"_type","_class","_key","_id","custom.0","custom.1","custom.2"
"my_type","my_class","my_key","my_id","my_value","100","true"

Sample response:

JSON Array:

[
  {
    "_type": "my_type",
    "_class": "my_class",
    "_key": "my_key",
    "_id": "my_id",
    "custom": ["my_value", 100, true]
  }
]
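Both CSV forms shown above can be reproduced locally with the standard `csv` and `json` modules. The two function names are hypothetical, and the sketch does not apply primitive type inference to the dot-notation cells, so those elements remain strings here.

```python
import csv
import io
import json
import re

INDEXED_COLUMN = re.compile(r"^(?P<name>.+)\.(?P<index>\d+)$")


def parse_quoted_arrays(csv_text, array_columns):
    """Double quote format: decode the named columns as JSON."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The csv module has already turned each "" back into a single quote.
        rows.append(
            {k: json.loads(v) if k in array_columns else v for k, v in row.items()}
        )
    return rows


def parse_dotted_arrays(csv_text):
    """Column dot notation: merge name.0, name.1, ... into one list per row."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        result, arrays = {}, {}
        for column, value in row.items():
            match = INDEXED_COLUMN.match(column)
            if match:
                arrays.setdefault(match["name"], {})[int(match["index"])] = value
            else:
                result[column] = value
        for name, items in arrays.items():
            result[name] = [items[i] for i in sorted(items)]
        rows.append(result)
    return rows


quoted = (
    '"_type","_class","_key","_id","custom"\n'
    '"my_type","my_class","my_key","my_id","[""my_value"",100,true]"\n'
)
dotted = (
    '"_type","_class","_key","_id","custom.0","custom.1","custom.2"\n'
    '"my_type","my_class","my_key","my_id","my_value","100","true"\n'
)
from_quoted = parse_quoted_arrays(quoted, {"custom"})
from_dotted = parse_dotted_arrays(dotted)
```

When generating the double quote format, `csv.writer` with `quoting=csv.QUOTE_ALL` doubles embedded quotes automatically, so the array cell can be produced with `json.dumps`.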

Note: Arrays are valid on entity properties only. Relationships with array properties will return an error.

Getting Bulk Upload URLs for Synchronization Jobs

You can use a bulk upload URL to upload a file that has the same structure as the body of a normal upload request. The persister processes this file during finalization.

Currently, the persister only allows one bulk upload per synchronization job. If you request a bulk upload URL more than once, the persister returns the same URL until it expires. Upload URLs expire in one hour.

Sample request:

POST /persister/synchronization/jobs/f445397d-8491-4a12-806a-04792839abe3/uploadUrl

Sample response:

{
  "uploadUrl": "{a very long signed S3 URL}",
  "expiresAt": 1631198730000
}
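Because upload URLs expire, it can help to check the remaining lifetime before starting a large upload. The sketch below assumes `expiresAt` is a Unix timestamp in milliseconds, which is consistent with the sample value but is not stated explicitly. The helper name is hypothetical.

```python
from datetime import datetime, timezone


def seconds_until_expiry(expires_at_ms, now=None):
    """Return seconds left before the upload URL expires (negative if expired)."""
    now = now or datetime.now(timezone.utc)
    # Assumption: expiresAt is epoch milliseconds, like startTimestamp.
    expires_at = datetime.fromtimestamp(expires_at_ms / 1000, tz=timezone.utc)
    return (expires_at - now).total_seconds()
```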

Finalize synchronization job

Sample request:

POST /persister/synchronization/jobs/f445397d-8491-4a12-806a-04792839abe3/finalize

Sample response (when running locally):

{
  "job": {
    "source": "api",
    "scope": "my-sync-job",
    "id": "f445397d-8491-4a12-806a-04792839abe3",
    "status": "FINISHED",
    "startTimestamp": 1586915752483,
    "numEntitiesUploaded": 3,
    "numEntitiesCreated": 3,
    "numEntitiesUpdated": 0,
    "numEntitiesDeleted": 0,
    "numRelationshipsUploaded": 2,
    "numRelationshipsCreated": 2,
    "numRelationshipsUpdated": 0,
    "numRelationshipsDeleted": 0
  }
}

Sample response (when running in AWS):

{
  "job": {
    "source": "api",
    "scope": "my-sync-job",
    "id": "f445397d-8491-4a12-806a-04792839abe3",
    "status": "FINALIZE_PENDING",
    "startTimestamp": 1586915752483,
    "numEntitiesUploaded": 3,
    "numEntitiesCreated": 0,
    "numEntitiesUpdated": 0,
    "numEntitiesDeleted": 0,
    "numRelationshipsUploaded": 2,
    "numRelationshipsCreated": 0,
    "numRelationshipsUpdated": 0,
    "numRelationshipsDeleted": 0
  }
}

Synchronization job status upon completion

Sample request:

GET /persister/synchronization/jobs/f445397d-8491-4a12-806a-04792839abe3

Sample response:

{
  "job": {
    "source": "api",
    "scope": "my-sync-job",
    "id": "f445397d-8491-4a12-806a-04792839abe3",
    "status": "FINISHED",
    "startTimestamp": 1586915752483,
    "numEntitiesUploaded": 3,
    "numEntitiesCreated": 3,
    "numEntitiesUpdated": 0,
    "numEntitiesDeleted": 0,
    "numRelationshipsUploaded": 2,
    "numRelationshipsCreated": 2,
    "numRelationshipsUpdated": 0,
    "numRelationshipsDeleted": 0
  }
}
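Since finalization may leave the job in FINALIZE_PENDING for a while, a client typically polls the status endpoint until the job reports FINISHED. The sketch below takes the fetch step as a parameter: `fetch_job` is a hypothetical callable that performs the GET request and returns the parsed "job" object. Only the statuses shown in this document are handled, so any failure statuses the API may report would need their own handling.

```python
import time


def wait_for_job(fetch_job, job_id, interval_seconds=5.0, max_attempts=60, sleep=time.sleep):
    """Poll until the job status is FINISHED or attempts run out.

    fetch_job: callable taking a job id and returning the parsed "job" object
    from GET /persister/synchronization/jobs/{id} (hypothetical helper).
    """
    for _ in range(max_attempts):
        job = fetch_job(job_id)
        if job["status"] == "FINISHED":
            return job
        sleep(interval_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} polls")
```

The returned job object carries the final counters (for example `numEntitiesCreated` and `numRelationshipsCreated`), which can be compared with the uploaded counts to confirm the result.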
