Bulk upload schema
JupiterOne lets you upload asset data in bulk, either through the JupiterOne UI or through the API.
Defining scopes for bulk uploads
When performing a bulk upload through the API, provide the scope as a parameter when starting the synchronization job. The scope defines which data the bulk upload applies to. If you use bulk upload in the UI, you are prompted to choose a scope when uploading a file. To find the bulk upload tool in the JupiterOne dashboard, navigate to Assets > Add new asset (the plus sign in the top-right corner).
Bulk uploads trigger a data synchronization process that automatically creates, updates, or deletes entities and relationships as needed. Any entity or relationship that previously existed within the same _scope but is missing from the latest upload is marked for deletion.
To avoid unintended deletions, always include the complete set of entities and relationships for the _scope of the upload. The _scope determines which entities and relationships the bulk upload affects; all data outside that scope remains unaffected.
For example, the following request starts a synchronization job for the scope my-sync-job:
POST /persister/synchronization/jobs
{
"source": "api",
"scope": "my-sync-job"
}
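The following Python sketch models the deletion rule described above for the default DIFF sync mode: within one scope, keys only in the new upload are created, keys in both states are kept or updated, and keys missing from the upload are deleted, while objects in other scopes are never touched. It is an illustrative model under that assumption, not the persister's implementation; SyncPlan and plan_diff_sync are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SyncPlan:
    """Entity keys grouped by the action a DIFF sync would take on them."""
    create: set[str] = field(default_factory=set)
    keep_or_update: set[str] = field(default_factory=set)
    delete: set[str] = field(default_factory=set)


def plan_diff_sync(existing: set[tuple[str, str]], uploaded: set[str], scope: str) -> SyncPlan:
    """Compare stored (scope, key) pairs with the keys in a new upload for one scope."""
    in_scope = {key for key_scope, key in existing if key_scope == scope}
    return SyncPlan(
        create=uploaded - in_scope,          # keys that are new to this scope
        keep_or_update=uploaded & in_scope,  # keys present in both states
        delete=in_scope - uploaded,          # keys missing from the upload
    )
```

For example, with stored pairs ("my-sync-job", "1"), ("my-sync-job", "2"), and ("other-scope", "9"), uploading keys "2" and "3" to my-sync-job creates "3", keeps "2", deletes "1", and leaves "9" alone because it belongs to another scope.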
To successfully upload entity and relationship data, follow the schema outlined below:
{
"entities": [
{
"_key": "1",
"_type": "bulk_upload_entity",
"_class": "EntityClass",
"displayName": "Entity's displayName to show in UI",
"owner": "Owner's name"
// ...any other properties defined for the given type/class
},
{
"_key": "2",
"_type": "bulk_upload_entity",
"_class": "EntityClass",
"displayName": "Entity's displayName to show in UI",
"owner": "Owner's name"
// ...any other properties defined for the given type/class
}
],
"relationships": [
{
"_key": "a",
"_type": "bulk_upload_relationship",
"_class": "VERB",
"_fromEntityKey": "1",
"_toEntityKey": "2"
},
{
"_key": "b",
"_type": "bulk_upload_relationship",
"_class": "VERB",
"_fromEntityKey": "2",
"_toEntityKey": "1"
}
]
}
Entity Properties
Properties with the _ prefix are reserved for JupiterOne internal system metadata. Other than _key, _type, and _class as listed above, any property beginning with _ is ignored when processing the upload.
Property | Type | Description |
---|---|---|
_key | string | A unique identifier/key for this entity within the scope defined by _scope. |
_type | string | User-defined type for this entity. Value should be in snake_case. |
_class | string or string[] | The defined class for this entity. Value should be in TitleCase. |
owner | string | Identifier for the person/thing responsible for this entity. |
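A minimal pre-upload check in Python, assuming you want to catch ignored properties and naming-convention slips before sending data. prepare_entity is a hypothetical helper: it drops _-prefixed properties other than _key, _type, and _class (which the upload would ignore anyway) and returns warnings for _type values that are not snake_case and _class values that are not TitleCase. The regular expressions are approximations of those conventions, not JupiterOne's own validation rules.

```python
from __future__ import annotations

import re

# Underscore-prefixed entity properties that the upload accepts (per the table above).
ENTITY_SYSTEM_PROPS = {"_key", "_type", "_class"}
# Approximate patterns for the recommended naming conventions (assumptions).
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")
TITLE_CASE = re.compile(r"^[A-Z][A-Za-z0-9]*$")


def prepare_entity(entity: dict) -> tuple[dict, list[str]]:
    """Return a copy without ignored '_' properties, plus convention warnings."""
    warnings: list[str] = []
    cleaned: dict = {}
    for name, value in entity.items():
        if name.startswith("_") and name not in ENTITY_SYSTEM_PROPS:
            warnings.append(f"{name} will be ignored")
            continue
        cleaned[name] = value

    entity_type = cleaned.get("_type")
    if entity_type is not None and not SNAKE_CASE.match(entity_type):
        warnings.append("_type should be snake_case")

    # _class may be a single string or a list of strings.
    classes = cleaned.get("_class", [])
    if isinstance(classes, str):
        classes = [classes]
    for cls in classes:
        if not TITLE_CASE.match(cls):
            warnings.append(f"_class {cls!r} should be TitleCase")
    return cleaned, warnings
```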
Relationship Properties
Properties with the _ prefix are reserved for JupiterOne internal system metadata. Other than _key, _type, _class, _fromEntityKey, _toEntityKey, _fromEntityId, and _toEntityId as listed in the table below, any property beginning with _ is ignored when processing the upload.
Property | Type | Description |
---|---|---|
_key | string | A unique identifier/key for this relationship within the defined _scope. |
_type | string | User-defined type for this relationship. Value should be in snake_case. |
_class | string | Relationship class, typically a third-person singular verb such as HAS, MANAGES, or ALLOWS. Value should be in UPPERCASE. |
_fromEntityKey | string | The unique key of the entity on the "from" side of this relationship. Use this for in-scope entities. |
_toEntityKey | string | The unique key of the entity on the "to" side of this relationship. Use this for in-scope entities. |
_fromEntityId | string | The unique _id of the entity on the "from" side of this relationship. Use this for out-of-scope entities. |
_toEntityId | string | The unique _id of the entity on the "to" side of this relationship. Use this for out-of-scope entities. |
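The sketch below checks that each relationship identifies both of its endpoints. It assumes each side uses exactly one of the Key or Id properties, and that in-scope keys should match entities in the same payload; if you upload entities and relationships in separate batches, combine them before checking. relationship_errors is a hypothetical helper, and _id references to out-of-scope entities are not verified.

```python
from __future__ import annotations


def relationship_errors(payload: dict) -> list[str]:
    """List relationships whose endpoints are missing or not in the payload."""
    entity_keys = {entity["_key"] for entity in payload.get("entities", [])}
    errors: list[str] = []
    for rel in payload.get("relationships", []):
        for side in ("from", "to"):
            key = rel.get(f"_{side}EntityKey")
            entity_id = rel.get(f"_{side}EntityId")
            if (key is None) == (entity_id is None):
                # Assumption: each side uses exactly one of ...Key (in scope) or ...Id (out of scope).
                errors.append(f"{rel['_key']}: set exactly one of _{side}EntityKey or _{side}EntityId")
            elif key is not None and key not in entity_keys:
                errors.append(f"{rel['_key']}: _{side}EntityKey {key!r} is not among the uploaded entities")
    return errors
```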
Read more about creating relationships between entities.
Entity and Relationship Synchronization (Bulk Upload)
An integration job sends all of the latest entities and relationships to the persister. The persister compares the new state with the old state and automatically applies the changes to the graph.
The persister exposes a public REST API that is used when developing, testing, and running integrations outside the JupiterOne cloud infrastructure.
The synchronization API also supports synchronizing a group of entities and relationships from an API source by using a scope property. Entities and relationships can be logically grouped under an arbitrary scope value and uploaded to the persister through the synchronization API; the persister then automatically determines the create, update, and delete operations within that scope. The scope value is stored on the entities and relationships in the _scope property.
Integration Job Bookkeeping
While an integration job is running, the persister needs to keep track of data as the job progresses.
This information is tracked:
- New entities
- New relationships
- Raw data associated with entities (including Content-Type)
- Job status and progress counters
- Job metadata (start time, source, etc.)
Phases of Synchronization
Data Collection: An integration job or other tool runs, collects all data, and stores it temporarily on the filesystem.
Data Upload: All data that represents the "new state" is uploaded to the persister and associated with an integration job identifier. The "new state" consists of entities, relationships, and raw data.
Finalization: After an integration has uploaded all data to the persister, "finalization" is triggered. During finalization, the persister compares the "new state" with the "old state" and determines the changes. The persister applies detected changes immediately during the finalization task; they are not queued on a Kinesis stream.
Entities are finalized first and relationships are finalized afterward (because relationships might reference new entities).
Synchronization API Usage
Request Flags
ignoreDuplicates: Instructs the system not to throw an error when graph objects have duplicate keys. If a key is already in use, the latest graph object with that key is created.
Request Body Properties
source: api for ad hoc data, or integration-external for custom integrations.
scope: The scope value can be set to any string. Use the same value later to update entities, relationships, or properties within that scope. Additionally:
- Scope is required when syncMode is DIFF.
- Scope can only be used when source is api.
syncMode:
- DIFF is the default when syncMode is not specified. This mode updates or replaces all entities and relationships within the specified scope. Provide the full dataset; otherwise, entities and relationships may be unintentionally deleted.
- CREATE_OR_UPDATE is for editing an existing scope of data. Use this mode when you want to add, update, or delete a subset of entities and relationships.
integrationInstanceId: Required when referencing a custom integration (when source is integration-external).
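The rules above can be encoded in a small request-body builder. This is a sketch under the assumption that the "scope is required for DIFF" rule applies to api jobs (the integration sample later in this section starts a job without a scope); build_start_job_body is a hypothetical name, and the function only builds the JSON body without sending it.

```python
from __future__ import annotations

SYNC_MODES = {"DIFF", "CREATE_OR_UPDATE"}


def build_start_job_body(
    source: str,
    scope: str | None = None,
    sync_mode: str | None = None,
    integration_instance_id: str | None = None,
) -> dict:
    """Build the JSON body for POST /persister/synchronization/jobs."""
    mode = sync_mode or "DIFF"  # DIFF is the default when syncMode is omitted
    if mode not in SYNC_MODES:
        raise ValueError(f"unknown syncMode: {mode}")
    if scope is not None and source != "api":
        raise ValueError("scope can only be used when source is 'api'")
    # Assumption: the DIFF scope requirement applies to api jobs.
    if source == "api" and mode == "DIFF" and scope is None:
        raise ValueError("scope is required when syncMode is DIFF")
    if source == "integration-external" and integration_instance_id is None:
        raise ValueError("integrationInstanceId is required for integration-external")

    body: dict = {"source": source}
    if scope is not None:
        body["scope"] = scope
    if sync_mode is not None:
        body["syncMode"] = sync_mode
    if integration_instance_id is not None:
        body["integrationInstanceId"] = integration_instance_id
    return body
```

For example, build_start_job_body("api", scope="my-sync-job") produces the same body as the first sample request below, while calling it with source "api" and no scope raises an error because the default mode is DIFF.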
Start a synchronization job
Sample request:
POST /persister/synchronization/jobs
{
"source": "api",
"scope": "my-sync-job"
}
Sample request:
POST /persister/synchronization/jobs
{
"source": "integration-managed",
"integrationInstanceId": "5465397d-8491-4a12-806a-04792839abe3"
}
Sample response:
{
"job": {
"source": "api",
"scope": "my-sync-job",
"id": "f445397d-8491-4a12-806a-04792839abe3",
"status": "AWAITING_UPLOADS",
"startTimestamp": 1586915139427,
"numEntitiesUploaded": 0,
"numEntitiesCreated": 0,
"numEntitiesUpdated": 0,
"numEntitiesDeleted": 0,
"numRelationshipsUploaded": 0,
"numRelationshipsCreated": 0,
"numRelationshipsUpdated": 0,
"numRelationshipsDeleted": 0
}
}
Get status of synchronization job
Sample request:
GET /persister/synchronization/jobs/f445397d-8491-4a12-806a-04792839abe3
Sample response:
{
"job": {
"source": "api",
"scope": "my-sync-job",
"id": "f445397d-8491-4a12-806a-04792839abe3",
"status": "AWAITING_UPLOADS",
"startTimestamp": 1586915139427,
"numEntitiesUploaded": 0,
"numEntitiesCreated": 0,
"numEntitiesUpdated": 0,
"numEntitiesDeleted": 0,
"numRelationshipsUploaded": 0,
"numRelationshipsCreated": 0,
"numRelationshipsUpdated": 0,
"numRelationshipsDeleted": 0
}
}
Note: numRelationshipCreateErrors indicates the number of relationships that could not be created because one or both of their entities were not found or have been soft deleted.
Upload batch of entities and relationships
Batch upload accepts json, csv, and yaml sent as text in the request body. Set the Content-Type request header according to the format:
Format | Content-Type |
---|---|
json | 'application/json' |
yaml | 'application/yaml' |
csv | 'text/csv' |
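As a sketch, the Python standard library can build (without sending) an upload request with the matching Content-Type. The base URL is a placeholder assumption, authentication headers are omitted because this page does not describe them, and build_upload_request is a hypothetical helper covering the /upload, /entities, and /relationships endpoints documented on this page.

```python
import urllib.request

# Content-Type header for each supported upload format (from the table above).
CONTENT_TYPES = {
    "json": "application/json",
    "yaml": "application/yaml",
    "csv": "text/csv",
}


def build_upload_request(base_url: str, job_id: str, body: str, fmt: str,
                         target: str = "upload") -> urllib.request.Request:
    """Build (but do not send) a batch upload request for a synchronization job."""
    if target not in {"upload", "entities", "relationships"}:
        raise ValueError(f"unknown upload target: {target}")
    url = f"{base_url.rstrip('/')}/persister/synchronization/jobs/{job_id}/{target}"
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": CONTENT_TYPES[fmt]},
        method="POST",
    )
```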
For csv uploads, the type of each graph object ("entity" or "relationship") is inferred from the presence of one or more of the following columns: _fromEntityKey, _fromEntityId, _toEntityKey, _toEntityId.
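One way to read that rule, based on the sample CSV below in which entity rows leave the relationship columns empty, is to classify each row by whether any relationship column holds a value. The Python sketch below does that with the standard csv module; split_csv_rows is a hypothetical helper, and the row-level interpretation is an assumption rather than the persister's documented algorithm.

```python
from __future__ import annotations

import csv
import io

# Columns whose values mark a CSV row as a relationship.
RELATIONSHIP_COLUMNS = ("_fromEntityKey", "_fromEntityId", "_toEntityKey", "_toEntityId")


def split_csv_rows(text: str) -> tuple[list[dict], list[dict]]:
    """Split CSV rows into (entities, relationships), dropping empty cells."""
    entities: list[dict] = []
    relationships: list[dict] = []
    for row in csv.DictReader(io.StringIO(text)):
        values = {name: value for name, value in row.items() if value not in ("", None)}
        if any(column in values for column in RELATIONSHIP_COLUMNS):
            relationships.append(values)
        else:
            entities.append(values)
    return entities, relationships
```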
Sample request:
POST /persister/synchronization/jobs/f445397d-8491-4a12-806a-04792839abe3/upload
Entity / Relationship JSON:
{
"entities": [
{
"_key": "1",
"_class": "DataStore",
"_type": "fake_entity",
"displayName": "my_datastore"
},
{
"_key": "2",
"_class": "Database",
"_type": "fake_entity",
"displayName": "my_database"
},
{
"_key": "3",
"_class": "Domain",
"_type": "fake_entity",
"displayName": "my_domain"
}
],
"relationships": [
{
"_key": "a",
"_type": "fake_relationship",
"_fromEntityKey": "1",
"_toEntityKey": "2"
},
{
"_key": "b",
"_type": "fake_relationship",
"_fromEntityKey": "2",
"_toEntityKey": "3"
}
]
}
Entity / Relationship CSV:
"_type","_class","_key","displayName","_fromEntityKey","_toEntityKey"
"<a relationship type>","<a relationship class>","<a relationship key>","my_relationship_name","<an entity key>","<an entity key>"
"<an entity type>","<an entity class>","<an entity key>","my_entity_name",,
Sample response:
{
"job": {
"source": "api",
"scope": "my-sync-job",
"id": "f445397d-8491-4a12-806a-04792839abe3",
"status": "AWAITING_UPLOADS",
"startTimestamp": 1586915752483,
"numEntitiesUploaded": 3,
"numEntitiesCreated": 0,
"numEntitiesUpdated": 0,
"numEntitiesDeleted": 0,
"numRelationshipsUploaded": 2,
"numRelationshipsCreated": 0,
"numRelationshipsUpdated": 0,
"numRelationshipsDeleted": 0
}
}
Upload batch of entities
Batch upload accepts json, csv, and yaml sent as text in the request body. Set the Content-Type request header according to the format:
Format | Content-Type |
---|---|
json | 'application/json' |
yaml | 'application/yaml' |
csv | 'text/csv' |
Sample request:
POST /persister/synchronization/jobs/f445397d-8491-4a12-806a-04792839abe3/entities
Upload Entity JSON:
{
"entities": [
{
"_key": "1",
"_type": "fake_entity"
},
{
"_key": "2",
"_type": "fake_entity"
},
{
"_key": "3",
"_type": "fake_entity"
}
]
}
Upload Entity CSV:
"_type","_class","_key","displayName"
"<an entity type>","<an entity class>","<an entity key>","my_entity_name"
Sample response:
{
"job": {
"source": "api",
"scope": "my-sync-job",
"id": "f445397d-8491-4a12-806a-04792839abe3",
"status": "AWAITING_UPLOADS",
"startTimestamp": 1586915752483,
"numEntitiesUploaded": 3,
"numEntitiesCreated": 0,
"numEntitiesUpdated": 0,
"numEntitiesDeleted": 0,
"numRelationshipsUploaded": 0,
"numRelationshipsCreated": 0,
"numRelationshipsUpdated": 0,
"numRelationshipsDeleted": 0
}
}
Upload batch of relationships
Batch upload accepts json, csv, and yaml sent as text in the request body. Set the Content-Type request header according to the format:
Format | Content-Type |
---|---|
json | 'application/json' |
yaml | 'application/yaml' |
csv | 'text/csv' |
Sample request:
POST /persister/synchronization/jobs/f445397d-8491-4a12-806a-04792839abe3/relationships
Upload Relationship JSON:
{
"relationships": [
{
"_key": "a",
"_type": "fake_relationship",
"_fromEntityKey": "1",
"_toEntityKey": "2"
},
{
"_key": "b",
"_type": "fake_relationship",
"_fromEntityKey": "2",
"_toEntityKey": "3"
}
]
}
Upload Relationship CSV:
"_type","_class","_key","displayName","_fromEntityKey","_toEntityKey"
"<a relationship type>","<a relationship class>","<a relationship key>","my_relationship_name","<an entity key>","<an entity key>"
Sample response:
{
"job": {
"source": "api",
"scope": "my-sync-job",
"id": "f445397d-8491-4a12-806a-04792839abe3",
"status": "AWAITING_UPLOADS",
"startTimestamp": 1586915752483,
"numEntitiesUploaded": 3,
"numEntitiesCreated": 0,
"numEntitiesUpdated": 0,
"numEntitiesDeleted": 0,
"numRelationshipsUploaded": 2,
"numRelationshipsCreated": 0,
"numRelationshipsUpdated": 0,
"numRelationshipsDeleted": 0
}
}
CSV Upload Data Types
JupiterOne automatically infers primitive types (e.g. strings, numbers, booleans) within columns. If a value can be converted to a number or boolean, it is converted during the upload process.
There are two acceptable ways to include a JSON array within a csv column:
Double Quote Format
Use double quotes ("") to escape quotes within a JSON array. This is the most common way to express and escape quote characters when embedding JSON within a csv column.
JSON Array:
"_type","_class","_key","_id","custom"
"my_type","my_class","my_key","my_id","[""my_value"",100,true]"
Column Dot Notation
JSON arrays can also be expressed by including the value's JSON path (in dot notation) in the column name. Each element of the array then receives its own column, with a zero-based index specifying its position in the array.
JSON Array:
"_type","_class","_key","_id","custom.0","custom.1","custom.2"
"my_type","my_class","my_key","my_id","my_value","100","true"
Both formats produce the same result:
[
{
"_type": "my_type",
"_class": "my_class",
"_key": "my_key",
"_id": "my_id",
"custom": ["my_value", 100, true]
}
]
Arrays are valid on Entity properties only. Relationships with array properties will return an error.
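The Python sketch below approximates both CSV array formats and the primitive-type inference described above, and it reproduces the sample result. Assumptions to note: the exact conversion rules JupiterOne applies are not documented here, so coerce handles only lowercase true/false and values Python can parse as int or float, and _-prefixed system columns such as _key are left as strings. parse_csv_row and parse_csv_text are hypothetical names.

```python
from __future__ import annotations

import csv
import io
import json
import re

# Matches dot-notation column names such as "custom.0".
INDEXED_COLUMN = re.compile(r"^(?P<name>.+)\.(?P<index>\d+)$")


def coerce(value: str):
    """Convert 'true'/'false' and numeric strings; leave other text unchanged."""
    if value in ("true", "false"):
        return value == "true"
    for number_type in (int, float):
        try:
            return number_type(value)
        except ValueError:
            pass
    return value


def parse_csv_row(row: dict[str, str], is_relationship: bool = False) -> dict:
    result: dict = {}
    indexed: dict[str, dict[int, object]] = {}
    for column, raw in row.items():
        if column.startswith("_"):
            result[column] = raw  # assumption: system columns stay strings
            continue
        match = INDEXED_COLUMN.match(column)
        if match:
            # Dot notation: custom.0, custom.1, ... become elements of "custom".
            indexed.setdefault(match["name"], {})[int(match["index"])] = coerce(raw)
        elif raw.startswith("["):
            # Double quote format: the csv module has already turned "" back into ".
            result[column] = json.loads(raw)
        else:
            result[column] = coerce(raw)
    for name, items in indexed.items():
        result[name] = [items[i] for i in sorted(items)]
    if is_relationship and any(isinstance(v, list) for v in result.values()):
        raise ValueError("array properties are only valid on entities")
    return result


def parse_csv_text(text: str, is_relationship: bool = False) -> list[dict]:
    return [parse_csv_row(row, is_relationship) for row in csv.DictReader(io.StringIO(text))]
```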
Getting Bulk Upload URLs for Synchronization Jobs
You can use a bulk upload URL to upload a file that has the same structure as the body of a normal upload request. The persister processes this file during finalization.
Currently, the persister only allows one bulk upload per synchronization job. If you request a bulk upload URL more than once, the persister returns the same URL until it expires. Upload URLs expire in one hour.
Sample request:
POST /persister/synchronization/jobs/f445397d-8491-4a12-806a-04792839abe3/uploadUrl
Sample response:
{
"uploadUrl": "{a very long signed S3 URL}",
"expiresAt": 1631198730000
}
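Because the persister returns the same URL until it expires, a client can reuse a stored response while it is still valid. The sketch below assumes expiresAt is a Unix timestamp in milliseconds, which matches the sample value; upload_url_is_usable and its one-minute safety margin are hypothetical choices.

```python
from __future__ import annotations

import time


def upload_url_is_usable(response: dict, now_ms: int | None = None, margin_ms: int = 60_000) -> bool:
    """Return True if the uploadUrl stays valid for at least margin_ms more milliseconds."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # current time in epoch milliseconds
    return now_ms + margin_ms < response["expiresAt"]
```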
Finalize synchronization job
Sample request:
POST /persister/synchronization/jobs/f445397d-8491-4a12-806a-04792839abe3/finalize
Sample response (when running locally):
{
"job": {
"source": "api",
"scope": "my-sync-job",
"id": "f445397d-8491-4a12-806a-04792839abe3",
"status": "FINISHED",
"startTimestamp": 1586915752483,
"numEntitiesUploaded": 3,
"numEntitiesCreated": 3,
"numEntitiesUpdated": 0,
"numEntitiesDeleted": 0,
"numRelationshipsUploaded": 2,
"numRelationshipsCreated": 2,
"numRelationshipsUpdated": 0,
"numRelationshipsDeleted": 0
}
}
Sample response (when running in AWS):
{
"job": {
"source": "api",
"scope": "my-sync-job",
"id": "f445397d-8491-4a12-806a-04792839abe3",
"status": "FINALIZE_PENDING",
"startTimestamp": 1586915752483,
"numEntitiesUploaded": 3,
"numEntitiesCreated": 0,
"numEntitiesUpdated": 0,
"numEntitiesDeleted": 0,
"numRelationshipsUploaded": 2,
"numRelationshipsCreated": 0,
"numRelationshipsUpdated": 0,
"numRelationshipsDeleted": 0
}
}
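Since finalization in AWS returns FINALIZE_PENDING, a client typically polls the job until it reports FINISHED. In this sketch, get_job is a caller-supplied function (hypothetical) that performs GET /persister/synchronization/jobs/<jobId> and returns the job object. FINISHED is treated as the only terminal status because no failure statuses are documented on this page, so extend the check if your jobs can end in another state.

```python
from __future__ import annotations

import time
from typing import Callable


def wait_for_job(
    get_job: Callable[[], dict],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll a synchronization job until its status is FINISHED or the timeout passes."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = get_job()
        if job["status"] == "FINISHED":
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job['id']} still {job['status']} after {timeout_seconds}s")
        sleep(poll_seconds)
```

The sleep parameter is injectable so the loop can be exercised without waiting, for example in tests.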
Synchronization job status upon completion
Sample request:
GET /persister/synchronization/jobs/f445397d-8491-4a12-806a-04792839abe3
Sample response:
{
"job": {
"source": "api",
"scope": "my-sync-job",
"id": "f445397d-8491-4a12-806a-04792839abe3",
"status": "FINISHED",
"startTimestamp": 1586915752483,
"numEntitiesUploaded": 3,
"numEntitiesCreated": 3,
"numEntitiesUpdated": 0,
"numEntitiesDeleted": 0,
"numRelationshipsUploaded": 2,
"numRelationshipsCreated": 2,
"numRelationshipsUpdated": 0,
"numRelationshipsDeleted": 0
}
}
Bulk Update
Start a new synchronization job:
Sample request:
POST /persister/synchronization/jobs
{
"source": "api",
"syncMode": "CREATE_OR_UPDATE"
}
Once the synchronization job is running, use the job id from the response to send a request to update existing entities/relationships.
Sample request:
POST /persister/synchronization/jobs/<jobId>/upload
{
"entities": [
{
"_key": "some_entity_id",
"_type": "fake_entity",
"_class": "MyEntity0",
"property0": "value0"
},
{
"_key": "some_other_entity_id",
"_type": "fake_entity",
"_class": "MyEntity1",
"property1": "value1"
}
],
"relationships": [
{
"_key": "a",
"_type": "new_relationship",
"_fromEntityKey": "some_entity_id",
"_toEntityKey": "some_other_entity_id"
}
]
}
Lastly, finalize the job.
POST /persister/synchronization/jobs/<jobId>/finalize
Bulk Delete
Start a new synchronization job:
Sample request:
POST /persister/synchronization/jobs
{
"source": "api",
"syncMode": "CREATE_OR_UPDATE"
}
Once the synchronization job is running, use the job id from the response to send a request to delete existing entities/relationships.
Sample request:
POST /persister/synchronization/jobs/<jobId>/upload
{
"deleteEntities": [
{
"_id": "example-uuid-01"
},
{
"_id": "example-uuid-02"
},
{
"_id": "example-uuid-03"
}
],
"deleteRelationships": [
{
"_id": "example-uuid-04"
},
{
"_id": "example-uuid-05"
},
{
"_id": "example-uuid-06"
}
]
}
Lastly, finalize the job.
POST /persister/synchronization/jobs/<jobId>/finalize
- When you delete an entity, all of its associated relationships are also deleted. You only need to list relationships separately when deleting relationships that are not attached to a deleted entity.
- You can delete by either _id or _key. We recommend deleting entities by _id because the _id is unique across all entities.
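A sketch for building the delete body, assuming you already know the _id values and, for each relationship you intend to delete, its _fromEntityId and _toEntityId. build_delete_payload is a hypothetical helper: it removes duplicate entity IDs and leaves out relationships attached to an entity being deleted, since those are removed along with the entity.

```python
from __future__ import annotations


def build_delete_payload(entity_ids: list[str], relationships: list[dict]) -> dict:
    """Build a deleteEntities/deleteRelationships body for a CREATE_OR_UPDATE job."""
    doomed = set(entity_ids)
    return {
        # dict.fromkeys removes duplicates while keeping the original order.
        "deleteEntities": [{"_id": entity_id} for entity_id in dict.fromkeys(entity_ids)],
        # Skip relationships that deleting an entity already removes.
        "deleteRelationships": [
            {"_id": rel["_id"]}
            for rel in relationships
            if rel["_fromEntityId"] not in doomed and rel["_toEntityId"] not in doomed
        ],
    }
```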