Bulk upload schema
JupiterOne lets you bulk upload asset data through both the JupiterOne UI and the API.
Defining scopes for bulk uploads
When performing a bulk upload through the API, the scope will need to be provided as a parameter when starting the synchronization job. This allows you to define the specific scope that the bulk upload should apply to. If using bulk upload within the UI, you will be prompted to choose a scope when uploading a file. You can find the bulk upload tool in the JupiterOne dashboard by navigating to Assets > Add new asset (the plus sign in the top-right).
Bulk uploads trigger a data synchronization process that automatically updates or deletes entities and relationships as needed. Previously existing entities and relationships within the same _scope that no longer exist in the latest upload will be marked for deletion.
To avoid unintended deletions, always include the complete set of entities and relationships within the defined _scope of the upload. The _scope determines which entities and relationships the bulk upload can affect, so all data outside of that scope remains unaffected.
For example:
POST /persister/synchronization/jobs
{
"source": "api",
"scope": "my-sync-job"
}
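The deletion behavior described above can be pictured as a set difference over _key values within one scope. The following is a minimal illustrative sketch, not the persister's actual implementation; the function name plan_diff and its return shape are hypothetical.

```python
def plan_diff(existing_keys, uploaded_keys):
    """Classify keys within one scope the way a DIFF-style sync would.

    Hypothetical helper for illustration only; the real persister also
    compares property values to decide whether an update is needed.
    """
    existing = set(existing_keys)
    uploaded = set(uploaded_keys)
    return {
        "create": sorted(uploaded - existing),
        "keep_or_update": sorted(uploaded & existing),
        "delete": sorted(existing - uploaded),
    }


# Key "1" existed before but is absent from the new upload, so it lands in "delete".
plan = plan_diff(existing_keys=["1", "2", "3"], uploaded_keys=["2", "3", "4"])
print(plan)
```

Running a check like this before uploading makes it easier to spot an incomplete dataset that would remove entities you meant to keep.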
To successfully upload entity and relationship data, follow the schema outlined below:
{
"entities": [
{
"_key": "1",
"_type": "bulk_upload_entity",
"_class": "EntityClass",
"displayName": "Entity's displayName to show in UI",
"owner": "Owner's name"
// ...any other properties defined for the given type/class
},
{
"_key": "2",
"_type": "bulk_upload_entity",
"_class": "EntityClass",
"displayName": "Entity's displayName to show in UI",
"owner": "Owner's name"
// ...any other properties defined for the given type/class
}
],
"relationships": [
{
"_key": "a",
"_type": "bulk_upload_relationship",
"_class": "VERB",
"_fromEntityKey": "1",
"_toEntityKey": "2"
},
{
"_key": "b",
"_type": "bulk_upload_relationship",
"_class": "VERB",
"_fromEntityKey": "2",
"_toEntityKey": "1"
}
]
}
Entity Properties
Properties with _ prefix are reserved as JupiterOne system internal metadata properties. Other than _key, _type, _class as listed above, any other property beginning with _ will be ignored when processing the upload.
| Property | Type | Description |
|---|---|---|
| _key | string | A unique identifier/key for this entity within the scope defined by _scope. |
| _type | string | User defined type for this entity. Value should be in snake_case. |
| _class | string or string[] | The defined class for this entity. Value should be in TitleCase. |
| owner | string | Identifier for the person/thing responsible for this entity. |
Relationship Properties
Properties with _ prefix are reserved as JupiterOne system internal metadata properties. Other than _key, _type, _class, _fromEntityKey, and _toEntityKey as listed above, any other property beginning with _ will be ignored when processing the upload.
| Property | Type | Description |
|---|---|---|
| _key | string | A unique identifier/key for this relationship within the defined _scope. |
| _type | string | User defined type for this relationship. Value should be in snake_case. |
| _class | string | Relationship class. Typically a third-person singular verb such as HAS or MANAGES or ALLOWS. Value should be in CAPS. |
| _fromEntityKey | string | The unique key for the entity on the "from" side of this relationship. Use this for in scope entities. |
| _toEntityKey | string | The unique key for the entity on the "to" side of this relationship. Use this for in scope entities. |
| _fromEntityId | string | The unique _id for the entity on the "from" side of this relationship. Use this for out of scope entities. Cannot be included when using DIFF syncMode. |
| _toEntityId | string | The unique _id for the entity on the "to" side of this relationship. Use this for out of scope entities. Cannot be included when using DIFF syncMode. |
Read more about creating relationships between entities.
Validation
The synchronization API performs comprehensive validation on all uploaded data to ensure data integrity. Understanding these validation rules will help you successfully upload entities and relationships.
Entity Validation
Required Fields
The required fields for entities depend on the synchronization mode:
DIFF Mode (Full Replacement):
- _key - Required
- _type - Required
- _class - Required, either:
  - Single string, OR
  - Array of strings (max 5 items)
PATCH Mode (Partial Update):
- When scope is defined: Either _key OR _id is required
- When scope is not defined: _id is required
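The mode-dependent rules above can be checked on the client before uploading. This is a sketch of those rules only, not the API's own validator; the function name and the wording of the returned labels are hypothetical.

```python
def missing_required_entity_fields(entity, sync_mode, scope_defined):
    """Return the required fields that are missing from an entity.

    Follows the DIFF and PATCH rules listed above. Entities are not
    accepted in CROSS_SCOPE jobs, so that mode raises an error here.
    """
    if sync_mode == "DIFF":
        return [name for name in ("_key", "_type", "_class") if name not in entity]
    if sync_mode == "PATCH":
        if scope_defined:
            # Either _key or _id identifies the entity when a scope is set.
            return [] if ("_key" in entity or "_id" in entity) else ["_key or _id"]
        return [] if "_id" in entity else ["_id"]
    raise ValueError(f"unsupported sync mode for entities: {sync_mode}")


print(missing_required_entity_fields({"_key": "1", "_type": "user"}, "DIFF", True))
```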
Property Constraints
Custom Properties:
- Cannot start with underscore (_) - these are reserved for system properties
- Must have values of the following types:
  - string
  - boolean
  - number
  - null
  - Homogeneous arrays (all elements the same type):
    - Array of strings
    - Array of booleans
    - Array of numbers
Invalid property values:
- Mixed-type arrays (e.g., ["string", 123, true])
- Nested objects (except _rawData)
- Undefined values
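The property constraints above translate into a small client-side check. The sketch below is illustrative and may differ from the API's validator in edge cases: treating empty arrays as valid and arrays containing null as invalid are assumptions, and both function names are hypothetical.

```python
SYSTEM_FIELDS = ("_key", "_type", "_class", "_id", "_rawData")


def is_valid_property_value(value):
    """Check one custom property value against the constraints listed above."""

    def kind(item):
        # bool must be tested before int/float because bool subclasses int.
        if isinstance(item, bool):
            return "boolean"
        if isinstance(item, (int, float)):
            return "number"
        if isinstance(item, str):
            return "string"
        return None

    if value is None or kind(value) is not None:
        return True
    if isinstance(value, list):
        kinds = {kind(item) for item in value}
        # Homogeneous: every element is a primitive and all share one kind.
        return None not in kinds and len(kinds) <= 1
    return False  # nested objects and any other type


def invalid_properties(entity):
    """Return the names of custom properties that break the rules above."""
    bad = []
    for name, value in entity.items():
        if name in SYSTEM_FIELDS:
            continue  # system fields follow their own rules
        if name.startswith("_") or not is_valid_property_value(value):
            bad.append(name)
    return bad


print(invalid_properties({"_key": "k", "tags": ["a", "b"], "mixed": ["a", 1]}))
```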
Relationship Validation
Required Fields
Standard Relationships (By Key):
- _key - Required
- _type - Required
- _class - Required
- _fromEntityKey - Required
- _toEntityKey - Required
Mapped Relationships:
- _key - Required
- _type - Required
- _class - Required
- _mapping - Required object with the following structure:
  - sourceEntityKey - Required, the key of the source entity
  - targetFilterKeys - Required, array of arrays for filtering target entities
  - targetEntity - Required, object describing the target entity
  - skipTargetCreation - Optional boolean
  - relationshipDirection - Optional, either "FORWARD" or "REVERSE"
Sync Mode Constraints
PATCH Mode: Relationships cannot be uploaded in PATCH mode synchronization jobs. If you attempt to upload relationships to a PATCH mode job, you will receive an error: Relationships are not allowed in PATCH jobs
CROSS_SCOPE Mode:
- Entities cannot be uploaded in CROSS_SCOPE mode. Any attempt to upload entities will be rejected.
- Only relationships with _fromEntityScope and _toEntityScope fields are allowed.
- Regular relationships (same-scope) will be rejected in this mode.
Upload Size Constraints
Each upload request must contain:
- At least 1 entity OR at least 1 relationship
- Both entities and relationships arrays can be included in the same request
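The sync mode and upload size constraints above can be combined into one pre-flight check. This is a sketch under stated assumptions: the PATCH message is quoted from the text above, while the other messages and the function name are made up for illustration.

```python
def check_upload_against_mode(payload, sync_mode):
    """Return a list of problems for an upload body under a given syncMode."""
    entities = payload.get("entities", [])
    relationships = payload.get("relationships", [])
    problems = []
    if not entities and not relationships:
        problems.append("upload must contain at least 1 entity or relationship")
    if sync_mode == "PATCH" and relationships:
        problems.append("Relationships are not allowed in PATCH jobs")
    if sync_mode == "CROSS_SCOPE":
        if entities:
            problems.append("entities are not allowed in CROSS_SCOPE jobs")
        for index, relationship in enumerate(relationships):
            has_scopes = (
                "_fromEntityScope" in relationship
                and "_toEntityScope" in relationship
            )
            if not has_scopes:
                problems.append(f"/relationships/{index} is missing scope fields")
    return problems


print(check_upload_against_mode({"relationships": [{"_key": "a"}]}, "PATCH"))
```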
Common Validation Errors
When validation fails, you will receive a 400 Bad Request response with an error message. Here are common validation errors and how to resolve them:
| Error Message | Cause | Resolution |
|---|---|---|
| Missing JupiterOne-Account header | Required header not provided | Include the jupiterone-account header in your request |
| /entities/0/_key is required | Missing required field | Add the _key field to the entity (in DIFF mode) |
| /entities/0 (entity key: "key-1") has invalid property name '_internal' | Property name starts with underscore | Remove the underscore prefix or rename the property |
| /entities/0/_class (entity key: "key-1") has invalid type. Valid types are string or array of strings. | Invalid _class type | Ensure _class is either a string or array of strings |
| /entities/0/_key: maximum length exceeded | Field exceeds character limit | Reduce the length of the _key to 7000 characters or less |
| Relationships are not allowed in PATCH jobs | Attempting to upload relationships in PATCH mode | Use DIFF mode if you need to upload relationships |
| entities must have minimum 1 item | Empty entities array provided | Include at least one entity in the array, or remove the empty array |
Example: Valid Entity Upload
{
"entities": [
{
"_key": "user:john@example.com",
"_type": "user",
"_class": ["User", "Person"],
"name": "John Doe",
"email": "john@example.com",
"active": true,
"tags": ["employee", "developer"]
}
]
}
Example: Valid Relationship Upload
{
"relationships": [
{
"_key": "user:john@example.com|has|group:engineers",
"_type": "user_has_group",
"_class": "HAS",
"_fromEntityKey": "user:john@example.com",
"_toEntityKey": "group:engineers"
}
]
}
Entity and Relationship Synchronization (Bulk Upload)
An integration job is responsible for sending all of the latest entities and relationships to the persister. The persister compares the new state to the old state and automatically applies the changes to the graph.
The persister exposes a public REST API that will be used when developing, testing, and running integrations outside the JupiterOne cloud infrastructure.
The synchronization API also supports synchronizing a grouping of entities and relationships from an API source by using a scope property. That is, a group of entities and relationships can be logically grouped together by an arbitrary scope value and uploaded to the persister via the synchronization API, and the create, update, and delete operations will be automatically determined within the given scope. The scope value is stored on the entities and relationships in the _scope property.
Integration Job Bookkeeping
While an integration job is running, the persister will need to keep track of data as the job progresses.
This information will be tracked:
- New entities
- New relationships
- Raw data associated with entities (including Content-Type)
- Job status and progress counters
- Job metadata (start time, source, etc.)
Phases of Synchronization
- Data Collection: An integration job or other tool runs, collects all data, and stores it temporarily on the filesystem.
- Data Upload: All data that represents "new state" is uploaded to the persister and associated with an integration job identifier. The "new state" will consist of entities, relationships, and raw data.
- Finalization: After an integration has uploaded all data to the persister, "finalization" is triggered. During the "finalization" phase, the persister compares the "new state" with the "old state" and determines changes. The persister immediately performs any changes that are detected during the run of the finalization task (they are not queued on a Kinesis stream).
Entities are finalized first and relationships are finalized afterward (because relationships might reference new entities).
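The upload and finalization phases map onto the endpoints documented on this page. The sketch below only builds the requests and does not send them. The base URL and the Bearer authorization scheme are assumptions to adapt to your account, and all function names are hypothetical; the JupiterOne-Account header is the one named in the validation errors table.

```python
import json
import urllib.request

# Assumed base URL for illustration; substitute the one for your account.
BASE_URL = "https://api.us.jupiterone.io"


def build_request(path, account, token, body=None):
    """Build (but do not send) a POST request to the synchronization API."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url=BASE_URL + path,
        data=data,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "JupiterOne-Account": account,
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
    )


def start_job_request(account, token, scope):
    """Request that starts a synchronization job for the given scope."""
    body = {"source": "api", "scope": scope}
    return build_request("/persister/synchronization/jobs", account, token, body)


def upload_request(job_id, payload, account, token):
    """Request that uploads entities and relationships to an open job."""
    path = f"/persister/synchronization/jobs/{job_id}/upload"
    return build_request(path, account, token, payload)


def finalize_request(job_id, account, token):
    """Request that triggers finalization once all data is uploaded."""
    path = f"/persister/synchronization/jobs/{job_id}/finalize"
    return build_request(path, account, token)
```

Each request can then be sent with urllib.request.urlopen, reading the job id from the start response before building the upload and finalize requests.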
Synchronization API Usage
Request Flags
ignoreDuplicates: Instructs the system to not throw an error if there are graph objects with duplicate keys. This will allow the latest graph object to be created if there are duplicate keys already in use.
Request Body Properties
source:
- api for ad hoc data.
- integration-external for custom integrations.
scope: The scope value can be set to any string. The same value must be used in the future to update entities, relationships, and properties within that scope. Additionally:
- Scope is required when the syncMode is DIFF.
- Scope can only be used when the source is api.
syncMode:
- DIFF is the default value when a syncMode is not specified. This mode will update/replace all of the entities/relationships within a specified scope. The full dataset should be provided, otherwise entities and relationships may be unintentionally deleted.
- PATCH mode will only update or create the entities that are provided in the upload. Existing entities that are not included in the upload will remain unchanged.
- CROSS_SCOPE mode is specifically designed for managing relationships between entities that exist in different scopes. This mode should only be used for creating cross-scope relationships and will reject any attempts to create entities or regular relationships (where both entities are in the same scope). When using CROSS_SCOPE:
  - Set source to "api" (only allowed value)
  - Set scope to any string identifying your cross-scope relationships
  - No custom integration needed

The CREATE_OR_UPDATE mode is now deprecated and will no longer function.
integrationInstanceId:
- Required when referencing a custom integration (the source is equal to integration-external).
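The request body rules above can be enforced when assembling the start-job body. This is a sketch under the assumption that the rules are exactly as listed; the API remains the source of truth, and the function name is hypothetical. Because scope can only be set for api jobs, the sketch applies the scope requirement to api jobs only, which is an interpretation rather than a documented rule.

```python
def build_job_body(source="api", sync_mode="DIFF", scope=None,
                   integration_instance_id=None):
    """Assemble a start-job request body, enforcing the rules listed above."""
    if sync_mode == "CREATE_OR_UPDATE":
        raise ValueError("CREATE_OR_UPDATE is deprecated")
    if sync_mode == "CROSS_SCOPE" and source != "api":
        raise ValueError("CROSS_SCOPE requires source 'api'")
    if scope is not None and source != "api":
        raise ValueError("scope can only be used when source is 'api'")
    # Interpretation: the scope requirement is applied to 'api' jobs only,
    # since scope cannot be set for other sources.
    if source == "api" and sync_mode in ("DIFF", "CROSS_SCOPE") and scope is None:
        raise ValueError(f"scope is required for {sync_mode} jobs")
    if source == "integration-external" and integration_instance_id is None:
        raise ValueError("integrationInstanceId is required for custom integrations")

    body = {"source": source, "syncMode": sync_mode}
    if scope is not None:
        body["scope"] = scope
    if integration_instance_id is not None:
        body["integrationInstanceId"] = integration_instance_id
    return body


print(build_job_body(scope="my-sync-job"))
```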
CROSS_SCOPE Sync Mode
The CROSS_SCOPE sync mode is a specialized synchronization mode designed exclusively for creating relationships between entities that exist in different scopes. This mode provides a mechanism to connect entities across different data sources or integration instances.
Key Characteristics
- Purpose: Only for creating cross-scope relationships
- Restrictions:
  - Cannot be used to create entities (will be rejected)
  - Cannot be used to create regular relationships where both entities exist in the same scope (will be rejected)
  - Only processes relationships with valid cross-scope entity references
- Required Request Body Properties:
  - source: Must be set to "api" (the only allowed value for CROSS_SCOPE mode)
  - syncMode: Must be set to "CROSS_SCOPE"
  - scope: Can be any string value you choose to identify this set of cross-scope relationships
- No Custom Integration Required: You can use a regular sync job; there's no need to create a custom integration
- Required Relationship Fields:
  - Relationships must include _fromEntityScope and _toEntityScope fields
Required Relationship Properties for CROSS_SCOPE Mode
When using CROSS_SCOPE sync mode, relationships must include these additional mandatory fields:
| Property | Type | Description |
|---|---|---|
| _fromEntityScope | string | Required - The scope identifier of the source entity |
| _toEntityScope | string | Required - The scope identifier of the target entity |
Example: CROSS_SCOPE Sync Job
Starting the sync job:
{
"source": "api",
"syncMode": "CROSS_SCOPE",
"scope": "cross-scope-relationships"
}
Uploading cross-scope relationships:
{
"relationships": [
{
"_key": "aws_account|prod-123_OWNS_s3_bucket|data-bucket-001",
"_type": "account_owns_bucket",
"_class": "OWNS",
"_fromEntityKey": "aws_account|prod-123",
"_toEntityKey": "s3_bucket|data-bucket-001",
"_fromEntityScope": "integration-instance-abc123",
"_toEntityScope": "integration-instance-def456"
},
{
"_key": "github_repo|frontend_USES_npm_package|lodash",
"_type": "repo_uses_package",
"_class": "USES",
"_fromEntityKey": "github_repo|frontend",
"_toEntityKey": "npm_package|lodash",
"_fromEntityScope": "github-integration-789",
"_toEntityScope": "npm-scanner-xyz"
}
]
}
Attempting to upload entities or same-scope relationships in a CROSS_SCOPE sync job will result in rejection. This mode is strictly for cross-scope relationship management.
Start a synchronization job
Sample request:
POST /persister/synchronization/jobs
{
"source": "api",
"scope": "my-sync-job"
}
Sample request:
POST /persister/synchronization/jobs
{
"source": "integration-managed",
"integrationInstanceId": "5465397d-8491-4a12-806a-04792839abe3"
}
Sample response:
{
"job": {
"source": "api",
"scope": "my-sync-job",
"id": "f445397d-8491-4a12-806a-04792839abe3",
"status": "AWAITING_UPLOADS",
"startTimestamp": 1586915139427,
"numEntitiesUploaded": 0,
"numEntitiesCreated": 0,
"numEntitiesUpdated": 0,
"numEntitiesDeleted": 0,
"numRelationshipsUploaded": 0,
"numRelationshipsCreated": 0,
"numRelationshipsUpdated": 0,
"numRelationshipsDeleted": 0
}
}
Get status of synchronization job
Sample request:
GET /persister/synchronization/jobs/f445397d-8491-4a12-806a-04792839abe3
Sample response:
{
"job": {
"source": "api",
"scope": "my-sync-job",
"id": "f445397d-8491-4a12-806a-04792839abe3",
"status": "AWAITING_UPLOADS",
"startTimestamp": 1586915139427,
"numEntitiesUploaded": 0,
"numEntitiesCreated": 0,
"numEntitiesUpdated": 0,
"numEntitiesDeleted": 0,
"numRelationshipsUploaded": 0,
"numRelationshipsCreated": 0,
"numRelationshipsUpdated": 0,
"numRelationshipsDeleted": 0
}
}
Note: numRelationshipCreateErrors indicates the number of relationships that could not be created when one or both entities are not found or have been soft deleted.
Upload batch of entities and relationships
Batch upload accepts the formats: json, csv, and yaml sent as text in the
request body. The following Content-Type request headers should be set
according to the intended type:
| Format | Content-Type |
|---|---|
| json | 'application/json' |
| yaml | 'application/yaml' |
| csv | 'text/csv' |
In the case of a csv, the type of graph object ("entity" or "relationship") is
inferred by the presence of one or more of the following columns:
_fromEntityKey, _fromEntityId, _toEntityKey, _toEntityId.
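The inference rule above can be mimicked when preparing or reviewing a mixed csv file. In this sketch a row counts as a relationship when any of the four columns holds a non-empty value. That per-row reading is an assumption (the service's exact rule may differ), and the function name and sample values are hypothetical.

```python
import csv
import io

RELATIONSHIP_COLUMNS = (
    "_fromEntityKey", "_fromEntityId", "_toEntityKey", "_toEntityId",
)


def classify_csv_rows(text):
    """Split csv rows into (entities, relationships) lists of dicts."""
    entities, relationships = [], []
    for row in csv.DictReader(io.StringIO(text)):
        # Drop empty cells so entity rows do not carry blank endpoint columns.
        row = {name: value for name, value in row.items() if value}
        if any(column in row for column in RELATIONSHIP_COLUMNS):
            relationships.append(row)
        else:
            entities.append(row)
    return entities, relationships


sample = (
    '"_type","_class","_key","displayName","_fromEntityKey","_toEntityKey"\n'
    '"my_rel","HAS","a","my_relationship_name","1","2"\n'
    '"my_entity","Host","1","my_entity_name",,\n'
)
entities, relationships = classify_csv_rows(sample)
print(len(entities), len(relationships))
```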
Sample request:
POST /persister/synchronization/jobs/f445397d-8491-4a12-806a-04792839abe3/upload
Entity / Relationship JSON:
{
"entities": [
{
"_key": "1",
"_class": "DataStore",
"_type": "fake_entity",
"displayName": "my_datastore"
},
{
"_key": "2",
"_class": "Database",
"_type": "fake_entity",
"displayName": "my_database"
},
{
"_key": "3",
"_class": "Domain",
"_type": "fake_entity",
"displayName": "my_domain"
}
],
"relationships": [
{
"_key": "a",
"_type": "fake_relationship",
"_class": "IS",
"_fromEntityKey": "1",
"_toEntityKey": "2"
},
{
"_key": "b",
"_type": "fake_relationship",
"_class": "MANAGES",
"_fromEntityKey": "2",
"_toEntityKey": "3"
}
]
}
Entity / Relationship CSV:
"_type","_class","_key","displayName","_fromEntityKey","_toEntityKey"
"<a relationship type>","<a relationship class>","<a relationship key>","my_relationship_name","<an entity key>","<an entity key>"
"<an entity type>","<an entity class>","<an entity key>","my_entity_name",,
Sample response:
{
"job": {
"source": "api",
"scope": "my-sync-job",
"id": "f445397d-8491-4a12-806a-04792839abe3",
"status": "AWAITING_UPLOADS",
"startTimestamp": 1586915752483,
"numEntitiesUploaded": 3,
"numEntitiesCreated": 0,
"numEntitiesUpdated": 0,
"numEntitiesDeleted": 0,
"numRelationshipsUploaded": 2,
"numRelationshipsCreated": 0,
"numRelationshipsUpdated": 0,
"numRelationshipsDeleted": 0
}
}
Upload batch of entities
Batch upload accepts the formats: json, csv, and yaml sent as text in the
request body. The following Content-Type request headers should be set
according to the intended type:
| Format | Content-Type |
|---|---|
| json | 'application/json' |
| yaml | 'application/yaml' |
| csv | 'text/csv' |
Sample request:
POST /persister/synchronization/jobs/f445397d-8491-4a12-806a-04792839abe3/entities
Upload Entity JSON
{
"entities": [
{
"_key": "1",
"_type": "fake_entity"
},
{
"_key": "2",
"_type": "fake_entity"
},
{
"_key": "3",
"_type": "fake_entity"
}
]
}
Upload Entity CSV
"_type","_class","_key","displayName"
"<an entity type>","<an entity class>","<an entity key>","my_entity_name"
Sample response:
{
"job": {
"source": "api",
"scope": "my-sync-job",
"id": "f445397d-8491-4a12-806a-04792839abe3",
"status": "AWAITING_UPLOADS",
"startTimestamp": 1586915752483,
"numEntitiesUploaded": 3,
"numEntitiesCreated": 0,
"numEntitiesUpdated": 0,
"numEntitiesDeleted": 0,
"numRelationshipsUploaded": 0,
"numRelationshipsCreated": 0,
"numRelationshipsUpdated": 0,
"numRelationshipsDeleted": 0
}
}
Upload batch of relationships
Batch upload accepts the formats: json, csv, and yaml sent as text in the
request body. The following Content-Type request headers should be set
according to the intended type:
| Format | Content-Type |
|---|---|
| json | 'application/json' |
| yaml | 'application/yaml' |
| csv | 'text/csv' |
Sample request:
POST /persister/synchronization/jobs/f445397d-8491-4a12-806a-04792839abe3/relationships
Upload Relationship JSON:
{
"relationships": [
{
"_key": "a",
"_type": "fake_relationship",
"_class": "HAS",
"_fromEntityKey": "1",
"_toEntityKey": "2"
},
{
"_key": "b",
"_type": "fake_relationship",
"_class": "HAS",
"_fromEntityKey": "2",
"_toEntityKey": "3"
}
]
}
Upload Relationship CSV:
"_type","_class","_key","displayName","_fromEntityKey","_toEntityKey"
"<a relationship type>","<a relationship class>","<a relationship key>","my_relationship_name","<an entity key>","<an entity key>"
Sample response:
{
"job": {
"source": "api",
"scope": "my-sync-job",
"id": "f445397d-8491-4a12-806a-04792839abe3",
"status": "AWAITING_UPLOADS",
"startTimestamp": 1586915752483,
"numEntitiesUploaded": 3,
"numEntitiesCreated": 0,
"numEntitiesUpdated": 0,
"numEntitiesDeleted": 0,
"numRelationshipsUploaded": 2,
"numRelationshipsCreated": 0,
"numRelationshipsUpdated": 0,
"numRelationshipsDeleted": 0
}
}
CSV Upload Data Types
JupiterOne will infer primitive types (e.g. strings, numbers, booleans) within columns automatically. If the value can be converted to a number or boolean, it will be converted during the upload process.
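To anticipate how a cell might be converted, the inference can be approximated locally. The rules in this sketch are assumptions, since JupiterOne's exact conversion rules are not specified here: true/false (case-insensitive) become booleans, text that parses as an integer or finite float becomes a number, and everything else stays a string. The function name is hypothetical.

```python
import math


def infer_csv_value(text):
    """Convert a csv cell to a boolean or number when possible."""
    lowered = text.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    try:
        return int(text)  # whole numbers stay integers
    except ValueError:
        pass
    try:
        number = float(text)
    except ValueError:
        return text
    # Keep "nan" and "inf" as strings rather than non-finite floats.
    return number if math.isfinite(number) else text


print([infer_csv_value(cell) for cell in ("my_value", "100", "true", "1.5")])
```

If a value such as a zip code or an account number must stay a string, check how it converts in a small test upload before sending the full dataset.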
To include JSON arrays within a csv column, there are two acceptable ways to express these structures:
Double Quote Format
Use doubled double quotes ("") to escape quotes within a JSON array. This
format is the most common way to express and escape quote characters when
embedding JSON within a csv column.
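A standard csv writer produces this escaping automatically, so the doubled quotes rarely need to be written by hand. The sketch below uses Python's csv and json modules; the helper name to_csv_line is hypothetical.

```python
import csv
import io
import json


def to_csv_line(cells):
    """Serialize one row, JSON-encoding list values and quoting every cell."""
    encoded = [
        json.dumps(cell, separators=(",", ":")) if isinstance(cell, list) else cell
        for cell in cells
    ]
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(encoded)  # the writer doubles any embedded quote characters
    return buffer.getvalue().rstrip("\n")


line = to_csv_line(["my_type", "my_class", "my_key", "my_id", ["my_value", 100, True]])
print(line)
# "my_type","my_class","my_key","my_id","[""my_value"",100,true]"
```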
JSON Array:
"_type","_class","_key","_id","custom"
"my_type","my_class","my_key","my_id","[""my_value"",100,true]"
Column Dot Notation
JSON arrays can also be described by using the value's JSON path (via dot notation) within the name of the column. Each element of that JSON array would then receive its own column with a zero indexed number specifying its location in the array.
JSON Array:
"_type","_class","_key","_id","custom.0","custom.1","custom.2"
"my_type","my_class","my_key","my_id","my_value","100","true"
Sample response:
JSON Array:
[
{
"_type": "my_type",
"_class": "my_class",
"_key": "my_key",
"_id": "my_id",
"custom": ["my_value", 100, true]
}
]
Arrays are valid on Entity properties only. Relationships with array properties will return an error.
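The column dot notation can be pictured as grouping indexed columns back into one list. The sketch below is illustrative only and not the service's parser; the function name is hypothetical. It leaves values as strings, so the number and boolean conversion shown in the sample response would happen in a separate type inference step.

```python
import re

# Matches column names such as "custom.0" or "custom.12".
INDEXED_COLUMN = re.compile(r"^(?P<name>.+)\.(?P<index>\d+)$")


def collapse_indexed_columns(row):
    """Merge columns such as custom.0, custom.1 into one list-valued property."""
    result = {}
    indexed = {}
    for column, value in row.items():
        match = INDEXED_COLUMN.match(column)
        if match:
            position = int(match["index"])
            indexed.setdefault(match["name"], {})[position] = value
        else:
            result[column] = value
    for name, items in indexed.items():
        # Order elements by their numeric index, not by column order.
        result[name] = [items[position] for position in sorted(items)]
    return result


row = {"_key": "my_key", "custom.0": "my_value", "custom.1": "100", "custom.2": "true"}
print(collapse_indexed_columns(row))
```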
Getting Bulk Upload URLs for Synchronization Jobs
You can use a bulk upload URL to upload a file that has the same structure as the body of a normal upload request. The persister processes this file during finalization.
Currently, the persister only allows one bulk upload per synchronization job. If you request a bulk upload URL more than once, the persister returns the same URL until it expires. Upload URLs expire in one hour.
Sample request:
POST /persister/synchronization/jobs/f445397d-8491-4a12-806a-04792839abe3/uploadUrl
Sample response:
{
"uploadUrl": "{a very long signed S3 URL}",
"expiresAt": 1631198730000
}
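Because upload URLs expire, a client may want to check expiresAt before reusing a cached URL. This is a minimal sketch, assuming expiresAt is expressed in milliseconds since the Unix epoch; that unit matches the magnitude of the sample value but is not stated on this page. The function name is hypothetical.

```python
from datetime import datetime, timezone


def upload_url_expired(expires_at_ms, now=None):
    """Return True when the signed URL's expiry time has passed."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Assumption: expires_at_ms is milliseconds since the Unix epoch.
    expires_at = datetime.fromtimestamp(expires_at_ms / 1000, tz=timezone.utc)
    return now >= expires_at


print(upload_url_expired(1631198730000))
```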
Finalize synchronization job
Sample request:
POST /persister/synchronization/jobs/f445397d-8491-4a12-806a-04792839abe3/finalize
Sample response (when running locally):
{
"job": {
"source": "api",
"scope": "my-sync-job",
"id": "f445397d-8491-4a12-806a-04792839abe3",
"status": "FINISHED",
"startTimestamp": 1586915752483,
"numEntitiesUploaded": 3,
"numEntitiesCreated": 3,
"numEntitiesUpdated": 0,
"numEntitiesDeleted": 0,
"numRelationshipsUploaded": 2,
"numRelationshipsCreated": 2,
"numRelationshipsUpdated": 0,
"numRelationshipsDeleted": 0
}
}
Sample response (when running in AWS):
{
"job": {
"source": "api",
"scope": "my-sync-job",
"id": "f445397d-8491-4a12-806a-04792839abe3",
"status": "FINALIZE_PENDING",
"startTimestamp": 1586915752483,
"numEntitiesUploaded": 3,
"numEntitiesCreated": 0,
"numEntitiesUpdated": 0,
"numEntitiesDeleted": 0,
"numRelationshipsUploaded": 2,
"numRelationshipsCreated": 0,
"numRelationshipsUpdated": 0,
"numRelationshipsDeleted": 0
}
}
Synchronization job status upon completion
Sample request:
GET /persister/synchronization/jobs/f445397d-8491-4a12-806a-04792839abe3
Sample response:
{
"job": {
"source": "api",
"scope": "my-sync-job",
"id": "f445397d-8491-4a12-806a-04792839abe3",
"status": "FINISHED",
"startTimestamp": 1586915752483,
"numEntitiesUploaded": 3,
"numEntitiesCreated": 3,
"numEntitiesUpdated": 0,
"numEntitiesDeleted": 0,
"numRelationshipsUploaded": 2,
"numRelationshipsCreated": 2,
"numRelationshipsUpdated": 0,
"numRelationshipsDeleted": 0
}
}
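Because finalization can return FINALIZE_PENDING, a client typically polls the job status until it reads FINISHED. The following sketch assumes a caller-supplied fetch_job function (hypothetical) that performs the GET request shown above and returns the parsed job object. Only the statuses shown on this page are handled; any failure statuses the API may report are not covered.

```python
import time


def wait_for_job(fetch_job, job_id, interval_seconds=5.0, max_attempts=60):
    """Poll a synchronization job until it reports FINISHED.

    fetch_job(job_id) must return the "job" object from
    GET /persister/synchronization/jobs/{job_id}.
    """
    for _ in range(max_attempts):
        job = fetch_job(job_id)
        if job["status"] == "FINISHED":
            return job
        time.sleep(interval_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} polls")


# Usage with a stand-in fetcher that finishes on the second poll.
statuses = iter(["FINALIZE_PENDING", "FINISHED"])
job = wait_for_job(
    lambda job_id: {"id": job_id, "status": next(statuses)},
    "f445397d-8491-4a12-806a-04792839abe3",
    interval_seconds=0,
)
print(job["status"])
```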