JupiterOne Data Stream (J1DS)
Overview
JupiterOne Data Stream (J1DS) is a service that can provide a Change Data Capture (CDC) stream of events in your graph database directly to a customer-managed AWS S3 bucket. This data can be stored for audit and data retention purposes, and also queried or otherwise processed (ETL) to perform additional analysis of the information that flows through JupiterOne.
Release Status
J1DS is currently in "closed beta", meaning we are working with existing customers to refine the functionality and ensure that it meets requirements and is valuable. The details of the implementation and the licensing terms are subject to change until the feature is declared Generally Available (GA).
Getting Access
If you are an existing JupiterOne customer and you wish to try the J1DS service, please reach out to your customer success representative, who will be able to assist you.
How it Works
J1DS captures the graph database transaction logs, gathers the change events, and writes those events to the customer-configured AWS S3 bucket. These events represent the "after" state of any change made to an Entity or Relationship in the graph. The events can be Create, Update, or Delete events.
Data Partitioning
The data in the S3 bucket is provided using the following partitioning scheme:
jupiterone/graph/cdc/accountId=<JUPITERONE_ACCOUNT_ID>/year=<YEAR>/month=<MONTH>/day=<DAY>/time=<TIME_UTC>.jsonl
Each .jsonl file contains newline-delimited JSON records, one for each change event since the last export to the S3 bucket.
This data partitioning scheme is intended to work well with standard discovery tools (e.g. AWS Glue crawler), and would allow a customer with multiple JupiterOne accounts to collect the data into a single target location.
There is one additional key that J1DS will write to: jupiterone/.connection-test, which is used to test connectivity from the JupiterOne platform to the customer's S3 bucket.
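As an illustration of the partitioning scheme, the following Python sketch splits an object key into its partition values. The function name parse_cdc_key is hypothetical, and because the exact format of the TIME_UTC segment is not specified here, it is captured as an opaque string.

```python
import re
from typing import Optional

# Pattern mirroring the documented key layout. The format of the
# <TIME_UTC> segment is not specified, so it is kept as an opaque string.
KEY_PATTERN = re.compile(
    r"^jupiterone/graph/cdc/"
    r"accountId=(?P<account_id>[^/]+)/"
    r"year=(?P<year>[^/]+)/"
    r"month=(?P<month>[^/]+)/"
    r"day=(?P<day>[^/]+)/"
    r"time=(?P<time>[^/]+)\.jsonl$"
)


def parse_cdc_key(key: str) -> Optional[dict]:
    """Return the partition values for a CDC object key, or None if the
    key does not follow the CDC layout (e.g. jupiterone/.connection-test)."""
    match = KEY_PATTERN.match(key)
    return match.groupdict() if match else None
```

Keys that do not match, such as the connection test key, return None and can be skipped by downstream processing.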
Data Format
The records that are written to the S3 bucket represent the "updated" or "after" state for each graph object after each change or transaction that has taken place in the database.
An example record would look like the following:
{
"operation": "u",
"eventType": "entity",
"properties": {
"_scope": "eb4f2fac-e9a1-474d-8859-5f0e5ef90b16",
"_partition": 635,
"ultraRestricted": false,
"primaryTeamOwner": false,
"_integrationDefinitionId": "e770f533-4e49-40e0-8fd7-75bbb79dd824",
"_source": "integration-managed",
"_key": "slack-user:team_T0129XXXXXX:user_U09B1XXXXXX",
"_accountId": "j1dev",
"userType": "user",
"id": "U09B1XXXXXX",
"_createdOn": 1755879780646,
"_type": "slack_user",
"username": "some.user",
"_id": "3e6fa6e5-158d-5f03-8839-a964652b57dc",
"tag.CriticalAsset": true,
"name": "some.user",
"userId": "U09B1XXXXXX",
"_integrationInstanceId": "eb4f2fac-e9a1-474d-8859-5f0e5ef90b16",
"_deleted": false,
"updatedOn": 1759495751,
"firstName": "Some",
"tag.smartClass.CriticalAsset": true,
"lastName": "User",
"_integrationType": "slack",
"teamAdmin": false,
"restricted": false,
"emailDomain": "example.com",
"tag.SmartClass": true,
"teamOwner": false,
"isActive": true,
"appUser": false,
"_version": 1,
"_class": [
"User"
],
"shortLoginId": "some.user",
"email": "some.user@example.com",
"bot": false,
"admin": false,
"active": true,
"_integrationName": "Slack Example",
"realName": "Some User",
"displayName": "Some User",
"_beginOn": 1759510655079
},
"labels": [
"Entity",
"User",
"slack_user"
]
}
This event shows us that:
- The record is for an event type of entity (rather than a relationship)
- The operation is an "update" (u), meaning this is a new version of the record
- Properties are the new properties for this entity at this time
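A minimal Python sketch of reading such records is shown here. It assumes only the top-level fields visible in the example record (operation, eventType, properties); the helper names iter_events and summarize are hypothetical. Only the "u" operation code appears in the example, so the sketch counts whatever operation values it encounters rather than assuming codes for create or delete.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed change event per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def summarize(events: Iterable[dict]) -> Counter:
    """Count events by (eventType, operation, _type)."""
    counts = Counter()
    for event in events:
        props = event.get("properties", {})
        key = (event.get("eventType"), event.get("operation"), props.get("_type"))
        counts[key] += 1
    return counts
```

Because iter_events accepts any iterable of strings, it can be passed an open file handle for a downloaded .jsonl object.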
Configuring J1DS
During the closed beta phase, the configuration of J1DS is coordinated through the customer success and support team. That team will provide customers with the specific details needed to make the system work, and the customer will need to provide details about the S3 bucket and associated AWS account.
The steps for configuration are:
- Create an AWS S3 bucket in the specific region that corresponds to your JupiterOne tenant:
  US (.us.jupiterone.io): us-east-2
  EU (.eu.jupiterone.io): eu-central-1
  NOTE: These specific regions are required due to network traffic costs. If the bucket is created in the wrong region, J1DS will not function.
- Configure the AWS S3 bucket to have the following policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowJupiterOnePutObject",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::<JUPITERONE_AWS_ACCOUNT>:role/<JUPITERONE_ROLE_NAME>"
},
"Action": "s3:PutObject",
"Resource": "arn:aws:s3:::<BUCKET_NAME>/jupiterone/*",
"Condition": {
"Bool": {
"aws:SecureTransport": "true"
}
}
}
]
}
NOTE: The specific values for JUPITERONE_AWS_ACCOUNT and JUPITERONE_ROLE_NAME will be confirmed to the customer via the customer success and support team.
Once these setup tasks are completed and verified with JupiterOne, the CDC stream will be enabled.
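If the bucket policy is managed from a script, it can be rendered from the three placeholder values. The following Python sketch produces the same policy document as shown above; the function name build_bucket_policy is hypothetical, and the account ID and role name must be the values supplied by JupiterOne, not values chosen by the customer.

```python
import json


def build_bucket_policy(bucket_name: str, j1_account: str, j1_role: str) -> str:
    """Render the J1DS bucket policy as a JSON string.

    j1_account and j1_role are the JUPITERONE_AWS_ACCOUNT and
    JUPITERONE_ROLE_NAME values confirmed by the support team.
    """
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowJupiterOnePutObject",
                "Effect": "Allow",
                "Principal": {
                    "AWS": f"arn:aws:iam::{j1_account}:role/{j1_role}"
                },
                # Write-only access, limited to the jupiterone/ prefix.
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/jupiterone/*",
                # Only allow requests made over TLS.
                "Condition": {"Bool": {"aws:SecureTransport": "true"}},
            }
        ],
    }
    return json.dumps(policy, indent=2)
```

The returned string can be pasted into the bucket policy editor or passed to whichever tooling is used to manage the bucket.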
Frequently Asked Questions (FAQ)
Q: When do you do a full export of the database?
A: J1DS is intended to provide a change stream from the point at which it is enabled; it does not provide a full export of the contents of the JupiterOne database. The change data is intended to be used, for example, to construct the history of an entity: the current state of the entity in JupiterOne is "now", and the CDC data can be used to walk the history backwards to see previous versions of the object, computing diffs for interesting properties as it is processed.
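One way to walk an entity's history is sketched below in Python. This is an illustration under stated assumptions, not a description of how JupiterOne processes the data: it assumes that properties._id identifies the same object across versions and that properties._beginOn orders those versions. Both fields appear in the example record, but their exact semantics are not defined in this document. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def build_history(events: Iterable[dict]) -> dict:
    """Group event properties by _id, ordered oldest first.

    Assumption: _id is stable across versions and _beginOn orders them.
    """
    history = defaultdict(list)
    for event in events:
        props = event["properties"]
        history[props["_id"]].append(props)
    for versions in history.values():
        versions.sort(key=lambda p: p["_beginOn"])
    return dict(history)


def diff_properties(older: dict, newer: dict,
                    ignore=("_beginOn", "_version")) -> dict:
    """Return {name: (old_value, new_value)} for properties that differ."""
    changed = {}
    for name in older.keys() | newer.keys():
        if name in ignore:
            continue
        if older.get(name) != newer.get(name):
            changed[name] = (older.get(name), newer.get(name))
    return changed
```

Comparing adjacent versions in each list from newest to oldest gives the backwards walk described above. Bookkeeping fields that change on every version are excluded from the diff by default.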
Q: How is this system secured?
A: There are multiple steps to ensure that data is transferred securely, and to the intended bucket.
- When configuring J1DS you must provide the bucket name AND the AWS account that the bucket belongs to. When J1DS writes data to the S3 bucket, the expected bucket owner mechanism is used. This mitigates AWS S3 bucket hijacking, where the same bucket might get recreated in a different AWS account.
- The policy needed by J1DS to write to the customer S3 bucket gives no read access to that bucket. The system can only write data; it cannot read data.
Q: Does J1DS Support S3 managed KMS Encryption?
A: Yes! The PutObject API calls made by J1DS (and the multipart upload variant) will work with S3 managed KMS encryption keys.
Q: How Frequently is Data Written?
A: J1DS will write CDC data to the bucket every few minutes. Timing may vary between every 5 minutes and every 15 minutes. It's also possible that there are no new transaction logs to process, in which case data will not be written.
Known Issues and Limitations
- The data written to the S3 bucket is not currently compressed. We will be adding gzip compression to these objects in the future.
- Data is written on a best-effort basis, and the design is such that data should be written "at most once". Whilst the system is tolerant of transient failures and will attempt to transfer all transactions, it is not possible to buffer transactions indefinitely.
- In the extremely unlikely event that a database is restored from a backup, it is possible that J1DS has transmitted change data that is no longer present in the database. Due to the nature of the data held by JupiterOne, this should not represent a significant issue. There is no mechanism to reconcile the CDC stream in this scenario.