JupiterOne Data Stream (J1DS)

CLOSED BETA

Overview

JupiterOne Data Stream (J1DS) is a service that can provide a Change Data Capture (CDC) stream of events in your graph database directly to a customer-managed AWS S3 bucket. This data can be stored for audit and data retention purposes, and also queried or otherwise processed (ETL) to perform additional analysis of the information that flows through JupiterOne.

Release Status

J1DS is currently in "closed beta", meaning we are working with existing customers to refine the functionality and ensure that it meets requirements and is valuable. The details of the implementation and the licensing terms are subject to change until the feature is declared Generally Available (GA).

Getting Access

If you are an existing JupiterOne customer and wish to try the J1DS service, please reach out to your customer success representative, who will be able to assist you.

How it Works

J1DS captures the graph database transaction logs, gathers the change events, and writes those events to the customer-configured AWS S3 bucket. These events represent the "after" state of any change made to an Entity or Relationship in the graph. The events can be Create, Update, or Delete events.

Data Partitioning

The data in the S3 bucket is provided using the following partitioning scheme:

jupiterone/graph/cdc/accountId=<JUPITERONE_ACCOUNT_ID>/year=<YEAR>/month=<MONTH>/day=<DAY>/time=<TIME_UTC>.jsonl

Each .jsonl file contains newline-delimited JSON records, one per change event, covering the changes since the last export to the S3 bucket.

This data partitioning scheme is intended to work well with standard discovery tools (e.g. AWS Glue crawler), and it allows a customer with multiple JupiterOne accounts to collect the data into a single target location.

NOTE: J1DS writes to one additional key, jupiterone/.connection-test, which is used to test connectivity from the JupiterOne platform to the customer's S3 bucket.
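As an illustration of the partitioning scheme above, the following is a minimal Python sketch that splits an object key into its partition values. The function name parse_cdc_key is hypothetical, and the sketch assumes that the year, month, and day components are numeric. The format of the time component is not specified here, so it is kept as an opaque string.

```python
import re
from typing import Optional

# Pattern derived from the partitioning scheme described above.
# Assumption: year/month/day are numeric; <TIME_UTC> is treated as opaque text.
KEY_PATTERN = re.compile(
    r"^jupiterone/graph/cdc/"
    r"accountId=(?P<account_id>[^/]+)/"
    r"year=(?P<year>\d+)/"
    r"month=(?P<month>\d+)/"
    r"day=(?P<day>\d+)/"
    r"time=(?P<time_utc>[^/]+)\.jsonl$"
)


def parse_cdc_key(key: str) -> Optional[dict]:
    """Return the partition values for a J1DS object key, or None if no match."""
    match = KEY_PATTERN.match(key)
    if match is None:
        # Keys outside the CDC prefix (such as jupiterone/.connection-test)
        # are not change data and are skipped.
        return None
    parts = match.groupdict()
    for name in ("year", "month", "day"):
        parts[name] = int(parts[name])
    return parts
```

A consumer that lists the bucket could use a helper like this to select files for a given account or day, and to ignore the connection test key.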

Data Format

The records that are written to the S3 bucket represent the "updated" or "after" state for each graph object after each change or transaction that has taken place in the database.

An example record would look like the following:

{
    "operation": "u",
    "eventType": "entity",
    "properties": {
        "_scope": "eb4f2fac-e9a1-474d-8859-5f0e5ef90b16",
        "_partition": 635,
        "ultraRestricted": false,
        "primaryTeamOwner": false,
        "_integrationDefinitionId": "e770f533-4e49-40e0-8fd7-75bbb79dd824",
        "_source": "integration-managed",
        "_key": "slack-user:team_T0129XXXXXX:user_U09B1XXXXXX",
        "_accountId": "j1dev",
        "userType": "user",
        "id": "U09B1XXXXXX",
        "_createdOn": 1755879780646,
        "_type": "slack_user",
        "username": "some.user",
        "_id": "3e6fa6e5-158d-5f03-8839-a964652b57dc",
        "tag.CriticalAsset": true,
        "name": "some.user",
        "userId": "U09B1XXXXXX",
        "_integrationInstanceId": "eb4f2fac-e9a1-474d-8859-5f0e5ef90b16",
        "_deleted": false,
        "updatedOn": 1759495751,
        "firstName": "Some",
        "tag.smartClass.CriticalAsset": true,
        "lastName": "User",
        "_integrationType": "slack",
        "teamAdmin": false,
        "restricted": false,
        "emailDomain": "example.com",
        "tag.SmartClass": true,
        "teamOwner": false,
        "isActive": true,
        "appUser": false,
        "_version": 1,
        "_class": [
            "User"
        ],
        "shortLoginId": "some.user",
        "email": "some.user@example.com",
        "bot": false,
        "admin": false,
        "active": true,
        "_integrationName": "Slack Example",
        "realName": "Some User",
        "displayName": "Some User",
        "_beginOn": 1759510655079
    },
    "labels": [
        "Entity",
        "User",
        "slack_user"
    ]
}

This event shows us that:

The record is for an event type of entity (rather than a relationship)
The operation is u ("update"), meaning this is a new version of the record
The properties object holds the entity's properties as of this change
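To make the record layout concrete, here is a minimal Python sketch that reads the contents of one .jsonl file and summarizes each event using only the fields shown in the example record. The function names iter_change_events and summarize are hypothetical. Only the "u" operation code appears in the example, so the sketch passes the operation value through as-is rather than assuming codes for Create or Delete.

```python
import json
from typing import Iterator


def iter_change_events(jsonl_text: str) -> Iterator[dict]:
    """Yield one change event per non-empty line of a J1DS .jsonl payload."""
    for line in jsonl_text.splitlines():
        line = line.strip()
        if line:
            yield json.loads(line)


def summarize(event: dict) -> str:
    """Build a one-line summary from fields shown in the example record."""
    props = event.get("properties", {})
    return "{} {} {} ({})".format(
        event.get("operation"),   # e.g. "u" for update
        event.get("eventType"),   # e.g. "entity"
        props.get("_key"),
        props.get("_type"),
    )
```

For the example record above, summarize would return "u entity slack-user:team_T0129XXXXXX:user_U09B1XXXXXX (slack_user)".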

Configuring J1DS

During the closed beta phase, the configuration of J1DS is coordinated through the customer success and support team. That team will provide customers with the specific details needed to make the system work, and the customer will need to provide details about the S3 bucket and its associated AWS account.

The steps for configuration are:

Create an AWS S3 bucket in the specific region that corresponds to your JupiterOne tenant

US (.us.jupiterone.io): us-east-2
EU (.eu.jupiterone.io): eu-central-1

NOTE: These specific regions are required due to network traffic costs. If the bucket is created in the wrong region, J1DS will not function.

Configure the AWS S3 bucket to have the following policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowJupiterOnePutObject",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<JUPITERONE_AWS_ACCOUNT>:role/<JUPITERONE_ROLE_NAME>"
            },
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::<BUCKET_NAME>/jupiterone/*",
            "Condition": {
                "Bool": {
                    "aws:SecureTransport": "true"
                }
            }
        }
    ]
}

NOTE: The specific values for JUPITERONE_AWS_ACCOUNT and JUPITERONE_ROLE_NAME will be provided to the customer by the customer success and support team.

Once these setup tasks are completed and verified with JupiterOne, the CDC stream will be enabled.
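For customers who manage bucket policies in code, the policy above could be generated with a small helper such as the following Python sketch. The function name build_bucket_policy and the REQUIRED_REGION mapping are hypothetical conveniences, not part of J1DS. The account and role values must be the ones supplied by JupiterOne.

```python
from typing import Dict

# Required bucket regions per tenant, as listed in the steps above.
REQUIRED_REGION: Dict[str, str] = {"us": "us-east-2", "eu": "eu-central-1"}


def build_bucket_policy(bucket_name: str, j1_aws_account: str, j1_role_name: str) -> dict:
    """Return the bucket policy document with the placeholders filled in."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowJupiterOnePutObject",
                "Effect": "Allow",
                "Principal": {
                    "AWS": f"arn:aws:iam::{j1_aws_account}:role/{j1_role_name}"
                },
                # Write-only access, limited to the jupiterone/ prefix.
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/jupiterone/*",
                # Only allow requests made over TLS.
                "Condition": {"Bool": {"aws:SecureTransport": "true"}},
            }
        ],
    }
```

The returned dictionary can be serialized with json.dumps and applied to the bucket with whatever tooling is already in use.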

Frequently Asked Questions (FAQ)

Q: When do you do a full export of the database?

A: J1DS is intended to provide a change stream from the point at which it is enabled; it does not provide a full export of the contents of the JupiterOne database. The change data can be used, for example, to reconstruct the history of an entity: the current state of the entity in JupiterOne is "now", and the CDC data can be used to walk backwards through previous versions of the object, computing diffs for properties of interest along the way.
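The diff step described in this answer might look like the following Python sketch, which compares the properties objects of two versions of the same entity. The function name diff_properties is hypothetical. How the versions are ordered (for example by the _version or _beginOn properties seen in the example record) is an assumption left to the consumer.

```python
from typing import Any, Dict, Tuple

_MISSING = object()  # sentinel for "property not present in this version"


def diff_properties(older: Dict[str, Any], newer: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Map each changed property name to an (old_value, new_value) pair.

    None in a pair means the property was absent on that side.
    """
    changes: Dict[str, Tuple[Any, Any]] = {}
    for name in sorted(set(older) | set(newer)):
        old = older.get(name, _MISSING)
        new = newer.get(name, _MISSING)
        if old != new:
            changes[name] = (
                None if old is _MISSING else old,
                None if new is _MISSING else new,
            )
    return changes
```

Applying this to consecutive records for the same _key, newest first, yields the sequence of changes needed to step backwards from the current state.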

Q: How is this system secured?

A: There are multiple steps to ensure that data is transferred securely, and to the intended bucket.

When configuring J1DS, you must provide the bucket name AND the AWS account that the bucket belongs to. When J1DS writes data to the S3 bucket, the expected bucket owner mechanism is used. This mitigates AWS S3 bucket hijacking, where a bucket with the same name is recreated in a different AWS account.
The policy needed by J1DS to write to the customer S3 bucket gives no read access to that bucket. The system can only write data; it cannot read data.

Q: Does J1DS Support S3 managed KMS Encryption?

A: Yes! The PutObject API calls made by J1DS (and the multipart upload variant) will work with S3 managed KMS encryption keys.

Q: How Frequently is Data Written?

A: J1DS writes CDC data to the bucket every few minutes; the interval may vary between 5 and 15 minutes. If there are no new transaction logs to process, no data is written.

Known Issues and Limitations

The data written to the S3 bucket is not currently compressed. We will be adding gzip compression to these objects in the future.
Data is written on a best-effort basis, and the design is such that data should be written "at most once". While the system tolerates transient failures and will attempt to transfer all transactions, it is not possible to buffer transactions indefinitely.
In the extremely unlikely event that a database is restored from a backup, it is possible that J1DS has already transmitted change data that is no longer present in the database. Due to the nature of the data held by JupiterOne, this should not represent a significant issue. There is no mechanism to reconcile the CDC stream in this scenario.
