Querying J1DS Data in Amazon Athena

This page walks through deploying an AWS Glue and Athena infrastructure stack on top of your J1DS S3 bucket and running three SQL query patterns against your JupiterOne entity and relationship change history.

For instructions on enabling J1DS and configuring your S3 bucket, see Data Streaming (J1DS).

Prerequisites

AWS account with permissions to create CloudFormation stacks, Glue databases and tables, Athena workgroups, S3 buckets, and IAM roles
AWS CLI v2 installed and configured with credentials for your target account
J1DS enabled and streaming to an S3 bucket (see Data Streaming (J1DS) for setup)
The S3 bucket name where J1DS is writing data

Deploy the Stack

1. Download the CloudFormation template:

Download the CloudFormation template

2. Deploy the stack:

aws cloudformation deploy \
  --template-file j1ds-athena-setup.yaml \
  --stack-name j1ds-usage-example \
  --parameter-overrides \
    SourceBucketName=YOUR_BUCKET_NAME \
  --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM

Replace YOUR_BUCKET_NAME with your J1DS S3 bucket name.

3. Note the stack outputs — the deploy command prints the Glue database name (j1ds-usage-example-cdc) and Athena WorkGroup name (j1ds-usage-example). These values appear in every query.

Verify the Deployment

Run a COUNT query in the Athena console to confirm data is accessible. Open the Amazon Athena console, select the j1ds-usage-example workgroup, and run:

SELECT COUNT(*) FROM "j1ds-usage-example-cdc".entities
WHERE accountid = 'YOUR_ACCOUNT_ID'
  AND year = 2026 AND month = 3;

Replace YOUR_ACCOUNT_ID with your JupiterOne account ID and adjust year/month to the current period.

Expected: a non-zero row count. If the result is zero, verify that J1DS is streaming (check your S3 bucket for recent .jsonl.gz files) and that the year/month values match files present in the bucket.

info

The accountid partition filter is required for all queries. Omitting it causes an Athena error, not an empty result — the table uses an injected partition projection that requires a caller-supplied value.

This is especially important if you stream multiple JupiterOne accounts (e.g., dev and prod) into the same CDC bucket. Each account's data is stored under a separate accountId= partition, so the accountid filter ensures your query returns results for a single account.

CDC Schema Reference

Record Structure

Every row in the entities and relationships tables corresponds to one CDC event:

Field	Column	Type	Values
operation	`operation`	string	`c` (create), `u` (update), `d` (delete)
eventType	`eventtype`	string	`entity` or `relationship`
properties	`properties`	string (JSON)	JSON blob — access with `json_extract_scalar()`
labels	`labels`	array<string> (entities) / string (relationships)	Entity: `["Entity","User","slack_user"]` / Relationship: `"HAS"`

Key Properties Fields

The properties column is a raw JSON string. Use json_extract_scalar() to access individual fields:

Field	Access Pattern	Description
`_id`	`json_extract_scalar(properties, '$._id')`	Stable UUID — the entity or relationship identifier
`_type`	`json_extract_scalar(properties, '$._type')`	Provider-specific type, e.g., `slack_user`, `aws_s3_bucket`
`_class`	`json_extract_scalar(properties, '$._class')`	Abstract class, e.g., `User`, `DataStore`
`_beginOn`	`CAST(json_extract_scalar(properties, '$._beginon') AS BIGINT)`	Epoch milliseconds when this version became current
`displayName`	`json_extract_scalar(properties, '$.displayName')`	Human-readable name

Query: Entity History

Use this query to audit all state changes to a specific entity over time. It returns every CDC event for a single entity ID, ordered chronologically. Replace YOUR_ENTITY_ID with the _id value of the entity you want to audit.

SELECT
    year,
    month,
    day,
    operation,                                              -- c=create, u=update, d=delete
    eventtype,                                              -- always 'entity' in this table
    json_extract_scalar(properties, '$._id')          AS entity_id,
    json_extract_scalar(properties, '$._type')        AS entity_type,
    json_extract_scalar(properties, '$._class')       AS entity_class,
    json_extract_scalar(properties, '$.displayName')  AS display_name,
    properties
FROM "j1ds-usage-example-cdc".entities
WHERE accountid = 'YOUR_ACCOUNT_ID'                        -- REQUIRED: injected partition
  AND year  = 2026                                         -- narrow partition range
  AND month = 3                                            -- to reduce bytes scanned
  AND json_extract_scalar(properties, '$._id') = 'YOUR_ENTITY_ID'  -- replace with your entity _id
ORDER BY year, month, day;

Example Output

year	month	day	operation	entity_type	entity_class	display_name
2026	3	17	c	slack_user	User	Alice Johnson
2026	3	18	u	slack_user	User	Alice Johnson
2026	3	19	u	slack_user	User	Alice J.

Query: Latest State Snapshot

Use this query to get the current state of all entities in your account — one row per entity, deduplicated to the most recent version. It uses MAX_BY to select the properties blob with the highest _beginon timestamp for each entity ID.

SELECT
    json_extract_scalar(latest_properties, '$._id')         AS entity_id,
    json_extract_scalar(latest_properties, '$._type')       AS entity_type,
    json_extract_scalar(latest_properties, '$._class')      AS entity_class,
    json_extract_scalar(latest_properties, '$.displayName') AS display_name,
    latest_properties                                       AS properties
FROM (
    SELECT
        MAX_BY(properties, CAST(json_extract_scalar(properties, '$._beginon') AS BIGINT)) AS latest_properties
    FROM "j1ds-usage-example-cdc".entities
    WHERE accountid = 'YOUR_ACCOUNT_ID'                    -- REQUIRED: injected partition
      AND year  = 2026                                     -- filter to recent data
      AND month = 3                                        -- to control scan cost
      AND json_extract_scalar(properties, '$._beginon') IS NOT NULL  -- guard against missing _beginon
    GROUP BY json_extract_scalar(properties, '$._id')
)
ORDER BY entity_type, display_name;

-- NOTE: _beginon is epoch milliseconds stored as a JSON number.
-- CAST to BIGINT is required for correct MAX_BY ordering (not lexicographic).
-- operation: c=create, u=update, d=delete

Example Output

entity_id	entity_type	entity_class	display_name
a1b2c3d4-...	aws_s3_bucket	DataStore	my-data-bucket
d4e5f6a7-...	slack_user	User	Alice Johnson

Query: Change Detection

Use this query to see what changed in your account over a time window, grouped by day, operation type, and entity type. Adjust the day BETWEEN range to scope the window you want to examine.

SELECT
    year,
    month,
    day,
    operation,                                              -- c=create, u=update, d=delete
    json_extract_scalar(properties, '$._type')        AS entity_type,
    COUNT(*)                                           AS change_count
FROM "j1ds-usage-example-cdc".entities
WHERE accountid = 'YOUR_ACCOUNT_ID'                        -- REQUIRED: injected partition
  AND year  = 2026
  AND month = 3
  AND day   BETWEEN 1 AND 7                                -- replace with your window
GROUP BY year, month, day, operation,
         json_extract_scalar(properties, '$._type')
ORDER BY year, month, day, operation;

-- TIP: To compare two windows, wrap each as a subquery and LEFT JOIN on entity_type,
-- or use EXCEPT to find entity types that appeared only in one window.
-- eventtype is always 'entity' in this table (use relationships table for relationship changes).

Example Output

year	month	day	operation	entity_type	change_count
2026	3	1	c	aws_iam_role	12
2026	3	1	u	slack_user	45
2026	3	2	c	aws_s3_bucket	3

Cost and Performance Notes

note

Athena charges $5 per TB scanned
All queries include partition filters (accountid, year, month) to minimize scan scope
The WorkGroup enforces a 10 GB per-query cutoff (~$0.05 maximum per query)
Typical per-query cost for a single account's daily data: under $0.01
Adding day filters (e.g., AND day BETWEEN 1 AND 7) reduces scanned data further

Prerequisites​

Deploy the Stack​

Verify the Deployment​

CDC Schema Reference​

Record Structure​

Key Properties Fields​

Query: Entity History​

Example Output​

Query: Latest State Snapshot​

Example Output​

Query: Change Detection​

Example Output​

Cost and Performance Notes​

Contents

Prerequisites

Deploy the Stack

Verify the Deployment

CDC Schema Reference

Record Structure

Key Properties Fields

Query: Entity History

Example Output

Query: Latest State Snapshot

Example Output

Query: Change Detection

Example Output

Cost and Performance Notes