Skip to main content

Resolving duplicate results

Sometimes a query may generate duplicate results. This duplication occurs if there are multiple paths of traversals (such as relationships) between the vertices (such as assets) referenced in a specific query. For example:

FIND aws_eni WITH publicIpAddress != undefined AS nic
THAT RELATES TO aws_instance
THAT RELATES TO aws_security_group AS sg THAT allows Internet
WHERE nic.securityGroupIds = sg.groupId

This query attempts to find network interfaces associated with a security group that allows public-facing AWS EC2 instances. In this case, there could be multiple security group rules allowing access to/from the Internet, which may result in duplicate data in the query result because each individual traversal is a successful match to the query. This issue can be resolved by leveraging UNIQUE and RETURN.

UNIQUE and RETURN

You can use a combination of UNIQUE and RETURN keywords to filter out the duplicates. We can modify the above example query to eliminate duplicate data within our results:

FIND UNIQUE aws_eni WITH publicIpAddress != undefined AS nic
THAT RELATES TO aws_instance
THAT RELATES TO aws_security_group AS sg THAT ALLOWS Internet
WHERE
nic.securityGroupIds = sg.groupId
RETURN
nic.id, nic.subnetId, nic.attachmentId,
nic.active, nic.privateIp, nic.publicIp,
nic.vpcId, nic.securityGroupIds, nic.securityGroupNames,
nic.tag.AccountName, nic.webLink

Limitation: UNIQUE keyword must be used together with RETURN.

The following query may return multiple rows containing the same values. In the below example, Peter Parker is using the same username across different applications:

FIND User AS u THAT IS Person AS p
RETURN u.username, p.firstName, p.lastName, p.email
u.usernamep.firstNamep.lastNamep.email
spidermanPeterParkernot.spiderman@example.com
spidermanPeterParkernot.spiderman@example.com
batmanBruceWaynetotally.not.batman@example.com
bwayneBruceWaynetotally.not.batman@example.com

Modifying the query to leverage the UNIQUE keyword will make the J1QL engine deduplicate rows:

FIND UNIQUE User AS u THAT IS Person AS p
RETURN u.username, p.firstName, p.lastName, p.email
u.usernamep.firstNamep.lastNamep.email
spidermanPeterParkernot.spiderman@example.com
batmanBruceWaynetotally.not.batman@example.com
bwayneBruceWaynetotally.not.batman@example.com
note

De-duplicating rows using the UNIQUE keyword will cause the_meta property returned for each row of data to be stripped out.

MERGE

The function MERGE() allows you to merge multiple properties into a single property list. You can now combine multiple (defined) properties into a single property without having to choose to return one property or another.

FIND UNIQUE User AS u
THAT IS Person AS p
RETURN p.displayName, merge(p.email, u.email, u.publicEmail) AS “Email List” (edited)