Resolving duplicate results
Sometimes a query may generate duplicate results. This duplication occurs if there are multiple paths of traversals (such as relationships) between the vertices (such as assets) referenced in a specific query. For example:
FIND aws_eni WITH publicIpAddress != undefined AS nic
THAT RELATES TO aws_instance
THAT RELATES TO aws_security_group AS sg THAT allows Internet
WHERE nic.securityGroupIds = sg.groupId
This query attempts to find network interfaces associated with a security group that allows public-facing AWS EC2 instances. In this case, there could be multiple security group rules allowing access to/from the Internet, which may result in duplicate data in the query result because each individual traversal is a successful match to the query. This issue can be resolved by leveraging UNIQUE
and RETURN
.
UNIQUE and RETURN
You can use a combination of UNIQUE
and RETURN
keywords to filter out the duplicates. We can modify the above example query to eliminate duplicate data within our results:
FIND UNIQUE aws_eni WITH publicIpAddress != undefined AS nic
THAT RELATES TO aws_instance
THAT RELATES TO aws_security_group AS sg THAT ALLOWS Internet
WHERE
nic.securityGroupIds = sg.groupId
RETURN
nic.id, nic.subnetId, nic.attachmentId,
nic.active, nic.privateIp, nic.publicIp,
nic.vpcId, nic.securityGroupIds, nic.securityGroupNames,
nic.tag.AccountName, nic.webLink
Limitation:
UNIQUE
keyword must be used together withRETURN
.
The following query may return multiple rows containing the same values. In the below example, Peter Parker is using the same username across different applications:
FIND User AS u THAT IS Person AS p
RETURN u.username, p.firstName, p.lastName, p.email
u.username | p.firstName | p.lastName | p.email |
---|---|---|---|
spiderman | Peter | Parker | not.spiderman@example.com |
spiderman | Peter | Parker | not.spiderman@example.com |
batman | Bruce | Wayne | totally.not.batman@example.com |
bwayne | Bruce | Wayne | totally.not.batman@example.com |
Modifying the query to leverage the UNIQUE
keyword will make the J1QL engine deduplicate rows:
FIND UNIQUE User AS u THAT IS Person AS p
RETURN u.username, p.firstName, p.lastName, p.email
u.username | p.firstName | p.lastName | p.email |
---|---|---|---|
spiderman | Peter | Parker | not.spiderman@example.com |
batman | Bruce | Wayne | totally.not.batman@example.com |
bwayne | Bruce | Wayne | totally.not.batman@example.com |
De-duplicating rows using the UNIQUE
keyword will cause the_meta
property returned for each row of data to be stripped out.
MERGE
The function MERGE()
allows you to merge multiple properties into a single property list. You can now combine multiple (defined) properties into a single property without having to choose to return one property or another.
FIND UNIQUE User AS u
THAT IS Person AS p
RETURN p.displayName, merge(p.email, u.email, u.publicEmail) AS “Email List” (edited)