Regular Expressions
Regular Expressions (regex) can be used in J1QL to filter against properties and to extract values using named capture groups in RETURN statements.
Example Queries
All Administrator Roles
Find all roles called Administrator using a case-insensitive search on the role name:
FIND AccessRole WITH name = /administrator/i
All AWS Policies Allowing Update and Delete
Find all AWS entities that are related by an ALLOW policy whose permission flags include both Update and Delete:
FIND * WITH _integrationType = "aws"
THAT ALLOWS >> AS r *
WHERE r.permissionFlags = /..UD.../
RETURN TREE
Supported Features
The features available today are limited to what the upstream storage services support and to what is considered safe in regex with respect to performance and complexity.
Character Classes
Standard character classes are supported:
- [0-9]
- [a-zA-Z]
- [a-zA-Z0-9]
Some shorthand classes are also supported:
- \d
- \D
- \w
- \W
The following are currently NOT supported:
- \s and \S: these whitespace shorthand classes are not supported. You can use a literal space in your regex instead.
- POSIX character classes such as [:digit:]
- Literal whitespace outside a character class. Whitespace in the regex needs to be in a character class, for example /john[ ]smith/ to match the string john smith.
The limitation on the use of \s, \S, and literal whitespace is due to the regex implementation in Elasticsearch. This limitation is expected to be resolved soon.
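As a minimal sketch of the whitespace rule above, a pattern can be pre-processed so that every literal space outside a character class is wrapped in [ ]. The helper name wrap_spaces is hypothetical and not part of J1QL; it is plain Python that rewrites the pattern text before it is pasted into a query.

```python
def wrap_spaces(pattern):
    """Wrap each literal space that sits outside a character class in [ ]."""
    out = []
    in_class = False  # True while between '[' and its closing ']'
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Copy an escape sequence (for example \d or \]) through untouched.
            out.append(pattern[i:i + 2])
            i += 2
            continue
        if ch == "[" and not in_class:
            in_class = True
        elif ch == "]" and in_class:
            in_class = False
        out.append("[ ]" if ch == " " and not in_class else ch)
        i += 1
    return "".join(out)


print(wrap_spaces("john smith"))
```

A space that is already inside a character class is left alone, so running the helper on /john[ ]smith/ changes nothing.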
Anchor Tags
The start ^ and end $ anchor tags are not supported at this time. However, regex filters can be combined with the ^= (starts with) and $= (ends with) comparison operators:
FIND User WITH name ^= /john/i
FIND User WITH name $= /smith/i
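If you have an existing pattern that relies on a leading ^ or a trailing $, the following Python sketch shows one way to turn it into a filter clause of the form used above. The function anchored_to_j1ql is a hypothetical helper, not part of J1QL, and it deliberately refuses patterns anchored at both ends because this document does not say how such a pattern should be expressed.

```python
def anchored_to_j1ql(field, pattern, flags=""):
    """Build a J1QL filter clause from a pattern that may carry a ^ or $ anchor."""
    starts = pattern.startswith("^")
    body = pattern[:-1]
    # A trailing $ is an anchor only if an even number of backslashes precede it.
    trailing_backslashes = len(body) - len(body.rstrip("\\"))
    ends = pattern.endswith("$") and trailing_backslashes % 2 == 0
    if starts and ends:
        raise ValueError("anchored at both ends; not expressible as one ^= or $= filter")
    if starts:
        return f"{field} ^= /{pattern[1:]}/{flags}"
    if ends:
        return f"{field} $= /{pattern[:-1]}/{flags}"
    return f"{field} = /{pattern}/{flags}"


print(anchored_to_j1ql("name", "^john", "i"))
print(anchored_to_j1ql("name", "smith$", "i"))
```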
Alternation
Regular expression | alternation is not supported, although multiple regex filters can be applied to the same field using the J1QL AND and OR syntax:
FIND User WITH (name = /john/i OR name = /smith/i)
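The rewrite from alternation to OR can be automated for simple cases. The Python sketch below splits a pattern on each | that sits outside groups and character classes and joins the pieces into a parenthesized OR clause. The names split_alternation and alternation_to_or are hypothetical, and alternation nested inside a group is left in place, so it would still need to be rewritten by hand.

```python
def split_alternation(pattern):
    """Split pattern on each '|' that is outside groups and character classes."""
    parts, current = [], []
    depth = 0         # nesting depth of (...) groups
    in_class = False  # True while between '[' and its closing ']'
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Keep escape sequences such as \| intact.
            current.append(pattern[i:i + 2])
            i += 2
            continue
        if in_class:
            in_class = ch != "]"
        elif ch == "[":
            in_class = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "|" and depth == 0:
            # Top-level alternation: close the current branch and start a new one.
            parts.append("".join(current))
            current = []
            i += 1
            continue
        current.append(ch)
        i += 1
    parts.append("".join(current))
    return parts


def alternation_to_or(field, pattern, flags=""):
    """Render one regex filter per top-level branch, joined with OR."""
    clauses = [f"{field} = /{part}/{flags}" for part in split_alternation(pattern)]
    return clauses[0] if len(clauses) == 1 else "(" + " OR ".join(clauses) + ")"


print(alternation_to_or("name", "john|smith", "i"))
```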
Other Unsupported Features
Regex has many features. Additional features that are not currently supported include:
- Lookarounds
- Atomic Groups
- Possessive Quantifiers
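To catch unsupported constructs before running a query, a pattern can be scanned locally. The following Python sketch flags the features listed in this section: whitespace shorthand, POSIX classes, bare whitespace, anchors, alternation, lookarounds, atomic groups, and possessive quantifiers. The function name unsupported_features and its labels are hypothetical, and the scan is a simple approximation of regex syntax, not the parser J1QL uses.

```python
import re

_POSIX_CLASS = re.compile(r"\[:[a-z]+:\]")


def unsupported_features(pattern):
    """Return labels for unsupported constructs, in order of first appearance."""
    found = []

    def note(label):
        if label not in found:
            found.append(label)

    in_class = False  # True while between '[' and its closing ']'
    prev = ""         # previous unescaped character outside a class
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            if pattern[i + 1] in "sS":
                note("whitespace shorthand")
            prev = ""
            i += 2
            continue
        if ch == "[" and _POSIX_CLASS.match(pattern, i):
            note("POSIX character class")
        if in_class:
            if ch == "]":
                in_class = False
                prev = "]"
            i += 1
            continue
        if ch == "[":
            in_class = True
        elif ch == "|":
            note("alternation")
        elif ch in "^$":
            note("anchor")
        elif ch.isspace():
            note("whitespace outside a character class")
        elif ch == "(":
            if pattern.startswith(("(?=", "(?!", "(?<=", "(?<!"), i):
                note("lookaround")
            elif pattern.startswith("(?>", i):
                note("atomic group")
        elif ch == "+" and prev in ("*", "+", "?", "}"):
            # A '+' directly after another quantifier makes it possessive.
            note("possessive quantifier")
        prev = ch
        i += 1
    return found


print(unsupported_features(r"^john\s+smith$"))
```

An empty list means the scan found none of the listed constructs; it does not guarantee that the pattern is accepted.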
Named capture groups
Named capture groups are supported only in RETURN statements, and only by invoking the REGEX function.
The REGEX function takes two parameters:
- The property to search
- The regex to search with
REGEX requires that the regex argument has a named capture group. Anything else will fail to parse.
Capture group names
Capture group names must only be made up of letters. Anything else will fail to parse.
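The two parsing rules above (at least one named capture group, and letters-only names) can be checked before a query is sent. This is a minimal Python sketch with hypothetical helper names; it assumes that "letters" means ASCII letters, which the text above does not state explicitly.

```python
import re

# Opening of a named group, skipping the lookbehind forms (?<= and (?<!
_NAMED_GROUP = re.compile(r"\(\?<(?![=!])([^>]*)>")


def has_named_group(pattern):
    """True if the pattern contains at least one (?<name>...) group."""
    return _NAMED_GROUP.search(pattern) is not None


def invalid_group_names(pattern):
    """Return capture group names that are not made up only of letters."""
    names = _NAMED_GROUP.findall(pattern)
    # Assumption: only ASCII letters A-Z and a-z count as letters.
    return [name for name in names if not re.fullmatch(r"[A-Za-z]+", name)]


print(invalid_group_names(r"(?<first_letter>\w).*"))
```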
Return values
The REGEX function extracts named capture group values from each entity's property. Each capture group becomes its own column in the result, with the capture group name as the column name. The entirety of the match will not be included unless the REGEX function itself is aliased.
Entities where the property does not match the regex pattern will still appear in the results, with null values for the capture group columns and alias.
In practice, that means that
FIND User as u RETURN REGEX(u.username, '(?<firstLetter>\w).*')
will return a single column firstLetter that contains the first letter of the username for each user. Users whose username does not match the pattern will have a null value for firstLetter.
FIND User as u RETURN REGEX(u.username, '(?<firstLetter>\w).*(?<lastLetter>\w)') as username
returns three columns: firstLetter which contains the first letter of the match, lastLetter which contains the last letter of the match, and username which contains the entirety of the match. For non-matching entities, all three columns will be null.
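The column behavior described in this section can be mimicked outside J1QL to preview what a pattern would return. The sketch below is plain Python, not the REGEX implementation: regex_columns is a hypothetical name, the (?<name>...) syntax is translated to Python's (?P<name>...) form, and it assumes the pattern may match anywhere in the value, which this document does not confirm.

```python
import re


def regex_columns(value, pattern, alias=None):
    """Return one column per named capture group, plus the alias if given."""
    # Python spells named groups (?P<name>...), so translate the (?<name>...) form.
    py_pattern = re.sub(r"\(\?<(?![=!])", "(?P<", pattern)
    compiled = re.compile(py_pattern)
    # Assumption: the pattern is searched for anywhere in the value.
    match = compiled.search(value) if value is not None else None
    row = {name: (match.group(name) if match else None)
           for name in compiled.groupindex}
    if alias is not None:
        # The whole match is only reported when the call is aliased.
        row[alias] = match.group(0) if match else None
    return row


pattern = r"(?<firstLetter>\w).*(?<lastLetter>\w)"
print(regex_columns("alice", pattern, alias="username"))
print(regex_columns("!!!", pattern, alias="username"))
```

Non-matching values still produce a row, with every column set to None, which mirrors the null values described above.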