Filtering JSON using JMESPath

Overview

The JMESPath query language is built into Amazon’s AWS CLI. As noted in the CLI documentation, this is provided via the --query global option, which takes a JMESPath string as its argument. This can be used to filter the raw JSON results returned to the AWS CLI from the server side (i.e. from objects stored in your AWS account).

You can see an overview of the --query option here, with examples.

Server-side filtering (if it is available - and relevant - for your specific CLI command) may help to reduce the volume of JSON sent across the network, prior to any JMESPath filtering.

The JMESPath tutorial, examples and specification provide plenty of additional information.

It’s important to note that the --query option is part of the client-side CLI program. This means the CLI receives full JSON from the server, except where it has been optionally filtered already, as part of your CLI command.

Basic AWS CLI Example: Get the Name from the ID

Suppose you want to extract the name of an EC2 instance (assuming there is a Name tag defined), given you have the instance ID.

First, here is an unfiltered command which returns the complete JSON for a (fictional) EC2 instance:

aws ec2 describe-instances \
  --instance-ids "i-01100a1001aa1aaa0"

Note: If you are using the Windows Command Prompt, replace the Linux \ line continuation character with ^ (in PowerShell, use a backtick ` instead).

The resulting JSON is presented here (heavily summarized/redacted):

{
    "Reservations": [
        {
            "Groups": [],
            "Instances": [
                {
                    "AmiLaunchIndex": 0,
                    "ImageId": "ami-xyz",
                    "InstanceId": "i-01100a1001aa1aaa0",

                    ...

                    "Tags": [
                        {
                            "Key": "Name",
                            "Value": "my-first-app"
                        },
                        {
                            "Key": "awsApplication",
                            "Value": "arn:aws:resource-groups:xyz..."
                        }
                    ],

                    ...

                }
            ],
            "OwnerId": "xyz",
            "ReservationId": "r-xyz"
        }
    ]
}

To get only the name, you can use this:

aws ec2 describe-instances \
  --instance-ids "i-01100a1001aa1aaa0" \
  --query "Reservations[*].Instances[*].Tags[? Key == 'Name'].Value"

This returns the following JSON:

[
    [
        [
            "my-first-app"
        ]
    ]
]

If you want to simply return this as a string, you can use the --output text option. When added to the above command, this will return:

my-first-app

Either way, the JMESPath expression:

Reservations[*].Instances[*].Tags[? Key == 'Name'].Value

is the part which filters the original JSON, returning only the requested data.

Some JMESPath Syntax

The [*] syntax is a list wildcard expression. It causes JMESPath to iterate over each item in the related [ ... ] list - and to process the remaining JMESPath expression parts against each element in that list.

The [? ... ] syntax defines a filter expression: only those list items for which the enclosed condition is true are kept. In our case it defines a filter on only those tags where the key is Name.

The end result is that we drill down through Reservations and Instances - and return only those Value fields which match the tags filter.
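To make the wildcard and filter behaviour concrete, here is a rough plain-Python equivalent of the expression, run against a cut-down copy of the sample JSON. This is only a sketch for illustration: the function name name_tag_values is made up, and it does not reproduce every JMESPath rule (for example, JMESPath silently drops null results from a projection, whereas this sketch returns an empty list for an instance with no Tags).

```python
# Cut-down copy of the (redacted) describe-instances output shown earlier.
response = {
    "Reservations": [
        {
            "Instances": [
                {
                    "InstanceId": "i-01100a1001aa1aaa0",
                    "Tags": [
                        {"Key": "Name", "Value": "my-first-app"},
                        {"Key": "awsApplication", "Value": "arn:aws:resource-groups:xyz..."},
                    ],
                }
            ]
        }
    ]
}


def name_tag_values(doc):
    """Rough equivalent of Reservations[*].Instances[*].Tags[? Key == 'Name'].Value"""
    return [
        [
            # Tags[? Key == 'Name'].Value : keep matching tags, then take their Value
            [tag["Value"] for tag in instance.get("Tags", []) if tag.get("Key") == "Name"]
            # Instances[*] : one result per instance
            for instance in reservation.get("Instances", [])
        ]
        # Reservations[*] : one result per reservation
        for reservation in doc.get("Reservations", [])
    ]


print(name_tag_values(response))  # [[['my-first-app']]]
```

The three levels of nesting in the output correspond to the three lists we passed through: Reservations, Instances and Tags.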

More Advanced Example: Get the ID for a given Name

Let’s say we want to do the reverse: Start with a Name value from a tag field, and find the instance ID (or IDs) to which it belongs.

In other words: What is the instance ID for the instance(s) with a name of my-first-app?

How do we retrieve a JSON field (the InstanceId field) after we have selected a more deeply nested value (in this case, tag data)? How can we go “back up” the JSON nesting levels from the filtered tag data to the instance ID?

I will break this down into multiple steps, to try to show not only the end result, but how we can think our way to reaching the end result, to better understand why it works.

Let’s start with a command that runs, but isn’t what we want:

aws ec2 describe-instances \
  --query "Reservations[*].Instances[*].InstanceId"

This lists all our instance IDs:

[
    [
        "i-xxx"
    ],
    [
        "i-01100a1001aa1aaa0"
    ],
    ...
]

We can also select the exact tag that meets our requirements:

aws ec2 describe-instances \
  --query "Reservations[*].Instances[*].Tags[? Key == 'Name' && Value == 'my-first-app']"

This returns the one tag we care about:

[
    [
        []
    ],
    ...
    [
        [
            {
                "Key": "Name",
                "Value": "my-first-app"
            }
        ]
    ]
]

But it doesn’t let us then go back up the JSON hierarchy to select the instance ID.

What we need is some combination of the above two approaches. We need to select only the Instances we care about using our filter, instead of selecting them all using Instances[*].

We can use a multiselect list to get us moving in the right direction. Within the start [ and closing ] characters of a multiselect list, we will have:

one or more non expressions separated by a comma. Each expression will be evaluated against the JSON document. Each returned element will be the result of evaluating the expression.

(OK - I’m not sure what a “non expression” is - maybe that is a typo in the documentation?)

Anyway, we will only have one expression in our case: We can place our Tags filter into the Instances[*], instead of using that wildcard *.

Our JMESPath string would therefore become:

Reservations[*].Instances[ Tags[? Key == 'Name' && Value == 'my-first-app'] ]

But that’s not a valid JMESPath expression - we need to convert it into a filter which operates on the tags first (and we can continue to use our existing filter operating on Key ... && Value ...):

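Reservations[*].Instances[? Tags[? Key == 'Name' && Value == 'my-first-app'] ]

The only change I made was to add a ? directly after the opening [ of the Instances step (with no space in between, since [? is a single token) - to turn that into another filter expression. An instance passes this outer filter only when the inner Tags filter finds at least one matching tag, because JMESPath treats an empty list as false.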

This works!

It returns only those instance objects which have a Name tag with a value of my-first-app. In my case, that is only one instance.

Now, to complete our original task, we can append .InstanceId to the end of our JMESPath expression, to return only that single piece of instance data, instead of the entire instance JSON structure.

The full AWS CLI command becomes:

aws ec2 describe-instances \
  --query "Reservations[*].Instances[? Tags[? Key == 'Name' && Value == 'my-first-app'] ].InstanceId"

This returns the following JSON:

[
    [],
    ...
    [
        "i-01100a1001aa1aaa0"
    ]
]

Note that the JSON structure we get is at the Instances level. That was the whole point: It needs to be at that level so that we can access the InstanceId data we want.
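The nested filter can be read as "keep an instance if at least one of its tags matches". A minimal sketch of that logic in plain Python follows, using made-up sample data with two reservations (the function name instance_ids_by_name and the other-app instance are illustrative only, not part of any AWS API):

```python
# Hypothetical sample: two reservations, only the second holds the instance we want.
response = {
    "Reservations": [
        {"Instances": [{"InstanceId": "i-xxx",
                        "Tags": [{"Key": "Name", "Value": "other-app"}]}]},
        {"Instances": [{"InstanceId": "i-01100a1001aa1aaa0",
                        "Tags": [{"Key": "Name", "Value": "my-first-app"}]}]},
    ]
}


def instance_ids_by_name(doc, name):
    """Rough equivalent of:
    Reservations[*].Instances[? Tags[? Key == 'Name' && Value == '<name>'] ].InstanceId
    """
    def has_name_tag(instance):
        # Inner filter: true if any tag has Key 'Name' and the wanted Value.
        return any(
            tag.get("Key") == "Name" and tag.get("Value") == name
            for tag in instance.get("Tags", [])
        )

    # Outer filter: keep matching instances; results stay grouped per reservation.
    return [
        [inst["InstanceId"] for inst in reservation.get("Instances", []) if has_name_tag(inst)]
        for reservation in doc.get("Reservations", [])
    ]


print(instance_ids_by_name(response, "my-first-app"))  # [[], ['i-01100a1001aa1aaa0']]
```

Because the test is applied to each instance (not to each tag), the InstanceId is still in reach once the filter has run.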

One Extra Step - Flattening!

If you have many instances, your JSON may contain many empty [ ] JSON lists - one for each reservation in which no instance matches the my-first-app filter.

This may be cumbersome.

You can use the flatten operator [] to clean this up. It is similar to the wildcard [*] operator, except it first merges the sublists of the current result into a single list (one level of nesting per flatten). That means, in our case, that all those empty JSON lists will effectively be removed:

aws ec2 describe-instances \
  --query "Reservations[*].Instances[? Tags[? Key == 'Name' && Value == 'my-first-app'] ].InstanceId[]"

Note how we changed .InstanceId to .InstanceId[]. That’s the only thing we changed.

Now the JSON output will be:

[
    "i-01100a1001aa1aaa0"
]

Much cleaner.
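As a rough mental model, flattening merges one level of sublists into the enclosing list. A small Python sketch of that behaviour (flatten_once is a name chosen for this example, not a JMESPath or AWS function):

```python
def flatten_once(items):
    """Merge one level of sublists into a single list, leaving non-list items as they are."""
    flat = []
    for item in items:
        if isinstance(item, list):
            flat.extend(item)  # empty sublists contribute nothing, so they vanish
        else:
            flat.append(item)
    return flat


per_reservation = [[], ["i-01100a1001aa1aaa0"]]
print(flatten_once(per_reservation))        # ['i-01100a1001aa1aaa0']
print(flatten_once([["a", ["b"]], "c"]))    # ['a', ['b'], 'c'] - only one level is merged
```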

Other Languages: Python (boto3)

If you are using boto3 instead of the AWS CLI, then you will no longer be using any built-in JMESPath: Remember, JMESPath is part of the client CLI implementation, not a part of the AWS API.

But it’s easy enough to add JMESPath to your Python code (the jmespath package is installed as a dependency of boto3):

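import json

import boto3
import jmespath

# Same expression as the final CLI command above.
jmes_filter = "Reservations[*].Instances[? Tags[? Key == 'Name' && Value == 'my-first-app'] ].InstanceId[]"

ec2 = boto3.client('ec2')
# A single call may not return every instance if the results are paginated (look for NextToken).
response = ec2.describe_instances()
# Apply the JMESPath expression to the response dictionary.
data = jmespath.search(jmes_filter, response)
print(json.dumps(data, indent=2))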

This will give the same result as the final CLI command from the previous section.
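One caveat with a single describe_instances() call is that results can be paginated, with a NextToken field pointing at the next page. The sketch below shows one way to walk all the pages before filtering. It is written against any callable that behaves like ec2.describe_instances (accepting a NextToken keyword and returning a dictionary), so it can be tried without AWS access; the helper name iter_pages and the fake client are assumptions made for this example.

```python
def iter_pages(describe, **kwargs):
    """Yield each response page from a describe_instances-style callable."""
    token = None
    while True:
        params = dict(kwargs)
        if token:
            params["NextToken"] = token  # ask for the page after the previous one
        page = describe(**params)
        yield page
        token = page.get("NextToken")
        if not token:
            break  # no token means this was the last page


def fake_describe(NextToken=None):
    """Stand-in for ec2.describe_instances, returning two hard-coded pages."""
    if NextToken is None:
        return {"Reservations": [{"Instances": [{"InstanceId": "i-xxx"}]}],
                "NextToken": "page-2"}
    return {"Reservations": [{"Instances": [{"InstanceId": "i-01100a1001aa1aaa0"}]}]}


all_ids = [
    instance["InstanceId"]
    for page in iter_pages(fake_describe)
    for reservation in page["Reservations"]
    for instance in reservation["Instances"]
]
print(all_ids)  # ['i-xxx', 'i-01100a1001aa1aaa0']
```

With a real client you would pass ec2.describe_instances in place of fake_describe, and could run jmespath.search() on each page. boto3 also provides paginators (ec2.get_paginator('describe_instances')) which handle the token for you.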

Other Languages: Java AWS SDK (v2)

If you are using a language such as Java and its AWS SDK, then it’s a completely different paradigm.

When you use its Ec2Client::describeInstances() method, you will receive an object as your response - not a JSON structure. And you will iterate over the paginated results to extract the specific data you need:

import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.ec2.Ec2Client;
import software.amazon.awssdk.services.ec2.model.DescribeInstancesRequest;
import software.amazon.awssdk.services.ec2.model.DescribeInstancesResponse;
import software.amazon.awssdk.services.ec2.model.Ec2Exception;
import software.amazon.awssdk.services.ec2.model.Instance;
import software.amazon.awssdk.services.ec2.model.Reservation;
import software.amazon.awssdk.services.ec2.model.Tag;

public class AwsSdkEc2Demo {

    public static void main(String[] args) {
        Region region = Region.XYZ; // Placeholder: replace XYZ with a real constant such as Region.US_EAST_1
        try (Ec2Client ec2 = Ec2Client.builder()
                .region(region)
                .build()) {
            describeEC2Instances(ec2);
        }
    }

    public static void describeEC2Instances(Ec2Client ec2) {
        String nextToken = null;
        try {
            // Request one page at a time, following nextToken until the results are exhausted.
            do {
                DescribeInstancesRequest request = DescribeInstancesRequest.builder()
                        .maxResults(6)
                        .nextToken(nextToken)
                        .build();
                DescribeInstancesResponse response = ec2.describeInstances(request);
                for (Reservation reservation : response.reservations()) {
                    for (Instance instance : reservation.instances()) {
                        for (Tag tag : instance.tags()) {
                            if (tag.key().equals("Name") && tag.value().equals("my-first-app")) {
                                System.out.println("Instance Id is " + instance.instanceId());
                            }
                        }
                    }
                }
                nextToken = response.nextToken();
            } while (nextToken != null);

        } catch (Ec2Exception e) {
            System.err.println(e.awsErrorDetails().errorCode());
            System.exit(1);
        }
    }

}

The above bare-bones example uses the following Maven dependencies:

    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>software.amazon.awssdk</groupId>
                <artifactId>bom</artifactId>
                <version>2.20.162</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>

    <dependencies>

        <dependency>
            <groupId>software.amazon.awssdk</groupId>
            <artifactId>ec2</artifactId>
        </dependency>

        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-simple</artifactId>
            <version>2.0.9</version>
        </dependency>

    </dependencies>

There is no JSON here - and therefore no JMESPath expressions.

Postscript: Server-Side Filtering

This article is mostly about JMESPath, but just for the record, the CLI command solution could be simplified by also using --filter or --filters:

aws ec2 describe-instances \
  --filters Name=tag:Name,Values='my-first-app' \
  --query "Reservations[*].Instances[*].InstanceId[]"

This results in:

[
    "i-01100a1001aa1aaa0"
]

See here for more details.
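For completeness, the boto3 counterpart of the CLI’s server-side filter is the Filters parameter of describe_instances(), which takes a list of Name/Values dictionaries. Here is a small sketch of building that structure (the helper name tag_filter is hypothetical; the actual AWS call is shown only as a comment because it needs boto3 and credentials):

```python
def tag_filter(tag_key, *values):
    """Build a Filters list matching instances whose tag has one of the given values."""
    return [{"Name": f"tag:{tag_key}", "Values": list(values)}]


filters = tag_filter("Name", "my-first-app")
print(filters)  # [{'Name': 'tag:Name', 'Values': ['my-first-app']}]

# With boto3 installed and credentials configured, this could be passed as:
#   response = boto3.client("ec2").describe_instances(Filters=filters)
```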