Sorting in MongoDB Aggregation Pipeline Explained

Last updated on 15 July 2022

#mongodb#databases

If you have ever used MongoDB, chances are you have used sorting operation at least once. While it is very trivial, you need to pay extra care when sorting in the MongoDB aggregation pipeline.

There are some stages in the aggregation pipeline that disturb the order of documents (eg. $group stage). That's why the order of the stages in your aggregation pipeline matters.

In this article, we will take a look at how you can use the $sort stage in the MongoDB aggregation pipeline, customize the functionality using multiple fields, and perform consistent sorting.

Here's how the article is structured:

🗳️ Establish the data
🎍 Sorting with single field
🪞 Case insensitive sorting
💫 Using multiple fields
🔮 Search result relevance based sorting
✨ Perform consistent sorting
🥅 Memory restriction
🖼️ Conclusion

There will be links for all the code snippets so you can tweak & test while going through the article.

Establish the data

To understand sorting, we will need some sample data for our experiments. You are always free to try this out on a different dataset.

For the purpose of this tutorial, we will use this sample Movies collection with only 5 documents in it:

1{
2  "name": "Spidey One way home",
3  "release_year": "2021",
4  "rating": 9,
5  "starring": [
6    "Tom Hanks",
7    "Tom Holland",
8    "Mark Zucks",
9    "Samy"
10  ],
11  "runtime": 120,
12  "totalReviews": 2000,
13  "director": "Jon What"
14},
15{
16  "name": "The Arrival of a Train",
17  "release_year": "1896",
18  "rating": 6,
19  "starring": [
20    "Shawn Ching",
21    "Looker Blindspot",
22    "Tom Hanks"
23  ],
24  "runtime": 115,
25  "totalReviews": 720,
26  "director": "Ricky"
27},
28{
29  "name": "Lost persuit of adventure",
30  "release_year": "2005",
31  "rating": 7.1,
32  "starring": [
33    "Jimmy simmon",
34    "Catarina"
35  ],
36  "runtime": 150,
37  "totalReviews": 823,
38  "director": "Ricky"
39},
40{
41  "name": "Jungle Warrior",
42  "release_year": "2016",
43  "rating": 6,
44  "starring": [
45    "Stormer",
46    "Carmony",
47    "Tom Hanks"
48  ],
49  "runtime": 150,
50  "totalReviews": 1368,
51  "director": "Whim Wailer"
52},
53{
54  "name": "the Last of the us all",
55  "release_year": "2005",
56  "rating": 8.5,
57  "starring": [
58    "Samy",
59    "George wise",
60    "Pennywise"
61  ],
62  "runtime": 120,
63  "totalReviews": 1800,
64  "director": "Jon What"
65}

Sort stage

To sort in MongoDB aggregation pipeline, here is how the syntax looks like:

1{ $sort: { <field1>: <sort order>, <field2>: <sort order> ... } }

Where, field1 (and field2) are fields in your MongoDB collection and the sort order value specifies the order of sorting. It takes one of the following three values:

To sort in an ascending order, the value is 1.
For the descending order, it is -1.
{ $meta: "textScore" } to sort by computed score metadata (discussed in the later section).

Sorting with single field

Let's start with a simple scenario where we need to sort using a single field in ascending order.

Run Code

1{
2  "$sort": {
3    "name": 1
4  }
5}

We are passing the field name (in this case, "name") and the sort order to the sort stage in the aggregation pipeline. This translates to: "sort using the name field, in ascending order".

Here's the sorted output:

1[
2  {
3    "_id": ObjectId("5a934e000102030405000003"),
4    "director": "Whim Wailer",
5    "name": "Jungle Warrior",
6    "rating": 5.9,
7    "release_year": "2016",
8    "runtime": 150,
9    "starring": [
10      "Stormer",
11      "Carmony",
12      "Tom Hanks"
13    ],
14    "totalReviews": 1368
15  },
16  {
17    "_id": ObjectId("5a934e000102030405000002"),
18    "director": "Ricky",
19    "name": "Lost persuit of adventure",
20    "rating": 7.1,
21    "release_year": "2005",
22    "runtime": 150,
23    "starring": [
24      "Jimmy simmon",
25      "Catarina"
26    ],
27    "totalReviews": 823
28  },
29  {
30    "_id": ObjectId("5a934e000102030405000000"),
31    "director": "Jon What",
32    "name": "Spidey One way home",
33    "rating": 9,
34    "release_year": "2021",
35    "runtime": 120,
36    "starring": [
37      "Tom Hanks",
38      "Tom Holland",
39      "Mark Zucks",
40      "Samy"
41    ],
42    "totalReviews": 2000
43  },
44  {
45    "_id": ObjectId("5a934e000102030405000001"),
46    "director": "Ricky",
47    "name": "The Arrival of a Train",
48    "rating": 6,
49    "release_year": "1896",
50    "runtime": 115,
51    "starring": [
52      "Shawn Ching",
53      "Looker Blindspot",
54      "Tom Hanks"
55    ],
56    "totalReviews": 720
57  },
58  {
59    "_id": ObjectId("5a934e000102030405000004"),
60    "director": "Jon What",
61    "name": "The Last of the us all",
62    "rating": 8.5,
63    "release_year": "2005",
64    "runtime": 120,
65    "starring": [
66      "Samy",
67      "George wise",
68      "Pennywise"
69    ],
70    "totalReviews": 1800
71  }
72]

Case insensitive sorting

By default, MongoDB considers the character casing when sorting.

To make case-insensitive sorting query in the aggregation pipeline, we can pre-process the documents in our Movies collection. How? Let's take a look:

Run Code

1{
2    "$addFields": {
3      "lowercasedName": {
4        "$toLower": "$name"
5      }
6    }
7  },
8  {
9    "$sort": {
10      "lowercasedName": 1
11    }
12  }

We are adding a new field to our movie documents called "lowercasedName". And it is nothing but a lowercase version of the movie's name. Second, we are sorting on the newly created field instead of the "name" field.

Notice we had two movie documents with the name starting with "The" and "the". Performing the above query normalizes it.

Movies sorted by their names in ascending order

1[
2  {
3    "_id": ObjectId("5a934e000102030405000003"),
4    "director": "Whim Wailer",
5    "lowercasedName": "jungle warrior",
6    "name": "Jungle Warrior",
7    "rating": 5.9,
8    "release_year": "2016",
9    "runtime": 150,
10    "starring": [
11      "Stormer",
12      "Carmony",
13      "Tom Hanks"
14    ],
15    "totalReviews": 1368
16  },
17  {
18    "_id": ObjectId("5a934e000102030405000002"),
19    "director": "Ricky",
20    "lowercasedName": "lost persuit of adventure",
21    "name": "Lost persuit of adventure",
22    "rating": 7.1,
23    "release_year": "2005",
24    "runtime": 150,
25    "starring": [
26      "Jimmy simmon",
27      "Catarina"
28    ],
29    "totalReviews": 823
30  },
31  {
32    "_id": ObjectId("5a934e000102030405000000"),
33    "director": "Jon What",
34    "lowercasedName": "spidey one way home",
35    "name": "Spidey One way home",
36    "rating": 9,
37    "release_year": "2021",
38    "runtime": 120,
39    "starring": [
40      "Tom Hanks",
41      "Tom Holland",
42      "Mark Zucks",
43      "Samy"
44    ],
45    "totalReviews": 2000
46  },
47  {
48    "_id": ObjectId("5a934e000102030405000001"),
49    "director": "Ricky",
50    "lowercasedName": "the arrival of a train",
51    "name": "The Arrival of a Train",
52    "rating": 6,
53    "release_year": "1896",
54    "runtime": 115,
55    "starring": [
56      "Shawn Ching",
57      "Looker Blindspot",
58      "Tom Hanks"
59    ],
60    "totalReviews": 720
61  },
62  {
63    "_id": ObjectId("5a934e000102030405000004"),
64    "director": "Jon What",
65    "lowercasedName": "the last of the us all",
66    "name": "the Last of the us all",
67    "rating": 8.5,
68    "release_year": "2005",
69    "runtime": 120,
70    "starring": [
71      "Samy",
72      "George wise",
73      "Pennywise"
74    ],
75    "totalReviews": 1800
76  }
77]

Similar output can be achieved using collation (introduced in version 3.4) as well.

Using multiple fields

Extending to the single field sorting mechanism, you can choose to sort by multiple fields as well.

Run Code

1{
2  "$sort": {
3    "rating": -1,
4    "release_year": 1
5  }
6}

Sorting happens incrementally for every field present in the sort stage.

The above sort query first sorts the results by descending rating values. Then, it sorts the results by the ascending release year. You get good old movies 📺.

Search result relevance based sorting

If you are performing a $match on a indexed text field, you can sort the search result by relevance.

1{
2  $match: {
3    $text: {
4      $search: "of"
5    }
6  }
7},
8{
9  $sort: {
10    score: {
11      $meta: "textScore"
12    }
13  }
14}

Here's how it works:

First $match stage searches for the keyword "of" in the text indexed field
Using $meta: "textScore" sorts the result in descending order of relevance

$text search requires a field indexed with a text index

Perform consistent sorting

To perform consistent sorting, there must be at least one unique field passed to the sort stage (_id is a good candidate).

If you need the sorting order to be exactly same on every pipeline run, you need consistent sorting. You can choose any unique value field for sorting (in addition to the fields you're sorting by) and you will easily achieve the desired result.

Let's take an example.

First, we are sorting by ascending value of release year. Here is the code:

Run Code

1{
2  "$sort": {
3    "release_year": 1
4  }
5}

You will notice there are two movies with the same release year of 2005. Now go ahead and put the second movie with release year of 2005 before the first one (simulating data shuffling).

Run it again. Different order of the those two movies? Yes, this is inconsistent sorting because we do not have any unique field in our sort stage.

Now let's try the same exercise with this code:

Run Code

1{
2  "$sort": {
3    "release_year": 1,
4    "name": -1
5  }
6}

You won't get any inconsistent ordering of those two movie documents. This is because we have introduced a unique field in our sort stage. Simple as that.

Optimizations

As stated in the official MongoDB documentation, sort stage can take benefit of an index if it's the first stage in the aggregation pipeline (or only after match stage).

You can read more about the optimization here.

Memory restriction

This is similar to how group stage works. When you exceed the default memory limit of 100 megabytes (or MB), it throws an error. If you intend to perform sorting on a larger dataset, use allowDiskUse.

Conclusion

And this was the introduction to the sort stage in the MongoDB aggregation pipeline. I hope it gave you better understanding of how it works, how you can use multiple fields, mix & match the sorting behavior, and perform consistent sorting to avoid any surprises (in case you depend on the sorting order).

Feel free to reach out in the comments for feedback and further queries.

I hope things are sorted 😉.

Liked the article?

First of all, here's a high five 👋 because you deserve it!

If you enjoyed the article, you will like the other articles on the MongoDB aggregation pipeline: