Sorting in MongoDB Aggregation Pipeline Explained
Last updated on 15 July 2022
If you have ever used MongoDB, chances are you have used sorting operation at least once. While it is very trivial, you need to pay extra care when sorting in the MongoDB aggregation pipeline.
There are some stages in the aggregation pipeline that disturb the order of documents (eg. $group stage). That's why the order of the stages in your aggregation pipeline matters.
In this article, we will take a look at how you can use the $sort
stage in the MongoDB aggregation pipeline, customize the functionality using multiple fields, and perform consistent sorting.
Here's how the article is structured:
- 🗳️ Establish the data
- 🎍 Sorting with single field
- 🪞 Case insensitive sorting
- 💫 Using multiple fields
- 🔮 Search result relevance based sorting
- ✨ Perform consistent sorting
- 🥅 Memory restriction
- 🖼️ Conclusion
There will be links for all the code snippets so you can tweak & test while going through the article.
Establish the data
To understand sorting, we will need some sample data for our experiments. You are always free to try this out on a different dataset.
For the purpose of this tutorial, we will use this sample Movies
collection with only 5 documents in it:
1{2 "name": "Spidey One way home",3 "release_year": "2021",4 "rating": 9,5 "starring": [6 "Tom Hanks",7 "Tom Holland",8 "Mark Zucks",9 "Samy"10 ],11 "runtime": 120,12 "totalReviews": 2000,13 "director": "Jon What"14},15{16 "name": "The Arrival of a Train",17 "release_year": "1896",18 "rating": 6,19 "starring": [20 "Shawn Ching",21 "Looker Blindspot",22 "Tom Hanks"23 ],24 "runtime": 115,25 "totalReviews": 720,26 "director": "Ricky"27},28{29 "name": "Lost persuit of adventure",30 "release_year": "2005",31 "rating": 7.1,32 "starring": [33 "Jimmy simmon",34 "Catarina"35 ],36 "runtime": 150,37 "totalReviews": 823,38 "director": "Ricky"39},40{41 "name": "Jungle Warrior",42 "release_year": "2016",43 "rating": 6,44 "starring": [45 "Stormer",46 "Carmony",47 "Tom Hanks"48 ],49 "runtime": 150,50 "totalReviews": 1368,51 "director": "Whim Wailer"52},53{54 "name": "the Last of the us all",55 "release_year": "2005",56 "rating": 8.5,57 "starring": [58 "Samy",59 "George wise",60 "Pennywise"61 ],62 "runtime": 120,63 "totalReviews": 1800,64 "director": "Jon What"65}
Sort stage
To sort in MongoDB aggregation pipeline, here is how the syntax looks like:
1{ $sort: { <field1>: <sort order>, <field2>: <sort order> ... } }
Where, field1
(and field2
) are fields in your MongoDB collection and the sort order
value specifies the order of sorting. It takes one of the following three values:
- To sort in an ascending order, the value is
1
. - For the descending order, it is
-1
. { $meta: "textScore" }
to sort by computed score metadata (discussed in the later section).
Sorting with single field
Let's start with a simple scenario where we need to sort using a single field in ascending order.
1{2 "$sort": {3 "name": 14 }5}
We are passing the field name (in this case, "name") and the sort order to the sort stage in the aggregation pipeline. This translates to: "sort using the name field, in ascending order".
Here's the sorted output:
1[2 {3 "_id": ObjectId("5a934e000102030405000003"),4 "director": "Whim Wailer",5 "name": "Jungle Warrior",6 "rating": 5.9,7 "release_year": "2016",8 "runtime": 150,9 "starring": [10 "Stormer",11 "Carmony",12 "Tom Hanks"13 ],14 "totalReviews": 136815 },16 {17 "_id": ObjectId("5a934e000102030405000002"),18 "director": "Ricky",19 "name": "Lost persuit of adventure",20 "rating": 7.1,21 "release_year": "2005",22 "runtime": 150,23 "starring": [24 "Jimmy simmon",25 "Catarina"26 ],27 "totalReviews": 82328 },29 {30 "_id": ObjectId("5a934e000102030405000000"),31 "director": "Jon What",32 "name": "Spidey One way home",33 "rating": 9,34 "release_year": "2021",35 "runtime": 120,36 "starring": [37 "Tom Hanks",38 "Tom Holland",39 "Mark Zucks",40 "Samy"41 ],42 "totalReviews": 200043 },44 {45 "_id": ObjectId("5a934e000102030405000001"),46 "director": "Ricky",47 "name": "The Arrival of a Train",48 "rating": 6,49 "release_year": "1896",50 "runtime": 115,51 "starring": [52 "Shawn Ching",53 "Looker Blindspot",54 "Tom Hanks"55 ],56 "totalReviews": 72057 },58 {59 "_id": ObjectId("5a934e000102030405000004"),60 "director": "Jon What",61 "name": "The Last of the us all",62 "rating": 8.5,63 "release_year": "2005",64 "runtime": 120,65 "starring": [66 "Samy",67 "George wise",68 "Pennywise"69 ],70 "totalReviews": 180071 }72]
Case insensitive sorting
By default, MongoDB considers the character casing when sorting.
To make case-insensitive sorting query in the aggregation pipeline, we can pre-process the documents in our Movies
collection. How? Let's take a look:
1{2 "$addFields": {3 "lowercasedName": {4 "$toLower": "$name"5 }6 }7 },8 {9 "$sort": {10 "lowercasedName": 111 }12 }
We are adding a new field to our movie documents called "lowercasedName". And it is nothing but a lowercase version of the movie's name. Second, we are sorting on the newly created field instead of the "name" field.
Notice we had two movie documents with the name starting with "The" and "the". Performing the above query normalizes it.
Movies sorted by their names in ascending order
1[2 {3 "_id": ObjectId("5a934e000102030405000003"),4 "director": "Whim Wailer",5 "lowercasedName": "jungle warrior",6 "name": "Jungle Warrior",7 "rating": 5.9,8 "release_year": "2016",9 "runtime": 150,10 "starring": [11 "Stormer",12 "Carmony",13 "Tom Hanks"14 ],15 "totalReviews": 136816 },17 {18 "_id": ObjectId("5a934e000102030405000002"),19 "director": "Ricky",20 "lowercasedName": "lost persuit of adventure",21 "name": "Lost persuit of adventure",22 "rating": 7.1,23 "release_year": "2005",24 "runtime": 150,25 "starring": [26 "Jimmy simmon",27 "Catarina"28 ],29 "totalReviews": 82330 },31 {32 "_id": ObjectId("5a934e000102030405000000"),33 "director": "Jon What",34 "lowercasedName": "spidey one way home",35 "name": "Spidey One way home",36 "rating": 9,37 "release_year": "2021",38 "runtime": 120,39 "starring": [40 "Tom Hanks",41 "Tom Holland",42 "Mark Zucks",43 "Samy"44 ],45 "totalReviews": 200046 },47 {48 "_id": ObjectId("5a934e000102030405000001"),49 "director": "Ricky",50 "lowercasedName": "the arrival of a train",51 "name": "The Arrival of a Train",52 "rating": 6,53 "release_year": "1896",54 "runtime": 115,55 "starring": [56 "Shawn Ching",57 "Looker Blindspot",58 "Tom Hanks"59 ],60 "totalReviews": 72061 },62 {63 "_id": ObjectId("5a934e000102030405000004"),64 "director": "Jon What",65 "lowercasedName": "the last of the us all",66 "name": "the Last of the us all",67 "rating": 8.5,68 "release_year": "2005",69 "runtime": 120,70 "starring": [71 "Samy",72 "George wise",73 "Pennywise"74 ],75 "totalReviews": 180076 }77]
Using multiple fields
Extending to the single field sorting mechanism, you can choose to sort by multiple fields as well.
1{2 "$sort": {3 "rating": -1,4 "release_year": 15 }6}
The above sort query first sorts the results by descending rating values. Then, it sorts the results by the ascending release year. You get good old movies 📺.
Search result relevance based sorting
If you are performing a $match on a indexed text field, you can sort the search result by relevance.
1{2 $match: {3 $text: {4 $search: "of"5 }6 }7},8{9 $sort: {10 score: {11 $meta: "textScore"12 }13 }14}
Here's how it works:
- First
$match
stage searches for the keyword "of" in the text indexed field - Using
$meta: "textScore"
sorts the result in descending order of relevance
Perform consistent sorting
To perform consistent sorting, there must be at least one unique field passed to the sort stage (_id
is a good candidate).
If you need the sorting order to be exactly same on every pipeline run, you need consistent sorting. You can choose any unique value field for sorting (in addition to the fields you're sorting by) and you will easily achieve the desired result.
Let's take an example.
First, we are sorting by ascending value of release year. Here is the code:
1{2 "$sort": {3 "release_year": 14 }5}
You will notice there are two movies with the same release year of 2005. Now go ahead and put the second movie with release year of 2005 before the first one (simulating data shuffling).
Run it again. Different order of the those two movies? Yes, this is inconsistent sorting because we do not have any unique field in our sort stage.
Now let's try the same exercise with this code:
1{2 "$sort": {3 "release_year": 1,4 "name": -15 }6}
You won't get any inconsistent ordering of those two movie documents. This is because we have introduced a unique field in our sort stage. Simple as that.
Optimizations
As stated in the official MongoDB documentation, sort stage can take benefit of an index if it's the first stage in the aggregation pipeline (or only after match stage).
You can read more about the optimization here.
Memory restriction
This is similar to how group stage works. When you exceed the default memory limit of 100 megabytes (or MB), it throws an error. If you intend to perform sorting on a larger dataset, use allowDiskUse.
Conclusion
And this was the introduction to the sort stage in the MongoDB aggregation pipeline. I hope it gave you better understanding of how it works, how you can use multiple fields, mix & match the sorting behavior, and perform consistent sorting to avoid any surprises (in case you depend on the sorting order).
Feel free to reach out in the comments for feedback and further queries.
I hope things are sorted 😉.
Liked the article?
First of all, here's a high five 👋 because you deserve it!
If you enjoyed the article, you will like the other articles on the MongoDB aggregation pipeline: