Match vs Find: Which One To Use in MongoDB

Last updated on 30 June 2022

#mongodb#databases

MongoDB aggregation pipeline has given us the $match operator to filter out all the documents in a collection and perform aggregation operations (group, lookup, sort, etc.) on the smaller set of documents (for performance reasons). But what if you only need to fetch data, can you use $match and find interchangeably?

In this article, we will explore $match vs. find, when to use each one along with some working examples.

Here is how it is structured:

  • ๐Ÿ—ž๏ธ $match stage introduction
  • ๐Ÿ‘ทโ€โ™‚๏ธ Benchmarking performance ($match vs find)
  • ๐Ÿ’ก Which one to use?
  • ๐Ÿ”ฎ Working examples using both operations
  • ๐Ÿ“ Conclusion

Introduction to $match stage

Before we jump into the benchmarking and take decision on what is the right approach for reading data, let's look at some of the basic differences in both the operations.

Here are some of the things that $match stage is capable of:

  1. It can be used in an aggregation pipeline.
  2. It filters the documents in a collection against a set of conditions. Only the matched documents proceed to the next stage of the aggregation pipeline.
  3. It has certain restrictions.

Find operation on the other hand, is completely opposite. It can be used independently, outside of an aggregation pipeline and it has no such restrictions as the $match stage has.

Benchmarking performance

There are a lot discussion in the community regarding the speed differences of match stage vs a simple find operation for fetching data.

Let's try to perform both of these operations on a same dataset to see for ourselves.

We have a sample dataset hosted on the free tier of Atlas Cloud for the purpose of this comparison. We will execute match and find on the same collection (and thus, same indexes). Here's how a document of our sample shipwrecks collection looks like:

Sample shipwrecks document

1{
2 "_id": {
3 "$oid": "578f6fa2df35c7fbdbaed8c4"
4 },
5 "recrd": "",
6 "vesslterms": "",
7 "feature_type": "Wrecks - Visible",
8 "chart": "US,U1,graph,DNC H1409860",
9 "latdec": {
10 "$numberDouble": "9.3547792"
11 },
12 "londec": {
13 "$numberDouble": "-79.9081268"
14 },
15 "gp_quality": "",
16 "depth": {
17 "$numberInt": "0"
18 },
19 "sounding_type": "",
20 "history": "",
21 "quasou": "",
22 "watlev": "always dry",
23 "coordinates": [{
24 "$numberDouble": "-79.9081268"
25 }, {
26 "$numberDouble": "9.3547792"
27 }]
28}

Lastly, we have created an index on the feature_type field:

$match vs find stage benchmarking - MongoDB Atlas index

Query plan in MongoDB

To execute a query the most efficient way, MongoDB creates a query plan for each query that runs. There are often multiple ways to execute a database query, which is why MongoDB only proceeds with the most efficient way, also known as, the winning plan.

The winning plans shows all the stages used, in a hierarchical fashion.

1"winningPlan" : {
2 "stage" : <STAGE1>,
3 "inputStage" : {
4 "stage" : <STAGE2>,
5 ...
6 }
7},
The inner most stage of a winning plan is executed first, passing its output (documents or index keys) to the outer stage (or parent node). So in the above example, STAGE2 runs before STAGE1.

To look at the query plan for a MongoDB query, simply chain a .explain() at the end of the query. You can also do .explain('executionStats') if you are only interested in that. Same follows for other keys in a query planner (like serverInfo & queryPlanner).

Using find()

Let's query our shipwrecks collection for our indexed field. We'll be using MongoDB Compass here but you can use this query to perform a find operation:

1db.shipwrecks.find({
2 "feature_type": "Wrecks - Visible"
3}).explain();

Here is the output on Compass:

Find operation query planner

Let's understand the above output closely:

  1. Compass gives us this nicely formatted json output of the query planner which contains all sorts of useful information.
  2. Looking at the winning plan will tell us the plan (or strategy) that MongoDB found suitable for our query.
  3. As we know, the first stage of the winning plan is the inner most node (or leaf node) of the tree. It is IXSCAN which means index scan. There are other fields in the IXSCAN scan which tells more about the index like index_name, direction, indexBounds, etc.
  4. After index scan is performed, the second stage (parent node) is the FETCH operation which finally fetches the data.
  5. executionStats.executionTimeMillis gives the total execution time taken by the query.

Using $match stage

To fetch the data using the $match stage, you can run the following query:

1db.collection.aggregate([
2 {
3 "$match": {
4 "feature_type": "Wrecks - Visible"
5 }
6 }
7]).explain();

Again, we have run the aggregation on Compass like this:

$match stage in MongoDB aggregation pipeline using Compass

And here is the stats we get:

Aggregate operation query planner

As you can see, the winning plan is exactly the same as in case of find operation. Even the execution time is close.

Execution time of find & $match aggregate operation differs in every run.

Which one to use?

Since there is no difference in the execution stages and the time taken for both operations, it is a personal preference in the case of a simple read operation.

But when it comes to modifying the data, find() operation fells short. This is where aggregation pipeline shines. You can do projections, group data, perform lookups, sort result, and so much more.

Even though it is not impossible to achieve the same with a find operation, it is certainly unnecessary, inconvenient, and error-prone. You'd have to build a custom in-house solution.

Don't reinvent the wheel they say.

Starting with MongoDB version 4.4, projections in find operation are now consistent with the projection capabilities in the aggregation pipeline. You can read more on this here.

Let's understand this with the help of an example.

Examples

Let's take a look at a few examples and see the differences in the syntax ๐Ÿ‘‡

Fetching data

A simple read operation using find is done like this:

Run a simple find operation

1db.collection.find({
2 key: 1
3})

The syntax of performing the same fetch operation using $match operator in the aggregation pipeline looks like this:

Run this aggregation pipeline

1db.collection.aggregate([
2 {
3 "$match": {
4 "key": 1
5 }
6 }
7])

I don't think there is any major difference in the syntax to make you prefer one over the other, but find operation is concise.

Feel free to chain a .explain('executionStats') to the above queries if you want to check out the execution time of the winning plan.

Counting

With find method, you can chain the .count() method to the find result:

Fetching count of matching documents

1db.collection.find({ key: 1 }).count();

Whereas in case of $match operator, you'd need another stage in the aggregation pipeline to count the items/field:

Run this to get count of matching documents

1db.collection.aggregate([
2 {
3 "$match": {
4 "key": 1
5 }
6 },
7 {
8 "$count": "key"
9 }
10])

While the syntax is starting to deviate, both approaches are still not largely different. You will be just fine in choosing one over the other. One can rightly argue on the better readability of the find operation here.

Fetch release years with good movie ratings

For the purpose of this example, let's suppose we have a movies collection and we want to know which years had the lowest movie rating greater than 7.

Here's the $match stage combined with the $group stage using the MongoDB aggregation pipeline:

Run Code

1db.movies.aggregate([
2 {
3 $group: {
4 _id: {
5 year: "$release_year"
6 },
7 minRating: { $min: "$rating" }
8 }
9 },
10 {
11 "$match": {
12 minRating: { $gt: 7 }
13 }
14 }
15])

So convenient. How can we implement the same with a find operation? Let's brainstorm:

  1. Perform a find operation to fetch ALL the data,
  2. Iterate through all the entries and store the lowest movie rating for every given release year,
  3. Finally, iterate over the results of step #2 again to collect the final results with rating greater than 7.

As you would have observed, aggregation pipeline is very useful in complex scenarios like this. It abstract away the nuances and provides a clean & concise way of getting the right insights out of your data.

Conclusion

This was a brief explainer on the differences between the find operation and match stage. We looked at the performance (in)difference in match vs. find, understood the query plans and the use-cases for both operations.

To summarize, using $match or find() is completely dependent on your use-case. If you just want to fetch the data, find is convenient. For even slightly complex scenarios, use $match.

I hope you found the article helpful. If you feel there's anything missing or can be better explained, I would love to know your thoughts in the comments down below ๐Ÿ™‚.

Liked the article?

If you enjoyed this article, you will like the other ones:

  1. Learn How to Use Group in Mongodb Aggregation Pipeline (With Exercise)
  2. 5 Tiny Developer Workflow Tips to Improve Your Productivity
  3. How to Use Project in Mongodb Aggregation Pipeline
  4. Introduction to TCP Connection Establishment for Software Developers
  5. 5 Things You Must Know About Building a Reliable Express.js Application
ย 
Liked the article? Share it on: Twitter
No spam. Unsubscribe at any time.