How to Use Project in Mongodb Aggregation Pipeline

Last updated on 19 December 2021

#Databases#MongoDB

If you have ever worked with MongoDB, chances are you've heard of or even come across the aggregation pipeline that comes with it. Aggregation pipeline in MongoDB is a very powerful feature to get insights out of your data that is not possible with simple find operation.

In this article, we're going to take at the project stage. We do have the projection support in find queries as well but it cannot achieve what project stage in the aggregation pipeline can. Today we're going to explore some of the common use-cases for the project stage:

  1. Projecting all fields
  2. Working with arrays
  3. Naming field in project stage
  4. Filtering fields
  5. Literal fields
  6. Conditionally removing fields

Establishing the data

Before we jump into the aggregation pipeline and the project stage, we would need some data to work with. I'm taking an example Restaurants collection for understanding the concept here. I'll be linking to a MongoDB playground for each query so that you can play around and see how it affects the output.

Here's the Restaurants collection with only 5 documents containing random data:

1{
2 name: 'Thai Restaurant',
3 location: 'London',
4 notOpenOnDays: ['Monday', 'Thursday'],
5 entryFee: '$5',
6 rating: '3',
7},
8{
9 name: 'Indian.com',
10 location: 'Broughton',
11 notOpenOnDays: ['Wednesday'],
12 entryFee: '$0',
13 rating: '4',
14},
15{
16 name: '23rd Street Pizza',
17 location: 'Fallowfield',
18 notOpenOnDays: ['Sunday'],
19 entryFee: '$2',
20 rating: '4',
21},
22{
23 name: '7 STAR Pizza',
24 location: 'Newcastle',
25 notOpenOnDays: ['Saturday', 'Sunday'],
26 entryFee: '$10',
27 rating: '2',
28},
29{
30 name: 'A Cake A Shake',
31 location: 'London',
32 notOpenOnDays: ['Monday'],
33 entryFee: '$1',
34 rating: '4',
35},

Projecting all fields

In the MongoDB aggregation pipeline, a pipeline stage has access to only those fields which are included in the previous stage.

So if you want to include all fields, you can include everything in the project stage so that it gets passed to the next one. Any field missed is not going to make it to the next stage of the pipeline.

MongoDB gives us a way to state which fields we want to include and exclude. So if you want to include only the name, the project stage would be:

1{
2 $project: { name: 1 }
3}

You can also put true or any number greater than 1 to show inclusion of a field but let's just stick to 1 for simplicity & conciseness.

Similarly, if you want to exclude the field you'd do the opposite:

1{
2 $project: { name: 0 }
3}

And, if you want to include everything in the Restaurants collection, here's what the project stage would look like:

Run Code

1{
2 $project: {
3 name: 1,
4 location: 1,
5 notOpenOnDays: 1,
6 entryFee: 1,
7 rating: 1,
8 }
9}
Including a field in the project stage implicitly means excluding all the other fields and vice-versa.

Working with arrays

Creating arrays

Creating a new array field is just as creating any non-array field, the only difference is that the field value is wrapped in [] to make it an array.

We can do it like this:

Run Code

1{
2 $project: {
3 review: [ "$rating" ]
4 }
5}

Now review field will be an array with rating values in it.

Extracting element(s) at specific index

To get an array element at a specific index, we use use $arrayElemAt,

Run Code

1{
2 $project: {
3 fieldName: { $arrayElemAt: ["$arrayFieldName", index] }
4 }
5}

With the latest MongoDB version 4.4, we now also have dedicated operators to get first and last elements in an array. The operators are $first and $last.

To get the first element,

Run Code

1{
2 $project: {
3 review: [ "$rating" ]
4 }
5},
6{
7 $project: {
8 newReview: { $first: "$review" },
9 _id: 0
10 }
11}

We are putting the restaurant rating in an array named review and in the next project stage we are extracting the first element of the array created in the stage before. This gives us:

1{ "newReview" : 3 }
2{ "newReview" : 4 }
3{ "newReview" : 4 }
4{ "newReview" : 2 }
5{ "newReview" : 4 }
We can chain as many project stages as we want in an aggregation pipeline.

To get the last element,

Run Code

1{
2 $project: {
3 review: [ "$rating" ]
4 }
5},
6{
7 $project: {
8 newReview: { $last: "$review" },
9 _id: 0
10 }
11}

This will give the same output as there is only one element in the array currently, let's add some values to the reviews array.

Run Code

1{
2 $project: {
3 review: [ 1, 2, 3, "$rating" ]
4 }
5},
6{
7 $project: {
8 newReview: { $last: "$review" },
9 _id: 0
10 }
11}

We would get the ratings as the output of this as we have it in the last position:

1{ "newReview" : 3 }
2{ "newReview" : 4 }
3{ "newReview" : 4 }
4{ "newReview" : 2 }
5{ "newReview" : 4 }

Filtering values

Filter lets you write custom filter which is applied to every element in the array and only the elements which pass the filter are passed as output to next stage.

Run Code

1{
2 $project: {
3 _id: 0,
4 openOnWeekend: {
5 $filter: {
6 input: "$notOpenOnDays",
7 as: "notOpen",
8 cond: {
9 $and: [
10 { $ne: ["$$notOpen", "Saturday"] },
11 { $ne: ["$$notOpen","Sunday"] },
12 ]
13 }
14 }
15 }
16 }
17 }

Output of the above aggregation query would only contain the restaurants which are open on weekends:

1[
2 {
3 "openOnWeekend": [ "Monday", "Thursday" ]
4 },
5 {
6 "openOnWeekend": [ "Wednesday" ]
7 },
8 {
9 "openOnWeekend": []
10 },
11 {
12 "openOnWeekend": []
13 },
14 {
15 "openOnWeekend": [ "Monday" ]
16 }
17]
We use $ for values present in our document and $$ for fields that are introduced within a pipeline stage.

Transforming data

$map operator is just like the array.map() in javascript, it takes an input array, applies an expression on each item and return the transformed array.

Here you can directly transform data as compared to filter where you need to provide a condition to handle the data appropriately.

Let's transform the notOpenOnDays array and shorten the weekday names:

Run Code

1{
2 $project: {
3 _id: 0,
4 newPrice: {
5 $map: {
6 input: "$notOpenOnDays",
7 as: "noOpenOn",
8 in: {
9 $substrCP: [ "$$noOpenOn", 0, 3 ]
10 }
11 }
12 }
13 }
14 }

We have the substring operator to get only first 3 characters and remove the rest:

1[
2 {
3 "newPrice": [ "Mon", "Thu" ]
4 },
5 {
6 "newPrice": [ "Wed" ]
7 },
8 {
9 "newPrice": [ "Sun" ]
10 },
11 {
12 "newPrice": [ "Sat", "Sun" ]
13 },
14 {
15 "newPrice": [ "Mon" ]
16 }
17]

There are a bunch of other array expression operators, you can refer them here.

Naming field in project stage

The aggregation pipeline provides great flexibility when it comes to naming fields. Any name you specify in the project stage acts as the "new" name for that field from that point forwards.

You give an existing field a new name which to effectively rename it:

Run Code

1{
2 $project: {
3 closedOn: "$notOpenOnDays"
4 }
5}

Now the output will contain closedOn field with notOpenOnDays array as value.

Literal fields

By default, whatever you put in the project stage is treated as an expression and it is evaluated to a value. You can use $literal operator if you want MongoDB to treat any field's value like a constant or literal.

Let's say we want to find out the restaurants where the entry fee is exactly equal to "$1", we can leverage literal operator to achieve this:

Run Code

1{
2 "$project": {
3 affordable: {
4 $eq: [
5 "$entryFee",
6 {
7 $literal: "$1"
8 }
9 ]
10 }
11 }
12 },
13 {
14 "$project": {
15 _id: 0,
16 status: {
17 "$cond": {
18 "if": "$affordable",
19 "then": "restaurant is affordable 🍕🍟",
20 "else": "Not affordable 🙈"
21 }
22 }
23 }
24 }

You can chain as many project stages as you need in the aggregation pipeline. Output of the above pipeline evaluates to 👇🏻

1[
2 {
3 "status": "Not affordable 🙈"
4 },
5 {
6 "status": "Not affordable 🙈"
7 },
8 {
9 "status": "Not affordable 🙈"
10 },
11 {
12 "status": "Not affordable 🙈"
13 },
14 {
15 "status": "restaurant is affordable 🍕🍟"
16 }
17]

Conditionally excluding fields

In the above example, we saw a bunch of redundant information in the output. We are only interested in knowing the affordable restaurants so it makes sense to not output the other ones.

With MongoDB 3.6, you can remove fields from the output conditionally. Let's modify our above project stage to only output affordable restaurants:

Run Code

1{
2 "$project": {
3 name: 1,
4 affordable: {
5 $eq: [
6 "$entryFee",
7 {
8 $literal: "$1"
9 }
10 ]
11 }
12 }
13 },
14 {
15 "$project": {
16 _id: 0,
17 status: {
18 "$cond": {
19 "if": "$affordable",
20 "then": "$name",
21 "else": "$$REMOVE"
22 }
23 }
24 }
25 }
  • We're now including the name in the first project stage to make sure it is available in the next stage.
  • $cond operator is updated to show the name of only the affordable restaurants.

Output:

1[
2 {},
3 {},
4 {},
5 {},
6 {
7 "status": "A Cake A Shake"
8 }
9]

Conclusion

So to summarize we have seen that the project stage in the MongoDB aggregation pipeline can do much more than just include & exclude fields. We can create arrays, modify its elements, perform filters, slice strings, conditionally remove fields and get data as per our needs.

If you think something is missing or have any questions, please post it down in the comments. I'd be happy to connect!

 
Liked the article? Share it on: Twitter
No spam. Unsubscribe at any time.