Once you get beyond the basics of GraphQL, you'll likely hear people talk about the “N+1 problem.” This might seem scary; it does sound a lot like Big O notation, which is usually the last thing you hear before your whiteboard interview implodes. But rest assured, this is a simple concept hiding behind a computer-science-y name.
The problem with queries
Let's say I have a DB of authors and their books, a simple “has many” relationship. Now, I want to get all my authors and all their books. In REST, you'd make a route that uses your ORM of choice to eager-load every author along with their books.
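Under the hood (simplifying for the sake of explanation), it would execute two queries: one to get all the authors, and one to get all of their books. In pseudo SQL, the first query selects every author, and the second selects every book whose author_id is in the list of IDs returned by the first.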
2 queries. Boom. Done. Since the ORM gets all the IDs from the first query, matching up every relationship with the second query is easy.
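Here is a minimal sketch of that two-query pattern. The in-memory tables, the queryLog array, and helpers such as selectBooksByAuthorIds are hypothetical stand-ins for a real ORM and database. They exist only so we can count round trips and see the "IN (list of IDs)" trick at work.

```typescript
// Hypothetical in-memory "database" standing in for a real ORM + SQL server.
interface Author { id: number; name: string; }
interface Book { id: number; authorId: number; title: string; }

const authorsTable: Author[] = [
  { id: 1, name: "Ada" },
  { id: 2, name: "Grace" },
  { id: 3, name: "Linus" },
];
const booksTable: Book[] = [
  { id: 10, authorId: 1, title: "Notes" },
  { id: 11, authorId: 1, title: "Sketches" },
  { id: 12, authorId: 2, title: "Compilers" },
  { id: 13, authorId: 3, title: "Kernels" },
];

// Every simulated query is recorded so we can count DB round trips.
const queryLog: string[] = [];

function selectAllAuthors(): Author[] {
  queryLog.push("SELECT * FROM authors");
  return authorsTable;
}

// The second query can filter on every author ID at once.
function selectBooksByAuthorIds(ids: number[]): Book[] {
  queryLog.push(`SELECT * FROM books WHERE author_id IN (${ids.join(", ")})`);
  return booksTable.filter((b) => ids.includes(b.authorId));
}

// Eager loading: one query for authors, one for all their books,
// then match books to authors in memory.
function getAuthorsWithBooks(): Array<Author & { books: Book[] }> {
  const authors = selectAllAuthors();
  const books = selectBooksByAuthorIds(authors.map((a) => a.id));
  return authors.map((a) => ({
    ...a,
    books: books.filter((b) => b.authorId === a.id),
  }));
}

const authorsWithBooks = getAuthorsWithBooks();
console.log(queryLog.length); // 2
```

No matter how many authors there are, the route still makes exactly two trips to the database.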
Why GraphQL has trouble with this
Here's the issue: the above approach only works because the second query already had a list of every author_id. GraphQL doesn't work that way, since each resolver function only knows about its own parent object (don't worry about context right now). That means your ORM won't have the luxury of a list of author IDs anymore.
So let's take that same request from above and put it into a GraphQL query: one that asks for all the authors and, nested inside each author, their books.
The first layer could have a resolver that hits the DB once and gets all the authors, but that's it. In the next layer, the books resolver can't use all those results at once to find all the books. Each call to the books resolver only gets its own parent author, so our ORM has to hit the DB from one resolver at a time.
In pseudo SQL, that means one query that selects every author, followed by a separate books query for each author, each filtered to that single author_id.
Remember when we used to be efficient? That was nice. This is where the name comes from, by the way: we always make 1 initial query to the DB that returns N results, which means we then have to make N additional DB queries. Personally, I think that means it should be called “1+N”, but starting formulas with variables is what all the cool kids do.
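To see the 1+N count happen, here is a sketch of the same request handled by per-field resolvers. The resolver-map shape (Query.authors, Author.books receiving its parent) mirrors what common GraphQL servers use. The data, the queryLog array, and the hand-rolled runQuery "executor" are hypothetical and only stand in for a real GraphQL library resolving a query like { authors { books } }.

```typescript
// Hypothetical data and query log; not a real database or GraphQL server.
interface Author { id: number; name: string; }
interface Book { id: number; authorId: number; title: string; }

const authorsTable: Author[] = [
  { id: 1, name: "Ada" },
  { id: 2, name: "Grace" },
  { id: 3, name: "Linus" },
];
const booksTable: Book[] = [
  { id: 10, authorId: 1, title: "Notes" },
  { id: 11, authorId: 1, title: "Sketches" },
  { id: 12, authorId: 2, title: "Compilers" },
  { id: 13, authorId: 3, title: "Kernels" },
];

const queryLog: string[] = [];

const resolvers = {
  Query: {
    // Layer 1: a single query returns all N authors.
    authors: async (): Promise<Author[]> => {
      queryLog.push("SELECT * FROM authors");
      return authorsTable;
    },
  },
  Author: {
    // Layer 2: runs once per author and only sees its own parent,
    // so it can only ask for one author's books at a time.
    books: async (parent: Author): Promise<Book[]> => {
      queryLog.push(`SELECT * FROM books WHERE author_id = ${parent.id}`);
      return booksTable.filter((b) => b.authorId === parent.id);
    },
  },
};

// Hypothetical stand-in for a GraphQL executor resolving { authors { books } }.
async function runQuery() {
  const authors = await resolvers.Query.authors();
  return Promise.all(
    authors.map(async (a) => ({ ...a, books: await resolvers.Author.books(a) })),
  );
}

const done = runQuery().then((data) => {
  console.log(queryLog.length); // 4: one authors query plus one books query per author
  return data;
});
```

With 3 authors the log holds 4 queries. With 1,000 authors it would hold 1,001.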
What's the solution?
Is this the Achilles' heel of GraphQL? Is all our efficiency the price of a nice interface? Of course not. There's a really handy tool that came out right alongside GraphQL called DataLoader. Essentially, it waits for all your resolvers to load in their individual keys. Once it has them, it calls a batch function you provide just once with the whole list of keys. That function hits the DB a single time and returns a promise that resolves to an array of values, one per key, and DataLoader hands each resolver its own value. It batches our queries instead of making one at a time.
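The batching idea can be sketched in a few dozen lines. TinyLoader below is a hypothetical, simplified illustration of the concept, not the dataloader package itself. It shares the general shape (a batch function that receives every queued key and returns values in the same order, plus a load(key) method), but its scheduling is deliberately naive. It flushes on the next microtask, which is enough here because every books resolver queues its key in the same synchronous pass. The data, queryLog, and runQuery executor are likewise hypothetical.

```typescript
// Hypothetical, simplified batch loader illustrating the DataLoader idea.
type BatchFn<K, V> = (keys: readonly K[]) => Promise<V[]>;
type Pending<K, V> = { key: K; resolve: (v: V) => void; reject: (e: unknown) => void };

class TinyLoader<K, V> {
  private batchFn: BatchFn<K, V>;
  private queue: Pending<K, V>[] = [];

  constructor(batchFn: BatchFn<K, V>) {
    this.batchFn = batchFn;
  }

  load(key: K): Promise<V> {
    return new Promise<V>((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // First key of a new batch: schedule one flush after the current
      // synchronous work so sibling resolvers can queue their keys too.
      if (this.queue.length === 1) {
        Promise.resolve().then(() => this.flush());
      }
    });
  }

  private async flush(): Promise<void> {
    const batch = this.queue;
    this.queue = [];
    try {
      // One call (one DB round trip) for every queued key.
      const values = await this.batchFn(batch.map((item) => item.key));
      // The batch function must return values in the same order as the keys.
      batch.forEach((item, i) => item.resolve(values[i]));
    } catch (err) {
      batch.forEach((item) => item.reject(err));
    }
  }
}

interface Author { id: number; name: string; }
interface Book { id: number; authorId: number; title: string; }

const authorsTable: Author[] = [
  { id: 1, name: "Ada" }, { id: 2, name: "Grace" }, { id: 3, name: "Linus" },
];
const booksTable: Book[] = [
  { id: 10, authorId: 1, title: "Notes" }, { id: 11, authorId: 1, title: "Sketches" },
  { id: 12, authorId: 2, title: "Compilers" }, { id: 13, authorId: 3, title: "Kernels" },
];
const queryLog: string[] = [];

// One query for all requested author IDs, grouped so result[i] matches keys[i].
const booksByAuthor = new TinyLoader<number, Book[]>(async (authorIds) => {
  queryLog.push(`SELECT * FROM books WHERE author_id IN (${authorIds.join(", ")})`);
  const rows = booksTable.filter((b) => authorIds.includes(b.authorId));
  return authorIds.map((id) => rows.filter((b) => b.authorId === id));
});

const resolvers = {
  Query: {
    authors: async (): Promise<Author[]> => {
      queryLog.push("SELECT * FROM authors");
      return authorsTable;
    },
  },
  // Still sees only its own parent, but now it just queues a key.
  Author: { books: (parent: Author): Promise<Book[]> => booksByAuthor.load(parent.id) },
};

async function runQuery() {
  const authors = await resolvers.Query.authors();
  return Promise.all(
    authors.map(async (a) => ({ ...a, books: await resolvers.Author.books(a) })),
  );
}

const done = runQuery().then((data) => {
  console.log(queryLog.length); // 2
  return data;
});
```

The resolvers still only see their own parent. The loader is what collects the individual keys and turns N separate queries back into one.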
New solutions often have new problems, but as long as you learn about all your tools, there's nothing you won't be able to fix. So on that note, go check out the next article on DataLoaders!
Happy coding everyone,
Mike