If you're building a GraphQL-powered API in Node, you'll almost certainly want DataLoader. This library lets you batch your queries and keep your API just as efficient as its REST predecessors. In this article we'll go over the raw mechanics of the library without getting too bogged down in GraphQL.
Why do we need DataLoader?
Simply put, DataLoader solves GraphQL's “N+1 problem”. If you don't know what that is, it's worth reading a quick explanation first. From here on I'll assume you understand it, but to recap, it states:
For every 1 database query that returns N results, you will need to make N additional queries
That many queries is inefficient. We would be better off batching those N queries into one, so that instead of N+1 queries we always make just 2. This is where DataLoader comes in.
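To make the arithmetic concrete, here's a small sketch that counts round trips to a fake, in-memory database. The tables and the createFakeDb, countNaiveQueries and countBatchedQueries helpers are hypothetical stand-ins, not part of DataLoader. With 3 authors, the naive loop makes 1 + 3 = 4 queries, while the batched version makes 2 no matter how many authors there are.

```javascript
// Hypothetical in-memory tables standing in for a real database.
const authorsTable = [{ id: 1 }, { id: 2 }, { id: 3 }];
const booksTable = [
  { title: 'book 1', author_id: 1 },
  { title: 'book 2', author_id: 2 },
  { title: 'book 3', author_id: 3 },
];

// Each call to a method below counts as one round trip to the "database".
function createFakeDb() {
  const stats = { queries: 0 };
  return {
    stats,
    async getAuthors() {
      stats.queries += 1;
      return authorsTable;
    },
    async getBooksByAuthorIds(ids) {
      stats.queries += 1;
      return booksTable.filter((book) => ids.includes(book.author_id));
    },
  };
}

// N+1: one query for the authors, then one more query per author.
async function countNaiveQueries() {
  const db = createFakeDb();
  const authors = await db.getAuthors();
  for (const author of authors) {
    await db.getBooksByAuthorIds([author.id]);
  }
  return db.stats.queries;
}

// Batched: one query for the authors, then one query for all of their books.
async function countBatchedQueries() {
  const db = createFakeDb();
  const authors = await db.getAuthors();
  await db.getBooksByAuthorIds(authors.map((author) => author.id));
  return db.stats.queries;
}

countNaiveQueries().then((n) => console.log('naive:', n)); // naive: 4
countBatchedQueries().then((n) => console.log('batched:', n)); // batched: 2
```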
DataLoaders: The overview
At the highest level, a DataLoader:
- Collects an array of keys during one tick of the event loop
- Hits the database once with all those keys
- Resolves the promise for each key with that key's value from the returned array
All you need to make a DataLoader is a batching function that takes in an array of keys and resolves to an array of values. Both arrays must be the same length and in the same order, since the value at index i is paired with the key at index i. If the lengths differ, DataLoader can't build its key/value pairs and the loads fail with an error. Let's work with a simple DataLoader and a fake DB.
We have an async function that iterates through a given array of ids and returns an array of usernames. This mocks how a real program would await a result from the DB using an ORM. Remember, since it's async, our function's return value is automatically wrapped in a Promise. We then pass the batch function into a new DataLoader.
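Here's a minimal sketch of such a batch function and fake DB. The fakeUsersTable data and the batchGetUserNames name are hypothetical (the usernames are borrowed from output shown later in this article). It filters the table once for all the ids, then maps the rows back into key order, using null for an id with no matching row so the two arrays stay the same length. Creating the loader itself needs the dataloader package, so that step appears only as a comment.

```javascript
// Hypothetical fake "users table"; a real app would query its DB via an ORM.
const fakeUsersTable = [
  { id: 1, username: 'Kate' },
  { id: 2, username: 'Bo' },
  { id: 3, username: 'Sara' },
];

// Batch function: receives every key collected during one tick.
// Because it is async, its return value is wrapped in a Promise automatically.
async function batchGetUserNames(ids) {
  // Pretend this filter is a single `WHERE id IN (...)` query.
  const rows = fakeUsersTable.filter((user) => ids.includes(user.id));
  // Return exactly one value per key, in the same order as `ids`.
  // Ids with no matching row map to null instead of being dropped.
  return ids.map((id) => {
    const row = rows.find((user) => user.id === id);
    return row ? row.username : null;
  });
}

// With the dataloader package installed, the loader would be created like this:
//   const DataLoader = require('dataloader');
//   const userLoader = new DataLoader(batchGetUserNames);

batchGetUserNames([3, 2, 99]).then((names) => {
  console.log(names); // [ 'Sara', 'Bo', null ]
});
```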
Using a DataLoader with load()
To actually use a DataLoader, you don't call the batch function yourself. Instead, you call the load() method, which is how the DataLoader collects all the keys it needs. Each load() call saves its key and returns a promise. Once the current tick of the event loop is done, the DataLoader takes all the saved keys and passes them to the batch function in a single call. The batch function resolves to its values, which are then stored with their corresponding keys. Finally, each load() promise resolves to the value for its own key.
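To see those steps in code, here's a hypothetical MiniLoader class that mimics the batching in a few lines. It is not the dataloader package: it has no caching or options, and it schedules its dispatch with a plain process.nextTick, whereas the real library uses its own scheduling logic. The shape is the same, though: load() queues a key and returns a promise, one dispatch per tick calls the batch function, and each promise is resolved with the value at its key's index.

```javascript
// A hypothetical, stripped-down stand-in for DataLoader. It is NOT the
// dataloader package; it only illustrates the batching mechanics of load().
class MiniLoader {
  constructor(batchFn) {
    this.batchFn = batchFn;
    this.queue = []; // { key, resolve, reject } entries waiting for dispatch
  }

  // Save the key and hand back a promise for its value.
  load(key) {
    return new Promise((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // The first key of a tick schedules one dispatch for the whole batch.
      if (this.queue.length === 1) {
        process.nextTick(() => this.dispatch());
      }
    });
  }

  // Call the batch function once with every queued key, then resolve
  // each load() promise with the value at the matching index.
  async dispatch() {
    const batch = this.queue;
    this.queue = [];
    try {
      const values = await this.batchFn(batch.map((entry) => entry.key));
      if (!Array.isArray(values) || values.length !== batch.length) {
        throw new Error('Batch function must resolve to one value per key');
      }
      batch.forEach((entry, i) => entry.resolve(values[i]));
    } catch (err) {
      batch.forEach((entry) => entry.reject(err));
    }
  }
}

// Usage: three load() calls in the same tick produce a single batch call.
const batches = [];
const userLoader = new MiniLoader(async (ids) => {
  batches.push(ids);
  return ids.map((id) => `user #${id}`);
});

Promise.all([userLoader.load(1), userLoader.load(2), userLoader.load(3)]).then(
  (names) => {
    console.log(names); // [ 'user #1', 'user #2', 'user #3' ]
    console.log(batches); // [ [ 1, 2, 3 ] ]
  }
);
```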
A brief aside on the event loop
I keep mentioning ticks, so we should talk about the event loop. If you have 20-ish minutes, watch Philip Roberts' talk “What the heck is the event loop anyway?”. If not, suffice it to say that each tick of the event loop is what moves the next ready async callback onto the call stack.
All you really need to know is that DataLoader uses this ticking process as its signal for when to fire the batch function. By the end of a tick, all the load() calls for a given query will have been made, so DataLoader knows exactly which keys to check against the DB.
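A quick way to see the ordering DataLoader relies on is to log when different kinds of work run. In this sketch (the labels are just for illustration), both synchronous pushes finish first, then the microtask, and the setTimeout callback comes last on a later turn of the loop. A loader that defers its dispatch until the synchronous code is done has therefore already seen every load() call made during that code.

```javascript
// Record the order in which different kinds of work run.
const order = [];

order.push('sync 1');

// Timer callbacks run on a later turn of the event loop.
setTimeout(() => order.push('timer'), 0);

// Microtasks run as soon as the current synchronous code has finished,
// before the event loop moves on to timers.
queueMicrotask(() => order.push('microtask'));

order.push('sync 2');

// Print the result once everything above has had a chance to run.
setTimeout(() => console.log(order), 10);
// [ 'sync 1', 'sync 2', 'microtask', 'timer' ]
```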
Back to DataLoaders
OK, now look at an example that calls load() from three separate ticks, using setTimeout to move from one tick to the next. It logs:
    Event Loop Tick 1
    called once per tick: [ 1, 2 ]
    Here is the user: Bo
    Tick 2
    called once per tick: [ 3, 4 ]
    Tick 3
    called once per tick: [ 5, 6 ]
As you can see, our batch function really is called only once per tick. Remember, setTimeout is async: its callback runs on a later turn of the event loop, so each one starts a new tick and therefore a new batch.
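Here's a sketch that reproduces output like the one above. It uses a hypothetical createLoader function as a minimal stand-in for the dataloader package, and the usernames are hypothetical apart from Bo. The second setTimeout is nested inside the first so that each group of loads is guaranteed to run on its own turn of the event loop.

```javascript
// Hypothetical minimal stand-in for the dataloader package: every load()
// made before the current synchronous code finishes joins one batch.
function createLoader(batchFn) {
  let queue = [];
  return {
    load(key) {
      return new Promise((resolve, reject) => {
        queue.push({ key, resolve, reject });
        if (queue.length === 1) {
          // First key of this tick: schedule one dispatch for the whole batch.
          process.nextTick(async () => {
            const batch = queue;
            queue = [];
            try {
              const values = await batchFn(batch.map((entry) => entry.key));
              batch.forEach((entry, i) => entry.resolve(values[i]));
            } catch (err) {
              batch.forEach((entry) => entry.reject(err));
            }
          });
        }
      });
    },
  };
}

// Hypothetical usernames keyed by id.
const usernames = { 1: 'Kate', 2: 'Bo', 3: 'Sara', 4: 'Ana', 5: 'Raj', 6: 'Li' };
const batchCalls = [];
const userLoader = createLoader(async (ids) => {
  batchCalls.push(ids);
  console.log('called once per tick:', ids);
  return ids.map((id) => usernames[id]);
});

// Tick 1: both loads run in the same synchronous block, so they share a batch.
userLoader.load(1);
userLoader.load(2).then((name) => console.log('Here is the user:', name));

// Tick 2: a setTimeout callback runs on a later turn, so it starts a new batch.
setTimeout(() => {
  userLoader.load(3);
  userLoader.load(4);

  // Tick 3: nested so it is guaranteed to run after tick 2's batch.
  setTimeout(() => {
    userLoader.load(5);
    userLoader.load(6);
  }, 0);
}, 0);

// Logs:
// called once per tick: [ 1, 2 ]
// Here is the user: Bo
// called once per tick: [ 3, 4 ]
// called once per tick: [ 5, 6 ]
```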
Also notice that even though the batch function is called with arrays, load(2) resolves to the proper user, Bo, and nothing else. Each load() call always resolves to a single value. In the real world that single value will often be an array (for example, every row that belongs to one key), so let's try a more realistic example:
In this example our loader would live in the resolver for a books field, and it would find all the books associated with an author_id. Let's use a for loop to simulate 3 different author parent objects with ids 1, 2, and 3. The output is:
    I only get fired once
    Author #1 books: [ { title: 'book 1', author_id: 1 } ]
    Author #2 books: [ { title: 'book 2', author_id: 2 } ]
    Author #3 books: [ { title: 'book 3', author_id: 3 }, { title: 'book 4', author_id: 3 } ]
Since this all happened in the same tick of the event loop, our batch function fired only once. And again, each load() resolved to one value: an array containing either one or two books.
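The interesting part of a one-to-many loader is the batch function: the database hands back one flat list of rows, but DataLoader needs one array per key, in key order. Here's a minimal sketch using hypothetical names (fakeBooksTable, batchGetBooksByAuthor) and the book rows from the output above. An author with no books gets an empty array, so the result still has exactly one entry per key.

```javascript
// Hypothetical books table, using the rows from the output above.
const fakeBooksTable = [
  { title: 'book 1', author_id: 1 },
  { title: 'book 2', author_id: 2 },
  { title: 'book 3', author_id: 3 },
  { title: 'book 4', author_id: 3 },
];

// Batch function for a books field: one author id in, one array of books out.
async function batchGetBooksByAuthor(authorIds) {
  // Pretend this filter is a single `WHERE author_id IN (...)` query.
  const rows = fakeBooksTable.filter((book) => authorIds.includes(book.author_id));

  // Group the flat result rows by author id.
  const booksByAuthor = new Map();
  for (const book of rows) {
    const list = booksByAuthor.get(book.author_id) || [];
    list.push(book);
    booksByAuthor.set(book.author_id, list);
  }

  // One entry per key, in key order; authors without books get [].
  return authorIds.map((id) => booksByAuthor.get(id) || []);
}

const ids = [1, 2, 3, 4];
batchGetBooksByAuthor(ids).then((result) => {
  result.forEach((books, i) => console.log(`Author #${ids[i]} books:`, books));
});
```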
Caching
While batching is the main focus, DataLoaders also cache. That means if you call load() with the same key twice, the DataLoader looks up the value in its key/value store instead of firing the batch function again:
    Event Tick 1
    I ran!
    Event Tick 2
    cached res: Bo
The batch function didn't run the second time; the DataLoader just looked up the cached value.
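One way to get this behaviour is to cache the promise for each key, so even two load() calls for the same key in one tick share a single batch entry. Below is a hypothetical CachingLoader sketch, not the dataloader package. The real library also lets you manage its cache with methods such as clear(key) and clearAll(), or turn caching off with the cache: false option.

```javascript
// Hypothetical loader with a per-key promise cache; a sketch of the idea,
// not the dataloader package itself.
class CachingLoader {
  constructor(batchFn) {
    this.batchFn = batchFn;
    this.cache = new Map(); // key -> promise for that key's value
    this.queue = []; // entries waiting for the next dispatch
  }

  load(key) {
    // Cache hit: hand back the stored promise without touching the batch.
    if (this.cache.has(key)) {
      return this.cache.get(key);
    }
    const promise = new Promise((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      if (this.queue.length === 1) {
        process.nextTick(() => this.dispatch());
      }
    });
    this.cache.set(key, promise);
    return promise;
  }

  async dispatch() {
    const batch = this.queue;
    this.queue = [];
    try {
      const values = await this.batchFn(batch.map((entry) => entry.key));
      batch.forEach((entry, i) => entry.resolve(values[i]));
    } catch (err) {
      // Forget failed keys so a later load() can retry them.
      batch.forEach((entry) => {
        this.cache.delete(entry.key);
        entry.reject(err);
      });
    }
  }
}

let batchRuns = 0;
const userLoader = new CachingLoader(async (ids) => {
  batchRuns += 1;
  console.log('I ran!');
  return ids.map((id) => (id === 2 ? 'Bo' : null));
});

console.log('Event Tick 1');
userLoader.load(2).then(() => {
  setTimeout(async () => {
    console.log('Event Tick 2');
    console.log('cached res:', await userLoader.load(2)); // cached res: Bo
    console.log('batch runs:', batchRuns); // batch runs: 1
  }, 0);
});
```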
The loadMany() Method
Sometimes you want to load more than one key at a time. In those cases, use the loadMany() method, which takes an array of keys and resolves to an array of values.
If you plugged a loadMany() call into one of the examples above, you'd get output like Returns an array of values: [ 'Kate', 'Sara' ]. Neat, right?
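Conceptually, loadMany() is a shortcut for calling load() on every key. The dataloader README describes it as similar to Promise.all over load() calls, with one difference: instead of rejecting when a key fails, it resolves with an Error instance in that key's slot. Here's a sketch of that behaviour on top of a hypothetical stubLoader; the ids assigned to Kate and Sara are assumptions.

```javascript
// Hypothetical stand-in for a loader: any object with a load(key) method.
const usersById = { 1: 'Kate', 3: 'Sara' };
const stubLoader = {
  load(key) {
    return key in usersById
      ? Promise.resolve(usersById[key])
      : Promise.reject(new Error(`No user with id ${key}`));
  },
};

// loadMany sketched on top of load(). Every key goes through load() in the
// same synchronous pass, so a batching loader would see them in one batch.
// A failed key becomes an Error in its slot instead of rejecting everything.
function loadMany(loader, keys) {
  return Promise.all(keys.map((key) => loader.load(key).catch((error) => error)));
}

loadMany(stubLoader, [1, 3]).then((values) => {
  console.log('Returns an array of values:', values);
  // Returns an array of values: [ 'Kate', 'Sara' ]
});

loadMany(stubLoader, [1, 2]).then((values) => {
  console.log(values[0], values[1] instanceof Error); // Kate true
});
```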
Load away!
There you have it: the fundamentals of DataLoader. There is of course a lot more to it than what's listed here, but with the basics behind you, there are plenty of fun challenges ahead!
Happy coding everyone,
Mike