Understanding How Node’s Streams Work — PART I
This is the first post of the Streams lecture series.
Streams have been a fundamental concept in Unix systems since the early days of computing. The idea is that data flows from one point to another in manageable pieces (chunks), one at a time, and each chunk is processed by the series of programs the stream is attached to. In the terminal, data, especially text, can be passed through several programs with the pipe operator (|), so that each program receives the data right after the preceding program has processed it. Node’s stream module follows the same idea, and pipe() is used to connect one stream to another.
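To make the analogy concrete, here is a minimal sketch of a three-stage chain in Node, roughly comparable to a shell command like cat file | tr a-z A-Z. The helper name upperCasePipeline and the sample chunks are assumptions made for illustration only.

```javascript
const { Readable, Transform, Writable } = require('node:stream');

// Hypothetical helper: sends `chunks` through an upper-casing transform
// and resolves with everything that reached the end of the chain.
function upperCasePipeline(chunks) {
  return new Promise((resolve, reject) => {
    // Middle stage: rewrites each chunk as it passes through.
    const upperCase = new Transform({
      transform(chunk, encoding, callback) {
        callback(null, chunk.toString().toUpperCase());
      },
    });

    // Final stage: collects the chunks it receives.
    const received = [];
    const sink = new Writable({
      write(chunk, encoding, callback) {
        received.push(chunk.toString());
        callback();
      },
    });

    sink.on('finish', () => resolve(received.join('')));
    sink.on('error', reject);

    // source -> transform -> sink, one chunk at a time.
    Readable.from(chunks).pipe(upperCase).pipe(sink);
  });
}

upperCasePipeline(['hello ', 'streams\n']).then((out) => process.stdout.write(out));
```

Each pipe() call returns the destination stream, which is what allows the calls to be chained the way programs are chained with | in the shell.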
Why Do Streams Exist in Node, Anyway?
One of the main reasons for Node’s popularity is the way it handles I/O operations. I/O operations are what a system performs when it reads from or writes to files on disk, sends or receives network requests, accesses databases, and so on. These operations are expensive because they are far slower than the CPU: a program that waits for each one to finish leaves the CPU idle in the meantime. (At the OS level, all I/O in Node is handled by the libuv library.) To achieve higher throughput, Node makes I/O operations asynchronous: we pass a callback function to the code that performs the I/O, and Node invokes it once the operation completes.
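A minimal sketch of this callback style, assuming a file named magicdata.txt in the current working directory. The wrapper readWholeFile is a hypothetical name used only for illustration; fs.readFile is the actual Node API.

```javascript
const fs = require('node:fs');

// Hypothetical wrapper around fs.readFile, which reads the whole file
// into a single Buffer before calling back.
function readWholeFile(path, onDone) {
  // readFile returns immediately; the callback runs later,
  // once the read has finished or failed.
  fs.readFile(path, (err, data) => {
    if (err) return onDone(err);
    onDone(null, data);
  });
}

// Assumption: magicdata.txt exists next to where the script is run.
readWholeFile('magicdata.txt', (err, data) => {
  if (err) {
    console.error('Could not read file:', err.message);
    return;
  }
  console.log(`Loaded ${data.length} bytes into memory`);
});
```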
There is nothing wrong with this code, and it works as expected, but there are some aspects worth considering. The entire content of magicdata.txt is loaded into memory as a buffer before it is handed to us. If the file is large, Node consumes a lot of memory to hold it, and the program will fail if the file size exceeds the maximum buffer size, which is around 2GB on 64-bit systems (the exact limit depends on the Node version and is exposed as buffer.constants.MAX_LENGTH).
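Since the limit varies between Node versions, it can be checked at runtime rather than assumed. The sketch below compares a file's size with the Buffer limit; fitsInSingleBuffer is a hypothetical helper name, and it checks only the Buffer limit, not any additional limit a particular read API may apply.

```javascript
const fs = require('node:fs');
const { constants } = require('node:buffer');

// Hypothetical helper: true when the file at `path` is small enough
// to be held in one Buffer on this Node build.
function fitsInSingleBuffer(path) {
  const { size } = fs.statSync(path);
  return size <= constants.MAX_LENGTH;
}

console.log(`Largest Buffer on this Node build: ${constants.MAX_LENGTH} bytes`);
```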
If this file is being served over the network, the users requesting it may have to wait a considerable amount of time, because the response starts only after the entire file has been loaded into memory.
With the help of streams, we can arrive at a better solution: sending the content of the file chunk by chunk as the client requests it.
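One way this could look is sketched below. The helper name createChunkedServer is hypothetical, the file path is supplied by the caller, and forwarding chunks by hand with res.write() is only one possible way to stream the file.

```javascript
const fs = require('node:fs');
const http = require('node:http');

// Hypothetical helper: builds a server that sends `filePath`
// to every client one chunk at a time.
function createChunkedServer(filePath) {
  return http.createServer((req, res) => {
    const source = fs.createReadStream(filePath);

    // Forward each chunk as soon as it has been read from disk.
    source.on('data', (chunk) => res.write(chunk));

    // Close the response once the file has been fully read.
    source.on('end', () => res.end());

    // Report read failures (for example, a missing file).
    source.on('error', () => {
      if (!res.headersSent) res.statusCode = 500;
      res.end('Could not read file');
    });
  });
}

// Usage (port 3000 is an arbitrary choice):
// createChunkedServer('magicdata.txt').listen(3000);
```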
This is possible because the request and response objects we are given are a readable stream and a writable stream, respectively.
However, in this implementation there is pressure on the response stream (the writable stream), because reading data from the disk is much faster than sending it over HTTP to the client requesting it. This is what we call backpressure: the response stream cannot handle the data at the rate it is being received, so unsent data piles up and overwhelms the stream. We solve this issue using the piping mechanism that streams provide.
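A minimal sketch of the piped variant, under the same assumptions as before (createPipedServer is a hypothetical helper name and the file path is supplied by the caller). Here pipe() takes over the flow control: it pauses the file stream when the response cannot accept more data and resumes it once the response has drained.

```javascript
const fs = require('node:fs');
const http = require('node:http');

// Hypothetical helper: builds a server that pipes `filePath`
// into every response.
function createPipedServer(filePath) {
  return http.createServer((req, res) => {
    const source = fs.createReadStream(filePath);

    // pipe() does not forward errors, so read failures are handled here.
    source.on('error', () => {
      if (!res.headersSent) res.statusCode = 500;
      res.end('Could not read file');
    });

    // Reads only as fast as the response can write, and ends
    // the response when the file has been fully sent.
    source.pipe(res);
  });
}

// Usage (port 3000 is an arbitrary choice):
// createPipedServer('magicdata.txt').listen(3000);
```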
In the next lecture of this series, we will dive deeper into streams and look at their use cases in different scenarios.