Read a file in Node and process each line

Problem

I want to read a file and process each line of it. I use a read stream to read the file and then invoke the processRecord method for each line. processRecord needs to make multiple calls and assemble the final data before it is written to the store.

The file has 500K records.

The issue I'm facing is that the file is read at a significant pace, and I believe Node is not getting enough time to actually run the processRecord method. As a result, memory usage shoots up to 800MB and processing then slows down.

Any help is appreciated.

The code that I'm using is given below:

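var fs = require('fs');
var readline = require('readline');
var stream = require('stream');

var instream = fs.createReadStream('C:/data.txt');
var outstream = new stream;

var rl = readline.createInterface({
    input: instream,
    output: outstream,
    terminal: false
});
outstream.readable = true;

// Fires once per line, without waiting for the previous processRecord to finish.
rl.on('line', function (line) {
    processRecord(line);
});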
Problem courtesy of: Vaya

Solution

The Node.js readline module is intended more for user interaction than line-by-line streaming from files. You may have better luck with the popular byline package.

var fs = require('fs');
var byline = require('byline');

// You'll need to check the encoding.
var lineStream = byline(fs.createReadStream('C:/data.txt', { encoding: 'utf8' }));

lineStream.on('data', function (line) {
    processRecord(line);
});

You’ll have a better chance of avoiding memory leaks if the data is piped to another stream. I’m assuming here that processRecord is feeding into one. If you make it a transform stream object, then you can use pipes.

var out = fs.createWriteStream('output.txt');

// processRecordStream is assumed to be a transform stream that wraps processRecord.
lineStream.pipe(processRecordStream).pipe(out);
Solution courtesy of: qubyte
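The answer leaves processRecordStream undefined. As a rough sketch of what such a transform stream could look like, using Node's built-in stream.Transform: the processRecord below is only a placeholder that upper-cases the line asynchronously, and createProcessRecordStream is a hypothetical name, since the real processing logic is not shown in the question.

```javascript
const { Transform } = require('stream');

// Placeholder for the question's processRecord (an assumption): the real one
// would make its multiple calls and then hand back the final data.
function processRecord(line, callback) {
    setImmediate(function () {
        callback(null, line.toUpperCase());
    });
}

// Hypothetical factory for the processRecordStream used in the answer.
function createProcessRecordStream() {
    return new Transform({
        transform: function (line, encoding, callback) {
            // The next line is not delivered until callback is invoked, so
            // the upstream reader is held back while a record is in progress.
            processRecord(line.toString(), function (err, result) {
                if (err) {
                    callback(err);
                    return;
                }
                // Emit one processed record per output line.
                callback(null, result + '\n');
            });
        }
    });
}
```

With the lineStream and out from the answer, this would be wired up as lineStream.pipe(createProcessRecordStream()).pipe(out). Because pipe respects backpressure, the file is only read as fast as records are processed and written, which is what keeps memory use bounded.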
