NodeJS REGEX: Efficient Regex Search for multiple keywords?

Problem

I have a search that iterates over each JSON object I have and over each keyword. Right now an item matches if it contains any one of the keywords. I want it to match only if the string contains ALL of the keywords (order does not matter), and I’m guessing I will need a more robust regex for that.

Searching for “This Text” would include the following results:

“this text”, “this is a text”, “This Text”, “Text This”, “this is a long string and text”, “a long string with this in the middle and text”, “that this that this text”

and exclude strings such as:

“that text”, “this is not”, “text that is not included”

Here’s the script I have right now.

items.forEach(function(item) { //iterate over the items array
    var s = JSON.stringify(item); //convert each item in items to a string
    var matched = false;
    sarray.forEach(function(qs) { //take the toArray converted query and iterate over it
        var r = new RegExp(qs, "g"); //compose a regex object with the stringified query
        if(r.test(s)) { //if regex finds the keyword in the item string,
            matched = true; //set matched to true
        }
    });
    if(matched) { //matched is true as soon as ANY keyword is found
        results.push(item); //push the item into the results array
    }
});
Problem courtesy of: James_1x0

Solution

I would use a simple function instead of a regular expression, because a regex that matches the keywords in any order gets complex quickly.

/**
 * Look for all `items' inside `str'.
 *
 * @param str   the string to search inside
 * @param items all items that must appear in the string
 *
 * @return
 *    TRUE  => All items were found
 *    FALSE => At least one item was not found
 */
function all_items_present(str, items) {
    var i;
    var len = items.length;
    var found = true;

    for (i = 0; i < len; i++) {
        // indexOf compares literally; search() would treat the item as a regex
        if (str.indexOf(items[i]) === -1) {
            found = false;
            break;
        }
    }

    return found;
}

// returns true
all_items_present(
   '{"title":"This text is foo", "location":"Austin, TX(bar)", "baz":false}',
   ['foo','bar','baz']
);
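
The question also expects “This Text” to match “this text”, which a case-sensitive check would miss. Below is a minimal sketch of one way to handle that and to plug the check into the item loop. The names containsAllKeywords, searchItems and sampleItems are hypothetical and not part of the original code, and splitting the query on whitespace is an assumption about how sarray is built.

```javascript
// Hypothetical helper: true only if every keyword occurs in the text,
// ignoring case. Keywords are compared literally, not as regex patterns.
function containsAllKeywords(text, keywords) {
    var haystack = text.toLowerCase();
    return keywords.every(function (keyword) {
        return haystack.indexOf(keyword.toLowerCase()) !== -1;
    });
}

// Hypothetical helper: split the query on whitespace (an assumption) and
// keep only the items whose JSON string contains all of the keywords.
function searchItems(items, query) {
    var keywords = query.split(/\s+/).filter(Boolean);
    return items.filter(function (item) {
        return containsAllKeywords(JSON.stringify(item), keywords);
    });
}

// Sample data taken from the strings listed in the question.
var sampleItems = [
    { title: "this is a long string and text" },
    { title: "Text This" },
    { title: "that text" },
    { title: "this is not" }
];

// Keeps the first two sample items and drops the last two.
var results = searchItems(sampleItems, "This Text");
```

Note that JSON.stringify also includes the property names, so a keyword such as “title” would match every item here. If that matters, it may be safer to search only the field values.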

Demo

http://jsfiddle.net/7FwUP/1/


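Here is the equivalent regex for finding foo, bar and baz in no particular order. It needs one alternative per ordering of the keywords, so three keywords already take six branches: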

^.*?(?:foo.*?bar.*?baz|foo.*?baz.*?bar|baz.*?foo.*?bar|baz.*?bar.*?foo|bar.*?baz.*?foo|bar.*?foo.*?baz).*?$
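
If a regex is still preferred, a commonly used alternative to listing every ordering is one lookahead per keyword. This is a different technique from the pattern above, shown only as a sketch. The helper names escapeRegExp and buildAllKeywordsRegex are hypothetical, and the "i" flag is an assumption based on the case-insensitive examples in the question.

```javascript
// Hypothetical helper: escape regex metacharacters so that a keyword
// such as "TX(bar)" is matched literally.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build ^(?=[\s\S]*k1)(?=[\s\S]*k2)... so that each
// keyword must appear somewhere in the string, in any order.
function buildAllKeywordsRegex(keywords) {
    var lookaheads = keywords.map(function (keyword) {
        return "(?=[\\s\\S]*" + escapeRegExp(keyword) + ")";
    });
    // No "g" flag: test() on a global regex keeps state between calls.
    return new RegExp("^" + lookaheads.join(""), "i");
}

var allThree = buildAllKeywordsRegex(["foo", "bar", "baz"]);
var matchesAll = allThree.test("baz ... foo ... bar");  // true
var missesOne = allThree.test("foo and bar only");      // false, no "baz"
```

The pattern grows by one lookahead per keyword instead of one branch per ordering, so it stays manageable for longer queries.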

Solution courtesy of: Stephan
