
Find Word In Html

I am trying to find a given word in an HTML string and wrap it in a span. What I am doing now is this: function find(what:String,where:String) { var regexp:RegExp=new RegExp(what

Solution 1:

To avoid matching inside HTML tags and attributes, you need to parse the HTML one way or another. The easiest way is to add it to the DOM (or just to a new element):

var container = document.createElement("div");
container.style.display = "none";
document.body.appendChild(container);  // this step is optional
container.innerHTML = where;

Once parsed, you can now iterate the nodes using DOM methods and find just the text nodes and search on those. Use a recursive function to walk the nodes:

function wrapWord(el, word)
{
    // "g" makes match() return every occurrence, so matches[n - 1] lines up with split()
    var expr = new RegExp(word, "gi");
    var nodes = [].slice.call(el.childNodes, 0);
    for (var i = 0; i < nodes.length; i++)
    {
        var node = nodes[i];
        if (node.nodeType == 3) // textNode
        {
            var matches = node.nodeValue.match(expr);
            if (matches)
            {
                var parts = node.nodeValue.split(expr);
                for (var n = 0; n < parts.length; n++)
                {
                    if (n)
                    {
                        var span = el.insertBefore(document.createElement("span"), node);
                        span.appendChild(document.createTextNode(matches[n - 1]));
                    }
                    if (parts[n])
                    {
                        el.insertBefore(document.createTextNode(parts[n]), node);
                    }
                }
                el.removeChild(node);
            }
        }
        else
        {
            wrapWord(node, word);
        }
    }
}

Here's a working demo: http://jsfiddle.net/gilly3/J8JJm/3
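The way the loop interleaves the results of split and match can be checked without a DOM. The following is a minimal sketch, assuming a global, case-insensitive expression and a word with no regex metacharacters or capture groups; `splitAroundMatches` is a hypothetical helper, not part of the answer's code. It returns the pieces that would become span elements and text nodes, in order.

```javascript
// Hypothetical helper: lists the pieces wrapWord would insert for one text node.
function splitAroundMatches(text, word) {
    var expr = new RegExp(word, "gi");      // "g" so match() returns every occurrence
    var matches = text.match(expr) || [];   // the matched words, in order
    var parts = text.split(expr);           // the text between them: matches.length + 1 entries
    var pieces = [];
    for (var n = 0; n < parts.length; n++) {
        if (n) {
            // Every part after the first is preceded by a match.
            pieces.push({ wrap: true, text: matches[n - 1] });
        }
        if (parts[n]) {
            // Empty parts (match at the start or end) are skipped.
            pieces.push({ wrap: false, text: parts[n] });
        }
    }
    return pieces;
}

var pieces = splitAroundMatches("Spain, oh spain!", "spain");
// pieces: "Spain" (wrap), ", oh " (text), "spain" (wrap), "!" (text)
```

Without the "g" flag, match returns only the first occurrence, so a second occurrence in the same text node would have no matching entry in `matches`.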

Solution 2:

You won't be able to process HTML in any reliable way using regex. Instead, parse the HTML into a DOM tree and iterate the Text nodes checking their data for content.

If you are using JavaScript in a web browser, the parsing will already have been done for you. See this question for example wrap-word-in-span code. It's much trickier if you need to match phrases that might be split across different elements.
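To illustrate why a plain regular expression over the raw string is unreliable, here is a small sketch of the naive approach. The function name `naiveWrap` and the sample markup are made up for illustration, and the word is assumed to contain no regex metacharacters.

```javascript
// Naive approach: replace every occurrence in the raw HTML string,
// with no knowledge of where tags and attributes begin or end.
function naiveWrap(html, word) {
    // "$&" in the replacement string stands for the matched text.
    return html.replace(new RegExp(word, "gi"), "<span>$&</span>");
}

var input = '<a href="/spain" title="Spain">Visit Spain</a>';
var output = naiveWrap(input, "spain");
// output:
// <a href="/<span>spain</span>" title="<span>Spain</span>">Visit <span>Spain</span></a>
```

The word is wrapped inside the href and title attributes as well, which breaks the link. Searching only the text nodes of a parsed tree avoids this.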

Solution 3:

function find(what:String, where:String)
{
    what = what.replace(/(\[|\\|\^|\$|\.|\||\?|\*|\+|\(|\)|\{|\})/g, "\\$1")
          .replace(/[^a-zA-Z0-9\s:;'"~[\]\{\}\-_+=(),.<>*\/!@#$%^&|\\?]/g, "(?:&[0-9A-Za-z]{3,25};|&#[0-9]{1,10};?|[^\\s<])")
          .replace(/</g,"&lt;?").replace(/>/g,"&gt;?").replace(/"/g,"(?:\"|&quot;?)")
          .replace(/\s/g, "(?:\\s|&nbsp;?)");

    what = "(>[^<]*|^[^<]*)(" + what + ")";
    var regexp:RegExp = new RegExp(what, 'gi');
    return where.replace(regexp,'$1<span>$2</span>');
}
  1. The first replace call adds a backslash before characters that have a special meaning in a RE, to prevent errors or unexpected results.
  2. The second replace call replaces every character of the search query that is not in the listed set with (?:&[0-9A-Za-z]{3,25};|&#[0-9]{1,10};?|[^\s<]). This RE consists of three parts: first, it tries to match a named HTML entity; second, it tries to match a numeric HTML entity; finally, it matches any single non-whitespace character (in case the creator of the HTML document didn't properly encode the characters).
  3. The third, fourth and fifth replace calls replace <, > and " with patterns for the corresponding HTML entities, so that the search query will not search through tags.
  4. The sixth replace call replaces white-space with the RE (?:\s|&nbsp;?), which matches either a white-space character or the &nbsp; entity.
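The first and last steps, together with the final wrapping expression, can be sketched in plain JavaScript. This is a reduced illustration rather than a port of the whole function: it leaves out the entity handling of steps 2 and 3, the name `findInText` is hypothetical, and its escape set also covers "]".

```javascript
// Reduced sketch: steps 1 and 4 plus the final wrapping expression.
function findInText(what, where) {
    what = what
        // Step 1: put a backslash before regex metacharacters in the query.
        .replace(/[\\^$.|?*+()[\]{}]/g, "\\$&")
        // Step 4: let white-space in the query also match the &nbsp; entity.
        .replace(/\s/g, "(?:\\s|&nbsp;?)");
    // The first group only matches text that follows a ">" (or the start of
    // the string) with no "<" in between, i.e. text outside of tags.
    var regexp = new RegExp("(>[^<]*|^[^<]*)(" + what + ")", "gi");
    return where.replace(regexp, "$1<span>$2</span>");
}

var wrapped = findInText("spain", '<a title="spain">Rain in Spain</a>');
// wrapped: <a title="spain">Rain in <span>Spain</span></a>

var withEntity = findInText("in spain", "Rain in&nbsp;Spain");
// withEntity: Rain <span>in&nbsp;Spain</span>
```

The title attribute is left alone because no ">" precedes it. Note that the first group is greedy, so when a word occurs several times between two tags, this sketch wraps only the last occurrence: findInText("spain", "Spain and spain") returns "Spain and <span>spain</span>".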

A shortcoming of this function is that special characters outside the listed set (such as €) match any HTML entity or character: following the example, not only &euro; and € are valid matches, but also &pound; and @.

This solution is adequate in most cases. It can be inaccurate in complex situations, which is probably no worse than a DOM iteration (which is very susceptible to memory leaks and requires more computing power).

When you work with HTML elements that have event listeners assigned through the DOM, do not replace their markup as a whole, because re-parsing the HTML discards those listeners. Instead, iterate through all (child) elements and apply this function to every text node.

Solution 4:

  • Pure JavaScript (based on Sizzle.getText from jQuery); Demo: http://jsfiddle.net/vol7ron/U8LLv/

    var wrapText = function ( elems,regex ) {
        var re = new RegExp(regex);
        var elem;
    
        for ( var i = 0; elems[i]; i++ ) {
            elem = elems[i];
    
            // Get the text from text nodes and CDATA nodes
            if ( elem.nodeType === 3 || elem.nodeType === 4 ) {
                var parent = elem.parentNode;
                re.lastIndex = 0;
                if(re.test(elem.nodeValue)){               
                    var span = document.createElement('span');
                    span.innerHTML = RegExp.$1;
    
                    if (RegExp.leftContext != ''){
                       parent.insertBefore(document.createTextNode(RegExp.leftContext),elem);    i++;
                    }
    
                    parent.insertBefore(span,elem);   i++;
    
                    if (RegExp.rightContext != ''){
                       parent.insertBefore(document.createTextNode(RegExp.rightContext),elem);   i++;
                    }
    
                    parent.removeChild(elem);
                }                   
    
            // Traverse everything else, except comment nodes
            } else if ( elem.nodeType !== 8 ) {
                wrapText( elem.childNodes, regex );
            }
        }
    
        return;
    };
    
    
    var obj = document.getElementById('wrapper');
    wrapText([obj],/(spain)/gi);
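RegExp.$1, RegExp.leftContext and RegExp.rightContext are legacy static properties that are filled in by the most recent successful match. The same three pieces can be taken from the result of exec instead. The following is a minimal sketch under that assumption; `splitAtFirstMatch` is a hypothetical helper, and it returns the whole match where the code above uses the first capture group (the two coincide for /(spain)/gi).

```javascript
// Hypothetical helper: split `text` around the first match of `regex`.
function splitAtFirstMatch(text, regex) {
    var re = new RegExp(regex);   // copy, so the caller's lastIndex is left untouched
    re.lastIndex = 0;             // always start searching at the beginning
    var m = re.exec(text);
    if (!m) {
        return null;              // no match: nothing to wrap
    }
    return {
        left: text.slice(0, m.index),              // counterpart of RegExp.leftContext
        match: m[0],                               // the matched text
        right: text.slice(m.index + m[0].length)   // counterpart of RegExp.rightContext
    };
}

var result = splitAtFirstMatch("The rain in Spain stays", /(spain)/gi);
// result.left: "The rain in ", result.match: "Spain", result.right: " stays"
```

In wrapText, left and right would become the new text nodes and match the content of the span.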
    
