Compare two HTML sources and display visual differences

Solution 1

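Use Python's difflib. For example:

import difflib

# Read both files as lists of lines; the with block closes them afterwards.
with open('file1.html', 'r') as f1, open('file2.html', 'r') as f2:
    file1 = f1.readlines()
    file2 = f2.readlines()

# make_file returns a complete HTML page containing a side-by-side diff table.
htmlDiffer = difflib.HtmlDiff()
htmldiffs = htmlDiffer.make_file(file1, file2)

with open('comparison.html', 'w') as outfile:
    outfile.write(htmldiffs)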

This creates an HTML file named comparison.html containing the differences between file1.html and file2.html. Here file1.html is treated as the original (source) version and file2.html as the changed (new) version.
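To try this without files on disk, the same call accepts in-memory lists of lines. The snippet below is only a sketch: the sample markup and the "original"/"changed" labels are invented for the example, and context=True with numlines=1 is an optional setting that trims unchanged lines from the report.

```python
import difflib

# Hypothetical sample inputs standing in for file1.html and file2.html.
original = ["<html>", "<body>", "<p>Hello</p>", "</body>", "</html>"]
changed = ["<html>", "<body>", "<p>Hello, world</p>", "</body>", "</html>"]

differ = difflib.HtmlDiff(wrapcolumn=80)
report = differ.make_file(
    original,
    changed,
    fromdesc="original",  # column heading for the first input
    todesc="changed",     # column heading for the second input
    context=True,         # show only changed lines plus nearby context
    numlines=1,           # one unchanged line around each change
)
```

The report string can then be written to comparison.html exactly as in the code above. Note that HtmlDiff escapes its input, so the page shows the HTML source side by side rather than the rendered pages.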

Hope that helps!

Solution 2

Use the DaisyDiff API (http://code.google.com/p/daisydiff/). You can call it from a command prompt after your Java code returns a difference.

Solution 3

Have you tried BackstopJS?

It's not documented, but there is a misMatchThreshold parameter you can use to hide subtle differences: https://github.com/garris/BackstopJS/issues/52
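The general idea behind such a threshold can be sketched in a few lines of Python. This is not BackstopJS's or Resemble.js's actual implementation, only an illustration of the concept; the function names and the tiny "screenshots" are hypothetical.

```python
def mismatch_percentage(pixels_a, pixels_b):
    """Percentage of positions whose pixel values differ between two same-sized images."""
    if len(pixels_a) != len(pixels_b):
        raise ValueError("images must have the same number of pixels")
    if not pixels_a:
        return 0.0
    differing = sum(1 for a, b in zip(pixels_a, pixels_b) if a != b)
    return 100.0 * differing / len(pixels_a)


def passes(pixels_a, pixels_b, threshold_percent):
    """True when the mismatch stays at or below the allowed threshold."""
    return mismatch_percentage(pixels_a, pixels_b) <= threshold_percent


# Two hypothetical 2x2 screenshots, flattened to lists of (R, G, B) tuples.
shot_a = [(255, 255, 255), (0, 0, 0), (10, 20, 30), (200, 200, 200)]
shot_b = [(255, 255, 255), (0, 0, 0), (10, 20, 31), (200, 200, 200)]
```

Here one pixel out of four differs, so a comparison with a threshold above that share would pass and a stricter one would fail. Raising the threshold is what hides the very small differences mentioned in the question.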

Author: roger_that

Java Software Developer

Updated on July 09, 2022

Comments

  • roger_that
    roger_that almost 2 years

    I am trying to show where two HTML pages differ. I want to compare the HTML source code of two nearly identical webpages and show or highlight the differences visually (in a UI).

    What I tried: I thought of taking a snapshot of each page and then using Resemble.js to compare the two images. But that picks up very minute differences as well, and the results are not clear.

    I then thought of comparing the DOM structure or the source code and showing in the UI what differs between the two pages, and where.

    Is there any way I could achieve this? I am using Selenium WebDriver to get the snapshots and the HTML source code.

    EDIT:

    I guess my question was not clear. I want to find the differences in HTML content between webpages in order to detect A/B tests currently being performed. I first saved the HTML source to a text file and then compared it with previously captured HTML source using the Java-Diff util. This gave me the actual lines that differ between the two text files of HTML source.

    Now the problem is: how can I show this difference in a UI, that is, highlight the areas I found to be different? I hope this makes it clearer.

    The code below shows the lines that differ:

    List<String> original = fileToLines("HTML Source diff/originalSource.txt");
    List<String> revised = fileToLines("HTML Source diff/sourceAfterCookieClear.txt");

    // Compute the diff. Patch is the container for the computed deltas.
    Patch patch = DiffUtils.diff(original, revised);

    System.out.println("Printing Deltas\n");
    for (Delta delta : patch.getDeltas()) {
        // Take the string form of the revised chunk and keep only the text
        // between " [" and "]]", i.e. the changed lines themselves.
        String revisedText = delta.getRevised().toString();
        String content = revisedText.substring(revisedText.indexOf(" [") + 2, revisedText.indexOf("]]"));
        writeTextToFile(content, "difference.html");
    }
    

    Any leads in the form of code would be helpful.

  • roger_that
    roger_that over 10 years
    Thanks for the reply. I have already gone through this. It tells me where the two HTML sources differ, which is fine. What I am stuck on is how to show this difference in a UI, for example by highlighting the element that contains the difference.
  • Husman
    Husman over 10 years
    It gets far too complicated at this stage: effectively, you need to build an engine that parses HTML and renders it on screen while maintaining a mapping back to the code. Then your code can point to an element and the engine can highlight that object intelligently (based on CSS rules, e.g. visibility, overlap, 0px width). At that point you can do this with two HTML pages and have the engine highlight the differences. Luckily, there is already a library that does something like this: code.google.com/p/daisydiff
  • roger_that
    roger_that over 10 years
    Well, I was thinking along the same lines: reach the element and then apply some CSS rule to highlight it. I have also tried daisydiff, but it throws exceptions (a null pointer in its Main class), and I don't understand why. It's just too messy.
  • Umair Ayub
    Umair Ayub about 8 years
    Can I save the resulting file in rendered HTML format instead of source format?
  • Sнаđошƒаӽ
    Sнаđошƒаӽ about 8 years
    @Umair Yes, you can. The output of make_file in HtmlDiff is rendered HTML. Give it a try. In my example, the file created is named 'comparison.html'.
  • Umair Ayub
    Umair Ayub about 8 years
    No, it saves comparison.html in HTML-source format. It is not rendered HTML.
  • Sнаđошƒаӽ
    Sнаđошƒаӽ about 8 years
    @Umair What do you mean by rendered HTML? And what do you mean by "HTML-source format"? Also what do you mean by saving in "rendered HTML format"? I have no idea. Can you elaborate, or a link perhaps?
  • Umair Ayub
    Umair Ayub about 8 years
    Rendered means that when you open the page it shows a real webpage. But when I open it, it just shows the HTML itself, like <html><body><p>Text here</p></body></html>, whereas what I want is: Text here
  • Sнаđошƒаӽ
    Sнаđошƒаӽ about 8 years
    @Umair Take a look at the docs. I think your problem is somewhere else. An HTML file containing <html><body><p>Text here</p></body></html> cannot but be rendered in a browser as Text here. Are you using Django by any chance? If so, use autoescape. Take a look here. I am guessing this because I faced a similar problem with Django.
  • Umair Ayub
    Umair Ayub about 8 years
    This is what it looks like: postimg.org/image/k6gwpvqq9 (raw HTML). I am not using Django. I have Python 2.7.
  • Admin
    Admin over 3 years
    This answer should not be here or accepted at all: the tags and the question were Java-related, and the person who answered here is referring to Python code.
  • Sнаđошƒаӽ
    Sнаđошƒаӽ over 3 years
    @francogrex Yeah, I agree now. Looking back at the question, my answer does seem inappropriate. I should try to reopen this; it's an interesting question. BTW, it's not accepted currently. But it received the bounty because there was no other answer with 2 or more upvotes, which is a pity. Also look at the edit history: it was asked in 2013, but the bounty was added in 2016!
  • Admin
    Admin over 3 years
    @Sнаđошƒаӽ it's no problem as long as it's clearly mentioned - which your above comment now did and clarified. If it could be tagged for python too that would be nice to help those searching for python specific solutions to find it.
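
On the original question of showing the differing lines in a UI, one low-tech option is to display the source itself with the changed lines marked. The following is a minimal sketch in Python (matching Solution 1 rather than the asker's Java code) using difflib.SequenceMatcher. The function name highlight_line_diff, the inline colours, and the sample lines are all made up for illustration. It highlights lines of source text, not elements of the rendered page, so it does not replace a tool like DaisyDiff.

```python
import difflib
import html


def highlight_line_diff(original, revised):
    """Return an HTML fragment with removed lines in <del> and added lines in <ins>."""
    matcher = difflib.SequenceMatcher(a=original, b=revised, autojunk=False)
    out = ["<pre>"]
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(html.escape(line) for line in original[i1:i2])
            continue
        # "replace" and "delete" both drop lines from the original side.
        if tag in ("replace", "delete"):
            out.extend(
                '<del style="background:#fdd">%s</del>' % html.escape(line)
                for line in original[i1:i2]
            )
        # "replace" and "insert" both add lines on the revised side.
        if tag in ("replace", "insert"):
            out.extend(
                '<ins style="background:#dfd">%s</ins>' % html.escape(line)
                for line in revised[j1:j2]
            )
    out.append("</pre>")
    return "\n".join(out)


# Hypothetical captures of the same page before and after clearing cookies.
before = ["<div id='banner'>", "<h1>Welcome</h1>", "</div>"]
after = ["<div id='banner'>", "<h1>Welcome back</h1>", "</div>"]
fragment = highlight_line_diff(before, after)
```

Writing fragment into a page and opening it in a browser shows the old line struck through on a red background and the new line on a green one. The source lines are escaped with html.escape so they appear as text instead of being interpreted as markup.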
