Home » Java » Core Java » Generic Text Comparison Tool with LCS Approach

About Cagdas Basaraner

Cagdas Basaraner
Cagdas Basaraner is a software engineer graduated from Hacettepe University Computer Engineering department (Turkey), having 5 years professional experience. He is working on JEE web technologies, and also a former developer of information systems using Microsoft technologies and Command & Control (C4I) systems with Java technologies.

Generic Text Comparison Tool with LCS Approach

Detecting and showing differences of two texts (especially having hundreds or thousands of lines) is a common problem. Using pure java.lang.String class methods may be a solution, but the most important issue for that kind of operations, “performance” will not be satisfactory. We want an efficient solution which may have a view as below:

Text Difference Tool Example

The problem contains two parts:

  • Detecting differences of two texts: For detecting differences, an efficient dynamic algorithm of LCS (Longest Common Subsequence) used in this solution. This solution has O(text1WordCount * text2WordCount) complexity and coded as “longestCommonSubsequence” method below.
  • Visualizing the differences: For visualizing, an HTML tag based approach is used, which marks new words of text2 with green color and old words of text1 with red color. This solution has O(changedWordsCount * (text1WordCount+text2WordCount)) complexity and coded as “markTextDifferences” method below.

Note1: For simplicity, “normalizeText” method is used for removing \n, \t and multiple space characters. Note2: This class was created as a Vaadin component. But “longestCommonSubsequence” is pure generic and “markTextDifferences” method is generic on HTML based visual components, so they can also be used with different frameworks.

import java.util.ArrayList;
import com.vaadin.ui.CustomComponent;
import com.vaadin.ui.Label;
import com.vaadin.ui.Layout;
import com.vaadin.ui.VerticalLayout;

/**
* Text comparison component which marks differences of two texts with colors.
* 
* @author cb
*/
public class TextCompareComponent extends CustomComponent {

    private Layout mainLayout = new VerticalLayout();
    private ArrayList<String> longestCommonSubsequenceList;
    private static final String INSERT_COLOR = "#99FFCC";
    private static final String DELETE_COLOR = "#CB6D6D";

    public TextCompareComponent(String text1, String text2) {

        text1 = normalizeText(text1);
        text2 = normalizeText(text2);

        this.longestCommonSubsequenceList = longestCommonSubsequence(text1, text2);
        String result = markTextDifferences(text1, text2, 
            longestCommonSubsequenceList, INSERT_COLOR, DELETE_COLOR);

        Label label = new Label(result, Label.CONTENT_XHTML);
        mainLayout.addComponent(label);
        setCompositionRoot(mainLayout);
    }

     /**
      * Finds a list of longest common subsequences (lcs) of given two texts.
      * 
      * @param text1
      * @param text2
      * @return - longest common subsequence list
      */
     private ArrayList<String> longestCommonSubsequence(String text1,String text2){

         String[] text1Words = text1.split(" ");
         String[] text2Words = text2.split(" ");

         int text1WordCount = text1Words.length;
         int text2WordCount = text2Words.length;

         int[][] solutionMatrix = new int[text1WordCount + 1][text2WordCount + 1];

         for (int i = text1WordCount - 1; i >=0; i--) {
             for (int j = text2WordCount - 1; j >= 0; j--) {
                 if (text1Words[i].equals(text2Words[j])){
                     solutionMatrix[i][j] = solutionMatrix[i + 1][j + 1] + 1;
                 }
                 else {
                     solutionMatrix[i][j] = Math.max(solutionMatrix[i + 1][j],
                         solutionMatrix[i][j + 1]);
                 }
              }
          }

          int i = 0, j = 0;
          ArrayList<String> lcsResultList =new ArrayList<String>();

          while (i < text1WordCount && j < text2WordCount) {
              if (text1Words[i].equals(text2Words[j])) {
                  lcsResultList.add(text2Words[j]);
                  i++;
                  j++;
              } 
              else if (solutionMatrix[i + 1][j] >= solutionMatrix[i][j + 1]) {
                  i++;
              }
              else {
                  j++;
              }
          }
          return lcsResultList;
      }

      /**
       * Normalizes given string by deleting \n, \t and extra spaces.
       * 
       * @param text - initial string
       * @return - normalized string
       */
       private String normalizeText(String text) {

           text = text.trim();
           text = text.replace("\n", " ");
           text = text.replace("\t", " ");

           while (text.contains("  ")) {
               text = text.replace("  ", " ");
           }
           return text;
       }

      /**
       * Returns colored inserted/deleted text representation of given two texts.
       * Uses longestCommonSubsequenceList to determine colored sections.
       *
       * @param text1
       * @param text2
       * @param lcsList
       * @param insertColor
       * @param deleteColor
       * @return - colored result text
       */
      private String markTextDifferences(String text1, String text2,
        ArrayList<String> lcsList, String insertColor, String deleteColor) {

        StringBuffer stringBuffer = new StringBuffer();

        if (text1 != null && lcsList != null) {

            String[] text1Words = text1.split(" ");
            String[] text2Words = text2.split(" ");
            int i = 0, j = 0, word1LastIndex = 0, word2LastIndex = 0;

            for (int k = 0; k < lcsList.size(); k++) {
                for (i = word1LastIndex, j = word2LastIndex;
                    i < text1Words.length && j < text2Words.length;) {
                    if (text1Words[i].equals(lcsList.get(k)) &&
                        text2Words[j].equals(lcsList.get(k))) {
                        stringBuffer.append("<SPAN>" + lcsList.get(k) + " </SPAN>");
                        word1LastIndex = i + 1;
                        word2LastIndex = j + 1;
                        i = text1Words.length;
                        j = text2Words.length;
                    }
                    else if (!text1Words[i].equals(lcsList.get(k))) {
                        for (; i < text1Words.length &&
                            !text1Words[i].equals(lcsList.get(k)); i++) {
                            stringBuffer.append("<SPAN style='BACKGROUND-COLOR:" +
                                deleteColor + "'>" + text1Words[i] + " </SPAN>");
                        }
                    } else if (!text2Words[j].equals(lcsList.get(k))) {
                        for (; j < text2Words.length &&
                            !text2Words[j].equals(lcsList.get(k)), j++) {
                            stringBuffer.append("<SPAN style='BACKGROUND-COLOR:" +
                                insertColor + "'>" + text2Words[j] + " </SPAN>");
                        }
                    }
                }
            }
            for (; word1LastIndex < text1Words.length; word1LastIndex++) {
                stringBuffer.append("<SPAN style='BACKGROUND-COLOR:" +
                    deleteColor + "'>" + text1Words[word1LastIndex] + " </SPAN>");
            }
            for (; word2LastIndex < text2Words.length; word2LastIndex++) {
                stringBuffer.append("<SPAN style='BACKGROUND-COLOR:" +
                    insertColor + "'>" + text2Words[word2LastIndex] + " </SPAN>");
            }
        }
        return stringBuffer.toString();
    }
}

Reference: A Generic Text Comparison Tool Implementation with LCS Approach from our JCG partner Cagdas Basaraner at the CodeBuild blog.

Do you want to know how to develop your skillset to become a Java Rockstar?

Subscribe to our newsletter to start Rocking right now!

To get you started we give you our best selling eBooks for FREE!

1. JPA Mini Book

2. JVM Troubleshooting Guide

3. JUnit Tutorial for Unit Testing

4. Java Annotations Tutorial

5. Java Interview Questions

6. Spring Interview Questions

7. Android UI Design

and many more ....

 

Leave a Reply

Your email address will not be published. Required fields are marked *

*


+ five = 11

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

Want to take your Java Skills to the next level?
Grab our programming books for FREE!
  • Save time by leveraging our field-tested solutions to common problems.
  • The books cover a wide range of topics, from JPA and JUnit, to JMeter and Android.
  • Each book comes as a standalone guide (with source code provided), so that you use it as reference.
Last Step ...

Where should we send the free eBooks?

Good Work!
To download the books, please verify your email address by following the instructions found on the email we just sent you.