About Jakub Holy

Jakub is an experienced Java[EE] developer working for a lean & agile consultancy in Norway. He is interested in code quality, developer productivity, testing, and in how to make projects succeed.

Simple vs. Easy: Writing A Generic Code To Avoid Duplication (Representation of Data To Import)

In our batch jobs for data import we had many similar classes for holding the data being imported. Technically they are all different, with different fields, yet conceptually they are all the same. I find this conceptual duplication discomforting and have written a single, more generic class to replace them all.

The refactoring has been inspired by Clojure and its preference for a few generic data structures such as maps, with many functions operating on them, over the OO way of many case-specific data structures (i.e. classes), as explained for example in this interview with Rich Hickey, starting with “OO can seriously thwart reuse”.
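
To make that idea concrete in Java terms, here is a tiny, hypothetical sketch (the class and method names are invented for illustration, not taken from our code base): every event is just a Map, and the shared behaviour lives in ordinary functions that can work on any such map.

import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of "few generic data structures, many functions":
// every event is a plain Map; the functions below work on any such map.
public class GenericEventSketch {

    static Map<String, String> playerEvent(String userId, String contentId) {
        Map<String, String> event = new LinkedHashMap<String, String>();
        event.put("userId", userId);
        event.put("contentId", contentId);
        return event;
    }

    /** Works for any event map, whatever "type" it conceptually has. */
    static String toTabDelimitedString(Map<String, String> event, String... columns) {
        StringBuilder sb = new StringBuilder();
        for (String column : columns) {
            if (sb.length() > 0) {
                sb.append('\t');
            }
            sb.append(event.get(column));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> event = playerEvent("user-1", "content-42");
        // prints the two values separated by a tab
        System.out.println(toTabDelimitedString(event, "userId", "contentId"));
    }
}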

The original code:

public interface LogEntry {
     Date getTimestamp();
     /** For dumping into a .tsv file to be imported into Hadoop */
     String toTabDelimitedString();
     String getSearchTerms();
     boolean hasSearchTerms();
     String[][] getColumns();
     String getKey();
     void mergeWith(LogEntry entry);
}
// Another impl. of LogEntry, to showcase the conceptual and,
// to a degree, factual duplication
public class PlayerEventLogEntry implements LogEntry {

    public static final String[][] COLUMNS = { ... };

    private String userId;
    private String contentId;
    private String sessionId;
    ...

    public PlayerEventLogEntry(...) { /* assign all fields ... */ }

    @Override
    public String[][] getColumns() {
        return COLUMNS;
    }
    ...

    @Override
    public String toTabDelimitedString() {
        StringBuilder sb = new StringBuilder();
        sb.append(contentId);
        sb.append(FIELD_DELIMITER);
        sb.append(userId);
        sb.append(FIELD_DELIMITER);
        sb.append(sessionId);
        sb.append(FIELD_DELIMITER);
        ...
        return sb.toString();
    }
}
// Called from a data import job to process individual log lines.
// Some time later, the job will call toTabDelimitedString on it.
public class PlayerEventsParser ... {

    @Override
    public LogEntry parse(String logLine) throws LogParseException {
        ... // tokenizing, processing etc. of the data ...
        return new PlayerEventLogEntry(userId, contentId, sessionId, ...);
    }
    
}
// One of 15 implementations of LogEntry, this one for the import of right-granted event logs
public class RightGrantedLogEntry implements LogEntry {

    private static final char FIELD_DELIMITER = '\t';

    public static final String[][] COLUMNS = { { "messageId", "STRING" },
        { "userId", "STRING" }, { "rightId", "STRING" }, ... };

    private String messageId;
    private Date timestamp;
    private String userId;
    private String rightId;
    ...

    public RightGrantedLogEntry(String messageId, Date timestamp, String userId, String rightId, ...) {
        this.messageId = messageId;
        this.timestamp = timestamp;
        this.userId = userId;
        this.rightId = rightId;
        ...
    }

    @Override
    public String[][] getColumns() {
        return RightGrantedLogEntry.COLUMNS;
    }

    @Override
    public String getKey() {
        return messageId;
    }

    @Override
    public String getSearchTerms() {
        return null;
    }

    @Override
    public Date getTimestamp() {
        return timestamp;
    }

    @Override
    public boolean hasSearchTerms() {
        return false;
    }

    @Override
    public void mergeWith(LogEntry arg0) {}

    @Override
    public String toTabDelimitedString() {
        StringBuilder sb = new StringBuilder();
        sb.append(messageId);
        sb.append(FIELD_DELIMITER);
        sb.append(userId);
        sb.append(FIELD_DELIMITER);
        sb.append(rightId);
        sb.append(FIELD_DELIMITER);
        ...
        return sb.toString();
    }
}

The refactored code, where all the LogEntry implementations have been replaced by MapLogEntry:

public interface LogEntry {
    // same as before
}
// The generic LogEntry implementation
// JavaDoc removed for the sake of brevity
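// (checkNotNull and checkElementIndex used below are presumably static imports
// from Guava's com.google.common.base.Preconditions)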
public class MapLogEntry implements LogEntry {
    
    private static final char FIELD_SEPARATOR = '\t';
    private Map<String, String> fields = new HashMap<String, String>();
    private final String[][] columns;
    private final StringBuilder tabString = new StringBuilder();
    private final Date timestamp;
    private String key;
    
    public MapLogEntry(Date timestamp, String[][] columns) {
        this.timestamp = checkNotNull(timestamp, "timestamp");
        this.columns = checkNotNull(columns, "columns");
    }

    @Override
    public String toTabDelimitedString() {
        return tabString.toString();
    }
    
    public MapLogEntry addInOrder(String column, String value) {
        checkAndStoreColumnValue(column, value);
        appendToTabString(value);
        return this;
    }

    public MapLogEntry validated() throws IllegalStateException {
        if (fields.size() != columns.length) {
            throw new IllegalStateException("This entry doesn't contain values for all the columns " +
                    "expected (" + columns.length + "). Actual values (" + fields.size() + "): " + toTabDelimitedString());
        }
        return this;
    }

    private void checkAndStoreColumnValue(String column, String value) {
        final int addedColumnIndex = fields.size();
        checkElementIndex(addedColumnIndex, columns.length, "Cannot add more values, all " + columns.length +
                " columns already provided; column being added: " + column);
        String expectedColumn = columns[addedColumnIndex][0];
        if (!column.equals(expectedColumn)) {
            throw new IllegalArgumentException("Cannot store value for the column '" +
                    column + "', the column expected at the current position " + addedColumnIndex +
                    " is '" + expectedColumn + "'");
        }
        fields.put(column, value);
    }

    private void appendToTabString(String value) {
        if (tabString.length() > 0) {
            tabString.append(FIELD_SEPARATOR);
        }
        tabString.append(valOrNull(replaceFieldSeparators(value)));
    }
    
    /** Encode value for outputting into a tab-delimited dump. */
    Object valOrNull(Object columnValue) {
        if (columnValue == null) {
            return HiveConstants.NULL_MARKER;
        }
        return columnValue;
    }

    @Override
    public Date getTimestamp() {
        return timestamp;
    }

    @Override
    public String getKey() {
        return key;
    }

    public void setKey(String key) {
        this.key = key;
    }

    public MapLogEntry withKey(String key) {
        setKey(key);
        return this;
    }

    /** Utility method to simplify testing. */
    public Map<String, String> asFieldsMap() {
        return fields;
    }
    ...
}
// Called from a data import job to process individual log lines.
// Some time later, the job will call toTabDelimitedString on it.
public class PlayerEventsParser ... {

    @Override
    public LogEntry parse(String logLine) throws LogParseException {
        ... // tokenizing, processing etc. of the data ...
        return new MapLogEntry(timestamp, getColumns())
                    .addInOrder("userid", userId)
                    .addInOrder("contentid", contentId)
                    .addInOrder("sessionid", sessionId)
                    ...
                    .validated();
    }
    
}

Improvements

We have replaced about 15 classes with one, made it possible to change the way data are transformed into a tab-separated string in a single place (DRY), and provided a nice, fluent API whose use looks quite similar at each call site. The new MapLogEntry is also much more testing-friendly (it would have been a nightmare to modify all the existing classes to support what MapLogEntry does).
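
As a rough illustration of that testing-friendliness, a minimal JUnit 4 sketch along these lines (the column definition and values are made up for the example) can assert directly on the fields map:

import static org.junit.Assert.assertEquals;

import java.util.Date;
import java.util.Map;

import org.junit.Test;

public class MapLogEntryTest {

    // Columns invented for the example; a real job would use its own COLUMNS constant.
    private static final String[][] COLUMNS = { { "userid", "STRING" }, { "contentid", "STRING" } };

    @Test
    public void storesValuesUnderTheirColumnNames() {
        MapLogEntry entry = new MapLogEntry(new Date(), COLUMNS)
                .addInOrder("userid", "user-1")
                .addInOrder("contentid", "content-42")
                .validated();

        Map<String, String> fields = entry.asFieldsMap();
        assertEquals("user-1", fields.get("userid"));
        assertEquals("content-42", fields.get("contentid"));
    }
}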

Objections

Somebody might consider a number of primitive POJO classes simpler than one generic class. The one generic class is certainly more complex than a primitive data structure, but in total the solution is less complex because there are fewer pieces and the single piece is used in the same way everywhere, so the resulting cognitive load is smaller. The former code is more “easy” to understand while the latter is, all in all, more “simple.”

Principles

DRY


2 Responses to "Simple vs. Easy: Writing A Generic Code To Avoid Duplication (Representation of Data To Import)"

  1. cs94njw says:

    I’m really sorry – I don’t understand what this article is about. Could you add a simpler description?

  2. Basanta Raj Onta says:

    I had a similar kind of problem and was planning to replace the similar structures with something generic. Thanks for the information. I don’t know if I’ll be able to apply it the same way, but I will try.
