Software Development

Bit fiddling every programmer should know

Bit fiddling looks like magic, it allows to do so many things in very efficient way.
In this post i will share some of the real world example where bit operation can be used to gain good performance.

Bit wise operation bootcamp

Bit operator include.

 – AND ( &)
 – OR ( | )
 – Not ( ~)
 – XOR( ^)
 – Shifts ( <<, >>)

Wikipedia has good high level overview of  Bitwise_operation. While preparing for this post i wrote learning test and it is available  learningtest github project. Learning test is good way to explore anything before you start deep dive. I plan to write detail post on Learning Test later.

In these examples i will be using below bits tricks as building block for solving more complex problem.

  • countBits  Count number of 1 bits in binary
  • bitParity – Check bit added to binary code
  • set/clear/toggle – Manipulating single bit
  • pow2 – Find next power of 2 and using it as mask.

Code for these function is available @ Bits.java on github and unit test is available @ BitsTest.java

Lets look at some real world problems now.

Customer daily active tracking

E-commerce company keep important metrics like which days customer was active or did some business. This metrics becomes very important for building models that can be used to improve customer engagement. Such type of metrics is also useful for fraud or risk related usecase.

Investment banks also use such metrics for Stocks/Currency for building trading models etc.

Using simple bit manipulation tricks 30 days of data can be packed in only 4 bytes, so to store whole year of info only 48 bytes are required.

Code snippet

public class CustomerActivity {

    private final int[] months = new int[12];

    public void record(LocalDate day) {
        int monthOffSet = day.getMonthValue() - 1;
        int monthValue = months[monthOffSet];
        // Set bit for day in 32 bit int and then OR(|) with month value to merge value
        months[monthOffSet] = monthValue | 1 << (day.getDayOfMonth() - 1);
    }

    public int daysActive(Month month) {
        int monthValue = months[month.ordinal()];
        return countBits(monthValue);
    }

    public boolean wasActive(LocalDate day) {
        int monthOffSet = day.getMonthValue() - 1;
        int monthValue = months[monthOffSet];
       // Set bit for day in 32 bit int and then AND(|) with month value to check if bit was set
        return (monthValue & 1 << (day.getDayOfMonth() - 1)) > 0;
    }
   
}

Apart from compact storage this pattern have good data locality because whole thing can be read by processor using single load operation.

Transmission errors

This is another area where bit manipulation shines.
 
Think you are building distributed storage block management software or building some file transfer service,  one of the thing required for such service is to make sure transfer was done properly and no data was lost during transmission. This can be done using bit parity (odd or even) technique, it involves keeping number of ‘1’ bits to odd or even.

/*
    Used for verification for data transferred over network or data saved on disk. Parity bits is used in many hardware for deducting errors.
    Caution: This is simple technique and comes with some limitation of deduction of error with odd or even.
    Hadoop name nodes performs some checks like this to check data integrity.
 */
public class Transmission {

    public static byte transmit(byte data) {
        return Bits.oddParity(data); // Add 1 bit to keep odd parity if required.
    }

    public static boolean verify(byte data) {
        return (Bits.countBits(data) & 1) == 1; // Checks if parity is Odd on receiver end.
    }

}

Another way to do such type of verification is Hamming_distance. Code snippet for hamming distance for integer values.

/*
    Using bits count to find distance between 2 integer. Some of application are error deduction while data transfer
 */
public class HammingDistance {

    public static int weight(int value) {
        return Bits.countBits(value);
    }

    public static int distance(int value1, int value2) {
        return Bits.countBits(value1 ^ value2);
    }

    public static int distance(String value1, String value2) {
        throw new IllegalArgumentException("Not implemented");
    }
}

Very useful way to keep data integrity with no extra overhead.

Locks

Lets get into concurrency now. Locks are generally not good for performance but some time we have to use it.  Many lock implementation are very heavy weight and also hard to share between programs .In this example we will try to build lock and this will be memory efficient lock, 32 locks can be managed using single Integer.

Code snippet

/*
    This is using single Int to manage 32 locks in thread safe way. 
    This has less memory usage as compared to JDK lock which uses one Int(32 Bytes) to manage single lock.
 */

public class Locks {
    public static final int INT_BYTES = 32;
    private AtomicInteger lock = new AtomicInteger(0);

    public boolean lock(int index) {
        int value = lock.get();
        if (Bits.isSet(value, index)) {
            return false;
        }
        int newLock = Bits.set(value, index);
        return lock.compareAndSet(value, newLock);
    }

    public boolean release(int index) {
        int value = lock.get();
        int newLock = Bits.clear(value, index);
        return lock.compareAndSet(value, newLock);
    }
}

This example is using single bit setting trick along with AtomicInteger to make this code threadsafe.

This is very lightweight lock. As this example is related to concurrency so this will have some issues due to false sharing and it is possible to address this by using some of the technique mention in  scalable-counters-for-multi-core post.

Fault tolerant disk

Lets get into some serious stuff. Assume we have 2 disk and we want to make keep copy of data so that we can restore data incase one of the disk fails, naive way of doing this is to keep backup copy of every disk, so if you have 1 TB then additional 1 TB is required. Cloud provider like Amazon will be very  happy if you use such approach.

Just by using XOR(^) operator we can keep backup for pair of disk on single disk, we get 50% gain.

50% saving on storage expense.

Code snippet testing restore logic.

public void restoreDisk() {

        RaidDisk disk1 = new RaidDisk(2);
        disk1.set(0, MoreInts.toByte("01101101"));
        disk1.set(1, MoreInts.toByte("00101101"));

        RaidDisk disk2 = new RaidDisk(1);
        disk2.set(0, MoreInts.toByte("11010100"));

        RaidDisk raidDisk = disk1.xor(disk2); // This xor allow to keep data for both disk in RaidDisk

        RaidDisk newDisk1 = raidDisk.xor(disk2); // If we loose disk1 then disk1 can be restore using raidDisk ^ disk2 
        RaidDisk newDisk2 = raidDisk.xor(disk1);

        assertEquals(disk1.toBinary(), newDisk1.toBinary());
        assertEquals(disk2.toBinary(), newDisk2.toBinary());
    }

Disk code is available @ RaidDisk.java

Ring buffer

Ring buffer is very popular data structure when doing async processing , buffering events before writing to slow device. Ring buffer is bounded buffer and that helps in having zero allocation buffer in critical execution path, very good fit for low latency programming.

One of the common operation is finding slot in buffer for write/read and it is done by using Mod(%) operator, mod or divide operator is not good for performance because it stalls execution because CPU has only 1 or 2 ports for processing divide but it has many ports for bit wise operation.

In this example we will use bit wise operator to find mod and it is only possible if mod number is powof2. I think it is one of the trick that everyone should know.

n & (n-1)

If n is power of 2 then ‘x & (n-1)’ can be used to find mod in single instruction. This is so popular that it is used in many places, JDK hashmap was also using this to find slot in map.

public class RingBuffer<T> {
    public RingBuffer(int size) {
        this.capacity = Bits.powOf2(size);
        this.mask = capacity - 1;
        buffer = new Object[this.capacity];
    }

    private int offset(int index) {return index & mask;
        //return index % capacity;
    }
    public boolean write(T value) {
        if (buffer[offset(write)] != null)
            return false;
        buffer[offset(write++)] = value;
        return true;
    }
    public T read() {
        if (read == write)
            return null;
        T value = (T) buffer[offset(read)];
        buffer[offset(read++)] = null;
        return value;
    }
}

Conclusion

I have just shared at very high level on what is possible with simple bit manipulation techniques.

Bit manipulation enable many innovative ways of solving problem. It is always good to have extra tools in programmer kit and many things are timeless applicable to every programming language.

All the code used in post is available @  bits repo.

Published on Java Code Geeks with permission by Ashkrit Sharma, partner at our JCG program. See the original article here: Bit fiddling every programmer should know

Opinions expressed by Java Code Geeks contributors are their own.

Ashkrit Sharma

Pragmatic software developer who loves practice that makes software development fun and likes to develop high performance & low latency system.
Subscribe
Notify of
guest

This site uses Akismet to reduce spam. Learn how your comment data is processed.

0 Comments
Inline Feedbacks
View all comments
Back to top button