
Recognizing hand-written shapes programmatically: find the key points of a rectangle

A far-fetched goal of mine is to use sketching on a whiteboard as a way to define programs. I mean formal programs that you can execute. Of course, through your sketches you would define programs in a high-level domain specific language, for example by describing a state machine or an Entity-Relationship diagram.

To do so I would like to start by recognizing rectangles. Then I will move on to recognizing other shapes and connecting lines, and to recognizing the text present in the diagram. For now let’s focus on recognizing rectangles.

My general approach would be the following:
 

  1. recognize the meaningful lines
  2. recognize key points among those lines
  3. classify those key points using AI
  4. find shapes by combining the classified key points

Ok. This is not going to be something I complete over a weekend.

The input images

We will use 3 images: two of them were drawn on a whiteboard by me, under different light conditions. The third one was found on the Internet. Its peculiarity is that the sketch was done on graph paper (i.e., there is a grid on the paper).

sm2-150x150
Whiteboard (natural light)

sm3-150x150
Whiteboard (artificial light)

state-flowchart-150x150
Graph paper

Let’s see how we can process these images. We will use Java and the BoofCV image processing library.

Gray scale

As a first step we convert the image to gray scale. Here we run into a problem with the image taken under artificial light:

Screenshot-from-2016-04-02-14-51-23-1024x482

We want to remove that giant gray blob in the bottom right corner. To do so we will use derivatives.
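As a side note, the gray scale conversion itself can be sketched in plain Java, without BoofCV. This is only an illustration: the class name, the method name and the Rec. 601 luma weights are assumptions made for this sketch, and the library may well weight the channels differently (for example with a plain average).

```java
final class GrayScaleSketch {

    // Converts packed 0xRRGGBB pixels to gray levels in the range 0..255.
    // The weights are the common Rec. 601 luma weights (an assumption of this sketch).
    static int[][] toGray(int[][] rgb) {
        int height = rgb.length;
        int width = rgb[0].length;
        int[][] gray = new int[height][width];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int r = (rgb[y][x] >> 16) & 0xFF;
                int g = (rgb[y][x] >> 8) & 0xFF;
                int b = rgb[y][x] & 0xFF;
                gray[y][x] = (int) Math.round(0.299 * r + 0.587 * g + 0.114 * b);
            }
        }
        return gray;
    }
}
```

A uniformly lit whiteboard maps to a nearly constant light gray, while uneven artificial light shows up as a slowly varying darker region, which is the kind of blob visible above.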

Derivatives

We blur the image to reduce the effect of noise, and then calculate the derivatives. This is a way to capture the sharp variations of color which happen vertically or horizontally.
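To make the idea concrete, here is a minimal sketch of the blur-then-derive step in plain Java. It is not the BoofCV code used for the screenshots: a 3x3 box blur and central differences stand in for whatever blur and derivative operators the library applies, and all names are hypothetical.

```java
final class DerivativeSketch {

    // 3x3 box blur; coordinates outside the image are clamped to the nearest valid pixel.
    static double[][] boxBlur(int[][] gray) {
        int h = gray.length, w = gray[0].length;
        double[][] out = new double[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                double sum = 0;
                for (int dy = -1; dy <= 1; dy++) {
                    for (int dx = -1; dx <= 1; dx++) {
                        int yy = Math.min(h - 1, Math.max(0, y + dy));
                        int xx = Math.min(w - 1, Math.max(0, x + dx));
                        sum += gray[yy][xx];
                    }
                }
                out[y][x] = sum / 9.0;
            }
        }
        return out;
    }

    // Horizontal derivative: half the difference between the right and left neighbors
    // (neighbors are clamped at the border).
    static double[][] derivX(double[][] img) {
        int h = img.length, w = img[0].length;
        double[][] out = new double[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int left = Math.max(0, x - 1), right = Math.min(w - 1, x + 1);
                out[y][x] = (img[y][right] - img[y][left]) / 2.0;
            }
        }
        return out;
    }

    // Vertical derivative: half the difference between the lower and upper neighbors
    // (neighbors are clamped at the border).
    static double[][] derivY(double[][] img) {
        int h = img.length, w = img[0].length;
        double[][] out = new double[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int up = Math.max(0, y - 1), down = Math.min(h - 1, y + 1);
                out[y][x] = (img[down][x] - img[up][x]) / 2.0;
            }
        }
        return out;
    }
}
```

A marker stroke produces a large derivative on both of its sides, while a slow change in lighting produces values close to zero, which is why the derivatives help against the gray blob.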

We get something like this for the image taken under natural light:

Screenshot-from-2016-04-02-14-44-43-821x1024

However for the image taken under artificial light we see the noise:

Screenshot-from-2016-04-02-14-53-42-1024x479

At this point we take each point of the image and check whether around it there is a high number of points with a high derivative (either horizontal or vertical). We keep the points satisfying the condition and we set all the other points to white. We do that a couple of times.
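One pass of this filter could look roughly like the following plain Java sketch. The threshold, the window radius and the minimum count are hypothetical parameters: the post does not state the values actually used, nor the exact shape of the neighborhood.

```java
final class DenseEdgeFilterSketch {

    static final int WHITE = 255;

    // Keeps a pixel only if at least minCount pixels in the square window around it
    // (the pixel itself included) have a strong horizontal or vertical derivative.
    // Every other pixel is set to white.
    static int[][] keepDenseRegions(int[][] gray, double[][] dx, double[][] dy,
                                    double threshold, int radius, int minCount) {
        int h = gray.length, w = gray[0].length;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int count = 0;
                for (int yy = Math.max(0, y - radius); yy <= Math.min(h - 1, y + radius); yy++) {
                    for (int xx = Math.max(0, x - radius); xx <= Math.min(w - 1, x + radius); xx++) {
                        if (Math.abs(dx[yy][xx]) > threshold || Math.abs(dy[yy][xx]) > threshold) {
                            count++;
                        }
                    }
                }
                out[y][x] = count >= minCount ? gray[y][x] : WHITE;
            }
        }
        return out;
    }
}
```

Isolated noisy pixels have few strong neighbors and get wiped out, while pixels along a stroke are surrounded by other strong pixels and survive.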

This is the result:

Screenshot-from-2016-04-02-14-56-12-1024x479

Contours

We do some additional filtering and then we invoke a function to find the contours inside the image. We draw the external contours in red and the internal ones in blue.

Screenshot-from-2016-04-02-14-58-19-1024x484

We then remove the short contours:

Screenshot-from-2016-04-02-14-59-01-1024x475
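Removing the short contours can be sketched as below. This assumes a contour is a list of (x, y) points and that "short" means a small total length along the contour; the post does not say which criterion was used (it could also be the number of points), and the names and the threshold are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class ShortContourFilterSketch {

    // Total length of the polyline going through the contour points, in pixels.
    static double length(List<double[]> contour) {
        double total = 0;
        for (int i = 1; i < contour.size(); i++) {
            double[] a = contour.get(i - 1);
            double[] b = contour.get(i);
            total += Math.hypot(b[0] - a[0], b[1] - a[1]);
        }
        return total;
    }

    // Returns only the contours whose length is at least minLength.
    static List<List<double[]>> removeShort(List<List<double[]>> contours, double minLength) {
        List<List<double[]>> kept = new ArrayList<>();
        for (List<double[]> contour : contours) {
            if (length(contour) >= minLength) {
                kept.add(contour);
            }
        }
        return kept;
    }
}
```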

Key points

The contours we get are drawn as a list of segments which are very short. Let’s draw the extremes of the segments in blue.

Screenshot-from-2016-04-02-15-00-19-1024x483

Yes, they are very short: you just see a continuous set of extremes, very close to each other. We want far fewer segments, and much longer ones.

To do that we basically use two strategies:

  1. we simply merge consecutive extremes which are very close
  2. we take sequences of three consecutive points: A, B, C. If B is very close to the line between A and C we just remove B
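The two strategies above can be sketched in plain Java as follows. Points are (x, y) pairs, and the two thresholds (minDist and tolerance) are hypothetical parameters, since the post does not give the values used. As a further assumption, "merging" two close extremes here means keeping the first and dropping the second, and the distance of B is measured from the segment between A and C.

```java
import java.util.ArrayList;
import java.util.List;

final class ContourSimplifierSketch {

    // Strategy 1: drop every point that is closer than minDist to the last kept point.
    static List<double[]> mergeClose(List<double[]> pts, double minDist) {
        List<double[]> out = new ArrayList<>();
        for (double[] p : pts) {
            if (out.isEmpty()) {
                out.add(p);
                continue;
            }
            double[] last = out.get(out.size() - 1);
            if (Math.hypot(p[0] - last[0], p[1] - last[1]) >= minDist) {
                out.add(p);
            }
        }
        return out;
    }

    // Distance from point p to the segment between a and c.
    static double distanceToSegment(double[] p, double[] a, double[] c) {
        double vx = c[0] - a[0], vy = c[1] - a[1];
        double len2 = vx * vx + vy * vy;
        if (len2 == 0) {
            return Math.hypot(p[0] - a[0], p[1] - a[1]);
        }
        double t = ((p[0] - a[0]) * vx + (p[1] - a[1]) * vy) / len2;
        t = Math.max(0, Math.min(1, t));
        return Math.hypot(p[0] - (a[0] + t * vx), p[1] - (a[1] + t * vy));
    }

    // Strategy 2: for consecutive points A, B, C, remove B when it lies within
    // tolerance of the segment AC. A is always the last point kept so far.
    static List<double[]> dropCollinear(List<double[]> pts, double tolerance) {
        if (pts.size() < 3) {
            return new ArrayList<>(pts);
        }
        List<double[]> out = new ArrayList<>();
        out.add(pts.get(0));
        for (int i = 1; i < pts.size() - 1; i++) {
            double[] a = out.get(out.size() - 1);
            double[] b = pts.get(i);
            double[] c = pts.get(i + 1);
            if (distanceToSegment(b, a, c) >= tolerance) {
                out.add(b);
            }
        }
        out.add(pts.get(pts.size() - 1));
        return out;
    }

    // Runs both strategies, two times in a row.
    static List<double[]> simplify(List<double[]> pts, double minDist, double tolerance) {
        List<double[]> result = pts;
        for (int pass = 0; pass < 2; pass++) {
            result = dropCollinear(mergeClose(result, minDist), tolerance);
        }
        return result;
    }
}
```

On an L-shaped stroke sampled with many points, only the two ends and the corner should survive, which is the kind of key point the later classification needs.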

We apply both strategies twice and get much simpler contours. These are the final results:

Screenshot-from-2016-04-02-15-03-35-150x150
Screenshot-from-2016-04-02-15-03-19-150x150
Screenshot-from-2016-04-02-15-05-50-150x150

What next

Now we have a reasonable number of relevant points. Next I want to classify them using machine learning techniques. For example, I want to recognize whether a single point is the top left corner of a rectangle or part of an arrow. Then I will combine those recognized points to obtain entire shapes (my rectangles!).

Right now I am already generating the images to classify and I am thinking about which features to use for machine learning. I have some ideas, but we will see them in one of the next posts.

Training images look like this:

point_1_109

Federico Tomassetti

Federico has a PhD in Polyglot Software Development. He is fascinated by all forms of software development with a focus on Model-Driven Development and Domain Specific Languages.