About Vlad Mihalcea

Vlad Mihalcea is a software architect passionate about software integration, high scalability and concurrency challenges.

Why Unix utilities are worth learning

Why VIM?

Sooner or later there comes a day when your easy-to-use IDE becomes useless for handling huge files. Few editors can work with very large files such as production logs.

I recently had to analyze a 100 MB one-line JSON file, and once more VIM saved the day. VIM, like many other Unix utilities, is both tough and brilliant. Git opens it by default for interactive rebase on many systems (unless core.editor or GIT_EDITOR says otherwise), and if you’re still not convinced, maybe this great article will make you change your mind.

Let’s see how easily you can pretty print a JSON file with VIM. First we will download a one-line JSON file from Reddit.

$ wget http://www.reddit.com/r/programming.json
--2014-01-24 12:21:04--  http://www.reddit.com/r/programming.json
Resolving www.reddit.com (www.reddit.com)... 77.232.217.122, 77.232.217.113
Connecting to www.reddit.com (www.reddit.com)|77.232.217.122|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 28733 (28K) [application/json]
Saving to: `programming.json'

100%[======================================>] 28,733      --.-K/s   in 0.03s

2014-01-24 12:21:04 (1021 KB/s) - `programming.json' saved [28733/28733]

This is what it looks like:

[Screenshot: the one-line JSON file opened in VIM]

Pretty printing

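Python ships with most Unix distributions, so the following VIM command does the trick. It filters the whole buffer (%) through Python’s json.tool module; use python3 where python is not on the PATH:

:%!python -m json.tool

[Screenshot: the pretty-printed JSON in VIM]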

Let’s save the pretty printed JSON file and put other Unix tools to work.

:w programming_pretty.json
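
The same formatting can also be done without opening an editor at all. Below is a minimal sketch, assuming Python 3; the helper name pretty_print_json and the sample string are made up for illustration, and the four-space indent is chosen to resemble what json.tool prints.

```python
import json


def pretty_print_json(text, indent=4):
    """Parse a JSON document and return it re-serialized with indentation."""
    parsed = json.loads(text)
    return json.dumps(parsed, indent=indent)


if __name__ == "__main__":
    # Hypothetical one-line sample standing in for programming.json.
    one_line = '{"kind": "Listing", "data": {"children": [{"data": {"domain": "github.com"}}]}}'
    print(pretty_print_json(one_line))
```

Note that json.loads reads the whole document into memory, so for very large files this is a convenience, not a replacement for a streaming tool.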

Matching time

Let’s say we want to extract all “domain” related values:

"domain": "mameworld.info"

Sed to the rescue

$ sed -nr 's/^.*"domain":\s*"([^"]*)".*$/\1/p' <programming_pretty.json | sort -u
blog.safaribooksonline.com
chadfowler.com
cyrille.rossant.net
dot.kde.org
evanmiller.org
fabiensanglard.net
galileo.phys.virginia.edu
github.com
halffull.org
ibuildings.nl
jaxenter.com
jobtipsforgeeks.com
kilncode.com
libtins.github.io
mameworld.info
miguelcamba.com
minuum.com
notes.tweakblogs.net
perfect-pentago.net
periscope.io
reuters.com
tech.blog.box.com
tmm1.net
vocalbit.com
youtube.com
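
For comparison, the same extraction can be sketched in Python. This is only an illustration, assuming Python 3 and the pretty-printed layout shown above; the names DOMAIN_PATTERN and extract_domains and the sample text are hypothetical. The value is captured with a negated character class so the match stops at the first closing quote.

```python
import re

# Capture everything between the quotes that follow "domain":
DOMAIN_PATTERN = re.compile(r'"domain":\s*"([^"]*)"')


def extract_domains(text):
    """Return the unique "domain" values found in the text, sorted."""
    # set() removes duplicates; sorted() orders by code point, which can
    # differ from the locale-aware ordering of `sort -u`.
    return sorted(set(DOMAIN_PATTERN.findall(text)))


if __name__ == "__main__":
    # Hypothetical fragment of a pretty-printed listing.
    sample = '''
    "domain": "mameworld.info",
    "author": "justrelaxnow",
    "domain": "github.com",
    "domain": "github.com",
    '''
    for domain in extract_domains(sample):
        print(domain)
```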

Multi-line matching

Sed is line-oriented, and while it offers multi-line support, it’s no match for Perl. Let’s say I want to match all authors in the following JSON pattern:

"data": {   
   "author": "justrelaxnow", 
}

This is how I do it:

$ perl -0777 -ne 'print "$1\n" while /"data":\s*\{.*?"author":\s*"(.*?)"\s*[,}].*?\},/sg' programming_pretty.json | sort -u
AmericanXer0
azth
bionicseraph
bit_shiftr
charles_the_hard
Gexos
jakubgarfield
johnwaterwood
joukoo
justrelaxnow
Kingvash
krets
mariuz
mopatches
nyphrex
pseudomind
rluecke3
sltkr
solidus-flux
steveklabnik1
sumstozero
swizec
vocalbit
Wolfspaw
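
The multi-line idea carries over to other regex engines. Here is a rough Python 3 sketch of it, not a port of the exact command above: the whole text is handled as one string (what -0777 does in Perl) and re.DOTALL lets "." cross line breaks (what the s modifier does). The names AUTHOR_PATTERN and extract_authors and the sample text are assumptions for illustration.

```python
import re

# re.DOTALL lets "." match newlines, so one match can span several lines.
# The lazy ".*?" stops at the first "author" key after each "data": {
AUTHOR_PATTERN = re.compile(r'"data":\s*\{.*?"author":\s*"(.*?)"', re.DOTALL)


def extract_authors(text):
    """Return the unique author names found inside "data" objects, sorted."""
    # Sorted by code point, which can differ from locale-aware `sort -u`.
    return sorted(set(AUTHOR_PATTERN.findall(text)))


if __name__ == "__main__":
    # Hypothetical fragment shaped like the pattern shown earlier.
    sample = '''
    {
        "data": {
            "domain": "mameworld.info",
            "author": "justrelaxnow"
        }
    },
    {
        "data": {
            "author": "swizec",
            "domain": "github.com"
        }
    }
    '''
    for author in extract_authors(sample):
        print(author)
```

Matching structure with regular expressions is fragile; once the data is valid JSON, parsing it and walking the resulting objects is usually the sturdier choice.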

Conclusion

Unix tools are old school; some of them were written forty years ago. The learning curve might be steep, but learning them is a great investment. A great software library stands the test of time, and Unix tools are a good reminder that tough jobs call for tough tools.
 

Reference: Why Unix utilities are worth learning from our JCG partner Vlad Mihalcea at the Vlad Mihalcea’s Blog blog.


4 Responses to "Why Unix utilities are worth learning"

  1. Tomasz N. says:

    You don’t have to use vim for interactive rebase in git, see GIT_EDITOR or core.editor configuration properties. Moreover in order to format JSON using Python script you don’t have to open that file in vim in the first place. You can simply pipe file to script and then pipe results to second file.

  2. Indeed you don’t need VIM for git or for formatting json. That vim command is actually calling python to do its job. But VIM is installed almost anywhere and it’s worth knowing its basics.

  3. Jan Andersen says:

    So much fail.

    1. I didn’t see a single justification for dragging Vim into this article.

    2. Vim is running on my Windows PC so it is not like it is specific to Unix.

    3. The sed expression is overly complicated. There is no need to anchor the expression with ^ and $.

    4. FINDSTR (on Windows) combined with an FOR /F loop to tokenize the output would have done the same job. Again, not specifically Unix.

    5. Perl (and) have been ported to Windows, so again, not really Unix-specific.

    No, the take-away here is to use the best tool for the job. GUI tools are fine, but for some jobs it pays to leave your comfort zone and wield the power of the command line.

  4. I used all those tools on windows, yet they are still Unix tools. Once learned you can use them on any OS these days, so it’s really useful to know them.

    1. Vim manages to open a 1GB XML file, which I can grep/sed or do anything I want with it. How many IDEs can do that?
    2. I ran Vim on windows and yes it is a Unix tool.
    3. I want an exact match that’s why I used the ^ and $.
    4. FINDSTR and FOR/F windows batching is the “poor man’s batch utilities”. For OS independent batch scripts I prefer using Python. It’s way more powerful and much more readable.
    5. Quoting Wiki: “Perl was originally developed by Larry Wall in 1987 as a general-purpose Unix scripting language to make report processing easier.” [http://en.wikipedia.org/wiki/Perl]

    So Perl, Vim, Sed, grep and many other Unix original tools are available on Linux, Mac and Windows and for a very good reason. They are handy and practical.




Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd | Terms of Use | Privacy Policy | Contact
All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries.
Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.