How to train a dragon, or can a software developer become an SRE — part 1

Maryna Cherniavska | June 4th, 2019 | Last Updated: June 3rd, 2019

Preface

This is a story about a rather unusual experiment that our company ran, with me as a (willing) guinea pig, to try to retrain a software developer as an SRE. SREs (or DevOps engineers; there’s a controversy over whether it’s the same job or not) are a hot item right now, I think maybe even more so than data scientists (I don’t have stats in hand to confirm it, so that’s a rather one-sided view). Anyway, our company was desperately searching for SREs, and then a bright idea came into someone’s head.

We have all those devs, and they are all technical people too, right? And they work with infrastructure too, only a bit on the other side, but at least they have some idea about it, right? And maybe retraining a senior developer would actually be easier and less costly than training a junior SRE?

However the idea came to be born, born it was, and moreover, it was implemented. (Spoiler alert: due to unforeseen circumstances, the experiment ran shorter than expected.)

But the initial idea was that the best training is hands-on, that to make sense it should run over a 2–3 month period, and that the chosen software developer would be given as an apprentice (call it a trainee if you wish, but I actually like the word apprentice here) to an actual SRE working on one of the teams.

That chosen software developer was me, a backend programmer with about 15 years of experience, currently working with Java and Kotlin.

And this was how the story began.

Some disclaimers first.

To protect people’s privacy, I won’t use their real names. Let them be called A., B., C. etc. in order of appearance. I understand it might make the story somewhat artificial, but these are my real thoughts, and they are about real people. So… bear with me. Anyway, I am the author. I have the power.

Another thing I must note is that the company made no assumption that I would switch to being an SRE after the training. I was free to go back to development if I felt it suited me better. But if I said I had found my vocation, the company was ready to embrace me in a new position.

21. January 2019

So, first day as an apprentice. I come to work and my “master” SRE, A., is already there. Turns out he comes to work before 8am, which I can’t possibly match since my commute alone takes almost an hour, and I also have breakfast at home. Oh well. I stopped trying to impress people with my working hours long ago because I think they are not a good metric of performance, but in cases like this I still feel inadequate. Whatever.

A. starts to help me onboard by sending links to the Wiki pages I need to read, and at the same time we go through the tools I need, the tools I already have installed, the repos I need to check out etc. What’s good is that I have had some exposure to the tools: I have used Terraform (very little, true, but at least I know what a TF file looks like); I already have my GPG key set up, and A. only needs to add it to some projects and repos for me to be able to use them; I have an idea of what git and git-crypt are used for; and though I am not fluent in the console, I can do basic stuff.

Of course A. is working in the console with the speed of light, and of course he has his console split into 4 parts, each performing its own task.

Approximately like this. But worse because this one is mine. But you get what I mean.

And of course the font is so tiny it is unreadable for me, but he quickly corrects it without me needing to say anything. I suppose he noticed that my console and app fonts are scaled almost to the maximum. One more reason for me to feel inadequate: I can’t consume so much information at once. It is easier for me to use tabs in the console instead of splitting the windows, and it is also much easier for me to read stuff that doesn’t make me strain my eyes. But the fact that A. works like this isn’t SRE-specific: a lot of developers I know do the same. When I recall that most of them wear either glasses or contact lenses and I still don’t use any, I feel a bit better, but still not a lot.

A. also seems to be using Visual Studio Code as his main IDE for editing. He’s happy that I use it too; what I don’t mention is that I am actually a newbie at that as well. My IDE of choice is IntelliJ IDEA. It is perfect for Java and Kotlin, which are my main specialty. But IDEA is pretty slow when loading large projects, and it is also not very helpful with Terraform syntax, so lately I have tried VSCode as a replacement for configuration-only projects. But whereas I know my way around IDEA pretty well (I remember a lot of standard shortcuts and have also set up some of my own), I am basically still blundering my way around VSCode.

Do you ever mistype stuff when someone is sitting behind your shoulder? A. is great and very patient, but I feel like I misspell the simplest commands (I challenge you to mistype git — but that is something I actually managed!).

I attend the standup and sit through weekly planning, where I predictably don’t understand much. Not that the concepts are unfamiliar, but I realise I have very little idea what this team is actually doing: what their current goals and challenges are, and so on. The team also seems to follow a sprint workflow, which we don’t. I realise that I will probably have a much worse meeting-to-coding ratio than I’ve had previously.

By the end of the day, A. has hardly left my side. We have resolved a production JIRA ticket to increase some read/write limits on AWS DynamoDB, which required changing a couple of lines in the Terraform configuration and running a few console commands to stage and apply the config change. It wasn’t exactly difficult, but I hope I will be able to remember it all tomorrow. My head aches because all the new information I’ve consumed is threatening to squeeze out of my ears. Mercifully, it is time for my master SRE to go home. I stay to type up my notes about the day and add some links that he shared with me to Kotlink (a tool that Illia Sorokoumov invented and that I swear by, because I can’t possibly remember all the browser links I need. And no, Chrome Bookmarks aren’t the same). And then I go out into the freezing Berlin night.

This is not going to be easy.

But this is going to be manageable. Especially if my master SRE continues to keep his cool.

I wonder what he’s thinking though. He might not share this opinion.

22. January 2019

Second day, more of the same. I find that I remember most of the stuff we did with Terraform but almost nothing about OpenShift, which we also did a little of. Makes sense, because I had used Terraform at least a bit, but OpenShift is completely new. I really need to do a dump of console commands and keep them as a reference.

I start to think that this will help a lot with my goal of becoming an architect at some point. Already, I see many more opportunities to think about the architecture and not just the application logic.

Why? As a developer, I think I just always knew in theory that I needed to think about the architecture, but somehow I never actually tried to find holes in it. I could think of a basic design that satisfies the requirements but not about its cost or performance with live traffic — bottlenecks, interactions with other systems. And also, I was too little exposed to the actual behind-the-scenes setup and it was just easier to leave it to someone else because I either didn’t have access, or didn’t know how.

23. January 2019

SRE B. after a meeting: “If you have any trouble understanding any stuff in SRE meetings, I can always stay with you and explain whatever you need.”
Me: “How much time do you have?…”

25. January 2019

A. asked me whether I was excited at the thought of being on call at some point. No, not really… I think this ad-hoc thing is what puts me off most about the SRE job.

This is something we don’t think much about, but a lot of SRE work is putting out fires. The best SRE is the one who doesn’t allow many fires to happen, of course. But it’s not realistic to think that none will happen. Which means that sometimes you will still be fixing something in a hurry while a disgruntled engineering manager breathes down your neck because something just broke in production. And even if it’s not production and/or there’s no fire-breathing manager, your flow will still be broken.

These notes are to be continued in the next article. Hope you found them interesting! If yes, let me know in the comments and I will continue. If not, then also let me know in the comments… and maybe I won’t!

Published on Java Code Geeks with permission by Maryna Cherniavska, partner at our JCG program. See the original article here: How to train a dragon, or can a software developer become an SRE — part 1

Opinions expressed by Java Code Geeks contributors are their own.