I wrote this post in July 2017, and I just found it (4 years later) in my Drafts folder. What a great journey down memory lane it was to read this today. Ask 2017 me if he knew what he’d be into 4 years on, and…yeah. Life moves pretty fast.
We have been doing a lot of hiring lately, and I am lucky to be at such a company. It feels like that’s all we’ve done over the time that I’ve been at Red Hat. In every interview I am routinely asked what it is like to work at Red Hat. Mostly I’d pass on a few relevant anecdotes, and move on.
As I’ve just come up on my 8-year anniversary at Red Hat, I thought I would write some of this stuff down to explain more broadly what it’s like to work at Red Hat, more specifically in the few groups I’ve been in, more specifically my personal experience in those groups…
How did I get here?
In 2007, I met Erich Morisse at a RHCA training class at Red Hat’s Manhattan office. Erich was already a Red Hatter, and somehow I ended up with his business card. Fast forward a year or so, and…
I got married in 2008 in Long Island, NY. Within 3 months of that, I applied at Red Hat and flew to Raleigh for the interview. Within 4 months of that, my wife and I moved to Raleigh, and I started at Red Hat on July 20, 2009, as a Technical Account Manager in Global Support Services. I was so excited that they’d have me that the 15% pay cut didn’t bother me.
Life as a TAM
I think there are still TAMs in Red Hat, but I won’t pretend to know what their life is like these days. My experience was filled with learning, lots of pressure and lots of laughs. We had a great group of TAMs…I am still in contact with a few, even 6+ years later. As TAMs we were ultimately tasked with keeping Red Hat’s largest accounts happy with Red Hat (whatever the customer’s definition of happy is). That can mean a variety of things. I personally found that the best way to build and maintain a good relationship was to be onsite with the customer’s technical team as much as possible. While that meant a lot of 6:00am flights, I think it ended up being worth it if only to build up the political capital necessary to survive some of the tickets I’ll describe below.
At the time, TAMs carried about 4-6 accounts, and those accounts largely came from the same vertical, whether it was government, military, national labs, animation studios, or my personal favorite, FSI (financial services industry). I gravitated towards the FSI TAMs for a few reasons:
- They were the most technical
- They had the most pressure
- I felt I’d learn from them
I ended up moving to that sub-group and taking on some of the higher profile banks, stock exchanges and hedge funds as my accounts. Supporting those accounts was very challenging for me. I was definitely in over my head, but actually that is where I thrive. For whatever reason, I naturally gravitate towards pressurized, stressful situations. I think there was an experience at a previous job at a datacenter operator (where we were constantly under pressure) that made me learn how to focus under duress and eventually crave pressure.
I’ll relay two stories from my time as a TAM that I will never forget.
- 2010: Onsite for a major securities exchange platform launch (moved from Solaris to RHEL). This led to one of the nastiest multi-vendor trouble tickets I was ever on. That ticket also introduced me to Doug Ledford (now one of the InfiniBand stack maintainers) and Steven Rostedt (realtime kernel maintainer, sadly now over at VMware). In retrospect I can see how much I grew during the lifetime of that ticket. I was getting access to some of the best folks in the world (who were also stumped). Helping debug along with them was truly an honor. I think we went through over 40 test kernels to ultimately fix it.
- 2011: A customer purchases a fleet of server gear with NICs that are buggy in every respect. Firmware is terrible. Drivers are not stable or performant. While the hardware issues were not on my plate, certainly the drivers in the kernel that Red Hat was shipping were very much my responsibility. In this situation, I made several trips out to the customer to assure them that everything was being done to remedy the situation. I knew this was a serious issue when each time out there I was presenting to higher and higher ranking management. We worked with that vendor daily for quite a while. They fixed bugs in both firmware and driver (upstream), Red Hat kernel folks backported those patches and we tested everything onsite. I don’t know if we got to 40 kernels, but it was at least 20. Plus a dozen or so firmware flashes across roomfuls of machines. This scenario taught me:
- I needed to level up my public speaking experience if I was going to be in rooms with the highest levels of management. To do this I joined a local Toastmasters club along with another TAM. That other TAM founded Red Hat’s own chapter of Toastmasters, and I was the first to speak at it.
- I should get more hands-on experience with high-end hardware itself so that I could relate more to the customer’s Ops folks. I ended up working with some gear loaned to me by the Red Hat Performance team. They always seemed to have the cool toys.
- More about tc, qdiscs, network buffers, congestion algorithms and systemtap than I’d care to admit.
At the time, I felt like I barely survived. But the feedback I received was that I did manage to make the best of bad situations, and the customers are still customers so…mission accomplished. I also became the team lead of the FSI TAMs, and began concentrating on cloning myself by writing documentation, building an onboarding curriculum and interviewing probably 3 people a week for a year.
Becoming a performance engineer
After working with those exchanges, I knew a thing or two about what their requirements were. I got a kick out of system tuning, and wanted to take that to the next level. My opportunity came in a very strange way. Honestly, this is how it happened…I subscribed to as many internal technical mailing lists as I could. Some were wide open and I began monitoring them closely to learn (I still do this).
One day a slide deck was sent out detailing FY12 plans for the performance team. I noted buried towards the end of the deck that they planned on hiring. So, I reached out to the director over there and we had about an hour long conversation as I paced nervously in my laundry room (it’s the only place I could hide from my screaming infants). At the time, that team was based in Westford, MA. I flew up there and did a round of interviews. Within a few days, I was hired and planning my transition out of the support organization.
I believe what got me the job was that I had learned so much low-level tracing and debugging hackery while supporting the FSI sector that I ended up doing very similar work to what was being done on the performance team. And that experience must have shone through.
Being a performance engineer
I remember my first project as a performance engineer: help the KVM team to see if they could use ebtables to build anti-spoofing rules into our hypervisor product called Red Hat Enterprise Virtualization. I remember thinking to myself…oh shit…what is RHEV? What is ebtables? I was under pressure again. Good. Something familiar, at least. To help out the RHEV team I had to quickly learn all of the guts of both topics as well as build load/scale tests to prove out whether it would work or not. I’ll skip to the punchline though…ebtables was already abandonware, even 6 years ago. No one cared to fix anything and it’s been on the guillotine for a long time. Based on the issues encountered, I might have been the first (only?) person to really performance and scale test it.
This initial experience was not unlike most experiences on the performance team:
- You generally have no clue what the next project will require, so you get very good at soaking up new material.
- Don’t be surprised…you are likely the first person to performance or scale test a feature. Get used to it. Developers develop on their laptops.
Most of that is still true to this day, although as time went on I learned to be more proactive: not only engaging with developers about what they’re working on, but also religiously reading LWN, attending conferences like LinuxCon and, like I mentioned, subscribing to as many mailing lists as possible.
The biggest project (not for long) I had on this team was the initial bringup of RHEL7. I look back with great fondness on the years 2012-2014 as I was able to see the construction of the world’s leading Linux distribution from a very unique vantage point: working with the very people who “make RHEL feel like RHEL”. That is … debating over kernel configs…backwards compatibility discussions…working with partners to align hardware roadmaps…GA/launch benchmark releases…can we do something like Ksplice…will we reduce CONFIG_HZ.
This last bit brings me to the part of RHEL7 that I had the most to do with…timers. As the vast majority of financial transactions happening on stock exchanges occur on RHEL, we had to pay very close attention to the lowest levels of performance. Timers are an area where only the smartest, bravest kernel developers dare to tread. Our goal was to build NOHZ_FULL and test the hell out of it. Nowadays we take this feature for granted in both the financial industry and telco, where without nohz_full (I am told), all the world’s packets will be a few microseconds late. And that is not good.
You can see some of my nohz_full work here (or read the RHEL docs on the subject, as I wrote those too).
While Red Hat was not my first job, I do consider Red Hat my first (job) love. It is the first job I had that I’d call career-worthy, in that I could see myself working here for a while (there was plenty of work and the company was growing).