How we built the Cloud Infra team

Sat 30 January 2021 by Patrick Pierson

This was one of the greatest messages I have ever written. We filled 11 job requisitions for my team between April 2020 and December 2020. The following is a write-up of my take on how to build a Cloud Infrastructure (Cloud Infra) team. I won’t sugarcoat anything, but I am very proud of what was accomplished. Hiring for a team wasn’t the hardest thing I have done in my life, but it took a lot of effort to get it close enough to right, in my opinion at least. As you read through this, keep the following in mind: "Regardless of what we discover, we understand and truly believe that everyone did the best job they could, given what they knew at the time, their skills and abilities, the resources available, and the situation at hand." - The Prime Directive (http://retrospectivewiki.org/index.php?title=The_Prime_Directive).

What is Cloud Infra at IronNet?

My team’s job is an interesting one. We build, upgrade, break, fix, and maintain cloud infrastructure at IronNet. We work with our security team to make sure that infrastructure is secure, and with our Finance team to make sure it runs as cheaply as possible. We work hand in hand with Site Reliability Engineers and Sales Engineers to make sure what we build goes out to customers. When we started hiring, my team was mostly focused on writing Python that uses the AWS (Amazon Web Services) SDK (Software Development Kit) (https://aws.amazon.com/sdk-for-python/) to call the AWS APIs (Application Programming Interfaces) and create resources in the cloud. Over the last few months, Cloud Infra has expanded into writing more CloudFormation and using Terraform daily.

What type of Cloud Infra person was I looking for?

Mark Manning (https://www.linkedin.com/in/markmanning/) is the Global Recruiting & Engagement Director at IronNet. Throughout this process, I talked with him or Nate Frickel (https://www.linkedin.com/in/nfrickel/) daily. When the process started, Mark asked me to help him build a “Performance Profile”. At first, I didn't really know what to put down. The performance profile is what applicants see on the IronNet website, and it also includes the questions they answer on the application. While the positions were open to a range of experience levels, from mid-level to senior, all of them needed to be filled by what I called a “Cloud Person”.

In my opinion, a “Cloud Person” is someone who likes to tinker with the Cloud. AWS experience was preferred, but we did not rule anyone out for having used Azure or Google Cloud instead. They should be able to explain what Continuous Integration/Continuous Deployment (CICD) systems are and why everyone should be using them. I did not require a specific Infrastructure as Code language, but people who listed one on their resume were interviewed first. IronNet uses Kubernetes for many of its products, so we decided that some knowledge of Kubernetes was required for Cloud Infra members. Lastly, some knowledge of a modern programming language was needed.

On top of all that, I found myself very interested in people who could talk passionately about projects they were working on. We interviewed and hired Max Anderson (https://mcbanderson.com/), who is part of the team that created Reflex (https://github.com/reflexivesecurity). Reflex is a project that lets you enforce security best practices on AWS accounts. Max and the others we hired are all what I consider Cloud people. They know parts of the Cloud inside and out and are not afraid to show it.

Cloud Infra Then and Now

Then

Before we started expanding, the team was five Full Time Employees (FTEs) and one contractor. The team was focused on the Cloud Infra product backlog, which consisted mostly of FedRAMP, Azure, and other issues involving the deployment of IronDefense, IronNet’s core product, in the Cloud. We ran agile ceremonies (https://en.wikipedia.org/wiki/Agile_software_development) on a two-week cadence. We struggled to work through those issues in a timely manner and found that a lot of our efforts were overcome by events (OBE). One OBE effort that still stings a little today was our plan to run an environment for all engineering teams that we were going to call Cloud Services. We attended another team's agile ceremonies and planned to implement Elasticsearch and Kafka clusters for them. As time went on, we were taking too long, and they implemented those clusters themselves. In hindsight, with the team we have now, I could have embedded someone on their team long term to help set up those clusters for them, in a way that other teams could also benefit from if needed.

Now

Today Cloud Infra has 14 people: eleven FTEs and three contractors. We have become more of a functional team, capable of providing members to the other agile teams, with the internal goal of coordinating all cloud infrastructure efforts. We also provide the infrastructure for, and champion, CICD across all of Engineering. We are just getting started here, but I see a great future for us as a “Center of Excellence” for CICD and Cloud Infrastructure for all of IronNet.

6 + 11 does not equal 14

If you do some math on the then vs. now numbers, you will quickly realize they do not add up: six people plus 11 hires is 17, not 14. Hindsight is 2021 (no one wants to talk about 2020). I wish I could go back in time, but we have to move on. IronNet had a rough year like everyone else.

On a much happier note, I also had two team members promoted into new roles. One now runs our Software as a Service (SaaS) effort and the other runs the Sensor team. I am very happy they are in those roles and did not mind putting in more effort to backfill them.

Thanks for making it this far. The rest of this will explain in detail how we hired the team.

Hiring a Cloud Infra team

Application Phase

The application that people submitted was the biggest factor in how I reviewed their resumes. Originally, we had questions that would automatically disqualify people, but I had those removed. I reviewed every resume that came in, after taking note of the applicant's “score”. To explain that a little better, one question on the application was “Are you proficient with Kubernetes?”. Although it was a yes/no question, over time I found it could be answered in a couple of different ways, because “proficient” meant different things to different people. So I used this question during my screening phone call: if someone answered yes, I asked them about it and wanted their input on what they meant by proficient.

Another question on the application asked for their GitHub username. GitHub (https://github.com/) is a company that provides source code hosting. I personally believe GitHub is a software developer’s business card. I use it to show that I am involved in open-source projects, and it makes my activity on both open-source and private projects easy to see. I know not everyone uses GitHub, which is why it was not a disqualification if an applicant did not enter a username or did not have an active profile. I did find it interesting, though, that on average, applicants who entered a username but were not active on GitHub did not make it very far in the process.

One of the biggest things I looked for was keywords like “Terraform” or “CloudFormation” on resumes. Both are Infrastructure as Code tools. Nine times out of ten, if I found these keywords on a resume and the applicant had also answered yes to the Kubernetes question, I would move them to the phone screening phase.

Lastly, we asked whether they were AWS certified. This question also did not disqualify anyone, but if an applicant was AWS certified, answered yes to the Kubernetes question, had Infrastructure as Code tools on their resume, and had an active GitHub profile, I wanted to talk to them as soon as possible. If they did not have all of the above, it was still very likely that I would schedule a phone call, but they were not at the top of my list.
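A minimal Python sketch of that triage is below. It is a hypothetical illustration, not a tool we actually ran: the `Application` fields, the keyword list, and the numeric priority buckets are assumptions based on the application questions and resume keywords described above. As in the real process, no single signal disqualifies anyone; the signals only order the queue.

```python
from dataclasses import dataclass

# Assumed keyword list: the two Infrastructure as Code tools named in this post.
IAC_KEYWORDS = ("terraform", "cloudformation")


@dataclass
class Application:
    """Hypothetical record of one applicant's answers plus their resume text."""

    resume_text: str
    kubernetes_proficient: bool  # "Are you proficient with Kubernetes?"
    aws_certified: bool          # "Are you AWS certified?"
    github_active: bool          # username given AND profile shows activity


def mentions_iac(resume_text: str) -> bool:
    """True if the resume mentions any IaC keyword (case-insensitive substring match)."""
    text = resume_text.lower()
    return any(keyword in text for keyword in IAC_KEYWORDS)


def screening_priority(app: Application) -> int:
    """Return 1 (call first), 2 (phone screen), or 3 (likely call, lower on the list).

    No single answer disqualifies an applicant; the signals only order the queue.
    """
    has_iac = mentions_iac(app.resume_text)
    if app.aws_certified and app.kubernetes_proficient and has_iac and app.github_active:
        return 1
    if app.kubernetes_proficient and has_iac:
        return 2
    return 3
```

Sorting a list of applications with `sorted(applications, key=screening_priority)` would put the strongest matches first. The keyword check is deliberately crude; the phone screen is where those claims actually get validated.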

Phone Screening Phase

I believe I conducted about 115 phone screenings at 30 minutes apiece. I won’t say this is the most important step, but like the other phases it is very important. Depending on what the applicant had checked off from my list above, I would start by pressing them on those points. My reasoning was to validate what I was being told first. If they said yes to the Kubernetes proficiency question, I wanted to know what they meant by proficient, and I left it open to their interpretation. I honestly really liked that, because in some cases I got very detailed responses about what proficient meant to them. In other cases, I would get a one-sentence answer, which made me think they did not actually know very much about Kubernetes.

I would then press them on the Infrastructure as Code tools they knew. If they used Terraform, I would ask how they store state and how they prevent others from deploying while they are deploying their Terraform. I used this question because it should be very simple for anyone who uses Terraform on AWS; it would, however, catch a few folks who were not fully up to speed on Terraform on AWS. My goal was to get them to talk about storing state in S3 and locking deployments with a DynamoDB table. If they used CloudFormation, I would ask how they could prevent one developer from impacting another developer's resources. My hope here was to get them to talk about tagging instances and properly setting up IAM to block other developers from impacting those resources.
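For reference, here is a minimal Python sketch of the two answers I was hoping to hear. The first function returns keyword arguments for boto3's DynamoDB `create_table` call that would create a Terraform lock table; Terraform's S3 backend expects a string partition key named `LockID`, and the table name would go in that backend's `dynamodb_table` setting. The second returns an IAM policy document that denies stopping, rebooting, or terminating EC2 instances whose `Owner` tag does not match the caller's IAM user name. The table name, the `Owner` tag key, and the set of actions are assumptions for illustration. Note that the `${aws:username}` policy variable only resolves for IAM users, and a Deny statement like this only narrows whatever Allow permissions developers already have.

```python
# Hypothetical table name; substitute your own.
LOCK_TABLE_NAME = "terraform-state-locks"


def terraform_lock_table_spec(table_name: str = LOCK_TABLE_NAME) -> dict:
    """Keyword arguments for boto3's DynamoDB create_table call.

    Terraform's S3 backend expects the lock table's partition key to be a
    string attribute named "LockID".
    """
    return {
        "TableName": table_name,
        "AttributeDefinitions": [{"AttributeName": "LockID", "AttributeType": "S"}],
        "KeySchema": [{"AttributeName": "LockID", "KeyType": "HASH"}],
        "BillingMode": "PAY_PER_REQUEST",  # on-demand; no provisioned throughput needed
    }


def owner_only_ec2_policy(tag_key: str = "Owner") -> dict:
    """IAM policy that denies stop/reboot/terminate on EC2 instances unless the
    instance's tag (hypothetical key, default "Owner") equals the caller's IAM user name.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyChangesToOtherDevelopersInstances",
                "Effect": "Deny",
                "Action": [
                    "ec2:StopInstances",
                    "ec2:RebootInstances",
                    "ec2:TerminateInstances",
                ],
                "Resource": "arn:aws:ec2:*:*:instance/*",
                "Condition": {
                    # Literal IAM policy variable, resolved by AWS at evaluation time.
                    "StringNotEquals": {f"ec2:ResourceTag/{tag_key}": "${aws:username}"}
                },
            }
        ],
    }
```

Creating the table would then look something like `boto3.client("dynamodb").create_table(**terraform_lock_table_spec())`, and `json.dumps(owner_only_ec2_policy())` produces a document that can be attached as an IAM policy.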

Overall, my goal during the phone screening was to get them to talk about themselves. If they did not want to, I would try to draw it out of them. This may have biased me towards more outgoing people, but communication is crucial now that everyone is working from home. I need people who are comfortable talking about their knowledge and accomplishments.

Code Challenge Phase

This phase was fairly easy to orchestrate thanks to https://www.qualified.io/. The challenge had two parts. The first was a fairly simple DNA test that returned the complement of a given DNA string. The second returned the first non-repeating letter in a sequence. Together, the two tests should take no longer than an hour or so to complete, but I did not time anyone and told them there was no time limit.
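The actual qualified.io prompts and test cases are not reproduced here, but a minimal Python sketch of each problem might look like this. It assumes the DNA prompt wants the plain complement (A pairs with T, C with G) rather than the reverse complement, and that letter matching is case-sensitive; both are assumptions.

```python
from collections import Counter
from typing import Optional

# Base-pair translation table (assumed: plain complement, case preserved).
_COMPLEMENT = str.maketrans("ATCGatcg", "TAGCtagc")


def dna_complement(strand: str) -> str:
    """Return the complementary strand, base by base (not reversed)."""
    return strand.translate(_COMPLEMENT)


def first_non_repeating_letter(text: str) -> Optional[str]:
    """Return the first character that occurs exactly once, or None if none does."""
    counts = Counter(text)  # one pass to count every character
    for char in text:       # second pass preserves original order
        if counts[char] == 1:
            return char
    return None
```

For example, `dna_complement("ATTGC")` returns `"TAACG"`, and `first_non_repeating_letter("stress")` returns `"t"`. Counting first and then scanning keeps the second function linear in the length of the input.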

At first, I moved forward with anyone who got 80% or more as a combined score. After talking with a few folks on my team, I changed the requirement to 100% on both tests, because we determined that others in various parts of the company had previously passed with 100%, and we would be doing them a disservice if we hired anyone who scored less than that.
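The change in the bar can be stated as two small predicates. This sketch assumes the “combined score” was the average of the two test percentages, which is not spelled out above.

```python
from collections.abc import Sequence


def passed_original_bar(scores: Sequence[float]) -> bool:
    """Original rule (assumed to be an average): combined score of at least 80%."""
    return sum(scores) / len(scores) >= 80.0


def passed_final_bar(scores: Sequence[float]) -> bool:
    """Final rule: every test scored 100%."""
    return all(score == 100.0 for score in scores)
```

Under these assumptions, scores of 100% and 70% average 85%, which cleared the original bar but not the final one.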

I really enjoyed using qualified.io because it let me see how well each applicant did on the test and showed me whether anyone dropped off.

Pairing Session Phase

I found this phase to be the second most important. It usually involved someone else on the team, the applicant, and me. After an applicant passed the Code Challenge Phase, the recruiting team would share the following line with them: This is a one-hour pairing session to implement some part of the to-be-provided infrastructure using AWS CloudFormation, Terraform, or the AWS Software Development Kit (SDK). We wanted the applicant to know it was an Infrastructure as Code challenge, but we specifically named the AWS SDK as one of the options to give them the freedom to use pretty much any tool they were familiar with.

Usually, this session was conducted with one of my team’s senior members, but from time to time it was a great learning experience for some of the junior team members. I wish I had thought about including them earlier in the process, as I was only able to get two of the junior team members to join.

We would share with the applicant that while this was about their ability to pair with a team, we were also looking for their ability to identify the problem and solve it with Infrastructure as Code tools.

Interestingly, one of my key takeaways here was that Windows users struggled with the pairing session more often than Linux and Mac users. We never fully identified why, and we did not hold it against anyone who used Windows. My metrics simply showed a higher failure rate in this phase for Windows users than for Linux and Mac users.

The challenge was to create a two-tier web app using two pre-built Docker containers. The first connected to a MySQL database; the second connected to the first and pulled data from it. We required that the database be highly available and that both containers be reachable through a single endpoint. Applicants usually chose AWS’s Relational Database Service (RDS) in a Multi-AZ deployment. They often used ECS, although the occasional applicant used EC2 instances running Docker, or EKS. Lastly, most put whatever infrastructure they stood up behind an AWS load balancer to provide the single endpoint.
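To picture the target architecture, here is a partial CloudFormation template built as a Python dictionary. It is a hypothetical sketch, not our reference solution: resource names, the instance class, storage size, and the `admin` username are placeholders, and the security groups, the ECS cluster, task definitions, and services for the two containers, plus the listener rule that would route to the second container, are all omitted. What it does show are the two requirements from the challenge: `MultiAZ` enabled on the RDS MySQL instance for high availability, and an Application Load Balancer whose DNS name serves as the single endpoint.

```python
import json


def two_tier_template() -> dict:
    """Partial CloudFormation template: Multi-AZ MySQL on RDS plus one ALB endpoint.

    Hypothetical names throughout; networking beyond subnets, security groups,
    and the ECS resources for the two containers are intentionally left out.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            "VpcId": {"Type": "AWS::EC2::VPC::Id"},
            "SubnetIds": {"Type": "List<AWS::EC2::Subnet::Id>"},  # at least two AZs
            "DbPassword": {"Type": "String", "NoEcho": True},
        },
        "Resources": {
            "DbSubnetGroup": {
                "Type": "AWS::RDS::DBSubnetGroup",
                "Properties": {
                    "DBSubnetGroupDescription": "Subnets for the MySQL instance",
                    "SubnetIds": {"Ref": "SubnetIds"},
                },
            },
            "Database": {
                "Type": "AWS::RDS::DBInstance",
                "Properties": {
                    "Engine": "mysql",
                    "DBInstanceClass": "db.t3.micro",
                    "AllocatedStorage": "20",
                    "MultiAZ": True,  # standby replica in a second Availability Zone
                    "DBSubnetGroupName": {"Ref": "DbSubnetGroup"},
                    "MasterUsername": "admin",
                    "MasterUserPassword": {"Ref": "DbPassword"},
                },
            },
            "LoadBalancer": {
                "Type": "AWS::ElasticLoadBalancingV2::LoadBalancer",
                "Properties": {"Type": "application", "Subnets": {"Ref": "SubnetIds"}},
            },
            "WebTargetGroup": {
                "Type": "AWS::ElasticLoadBalancingV2::TargetGroup",
                "Properties": {
                    "Port": 80,
                    "Protocol": "HTTP",
                    "TargetType": "ip",  # suits ECS tasks using awsvpc networking
                    "VpcId": {"Ref": "VpcId"},
                },
            },
            "HttpListener": {
                "Type": "AWS::ElasticLoadBalancingV2::Listener",
                "Properties": {
                    "LoadBalancerArn": {"Ref": "LoadBalancer"},
                    "Port": 80,
                    "Protocol": "HTTP",
                    "DefaultActions": [
                        {"Type": "forward", "TargetGroupArn": {"Ref": "WebTargetGroup"}}
                    ],
                },
            },
        },
        "Outputs": {
            # The single endpoint both containers would sit behind.
            "Endpoint": {"Value": {"Fn::GetAtt": ["LoadBalancer", "DNSName"]}}
        },
    }


if __name__ == "__main__":
    print(json.dumps(two_tier_template(), indent=2))
```

Running the script prints JSON that could be saved and extended into a full template; generating it from Python is just one option, and writing the same structure directly in YAML would work equally well.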

Ultimately, the goal was to assess their ability to pair. Many people stood up only 25% of the infrastructure we presented to them in the hour. As long as they were very communicative and at least understood what they were trying to do, we were pretty confident we wanted to move them to the next phase.

Panel Phase

This phase took the biggest investment of my team’s and IronNet’s time because it was a panel interview. At a minimum, I had three people from my team and two people from other teams attend. Usually, someone from the Technical Services team and someone from our SaaS team would attend, because they are the direct customers of what we build. I tried to get the same people to attend each panel for consistency’s sake. We found that when the recruiting team met with the applicant a week before the panel to share the agenda and encourage them to build a presentation, the applicant did better on average.

The first ten minutes were a “Who and Why”. We wanted to know what the applicant wanted us to know about them as a person. Some of my favorite things I learned about applicants: one had a history of being in a heavy metal band, and another had built a very popular open-source service. After that, we would have them explain why they were the right person for the role. I liked this part of the interview because it helped me understand where people thought they fit into the organization. Lastly, we wanted to know why they were interested in IronNet. I truly do not think there is a wrong answer here, but no one ever said “because I like money”. Most people liked the idea of IronNet’s mission, and that is why they wanted to be part of it.

Next, we had them “Whiteboard the Challenge”. This was a 25-minute exercise where we took the pairing session they had participated in and had them whiteboard any of the architecture they had not completed during that one-hour session. This let us see that they truly understood what had been asked of them and gave them a chance to present their ideas on how to properly build that environment. We also really wanted to see what they had learned from the challenge.

The last part of the panel was a 25-minute focus on a “Successful Development Example”. I wanted to do this because I had found in the past that it was very easy for people to talk about things they did not like in their previous roles. I wanted people to talk about things in their history that were successful and that they were proud of. We wanted to know how they designed it and what they learned from building it.

Before we ended with the applicant, we would ask any open questions the team had and give the applicant a chance to ask questions of the team. Once that was complete, we would say goodbye to the applicant and then decide. Each panel participant would get a chance to say yes or no, and the collective would decide on the next steps. If the panel wanted to move forward, I would work with our recruiting team to get the person hired.

What I learned

  1. Hiring takes a lot of time but it will be worth it if you put the effort into it.
  2. If you involve others all along the way, there is no way you can screw it up. The IronNet team as a whole made my hiring efforts successful.
  3. The process I followed is not complete, and I will keep updating it.
  4. Be honest with the applicant. They appreciate it if you tell them on the call that they are not what you are looking for.

Hopefully, this helps explain one of the many different hiring processes out there.