Farm Credit Services of America (FCSAmerica) is a great place to work. You see it in our people and the relationships they have with each other and our customers. Our passion for and commitment to serving both rural America and each other are key to our success in the marketplace.
We are looking for an experienced Solution Reliability Engineer to join the Solutions Enablement team. In this technical role, you will work at the intersection of software development and infrastructure/system engineering. Your work will directly contribute to the creation of highly available, resilient, performant, and fault-tolerant solutions and infrastructure components.
Ideal candidates will have experience with an application performance monitoring tool such as New Relic. Experience with SQL Server Management Studio, Git, and Microsoft Azure DevOps is also helpful. A working knowledge of Salesforce platforms and .NET is a plus.
- Implement practices, tools and processes that increase visibility into how solutions are performing.
- Select, support, and maintain tools that enable proactive monitoring of solutions.
- Partner with business and technology teams to establish the appropriate level of reliability and availability of applications.
- Create consistent ways to track and manage reliability targets.
- Evolve tooling to create capabilities that enable the aggregation of logging data across distributed systems to improve the traceability of solutions.
- Serve as a technical champion for application and infrastructure reliability and availability.
- Collaborate with technology teams to develop improved resiliency capabilities at all levels of solutions.
- Provide technical expertise to support key initiatives related to solution reliability.
- Consult with teams on building fault-tolerant solutions to improve availability.
- Support and evolve postmortem processes to drive meaningful outcomes.
- Contribute to the creation of guidance that improves the traceability of applications, developing a common approach for increasing the observability of solutions through monitoring, logging, and alerting.
- Troubleshoot problems or flaws affecting the availability, reliability, performance, and efficiency of components and features.
- Implement tools and processes that drive down Mean Time to Discovery (MTTD) and Mean Time to Resolution (MTTR) to minimize the impact of future errors.
- Drive operational efficiency with automation.
- Develop highly resilient deployment strategies that reduce deployment risk and increase operational efficiencies.
- Create automation and processes to enable teams to deploy, configure, scale, and monitor applications.
- Automate manual, repetitive tasks in support of solution resiliency.
- Bachelor’s degree in Computer Science, Management Information Systems, Data Sciences, or equivalent.
- 5+ years’ relevant technical experience in software development, network engineering, or systems administration/engineering.
- Experience with application monitoring and profiling tools.
- Foundational understanding of distributed system design and the interactions between cloud technology layers and components.
- Experience with scheduling platforms, continuous builds, and the use of source control repositories.
- Experience with logging/application monitoring tools.