Transportation drives humanity forward. At Stratio we have a purpose: to change the transportation industry. We believe in a future with no disruptions, where vehicles never break down: a zero-downtime future. For that, we rely on great individuals and great teams.
The Site Reliability Engineer (SRE)/SysAdmin is responsible for keeping all production systems running smoothly. An SRE at Stratio is a blend of pragmatic operator and software craftsperson who applies sound engineering principles, operational discipline, and mature automation to our environments and the GitLab codebase. You should specialize in systems, whether in networking, Linux, or a more specific interest such as scaling, algorithms, or distributed systems.
- Assist in the configuration of hardware and software;
- Research and identify solutions to software and hardware issues;
- Diagnose and troubleshoot technical issues, including account setup and network configuration;
- Provide technical support via phone, email, or chat until the technical issue is resolved;
- Properly escalate unresolved issues to internal teams (e.g. software developers);
- Provide prompt and accurate feedback to customers;
- Ensure all issues are properly logged;
- Refer to internal databases or external resources to provide accurate technical solutions;
- Prioritize and manage several open issues at one time;
- Document technical knowledge in the form of notes and manuals;
- Monitor and log core infrastructure;
- Manage and scale complex distributed systems;
- Solid foundation in deploying and managing large-scale Linux systems;
- Strong Linux system-level analysis capabilities;
- SRE/SysAdmin experience and comfort operating software in a Linux-based environment;
- Understand large-scale complex systems from a reliability perspective;
- Experience collecting system and application metrics for observability (Nagios, Prometheus, Syslog);
- Deep network analysis experience;
- Knowledge of and experience with highly available and scalable architectures;
- Familiarity with container orchestration and containerization services, especially Kubernetes and Docker;
- Experience with web servers (Apache, Nginx, HAProxy);
- Passion for solving problems using open source software;
- Ability to work under pressure;
- Fluency in English.
- Familiarity with Ansible, Puppet, or Chef configuration management tools;
- Familiarity with at least one cloud environment, for example AWS, GCP, or Azure (AWS certifications are a plus);
- Experience in software engineering and automation;
- Familiarity with Infrastructure as Code (Terraform or similar);
- Experience in managing and deploying Apache Kafka;
- Experience in managing and deploying Elasticsearch;
- Experience with the Redis cache service;
- Experience managing databases (PostgreSQL, SQL Server).
We expect you to:
- Mentor and grow less experienced team members;
- Be responsible for the specification of new features and improvements;
- Design and implement new features and tools, always keeping performance, scalability, and reliability in mind;
- Always keep searching for new tools and frameworks;
- Be able to work completely autonomously.
You can find our Culture Manifesto and more team information here.