As a Site Reliability Engineer on the Cloud Center of Excellence team in Singapore, you will apply software engineering skills to progress, protect, and provide for the software and systems behind the entire Thales AWS infrastructure ecosystem, with an ever-watchful eye on their availability, latency, performance, and capacity. You will drive reliability and performance at massive scale by mastering the full depth of the stack.
A week in the life of a Site Reliability Engineer:
• Working on key initiatives to help the operational scaling and growth of the non-production and production services.
• Contributing to and maintaining engineering and system standards.
• Providing expert diagnosis and review of the live service and application performance.
• Providing deep support to the live service, with an emphasis on service reliability and mitigation over break-fix.
• Performing regular performance tuning, technology watch, and updates on the service platform.
• Identifying and developing processes, tools, automation, and software changes to address top operational issues.
• Working in close collaboration with engineering and support operations teams to shape the future roadmap and establish strong operational readiness across teams.
Knowledge, Skills and Experience:
• To succeed at this job, you must have a good understanding of AWS cloud services and DevOps practices.
• You must have prior experience with Linux-based operating systems, the Python programming language, automation tools (e.g., Chef, Terraform), version control such as Git, CI/CD pipelines, and monitoring/logging tools such as Splunk.
• You must be used to working with code to improve system availability and latency, and to optimizing code for stability, functionality, and scalability for a segment of the Thales AWS infrastructure.
• We would like someone to join our team who loves “Infrastructure as Code” and thinks in terms of “Server Immutability”.
Click on "I'm Interested" to apply now!