Are you looking for your next big challenge?
For more than 90 years, Caterpillar Inc. has been making sustainable progress possible and driving positive change on every continent. Customers turn to Caterpillar to help them develop infrastructure, energy and natural resource assets. With 2018 sales and revenues of $54.7 billion, Caterpillar is the world’s leading manufacturer of construction and mining equipment, diesel and natural gas engines, industrial gas turbines and diesel-electric locomotives.
Caterpillar is investing in our digital future, and we’re looking for the best Site Reliability Engineers. Our iconic products have evolved from mechanical workhorses to highly sophisticated, electronically controlled worksite solutions. This transformation, along with our smart factories and our integrated dealer network, has produced a wealth of data ready to be leveraged by our customers and our dealers. Think you have what it takes to develop the software and architect the platform to support Caterpillar’s digital revolution?
We at Caterpillar Digital are building the digital platform that will deliver industry-leading digital solutions in support of profitable growth for Caterpillar, our dealers, and our end customers.
Come join us on this exciting journey, be part of a world-class organization, and play a key role in its digital transformation.
Roles & Responsibilities:
Reliability in highly complex, integrated systems typically spans multiple programming languages, third-party services and integrations, as well as software and hardware. An SRE therefore needs to be multi-talented. You should:
• Think about systems - edge cases, failure modes, behaviours, specific implementations.
• Debug production issues across services and levels of the stack.
• Set up monitoring and alerting that triggers on symptoms rather than on outages.
• Have an enthusiastic, go-for-it attitude. When you see something broken, you can't help but fix it.
• Have an urge to collaborate and communicate asynchronously.
• Have an urge to deliver quickly and iterate fast.
As SRE technical lead, you will be in charge of a team of people dedicated to proactively building reliability into the product. Responsibilities include, but are not limited to:
• Lead, coach, and train other members of the team.
• Participate in the System Engineering Intake Process.
• Manage escalations through the ITSM process.
• Meet the defined SLOs, SLAs, and SLIs.
• Set task priorities.
• Manage the on-call rotation.
• Improve service observability.
• Proactively test the flexibility and resilience of the system.
• Drive the Disaster Recovery and High Availability strategy.
• Lead innovation and enhancements in the DevOps process.
Qualifications:
• Bachelor’s degree, preferably in Computer Science, Software Engineering, or any other Engineering field.
• 8+ years of DevOps experience with production support expertise.
• Knowledge of architecting CI/CD solutions on any platform; prior experience is a must.
• Expertise in designing, coding, testing, and delivering software in at least one technology stack.
• Working knowledge of infrastructure components (e.g., routers, load balancers, cloud products, container systems, compute, storage, and networks).
• 8+ years of experience with key AWS services: EC2, S3, VPC, Route 53, RDS, CloudFormation, DynamoDB (NoSQL), Lambda, logging/CloudWatch, IAM, Certificate Manager, ELB, EBS, ECS, CloudFront/WAF, SQS, SNS, SES.
• 8+ years of expertise in the ELK monitoring stack or any other open-source tool for IT, network, server, and application monitoring.
• 8+ years of prior experience in DevOps and/or application development teams. Hands-on experience with large-scale software development is a must, preferably in one of these languages: Java, Python, or a scripting language.
• Understanding of Docker and at least one Docker container orchestration platform (ECS or Kubernetes).
• Understanding of configuration management tools like Ansible/Puppet/Chef/PowerShell/Terraform.
• Understanding of Git, Bitbucket, Jira, Jenkins, Sonar, Splunk, Maven, AIM, and/or Continuous Delivery tools.