DevOps Engineer (SRE)

Hytech

  • Kuala Lumpur
  • Permanent
  • Full-time
  • Posted 2 days ago
About Hytech
Hytech is a leading management consulting firm headquartered in Australia and Singapore, specialising in digital transformation for fintech and financial services organisations. We deliver end-to-end consulting services and provide robust middle- and back-office solutions that enable our clients to optimise operations, enhance efficiency, and stay ahead in a fast-evolving digital landscape.
With more than 2,000 professionals worldwide, Hytech has a strong and growing international presence, with offices across Australia, Singapore, Malaysia, Taiwan, the Philippines, Thailand, Morocco, Cyprus, Dubai, and beyond.
Responsibilities
(Business Continuity & High Availability Architecture)
  • Define, implement, and operate SRE practices, including SLA/SLO/SLI design, availability, connectivity, and disaster recovery strategies
  • Lead architecture design and execution for high availability, high concurrency, and large-scale systems (e.g., microservices, service mesh, multi-active/multi-region)
  • Drive system observability, security compliance, and cost optimization (e.g., cost allocation and governance)
  • Design resilient architectures for mission-critical systems with high availability, elasticity, and fault tolerance
(Observability, Monitoring & Reliability Engineering)
  • Build observability platforms using tools such as Datadog, Prometheus, OpenTelemetry, logging systems, and alerting platforms (Flashcat/Nightingale)
  • Implement full-stack monitoring across applications, infrastructure, and business metrics to enable precise issue detection
  • Establish proactive monitoring systems with alerting, anomaly detection, and automated remediation capabilities
  • Lead incident management (P1/P2), including rapid recovery, root cause analysis (RCA), and continuous improvement mechanisms
(Platform Engineering & Efficiency Optimization)
  • Plan and implement platform engineering strategies to improve scalability, availability, and performance
  • Build standardized platforms for system reliability, observability, and security while optimizing cost efficiency
  • Design and optimize CI/CD pipelines (e.g., GitHub Actions, Jenkins, ArgoCD, Helm) to improve delivery speed and quality
  • Establish standards for containerization, middleware, and deployment processes, ensuring scalability, reliability, and high availability
  • Resolve system bottlenecks through capacity planning, performance tuning, and reliability improvements
(Technology Leadership & Collaboration)
  • Collaborate closely with business and engineering teams to embed reliability, observability, scalability, and security into system design
  • Lead the definition and implementation of technical standards, security baselines, and quality control mechanisms
  • Drive adoption of best practices, tooling standardization, and engineering efficiency improvements
Key Requirements
  • 5+ years in SRE / DevOps / Platform Engineering or related roles
  • Proven experience in designing and operating high-availability, large-scale systems
  • Cloud platforms: AWS (EC2, EKS, IAM, S3, VPC, NLB/ALB, RDS, ElastiCache), or equivalent (Azure/GCP)
  • Infrastructure as Code: Terraform / CloudFormation
  • CI/CD & automation: Jenkins, GitHub Actions, ArgoCD, CodeBuild, Helm
  • Containerization: Docker, Kubernetes (K8s)
  • Observability: Metrics, Logs, Traces (e.g., Prometheus, OpenTelemetry, Datadog)
  • Strong systems thinking and analytical problem-solving skills
  • Excellent cross-functional collaboration and communication skills
  • Self-driven, with strong ownership and a continuous-improvement mindset
(Nice to Have)
  • Experience in fintech, payments, or high-security environments
  • Experience with high-concurrency, low-latency system design
  • AI-driven operations (AIOps) or automation experience
  • Certifications (e.g., AWS, CKA/CKS)
  • Experience with large-scale systems or international project delivery
What We Offer
  • Easy access to public transportation (LRT & KTM).
  • Transportation allowance.
  • Corporate insurance coverage, including dental, optical, and outpatient claims.
  • Gym and fitness claims.
  • Ongoing training and development opportunities.
  • Exposure to exciting projects that support career growth and professional development.
