Cloud Engineer - AI ML
Accenture
- Kuala Lumpur
- Permanent
- Full-time
- Own complex incidents end‑to‑end: triage, reproduce, diagnose, and resolve issues for AI/ML products; maintain transparent customer communication and accurate case records.
- Respond to, diagnose, resolve, and track customer support queries by phone, email, and chat.
- Maintain response and resolution speed as defined by SLOs.
- Keep high customer satisfaction scores and follow quality standards in 90% of cases.
- Assist and respond to consults from other technical support representatives through existing systems and tools.
- Use existing troubleshooting tools and techniques to establish the root cause of queries and provide a customer-facing root cause assessment.
- Understand the business impact of customer issue reports, follow internal issue prioritization guidelines, and justify the priority assigned to any individual customer report.
- Perform internal classification of queries, documenting classes of problems and preventative actions for further retroactive analysis.
- Reactively (e.g., as a result of a query) file issue reports with Google engineers and collaborate with them to diagnose customer issues; build documentation and procedures, document desired behavior and/or steps to reproduce, suggest code-level resolutions for complex product bugs, and assist engineers in driving bugs to resolution.
- Perform community management tasks as needed by the business.
- Promptly and independently resolve technical incidents and escalations, with effective communication to all stakeholders internally and externally, so that no monitoring is needed by Google engineers.
- Take cases involving customer-specific architectural design requirements and provide solutions limited to a particular product (or a subset of product features).
- Community contributions: solutions posts, FAQs, and guidance on best practices for AI/ML deployments and responsible AI usage.
- Introduction/AutoML: dataset ingestion, labeling, AutoML training failures, metric drift, imbalance handling.
- Notebooks: environment provisioning, dependency/runtime conflicts, GPU/TPU access, kernel issues.
- AI Vector Search: index build latency, recall/precision tuning, ANN configuration, embedding mismatches.
- Pipelines: DAG orchestration failures, component contract issues, artifact lineage, caching.
- Prediction (Online/Batch): endpoint scaling, model versioning, cold‑start latency, batch job retries.
- Training: hyperparameter tuning, distributed training, accelerator utilization, checkpointing.
- Model Registry: version promotion policies, metadata integrity, rollback flows.
- Managed Datasets: schema evolution, governance, access control.
- Explainable AI: feature attributions, baselines, compliance requests.
- Feature Store: ingestion latency, online/offline store consistency, backfills.
- LLMs & GenAI Introduction: prompt engineering pitfalls, safety filters, quota/latency.
- Vertex AI Gemini: model selection, context window sizing, tool‑use function calling, grounding.
- Vertex AI Search & Conversation: data connectors, retrieval quality, schema/FAQ ingestion.
- Discovery AI Retail Search: relevance tuning, synonym/attribute mapping, cold‑start catalogue issues.
- Vertex Gen AI Studio: prototype to production handoff, evaluation harnesses.
- Vertex Model Garden: model availability, versioning, licenses, tuning envelopes.
- Dialogflow ES/CX: intent/flow design, session state, webhook reliability, NLU regression.
- CCAI Platform / CCaaS: telephony integration, routing, agent desktop, compliance.
- CCAI Insights: transcript accuracy, sentiment, redaction, analytics pipelines.
- Contact Center AI (General): deployment patterns, multichannel orchestration.
- Speech‑to‑Text / Text‑to‑Speech: language/acoustic models, latency, accuracy, voice settings.
- Agent Assist: suggestion quality, knowledge base integration, real‑time performance.
- Healthcare Data Engine (HDE): FHIR mapping, interoperability, privacy controls.
- Document AI: processor selection, field extraction accuracy, batch throughput.
- Vision API: model outputs, rate limits, edge cases, dataset curation.
- Technical Support Experience (L2/L3) for cloud AI/ML platforms, with proven incident ownership, RCA delivery, and cross‑functional collaboration.
- Troubleshooting & Analysis: proficiency with logs, metrics, tracing; ability to interpret model artifacts, pipeline steps, and service quotas.
- Communication: customer‑friendly RCA and escalation narratives; ability to handle sensitive, high‑impact scenarios.
- Language: Mandarin B2 (CEFR) mandatory; English professional working proficiency.
- 2-6 years of experience with Google Cloud or another cloud platform such as AWS or Azure.
- AutoML, Notebooks, Pipelines, Vector Search, Training/Prediction (online/batch), Model Registry, Managed Datasets, Explainable AI, Feature Store.
- Gemini family on Vertex AI; Search & Conversation; Discovery AI Retail Search; Gen AI Studio; Model Garden (model selection, safety, evaluation).
- Dialogflow ES/CX design and troubleshooting; CCAI Platform/CCaaS integrations; CCAI Insights; STT/TTS; Agent Assist.
- HDE (FHIR/health data), Document AI processors, Vision API.
- Google Cloud Professional ML Engineer, Professional Cloud Architect/Developer, Data Engineer; Dialogflow/CCAI badges; Responsible AI training.
- Relevant third‑party: conversational design, speech technologies, healthcare data standards.