System Analyst, Digital Operations Center (DOC)
Employment Type: Full-Time
Industry: Advertising/Marketing/Public Relations
This role is responsible for analyzing data and correlating events for predictive and preventative controls, creating logic to pinpoint the root cause of an outage or service degradation, and configuring automated alerting.
What you’ll do
§ Employ production monitoring tools to ensure that all systems and applications are running.
§ Analyze environment capacity and alert thresholds, ensuring thresholds stay accurately configured as traffic and workloads evolve.
§ Provide product and system availability metrics and identify SLA violations.
§ Actively analyze historical monitoring and incident data, correlating issues to quantify problems and escalating them to the problem management team.
§ Independently resolve incidents through to service restoration within established service level objectives.
§ Demonstrate advanced troubleshooting capabilities, narrowing the investigation of incidents that aren’t “Standard” and reducing time to restoration.
§ Ensure all incidents are properly annotated and closed within and across shifts.
§ Make recommendations to improve the key NOC process areas: event management, incident management and problem management.
§ Carry out routine administrative tasks, e.g. deployments, adding capacity, and implementing changes (RFCs).
§ College graduate or at least 4 years of relevant work experience.
§ Systematic problem-solving approach, coupled with a strong sense of ownership and drive.
§ Hands-on experience in UNIX/Linux systems, scripting (Bash, Ruby, Python) and networking concepts.
§ Experience deploying and managing applications using a public cloud provider.
More about you
§ Maintain principal relationships with all technology-related staff in order to fulfill monitoring, reporting, and incident management responsibilities for enterprise-wide systems.
§ Maintain close relationships with staff involved in all levels of load testing, batch job processing, and network operations activities.
§ Experience with New Relic Query Language (NRQL)
§ Knowledge of Selenium, Ansible, Chef, Python, Lambda, and APIs.
§ Experience managing application logs, creating alerts, and maintaining production systems.