Job Responsibilities:
I. Cloud Account Management
- Applying for and terminating cloud accounts on behalf of the company.
- Topping up (recharging) account balances.
- Allocation of sub-accounts and corresponding permissions.
- Management of AccessKeys.
- Management of MongoDB Atlas accounts, including associating billing with game cloud accounts, managing AccessKeys, and managing organizations/projects/members.
II. Online Service (Games) Daily Operation and Maintenance
- Integration with company security and handling of security tickets.
- Certificate management: Alerting on expiring certificates and renewing them.
- Domain management: Management of all domains and DNS resolution.
- Network planning and management: Complying with network planning specifications, creating VPCs and subnets as needed, maintaining the necessary security group configurations, and establishing connections between VPCs as needed.
- Monitoring and alerting for cloud resources such as servers, disks, networks, databases, message queues, load balancers, Kubernetes, etc.
- Log management: Querying and exporting critical service logs.
- Cost optimization: Collaborating with the cost optimization team and the development team to reduce the cost of cloud resource usage.
- Upgrading, scaling, and adjusting the Online Service (Games) and related cloud resources.
III. Game Daily Operation and Maintenance
- Purchase and termination of cloud resources: Purchasing/terminating cloud resources for games as needed (e.g., purchasing game servers, setting up tlogs, or provisioning resources for other game-specific purposes).
- Monitoring and alerting for game cloud resources (servers, disks, networks, databases, etc.).
- DBA-related tasks: Collaborating with game operations to manage game data, including querying or exporting data as needed, splitting databases, and merging or splitting games.
- Cost optimization: Collaborating with the IEGG cost optimization team, the PGOS development team, and the game operations team to reduce the cost of cloud resource usage.
Job Requirements:
- Bachelor's degree or above in Computer Science or a related field.
- Familiarity with the Linux development environment; proficiency in at least one scripting language (Shell, Python, etc.); ability to write tools or systems that improve operational efficiency.
- Operations experience with at least one public cloud (Tencent Cloud/AliCloud/AWS, etc.) is a plus.
- Experience operating large-scale online services or platforms is a plus; strong analytical and troubleshooting skills for difficult problems.
- Deep understanding of distributed systems and familiarity with open-source components commonly used in the Internet industry (Nginx, Redis, Kafka, MySQL, MongoDB, Kubernetes, etc.).
- Detail-oriented and able to strictly follow operational procedures and regulations; strong sense of ownership, service awareness, and teamwork; proactive and self-driven, with keen risk awareness and the ability to identify risks.
- Good documentation skills: documents technical designs and operational solutions promptly, and is adept at summarizing and sharing knowledge internally and externally.