Site Reliability Engineer

Wikiloc

Job description

We’re hiring a Site Reliability Engineer for our Operations team. As Wikiloc keeps growing, our systems have to be re-architected to relieve bottlenecks, keep data safe and secure, and make the ecosystem easier to maintain and evolve. Your job will be to spot and mitigate the things that could go wrong before they do. Still, from time to time something will hit the fan, so part of the work includes being available for on-call duty.

Our stack includes PostgreSQL, Redis, NGINX, HAProxy, Tomcat, Elasticsearch, Cloudflare, Amazon S3 and Postfix. All our servers are dedicated machines at Hetzner running Debian. We use Ansible for provisioning, configuration and deployment.

As an SRE, we expect you to master the Unix toolchain, have a solid grasp of SQL and basic networking concepts, and have programming skills good enough not just to write system scripts but also to understand existing backend code and make the necessary changes when that code affects system performance, security or stability.

For this position the following experience is relevant to us:

Scaling high-traffic web apps (host architecture/databases/caching)
Performance benchmarking and monitoring tools
Infrastructure as code
Continuous integration
Great communication skills
Reconfiguring services on the fly
An eye for automation and instrumentation
Java in a production environment is a plus
Master’s degree in computer science is a plus
Solid experience with one or more of the technologies we use (Ansible, Redis, Elasticsearch, PostgreSQL, NGINX, Tomcat, Amazon S3) is definitely a bonus.

We have built, scaled and optimized our infrastructure ourselves with no external help other than meetings with friends at startups that have already surfed the big waves before us to exchange ideas and validate our roadmaps. We know our setup deeply and you’ll be working hand-in-hand with a team that is always happy to answer questions and share their collective wisdom. You’ll teach us and you can expect the same in return.

We started in 2006 with a single, manually set-up server. Today, our servers receive millions of requests every day and our infrastructure looks very different and far more complex. We have migrated to an "infrastructure as code" architecture that spans dozens of servers and a handful of heterogeneous services, in order to meet the needs of 6 million users who rely on Wikiloc for their outdoor activities. Our community has collectively created 17 million trails and 30 million photos (that's over 70 TB of data!), and those numbers are growing very rapidly, with 700,000 new trails and 900,000 new images added to Wikiloc in the last month alone.

As an SRE, here are some things you could work on:

Participate actively in the design and implementation of the next generation of Wikiloc's systems infrastructure, making it faster, more resilient and fault tolerant. This includes architecture design, capacity planning, fine-tuning, upgrades, evaluating new technologies and working closely with the backend team.
Set up and maintain a state-of-the-art monitoring system that keeps track of not just server metrics, but also custom application performance metrics.
Resolve production incidents and identify solutions that prevent them from happening again.
Deploy and operate new services and technologies needed by the product team.
Improve our data backup processes to ensure we are always ready for disaster recovery.
Review code changes from the backend team and make recommendations on performance critical code and database queries.
Create documentation on processes, best practices and "red books" on incident response.
Provide technical assistance and support to other teams on application design and tuning, when it might affect performance, security, data safety or systems stability.

We’re only looking for experienced candidates for this role. You should have done this work before in a professional setting, but we don’t expect you to know how all our systems work on day one. You will have an onboarding process with time to learn.

Also, when you apply, please tell us about yourself, what you can bring to Wikiloc and what excites you about joining us. Describe an achievement you were actively involved in and feel proud of.

We work on only one thing, Wikiloc, and we've been paying special attention to team efficiency and happiness for 14 years, from conceptualization and UX to support and operations. As you will see, we have very little turnover among team members. You can look forward to doing your best work and building a career here, contributing directly to delivering great products to millions of people while having fun together along the way.

Working hours

Flexible

How to apply

jobs@wikiloc.com

Teletreballa.com