work-from-anywhere

SITE RELIABILITY ENGINEER JOBS

Supabase

Site Reliability Engineer: Postgres

0 days ago
Description

Supabase is an open source, fully remote company building developer tools for databases.

We are seeking an experienced SRE to manage the infrastructure of our Postgres databases. We currently manage over 1M Postgres instances and are growing fast.

You will:

  • Help build the Supabase Postgres offering.

  • Focus on improving the reliability of database backups and recovery

  • Implement high availability with minimal downtime failover

  • Help operationalize database management for our users by implementing maintenance windows, blue-green deployments as part of database upgrades, etc.

  • Help users debug their own databases by improving database observability

  • Improve the performance of provisioned Postgres databases and expose knobs for our users to further tune their database performance

  • Improve our system architecture to reduce costs while balancing security and performance.

  • Design CI/CD systems to speed up deployments with proper change and release management processes.

  • Handle escalated storage support tickets and share on-call responsibility for the storage service.

You have:

  • Experience in designing multi-tenant database solutions, designing for failover, fault-tolerance, and disaster recovery

  • Experience orchestrating stateful workloads at scale; having used a Postgres operator like the ones from Zalando or Crunchy is a plus

  • Experience with tools in the Postgres ecosystem like pgBackRest, Barman, Patroni, Stolon, etc.

  • 5+ years experience in SRE/DevOps/Cloud Infrastructure

  • 3+ years of experience in building with Golang

  • Experience in managing large deployments on AWS

  • Knowledge of networking

  • Experience with Infrastructure as Code tools

We offer:

  • 100% remote work from anywhere in the world. No location-based adjustment to your salary.

  • ESOP (equity ownership in the company)

  • Autonomous work. We work collaboratively on projects, but you set your own pace.

  • Health, Vision and Dental benefits. Supabase covers 100% of the cost for employees and 80% for dependents

  • Generous Tech Allowance for any office setup you need

  • Annual Education Allowance

  • Annual off-sites.

 

About the team

  • We're a startup. It's unstructured.

  • Collectively, we have founded more than 30 startups.

  • Globally distributed team with more than 30 different nationalities.

  • We deeply believe in the efficacy of collaborative open source. We support existing communities and tools, rather than building "yet another xx".

  • We "dogfood" everything. If you use it in your project, we use it in Supabase.

Process

  • The entire process is fully remote and all communication will happen over email or via video chat.

  • Once you've submitted your application, the team will review your submission and may reach out for a short screening interview over a video call.

  • If you pass the screen you will be invited to up to four follow-up interviews.

  • The calls:

    • usually take 20 to 45 minutes each, depending on the interviewer.

    • are 1:1 most of the time.

    • will be with the founders, a member of either the growth or engineering team (depending on the role) and usually one other person from your immediate team or function.

  • Once the interviews are over, the team will meet to discuss several roles and candidates and may:

    • ask one or two follow-up questions over email or a quick call.

    • go directly to making an offer.

Canonical

Senior Site Reliability Engineer

0 days ago
Description

Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is very widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation and IoT. Our customers include the world's leading public cloud and silicon providers, and industry leaders in many sectors. The company is a pioneer of global distributed collaboration, with 1200+ colleagues in 75+ countries and very few office based roles. Teams meet two to four times yearly in person, in interesting locations around the world, to align on strategy and execution.

The company is founder led, profitable and growing.

We are hiring a Senior Site Reliability Engineer

Next-gen operations at scale, with pure Python infra-as-code, from bare metal to containers and applications. Our goal is to perfect enterprise infrastructure devops.

We run hundreds of private cloud, Kubernetes, and application clusters for customers across physical and public cloud estate, and we are raising the bar on what's possible with automation by embracing a universal operator pattern and model-driven operations.

To succeed in this role you need to believe in automation as a pure software engineering problem, not a hack-it-till-it-works-for-me problem. You need to be interested in the scientific approach to operations at scale, driven by metrics and code, and you need to be able to learn the entire stack, from bare metal networking and kernel up to serverless and open source applications.

Location: Globally remote role

The role entails

Our cloud operations engineers bring Python software-engineering skills and rigour to the operations domain. We practise devsecops from bare metal to application. We architect and run OpenStack, Kubernetes and software defined storage, and we enable devsecops for applications running on that infrastructure too.

To become a member of this team, you need to be a software engineer fluent in Python, you need a genuine interest in the full open source infrastructure stack from metal to containers, and you need the ability to work in a high pressure operations environment with mission-critical services for global brand name customers.

As a member of the team you will gain experience in a broad range of cloud technologies. We evolve our offerings as the state of the art improves, so you get to stay current with the latest capabilities in open source infrastructure. We drive upgrades to keep our customers on the latest, best solutions.

What we are looking for in you

  • Degree in Software Engineering or Computer Science
  • Experience with Linux and familiarity with Linux networking and storage
  • Python software development expertise
  • Operational experience
  • Excellent interpersonal skills, curiosity, flexibility, and accountability
  • Ability to travel internationally twice a year, for company events up to two weeks long

Nice-to-have skills

  • Experience with OpenStack or Kubernetes deployment or operations
  • Familiarity with public or private cloud management

What we offer colleagues

We consider geographical location, experience, and performance in shaping compensation worldwide. We revisit compensation annually (and more often for graduates and associates) to ensure we recognise outstanding performance. In addition to base pay, we offer a performance-driven annual bonus or commission. We provide all team members with additional benefits, which reflect our values and ideals. We balance our programs to meet local needs and ensure fairness globally.

  • Distributed work environment with twice-yearly team sprints in person
  • Personal learning and development budget of USD 2,000 per year
  • Annual compensation review
  • Recognition rewards
  • Annual holiday leave
  • Maternity and paternity leave
  • Employee Assistance Programme
  • Opportunity to travel to new locations to meet colleagues
  • Priority Pass, and travel upgrades for long haul company events

About Canonical

Canonical is a pioneering tech firm at the forefront of the global move to open source. As the company that publishes Ubuntu, one of the most important open source projects and the platform for AI, IoT and the cloud, we are changing the world of software. We recruit on a global basis and set a very high standard for people joining the company. We expect excellence - in order to succeed, we need to be the best at what we do. Most colleagues at Canonical have worked from home since its inception in 2004. Working here is a step into the future, and will challenge you to think differently, work smarter, learn new skills, and raise your game.

Canonical is an equal opportunity employer

We are proud to foster a workplace free from discrimination. Diversity of experience, perspectives, and background create a better work environment and better products. Whatever your identity, we will give your application fair consideration (https://canonical.com/careers/diversity/identity).

Canonical

Senior Site Reliability / Gitops Engineer

0 days ago
Description

Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is very widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our customers include the world's leading public cloud and silicon providers, and industry leaders in many sectors. The company is a pioneer of global distributed collaboration, with 1200+ colleagues in 75+ countries and very few office-based roles. Teams meet two to four times yearly in person, in interesting locations around the world, to align on strategy and execution.

The company is founder-led, profitable, and growing.

We are hiring a Senior Site Reliability / Gitops Engineer for our Information Systems (IS) team. This role is an opportunity for an “automation-first” senior technologist with a passion for Linux to build a career with Canonical and drive the success of those leveraging Ubuntu and open source products. If you have experience of IT operations automation, Infrastructure as Code and a passion for technology, then you will enjoy working with some of the best people in the industry at Canonical.

Job Summary

The IS team at Canonical supports and maintains all of Canonical’s IT production services. The team is in charge of running services used by over 60 million Ubuntu users.

As a Senior SRE & Gitops engineer you’ll be in a unique position to drive operations automation to the next level, both in our own private clouds and in the public clouds. We do this by utilizing the best of open source infrastructure as code software, software development practices such as CI/CD pipelines, and Canonical’s leading products for software operation automation.

In addition to defining the infrastructure as code, you will improve Canonical products and the open-source technologies they’re based on by providing critical feedback to developers on how their products operate at scale. This is done by submitting bugs (and sometimes writing pull requests) and collaborating on design and implementations with other teams within the company.

You’ll be part of a global team of SREs that work together and support each other to provide the best possible services to our company, Canonical’s customers and the Ubuntu Community.

As a Senior Site Reliability / Gitops Engineer you will

  • Drive the development of automation and Gitops in your team as an embedded tech lead
  • Closely collaborate with the IS architect to align your solutions with the IS architecture vision
  • Design and architect services that IS can offer to the organization as products
  • Apply your experience of IaC to develop infrastructure as code practice within IS by constantly increasing automation and improving IaC processes
  • Automate software operations for re-usability and consistency across private and public clouds, taking into consideration the complexities of distributed systems
  • Maintain operational responsibility for all of Canonical’s core services, networks, and infrastructure
  • Develop skills in troubleshooting, capacity planning, and performance investigation; set up, maintain and use observability tools such as Prometheus, Grafana, and Elasticsearch; design, implement and maintain monitoring and alerting for various systems and services
  • Provide assistance and work with globally distributed engineering, operations, and support peers
  • Be given uninterrupted development time to focus on larger projects and automation of manual tasks
  • Share your experience, know-how and best practices with other team members in design sessions, mentorship and ‘doing work together’
  • Carry final responsibility for time-critical escalations

What we are looking for in you

  • A modern view on hosting architecture, driven by infrastructure as code across both private and public clouds
  • A product mindset, striving to develop products rather than solutions
  • Python software development experience, with large projects
  • Experience working with Kubernetes or other container orchestration systems
  • Proven experience managing and deploying cloud infrastructure with code
  • Practical knowledge of Linux networking, routing, and firewalls
  • Affinity with various forms of Linux storage, from Ceph to Databases
  • Hands-on experience administering enterprise Linux servers
  • Extensive knowledge of cloud computing concepts and technologies
  • Bachelor's degree or greater, preferably in computer science or related engineering field
  • Able to communicate clearly and effectively in English over email, chat, video or voice calls and in person
  • Motivated and able to troubleshoot from kernel to web, and willing to ask others when appropriate
  • A willingness to be flexible and able to learn new things quickly
  • Be inspired by the needs of fast-changing environments
  • Happy to work within distributed teams
  • Passionate about and familiar with open source, especially Ubuntu or Debian

What we offer colleagues

We consider geographical location, experience, and performance in shaping compensation worldwide. We revisit compensation annually (and more often for graduates and associates) to ensure we recognize outstanding performance. In addition to base pay, we offer a performance-driven annual bonus or commission. We provide all team members with additional benefits which reflect our values and ideals. We balance our programs to meet local needs and ensure fairness globally.

  • Distributed work environment with twice-yearly team sprints in person
  • Personal learning and development budget of USD 2,000 per year
  • Annual compensation review
  • Recognition rewards
  • Annual holiday leave
  • Maternity and paternity leave
  • Team Member Assistance Program & Wellness Platform
  • Opportunity to travel to new locations to meet colleagues
  • Priority Pass and travel upgrades for long-haul company events

About Canonical

Canonical is a pioneering tech firm at the forefront of the global move to open source. As the company that publishes Ubuntu, one of the most important open-source projects and the platform for AI, IoT, and the cloud, we are changing the world of software. We recruit on a global basis and set a very high standard for people joining the company. We expect excellence; in order to succeed, we need to be the best at what we do. Most colleagues at Canonical have worked from home since our inception in 2004. Working here is a step into the future and will challenge you to think differently, work smarter, learn new skills, and raise your game.

We are proud to foster a workplace free from discrimination. Diversity of experience, perspectives, and background create a better work environment and better products. Whatever your identity, we will give your application fair consideration (https://canonical.com/careers/diversity/identity).

Canonical

Site Reliability Engineer

0 days ago
Description

Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is very widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation and IoT. Our customers include the world's leading public cloud and silicon providers, and industry leaders in many sectors. The company is a pioneer of global distributed collaboration, with 1200+ colleagues in 75+ countries and very few office based roles. Teams meet two to four times yearly in person, in interesting locations around the world, to align on strategy and execution.

The company is founder led, profitable and growing.

We are hiring a Site Reliability Engineer

Next-gen operations at scale, with pure Python infra-as-code, from bare metal to containers and applications. Our goal is to perfect enterprise infrastructure devops.

We run hundreds of private cloud, Kubernetes, and application clusters for customers across physical and public cloud estate, and we are raising the bar on what's possible with automation by embracing a universal operator pattern and model-driven operations.

To succeed in this role you need to believe in automation as a pure software engineering problem, not a hack-it-till-it-works-for-me problem. You need to be interested in the scientific approach to operations at scale, driven by metrics and code, and you need to be able to learn the entire stack, from bare metal networking and kernel up to serverless and open source applications.

Location: Globally remote role

The role entails

Our cloud operations engineers bring Python software-engineering skills and rigour to the operations domain. We practise devsecops from bare metal to application. We architect and run OpenStack, Kubernetes and software defined storage, and we enable devsecops for applications running on that infrastructure too.

To become a member of this team, you need to be a software engineer fluent in Python, you need a genuine interest in the full open source infrastructure stack from metal to containers, and you need the ability to work in a high pressure operations environment with mission-critical services for global brand name customers.

As a member of the team you will gain experience in a broad range of cloud technologies. We evolve our offerings as the state of the art improves, so you get to stay current with the latest capabilities in open source infrastructure. We drive upgrades to keep our customers on the latest, best solutions.

What we are looking for in you

  • Degree in Software Engineering or Computer Science
  • Experience with Linux and familiarity with Linux networking and storage
  • Python software development expertise
  • Operational experience
  • Excellent interpersonal skills, curiosity, flexibility, and accountability
  • Ability to travel internationally twice a year, for company events up to two weeks long

Nice-to-have skills

  • Experience with OpenStack or Kubernetes deployment or operations
  • Familiarity with public or private cloud management

What we offer colleagues

We consider geographical location, experience, and performance in shaping compensation worldwide. We revisit compensation annually (and more often for graduates and associates) to ensure we recognise outstanding performance. In addition to base pay, we offer a performance-driven annual bonus or commission. We provide all team members with additional benefits, which reflect our values and ideals. We balance our programs to meet local needs and ensure fairness globally.

  • Distributed work environment with twice-yearly team sprints in person
  • Personal learning and development budget of USD 2,000 per year
  • Annual compensation review
  • Recognition rewards
  • Annual holiday leave
  • Maternity and paternity leave
  • Employee Assistance Programme
  • Opportunity to travel to new locations to meet colleagues
  • Priority Pass, and travel upgrades for long haul company events

About Canonical

Canonical is a pioneering tech firm at the forefront of the global move to open source. As the company that publishes Ubuntu, one of the most important open source projects and the platform for AI, IoT and the cloud, we are changing the world of software. We recruit on a global basis and set a very high standard for people joining the company. We expect excellence - in order to succeed, we need to be the best at what we do. Most colleagues at Canonical have worked from home since its inception in 2004. Working here is a step into the future, and will challenge you to think differently, work smarter, learn new skills, and raise your game.

Canonical is an equal opportunity employer

We are proud to foster a workplace free from discrimination. Diversity of experience, perspectives, and background create a better work environment and better products. Whatever your identity, we will give your application fair consideration (https://canonical.com/careers/diversity/identity).

Canonical

Site Reliability / Gitops Engineer

0 days ago
Description <p>Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is very widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our customers include the world's leading public cloud and silicon providers, and industry leaders in many sectors. The company is a pioneer of global distributed collaboration, with 1200+ colleagues in 75+ countries and very few office-based roles. Teams meet two to four times yearly in person, in interesting locations around the world, to align on strategy and execution.</p> <p>The company is founder-led, profitable, and growing.</p> <p>We are hiring a&nbsp;<strong>Site Reliability / Gitops Engineer</strong> to our Information Systems (IS) team. <span style="font-weight: 400;">This role is an opportunity for an “automation-first” senior technologist</span><span style="font-weight: 400;"> with a&nbsp;</span><span style="font-weight: 400;">passion for Linux to build a career with Canonical and drive the success with those leveraging Ubuntu and open source products. &nbsp;<span style="font-weight: 400;">If you have experience of IT operations automation, Infrastructure as Code and a passion for technology, then you will enjoy working with some of the best people in the industry at Canonical.</span><br></span></p> <h2>Job Summary</h2> <p><span style="font-weight: 400;">The IS team at Canonical supports and maintains all of Canonical’s IT production services. The team is in charge of running services used by over 60 million Ubuntu users.</span></p> <p><span style="font-weight: 400;">As an SRE &amp; Gitops engineer you’ll be in a unique position to drive operations automation to the next level, both in our own private clouds as well as in the public clouds. 
We do this by utilizing the best of open source infrastructure as code software, software development practices such as CI/CD pipelines, and Canonical’s leading products for software operation automation.</span></p> <p><span style="font-weight: 400;">In addition to defining the infrastructure as code, you will improve Canonical products and the open-source technologies they’re based on by providing critical feedback to developers on how their products operate at scale. This is done by submitting bugs (and sometimes writing pull requests) and collaborating on design and implementations with other teams within the company.</span></p> <p><span style="font-weight: 400;">You’ll be part of a global team of SREs that work together and support each other to provide the best possible services to our company, Canonical’s customers and the Ubuntu Community.</span></p> <h2>As a Site Reliability / Gitops Engineer engineer you will</h2> <ul> <li style="font-weight: 400;">Apply your experience of IaC to develop infrastructure as code practice within IS by constantly increasing automation and improving IaC processes</li> <li style="font-weight: 400;">Automate software operations for re-usability and consistency across private and public clouds, taking into consideration the complexities of distributed systems</li> <li style="font-weight: 400;">Develop new features and improve the resilience and scalability of the existing cloud and container portfolio at Canonical</li> <li style="font-weight: 400;">Maintain operational responsibility for all of Canonical’s core services, networks, and infrastructure</li> <li style="font-weight: 400;">Develop skills in troubleshooting, capacity planning, and performance investigation, Setting up, maintaining and using observability tools such as Prometheus, Grafana, and Elasticsearch; design, implement and maintain monitoring and alerting for various systems and services</li> <li style="font-weight: 400;">Collaborate with development teams to 
design service architecture, documentation, playbooks, policies and operational procedures</li> <li style="font-weight: 400;">Provide assistance and work with globally distributed engineering, operations, and support peers</li> <li style="font-weight: 400;">Be given uninterrupted development time to focus on larger projects and automation of manual tasks</li> <li style="font-weight: 400;"><span style="font-weight: 400;"><span style="font-weight: 400;">Share your experience, know-how and best practices with other team members in design sessions, mentorship and ‘doing work together’</span></span></li> <li style="font-weight: 400;">Carry final responsibility for time-critical escalations</li> </ul> <h2><span style="font-weight: 400;">What we are looking for in you</span></h2> <ul> <li style="font-weight: 400;"><span style="font-weight: 400;">A deep experience of, and knowledge to define operations in code, using version control, peer review and CI/CD to roll out changes both to applications and infrastructure</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Strong modern engineering background (peer-review, unit testing, SCM, CI/CD, Agile)</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Python software development experience, with large projects</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Practical knowledge of Linux networking, routing, and firewalls</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Affinity with various forms of Linux storage, from Ceph to Databases</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Hands-on experience administering enterprise Linux servers</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Extensive knowledge of cloud computing concepts and technologies</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Bachelor's degree or greater, preferably in computer 
science or related engineering field</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Able to communicate clearly and effectively in English over email, chat, video or voice calls and in-person</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Motivated and able to troubleshoot from kernel to web, and willing to ask others when appropriate</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">A willingness to be flexible and able to learn new things quickly</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Be inspired by the needs of fast-changing environments</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Happy to work within distributed teams</span></li> <li style="font-weight: 400;"><span style="font-weight: 400;">Be passionate and familiarized about open-source, especially Ubuntu or Debian</span><span style="font-weight: 400;"><br></span></li> </ul> <h1>About Canonical</h1> <p>Canonical is a pioneering tech firm at the forefront of the global move to open source. As the company that publishes Ubuntu, one of the most important open-source projects and the platform for AI, IoT, and the cloud, we are changing the world of software. We recruit on a global basis and set a very high standard for people joining the company. We expect excellence; in order to succeed, we need to be the best at what we do. Most colleagues at Canonical have worked from home since our inception in 2004.​ Working here is a step into the future and will challenge you to think differently, work smarter, learn new skills, and raise your game.</p> <p><strong>Canonical is an equal opportunity employer</strong></p> <p><span style="font-weight: 400;">We are proud to foster a workplace free from discrimination. Diversity of experience, perspectives, and background create a better work environment and better products. 
</span><a href="https://canonical.com/careers/diversity/identity"><span style="font-weight: 400;">Whatever your identity, we will give your application fair consideration.</span></a></p>
CloudLinux

Senior Site Reliability Engineer (SRE) for Release Engineering (remote-only)

0 days ago Apply
Description

CloudLinux is looking for a brilliant Senior Site Reliability Engineer (SRE) to join the Release Engineering Department, a team that plays a critical role in maintaining both external and internal infrastructure related to package repositories, with a strong focus on delivering and managing repository distribution to users.

This role offers a unique opportunity to collaborate with multiple development teams, accelerate progress, and provide enterprise-level solutions globally. Responsibilities include Linux OS administration, designing system solutions at an architectural level, advancing cloud technologies, system programming, Python/Linux scripting, and working with virtualization. This is a remote position best suited for professionals located in Europe and CIS, as the team primarily operates within European time zones.

As our Senior Site Reliability Engineer, you will:

  • Design, implement, and manage scalable, resilient, and secure company-wide repository infrastructure for CloudLinux products as a first assignment.
  • Automate software operations for re-usability and consistency across private and public clouds, taking into consideration the complexities of distributed systems.
  • Monitor system performance and troubleshoot issues proactively to ensure optimal uptime and reliability.
  • Automate deployment processes using Infrastructure as Code (IaC) principles.
  • Share your experience, know-how, and best practices with other team members in design sessions, system architecture discussions, mentorship, and "doing work together".

Requirements

To be successful, you should have:

  • Strong background in development: the ideal candidate started their career as a developer, then moved on to large-scale infrastructure projects.
  • Proven experience as a leading SRE or in a similar role, with a strong focus on Linux environments.
  • Proficiency in modern agile SDLC practices and principles, orchestration, and CI/CD tooling, e.g. Python, Java, Terraform, Ansible, CloudFormation, Puppet, Chef, or similar.
  • Knowledge of the Grafana ecosystem or similar, building dashboards, alert rules, PromQL, as well as frontend observability.
  • Excellent technical knowledge of IT Infrastructure, including network and application load balancers, switches, routers, and IP addressing.
  • Strong analytical and problem-solving skills with a focus on root cause analysis and mitigation.
  • Excellent communication and teamwork skills with the ability to collaborate effectively across engineering teams.
  • English: at least Intermediate level required.

Benefits

What's in it for you?

  • A focus on professional development.
  • Interesting and challenging projects.
  • Fully remote work with flexible working hours, which allow you to schedule your day and work from any location worldwide.
  • 24 days of paid vacation per year, 10 days of national holidays, and unlimited sick leave.
  • Compensation for private medical insurance.
  • Co-working and gym/sports reimbursement.
  • Budget for education.
  • The opportunity to receive a reward for the most innovative idea that the company can patent.

By applying for this position, you consent to the processing of your personal data as described in our Privacy Policy (https://cloudlinux.com/candidate-privacy-notice), which provides detailed information on how we maintain and handle your data.

Scroll

Senior / Staff Site Reliability Engineer

0 days ago Apply
Description

Scroll is a Layer 2 scaling solution for Ethereum, specifically focusing on zkRollups. Key aspects of Scroll are zkRollup technology, scalability, efficiency, security, and developer-friendliness. Overall, Scroll plays a crucial role in addressing Ethereum's scalability challenges and facilitating the growth of decentralized finance (DeFi) and other blockchain-based applications by providing a scalable and efficient Layer 2 solution.

Position Overview

We are looking for a Senior or Staff Site Reliability Engineer to lead the design, implementation, and management of our infrastructure and development operations to ensure the best reliability, security, and scalability. You will work closely with our development team to build and maintain automated deployment pipelines, monitor and analyze system performance, and identify and resolve issues before they impact our users. You are expected to become SRE team lead after a 3-month probation period.

This is a dynamic role in a fast-paced blockchain environment, ideal for someone embracing ownership and autonomy to grow with us.

Responsibilities

  • Platform Engineering & Developer Enablement
      • Design, build, and maintain internal developer tools to improve the developer lifecycle, including building, testing, and deploying.
      • Create tools that streamline developer workflows, including monitoring, logging, and debugging utilities.
  • Infrastructure & System Architecture
      • Design, provision, and maintain cloud environments focused on scalability, reliability, and security.
      • Automate deployment and maintenance processes, ensuring seamless integration and rapid iteration.
  • Reliability, Monitoring & Security
      • Implement observability solutions to gain actionable insights, enhance performance, and ensure high availability of blockchain services.
      • Work closely with the security team to harden infrastructure and mitigate potential threats.
      • Operate and maintain a fleet of hundreds of GPU-based zk provers. Track prover health, detect failures, and optimize performance in real time.

Requirements

  • 5+ years of experience as a DevOps, Infrastructure, Site Reliability or Cloud Engineer
  • 3+ years of experience as a Backend Developer
  • Familiarity with hybrid cloud environments (AWS, Azure, GCP, etc.) and the ability to design, provision, and maintain them securely and efficiently.
  • Good at any modern programming language (Go, Rust, Python). You need to be a good programmer for custom tooling.
  • Linux administration experience, from hardware optimizations to advanced OS-level configurations.
  • Experience working with configuration management tools like Terraform and Ansible
  • Experience working with containers and using them in production systems
  • Self-motivated individual with enthusiasm for learning and building things
  • Collaborative, communicative, and confident in their abilities to work well with all team members at all seniority and skill levels

Preferred Qualifications

  • Understand system architecture and business
  • Previous experience as a platform engineer
  • Previous experience as a tech lead
  • Previous experience with Kubernetes, microservices, and GitOps tooling
  • Previous experience in a blockchain company
  • Previous experience in optimizing blockchain-specific infrastructure

What We Offer

  • Mission-Driven, Collaborative, and Innovative Environment: Join a team united by a shared vision, working with like-minded individuals and cutting-edge technology to advance Ethereum and blockchain innovation.
  • Comprehensive Compensation and Remote Flexibility: Benefit from a competitive salary package and generous discretionary benefits, while enjoying remote work from anywhere with flexible hours. Additionally, receive support for your workspace with a home office setup allowance and monthly co-working membership stipend.
  • Remote Hiring: Team members outside the US, UK, Canada, and Hong Kong will be engaged as independent contractors, with the flexibility to receive payment in fiat, USDC, or other agreed-upon options.
  • Private Healthcare Benefits: Private healthcare benefits through the Employer of Record (EoR) are only available in the US, UK, Canada, and Hong Kong.

Scroll is proud to be an equal opportunity workplace. We are committed to equal employment opportunities regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, or Veteran status. If you have a disability or special need, please let us know and we'll do our best to accommodate.

Airalo

Senior Site Reliability Engineer

0 days ago Apply
Description
About Airalo
Alo! Airalo is the world’s first eSIM store that helps people connect in 200+ countries and regions across the globe. We are building the next digital service that revolutionizes the telecom industry. We are a travel-tech company and an equal-opportunity environment that values and practices diversity, inclusion, and equity. Our team is spread across 50+ countries and six continents. What glues us together is our commitment to changing the way you connect.

About you
We hope that you care deeply about the quality of your work, the intrinsic worth of tasks, and the success of your team. You are self-disciplined and do not require micromanagement in terms of your skillset and work ethic. You do your best to flourish as an individual every day while working hard to foster a collaborative team environment. You believe in the importance of being — and staying — authentic, honest, positive, and kind. You are a good interlocutor with clear and concise communication. You are able to manage multiple projects, have an analytical mind, pay keen attention to detail, and love to get your hands dirty. You are cognizant, tolerant, and welcoming of vulnerabilities and cultural differences.

About the Role
Position: Full-time / Employee
Location: Remote-first
Benefits: Health Insurance, work-from-anywhere stipend, annual wellness & learning credits, annual all-expenses-paid company retreat in a gorgeous destination & other benefits

We are looking for an experienced Site Reliability Engineer to join our growing engineering team. We are a company that values SRE principles and practices. We believe in empowering our SREs to make data-driven decisions, automate operational tasks, and continuously improve the reliability of our systems. We foster a blameless culture where everyone is encouraged to learn from mistakes and share knowledge. If you are passionate about building and maintaining highly reliable systems, we would love to hear from you!

Responsibilities include but are not limited to:

  • Develop and maintain reliable, scalable, and efficient systems.
  • Define and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to measure and improve system reliability.
  • Conduct blameless post-incident reviews to identify root causes and implement preventive measures.
  • Drive automation of operational tasks and incident response.
  • Develop and maintain runbooks and playbooks for common operational tasks and incident response.
  • Mitigate operational risks.
  • Work with software engineers to design systems for reliability, scalability, and maintainability.
  • Continuously evaluate and optimize system performance, capacity, and cost.
  • Participate in on-call rotation and be available to troubleshoot and resolve critical issues.

    Must-haves:

  • Bachelor’s degree in Computer Engineering or a similar discipline.
  • 5+ years of experience as a Site Reliability Engineer or in a similar role.
  • 3+ years of experience with AWS services including strong knowledge of container orchestration.
  • 2+ years of Kubernetes experience
  • Deep understanding of observability principles and tools (logging, monitoring, tracing).
  • Experience with incident management and postmortem analysis.
  • Experience with and interest in an infrastructure-as-code approach (Terraform).
  • Experience with chaos engineering and other techniques for testing system resilience.
  • Experience with CI/CD tools such as GitHub Actions.
  • Proficiency in at least one programming language (Python, Go, Java, etc.) for automation and tooling.
  • Comfortable with messaging systems (SNS, SQS, etc)
  • Ability to work independently and collaboratively in a fast-paced environment.
  • Team player and open to new ideas.
  • Good communication skills and fluency in English.

    Good to have:

  • Prior experience with Scrum and other agile methods.
  • Certification in relevant areas such as AWS Certified DevOps Engineer, Certified Kubernetes Administrator (CKA), or similar.
  • Experience with AI-driven SRE tools for anomaly detection and improvements
  • Contributions to open-source SRE projects or communities.
  • Prior work experience in telecommunications.
  • Knowledge of eSIM and GSMA related technologies and services.

    If you are interested in this position, please apply via the link.


    We sincerely thank all applicants in advance for submitting their interest in this opportunity. Airalo is an equal opportunity employer and values diversity, equity & inclusion. We do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. We are committed to providing reasonable accommodations upon request for individuals with disabilities throughout our job interview process.
    BioRender

    Software Engineer, Site Reliability (Senior or Staff)

    0 days ago Apply
    Description

    At BioRender, we’re on a mission to accelerate the world’s ability to learn, discover, and communicate science — transforming how knowledge is shared and making science open, collaborative, and easily understandable by all.

    We’re shaping the future of science communication and are looking for talented individuals to help bring this vision to life! 🚀

    As our Sr/Staff SRE in the Platform Engineering team, you'll get in on the ground floor and play a pivotal role in developing and shaping a resilient, high-performing, and secure platform for BioRender's engineering prowess. Aligned with our company mission to accelerate the world’s ability to learn, discover, and communicate science, your objective is to design, implement, and maintain robust, scalable, and fault-tolerant systems that our customers rely on. Harnessing the power of automation, CI/CD, and Infrastructure as Code, you'll seamlessly integrate and deploy our applications into the cloud while establishing observability enhanced with actionable alerts and automation to detect performance bottlenecks. You'll adeptly address production issues, promptly restore services, and lead post-mortems to continually enhance our engineering excellence, thereby fulfilling our company's vision to be the go-to trusted place where science is communicated.

    Our ideal fit:

    • You have experience working in a fast-paced, competitive environment and have a deep desire to work collaboratively, solve problems, and find win/win solutions.

    • You are passionate about architecting and building scalable platform solutions.

    • You are a results-oriented individual who takes initiative and has a strong bias for action.

    • You’re a creative thinker who finds efficient and simple solutions and evangelizes best practices.

    • You have effective communication skills, a sense of ownership and drive to consistently improve yourself and others. 

    • You’re a selfless team player who sees the big picture and puts common goals at the forefront of solutioning and decision-making.

    What you'll be doing:

    • Elevate and Innovate: Enhance platform resilience by constantly seeking ways to improve the reliability, scalability, and release efficiency of the platform.

    • Develop Robust Observability and Monitoring Solutions: Define, build, deploy, maintain, and extend advanced observability and monitoring tools to bolster system reliability and availability.

    • Define and Monitor Performance Metrics: Play a key role in formulating and tracking Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to establish precise benchmarks for system performance.

    • Solve Complex Issues and Conduct Root Cause Analysis: Swiftly respond to escalated incidents, troubleshoot intricate system and application problems, and conduct thorough root cause analyses to implement corrective measures.

    • Thought Leadership and Innovation: Stay up to date with the latest industry trends and emerging technologies, and iterate on best practices to increase the quality and velocity of development and deliverables.

    • Architect Scalable and Reliable Systems: Lead in the design and architecture of scalable, distributed, fault-tolerant systems that uphold performance and reliability standards.

    • Mentorship and Evangelism: Champion the adoption of new technologies, disseminate best practices, and advocate for architectural patterns. Mentor and guide fellow engineers in the organization.

    What you bring to the table:

    • 10-12+ years of experience in the software/DevOps/SRE realm

    • Strong programming skills in 2 or more of these languages: JavaScript, TypeScript, Python, Go

    • Ability to troubleshoot complex distributed systems at scale

    • Experience with database performance monitoring and best practices

    • Comfortable innovating and establishing new practices, processes, and tooling

    • Strong analytical skills, system design, and architecture for cloud applications

    • CI/CD, configuration management, monitoring, and automation expertise

    • Advanced knowledge of observability and best practices (ELK, Datadog, OpenTelemetry, Prometheus, Grafana)

    • Deployment and orchestration via AWS ECS, Kubernetes, Cloud Run, etc.

    • Understanding of Linux, virtualization, networking, VPCs, firewalls, security groups

    • Hands-on knowledge of AWS and resource provisioning via CLI/API/IaC

    • Bachelor's degree in Computer Science, similar technical field of study, or equivalent practical experience.

    Why join us?

    • We are mission-driven: we work collaboratively towards our shared vision of improving scientific communication and accelerating scientific discovery. BioRender figures have appeared in more than 54,000 publications! 

    • BioRender is loved by millions! We have a world-class NPS and a community of loyal fans and users in 200+ countries!

    • Our company is backed by top investors and accelerators like Y Combinator, and we are on a growth trajectory comparable to many top-performing SaaS companies.

    • We’re remote-first with team members across Canada and the U.S., offering you the flexibility to work from anywhere. 

    BioRender is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or veteran status.