Dive in and do the best work of your career at DigitalOcean. Journey alongside a strong community of top talent who are relentless in their drive to build the simplest scalable cloud. If you have a growth mindset, naturally like to think big and bold, and are energized by the fast-paced environment of a true industry disruptor, you’ll find your place here. We value winning together—while learning, having fun, and making a profound difference for the dreamers and builders in the world.
We are looking for a Principal Engineer to define the technical direction and architecture for AI Data Infrastructure at DigitalOcean. This role will lead the design, development, and operation of services that help AI-native applications ground, retrieve, reason over, and remember data at scale. These services will power DigitalOcean’s Agentic AI and Inference customers by providing production-grade knowledge bases, vector search, hybrid retrieval, context management, memory systems, and graph-based data infrastructure.
As a Principal Engineer, you will work across engineering, product, platform, and customer-facing teams to build foundational AI data services that are reliable, performant, scalable, cost-efficient, and simple for developers to use. You should be equally comfortable setting long-term architecture, making hard technical trade-offs, mentoring senior engineers, and going deep into system design when the business depends on getting the architecture right.
We are looking for someone who can span technical strategy and hands-on execution—someone who has strong distributed systems judgment, understands database and retrieval system internals, and can turn emerging AI infrastructure patterns into durable cloud services.
What You’ll Do
- Architect and guide the implementation of high-scale, reliable, secure AI data infrastructure services for agentic and inference workloads.
- Define the technical architecture for vector databases, knowledge bases, hybrid search, semantic search, context graphs, agent memory, and retrieval orchestration.
- Make foundational decisions on indexing, storage layout, sharding, replication, caching, query execution, ranking, consistency, latency, availability, and cost-performance trade-offs.
- Design systems that support multiple retrieval patterns, including dense vector search, keyword/BM25 search, metadata filtering, reranking, graph traversal, and context-aware retrieval.
- Build and operate managed services that customers can trust for production AI workloads, including observability, SLOs, capacity planning, backups, upgrades, failover, and disaster recovery.
- Partner with product managers and engineering leaders to translate customer needs and business priorities into a clear multi-year technical roadmap.
- Collaborate with Inference, Managed Databases, Storage, Kubernetes, App Platform, IAM, and Observability teams to ensure AI data services are deeply integrated into the DigitalOcean platform.
- Identify architectural bottlenecks, scaling risks, retrieval quality gaps, operational weaknesses, and cost inefficiencies before they become customer-impacting problems.
- Establish engineering standards, design review practices, operational mechanisms, and technical decision frameworks for AI data infrastructure.
- Mentor engineers across teams and raise the bar for architectural rigor, operational excellence, systems thinking, and customer impact.
- Stay current with advances in vector databases, retrieval-augmented generation, graph databases, memory systems, embedding models, reranking, agent frameworks, and AI data management.
Key Responsibilities
Architect and Build
- Design and evolve distributed AI data systems optimized for low latency, high recall, high availability, strong operational control, and efficient unit economics.
- Lead architecture for vector indexing and retrieval systems, including ANN algorithms, HNSW-style indexes, quantization, compression, partitioning, filtering, and recall-latency trade-offs.
- Architect knowledge base infrastructure, including ingestion, chunking, embedding generation, indexing, metadata management, retrieval, reranking, evaluation, and re-indexing workflows.
- Design context management and memory systems that enable agents to persist, retrieve, summarize, and reason over relevant state across sessions and tasks.
- Evaluate when to use vector search, lexical search, relational stores, object storage, graph databases, or purpose-built retrieval layers—and design clean integration patterns across them.
- Take a hands-on technical leadership role when needed to unblock delivery, validate architecture, or guide implementation of critical systems.
Reliability, Performance, and Scale
- Own architectural mechanisms for availability, failover, durability, capacity management, tenant isolation, cost controls, and operational safety.
- Lead performance tuning across ingestion, embedding, indexing, query serving, graph traversal, reranking, and retrieval pipelines.
- Define SLOs and operational dashboards for latency, throughput, recall quality, freshness, availability, error rates, cost, and customer-visible reliability.
- Drive automation for provisioning, upgrades, scaling, monitoring, alerting, incident response, and fleet operations.
- Build systems that scale from small developer workloads to large production AI applications with billions of objects, high-dimensional vectors, high query volume, and strict latency expectations.
Technical Leadership
- Set the technical vision for AI Data Infrastructure and influence architecture across multiple teams.
- Lead design reviews and author technical proposals that clarify trade-offs, risks, sequencing, and long-term platform implications.
- Establish standards for service design, APIs, data modeling, observability, operational readiness, testing, and production excellence.
- Mentor senior and staff engineers, helping them make better architectural decisions and exercise stronger technical judgment.
- Create a culture where engineers understand not only how a system works, but why the design is correct for the customer and business.
Cross-functional Collaboration
- Work with product, engineering, design, sales engineering, support, and go-to-market teams to understand customer problems and convert them into scalable platform capabilities.
- Partner with customer-facing teams on architecture patterns for AI-native applications, retrieval-augmented generation, agentic workflows, and enterprise knowledge systems.
- Translate complex technical concepts into clear guidance for executives, product leaders, engineering teams, and customers.
- Help define migration and adoption paths for customers moving from self-managed vector databases, custom RAG pipelines, fragmented knowledge stores, or prototype agent memory systems to DigitalOcean-managed services.
Innovation and Future Roadmap
- Research and evaluate emerging technologies in vector databases, graph databases, AI memory, context engineering, retrieval evaluation, multimodal indexing, and agent data infrastructure.
- Identify which capabilities DigitalOcean should build, partner for, or integrate from open source.
- Build durable platform primitives rather than one-off features, ensuring DigitalOcean’s AI data services remain simple, composable, open, and cost-effective.
- Drive the evolution from basic retrieval infrastructure toward intelligent data systems that help agents learn, remember, and improve over time.
Key Metrics
- Availability, latency, durability, and operational health of AI data services.
- Retrieval quality, freshness, recall, precision, and reranking effectiveness.
- Time to ingest, index, re-index, and make customer data queryable.
- Cost efficiency across storage, memory, compute, indexing, and query serving.
- Customer adoption of knowledge bases, vector search, hybrid retrieval, and agent memory capabilities.
- Engineering velocity, architectural clarity, and reduction of operational toil.
- Successful integration with DigitalOcean Inference, Managed Databases, Storage, Kubernetes, and App Platform services.
What You’ll Add to DigitalOcean
- 12+ years of experience designing and building distributed systems, databases, storage systems, search infrastructure, data platforms, or cloud infrastructure at scale.
- Deep technical expertise in vector databases, search systems, database internals, or distributed data infrastructure.
- Strong understanding of vector indexing, ANN search, hybrid search, semantic search, metadata filtering, reranking, query planning, storage engines, caching, replication, and high availability.
- Experience designing or operating production-grade services for AI, data, search, analytics, databases, or retrieval-heavy workloads.
- Familiarity with knowledge base systems, retrieval-augmented generation, embedding pipelines, chunking strategies, context windows, memory systems, and agentic AI application patterns.
- Experience with graph databases, knowledge graphs, context graphs, or graph-based retrieval is strongly preferred.
- Strong systems architecture judgment, including the ability to reason through consistency, latency, availability, durability, cost, scale, and operational trade-offs.
- Hands-on experience with cloud-native infrastructure, Kubernetes, observability systems, infrastructure as code, CI/CD, and production operations.
- Fluency in one or more backend systems languages such as Go, Java, C++, Rust, or Python.
- Proven ability to lead large, ambiguous, cross-team technical initiatives without relying on formal authority.
- Strong written and verbal communication skills, with the ability to explain complex architecture clearly to both technical and business audiences.
- A track record of mentoring engineers and raising the technical bar across an organization.
Compensation Range:
- $227,040 - $283,800
- This is a remote role
JR: 2025-7225
Why You’ll Like Working for DigitalOcean
- We innovate with purpose. You’ll be a part of a cutting-edge technology company with an upward trajectory, one that is proud to simplify cloud and AI so builders can spend more time creating software that changes the world. As a member of the team, you will be a Shark who thinks big, bold, and scrappy, like an owner with a bias for action and a powerful sense of responsibility for customers, products, employees, and decisions.
- We prioritize career development. At DO, you’ll do the best work of your career. You will work with some of the smartest and most interesting people in the industry. We are a high-performance organization that will always challenge you to think big. Our organizational development team will provide you with resources to ensure you keep growing. We provide employees with reimbursement for relevant conferences, training, and education. All employees have access to LinkedIn Learning's 10,000+ courses to support their continued growth and development.
- We care about your well-being. Regardless of your location, we will provide you with a competitive array of benefits to support you, from our Employee Assistance Program to local employee meetups to a flexible time-off policy, to name a few. While the philosophy around our benefits is the same worldwide, specific benefits may vary based on local regulations and preferences.
- We reward our employees. The salary range for this position is based on market data, relevant years of experience, and skills. You may qualify for a bonus in addition to base salary; bonus amounts are determined based on company and individual performance. We also provide equity compensation to eligible employees, including equity grants upon hire and the option to participate in our Employee Stock Purchase Program.
- DigitalOcean is an equal-opportunity employer. We do not discriminate on the basis of race, religion, color, ancestry, national origin, caste, sex, sexual orientation, gender, gender identity or expression, age, disability, medical condition, pregnancy, genetic makeup, marital status, or military service.
Application Limit: You may apply to a maximum of 3 positions within any 180-day period. This policy promotes better role-candidate matching and encourages thoughtful applications where your qualifications align most strongly.
DigitalOcean Holdings
Cloud Computing · Public · New York, USA