Schemata

Data Engineer

Full-Time | Hybrid | San Francisco

We are seeking a highly skilled Data Engineer to join our dynamic, early-stage team full-time. You will play a foundational role in designing, building, and scaling both the cloud infrastructure and data pipelines that power our AI-driven 3D interactive applications, neural rendering systems, and analytics frameworks.

At Schemata, we are transforming the $400B Virtual Training and Simulation market by integrating AI, neural rendering, and spatial computing into highly regulated industries. Our platform ingests vast amounts of structured and unstructured data, including 3D scans, dense technical documentation, and user interactions, so we need a versatile data engineer to build out robust data pipelines, infrastructure, and analytical tools.

This is a high-impact, cross-functional role—you will work end-to-end on everything from cloud architecture and infrastructure-as-code to data ingestion, machine learning pipelines, and multi-modal outputs that empower both our internal teams and external customers.

Core Responsibilities

  • Build and optimize scalable machine-learning-based pipelines to process diverse data sources, including 3D spatial data, technical documentation, and user data
  • Implement real-time and batch data processing systems
  • Support ML engineers with data pipelines for training/inference across structured and unstructured data (text, images, video, 3D assets)
  • Design and manage scalable AWS cloud architecture to support AI-driven 3D applications while implementing infrastructure-as-code for reliability
  • Optimize distributed computing and storage solutions for cost-effective, high-performance workloads
  • Work closely with product and engineering teams to integrate data-driven features into interactive 3D applications
  • Document cloud architecture, data models, and infrastructure for cross-team collaboration
  • Stay current with emerging technologies in cloud/data engineering, AI, and spatial computing to continuously improve our stack

Essential Skills & Experience

  • 4+ years of experience in data engineering, platform engineering, or cloud engineering roles, with a proven track record of delivering end-to-end solutions
  • Proficiency in Python and SQL, with experience building scalable cloud pipelines (AWS Batch)
  • Strong expertise in AWS and infrastructure-as-code (Terraform)
  • Experience designing and implementing real-time and batch processing workflows
  • Knowledge of data modeling, distributed computing, and storage architectures
  • Familiarity with containerization and orchestration tools (Docker, Kubernetes)
  • Experience with data visualization tools (Grafana or similar)

Nice to Have

  • Experience working with unstructured 3D data, such as point clouds, mesh files, or volumetric captures
  • Expertise in modern data warehousing and lakehouse architectures (Databricks, Snowflake, Redshift, or BigQuery)
  • Familiarity with MLOps and integrating machine learning pipelines into data workflows
  • Experience working with graph or multimodal data architectures
  • Knowledge of graph databases (e.g., Neo4j) or vector search for AI-powered retrieval
  • Previous experience working in highly regulated industries (e.g., defense, energy, finance)

Why Join Us?

  • Own and shape the platform engineering function at a fast-growing company
  • Tackle unique, high-impact infrastructure and data challenges at the intersection of AI, spatial computing, and neural rendering
  • Work with a world-class team of engineers, researchers, and product builders, solving real-world problems in high-stakes industries
  • Work in a fast-paced, high-ownership environment where your work directly impacts the scalability, reliability, and performance of our core products

Schemata

650 California St
San Francisco CA

info@schemata.com

© Copyright 2025
