Data Engineer

Full-Time | Hybrid | San Francisco

We are seeking a highly skilled Data Engineer to join our dynamic, early-stage team full-time. You will play a foundational role in designing, building, and scaling both the cloud infrastructure and data pipelines that power our AI-driven 3D interactive applications, neural rendering systems, and analytics frameworks.

At Schemata, we are transforming the $400B Virtual Training and Simulation market by integrating AI, neural rendering, and spatial computing into highly regulated industries. Our platform ingests vast amounts of structured and unstructured data, including 3D scans, dense technical documentation, and user interactions. We need a versatile data engineer to build out robust data pipelines, infrastructure, and analytical tools.

This is a high-impact, cross-functional role—you will work end-to-end on everything from cloud architecture and infrastructure-as-code to data ingestion, machine learning pipelines, and multi-modal outputs that empower both our internal teams and external customers.

Core Responsibilities

Build and optimize scalable machine-learning-based pipelines to process diverse data sources, including 3D spatial data, technical documentation, and user data
Implement real-time and batch data processing systems
Support ML engineers with data pipelines for training/inference across structured and unstructured data (text, images, video, 3D assets)
Design and manage scalable AWS cloud architecture to support AI-driven 3D applications while implementing infrastructure-as-code for reliability
Optimize distributed computing and storage solutions for cost-effective, high-performance workloads
Work closely with product and engineering teams to integrate data-driven features into interactive 3D applications
Document cloud architecture, data models, and infrastructure for cross-team collaboration
Stay current with emerging technologies in cloud/data engineering, AI, and spatial computing to continuously improve our stack

Essential Skills & Experience

4+ years of experience in data engineering, platform engineering, or cloud engineering roles, with a proven track record of delivering end-to-end solutions
Proficiency in Python and SQL, with experience building scalable cloud pipelines (AWS Batch)
Strong expertise in AWS and infrastructure-as-code (Terraform)
Experience designing and implementing real-time and batch processing workflows
Knowledge of data modeling, distributed computing, and storage architectures
Familiarity with containerization and orchestration tools (Docker, Kubernetes)
Experience with data visualization tools (Grafana or similar)

Nice to Have

Experience working with unstructured 3D data, such as point clouds, mesh files, or volumetric captures
Expertise in modern data warehousing and lakehouse architectures (Databricks, Snowflake, Redshift, or BigQuery)
Familiarity with MLOps and integrating machine learning pipelines into data workflows
Experience working with graph or multimodal data architectures
Knowledge of graph databases (e.g., Neo4j) or vector search for AI-powered retrieval
Previous experience working in highly regulated industries (e.g., defense, energy, finance)

Why Join Us?

Own and shape the platform engineering function at a fast-growing company
Tackle unique, high-impact infrastructure and data challenges at the intersection of AI, spatial computing, and neural rendering
Work with a world-class team of engineers, researchers, and product builders, solving real-world problems in high-stakes industries
Fast-paced, high-ownership environment—your work will directly impact the scalability, reliability, and performance of our core products
