Data Engineer
PropStream
Role Overview
We are looking for a hands-on, senior Databricks Architect to design, build, and govern our Lakehouse data platform from the ground up. You will own the end-to-end architecture of our data infrastructure — from raw ingestion through the Medallion layers to serving — and establish the engineering standards that will guide the entire data organization.
This is a highly strategic and technical role focused on driving adoption of Databricks, Unity Catalog, and modern Lakehouse patterns across all data products and pipelines.
Key Responsibilities
Lakehouse Architecture & Design
Design and implement a production-grade Medallion Architecture (Bronze / Silver / Gold) across all data pipelines.
Establish best practices for Delta Lake table design, partitioning strategies, Z-ordering, and optimization across large-scale datasets.
Define data modeling standards and schema evolution policies across the Lakehouse.
Architect end-to-end data flows from ingestion (streaming and batch) through transformation and serving layers.
Unity Catalog & Data Governance
Lead the setup, configuration, and rollout of Unity Catalog as the centralized governance layer for all data assets.
Design metastore hierarchy, catalog/schema/table organization, and tagging standards.
Implement fine-grained access control (row-level, column-level), data masking policies, and audit logging.
Establish data lineage tracking and ensure end-to-end visibility across all pipelines.
Define and enforce data classification and sensitivity frameworks for PII and regulated data assets.
Pipeline Development & Orchestration
Build and maintain production-grade data pipelines using PySpark, Delta Live Tables (DLT), and Databricks Workflows / Jobs.
Design modular, reusable pipeline patterns including incremental ingestion, CDC (Change Data Capture), and full-refresh strategies.
Implement robust pipeline observability: logging, alerting, lineage tracking, and SLA monitoring.
Leverage Databricks Repos for CI/CD integration, managing code promotion across dev / staging / production environments.
Performance & Compute Optimization
Optimize Spark execution plans, and identify and resolve performance bottlenecks across large-scale distributed workloads.
Right-size compute configurations: Serverless warehouses, auto-scaling job clusters, and Photon-enabled SQL warehouses.
Leverage Serverless Warehouses and SQL Warehouses for BI and ad hoc analytics workloads, minimizing cost and cold-start latency.
Manage cost governance for compute, storage, and DBU consumption across workspaces.
Developer Experience & Standards
Set up and maintain Databricks Repos with standardized project structures and Git integration.
Define Python coding standards, notebook best practices, and modular library patterns for the data engineering team.
Build reusable Python utility libraries for common patterns: schema validation, data quality checks, Delta operations, and logging.
Establish unit testing and integration testing frameworks for Spark pipelines.
Security, Compliance & Networking
Configure workspace-level and account-level security: Private Link, IP access lists, secrets management via Databricks Secrets or AWS Secrets Manager.
Design and enforce network isolation for sensitive data workloads.
Ensure compliance with data residency and access control requirements for customer data.
Collaboration & Enablement
Partner with data engineers, data scientists, and analytics engineers to ensure the platform meets diverse workload needs.
Mentor the engineering team on Databricks, Spark optimization, and Lakehouse best practices.
Produce architectural documentation, runbooks, and internal knowledge bases.
Evaluate and recommend new Databricks features and third-party integrations relevant to the organization's data roadmap.
Required Qualifications
Core Databricks & Lakehouse
5+ years of hands-on experience with Databricks, with at least 2 years in an architect or senior lead role.
Deep expertise in Unity Catalog: metastore setup, three-level namespace, ACL design, and data governance workflows.
Strong mastery of the Medallion Architecture and Delta Lake: ACID transactions, time travel, compaction, and OPTIMIZE/VACUUM strategies.
Proven experience designing and deploying production pipelines with Databricks Jobs and Workflows, including multi-task job DAGs, retry logic, and notifications.
Hands-on experience with Databricks Repos and CI/CD integration for notebook and Python library deployments.
Experience configuring and operating Serverless SQL Warehouses and Serverless compute for Jobs.
Apache Spark
Expert-level PySpark development: DataFrames, Spark SQL, window functions, broadcast joins, and UDFs.
Strong understanding of Spark internals: DAG execution, shuffle optimization, memory management, and speculative execution.
Experience with structured streaming and micro-batch processing patterns.
Proven ability to diagnose and resolve Spark performance issues using Spark UI and event logs.
Python & Software Engineering
Advanced Python skills with a strong software engineering background: packaging, testing (pytest), virtual environments, and dependency management.
Experience building modular Python libraries for data engineering use cases.
Familiarity with common data engineering libraries: pandas, pydantic, great_expectations or similar DQ frameworks.
Cloud & Infrastructure
Experience deploying Databricks on AWS, including workspace provisioning, IAM integration, and VPC configuration.
Familiarity with cloud-native storage (S3/ADLS), external locations in Unity Catalog, and storage credentials management.
Exposure to infrastructure-as-code tooling (Terraform, Databricks Asset Bundles, or similar).
Preferred Qualifications
Databricks Certified Data Engineer Professional or Databricks Certified Associate Developer for Apache Spark certifications.
Experience with Delta Live Tables (DLT) for declarative pipeline authoring.
Familiarity with dbt (data build tool) integrated with Databricks SQL.
Experience with Databricks Feature Store or MLflow for ML platform use cases.
Exposure to Databricks Marketplace and Partner Connect integrations.
Experience with Elasticsearch, Apache Kafka, or other streaming/search technologies complementary to the Lakehouse.
Vacancy posted 2 days ago
