O’Reilly

Data Integration Engineer

🇺🇸 Remote - US

🕑 Full-Time

💰 $110K - $138K

💻 Data Science

🗓️ January 14th, 2025

BigQuery · CI/CD · ETL

Edtech.com's Summary

O'Reilly Media is hiring a Data Integration Engineer. The role involves contributing to the development and expansion of systems and tools, focusing primarily on ETL processes using Talend, with responsibilities in code review, bug fixing, and maintenance.

Highlights

  • Architect and build ETL pipelines in Talend Data Integration.
  • Manage data ingestion from various platforms using Talend.
  • Develop real-time data pipelines with Google Pub/Sub and Dataflow.
  • Build and optimize datasets in BigQuery, including advanced SQL queries.
  • Enhance and optimize PostgreSQL databases and queries.
  • Required skills include expertise in Talend, BigQuery, and PostgreSQL.
  • Experience with Google Cloud services, Python, Git, and Jenkins is needed.
  • Compensation ranges from $110,000 to $138,000 annually.
  • Over 6 years of professional data engineering experience required.
  • Strong communication skills and a collaborative spirit are essential.

Data Integration Engineer Full Description

About O’Reilly Media

O’Reilly’s mission is to change the world by sharing the knowledge of innovators. For over 40 years, we’ve inspired companies and individuals to do new things—and do things better—by providing them with the skills and understanding necessary for success.

At the heart of our business is a unique network of experts and innovators who share their knowledge through us. O’Reilly Learning offers exclusive live training, interactive learning, a certification experience, books, videos, and more, making it easier for our customers to develop the expertise they need to get ahead. And our books have been heralded for decades as the definitive place to learn about the technologies that are shaping the future. Everything we do is to help professionals from a variety of fields learn best practices and discover emerging trends that will shape the future of the tech industry.

Our customers are hungry to build the innovations that propel the world forward. And we help them do just that.

Learn more: https://www.oreilly.com/about/

Diversity

At O’Reilly, we believe that true innovation depends on hearing from, and listening to, people with a variety of perspectives. We want our whole organization to recognize, include, and encourage people of all races, ethnicities, genders, ages, abilities, religions, sexual orientations, and professional roles.

Learn more: https://www.oreilly.com/diversity

About the Team 

Our data platform team is dedicated to establishing a robust data infrastructure, facilitating easy access to quality, reliable, and timely data for reporting, analytics, and actionable insights. We focus on designing and building a sustainable and scalable data architecture, treating data as a core corporate asset. Our efforts also include process improvement, governance enhancement, and addressing application, functional, and reporting needs. We value teammates who are helpful, respectful, communicate openly, and prioritize the best interests of our users. Operating across various cities and time zones in the US, our team fosters collaboration to deliver work that brings pride and fulfillment.
 
About the Role 

We are seeking an experienced and detail-oriented Data Integration Engineer to contribute to the development and expansion of a suite of systems and tools, with a primary focus on ETL processes. The ideal candidate will have a deep understanding of modern data engineering concepts and will have shipped or supported code and infrastructure serving millions of users and datasets with billions of records. You will routinely implement features, fix bugs, perform maintenance, consult with product managers, and troubleshoot problems. Changes you make will be accompanied by tests that confirm the desired behavior, and code reviews, in the form of pull requests reviewed by peers, are a regular and expected part of the job as well.
 
Salary Range: $110,000 - $138,000

What You’ll Do 

  • ETL Development with Talend:
    • Architect and build complex ETL pipelines in Talend Data Integration, ensuring scalability, reusability, and maintainability of workflows
    • Implement sophisticated data transformations, including lookups, joins, aggregates, and custom routines using Talend’s tMap, tJavaRow, tSQLRow, and JSON components
    • Automate data ingestion from REST APIs, FTP servers, cloud platforms, and relational databases into cloud or on-premises storage (see the sketch after this list)
    • Leverage Talend's integration with BigQuery for seamless data flow into analytical systems, employing native connectors
    • Use Talend’s debugging tools, logs, and monitoring dashboards to troubleshoot and resolve job execution issues
    • Optimize Talend jobs by using efficient memory settings, parallelization, and dependency injection for high-volume data processing
    • Integrate Talend with Google Cloud Storage, Pub/Sub, and Dataflow to create hybrid workflows combining batch and real-time data processing
    • Manage Talend deployments using Talend Management Console (TMC) for scheduling, monitoring, and lifecycle management
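
Talend expresses most of these flows visually rather than in code, but the ingestion pattern behind them is easy to sketch. Below is a minimal Python sketch of the REST-to-cloud-storage flow described above; the API endpoint, bucket name, and object path are hypothetical placeholders, not part of this posting:

```python
import json

import requests
from google.cloud import storage

# Hypothetical REST endpoint and GCS bucket, for illustration only.
API_URL = "https://api.example.com/v1/orders"
BUCKET = "raw-landing-zone"

# Pull a batch of records from the source API.
resp = requests.get(API_URL, timeout=30)
resp.raise_for_status()

# Land the raw JSON in Cloud Storage for downstream ETL to pick up.
blob = storage.Client().bucket(BUCKET).blob("orders/orders.json")
blob.upload_from_string(json.dumps(resp.json()), content_type="application/json")
```

In a Talend job, the equivalent flow would typically be built from a REST input component feeding a cloud storage output, with the same land-raw-first design so downstream transformations stay reproducible.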

  • BigQuery Data Management:
    • Build high-performance BigQuery datasets, implementing advanced partitioning (DATE, RANGE) and clustering for cost-effective queries
    • Work with JSON and ARRAY data structures, leveraging BigQuery to efficiently nest and unnest objects as required for complex data transformations and analysis (see the sketch after this list)
    • Write advanced SQL queries for analytics, employing techniques like window functions, CTEs, and array operations for complex transformations
    • Implement BigQuery federated queries to integrate external datasets from Cloud Storage or other data warehouses
    • Apply a fundamental understanding of BigQuery reservations and slots, allocating compute resources effectively to balance performance, cost, and workload demands across teams and projects
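
As a rough illustration of the BigQuery patterns above (DATE partitioning, clustering, CTEs, window functions, and UNNEST over ARRAY columns), here is a minimal Python sketch using the google-cloud-bigquery client; the analytics.events table and its columns are hypothetical:

```python
from google.cloud import bigquery

client = bigquery.Client()  # assumes default GCP credentials are configured

# Hypothetical table for illustration: DATE-partitioned and clustered
# so common queries scan less data.
client.query("""
CREATE TABLE IF NOT EXISTS analytics.events (
  event_date DATE,
  user_id    STRING,
  payload    JSON,
  tags       ARRAY<STRING>
)
PARTITION BY event_date
CLUSTER BY user_id
""").result()

# CTE + UNNEST + window function: the query patterns named above.
sql = """
WITH exploded AS (
  SELECT event_date, user_id, tag
  FROM analytics.events, UNNEST(tags) AS tag
)
SELECT
  user_id,
  tag,
  COUNT(*) OVER (PARTITION BY user_id) AS events_per_user
FROM exploded
WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
"""
for row in client.query(sql).result():
    print(row.user_id, row.tag, row.events_per_user)
```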

  • Real-time Data Pipelines with Google Pub/Sub and Dataflow:
    • Implement Pub/Sub topics and subscriptions to manage real-time data ingestion pipelines effectively
    • Integrate Pub/Sub with Talend for real-time ETL workflows, ensuring low-latency data delivery
    • Implement dynamic windowing and triggers for efficient aggregation and event handling (see the sketch after this list)
    • Optimize streaming pipelines by fine-tuning autoscaling policies, worker counts, and resource configurations
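
The streaming work above maps naturally onto an Apache Beam pipeline running on Dataflow. The following is a minimal sketch assuming hypothetical project, subscription, and table names; the window size and trigger values are illustrative, not requirements of the role:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import trigger, window

# Hypothetical resource names, for illustration only.
SUBSCRIPTION = "projects/my-project/subscriptions/events-sub"
TABLE = "my-project:analytics.event_counts"

opts = PipelineOptions(streaming=True)  # Pub/Sub sources require streaming mode

with beam.Pipeline(options=opts) as p:
    (
        p
        | "ReadEvents" >> beam.io.ReadFromPubSub(subscription=SUBSCRIPTION)
        | "Decode" >> beam.Map(lambda b: b.decode("utf-8"))
        | "Window" >> beam.WindowInto(
            window.FixedWindows(60),  # one-minute fixed windows
            trigger=trigger.AfterWatermark(
                # early firings roughly every 30s of processing time
                early=trigger.AfterProcessingTime(30)),
            accumulation_mode=trigger.AccumulationMode.DISCARDING)
        | "Count" >> beam.combiners.Count.PerElement()
        | "ToRow" >> beam.Map(lambda kv: {"event": kv[0], "n": kv[1]})
        | "Write" >> beam.io.WriteToBigQuery(
            TABLE,
            schema="event:STRING,n:INTEGER",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
    )
```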

  • PostgreSQL Database Development and Optimization:
    • Enhance and modify existing PostgreSQL queries and functions
    • Write advanced PL/pgSQL functions and triggers for procedural data logic
    • Develop materialized views and expression indexes as needed to speed up query execution for large datasets
    • Monitor and optimize queries with EXPLAIN ANALYZE (see the sketch after this list)
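
For the PostgreSQL items above, here is a minimal sketch using psycopg2 that shows a PL/pgSQL trigger function, a materialized view, and EXPLAIN ANALYZE; the connection string, the orders table, and its columns are all hypothetical:

```python
import psycopg2

# Hypothetical DSN and schema, for illustration only.
conn = psycopg2.connect("dbname=app user=app")
cur = conn.cursor()

cur.execute("""
-- A small PL/pgSQL trigger function: stamp rows on every update.
CREATE OR REPLACE FUNCTION touch_updated_at() RETURNS trigger AS $$
BEGIN
  NEW.updated_at := now();
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS orders_touch ON orders;
-- EXECUTE FUNCTION needs PostgreSQL 11+; older versions use EXECUTE PROCEDURE.
CREATE TRIGGER orders_touch
  BEFORE UPDATE ON orders
  FOR EACH ROW EXECUTE FUNCTION touch_updated_at();

-- Precompute an expensive aggregate for large datasets.
CREATE MATERIALIZED VIEW IF NOT EXISTS daily_order_totals AS
  SELECT order_date, SUM(total) AS total FROM orders GROUP BY order_date;
""")
conn.commit()

# Inspect the plan and timings for a hot query with EXPLAIN ANALYZE.
cur.execute("EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = %s", (42,))
for (line,) in cur.fetchall():
    print(line)

cur.close()
conn.close()
```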
 
What You’ll Have 

Required: 

  • 6+ years of professional data engineering experience (equivalent education and/or experience may be considered)
  • Strong experience with Talend Data Integration for designing and optimizing ETL pipelines
  • Experience working with JSON and ARRAY data structures in BigQuery, including nesting and unnesting
  • Experience in integrating and optimizing streaming data pipelines in a cloud environment
  • Experience with Python
  • Experience with Git/GitHub
  • Experience with deployment tools such as Jenkins to build automated CI/CD pipelines
  • Hands-on experience with Google Cloud Storage, Pub/Sub, Dataflow, and Dataprep for ETL and real-time data processing
  • Proficient in building and managing real-time data pipelines with Google Pub/Sub and Dataflow
  • Proficient in BigQuery, including dataset management, advanced SQL, partitioning, clustering, and federated queries
  • Solid understanding of PostgreSQL, including PL/pgSQL, query optimization, and advanced functions
  • Familiarity with optimizing BigQuery performance through reservations, slots, and cost-effective query techniques
  • Excellent problem-solving skills and ability to optimize high-volume data workflows
  • Strong communication skills to collaborate effectively with cross-functional teams
  • Strong drive to experiment, learn and improve your skills
  • Respect for the craft—you write self-documenting code with modern techniques
  • Great written communication skills—we do a lot of work asynchronously in Slack and Google Docs
  • Empathy for our users—a willingness to spend time understanding their needs and difficulties is central to the team
  • Desire to be part of a compact, fun, and hard-working team
 
Preferred:

  • Experience integrating BigQuery ML for advanced machine learning use cases, including regression, classification, and time-series forecasting (see the sketch below)
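
As context for this preferred skill, a BigQuery ML time-series model can be trained and queried entirely from SQL. This is a minimal sketch assuming a hypothetical analytics.daily_revenue table with day and revenue columns:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Train an ARIMA_PLUS forecasting model in place in BigQuery;
# the dataset and column names here are hypothetical.
client.query("""
CREATE OR REPLACE MODEL analytics.revenue_forecast
OPTIONS (model_type = 'ARIMA_PLUS',
         time_series_timestamp_col = 'day',
         time_series_data_col = 'revenue') AS
SELECT day, revenue FROM analytics.daily_revenue
""").result()

# Query the next 30 days of forecasts from the trained model.
for row in client.query(
        "SELECT * FROM ML.FORECAST(MODEL analytics.revenue_forecast, "
        "STRUCT(30 AS horizon))").result():
    print(row.forecast_timestamp, row.forecast_value)
```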

Additional Information: At this time, O'Reilly Media Inc. is not able to provide visa sponsorship or any other immigration support (e.g., H-1B, STEM OPT, CPT, EAD, and permanent residency processes)