
Open
Posted
•
Ends in 2 days
Paid on delivery
I’m sitting on a large database that stores semi-structured records, and I need a robust transformation layer that turns this raw content into analysis-ready tables. The data is already captured and stored; the task begins once the records land in the database and ends when the transformed results are written back to a target schema (or files, if that proves more efficient).

Key points you should know
• Source: relational database containing nested JSON / key-value blobs.
• Goal: parse, normalize, and flatten these blobs into well-defined columns while preserving relationships and lineage.
• Scale: millions of rows, so solutions that leverage Spark, Hadoop, BigQuery, Snowflake, or well-tuned SQL/Python pipelines are welcome, as long as they remain maintainable.

Deliverables
1. Transformation code (Python, PySpark, SQL, or Scala) with clear comments.
2. A runnable job definition or workflow file (Airflow DAG, Spark submit script, dbt model, etc.) that shows how to execute the pipeline end-to-end.
3. A simple README explaining prerequisites, run steps, and how new fields should be added in future.

Acceptance criteria
• Pipeline processes at least 10 GB of source data without errors.
• Output tables/files match the target schema I’ll provide and contain no missing or malformed records.
• Execution can be parameterized for date ranges or incremental loads.

If you’ve built similar ETL or ELT jobs against semi-structured data and can demonstrate performance at scale, I’d love to see your approach.
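For illustration, a minimal PySpark sketch of the parse/normalize/flatten step the brief describes could look like the following. All table, column, and schema names here (raw_records, payload_json, analytics.records_flat) and the payload structure are assumptions for the example, not the client's actual schema:

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.appName("flatten_semi_structured").getOrCreate()

# Declare the blob's expected shape up front (hypothetical fields shown here):
# malformed records then surface as NULL columns instead of failing the run.
payload_schema = StructType([
    StructField("customer", StructType([
        StructField("id", LongType()),
        StructField("name", StringType()),
    ])),
    StructField("event_type", StringType()),
])

raw = spark.table("raw_records")  # source table holding the raw JSON blobs

flat = (
    raw.withColumn("payload", F.from_json(F.col("payload_json"), payload_schema))
       .select(
           F.col("record_id"),  # carry the source key forward to preserve lineage
           F.col("payload.customer.id").alias("customer_id"),
           F.col("payload.customer.name").alias("customer_name"),
           F.col("payload.event_type").alias("event_type"),
       )
)

flat.write.mode("overwrite").saveAsTable("analytics.records_flat")

Keeping the source key on every output row is one simple way to satisfy the lineage requirement at table level; column-level lineage would need a metadata catalog on top.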
Project ID: 40359210
6 proposals
Open for bidding
Remote project
Active 3 days ago
Set your budget and timeline
Get paid for your work
Write your proposal
It's free to sign up and bid on jobs
6 freelancers are bidding on average ₹14,295 INR for this job

I understand that you're facing a significant challenge with transforming semi-structured data into analysis-ready tables, especially given the scale of millions of rows. A robust transformation layer is essential to ensure that you can efficiently parse, normalize, and flatten nested JSON/key-value blobs while preserving their relationships. With over 12 years of experience in building ETL and ELT pipelines using tools like Python, PySpark, and SQL, I am adept at leveraging technologies such as Spark and AWS to manage large datasets. I will provide clear transformation code along with a runnable job definition (potentially an Airflow DAG) and an informative README for future scalability. My approach focuses on ensuring error-free processing of at least 10 GB of source data while maintaining compliance with your target schema. Could you share more about the specific relationships within your data that need to be preserved during this transformation?
₹12,500 INR in 7 days
4.6

Hi [ClientFirstName],

I’ve read your Semi-Structured Data Transformation brief and I’m confident we can build a robust, scalable layer that turns nested JSON/key-value blobs into analysis-ready tables while preserving lineage and relationships. I’ve led similar ETL/ELT efforts over multi-terabyte datasets, using PySpark, SQL-based pipelines, and modular transformation layers that are easy to maintain and extend.

My approach starts from the landing records in your relational store, applies a schema-aware parser to flatten and normalize fields, preserves lineage through metadata catalogs, and outputs to a clearly defined target schema or, if preferred, flat files for downstream analytics. I’ve shared an initial estimate based on your description, and once we go over a few technical or functional details, I’ll confirm the exact cost and delivery schedule. The pipeline will be designed to scale to millions of rows and at least 10 GB per run, with parameterized date ranges and incremental loads. I will provide runnable code (Python/PySpark or SQL), a workflow definition (Airflow DAG or Spark submit), and a concise README for future field additions.

What is your preferred target schema format and your lineage metadata approach (e.g., using a data catalog, column-level lineage, or table-level provenance) for the transformed data?

Best regards,
Asad
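As a sketch of the parameterized incremental loading this proposal mentions (and the brief's acceptance criteria require), assuming a hypothetical ingestion_date column on the source table; the flag names and table names are illustrative:

import argparse
from pyspark.sql import SparkSession, functions as F

# Date-window parameters so an orchestrator can pass them per run.
parser = argparse.ArgumentParser()
parser.add_argument("--start-date", required=True, help="inclusive, YYYY-MM-DD")
parser.add_argument("--end-date", required=True, help="exclusive, YYYY-MM-DD")
args = parser.parse_args()

spark = SparkSession.builder.appName("incremental_flatten").getOrCreate()

# Read only the slice of raw records that falls inside the requested window.
incremental = (
    spark.table("raw_records")
    .where(F.col("ingestion_date") >= args.start_date)
    .where(F.col("ingestion_date") < args.end_date)
)

flat = incremental.select("record_id", "ingestion_date")  # flatten step elided for brevity

# Append this window's output into a date-partitioned target table.
flat.write.mode("append").partitionBy("ingestion_date").saveAsTable("analytics.records_flat")

One caveat with plain append mode: re-running an overlapping window duplicates rows, so switching to overwrite mode with spark.sql.sources.partitionOverwriteMode=dynamic (which replaces only the partitions in the window) is a common guard.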
₹7,770 INR in 1 day
3.7

Using my extensive skills in Excel and Data Analysis, I can revolutionize your semi-structured data transformation process. With over 17 years as a professional freelancer under my belt, I've handled projects worth €500,000+ with satisfied clients from over 200 countries.

You might be sitting on a large database, but rest assured I can parse and transform it into well-defined columns while preserving relationships and lineage, a crucial aspect for the success of any analysis. Suffice it to say that dealing with millions of rows is second nature to me, and I have hands-on experience using powerful tools like Spark, Hadoop, BigQuery, Snowflake, and SQL/Python-based pipelines to manage such scaling issues without sacrificing maintainability. It's all about finding the right balance between performance and reliability, which I'm proficient at.

The end result will be transformed tables/files mirroring your target schema down to the last detail. I'll also ensure you receive a job definition/workflow file that allows you to execute the pipeline end-to-end effortlessly. My round-the-clock availability means I can work according to your timezone. Hire me now to stop worrying about missing or malformed records; you will get a seamless process with solid documentation for any future updates. Let's get started on this transformative journey!
₹30,000 INR in 63 days
3.5

Hello Client,

I’ll deliver your robust transformation layer with precision and efficiency, ensuring smooth performance and error-free results that handle millions of rows of nested JSON data. You’ll receive well-commented Python, PySpark, or SQL code alongside a runnable workflow (Airflow DAG or dbt model) for seamless end-to-end execution. Clear documentation and setup instructions will guide you through prerequisites and future field additions.

My past work includes scalable ETL pipelines leveraging Spark and BigQuery to process large datasets with parameterized incremental loads. I focus on practical solutions, fast delivery, and professional communication, and I’m ready to start immediately and adapt to your timeline. Let’s connect today and move your project forward with confidence.

Regards,
Anton Prinsloo
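For a concrete picture of the runnable workflow deliverable this proposal refers to, a minimal Airflow 2.x DAG along these lines could drive the Spark job daily; flatten_job.py, the DAG id, and the schedule are assumptions for illustration:

from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="flatten_semi_structured",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # one run per day; each run covers one date window
    catchup=False,
) as dag:
    # Hand the run's logical date window to the Spark job as parameters,
    # matching the brief's requirement for parameterized incremental loads.
    run_transform = BashOperator(
        task_id="spark_submit_flatten",
        bash_command=(
            "spark-submit flatten_job.py "
            "--start-date {{ ds }} --end-date {{ macros.ds_add(ds, 1) }}"
        ),
    )

The two flags line up with the argparse parameters in the incremental-load sketch shown earlier, so the same job can be run ad hoc or on a schedule.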
₹10,000 INR in 14 days
0.0

Hi! I read your project "Semi-Structured Data Transformation". I’m a multi-language developer working mainly with Python, JavaScript, Java, Go, and automation-heavy projects.

What I can offer here:
- clean implementation
- clear communication
- fast turnaround
- support for fixes/adjustments after delivery

If you want, send me the core requirement or first milestone you care about most, and I’ll outline the cleanest implementation path. Available to start immediately.
₹18,500 INR in 7 days
0.0

Bagha, India
Member since Apr 9, 2026