
Closed
Posted
Paid on delivery
Senior Data Pipeline / ETL Engineer (FastAPI, PostgreSQL, OpenSearch) – Build MVP Data Ingestion Pipeline for Financial Intelligence Platform

Overview

We are building a financial intelligence infrastructure platform that aggregates global corporate registries, sanctions lists, and ownership data to produce compliance-grade investigative reports. The GitHub repository, architecture documentation, and core backend services already exist. What we need now is a senior data pipeline engineer who can complete the data ingestion and normalization pipeline so that our MVP feature works perfectly.

The primary MVP workflow is: search a person or company → resolve the entity → check sanctions exposure → reconstruct ownership relationships → produce an evidence-backed report. Your role will be to build the ingestion pipeline that powers this workflow.

This is not a greenfield project. You will be working inside an existing architecture and repository.

⸻

What You Will Build

You will implement the ETL / data pipeline layer that ingests and prepares structured data for the RealScore screening workflow.

The pipeline must support ingestion of:
• Global sanctions lists (OFAC SDN, EU Consolidated, UN Sanctions)
• Corporate registry data
• Beneficial ownership data
• Structured entity datasets

The system must perform:

1. Data Ingestion

Automated ingestion jobs that pull structured datasets and load them into PostgreSQL. Requirements:
• Idempotent ingestion
• SHA-256 checksum tracking
• Version tracking for data updates
• Retry mechanisms for failed jobs

⸻

2. Data Normalization

Convert raw records into a standardized entity schema. Examples:
• normalize company names
• remove legal suffixes
• standardize jurisdictions
• normalize identifiers

The output should populate:
• entities
• identifiers
• relationships
• evidence_records

⸻
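The idempotent-ingestion requirements (SHA-256 checksums plus version tracking, so reruns are no-ops) can be sketched in a few lines. This is an illustrative, in-memory stand-in for what would be a PostgreSQL ledger table in the real pipeline; all class and method names are hypothetical:

```python
import hashlib


def sha256_of(payload: bytes) -> str:
    """Checksum used to detect whether a dataset changed since the last run."""
    return hashlib.sha256(payload).hexdigest()


class IngestionLedger:
    """Tracks which dataset versions have already been loaded (in-memory sketch;
    the real system would persist this in a PostgreSQL table)."""

    def __init__(self):
        self._seen = {}  # dataset name -> list of (version, checksum)

    def should_ingest(self, dataset: str, payload: bytes) -> bool:
        """Idempotency check: skip if the latest loaded version has the same hash."""
        checksum = sha256_of(payload)
        versions = self._seen.get(dataset, [])
        if versions and versions[-1][1] == checksum:
            return False  # unchanged since last run -> no-op
        return True

    def record(self, dataset: str, payload: bytes) -> int:
        """Register a newly ingested payload and return its version number."""
        checksum = sha256_of(payload)
        versions = self._seen.setdefault(dataset, [])
        versions.append((len(versions) + 1, checksum))
        return len(versions)
```

Retry handling would wrap the actual load step (e.g. Celery's built-in task retries), while this ledger guarantees a retried or re-scheduled job cannot double-load the same file.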
3. Entity Resolution

Implement a multi-pass entity resolution system:

Pass 1 — Deterministic
• exact identifier matches (LEI, registry IDs)

Pass 2 — Semi-deterministic
• normalized name match

Pass 3 — Probabilistic
• fuzzy matching via OpenSearch
• Jaro / similarity scoring

Goal: resolve duplicate records across datasets.

⸻

4. Relationship Construction

Build ownership and control relationships. Examples:
• company → director
• company → shareholder
• entity → sanctioned entity
• entity → related entities

Relationships must be stored so they can be used by the risk engine.

⸻

5. Pipeline Orchestration

The pipeline must support:
• scheduled ingestion jobs
• dependency ordering
• failure recovery
• logging

Suggested tools (already in the repo):
• Python
• FastAPI
• PostgreSQL
• Celery / Redis
• OpenSearch

⸻

Expected Output

When a user searches for a person or company, the system must be able to:
1. Resolve the entity
2. Check sanctions exposure
3. Trace ownership relationships
4. Generate a structured evidence report

This pipeline is the core engine powering that workflow.

⸻

Existing Stack

You will work inside an existing repository that includes:
• FastAPI backend
• PostgreSQL database
• ingestion service skeleton
• normalization schemas
• entity service
• GitHub repo with architecture documentation

You will extend the current pipeline, not rebuild from scratch.
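The three resolution passes can be sketched in plain Python. This is a minimal illustration, not the project's implementation: the record schema (`lei`, `name`) and the suffix list are assumptions, and `difflib`'s similarity ratio stands in for the Jaro / OpenSearch fuzzy scoring named above, with the threshold tuned against real data:

```python
import re
from difflib import SequenceMatcher

# Hypothetical, non-exhaustive suffix list for illustration only.
LEGAL_SUFFIXES = re.compile(r"\b(ltd|llc|inc|gmbh|sa|plc|corp)\.?$", re.I)


def normalize_name(name: str) -> str:
    """Lowercase, strip legal suffixes, and collapse whitespace."""
    cleaned = LEGAL_SUFFIXES.sub("", name.lower()).strip(" .,")
    return re.sub(r"\s+", " ", cleaned)


def resolve(candidate: dict, known: list, fuzzy_threshold: float = 0.85):
    """Return (matched entity or None, name of the pass that matched)."""
    # Pass 1 — deterministic: exact identifier match (e.g. LEI)
    for entity in known:
        if candidate.get("lei") and candidate["lei"] == entity.get("lei"):
            return entity, "deterministic"
    # Pass 2 — semi-deterministic: normalized-name match
    cname = normalize_name(candidate["name"])
    for entity in known:
        if cname == normalize_name(entity["name"]):
            return entity, "normalized_name"
    # Pass 3 — probabilistic: similarity scoring above a threshold
    best, score = None, 0.0
    for entity in known:
        s = SequenceMatcher(None, cname, normalize_name(entity["name"])).ratio()
        if s > score:
            best, score = entity, s
    if score >= fuzzy_threshold:
        return best, "probabilistic"
    return None, "unresolved"
```

Returning the pass name alongside the match is what makes each merge auditable, which matters for the evidence-backed reports the workflow must produce.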
⸻

Required Experience

Minimum requirements:
• 5+ years building production ETL pipelines
• Python data engineering experience
• PostgreSQL data modeling
• search engines (OpenSearch or Elasticsearch)
• experience with large structured datasets

Preferred experience:
• sanctions / compliance data
• entity resolution systems
• corporate registry datasets
• financial intelligence or AML systems

⸻

Deliverables

You will deliver:
• fully functional ingestion pipelines
• normalized entity datasets
• entity resolution implementation
• pipeline orchestration
• documentation inside the repo

The pipeline must run via:

make seed
make pipeline

and populate the database correctly.

⸻

Important

This is a serious engineering role, not a simple script task. We are looking for someone who can think like a data systems architect, not just write quick ETL scripts.

Please include in your proposal:
1. Examples of ETL pipelines you have built
2. Your experience with entity resolution
3. Experience with large data ingestion systems
4. Your GitHub profile

⸻

Budget

Open to a fixed price or milestone structure depending on experience. We prioritize quality and reliability over the lowest bid.

⸻

If you have experience building serious data pipelines and investigative data systems, we would like to hear from you. All proposals must include GitHub links showing your past work specifically with pipelines.
Project ID: 40356113
141 proposals
Remote project
Active 1 day ago
141 freelancers are bidding an average of $510 USD for this job

⭐⭐⭐⭐⭐ Build Data Ingestion Pipeline for Financial Intelligence Platform ❇️ Hi My Friend, I hope you're doing well. I just reviewed your project details and see you are looking for a Senior Data Pipeline Engineer. You don’t need to look any further; Zohaib is here to help you! My team has handled over 50 similar projects in data engineering. I will efficiently create the ingestion and normalization pipeline you need to ensure the MVP feature works seamlessly. ➡️ Why Me? I can easily build your data ingestion pipeline as I have over 5 years of experience in ETL processes, Python programming, and PostgreSQL data modeling. My expertise also includes entity resolution and working with large structured datasets, ensuring a robust solution for your financial intelligence platform. ➡️ Let's have a quick chat to discuss your project in detail. I can show you samples of my previous work. Looking forward to connecting with you! ➡️ Skills & Experience: ✅ ETL Pipeline Development ✅ Python Programming ✅ PostgreSQL Data Modeling ✅ OpenSearch Integration ✅ Data Normalization ✅ Entity Resolution ✅ Data Ingestion ✅ API Development ✅ Data Quality Assurance ✅ Pipeline Orchestration ✅ Celery / Redis Management ✅ Documentation and Reporting Waiting for your response! Best Regards, Zohaib
$350 USD in 2 days
7.9

Hi I can extend your existing FastAPI, PostgreSQL, and OpenSearch stack to deliver a production-grade ingestion and normalization pipeline for the MVP screening workflow. My background includes Python ETL systems, idempotent ingestion jobs, entity normalization, relationship modeling, search indexing, and data workflows that must stay reliable under repeated updates. The main technical challenge here is not just loading sanctions and registry datasets, but resolving overlapping records across sources without corrupting lineage, evidence, or ownership chains. I solve that by building checksum-aware ingestion, versioned raw-to-normalized transforms, deterministic and fuzzy entity resolution layers, and reproducible orchestration with strong logging and recovery paths. I can work inside an existing repository, preserve your current architecture, and implement the pipeline so it runs cleanly through make seed and make pipeline. My focus would be stable dataset ingestion, normalized entity graphs, explainable evidence records, and relationship construction that the risk engine can trust. This is the kind of backend data engineering work I’m comfortable with because it requires both system design discipline and careful implementation detail. Thanks, Hercules
$500 USD in 7 days
6.4

I've done something very similar recently: building ETL pipelines for compliance data (sanctions + entity graphs) using FastAPI, Postgres, OpenSearch, and Celery. Which datasets are highest priority for the MVP, so we can design ingestion order and dependencies correctly? Do you already have a canonical entity schema finalized, or should I refine the normalization + relationship models? I suggest using hash-based upserts with version tables to ensure idempotency and a full audit trail. I also suggest separating the deterministic and fuzzy resolution pipelines with scoring thresholds, to improve accuracy and avoid noisy merges. I will first audit your repo, finalize the schema, and implement ingestion with checksum/versioning. Then I will build normalization + multi-pass entity resolution with OpenSearch scoring. Finally I will wire up orchestration and logging and validate end-to-end via make pipeline. Best, Dev S.
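The hash-based upsert with version tables suggested here can be sketched as follows. SQLite stands in for PostgreSQL so the sketch is self-contained, and the table and column names are illustrative; in Postgres the same idea is usually expressed with `INSERT ... ON CONFLICT`:

```python
import hashlib
import sqlite3


def upsert_record(conn, source_id: str, payload: str) -> bool:
    """Append a new version only when the payload hash changed.
    Returns True if a row was written, False for an idempotent no-op."""
    digest = hashlib.sha256(payload.encode()).hexdigest()
    row = conn.execute(
        "SELECT checksum FROM record_versions WHERE source_id=? "
        "ORDER BY version DESC LIMIT 1", (source_id,)).fetchone()
    if row and row[0] == digest:
        return False  # unchanged -> rerunning the job writes nothing
    next_version = 1 if row is None else conn.execute(
        "SELECT MAX(version) FROM record_versions WHERE source_id=?",
        (source_id,)).fetchone()[0] + 1
    conn.execute(
        "INSERT INTO record_versions (source_id, version, checksum, payload) "
        "VALUES (?,?,?,?)", (source_id, next_version, digest, payload))
    return True


# Illustrative version table: every change is appended, never overwritten,
# which preserves the audit trail compliance workflows need.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record_versions ("
             "source_id TEXT, version INTEGER, checksum TEXT, payload TEXT)")
```

Because old versions are never mutated, the same table doubles as the audit trace the proposal mentions.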
$700 USD in 10 days
6.4

Hello there, I will build the full ingestion pipeline — sanctions list loaders (OFAC SDN, EU Consolidated, UN), corporate registry ingestion, normalization layer, and multi-pass entity resolution — all wired into your existing FastAPI/PostgreSQL/OpenSearch stack so `make seed` and `make pipeline` work cleanly. For entity resolution, I will structure the three passes as separate Celery stages with dependency ordering — deterministic ID matching first, then normalized name dedup, then OpenSearch fuzzy scoring with Jaro-Winkler. Running them as discrete stages means you get full auditability on which pass linked each record, which matters for compliance-grade evidence trails. I am happy to share relevant pipeline examples and my GitHub in our chat. Ready to start whenever you are. Kamran
$270 USD in 10 days
5.6

Hi, I’m a senior data engineer with 7+ years building production ETL pipelines for structured and semi-structured datasets, including financial intelligence and compliance data. I have extensive experience designing idempotent ingestion pipelines, multi-pass entity resolution, and relationship construction workflows. I can extend your existing FastAPI/PostgreSQL/OpenSearch stack to: • Automate ingestion of global sanctions lists, corporate registries, and ownership datasets with SHA-256 checksums, version tracking, and retries. • Normalize raw records to a standardized entity schema (company names, jurisdictions, identifiers). • Implement deterministic, semi-deterministic, and probabilistic entity resolution with fuzzy matching and similarity scoring. • Build ownership and control relationships usable by your risk engine. • Orchestrate scheduled ingestion jobs with Celery, handle dependencies, failures, and logging. I have previously delivered similar pipelines that power investigative reporting, AML screening, and entity analytics. I follow strict data engineering and architectural best practices, ensuring maintainable, testable, and production-ready pipelines. I can start immediately and will deliver fully functional pipelines integrated into your repo, ready to run via make seed and make pipeline.
$250 USD in 5 days
5.7

Hello, I’ve carefully reviewed your project and am excited about the opportunity to work with you. With over 7 years of experience building production-grade ETL pipelines and financial data systems, I specialize in designing ingestion and entity resolution engines that power high-integrity compliance workflows. I am confident I can complete your ingestion and normalization pipeline within your existing FastAPI, PostgreSQL, Celery, and OpenSearch architecture efficiently and effectively. Here’s my approach: implement robust ETL jobs with idempotency, checksum validation, version tracking, and retries for all sanctions lists and registry datasets; build normalization layers that standardize names, identifiers, and jurisdictions into your existing schemas; develop deterministic, semi-deterministic, and probabilistic entity resolution with OpenSearch scoring to unify records across datasets. I am available to start immediately and aim to deliver a functional MVP pipeline within 21 days. Additional notes: I will follow your repo architecture and extend the existing ingestion services, document the entire pipeline, and provide GitHub-linked examples of my past ETL systems. Best regards, Jushua
$555 USD in 3 days
5.8

Hi, I have strong experience in Bubble, AI-powered app development, API integrations, and building scalable subscription-based platforms with high user engagement. I have real hands-on experience creating full-featured apps in Bubble including user profiles, content libraries, gamification systems, AI chat integration, notifications, and third-party integrations, ensuring a smooth user experience and a structure that can grow with the product. You can expect clear communication, fast turnaround, and a high-quality result that fits seamlessly into your existing workflow. Best regards, Juan
$500 USD in 3 days
5.8

⭐Hi, I’m ready to assist you right away!⭐ I believe I’d be a great fit for your project since I have over 5 years of experience building production ETL pipelines using Python, PostgreSQL, and OpenSearch. My expertise in data engineering and experience with large structured datasets make me the ideal candidate to complete the data ingestion and normalization pipeline for your financial intelligence platform. I have a proven track record of implementing complex data pipelines and ensuring the seamless flow of data to support compliance-grade investigative reports.
$400 USD in 9 days
5.4

Hi, I have worked on many similar ETL tasks with Python, FastAPI, PostgreSQL, OpenSearch/Elasticsearch, and Redis with Celery queues, including entity extraction and resolution. Past projects include an ETL pipeline with entity extraction using spaCy and data ingestion into Postgres and Elasticsearch, using NATS instead of Kafka. In another project I scraped relevant articles, news, and bulletins for a list of ERP software products, extracted all kinds of entities, and stored them in Elasticsearch and Postgres using Redis and Celery queues. Many more projects exactly similar to your requirements. Let's connect.
$700 USD in 15 days
5.5

Hello, I can deliver what you need. I went through your project details and found that I worked on almost exactly the same task about two months ago. I am an experienced, specialized freelancer with 6+ years of practical experience in Python, FastAPI, and PostgreSQL, and I'm able to complete and deliver this project promptly. Feel free to visit my profile to check my latest work and feedback from clients. Let's make this great together; please connect in chat. Talk soon.
$750 USD in 7 days
5.1

Build the ETL layer that ingests global sanctions, corporate registry, and ownership datasets into PostgreSQL, normalizes them to a canonical entity schema, runs deterministic → semi-deterministic → probabilistic entity resolution (OpenSearch + similarity scoring), and writes relationships and evidence_records with SHA-256 checksums, versioning, and retryable, idempotent jobs exposed via make seed / make pipeline.

Practical pitfall to flag: mutating canonical entities during ingestion destroys provenance and makes rollbacks and audits impossible. Keep immutable raw tables, produce versioned normalized records, and run resolution as a separate, idempotent pass that emits relationship edges and evidence with checksums.

Relevant proof: pipeline examples exist in private GitHub repos (regulated-data ETL and resolution work). Access can be granted under NDA or shared as sanitized excerpts.

Implementation approach in brief:
- Celery tasks + Redis for orchestration and retries; Celery beat or a dependency DAG for ordering.
- Ingest → raw tables with checksum/version → normalization layer (cleaning, suffix removal, jurisdictions) → multi-pass resolution (identifier-first, normalized-name, OpenSearch fuzzy + Jaro) → relationship builder → evidence_records.
- Add integration tests and repo docs; ensure the make targets populate the DB.

Requested items:
1) Examples: private pipeline repos available on request.
2) Entity resolution: the multi-pass design above, with deterministic fallback and OpenSearch scoring.
3) Large ingestion: streamed loads, batched upserts with checksums, and version tracking.
4) GitHub: private; can share access.

Can you add read access to the repo and a small sample sanctions/registry file, so a 48-hour plan and concrete make pipeline steps can be outlined?
$500 USD in 7 days
4.8

I can help you. I will implement the multi-pass entity resolution by leveraging OpenSearch’s phonetic analyzers (n-grams/Double Metaphone) to catch transliteration variances in global sanctions lists that standard fuzzy matching often misses. A hidden problem in beneficial ownership data is "circular ownership" and recursive depth; I will design the relationship construction logic to be graph-traversal ready within PostgreSQL to ensure your risk engine doesn't hang on complex parent-subsidiary loops. To ensure the "evidence-backed" requirement, I’ll implement a temporal versioning layer so that your reports can recreate the ownership state as it existed at the exact time a sanction was issued, rather than just the current state.
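The circular-ownership concern raised here can be illustrated with a cycle-guarded traversal. This is a sketch with hypothetical names, walking an in-memory edge map rather than the database; inside PostgreSQL the equivalent is a recursive CTE, where the `CYCLE` clause (PostgreSQL 14+) provides the same loop protection:

```python
def ownership_chain(edges: dict, root: str) -> list:
    """Walk parent -> subsidiary edges breadth-first, skipping entities already
    visited so circular ownership (A owns B, B owns A) cannot loop forever."""
    seen, order, queue = {root}, [], [root]
    while queue:
        entity = queue.pop(0)
        order.append(entity)
        for child in edges.get(entity, []):
            if child not in seen:  # cycle guard: each entity expands once
                seen.add(child)
                queue.append(child)
    return order
```

For example, with `{"A": ["B"], "B": ["C", "A"], "C": []}` the traversal visits A, B, C exactly once despite the A↔B loop, which is the property that keeps a risk engine from hanging on complex parent-subsidiary structures.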
$250 USD in 7 days
4.9

Hi there, I’ve reviewed your project and understand you need a senior data pipeline engineer to complete the ETL layer powering your financial intelligence MVP. The focus here is building a robust ingestion and normalization pipeline within your existing FastAPI and PostgreSQL architecture, ensuring accurate entity resolution, sanctions screening, and relationship mapping. I’ll design idempotent ingestion jobs with checksum and version tracking, along with resilient retry mechanisms so your datasets remain consistent, traceable, and production-ready. I have strong experience building production ETL systems handling large structured datasets, including normalization pipelines, schema design, and multi-pass entity resolution using deterministic and fuzzy matching techniques. I can implement efficient pipelines using Python, Celery, Redis, and OpenSearch, ensuring proper orchestration, dependency handling, and logging. I’ll work directly within your existing repository, extending services rather than rebuilding, and ensure everything runs seamlessly via your make commands. I can also provide clean documentation and a walkthrough of the pipeline design. Happy to share relevant pipeline work and GitHub examples once we connect. Best regards, Muhammad Adil Portfolio: https://www.freelancer.com/u/webmasters486
$450 USD in 6 days
5.0

Hello. I came across your project, Build Data Ingestion Pipeline, and it aligns well with my background. I have hands-on experience with Python, PostgreSQL, and Redis that is directly relevant here. Feel free to reach out if you have questions.
$250 USD in 7 days
4.4

We’ve carefully reviewed your requirements and fully ✅understand that this is a production-grade data ingestion + entity resolution pipeline ✨ not a simple ETL script. The goal is to power a financial intelligence workflow with accurate entity resolution, sanctions matching, and relationship mapping. We have experience building ⚡Python-based ETL systems with FastAPI, PostgreSQL, and OpenSearch for large structured datasets, so we can execute this with architectural precision.

Deliverables
1️⃣ Idempotent ingestion pipelines (sanctions + registry datasets)
2️⃣ Normalized entity schema (entities, identifiers, relationships, evidence)
3️⃣ Multi-pass entity resolution (deterministic + fuzzy via OpenSearch)
4️⃣ Relationship graph construction (ownership, control, sanctions links)
5️⃣ Pipeline orchestration (Celery/Redis, retries, logging, scheduling)
6️⃣ Clean repo integration + documentation (make seed, make pipeline)

Workflow
1️⃣ Review existing repo, schemas, ingestion skeleton
2️⃣ Implement ingestion with checksum + version tracking
3️⃣ Build normalization + standardization layer
4️⃣ Develop entity resolution (exact → fuzzy matching)
5️⃣ Construct relationship mappings + validation
6️⃣ Add orchestration, logging, and failure recovery

Relevant Experience
✔ ETL pipelines for structured + semi-structured datasets
✔ Entity resolution using deterministic + fuzzy matching
✔ High-volume ingestion systems with scheduling + retries

With regards, Harshvir Singh
$500 USD in 12 days
5.0

As a tech enthusiast with a passion for programming, I believe my 7 years of software development experience across various domains make me an ideal candidate for this project. I have extensive experience in ETL and data engineering, particularly with Python, one of the main technologies used in this project, as well as familiarity with large structured datasets and PostgreSQL data modeling. Lastly, choosing me means selecting someone dedicated not only to delivering on time but also to meeting your expectations accurately. Client satisfaction is pivotal to me, and I take great pride in creating solutions that not only solve problems but enhance efficiency and productivity. I am ready to leverage my capabilities comprehensively: delivering fully functional pipelines, normalized entity datasets, and pipeline orchestration, and ensuring documentation is thorough and easily accessible for future additions or updates. Let's build a powerful financial intelligence infrastructure together!
$250 USD in 7 days
6.2

Hey! I can take ownership of your ETL pipeline and deliver a production-grade ingestion and normalization layer that reliably powers your RealScore workflow. I have 5+ years of experience building scalable data pipelines with Python, PostgreSQL, and OpenSearch, including entity resolution and large dataset ingestion.
- Build idempotent ingestion pipelines with checksum/version tracking and retry logic.
- Implement robust normalization into structured entity schemas (entities, identifiers, relationships, evidence).
- Design multi-pass entity resolution (deterministic + fuzzy matching with OpenSearch/Jaro scoring).
- Construct ownership and sanctions relationships for downstream risk analysis.
- Orchestrate pipelines with Celery/Redis, ensuring scheduling, logging, and failure recovery.
- Integrate cleanly into your existing FastAPI architecture and repo.
I’ve worked on high-volume ETL systems and understand the importance of accuracy, traceability, and performance in compliance-grade platforms. I can share relevant pipeline and data engineering work via GitHub. Ready to review your repo and start immediately. Best Regards, Muhammad Tahir Iqbal.
$700 USD in 13 days
4.0

Leveraging over 5 years of hands-on experience building robust, scalable, and high-performing ETL pipelines using Python and PostgreSQL, I am well positioned to complete your data ingestion and normalization pipeline with precision. A solid grounding in the surrounding technologies (Pyramid backends, Elasticsearch, Redis, etc.) and familiarity with your project stack will also go a long way toward fast implementation. As an AI specialist experienced with large structured datasets like those you mention, I can handle the complex requirement of normalizing data into a standardized entity schema while ensuring data integrity. Importantly for your use case, my past experience building entity resolution systems gives me a thorough understanding of how to make your workflow lookups quick and accurate. Moreover, having worked on many projects involving financial compliance data over my 8+ year IT career, I understand the finer nuances crucial for a system that must meet compliance-grade standards. With me on this project, you can count on timely delivery accompanied by comprehensive documentation inside the repo for easy maintainability. Let's build a highly efficient and reliable architecture that powers your MVP workflow!
$500 USD in 7 days
4.2

Hi, I’m Karthik, a Senior Data Engineer with 15+ years of experience building production-grade ETL/data pipelines, PostgreSQL-backed platforms, API-driven systems, and scalable backend workflows. Your requirement fits well with my experience in structured data ingestion, normalization, entity linking, and evidence-driven reporting systems.

Relevant experience:
• Built Python/FastAPI-based ingestion pipelines for large structured datasets
• Designed idempotent ETL jobs with checksum/version tracking, retries, and audit logging
• Strong PostgreSQL data modeling for entities, identifiers, relationships, and evidence layers
• Worked with OpenSearch/Elasticsearch for fuzzy search, ranking, and record matching
• Experience handling compliance-style datasets, ownership mapping, and risk-oriented workflows

How I’ll approach this:
1. Review the existing repo, schemas, and ingestion skeleton
2. Implement robust ingestion for sanctions + registry datasets
3. Build normalization + standardized entity schema population
4. Add multi-pass entity resolution (deterministic, normalized, fuzzy)
5. Construct ownership/control relationships for the downstream risk engine
6. Wire up orchestration, recovery, logging, and repo documentation

I can work within your current architecture and deliver a reliable MVP pipeline runnable via make seed and make pipeline. My GitHub profile and relevant pipeline examples can be shared privately during proposal discussion.

Warm Regards, Karthik B
$750 USD in 7 days
5.3

Hello, I see that your MVP hinges on a robust ingestion layer that can absorb sanctions lists, corporate registries, and ownership data into your existing FastAPI-PostgreSQL architecture with precision. I’ve delivered similar pipelines before, including a sanctions ingestion framework for a compliance vendor and an ownership‑resolution pipeline that consolidated multi‑registry datasets into a unified entity graph. I understand the deeper challenge here isn’t just fetching datasets but ensuring idempotency, version tracking, and consistent normalization so entity resolution produces stable identifiers across ingestion cycles. Weakness in any of these layers creates drift in downstream risk scoring. I’ll implement ingestion jobs with checksum validation, structured loaders into PostgreSQL, and normalization routines aligned with your existing schemas. I’ll add deterministic, semi‑deterministic, and probabilistic resolution backed by OpenSearch scoring, then construct ownership paths for the RealScore workflow. Everything will be orchestrated via Celery with clean logging and retry logic. Before starting, I’ll review the existing repository to confirm data model alignment, pipeline entry points, and orchestration assumptions. I can begin immediately and outline the integration plan once I inspect the repo. Best regards, John allen.
$500 USD in 7 days
3.9

Atlanta, United States
Payment method verified
Member since Jan 9, 2026