
Closed
Paid on delivery
I need a clean, well-tested Python pipeline that listens to our MongoDB (DocumentDB) product collection, transforms each incoming SKU document, and pushes the result into Apache Solr so the search index is always in lock-step with the source data.
What the transform must do
• Enrich taxonomy on two fronts: category as well as attribute level.
• Keep only English content for now, but structure the code so Spanish or French could be switched on with minimal effort.
• Filter out the fields we don’t want surfaced (specifically the SKU number, product description and price) while leaving all other data intact.
• Merge duplicates that share the same SKU logic we will provide.
Runtime behaviour
The service should behave like a sink connector: near-real-time change-stream consumption from MongoDB, conversion, enrichment, de-duplication, then a commit into Solr using the bulk API. Failures must retry gracefully and never poison the queue.
Tech stack expectations
Python 3.10+, pymongo (or Amazon DocumentDB compatible driver), requests or Solr-py for indexing, and preferably asyncio so we do not block on network calls.
Deliverables
1. Production-ready Python source code with clear module separation.
2. Unit tests covering the enrichment, filtering and deduplication logic.
3. A README that explains local setup (Docker Compose for MongoDB + Solr is fine), environment variables, and deployment steps.
4. Sample configuration file showing where taxonomy mappings, language choices and field filters are declared.
Acceptance criteria
• End-to-end run inserts, updates and deletes in MongoDB and reflects them in Solr within the agreed latency.
• All English SKUs appear; non-English documents are ignored.
• Filtered fields never reach Solr.
• Duplicate SKUs index only once.
If you have prior experience wiring MongoDB change streams to Solr or have built similar ETL connectors, I’d love to see a short code snippet or link in your proposal.
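For anyone scoping the work, the transform rules above could be sketched roughly as follows. This is only an illustration under stated assumptions: the posting gives neither the document schema nor the taxonomy mappings, so the field names (`language`, `sku_number`, `description`, `price`, `category`, `colour`), the `CONFIG` layout and the `transform` helper are all hypothetical.

```python
from typing import Any, Optional

# Hypothetical configuration; real field names and mappings would come from the client.
CONFIG: dict[str, Any] = {
    "enabled_languages": {"en"},  # adding "es" or "fr" here would switch them on
    "excluded_fields": {"sku_number", "description", "price"},
    "category_taxonomy": {"shoes": ["Apparel", "Footwear"]},
    "attribute_taxonomy": {"colour": {"navy": "Blue"}},
}


def transform(doc: dict[str, Any], config: dict[str, Any] = CONFIG) -> Optional[dict[str, Any]]:
    """Return a document ready for indexing, or None if its language is not enabled."""
    if doc.get("language") not in config["enabled_languages"]:
        return None

    # Drop the fields that must never reach Solr; everything else is kept as-is.
    out = {k: v for k, v in doc.items() if k not in config["excluded_fields"]}

    # Category-level enrichment: attach the mapped taxonomy path when one exists.
    category = doc.get("category")
    if isinstance(category, str) and category in config["category_taxonomy"]:
        out["category_path"] = list(config["category_taxonomy"][category])

    # Attribute-level enrichment: add a normalised value next to the raw one.
    for attr, mapping in config["attribute_taxonomy"].items():
        value = doc.get(attr)
        if isinstance(value, str) and value in mapping:
            out[f"{attr}_normalised"] = mapping[value]

    return out
```

Because the SKU number is one of the filtered fields, any SKU-based merge would presumably have to run before this step, while the key is still present on the document.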
Project ID: 40439560
19 proposals
Remote project
Active 3 days ago
19 freelancers are bidding on average ₹1,046 INR for this job

Hi, I can build a production-ready Python pipeline that listens to MongoDB/DocumentDB change streams, transforms SKU documents, performs enrichment + deduplication, and syncs data into Apache Solr in near real time. I have 10+ years of full-stack development experience building ETL pipelines, async processing systems, API integrations, indexing workflows, and scalable backend services. Your requirements around enrichment, language filtering, retry handling, and Solr bulk indexing align well with systems I’ve developed before.
The solution will include:
- Async Python 3.10+ architecture
- MongoDB/DocumentDB change stream listener
- Taxonomy enrichment layer
- Configurable language filtering (English-first with extensible i18n structure)
- Field exclusion pipeline for SKU, description, and price
- Deduplication engine based on shared SKU logic
- Bulk Solr indexing with retry/backoff handling
- Structured logging and fault-tolerant queue processing
- Modular codebase with unit tests
- Docker Compose setup for local testing
- README + sample config files
I’ll ensure inserts, updates, and deletes remain synchronized with Solr while maintaining low-latency processing and safe retry handling. I can start immediately and deliver clean, maintainable, production-grade code. Thanks
₹1,500 INR in 7 days
3.8

Hi, Your description cuts off midway, but I understand the core need: a clean, well-tested pipeline that listens to your MongoDB/DocumentDB collection and indexes documents into Solr. That's exactly the foundation you need for fast, reliable search. I'd structure this with PyMongo's Change Streams for real-time document capture and batch indexing to Solr via the JSON API. Unit tests isolate extraction logic and indexing separately, using pytest fixtures for both systems to avoid brittle test coupling. The pipeline itself is a single Python script with configurable batch size and error recovery for partial index failures. Before we start, one clarification: is your DocumentDB configured for Change Streams, or do you need a polling-based sync? That answer shapes the whole architecture. I can deliver a working scaffold and test suite within 24 hours of confirmation. Best regards, Val
₹600 INR in 7 days
1.8

APIE Tech has built production ETL pipelines connecting MongoDB change streams to search indices including Solr and Elasticsearch — this is a strong match for our experience. Our approach: Python 3.10+ asyncio service using pymongo change stream listener for near-real-time CDC from MongoDB/DocumentDB. Each document goes through your transform pipeline: taxonomy enrichment at category + attribute level, English-only filtering (with pluggable locale config for Spanish/French), field exclusion (SKU number, description, price), and SKU-based deduplication logic. Enriched documents commit to Solr via bulk API with retry/backoff on failures. Deliverables as specified: production-ready modular Python source, unit tests for enrichment/filtering/dedup, Docker Compose README, and a sample config file for taxonomy mappings and field filters. We have prior experience wiring MongoDB change streams to Solr — happy to share a code snippet. Can start immediately, deliver in 7 days.
₹1,500 INR in 7 days
0.0

Hi there, Building resilient change-stream pipelines from DocumentDB to search indexes is my core expertise. As a Python backend engineer, I know that maintaining lock-step synchronization without blocking network calls requires a robust asynchronous architecture to prevent queue poisoning and handle cursor timeouts gracefully. Here is my exact approach for your pipeline:
Async Ingestion: I will use motor (the async driver built on pymongo) and asyncio to listen to the DocumentDB change stream without blocking.
Config-Driven Transformation: Language gating (English only), field dropping (SKU, description, price), and taxonomy enrichment will be strictly driven by a config.yaml. This guarantees switching on Spanish/French later is a simple configuration tweak, not a code rewrite.
Smart Buffering & Solr Commits: The pipeline will buffer deduplicated documents in-memory and flush them to Solr’s bulk API using async HTTP, ensuring high throughput and resilience.
I will deliver a production-ready repository complete with PyTest unit tests and a Docker Compose setup (Mongo + Solr) for instant local validation. Let’s connect to discuss your specific deduplication logic! Best regards, Chirag Bisht
₹700 INR in 7 days
0.0
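The buffering-and-flush idea in the bid above, combined with the brief's requirement that failures retry gracefully and never poison the queue, might look something like the sketch below. It is not taken from any bidder's code: `flush_with_retry`, the injected `send` coroutine (standing in for an async HTTP call to Solr's update endpoint) and the in-memory `dead_letter` list are hypothetical names chosen for illustration.

```python
import asyncio
from typing import Any, Awaitable, Callable

Batch = list[dict[str, Any]]


async def flush_with_retry(
    batch: Batch,
    send: Callable[[Batch], Awaitable[None]],
    dead_letter: list[Batch],
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> bool:
    """Send one batch, retrying with exponential backoff.

    After max_attempts failures the batch is parked in dead_letter, so a bad
    batch cannot block the batches queued behind it.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            await send(batch)
            return True
        except Exception:  # a real service would catch narrower, transport-level errors
            if attempt < max_attempts:
                # Back off: base_delay, then 2x, 4x, ... before the next attempt.
                await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    dead_letter.append(batch)
    return False
```

Injecting `send` keeps the retry logic testable without a running Solr instance; a production version would likely persist the dead-letter batches rather than hold them in memory.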

A seamless integration of change streams from MongoDB to Apache Solr is critical for maintaining real-time accuracy in your product search index. Leveraging Python 3.10+ and asyncio will ensure non-blocking, efficient data processing throughout the pipeline. Implementing robust error handling will guarantee graceful retries without queue poisoning, addressing your key pain point. Delivered in 14 days, you'll receive production-ready code with extensive unit tests for enrichment, filtering, and deduplication logic, along with a clear README for setup. What does success look like for you at the end of this project?
₹925 INR in 14 days
0.0

Hello, I can build a clean, well-tested Python pipeline to sync your MongoDB/DocumentDB product collection with Apache Solr in near real time. I will create a sink-connector-style service using Python 3.10+, pymongo for change streams, async processing for non-blocking execution, and Solr bulk APIs for indexing. The pipeline will handle inserts, updates, and deletes from MongoDB, transform incoming SKU documents, enrich taxonomy at category and attribute levels, remove restricted fields, de-duplicate SKUs based on your provided logic, and commit the final data into Solr. The solution will be configuration-driven, so English will be enabled for the first release while Spanish/French can be added later with minimal code changes. Fields like SKU number, product description, and price will be strictly filtered before indexing.
Deliverables include:
- Production-ready Python source code with clear modules
- MongoDB change-stream listener
- Transformation, enrichment, filtering, and deduplication logic
- Solr indexing client with retry/error handling
- Unit tests for enrichment, filtering, and duplicate handling
- README with Docker Compose setup for MongoDB + Solr
- Sample config file for taxonomy mappings, languages, and field filters
I have experience building Python backend services, data pipelines, API integrations, and database-driven systems with clean documentation and testing. Best regards, Pavitra Srivastava
₹900 INR in 2 days
0.0

With over seven years of professional experience, including ETL tasks and data pipeline projects, I am well-equipped to take on the task of building your Python data pipeline to connect MongoDB with Apache Solr. I have a deep understanding of and hands-on experience with the tech stack you require: Python 3.10+, pymongo (or Amazon DocumentDB compatible driver), requests or Solr-py for indexing, and asyncio. Additionally, I'm adept at writing clean, well-documented, and tested production-ready code using modular design. When it comes to the runtime behavior you outlined, I understand the importance of reliable near-real-time data consumption without blocking on network calls. My skill in using asyncio will allow us to handle network calls efficiently. I have developed many pipelines that required similar functionality: conversion, enrichment, de-duplication with retries in case of failure, and a straightforward user interface for configuration. Do not worry about the project running into any database or code-related issues; I know my way around them. Furthermore, my capabilities as a full-spectrum engineer extend to managing cloud-deployed AI agents (building structured outputs like those you expect in Solr) as well as automating manual workflows with intelligent, self-correcting agents, and deploying on Docker Compose too. Give me the opportunity to demonstrate these skills in my work and let's build a high-performance and robust data pipeline for your unique needs!
₹1,050 INR in 7 days
0.0

Hello, I can build a clean Python ETL pipeline to listen to MongoDB/DocumentDB change streams, transform SKU documents, filter unwanted fields, handle English-only content, merge duplicate SKUs, and index the final data into Apache Solr using bulk API. I will structure the project with separate modules for MongoDB listener, transformation, enrichment, deduplication, Solr indexing, retry handling, and configuration. I can also include unit tests for filtering, enrichment, and duplicate handling, along with a README and sample config file.
My planned deliverables:
1. Python 3.10+ source code
2. MongoDB change stream listener
3. Solr bulk indexing integration
4. Field filtering logic for SKU number, description, and price
5. Language-based filtering for English documents
6. Deduplication based on provided SKU logic
7. Unit tests
8. README with Docker Compose setup
I can start immediately and share progress step by step.
₹1,050 INR in 7 days
0.0

"I am a Principal Data Scientist with extensive experience in Python, Data Cleaning, and SQL. I can efficiently handle large datasets and perform advanced data integration tasks while ensuring accuracy and performance. I am dedicated to delivering high-quality results within the specified timeline and can start immediately"I have completed many data cleaning and analysis projects using Python and SQL with 100% accuracy.
₹1,050 INR in 7 days
0.0

I can surely help you with this task, as it suits my academic background. I’ve worked with Python, MongoDB, and async data pipelines before. I would build this as a clean sink-style service using PyMongo change streams, asyncio, and Solr bulk indexing. The pipeline will handle enrichment, field filtering, duplicate merging, retries, and near real-time sync between MongoDB and Solr. I’ll keep the code modular so adding more languages later is straightforward. You’ll get tested production-ready code, Docker-based setup, sample configs, and clear documentation for deployment and retraining.
₹1,050 INR in 6 days
0.0

I believe I can bring great value to your project with my expertise. Your need for a clean, professional, and user-friendly Python pipeline that seamlessly integrates MongoDB change streams with Apache Solr, while automating enrichment, filtering, and de-duplication, aligns perfectly with my skills. I specialize in developing robust ETL connectors using Python 3.10+, pymongo, and asyncio for non-blocking operations, ensuring efficient, near-real-time data processing. While I am new to Freelancer, I have extensive experience and have completed other projects off the platform that involved complex data transformations and API integrations. I would love to chat more about your project! Regards, Andiswa Ngqika
₹700 INR in 25 days
0.0

Hi, I can build this Python pipeline cleanly with config-driven transforms, MongoDB/DocumentDB input, and Solr indexing. I would make it idempotent, with batching, retry/error logging, schema validation, and a simple test dataset so you can verify category + attribute enrichment before running it on production data. If DocumentDB change streams are available I can use them; otherwise I can implement reliable scheduled polling with last-updated checkpoints. Deliverables would include the script/package, .env config, README, sample mapping rules, and tests. Please share a sample SKU document and your target Solr schema/core fields so I can align the transform exactly.
₹1,050 INR in 7 days
0.0

Hi, Resonite Technologies has strong experience building Python-based ETL/data pipelines, MongoDB change-stream consumers, Solr indexing services, and real-time synchronization systems. We can build a production-ready near-real-time MongoDB → Solr sink pipeline with:
✔ MongoDB/DocumentDB change-stream listener
✔ Async Python 3.10+ architecture
✔ Taxonomy enrichment (category + attributes)
✔ English-only filtering with multilingual-ready design
✔ Field filtering (SKU/description/price exclusion)
✔ Duplicate SKU merge logic
✔ Bulk Solr indexing
✔ Retry/error handling without queue poisoning
✔ Modular clean codebase + unit tests
✔ Docker Compose setup + deployment README
Suggested stack: Python asyncio, pymongo, requests/Solr-py, Docker, structured config-driven mappings. We understand the requirement for low-latency indexing, resilient retries, configurable enrichment logic, and scalable architecture for future language expansion. Deliverables will include source code, tests, configs, Docker setup, and deployment documentation. Regards, Karthik Resonite Technologies
₹1,500 INR in 7 days
0.0

Hi, This project aligns closely with my background in building large-scale real-time data pipelines and production-grade data infrastructure. I have 7+ years of experience working on distributed data systems at Uber and JPMorgan, including streaming ingestion pipelines, real-time transformations, scalable data processing systems, and reliability-focused backend infrastructure. My experience includes Kafka, Spark, Python, cloud-native data systems, and production ETL/data lake architectures.
Your requirements around:
* near-real-time change consumption
* transformation and enrichment
* deduplication logic
* retry handling and operational reliability
* clean modular design and testing
map very well to the kinds of systems I’ve worked on in production environments. I also have experience working with MongoDB/DynamoDB-style NoSQL systems and event-driven ingestion patterns. While most of my large-scale streaming work has been around Kafka/data platform ecosystems, I’m comfortable designing reliable connector-style services and async processing workflows in Python.
For this implementation, I’d focus on:
* clean modular pipeline design
* configuration-driven enrichment/filtering logic
* reliable retry/error handling
* async/non-blocking ingestion and indexing
* maintainable testing and operational documentation
Happy to discuss expected throughput, latency requirements, deployment environment, and the duplicate SKU reconciliation rules in more detail. Best, Pushpendra
₹1,200 INR in 10 days
0.0

Hi, I can build a clean and production-ready Python pipeline that listens to MongoDB/DocumentDB change streams, transforms SKU documents, applies enrichment and deduplication logic, and indexes data into Apache Solr in near real-time.
The pipeline will include:
* Async change-stream consumption
* Taxonomy enrichment (category + attribute level)
* English-only filtering with scalable multilingual structure
* SKU deduplication and field filtering
* Bulk indexing into Solr with retry/error handling
* Unit tests and clear modular architecture
I will also provide Docker-based local setup instructions, sample configuration files, and deployment documentation. Strong experience in Python ETL pipelines, MongoDB integrations, async processing, and scalable search indexing workflows using Solr and structured data transformation. Quick question: do you already have the taxonomy mapping and duplicate SKU rules prepared?
₹1,200 INR in 3 days
0.0

Hello, Your project is very clear and technically well-structured, and I’d be excited to help build this MongoDB → Solr data pipeline professionally. I have experience working with Python backend development, APIs, database integrations, and structured data processing workflows. I understand the importance of building a reliable near-real-time pipeline that is scalable, fault-tolerant, and cleanly organized.
What I can deliver:
• Production-ready Python 3.10+ pipeline
• MongoDB/DocumentDB change-stream listener
• SKU transformation and taxonomy enrichment logic
• Duplicate merge handling based on provided SKU rules
• Solr bulk indexing integration
• Async/non-blocking architecture using asyncio
• Retry and failure-handling mechanism to avoid queue poisoning
• Clean module separation and maintainable code structure
• Well-tested and documented implementation
I also understand the importance of keeping the pipeline extensible for future multilingual support such as Spanish or French while maintaining efficient filtering and transformation logic.
Why choose me?
✔ Clean and readable Python coding style
✔ Strong understanding of backend workflows and APIs
✔ Focus on scalability, performance, and reliability
✔ Quick communication and regular progress updates
✔ Dedicated support and revisions if required
I can start immediately and would be happy to discuss the SKU logic, Solr schema expectations, and deployment environment in more detail before beginning development.
₹1,050 INR in 7 days
0.0

Delhi, India
Member since Nov 24, 2025
₹600-1500 INR