
In Progress
Posted
Paid on delivery
Python Data Engineer – Web Scraping & Automated Data Pipeline

About the Project
I'm building Brisbane Unlocked — a local discovery platform that surfaces events, experiences, bars, markets, and hidden gems across Brisbane, Australia. The platform pulls data from 80+ sources including council open data, venue websites, ticketing platforms, and local community organisations. The backend data engine (called BB) is already built and deployed on a Digital Ocean Droplet (Ubuntu 24.04, Sydney region). It runs on a 6-hour automated cron schedule and currently has 8 working scrapers with 400+ events in the database.

What I Need
1. Fix & improve existing scrapers
• Fix 2 broken scrapers (Eventbrite, South Bank)
• Apply data enrichment to 5 scrapers that return incomplete data
2. Build new scrapers — approximately 60 new sources across 7 categories
• Events (~20 sources) — what's on guides, venue websites, arts institutions
• Restaurants & Cafes (~8 sources) — local food guides, new openings
• Bars & Nightlife (~6 sources) — bar guides, rooftop directories
• Experiences (~8 sources) — activity platforms, river cruises, tours
• Markets (~8 sources) — farmers markets, artisan markets, night markets
• Kids & Family (~4 sources) — family guides, school holiday programs
• Community & Grassroots (~5+ sources) — sports clubs, community noticeboards
3. Database migration
• Migrate from SQLite to PostgreSQL via Supabase (free tier)
• Create a community submissions table for user-submitted listings
4. Data quality
• Deduplication, suburb normalisation, category consistency, cancelled event detection

Tech Stack
• Python 3.12, BeautifulSoup4, requests, lxml (already installed on the Droplet)
• Playwright for JavaScript-rendered sites (to be installed as needed)
• SQLite now → Supabase PostgreSQL (migration required)
• Digital Ocean Droplet — Ubuntu 24.04, Sydney region
• No APIs — all data collected by scraping public websites

What I Provide
• Full SSH access to the Droplet and all existing code
• Complete source list with 80+ URLs, priorities, and technical notes
• Working HTML prototype showing exactly what the platform looks like — acts as your build brief
• Full developer brief documentation
• Milestone-based payments — you only get paid as work is delivered and verified

Pricing
Please quote a fixed price for the full scope, or phase it as follows:
• Phase 1: Fix + enrich existing scrapers
• Phase 2: New scrapers by category
• Phase 3: Database migration + data quality
Do not quote per-source.

To Apply — Please Answer These 4 Questions
1. Describe a similar scraping pipeline you've built — how many sources, and how did you handle data quality?
2. Have you used Playwright for JavaScript-rendered sites? Give a brief example.
3. Have you worked with Supabase or PostgreSQL?
4. A venue website doesn't use JSON-LD structured data. Walk me through how you'd extract event title, date, image and price from its HTML.
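To make the last application question concrete (extracting title, date, image and price from plain HTML with no JSON-LD), here is a minimal standard-library sketch. The sample markup, class names, and field mapping are invented for illustration; a real scraper would derive its selectors from inspecting the target site.

```python
import re
from html.parser import HTMLParser

# Invented sample of a venue "event card" with no structured data.
SAMPLE = """
<div class="event-card">
  <h2>Riverfire Night Markets</h2>
  <img src="/img/riverfire.jpg" alt="">
  <span class="when">Sat 14 Sep 2025, 6pm</span>
  <span class="cost">From $25</span>
</div>
"""

class EventCardParser(HTMLParser):
    """Collects text per tag/class so fields can be mapped afterwards."""
    def __init__(self):
        super().__init__()
        self._current = None  # key to store the next text run under
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and "src" in attrs:
            self.fields["image"] = attrs["src"]
        elif tag == "h2":
            self._current = "title"
        elif tag == "span":
            self._current = attrs.get("class")  # "when" or "cost"

    def handle_data(self, data):
        if self._current and data.strip():
            self.fields[self._current] = data.strip()
            self._current = None

parser = EventCardParser()
parser.feed(SAMPLE)

# Normalise the raw strings: date via regex, price to a float.
date_m = re.search(r"(\d{1,2}\s+\w{3}\s+\d{4})", parser.fields.get("when", ""))
price_m = re.search(r"\$(\d+(?:\.\d{2})?)", parser.fields.get("cost", ""))
event = {
    "title": parser.fields.get("title"),
    "date": date_m.group(1) if date_m else None,
    "image": parser.fields.get("image"),
    "price": float(price_m.group(1)) if price_m else None,
}
print(event)
```

The same shape (locate a stable container, map tags/classes to fields, then normalise with regexes) is what most of the proposals below describe in prose.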
Project ID: 40316813
39 proposals
Remote project
Active 18 days ago

Hey, I’ve built a similar pipeline scraping ~70 sources (events + listings) across mixed static and JS-rendered sites using BeautifulSoup and Playwright, and handled quality with deduping (hash + fuzzy match), location normalization, and fallback parsing for missing fields. Yes, I’ve used Playwright a lot — usually for React-rendered sites, where I wait for specific selectors and then extract DOM content, or intercept network responses when possible to get cleaner data. I’ve also worked with PostgreSQL and Supabase: designing schemas, handling migrations, and optimizing queries for read-heavy workloads like listings platforms. If a site has no structured data, I inspect DOM patterns — target consistent containers (such as event cards), take the title from heading tags, parse dates with regex or nearby labels, pull images from img src attributes or background styles, and extract price from text nodes — then clean and standardize everything before storing.
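The hash-plus-fuzzy deduplication this proposal describes could be sketched roughly as follows. The key fields and the 0.8 similarity threshold are illustrative assumptions, not the bidder's actual code:

```python
import hashlib
from difflib import SequenceMatcher

def event_key(e):
    """Exact-duplicate key: normalised title plus date."""
    raw = f"{e['title'].lower().strip()}|{e['date']}"
    return hashlib.sha256(raw.encode()).hexdigest()

def is_fuzzy_dup(a, b, threshold=0.8):
    """Same date and near-identical titles are treated as one event."""
    if a["date"] != b["date"]:
        return False
    ratio = SequenceMatcher(None, a["title"].lower(), b["title"].lower()).ratio()
    return ratio >= threshold

def dedupe(events):
    seen, kept = set(), []
    for e in events:
        if event_key(e) in seen or any(is_fuzzy_dup(e, k) for k in kept):
            continue
        seen.add(event_key(e))
        kept.append(e)
    return kept

events = [
    {"title": "Eat Street Markets",  "date": "2025-09-14"},
    {"title": "Eat Street Markets ", "date": "2025-09-14"},  # exact dup after strip
    {"title": "Eat St. Markets",     "date": "2025-09-14"},  # fuzzy dup
    {"title": "Eat Street Markets",  "date": "2025-09-21"},  # new date, kept
]
print(len(dedupe(events)))  # → 2
```

The quadratic fuzzy pass is fine at a few hundred events per run; at larger scale it would typically be bucketed by date or suburb first.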
$500 AUD in 7 days
4.6
39 freelancers are bidding an average of $466 AUD for this job

⭐⭐⭐⭐⭐ Build Robust Web Scrapers & Data Pipelines for Your Project ❇️ Hi, I hope you are doing well. I've reviewed your project needs and see you are looking for a Python Data Engineer — you don't need to look any further; Zohaib is here to help! My team has completed over 50 similar data scraping and pipeline automation projects. I will enhance your existing scrapers, create new ones, and ensure data quality, all within your budget. ➡️ Why me? I have 5 years of experience in Python web scraping and data engineering, covering scraper development, data migration, and data quality. I also have a strong grip on PostgreSQL, Supabase, and the other technologies in your stack, which lets me deliver efficient solutions for this project. ➡️ Let's have a quick chat to discuss your project in detail, and I'll show you samples of my previous work. ➡️ Skills & Experience: ✅ Python 3.12 ✅ Web Scraping ✅ BeautifulSoup4 ✅ Data Enrichment ✅ Database Migration ✅ PostgreSQL ✅ Playwright ✅ Data Quality Assurance ✅ Automation ✅ Error Handling ✅ Data Normalization ✅ API Integration Looking forward to your response! Best Regards, Zohaib
$350 AUD in 2 days
8.0

Hi, I’ve built Python scraping pipelines that aggregate 50+ public sources into a normalized event database, with deduplication, location cleanup, date parsing, retry logic, and quality checks to keep listings usable at scale. I’m comfortable fixing fragile scrapers, enriching incomplete outputs, and expanding BB with a modular scraper structure so new sources are easier to maintain over time. I have used Playwright for JavaScript-rendered sites, including event pages where content loaded after hydration; I handled those by waiting for stable selectors, then extracting the rendered DOM safely. I have also worked with PostgreSQL and Supabase for schema design, migration, indexing, and scheduled ingestion workflows. For HTML-only venue pages without JSON-LD, I would map stable selectors for title, date, image, and price, normalize the values with parsing rules, and add fallbacks plus validation when fields are missing or inconsistent. For this project, the main technical challenge is keeping 60+ scrapers reliable while preserving category consistency and reducing duplicates across overlapping Brisbane sources. I would solve that with reusable parser patterns, Playwright only where needed, the PostgreSQL migration, and a clear QA layer for suburb normalization, cancelled-event detection, and source health monitoring. Thanks, Hercules
$500 AUD in 7 days
6.7

Hello, I have experience building scalable web scraping systems and managing data pipelines similar to your requirements. I'm confident I can help you enhance and expand your platform, Brisbane Unlocked, by improving the existing scrapers and building new ones to meet your needs. Fix & improve existing scrapers: I'll fix the two broken scrapers (Eventbrite and South Bank) and ensure they are properly extracting and processing data, and I'll apply data enrichment techniques to the 5 existing scrapers so all scraped data is complete and accurate. Build new scrapers: I'll create 60 new scrapers, segmented into categories such as events, restaurants, bars, experiences, and community-related data, ensuring each scraper handles pagination and errors and respects rate limits to prevent blocking by source websites. Database migration & data quality: I will migrate the current data from SQLite to PostgreSQL via Supabase, ensuring a smooth transition and correctly structured data, and I will maintain data quality through deduplication, suburb normalization, category consistency, and cancellation detection. Looking forward to collaborating on this project! Thanks
$600 AUD in 7 days
6.5

Hi, I can scale your Brisbane Unlocked data engine from 8 to 68+ sources while stabilizing the existing pipeline. I have extensive experience building large-scale scraping architectures using Python, BeautifulSoup, and Playwright, specifically handling the mix of static and JavaScript-rendered sites you described. Regarding your questions: I previously built an events aggregator scraping 50+ sites, using fuzzy matching for deduplication and confidence scoring for data quality. I regularly use Playwright for dynamic sites, such as interacting with "Load More" buttons on ticketing platforms before parsing. I have successfully migrated multiple projects from SQLite to Supabase PostgreSQL. For sites without JSON-LD, I extract data by traversing semantic HTML5 tags or using structural DOM patterns with regex fallbacks for dates and prices. I will execute this in your requested phases: fixing the broken scrapers first, then rolling out the new categories, and finally migrating to Supabase with robust deduplication logic. I'm ready to start immediately on your Digital Ocean droplet. I also offer FREE post-delivery support to monitor the first few automated cron runs, tweak the new selectors if site layouts shift, and ensure the data enrichment logic is catching all edge cases. Let's discuss the project in more details.
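The "confidence scoring" this bid mentions could look something like the sketch below: score each scraped record by how many required fields survived parsing, then filter or flag low scores. The field weights are invented for illustration:

```python
# Hypothetical per-field weights; real weights would be tuned per source.
REQUIRED = {"title": 0.4, "date": 0.3, "image": 0.15, "price": 0.15}

def confidence(record):
    """Sum the weights of the fields that parsed to a truthy value."""
    return round(sum(w for f, w in REQUIRED.items() if record.get(f)), 2)

good = {"title": "Riverstage Gig", "date": "2025-11-02", "image": "x.jpg", "price": 40}
weak = {"title": "Riverstage Gig", "date": None, "image": None, "price": None}
print(confidence(good), confidence(weak))  # → 1.0 0.4
```

Records under some threshold (say 0.6) would be held back for enrichment or manual review rather than published.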
$500 AUD in 6 days
5.8

Hello. Thanks for your job posting. ⭐Python Data Engineering for Brisbane Unlocked⭐ I'm the developer you're looking for. I can successfully complete your project. Let's chat for a more detailed discussion. Thank you. Maxim
$250 AUD in 9 days
5.4

Hello, I can scale and stabilize your Brisbane Unlocked pipeline with reliable scraping, structured storage, and strong data quality. I will fix the broken scrapers (Eventbrite, South Bank), enrich incomplete sources, and build ~60 new scrapers using BeautifulSoup, with Playwright where required. Each scraper will follow a modular structure with logging, retries, and failure handling for your 6-hour cron cycle. I will migrate SQLite to Supabase PostgreSQL with an optimized schema, including the community submissions table. Data quality will be ensured via deduplication, suburb normalization, category consistency, and cancelled event detection. The system will remain lightweight, scalable, and aligned with your DigitalOcean setup. I recommend phased delivery: Phase 1 (fix + enrich), Phase 2 (new scrapers), Phase 3 (migration + quality); fixed pricing can be finalized after reviewing your sources and codebase. Answers: (pipeline) built 50+ source pipelines using hashing, fuzzy matching, and validation; (Playwright) yes — render JS, wait for selectors, parse the DOM; (Supabase/PostgreSQL) yes — schema design, indexing, migrations; (no JSON-LD) inspect the HTML, extract via selectors, then normalize and validate. Clarification questions: 1. Should scraper priority follow your list or traffic impact? 2. Do you need real-time failure alerts or cron logs only? Thanks, Asif
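The suburb normalisation step that several bids mention could be sketched as a cleanup pass plus an alias table mapping the free-text location strings different sources emit onto one canonical suburb name. The alias table below is a made-up example, not project data:

```python
import re

# Hypothetical canonical suburbs and the variants sources emit for them.
CANONICAL = {
    "fortitude valley": ["the valley", "fortitude valley qld"],
    "south brisbane": ["southbank", "south bank"],
    "west end": ["west end qld", "west end, brisbane"],
}

# Flatten to alias → canonical, including identity entries.
ALIASES = {alias: canon for canon, aliases in CANONICAL.items() for alias in aliases}
ALIASES.update({canon: canon for canon in CANONICAL})

def normalise_suburb(raw):
    """Lowercase, strip digits/punctuation (postcodes etc.), collapse spaces,
    then look up the alias table; unknown values fall through unchanged."""
    cleaned = re.sub(r"[^a-z,\s]", "", raw.lower()).strip()
    cleaned = re.sub(r"\s+", " ", cleaned)
    return ALIASES.get(cleaned, cleaned)

print(normalise_suburb("South Bank"))          # → south brisbane
print(normalise_suburb("West End, Brisbane"))  # → west end
print(normalise_suburb("Paddington"))          # → paddington
```

Unmatched values surviving the fall-through can be logged each cron run so the alias table grows as new sources come online.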
$750 AUD in 11 days
5.4

I can fix (and enrich) your existing scrapers, add new ones, and migrate the scraped data to a new database. I reckon the second part of the project will take a while given the number of scrapers involved (80+ sources), so I hope you're OK with an estimated 7-day duration. Skills-wise, my Python scripting and database experience qualifies me for this job. I understand the intricacies (not to mention fragility) involved in parsing web content to get the exact information within it, having worked with scraping tools like BeautifulSoup, amongst others. Here are the answers to your questions: 1. Describe a similar scraping pipeline you've built — how many sources, and how did you handle data quality? Answer: I used Python Selenium to periodically parse a JavaScript-based site. Since the element CSS classes changed on every parse, I parsed the content based on structure and string matching to get the needed data. 2. Have you used Playwright for JavaScript-rendered sites? Give a brief example. Playwright, no — but I have scraped JavaScript sites using Selenium with Python. 3. Have you worked with Supabase or PostgreSQL? Yes to both. 4. A venue website doesn't use JSON-LD structured data. Walk me through how you'd extract event title, date, image and price from its HTML. Using string pattern matching, looking out for ">" and "</" to get the value for, say, "Title". I can start right away. Looking forward to working on your Python project.
$600 AUD in 7 days
5.6

Your project closely matches my experience, and I will do my best to meet all of your requirements. I checked your project details carefully. Using Python and JavaScript, I recently finished a web scraping project that extracted product name, price, date, and sales count from a live online store and saved the results to Excel/Google Sheets for upload to the client's Google Drive. I have rich experience in web scraping, data extraction, machine learning, and data processing with Python, and I am comfortable extracting specific data from websites built with HTML, CSS, and JavaScript. If you choose me, I am confident I can complete your project on time and with high quality. Please message me to discuss the project further. Thanks.
$350 AUD in 3 days
5.4

⭐⭐⭐⭐⭐ ✅Hi there, hope you are doing well! I recently developed a Python-based data pipeline aggregating event and venue information from over 50 local sources, which ran smoothly thanks to robust scraping routines and careful attention to data accuracy. The key to success in this project is maintaining data quality through effective deduplication and consistent categorization. Approach: ⭕ Quickly fix and improve the two broken scrapers for Eventbrite and South Bank. ⭕ Enhance data completeness on the five scrapers with enrichment techniques. ⭕ Build scalable scrapers for the 60 new sources across all specified categories. ⭕ Perform a seamless migration from SQLite to Supabase PostgreSQL, including designing the community submissions table. ⭕ Implement data cleaning steps: deduplicate, normalize suburbs, enforce category consistency, and mark cancelled events. ❓Could you clarify the priority and timelines for the three phases of work? ❓Are there specific performance benchmarks or error rates you aim to achieve with the scrapers? I am confident my experience with Python scraping, PostgreSQL, and automated pipelines will allow me to deliver reliable and scalable data engineering for Brisbane Unlocked. Best regards, Nam
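The cancelled-event detection the posting asks for (and this bid's last cleaning step) usually combines two signals: cancellation phrases in the page text, and events that vanish from a source between cron runs. A minimal sketch, with phrases and the two-run comparison as illustrative assumptions:

```python
# Illustrative phrase list; a real one would be per-source and reviewed over time.
CANCEL_PHRASES = ("cancelled", "canceled", "postponed", "event has been moved")

def is_cancelled(page_text):
    """Flag a listing whose visible text carries a cancellation phrase."""
    text = page_text.lower()
    return any(p in text for p in CANCEL_PHRASES)

def missing_since_last_run(previous_ids, current_ids):
    """Events present last run but absent now are candidates for expiry,
    pending a re-check before they are hidden from the platform."""
    return set(previous_ids) - set(current_ids)

print(is_cancelled("NOTE: this show has been CANCELLED due to weather"))  # → True
print(missing_since_last_run({"a1", "b2", "c3"}, {"a1", "c3"}))           # → {'b2'}
```

Phrase matches are cheap but noisy ("not cancelled" also matches), so flagged events would typically be re-verified rather than deleted outright.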
$550 AUD in 5 days
3.8

With over 7 years of professional experience as a Full-Stack Developer, I am well-versed in Python and PostgreSQL, making me the perfect candidate for your Python data engineering project. My skills extend beyond Python and PostgreSQL; I am also knowledgeable in technologies such as BeautifulSoup4, requests, lxml and Playwright that align with the requirements of this project. To answer your initial questions, I have previously built a scraping pipeline that dealt with similar complexities — parsing vast amounts of data from 30+ sources. I value data quality immensely and thus prioritize thorough deduplication, normalization, consistency checks to ensure the scraped information is reliable. Regarding Playwright for JavaScript-rendered sites, yes, I have hands-on experience in using this tool to scrape data from such sites. I am glad you mentioned the migration to PostgreSQL as it happens to be one of my core competencies. Apart from that, I deliver work on time without compromising quality and believe in constant communication through project updates, ensuring you are always up-to-date with my progress. I bring not only technical expertise but also the ability to think holistically about your project—anticipating needs and providing scalable solutions that align seamlessly with your goals. Given the chance, I will help make your Brisbane Unlocked platform a robust, efficient and comprehensive local discovery tool for all Brisbane residents.
$500 AUD in 7 days
3.5

Hello, With extensive experience in Python-based web scraping and data pipeline development, I have successfully integrated data from over 50 diverse sources, ensuring high data quality through deduplication, normalization, and validation techniques. I will enhance your existing scrapers, restore the broken ones, and expand your platform with approximately 60 new sources across multiple categories. Could you clarify the preferred timeline for each project phase to better align the deliverables? Thanks, Juan Aponte
$400 AUD in 4 days
3.4

Hello! I'm excited to see your project — it aligns with my experience. I’ve built scraping pipelines with 50+ sources using Python, BeautifulSoup, and Playwright, including event platforms. I handled data quality with deduplication, location normalization, and validation rules to keep datasets clean and consistent. I’ve used Playwright for JS-heavy sites where content loads dynamically, handling selectors, waits, and pagination efficiently. I also have experience with PostgreSQL and Supabase, including migrations from SQLite and building scheduled pipelines. For sites without structured data, I extract fields by analyzing DOM patterns, targeting consistent selectors for title, date, image, and price, then normalize formats with fallback logic. I can fix your existing scrapers, scale new ones, and complete the database migration with strong data quality controls. Questions: Do you want deduplication applied globally across all sources or per category first? Should cancelled events be detected from source signals or periodic re-checks? Hope we can team up and make this project a success! Thank you for considering my proposal.
$500 AUD in 7 days
3.6

Hello, I have delivered multi-source scraping systems before, handling many feeds with normalisation, deduping and stable scheduling, which matches this project well. My background includes pipelines with dozens of sources where I guided quality by simple rules, light enrichment and clear consistency checks. I have used Playwright for pages that need rendering and handled PostgreSQL migrations that kept structure clean while adding new tables for community data. I extract core fields from HTML with straightforward parsing, using clear selectors and fallback patterns to keep results reliable. Here is my portfolio https://www.freelancer.com/u/nickyjohnl Regards, Nicky
$250 AUD in 5 days
3.2

Hello Brisbane Team, I hope you’re well. I’m a solo Python data engineer with a strong track record building robust scraping pipelines and automated data engines. I specialise in turning messy public-site data into clean, query-friendly datasets and reliable ETL flows, focused on accuracy and maintainability. I’ll leverage your existing BB engine, fix the broken scrapers, and architect scalable additions for 60+ new sources across seven categories, with PostgreSQL migration and quality controls baked in. I’ve built end-to-end scraping pipelines for 20+ sources, handling data quality with deduplication, normalization, and anomaly detection, while ensuring graceful error handling and observability. I’ll repair the current Eventbrite and South Bank scrapers, enrich incomplete fields, and design modular scrapers for future growth. Phase 1 covers fixing and enriching; Phase 2 adds new sources; Phase 3 migrates to Postgres and implements dedupe rules and parity checks. I’ll deliver with a clear build brief, code, and tests. Let’s align on priorities and milestones, then I’ll deliver quickly and reliably. Best regards, Billy Bryan
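The Phase 3 migration mentioned here can be sketched as an export step: read rows out of the existing SQLite file and emit a Postgres-compatible `CREATE TABLE` plus tab-separated rows for `COPY ... FROM STDIN`, which psql or Supabase's SQL editor can ingest. The table, columns, and rows below are invented stand-ins for the real BB schema:

```python
import sqlite3

# Stand-in for the Droplet's SQLite file; schema and rows are invented.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, title TEXT, suburb TEXT)")
src.executemany("INSERT INTO events (title, suburb) VALUES (?, ?)",
                [("Jazz on the Green", "new farm"), ("Twilight Markets", "west end")])

# Column names come from SQLite's own catalogue, so the export tracks the schema.
cols = [row[1] for row in src.execute("PRAGMA table_info(events)")]
print(cols)  # → ['id', 'title', 'suburb']

ddl = "CREATE TABLE events (id BIGINT PRIMARY KEY, title TEXT, suburb TEXT);"

# COPY ... FROM STDIN (text format) expects tab-separated rows; \N marks NULL.
tsv_rows = ["\t".join("\\N" if v is None else str(v) for v in row)
            for row in src.execute("SELECT id, title, suburb FROM events")]

print(ddl)
print("\n".join(tsv_rows))
```

In practice the insert side would use a Postgres driver (e.g. psycopg) or Supabase's client against the real connection string; the export/COPY split just keeps the two databases decoupled during cutover.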
$450 AUD in 5 days
3.3

Hello, I reviewed your requirements carefully and understand you need a scalable Python data engineering pipeline covering web scraping, data enrichment, and a Postgres migration for Brisbane Unlocked. I have 8+ years of experience in Python, web scraping, data pipelines, Playwright, PostgreSQL, and data quality engineering. What I'll do: • Fix the broken scrapers (Eventbrite, South Bank) and enrich incomplete datasets • Build 60+ scalable scrapers across all categories using BS4 + Playwright • Implement deduplication, suburb normalization, and event validation • Migrate SQLite → Supabase PostgreSQL with an optimized schema • Add a community submissions table plus cron-stable pipeline improvements. Delivery: • Fully automated scraping engine (optimized for the 6-hour cron) • Clean, normalized, deduplicated dataset • Completed PostgreSQL (Supabase) migration • Documentation and a maintainable modular scraper architecture. Answers: I built a 100+ source scraping pipeline with deduplication and NLP-based cleaning; yes, I've used Playwright for dynamic event sites (scroll/load + DOM extraction); yes, I've designed schemas, indexing, and migrations in PostgreSQL/Supabase; and for unstructured pages I parse the DOM via selectors, extract title/date/image/price, then clean and validate via rules. Let's discuss your project in detail — looking forward to your response. Thanks
$250 AUD in 7 days
3.2

Hello, I have experience with Python for web scraping and automated data pipelines, similar to your Brisbane Unlocked project. In a recent e-commerce platform, I implemented dynamic scrapers that adapt to website changes, ensuring continuous data collection, and developed an automated pipeline that enriches product data from multiple sources before storing it in a database. For your project, I could fix the broken Eventbrite and South Bank scrapers by implementing error handling and retry mechanisms. Additionally, I can enhance data completeness by applying data enrichment techniques using external APIs for the 5 scrapers with incomplete data. Let's discuss!
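The "error handling and retry mechanisms" proposed here usually amount to an exponential-backoff wrapper around each fetch. A minimal sketch; the flaky stub simulating two transient failures is invented for demonstration:

```python
import time

def with_retries(fn, attempts=3, base_delay=0.01):
    """Call fn, retrying with exponential backoff; re-raise on final failure."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** i))  # 0.01s, 0.02s, ...

# Hypothetical flaky fetch: fails twice, then succeeds.
calls = {"n": 0}
def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "<html>ok</html>"

print(with_retries(flaky_fetch))  # → <html>ok</html>
```

On a 6-hour cron cycle, a scraper that exhausts its retries would log the failure and move on, so one dead source can't stall the whole run.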
$450 AUD in 5 days
2.1

Hi, This is a well-structured data pipeline, and I like that you’ve already got the backend running — this is more about stability, scalability, and data quality than starting from scratch. Given the scope, I’d recommend starting with Phase 1 (fixing and improving existing scrapers + data enrichment) to build a reliable base before expanding further. Here’s how I’d approach it: • Debug and stabilize broken scrapers • Improve parsing for incomplete or inconsistent data sources • Add structured enrichment and fallback handling • Ensure clean, consistent outputs for downstream use I’ve built scrapers for both static and JavaScript-heavy websites using browser automation tools (Playwright/Selenium), handling dynamic content, lazy loading, and structured extraction. I’m also comfortable with your stack (Python, BeautifulSoup, requests, lxml) and working directly on a deployed Droplet environment. Answers to your questions: 1. I’ve worked on scraping pipelines with multiple sources, focusing on reliability through retry logic, structured parsing, and data validation. 2. Yes, I use Playwright/Selenium for JS-heavy sites as needed. 3. I have experience with PostgreSQL and can handle the migration from SQLite, including schema setup and data transfer. 4. Where JSON-LD is available, I extract the script tags, parse the JSON, and safely map fields like title, date, image, and price while handling variations. Happy to start with Phase 1 and deliver a stable foundation we can build on. Regards, Shreyas
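The JSON-LD extraction described in answer 4 above can be sketched with only the standard library; the sample page and the schema.org Event fields mapped are invented for illustration:

```python
import json
import re

# Invented sample page carrying a schema.org Event as JSON-LD.
PAGE = """<html><head>
<script type="application/ld+json">
{"@type": "Event", "name": "Laneway Opening Party",
 "startDate": "2025-10-03T18:00", "image": "/img/laneway.jpg",
 "offers": {"price": "15.00", "priceCurrency": "AUD"}}
</script></head></html>"""

# Grab every JSON-LD block, parse, and keep the Event-typed ones.
blocks = re.findall(
    r'<script type="application/ld\+json">(.*?)</script>', PAGE, re.S)
events = [json.loads(b) for b in blocks if '"Event"' in b]

e = events[0]
record = {"title": e["name"], "date": e["startDate"],
          "image": e["image"], "price": float(e["offers"]["price"])}
print(record)
```

When a page carries JSON-LD this path is far more stable than CSS selectors, which is why scrapers typically try it first and fall back to DOM parsing only when it is absent.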
$350 AUD in 6 days
1.4

With a passion for problem-solving, keen eye for detail, and a mastery of Python data engineering, I believe I am the perfect fit for your Brisbane Unlocked project. Overseeing the development of various web applications and automation systems, I have built numerous scraping pipelines, some even to the tune of 100+ sources! For instance, when it comes to data enrichment and quality control, I implement rigorous processes that include deduplication techniques, data normalization, and automated event cancellation detection. As a connoisseur of all things Pythonic with particular expertise in web scraping using powerful libraries like BeautifulSoup4 and requests, your project aligns seamlessly with my skills repertoire. While I haven't specifically used Playwright, my aptitude at adapting to new technologies means I would have no qualms incorporating it into our workflow if required. In terms of database management and migrations, I have ample experience with both Supabase and PostgreSQL that will ensure your transition is efficient and seamless.
$750 AUD in 3 days
1.4

With my extensive experience in API development and Python programming, I am confident I can meet and exceed your expectations for enhancing your Brisbane Unlocked data engine. My previous projects, including a scraper pipeline handling over 150 sources simultaneously, have shaped my ability to build and maintain clean, efficient code while approaching data quality with precision. I understand the value of real-time data in a dynamic environment like yours and know how to transform raw HTML into the event information you need: title, date, image, and price. While I haven't used Playwright for JavaScript-rendered sites specifically, I am always keen to incorporate new technologies into my skillset, so installing and learning Playwright won't be an issue. Furthermore, extensive work with both Supabase and PostgreSQL means I can comfortably handle the database migration phase of your project; Urbanscraper's SQLite-to-Supabase PostgreSQL migration is something I successfully delivered in a past engagement. Attention to detail and clear progress reporting mean our working relationship will be characterized by timely updates and results-oriented milestone delivery. I'm ready to start our conversation via chat or video call — let's transform your great idea into an outstanding product, together!
$500 AUD in 7 days
0.6

Hello, good morning! I've carefully reviewed your requirements and am very interested in this job. I'm a full-stack Node.js developer who has worked on large-scale apps as a lead developer with U.S. and European teams, offering high quality and strong performance at a competitive price. I can complete your project on time, and I'm confident you'll be satisfied with the result. I'm well versed in React/Redux, AngularJS, Node.js, Ruby on Rails, HTML/CSS, JavaScript, and jQuery, and have rich experience in data processing, Scrapy, data collection, SQLite, PostgreSQL, BeautifulSoup, Python, web scraping, API development, and data mining. For more about me, please see my portfolio. I'm ready to discuss your project and start immediately. Looking forward to hearing back from you; please respond at your earliest convenience.
$555 AUD in 2 days
0.0

Brisbane, Australia
Payment method verified
Member since March 22, 2026