
Closed
I need a reliable web-scraping bot that automatically pulls fresh content from a specific news site on a schedule I can adjust. At minimum, the script should capture the headline and full article text; if the author name, publication date, and embedded image URLs can be extracted too, that's a welcome bonus.

Build it in Python using a well-supported stack such as Requests/BeautifulSoup, Scrapy, or Selenium, whichever you feel is most robust for handling pagination and occasional layout changes.

The bot should:
• Navigate through the latest articles section (and subsequent pages if present)
• Respect robots.txt and reasonable rate limits
• Output clean, de-duplicated data to CSV or JSON and optionally push it to a simple SQLite file

For acceptance, I'll run the script, point it at the live site, and expect a sample file containing at least 100 recent articles with the agreed fields correctly populated. Include clear setup instructions plus comments in the code so I can tweak XPaths or CSS selectors later if the site redesigns. Let me know your preferred toolset, estimated turnaround, and any clarifying questions about the target domain.
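A minimal sketch of the "respect robots.txt and reasonable rate limits" requirement, using only Python's standard-library `urllib.robotparser`. The user-agent string and the 2-second default delay are assumptions, not values from this posting. The sketch parses robots.txt text that has already been downloaded; in a live run you could instead call `set_url()` and `read()` on the parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical user-agent string; identify the bot honestly to the site.
USER_AGENT = "NewsScraperBot/0.1"
# Assumed fallback pause (seconds) when robots.txt declares no Crawl-delay.
DEFAULT_DELAY = 2.0


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that was fetched earlier."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_delay(parser: RobotFileParser, agent: str = USER_AGENT) -> float:
    """Use the site's Crawl-delay if it declares one, else the default."""
    delay = parser.crawl_delay(agent)
    return float(delay) if delay is not None else DEFAULT_DELAY
```

Before each request, the scraper would check `rules.can_fetch(USER_AGENT, url)` and sleep for `polite_delay(rules)` seconds between requests.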
Project ID: 40391278
42 proposals
Remote project
Active 19 days ago
42 freelancers are bidding on average ₹1,238 INR/hour for this job

Hi there,

My approach: I'd use Scrapy as the core framework since it handles pagination natively, respects rate limiting out of the box, and has excellent middleware for rotating user agents if needed. For the parsing layer, I'd pair it with BeautifulSoup for flexibility when selectors inevitably need tweaking after a site redesign.

A few things I want to confirm before we lock in the timeline:
- Which specific news site are you targeting? Different sites have very different structures (some are SPAs, some use pagination, some block scrapers aggressively).
- Do you have any preference on output format: CSV, JSON, or SQLite? Or all three?
- Is there a particular time of day the articles need to be fetched, or just "regular intervals"?

I can deliver a working script within 3-4 days that gives you your 100+ article sample with the bonus fields (author, date, images) included where available. The code will be well-commented, with clear sections for updating selectors when the site inevitably changes its HTML.

One thing to note: if the target site uses JavaScript heavily to render content, I may need to integrate Selenium as a fallback, though I'd only go that route if the initial Requests approach hits a wall. Let me know the target and I'll advise on the best path forward.

Thanks for the details. Looking forward to hearing which site you're working with so I can give you a more precise estimate.
₹750 INR in 30 days
7.6
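Scrapy's built-in politeness controls, which the proposal above relies on, are switched on in the project's `settings.py`. The fragment below is a sketch for a hypothetical project named `newsbot`; the setting names are standard Scrapy settings, but every value is an assumption to tune against the real target site.

```python
# settings.py (fragment) for a hypothetical Scrapy project "newsbot".
BOT_NAME = "newsbot"

# Check robots.txt before any request is sent.
ROBOTSTXT_OBEY = True

# Baseline politeness: one request at a time per domain, about 1 s apart.
CONCURRENT_REQUESTS_PER_DOMAIN = 1
DOWNLOAD_DELAY = 1.0

# AutoThrottle adjusts the delay to the server's observed response latency.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1.0
AUTOTHROTTLE_MAX_DELAY = 30.0
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0

# Retry transient failures, including HTTP 429 (Too Many Requests).
RETRY_ENABLED = True
RETRY_TIMES = 3
RETRY_HTTP_CODES = [429, 500, 502, 503, 504]

# Export items as JSON Lines; %(time)s gives each run its own file.
FEEDS = {
    "output/articles-%(time)s.jsonl": {"format": "jsonlines", "encoding": "utf8"},
}
```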

Your scraper will break the moment that news site changes its DOM structure or implements anti-bot measures like Cloudflare. I've seen this exact scenario kill three projects in the past year because developers hard-coded selectors without building fallback logic.

Quick question: does this site use infinite scroll or traditional pagination? And are you planning to run this hourly, daily, or on-demand? The architecture changes completely if you're hitting their servers 24 times a day versus once.

Here's the approach:
- SCRAPY + ROTATING USER AGENTS: Build a spider with middleware that rotates headers and implements exponential backoff when rate-limited, preventing IP bans that would stop data collection entirely.
- XPATH FALLBACK CHAINS: Write primary selectors for headline/body/author but include 2-3 backup patterns so the scraper degrades gracefully instead of crashing when they tweak CSS classes.
- SQLITE + HASH DEDUPLICATION: Store article URLs as SHA-256 hashes to detect duplicates in 0.001 seconds instead of scanning the entire database, critical when you're processing thousands of entries.
- SELENIUM HEADLESS (conditional): Only spin up a browser if the site requires JavaScript rendering. Most news sites serve static HTML, so you'll avoid the 10x performance penalty of running Chrome in the background.

I've built 8 production scrapers for media monitoring clients that have run uninterrupted for 18+ months. The difference between a script that works today and one that works next year is error handling and selector resilience.

Let's discuss the target domain and your monitoring frequency before I architect the solution. I don't build scrapers that require weekly maintenance.
₹900 INR in 30 days
7.2
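The "XPath fallback chain" idea in the proposal above can be sketched as an ordered list of selectors tried until one yields non-empty text. To stay runnable with only the standard library, this sketch uses `xml.etree.ElementTree`, which supports a limited XPath subset and requires well-formed markup; real news HTML would normally go through lxml or BeautifulSoup instead. The selector paths and class names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Sequence

# Hypothetical headline selectors, most specific first.
HEADLINE_PATHS = (
    ".//h1[@class='article-headline']",  # current layout (assumed class name)
    ".//h1[@itemprop='headline']",       # schema.org microdata, if present
    ".//h1",                             # last resort: first <h1> anywhere
)


def first_match(root: ET.Element, paths: Sequence[str]) -> Optional[str]:
    """Return text from the first path that matches with non-empty text, else None."""
    for path in paths:
        node = root.find(path)
        if node is not None:
            text = "".join(node.itertext()).strip()
            if text:
                return text
    return None
```

Logging which position in the chain matched is a cheap early warning: if the scraper starts falling through to the last-resort path, the primary selector has probably gone stale.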

Hello, I'm Asma, a web developer and graphic designer with 10 years of experience working with clients and agencies from around the world. Creative problem solver with a passion for creating visually appealing and user-friendly digital solutions. I love building luxurious brands and designing captivating visual identities. I've worked with clients in the lifestyle, property, fashion, hospitality, and luxury sectors. 24/7 support and faster response.
#WEBSITE DESIGNING / DEVELOPMENT #WORDPRESS/HTML/JS/CSS/PHP/LARAVEL/SHOPIFY #GRAPHIC DESIGNING #UX/UI #FIGMA #SQUARESPACE #SOCIAL MEDIA MARKETING #PHOTOSHOP/ILLUSTRATOR #GOOGLE ADS #JEWELRY DESIGNER #LOGO DESIGN #BANNER DESIGN #BUSINESS CARD #STATIONERY DESIGN #CD COVER #POWERPOINT PRESENTATION #BOOK COVER #LETTERHEAD DESIGN #3D LOGO #WORDPRESS #WEBSITE PAGE SPEED UP TO 95-99 #WEBSITE SEO #FIGMA TO WORDPRESS/HTML/JS/CSS/PHP/LARAVEL #PSD TO WORDPRESS/HTML/JS/CSS/PHP/LARAVEL ...... ETC :)
₹1,000 INR in 1 day
6.4

Hi there,

I have read your project requirements carefully. You need a reliable Python-based web scraping bot that extracts headlines, full article content, and optional metadata (author, date, images) from a news site, with scheduling, pagination handling, and clean structured output.

We can build this using Scrapy (preferred for scalability) or Requests + BeautifulSoup, ensuring robust scraping with pagination support, de-duplication, rate limiting, and compliance with robots.txt. The output will be available in CSV/JSON and optional SQLite, with clean, well-commented code so you can easily update selectors if the site changes.

A few questions before we proceed:
- Can you share the target news site URL for structure analysis?
- Do you want built-in scheduling (cron/Task Scheduler) or manual execution?
- Should images be downloaded, or only their URLs stored?
- Is any proxy/anti-bot handling required for this site?

Best regards,
Srashtasoft Team
₹1,000 INR in 40 days
6.4
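On the question above of downloading images versus storing only their URLs: if only URLs are stored, they should at least be made absolute so they still resolve outside the page. Below is a standard-library sketch. The `data-src` lazy-loading attribute is an assumption about how the target site might mark up images.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collect absolute <img> URLs in page order, without duplicates."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        # Lazy-loading markup often keeps the real URL in data-src (assumed
        # convention); otherwise use src. Inline data: URIs are skipped.
        for candidate in (attr_map.get("data-src"), attr_map.get("src")):
            if candidate and not candidate.startswith("data:"):
                absolute = urljoin(self.base_url, candidate)
                if absolute not in self.urls:
                    self.urls.append(absolute)
                break


def image_urls(article_html: str, base_url: str) -> List[str]:
    """Return absolute image URLs found in an article's HTML."""
    collector = ImageCollector(base_url)
    collector.feed(article_html)
    collector.close()
    return collector.urls
```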

Hi,

I came across your project "News Site Scraping Bot" and I'm confident I can help you with it.

About me: I'm an agency owner with 8+ years of experience in PHP and JavaScript, and I understand exactly what's needed to deliver high-quality results on time.

Why choose me?
- ✅ Expertise in the required technologies, plus 1 year of free post-deployment support
- ✅ On-time delivery and excellent communication
- ✅ 100% satisfaction guarantee

Let's discuss your project in more detail. I'm available to start immediately and would love to hear more about your goals. Looking forward to working with you!

Best regards,
Deepak
₹900 INR in 40 days
5.8

Hi,

Let's connect over chat. I have more than 9 years of experience building custom platforms in Python, and I can walk you through my work samples as well. I am online right now.

Thanks,
Ali
₹1,000 INR in 40 days
5.3

Hi,

My extensive experience over the past 15+ years, during which I've successfully completed more than 150 projects, makes me a prime candidate for your web-scraping project. My fluency in Python (including popular libraries like Requests/BeautifulSoup, Scrapy, and Selenium), combined with my deep understanding of data collection from different website structures, particularly qualifies me for this task.

To make future use easy for you, I will include clear setup instructions and comprehensive comments so you can tweak XPaths or CSS selectors as needed. Being mindful of site redesigns, I assure you that these won't hamper the scraping process even if the layout of the target site changes.

Let's have a chat.

Warm regards,
Usama Ansari
₹1,000 INR in 40 days
2.6

Hi,

As a developer with a knack for automation and web scraping, I am confident I can create the perfect scraping bot for your needs. My fluency in automation and scraping tools such as Selenium, BeautifulSoup, and Scrapy, together with my JavaScript and Python skills, uniquely qualifies me to understand and adapt to the specific demands of your project.

With a keen eye for detail, I will make sure the solution captures every requested field from the target website accurately and outputs it in a convenient format. I'm extremely familiar with navigating complex paginated layouts and with handling incidental layout changes through careful abstractions that won't break under new designs.

Regards,
Akif A
₹1,000 INR in 40 days
1.1

Hello,

I am a Data Engineer specializing in building automated ETL pipelines and resilient data structures. I can deliver a reliable, scheduled news-scraping bot that provides clean, de-duplicated data while remaining easy for you to maintain.

Technical Approach:
- Engine: I will use Python (Scrapy) for its native ability to handle concurrency and rate-limiting. If the site is a Single Page Application (SPA), I will integrate Playwright for robust rendering.
- Data Integrity: I will implement a SQLite pipeline to ensure de-duplication based on unique article URLs, while also providing CSV/JSON exports.
- Deep Extraction: The bot will capture headlines, full text, authors, dates, and image URLs.
- Maintainability: I will modularize CSS/XPath selectors. If the site changes its layout, you'll only need to update a single config file.

Why Me? With experience building data warehouses and automating cloud-based pipelines (AWS), I write production-ready code. I will include clear documentation and comments so you can easily adjust the schedule or selectors.

Timeline: 3 days.

Questions:
- Does the target site use infinite scroll?
- Would you like the script to run via a simple local cron job or a cloud-based trigger?

I am ready to deliver a high-quality, professional-grade scraper.

Best regards,
Antônio Viana
Data Engineer | Python Specialist
₹1,000 INR in 40 days
0.0
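A single selectors config file, as proposed above, could look like the JSON below. It is paired with a loader that normalises each field to a list of fallback selectors and fails fast when a required field is missing. The file name `selectors.json`, its layout, and every selector string are hypothetical placeholders. Which library then applies the CSS selectors is left open.

```python
import json
from typing import Dict, List

# Hypothetical contents of selectors.json. After a redesign, only this file
# should need editing. Every selector string here is a placeholder.
EXAMPLE_CONFIG = """
{
  "listing": {"article_link": "a.story-card__link", "next_page": "a.pagination__next"},
  "article": {
    "headline": ["h1.article-headline", "h1[itemprop=headline]", "h1"],
    "body": ["div.article-body p", "article p"],
    "author": ["span.byline__name", "[rel=author]"],
    "published": "time[datetime]",
    "images": ["figure img"]
  }
}
"""

REQUIRED_FIELDS = ("headline", "body")


def load_article_selectors(raw_json: str) -> Dict[str, List[str]]:
    """Parse the config; return each article field as an ordered fallback list."""
    article = json.loads(raw_json)["article"]
    missing = [field for field in REQUIRED_FIELDS if not article.get(field)]
    if missing:
        # Fail before scraping rather than writing rows with empty fields.
        raise ValueError(f"selector config is missing required fields: {missing}")
    # A bare string is shorthand for a one-item fallback list.
    return {
        field: [sel] if isinstance(sel, str) else list(sel)
        for field, sel in article.items()
    }
```

In use, the raw text would come from disk, for example `Path("selectors.json").read_text(encoding="utf-8")`.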

Hi there, your project immediately stood out because it aligns closely with the type of automation work I've been focusing on recently. Over the past year, I've built reliable scraping bots using Python and Selenium to extract structured data from job listing platforms and business directories, handling pagination, dynamic content, and clean data output. I'm comfortable designing scripts that are both robust and easy to maintain when site structures evolve.

I'm a strong fit for this project because I prioritize stability and clarity in my builds. Using Selenium, I can reliably navigate dynamic news sites, extract full article content along with metadata, and structure the output into clean JSON/CSV formats with optional MySQL storage. I also design my scripts with modular selectors and clear comments, so adjusting to layout changes is straightforward. Additionally, I'm mindful of rate-limiting and best practices to ensure the scraper runs smoothly without triggering blocks.

If you can share the target news site, I'd be happy to review its structure and outline a quick extraction plan before getting started. I can begin immediately and deliver a working version quickly for you to test. Looking forward to your reply so we can get this up and running.
₹800 INR in 40 days
0.0
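Writing the same de-duplicated records to both JSON and CSV, as this proposal describes, can be done with the standard library alone. In this sketch, the field names, the use of the article URL as the de-duplication key, and the space-separated `image_urls` column are assumptions.

```python
import csv
import io
import json
from typing import Dict, Iterable, List

# Column order for the CSV export (assumed field names).
FIELDS = ["url", "headline", "author", "published_at", "body", "image_urls"]


def dedupe(articles: Iterable[Dict]) -> List[Dict]:
    """Keep the first record for each URL, preserving scrape order."""
    seen = set()
    unique = []
    for article in articles:
        if article["url"] not in seen:
            seen.add(article["url"])
            unique.append(article)
    return unique


def to_json(articles: Iterable[Dict]) -> str:
    # ensure_ascii=False keeps accented and non-Latin headlines readable.
    return json.dumps(dedupe(articles), ensure_ascii=False, indent=2)


def to_csv(articles: Iterable[Dict]) -> str:
    buffer = io.StringIO()
    # Missing optional fields become empty cells; unknown keys are dropped.
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for article in dedupe(articles):
        row = dict(article)
        # CSV has no list type: join image URLs with spaces, which cannot
        # appear unescaped inside a URL.
        row["image_urls"] = " ".join(article.get("image_urls", []))
        writer.writerow(row)
    return buffer.getvalue()
```

When saving the CSV string, open the file with `open("articles.csv", "w", newline="", encoding="utf-8")` so the line endings written by the csv module are kept as-is.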

Hi,

This is a great fit for my experience: I've built several Python-based scrapers for news and content-heavy sites with similar requirements (pagination, structured extraction, and change-tolerant parsing). I'd implement this using Python + Beautiful Soup as the core library.
₹1,250 INR in 40 days
0.0
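Pagination through the latest-articles section reduces to a loop that follows "next page" links until enough unique article URLs are collected. In this sketch, the page-fetching step is passed in as a function so the loop can be tested offline. The function name, its return shape, and the page limits are assumptions.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical fetch signature: page URL -> (article URLs on that page, next-page URL or None).
ListingFetcher = Callable[[str], Tuple[List[str], Optional[str]]]


def crawl_listing(start_url: str, fetch: ListingFetcher,
                  max_articles: int = 100, max_pages: int = 50) -> List[str]:
    """Follow next-page links, collecting unique article URLs in listing order."""
    visited_pages = set()
    seen_links = set()
    collected: List[str] = []
    url: Optional[str] = start_url
    # Stop on: no next page, a page seen before (loop), or the page budget.
    while url and url not in visited_pages and len(visited_pages) < max_pages:
        visited_pages.add(url)
        links, url = fetch(url)
        for link in links:
            if link not in seen_links:
                seen_links.add(link)
                collected.append(link)
                if len(collected) >= max_articles:
                    return collected
    return collected
```

The `visited_pages` check matters on news sites whose "next" link on the last page points back to page 1; without it the loop would never end.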

Hi,

I can build a reliable Python-based scraping bot that extracts fresh articles from your target news site and delivers clean, structured data ready for use. The solution will be tailored to the site structure, using Requests + BeautifulSoup for static pages or Scrapy/Selenium if pagination or dynamic loading is required.

It will include:
* Scraping of the latest articles with pagination support
* Extraction of the headline + full article text (plus author, date, and image URLs if available)
* Clean, de-duplicated output in CSV or JSON
* Optional SQLite storage for structured querying
* Configurable scheduling so you can control run frequency
* Well-documented code so you can easily adjust selectors if the site changes

I will also ensure the script is production-ready and can successfully extract at least 100 recent articles as required.

Quick question: does the site require any login, or is it fully public without restrictions?

I can start immediately and deliver a working version quickly.
₹1,000 INR in 40 days
0.0
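For the optional SQLite storage mentioned above, de-duplication can be left to the database. Making the article URL the primary key and inserting with `INSERT OR IGNORE` skips rows that are already stored. The schema below is a minimal sketch; the table and column names are assumptions.

```python
import sqlite3
from typing import Iterable, Mapping

SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    url          TEXT PRIMARY KEY NOT NULL,  -- de-duplication key
    headline     TEXT NOT NULL,
    body         TEXT NOT NULL,
    author       TEXT,
    published_at TEXT,                       -- ISO 8601 string when available
    scraped_at   TEXT DEFAULT CURRENT_TIMESTAMP
)
"""

INSERT = (
    "INSERT OR IGNORE INTO articles (url, headline, body, author, published_at) "
    "VALUES (:url, :headline, :body, :author, :published_at)"
)


def save_articles(conn: sqlite3.Connection, articles: Iterable[Mapping]) -> int:
    """Insert new articles and skip URLs already stored; return rows inserted."""
    conn.execute(SCHEMA)
    # Optional fields default to NULL so every named placeholder has a value.
    rows = ({"author": None, "published_at": None, **article} for article in articles)
    before = conn.total_changes
    with conn:  # commit on success, roll back on error
        conn.executemany(INSERT, rows)
    return conn.total_changes - before
```

Because the primary key is indexed, each duplicate check is an index lookup rather than a scan of the whole table, so re-running the scraper over overlapping pages stays cheap.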

I’m confident I can complete what you’re asking for. The libraries I use will depend on how the site is built, but that can be identified quickly. I estimate it will take around 7 days to complete, and I’m available for regular check-ins to ensure the result matches your expectations and requirements. Best of luck
₹750 INR in 40 days
0.0

Hello, I'm a CSE student looking for a few side gigs to improve my skills. This is a good opportunity for me, and I believe it will benefit both parties: you'll get a product at an affordable price, and I'll have a job to work on.
₹750 INR in 15 days
0.0

News site scrapers are a core part of my Python work: pagination, layout-resilient selectors, deduplication, and clean output are all standard in my builds.

My stack for this: Requests + BeautifulSoup for static sites, Selenium if JavaScript rendering is involved. Output goes to CSV, JSON, and SQLite as requested, with an adjustable schedule via APScheduler or a simple cron-ready entry point. The script will capture the headline, full article text, author, publication date, and image URLs where available, and will be clearly commented so you can update any selector yourself if the site redesigns.

Deliverables:
- Clean, documented Python script
- Sample output file with 100+ articles
- Setup instructions (install, configure, run)

Turnaround: 2–3 days from receiving the target URL.

Can you share the target domain now? Some news sites have specific anti-scraping measures that affect the approach, and I'd rather flag that upfront than mid-project.
₹750 INR in 40 days
0.0
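If adding APScheduler is undesirable, the adjustable schedule mentioned above can also be a small standard-library loop. The sketch below is that swapped-in alternative, not APScheduler's API. The function name and parameters are hypothetical. A cron entry such as `*/30 * * * * python3 scraper.py` (script path assumed) achieves the same result without a long-running process.

```python
import time
from typing import Callable, Optional


def run_on_schedule(job: Callable[[], None], interval_minutes: float,
                    max_runs: Optional[int] = None,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Run job every interval_minutes; return how many runs happened."""
    runs = 0
    while max_runs is None or runs < max_runs:
        started = time.monotonic()
        try:
            job()
        except Exception as exc:
            # Report the failure and keep the schedule running.
            print(f"scrape run failed: {exc!r}")
        runs += 1
        if max_runs is not None and runs >= max_runs:
            break
        # Subtract the job's own runtime so runs stay roughly interval-spaced.
        elapsed = time.monotonic() - started
        sleep(max(0.0, interval_minutes * 60 - elapsed))
    return runs
```

The `sleep` parameter exists so the loop can be exercised in tests without actually waiting; in production it defaults to `time.sleep`.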

Hi,

I can build a robust, production-ready Python scraping bot tailored to your requirements. I'll use the most reliable approach (Requests/BeautifulSoup, or Selenium if needed) to ensure stability even with pagination and minor layout changes.

You'll get:
- Clean extraction of the headline, full content, and optional metadata (author, date, images)
- De-duplicated output (CSV/JSON + optional SQLite)
- Rate-limited, robots.txt-aware scraping
- Well-structured, documented code for easy future tweaks
- Sample output with 100+ recent articles

Timeline: 1–2 days after finalizing the target site.

I focus on building scrapers that don't break easily and are simple to maintain long-term. Share the site and I'll get started.

Best regards,
Harshada Redekar
₹750 INR in 30 days
0.0
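Rate-limited scraping, as promised above, also means slowing down when the server pushes back, for example with HTTP 429 or 503 responses. A common pattern is exponential backoff with "full jitter", honouring a numeric `Retry-After` header when present. The function name, base delay, and cap below are assumptions; the HTTP-date form of `Retry-After` is not parsed in this sketch.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  retry_after: Optional[str] = None,
                  rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retry number `attempt` (0 for the first retry)."""
    if retry_after is not None and retry_after.strip().isdigit():
        # The server said exactly how long to wait; honour it as given.
        return float(retry_after.strip())
    # Exponential ceiling: base, 2*base, 4*base, ... never above cap.
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: pick uniformly in [0, ceiling] so retries spread out.
    # An HTTP-date Retry-After value also ends up here (not parsed).
    return (rng or random).uniform(0, ceiling)
```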

Hi,

I can build a Python-based scraping bot to extract news articles, including headline, full text, date, author, and images. I will use BeautifulSoup/Scrapy to ensure clean, structured data output in CSV or JSON format. I can also handle pagination and make the script easy to update if the website layout changes.

I will provide:
- Clean and well-commented code
- Sample output with recent articles
- Easy setup instructions

I can start immediately and deliver quickly. Let me know the target website.

Thanks
₹800 INR in 40 days
0.0
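Publication dates, one of the fields listed above, arrive in many shapes: ISO 8601 in a `<time datetime>` attribute, or human-readable text. A small normaliser can map them to one ISO 8601 string before storage. The fallback formats and the treatment of timezone-less dates as UTC are assumptions to adjust for the target site.

```python
from datetime import datetime, timezone
from typing import Optional

# Assumed human-readable formats; extend to match the target site.
FALLBACK_FORMATS = ("%d %B %Y", "%B %d, %Y")


def normalise_date(raw: Optional[str]) -> Optional[str]:
    """Return an ISO 8601 timestamp string, or None if the date is unreadable."""
    if not raw:
        return None
    text = raw.strip()
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"  # fromisoformat rejects "Z" before Python 3.11
    try:
        parsed = datetime.fromisoformat(text)
    except ValueError:
        parsed = None
        for fmt in FALLBACK_FORMATS:
            try:
                parsed = datetime.strptime(text, fmt)
                break
            except ValueError:
                continue
        if parsed is None:
            return None
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return parsed.isoformat()
```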

Kolkata, India
Member since Apr 22, 2026