
In Progress
Posted
Overview

I run an SMS platform built with:
• Node.js / TypeScript
• Supabase (Postgres)
• Twilio (Programmable Messaging)

We have an existing campaign sending system that is well documented architecturally but not working correctly in production. I need a developer to debug and stabilize the existing system, not rebuild it.

⸻

The Problem (Real Production Behavior)

When sending a campaign to ~14,000 recipients:
• ~2,500 messages send successfully
• ~6,800 messages fail with internal system errors (not Twilio errors)
• ~5,500 messages remain stuck in the queue for days

This is consistent and reproducible.

⸻

Important Context (Architecture)

Our system:
• uses a Postgres-backed queue (message_logs table)
• has multiple status states (pending, processing, ready, reserved, retry_scheduled, etc.)
• has two pipelines:
  • v1: polling-based queue worker
  • v2: scheduler + admission + leased batch workers
• uses:
  • row claiming (FOR UPDATE SKIP LOCKED)
  • leasing (reserved_at / reserved_until)
  • retry scheduling
  • pacing and rate limits
  • Twilio send + status polling

We also support:
• large campaigns (10k+ recipients)
• windowed dispatch (partial materialization)
• retries and delivery reconciliation

On paper, the system is solid. In practice, it is not behaving correctly.

⸻

What I Actually Need

1. Audit the real pipeline vs expected behavior

You will:
• trace how message_logs rows move through statuses
• verify:
  • claim logic
  • lease/reclaim logic
  • retry handling
  • pipeline selection (v1 vs v2)
• identify where rows:
  • get stuck
  • error internally
  • stop progressing

⸻

2. Identify root cause(s)

Based on the current symptoms, likely areas include:
• rows stuck in intermediate states (processing, ready, reserved, etc.)
• lease reclaim not working correctly
• retry scheduling broken or incomplete
• v1/v2 pipeline interaction issues
• worker crashes or failures after claim
• incorrect status transitions
• broken completion or backlog counting
• queue starvation or over-throttling

You must confirm the actual root cause, not guess.

⸻

3. Fix the sending pipeline

After identifying the issues, implement fixes so that:
• campaigns process continuously until completion
• rows do not remain stuck
• failures are properly classified
• retries behave correctly
• reclaim logic works reliably
• no rows sit idle for long periods

⸻

4. Implement controlled batching + pacing

We want stable throughput such that:
• campaigns send in controlled batches (~500–1500 at a time)
• Twilio is not overwhelmed
• the system does not create backlog pileups
• throughput is steady and predictable

Important: this should be implemented properly with rate limiting and worker behavior, not hardcoded batch loops.

⸻

5. Improve observability (very important)

Right now it is hard to see what's happening. Add clear logging/metrics for:
• rows claimed per cycle
• rows sent / failed / retried
• rows stuck in each status
• lease/reclaim behavior
• retry scheduling
• Twilio send attempts vs successes
• internal errors (with the real root cause, not a generic message)

⸻

6. Validate the fix

You must:
• test with large campaigns (or a simulation)
• demonstrate that:
  • no large backlog remains stuck
  • rows flow correctly through states
  • campaigns complete fully
• provide logs or other evidence

⸻

7. Document what changed

Provide a short write-up covering:
• root cause(s)
• what was broken
• what you fixed
• what to monitor going forward

⸻

What This Is NOT
• Not a full rewrite
• Not “let’s switch to Redis/BullMQ”
• Not just adding batching on top of broken logic

This is a debug-and-stabilize project for the existing system.

⸻

Ideal Candidate

You have experience with:
• Twilio SMS / messaging systems
• Postgres-backed queues or job systems
• status-based pipelines / workers
• retry + leasing + queue recovery logic
• debugging real production systems

Bonus:
• worked on bulk messaging / notification systems

⸻

Required Questions (must answer)
1. Have you worked with Twilio messaging? What did you build or debug?
2. Have you worked with Postgres-based queues (not Redis)?
3. If rows are:
  • partially sent
  • partially failing internally
  • partially stuck in the queue
what would you inspect first?
4. How would you debug rows stuck in a leased/reserved state?
5. What logs or metrics would you add immediately?

⸻

Project Structure (Suggested)
• Milestone 1: Audit + root cause analysis
• Milestone 2: Fix pipeline + batching
• Milestone 3: Validation + documentation

⸻

Blunt Summary

The system is architecturally complex but operationally unreliable. I need someone who can:
• read an existing system
• find where reality diverges from design
• fix it so campaigns actually send end-to-end without getting stuck
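For reference, the claim-and-lease step described in the brief above typically looks something like the minimal sketch below. This is an illustration assuming node-postgres, not the platform's actual code; only the message_logs columns named in the brief are taken from it, everything else is an assumption.

```typescript
// Minimal sketch of the claim + lease step (not the platform's actual code).
// Assumes node-postgres (`pg`) and a message_logs table with at least:
// id, status, reserved_at, reserved_until. Everything else is illustrative.
import { Pool } from "pg";

const pool = new Pool(); // connection settings come from environment variables

// Claim up to `batchSize` pending rows and lease them for `leaseSeconds`.
// FOR UPDATE SKIP LOCKED lets concurrent workers claim disjoint sets of rows.
async function claimBatch(batchSize: number, leaseSeconds: number): Promise<string[]> {
  const { rows } = await pool.query(
    `WITH claimed AS (
       SELECT id
       FROM message_logs
       WHERE status = 'pending'
       ORDER BY id
       LIMIT $1
       FOR UPDATE SKIP LOCKED
     )
     UPDATE message_logs m
     SET status = 'reserved',
         reserved_at = now(),
         reserved_until = now() + ($2::int * interval '1 second')
     FROM claimed
     WHERE m.id = claimed.id
     RETURNING m.id`,
    [batchSize, leaseSeconds]
  );
  return rows.map((r) => String(r.id));
}
```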
Project ID: 40352549
92 proposals
Remote project
Active 11 days ago

I'm prepared to dig into your SMS platform built with Node.js/TypeScript, Supabase, and Twilio. By thoroughly auditing the pipeline, I'll identify issues with status transitions, lease handling, and stuck rows. After confirming the root causes, I'll implement fixes to ensure campaigns complete steadily without backlog or idle rows. I'll improve observability by adding detailed logging and metrics, test the fixes with large campaigns to validate consistent message flow, and provide comprehensive documentation. When can we start? I can have something to show you within 24 hours.
$9 USD in 40 days
0.0
92 freelancers are bidding on average $17 USD/hour for this job

Hello, As an engineer at Live Experts, I have deep experience across multiple facets of software development, including proficiency in languages and platforms like Node.js, Postgres, and Twilio, all of which will be central to your project. In my previous roles, I've debugged and stabilized various complex systems and architectures to ensure they perform as expected in production. In fact, one of my most notable achievements was optimising a client's SMS platform, which resulted in a 40% increase in throughput and virtually eliminated stuck-queue issues like yours. I genuinely understand the frustration caused by inconsistent production behavior in an otherwise well-documented system. So, instead of rebuilding it from scratch, my approach aligns with the essence of your project: debugging and stabilizing the existing system. Throughout my career, I've honed my skills precisely around auditing pipeline mechanisms, identifying root causes, and implementing stable fixes that keep campaign processing uninterrupted. Additionally, the v1/v2 pipeline interactions remind me of a similar issue where my solution involved redesigning the architecture's flow to gain more control over each message status state. One thing that sets me apart from other freelancers is the depth of emphasis I place on observability. My understanding goes beyond mere implementation; I firmly believe that proper logging/metrics can proactively prevent issues like this from recurring. Thanks!
$50 USD in 1395 days
6.8

This looks like a great fit. I will audit your message_logs status transitions, trace where rows stall between `processing`, `reserved`, and `ready`, and fix the claim/lease/reclaim logic so campaigns complete fully. The first thing I will inspect is your `FOR UPDATE SKIP LOCKED` query alongside `reserved_until` expiry — stuck rows usually mean the reclaim window never fires or conflicts with the v1/v2 pipeline selection. Questions: 1) Are v1 and v2 workers running simultaneously on the same rows? 2) Do you have logs showing which status rows hold when they stall? Looking forward to talking through the details. Kamran
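For illustration, the reclaim check this bid proposes to inspect first might look roughly like the sketch below, assuming node-postgres and only the columns named in the brief; it is not the project's actual code.

```typescript
// Minimal sketch (not the project's code): reclaim rows whose lease expired
// without reaching a terminal status, so another worker can pick them up.
// Assumes node-postgres (`pg`) and only columns named in the brief.
import { Pool } from "pg";

const pool = new Pool(); // connection settings from environment variables

async function reclaimExpiredLeases(): Promise<number> {
  const result = await pool.query(
    `UPDATE message_logs
     SET status = 'pending',
         reserved_at = NULL,
         reserved_until = NULL
     WHERE status IN ('reserved', 'processing')
       AND reserved_until IS NOT NULL
       AND reserved_until < now()`
  );
  return result.rowCount ?? 0; // number of rows handed back to the queue
}
```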
$18 USD in 40 days
6.4

Hello, I understand the complexity of your SMS platform using Node.js, Supabase, and Twilio, and I have experience debugging production issues in similar environments. I can thoroughly audit your pipeline, trace message status flows, and identify root causes of stuck messages and internal errors without rebuilding the system. I will implement fixes to ensure smooth campaign processing including proper lease and retry logic, controlled batching with pacing, and improved observability through detailed logging and metrics. I will also validate the system with large campaigns and provide clear documentation on what was fixed and what to monitor going forward. Thanks, Teo
$25 USD in 35 days
6.0

Hello there, I am a skilled developer with expertise in Node.js/TypeScript, Supabase (Postgres), and Twilio, and I am excited about the opportunity to help debug and stabilize your existing SMS campaign sending system. I understand the complexity of the issue you are facing with messages failing to send and getting stuck in the queue, and I am confident in my ability to troubleshoot and resolve these production behavior issues while maintaining the integrity of your current system architecture. Regards, Yogesh Kumar
$9 USD in 34 days
5.7

Hello, I've carefully reviewed your project and am excited about the opportunity to work with you. With 7 years of experience in Node.js, TypeScript, Postgres-backed queues, and Twilio messaging systems, I specialize in stabilizing complex pipelines and restoring predictable throughput. I am confident I can resolve your stuck queue, internal failures, and pipeline inconsistency efficiently. Here's my approach: first I will audit the full pipeline, trace message state transitions, verify claim logic, leasing, and retry flows, and pinpoint where rows stop progressing. Next I will identify root causes inside the v1 and v2 workers, resolve broken transitions, fix lease reclaim, ensure retries work, and eliminate queue starvation. I am available to start immediately and aim to deliver the full audit and first fixes within 3 days. I will also enhance observability with clear metrics and logs, validate stability with simulated large campaigns, and document root causes, fixes, and monitoring guidelines. Thanks, Jushua
$20 USD in 27 days
5.1

As a leader in the software development industry, STR Softwares LLP is primed to provide the expertise needed to debug and stabilize your existing SMS sending pipeline. We have a decade of experience working with Python and database management, skills that transfer directly to scanning, tracing, and identifying bugs within your Postgres-backed queue system. What sets us apart from other contenders is our proficiency in managing high-intensity processes and our debugging abilities. We understand the challenges of handling large-scale campaigns, and we're experienced in tailoring solutions for stable throughput. Moreover, we can implement controlled batching and pacing for you, ensuring that your system never overwhelms Twilio while maintaining stability. Lastly, as a team that prides itself on excellent communication and project management, you will be continuously updated on our progress. Our dedication to 24/7 communication ensures that you will never have to wonder what's happening with your pipeline during the project. We look forward to working together to transform the current inconsistencies into a streamlined, highly functional system poised for long-term success.
$12 USD in 40 days
4.8

I have reviewed the details of your SMS sending pipeline project and am confident in my ability to debug and stabilize the existing system. With over a decade of experience in Debugging, Node.js, PostgreSQL, and Software Engineering, I have successfully resolved similar issues in the past. By conducting a thorough audit of the pipeline, identifying root causes, and implementing targeted fixes, I will ensure continuous campaign processing, eliminate message queue bottlenecks, and enhance retry handling. Additionally, I will introduce controlled batching and pacing mechanisms for stable throughput and improved observability through detailed logging and metrics. I am well-versed in Twilio messaging systems, Postgres-backed queues, and status-based pipelines, making me the ideal candidate for this project. Let's discuss how I can assist in resolving these operational challenges and ensuring seamless campaign delivery.
$12 USD in 40 days
4.8

Hello, your outline of the Twilio + Supabase pipeline issues makes it clear the failures aren't random: your leased states, reclaim logic, and v1/v2 pipeline interaction are likely drifting out of sync under heavy load. I've debugged similar high-volume messaging systems where queue states stalled due to incorrectly reclaimed locks and inconsistent status transitions. I've previously stabilized a Postgres-backed SMS dispatcher for 30k+ recipients by fixing flawed claim logic and adding proper pacing so throughput stayed consistent without backlog buildup. The deeper challenge here is not the Twilio send itself but the lifecycle of message_logs rows, specifically how transitions between pending→processing→reserved or retry_scheduled fail under concurrency. A junior developer would miss how v1 and v2 workers compete for the same rows and leave partial batches orphaned. I will trace row movement through each state, validate lease expiry, instrument claim/reclaim behavior, and patch status transitions so workers progress continuously. I'll also implement controlled pacing and observability so failures become diagnosable instead of silent. Before starting, I need clarity on your current worker deployment environment and concurrency levels. Best regards, John allen.
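One common way to patch status transitions so competing workers cannot silently overwrite each other, as this bid describes, is a guarded compare-and-set update. A minimal sketch assuming node-postgres; nothing here is the project's actual code.

```typescript
// Minimal sketch of a guarded (compare-and-set) status transition: the UPDATE
// only succeeds if the row is still in the state this worker believes it holds,
// so a competing v1/v2 worker or an expired lease cannot be silently overwritten.
// Assumes node-postgres (`pg`) and only the message_logs columns from the brief.
import { Pool } from "pg";

const pool = new Pool();

async function transition(
  id: string,
  fromStatus: string,
  toStatus: string
): Promise<boolean> {
  const result = await pool.query(
    `UPDATE message_logs
     SET status = $3
     WHERE id = $1
       AND status = $2`,
    [id, fromStatus, toStatus]
  );
  // rowCount 0 means another worker moved the row first; the caller should
  // log that and stop, rather than continuing as if the transition happened.
  return (result.rowCount ?? 0) === 1;
}
```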
$15 USD in 36 days
4.5

Hi there, I have read your project requirement. You need an experienced developer to audit, debug, and stabilize your existing SMS campaign system built with Node.js, TypeScript, Supabase (Postgres), and Twilio—focusing on fixing queue inconsistencies, resolving stuck messages, correcting retry and lease logic, and ensuring campaigns complete reliably with proper batching and observability. We have strong experience working with Twilio and Postgres-based queue systems. We can deeply analyze your current pipeline (v1 and v2), identify real root causes behind stuck and failed messages, fix claim/lease/retry flows, and implement controlled batching with proper rate limiting. We also enhance logging and metrics so system behavior is clearly visible and predictable. A few quick questions: =================== Are both v1 and v2 pipelines currently active in production simultaneously? Do you have centralized logging/monitoring or only application logs? What is your current Twilio throughput/rate limit configuration? Are workers running as multiple instances or a single process? Best Regards, Srashtasoft Team
$12 USD in 40 days
4.7

Hi, I would love to help. I have reviewed your project and noticed that it is very similar to a task I completed two months ago. I am a skilled freelancer with 6+ years of experience in PostgreSQL, Node.js, Software Development and I can deliver the results as quickly as possible. You can visit my profile to check my latest work and recent reviews. Connect in chat to discuss details and next steps. Talk soon.
$15 USD in 40 days
3.9

Hi there! You are running a Postgres-backed Twilio SMS system, and the real challenge is stopping messages from getting stuck in intermediate states while ensuring retries, leases, and pipelines run smoothly — that is exactly where most messaging systems fail at scale. I recently debugged a 12k+ recipient SMS campaign system where rows were stuck due to lease reclaim issues and broken retry logic; by fixing pipeline state transitions and adding observability, we achieved 100% delivery with predictable throughput. I will audit your current pipelines, identify root causes, implement reliable batching and pacing, and improve logging so campaigns flow end-to-end without blockage. Check our work: https://www.freelancer.com/u/ayesha86664 Do you have any metrics on how many rows fail versus remain stuck per pipeline (v1 vs v2) so I can target the critical path first? I am ready to start — just say the word. Best Regards, Ayesha
$10 USD in 40 days
4.0

Hi there. Which pipeline is currently active in production: the v1 polling worker, the v2 leased batch system, or both running together? When rows get stuck in reserved or processing, does reserved_until expire correctly and get reclaimed, or stay locked forever? This is a classic production queue issue and the architecture is already strong. The focus should be on tracing real state transitions, fixing the lease and retry flow, and stabilizing worker behavior with controlled batching. I have worked on similar bulk messaging systems with Node.js, Postgres queues, and Twilio where partial sends and stuck jobs happened under load. The main issue was broken lease reclaim and workers failing after claim without proper state recovery. That was solved by auditing status transitions, fixing reclaim logic with SKIP LOCKED, adding retry guards, and introducing proper rate control with metrics. A strong background in backend systems and security auditing helps with debugging real production failures deeply. Ready to start with the audit and root-cause analysis immediately. Best, Ivan
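As an illustration of the retry-guard idea mentioned above: cap attempts and push the next try out with backoff. A minimal sketch assuming node-postgres; the attempts, last_error, and next_retry_at columns and the 'failed' terminal status are hypothetical, not confirmed by the brief.

```typescript
// Minimal retry-guard sketch (not the project's code): after a failed send,
// either schedule the next attempt with exponential backoff or mark the row
// permanently failed once the attempt cap is reached. The columns attempts,
// last_error, next_retry_at and the 'failed' status are assumptions.
import { Pool } from "pg";

const pool = new Pool();
const MAX_ATTEMPTS = 5;

async function scheduleRetry(id: string, errorMessage: string): Promise<void> {
  await pool.query(
    `UPDATE message_logs
     SET attempts = attempts + 1,
         last_error = $2,
         status = CASE WHEN attempts + 1 >= $3 THEN 'failed'
                       ELSE 'retry_scheduled' END,
         next_retry_at = CASE WHEN attempts + 1 >= $3 THEN NULL
                              ELSE now() + (power(2, attempts + 1) * interval '30 seconds') END,
         reserved_at = NULL,
         reserved_until = NULL
     WHERE id = $1`,
    [id, errorMessage, MAX_ATTEMPTS]
  );
}
```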
$15 USD in 40 days
3.9

Hey there, I've been working with Node.js, TypeScript, PostgreSQL and distributed worker systems for over 7 years, and I love helping stabilize complex production systems like yours. I'm confident I can debug your SMS pipeline, identify why messages are stuck or failing, and make sure campaigns process smoothly end-to-end. I've gone through your architecture and requirements carefully, and I already have a strong idea of where issues like leasing, retry handling, or pipeline conflicts may be happening, but I have a couple of quick questions to validate assumptions before starting. I'm excited to work with you and make this system stable, observable, and scalable. Talk soon, Pavlo
$15 USD in 30 days
3.8

This looks like a great fit. I will audit your Twilio + Supabase + Node.js sending pipeline, trace how message_logs rows move through statuses in production, identify exactly where rows get stuck, fail internally, or stop progressing, then fix the pipeline so campaigns process to completion without backlog. I will implement proper batched pacing with rate limiting, add observability logging across the full lifecycle, validate with a large campaign test, and document every root cause and fix. To answer your required questions: 1) I have worked with Twilio Programmable Messaging for bulk notification systems, including delivery reconciliation and status callback handling. 2) Yes — I have built and debugged Postgres-backed job queues using FOR UPDATE SKIP LOCKED, which is exactly what your system uses. 3) For partially sent/failing/stuck rows, I would first query message_logs grouped by status with timestamps to find where rows accumulate, then check whether the claim query and lease reclaim are racing or whether workers are crashing after claiming but before updating status. 4) For stuck leased rows, I would check reserved_until values against the current time to see if the reclaim window is too long or the reclaim worker is not running on schedule. 5) Immediate metrics: rows per status per minute, claim-to-send latency, lease age distribution, retry count histogram, and Twilio API response codes per batch. Looking forward to discussing further.
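The first inspection step this bid describes, a status breakdown with row ages plus a check of reserved_until against the current time, might look roughly like the sketch below. It assumes node-postgres; the updated_at column is an assumption, not taken from the brief.

```typescript
// Minimal diagnostic sketch (not the project's code): where rows pile up per
// status, and which leased rows are already past their lease. Assumes
// node-postgres; updated_at is an assumed column name.
import { Pool } from "pg";

const pool = new Pool();

// Rows per status, with the age of the oldest row in each bucket.
async function statusBreakdown() {
  const { rows } = await pool.query(
    `SELECT status,
            count(*)                AS row_count,
            now() - min(updated_at) AS oldest_row_age
     FROM message_logs
     GROUP BY status
     ORDER BY count(*) DESC`
  );
  return rows;
}

// Leased rows whose lease has already expired (candidates for reclaim).
async function expiredLeases() {
  const { rows } = await pool.query(
    `SELECT id, status, reserved_at, reserved_until,
            now() - reserved_until AS overdue_by
     FROM message_logs
     WHERE status IN ('reserved', 'processing')
       AND reserved_until < now()
     ORDER BY reserved_until
     LIMIT 100`
  );
  return rows;
}
```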
$12 USD in 40 days
3.8

Hello, The primary challenge is diagnosing and rectifying the root causes of message failures and stuck rows in the existing SMS campaign system. I will conduct a thorough audit of the message_logs table to trace the state transitions and identify where rows get stuck or fail. This includes validating the claim, lease, and retry logic across both pipeline versions. After pinpointing the issues, I will implement fixes to ensure continuous processing of campaigns and enhance the reclaim logic. Controlled batching with appropriate rate limiting will be established, ensuring no backlog forms and throughput remains stable. Finally, I will enhance observability by integrating comprehensive logging and metrics for all critical operations. Deliverables include a fixed and validated sending pipeline, detailed documentation of the changes made, and metrics to monitor future performance. My background includes extensive work with Twilio and Postgres-backed job systems, allowing me to effectively resolve this issue. I can start immediately. Best Regards.
$12 USD in 40 days
3.0

Hello, The primary engineering challenge here involves identifying the discrepancies between the expected and actual behaviors of the message processing pipeline. Another concern is ensuring that rows do not get stuck in intermediate states and that retry mechanisms function correctly under load. Could you clarify how the existing logging currently captures failures versus successes? What metrics are you currently monitoring for the message queues, and are there specific thresholds that trigger alerts? Additionally, how are the v1 and v2 pipelines currently interacting, and what specific metrics would you find most useful to improve observability? I am ready to assist in stabilizing your SMS platform and ensuring reliable campaign processing.
$8 USD in 40 days
3.1

⚡ I'm ready to start! ⚡ I’ve worked on Twilio-based messaging systems and debugged production pipelines where Node.js workers interact with Postgres queues using leasing, retries, and SKIP LOCKED patterns. In one case, I stabilized a high-volume notification system where jobs were getting stuck between reserved and retry states due to lease expiry and worker failure edge cases. Given your setup with Twilio and Supabase, the symptoms you’re seeing point to breakdowns in state transitions, lease recovery, or worker lifecycle rather than external delivery issues. I’d begin by tracing real row lifecycles across both pipelines, validating claim, lease, and retry flows under load, and identifying exactly where progression halts or diverges from expected behavior. From there I’d fix the underlying transition or recovery issues, implement proper pacing at the worker level, and add clear metrics around queue states, throughput, and failure types so behavior is visible and predictable. I communicate directly, share findings as I go, and will back every fix with evidence from controlled tests so you can trust the system is actually stable. I would be glad to work with you on this project. Thanks.
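The "evidence from controlled tests" mentioned above could be as simple as the check sketched below: after a simulated campaign run, fail loudly if any rows never reached a terminal state. A minimal sketch assuming node-postgres; campaign_id and the terminal statuses 'sent', 'delivered', 'failed' are assumptions about the real schema.

```typescript
// Minimal validation sketch (not the project's code): after running workers
// against a seeded test campaign, fail loudly if any rows never reached a
// terminal state. campaign_id and the terminal status names are assumptions.
import { Pool } from "pg";

const pool = new Pool();

async function assertCampaignDrained(campaignId: string): Promise<void> {
  const { rows } = await pool.query(
    `SELECT status, count(*) AS row_count
     FROM message_logs
     WHERE campaign_id = $1
       AND status NOT IN ('sent', 'delivered', 'failed')
     GROUP BY status`,
    [campaignId]
  );
  if (rows.length > 0) {
    throw new Error(
      `Campaign ${campaignId} still has non-terminal rows: ${JSON.stringify(rows)}`
    );
  }
}
```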
$12 USD in 40 days
3.0

Hello, This is exactly the kind of production debugging and stabilization work I specialize in. With 11+ years in backend systems (Node.js, Postgres, queues, APIs), I've fixed similar high-volume messaging pipelines with stuck states, lease issues, and inconsistent throughput.

Relevant Experience:
• Twilio SMS systems (bulk campaigns, delivery tracking, retries)
• Postgres-based queues using FOR UPDATE SKIP LOCKED + leasing
• Debugging distributed workers, retry pipelines, and state machines

What I'd Inspect First:
• Status transitions in message_logs (invalid/terminal loops)
• Lease expiry & reclaim logic (reserved_until not releasing)
• Worker failures after claim (rows stuck in processing/reserved)
• Retry scheduler gaps or misclassified failures
• v1 vs v2 pipeline conflicts causing starvation

Debug Approach:
• Trace row lifecycle end-to-end
• Add targeted logs for claim → send → retry → completion
• Identify dead states and broken transitions
• Validate throughput vs backlog vs rate limits

Fix Plan:
• Repair state machine + reclaim logic
• Stabilize retry handling
• Implement controlled batching with proper pacing (not hacks)
• Add observability (status counts, lease metrics, error classification)

Key Metrics I'd Add:
• Rows per status, claim rate, stuck durations
• Lease reclaim success/failure
• Twilio attempts vs internal failures
• Retry queue health

I focus on fixing root causes, not surface symptoms. Ready to start with the audit immediately. Best regards, Jagrati.
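A per-cycle log line covering metrics like "rows claimed per cycle" and "sent / failed / retried", as listed above, could be as small as the sketch below. The field names are illustrative only, not the project's actual logging schema.

```typescript
// Minimal sketch of one-JSON-line-per-cycle worker logging (not the project's
// code): covers "rows claimed per cycle" and "sent / failed / retried" style
// metrics. Field names are illustrative only.
interface CycleStats {
  pipeline: "v1" | "v2";
  claimed: number;
  sent: number;
  failedTwilio: number;   // rejected by Twilio (API error or status callback)
  failedInternal: number; // our own code failed before or while processing
  retried: number;
  reclaimed: number;      // expired leases handed back this cycle
}

function logCycle(stats: CycleStats): void {
  // One JSON line per cycle is easy to grep, graph, and alert on.
  console.log(JSON.stringify({ at: new Date().toISOString(), ...stats }));
}

// Example usage at the end of a worker cycle:
logCycle({
  pipeline: "v2",
  claimed: 500,
  sent: 480,
  failedTwilio: 5,
  failedInternal: 3,
  retried: 8,
  reclaimed: 4,
});
```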
$8 USD in 40 days
2.4

Hello, I’ve debugged high-volume Twilio SMS pipelines on Node/TS with Postgres queues (SKIP LOCKED, leasing, retries), stabilizing systems that stalled under load. For your case, I’ll trace message_logs state transitions end-to-end, validate claim/lease/reclaim and v1/v2 interaction, pinpoint where rows stall, then fix transitions, retry scheduling, and worker recovery. I’ll implement proper rate-limited batching (500–1500), add observability (claims/sent/failed/stuck/leases), and validate with large runs. (1) yes—built/debugged bulk SMS senders; (2) yes—PG queues; (3) inspect state transitions, leases, worker failures; (4) check expired leases + reclaim queries; (5) add per-state counts, claim/success/fail metrics, lease timers, error traces. Oscar
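The rate-limited batching (500–1500) mentioned above generally amounts to claiming a bounded batch, spacing out the sends, and repeating. A minimal sketch follows; claimBatch and sendOne are hypothetical stand-ins for the real claim and Twilio send steps, and the constants are placeholders to be tuned against real Twilio limits.

```typescript
// Minimal pacing sketch (not the project's code): process a campaign in
// bounded batches at a fixed outbound rate instead of firing everything at
// once. claimBatch/sendOne are hypothetical stand-ins; tune the constants
// against the real Twilio rate limits.
const BATCH_SIZE = 500;         // within the 500–1500 range from the brief
const MESSAGES_PER_SECOND = 10; // outbound pacing toward Twilio

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical stand-in: the real version would claim rows from message_logs.
async function claimBatch(size: number): Promise<string[]> {
  return [];
}

// Hypothetical stand-in: the real version would call Twilio and update status.
async function sendOne(messageId: string): Promise<void> {}

async function runPacedLoop(): Promise<void> {
  for (;;) {
    const ids = await claimBatch(BATCH_SIZE);
    if (ids.length === 0) {
      await sleep(5_000); // nothing claimable right now: back off briefly
      continue;
    }
    for (const id of ids) {
      await sendOne(id);
      await sleep(1000 / MESSAGES_PER_SECOND); // fixed spacing between sends
    }
  }
}
```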
$12 USD in 40 days
2.2

Hello, I can help debug and stabilize your existing Node.js/TypeScript SMS campaign system with Postgres and Twilio. I’ll audit the pipeline to trace why messages fail or get stuck, check claim, lease/reclaim, retry logic, and pipeline behavior, and identify root causes. Then I’ll fix the system so campaigns run end-to-end, rows don’t remain stuck, retries work properly, and controlled batching (~500–1500 messages) ensures stable throughput without overwhelming Twilio. I’ll also add clear logging/metrics for rows claimed, sent, failed, retried, and stuck, so you can monitor campaigns effectively. Finally, I’ll validate with large campaigns and provide documentation outlining fixes, root causes, and monitoring guidance. Estimated budget: $300–$450 USD, phased in milestones for audit, fixes, and validation. I have experience with Twilio, Postgres-backed queues, status pipelines, and bulk messaging systems. Ready to start immediately and ensure your campaigns run reliably.
$12 USD in 40 days
2.3

Leesburg, United States
Payment method verified
Member since Aug 11, 2022