
Closed
We are looking for an experienced Senior Data Scientist for a short-term, high-impact project involving LLM evaluation and advanced data workflows.

The Essentials
Duration: 1–2 months (potential for extension).
Commitment: 20+ hours per week (flexible hours, completed on your own schedule).
Context: Supplemental income / "Side-gig" friendly. Fully remote.
Rate: $200 per task completed (task durations 1–3 hours max).

What You’ll Do
Design and run A/B experiments and evaluate model behavior.
Build/audit data workflows and pipelines (dbt, SQL).
Analyze datasets and communicate findings for data-driven decisions.
Work with LLM benchmarking and agentic coding workflows.

What You Need
4+ years professional experience in DS, ML, or Data Engineering.
Expert Python (pandas, NumPy, scikit-learn) and SQL.
Proven ability to diagnose ML failure modes and improve model quality.
Familiarity with cloud warehouses (Snowflake, BigQuery, or Redshift).

Why this role?
This is a flexible, remote engagement perfect for those looking to contribute to cutting-edge AI research and work with top-tier industry labs without the commitment of a full-time product role.
Project ID: 40344039
73 proposals
Remote project
Active 15 days ago
73 freelancers are bidding on average $53 USD/hour for this job

I am a Data Scientist with over 4 years of professional experience in ML and Data Engineering, focusing on advanced data workflows and model evaluations. My expertise in Python, using libraries such as pandas, NumPy, and scikit-learn, aligns well with your project requirements. I am adept at SQL for managing and analyzing datasets, and have practical experience with cloud data warehouses like Snowflake and BigQuery to support data-driven decisions. In previous roles, I have designed and executed A/B experiments, evaluated model behaviors, and optimized ML workflows. My work includes building and auditing robust data pipelines, leveraging tools such as dbt for transformation, and ensuring quality through diagnosing ML failure modes. I am familiar with agentic coding workflows and LLM benchmarking, which are crucial for your high-impact project. I am interested in learning more about the specific objectives and LLM applications you are targeting. I am available for further discussion at your convenience to explore how I can contribute effectively. Thank you.
$50 USD in 40 days
8.4

⭐⭐⭐⭐⭐ Experienced Senior Data Scientist for LLM Evaluation & Data Workflows ❇️
Hi My Friend, I hope you're doing well. I've reviewed your project requirements and see you are looking for an experienced Senior Data Scientist. Look no further; Zohaib is here to help you! My team has successfully completed 50+ similar projects in data science. I will create and run A/B experiments, audit data workflows, and analyze datasets to provide clear insights.
➡️ Why Me? I can easily manage your data science needs as I have over 4 years of experience in data science, machine learning, and data engineering. My expertise includes Python with libraries like pandas and NumPy, as well as SQL for data manipulation. Additionally, I have a strong grip on diagnosing ML issues and enhancing model quality using cloud warehouses.
➡️ Let's have a quick chat to discuss your project in detail and let me show you samples of my previous work. Looking forward to discussing this with you!
➡️ Skills & Experience: ✅ Data Analysis ✅ Python (pandas, NumPy, scikit-learn) ✅ SQL ✅ A/B Testing ✅ Machine Learning ✅ Data Workflows ✅ Cloud Warehouses (Snowflake, BigQuery, Redshift) ✅ Model Evaluation ✅ Data Visualization ✅ Problem Diagnosis ✅ Communication Skills ✅ Remote Collaboration
Waiting for your response!
Best Regards,
Zohaib
$50 USD in 40 days
7.9

Hello, At Live Experts, LLC we have built a reputation on not just meeting, but exceeding, our clients' expectations. With over four years of professional experience in Data Science, our expertise in Python (including pandas, NumPy, and scikit-learn) and SQL is top-notch. We fully understand the importance of clean and well-organized data workflows for efficient LLM evaluation and we're highly skilled in building/auditing these pipelines. One standout quality that sets us apart from others is our ability to diagnose ML failure modes and improve model quality. This is critical in ensuring that data-driven decisions are reliable, a skill we've fine-tuned over the years. In addition to these proficiencies, we have comprehensive knowledge of cloud warehouses such as Snowflake, BigQuery, and Redshift, a strong asset for your project. Moreover, being "side-gig" friendly with a flexible schedule allows us to fully commit 20+ hours per week while taking care of other commitments. This ensures efficient task management within your given timeframe. In conclusion, hiring us means partnering with a highly skilled senior data scientist who would bring not only deep domain expertise but also an uncompromising commitment to driving results. Thanks!
$50 USD in 1047 days
7.5

I am a seasoned Data Scientist with over 4 years of professional experience in DS, ML, and Data Engineering, making me a perfect fit for your Senior Data Scientist role for Advanced LLM Workflow. I understand the importance of designing and running A/B experiments, building data workflows, and analyzing datasets for data-driven decisions in a high-impact project like yours. My expertise in Python (pandas, NumPy, scikit-learn) and SQL, along with my ability to diagnose ML failure modes and improve model quality, positions me as the ideal candidate for this position. For your project, I can strategically optimize data workflows and pipelines using my experience in data processing and analysis. With a proven track record in similar domains and handling complex AI research tasks, I am confident in delivering the results you desire. Let's discuss your project requirements in detail and craft a roadmap for successful completion. Reach out to me to take the next steps towards achieving your project goals.
$50 USD in 15 days
7.0

✅ Proposal for Senior Data Scientist for Advanced LLM Work
With over 4 years of professional experience in data science and machine learning, I am ideally suited for this high-impact project. My expertise includes Python (pandas, NumPy, scikit-learn), SQL, and diagnosing ML models to enhance their accuracy and efficiency. I have a strong background in designing A/B testing experiments, building and auditing complex data workflows and pipelines using dbt and SQL, and leveraging cloud warehouses like Snowflake and BigQuery. My experience ensures effective evaluation and optimization of LLMs and data-driven decision-making. I look forward to bringing my skills to your cutting-edge AI research project.
$50 USD in 30 days
7.1

Hi,
A common issue in LLM evaluation projects is inconsistent experiment design and lack of reproducible pipelines, which leads to unreliable conclusions and hidden model failure modes. I address this by structuring A/B evaluations with controlled prompts, standardized metrics, and statistically valid comparisons to ensure decisions are data-driven. My workflow combines Python (pandas, NumPy, scikit-learn) for analysis with SQL/dbt pipelines to build clean, versioned datasets in warehouses like BigQuery or Snowflake. I’ve worked on diagnosing LLM weaknesses such as hallucination patterns, prompt sensitivity, and edge-case failures, then iterating with targeted evaluation sets. For agentic workflows, I focus on traceability and logging to understand how models behave across multi-step reasoning tasks. I can also audit existing pipelines to improve performance, data quality, and reproducibility. The goal is a reliable evaluation system that scales and gives clear insights into model performance.
Thanks,
Hercules
$200 USD in 40 days
6.6

Dear , We carefully studied the description of your project and we can confirm that we understand your needs and are also interested in your project. Our team has the necessary resources to start your project as soon as possible and complete it in a very short time. We have been in this business for 25 years, and our technical specialists have strong experience in Python, SQL, Machine Learning (ML), Hadoop, Data Science, Artificial Intelligence, NumPy, Data Analysis, A/B Testing, Pandas and other technologies relevant to your project. Please review our profile https://www.freelancer.com/u/tangramua where you can find detailed information about our company, our portfolio, and recent client reviews. Please contact us via Freelancer Chat to discuss your project in detail. Best regards, Sales department Tangram Canada Inc.
$50 USD in 5 days
7.4

Noticed you're focusing on LLM evaluation—recently fine-tuned a custom LLM for a logistics firm that improved their route efficiency by 15%. Extensive experience with A/B testing ensures precise model behavior evaluation. Curious about the datasets in your advanced workflows—are they domain-specific or more general? Can start reviewing your data architecture today to ensure smooth experiment integration. Happy to discuss how best to align this with your timelines. Let me know when you're free for a quick chat.
$50 USD in 7 days
5.6

Hi, I can help you with this. I am a developer with extensive experience with automations and integrations. I've helped clients with similar projects. Let me know your interest, Sincerely, Nicolas
$50 USD in 7 days
5.3

Hi there,
To deliver meaningful results on LLM evaluation and data workflows, the most critical part is designing experiments that go beyond surface metrics and actually expose model failure patterns and pipeline weaknesses. I’d approach this by structuring A/B experiments around specific behaviors (accuracy, consistency, hallucination patterns), then tying those results back into your data pipelines using SQL/dbt to identify where improvements should happen, whether in data quality, prompt structure, or workflow design. This ensures the outcome is not just analysis, but clear, actionable improvements to model performance and system reliability. My process is simple: audit current workflows and datasets, define evaluation metrics and test cases, run structured experiments, then deliver insights with concrete recommendations for iteration. I’m ready to start with a focused evaluation setup and quickly move into high-impact experiments aligned with your task-based workflow.
$60 USD in 40 days
5.3

Hello there, I can support your LLM evaluation and data workflow project by designing A/B experiments, auditing pipelines in dbt/SQL, and analyzing datasets to generate actionable insights. I will also evaluate model behaviors, diagnose failure modes, and provide recommendations for improving model quality. All work will be fully remote, using Python (pandas, NumPy, scikit-learn) and SQL, integrated with your cloud warehouse environment (Snowflake, BigQuery, or Redshift) as needed. I would be happy to discuss your project in further detail at your convenience. Best, Darren
$50 USD in 40 days
5.2

Hi Louis C.,
Just last week I completed a similar task successfully, so I can get started on this without any ramp-up time.
Two quick checks:
1) Which LLM providers/models and endpoints are in scope, and what success metrics matter most for A/B (quality, latency, cost, safety)?
2) What’s your current data stack (Snowflake/BigQuery/Redshift), dbt version/repo structure, orchestration, and how are datasets labeled/partitioned today?
Two practical improvements: standardize an evaluation harness with versioned datasets, prompt templates, seeds, and auto-logging for tokens/latency/cost to make results reproducible; add dbt data-quality gates (tests, freshness, SLAs) plus drift/guardrail checks to block bad deploys.
Execution plan:
Phase 0 (Discovery): access, review pipelines, align metrics and task breakdown.
Phase 1 (Baseline): run current models, build a golden dataset, set pass/fail thresholds.
Phase 2 (Eval toolchain): Python/SQL harness, dataset versioning, warehouse logging, quick dashboards.
Phase 3 (A/B + diagnostics): run experiments, create error taxonomy, fix failure modes via prompts/agents and data slices.
Phase 4 (Pipeline hardening): audit dbt/SQL, optimize queries, add tests/docs/CI.
Phase 5 (Benchmark + handoff): schedule nightly runs, finalize dashboards/runbooks, prioritize next tasks.
Best Regards,
Sid
$51 USD in 10 days
5.3

You need someone who can run rigorous LLM A/B experiments and tighten dbt/SQL pipelines fast — that’s my sweet spot. I’ve run agentic coding benchmarks and diagnostic evaluations under the same short timelines you describe. A common blind spot is conflating prompt variability with true model regressions; without prompt calibration and holdout stratification your A/B noise will mask real failure modes. I recently built an LLM evaluation stack for an enterprise search team: designed A/B tests for two prompt strategies, implemented dbt models and SQL scoring on BigQuery, and shipped dashboards that cut false-positive regressions by 18% in two sprints. My approach: perform a quick audit of your current data workflows and metrics, implement reproducible dbt models and SQL scoring pipelines, run controlled A/B experiments with proper holdouts, and deliver a short playbook plus failing-case tags you can act on. I’m comfortable with Python (pandas, NumPy, scikit-learn), SQL, and cloud warehouses. Can we hop on a 15-minute call to align priorities, and do you already have a dbt project and labeled holdout I can inspect before we start? Regards, Zweidevs
$50 USD in 7 days
4.8

I am an experienced Senior Data Scientist specializing in designing and implementing advanced Large Language Model (LLM) workflows for complex data-driven applications. With deep expertise in machine learning, natural language processing, and AI model deployment, I develop scalable pipelines that integrate LLMs effectively into business processes. My experience includes prompt engineering, fine-tuning models, and building automated workflows that extract, analyze, and synthesize information from unstructured and structured data, ensuring actionable insights and high-quality outputs. I focus on creating robust, efficient, and reproducible LLM workflows tailored to specific organizational needs, while ensuring data privacy, model performance monitoring, and optimization. By collaborating closely with stakeholders and engineering teams, I deliver solutions that not only leverage the latest AI technologies but also integrate seamlessly into existing systems. My goal is to enhance decision-making, operational efficiency, and innovation through state-of-the-art LLM-driven workflows that are scalable, maintainable, and future-ready.
$50 USD in 40 days
4.9

Hi, I am an experienced Data Scientist with 4+ years in Python, SQL, and advanced ML workflows, and I am well-versed in designing experiments, evaluating models, and building robust data pipelines. I have hands-on experience with LLM evaluation, A/B testing, and agentic coding workflows, making me confident in delivering high-quality results for your short-term project. I am proficient with pandas, NumPy, scikit-learn, and cloud warehouses like Snowflake, BigQuery, and Redshift, and I have a track record of diagnosing ML failure modes, improving model performance, and communicating actionable insights to stakeholders. My workflow includes careful data auditing, pipeline optimization, and reproducible analysis for robust decision-making. I am fully remote, flexible, and can commit 20+ hours per week. I am comfortable completing tasks within 1–3 hours each, ensuring fast and accurate delivery. I am excited by the opportunity to contribute to cutting-edge AI research in a flexible, high-impact capacity and can start immediately.
$50 USD in 40 days
4.4

Hello! I am a US-based senior software engineer with extensive experience in data science, machine learning, and AI automation. After carefully reviewing your project description for a Senior Data Scientist for Advanced LLM Workflow, I believe my 15 years of expertise aligns perfectly with your needs. I specialize in Python, SQL, and data analysis, and have successfully delivered high-impact projects that involve machine learning and intelligent workflow automation. My projects often include building robust ETL pipelines and leveraging advanced ML models for data-driven insights. To ensure I fully understand your requirements, could you please clarify the following questions? 1. What specific goals do you aim to achieve with the advanced LLM workflow? 2. Are there any existing data infrastructures or tools you currently utilize that I should be aware of? My approach focuses on clear communication and structured milestones, ensuring that we stay aligned throughout the project. I’m committed to delivering solutions that not only meet your technical requirements but also drive tangible results for your business. If you’re looking for a serious professional who pays attention to detail and can deliver high-quality results, I’m your ideal candidate. Let’s discuss how I can help you achieve your project goals. Best, James Zappi
$50 USD in 14 days
3.9

Hi there, I'm Kristopher Kramer from McKinney, Texas. I’ve worked on similar projects before, and as a senior full-stack and AI engineer, I have the proven experience needed to deliver this successfully, including strong experience in Artificial Intelligence, Python, Machine Learning (ML), SQL, Data Science, NumPy, A/B Testing, Hadoop, Pandas and Data Analysis. I’m available to start right away and happy to discuss the project details anytime. Looking forward to speaking with you soon. Best regards, Kristopher Kramer
$50 USD in 40 days
4.3

Hello, I am Vishal Maharaj, with 20 years of expertise in Python, SQL, Artificial Intelligence, and NumPy. I have thoroughly reviewed your project requirements. To tackle this project, I plan to design and execute A/B experiments, assess model behavior, create and assess data workflows and pipelines using dbt and SQL, analyze datasets, and present findings for informed decision-making. With over four years of experience in Data Science and Machine Learning, I am well-equipped to diagnose ML failure modes and enhance model quality. Additionally, my familiarity with cloud warehouses like Snowflake, BigQuery, and Redshift will be beneficial for this project. Let's start a chat to discuss further. Cheers, Vishal Maharaj
$50 USD in 40 days
2.6

Hello, I’m interested in your Senior Data Scientist for Advanced LLM Workflow project and would be glad to contribute my expertise to ensure its successful completion. I’ve taken the time to understand your expectations and objectives. I will ensure each stage of the project is handled professionally and carefully. You can expect a final result that matches your standards and requirements. As a Senior Software Engineer, I bring extensive experience in Python, Artificial Intelligence, Machine Learning (ML), SQL and technical assessment. I’ve worked on similar projects where understanding both business needs and technical capabilities was essential. I’m confident in delivering accurate, efficient, and high-quality results. I have a few questions before we get started. Could you please send me a message in the chat so we can discuss the details? Talk soon, Dax Manning
$50 USD in 40 days
2.0

Hi there, I read your posting for a Senior Data Scientist to run LLM evaluation and build robust data workflows. I have 6+ years delivering production ML and analytics: designing A/B experiments, diagnosing failure modes, and building maintainable pipelines with SQL, dbt, and Python (pandas, NumPy, scikit-learn). I’ve run LLM benchmark suites, instrumented agentic coding flows, and translated model behaviors into concrete dataset and training fixes. My approach is pragmatic: I’ll align on evaluation metrics, create reproducible experiment scripts, audit and harden ETL/dbt models against sampling and label drift, and deliver clear analyses and next-step recommendations. I can work asynchronously for 20+ hours/week and deliver task-based results you can validate quickly. Which specific LLMs, datasets, and evaluation metrics do you want prioritized in the initial A/B experiments? Sincerely, Cindy Viorina
$50 USD in 21 days
2.2

Wellingborough, United Kingdom
Member since Apr 2, 2026