
Open
Ends in 3 days
Paid on delivery
We are looking for an experienced Python developer to build a robust, local pipeline that processes Binance Futures historical data into an ML-ready dataset. The goal is to ingest public data from Binance Vision (aggTrades, all klines, and bookDepth) and output clean, normalized, lookahead-bias-free features stored in Parquet format or DuckDB.

Scope of Work & Deliverables

1. Ingestion & Database Setup (Core Foundation)
- Data Source: Programmatic downloading of historical daily/monthly ZIP files from public [login to view URL] (specifically aggTrades, all klines [1m], and bookDepth for BTCUSDT, ETHUSDT, SOLUSDT, XRPUSDT, and BNBUSDT).
- Storage Architecture: Set up a local storage solution using DuckDB or Parquet that handles millions of rows without memory issues.
- Alignment: Parse and align the different frequencies (tick-by-tick trades, order book snapshots, and 1m klines) to a unified timestamp sequence.

2. Core Microstructure Feature Extraction
- Implement Python/Polars (or Pandas) scripts to compute the features on the aligned data.

3. Advanced Optimization & ML Readiness
- Strict Lookahead Bias Prevention: Ensure all rolling features (e.g., rolling z-scores, Parkinson volatility) are calculated using t−1 parameters to prevent data leakage.
- Normalization: Implement rolling z-scores or min-max normalization per symbol to keep features stationary.
- Labeling: Implement a basic Triple Barrier Method or directional label generator.
- Output: Save clean Parquet files per symbol, free of NaNs and infinite values, structured for immediate model training.
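The "t−1 parameters" rule in section 3 can be illustrated with a minimal sketch. It is written in plain Python so that it runs without Polars or Pandas. The names `lagged_zscore` and `lagged_parkinson_vol` are hypothetical and do not belong to any library. The window length is also an assumption, as is the choice of per-bar high/low prices as the Parkinson inputs. The mean, standard deviation, and volatility used at bar t come only from bars t−window through t−1.

```python
import math
from statistics import mean, stdev


def lagged_zscore(values, window):
    """Rolling z-score whose mean/std come only from bars t-window .. t-1.

    x_t is compared against parameters it did not help estimate.
    """
    if window < 2:
        raise ValueError("window must be >= 2 for a sample standard deviation")
    out = []
    for t, x in enumerate(values):
        if t < window:
            out.append(None)  # not enough history yet; drop these rows later
            continue
        past = values[t - window:t]  # slice stops before index t
        sd = stdev(past)
        out.append((x - mean(past)) / sd if sd > 0 else None)
    return out


def lagged_parkinson_vol(highs, lows, window):
    """Parkinson volatility over bars t-window .. t-1 (per bar, not annualized).

    sigma^2 = sum(ln(H/L)^2) / (4 * n * ln 2)
    """
    out = []
    for t in range(len(highs)):
        if t < window:
            out.append(None)
            continue
        sq = [math.log(h / l) ** 2
              for h, l in zip(highs[t - window:t], lows[t - window:t])]
        out.append(math.sqrt(sum(sq) / (4 * window * math.log(2))))
    return out
```

Changing x_t leaves every output before t unchanged and does not move the parameters used at t. A check like that is worth keeping as a leakage test in QA. In Polars, the same idea is usually written as a rolling statistic followed by `.shift(1)`.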
Project ID: 40488342
18 proposals
Open for bidding
Remote project
Active 3 days ago
18 freelancers are bidding on average ₹8,096 INR for this job

I'm a data engineer and quantitative developer experienced in cryptocurrency data pipelines and ML-ready feature engineering. I'll build a robust local pipeline ingesting aggTrades, klines, and bookDepth from Binance Vision for all five symbols, aligning multi-frequency data to a unified timestamp, extracting core microstructure features, and enforcing strict lookahead-bias prevention via t−1 rolling calculations. Deliverables include a clean DuckDB/Parquet storage architecture, normalized features with rolling z-scores, Triple Barrier Method labeling, and NaN-free per-symbol Parquet files ready for immediate model training. Fully documented and reproducible. Ready to start immediately.
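This bid and the spec's Output item both require per-symbol files with no NaNs or infinite values. The final filter can be as simple as the sketch below. It works on plain Python row dicts as an illustrative stand-in for a DataFrame, and `drop_non_finite` is a hypothetical helper name. The filter drops rows rather than imputing values, so warm-up rows left empty by rolling windows are removed as well.

```python
import math


def drop_non_finite(rows, feature_cols):
    """Keep rows whose listed feature values are all real, finite numbers.

    Rejects None (missing), NaN, +inf and -inf; bools are not treated as numbers.
    """
    def ok(v):
        return (isinstance(v, (int, float))
                and not isinstance(v, bool)
                and math.isfinite(v))

    return [row for row in rows if all(ok(row.get(c)) for c in feature_cols)]
```

In Polars, `Expr.is_finite()` combined with `drop_nulls()` covers the same two cases: NaN/inf values and missing ones.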
₹8,000 INR in 7 days
6.0

Hi there, I will build a local Binance Vision pipeline that downloads aggTrades, 1m klines, and bookDepth ZIPs for BTCUSDT, ETHUSDT, SOLUSDT, XRPUSDT, and BNBUSDT and produces ML-ready Parquet/DuckDB datasets without lookahead bias.
- Ingestion & DB: scripted downloader + DuckDB schema and partitioned Parquet export for daily/monthly ZIPs (aggTrades, klines, bookDepth), with timestamp alignment across tick, snapshot, and 1m bars.
- Feature extraction: Polars-based pipelines computing microstructure features (volume imbalance, VWAP shifts, Parkinson volatility) aligned to unified timestamps and saved as per-symbol Parquet with no NaNs/infs.
- Risk/QA: staged deployment with a backup checkpoint and post-fix validation to ensure rolling features use t-1 parameters (no lookahead) and outputs are reproducible.
Skills:
✅ Binance Vision aggTrades/klines/bookDepth
✅ DuckDB / Parquet storage
✅ Polars / Pandas feature pipelines
✅ Lookahead-bias prevention / rolling t-1 calculations
✅ Local deployment / disk-efficient partitioning
✅ Triple Barrier labeling / directional labels
Certificates:
✅ Microsoft® Certified: MCSA | MCSE | MCT
✅ cPanel® & WHM Certified CWSA-2
I am available to start immediately. Do you want per-symbol outputs as Parquet only, or also a DuckDB catalog with indexed tables for downstream queries?
Best regards,
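The microstructure features named in this bid can be sketched from aggTrades. The sketch below rolls trades up into 1-minute VWAP and signed-volume imbalance. It assumes each trade has already been parsed into a dict with `price`, `quantity`, `transact_time` (epoch milliseconds), and a boolean `is_buyer_maker`. These field names are assumptions modeled on the aggTrades files, and `minute_bars` is a hypothetical helper. When `is_buyer_maker` is true, the seller was the aggressor, so that volume counts as sell-initiated.

```python
from collections import defaultdict

MINUTE_MS = 60_000


def minute_bars(trades):
    """Aggregate parsed aggTrades rows into 1-minute VWAP and volume imbalance.

    is_buyer_maker must already be a Python bool (not the string "false").
    """
    acc = defaultdict(lambda: {"pv": 0.0, "vol": 0.0, "buy": 0.0, "sell": 0.0})
    for tr in trades:
        start = tr["transact_time"] // MINUTE_MS * MINUTE_MS  # bar open time
        a = acc[start]
        price, qty = float(tr["price"]), float(tr["quantity"])
        a["pv"] += price * qty
        a["vol"] += qty
        # buyer was maker -> seller crossed the spread -> sell-initiated volume
        a["sell" if tr["is_buyer_maker"] else "buy"] += qty

    bars = []
    for start in sorted(acc):
        a = acc[start]
        if a["vol"] <= 0:
            continue  # skip degenerate zero-volume buckets
        bars.append({
            "open_time": start,
            # the bar summarizes the whole minute, so it is only known at close
            "available_at": start + MINUTE_MS,
            "vwap": a["pv"] / a["vol"],
            "volume": a["vol"],
            "imbalance": (a["buy"] - a["sell"]) / a["vol"],  # in [-1, 1]
        })
    return bars
```

The `available_at` field matters for the no-lookahead requirement. Any join or rolling feature should key on it rather than on `open_time`. A "VWAP shift" feature can then be taken as the difference between consecutive bars' `vwap` values.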
₹12,436 INR in 1 day
4.3

As an experienced Python developer with a strong background in data pipeline development and management, I believe I am the perfect fit for your Binance Futures project. Over the past 7+ years, I've built numerous local pipelines, ingesting and processing large volumes of data in various formats. My proficiency in technologies like DuckDB and Parquet has enabled me to design robust storage architectures that handle millions of rows without memory issues, an essential requirement for your project. Additionally, my expertise in Python-based data handling libraries such as Polars and Pandas ensures swift yet accurate processing of your unique dataset. I appreciate the significance of a lookahead-bias-free, ML-ready dataset, especially when it comes to financial analysis. Rest assured, my understanding of rolling feature computation using t−1 parameters and of normalization techniques like rolling z-scores aligns perfectly with your project's objectives. Lastly, one thing that sets me apart is my long-term vision and dedication to completing any project successfully. A task as complex as yours demands clear communication, clean code, and impeccable documentation, all of which are principles deeply embedded in my work ethic. By choosing me, you'd not only be securing a skilled Python developer but also a solid partner who'll help you make smarter technical decisions, saving both time and money along the way. Looking forward to discussing your project further!
₹12,000 INR in 7 days
4.4

Hello, I can efficiently build a robust, local pipeline to process Binance Futures historical data into an ML-ready dataset as per your requirements. I’ll handle programmatic data ingestion from Binance Vision, set up a scalable storage solution using DuckDB or Parquet, align different data frequencies, and extract microstructure features using Polars or Pandas. I’ll ensure strict lookahead bias prevention, normalize features, and implement labeling with the Triple Barrier Method. The output will be clean Parquet files, ready for model training. With 5+ years of Python development experience, I’m confident in delivering this project. Let’s discuss further or I can share samples of similar work. Thanks, Adegoke. M
₹7,500 INR in 3 days
2.5

Hi, I am currently building an identical pipeline — Binance Vision ingestion (aggTrades, klines, bookDepth), DuckDB + Parquet storage, microstructure feature extraction with strict lookahead bias prevention using t-1 parameters, and Triple Barrier labeling. I can deliver your full scope: ingestion, alignment, feature extraction, normalization, and ML-ready output.
₹10,000 INR in 7 days
2.1

I can't write an effective proposal yet: the project description is incomplete. It cuts off at "We are looking for a Python developer with experience to build a robust, local pipeline that process" and I don't know what the pipeline needs to process or what the full requirements are. To write a strong proposal that mirrors the client's specific pain (per Val's approach), I need:
1. **Full description**: what exactly should the pipeline process? (Binance futures data, order data, price feeds, something else?)
2. **Specific deliverables**: what outputs or functionality do they expect?
3. **Timeline**: when do they need it?
4. **Any other context**: existing infrastructure, data volume, update frequency, etc.
Can you provide the complete project posting? Once I have it, I'll write a proposal that:
- Directly addresses their specific need in the first sentence
- Recommends concrete technical choices (Binance API client, ccxt, data storage approach)
- Shows realistic understanding of the $1500 scope
- Ends with a commitment-driving next step
₹1,500 INR in 7 days
2.3

As the founder of Paper Perfect, a reputed freelance development agency, I want to express my strong interest in taking charge of your Python development needs to build a Binance Futures Data Pipeline. With years of collective experience in data management and processing, my team and I are skilled in making large-scale data operations possible. I have a deep understanding of Python and its libraries like Pandas, which is an essential skill for this task. Importantly, we fully acknowledge the significance of trend analysis within distinct trades and order book snapshots, and the potential it holds to shape financial strategies. Our expertise lies in aligning these different frequencies into a unified timestamp sequence and ensuring they remain bias-free in accordance with ML readiness. At Paper Perfect, we not only deliver on time but also emphasize the need for clean and normalized data for significant analysis and model training. We assure you that your project will comply with these norms through our recommended storage solution using DuckDB or Parquet. So, entrust us with your project and let's create a robust pipeline that transcends your expectations. Reach out via our website and let us bring your vision to life.
₹7,000 INR in 7 days
2.2

As someone who is adept at Python development, I would be an excellent choice for your Binance Futures Data Pipeline project. Having worked with E-commerce and CMS-based websites extensively, I understand the importance of processing and managing large volumes of data efficiently. My 9+ years of experience in web and mobile app development has honed my skills in ingesting and storing vast amounts of information while ensuring smooth execution. This skill set will come in handy while setting up a local storage solution using DuckDB or Parquet for your project, facilitating the handling of millions of rows without memory issues. Furthermore, my proficiency in Python is not limited to just database setup. I am also well-versed in Python-associated libraries like Polars and Pandas which can be utilized for computing features on the aligned data. Additionally, my work with ML models necessitated me to be meticulous about data quality and prevention of biases, like lookahead bias. Given that this project too requires a stringent approach towards data manipulation to ensure model readiness, I believe my experience adhering to data integrity will shine through here. Lastly, beyond just technical competence, I pride myself on delivering projects that resonate with my clients' needs and having a cross-functionality between different facets of IT. My offer comes not just with the assurance of high-quality Python coding but also a holistic package - rangingproject.
₹15,000 INR in 7 days
2.0

As a versatile developer with demonstrated proficiency in Python, I am confident in my ability to successfully execute the project of building a local pipeline for processing and normalizing Binance Futures data. Having already implemented numerous data ingestion and database management solutions with tech stacks ranging from Node.js to MongoDB, I am uniquely equipped to design an efficient storage architecture for the massive dataset anticipated in this project, whether it uses DuckDB or Parquet. One thing that sets me apart is my attention to detail, especially when it comes to avoiding data biases and ensuring long-term data stability. I can guarantee that all features will strictly adhere to the lookahead bias prevention guidelines and will be normalized effectively through rolling z-scores or min-max normalization as required, a vital aspect of generating a reliable ML-ready dataset. Additionally, having been a contributor to modern JavaScript frameworks and tools, I have an instinct for optimization. With this project calling for feature computation on aligned data, my command of Python/Polars (or Pandas) scripting can prove to be a significant asset. Finally, with experience developing CRM & ERP systems and AI integrations similar to what might be needed in this project, I believe I can ensure prompt delivery of clean Parquet files that are structured for immediate model training.
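The rolling min-max option mentioned in this bid can follow the same t−1 rule. The minimum and maximum for bar t come only from the preceding window. Below is a minimal sketch; `lagged_minmax` is a hypothetical name, and the window length is the caller's choice. To keep the normalization per symbol, call it once on each symbol's series.

```python
def lagged_minmax(values, window):
    """Scale x_t by the min/max of bars t-window .. t-1 (x_t itself excluded).

    Returns None during warm-up and when the trailing window is flat.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    out = []
    for t, x in enumerate(values):
        if t < window:
            out.append(None)  # not enough history yet
            continue
        past = values[t - window:t]
        lo, hi = min(past), max(past)
        out.append((x - lo) / (hi - lo) if hi > lo else None)
    return out
```

Because the range excludes x_t, a breakout bar can map outside [0, 1] (20 after a window of [10, 5] gives 3.0). Clipping afterward restores the bound but hides how far the breakout went.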
₹7,000 INR in 7 days
1.5

The core challenge is building a local pipeline that ingests Binance Vision's daily ZIP archives for aggTrades, klines, and bookDepth across five symbols, then outputs ML-ready Parquet or DuckDB tables free of lookahead bias, all without hitting memory limits on a local machine. I would use asyncio and aiohttp to parallelize the ZIP downloads, Pandas for initial parsing and alignment of the three data sources by timestamp, then DuckDB for deduplication, feature computation, and final export to Parquet. A sharp question: for bookDepth data, do you need the full order book snapshot at each timestamp or only the depth update deltas, since the latter reduces storage by an order of magnitude but requires a reconstruction step?
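This bid proposes parallel ZIP downloads with asyncio and aiohttp. The sketch below swaps aiohttp for `asyncio.to_thread` plus the standard library's `urllib` so that it runs without extra dependencies. The URL path layout is an assumption modeled on Binance Vision's USD-M futures daily archives and should be checked against the repository listing. The base URL is passed in rather than hard-coded. All function names are hypothetical.

```python
import asyncio
import urllib.request
from datetime import date, timedelta
from pathlib import Path


# ASSUMPTION: path layout for USD-M futures daily archives; verify before use.
def daily_zip_url(base_url: str, kind: str, symbol: str, day: date,
                  interval: str = "1m") -> str:
    d = day.isoformat()
    if kind == "klines":  # klines add an interval directory and file prefix
        return (f"{base_url}/data/futures/um/daily/klines/"
                f"{symbol}/{interval}/{symbol}-{interval}-{d}.zip")
    return f"{base_url}/data/futures/um/daily/{kind}/{symbol}/{symbol}-{kind}-{d}.zip"


def days(start: date, end: date):
    """Inclusive range of calendar days."""
    d = start
    while d <= end:
        yield d
        d += timedelta(days=1)


def _fetch(url: str, dest: Path) -> Path:
    if dest.exists():  # already downloaded: reruns are cheap and resumable
        return dest
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".part")
    urllib.request.urlretrieve(url, tmp)  # blocking I/O, run in a worker thread
    tmp.replace(dest)  # rename only after the download completed
    return dest


async def download_all(urls, out_dir, max_concurrency=8):
    sem = asyncio.Semaphore(max_concurrency)  # cap parallel requests

    async def one(url):
        async with sem:
            dest = Path(out_dir) / url.rsplit("/", 1)[-1]
            return await asyncio.to_thread(_fetch, url, dest)

    return await asyncio.gather(*(one(u) for u in urls))
```

A typical run builds one URL per symbol, data kind, and day with `daily_zip_url` and `days`, then calls `asyncio.run(download_all(urls, "raw"))`. Days with no archive (for example, before a symbol was listed) raise `urllib.error.HTTPError`. Passing `return_exceptions=True` to `asyncio.gather` collects those failures instead of aborting the batch.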
₹1,500 INR in 3 days
0.4

We’ve worked on a project with a very similar scope, giving me strong insight into delivering quality results efficiently. I understand the importance of a clean user-friendly UI for high-end customers. I am well-equipped to handle the Python development for your Binance Futures Data Pipeline, ensuring accurate ingestion, database setup, microstructure feature extraction, and ML readiness with a focus on lookahead bias prevention and data normalization. I'd love to chat about your project and how I can assist. Walk away with a free consultation. Regards, Nabeel Ismail
₹5,650 INR in 7 days
0.0

Rahul here. I can help build a robust Python-based data pipeline that ingests Binance Futures historical data from Binance Vision and transforms it into a clean, ML-ready dataset optimized for quantitative research and model training. I have experience with Python, Polars, Pandas, DuckDB, Parquet, feature engineering, time-series processing, and large-scale financial datasets. The pipeline will handle data ingestion, frequency alignment, microstructure feature generation, rolling normalization, lookahead-bias prevention, labeling, and efficient storage for millions of records. The final output will be clean Parquet/DuckDB datasets with aligned timestamps, engineered features, proper normalization, Triple Barrier or directional labels, and no NaN/infinite values, ready for immediate ML workflows. I'm ready to discuss the feature set, architecture, and implementation plan and can start immediately.
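The "basic Triple Barrier Method" from the spec, as popularized in López de Prado's work, can be reduced to a small labeling loop. This sketch simplifies it in several ways. The profit-taking and stop-loss barriers are fixed fractions of the entry close rather than volatility-scaled. Only closes are checked, not intrabar highs and lows. The vertical barrier is a fixed number of bars. `triple_barrier_labels` and its parameters are hypothetical names.

```python
def triple_barrier_labels(closes, pt, sl, horizon):
    """Label bar t by the first barrier touched within the next `horizon` bars.

    +1 if a later close reaches closes[t] * (1 + pt),
    -1 if a later close reaches closes[t] * (1 - sl),
     0 if neither happens before the vertical barrier.
    Bars without a full horizon ahead get None and should be dropped.
    """
    if horizon < 1 or pt <= 0 or sl <= 0:
        raise ValueError("horizon, pt and sl must be positive")
    n = len(closes)
    labels = []
    for t in range(n):
        if t + horizon >= n:
            labels.append(None)  # future window incomplete
            continue
        upper = closes[t] * (1 + pt)
        lower = closes[t] * (1 - sl)
        label = 0
        for k in range(t + 1, t + horizon + 1):
            if closes[k] >= upper:
                label = 1
                break
            if closes[k] <= lower:
                label = -1
                break
        labels.append(label)
    return labels
```

Labels look forward by design. They are targets only and must never feed back into features. Rows whose horizon crosses a train/test boundary also need purging to avoid leakage through the label itself.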
₹6,000 INR in 2 days
0.0

Hi, I lead a team of software developers, data scientists and full stack engineers with 20+ years of combined experience in developing and deploying enterprise grade software solutions. We have several years of experience in automated trading bot development and can design bots based on indicators like RSI, MA, MACD, Bollinger Bands, harmonic and Fibonacci. Recently, we created an algorithm for trading based on advanced machine learning research. Looking forward to discussion, Best Regards, Radhika
₹15,000 INR in 7 days
1.4

Hi, for your pipeline, I can handle the full workflow from downloading Binance Vision historical files, to storing them efficiently in DuckDB/Parquet, aligning trades, order book snapshots, and klines into a single timeline, and generating clean ML-ready features. One thing I pay close attention to is avoiding data leakage. All rolling calculations, volatility measures, and normalization steps will be based only on information available at the time, ensuring the final dataset is suitable for real model training and backtesting. I'll also implement the labeling logic (Triple Barrier or directional labels), perform data validation and cleaning, and deliver structured Parquet/DuckDB outputs that can be plugged directly into an ML pipeline. My focus is always on building something reliable, reproducible, and easy to extend later as new features or symbols are added. Mohamed
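Using "only information available at the time" is largely an alignment question. Each bar should see the latest order book snapshot at or before its timestamp, never the next one. Below is a minimal backward as-of lookup. It assumes sorted integer millisecond timestamps and uses each bar's close time as the target. `asof_backward` is a hypothetical name.

```python
from bisect import bisect_right


def asof_backward(target_times, source_times, source_values):
    """For each target time, return the latest source value with time <= target.

    source_times must be sorted ascending. Returns None where no source row
    exists at or before the target (e.g. before the first snapshot).
    """
    out = []
    for ts in target_times:
        i = bisect_right(source_times, ts) - 1  # last index with time <= ts
        out.append(source_values[i] if i >= 0 else None)
    return out
```

To require a strictly earlier snapshot, replace `bisect_right` with `bisect_left`. Dataframe libraries provide the same join: pandas `merge_asof` with `direction="backward"` and Polars `join_asof` with `strategy="backward"`.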
₹10,000 INR in 7 days
0.0

Dear Sir/Madam,
I am an experienced Python Developer with strong expertise in building scalable backend systems, APIs, automation tools, and full-stack applications. I specialize in delivering clean, efficient, and production-ready solutions. I have successfully developed and deployed multiple live applications including healthcare platforms, legal service apps, school management systems, fintech apps, and real-time communication systems.
My Core Python Expertise:
✔ Django & Django REST Framework
✔ FastAPI (High-performance APIs)
✔ Flask
✔ SQLModel / SQLAlchemy
✔ PostgreSQL / MySQL / MongoDB
✔ Supabase Integration
✔ Authentication (JWT, OAuth)
✔ Payment Gateway Integration (PhonePe, Razorpay, Stripe)
✔ Web Scraping (BeautifulSoup, Selenium)
✔ Automation Scripts
✔ WebSocket & Real-time Systems
✔ Docker Deployment
✔ AWS / VPS Deployment
✔ REST API Design & Optimization
What I Can Build For You:
- Secure REST APIs
- SaaS backend architecture
- Admin dashboards
- Real-time chat systems
- Payment systems
- Data processing systems
- Microservices architecture
- AI/ML API integration
- Custom business logic systems
Recent Project Experience:
- Healthcare booking & wallet system
- Legal consultation backend platform
- School ERP & management API
- Fintech wallet & transaction management
- Real-time chat application (WebSocket + MQTT)
- Location-based services & geo APIs
₹8,000 INR in 10 days
0.0

Lucknow, India
Member since Sep 25, 2021