
Closed
Posted
Paid on delivery
We are looking for an experienced Python developer to build a robust, local pipeline that processes Binance Futures historical data into an ML-ready dataset. The goal is to ingest public data from Binance Vision (aggTrades, all klines, and bookDepth) and output clean, normalized, lookahead-bias-free features stored in Parquet format or DuckDB.

Scope of Work & Deliverables

1. Ingestion & Database Setup (Core Foundation)
Data Source: Programmatic downloading of historical daily/monthly ZIP files from public [login to view URL] (specifically aggTrades, all klines [1m], and bookDepth for BTCUSDT, ETHUSDT, SOLUSDT, XRPUSDT, BNBUSDT).
Storage Architecture: Set up a local storage solution using DuckDB or Parquet that handles millions of rows without memory issues.
Alignment: Parse and align the different frequencies (tick-by-tick trades, order book snapshots, and 1m klines) to a unified timestamp sequence.

2. Core Microstructure Feature Extraction
Implement Python/Polars (or Pandas) scripts to compute the features on the aligned data.

3. Advanced Optimization & ML Readiness
Strict Lookahead Bias Prevention: Ensure all rolling features (e.g., rolling z-scores, Parkinson volatility) at time t are calculated from statistics that use data only up to t−1, to prevent data leakage.
Normalization: Implement rolling z-scores or min-max normalization per symbol to keep features stationary.
Labeling: Implement a basic Triple Barrier Method or directional label generator.
Output: Save clean Parquet files per symbol, free of NaNs and infinite values, structured for immediate model training.
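The lookahead-bias requirement above can be illustrated with a small, dependency-free sketch. It is not the client's specification: the function name `lagged_rolling_zscore`, the use of the population standard deviation, and the choice to emit `None` during warm-up are all assumptions made for illustration. The point it shows is that the mean and standard deviation used to normalize the value at time t come only from the `window` observations strictly before t.

```python
import math
from statistics import mean, pstdev


def lagged_rolling_zscore(values, window):
    """Z-score each value against the `window` observations strictly before it.

    Hypothetical helper: the statistics at index t never include values[t]
    or anything later, so no future information leaks into the feature.
    """
    out = []
    for t, x in enumerate(values):
        history = values[max(0, t - window):t]  # slice ends at t, so values[t] is excluded
        if len(history) < window:
            out.append(None)  # warm-up: not enough past data yet
            continue
        mu = mean(history)
        sigma = pstdev(history)  # population std (an assumption; sample std also works)
        # A flat window has zero spread; return None instead of an infinite z-score.
        out.append((x - mu) / sigma if sigma > 0 else None)
    return out


prices = [1.0, 2.0, 3.0, 4.0, 10.0]
z = lagged_rolling_zscore(prices, window=3)

# Changing the last observation must not change any earlier feature value.
z_alt = lagged_rolling_zscore([1.0, 2.0, 3.0, 4.0, -50.0], window=3)
print(z[:4] == z_alt[:4])
```

In a Pandas version of the pipeline the same idea is usually written by shifting the series by one step before taking the rolling statistics, for example `s.shift(1).rolling(window).mean()`. The warm-up and zero-variance rows returned as `None` here would need to be dropped before writing the final Parquet files, since the deliverable asks for output free of NaNs and infinite values.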
Project ID: 40488333
4 proposals
Remote project
Active 7 days ago
4 freelancers are bidding on average ₹944 INR for this job

Hi there, You’re absolutely in the RIGHT PLACE. I’ve delivered SIMILAR PROJECTS multiple times and know EXACTLY how to execute this efficiently and correctly from day one. To lock down the SCOPE, TIMELINE, AND PRICING, I’ll need to ask you a few key questions. Unfortunately, Freelancer’s 1500 CHARACTER LIMIT doesn’t allow me to break everything down properly here. Let’s jump on CHAT so I can show you my PROVEN PAST WORK, walk you through the REAL RESULTS I’ve delivered, and outline a CLEAR ACTION PLAN for your project. You’ll immediately see why my approach is DIFFERENT and EFFECTIVE. If you’re serious about getting this done RIGHT, I’m ready to move forward. Looking forward to CONNECTING and WINNING TOGETHER. Cheers, Mayank Sahu
₹1,050 INR in 7 days

Building a robust local pipeline for processing Binance Futures historical data requires precise handling of ingestion and feature extraction to mitigate data leakage risks. By utilizing DuckDB for local storage, you can efficiently manage millions of rows while ensuring seamless alignment of tick-by-tick trades, order book snapshots, and 1m klines into a unified timestamp sequence. The implementation of Python/Polars allows for effective computation of rolling features with strict lookahead bias prevention. I can deliver an initial proof of concept within 15 days. When can we start? I can have something to show you within 24 hours.
₹925 INR in 9 days

Lucknow, India
Member since Sep 25, 2021