
Completed
Posted
Paid on delivery
I need an end-to-end ML experiment to predict duplicate customer records in a financial dataset from Kaggle. The goal is to build a proactive classification model that flags likely duplicates before they reach reporting, analytics, or risk pipelines. The workflow should include data loading, EDA, synthetic duplicate labelling (since labels won’t exist), feature engineering, model training, and evaluation. Duplicate pairs will be created using techniques like exact duplication, small perturbations, and formatting inconsistencies. Features should include exact matches, numeric differences (age, income, spending), and similarity measures. Models to test include Logistic Regression, Random Forest, Gradient Boosting, XGBoost, or similar, but deliver one final tuned model. Evaluation should focus on F1-score (target ≥0.85), with a balance between precision and recall. Deliverables: reproducible notebook, clean code, short report, and README.
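The synthetic labelling step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the field names (`name`, `age`, `income`, `spending`), the helper names `perturb` and `make_labelled_pairs`, and the 2% noise level are all hypothetical and would need to be adapted to the actual Kaggle dataset.

```python
import random

# Toy records; the field names are assumptions, not the real dataset schema.
records = [
    {"name": "Ana Lee", "age": 30, "income": 50000.0, "spending": 1200.0},
    {"name": "Ben Okoro", "age": 45, "income": 82000.0, "spending": 3100.0},
    {"name": "Chen Wu", "age": 27, "income": 39000.0, "spending": 900.0},
]

def perturb(record, rng):
    """Return a synthetic duplicate of `record` and the strategy used."""
    dup = dict(record)  # copy so the original record is left untouched
    strategy = rng.choice(["exact", "numeric_noise", "formatting"])
    if strategy == "numeric_noise":
        # Small relative perturbation (at most +/-2%) on one numeric field.
        field = rng.choice(["age", "income", "spending"])
        dup[field] = round(dup[field] * (1 + rng.uniform(-0.02, 0.02)), 2)
    elif strategy == "formatting":
        # Formatting inconsistency: upper-case the name and add stray spaces.
        dup["name"] = "  " + dup["name"].upper() + " "
    # "exact" leaves the copy identical to the original.
    return dup, strategy

def make_labelled_pairs(records, n_pairs, seed=0):
    """Build n_pairs positive (label 1) and n_pairs negative (label 0) pairs."""
    rng = random.Random(seed)  # fixed seed keeps the experiment reproducible
    pairs = []
    for _ in range(n_pairs):
        original = rng.choice(records)
        duplicate, _ = perturb(original, rng)
        pairs.append((original, duplicate, 1))
        # Negative pair: two different rows, assumed not to be real duplicates.
        left, right = rng.sample(records, 2)
        pairs.append((left, right, 0))
    return pairs

pairs = make_labelled_pairs(records, n_pairs=4, seed=42)
```

Generating one negative pair for every positive pair keeps the classes balanced, which makes the F1 target easier to interpret. A real deployment would likely see far fewer duplicates than non-duplicates.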
Project ID: 40330729
5 proposals
Remote project
Active 17 days ago

Hey, I have extensive experience in the fintech domain as an Applied ML Engineer and Data Scientist, gained over the past 6+ years. I can complete your task and provide a report in less than 1 day.
$70 USD in 1 day
2.9
5 freelancers are bidding an average of $44 USD for this job

Hello, With over 7 years of experience in Excel, Data Science, Data Visualization, Statistical Analysis, and Statistics, I have the expertise to handle your project efficiently. I have carefully reviewed the requirements for the project. To address the predictive data quality modeling for financial customer data using machine learning, I will begin by loading the dataset from Kaggle and performing exploratory data analysis (EDA). Synthetic duplicate labeling will be implemented due to the absence of labels. Feature engineering will involve creating features based on exact matches, numeric differences, and similarity measures. The workflow will include model training and evaluation using techniques like Logistic Regression, Random Forest, Gradient Boosting, XGBoost, or similar algorithms to develop a tuned model. Evaluation will focus on achieving an F1-score of ≥0.85, balancing precision and recall. The deliverables will include a reproducible notebook, clean code, a concise report, and a README file detailing the project setup. I would like to discuss this project further with you. Please connect with me via chat for a detailed conversation. You can visit my profile at https://www.freelancer.com/u/HiraMahmood4072 Thank you.
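The F1 ≥ 0.85 evaluation criterion mentioned in this proposal can be made concrete with a small sketch. It assumes the model outputs a duplicate probability per pair. The helper names, the toy labels and scores, and the candidate thresholds are all hypothetical, and a library such as scikit-learn would normally provide these metrics.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for binary labels (1 = duplicate)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

def best_threshold(y_true, scores, candidates=(0.3, 0.4, 0.5, 0.6, 0.7)):
    """Pick the candidate threshold with the highest F1 (first one on ties)."""
    def f1_at(threshold):
        y_pred = [int(s >= threshold) for s in scores]
        return precision_recall_f1(y_true, y_pred)[2]
    return max(candidates, key=f1_at)

# Toy validation data (made up for illustration only).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.45, 0.55, 0.2, 0.1]

threshold = best_threshold(y_true, scores)
precision, recall, f1 = precision_recall_f1(
    y_true, [int(s >= threshold) for s in scores]
)
meets_target = f1 >= 0.85  # the F1 target stated in the project brief
```

Reporting precision and recall alongside F1 shows whether the target is met by a balanced model or by one that trades heavily in one direction.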
$36 USD in 2 days
6.4

Hi there, I am A.R.M. MASUD, with a strong Data Science background. As a Python developer, I have extensive experience building robust, scalable, and efficient solutions that address various business needs. I understand the importance of delivering high-quality, well-architected code, and I am committed to working closely with you to ensure the success of this project. I implement core functionality using Python, utilizing relevant libraries and frameworks such as Pandas, NumPy, GUI, SciPy, Matplotlib, Seaborn, Plotly, Scikit-learn, TensorFlow, Keras, PyTorch, spaCy, Flask, Django, FastAPI, OpenCV, and Jupyter. I am a professional responsible for extracting actionable insights and knowledge from large volumes of data through Machine Learning models, including CNNs, RNNs, LSTMs, GANs, Transformers, FNNs, ANNs, and DNNs. I conduct comprehensive unit, integration, and performance testing to ensure the solution is error-free and optimized. https://www.freelancer.com/u/MZITSERVICES I appreciate the opportunity to submit this proposal and am excited about the possibility of working with you to bring your project to life. Thanks A.R.M MASUD
$40 USD in 7 days
4.7

Your duplicate detection challenge needs synthetic labeling since real financial datasets won't have duplicate flags. I'd start by loading your Kaggle dataset, creating controlled duplicates through exact matches and small perturbations (typos, formatting changes), then engineer similarity features like Levenshtein distance, numeric differences, and exact match indicators. XGBoost typically performs well for this type of classification with proper hyperparameter tuning. I built a price aggregation engine that tracks 800+ products across multiple stores, handling fuzzy matching and duplicate detection for similar products with slight naming variations. The pattern recognition work translates directly to customer record deduplication. You can see my automation projects at ffulb.com. Can deliver the complete notebook, tuned model hitting your F1≥0.85 target, and documentation within a week. Ready to start immediately.
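The similarity features named in this proposal (Levenshtein distance, numeric differences, exact-match indicators) might look roughly like the sketch below. The record fields and the `pair_features` helper are assumptions made for illustration, and the edit distance is a plain dynamic-programming version rather than any particular library's implementation.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def pair_features(a, b):
    """Turn a pair of customer records into numeric model features."""
    # Normalise case and whitespace before measuring name similarity.
    name_a = a["name"].strip().lower()
    name_b = b["name"].strip().lower()
    longest = max(len(name_a), len(name_b)) or 1  # avoid division by zero
    return {
        "name_exact": int(a["name"] == b["name"]),
        "name_similarity": 1 - levenshtein(name_a, name_b) / longest,
        "age_diff": abs(a["age"] - b["age"]),
        "income_diff": abs(a["income"] - b["income"]),
        "spending_diff": abs(a["spending"] - b["spending"]),
    }

# Hypothetical pair: same customer with a formatting inconsistency.
original = {"name": "Ana Lee", "age": 30, "income": 50000.0, "spending": 1200.0}
variant = {"name": "  ANA LEE ", "age": 30, "income": 50000.0, "spending": 1200.0}
features = pair_features(original, variant)
```

Keeping both the raw exact-match flag and the normalised similarity lets a tree-based model such as XGBoost tell formatting-only duplicates apart from genuinely different names.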
$28 USD in 2 days
1.5

South Africa
Payment method verified
Member since Mar 20, 2026