Big Data Project

₹5000-5500 INR

Closed
Posted 4 months ago

Paid on delivery
Store Sales Data Analysis: A Data Engineering Capstone Project

Project Overview
The project aims to analyze global sales data to offer actionable insights into regional sales trends, item popularity, and profitability.

Real-World Implications
• Optimizing Inventory: Know what items sell well in which regions.
• Sales Strategy: Develop targeted sales strategies for different markets.

Target Audience
• Sales Managers
• Business Analysts
• Data Scientists

Technologies and Tools
• Data Processing: Pandas, Spark
• Query Language: Hive
• Data Visualization: Matplotlib, Seaborn
• Big Data Technologies: HDFS, YARN

Data Source
The dataset includes:
• Transaction Information: Region, Country, Item Type, Sales Channel, Order Priority, Order Date, Order ID, Ship Date
• Sales Data: Units Sold, Unit Price, Unit Cost, Total Revenue, Total Cost, Total Profit

Problem Statements

Data Preprocessing
1. Null Value Elimination
2. Date Data Cleaning
3. Categorize Items
4. Sales Data Cleanup
5. Data Type Conversion
6. Seasonal Decomposition: Break down sales data into seasonal, trend, and residual components.
7. Feature Engineering: Create new features like Profit Margin, Sales Velocity.

Data Analytics (Big Data Analysis with Visualization)
1. Number of Countries (Using Hive)
2. Units Sold by Region (Using Hive)
3. Most Recent Sales (Using Hive)
4. Products with Specific Letters (Using Spark)
5. Top Selling Countries (Using Spark)
6. Item Costs (Using Spark)
7. Sales Yearwise (Using PySpark)
8. Orders per Item (Using PySpark)
9. Country with Highest Sales (Using PySpark)
10. Customer Segmentation: Use clustering algorithms to identify different customer segments.
11. Time Series Forecasting: Predict future sales using ARIMA or LSTM.
12. Anomaly Detection: Identify any anomalies or outliers that could indicate fraudulent activity.
13. Association Rule Mining: Find associations between different products in the data (Using Spark).
14. Price Elasticity: Understand how the demand for a product changes with a change in its price (Using PySpark).
15. Correlation Between Priority and Profit: Analyze if 'Order Priority' has any correlation with 'Total Profit'.

Data Visualization
1. Regional Sales Distribution
2. Top 10 Items Pie Chart
3. Sales Time Series
4. Profit Distribution
5. Sales by Item
6. Heatmap: Show the correlation between different numerical features like Unit Price, Unit Cost, and Total Profit.
7. Interactive Dashboard: Create an interactive dashboard where users can filter data by year, region, or item.
8. Geographic Heatmap with Time Slider: Show how sales in different regions have evolved over time.
9. Cohort Analysis: Visualize customer retention over time.
10. Bubble Chart: Display Units Sold, Total Revenue, and Total Profit in a three-dimensional bubble chart.

Performance Metrics
1. Spark Job Metrics
2. Query Latency in Hive
3. HDFS Storage Utilization
4. Data Skew Detection
5. Resource Utilization with YARN
6. Task Failure Rates: Monitor and minimize the failure rates of tasks in Spark or Hive jobs.
7. Data Replication Metrics in HDFS: Track and optimize data replication times and success rates.
8. Data Ingestion Latency: Measure the latency of data ingestion from different sources into HDFS.
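To make the preprocessing and analytics items above more concrete, here is a minimal sketch of null value elimination, date cleaning, data type conversion, a Profit Margin feature, and a "Units Sold by Region" aggregation. It uses only the Python standard library so it runs anywhere; the brief itself calls for Pandas, Spark, and Hive, so treat this as an illustration of the logic rather than the intended implementation. The sample rows, the `M/D/YYYY` date format, the function names, and the definition of Profit Margin as Total Profit divided by Total Revenue are all assumptions that do not come from the brief.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows. Column names follow the dataset description;
# the values and the M/D/YYYY date format are assumptions for illustration.
SAMPLE_CSV = """Region,Country,Item Type,Order Date,Units Sold,Total Revenue,Total Profit
Asia,India,Snacks,1/15/2016,100,15000.00,5500.00
Europe,France,Cereal,3/2/2016,50,10000.00,4000.00
Asia,Japan,Snacks,,20,3000.00,1100.00
Europe,Spain,Cereal,7/9/2017,30,6000.00,2400.00
"""


def preprocess(rows):
    """Drop rows with missing fields, convert types, add Profit Margin."""
    cleaned = []
    for row in rows:
        # Null value elimination: skip any row with an empty or missing field.
        if any(value is None or value.strip() == "" for value in row.values()):
            continue
        record = dict(row)
        # Date cleaning and data type conversion.
        record["Order Date"] = datetime.strptime(row["Order Date"], "%m/%d/%Y").date()
        record["Units Sold"] = int(row["Units Sold"])
        record["Total Revenue"] = float(row["Total Revenue"])
        record["Total Profit"] = float(row["Total Profit"])
        # Feature engineering (assumed definition): profit as a share of revenue.
        revenue = record["Total Revenue"]
        record["Profit Margin"] = record["Total Profit"] / revenue if revenue else 0.0
        cleaned.append(record)
    return cleaned


def units_sold_by_region(records):
    """Sum Units Sold per Region (the brief assigns this query to Hive)."""
    totals = defaultdict(int)
    for record in records:
        totals[record["Region"]] += record["Units Sold"]
    return dict(totals)


records = preprocess(csv.DictReader(io.StringIO(SAMPLE_CSV)))
region_totals = units_sold_by_region(records)
print(region_totals)
```

In this sample the Japan row is dropped because its Order Date is empty, so only three rows reach the aggregation. On the real dataset the same steps would more naturally be expressed as DataFrame operations or a Hive `GROUP BY` query.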
Project ID: 37404545

About the project

4 proposals
Remote project
Active 2 months ago

4 freelancers are bidding an average of ₹5,938 INR for this job
I can handle the Store Sales Data Analysis project. I am a freelancer with 7 years of experience in Python development. My Python skills include: ◈ Object-Oriented Programming (OOP) in Python ◈ R programming ◈ Jupyter Notebook and Google Colab ◈ Data structures and algorithms ◈ Web development with frameworks such as Django, Flask, and Streamlit ◈ Machine learning, deep learning, and artificial intelligence ◈ Database integration, including SQL and NoSQL ◈ Data analysis and visualization ◈ Text processing and natural language processing, including tokenization ◈ Debugging and troubleshooting ◈ Functional programming. I hope I am a good fit for your project. I will deliver on time, with quality assurance, at an affordable price. Please message me so that we can discuss your project further.
₹5,250 INR in 2 days
4.8 (3 reviews)
I have extensive experience in MLOps and have worked on a variety of projects and data formats. I consider myself a good candidate for this problem.
₹5,250 INR in 60 days
0.0 (0 reviews)

About the client

Nadia, India
Member since Sep 10, 2023

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)