
ML - Data Analysis and Classification

₹2500-3000 INR

Closed
Posted about 2 months ago

Paid on delivery
Overview of the Task:

The original test sheets contain many data sets, each with 49 numbers. Each data set is a column. In each data set/column, 7 of the 49 numbers are selected as process numbers; these are shown in bold red. The last (rightmost) column is the target data set for prediction. All other columns are data sets used for training the model. The project's ultimate objective is to predict the 7 process numbers of that last column/data set using Machine Learning models.

We are using 5 different types of ML models to predict these 7 pattern numbers for the target data set, which is the last column of each test sheet. During this prediction process we have made certain observations. We need to address those observations and improve the prediction accuracy with methods or approaches to be developed by expert data scientists. This task, named "Data analysis and classification", serves that objective.

We have predicted the 7 process numbers of approximately 50 data sets using these 5 ML models at various test sizes. These prediction results are illustrated in the Excel workbook file named "Comparison of prediction results of 50 data sets". How to read and understand this Excel workbook is explained below:

1) The workbook has 50 sheets. The leftmost sheet is named 388 and the names run up to 438 at the rightmost sheet. Of these 50 sheets, data is currently filled in up to 431, totalling 44 data sets. Data for the remaining sheets will be filled in as it becomes available.
2) The sheet names are the numbers of the data sets, from 388 to 438. Each of these numbers is also the name of the target data set, the rightmost column of each test sheet.
3) One data set can have up to 6 to 7 test sheets, named 388-1, 3881A, 3881B, 388-2 … up to 388-5. Each test sheet has a varying number of training data sets and one target data set. The number of data sets in each test sheet is stated in the test sheet name.
4) A test sheet name starts with the number of the target variable (or target column) for which we have to predict the 7 numbers.
5) Each of the 50 sheets of the workbook has a list of 9 numbers predicted by different ML models. The models used were RF (Random Forest Classifier), SVML (SVM classifier, linear kernel), SVMR (SVM classifier, RBF kernel), SVMP (SVM classifier, polynomial kernel) and NB (Naive Bayes Classifier).
6) The actual 7 values (pattern numbers) are given in the coloured cells at the top left of each sheet. Wherever these numbers occur in the prediction results, they are highlighted in the same colours.
7) You may also notice names such as 388-1, 388-2, 388-3, 388-4, etc. These are different variations of test sheets for data set 388; in each of these 5 to 7 test sheets, 388 is the target column. We make predictions using each of these test sheets of various sizes.
8) We noticed better results when changing the test size during the train-test split, so we have also tested each of the models at different test sizes: 0.2, 0.3, 0.4, 0.5, 0.6. These test size values are given in brackets after each test sheet name.
9) At the top left of each sheet you can also see 'Result type'. This describes a special data manipulation criterion: 'No column removal' means no columns are removed from the test sheet; 'Two column removal' means the first two columns are removed; 'Four column removal' means the first four columns (the first four training data sets) are removed, and so on. This increased prediction accuracy slightly, so please be on the lookout for this variable.

The Task:

A. First look through the various predictions in each sheet (there are 150 predictions in each sheet) and count, list out/tabulate the facts available there, such as:
a) How many of the pattern numbers occur in each type of prediction?
b) Which type of prediction has the highest number of correct pattern numbers?
c) Which type of prediction has a consistent result? This means having a similar number of correct numbers repeatedly.
d) Variations in Dataset: Explore the variations of the same dataset (e.g., 388-1, 388-2) and note any significant differences in prediction accuracy.
e) Effect of Test Sizes: Investigate the impact of different test sizes (0.2, 0.3, 0.4, 0.5, 0.6) on prediction accuracy for each model.
f) Influence of 'Result Type': Assess how different 'Result Types' affect the accuracy, especially whether column removal enhances or hinders the predictions.
And so on. All such observations/facts will help us determine which type of model, at what test size value, has the best performance.

B. Analyse each test sheet in detail using various metrics used in data science to determine which characteristics of a test sheet or target data set give the best prediction result.
a) Prediction Accuracy: Calculate the overall accuracy of predictions for each test sheet. This involves assessing the ratio of correct predictions to the total number of predictions.
b) Precision, Recall, and F1 Score: Break down the performance using precision, recall, and F1 score metrics. Precision measures the accuracy of positive predictions, recall assesses the ability to capture all positive instances, and F1 score combines both metrics.
c) Feature Importance: If applicable, analyse the importance of features in the prediction. This is particularly relevant if certain columns or variables significantly influence the model's performance. You may use the SHAP graphs generated using interpretML to achieve this.
d) Hyperparameter Tuning: Explore the impact of hyperparameter tuning on model performance. Assess how adjustments to parameters influence the predictive accuracy.

C. Analyse each data set (each data set is one column and has 49 numbers) in detail using various metrics that can be derived from the data set itself, without considering the prediction results.
a) Descriptive Statistics: Compute basic descriptive statistics such as mean, median, standard deviation, minimum, and maximum values. This provides an initial understanding of the central tendency and variability of the dataset.
b) Data Distribution: Visualize the distribution of the dataset using histograms, box plots, or kernel density plots. This helps identify any skewness, outliers, or patterns within the data.

The objective of this analysis and expected results:

After this detailed study and analysis, we will gain the following abilities/knowledge:
I) Be able to classify or categorise the test sheets into categories or classes such as:
a) Most friendly with SVM linear at ---- test size.
b) Needs removal or addition of data sets so that the various metric values support better prediction results.
c) ……
d) …..
II) Be able to classify or categorise individual data sets into categories or classes such as:
a) Most friendly with SVM linear at ---- test size.
b) Needs removal or addition of data sets so that the various metric values support better prediction results.
c) ….
d) ……
III) Be able to remove or add training data sets from a test sheet to get the highest possible number of correct predictions for each type of prediction model and test size.
IV) Any other corrective actions to help us get high prediction accuracy.

Plan of Action

To ensure precise predictions from these models, we have to compute a few metrics. These metrics generally depict the efficiency of the model. They are listed below with details:
- Accuracy: The proportion of correctly classified occurrences as defined in the pattern set. You have to count the predictions that match the pattern sets and compute the proportions. This also gives the error rate. We know the threshold and use it to interpret the results.
- Confusion Matrix: Accuracy alone is not enough to conclude the efficiency of the model. Conduct an in-depth analysis using the underlying information. This matrix gives the true negatives, true positives, false negatives and false positives. These measures help us understand the variations and whether we can rely on a particular model.
- Sensitivity and Specificity: These measures give an overview of how many pattern numbers are correctly identified as pattern numbers (true positives) and, similarly, how many non-pattern numbers are correctly identified as non-pattern numbers.
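The counting in Task A and the metrics in the Plan of Action could be sketched roughly as follows. This is a minimal illustration only, not the client's method: it assumes each prediction is a list of numbers compared against the 7 actual pattern numbers, and that the 49 numbers of the target column are the integers 1 to 49. The function names `score_prediction` and `mean_hits_by_group`, the row layout, and all sample values are hypothetical and are not taken from the workbook.

```python
from collections import defaultdict
from statistics import mean


def score_prediction(predicted, actual, universe):
    """Score one prediction list against the actual pattern numbers.

    Every number in `universe` is treated as one binary decision:
    "pattern number" (positive) or "not a pattern number" (negative).
    """
    predicted, actual, universe = set(predicted), set(actual), set(universe)
    tp = len(predicted & actual)             # pattern numbers that were predicted
    fp = len(predicted - actual)             # predicted, but not pattern numbers
    fn = len(actual - predicted)             # pattern numbers that were missed
    tn = len(universe - predicted - actual)  # correctly left out

    def ratio(num, den):
        # Guard against empty denominators (e.g. an empty prediction list).
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)              # also called sensitivity
    return {
        "hits": tp, "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


def mean_hits_by_group(rows, actual, universe):
    """Average hit count per (model, test_size) over several predictions."""
    groups = defaultdict(list)
    for row in rows:
        score = score_prediction(row["predicted"], actual, universe)
        groups[(row["model"], row["test_size"])].append(score["hits"])
    return {key: mean(hits) for key, hits in groups.items()}


# Illustrative values only; these are NOT taken from the workbook.
universe = range(1, 50)                      # assumption: the 49 numbers are 1..49
actual = [3, 11, 18, 24, 30, 37, 45]         # hypothetical 7 pattern numbers
rows = [
    {"model": "RF", "test_size": 0.2, "predicted": [3, 11, 18, 24, 5, 7, 9, 40, 41]},
    {"model": "RF", "test_size": 0.2, "predicted": [3, 11, 2, 4, 5, 7, 9, 40, 41]},
    {"model": "NB", "test_size": 0.3, "predicted": [30, 1, 2, 4, 5, 7, 9, 40, 41]},
]
summary = mean_hits_by_group(rows, actual, universe)
best = max(summary, key=summary.get)         # group with the highest mean hit count
```

Grouping by (model, test size) is only one possible choice; the same pattern would extend to the test sheet name or the 'Result type' by adding those fields to the grouping key.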
Project ID: 37827443

About the project

19 proposals
Remote project
Active 10 days ago

19 freelancers are bidding on average ₹3,024 INR for this job
Dear Rajat K., I am writing to express my interest in your project ML - Data Analysis and Classification. As a Python developer with years of experience in the field, I believe I have the skills and expertise required to deliver high-quality work that meets your requirements. My experience includes working on a variety of Python projects, from web applications to data analysis and machine learning. I am proficient in using Python libraries such as NumPy, Pandas, and Scikit-learn, and I have experience working with databases such as MySQL and MongoDB. In addition to my technical skills, I have a strong attention to detail and am committed to delivering work on time and budget. I believe that communication is key to the success of any project, and I am always available to discuss progress and answer any questions you may have. If you choose to work with me, I am confident that I can deliver the results you are looking for. I am excited about the opportunity to work on your project and look forward to hearing from you. Thank you for your time and consideration. Sincerely, Zied B.
₹2,995 INR in 3 days
4.6 (98 reviews)
5.7

Hi, I read your proposal and I can do it. I have more than 5 years of experience working as a data analyst. I have a strong command of R and Python. Assign this task to me and I will do it as described. I will not ask for more than the stated price.
₹4,500 INR in 2 days
4.9 (28 reviews)
4.8

Hello! I have an MS in statistics. I read your job description. I have expertise in SPSS, RStudio, Excel, ML, classification, and statistical analysis. I will provide you with the finest work that perfectly aligns with your time and budget constraints. Thanks, waiting for your response. Regards, Hira Mahmood
₹2,750 INR in 1 day
5.0 (31 reviews)
4.8

Hi there, I am excited to share my expertise and skills in Machine Learning, which I have acquired over the past 5 years. I am confident that I can meet your requirements. P.S. After carefully reading the project requirements, I can assure you I'm able to get it done within a few hours. Contact me to start working right away.
₹2,750 INR in 3 days
4.9 (8 reviews)
3.7

⭐ Hi, my availability is immediate. I read your project post on ML/Python Developer. We are experienced full-stack Python developers with skill sets in:
- Python, Django, Flask, FastAPI, Jupyter Notebook, Selenium, Data Visualization
- Web App Development, Data Science, Web/API Scraping, Machine Learning, AI
- API Development, Authentication, Authorization
- SQLAlchemy, PostgreSQL, MySQL, SQLite, SQL Server, Datasets
- Web hosting, Docker, Azure, AWS, DigitalOcean, GoDaddy
- ML Algorithms: linear regression, logistic regression, decision trees, random forests, neural networks, etc.
- Python Libraries: NumPy, pandas, scikit-learn, TensorFlow, etc.
Please send a message so we can quickly discuss your project and proceed further. I am looking forward to hearing from you. Thanks
₹5,200 INR in 1 day
4.3 (13 reviews)
3.9

I propose to conduct a comprehensive analysis of the Machine Learning models' predictions on various test sheets. The plan involves initial data exploration, counting and tabulating facts, exploring dataset and test size variations, analyzing the influence of 'Result Type,' and conducting detailed test sheet and dataset analyses. The analysis will include calculating accuracy, precision, recall, and F1 score for each test sheet, exploring feature importance, and assessing hyperparameter tuning effects. Descriptive statistics and data distribution visualization will be performed for each dataset. The outcomes will involve categorizing test sheets and datasets based on model compatibility and characteristics. Actionable insights for improving prediction accuracy will be provided, along with established thresholds for interpretation. The final report will summarize findings, analyses, and recommendations for stakeholders.
₹3,000 INR in 2 days
5.0 (8 reviews)
2.7

It will be done in no time. I have done it before too, so let me know how we should proceed and I will get it done.
₹3,000 INR in 7 days
4.7 (2 reviews)
2.3

Dear Hiring Manager, I am excited about the opportunity to work on the task you outlined, focusing on data analysis and classification using machine learning models. Here's how I propose to approach and execute the project: With my expertise in Python, statistics, machine learning, and data science, I am confident in my ability to deliver actionable insights that will optimize prediction accuracy and enhance the performance of machine learning models for your project. Thank you for considering my proposal. I look forward to the opportunity to contribute to the success of your project. Best regards, Harsh Bhatti
₹2,850 INR in 1 day
4.7 (4 reviews)
2.3

Hi Rajat K., Good afternoon! My name is Jane, an expert data analyst with skills including Machine Learning (ML), Predictive Analytics, Python, Data Science and Statistics. I have over 5 years of experience tutoring data analysis and statistics. Having completed a similar project, I am confident in my ability to deliver high-quality results for this project. I am eager to discuss further details and see how I can contribute to your team. I am happy to offer a free consultation and a 10% discount for first-time clients. Please contact me to discuss more about this project. Regards, Jane
₹3,055 INR in 2 days
5.0 (2 reviews)
0.0

Hi, this is Irfan Khan. I have an MBA in Marketing and Advertisement along with many Google and Meta certifications. With 10+ years of experience in digital marketing, and leading an Artificial Intelligence lab where we have 10+ GPU machines to serve you better, I am confident that I will be your best choice for achieving your goal or milestone, not only within budget but also cost-effectively and efficiently. Please circle back to me if I can assist you in any way. Thank you.
₹2,750 INR in 7 days
0.0 (0 reviews)
0.0

As a driven and meticulous data scientist, I have extensive experience in Data Analysis and Classification and Machine Learning (ML). I am confident that my expertise will be a valuable asset in your project. With a strong grasp of key ML models such as Random Forest Classifier, SVM Linear Classifier, SVM RBF Classifier, SVM poly classifier, and Naive Bayes Classifier, I will be able to delve deep into your 388 to 431 test sheets in the workbook file and gather meaningful insights to improve prediction accuracy. Moreover, my proficiency in SQL and Python will enable me to efficiently tabulate and analyze all the predictions on each of the sheets. In addition, my ability to think critically and find unique approaches will be handy when overcoming the identification hurdles you've encountered thus far. I am confident that I can accurately count and list out facts that will be instrumental in enhancing your ML models. Finally, let me assure you that your need for varying test sizes is something that really interests me professionally, and I can offer an experienced perspective on the different possibilities with alternate test sizes. My commitment to providing actionable insights and excellent data visualization fits perfectly with this project's objective of predicting pattern numbers through ML models. Choosing me would guarantee both expertise and dedication towards your project's success!
₹3,000 INR in 10 days
0.0 (0 reviews)
0.0

With over 5 years of experience in the field, my deep understanding of both Data Science and Python makes me perfectly suited for your project. Navigating large datasets and applying Machine Learning models to predict patterns is at the core of what I do, and I have repeatedly demonstrated the ability to leverage such models across a range of test sizes, attaining optimal results each time.
₹2,750 INR in 30 days
0.0 (0 reviews)
0.0

Contact me; I can solve it in a better way. If you are interested, you can contact me. I never disappoint the client. I can do this project for a final price of 2000.
₹2,500 INR in 7 days
0.0 (0 reviews)
0.0

I'm the best candidate for your project, as I've done various ML prediction projects with high accuracy. I'll showcase my projects if needed.
₹2,600 INR in 7 days
0.0 (0 reviews)
0.0

I have 10 years of experience working with Python and machine learning. I have solved 40+ classification problems, 30+ regression problems and 10+ clustering problems during those 10 years.
₹2,750 INR in 7 days
0.0 (0 reviews)
0.0

I have already made this model with your given dataset. If you are OK with it, I can present it to you; please let me know if you are interested. This presentation is free, and once you are interested we can move on further.
₹2,750 INR in 7 days
0.0 (0 reviews)
0.0

As an experienced data scientist with a specialization in Machine Learning (ML), I am confident that I can effectively handle your task of analyzing and classifying the data for your project. With over 8 years of industry experience, I have gained substantial knowledge and expertise in utilizing ML models to extract meaningful insights from complex data sets. In addition, my in-depth understanding of Python, a programming language extensively used for ML, will ensure proficient execution of your project. + Support Vector Machine
₹2,750 INR in 7 days
0.0 (0 reviews)
0.0

About the client

SIRSA, India
5.0
4
Payment method verified
Member since Jun 8, 2022

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)