Text Review Machine Learning

In this project, you will predict rating scores (1 to 5) from the text of online reviews. You are given a dataset of reviews and their rating scores. The task is to take the review text as input and predict the rating score for that review. You will experiment with several supervised learning algorithms using this dataset. You must use Jupyter Notebook.

There are two files, [login to view URL] and test.txt. Each file contains the review text and the score, separated by a tab. There are 10K reviews in [login to view URL] and 1K reviews in test.txt.
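A minimal sketch of reading such a file, assuming one review per line with the text first and an integer score second (the column order and the helper name load_reviews are assumptions, not part of the brief):

```python
import csv
import io


def load_reviews(handle):
    """Read (text, score) pairs from a tab-separated file-like object."""
    texts, scores = [], []
    # QUOTE_NONE keeps quotation marks inside review text as literal characters.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 2:
            continue  # skip blank or malformed lines
        texts.append(row[0])
        scores.append(int(row[1]))
    return texts, scores


# Tiny in-memory stand-in for a real file, for illustration only.
sample = io.StringIO("Great food and service\t5\nNever again\t1\n")
texts, scores = load_reviews(sample)
print(texts, scores)
```

For the real data the same function could be called on an open file, for example with open("test.txt", encoding="utf-8", newline="") as f: load_reviews(f).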

You will first need to extract features from the review text by implementing the bag-of-words model. To do this, use the text feature extraction packages in Python's sklearn. These packages convert documents into vectors after automatically pre-processing them (e.g. removing stop words). Use TF-IDF vectorization here, via sklearn.feature_extraction.text.TfidfVectorizer. You will then experiment with PCA (dimensionality reduction), performed as a preprocessing step ([login to view URL]), to reduce the number of features.
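The feature extraction step might look roughly like the sketch below. The four documents are made-up examples, and the choices stop_words="english" and n_components=2 are assumptions for illustration only.

```python
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up documents standing in for the review text.
docs = [
    "the food was great and the staff were friendly",
    "terrible service and cold food",
    "great value, friendly staff",
    "cold, slow and terrible",
]

# Bag-of-words with TF-IDF weighting; English stop words are removed.
vectorizer = TfidfVectorizer(stop_words="english", lowercase=True)
X_tfidf = vectorizer.fit_transform(docs)  # sparse matrix, one row per document

# PCA is applied to a dense copy of the TF-IDF matrix here.
pca = PCA(n_components=2, random_state=0)
X_reduced = pca.fit_transform(X_tfidf.toarray())

print(X_tfidf.shape, X_reduced.shape)
```

Converting 10K reviews to a dense matrix can use a lot of memory. If that becomes a problem, sklearn.decomposition.TruncatedSVD accepts sparse input directly and may be worth considering as a substitute, although it is not the PCA the brief asks for.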

Supervised learners

You will experiment with the following learners.

i) Neural networks (MLPClassifier in sklearn)

ii) Naïve Bayes (MultinomialNB in sklearn)

iii) Logistic Regression (LogisticRegression in sklearn)

iv) AdaBoost (AdaBoostClassifier in sklearn)

v) SVM ([login to view URL] in sklearn)
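One possible way to keep the five learners together is a dictionary keyed by name, as sketched below. The SVM class name is elided in the brief, so sklearn.svm.SVC is an assumption here; the helper name make_learners and all parameter values are placeholders rather than recommended settings.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC  # assumption: the brief does not name the SVM class


def make_learners(random_state=0):
    """Return fresh, unfitted instances of the five learners."""
    return {
        "mlp": MLPClassifier(hidden_layer_sizes=(100,), max_iter=300,
                             random_state=random_state),
        "naive_bayes": MultinomialNB(),
        "logistic_regression": LogisticRegression(max_iter=1000),
        "adaboost": AdaBoostClassifier(n_estimators=50,
                                       random_state=random_state),
        "svm": SVC(C=1.0, kernel="linear"),
    }


learners = make_learners()
for name, model in learners.items():
    print(name, type(model).__name__)
```

Building the instances inside a function means each experiment can start from unfitted models.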

Tasks to perform

i. Run 5-fold cross-validation on [login to view URL] using the 5 learning algorithms. Report the average precision, average recall and average F1 scores. Parameters that you should try changing include:

a. For neural networks, change the number of hidden layers and the number of units in each layer.

b. For SVMs, change the penalty parameter C and the kernel type.

c. For AdaBoost, change the number of estimators (n_estimators).

d. For logistic regression, change the penalty: L1 regularization (which can also perform feature selection) or L2. Also change the regularization parameter C (in sklearn, C is the inverse of the regularization strength).
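The cross-validation and parameter search described above could be sketched as follows for logistic regression. The toy reviews, the two-class labels, the grid values and the use of macro averaging are all assumptions made for illustration; the brief does not say which averaging to use.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy stand-in for the training reviews (only scores 5 and 1 appear here).
positive = ["great food", "loved it", "excellent service", "friendly staff",
            "amazing value", "great place", "wonderful meal",
            "excellent and friendly", "loved the food", "amazing service"]
negative = ["terrible food", "hated it", "awful service", "rude staff",
            "poor value", "terrible place", "horrible meal",
            "awful and rude", "hated the food", "poor service"]
texts = positive + negative
labels = [5] * len(positive) + [1] * len(negative)

# The vectorizer sits inside the pipeline so it is refit on each training fold.
pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(solver="saga", max_iter=2000)),  # saga handles L1 and L2
])

param_grid = {"clf__penalty": ["l1", "l2"], "clf__C": [0.1, 1.0, 10.0]}
scoring = ["precision_macro", "recall_macro", "f1_macro"]

search = GridSearchCV(pipe, param_grid, cv=5, scoring=scoring, refit="f1_macro")
search.fit(texts, labels)

# Fold-averaged scores for the best parameter combination.
best_scores = {
    metric: search.cv_results_["mean_test_" + metric][search.best_index_]
    for metric in scoring
}
print(search.best_params_, best_scores)
```

The same pattern should carry over to the other learners by swapping the "clf" step and the grid, for example hidden_layer_sizes for MLPClassifier or n_estimators for AdaBoostClassifier.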

ii. Perform dimensionality reduction using PCA and re-run the algorithms with their optimal settings. Compare the results for different values of n_components (the number of reduced features) in PCA.
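A rough sketch of comparing n_components values is shown below, again on toy data. The candidate values (2, 4, 8), the classifier and the helper to_dense are assumptions chosen only to keep the example small.

```python
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

# Toy stand-in for the training reviews.
positive = ["great food", "loved it", "excellent service", "friendly staff",
            "amazing value", "great place", "wonderful meal",
            "excellent and friendly", "loved the food", "amazing service"]
negative = ["terrible food", "hated it", "awful service", "rude staff",
            "poor value", "terrible place", "horrible meal",
            "awful and rude", "hated the food", "poor service"]
texts = positive + negative
labels = [5] * len(positive) + [1] * len(negative)


def to_dense(X):
    """Convert the sparse TF-IDF matrix to a dense array for PCA."""
    return X.toarray()


results = {}
for n in (2, 4, 8):
    pipe = Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("dense", FunctionTransformer(to_dense, accept_sparse=True)),
        ("pca", PCA(n_components=n, random_state=0)),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    scores = cross_val_score(pipe, texts, labels, cv=5, scoring="f1_macro")
    results[n] = scores.mean()  # fold-averaged macro F1 for this n_components

print(results)
```

Note that PCA output can contain negative values, which MultinomialNB rejects, so the Naïve Bayes run on PCA features may need a different variant or a rescaling step; which one to use is left open here.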

iii. Incorporate some additional knowledge into your model. Specifically, not all words are useful in predicting rating scores. Words that express sentiment are more likely to be useful. Use the sentiment words in [login to view URL] and [login to view URL] to filter words from the review text, and then evaluate the algorithms once again. What changes did you observe?
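One way to apply the sentiment filter is to restrict the vectorizer's vocabulary to the lexicon words, as in this sketch. The word sets below are hypothetical stand-ins for the two sentiment word files, whose names and format are not given in the brief.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-ins for the positive and negative sentiment word lists.
positive_words = {"great", "friendly", "excellent"}
negative_words = {"terrible", "rude", "cold"}
lexicon = sorted(positive_words | negative_words)

docs = [
    "the food was great and the staff were friendly",
    "terrible service and cold food",
    "we arrived at noon",  # contains no sentiment words
]

# With a fixed vocabulary, every other word is ignored during vectorization.
vectorizer = TfidfVectorizer(vocabulary=lexicon)
X = vectorizer.fit_transform(docs)

print(X.shape)          # (3, 6): one column per lexicon word
print(X[2].nnz)         # 0: the third review has no sentiment words
```

Reviews with no lexicon words become all-zero vectors, which is one effect worth checking when comparing results with and without the filter.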

iv. Evaluate on the test dataset using the optimal parameter settings obtained from the training set. How did each algorithm perform? Report its precision, recall and F1-score. Which types of reviews were the hardest to predict?
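For the test-set report, per-class metrics and a confusion matrix can help show which rating scores are hardest to predict. The sketch below uses made-up true and predicted scores; in practice y_pred would come from a model fitted on the training data and applied to the test.txt reviews.

```python
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

# Made-up true and predicted rating scores, for illustration only.
y_true = [1, 2, 3, 4, 5, 5, 4, 3, 2, 1]
y_pred = [1, 3, 4, 4, 5, 4, 4, 2, 2, 1]
labels = [1, 2, 3, 4, 5]

# Per-class precision, recall and F1; undefined ratios are reported as 0.
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=labels, zero_division=0
)

# Rows are true scores, columns are predicted scores.
cm = confusion_matrix(y_true, y_pred, labels=labels)

# The class with the lowest F1 is one reasonable notion of "hardest".
hardest = labels[int(f1.argmin())]

for label, p, r, f in zip(labels, precision, recall, f1):
    print(f"score {label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
print(cm)
print("hardest score to predict:", hardest)
```

Off-diagonal entries in the confusion matrix show which scores get confused with each other, which can support the discussion of hard-to-predict reviews.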

Skills: Data Mining, Machine Learning (ML), Python


About the Employer:
(0 reviews) Memphis, United States

Project ID: #19249937

Awarded to:


Hi. As a Python/C++/Java expert with a strong math background and ML experience, I can finish your project wonderfully. Please let me know your deadline and budget.

$30 USD in 3 days
(30 reviews)

7 freelancers are bidding an average of $246 for this job


Hi there, I have read your project description and I'm confident I can do this project for you perfectly. I still have a few questions. Please leave a message on my chat so we can discuss the budget and deadline of the ...

$333 USD in 3 days
(13 reviews)

Hi friend, I am a machine learning specialist who has completed multiple advanced projects here and on Guru. I also worked for a German company as a machine learner and data scientist. To sum it all up my projects, ...

$260 USD in 3 days
(7 reviews)

Hello there, this is a default bid. We'll discuss the price later in the chat. After reading your project, I can do this for you perfectly. I still have a few questions. Please leave a message on my chat so we can ...

$300 USD in 3 days
(6 reviews)

Hi, Sir!! I am a Python expert and full-time full-stack developer. PLEASE CONTACT ME. I CAN DO IT WONDERFULLY. I use TensorFlow, sklearn, Keras for AI and ML, pandas for data analysis, sele ...

$300 USD in 3 days
(11 reviews)

Hi, I am an information engineer. I can finish the required project. Send me the project details ... send me the dataset so I can review it.

$277 USD in 3 days
(1 review)

I have experience with text review classification. Experienced in Python. The deadline is 10 days, but I'll try to finish earlier.

$222 USD in 10 days
(0 reviews)