Text Analysis Project using PySpark ML

I want someone to perform a theme analysis of about 5 million comments from a video-sharing website, using the PySpark ML library as the main tool. I will provide the dataset. The work environment should be Databricks Community Edition (you can create an account for free), and the deliverable is a Databricks notebook.

The data is at “video_creator – commentor_id – comment” granularity. What I want you to do is the following:

1. Remove comments that are not written in English.

2. For each commentor_id, concatenate all of their comments into one feature called “all_comments”. That is, aggregate the dataset to commentor_id – all_comments granularity.

3. Transform the “all_comments” feature using the Word2Vec module of the PySpark ML library (not the MLlib library, as I want to do everything using DataFrames).

4. Cluster the transformed “all_comments” feature using the LDA module of PySpark ML.

5. Generate the most frequent words for each cluster identified in step 4. I will interpret the results myself, so you don’t need to worry about that.

So overall, it’s a straightforward task of data cleaning, aggregation, and application of standard PySpark ML modules.

I estimate this project will take 2 to 3 hours of programming for someone good at Python and PySpark. I hope to get the project done in 3 days; up to 6 days is acceptable. If you place a bid, I will share the link to the data file with you. I have no instructions beyond the five steps listed above.

Skills: Data Science, Python, Spark


About the Employer:
(28 reviews) Durham, United States

Project ID: #17903811

11 freelancers are bidding an average of $232 for this job


I have a good hands on working with Advanced R and Python and BI tools and technologies, AI, Big Data. I have quite a good knowledge of DL/ML Algorithm , have also developed Dashboards and Web Application. My area of e…

$250 USD in 3 days
(26 reviews)

Hi I am a very experienced statistician, data scientist and academic writer. I have completed several PhD level thesis projects involving advanced statistical analysis of data. I have worked with data from several comp…

$500 USD in 3 days
(17 reviews)

I hope to see you in chat. Though I am new to freelancer.com I am an experienced python developer with full-stack knowledge and career. I'm sure I can do this perfectly. Thanks for your kind attention.

$200 USD in 2 days
(27 reviews)

Hi, dear. nice to meet you. i'm python expert. please discuss more details by chatting. Regards. gao M.

$250 USD in 3 days
(13 reviews)

Hello! I am a python developer. I looked at your project and it seems interesting. I have all necessary skills required for this project. Ping me to discuss in detail.

$140 USD in 2 days
(22 reviews)

do kindly let's discuss over chat

$222 USD in 6 days
(22 reviews)

I have been working as data scientist for more than 4 years during which i implemented numerous machine learning algorithms to solve varied business problems. Moreover, to gain other domain expertise, i have been activ…

$388 USD in 7 days
(2 reviews)

Hello, Sir. How are you? I have experiences more than 9 years in developing Laravel,node.js,angular.js,react.js and Python Frameworks with mobile apps I will work for you all my best. Thank you in advances for your t…

$155 USD in 3 days
(1 review)

Hello? I have read your job description carefully. I have python experienced for 7 years. I want to discuss with you via chat. Thanks you, James.

$155 USD in 3 days
(1 review)

$244 USD in 21 days
(0 reviews)

i know pyspark... try me... just need a nice review...

$45 USD in 1 day
(0 reviews)