Using Apache Spark for NLP and Machine Learning tasks
Apache Spark
In this project, I’ve used Apache Spark for NLP and Machine Learning tasks.
Spark for Machine Learning
Model
In this task, I used Spark ML to build an MLP (multilayer perceptron) model and apply multi-label classification to a Spark DataFrame. After preprocessing the data and creating a label column, standardization and PCA are applied to the data, in that order.
Results
The results of training an MLP model on the proposed dataset are shown in the table below:
| Model | Test Accuracy | Test Recall | Test Precision |
|---|---|---|---|
| MLP | 96.01% | 96.01% | 92.19% |
Spark for NLP
For this task, I downloaded the text of Les Misérables and created a Spark DataFrame of sentences from the book. I then generated the bigrams and trigrams of the prepared DataFrame and computed the count of each bigram and trigram.
For the last part, I implemented a logistic regression to see whether there is any relationship between the lengths of the words in a bigram or trigram.