This project modifies a table in a PostgreSQL database using Scala and Spark. The following functions are required:
1- Connect to a PostgreSQL database.
2- Update a column based on whether certain words appear in another column, and write the resulting table back to the database.
3- Filter out irrelevant tweets: define the filtering criteria (removing retweets, removing spam, i.e. tweets containing squads, removing tweets that contain only URLs or mentions, etc.).
4- Call a function that computes a score for each tweet: score = number of suicide- and depression-related terms in the tweet.
5- Integrate these steps into a generic function. Aim for a scalable design, so that the criteria can be modified easily without touching the whole pipeline.
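
One possible shape for steps 3 to 5 is sketched below in plain Scala, without Spark or a database connection, so that only the design is shown. Everything named here is an assumption made for illustration: the `Tweet` record, the `Criterion` type, the two example criteria, the "RT @" retweet convention, and the term list are not part of the brief. The idea is that each filtering criterion is a named predicate kept in a list, so a criterion can be added or removed without changing the pipeline function itself.

```scala
object TweetPipeline {
  // Hypothetical minimal tweet record; the real table schema is not given in the brief.
  final case class Tweet(id: Long, text: String)

  // A criterion is a named predicate; a tweet is kept only if every criterion returns true.
  final case class Criterion(name: String, keep: Tweet => Boolean)

  // Splits on whitespace and drops empty tokens.
  private def tokens(text: String): Seq[String] =
    text.trim.split("\\s+").toSeq.filter(_.nonEmpty)

  // Assumption: retweets are marked by a leading "RT @".
  val notRetweet: Criterion =
    Criterion("not-retweet", t => !t.text.trim.startsWith("RT @"))

  // Drops tweets whose tokens are all URLs or @mentions (empty tweets are dropped too).
  val hasContent: Criterion =
    Criterion("has-content", t =>
      tokens(t.text).exists(w => !w.startsWith("http") && !w.startsWith("@")))

  // Illustrative term list only; the real lexicon would be supplied by the project.
  val defaultTerms: Set[String] = Set("suicide", "depression", "depressed")

  // score = number of tokens matching a term, after lower-casing and stripping non-letters.
  def score(terms: Set[String])(t: Tweet): Int =
    tokens(t.text.toLowerCase)
      .map(_.replaceAll("[^a-z]", ""))
      .count(terms.contains)

  // Generic pipeline: apply every criterion, then attach a score to each surviving tweet.
  def run(criteria: Seq[Criterion], terms: Set[String])(tweets: Seq[Tweet]): Seq[(Tweet, Int)] =
    tweets
      .filter(t => criteria.forall(_.keep(t)))
      .map(t => (t, score(terms)(t)))
}
```

With this layout, changing the criteria means editing the `Seq` passed to `run`, for example `TweetPipeline.run(Seq(notRetweet, hasContent), defaultTerms)(tweets)`. In a Spark version, the same predicates could plausibly be applied through `Dataset.filter` on a typed `Dataset[Tweet]`, with the read and write steps handled by Spark's JDBC data source; that mapping is a suggestion, not something the brief specifies.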