This project has two parts.
1) A data collector engine that continuously queries Twitter, Facebook, and some other public APIs to get data related to a given list of celebrities and stores it in a database. All the data is public, and collecting it does not violate any API's TOA.
2) A customizable web analytics dashboard that displays useful insights from the data captured above. Sample insights are:
-- Metrics related to follower growth
-- Engagement/performance metrics
-- Follower demographics
More details are in the detailed section.
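As an illustration of the follower growth metric, here is a minimal sketch, assuming the collector stores one follower count per celebrity per day. The function name `follower_growth`, the row fields, and the sample numbers are hypothetical and not taken from the attached files.

```python
from datetime import date


def follower_growth(snapshots):
    """Turn (day, follower_count) snapshots into day-over-day growth rows."""
    ordered = sorted(snapshots)  # oldest snapshot first
    rows = []
    for (_, prev_count), (day, count) in zip(ordered, ordered[1:]):
        delta = count - prev_count
        # Percentage change is undefined when the previous count is zero.
        pct = delta * 100.0 / prev_count if prev_count else None
        rows.append({"day": day, "delta": delta, "pct": pct})
    return rows


# Made-up sample data for one celebrity (assumption, for illustration only).
sample = [
    (date(2024, 1, 1), 1000),
    (date(2024, 1, 2), 1050),
    (date(2024, 1, 3), 1029),
]
growth = follower_growth(sample)
```

Each row in `growth` could feed one point of a dashboard chart. In the real system the same calculation would more likely run as a query against the database.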
## Deliverables
There are 2 Excel files in the attachment. One contains the data collected about a celebrity from Twitter and the metrics/graphs generated from it. The second contains the data collected from Facebook and the metrics generated from it. Essentially, that is the data collected for one celebrity over one week. What we need is the ability to capture similar data continuously for all 1500 celebrities and produce those metrics in real time. More details can be communicated once the project starts.
Technology choices:
Data collection engine: Java, Python, or Ruby
Database: MySQL
Deployment platform: Amazon AWS or Heroku
Dashboard: <Anything> (we can discuss more)
Note:
-- These APIs have rate limits, so the engine should respect and adapt to them.
-- At the same time, there is a real-time component to data collection, so this involves querying from a cluster so that the restrictions won't kick in.
-- The database should be normalized for the frequent queries.
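To make the first note concrete, here is a minimal sketch of how the engine might pace its requests. It assumes each API response reports how many calls remain in the current window and when that window resets. The class `RateBudget` and its method names are hypothetical, not part of any API client.

```python
import time


class RateBudget:
    """Tracks the remaining quota for one API in the current rate window."""

    def __init__(self, clock=time.time):
        self.remaining = None  # unknown until the first response arrives
        self.reset_at = 0.0    # epoch seconds when the window resets
        self._clock = clock    # injectable so the logic can be tested

    def update(self, remaining, reset_at):
        """Record the quota reported by the most recent API response."""
        self.remaining = remaining
        self.reset_at = reset_at

    def delay_before_next_call(self):
        """Return the number of seconds to wait before the next request."""
        now = self._clock()
        if self.remaining is None or now >= self.reset_at:
            return 0.0  # no information yet, or the window has rolled over
        if self.remaining <= 0:
            return self.reset_at - now  # quota exhausted: wait for the reset
        # Spread the remaining calls evenly over the rest of the window.
        return (self.reset_at - now) / self.remaining
```

A collector loop would call `time.sleep(budget.delay_before_next_call())` before each request and `budget.update(...)` after each response. Spreading calls evenly keeps data arriving steadily instead of using up the quota early in the window.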