End-to-End Big Data Project
Budget $30-250 USD
Skills: Python
Problem Statement:
Imagine you are part of a data team that wants to bring in daily data on COVID-19
testing in New York State for analysis. Your team has to design a daily
workflow that runs at 9:00 AM and ingests the data into the system.
API:
[login to view URL]
Following the ETL process, extract the data for each county in New York State from
the above API and load it into individual tables in the database. Each county table
should contain the following columns:
❖ Test Date
❖ New Positives
❖ Cumulative Number of Positives
❖ Total Number of Tests Performed
❖ Cumulative Number of Tests Performed
❖ Load Date
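The transform step above can be sketched as a function that groups the raw API records by county and maps them onto the required columns. The field names below (`county`, `test_date`, etc.) are assumptions modeled on typical Socrata-style JSON responses; the real schema behind the hidden URL may differ.

```python
from collections import defaultdict
from datetime import date

def group_by_county(records, load_date=None):
    """Group raw API records by county, mapping each onto the target columns.

    Field names are hypothetical; adjust them to the actual API schema.
    """
    load_date = load_date or date.today().isoformat()
    tables = defaultdict(list)
    for rec in records:
        tables[rec["county"]].append({
            "test_date": rec["test_date"],
            "new_positives": int(rec["new_positives"]),
            "cumulative_number_of_positives": int(rec["cumulative_number_of_positives"]),
            "total_number_of_tests": int(rec["total_number_of_tests"]),
            "cumulative_number_of_tests": int(rec["cumulative_number_of_tests"]),
            "load_date": load_date,  # audit column: when this row was ingested
        })
    return dict(tables)

sample = [
    {"county": "Albany", "test_date": "2021-06-01", "new_positives": "12",
     "cumulative_number_of_positives": "100", "total_number_of_tests": "500",
     "cumulative_number_of_tests": "9000"},
]
print(group_by_county(sample, load_date="2021-06-02")["Albany"][0]["new_positives"])  # → 12
```

Each county's list would then be loaded into its own table in the load step.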
Implementation options:
1. Python scripts run as a daily cron job
a. Utilize a SQLite in-memory database for data storage
b. There should be one main standalone script, run by the daily cron job,
that orchestrates all of the remaining ETL processes
c. Use a multi-threaded approach to fetch and load data for multiple counties
concurrently
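Option 1 might be sketched as below: the per-county fetches run in a thread pool, while all SQLite writes stay on one thread (sqlite3 connections are not shared across threads by default, and a plain `:memory:` database is private to its connection). `fetch_county` is a stand-in for the real API call, and the main script would be scheduled via a crontab entry such as `0 9 * * *`.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

def fetch_county(county):
    """Hypothetical fetch: the real script would call the API with a
    county filter; canned rows keep this sketch runnable."""
    return county, [("2021-06-01", 12, 100, 500, 9000)]

def load_all(counties):
    # Single in-memory connection, used only from this (main) thread.
    db = sqlite3.connect(":memory:")
    with ThreadPoolExecutor(max_workers=8) as pool:
        # Fetches run concurrently in worker threads; loads happen here.
        for county, rows in pool.map(fetch_county, counties):
            table = county.replace(" ", "_").lower()
            db.execute(
                f"CREATE TABLE IF NOT EXISTS {table} ("
                "test_date TEXT, new_positives INTEGER, "
                "cumulative_positives INTEGER, total_tests INTEGER, "
                "cumulative_tests INTEGER, load_date TEXT)")
            db.executemany(
                f"INSERT INTO {table} VALUES (?,?,?,?,?, date('now'))", rows)
    db.commit()
    return db

db = load_all(["Albany", "Bronx", "New York"])
print(db.execute("SELECT COUNT(*) FROM new_york").fetchone()[0])  # → 1
```

Separating concurrent fetching from single-threaded loading sidesteps SQLite's thread restrictions while still parallelizing the slow part (network I/O).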
2. Airflow to create a daily scheduled DAG
a. Utilize Docker to run Airflow and the Postgres database locally
b. There should be one DAG containing all tasks needed to perform the
end-to-end ETL process
c. Create and execute tasks dynamically and concurrently in Airflow, one per
county, based on the number of counties available in the API response
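Option 2's dynamic fan-out maps naturally onto Airflow's dynamic task mapping (Airflow 2.3+). The sketch below is a DAG-file fragment, not a standalone script: it assumes a running Airflow environment (e.g. the official docker-compose setup with a Postgres service), and the task bodies are placeholders.

```python
from datetime import datetime
from airflow.decorators import dag, task

@dag(schedule="0 9 * * *", start_date=datetime(2023, 1, 1), catchup=False)
def covid_etl():
    @task
    def list_counties():
        # Would call the API once and return the distinct county names.
        return ["Albany", "Bronx", "New York"]

    @task
    def load_county(county: str):
        # Would fetch this county's rows and write them to the county's
        # Postgres table (e.g. via PostgresHook).
        ...

    # Dynamic task mapping: one concurrent task instance per county,
    # however many counties the upstream task returns.
    load_county.expand(county=list_counties())

covid_etl()
```

With `.expand()`, the number of `load_county` task instances is decided at run time from the upstream task's output, which is exactly what requirement 2c asks for.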
Implement unit and/or integration tests for your application.
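A minimal unit-test sketch, in pytest style, might target the transform logic. `to_row` is a hypothetical helper, inlined here so the example is self-contained; in the real project you would import it from your ETL module.

```python
def to_row(rec, load_date):
    """Hypothetical transform helper: map one API record to a table row,
    casting string counts to integers."""
    return (rec["test_date"], int(rec["new_positives"]), load_date)

def test_to_row_casts_counts_to_int():
    rec = {"test_date": "2021-06-01", "new_positives": "12"}
    assert to_row(rec, "2021-06-02") == ("2021-06-01", 12, "2021-06-02")

test_to_row_casts_counts_to_int()  # pytest would discover and run this by name
```

Integration tests could additionally spin up the SQLite (or Dockerized Postgres) database and assert that a full ETL run creates one populated table per county.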
5 freelancers are bidding on average $231 for this job