
End-to-End Big Data Project

$30-250 USD

Closed
Posted about 3 years ago


Paid on delivery
Problem Statement: Imagine you are part of a data team that wants to bring in daily data on COVID-19 tests occurring in New York State for analysis. Your team has to design a daily workflow that runs at 9:00 AM and ingests the data into the system.

API: [login to view URL]

Following the ETL process, extract the data for each county in New York State from the above API and load it into individual tables in the database. Each county table should contain the following columns:
❖ Test Date
❖ New Positives
❖ Cumulative Number of Positives
❖ Total Number of Tests Performed
❖ Cumulative Number of Tests Performed
❖ Load Date

Implementation options:
1. Python scripts to run a daily cron job
   a. Utilize an SQLite in-memory database for data storage
   b. Have one main standalone script for the daily cron job that orchestrates all remaining ETL processes
   c. Use a multi-threaded approach to fetch and load data for multiple counties concurrently
2. Airflow to create a daily scheduled DAG
   a. Utilize Docker to run Airflow and the Postgres database locally
   b. There should be one DAG containing all tasks needed to perform the end-to-end ETL process
   c. Dynamically create and execute concurrent tasks in Airflow for each county, based on the number of counties available in the response

Implement unit and/or integration tests for your application.
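The Python-scripts option above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the real API endpoint is hidden behind the "[login to view URL]" link, so `fetch_county_records` below is a hypothetical stand-in that returns canned rows. One design point worth noting: an in-memory SQLite database is only visible to the connection that created it, so this sketch fetches concurrently (the network-bound part) but performs all loads through a single connection on the main thread.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from datetime import date

def fetch_county_records(county):
    """Hypothetical stand-in for the API call. In the real job this
    would issue an HTTP GET against the (masked) endpoint and parse
    the JSON response; here it returns canned rows in the shape
    (test_date, new_positives, cumulative_positives,
     tests_performed, cumulative_tests)."""
    return [("2021-01-21", 10, 100, 50, 500)]

def load_county(conn, county, records):
    """Create the per-county table if needed and append rows,
    stamping each with today's load date."""
    table = county.lower().replace(" ", "_")
    conn.execute(f"""CREATE TABLE IF NOT EXISTS {table} (
        test_date TEXT, new_positives INTEGER,
        cumulative_positives INTEGER, tests_performed INTEGER,
        cumulative_tests INTEGER, load_date TEXT)""")
    conn.executemany(
        f"INSERT INTO {table} VALUES (?, ?, ?, ?, ?, ?)",
        [r + (date.today().isoformat(),) for r in records])

def run_etl(counties):
    """Orchestrator a daily cron job would invoke: fetch all
    counties concurrently, then load them serially into one
    in-memory SQLite connection."""
    conn = sqlite3.connect(":memory:")
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(fetch_county_records, counties))
    for county, records in zip(counties, results):
        load_county(conn, county, records)
    conn.commit()
    return conn
```

A cron entry such as `0 9 * * *` pointed at a script calling `run_etl(...)` would give the 9:00 AM daily schedule; swapping `:memory:` for a file path would persist data between runs.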
Project ID: 29026380

About the project

5 proposals
Remote project
Active 3 years ago

5 freelancers are bidding on average $231 USD for this job
Hi, I am a certified big data developer and have used PySpark in many of my applications. I feel you should use PySpark for multithreaded applications, as Spark distributes the load across different nodes and executors. If you have a Spark environment ready, you should use it; otherwise, it can also be done with the threading mechanism in pure Python. Please let's connect and discuss your requirements further. Thanks, Naresh.
$244 USD in 7 days
5.0 (6 reviews)
4.2
Hi, I am Ashish. I work as a Software Engineer III - Data at Walmart; previously I was with Deutsche Bank. I have three years of total experience in big data, Java Spring, and competitive programming. I am just trying out this platform and can do your project in 7 days. Please contact me in chat if you are interested.
$200 USD in 7 days
0.0 (0 reviews)
0.0
Hello, I am a full-stack Ruby on Rails developer with three years of experience. I am also working on government COVID data analysis with big data, and I have a team with extensive experience in Python, Spark, Hadoop, Hive, and Kafka.
$222 USD in 2 days
0.0 (0 reviews)
0.0
I have worked extensively with Python, ETL, databases, and Airflow in both single-node and distributed environments.
$240 USD in 5 days
0.0 (0 reviews)
0.0

About the client

Los Angeles, United States
Member since Jan 21, 2021

Client Verification

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)