
End-to-End Big Data Project

$30-250 USD

Closed
Posted about 3 years ago


Paid on delivery
Problem Statement: Imagine you are part of a data team that wants to bring in daily data on COVID-19 tests occurring in New York State for analysis. Your team has to design a daily workflow that runs at 9:00 AM and ingests the data into the system.

API: [login to view URL]

Following the ETL process, extract the data for each county in New York State from the above API and load it into individual tables in the database. Each county table should contain the following columns:
❖ Test Date
❖ New Positives
❖ Cumulative Number of Positives
❖ Total Number of Tests Performed
❖ Cumulative Number of Tests Performed
❖ Load Date

Implementation options:
1. Python scripts to run a daily cron job
   a. Utilize an SQLite in-memory database for data storage
   b. Have one main standalone script for the daily cron job that orchestrates all remaining ETL processes
   c. Use a multi-threaded approach to fetch and load data for multiple counties concurrently
2. Airflow to create a daily scheduled DAG
   a. Utilize Docker to run Airflow and the Postgres database locally
   b. There should be one DAG containing all tasks needed to perform the end-to-end ETL process
   c. Dynamically create and execute concurrent tasks in Airflow for each county, based on the number of counties available in the response

Implement unit and/or integration tests for your application.
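The Python-scripts option above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the real API endpoint is hidden behind the "[login to view URL]" link, so `fetch_county_records` below is a hypothetical stand-in that returns canned rows. One design point worth noting: an in-memory SQLite database is only visible to the connection that created it, so this sketch fetches concurrently (the network-bound part) but performs all loads through a single connection on the main thread.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from datetime import date

def fetch_county_records(county):
    """Hypothetical stand-in for the API call. In the real job this
    would issue an HTTP GET against the (masked) endpoint and parse
    the JSON response; here it returns canned rows in the shape
    (test_date, new_positives, cumulative_positives,
     tests_performed, cumulative_tests)."""
    return [("2021-01-21", 10, 100, 50, 500)]

def load_county(conn, county, records):
    """Create the per-county table if needed and append rows,
    stamping each with today's load date."""
    table = county.lower().replace(" ", "_")
    conn.execute(f"""CREATE TABLE IF NOT EXISTS {table} (
        test_date TEXT, new_positives INTEGER,
        cumulative_positives INTEGER, tests_performed INTEGER,
        cumulative_tests INTEGER, load_date TEXT)""")
    conn.executemany(
        f"INSERT INTO {table} VALUES (?, ?, ?, ?, ?, ?)",
        [r + (date.today().isoformat(),) for r in records])

def run_etl(counties):
    """Orchestrator a daily cron job would invoke: fetch all
    counties concurrently, then load them serially into one
    in-memory SQLite connection."""
    conn = sqlite3.connect(":memory:")
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(fetch_county_records, counties))
    for county, records in zip(counties, results):
        load_county(conn, county, records)
    conn.commit()
    return conn
```

A cron entry such as `0 9 * * *` pointed at a script calling `run_etl(...)` would give the 9:00 AM daily schedule; swapping `:memory:` for a file path would persist data between runs.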
Project ID: 29026380

About the project

5 proposals
Remote project
Active 3 years ago

5 freelancers are bidding on average $231 USD for this job
Hi, I am a certified big data developer and have used PySpark in many of my applications. I feel you should use PySpark for multithreaded applications, as Spark distributes the load across different nodes and executors. If you have a Spark environment ready, you should use it; otherwise, it can also be done with the threading mechanism in pure Python. Please let's connect and discuss your requirements further. Thanks, Naresh.
$244 USD in 7 days
5.0 (6 reviews)
4.2
Hi, I am Ashish. I work as a Software Engineer III - Data at Walmart; previously I was with Deutsche Bank. I have three years of total experience in big data, Java Spring, and competitive programming. I am just trying out this platform and can do your project in 7 days. Please contact me in chat if you are interested.
$200 USD in 7 days
0.0 (0 reviews)
0.0
Hello, I am a full-stack Ruby on Rails developer with three years of experience. I am also working on government COVID data analysis with big data, and I have a team with extensive experience in Python, Spark, Hadoop, Hive, and Kafka.
$222 USD in 2 days
0.0 (0 reviews)
0.0
I have worked extensively with Python, ETL, databases, and Airflow in both single-node and distributed environments.
$240 USD in 5 days
0.0 (0 reviews)
0.0

About the client

Los Angeles, United States
Member since Jan 21, 2021

Client Verification

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)