Big data management

$55-60 AUD

Closed
Posted more than 2 years ago

Paid on delivery

Task: Data Warehouse Design and Implementation in Apache Hive

The objective of this task is to design and implement a sample data warehouse in Apache Hive, as described in the following narrative.

A university plans to create a data warehouse to store information about the submissions of student assignments and later to analyse the contents of the data warehouse. The planned data warehouse is expected to contain historical information collected over a long period of time. It will contain information about assignment submissions (abbreviated as “submissions” hereafter), assignments, subjects, students, and degrees.

The following relationships exist between these domain entities. Each submission belongs to one assignment and is submitted by one or more students (for individual or group submissions). Each student is enrolled in one degree. Each assignment belongs to one subject.

A submission is described by a mark, a submission date, and a file path (which refers to a location on HDFS). An assignment is described by a weight (percentage), a due date, and a specification file path. A subject is described by a subject code and a subject name. A student is described by a student number, first name, last name, and email address; the student number and the email address each uniquely identify a student. The time dimension contains four levels: day, week, session (Autumn or Spring), and year.

This data warehouse should support OLAP queries, including the common aggregations of submissions per subject, per student, per degree, per day, per week, per session, or per year. You can make reasonable assumptions about the keys of the domain entities.

Complete the following parts:

Part 1. Develop a conceptual model for the above data warehouse. The dimensions and hierarchies must be correctly presented.

Part 2. Specify the OLAP operations for the following queries using relational algebraic notation (as in the slides of Lectures 4 and 5):
(i) “Find the average slack period (i.e., the number of days between the submission date and the due date) for submissions per subject and per session.”
(ii) “Find the average mark for each assignment for the subject ‘ISIT312-912’ in 2017.” (Hint: use the “DICE” operation on page 46 of the Lecture 4 slides.)

Part 3. Transform your conceptual model from Part 1 into a logical model with a star schema. Note that all level tables in a star schema are flattened, i.e., denormalized.

Part 4. Create an (internal or external) Hive table (schema) for each table in your logical model from Part 3.

Part 5. Populate the Hive tables from Part 4 with some sample data. More specifically, for each table, create a file in the local file system containing a few (e.g., three) sample records of your own choosing, and then load those files into Hive. Once done, use HQL to show all data.

Part 6. Implement the OLAP operations from Part 2 as HQL statements on the Hive tables from Part 5.
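The two Part 2 queries can be prototyped against a small star schema before moving to Hive. Below is a minimal sketch, assuming SQLite as a stand-in for Hive, hypothetical table and column names (`dim_date`, `dim_assignment`, `fact_submission`), and made-up sample rows. Further assumptions: subject is treated as a level above assignment and flattened into the assignment dimension; the student and degree dimensions are omitted for brevity; Autumn covers January to June; and slack is the due date minus the submission date, so early submissions have positive slack.

```python
import sqlite3
from datetime import date

# Hypothetical star schema, with SQLite standing in for Hive: one fact table
# plus flattened (denormalized) dimension tables.
SCHEMA = """
CREATE TABLE dim_date (
    date_key TEXT PRIMARY KEY,  -- ISO submission date, e.g. '2017-04-10'
    week     INTEGER,           -- ISO week number
    session  TEXT,              -- 'Autumn' or 'Spring'
    year     INTEGER
);
CREATE TABLE dim_assignment (
    assignment_key INTEGER PRIMARY KEY,
    subject_code   TEXT,        -- subject level flattened into this table
    subject_name   TEXT,
    weight         REAL,        -- percentage
    due_date       TEXT,
    spec_path      TEXT
);
CREATE TABLE fact_submission (
    submission_id  INTEGER PRIMARY KEY,
    assignment_key INTEGER REFERENCES dim_assignment (assignment_key),
    date_key       TEXT REFERENCES dim_date (date_key),
    mark           REAL,
    file_path      TEXT         -- location on HDFS
);
"""


def session_of(d: date) -> str:
    # Assumption: Autumn covers January to June, Spring July to December.
    return "Autumn" if d.month <= 6 else "Spring"


def build_sample_warehouse() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Made-up sample rows for illustration only.
    assignments = [
        (1, "ISIT312-912", "Big Data Management", 20.0, "2017-04-14", "/specs/a1.pdf"),
        (2, "ISIT312-912", "Big Data Management", 30.0, "2017-09-08", "/specs/a2.pdf"),
        (3, "SUBJ-B", "Sample Subject B", 25.0, "2018-04-13", "/specs/b1.pdf"),
        (4, "ISIT312-912", "Big Data Management", 20.0, "2018-04-13", "/specs/a3.pdf"),
    ]
    submissions = [
        (1, 1, "2017-04-10", 80.0, "/hdfs/submissions/1.zip"),
        (2, 1, "2017-04-13", 70.0, "/hdfs/submissions/2.zip"),
        (3, 2, "2017-09-04", 90.0, "/hdfs/submissions/3.zip"),
        (4, 3, "2018-04-09", 60.0, "/hdfs/submissions/4.zip"),
        (5, 4, "2018-04-12", 50.0, "/hdfs/submissions/5.zip"),
    ]
    conn.executemany("INSERT INTO dim_assignment VALUES (?, ?, ?, ?, ?, ?)", assignments)
    # Derive the day -> week -> session -> year hierarchy from each submission date.
    for key in sorted({row[2] for row in submissions}):
        d = date.fromisoformat(key)
        conn.execute(
            "INSERT INTO dim_date VALUES (?, ?, ?, ?)",
            (key, d.isocalendar()[1], session_of(d), d.year),
        )
    conn.executemany("INSERT INTO fact_submission VALUES (?, ?, ?, ?, ?)", submissions)
    return conn


# Query (i): average slack (due date minus submission date, in days) per subject
# and per session. Year is grouped too, so Autumn 2017 and Autumn 2018 stay apart.
AVG_SLACK = """
SELECT a.subject_code, d.year, d.session,
       AVG(julianday(a.due_date) - julianday(d.date_key)) AS avg_slack
FROM fact_submission f
JOIN dim_assignment a ON f.assignment_key = a.assignment_key
JOIN dim_date d ON f.date_key = d.date_key
GROUP BY a.subject_code, d.year, d.session
ORDER BY a.subject_code, d.year, d.session
"""

# Query (ii): dice on subject code and year, then average mark per assignment.
AVG_MARK = """
SELECT a.assignment_key, AVG(f.mark) AS avg_mark
FROM fact_submission f
JOIN dim_assignment a ON f.assignment_key = a.assignment_key
JOIN dim_date d ON f.date_key = d.date_key
WHERE a.subject_code = ? AND d.year = ?
GROUP BY a.assignment_key
ORDER BY a.assignment_key
"""

if __name__ == "__main__":
    conn = build_sample_warehouse()
    for row in conn.execute(AVG_SLACK):
        print(row)
    for row in conn.execute(AVG_MARK, ("ISIT312-912", 2017)):
        print(row)
```

The same joins carry over to HQL on the Hive tables. On the Hive side, the slack expression would typically use `datediff(due_date, submission_date)`, since Hive's `datediff(enddate, startdate)` returns the number of days from the start date to the end date. Grouping by session without year would merge the same session across different years, which is why the sketch groups by both.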
Project ID: 31602148

About the project

3 proposals
Remote project
Active 2 years ago

3 freelancers are bidding on average $238 AUD for this job
Hi, I've read the description of your posted job titled "Big data management". I have been doing this kind of work for the last 7 years, handling SQL/MySQL/PostgreSQL/MS Access/MariaDB tasks of varying complexity, a few of which are listed below:
1. Database creation from a scenario
2. Creating an ERD from documentation
3. Extracting data from a database through queries
4. Writing stored procedures and views
5. Indexing and monitoring query performance
6. DB backup and restoration
7. Data analysis and attractive visualization dashboards (Tableau and Power BI)
8. Custom visualizations (I can show screenshots of my work in chat)
9. Customizing dashboards using different apps as a data source (Acumatica/healthcare apps)
10. SQL programming
Thanks,
Sufyan Jamil
$600 AUD in 7 days
5.0 (1 review)
I'm a data scientist and big data expert working with Hadoop, Scala, Spark, Hive, Pig, HBase, PySpark, and Python. I can easily do your work and deliver it ahead of schedule.
$55 AUD in 2 days
0.0 (0 reviews)
I have 6 years of IT experience with Python, PySpark, Spark, SQL, Hive, ETL, Databricks, data analysis, and data engineering, as well as exposure to the Azure and AWS cloud platforms. I am a Microsoft Certified Azure Data Engineer (DP-200 and DP-201) and am also Oracle SQL and Java certified. Though I am quite new to this platform, I can assure you that I have rich experience building robust and scalable data pipelines using PySpark. We can have a 20-minute session to understand your needs. Let's connect to discuss this further.
$58 AUD in 7 days
0.0 (0 reviews)

About the client

RANDWICK, Australia
5.0
20
Payment method verified
Member since Sep 19, 2019

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)