Closed

Counting duplicates in big files

I have about 14 million rows of data with 6 columns in CSV format.

I created a working solution in Power BI that does the job within 30 minutes, but the program limits the number of rows that can be exported for further processing and can only run 2 files (and is sometimes buggy), whereas I need to run 6 files a day.

Target:

- A program, any data manipulation software, or SQL code that returns the count of rows that have similar content to the current row, from 1 matching entry only up to all 6 columns/entries.

- The position of the column is not important in the check. For example, for a count of 5 similar entries, the following 2 rows (representative entries, not actual data) will each have a result of 1 because they share 2,3,4,5,6:

1,2,3,4,5,6 - 1

2,3,4,5,6,7 - 1

- It should be able to return the result fast: no more than 30 minutes (can be discussed), or a maximum of 4 hours for 6 files.
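One way to read the target above, as a minimal Python sketch rather than a definitive solution: treat every row as a set of values so that column position is ignored, count how often each subset of values occurs across all rows, and derive from that how many other rows share exactly k values with each row. The function name shared_value_counts, the "exactly k shared values" interpretation, and the assumption that values within a row are distinct are illustrative choices, not part of the original posting. The sketch keeps everything in memory, so it only shows the logic; at 14 million rows, with up to 63 subsets per row, the subset counts would probably have to live in a database or an on-disk store.

```python
from collections import Counter
from itertools import combinations
from math import comb


def shared_value_counts(rows):
    """For each row, return {k: number of OTHER rows sharing exactly k values}.

    Assumptions (not from the posting): values inside one row are distinct,
    and all values are mutually comparable (e.g. all strings or all ints).
    """
    # A set per row makes the comparison independent of column position.
    row_sets = [frozenset(row) for row in rows]

    # Count how many rows contain each subset of values (sizes 1..row length).
    subset_counts = Counter()
    for values in row_sets:
        ordered = sorted(values)
        for k in range(1, len(ordered) + 1):
            subset_counts.update(combinations(ordered, k))

    results = []
    for values in row_sets:
        ordered = sorted(values)
        n = len(ordered)
        # s[k] = sum over other rows of C(number of shared values, k).
        s = [0] * (n + 1)
        for k in range(1, n + 1):
            s[k] = sum(subset_counts[c] - 1 for c in combinations(ordered, k))
        # Binomial inversion converts these sums into exact-match counts.
        exact = {
            k: sum((-1) ** (j - k) * comb(j, k) * s[j] for j in range(k, n + 1))
            for k in range(1, n + 1)
        }
        results.append(exact)
    return results


if __name__ == "__main__":
    sample = [(1, 2, 3, 4, 5, 6), (2, 3, 4, 5, 6, 7)]
    for row, counts in zip(sample, shared_value_counts(sample)):
        print(row, counts)
    # Each row prints {1: 0, 2: 0, 3: 0, 4: 0, 5: 1, 6: 0}
```

For a real file, the rows could come from Python's csv.reader, with the same subset-counting idea moved into SQL (GROUP BY on the sorted subset key) if memory becomes the bottleneck.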

Note: Unfortunately, I cannot release a milestone payment for a program/solution that cannot meet the processing time requirement.

Skills: Power BI, Python, SQL, MySQL

About the Employer:
(1 review) Singapore, Singapore

Project ID: #20307066

26 freelancers are bidding an average of $47 for this job

tausy

Hi, I'm a data engineer with over 5 years of industry experience on a wide array of tech stacks including databases, data warehouses, machine learning, and big data/Hadoop. I'm currently pursuing my Master's in Data Scien…

$50 USD in 3 days
(45 reviews)
5.2
AliSafder

Hi, my name is Ali and I can work on the task with immediate availability. I can do the duplication check in SQL Server. Let's have a quick discussion so I can work on it.

$30 USD in 3 days
(25 reviews)
4.4
ranahashim

Hi. I can make a program that can solve your problem. I have enough experience to tackle the problem. Message me to discuss.

$25 USD in 7 days
(13 reviews)
4.0
jayantavkumar

I can upload the file into a SQL DB using SSIS ETL, removing duplicate records with efficient performance. There will not be any restriction on the number of files. You can load N files in one go. Let me kno…

$30 USD in 1 day
(10 reviews)
3.9
jiteshparwal93

Hey, I have understood your requirement and can deliver a SQL script that will compute results within a maximum of 10 minutes. You can message me to get the query and check whether it gives you the result within that time, and then you can a…

$35 USD in 1 day
(2 reviews)
3.3
juttj110

Okay, the program will process within your given time. But we need to discuss the job further over chat. Thanks

$30 USD in 2 days
(6 reviews)
2.4
AlexFaster

Hi. I can write this program in a native language (not C# or Python) and it will calculate very fast. See my reviews and completion rate on this site. Regards, Alex.

$250 USD in 3 days
(1 review)
2.4
l0ginp

Hi, I can manipulate your CSV file with Python in 1 day. Please send me a message so that we can discuss it further. To make sure that the engagement will truly serve your requirement, you can evaluate my skill by giving pa…

$30 USD in 1 day
(3 reviews)
1.9
Sendmefreelancer

Did you manage to decide on a freelancer? I have the code ready and I will test it with the 14 million rows of data if you can get me a sample CSV. It's written in Python and is fairly looks for a…

$10 USD in 2 days
(1 review)
1.7
sd21TheDeath

I see what you want; however, it's not completely clear. So I might want to ask a few things first if we decide to work on it. It won't take more than 2 days to complete such a program, so 7 days which I am proposing is…

$25 USD in 7 days
(1 review)
1.0
aap31374

I am Anil and I have more than 10 years of experience in SQL Server database development and administration. I have worked with big databases for clients like Match.com, Nationstar Mortgage and with TCS. I am also good f…

$35 USD in 1 day
(1 review)
0.6
ThinkStartPL

Hi there! I am a developer with 4+ years of experience in Python, Django, RoR & ReactJS. Please open the chat box for further discussion. Regards,

$25 USD in 10 days
(1 review)
0.4
PageOllice

Hello, thanks for posting this job and giving us the opportunity to apply for it. I have read the project description and can assure you that I can handle this job. Please reply to get into more details over the chat board…

$20 USD in 7 days
(0 reviews)
0.0
olrya

Dear sir, I have read your project details carefully. I am a full stack web developer and can do your project well. My aim is the best service to the customer: credibility first, with a high quality result. I have 5+ years expe…

$30 USD in 3 days
(1 review)
0.1
natuzaid

Hi, I am an expert in Java and Python and I can complete this job within a day. I have read your requirements and look forward to working with you. Let's continue this in the freelance chat.

$40 USD in 1 day
(0 reviews)
0.0
krushnadebashram

I can upload the file into the free version of SQL Express DB using Openquery/Openrowset. The CSV file is dumped in a location on the system where SQL Express is installed. Then a T-SQL script is used to get the desired result. The whole pro…

$70 USD in 2 days
(0 reviews)
0.0
Lukacho

Hi! I can make an application for you in C#. It will be as fast as possible and process the files in minimum time. I can do that in 1-2 hours. Write to me to discuss details. Thanks!

$30 USD in 1 day
(0 reviews)
0.0
IshaqKN

Hi, I understand your problem well: processing a large amount of CSV data in an efficient and speedy way. Python is your tool for this task. This is the type of problem (data processing) Python solves the be…

$20 USD in 2 days
(0 reviews)
0.0
pancholivinay

I am a software engineer and ready to do this work, as I already have experience in DB queries and optimization. Relevant skills and experience: 6 years of experience in Microsoft technologies, Azure Cloud, ASP.NET, c…

$19 USD in 3 days
(0 reviews)
0.0
aritramukh09

Hi, since you are looking to filter out duplicates from a huge number of rows, the easiest way would be via a relational database. I would very much like to work on this project. I have over 4 years of database developme…

$25 USD in 3 days
(0 reviews)
0.0