I need help grepping files, diffing them & extracting data. Ubuntu Linux.
I am running:
grep '^[a-zA-Z0-9-]\+ AA .*'[url removed, login to view]|sed 's/AA .*//'|uniq >[url removed, login to view]
I end up with a clean file called file2.txt.
I have to download this file daily & grep it. So tomorrow I would run the above grep command again, & I then need to diff today's & yesterday's file2.txt. I would like an output of every difference between the two files.
Difference #1
Make a new text file of domains from yesterday's text file that are missing from today's list.
Difference #2
Make a new text file of new domains that appear in today's list but weren't in yesterday's list.
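To show the kind of output I mean for the two differences, here is a rough sketch using sort & comm. It assumes Difference #1 means domains that were in yesterday's list but are gone today. The function name diff_domains & the output names removed.txt & added.txt are placeholders I made up, not a requirement.

```shell
# Hypothetical helper: compare two cleaned domain lists.
# $1 = yesterday's list, $2 = today's list (names are placeholders).
diff_domains() {
    yesterday=$1
    today=$2
    # comm needs both inputs sorted in the same collation order,
    # so sort (and de-duplicate) each list under the C locale first.
    LC_ALL=C sort -u "$yesterday" > yesterday.sorted
    LC_ALL=C sort -u "$today" > today.sorted
    # -23 keeps lines found only in the first file: domains gone today.
    LC_ALL=C comm -23 yesterday.sorted today.sorted > removed.txt
    # -13 keeps lines found only in the second file: domains new today.
    LC_ALL=C comm -13 yesterday.sorted today.sorted > added.txt
}
```

Usage would be something like: diff_domains yesterday_file2.txt file2.txt. The sorting matters because uniq alone only drops adjacent duplicates & comm gives wrong results on unsorted input.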
For each difference found, I need to go back to the original file & extract the data to the right of the delimiter AA.
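A possible way to do this extraction in one pass over the big file is sketched below with awk. It assumes each line looks like "name AA data", as in my grep pattern, & that the list of changed domains is small enough to hold in memory. The function name extract_records & the tab-separated output layout are my own placeholders.

```shell
# Hypothetical helper: print "domain<TAB>data right of AA" for each
# domain listed in $1, reading the original (large) file $2 once.
extract_records() {
    domains=$1
    original=$2
    LC_ALL=C awk '
        # First file: remember each wanted domain, ignoring trailing
        # blanks such as the space the sed step leaves behind.
        FILENAME == ARGV[1] {
            sub(/[ \t]+$/, "")
            if ($0 != "") want[$0] = 1
            next
        }
        # Second file: split each line at the first " AA " delimiter.
        {
            pos = index($0, " AA ")
            if (pos == 0) next
            name = substr($0, 1, pos - 1)
            if (name in want) print name "\t" substr($0, pos + 4)
        }
    ' "$domains" "$original"
}
```

For example: extract_records added.txt original_file > added_records.tsv, where both file names are placeholders. Only the changed domains are kept in memory, so the size of the original file should not matter much.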
I then need to populate a MySQL database with the daily differences & provide a search feature.
Here are the 3 fields I will need to search - [url removed, login to view]
I have written PHP scripts that use regexes to import the data into MySQL, but they are too slow. The files vary in size from 500MB to 1GB, & 1 file is 6GB. grep, however, handles my command above on these massive files without trouble.
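Since only the daily differences need to go into the database, one option might be to bulk load a small tab-separated file with MySQL's LOAD DATA LOCAL INFILE instead of inserting row by row from PHP. The sketch below only generates the SQL statement. The table name domain_changes & the columns domain, record_data, change_type & seen_on are assumptions for illustration, not my real schema.

```shell
# Hypothetical helper: print a LOAD DATA statement for a TSV file.
# $1 = tab-separated file (domain<TAB>data), $2 = label such as "added".
# Table and column names below are placeholders, not a real schema.
emit_load_sql() {
    tsv=$1
    change=$2
    cat <<EOF
LOAD DATA LOCAL INFILE '$tsv'
INTO TABLE domain_changes
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
(domain, record_data)
SET change_type = '$change', seen_on = CURDATE();
EOF
}
```

The output could then be piped into the mysql client, for example: emit_load_sql added_records.tsv added | mysql --local-infile=1 mydb, where mydb is a placeholder database name & the server must allow local infile loading.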