A server script (LAMP) with a user interface that walks through a customisable list of Twitter accounts, saves tweets to a database, extracts links from the tweets, and stores their contents.
1. querying the Twitter accounts for posts that are not yet stored in the database (with a customisable pause between requests):
- storing tweet info in the database: origin, timestamp, tweet body, and possibly some other properties;
- extracting URLs from tweets and storing them in a separate table (with duplicate checking);
- links from certain domains go to separate lists (for file-sharing services);
- if a link goes to pa[login to view URL] or pa[login to view URL], saving the paste contents to the database along with info about the link, and searching for further links within the paste text;
- automatically "ticking" links whose contents have been downloaded, plus an option to do this ticking manually on the list of links;
- if a link's contents exceed some defined size, there is no need to save them; just put such links on a separate list or mark them somehow;
- some links may be "shortened" (t.co, bit.ly, etc.); the final URL must be stored, not the shortened one.
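The URL handling described above (extraction, duplicate checking, and resolving shorteners such as t.co or bit.ly to the final URL) might be sketched roughly as follows; `extract_urls` and `resolve_short_url` are hypothetical helper names, not part of the spec, and the real project would do the duplicate check against the database table rather than in memory:

```python
import re
from urllib.request import urlopen

# Naive URL pattern; a production version would handle trailing
# punctuation and use the URL entities the Twitter API already provides.
URL_RE = re.compile(r"https?://[^\s]+")

def extract_urls(tweet_body):
    """Return the URLs found in a tweet body, in order, without duplicates."""
    seen = []
    for url in URL_RE.findall(tweet_body):
        if url not in seen:  # dupe check; in the real script, a UNIQUE index
            seen.append(url)
    return seen

def resolve_short_url(url):
    """Follow HTTP redirects (t.co, bit.ly, etc.) and return the final URL.

    Requires network access: shorteners answer with a 301/302 redirect,
    and urlopen follows the chain, so geturl() yields the final address.
    """
    with urlopen(url) as resp:
        return resp.geturl()
```

Storing the result of `resolve_short_url` instead of the shortened form is what makes the duplicate check meaningful, since many different short links can point at the same target.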
2. an option to choose the tweet-parsing "depth" (all, last n days, date from * to *) separately for every account.
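The per-account "depth" setting could be normalised into a simple date window before querying; this is a minimal sketch under assumed mode names ("all", "last_n_days", "range"), none of which come from the spec:

```python
from datetime import date, timedelta

def parse_depth(mode, today, n_days=None, start=None, end=None):
    """Translate a depth setting into a (since, until) date pair.

    None on the 'since' side means unbounded, i.e. fetch everything.
    """
    if mode == "all":
        return (None, today)
    if mode == "last_n_days":
        return (today - timedelta(days=n_days), today)
    if mode == "range":
        return (start, end)
    raise ValueError("unknown depth mode: %r" % mode)
```

The parser would then skip any tweet whose timestamp falls outside the returned window, which keeps the depth logic in one place per account.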
3. some status/statistics:
- whether any jobs are currently running;
- an option to launch/stop the parsing and to change/adjust the definable parameters;
- a list of accounts with a total/total saved, last modification/last saved, or something similar;
- a view listing the saved tweets for an account and showing their contents;
- a view listing the found URLs/pastes with info on them (where each comes from, date, etc.), with an option to sort by fields;
- an option in those views to tick what is needed (individual items, all on the page, everything) and download it with a browser (possibly with some gzip archiving beforehand).
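The optional gzip step before the browser download could be as simple as the sketch below; `bundle_for_download` is a hypothetical helper, and the real UI would stream the result with appropriate Content-Type and Content-Disposition headers:

```python
import gzip

def bundle_for_download(items):
    """Concatenate the ticked items and gzip-compress them for download.

    items: list of strings (tweet bodies, paste contents, or URLs).
    Returns the compressed bytes ready to be served as a .gz attachment.
    """
    payload = "\n".join(items).encode("utf-8")
    return gzip.compress(payload)
```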