We need to develop an HTTP robot in two levels:
The robot will index several web sources (blogs, newspapers...) in order to get content from deep pages (posts, news items...). The final content pages will be selected based on URI patterns.
The robot has to extract the content from those pages using a separate regular expression for each piece of data (title, content, category...).
The content must be saved in a database.
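To make the robot's three steps concrete (select pages by URI pattern, extract fields with regular expressions, save the result), here is a minimal sketch in Python. It is only an illustration, not a required design: the URI pattern, the HTML markup, the table layout and all function names are assumptions, and sqlite3 is used purely as a stand-in for MySQL so that the example runs on its own.

```python
import re
import sqlite3

# Hypothetical URI pattern for the final content pages of one source.
POST_URI = re.compile(r"^https?://blog\.example\.com/\d{4}/\d{2}/[\w-]+/?$")

# Hypothetical per-field expressions; real ones depend on each source's markup.
FIELD_PATTERNS = {
    "title": re.compile(r"<h1[^>]*>(.*?)</h1>", re.S | re.I),
    "content": re.compile(r'<div class="post-body">(.*?)</div>', re.S | re.I),
    "category": re.compile(r'<a rel="category"[^>]*>(.*?)</a>', re.S | re.I),
}


def is_content_page(uri):
    """Return True when the URI matches the pattern for final content pages."""
    return POST_URI.match(uri) is not None


def extract_fields(html):
    """Apply one regular expression per field; a missing field becomes None."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(html)
        fields[name] = match.group(1).strip() if match else None
    return fields


def open_store():
    """Open an in-memory SQLite database (stand-in for MySQL) with one table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items ("
        "uri TEXT PRIMARY KEY, title TEXT, content TEXT, category TEXT)"
    )
    return conn


def save_item(conn, uri, fields):
    """Insert one extracted page into the items table."""
    conn.execute(
        "INSERT INTO items (uri, title, content, category) VALUES (?, ?, ?, ?)",
        (uri, fields["title"], fields["content"], fields["category"]),
    )
    conn.commit()
```

A crawler loop would call `is_content_page` on every discovered link, download only the matching pages, and pass their HTML through `extract_fields` and `save_item`.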
A web administration interface with access restricted by user name and password.
Some of its functions: create web sources, create categories, create different IndexerProjects (one source can have more than one IndexerProject, e.g. one per category).
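The relationship between sources, categories and IndexerProjects could be modelled roughly as below. This is a hedged sketch only: the class names, fields and the `add_project` helper are hypothetical, chosen to show one source holding several IndexerProjects, each tied to one category and carrying its own URI pattern and field expressions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Category:
    name: str


@dataclass
class IndexerProject:
    # One project per (source, category) pair in this sketch.
    name: str
    category: Category
    uri_pattern: str  # regular expression selecting final content pages
    field_patterns: Dict[str, str] = field(default_factory=dict)


@dataclass
class WebSource:
    name: str
    base_url: str
    projects: List[IndexerProject] = field(default_factory=list)

    def add_project(self, project: IndexerProject) -> IndexerProject:
        """Attach another IndexerProject to this source."""
        self.projects.append(project)
        return project


# Example: one newspaper source with one project per category.
paper = WebSource("Example Daily", "http://news.example.com")
paper.add_project(
    IndexerProject("daily-sports", Category("Sports"), r"/sports/\d+\.html$")
)
paper.add_project(
    IndexerProject("daily-politics", Category("Politics"), r"/politics/\d+\.html$")
)
```

In the real system these objects would map to MySQL tables managed from the administration interface.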
The system must run on Linux with MySQL and Apache. We will provide more detailed information once we choose the developer.
You must give us some information about how you are going to do the development: technologies, programming language, modules...