Scraping of Topsy, Google, Zacks (only today's EPS/SALES)

In Progress Posted May 15, 2012 Paid on delivery

Dear Sir or Madam,

I would like to be able to scrape certain values from the web page [url removed, login to view] on demand via the Windows command-line prompt. The scraped values should then be stored in a CSV file.

I should be able to pass different input parameters to this script so that I can control the scraping.
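As a minimal sketch of the command-line interface described above (in Python for illustration, since the posting only tags Perl as a skill; the flag names `--input` and `--output` are assumptions, not part of the posting):

```python
import argparse

def parse_args(argv=None):
    """Command-line interface sketch. The flag names are assumed,
    not specified in the posting: an input TXT file controls the
    scraping, and results go to a CSV file."""
    parser = argparse.ArgumentParser(
        description="Scrape values from a web page on demand, store to CSV")
    parser.add_argument("--input", required=True,
                        help="input TXT file with scraping parameters")
    parser.add_argument("--output", default="results.csv",
                        help="output CSV file (results are appended)")
    return parser.parse_args(argv)
```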

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

A.) TOPSY SCRAPING

1. Scraping mode (an input TXT file is probably best)

The different input parameters are (see [url removed, login to view] for the attributes on the page):

- Past 1 Hour

- Past 1 Day

- Past 30 Days

- All Time

(Search)

- Everything

- Links

- Tweets

- Experts

(Network)

- Google Plus

- Twitter

(Language)

- All Languages

- Different languages

The attributes are passed in the HTTP request to [url removed, login to view]

e.g. [url removed, login to view] corresponds to the attribute "Past 7 Days"

The attributes to be scraped are:

- Number of hits

- 10 result details

These attributes are written to a CSV file together with the current date.

If there are already entries, the results are appended.
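The append-to-CSV behaviour described above can be sketched as follows (Python for illustration; the column names are assumptions, since the posting does not specify a CSV layout):

```python
import csv
import datetime
import os

def append_results(csv_path, number_of_hits, result_details):
    """Append one scraping run (hit count plus up to 10 result details)
    to the CSV file, stamped with the current date. Creates the file
    with an assumed header row if it does not exist yet; otherwise
    the new rows are simply appended."""
    file_exists = os.path.exists(csv_path)
    with open(csv_path, "a", newline="") as f:
        writer = csv.writer(f)
        if not file_exists:
            writer.writerow(["date", "number_of_hits", "result_detail"])
        today = datetime.date.today().isoformat()
        for detail in result_details:
            writer.writerow([today, number_of_hits, detail])
```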

2. Scraping mode

@Input (an input TXT file is probably best)

- Be able to use all the different attributes for scraping on this page

[url removed, login to view]

using these different interval lengths:

scrapingPart

- 24 hour

- 12 hour

- 6 hour

- 2 hour

- 1 hour

Time Period: how far back in time the scraping should go, and up to which date.

@Result

Retrieve these output results

The attribute to be scraped is:

- Number of hits

For every scrapingPart (24 hours, 12 hours, ...) a scraping run is performed, and the @Result is saved as a record in the CSV file.

For example, if the selected Time Period is [url removed, login to view] (start) - [url removed, login to view]

and the number of days is 365, then 365 scraping runs are performed with the specified keyword(s),

and the output is written to a CSV file.
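The splitting of a Time Period into per-interval scraping runs (365 daily runs for a one-year period with the 24-hour scrapingPart, as in the example above) can be sketched like this:

```python
import datetime

def scraping_intervals(start, end, part_hours):
    """Split the time period [start, end) into consecutive intervals of
    part_hours each -- one scraping run per interval. A one-year period
    with part_hours=24 yields 365 intervals, matching the example."""
    step = datetime.timedelta(hours=part_hours)
    intervals = []
    cursor = start
    while cursor < end:
        intervals.append((cursor, min(cursor + step, end)))
        cursor += step
    return intervals
```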

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

B.) Scraping of [url removed, login to view] / [url removed, login to view]

We would like the same scraping for [url removed, login to view]

or [url removed, login to view],

where the user can specify the last hour or the last 24 hours

and use keywords and scraping input like this:

test site:.de

for scraping on a specific domain.

The attributes to be scraped are:

- Number of hits

- 10 result details

These attributes are written to a CSV file together with the current date.

If there are already entries, the results are appended.
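Building the search query with the `site:` restriction and a time filter could look like the sketch below. The time-filter values follow Google's `tbs=qdr:` convention (`h` = past hour, `d` = past 24 hours); the base URL is a stand-in, since the real URLs were removed from the posting:

```python
import urllib.parse

def build_search_url(keywords, domain=None, time_filter=None):
    """Build a search URL with an optional site: restriction
    (e.g. domain='.de') and an optional time filter ('h' for the last
    hour, 'd' for the last 24 hours, as in Google's tbs=qdr: parameter).
    The base URL is a placeholder for the removed URL in the posting."""
    query = keywords
    if domain:
        query += f" site:{domain}"
    params = {"q": query}
    if time_filter:
        params["tbs"] = f"qdr:{time_filter}"
    return "https://www.google.com/search?" + urllib.parse.urlencode(params)
```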

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

C.) Scraping Zacks Earnings

The page to be scraped is [url removed, login to view]

The tables

TODAY'S EPS SURPRISES

- Positive Surprises

- Negative Surprises

should be scraped with their column values and merged into one CSV file, with the current date as an additional column.

TODAY'S SALES SURPRISES

- Positive Surprises

- Negative Surprises

should be scraped with their column values and merged into one CSV file, with the current date as an additional column.

There are then two different output CSV files: [url removed, login to view] and SALES_surprises.csv.

It must be possible to run this script from the command line. Every result will be appended. If the same record already exists (same Company & Time), no action should be performed.

I should be able to run this script to create new records every day.
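The duplicate check on (Company, Time) before appending can be sketched as below (Python for illustration; the column names `company` and `time` are assumptions, since the posting does not name the CSV columns):

```python
import csv
import os

def append_if_new(csv_path, record):
    """Append a record (a dict including 'company' and 'time' keys)
    only if no row with the same Company & Time already exists.
    Returns True if the record was written, False if it was a
    duplicate and no action was performed."""
    existing = set()
    if os.path.exists(csv_path):
        with open(csv_path, newline="") as f:
            for row in csv.DictReader(f):
                existing.add((row["company"], row["time"]))
    if (record["company"], record["time"]) in existing:
        return False  # same Company & Time already present: no action
    write_header = not os.path.exists(csv_path)
    with open(csv_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(record.keys()))
        if write_header:
            writer.writeheader()
        writer.writerow(record)
    return True
```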

I will be available on Skype every day for project support.

The code will belong to the project requester and may not be distributed to third parties.

I am looking forward to fast, quality coding.

Regards,

Thomas

Perl Script Install Software Architecture SQL

Project ID: #1635439

About the project

4 proposals Remote project Active May 17, 2012