I need a program that quickly grabs 100,000+ products from Amazon using the Amazon Web Services (AWS) API.
This information will then be stored in a MySQL database. Ideally, I would like you to develop this program using PHP and Curl, but I am open to other alternatives.
Here is a list of useful URLs:
AWS API: <http://www.amazon.com/gp/browse.html/ref=sws_aws_/002-8182866-8964820?node=3435361>
PHP Curl Functions: <[login to view URL]>
## Deliverables
This program should import products from [login to view URL] and store the product information in a MySQL database. The program should import product information as quickly as possible, with a minimum of 1000 products imported per hour. The program should also avoid duplicating product information in the database, even if run multiple times.
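One way to meet the no-duplicates requirement is to let the database enforce it: put a unique key on an identifier taken from the source and write every import as an upsert, so re-running the program refreshes existing rows instead of adding copies. The following is only a minimal sketch under stated assumptions. It is written in Python (the posting allows alternatives to PHP) and uses the standard-library `sqlite3` module as a stand-in so it runs without a MySQL server. The `source_id` column and the `open_db` and `import_products` functions are hypothetical names. In MySQL the equivalent statement would be `INSERT ... ON DUPLICATE KEY UPDATE`.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the database and create a minimal products table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS products (
               product_id INTEGER PRIMARY KEY AUTOINCREMENT,
               source_id  TEXT NOT NULL UNIQUE,  -- hypothetical external identifier
               name       TEXT NOT NULL,
               price      REAL
           )"""
    )
    return conn


def import_products(conn, rows):
    """Insert new rows, refresh existing ones, and return the table's row count."""
    with conn:  # one transaction per batch keeps bulk imports fast
        conn.executemany(
            """INSERT INTO products (source_id, name, price)
               VALUES (:source_id, :name, :price)
               ON CONFLICT(source_id) DO UPDATE SET
                   name = excluded.name,
                   price = excluded.price""",
            rows,
        )
    return conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
```

For scale, the stated floor of 1000 products per hour works out to one product every 3.6 seconds, and at exactly that rate 100,000 products would take 100 hours, so batching several rows per transaction as above is worth considering.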
The program should probably generate this information using the [login to view URL] browse node information at [login to view URL], although I am not set on this if you have a better solution. I will not be providing a list of ISBN numbers or UPC codes to start from. I will provide a list of popular keyword searches to use, but your database should not be based entirely on this list. Using [login to view URL]'s browse nodes will enable you to grab products from [login to view URL]'s "New Releases" and "Top Sellers" categories. Your program should make sure to get both Amazon's most popular products and products from our list of popular keyword searches.
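The combination of keyword searches and browse-node traversal described above could be organised roughly as follows. This is a sketch only, in Python: the three callables `fetch_children`, `fetch_node_products`, and `search_keyword` are hypothetical placeholders for whatever API requests the final program makes, and `collect_product_ids` is an illustrative name. It handles the keyword list first so every keyword gets a chance before the limit is reached, then walks the browse-node tree breadth-first, skipping nodes and products it has already seen.

```python
from collections import deque


def collect_product_ids(root_nodes, keywords, fetch_children,
                        fetch_node_products, search_keyword, limit=100_000):
    """Return up to `limit` unique product identifiers in discovery order."""
    seen_products = {}                      # dict used as an insertion-ordered set
    queue = deque(dict.fromkeys(root_nodes))
    seen_nodes = set(queue)

    def add(ids):
        """Record new ids; return False once the limit has been reached."""
        for pid in ids:
            if len(seen_products) >= limit:
                return False
            seen_products.setdefault(pid, None)
        return len(seen_products) < limit

    # Keyword searches first, so each supplied keyword is covered.
    for keyword in keywords:
        if not add(search_keyword(keyword)):
            return list(seen_products)

    # Breadth-first walk over browse nodes; seen_nodes guards against cycles.
    while queue:
        node = queue.popleft()
        if not add(fetch_node_products(node)):
            break
        for child in fetch_children(node):
            if child not in seen_nodes:
                seen_nodes.add(child)
                queue.append(child)
    return list(seen_products)
```

Passing the fetch functions in as parameters keeps the traversal logic testable without network access, and lets the real requests be swapped in later.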
For each product, you will need to assign a unique product id. You will also need to store the following information in the database:
product name
description
image url
product category
price
brand (brand/manufacturer)
upc
product_type (book/music/video/other)
Additional information needed for books, music, and video:
books: isbn, format (paperback/hardcover/audiobook/ebook), author, publisher
music: artist, format (cd/tape/vinyl/mp3/itunes/ogg/wma), release_date
video: format (dvd/vhs), director, release_date, starring, rating (G/PG/PG-13/R/NR)
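The field list above could map onto one shared table plus one detail table per product type. The sketch below is an assumption, not the required schema. It is written in Python and runs the DDL through the standard-library `sqlite3` module so it can be tried without a server. Table names, column types, and the `create_schema` function are hypothetical. A MySQL version would use `AUTO_INCREMENT` and could use `ENUM` columns in place of the `CHECK` constraints.

```python
import sqlite3

SCHEMA = """
CREATE TABLE products (
    product_id   INTEGER PRIMARY KEY AUTOINCREMENT,  -- unique product id
    name         TEXT NOT NULL,
    description  TEXT,
    image_url    TEXT,
    category     TEXT,
    price        NUMERIC,
    brand        TEXT,                               -- brand/manufacturer
    upc          TEXT,
    product_type TEXT NOT NULL
        CHECK (product_type IN ('book', 'music', 'video', 'other'))
);
CREATE TABLE book_details (
    product_id INTEGER PRIMARY KEY REFERENCES products(product_id),
    isbn       TEXT,
    format     TEXT CHECK (format IN ('paperback', 'hardcover', 'audiobook', 'ebook')),
    author     TEXT,
    publisher  TEXT
);
CREATE TABLE music_details (
    product_id   INTEGER PRIMARY KEY REFERENCES products(product_id),
    artist       TEXT,
    format       TEXT CHECK (format IN ('cd', 'tape', 'vinyl', 'mp3', 'itunes', 'ogg', 'wma')),
    release_date TEXT
);
CREATE TABLE video_details (
    product_id   INTEGER PRIMARY KEY REFERENCES products(product_id),
    format       TEXT CHECK (format IN ('dvd', 'vhs')),
    director     TEXT,
    release_date TEXT,
    starring     TEXT,
    rating       TEXT CHECK (rating IN ('G', 'PG', 'PG-13', 'R', 'NR'))
);
"""


def create_schema(conn):
    """Create the four tables on an open connection."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
    conn.executescript(SCHEMA)
```

Keeping the type-specific columns in separate tables avoids a wide `products` table in which most columns are empty for most rows.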
You will be responsible for providing a MySQL database schema, as well as the program files needed to populate the database.
You will be paid after we have been provided or have generated a database of at least 100,000 unique products. This database should include at least 1 product for each of the keywords we provide, as well as a large percentage of the products listed in [login to view URL]'s "Top Sellers" category.
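The payment conditions above can be checked mechanically before delivery. The following is a hedged sketch in Python. The function name `acceptance_report` and its input shapes are hypothetical. Because the posting does not define "a large percentage", the function only reports the Top Sellers coverage as a fraction and applies no threshold to it.

```python
def acceptance_report(product_ids, keyword_matches, top_seller_ids,
                      minimum_unique=100_000):
    """Summarise how a finished import compares with the stated conditions.

    product_ids     -- identifiers of every product in the database
    keyword_matches -- dict mapping each supplied keyword to the product
                       identifiers its search returned
    top_seller_ids  -- identifiers seen in the "Top Sellers" category
    """
    unique = set(product_ids)
    # A keyword is missing if none of its search results made it into the database.
    missing_keywords = sorted(
        keyword for keyword, ids in keyword_matches.items()
        if not unique.intersection(ids)
    )
    top = set(top_seller_ids)
    coverage = len(top & unique) / len(top) if top else 0.0
    return {
        "unique_products": len(unique),
        "meets_minimum": len(unique) >= minimum_unique,
        "missing_keywords": missing_keywords,
        "top_seller_coverage": coverage,
    }
```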
Other standard deliverable terms:
1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.
2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):
a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment--Deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.
b) For all others including desktop software or software the buyer intends to distribute: A software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.
3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).
## Platform
Red Hat Linux with MySQL database