I need someone to write a program that can automatically scrape data from this site: [url removed, login to view]
The program is what I [url removed, login to view] the data.
First, look at the category list on the right side of the site.
The program must automatically scrape data from some of these categories.
Some of them are under [恋愛・少女漫画] (Romance/Shōjo Manga), from [ちはやふる] to [すきっていいなよ。].
The rest are under [ファンタジー漫画] (Fantasy Manga), from [終りのセラフ] to [クレイモア].
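A minimal sketch of selecting the "from X to Y" range, assuming the scraper has already read the category names in the order they appear in the sidebar. The function name `categories_between` and the sample list are hypothetical; only the first and last names in the sample come from the brief, and the order of the middle entry is invented for illustration.

```python
def categories_between(names, first, last):
    """Return the contiguous run of category names from `first` to `last`, inclusive.

    `names` is assumed to be the sidebar list in on-page order.
    """
    start = names.index(first)      # raises ValueError if `first` is missing
    end = names.index(last, start)  # look for `last` only after `first`
    return names[start:end + 1]


# Hypothetical sidebar excerpt (real order and length will differ).
romance = ["ちはやふる", "黒崎くんの言いなりになんてならない", "すきっていいなよ。"]
print(categories_between(romance, "ちはやふる", "すきっていいなよ。"))
```

Running it once per parent category ([恋愛・少女漫画] and [ファンタジー漫画]) and concatenating the results gives the full set of categories to scrape.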
*See the attachment named “category”.
Output the data in the following format:
url----URL of the article
picture----the cover of the article
*See the attachments named “tip1”, “tip2”, and “tip3”.
[cat]: the name of [url removed, login to view] the category list on the right [url removed, login to view] can see words like [ちはやふる] and [黒崎くんの言いなりになんてならない]; these are categories.
[title]: click the category [ちはやふる] to open a new page, where you can see entries such as [ちはやふる33巻173首のネタバレ感想] or [ちはやふる33巻172首のネタバレ感想]; these are titles.
[article]: click a title such as [ちはやふる33巻173首のネタバレ感想] to open a new page showing an article with a lot of [url removed, login to view] have to capture the body from the title (ちはやふる33巻173首のネタバレ感想) to the end of the article, which ends just above [目次] (table of contents), [コメント] (comments), and the advertisements.
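A minimal sketch of the "from the title to just above [目次]/[コメント]" rule, assuming the page has already been reduced to plain text. The function `extract_body` and the `広告` (advertisement) marker in the usage example are hypothetical; a production scraper would more likely cut at HTML elements than at text markers, since a word like 目次 could also appear inside the article itself.

```python
# Section labels named in the brief; add an advertisement marker if one exists on the page.
STOP_MARKERS = ("目次", "コメント")


def extract_body(page_text, title, stop_markers=STOP_MARKERS):
    """Return the text from `title` up to (not including) the earliest stop marker after it."""
    start = page_text.index(title)  # raises ValueError if the title is missing
    after_title = start + len(title)
    # Position of each marker that appears after the title; -1 means "not found".
    ends = [page_text.find(m, after_title) for m in stop_markers]
    ends = [e for e in ends if e != -1]
    end = min(ends) if ends else len(page_text)
    return page_text[start:end].strip()


# Hypothetical page text; "広告" stands in for wherever the advertisements begin.
page = "ヘッダー\nちはやふる33巻173首のネタバレ感想\n本文です。\n広告\n目次\n33巻173首\nコメント"
print(extract_body(page, "ちはやふる33巻173首のネタバレ感想", stop_markers=("広告", "目次", "コメント")))
```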
[entry_data_at]: the publish time of the article. For example, the publish time of ちはやふる33巻173首のネタバレ感想 is the one written under the title, 2016/10/[url removed, login to view] have to record it as a Unix timestamp, so 2016/10/01 00:00 (UTC/GMT+08:00, the time zone used in the [created_at] example) becomes 1475251200.
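A minimal sketch of the date-to-timestamp conversion, assuming the publish date appears as `YYYY/MM/DD` and is interpreted as midnight in UTC+08:00, the offset the brief uses for [created_at]. In that zone 2016/10/01 00:00 maps to 1475251200, while 1451577600 corresponds to 2016/01/01 00:00. The name `publish_timestamp` is hypothetical. Because the site is Japanese, the server may actually expect Japan time (UTC+09:00); the `tz` parameter allows for that.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the same UTC+08:00 offset as the brief's [created_at] example.
UTC_PLUS_8 = timezone(timedelta(hours=8))


def publish_timestamp(date_text, tz=UTC_PLUS_8):
    """Convert a 'YYYY/MM/DD' publish date to a Unix timestamp at midnight in `tz`."""
    day = datetime.strptime(date_text, "%Y/%m/%d").replace(tzinfo=tz)
    return int(day.timestamp())


print(publish_timestamp("2016/10/01"))  # 1475251200
```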
[url]: the URL of the article, e.g. [url removed, login to view]
[site]: always set to [url removed, login to view]
[url removed, login to view],
Under the advertisements there is a [目次] (table of contents) [url removed, login to view] can see [33巻173首] written in black with no [url removed, login to view]'s the [character].
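A minimal sketch of picking out [character] as the one table-of-contents entry that is not a link, using only Python's standard `html.parser`. It assumes the caller has already isolated the HTML of the [目次] list (without the heading itself) and that the current entry is plain text while the others sit inside `<a>` elements; the class, function name, and sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class UnlinkedTextFinder(HTMLParser):
    """Collect non-empty text nodes that are not inside an <a> element."""

    def __init__(self):
        super().__init__()
        self.link_depth = 0
        self.unlinked = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth > 0:
            self.link_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.link_depth == 0:
            self.unlinked.append(text)


def current_character(toc_html):
    """Return the first unlinked entry in the TOC fragment, or None if every entry is a link."""
    finder = UnlinkedTextFinder()
    finder.feed(toc_html)
    finder.close()
    return finder.unlinked[0] if finder.unlinked else None


# Hypothetical TOC markup; the real page structure must be checked.
toc = '<ul><li><a href="/172">33巻172首</a></li><li>33巻173首</li><li><a href="/174">33巻174首</a></li></ul>'
print(current_character(toc))
```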
For [author], [magazine], [genre], [picture], [id], and [created_at], first do the following step.
Search for [cat] on [url removed, login to view] and use the first result.
For example, searching for [ちはやふる] on [url removed, login to view] gives:
ジャンル： スポーツ / 少女マンガ / アニメ化 / 映画化
(Genre: sports / shōjo manga / anime adaptation / film adaptation)
[author]: the text after [作家：] (author). In the example, [author] is [末次由紀].
[magazine]: the text after [雑誌・レーベル：] (magazine/label). In the example, [magazine] is [BE･LOVE].
[genre]: the text after [ジャンル：] (genre), with the items separated by ",". In the example, [genre] is [スポーツ,少女マンガ,アニメ化,映画化].
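A minimal sketch of turning the search-result text into [author], [magazine], and [genre], assuming each field is on its own line in a "label：value" form with a full-width colon. The brief shows only the ジャンル line; the 作家 and 雑誌・レーベル lines in the sample are assumed to follow the same layout, with values taken from the brief. The function name `parse_work_info` is hypothetical.

```python
def parse_work_info(text):
    """Extract author, magazine and comma-joined genre from labelled search-result lines."""
    fields = {}
    for line in text.splitlines():
        # Split on the first full-width colon, e.g. "ジャンル： スポーツ / ...".
        label, sep, value = line.partition("：")
        if sep:
            fields[label.strip()] = value.strip()
    # Genres are shown separated by "/"; the database wants them joined with ",".
    genres = [g.strip() for g in fields.get("ジャンル", "").split("/") if g.strip()]
    return {
        "author": fields.get("作家", ""),
        "magazine": fields.get("雑誌・レーベル", ""),
        "genre": ",".join(genres),
    }


# Hypothetical layout for the first two lines; the ジャンル line is copied from the brief.
sample = "作家：末次由紀\n雑誌・レーベル：BE･LOVE\nジャンル： スポーツ / 少女マンガ / アニメ化 / 映画化"
info = parse_work_info(sample)
print(info["genre"])  # スポーツ,少女マンガ,アニメ化,映画化
```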
[picture]: the cover of the first [url removed, login to view] have to capture the covers and store [url removed, login to view] the database, add a [picture] column that holds the URL of each cover.
[id]: the sequence number; the first record is 1, the second is 2, and so on.
[created_at]: the time the article is scraped, also recorded as a Unix timestamp. For example, if the data is scraped at 2016/10/11 14:40:30 (UTC/GMT+08:00), [created_at] should be 1476168030.
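A minimal sketch of assigning [id] and [created_at], assuming records are plain dictionaries. Since the program runs repeatedly, `start_id` would in practice come from the largest id already in the database (or the database's own auto-increment could replace this counter); that persistence is not shown. The function name `stamp_records` is hypothetical, and `clock` is a parameter only so the timestamp can be fixed for testing.

```python
import itertools
import time


def stamp_records(records, clock=time.time, start_id=1):
    """Give each record a sequential "id" and a "created_at" Unix timestamp (whole seconds)."""
    ids = itertools.count(start_id)
    for record in records:
        record["id"] = next(ids)
        record["created_at"] = int(clock())
    return records


# Fixed clock equal to 2016/10/11 14:40:30 UTC+08:00, the brief's example.
batch = stamp_records([{"url": "a"}, {"url": "b"}], clock=lambda: 1476168030.0)
print(batch)
```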
Using [ちはやふる] as the example and following the steps above, you get:
url: [url removed, login to view]
site: [url removed, login to view]
*Check the explanation named “database sample”.
This is what I [url removed, login to view] have to make the program scrape data in this format so that my server can recognize it.
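A minimal sketch of one output record, using the field names listed in the brief. The “database sample” attachment is not available here, so the field types and the JSON-lines serialisation are assumptions; the real import format should follow that attachment. The class `ArticleRecord` and the function `to_json_line` are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ArticleRecord:
    # Field names follow the brief; the types are assumptions.
    id: int
    cat: str
    title: str
    article: str
    entry_data_at: int  # publish time, Unix timestamp
    url: str
    site: str
    character: str
    author: str
    magazine: str
    genre: str          # comma-separated, e.g. "スポーツ,少女マンガ"
    picture: str        # cover image URL
    created_at: int     # scrape time, Unix timestamp


def to_json_line(record):
    """Serialise one record as a single JSON line, keeping Japanese text unescaped."""
    return json.dumps(asdict(record), ensure_ascii=False)
```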
The program needs to scrape data once every 2 hours [url removed, login to view] to send me the program you write.
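A minimal sketch of the "every 2 hours" requirement as an in-process loop. The names `run_periodically` and `scrape_once` are hypothetical, and `scrape_once` stands for whatever function performs one full scrape. An equally common choice is to let the operating system schedule a single run, for example with a crontab entry such as `0 */2 * * *`, which avoids keeping a long-lived process alive.

```python
import time

INTERVAL_SECONDS = 2 * 60 * 60  # every 2 hours, per the brief


def run_periodically(scrape_once, interval=INTERVAL_SECONDS, iterations=None, sleep=time.sleep):
    """Call `scrape_once` repeatedly, starting a new run roughly every `interval` seconds.

    `iterations=None` loops forever; a number limits the runs (useful for testing).
    """
    count = 0
    while iterations is None or count < iterations:
        started = time.monotonic()
        scrape_once()
        count += 1
        if iterations is not None and count >= iterations:
            break
        # Subtract the scrape's own duration so the schedule does not drift.
        elapsed = time.monotonic() - started
        sleep(max(0.0, interval - elapsed))
    return count
```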
Because all I need is a [url removed, login to view] the budget is 300 USD.
Include 113114 in your bid.
Hello, I am able to create the scraper. On its first run, the scraper will need to fetch all the articles on the site and the associated data from cmoa.jp. After that, will it only need to fetch new data?
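If only new data is needed after the first run, one simple approach is to skip any article whose URL is already stored. A minimal sketch, assuming the article URL uniquely identifies a record; the function name `new_articles` and the example URLs are hypothetical.

```python
def new_articles(found_urls, stored_urls):
    """Return URLs not yet stored, in the order found, without duplicates."""
    seen = set(stored_urls)
    fresh = []
    for url in found_urls:
        if url not in seen:
            seen.add(url)  # also drops repeats within found_urls
            fresh.append(url)
    return fresh


found = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
print(new_articles(found, ["https://example.com/b"]))  # ['https://example.com/a', 'https://example.com/c']
```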