Crawl Site -> Parse -> Index in ElasticSearch -> Visualize in Kibana

$250-750 USD

Cancelled
Posted over 8 years ago

Paid on delivery
I have two sites that I'd like to crawl, each roughly 750k pages. I'm interested in stats on how the sites are organized content-wise (the most popular categories and tags, etc.). From a crawling standpoint I would prefer either Nutch 2.3 or Scrapy. We then need to parse the relevant pieces from the HTML (roughly 10-15 fields that are easily identifiable in the DOM) and get the data into an ElasticSearch index. From there we hook Kibana up to it and start slicing the data. It's going to run on a single EC2 instance; we will not be clustering the crawl or the ES index. It would be a huge bonus if you were familiar with AWS OpsWorks, so we could automate standing up the environments for future crawls. If you've done this before, please mention it in your response. There are tons of freely available Chef cookbooks for this if you want to take a look.

I see the project organized like this:
1. Core infrastructure setup: you run the installs and prep the environment.
2. We start the crawl.
3. While the crawl is running, you can work on the parsing and indexing piece. :-)
Project ID: 8065182
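The parse-and-index step described above can be sketched in a few lines of Python. This is a minimal, standard-library-only illustration under stated assumptions, not a spec for the project: the markup rules (title in the first `<h1>`, category in a `<meta name="category">` tag, tags as `<a rel="tag">` links), the index name `pages`, and the helper names `PageFieldParser`, `parse_page` and `to_bulk_body` are all hypothetical. In a real crawl the extraction would more likely live in a Scrapy spider or item pipeline. The output is a request body in the newline-delimited format that the Elasticsearch `_bulk` API accepts.

```python
import hashlib
import json
from html.parser import HTMLParser


class PageFieldParser(HTMLParser):
    """Pulls a few example fields out of one crawled page.

    HYPOTHETICAL markup rules, for illustration only:
      - title: text of the first <h1>
      - category: <meta name="category" content="...">
      - tags: text of every <a rel="tag"> link
    """

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.category = None
        self.tags = []
        self._in_h1 = False
        self._in_tag_link = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h1" and not self.title_parts:
            self._in_h1 = True  # only the first <h1> with text is used
        elif tag == "meta" and attrs.get("name") == "category":
            self.category = attrs.get("content")
        elif tag == "a" and "tag" in (attrs.get("rel") or "").split():
            self._in_tag_link = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False
        elif tag == "a":
            self._in_tag_link = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return  # skip whitespace between tags
        if self._in_h1:
            self.title_parts.append(text)
        elif self._in_tag_link:
            self.tags.append(text)


def parse_page(url, html):
    """Returns one document (a dict) ready to be indexed."""
    parser = PageFieldParser()
    parser.feed(html)
    parser.close()
    return {
        "url": url,
        "title": " ".join(parser.title_parts) or None,
        "category": parser.category,
        "tags": parser.tags,
    }


def to_bulk_body(docs, index="pages"):
    """Builds an NDJSON body for the Elasticsearch _bulk API.

    Each document becomes an action line followed by its source line, and the
    body must end with a newline. The _id is a SHA-1 of the URL, so indexing
    the same page again overwrites the earlier document.
    """
    lines = []
    for doc in docs:
        doc_id = hashlib.sha1(doc["url"].encode("utf-8")).hexdigest()
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = (
        '<html><head><meta name="category" content="News"></head><body>'
        "<h1>Hello world</h1>"
        '<a rel="tag" href="/t/aws">aws</a> <a rel="tag" href="/t/es">elasticsearch</a>'
        "</body></html>"
    )
    doc = parse_page("https://example.com/a", sample)
    print(to_bulk_body([doc]), end="")
```

The body would be POSTed to `/_bulk` with `Content-Type: application/x-ndjson`. Mapping `category` and `tags` as `keyword` fields would let Kibana run terms aggregations on them, which is what the "most popular categories and tags" view needs.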

About the project

8 proposals
Remote project
Active 9 yrs ago

8 freelancers are bidding on average $481 USD for this job
Hi. So far I consider Scrapy the best tool for such tasks, so I will use Scrapy for this. The general idea is clear, but I have a few questions:
1. Which sites do we need to crawl?
2. Which fields do we need to extract?
3. Does this project include working with Kibana and "start slicing the data", or will you do that yourself?
4. Do you think we may need proxies while crawling? If there is a lot of data, we may need them.
5. What type of EC2 instance are you planning to use?
These are my questions so far. Let me know all the details and we can continue. As for experience, I have a lot related to this: you can check my profile and the projects I've worked on, which include many crawlers. I even created a web UI where you can create a crawler without writing any code. Let me know the details; I hope we will collaborate.
$631 USD in 4 days
4.9 (83 reviews)
7.0
Dear Sir, we are experts in PHP/SQL databases, Python, lxml and JavaScript, extracting emails from websites, and all kinds of website scraping. Our experienced team is ready to help you and can start right away. Please check my profile; I have done similar projects. I know how to find companies, emails, addresses, person names, or whatever you want, so I can do the required work perfectly and on time. Please look at my work samples first, and if you like them, award me the project. Waiting for your reply. Thanks
$555 USD in 10 days
4.6 (55 reviews)
5.7
G'day, your idea for the project workflow sounds great! I could build a scraper for you using Scrapy and test it on my own EC2 instance. After a day or two, once we have the final version running on your instance, I can dive into the parsing and indexing. It sounds like you really know what you want, so I'm confident that I'll be able to produce a good outcome. Please let me know what you think and if you have any questions. Regards, Justin
$475 USD in 10 days
5.0 (7 reviews)
5.3
Hi, this looks like a really interesting project. I want to say up front that I don't meet all the qualifications: I've never worked in an AWS environment and I haven't gone further than the tutorial with ElasticSearch; I do work with Scrapy, though. I can't offer you the most experience or the quickest delivery time, but, as you can see, you will be working with an honest guy and a good learner. Feel free to contact me :)
$300 USD in 10 days
5.0 (19 reviews)
4.6
Hello! My bid is lower than the other competitors' because I'd like to learn a thing or two about the AWS OpsWorks you mentioned :-) As you can see in my profile, I've completed scraping tasks without a problem. I use a Linux environment daily, including more advanced service configuration. Waiting to hear from you! Best, Pawel
$277 USD in 10 days
5.0 (7 reviews)
4.4
A proposal has not yet been provided
$388 USD in 2 days
0.0 (0 reviews)
0.0
Hello, I read your job description and I'm really interested in this job, as it is exactly within my scope of expertise in Python. I have several years of experience coding in Python, PHP, WordPress, OOP and MySQL. Moreover, I have programmed a web crawler in Python for a specialized search engine, so I can build a high-quality crawler in Python as you need. I look forward to hearing from you to discuss more details. My Regards, Ensherah Ahel
$666 USD in 15 days
0.0 (0 reviews)
0.0
A proposal has not yet been provided
$555 USD in 10 days
0.0 (0 reviews)
0.0

About the client

Piscataway, United States
4.6
13
Payment method verified
Member since Oct 3, 2014

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)