
Craigslist scraper and parser

$250-750 USD

Cancelled
Posted over 14 years ago


Paid on delivery
We need a Craigslist scraper and parser (with source code, preferably Python) that automatically archives multiple RSS feeds from Craigslist. Running the parser on the scraped log file should provide word usage frequency broken down by gender of poster (extracted from the w4m or m4w header), city, and day of posting. The program should allow the user to choose a range of dates (extracted from the timestamps) to pull the statistics from.

Outputs:
(1) XML files with the archived feeds for each city and Craigslist category.
(2) Daily CSV files listing the 100 most frequent words, categorized by gender, city, and age group (excluding articles and common modifiers like "a", "an", "the", "for", etc.).

An example file would look like this:

header: 07-01-2009, female, atlanta, 20-25
love,112
passion,93
independent,56
caring,46
...

CSV files should also be generated for the aggregates over all cities and over all age groups. Headers for these files would look like this:

header: 07-01-2009, female, all, 20-25
header: 07-01-2009, female, miami, all
header: 07-01-2009, female, all, all
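The counting step described above could be sketched roughly as follows. This is only an illustration under several assumptions that are not in the posting: each scraped post is already available as a dict with hypothetical keys `date`, `title`, `city`, `age_group`, and `text`; the stop-word list is a small placeholder; and the header date is read as MM-DD-YYYY. The function names (`poster_gender`, `top_words`, `to_csv`) are invented for this sketch.

```python
import csv
import io
import re
from collections import Counter
from datetime import date

# Placeholder stop-word list (assumption); the posting names only
# "a", "an", "the", "for" and leaves the rest open.
STOP_WORDS = {"a", "an", "the", "for", "and", "or", "of", "to", "in", "is"}

# Matches the w4m / m4w marker anywhere in a post title.
HEADER_RE = re.compile(r"\b(w4m|m4w)\b", re.IGNORECASE)


def poster_gender(title):
    """Map a w4m/m4w marker in a post title to the poster's gender."""
    match = HEADER_RE.search(title)
    if match is None:
        return None
    return "female" if match.group(1).lower() == "w4m" else "male"


def top_words(posts, start, end, gender="all", city="all",
              age_group="all", limit=100):
    """Count words in posts dated start..end (inclusive) that match the
    given filters; "all" disables a filter. Returns (word, count) pairs,
    most frequent first."""
    counts = Counter()
    for post in posts:
        if not (start <= post["date"] <= end):
            continue
        if gender != "all" and poster_gender(post["title"]) != gender:
            continue
        if city != "all" and post["city"] != city:
            continue
        if age_group != "all" and post["age_group"] != age_group:
            continue
        words = re.findall(r"[a-z']+", post["text"].lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(limit)


def to_csv(day, gender, city, age_group, rows):
    """Render one report as CSV text: a header row, then word,count rows.
    The MM-DD-YYYY date order is an assumption about the example header."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([day.strftime("%m-%d-%Y"), gender, city, age_group])
    writer.writerows(rows)
    return buf.getvalue()
```

Calling `top_words` once per day with a specific gender, city, and age group, and again with `"all"` for city or age group, would cover both the per-category files and the aggregate files the posting asks for.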
Project ID: 460955

About the project

1 proposal
Remote project
Active 15 yrs ago

1 freelancer is bidding on average $500 USD for this job
Please check my PM for details.
$500 USD in 7 days
5.0 (4 reviews)
3.6

About the client

Boston, India
5.0
27
Payment method verified
Member since Jul 1, 2009
