
Geographic Data Scraper

$250-750 USD

Closed
Posted about 15 years ago

Paid on delivery
-- Scope --
Create a command-line Python program capable of scraping place information from the 'Satellite + old places' map type on the Wikimapia Beta website ([login to view URL]), given a bounding box. The bounding box is defined by a pair of coordinates, latitude and longitude in decimal degrees in the WGS84 coordinate system, in the following format: (minimum latitude, minimum longitude), (maximum latitude, maximum longitude).

-- Required Knowledge --
Python: good OO design and memory management skills; experience with Beautiful Soup (or equivalent) is recommended. Some experience with the Google Maps API might be useful.

-- Specifications --
- Target Operating Systems: Windows XP, Debian, Ubuntu
- Language: Python 2.5(+)
- Data Output Format: TSV, UTF-8
- Geometries Format: Well-Known Text (WKT) strings (see [login to view URL])
- Coordinate System: latitude and longitude in decimal degrees on WGS84

-- Deliverables --
(See also 'Project Milestones' below.)
- Python script that fetches Wikimapia data for places in a given geographical area defined by a bounding box;
- Comprehensive documentation: user manual, setup and commented code;
- Installer scripts for Windows XP, Debian and Ubuntu, listing any external dependencies and their setup procedures.

-- Requirements --
Small Memory/Disk Usage Footprint: the program has to use memory and disk space efficiently, with built-in housekeeping procedures so that it neither leaves temporary files behind nor consumes large chunks of memory unnecessarily.
No Wikimapia DoS: the program has to wait random time intervals between requests to the Wikimapia website and/or take other measures to avoid over-stressing Wikimapia resources.
Completeness: the program has to account for the complete set of places existing in the given bounding box.
The place retrieval mechanism has to be aware that map levels differ in content (not all places, if any, appear at every map level) and has to record information about every place present in the bounding box once (and only once).
Tasks Script File: the program has to be able to sub-divide a task into smaller tasks, e.g. by sub-dividing the original bounding box into smaller bounding boxes, generating a tasks script. In order to distribute a task across several machines, the program has to be able to interpret this tasks script, or a subset of it, and to process the sequence of tasks it describes. The tasks script can be passed to the command-line program as an argument (the path to a text file) and, when present, replaces the bounding-box argument. The aggregated results from several program copies, each processing a subset of a tasks script, have to be equal to the result of a single copy processing the complete tasks script.
Log File: the program has to be able to record (with time-stamps) its steps, warnings and errors, so that a task can be restarted from a specific point.
Data to Scrape: the place information to extract from Wikimapia is as follows:
- Label: map place tooltip (equivalent to the Google Maps API GMarker title);
- Outline or Envelope: polygon that defines the boundaries of the place (note: 'old places' have envelopes and other places have outlines, but both are polygons);
- Centroid: coordinates in the top right corner of the info window, converted to decimal degrees;
- Categories: text after "Category: " in the info window;
- Description: description in the info window;
- Permalink: permalink URL in the info window;
- Languages: language acronym in the bottom right corner of the info window;
- Last Edit Date: converted to year/month/day format from the text after "Edited: " in the bottom left corner of the info window.
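The centroid and geometry conversions listed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the coordinate text is assumed to look like 25°15'30"N (the actual info-window format may differ), the function names dms_to_decimal, wkt_point and wkt_polygon are hypothetical, and the code is written against current Python rather than the Python 2.5 named in the brief.

```python
import re

# ASSUMPTION: coordinates appear as degrees, minutes, seconds plus a hemisphere
# letter, e.g. 25°15'30"N. The real Wikimapia info-window text may differ.
_DMS = re.compile(r"""(\d+)\s*°\s*(\d+)\s*'\s*(\d+(?:\.\d+)?)\s*"\s*([NSEW])""")


def dms_to_decimal(text):
    """Convert one degrees/minutes/seconds coordinate to decimal degrees."""
    match = _DMS.search(text)
    if match is None:
        raise ValueError("unrecognised coordinate: %r" % text)
    degrees, minutes, seconds, hemisphere = match.groups()
    value = int(degrees) + int(minutes) / 60.0 + float(seconds) / 3600.0
    # Southern and western hemispheres are negative in decimal degrees.
    return -value if hemisphere in "SW" else value


def wkt_point(lat, lon):
    """Format a centroid as a WKT point. WKT order is x y, i.e. lon lat."""
    return "POINT (%.6f %.6f)" % (lon, lat)


def wkt_polygon(ring):
    """Format a sequence of (lat, lon) pairs as a WKT polygon.

    WKT requires a closed ring, so the first point is repeated at the end
    when the input ring is open.
    """
    points = list(ring)
    if points and points[0] != points[-1]:
        points.append(points[0])
    body = ", ".join("%.6f %.6f" % (lon, lat) for lat, lon in points)
    return "POLYGON ((%s))" % body
```

Six decimal places are an arbitrary choice here; the brief does not state a required precision.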
Output Format: the collected data is to be exported to a UTF-8 tab-separated values file with 8 fields:
- "label": text;
- "envelope": WKT polygon string;
- "centroid": WKT point string;
- "categories": text; if multiple categories exist, separate them with semi-colons;
- "description": text;
- "permalink": text;
- "languages": text; if multiple languages exist, separate them with semi-colons;
- "last_edit_date": number, format 'yyyymmdd'.

-- Project Milestones --
If the developer agrees, partial payment will be processed on delivery and acceptance of the following working scripts:

- [40%] Create a program that, given a bounding box defined by a pair of coordinates:
1. Retrieves the above-mentioned 'data to scrape' for the places present in the highest level that encompasses the bounding box, and;
2. Produces a UTF-8 tab-separated values file with the above-mentioned 'output format' and fills it with the scraped data.

- [20%] Create an evolution of the previous program that:
1. Retrieves the above-mentioned 'data to scrape' for all the places present in every level that encompasses the bounding box;
2. Registers, if requested, the steps, warnings and errors of the previous task in a 'log file', one record per line with time-stamp, and;
3. Produces a UTF-8 tab-separated values file with the above-mentioned 'output format', with one (and only one) record (line) of scraped data per place.

- [40%] Create the final version of the program, which is able to:
1. Generate a 'tasks script file' given a bounding box, describing the data scraping process in modular (atomic) steps in such a way that subsets (lines) of that 'tasks script file' may be processed by independent machines (using the same final version of the program);
2. Retrieve the above-mentioned 'data to scrape' for all the places present in every level that encompasses a bounding box or the corresponding 'tasks script file' (or a subset of it);
3. Register, if requested, the steps, warnings and errors of the previous task in a 'log file', one record per line with time-stamp;
4. Collate scraped data resulting from the processing of different but related subsets of a 'tasks script file', and;
5. Produce a UTF-8 tab-separated values file with the above-mentioned 'output format', with each place that exists in the bounding box (or the equivalent 'tasks script file') recorded once (and only once).
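The tasks-script and collation milestones above might be approached along these lines. This is a rough sketch, not a prescribed design: the one-box-per-line task format, the fixed rows-by-columns grid, the function names, and the use of the permalink as the unique key for a place are all assumptions made for illustration. The brief itself does not define any of them, and the code targets current Python rather than Python 2.5.

```python
import csv

# Field order taken from the 'output format' list in the brief.
FIELDS = ["label", "envelope", "centroid", "categories",
          "description", "permalink", "languages", "last_edit_date"]


def split_bbox(min_lat, min_lon, max_lat, max_lon, rows, cols):
    """Yield rows * cols sub-boxes that tile the original bounding box."""
    lat_step = (max_lat - min_lat) / float(rows)
    lon_step = (max_lon - min_lon) / float(cols)
    for r in range(rows):
        for c in range(cols):
            yield (min_lat + r * lat_step, min_lon + c * lon_step,
                   min_lat + (r + 1) * lat_step, min_lon + (c + 1) * lon_step)


def task_lines(bbox, rows, cols):
    """Render each sub-box as one line of a (hypothetical) tasks script."""
    return ["%.6f\t%.6f\t%.6f\t%.6f" % box
            for box in split_bbox(*bbox, rows=rows, cols=cols)]


def collate(*batches):
    """Merge record batches, keeping the first record seen per permalink.

    ASSUMPTION: the permalink identifies a place uniquely, so a place that
    straddles two sub-boxes is still written only once.
    """
    seen = set()
    merged = []
    for batch in batches:
        for record in batch:
            key = record["permalink"]
            if key not in seen:
                seen.add(key)
                merged.append(record)
    return merged


def write_tsv(records, stream):
    """Write records as tab-separated values with a header row."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(record)
```

For instance, `task_lines((0.0, 0.0, 2.0, 2.0), 2, 2)` produces four lines that could be handed to different machines, and `collate` can then merge their results before `write_tsv` is called on a file opened with UTF-8 encoding.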
Project ID: 393420

About the project

3 proposals
Remote project
Active 15 yrs ago

3 freelancers are bidding on average $450 USD for this job
I am fluent in Python, and I can be fully dedicated to this project if you select me.
$400 USD in 30 days
0.0 (0 reviews)
Update: $550 / 21 days. It will likely be faster, but we are talking about the guaranteed time frame, right?
$550 USD in 21 days
0.0 (0 reviews)
As a Python developer, I'm ready to deliver a Wikimapia GIS data scraping tool in 28 days. However, this time might be drastically reduced.
$400 USD in 28 days
0.0 (0 reviews)

About the client

Flag of UNITED ARAB EMIRATES
Dubai, United Arab Emirates
5.0
2
Member since Feb 27, 2009

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)