Scraping real estate prices using python and visualization using maps

TL;DR

An interactive map, accurate as of 13/08/2018, showing property prices per square meter in various areas of Tallinn:

https://dvas0004.github.io/TallinnRealEstate/

Data shown is for 3-bedroom apartments (resource limitations). Green is less expensive, red is more expensive. Clicking on a data point will show a popup containing the actual price per square meter for that data point.


 

As any house/apartment hunter knows, finding the perfect place to call home is an arduous and drawn-out process. In this show-and-tell article I’ve used Python to scrape data from one of the most popular Estonian real-estate sites (https://kv.ee) and display the median price per square meter at different locations across Tallinn:

[Image: tallin_property_1]

The above is a screenshot of the final result, which you can browse here:
https://dvas0004.github.io/TallinnRealEstate/

Note: the map only shows results for 3-bedroom apartments due to resource limitations. Green is cheaper, red is more expensive.

Tip: click on the individual data points to display a popup showing the actual price per square meter.

Technical description

The actual code is posted at the end of this article. The main ingredients for this script were the Python “requests” and “requests_html” modules. Admittedly, I could have used just one module, but I did want to try out the HTML parsing capabilities of the requests_html module. For simplicity’s sake, the script outputs a static HTML file which can then be loaded into the browser or published on GitHub Pages like I did above. A more sophisticated approach would be to use a Python web framework like Flask to host the web page directly.
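As a minimal sketch of the static-file approach, the generated markup could be written to disk with the standard library. The helper name save_html and the default file name index.html are assumptions made for illustration; they are not part of the script below.

```python
from pathlib import Path


def save_html(html: str, path: str = "index.html") -> Path:
    """Write the generated page to disk and return the resulting path."""
    target = Path(path)
    # Explicit UTF-8 so the file matches the page's <meta charset="utf-8">.
    target.write_text(html, encoding="utf-8")
    return target
```

With the script at the end of the article, this would be called as save_html(kv.get_html()) and the resulting file opened in a browser or committed to a GitHub Pages repository.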

Scraping the data involved inspecting the web traffic between the browser and KV.EE, specifically when using the “Search by Map” functionality on the site. Once the appropriate search filters are set and the map is centered on the area you’d like to search, pressing the “search” button issues a request to a URL similar to the discovery_url built in get_area_objects in the code below. The parameters I was particularly interested in were those describing the map area to search:

  • nelng / nelat : north east longitude / latitude (the top right corner of the map)
  • swlng / swlat : south west longitude / latitude (the bottom left corner of the map)
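To make the bounding-box idea concrete, here is a small sketch that encodes the two corners as query parameters. The function name bbox_params and the sanity check are assumptions for illustration; only the parameter names (nelat, nelng, swlat, swlng, zoom) come from the request observed on the site.

```python
from urllib.parse import urlencode


def bbox_params(nelat, nelng, swlat, swlng, zoom=15):
    """Return the map-area part of the search query as a URL-encoded string."""
    # For an area such as Tallinn (no antimeridian crossing), the south-west
    # corner must lie below and to the left of the north-east corner.
    if not (swlat < nelat and swlng < nelng):
        raise ValueError("south-west corner must be below/left of north-east corner")
    return urlencode({
        "nelat": nelat, "nelng": nelng,
        "swlat": swlat, "swlng": swlng,
        "zoom": zoom,
    })


print(bbox_params(59.4347, 24.7464, 59.4271, 24.7203))
```

Swapping the corners by mistake is easy to do when copying co-ordinates by hand, which is why the sketch rejects such input instead of silently sending an empty search area.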

This allows us to issue a separate request for each area we’d like to scrape, as is done by the per-district get_area_objects calls at the end of the code below. The “get_area_objects” method gets a list of object IDs representing apartments, and their corresponding co-ordinates.
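Judging by how the script handles it, the search response comes in two shapes: a dict with a 'markers' list (clustered markers carrying dot-separated object IDs), or a plain list of [lat, lng, object_id] entries. The sketch below normalizes both into one stream; the helper name iter_markers is hypothetical, and the payloads shown in the usage are invented sample data.

```python
def iter_markers(response):
    """Yield (lat, lng, object_ids) tuples from either response shape."""
    if isinstance(response, dict):
        for marker in response.get("markers", []):
            raw_ids = marker.get("object_ids") or marker.get("object_id")
            if raw_ids is None:
                continue  # this marker carries no listings
            # Clustered markers pack several IDs into one dot-separated string.
            yield marker["0"], marker["1"], str(raw_ids).split(".")
    else:
        for lat, lng, object_id in response:
            yield lat, lng, [str(object_id)]


# Invented sample payloads, one per shape.
clustered = {"markers": [{"0": 59.43, "1": 24.74, "object_ids": "111.222"}]}
flat = [[59.43, 24.74, 333]]
print(list(iter_markers(clustered)))
print(list(iter_markers(flat)))
```

Normalizing first means the price lookup and the median calculation only need to be written once, rather than once per response shape.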

At this stage we have the co-ordinates of the apartments, but we still need each apartment’s price and floor area in order to calculate its price per square meter. This is what the “get_object_details” method does, and it is here that requests_html really shines, since it makes it very easy to extract the data we require.
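The arithmetic itself is simple once the scraped text is turned into numbers. The following sketch assumes the price and size arrive as strings that use non-breaking spaces as thousands separators; the helper names and the sample strings are invented for illustration and are not taken from the site.

```python
import re


def to_number(text):
    """Extract the leading number from text such as '150\xa0000\xa0€'."""
    # Drop regular and non-breaking spaces used as thousands separators.
    compact = text.replace("\xa0", "").replace(" ", "")
    match = re.match(r"\d+(?:[.,]\d+)?", compact)
    if match is None:
        raise ValueError("no number found in {!r}".format(text))
    return float(match.group().replace(",", "."))


def price_per_m2(price_text, size_text):
    """Return the asking price divided by the floor area."""
    return to_number(price_text) / to_number(size_text)


print(price_per_m2("150\xa0000\xa0€", "75\xa0m²"))  # 2000.0
```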

In the final stage, the “get_html” method uses Leaflet to build a map over which we display our data: circles colored by price per square meter. I used an elegant JavaScript function (perc2color), defined in the HTML template inside get_html, to convert a number/price into a color.
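For readers who prefer to follow the color logic in Python, here is a sketch of the same mapping. It is a port written for illustration, not code from the script: the clamping step and the price_to_color helper are additions, and Python's round() can differ from JavaScript's Math.round() by one unit on exact halves.

```python
def perc2color(perc):
    """Map 0..100 to a hex color running red -> yellow -> green."""
    perc = max(0.0, min(100.0, perc))  # clamp so out-of-range input stays valid
    if perc < 50:
        r, g = 255, round(5.1 * perc)
    else:
        r, g = round(510 - 5.1 * perc), 255
    return "#{:02x}{:02x}{:02x}".format(r, g, 0)


def price_to_color(price, max_price):
    """Invert the scale so the most expensive point is red, as on the map."""
    return perc2color(100 - (price / max_price) * 100)
```

With this inversion, a data point priced at the maximum maps to pure red (#ff0000), and prices approaching zero move through yellow towards green.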


# only python 3.6 supported
# sudo pipenv --python=3.6 install requests requests_html numpy
import requests
import numpy as np
from requests_html import HTMLSession


class KVBuilder:

    def __init__(self):
        self.discovery_url = ''
        self.session = HTMLSession()
        self.data_objects = []
        self.max_price = 0

    def get_object_details(self, object_id):
        # Fetch the listing page and derive its price per square meter.
        r = self.session.get('http://kinnisvaraportaal-kv-ee.postimees.ee/?act=search.objectinfo&object_id={}'.format(object_id))
        absolute_size = int(r.html.find('span.sep', first=True).text.split('\xa0')[0].strip('|'))
        absolute_price = int(''.join(r.html.find('p.object-price strong', first=True).text.split('\xa0')[0:2]))
        relative_price = float(absolute_price) / float(absolute_size)
        return relative_price

    def get_area_objects(self, nelat, nelng, swlat, swlng, rooms):
        # Search one bounding box (north-east and south-west corners).
        self.discovery_url = 'http://kinnisvaraportaal-kv-ee.postimees.ee/?act=search.objectcoords&last_deal_type=1&company_id=&page=1&orderby=ob&page_size=10000&deal_type=1&dt_select=1&county=1&search_type=new&parish=1061&rooms_min={}&rooms_max={}&price_min=&price_max=&nr_of_people=&area_min=&area_max=&floor_min=&floor_max=&energy_certs=&keyword=&cluster=true&nelat={}&nelng={}&swlat={}&swlng={}&zoom=15'.format(rooms, rooms, nelat, nelng, swlat, swlng)
        kv_request = requests.get(self.discovery_url)
        kv_json_response = kv_request.json()
        print(kv_json_response)

        if isinstance(kv_json_response, dict):
            # Clustered response: each marker may hold several object IDs.
            kv_markers = kv_json_response['markers']
            for marker in kv_markers:
                try:
                    lng = marker['1']
                    lat = marker['0']
                    if 'object_ids' in marker:
                        objects = marker['object_ids'].split('.')
                    elif 'object_id' in marker:
                        objects = marker['object_id'].split('.')
                    else:
                        continue

                    relative_prices = []
                    for apartment in objects:
                        relative_price = self.get_object_details(apartment)
                        relative_prices.append(relative_price)

                    # One data point per marker: the median of its listings.
                    median_price = np.median(relative_prices)
                    if median_price > self.max_price:
                        self.max_price = median_price

                    result = {
                        'lng': lng,
                        'lat': lat,
                        'price': median_price
                    }
                    self.data_objects.append(result)
                    print(result)
                except Exception as e:
                    print(e)
                    continue
        else:
            # Flat response: a list of [lat, lng, object_id] entries.
            for marker in kv_json_response:
                try:
                    lat = marker[0]
                    lng = marker[1]
                    apartment = marker[2]
                    relative_price = self.get_object_details(apartment)
                    # Track the maximum here too, so the color scale stays valid
                    # when an area returns only flat results.
                    if relative_price > self.max_price:
                        self.max_price = relative_price

                    result = {
                        'lng': lng,
                        'lat': lat,
                        'price': relative_price
                    }
                    self.data_objects.append(result)
                    print(result)
                except Exception as e:
                    print(e)
                    continue

    def get_html(self):
        html = '''
<html>
<head>
<title>TLN Real Estate</title>
<meta name="viewport" content="initial-scale=1.0">
<meta charset="utf-8">
<link rel="stylesheet" href="https://unpkg.com/leaflet@1.3.3/dist/leaflet.css"
integrity="sha512-Rksm5RenBEKSKFjgI3a41vrjkw4EVPlJ3+OiI65vTjIdo9brlAacEuKOiQ5OFh7cOI1bkDwLqdLw3Zg0cRJAAQ=="
crossorigin=""/>
<script src="https://unpkg.com/leaflet@1.3.3/dist/leaflet.js"
integrity="sha512-tAGcCfR4Sc5ZP5ZoVz0quoZDYX5aCtEm/eu1KhSLj2c9eFrylXZknQYmxUssFaVJKvvc0dJQixhGjG2yXWiV9Q=="
crossorigin=""></script>
<style>
#map {
    height: 100%;
}
/* Optional: Makes the sample page fill the window. */
html, body {
    height: 100%;
    margin: 0;
    padding: 0;
}
</style>
</head>
<body>
<div id="map"></div>
<script>
function perc2color(perc) {
    var r, g, b = 0;
    if(perc < 50) {
        r = 255;
        g = Math.round(5.1 * perc);
    }
    else {
        g = 255;
        r = Math.round(510 - 5.10 * perc);
    }
    var h = r * 0x10000 + g * 0x100 + b * 0x1;
    return '#' + ('000000' + h.toString(16)).slice(-6);
}
var mymap = L.map('map').setView([59.437291, 24.745194], 12);
L.tileLayer('https://api.tiles.mapbox.com/v4/{id}/{z}/{x}/{y}.png?access_token=pk.eyJ1IjoiZHZhczAwMDQiLCJhIjoiY2prczdrMDRmMTg4ejNxbG1ndXFqYjZ3biJ9.BFxa0UpSh3dHg2pmDZSDYA', {
    attribution: 'Map data &copy; <a href="https://www.openstreetmap.org/">OpenStreetMap</a> contributors, <a href="https://creativecommons.org/licenses/by-sa/2.0/">CC-BY-SA</a>, Imagery © <a href="https://www.mapbox.com/">Mapbox</a>',
    maxZoom: 18,
    id: 'mapbox.streets',
    accessToken: 'pk.eyJ1IjoiZHZhczAwMDQiLCJhIjoiY2prczdrMDRmMTg4ejNxbG1ndXFqYjZ3biJ9.BFxa0UpSh3dHg2pmDZSDYA'
}).addTo(mymap);
'''
        counter = 0
        for data_point in self.data_objects:
            # Invert the scale: the most expensive point maps to 0 (red),
            # cheaper points move towards 100 (green).
            perc = 100 - (data_point['price'] / self.max_price) * 100
            html = html + 'var circle{0} = L.circle([{1}, {2}], {{ color: perc2color({3}), fillColor: perc2color({3}), fillOpacity: 0.5, radius: 10}}).addTo(mymap);'.format(counter, data_point['lat'], data_point['lng'], perc)
            html = html + '''
circle{}.bindPopup('{}')
'''.format(counter, data_point['price'])
            counter += 1

        html += '''
</script>
</body>
</html>
'''
        return html


kv = KVBuilder()

# Rocca Al Mare
kv.get_area_objects('59.45522849665097', '24.67078002286371', '59.424680606517576', '24.566238244299257', '3')
# Haabersti
kv.get_area_objects('59.42307949118309', '24.66723950696405', '59.40779447737271', '24.614968617681825', '3')
# Mustamae
kv.get_area_objects('59.410502611228274', '24.703073819403016', '59.395211919711365', '24.65080293012079', '3')
# Kristiine
kv.get_area_objects('59.42755451066967', '24.730067570928895', '59.412271517239816', '24.67779668164667', '3')
# Kassisaba
kv.get_area_objects('59.43475678748747', '24.746488054518068', '59.42711777891242', '24.720352609876954', '3')
# Pengulinn
kv.get_area_objects('59.441852989733874', '24.733390257580368', '59.43421558315144', '24.707254812939254', '3')
# Kalamaja
kv.get_area_objects('59.44942323101878', '24.749011442883102', '59.441787533574114', '24.72287599824199', '3')
# Vanalinn
kv.get_area_objects('59.44256209001308', '24.761220858318893', '59.43492484351895', '24.73508541367778', '3')
# Kesklinn
kv.get_area_objects('59.43536130404822', '24.76802294038066', '59.42772243194037', '24.741887495739547', '3')
# Kadriog
kv.get_area_objects('59.44156934546018', '24.78900854371318', '59.433931874841804', '24.762873099072067', '3')
# Pirita
kv.get_area_objects('59.45800730477103', '24.839511295335114', '59.44273806263683', '24.787240406052888', '3')

print(kv.get_html())

# TODO
# save html to file
