Scraping real estate prices using python and visualization using maps

TL;DR

An interactive map, accurate as of 13/08/2018 showing property prices per square meter in various areas of Tallin:

https://dvas0004.github.io/TallinnRealEstate/

Data shown is for 3-bedroom apartments (resource limitations). Green is less expensive, red is more expensive. Clicking on a data point will show a popup containing the actual price per square meter for that data point


 

As any house/apartment hunter knows, finding the perfect place to call home is an arduous and drawn-out process. In this show-and-tell article I’ve used python to scrape data from one of the most popular Estonian real-estate sites (https://kv.ee) and display the median price per square meter at different locations across Tallinn:

tallin_property_1

The above is a screenshot of the final result, which you can browse here:
https://dvas0004.github.io/TallinnRealEstate/

Note: the map only shows results for 3-bedroom apartments due to resource limitations. Green is cheaper, red is more expensive

Tip: click on the individual data points to display a popup showing the actual price per square meter.

Technical description

The actual code is posted at the end of this article. The main ingredients for this script were the python “requests” and “requests_html” modules. Admittedly, I could have used just one module but I did want to try out the HTML parsing capabilities of the requests_html module. For simplicity’s sake, the script outputs a static HTML file which can then be loaded into the browser or github pages like I did above. A more sophisticated approach would be to use a Python web framework like Flask to host the web page directly.

Scraping the data involved inspecting the web traffic between the browser and KV.EE, specifically when using the “Search by Map” functionality on the site. One the appropriate search filters are set, and the map is centered around the area you’d like to search within, pressing the “search” button issues a request via a URL similar to that shown on line 24 in the code below. The parameters I was particular interested in were the parameters describing the map area to search:

  • nelng / nelat : north east longitude / latitude (the top right corner of the map)
  • swlng / swlat : south west longitude / latitude (the bottom left corner of the map)

This allows us to issue different requests for the areas within which we’d like to scrape data, as was done in lines 154-185 from the code snippet below.  The “get_area_objects” class method gets a list of object IDs representing apartments, and their corresponding co-ordinates.

At this stage, we have the co-ordinates for the apartments, but we need to get their price and area in order to calculate their price per meter squared. This is what the “get_object_details” class method does – and it is here that requests_html really shines since it makes it very easy to extract the data we require

In the final stage, the “get_html” method uses Leaflet to build a map over which we display our data – circles representing the price per meter. I used an elegant JavaScript function (perc2color) in line 111 to convert from number/price to color

Advertisements

Lessons learned: Spring Data Postgres application configuration

Scenario: During development of a Spring Boot application, with a PostgreSQL backend, we randomly observe errors such as:

hikaripool-1 - connection is not available, request timed out after 30000ms

 

FATAL: remaining connection slots are reserved for non-replication superuser connections

 

Solution:

There are a few checks to perform:

  1. Ensure that you use the correct Spring annotations to support the JpaRepository, such as the @Transactional and @Modifying annotations.  These help in making sure connections are closed and flushed
  2. Pay special attention to places in your code where long running or computationally heavy database queries are made. You can check which queries are currently active on your PostgreSQL database by logging into the psql command line and running the command: “select * from pg_stat_activity“. Ideally, most of the connection slots displayed should be in the “idle” state.
  3. The definitive solution in our case was to modify the connection pool properties in the application.properties file of the Spring Boot application, as follows:
spring.datasource.hikari.minimumIdle=5
spring.datasource.hikari.maximumPoolSize=20
spring.datasource.hikari.idleTimeout=30000
spring.datasource.hikari.poolName=SpringBootJPAHikariCP
spring.datasource.hikari.maxLifetime=2000000
spring.datasource.hikari.connectionTimeout=30000