CMD delivered a cloud native data warehouse that combines RateMyAgent’s (RMA) global data into a single, consistent data mart to enable enhanced reporting and advanced data science use cases.

Overview

RateMyAgent provides a platform where property buyers and sellers can review the agents and brokers they’ve worked with generating ratings which other sellers in the market can use to make better-informed decisions when choosing the right agent for them. The ratings are based on recent reviews with presence across Australia, New Zealand and the United States.

The challenge

RMA have a rich set of data contained in RDS application databases across the AWS Sydney and California regions but no centralised store of the data to query and report on at a global level. They have a suite of Tableau reports that are constructed by running targeted queries at the source databases meaning scalability and the time to deploy new reports were a problem due to the amount of manual transformation required. The amount of manual processing required caused the RMA data team to struggle with report accuracy and consistency of results, which was eroding trust in reporting across the business.

The solution

CMD built a solution that continuously streams changes out of source RDS databases and then cleans, anonymises and centralises data during an extract, transform and load (ETL) process that outputs a single unified analytics schema. In order for global data to be centralised into a single region Snowflake data warehouse, all Personally Identifiable Information (PII) was hashed and encrypted to protect data sovereignty while retaining most of its useful features.

To reduce operational overhead and total cost of ownership (TCO) while also making the solution highly scalable, serverless tools were favoured and deployments fully automated.

The solution contains key data zones to define clean and prepared data as it moves from raw to a single type 2 slow changing dimension (SCD) data warehouse as the “single source of truth” where all reports and analysis are derived from.

Services deployed

AWS DMS

AWS Glue

AWS S3

AWS Lambda

Amazon SNS

Amazon Athena

The result

CMD helped RMA consolidate and structure all of their global data into a single data warehouse for the first time. This revolutionised the speed and accuracy of reports delivered to their executives via Tableau and other channels. All ETL pipelines are automated meaning RMA data analysts don’t have to do any of the manual transforms they used to do and can concentrate on delivering high value reports and analytics. RMA could quickly rework their existing reports to be driven from the new platform, rapidly got through their reporting backlog and onto more advanced use cases, such as data APIs and machine learning. Furthermore, RMA were able to start reporting on trends and develop reporting looking at change across longer time horizons, rather than just point in time reporting.