
Fixing MLS Data Chaos: How to Design RESO-Compliant Real Estate Integrations

Your MLS integration works perfectly in staging. Listings are syncing, photos are loading, statuses are updating in near real-time. You launch to production. For the first week, everything looks fine. Then, on a Tuesday morning, a broker calls: a listing that went active yesterday still shows as “Coming Soon” on the platform. An agent refreshes repeatedly. The status doesn’t change. Nobody on the engineering team got an alert. The import job ran, returned a 200, and logged zero errors – but forty listings across two MLS boards silently failed to update because a field mapping broke when one board deployed a schema change at 2am.

This is the scenario that every development team building on MLS data eventually encounters. And it’s almost never a dramatic failure – no 500 errors, no database crashes, no obvious moment of breakage. It’s a quiet, invisible degradation of data accuracy that agents experience as “the system doesn’t work” and that leads to exactly the kind of trust erosion that causes platforms to lose engagement before the engineering team even knows the problem exists.

Building MLS integrations that survive production is not primarily a question of knowing the API – it’s a question of designing for the ways the API will fail, drift, and behave unexpectedly across dozens of boards over years of operation. That’s what this post is about.

RETS vs RESO Web API: Where Things Actually Stand in 2025

If you’re still maintaining a RETS integration, the clock has essentially run out. RESO’s Data Dictionary 2.0 standard passed in April 2024, and by NAR’s own requirement, all MLSs operated by REALTOR associations were required to certify to the new standard within one year – meaning the April 2025 deadline has now passed. WARDEX, on CoreLogic’s Trestle platform, was the first MLS in the country to achieve RESO Data Dictionary 2.0 certification, and CoreLogic committed to having all Trestle-connected MLSs certified well ahead of deadline. RETS support across major data providers is being retired: multiple MLS software vendors have formally ended RETS support as of late 2024 and early 2025.

The practical implication is straightforward: any new MLS integration should be built exclusively on the RESO Web API. Any existing RETS integration that hasn’t been migrated is now operating on deprecated infrastructure with an increasingly short support window. The migration is not trivial – the data model is different enough that a RETS-to-Web-API migration often requires rethinking the field mapping layer from scratch rather than just swapping the transport protocol – but it’s work that has to happen regardless.

The RESO Web API is built on OData over REST with OAuth2 authentication. If you’ve integrated with modern third-party APIs in any other context, the base architecture will feel familiar. What won’t feel familiar is the real estate-specific complexity layered on top: the fragmentation across boards, the field naming conventions that are standardized in theory but variable in practice, the custom fields that individual MLSs add beyond the Data Dictionary, and the data licensing requirements that vary by board and by access tier. The API is the easy part. The ecosystem around it is where the work is.

Understanding Regional Fragmentation: Why One Integration Is Never Enough

There are over 500 active MLSs in the United States. They share a common standard – the RESO Data Dictionary – but they do not share common data. Each MLS governs its own market, maintains its own listings, and configures its own implementation of the standard. The standardization that RESO provides is real and meaningful: field names like ListPrice, BedroomsTotal, StandardStatus, and ModificationTimestamp are consistent across boards in ways they weren’t in the RETS era. But the edges – the places where boards implement optional fields, custom enumerations, and local business rules – are where fragmentation still creates significant integration work.

Property status is the canonical example. RESO’s Data Dictionary defines StandardStatus with a fixed set of values: Active, Active Under Contract, Canceled, Closed, Coming Soon, Delete, Expired, Hold, Incomplete, Pending, Withdrawn. But individual MLS boards also expose their own MlsStatus field, which reflects the board’s local taxonomy. In CRMLS, the nation’s largest MLS with over 110,000 subscribers across 39 associations, MlsStatus values include “Active,” “Back On Market,” “Hold Do Not Show,” “Pending,” and others that don’t map cleanly to StandardStatus without board-specific translation logic. In NTREIS (North Texas), the local status taxonomy uses different terminology again. In Bright MLS (Mid-Atlantic), the convention is different once more.

If your platform displays listing status to users – and nearly every platform does – you need a mapping table for each board that translates the board’s local MlsStatus values into your internal status model. This table needs to be maintained: when a board adds or renames a status value (which happens more frequently than you’d expect, especially after mergers between boards), your mapping breaks silently unless you have monitoring that alerts on unmapped enumeration values.
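A minimal sketch of what such a mapping table and its unmapped-value tracking might look like. The board identifiers, the internal status names, and the translate_status helper are illustrative assumptions, not values taken from any board’s documentation; in practice the table would be loaded from configuration rather than defined inline.

```python
from collections import Counter
from typing import Optional

# Hypothetical per-board mapping from local MlsStatus to an internal status model.
STATUS_MAP = {
    "board_a": {
        "Active": "active",
        "Back On Market": "active",
        "Hold Do Not Show": "hold",
        "Pending": "pending",
    },
    "board_b": {
        "Active": "active",
        "Pending": "pending",
    },
}

# Counts of (board, raw value) pairs that had no mapping; feed this to alerting.
unmapped = Counter()


def translate_status(board: str, mls_status: str) -> Optional[str]:
    """Return the internal status, or None (and record the miss) if unmapped."""
    internal = STATUS_MAP.get(board, {}).get(mls_status)
    if internal is None:
        unmapped[(board, mls_status)] += 1
    return internal
```

The point of the counter is that an unknown value becomes a measurable event instead of a silent default, so a renamed status shows up in monitoring on the first sync run that sees it.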

Field availability is the other fragmentation dimension that causes production problems. The RESO Data Dictionary defines hundreds of fields, but no MLS exposes all of them, and individual boards expose custom fields that aren’t in the dictionary at all. School district, flood zone, HOA details, solar panel status, ADU presence – these are all data points that buyers and platforms care about, but their availability and naming vary significantly across boards. When you’re designing a search filter or a listing display template, you can’t assume that a field that exists in CRMLS also exists in Bright MLS or NTREIS. Your application layer needs to handle gracefully the case where a field is absent, and your field mapping layer needs to be queryable – so the engineering team can see, for a given board, which fields are present and at what fill rate.
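One way to make field availability queryable is to compute a per-board fill rate over a sample of synced records. The sketch below assumes records are plain dictionaries keyed by RESO field names and that each carries OriginatingSystemName; the fill_rates function is a hypothetical helper, and treating empty strings and empty lists as missing is an assumption you may want to adjust.

```python
from collections import defaultdict


def fill_rates(records, fields):
    """Per board, the fraction of records where each field is present and non-empty."""
    totals = defaultdict(int)
    filled = defaultdict(lambda: defaultdict(int))
    for rec in records:
        board = rec["OriginatingSystemName"]
        totals[board] += 1
        for field in fields:
            # None, empty string, and empty list all count as "not populated".
            if rec.get(field) not in (None, "", []):
                filled[board][field] += 1
    return {
        board: {field: filled[board][field] / totals[board] for field in fields}
        for board in totals
    }
```

A search filter can then be enabled per board only when the relevant field clears a fill-rate threshold you choose.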

The ListingKey vs ListingId Problem That Trips Up Every Team Eventually

There’s a specific technical mistake that appears in a surprising number of MLS integrations, and it’s worth naming directly: using ListingId as your primary database key for listings instead of ListingKey.

ListingId is the identifier that agents and brokers recognize – it’s the MLS number they reference when discussing a listing. ListingKey is the unique identifier that the MLS system assigns to the record itself. The critical distinction, documented explicitly in the Trestle API documentation, is that ListingId is not guaranteed to be unique across MLSs. Two different boards can independently assign the same ListingId value to two completely different properties. If your database uses ListingId as a unique key and you’re syncing from multiple boards, you will eventually get a collision – one listing’s data silently overwriting another’s – with no error, no warning, and no way to detect the problem until an agent notices that a listing in Dallas is showing Phoenix property details.

ListingKey is unique within a board, not across boards. To create a genuinely unique identifier across boards, the standard pattern is a composite key: OriginatingSystemName (the board identifier) combined with ListingKey, either as a two-column key or as a single string joined with a delimiter that cannot appear in either value. This composite becomes your internal record identifier. It’s a simple decision, but retrofitting it after you’ve already built a database on ListingId is a multi-day migration with associated risk. Make it on day one.
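A small sketch of the composite key, assuming records arrive as dictionaries carrying both RESO fields. The listing_uid name and the colon delimiter are illustrative choices, not part of any standard; a two-column database key works equally well.

```python
def listing_uid(record: dict) -> str:
    """Build a cross-board identifier from OriginatingSystemName and ListingKey."""
    system = str(record.get("OriginatingSystemName") or "").strip()
    key = str(record.get("ListingKey") or "").strip()
    if not system or not key:
        raise ValueError("OriginatingSystemName and ListingKey are both required")
    # Assumed delimiter; pick one that cannot occur in either component.
    return f"{system}:{key}"


# Two boards that happen to reuse the same ListingKey and ListingId values.
dallas = {"OriginatingSystemName": "board_a", "ListingKey": "1001", "ListingId": "555"}
phoenix = {"OriginatingSystemName": "board_b", "ListingKey": "1001", "ListingId": "555"}
```

With this helper the two sample records resolve to different identifiers even though their ListingId and ListingKey values match, and a record missing either component fails loudly instead of being stored under a partial key.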

Sync Architecture: Polling vs Webhooks, and What Actually Works

The question of how frequently to sync MLS data, and through what mechanism, is where most MLS integration architectures make tradeoffs that come back to haunt them.

The RESO EntityEvent resource – a sequenced log of record changes that data consumers can replicate from – is a formal RESO endorsement, and some forward-thinking boards now pair it with webhooks that fire in near real-time when listings are created, modified, or deleted. Where webhooks are available, they’re the right choice for any data category that users interact with directly: listing status, price changes, new listings. The latency between a listing going Pending on the MLS and that status appearing on your platform can drop from fifteen minutes to under sixty seconds.

The reality, though, is that webhook support is still inconsistent across the MLS landscape. Many boards – particularly smaller regional ones that haven’t yet modernized their infrastructure – support polling only. Even among boards that nominally support the EntityEvent endorsement, reliability and delivery guarantees vary. The pragmatic architecture for any platform that needs to cover multiple markets is a polling-first design with webhook augmentation: a robust polling infrastructure handles the baseline sync across all boards, and webhook subscribers are layered on top for boards that support them, providing faster updates where available without making the polling layer redundant.

For polling, the ModificationTimestamp filter is the standard approach. Each sync run queries the board for all records where ModificationTimestamp is greater than the last successful sync timestamp, processes the delta, and advances the watermark. This requires storing the watermark reliably – if the watermark advances before processing completes and the job fails mid-run, you’ll miss records. The safe pattern is to advance the watermark only after successful processing of the full delta, with a small overlap window (typically five to ten minutes behind the true last-sync time) to account for clock skew between your system and the board’s servers. Because the overlap re-fetches records you have already processed, the write path has to be an idempotent upsert keyed on the record identifier.
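The watermark pattern can be sketched as follows. This is a simplified illustration, not a complete sync job: fetch and upsert are hypothetical callables you would supply (an HTTP client that pages through results, and an idempotent database write), the five-minute overlap is an assumed value, and paging and persistence of the watermark are left out.

```python
from datetime import datetime, timedelta, timezone

OVERLAP = timedelta(minutes=5)  # assumed overlap window for clock skew


def build_filter(watermark: datetime) -> str:
    """OData $filter selecting records modified after (watermark - overlap)."""
    since = (watermark - OVERLAP).astimezone(timezone.utc)
    return f"ModificationTimestamp gt {since.strftime('%Y-%m-%dT%H:%M:%SZ')}"


def run_sync(watermark: datetime, fetch, upsert) -> datetime:
    """Process one delta and return the new watermark.

    Any exception from fetch or upsert propagates, so the caller never
    receives (and never persists) a watermark for a partially processed run.
    """
    newest = watermark
    for record in fetch(build_filter(watermark)):
        upsert(record)  # must be idempotent: the overlap re-sends records
        stamp = record["ModificationTimestamp"].replace("Z", "+00:00")
        newest = max(newest, datetime.fromisoformat(stamp))
    return newest
```

The caller stores the returned value only after run_sync returns normally; a failure mid-run leaves the old watermark in place and the next run simply repeats the same window.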

Rate limiting is where polling architectures most frequently break in production. Trestle, which aggregates data from over 100 MLSs through a single API endpoint, enforces a documented quota of 7,200 API queries per hour and 180 queries per minute for WebAPI requests, with a separate quota of 18,000 requests per hour and 480 per minute for media (photo) URLs. Those limits are per connection per product type – if you have multiple MLS connections through Trestle, each gets its own quota. But they’re still limits, and a sync job that doesn’t respect them will start receiving 429 rate limit responses.

The failure mode that matters here isn’t the 429 itself – that’s visible and handleable. It’s the behavior of a sync job that doesn’t handle 429s correctly. A job that treats a rate limit response as a failure, increments an error counter, and immediately retries will burn through your remaining quota in seconds and put the job into a retry loop that holds the quota at zero for the rest of the hour. The correct handling is exponential backoff with jitter: when a 429 is received, pause for an initial delay (typically 30-60 seconds for MLS integrations) plus a small random offset, then retry. The random offset is the jitter, and it keeps parallel jobs from retrying in lockstep. If the retry also returns a 429, double the delay and retry again, with a cap on maximum delay and maximum retry attempts before the job parks the batch and moves on.
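A minimal sketch of that retry behavior. The request callable (returning a status code and a body), the 25% jitter factor, the ten-minute cap, and the five-attempt limit are all assumptions chosen for illustration; handling of other error statuses is deliberately left out.

```python
import random
import time


def backoff_delays(initial=30.0, cap=600.0, max_attempts=5, rng=random.random):
    """Yield capped exponential delays, each with up to 25% added random jitter."""
    delay = initial
    for _ in range(max_attempts):
        yield min(delay, cap) * (1 + 0.25 * rng())
        delay *= 2


def call_with_backoff(request, sleep=time.sleep, **kwargs):
    """Call request() until it returns a non-429 status.

    request is a hypothetical callable returning (status_code, body).
    Returns the body, or None when retries are exhausted so the caller
    can park the batch and move on.
    """
    status, body = request()
    for delay in backoff_delays(**kwargs):
        if status != 429:
            return body
        sleep(delay)
        status, body = request()
    return body if status != 429 else None
```

Injecting sleep and rng keeps the function testable without real waiting. If the provider sends a Retry-After header on its 429 responses, honoring that value in place of the computed delay is usually the better choice.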

Field Mapping: The Layer Nobody Wants to Own

Field mapping is the operational work that makes MLS integration sustainable over time, and it’s consistently underestimated in project planning because it looks like a configuration task rather than an engineering task. At two boards, it is a configuration task. At twenty boards, it’s a system.

Each board’s data needs to be mapped to your internal property schema in several dimensions: field presence (which fields does this board expose?), field naming (is the field named the same as the RESO standard, or does it use a board-specific name?), enumeration mapping (what are the valid values for this field on this board, and how do they map to your internal taxonomy?), and data quality (what percentage of records on this board have this field populated, and is the populated data reliable enough to use in search filters?).

The mapping layer needs to be stored in the database as configuration, not hardcoded into the application. When a board changes a field name or adds a new enumeration value – which you will not be notified about; you’ll discover it when your monitoring detects unmapped values – the fix should be a configuration update, not a code deploy. The schema for this configuration typically includes: board identifier, source field name, target internal field name, transformation function (if any – e.g., converting square footage from a string with commas to a clean integer), and active/inactive flag.
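To make the configuration schema concrete, here is a sketch of mapping rows and a function that applies them. The FieldMapping class, the TRANSFORMS registry, and the sample field names are illustrative assumptions; in production the rows would come from a database table rather than being built in code.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

# Named transformations that configuration rows refer to by string.
TRANSFORMS: Dict[str, Callable[[Any], Any]] = {
    # "2,150" -> 2150
    "int_from_commas": lambda value: int(str(value).replace(",", "")),
}


@dataclass(frozen=True)
class FieldMapping:
    board: str
    source_field: str
    target_field: str
    transform: Optional[str] = None
    active: bool = True


def apply_mappings(board, raw, mappings):
    """Translate one raw board record into the internal schema."""
    out = {}
    for m in mappings:
        # Skip rows for other boards, disabled rows, and absent source fields.
        if m.board != board or not m.active or m.source_field not in raw:
            continue
        value = raw[m.source_field]
        if m.transform:
            value = TRANSFORMS[m.transform](value)
        out[m.target_field] = value
    return out
```

Because transformations are referenced by name, changing a board’s mapping means editing a row; only a genuinely new kind of transformation needs a code change.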

Data Dictionary 2.0, which all NAR-affiliated MLSs were required to certify to by April 2025, improves this situation meaningfully. The 2.0 standard increases the density and consistency of standardized fields that boards are required to expose, and the RESO Validation Expressions feature – machine-readable business rules baked into the API – allows platforms to automatically enforce data quality constraints at ingestion time. A listing that comes in with a $0 list price should be flagged by validation rules before it hits your search index, not after an agent calls to report a $0 listing appearing in search results. Building validation as part of the ingestion pipeline, not as a downstream check, is the architecture that keeps garbage data out of the system before it causes user-facing problems.
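As a rough illustration of validation at ingestion time, the sketch below uses hand-written rules. It does not parse or implement the RESO Validation Expressions syntax; the validate_listing and partition helpers and the specific rules are assumptions chosen to match the $0 list price example.

```python
def validate_listing(record):
    """Return a list of problems; an empty list means the record is clean."""
    problems = []
    price = record.get("ListPrice")
    if not isinstance(price, (int, float)) or price <= 0:
        problems.append("ListPrice must be a positive number")
    if not record.get("ListingKey"):
        problems.append("ListingKey is missing")
    if not record.get("StandardStatus"):
        problems.append("StandardStatus is missing")
    return problems


def partition(records):
    """Split records into (clean, quarantined) before anything reaches the index."""
    clean, quarantined = [], []
    for rec in records:
        problems = validate_listing(rec)
        if problems:
            quarantined.append((rec, problems))
        else:
            clean.append(rec)
    return clean, quarantined
```

Quarantined records keep their list of problems attached, which gives the data team something actionable to review rather than a bare rejection count.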

Monitoring: The Infrastructure That Distinguishes Production-Ready Integrations

The gap between an MLS integration that works in the first three months and one that still works reliably in year three is almost entirely explained by monitoring. Without active observability into the health of each board’s sync, you don’t know what you don’t know – and MLS integrations fail in ways that are invisible to application error monitoring.

The monitoring layer for a production MLS integration needs to track, at minimum: sync run completion per board (did the job run, did it complete, how long did it take), record count delta per run (if a board that normally syncs 50 updates per day suddenly syncs zero, that’s not normal – it’s either a connection failure or a board outage), photo import queue depth and processing lag (photos are a separate pipeline from listing data and fail independently), rate limit consumption per board per hour (approaching the quota ceiling before market close is a signal to adjust job scheduling), unmapped field values (new enumerations that appeared in the feed but haven’t been mapped yet), and the age of the most recently modified listing per board (a board that hasn’t sent a ModificationTimestamp update in three hours during business hours is probably experiencing an issue on their end).
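Two of those checks, the record count delta and the age of the most recently modified listing, can be sketched as below. The sync_health function, the 10% baseline ratio, and the input shapes are assumptions for illustration; the three-hour staleness threshold follows the example in the text.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean


def sync_health(board, recent_counts, latest_count, newest_modification, now,
                max_age=timedelta(hours=3), min_ratio=0.1):
    """Return warning strings for one board's latest sync run.

    recent_counts: record counts from previous runs, used as a baseline.
    newest_modification: latest ModificationTimestamp seen for the board.
    """
    warnings = []
    baseline = mean(recent_counts) if recent_counts else 0
    if baseline > 0 and latest_count < baseline * min_ratio:
        warnings.append(
            f"{board}: {latest_count} records vs baseline {baseline:.0f}"
        )
    if now - newest_modification > max_age:
        warnings.append(f"{board}: newest ModificationTimestamp is stale")
    return warnings
```

A real deployment would compare against a baseline for the same hour and weekday, since listing activity is strongly time-dependent.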

Trestle’s documentation notes that listing data should be updated within five minutes of source MLS change, and images within fifteen minutes. When your monitoring shows that a board’s average update latency has drifted to forty-five minutes, that’s a conversation to have with Trestle support – with timestamps and a specific board identifier – before the delay becomes visible to agents on your platform.

Alerting needs to be calibrated differently for different failure types. A complete board outage should page someone immediately; a data quality degradation (higher-than-normal percentage of listings missing key fields) should go into a daily digest for the data team rather than waking up an engineer at 3am. The escalation path for a board that hasn’t synced in six hours on a weekday is different from the escalation path for a board that hasn’t synced over a holiday weekend when many boards suspend operations. These distinctions need to be encoded in the alerting configuration rather than leaving the on-call engineer to make judgment calls in the middle of the night.
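One way to encode those distinctions is a small routing table. Everything in this sketch is an assumption made for illustration: the alert kinds, the destination names, and the 24-hour holiday threshold are not drawn from any board’s policy, and only the six-hour weekday figure comes from the example above.

```python
# Hypothetical destinations for alert kinds that need no further context.
ROUTES = {
    "board_outage": "page",
    "data_quality": "daily_digest",
}

# Hours without a sync before a stale board pages someone, by calendar type.
STALE_PAGE_AFTER_HOURS = {"weekday": 6, "holiday": 24}


def route_alert(kind, hours_since_sync=0.0, calendar="weekday"):
    """Pick a destination for an alert without on-call judgment calls."""
    if kind == "sync_stale":
        limit = STALE_PAGE_AFTER_HOURS[calendar]
        return "page" if hours_since_sync >= limit else "daily_digest"
    # Unknown kinds default to the digest rather than waking someone up.
    return ROUTES.get(kind, "daily_digest")
```

Keeping the thresholds in data rather than in conditionals means the holiday calendar can be adjusted per board without touching the routing logic.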

IDX vs VOW: Compliance Requirements That Shape Your Data Access

How you access and use MLS data is governed by your feed type, and choosing the wrong feed type for your use case creates legal exposure that can result in loss of MLS access – the kind of infrastructure risk that’s worth understanding clearly before you build.

IDX (Internet Data Exchange) access is the standard feed type for displaying active listings on consumer-facing websites. It covers active listings from the board and participates in inter-board data sharing agreements. IDX has specific display requirements: certain listing attribution fields (listing agent name, listing broker name, the MLS’s IDX copyright notice) must be displayed with each listing, and there are rules about how long listings can remain displayed after they’ve been removed from the MLS. IDX does not include sold data in most implementations – agents can display active listings to consumers, but historical closed sale data requires a separate data product.

VOW (Virtual Office Website) access is broader: it allows display of additional data categories (some sold data, some off-market data) to registered users who have established a broker-consumer relationship with the brokerage and accepted its terms of use. If your platform involves users signing up and registering before accessing full listing data, VOW may be appropriate – but it comes with additional compliance requirements around user registration and data usage that IDX doesn’t have.

The distinction matters architecturally because the compliance requirements aren’t just display rules – they determine what data your platform is licensed to store, how long you can retain it, and what you can do with it downstream. Using an IDX feed to power a data analytics product that stores historical listings beyond the permitted retention period is a license violation regardless of whether anyone notices in the short term. Building the data architecture with the feed type’s compliance constraints as first-class requirements – not as an afterthought – is the difference between a platform that can scale without legal exposure and one that has a hidden compliance risk embedded in its infrastructure.

The Aggregator Question: Build Direct or Use a Platform Like Trestle?

For most real estate platforms that need multi-market MLS coverage, the decision between building direct integrations with each board versus using a data aggregation platform like Trestle (now operated by Cotality, formerly CoreLogic) is one of the most consequential early architecture choices.

Building direct integrations gives you maximum control: direct relationships with each board, no intermediary terms of service, and the ability to negotiate specific data access requirements that aggregators’ standard products don’t accommodate. It also means managing dozens of individual licensing agreements, data feeds, authentication credentials, and monitoring configurations – a meaningful operational overhead that grows with every market you add.

Trestle and similar aggregators (Bridge Interactive is another) provide normalized data access across 100+ MLSs through a single API endpoint and a single licensing process. Trestle’s documentation commits to listing updates within five minutes of source MLS change and image updates within fifteen minutes – a standard that would require significant engineering to achieve independently across a large number of direct connections. The tradeoff is that you’re adding a layer of dependency: when Trestle has infrastructure issues, every connected MLS is affected simultaneously, and you’re subject to Trestle’s own terms of service constraints in addition to the underlying MLS agreements. Trestle is also in the process of migrating its API URL to a new host before the end of 2025 – the kind of infrastructure change that requires engineering time regardless of whether it affects functionality.

The decision framework we use is roughly this: if you need fewer than ten MLS boards and the markets you’re targeting have boards that support direct RESO Web API access cleanly, build direct – the control and simplicity of direct relationships are worth more than the convenience an aggregator adds. If you need broad national coverage across dozens of markets, or if you need to onboard markets rapidly as the business grows, the aggregator approach is almost certainly the right choice despite the dependency it introduces.

Either way, the integration architecture principles are the same: asynchronous processing, explicit rate limit handling, field mapping as configuration, monitoring with board-level granularity, and compliance constraints embedded in the data model from the start. Those don’t change based on how you’re connecting – they’re the requirements of running on MLS data in production at any meaningful scale.

If you’re building a platform that depends on MLS data reliability – whether it’s a brokerage CRM syncing agent listings or a marketplace aggregating inventory across multiple markets – the architectural decisions we’ve covered here are the ones we work through in every integration engagement. We’ve designed real estate brokerage software and marketplace infrastructure for clients operating across dozens of MLS boards, and the patterns that hold up under load are consistently the same: decouple sync from display, design for the failure cases first, and treat field mapping as a system rather than a configuration file. If you’re hitting the limits of your current integration or building from scratch, let’s talk through the architecture.

vikas patel

Published by

vikas patel

3 months ago
