One of the most critical pieces of infrastructure for a scalable e-commerce platform is its product information manager (PIM). It acts as the source of truth for product data, allows team members to add and modify product information, and makes this product information available to various distribution channels.
This post is the third in a three-part series on the technical decisions that go into building a PIM (catch up on part one here and part two here). In this post, I’ll focus on how you can make your product data available to your distribution network.
In modern e-commerce businesses, it’s common for retailers to maintain a website, mobile application, third-party marketplace integrations, and possibly physical stores as well. As your product catalog grows, ensuring that each of these channels has the correct information becomes a bigger challenge, but that’s where a PIM can help.
In this post, I’ll share some of the major technical considerations your team should make. Whether your team builds its own PIM in-house or uses one from a third party like fabric, engineers must understand the channels, output formats, and technical considerations that go into building a PIM distribution system.
While your PIM stores product data, variations, and images, customers don’t make purchases through your PIM directly. They will access your catalog through one or more distribution channels that use data from your PIM, and you need to be consistent across each of them.
“In order to retain customers, the online shopping experience as well as customer service, needs to match that of the in-store experience. Shoppers are unforgiving when it comes to disruption, and in an age where they are spoilt by choice, a seamless experience online needs to be guaranteed and consistent.” –Fahad Alamoudi, Marketing Automation Manager at Transiris Corporation
While consistency is essential to prevent confusion and unmet expectations, each channel you use will have unique distribution requirements. This makes the engineering task of creating an interface for your PIM quite challenging. Before you start building the distribution arm of your PIM, consider the channels you need to support and how your team will get data from your PIM to the channel:
Once you know all the channels your PIM needs to support, you need to figure out how product data will be kept in sync between your PIM and the distributor. Typically, your team will use a manual, automated, or hybrid approach to data synchronization, but the method varies depending on the channel.
While it’s great if you can fully automate product distribution, you don’t always get that option, and automation comes with limitations. For example, third-party marketplaces can always change their feed requirements. When this happens, your engineering team will have to update any scripts they maintain to keep the two systems in sync.
Now that you’ve decided which distribution channels you need to support and how you can get data to each, you’re ready to start designing your PIM’s distribution system. Depending on whether you need to manually or programmatically distribute data from your PIM, you’ll likely have to support one or more of the following output formats: product feed, API, and manual data export.
A product feed is a single file or list of files hosted on the internet that contains all of your PIM’s product data. You can create a single feed for all your partners and distribution channels or create a unique version of your feed for each of them.
When I worked in the textbook industry a few years ago, every one of our partners used an XML feed to send us product data. Unfortunately, each of these feeds was unique and had its own quirks based on the retailer’s software stack. This made the engineering task of pulling data and changes from the feeds quite cumbersome.
If you’re supplying a product feed to partners, they might give you their own technical requirements that you’ll have to abide by. You’ll likely need to apply some kind of adapter pattern to your application to handle this.
APIs are an integral part of most web and mobile applications, so while implementation patterns vary (REST, SOAP, GraphQL, etc.), it’s very likely that your PIM will need an API of some form. Of course, building a robust API requires a lot of work, and it doesn’t stop when you release your PIM:
“While developing and testing your API plays a big part in the process, the real work does not end here. You need to continue providing support, even before the code is deployed to production.” –Theofanis Despoudis, Senior Software Engineer, Teckro
Your PIM’s API is likely to be just one part of a suite of services required to power your e-commerce store. You’ll also need an order management system, pricing API, and a service mesh to coordinate between them.
Finally, the most widely supported yet time-intensive method of marshaling data from your PIM to your distribution channels is via manual data exports. While most PIMs should support this method, it provides limited value for large retailers.
“[Manual export] is generally only useful for test purposes. However, if your product feed contains fewer than 50 items or if it doesn’t change very often, you may consider manual upload to be a good, simple method.” –Hannah Augur
Typically, data exports allow users to download a CSV or XML file that contains some or all of your product data. If you have a lot of products, you probably need to give users a way to filter your catalog before they export the file because working with 10,000+ line spreadsheets can be troublesome.
Since most PIMs must support at least two of the above methods, the engineering challenge quickly balloons. This is one of the reasons that using a third-party PIM like Fabric can save you a lot of time and headache. Fabric gives you the ability to share data with both online and offline channels using exports or its robust e-commerce API.
Whether you support a product feed, an API, manual data exports, or all three, there are a few specific things software engineers should consider. While these are always important elements in software development, each of them has unique implications when building a PIM.
Creating good docs for your APIs and data feeds is hard. Keeping them up to date as your product catalog evolves is even harder.
“Outdated docs are the biggest pet peeve of API users… Developers often write about updates several days after rolling them out, sometimes limiting themselves to a few sentences because there’s no established process for docs updates and it’s no one’s responsibility.” –Altexsoft
What makes this documentation especially hard when building a PIM is that not all of your users will be developers. Your documentation might be used by integration partners, warehouse workers, interns, or data scientists. This further complicates the process of writing strong documentation for each of your PIM’s output formats.
All software needs logging to ensure that developers can quickly respond to issues, but a PIM also requires internal reporting and usage tracking. This helps your team see which product data is being accessed most frequently and by which channels so they can update it accordingly.
Because speed is so important in distributing your product data, you’ll likely need to implement some form of caching. I’ve typically seen flat file caches in e-commerce product feeds, but APIs might need more complex caching mechanisms that handle various input parameters. Of course, the trade-off when caching is that the data could be out of date, so you’ll also need to consider cache invalidation.
Every engineering team I’ve ever worked with underestimates the cost of building software. I’ve mentioned this in every part in this series because it accounts for up to 90 percent of the cost of running software.
“When we first started doing e-commerce integrations, we thought all the work was in getting the integration set up… In reality we were missing what would consume as much if not more time: maintaining the integration.” – Nick McHenry, OneShop Retail
If you’re going to build your own PIM and distribute products to multiple channels, you have to realize that each of these channels means a new integration cost that just starts the day your software goes live. You’ll be paying to maintain each channel for years to come.
Now that you’ve seen how to store your product data, how to let your team manage it, and how to make your product data available, you’re ready to decide whether building a PIM is truly worth the investment.
If you have an engineering team on hand and you’re ready to incur the cost of building and maintaining a significant new software product, then building a PIM in-house will give you a lot of flexibility. You’ll be able to pick the pieces that you need, optimize them for the performance you expect, and customize data feeds for each of your distribution channels. On the other hand, you’ll likely spend months developing the product.
Using an off-the-shelf PIM from a SaaS provider like Fabric can save you months of development time and untold monthly maintenance costs. With Fabric, you get a robust data storage and optimization system, an interface for your team to edit products manually or programmatically, and an API so you can distribute your product data to any number of channels. While you won’t get to build your data model from scratch, Fabric supports product families, categories, and taxonomies so you can ensure your catalog scales as your product lines evolve.
Tech advocate and writer @ fabric. Previously CTO @ Graide Network.