A Piece of the Py: Overcoming Obstacles

In this post I'm going to talk about overcoming obstacles and technical debt, while focusing on realizing business goals in a timely manner.

Technical Debt is closely related to the business term Total Cost of Ownership or "TCO". TCO in financial terms is a methodology to estimate the full monetary cost associated with a decision. Technical debt is closely related in that technical decisions including hires, technology, platforms, outsourcing, and development all have internal and external costs associated with them.

A lot of us know this, so what's new? I think one of the insights I'm hoping to share in this article is when it comes to debt, we can't be too idealistic. In an ideal world technology should never "get in the way" of progress at a business. Sometimes, if you have an open mind responding to technical debt can open up new possibilities for growth and profitability.

In this article I hope to show you how we were able to organize our goals to meet business objectives, grow revenue, and move towards solving the overall problem of replacing an aging infrastructure.

My Story

In my previous roles I've been the architect of a lot of software to deal with data-problems and I've had to integrate a lot of disparate technology. During my time in the motorcycle industry I dealt with every corner of the business, shipping, receiving, eCommerce, analytics, inventory, search, ad-integration, etc.

After a year at the business and witnessing the organization sort of coast on the success of their custom built eCommerce platform for quite a few years it was clear that an upgrade was needed. The solution to this problem was somewhat obvious - build or adopt a new platform! There was a problem though; the labor, attention, and manpower needed to accomplish this task I estimated to far exceed what the ownership of the organization was comfortable allocating to a single project. And really there was a much greater problem, things related to this old system were tightly coupled and even more legacy, tightly bound, impossible to debug/understand code had sprang up from a single developer over the course of several years. The software had been modified to work for it's current application and dated back to 5 or 6 years from it's origin. Attempting to even run the PHP3 application on any modern version of PHP / Mysql lead to deprecation warnings and strange crashes. Hopefully I'm painting a picture of the predicament we were in.

So there I was between old, hastily written code and the need to build for the future and continued growth. We needed to replace or rebuild many of the dependencies around this platform before we could even hope to retool the eCommerce site itself. Over the first year at the business I had become familiar enough with the platforms and dependencies that I had recognized several areas that needed to drastically improve if we hoped to tackle the eCommerce project. Inventory and catalog management was first on the list.

Inventory and ETL

Inventory management was a confusing array of databases and scripts, but ultimately boiled down to this...

Our 14+ suppliers would provide us spreadsheets in various formats (TAB-Delimited, CSV, Excel 2000, Excel 2007, etc). Each spreadsheet needed to be merged with the previous to form a complete representation of items. Items removed from the spreadsheet usually were no longer for sale and were probably discontinued. Any local stock for these discontinued items could be sold off without worrying about the manufacturers pricing policies.
In addition to this different suppliers gave quantities for different warehouses across the US. These values needed to be combined to get an accurate representation of total on-hand supplier quantity.

The existing process required someone to painstakingly upload each spreadsheet into a web interface and then sit by as each process ran. If there was a problem with one of the files or an inconsistency the process would exit without warning and you'd have to start again or get one of the developers to take a look. This happened often. The inventory process sometimes exceeding 12 hours in total between all suppliers due to poorly optimized PHP code and Mysql queries that had evolved into a tangled mess over many iterations by developers past. There were other complications, such as poor distinctions between what was "market management" and what was "inventory management". The ERP system that acted as the businesses cash-register also needed to be updated. The ERP system in question had an import methodology but this to took many hours for the number of items in our supplier feeds. In the end we built a custom SQL solution that automated this process and kept the ERP inventory database in sync. The full story we will save for another day.

The solution to this mess was writing new code, that normalized the inputs to strong data types. It also detected empty rows, the differences between current and previous spreadsheets, automatically detected column data-types, and would also read column-labels (headers) as often as possible. If a column was missing the program would do it's best to extract the data it needed from the spreadsheet. In the new system the processing of the spreadsheets was detached and separate from committing the data to the live inventory database. In addition we stored delta's so we could undo a days inventory changes for each supplier if they were found to be unreliable or in error. The last step was implementing a web based interface in which users could browse and edit inventory and set exclusion or blackout dates for specific brands or regions, etc. This improved the inventory in many ways, when we were told of warehouse blackout dates in Nevada a user could use the administrative interface
to mark a supplier's warehouse as unavailable for that day. When we were done it took around 2 minutes to run inventory, and each supplier process ran on-demand and in parallel with eachother.

The Fruits of our Labor

Once we had a reliable inventory solution a long-standing problem could now be fixed.

Some of the businesses inventory was on-demand, meaning the customer ordered it from us, and we ordered it from a much larger supplier that was just 1 state away. Motorcycle exhausts and other items that were large, cumbersome, and expensive to carry usually fell into this category. These items had a tremendous profit margin, and a high level of consumer desirability!

We knew we needed to offer these items for sale, but because our suppliers only carried 1 or 2 in stock we had no way of reliably predicting if the item would stay in stock.
It was clear to me a system needed to exist that used previous inventory values to predict the availability of supplier inventory.

Building the Solution

I began exploring database solutions, and considered using Postgres to store integer arrays and possibly write server-side logic in one of it's scripting languages to perform calculations on the data. This made sense because we used Postgres heavily for a lot of data-needs and archival as well. I however, wasn't pleased with the fact I was relying on a database, had to load the database with information, and couldn't run the process in a distributed (map-reduce / chunked) manner.

With this insight, I moved to an on-disk solution using Numpy/Pytables. With Pytables I could use numexpr to perform in-kernal queries for data and build out numerical quantities using space-efficient small-integer fields or arrays. From here it was fairly simple to write a distributed algorithm to run through historical numpy arrays of integers and perform analysis.

Through brand, and categorical meta-data I was able to build statistics per item, per-brand-category, and per-brand. From this data historical statistics were built for overall velocity, monthly, and weekly velocity, and per category seasonality. From this suite of available statistics I built 3 different models to calculate confidence based on inventory and sales history or lack there of. In the end I found using 1 model over another based on certain qualifiers worked best. If the item had complete inventory data, and historically had no major shifts in quantity, I used a an algorithm that favored historical values of the item itself for confidence. If the item had an erratic history of quantity changes or highly infrequent changes, it looked to brand-category statistics for volatility to make it's estimate. If the item had good history, and lot's of fluctuations in inventory it looked to an algorithm that considered brand, category, and item velocities, as well as likelihood of replenishment based on past supplier inventory increases.

Testing the Fortune Teller

Because we had the last year of inventory reports from several suppliers I was able to create a simulation and tweak the algorithm used for quantity prediction. I was able to get to a respectable 96% accuracy when predicting if an item with a low-quantity would indeed be in stock by the end of the day at our supplier based on analyzing historical inventory feeds.

I wasn't quite pleased with my 96% because I knew the algorithm could be smarter if it was aware of sales velocity. The ERP system was aware of the days sales, and each product had a category assigned to it. Basically, the ERP system knew if exhausts were having a big day, but even better yet, if I compared our suppliers inventory numbers from the morning with mid-morning I could tell if our supplier was having a lot of orders for exhausts. By integrating the current days sales velocity into the prediction logic I was able to get to 98.5% accuracy - and even then the chance that one of our customers would order one of the 1.5% of items that we guessed wrong on was a minimal risk. Through this process we were able to immediately list 50,000+ more items on our eCommerce channels.

Epilogue

Over the next year motorcycle parts sales grew by over 101% ($6,500,000) on eBay alone!

With inventory now robust, accurate, and up-to-date we were free to move onto improving the customer facing side of the business which would be the next great challenge to overcome. But, in the process we had paid for the cost of overcoming technical debt, and improved the future capabilities of the business.

A Piece of the Py

Friday, July 12, 2013

Overcoming Obstacles