Getting SASsy: Process Improvements and Using the Right Tool for the Job
The program took nearly 48 hours to run. On a tight timeline, this wasn’t ideal. Then we had to make several last-minute changes to the code right after our engineer went on parental leave. None of us knew enough about C# to confidently fix the code.
Today, things look quite different. The 48-hour task takes about seven minutes to run. We make far fewer errors, if any, and a request for a last-minute rerun no longer leads to a week of late nights. What happened here, and what can we learn from this experience?
The Centers for Medicare and Medicaid Services (CMS) tasked Lantana with creating about 75 publicly available downloadable database (DDB) files containing hospital quality measure data for all providers in the country. We create these files from a mixture of sources: massive JSON files and a variety of CSV, XLSX, XML, and TXT files. Each DDB file differs slightly in structure, formatting, and the data it includes, so each one needs its own code. With tens of millions of rows of data and a new set of files every quarter, this is a substantial task.
Let’s rewind a year. We had divided the interrelated data tasks between two teams programming in two different languages (C# and SAS). There was little overlap in skills, which caused problems whenever one of the lead programmers was out of the office for more than a day or two. And with a program that took 48 hours to run, we had little flexibility in the timeline when we needed to make last-minute changes or rerun the code with updated data, both of which happened quite often.
So, with the engineer out for an extended period, the team decided to streamline the process and convert everything from C# and Excel to SAS. C# is a general-purpose programming language with its own advantages, but SAS is optimized for working with medium to large datasets like the ones we were processing. A dataset of tens of millions of rows is too large to even open in Excel, but it isn’t a problem for SAS. While SAS is not the only programming language that works well for data analytics, it is a standard in the healthcare world. I’d been using SAS for years, so it seemed to be a good fit.
Recoding the project in SAS took several weeks of intense work. Once it was done, the team’s entire code base was in one language, which streamlined processes in which deliverables had been going back and forth between different programmers and different environments. We focused on writing simple SAS code that would be easy to understand and update in future releases. Even though speed wasn’t the primary driver for these updates, the move to SAS alone reduced run time from about 48 hours to about seven minutes. On a project whose timelines had previously left very little room for delays or reruns, this was a welcome change.