Solving my first real life problem with code

Back in 2017, I was working on the physical refurbishment and warranty repairs of electronic tape drives and tape drive libraries. They would arrive by the dozen's and over the course of a few hours I would take them apart fully. Then take them through a full refurbishment cycle as well as diagnostic text.

After about six months, I started to realize that the smaller the library was, the more frequent they would fail on the diagnostic bench. The size of the library here was either 8 slots, 16, 32 or 64. With the 8 slot library having a much higher random failure rate than the 64 slot one. Nobody really wanted to believe me, because the software was solid!

But something really didn't sit right there. Then I realized that we write logs with a decent retention period, so I sifted through Gigabytes of them. This did take me a few weeks, because my Python was incredibly rusty; and I was mostly working on it in my breaks. But in the end I was able to access all the non-formatted text (txt) files, and parse their results to neatly formatted Excel.

Using some Excel-skills and building a heatmap, I quickly saw that the highest percentage of random failures would always occur in the last position of any library. Which is a special slot, that cannot be moved by automation. But this unmovable attribute was not documented anywhere. This also explained the scaling issue here, as the failure would occur for the 1/8th, 1/16th, 1/32 or 1/64th slot. We did some integration tests, then fixed the code. After that I realized that I would be spending al lot more time with this thing called Programming.