When I was young and naive (about 20 years or so ago), I was responsible for leading part of a development team that worked on an XML database product for the early web, one that could serve up XML content or HTML pages in response to requests. One of the people on our staff had a master's degree (in mathematics or computer science, I cannot recall which) and was also working on his doctorate in the same field.
Every Monday I would get a report he wrote about the product's performance, and the changes in performance between that week and the last. It was filled with estimates of the number N of requests of each type needed to verify performance improvements, the mathematics behind those estimates, and then the design of the performance tests ... experiments, he called them. And they were carried out with the kind of experimental rigor you'd expect from a scientist. The results, the relative improvements (or dis-improvements), error bars, p-values, and all the rest were in the summary report, about 10 pages, every week for weeks in a row, on my desk, Monday morning, from the past week's performance run.
He set my expectations for performance testing in a way that would leave me disappointed for nearly the rest of my career. I was spoiled rotten for months and never knew it. I miss him, and I wonder where he went. He was only with us for a short time, I think on an internship. If I was the team captain, he was my Scotty: not just an engineer, but a true scientist.
That company was my introduction to standards development (in the W3C). It gracefully self-disrupted at the outset of the dot-com bust in 2001, which led me to my career in healthcare, and eventually to Health IT standards.