Lenovo’s Chris Eckhoff and Chulho Kim are used to working on the fly. But a recent project was unlike any they’d undertaken before. Both Chris and Chulho lived out of a hotel for months during a spike in East Coast coronavirus cases to install an urgently needed supercomputer at NYU, which would support vital COVID-19 research. In the face of unimaginable constraints, “the service,” Chris says, “must go on.”
In October 2020, a researcher at New York University (NYU) made a groundbreaking discovery. She uncovered the structural biophysics of the novel SARS-CoV-2 virus that causes COVID-19. The revelation came just a few months after NYU installed a massive Lenovo supercomputer in the New York metropolitan area.
“That chemistry researcher, her job could not complete on our old infrastructure,” said Dr. David Ackerman, Associate Vice President for Research Technology in NYU IT and Chief Digital Officer in NYU Libraries, who is responsible for NYU’s overall research technology strategy and services. “It took just 30 hours on our new infrastructure.”
The new supercomputer, David says, “is groundbreaking for us — and the world.”
But installing a high-performance computing (HPC) cluster at the height of a spike in COVID-19 cases on the East Coast was no minor feat. There were limitations on access, travel, and the ability to meet with the customer. Yet that didn’t stop two indomitable Lenovo engineers.
In July, Chris Eckhoff drove 24 hours from his home in Florida to New York to install the NYU supercomputer. For those interested, that’s over 1,000 miles, or 1,600 km!
Chris was eventually joined by Chulho Kim, another longtime Lenovo employee. The two spent months living out of a hotel to ensure the installation went off without a hitch. They left only to work and purchase food.
“It was an exceptional time due to exceptional circumstances,” Chris recounted. “But the service must go on.”
In fact, the installation was absolutely urgent.
“The idea that we wouldn’t move fast was just unacceptable,” David added. “I wrote to Lenovo: We need this supercomputer and we need it now to help the world!”
“It had to be up and running because many of the researchers using the system were going to be doing COVID research,” said Scott Tease, General Manager, HPC and AI at Lenovo. “NYU sent out a challenge to Lenovo and their other vendors asking for our support to get the system up during such difficult times, and we responded.”
Getting a system of this size and complexity built and then shipped during an unprecedented time is not for the faint of heart. Luckily, the duo was supported by Lenovo’s industry-leading global supply chain teams, who worked tirelessly to assemble the supercomputer, test it in the factory, and then manage the logistics to allow a synchronized arrival at NYU’s data center, just in time for the power-on.
NYU couldn’t have been more impressed by the results. One of the major appeals of working with Lenovo, said David, is its high-performance server portfolio equipped with the Lenovo Neptune™ liquid cooling technology. Not only is the system greener and more cost-effective, but it’s also more powerful.
“Our original TOP500 number gave us a 1.729 petaFLOP rating,” he said. The TOP500 project ranks and details the 500 most powerful supercomputers in the world, measuring their performance in petaFLOPS. That number increased to 2.008, “just by having the direct water cooling,” a boost of roughly 16 percent.
“They say the best technology resembles magic, and this is like magic,” David said.
Behind the magic were Chulho and Chris, who worked around the clock, applying their expertise and agility to ensure the cluster was installed on time.
Usually, Chulho works solely on troubleshooting and identifying hardware issues. “The hard part is troubleshooting, figuring out why a certain node is running slow, if your network setting is correct, and so on,” he said. The cluster is only as strong as its weakest link, meaning even minor issues impact the configuration.
But this time around, he had to wear multiple hats, “running tests, replacing parts.” Building a supercomputer, after all, is no small accomplishment. A typical supercomputer employs tens of thousands of cores working in parallel. Troubleshooting such a system, Chulho says, is like searching for a needle in a haystack.
“There were many factors that made this a challenging project,” Chris added. Normally, an HPC cluster is installed in close contact with the client and with a team of two to three people. Things worked differently this time. But Chris pushed ahead, working to meet the deadline so NYU’s research could go on unimpeded.
Chulho says the experience taught him that “I can succeed even when things seem impossible.” At the very least, he realized he “had to give it a try.”
“I didn’t think there was any other option,” he said.