Scan's TekSpek

Our Aim
To provide you with an overview of new and existing technologies, helping you understand how the technology is changing. Alongside the overviews, we hope to bring topical issues to light through a series of independent reviewers, saving you the time and hassle of fact-finding across the web.

Over time, we will provide you with quality content which you can browse and subscribe to at your leisure.

Data Science Workstations

A number of recent studies have outlined the challenges faced by IT managers across businesses of all sizes. Data breaches and the associated security concerns top the list, fuelled by high-profile thefts, nascent legislation such as GDPR, and customers who are increasingly aware of their digital footprint. Second is the question of how to extract meaningful insights from the ever-increasing amount of data that businesses generate. The burgeoning field of big-data analytics is fast becoming an industry in itself.

It so happens that the GPU (graphics processing unit), best known for calculating and displaying images on a screen and therefore for high-performance gaming, has an architecture that is highly adept at handling large amounts of data in parallel. Modern gaming cards such as the GeForce line from NVIDIA use this inherent parallelism to execute graphics workloads across thousands of cores simultaneously, which is the main reason they can display high-quality graphics, at high resolutions, many times a second. A quality GPU is effectively a data-processing monster, and it is therefore much more efficient at data throughput than a traditional CPU.

It's this superb parallelism that also makes GPUs incredibly useful for accelerating big-data analytics. Equipped with lots of onboard memory to store vast datasets and paired with the right software, GPUs accelerate the three key requirements for insightful analytics: data preparation, model training, and visualisation.
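To make the three stages concrete, here is a minimal sketch of a prepare, train, and visualise pipeline. It is plain Python running on the CPU, not RAPIDS code, and the toy loan records, the function names, and the simple threshold model are all hypothetical choices made for illustration. In a GPU workflow, each stage would typically be handed to an accelerated library (RAPIDS includes cuDF for data preparation and cuML for model training), but the shape of the pipeline stays the same.

```python
# Hypothetical toy records: (debt_to_income_ratio, defaulted).
# None marks a missing value. This data is invented for illustration only.
RAW_LOANS = [
    (0.10, 0), (0.15, 0), (None, 1), (0.22, 0), (0.35, 0),
    (0.41, 1), (0.48, 1), (0.52, 1), (0.30, None), (0.60, 1),
]


def prepare(rows):
    """Data preparation: drop rows with any missing field."""
    return [(x, y) for x, y in rows if x is not None and y is not None]


def train_stump(rows):
    """Model training: pick the threshold that misclassifies the fewest rows.

    The model predicts a default (1) when the ratio is at or above the threshold.
    """
    best_threshold, best_errors = None, len(rows) + 1
    for threshold in sorted(x for x, _ in rows):
        errors = sum((x >= threshold) != bool(y) for x, y in rows)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold, best_errors


def visualise(rows, threshold, width=40):
    """Visualisation: one text bar per loan, labelled with the prediction."""
    lines = []
    for x, y in sorted(rows):
        bar = "#" * round(x * width)
        flag = "predicted default" if x >= threshold else "predicted ok"
        lines.append(f"{x:4.2f} {bar:<{width}} actual={y} {flag}")
    return "\n".join(lines)


clean = prepare(RAW_LOANS)
threshold, errors = train_stump(clean)
print(f"threshold={threshold}, training errors={errors}")
print(visualise(clean, threshold))
```

Every step here is applied independently to each record, which is exactly the kind of work that spreads well across thousands of GPU cores once the dataset grows from ten rows to hundreds of gigabytes.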

End-to-End faster speeds with RAPIDS on RTX 8000

NVIDIA, a worldwide leader in graphics, has a number of turnkey solutions, spanning hardware and software, that cater for the data scientist needing fast, reliable results from massive datasets. Its latest Turing architecture is at the heart of the Quadro RTX GPUs. Equipped with up to 48GB of ultra-fast local memory to handle the largest datasets and compute-intensive workloads, and with 4,608 CUDA cores per GPU providing massive parallelism, the Quadro RTX 8000 is far faster than any CPU at large-scale analytics.

Quadro RTX 8000 vs CPU

The graphics above demonstrate the substantial speed-ups available for meaningful data science. Running NVIDIA's RAPIDS software on a Quadro RTX 8000 GPU, a single card computes 4x to 20x faster than a high-quality Intel Xeon CPU. Adding a second card improves performance by another 1.9x in the training stage. Impressive. In a world where time often equates to money, investing in the right computing infrastructure is key.
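One way to read the 1.9x figure for the second card is as a scaling efficiency: the speed-up achieved divided by the ideal speed-up of 2x for two cards. The short sketch below works that out. The function name is our own, and the only input taken from the text is the 1.9x training-stage figure.

```python
def scaling_efficiency(speedup, n_devices):
    """Return the fraction of ideal linear scaling that was achieved.

    An efficiency of 1.0 means n devices ran exactly n times faster than one.
    """
    if n_devices < 1:
        raise ValueError("n_devices must be at least 1")
    return speedup / n_devices


# 1.9x speed-up from two cards in the training stage, as quoted in the text.
efficiency = scaling_efficiency(1.9, 2)
print(f"Two-card scaling efficiency: {efficiency:.0%}")
```

On these figures the second card delivers about 95% of a perfect doubling, which suggests very little is lost to coordination between the two GPUs in the training stage.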

It is also worth showing real-world examples of how GPUs and associated software help reduce time for overall data computation.

Data computation

The example, shown above, trains a model to perform home loan risk assessment using all of the loan data for the years 2000 to 2016 in the Fannie Mae loan performance dataset, consisting of roughly 400GB of data in memory.

The results are startling. Remember, this is a massive dataset, and even 20-CPU nodes take well over three hours to complete the analytics. Running up to 100 CPUs, which is common for big providers such as Amazon Web Services, takes approximately an hour. The same computation on a single NVIDIA DGX-2, comprising 16 Tesla V100 GPUs, is approximately 10x faster. Yes, 10x. What can take hours on CPUs takes minutes on well-tuned GPUs.
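As a rough back-of-the-envelope check, the sketch below converts that speed-up into wall-clock time. It assumes the 10x figure is measured against the 100-CPU run of approximately an hour; the text does not state the baseline explicitly, so treat the result as indicative only. The function and constant names are our own.

```python
def estimated_minutes(baseline_minutes, speedup):
    """Estimate run time after applying a speed-up factor to a baseline."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_minutes / speedup


# Assumption: the baseline is the 100-CPU run, "approximately an hour".
BASELINE_MINUTES = 60
# "Approximately 10x faster" on a single DGX-2, as quoted in the text.
SPEEDUP = 10

gpu_minutes = estimated_minutes(BASELINE_MINUTES, SPEEDUP)
print(f"Estimated DGX-2 run time: about {gpu_minutes:.0f} minutes")
```

Under that assumption the job drops from around an hour to around six minutes, which is consistent with the claim that hours on CPUs become minutes on GPUs.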

This alacrity is the reason why select NVIDIA workstations make so much sense to data scientists, and the beauty is that data-science workstations can be tailored to your workflow for maximum bang for buck.

For example, Scan Computers' single-socket 3XS Data Science Workstations pair an Intel Xeon W-series processor with a choice of NVIDIA Quadro RTX cards, all the way up to the class-leading RTX 8000. For those requiring more horsepower from a single machine, the 3XS G2000X includes dual Intel Xeon Silver processors alongside a couple of high-end cards. Of course, depending upon workload and budget, each system can be individually configured for targeted performance.

The democratisation of cutting-edge technology has shown that meaningful deep learning and AI needn't be the preserve of academia or huge institutions. A burgeoning line of ready-made workstations is adept at handling datasets and workloads and, when implemented and understood properly, can ease one of the biggest bugbears for IT managers: data science and analytics.

Scan Computers builds and supplies custom workstations. To find out more, view our range of Data Science Workstations.