Shandong University: Supporting Diverse Workloads

An Environment-as-a-Service model supports traditional and non-traditional HPC, AI/ML, analytics, bioinformatics, and more.

Executive Summary
Shandong University, founded in 1901, is one of the oldest and most prestigious universities in China. It is the second national university established in the country and one of the first in China to install high performance computing (HPC) resources. The School hosts the Shandong Center for High Performance Computing, an HPC and resource sharing platform established in 2002. It provides an environment for world-class modern research for fundamental science, material science, bioscience, environmental science, and computing, including grid technology, parallel computing, mass data processing, cryptanalysis, and virtual reality and visualization technology. The center is a milestone for the national computing environment and a critical component of the ChinaGrid project, one of the world’s largest grid computing implementations.

Challenge
HPC resources at Shandong University are needed across a wide range of disciplines and learning environments, and to support national initiatives. HPC resources have been used to produce the insights needed to support China’s ongoing Five-Year Plans. The Shandong Center for High Performance Computing has undertaken key research and development programs under the Eleventh, Twelfth, and Thirteenth Five-Year Plans. It is also part of the National 863 Plan, a program established in 1986 to stimulate technology development in China.

The supercomputing center supports research across Artificial Intelligence and Machine Learning (AI/ML), experimental teaching and virtual/augmented reality, big data, and other areas, serving both sophisticated and inexperienced users. Shandong University therefore recognized the need to provide computing resources that extend beyond the traditional simulation and modeling used by the empirical sciences. To meet the needs of a hugely diverse user audience, the center focused on building its next HPC system to provide Environments as a Service (EaaS).

Running as EaaS, the new supercomputer needed to support multiple operating systems (OS), various software versions (not just the latest one), deep learning frameworks, and more, all running on x86 processors and GPUs. The hardware and software needed to be easy to manage and operate for both system administrators and users. The solution had to provide both large-scale and small-scale HPC cluster computing and powerful desktop-like environments, all enabled through user-focused interfaces that simplified and accelerated the deployment of each environment.

Solution
In designing their HPC system, the Shandong Center for High Performance Computing employed smart microcode and container and mobile application technologies on a cloud service platform, all based on a hybrid architecture. To support a sophisticated environment that was user-friendly yet able to serve a wide base of research needs, open sharing, and efficient management, their software included bar code scanning. The enhancements simplify user logins, enable social-based mobile applications to push notifications to users, and provide an environment that allows self-administration of systems, environments, applications, and data for each user.

Shandong University’s new system incorporates Intel® Xeon® Scalable processors interconnected by Intel® Omni-Path Architecture (Intel® OPA) fabric.

The project began in March 2017. Built by Huawei and Clustertech, the new system includes 172 dual-socket nodes with Intel® Xeon® Gold 6132 processors, interconnected by Intel® Omni-Path Architecture (Intel® OPA) fabric. The cloud service platform delivers 380 teraFLOPS of performance (e)¹ with 1.6 PB of storage capacity. It was jointly launched in July 2018 by Huawei, Clustertech, Intel, and the university.
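The performance and capacity figures can be checked against the configuration given in footnote 1. The following is a minimal sketch, not the vendors' method: it assumes 32 double-precision floating-point operations per core per cycle (AVX-512 with two FMA units) and reads the storage entry as two arrays of eight 80 TB units plus 300 TB of system disk. The function names are illustrative only.

```python
# Theoretical peak (Rpeak) and raw storage for the cluster described above.
# Inputs come from the article and its footnote 1; the meaning attached to
# each factor is an interpretation, not something the source spells out.

CLOCK_GHZ = 2.6        # Intel Xeon Gold 6132 nominal frequency
CORES_PER_SOCKET = 14  # cores per processor
SOCKETS_PER_NODE = 2   # dual-socket nodes
FLOPS_PER_CYCLE = 32   # assumed: DP FLOPs per core per cycle (AVX-512, 2 FMA units)
NODES = 172


def rpeak_tflops(clock_ghz, cores, sockets, flops_per_cycle, nodes):
    """Return theoretical peak performance in teraFLOPS."""
    gflops = clock_ghz * cores * sockets * flops_per_cycle * nodes
    return gflops / 1000.0


def raw_storage_tb(arrays, units_per_array, tb_per_unit, system_disk_tb):
    """Return raw capacity in TB (assumed reading of the footnote)."""
    return arrays * units_per_array * tb_per_unit + system_disk_tb


peak = rpeak_tflops(CLOCK_GHZ, CORES_PER_SOCKET, SOCKETS_PER_NODE,
                    FLOPS_PER_CYCLE, NODES)
storage = raw_storage_tb(arrays=2, units_per_array=8, tb_per_unit=80,
                         system_disk_tb=300)

print(f"Rpeak: {peak:.1f} TFLOPS")
print(f"Raw storage: {storage} TB (~{storage / 1000:.1f} PB)")
```

Under these assumptions the peak comes to roughly 400 teraFLOPS and the raw capacity to 1,580 TB, in line with the rounded 1.6 PB quoted above. Rpeak is an upper bound computed from the hardware specification, which is why the article marks its figure as estimated.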

System management software provides one-click configuration and installation as well as batch installation, and supports dynamic capacity expansion or reduction based on service traffic. It also provides intelligent power consumption management: it can monitor, analyze, and diagnose various energy efficiency indicators, and take action based on the results to reduce power consumption. The software also supports centralized monitoring and unified management of various devices.

Per Huawei, the infrastructure provides board-level to system-level energy-saving measures, intuitive real-time monitoring, and dynamic energy-saving technologies to reduce power consumption by up to 40 percent.² The system-level energy-saving measures include:

  • Efficient uninterruptible power systems (UPSs)
  • In-row air conditioners
  • Frequency-conversion cooling
  • Modular design
  • Natural cooling
  • NetEco intelligent power consumption management software

These measures decrease the overall power usage effectiveness (PUE) to less than 1.2.
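PUE is the ratio of total facility power to the power drawn by the IT equipment alone, so a PUE of 1.2 means that cooling, power conversion, and other overhead add 20 percent on top of the IT load. The sketch below illustrates the arithmetic. The 100 kW IT load is a hypothetical number chosen for the example, and 1.6 is the air-cooled average quoted in footnote 2.

```python
def facility_power_kw(it_load_kw, pue):
    """Total facility power for a given IT load and PUE."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return it_load_kw * pue


def overhead_fraction(pue):
    """Non-IT power (cooling, UPS losses, ...) as a fraction of the IT load."""
    return pue - 1.0


IT_LOAD_KW = 100.0  # hypothetical IT load, for illustration only

for pue in (1.6, 1.2):
    total = facility_power_kw(IT_LOAD_KW, pue)
    print(f"PUE {pue}: total {total:.0f} kW, "
          f"overhead {overhead_fraction(pue):.0%} of IT load")

# Relative reduction in total facility power when moving from PUE 1.6 to 1.2
saving = 1 - facility_power_kw(IT_LOAD_KW, 1.2) / facility_power_kw(IT_LOAD_KW, 1.6)
print(f"Facility power reduction: {saving:.0%}")
```

For the same IT load, moving from a PUE of 1.6 to 1.2 reduces total facility power by a quarter, independent of the load chosen.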

Results
Since deployment, the new system has supported projects running a wide range of OSs, parallel workloads, AI/ML jobs, data analytics, and more.

The new system leverages widespread use of mobile devices by integrating mobile services for authentication, self-administration of users’ workloads and data, and push-notifications of job activities and status. This allows users to have greater awareness and control of their projects running on the new system.

Meeting the needs of a very wide user base across multiple research areas and computational applications, the system is built for a wide variety of workloads. TensorFlow* and Jupyter are installed for deep learning and AI applications; several bioinformatics tools support easy biodata analysis workflows. The cluster has become a public open platform that integrates various biological information analysis functions, such as data uploading and processing, sequence alignment assembly, sequence analysis, SNP/WGA analysis, and data visualization for bioinformatics.

Figure 1. Recent environments and workloads

The new cluster also supports traditional computational sciences, including computational chemistry with applications like Gaussian and GaussView, which enable building, analysis, and visualization of complex molecules and materials. In support of the ChinaGrid distributed computing model, users can request cluster resources that the system then orchestrates into virtual HPC clusters for their jobs, all through a sophisticated yet easy-to-use queue management system.
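The article does not describe how the queue management system works internally. As a purely illustrative sketch, the idea of carving virtual clusters out of a shared node pool can be modeled with a first-in, first-out queue. The class `VirtualClusterQueue`, its methods, and the strict FIFO policy are all hypothetical and are not taken from the deployed software.

```python
from collections import deque


class VirtualClusterQueue:
    """Toy FIFO queue that allocates groups of nodes from a fixed pool."""

    def __init__(self, total_nodes):
        self.total_nodes = total_nodes
        self.free = list(range(total_nodes))  # ids of idle nodes
        self.pending = deque()                # (job_id, user, node_count)
        self.running = {}                     # job_id -> (user, node ids)
        self._next_id = 0

    def submit(self, user, nodes):
        """Queue a request for `nodes` nodes and return its job id."""
        if not 1 <= nodes <= self.total_nodes:
            raise ValueError("request must fit within the cluster")
        job_id = self._next_id
        self._next_id += 1
        self.pending.append((job_id, user, nodes))
        self._schedule()
        return job_id

    def release(self, job_id):
        """Return a finished job's nodes to the pool."""
        _, nodes = self.running.pop(job_id)
        self.free.extend(nodes)
        self._schedule()

    def _schedule(self):
        # Strict FIFO: stop at the first request that does not fit.
        while self.pending and self.pending[0][2] <= len(self.free):
            job_id, user, count = self.pending.popleft()
            allocated = [self.free.pop() for _ in range(count)]
            self.running[job_id] = (user, allocated)


queue = VirtualClusterQueue(total_nodes=172)
job_a = queue.submit("alice", 128)  # fits, starts immediately
job_b = queue.submit("bob", 64)     # only 44 nodes free, so it waits
print(len(queue.free), len(queue.pending))
queue.release(job_a)                # frees 128 nodes, bob's job starts
print(len(queue.free), len(queue.pending))
```

A production scheduler would add priorities, backfilling, and per-environment provisioning. The sketch only shows the request, wait, and allocate cycle that the paragraph above describes.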

Solution Summary
Shandong University’s Center for High Performance Computing needed their next HPC resource to serve a wide diversity of users with a range of computer experience and computing needs. They deployed a 172-node cluster running a sophisticated stack of software to support traditional HPC jobs, modern research in AI/ML, analytics, and bioinformatics, and non-traditional workloads and personal desktops in an Environment as a Service model. The cluster was built on Intel® Xeon® Gold processors and an Intel® Omni-Path Architecture (Intel® OPA) fabric.

Solution Ingredients

  • Intel® Xeon® Gold 6132 processors
  • Intel® Omni-Path Architecture (Intel® OPA) fabric
  • Servers: Huawei FusionServer* 2488H V5 / Huawei FusionServer* 1288H V5 (172 nodes)
  • Storage: Huawei OceanStor* 2600 V3
  • Filesystem: Lustre*
  • System Management: Huawei eSight*
  • Infrastructure: Huawei Fusion Module* 2000

 

Notices and Disclaimers

Intel® technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at https://www.intel.fr. // Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit https://www.intel.fr/benchmarks. // Performance results are based on testing as of the date set forth in the configurations and may not reflect all publicly available security updates. See configuration disclosure for details. No product or component can be absolutely secure. // Cost reduction scenarios described are intended as examples of how a given Intel®-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction. // Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate. // In some test cases, results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance.

Product and Performance Information

1

Note that "e" means "estimated"; the performance figure is the theoretical Linpack performance calculated from the number of processors and nodes. HPL Linpack Rpeak: 2.6 GHz × 14 cores × 2 sockets × 32 FLOPs/cycle × 172 nodes ≈ 400 TFLOPS, cited in the text as about 380 teraFLOPS. System configuration: Huawei FusionServer 1288H V5* / Huawei FusionServer 2488H V5* ×172 with Intel® Xeon® Gold 6132 processors (14 cores/2.6 GHz/140 W), Intel® Omni-Path Architecture (Intel® OPA) fabric, Huawei OceanStor 2600 V3* ×2 (8 × 80 TB hard disk) and 300 TB of system disk, Lustre*, Huawei eSight*, and Huawei Fusion Module 2000*.

2

In the Huawei Fusion Module 2000* system, the PUE energy-efficiency indicator for board-level liquid cooling is about 1.1, and the average air-cooled PUE is about 1.6, which improves heat dissipation efficiency by about 40% [(1.6-1.1)/1.1]. Source: Huawei.