A peek under the hood of AWS’s commitment to performance – AWS re:Invent 2022 Monday Night Live
The focus of the Monday Night Live keynote is performance. Peter DeSantis, Senior Vice President, AWS Utility Computing, says that “great performance is the result of innovating from the ground up and then continually investing over time and being committed to performance”. Let’s look at the new features that reflect this philosophy.
New Chip, New Processor
AWS Nitro v5 was made generally available (GA) at the keynote session. It is a high-performance chip with twice as many transistors, 50% faster DRAM speed and twice the PCIe bandwidth.
Equipped with Nitro v5, C7gn instances (in preview) offer the best performance for network-intensive workloads with higher networking bandwidth, greater packet rate performance, and lower latency.
The pandemic has created a chance for AWS and Amazon to leverage their logistics expertise and provide an offering that will visualise any potential disruption.
Graviton 3E is the new processor designed for HPC (high-performance computing).
HPC7g instances, which will benefit from both of the above, are not available yet. We are looking forward to working with our customers to make use of these instances.
SRD Makes The Cloud Go Round
I like the analogy that Peter uses. He says that Scalable Reliable Datagram (SRD) is the wheel of the AWS network. It can be applied across many AWS services.
For example, SRD gives EBS lower latency and higher throughput. In fact, EBS write tail latency is 90% lower than without SRD. All new EBS io2 volumes will run on SRD at no extra cost, but only from early next year.
ENA Express is ENA built on top of SRD, bringing the benefits of SRD to any network-based service. You can enable ENA Express on new and existing ENAs and take advantage of this performance right away for TCP and UDP traffic between c6gn instances running in the same Availability Zone. Note, however, that ENA Express is currently limited to c6gn.16xlarge instances.
ElastiCache is an example of a service that will utilise ENA Express, though its support for ENA Express is not available yet.
Machine Learning Capabilities Improved
There are two innovations behind the scenes: stochastic rounding, which concerns numerical precision and accuracy, and a ring-of-rings synchronisation algorithm.
A new instance type called Trn1n was introduced at the keynote. It is a network-optimised version of Trn1, with 1600 Gbps of EFA networking. It should provide faster distributed training of ultra-large models. It is not on the market yet, though.
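To get a feel for why stochastic rounding helps, here is a minimal sketch of the general technique. It is not AWS's implementation: to keep it short I round to whole numbers rather than to a low-precision floating-point format, and the names `stochastic_round` and `accumulate` are my own. The idea is that a value is rounded up with a probability equal to its fractional part, so many small updates are preserved on average instead of being rounded away every time.

```python
import math
import random


def stochastic_round(x: float, rng: random.Random) -> int:
    """Round x to one of its two neighbouring integers.

    The chance of rounding up equals the fractional part of x,
    so the result is correct on average.
    """
    lower = math.floor(x)
    frac = x - lower
    return lower + (1 if rng.random() < frac else 0)


def accumulate(step: float, n: int, rounder) -> int:
    """Add `step` to a running total n times, rounding after every addition."""
    total = 0
    for _ in range(n):
        total = rounder(total + step)
    return total


rng = random.Random(0)  # fixed seed so the run is repeatable

# Round-to-nearest discards every 0.1 update, so the total never moves.
nearest = accumulate(0.1, 10_000, round)

# Stochastic rounding keeps roughly one update in ten, landing near the true sum of 1000.
stochastic = accumulate(0.1, 10_000, lambda v: stochastic_round(v, rng))

print("round to nearest:", nearest)
print("stochastic rounding:", stochastic)
```

The same effect is what matters in training, where many small updates are added to a value stored at low precision.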
Last but not least, Lambda SnapStart is another highlight of this keynote. For the moment it is available for the Java runtime only, at no additional cost. It excites me the most because it should benefit a wider range of customers. The feature addresses the problem of long cold start initialisation times. Magically, it reduces cold start latency by up to 90%.
Peter also shared the secret recipes of SnapStart.
Using this feature may involve some code changes. Every execution environment restored from the same snapshot starts from identical state, so to ensure uniqueness you have to generate content such as random numbers after initialisation.
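The uniqueness point is easier to see with a small simulation. This is only a sketch of the idea, not Lambda code: `ExecutionEnvironment` is a hypothetical class of my own, and `copy.deepcopy` stands in for restoring several environments from one snapshot. A value generated during initialisation ends up identical in every restored copy, while a value generated inside the handler is fresh each time.

```python
import copy
import uuid


class ExecutionEnvironment:
    """Hypothetical stand-in for a function's initialised state (not a Lambda API)."""

    def __init__(self):
        # Generated during initialisation, so it is captured in the snapshot.
        self.init_time_id = uuid.uuid4().hex

    def handle(self):
        # Generated per invocation, after restore, so every call gets a new value.
        return {"baked_in": self.init_time_id, "fresh": uuid.uuid4().hex}


# Initialise once, then "restore" three environments from that single snapshot.
snapshot = ExecutionEnvironment()
restored = [copy.deepcopy(snapshot) for _ in range(3)]

results = [env.handle() for env in restored]
baked = {r["baked_in"] for r in results}
fresh = {r["fresh"] for r in results}

print("distinct init-time ids:", len(baked))
print("distinct per-invocation ids:", len(fresh))
```

All three restored copies share the init-time value, which is why anything that must be unique belongs after initialisation.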
To make restoring from a snapshot fast, AWS profiles how the virtual machine accesses the snapshot during its boot sequence and uses this information to predict what it will need the next time.
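Here is a rough sketch of that profile-then-predict idea, under my own assumptions. The keynote did not show code, so `SnapshotStore`, `boot` and the chunk numbering are all hypothetical. The first boot records which snapshot chunks were touched, and the next boot prefetches exactly those chunks so that the virtual machine does not have to wait for them.

```python
class SnapshotStore:
    """Hypothetical chunked snapshot with a slow backing store and a local cache."""

    def __init__(self, chunks):
        self.chunks = chunks      # chunk id -> bytes, standing in for remote storage
        self.cache = {}           # chunks already loaded locally
        self.slow_fetches = 0     # on-demand loads the VM had to wait for

    def prefetch(self, chunk_ids):
        # Load chunks ahead of time, before the VM asks for them.
        for cid in chunk_ids:
            self.cache[cid] = self.chunks[cid]

    def read(self, cid):
        if cid not in self.cache:
            self.slow_fetches += 1
            self.cache[cid] = self.chunks[cid]
        return self.cache[cid]


def boot(store, access_sequence):
    """Replay a boot's chunk reads and return the distinct chunks in first-use order."""
    profile = []
    for cid in access_sequence:
        store.read(cid)
        if cid not in profile:
            profile.append(cid)
    return profile


chunks = {i: bytes([i]) for i in range(10)}
access = [3, 0, 3, 7, 1]  # made-up access pattern for one boot

# First boot: nothing is cached, so every distinct chunk is fetched on demand.
cold = SnapshotStore(chunks)
profile = boot(cold, access)

# Next boot: prefetch what the profile predicts, then replay the same accesses.
warm = SnapshotStore(chunks)
warm.prefetch(profile)
boot(warm, access)

print("cold boot slow fetches:", cold.slow_fetches)
print("warm boot slow fetches:", warm.slow_fetches)
```

In this toy version the prediction is simply "the same chunks as last time", which is enough to show why a recorded access profile removes the waiting on later boots.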
I can see AWS’s efforts in advancing both hardware and software, which take the performance game to a new level. It actually surprised me that AWS has designed custom chips. From a user’s perspective, it’s also good to see that AWS doesn’t trade security and cost for performance. I’m looking forward to seeing these new features rolled out and made available in the Sydney Region so I can experiment with them.