
V8 Release 5.3

Roughly every six weeks, we create a new branch of V8 as part of our release process. Each version is branched from V8’s git master immediately before Chrome branches for a Chrome Beta milestone. Today we’re pleased to announce our newest branch, V8 version 5.3, which will be in beta until it is released in coordination with Chrome 53 Stable. V8 5.3 is filled with all sorts of developer-facing goodies, so we’d like to give you a preview of some of the highlights in anticipation of the release in several weeks.

Memory


New Ignition Interpreter


Ignition, V8's new interpreter, is feature complete and will be enabled in Chrome 53 for low-memory Android devices. The interpreter brings immediate memory savings for JIT'ed code and will allow V8 to make future optimizations for faster startup during code execution. Ignition works in tandem with V8's existing optimizing compilers (TurboFan and Crankshaft) to ensure that “hot” code is still optimized for peak performance. We are continuing to improve interpreter performance and hope to enable Ignition soon on all platforms, mobile and desktop. Look for an upcoming blog post for more information about Ignition’s design, architecture, and performance gains. Embedded versions of V8 can turn on the Ignition interpreter with the flag --ignition.

Reduced jank


V8 version 5.3 includes various changes to reduce application jank and garbage collection times. These changes include:
  • Optimizing weak global handles to reduce the time spent handling external memory
  • Unifying the heap for full garbage collections to reduce evacuation jank
  • Optimizing V8’s black allocation additions to the garbage collection marking phase
Together, these improvements reduce full garbage collection pause times by about 25%, measured while browsing a corpus of popular webpages. For more detail on recent garbage collection optimizations to reduce jank, see the “Jank Busters” blog posts Part 1 & Part 2.

Performance


Improving page startup time


The V8 team recently began tracking performance improvements against a corpus of 25 real-world website page loads (including popular sites such as Facebook, Reddit, Wikipedia, and Instagram). Between V8 5.1 (measured in Chrome 51 from April) and V8 5.3 (measured in a recent Chrome Canary 53) we improved startup time in aggregate across the measured websites by ~7%. These improvements on real websites mirrored similar gains on the Speedometer benchmark, which ran 14% faster in V8 5.3. For more details about our new testing harness, runtime improvements, and breakdown analysis of where V8 spends time during page loads, see our upcoming blog post on startup performance.

ES6 Promise performance


V8's performance on the Bluebird ES6 Promise benchmark suite improved by 20-40% in V8 version 5.3, varying by architecture and benchmark.

V8 Promise performance over time on a Nexus 5x


V8 API


Please check out our summary of API changes. This document gets regularly updated a few weeks after each major release.

Developers with an active V8 checkout can use 'git checkout -b 5.3 -t branch-heads/5.3' to experiment with the new features in V8 5.3. Alternatively you can subscribe to Chrome's Beta channel and try the new features out yourself soon.

Posted by the V8 team

V8 at the BlinkOn 6 conference

BlinkOn is a biannual meeting of Blink, V8, and Chromium contributors. BlinkOn 6 was held in Munich on June 16 and June 17. The V8 team gave a number of presentations on architecture, design, performance initiatives, and language implementation.

The V8 BlinkOn talks are embedded below.

Real-world JavaScript Performance


Length: 31:41



Outlines the history of how V8 measures JavaScript performance, the different eras of benchmarking, and a new technique to measure page loads across real-world, popular websites with detailed breakdowns of time per V8 component.

Ignition: an interpreter for V8


Length: 36:39



Introduces V8’s new Ignition Interpreter, explaining the architecture of the engine as a whole, and how Ignition affects memory usage and startup performance.

How we measure and optimize for RAIL in V8’s GC


Length: 27:11



Explains how V8 uses the Response, Animation, Idle, Loading (RAIL) metrics to target low-latency garbage collection and the recent optimizations we’ve made to reduce jank on mobile.

ECMAScript 2015 and Beyond


Length: 28:52



Provides an update on the implementation of new language features in V8, how those features integrate with the web platform, and the standards process which continues to evolve the ECMAScript language.

Tracing Wrappers from V8 to Blink (Lightning Talk)


Length: 2:31


Highlights tracing wrappers between V8 and Blink objects and how they help prevent memory leaks and reduce latency.

Firing up the Ignition Interpreter

V8 and other modern JavaScript engines get their speed via just-in-time (JIT) compilation of script to native machine code immediately prior to execution. Code is initially compiled by a baseline compiler, which can generate non-optimized machine code quickly. The compiled code is analyzed during runtime and optionally re-compiled dynamically with a more advanced optimizing compiler for peak performance. In V8, this script execution pipeline has a variety of special cases and conditions which require complex machinery to switch between the baseline compiler and two optimizing compilers, Crankshaft and TurboFan.

One of the issues with this approach (in addition to architectural complexity) is that the JITed machine code can consume a significant amount of memory, even if the code is only executed once. In order to mitigate this overhead, the V8 team has built a new JavaScript interpreter, called Ignition, which can replace V8’s baseline compiler, executing code with less memory overhead and paving the way for a simpler script execution pipeline.

With Ignition, V8 compiles JavaScript functions to a concise bytecode, which is between 25% and 50% of the size of the equivalent baseline machine code. This bytecode is then executed by a high-performance interpreter which yields execution speeds on real-world websites close to those of code generated by V8’s existing baseline compiler.

In Chrome 53, Ignition will be enabled for Android devices which have limited RAM (512 MB or less), where memory savings are most needed. Results from early experiments in the field show that Ignition reduces the memory of each Chrome tab by around 5%.

V8’s compilation pipeline with Ignition enabled.

Details


In building Ignition’s bytecode interpreter, the team considered a number of potential implementation approaches. A traditional interpreter, written in C++, would not be able to interact efficiently with the rest of V8’s generated code. An alternative would have been to hand-code the interpreter in assembly code; however, given that V8 supports nine architecture ports, this would have entailed substantial engineering overhead.

Instead, we opted for an approach which leveraged the strength of TurboFan, our new optimizing compiler, which is already tuned for optimal interaction with the V8 runtime and other generated code. The Ignition interpreter uses TurboFan’s low-level, architecture-independent macro-assembly instructions to generate bytecode handlers for each opcode. TurboFan compiles these instructions to the target architecture, performing low-level instruction selection and machine register allocation in the process. This results in highly optimized interpreter code which can execute the bytecode instructions and interact with the rest of the V8 virtual machine in a low-overhead manner, with a minimal amount of new machinery added to the codebase.

Ignition is a register machine, with each bytecode specifying its inputs and outputs as explicit register operands, as opposed to a stack machine where each bytecode would consume inputs and push outputs on an implicit stack. A special accumulator register is an implicit input and output register for many bytecodes. This reduces the size of bytecodes by avoiding the need to specify specific register operands. Since many JavaScript expressions involve chains of operations which are evaluated from left to right, the temporary results of these operations can often remain in the accumulator throughout the expression’s evaluation, minimizing the need for operations which load and store to explicit registers.
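
To make the register-plus-accumulator model concrete, here is a sketch of the kind of bytecode Ignition might emit for a tiny function. The mnemonics follow Ignition's naming style, but the exact sequence is illustrative rather than authoritative; actual output can be inspected in d8 with the --print-bytecode flag.

// Illustrative only: plausible Ignition bytecode for a simple function.
function add(a, b) {
  return a + b;
}
// Ldar a1     ; load argument b into the accumulator
// Add a0      ; accumulator = argument a + accumulator
// Return      ; return the value in the accumulator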

As the bytecode is generated, it passes through a series of inline-optimization stages. These stages perform simple analysis on the bytecode stream, replacing common patterns with faster sequences, removing some redundant operations, and minimizing the number of unnecessary register loads and transfers. Together, the optimizations further reduce the size of the bytecode and improve performance.

For further details on the implementation of Ignition, see our BlinkOn talk:


Future


Our focus for Ignition up until now has been to reduce V8’s memory overhead. However, adding Ignition to our script execution pipeline opens up a number of future possibilities. The Ignition pipeline has been designed to enable us to make smarter decisions about when to execute and optimize code, speeding up web page loading, reducing jank, and making the interchange between V8’s various components more efficient.

Stay tuned for future developments in Ignition and V8.

by Ross McIlroy, V8 Ignition Jump Starter

V8 Release 5.4

Every six weeks, we create a new branch of V8 as part of our release process. Each version is branched from V8’s git master immediately before a Chrome Beta milestone. Today we’re pleased to announce our newest branch, V8 version 5.4, which will be in beta until it is released in coordination with Chrome 54 Stable in several weeks. V8 5.4 is filled with all sorts of developer-facing goodies, so we’d like to give you a preview of some of the highlights in anticipation of the release.

Performance Improvements

V8 5.4 delivers a number of key improvements in memory footprint and startup speed. These primarily help accelerate initial script execution and reduce page load time in Chrome.

Memory

When measuring V8’s memory consumption, two metrics are very important to monitor and understand: Peak memory consumption and average memory consumption. Typically, reducing peak consumption is just as important as reducing average consumption since an executing script that exhausts available memory even for a brief moment can cause an Out of Memory crash, even if its average memory consumption is not very high. For optimization purposes, it’s useful to divide V8's memory into two categories: On-heap memory containing actual JavaScript objects and off-heap memory containing the rest, such as internal data structures allocated by the compiler, parser and garbage collector.

In 5.4 we tuned V8’s garbage collector for low-memory devices with 512 MB of RAM or less. Depending on the website displayed, this reduces peak memory consumption of on-heap memory by up to 40%.

Memory management inside V8’s JavaScript parser was simplified to avoid unnecessary allocations, reducing off-heap peak memory usage by up to 20%. These memory savings are especially helpful in reducing memory usage of large script files, including asm.js applications.

Startup & speed

Our work to streamline V8's parser not only helped reduce memory consumption, it also improved the parser's runtime performance. This streamlining, combined with other optimizations of JavaScript builtins and how accesses of properties on JavaScript objects use global inline caches, resulted in notable startup performance gains.

Our internal startup test suite that measures real-world JavaScript performance improved by a median of 5%. The Speedometer benchmark also benefits from these optimizations, improving by ~10-13% compared to 5.2.
~13% reduction on Speedometer/Mac

V8 API

Please check out our summary of API changes. This document is regularly updated a few weeks after each major release.


Developers with an active V8 checkout can use 'git checkout -b 5.4 -t branch-heads/5.4' to experiment with the new features in V8 5.4. Alternatively you can subscribe to Chrome's Beta channel and try the new features out yourself soon.

Posted by the V8 team

Fall cleaning: Optimizing V8 memory consumption

Memory consumption is an important dimension in the JavaScript virtual machine performance trade-off space. Over the last few months the V8 team analyzed and significantly reduced the memory footprint of several websites that were identified as representative of modern web development patterns. In this blog post we present the workloads and tools we used in our analysis, outline memory optimizations in the garbage collector, and show how we reduced memory consumed by V8’s parser and its compilers.

Benchmarks

In order to profile V8 and discover optimizations that have impact for the largest number of users, it is crucial to define workloads that are reproducible, meaningful, and simulate common real-world JavaScript usage scenarios. A great tool for this task is Telemetry, a performance testing framework that runs scripted website interactions in Chrome and records all server responses in order to enable predictable replay of these interactions in our test environment. We selected a set of popular news, social, and media websites and defined the following common user interactions for them:

A workload for browsing news and social websites:
  1. Open a popular news or social website, e.g. hackernews.
  2. Click on the first link.
  3. Wait until the new website is loaded.
  4. Scroll down a few pages.
  5. Click the back button.
  6. Click on the next link on the original website and repeat steps 3-6 a few times.
A workload for browsing media websites:
  1. Open an item on a popular media website, e.g. a video on YouTube.
  2. Consume that item by waiting for a few seconds.
  3. Click on the next item and repeat steps 2-3 a few times.
Once a workflow is captured, it can be replayed as often as needed against a development version of Chrome, for example each time there is a new version of V8. During playback, V8’s memory usage is sampled at fixed time intervals to obtain a meaningful average. The benchmarks can be found here.

Memory Visualization

One of the main challenges when optimizing for performance in general is to get a clear picture of internal VM state to track progress or weigh potential tradeoffs. For optimizing memory consumption, this means keeping accurate track of V8’s memory consumption during execution. There are two categories of memory that must be tracked: memory allocated to V8’s managed heap and memory allocated on the C++ heap. The V8 Heap Statistics feature is a mechanism used by developers working on V8 internals to get deep insight into both. When the --trace-gc-object-stats flag is specified when running Chrome (M54 or newer) or the d8 command line interface, V8 dumps memory-related statistics to the console. We built a custom tool, the V8 heap visualizer, to visualize this output. The tool shows a timeline-based view for both the managed and C++ heaps. The tool also provides a detailed breakdown of the memory usage of certain internal data types and size-based histograms for each of those types.

A common workflow during our optimization efforts involves selecting an instance type that takes up a large portion of the heap in the timeline view, as depicted in Figure 1. Once an instance type is selected, the tool then shows a distribution of uses of this type. In this example we selected V8’s internal FixedArray data structure, which is an untyped vector-like container used ubiquitously in all sorts of places in the VM. Figure 2 shows a typical FixedArray distribution, where we can see that the majority of memory can be attributed to a specific FixedArray usage scenario. In this case FixedArrays are used as the backing store for sparse JavaScript arrays (what we call DICTIONARY_ELEMENTS). With this information it is possible to refer back to the actual code and either verify whether this distribution is indeed the expected behavior or whether an optimization opportunity exists. We used the tool to identify inefficiencies with a number of internal types.

Figure 1: Timeline view of managed heap and off-heap memory

Figure 2: Distribution of instance type
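
As a JavaScript-level illustration of the DICTIONARY_ELEMENTS case discussed above, sparse arrays are typically switched to a dictionary-mode backing store; the exact transition heuristics are V8 internals and may vary by version.

// A densely packed array uses a flat, contiguous backing store.
const dense = [1, 2, 3];

// Writing far beyond the current length makes the array sparse;
// V8 then typically backs its elements with a dictionary
// (DICTIONARY_ELEMENTS) rather than a huge flat FixedArray.
const sparse = [];
sparse[0] = 1;
sparse[999999] = 2;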

Figure 3 shows C++ heap memory consumption, which consists primarily of zone memory (temporary memory regions allocated by V8 for a short period of time; discussed in more detail below). Since zone memory is used most extensively by the V8 parser and compilers, the spikes correspond to parsing and compilation events. A well-behaved execution consists only of spikes, indicating that memory is freed as soon as it is no longer needed. In contrast, plateaus (i.e. longer periods of time with higher memory consumption) indicate that there is room for optimization.

Figure 3: Zone memory

Early adopters can also try out the integration into Chrome’s tracing infrastructure. To do so, run the latest Chrome Canary with --track-gc-object-stats and capture a trace including the category v8.gc_stats. The data will then show up as V8.GC_Object_Stats events.

JavaScript Heap Size Reduction

There is an inherent trade-off between garbage collection throughput, latency, and memory consumption. For example, garbage collection latency (which causes user-visible jank) can be reduced by using more memory to avoid frequent garbage collection invocations. For low-memory mobile devices, i.e. devices with under 512 MB of RAM, prioritizing latency and throughput over memory consumption may result in out-of-memory crashes and suspended tabs on Android.

To better balance the right tradeoffs for these low-memory mobile devices, we introduced a special memory reduction mode which tunes several garbage collection heuristics to lower memory usage of the JavaScript garbage collected heap:
  1. At the end of a full garbage collection, V8’s heap growing strategy determines when the next garbage collection will happen based on the amount of live objects with some additional slack. In memory reduction mode, V8 uses less slack, resulting in less memory usage due to more frequent garbage collections.
  2. Moreover, this estimate is treated as a hard limit, forcing unfinished incremental marking work to be finalized in the main garbage collection pause. Normally, when not in memory reduction mode, unfinished incremental marking work may result in going over this limit arbitrarily, triggering the main garbage collection pause only when marking is finished.
  3. Memory fragmentation is further reduced by performing more aggressive memory compaction.

Figure 4 depicts some of the improvements on low memory devices since Chrome M53. Most noticeably, the average V8 heap memory consumption of the mobile New York Times benchmark reduced by about 66%. Overall, we observed a 50% reduction of average V8 heap size on this set of benchmarks.

Figure 4: V8 heap memory reduction since M53 on low memory devices

Another optimization introduced recently not only reduces memory on low-memory devices but also on beefier mobile and desktop machines. Reducing the V8 heap page size from 1 MB to 512 KB results in a smaller memory footprint when not many live objects are present and reduces overall memory fragmentation by up to 2x. It also allows V8 to perform more compaction work, since smaller work chunks allow more work to be done in parallel by the memory compaction threads.

Zone Memory Reduction

In addition to the JavaScript heap, V8 uses off-heap memory for internal VM operations. The largest chunk of memory is allocated through memory areas called zones. Zones are a type of region-based memory allocator which enables fast allocation and bulk deallocation, where all zone-allocated memory is freed at once when the zone is destroyed. Zones are used throughout V8’s parser and compilers.

One of the major improvements in M55 comes from reducing memory consumption during background parsing. Background parsing allows V8 to parse scripts while a page is being loaded. The memory visualization tool helped us discover that the background parser would keep an entire zone alive long after the code was already compiled. By immediately freeing the zone after compilation, we reduced the lifetime of zones significantly which resulted in reduced average and peak memory usage.

Another improvement results from better packing of fields in abstract syntax tree nodes generated by the parser. Previously we relied on the C++ compiler to pack fields together where possible. For example, two booleans just require two bits and should be located within one word or within the unused fraction of the previous word. The C++ compiler doesn’t always find the most compressed packing, so we instead manually pack bits, as sketched below. This not only results in reduced peak memory usage, but also improved parser and compiler performance.
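
Here is a minimal JavaScript sketch of the manual bit-packing idea (V8 does this in C++ for AST node fields; the field layout below is hypothetical):

// Pack two boolean flags and a small integer into one number.
const IS_ASYNC = 1 << 0;      // bit 0
const IS_GENERATOR = 1 << 1;  // bit 1
const ARITY_SHIFT = 2;        // bits 2 and up store the arity

function packFlags(isAsync, isGenerator, arity) {
  return (isAsync ? IS_ASYNC : 0) |
         (isGenerator ? IS_GENERATOR : 0) |
         (arity << ARITY_SHIFT);
}

const packed = packFlags(true, false, 3);
console.log((packed & IS_ASYNC) !== 0);  // true
console.log(packed >> ARITY_SHIFT);      // 3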

Figure 5 shows the peak zone memory improvements since M54, which amount to a reduction of about 40% on average over the measured websites.

Figure 5: V8 peak zone memory reduction since M54 on desktop

Over the next months we will continue our work on reducing the memory footprint of V8. We have more zone memory optimizations planned for the parser and we plan to focus on devices with 512 MB to 1 GB of memory.

Update: All the improvements discussed above reduce the Chrome 55 overall memory consumption by up to 35% on low-memory devices compared to Chrome 53.  Other device segments will only benefit from the zone memory improvements.

Posted by the V8 Memory Sanitation Engineers Ulan Degenbaev, Michael Lippautz, Hannes Payer, and Toon Verwaest.





V8 Release 5.5

Every six weeks, we create a new branch of V8 as part of our release process. Each version is branched from V8’s git master immediately before a Chrome Beta milestone. Today we’re pleased to announce our newest branch, V8 version 5.5, which will be in beta until it is released in coordination with Chrome 55 Stable in several weeks. V8 5.5 is filled with all sorts of developer-facing goodies, so we’d like to give you a preview of some of the highlights in anticipation of the release.

Language features


Async functions

In 5.5, V8 ships JavaScript ES2017 async functions, which make it easier to write code that uses and creates Promises. Using async functions, waiting for a Promise to resolve is as simple as typing await before it and proceeding as if the value were synchronously available, with no callbacks required. See this article for an introduction.

Here’s an example function which fetches a URL and returns the text of the response, written in a typical asynchronous, Promise-based style.
function logFetch(url) {
  return fetch(url)
    .then(response => response.text())
    .then(text => {
      console.log(text);
    })
    .catch(err => {
      console.error('fetch failed', err);
    });
}
Here’s the same code rewritten to remove callbacks, using async functions.
async function logFetch(url) {
  try {
    const response = await fetch(url);
    console.log(await response.text());
  } catch (err) {
    console.log('fetch failed', err);
  }
}

Performance improvements

V8 5.5 delivers a number of key improvements in memory footprint.

Memory

Memory consumption is an important dimension in the JavaScript virtual machine performance trade-off space. Over the last few releases, the V8 team analyzed and significantly reduced the memory footprint of several websites that were identified as representative of modern web development patterns. V8 5.5 reduces Chrome’s overall memory consumption by up to 35% on low-memory devices (compared to V8 5.3 in Chrome 53) due to reductions in the V8 heap size and zone memory usage. Other device segments also benefit from the zone memory reductions. Please have a look at the dedicated blog post to get a detailed view.

V8 API

Please check out our summary of API changes. This document is regularly updated a few weeks after each major release. 

V8 inspector migrated

The V8 inspector was migrated from Chromium to V8. The inspector code now fully resides in the V8 repository.

Developers with an active V8 checkout can use 'git checkout -b 5.5 -t branch-heads/5.5' to experiment with the new features in V8 5.5. Alternatively you can subscribe to Chrome's Beta channel and try the new features out yourself soon.

Posted by the V8 team

WebAssembly Browser Preview

Today we’re happy to announce, in tandem with Firefox and Edge, a WebAssembly Browser Preview. WebAssembly or wasm is a new runtime and compilation target for the web, designed by collaborators from Google, Mozilla, Microsoft, Apple, and the W3C WebAssembly Community Group.

What does this milestone mark?

This milestone is significant because it marks:
  • a release candidate for our MVP (minimum viable product) design (including semantics, binary format, and JS API)
  • compatible and stable implementations of WebAssembly behind a flag on trunk in V8 and SpiderMonkey, in development builds of Chakra, and in progress in JavaScriptCore
  • a working toolchain for developers to compile WebAssembly modules from C/C++ source files
  • a roadmap to ship WebAssembly on-by-default barring changes based on community feedback 
You can read more about WebAssembly on the project site as well as follow our developers guide to test out WebAssembly compilation from C & C++ using Emscripten. The binary format and JS API documents outline the binary encoding of WebAssembly and the mechanism to instantiate WebAssembly modules in the browser, respectively. Here’s a quick sample to show what wasm looks like:

Raw Bytes      Text Format          C Source
02 40          block
03 40            loop
20 00              get_local 0
45                 i32.eqz
0d 01              br_if 1          while (x != 0) {
20 00              get_local 0
21 02              set_local 2        z = x;
20 01              get_local 1
20 00              get_local 0
6f                 i32.rem_s
21 00              set_local 0        x = y % x;
20 02              get_local 2
21 01              set_local 1        y = z;
0c 00              br 0
0b               end
0b             end                  }
20 01          get_local 1          return y;

Greatest Common Divisor function

Since WebAssembly is still behind a flag in Chrome (chrome://flags/#enable-webassembly), it is not yet recommended for production use. However, the Browser Preview period marks a time during which we are actively collecting feedback on the design and implementation of the spec. Developers are encouraged to test out compiling and porting applications and running them in the browser.

V8 continues to optimize the implementation of WebAssembly in the TurboFan compiler. Since last March when we first announced experimental support, we’ve added support for parallel compilation. In addition, we’re nearing completion of an alternate asm.js pipeline, which converts asm.js to WebAssembly under the hood so that existing asm.js sites can reap some of the benefits of WebAssembly ahead-of-time compilation.

What's next?

Barring major design changes arising from community feedback, the WebAssembly Community Group plans to produce an official specification in Q1 2017, at which point browsers will be encouraged to ship WebAssembly on-by-default. From that point forward, the binary format will be reset to version 1 and WebAssembly will be versionless, feature-tested, and backwards-compatible. A more detailed roadmap can be found on the WebAssembly project site.

V8 Release 5.6

Every six weeks, we create a new branch of V8 as part of our release process. Each version is branched from V8’s git master immediately before a Chrome Beta milestone. Today we’re pleased to announce our newest branch, V8 version 5.6, which will be in beta until it is released in coordination with Chrome 56 Stable in several weeks. V8 5.6 is filled with all sorts of developer-facing goodies, so we’d like to give you a preview of some of the highlights in anticipation of the release.

Ignition and TurboFan pipeline for ES.next (and more) shipped

Starting with 5.6, V8 can optimize the entirety of the JavaScript language. Moreover, many language features are sent through a new optimization pipeline in V8. This pipeline uses V8’s Ignition interpreter as a baseline and optimizes frequently executed methods with V8’s more powerful TurboFan optimizing compiler. The new pipeline activates for new language features (e.g. many of the new features from the ES2015 and ES2016 specifications) or whenever Crankshaft (V8’s “classic” optimizing compiler) cannot optimize a method (e.g. try-catch, with).
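
For example, a function like the following contains a try-catch and therefore could not be optimized by Crankshaft; under the new pipeline it runs in Ignition and can be optimized by TurboFan (the function itself is just an illustration):

// Previously stuck in unoptimized code because of the try-catch;
// now eligible for TurboFan optimization via the Ignition pipeline.
function parseOrDefault(json, fallback) {
  try {
    return JSON.parse(json);
  } catch (e) {
    return fallback;
  }
}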

Why are we only routing some JavaScript language features through the new pipeline? 

The new pipeline is better-suited to optimizing the whole spectrum of the JS language (past and present). It's a healthier, more modern codebase, and it has been designed specifically for real-world use cases including running V8 on low-memory devices.

We've started using the Ignition/TurboFan pipeline with the newest ES.next features we've added to V8 (ES.next = JavaScript features as specified in ES2015 and later) and will route more features through it as we continue improving its performance. In the middle term, the V8 team is aiming to switch all JavaScript execution in V8 to the new pipeline. However, as long as there are still real-world use cases where Crankshaft runs JavaScript faster than the new Ignition/TurboFan pipeline, for the short term we'll support both pipelines to ensure that JavaScript code running in V8 is as fast as possible in all situations.

So, why does the new pipeline use both the new Ignition interpreter and the new TurboFan optimizing compiler?

Running JavaScript fast and efficiently requires having multiple mechanisms, or tiers, under the hood in a JavaScript virtual machine to do the low-level busywork of execution. For example, it’s useful to have a first tier that starts executing code quickly, and then a second optimizing tier that spends longer compiling hot functions in order to maximize performance for longer-running code.

Ignition and TurboFan are V8’s two new execution tiers that are most effective when used together. Due to efficiency, simplicity and size considerations, TurboFan is designed to optimize JavaScript methods starting from the bytecode produced by V8's Ignition interpreter. By designing both components to work closely together, there are optimizations that can be made to both because of the presence of the other. As a result, starting with 5.6 all functions which will be optimized by TurboFan first run through the Ignition interpreter. Using this unified Ignition/TurboFan pipeline enables the optimization of features that were not optimizable in the past, since they can now take advantage of TurboFan's optimization passes. For example, by routing generators through both Ignition and TurboFan, generator runtime performance has nearly tripled.
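
As a generic illustration (not a benchmark from this post), generator-heavy code like the following now benefits from TurboFan's optimization passes:

// A lazy Fibonacci sequence as a generator.
function* fibonacci() {
  let [a, b] = [0, 1];
  while (true) {
    yield a;
    [a, b] = [b, a + b];
  }
}

const fib = fibonacci();
for (let i = 0; i < 10; i++) {
  console.log(fib.next().value);  // 0, 1, 1, 2, 3, 5, 8, 13, 21, 34
}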

For more information on V8's journey to adopt Ignition and TurboFan please have a look at Benedikt's dedicated blog post.

Performance improvements

V8 5.6 delivers a number of key improvements in memory footprint and performance.

Memory-induced jank

Concurrent remembered set filtering was introduced: one more step towards Orinoco.

Greatly improved ES2015 performance

Developers typically start using new language features with the help of transpilers because of two challenges: backwards-compatibility and performance concerns.

V8's goal is to reduce the performance gap between transpilers and V8’s “native” ES.next performance in order to eliminate the latter challenge. We’ve made great progress in bringing the performance of new language features on par with their transpiled ES5 equivalents. In this release you will find the performance of ES2015 features is significantly faster than in previous V8 releases, and in some cases ES2015 feature performance is approaching that of transpiled ES5 equivalents.

In particular, the spread operator should now be ready to be used natively. Instead of writing ...

// Like Math.max, but returns 0 instead of -∞ for no arguments.
function specialMax(...args) {
  if (args.length === 0) return 0;
  return Math.max.apply(Math, args);
}

… you should now be able to write ...

function specialMax(...args) {
  if (args.length === 0) return 0;
  return Math.max(...args);
}

… and get similar performance results. 5.6 also includes speed-ups for a number of ES2015 micro-benchmarks; see the chart below for a comparison between V8 5.4 and 5.6.

Comparing the ES2015 feature performance of V8 5.4 and 5.6
Source: https://fhinkel.github.io/six-speed/ (cloned from http://kpdecker.github.io/six-speed/)

This is just the beginning; a lot more is to follow in upcoming releases!

Language features

String.prototype.padStart / String.prototype.padEnd

String.prototype.padStart and String.prototype.padEnd are the latest stage 4 additions to ECMAScript. These library functions are officially shipped in 5.6.
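
Both methods pad a string to a target length with an optional fill string (a space by default):

'5'.padStart(3, '0');    // '005'
'v8'.padEnd(6, '.');     // 'v8....'
'2016'.padStart(2, '0'); // '2016' (already longer than the target)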

WebAssembly browser preview

Chromium 56 (which includes 5.6) is going to ship the WebAssembly browser preview. Please refer to the dedicated blog post for further information.

V8 API

Please check out our summary of API changes. This document is regularly updated a few weeks after each major release.

Developers with an active V8 checkout can use 'git checkout -b 5.6 -t branch-heads/5.6' to experiment with the new features in V8 5.6. Alternatively you can subscribe to Chrome's Beta channel and try the new features out yourself soon.

Posted by the V8 team

V8 ❤️ Node.js

Node's popularity has been growing steadily over the last few years, and we have been working to make Node better. This blog post highlights some of the recent efforts in V8 and DevTools.


Debug Node.js in DevTools

You can now debug Node applications using the Chrome developer tools. The Chrome DevTools Team moved the source code that implements the debugging protocol from Chromium to V8, thereby making it easier for Node Core to stay up to date with the debugger sources and dependencies. Other browser vendors and IDEs use the Chrome debugging protocol as well, collectively improving the developer experience when working with Node.

ES6 Speed-ups

We are working hard on making V8 faster than ever. A lot of our recent performance work centers around ES6 features, including promises, generators, destructuring, and rest/spread operators. Because the versions of V8 in Node 6.2 and onwards fully support ES6, Node developers can use new language features "natively", without polyfills. This means that Node developers are often the first to benefit from ES6 performance improvements. Similarly, they are often the first to recognize performance regressions. Thanks to an attentive Node community, we discovered and fixed a number of regressions, including performance issues with instanceof, buffer.length, long argument lists, and let/const.

Fixes for Node.js vm module and REPL coming

The vm module has had some long standing limitations. In order to address these issues properly, we have extended the V8 API to implement more intuitive behavior. We are excited to announce that the vm module improvements are one of the projects we’re supporting as mentors in Outreachy for the Node Foundation. We hope to see additional progress on this project and others in the near future.
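
For context, the vm module compiles and runs JavaScript in a separate context; a minimal usage sketch:

// Run code against a sandboxed context object.
const vm = require('vm');

const sandbox = { x: 41 };
vm.createContext(sandbox);  // "contextify" the object
console.log(vm.runInContext('x + 1', sandbox));  // 42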

Async/await

With async functions, you can drastically simplify asynchronous code by rewriting program flow to await promises sequentially. Async/await will land in Node with the next V8 update. Our recent work on improving the performance of promises and generators has helped make async functions fast. On a related note, we are also working on providing promise hooks, a set of introspection APIs needed for the Node AsyncHook API.

Want to try Bleeding Edge Node.js?

If you’re excited to test the newest V8 features in Node and don’t mind using bleeding edge, unstable software, you can try out our integration branch here. V8 is continuously integrated into Node before V8 hits Node master, so we can catch issues early. Be warned though, this is more experimental than Node master.

Posted by Franziska Hinkelmann, Node Monkey Patcher

How V8 measures real-world performance

Over the last year the V8 team has developed a new methodology to measure and understand real-world JavaScript performance. We’ve used the insights that we gleaned from it to change how the V8 team makes JavaScript faster. Our new real-world focus represents a significant shift from our traditional performance focus. We’re confident that as we continue to apply this methodology in 2017, it will significantly improve users’ and developers’ ability to rely on predictable performance from V8 for real-world JavaScript in both Chrome and Node.js.

The old adage “what gets measured gets improved” is particularly true in the world of JavaScript virtual machine (VM) development. Choosing the right metrics to guide performance optimization is one of the most important things a VM team can do over time. The following timeline roughly illustrates how JavaScript benchmarking has evolved since the initial release of V8:

Evolution of JavaScript benchmarks.

Historically, V8 and other JavaScript engines have measured performance using synthetic benchmarks. Initially, VM developers used microbenchmarks like SunSpider and Kraken. As the browser market matured a second benchmarking era began, during which they used larger but nevertheless synthetic test suites such as Octane and JetStream.

Microbenchmarks and static test suites have a few benefits: they’re easy to bootstrap, simple to understand, and able to run in any browser, making comparative analysis easy. But this convenience comes with a number of downsides. Because they include a limited number of test cases, it is difficult to design benchmarks which accurately reflect the characteristics of the web at large. Moreover, benchmarks are usually updated infrequently; thus, they tend to have a hard time keeping up with new trends and patterns of JavaScript development in the wild. Finally, over the years VM authors explored every nook and cranny of the traditional benchmarks, and in the process they discovered and took advantage of opportunities to improve benchmark scores by shuffling around or even skipping externally unobservable work during benchmark execution. This kind of benchmark-score-driven improvement and over-optimizing for benchmarks doesn’t always provide much user- or developer-facing benefit, and history has shown that over the long-term it’s very difficult to make an “ungameable” synthetic benchmark.

Measuring real websites: WebPageReplay & Runtime Call Stats

Given an intuition that we were only seeing one part of the performance story with traditional static benchmarks, the V8 team set out to measure real-world performance by benchmarking the loading of actual websites. We wanted to measure use cases that reflected how end users actually browsed the web, so we decided to derive performance metrics from websites like Twitter, Facebook, and Google Maps. Using a piece of Chrome infrastructure called WebPageReplay we were able to record and replay page loads deterministically.

In tandem, we developed a tool called Runtime Call Stats which allowed us to profile how different JavaScript code stressed different V8 components. For the first time, we had the ability not only to test V8 changes easily against real websites, but to fully understand how and why V8 performed differently under different workloads.

We now monitor changes against a test suite of approximately 25 websites in order to guide V8 optimization. In addition to the aforementioned websites and others from the Alexa Top 100, we selected sites which were implemented using common frameworks (React, Polymer, Angular, Ember, and more), sites from a variety of different geographic locales, and sites or libraries whose development teams have collaborated with us, such as Wikipedia, Reddit, Twitter, and Webpack. We believe these 25 sites are representative of the web at large and that performance improvements to these sites will be directly reflected in similar speedups for sites being written today by JavaScript developers.

For an in-depth presentation about the development of our test suite of websites and Runtime Call Stats, see the BlinkOn 6 presentation on real-world performance. You can even run the Runtime Call Stats tool yourself.


Making a real difference

Analyzing these new, real-world performance metrics and comparing them to traditional benchmarks with Runtime Call Stats has also given us more insight into how various workloads stress V8 in different ways.

From these measurements, we discovered that Octane performance was actually a poor proxy for performance on the majority of our 25 tested websites. You can see in the chart below that Octane’s color bar distribution is very different from that of any other workload, especially those for the real-world websites. When running Octane, V8’s bottleneck is often the execution of JavaScript code. However, most real-world websites instead stress V8’s parser and compiler. We realized that optimizations made for Octane often lacked impact on real-world web pages, and in some cases these optimizations made real-world websites slower.

Distribution of time running all of Octane, running the line-items of Speedometer and loading websites from our test suite on Chrome M57.

We also discovered that another benchmark was actually a better proxy for real websites. Speedometer, a WebKit benchmark that includes applications written in React, Angular, Ember, and other frameworks, demonstrated a very similar runtime profile to the 25 sites. Although no benchmark matches the fidelity of real web pages, we believe Speedometer does a better job of approximating the real-world workloads of modern JavaScript on the web than Octane.

Bottom line: A faster V8 for all

Over the course of the past year, the real-world website test suite and our Runtime Call Stats tool have allowed us to deliver V8 performance optimizations that speed up page loads across the board by an average of 10-20%. Given the historical focus on optimizing page load across Chrome, a double-digit improvement to the metric in 2016 is a significant achievement. The same optimizations also improved our score on Speedometer by 20-30%.

These performance improvements should be reflected in other sites written by web developers using modern frameworks and similar patterns of JavaScript. Our improvements to builtins such as Object.create and Function.prototype.bind, optimizations around the object factory pattern, work on V8’s inline caches, and ongoing parser improvements are intended to be generally applicable improvements to overlooked areas of JavaScript used by all developers, not just the representative sites we track.

We plan to expand our usage of real websites to guide V8 performance work. Stay tuned for more insights about benchmarks and script performance.

Posted by the V8 team

Speeding up V8 Regular Expressions

This blog post covers V8's recent migration of RegExp's built-in functions from a self-hosted JavaScript implementation to one that hooks straight into our new code generation architecture based on TurboFan.

V8’s RegExp implementation is built on top of Irregexp, which is widely considered to be one of the fastest RegExp engines. While the engine itself encapsulates the low-level logic to perform pattern matching against strings, functions on the RegExp prototype such as RegExp.prototype.exec do the additional work required to expose its functionality to the user.

Historically, various components of V8 have been implemented in JavaScript. Until recently, regexp.js has been one of them, hosting the implementation of the RegExp constructor, all of its properties as well as its prototype’s properties.

Unfortunately this approach has disadvantages, including unpredictable performance and expensive transitions to the C++ runtime for low-level functionality. The recent addition of built-in subclassing in ES6 (allowing JavaScript developers to provide their own customized RegExp implementation) has resulted in a further RegExp performance penalty, even if the RegExp built-in is not subclassed. These regressions could not be fully addressed in the self-hosted JavaScript implementation.

We therefore decided to migrate the RegExp implementation away from JavaScript. However, preserving performance turned out to be more difficult than expected. An initial migration to a full C++ implementation was significantly slower, reaching only around 70% of the original implementation’s performance. After some investigation, we found several causes:

  • RegExp.prototype.exec contains a couple of extremely performance-sensitive areas, most notably including the transition to the underlying RegExp engine, and construction of the RegExp result with its associated substring calls. For these, the JavaScript implementation relied on highly optimized pieces of code called 'stubs', written either in native assembly language or by hooking directly into the optimizing compiler pipeline. It is not possible to access these stubs from C++, and their runtime equivalents are significantly slower.
  • Accesses to properties such as RegExp's lastIndex can be expensive, possibly requiring lookups by name and traversal of the prototype chain. V8's optimizing compiler can often automatically replace such accesses with more efficient operations, while these cases would need to be handled explicitly in C++.
  • In C++, references to JavaScript objects must be wrapped in so-called Handles in order to cooperate with garbage collection. Handle management produces further overhead in comparison to the plain JavaScript implementation.

Our new design for the RegExp migration is based on the CodeStubAssembler, a mechanism that allows V8 developers to write platform-independent code which will later be translated into fast, platform-specific code by the same backend that is also used for the new optimizing compiler TurboFan. Using the CodeStubAssembler allows us to address all shortcomings of the initial C++ implementation. Stubs (such as the entry-point into the RegExp engine) can easily be called from the CodeStubAssembler. While fast property accesses still need to be explicitly implemented on so-called fast paths, such accesses are extremely efficient in the CodeStubAssembler. Handles simply do not exist outside of C++. And since the implementation now operates at a very low level, we can take further shortcuts such as skipping expensive result construction when it is not needed.

Results have been very positive. Our score on a substantial RegExp workload has improved by 15%, more than regaining our recent subclassing-related performance losses. Microbenchmarks (Figure 1) show improvements across the board, from 7% for RegExp.prototype.exec, up to 102% for RegExp.prototype[@@split].

Figure 1: RegExp speedup broken down by function.
So how can you, as a JavaScript developer, ensure that your RegExps are fast? If you are not interested in hooking into RegExp internals, make sure that neither the RegExp instance nor its prototype is modified in order to get the best performance:
var re = /./g;
re.exec('');  // Fast path.
re.new_property = 'slow';
RegExp.prototype.new_property = 'also slow';
re.exec('');  // Slow path.
And while RegExp subclassing may be quite useful at times, be aware that subclassed RegExp instances require more generic handling and thus take the slow path:
class SlowRegExp extends RegExp {}
new SlowRegExp(".", "g").exec('');  // Slow path.
The full RegExp migration will be available in V8 5.7.

Posted by Jakob Gruber, Regular Software Engineer

V8 Release 5.7


Every six weeks, we create a new branch of V8 as part of our release process. Each version is branched from V8’s git master immediately before a Chrome Beta milestone. Today we’re pleased to announce our newest branch, V8 version 5.7, which will be in beta until it is released in coordination with Chrome 57 Stable in several weeks. V8 5.7 is filled with all sorts of developer-facing goodies. We’d like to give you a preview of some of the highlights in anticipation of the release.

Performance improvements

Native async functions as fast as promises

Async functions are now approximately as fast as the same code written with promises. The execution performance of async functions quadrupled according to our microbenchmarks. During the same period, overall promise performance also doubled.

Async performance improvements in V8 on Linux x64

Continued ES2015 improvements

V8 continues to make ES2015 language features faster so that developers can use new features without incurring performance costs. The spread operator, destructuring and generators are now approximately as fast as their naive ES5 equivalents, as sketched below.
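
For instance, a destructuring-heavy function like the first one below should now perform roughly on par with its hand-written ES5 counterpart (a generic sketch, not one of our benchmarks):

// ES2015 array destructuring in the parameter list ...
function distance([x1, y1], [x2, y2]) {
  return Math.hypot(x2 - x1, y2 - y1);
}

// ... versus the naive ES5 equivalent.
function distanceES5(p1, p2) {
  var x1 = p1[0], y1 = p1[1];
  var x2 = p2[0], y2 = p2[1];
  return Math.hypot(x2 - x1, y2 - y1);
}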

RegExp 15% faster

Migrating RegExp functions from a self-hosted JavaScript implementation to one that hooks into TurboFan’s code generation architecture has yielded ~15% faster overall RegExp performance. More details can be found in the dedicated blog post.

New library features

Several recent additions to the ECMAScript standard library are included in this release. Two String methods, padStart and padEnd, provide helpful string formatting features, while Intl.DateTimeFormat.prototype.formatToParts gives authors the ability to customize their date/time formatting in a locale-aware manner.
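
For example, formatToParts returns the formatted date as an array of typed tokens that can be rearranged or restyled individually (output shown is indicative for the en-US locale):

const formatter = new Intl.DateTimeFormat('en-US', {
  year: 'numeric', month: 'long', day: 'numeric'
});
const parts = formatter.formatToParts(new Date(2017, 0, 15));
// e.g. [ { type: 'month', value: 'January' },
//        { type: 'literal', value: ' ' },
//        { type: 'day', value: '15' },
//        { type: 'literal', value: ', ' },
//        { type: 'year', value: '2017' } ]
console.log(parts.map(p => p.value).join(''));  // 'January 15, 2017'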

WebAssembly enabled

Chrome 57 (which includes V8 5.7) will be the first release to enable WebAssembly by default. For more details, see the getting started documents on webassembly.org and the API documentation on MDN.

V8 API additions

Please check out our summary of API changes. This document is regularly updated a few weeks after each major release.

Developers with an active V8 checkout can use 'git checkout -b 5.7 -t branch-heads/5.7' to experiment with the new features in V8 5.7. Alternatively you can subscribe to Chrome's Beta channel and try the new features out yourself soon.

PromiseHook

This C++ API allows users to implement profiling code that traces through the lifecycle of promises. This enables Node’s upcoming AsyncHook API which lets you build async context propagation.

The PromiseHook API provides four lifecycle hooks: init, resolve, before, and after. The init hook is run when a new promise is created, the resolve hook is run when a promise is resolved, and the before and after hooks are run right before and after a PromiseReactionJob. For more information please check out the tracking issue and design document.
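
PromiseHook is a C++ embedder API, so it is not callable from script, but the points at which its hooks fire can be sketched against ordinary promise code (annotations are conceptual, not an API):

const p = new Promise(resolve => {  // init fires for p
  resolve(42);                      // resolve fires for p
});
const q = p.then(value => {         // init fires for the derived promise q
  // before fired just prior to this reaction job ...
  return value + 1;
  // ... and after fires once it completes; then resolve fires for q
});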

Posted by the V8 team

One small step for Chrome, one giant heap for V8

V8 has a hard limit on its heap size. This serves as a safeguard against applications with memory leaks. When an application reaches this hard limit, V8 does a series of last resort garbage collections. If the garbage collections do not help to free memory, V8 stops execution and reports an out-of-memory failure. Without the hard limit, a memory-leaking application could use up all system memory, hurting the performance of other applications.

Ironically, this safeguard mechanism makes investigation of memory leaks harder for JavaScript developers. The application can run out of memory before the developer manages to inspect the heap in DevTools. Moreover, the DevTools process itself can run out of memory because it uses an ordinary V8 instance. For example, taking a heap snapshot of this demo will abort execution due to out-of-memory on the current stable Chrome.

Historically the V8 heap limit was conveniently set to fit the signed 32-bit integer range with some margin. Over time this convenience led to sloppy code in V8 that mixed types of different bit widths, effectively breaking the ability to increase the limit. Recently we cleaned up the garbage collector code, enabling the use of larger heap sizes. DevTools already makes use of this feature and taking a heap snapshot in the previously mentioned demo works as expected in the latest Chrome Canary.

We also added a feature in DevTools to pause the application when it is close to running out of memory. This feature is useful to investigate bugs that cause the application to allocate a lot of memory in a short period of time. When running this demo with the latest Chrome Canary, DevTools pauses the application before the out-of-memory failure and increases the heap limit, giving the user a chance to inspect the heap, evaluate expressions on the console to free memory and then resume execution for further debugging.


V8 embedders can increase the heap limit using the set_max_old_space_size function of the ResourceConstraints API. But watch out, some phases in the garbage collector have a linear dependency on the heap size. Garbage collection pauses may increase with larger heaps.

Posted by guardians of heap Ulan Degenbaev, Hannes Payer, Michael Lippautz and DevTools master Alexey Kozyatinskiy.

Help us test the future of V8!

The V8 team is currently working on a new default compiler pipeline that will help us bring future speedups to real-world JavaScript. You can preview the new pipeline in Chrome Canary today to help us verify that there are no surprises when we roll out the new configuration for all Chrome channels.

The new compiler pipeline uses the Ignition interpreter and TurboFan compiler to execute all JavaScript (in place of the classic pipeline which consisted of the FullCodegen and Crankshaft compilers). A random subset of Chrome Canary and Chrome Developer channel users are already testing the new configuration. However, anyone can opt in to the new pipeline (or revert to the old one) by flipping a flag in about:flags.

You can help test the new pipeline by opting in and using it with Chrome on your favorite web sites. If you are a web developer, please test your web applications with the new compiler pipeline. If you notice a regression in stability, correctness, or performance, please report the issue to the V8 bug tracker.

How to enable the new pipeline

  1. Install the latest Canary
  2. Open URL "about:flags" in Chrome
  3. Search for "Experimental JavaScript Compilation Pipeline" and set it to "Enabled"

How to report problems

Please let us know if your browsing experience changes significantly when using the new pipeline over the default pipeline. If you are a web developer, please test the performance of the new pipeline on your (mobile) web application to see how it is affected. If you discover that your web application is behaving strangely (or tests are failing), please let us know:
  1. Ensure that you have correctly enabled the new pipeline as outlined in the previous section.
  2. Create a bug on V8's bug tracker.
  3. Attach sample code which we can use to reproduce the problem.
Posted by Daniel Clifford @expatdanno, Original Munich V8 Brewer

High-performance ES2015 and beyond

Over the last couple of months the V8 team focused on bringing the performance of newly added ES2015 and other even more recent JavaScript features on par with their transpiled ES5 counterparts.

Motivation

Before we go into the details of the various improvements, we should first consider why the performance of ES2015+ features matters despite the widespread usage of Babel in modern web development:
  1. First of all there are new ES2015 features that are only polyfilled on demand, for example the Object.assign builtin. When Babel transpiles object spread properties (which are heavily used by many React and Redux applications), it relies on Object.assign instead of an ES5 equivalent if the VM supports it.
  2. Polyfilling ES2015 features typically increases code size, which contributes significantly to the current web performance crisis, especially on mobile devices common in emerging markets. So the cost of just delivering, parsing and compiling the code can be fairly high, even before you get to the actual execution cost.
  3. And last but not least, the client side JavaScript is only one of the environments that relies on the V8 engine. There’s also Node.js for server side applications and tools, where developers don’t need to transpile to ES5 code, but can directly use the features supported by the relevant V8 version in the target Node.js release.
Let’s consider the following code snippet from the Redux documentation:


function todoApp(state = initialState, action) {
  switch (action.type) {
    case SET_VISIBILITY_FILTER:
      return { ...state, visibilityFilter: action.filter }
    default:
      return state
  }
}

There are two things in that code that demand transpilation: the default parameter for state and the spreading of state into the object literal. Babel generates the following ES5 code:


"use strict";

var _extends = Object.assign || function (target) { for (var i = 1; i < arguments.length; i++) { var source = arguments[i]; for (var key in source) { if (Object.prototype.hasOwnProperty.call(source, key)) { target[key] = source[key]; } } } return target; };

function todoApp() {
var state = arguments.length > 0 && arguments[0] !== undefined ? arguments[0] : initialState;
var action = arguments[1];

switch (action.type) {
case SET_VISIBILITY_FILTER:
return _extends({}, state, { visibilityFilter: action.filter });
default:
return state;
}
}

Now imagine that Object.assign is orders of magnitude slower than the polyfilled _extends generated by Babel. In that case upgrading from a browser that doesn’t support Object.assign to an ES2015-capable version of the browser would be a serious performance regression and probably hinder adoption of ES2015 in the wild.
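
Whether that is actually the case on a given engine is easy to check with a quick micro-benchmark along the following lines (a minimal sketch: the bench helper and the workload are made up for illustration, and _extends refers to the Babel helper above):

function bench(label, fn) {
  var start = Date.now();
  for (var i = 0; i < 1e6; i++) fn();
  console.log(label + ": " + (Date.now() - start) + " ms");
}

var state = { visibilityFilter: "SHOW_ALL", todos: [] };
bench("Object.assign", function () {
  return Object.assign({}, state, { visibilityFilter: "SHOW_DONE" });
});
bench("_extends polyfill", function () {
  return _extends({}, state, { visibilityFilter: "SHOW_DONE" });
});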

This example also highlights another important drawback of transpilation: The generated code that is shipped to the user is usually considerably bigger than the ES2015+ code that the developer initially wrote. In the example above, the original code is 203 characters (176 bytes gzipped) whereas the generated code is 588 characters (367 bytes gzipped). That’s already a factor of two increase in size. Let’s look at another example from the Async Iterators for JavaScript proposal:


async function* readLines(path) {
  let file = await fileOpen(path);

  try {
    while (!file.EOF) {
      yield await file.readLine();
    }
  } finally {
    await file.close();
  }
}

Babel translates these 187 characters (150 bytes gzipped) into a whopping 2987 characters (971 bytes gzipped) of ES5 code, not even counting the regenerator runtime that is required as an additional dependency:


"use strict";

var _asyncGenerator = function () { function AwaitValue(value) { this.value = value; } function AsyncGenerator(gen) { var front, back; function send(key, arg) { return new Promise(function (resolve, reject) { var request = { key: key, arg: arg, resolve: resolve, reject: reject, next: null }; if (back) { back = back.next = request; } else { front = back = request; resume(key, arg); } }); } function resume(key, arg) { try { var result = gen[key](arg); var value = result.value; if (value instanceof AwaitValue) { Promise.resolve(value.value).then(function (arg) { resume("next", arg); }, function (arg) { resume("throw", arg); }); } else { settle(result.done ? "return" : "normal", result.value); } } catch (err) { settle("throw", err); } } function settle(type, value) { switch (type) { case "return": front.resolve({ value: value, done: true }); break; case "throw": front.reject(value); break; default: front.resolve({ value: value, done: false }); break; } front = front.next; if (front) { resume(front.key, front.arg); } else { back = null; } } this._invoke = send; if (typeof gen.return !== "function") { this.return = undefined; } } if (typeof Symbol === "function"&& Symbol.asyncIterator) { AsyncGenerator.prototype[Symbol.asyncIterator] = function () { return this; }; } AsyncGenerator.prototype.next = function (arg) { return this._invoke("next", arg); }; AsyncGenerator.prototype.throw = function (arg) { return this._invoke("throw", arg); }; AsyncGenerator.prototype.return = function (arg) { return this._invoke("return", arg); }; return { wrap: function wrap(fn) { return function () { return new AsyncGenerator(fn.apply(this, arguments)); }; }, await: function await(value) { return new AwaitValue(value); } }; }();

var readLines = function () {
var _ref = _asyncGenerator.wrap(regeneratorRuntime.mark(function _callee(path) {
var file;
return regeneratorRuntime.wrap(function _callee$(_context) {
while (1) {
switch (_context.prev = _context.next) {
case 0:
_context.next = 2;
return _asyncGenerator.await(fileOpen(path));

case 2:
file = _context.sent;
_context.prev = 3;

case 4:
if (file.EOF) {
_context.next = 11;
break;
}

_context.next = 7;
return _asyncGenerator.await(file.readLine());

case 7:
_context.next = 9;
return _context.sent;

case 9:
_context.next = 4;
break;

case 11:
_context.prev = 11;
_context.next = 14;
return _asyncGenerator.await(file.close());

case 14:
return _context.finish(11);

case 15:
case "end":
return _context.stop();
}
}
}, _callee, this, [[3,, 11, 15]]);
}));

return function readLines(_x) {
return _ref.apply(this, arguments);
};
}();

This is a 650% increase in size (the generic _asyncGenerator function might be shareable depending on how you bundle your code, so you can amortize some of that cost across multiple uses of async iterators). We don’t think it’s viable to ship only code transpiled to ES5 long-term, as the increase in size will not only affect download time and cost, but will also add overhead to parsing and compilation. If we really want to drastically improve page load and snappiness of modern web applications, especially on mobile devices, we have to encourage developers not only to use ES2015+ when writing code, but also to ship it instead of transpiling to ES5, delivering fully transpiled bundles only to legacy browsers that don’t support ES2015. For VM implementors, this vision means we need to support ES2015+ features natively and provide reasonable performance.

Measurement methodology

As described above, absolute performance of ES2015+ features is not really an issue at this point. Instead, the highest priority currently is to ensure that performance of ES2015+ features is on par with their naive ES5 counterparts and, even more importantly, with the version generated by Babel. Conveniently, there was already a project called six-speed by Kevin Decker that accomplishes more or less exactly what we needed: a performance comparison of ES2015 features vs. naive ES5 vs. code generated by transpilers.

Six-Speed benchmark


So we decided to take that as the basis for our initial ES2015+ performance work. We forked it and added a couple of benchmarks. We focused on the most serious regressions first, i.e. line items where the slowdown from the naive ES5 version to the recommended ES2015+ version was above 2x, because our fundamental assumption is that the naive ES5 version will be at least as fast as the somewhat spec-compliant version that Babel generates.
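
To make the comparison concrete, here is the kind of variant triple such a harness compares, using default parameters as an example (a sketch of the pattern only, not code taken from six-speed):

// ES2015 version under test
function greet(name = "world") {
  return "Hello " + name;
}

// Naive hand-written ES5 counterpart
function greetNaive(name) {
  if (name === undefined) name = "world";
  return "Hello " + name;
}

// Spec-compliant ES5, roughly as a transpiler like Babel would emit it
function greetTranspiled() {
  var name = arguments.length > 0 && arguments[0] !== undefined
      ? arguments[0] : "world";
  return "Hello " + name;
}

The harness times each variant and reports the slowdown of the ES2015 version relative to the two ES5 versions.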

A modern architecture for a modern language

In the past, V8 had difficulties optimizing the kinds of language features found in ES2015+. For example, it never became feasible to add exception handling (i.e. try/catch/finally) support to Crankshaft, V8’s classic optimizing compiler. This meant V8’s ability to optimize an ES6 feature like for...of, which essentially has an implicit finally clause, was limited. Crankshaft’s limitations and the overall complexity of adding new language features to full-codegen, V8’s baseline compiler, made it inherently difficult to ensure new ES features were added and optimized in V8 as quickly as they were standardized.

Fortunately, Ignition and TurboFan (V8’s new interpreter and compiler pipeline) were designed to support the entire JavaScript language from the beginning, including advanced control flow, exception handling, and most recently for...of and destructuring from ES2015. The tight integration of the architecture of Ignition and TurboFan makes it possible to quickly add new features and to optimize them quickly and incrementally.

Many of the improvements we achieved for modern language features were only feasible with the new Ignition/TurboFan pipeline. Ignition and TurboFan proved especially critical to optimizing generators and async functions. Generators had long been supported by V8, but were not optimizable due to control flow limitations in Crankshaft. Async functions are essentially sugar on top of generators, so they fall into the same category. The new compiler pipeline leverages Ignition to make sense of the AST and generate bytecodes which desugar complex generator control flow into simpler local control-flow bytecodes. TurboFan can more easily optimize the resulting bytecodes since it doesn’t need to know anything specific about generator control flow, just how to save and restore a function’s state on yields.
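
Conceptually, the desugaring resembles the following hand-written state machine (a JavaScript sketch for intuition only; Ignition emits bytecodes, and the real encoding differs):

// function* counter() { yield 1; yield 2; }
// behaves roughly like:
function counterDesugared() {
  let state = 0;
  return {
    next() {
      switch (state) {
        case 0:
          state = 1;
          return { value: 1, done: false }; // first yield
        case 1:
          state = 2;
          return { value: 2, done: false }; // second yield
        default:
          return { value: undefined, done: true };
      }
    }
  };
}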

How JavaScript generators are represented in Ignition and TurboFan

State of the union

Our short-term goal was to reach less than 2x slowdown on average as soon as possible. We started by looking at the worst test first, and from Chrome M54 to Chrome M58 (Canary) we managed to reduce the number of tests with slowdown above 2x from 16 to 8, and at the same time reduce the worst slowdown from 19x in M54 to just 6x in M58 (Canary). We also significantly reduced the average and median slowdown during that period:


You can see a clear trend towards parity between ES2015+ and ES5. On average we improved performance relative to ES5 by over 47%. Here are some of the highlights we addressed since M54.


Most notably, we improved the performance of new language constructs that are based on iteration, like the spread operator, destructuring, and for...of loops. For example, this use of array destructuring


function fn() {
  var [c] = data;
  return c;
}

is now as fast as the naive ES5 version


function fn() {
  var c = data[0];
  return c;
}

and a lot faster (and shorter) than the Babel-generated code:


"use strict";

var _slicedToArray = function () { function sliceIterator(arr, i) { var _arr = []; var _n = true; var _d = false; var _e = undefined; try { for (var _i = arr[Symbol.iterator](), _s; !(_n = (_s = _i.next()).done); _n = true) { _arr.push(_s.value); if (i && _arr.length === i) break; } } catch (err) { _d = true; _e = err; } finally { try { if (!_n && _i["return"]) _i["return"](); } finally { if (_d) throw _e; } } return _arr; } return function (arr, i) { if (Array.isArray(arr)) { return arr; } else if (Symbol.iterator in Object(arr)) { return sliceIterator(arr, i); } else { throw new TypeError("Invalid attempt to destructure non-iterable instance"); } }; }();

function fn() {
var _data = data,
_data2 = _slicedToArray(_data, 1),
c = _data2[0];

return c;
}

You can check out the High-Speed ES2015 talk we gave at the last Munich NodeJS User Group meetup for additional details:



We are committed to continuing to improve the performance of ES2015+ features. In case you are interested in the nitty-gritty details, please have a look at V8's ES2015 and beyond performance plan.

Posted by Benedikt Meurer @bmeurer, EcmaScript Performance Engineer

Fast For-In in V8

For-in is a widely used language feature present in many frameworks. Despite its ubiquity, it is one of the more obscure language constructs from an implementation perspective. V8 went to great lengths to make this feature as fast as possible. Over the course of the past year, for-in became fully spec compliant and up to 3 times faster, depending on the context.

Many popular websites rely heavily on for-in and benefit from its optimization. For example, in early 2016 Facebook spent roughly 7% of its total JavaScript time during startup in the implementation of for-in itself. On Wikipedia this number was even higher at around 8%. By improving the performance of certain slow cases, Chrome 51 significantly improved the performance on these two websites:




Facebook and Wikipedia both improved their total script time by 4% due to various for-in improvements. Note that during the same period, the rest of V8 also got faster, which yielded a total scripting improvement of more than 4%.

In the rest of this blog post we will explain how we managed to speed up this core language feature and fix a long-standing spec violation at the same time.

The Spec

TL;DR: The for-in iteration semantics are fuzzy for performance reasons.

When we look at the spec text of for-in, it’s written in an unexpectedly fuzzy way, which is observable across different implementations. Let's look at an example of iterating over a Proxy object with the proper traps set.
let proxy = new Proxy({ a: 1, b: 1 }, {
  getPrototypeOf(target) {
    console.log("getPrototypeOf");
    return null;
  },
  ownKeys(target) {
    console.log("ownKeys");
    return Reflect.ownKeys(target);
  },
  getOwnPropertyDescriptor(target, prop) {
    console.log("getOwnPropertyDescriptor name=" + prop);
    return Reflect.getOwnPropertyDescriptor(target, prop);
  }
});

for (const key in proxy) console.log(key);

In V8/Chrome 56 you get the following output:
ownKeys
getPrototypeOf
getOwnPropertyDescriptor name=a
a
getOwnPropertyDescriptor name=b
b

In contrast, you will see a different order of statements for the same snippet in Firefox 51:
ownKeys 
getOwnPropertyDescriptor name=a
getOwnPropertyDescriptor name=b
getPrototypeOf
a
b

Both browsers respect the spec, but for once the spec does not enforce an explicit order of instructions. To understand these loopholes properly, let's have a look at the spec text:
EnumerateObjectProperties ( O )
When the abstract operation EnumerateObjectProperties is called with argument O, the following steps are taken:
  1. Assert: Type(O) is Object. 
  2. Return an Iterator object (25.1.1.2) whose next method iterates over all the String-valued keys of enumerable properties of O. The iterator object is never directly accessible to ECMAScript code. The mechanics and order of enumerating the properties is not specified but must conform to the rules specified below. 
Now, usually spec instructions are precise in what exact steps are required. But in this case they refer to a simple list of prose, and even the order of execution is left to implementers. Typically, the reason for this is that such parts of the spec were written after the fact, when JavaScript engines already had different implementations. The spec tries to tie up the loose ends by providing the following instructions:
  1. The iterator's throw and return methods are null and are never invoked. 
  2. The iterator's next method processes object properties to determine whether the property key should be returned as an iterator value. 
  3. Returned property keys do not include keys that are Symbols. 
  4. Properties of the target object may be deleted during enumeration. 
  5. A property that is deleted before it is processed by the iterator's next method is ignored. If new properties are added to the target object during enumeration, the newly added properties are not guaranteed to be processed in the active enumeration. 
  6. A property name will be returned by the iterator's next method at most once in any enumeration. 
  7. Enumerating the properties of the target object includes enumerating properties of its prototype, and the prototype of the prototype, and so on, recursively; but a property of a prototype is not processed if it has the same name as a property that has already been processed by the iterator's next method. 
  8. The values of [[Enumerable]] attributes are not considered when determining if a property of a prototype object has already been processed. 
  9. The enumerable property names of prototype objects must be obtained by invoking EnumerateObjectProperties passing the prototype object as the argument. 
  10. EnumerateObjectProperties must obtain the own property keys of the target object by calling its [[OwnPropertyKeys]] internal method. 
These steps sound tedious; however, the specification also contains an example implementation that is explicit and much more readable:
function* EnumerateObjectProperties(obj) {
  const visited = new Set();
  for (const key of Reflect.ownKeys(obj)) {
    if (typeof key === "symbol") continue;
    const desc = Reflect.getOwnPropertyDescriptor(obj, key);
    if (desc && !visited.has(key)) {
      visited.add(key);
      if (desc.enumerable) yield key;
    }
  }
  const proto = Reflect.getPrototypeOf(obj);
  if (proto === null) return;
  for (const protoKey of EnumerateObjectProperties(proto)) {
    if (!visited.has(protoKey)) yield protoKey;
  }
}

Now that you've made it this far, you might have noticed from the previous example that V8 does not exactly follow the spec's example implementation. As a start, the example for-in generator works incrementally, while V8 collects all keys upfront, mostly for performance reasons. This is perfectly fine; in fact, the spec text explicitly states that the order of the rules above is not defined. Nevertheless, as you will find out later in this post, there are some corner cases where V8 did not fully respect the specification until 2016.
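
You can observe the upfront collection with a simple experiment (per rule 5 above, what happens to properties added during enumeration is left to the implementation, so this shows V8's behavior, not a guarantee):

const object = { a: 1, b: 2 };
for (const key in object) {
  object.c = 3; // added while the loop is running
  console.log(key);
}
// V8 logs "a" and "b": the key list was snapshotted before the body ran,
// and the spec does not require "c" to be visited.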

The Enum Cache

The example implementation of the for-in generator follows an incremental pattern of collecting and yielding keys. In V8 the property keys are collected in a first step and only then used in the iteration phase. For V8 this makes a few things easier. To understand why, we need to have a look at the object model.

A simple object such as {a:"value a", b:"value b", c:"value c"} can have various internal representations in V8 as we will show in a detailed follow-up post on properties. This means that depending on what type of properties we have—in-object, fast or slow—the actual property names are stored in different places. This makes collecting enumerable keys a non-trivial undertaking.

V8 keeps track of the object's structure by means of a hidden class or so-called Map. Objects with the same Map have the same structure. Additionally each Map has a shared data-structure, the descriptor array, which contains details about each property, such as where the properties are stored on the object, the property name, and details such as enumerability.
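
A quick way to see Maps in action is V8’s internal %HaveSameMap helper (a debugging-only function available when running d8 or Node.js with --allow-natives-syntax; it is not part of JavaScript):

const a = { x: 1, y: 2 };
const b = { x: 3, y: 4 };
console.log(%HaveSameMap(a, b)); // true: same layout, hence same hidden class

b.z = 5; // adding a property transitions b to a different Map
console.log(%HaveSameMap(a, b)); // false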

Let’s for a moment assume that our JavaScript object has reached its final shape and no more properties will be added or removed. In this case we could use the descriptor array as a source for the keys. This works if there are only enumerable properties. To avoid the overhead of filtering out non-enumerable properties on each iteration, V8 uses a separate EnumCache accessible via the Map's descriptor array.


Given that V8 expects slow dictionary objects to change frequently (i.e. through the addition and removal of properties), there is no descriptor array for slow objects with dictionary properties. Hence, V8 does not provide an EnumCache for slow properties. Similar assumptions hold for indexed properties, so they are excluded from the EnumCache as well.

Let’s summarize the important facts:
  • Maps are used to keep track of object shapes. 
  • Descriptor arrays store information about properties (name, configurability, visibility). 
  • Descriptor arrays can be shared between Maps. 
  • Each descriptor array can have an EnumCache listing only the enumerable named keys, not indexed property names.

The Mechanics of For-In

Now you know roughly how Maps work and how the EnumCache relates to the descriptor array. V8 executes JavaScript via Ignition, a bytecode interpreter, and TurboFan, the optimizing compiler, which both deal with for-in in similar ways. For simplicity we will use a pseudo-C++ style to explain how for-in is implemented internally:

// For-In Prepare:
FixedArray* keys = nullptr;
Map* original_map = object->map();
if (original_map->HasEnumCache()) {
  if (object->HasNoElements()) {
    keys = original_map->GetCachedEnumKeys();
  } else {
    keys = object->GetCachedEnumKeysWithElements();
  }
} else {
  keys = object->GetEnumKeys();
}

// For-In Body:
for (size_t i = 0; i < keys->length(); i++) {
  // For-In Next:
  String* key = keys[i];
  if (!object->HasProperty(key)) continue;
  EVALUATE_FOR_IN_BODY();
}

For-in can be separated into three main steps:
  1. Preparing the keys to iterate over, 
  2. Getting the next key, 
  3. Evaluating the for-in body. 

The “prepare” step is the most complex of the three, and it is where the EnumCache comes into play. In the example above, you can see that V8 uses the EnumCache directly if it exists and if there are no elements (integer-indexed properties) on the object (or its prototype). For the case where there are indexed property names, V8 jumps to a runtime function implemented in C++ that prepends them to the existing enum cache, as illustrated by the following example:

FixedArray* JSObject::GetCachedEnumKeysWithElements() {
  FixedArray* keys = this->map()->GetCachedEnumKeys();
  return this->GetElementsAccessor()->PrependElementIndices(this, keys);
}

FixedArray* Map::GetCachedEnumKeys() {
  // Get the enumerable property keys from a possibly shared enum cache.
  FixedArray* keys_cache = descriptors()->enum_cache()->keys_cache();
  if (enum_length() == keys_cache->length()) return keys_cache;
  return keys_cache->CopyUpTo(enum_length());
}

FixedArray* FastElementsAccessor::PrependElementIndices(
    JSObject* object, FixedArray* property_keys) {
  Assert(object->HasFastElements());
  FixedArray* elements = object->elements();
  int nof_indices = CountElements(elements);
  FixedArray* result =
      FixedArray::Allocate(property_keys->length() + nof_indices);
  int insertion_index = 0;
  for (int i = 0; i < elements->length(); i++) {
    if (!HasElement(elements, i)) continue;
    result[insertion_index++] = String::FromInt(i);
  }
  // Insert the property keys after the element indices.
  property_keys->CopyTo(result, insertion_index);
  return result;
}

In the case where no existing EnumCache was found, we jump to C++ again and follow the initially presented spec steps:

FixedArray* JSObject::GetEnumKeys() {
  // Get the receiver’s enum keys.
  FixedArray* keys = this->GetOwnEnumKeys();
  // Walk up the prototype chain.
  for (JSObject* object : GetPrototypeIterator()) {
    // Append non-duplicate keys to the list.
    keys = keys->UnionOfKeys(object->GetOwnEnumKeys());
  }
  return keys;
}

FixedArray* JSObject::GetOwnEnumKeys() {
  FixedArray* keys;
  if (this->HasEnumCache()) {
    keys = this->map()->GetCachedEnumKeys();
  } else {
    keys = this->GetEnumPropertyKeys();
  }
  if (this->HasFastProperties()) this->map()->FillEnumCache(keys);
  return this->GetElementsAccessor()->PrependElementIndices(this, keys);
}

FixedArray* FixedArray::UnionOfKeys(FixedArray* other) {
  int length = this->length();
  FixedArray* result = FixedArray::Allocate(length + other->length());
  this->CopyTo(result, 0);
  int insertion_index = length;
  for (int i = 0; i < other->length(); i++) {
    String* key = other->get(i);
    // Naive duplicate check: a linear scan over the keys collected so far.
    if (this->IndexOf(key) == -1) {
      result->set(insertion_index, key);
      insertion_index++;
    }
  }
  result->Shrink(insertion_index);
  return result;
}

This simplified C++ code corresponds to the implementation in V8 until early 2016, when we started to look at the UnionOfKeys method. If you look closely, you'll notice that we used a naive algorithm to exclude duplicates from the list, which can yield bad performance if there are many keys on the prototype chain. This is why we decided to pursue the optimizations described in the following section.

Problems with For-In

As we already hinted in the previous section, the UnionOfKeys method has bad worst-case performance. It was based on the valid assumption that most objects have fast properties and thus will benefit from an EnumCache. The second assumption is that there are only few enumerable properties on the prototype chain limiting the time spent in finding duplicates. However, if the object has slow dictionary properties and many keys on the prototype chain, UnionOfKeys becomes a bottleneck as we have to collect the enumerable property names each time we enter for-in.
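
The following sketch shows the kind of object shape that hit this bottleneck (the names are made up; forcing dictionary mode via delete is a common trick, though V8’s exact heuristics are internal and may change):

const proto = {};
for (let i = 0; i < 10000; i++) proto["key" + i] = i;

const object = Object.create(proto);
object.temp = 1;
delete object.temp; // deleting a property typically switches the object to
                    // slow dictionary properties, leaving it without an EnumCache

for (const key in object) {
  // Before the fix, every entry into this loop re-collected the prototype keys
  // and de-duplicated them with a nested loop over all keys seen so far.
}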

Besides the performance issues, the existing algorithm had another problem: it was not spec-compliant. V8 got the following example wrong for many years:
var o = {
  __proto__: { b: 3 },
  a: 1
};
Object.defineProperty(o, "b", {});

for (var k in o) print(k);
Output:
"a"
"b"
Perhaps counterintuitively, this should print out only “a” instead of “a” and “b”. If you recall the spec text at the beginning of this post, rules 7 and 8 imply that non-enumerable properties on the receiver shadow properties of the same name on the prototype chain.

To make things more complicated, ES6 introduced the proxy object. This broke a lot of assumptions in the V8 code. To implement for-in in a spec-compliant manner, we have to trigger the following 5 of the 13 different proxy traps:


Internal Method        Handler Method
[[GetPrototypeOf]]     getPrototypeOf
[[GetOwnProperty]]     getOwnPropertyDescriptor
[[HasProperty]]        has
[[Get]]                get
[[OwnPropertyKeys]]    ownKeys

This required a duplicate version of the original GetEnumKeys code, one that tried to follow the spec's example implementation more closely. ES6 proxies and the lack of handling for shadowing properties were the core motivation for refactoring how we extract all the keys for for-in in early 2016.

The KeyAccumulator

We introduced a separate helper class, the KeyAccumulator, which dealt with the complexities of collecting the keys for for-in. As the ES6 spec grew, new features like Object.keys or Reflect.ownKeys required their own slightly modified versions of key collection. By having a single configurable place, we could improve the performance of for-in and avoid duplicated code.
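
The differences between these callers are visible from plain JavaScript; each of the following lines needs a slightly different filter over the same underlying keys:

const proto = { inherited: 1 };
const object = Object.create(proto);
Object.defineProperty(object, "shown", { value: 1, enumerable: true });
Object.defineProperty(object, "hidden", { value: 2, enumerable: false });
object[Symbol("sym")] = 3;

Object.keys(object);     // ["shown"]: own, enumerable, string-valued keys only
Reflect.ownKeys(object); // ["shown", "hidden", Symbol(sym)]: all own keys
for (const key in object) console.log(key); // "shown", then "inherited"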

The KeyAccumulator consists of a fast part that only supports a limited set of actions but is able to complete them very efficiently. The slow accumulator supports all the complex cases, like ES6 Proxies.

In order to properly filter out shadowed properties, we maintain a separate list of the non-enumerable properties seen so far. For performance reasons, we only do this after we find out that there are enumerable properties on the prototype chain of an object.

Performance Improvements

With the KeyAccumulator in place, a few more patterns became feasible to optimize. The first one was to avoid the nested loop of the original UnionOfKeys method, which caused slow corner cases. In a second step, we performed more detailed pre-checks to make use of existing EnumCaches and avoid unnecessary copy steps.

To illustrate that the spec-compliant implementation is faster, let’s have a look at the following four different objects:

var fastProperties = {
  __proto__: null,
  "property 1": 1,
  …
  "property 10": n
};

var fastPropertiesWithPrototype = {
  "property 1": 1,
  …
  "property 10": n
};

var slowProperties = {
  __proto__: null,
  "dummy": null,
  "property 1": 1,
  …
  "property 10": n
};
delete slowProperties["dummy"];

var elements = {
  __proto__: null,
  "1": 1,
  …
  "10": n
};

  • The fastProperties object has standard fast properties. 
  • The fastPropertiesWithPrototype object has additional non-enumerable properties on its prototype chain, since its prototype is Object.prototype. 
  • The slowProperties object has slow dictionary properties. 
  • The elements object has only indexed properties. 

The following graph compares the performance of running a for-in loop a million times over these objects, without the help of our optimizing compiler.


As we've outlined in the introduction, these improvements became very visible on Facebook and Wikipedia in particular. 




Besides the initial improvements available in Chrome 51, a second performance tweak yielded another significant improvement. The following graph shows our tracking data of the total time spent in scripting during startup on a Facebook page. The selected range around V8 revision 37937 corresponds to an additional 4% performance improvement!


To underline the importance of improving for-in, we can rely on data from a tool we built back in 2016 that allows us to extract V8 measurements across a set of websites. The following table shows the relative time spent in V8 C++ entry points (runtime functions and builtins) for Chrome 49 over a set of roughly 25 representative real-world websites.

Position   Name                                    Total Time
1          CreateObjectLiteral                     1.10%
2          NewObject                               0.90%
3          KeyedGetProperty                        0.70%
4          GetProperty                             0.60%
5          ForInEnumerate                          0.60%
6          SetProperty                             0.50%
7          StringReplaceGlobalRegExpWithString     0.30%
8          HandleApiCallConstruct                  0.30%
9          RegExpExec                              0.30%
10         ObjectProtoToString                     0.30%
11         ArrayPush                               0.20%
12         NewClosure                              0.20%
13         NewClosure_Tenured                      0.20%
14         ObjectDefineProperty                    0.20%
15         HasProperty                             0.20%
16         StringSplit                             0.20%
17         ForInFilter                             0.10%

The most important for-in helpers are at positions 5 and 17, accounting for an average of 0.7% of the total time spent in scripting on a website. In Chrome 57, ForInEnumerate has dropped to 0.2% of the total time, and ForInFilter is below the measuring threshold due to a fast path written in assembler.

Posted by Camillo Bruni, @camillobruni

V8 Release 5.8

Every six weeks, we create a new branch of V8 as part of our release process. Each version is branched from V8’s git master immediately before a Chrome Beta milestone. Today we’re pleased to announce our newest branch, V8 version 5.8, which will be in beta until it is released in coordination with Chrome 58 Stable in several weeks. V8 5.8 is filled with all sorts of developer-facing goodies. We’d like to give you a preview of some of the highlights in anticipation of the release.

Arbitrary heap sizes

Historically, the V8 heap limit was conveniently set to fit the signed 32-bit integer range with some margin. Over time this convenience led to sloppy code in V8 that mixed types of different bit widths, effectively breaking the ability to increase the limit. In 5.8 we enabled the use of arbitrary heap sizes. See the dedicated blog post for more information.

Startup performance

In 5.8 we continued the work on incrementally reducing the time spent in V8 during startup. Reductions in the time spent compiling and parsing code, as well as optimizations in the IC system, yielded ~5% improvements on our real-world startup workloads.

V8 API

Please check out our summary of API changes. This document is regularly updated a few weeks after each major release.

Developers with an active V8 checkout can use 'git checkout -b 5.8 -t branch-heads/5.8' to experiment with the new features in V8 5.8. Alternatively you can subscribe to Chrome's Beta channel and try the new features out yourself soon.

Posted by the V8 team

Retiring Octane


The genesis of Octane

The history of JavaScript benchmarks is a story of constant evolution. As the web expanded from simple documents to dynamic client-side applications, new JavaScript benchmarks were created to measure workloads that became important for new use cases. This constant change has given individual benchmarks finite lifespans. As web browser and virtual machine (VM) implementations begin to over-optimize for specific test cases, benchmarks themselves cease to be effective proxies for their original use cases. One of the first JavaScript benchmarks, SunSpider, provided early incentives for shipping fast optimizing compilers. However, as VM engineers uncovered the limitations of microbenchmarks and found new ways to optimize around SunSpider’s limitations, the browser community retired SunSpider as a recommended benchmark.

Designed to mitigate some of the weaknesses of early microbenchmarks, the Octane benchmark suite was first released in 2012. It evolved from an earlier set of simple V8 test cases and became a common benchmark for general web performance. Octane consists of 17 different tests, which were designed to cover a variety of different workloads, ranging from Martin Richards’ kernel simulation test to a version of Microsoft’s TypeScript compiler compiling itself. The contents of Octane represented the prevailing wisdom around measuring JavaScript performance at the time of its creation.

Diminishing returns and over-optimization

In the first few years after its release, Octane provided a unique value to the JavaScript VM ecosystem. It allowed engines, including V8, to optimize their performance for a class of applications that stressed peak performance. These CPU-intensive workloads were initially underserviced by VM implementations. Octane helped engine developers deliver optimizations that allowed computationally-heavy applications to reach speeds that made JavaScript a viable alternative to C++ or Java. In addition, Octane drove improvements in garbage collection which helped web browsers avoid long or unpredictable pauses.

By 2015, however, most JavaScript implementations had implemented the compiler optimizations needed to achieve high scores on Octane. Striving for even higher benchmark scores on Octane translated into increasingly marginal improvements in the performance of real web pages. Investigations into the execution profile of running Octane versus loading common websites (such as Facebook, Twitter, or Wikipedia) revealed that the benchmark doesn’t exercise V8’s parser or the browser loading stack the way real-world code does. Moreover, the style of Octane’s JavaScript doesn’t match the idioms and patterns employed by most modern frameworks and libraries (not to mention transpiled code or newer ES2015+ language features). This means that using Octane to measure V8 performance didn’t capture important use cases for the modern web, such as loading frameworks quickly, supporting large applications with new patterns of state management, or ensuring that ES2015+ features are as fast as their ES5 equivalents.

In addition, we began to notice that JavaScript optimizations which eked out higher Octane scores often had a detrimental effect on real-world scenarios. Octane encourages aggressive inlining to minimize the overhead of function calls, but inlining strategies that are tailored to Octane have led to regressions from increased compilation costs and higher memory usage in real-world use cases. Even when an optimization may be genuinely useful in the real-world, as is the case with dynamic pretenuring, chasing higher Octane scores can result in developing overly-specific heuristics which have little effect or even degrade performance in more generic cases. We found that Octane-derived pretenuring heuristics led to performance degradations in modern frameworks such as Ember. The `instanceof` operator was another example of an optimization tailored to a narrow set of Octane-specific cases that led to significant regressions in Node.js applications.

Another problem is that, over time, small bugs in Octane themselves became targets for optimization. For example, in the Box2DWeb benchmark, taking advantage of a bug where two objects were compared using the `<` and `>=` operators gave a ~15% performance boost on Octane. Unfortunately, this optimization had no effect in the real world and complicates more general types of comparison optimizations. Octane sometimes even penalizes real-world optimizations: engineers working on other VMs have noticed that Octane seems to penalize lazy parsing, a technique that helps most real websites load faster given the amount of dead code frequently found in the wild.

Beyond Octane and other synthetic benchmarks

These examples are just some of the many optimizations which increased Octane scores to the detriment of running real websites. Unfortunately, similar issues exist in other static or synthetic benchmarks, including Kraken and JetStream. Simply put, such benchmarks are insufficient methods of measuring real-world speed and create incentives for VM engineers to over-optimize narrow use cases and under-optimize generic cases, slowing down JavaScript code in the wild.

Given the plateau in scores across most JS VMs and the increasing conflict between optimizing for specific Octane benchmarks rather than implementing speedups for a broader range of real-world code, we believe that it is time to retire Octane as a recommended benchmark.

Octane enabled the JS ecosystem to make large gains in computationally-expensive JavaScript. The next frontier, however, is improving the performance of real web pages, modern libraries, frameworks, ES2015+ language features, new patterns of state management, immutable object allocation, and module bundling. Since V8 runs in many environments, including server side in Node.js, we are also investing time in understanding real-world Node applications and measuring server-side JavaScript performance through workloads such as AcmeAir.

Check back here for more posts about improvements in our measurement methodology and new workloads that better represent real-world performance. We are excited to continue pursuing the performance that matters most to users and developers!

Posted by the V8 team

V8 Release 5.9


Every six weeks, we create a new branch of V8 as part of our release process. Each version is branched from V8’s git master immediately before a Chrome Beta milestone. Today we’re pleased to announce our newest branch, V8 version 5.9, which will be in beta until it is released in coordination with Chrome 59 Stable in several weeks. V8 5.9 is filled with all sorts of developer-facing goodies. We’d like to give you a preview of some of the highlights in anticipation of the release.

Ignition+TurboFan launched

V8 5.9 will be the first version with Ignition+TurboFan enabled by default. In general, this switch should lead to lower memory consumption and faster startup for web applications across the board, and we don’t expect stability or performance issues, because the new pipeline has already undergone significant testing. However, give us a call in case your code suddenly starts to regress significantly in performance.

A dedicated blog post will delve deeper into this topic soon.

WebAssembly TrapIf support on all platforms

TrapIf support significantly reduced the time spent compiling WebAssembly code (~30%).


V8 API

Please check out our summary of API changes. This document is regularly updated a few weeks after each major release.

Developers with an active V8 checkout can use 'git checkout -b 5.9 -t branch-heads/5.9' to experiment with the new features in V8 5.9. Alternatively you can subscribe to Chrome's Beta channel and try the new features out yourself soon.

Posted by the V8 team

Energizing Atom with V8's custom start-up snapshot

The team behind Atom, a text editor based on the Electron framework, recently published an article detailing significant improvements to start-up time, attributing a big part of the gains to the use of V8's custom start-up snapshot. Awesome!

Native Electron apps, including Atom, leverage Chromium to display a GUI and Node.js as an execution environment, both of which embed V8 to run JavaScript. This allows Electron apps to take advantage of V8 snapshots to quickly initialize a previously serialized heap for faster startup. Electron developers have even released electron-link, a convenience library for setting up this feature, which Atom heavily relies on for its performance optimizations.

We announced V8’s support for custom start-up snapshots more than a year ago. With v8::V8::CreateSnapshotDataBlob, embedders can provide an additional script to customize a start-up snapshot. New contexts created from this snapshot are initialized as if the additional script had already been executed. As the Atom team has shown, using a custom start-up snapshot can significantly boost start-up performance.
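
For example, an embedder could bake expensive but side-effect-free setup work into the snapshot. The script below is a made-up illustration (any plain JavaScript without access to embedder APIs works):

// snapshot.js, passed to CreateSnapshotDataBlob as the embedded source.
// Every context created from the resulting snapshot starts with `primes`
// already computed, paying the cost once at build time instead of at startup.
const primes = [2];
for (let n = 3; primes.length < 1000; n += 2) {
  if (primes.every((p) => n % p !== 0)) primes.push(n);
}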

To quote the Atom article:
The tricky part of using this technology, however, is that the code is executed in a bare V8 context. In other words, it only allows us to run plain JavaScript code and does not provide access to native modules, Node/Electron APIs or DOM manipulation.
To work around this restriction, electron-link goes to great lengths to make sure native functions (backed by C++ functions) are not included in the snapshot, and are instead loaded lazily. V8's serializer simply does not know how to serialize these native functions. Instead, native functions are wrapped into helper functions that load them lazily at runtime.

Since our original post about snapshots, the V8 team has continued developing the snapshot API and added many features and improvements. These features include support for serializing native functions. Provided that the backing C++ functions have been registered with V8, the serializer can now recognize and encode native functions for the deserializer to restore later. We’re happy to announce that the work-around in electron-link is no longer necessary.

One caveat remains: the serializer cannot directly capture state outside of V8, for example changes to the DOM in case of Atom. However, outside state directly attached to JavaScript objects via embedder fields (previously named "internal fields") can now be serialized and deserialized through a new callback API. With some work, this feature allows outside state to be put into the snapshot after all.

Some other highlights include:

  • FunctionTemplate and ObjectTemplate objects can now be added to and extracted from the snapshot, so that they do not have to be set up from scratch.
  • V8 can now run multiple scripts or apply any other modifications to the context before creating the snapshot, as opposed to only a single source string.
  • Multiple differently-configured contexts can now be included in the same start-up snapshot blob.

To make these features more accessible, we designed a new, more powerful API with v8::SnapshotCreator. The old API is now merely a wrapper around this underlying API. This is how v8::V8::CreateSnapshotDataBlob is implemented:
StartupData V8::CreateSnapshotDataBlob(const char* embedded_source) {
  StartupData result = {nullptr, 0};
  {
    SnapshotCreator snapshot_creator;
    // Obtain an isolate provided by SnapshotCreator.
    Isolate* isolate = snapshot_creator.GetIsolate();
    {
      HandleScope scope(isolate);
      // Create a new context and optionally run some script.
      Local<Context> context = Context::New(isolate);
      if (embedded_source != NULL &&
          !RunExtraCode(isolate, context, embedded_source, "")) {
        return result;
      }
      // Add the possibly customized context to the SnapshotCreator.
      snapshot_creator.SetDefaultContext(context);
    }
    // Use the SnapshotCreator to create the snapshot blob.
    result = snapshot_creator.CreateBlob(
        SnapshotCreator::FunctionCodeHandling::kClear);
  }
  return result;
}

For more advanced code examples, take a look at these test cases:


The new API is available in V8 version 5.7 and later. We hope that these new features will help embedders make even better use of custom start-up snapshot. If you have any questions, please reach out to our v8-users mailing list.

Posted by Yang Guo