In this article, I want to explore what we’ve seen of JS indexing behavior in the wild and in controlled tests, and share some tentative conclusions I’ve drawn about how it must be working.
There are some complexities even in this basic definition (answers in brackets as I understand them):
For more on the technical details, I recommend my ex-colleague Justin’s writing on the subject.
These days, if you need or want JS-enhanced functionality, more and more of the top frameworks can work the way Rob described in 2012, an approach now called isomorphic (roughly meaning “the same”).
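To make “isomorphic” concrete, here is a deliberately simplified sketch. The Express/React stack is my choice for illustration, not something any particular framework requires: the same component renders to plain HTML on the server, so crawlers and users get content immediately, and the client-side bundle then takes over in the browser.

```js
// A minimal isomorphic sketch: one component, rendered on the server
// and re-used ("hydrated") by the client bundle in the browser.
const express = require('express');
const React = require('react');
const { renderToString } = require('react-dom/server');

// The same component code ships to both environments.
function App() {
  return React.createElement('h1', null, 'Content visible without client-side JS');
}

const app = express();

app.get('/', (req, res) => {
  // Server render: the response body already contains the content,
  // so it's indexable even if no JS is ever executed...
  const html = renderToString(React.createElement(App));
  res.send(`<!DOCTYPE html>
<html>
  <body>
    <div id="root">${html}</div>
    <!-- ...and the client bundle attaches event handlers to the same markup. -->
    <script src="/client.js"></script>
  </body>
</html>`);
});

app.listen(3000);
```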
I was fascinated by this piece of research published recently — you should go and read the whole study. In particular, you should watch this video (recommended in the post) in which the speaker — who is an Angular developer and evangelist — emphasizes the need for an isomorphic approach:
If you work in SEO, you will increasingly find yourself called upon to figure out whether a particular implementation is correct (hopefully on a staging/development server before it’s deployed live, but who are we kidding? You’ll be doing this live, too).
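One quick way to get a feel for whether an implementation depends on JS execution (my own rough check, separate from the resources below): fetch the page with a headless browser and compare the raw response with the DOM after scripts have run. Everything here, including the URL, is illustrative.

```js
// Compare what a non-JS crawler sees with the DOM after JS executes,
// using Puppeteer. The URL is a placeholder for the page under test.
const puppeteer = require('puppeteer');

(async () => {
  const url = 'https://example.com/';
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  const response = await page.goto(url, { waitUntil: 'networkidle0' });
  const rawHtml = await response.text();      // the HTML as served
  const renderedHtml = await page.content();  // the DOM after scripts have run

  // If important content only appears in renderedHtml, the page relies on
  // JS execution to be indexed.
  console.log('Raw length:', rawHtml.length, '| Rendered length:', renderedHtml.length);
  await browser.close();
})();
```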
To do that, here are some resources I’ve found useful:
It may be more complicated than that, however. This segment of a Hacker News thread is interesting: it comes from a user named KMag, who claims to have worked at Google on the JS execution part of the indexing pipeline from 2006 to 2010, and it responds to another user’s speculation that Google would not care about content loaded “async” (i.e. asynchronously: loaded via new HTTP requests triggered in the background while assets continue to download):
“Actually, we did care about this content. I'm not at liberty to explain the details, but we did execute setTimeouts up to some time limit.
If they're smart, they actually make the exact timeout a function of a HMAC of the loaded source, to make it very difficult to experiment around, find the exact limits, and fool the indexing system. Back in 2010, it was still a fixed time limit.

Towards the end of my time there, there was someone in Mountain View working on a heavier, higher-fidelity system that sandboxed much more of a browser, and they were trying to improve performance so they could use it on a higher percentage of the index.”
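To ground what KMag is describing, here is a minimal sketch, entirely my own invention rather than anything Google has published: the indexer executes a page’s JS (including setTimeout-deferred content) up to a time budget derived from an HMAC of the page source, so the limit is deterministic per page but can’t be discovered by experimentation. The key and the timeout bounds are hypothetical.

```js
// Hypothetical illustration of an HMAC-derived JS execution budget.
const crypto = require('crypto');

const SECRET_KEY = 'indexer-only-secret';   // hypothetical: never leaves the indexer
const MIN_BUDGET_MS = 2000;                 // hypothetical lower bound
const MAX_BUDGET_MS = 10000;                // hypothetical upper bound

function jsExecutionBudget(pageSource) {
  // Deterministic for a given page, but unpredictable without the key.
  const mac = crypto.createHmac('sha256', SECRET_KEY).update(pageSource).digest();
  // Map the first 4 bytes of the MAC onto [MIN_BUDGET_MS, MAX_BUDGET_MS].
  const fraction = mac.readUInt32BE(0) / 0xffffffff;
  return MIN_BUDGET_MS + Math.round(fraction * (MAX_BUDGET_MS - MIN_BUDGET_MS));
}

// A page whose content only appears after a 3-second setTimeout gets indexed
// only if its derived budget happens to exceed 3000ms:
const source = '<script>setTimeout(renderContent, 3000)</script>';
console.log(`Budget for this page: ${jsExecutionBudget(source)}ms`);
```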
I referenced this recent study earlier. In it, the author found:
It’s definitely worth reading the whole thing and reviewing the performance of the different frameworks. There’s more evidence of Google saving computing resources in some areas, as well as some surprising differences between frameworks.
I might have expected the testing platforms to block their JS with robots.txt, but none of the main platforms I’ve looked at do that. With Google being sympathetic towards testing, however, this shouldn’t be a major problem; it’s just something to be aware of as you build out your user-facing CRO tests. All the more reason for your UX and SEO teams to work closely together and communicate well.
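For completeness, this is the kind of robots.txt rule I was expecting to see (and, to be clear, did not; the paths here are invented for illustration). Blocking a platform’s scripts like this would stop Googlebot from fetching and executing the test variations:

```
# Hypothetical rules blocking crawlers from a testing platform's scripts
User-agent: Googlebot
Disallow: /experiments/
Disallow: /*.js$
```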