How to make a React single-page app Googleable with Prerender.io

Pargles Dall'Oglio
Mar 1, 2021 · 6 min read

This is another post in a series about what Growth Engineers do. This particular article is about increasing awareness of the product by broadening the Top of the Funnel (ToFu), a term frequently used by marketing and sales teams that also ends up being part of a Growth Engineer’s daily routine. I will start with a non-technical description of this project, along with some charts and numbers. Then, in a separate section, I will provide some technical details, given that it is a relatively complex engineering challenge.

Suppose you own a pet shop. Sales of pet toys have skyrocketed during the pandemic, and cat owners are craving a Tower of Tracks with Balls, but nobody knows you have it at your shop. Even worse, you have it in exclusive colors that cannot be found anywhere else!

You then decide to make your site searchable on Google, and it ranks on the first page. Or, even better, you do fantastic Search Engine Optimization (SEO) work, and it now ranks first on Google. Well done: everyone looking for a cat toy is now aware of your pet store. You need one page for each item in your store. It’s hard work, because you could need hundreds, thousands, or even millions of pages, but it pays off.

ROSS started with a single-page application with private access. At that point, one of the main ways lawyers could learn more about ROSS and then sign up for it was through the landing page and the blog. As seen in the screenshot below (despite some spikes due to announcements that brought enormous attention), the number of daily U.S. visitors was flat at around 180. The problem is that it is hard to scale that number organically (money can be thrown at ads to increase it artificially).

As ROSS’s core AI search product matured, it was time for the growth team to get that core value out to the world so that more lawyers could have a chance to experience it and be impressed by ROSS’s AI capabilities and ease of use. Just as with your fictional pet shop, you want to make sure everyone knows your shop has the Tower of Tracks with Balls in a handful of unique colors.

ROSS has a collection of around 13 million case law decisions that were not public (or “Googleable”). That was an excellent opportunity to bring more awareness to the product. However, due to the technicalities of a single-page app, we had to decide whether to refactor the entire frontend or to use a tool like Prerender.io. Given that it was hard to predict how successful this project would be, we took the latter approach.

It turned out to be the right decision and a successful experiment. The screenshot above illustrates the growing number of daily visitors to case law decisions hosted on our platform. Four months after launch, traffic to these pages was equivalent to 28% (about 50 visitors a day) of the daily visitors to our landing page and blog (about 180 a day). Even better, we indexed only 200 thousand case law decisions (roughly 1.5% of the 13 million), so there is tremendous potential to surpass both the landing page and the blog as the main awareness channel.

That was the non-technical part. In the next section, I will dig deeper into the technicalities of that project.

Getting technical: Considerations and First Steps

If you have a React Server-Side Rendering (SSR) app, such as one built with Next.js or Gatsby, you are probably in a great place in terms of SEO. That’s what your team should be aiming for if you are sure that great SEO is one of the requirements moving forward.

On the other hand, if you have a React single-page app, you can either migrate it to an SSR framework or use a service that pre-caches all your pages to accomplish the same goal. The latter is probably the ideal solution if you are not certain how your pages will rank on Google or how much traffic they will bring.

First, you have to make sure your React app uses “BrowserHistory” rather than “HashHistory”. In general, Google does not index anything in a URL after a “#” because it treats it as an anchor pointing to an area of the page.
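For concreteness, here is a minimal sketch of browser-history routing with React Router (a v5-style API is assumed; the component names and routes are placeholders, not our actual code):

```javascript
// Minimal sketch: browser-history routing with React Router (v5-style API assumed).
import React from "react";
import ReactDOM from "react-dom";
import { BrowserRouter, Switch, Route } from "react-router-dom";

// Placeholder pages, for illustration only.
const Home = () => <h1>Home</h1>;
const CaseLawDecision = () => <h1>Case law decision</h1>;

// BrowserRouter produces crawlable URLs like /case/123, whereas HashRouter
// would produce /#/case/123, and Google ignores everything after the "#".
ReactDOM.render(
  <BrowserRouter>
    <Switch>
      <Route exact path="/" component={Home} />
      <Route path="/case/:id" component={CaseLawDecision} />
    </Switch>
  </BrowserRouter>,
  document.getElementById("root")
);
```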

Once you have a React app with “BrowserHistory”, you can choose one of Google’s recommended services for pre-caching pages. After doing some research and building a simple proof of concept (POC), we decided to move forward with Prerender.io.

But if you have a React single-page app, can you still have all pages indexed on Google without caching them? Yes and no. In that case, you have to make sure your pages load fast enough, because crawlers have a timeout and might end up indexing a blank or incomplete page. Also, keep in mind that crawlers might need to download all the JavaScript and assets for each dynamic page of your app, because a crawler does not necessarily have a caching mechanism. In contrast, browsers download JavaScript and assets only once.

Getting technical: Infrastructure

Prerender works with several middleware options, such as ExpressJS, Ruby on Rails, Tomcat, and Nginx (see the complete list here). We host our React app on Amazon S3, served through CloudFront.

In general, the idea is that a server (in our case, a Lambda@Edge function) hosts code that identifies whether a request comes from a crawler or a browser and routes the traffic accordingly. The flowchart below illustrates how it works at a high level.

Lambda@Edge functions can be of four types (a short description of each is below). We need two Lambda functions for this project: one Viewer Request function and one Origin Request function.

  • Viewer Request: after CloudFront receives a request from a viewer
  • Origin Request: before CloudFront forwards the request to the origin
  • Origin Response: after CloudFront receives the response from the origin
  • Viewer Response: before CloudFront forwards the response to the viewer
Source: https://docs.aws.amazon.com/lambda/latest/dg/lambda-edge.html

The Viewer Request function (sketched below) is responsible for checking the user-agent header of the incoming HTTP request to determine whether it comes from a crawler. If it does, extra HTTP request headers are added to the request (“x-prerender-token”, “x-prerender-host”, and “x-prerender-cachebuster”).
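Here is a simplified sketch of what such a Viewer Request handler can look like, assuming the standard CloudFront Lambda@Edge event shape; the bot list and the token value are placeholders, not our production configuration:

```javascript
'use strict';

// Simplified sketch of a Viewer Request Lambda@Edge handler.
// The token and bot list below are placeholders for illustration.
const PRERENDER_TOKEN = 'YOUR_PRERENDER_TOKEN';
const BOT_AGENTS = [
  'googlebot',
  'bingbot',
  'yandex',
  'duckduckbot',
  'twitterbot',
  'facebookexternalhit',
  'linkedinbot',
  'slackbot',
];

exports.handler = async (event) => {
  const request = event.Records[0].cf.request;
  const headers = request.headers;
  const userAgent = headers['user-agent']
    ? headers['user-agent'][0].value.toLowerCase()
    : '';

  // Only crawler traffic gets the Prerender headers; browsers pass through untouched.
  if (BOT_AGENTS.some((bot) => userAgent.includes(bot))) {
    headers['x-prerender-token'] = [
      { key: 'X-Prerender-Token', value: PRERENDER_TOKEN },
    ];
    headers['x-prerender-host'] = [
      { key: 'X-Prerender-Host', value: headers.host[0].value },
    ];
    // Cache-buster so CloudFront does not keep serving a stale prerendered copy.
    headers['x-prerender-cachebuster'] = [
      { key: 'X-Prerender-Cachebuster', value: Date.now().toString() },
    ];
  }

  return request;
};
```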

Then, the Origin Request function (sketched below) checks whether the request carries those Prerender headers and, if it does, routes the request to the Prerender cache; otherwise, it routes the request to Amazon S3.
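Below is a simplified sketch of such an Origin Request handler. It follows the common Prerender.io + CloudFront pattern; the timeouts and the catch-all rewrite to index.html for browser traffic are assumptions about a typical S3-hosted single-page app, not necessarily our exact setup:

```javascript
'use strict';

// Simplified sketch of an Origin Request Lambda@Edge handler.
exports.handler = async (event) => {
  const request = event.Records[0].cf.request;
  const headers = request.headers;

  if (headers['x-prerender-token'] && headers['x-prerender-host']) {
    // Crawler traffic: swap the origin from S3 to Prerender.io's cache.
    request.origin = {
      custom: {
        domainName: 'service.prerender.io',
        port: 443,
        protocol: 'https',
        path: '',
        sslProtocols: ['TLSv1.2'],
        readTimeout: 30,
        keepaliveTimeout: 5,
        customHeaders: {},
      },
    };
    headers['host'] = [{ key: 'Host', value: 'service.prerender.io' }];
  } else {
    // Browser traffic: serve the single-page app shell from S3 for any deep link.
    request.uri = '/index.html';
  }

  return request;
};
```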

Once you have your middleware set up with Prerender, you can proceed with indexing the pages, following this process:

1- Construct your sitemap.xml listing all the pages you want to index (see the sketch after this list)
2- Submit your sitemap.xml to Prerender.io so it can cache the new pages
3- Submit your sitemap.xml to Google Search Console, or wait until Google crawls your pages
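
For reference, a sitemap.xml is just an XML list of URLs. A minimal sketch, with placeholder URLs rather than our real ones, looks like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per page to index; the URLs below are placeholders. -->
  <url>
    <loc>https://app.example.com/case/some-case-law-decision</loc>
    <lastmod>2021-02-01</lastmod>
    <changefreq>monthly</changefreq>
  </url>
  <url>
    <loc>https://app.example.com/case/another-case-law-decision</loc>
    <lastmod>2021-02-01</lastmod>
    <changefreq>monthly</changefreq>
  </url>
</urlset>
```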

This project took a quarter of a year, with up to four engineers working on it simultaneously.

I hope you enjoyed this article! Have a great day!


Pargles Dall'Oglio

Lead Software Engineer and co-founder of ROSS Intelligence. Passionate about growth engineering and artificial intelligence.