DIY AngularJS SEO with PhantomJS (the easy way!)

September 11, 2019: This is a super old tutorial. I don't think you will want to use this anymore.

Setting up your AngularJS development environment needs to include SEO best practices. For JS-rendered applications, take a look at this solid solution using PhantomJS. #

I've been tasked with recreating a website for a higher education institution and I want to capitalize on AngularJS technologies to provide a rich user experience. Unfortunately, one of the largest issues with using the SPA approach to business/corporate/education web design is search engine optimization; AngularJS, like any JS-rendered application framework, is not SEO-friendly. To get around this, we need a way to serve search engine bots a set of pre-rendered HTML pages. Our goal is to create a development environment for AngularJS SEO awesomeness. In this tutorial, we'll walk through how to get PhantomJS up and running right alongside our app using our Yeoman AngularJS scaffolding, which comes with a small development server. We'll go from having nothing to having a full development environment, complete with a verifiable pre-rendered page cache for bots to eat up and enjoy.


Update – May 2nd: I've received a few dozen emails after publishing this article (and lots of generous praise — thank you, everyone!) Please use the comments section to post any questions or concerns you may have, as this is just the start of a huge AngularJS tutorial project that I've started.

The Scaffolding #

To set up our development environment, we'll be using Yeoman's AngularJS generator, which is an all-in-one solution for developing and testing AngularJS applications. Open up a terminal and let's get started:

mkdir lawsonry
cd lawsonry
yo angular

Wait for the scaffolding to start, and definitely go ahead and answer those questions about whether you want to include Bootstrap (that's what I use) and any of those other AngularJS modules. Once it's finished, you'll have a development environment set up. From the terminal you were just in, let's test our development server:

grunt server

You should see some automation kick in, then your default browser will open up and you'll see the default Yeoman scaffolding for the AngularJS template. Right now we're all set to develop an SPA with AngularJS, but we're going to take this one step further and create an HTML pre-rendering workflow to serve pre-rendered HTML pages to bots for SEO purposes.

The SEO Setup #

I have to be honest: I'm only writing this tutorial because I could not find a decent tutorial on setting up an environment for AngularJS SEO awesomeness. It couldn't be that hard, right? We need something that will tell crawlers to eat up pre-rendered pages, and thankfully, a lot of the heavy lifting in terms of module design has already been taken care of by Steeve on GitHub. Steeve's code is actually the bulk of what we're going to be using here. The first thing we'll need to do is get a hold of this angular-seo package from GitHub:

git clone https://github.com/steeve/angular-seo.git

Inside this folder you'll have two core files: angular-seo.js, which you need to put into your /lawsonry/app folder, and angular-seo-server.js, which you need to put in your /lawsonry folder (or wherever your application root folder is; you know, the one with the Gruntfile.js file in it). You can follow Steeve's instructions here, but I found them a little unhelpful at 6:00AM. So let's do this setup together.

**The idea is simple:** we're going to have our application running from our application port, and then a PhantomJS instance of our application running from a snapshot port. Requests from non-bots will be served directly from our application port (it doesn't matter what port that is), and requests from bots and search engines will be served pre-rendered HTML content via the snapshot port.

To do this, we'll have to do three things: tell our application to enable AJAX indexing by crawlers; include our seo module and tell our application to let us know when we're done rendering the page; install and run PhantomJS.

Making our Site Crawlable #

This couldn't be easier. Go to your index.html file and add the following line to the <head>:

<meta name="fragment" content="!"/>

This basically tells search engines that, while you're technically a SPA, you have the ability to interpret a special URL structure that they will request in order to ask for pre-rendered HTML pages. If you want to learn more about what the hell I'm talking about, click here. Otherwise, here's the gist of what's happening:

A crawler hits your site and sees that it's not pre-rendered HTML, but finds the fragment meta tag. This tag tells it to alter the way it requests information from your server by replacing the hashbang (#!) in the URL with ?_escaped_fragment_=. Now your server, asked for this new URL, serves the request from a pre-rendered set of pages instead of from the application. This gives the search engine a full HTML page to work with, rather than just an empty JS-rendered page.[1]
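To make the URL rewrite concrete, here is a minimal sketch of what the crawler does to a hashbang URL before requesting it. The function name toEscapedFragmentUrl is made up for illustration, and the escaping is simplified to a handful of characters that would otherwise break a query string; a real crawler follows its own, more complete rules.

```javascript
// Hypothetical helper (not part of angular-seo): mimic how a crawler
// rewrites a hashbang URL into an _escaped_fragment_ URL.
function toEscapedFragmentUrl(url) {
  var idx = url.indexOf('#!');
  if (idx === -1) {
    return url; // no hashbang, nothing to rewrite
  }
  var base = url.slice(0, idx);
  var fragment = url.slice(idx + 2);

  // Simplified escaping (an assumption): only percent-encode characters
  // that would change the meaning of the query string.
  var escaped = fragment.replace(/[%#&+ ]/g, function (ch) {
    return '%' + ch.charCodeAt(0).toString(16).toUpperCase();
  });

  // Append to an existing query string if there is one.
  var separator = base.indexOf('?') === -1 ? '?' : '&';
  return base + separator + '_escaped_fragment_=' + escaped;
}

console.log(toEscapedFragmentUrl('http://localhost:9000/#!/about'));
// http://localhost:9000/?_escaped_fragment_=/about
```

The part before the hashbang is left alone, so the server can still tell which page is being asked for; only the client-side route moves into the query string.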

Adding the SEO Code #

The next thing we'll do is go into our app.js file and find the module inclusions part of our declaration. However you do it, you'll need to include the seo module that comes inside the angular-seo.js file we put in our lawsonry/app folder earlier. For example, here's what my module declarations block looks like:

angular.module("lawsonryApp", [
  "ngCookies",
  "ngResource",
  "ngSanitize",
  "ngRoute",
  "seo",
]).config(function ($routeProvider) {
  $routeProvider
    .when("/", { templateUrl: "views/main.html", controller: "MainCtrl" })
    .when("/about", { templateUrl: "views/about.html", controller: "MainCtrl" })
    .otherwise({ redirectTo: "/" });
});

Notice that I've added the seo module up there. Make sure you do, too! The last thing we'll do in the app is set a scope-level declaration that all the html has been rendered. This is super easy: Depending on how you organize your controllers, simply call $scope.htmlReady() whenever you are certain that the HTML page is done loading. This is often done at the end of the main controller. For example, with the controller that comes with Yeoman's AngularJS scaffolding, your main.js file would look like this:

'use strict';

angular.module('lawsonryApp')
  .controller('MainCtrl', function ($scope) {
    $scope.awesomeThings = ['HTML5 Boilerplate', 'AngularJS', 'Karma'];

    // SEO REQUIREMENT:
    // PhantomJS pre-rendering workflow requires the page to declare,
    // through htmlReady(), that we are finished with this controller.
    $scope.htmlReady();
  });
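If your view depends on data that arrives asynchronously, the end of the controller body is probably too early to say the page is done. Here's a rough sketch of the idea, assuming a made-up ArticlesCtrl controller and a made-up loadArticles function that calls back once the data is in; neither comes from the scaffolding or from angular-seo. The bottom half just exercises the controller with a fake scope so you can run it outside of Angular.

```javascript
// Hypothetical controller: loadArticles is an assumed data-loading function
// that calls back with an array once the data has arrived.
function ArticlesCtrl($scope, loadArticles) {
  $scope.articles = [];

  loadArticles(function (articles) {
    $scope.articles = articles;
    // Only now is the view's data in place, so signal readiness here
    // rather than at the end of the controller body.
    $scope.htmlReady();
  });
}

// Quick check outside of Angular, using a fake scope and a fake loader.
var calls = [];
var fakeScope = {
  htmlReady: function () {
    // Record how many articles were set when readiness was signalled.
    calls.push(fakeScope.articles.length);
  }
};
var pending;
ArticlesCtrl(fakeScope, function (done) { pending = done; });

console.log(calls.length); // 0: nothing signalled before the data arrives
pending(['first', 'second']);
console.log(calls);        // [ 2 ]: signalled once, after the articles were set
```

The point is simply where the htmlReady() call sits: inside the callback, after the scope has been updated.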

Finally, we need to actually include the angular-seo.js file manually in our index.html file, toward the bottom where the includes for our controllers go. In an unedited scaffolding, my new index.html file looks like this (at the bottom):

<script src="scripts/app.js"></script>
<script src="scripts/controllers/main.js"></script>
<script src="angular-seo.js"></script>

Now we're complete with app-level changes, so let's move to our command line to deal with the server-side requirements. Don't worry; we're almost done!

Setting up PhantomJS #

The last part of our scaffolding is to install and run PhantomJS alongside our development environment. You should already have npm, so install PhantomJS like this (the -g flag installs it globally, so that the phantomjs command is available from your terminal):

npm install -g phantomjs

Once that's completed, navigate to your application root directory (the one where we put the other angular-seo file, angular-seo-server.js) and run the following command:

phantomjs --disk-cache=no angular-seo-server.js 9090 http://127.0.0.1:9000

This will start a PhantomJS server on port 9090 with no disk caching (we'll use that during production in another tutorial). It's important to note that PhantomJS's port needs to be different from the port that your application runs on. Notice that we have set the last parameter (the application URL) to port 9000; that port number comes from the Gruntfile native to Yeoman's AngularJS scaffolding.

In other words, yo angular gives us the option to run grunt server, which sets up a localhost webserver to test our app on port 9000.

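So think of it like this:

  • Port 9000: our AngularJS application, served by the development server.
  • Port 9090: the PhantomJS snapshot server, which renders pages from the application on port 9000.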

Now that we've got PhantomJS running, let's go ahead and run our development server, too (in a second terminal, since PhantomJS keeps the first one busy):

grunt server

(Newer versions of the generator call this task grunt serve.) Now we've got our development environment running a web server on 127.0.0.1 at port 9000 (or localhost, depending on what you like to call it), and a second web server running on port 9090 that, when asked for a page, loads it from the application on port 9000 and hands back the rendered HTML. Fantastic!
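To picture how the two ports relate, here is a small sketch of the translation the snapshot server has to perform: it receives a crawler-style request on port 9090 and works out which application URL on port 9000 to render. The function snapshotTargetUrl is a made-up name, and this is my own approximation that assumes hashbang (#!) routes, not a copy of what angular-seo-server.js does internally.

```javascript
// Hypothetical helper: given the application URL and an incoming snapshot
// request, work out which application page should be rendered.
function snapshotTargetUrl(appUrl, requestUrl) {
  // The base is a placeholder, needed only so that URL can parse a
  // path-only request such as "/?_escaped_fragment_=/about".
  var parsed = new URL(requestUrl, 'http://placeholder.invalid');
  var fragment = parsed.searchParams.get('_escaped_fragment_');
  if (fragment === null) {
    return null; // not a crawler-style request
  }
  // Turn the query parameter back into a client-side route.
  return appUrl.replace(/\/$/, '') + parsed.pathname + '#!' + fragment;
}

console.log(snapshotTargetUrl('http://127.0.0.1:9000', '/?_escaped_fragment_=/about'));
// http://127.0.0.1:9000/#!/about
```

In other words, port 9090 never serves your files itself; it only knows how to ask port 9000 for a page and wait for it to finish rendering.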

Testing Your Pre-Rendered HTML #

The last thing I would encourage everyone to do is test whether your site is serving pre-rendered HTML to requests that contain ?_escaped_fragment_= in the URL. You do this by going back to your terminal and typing:

curl 'http://localhost:9090/?_escaped_fragment_='

This sends your PhantomJS server a request for whatever is routed to the '/' route, which should be (if you haven't modified the Yeoman AngularJS scaffolding) the views/main.html file. The terminal should output a fully rendered HTML page. Check the contents of the <div class="container ng-scope" ng-view=""> tag, and you should see a bunch of HTML underneath. It works!
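If you'd rather not eyeball the curl output every time, a rough check can be scripted. This is only a sketch: looksPrerendered is a made-up helper, and it uses a simple regular expression (not a real HTML parser) to see whether the ng-view element has anything inside it.

```javascript
// Hypothetical helper: does the ng-view element contain any markup?
// A regular expression is a rough heuristic here, not a real HTML parser.
function looksPrerendered(html) {
  var match = /<div[^>]*\bng-view\b[^>]*>([\s\S]*?)<\/div>/i.exec(html);
  return match !== null && match[1].trim().length > 0;
}

// What view-source shows for the raw app: an empty ng-view container.
var emptyShell = '<div class="container" ng-view=""></div>';

// What the snapshot server should return: the view rendered in place.
var rendered = '<div class="container ng-scope" ng-view="">' +
  '<div class="hero-unit ng-scope"><h1>Hi</h1></div></div>';

console.log(looksPrerendered(emptyShell)); // false
console.log(looksPrerendered(rendered));   // true
```

Feed it the HTML you get back from the curl command above; a result of false means the view was not rendered into the page.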

Going Live #

To take this to production, you'll need to make one more adjustment on the server. Add a detection block to your site's server configuration that checks whether a URL containing _escaped_fragment_ is being requested; if it is, we'll want to proxy the request over to PhantomJS instead of serving it from our main server on port 80. If you're on Nginx[2] (like I am), you can do this:

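# Goes inside the location block that serves your app
if ($args ~ _escaped_fragment_) {
    # Proxy to the PhantomJS instance here, for example (port 9090 as above):
    proxy_pass http://127.0.0.1:9090;
}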

However you do it, just remember to have your PhantomJS running on a different port than your web server.
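Whatever server you use, the decision it has to make is the same. Here it is as a tiny JavaScript sketch; pickUpstream and the two upstream constants are made-up names, and the addresses are just the development ports from this tutorial, so swap in your own.

```javascript
// Assumed addresses: the development ports used in this tutorial.
var APP_UPSTREAM = 'http://127.0.0.1:9000';      // the application itself
var SNAPSHOT_UPSTREAM = 'http://127.0.0.1:9090'; // the PhantomJS snapshot server

// Hypothetical helper: decide which upstream should answer a request.
function pickUpstream(requestUrl) {
  // The base is a placeholder so that URL can parse a path-only request.
  var parsed = new URL(requestUrl, 'http://placeholder.invalid');
  return parsed.searchParams.has('_escaped_fragment_')
    ? SNAPSHOT_UPSTREAM
    : APP_UPSTREAM;
}

console.log(pickUpstream('/about'));                      // http://127.0.0.1:9000
console.log(pickUpstream('/?_escaped_fragment_=/about')); // http://127.0.0.1:9090
```

Regular visitors never touch PhantomJS; only requests carrying the _escaped_fragment_ parameter are sent its way.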

Common Problems #

(This section is reserved for commenters whose problems are solved. If you have any questions or concerns, leave a comment and let's sort it out together!)

Problem: The curl test is not outputting pre-rendered HTML pages.

Solution: You need to ensure that your root route '/' is what you used for the server address when you instantiated PhantomJS. For example, if you're routing your application's root to '/index.html', you need to change the server address from the example above to http://localhost:9090/index.html/?_escaped_fragment_=

Whew! I know it seems daunting, but once you have it all set up, it's really very simple.

  [1] If you want to see what a page looks like without pre-rendered HTML, just open up your AngularJS app in `view-source:your-app-url` and take a look. Notice anything? Your view partials are not loaded on this page; JavaScript loads these HTML files dynamically. If you think about this from a search engine's point of view, how will you be able to see the web content unless you're viewing the site from a browser?
  [2] In a future tutorial, we'll talk about capitalizing on Nginx's amazing static file serving capabilities by setting up a failover for a pre-rendered cache of HTML files in a snapshots/ folder. Basically, Nginx receives the request and checks a local snapshots/ folder to see if the requested file exists. If it does, it will check its last cache time. If it's too old, it will recache the file with PhantomJS and then serve the cached file. If it's not too old, it will simply serve the cached file.
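The cache check described in the last note boils down to one small decision. Here's a sketch of it in JavaScript, with made-up names (snapshotAction, ONE_HOUR) and one assumption of mine: when no snapshot exists yet, the page gets rendered with PhantomJS first.

```javascript
// Hypothetical helper: decide what to do with a snapshot request.
// snapshotMtimeMs is the snapshot's last cache time in milliseconds,
// or null when no snapshot file exists yet.
function snapshotAction(snapshotMtimeMs, nowMs, maxAgeMs) {
  if (snapshotMtimeMs === null) {
    return 'render';   // assumption: a missing snapshot is rendered first
  }
  if (nowMs - snapshotMtimeMs > maxAgeMs) {
    return 'rerender'; // too old: recache with PhantomJS, then serve
  }
  return 'serve';      // fresh enough: serve the cached file as-is
}

var ONE_HOUR = 60 * 60 * 1000; // example maximum age, pick your own
var now = Date.now();

console.log(snapshotAction(null, now, ONE_HOUR));               // render
console.log(snapshotAction(now - 2 * ONE_HOUR, now, ONE_HOUR)); // rerender
console.log(snapshotAction(now - 1000, now, ONE_HOUR));         // serve
```

How old is "too old" depends entirely on how often your content changes, which is why the maximum age is a parameter here.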