DIY AngularJS SEO with PhantomJS (the easy way!)
September 11, 2019: This is a super old tutorial. I don't think you will want to use this anymore.
Setting up your AngularJS development environment needs to include SEO best practices. For JS-rendered applications, take a look at this solid solution using PhantomJS.
I've been tasked with recreating a website for a higher education institution and I want to capitalize on AngularJS technologies to provide a rich user experience. Unfortunately, one of the largest issues with using the SPA approach to business/corporate/education web design is search engine optimization: AngularJS, like any JS-rendered application framework, is not SEO-friendly. To get around this, we need a way to serve search engine bots a set of pre-rendered HTML pages. Our goal is to create a development environment for AngularJS SEO awesomeness. In this tutorial, we'll walk through how to get PhantomJS up and running right alongside our app, using the Yeoman AngularJS scaffolding that comes with a small development server. We'll go from having nothing to having a full development environment, complete with a verifiable pre-rendered page cache for bots to eat up and enjoy.
Update – May 2nd: I've received a few dozen emails after publishing this article (and lots of generous praise — thank you, everyone!) Please use the comments section to post any questions or concerns you may have, as this is just the start of a huge AngularJS tutorial project that I've started.
@lawsonry Thank you for publishing this! Literally the first good, concrete explanation that I've seen on how to do SEO in Angular.
— mklickman (@mklickman) May 2, 2014
The Scaffolding #
To set up our development environment, we'll be using Yeoman's AngularJS generator, which is an all-in-one solution for developing and testing AngularJS applications. Open up a terminal and let's get started:
mkdir lawsonry
cd lawsonry
yo angular
Wait for the scaffolding to start, and definitely go ahead and answer those questions about whether you want to include Bootstrap (that's what I use) and any of those other AngularJS modules. Once it's finished, you'll have a development environment set up. From the terminal you were just in, let's test our development server:
grunt serve
You should see some automation kick in, then your default browser will open up and you'll see the default Yeoman scaffolding for the AngularJS template. Right now we're all set to develop an SPA with AngularJS, but we're going to take this one step further and create an HTML pre-rendering workflow to serve pre-rendered HTML pages to bots for SEO purposes.
The SEO Setup #
I have to be honest: I'm only writing this tutorial because I could not find a
decent tutorial on setting up an environment for AngularJS SEO awesomeness. It
couldn't be that hard, right? We need something that will tell crawlers to eat
up pre-rendered pages, and thankfully, a lot of the heavy lifting in terms of
module design has already been taken care of by Steeve at GitHub. Steeve's
code is actually the bulk of what we're going to be using here. The first thing
we'll need to do is get a hold of this angular-seo package from GitHub:
git clone https://github.com/steeve/angular-seo.git
Inside this folder you'll have two core files: angular-seo.js, which you need to put into your /lawsonry/app folder, and angular-seo-server.js, which you need to put in your /lawsonry folder (or wherever your application root folder is; you know, the one with the Gruntfile.js file in it). You can follow Steeve's instructions here, but I found them a little unhelpful at 6:00AM. So let's do this setup together.
**The idea is simple:** we're going to have our application running from our application port, and then a PhantomJS instance of our application running from a snapshot port. Requests from non-bots will be served directly from our application port (it doesn't matter what port that is), and requests from bots and search engines will be served pre-rendered HTML content via the snapshot port.
To do this, we'll have to do three things: tell our application to enable AJAX indexing by crawlers; include our seo module and tell our application to let us know when we're done rendering the page; install and run PhantomJS.
Making our Site Crawlable #
This couldn't be easier. Go to your index.html file and add the following line to the <head>:
<meta name="fragment" content="!"/>
This basically tells search engines that, while you're technically a SPA, you have the ability to interpret a special URL structure that they will request in order to ask for pre-rendered HTML pages. If you want to learn more about what the hell I'm talking about, click here. Otherwise, here's the gist of what's happening:
A crawler hits your site and sees that it's not pre-rendered HTML, but finds the fragment meta tag. This tag tells it to alter the way it requests information from your server by replacing the hash fragment (#!) in the URL with ?_escaped_fragment_=. Now your server, asked for this new URL, serves the request from a pre-rendered set of pages instead of from the application. This latter procedure gives the search engine a full HTML page to work with, rather than just an empty JS-rendered page.
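To make that URL rewrite concrete, here is a minimal sketch of the mapping a crawler applies before it requests a page. The function name toEscapedFragmentUrl is made up for illustration, and the sketch assumes hashbang-style URLs (#!); it only percent-encodes the few characters that would otherwise break the query string.

```javascript
// Hypothetical helper (not part of angular-seo): shows how a hashbang URL
// maps to the URL a crawler requests under the AJAX crawling scheme.
function toEscapedFragmentUrl(url) {
  var hashbangIndex = url.indexOf('#!');
  if (hashbangIndex === -1) {
    return url; // no hashbang, so there is nothing to rewrite
  }

  var base = url.slice(0, hashbangIndex);
  var fragment = url.slice(hashbangIndex + 2);

  // Append to an existing query string if the URL already has one.
  var separator = base.indexOf('?') === -1 ? '?' : '&';

  // Percent-encode characters that would otherwise break the query string.
  var escaped = fragment.replace(/[%#&+\s]/g, function (ch) {
    return encodeURIComponent(ch);
  });

  return base + separator + '_escaped_fragment_=' + escaped;
}

console.log(toEscapedFragmentUrl('http://localhost:9000/#!/about'));
// prints http://localhost:9000/?_escaped_fragment_=/about
```

Your server never has to do this conversion itself; it only needs to recognize the resulting query parameter.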
Adding the SEO Code #
The next thing we'll do is go into our app.js file and find the module inclusions part of our declaration. However you do it, you'll need to include the seo module that comes inside the angular-seo.js file we put in our /lawsonry/app folder earlier. For example, here's what my module declarations block looks like:
angular.module("lawsonryApp", [
  "ngCookies",
  "ngResource",
  "ngSanitize",
  "ngRoute",
  "seo",
]).config(function ($routeProvider) {
  $routeProvider
    .when("/", { templateUrl: "views/main.html", controller: "MainCtrl" })
    .when("/about", { templateUrl: "views/about.html", controller: "MainCtrl" })
    .otherwise({ redirectTo: "/" });
});
Notice that I've added the seo module up there. Make sure you do, too! The last thing we'll do in the app is set a scope-level declaration that all the HTML has been rendered. This is super easy: depending on how you organize your controllers, simply call $scope.htmlReady() whenever you are certain that the HTML page is done loading. This is often done at the end of the main controller. For example, with the controller that comes with Yeoman's AngularJS scaffolding, your main.js file would look like this:
'use strict';

// The module name must match the one declared in app.js.
angular.module('lawsonryApp')
  .controller('MainCtrl', function ($scope) {
    $scope.awesomeThings = ['HTML5 Boilerplate', 'AngularJS', 'Karma'];

    // SEO REQUIREMENT:
    // PhantomJS pre-rendering workflow requires the page to declare,
    // through htmlReady(), that we are finished with this controller.
    $scope.htmlReady();
  });
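If your view depends on data that arrives asynchronously, it makes sense to hold off on htmlReady() until that data is in place; otherwise PhantomJS may snapshot a half-empty page. The sketch below assumes a hypothetical NewsCtrl controller and a made-up /api/articles.json endpoint. Only $scope.htmlReady() comes from the angular-seo module.

```javascript
// Hypothetical controller: NewsCtrl and /api/articles.json are made up
// for illustration. Only $scope.htmlReady() comes from angular-seo.
function NewsCtrl($scope, $http) {
  $scope.articles = [];

  return $http.get('/api/articles.json').then(function (response) {
    $scope.articles = response.data;

    // Signal readiness only after the data the view needs has arrived.
    $scope.htmlReady();
  });
}

// Register the controller when AngularJS is loaded on the page.
if (typeof angular !== 'undefined') {
  angular.module('lawsonryApp')
    .controller('NewsCtrl', ['$scope', '$http', NewsCtrl]);
}
```

In this sketch a failed request never calls htmlReady(), so you would probably want to call it from the failure path as well.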
Finally, we need to actually include the angular-seo.js file manually in our index.html file, toward the bottom where the includes for our controllers go. In an unedited scaffolding, my new index.html file looks like this (at the bottom):
<script src="scripts/app.js"></script>
<script src="scripts/controllers/main.js"></script>
<script src="angular-seo.js"></script>
We're now done with the app-level changes, so let's move to our command line to deal with the server-side requirements. Don't worry; we're almost done!
Setting up PhantomJS #
The last part of our scaffolding is to install and run PhantomJS alongside our development environment. You should already have npm, so install PhantomJS like this:
npm install -g phantomjs
(The -g flag installs it globally so that the phantomjs command used below is on your PATH; with a local install you would run node_modules/.bin/phantomjs instead.) Once that's completed, navigate to your application root directory (the one where we put the other angular-seo file, angular-seo-server.js) and run the following command:
phantomjs --disk-cache=no angular-seo-server.js 9090 http://127.0.0.1:9000
This will start a PhantomJS server with no disk caching (we'll use that during production in another tutorial) on port 9090. It's important to note that PhantomJS's port needs to be different from the port that your application runs on. Notice that we have set the last parameter (the application URL) to point at port 9000; that port number comes from the Gruntfile native to Yeoman's AngularJS scaffolding.
In other words, yo angular gives us the option to run grunt serve, which sets up a localhost web server to test our app on port 9000.
So think of it like this:
- PhantomJS listens on port 9090 and renders pages by loading them from the application on port 9000.
- If a request contains the `?_escaped_fragment_=` URL instead of the hashtag URL, then PhantomJS pre-renders the page and serves it, because the only way we wouldn't be asking for a hashtag URL is if the requester is a crawler.
- If a request contains a hashtag URL, then this is a human (browser) accessing the app and we can go ahead and bypass PhantomJS altogether.
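The routing decision in that list boils down to one check on the query string. Here is a rough sketch of it as a plain function. pickUpstream is a hypothetical name, and the two addresses are simply the development ports used in this tutorial.

```javascript
// Hypothetical helper: decides which server should answer a request.
// The ports match this tutorial's development setup.
var APP_SERVER = 'http://127.0.0.1:9000';      // grunt development server
var SNAPSHOT_SERVER = 'http://127.0.0.1:9090'; // PhantomJS snapshot server

function pickUpstream(requestUrl) {
  var queryStart = requestUrl.indexOf('?');
  var query = queryStart === -1 ? '' : requestUrl.slice(queryStart + 1);

  // Drop any hash fragment; browsers never send it to the server anyway.
  query = query.split('#')[0];

  // A crawler request carries the _escaped_fragment_ query parameter.
  var isCrawler = query.split('&').some(function (pair) {
    return pair.split('=')[0] === '_escaped_fragment_';
  });

  return isCrawler ? SNAPSHOT_SERVER : APP_SERVER;
}

console.log(pickUpstream('/?_escaped_fragment_=/about')); // http://127.0.0.1:9090
console.log(pickUpstream('/#/about'));                    // http://127.0.0.1:9000
```

During development nothing performs this routing for you, so you talk to port 9090 directly when you want a snapshot. In production the web server makes this decision.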
Now that we've got PhantomJS running, let's go ahead and run our development server, too:
grunt serve
Now we've got our development environment running a web server on 127.0.0.1 at port 9000 (or localhost, depending on what you like to call it), and a second web server running on port 9090 that will pre-render pages from port 9000 for any request that comes from a crawler. Fantastic!
Testing Your Pre-Rendered HTML #
The last thing I would encourage everyone to do is test whether your site is serving pre-rendered HTML to requests that contain the ?_escaped_fragment_= URL. You do this by going back to your terminal and typing:
curl 'http://localhost:9090/?_escaped_fragment_='
This will send your PhantomJS server a request for whatever is routed to the '/' route, which should be (if you haven't modified the Yeoman AngularJS scaffolding) the views/main.html file. The terminal should output a fully rendered HTML page. Check the contents of the <div class="container ng-scope" ng-view=""> tag, and you should see a bunch of HTML underneath. It works!
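If you would rather not eyeball the curl output, the same check can be scripted. This is only a heuristic sketch: hasRenderedView is a hypothetical helper that looks for an ng-view element with something inside it, and it assumes the markup of the stock Yeoman scaffolding.

```javascript
// Hypothetical helper: returns true when the ng-view element in the given
// HTML has content, which suggests the page was pre-rendered.
function hasRenderedView(html) {
  // Capture everything between the ng-view opening tag and the next closing tag.
  var match = /<[^>]*\bng-view\b[^>]*>([\s\S]*?)<\//.exec(html);
  return match !== null && match[1].trim().length > 0;
}

var rendered = '<div class="container ng-scope" ng-view=""><h1>Hi</h1></div>';
var emptyShell = '<div class="container" ng-view=""></div>';

console.log(hasRenderedView(rendered));   // true
console.log(hasRenderedView(emptyShell)); // false
```

To run it against the snapshot server, you could save the curl output to a file and pass its contents to the function.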
Going Live #
To take this to production, you'll need to make one more adjustment on the server. Add a detection block in your site's configuration on your server that will check if the _escaped_fragment_ URL is being requested, because if it is, we'll want to proxy the request over to PhantomJS instead of serving from our main server on port 80. If you're on Nginx (like I am), you can do this:
if ($args ~ escaped_fragment) {
# Proxy to PhantomJS instance here
}
However you do it, just remember to have your PhantomJS running on a different port than your web server.
Common Problems #
(This section is reserved for commenters whose problems are solved. If you have
any questions or concerns, leave a comment and let's sort it out together!)
Problem: The curl test is not outputting pre-rendered HTML pages.
Solution: You need to ensure that your root route '/' is what you used for the server address when you instantiated PhantomJS. For example, if you're routing your application's root to '/index.html', you need to change the server address from the example above to
http://localhost:9090/index.html/?_escaped_fragment_=
Whew! I know it seems daunting, but once you have it all set up, it's really very simple.