.. Seitegeist documentation master file, created by
   sphinx-quickstart on Tue Jan  6 11:42:32 2015.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to Seitegeist's documentation!
======================================

Seitegeist (Git repository on `Izeni's GitLab`_) is a Django library that
provides a management command, ``fetch_pages``, which loads a URL in PhantomJS
(executing all of its JavaScript) and outputs the resulting HTML. A few
configurable backends store that HTML to a file-like object, to disk, or to
Amazon S3.

.. _Izeni's GitLab: https://dev.izeni.net/open_source/seitegeist

Raison d'être
=============

Many search engines do not execute JavaScript, so if your website renders its
content only on the client side, these search engines and other bots, such as
those used by social networking sites, will not be able to "see" the rendered
content of your page. Many search engines and bots support a technique called
the "escaped fragment", which lets the site signal to the bot that an
alternative, pre-rendered version of the page is available for it to fetch.

Detailed documentation for Google's implementation of "escaped fragment"
`is available online`_.

.. _is available online: https://developers.google.com/webmasters/ajax-crawling/docs/specification
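
A minimal Python sketch of the URL mapping that specification describes may
make it more concrete: the part of a "pretty" URL after ``#!`` is moved into an
``_escaped_fragment_`` query parameter, with only a small set of bytes
percent-escaped. The function names are hypothetical and not part of
Seitegeist; supporting crawlers perform this transformation themselves.

.. code-block:: python

    # Bytes that the AJAX crawling specification says must be
    # percent-escaped in the _escaped_fragment_ value:
    # %00-20, %23 (#), %25 (%), %26 (&), %2B (+) and %7F-FF.
    _ESCAPED = (set(range(0x00, 0x21)) | {0x23, 0x25, 0x26, 0x2B}
                | set(range(0x7F, 0x100)))


    def escape_fragment(fragment):
        """Percent-escape only the bytes the spec lists; keep the rest."""
        out = []
        for byte in bytearray(fragment.encode("utf-8")):
            out.append("%%%02X" % byte if byte in _ESCAPED else chr(byte))
        return "".join(out)


    def to_escaped_fragment_url(url):
        """Map a "pretty" #! URL to the URL a crawler requests instead."""
        base, sep, fragment = url.partition("#!")
        if not sep:
            # No hash-bang, so the URL is returned unchanged.
            return url
        # Append to an existing query string if there is one.
        joiner = "&" if "?" in base else "?"
        return base + joiner + "_escaped_fragment_=" + escape_fragment(fragment)

For example, ``to_escaped_fragment_url("http://example.com/app#!/widgets/blue-widget/")``
returns ``"http://example.com/app?_escaped_fragment_=/widgets/blue-widget/"``,
which is the request your server sees from a supporting crawler.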

Seitegeist provides the basic tools needed to render a static version of a
page, so that a non-JavaScript version of your page can be made available via
the "escaped fragment" technique. This isn't its only use, but it is a primary
use case.

Installation
============

With Python 2.7 or Python 3 installed, use ``pip`` to install::

    pip install git+https://dev.izeni.net/open_source/seitegeist.git@master#egg=seitegeist

In the future ``pip install seitegeist`` may work.

PhantomJS is the main dependency and must be installed for Seitegeist to run.
It is most easily installed with ``npm``, the Node.js package manager::

    npm install -g phantomjs

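A global install may require root privileges. To avoid that, set the
environment variable ``PREFIX`` to something like ``$HOME/.local`` or
``$VIRTUAL_ENV`` and npm will install into that prefix instead. On Unix-like
systems the ``phantomjs`` executable then lands in ``$PREFIX/bin``, which must
be on your ``PATH``.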
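
Because Seitegeist runs PhantomJS as an external program, the executable has to
be findable. A small check like the following can confirm where it ended up.
It is a sketch assuming Python 3.3+ (for ``shutil.which``) and the Unix-like
``<prefix>/bin`` layout; ``find_phantomjs`` is a hypothetical helper, not part
of Seitegeist.

.. code-block:: python

    import os
    import shutil


    def find_phantomjs(prefixes=()):
        """Return the path of a phantomjs executable, or None if none is found."""
        # PATH is what a subprocess call would search, so check it first.
        found = shutil.which("phantomjs")
        if found:
            return found
        # npm puts global executables in <prefix>/bin on Unix-like systems,
        # so also look under any custom PREFIX values passed in.
        for prefix in prefixes:
            candidate = os.path.join(os.path.expanduser(prefix), "bin", "phantomjs")
            if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                return candidate
        return None

Calling ``find_phantomjs(["~/.local"])`` returns ``None`` when PhantomJS is
neither on ``PATH`` nor under that prefix, which usually means the ``PATH``
entry is missing.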

Usage
=====

Backend Settings
----------------

First, choose the backend you wish to use. There are currently three
backends. ``FilelikeBackend`` writes the HTML to a file-like object provided by
your code. ``DirectoryPathBackend`` writes the HTML to disk under a provided
base path. ``S3Backend`` writes it to an Amazon S3 bucket.
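
To illustrate what a directory-based backend does, here is a rough Python 3
sketch. The class name ``DirectoryBackendSketch`` and its ``write(dest, html)``
method are hypothetical; Seitegeist's real backend interface is described in
the :doc:`backends` API documentation.

.. code-block:: python

    import io
    import os


    class DirectoryBackendSketch(object):
        """Hypothetical stand-in for a backend that writes under a base path."""

        def __init__(self, path):
            # ``path`` plays the role of the "path" entry in BACKEND_ARGS.
            self.base = os.path.abspath(path)

        def write(self, dest, html):
            # Resolve ``dest`` (e.g. "widgets/slug") relative to the base
            # path and refuse anything that would land outside it.
            target = os.path.abspath(os.path.join(self.base, dest))
            if not target.startswith(self.base + os.sep):
                raise ValueError("dest escapes the base path: %r" % dest)
            parent = os.path.dirname(target)
            if not os.path.isdir(parent):
                os.makedirs(parent)
            with io.open(target, "w", encoding="utf-8") as handle:
                handle.write(html)
            return target

Rejecting destinations outside the base path is a defensive choice: ``dest``
values are often derived from URLs, and a stray ``..`` should not be able to
overwrite arbitrary files.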

To select a backend, add a ``SEITEGEIST`` setting to your Django settings
module::

    SEITEGEIST = {
        "BACKEND": "seitegeist.backends.DirectoryPathBackend",
        "BACKEND_ARGS": {"path": "somepath"},
    }


``SEITEGEIST`` must be a dictionary with a ``BACKEND`` key, whose value is the
dotted import path of the backend class as a string, and a ``BACKEND_ARGS``
key, whose value is a dictionary.

What ``BACKEND_ARGS`` must contain depends on the backend and is documented
for each one.
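
The following sketch shows one way such a settings dictionary could be
validated and turned into a backend instance. It assumes, without confirmation
from Seitegeist's code, that the backend class is instantiated with
``BACKEND_ARGS`` as keyword arguments; ``load_backend`` is a hypothetical name.

.. code-block:: python

    import importlib


    def load_backend(settings):
        """Validate a SEITEGEIST-style dict and instantiate its backend class."""
        if not isinstance(settings, dict):
            raise TypeError("SEITEGEIST must be a dict")
        backend_path = settings.get("BACKEND")
        if not isinstance(backend_path, str):
            raise TypeError("BACKEND must be a dotted-path string")
        backend_args = settings.get("BACKEND_ARGS")
        if not isinstance(backend_args, dict):
            raise TypeError("BACKEND_ARGS must be a dict")
        # "package.module.ClassName" -> import package.module, get ClassName.
        module_name, _, class_name = backend_path.rpartition(".")
        backend_class = getattr(importlib.import_module(module_name), class_name)
        # Assumption: BACKEND_ARGS become keyword arguments of the constructor.
        return backend_class(**backend_args)

With a standard-library class standing in for a backend,
``load_backend({"BACKEND": "fractions.Fraction", "BACKEND_ARGS": {"numerator": 1, "denominator": 2}})``
returns ``Fraction(1, 2)``. In a Django project,
``django.utils.module_loading.import_string`` performs the same dotted-path
lookup.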

App code: list_pages
--------------------

Next, write the code for your app. Each Django app you want to use with
Seitegeist needs a ``seites`` module containing a ``list_pages()`` function.
``list_pages`` must accept one boolean argument indicating whether the user
requested a "full" update. It must return an iterable of dictionaries (a
generator is fine), each with these keys:

* ``url``: the URL that will be fetched.
* ``dest``: the target value passed to the backend.
* ``callback`` (optional): a callable that is invoked once Seitegeist has
  finished processing the page.

Example file ``myapp/seites.py``::

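    from .models import MyModel

    def updateobj(obj):
        # Return a zero-argument callback bound to ``obj``; Seitegeist calls
        # it once the page for ``obj`` has been processed.
        def inner():
            obj.seitegeistified = True
            obj.save()
        return inner

    def list_pages(full=False):
        if full:
            # Full run: re-render every object.
            queryset = MyModel.objects.all()
        else:
            # Incremental run: only objects flagged as updated.
            queryset = MyModel.objects.filter(updated=True)
        for obj in queryset:
            url = obj.get_absolute_url()
            yield {'url': 'http://mypublicsite.example.com' + url,
                   'dest': url.strip("/"),
                   'callback': updateobj(obj)}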

This assumes your model has a ``get_absolute_url()`` method. The example tells
Seitegeist to load and render each page from a fully-qualified URL and then
has the backend save it to a file path. A file path isn't the only option for
``dest``: with the ``FilelikeBackend`` you could instead return a writable
file-like object, such as ``sys.stdout`` for testing. The ``updateobj``
function returns a closure over ``obj``, a callback that updates the object
after Seitegeist has processed its page.
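
For the ``FilelikeBackend`` variant, a ``list_pages()`` might look like the
following sketch. It assumes ``FilelikeBackend`` is the configured ``BACKEND``
(its ``BACKEND_ARGS`` are documented with the backend); the hard-coded
``PAGES`` list is a hypothetical placeholder for your own models.

.. code-block:: python

    import sys

    # Placeholder paths; a real app would build these from its models.
    PAGES = ["/widgets/blue-widget/", "/widgets/red-widget/"]


    def list_pages(full=False):
        # ``full`` is ignored here because the list is fixed.
        for path in PAGES:
            # With FilelikeBackend, ``dest`` is a writable file-like object
            # instead of a path, so every rendered page goes to stdout.
            yield {"url": "http://mypublicsite.example.com" + path,
                   "dest": sys.stdout}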

Run the management command
--------------------------

Now that Seitegeist is configured and a list of pages to render is provided,
you can run the management command ``./manage.py fetch_pages myapp``.

The list of app names is optional; if it is omitted, Seitegeist calls the
``seites.list_pages()`` function of every app that has one.
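
One plausible way to find each app's ``seites`` module is to try importing
``<app>.seites`` for every app name, as in this Python 3 sketch.
``discover_list_pages`` is a hypothetical helper, not Seitegeist's API, and in
a real project the app names would come from Django's app registry.

.. code-block:: python

    import importlib


    def discover_list_pages(app_names):
        """Map each app name to its ``seites.list_pages`` function, if it has one."""
        found = {}
        for app in app_names:
            try:
                module = importlib.import_module(app + ".seites")
            except ImportError:
                # No ``seites`` module (or no such app): skip it. Note this also
                # hides ImportErrors raised *inside* a broken seites module.
                continue
            list_pages = getattr(module, "list_pages", None)
            if callable(list_pages):
                found[app] = list_pages
        return found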

For each page, the command runs PhantomJS, which loads the page, executes its
JavaScript, and dumps the resulting DOM as HTML to stdout. Seitegeist captures
that output and saves it to the configured backend.
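
The per-page loop can be sketched as follows. This is an illustration under
stated assumptions, not Seitegeist's implementation: ``render.js`` stands for a
hypothetical PhantomJS script that prints the rendered DOM, and
``backend.write(dest, html)`` is a hypothetical backend method. The renderer is
a parameter so the loop can be exercised without PhantomJS.

.. code-block:: python

    import subprocess


    def render_with_phantomjs(url, script="render.js"):
        """Run PhantomJS on ``url`` and return what it prints to stdout.

        ``render.js`` is a hypothetical PhantomJS script that opens the page,
        lets its JavaScript run and prints the resulting DOM.
        """
        output = subprocess.check_output(["phantomjs", script, url])
        return output.decode("utf-8")


    def process_pages(pages, backend, render=render_with_phantomjs):
        """Render each page dict and hand the HTML to ``backend.write``."""
        processed = 0
        for page in pages:
            html = render(page["url"])
            # Hypothetical backend method; ``dest`` is passed through untouched.
            backend.write(page["dest"], html)
            # The optional callback runs only after the page has been written.
            callback = page.get("callback")
            if callback is not None:
                callback()
            processed += 1
        return processed

Running the callback after the write means an object is only marked as
processed once its rendered HTML has actually been stored.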

``fetch_pages`` also accepts an optional ``--full`` flag. Its value is passed
to the ``list_pages()`` function in each app, which can then return an
alternate list for that run: semantically a "full" list, whereas a normal run
uses an "incremental" one.
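
The command-line shape described above can be sketched with ``argparse``. The
parser is hypothetical and only mirrors the documented interface (optional app
names plus ``--full``); in Django 1.8 and later, a management command declares
the same arguments in its ``add_arguments(self, parser)`` method.

.. code-block:: python

    import argparse


    def build_parser():
        """Hypothetical parser mirroring ``fetch_pages [app ...] [--full]``."""
        parser = argparse.ArgumentParser(prog="fetch_pages")
        # Zero or more app names; an empty list means "every app with seites".
        parser.add_argument("apps", nargs="*")
        # False unless --full is given; the value is what list_pages() receives.
        parser.add_argument("--full", action="store_true")
        return parser

Parsing ``["myapp", "--full"]`` yields ``apps=["myapp"]`` and ``full=True``,
while an empty command line yields ``apps=[]`` and ``full=False``.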

nginx Example
=============

As an example of using the escaped fragment technique, suppose you have a site
with an AngularJS single-page app where widget pages live under
``/app#!/widgets/[slug]/``. The hash-bang pattern is part of the escaped fragment
specification, but you can use HTML5 URLs as well. You can then configure
Seitegeist's ``S3Backend`` to write to the bucket ``mysite-rendered``.

Your ``list_pages()`` renders each ``/app#!/widgets/slug/`` URL to the path
``widgets/slug``. Now all you need is an nginx config like this:

.. code-block:: nginx

    http {
        ...
        server ... {
            ...
            # Returning the otherwise-unused status 601 hands the request to @rendered_s3.
            error_page 601 = @rendered_s3;
            location @rendered_s3 {
                rewrite ^(.*)/$ /mysite-rendered$1? break;
                proxy_pass http://s3.amazonaws.com;
                proxy_http_version 1.1;
                proxy_set_header Host 's3.amazonaws.com';
                proxy_set_header Cookie '';
                proxy_read_timeout 60;
                add_header Content-Type 'text/html; charset=utf-8';
            }

            location / {
                # If the URL arguments include "_escaped_fragment_" then return a
                # rendered version.
                if ($args ~ "_escaped_fragment_=") {
                    return 601;
                }
                # When using HTML5 URLs instead of hash-bang, Facebook's crawler
                # doesn't play nice, so fall back to User-Agent sniffing.
                if ($http_user_agent ~* "^facebookexternalhit") {
                    return 601;
                }
                ...
                uwsgi_pass 127.0.0.1:3000;
                ...
            }
            ...
        }
    }

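This server passes requests to uWSGI unless they either have
``_escaped_fragment_`` in their query string or come from Facebook's crawler,
in which case they are instead sent a pre-rendered HTML file, served from the
S3 bucket through a reverse proxy. The ``rewrite`` rule strips the trailing
slash and the query string, so a request for
``/widgets/slug/?_escaped_fragment_=`` is proxied to
``/mysite-rendered/widgets/slug`` on S3.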
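
To check which requests the configuration above serves from S3, and under which
key, its routing can be mirrored in Python. ``rendered_s3_uri`` is a
hypothetical helper for testing expectations; it is not something nginx or
Seitegeist provides, and it ignores details such as nginx's URI normalization.

.. code-block:: python

    import re

    BUCKET = "mysite-rendered"


    def rendered_s3_uri(path, query, user_agent):
        """Return the URI nginx would proxy to S3, or None to fall through to uWSGI."""
        wants_rendered = (
            # if ($args ~ "_escaped_fragment_=")
            re.search("_escaped_fragment_=", query) is not None
            # if ($http_user_agent ~* "^facebookexternalhit"); ~* ignores case
            or re.search("^facebookexternalhit", user_agent, re.IGNORECASE) is not None
        )
        if not wants_rendered:
            return None
        # rewrite ^(.*)/$ /mysite-rendered$1? break;
        # The trailing "?" drops the query string. Paths without a trailing
        # slash do not match, so nginx passes them through unchanged.
        match = re.match(r"^(.*)/$", path)
        if match is None:
            return path
        return "/" + BUCKET + match.group(1)

The last case is worth noticing: a path without a trailing slash is proxied to
S3 without the bucket prefix, so the pages you render should use URLs that end
in ``/``.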

And that's that. More examples will be written soon.


API Documentation
=================

.. toctree::
   :maxdepth: 2

   backends
   tools


Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`