.. Seitegeist documentation master file, created by
   sphinx-quickstart on Tue Jan  6 11:42:32 2015.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to Seitegeist's documentation!
======================================

Seitegeist (Git repository on `Izeni's GitLab`_) is a library that provides a
management command, ``fetch_pages``, that loads a URL, renders it with
PhantomJS (executing all of its JavaScript), and then outputs the resulting
HTML. It provides a few configurable backends to store that HTML on disk or
in Amazon S3.

.. _Izeni's GitLab: https://dev.izeni.net/open_source/seitegeist

Raison d'être
=============

Many search engines do not execute JavaScript, so if your website renders its
content only on the client side, these search engines and other bots, such as
those of social networking sites, will not be able to "see" the rendered
content of your page. Many search engines and bots support a technique called
the "escaped fragment", which allows the site to signal to the bot that there
is an alternative, pre-rendered version of the page that it can fetch.
Detailed documentation for Google's implementation of "escaped fragment"
`is available online`_.

.. _is available online: https://developers.google.com/webmasters/ajax-crawling/docs/specification

Seitegeist provides the basic tools needed to render a static version of the
page so that a non-JavaScript version of your page can be made available via
the "escaped fragment" technique. This isn't its only use, but it is the
primary use case.

Installation
============

With Python 2.7 or Python 3 installed, use ``pip`` to install::

    pip install git+https://dev.izeni.net/open_source/seitegeist.git@master#egg=seitegeist

In the future ``pip install seitegeist`` may work.

Also, the main dependency is PhantomJS, which is required for it to run.
This can most easily be installed with NodeJS::

    npm install -g phantomjs

This may require root privileges, or you can set the environment variable
``PREFIX`` to something like ``$HOME/.local`` or to ``$VIRTUAL_ENV`` and it
will work.

Usage
=====

Backend Settings
----------------

First, determine the backend you wish to use. There are currently three
backends. ``FilelikeBackend`` simply writes the HTML to a file-like object
provided by your code. ``DirectoryPathBackend`` writes the HTML to disk under
a provided base path. ``S3Backend`` writes to an Amazon S3 bucket.

To set your backend, add a value to your Django settings module::

    SEITEGEIST = {
        "BACKEND": "seitegeist.backends.DirectoryPathBackend",
        "BACKEND_ARGS": {"path": "somepath"},
    }

The ``SEITEGEIST`` setting must be a dictionary. It must have a ``BACKEND``
key whose value is a string, and a ``BACKEND_ARGS`` key whose value is a
dictionary. The contents of ``BACKEND_ARGS`` depend on the code within the
backend and are documented for each one.

App code: list_pages
--------------------

Next you need to write code for your app. In each Django app you wish to use
Seitegeist with, add a module ``seites`` containing a function
``list_pages()``. The ``list_pages`` function must take one boolean argument
that specifies whether the user wants to run a "full" update or not. It must
return an iterable of dictionaries (it can be a generator if you like). The
keys of each dictionary must be ``url``, the URL that will be fetched;
``dest``, which will be passed to the backend as the target value; and,
optionally, ``callback``, a callable that will be executed after the fetch is
done.

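
The contract described above can be sketched without any models. The following is only an illustration for a hypothetical app whose pages are known up front: the host name, the paths, the ``mark_done`` helper, and the choice of ``index`` as the destination for the root page are all placeholders rather than part of Seitegeist, and the callback is assumed to be called with no arguments.

```python
# Hypothetical myapp/seites.py for an app with a fixed set of pages.
BASE_URL = "http://mypublicsite.example.com"  # placeholder host

# Paths rendered on every run, and extra paths rendered only on a full run.
INCREMENTAL_PATHS = ["/", "/news/"]
FULL_ONLY_PATHS = ["/about/", "/archive/"]

rendered = []  # destinations whose callback has been executed


def mark_done(dest):
    """Return a zero-argument callback that records ``dest`` as rendered."""
    def inner():
        rendered.append(dest)
    return inner


def list_pages(full=False):
    """Yield one dictionary per page with ``url``, ``dest`` and ``callback``."""
    paths = list(INCREMENTAL_PATHS)
    if full:
        paths += FULL_ONLY_PATHS
    for path in paths:
        # Assumption: the root page is stored under the name "index".
        dest = path.strip("/") or "index"
        yield {"url": BASE_URL + path,
               "dest": dest,
               "callback": mark_done(dest)}
```

Calling ``list(list_pages())`` gives the incremental list, while ``list(list_pages(full=True))`` also includes the full-only pages.
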

Example file ``myapp/seites.py``::

    from .models import MyModel

    def updateobj(obj):
        # Build a callback that flags ``obj`` once its page has been rendered.
        def inner():
            obj.seitegeistified = True
            obj.save()
        return inner

    def list_pages(full=False):
        if full:
            queryset = MyModel.objects.all()
        else:
            queryset = MyModel.objects.filter(updated=True)
        for obj in queryset:
            url = obj.get_absolute_url()
            yield {'url': 'http://mypublicsite.example.com' + url,
                   'dest': url.strip("/"),
                   'callback': updateobj(obj)}

This assumes your model has a ``get_absolute_url()`` method. It will instruct
Seitegeist to load and render the page from a fully-qualified URL and then
have the backend save it to a file path. Returning a file path isn't the only
option: you could use the ``FilelikeBackend`` and instead return a writable
file-like object, such as ``sys.stdout``, just to test it. The ``updateobj``
function returns a closure, the callback, which will make a change to the
object after Seitegeist has processed it.

Run the management command
--------------------------

Now that Seitegeist is configured and a list of pages to render is provided,
you can run the management command ``./manage.py fetch_pages myapp``. The
list of app names is optional; if none are given, Seitegeist will call the
``seites.list_pages()`` function of every app that has one.

For each page, this runs PhantomJS, which renders the HTML and executes its
JavaScript, then dumps the DOM as HTML to stdout. That output is captured and
saved to the configured backend.

There is an optional flag to ``fetch_pages``, ``--full``, which will be passed
to the ``list_pages()`` function in each app. It means the app will process
this run of ``fetch_pages`` with an alternate list: semantically a "full"
list, while a normal run uses just an "incremental" list.

nginx Example
=============

As an example of using the escaped fragment technique, suppose you have a site
with an AngularJS single-page app where widget pages live under
``/app#!/widgets/[slug]/``.
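
For reference, a crawler that follows the escaped fragment scheme does not request the ``#!`` URL itself; it moves everything after ``#!`` into an ``_escaped_fragment_`` query parameter. The Python 3 sketch below illustrates that mapping. The function name ``to_escaped_fragment_url`` and the example host are made up for illustration and are not part of Seitegeist, and the percent-encoding is deliberately conservative rather than a character-for-character copy of the specification.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left unescaped in the fragment value. Apart from letters, digits
# and "_.-", everything else is percent-encoded (including %, #, & and +).
SAFE_CHARS = "/:=?!$'()*,;@~"


def to_escaped_fragment_url(pretty_url):
    """Return the URL a crawler would request for a ``#!`` URL (illustrative)."""
    parts = urlsplit(pretty_url)
    if not parts.fragment.startswith("!"):
        # Not a hash-bang URL: nothing to translate.
        return pretty_url
    value = quote(parts.fragment[1:], safe=SAFE_CHARS)
    param = "_escaped_fragment_=" + value
    # Append to an existing query string if there is one.
    query = parts.query + "&" + param if parts.query else param
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


print(to_escaped_fragment_url(
    "http://mypublicsite.example.com/app#!/widgets/blue-widget/"))
# http://mypublicsite.example.com/app?_escaped_fragment_=/widgets/blue-widget/
```

The query string of such a request is what a web server can inspect to decide whether to serve the pre-rendered page.
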
The hash-bang pattern is part of the escaped fragment specification, but you
can use HTML5 URLs as well. You can then have the ``S3Backend`` in Seitegeist
configured to write to the bucket ``mysite-rendered``. Your ``list_pages()``
renders each ``/app#!/widgets/slug/`` URL to the path ``widgets/slug``. Now,
all you need is an nginx config like this:

.. code-block:: nginx

    http {
        ...
        server ... {
            ...
            # Returning this custom error code hands the request to the
            # @rendered_s3 location.
            error_page 601 = @rendered_s3;
            location @rendered_s3 {
                rewrite ^(.*)/$ /mysite-rendered$1? break;
                proxy_pass http://s3.amazonaws.com;
                proxy_http_version 1.1;
                proxy_set_header Host 's3.amazonaws.com';
                proxy_set_header Cookie '';
                proxy_read_timeout 60;
                add_header Content-Type 'text/html; charset=utf-8';
            }
            location / {
                # If the URL arguments include "_escaped_fragment_" then
                # return a rendered version.
                if ($args ~ "_escaped_fragment_=") {
                    return 601;
                }
                # When using HTML5 URLs instead of hash-bang, Facebook.com
                # doesn't play nice. Use User-Agent sniffing instead in that
                # case.
                if ($http_user_agent ~* "^facebookexternalhit") {
                    return 601;
                }
                ...
                uwsgi_pass 127.0.0.1:3000;
                ...
            }
            ...
        }
    }

This server will pass requests to uWSGI unless they either have
``_escaped_fragment_`` in their GET arguments or come from Facebook.com, in
which case they are instead sent a pre-rendered HTML file, served from the S3
bucket through a reverse proxy.

And that's that. More examples will be written soon.

API Documentation
=================

.. toctree::
   :maxdepth: 2

   backends
   tools

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`