> Permission to read from or write to another process is governed by a ptrace access mode PTRACE_MODE_ATTACH_REALCREDS check
I presume that is necessary for this in addition to belonging to the same UID?
> As far as I know process_vm_readv isn't even detectable if the agent process is more privileged than the examinee process—so you're free to manipulate your private copy of the application in the comfort of your own address space.
Interesting. This would be really useful in debugging. Many issues don't reproduce except in specific configurations. Having access to the memory dump of the live process "streamed" to the debugger would be great!
"a PTRACE_MODE_ATTACH_REALCREDS check" and "same UID" are roughly synonyms. ("Roughly" because the actual check also permits privileged processes and denies some special cases such as processes that have done a setuid. REALCREDS, if I remember right, is in contrast to the check done on certain files in /proc.)
It's the same check as ptrace itself, so the intuition is "can I strace or gdb this process."
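To make the permission check concrete, here is a minimal Linux sketch, not taken from unfork, in which a parent reads its own child's memory with process_vm_readv(2). Parent and child share a UID, and the parent is an ancestor of the child, so this normally passes the PTRACE_MODE_ATTACH_REALCREDS check even under Yama's ptrace_scope=1. It fails under ptrace_scope=2 or a seccomp filter that blocks the syscall. The helper read_child_buffer and the buffer child_buf are hypothetical names.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/wait.h>
#include <unistd.h>

/* After fork() this buffer sits at the same virtual address in both
 * processes, so the parent knows where to look in the child. */
static char child_buf[64];

/* Hypothetical helper: fork a child that writes `msg` into its private copy
 * of child_buf, then read that copy from the parent with process_vm_readv().
 * Returns the number of bytes read (at most outlen), or -1 on error. */
ssize_t read_child_buffer(const char *msg, char *out, size_t outlen)
{
    int ready[2], done[2];
    if (pipe(ready) == -1)
        return -1;
    if (pipe(done) == -1) {
        close(ready[0]);
        close(ready[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: fill our copy, report readiness, wait for the parent. */
        char c = 'r';
        close(ready[0]);
        close(done[1]);
        strncpy(child_buf, msg, sizeof child_buf - 1);
        if (write(ready[1], &c, 1) != 1)
            _exit(1);
        if (read(done[0], &c, 1) < 0)   /* returns 0 once the parent closes done[1] */
            _exit(1);
        _exit(0);
    }

    ssize_t n = -1;
    close(ready[1]);   /* so read() sees EOF if the child dies early */
    close(done[0]);
    if (pid > 0) {
        char c;
        if (read(ready[0], &c, 1) == 1) {
            size_t len = outlen < sizeof child_buf ? outlen : sizeof child_buf;
            struct iovec local = { .iov_base = out, .iov_len = len };
            struct iovec remote = { .iov_base = child_buf, .iov_len = len };
            n = process_vm_readv(pid, &local, 1, &remote, 1, 0);
        }
    }
    close(done[1]);    /* releases the child */
    close(ready[0]);
    if (pid > 0)
        waitpid(pid, NULL, 0);
    return n;
}
```

The parent's own child_buf stays zeroed throughout; only the child's copy holds the message, which is what the syscall reads.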
AIUI Docker containers by default deny the ptrace syscall (and presumably process_vm_readv/writev) with a seccomp filter; they don't change the permission check. So /proc/$pid/mem, which uses the same permission check, ought to work.
(This also means that you don't want or need CAP_SYS_PTRACE to get gdb/strace working in Docker: that capability lets you ptrace anything, and it also coincidentally turns off the syscall filter. Just turn the filter off; that works without privileging the processes in the container.)
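For comparison, /proc/$pid/mem is read with ordinary file I/O, where the file offset is a virtual address in the target. A minimal sketch, assuming Linux and a target that passes the same ptrace access check (the test below simply uses the calling process itself). read_proc_mem is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read `len` bytes at virtual address `addr` in process
 * `pid` via /proc/<pid>/mem. Returns bytes read, or -1 on error (for example
 * when the ptrace access check fails or the address is not mapped). */
ssize_t read_proc_mem(pid_t pid, const void *addr, void *out, size_t len)
{
    char path[32];
    snprintf(path, sizeof path, "/proc/%d/mem", (int)pid);
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd == -1)
        return -1;
    /* The file offset is interpreted as an address in the target's memory. */
    ssize_t n = pread(fd, out, len, (off_t)(uintptr_t)addr);
    close(fd);
    return n;
}
```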
> Having access to the memory dump of the live process "streamed" to the debugger would be great!
This is also possible with standard debuggers such as GDB: it can attach to a running process and not only examine its memory but also control execution (stop, resume, step, ...) and inspect the stack trace. Usage is as simple as gdb -p $(pidof my_running_program)
And even then, you could probably store just the diff instead of a full image, and if you still run out of memory you can start evicting the oldest snapshots.
I know Windows doesn't get much love here, but we have to admit that Win32 has had this kind of feature for ages: process access routines such as OpenProcess() [1] coupled with ReadProcessMemory() [2] do the job cleanly.
Taking a snapshot of other processes is also a basic use case of this family of functions [3].
I appreciate win32 much more than a typical HN user. But let's not fall into the trap of seeing it as the only way or that certain features haven't also been on Unix for decades.
> that Win32 has already this kind of feature since ages: Process access routines such as OpenProcess() [1] coupled with ReadProcessMemory() [2]
And Unix has had ptrace(2) for a very long time too, which will accomplish the same thing. Also you can read another process's memory through /proc. So there are multiple paths.
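The classic ptrace(2) route looks roughly like this in C. It is a minimal Linux sketch, not how any particular debugger does it: the child opts in with PTRACE_TRACEME and stops itself, which sidesteps Yama's restrictions on attaching, and the parent then reads one word of the child's memory with PTRACE_PEEKDATA. peek_child_word is a hypothetical helper. A seccomp filter that blocks ptrace (such as the Docker default discussed above) makes it fail.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Same virtual address in parent and child after fork(). */
static long peek_target;

/* Hypothetical helper: returns 0 and stores the child's copy of peek_target
 * in *out, or returns -1 on error. */
int peek_child_word(long value, long *out)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        peek_target = value;                  /* only the child's copy changes */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(1);
        raise(SIGSTOP);                       /* parent's waitpid() sees the stop */
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    if (!WIFSTOPPED(status))                  /* child exited: TRACEME failed */
        return -1;

    errno = 0;                                /* PEEKDATA may legitimately return -1 */
    long word = ptrace(PTRACE_PEEKDATA, pid, &peek_target, NULL);
    int rc = (word == -1 && errno != 0) ? -1 : 0;
    if (rc == 0)
        *out = word;

    ptrace(PTRACE_CONT, pid, NULL, NULL);    /* resume, suppressing the SIGSTOP */
    waitpid(pid, &status, 0);                 /* reap the child */
    return rc;
}
```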
Also keep in mind, the most "legit" use case for this stuff is for writing a debugger. If you have used a debugger you are already relying on this functionality being there.
Edit: I am not aware of a win32 equivalent of this thing that lets you easily handle another thread's page faults in user mode though. That seems a little wacky. You can use debugger APIs to handle "STATUS_IN_PAGE_ERROR" and "STATUS_ACCESS_VIOLATION", which might get you there.
What unfork does is more complicated than a mere read, though. I'm still not entirely clear on its use case, but it does all sorts of tampering that the code comments describe as "cursed." It also seems to be specifically targeting applications which have anti-debug measures.
"Debuggers" implemented mostly through ReadProcessMemory / mach_vm_read / process_vm_readv / pread are all intended to defeat anti-debug mechanisms, though; I'm not clear how unfork is meant to make the process simpler, but it looks intriguing.
process_vm_readv and its equivalents allow you to read data whose layout you already know. unfork allows you to read data whose layout you don't know or that is partially generated by calling the accessors that the application already has for that data instead of reverse-engineering them and doing the transformation yourself.
unfork appears to be unique in that it creates the illusion of mapping the target process's memory into the source. This is achieved through the use of userfaultfd, which allows a Linux process to mark memory as missing, to receive notifications when other threads attempt to access missing memory, and to provide the contents of that memory in response to such faults. This mechanism is quite powerful and flexible, and Windows does not have a direct equivalent of this.
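A minimal single-process sketch of that userfaultfd flow (not unfork's actual code): one page is registered for missing-page tracking, a handler thread reads the fault event, and it resolves the fault with UFFDIO_COPY from a local buffer that stands in for memory read from the target process. Assumptions: Linux, permission to create a userfaultfd, and a toolchain that links pthreads. It first tries UFFD_USER_MODE_ONLY, which unprivileged processes may use on newer kernels but which does not handle faults raised while the kernel touches the memory for a syscall. uffd_demo and UFFD_DEMO_UNAVAILABLE are hypothetical names.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef UFFD_USER_MODE_ONLY
#define UFFD_USER_MODE_ONLY 1   /* value from <linux/userfaultfd.h> on 5.11+ */
#endif
#define UFFD_DEMO_UNAVAILABLE (-2)

struct fault_ctx {
    int uffd;
    const char *source;   /* stand-in for the target process's memory */
    void *region;         /* the registered (initially missing) page */
    size_t page;
    int status;
};

/* Handler thread: wait for one missing-page event and resolve it by copying
 * a page from `source` into the faulting page with UFFDIO_COPY. */
static void *fault_handler(void *arg)
{
    struct fault_ctx *ctx = arg;
    struct uffd_msg msg;
    ctx->status = -1;
    if (read(ctx->uffd, &msg, sizeof msg) == (ssize_t)sizeof msg &&
        msg.event == UFFD_EVENT_PAGEFAULT) {
        struct uffdio_copy copy = {
            .dst = msg.arg.pagefault.address & ~(uint64_t)(ctx->page - 1),
            .src = (uint64_t)(uintptr_t)ctx->source,
            .len = ctx->page,
            .mode = 0,    /* 0 = also wake the faulting thread */
        };
        if (ioctl(ctx->uffd, UFFDIO_COPY, &copy) == 0)
            ctx->status = 0;
    }
    if (ctx->status != 0) {
        /* Unregistering wakes the blocked faulting thread so it does not hang. */
        struct uffdio_range range = { .start = (uintptr_t)ctx->region, .len = ctx->page };
        ioctl(ctx->uffd, UFFDIO_UNREGISTER, &range);
    }
    return NULL;
}

/* Hypothetical demo (n must be >= 1): returns 0 with the page's string in
 * `result`, UFFD_DEMO_UNAVAILABLE if userfaultfd can't be used, -1 on error. */
int uffd_demo(char *result, size_t n)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    int uffd = (int)syscall(SYS_userfaultfd, O_CLOEXEC | UFFD_USER_MODE_ONLY);
    if (uffd == -1 && errno == EINVAL)    /* older kernel: flag unknown */
        uffd = (int)syscall(SYS_userfaultfd, O_CLOEXEC);
    if (uffd == -1)
        return UFFD_DEMO_UNAVAILABLE;     /* e.g. EPERM from sysctl or seccomp */

    int rc = -1;
    struct uffdio_api api = { .api = UFFD_API, .features = 0 };
    char *src = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    char *dst = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (ioctl(uffd, UFFDIO_API, &api) == -1 || src == MAP_FAILED || dst == MAP_FAILED)
        goto out;
    strcpy(src, "filled in by the fault handler");

    struct uffdio_register reg = {
        .range = { .start = (uintptr_t)dst, .len = page },
        .mode = UFFDIO_REGISTER_MODE_MISSING,
    };
    if (ioctl(uffd, UFFDIO_REGISTER, &reg) == -1)
        goto out;

    struct fault_ctx ctx = { .uffd = uffd, .source = src, .region = dst, .page = page };
    pthread_t tid;
    if (pthread_create(&tid, NULL, fault_handler, &ctx) != 0)
        goto out;
    /* First touch of dst: this thread blocks until the handler fills the page. */
    strncpy(result, dst, n - 1);
    result[n - 1] = '\0';
    pthread_join(tid, NULL);
    rc = ctx.status;
out:
    if (src != MAP_FAILED) munmap(src, page);
    if (dst != MAP_FAILED) munmap(dst, page);
    close(uffd);
    return rc;
}
```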
The closest equivalent I can think of in Windows would be to mark pages as no-access and use vectored exception handling to trap access faults. During a fault, the exception handler would fill in the page (e.g. via ReadProcessMemory) and flip the page protection to read or read/write.
Since you wouldn't want to flip the page protection until after the memory had been updated, you would probably have to use a pagefile-backed section to update the memory at a separate virtual address with independent page protections. And unlike the userfaultfd approach, this mechanism would not help for cases where the mirrored memory was being passed to a syscall.
I think Linux could do this too, via a signal handler, but AFAIK the Linux memory manager does not efficiently support per-page access protection (unlike Windows). In the worst case, each page would get its own vma structure in the kernel, which would be quite expensive. So absent userfaultfd, the Windows memory manager probably has the edge.
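The signal-handler variant can be sketched on Linux as follows: map the mirror region PROT_NONE, catch SIGSEGV, and on each fault make the page readable and writable and fill it. This is a single-threaded illustration under several assumptions. A local buffer stands in for memory read from the target. mprotect() is called from the handler even though POSIX does not list it as async-signal-safe (it works on Linux). And the page becomes writable before it is filled, which is the ordering problem raised above for the Windows version once other threads are involved. lazy_mirror_init is a hypothetical name.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *mirror;          /* PROT_NONE region that is filled lazily */
static size_t mirror_len;
static const char *backing;   /* stand-in for the target process's memory */
static size_t backing_len;
static size_t page_size;

static void on_segv(int sig, siginfo_t *si, void *uctx)
{
    (void)uctx;
    uintptr_t addr = (uintptr_t)si->si_addr;
    uintptr_t base = (uintptr_t)mirror;
    if (addr < base || addr >= base + mirror_len) {
        /* A genuine crash: restore the default action; the faulting
         * instruction re-executes and the process dies normally. */
        signal(sig, SIG_DFL);
        return;
    }
    char *pg = (char *)(addr & ~(uintptr_t)(page_size - 1));
    size_t off = (size_t)(pg - mirror);
    size_t n = backing_len - off < page_size ? backing_len - off : page_size;
    /* Linux-specific: mprotect() is not POSIX async-signal-safe. */
    mprotect(pg, page_size, PROT_READ | PROT_WRITE);
    memcpy(pg, backing + off, n);
    /* Returning re-executes the faulting access, which now succeeds. */
}

/* Hypothetical helper: return a lazily filled view of src[0..len), or NULL. */
char *lazy_mirror_init(const char *src, size_t len)
{
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    mirror_len = (len + page_size - 1) & ~(page_size - 1);
    void *p = mmap(NULL, mirror_len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    mirror = p;
    backing = src;
    backing_len = len;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) == -1)
        return NULL;
    return mirror;
}
```

Each page flipped this way may split the mapping into a separate VMA, which is the kernel-side cost the comment is worried about.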
Glibc used to have unexec(), which is fairly old, but it was removed because nobody used it (except Emacs, and there were better solutions to the problem it was solving).
It's as clean as any official Win32 API, which uses Windows' privilege system to restrict or allow access to each and every bit of information on the process state and/or memory.
> Can you simply exec the result?
This is possible using CreateRemoteThread() [1], which creates a thread that runs inside another process's execution context.
Emacs somewhat famously uses "unexec" in its build process: you build a skeletal Emacs in C (mostly the Emacs Lisp implementation), run it, have it load, compile and process a bunch of Lisp that implements the editor itself, and dump the resulting process memory back out to disk. The result of this eldritch process is the final emacs binary. When you exec emacs, you get an environment that consists of the editor code ready to go.
It comes from a time of machines executing instructions thousands of times slower than they do now. Literally – thousands. Memory access was about as fast as an instruction execution, so the amount of compute you can justify per unit of data was hundreds of times less than it is now. They did however have virtual memory systems with on demand page fetching.
Also, that machine was being time shared with a dozen or more users.
Launching emacs or TeX on this machine might take tens of seconds without access to unexec(), but only 3 seconds for the freeze-dried version.
unexec() was easier at the time. There were no shared libraries, no address space layout randomization. One memory region grew up from the bottom, one down from the top. There was no mmap() jamming mysterious stuff in the middle. Just copy the bottom, copy the top, do magic to adjust the stack for your unexec() call, and write the thing out as an executable.
(Yeah, I excised unexec() from BibTeX back in the ‘80s to port it to a 68k Mac for a coworker, then later implemented unexec() for a Motorola 88k based multilevel secure SysV system in the early ‘90s because launching emacs was driving me insane. I prefer our shiny new future of stupidly fast computers.)
Interestingly, even if Emacs removes this, I can see Apple being forced to keep their hack in place, as they're not likely to update their version of Emacs anytime soon…
I don't think it's the same thing. linkat() still starts with a file that exists in the filesystem.
In my silly world, unopen() would just take any fd (socket, file, pipe, etc.) and create a file system binding which anyone could open. Kind of like how /proc works on Linux today.
Does that mean that one could implement "undelete" by changing open() so that some central process (let's call it "recycle-bin") also opens a copy of each file at the same time a process opens it (but keeps it open until you send it an "empty" signal), and then calling link() on that?
You could use LD_PRELOAD to replace unlink with a version that passes an fd to a running recycle-bin daemon. That approach doesn't really have much, or any, advantage over just moving files to a recycle-bin folder and keeping track of items in there.
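A minimal sketch of the LD_PRELOAD idea, simplified to the folder-based alternative mentioned above: an interposed unlink() that renames the file into a recycle-bin directory instead of deleting it. Assumptions: RECYCLE_BIN is a hypothetical environment variable, the bin is on the same filesystem (rename() fails with EXDEV otherwise), and same-named files overwrite each other. Only unlink() itself is covered, so programs that call unlinkat() or remove() instead (GNU rm uses unlinkat()) bypass it.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Interposed unlink(): move the file into $RECYCLE_BIN instead of deleting it. */
int unlink(const char *path)
{
    const char *bin = getenv("RECYCLE_BIN");   /* hypothetical convention */
    if (bin == NULL || *bin == '\0')
        /* No recycle bin configured: do a real unlink via the raw syscall. */
        return (int)syscall(SYS_unlinkat, AT_FDCWD, path, 0);

    const char *slash = strrchr(path, '/');
    const char *name = slash ? slash + 1 : path;
    char dest[PATH_MAX];
    if (snprintf(dest, sizeof dest, "%s/%s", bin, name) >= (int)sizeof dest) {
        errno = ENAMETOOLONG;
        return -1;
    }
    /* rename() only works within one filesystem; EXDEV otherwise. */
    return rename(path, dest);
}
```

Built with something like cc -shared -fPIC -o recycle.so recycle.c and loaded via LD_PRELOAD=./recycle.so, it affects only dynamically linked programs that call unlink() directly.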
For those who aren't familiar with the significance of this, "cd" is a shell builtin because the working directory is per-process state. So while it's perfectly valid to write a program that does a chdir(2) and then exits, it's only changing its own working directory, which is pretty useless.
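A small C sketch of that point: a forked child calls chdir(2) and exits, and the parent's working directory is unchanged, which is all an external cd program could ever achieve. child_chdir_leaves_parent_alone is a hypothetical name.

```c
#define _GNU_SOURCE
#include <limits.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the parent's cwd is unchanged after a
 * child chdir()s to `dir`, 0 if it changed, -1 on error. */
int child_chdir_leaves_parent_alone(const char *dir)
{
    char before[PATH_MAX], after[PATH_MAX];
    if (getcwd(before, sizeof before) == NULL)
        return -1;

    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(chdir(dir) == 0 ? 0 : 1);   /* what an external "cd" binary does */

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    if (getcwd(after, sizeof after) == NULL)
        return -1;
    return strcmp(before, after) == 0;
}
```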
> A: It's true that meshing address spaces is much harder than copying them. ... [truncated] ... 64-bit systems with ASLR are far more forgiving. Nevertheless, I think that with some effort two allocators or even dynamic linkers could survive together.
Does it pause the process whose memory it is copying?
Freezing the process can affect its correct operation. (Sometimes when I need a memory dump of a production Java app, I can't take it because I cannot afford to freeze a production app.)
Without the freeze, the memory copy we get can be inconsistent.
If I understand it correctly, it's more of a Copy-on-Read/Write, and as opposed to fork, it's only one-sided: read/write is only detected and results in a copy on the unfork side; if the original process changes memory, it doesn't result in a copy as nothing is monitoring this (the userfaultfd only monitors the unfork side).
The ideal approach would be if it turned the original process memory into "copy on write", and created a paused exact copy of that process. This would give a consistent, immutable, snapshot of the target process memory, without freezing for the duration of actual memory copying.
One could then take a core dump, Java heap dump, or similar, of the paused copy process.
I'm curious, why does the tool try to copy the original process memory into the memory of the tool itself, risking a collision? Is it impossible to create a third process - an exact copy of the original process?
> all while leaving no ptrace and sending no signal
If this is a design goal, I'm afraid it is indeed impossible to take a snapshot of the original process. As far as I know (I researched the status quo 2 years ago when I needed copy-on-write for VM cloning/forking), the only way to make a snapshot of a process' address space is to invoke the clone (fork) system call. If you need to take a snapshot of another process, then you need ptrace.
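To illustrate fork() as a snapshot mechanism, this C sketch snapshots the calling process's own address space: after fork(), the parent modifies a variable, while the child, whose pages are copy-on-write, still sees the old value. Snapshotting a different process this way would require getting that process to call fork() itself, for example by injecting the call via ptrace, which is not shown. fork_snapshot_demo is a hypothetical name.

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile int snapshot_state = 1;

/* Hypothetical demo: returns the value of snapshot_state seen by the child
 * (the snapshot) after the parent overwrote it, or -1 on error. */
int fork_snapshot_demo(void)
{
    int go[2];
    if (pipe(go) == -1)
        return -1;
    snapshot_state = 1;
    pid_t pid = fork();
    if (pid == -1) {
        close(go[0]);
        close(go[1]);
        return -1;
    }
    if (pid == 0) {
        char c;
        close(go[1]);
        /* Block until the parent closes its end, i.e. after it has written. */
        if (read(go[0], &c, 1) < 0)
            _exit(255);
        _exit(snapshot_state & 0x7f);   /* report our private (snapshot) copy */
    }
    close(go[0]);
    snapshot_state = 2;                 /* first write: the kernel copies the page */
    close(go[1]);                       /* EOF is the child's "go" signal */
    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```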
But you're absolutely right that the unfork functionality itself can be implemented more robustly by doing this ptrace/fork trick.
I've read the quirky FAQ and would now be interested in what this really does. The Readme mentions some demo code that's not in the repository. It also instructs us to run this on cat and enjoy, but... what would we observe and enjoy?
If you assume each process needs, say, 16MB of contiguous space, then you get 48-24 bits left, which by the birthday paradox implies you can have up to 2^(24/2 = 12) ~= 4k processes before you start colliding about half the time.
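Under that simplified model (each process lands independently and uniformly in one of $N = 2^{48-24} = 2^{24}$ slots of 16 MB), the standard birthday approximation gives:

$$
p(n) \approx 1 - \exp\!\left(-\frac{n(n-1)}{2N}\right), \qquad N = 2^{24}
$$

$$
p(2^{12}) \approx 1 - \exp\!\left(-\frac{2^{12}(2^{12}-1)}{2^{25}}\right) \approx 1 - e^{-0.5} \approx 0.39
$$

$$
p(n) = \tfrac{1}{2} \;\Longrightarrow\; n \approx \sqrt{2 \ln 2 \cdot N} = 2^{12}\sqrt{2 \ln 2} \approx 1.18 \cdot 2^{12} \approx 4.8\text{k}
$$

So "~4k" is the right order of magnitude: at 4k processes a collision is somewhat less likely than not, and it becomes a coin flip at roughly 4.8k.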
No, it merges a copy-on-write clone of the debuggee into yourself. That's quite different and, indeed, you can do similar things with it that a debugger could.
If I understand this right, the process being unforked into you won't notice a thing and will happily chime on.
I could imagine it being interesting for VMs and DBs too. Imagine a VM whose memory "looks" like virtual memory but under the hood is transparently persisted between invocations.
Seems like this would be useful for re-attaching to a shell or process you have either disowned or otherwise lost control of. Would be neat to see some common use cases on the FAQ page.
Is it just me, or is this something that would make Linux even more vulnerable to cyber attacks? What protections are there? Would OpenBSD's pledge prevent something like this?