Unfork()

aloknnikhil · on Oct 30, 2019

> Permission to read from or write to another process is governed by a ptrace access mode PTRACE_MODE_ATTACH_REALCREDS check

I presume that is necessary for this in addition to belonging to the same UID?

> As far as I know process_vm_readv isn't even detectable if the agent process is more privileged than the examinee process—so you're free to manipulate your private copy of the application in the comfort of your own address space.

Interesting. This would be really useful in debugging. Many issues don't reproduce except in specific configurations. Having access to the memory dump of the live process "streamed" to the debugger would be great!

geofft · on Oct 30, 2019

"a PTRACE_MODE_ATTACH_REALCREDS check" and "same UID" are roughly synonyms. ("Roughly" because the actual check also permits privileged processes and denies some special cases such as processes that have done a setuid. REALCREDS, if I remember right, is in contrast to the check done on certain files in /proc.)

It's the same check as ptrace itself, so the intuition is "can I strace or gdb this process."

tempay · on Oct 30, 2019

The big exception to this is inside containers where it is often disabled by default.

geofft · on Oct 30, 2019

AIUI Docker containers by default deny the ptrace syscall (and presumably process_vm_readv/writev), they don't change the permission check. So /proc/$pid/mem, which uses the same permission check, ought to work.

(This also means that you don't want or need CAP_SYS_PTRACE to get gdb/strace working in Docker, that lets you ptrace anything and also coincidentally turns off the syscall filter. Just turn the filter off, that works without privileging the processes in the container.)

comex · on Oct 30, 2019

Or if you have an LSM such as Yama or SELinux set to deny ptrace globally.

justincormack · on Oct 30, 2019

Yama lets you ptrace a child process but not others.

jwilk · on Oct 30, 2019

By default, yes.

You can also disable ptrace() completely.

(Grep for "ptrace_scope" in the ptrace(2) man page for details.)
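For anyone who wants to check this on their own machine, here is a minimal Python sketch. It assumes a Linux system where the Yama LSM exposes its setting at /proc/sys/kernel/yama/ptrace_scope; the function names are made up for illustration, and the descriptions paraphrase the ptrace(2) man page.

```python
from pathlib import Path

# Paraphrased from the "ptrace_scope" section of ptrace(2).
PTRACE_SCOPE_MEANINGS = {
    0: "classic: a process may attach to any other process with the same UID",
    1: "restricted: only descendants (or processes that opted in via prctl) can be attached to",
    2: "admin-only: attaching requires CAP_SYS_PTRACE",
    3: "no attach: attaching is disabled entirely",
}


def describe_ptrace_scope(value):
    """Return a human-readable description of a ptrace_scope value."""
    return PTRACE_SCOPE_MEANINGS.get(value, f"unknown value {value}")


def current_ptrace_scope(path="/proc/sys/kernel/yama/ptrace_scope"):
    """Return the current setting, or None if Yama is not available."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


scope = current_ptrace_scope()
if scope is None:
    print("Yama ptrace_scope is not available on this system")
else:
    print(f"ptrace_scope = {scope}: {describe_ptrace_scope(scope)}")
```

Other LSMs such as SELinux can still deny access independently of this value, so treat it as one input rather than the full answer.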

ktpsns · on Oct 30, 2019

> Having access to the memory dump of the live process "streamed" to the debugger would be great!

This is also possible with standard debuggers, such as GDB: It can attach to a running process and not only examine the memory, but also debug (stop, pause, skip, ...) the stack trace and control flow. Usage is as simple as gdb -p $(pidof my_running_program)

andrewaylett · on Oct 30, 2019

You might also find https://rr-project.org/ interesting -- it lets you step backwards too.

Enginerrrd · on Oct 30, 2019

Interesting. Without letting storage explode, I can't think of an easy way to do this since computation isn't really reversible.

saagarjha · on Oct 30, 2019

You can keep track of the before and after states whenever you do something nonreversible, like a syscall.

Doxin · on Oct 31, 2019

And even then you can probably just store the diff instead of a full image. And even then if you run out of memory you can just start evicting the oldest snapshots.
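A toy Python sketch of that idea, assuming "memory" is just a bytearray and every write goes through one method. It stores only the overwritten bytes (a reverse diff) and uses a bounded deque so the oldest records are evicted first. The class name is hypothetical, and this is not a description of how rr or gdb actually implement reverse execution.

```python
from collections import deque


class UndoLoggedMemory:
    """A byte buffer that remembers only the bytes each write overwrote."""

    def __init__(self, size, max_history):
        self.data = bytearray(size)
        # Bounded undo log: appending beyond max_history drops the oldest entry.
        self.undo_log = deque(maxlen=max_history)

    def write(self, address, new_bytes):
        end = address + len(new_bytes)
        if address < 0 or end > len(self.data):
            raise IndexError("write outside of memory")
        # Record the old contents (the diff), not a full image.
        self.undo_log.append((address, bytes(self.data[address:end])))
        self.data[address:end] = new_bytes

    def step_back(self):
        """Undo the most recent write; return False if no history is left."""
        if not self.undo_log:
            return False
        address, old_bytes = self.undo_log.pop()
        self.data[address:address + len(old_bytes)] = old_bytes
        return True


mem = UndoLoggedMemory(size=8, max_history=2)
mem.write(0, b"AA")
mem.write(0, b"BB")
mem.write(0, b"CC")      # the record for the first write is evicted here
mem.step_back()          # restores b"BB"
mem.step_back()          # restores b"AA"
print(bytes(mem.data[:2]), mem.step_back())
```

The trade-off is visible in the last line: once the oldest record is evicted, you can no longer step back past that point.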

Enginerrrd · on Oct 31, 2019

There's a lot of nonreversibility in the way a processor interacts with memory though.

fooker · on Oct 30, 2019

gdb supports reverse debugging too

aloknnikhil · on Oct 30, 2019

Right. But the difference is modifying any memory under GDB will be seen by the process. It's not copy-on-write

coldtea · on Oct 30, 2019

Isn't that a feature for debugging tho?

aloknnikhil · on Oct 30, 2019

Absolutely. I was arguing for "unforking" into, say, a patched version of the process and verifying a fix without actually modifying the live process.

mmoez · on Oct 30, 2019

I know Windows doesn't get too much love here. But we have to admit that Win32 has had this kind of feature for ages: process access routines such as OpenProcess() [1] coupled with ReadProcessMemory() [2] will do the job in a clean way.

Taking a snapshot of other processes is also a basic use case of this family of functions [3].

[1] https://docs.microsoft.com/en-us/windows/win32/api/processth...

[2] https://docs.microsoft.com/en-us/windows/win32/api/memoryapi...

[3] https://docs.microsoft.com/en-us/windows/win32/toolhelp/taki...

asveikau · on Oct 30, 2019

I appreciate win32 much more than a typical HN user. But let's not fall into the trap of seeing it as the only way or that certain features haven't also been on Unix for decades.

> that Win32 has had this kind of feature for ages: process access routines such as OpenProcess() [1] coupled with ReadProcessMemory() [2]

And Unix has had ptrace(2) for a very long time too, which will accomplish the same thing. Also you can read another process's memory through /proc. So there are multiple paths.

Also keep in mind, the most "legit" use case for this stuff is for writing a debugger. If you have used a debugger you are already relying on this functionality being there.

Edit: I am not aware of a win32 equivalent of this thing that lets you easily handle another thread's page faults in user mode though. That seems a little wacky. You can use debugger APIs to handle "STATUS_IN_PAGE_ERROR" and "STATUS_ACCESS_VIOLATION", which might get you there.

amelius · on Oct 30, 2019

> And Unix has had ptrace(2) for a very long time too

A flaw in the Linux implementation, though, prevents one from running ptrace on a process that is using ptrace itself.

As more programs use ptrace, this flaw is becoming quite annoying.

pjc50 · on Oct 30, 2019

The APIs are broadly equivalent, see https://nullprogram.com/blog/2016/09/03/

What unfork does is more complicated than a mere read, though. I'm still not entirely clear on its use case, but it does all sorts of tampering that the code comments describe as "cursed." It also seems to be specifically targeting applications which have anti-debug measures.

tptacek · on Oct 30, 2019

"Debuggers" implemented mostly through ReadProcessMemory / mach_vm_read / process_vm_read/pread are all intended to defeat anti-debug mechanisms, though; I'm not clear how unfork is meant to make the process simpler, but it looks intriguing.

whitequark_ · on Oct 30, 2019

process_vm_readv and its equivalents allow you to read data whose layout you already know. unfork allows you to read data whose layout you don't know, or that is partially generated, by calling the accessors that the application already has for that data instead of reverse-engineering them and doing the transformation yourself.

jstarks · on Oct 30, 2019

unfork appears to be unique in that it creates the illusion of mapping the target process's memory into the source. This is achieved through the use of userfaultfd, which allows a Linux process to mark memory as missing, to receive notifications when other threads attempt to access missing memory, and to provide the contents of that memory in response to such faults. This mechanism is quite powerful and flexible, and Windows does not have a direct equivalent of this.

The closest equivalent I can think of in Windows would be to mark pages as no-access and use vectored exception handling to trap access faults. During a fault, the exception handler would fill in the page (e.g. via ReadProcessMemory) and flip the page protection to read or read/write.

Since you wouldn't want to flip the page protection until after the memory had been updated, you would probably have to use a pagefile-backed section to update the memory at a separate virtual address with independent page protections. And unlike the userfaultfd approach, this mechanism would not help for cases where the mirrored memory was being passed to a syscall.

I think Linux could do this too, via a signal handler, but AFAIK the Linux memory manager does not efficiently support per-page access protection (unlike Windows). In the worst case, each page would get its own vma structure in the kernel, which would be quite expensive. So absent userfaultfd, the Windows memory manager probably has the edge.

pcwalton · on Oct 30, 2019

I mean, /proc/foo/mem has also exposed that feature for forever on Linux.
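A minimal Python sketch of that route, assuming Linux with /proc mounted. To stay self-contained it reads from its own address space; pointing it at another PID is subject to the same ptrace-style permission check discussed above. The function names are made up for illustration.

```python
import ctypes
import os


def read_process_memory(pid, address, length):
    """Read `length` bytes at virtual address `address` from /proc/<pid>/mem."""
    fd = os.open(f"/proc/{pid}/mem", os.O_RDONLY)
    try:
        # The file offset is interpreted as a virtual address in the target.
        return os.pread(fd, length, address)
    finally:
        os.close(fd)


def read_own_buffer():
    """Demo: place known bytes in memory, then read them back through /proc."""
    payload = b"hello from this address space"
    buf = ctypes.create_string_buffer(payload)
    return read_process_memory(os.getpid(), ctypes.addressof(buf), len(payload))


print(read_own_buffer())
```

This is a plain read of memory whose location you already know, which is the simpler half of what unfork does.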

klodolph · on Oct 30, 2019

How clean is it? Can you simply exec the result?

Glibc used to have unexec(), which is fairly old, but it was removed because nobody used it (except Emacs, and there were better solutions to the problem it was solving).

mmoez · on Oct 30, 2019

> How clean is it?

It's as clean as any official Win32 API which uses their privilege system to restrict/allow accesses to each and any bit of information on the process state and/or memory.

> Can you simply exec the result?

This is possible using CreateRemoteThread() [1], which creates a thread inside another process's execution context.

[1] https://docs.microsoft.com/en-us/windows/win32/api/processth...

> Glibc used to have unexec()

My understanding is that unexec() was more about making a snapshot of the whole process state to an executable on disk.

hackworks · on Oct 30, 2019

That is my understanding too. Solaris had a flag for dldump (https://docs.oracle.com/cd/E19455-01/806-0627/6j9vhfmop/inde...). Emacs moved to a portable dumper (maybe inspired by XEmacs).

rjsw · on Oct 30, 2019

Emacs had its own unexec().

euske · on Oct 30, 2019

Totally off topic, but it's funny to think about what kind of feature each would be if we put "un-" on each syscall:

unseek

unselect

unpipe

unsync

etc.

geofft · on Oct 30, 2019

Emacs somewhat famously uses "unexec" in its build process, you build a skeletal Emacs in C (mostly the Emacs Lisp implementation), run it, load and compile and process a bunch of Lisp that implements the editor itself, and dump the resulting process memory back out to disk. The result of this eldritch process is the final emacs binary. When you exec emacs, you get an environment that consists of the editor code ready to go.

lilyball · on Oct 30, 2019

I'm given to understand that the macOS implementation of malloc had to have special-cased code in it just to support emacs due to this approach.

saagarjha · on Oct 30, 2019

For the curious: https://opensource.apple.com/source/libmalloc/libmalloc-166....

tambourine_man · on Oct 30, 2019

jandrese · on Oct 30, 2019

Because linkers are too easy?

It really looks like some overly clever college student's weird trick that somehow managed to survive for decades in an established product.

jws · on Oct 30, 2019

It comes from a time of machines executing instructions thousands of times slower than they do now. Literally – thousands. Memory access was about as fast as an instruction execution, so the amount of compute you can justify per unit of data was hundreds of times less than it is now. They did however have virtual memory systems with on demand page fetching.

Also, that machine was being time shared with a dozen or more users.

Launching emacs or TeX on this machine might take tens of seconds without access to unexec(), but only 3 seconds for the freeze dried version.

unexec() was easier at the time. There were no shared libraries, no address space layout randomization. One memory region grew up from the bottom, one down from the top. There was no mmap() jamming mysterious stuff in the middle. Just copy the bottom, copy the top, do magic to adjust the stack for your unexec() call, and write the thing out as an executable.

(Yeah, I excised unexec() from BibTeX back in the ‘80s to port it to a 68k Mac for a coworker, then later implemented unexec() for a Motorola 88k based multilevel secure SysV system in the early ‘90s because launching emacs was driving me insane. I prefer our shiny new future of stupidly fast computers.)

shoo · on Oct 30, 2019

"Removing support for Emacs unexec from Glibc" -- https://lwn.net/Articles/673724/

saagarjha · on Oct 30, 2019

Interestingly, even if Emacs removes this I see Apple being forced to keep their hack in place as they're not likely to update their version of Emacs anytime soon…

aasasd · on Oct 30, 2019

squints Sooo... Elisp has AOT compilation!

trasz · on Oct 30, 2019

The idea is somewhat similar to Android's zygotes, isn't it?

colonwqbang · on Oct 30, 2019

unselect(2)

Description: put thread to sleep as long as there is activity on any fd, wake up only when all fds are inactive.

Useful for: Scheduling work to be performed only when server is idle.

unsync(2)

Description: Select a random file, load it into the buffer cache, and remove it from file system.

Useful for: Freeing up some disk space in a pinch.

unwait(2)

Description: Resurrects the previous child process.

Useful for: Implementing the !! operator in bash.

unsignalfd(2)

Description: Invoke signal handler whenever a given fd activates.

Useful for: User space interrupts.

unopen(2)

Description: Create a file which refers to an open fd.

Useful for: Implementing /proc/self/fd functionality.

marcosdumay · on Oct 30, 2019

> unselect(2)

> Useful for: Scheduling work to be performed only when server is idle.

Yeah that's nice.

> unsignalfd(2)

> Useful for: User space interrupts.

There is libfam. At least on my system, it doesn't have a manual page.

> unopen(2)

> Description: Create a file which refers to an open fd.

That does sound useful, and I don't know any library that does it.

datenwolf · on Oct 30, 2019

The "unopen" syscall actually exists, albeit under a different name: linkat

https://linux.die.net/man/2/linkat

colonwqbang · on Oct 30, 2019

I don't think it's the same thing. Linkat is still starting with a file that exists in the filesystem.

In my silly world, unopen() would just take any fd (socket, file, pipe, etc.) and create a file system binding which anyone could open. Kind of like how /proc works on Linux today.

microtherion · on Oct 30, 2019

Surely, "unseek" should be called "lhide".

ars · on Oct 30, 2019

ungetc does actually exist.

lifthrasiir · on Oct 30, 2019

Or `unlink`.

ultrarunner · on Oct 30, 2019

ununlink would be pretty useful. Then again, this kind of exists; I have fond memories of a panicky younger me struggling with undelete in DOS.

marcan_42 · on Oct 30, 2019

ununlink is link() on a /proc/<pid>/fd/<fd> entry. Assuming some process still has the file open, that is.

coldtea · on Oct 30, 2019

Does that mean that one could implement "undelete" by changing open() so that some central process (let's call it "recycle-bin") also opens a copy at the same time a process opens them (but keeps it open until you send it a "empty" signal), and then calling that link()?

wichert · on Oct 30, 2019

You could use LD_PRELOAD to replace unlink with a version that passes an fd to a running recycle-bin daemon. That approach doesn't really have much, or any, advantage over just moving files to a recycle-bin folder and keeping track of items in there.
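For comparison, the "just move it to a folder" approach is only a few lines. This is a rough Python sketch: the function names, the trash location, and the timestamp prefix used to avoid name clashes are all assumptions made for illustration, not any existing tool's behaviour.

```python
import shutil
import tempfile
import time
from pathlib import Path


def move_to_trash(path, trash_dir):
    """Move `path` into `trash_dir` under a unique name and return the new path."""
    source = Path(path)
    trash = Path(trash_dir)
    trash.mkdir(parents=True, exist_ok=True)
    # Prefix with a nanosecond timestamp so same-named files do not collide.
    destination = trash / f"{time.time_ns()}-{source.name}"
    shutil.move(str(source), str(destination))
    return destination


def restore_from_trash(trashed_path, original_path):
    """Move a trashed file back to where it came from."""
    shutil.move(str(trashed_path), str(original_path))


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "note.txt"
        original.write_text("keep me")
        trashed = move_to_trash(original, Path(tmp) / "trash")
        gone = not original.exists()
        restore_from_trash(trashed, original)
        return gone, original.read_text()


print(demo())
```

Unlike the fd-holding daemon, this survives a reboot, at the cost of needing something to track where each file originally lived.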

claudius · on Oct 30, 2019

It would "automatically empty" the recycle bin at each shutdown though.

coldtea · on Oct 30, 2019

Doesn't it have the advantage that "rm" and co are then made automatically undo-able?

(Whereas moving to recycle-bin is a manual process you need to remember to do).

saagarjha · on Oct 30, 2019

  alias rm=trash

coldtea · on Oct 30, 2019

Also a manual process -- and per user at that...

jwilk · on Oct 30, 2019

link() doesn't work for this use case.

You need linkat() with the AT_SYMLINK_FOLLOW flag enabled.

jdoliner · on Oct 30, 2019

unopen

reportgunner · on Oct 30, 2019

untouch

alienallys · on Oct 30, 2019

unkill

moocowtruck · on Oct 30, 2019

unfinger

emigre · on Oct 30, 2019

unbowed unbent unbroken

jagrsw · on Oct 30, 2019

And for something completely different - but in the same vein - a stand-alone 'cd' binary - https://github.com/robertswiecki/extcd - enjoy!

0xcde4c3db · on Oct 30, 2019

For those who aren't familiar with the significance of this, "cd" is a shell builtin because the working directory is per-process state. So while it's perfectly valid to write a program that does a chdir(2) and then exits, it's only changing its own working directory, which is pretty useless.
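This is easy to see for yourself. The Python sketch below spawns a child interpreter that calls chdir and prints its new working directory, then checks that the parent's own working directory is unchanged. The helper name is made up for illustration.

```python
import os
import subprocess
import sys


def cwd_after_child_chdir(target="/"):
    """Return (parent cwd before, child cwd after its chdir, parent cwd after)."""
    before = os.getcwd()
    child = subprocess.run(
        [
            sys.executable,
            "-c",
            "import os, sys; os.chdir(sys.argv[1]); print(os.getcwd())",
            target,
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return before, child.stdout.strip(), os.getcwd()


before, child_cwd, after = cwd_after_child_chdir("/")
print(f"child moved to {child_cwd!r}; parent stayed in {after!r}")
```

The child really does change directory, but that state dies with it, which is why a stand-alone cd has to reach into the shell's process to be useful.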

Hello71 · on Oct 30, 2019

isn't this more or less the same as

    gdb -batch -n -ex 'call chdir("whatever")' -p $$

vidarh · on Oct 30, 2019

That does seem to be conceptually pretty much what it's doing. Except your version works on more architectures.

It's a simple example of ptrace() though.

apeace · on Oct 30, 2019

> Nevertheless, I think that with some effort two allocators or even dynamic linkers could survive together.

Famous last words.

koolba · on Oct 30, 2019

> How limited is this approach?

> A: It's true that meshing address spaces is much harder than copying them. ... [truncated] ... 64-bit systems with ASLR are far more forgiving. Nevertheless, I think that with some effort two allocators or even dynamic linkers could survive together.

That is a really cool side effect of ASLR [1]!

[1]: https://en.wikipedia.org/wiki/ASLR

avodonosov · on Oct 30, 2019

Does it pause the process whose memory it is copying?

Freezing the process can affect its correct operation. (Sometimes when I need a memory dump of a production Java app, I can't take it because I cannot afford to freeze a production app.)

Without the freeze, the memory copy we get can be inconsistent.

aargh_aargh · on Oct 30, 2019

I have no idea but the FAQ says it's CoW.

Liskni_si · on Oct 30, 2019

If I understand it correctly, it's more of a Copy-on-Read/Write, and as opposed to fork, it's only one-sided: read/write is only detected and results in a copy on the unfork side; if the original process changes memory, it doesn't result in a copy as nothing is monitoring this (the userfaultfd only monitors the unfork side).
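A toy Python model of that one-sided behaviour, as I read the description above. It is a simulation with plain bytearrays, not userfaultfd; the class name and the 4096-byte page size are assumptions made for illustration.

```python
PAGE_SIZE = 4096  # assumed page size for this model


class LazyMirror:
    """Private mirror of `source` that copies each page on first access."""

    def __init__(self, source):
        self.source = source
        self.pages = {}  # page index -> private bytearray copy

    def _page(self, index):
        # "Fault": the first touch of a page copies it from the source as it is NOW.
        if index not in self.pages:
            start = index * PAGE_SIZE
            self.pages[index] = bytearray(self.source[start:start + PAGE_SIZE])
        return self.pages[index]

    def read(self, address):
        return self._page(address // PAGE_SIZE)[address % PAGE_SIZE]

    def write(self, address, value):
        self._page(address // PAGE_SIZE)[address % PAGE_SIZE] = value


def demo():
    source = bytearray(2 * PAGE_SIZE)
    mirror = LazyMirror(source)
    first_read = mirror.read(0)           # copies page 0
    source[0] = 1                         # the original keeps running...
    source[PAGE_SIZE] = 2                 # ...and nothing monitors its writes
    stale_read = mirror.read(0)           # still the old copy of page 0
    late_read = mirror.read(PAGE_SIZE)    # page 1 is copied after the change
    mirror.write(1, 9)                    # private to the mirror
    return first_read, stale_read, late_read, source[1]


print(demo())
```

Page 0 reflects the source before its writes and page 1 reflects it after, which is exactly the kind of inconsistent copy the question about pausing is worried about.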

avodonosov · on Oct 31, 2019

The ideal approach would be if it turned the original process memory into "copy on write", and created a paused exact copy of that process. This would give a consistent, immutable, snapshot of the target process memory, without freezing for the duration of actual memory copying.

One could then take a core dump, java heap dump, or similar, of the paused copy process.

I'm curious, why does the tool try to copy the original process memory into the memory of the tool itself, risking a collision? Is it impossible to create a third process - an exact copy of the original process?

Liskni_si · on Oct 31, 2019

The FAQ says:

> all while leaving no ptrace and sending no signal

If this is a design goal, I'm afraid it is indeed impossible to take a snapshot of the original process. As far as I know (I researched the status quo 2 years ago when I needed copy-on-write for VM cloning/forking), the only way to make a snapshot of a process' address space is to invoke the clone (fork) system call. If you need to take a snapshot of another process, then you need ptrace.

But you're absolutely right that the unfork functionality itself can be implemented more robustly by doing this ptrace/fork trick.

tom_mellior · on Oct 30, 2019

I've read the quirky FAQ and would now be interested in what this really does. The Readme mentions some demo code that's not in the repository. It also instructs us to run this on cat and enjoy, but... what would we observe and enjoy?

saagarjha · on Oct 30, 2019

I wonder if 64 (well, 48) bits of address space is enough to glom together every process on a normal Linux boot without collisions…

smallnamespace · on Oct 30, 2019

If you assume each process needs, say, 16MB of contiguous space, then you get 48-24 bits left, which by the birthday paradox implies you can have up to 2^(24/2 = 12) ~= 4k processes before you start colliding about half the time.
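To put rough numbers on that estimate, here is a small Python sketch using the standard birthday approximation p ≈ 1 - exp(-n(n-1)/(2N)). It keeps the assumptions above (a 48-bit address space carved into aligned 16 MB slots, one slot per process), which is a simplification of real memory layouts; the function names are made up for illustration.

```python
import math


def collision_probability(n_items, n_slots):
    """Approximate chance that at least two of n_items land in the same slot."""
    return 1.0 - math.exp(-n_items * (n_items - 1) / (2.0 * n_slots))


def items_for_probability(p, n_slots):
    """Approximate number of items at which the collision chance reaches p."""
    return math.sqrt(2.0 * n_slots * math.log(1.0 / (1.0 - p)))


ADDRESS_BITS = 48
REGION_BITS = 24  # 16 MB = 2**24 bytes per process (assumed)
slots = 2 ** (ADDRESS_BITS - REGION_BITS)

print(f"slots: {slots}")
print(f"P(collision) at 4096 processes: {collision_probability(4096, slots):.3f}")
print(f"processes for a 50% chance: {items_for_probability(0.5, slots):.0f}")
```

Under this approximation, 4096 processes gives a collision chance of roughly 39%, and the 50% mark sits nearer 4,800, so the square-root rule of thumb is in the right ballpark.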

saagarjha · on Oct 30, 2019

> If you assume each process needs, say, 16MB of contiguous space

Unfortunately I’m not sure that’s a good assumption, due to the stack and heap needing to exist even for statically-linked binaries.

smallnamespace · on Oct 30, 2019

Yes, but simply replace that with 'average number of allocations * number of processes' and 'average size of allocation'.

Iv · on Oct 30, 2019

Isn't that a bit similar to what debuggers typically do when you ask them to attach to a given process?

saagarjha · on Oct 30, 2019

Debuggers touch other processes from afar. This merges the debuggee into yourself.

skrebbel · on Oct 30, 2019

No, it merges a copy-on-write clone of the debuggee into yourself. That's quite different and, indeed, you can do similar things with it that a debugger could.

If I understand this right, the process being unforked into you won't notice a thing and will happily chime on.

saagarjha · on Oct 30, 2019

> No, it merges a copy-on-write clone of the debugger into yourself.

But you are the debugger…

> If I understand this right, the process being unforked into you won't notice a thing and will happily chime on.

Right, whereas when running an actual debugger you need to deal with signals and making sure you don't touch memory.

skrebbel · on Oct 30, 2019

> But you are the debugger…

Ah thanks. My autocomplete didn't like the word "debuggee". Edited!

ignoramous · on Oct 30, 2019

This should go in the FAQ! Thanks for the concise explainer for folks like me who aren't familiar with the domain.

friend-monoid · on Oct 30, 2019

The inverse of fork is called join, right?

hirundo · on Oct 30, 2019

In the git context it's a merge. But Unfork() seems more like a rebase.

louiz · on Oct 30, 2019

In the git context, fork doesn’t really mean anything. That’s just a github thing.

acoye · on Oct 30, 2019

You could build an entire new array of malware with this :D

whateveracct · on Oct 30, 2019

userfaultfd is an extremely intriguing hammer :)

fsfod · on Oct 30, 2019

The write-protected mode [1], if it ever gets merged, could have some interesting uses for GCs.

[1] https://lore.kernel.org/patchwork/cover/1033856/

sanxiyn · on Oct 30, 2019

I first learned about userfaultfd's utility to GC from https://medium.com/@MartinCracauer/generational-garbage-coll...

whateveracct · on Oct 30, 2019

I could imagine it being interesting for VMs and DBs too. Imagine a VM whose memory "looks" like virtual memory but under the hood is transparently persisted between invocations.

fsfod · on Oct 30, 2019

That's kind of one of the main uses of the API so far by QEMU for live migration of VMs by streaming memory on demand over a network https://wiki.qemu.org/Features/PostCopyLiveMigration

khaki54 · on Oct 31, 2019

Seems like this would be useful for re-attaching to a shell or process you have either disowned or otherwise lost control of. Would be neat to see some common use cases on the FAQ page.

equalunique · on Oct 31, 2019

Is it just me, or is this something that would make Linux even more vulnerable to cyber attacks? What protections are there? Would OpenBSD's pledge prevent something like this?

CriticalCathed · on Oct 30, 2019

This is a hack.

I like it.

Refreshing.

keeganpoppen · on Nov 6, 2019

if only "hackernews" had more hacks like this xd

pacman128 · on Oct 30, 2019

Since it brings two processes together, maybe spoon would be a better name.

gjm11 · on Oct 30, 2019

Or, along the same lines, another four-letter word sharing two of its letters with _fork_. But that might be too distracting.

deckar01 · on Oct 30, 2019

keanebean86 · on Oct 30, 2019

I was thinking Fnnl (pronounced funnel)

But that's probably the name of a startup and would confuse people.

tendencydriven · on Oct 30, 2019

I don't know why spoon would be a better name, but I massively approve of it

greenshackle2 · on Oct 30, 2019

Because merging two address spaces together is akin to the act of spooning.

sitkack · on Oct 30, 2019

Whitequark is a Wizard. They should team up with Sammy.

ignoramous · on Oct 30, 2019

Sorry for being ignorant: Who's Sammy? Surely not: https://news.ycombinator.com/user?id=sammy ?

sitkack · on Oct 31, 2019

This Wizard https://www.samy.pl/

rurban · on Oct 30, 2019

Whitequark is better than Sammy in SW. He's also the maintainer of SolveSpace. https://m-labs.hk/software/solvespace/

saagarjha · on Oct 30, 2019

According to her Twitter, Whitequark prefers feminine pronouns: https://twitter.com/whitequark

felipelemos · on Oct 30, 2019

From the parent comment I thought Whitequark was a group/team of people.

English is also not my native language.

progval · on Oct 30, 2019

"They" can mean either a group of people, or a single person of unknown gender (or neutral gender).

balnaphone · on Oct 30, 2019

FYI, whitequark is a woman.

psaux · on Oct 30, 2019

Sounds like alternatives for Git :) But seriously, need to read more on this.

woodrowbarlow · on Oct 30, 2019

alternative for git? i don't follow. can you elaborate?

isatty · on Oct 30, 2019

I think they just read the title and assumed it's a repository fork? No clue either.

psaux · on Oct 30, 2019

Yes, apologies just having fun. At first glance, I thought it looked like Linux Alternatives.