Mount Namespace
[AD REMOVED]
Basic Information
A mount namespace is a Linux kernel feature that provides isolation of the file system mount points seen by a group of processes. Each mount namespace has its own set of file system mount points, and changes to the mount points in one namespace do not affect other namespaces. This means that processes running in different mount namespaces can have different views of the file system hierarchy.
Mount namespaces are particularly useful in containerization, where each container should have its own file system and configuration, isolated from other containers and the host system.
How it works:
- When a new mount namespace is created, it is initialized with a copy of the mount points from its parent namespace. This means that, at creation, the new namespace shares the same view of the file system as its parent. However, any subsequent changes to the mount points within the namespace will not affect the parent or other namespaces.
- When a process modifies a mount point within its namespace, such as mounting or unmounting a file system, the change is local to that namespace and does not affect other namespaces. This allows each namespace to have its own independent file system hierarchy.
- Processes can move between namespaces using the
setns()
system call, or create new namespaces using theunshare()
orclone()
system calls with theCLONE_NEWNS
flag. When a process moves to a new namespace or creates one, it will start using the mount points associated with that namespace. - File descriptors and inodes are shared across namespaces, meaning that if a process in one namespace has an open file descriptor pointing to a file, it can pass that file descriptor to a process in another namespace, and both processes will access the same file. However, the file's path may not be the same in both namespaces due to differences in mount points.
Lab:
Create different Namespaces
CLI
By mounting a new instance of the /proc
filesystem if you use the param --mount-proc
, you ensure that the new mount namespace has an accurate and isolated view of the process information specific to that namespace.
Error: bash: fork: Cannot allocate memory
When `unshare` is executed without the `-f` option, an error is encountered due to the way Linux handles new PID (Process ID) namespaces. The key details and the solution are outlined below: 1. **Problem Explanation**: - The Linux kernel allows a process to create new namespaces using the `unshare` system call. However, the process that initiates the creation of a new PID namespace (referred to as the "unshare" process) does not enter the new namespace; only its child processes do. - Running `%unshare -p /bin/bash%` starts `/bin/bash` in the same process as `unshare`. Consequently, `/bin/bash` and its child processes are in the original PID namespace. - The first child process of `/bin/bash` in the new namespace becomes PID 1. When this process exits, it triggers the cleanup of the namespace if there are no other processes, as PID 1 has the special role of adopting orphan processes. The Linux kernel will then disable PID allocation in that namespace. 2. **Consequence**: - The exit of PID 1 in a new namespace leads to the cleaning of the `PIDNS_HASH_ADDING` flag. This results in the `alloc_pid` function failing to allocate a new PID when creating a new process, producing the "Cannot allocate memory" error. 3. **Solution**: - The issue can be resolved by using the `-f` option with `unshare`. This option makes `unshare` fork a new process after creating the new PID namespace. - Executing `%unshare -fp /bin/bash%` ensures that the `unshare` command itself becomes PID 1 in the new namespace. `/bin/bash` and its child processes are then safely contained within this new namespace, preventing the premature exit of PID 1 and allowing normal PID allocation. By ensuring that `unshare` runs with the `-f` flag, the new PID namespace is correctly maintained, allowing `/bin/bash` and its sub-processes to operate without encountering the memory allocation error.Docker
Check which namespace is your process in
ls -l /proc/self/ns/mnt
lrwxrwxrwx 1 root root 0 Apr 4 20:30 /proc/self/ns/mnt -> 'mnt:[4026531841]'
Find all Mount namespaces
sudo find /proc -maxdepth 3 -type l -name mnt -exec readlink {} \; 2>/dev/null | sort -u
# Find the processes with an specific namespace
sudo find /proc -maxdepth 3 -type l -name mnt -exec ls -l {} \; 2>/dev/null | grep <ns-number>
Enter inside a Mount namespace
Also, you can only enter in another process namespace if you are root. And you cannot enter in other namespace without a descriptor pointing to it (like /proc/self/ns/mnt
).
Because new mounts are only accessible within the namespace it's possible that a namespace contains sensitive information that can only be accessible from it.
Mount something
# Generate new mount ns
unshare -m /bin/bash
mkdir /tmp/mount_ns_example
mount -t tmpfs tmpfs /tmp/mount_ns_example
mount | grep tmpfs # "tmpfs on /tmp/mount_ns_example"
echo test > /tmp/mount_ns_example/test
ls /tmp/mount_ns_example/test # Exists
# From the host
mount | grep tmpfs # Cannot see "tmpfs on /tmp/mount_ns_example"
ls /tmp/mount_ns_example/test # Doesn't exist
# findmnt # List existing mounts
TARGET SOURCE FSTYPE OPTIONS
/ /dev/mapper/web05--vg-root
# unshare --mount # run a shell in a new mount namespace
# mount --bind /usr/bin/ /mnt/
# ls /mnt/cp
/mnt/cp
# exit # exit the shell, and hence the mount namespace
# ls /mnt/cp
ls: cannot access '/mnt/cp': No such file or directory
## Notice there's different files in /tmp
# ls /tmp
revshell.elf
# ls /mnt/tmp
krb5cc_75401103_X5yEyy
systemd-private-3d87c249e8a84451994ad692609cd4b6-apache2.service-77w9dT
systemd-private-3d87c249e8a84451994ad692609cd4b6-systemd-resolved.service-RnMUhT
systemd-private-3d87c249e8a84451994ad692609cd4b6-systemd-timesyncd.service-FAnDql
vmware-root_662-2689143848
References
- https://stackoverflow.com/questions/44666700/unshare-pid-bin-bash-fork-cannot-allocate-memory
- https://unix.stackexchange.com/questions/464033/understanding-how-mount-namespaces-work-in-linux
[AD REMOVED]