Background
Recently, I found an interesting effect. I have some pods which use flexVolume,
and after pods are Running, I’ll put some readOnly files in this volume
directory. The intersting part is that these files will become writable every
time after kubelet being restarted.
1
2
3
4
5
6
|
-rw-r--r-- 1 root 2000 0 Sep 27 11:03 test1
---------- 1 root 2000 0 Sep 27 11:03 test2
-r----x--- 1 root 2000 0 Sep 27 11:03 test3
---x--x--x 1 root 2000 0 Sep 27 11:03 test4
-r--r----- 1 root 2000 0 Sep 27 11:03 test5
-rw-rw---- 1 root 2000 0 Sep 27 11:05 test6
|
1
2
3
4
5
6
|
-rw-rw-r-- 1 root 2000 0 Sep 27 11:03 test1
-rw-rw---- 1 root 2000 0 Sep 27 11:03 test2
-rw-rwx--- 1 root 2000 0 Sep 27 11:03 test3
-rwxrwx--x 1 root 2000 0 Sep 27 11:03 test4
-rw-rw---- 1 root 2000 0 Sep 27 11:03 test5
-rw-rw---- 1 root 2000 0 Sep 27 11:05 test6
|
1
2
3
4
5
6
7
8
|
File: ‘test1’
Size: 0 Blocks: 0 IO Block: 4096 regular empty file
Device: fd01h/64769d Inode: 404813664 Links: 1
Access: (0664/-rw-rw-r--) Uid: ( 0/ root) Gid: ( 2000/ UNKNOWN)
Access: 2019-09-27 11:03:57.045523223 +0800
Modify: 2019-09-27 11:03:57.045523223 +0800
Change: 2019-09-27 11:07:21.387847127 +0800
Birth: -
|
Trouble Shooting
I failed finding something valuable after tracking kubelet logs. So I decide to
search the root cause in kubelet source code.
Fortunately, I found the code where kubelet does change the files’ permissions.
pkg/volume/flexvolume/mounter.go
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
|
// SetUpAt creates new directory.
func (f *flexVolumeMounter) SetUpAt(dir string, fsGroup *int64) error {
// Mount only once.
alreadyMounted, err := prepareForMount(f.mounter, dir)
if err != nil {
return err
}
if alreadyMounted {
return nil
}
call := f.plugin.NewDriverCall(mountCmd)
// Interface parameters
call.Append(dir)
extraOptions := make(map[string]string)
// pod metadata
extraOptions[optionKeyPodName] = f.podName
extraOptions[optionKeyPodNamespace] = f.podNamespace
extraOptions[optionKeyPodUID] = string(f.podUID)
// service account metadata
extraOptions[optionKeyServiceAccountName] = f.podServiceAccountName
// Extract secret and pass it as options.
if err := addSecretsToOptions(extraOptions, f.spec, f.podNamespace, f.driverName, f.plugin.host); err != nil {
os.Remove(dir)
return err
}
// Implicit parameters
if fsGroup != nil {
extraOptions[optionFSGroup] = strconv.FormatInt(int64(*fsGroup), 10)
}
call.AppendSpec(f.spec, f.plugin.host, extraOptions)
_, err = call.Run()
if isCmdNotSupportedErr(err) {
err = (*mounterDefaults)(f).SetUpAt(dir, fsGroup)
}
if err != nil {
os.Remove(dir)
return err
}
if !f.readOnly {
if f.plugin.capabilities.FSGroup {
volume.SetVolumeOwnership(f, fsGroup)
}
}
return nil
}
|
The key point is on line N. 51, which will SetVolumeOwnership
when volume is not readOnly
and pod seted FSGroup
.
In function SetVolumeOwnership
, we can see that kubelet modifies file
permissions by OR 0440
.
pkg/volume/volume_linux.go
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
|
const (
rwMask = os.FileMode(0660)
roMask = os.FileMode(0440)
)
// SetVolumeOwnership modifies the given volume to be owned by
// fsGroup, and sets SetGid so that newly created files are owned by
// fsGroup. If fsGroup is nil nothing is done.
func SetVolumeOwnership(mounter Mounter, fsGroup *int64) error {
if fsGroup == nil {
return nil
}
return filepath.Walk(mounter.GetPath(), func(path string, info os.FileInfo, err error) error {
if err != nil {
return err
}
// chown and chmod pass through to the underlying file for symlinks.
// Symlinks have a mode of 777 but this really doesn't mean anything.
// The permissions of the underlying file are what matter.
// However, if one reads the mode of a symlink then chmods the symlink
// with that mode, it changes the mode of the underlying file, overridden
// the defaultMode and permissions initialized by the volume plugin, which
// is not what we want; thus, we skip chown/chmod for symlinks.
if info.Mode()&os.ModeSymlink != 0 {
return nil
}
stat, ok := info.Sys().(*syscall.Stat_t)
if !ok {
return nil
}
if stat == nil {
klog.Errorf("Got nil stat_t for path %v while setting ownership of volume", path)
return nil
}
err = os.Chown(path, int(stat.Uid), int(*fsGroup))
if err != nil {
klog.Errorf("Chown failed on %v: %v", path, err)
}
mask := rwMask
if mounter.GetAttributes().ReadOnly {
mask = roMask
}
if info.IsDir() {
mask |= os.ModeSetgid
}
err = os.Chmod(path, info.Mode()|mask)
if err != nil {
klog.Errorf("Chmod failed on %v: %v", path, err)
}
return nil
})
}
|
And in PodSecurityContext
struct, we can find the comments which already point
out this case.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
|
type PodSecurityContext struct {
...
// A special supplemental group that applies to all containers in a pod.
// Some volume types allow the Kubelet to change the ownership of that volume
// to be owned by the pod:
//
// 1. The owning GID will be the FSGroup
// 2. The setgid bit is set (new files created in the volume will be owned by FSGroup)
// 3. The permission bits are OR'd with rw-rw----
//
// If unset, the Kubelet will not modify the ownership and permissions of any volume.
// +optional
FSGroup *int64
}
|
And as it says, Some volume types allow the Kubelet to change the ownership of
that volume, and I found hostPath
won’t change the ownership of volume.
Why it’s designed like this? I found the design proposal(Proposal Volume Plugins and Idempotency),
but still, it didn’t explain what’s the point to OR 0440
to all files.
Ref