
Run shell commands with double asterisks (**) in Python 2

Background

The double-asterisk wildcard (**) is super useful in shell commands for lazy programmers.

Suppose we need to grep for a string in all *.txt files under a directory like this:

.
├── lv0file.txt
└── lv1dir
    ├── lv1file.txt
    └── lv2dir
        └── lv2file.txt

2 directories, 3 files

A very intuitive way to do this with grep and ** is:

>>> grep 'search_me' **/*.txt
lv0file.txt:search_me
lv1dir/lv1file.txt:search_me
lv1dir/lv2dir/lv2file.txt:search_me

Otherwise we need to pass -r and specify --include='*.txt'.

So what’s the problem?

Simply put, ** does not work in shell commands run through Python 2’s subprocess (or os.system()):

In [7]: p = subprocess.Popen("grep -H 'search_me' **/*.txt", shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

In [8]: output = p.stdout.readlines()

In [9]: output
Out[9]: ['lv1dir/lv1file.txt:search_me\n']

## os.system
In [15]: os.system("grep -H 'search_me' **/*.txt")
lv1dir/lv1file.txt:search_me
Out[15]: 0

It seems ** was treated as a single asterisk, so only lv1dir/lv1file.txt was scanned. So far I have not found a good way to make it produce the expected result.

Many people would suggest using glob to generate the file list and then passing it to Popen. Unfortunately, that doesn’t work either:

In [29]: glob.glob('**/*.txt')
Out[29]: ['lv1dir/lv1file.txt']
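
For Python 2, where glob has no recursive mode, one fallback is to walk the tree by hand. The sketch below is a minimal illustration using os.walk and fnmatch from the standard library; the helper name find_files is hypothetical and does not come from any library.

```python
import fnmatch
import os


def find_files(root, pattern):
    """Return all paths under root whose file name matches pattern, sorted.

    find_files is a hypothetical helper written for this example.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, pattern):
            # normpath drops a leading "./" so paths look like shell output
            matches.append(os.path.normpath(os.path.join(dirpath, name)))
    return sorted(matches)
```

Calling find_files('.', '*.txt') from the top of the example tree should list all three .txt files, and the resulting list can then be appended to the grep arguments passed to Popen.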

If you google it, you will find a known issue about this. It was raised as a feature request, and according to the issue status it has already been fixed.

So is the problem solved?

The answer is: partially.

For glob, as the issue says, this has been fixed since Python 3.5. Some quick results:

In [18]: glob.glob('**/*.txt')
Out[18]: ['lv1dir/lv1file.txt']

In [19]: glob.glob('*.txt', recursive=True)
Out[19]: ['lv0file.txt']

In [21]: glob.glob('**/*.txt', recursive=True)
Out[21]: ['lv0file.txt', 'lv1dir/lv1file.txt', 'lv1dir/lv2dir/lv2file.txt']

Notice the last run: glob returns the expected paths only when ** and recursive=True are used together. The Python 3 glob documentation describes it as follows:

If recursive is true, the pattern “**” will match any files and zero or more directories and subdirectories. If the pattern is followed by an os.sep, only directories and subdirectories match.
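
To connect this back to the grep example, the recursive glob result can be handed to grep as an argument list, with no shell involved. This is a minimal sketch assuming Python 3.5+ and a grep binary on PATH; the function name grep_txt_files is made up for illustration.

```python
import glob
import subprocess


def grep_txt_files(needle, pattern="**/*.txt"):
    """Expand pattern in Python, then run grep on the files without a shell.

    grep_txt_files is a hypothetical helper; it assumes grep is on PATH.
    """
    files = sorted(glob.glob(pattern, recursive=True))
    if not files:
        return []
    # "--" ends option parsing, so a needle starting with "-" is still a pattern
    result = subprocess.run(
        ["grep", "-H", "--", needle] + files,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout.splitlines()
```

Because the wildcard is expanded by glob rather than by a shell, the result does not depend on which shell subprocess would otherwise launch.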

But what if we don’t want to use glob? I just want to see the same result from subprocess as I get from my shell.

I have no answer for this. The only way I can come up with is to use the command’s own recursive option (if it supports one):

In [28]: p = subprocess.Popen("grep -rH 'search_me' . --include='*.txt'", shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

In [29]: output = p.stdout.readlines()

In [30]: output
Out[30]:
[b'./lv0file.txt:search_me\n',
 b'./lv1dir/lv1file.txt:search_me\n',
 b'./lv1dir/lv2dir/lv2file.txt:search_me\n']
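
One more thing that may be worth trying, offered as a sketch and not a definitive answer: on POSIX systems, subprocess with shell=True and os.system() run the command through sh, not the interactive shell, and plain sh has no recursive **. Assuming a bash 4.0 or newer is on PATH (an assumption; older versions such as the 3.2 shipped with macOS lack the option), bash can be started explicitly with its globstar option enabled. The helper name run_with_globstar is hypothetical.

```python
import subprocess


def run_with_globstar(command):
    """Run command in bash with globstar enabled; return stdout lines.

    run_with_globstar is a hypothetical helper. It assumes bash >= 4.0 is on
    PATH; "-O globstar" is what makes "**" match directories recursively.
    """
    proc = subprocess.Popen(
        ["bash", "-O", "globstar", "-c", command],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    out, _err = proc.communicate()
    return out.decode().splitlines()
```

A call such as run_with_globstar("grep -H 'search_me' **/*.txt") would then leave the wildcard expansion to bash itself.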

¯\_(ツ)_/¯

Written on June 15, 2017