MediaPipe on Windows: A Tale of Woe

Ritvik Mandyam
Jan 11, 2021

Introduction

One of the big challenges of using AI, and particularly deep learning, in production is the immense amount of processing power that the average deep learning model needs in order to run. For instance, the current state-of-the-art family of models for image classification is called (with no apparent sense of irony) EfficientNet, and its largest variant has 480 million parameters. Processing a single image with a model that size on the average desktop CPU would take several seconds. That latency is fine for a research model, and in practice such a large model would usually run on a GPU, which increases throughput several times over. Challenges begin to arise when you want to use deep learning models in more resource-constrained environments where GPUs are not available. For example, I was recently working on a pipeline that used several neural networks. The pipeline itself was nothing particularly fancy: face detection and recognition, gaze tracking and object detection, plus a few other bells and whistles for pre- and post-processing. The catch was that I had to run this pipeline on Intel Atom machines from around a decade ago. These machines did not have GPUs, and their internet access was too unreliable for me to stream data to the cloud for processing. As if that wasn’t bad enough, the machines ran Windows. Running a deep learning pipeline on a bare-metal Linux machine without a Docker container keeping things clean is painful enough as it is, but getting it to run on Windows is basically rocket surgery. After many experiments with various models, I found that I could achieve one or two frames per second at BEST with my pipeline in TensorFlow. This was nowhere near fast enough, so I said a hopeless prayer to Yoshua Bengio and dove into MediaPipe.

What the heck is a MediaPipe?

MediaPipe is Google’s offering for real-time, cross-platform media processing using machine learning. It includes several incredibly fast models purpose-built for resource-constrained environments, with pre-trained models available for human pose estimation, iris tracking, hair segmentation, object detection and several other tasks. Unfortunately, MediaPipe’s Windows support (at least on bare metal) is experimental. It works extremely well once you get it set up and squeeze a compiled binary out of it, but getting it to compile on Windows without using WSL is… challenging. After a day of cursing out everyone at Google from Sundar Pichai downwards, I finally got it working. In this post, I want to document some of the stumbling blocks I hit.

Setting Up

Note: For this section, I’ll be using “XX” to represent values which may change based on versions/environments.

MediaPipe requires several additional pieces of tooling to be installed on Windows before you can compile it. The first of these is MSYS2, a set of tools that makes it easier to build and run native code on Windows. It is reasonably straightforward to install: download the installer, run it and let it set itself up. One thing to note, though, is that the path to the folder you install MSYS2 into cannot contain any spaces, accents or other non-ASCII characters. It also cannot be a symlink or a network drive. Most of those restrictions are not huge issues, but the no-spaces requirement rules out the usual C:\Program Files as an installation directory. I would recommend just going with the installer’s default choice of C:\msysXX if you have admin rights. Once the installation is done, add C:\msysXX\usr\bin (or the equivalent in whatever install directory you chose) to your %PATH% variable. Then, open Command Prompt or PowerShell and run

pacman -S git patch unzip

This will install the packages you need to compile MediaPipe.
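To confirm that the MSYS2 tools actually landed on your %PATH%, a quick check like the following may help. It is a minimal Python sketch, so it assumes Python is already installed (that happens in the next step; you can come back to this afterwards). `missing_tools` is a hypothetical helper, and it relies only on `shutil.which`, which searches %PATH% (and, on Windows, the PATHEXT extensions).

```python
import shutil
from typing import List, Sequence

# Tools installed by the pacman command above; adjust if your setup differs.
REQUIRED_TOOLS = ("git", "patch", "unzip")


def missing_tools(tools: Sequence[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools that cannot be found on %PATH% (empty list = all found)."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not on PATH:", ", ".join(missing))
        print("Check that C:\\msysXX\\usr\\bin is in your %PATH% variable.")
    else:
        print("git, patch and unzip were all found")
```

If something is reported missing right after installing, opening a fresh console is worth trying, since %PATH% changes only apply to newly started shells.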

Up next, we need to install Python. This, too, is reasonably straightforward, but you do need to make sure you allow the installer to modify your %PATH% variable. My install uses Python 3.8.7; I haven’t tested any other versions yet.
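Because only 3.8.7 has been tested here, a small sanity check can save some confusion when several Pythons are installed. This is a minimal sketch, assuming the interpreter you care about is the one that runs when you type `python`; `version_warning` is a hypothetical helper name.

```python
import sys
from typing import Tuple

# Only Python 3.8.7 was tested for this post; other versions may or may not work.
TESTED = (3, 8)


def version_warning(version: Tuple[int, ...]) -> str:
    """Return a warning if `version` differs from the tested major.minor, else ''."""
    if tuple(version[:2]) == TESTED:
        return ""
    found = ".".join(str(part) for part in version)
    return f"Python {found} found; only 3.8.7 has been tested with this setup"


if __name__ == "__main__":
    # sys.executable is the interpreter running this script, i.e. the one %PATH% resolved.
    print("Running interpreter:", sys.executable)
    print(version_warning(tuple(sys.version_info[:3])) or "Python 3.8.x: OK")
```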

Next, we need the Visual C++ Build Tools. This part is rather finicky, so I would suggest paying particularly close attention here. First, go to this link, then download and run the Build Tools installer. Once it’s done, open it up and let it load. Under the “Workloads” tab, find “C++ build tools” and select it. Under the optional components section, select “MSVC vXXX - VS 2019 C++ x64/x86 build tools”. Then let them install. Do not, repeat, DO NOT install the Windows 10 SDK from here. Seriously, don’t do it; it will almost certainly break things. Instead, go here, download the standalone Windows 10 SDK installer and run it.

Once the Build Tools are installed, grab Bazel version 3.4 or higher from here. There isn’t really anything to install: it’s just an executable that you run directly. To keep things simple, though, I would recommend placing this executable somewhere without spaces or non-ASCII characters in the path as well; personally, I like C:\development. Once you’re done, add the directory containing the Bazel executable to your %PATH% variable. Then, set the following variables for Bazel:

set BAZEL_VS=C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools

set BAZEL_VC=C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC
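Note that `set` only affects the current Command Prompt session; `setx` stores a variable for sessions you open later. If you would rather check the values before building, here is a minimal Python sketch, assuming the default Build Tools location shown above. `bazel_vs_env` and `missing_dirs` are hypothetical helpers; `ntpath` is used so the Windows-style paths are built the same way regardless of where the script runs.

```python
import ntpath
import os
from typing import Dict, List

# Default Build Tools location from the commands above; change it if you installed elsewhere.
BUILD_TOOLS = r"C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools"


def bazel_vs_env(build_tools: str = BUILD_TOOLS) -> Dict[str, str]:
    """Return the BAZEL_VS / BAZEL_VC values matching the two `set` commands."""
    return {
        "BAZEL_VS": build_tools,
        "BAZEL_VC": ntpath.join(build_tools, "VC"),
    }


def missing_dirs(env: Dict[str, str]) -> List[str]:
    """Return the names of variables whose directory does not exist on this machine."""
    return [name for name, path in env.items() if not os.path.isdir(path)]


if __name__ == "__main__":
    env = bazel_vs_env()
    for name, path in env.items():
        print(f"set {name}={path}")
    print("Missing directories:", missing_dirs(env) or "none")
```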

The official documentation suggests that you also manually set variables specifying the WinSDK and Visual C++ versions (BAZEL_WINSDK_FULL_VERSION and BAZEL_VC_FULL_VERSION). I would recommend against doing this, because setting those two variables consistently broke my install. Bazel will automatically figure out which versions of these tools to use from the variables we already set.
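If you already followed the official instructions and set the version variables, you can clear them in Command Prompt with `set BAZEL_VC_FULL_VERSION=` and `set BAZEL_WINSDK_FULL_VERSION=`. Alternatively, here is a minimal Python sketch that strips them from the environment handed to Bazel; `without_version_pins` is a hypothetical helper, and the sketch assumes `bazel` is on your %PATH%.

```python
import os
import subprocess
from typing import Dict, Mapping

# Bazel's Windows version-pinning variables; per the advice above, leave them unset.
PINNING_VARS = ("BAZEL_VC_FULL_VERSION", "BAZEL_WINSDK_FULL_VERSION")


def without_version_pins(env: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of `env` without the version-pinning variables (case-insensitive)."""
    return {k: v for k, v in env.items() if k.upper() not in PINNING_VARS}


if __name__ == "__main__":
    leftover = [name for name in PINNING_VARS if name in os.environ]
    if leftover:
        print("Currently set (consider clearing):", ", ".join(leftover))
    # Run Bazel with the pins removed, leaving every other variable untouched.
    subprocess.run(["bazel", "version"], env=without_version_pins(os.environ), check=True)
```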

Next, grab OpenCV and install version 3.4.10. You can install it to wherever you want, but MediaPipe’s default configuration expects it to be at C:\opencv, so for simplicity’s sake I suggest installing it there.
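To double-check the OpenCV install, this minimal sketch computes where the prebuilt OpenCV 3.4.10 release library would typically live. It assumes the official Windows self-extracting package, whose prebuilt libraries are named `opencv_world` followed by the version digits (for example `opencv_world3410.lib`) and sit under `build\x64\vc15\lib`; the `vc15` toolset folder in particular is an assumption to verify against your own C:\opencv. The helper names are hypothetical.

```python
import ntpath
import os

OPENCV_ROOT = r"C:\opencv"  # MediaPipe's default location, per the text above


def opencv_world_lib(version: str, debug: bool = False) -> str:
    """Prebuilt 'world' library name, e.g. 3.4.10 -> opencv_world3410.lib (debug adds 'd')."""
    digits = "".join(version.split("."))
    suffix = "d" if debug else ""
    return f"opencv_world{digits}{suffix}.lib"


def expected_lib_path(version: str, root: str = OPENCV_ROOT, toolset: str = "vc15") -> str:
    """Assumed location of the release library inside the prebuilt package."""
    return ntpath.join(root, "build", "x64", toolset, "lib", opencv_world_lib(version))


if __name__ == "__main__":
    path = expected_lib_path("3.4.10")
    print(path, "found" if os.path.isfile(path) else "NOT found")
```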

We’re almost done now. Clone the MediaPipe Git repository using

git clone https://github.com/google/mediapipe.git

Once again, I’d suggest following the no-spaces, ASCII-only paths rule. For my installation, I dumped the MediaPipe repository into C:\development, too.
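Since the no-spaces, ASCII-only rule keeps coming up (MSYS2, Bazel, the repository and python.exe), it may be worth checking candidate paths before using them. Here is a minimal Python sketch; `install_path_problems` is a hypothetical helper, and it only inspects the string itself (spaces, non-ASCII characters such as accents, and UNC network shares). Symlinks and mapped network drives need a filesystem check and are not covered.

```python
import ntpath
from typing import List


def install_path_problems(path: str) -> List[str]:
    """Return reasons a Windows path breaks the no-spaces, ASCII-only rule (empty = OK)."""
    problems = []
    if " " in path:
        problems.append("contains spaces")
    if not path.isascii():
        problems.append("contains non-ASCII characters")
    drive, _ = ntpath.splitdrive(path)
    if drive.startswith("\\\\"):  # UNC share such as \\server\share
        problems.append("is a UNC network path")
    return problems


if __name__ == "__main__":
    for candidate in (r"C:\development\mediapipe", r"C:\Program Files\mediapipe"):
        print(candidate, "->", install_path_problems(candidate) or "OK")
```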

Next, a crucially important step: go into your Python installation directory (C:\Program Files\Python38 for me) and copy “python.exe” to a directory that follows the no-spaces, ASCII-only rule, since anything under C:\Program Files contains a space. Once again, I used “C:\development”.
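The copy step can also be scripted. A minimal Python sketch, assuming you run it with the same interpreter you want Bazel to use (so `sys.executable` points at it) and that C:\development is your target; `copy_interpreter` is a hypothetical helper. It copies only the executable, exactly as described above, and refuses a destination that breaks the path rule.

```python
import shutil
import sys
from pathlib import Path


def copy_interpreter(dest_dir: str, src: str = sys.executable) -> Path:
    """Copy the Python executable into dest_dir and return the path of the copy."""
    if " " in dest_dir or not dest_dir.isascii():
        raise ValueError(f"use a no-spaces, ASCII-only directory, not {dest_dir!r}")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # copy2 preserves timestamps and returns the destination path.
    return Path(shutil.copy2(src, dest / Path(src).name))


if __name__ == "__main__":
    print("Use this as PYTHON_BIN_PATH:", copy_interpreter(r"C:\development"))
```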

At long last, we’re done! You should now be able to cd into the MediaPipe repository and build the examples. Some of them have various bugs which need to be patched in order to run on Windows, but the “Hello World” program should let us get our dependencies set up and make sure everything is working correctly. Enter the MediaPipe repository and run

bazel build -c opt --define MEDIAPIPE_DISABLE_GPU=1 --action_env PYTHON_BIN_PATH="<full path to the python.exe you copied>" mediapipe/examples/desktop/hello_world
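Copying that command from a web page is an easy way to end up with typographic dashes and curly quotes instead of `--` and plain `"`, which Bazel will not accept. One way around that, sketched in Python under the assumption that `bazel` is on your %PATH% and you run it from the repository root, is to pass the arguments as a list so no shell quoting is involved. `build_command` is a hypothetical helper, and the python.exe path is just my C:\development example.

```python
import subprocess
from typing import List


def build_command(python_bin: str,
                  target: str = "mediapipe/examples/desktop/hello_world") -> List[str]:
    """Assemble the CPU-only build command from the text as an argument list."""
    return [
        "bazel", "build", "-c", "opt",
        "--define", "MEDIAPIPE_DISABLE_GPU=1",
        # No quotes needed: each list item reaches Bazel as a single argument.
        "--action_env", f"PYTHON_BIN_PATH={python_bin}",
        target,
    ]


if __name__ == "__main__":
    subprocess.run(build_command(r"C:\development\python.exe"), check=True)
```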

With a little luck, everything will compile fine. Then, run

set GLOG_logtostderr=1

to get it to print output to the console. After that, you can test your compiled executable by running

bazel-bin\mediapipe\examples\desktop\hello_world\hello_world.exe

You should see it print the words “Hello World!” to the console a bunch of times.
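If you would rather script this smoke test, here is a minimal Python sketch, assuming it runs from the MediaPipe repository root on Windows after a successful build. It sets GLOG_logtostderr=1 only for the child process (the same effect as the `set` command above, without changing your shell) and counts the lines containing “Hello World!”. `run_hello_world` and `count_hello` are hypothetical helper names, and the sketch does not assume an exact number of greetings.

```python
import os
import subprocess

EXE = r"bazel-bin\mediapipe\examples\desktop\hello_world\hello_world.exe"


def count_hello(output: str) -> int:
    """Count the lines of output that contain 'Hello World!'."""
    return sum("Hello World!" in line for line in output.splitlines())


def run_hello_world(exe: str = EXE) -> int:
    """Run the binary with glog logging to stderr and return the number of greetings."""
    env = dict(os.environ, GLOG_logtostderr="1")
    result = subprocess.run([exe], env=env, capture_output=True, text=True, check=True)
    # With GLOG_logtostderr=1, glog writes to stderr, so inspect both streams.
    return count_hello("\n".join([result.stdout, result.stderr]))


if __name__ == "__main__":
    print("Greetings seen:", run_hello_world())
```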

I really hope this article makes it a little easier for you to get up and running with MediaPipe than it was for me. If it did, maybe leave me a clap so I can inflate my big head just a little bit more.
