7 December 2020
- Cloud Computing
Increasing memory in GCP AI Notebook JupyterLab settings
As a regular Jupyter user, you might encounter out-of-memory errors. In JupyterLab they are not straightforward to notice, which can cause a lot of trouble. This article shows how to spot and deal with this issue.
Jupyter (in the form of JupyterLab, or an AI Notebook in Google Cloud Platform) and Python (with packages such as pandas and scikit-learn) is a duo used by many Data Scientists, Data Engineers, and Analysts. Its biggest advantages are ease of setup, reproducibility, and plug-and-play usage. In some cases, however, this setup can cause trouble, specifically in the form of out-of-memory errors.
When does this problem happen?
By default, JupyterLab (or AI Notebook) in GCP is set with a maximum of about 3.2GB of memory. If the amount of data loaded into memory (i.e. RAM) exceeds this limit, the Jupyter kernel will “die”. By dying, we mean the termination of its process and, as a result, the loss of all data and variables that were calculated and stored in memory. This causes a lot of trouble: it forces a recalculation of the whole notebook, which can take a lot of time, before you can even try another way to perform the calculation. Fortunately for you, we have a solution for that. You may also visit our Cloud Computing Consulting Services to find out how we can help your company.
How to identify that Jupyter is having an out-of-memory error and the kernel is dying
By default, Jupyter does not log all executions. In my experience, setting up proper logging requires some initial work when the instance is set up, and most of the time these logs are never used. In this scenario, we will rely on Jupyter’s behaviour and messages.
When using a traditional Jupyter instance, i.e. Jupyter Notebook (also used via JupyterHub), the problem is easier to identify. We will see a pop-up message with the following contents: “Kernel restarting: The kernel appears to have died. It will restart automatically.” See the screenshot below.
In the case of JupyterLab, it is a bit trickier to identify. We will see that the kernel is busy (displayed as a “full circle” in the upper right of the notebook UI), and after some time, with no response, the kernel will appear to be idle (empty circle). The cell that we executed will have no output number and no output, and when executing further cells we will see that we have lost all the variables we created and calculated before. This indicates that the kernel died.
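If you have terminal access to the notebook VM, you can also look for traces of the kill in the system logs. A minimal sketch, assuming the service is named jupyter as in the unit file shown later; the exact log wording may differ between distributions and versions:

free -h                                                      # current memory usage on the VM
sudo journalctl -u jupyter --since "1 hour ago" | grep -iE "memory|kill"
sudo dmesg | grep -iE "out of memory|oom|killed process"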
If you import large amounts of data (in our case, over 500MB), you can expect this error to occur. Here is a solution for that.
Increasing memory in Jupyter, and therefore solving the problem
In order to increase the memory available to Jupyter, first of all make sure that the machine itself has enough memory. In GCP this is fairly easy to do via the AI Notebooks page, by picking a properly sized machine type. It can also be done from the command line, as sketched below.
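A minimal sketch of resizing the VM with the gcloud CLI, assuming a hypothetical instance name my-notebook and zone europe-west1-b; the instance must be stopped before its machine type can be changed, and n1-standard-8 (30GB of RAM) is just an example:

gcloud compute instances stop my-notebook --zone=europe-west1-b
gcloud compute instances set-machine-type my-notebook --zone=europe-west1-b --machine-type=n1-standard-8
gcloud compute instances start my-notebook --zone=europe-west1-b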
After ensuring that you have a proper amount of memory, open your Jupyter instance and open a Terminal window.
Run the following command:
sudo nano /lib/systemd/system/jupyter.service
This will open a text editor with the Jupyter service settings (in GCP AI Notebooks, Jupyter is installed as a systemd service on the virtual machine). The two lines we are interested in are MemoryHigh and MemoryMax.
[Unit]
Description=Jupyter Notebook

[Service]
Type=simple
PIDFile=/run/jupyter.pid
CPUQuota=97%
MemoryHigh=3533868160
MemoryMax=3583868160
ExecStart=/bin/bash --login -c '/opt/conda/bin/jupyter lab --config=/home/jupyter/.jupyter/jupyter_notebook_config.py'
User=jupyter
Group=jupyter
WorkingDirectory=/home/jupyter
Restart=always

[Install]
WantedBy=multi-user.target
The default memory values in the Jupyter service file are given in bytes and correspond to roughly 3.2GB. To convert the amount of GB you need into bytes, multiply your desired amount by 1073741824 (the number of bytes in one gibibyte). For example, if you want to set it to 16GB, the value to set would be 17179869184.
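If you prefer not to do the multiplication by hand, a quick shell arithmetic sketch (16 is the hypothetical target size in GB):

echo $((16 * 1073741824))    # prints 17179869184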
Substitute the MemoryHigh and MemoryMax values with your calculated value, then exit the editor, making sure you saved the changes.
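As an alternative to editing the file in nano, a hedged one-liner that rewrites both values in place (assumes a 16GB target; consider backing up the file first):

sudo sed -i -E 's/^MemoryHigh=.*/MemoryHigh=17179869184/; s/^MemoryMax=.*/MemoryMax=17179869184/' /lib/systemd/system/jupyter.service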
After that, restart the Jupyter service by submitting the following command: service jupyter restart, and finally restart the machine (go to AI Notebooks in GCP, stop the instance, then start it again).
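A small verification sketch after the edit: reload systemd so it picks up the modified unit file, restart the service, and check that the new limits are in effect (assumes the service name jupyter from the unit file above):

sudo systemctl daemon-reload                          # pick up the edited unit file
sudo systemctl restart jupyter                        # restart the Jupyter service
systemctl show jupyter -p MemoryHigh -p MemoryMax     # should print the new byte values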
Voila, you’re good to go. The amount of memory available to Jupyter should now be larger, and you should be free of further memory problems.
Visit our blog for more in-depth articles on Cloud Computing.