🐳 Docker Study 3rd [ENG]
Optimizing Kaggle Python Docker for Data Analysis
by Arielle
1. Setting up a Python Analysis Environment
Kaggle provides a ‘Kaggle GPU Image’, a Docker image optimized for GPU-based machine learning. This image includes popular deep learning frameworks like TensorFlow and PyTorch along with essential libraries, making it convenient for AI development. By default, Kaggle GPU is based on Tesla T4 GPU hardware, which we will verify using code later.
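The hardware check mentioned above could look something like the minimal sketch below. It assumes the NVIDIA driver tools (`nvidia-smi`) are on the PATH inside the container; the helper name `detect_gpus` is my own for illustration and is not part of the Kaggle image.

```python
import shutil
import subprocess


def detect_gpus() -> list[str]:
    """Return the GPU names reported by nvidia-smi, or [] if none are visible."""
    if shutil.which("nvidia-smi") is None:
        return []  # no NVIDIA driver tools on PATH (e.g. a CPU-only machine)
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
            timeout=30,
        )
    except (subprocess.SubprocessError, OSError):
        return []  # the tool exists but could not talk to a GPU
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    names = detect_gpus()
    print(names if names else "No NVIDIA GPU detected")
```

An empty list simply means no GPU was visible from where the script ran, which is also what you would see if the container was started without GPU access.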
To begin, search for ‘kaggle python docker image’ on GitHub to find the repository. The README lets you choose between the CPU-only and GPU versions. In my experience, the GPU version is the more flexible long-term choice, especially if you use the Dev Container Extension, which generates .json config files for further customization.
Clicking on GPU leads to the following directory, where Kaggle organizes its GPU images by version. As of October 11, 2024, the latest available version is v153, which I pulled.
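Since the images are organized by version tag, the pull command can be assembled from the version alone. A small sketch, assuming the registry path used in the Dockerfile later in this post; the helper names `image_ref` and `pull_command` are hypothetical.

```python
# Registry path as it appears in the Dockerfile used in this post.
REGISTRY_PATH = "gcr.io/kaggle-gpu-images/python"


def image_ref(version: str) -> str:
    """Build a full image reference from a version such as 'v153' or '153'."""
    tag = version if version.startswith("v") else f"v{version}"
    return f"{REGISTRY_PATH}:{tag}"


def pull_command(version: str, use_sudo: bool = True) -> list[str]:
    """Return the docker pull command as an argument list (nothing is executed)."""
    cmd = ["docker", "pull", image_ref(version)]
    return ["sudo", *cmd] if use_sudo else cmd


print(" ".join(pull_command("v153")))
# prints: sudo docker pull gcr.io/kaggle-gpu-images/python:v153
```

Keeping the version in one place makes it easier to move to a newer tag later without editing several commands by hand.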
💻 Related Commands
✅ ssh -i <~/.ssh/id_rsa.pem> <user>@<vm-address>
- Connect to the virtual machine server using a private key file
✅ sudo usermod -aG docker <username>
- Add a specific user to the docker group on this system
- The usermod command modifies user attributes. With -aG, the user is appended to the named supplementary group while keeping their existing groups. (In a team project, this command can be used to let a teammate run Docker commands on the VM without sudo.)
✅ docker pull
- Download a Docker image from Docker Hub or another image registry
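To confirm that the usermod step took effect, group membership can be read back from the system's group database. This is a rough sketch for Linux only (the `grp` and `pwd` modules are Unix-specific), and `in_docker_group` is a name I made up for illustration.

```python
import getpass
import grp
import pwd


def in_docker_group(username: str) -> bool:
    """Return True if the user belongs to the local 'docker' group."""
    try:
        docker_group = grp.getgrnam("docker")
    except KeyError:
        return False  # this machine has no docker group at all
    if username in docker_group.gr_mem:
        return True  # listed as a supplementary member
    try:
        # Also cover the case where docker is the user's primary group.
        return pwd.getpwnam(username).pw_gid == docker_group.gr_gid
    except KeyError:
        return False  # unknown user


if __name__ == "__main__":
    user = getpass.getuser()
    print(f"{user} in docker group: {in_docker_group(user)}")
```

Note that a new group membership only applies to sessions started after the change, so you may need to log out and back in before `docker` works without sudo.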
2. Setting Up a Remote Development Environment
2.1 Connecting to a VM via Remote Method
Once connected to the virtual machine, I also opened a remote connection in a new VS Code window. (See ‘Docker Study 2nd’)
🛜 Connection Steps:
Remote Window → Connect to Tunnel → GitHub → kaggle-linux-gpu-vm → Create ‘kaggle-python-gpu-env’ Folder → Generate ‘Dockerfile’ (Docker IntelliSense should be applied automatically!)
Inside Dockerfile, I set the base image:
FROM gcr.io/kaggle-gpu-images/python:v153
# Install additional dependencies
RUN pip install yfinance
To verify, I ran:
sudo docker images
This confirmed that v153 was already available on my system.
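The same check can be scripted instead of read by eye. The sketch below assumes the image list is requested with `docker images --format "{{.Repository}}:{{.Tag}}"`, which prints one `repository:tag` per line; the sample listing and the helper names are hypothetical.

```python
import subprocess


def list_local_images(use_sudo: bool = True) -> str:
    """Run docker images and return one 'repository:tag' per line."""
    cmd = ["docker", "images", "--format", "{{.Repository}}:{{.Tag}}"]
    if use_sudo:
        cmd = ["sudo", *cmd]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def parse_image_list(listing: str) -> set[str]:
    """Turn the command output into a set of image references."""
    return {line.strip() for line in listing.splitlines() if line.strip()}


def image_available(ref: str, listing: str) -> bool:
    """Check whether an exact 'repository:tag' reference is in the listing."""
    return ref in parse_image_list(listing)


# Hypothetical output, for illustration only.
sample = """\
gcr.io/kaggle-gpu-images/python:v153
python:3.9
"""
print(image_available("gcr.io/kaggle-gpu-images/python:v153", sample))  # True
```

On the VM itself, `image_available(ref, list_local_images())` would run the real command rather than the sample text.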
2.2 Setting Up Dev Container Extension
Make sure the Dev Containers and Remote - Tunnels extensions are installed on both the VM and the local machine.
# Change directory to the project folder
cd kaggle-python-gpu-env
Then, select ‘Add Dev Container Configuration Files…’ under Editing on kaggle-linux-gpu-vm (bottom left remote button).
Choose ‘From Dockerfile’.
Click OK to complete the setup.
2.3 Differences Between Dockerfile and devcontainer.json
When using the Dev Container Extension, a .json file is automatically generated. But how is this different from a Dockerfile?
📂 Dockerfile
FROM python:3.9
WORKDIR /app
COPY . /app
RUN pip install -r requirements.txt
CMD ["python", "app.py"]
Dockerfile is like a recipe that defines the environment needed to run an application. It sets up:
- Base image
- Dependencies (e.g., libraries, packages)
- Execution commands (what runs when the container starts)
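To make the three roles concrete, here is a deliberately simple sketch that sorts the instructions of the example Dockerfile above into those categories. It is not a real Dockerfile parser: it assumes one instruction per line and ignores line continuations, and the function names are my own.

```python
def parse_dockerfile(text: str) -> list[tuple[str, str]]:
    """Split Dockerfile text into (INSTRUCTION, arguments) pairs."""
    instructions = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        keyword, _, args = line.partition(" ")
        instructions.append((keyword.upper(), args.strip()))
    return instructions


def summarize(text: str) -> dict[str, list[str]]:
    """Group instructions by the role they play in the image."""
    steps = parse_dockerfile(text)
    return {
        "base_image": [a for k, a in steps if k == "FROM"],
        "build_steps": [a for k, a in steps if k == "RUN"],
        "start_command": [a for k, a in steps if k in ("CMD", "ENTRYPOINT")],
    }


example = """\
FROM python:3.9
WORKDIR /app
COPY . /app
RUN pip install -r requirements.txt
CMD ["python", "app.py"]
"""
print(summarize(example)["base_image"])  # ['python:3.9']
```

Here `FROM` supplies the base image, `RUN` lines install dependencies at build time, and `CMD` is what runs when the container starts.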
📂 devcontainer.json
{
  "name": "Python Dev Container",
  "build": {
    "dockerfile": "./Dockerfile"
  },
  "customizations": {
    "vscode": {
      "settings": {
        "python.defaultInterpreterPath": "/usr/local/bin/python"
      },
      "extensions": [
        "ms-python.python",
        "ms-toolsai.jupyter"
      ]
    }
  },
  "forwardPorts": [5000],
  "postCreateCommand": "pip install -r requirements.txt"
}
devcontainer.json is used to configure the development environment, not the application runtime. It handles:
- Editor settings (e.g., VS Code extensions)
- Port forwarding
- Post-setup commands
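The three responsibilities listed above map onto a handful of keys, so a config can also be generated from a script. This is only a sketch: it uses the `build` and `customizations.vscode` layout from the current Dev Containers spec (older files put `dockerFile` and `extensions` at the top level instead), and the function name and the container name are assumptions for illustration.

```python
import json


def make_devcontainer(name, dockerfile="./Dockerfile", extensions=(),
                      ports=(), post_create=None) -> dict:
    """Build a devcontainer.json structure as a plain dict."""
    # The dockerfile path is resolved relative to the devcontainer.json file.
    config = {"name": name, "build": {"dockerfile": dockerfile}}
    if extensions:  # editor settings: VS Code extensions to install
        config["customizations"] = {"vscode": {"extensions": list(extensions)}}
    if ports:  # port forwarding from the container to the local machine
        config["forwardPorts"] = list(ports)
    if post_create:  # post-setup command, run once after the container is created
        config["postCreateCommand"] = post_create
    return config


config = make_devcontainer(
    "Kaggle Python GPU",  # hypothetical name
    extensions=["ms-python.python", "ms-toolsai.jupyter"],
    ports=[5000],
    post_create="pip install -r requirements.txt",
)
print(json.dumps(config, indent=2))
```

Nothing here changes the image itself; the Dockerfile still decides what is installed, and this file only describes how the editor attaches to it.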
📌 Summary Table
| Feature | Dockerfile | devcontainer.json |
|---|---|---|
| Purpose | Defines the runtime environment | Configures development settings |
| Usage | Used in production, testing, and development | Mainly for development in VS Code |
| Settings | Base image, dependencies, execution commands | Extensions, ports, additional commands |
Most of the time, devcontainer.json references a Dockerfile. Think of a store: the storefront layout (devcontainer.json) might change, but the core product (the Docker image) remains the same.
To add extensions, go to VS Code Extensions → Copy Extension ID → Paste in devcontainer.json.
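If you add extensions often, the paste step can be done programmatically on the parsed config. A minimal sketch, assuming the extensions live under `customizations.vscode.extensions` (the current Dev Containers layout) and that the file has already been loaded into a dict; `add_extension` is a hypothetical helper.

```python
def add_extension(config: dict, extension_id: str) -> dict:
    """Add a VS Code extension ID to a devcontainer config, skipping duplicates."""
    vscode = config.setdefault("customizations", {}).setdefault("vscode", {})
    extensions = vscode.setdefault("extensions", [])
    if extension_id not in extensions:
        extensions.append(extension_id)
    return config


cfg = {"name": "Python Dev Container"}
add_extension(cfg, "ms-python.python")
add_extension(cfg, "ms-toolsai.jupyter")
add_extension(cfg, "ms-python.python")  # already present, so nothing changes
print(cfg["customizations"]["vscode"]["extensions"])
# prints: ['ms-python.python', 'ms-toolsai.jupyter']
```

One caveat: devcontainer.json files may contain comments, which Python's `json` module does not accept, so loading a hand-edited file may need a comment-tolerant parser.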
🔍 Key Takeaways
- Dockerfile defines the runtime environment.
- devcontainer.json customizes the development experience.
- Dev Containers help streamline Python data analysis with Kaggle GPU Docker.
For further insights, see my upcoming benchmark comparison post! 🚀