Containerizing a Python Script and Scaling with Selenium Grid

#devops #python_script

Step 1: Containerize the Python script

  1. Create a Dockerfile:

    • Create a file named Dockerfile in the directory where your Python script (script.py) is located.

    • The Dockerfile contains instructions for building the Docker image.

  2. Dockerfile Contents: Here's the content of the Dockerfile:

# Use an official Python runtime as a base image
FROM python:3.9-slim

# Set the working directory in the container
WORKDIR /app

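# Install necessary system dependencies and clean the apt cache to keep the image small
RUN apt-get update \
    && apt-get install -yq --no-install-recommends wget gnupg \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies first so this layer stays cached when only the script changes
RUN pip install --no-cache-dir selenium pymongo

# Copy the Python script to the working directory
COPY script.py .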

# Command to run the script
CMD ["python", "script.py"]

Step 2: Set up a Selenium Grid environment

  1. Docker Compose File: Create a docker-compose.yml file in the same directory as your Dockerfile.

  2. Docker Compose Contents: The docker-compose.yml defines services for the Selenium Hub and Chrome Node:

version: "3"
services:
  HubService:
    image: selenium/hub:4.0.0-rc-2-20210930
    container_name: seleniumHub
    ports:
      - "4445:4444"
      - "4442:4442"
      - "4443:4443"

  ChromeService:
    image: selenium/node-chrome:4.0.0-rc-2-20210930
    shm_size: "2gb"
    ports:
      - "5900"
      - "7900"
    environment:
      - SE_EVENT_BUS_HOST=seleniumHub
      - SE_EVENT_BUS_PUBLISH_PORT=4442
      - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
      - SE_NODE_MAX_SESSIONS=3
    depends_on:
      - HubService
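
With the hub and node defined, the script has to open its browser session on the grid rather than start a local Chrome. The original script.py is not shown here, so the following is only a minimal sketch of how that connection might look with Selenium 4. The environment variable SELENIUM_REMOTE_URL and the helper names are hypothetical; the default URL assumes the script runs on the Docker host and reaches the hub through the published port 4445. From a container on the same Compose network, the address would instead be http://seleniumHub:4444.

```python
import os


def grid_url(env=None):
    """Return the Selenium Grid endpoint the script should connect to.

    SELENIUM_REMOTE_URL is a hypothetical variable name. The default matches
    the host port 4445 that docker-compose.yml publishes for the hub.
    """
    env = os.environ if env is None else env
    return env.get("SELENIUM_REMOTE_URL", "http://localhost:4445")


def make_driver():
    """Open a Chrome session on the grid instead of launching a local browser."""
    # Imported lazily so grid_url() can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    return webdriver.Remote(command_executor=grid_url(), options=options)


if __name__ == "__main__":
    driver = make_driver()
    try:
        driver.get("https://example.com")  # placeholder target, not the real site
        print(driver.title)
    finally:
        driver.quit()  # always release the session slot on the node
```

Reading the endpoint from the environment keeps the same script usable locally and on ECS, where the hub address will differ.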

Step 3: Build and test the Docker image locally

  1. Build the Docker image: Open a terminal in the directory containing the Dockerfile and run the following command:

docker build -t scraper .

  2. Test the Docker image locally: After the image is built, run a container with the following command:

docker run scraper

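This command should execute your Python script inside the container and display the scraped quotes and authors in the terminal. If the script connects to the Selenium Grid from Step 2, start the grid first by running docker-compose up -d in the same directory.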
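
If the script depends on the grid, a run can fail simply because the hub or node has not finished starting. The sketch below is one possible way to wait for the grid before scraping. It polls the /status endpoint that Selenium Grid 4 exposes and checks the "ready" flag in the response. The function names, the timeout values, and the localhost:4445 address are assumptions for illustration, not part of the original script.

```python
import json
import time
import urllib.error
import urllib.request


def is_ready(status_payload):
    """Return True if a Grid 4 /status payload reports the grid as ready."""
    return bool(status_payload.get("value", {}).get("ready", False))


def wait_for_grid(base_url, timeout=60.0, interval=2.0):
    """Poll <base_url>/status until the grid is ready or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/status", timeout=5) as resp:
                if is_ready(json.load(resp)):
                    return True
        except (urllib.error.URLError, OSError, ValueError):
            pass  # hub not reachable yet, or the response was not valid JSON
        time.sleep(interval)
    return False


if __name__ == "__main__":
    # Assumed address: the hub port published on the Docker host.
    print(wait_for_grid("http://localhost:4445"))
```

Calling wait_for_grid() at the top of the script turns a startup race into a clear timeout instead of a connection error in the middle of a run.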

Step 4: Set up an Amazon ECS cluster

  1. Amazon ECS Dashboard: Log in to your AWS Management Console, navigate to the ECS service, and click on "Clusters" in the left-hand navigation pane.

  2. Create Cluster: Click the "Create Cluster" button and select the cluster template "EC2 Linux + Networking".

  3. Configure Cluster: Provide a name for your ECS cluster, choose an EC2 instance type, specify the number of instances, and select the appropriate VPC and subnets for the cluster.

  4. Configure Security Group: Create a new security group or use an existing one to control inbound and outbound traffic for the ECS instances.

  5. Review and Create: Review the configuration and click "Create" to create the ECS cluster.

Step 5: Deploy the Docker containerized Python script to AWS ECS

  1. Push the Docker image to ECR: Create a repository in Amazon ECR, authenticate Docker to that registry, tag the local scraper image with the repository URI, and push it.

  2. Create an ECS Task Definition: In the ECS dashboard, create a new task definition. Configure the task definition to use the container image from the ECR repository. Define any necessary environment variables.

  3. Create an ECS Service: Create a new service using the task definition you just created. Configure the desired number of tasks (containers) to be run concurrently, select the cluster you created earlier, and configure any auto-scaling settings if needed.

  4. Start the ECS Service: The ECS service will now start deploying the scraper containers according to the specified task definition and scaling settings. The containers will automatically start scraping data from the website.
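
The push and task-definition steps above can be sketched as follows. This is only an illustration under stated assumptions: the account ID 123456789012, the region us-east-1, the repository name scraper, the 512 MiB memory value, and the MONGODB_URI variable are placeholders, and the ECR repository is assumed to exist already. The helpers only build the shell commands and the task-definition fields; they do not call AWS themselves.

```python
def ecr_registry(account_id, region):
    """Return the ECR registry host for an account and region."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def ecr_image_uri(account_id, region, repository, tag="latest"):
    """Return the full image URI that ECS will pull from ECR."""
    return f"{ecr_registry(account_id, region)}/{repository}:{tag}"


def push_commands(account_id, region, repository="scraper", tag="latest",
                  local_image="scraper:latest"):
    """Return the shell commands that log in to ECR, tag the image, and push it."""
    registry = ecr_registry(account_id, region)
    uri = ecr_image_uri(account_id, region, repository, tag)
    return [
        f"aws ecr get-login-password --region {region} | "
        f"docker login --username AWS --password-stdin {registry}",
        f"docker tag {local_image} {uri}",
        f"docker push {uri}",
    ]


def task_definition(image_uri, family="scraper", env=None):
    """Build fields shaped like the input of ECS RegisterTaskDefinition."""
    environment = [{"name": k, "value": v} for k, v in sorted((env or {}).items())]
    return {
        "family": family,
        "requiresCompatibilities": ["EC2"],
        "containerDefinitions": [
            {
                "name": family,
                "image": image_uri,
                "memory": 512,  # placeholder hard limit in MiB
                "essential": True,
                "environment": environment,
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder account and region; replace with your own values.
    for command in push_commands("123456789012", "us-east-1"):
        print(command)
    uri = ecr_image_uri("123456789012", "us-east-1", "scraper")
    print(task_definition(uri, env={"MONGODB_URI": "<your-connection-string>"}))
```

The printed commands can be run in a terminal as they are, and the dictionary mirrors the fields you fill in when creating the task definition in the console.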

Step 6: Access and monitor the scraped data

  1. Access MongoDB Atlas: Log in to your MongoDB Atlas account and open the database where the scraped data is stored. The data is in the quotes collection of the mcs_assignment database.

  2. Monitoring ECS Service: Monitor the status and health of your ECS Service through the AWS Management Console or use AWS CLI commands to get information about your service.
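
Both checks above can also be scripted. The sketch below assumes boto3 and pymongo are installed and that AWS credentials are configured. The cluster name, service name, and the MONGODB_URI variable are hypothetical placeholders. summarize_service() only reshapes a DescribeServices response, so it can be tried without touching AWS.

```python
import os


def summarize_service(response):
    """Condense an ECS DescribeServices response into a few status fields."""
    service = response["services"][0]
    return {
        "status": service["status"],
        "desired": service["desiredCount"],
        "running": service["runningCount"],
        "pending": service["pendingCount"],
        # True when every requested task is actually running.
        "steady": service["runningCount"] == service["desiredCount"],
    }


def describe(cluster, service):
    """Fetch and summarize the live state of one ECS service."""
    import boto3  # imported lazily; requires AWS credentials

    ecs = boto3.client("ecs")
    return summarize_service(ecs.describe_services(cluster=cluster, services=[service]))


def count_quotes(mongo_uri):
    """Return how many documents the quotes collection currently holds."""
    from pymongo import MongoClient  # imported lazily; requires pymongo

    client = MongoClient(mongo_uri)
    try:
        return client["mcs_assignment"]["quotes"].count_documents({})
    finally:
        client.close()


if __name__ == "__main__":
    # Hypothetical names; replace with your own cluster, service, and URI.
    print(describe("scraper-cluster", "scraper-service"))
    print(count_quotes(os.environ["MONGODB_URI"]))
```

A running count below the desired count together with a document count that has stopped growing is a useful first hint that tasks are failing to start.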

Step 7: Making modifications and troubleshooting

  1. Modifying the Python Script: If you need to make changes to the Python script, update the script.py file on your local machine, rebuild the Docker image, and push the updated image to ECR. Then register a new revision of the ECS task definition that points to the updated image, and update the service to use that revision.

  2. Troubleshooting: If any issues arise, you can check the logs of individual containers using the docker logs command on your local machine. For AWS ECS, you can view the container logs through the AWS Management Console or the AWS CLI, provided the task definition is configured to send logs to CloudWatch Logs.
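
After pushing an updated image, the service still has to be told to roll out new tasks. One way to do that, shown here only as a sketch, is the ECS UpdateService call with forceNewDeployment enabled. The cluster and service names are hypothetical placeholders, and boto3 with configured AWS credentials is assumed.

```python
def redeploy_args(cluster, service, task_definition=None):
    """Build the keyword arguments for an ECS UpdateService call."""
    args = {"cluster": cluster, "service": service, "forceNewDeployment": True}
    if task_definition:
        # Optional: switch the service to a specific task definition revision.
        args["taskDefinition"] = task_definition
    return args


def redeploy(cluster, service, task_definition=None):
    """Ask ECS to replace the running tasks of a service with fresh ones."""
    import boto3  # imported lazily; requires AWS credentials

    ecs = boto3.client("ecs")
    return ecs.update_service(**redeploy_args(cluster, service, task_definition))


if __name__ == "__main__":
    # Hypothetical names; replace with your own cluster and service.
    redeploy("scraper-cluster", "scraper-service")
```

Passing a task definition such as scraper:2 pins the service to that revision. Leaving it out restarts the tasks on the revision the service already uses, which is enough when the image tag itself has not changed.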