Let's automate this Python script with Google Cloud Run.
Step 1: Create a new Google Cloud Project¶
Create a new Google Cloud Project ->
What's a Google Cloud Project?
A Google Cloud Project is a top-level container for your work. In other words, it's just a high layer of abstraction for "a thing that does stuff". To give you a better idea:
- Users are created and managed within a project.
- APIs and tools are enabled with a project.
- I create and manage one project for each of my businesses.
- It's common to have a single billing account for a single project, but occasionally you'll see one billing account cover the expenses for multiple projects.
- If you're creating more than one project per day or fewer than two projects per year, you're probably doing it wrong.
There are no hard-and-fast rules to this stuff. Just get your hands dirty and improve over time.
Still lost? Follow the docs here ->
Step 2: Prep the source code on your local machine¶
Mimic this project structure:
The two files (main.py and Procfile) should have the following contents 👇
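The original listings aren't reproduced in this copy. As a stand-in, here's a minimal sketch of what main.py could look like, assuming all it needs to do is print the "Hello World" line that shows up in the job logs in Step 4. The `main` function name is my own choice, not something Cloud Run requires.

```python
# main.py: hypothetical stand-in for the tutorial's script.
def main() -> None:
    # Anything printed to stdout ends up in the job's logs.
    print("Hello World")


if __name__ == "__main__":
    main()
```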
What's Procfile?
Procfile is a configuration file that tells Google the command to execute when the application starts. In our example Procfile above,
Step 3: Deploy the code to create a cloud run job¶
There are multiple ways to do this, but the easiest is to use gcloud run jobs deploy. This assumes you've installed the gcloud CLI.
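Here's a rough sketch of that deploy call, wrapped in Python so you can print the command before running it. The job name, project, and region are the placeholder values used in this tutorial. Deploying from the current directory with `--source .` is my assumption about the setup, and `build_deploy_command` and `run` are hypothetical helper names.

```python
import shlex
import subprocess
from typing import List


def build_deploy_command(job: str, project: str, region: str, source: str = ".") -> List[str]:
    """Return the argv list for deploying a Cloud Run job from local source."""
    return [
        "gcloud", "run", "jobs", "deploy", job,
        "--source", source,    # assumption: deploy the code in this directory
        "--project", project,
        "--region", region,
    ]


def run(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command, and only execute it when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        # Requires an installed and authenticated gcloud CLI.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(build_deploy_command("my-job", "my-project", "us-central1"))
```

With `dry_run=True` this only prints the command, so you can paste it into your terminal instead if you prefer.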
The deploy command creates a job named my-job under project my-project in the region us-central1. You'll want to tweak these parameters to suit your needs.
Running this command in your local terminal should produce some output like this 👇
Step 4: Execute the newly created job¶
The first time you execute a cloud run job, I recommend doing it from the console.
- Go to https://console.cloud.google.com/run/jobs (Make sure you're in the correct project.)
- Click the job you want to run
- Click the EXECUTE button. You may also click Execute with overrides to see the run-time options you can modify like Number of tasks and Environment variables.
Once comfortable running jobs from the console, try running your job from the command line using gcloud run jobs execute.
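As a sketch, the execute call for the placeholder job from Step 3 could be assembled like this. The job name, project, and region are the tutorial's example values, so swap in your own.

```python
import shlex

# Placeholder values from Step 3; replace them with your own.
execute_cmd = [
    "gcloud", "run", "jobs", "execute", "my-job",
    "--project", "my-project",
    "--region", "us-central1",
]

# Print the command so it can be copied into a terminal.
print(shlex.join(execute_cmd))
```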
Running this command should output something like this 👇
Each execution of the job gets a unique Execution ID. You can see these in the job details page.
Click on an Execution ID to bring up the execution details page.
The execution details include the status of the job as well as the job logs. Notice the "Hello World" line in the logs! 😀
What's a task and why would I need more than one?¶
You may have noticed that Cloud Run has the concept of tasks. A single cloud run job can have one or more (up to 10,000) tasks. Tasks run in parallel, each on its own instance. They're a way of breaking up the work for a big job into smaller pieces. For example,
- If your cloud run job emails a list of customers, you might dedicate one task per email address
- If your cloud run job copies data from one database to another, you might dedicate one task per table
- If your cloud run job fetches data from an API, you might dedicate one task per API call
The important thing to keep in mind is that tasks cannot talk to each other. So, the work a task does should (ideally) be independent of all other tasks.
You can use built-in environment variables to identify a task:
- CLOUD_RUN_TASK_INDEX: the index of this task, starting at 0
- CLOUD_RUN_TASK_COUNT: the total number of tasks for this job execution (set via the --tasks parameter)
- CLOUD_RUN_TASK_ATTEMPT: the number of times this task has been retried after earlier attempts failed, starting at 0 for the first attempt
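To make that concrete, here's a minimal sketch of how a script might use those variables to split a list of work items across tasks. The customer list and the helper names (`task_identity`, `items_for_task`) are made up for illustration. The defaults let the script run locally, where Cloud Run hasn't set the variables.

```python
import os


def task_identity(env=os.environ):
    """Read the task index, task count, and attempt number set by Cloud Run."""
    index = int(env.get("CLOUD_RUN_TASK_INDEX", 0))
    count = int(env.get("CLOUD_RUN_TASK_COUNT", 1))
    attempt = int(env.get("CLOUD_RUN_TASK_ATTEMPT", 0))
    return index, count, attempt


def items_for_task(items, index, count):
    """Give each task every count-th item, starting at its own index."""
    return items[index::count]


if __name__ == "__main__":
    # Hypothetical work list; a real job would load this from somewhere.
    customers = ["a@example.com", "b@example.com", "c@example.com", "d@example.com"]
    index, count, attempt = task_identity()
    for email in items_for_task(customers, index, count):
        print(f"task {index} of {count} (attempt {attempt}): emailing {email}")
```

Because every task computes its own slice from its index, no task needs to talk to another one to know what to work on.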
How do I change the number of tasks?¶
Use the --tasks parameter in the deploy command.
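For instance, a deploy command asking for five tasks per execution might look like the sketch below. The job name, project, and region are this tutorial's placeholders, and `--source .` assumes you're deploying from the project directory.

```python
import shlex

deploy_cmd = [
    "gcloud", "run", "jobs", "deploy", "my-job",
    "--source", ".",      # assumption: deploy the code in this directory
    "--project", "my-project",
    "--region", "us-central1",
    "--tasks", "5",       # run five tasks per execution
]

# Print the command so it can be copied into a terminal.
print(shlex.join(deploy_cmd))
```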
How do I update an existing job?¶
If you make a change to an existing job's source code, you can update the job using gcloud run jobs update.
How do I use a specific version of Python?¶
By default, Google will use the latest stable version of the Python interpreter.1 You can specify a particular Python version by including a .python-version file in your application's root directory.
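If you want to confirm which interpreter your job actually ran on, one simple option is to log the version at startup and look for it in the execution logs. This is just a sketch, and `python_version_line` is a hypothetical helper name.

```python
import platform


def python_version_line() -> str:
    """Describe the interpreter this script is running on."""
    return f"Running on Python {platform.python_version()}"


# Printed output shows up in the job's logs.
print(python_version_line())
```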
How do I specify dependent packages to be installed?¶
Use a requirements.txt file in your application's root directory to specify dependencies for your application. For example,
The specified dependencies will be installed via pip.2
How do I incorporate environment variables?¶
You can use the --set-env-vars parameter in the deploy command. It takes a comma-separated list of KEY=VALUE pairs.
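On the Python side, the script reads those values from the process environment. Here's a minimal sketch, assuming a made-up variable named `GREETING` that you'd pass on deploy as `--set-env-vars GREETING=Howdy`. The fallback value keeps the script working when the variable isn't set.

```python
import os


def greeting(env=os.environ) -> str:
    """Return the GREETING variable, falling back to a default message."""
    # GREETING is a hypothetical variable name used only for this example.
    return env.get("GREETING", "Hello World")


print(greeting())
```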
How do I schedule a job to run on a recurring basis?¶
You can schedule a cloud run job to execute on a recurring basis.
- In the Google Cloud Console, go to the job details page
- Click on the TRIGGERS tab
- Click ADD SCHEDULER TRIGGER
- Fill out the schedule details like the name, region, frequency, and timezone.
Check out crontab.guru for help writing cron expressions.
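To give you a feel for the frequency field, here are a few standard five-field cron expressions with their meanings, plus a tiny sanity check on the field count. The schedules are examples I picked, not recommendations, and `has_five_fields` is a hypothetical helper that checks only the shape of an expression, not whether it's valid.

```python
# Fields, left to right: minute, hour, day of month, month, day of week.
EXAMPLE_SCHEDULES = {
    "0 9 * * *": "every day at 09:00",
    "*/15 * * * *": "every 15 minutes",
    "0 0 * * 1": "every Monday at 00:00",
    "30 6 1 * *": "at 06:30 on the first day of every month",
}


def has_five_fields(expression: str) -> bool:
    """Check that a cron expression has exactly five space-separated fields."""
    return len(expression.split()) == 5


for expression, meaning in EXAMPLE_SCHEDULES.items():
    print(f"{expression:<14} {meaning}")
```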