Test Driven Development in Python using Pytest

2 December 2021

Introduction

Testing is an important step in software development. Before submitting a project, a programmer needs to be sure that the software works correctly. Many errors can surface once the software is in use, such as a wrong input data type, a wrong input file, a wrong API, or incorrect user operation. Tests help to anticipate likely usage scenarios and catch errors before users do. Code with properly written tests is much easier to debug, and fewer errors show up while the program is in use. Tests can also help to identify and solve problems. If you are convinced enough (I’m sure you are!), I encourage you to read on!

What is TDD (Test-driven development)?

TDD (Test-driven development) is a software development technique that expresses software requirements as tests. By writing tests, the programmer writes down the requirements that the final software should meet. A test passes only when the code fulfills all of its conditions. As a result, the code is tested repeatedly during development, and the collected tests document the requirements that the finished software has to satisfy.

The TDD approach can be presented in three steps:

Write tests that capture the requirements for the future software. At this point, all tests should fail.
Develop minimal software and check its correctness by running the tests. This step ends when all tests pass.
Refactor and improve your code. Run the tests again; they should still pass.
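
The three steps can be sketched on a deliberately tiny example. The function name is_adult and the age threshold of 18 are assumptions made only for illustration; they don't come from any particular project.

```python
# Step 1 (red): write the test first. If it runs before is_adult exists,
# it fails with a NameError, which is the expected starting point.
def test_is_adult():
    assert is_adult(18) is True
    assert is_adult(17) is False


# Step 2 (green): the smallest implementation that makes the test pass.
def is_adult(age):
    if age >= 18:
        return True
    return False


# Step 3 (refactor): simplify without changing behaviour, then rerun the test.
def is_adult(age):  # this redefinition replaces the version from step 2
    return age >= 18
```

Under Pytest, test_is_adult would be collected and run automatically. Calling it by hand with test_is_adult() also works and stays silent when every assertion holds.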

What is the Pytest framework?

Pytest is one of the most popular frameworks for testing Python code. Its main advantages are ease of use, a wide choice of plugins, and well-written documentation. Pytest supports unit tests and lets you write simple, scalable test suites. It also offers fixtures and parametrization, and it can skip selected tests during execution. In addition, it’s open source, and there are many articles and tutorials online about testing Python code with Pytest. For all these reasons, many developers choose this framework for their tests.
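
As a rough illustration of those three features, the sketch below uses pytest.fixture, pytest.mark.parametrize, and pytest.mark.skip. The fixture name sample_words and the test names are hypothetical and chosen only for this example.

```python
import pytest


@pytest.fixture
def sample_words():
    # A fixture prepares data (or resources) that tests request by name.
    return ["apple", "banana", "cherry"]


def test_fixture_provides_data(sample_words):
    # Pytest calls the fixture and passes its return value in as the argument.
    assert len(sample_words) == 3


@pytest.mark.parametrize("word, expected", [("apple", 5), ("kiwi", 4), ("", 0)])
def test_word_length(word, expected):
    # Runs once per (word, expected) pair; each pair is reported separately.
    assert len(word) == expected


@pytest.mark.skip(reason="example of a test that is deliberately not run")
def test_not_ready_yet():
    assert False
```

With the verbose flag, the skipped test is listed as skipped together with its reason instead of being reported as a failure.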

Basic Pytest test case

First of all, we need to consider what the structure of our project should look like. There are several approaches, but in my opinion, it is best to separate the tests from the code under test. I propose the following project structure.

└── project
    ├── sources
    │   ├── code.py
    │   └── __init__.py
    └── tests
        ├── test_code.py
        └── __init__.py

In the sources package, we will keep project logic.
In the tests package, we will keep tests that check the correctness of project logic.

Remember! If you want Pytest to find your test file without listing it during execution, the file name has to start with ‘test_’ or end with ‘_test’ (for example test_code.py or code_test.py).

Let’s write a simple function that returns the number of ‘a’ letters (upper and lower case) in a given word. As you can see, the function casts its argument to a string, which protects it against a ‘word’ argument of any other data type.

# project/sources/code.py

def count_a(word=None):
    return str(word).lower().count('a')

Next, let’s write a test that confirms the function returns the correct result. With Pytest’s default settings, the name of a test function has to start with ‘test’; only then will Pytest find the function and execute it. First, we need to import the count_a function. To check its correctness, we use assert statements, which check whether the result of the function is equal to the expected result. We would like to cover the cases where errors could appear. Let’s look at the following example:

# project/tests/test_code.py

from sources.code import count_a

def test_count_a():
    assert count_a('AaA') == 3
    assert count_a(2) == 0
    assert count_a(3.14) == 0
    assert count_a('watermelon') == 1
    assert count_a() == 0
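
One drawback of packing every case into a single test is that the first failing assert hides the ones after it. A possible alternative, sketched here with pytest.mark.parametrize, reports each input separately. The count_a function is repeated so the snippet runs on its own (in the project it would be imported from sources.code), and the test name test_count_a_cases is hypothetical.

```python
import pytest


def count_a(word=None):
    # Same function as above, copied here to keep the example self-contained.
    return str(word).lower().count('a')


@pytest.mark.parametrize(
    "word, expected",
    [
        ('AaA', 3),
        (2, 0),
        (3.14, 0),
        ('watermelon', 1),
        (None, 0),  # str(None) is 'None', which contains no 'a'
    ],
)
def test_count_a_cases(word, expected):
    assert count_a(word) == expected
```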

To run Pytest, simply type ‘pytest’ in your console. We use the verbose flag to see more detailed output:

$ pytest -v

The output shows that our test passed, so the count_a function works properly.

What if we remove the protection against arguments of a type other than string? Let’s remove the cast from the function and run the test again.

# project/sources/code.py
def count_a(word=None):
    return word.lower().count('a')

$ pytest -v

The test failed because of an AttributeError. Pytest marks the assert statement where the failure occurred and the line in the tested function where the exception was raised.
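
The failure is easy to reproduce outside Pytest: integers have no lower method, so the first non-string argument raises. A minimal sketch follows; the helper name count_a_unprotected is made up here to avoid clashing with the real function.

```python
def count_a_unprotected(word=None):
    # No str() cast: any argument without a .lower() method will raise.
    return word.lower().count('a')


print(count_a_unprotected('AaA'))  # strings still work and this prints 3

try:
    count_a_unprotected(2)
except AttributeError as error:
    # This is the exception Pytest reports for the second assert in the test.
    print(f"AttributeError: {error}")
```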

TDD in Python using Pytest

Now you know how to write a simple test with Pytest. It is high time to show how to use Pytest with the TDD technique.

Let’s imagine that we want to implement the UsersDatabase class. Only a name, surname, age, and id number can be stored in the database for each user. Age can’t be negative and must be lower than 120. Name and surname have to be strings, and the id number must be unique. A user can be removed from the database by providing the id number. In addition, the class should be able to return the name of the oldest user in the database.

Write a test

Following the TDD method, we start by writing a test that checks the use cases and user stories. In the example below, we use pytest.raises as a context manager to check that the expected exception is raised.

# project/tests/test_code.py
import pytest
from sources.code import UsersDatabase

def test_user_database():
    # Create user database object
    database = UsersDatabase()

    # Add new users
    database.add_user('John', 'Smith', 29, 'e31sf')
    database.add_user('Emily', 'Taylor', 12, 'd24da')
    database.add_user('Lily', 'Thomas', 66, 'd33fw')
    assert database.get_number_of_users() == 3
    
    # Add two users with the same id
    with pytest.raises(ValueError):
        database.add_user('John', 'Smith', 29, 'e31sf')

    # Remove user
    database.delete_user_by_id('d24da')
    assert database.get_number_of_users() == 2
 
    # What if age is a string
    with pytest.raises(TypeError):
        database.add_user('John', 'Thomas', 'age', 'dc33s')

    # What if age equals 200
    with pytest.raises(ValueError):
        database.add_user('Ava', 'Brown', 200, 'dsd2f')

    # What if age is a negative value
    with pytest.raises(ValueError):
        database.add_user('Ava', 'Brown', -10, 'dsdw3')

    # What if name is an integer
    with pytest.raises(TypeError):
        database.add_user(2323, 'Smith', 23, 'd3ff2')

    # What if surname is a float
    with pytest.raises(TypeError):
        database.add_user('John', 3.14, 19, 'd31xe')

    # What if new user doesn't have id
    with pytest.raises(TypeError):
        database.add_user(name='John', surname='Wilson', age=19)

Write the code

Now it’s time to write the code, keeping in mind the test that has to pass when we finish! The final code could look like the snippet below:

# project/sources/code.py

class UsersDatabase:
    def __init__(self):
        self.database = []

    def add_user(self, name, surname, age, id_number):
        if not isinstance(age, int):
            raise TypeError

        if not isinstance(name, str):
            raise TypeError
 
        if not isinstance(surname, str):
            raise TypeError

        # Age must be non-negative and lower than 120 (0 is allowed).
        if not 0 <= age < 120:
            raise ValueError

        if any(user.id_number == id_number for user in self.database):
            raise ValueError

        self.database.append(User(name, surname, age, id_number))

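    def delete_user_by_id(self, id_number):
        for user in self.database:
            if user.id_number == id_number:
                self.database.remove(user)
                # Ids are unique, so stop here rather than keep iterating
                # over a list that has just been modified.
                return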

    def get_oldest_user_name(self):
        # max() finds the oldest user without sorting the whole list.
        oldest_user = max(self.database, key=lambda user: user.age)
        return oldest_user.name

    def get_number_of_users(self):
        return len(self.database)

class User:
    def __init__(self, name, surname, age, id_number):
        self.name = name
        self.surname = surname
        self.age = age
        self.id_number = id_number

Run a test

Let’s run Pytest and check the test result.

$ pytest -v

The test passed. Our code is ready.
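
The requirements also mention the name of the oldest user, which the test above never exercises, and the third TDD step (refactoring) applies to tests as much as to code. One possible follow-up is sketched below under a few assumptions: a fixture supplies a pre-filled database, and some of the invalid inputs move into a parametrized test. The UsersDatabase here is a trimmed stand-in so the snippet runs on its own (in the project it would be imported from sources.code); the helper, fixture, and test names are hypothetical.

```python
import pytest


class UsersDatabase:
    # Trimmed stand-in for the real class: users are stored as plain tuples.
    def __init__(self):
        self.users = []

    def add_user(self, name, surname, age, id_number):
        if not all(isinstance(value, str) for value in (name, surname)):
            raise TypeError("name and surname must be strings")
        if not isinstance(age, int):
            raise TypeError("age must be an integer")
        if not 0 <= age < 120:
            raise ValueError("age must be between 0 and 119")
        if any(user[3] == id_number for user in self.users):
            raise ValueError("id number must be unique")
        self.users.append((name, surname, age, id_number))

    def get_oldest_user_name(self):
        return max(self.users, key=lambda user: user[2])[0]


def make_database():
    # Plain helper so the same setup can also be reused outside Pytest.
    database = UsersDatabase()
    database.add_user('John', 'Smith', 29, 'e31sf')
    database.add_user('Lily', 'Thomas', 66, 'd33fw')
    return database


@pytest.fixture
def database():
    return make_database()


def test_oldest_user_name(database):
    assert database.get_oldest_user_name() == 'Lily'


@pytest.mark.parametrize(
    "args, error",
    [
        (('John', 'Smith', 29, 'e31sf'), ValueError),  # duplicate id
        (('Ava', 'Brown', 200, 'dsd2f'), ValueError),  # age too high
        (('Ava', 'Brown', -10, 'dsdw3'), ValueError),  # negative age
        ((2323, 'Smith', 23, 'd3ff2'), TypeError),     # name is not a string
    ],
)
def test_add_user_rejects_invalid_input(database, args, error):
    with pytest.raises(error):
        database.add_user(*args)
```

Because every test receives a fresh database from the fixture, a failure in one case no longer affects the others.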

Conclusion

Code testing helps to avoid many errors that are not visible at first glance. Writing tests requires a bit of work and can be time-consuming, but it avoids situations where we have to fix the code over and over again as users find more bugs.

TDD is a technique where tests are written first to cover the expected errors and incorrect usage. The code is then written against the tests, and it can be considered finished only when all tests pass. This technique helps software developers focus on project requirements and on the cases where something could go wrong.

Pytest is one of the most popular frameworks for testing Python code. It has many advantages, which is why it is so widely used. Using the TDD technique with the Pytest framework is an opportunity to improve Python code quality and user satisfaction.

Author

Lukasz Milaszewski