UnitTesting Ingredients: PyTest, Factory Boy, YAML and docstring
“[Separation of Concerns], even if not perfectly possible, is yet the only available technique for effective ordering of one’s thoughts, that I know of.” - Edsger W. Dijkstra (http://deviq.com/separation-of-concerns/)
Background
The project under test is a cinema ticket booking system. Users can issue queries about the schedules of upcoming movie showtimes. The system models include:
-
Cinema: The places you would go to watch movies
-
Theater: The screening rooms inside each cinema
-
Movie: “Nativity”, “Star Wars”, “Passion of the Christ”…
-
Schedule: a.k.a. showtimes
Our main focus here is the schedules.
The ingredients
PyTest
A Python testing framework that features:
-
Fixture dependency injection
-
Isolated
-
Composable
-
Plus unittest compatibility
See this slide deck for the advanced features of PyTest. I would also recommend this site if you are really into testing, especially in Python <3.
In my case, I use pytest's dependency injection to inject the Flask app's test client and the dataset into each test method.
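class TestQuerySchedules:
    # pytest injects the `client` (Flask test client) and
    # `dataset_saigon_weekend` fixtures by matching parameter names.
    def test_query_by_movie_title(self, client, dataset_saigon_weekend):
        response = client.get('/api/query')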
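To make fixture dependency injection a little more concrete, here is a toy sketch of name-based injection in plain Python. It is not how pytest is implemented: fixture, resolve, run_test and check_query are hypothetical names, and the client fixture is a stand-in rather than a real Flask test client. It only illustrates the properties listed above: fixtures are requested by parameter name, they can depend on other fixtures (composable), and each test run gets a fresh set of them (isolated).

```python
import inspect

# Hypothetical registry: fixture name -> function that builds it.
FIXTURES = {}


def fixture(func):
    """Register func as a fixture under its own name (toy version)."""
    FIXTURES[func.__name__] = func
    return func


def resolve(name, cache):
    """Build fixture `name`, first building every fixture it asks for."""
    if name not in cache:
        func = FIXTURES[name]
        deps = inspect.signature(func).parameters
        cache[name] = func(**{dep: resolve(dep, cache) for dep in deps})
    return cache[name]


def run_test(test_func):
    """Call test_func with fixtures injected by parameter name."""
    cache = {}  # fresh cache per call, like function-scoped fixtures
    params = inspect.signature(test_func).parameters
    return test_func(**{name: resolve(name, cache) for name in params})


@fixture
def db():
    return {"movies": []}


@fixture
def dataset_saigon_weekend(db):
    db["movies"].append("Star Wars")
    return db


@fixture
def client(db):
    # Stand-in for a test client: it just reads from the fake db.
    return lambda: list(db["movies"])


def check_query(client, dataset_saigon_weekend):
    return client()


print(run_test(check_query))  # ['Star Wars']
```

Both client and dataset_saigon_weekend receive the same db object within one run, and calling run_test again starts from an empty db, which is the behavior you want from per-test fixtures.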
YAML
If you have heard of JSON, then you should take a look at YAML. It is much friendlier than JSON, yet by no means less expressive. Hence, it is much easier to maintain, especially when you have thousands of LOC.
The following YAML fragment describes a list of movies, each of which has a code, a title and a status:
movies:
  - code: 10.5240/0067-DEFB-A9F6-DD23-70DA-1
    title: The Fox
    status: 2
  - code: 10.5240/0067-DEFB-A9F6-DD23-70DA-3
    title: Star Wars
    status: 2
No curly braces, no double quotes whatsoever! It also looks very Pythonic <3. FYI, Google App Engine uses .yaml files for application configuration.
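As a quick check of how YAML maps onto Python, the sketch below parses the movies fragment with PyYAML's yaml.safe_load (it assumes PyYAML is installed). The result is made of ordinary dicts and lists: status comes back as an int, while the movie code stays a string.

```python
import yaml  # PyYAML: pip install PyYAML

MOVIES_YAML = """
movies:
  - code: 10.5240/0067-DEFB-A9F6-DD23-70DA-1
    title: The Fox
    status: 2
  - code: 10.5240/0067-DEFB-A9F6-DD23-70DA-3
    title: Star Wars
    status: 2
"""

# safe_load builds only plain Python objects (dict, list, str, int, ...),
# which is all a test fixture needs.
data = yaml.safe_load(MOVIES_YAML)

for movie in data["movies"]:
    print(movie["title"], type(movie["status"]).__name__)
# The Fox int
# Star Wars int
```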
Factory Boy
Initially, I used Factory Boy to replace the need for file-based fixtures. I really enjoy the concept of building test fixtures with factories:
-
Custom sequences to generate unique yet meaningful values
-
Faker to generate human-friendly field values
-
Built-in integration with SQLAlchemy, Django, MongoEngine…
-
Fixture dependency support with SubFactory
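from factory import SubFactory
from factory.alchemy import SQLAlchemyModelFactory

# Schedule, Movie, db, TheaterFactory and MovieFactory come from the
# project's own models and factories.


class ScheduleFactory(SQLAlchemyModelFactory):
    class Meta:
        model = Schedule
        sqlalchemy_session = db.session

    # Related objects are built on demand by their own factories;
    # the movie is forced to be published.
    theater = SubFactory(TheaterFactory)
    movie = SubFactory(MovieFactory, status=Movie.STATUS_PUBLISHED)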
When testing features, we do not really care about a field’s exact value; we care more about whether the values are logically consistent. For example, given a fixture with the full name “Elton John”, we would expect:
-
This is a person
-
This person’s email is “elton.john@gmail.com”
-
This person’s job is “singer”
-
He works at a company named “Rocket Music Entertainment Group”
Factory Boy fills in meaningful default values for fields unless you override them with your own.
You can find out more about Factory Boy and its inner workings here.
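The ideas above can be sketched without Factory Boy at all. The toy factories below are hypothetical (PersonFactory, CompanyFactory and their build methods are not Factory Boy's API) and only illustrate the concepts: a counter plays the role of a sequence, the email is derived from the name so the values stay logically consistent, the company is built by another factory only when the caller does not supply one (as SubFactory does), and explicit keyword arguments always override the defaults.

```python
import itertools


class CompanyFactory:
    """Toy factory for companies; each call gets a unique default name."""
    _seq = itertools.count(1)

    @classmethod
    def build(cls, **overrides):
        data = {"name": f"Company {next(cls._seq)}"}
        data.update(overrides)  # caller's values always win
        return data


class PersonFactory:
    """Toy factory: a sequence, consistent defaults, a sub-factory."""
    _seq = itertools.count(1)

    @classmethod
    def build(cls, **overrides):
        n = next(cls._seq)
        full_name = overrides.get("full_name", f"Person {n}")
        # Derive the email from the name so the fixture stays logical.
        local_part = ".".join(full_name.lower().split())
        data = {
            "full_name": full_name,
            "email": f"{local_part}@example.com",
            "job": "singer",
        }
        # Like SubFactory: build the related object only when not supplied.
        if "company" not in overrides:
            data["company"] = CompanyFactory.build()
        data.update(overrides)
        return data


elton = PersonFactory.build(
    full_name="Elton John",
    company=CompanyFactory.build(name="Rocket Music Entertainment Group"),
)
print(elton["email"])                      # elton.john@example.com
print(PersonFactory.build()["full_name"])  # Person 2
```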
Docstring
In Python, docstrings are string literals placed right beneath a class, method or function definition, conventionally enclosed in triple quotation marks. The purpose of a docstring is to describe the class, method or function it belongs to.
def demo():
    """
    Demo is short for demonstration
    """
    pass
It is very nice of Python <3 that it lets you access this piece of information out of the box, through the function's __doc__ attribute.
And yes, you can feed this block of text to PyYAML to complete the big picture of UnitTesting Ingredients: PyTest, Factory Boy, YAML and docstring.
See this thread on Stack Overflow.
See more on docstrings in PEP 8 and PEP 257.
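For example, the demo function exposes its docstring through __doc__, and the standard library's inspect.getdoc returns a cleaned-up copy with the common indentation and the surrounding blank lines stripped, which is the handier form for parsing.

```python
import inspect


def demo():
    """
    Demo is short for demonstration
    """
    pass


# __doc__ is the raw string literal; its exact whitespace varies by
# Python version (3.13+ already strips the common indentation).
print(repr(demo.__doc__))
# inspect.getdoc() dedents and trims blank leading/trailing lines.
print(repr(inspect.getdoc(demo)))  # 'Demo is short for demonstration'
```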
The mix
You need to pip install PyYAML as a dependency of your project.
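A minimal sketch of the core trick, assuming PyYAML is installed: keep YAML in a function's docstring, dedent it with inspect.getdoc, and hand it to yaml.safe_load. The names dataset_tiny and load_docstring_yaml are hypothetical.

```python
import inspect

import yaml  # pip install PyYAML


def dataset_tiny():
    """
    movies:
      - code: 10.5240/0067-DEFB-A9F6-DD23-70DA-3
        title: Star Wars
        status: 2
    """


def load_docstring_yaml(func):
    """Parse the YAML stored in func's docstring into Python objects."""
    doc = inspect.getdoc(func)  # dedented text, or None if there is none
    if doc is None:
        raise ValueError(f"{func.__name__} has no docstring (running with -OO?)")
    return yaml.safe_load(doc)


dataset = load_docstring_yaml(dataset_tiny)
print(dataset["movies"][0]["title"])  # Star Wars
```

One caveat: running Python with -OO strips docstrings, so data kept this way is only available when the interpreter runs without that flag.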
Now, in order to test our showtime query features, we really need a lot of data. Unlike the other CRUD operations, ad-hoc queries need a manageable, well-controlled dataset to verify whether a given combination of filtering conditions returns the correct subset of data, while still respecting the data integrity constraints enforced by the DBMS.
In other words, you have to fake the data consistently, and fake a lot of it. I set a few acceptance criteria for our testing setup:
-
Manageable
-
Well controlled
-
Large dataset
Imagine all of that being achieved with the following chunk of text. You can skip it, though. Just know that:
-
Three movies are created
-
Three cinemas are created, each with three theaters
-
Six schedules are created, three of which are approved
-
All are managed under a well-known name, dataset_saigon_weekend
-
All are visible in one Python file, test_schedule_api.py, in a readable format, totalling ~133 lines:
-
~100 lines of data in YAML format
-
~34 LOC of Python. Let’s call it the overhead
import inspect
from datetime import datetime

import pytest
import yaml

# The " UTC" suffix keeps these values from matching YAML's timestamp
# format, so PyYAML returns them as plain strings to be parsed below.
DATETIME_FORMAT = '%Y-%m-%d %H:%M:%S %Z'


@pytest.fixture(scope='function')
def dataset_saigon_weekend(request, db):
    """
    movies:
      - code: 10.5240/0067-DEFB-A9F6-DD23-70DA-1
        title: The Red Fox
        status: 2
      - code: 10.5240/0067-DEFB-A9F6-DD23-70DA-2
        title: Kramus
        status: 2
      - code: 10.5240/0067-DEFB-A9F6-DD23-70DA-3
        title: Star Wars
        status: 2
    cinemas:
      - name: Galaxy Tan Binh
        group: Galaxy
        prefix: GALAXYTB
        district: Tân Bình
        city: Ho Chi Minh
        country: Vietnam
        status: 2
        theaters:
          - code: GALAXYTB-T000001
            name: Theater One
          - code: GALAXYTB-T000002
            name: Theater Two
          - code: GALAXYTB-T000003
            name: Theater Three
      - name: Galaxy Nguyen Trai
        group: Galaxy
        prefix: GALAXYNT
        district: Quan 1
        city: Ho Chi Minh
        country: Vietnam
        status: 2
        theaters:
          - code: GALAXYNT-T000001
            name: Theater One
          - code: GALAXYNT-T000002
            name: Theater Two
          - code: GALAXYNT-T000003
            name: Theater Three
      - name: Lotte Cong Hoa
        group: Lotte
        prefix: LOTTCONG
        district: Tân Bình
        city: Ho Chi Minh
        country: Vietnam
        status: 2
        theaters:
          - code: LOTTCONG-T000001
            name: Theater One
          - code: LOTTCONG-T000002
            name: Theater Two
          - code: LOTTCONG-T000003
            name: Theater Three
    schedules:
      - movie_code: 10.5240/0067-DEFB-A9F6-DD23-70DA-1
        theater_code: GALAXYTB-T000001
        start_at: 2016-01-01 09:00:00 UTC
        end_at: 2016-01-01 10:30:00 UTC
        status: 2
      - movie_code: 10.5240/0067-DEFB-A9F6-DD23-70DA-2
        theater_code: GALAXYTB-T000002
        start_at: 2016-01-01 09:00:00 UTC
        end_at: 2016-01-01 10:30:00 UTC
        status: 2
      - movie_code: 10.5240/0067-DEFB-A9F6-DD23-70DA-3
        theater_code: GALAXYTB-T000003
        start_at: 2016-01-01 09:00:00 UTC
        end_at: 2016-01-01 10:30:00 UTC
        status: 2
      # The same movies are not published at Galaxy Nguyen Trai (GalaxyNT)
      - movie_code: 10.5240/0067-DEFB-A9F6-DD23-70DA-1
        theater_code: GALAXYNT-T000001
        start_at: 2016-01-01 09:00:00 UTC
        end_at: 2016-01-01 10:30:00 UTC
        status: 1
      - movie_code: 10.5240/0067-DEFB-A9F6-DD23-70DA-2
        theater_code: GALAXYNT-T000002
        start_at: 2016-01-01 09:00:00 UTC
        end_at: 2016-01-01 10:30:00 UTC
        status: 1
      - movie_code: 10.5240/0067-DEFB-A9F6-DD23-70DA-3
        theater_code: GALAXYNT-T000003
        start_at: 2016-01-01 09:00:00 UTC
        end_at: 2016-01-01 10:30:00 UTC
        status: 1
    """
    # Project-specific factory classes.
    from tests.fixtures.simplefactories import (CinemaFactory,
                                                TheaterFactory,
                                                MovieFactory,
                                                ScheduleFactory)

    # The dataset is this fixture's own docstring, dedented and parsed.
    # Assumes the decorated name still exposes the function's docstring.
    dataset = yaml.safe_load(inspect.getdoc(dataset_saigon_weekend))

    # Theaters are nested under their cinema in the YAML; link them back.
    for cinema in dataset['cinemas']:
        inserted = CinemaFactory(**{key: value for key, value in cinema.items()
                                    if key != 'theaters'})
        for theater in cinema['theaters']:
            theater['cinema'] = inserted
            TheaterFactory(**theater)
    for movie in dataset['movies']:
        MovieFactory(**movie)
    # Schedule times arrive as strings and are parsed into datetimes.
    for schedule in dataset['schedules']:
        schedule['start_at'] = datetime.strptime(schedule['start_at'],
                                                 DATETIME_FORMAT)
        schedule['end_at'] = datetime.strptime(schedule['end_at'],
                                               DATETIME_FORMAT)
        ScheduleFactory(**schedule)
Imagine how you would achieve the same goals otherwise. Keep in mind that with this setup, we do not need to add anything to the ~34 LOC as we grow the dataset with more, and more varied, data.
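To see why the loading code does not grow with the data, here is a standalone sketch of the same loop, run against an in-memory dict (the shape yaml.safe_load would return) and hypothetical stand-in factories that merely record what they would insert. DATETIME_FORMAT is an assumption about the format string, chosen to match values such as 2016-01-01 09:00:00 UTC.

```python
from datetime import datetime

# Assumed format; %Z accepts "UTC" when parsing and yields a naive datetime.
DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S %Z"

CREATED = []  # hypothetical stand-in for the database


def make_factory(kind):
    """Return a stand-in factory that records the row it would insert."""
    def factory(**fields):
        row = {"kind": kind, **fields}
        CREATED.append(row)
        return row
    return factory


# Stand-ins only; the real project uses its own Factory Boy classes.
CinemaFactory, TheaterFactory, MovieFactory, ScheduleFactory = (
    make_factory(k) for k in ("cinema", "theater", "movie", "schedule")
)


def load_dataset(dataset):
    """Same loop as the fixture: flatten nested theaters, parse times."""
    for cinema in dataset["cinemas"]:
        fields = {k: v for k, v in cinema.items() if k != "theaters"}
        inserted = CinemaFactory(**fields)
        for theater in cinema["theaters"]:
            TheaterFactory(cinema=inserted, **theater)
    for movie in dataset["movies"]:
        MovieFactory(**movie)
    for schedule in dataset["schedules"]:
        schedule = dict(schedule)  # copy so the input is left untouched
        for key in ("start_at", "end_at"):
            schedule[key] = datetime.strptime(schedule[key], DATETIME_FORMAT)
        ScheduleFactory(**schedule)


dataset = {
    "movies": [{"code": "M1", "title": "Star Wars", "status": 2}],
    "cinemas": [
        {"name": "Galaxy Tan Binh", "status": 2,
         "theaters": [{"code": "GALAXYTB-T000001"}, {"code": "GALAXYTB-T000002"}]},
    ],
    "schedules": [
        {"movie_code": "M1", "theater_code": "GALAXYTB-T000001",
         "start_at": "2016-01-01 09:00:00 UTC",
         "end_at": "2016-01-01 10:30:00 UTC", "status": 2},
    ],
}
load_dataset(dataset)
print([row["kind"] for row in CREATED])
# ['cinema', 'theater', 'theater', 'movie', 'schedule']
```

Adding more cinemas, theaters or schedules to the dict (or to the YAML) only adds rows; load_dataset itself stays the same.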
Conclusion
Since I started with “Separation of Concerns”, let me recap in the same spirit. I have observed a few concerns while doing unit testing:
-
Manageability concern
-
Controllability concern
-
Scalability concern
-
Readability concern
One should treat these as mutually orthogonal vectors. Unit testing is a must; the only question is how to keep our own sanity while maintaining the test cases. Keep the concerns separate, so that a change along one vector does not mess with the others.
My special thanks to:
-
Holger Krekel (@hpk42) and pytest-dev team
-
Raphaël Barrois, Mark Sandstrom for Factory Boy
-
Kirill Simonov for PyYAML (249kB of awesomeness)
-
Guido van Rossum for the snake, I mean Python <3
This blog article is part of an upcoming series: Building thebox: A cinema ticket booking system.