Software Development Fundamentals

10 Apr 2026
34 minute read

DISCLAIMER: This article was initially written as a set of slides for workshop and presentation purposes for educational use. It has changed over time and will likely continue to evolve.

Basics of software development with Python.

Jan Eefting

Programming Theory

What is Programming?

  • A set of instructions that tell a computer what to do

  • Written in a formal language the machine can interpret or compile

  • Ranges from low-level (machine code, assembly) to high-level (Python, C#)

  • At its core: taking a problem and breaking it down into steps a machine can execute to solve it

Programming is, at its most basic, the act of giving a computer a precise set of instructions. The computer then does exactly what you tell it - nothing more, nothing less. This makes programming both powerful and humbling.

A program is nothing more than a sequence of operations: read this, compute that, store this value, repeat until done. The challenge is expressing the solution to a real-world problem in a language rigid enough for a machine to execute unambiguously.

What makes it interesting - and frankly difficult - is that the problems we want computers to solve are rarely simple. And the gap between “what I want it to do” and “what I actually told it to do” is where most bugs are born. It's even worse when “what I think I told it to do” is not “what I actually told it to do”.

How does it work?

  1. You write code in a high-level language (Python, Java, etc.)
  2. A compiler or interpreter translates it into machine code (binary instructions the CPU understands)
  3. The CPU executes those instructions, which may involve:
    • Performing calculations
    • Reading/writing memory
    • Interacting with peripherals (disk, network, etc.)
  4. The operating system manages resources and provides services to the program

When you write a Python script, you’re writing in a high-level language that abstracts away the details of how the computer actually executes your code. The Python interpreter takes care of translating your code into machine code that the CPU can execute.

The standard Python interpreter (CPython) is itself written in C, a lower-level language that compiles down to machine code. So when you run a Python program, you're actually running a compiled C program (the interpreter) that executes your Python code.

When you run a program, the operating system loads it into memory, allocates resources, and starts executing it. The CPU executes the machine code instructions one by one, performing calculations, reading/writing memory, and interacting with peripherals as needed.

The operating system (Windows, Linux, macOS) sits between your program and the hardware. It manages resources like memory and CPU time, handles input/output operations, and provides a layer of security and stability.

We don’t need to worry about those details most of the time, but it’s helpful to understand that there’s a lot going on under the hood when you run a program.

What does it look like?

x = 42
y = x * 2
print(f"The answer is {y}")
Which translates to assembly along the lines of:

MOV R1, 42       ; Load the value 42 into register R1
MUL R2, R1, 2    ; Multiply R1 by 2 and store the result in R2
CALL print       ; Call the print function with R2 as an argument

Which the CPU executes as:

10110000 00101010  ; MOV R1, 42
11100000 00000010  ; MUL R2, R1, 2

A simple script like the one in the slide might look like three lines of Python, but it translates into dozens of machine instructions that the CPU executes.

The exact machine code depends on the CPU architecture (x86, ARM, etc.) and the operating system, but the general idea is that your high-level code gets translated down to instructions that the hardware can execute.

Understanding this translation process helps you appreciate the power of programming languages and the abstractions they provide. It also gives you insight into performance considerations - for instance, why certain operations are faster than others, and how to write code that runs efficiently.
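
You can peek at one layer of this translation yourself. CPython first compiles your source to bytecode - an intermediate instruction set for its virtual machine, not raw machine code - and the standard-library dis module shows it:

```python
import dis

# Compile the three-line example and show the bytecode
# CPython's interpreter actually executes.
code = compile('x = 42\ny = x * 2\nprint(f"The answer is {y}")', "<example>", "exec")
dis.dis(code)
# Output includes instructions like LOAD_CONST and STORE_NAME
# (exact opcode names vary between Python versions).
```

Three lines of Python, a couple of dozen bytecode instructions - and the machine code underneath is larger still.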

What is a compiler or interpreter?

Translates the high-level code to machine code.

  • Compiler does this at build time. C#, Java, etc.

  • Interpreter does this at run time. Python

Some languages use a compiler to translate the code into machine code. For compiled languages this is done when the program is built, and it often generates a single file or binary that contains the entire translated program. That file is what you then run - you don't run the high-level code directly.

Languages that use an interpreter, however, translate the high-level code while the program runs, line by line.
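
You can make the run-time translation step visible with Python's built-in compile(), which turns source text into a code object (bytecode for Python's virtual machine) that the interpreter then executes - a simplified sketch of what happens implicitly every time you run a script:

```python
# Source code as plain text - this is what the interpreter receives.
source = "result = sum(range(5))"

# Step 1: translate the text into a code object.
code = compile(source, "<demo>", "exec")

# Step 2: the interpreter executes it.
namespace = {}
exec(code, namespace)
print(namespace["result"])  # 10
```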

Levels of Abstraction

Level            Example              Typical users
Machine code     10110000 01100001    Nobody, hopefully
Assembly         MOV AL, 61h          Embedded / systems programmers
Low-level        C, C++               Systems, drivers, performance-critical code
High-level       Python, Java, C#     Most application development
Domain-specific  SQL, HTML, regex     Everyone

The higher you go, the less you think about the hardware - and, generally, the slower it runs.

Every layer of abstraction exists to hide complexity. Machine code is just numbers - sequences of bits that the CPU interprets as instructions. Nobody writes machine code by hand anymore.

  • Assembly gives those numbers human-readable names.
  • C gives you structured control flow without managing registers yourself.
  • Python lets you write sorted(items) without knowing - or caring - how the sort algorithm is implemented.

This is really awesome, because now you can build sophisticated applications without understanding every layer beneath you. But the abstraction has a cost: performance.

Python is orders of magnitude slower than C for raw computation. For most applications this doesn’t matter. For real-time systems like trading platforms, games, or anything that needs to crunch numbers at scale, it matters enormously.

Python is “perfect for beginners”

  • High-level, general-purpose language
  • Emphasizes readability and simplicity
  • Great for beginners, but also used by professionals
  • Huge ecosystem of libraries for everything from web development to data science

Python is “perfect for beginners”

Python is one of the most popular programming languages in the world, and for good reason. It’s a versatile language that can be used for everything from web development to scientific computing. Its syntax is clean and easy to read, which makes it an excellent choice for beginners.

It has a huge ecosystem of libraries and frameworks that allow you to do almost anything without reinventing the wheel. Whether you’re building a web app with Django, analyzing data with Pandas, or automating tasks with scripts, Python has you covered.

It’s really easy to setup prototypes, scripts, and small projects in Python. You can get started with just a text editor and the command line. This makes it ideal for learning programming concepts without getting bogged down in complex setup or syntax.

Python has a reputation for being an excellent first language, and for good reason. Its syntax is clean and intuitive, making it easier to learn programming concepts without getting stuck and bogged down in complex syntax.

Fundamentals


First Rule of Programming

RTFM!

Second Rule of Programming

Seriously, RTFM!

The Python documentation, for example, is really good and easy to find, e.g. https://docs.python.org/3.14/glossary.html#term-argument

Just ask Copilot or Google things.

Variables

  • Variables are named references for values:
    • age = 31 # integer
    • name = "Jan" # string
    • is_student = True # boolean

Variables are among the basic building blocks of programming: named references for storing data and values. You can think of them as labels attached to values in memory. When you write age = 31, you're creating a variable named age that references the integer value 31. When you later write age = age + 1, Python creates a new integer value 32 and updates age to reference that new value.

Variables

All variables in Python are references to objects in memory.

>>> x = 42
>>> id(x)
140737488346112  # memory address of the integer object 42

All variables in Python are references to objects in memory. The variable itself doesn’t hold the value directly; it points to an object that holds the value.

Python has a built-in function for checking the identity of the object a variable references: id(). In CPython, that identity happens to be the object's memory address.
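
A quick way to see that variables really are references: two names bound to the same object share one id(), and the `is` operator checks exactly that.

```python
a = [1, 2, 3]
b = a                  # copies the reference, not the list

print(id(a) == id(b))  # True - same object
print(a is b)          # True - `is` compares identity, not contents

c = [1, 2, 3]
print(a == c)          # True  - equal contents
print(a is c)          # False - a separate object in memory
```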

Data Types

Some common data types in Python:

  • int - whole numbers: 42
  • float - decimals: 3.14
  • str - text: "hello"
  • bool - true/false: True
  • list - ordered, mutable collection: [1, 2, 3]
  • dict - key-value pairs: {"name": "Jan"}

There are more types, like set, tuple, datetime, etc. You can even create your own.

Data types are how most languages categorize values. They determine what kind of operations you can perform on a value, and how much memory it takes up.

Python is dynamically typed, which means you don’t have to declare the type of a variable - Python figures it out when the program runs. This makes it very flexible, but it also means you have to be careful about what kind of data you’re working with, because the type a variable references can change at runtime.

Different languages have different sets of data types, and some languages allow you to define your own custom types. In Python, you can create your own classes to define new types of objects with their own properties and methods.

Python is dynamically typed: the type of a variable is determined at run time. You will also hear Python described as “duck typed” - a related idea that what matters is what an object can do, not what type it is.

For example, if you write x = 42, Python will create an integer object with the value 42 and make x reference that object. If you later write x = "hello", Python will create a new string object with the value "hello" and update x to reference that new object. The type of x changed from int to str without any explicit declaration.

>>> x = 42
>>> type(x)
<class 'int'>
>>> x = "hello"
>>> type(x)
<class 'str'>

As of Python 3.5, you can add optional type hints to indicate what type a variable is expected to be. They are not enforced at runtime - they exist mainly for documentation and tooling.

Data Types

  • Python is dynamically typed
>>> x = 42
>>> type(x)
<class 'int'>
>>> x = "hello"
>>> type(x)
<class 'str'>

Data Types

Python 3.5+ has optional type hints:

age: int = 31

def greet(name: str) -> str:
    return f"Hello, {name}!"
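
Worth seeing once: the hints are ignored at run time. A call that violates the hint - hypothetical here, purely to prove the point - runs without complaint:

```python
def greet(name: str) -> str:
    return f"Hello, {name}!"

# The `name: str` hint is not checked at runtime -
# this "wrong" call works fine:
print(greet(42))  # Hello, 42!
```

Static checkers like mypy or your editor will flag it, but Python itself will not.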

How do data types differ?

  • Memory representation
  • Allowed operations
  • Variable assignment and function passing
  • Mutability (mutable vs immutable)

Data types are fundamentally different kinds of data with different properties and behaviours - not just in how we read and reason about them, but in how they live in memory, what operations they support, and what happens when you pass them around.

Memory representation

In memory things are stored in binary:

The integer 5 could be stored as 00000101.

Memory representation

In Python, every value is an object. It carries metadata (reference count, type pointer) and the value itself.

So even a simple integer like 5, whose value fits in a single byte, takes up 28 bytes as a Python object.

One thing worth knowing early: in Python, every value is an object. Even a simple integer like 42 carries metadata - a reference count for garbage collection, a pointer to its type, and the value itself. This is why Python types use more memory than you’d expect coming from languages like C. It’s also why mutable and immutable types behave the way they do. We cover this in more detail in another workshop on Python and its features.
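
You can check this overhead yourself with sys.getsizeof(). The exact numbers are CPython- and platform-specific (28 bytes for a small int on a typical 64-bit build):

```python
import sys

# A small integer: the value fits in one byte, but the object
# also carries a reference count and a type pointer.
print(sys.getsizeof(5))   # typically 28 on 64-bit CPython

# Even an empty string pays the object overhead:
print(sys.getsizeof(""))  # typically 49 on 64-bit CPython
```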

Allowed Operations

Operators are symbols like:

  • + for addition or concatenation
  • * for multiplication or repetition
  • == for equality comparison
  • . for attribute access

Others: -, /, //, %, **, [], (), etc.

Allowed Operations

Some operators behave differently depending on the type:

>>> 1 + 2
3
>>> "hello " + "world"
'hello world'
>>> [1, 2] + [3, 4]
[1, 2, 3, 4]

Different data types support different operations. + means addition for numbers, and concatenation for strings and lists. But + on two dicts raises a TypeError - dicts don’t define that operation.

The rules are defined per type. When you hit a TypeError, it’s Python telling you that the operation isn’t defined for the types you’re combining. We look at the edge cases in the memory deep-dive in another article.

Allowed operations - type rules

What happens here?

>>> {"one": 1} + {"two": 2}
?
>>> 1 + "2"
?
>>> 1 + True
?

Allowed operations - type rules revealed

>>> {"one": 1} + {"two": 2}
TypeError: unsupported operand type(s) for +: 'dict' and 'dict'

>>> 1 + "2"
TypeError: unsupported operand type(s) for +: 'int' and 'str'

>>> 1 + True
2   # bool is a subclass of int - True == 1

Operations are defined per type. bool inheriting from int is the one that always surprises people.
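
The bool/int relationship is easy to verify - these are standard Python facts, not quirks of one version:

```python
# bool is literally a subclass of int
print(issubclass(bool, int))     # True
print(isinstance(True, int))     # True

# True behaves as 1, False as 0
print(True + True)               # 2
print(sum([True, False, True]))  # 2 - handy for counting matches
```

That last line is why `sum(condition for x in data)` is a common idiom for counting how many items satisfy a condition.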

Mutable and Immutable Types

Types fall into two camps.

  • Mutable types, which can be changed over time
  • Immutable types, which can never be changed after they are created

Mutable Types

>>> mylist = [1, 2, 3]
>>> mylist
[1, 2, 3]
>>> id(mylist)
1753958035200  # memory address of the list object
>>> mylist.append(1)
>>> mylist
[1, 2, 3, 1]
>>> id(mylist)
1753958035200  # same memory address, the list was modified in place

Some types can have values that change (mutable), like lists and dictionaries, which allow you to modify their contents after they are created.

Mutable Types

Some mutable types include:

  • list
  • dict
  • set
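
The same in-place behaviour, sketched for dict and set: id() stays constant while the contents change (the example data is made up).

```python
d = {"name": "Jan"}
before = id(d)
d["role"] = "trainer"    # modified in place - no new object
assert id(d) == before

s = {1, 2}
before = id(s)
s.add(3)                 # same story for sets
assert id(s) == before

print(d, s)
```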

Immutable Types

>>> mystring = "hello"
>>> id(mystring)
140737488346112  # memory address of the string object
>>> mystring += " world"
>>> id(mystring)
140737488346176  # different memory address, a new string was created

These data types cannot be changed after they are created (immutable) - integers and strings, for example.

Immutable Types

Some immutable types include:

  • int
  • float
  • str
  • tuple
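
Trying to mutate an immutable type fails loudly, which you can verify in any REPL:

```python
point = (1, 2)
try:
    point[0] = 99        # tuples don't support item assignment
except TypeError as e:
    print(e)             # 'tuple' object does not support item assignment

# Strings behave the same way - "changing" one always builds a new object:
s = "hello"
t = s.upper()
print(s, t)              # hello HELLO - s itself is untouched
```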

Variable Assignment and Function Passing

Python is pass-by-object-reference

  • You’re always passing the reference to the object, not a copy of the value
  • What that means depends on whether the type is mutable or immutable

This is one of those things that trips people up constantly, especially if you’re coming from a background where you haven’t had to think much about what happens in memory when you hand a variable to a function.

Python doesn’t pass by value (like C does), and it doesn’t pass by reference in the classic sense either. It passes by object reference - which means you’re handing the function a reference to the same object in memory. Whether that matters depends entirely on whether the object can be changed.

For immutable types like integers and strings, this distinction is almost invisible. You can’t modify the object, so Python just creates a new one when you try, and the original is untouched. For mutable types like lists and dictionaries - watch out. You’re handing someone the keys to the car, not a photocopy of them.

The difference comes down to mutability. When you write b = a for a list, you’re not copying the list - you’re copying the reference. Both a and b now point to the exact same object in memory. Modify through one, and you’ll see the change through the other.

With integers, this initially looks the same - x and y briefly point to the same object. But the moment you do y = y + 1, Python can’t modify the integer 42 in place (it’s immutable), so it creates a brand new integer 43 and makes y point to that instead. x is left alone.

This is why two variables pointing to the same list is called aliasing, and why it causes some of the most confusing bugs beginners (and experienced developers who stopped paying attention) run into.

Variable Assignment - Mutable

Two variables, one object:

>>> a = [1, 2, 3]
>>> b = a
>>> id(a) == id(b)
True
>>> b.append(4)
>>> a
[1, 2, 3, 4]   # surprise!

b is not a copy - it’s an alias for the same list.

Variable Assignment - Immutable

Integers don’t have this problem:

>>> x = 42
>>> y = x
>>> id(x) == id(y)
True             # same object, for now
>>> y = y + 1
>>> y
43
>>> x
42               # x is untouched
>>> id(x) == id(y)
False            # y now points to a new object

Reassigning y creates a new object - x never changes.

Function Passing - Mutable Types

def add_item(items: list, item: str) -> None:
    items.append(item)

shopping = ["milk", "eggs"]
add_item(shopping, "bread")
print(shopping)  # ['milk', 'eggs', 'bread']

The function received the same list object - no copy was made.

When you pass a list to a function, you’re passing the reference. The function works on the same object that exists in the caller’s scope. That’s why shopping changes even though the function never returned anything.

This is useful - it means you can modify large data structures in place without the overhead of copying them. But it also means you can accidentally destroy data that you didn’t intend to touch. The function signature says nothing about this. There’s no warning label.

Data engineers hit this one all the time with Pandas DataFrames. You pass a DataFrame into a function, the function does some transformation, and suddenly the DataFrame outside the function has also changed. Or hasn’t. Depending on the operation. It’s a mess.

Function Passing - Mutable Types

def add_item(items: list, item: str) -> list:
    copy = items.copy()
    copy.append(item)
    return copy

If you want to pass a mutable type but protect the original, make an explicit copy inside the function, as above.

Now the caller’s list is untouched, and the function returns a new one. More predictable, easier to test, fewer surprises at 3am.

Function Passing - Immutable Types

def double(n: int) -> int:
    n = n * 2
    return n

value = 5
result = double(value)
print(value)   # 5  - unchanged
print(result)  # 10

n inside the function is a local reference - rebinding it doesn’t affect value.

Immutable types are safe to pass around without worrying about side effects. Inside the function, n starts as a reference to the same integer object as value.

The moment you do n = n * 2, Python creates a new integer and rebinds n to it locally. The original object that value points to is never touched.

This is why strings, integers, and tuples are “safe” to pass into functions - you can’t accidentally mutate them from inside the function. You can only return a new value.

The key takeaways

  • Assigning a variable copies the reference, not the value
  • Mutable types (list, dict, set) can be modified through any reference
  • Immutable types (int, str, tuple) cannot be modified - Python creates a new object
  • Passing mutable types to functions can cause side effects - be deliberate about it
  • When in doubt, make a copy
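
“Make a copy” has one catch worth knowing: list.copy() (and copy.copy()) is shallow - nested mutable objects are still shared between the copies. copy.deepcopy() copies all the way down:

```python
import copy

orig = [[1, 2], [3, 4]]

shallow = copy.copy(orig)
shallow[0].append(99)      # the inner list is shared!
print(orig[0])             # [1, 2, 99] - surprise again

orig = [[1, 2], [3, 4]]
deep = copy.deepcopy(orig)
deep[0].append(99)
print(orig[0])             # [1, 2] - fully independent copy
```

For flat lists of immutable values, a shallow copy is all you need; deepcopy only earns its (higher) cost when there is nesting.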

Control Flow

How a program makes decisions and repeats work:

  • Conditionals (if/elif/else)
  • Loops (for, while)

Control Flow

# If / else
if temperature > 30:
    print("Hot day")
elif temperature > 15:
    print("Nice day")
else:
    print("Stay inside")

# For loop
for item in shopping_list:
    print(item)

# While loop
while condition:
    do_things()

Control flow is how a program makes decisions and repeats work. Without it, every program would just execute a list of statements from top to bottom, once, and stop. Not particularly useful.

The if/elif/else chain is the most basic form. Python uses indentation to define blocks - no curly braces. This is either elegant or infuriating depending on your background, and most people land somewhere in the middle after a while.

for loops in Python iterate over any iterable - lists, strings, dictionaries, ranges, file lines, database rows. If it can be iterated, you can loop over it. This is more flexible than the classic for (int i = 0; i < n; i++) style, and you’ll find yourself using it constantly.

while loops run as long as a condition is true. Always have a clear exit condition - an infinite loop is usually a bug, not a feature. break jumps out of the loop immediately. continue skips the rest of the current iteration and moves to the next one.

Control Flow - Conditionals

if temperature > 30:
    print("Hot day")
elif temperature > 15:
    print("Nice day")
else:
    print("Stay inside")
  • Python uses indentation to define blocks - no braces
  • elif is short for “else if”
  • The first condition that evaluates to True wins - the rest are skipped

Control Flow - For Loops

# Iterate over a list
for item in shopping_list:
    print(item)

# Iterate over a range
for i in range(5):
    print(i)  # 0, 1, 2, 3, 4

# Iterate over a dict
for key, value in config.items():
    print(f"{key}: {value}")

for works on anything iterable - lists, strings, dicts, files, database cursors…

Control Flow - While, Break, Continue

retries = 0
while retries < 3:
    result = try_connect()
    if result.ok:
        break         # exit the loop immediately
    retries += 1

for item in data:
    if item is None:
        continue      # skip this iteration, move to the next
    process(item)
  • while runs until the condition is False - always know what stops it
  • break exits the loop, continue skips to the next iteration

Functions

  • Reusable, named blocks of code
  • Take parameters, return a value
  • Keep them small and focused - one job per function

Functions

def greet(name: str, formal: bool = False) -> str:
    if formal:
        return f"Good day, {name}."
    return f"Hey, {name}!"

print(greet("Jan"))                  # Hey, Jan!
print(greet("Jan", formal=True))     # Good day, Jan.
  • Reusable, named blocks of code
  • Take parameters, return a value
  • Keep them small and focused - one job per function

Functions are the primary unit of reuse in programming. Instead of copying the same block of code in three places and then having to fix a bug in three places, you write it once, give it a name, and call it.

The difference between a parameter and an argument is one of those things people use interchangeably, and it doesn’t actually matter much in practice - but to be precise: parameters are the names in the function definition, arguments are the values you pass when you call it.

Python returns None implicitly if you don’t have a return statement. This is a common source of bugs - you call a function expecting a value back, get None, and then something downstream blows up in a confusing way.

Type hints like name: str and -> str are not enforced by Python at runtime. But they are enormously useful for readability, for tooling (your editor will warn you when you pass the wrong type), and for anyone who has to read your code six months later, including you. Use them.

Keep functions small and focused. One job per function. If you find yourself thinking “this function does X and also Y”, that’s two functions. This isn’t dogma - it’s just practical. Small functions are easier to test, easier to name, easier to reason about, and easier to replace when requirements change.

Functions - Parameters and Defaults

def greet(name: str, formal: bool = False) -> str:
    if formal:
        return f"Good day, {name}."
    return f"Hey, {name}!"

greet("Jan")               # Hey, Jan!
greet("Jan", formal=True)  # Good day, Jan.
  • Default values make parameters optional
  • Keyword arguments make call sites self-documenting
  • Python returns None implicitly if there’s no return
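
The implicit None return in action - a classic bug, shown here with a hypothetical add_tax helper that someone forgot to finish:

```python
def add_tax(amount: float) -> float:
    total = amount * 1.21
    # oops - no return statement

result = add_tax(100.0)
print(result)  # None, not 121.0
# Using result in arithmetic downstream would raise:
# TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'
```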

Functions - Keep Them Small


def process_order(order):
    check_required_fields(order)
    if order.currency not in supported_currencies:
        raise ValueError("Unsupported currency")

    for item in order.items:
        check_inventory_available(item)
    ...

One job per function. If you need “and” to describe what it does, split it.

Functions - Keep Them Small

# Refactored: each step extracted into its own small function
def process_order(order):
    validate_order(order)
    charge_card(order.payment)
    update_inventory(order.items)
    send_confirmation_email(order.customer)
    log_to_audit_trail(order)

# Each step is now independently testable and reusable

One job per function. If you need “and” to describe what it does, split it.

Classes and Objects

A class is a blueprint. An object is an instance of that blueprint.

Classes and Objects

A class is a blueprint. An object is an instance of that blueprint.

class Dog:
    def __init__(self, name: str, breed: str):
        self.name = name
        self.breed = breed

    def bark(self) -> str:
        return f"{self.name} says: Woof!"

rex = Dog("Rex", "Labrador")
print(rex.bark())  # Rex says: Woof!
  • State lives in attributes (self.name, self.breed)
  • Behaviour lives in methods (bark())

Functions are great for organising behaviour. But sometimes you also need to bundle state together with that behaviour - data and the operations on it belong together, and passing them around separately gets messy fast.

That’s the core idea behind a class. A class defines what an object looks like (its attributes) and what it can do (its methods). Every time you create an instance of a class, you get a fresh, independent object with its own copy of that state.

__init__ is the constructor - it runs automatically when you create a new instance. self is just a reference to the specific object being operated on. It’s not magic; it’s just the first argument Python passes to every method automatically.

If you’ve worked with Pandas, you’ve been using objects all along. A DataFrame is a class. When you do df = pd.DataFrame(data), you’re creating an instance. When you call df.head() or df.dropna(), you’re calling methods on that instance. You just didn’t have to write the class yourself.

When to use a class

Use a class when:

  • You have data and behaviour that belong together
  • You need multiple independent instances of the same structure
  • You want to encapsulate complexity behind a clean interface

Don’t use a class just because it feels more “proper”. A few functions in a module is often enough.

State and Behaviour

class Counter:
    def __init__(self):
        self.count = 0        # state

    def increment(self):      # behaviour
        self.count += 1

    def reset(self):          # behaviour
        self.count = 0

c1 = Counter()
c2 = Counter()
c1.increment()
c1.increment()
print(c1.count)  # 2
print(c2.count)  # 0  - completely independent

This is the key point. c1 and c2 are two separate objects. They were both created from the same Counter class, but each has its own count attribute. Incrementing c1 has absolutely no effect on c2.

This is what “encapsulation” means in practice - each object owns and manages its own state. The outside world interacts with it through methods, not by reaching in and poking the data directly.

Compare this to passing a bare integer around between functions. The moment you have more than one counter, or the counter needs a name, or a max value, or a reset policy - you’re adding parameters to every function that touches it. A class keeps all of that together in one place.

Inheritance - just the basics

class Animal:
    def __init__(self, name: str):
        self.name = name

    def speak(self) -> str:
        raise NotImplementedError

class Dog(Animal):
    def speak(self) -> str:
        return f"{self.name} says: Woof!"

class Cat(Animal):
    def speak(self) -> str:
        return f"{self.name} says: Meow."
  • Dog and Cat inherit from Animal - they get its attributes and can override its methods

Inheritance lets one class build on another. Dog gets everything Animal defines, and adds or overrides what it needs. This is useful for sharing common behaviour without duplicating it.

That said - inheritance is one of those things that looks elegant in toy examples and becomes a liability in real codebases. Deep inheritance chains are hard to follow, hard to change, and the source of plenty of subtle bugs. Keep it shallow. If you find yourself thinking “I need a SpecialSubclassOfSubclass”, that’s usually a sign to step back and reconsider the design.

For now, knowing that it exists and understanding the basic mechanics is enough.
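
The practical payoff of overriding is that code can work on any Animal without caring which subclass it received - a sketch using the classes from the slide:

```python
class Animal:
    def __init__(self, name: str):
        self.name = name

    def speak(self) -> str:
        raise NotImplementedError

class Dog(Animal):
    def speak(self) -> str:
        return f"{self.name} says: Woof!"

class Cat(Animal):
    def speak(self) -> str:
        return f"{self.name} says: Meow."

# The loop doesn't know or care which subclass each object is:
for animal in [Dog("Rex"), Cat("Mia")]:
    print(animal.speak())
# Rex says: Woof!
# Mia says: Meow.
```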

Fundamentals - Recap

  • Variables are references to objects in memory, not containers
  • Data types determine what you can do with a value and how it behaves
  • Mutable types (list, dict) can be changed in place - immutable types (int, str, tuple) cannot
  • Pass-by-object-reference means mutable types can change under you - make copies deliberately
  • Control flow - if/elif/else, for, while, break, continue
  • Functions - one job, clear inputs and outputs, use type hints
  • Classes - when state and behaviour belong together; don’t reach for them by default

Programming Paradigms

Paradigms: More Than One Way to Talk to a Machine

  • Procedural - step-by-step instructions (C, Pascal)

  • Object-Oriented - model the world as objects with state and behaviour (Java, C#, Python)

  • Functional - describe what to compute, not how (Haskell, Erlang, parts of Python)

  • Logic - define rules and let the engine figure out the solution (Prolog)

Most modern languages borrow from multiple paradigms.

The programming language you use shapes how you think about the problem. Each paradigm reflects a different mental model for decomposing and solving problems.

Procedural languages are closest to how the hardware actually works - you tell the machine what to do, in order. Think of it as a recipe. C is the canonical example, and understanding procedural code gives you a solid foundation for everything else.

Object-Oriented Programming (OOP) groups related data and behaviour together into objects. A Car object has properties (speed, colour) and methods (accelerate(), brake()). This maps well to how humans naturally model the world, which is why OOP dominated enterprise software for decades.

Functional programming treats computation as the evaluation of mathematical functions. It avoids changing state and mutable data. The benefits are real - pure functions are easier to test and reason about - but the mental shift can be steep if you’re coming from OOP.

Logic programming is a weird one - you define facts and rules, and the engine figures out how to apply them to answer questions. I haven’t ever read into it much but I know it exists.
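
A taste of the functional flavour in Python itself - describing what you want instead of mutating state step by step. The price list here is made-up example data:

```python
prices = [10.0, 25.5, 7.25, 99.0]

# Procedural: build the result by mutating a list in a loop
discounted = []
for p in prices:
    if p > 10:
        discounted.append(p * 0.9)

# Functional flavour: describe *what* you want, no mutation
discounted_fp = [p * 0.9 for p in prices if p > 10]

# Or with the classic functional building blocks:
discounted_mf = list(map(lambda p: p * 0.9, filter(lambda p: p > 10, prices)))

print(discounted == discounted_fp == discounted_mf)  # True
```

All three produce the same result; the comprehension is usually considered the most idiomatic Python.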

Where does Python fit in?

Initially designed as a procedural scripting language.

Now comfortably supports procedural, object-oriented, and functional styles - often in the same file.

When I first started learning Python, after having cut my teeth on languages like Delphi, C++ and C#, it felt somehow “looser”, faster, messier, and more fun. It was as if I had only ever been able to write in cursive, with strict rules and structure, where everything had to be just right or else you’d get in trouble. And then suddenly I could write in print, and I could doodle and fuck around. I could make a mess and my mess would run, and then fail.

Initially Python was designed as a procedural scripting language, but over time it has evolved to support multiple paradigms. You can write procedural code in Python, but you can also use classes and objects for OOP, or leverage functional programming features like first-class functions and list comprehensions.

This flexibility is one of Python’s greatest strengths, but it also allows everyone to create a unique pile of messy code that only a mother could love. And probably not even then. Not to mention ever being able to read and maintain it.

It’s important to understand the core concepts of programming first, and then explore how Python implements those concepts in different ways.

Procedural - you’re probably already doing this

import csv

def load_data(path: str) -> list:
    with open(path) as f:
        return list(csv.DictReader(f))

def filter_active(rows: list) -> list:
    return [r for r in rows if r["status"] == "active"]

def write_output(rows: list, path: str) -> None:
    with open(path, "w") as f:
        for row in rows:
            f.write(f"{row['id']},{row['name']}\n")

rows = load_data("customers.csv")
active = filter_active(rows)
write_output(active, "output.csv")

Steps in order. Read, filter, write. Familiar?

This is procedural programming. Step-by-step instructions, executed from top to bottom. No classes, no frameworks, no ceremony. Just functions that do one thing, called in order.

Most people writing data scripts, analysis notebooks, or quick ETL pipelines are writing procedural code, whether they call it that or not. It’s the most natural way to think about a problem: first do this, then do that, then do the other thing.

For this kind of task - load a file, transform it, write it out - procedural code is completely appropriate. There’s no need to reach for anything more complex. A few well-named functions and it’s readable, maintainable, and easy to follow.

The problems start when “a few functions” becomes “a hundred functions”, the script grows to a thousand lines, and the logic starts weaving between them in ways that are hard to follow without running it in your head first.

Procedural - where it breaks down

“I’ve worked with plenty of amazingly talented data scientists and engineers who write code that is a massive ball of spaghetti and mud, which they map in their mind and navigate with ease - but for anyone else who has to read and maintain that code, it’s a nightmare.”

And even for the original author, it becomes a problem. Try explaining a bug in that code to someone else. You’re describing a maze that only exists in your head, in words, to someone who has never seen it. It doesn’t work.

This isn’t a character flaw. It’s a natural consequence of code that grew organically without structure. Every addition made sense at the time. The problem is that procedural code has no natural boundaries - nothing stops functions from growing, nothing enforces where state lives, and nothing prevents two parts of the script from depending on each other in ways that aren’t obvious from reading either one in isolation.

This is where OOP and functional patterns start to earn their keep - not as intellectual exercises, but as practical tools for drawing boundaries around complexity before it draws them around you.

Procedural - where it breaks down

The script that started as 50 lines is now 800.

  • Functions that were “temporary” are load-bearing
  • Global variables that “made sense at the time”
  • A change in one place breaks something in three others
  • Nobody wants to touch it, including you

Structure isn’t bureaucracy. It’s how code stays readable when it grows.

OOP - when state and behaviour travel together

The same pipeline, but now it needs to run on multiple sources, track how many rows it processed, and report errors per source.

class CsvPipeline:
    def __init__(self, source_path: str, output_path: str):
        self.source_path = source_path
        self.output_path = output_path
        self.rows_processed = 0
        self.errors = []

    def run(self) -> None:
        # _load, _filter and _write hold the same load/filter/write logic
        # as the procedural version, now updating rows_processed and
        # errors as they go (bodies omitted here for brevity)
        rows = self._load()
        active = self._filter(rows)
        self._write(active)

    def summary(self) -> str:
        return f"Processed {self.rows_processed} rows, {len(self.errors)} errors"

Notice what changed. The logic is largely the same - load, filter, write. But now there’s state that belongs to this specific pipeline run: which file it’s reading, where it’s writing, how many rows it processed, what errors it encountered.

In the procedural version, you’d have to pass that state around as parameters or store it in global variables. Both options get messy fast. The class keeps it in one place, attached to the object that owns it.
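For contrast, a hypothetical sketch of what threading that state through the procedural style looks like - every function has to accept and update a shared stats dict just to keep the counters alive (the names here are made up for illustration):

```python
def filter_active(rows: list, stats: dict) -> list:
    # the stats dict rides along purely so the counter survives the call
    stats["rows_processed"] += len(rows)
    return [r for r in rows if r["status"] == "active"]

stats = {"rows_processed": 0, "errors": []}
active = filter_active([{"status": "active"}, {"status": "inactive"}], stats)
print(stats["rows_processed"])  # 2
```

One extra parameter is tolerable; ten functions each threading three pieces of state is not.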

The other thing you get is the ability to create multiple independent instances:

orders = CsvPipeline("orders.csv", "orders_out.csv")
customers = CsvPipeline("customers.csv", "customers_out.csv")

orders.run()
customers.run()

print(orders.summary())    # Processed 1420 rows, 3 errors
print(customers.summary()) # Processed 892 rows, 0 errors

Each pipeline has its own state. They don’t interfere with each other. You didn’t have to thread rows_processed through every function call, and you didn’t have to worry about one pipeline’s error list contaminating the other’s.

This is the practical argument for OOP - not that it’s more elegant in theory, but that it stops you from having to invent increasingly awkward ways to carry state through a procedural programme as it grows.

OOP - when state and behaviour travel together

Use a class when the same data needs to travel with the operations that act on it.

orders = CsvPipeline("orders.csv", "orders_out.csv")
customers = CsvPipeline("customers.csv", "customers_out.csv")

orders.run()
customers.run()

print(orders.summary())    # Processed 1420 rows, 3 errors
print(customers.summary()) # Processed 892 rows, 0 errors

Two pipelines, independent state, same interface.

OOP - the practical argument

Not “it’s more elegant”.

  • State that belongs together stays together
  • Multiple independent instances without threading state through every function
  • A clean interface hides the implementation details
  • Easier to test: create an instance, run it, check the result

The class didn’t add complexity - it gave the complexity somewhere to live.

Functional - describing what, not how

The core idea: functions should be pure.

  • Given the same input, always return the same output
  • No side effects - don’t modify anything outside the function

# Not pure - depends on external state, modifies a list
results = []
def process(row):
    if row["status"] == "active":
        results.append(row)   # side effect

# Pure - same input, same output, nothing modified
def is_active(row: dict) -> bool:
    return row["status"] == "active"

You don’t need to adopt functional programming wholesale to benefit from it. The core idea - pure functions with no side effects - is useful regardless of what paradigm you’re working in.

A pure function is predictable. You can call it anywhere, pass it any input, and it will always behave the same way. You can test it in complete isolation. You can run it in parallel without worrying about race conditions. You can compose it with other functions without surprises.

The results.append() example is the kind of thing that looks harmless and causes headaches later. The function’s behaviour now depends on the state of results before it was called, and calling it twice produces different results. That’s the kind of subtle dependency that makes code hard to test and hard to reason about.

This doesn’t mean you can never modify state - real programs have to write to files, update databases, send messages. The idea is to push those side effects to the edges of your code, and keep the logic in the middle as pure as possible.
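A minimal sketch of that shape - pure logic in the middle, side effects at the edges (the function and file names here are illustrative):

```python
import json

# Pure core: no side effects, trivially testable in isolation
def active_names(rows: list) -> list:
    return [r["name"] for r in rows if r["status"] == "active"]

# Impure edge: all the reading and writing happens here, nowhere else
def main(in_path: str, out_path: str) -> None:
    with open(in_path) as f:
        rows = json.load(f)
    with open(out_path, "w") as f:
        f.write("\n".join(active_names(rows)))
```

You can test active_names() with a plain list of dicts, no files or mocks required - that is the payoff.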

List Comprehensions

# Procedural
active = []
for row in rows:
    if row["status"] == "active":
        active.append(row)

# Functional - one line, same result
active = [row for row in rows if row["status"] == "active"]

# With transformation
names = [row["name"].upper() for row in rows if row["status"] == "active"]

Readable, concise, and no mutation of an external list.

List comprehensions are the most used functional feature in Python by a wide margin, and most people who write them don’t think of them as “functional” - they just think of them as the Pythonic way to build a list.

The pattern is: [expression for item in iterable if condition]. The condition is optional. The expression can be anything - a value, a function call, a method, a transformation.

They’re not always the right choice. If the logic inside gets complex enough that you need to squint to read it, write it as a loop. Readability is not a luxury. But for simple filter-and-transform operations, comprehensions are cleaner than the equivalent loop and explicit append.

You can also write dict comprehensions ({k: v for k, v in ...}) and set comprehensions ({x for x in ...}) using the same syntax.

Dict and Set Comprehensions

# Dict comprehension - invert a mapping
original = {"a": 1, "b": 2, "c": 3}
inverted = {v: k for k, v in original.items()}
# {1: 'a', 2: 'b', 3: 'c'}

# Set comprehension - unique values only
statuses = {row["status"] for row in rows}
# {'active', 'inactive', 'pending'}

Same pattern, different brackets, different type.

map() and filter()

rows = [
    {"name": "alice", "score": 91},
    {"name": "bob",   "score": 43},
    {"name": "carol", "score": 78},
]

passing = list(filter(lambda r: r["score"] >= 50, rows))
names   = list(map(lambda r: r["name"].title(), passing))

# ['Alice', 'Carol']

  • filter() keeps items where the function returns True
  • map() transforms each item using the function
  • Both return lazy iterators - wrap in list() to materialise

map() and filter() are the canonical functional tools and you’ll encounter them regularly, especially in data code. They work by taking a function and an iterable, and applying the function to each element.

The lambda keyword creates a small anonymous function inline. lambda r: r["score"] >= 50 is equivalent to writing:

def is_passing(r):
    return r["score"] >= 50

Lambdas are convenient for short, throwaway functions. They become a problem when they get complex enough to need a name and a docstring - at that point, just write a proper function.

Both map() and filter() return lazy iterators in Python 3. They don’t compute anything until you ask for the values - by wrapping in list(), iterating in a loop, or consuming them some other way. This is memory-efficient for large datasets but can trip you up if you try to use the result twice without realising it’s been exhausted.
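That exhaustion gotcha is easy to demonstrate:

```python
doubled = map(lambda x: x * 2, [1, 2, 3])

print(list(doubled))  # [2, 4, 6]
print(list(doubled))  # [] - the iterator is already exhausted
```

The second list() silently gets nothing, with no error to warn you.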

For most day-to-day work, list comprehensions are more readable than map()/filter(). But you’ll see map() and filter() throughout existing codebases, so you need to be able to read them.

Functions as First-Class Objects

In Python, functions are objects. You can pass them around like any other value.

def apply(func, values: list) -> list:
    return [func(v) for v in values]

def double(x):
    return x * 2

def square(x):
    return x ** 2

print(apply(double, [1, 2, 3]))  # [2, 4, 6]
print(apply(square, [1, 2, 3]))  # [1, 4, 9]

The same apply() function works with any transformation you hand it.

This is a subtle but powerful idea. In many languages, functions are a special construct that you define and call - but you can’t pass them around or store them in variables the way you can with data. In Python, a function is just an object like any other. You can assign it to a variable, put it in a list, pass it to another function, return it from a function.

This becomes enormously useful in data pipelines. Instead of writing a separate function for each transformation and calling each one explicitly, you can build a list of transformation functions and apply them in sequence:

pipeline = [str.strip, str.lower, lambda s: s.replace(" ", "_")]

def clean(value: str, steps: list) -> str:
    for step in steps:
        value = step(value)
    return value

clean("  Hello World  ", pipeline)  # 'hello_world'

The logic of applying the steps (clean()) is separate from which transformations are applied (pipeline). You can swap out, reorder, or extend the pipeline without touching clean().

Functional in Python - the practical takeaways

  • Write pure functions where you can - same input, same output, no side effects
  • Use list/dict/set comprehensions for filter and transform operations
  • Understand map() and filter() - you’ll read them even if you don’t write them
  • Functions are objects - you can pass them, store them, compose them
  • Push side effects (file writes, DB updates, API calls) to the edges of your code

You don’t need to go full Haskell. Just get the habits.


More advanced topics

Current topics

  • A collection of Python features, quirks and gotchas, and how to use and avoid them.

Just Python Things

This is the first article in what will be a series about software development with Python.

The next one, Just Python Things, is already out. It covers how Python works under the hood, together with a collection of Python features, quirks and gotchas, and how to use and avoid them.

Future topics

  • Comparing procedural, OOP, and functional styles with practical examples
  • Project setup, packaging, linting, formatting, and practical development tools
  • Unit testing, integration testing, and test-driven development
  • Design patterns and software architecture
  • Performance optimisation, load testing, and profiling
  • Logging, monitoring and debugging techniques

Future articles will cover a range of topics and practical skills that go beyond the basics of programming and Python syntax. The goal is to provide a roadmap for anyone who wants to level up from writing scripts to building maintainable, scalable software.

A clash of visions will explore the different programming paradigms - procedural, object-oriented, and functional - through a practical example. We’ll see how the same problem can be solved in each style, and discuss the trade-offs and when to choose one over the others.

Getting your hands dirty will be a deep dive into the practical tools and best practices for Python development. We’ll cover project setup, packaging, linting, formatting, and the essential tools that make development smoother and more efficient.

Trust but verify will focus on unit testing, integration testing, and test-driven development, ensuring your code is reliable and maintainable.

Building for larger systems will explore design patterns and software architecture, helping you structure your code for scalability and maintainability.

Watching the gears turn will cover performance optimisation, load testing, and profiling, as well as logging, monitoring, and debugging techniques.

Thank You

Questions? Let’s talk.

Get in touch →

Thanks for reading!

If you found this useful, please share it with your network. If you have any questions or suggestions for future topics, get in touch - I’d love to hear from you.