Software Development Fundamentals
10 Apr 2026 34 minute read

DISCLAIMER: This article was initially written as a set of slides for workshop and presentation purposes for educational use. It has changed over time and will likely continue to evolve.
Basics of software development with Python.
Programming is, at its most basic, the act of giving a computer a precise set of instructions. The computer will then do exactly what you tell it, nothing more, nothing less. This makes programming both powerful and humbling.
A program is nothing more than a sequence of operations: read this, compute that, store this value, repeat until done. The challenge is expressing the solution to a real-world problem in a language rigid enough for a machine to execute unambiguously.
What makes it interesting - and frankly difficult - is that the problems we want computers to solve are rarely simple. The gap between “what I want it to do” and “what I actually told it to do” is where most bugs are born. It’s even worse when “what I think I told it to do” is not “what I actually told it to do”.
When you write a Python script, you’re writing in a high-level language that abstracts away the details of how the computer actually executes your code. The Python interpreter takes care of translating your code into machine code that the CPU can execute.
Python’s reference implementation, CPython, is written in C, a lower-level language that compiles down to machine code. So when you run a Python program, you’re actually running a C program (the interpreter) that executes your Python code.
When you run a program, the operating system loads it into memory, allocates resources, and starts executing it. The CPU executes the machine code instructions one by one, performing calculations, reading/writing memory, and interacting with peripherals as needed.
The operating system (Windows, Linux, macOS) sits between your program and the hardware. It manages resources like memory and CPU time, handles input/output operations, and provides a layer of security and stability.
We don’t need to worry about those details most of the time, but it’s helpful to understand that there’s a lot going on under the hood when you run a program.
A simple script like the one in the slide might look like three lines of Python, but it translates into dozens of machine instructions that the CPU executes.
The exact machine code depends on the CPU architecture (x86, ARM, etc.) and the operating system, but the general idea is that your high-level code gets translated down to instructions that the hardware can execute.
Understanding this translation process helps you appreciate the power of programming languages and the abstractions they provide. It also gives you insight into performance considerations - for instance, why certain operations are faster than others, and how to write code that runs efficiently.
Some languages use a compiler to translate the code into machine code. For compiled languages this happens when the program is built, and it often produces a single file or binary that contains the entire translated program. That file is what you run; you never execute the high-level code directly.
Languages that use an interpreter, by contrast, translate the high-level code while the program runs, line by line.
Every layer of abstraction exists to hide complexity. Machine code is just numbers - sequences of bits that the CPU interprets as instructions. Nobody writes machine code by hand anymore.
- Assembly gives those numbers human-readable names.
- C gives you structured control flow without managing registers yourself.
- Python lets you write sorted(items) without knowing - or caring - how the sort algorithm is implemented.
This is really awesome because now you can build sophisticated applications without understanding every layer beneath you. But the abstraction has a cost: performance.
Python is orders of magnitude slower than C for raw computation. For most applications this doesn’t matter. For real-time systems like trading platforms, games, or anything that needs to crunch numbers at scale, it matters enormously.
Python is one of the most popular programming languages in the world, and for good reason. It’s a versatile language that can be used for everything from web development to scientific computing. Its syntax is clean and easy to read, which makes it an excellent choice for beginners.
It has a huge ecosystem of libraries and frameworks that allow you to do almost anything without reinventing the wheel. Whether you’re building a web app with Django, analyzing data with Pandas, or automating tasks with scripts, Python has you covered.
It’s really easy to setup prototypes, scripts, and small projects in Python. You can get started with just a text editor and the command line. This makes it ideal for learning programming concepts without getting bogged down in complex setup or syntax.
Python has a reputation for being an excellent first language, and for good reason: you can focus on learning programming concepts rather than wrestling with the language itself.
Fundamentals
All variables in Python are references to objects in memory. The variable itself doesn’t hold the value directly; it points to an object that holds the value. When you write age = 31, you’re creating a variable named age that references the integer value 31. When you later write age = age + 1, you’re creating a new integer value 32 and updating age to reference that new object.
Python has a built-in function for checking the identity (in CPython, the memory address) of the object a variable references: id().
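A quick illustration (the printed addresses vary from run to run):

```python
age = 31
other = age          # both names reference the same int object

print(id(age))                # the object's identity (its address in CPython)
print(id(other) == id(age))   # True - same object, two names
```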
Data types are how most languages categorize values. They determine what kind of operations you can perform on a value, and how much memory it takes up.
Python is dynamically typed, which means you don’t have to declare the type of a variable; the interpreter figures it out when the program runs. This makes it very flexible, but it also means you have to be careful about what kind of data you’re working with, because the type can change at runtime.
Different languages have different sets of data types, and some languages allow you to define your own custom types. In Python, you can create your own classes to define new types of objects with their own properties and methods.
Python is dynamically typed: the type of a variable is determined at run time rather than declared up front. Closely related is “duck typing”, where code cares about what an object can do rather than what type it is.
For example, if you write x = 42, Python will create an integer object with the value 42 and make x reference that object. If you later write x = "hello", Python will create a new string object with the value "hello" and update x to reference that new object. The type of x changed from int to str without any explicit declaration.
>>> x = 42
>>> type(x)
<class 'int'>
>>> x = "hello"
>>> type(x)
<class 'str'>
As of Python 3.5, you can use optional type hints to indicate what type a variable is expected to be, but these are not enforced at runtime. They’re mainly for documentation and tooling purposes and are not required or enforced by the language.
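A small sketch of what “not enforced” means in practice (names are illustrative):

```python
name: str = "Ada"     # a hint, not a constraint
count: int = "oops"   # runs without complaint; a checker like mypy would flag it

def greet(who: str) -> str:
    return f"Hello, {who}!"

print(greet(name))    # Hello, Ada!
print(count)          # oops
```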
Every Python object - even a simple integer like 42 - carries metadata: a reference count for garbage collection, a pointer to its type, and the value itself. This is why Python types use more memory than you’d expect coming from languages like C. It’s also why mutable and immutable types behave the way they do. We cover this in more detail in another workshop on Python and its features. Different data types support different operations. + means addition for numbers and concatenation for strings and lists. But + on two dicts raises a TypeError - dicts don’t define that operation.
The rules are defined per type. When you hit a TypeError, it’s Python telling you that the operation isn’t defined for the types you’re combining. We look at the edge cases in the memory deep-dive in another article.
This is one of those things that trips people up constantly, especially if you’re coming from a background where you haven’t had to think much about what happens in memory when you hand a variable to a function.
Python doesn’t pass by value (like C does), and it doesn’t pass by reference in the classic sense either. It passes by object reference - which means you’re handing the function a reference to the same object in memory. Whether that matters depends entirely on whether the object can be changed.
For immutable types like integers and strings, this distinction is almost invisible. You can’t modify the object, so Python just creates a new one when you try, and the original is untouched. For mutable types like lists and dictionaries - watch out. You’re handing someone the keys to the car, not a photocopy of them.
The difference comes down to mutability. When you write b = a for a list, you’re not copying the list - you’re copying the reference. Both a and b now point to the exact same object in memory. Modify through one, and you’ll see the change through the other.
With integers, this initially looks the same - x and y briefly point to the same object. But the moment you do y = y + 1, Python can’t modify the integer 42 in place (it’s immutable), so it creates a brand new integer 43 and makes y point to that instead. x is left alone.
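Both behaviours side by side, as a sketch:

```python
a = [1, 2, 3]
b = a                # copies the reference, not the list
b.append(4)
print(a)             # [1, 2, 3, 4] - both names see the change

x = 42
y = x                # briefly the same object...
y = y + 1            # ...until y is rebound to a new int, 43
print(x, y)          # 42 43 - x is untouched
```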
This is why two variables pointing to the same list is called aliasing, and why it causes some of the most confusing bugs beginners (and experienced developers who stopped paying attention) run into.
When you pass a list to a function, you’re passing the reference. The function works on the same object that exists in the caller’s scope. That’s why shopping changes even though the function never returned anything.
This is useful - it means you can modify large data structures in place without the overhead of copying them. But it also means you can accidentally destroy data that you didn’t intend to touch. The function signature says nothing about this. There’s no warning label.
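The shopping example referred to above, reconstructed as a sketch (the function name is assumed):

```python
def add_item(basket: list) -> None:
    basket.append("milk")    # mutates the caller's object in place

shopping = ["bread", "eggs"]
add_item(shopping)
print(shopping)              # ['bread', 'eggs', 'milk'] - changed, nothing returned
```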
Data engineers hit this one all the time with Pandas DataFrames. You pass a DataFrame into a function, the function does some transformation, and suddenly the DataFrame outside the function has also changed. Or hasn’t. Depending on the operation. It’s a mess.
If you want to pass a mutable type but protect the original, make an explicit copy:
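Something along these lines (a sketch; the names are illustrative):

```python
def add_item_safe(basket: list) -> list:
    updated = basket.copy()   # shallow copy - the original is protected
    updated.append("milk")
    return updated

shopping = ["bread", "eggs"]
result = add_item_safe(shopping)
print(shopping)               # ['bread', 'eggs'] - untouched
print(result)                 # ['bread', 'eggs', 'milk']
```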
Now the caller’s list is untouched, and the function returns a new one. More predictable, easier to test, fewer surprises at 3am.
Immutable types are safe to pass around without worrying about side effects. Inside the function, n starts as a reference to the same integer object as value.
The moment you do n = n * 2, Python creates a new integer and rebinds n to it locally. The original object that value points to is never touched.
This is why strings, integers, and tuples are “safe” to pass into functions - you can’t accidentally mutate them from inside the function. You can only return a new value.
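Sketching the n / value behaviour described above:

```python
def double(n: int) -> int:
    n = n * 2        # rebinds the local name; the caller's object is untouched
    return n

value = 21
result = double(value)
print(value, result)  # 21 42
```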
Control flow is how a program makes decisions and repeats work. Without it, every program would just execute a list of statements from top to bottom, once, and stop. Not particularly useful.
The if/elif/else chain is the most basic form. Python uses indentation to define blocks - no curly braces. This is either elegant or infuriating depending on your background, and most people land somewhere in the middle after a while.
for loops in Python iterate over any iterable - lists, strings, dictionaries, ranges, file lines, database rows. If it can be iterated, you can loop over it. This is more flexible than the classic for (int i = 0; i < n; i++) style, and you’ll find yourself using it constantly.
while loops run as long as a condition is true. Always have a clear exit condition - an infinite loop is usually a bug, not a feature. break jumps out of the loop immediately. continue skips the rest of the current iteration and moves to the next one.
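All of the above in one small sketch (the values are made up for illustration):

```python
temperature = 18
if temperature > 25:
    label = "warm"
elif temperature > 10:
    label = "mild"
else:
    label = "cold"
print(label)          # mild

seen = []
for n in [1, 2, 3, 4, 5]:
    if n == 4:
        break         # leave the loop immediately
    if n % 2 == 0:
        continue      # skip even numbers, move to the next iteration
    seen.append(n)
print(seen)           # [1, 3]

count = 0
while count < 3:      # clear exit condition
    count += 1
print(count)          # 3
```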
Functions are the primary unit of reuse in programming. Instead of copying the same block of code in three places and then having to fix a bug in three places, you write it once, give it a name, and call it.
The difference between a parameter and an argument is one of those things people use interchangeably, and it doesn’t actually matter much in practice - but to be precise: parameters are the names in the function definition, arguments are the values you pass when you call it.
Python returns None implicitly if you don’t have a return statement. This is a common source of bugs - you call a function expecting a value back, get None, and then something downstream blows up in a confusing way.
Type hints like name: str and -> str are not enforced by Python at runtime. But they are enormously useful for readability, for tooling (your editor will warn you when you pass the wrong type), and for anyone who has to read your code six months later, including you. Use them.
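A sketch of both points - hints for readability, and the implicit None:

```python
def greet(name: str) -> str:
    return f"Hello, {name}!"

def log(message: str):      # no return statement...
    print(message)

greeting = greet("Ada")
result = log("processing")  # ...so the call evaluates to None
print(greeting)             # Hello, Ada!
print(result)               # None
```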
Keep functions small and focused. One job per function. If you find yourself thinking “this function does X and also Y”, that’s two functions. This isn’t dogma - it’s just practical. Small functions are easier to test, easier to name, easier to reason about, and easier to replace when requirements change.
Functions are great for organising behaviour. But sometimes you also need to bundle state together with that behaviour - data and the operations on it belong together, and passing them around separately gets messy fast.
That’s the core idea behind a class. A class defines what an object looks like (its attributes) and what it can do (its methods). Every time you create an instance of a class, you get a fresh, independent object with its own copy of that state.
__init__ is the constructor - it runs automatically when you create a new instance. self is just a reference to the specific object being operated on. It’s not magic; it’s just the first argument Python passes to every method automatically.
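A minimal version of the Counter class discussed below, reconstructed as a sketch:

```python
class Counter:
    def __init__(self):       # runs automatically on Counter()
        self.count = 0        # per-instance state

    def increment(self):      # self is the specific instance being operated on
        self.count += 1

c1 = Counter()
c2 = Counter()
c1.increment()
c1.increment()
print(c1.count, c2.count)     # 2 0 - independent instances
```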
If you’ve worked with Pandas, you’ve been using objects all along. A DataFrame is a class. When you do df = pd.DataFrame(data), you’re creating an instance. When you call df.head() or df.dropna(), you’re calling methods on that instance. You just didn’t have to write the class yourself.
This is the key point. c1 and c2 are two separate objects. They were both created from the same Counter class, but each has its own count attribute. Incrementing c1 has absolutely no effect on c2.
This is what “encapsulation” means in practice - each object owns and manages its own state. The outside world interacts with it through methods, not by reaching in and poking the data directly.
Compare this to passing a bare integer around between functions. The moment you have more than one counter, or the counter needs a name, or a max value, or a reset policy - you’re adding parameters to every function that touches it. A class keeps all of that together in one place.
Inheritance lets one class build on another. Dog gets everything Animal defines, and adds or overrides what it needs. This is useful for sharing common behaviour without duplicating it.
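A sketch of that relationship:

```python
class Animal:
    def __init__(self, name: str):
        self.name = name

    def speak(self) -> str:
        return f"{self.name} makes a sound"

class Dog(Animal):             # inherits __init__ from Animal
    def speak(self) -> str:    # overrides the shared behaviour
        return f"{self.name} barks"

print(Animal("Generic").speak())  # Generic makes a sound
print(Dog("Rex").speak())         # Rex barks
```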
That said - inheritance is one of those things that looks elegant in toy examples and becomes a liability in real codebases. Deep inheritance chains are hard to follow, hard to change, and the source of plenty of subtle bugs. Keep it shallow. If you find yourself thinking “I need a SpecialSubclassOfSubclass”, that’s usually a sign to step back and reconsider the design.
For now, knowing that it exists and understanding the basic mechanics is enough.
Programming Paradigms
The programming language you use shapes how you think about the problem. Each paradigm reflects a different mental model for decomposing and solving problems.
Procedural languages are closest to how the hardware actually works - you tell the machine what to do, in order. Think of it as a recipe. C is the canonical example, and understanding procedural code gives you a solid foundation for everything else.
Object-Oriented Programming (OOP) groups related data and behaviour together into objects. A Car object has properties (speed, colour) and methods (accelerate(), brake()). This maps well to how humans naturally model the world, which is why OOP dominated enterprise software for decades.
Functional programming treats computation as the evaluation of mathematical functions. It avoids changing state and mutable data. The benefits are real - pure functions are easier to test and reason about - but the mental shift can be steep if you’re coming from OOP.
Logic programming is a weird one - you define facts and rules, and the engine figures out how to apply them to answer questions. I haven’t ever looked into it much, but it’s worth knowing it exists (Prolog is the classic example).
When I first started learning Python, after having cut my teeth on compiled languages like Delphi, C++ and C#, it felt somehow “looser”, faster, messier, and more fun. It was like I’d only ever been able to write in cursive, with strict rules and structure and everything had to be just right or else you’d get in trouble. And then suddenly I could write in print, and I could doodle and fuck around. I could make a mess and my mess would run, and then fail.
Initially Python was designed as a procedural scripting language, but over time it has evolved to support multiple paradigms. You can write procedural code in Python, but you can also use classes and objects for OOP, or leverage functional programming features like first-class functions and list comprehensions.
This flexibility is one of Python’s greatest strengths, but it also lets everyone create a unique pile of messy code that only a mother could love. And probably not even then. Not to mention ever being able to read and maintain it.
It’s important to understand the core concepts of programming first, and then explore how Python implements those concepts in different ways.
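A minimal procedural sketch in the load/transform/report style discussed next (functions and data are made up for illustration):

```python
def load(raw: str) -> list:
    return raw.strip().split("\n")

def transform(lines: list) -> list:
    return [line.upper() for line in lines]

def report(lines: list) -> str:
    return f"{len(lines)} lines processed"

# top to bottom, one step after another
data = load("alpha\nbeta\ngamma")
data = transform(data)
print(report(data))   # 3 lines processed
```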
This is procedural programming. Step-by-step instructions, executed from top to bottom. No classes, no frameworks, no ceremony. Just functions that do one thing, called in order.
Most people writing data scripts, analysis notebooks, or quick ETL pipelines are writing procedural code, whether they call it that or not. It’s the most natural way to think about a problem: first do this, then do that, then do the other thing.
For this kind of task - load a file, transform it, write it out - procedural code is completely appropriate. There’s no need to reach for anything more complex. A few well-named functions and it’s readable, maintainable, and easy to follow.
The problems start when “a few functions” becomes “a hundred functions”, the script grows to a thousand lines, and the logic starts weaving between them in ways that are hard to follow without running it in your head first.
And even for the original author, it becomes a problem. Try explaining a bug in that code to someone else. You’re describing a maze that only exists in your head, in words, to someone who has never seen it. It doesn’t work.
This isn’t a character flaw. It’s a natural consequence of code that grew organically without structure. Every addition made sense at the time. The problem is that procedural code has no natural boundaries - nothing stops functions from growing, nothing enforces where state lives, and nothing prevents two parts of the script from depending on each other in ways that aren’t obvious from reading either one in isolation.
This is where OOP and functional patterns start to earn their keep - not as intellectual exercises, but as practical tools for drawing boundaries around complexity before it draws them around you.
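A hypothetical sketch of what the CsvPipeline class discussed below might look like - the file names, the blank-row filtering, and the summary format are all assumptions, not the original slide code:

```python
import csv

class CsvPipeline:
    def __init__(self, src: str, dst: str):
        self.src = src               # state owned by this pipeline run
        self.dst = dst
        self.rows_processed = 0
        self.errors = []

    def run(self):
        with open(self.src, newline="") as f_in, \
             open(self.dst, "w", newline="") as f_out:
            writer = csv.writer(f_out)
            for row in csv.reader(f_in):
                if not any(row):     # treat blank rows as errors
                    self.errors.append(row)
                    continue
                writer.writerow(row)
                self.rows_processed += 1

    def summary(self) -> str:
        return f"Processed {self.rows_processed} rows, {len(self.errors)} errors"

# a small input file so the sketch runs on its own
with open("orders.csv", "w") as f:
    f.write("id,amount\n1,10\n\n2,20\n")

pipeline = CsvPipeline("orders.csv", "orders_out.csv")
pipeline.run()
print(pipeline.summary())  # Processed 3 rows, 1 errors
```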
Notice what changed. The logic is largely the same - load, filter, write. But now there’s state that belongs to this specific pipeline run: which file it’s reading, where it’s writing, how many rows it processed, what errors it encountered.
In the procedural version, you’d have to pass that state around as parameters or store it in global variables. Both options get messy fast. The class keeps it in one place, attached to the object that owns it.
The other thing you get is the ability to create multiple independent instances:
orders = CsvPipeline("orders.csv", "orders_out.csv")
customers = CsvPipeline("customers.csv", "customers_out.csv")
orders.run()
customers.run()
print(orders.summary()) # Processed 1420 rows, 3 errors
print(customers.summary()) # Processed 892 rows, 0 errors
Each pipeline has its own state. They don’t interfere with each other. You didn’t have to thread rows_processed through every function call, and you didn’t have to worry about one pipeline’s error list contaminating the other’s.
This is the practical argument for OOP - not that it’s more elegant in theory, but that it stops you from having to invent increasingly awkward ways to carry state through a procedural program as it grows.
You don’t need to adopt functional programming wholesale to benefit from it. The core idea - pure functions with no side effects - is useful regardless of what paradigm you’re working in.
A pure function is predictable. You can call it anywhere, pass it any input, and it will always behave the same way. You can test it in complete isolation. You can run it in parallel without worrying about race conditions. You can compose it with other functions without surprises.
The results.append() example is the kind of thing that looks harmless and causes headaches later. The function’s behaviour now depends on the state of results before it was called, and calling it twice produces different results. That’s the kind of subtle dependency that makes code hard to test and hard to reason about.
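The results.append() pattern mentioned above, contrasted with a pure alternative (names are illustrative):

```python
results = []

def record_impure(score: int):
    results.append(score)        # side effect: depends on external state

def record_pure(scores: list, score: int) -> list:
    return scores + [score]      # returns a new list; inputs untouched

record_impure(5)
record_impure(5)
print(results)                   # [5, 5] - the second call saw different state

print(record_pure([1], 5))       # [1, 5]
print(record_pure([1], 5))       # [1, 5] - same input, same output, every time
```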
This doesn’t mean you can never modify state - real programs have to write to files, update databases, send messages. The idea is to push those side effects to the edges of your code, and keep the logic in the middle as pure as possible.
List comprehensions are the most used functional feature in Python by a wide margin, and most people who write them don’t think of them as “functional” - they just think of them as the Pythonic way to build a list.
The pattern is: [expression for item in iterable if condition]. The condition is optional. The expression can be anything - a value, a function call, a method, a transformation.
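A few variations of that pattern (the data is made up):

```python
scores = [72, 45, 88, 51, 30]

passing = [s for s in scores if s >= 50]             # filter only
doubled = [s * 2 for s in scores]                    # transform only
labels = [f"score_{s}" for s in scores if s >= 50]   # both at once

print(passing)  # [72, 88, 51]
print(doubled)  # [144, 90, 176, 102, 60]
print(labels)   # ['score_72', 'score_88', 'score_51']
```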
They’re not always the right choice. If the logic inside gets complex enough that you need to squint to read it, write it as a loop. Readability is not a luxury. But for simple filter-and-transform operations, comprehensions are cleaner than the equivalent loop and explicit append.
You can also write dict comprehensions ({k: v for k, v in ...}) and set comprehensions ({x for x in ...}) using the same syntax.
map() and filter() are the canonical functional tools and you’ll encounter them regularly, especially in data code. They work by taking a function and an iterable, and applying the function to each element.
The lambda keyword creates a small anonymous function inline. lambda r: r["score"] >= 50 is equivalent to writing:
def is_passing(r):
    return r["score"] >= 50
Lambdas are convenient for short, throwaway functions. They become a problem when they get complex enough to need a name and a docstring - at that point, just write a proper function.
Both map() and filter() return lazy iterators in Python 3. They don’t compute anything until you ask for the values - by wrapping in list(), iterating in a loop, or consuming them some other way. This is memory-efficient for large datasets but can trip you up if you try to use the result twice without realising it’s been exhausted.
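Both points in one sketch - map/filter with lambdas, and iterator exhaustion (the data is illustrative):

```python
records = [{"score": 80}, {"score": 40}, {"score": 65}]

passing = filter(lambda r: r["score"] >= 50, records)  # lazy - nothing computed yet
scores = map(lambda r: r["score"], passing)            # still nothing

first = list(scores)    # [80, 65] - consuming the iterator does the work
second = list(scores)   # [] - the iterator is already exhausted
print(first, second)
```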
For most day-to-day work, list comprehensions are more readable than map()/filter(). But you’ll see map() and filter() throughout existing codebases, so you need to be able to read them.
This is a subtle but powerful idea. In many languages, functions are a special construct that you define and call - but you can’t pass them around or store them in variables the way you can with data. In Python, a function is just an object like any other. You can assign it to a variable, put it in a list, pass it to another function, return it from a function.
This becomes enormously useful in data pipelines. Instead of writing a separate function for each transformation and calling each one explicitly, you can build a list of transformation functions and apply them in sequence:
pipeline = [str.strip, str.lower, lambda s: s.replace(" ", "_")]
def clean(value: str, steps: list) -> str:
    for step in steps:
        value = step(value)
    return value

clean(" Hello World ", pipeline)  # 'hello_world'
The logic of what gets applied is separate from which transformations are applied. You can swap out, reorder, or extend the pipeline without touching clean().
More advanced topics
This is the first article in what will be a series about software development with Python.
The next one, Just Python Things, is already out and covers how Python works under the hood, along with a collection of Python features, quirks, and gotchas - and how to use or avoid them.
Future articles will cover a range of topics and practical skills that go beyond the basics of programming and Python syntax. The goal is to provide a roadmap for anyone who wants to level up from writing scripts to building maintainable, scalable software.
A clash of visions will explore the different programming paradigms - procedural, object-oriented, and functional - through a practical example. We’ll see how the same problem can be solved in each style, and discuss the trade-offs and when to choose one over the others.
Getting your hands dirty will be a deep dive into the practical tools and best practices for Python development. We’ll cover project setup, packaging, linting, formatting, and the essential tools that make development smoother and more efficient.
Trust but verify will focus on unit testing, integration testing, and test-driven development, ensuring your code is reliable and maintainable.
Building for larger systems will explore design patterns and software architecture, helping you structure your code for scalability and maintainability.
Watching the gears turn will cover performance optimisation, load testing, and profiling, as well as logging, monitoring, and debugging techniques.
Thanks for reading!
If you found this useful, please share it with your network. If you have any questions or suggestions for future topics, get in touch - I’d love to hear from you.