Refactoring with GitHub Copilot: A Developer’s Perspective


Refactoring is like tidying up your workspace — it’s not glamorous, but it makes everything easier to work with. It’s the art of changing your code without altering its behavior, focusing purely on making it cleaner, more maintainable, and easier for developers (current and future) to understand. And in this day and age, we have a nifty assistant to make this process smoother: GitHub Copilot.

In this post, I’ll walk you through how GitHub Copilot can assist with refactoring, using a few straightforward examples in JavaScript. Whether you’re consolidating redundant code, simplifying complex logic, or breaking apart monolithic functions, Copilot can help you identify patterns, suggest improvements, and even write some of the boilerplate for you.


Starting Simple: Merging Redundant Functions

Let’s start with a basic example of refactoring to warm up. Imagine you’re handed a file with two nearly identical functions:

function foo() {
  console.log("foo");
}

function bar() {
  console.log("bar");
}

foo();
bar();

At first glance, there’s nothing technically wrong here — the code works fine, and the output is exactly as expected:

foo
bar

But as developers, we’re trained to spot redundancy. These functions have similar functionality; the only difference is the string they log. This is a great opportunity to refactor.

Here’s where Copilot comes into play. Instead of manually typing out a new consolidated function, I can prompt Copilot to assist by starting with a more generic structure:

function displayString(message) {
  console.log(message);
}

With Copilot’s suggestion for the function and a minor tweak to the calls, our refactored code becomes:

function displayString(message) {
  console.log(message);
}

displayString("foo");
displayString("bar");

The output remains unchanged:

foo
bar

But now, instead of maintaining two functions, we have one reusable function. The file size has shrunk, and the code is easier to read and maintain. This is the essence of refactoring — the code’s behavior doesn’t change, but its structure improves significantly.

Refactoring for Scalability: From Hardcoding to Dynamic Logic

Now let’s dive into a slightly more involved example. Imagine you’re building an e-commerce platform, and you’ve written a function to calculate discounted prices for products based on their category:

function applyDiscount(productType, price) {
  if (productType === "clothing") {
    return price * 0.9;
  } else if (productType === "grocery") {
    return price * 0.8;
  } else if (productType === "electronics") {
    return price * 0.85;
  } else {
    return price;
  }
}

console.log(applyDiscount("clothing", 100)); // 90
console.log(applyDiscount("grocery", 100));  // 80

This works fine for a few categories, but imagine the business adds a dozen more. Suddenly, this function becomes a maintenance headache. Hardcoding logic is fragile and hard to extend. Time for a refactor.

Instead of writing this logic manually, I can rely on Copilot to help extract the repeated logic into a reusable structure. I start by typing the intention:

function getDiscountForProductType(productType) {
  const discounts = {
    clothing: 0.1,
    grocery: 0.2,
    electronics: 0.15,
  };

  return discounts[productType] || 0;
}

Here, Copilot automatically fills in the logic for me based on the structure of the original function. Now I can refactor applyDiscount to use this helper function:

function applyDiscount(productType, price) {
  const discount = getDiscountForProductType(productType);
  return price - price * discount;
}

The behavior is identical, but the code is now modular, readable, and easier to extend. Adding a new category no longer requires editing a series of else if statements; I simply update the discounts object.
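For instance, suppose the business later adds a "books" category at 25% off (both the category and the rate are hypothetical): the change is a single new entry in the lookup object, and applyDiscount picks it up automatically:

const discounts = {
  clothing: 0.1,
  grocery: 0.2,
  electronics: 0.15,
  books: 0.25, // hypothetical new category; no other code changes needed
};

console.log(applyDiscount("books", 100)); // 75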

Refactoring with an Eye Toward Extensibility

A good refactor isn’t just about shrinking code — it’s about making it easier to extend in the future. Let’s add another layer of complexity to our discount example. What if we need to display the discount percentage to users, not just calculate the price?

Instead of writing separate hardcoded logic for that, I can reuse the getDiscountForProductType function:

function displayDiscountPercentage(productType) {
  const discount = getDiscountForProductType(productType);
  return `${discount * 100}% off`;
}

console.log(displayDiscountPercentage("clothing")); // "10% off"
console.log(displayDiscountPercentage("grocery"));  // "20% off"

By structuring the code this way, we’ve separated concerns into clear, modular functions:

• getDiscountForProductType handles the core data logic.

• applyDiscount uses it for price calculation.

• displayDiscountPercentage uses it for user-facing information.

With Copilot, this process becomes even faster — it anticipates repetitive patterns and can suggest these refactors before you even finish typing.

Code Smells: Sniffing Out the Problems in Your Codebase

If refactoring is the process of cleaning up your code, then code smells are the whiff of trouble that alerts you something isn’t quite right. A code smell isn’t necessarily a bug or an error—it’s more like that subtle, lingering odor of burnt toast in the morning. The toast is technically edible, but it might leave a bad taste in your mouth. Code smells are signs of potential problems, areas of your code that might function perfectly fine now but could morph into a maintenance nightmare down the line.

One classic example of a code smell is the long function. Picture this: you open a file and are greeted with a function that stretches on for 40 lines or more, with no break in sight. It might validate inputs, calculate prices, apply discounts, send emails, and maybe even sing “Happy Birthday” to the user if it has time. Sure, it works, but every time you come back to it, you feel like you’re trying to untangle Christmas lights from last year. This is not a good use of anyone’s time.

Let’s say you have a function in your e-commerce application that processes an order. It looks something like this:

function processOrder(order) {
  // Validate the order
  if (!order.items || order.items.length === 0) {
    return { success: false, error: "Invalid order" };
  }

  // Calculate the total price
  let totalPrice = 0;
  for (const item of order.items) {
    totalPrice += item.price;
  }

  // Apply shipping: orders over 50 ship free
  const shippingCost = totalPrice > 50 ? 0 : 5;
  const finalPrice = totalPrice + shippingCost;

  // Notify the customer
  console.log(`Order confirmed. Total: ${finalPrice}`);

  return { success: true, total: finalPrice };
}

Now, this is fine for a small project. It’s straightforward, gets the job done, and even has some comments in case your future self forgets what you were doing. But here’s the thing: this function is doing too much. It’s responsible for validation, pricing, shipping, and notifications, which are all distinct responsibilities. And if you were to write unit tests for this function, you’d quickly realize the pain of having to mock all these operations in one giant monolithic test.

Refactoring is the natural response to a code smell like this. The first step? Take a deep breath and start breaking things down. You could extract the validation logic, for example, into a separate function:

function validateOrder(order) {
  // Validation logic
  return order.items && order.items.length > 0;
}

With that in place (and the same treatment applied to the pricing, shipping, and notification logic), the processOrder function becomes much simpler and easier to read:

function processOrder(order) {
  if (!validateOrder(order)) {
    return { success: false, error: "Invalid order" };
  }

  const totalPrice = calculateTotalPrice(order);
  const shippingCost = applyShipping(totalPrice);
  const finalPrice = totalPrice + shippingCost;

  sendOrderNotification(order);

  return { success: true, total: finalPrice };
}

That’s the beauty of refactoring—it’s like untangling those Christmas lights one loop at a time. The functionality hasn’t changed, but you’ve cleared up the clutter, making it easier for yourself and others to reason about the code.

Refactoring Strategies: Making the Codebase a Better Place

Refactoring is more than just cleaning up code smells. It’s about thinking strategically, looking at the long-term health of your codebase, and asking yourself, “How can I make this code easier to understand and extend?”

One of the most satisfying refactoring strategies is composing methods—taking large, unwieldy functions and breaking them into smaller, single-purpose methods. The processOrder example above is just the beginning. You can keep going by breaking out more logic, like the price calculation:

function calculateTotalPrice(order) {
  return order.items.reduce((total, item) => total + item.price, 0);
}

function applyShipping(totalPrice) {
  return totalPrice > 50 ? 0 : 5;
}

Each of these smaller functions has one responsibility and is easier to test in isolation. If the shipping rules change tomorrow, you only need to touch the applyShipping function, not the entire processOrder logic. This approach doesn’t just make your life easier—it creates code that can adapt to change without a cascade of unintended consequences.

Another common refactoring strategy is removing magic numbers—those cryptic constants that are scattered throughout your code like tiny landmines. Numbers like 50 in the shipping calculation or 0.9 in the discount example might make sense to you now, but future-you (or your poor colleague) will have no idea why they were chosen. Instead, extract them into meaningful constants:

const FREE_SHIPPING_THRESHOLD = 50;

function applyShipping(totalPrice) {
  return totalPrice > FREE_SHIPPING_THRESHOLD ? 0 : 5;
}

Now the intent is clear, and the code is easier to maintain. If the free shipping threshold changes to 60, you know exactly where to update it.

The Art of Balancing Refactoring with Reality

Here’s the thing about refactoring: it’s not just about following rules or tidying up for the sake of it. It’s about balancing effort and benefit. Not every piece of messy code is worth refactoring, and not every refactor is worth the time it takes. This is where tools like GitHub Copilot come into play.

Copilot doesn’t just suggest code—it suggests possibilities. You can ask it questions like, “How can I make this code easier to extend?” or “What parts of this file could be refactored?” and it will provide ideas. Sometimes those ideas are spot on, like extracting a repetitive block of logic into a helper function. Other times, Copilot might miss the mark or suggest something you didn’t need—but that’s part of the process. You’re still the one in charge.

One of the most valuable things Copilot can do is help you spot patterns in your codebase. Maybe you didn’t realize you’ve written the same validation logic in three different places. Maybe it points out that your processOrder function could benefit from splitting responsibilities into separate classes. These suggestions save you time and let you focus on the bigger picture: writing code that is clean, clear, and maintainable.

The Art of Refactoring: Simplifying Complexity with Clean Code and Design Patterns

As codebases grow, they tend to become like overgrown gardens—what started as neat and tidy often spirals into a chaotic mess of tangled logic and redundant functionality. This is where the true value of refactoring lies: it’s the art of pruning that overgrowth to reveal clean, elegant solutions without altering the functionality. But how do we take a sprawling codebase and turn it into something manageable? How do we simplify functionality, adopt clean code principles, and apply design patterns to improve both the current and future state of the code? Let’s dive in.

Simplifying Functionality: A Journey from Chaos to Clarity

Imagine you’re maintaining a large JavaScript application, and you stumble upon a class that handles blog posts. The class is tightly coupled to an Author class, accessing its properties directly to format author details for display. At first glance, it works fine, but this coupling is a ticking time bomb. The BlogPost class has a bad case of feature envy—it’s way too interested in the internals of the Author class. This isn’t just a code smell; it’s an opportunity to refactor.

Initially, you might be tempted to move the logic for formatting author details into a new method inside the Author class. That’s a solid first step:

class Author {
  constructor(name, bio) {
    this.name = name;
    this.bio = bio;
  }

  getFormattedDetails() {
    return `${this.name} - ${this.bio}`;
  }
}

class BlogPost {
  constructor(author, content) {
    this.author = author;
    this.content = content;
  }

  display() {
    return `${this.author.getFormattedDetails()}: ${this.content}`;
  }
}

Here, the getFormattedDetails method centralizes the responsibility of formatting author details inside the Author class. While this improves the code, it still assumes a single way to display author details, which can become limiting if the requirements change.

To simplify further and prepare for future flexibility, you might introduce a dedicated display class:

class AuthorDetailsFormatter {
  format(author) {
    return `${author.name} - ${author.bio}`;
  }
}

class BlogPost {
  constructor(author, content, formatter) {
    this.author = author;
    this.content = content;
    this.formatter = formatter;
  }

  display() {
    return `${this.formatter.format(this.author)}: ${this.content}`;
  }
}

By separating the formatting logic into its own class, you’ve decoupled the blog post from the author’s internal representation. Now, if a new formatting requirement arises—say, displaying the author’s details as JSON—you can create a new formatter class without touching the BlogPost or Author classes. This approach embraces the Single Responsibility Principle, one of the core tenets of clean code.

Refactoring with Clean Code Principles

At the heart of refactoring lies the philosophy of clean code, a set of principles that guide developers toward clarity, simplicity, and maintainability. Clean code isn’t just about making things pretty; it’s about making the code easier to read, understand, and extend. A few core principles of clean code shine during refactoring:

Readable Naming Conventions

Naming is one of the hardest parts of coding, and yet it’s one of the most important. Names like doStuff or process might make sense when you write them, but six months later, they’re as opaque as a foggy morning. During refactoring, take the opportunity to rename variables, functions, and classes to better describe their purpose. For instance:

// Before refactoring
function calc(num, isVIP) {
  if (isVIP) return num * 0.8;
  return num * 0.9;
}

// After refactoring
function calculateDiscount(price, isVIP) {
  const discountRate = isVIP ? 0.2 : 0.1;
  return price * (1 - discountRate);
}

Avoiding Magic Numbers

Numbers like 0.8 or 0.9 might mean something to you now, but they’ll confuse future readers. Extract them into meaningful constants:

const VIP_DISCOUNT = 0.2;
const REGULAR_DISCOUNT = 0.1;

function calculateDiscount(price, isVIP) {
  const discountRate = isVIP ? VIP_DISCOUNT : REGULAR_DISCOUNT;
  return price * (1 - discountRate);
}

Minimizing Conditionals

Nested conditionals are a prime candidate for refactoring. Instead of deep nesting, consider a lookup table:

const discountRates = {
  regular: 0.1,
  vip: 0.2,
};

function calculateDiscount(price, customerType) {
  const discountRate = discountRates[customerType] || 0;
  return price * (1 - discountRate);
}

This approach not only simplifies the code but also makes it easier to add new customer types in the future.

Design Patterns: The Backbone of Robust Refactoring

Refactoring is also an opportunity to introduce design patterns, reusable solutions to common problems that improve the structure and clarity of your code. For example:

In the blog post example, the formatting logic was moved to a dedicated class. But what if you need multiple formatting strategies? Enter the Strategy Pattern:

class JSONFormatter {
  format(author) {
    return JSON.stringify({ name: author.name, bio: author.bio });
  }
}

class TextFormatter {
  format(author) {
    return `${author.name} - ${author.bio}`;
  }
}

// BlogPost remains unchanged

With this pattern, adding a new formatting style is as simple as creating another formatter class.
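To see the pattern in action, here's a minimal usage sketch (the author's name and post content are invented for illustration):

const author = new Author("Jane Doe", "Writes about clean code");
const post = new BlogPost(author, "Refactoring 101", new TextFormatter());
console.log(post.display()); // "Jane Doe - Writes about clean code: Refactoring 101"

// Swapping strategies requires no changes to BlogPost or Author
const jsonPost = new BlogPost(author, "Refactoring 101", new JSONFormatter());
console.log(jsonPost.display());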

When creating complex objects, the Factory Pattern can streamline object instantiation. For example, if your BlogPost needs an appropriate formatter based on the context, a factory can help:

class FormatterFactory {
  static getFormatter(formatType) {
    switch (formatType) {
      case "json":
        return new JSONFormatter();
      case "text":
        return new TextFormatter();
      default:
        throw new Error("Unknown format type");
    }
  }
}
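A quick usage sketch (reusing the hypothetical author from the earlier example) shows the factory picking the right strategy from a simple string:

const author = new Author("Jane Doe", "Writes about clean code"); // hypothetical data
const formatter = FormatterFactory.getFormatter("json");
console.log(formatter.format(author)); // {"name":"Jane Doe","bio":"Writes about clean code"}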

Objectives and Advantages of Refactoring

At its core, refactoring aims to achieve two things:

  • Make the code easier to understand: Clear code leads to fewer bugs and faster development.
  • Make the code easier to extend: Flexible code lets you adapt to new requirements with minimal changes.

The advantages go beyond just clean aesthetics:

  • Reduced technical debt: Refactoring prevents small problems from snowballing into major issues.
  • Improved collaboration: Clean, readable code is easier for teams to work with.
  • Better performance: Streamlined logic often results in faster execution.
  • Future-proofing: Decoupled, modular code is better equipped to handle future changes.

Harnessing the Power of GitHub Copilot for Refactoring: Strategies, Techniques, and Best Practices

Refactoring is a developer’s silent crusade—an endeavor to bring clarity and elegance to code that’s grown unruly over time. And while the craft of refactoring has always been a manual, often meditative process, GitHub Copilot introduces a new ally into the mix. It’s like having a seasoned developer looking over your shoulder, suggesting improvements, and catching things you might miss. But as with any powerful tool, knowing how to wield it effectively is key to maximizing its benefits.

When embarking on a refactoring journey with Copilot, the first step is always understanding your codebase. Before you even type a single keystroke, take a moment to navigate the existing code. What are its pain points? Where does complexity lurk? Identifying these areas is crucial because, like any AI, Copilot is only as good as the questions you ask it.

Let’s say you’re working on a function that calculates the total price of items in a shopping cart:

function calculateTotal(cart) {
  let total = 0;
  for (let i = 0; i < cart.length; i++) {
    if (cart[i].category === "electronics") {
      total += cart[i].price * 0.9;
    } else if (cart[i].category === "clothing") {
      total += cart[i].price * 0.85;
    } else {
      total += cart[i].price;
    }
  }
  return total;
}

This function works, but it’s a bit clunky. Multiple if-else conditions make it hard to add new categories or change existing ones. A great prompt to Copilot would be:

“Refactor this function to use a lookup table for category discounts.”

Copilot might suggest something like this:

const discountRates = {
  electronics: 0.1,
  clothing: 0.15,
};

function calculateTotal(cart) {
  return cart.reduce((total, item) => {
    const discount = discountRates[item.category] || 0;
    return total + item.price * (1 - discount);
  }, 0);
}

With this refactor, the function is now leaner, easier to extend, and more expressive. The original logic is preserved, but the structure is improved—a classic example of effective refactoring.

Techniques for Effective Refactoring with Copilot

Identifying Code Smells with Copilot

One of the underrated features of Copilot is its ability to identify code smells on demand. Ask it directly:

“Are there any code smells in this function?”

Copilot might highlight duplicated logic, overly complex conditionals, or potential performance bottlenecks. It’s like having a pair of fresh eyes every time you revisit your code.

Simplifying Conditionals and Loops

Complex conditionals and nested loops are ripe for refactoring. If you present a nested loop or a deep conditional to Copilot and ask:

“How can I simplify this logic?”

Copilot can suggest converting nested conditionals into a strategy pattern, or refactoring loops into higher-order functions like map, filter, or reduce. The result? Code that is not only more concise but also easier to read and maintain.

For example, converting a nested loop into a more functional approach:

// Before
for (let i = 0; i < orders.length; i++) {
  for (let j = 0; j < orders[i].items.length; j++) {
    console.log(orders[i].items[j].name);
  }
}

// After using Copilot's suggestion
orders.flatMap(order => order.items).forEach(item => console.log(item.name));

Removing Dead Code

Dead code is like that box in your attic labeled “Miscellaneous” — you don’t need it, but it’s still there. By asking Copilot:

“Is there any dead code in this file?”

It can point out unused variables, redundant functions, or logic that never gets executed. Cleaning this up not only reduces the file size but also makes the codebase easier to navigate.
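The specifics vary by codebase, but here's a small, made-up example of the kinds of things such a prompt can surface:

function getTotal(items) {
  const total = items.reduce((sum, item) => sum + item.price, 0);
  return total;
  console.log("total:", total); // unreachable: code after return never runs
}

// Defined but never called anywhere in the file, a candidate for deletion
function legacyGetTotal(items) {
  let total = 0;
  for (const item of items) total += item.price;
  return total;
}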

Refactoring Strategies and Best Practices with Copilot

Refactoring isn’t just about changing code; it’s about changing code wisely. Here are some strategies to guide your use of Copilot:

Start Small, Think Big

Begin with minor improvements. Change a variable name, simplify a function, or remove a bit of duplication. Use Copilot to suggest these micro-refactors. Over time, these small changes compound, leading to a more maintainable codebase.

Keep it Testable

Refactoring without tests is like renovating a house without checking the foundation. Before refactoring, ensure you have tests in place. If not, use Copilot to generate basic tests:

“Generate unit tests for this function.”

Once tests are in place, refactor with confidence, knowing that any unintended behavior changes will be caught.
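What Copilot produces depends on your test setup, but for the earlier applyDiscount function it might look something like this (a minimal sketch assuming a Jest-style runner):

describe("applyDiscount", () => {
  test("applies the 10% clothing discount", () => {
    expect(applyDiscount("clothing", 100)).toBe(90);
  });

  test("applies the 20% grocery discount", () => {
    expect(applyDiscount("grocery", 100)).toBe(80);
  });

  test("leaves unknown categories at full price", () => {
    expect(applyDiscount("furniture", 100)).toBe(100);
  });
});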

Use Design Patterns When Appropriate

Refactoring often reveals opportunities to introduce design patterns like Singleton, Factory, or Observer. Ask Copilot:

“Refactor this into a Singleton pattern.”

It can scaffold the structure, and you can then refine it to fit your needs. Design patterns not only organize your code better but also make it easier for other developers to understand the architecture at a glance.
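The exact scaffold varies, but a minimal modern-JavaScript Singleton might look like this (ConfigStore is a hypothetical name):

class ConfigStore {
  static #instance = null;

  static getInstance() {
    if (!ConfigStore.#instance) {
      ConfigStore.#instance = new ConfigStore();
    }
    return ConfigStore.#instance;
  }

  // Shared state lives on the single instance
  settings = {};
}

const a = ConfigStore.getInstance();
const b = ConfigStore.getInstance();
console.log(a === b); // true: every caller gets the same instance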

Document the Refactor

Every significant refactor deserves a comment or a commit message explaining the change. This isn’t just for others—it’s for you, too, six months down the line when you’re wondering why you made a change. Use Copilot to draft these messages:

“Draft a commit message explaining this refactor.”
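The draft it returns is a starting point you can edit; for the shopping-cart refactor above, it might look something like:

refactor(cart): replace if-else discount chain with a lookup table

- Extract category rates into a discountRates object
- Rewrite calculateTotal with Array.prototype.reduce
- No behavior change; covered by existing unit tests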

The Advantages of Refactoring with Copilot

Efficiency Boost

Refactoring, while necessary, can be time-consuming. Copilot accelerates the process by suggesting improvements and generating boilerplate code.

Learning and Mentorship

Copilot acts as a mentor, introducing you to best practices and modern JavaScript idioms you might not have discovered otherwise. It’s a way to learn by doing, with an intelligent assistant guiding the way.

Improved Code Quality

With Copilot’s help, you can consistently apply clean code principles, reduce technical debt, and enhance the overall quality of your codebase.

Enhanced Collaboration

Refactored code is easier for others to read and extend. A cleaner codebase fosters better collaboration and reduces onboarding time for new team members.

The Journey of Continuous Improvement

Refactoring with GitHub Copilot is a journey, not a destination. Each suggestion, each refactor, and each test is a step toward cleaner, more maintainable code. By integrating clean code principles, embracing design patterns, and leveraging Copilot’s AI-driven insights, you not only improve the current state of your code but also pave the way for a more robust and flexible future.

So, as you embark on your next refactor, invite Copilot to the table. Let it help you think critically about your code, suggest improvements, and enhance your productivity. Because at the end of the day, refactoring isn’t just about code—it’s about crafting a better experience for every developer who walks through the door after you.

Unraveling the Data Science, Machine Learning, AI, and Generative AI terminology: A Practical, No-Nonsense Guide


We often hear the buzzwords—Data Science, Machine Learning, AI, Generative AI—used interchangeably. Yet each one addresses a different aspect of how we handle, analyze, and leverage data. Whether you’re aiming to build predictive models, generate human-like text, or glean insights to drive business decisions, understanding the core concepts can be transformative. My goal here is to draw clear lines between these often-overlapping fields, helping us see how each fits into the bigger picture of turning data into something genuinely impactful. This is a vast and deep field… we’ll just scratch the surface.


Data Science: The Foundation and Bedrock

Data Science encompasses the methods and processes by which we extract insights from raw information. Think of it as the overarching discipline that ties together a blend of mathematics, programming, domain expertise, and communication. Data science sets the overall framework. Without robust data science practices, advanced models and analytics can be built on shaky or low-quality data. Its holistic approach—spanning from collection to interpretation—acts as the springboard for more specialised disciplines like machine learning and AI.

Data Collection

Data collection is the process of gathering data from diverse sources: databases, APIs, logs, spreadsheets, different types of documents, emails or even IoT devices.

Data Wrangling and Cleaning

After collection, we need to fix inconsistencies, handle missing values, and reshape data for analysis.

Exploratory Data Analysis (EDA)

We start exploring the data by generating initial statistics, histograms, or correlation plots to understand patterns. For example, noticing that sales spike during certain temperature ranges might prompt further investigation.
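To make that concrete, here's a toy sketch (in JavaScript, to stay consistent with the earlier examples; real-world EDA would more typically use Python or R) that computes the Pearson correlation between made-up temperature and sales figures:

// Pearson correlation coefficient between two equal-length series
function pearson(xs, ys) {
  const n = xs.length;
  const mean = (a) => a.reduce((s, v) => s + v, 0) / n;
  const mx = mean(xs);
  const my = mean(ys);
  let num = 0, dx2 = 0, dy2 = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - mx;
    const dy = ys[i] - my;
    num += dx * dy;
    dx2 += dx * dx;
    dy2 += dy * dy;
  }
  return num / Math.sqrt(dx2 * dy2);
}

// Hypothetical daily observations
const temperature = [18, 21, 24, 27, 30];
const sales = [120, 135, 160, 190, 210];
console.log(pearson(temperature, sales).toFixed(2)); // "0.99", a strong positive link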

Statistical Modelling and Visualisation

Working on the data, we might use regression, clustering, or significance tests to draw conclusions. One example is building a time-series model to forecast future product demand, then visualising the results for stakeholders.

Communication of Insights

We aim to tell the story behind the numbers. That’s what makes them useful. For instance, we might present a heatmap of sales correlated with local events, helping marketing teams optimize future campaigns. Practical examples include:

  • Finance: Identifying fraudulent transactions by analysing transaction histories.
  • Healthcare: Studying patient data to find risk factors for certain diseases.
  • Sports: Analysing player performance and in-game data to fine-tune strategies.

Machine Learning: Teaching Computers from Examples

In essence, machine learning is about creating algorithms that learn from existing data to make predictions, classifications, or decisions without explicit rule-based instructions. Usually, this implies the following:

  • Training a model with historical data (e.g., features and known outcomes).
  • Evaluating the model’s performance on unseen data to measure accuracy or error.
  • Deploying it so that, whenever new data arrives, the model can infer outcomes (like spam vs. not spam, or how likely a user is to buy a product).

Machine learning acts as the “engine” that can draw predictive or prescriptive power out of data. It’s a critical subset of data science and arguably the most dominant approach fuelling modern AI applications. Yet, keep in mind that ML solutions rely heavily on good data and clearly defined goals.

Generally, machine learning is divided into the following types:

  • Supervised Learning: Labelled data; input features come with known target labels. For instance, predicting house prices given square footage, location, and past sale prices.
  • Unsupervised Learning: Unlabelled data; the model tries to find structure on its own (clustering, dimensionality reduction). As an example, grouping customers into segments based on behaviour (loyalty, spending patterns) without any predefined categories.
  • Reinforcement Learning: An agent learns to perform actions in an environment to maximize rewards. An example would be a robotic arm learning to pick up objects more efficiently through trial and error, being awarded points when it succeeds.

Artificial Intelligence: The Big Umbrella

AI is the overarching concept of machines displaying “intelligent” behaviour—learning, problem-solving, adapting to new information—much like humans do (in theory).

Machine learning is a massive driver of modern AI, but AI historically includes:

  • Knowledge Representation: Systems that encode domain knowledge in symbolic forms, reasoning with logic or rules.
  • Planning and Decision-Making: Systems that figure out sequences of actions to achieve goals.
  • Natural Language Processing: Understanding and generating human language (which often merges with ML nowadays).
  • Expert Systems: Rule-based systems used in older medical diagnosis tools, for example.

In the modern world, we can see several applications of this:

  • Digital Assistants: Apple’s Siri, Amazon’s Alexa, Google Assistant interpreting voice commands and responding contextually.
  • Robotics: Drones adjusting flight paths to avoid obstacles or robots in warehouses sorting packages.
  • Autonomous Vehicles: Combining computer vision, sensor fusion, path planning, and real-time decision-making.

AI aspires to replicate or approach human-level capabilities—whether that’s understanding language, making judgments, or even creative pursuits. Machine learning is a primary fuel source for AI, but AI’s broader scope includes older, rule-based, or even logic-driven systems that might not be strictly data-driven.

Generative AI: The Future of Creation

Generative AI stands out as a specialised branch of machine learning that focuses on producing new, original outputs—text, images, music, code, you name it—rather than simply predicting a label or numeric value.

Generative AI models are designed to create data similar to the input data they are trained on. These models are categorized based on their architectures and the techniques they use. Here are the main types of models for generative AI:

Generative Adversarial Networks (GANs)

Generative Adversarial Networks (GANs) consist of two parts: a generator that creates fake data, such as images or videos, and a discriminator that tries to determine if the data is real (from a dataset) or fake (generated by the model). During the training process, the generator improves its ability to create realistic data while the discriminator becomes better at identifying fakes. This back-and-forth process helps both components improve over time. GANs are commonly used for image generation, such as creating realistic faces, generating deepfake videos, enhancing low-resolution images, and creating additional data for training other models. GANs are difficult to train and can sometimes get stuck creating only limited variations of data, a challenge known as mode collapse.

Variational Autoencoders (VAEs)

Variational Autoencoders (VAEs) are probabilistic models that encode input data into a latent space and then decode it back to reconstruct the original data. The latent space is regularized to ensure smooth interpolation between points. During training, VAEs optimize a combination of reconstruction loss and Kullback-Leibler (KL) divergence to align the latent space with a known distribution, such as a Gaussian. VAEs are commonly used for image synthesis, data compression, and anomaly detection. The data generated by VAEs may lack sharpness and fine details compared to GANs.

Diffusion Models

Diffusion models work by gradually adding noise to data during training and then learning how to reverse this process to generate new data. The training involves modeling the denoising process using Markov chains and neural networks. These models are widely used for high-quality image generation, such as in tools like DALL·E 2 and Stable Diffusion, as well as for creating videos and 3D models. Diffusion models are computationally expensive because the denoising process is sequential and requires significant resources.

Autoregressive Models

Autoregressive models generate data one step at a time by predicting the next value in a sequence based on previous values, such as text or pixel generation. Well-known examples include GPT for text generation and PixelCNN for image generation. These models are widely used for tasks like text generation (e.g., ChatGPT, GPT-3), audio generation (e.g., WaveNet), and image generation (e.g., PixelCNN, PixelRNN). While powerful, autoregressive models can be slow due to their sequential nature and are memory-intensive when dealing with long sequences.

Transformers

Transformer-based models use self-attention mechanisms to process data, making them highly effective for sequential and context-dependent tasks. Popular examples include GPT, BERT, T5, DALL·E, and Codex. These models are widely used for natural language generation, code generation, text-to-image generation, and protein folding, as seen in tools like AlphaFold. However, transformers require massive datasets and significant computational resources for training.

Normalising Flows

These models learn complex data distributions by applying a series of invertible transformations to map data to and from a simple distribution (e.g., Gaussian). Applications include density estimation, image synthesis and audio generation. This model type requires designing invertible transformations, which can limit flexibility.

Energy-Based Models (EBMs)

EBMs learn an energy function that assigns low energy to realistic data and high energy to unrealistic data. Data is generated by sampling from the learned energy distribution. They are used for image generation and density estimation. EBMs are computationally expensive and challenging to train.

Hybrid Models

Hybrid models combine features from multiple generative models to leverage their strengths. Examples include VAE-GANs, which combine VAEs and GANs to improve output quality and latent-space regularity, and diffusion-GANs, which pair diffusion processes with adversarial training. These models are used mostly in image synthesis and creative AI. Their main limitations are complexity in training and in tuning hyperparameters.

Putting It All Together

Think of these disciplines as layers:

  • Data Science: The overall process of collecting data, analyzing trends, and delivering actionable insights. If you want to answer “What happened and why?” or set up the foundation, data science is the starting point.
  • Machine Learning: A subset of data science, focusing on building predictive or classification models. If your goal is to forecast next quarter’s sales or detect fraudulent transactions, ML is your friend.
  • Artificial Intelligence: The broader concept of machines mimicking human-like intelligence—machine learning is a key driver here, but AI can also involve logic-based systems and planning that aren’t purely data-driven.
  • Generative AI: A cutting-edge slice of ML that specialises in creating content rather than just labelling or categorising. It’s fueling new possibilities in text, art, music, and code generation.

Wrapping It Up

Although people throw around terms like “Data Science,” “Machine Learning,” “AI,” and “Generative AI” as if they were interchangeable, each category has its unique function and goals. Data Science ensures data is properly handled and turned into insights, Machine Learning zeros in on building predictive or classification models, AI provides the grand blueprint for machines to emulate intelligent behavior, and Generative AI takes that further by crafting entirely new output.

As these fields keep converging, many real-world projects weave them together—like a data science foundation guiding ML-driven AI solutions with generative capabilities. The next decade likely holds even more hybrid use cases, bridging analysis, prediction, and creative generation. But by sorting out the distinctions now, you’ll be better equipped to navigate the opportunities (and challenges) on the horizon.

Decoding Big O: Analysing Time and Space Complexity with Examples in C#, JavaScript, and Python


Efficiency matters. Whether you’re optimising a search algorithm, crafting a game engine, or designing a web application, understanding Big O notation is the key to writing scalable, performant code. Big O analysis helps you quantify how your code behaves as the size of the input grows, both in terms of time and space (meaning memory usage).


Big O notation was introduced by German mathematician Paul Bachmann in the late 19th century and later popularised by Edmund Landau. Originally part of number theory, it was later adopted into computer science for algorithm analysis. The notation gets its name from the letter “O,” which stands for “Order” in mathematics: it describes the order of growth of a function as the input size grows larger. The “Big” emphasises that we are describing an upper bound. Applied to algorithms, Big O describes the upper bound of an algorithm’s growth rate as a function of input size, telling you how runtime or memory usage scales in the worst case.

Key Terminology:

  • Input Size (n): The size of the input data
  • Time Complexity: How the runtime grows with n
  • Space Complexity: How memory usage grows with n (see the sketch below)
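A quick way to build intuition is to count operations directly; this small sketch contrasts a linear scan with a nested quadratic scan as n grows:

function countLinearOps(n) {
  let ops = 0;
  for (let i = 0; i < n; i++) ops++; // one unit of work per element
  return ops;
}

function countQuadraticOps(n) {
  let ops = 0;
  for (let i = 0; i < n; i++) {
    for (let j = 0; j < n; j++) ops++; // n units of work per element
  }
  return ops;
}

for (const n of [10, 100, 1000]) {
  console.log(n, countLinearOps(n), countQuadraticOps(n));
}
// 10 10 100
// 100 100 10000
// 1000 1000 1000000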

Common Big O Classifications

These are common complexities, from most efficient to least efficient:

  • O(1), Constant Complexity: Performance doesn’t depend on input size.
  • O(log n), Logarithmic Complexity: Divides the problem size with each step.
  • O(n), Linear Complexity: Grows proportionally with the input size.
  • O(n log n), Log-Linear Complexity: The growth rate is proportional to n times the logarithm of n. It is often seen in divide-and-conquer algorithms that repeatedly divide a problem into smaller subproblems, solve them, and then combine the solutions.
  • O(n²) or O(nᵏ), Quadratic or Polynomial Complexity: Nested loops; performance scales with n², n³, and so on.
  • O(2ⁿ), Exponential Complexity: Grows exponentially; terrible for large inputs.
  • O(n!), Factorial Complexity: Explores all arrangements or sequences.

Analysing Time Complexity

Constant Time

In the first type of algorithm, regardless of input size, the execution time remains the same.

Example: C# (Accessing an Array Element)

int[] numbers = { 10, 20, 30, 40 };
Console.WriteLine(numbers[2]); // Output: 30

Accessing an element by index is O(1), as it requires a single memory lookup.

Logarithmic Time

The next most efficient case happens when the runtime grows logarithmically, typically in divide-and-conquer algorithms.

Example: JavaScript (Binary Search)

function binarySearch(arr, target) {
    let left = 0, right = arr.length - 1;
    while (left <= right) {
        let mid = Math.floor((left + right) / 2);
        if (arr[mid] === target) return mid;
        else if (arr[mid] < target) left = mid + 1;
        else right = mid - 1;
    }
    return -1;
}
console.log(binarySearch([1, 2, 3, 4, 5], 3)); // Output: 2

Each iteration halves the search space, making this O(log n).

Linear Time

Example: Python (Iterating Over a List)

numbers = [1, 2, 3, 4, 5]
for num in numbers:
    print(num)

The loop visits each element once, so the complexity is O(n).

Log-linear Time

This one calls for a more elaborate example. First, let’s break the term down:

  • n: This represents the linear work required to handle the elements in each step.
  • log n: This comes from the recursive division of the problem into smaller subproblems. For example, dividing an array into halves repeatedly results in a logarithmic number of divisions.

Example: JavaScript (Sorting Arrays with Merge Sort)

function mergeSort(arr) {
  // Base case: An array with 1 or 0 elements is already sorted
  if (arr.length <= 1) {
    return arr;
  }

  // Divide the array into two halves
  const mid = Math.floor(arr.length / 2);
  const left = arr.slice(0, mid);
  const right = arr.slice(mid);

  // Recursively sort both halves and merge them
  return merge(mergeSort(left), mergeSort(right));
}

function merge(left, right) {
  let result = [];
  let leftIndex = 0;
  let rightIndex = 0;

  // Compare elements from left and right arrays, adding the smallest to the result
  while (leftIndex < left.length && rightIndex < right.length) {
    if (left[leftIndex] < right[rightIndex]) {
      result.push(left[leftIndex]);
      leftIndex++;
    } else {
      result.push(right[rightIndex]);
      rightIndex++;
    }
  }

  // Add any remaining elements from the left and right arrays
  return result.concat(left.slice(leftIndex)).concat(right.slice(rightIndex));
}

// Example usage:
const array = [38, 27, 43, 3, 9, 82, 10];
console.log("Unsorted Array:", array);
const sortedArray = mergeSort(array);
console.log("Sorted Array:", sortedArray);

The array is repeatedly divided into halves until the subarrays contain a single element (base case), with complexity O(log n). The merge function combines two sorted arrays into a single sorted array by comparing elements (O(n)). This process is repeated as the recursive calls return, merging larger and larger sorted subarrays until the entire array is sorted.

Quadratic or Polynomial Time

In the simplest and most obvious examples, nested loops lead to quadratic growth.

Example: C# (Finding Duplicate Pairs)

int[] numbers = { 1, 2, 3, 1 };
for (int i = 0; i < numbers.Length; i++) {
    for (int j = i + 1; j < numbers.Length; j++) {
        if (numbers[i] == numbers[j]) {
            Console.WriteLine($"Duplicate: {numbers[i]}");
        }
    }
}

The outer loop runs n times, and for each iteration, the inner loop runs n-i-1 times. This results in O(n²).

Exponential Time

Generating all subsets (the power set) of a given set is a common example of exponential time complexity, as it involves exploring all combinations of a set’s elements.

Example: Python (Generating the Power Set)

def generate_subsets(nums):
    result = []

    def helper(index, current):
        # Base case: if we've considered all elements
        if index == len(nums):
            result.append(current[:])  # Add a copy of the current subset
            return

        # Exclude the current element
        helper(index + 1, current)

        # Include the current element
        current.append(nums[index])
        helper(index + 1, current)
        current.pop()  # Backtrack to explore other combinations

    helper(0, [])
    return result

# Example usage
input_set = [1, 2, 3]
subsets = generate_subsets(input_set)

print("Power Set:")
for subset in subsets:
    print(subset)

For a set of n elements, there are 2ⁿ subsets. Each subset corresponds to a unique path in the recursion tree, so the time complexity is O(2ⁿ).

Factorial Time

These algorithms typically involve problems where all possible permutations, combinations, or arrangements of a set are considered.

Example: JavaScript (Generating All Permutations)

function generatePermutations(arr) {
  const result = [];

  function permute(current, remaining) {
    if (remaining.length === 0) {
      result.push([...current]); // Store the complete permutation
      return;
    }

    for (let i = 0; i < remaining.length; i++) {
      const next = [...current, remaining[i]]; // Add the current element
      const rest = remaining.slice(0, i).concat(remaining.slice(i + 1)); // Remove the used element
      permute(next, rest); // Recurse
    }
  }

  permute([], arr);
  return result;
}

// Example usage
const input = [1, 2, 3];
const permutations = generatePermutations(input);

console.log("Permutations:");
permutations.forEach((p) => console.log(p));

For n elements, the algorithm explores all possible arrangements, leading to n! recursive calls.

Analysing Space Complexity

Space complexity evaluates how much additional memory an algorithm requires as the input grows.

Constant Space

An algorithm that uses the same amount of memory, regardless of the input size.

Example: Python (Finding the Maximum in an Array)

def find_max(arr):
    max_val = arr[0]
    for num in arr:
        if num > max_val:
            max_val = num
    return max_val

Only a fixed amount of memory is needed, regardless of the input size n.

Logarithmic Space

Typically found in recursive algorithms that reduce the input size by a factor (e.g., dividing by 2) at each step. Memory usage grows slowly as the input size increases.

Example: C# (Recursive Binary Search)

static void SearchIn(int target)
{
    int[] array = { 1, 3, 5, 7, 9, 11 };
    int result = BinarySearch(array, target, 0, array.Length - 1);
    if (result != -1)
    {
        Console.WriteLine($"Target {target} found at index {result}.");
    }
    else
    {
        Console.WriteLine($"Target {target} not found.");
    }
}

static int BinarySearch(int[] arr, int target, int low, int high)
{
    // Base case: target not found
    if (low > high)
    {
        return -1;
    }
    // Find the middle index
    int mid = (low + high) / 2;
    // Check if the target is at the midpoint
    if (arr[mid] == target)
    {
        return mid;
    }
    // If the target is smaller, search in the left half
    else if (arr[mid] > target)
    {
        return BinarySearch(arr, target, low, mid - 1);
    }
    // If the target is larger, search in the right half
    else
    {
        return BinarySearch(arr, target, mid + 1, high);
    }
}

Binary Search halves the search space at each step. The recursion depth, and therefore the stack space, grows logarithmically with the input size, resulting in O(log n).

Linear Space

Memory usage grows proportionally with the input size.

Example: JavaScript (Reversing an Array)

function reverseArray(arr) {
  let reversed = [];
  for (let i = arr.length - 1; i >= 0; i--) {
    reversed.push(arr[i]);
  }
  return reversed;
}

console.log(reverseArray([1, 2, 3])); // Output: [3, 2, 1]

The new reversed array requires space proportional to the input size.

Log-Linear Space

This type of algorithm requires memory proportional to n log n, often due to operations that recursively split the input into smaller parts while using additional memory to store intermediate results.

Example: Python (Merge Sort)

def merge_sort(arr):
    if len(arr) > 1:
        # Find the middle point
        mid = len(arr) // 2

        # Split the array into two halves
        left_half = arr[:mid]
        right_half = arr[mid:]

        # Recursively sort both halves
        merge_sort(left_half)
        merge_sort(right_half)

        # Merge the sorted halves
        merge(arr, left_half, right_half)

def merge(arr, left_half, right_half):
    i = j = k = 0

    # Merge elements from left_half and right_half into arr
    while i < len(left_half) and j < len(right_half):
        if left_half[i] <= right_half[j]:
            arr[k] = left_half[i]
            i += 1
        else:
            arr[k] = right_half[j]
            j += 1
        k += 1

    # Copy any remaining elements from left_half
    while i < len(left_half):
        arr[k] = left_half[i]
        i += 1
        k += 1

    # Copy any remaining elements from right_half
    while j < len(right_half):
        arr[k] = right_half[j]
        j += 1
        k += 1

# Example usage
if __name__ == "__main__":
    array = [38, 27, 43, 3, 9, 82, 10]
    print("Unsorted Array:", array)
    merge_sort(array)
    print("Sorted Array:", array)

The depth of recursion corresponds to the number of times the array is halved: for an array of size n, the recursion depth is log n. At each level, temporary arrays (left_half and right_half) are created for merging, requiring O(n) space per level. Because this implementation allocates new slices at every level, the total memory allocated across all log n levels is O(n) × O(log n) = O(n log n), which dominates the O(log n) recursion stack.

Quadratic or Polynomial Space

This case encompasses algorithms that require memory proportional to the square or another polynomial function of the input size.

Example: Python (Longest Common Subsequence)

def longest_common_subsequence(s1, s2):
    n, m = len(s1), len(s2)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if s1[i - 1] == s2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]

This algorithm requires a two-dimensional table storing solutions for every prefix pair of the two strings, so the space complexity is O(n × m), which is quadratic when the strings are of similar length.

Exponential Space

An algorithm with exponential space complexity typically consumes memory that grows exponentially with the input size. 

Example: JavaScript (Generating the Power Set)

function generatePowerSet(inputSet) {
  const powerSet = [];

  function helper(index, currentSubset) {
    if (index === inputSet.length) {
      // Store a copy of the current subset
      powerSet.push([...currentSubset]);
      return;
    }
    // Exclude the current element
    helper(index + 1, currentSubset);

    // Include the current element
    currentSubset.push(inputSet[index]);
    helper(index + 1, currentSubset);
    currentSubset.pop(); // Backtrack
  }

  helper(0, []);
  return powerSet;
}

// Example usage
const inputSet = [1, 2, 3];
const result = generatePowerSet(inputSet);

console.log("Power Set:");
console.log(result);

The recursion stack consumes O(n) space (depth of recursion). The memory for storing the power set is O(2ⁿ), which dominates the overall space complexity.

Factorial Space

These are algorithms found in problems that involve generating all permutations of a set.

Example: C# (Generating All Permutations)

static void Main(string[] args)
{
    var input = new List<int> { 1, 2, 3 };
    var permutations = GeneratePermutations(input);
    Console.WriteLine("Permutations:");
    foreach (var permutation in permutations)
    {
        Console.WriteLine(string.Join(", ", permutation));
    }
}

static List<List<int>> GeneratePermutations(List<int> nums)
{
    var result = new List<List<int>>();
    Permute(nums, 0, result);
    return result;
}

static void Permute(List<int> nums, int start, List<List<int>> result)
{
    if (start == nums.Count)
    {
        // Add a copy of the current permutation to the result
        result.Add(new List<int>(nums));
        return;
    }
    for (int i = start; i < nums.Count; i++)
    {
        Swap(nums, start, i);
        Permute(nums, start + 1, result);
        Swap(nums, start, i); // Backtrack
    }
}

static void Swap(List<int> nums, int i, int j)
{
    int temp = nums[i];
    nums[i] = nums[j];
    nums[j] = temp;
}

The algorithm generates n! permutations, and each permutation of length n is stored in the result list. For n elements this requires O(n × n!) memory, which is dominated by the factorial term.

Wrapping It Up

Big O notation is a cornerstone of writing efficient, scalable algorithms. By analysing time and space complexity, you gain insights into how your code behaves and identify opportunities for optimisation. Whether you’re a seasoned developer or just starting, mastering Big O equips you to write smarter, faster, and leaner code.

With this knowledge in your arsenal, you’re ready to tackle algorithm design and optimisation challenges with confidence.