Ernests Rudzitis - Blog

Teaching Drones to Ignore Bad Advice: Building a 'Gut-Feeling' for an AI Swarm

Tue, 08 Jul 2025 20:36:51 GMT

Introduction: A Single Lie Can Break the Team

Ever had that gut feeling that something isn't quite right? That a friend is telling you a tall tale? As it turns out, AI agents need that same intuition, especially when they work in teams. For my bachelor's thesis, I decided to tackle this very problem: how do you teach a robot to be skeptical?

A drone swarm's greatest strength is its teamwork, which relies on constant, trustworthy communication to perform complex tasks like search and rescue or coordinated surveillance. The drones continuously inform each other of their location, speed, and intentions so they can fly in tightly coordinated formations.

Drone swarms constantly share location and velocity data to maintain coordinated flight patterns

But here's the vulnerability: these AI-controlled teams trust everything they hear. What happens when that trust is broken?

A single compromised message can cascade through the entire swarm, breaking formation and potentially causing mission failure

The Problem: When Communication Goes Wrong

Drone swarms are basically teams of robots that constantly chat with each other. "I'm here," says Drone A. "I'm moving this way," reports Drone B. "Target spotted at these coordinates," announces Drone C. This constant communication allows them to fly in perfect formation, avoid collisions, and accomplish complex missions.

But here's the catch: what happens when that communication becomes unreliable?

Sensor malfunction: A drone's GPS starts reporting wrong locations
Communication noise: Radio interference corrupts the messages
Adversarial attacks: An enemy deliberately sends false information or jams communication links
Hardware failures: Equipment starts sending stale, outdated data

Any of these scenarios can cause the entire swarm to break formation, crash into each other, or fail their mission entirely.

The Traditional Solutions (And Why They're Not Enough)

Most existing approaches to this problem are heavy-handed:

Option 1: Rebuild Everything from Scratch. Train your AI systems to handle bad communication from day one. This works, but it's expensive, time-consuming, and means you can't use any of the excellent pre-trained models that already exist.

Option 2: Use Fixed Security Protocols. Implement cryptographic security and rigid verification systems. This can work for some scenarios, but it's inflexible and can't adapt to new types of problems.

Both approaches are like replacing your entire smartphone just because you occasionally get spam calls - a massive overreaction to what should be a manageable problem.

A Better Way: The Trust-Based Information Filtering (TIF) System

The three-step research approach: identify reliability checks, enable self-learning, and improve robustness

What if instead of retraining the entire system, we just gave each drone a smart assistant that could whisper, "Hey, that message seems wrong – maybe don't trust it"? That's essentially what the Trust-Based Information Filtering (TIF) system does. It's a lightweight "trust layer" that sits between incoming messages and the drone's decision-making brain.

It works in the following way:

Step 1: Learn What "Normal" Looks Like

The TIF system records thousands of interactions during normal operations to build a baseline of trustworthy behavior

The system watches the drone swarm during normal, ideal operations and builds a detailed profile of what trustworthy communication patterns look like. Think of it as learning the rhythm and flow of good teamwork.

Step 2: Spot the Anomalies

The system analyzes multiple types of features to assess message trustworthiness

When new messages come in, the system checks them against learned patterns using sophisticated anomaly detection algorithms. It looks into the following indicators:

Sudden, physically implausible changes in reported positions
Messages that don't match what neighboring drones are reporting
Communication patterns that violate physical laws

Building a "zone of trust" - messages falling within normal patterns are trusted, outliers are flagged
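To make the "zone of trust" idea concrete, here is a minimal sketch. It is not the detector used in the thesis. It assumes a single hand-picked feature (the distance a drone appears to move between two consecutive messages) and a simple three-standard-deviation rule. The function names `fit_baseline` and `is_trusted` and the threshold are illustrative assumptions.

```python
import math
from statistics import mean, stdev

def step_lengths(track):
    """Distance moved between consecutive reported (x, y) positions."""
    return [math.dist(a, b) for a, b in zip(track, track[1:])]

def fit_baseline(clean_track):
    """Summarise normal behaviour as the mean and std of the step length."""
    steps = step_lengths(clean_track)
    return mean(steps), stdev(steps)

def is_trusted(prev_pos, new_pos, baseline, z_threshold=3.0):
    """Trust a message if its implied step is within z_threshold stds of normal."""
    mu, sigma = baseline
    step = math.dist(prev_pos, new_pos)
    return abs(step - mu) <= z_threshold * sigma

# Positions recorded during clean, uncorrupted operation (made-up values)
clean = [(0.0, 0.0), (1.0, 0.0), (2.1, 0.0), (3.0, 0.0), (4.05, 0.0)]
baseline = fit_baseline(clean)

ok = is_trusted((4.05, 0.0), (5.0, 0.0), baseline)    # an ordinary step
bad = is_trusted((4.05, 0.0), (40.0, 0.0), baseline)  # an implausible jump
print(ok, bad)
```

A real system would look at several features at once (neighbor agreement, velocity consistency, timing), but the shape stays the same: learn what normal looks like, then flag whatever falls outside it.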

Step 3: Plausible Recovery

The TIF system in action: trusted messages pass through, untrusted ones trigger recovery mechanisms

When bad information is detected, instead of just throwing it away, the system tries to reconstruct plausible replacement data. It might use recent history to estimate where a drone probably is, or smooth out obvious noise.
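One simple way to "use recent history" is constant-velocity extrapolation. The sketch below assumes that messages arrive at evenly spaced time steps and that the drone keeps moving the way it did between its last two trusted reports. The helper names `recover_position` and `filter_message` are hypothetical, and the actual recovery mechanism in the thesis may differ.

```python
def recover_position(history):
    """Estimate the current (x, y) from trusted history, assuming constant velocity."""
    if not history:
        raise ValueError("need at least one trusted position")
    if len(history) == 1:
        # No velocity estimate available yet: fall back to the last known position
        return history[-1]
    (x0, y0), (x1, y1) = history[-2], history[-1]
    # The last observed displacement is applied once more
    return (2 * x1 - x0, 2 * y1 - y0)

def filter_message(history, reported, trusted):
    """Pass trusted reports through; replace untrusted ones with an estimate."""
    return reported if trusted else recover_position(history)

history = [(0.0, 0.0), (1.0, 0.5)]
estimate = filter_message(history, reported=(50.0, -3.0), trusted=False)
print(estimate)
```

The downstream policy never sees the suspicious value, only the plausible stand-in, which is what lets the trust layer sit in front of an unmodified pre-trained model.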

The Results: Small Improvements, Big Impact

To properly evaluate the TIF system, I needed to simulate the kinds of communication problems that real drone swarms might encounter in the field. So I introduced three types of deliberate "sabotage" into the simulation environment:

Message Freezing: This simulates scenarios like replay attacks or connection issues where the last known position keeps being broadcast even though the drone has moved.

Message Offset: This adds a consistent error to all reported values - like a drone whose GPS sensor has developed a persistent bias, always reporting positions that are off by a fixed amount in a particular direction.

Random Noise Injection: The most common real-world problem - Gaussian noise gets added to transmitted data, simulating everything from radio interference to minor sensor inaccuracies.
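The three corruption types are simple to express in code. The following is a rough sketch rather than the simulation code from the thesis. It assumes a message is just a 2D position, and the function names and parameters are made up for illustration.

```python
import random

def freeze(true_positions, freeze_from):
    """Keep broadcasting the position at index freeze_from for all later steps."""
    frozen = true_positions[freeze_from]
    return [p if i <= freeze_from else frozen for i, p in enumerate(true_positions)]

def offset(true_positions, bias):
    """Add the same fixed (bx, by) error to every reported position."""
    bx, by = bias
    return [(x + bx, y + by) for x, y in true_positions]

def add_noise(true_positions, sigma, rng):
    """Add independent zero-mean Gaussian noise to each coordinate."""
    return [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
            for x, y in true_positions]

path = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
frozen_path = freeze(path, freeze_from=1)
biased_path = offset(path, bias=(0.5, -0.5))
noisy_path = add_noise(path, sigma=0.1, rng=random.Random(0))
print(frozen_path)
print(biased_path)
```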

Each of these represents a different challenge for the trust system to detect and handle. Testing this system on drone formation flying tasks under these corrupted conditions showed promising results:

Mean formation error of the swarm formation with the baseline policy versus the policy enhanced by TIF system. Results are averaged across three distinct communication compromise types: noise, offset, and freeze. The TIF system consistently reduces formation error in all scenarios. (Lower is better).

Percentage improvement in mean formation error achieved by the TIF system, categorized by compromise type.

6.8% overall improvement in formation accuracy, with best performance against random noise

6.8% overall improvement in formation accuracy during communication failures
9.5% improvement against random noise (the most common real-world issue)
Consistent protection even as the percentage of bad messages increased

These numbers might look modest, but in the world of coordinated robotics I would argue that they are significant. A 6.8% improvement in formation accuracy could mean the difference between mission success and collision, between a successful rescue operation and a catastrophic failure.

Why This Matters

What makes this approach particularly exciting is its plug-and-play nature. You don't need to retrain your expensive, carefully-tuned AI models. You don't need to redesign your entire communication system. You just add this trust layer, let it learn from normal operations for a while, and it starts protecting your swarm.

This is especially important because:

Training multi-agent reinforcement learning (MARL) systems is expensive – we're talking weeks of computation and thousands of dollars
Many organizations already have working systems they'd rather enhance than replace
New types of attacks and failures emerge constantly – a system that can learn and adapt is more valuable than one with fixed defenses

The Broader Picture

While this research focused specifically on drone swarms, the underlying principles could apply to many other multi-agent AI systems:

Autonomous vehicle fleets sharing traffic information
Robot teams in warehouses or factories
Distributed AI systems making collective decisions
Smart city infrastructure coordinating traffic lights, sensors, and services

Anywhere you have AI agents that need to communicate and coordinate, you potentially need mechanisms to ensure that communication is trustworthy.

The Journey's End and What's Next?

The roadmap ahead: from 2D simulations to 3D reality, with smarter adaptation and recovery

This work represents a promising first prototype, but there's more to explore:

Testing in high-fidelity environments and eventually on real hardware
Continuous adaptation to handle dynamically changing missions
Handling more sophisticated adversaries that try to mimic normal behavior
Developing better recovery mechanisms using more advanced techniques, such as generative networks and temporal memory

The vision is a future where AI teams can maintain their coordination and effectiveness even in the face of unreliable, noisy, or malicious communication – where a single lie, quite literally, doesn't break the team.

The Takeaway

In our increasingly connected world of AI agents, the ability to distinguish trustworthy information from bad data isn't just a nice-to-have feature; it's essential for safety and mission success. The Trust-Based Information Filtering (TIF) system shows that we don't always need to start from scratch to build more robust AI systems. Sometimes, the best solution is teaching our AI agents the same skill humans have been developing for millennia: knowing when not to trust what they hear.

Color Meets Shape: Using Histograms of Oriented Gradients and Colors to Classify Flowers

Mon, 30 Dec 2024 21:27:56 GMT

Introduction

In this blog post, we will explore a fundamental object detection technique called Histograms of Oriented Gradients (HOG). Interestingly, I came across this method in a rather unconventional way: while browsing the comment section of a completely unrelated YouTube video. What began as a casual scroll quickly turned into an unexpected discovery, sparking my curiosity to dive deeper into the subject. After a reasonable amount of time exploring the concepts, applications, and implementation, I am excited to share my findings with you.

While the primary focus will be on understanding HOGs and their role in object detection, we will also take things a step further by applying this knowledge to a practical task: classifying some of the most common flowers found in the UK (17 flowers dataset). This choice of application is not arbitrary. After doing a quick search on Google Scholar and the Web, I noticed that while both HOG features and color histograms have been used individually or in combination with other techniques for plant and flower classification, there are relatively few articles that specifically explore the combination of HOG and color histograms for this purpose.

This pairing is particularly compelling because it allows us to capture both the shape and structural details of the flowers (by means of HOGs) alongside their rich and varied color patterns (via color histograms).

We will begin by exploring the implementation details with the help of visual elements (animations, images) and then proceed to build the functionality from scratch without relying on any third-party libraries. Finally, we will wrap up by creating the flower classifier.

By the time you finish this post, we will have built an image classifier that achieves an impressive 94% test accuracy!

Histograms of Oriented Gradients

HOG was introduced by Navneet Dalal and Bill Triggs in 2005. It became particularly famous for human detection applications and served as a foundational technique that influenced many modern computer vision approaches.

Histograms of Oriented Gradients are like edge detectors on steroids. They extract not only gradient magnitudes (which tell us how strongly pixel intensity changes) but also orientations (the direction of that change). These gradients and orientations are computed for local regions of an image, often called cells, and a histogram is calculated for each cell. Hence the name Histograms of Oriented Gradients. Details on this are covered in the following sections.

Dataset in question

The following image showcases a sample from our dataset which we will use throughout this post to illustrate key concepts. The example features a Daffodil flower, one of the most recognizable and vibrant blooms commonly found in the UK.

Pre-processing

As with any task in computer vision, or machine learning in general, we should start off by applying pre-processing steps. The original paper (which studied HOG performance for human detection) states the following:

Our 64×128 detection window includes about 16 pixels of margin around the person on all four sides.

The exact 64×128 size seems to have been chosen empirically for the task at hand, to include a reasonable margin around the pedestrian while matching the general scale of pedestrians in their dataset. The paper does not provide a theoretical justification for this particular window size. For our task, considering the varied shapes and close-up nature of the flower images, I opted to resize all images to a uniform 256×256 pixels, which keeps the input consistent and suits flower classification better.

Gradient computation

The next step is to compute the horizontal and vertical gradients, that is, the pixel intensity changes in the x and y directions. Gradient computation is a crucial step in the HOG feature extraction process. To calculate the gradients, we will not reinvent the wheel; instead we use specially selected kernels whose main purpose is to detect pixel intensity changes in either direction. The most commonly used kernels for this purpose are the [-1, 0, 1] kernel for the horizontal gradient and its transpose, [-1, 0, 1]ᵀ, for the vertical gradient. To apply these kernels, they are slid across the image in a process called convolution. At each pixel location, the kernel is centered, and the pixel values are multiplied by the corresponding kernel values. The results are then summed up to obtain the gradient value at that particular pixel. This process is repeated for every pixel in the image, resulting in two gradient maps: one for the horizontal gradient and another for the vertical gradient.
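As a minimal sketch of what sliding the [-1, 0, 1] kernel amounts to, the snippet below computes both gradient maps for a tiny grayscale patch using plain Python lists. It assumes pixels outside the image count as 0 (zero padding) and applies the kernel without flipping it, so each gradient is simply "next pixel minus previous pixel". The `gradients` helper is illustrative and separate from the full implementation later in this post.

```python
def gradients(image):
    """Return (gx, gy) for a 2D list of intensities using the [-1, 0, 1] kernel."""
    h, w = len(image), len(image[0])

    def px(y, x):
        # Zero padding: anything outside the image contributes 0
        return image[y][x] if 0 <= y < h and 0 <= x < w else 0.0

    gx = [[px(y, x + 1) - px(y, x - 1) for x in range(w)] for y in range(h)]
    gy = [[px(y + 1, x) - px(y - 1, x) for x in range(w)] for y in range(h)]
    return gx, gy

# A patch with a dark left half and a bright right half (one vertical edge)
patch = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
gx, gy = gradients(patch)
print(gx[1])  # middle row, horizontal gradient
print(gy[1])  # middle row, vertical gradient
```

In the middle row the horizontal gradient is non-zero around the dark-to-bright transition, while the vertical gradient is zero because nothing changes from top to bottom. The negative value in the last column is a side effect of the zero padding at the image border.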

Important note! Before performing this step, a decision must be made about whether to convert the input image to grayscale or keep it in its original multichannel form. If the image is converted to grayscale, the gradient computation is straightforward and performed on a single channel. However, if the image remains multichannel, gradients are computed separately for each channel, resulting in multiple gradient maps (one for each channel). Keeping the color information can lead to improved performance later down the line.

The animation demonstrates this process with the horizontal kernel on a small patch extracted from the sample image; the same idea applies to vertical gradient computation. Notably, as visible in the animation, regions with significant changes in pixel intensity tend to exhibit larger gradient values. These areas correspond to edges, boundaries, or transitions within the image. In our case, these changes often occur at the distinct edges of flower petals, where the shape of the petal creates sharp contrasts against the background or adjacent petals.

Final gradient computation (horizontal and vertical). Fun fact that might seem counterintuitive at first: the horizontal kernel measures intensity changes along the horizontal direction, so it lights up on vertical edges. Likewise, the vertical kernel measures changes along the vertical direction and highlights horizontal edges

Magnitude and orientation computation

With the gradient computation in place, we can now determine the magnitude value for each pixel. You might wonder: why bother calculating magnitude when we already have the horizontal and vertical gradients? The answer is simple: relying on just one gradient can miss the bigger picture, especially for edges that are angled or diagonal. To fully capture the strength of an edge, we combine the horizontal and vertical gradients using the good old Pythagorean theorem. It sounds harder than it is; it really is just a bit of high school math.

Important note! I chose to preserve all of the RGB channels of the image, therefore the final magnitude is the largest one amongst all of the image channels for a particular pixel.

After calculating the magnitude of the gradient, the next step is to compute the orientation of the gradient at each pixel. The orientation represents the direction of the edge and provides additional information about the structure and shape of the objects in the image.

The resulting angle is then typically converted to degrees for easier interpretation. In addition, a crucial decision must be made about whether to keep the angle signed (0° to 360°) or unsigned (0° to 180°, where opposite directions count as the same orientation). In the original paper the authors experimented with both and found that for human detection, unsigned gradients over 0° to 180° provided the best performance. I saw no perceptible accuracy difference between the two approaches for flower classification.
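For a single pixel, both quantities come down to two function calls. This is a small sketch using only the standard library. It assumes the unsigned convention, so the angle is folded into the range from 0° (inclusive) to 180° (exclusive). The helper name is made up for this example.

```python
import math

def magnitude_and_orientation(gx, gy):
    """Gradient strength and unsigned orientation (degrees) for one pixel."""
    magnitude = math.hypot(gx, gy)                     # sqrt(gx^2 + gy^2)
    angle = math.degrees(math.atan2(gy, gx)) % 180.0   # fold opposite directions together
    return magnitude, angle

m1, a1 = magnitude_and_orientation(3.0, 4.0)
m2, a2 = magnitude_and_orientation(-3.0, -4.0)  # same edge, opposite direction
print(m1, round(a1, 2))
print(m2, round(a2, 2))
```

Both calls describe the same edge seen from opposite sides, so under the unsigned convention they end up with the same magnitude and (up to floating-point rounding) the same orientation.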

Histogram computation

The next step in the HOG feature extraction pipeline is to create histogram representations of these gradients. To begin, the image is divided into smaller, local regions called 'cells', as mentioned in the introduction.

In the original paper, the authors used 8×8 pixel cells. Their justification was that 'relatively coarse spatial quantization suffices (8×8 pixel cells / one limb width)'. This reasoning suggests that the cell size should roughly correspond to the size of meaningful parts or features of the objects being detected. That aligns with my own findings: in my experiments with flower classification, a larger cell size of 16×16 gave better results for the task at hand. The cell size can be adjusted based on the characteristics of the objects being detected and the resolution of the images. When selecting it, it is important to ensure that the cells evenly divide the image. In other words, the image dimensions should be divisible by the cell size without remainder, so that all cells have the same size and there are no partial or incomplete cells at the edges of the image. For example, since our images are resized to 256×256 pixels, we could have chosen cell sizes of 8×8, 16×16, 32×32, or 64×64, as all of these evenly divide the image. On the other hand, if the image had dimensions of 150×150 pixels, a cell size of 16×16 would not be suitable, as it would result in uneven cells at the image boundaries.

Illustration of target image being divided into 16x16 cells

Great, now that we have divided our image into cells, we end up with 16×16×2 = 512 total values for each cell. This comes from the fact that each cell consists of 16×16 = 256 pixels, and for each pixel, we have calculated two key values - magnitude and orientation. With these values in hand, we can now move on to the more exciting part of calculating the histograms of oriented gradients for each cell!

A histogram is a visual representation of the distribution of quantitative data. To construct a histogram, the first step is to "bin" the range of values— divide the entire range of values into a series of intervals—and then count how many values fall into each interval

In HOG, histograms represent the distribution of gradient orientations within a particular cell. The gradient orientations are typically quantized into a fixed number of bins, commonly 9, as suggested in the original paper: 'increasing the number of orientation bins improves performance significantly up to about 9 bins.' With 180° divided into 9 bins, each bin covers a span of 20°. The histogram is constructed as follows: we take each pixel's gradient magnitude and add it to the bin that its orientation falls into. The resulting histogram for each cell provides a compact representation of dominant edges while abstracting away the exact spatial locations of the gradients within the cell.

Illustrative animation of the histogram creating process by 'binning' (on a 6x6 patch for visualization purposes)

Resulting visualization constructed from histograms
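The binning step described above can be sketched in a few lines. This version mirrors the simple scheme used in this post, where each pixel's full magnitude goes into the single bin its orientation falls in. The function name and the flat-list inputs are assumptions made to keep the example short.

```python
def cell_histogram(magnitudes, orientations, bins=9, max_angle=180.0):
    """Sum gradient magnitudes into orientation bins for one cell."""
    bin_width = max_angle / bins  # 20 degrees for 9 bins over 180 degrees
    histogram = [0.0] * bins
    for magnitude, orientation in zip(magnitudes, orientations):
        # The modulo keeps an orientation of exactly max_angle inside the range
        index = int(orientation // bin_width) % bins
        histogram[index] += magnitude
    return histogram

# Four pixels: one near 0 degrees, two between 20 and 40, one just below 180
hist = cell_histogram([1.0, 2.0, 0.5, 1.5], [5.0, 25.0, 30.0, 179.0])
print(hist)
```

Bin 0 covers 0° to 20°, bin 1 covers 20° to 40°, and so on up to bin 8, which covers 160° to 180°.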

Block normalization

We have arrived at the finishing touch for HOG computation: block normalization. Block normalization is a crucial technique used to further improve the invariance of HOG to changes in illumination and contrast.

The need for block normalization arises from the fact that the gradient magnitudes can vary significantly across different regions of an image due to variations in lighting conditions, shadows, and local contrast. The above mentioned variations can adversely affect the performance of a classifier because of the inconsistent object description. Block normalization helps to mitigate such an issue by normalizing the histogram values across larger spatial regions called 'blocks'.

A block is a group of adjacent cells, typically 2×2 or 3×3 cells, that are treated as a single unit for the purpose of normalization. In simple words, this means that we concatenate 4 or 9 histograms into one long list (forming 4×9 = 36 or 9×9 = 81 histogram values, respectively). Because a block spans several cells, it captures a wider spatial context than any single cell. The blocks overlap, meaning that each cell contributes to multiple blocks. The normalization procedure involves computing the L2 normalization (dividing the block by its Euclidean length) or the L2-Hys normalization of the block. The original paper reports that both methods perform similarly.
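Here is a small sketch of plain L2 normalization for one 2×2 block (36 values). The epsilon term is an assumption added to avoid dividing by zero on completely flat blocks, and `l2_normalize` is just an illustrative name.

```python
import math

def l2_normalize(block, eps=1e-6):
    """Scale a block of histogram values to (almost exactly) unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in block) + eps)
    return [v / norm for v in block]

# A 2x2 block of 9-bin histograms flattened to 36 values, mostly empty for clarity
block = [3.0, 4.0] + [0.0] * 34
brighter_block = [10.0 * v for v in block]  # same pattern under stronger contrast

normalized = l2_normalize(block)
normalized_brighter = l2_normalize(brighter_block)
print(round(normalized[0], 4), round(normalized[1], 4))
print(round(normalized_brighter[0], 4), round(normalized_brighter[1], 4))
```

Both blocks end up with practically identical values after normalization, which is exactly the invariance to contrast that this step is meant to provide.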

After block normalization is finished we have the final feature set that describes our object, the authors in the paper introduce them as HOG descriptors: 'We will refer to the normalized descriptor blocks as Histogram of Oriented Gradient (HOG) descriptors'.

Illustration of the block normalization process. The final feature count is 15×15×4×9 = 8100 (256/16 = 16 cells per side, so 16 − 2 + 1 = 15 blocks fit width- and height-wise; each block is made up of 4 cells, and each cell has 9 histogram values)
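The feature count can be double-checked with a short calculation. The sketch assumes square images, square cells, and blocks that slide one cell at a time, which is how the implementation later in this post behaves. The function name is hypothetical.

```python
def hog_descriptor_length(image_size=256, cell_size=16, block_cells=2, bins=9):
    """Number of values in the final HOG descriptor for a square image."""
    cells_per_side = image_size // cell_size           # 256 / 16 = 16
    blocks_per_side = cells_per_side - block_cells + 1  # 16 - 2 + 1 = 15
    values_per_block = block_cells * block_cells * bins  # 4 cells * 9 bins = 36
    return blocks_per_side * blocks_per_side * values_per_block

length_16 = hog_descriptor_length()            # settings used in this post
length_8 = hog_descriptor_length(cell_size=8)  # smaller cells, many more features
print(length_16, length_8)
```

Halving the cell size roughly quadruples the descriptor length, which is worth keeping in mind when tuning the cell size.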

Important note! While raw histograms create intuitive visualizations of gradient directions, the normalized features are essential for machine learning tasks. Visualizing normalized features is less common since they represent abstract, high-dimensional block-wise patterns optimized for classification rather than human interpretation.

Color histograms

Although color histograms are not the main focus of this blog post, they do expand our feature space with additional valuable characteristics. The process of creating color histograms closely mirrors the approach used for HOG. Here, instead of gradients, we 'bin' the pixel values from each image channel separately based on intensity ranges. As a result, we are left with 3 histograms, one each for the Red, Green and Blue channels, that provide a detailed representation of the color distribution within the image. For the task of flower classification I found that 64 bins per channel yielded the largest performance gain, and any further increase had little to no effect.
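A per-channel color histogram can be sketched as follows. The example assumes 8-bit RGB pixels given as `(r, g, b)` tuples and simply counts pixels per bin. Whether and how the counts are normalized afterwards is left open here, and the function name is illustrative.

```python
def color_histogram(pixels, bins=64):
    """Concatenated R, G and B histograms for an iterable of (r, g, b) pixels."""
    histograms = [[0] * bins for _ in range(3)]
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            # Map 0..255 onto 0..bins-1 (each bin covers 4 values when bins=64)
            histograms[channel][value * bins // 256] += 1
    return histograms[0] + histograms[1] + histograms[2]

pixels = [(255, 0, 0), (250, 4, 0), (0, 128, 255)]
features = color_histogram(pixels)
print(len(features))  # 3 channels * 64 bins
```

The resulting list of 192 values can be appended to the HOG descriptor to form one combined feature vector per image.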

Code implementation

There are countless libraries that implement the functionality discussed above for us. However, I believe implementing it yourself adds an extra layer of understanding and truly solidifies the concept. In the first subsection I present a straightforward, hands-on implementation that is easy to follow. In the second subsection we walk through an implementation that uses third-party libraries, which offer a much more optimized solution built on smart matrix multiplications. This is the recommended choice for practical applications.

Building it ourselves

pip install numpy Pillow matplotlib scipy

The following code demonstrates how to implement HOG feature extraction from scratch using Python.

This implementation is encapsulated in a HOGExtractor class, which:

Initializes Parameters: Sets up image size, cell size, block size, and the number of orientation bins required for HOG computation
Loads and Preprocesses the Image: Handles resizing and normalization to ensure consistent input
Computes Gradients: Uses kernels to calculate horizontal and vertical gradients, from which gradient magnitudes and orientations are derived
Builds Cell Histograms: Divides the image into smaller regions (cells) and computes a histogram of gradient orientations for each region
Performs Block Normalization: Slides across overlapping blocks of cells and performs normalization
Generates the Final Descriptor: Concatenates all normalized block histograms into a feature vector that represents the image and objects inside of it

import numpy as np
from PIL import Image, ImageDraw
import matplotlib.pyplot as plt
from scipy.signal import convolve2d

class HOGExtractor:
    def __init__(self, image_size=(256, 256), cell_size=(16, 16), block_size=(2, 2), band_count=9):
        self.IMAGE_RESIZE_SIZE = image_size
        self.CELL_SIZE = cell_size
        self.BLOCK_SIZE = block_size
        self.BAND_COUNT = band_count
        self.BIN_WIDTH = 180 / band_count
        
        # Centered 1-D derivative kernels [-1, 0, 1] (simpler than Sobel) for gradient computation
        self.horizontal_kernel = np.array([[-1, 0, 1]])
        self.vertical_kernel = np.array(self.horizontal_kernel.T)
        
        # Initialize computed attributes
        self.input_image = None
        self.resized_image = None
        self.gradient_magnitude = None
        self.gradient_orientation = None
        self.cell_histograms = None
        self.hog_descriptor = None
        
    def _load_image(self, pil_image):
        # The authors of HOG found an increase in accuracy
        # by taking into consideration all RGB channels, however we convert the image to grayscale for convenience of this example
        self.input_image = pil_image.convert('L')
        self.resized_image = np.array(self.input_image.resize(self.IMAGE_RESIZE_SIZE))
        self.resized_image = self.resized_image.astype(float)
        # Normalize the image pixel values to [0, 1]
        self.resized_image = (self.resized_image - self.resized_image.min()) / (self.resized_image.max() - self.resized_image.min())
        
    def _compute_gradients(self):
        # Apply the derivative kernels using convolution
        # mode='same' keeps the output the same size as the input image by zero-padding the borders
        horizontal_gradient = convolve2d(self.resized_image, self.horizontal_kernel, mode='same')
        vertical_gradient = convolve2d(self.resized_image, self.vertical_kernel, mode='same')
        
        # Calculate gradient magnitude and orientation
        self.gradient_magnitude = np.sqrt(horizontal_gradient**2 + vertical_gradient**2)
        self.gradient_orientation = np.arctan2(vertical_gradient, horizontal_gradient) * (180 / np.pi) % 180
        
    def compute_cell_histograms(self):
        # Calculate number of cells in each dimension
        cells_y = self.IMAGE_RESIZE_SIZE[1] // self.CELL_SIZE[1]
        cells_x = self.IMAGE_RESIZE_SIZE[0] // self.CELL_SIZE[0]
        
        # Initialize histogram array for all cells
        self.cell_histograms = np.zeros((cells_y, cells_x, self.BAND_COUNT))
        
        # Compute histograms for each cell
        for y in range(cells_y):
            for x in range(cells_x):
                # Get current cell coordinates
                y_start = y * self.CELL_SIZE[1]
                y_end = (y + 1) * self.CELL_SIZE[1]
                x_start = x * self.CELL_SIZE[0]
                x_end = (x + 1) * self.CELL_SIZE[0]
                
                # Get magnitudes and orientations for current cell
                cell_magnitudes = self.gradient_magnitude[y_start:y_end, x_start:x_end]
                cell_orientations = self.gradient_orientation[y_start:y_end, x_start:x_end]
                
                # Create histogram for current cell
                histogram = np.zeros(self.BAND_COUNT)
                
                # Go over each pixel in the cell
                for i in range(self.CELL_SIZE[1]):
                    for j in range(self.CELL_SIZE[0]):
                        orientation = cell_orientations[i, j]
                        magnitude = cell_magnitudes[i, j]
                        
                        # Compute bin index for current orientation, and add magnitude to corresponding bin
                        # The modulo guards against an orientation that rounds up to exactly 180 degrees
                        bin_index = int(orientation // self.BIN_WIDTH) % self.BAND_COUNT
                        histogram[bin_index] += magnitude
                        
                self.cell_histograms[y, x] = histogram
                
    def compute_hog_descriptor(self):
        # Calculate number of blocks
        cells_y = self.IMAGE_RESIZE_SIZE[1] // self.CELL_SIZE[1]
        cells_x = self.IMAGE_RESIZE_SIZE[0] // self.CELL_SIZE[0]
        blocks_y = cells_y - self.BLOCK_SIZE[0] + 1
        blocks_x = cells_x - self.BLOCK_SIZE[1] + 1
        
        # Initialize final HOG descriptor
        hog_descriptor = []
        
        # Slide the block window across cells
        for y in range(blocks_y):
            for x in range(blocks_x):
                # Get histograms for current block (2x2 cells)
                block_histograms = []
                for cell_y in range(self.BLOCK_SIZE[0]):
                    for cell_x in range(self.BLOCK_SIZE[1]):
                        cell_histogram = self.cell_histograms[y + cell_y, x + cell_x]
                        block_histograms.extend(cell_histogram)
                
                # Normalize block using L2 norm
                # Small epsilon value prevents division by zero
                block_histograms = np.array(block_histograms)
                l2_norm = np.sqrt(np.sum(block_histograms ** 2) + 1e-6)
                normalized_block = block_histograms / l2_norm
                
                # Add normalized block histograms to final descriptor
                hog_descriptor.extend(normalized_block)
                
        self.hog_descriptor = np.array(hog_descriptor)
        return self.hog_descriptor
    
    def extract_features(self, pil_image):
        self._load_image(pil_image)
        self._compute_gradients()
        self.compute_cell_histograms()
        return self.compute_hog_descriptor()
    
    def visualize(self):
        self._visualize_hog()
        
    def _visualize_hog(self):
        # Calculate dimensions
        cells_y = self.IMAGE_RESIZE_SIZE[1] // self.CELL_SIZE[1]
        cells_x = self.IMAGE_RESIZE_SIZE[0] // self.CELL_SIZE[0]
        
        # Create visualization
        vis_image = Image.new('RGB', self.IMAGE_RESIZE_SIZE, 'black')
        draw = ImageDraw.Draw(vis_image)
        
        cell_height, cell_width = self.CELL_SIZE
        line_length = min(cell_height, cell_width) // 2
        
        # Draw lines using raw cell histograms directly
        for y in range(cells_y):
            for x in range(cells_x):
                # Use raw cell histograms instead of normalized ones
                raw_histogram = self.cell_histograms[y, x]
                self._draw_cell_visualization(draw, x, y, cell_width, cell_height, 
                                        line_length, raw_histogram)
        
        self._show_visualization(vis_image, 'Raw HOG Visualization')
    
    # Private helper  function to draw cell visualization
    def _draw_cell_visualization(self, draw, x, y, cell_width, cell_height, line_length, histogram):
        cell_center_y = (y + 0.5) * cell_height
        cell_center_x = (x + 0.5) * cell_width
        
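        # Skip flat cells: an all-zero histogram would cause a division by zero below
        max_value = np.max(histogram)
        if max_value == 0:
            return

        for orientation_bin in range(self.BAND_COUNT):
            orientation = orientation_bin * (180 / self.BAND_COUNT)
            magnitude = histogram[orientation_bin]

            # Line length is scaled by the bin's share of the strongest bin in this cell
            radian = np.deg2rad(orientation)
            dx = line_length * np.cos(radian) * magnitude / max_value
            dy = line_length * np.sin(radian) * magnitude / max_value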
            
            draw.line([
                (cell_center_x - dx, cell_center_y - dy),
                (cell_center_x + dx, cell_center_y + dy)
            ], fill='white', width=1)
            
    # Private helper function to show visualization
    def _show_visualization(self, vis_image, title):
        plt.figure(figsize=(10, 15))
        plt.subplot(311)
        plt.title('Original Image')
        plt.imshow(self.input_image, cmap='gray')
        
        plt.subplot(312)
        plt.title(title)
        plt.imshow(vis_image)
        
        plt.tight_layout()
        plt.show()

Leveraging the Pros

pip install scikit-image numpy matplotlib

Why Use a Library? While implementing HOG from scratch deepens understanding, third-party libraries like scikit-image offer optimized implementations (in my testing, feature extraction was roughly 10 times faster).

from skimage.feature import hog
from skimage.transform import resize
import numpy as np
import matplotlib.pyplot as plt

class HOGExtractor:
    def __init__(self, image_size=(256, 256), cell_size=(16, 16), block_size=(2, 2), band_count=9):
        self.image_size = image_size
        self.input_image = None
        self.hog_image = None  # set by extract_features; checked in visualize
        self.cell_size = cell_size
        self.block_size = block_size
        self.band_count = band_count
        
    def extract_features(self, image):
        self.input_image = image
        img_array = np.array(image)
        img_array = resize(img_array, self.image_size)
        
        features, hog_image = hog(
            img_array,
            orientations=self.band_count,
            pixels_per_cell=self.cell_size,
            cells_per_block=self.block_size,
            visualize=True,
            channel_axis=-1
        )
        self.hog_image = hog_image
        return features
    
    def visualize(self):
        if self.input_image is None or self.hog_image is None:
            return
        
        plt.figure(figsize=(10, 5))
        
        plt.subplot(121)
        plt.title('Original Image')
        plt.imshow(self.input_image)
        plt.axis('off')
        
        plt.subplot(122)
        plt.title('HOG Visualization')
        plt.imshow(self.hog_image, cmap='gray')
        plt.axis('off')
        
        plt.tight_layout()
        plt.show()

import numpy as np
import matplotlib.pyplot as plt

class ColorHistogramExtractor:
    def __init__(self, bins=256, channels=3):
        self.bins = bins
        self.channels = channels
        self.image = None
        self.histograms = None
        self.colors = ['red', 'green', 'blue']
        self.channel_names = ['Red', 'Green', 'Blue']
        
    def load_image(self, pil_image):
        self.image = np.array(pil_image)
        return self
        
    def extract_features(self, image_array=None, normalize=True):
        if image_array is not None:
            self.load_image(image_array)
            
        self.histograms = []
        for channel in range(self.channels):
            histogram, _ = np.histogram(
                # Selects all pixels for a specific color channel (R, G, or B) using numpy's ellipsis notation
                # and flattens the 2D array of pixel values into a 1D array
                self.image[..., channel].ravel(),
                bins=self.bins, # divides the range into equal-width bins
                range=(0, 256)
            )
            
            if normalize:
                histogram = histogram / histogram.sum()
                
            self.histograms.append(histogram)
            
        return np.concatenate(self.histograms)

    def visualize(self):
        plt.figure(figsize=(10, 6))
        
        # Plot original image
        plt.subplot(2, 1, 1)
        plt.title('Original Image')
        plt.imshow(self.image)
        plt.axis('off')
        
        # Plot histograms as bars
        plt.subplot(2, 1, 2)
        plt.title('Color Histograms')
        
        x = np.linspace(0, 1, self.bins)  # Normalized x-axis [0,1]
        bar_width = 1.0 / self.bins
        
        for channel in range(self.channels):
            plt.bar(x, self.histograms[channel], 
                color=self.colors[channel], 
                label=self.channel_names[channel],
                alpha=0.3,
                width=bar_width)
            
        plt.xlabel('Pixel Intensity')
        plt.ylabel('Frequency')
        plt.legend()
        plt.grid(True, alpha=0.3)
        plt.xlim(0, 1)  # Set x-axis limits to [0,1]
        
        plt.tight_layout()
        plt.show()
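To see what the binning in extract_features does to a handful of pixel values, here is a small pure-Python sketch of the same idea for a single channel. It is a simplified stand-in rather than NumPy's implementation, and the helper name channel_histogram and the sample values are made up for illustration.

```python
def channel_histogram(values, bins=9, value_range=256, normalize=True):
    """Count 8-bit intensities into equal-width bins (hypothetical helper)."""
    counts = [0] * bins
    for v in values:
        # Map an intensity in [0, value_range) to a bin index in [0, bins)
        index = min(v * bins // value_range, bins - 1)
        counts[index] += 1
    if normalize:
        total = sum(counts)
        # Guard against an empty input so we never divide by zero
        return [c / total for c in counts] if total else [0.0] * bins
    return counts


# Four made-up red-channel values: two dark, one slightly brighter, one saturated
sample = [0, 28, 29, 255]
print(channel_histogram(sample, normalize=False))  # [2, 1, 0, 0, 0, 0, 0, 0, 1]
print(channel_histogram(sample))
```

With three channels and 9 bins each, the concatenated color feature ends up with 27 entries.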

Flower classifier

In this last section, we implement a flower classifier using HOG features, color histograms, and a Random Forest Classifier. The classifier is trained on the aforementioned 17 Category Flower Dataset, which consists of 17 flower classes. The following is a short overview of the workflow:

Feature Extraction: We use a combination of HOG features (to capture shape and texture information) and color histograms (to capture the color distribution of the images). Together, these features provide a rich representation of each flower image.
Training and Testing Splits: The original dataset ships with 3 different training, validation and test splits. For our use case, all of the training sets and all of the test sets were combined to form a single training set and a single test set, containing 2040 and 1040 samples respectively.
Model Pipeline: A pipeline is constructed from a Standard Scaler, which brings the feature values onto a similar scale in order to remove any potential bias, followed by a Random Forest Classifier.
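To get a feel for how large the combined feature vector is, the arithmetic can be sketched as below. This assumes the default extractor settings used in this post (256x256 images, 16x16 cells, 2x2 blocks, 9 orientations, 9 color bins) and that HOG blocks slide one cell at a time; the function names are only illustrative.

```python
def hog_feature_length(image_size=(256, 256), cell_size=(16, 16),
                       block_size=(2, 2), orientations=9):
    """Length of a HOG descriptor, assuming blocks slide one cell at a time."""
    cells_y = image_size[0] // cell_size[0]
    cells_x = image_size[1] // cell_size[1]
    # Number of block positions along each axis
    blocks_y = cells_y - block_size[0] + 1
    blocks_x = cells_x - block_size[1] + 1
    return blocks_y * blocks_x * block_size[0] * block_size[1] * orientations


def color_feature_length(bins=9, channels=3):
    """Length of the concatenated per-channel color histograms."""
    return bins * channels


hog_len = hog_feature_length()
color_len = color_feature_length()
print(hog_len, color_len, hog_len + color_len)  # 8100 27 8127
```

Under these assumptions the HOG part dominates the vector by a wide margin, which is worth keeping in mind when judging how much the color histograms can contribute.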

pip install scikit-learn scipy numpy pillow tqdm

from hog import HOGExtractor
from color_histogram import ColorHistogramExtractor
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
import os
import scipy.io
import numpy as np
from PIL import Image
from tqdm import tqdm

# Constants
IMAGE_SIZE = (256, 256)
BIN_COUNT = 9

# Load dataset splits
datasplits = scipy.io.loadmat('datasplits.mat')
dataset_path = './17flowers/jpg'
image_files = sorted([f for f in os.listdir(dataset_path) if f.endswith('.jpg')])

# Initialize feature extractors
hog_extractor = HOGExtractor(image_size=IMAGE_SIZE)
color_extractor = ColorHistogramExtractor(bins=BIN_COUNT)

def load_and_extract_features(indices):
    features_list = []
    labels = []
    
    for idx in tqdm(indices, desc="Extracting features"):
        # Load image
        img_path = os.path.join(dataset_path, image_files[idx-1])
        img = Image.open(img_path)
        
        # Create label (17 classes, indexed 0-16)
        label = (idx-1) // 80  # Each class has 80 images
        labels.append(label)
        
        # Extract features
        hog_features = hog_extractor.extract_features(img)
        color_features = color_extractor.extract_features(img)
        
        # Combine features
        combined_features = np.concatenate([hog_features, color_features])
        features_list.append(combined_features)
    
    return np.array(features_list), np.array(labels)

# The original dataset has 3 training, testing, validation splits
# We combine all of the training and testing data into a single set
all_train_indices = np.concatenate([
    datasplits[f'trn{split}'].ravel() 
    for split in range(1, 4)
])
all_test_indices = np.concatenate([
    datasplits[f'tst{split}'].ravel() 
    for split in range(1, 4)
])
    
# Extract features for the combined training and test sets
X_train, y_train = load_and_extract_features(all_train_indices)
X_test, y_test = load_and_extract_features(all_test_indices)

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', RandomForestClassifier(
        n_estimators=300,
        max_depth=50,
        min_samples_split=5,
        min_samples_leaf=2,
        max_features='sqrt',
        class_weight='balanced_subsample',
    ))
])

# Fit and evaluate
pipeline.fit(X_train, y_train)
train_accuracy = pipeline.score(X_train, y_train)
test_accuracy = pipeline.score(X_test, y_test)
print(f"Train accuracy: {train_accuracy:.3f}, Test accuracy: {test_accuracy:.3f}")

The training accuracy of 1.0 (100%) with a test accuracy of 0.946 (94.6%) indicates some overfitting, but this is not necessarily problematic here because of the nature of Random Forests. A Random Forest can reach 100% training accuracy through its tree structure and ensemble nature, especially since the chosen model parameters create many deep trees whose leaf nodes eventually contain samples from just one class.
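Because the combined sets are built by concatenating several predefined splits, it may be worth checking whether any image index ends up in both the training and the test set, since shared images would inflate the test accuracy. The sketch below is only a sanity check on made-up toy indices; whether any overlap exists in the real datasplits.mat depends on how the splits were drawn, which is not assumed here. The helper name overlap_report is hypothetical.

```python
def overlap_report(train_indices, test_indices):
    """Summarize how many indices appear in both index collections."""
    train_set, test_set = set(train_indices), set(test_indices)
    shared = train_set & test_set
    return {
        "unique_train": len(train_set),
        "unique_test": len(test_set),
        "shared": len(shared),
        "shared_fraction_of_test": len(shared) / len(test_set) if test_set else 0.0,
    }


# Toy stand-ins for two predefined splits (made-up indices, not the real data)
toy_train = [1, 2, 3, 4] + [3, 4, 5, 6]   # training parts of split A and split B
toy_test = [5, 6] + [7, 8]                # test parts of split A and split B

report = overlap_report(toy_train, toy_test)
print(report)
```

On the real data the same function could be called with all_train_indices and all_test_indices; a non-zero "shared" count would suggest evaluating on each split separately instead.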

Wrapping up

And that is a wrap! By now you should have a solid grasp of the inner workings and implementation of HOG, and of how it can be combined with color histograms to build a feature-rich representation of images. These techniques are fantastic for tasks like object detection and classification. The Random Forest classifier performed pretty well on the flower dataset, but there is always room for improvement: experimenting with other classifiers or tuning hyperparameters might yield even better results.

For those interested in exploring more, Local Binary Patterns (LBP) is another powerful texture descriptor worth looking into. It is a simple yet effective technique that can complement HOG in certain scenarios.

Ultimately, the beauty of these methods lies in their adaptability. There is always more to uncover, more to discover, and sometimes the most interesting results come from just trying out new combinations!

From Code to Silicon: A Haskell and Clash Odyssey

Sun, 01 Sep 2024 16:58:03 GMT

Introduction

Round two: FIGHT! Ready to clash again? If you have been following along, you already know we have covered some important ground. If not, no worries, feel free to hop back to part one to get up to speed, before joining the action. It will make everything we do here a lot easier to follow.

Anyway, this is the second installment of the series. As promised, we will start off by quickly diving into some fundamental Haskell concepts (nothing too heavy, just enough to make sure we are all on the same page) and then move on to the much-anticipated hardware development with Clash.

Haskell fundamentals

In imperative programming languages (C/C++, Java, Python, et cetera), we accomplish tasks by providing a sequence of statements to execute. We can have global state, we can define and redefine variables, and we can use various control flow statements. In purely functional programming languages, by contrast, we are much more limited in these respects, and we focus on defining what things are rather than specifying step-by-step actions for the computer to perform. For instance, instead of telling a computer how to calculate each Fibonacci number, in Haskell we would define a Fibonacci number as the sum of the two preceding ones, starting with 0 and 1.
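For contrast, this is roughly what the step-by-step version looks like on the imperative side. The sketch uses Python, and the function name fib_imperative is just an illustrative choice.

```python
def fib_imperative(n):
    """Return the n-th Fibonacci number (0-indexed) by mutating two variables."""
    previous, current = 0, 1
    for _ in range(n):
        # Each iteration overwrites the state with the next pair of numbers
        previous, current = current, previous + current
    return previous


print([fib_imperative(i) for i in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

Notice that the loop describes how to get to the answer by updating state. A Haskell definition would instead state what a Fibonacci number is: the base cases 0 and 1, and every other number as the sum of the two before it.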

Interactive environment

Assuming you have Haskell installed and ready to go, let's launch GHCi, the interactive environment of Haskell's compiler (GHC). To launch the interpreter, type

ghci

in the command prompt.

You should be greeted with the following output (independent of the version).

Now, at this point, we could write and execute Haskell code directly in the interpreter, but for ease of management and future use, let's create a dedicated file. This is not mandatory, but recommended! (Otherwise, skip ahead to the next subsection.)

I called it 'prog.hs'. The name and location do not really matter; just make sure to use the '.hs' file extension and remember the file path.
With this file in place, we can load it into GHCi with a single command.
```
:load path_to_file/prog.hs
```
From here on, we only have to reload the file after every modification we make.
```
:reload
```

Type system

As mentioned in part one, Haskell has a strict static type system, which means that the compiler knows the type of every expression at compile time and rejects any program that violates the typing rules. So, while it's not mandatory to write types ourselves, since the compiler can reason about our code and infer the most general types itself, it is definitely good practice and will be needed later on when we start working with Clash.

Let's try it ourselves!

:type 1
1 :: Num a => a

As we can see, the compiler has deduced the type, and the format is as follows: [Value] :: [Type]. Here's a brief breakdown of what it means:

"::" This symbol is read as 'has the type' and is used to specify the type of an expression.
"Num a => a" The part before the arrow (Num a) is a type constraint on the type variable 'a'.

Put together, it simply means that the number '1' can be of any type 'a', as long as 'a' is a member of the type class 'Num'. We will get into type classes in the next subsection.

Classes, primitives & functions

In Haskell, type classes are a way to group together types that share a common set of functions or operations. The concept is analogous to interfaces in object-oriented languages. For example, the 'Num' type class mentioned above includes types that behave like numbers ('Int', 'Float', 'Integer', 'Double'), and it defines operations such as addition, multiplication, et cetera. Note! Even though types such as 'Int' and 'Float' belong to the same type class, they are still distinct and can't be mixed directly in operations; you must first convert the values to a common type.

Every great language starts with its 'alphabet', and Haskell has its primitives too, which are generally very similar to those offered by other languages: 'Bool', 'Char', 'Int', et cetera, to name a few. While primitive types tend to cover many common needs, we can end up in a situation where defining our own data types is necessary for modeling complex or domain-specific concepts. We can define a custom data type using the 'data' keyword.

data CarBrand = Ferrari | Bugatti | Toyota | Ford

In such a definition, 'CarBrand' is the name of the newly introduced data type (on the type level), whereas 'Ferrari', 'Bugatti', 'Toyota', and 'Ford' are the constructors (on the value level; each of them represents a different value of 'CarBrand').

Haskell offers quite a flexible approach when it comes to functions. Functions can be called in either prefix or infix notation.

Let's define our own function called 'successor' that returns the successor value of a given number.

successor :: Num a => a -> a
successor val = val + 1

successor 5
6

Don't be confused! As you can see, Haskell separates function arguments with whitespace rather than listing them comma-separated inside parentheses.

Let's turn our attention to pattern matching. Imagine that we are running a car rental dealership (we will reuse the custom type 'CarBrand' introduced above). We are fortunate to own two different supercars, whose rental price differs greatly from that of the standard rental cars. A solution we could come up with would be the following:

data CarBrand = Ferrari | Bugatti | Toyota | Ford

rentCost :: CarBrand -> Int 
rentCost Ferrari = 150
rentCost Bugatti = 800
rentCost anyOtherCar = 30

The function's parameter is pattern matched from top to bottom. This simply means that the car brand 'Ferrari' returns a cost of 150, 'Bugatti' returns 800, whereas all of the remaining brands ('Toyota' and 'Ford') yield a cost of 30. While this works, we might run into a problem whose solution requires a different approach than plain pattern matching. On such an occasion we would turn to case expressions or guards (examples below, respectively).

rentCost :: CarBrand -> Int 
rentCost car = case car of
    Ferrari -> 150
    Bugatti -> 800
    _ -> 30

data CarBrand = Ferrari | Bugatti | Toyota | Ford deriving (Eq)

rentCost :: CarBrand -> Int 
rentCost car 
    | car == Ferrari = 150
    | car == Bugatti = 800
    | otherwise = 30

(The last code block, which uses guards, introduces 'deriving (Eq)'. Haskell doesn't know how to compare values of our custom data type for equality, so we must either implement an 'Eq' instance ourselves or derive it, which in simple terms generates the comparison logic for us.)

Recursion & Higher-order functions

Recursion and higher-order functions are crucial techniques for solving problems in a functional style. The term 'recursion', when applied to functions, simply refers to a function invoking itself, breaking a problem down into smaller sub-problems and terminating when a base condition is met.

A higher-order function is a function that takes other functions as arguments or returns a function as its result.

You might ask, what does recursion have to do with higher order functions? Well, these two concepts often work together, with higher-order functions providing a 'framework' (abstraction) for recursive solutions. Let's put these ideas into action with some concrete examples!

Summing elements in a list (recursive approach).

sumList :: Num a => [a] -> a
sumList [] = 0
sumList (x:xs) = x + sumList xs

sumList [1, 2, 3] =
1 + sumList [2, 3] =
1 + (2 + sumList [3]) =
1 + (2 + (3 + sumList [])) =
1 + (2 + (3 + 0)) =
1 + (2 + 3) = 1 + 5 =
6

Here's a short breakdown of what is happening:

"Num a => [a] -> a" indicates that the function takes a list of numbers, denoted by [a], and returns a number.
"sumList [] = 0" defines the base case, which will act as recursion terminator upon being pattern matched to.
"(x:xs)" pattern matches x to be the list head element and xs to be the remaining list.
"sumList (x:xs) = x + sumList xs" recursively calculates the sum of the numbers in a list by breaking the problem into sub-problems (first element in the list + the first element of the remaining list, and so on...)

Summing elements in a list (higher-order function approach).

An alternative, often more elegant approach would be to use built in higher-order functions, for example, foldl.

foldl takes the initial value (its second argument) and the first item of the list and applies the function to them, then applies the function to that result and the next item of the list, and so on.

sumListHigherOrder :: Num a => [a] -> a
sumListHigherOrder x = foldl (+) 0 x

To have a clearer understanding of 'foldl', let's see the arguments and their types it expects.

:type foldl
foldl :: Foldable t => (b -> a -> b) -> b -> t a -> b

Essentially, 'foldl' works on any 'Foldable' structure, such as lists, trees, et cetera. The first argument '(b -> a -> b)' is a binary function (in our case the accumulating function '+') that takes an accumulated value of type 'b' (initially 0) and an element of type 'a' from the Foldable structure (in our case an element of the list), and returns a new accumulated value of type 'b'. The second argument 'b' is the initial accumulator value, and 't a' represents a foldable structure (here a list) with elements of type 'a'.
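If you are more at home in Python, the closest relative of 'foldl' is functools.reduce with an initial value. The sketch below is a loose analogy rather than a statement about how Haskell evaluates folds, and the helper trace_fold is a made-up name that just records every intermediate accumulator.

```python
from functools import reduce
import operator


def sum_list_higher_order(xs):
    # reduce(f, xs, init) folds from the left: f(f(f(init, x1), x2), x3)
    return reduce(operator.add, xs, 0)


def trace_fold(xs, init=0):
    """Return every intermediate accumulator value of a left fold with (+)."""
    steps = [init]
    for x in xs:
        steps.append(steps[-1] + x)
    return steps


print(sum_list_higher_order([1, 2, 3]))  # 6
print(trace_fold([1, 2, 3]))             # [0, 1, 3, 6]
```

The trace shows the accumulator starting at the initial value 0 and absorbing one list element per step, which is exactly the role of the 'b' values in the type signature.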

Why bother with 'foldl' or higher-order functions in particular?

Consider them your hidden weapon in functional programming, akin to upgrading from a sedan to a sleek sports car. They make your code more concise, and in addition can introduce performance optimizations when working with complex or large data.

But there's more to the story, and here's the real kicker: Clash struggles with recursion when translating Haskell code into hardware description languages, but we are allowed to use higher-order functions that capture our recursive patterns.

We'll dive deeper into that in the following section, but as a conclusion to take home, just know that using higher-order functions isn't just about writing elegant Haskell.

Introducing Clash

You've trained, you've prepared, now it's time to Clash! The main event starts now. Anyways, after all of the pre-requisites we have finally arrived at the core of the clash series - the part where we actually start working on hardware design. Let's not beat around the bush, and jump straight into some action!

Interactive environment

In a similar manner, let's launch clashi, the interactive environment of the Clash compiler.

clashi

We should be greeted with the following message (ignore the warning).

It might take a moment for clashi to load. Be patient, the fun's almost here.

Combinational design

We will start off simple by creating a combinational circuit design. I've included a definition of combinational logic for those of you who might not be too familiar.

Combinational logic is a type of digital logic that is implemented by Boolean circuits, where the output is a pure function of the present input only. This is in contrast to sequential logic, in which the output depends not only on the present input but also on the history of the input.
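Before we get to Clash itself, the distinction can be modelled in a few lines of software. This is a plain Python analogy, not hardware: and_gate stands in for a combinational circuit, and the made-up class ToggleOnOne stands in for a sequential one with a single stored bit.

```python
def and_gate(a, b):
    """Combinational: the output depends only on the current inputs."""
    return a & b


class ToggleOnOne:
    """Sequential: the output also depends on a stored bit (the history)."""

    def __init__(self):
        self.state = 0

    def step(self, x):
        # Flip the stored bit whenever the input is 1
        if x == 1:
            self.state ^= 1
        return self.state


truth_table = [(a, b, and_gate(a, b)) for a in (0, 1) for b in (0, 1)]
print(truth_table)  # [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)]

toggle = ToggleOnOne()
outputs = [toggle.step(x) for x in [1, 1, 1]]
print(outputs)  # same input three times, different outputs: [1, 0, 1]
```

The AND function always gives the same answer for the same inputs, while the toggle gives different answers to identical inputs because it remembers what came before.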

If this still puzzles your mind, don't fret, a practical demonstration will shed some light.

Let's build our first digital circuit together! We will start off with a primitive yet fundamental building block - the 'AND' gate. It produces a 1 only when both of its inputs are 1. Create a file named 'ANDGate.hs' with the following initial code:

module ANDGate where
import Clash.Prelude

These preliminaries simply declare our module, which helps prevent naming conflicts with other modules or the main program, and import the main Clash library, 'Clash.Prelude'.

We can define our 'AND' gate in the following way:

module ANDGate where
import Clash.Prelude

andBit :: Bit -> Bit -> Bit
andBit x y = x .&. y

We observe something new. The function 'andBit' takes two 'Bit' values as input and returns a resulting 'Bit' value (0 or 1). It uses the bitwise AND operator '.&.' to perform the operation on both input bits.

Load your file in the same manner as before and test the functionality.

:load path_to_file/ANDGate.hs

andBit 1 0
0

andBit 1 1
1

That's it, you have just created your first digital circuit. Let's bring it to life by visualizing and analyzing its inner workings in the following subsection.

Visualizing circuits

Remember that AMD Xilinx tool we installed in part one? Now it's time to put it to work! But let's not get ahead of ourselves; we have a couple of things to do first:

We need to tell Clash which function should be compiled down to HDL
We need to compile the Clash code to our preferred HDL (either Verilog or VHDL)

To tackle the first endeavour let's modify 'ANDGate.hs' file.

module ANDGate where
import Clash.Prelude

andBit :: Bit -> Bit -> Bit
andBit x y = x .&. y

topEntity :: Bit -> Bit -> Bit
topEntity = andBit

The new addition is visible on lines 7 and 8. We have introduced a new function called 'topEntity', which serves as the entry point for our circuit.

When Clash is compiling our code to hardware description language, it needs to be aware of which function represents the top-level circuit (in our case 'andBit') - the circuit that connects to the 'external world' (by means of IO, et cetera). In essence, 'topEntity' informs Clash: "This is the function you should focus on when generating the hardware description language code".

To tackle the second step, we have to exit the 'clashi' environment (or open up a new command prompt window). Having done that, let's execute the following command:

clash --verilog path_to_file/ANDGate.hs

Give it a moment to complete. Once done, you will find a new folder named 'verilog/ANDGate' containing the file 'topEntity.v'. This file contains our Clash code translated into a hardware description. You can briefly examine it and try to identify the statement that implements the 'andBit' function. We will not delve into the specifics, but keep this file handy, as we will need it shortly.

/* AUTOMATICALLY GENERATED VERILOG-2001 SOURCE CODE.
** GENERATED BY CLASH 1.8.1. DO NOT MODIFY.
*/
`default_nettype none
`timescale 100fs/100fs
module topEntity
    ( // Inputs
      input wire  c$arg
    , input wire  c$arg_0

      // Outputs
    , output wire  result
    );

  assign result = c$arg & c$arg_0;

endmodule

Great, now let's open 'AMD Vivado' and load our design. Use the image below as a guide for the next steps. We will be creating a new project, adding our Verilog file (topEntity.v), and setting it as the top-level module. These steps will configure 'Vivado' to let us work with our AND gate design.

Step 1: Create a new project
Step 2: Proceed
Step 3: Give your project a recognizable name, the default location is fine
Step 4: Select the type to be RTL Project
Step 5: From the dropdown select 'Add Files...'
Step 6: Locate the 'topEntity.v' file
Step 7: Project summary, proceed
Step 8: Select the FPGA development board. This tutorial doesn't assume you have a physical board, we will be only simulating our designs. So, in this case the selected board choice does not really matter, but the board we opt for is 'Kria KV260 Vision AI Starter Kit SOM'.
Step 9: Click on 'Open Elaborated Design' and wait for the process to finish
Step 10: New 'Schematic' window appears that contains our design in RTL (Register-transfer level)
Step 11: (Optional) Run Synthesis
Step 12: (Optional) Press 'OK'. After finished, choose 'Schematic' under 'Open Synthesized Design' dropdown
Step 13: New 'Schematic' window appears that contains our synthesized design (Note the difference, and remember, FPGAs don't contain AND gates!)

Analyzing results

Figure 1 shows the Register Transfer Level (RTL) schematic of our design. An RTL schematic is a high-level abstraction of a circuit that models the flow of data between its components. In our case, it is made up of two input signals, whose names Clash generated from the context it could perceive. These inputs are fed into an 'AND' gate, and the output of the AND gate represents the result of the logical operation performed by our 'andBit' function.

While the RTL schematic depicts an AND gate for clarity and simplicity, FPGAs actually implement logic functions using Look-up Tables (LUTs) instead (read more about this in part one of the series). Figure 2 shows exactly that: the corresponding LUT, named 'result_OBUF_instr_i_1', is configured to perform the AND operation.
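The idea of a LUT 'configured' as an AND gate can be modelled in software as a tiny table indexed by the input bits. This is only a conceptual Python sketch: make_lut2 is a made-up helper, and LUTs on real devices typically have more than two inputs.

```python
def make_lut2(truth_bits):
    """Build a 2-input LUT from four output bits, ordered by (a, b) = 00, 01, 10, 11."""
    if len(truth_bits) != 4:
        raise ValueError("a 2-input LUT needs exactly 4 entries")
    table = tuple(truth_bits)

    def lut(a, b):
        # The two input bits form the address into the stored table
        return table[(a << 1) | b]

    return lut


and_lut = make_lut2([0, 0, 0, 1])   # configuration that behaves like AND
xor_lut = make_lut2([0, 1, 1, 0])   # same structure, different contents: XOR

inputs = [(a, b) for a in (0, 1) for b in (0, 1)]
print([and_lut(a, b) for a, b in inputs])  # [0, 0, 0, 1]
print([xor_lut(a, b) for a, b in inputs])  # [0, 1, 1, 0]
```

The lookup structure stays the same in both cases; only the stored bits change. That is why the synthesized schematic shows a LUT rather than a dedicated AND gate.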

Sequential design

While combinatorial designs are great for certain applications, the world of digital circuits quite often demands more dynamic and sophisticated solutions. On this note, I welcome you to sequential designs. Sequential designs introduce circuit elements like registers (memory units) and a clock signal (periodic signal that allows us to synchronize the circuit). With these elements established we empower our circuits to keep track of past states and react to changes over time. This in turn opens up a vast array of possibilities - from creating counters and timers to implementing complex state machines.

To better understand sequential design development, let's create a counter that counts up or down based on the provided input control signal. We will employ a single register for keeping track of the current counter value, and a clock signal to ensure precise timing. Let's create a file named 'Counter.hs' and populate it with the following contents.

module Counter where
import Clash.Prelude

type Val = Unsigned 3

incrementer :: Val -> Val
incrementer v = v + 1

decrementer :: Val -> Val 
decrementer v = v - 1

counter :: (HiddenClockResetEnable dom) => Signal dom Bool -> Signal dom Val 
counter incr = state
    where 
        state = register 0 (mux incr (incrementer <$> state) (decrementer <$> state))

Don't worry if this feels a bit overwhelming at first, we will break down the code step by step.

On line 4, we start off by defining a type alias 'Val' that simply represents an unsigned 3-bit integer, meaning that it can hold values from 0 to 2^3-1 (that is, 0 to 7). It mainly serves as a shorthand that makes our code simpler and easier to read.

On lines 6 through 10, we define two simple helper functions that increment or decrement our value.
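One consequence of the 3-bit width is worth keeping in mind: as far as I know, arithmetic on Clash's 'Unsigned' type wraps around on overflow, so incrementing 7 gives 0 and decrementing 0 gives 7. The Python model below illustrates that assumption with modular arithmetic; it mirrors the helper names from the Clash code but is not generated from it.

```python
BITS = 3
MODULUS = 1 << BITS  # 8 distinct values: 0..7


def incrementer(v):
    """Model of 'v + 1' on a 3-bit unsigned value (wraps from 7 to 0)."""
    return (v + 1) % MODULUS


def decrementer(v):
    """Model of 'v - 1' on a 3-bit unsigned value (wraps from 0 to 7)."""
    return (v - 1) % MODULUS


print(incrementer(6), incrementer(7))  # 7 0
print(decrementer(1), decrementer(0))  # 0 7
```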

At line 12, we have arrived at the type signature of our counter function. This is where things get exciting.

(HiddenClockResetEnable dom): This is a type constraint stating that the function operates in a domain where the clock, reset and enable signals are passed around implicitly, which means that we don't have to manage these signals manually.
Signal dom Bool -> Signal dom Val: This indicates that our function accepts a boolean signal as an input and returns a signal of type 'Val' while operating under the special domain.

At line 13 through 15, we encounter the body of our counter function. The 'state' is stored in a register, just as we had planned. Let's explore the type signature of a register element.

:type register
register
  :: (HiddenClockResetEnable dom, NFDataX a) =>
     a -> Signal dom a -> Signal dom a

As we can see, a register expects an initial value and an input signal of the same type. Following this, we initialize our register with 0.

In our case, the input signal to the register is determined by a multiplexer 'mux', which selects the first branch '(incrementer <$> state)' when the boolean signal 'incr' is True, and the second branch '(decrementer <$> state)' otherwise.

However, the current 'state' lives on the signal level, whereas our 'incrementer' and 'decrementer' helper functions work with raw values. We therefore employ fmap (<$>), which applies a function to the values inside the signal, to update it correctly.

Great, let's simulate our implementation. Add the following line of code at the bottom of the file.

simulateCounter = simulate @System counter [True, True, True, False, False, False]

With that in place, load the file and run the simulation.

:load Counter.hs

simulateCounter
[0,1,2,3,2,1,0,*** Exception: X: finite list
CallStack (from HasCallStack):
...

As we can see, our sequential design behaves as expected. The counter was incremented three times and then decremented three times, ending up back at 0.

Please note that the leading 0 in the output can be ignored: it is the register's initial value, emitted as the 'system starts up' before any input has taken effect. The 'finite list' exception at the end simply means that the simulation ran out of input samples.

Visualization & analysis of the counter

To visualize our sequential circuit, we will follow a process similar to that of our combinational design. First and foremost, let's add the 'topEntity' definition. The final code looks as follows:

module Counter where
import Clash.Prelude

type Val = Unsigned 3

incrementer :: Val -> Val
incrementer v = v + 1

decrementer :: Val -> Val 
decrementer v = v - 1

counter :: (HiddenClockResetEnable dom) => Signal dom Bool -> Signal dom Val 
counter incr = state
    where 
        state = register 0 (mux incr (incrementer <$> state) (decrementer <$> state))

simulateCounter = simulate @System counter [True, True, True, False, False, False]

topEntity :: Clock System -> Reset System -> Signal System Bool -> Signal System Val
topEntity clk rst incr = withClockResetEnable clk rst enableGen counter incr

With that in place, let's compile our Clash code into Verilog in a separate command prompt.

clash --verilog path_to_file/Counter.hs

A new folder 'verilog/Counter' is created, containing 'topEntity.v' with the translated code.

/* AUTOMATICALLY GENERATED VERILOG-2001 SOURCE CODE.
** GENERATED BY CLASH 1.8.1. DO NOT MODIFY.
*/
`default_nettype none
`timescale 100fs/100fs
module topEntity
    ( // Inputs
      input wire  clk // clock
    , input wire  rst // reset
    , input wire  incr

      // Outputs
    , output wire [2:0] state
    );
  // Counter.hs:12:1-76
  reg [2:0] state_1 = 3'd0;
  // Counter.hs:12:1-76
  wire [2:0] t;
  // Counter.hs:12:1-76
  wire [2:0] f1;
  wire [2:0] result;

  // register begin
  always @(posedge clk or  posedge  rst) begin : state_1_register
    if ( rst) begin
      state_1 <= 3'd0;
    end else begin
      state_1 <= result;
    end
  end
  // register end

  assign t = state_1 + 3'd1;
  assign f1 = state_1 - 3'd1;
  assign result = incr ? t : f1;
  assign state = state_1;

endmodule

To visualize it, we have to return to 'AMD Vivado'. Luckily, we don't have to go through all of the steps again; instead, we can simply replace the previous 'topEntity.v' file in our project with the new one.

Step 1: Remove the existing 'topEntity.v' file from sources
Step 2: Choose option to add new sources
Step 3: Choose 'Add Files' option
Step 4: Select the 'topEntity.v' source file for the counter circuit design
Step 5: Click on 'Open Elaborated Design' and wait for the process to finish. New 'Schematic' window appears that contains our design in RTL (Register-transfer level)

In Figure 3, you can see the corresponding design of our developed circuit. It contains three input signals - reset, clock and increment, a multiplexer and a register.
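
To build some intuition for how the multiplexer and the register in the schematic work together, here is a minimal sketch in plain Haskell. It is not Clash code and not the generated design itself: the names `nextState` and `runCounter` are made up for illustration, the 3-bit width is imitated with `mod 8`, and reset is not modelled, so the output will not necessarily line up cycle-for-cycle with what `simulate` prints.

```haskell
module Main where

import Data.List (scanl')

-- Software stand-in for 'Unsigned 3': values are kept in the range 0..7.
type Val = Int

-- Combinational part: the multiplexer picks between state + 1 and state - 1.
-- 'mod 8' imitates the wrap-around of a 3-bit value.
nextState :: Bool -> Val -> Val
nextState incr s
  | incr      = (s + 1) `mod` 8
  | otherwise = (s - 1) `mod` 8

-- Sequential part: the register holds the state, starting at 0.
-- Each input sample produces the state for the following cycle.
runCounter :: [Bool] -> [Val]
runCounter = scanl' (flip nextState) 0

main :: IO ()
main = do
  print (runCounter [True, True, True, False, False, False]) -- [0,1,2,3,2,1,0]
  print (runCounter [False])                                 -- [0,7]
```

The second line of output shows the wrap-around: decrementing a 3-bit counter that holds 0 yields 7.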

Wrapping up

Congratulations! You have successfully completed the second and final part of our Clash journey. We have delved into the fundamentals of Clash, explored combinatorial and sequential designs, and even created our own counter circuit.

However, the journey does not necessarily end here... There's a lot more to explore both in semantics and features of Clash and hardware development itself.

If you have purchased a physical FPGA board (which tend to be on the pricier end), you can generate a bitstream (a configuration file that tells the FPGA how to configure its internal circuitry) and load it onto the board. This is where you would truly see your designs in action.

Imagine building smart IoT devices that control your household, custom hardware accelerators, or even embedded systems that power anything from cars to industrial machines. The possibilities are endless!

Getting Started with Clash: Haskell’s Hardware Description Language for Modern Hardware Design

Sun, 01 Sep 2024 16:57:01 GMT

Introduction

Hardware development tends to be more specialized, and compared to software engineering, there are fewer professionals in this field. The development process itself poses various challenges, such as high manufacturing and development costs, scalability, reliability validation, intellectual property protection, et cetera. Despite these challenges, (experienced) professionals in hardware development will remain in high demand for many years to come, as many companies and individuals will continue to rely on existing technology, and engineers will remain crucial as hardware adapts and improves.

The hardware field is indeed expansive, with various solutions and architectures available for different challenges. In this series, we’ll concentrate on development specifically for FPGA (Field Programmable Gate Array) boards.

Does this sound intriguing to you? Are you ready to take on the challenge? Then follow me as we explore the realm of hardware development together.

What is an FPGA?

You are likely familiar with CPUs (Central Processing Units), the workhorses of modern computing. CPUs are general-purpose hardware components that often excel in performance; essentially, they are structured so that firmware or an operating system can use them in whatever way is necessary to solve a wide variety of tasks. However, to solve one specific problem we might not need such a versatile or powerful component. Looking at the problem-specific options available on the market, we arrive at a couple of alternatives: ASIC (Application Specific Integrated Circuit), MPSoC (Multi Processor System On Chip) and FPGA (Field Programmable Gate Array). Note! If you are interested in the distinctive benefits of each technology, feel free to explore further; unfortunately, that is out of scope for this series.

At this point you have been told that FPGAs are used to solve problem-specific tasks. That is admittedly quite vague, so let's expand on it. First and foremost, the hardware structure of an FPGA differs greatly from that of a standard CPU. The basic architecture of an FPGA is made up of Configurable Logic Blocks (CLBs). These CLBs are in turn made up of smaller components, such as Flip-Flops (memory units), Look-up Tables (containing hardwired outputs for every combination of inputs) and Multiplexers. However, notice that we did NOT mention any logic gates (AND, OR, XOR, NOT, NAND, NOR and XNOR). That is no accident: FPGA fabric contains no such basic gates; instead, the required logic is performed by means of Look-up Tables. An FPGA contains hundreds or (usually) thousands of such CLBs which, when linked together, implement complex logic functions.
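
To make the look-up table idea more concrete, here is a small sketch in plain Haskell (not Clash). It models a 2-input LUT as a list of four stored outputs, one per input combination; the names `Lut2`, `lookupLut`, `andTable` and `xorTable` are invented for this illustration and do not correspond to any vendor primitive.

```haskell
module Main where

-- A 2-input LUT: the stored outputs for inputs 00, 01, 10 and 11, in that order.
type Lut2 = [Bool]

-- "Evaluating" the LUT is just an indexed read; no gate logic is involved.
lookupLut :: Lut2 -> Bool -> Bool -> Bool
lookupLut table a b = table !! (2 * fromEnum a + fromEnum b)

-- The same structure behaves as AND or XOR depending only on its contents.
andTable, xorTable :: Lut2
andTable = [False, False, False, True]
xorTable = [False, True, True, False]

main :: IO ()
main = do
  let inputs = [(a, b) | a <- [False, True], b <- [False, True]]
  print [lookupLut andTable a b | (a, b) <- inputs] -- [False,False,False,True]
  print [lookupLut xorTable a b | (a, b) <- inputs] -- [False,True,True,False]
```

Changing the behaviour means changing the table contents, not the structure, which is the sense in which the fabric is "programmable".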

Moreover, more expensive and advanced boards contain Hard IPs (Hard Intellectual Properties) - chip blocks that are etched into the silicon and perform a certain task very well (better than the same solution built from CLBs). Some examples to mention: ARM CPU, Block RAM, et cetera.

FPGAs have a wide range of applications, and they particularly excel at parallelism. To name a few examples:

High-performance computing: Accelerating scientific simulations, data analysis, and machine learning tasks.
Image/audio processing: Accelerating image/audio processing, encoding or decoding.
Embedded systems: Complex control units in industrial automation, aerospace.

Why Haskell & Clash?

These days, the great majority of hardware projects are written in a 'Hardware Description Language' (HDL). These are declarative languages that let circuit designers describe how their designs behave. Although there are many HDLs available, Verilog and VHDL are the most widely used.

This figure displays a simple implementation of an 'AND' gate in VHDL and Verilog, respectively.

I've included a short example of both of the languages above. I will not dive into too much detail, as that would be beyond the scope of this series; there is plenty of documentation and tutorials for that. Coming from a software development background, such HDLs did not necessarily tickle my fancy.

Looking at alternatives, I came across a functional hardware description language - Clash. Clash is built upon the ideology of Haskell, from which both the syntax and semantics are borrowed. Functional properties of Clash, such as purity (no implied side effects, function output depends only on its input values, and no global state that could be altered), polymorphism, strong typing, and higher-order functions are the driving factors that enable developers to write concise and maintainable code both for combinational and sequential circuits.

How awesome is that?
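
As a small taste of what purity and higher-order functions buy us, here is a sketch in plain Haskell. It uses ordinary `Bool` values rather than the types from `Clash.Prelude`, and the names `andGate`, `reduceGate` and `andN` are invented for illustration, so treat it as an analogy rather than as synthesizable Clash code.

```haskell
module Main where

-- A two-input AND gate as a pure function: the output depends only on the inputs.
andGate :: Bool -> Bool -> Bool
andGate a b = a && b

-- A higher-order function: build an n-input gate out of any two-input gate,
-- given the value that leaves that gate's result unchanged.
reduceGate :: (Bool -> Bool -> Bool) -> Bool -> [Bool] -> Bool
reduceGate gate identity = foldr gate identity

-- An n-input AND gate, obtained without writing any new gate logic.
andN :: [Bool] -> Bool
andN = reduceGate andGate True

main :: IO ()
main = do
  print (andN [True, True, True])  -- True
  print (andN [True, False, True]) -- False
```

The same `reduceGate` could be reused with an OR function and `False` as the identity, which is the kind of reuse that is harder to express in a traditional HDL.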

Installation & Setup

There are a couple of tools we need to install to get started. The following steps will guide you through setting everything up on a Windows machine.

GHC & Cabal

GHC is a Haskell compiler, whereas Cabal is a tool for building and packaging Haskell libraries and applications. The Haskell Downloads Page provides an installation guide; alternatively, you can use GHCup, which offers a straightforward and user-friendly text-based user interface. If you do decide to take the manual route rather than using GHCup, I can recommend the stable releases 9.8.2 for GHC and 3.10.3.0 for Cabal.
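
If you would like a quick sanity check that GHC works after installation, a tiny program like the following should do. This is only a suggestion and not part of the official setup; the name `toolchainSummary` is made up, and `compilerVersion` typically reports only the major and minor version components rather than the full release number.

```haskell
module Main where

import Data.Version (showVersion)
import System.Info (compilerName, compilerVersion)

-- Name and version of the compiler that built (or is interpreting) this program.
toolchainSummary :: String
toolchainSummary = compilerName ++ " " ++ showVersion compilerVersion

main :: IO ()
main = putStrLn toolchainSummary
```

Save it under any file name you like and run it with runghc; if a compiler name and version are printed, the toolchain is in working order.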

Clash

Having completed the first step, proceed with the Clash installation.

Install clash-ghc by running the following command in CMD: cabal v1-install clash-ghc. Occasionally, this will require several runs of the installation command before everything compiles. Hence, if Cabal reports an error, check if the command continues past the error point by running it again. (Try cabal install clash-ghc, if the command v1-install is not present).

Validate that everything was successfully installed by entering clashi in CMD. You should be greeted with the clash interactive environment.

Xilinx Vivado

This tool is more demanding in terms of system requirements. It will come in handy later in the series, when we visualize the RTL (Register-Transfer Level) schematics of our circuits. For now, let's install it and set it aside for later use.

To install the necessary software, you'll need to create an AMD account. You can sign up here: AMD Signup Page.
Head to the downloads page: AMD Xilinx Vivado Download. The version of the software you want to download is 2024.1, namely the AMD Unified Installer for FPGAs & Adaptive SoCs 2024.1: Windows Self Extracting Web Installer.
Follow the Setup Wizard (I have attached a couple of images below to aid the installation process).

The final installation size is quite significant, at least 18GB. A stable and fast internet connection is recommended.

Wrapping up

The first part of the series is now complete. By discussing the particular difficulties of hardware development, we have laid the foundation. Additionally, I briefly highlighted the vastness of the hardware field, emphasizing the upcoming focus on FPGA boards and the modern alternatives to traditional HDLs, particularly Clash, which leverages Haskell's powerful functional programming paradigm.

We also covered the essential setup steps, from installing GHC, Cabal and Clash to setting up Xilinx Vivado.

In the next part of the series, we will begin with a brief overview of Haskell, learn the fundamentals of Clash, and then work on implementing and analyzing various circuit designs.