Introduction

In this blog post, we will explore some effective programming patterns from Clojure that can also be applied to other programming languages. We will discuss the benefits of a bottom-up approach and how REPL-based languages like Clojure facilitate this process. Additionally, we will emphasize the importance of separating pure and impure functions in order to create more modular and scalable code.

Fear not! The examples are also shown in Python, so that non-Clojurians can see these advantages too.

Clojure and REPL-based Programming

Clojure is a REPL-based programming language, which allows for quick and easy testing of small components. REPL stands for Read-Eval-Print Loop, and it is an interactive programming environment that reads user inputs, evaluates them, and returns the result. This feature makes Clojure well-suited for a bottom-up approach, where you start with small, isolated examples and build up from there.

The Power of Pure Functions and Decoupling Impure Functions

When writing code, it's important to differentiate between pure and impure functions. Pure functions always produce the same output for the same input and have no side effects; in other words, they are deterministic. Impure functions, by contrast, can produce different outputs for the same input because they depend on some external state, or they have side effects that change the program's state.

For instance, a random number generator is an impure function: it carries internal state, may depend on the computer's entropy, and is not deterministic.

A pure function, on the other hand, gives the same output every time it is run with the same inputs, and it does not affect the program's state (for example, by changing a global variable).
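As a small illustration in Python, here is a sketch of the difference. The function names and the `call_log` list are made up for this example: the first function is pure, the second depends on the hidden state of the random generator, and the third has a side effect on module-level state.

```python
import random

call_log = []  # module-level state, used only to demonstrate a side effect


def add(a, b):
    """Pure: the result depends only on the arguments."""
    return a + b


def add_noise(a):
    """Impure: the result depends on the hidden state of the random generator."""
    return a + random.random()


def add_and_log(a, b):
    """Impure: the return value is deterministic, but module state is mutated."""
    call_log.append((a, b))
    return a + b
```

Calling `add(1, 2)` twice is indistinguishable from calling it once, whereas each call to `add_and_log(1, 2)` leaves a trace behind in `call_log`.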

Using pure functions can lead to code that is easier to test, debug, and maintain. It also allows for better separation of concerns, as you can isolate the parts of your code that deal with IO or other side effects from the parts that focus on computation.

In languages like Haskell, impure parts are pushed to the edges of the program, making the majority of the code consist of pure functions. While Clojure does not enforce this separation as strictly as Haskell, it is still beneficial to apply a similar principle in your code.
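Python does not enforce this separation either, but the same layout can be followed by convention. The sketch below is only one possible arrangement, with hypothetical function names: two pure functions do the work, and a thin impure `main` at the edge does the reading and printing.

```python
def summarize(text):
    """Pure core: compute statistics from a string, no IO involved."""
    lines = text.splitlines()
    return {
        "lines": len(lines),
        "longest": max((len(line) for line in lines), default=0),
    }


def format_summary(name, summary):
    """Pure core: turn the statistics into a printable string."""
    return f"{name}: {summary['lines']} lines, longest is {summary['longest']} chars"


def main(path):
    """Impure edge: reading the file and printing happen only here."""
    with open(path, "r") as file:
        text = file.read()
    print(format_summary(path, summarize(text)))
```

Only `main` needs a real file to run; everything else can be exercised with plain strings.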

First, isolate a single example and use it to develop a function that handles just that one case. Then, if you have a map or an array/list of such items, use map to apply the function to the whole collection.
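In Python, that workflow might look like the sketch below. The `normalize` function and the sample data are invented for illustration; the point is only the order of the steps: one example first, then the whole collection.

```python
# One isolated example, the kind of value you would poke at in a REPL.
example = "  Hello, World  "


def normalize(line):
    """Pure function developed against the single example above."""
    return line.strip().lower()


# Once it works on one value, apply it to the whole collection with map.
lines = ["  Hello, World  ", "FOO", " bar"]
normalized = list(map(normalize, lines))
print(normalized)  # ['hello, world', 'foo', 'bar']
```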

Practical Example: Identifying Long Lines in Files

Let's look at a practical example of how to apply these concepts in Clojure. In this example, we will create a program that reports all lines in a file that have more than 80 characters.

Step 1: Process a Single File

First, we will create a function that processes a single file:

(require '[clojure.string :as str])

(def file-content (slurp "file"))

(defn line-numbers-seq
  []
  (map inc (range)))

(defn get-files-lines
  [content]
  (->> content
       str/split-lines
       (zipmap (line-numbers-seq))))

(defn get-lines-too-long
  [content]
  (->> content
       get-files-lines
       (filter #(> (count (second %)) 80))))

In the code above, we have separated the impure IO operation (slurp) from the pure functions that process the file content. This makes the code easier to test and understand.

Step 2: Process Multiple Files

Next, we will extend our program to process multiple files:


(def list-of-files ["file1", "file2"])

(->> list-of-files
     (map slurp)
     (map get-lines-too-long)
     (zipmap list-of-files))

By keeping the IO operations and pure function application separate, our code is more composable and easier to maintain.

Conclusion

In this blog post, we explored the benefits of using a bottom-up approach and REPL-based programming languages like Clojure. We also discussed the importance of separating pure and impure functions to create more modular and scalable code.

By applying these concepts, you can improve the quality of your code and make it easier to test, debug, and maintain. Whether you are working in Clojure or another programming language, consider incorporating these patterns and principles to enhance your programming skills and the projects you work on.

A small example:

Imagine a program that has to report all lines in a file that are longer than 80 characters.

The first step is to take the file content and split it on '\n'. We want to keep the line numbers, so we zip the lines with a range, then filter out the lines that are not too long.

#+begin_src clojure
(require '[clojure.string :as str])

(def file-content (slurp "file"))

; Range that starts at 1. This is lazy so we don't care about memory here
(defn line-numbers [] (map inc (range)))

(def file-lines
  (->> file-content
       str/split-lines           ; -> ["line1" "line2" …]
       (zipmap (line-numbers)))) ; -> {1 "line1", 2 "line2", …}

(def lines-too-long
  (->> file-lines
       ; Keep only the lines longer than 80 chars
       (filter #(> (count (second %)) 80)))) ; ([1 "toolong"])
#+end_src

Notice in the example above how we confine the IO impurity to a single place at the top. If we refactor the example into functions, we get functions that don't depend on IO, which is great for testing (both unit testing and at the REPL).
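To make the testing benefit concrete, here is a minimal Python sketch. The function is a Python restatement of the long-line check, and the `limit` parameter is an assumption added for the example. Because the function takes a string rather than a file name, the test builds its input in memory and never touches the disk.

```python
def get_lines_too_long(content, limit=80):
    """Pure: return (line_number, line) pairs for lines longer than `limit`."""
    return [
        (number, line)
        for number, line in enumerate(content.splitlines(), start=1)
        if len(line) > limit
    ]


def test_get_lines_too_long():
    # The input is built in memory, so the test never opens a file.
    content = "\n".join(["short", "x" * 81, "y" * 80])
    assert get_lines_too_long(content) == [(2, "x" * 81)]


test_get_lines_too_long()
```

The same call works unchanged when pasted into a REPL, which is what makes the bottom-up loop fast.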

Now we can refactor the Clojure example into functions:

(def file-content (slurp "file"))

; Range that starts at 1. This is lazy so we don't care about memory here
(defn line-numbers-seq
  []
  (map inc (range)))

(defn get-files-lines
  [content]
  (->> content
       str/split-lines               ; -> ["line1" "line2" ...]
       (zipmap (line-numbers-seq)))) ; -> {1 "line1", 2 "line2", ...}

(defn get-lines-too-long
  [content]
  (->> content
       get-files-lines
       ; Keep only the lines longer than 80 chars
       (filter #(> (count (second %)) 80)))) ; ([1 "toolong"])

The same functions in Python:

def file_content(file_name):
    """Impure: read the whole file into a string."""
    with open(file_name, "r") as file:
        return file.read()

def get_files_lines(content):
    """Pure: pair each line with its 1-based line number."""
    return list(enumerate(content.splitlines(), start=1))

def get_lines_too_long(content):
    """Pure: keep only the (number, line) pairs longer than 80 characters."""
    return [(num, line) for num, line in get_files_lines(content) if len(line) > 80]

Then, if we want to apply it to multiple files:

filenames = ["file1", "file2"]

def main():
    # Impure step: read every file up front.
    file_contents = [file_content(name) for name in filenames]
    # Pure step: process each content string.
    files_lines = [get_lines_too_long(content) for content in file_contents]
    # Map each file name to its long lines, like zipmap in Clojure.
    return dict(zip(filenames, files_lines))

And the Clojure version:

(def list-of-files ["file1" "file2"])

(->> list-of-files
     ; Read each file fully into memory. Note this doesn't take big files
     ; into account, which is fine for a demo. We could use a reader
     ; instead.
     (map slurp)
     (map get-lines-too-long)
     (zipmap list-of-files)) ; -> {"file1" ([1 "toolong"]) "file2" ()}

We separate IO from pure function application, which makes our code more composable.
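For big files, reading everything into memory may not be acceptable. One possible variation, sketched here in Python with hypothetical names (`lines_too_long`, `report_file`, and the `limit` parameter are not part of the code above), is to have the processing function accept any iterable of lines. The file object is then streamed line by line, and the impure part stays confined to one small function.

```python
def lines_too_long(lines, limit=80):
    """No IO of its own: works on any iterable of strings."""
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if len(text) > limit:
            yield number, text


def report_file(path, limit=80):
    """Impure edge: open the file and stream it line by line."""
    with open(path, "r") as file:
        # Materialize the result before the file is closed.
        return list(lines_too_long(file, limit))
```

Since `lines_too_long` only needs an iterable, it can still be tried at the REPL with a plain list of strings.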