TL;DR;

Source code at khanhhua/cuckoo

A word on the cuckoo bird

I did not learn of the infamous trick of the cuckoo bird from books or Discover Channel but rather from Doraemon the Japanese manga by the late beloved authors under the name Fujiko Fujio. When it is time that mother cuckoo laid her egg, she will deliver in a foreign nest. The baby cuckoo will then hatch and parasitically live off on its naive surrogate mom, which cannot distinguish her biological offsprings from the adopted one.

Synopsis

This library generates combinations of randomized data for any giving templates which should mimic the your project data model and mass generate them for testing purposes.

Request

curl --location --request POST 'http://localhost:3000/' \
--header 'Content-Type: application/json' \
--data-raw '{
    "customer": "fullname",
    "primary_email": "email",
    "secondary_email": "email",
    "home_address": "address",
    "dob": "past-date",
    "employer": {
        "name": "company",
        "domain": "domain"
    }
}'

Response

[
    {
        "customer": "Mia Robinson",
        "dob": "2011-09-25",
        "employer": {
            "domain": "youronlinechoices.com",
            "name": "Pfizer"
        },
        "home_address": "946 Albemarle Avenue, Oklahoma",
        "primary_email": "mia.tierradelfuego_2266@bing.com",
        "secondary_email": "james.kyushu_9063@researchgate.net"
    },
    {
        "customer": "Benjamin Campbell",
        "dob": "1984-11-23",
        "employer": {
            "domain": "rambler.ru",
            "name": "Verizon Communications"
        },
        "home_address": "453 Aalto Place, Maine",
        "primary_email": "liam.redrock_8292@www.over-blog.com",
        "secondary_email": "william.melville_9613@nps.gov"
    }
]

The idea is very much similar to https://fakerjs.dev. Therefore, I cannot claim any novelty but rather that a well understood problem domain should help me reason in the FP paradigm. The endgoal is multifold:

  • To better understand Haskell and FP
  • To reason in FP
  • To deliver a practical (in the wild yet domesticated) Haskell product

I am going to focus on the second point within this article.

Design and Thought Process

Building Blocks

To be honest, from the beginning of the project I totally forgot how to generate random values in Haskell. I had to constantly refer to examples from the book Haskell in Depth regarding random value generation. Basically, the example shows us how to generate series of random number with a state monad:

random :: RandomGen g => g -> (a, g)
-- To retrieve a random
g <- newStdGen
(value, _nextG) = random g

state ::      (s -> (a, s)) -> m a
runState ::              State s a -> s -> (a, s)
statefulRandom = state random
seriesOfRandom = replicateM 5 statefulRandom

-- To retrieve a series of 5 random
(value, _nextG) = runState seriesOfRandom g

However, I want to be able to retrieve a random text from existing text files, instead of just any number or just any series of characters. In Haskell, that would involve the IO monad. Somehow, I gotta stick IO somewhere in one of the definitions above. Well – modifying the signature of random :: g -> (a, g) makes the most sense to me!

That is, a function which generates randoms with IO capability should take on the following appearance.

randomIO :: g -> IO (a, g)

It reads: randomIO is a function that takes in a Random Generator g and returns an IO computation context that returns a tuple of the random value and the next state of the generator. But how to determine what a is?

newtype

In Haskell we can introduce new data types that take in another data type as its parameter. In C# or Java, we have such a concept implemented as generic classes, like ICollection<T> or Enumerable<T>. As a pure functional programming language, Haskell treats functions as first class citizens: generic functions are equal with generic classes. Enter newtype…

newtype Fake a = Fake
   { runFake :: StdGen -> IO (a, StdGen)
   }

It reads: Here is a new type named Fake a.

  • An instance of Fake a is constructed with Fake followed by a function which takes in an instance of StdGen and returns an IO computation context that returns a tuple of a value of type a and the next state of StdGen instance.
  • runFake is the getter for instances of Fake a. Calling runFake will return the encapsulated function.

I then implemented a text file based enum random generator. Keep in mind that I would be reading texts from different files.

fakeString :: FilePath -> Fake String
fakeString p = Fake f
  where
    f :: StdGen -> IO (String, StdGen)
    f gen = do
      names <- words <$> readFile p
      let
        (num, nextG) = uniformR (0, length names) gen
      pure (names !! num, nextG)

Exercise: There was actually a logical error in this implementation which got fixed later on. Can you spot it?

Now I can have a variety of random generators with different text categories, e.g. common last names and first names.

fakeFirstName = fakeString "data/first-names.txt"
fakeFamilyName = fakeString "data/family-names.txt"

See commit Main.hs@e48d63b

How to run these randomizers?

main :: IO ()
main = do
  g <- newStdGen
  -- The following would generate a random first name
  (fname, nextG) <- runFake fakeFirstName g
  print fname
  -- The following would generate a random family name
  (lname, _nextG) <- runFake fakeFamilyName nextG
  print lname

Allowing Customization via Composition

However, first names and last names normally come hand in hand as fullnames. A randomization based on the list of last names and first names should generate an m . n amount of possible outcomes (where m and n are respectively number of lines in each file). I don’t wanna sit there and manually copy and paste items from both files into a new text file called fullnames.txt!

Indeed Haskell is great at composing! One rule of composing is that: Thing A and thing B of the same type Thing must together result in something also of type Thing. In our case study, the type of Thing is Fake String. Thus composing two things of the same type means:

construct :: String -> String -> String
-- I will discuss the role of mapping function right below
fakeFullname = construct <$> fakeFirstName <*> fakeFamilyName

g :: StdGen
(value, _nextG) <- runFake fakeFullname g

It reads: combine fakeFirstName randomizer with fakeFamilyName randomizer, whose outputs (of String data type) then get processed by a construct function, which transforms two strings into one string.

Note

To Haskellers out there, the above reading does not really describe truthfully the order of evaluation of <$> and <*>, both of which are infixl 4 functions. I am not lying nor misleading, but rather inviting people into a lift inside the ivory tower of technical jargons.

Now you ask, what good is it to have another layer of indirection, at the cost of learning two more mystic symbols? Scalability. It is all turtle down to the bottom, so to speak. I plan to create further randomizers other than just the existing two, namely:

  • fakeHouseNumber
  • fakeStreetName
  • fakeStateName
  • fakeEmail

The pattern of composing these all together remains the same and almost boring.

construct :: String -> String -> String -> String -> String -> String -> String
fakeProfile = construct
            <$> fakeFirstName
            <*> fakeFamilyName
            <*> fakeEmail
            <*> fakeHouseNumber
            <*> fakeStreetName
            <*> fakeStateName

Even to the point when fakeProfile is consumed, it is exactly the same as fakeFullname

g :: StdGen
(profile, _nextG) <- runFake fakeProfile g
(fullname, _nextG) <- runFake fakeFullname g

Exercise: Do you notice the similarity between profile and fullname?

I love it when things behave at larger scale the same way at smaller scale. Generating a random string appears so much similar to contructing a random profile, ie. with runFake randomizer g. The internal complexity is always encapsulated while the public interface is retained! How then do we make sure this pattern can be in use? We need to implement two “interfaces”, or type classes as we call it in Haskell, i.e. Functor and Applicative.

instance Functor Fake where
  fmap f (Fake randomizer) = Fake $ \g -> do
    (r, nextG) <- randomizer g
    pure (f r, nextG)

instance Applicative Fake where
  pure randomizer = undefined
  (<*>) :: (Fake (a -> b)) -> (Fake a) -> (Fake b)
  (Fake randomizer1) <*> (Fake randomizer2) = Fake $ \g -> do
    (f, nextG1) <- randomizer1 g
    (r, nextG2) <- randomizer2 nextG1
    pure (f r, nextG2)

The two interfaces speak nothing of the usage of our Fake randomizers. Why fmap f? What is pure? To be honest, one must read articles like Applicative Functors or even Notions of Computation as Monoids. However, as software engineer myself, I don’t see the point (yet) to fully apprehend the theories behind it. As far as I am concerned, these two type classes describes the relationships between functions (or “object structures”) in Haskell. To put in plain English, Functor Fake instructs the compiler how to apply a given function of type a -> b, where a is the same type parameter as in Fake a, to the content of instance of type Fake a in order to achieve an instance of type Fake b. It is a bit harder to repeat the same exercise for Applicative a. Therefore, I would like to “rehydrate” all the type parameters a and b (so that you can see what a beautiful mess C# developers have been putting up with for ages).

(<*>) :: (Fake (a -> b)) -> (Fake a) -> (Fake b)
-- in the case of newtype Fake can be expanded as
(<*>) :: (Fake (StdGen -> IO (a -> b, StdGen))) 
      -> (Fake (StdGen -> IO (a, StdGen))) 
      -> (Fake (StdGen -> IO (b, StdGen)))

It works as follows:

  1. Construct a new instance of Fake with the constructor Fake (StdGen -> IO (a, StdGen))
  2. Unpack the content of (Fake (StdGen -> IO (a -> b, StdGen))) so that we could get a function f :: a -> b
  3. Unpack also the content of (Fake (StdGen -> IO (a, StdGen))) to retrieve r :: a
  4. Apply the function f on r and return according to the contract defined by newtype Fake

Such is how the relationship between a and b in a computation context defined. However, you may ask, “How come we end up with Fake (a -> b)? How do we write a function that generates a random function that generates a random value?” Answer: you have seen it done! With fmap! An instance of Fake (a -> b) is created as follows:

fakeFirstName :: Fake String    -- Fake (StdGen -> (a, StdGen))
f :: String -> String -> String -- :: a0 -> a -> b
fmap f fakeFirstName            -- :: f              <$> fakeFirstName 
                                -- :: (a0 -> a -> b) <$> (Fake (StdGen -> (a, StdGen)))
                                -- >>                    (Fake (StdGen -> (a -> b, StdGen)))

And that is the little secret behind chaining <$> with a series of <*>. Haskell evaluates multiparameter functions partially when given not sufficient parameter values.

fakeProfile = construct
            <$> fakeFirstName
            <*> fakeFamilyName
            <*> fakeEmail
-- (String -> String -> String -> String)  <$> (Fake String) <*> (Fake String) <*> (Fake String)
--                                          ⬇
--                        (Fake (String -> String -> String))<*> (Fake String) <*> (Fake String)
--                                          ⬇
--                                          (Fake (String -> String))          <*> (Fake String)
--                                                                                  Fake String

It is a stumbling block for Haskell beginners (me included) to wrap their head around such abstractions. Other authors may recommend different paths to understand Haskell concepts, including OOP analogies. But I would beg to differ: I strongly suggest you watch the YouTube series on Category Theory I by Dr.Bartosz Milewski. So that you could rid yourselves of OOP baggage, especially when you come from Java or C# world.

See commit Main.hs@0de61a7

Exercise: Could you fix the partial implementation by providing the definition for pure randomizer?

Exposing to the WWW: Mapping between Categories

As library, cuckoobird can be consumed to generate random complexed value however we want, depending on how we shape the construct function. However, I would like to offer cuckoobird as a webservice, in which user-submitted JSON templates will turn into random values of predefined structures, instead of submitting pure Haskell code!

+-----------+     +--------------+     +------------------+
| Template  |     |  Config      |     |  Structure of    |
|           |     |              |     |  Randoms         |
|           +---->|              +---->|                  |
|           |     |              |     |                  |
+-----------+     +--------------+     +------------------+

For example, a template such as:

{
    "customer": "fullname",
    "dob": "past-date",
    "company": "company",
    "domain": "domain"
}

should yield

{
    "customer": "William Allen",
    "dob": "1976-02-11",
    "domain": "eonline.com",
    "company": "Kroger"
}

where “fullname”, “past-date”, “company” and “domain” are keys in a lookup table of predefined randomizers.

The challenge

Up to this point, all my randomizers are defined in lib/CuckooLib.hs. It must not require me to modify this module in order to implement a graph-like random value. I must be able to compose instances of Fake a, where a is a graph-like structure.

Jumping thru hoops

Hoop #1

Before we could charge full speed forwards, let walk back a bit, while training our eyes on the final goal: Transforming graph-like object to a graph-like structure of random values. As a baby step forward, let us first transform a list of Fake a into a list of random values.

listOfRandomizers :: [Fake String]
listOfRandomizers = [fakeFullname, fakeCompany, fakePastDate, fakeDomain]

Now that our runFake function does not accept [Fake a] but rather Fake [a], we need to transform [Fake a] into Fake [a]. Out of the box, Haskell provides us with a function for this very purpose, i.e. sequenceA or “A sequence of applicatives”.

applicativeOfSequence :: Fake [String]
applicativeOfSequence = sequenceA listOfRandomizers

g <- newStdGen
(values, _nextG) <- runFake applicativeOfSequence g
-- values = ["James Williams","AT&T","2007-01-28","amazon.co.jp"]
Hoop #2

One hurdle finished, some more to go! Because in the end, we will apply Map.fromList on [(String, String)] to construct Map String String, which is JSON serializable with aeson library.

Exercise: We see the pattern quite often now:

  • Applying runFake on Fake String yields a String
  • Applying runFake on Fake [String] yields a list [String]
  • Applying runFake on what to achieve a list [(String, String)]?

Answer: Fake [(String, String)] (where the second string is the random value)

Since the second string refers to a random value, it should also be clear the raw material from which we compose instances of Fake [(String, String)] is [(String, Fake String)]. Putting the steps back to back, we have the following series of transformations:

asApplicative :: (String, Fake a) -> Fake (String, a)
asApplicative (attr, faker) = (attr,) <$> faker
-- (attr,) is enabled via TupleSection extention. It is a short hand for: f value = (attr, value)

toListOfFakerTuples :: [(String, Fake String)] -> [Fake (String, String)]
toListOfFakerTuples configs = map asApplicative configs

toFakeListOfTuples :: [Fake (String, String)] -> Fake [(String, String)]
toFakeListOfTuples listOfFakerTuples = sequenceA listOfFakerTuples

The flow of transition should look like follows

[(String, Fake String)]
    ==(map asApplicative)=> [Fake (String, String)]
    ==(sequenceA)        => Fake [(String, String)]

app/CuckooNest.hs@73aa98

Hoop #3:

It is time that we think about how to turn a template [(String, String)] (or [(Attribute, Value)]) into [(String, Fake String)]. We need a lookup table and a lookup function:

tableOfCuckoos :: [(String, Fake String)]
tableOfCuckoos =
  [ ( "first-name", fakeFirstName )
  , ( "family-name", fakeFamilyName )
  , ( "fullname", fakeFullname )
  , ( "email", fakeEmail )
  ]

lookupCuckooGen :: String -> Maybe (Fake String)
lookupCuckooGen = flip lookup tableOfCuckoos

Obviously mapping over a [(String, String)] with a function String -> Maybe (Fake String) yields [(String, Maybe (Fake String))] instead of [(String, Fake String)].

template :: [(String, String)]
configs :: [(String, Maybe (Fake String))]
configs = map lookupCuckooGen template

We approach this problem the same way as we did in Hoop #2.

[(String, Maybe (Fake String))]
    ==> [Maybe (String, Fake String)]
    ==> Maybe [(String, Fake String)]
asApplicative :: (String, Maybe a) -> Maybe (String, a)
asApplicative (attr, maybeFaker) = (attr,) <$> maybeFaker

toListOfMaybeTuples :: [(String, Maybe a)] -> [Maybe (String, a)]
toListOfMaybeTuples configs = map asApplicative configs

toMaybeListOfTuples :: [Maybe (String, a)] -> Maybe [(String, a)]
toMaybeListOfTuples listOfMaybeTuples = sequenceA listOfMaybeTuples

I’m having dejavu now! I believe I have seen these three functions before! Right, in Hoop #2, when we were working on Fake a, (String, Fake a) to ultimately achieve Fake [(String, a)]. The beauty of Haskell is this: we apply the same functions for the same concepts. No extra interfaces is required. No extra ivory classes. Because we live in one already – HAR HAR HAR.

To be honest, I have been learning a lot about Haskell thanks to hlint. The above fragment could be rewritten as follows:

toMaybeListOfTuples :: [(String, Maybe a)] -> Maybe [(String, a)]
toMaybeListOfTuples configs = sequenceA . map asApplicative configs

which hlint recommends rephrasing to

toMaybeListOfTuples configs = traverse asApplicative configs

which hlint also recommends eta-reduction:

toMaybeListOfTuples = traverse asApplicative

What about asApplicative? We have also seen its definition with regard to Fake a

asApplicative :: (String, Fake a) -> Fake (String, a)
asApplicative (attr, faker) = (attr,) <$> faker

asApplicative :: (String, Maybe a) -> Maybe (String, a)
asApplicative (attr, maybeFaker) = (attr,) <$> maybeFaker

What is in common between Fake a and Maybe a? Both can be acted on by (<$>) – TADA! They are both functors. Therefore, the function asApplicative should be rewritten as:

asApplicative :: Functor f => (String, f a) -> f (String, a)
asApplicative (attr, f) = (attr,) <$> f

See commit app/CuckooNest.hs@73aa98

Hoop #4:

We have gone a long distance to cover newtype Fake a, Functor, Applicative and how to transform different structures like Fake, Tuple, List, Map and Maybe. We have managed to transform [(String, String)] into Fake [(String, String)] (via the use of a lookup table). Our final hoop is to transform a graph-like template into a graph-like structure of random values. I suggest the graph-like structure should have the following recursive definition:

data Graph a
  = Object (Map String (Graph a))
  | Leaf a

A graph-like template could be define as type Template = Graph String

template :: Template
template = Object $ M.fromList
  [ ("customer", Leaf "fullname")
  , ("dob", Leaf  "past-date")
  , ("company", Object $ M.fromList
      [ ("name", Leaf "company")
      , ("domain", Leaf "domain")
      ]
    )
  ]

so that a graph-like random structure could be achieved as follows:

type FakeGraph = Graph String

fakeGraph :: FakeGraph
fakeGraph = Object $ M.fromList
  [ ("customer", Leaf "William Allen")
  , ("dob", Leaf  "1976-02-11")
  , ("company", Object $ M.fromList
      [ ("name", Leaf "Kroger")
      , ("domain", Leaf "eonline.com")
      ]
    )
  ]

You could immediately notice that the relationship between the assigned value to Leaf nodes in Template and FakeGraph. They are keys in the lookup table being mapped to faker random result. However, we must take into account the recursive nature of our graph-like structure. In other words, for each node in the graph structure of Template:

  • If it is a Leaf:
    • Look up the predefined faker by node value
    • If a faker is found, run the faker to extract a random value
    • If a faker is not found, it is an Exercise for you now
  • If it is an Object Map:
    • Step into the nested structure
    • and repeat

Without a surprise, Haskell provides us with a function for this very purpose, i.e. traverse or “Apply the same action to a traversable structure”. The signature of traverse goes like follows:

traverse :: (Applicative f, Traversable t) => (a -> f b) -> t a -> f (t b)

In Hoop #3, we have observed traverse in action once:

toMaybeListOfTuples ::                                  [(String, Maybe a)] -> Maybe [(String, a)]
toMaybeListOfTuples configs = traverse asApplicative    configs
--                                     (a -> f b)    -> t a                 -> f (t a)

which demonstrates that “List of Tuples” (or more generally [a]) is an instance of Traversable. Therefore, to turn [(String, Graph a)] “inside out” to Graph [(String, a)], Graph must be an instance of Traverable. To implement Tranversable, a data type should be a Functor and a Foldable. We can modify its definition by auto-deriving the typeclass.

class (Functor t, Foldable t) => Traversable t where
-- ...

data Graph a
  = Object (Map String (Graph a))
  | Leaf a
  deriving (Foldable)

instance Functor Graph where
  fmap f (Leaf a) = Leaf $ f a
  fmap f (Object m) = Object $ f' <$> m
    where
      f' (Leaf a)   = Leaf $ f a
      f' (Object m) = Object $ fmap f' m

instance Traversable Graph where
  traverse f (Leaf a) = Leaf <$> f a
  traverse f (Object m) = Object <$> traverse f' m
    where
      f' (Leaf a)    = traverse f (Leaf a)
      f' (Object m') = traverse f (Object m')

Could you believe that now the work is almost done, without even changing a line of code in the CuckooLib.hs file? Transforming a template to graph-like structure of random value is now done with some lines of code:

lookupCuckooGen :: String -> Maybe (Fake String)
lookupCuckooGen = flip lookup tableOfCuckoos

-- traverse :: (a -> f b) -> t a -> f (t b)
-- t :: Graph a
-- a :: String
-- f :: Maybe b
-- b :: Fake String
-- traversable :: (a -> f b)
-- output :: f (t b) ==> Maybe (Graph (Fake String))

maybeGraph :: Maybe (Graph (Fake String))
maybeGraph = traverse lookupCuckooGen                    template
--                     String    Maybe Graph(Fake String)     Graph String
--                    (a      -> f    (t     b         )) -> (t     a)

unwrapping maybeGraph gives us a Graph (Fake String), upon which we can apply sequenceA

sequenceA :: (Traversable t, Applicative f) => t (f a) -> f (t a)

graphOfFakes :: Graph (Fake String)
graphOfFakes = fromJust maybeGraph

fakeGraph :: Fake (Graph String)
fakeGraph = sequenceA graphOfFakes
--                    Graph (Fake String) ==> Fake (Graph String)
--                    t     (f    a)          f    (t     a)

And there you go, by applying runFake over Fake (Graph String) you will land Graph String, which is the nested recursive structure of random values.

g <- newStdGen
(result, _nextG) <- runFake fakeGraph g

Exercise: How do you convert JSON input into Graph String and result :: Graph String back into JSON bytestring?

Exercise: How do you generate an array of Graph String for given template? I would like to have multiple items in a single response.

Exercise: How would you adapt the current implementation to the requirement for randomized integers (instead of only strings)?