Thought Process Behind a Cuckoo Nest
TL;DR;
Source code at khanhhua/cuckoo
A word on the cuckoo bird
I did not learn of the infamous trick of the cuckoo bird from books or Discover Channel but rather from Doraemon the Japanese manga by the late beloved authors under the name Fujiko Fujio. When it is time that mother cuckoo laid her egg, she will deliver in a foreign nest. The baby cuckoo will then hatch and parasitically live off on its naive surrogate mom, which cannot distinguish her biological offsprings from the adopted one.
Synopsis
This library generates combinations of randomized data for any giving templates which should mimic the your project data model and mass generate them for testing purposes.
Request
curl --location --request POST 'http://localhost:3000/' \
--header 'Content-Type: application/json' \
--data-raw '{
"customer": "fullname",
"primary_email": "email",
"secondary_email": "email",
"home_address": "address",
"dob": "past-date",
"employer": {
"name": "company",
"domain": "domain"
}
}'
Response
[
{
"customer": "Mia Robinson",
"dob": "2011-09-25",
"employer": {
"domain": "youronlinechoices.com",
"name": "Pfizer"
},
"home_address": "946 Albemarle Avenue, Oklahoma",
"primary_email": "mia.tierradelfuego_2266@bing.com",
"secondary_email": "james.kyushu_9063@researchgate.net"
},
{
"customer": "Benjamin Campbell",
"dob": "1984-11-23",
"employer": {
"domain": "rambler.ru",
"name": "Verizon Communications"
},
"home_address": "453 Aalto Place, Maine",
"primary_email": "liam.redrock_8292@www.over-blog.com",
"secondary_email": "william.melville_9613@nps.gov"
}
]
The idea is very much similar to https://fakerjs.dev. Therefore, I cannot claim any novelty but rather that a well understood problem domain should help me reason in the FP paradigm. The endgoal is multifold:
- To better understand Haskell and FP
- To reason in FP
- To deliver a practical (in the wild yet domesticated) Haskell product
I am going to focus on the second point within this article.
Design and Thought Process
Building Blocks
To be honest, from the beginning of the project I totally forgot how to generate random values in Haskell. I had to constantly refer to examples from the book Haskell in Depth regarding random value generation. Basically, the example shows us how to generate series of random number with a state monad:
random :: RandomGen g => g -> (a, g)
-- To retrieve a random
g <- newStdGen
(value, _nextG) = random g
state :: (s -> (a, s)) -> m a
runState :: State s a -> s -> (a, s)
statefulRandom = state random
seriesOfRandom = replicateM 5 statefulRandom
-- To retrieve a series of 5 random
(value, _nextG) = runState seriesOfRandom g
However, I want to be able to retrieve a random text from existing text files, instead of just any number or just any series of characters. In Haskell, that would involve the IO monad. Somehow, I gotta stick IO
somewhere in one of the definitions above. Well – modifying the signature of random :: g -> (a, g)
makes the most sense to me!
That is, a function which generates randoms with IO capability should take on the following appearance.
randomIO :: g -> IO (a, g)
It reads: randomIO is a function that takes in a Random Generator g
and returns an IO
computation context that returns a tuple of the random value and the next state of the generator. But how to determine what a
is?
newtype
In Haskell we can introduce new data types that take in another data type as its parameter. In C# or Java, we have such a concept implemented as generic classes, like ICollection<T>
or Enumerable<T>
. As a pure functional programming language, Haskell treats functions as first class citizens: generic functions are equal with generic classes. Enter newtype…
newtype Fake a = Fake
{ runFake :: StdGen -> IO (a, StdGen)
}
It reads: Here is a new type named Fake a
.
- An instance of
Fake a
is constructed withFake
followed by a function which takes in an instance ofStdGen
and returns anIO
computation context that returns a tuple of a value of typea
and the next state ofStdGen
instance. runFake
is the getter for instances ofFake a
. CallingrunFake
will return the encapsulated function.
I then implemented a text file based enum random generator. Keep in mind that I would be reading texts from different files.
fakeString :: FilePath -> Fake String
fakeString p = Fake f
where
f :: StdGen -> IO (String, StdGen)
f gen = do
names <- words <$> readFile p
let
(num, nextG) = uniformR (0, length names) gen
pure (names !! num, nextG)
Exercise: There was actually a logical error in this implementation which got fixed later on. Can you spot it?
Now I can have a variety of random generators with different text categories, e.g. common last names and first names.
fakeFirstName = fakeString "data/first-names.txt"
fakeFamilyName = fakeString "data/family-names.txt"
How to run these randomizers?
main :: IO ()
main = do
g <- newStdGen
-- The following would generate a random first name
(fname, nextG) <- runFake fakeFirstName g
print fname
-- The following would generate a random family name
(lname, _nextG) <- runFake fakeFamilyName nextG
print lname
Allowing Customization via Composition
However, first names and last names normally come hand in hand as fullnames. A randomization based on the list of last names and first names should generate an m . n
amount of possible outcomes (where m
and n
are respectively number of lines in each file). I don’t wanna sit there and manually copy and paste items from both files into a new text file called fullnames.txt
!
Indeed Haskell is great at composing! One rule of composing is that: Thing A and thing B of the same type Thing
must together result in something also of type Thing
. In our case study, the type of Thing
is Fake String
. Thus composing two things of the same type means:
construct :: String -> String -> String
-- I will discuss the role of mapping function right below
fakeFullname = construct <$> fakeFirstName <*> fakeFamilyName
g :: StdGen
(value, _nextG) <- runFake fakeFullname g
It reads: combine fakeFirstName
randomizer with fakeFamilyName
randomizer, whose outputs (of String
data type) then get processed by a construct function, which transforms two strings into one string.
Note
To Haskellers out there, the above reading does not really describe truthfully the order of evaluation of <$>
and <*>
, both of which are infixl 4
functions. I am not lying nor misleading, but rather inviting people into a lift inside the ivory tower of technical jargons.
Now you ask, what good is it to have another layer of indirection, at the cost of learning two more mystic symbols? Scalability. It is all turtle down to the bottom, so to speak. I plan to create further randomizers other than just the existing two, namely:
- fakeHouseNumber
- fakeStreetName
- fakeStateName
- fakeEmail
The pattern of composing these all together remains the same and almost boring.
construct :: String -> String -> String -> String -> String -> String -> String
fakeProfile = construct
<$> fakeFirstName
<*> fakeFamilyName
<*> fakeEmail
<*> fakeHouseNumber
<*> fakeStreetName
<*> fakeStateName
Even to the point when fakeProfile
is consumed, it is exactly the same as fakeFullname
g :: StdGen
(profile, _nextG) <- runFake fakeProfile g
(fullname, _nextG) <- runFake fakeFullname g
Exercise: Do you notice the similarity between profile
and fullname
?
I love it when things behave at larger scale the same way at smaller scale. Generating a random string appears so much similar to contructing a random profile, ie. with runFake randomizer g
. The internal complexity is always encapsulated while the public interface is retained! How then do we make sure this pattern can be in use? We need to implement two “interfaces”, or type class
es as we call it in Haskell, i.e. Functor
and Applicative
.
instance Functor Fake where
fmap f (Fake randomizer) = Fake $ \g -> do
(r, nextG) <- randomizer g
pure (f r, nextG)
instance Applicative Fake where
pure randomizer = undefined
(<*>) :: (Fake (a -> b)) -> (Fake a) -> (Fake b)
(Fake randomizer1) <*> (Fake randomizer2) = Fake $ \g -> do
(f, nextG1) <- randomizer1 g
(r, nextG2) <- randomizer2 nextG1
pure (f r, nextG2)
The two interfaces speak nothing of the usage of our Fake
randomizers. Why fmap f
? What is pure
? To be honest, one must read articles like Applicative Functors or even Notions of Computation as Monoids. However, as software engineer myself, I don’t see the point (yet) to fully apprehend the theories behind it. As far as I am concerned, these two type classes describes the relationships between functions (or “object structures”) in Haskell. To put in plain English, Functor Fake
instructs the compiler how to apply a given function of type a -> b
, where a
is the same type parameter as in Fake a
, to the content of instance of type Fake a
in order to achieve an instance of type Fake b
. It is a bit harder to repeat the same exercise for Applicative a
. Therefore, I would like to “rehydrate” all the type parameters a
and b
(so that you can see what a beautiful mess C# developers have been putting up with for ages).
(<*>) :: (Fake (a -> b)) -> (Fake a) -> (Fake b)
-- in the case of newtype Fake can be expanded as
(<*>) :: (Fake (StdGen -> IO (a -> b, StdGen)))
-> (Fake (StdGen -> IO (a, StdGen)))
-> (Fake (StdGen -> IO (b, StdGen)))
It works as follows:
- Construct a new instance of
Fake
with the constructorFake (StdGen -> IO (a, StdGen))
- Unpack the content of
(Fake (StdGen -> IO (a -> b, StdGen)))
so that we could get a functionf :: a -> b
- Unpack also the content of
(Fake (StdGen -> IO (a, StdGen)))
to retriever :: a
- Apply the function
f
onr
and return according to the contract defined by newtypeFake
Such is how the relationship between a
and b
in a computation context defined. However, you may ask, “How come we end up with Fake (a -> b)
? How do we write a function that generates a random function that generates a random value?” Answer: you have seen it done! With fmap
! An instance of Fake (a -> b)
is created as follows:
fakeFirstName :: Fake String -- Fake (StdGen -> (a, StdGen))
f :: String -> String -> String -- :: a0 -> a -> b
fmap f fakeFirstName -- :: f <$> fakeFirstName
-- :: (a0 -> a -> b) <$> (Fake (StdGen -> (a, StdGen)))
-- >> (Fake (StdGen -> (a -> b, StdGen)))
And that is the little secret behind chaining <$>
with a series of <*>
. Haskell evaluates multiparameter functions partially when given not sufficient parameter values.
fakeProfile = construct
<$> fakeFirstName
<*> fakeFamilyName
<*> fakeEmail
-- (String -> String -> String -> String) <$> (Fake String) <*> (Fake String) <*> (Fake String)
-- ⬇
-- (Fake (String -> String -> String))<*> (Fake String) <*> (Fake String)
-- ⬇
-- (Fake (String -> String)) <*> (Fake String)
-- Fake String
It is a stumbling block for Haskell beginners (me included) to wrap their head around such abstractions. Other authors may recommend different paths to understand Haskell concepts, including OOP analogies. But I would beg to differ: I strongly suggest you watch the YouTube series on Category Theory I by Dr.Bartosz Milewski. So that you could rid yourselves of OOP baggage, especially when you come from Java or C# world.
Exercise: Could you fix the partial implementation by providing the definition for pure randomizer
?
Exposing to the WWW: Mapping between Categories
As library, cuckoobird
can be consumed to generate random complexed value however we want, depending on how we shape the construct
function. However, I would like to offer cuckoobird
as a webservice, in which user-submitted JSON templates will turn into random values of predefined structures, instead of submitting pure Haskell code!
+-----------+ +--------------+ +------------------+
| Template | | Config | | Structure of |
| | | | | Randoms |
| +---->| +---->| |
| | | | | |
+-----------+ +--------------+ +------------------+
For example, a template such as:
{
"customer": "fullname",
"dob": "past-date",
"company": "company",
"domain": "domain"
}
should yield
{
"customer": "William Allen",
"dob": "1976-02-11",
"domain": "eonline.com",
"company": "Kroger"
}
where “fullname”, “past-date”, “company” and “domain” are keys in a lookup table of predefined randomizers.
The challenge
Up to this point, all my randomizers are defined in lib/CuckooLib.hs
. It must not require me to modify this module in order to implement a graph-like random value. I must be able to compose instances of Fake a
, where a
is a graph-like structure.
Jumping thru hoops
Hoop #1
Before we could charge full speed forwards, let walk back a bit, while training our eyes on the final goal: Transforming graph-like object to a graph-like structure of random values. As a baby step forward, let us first transform a list of Fake a
into a list of random values.
listOfRandomizers :: [Fake String]
listOfRandomizers = [fakeFullname, fakeCompany, fakePastDate, fakeDomain]
Now that our runFake
function does not accept [Fake a]
but rather Fake [a]
, we need to transform [Fake a]
into Fake [a]
. Out of the box, Haskell provides us with a function for this very purpose, i.e. sequenceA
or “A sequence of applicatives”.
applicativeOfSequence :: Fake [String]
applicativeOfSequence = sequenceA listOfRandomizers
g <- newStdGen
(values, _nextG) <- runFake applicativeOfSequence g
-- values = ["James Williams","AT&T","2007-01-28","amazon.co.jp"]
Hoop #2
One hurdle finished, some more to go! Because in the end, we will apply Map.fromList
on [(String, String)]
to construct Map String String
, which is JSON serializable with aeson
library.
Exercise: We see the pattern quite often now:
- Applying
runFake
onFake String
yields aString
- Applying
runFake
onFake [String]
yields a list[String]
- Applying
runFake
on what to achieve a list[(String, String)]
?
Answer: Fake [(String, String)]
(where the second string is the random value)
Since the second string refers to a random value, it should also be clear the raw material from which we compose instances of Fake [(String, String)]
is [(String, Fake String)]
. Putting the steps back to back, we have the following series of transformations:
asApplicative :: (String, Fake a) -> Fake (String, a)
asApplicative (attr, faker) = (attr,) <$> faker
-- (attr,) is enabled via TupleSection extention. It is a short hand for: f value = (attr, value)
toListOfFakerTuples :: [(String, Fake String)] -> [Fake (String, String)]
toListOfFakerTuples configs = map asApplicative configs
toFakeListOfTuples :: [Fake (String, String)] -> Fake [(String, String)]
toFakeListOfTuples listOfFakerTuples = sequenceA listOfFakerTuples
The flow of transition should look like follows
[(String, Fake String)]
==(map asApplicative)=> [Fake (String, String)]
==(sequenceA) => Fake [(String, String)]
Hoop #3:
It is time that we think about how to turn a template [(String, String)]
(or [(Attribute, Value)]
) into [(String, Fake String)]
. We need a lookup table and a lookup function:
tableOfCuckoos :: [(String, Fake String)]
tableOfCuckoos =
[ ( "first-name", fakeFirstName )
, ( "family-name", fakeFamilyName )
, ( "fullname", fakeFullname )
, ( "email", fakeEmail )
]
lookupCuckooGen :: String -> Maybe (Fake String)
lookupCuckooGen = flip lookup tableOfCuckoos
Obviously map
ping over a [(String, String)]
with a function String -> Maybe (Fake String)
yields [(String, Maybe (Fake String))]
instead of [(String, Fake String)]
.
template :: [(String, String)]
configs :: [(String, Maybe (Fake String))]
configs = map lookupCuckooGen template
We approach this problem the same way as we did in Hoop #2.
[(String, Maybe (Fake String))]
==> [Maybe (String, Fake String)]
==> Maybe [(String, Fake String)]
asApplicative :: (String, Maybe a) -> Maybe (String, a)
asApplicative (attr, maybeFaker) = (attr,) <$> maybeFaker
toListOfMaybeTuples :: [(String, Maybe a)] -> [Maybe (String, a)]
toListOfMaybeTuples configs = map asApplicative configs
toMaybeListOfTuples :: [Maybe (String, a)] -> Maybe [(String, a)]
toMaybeListOfTuples listOfMaybeTuples = sequenceA listOfMaybeTuples
I’m having dejavu now! I believe I have seen these three functions before! Right, in Hoop #2, when we were working on Fake a
, (String, Fake a)
to ultimately achieve Fake [(String, a)]
. The beauty of Haskell is this: we apply the same functions for the same concepts. No extra interfaces is required. No extra ivory classes. Because we live in one already – HAR HAR HAR.
To be honest, I have been learning a lot about Haskell thanks to hlint
. The above fragment could be rewritten as follows:
toMaybeListOfTuples :: [(String, Maybe a)] -> Maybe [(String, a)]
toMaybeListOfTuples configs = sequenceA . map asApplicative configs
which hlint
recommends rephrasing to
toMaybeListOfTuples configs = traverse asApplicative configs
which hlint
also recommends eta-reduction:
toMaybeListOfTuples = traverse asApplicative
What about asApplicative
? We have also seen its definition with regard to Fake a
asApplicative :: (String, Fake a) -> Fake (String, a)
asApplicative (attr, faker) = (attr,) <$> faker
asApplicative :: (String, Maybe a) -> Maybe (String, a)
asApplicative (attr, maybeFaker) = (attr,) <$> maybeFaker
What is in common between Fake a
and Maybe a
? Both can be acted on by (<$>)
– TADA! They are both functors. Therefore, the function asApplicative
should be rewritten as:
asApplicative :: Functor f => (String, f a) -> f (String, a)
asApplicative (attr, f) = (attr,) <$> f
See commit app/CuckooNest.hs@73aa98
Hoop #4:
We have gone a long distance to cover newtype Fake a
, Functor
, Applicative
and how to transform different structures like Fake
, Tuple
, List
, Map
and Maybe
. We have managed to transform [(String, String)]
into Fake [(String, String)]
(via the use of a lookup table). Our final hoop is to transform a graph-like template into a graph-like structure of random values. I suggest the graph-like structure should have the following recursive definition:
data Graph a
= Object (Map String (Graph a))
| Leaf a
A graph-like template could be define as type Template = Graph String
template :: Template
template = Object $ M.fromList
[ ("customer", Leaf "fullname")
, ("dob", Leaf "past-date")
, ("company", Object $ M.fromList
[ ("name", Leaf "company")
, ("domain", Leaf "domain")
]
)
]
so that a graph-like random structure could be achieved as follows:
type FakeGraph = Graph String
fakeGraph :: FakeGraph
fakeGraph = Object $ M.fromList
[ ("customer", Leaf "William Allen")
, ("dob", Leaf "1976-02-11")
, ("company", Object $ M.fromList
[ ("name", Leaf "Kroger")
, ("domain", Leaf "eonline.com")
]
)
]
You could immediately notice that the relationship between the assigned value to Leaf
nodes in Template
and FakeGraph
. They are keys in the lookup table being mapped to faker random result. However, we must take into account the recursive nature of our graph-like structure. In other words, for each node in the graph structure of Template:
- If it is a Leaf:
- Look up the predefined faker by node value
- If a faker is found, run the faker to extract a random value
- If a faker is not found, it is an Exercise for you now
- If it is an Object Map:
- Step into the nested structure
- and repeat
Without a surprise, Haskell provides us with a function for this very purpose, i.e. traverse
or “Apply the same action to a traversable structure”. The signature of traverse
goes like follows:
traverse :: (Applicative f, Traversable t) => (a -> f b) -> t a -> f (t b)
In Hoop #3, we have observed traverse
in action once:
toMaybeListOfTuples :: [(String, Maybe a)] -> Maybe [(String, a)]
toMaybeListOfTuples configs = traverse asApplicative configs
-- (a -> f b) -> t a -> f (t a)
which demonstrates that “List of Tuples” (or more generally [a]
) is an instance of Traversable
. Therefore, to turn [(String, Graph a)]
“inside out” to Graph [(String, a)]
, Graph
must be an instance of Traverable
. To implement Tranversable
, a data type should be a Functor
and a Foldable
. We can modify its definition by auto-deriving the typeclass.
class (Functor t, Foldable t) => Traversable t where
-- ...
data Graph a
= Object (Map String (Graph a))
| Leaf a
deriving (Foldable)
instance Functor Graph where
fmap f (Leaf a) = Leaf $ f a
fmap f (Object m) = Object $ f' <$> m
where
f' (Leaf a) = Leaf $ f a
f' (Object m) = Object $ fmap f' m
instance Traversable Graph where
traverse f (Leaf a) = Leaf <$> f a
traverse f (Object m) = Object <$> traverse f' m
where
f' (Leaf a) = traverse f (Leaf a)
f' (Object m') = traverse f (Object m')
Could you believe that now the work is almost done, without even changing a line of code in the CuckooLib.hs
file? Transforming a template to graph-like structure of random value is now done with some lines of code:
lookupCuckooGen :: String -> Maybe (Fake String)
lookupCuckooGen = flip lookup tableOfCuckoos
-- traverse :: (a -> f b) -> t a -> f (t b)
-- t :: Graph a
-- a :: String
-- f :: Maybe b
-- b :: Fake String
-- traversable :: (a -> f b)
-- output :: f (t b) ==> Maybe (Graph (Fake String))
maybeGraph :: Maybe (Graph (Fake String))
maybeGraph = traverse lookupCuckooGen template
-- String Maybe Graph(Fake String) Graph String
-- (a -> f (t b )) -> (t a)
unwrapping maybeGraph
gives us a Graph (Fake String)
, upon which we can apply sequenceA
…
sequenceA :: (Traversable t, Applicative f) => t (f a) -> f (t a)
graphOfFakes :: Graph (Fake String)
graphOfFakes = fromJust maybeGraph
fakeGraph :: Fake (Graph String)
fakeGraph = sequenceA graphOfFakes
-- Graph (Fake String) ==> Fake (Graph String)
-- t (f a) f (t a)
And there you go, by applying runFake
over Fake (Graph String)
you will land Graph String
, which is the nested recursive structure of random values.
g <- newStdGen
(result, _nextG) <- runFake fakeGraph g
Exercise: How do you convert JSON input into Graph String
and result :: Graph String
back into JSON bytestring?
Exercise: How do you generate an array of Graph String
for given template? I would like to have multiple items in a single response.
Exercise: How would you adapt the current implementation to the requirement for randomized integers (instead of only strings)?