A New Year Resolution: F# Practice – Frequency chart, some graphics

2014 will be the year I get stuck into F# and more – so the grand plan:

  • The basics, starting with the sample below
  • Computation expressions and Async/Parallel basics
  • Digging into Deedle/R
  • Hadoop
  • MongoDB and Neo4j

The projects at time of writing will start as simple as the one below, however the current plan is:

  • DNA-Ribosome string simulator – this will require parallel and agent based coding to enable it to handle hundreds of thousands of bases and compile them into actual protein lists. This relies on external data such as http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html/index.cgi?chapter=tgencodes
  • Simple Gene expression programming – this will allow me to get the hang of code quotations and splicing, and may lead to the next project
  • Neural networking – this may be combined with the above project, as I’ve only got a beginner’s view of both subjects!

Like a few bloggers, I am finding that my own site is a good place to review code solutions that I’ve puzzled out for myself. As I’m not a fan of the cut n paste via google method, I’ve got to wrestle with syntax line by line. So here is where I’ll keep the results.

Today is a couple of hours of foundation work using charts, pattern matching and a few other basic ideas – the chart below is something useful to budding cryptography students: it shows the most commonly occurring letters in a sample of text. The next stage is to make the functions work as parallel units for multi file reading.

frequency chart

open FSharp.Charting
open System.Windows.Forms
open System.Drawing

// Counts how many times ch occurs in str.
let counter (str: string, ch: char) =
    str |> Seq.sumBy (fun s -> if s = ch then 1 else 0)

let chrs = ['e';'a';'o';'s';'t'] 
let sample = "tesqatetweafwefwlvuioneuiounrv;onr;rvoian;c;oi;oijewrrfaa"
// Pair each letter (as a string label for the chart) with its count in the sample.
let character = chrs |> List.map (fun c -> string c, counter (sample, c))

let frequencyChart = character |> Chart.Bar

let form = new Form(Visible = true , 
                    TopMost = true,
                    Width = 700, 
                    Height = 500)

form.Controls.Add( new ChartTypes.ChartControl(frequencyChart, 
                                               Dock = DockStyle.Fill))
Application.Run(form)
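The five letters in chrs are hard-coded. As a rough sketch only (it was not part of the original session), the same count could be taken over every letter in the text with Seq.countBy. The name letterFrequencies is an assumption made up for this illustration.

```fsharp
// Hypothetical helper: counts every letter in the text, most frequent first.
let letterFrequencies (text: string) =
    text.ToLowerInvariant()
    |> Seq.filter (fun c -> System.Char.IsLetter c)   // ignore punctuation and spaces
    |> Seq.countBy id                                 // (letter, occurrences) pairs
    |> Seq.sortByDescending snd                       // highest count first
    |> Seq.map (fun (c, n) -> string c, n)            // string labels, as the chart expects
    |> Seq.toList
```

The result is a list of (label, count) pairs, the same shape as character above, so it could be passed to Chart.Bar in the same way.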

For a diversion, I played with an idea based on something I saw in Logo years ago:

pattern

open System.Windows.Forms
open System.Drawing

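let showForm() =
    let form = new Form(TopMost = true, ClientSize = new Size(600, 400))
    // Fan lines out from each corner to points down the vertical centre line (x = 300).
    form.Paint.Add(fun pe ->
        pe.Graphics.Clear(Color.White)
        for i in 0 .. 40 do
            // The loop body must be indented further than the for keyword.
            let y = i * 10
            pe.Graphics.DrawLine(Pens.DarkMagenta, 1, 1, 300, y)
            pe.Graphics.DrawLine(Pens.Black, 1, 400, 300, y)
            pe.Graphics.DrawLine(Pens.Red, 600, 400, 300, y)
            pe.Graphics.DrawLine(Pens.Green, 600, 1, 300, y))
    // Show the form only once the paint handler is attached.
    form.Show()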

showForm()

New Years Resolution, DNA, GA and more

For the past day or so a little diagram of how DNA/RNA bases are turned into amino acids has been going around my head. I think it’s a project to be done 🙂

So this is a personal challenge that currently is months/years outside of my current experience level, which is the point.

To do it, I’ve got to understand enough of several fields at once to make the thing meaningful.

The broad outlines start with the DNA transcriber. Using the simplest elements of F#, the first stage will be to construct a tree data structure suitable for hunting down triplets of bases.

So we have cytosine, guanine, adenine and thymine that are used to code for 20 or so amino acids and stop/start points. This is a great candidate for a computation expression as the project will need to be fully parallel.
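As a first, non-parallel sketch of the triplet lookup, a pattern match over three-letter strings could stand in for the tree. The codon table here is only a small subset of the standard genetic code, and the names Residue, translateCodon and translate are assumptions invented for this illustration. The sketch also assumes the input is a DNA coding strand read in frame from its first base.

```fsharp
// Hypothetical result type for one triplet of bases.
type Residue =
    | Amino of string      // three-letter amino acid abbreviation
    | Stop                 // one of the stop codons
    | Unknown of string    // a triplet not in this partial table

// A deliberately partial table: only a handful of codons from the standard code.
let translateCodon (codon: string) =
    match codon with
    | "ATG" -> Amino "Met"                       // also the usual start codon
    | "TTT" | "TTC" -> Amino "Phe"
    | "GGT" | "GGC" | "GGA" | "GGG" -> Amino "Gly"
    | "AAA" | "AAG" -> Amino "Lys"
    | "TGG" -> Amino "Trp"
    | "TAA" | "TAG" | "TGA" -> Stop
    | other -> Unknown other

// Reads the strand three bases at a time and stops at the first stop codon.
let translate (dna: string) =
    dna.ToUpperInvariant()
    |> Seq.chunkBySize 3
    |> Seq.filter (fun triplet -> triplet.Length = 3)   // drop a trailing partial triplet
    |> Seq.map (fun triplet -> translateCodon (System.String(triplet)))
    |> Seq.takeWhile (fun residue -> residue <> Stop)
    |> Seq.toList
```

For example, translate "ATGTTTGGCTAAAAA" gives [Amino "Met"; Amino "Phe"; Amino "Gly"], because the fourth triplet, TAA, is a stop codon. Each triplet is looked up independently of the others, which is what makes the work easy to split up later.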

The full project will require both parallel and agent based models, make use of online data sources and will have the ultimate aim of translating arbitrary lengths of DNA into proteins.

As it stands, the basic components lend themselves well to being very parallel.

Following this project will be two further exploratory F# pieces. The first will be basic neural networking, the second will be about gene expression programming. The latter will be a deeper experiment to see how plausible generating neural networks by GEA/GA techniques will be.

Code generation has always been a minor interest to me, so this project will stretch this notion to the limit. Generate neural networks efficiently from evolved GEA code? I may find this is a naïve view of the field, after all I’m very much a novice at coding.

That just makes it more interesting.

Eventually the projects may join up but that’s a whole other thing.

That’s the New Year’s Resolution – dig into some serious AI work as described. This will ensure that the languages I need to use will be pushed to their limits.

A fuller description may be added to this post later!

F# Monad / Computation Expressions and a chart

A two hour hack and write session:

A first go at a computation expression. Based on a simple coding challenge that was much harder in interview than at home, I decided to go a step further and see how quickly I could grok enough of computation expressions to make one work. Below is version 1. The short explanation: define a function that takes a value AND another function as parameters, and use the latter function to produce its return value. In the type below this goes a bit further because we have members that can be used in various ways. I’ve only scratched the surface, as I wanted to ensure I disposed of my streamReader each time. Next stage, multithreading!

open System.IO

let filePath = new DirectoryInfo(@"c:\dropbox")

// Counts comma-separated fields, which stand in for "words" in a CSV file.
let countWords(x: string) = x.Split(',').Length

type fileStream() =
    member this.Using( x : FileInfo, c: string -> int)=
                                // Bind the reader itself with use, so it is the StreamReader
                                // (not the Task) that is disposed when this method returns.
                                use streamReader = x.OpenText()
                                streamReader.ReadToEndAsync().Result |> c
    member this.Return(x : string ) = countWords x 

let str = fileStream()

let WordCount = filePath.EnumerateFiles("*.csv", SearchOption.AllDirectories) 
                         |> Seq.sumBy(fun x -> str.Using(x, countWords) ) 

printfn "%i" WordCount
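The code above calls Using directly as an ordinary method. For comparison, here is a minimal sketch of how the same two members could be driven by the computation expression syntax itself. The names FileReaderBuilder, fileReader and countFields are assumptions made up for this illustration. Inside the braces, the compiler turns use into a call to Using (a value plus a function for the rest of the block) and return into a call to Return.

```fsharp
open System.IO

// Hypothetical builder with just enough members for use and return.
type FileReaderBuilder() =
    // Receives the resource and a function for the rest of the block,
    // and disposes the resource however that function exits.
    member this.Using(resource: #System.IDisposable, body) =
        try body resource
        finally resource.Dispose()
    // Wraps the final value; here it is simply passed through.
    member this.Return(x) = x

let fileReader = FileReaderBuilder()

// Counts comma-separated fields in whatever the reader supplies.
let countFields (reader: TextReader) =
    fileReader {
        use r = reader                             // becomes fileReader.Using(reader, fun r -> ...)
        return r.ReadToEnd().Split(',').Length     // becomes fileReader.Return(...)
    }
```

Passing a StringReader keeps the sketch free of real files: countFields (new StringReader("a,b,c")) gives 3, and file.OpenText() from a FileInfo could be passed in the same way.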

These are considered quite hard, unless you’re already very skilled. However, as I look at this, the hard part is getting the syntax right; conceptually it’s not quite as bad as all that.

UPDATE – A few beers later (shared with my better half), I decided to add something more visual to this demo.

open System.IO
open Microsoft.FSharp
open FSharp.Charting
open System.Windows.Forms
open System.Drawing

let filePath = new System.IO.DirectoryInfo(@"e:\Dropbox")

let wordCount ( x: string) = x.Split(',').Length

type streamRead() = 
    member this.Using( file : FileInfo, c: string -> int) = 
        use x = file.OpenText()
        x.ReadToEnd() |>  c  
let stream = streamRead()


let FileTotal = filePath.EnumerateFiles("*.csv", SearchOption.AllDirectories) 
                        |>  Seq.map ( fun x -> stream.Using(x, wordCount) )  |> Chart.Line

let form = new Form(Visible = true, TopMost = true, Width = 700, Height = 500)

form.Controls.Add(new ChartTypes.ChartControl(FileTotal, Dock = DockStyle.Fill))
Application.Run(form) 

It produces a dazzling, exciting and enlightening chart – oh alright, it’s the very bare minimum but still it’s a quickie: Chart1

Simple substitution cipher breaker

Some months back a website offered up a small challenge: figure out a password they had encrypted. As there wasn’t enough text to do a statistical analysis (read: figure out e, s etc.) I resorted to brute force: try all of them. Twenty minutes of hacky VB got me this:

  Public Class codeBreakerFrm

    Private Sub Main()
          InitializeComponent()
    End Sub

    Private Sub GetDecodedText(integers As List(Of Integer))
        cipherText = String.Join(",", integers)
        For Each letterComboSet In letterMatrix
            For Each code In integers
                decodedLetters += letterComboSet(code)
            Next
            decodedLetters += ControlChars.NewLine
        Next
    End Sub

    Private Sub getLetterList()
        For index = 1 To 26
            Dim i = index
            Dim letters As New Dictionary(Of Integer, Char)
            For Each c In "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                If i > 26 Then i = 1
                letters.Add(i, c)
                i += 1
            Next
            letterMatrix.Add(letters)
        Next
    End Sub

    Private Property letterMatrix As New List(Of Dictionary(Of Integer, Char))
    Private Property decodedLetters As String = ""
    Private Property cipherText As String = ""

    Private Sub codeBreakerFrm_Load(sender As Object, e As EventArgs) Handles MyBase.Load
        getLetterList()
        Dim CodedStuff As New List(Of Integer) From {8, 6, 15, 2, 7, 17, 19, 2, 18}
        GetDecodedText(CodedStuff)
        TextBox1.Text = cipherText
        revealedText.Text = decodedLetters
    End Sub

End Class
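For comparison with the F# posts, here is a rough sketch of the same brute-force idea in F#. It is an illustration only, not the code that was actually used, and the names alphabet, decodeWithShift and allCandidates are assumptions. As in the VB above, each of the 26 shifts slides the numbering 1 to 26 along the alphabet, and every shift is tried in turn.

```fsharp
let alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

// Decodes a list of 1-based letter codes under one shift (0 to 25).
// With shift 0, code 1 is A; with shift 1, code 2 is A and code 1 wraps round to Z.
let decodeWithShift (shift: int) (codes: int list) =
    let letters =
        codes
        |> List.map (fun code -> alphabet.[((code - 1 - shift) % 26 + 26) % 26])
        |> List.toArray
    System.String(letters)

// Brute force: one candidate plaintext per possible shift.
let allCandidates (codes: int list) =
    [ for shift in 0 .. 25 -> decodeWithShift shift codes ]
```

Calling allCandidates [8; 6; 15; 2; 7; 17; 19; 2; 18] |> List.iter (printfn "%s") prints all 26 candidates, one per line, to be scanned by eye for the one that reads as a word.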

Meditation

I’m learning to meditate, only in a simple way, however every time I do, this kind of thing happens:

<RecursionToPosteriorHazard>

To program anything, it must be quantifiable, knowable, decidable.

If it can be represented by number, by comparison or by decision it is computable.

The real task of the programmer is understanding how to make the real world computable.

Once done, the rest is merely details.

Lots of details.

</RecursionToPosteriorHazard>

What should you learn first about programming?

Abstraction – Why?

Because complex ideas are built from simpler ones
,that are built from simpler ones
,built from even simpler ones
….

Values and variables – Why?

Because some things should not change
,other things need to change but often rely on those that don’t.

Loops and recursion – Why?

Because the world is built from cycles and rhythms
,iterations of period
,growth and gradual change.

Choices and decisions – Why?

Because life itself constantly chooses
,variably decides
,yet some values stay the same.

Data structures – Why?

Because everything needs somewhere logical to live
,and structure itself is an essence of life.

Algorithms and heuristics – Why?

Because life is a process of processes of processes:
some exact, some fuzzy
,some downright random.

How to solve problems – Why?

Solving problems ultimately drives the evolution of life
,no matter how random the forces themselves appear

How to learn something new – teach a ten year old!

Teaching a ten year old!

I’ve started a personal mission quite recently to master the maths that I would have studied at A level, had I taken A levels. It’s a lot easier to learn algebra if one has done some programming; it’s a lot easier to appreciate.

However, it’s easier to understand just what you understand when you teach it to a ten year old. Especially when said ten year old only needs maybe fifteen minutes of a pen and paper explanation to just get it.

The ten year old is my nephew, and teachers who love their jobs will love teaching him. As for me, maybe I can pinch some reflected glory as he becomes a top scientist or something?

A few weeks back, we covered sets, differential calculus (the power rule), list comprehensions and transposition ciphers. As I said, he’s quick! Teaching clarifies one’s own understanding, because somehow there’s a need to see the other guy ‘get it’. That driver alone can deepen intuition and open up a new way to see the world. I guess if you want to understand something, try teaching it!