Tuesday 1 April 2014

Sorting and Efficiency  

Everyone has sorted a deck of cards or a set of numbers at least once. These tasks seem relatively simple to us, but it may be surprising how complicated sorting can be for a computer. Analyzing different sorting algorithms gives one a better understanding of the many ways sorting can be done, and is a great starting point for learning about algorithm efficiency. 

One of the first sorts that most people learn is insertion sort. The insertion sort algorithm is as follows:

Start with the second item in the list
Compare it to the item that precedes it
If the item is less than the item that precedes it, swap the two
Continue swapping until the preceding item is less than or equal to the selected item, or the item reaches the front of the list
Repeat for each remaining item in the list
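The steps above can be sketched as a short Python function (a minimal version of my own, not from any course handout):

```python
def insertion_sort(lst):
    """Sort lst in place using insertion sort, and return it."""
    for i in range(1, len(lst)):
        j = i
        # Swap the item backwards until the item before it
        # is no longer greater than it.
        while j > 0 and lst[j] < lst[j - 1]:
            lst[j], lst[j - 1] = lst[j - 1], lst[j]
            j -= 1
    return lst

print(insertion_sort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]
```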


An algorithm's worst-case run time can be described in big-O notation. If a sorting algorithm has a run time that grows proportionally to the size of the list (n), the run time is O(n). In insertion sort's case, the worst-case run time is O(n^2), which occurs when the list is sorted in reverse order. When the list is reversed, each of the n items must be compared against up to n other items, giving an n * n = n^2 run time.

There are many ways to describe the run time of an algorithm. Big-O describes an upper bound (worst case), big-omega (Ω) describes a lower bound (best case), and big-theta (Θ) describes a run time bounded both above and below. The insertion sort algorithm described previously has a best-case run time of Ω(n) on an already-sorted list. 

Another sort, with a better run time than insertion sort, is merge sort. Merge sort is a recursive algorithm that repeatedly divides the list in half, sorts each half separately, then merges the two sorted halves together. The base case of this algorithm is a list of size one, which is already sorted. The process can be visualized by the following gif:


http://en.wikipedia.org/wiki/File:Merge-sort-example-300px.gif
The worst-case run time of this algorithm is O(n log n), and the best case is also O(n log n). This is an example of an algorithm with a tight-bound run time (meaning the best and worst cases are of the same order). The algorithm's run time can also be described as Θ(n log n). 
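A sketch of merge sort in Python (again a minimal version of my own):

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One of the two halves may still have items left over.
    return merged + left[i:] + right[j:]

def merge_sort(lst):
    """Return a new sorted copy of lst."""
    if len(lst) <= 1:           # base case: already sorted
        return lst
    mid = len(lst) // 2
    return merge(merge_sort(lst[:mid]), merge_sort(lst[mid:]))

print(merge_sort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]
```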

A great resource for viewing various sorting algorithms is http://www.sorting-algorithms.com/

In the real world, data is often partially sorted, and some sorting algorithms can take advantage of this. One such sort is Timsort. Timsort is Python's built-in sort, and it has a worst-case run time of O(n log n) and a best case of O(n). Timsort combines insertion sort and merge sort, and as a result has a better best-case run time than merge sort. 
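In CPython, both the built-in sorted() function and list.sort() use Timsort, so you get this behaviour for free:

```python
data = [4, 1, 3, 2]
print(sorted(data))   # sorted() returns a new list, sorted with Timsort
data.sort()           # list.sort() sorts in place, also via Timsort
print(data)           # [1, 2, 3, 4]
```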

There are many different types of sorting algorithms, some better than others, and some better suited to certain types of data. Analyzing the run times of various algorithms helps one make informed decisions about which algorithm to use on a set of data. More importantly, analyzing sorting algorithms is a great way to introduce students to efficiency. 



Sunday 9 March 2014

Struggles with A2 part 1: Specializing flexibly

In assignment 2, part 1, we were asked to design a class or classes to represent various regex trees. The three types of trees specified in this assignment were Bar ("|"), Dot (".") and Star ("*"). There are specific restrictions on the number of children given to these trees: Bar and Dot trees can have one or two children, while Star can have only one child.

Initially I decided to model regex trees using a single, general RegexTree class. This class would take a root and a list of children, and depending on the given root, the RegexTree would impose restrictions on the number of children. There are a number of benefits to this model: all of the trees and subtrees in a regex tree would be the same class, and only one class needs to be written. However, this design has poor flexibility, and it would make expanding the types of regex trees allowed difficult.

This approach would look something like this:

class RegexTree:
    def __init__(self, root, nodes):
        # Enforce the child restrictions based on the root symbol.
        if root == '*' and len(nodes) != 1:
            raise ValueError("Star takes exactly one child")
        if root == '|' and len(nodes) not in (1, 2):
            raise ValueError("Bar takes one or two children")
        if root == '.' and len(nodes) not in (1, 2):
            raise ValueError("Dot takes one or two children")
        self.root = root
        self.nodes = nodes

Another approach would be to design a very general regex tree class with no restrictions on the type of root or the number of children, and then, for every type of tree (Bar, Dot and Star), design a class that inherits from the general RegexTree class. This approach means writing four classes instead of one, and overriding each specific class's __repr__() method. However, I believe this approach is superior to the single-RegexTree-class approach. Instead of one class specialized for the three tree types, this approach allows for easier expansion to any type of regex tree. The downsides to this approach are writing more code, and that not every tree is the same class.

This approach would look like this:

class RegexTree:
    def __init__(self, root, nodes):
        self.root = root
        self.nodes = nodes

class BarTree(RegexTree):
    def __init__(self, nodes):
        if len(nodes) not in (1, 2):
            raise ValueError("Bar takes one or two children")
        super().__init__("|", nodes)

...with a similar approach for the Dot and Star trees.

Both of these approaches have their pros and cons. The first approach is shorter, and all trees and subtrees would be the same class. However, the first approach does not allow for easy expansion of the types of trees. The second approach allows for easy expansion of the types of trees, simply add a new subclass of RegexTree. This approach requires more code, and does not have the benefit of all trees and subtrees being the same class.

For this part of the assignment I decided that the downsides of the second approach do not matter, and that it is the better approach. While all the trees and subtrees are not the same class, this doesn't matter because all of the trees are children of RegexTree, making equality checks and other operations easy. The only downside to this approach is having to write more code.
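A quick self-contained demonstration of why the shared base class keeps things easy (condensed versions of the classes above):

```python
class RegexTree:
    """A general regex tree: a root symbol and a list of children."""
    def __init__(self, root, nodes):
        self.root = root
        self.nodes = nodes

class BarTree(RegexTree):
    """A Bar ("|") tree, restricted to one or two children."""
    def __init__(self, nodes):
        if len(nodes) not in (1, 2):
            raise ValueError("Bar takes one or two children")
        super().__init__("|", nodes)

leaf_a = RegexTree("a", [])
leaf_b = RegexTree("b", [])
bar = BarTree([leaf_a, leaf_b])
print(isinstance(bar, RegexTree))  # True: every tree shares the base type
```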





Sunday 2 March 2014

Week 7: Recursion


We are all familiar with recursion, whether we know it or not. Recursive structures are everywhere, from the food we eat to a seashell. Recursion is the process of describing repeating items in a self-similar way (Wikipedia). Recursion frequently occurs in nature, and one example is a fern.

This fern, dubbed the Barnsley fern, is named after the mathematician Michael Barnsley, who first described it. Each leaflet is a smaller version of the whole fern, each sub-leaflet is a smaller version of the leaflet, and so on.

Recursion in Computer Science

In computer science there are certain problems that are best solved using a recursive solution. A recursive solution is one where the problem is reduced to a self-similar simpler form until a base case is reached.
For example, recursion is very useful when dealing with nested lists. Suppose one wanted to sum up all the numbers in a nested list (e.g. the sum of [1,3,[1,[2,3]],5] is 15). The way most people naturally think about this problem is recursive: we look at each item in the list; if it is a number, we add it to a running sum, and if it is another list, we add up all the items in that sublist and add that total to the sum. In this example, the base case is when the item is a number. If the item is another list, we reduce the problem to adding up the numbers in that sublist; if the item is a number, we do not reduce the problem, we simply add it to the total. One way to write such a function in Python is as follows.
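One way to sketch this in Python, flattening each level with a list comprehension:

```python
def sum_list(lst):
    """Return the sum of all numbers in a possibly nested list."""
    # Build a flat list: numbers stay as-is, sublists are
    # replaced by their (recursively computed) sums.
    return sum([sum_list(x) if isinstance(x, list) else x
                for x in lst])

print(sum_list([1, 3, [1, [2, 3]], 5]))  # 15
```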


This code constructs a list of integers using a Python list comprehension, and takes the sum of that list. With the list [1,2,3], the function creates the list [1,2,3] and takes its sum. If the list is nested, like [1,2,[3,4]], it creates the list [1,2, ...], then takes the sum of the nested list [3,4] and adds it, so the top-level list becomes [1,2,7]. This is summed for a final answer of 10.

Why use recursion?
Recursion can make solving some problems very easy. Often these problems deal with recursive data structures like nested lists, where a recursive solution is often simpler than an iterative one. Some problems are very difficult to solve in an iterative fashion.
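For instance, an iterative attempt that hard-codes the depth of nesting breaks as soon as a deeper list appears (a deliberately limited sketch of my own):

```python
def sum_two_levels(lst):
    """Iteratively sum a list nested at most one level deep."""
    total = 0
    for item in lst:
        if isinstance(item, list):
            for sub in item:
                total += sub   # fails if sub is itself a list
        else:
            total += item
    return total

print(sum_two_levels([1, 2, [3, 4]]))  # 10
# sum_two_levels([1, [2, [3]]]) would raise a TypeError
```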

Unless one is sure of the depth of a nested list, an iterative solution is very difficult. Recursion can be difficult to understand at times, but it is an extremely powerful tool for solving problems.



Sunday 9 February 2014

Getting started With Android

I have been developing Android apps for a few months now. Getting started with Android is fairly simple. First, download the Android SDK bundle, which comes with Eclipse and the ADT plugin pre-installed. Eclipse is a powerful IDE that does a lot of the work for you, making beginning Android development a little easier. The best tutorials for getting started are the official Android tutorials, which have a lot of code examples and ease you into Android development at a good pace. Another resource that I find extremely helpful is thenewboston's free Android video tutorials. Working through the official Android tutorials and thenewboston's videos will have you developing apps in no time. Who knows, maybe you will make the next Flappy Bird!

Sunday 2 February 2014

Thinking In Objects: Doubly Linked Lists

The goal of last week’s lab was to find a more efficient implementation of a queue. A queue designed using Python’s lists is generally slow due to the shifting behaviour of ".pop(0)" or ".insert(0, item)": when an item is inserted or removed at the front of a list, all the remaining items must be shifted up or down. My TA was trying to get me to think of an implementation that would avoid this behaviour; more specifically, a queue design where I would only need to know the first and last items. However, I could not think of an efficient way to do this in Python. In other words, I did not have my object-oriented programming cap on. Near the end of the lab, my TA hinted at using objects that know which item comes before and after them, but the lab was over and I did not get to implement the solution. I had heard of linked lists before, but had never implemented one or really figured out what they were. What my TA was hinting at sounded like a linked list! When I got home, I googled “linked list” and decided to design a queue using a doubly linked list. I have posted my code below.

My queue design is modelled after the queue from lab 2. Integers can be added and removed (“popped”), and the queue removes items in a first-in, first-out manner. Every item in the linked list is a Node object. 

Node objects have three variables:
 - A reference to the Node before it
 - A reference to the Node after it
 - The data of the Node
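A minimal sketch of the design (the class and variable names here are illustrative, not necessarily those from my lab code):

```python
class Node:
    """A single item in a doubly linked list."""
    def __init__(self, data):
        self.data = data
        self.prev = None   # the Node before this one
        self.next = None   # the Node after this one

class LinkedQueue:
    """A first-in, first-out queue backed by a doubly linked list."""
    def __init__(self):
        self.front = None
        self.back = None

    def add(self, data):
        """Add data to the back of the queue."""
        node = Node(data)
        if self.back is None:          # queue was empty
            self.front = self.back = node
        else:
            node.prev = self.back
            self.back.next = node
            self.back = node

    def pop(self):
        """Remove and return the item at the front of the queue."""
        node = self.front
        self.front = node.next
        if self.front is None:         # queue is now empty
            self.back = None
        else:
            self.front.prev = None
        return node.data

q = LinkedQueue()
q.add(1); q.add(2); q.add(3)
print(q.pop(), q.pop())  # 1 2
```

Because add and pop only ever touch the front and back Nodes, neither operation has to shift the rest of the items the way ".pop(0)" does on a list.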

Sunday 26 January 2014

Just a way to organize code

            I first started learning Object Oriented Programming (OOP) at the beginning of Grade 12. I was excited, since I was finally going to learn about this mysterious computer programming term that I had heard digital whispers about. I was eager to learn a new programming language, and I found out that my classmates and I would be learning Java! However, my first experience with OOP and Java was, simply put, overwhelming and confusing.

             Like almost everything in computer science, learning something new means learning a lot of new jargon. Combine this with the rough transition from English-esque Python to the slightly more cryptic Java, and it’s fair to say that I had some work ahead of me. For some reason OOP seemed mysterious; there was something about it that I didn’t understand at first. When I talked to some of my classmates, even halfway through the course, I would get the same question: “Yeah… but what IS object oriented programming!?”. The problem was that, amid the excessive amount of new information and terms like inheritance, object, instance, encapsulation, class, field, and property, something was lost. What exactly was Object Oriented Programming?

Object Oriented programming is JUST A WAY TO ORGANIZE CODE!  
     
Or, as Wikipedia more complicatedly puts it, “Object-oriented programming (OOP) is a programming paradigm that represents concepts as "objects" that have data fields (attributes that describe the object) and associated procedures known as methods”(this is the kind of language instructors should avoid).
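Stripped of the jargon, an "object" is just some data fields plus the procedures that work on them. A tiny example (in Python rather than Java, to keep it readable):

```python
class Dog:
    def __init__(self, name):
        self.name = name          # a data field (attribute)

    def bark(self):               # an associated procedure (method)
        return self.name + " says woof!"

rex = Dog("Rex")                  # an instance of the Dog class
print(rex.bark())                 # Rex says woof!
```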


I believe that if this point is emphasized during an introduction to OOP, it will make the learning process easier and less intimidating for students. After I understood this definition, any mystery and intimidation surrounding OOP was gone, and, as a result, learning OOP became a great deal easier. While this helps the learning immensely, I believe that the best way to learn is to implement an OOP design and write your own code!