EECS2011 Fundamentals of Data Structures

目录 | Table of Contents

Java Syntax
Java Data Types
Java OOP
Exceptions
Generics
Array
Linked Lists
Running time
Time Complexity
Prove Correctness
Solve the Recurrence
Stack & Queue
List
Tree
Priority Queue
Heap
Adaptable Priority Queue (自适应优先队列)
Maps & Dictionaries
Skip List
multi-Set multi-Map
Binary Search Trees
AVL Tree
Splay Trees (伸展树)
Sorting Algorithm
Divide and Conquer 分治法
Master Theorem
Sorting Lower Bound
Special Sorting Algorithms
Quick Select
Graph
Depth-First Search (深度优先搜索)
Breadth-First Search (宽度优先搜索)
Directed graphs
Strong Connectivity Algorithm
Transitive Closure
Topological Sorting of DAGs (有向无环图的拓扑排序)
Shortest Paths
- Dijkstra's Algorithm
- Bellman-Ford Algorithm
- DAG-based Algorithm
Minimum Spanning Trees (最小生成树)
- Prim-Jarnik's Algorithm
- Kruskal's Algorithm
- Boruvka's Algorithm
Union-Find Partition Structure

数据结构课程，其实用Java而不是用C / PASCAL来讲让我有点不太适应。

This content is for YorkU EECS 2011. As usual, I will post some "interesting" course note. This course is not that difficult. And our prof's ppt is so nice. Try not to miss any lecture and you will learn a lot from him. My professor is Andy Mirzaian.

The title of this course is "Fundamentals of Data Structures". This course introduces very fundamental algorithms and data structures in computer science. It does not go too deep into the analysis of the algorithms, which will be covered in EECS3101. In my point of view, this is the most important computer science course in 2000 level at York University.

I can share lecture notes but I will NEVER provide any course exam or assignment answers. I just want to remind those people who are trying to email me for those materials that this course is very fundamental course.

September 8

Solving problem using java data structure.

Week mean (Thursday + next Tuesday)

no labs, only paper (mid-term + final)

JDK7

Java Syntax

java to present Data Structure.

procedure 不返回值（与PASCAL相近的定义）

editor(.java) ->compiler(.class / byte-code, not really assemble language)->JVM(assemble language/binary code)

wrapper types：
Base type: int; Wapper type: Integer 诸如此类
创建：Integer obj = new Integer(数字)；使用： obj.intValue()

signature：A method’s name combined with the number and types of its parameters

Java is pass by value (object reference is also a value, in this word)

September 13

Data Structure

Java Data Types

Explicit Casting: primitive type/reference type 应该是相互对应的

Widening Cast: 转变成容量更大的变量，一般都是自动的
Narrowing Cast: 需要声明变量例如在变量前面强制加上，例如 (integer) variable

Java auto-unwrap the wrapper type when operate the <(smaller) or >(bigger) , but Java will not unwrap when using ==(equal)

break 关键字可以设定参数（其实很多函数都可以，类似于return后面可以加参数，详见例子，不常用）

September 15

OOP

Java OOP

robust(强大的) : 应该能够掌握意料之外的输入
abstraction: is to distill a system to its most fundamental parts基础的，考虑基础的运算过程
encapsulation 能够封装一些内部的内容
molecularity 模块化地

Hierarchy Design 分等级的设计诸如Java中Collection的设计

Design Patterns

Algorithmic patterns
- Recursion
- Amortization
- Divide-and-conquer
- Prune-and-search
- Brute force
- Dynamic programming
- The greedy method
Software design patterns
- Iterator
- Adapter
- Position
- Composition
- Template method
- Locator
- Factory method

Abstract Data Types (ADTs) 重要

An ADT is a model of a data structure that specifies the type of data stored,
the operations supported on them, and the types of parameters of the operations.
An ADT specifies what each operation does, but not how it does it.
The “how” is provided by the software that implements the ADT.
The collective set of behaviors supported by an ADT is its public interface.
The interface guarantees certain invariants.
Invariant: a fact about the ADT that is always true,
e.g., a Date object always represents a valid date.

UML 图，描述class

Interfaces

缺陷在于，发布之后，不能够进行修改(因为已经有人拿这个 interface 来使用了)。因此当你设计interface 的使用，should think twice.

Inheritance

Constructors never Inheritance
C++ is not single inheritance, Java is single inheritance

Polymorphism (多态性)

在运行和编码中使用不同的method
actual (dynamic) type, not declared (static) type
参考EECS2030 关于 Polymorphism的章节

Overriding and Overloading

overriding using dynamic dispatch
overload is static (same method, different signatures)
重点还是 static 的函数不能被override
参考EECS2030 关于Overload 的章节
Override 的重点摘录如下
- 使用final可以确保 method不被Override, 可以确保attribute 中的值不被改变（如果该值是一个地址，不能确保里面的内容不改变）The new class can re-define (override) its super class methods (non-final methods)
- 当override一个method含有throws的时候需要处理throws，从设计的角度，subclass的method必须含有throw部分，并且后面跟着的Exception classes需要含有原method的throws 的class 或者其subclass. A subclass promises to do everything its super class does; if the super class method claims to throw an exception then the subclass must also.
- static 的attribute和method 都不能够被override（因其是跟着Class 而不是跟着 Object），因此使用相同名称是被称为Hide。不要 hiding attribute（父类和子类中不要使用名称相同的attribute），不要hiding methods（父类和子类中不要使用名称相同的moethod），因为那样会增加代码的复杂程度。

这里老师用Progression的类型来表示继承的关系，这个例子非常专业Arithmetic Progression啥的，さすがプロだ。

September 20

exception

Exceptions

types of exceptions:

an unavailable resource
unexpected input from a user
a logical error on the part of the programmer

可参考 EECS2030 的有关Exception内容

Generics

Java includes support for writing generic types that can operate on variety of data types while avoiding the need for explicit casts & with type safety through compile-time type-checking
available from Java SE 5
non-primitive type for generics types' for variables.

可以 Bounded generics

wild-card "?" stands for "any class or interface"
<? extends T> 代表着 any types extends type T
<? super T> 代表着 any types that is the parent type of T

<T extends Comparable<? super T>>

但是还是有点问题，关于 array 和 generic type

you can declare an array of generic type,
but you cannot create an array of generic type. (无法使用new来create，那么说只能强制cast，或者采用Object[]来存，需要时候再cast到对应的类型)
- 例如，假如generic type 是 E 你不能够定义 E[]

@SuppressWarnings(“unchecked”) 	// DANGEROUS!!!
public Stack() {                            		
	elements = (E[ ]) new Object[CAPACITY];   // explicit cast
}

September 22

Lecture Slide 3 arrays & linked lists

Array

对于new一个新的array，new并不代表新建Object（没有Constructor），但是会分配内存地址

pascal triangle 编程题（杨辉三角形 2333333 老套的初级算法题）

java.util.Arrays

binarySearch()
copyOf
copyOfRange
equals
sort
toString

Array 的implementation 确实没什么好讲的

September 27

linked list
Lecture Slide 4 Algorithm Analysis

Linked Lists

环状linked list
SLCL: singly linked circular lists
DLCL: doubly linked circular lists

参考EECS2030的Linked List的内容

Running time

big-O Notation

Simple worst case

Constant = 1
Logarithmic = log n
Linear = n
n-log-n = n log n
Quadratic = n^2
Cubic = n^3
Exponential = 2^n

computer science log always base on 2

September 29

Big-O Notation - upper bound on the growth rate of a function

Time Complexity

2n+5 is O(n) and O(n^2) but O(n) is tighter

7n-2 求 big-O
7n-2 is O(n)
need c>0 and n0>= 1 such that 7n-2 <= c*n for n>= n0
this is true for c = 7 and n0=1

Prefix Averages
O(1+2+3+...+n) = O(n(n+1)/2) = O(n^2)

2*log(n) = log(n)+log(n) = log(n^2)

Rule: Ignore the lower order terms and the constant multiplicative factor of the highest order term.
Examples:

log(n^2) = 2 log n = O( log n ).
log(n^4) = 4 log n = O( log n ).
14 log n + 36 = O( log n ).
6 log n + 1000 log log n + 2300000 = O( log n ).

course web page http://www.eecs.yorku.ca/~andy/courses/2011/UsefulMath.pdf

big-O 上限
big-Theta 两者
big-Omega 下限

Prove Correctness

证明算法的准确性

Loop Invariant
pre-condition
post-condition
exit-condition

October 4

binary search, recursion

这节课重点讲了binary search，编写时候我们应该注意binary的判断边界的条件.

October 6

Recursions Stack Queue
Lecture Slide 4

Solve the Recurrence

如何计算递归的程序的时间复杂度

例子：

写出基本方程
T(n) =
- T(n/2) + c ; if n > 1
- c ; if n <= 1
解方程
T(n) =
- T(n/2) + c
- T(n/6) + c + c
- T(n/12) + c + c +
- ......
- T(n/(2^k)) + kc
- ...... let k = log(n), 2^k = n 可得
- T(n/n) + c log(n)
- c + c log(n)
最终得到 O(log(n))

T(n) = T(n/2) + c 往往得到 O(log(n))
T(n) = T(n - 2) + c 往往得到 O(n)
T(n) = T(n - 2) + T(n - 1) + c 往往得到 O(n ^ 2)

October 11

Stack / Queue/ Algorithm
Lecture Slide 6
For the Problem 2 in Assignment 2, should not use contains() method of the queue. each step should be different direction.

Stack & Queue

使用两个Stack 进行数学运算，使用Operation stack

Computing spans 跨度计算算法使用stack

October 13

List
Lecture Slide 7
use Dequeue(Deck) for the assignment 2 problem 2

List

Perfer pure recursive solution for Problem 1

amortized time 平均时间

T(n) 的概念

Iterator 的使用方法

October 18

Tree terminology
Lecture Slide 8

Tree

For a tree

root
internal node
external node (leaf)
height: max node depth

Traversals

Preorder : Book-table of content
Postorder
In order
Euler tour https://en.wikipedia.org/wiki/Euler_tour_technique Important!
- 如何使用Euler方法正确且快速地写出上面三种的遍历方法
level order

使用Tree来表达数学公式 Arithmetic Expression Tree

符号是内部节点
数字是叶
inorder 需要括号
postorder 不需要括号都能正常表示公式

概念

Height节点到下面的底
Depth节点到Root的深度

两种Tree的实现方法的比较

Linked Structure
- representation of 3 links per position
- structure grows as needed, no wasted space
Array Structure
- lower memory requirements per position
- memory requirements determined by the height of tree

October 20

Tree
期中考试之后的内容（期中考试考到前面的Tree）
其实也算EECS2011相比EECS2030新多出的内容吧

Lecture Slide 9

Priority Queue

The structure

Java 8 中的实现案例：http://docs.oracle.com/javase/8/docs/api/java/util/PriorityQueue.html

each entry is a pair (key, value)
Main methods
- insert(k,v)
- removeMin() 与stack不同，移除最小优先度的元素
- 分两种Implement 的方法
  1. insert() O(1) removeMin() O(n) unsort list
  2. insert() O(n) removeMin() O(1) sorted list
Additional methods
- min() get value w/o removing the element
- size() of the stack
- isEmpty() if the stack
Applications
- auctions
- standby flyers
- stock market

Priority Queue Sorting

往Queue里面扔，然后不断将min拿出来，就变成sorted的了。结果时间复杂度是O(n^2)。

Unsorted Priority Queue Sorting 与Selection-sort是一样的
Sorted Priority Queue Sorting 与Insertion-sort 是一样的

都是垃圾的 Quadratic sorting algorithms

Sort cannot be less than O(n*log(n)), 除非有阴谋:-)

November 3

Heap

Lecture Slide 9

Heap

wikipedia: https://zh.wikipedia.org/wiki/堆_(数据结构)

Properties: (重要)

Heap-Order
- 在从顶到底的path上的所有元素都是有序的
- 这里的例子: key(v) >= key(parent(v))（值比parent大）
Complete Binary Tree
- 是一个完整二叉树，也就是说，除了最低层没有填满，别的都填满了。
- compact representation

Every sub-tree in a heap is also a heap. 也就是说子树也符合上面所述的两个特征

如何将元素插入heap：O(log(n))

insert the element to the next position of last node (树的底层)
pushing element to the root (将path上面的element排序，将heap修复成符合规则的heap（children bigger than parent），又称“up heap”)

如何将元素从heap顶部(Root)中删除：O(log(n)) (重要)

读取root
将最后一个元素放到root
排序path （又称“down heap”，或“heapify”）
（思考：课堂只是讲述移除顶部，事实上每一个sub-tree都是heap，那么说移除子树中的root（整个树的中间节点）其实也是一样的，这个时候就需要移除子树的最后一个元素了）

eecs2011review-heap-20161205233928

找到最后一个元素的方法

Find the last node

Pointer structure (O(log(n)))
- 从先前的Node怎样知道下一个node插入的空位在哪里
- keep going up, as long as you are the left child
- running the opposite of path to
- 如果在going up过程中 hit the root
- go opposite + 1
Array structure
- 这个比较简单，直接过一遍找空位就好了

eecs2011review-heap-20161205235913

概念：

up-heap 新添加的元素在heap中向上移动
down-heap 新添加的元素在heap中向下移动

Heap Sorting
就是将数据过一次heap，up-heap 和 down-heap都是同样复杂度O(log(n))，外加n个元素，所以是O(n*log(n))。

Heapify

也算是一种Heap Sorting不过有更多地空间进行操作，不是简单地进行heap insertion操作（这样的操作会导致up-heap，元素在树中向上移动），而是将已经整理好的子树连接起来，新加入的节点进行down-heap操作。因为所有子树都排好了，所以，只需消耗O(n)。因此，在Heapify中down-heap比up-heap好。

这个过程总而言之是先建立小的heap，然后将它们连接起来（merge，Slide09 Page34），成为一个完整的heap（完整论述 Slide09 Page35）。

Adaptable Priority Queue (自适应优先队列)

中文似乎没有对应的翻译

应用举例

正在排队的乘客等烦了，要求自己从队伍立刻离开，不等了，那么removeMin()不能用了，因为他没有最高的优先级，只能使用remove()，然而remove在Priority Queue中是相当缓慢的
一个乘客有金卡会员，可以提升自己的优先级（插队），这时候我们需要一个 replaceKey()函数
有一个乘客发现自己的名字写错了，需要一个replaceValue() 函数

Location aware priority heap will be used in all algorithm course.

主要概念：Since entries are created and returned from the data structure itself, it can return location-aware entries, thereby making future updates easier. （数据结构包含指向存储节点的指针）

这个数据结构可以用上面所述的Priority Queue(sorted list/unsorted list) 或者heap 都行，实现的方式各有不同因此时间是不一样的。

remove(Node)
replaceKey(Node, newKey)
replaceVale(Node, newKey)

注意这里的remove() pass过去的其实是整一个node，当然可以做到O(1)的时间复杂度，前提是你要找到这个node。同理 replaceKey() 和 replaceValue() 也是pass整个Node过去，对于sorted list和heap需要重新排序的需要，无法达到较大的提升，但是别的都得到很大提升。

与Assignment 3 中最后一个问题相关

November 8

Maps & Dictionaries

Maps & Dictionaries

Map is key-value entries. Dictionary is just a ordered map (sorted map).

Dictionary allow nearest neighbour operation.

课本上面的put()是会返回之前对应key的value值

例如：

put(2,B) ---> null
put(2,A) ---> B

Java Map 实现方法

List Based:
- put, get,remove takes O(n) 因为需要traverse the entire sequence
Hash Based: may have collision.
- hashing is matching key to index value of the array.
- step: key->hash number->compress the range
- goal: spread out number distributions
- Hash Methods:
  - memory address
  - integer cast
  - component sum
  - polynomial accumulation(z=33 collision<= 6 in 50000 English word, which is good)
- Compression Methods:
  - y mod N(N 通常使用prime number).
  - (ay + b) mod N (scale and shift then mod) to be aware is (a mod N != 0).

Collision Handling 如何处理两个元素有同样的hash值

如果有同样的hash值就放到同一个位置上面去了，造成了相撞。

两个不同的key指向相同的index地址
处理方式
- Separate Chaining:
  - 将这两个数值放在linked list里面
  - 这种实现方法很简单，但是需要额外的存储空间
- Linear Probing:
  - 将值放在hash table下一个可用的位置
  - 相撞的元素聚在了一起(lump) 导致了以后的碰撞会有一个非常长的查找时间(longer sequence of probes)
  - Linear Probing 改，DEFUNCT放在被移除元素那里，这样我们就能顺利地操作remove() put()函数了
- Double Hashing:
  - 遇到碰撞，将会进行第二次hashing，第二次hashing的结果将会加在第一次的结果上，获得一个新地址
  - 如果第二次hashing依然相撞，再继续hashing
  - 因为double hashing是一个“偏移量”，因此需要确保hash的结果不是零
  - 这样的算法是为了让元素更加均匀地分布在数组中

Performance of Hashing

O(n) worst case, but rare
load factor is important alpha = n/N
- this is also the probability of "getting a empty space to store the element"
那么也可以算出 probes的数学期望 E(number of probes) = 1 / (1-alpha)
expected running time of all the dictionary operation is O(1)

Application of hash tables 应用

small databases
compilers
browser caches

November 10

skip list (Slide 10 page 38) Important

Skip List

Wikipedia 跳跃列表 https://zh.wikipedia.org/zh-cn/跳跃列表

结构简介：其实就是一系列的linked list的堆叠，可以命名为 S0, S1, S2, S3....... Sn, (前者包含后者)。（因为不知道要讲元素往上放多少层，这个算法其实比较依赖一个随机数生产工具。这个随机数生成工具的概率将决定空间和时间的关系）其实这个感觉还是挺糟糕的算法的，最好也好不过普通的Binary Search Tree

主要方法：

Add()
Delete() is like Search()

Search:
- 遇到大于的元素，则往skip list的下面找 (drop-down)
- 还是没有碰到大的元素则向前找 (scan-forward)
- Worst-case running time 非常糟糕，但是很少发生，通常预期Expected Running time BigOmega - log(n)
Insertion:
- 取得一个随机数 i （用Geometric 的方法）
- 如果这个随机数 i 大于 skip list的 高度 h 则在最上方再加1行linked list
- 从skip list的上往下找，记录导致向下转跳的记录
- 如果向下转跳的过程中，发现当前层编号x <= i ，则在该层插入这个元素
Deletion:
- 找到就删，
- 删完再找下一层

Node的造型比较独特，有4个方向的pointer

eecs2011review-skiplist-20161206135013

Its actually running time depends on the random choices on the algorithm. 由于采用不同算法，所以这里通常会讨论 Expected Running Time.

Add()
Delete() is like Search()

Memory usage

需要分析两种算法
- Fact 1: getting i consecutive heads when flipping a coin is (1/2)^i
- Fact 2: if each of n entries is present in a set with probability p the expect
O的计算涉及概率的计算（两种算法被选择的概率都一样）
The expected case space usage of a skip list with O(n)
We are NOT use worst case say expected case instead. worst case 去到 O(n^n)

Height of the skip list

这里也要分类考虑两种情况
- Fact 3:if each of n events has probability p, the probability that at least one of those n events occurs is at most np
The expected height is at most 3log(n) 达到这个的概率至少是 1 - 1/(n^2)

这里的概率论有点复杂！！！！

Search and Update times

proportional(成比例) to
- number of drop-down steps
- number of scan-forward steps
the drop-down are bounded by the height of the skip list is O(log(n))
the scan-forward steps:
- Fact 4: The expected number of coin tosses required in order to get tails is 2
总共的expected time 是 O(log(n))

multi-Set multi-Map

一个可以有重复叫multiSet(allow duplicates)“像一个包裹，可以放任何东西”，

多个key指向同一个value的map叫做multimap 简称“一键多值”

Generic Merge runs in O(Na+Nb)

November 15

接下来将会讲述 Red-Black Tree 等
Lecture Slide 11

Binary Search Trees

Binary Search Tree: in-order traverse is sorted order
- 相比存储在array里面，insert()更加简单了
- each subtree is also is a binary search tree
- In java: TreeMap
worst-case O(n) per dictionary operation

height is
- O(n) worst case
- O(logn) best case

AVL Tree

Adelson-Velskii & Landis 简述，一个可以自平衡的树，平衡后基本上就成为满二叉树了。深度就不超过 O(log(n))

Wikipedia 上面的AVL树的说明

height explicitly store in the node
binary search tree with height balance property: 不允许区别超过1 (at most 1)
Fact: height of an AVL tree storing n key is O(log(n))
Slide上面的证明： LS11 Page 16
Action
- Search ： O(log(n)) 提升了，在后面pay off
- Insert, Delete ：因为会破坏AVL结构，所以这里会消耗更多时间，使用 O(log(n))
  - Link Rotation: 没有违法binary search tree的结构，但是root被改变了
  - Insert:
    - insert
    - probe upward
    - repair : 这里的 link rotation
    - Tree trinode restructuring 三点重构
      - 个人理解，分4种情况讨论
      - LL, LR, RR, RL
      - 移动完毕后需要重新计算Height
      - Assignment 4 Problem 3自己的方法实现了一遍！
  - Delete:
    - The side you deleted height - 1
    - if children are equally high always take the same slant
    - 有几种情形需要分类讨论
- Extra action are needed to update the height along the path

Splay Trees (伸展树)

Daniel Sleator & Robert Tarjan

Wikipedia 上面的伸展树的说明 & amortized time

简介：其实是一个自动调整的Binary Search Tree。将常用的node移向更靠近root的地方（Splaying is an operation performed on a node. It moves the node to the root of the tree by a series of rotations.），通常用于Dictionary Operation 的提升。每一次获取都调整顺序。

注意：

Node没有aux这样的属性
高度(height)是不平衡的
能够根据使用调整节点

Time Complexity:

Bad: worst-case O(n)
Good: amortized case O(log n)

任何操作都会触发splay导致结构变化

splay()也是需要分类讨论：

zig
zig-zig
zig-zag

even in the search operation: if you cannot find the node, do Splay() on the last node that you have reached.

时间比较

Hash Table
- O(1) expected
- O(n) worst-case
- no ordered map methods
- simple to implement
Skip List
- O(log(n)) high probability
- randomized insertion
- simple to implement
AVL Tree
- O(log(n)) worst-case
- complex
- suited for time critical applications
Splay Tree
- O(log(n)) amortized
- simple to implement
- suitable for batch-mode applications
- adjust to usage pattern
- competitiveness properties

November 17

Storage base structure

Lecture Slide 12
Lecture Slide 13/15 不在授课范围

Sorting Algorithm

要求术语

in-place 通常只能使用swap
stable 不能打乱旧顺序(假设有隐藏有序元素，排序后，隐藏有序的顺序不能乱)

Merge Sort
- worst case 是 O(n log(n))
- For huge data sets
Heap Sort
- in-place 也是O(n log(n))
- For large data set
Quick Sort
- 选择一个pivot（基准）
- 分成3类（大于、小于、等于）
- worst case 是 O(n^2) 当每次选择pivot都是最大／最小值
- best case O(nlogn) 每次都选到中间值
- expected case is something between O(nlogn) and O(n^2)
  - is much closer to the best case
- Probabilistic fact: get a good pivot is 1/2
- 这个与MergeSort相比，优点是可以进行 In-Place Partitioning
Insertion Sort
- 每一次destroy an invert
- 优点是，在基本上sorted 的list里面，Insertion 执行时间会很少

排序算法的比较

Selection/Insertion sort
- O(n^2)
- in-place
- slow good for small data
Quick sort
- O(n log(n)) expected
- O(n^2)
- in-place
- fastest good for large inputs
Heap sort
- O(n log(n))
- in-place
- fast good for large inputs
Merge sort
- O(n log(n))
- sequential data access
- fast good for huge inputs

Divide and Conquer 分治法

需要三步

divide
conquer
combine

有关信息

merge sort 是一个典型的例子
- not in-place sort (通常情况分析)
base case 需要重点设计
Heap-sort is Not divide and conquer

Master Theorem

The Master Theorem: EECS 3101 有详细讲解，这节课需要运用这个理论(wikipedia 主定理)，但是不需要证明

eecs2011review-mastertheorm-20161206160539

Recurrence Equation Analysis (Time Complexity)

通常分治法的算法都会有这样的时间复杂度结构

T(n) =

b , n < d
aT(n/b)+f(n), n >= d

例如Merge Sort：

T(n) =

b , n < 2
2T(n/2)+bn, n >=2

Driving function（leading term）：
T(n) = 2T(n/2) + bn
bn是驱动函数，线性的。

Guess and Verify Method: 修正算法方法，检查算法中的错误

高精度乘法：
普通方法会采用 O(n^2)，但是采用分治法可以获得更小的 Slide 12 Page 45
T(n)=4T(n/2) + n
再精简一次可以得到（减少了一次Recursive Call
T(n) = 3T(n/2) +n
因为存储了别的结构

November 22

AAW动画

Sorting and Selection

Sorting Lower Bound

Comparison Base Sorting

Sorting 的 decision tree（决策树），leaves（原比较）应该有 n! （n为array元素）（Slide 12 Page 70）所有的Sorting算法都必须要进行 n! 个比较

所以General Case 需要 Big_Omega(n*log(n))

Special Sorting Algorithms

Special是指有precondition。这种情况下

we can get O(n) and 不违反上面的推理

Bucket Sort
Radix Sort-
- Lexicographic Ordering 逐位比较

Quick Select

类似于Quick Sort的技巧，选择一个pivot，然后分开两组元素

使用O(n)

November 24

Graphs

Lecture Slide 13

Graph

常见的储存方法

Edge List Structure
- one list for Edge objects
- one list for Vertex objects
- Edge objects point to two Vertex objects
Adjacency(邻接) List Structure 是常用的储存Graph的数据结构
- one list for Vertex objects
- Vertex point to all the Edge object that connected
Adjacency(邻接) Map Structure
- one list for Vertex objects
- Vertex has target and the edge
Adjacency(邻接) Matrix Structure
- a (Vertex objects) x (Vertex objects) table
- fill Edge objects in the cells

各有不同的特点

Sub-graph 指的是只有原图部分的图

Tree no cycle
- Free tree
- Rooted tree
Spanning tree

各种元素名称

unexplored vertex
visited vertex
unexplored edge
discovery edge
back edge

Depth-First Search (深度优先搜索)

经过这个算法后，只有两类edge：

discovery edge
(undiscovered edge) back edges

尽量走更远，通常用来寻找双向联通性（biconnectivity）

Path finding
- Edge finding
- Cycle finding

O(n+m) , n为vertices, m为edges

func DFS(Graph g, Vertex u) -> Void {
	for i in u.outGoingEdges {
		if (v.NotVisited){
			DFS(g,v);
		}else{
			//Cycle finding
			print(path, "this is cycle");
		}

	}
}

Breadth-First Search (宽度优先搜索)

与Tree的 Level 顺序输出节点其实是一致的，使用Queue。

通常用来寻找最短路径

经过这个算法后，只有两类edge：

discovery edge
(undiscovered edge) cross edge

O(n+m) , n为vertices, m为edges

How to find the cycle by using BFS without extra calculation from Lowest Common Point to Root?

因为BFS是按层遍历的，因此如果元素指向同层元素则证明是形成了cycle。

func BFS(Graph g) -> Void {
	let q = Queue()
	q.enqueue(g.startPoint);
	while(q.isNotEmpty){
		vertex = q.dequeue;
		for edge in vertex.outEdge{
			if edge.vertex.isVisited
			{
				//CLCYE???
			}else{
				q.enqueue(vertex)
			}
		}
	}
}

November 29

Shortest path problem

Directed graphs

n为vertices, m为edges

Directed graphs

single tree can represent the shortest Paths

有in-edge和out-edge

注意图里面有个算法Dijkstra's Algorithm，是O((n+m)log(n))

不同的Edges名称

discovery edges
back edges
forward edges
cross edges

A digraph is a graph whose edges are all directed

应用：

one way street
flights
tasks scheduling

Strong Connectivity Algorithm

Wikipedia 中 Kosaraju算法

强连通分量: （指一张有向图G的极大强连通子图G'。）如果将一个图的每一个强连通分量缩成一个点，则原图G将会变成一张有向无环图DAG。

强连通分量几何特性: 一个图中的强连通分量在这个图的反图中依然是强连通分量。

正反两次DFS，找出强连通分量。其实就是利用了几何特性。

Running time O(n(n+m)) n为vertices, m为edges

Transitive Closure

Wikipedia 中有关传递闭包

传递闭包: 简而言之就是绕行方案，点A到点B有直接的点，点A到点B的路径经过别的点就是传递闭包。

Converts graph G into its transitive closure G*

如果使用DFS，则需要 O(n(n+m))
如果使用Floyd-Warshall Algorithm，则需要 O(n^3) 假设 assuming areAdjacent()(判断两点是否相邻) is O(1) n为vertices, m为edges
- 简而言之：所有能来我这一点的都能直接去到我能去的地方!
- wikipedia

Topological Sorting of DAGs (有向无环图的拓扑排序)

A directed acyclic graph (DAG) is a digraph that has no directed cycles
遍历有向无环图都是能产生一定的顺序的。

topological order
Running time: O(n+m)

IMPORTANT!!!! All the algorithms will be in the FINAL!!!!

December 1

Minimum Spanning Trees

令人激动的最短路径问题，基本上所有初级泛泛而谈的算法课结尾都是最短路径。。。。

Shortest Paths

In a weighted graph, each edge has an associated numerical value, called the weight of the edge. 每一条边都有重量。

IMPORTANT!!!! All the algorithms will be in the FINAL!!!!

Dijkstra's Algorithm

Wikipedia 戴克斯特拉算法

计算所有从出发点出发到所有其他点的路径

O((n + m)*log(n)) with adjacency list/map structure

BFS算法

不能应用与有negative weight(负权值)的图。Dijkstra's algorithm is based on the greedy method. It adds vertices by increasing distance.

使用贪心法

Bellman-Ford Algorithm

Wikipedia 贝尔曼-福特算法

与前者方案相似，但是默认是负无穷的。不使用贪心法。节点会被加上权重，到达同一个节点计算出来的权重较小的路径将会被选取

O(n*m)

DAG-based Algorithm

DAG based的算法其实就是动态规划，因为不用考虑回环问题，所以只需要按顺序过一遍点就行了

Works even with negative-weight edges
Uses topological order
Does not use any fancy data structures
Is much faster than Dijkstra’s algorithm
Running time: O(n+m).

Minimum Spanning Trees (最小生成树)

Wikipedia 最小生成树

应用：

Communications networks
Transportation networks

Cycle Property: 在一个圈内，移除最长的Edge，路径的总长是减少的。

Partition Property：两个分开的图，相连最短的Edge，总路径是最优的。

IMPORTANT!!!! All the algorithms will be in the FINAL!!!!

Prim-Jarnik's Algorithm

Wikipedia 普林姆算法

使用 Cycle Property (在一个圈内，移除最长的Edge，路径的总长是减少的) 这个特殊的属性

We cycle through the incident edges once for each vertex.

随机选择一个起点
选择最近的点
现在有了一个线段，线段上的点再选择最近的点（类似贪吃蛇），然后延长

耗时

O((n+m) log(n)) with adjacency list structure
O(m log(n)) since the graph is connected

使用了贪心法

Kruskal's Algorithm

Wikipedia 克鲁斯克尔演算法

先假定所有的独立的Node 都是独立的Graph，但是将它们关联起来。使用 Partition Property 这个方法实现。
先将所有的Edge打入Priority Queue
如果这条边连接的两个节点于图G中不在同一个连通分量中，则添加这条边到图G中

merge smaller to lager one, union(A,B) will be O(min(length(A), length(B)))

耗时 O((n+m) log(n))

Priority Queue operations: O(m log(n))
Union-Find operations: O(n log(n))

使用了贪心法

Boruvka's Algorithm

Wikipedia Boruvka Algorithm(EN)

Like Kruskal's Algorithm, Boruvka's algorithm grows many clusters at once and maintains a forest T
Each iteration of the while loop halves the number of connected components in forest T
The running time is O(m log(n))

Implement (回去自己试试)

Union-Find Partition Structure

上面讲到的这些关于图的概念都涉及到了集合的运算，那么我们应该怎样实现集合类型呢？

主要三个函数

makeSet(x) 创造一个只包含x的set
union(A,B) 将两set融合一起，消除旧的set
find(p) 返回包含元素p的最顶端值（leader of the cluster）

Implementation

Sequence
- when you do union, you always move elements from the smaller set to the larger set
- union() find() takes O(nlog(n))
Tree
- when you do union, you make the smaller root point to the larger root
- union() find() takes O(nlog(n))
- 可以进行path compression
  - After performing a find, compress all the pointers on the path just traversed so that they all point to the root

note.mc256.dev

分类 | Categories

标签 | Tags

友链 | Links

目录 | Table of Contents

EECS2011 Fundamentals of Data Structures

Java Syntax

Java Data Types

Java OOP

Exceptions

Generics

Array

Linked Lists

Running time

Time Complexity

Prove Correctness

Solve the Recurrence

Stack & Queue

List

Tree

Priority Queue

Heap

Adaptable Priority Queue (自适应优先队列)

Maps & Dictionaries

Skip List

multi-Set multi-Map

Binary Search Trees

AVL Tree

Splay Trees (伸展树)

Sorting Algorithm

Divide and Conquer 分治法

Master Theorem

Sorting Lower Bound

Special Sorting Algorithms

Quick Select

Graph

Depth-First Search (深度优先搜索)

Breadth-First Search (宽度优先搜索)

Directed graphs

Strong Connectivity Algorithm

Transitive Closure

Topological Sorting of DAGs (有向无环图的拓扑排序)

Shortest Paths

Dijkstra's Algorithm

Bellman-Ford Algorithm

DAG-based Algorithm

Minimum Spanning Trees (最小生成树)

Prim-Jarnik's Algorithm

Kruskal's Algorithm

Boruvka's Algorithm

Union-Find Partition Structure