Minitorch is a course focused on building a high-performance tensor API, similar to PyTorch, through assignments in Python.

  • The course material is developed by Sasha Rush
  • The course mostly works outside of the notebook environment and focuses on engineering best practices:
    • Abstraction, for having tensors work on both CPU and GPU with minimal duplication.
    • Property-based testing with Hypothesis, for testing mathematical functions
    • Code formatting
  • Tools and libraries:
    • Streamlit
    • Numpy
    • Pytorch

Auto differentiation:

  • To train a neural network we want to know how the loss would change if we slightly adjust one of the learnable parameters.
  • Symbolic derivatives require access to the full symbolic function, whereas numerical derivatives require only a black-box function. The first is precise but rigid, whereas the second is imprecise but more flexible. This module introduces a third approach, known as autodifferentiation, which trades off between the symbolic and numerical methods.
  • The trick behind autodifferentiation is to implement the derivative of each individual function and then use the chain rule to compute the derivative of any scalar value (a sketch follows this list).
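
A minimal sketch of the idea, using a toy pair of functions (the names here are illustrative, not MiniTorch's actual API): each function ships with its hand-written derivative, the chain rule composes them, and a central-difference check shows the numerical approach for comparison.

```python
import math

# Each primitive carries its own hand-written derivative
# (illustrative functions, not MiniTorch's actual API).


def square(x):
    return x * x


def d_square(x):
    # d/dx x^2 = 2x
    return 2 * x


def log(x):
    return math.log(x)


def d_log(x):
    # d/dx log(x) = 1/x
    return 1.0 / x


def f(x):
    # Composite function f(x) = log(x^2)
    return log(square(x))


def df(x):
    # Chain rule: f'(x) = d_log(square(x)) * d_square(x) = 2/x
    return d_log(square(x)) * d_square(x)


def central_difference(fn, x, eps=1e-6):
    # Numerical (black-box) derivative, for comparison
    return (fn(x + eps) - fn(x - eps)) / (2 * eps)


if __name__ == "__main__":
    x = 3.0
    print(df(x))                     # exact via chain rule: ~0.6667
    print(central_difference(f, x))  # numerical approximation: ~0.6667
```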

Backward functions:

  • Chain rule: each operation's backward function takes the derivative flowing in from its output and applies the chain rule to produce derivatives for its inputs (see the sketch below).
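
A minimal sketch of what such backward functions look like, assuming a small Function-style interface where `backward` receives the derivative flowing in from the output, `d_out` (hypothetical classes, not MiniTorch's exact signatures):

```python
import math

# Each operation pairs a forward computation with a backward function
# that applies the chain rule (hypothetical classes, not MiniTorch's
# exact API).


class Mul:
    @staticmethod
    def forward(a, b):
        return a * b

    @staticmethod
    def backward(a, b, d_out):
        # d(a*b)/da = b and d(a*b)/db = a, each scaled by d_out.
        return d_out * b, d_out * a


class Log:
    @staticmethod
    def forward(a):
        return math.log(a)

    @staticmethod
    def backward(a, d_out):
        # d(log a)/da = 1/a, scaled by d_out.
        return d_out / a
```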

Topological ordering

  • When we’re actually doing backprop, how do we guarantee that we’ll always know the values of our backward functions’ inputs? We guarantee it by processing the nodes of the graph in a topological order.
  • The topological ordering of a directed acyclic graph is an ordering that ensures no node is processed after its ancestor, e.g. in our example the left node cannot be processed before the top or bottom node. The ordering may not be unique, and it does not tell us whether to process the top or bottom node first. A sketch of computing such an ordering follows this list.
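
A minimal sketch of computing the ordering with a depth-first search, assuming each node exposes an `inputs` attribute listing the nodes it was computed from (an illustrative structure, not MiniTorch's exact API):

```python
# Topological ordering by depth-first search. Assumes each node has an
# `inputs` attribute listing the nodes it was computed from (leaves have
# none); this is an illustrative structure, not MiniTorch's exact API.


def topological_order(output_node):
    order = []
    visited = set()

    def visit(node):
        if id(node) in visited:
            return
        visited.add(id(node))
        for parent in getattr(node, "inputs", []):
            visit(parent)
        # Post-order append: a node is added only after everything it
        # depends on, so its ancestors sit earlier in `order`.
        order.append(node)

    visit(output_node)
    # Reverse so the output comes first and the inputs come last: no node
    # is processed after one of its ancestors, which is the order the
    # backward pass needs.
    return list(reversed(order))
```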

Backpropagation:

  • The Scalar class tracks the numeric value stored and has forward and backward methods.
  • When we call backward, we propagate the gradient downstream, working through the dependencies in topological order (a sketch follows this list).
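
A minimal sketch of that backward pass, reusing the `topological_order` helper above and assuming each non-leaf node has a hypothetical `chain_rule(d_out)` method that applies its operation's backward function and yields `(input, derivative)` pairs (not MiniTorch's exact implementation):

```python
# Backward pass over a scalar graph. Assumes the topological_order helper
# above and a hypothetical node.chain_rule(d_out) that applies the
# operation's backward function and yields (input, derivative) pairs.


def backpropagate(output_node, d_output=1.0):
    # Accumulated derivative of the output with respect to each node.
    derivatives = {id(output_node): d_output}

    for node in topological_order(output_node):
        d_out = derivatives.get(id(node), 0.0)
        if not getattr(node, "inputs", []):
            continue  # leaf node: nothing further to propagate
        for parent, d_parent in node.chain_rule(d_out):
            # A node can feed into several operations, so derivatives add up.
            derivatives[id(parent)] = derivatives.get(id(parent), 0.0) + d_parent
    return derivatives
```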