Regression III – A Practical Example

We’ve already covered the basics and some of the theory or regression. So now, let’s do a practical example of linear regression in python.

Let’s mock up some practice data. We’ll create a simple predictor X, that just takes all the values between 0 and 20 in steps of 0.1, and a response Y, which a depends on X linearly via Y = 5X + 3 + (random noise). We generate this in python like so

import numpy as np

X = np.arange(0, 20, 0.1)
Y = 5 * X + 3 + np.random(200)/10

Now we can create a linear model and fit it with scikit-learn

from sklearn.linear_model import LinearRegression
model = LinearRegression().fit(X.reshape(-1,1),Y)

We need to reshape the X array, because the fit method expects a two dimensional array, the outer dimension for the samples and the inner dimension is the predictors.

We can see the coefficients of this model with

model.coef_,model.intercept_

Which should give something like

(array([5.00041147]), 3.0461878369249646)

Then, if we would like to forecast values, say for the X value 11.34, we can use the predict method on our model

model.predict(np.array(11.34).reshape(-1,1))

Don’t Use Objects in Python – part 3

So my previous blog posts, were posted on reddit and I got a lot of interesting feedback. I’ve decided to address the issues that were brought up all together in one place, and this is it! I’ve grouped all the different ideas that came up into their own sections below.

This is against community guidelines

Some people were upset that what I recommended was against the community guidelines. They’re correct, it is. However that isn’t really an argument against doing something. Community guidelines and standards aren’t some holy scripture that must always be obeyed. The entire point of my post was that the common usage of object orientation of Python is bad, I doubt there is anyway I could say that without also advocating against the normal Python standards. There is certainly some value in educating people about community standards, but the value of an idea cannot be reduced to how compliant it is with Python coding guidelines.

Dictionaries are Actually Objects

So a couple of people pointed out that internal to python there is inheritance, and that datatypes like Dictionary actually inherit from the Python Object type. Sure, that’s fine, I don’t have a problem with how the internals of Python implement the various data structures. Indeed the internals are overwhelmingly written in C, so it is not at all relevant, when we’re talking about actual Python code.

Objects are Abstractions of Python Dictionaries

Some commenters argued that Objects in Python are abstractions sitting on-top of dictionaries. This just isn’t true. An abstraction hides the details below it. For example, variables are an abstraction that hide the details of how registers and memory access actually work. Because they limit what you see and what you can do, they make using memory a lot easier. Python objects don’t hide any of the details of a dictionary, they are just syntactic sugar, all the operations of a normal python dictionary are available right their on the object with slightly different syntax, you can even access the dictionary itself with the __dict__ attribute. So this isn’t really true.

It’s all just an implementation Detail

So a few people took issue with the fact that I was talking about the specifics of how objects and classes are actually implemented, rather than talking about them in an abstract sense. I think this has missed the point. I am not quibbling with how objects are implemented. My point is that objects don’t give any extra functionality beyond using dictionaries. I brought up the implementation as a way to demonstrate this. My point doesn’t rest on the fact that there is an actual dictionary internal to objects. It rests on the fact that objects are really just a collection of attributes with string names, these attributes can be added, deleted and modified at run time, and that there really isn’t anything more of value in the Python Object. The fact that this collection of attributes is implemented by a dictionary specifically isn’t important.

In particular some people stressed that what I was talking about was just an implementation detail unique to CPython, and that somehow it is different in some other Python implementation. I can’t stress this enough, it doesn’t matter what the actual implementation of the attributes of an object are, it is still bad. But also, no-one actually uses PyPy, Jython or IronPython. They are basically irrelevant.

Having methods defined on types is Convenient

I think this was probably the best point raised. I concede that if you want to maintain some state, and mutate it at runtime, it is convenient to wrap that state in an object and have specific methods defined on that object. The alternative is to just store your state using some built in types and operate on it with functions. However, what it seems you really want to do, restrict functions to only work with specific types, is to use static types. This seems like a roundabout way to achieve that. If you want static typing, you’re not going to be happy with Python. Really what you are doing here, is associating certain bits of data and functionality, like in an object oriented language, but without any guarantee of what the data actually is.

Python Objects are Different if you use __slots__

A lot of people brought up __slots__. Yes, if you use the __slots__ attribute Python objects work differently and my criticism doesn’t apply exactly. But, by default, python objects don’t use __slots__. The default implementation is actually pretty important, because it’s the one that is going to get used most often. In fact I’ve never seen someone use __slots__, I’ve only ever come across it as an obscure trick that gets recommended for python experts. It’s not a robust defence of a language feature to say, “actually there’s this special thing you can do that totally changes how it is implemented”. That’s really a sign that the default implementation is not good.

So __slots__ does change how objects work, and makes them a lot less like a special wrapper around a simple dictionary. But I don’t think it resolves any of the fundamental problems with using objects in Python. They are basically just a collection of attributes, we cannot say anything about them apart from how many attributes there are and what their names are. In particular they don’t specify the shape of the data.

One benefit of __slots__ is that your objects will use memory more efficiently. I would say, however, that you shouldn’t be worrying about memory optimisations like this in a high level language. Also, if there are memory optimisations in your language they shouldn’t be controlled in an opaque unintuitive way like the __slots__ attribute.

Don’t Use Objects in Python – part 2

In my previous post I explained that objects in python are really just special wrappers for dictionaries. There is one dictionary that contains the attributes of the object and another dictionary that contains the attributes of the class. Indeed we can really just think of the __init__ method as a static method that adds elements to a dictionary.

Objects in Python do however have one feature that you cannot get by just passing around dictionaries: inheritance.

Let’s suppose we are using some Arbitrary Object-Oriented Language. This language looks and works a lot like C#, Java or C++. We define two classes, Security and Bond in this language. Security has a single field, an integer, called Id:

public class Security
{
   public int Id;
   public Security(int id)
   {
        Id = id;
   }
}

Bond inherits from Security and adds another field, this time a double called Rate:

public class Bond : Security
{
    public double Rate;
    public Bond(int id, double rate) : super(id)
    {
        Rate = rate;
    }
}

What do we get by making Bond a subclass of Security? The most obvious thing we get is that Bond will get a copy of the implementation inside Security. So in this example, as Security has a public field called Id, Bond has one as well. We can think of Security as a user defined type that is built up out of primitive types. By using inheritance Bond extends this type.

In Python we can do something similar. First we define our two classes:

class Security:
    def __init__(self, id):
        self.id = id

and:

class Bond(Security):
    def __init__(self, id, rate):
        self.rate = rate
        Security.__init__(self, id)

Now, this time when we define Bond as a subclass of Security what do we get? Python objects are not composite types like in our object oriented language. In Python objects are really just dictionaries of attributes and these attributes are only distinguished by their names. So our bond class could have a numeric value in it’s Rate attribute, but it could also have a string value, or a list or any other type. When we subclass in python we are not extending a user defined type, we are just re-using some attribute names.

In our Arbitrary Object Oriented language, there is another advantage to making Bond a subclass of Security: the ability to treat Bond as a Security. To understand what this means, suppose we have a function that prints the ids of a list of Securities:

static void printPortfolio(Security[] securities)
{
    string ids = "";
    foreach(Security security in securities)
    {
        ids += (" " + security.id);
    }
    Console.WriteLine(ids);
}

Now, this function specifies it’s single parameter must be an array of Securities. However, by the magic of polymorphism, we can actually pass in an array of types that inherit from Security. When they are passed in they are automatically cast to type Security. This can be pretty useful, in particular it makes it a lot easier to reason about our code.

Now let’s define the same function in Python:

def print_portfolio(securities):
    ids = ""
    for security in securities:
        ids += (" " + str(security.id))
    return ids

On the face of it, this is very similar. However we are not really using polymorphism in the same way as we are in our object oriented language. We could actually pass a list of any objects into the print_portfolio function, and, as long as they had a id attribute, this would execute happily. We could, for example, define a completely unrelated class like so:

class Book:
    def __init__(self, id):
        self.id = id

and pass a list of these into our print_portfolio function without any problems. Indeed in Python we can dynamically add attributes to an object, so we could even create an empty class:

class Empty:
    def __init__(self):
        pass

and assign a id attribute to it at runtime, via:

e = Empty()
e.id = "Hello"

and then enclose it in a list [e] and pass it into the print_portfolio function.

There’s one more thing we get with inheritance in an object oriented language: access to protected members. When we mark a method or field as protected it will only be accessible from within objects of that type or types that inherit from it. However in Python there are no protected methods or fields, everything is public.

So there are three reasons why I don’t think inheritance makes sense in Python:

  • Python classes aren’t really composite types, so it doesn’t make sense to extend them
  • Inheritance doesn’t give us extra access to protected methods, as everything in python is public anyway
  • Inheritance in Python doesn’t give us the benefit of polymorphism because in Python there are really no restrictions on what objects we pass around

So, that is why there really is no extra benefit to using an object in Python rather than a dictionary.

Don’t Use Objects In Python – Part 1

Everyone knows Python is object oriented. It’s right there on on page 13 of introducing python, it says as much on Wikipedia. You might have even seen classes, objects, inheritance and polymorphism in python yourself. But is Python really object oriented?

Let’s first ask, what actually is an object in python? We’ll see for ourselves, by creating a simple python class:

class Record:
    def __init__(self, name, age):
        self.name = name
        self.age = age
    def is_over_eighteen(self):
        return self.age > 18

this is just a simple class that has two member fields, name and age and a single method is_over_eighteen. Let’s also create an instance of this class:

my_record = Record("Stubborn", 207)

In Python member fields and methods are known as attributes. We can lookup the attributes of our record just like in any other standard object oriented language by my_record.name, my_record.age and my_record.is_over_eighteen.

Our object also has other secret attributes that are created automatically. Their names start with a double underscore. The attribute we are interested in is my_record.__dict__. If we evaluate this we will see that it is a dictionary of the instance attributes:

{'name': 'Stubborn', 'age': 207}

What’s interesting is that this isn’t just a representation of the object as a dictionary. In python an object is backed by an actual dictionary, and this is how we access it. When we look up an attribute of an object with the normal notation, for example my_record.age, the python interpreter converts it to a dictionary lookup.

The same is true of methods, the only difference is that methods are attributes of the class. So if we evaluate: Record.__dict__ we get something like:

mappingproxy({'__module__': '__main__', '__init__': <function Record.__init__ at 0x7f3b7feec710>, 'is_over_eighteen': <function Record.is_over_eighteen at 0x7f3b7feec7a0>, '__dict__': <attribute '__dict__' of 'Record' objects>, '__weakref__': <attribute '__weakref__' of 'Record' objects>, '__doc__': None})

We can access our method from this dictionary via:

Record.__dict__["is_over_eighteen"](my_record)

So a Python object is really just a wrapper for two dictionaries. Which begs the question, why use an object at all, rather than a dictionary? The only functionality objects add on top of dictionaries is inheritance (more on why that is bad in a later post) and a little syntactic sugar for calling methods. In fact there is one way in which python objects are strictly worse than dictionaries, they don’t pretty print. If we evaluate print(my_record) we will see:

<__main__.Record object at 0x7f3b8094dd10>

not very helpful. Now there is a way to make objects pretty print. We do this by implementing the __repr__ method in our Record class. This is the method used by print to represent an object as a string. However, dictionaries already have __repr__ defined so they will pretty print automatically.

Fundamentally, if you think you want to define a class in python, consider just using a dictionary instead.