
  • Unit 4: Data Structures I – Lists and Strings

    Most of the programming concepts presented so far can be found in any programming language. Constructs such as variable definitions, operators, basic input and output, and control flow via conditional statements and loops are fundamental to what it means to compute. In this unit, we begin studying how data is structured within Python so we can program efficiently. Specifically, you will be introduced to lists and also immersed more deeply in the subject of strings. Upcoming units will introduce even more powerful data structures.

    Completing this unit should take you approximately 6 hours.

    • Upon successful completion of this unit, you will be able to:

      • explain lists and indexing;
      • write simple programs that apply list and string methods;
      • explain and apply slicing; and
      • write programs that plot and visualize list data.
    • 4.1: Python Lists

      • A list is a data structure capable of storing different types of data. Run this set of commands in Repl.it:

        a=2
        b=3
        alist_examp=[1,3.4,'asdf',96, True, 9.6,'zxcv',False, 2>5,a+b]
        print(alist_examp)
        print(type(alist_examp))
        print('This list contains ', len(alist_examp), ' elements')

        You should see the list output to the screen as well as the data type of the variable alist_examp. Notice that the list contains bool, int, str, and float data. It also contains the result of relational and arithmetic operations. Any valid data type in Python can be placed within a list.

        The data contained within the list are called elements. The list you printed contains 10 elements. You can figure this out by counting them by hand. There is also a command called len that will give you this information, which you can see in the example. len is a very useful command.

        Pay very close attention to the syntax used to build the list. The left and right square brackets mark where the list begins and ends; the list itself is known as a "container" because it holds other objects. Elements must be separated by commas when building the list.

        Pay very close attention to this example. Add this code on the line after print('This list contains ', len(alist_examp), ' elements').

        tmp=alist_examp
        print(tmp)
        print(alist_examp)
        tmp[3]='vbnm'
        print(tmp)
        print(alist_examp)

        Notice that changing the value in the variable tmp also changed the value in the variable alist_examp. Initially, this appears to go against what we have learned about the assignment operation (=). For reasons that will be discussed in a later unit, when it comes to lists, Python views both variables as occupying the same space in memory. They are different names that refer to the same data. How then can we truly assign the list to a new variable that occupies a different place in memory and can be modified independently of the variable alist_examp? This set of commands will accomplish this:

        tmp=alist_examp.copy()
        print(tmp)
        print(alist_examp)
        tmp[3]='vbnm'
        print(tmp)
        print(alist_examp)

        This example introduces some new syntax that will soon become second nature. The command "alist_examp.copy()" is your first introduction to object-oriented syntax. The "copy()" portion of this command is what is known as a method. It is connected to the variable alist_examp using the period or 'dot notation'. This command creates a new list containing the same elements, so that replacing an element of tmp does not affect alist_examp.
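        One way to check whether two names refer to the same list is Python's is operator. That operator is not covered in the text above and is used here only as an illustration. A minimal sketch, assuming a shorter hypothetical list named original:

```python
# Hypothetical short list used only for this illustration
original = [1, 3.4, 'asdf', 96]

alias = original             # a second name for the same list
duplicate = original.copy()  # a new list holding the same elements

print(alias is original)      # True: both names refer to one list
print(duplicate is original)  # False: copy() built a separate list

alias[3] = 'vbnm'            # this change is visible through both names
print(original)              # [1, 3.4, 'asdf', 'vbnm']
print(duplicate)             # [1, 3.4, 'asdf', 96]
```

        The names alias and duplicate are made up for this sketch; they play the roles that tmp plays in the two examples above.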

        Every variable type in Python is what is known as an "object". You will learn an immense amount about objects as we delve deeper into the course. The main point to realize is that there are several methods available for use with lists. The copy method we just introduced is one of them. Some other important ones worth mentioning at this point are pop, append, remove, and insert. Pay attention to the sections that describe these methods in detail.
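        As a quick preview of the four methods just named, here is a minimal sketch. The list fruits and its contents are hypothetical and chosen only to make each change easy to see:

```python
# Hypothetical list used only for this illustration
fruits = ['apple', 'banana', 'cherry']

fruits.append('date')        # add one element to the end
print(fruits)                # ['apple', 'banana', 'cherry', 'date']

fruits.insert(1, 'avocado')  # place an element at index 1, shifting the rest right
print(fruits)                # ['apple', 'avocado', 'banana', 'cherry', 'date']

fruits.remove('cherry')      # delete the first element equal to 'cherry'
print(fruits)                # ['apple', 'avocado', 'banana', 'date']

last = fruits.pop()          # remove and return the last element
print(last)                  # date
print(fruits)                # ['apple', 'avocado', 'banana']
```

        Note that append, insert, and remove change the list in place, while pop both changes the list and hands back the element it removed.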

      • Continuing with our previous example, run this set of commands in Repl.it:

        print(alist_examp[0])
        print(alist_examp[1])
        print(alist_examp[2])
        print(alist_examp[3])
        print(alist_examp[4])
        print(alist_examp[5])
        print(alist_examp[6])
        print(alist_examp[7])
        print(alist_examp[8])
        print(alist_examp[9])

        The elements contained within a list can be referenced using what is known as an index. Notice that Python begins indices by starting at a value of zero. So, if a list has 10 elements, the first element on the list is referred to using an index of 0, and the last element is referred to with a value of 9. This can take some getting used to if you are used to counting starting with the number 1.

        A common error when first starting with lists is attempting a command such as:

        print(alist_examp[10])

        on a list with 10 elements. Such a command would raise an IndexError and halt program execution because there is no element with index 10.
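        The relationship between len and the largest valid index can be sketched as follows. The list alist is hypothetical, and the try/except construct is used here only so the example can finish running; it has not been introduced in this unit.

```python
alist = [10, 20, 30]          # hypothetical list with 3 elements

last_index = len(alist) - 1   # the largest valid index is len - 1
print(alist[last_index])      # 30

try:
    print(alist[len(alist)])  # index 3 does not exist
except IndexError as err:
    message = str(err)
    print('IndexError:', message)
```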

        The index is the key to referring to an element within a list. You must see the programming equivalence between an element and referencing the element via its index. Continuing with our example:

        print()
        c=3 + alist_examp[1]
        alist_examp[1]= c +alist_examp[0]
        print(c)
        print(alist_examp)

        The whole point of using a list is that a programmer plans to reference elements further down in a program. In this case, 3 is added to the element with index 1 and the result is stored in c. The element with index 1 is then assigned a new value: c plus the element with index 0. Mastering the gymnastics of using indices is key to becoming an advanced programmer.

        In this example, we explicitly typed out all the indices of every element. What if we had a list with 1000 elements, and we wanted to output them all one-by-one? Would we have to type 1000 commands? If we did, we would be completely disregarding the power of loops, which we mastered in the previous unit. Consider this code as an alternative:

        for i in range(len(alist_examp)):
            print('Element',i, '= ', alist_examp[i])

        Notice how the result of the len command is being used to define the range of the loop. Lists, indexing, and loops are related topics. It is important to understand how they are related to become a seasoned programmer.

        One more important point must be mentioned about list indexing. Recall that any valid data type in Python can be inserted into a list. Lists are a valid data type; therefore, it is possible to have a list that contains lists as this example shows:

        x=[3,4,5.5,6,7.9]
        y=[-300,3.14]
        z=[x , y, 3.45678]
        print()
        print(z)
        print()
        print('The list z contains ', len(z),' elements')
        print()
        for i in range(len(z)):
            print('Element with index ',i,' = ',z[i])

        After running this code, you should see that the list z contains 3 elements. The lists x and y are said to be nested within the list z. Therefore, indexing elements within these lists will require a second index as follows:

        print(z[0][2])
        print(z[1][0])

        where the second index refers to elements within the nested list. To print out all elements on a nested list, we could also use a loop:

        for j in range(len(x)):
            print(z[0][j])

        One very useful feature of Python is its ability to reference elements in loops without the need to reference an index, as this example shows:

        for value in z:
            print(value)

        In this example, the variable value iterates across all the values in the list z without the need to create an actual index. This is a very powerful feature in the Python programming language. Practice as many examples as you can. It is important to master indexing before we move forward to slicing.
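        To see that the two looping styles visit the same elements in the same order, here is a small sketch that adds up a list both ways. The list and the variable names total_by_index and total_by_value are hypothetical:

```python
z = [3, 4, 5.5, 6, 7.9]   # hypothetical numeric list

# Index-based loop: i takes the values 0, 1, ..., len(z) - 1
total_by_index = 0
for i in range(len(z)):
    total_by_index = total_by_index + z[i]

# Value-based loop: value takes each element of z in turn
total_by_value = 0
for value in z:
    total_by_value = total_by_value + value

print(total_by_index == total_by_value)   # True
```

        The index-based form is still the one to reach for when the position itself matters, for example when an element must be replaced with alist[i] = ... inside the loop.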

      • Read this for more on indexing.

      • It is often the case that a program requires referencing a group of elements within a list (instead of a single element using a single value of the index). This can be accomplished by slicing. Consider running this example:

        x=[101,-45,34,-300,8,9,-3,22,5]
        print()
        print(x)
        print(x[0:9])

        The colon operator can be used to index multiple elements within the list. The index on the left side of the colon (:) is the starting index. By Python's indexing convention, the slice stops one position before the index on the right side of the colon (:), so the element at the right index is not included (be careful with the right index). Since the variable x has 9 elements indexed from 0 to 8, x[0:9] references all indices from 0 to 8. Here are some more examples you should try:

        print()
        print(x[3:4])
        print(x[3:5])
        print(x[3:6])
        print(x[3:7])
        print(x[3:8])
        print(x[3:9])

        Again, you should be careful when using the right index since Python will sequence up to that value minus one. For the sake of convenience, there are also shorthand slicing techniques that let you omit the starting or ending index when the slice is meant to run from the beginning of the list or through to its end:

        print()
        print(x[3:])
        print(x[:4])
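        One consequence of excluding the right index is that a slice x[start:stop] that lies within the list contains exactly stop - start elements. A minimal sketch using the same list x (the names start, stop, and piece are introduced here for illustration):

```python
x = [101, -45, 34, -300, 8, 9, -3, 22, 5]

start = 3
stop = 7
piece = x[start:stop]              # indices 3, 4, 5, 6
print(piece)                       # [-300, 8, 9, -3]
print(len(piece) == stop - start)  # True

# Omitting an index means "from the beginning" or "to the end"
print(x[:4] == x[0:4])             # True
print(x[3:] == x[3:len(x)])        # True
```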

        In addition to specifying the start and stop indices, you can also specify the "step". Here is an example that will count by twos from index 3 to index 7 (every other element starting at index 3). The step size is given by the value after the second colon.

        print()
        print(x[3:8:2])

        Finally, Python allows for negative indices where, by convention, the index -1 refers to the last element of the list, -2 to the second-to-last element, and so on.

        print()
        print(x[-1])
        print(x[-1:-3:-1])
        print(x[-1:-len(x)-1:-1])

        In these examples, the step size of -1 means to move backwards through the list one element at a time.
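        The sketch below restates the negative-index and step rules for the same list x. The variable names n and backwards are introduced only for this illustration, and the shorthand x[::-1] does not appear in the examples above:

```python
x = [101, -45, 34, -300, 8, 9, -3, 22, 5]
n = len(x)

# A negative index k refers to the same element as index n + k
print(x[-1] == x[n - 1])     # True: last element
print(x[-3] == x[n - 3])     # True: third from the end

print(x[3:8:2])              # [-300, 9, 22]  (indices 3, 5, 7)
print(x[-1:-3:-1])           # [5, 22]        (indices 8, 7)

# Omitting start and stop with a step of -1 reverses the whole list
backwards = x[::-1]
print(backwards == x[-1:-len(x)-1:-1])   # True
```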

        This video reviews many of the list concepts we have discussed so far. At 6:11, it discusses and elaborates on the rules for slicing in Python. Be sure to follow along and practice the examples in Repl.it.

      • Practice these examples using Repl.it to become more familiar with some methods commonly applied to lists. As you go through these examples, you should begin to see how powerful Python can be as a programming language.

      • Python offers many opportunities for creating efficient programming structures. When it comes to writing loops, 'list comprehension' allows for very compact code. Using lists, it is possible to pack quite a bit of power using just one line. This is an optional topic that requires understanding how to write loops using lists. Practice the examples to expand your ability to write loops.
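        As a rough illustration of what "one line" means here, the sketch below builds the same list twice, first with an ordinary loop and then with a list comprehension. The names doubled_loop, doubled_comp, and negatives are hypothetical:

```python
x = [101, -45, 34, -300, 8, 9, -3, 22, 5]

# Loop version: build a list of doubled values with append
doubled_loop = []
for value in x:
    doubled_loop.append(2 * value)

# List comprehension version: the same loop in one line
doubled_comp = [2 * value for value in x]
print(doubled_comp == doubled_loop)   # True

# An optional condition keeps only some elements
negatives = [value for value in x if value < 0]
print(negatives)                      # [-45, -300, -3]
```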

    • 4.2: Strings Revisited

      • List concepts such as methods, indexing, and slicing are also important for dealing with string objects. While there are some similarities in the syntax of processing strings and lists, there is a major difference in how Python views strings versus lists. Lists are mutable, which means that once created, as we have already seen, the elements contained in a list can be changed and updated. Strings are immutable objects. This means that once they are created, they cannot be changed, for example by assigning a new character to an index with the assignment operation (=). Because of string immutability, this example would yield an error:

        a='asdf'
        print(a[1])
        a[1]='3'

        While reading a given character by its index is fine, as in the print(a[1]) command, the a[1]='3' command will raise a TypeError because it attempts to modify an immutable object.

        There are lots of operations we can perform on strings. We will often use the "+" operation, which can be used to join two strings together. When applied to strings, the + sign does not mean add; it should be interpreted as either a "join" or "append" operation. Consider running this example:

        a='good'
        b='morning'
        c=a+b
        print(c)
        print(len(c))

        As you learn more, you will be able to determine what an operation is based on its context. In this case, when used with strings, you should see that the + operation has joined the two strings together to form a new string.
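        Because strings cannot be modified in place, one common workaround is to build a new string out of slices of the old one joined with +. The sketch below shows this, along with the context-dependent meaning of +. The try/except construct is used only so the example can finish running, and the variable names are hypothetical:

```python
a = 'asdf'

try:
    a[1] = '3'                 # strings do not support item assignment
except TypeError as err:
    print('TypeError:', err)

# Build a new string from slices of the old one instead
b = a[:1] + '3' + a[2:]
print(b)                       # a3df
print(a)                       # asdf (the original is unchanged)

# The meaning of + depends on the operands
print(2 + 3)                   # 5   (numbers are added)
print('2' + '3')               # 23  (strings are joined)
greeting = 'good' + ' ' + 'morning'
print(greeting)                # good morning
```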

      • Read this for more on strings.

      • There are a host of methods available for processing string objects. Here are a few examples. At this point it is sensible to introduce the comment character, #. The comment character allows you to put comments into your code for the purpose of documentation. While comments are not considered executable code, they are extremely useful for making your code readable and understandable.

        #explore changing to uppercase and lowercase
        a='good'
        c=a.upper()
        d=c.lower()
        print(c)
        print(d)
         
        #join a list of strings together with a space in between the strings
        b='morning'
        e=' '.join([a,b,'today'])
        print(e)
         
        #find a string within a string
        #find method returns the first index where string was found
        x='a picture is worth a thousand words'
        x1=x.find('picture')
        print(x1)
        x2=x.find('worth')
        print(x2)
        x3=x.find('words')
        print(x3)
         
        #split up a string into a list of smaller strings
        #use the ' ' space character as the boundary (delimiter) to split up the string
        y=x.split(' ')
        print(y)
        print(type(y))
         
        #try the replace method ...
        z=x.replace('thousand', 'million')
        print(x)
        print(z)

        Take some time to explore your own versions of these examples.
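        A few follow-up experiments, offered as a sketch rather than a complete tour of string methods: split and join undo each other when the same delimiter is used, find returns -1 when nothing is found, and string methods return new strings rather than changing the original. The names words, rebuilt, and shout are hypothetical.

```python
x = 'a picture is worth a thousand words'

words = x.split(' ')           # list of 7 words
print(len(words))              # 7
rebuilt = ' '.join(words)      # join them back with single spaces
print(rebuilt == x)            # True

# find returns -1 when the substring is not present
print(x.find('video'))         # -1

# String methods return new strings; the original is unchanged
shout = x.upper()
print(shout)                   # A PICTURE IS WORTH A THOUSAND WORDS
print(x)                       # a picture is worth a thousand words
```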

    • 4.3: Data Visualization Application

      • In this section, you will have a chance to exercise your understanding of lists and indexing and apply your knowledge to plotting data. You will also learn how to import libraries into Python. A library in Python generally contains several methods that can be used to perform a specific set of functions. The matplotlib library contains a host of methods useful for plotting data. Try running this snippet of code in Repl.it:

        import matplotlib.pyplot as plt
        x=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
        y=[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
        plt.plot(x,y)
        plt.xlabel('X')
        plt.ylabel('Y')
        plt.title('Test Plot')
        #save the figure before calling show; with some setups a figure saved after show comes out blank
        plt.savefig('plot1.png')
        plt.show()

        The import command instructs Python to import the matplotlib library. The extra qualifier as plt is added so that the name of the plotting object can be shortened to plt. This set of commands plots the data contained in the y list against the data contained in the x list. The methods xlabel, ylabel, and title are useful for adding annotation to the plot. For completeness, the show method is given so that users understand that this command is useful for rendering the plot in many different Python IDEs. However, Repl.it is a web-based IDE, and the savefig command will be more appropriate and useful for the rendering of data plots. The leftmost Repl.it window is where you can find the plot file 'plot1.png'. Click on that file to view the plot generated by this code. To return to your Python code, click on 'main.py' in the leftmost window. We will discuss the uploading and downloading of files as we delve into the course more deeply. For now, realize that, for each new plot you would like to generate, you can use the savefig method with a different filename in single quotes (such as plot2.png, plot3.png, and so on).

        Make sure to mirror and practice the examples provided in the video tutorial in Repl.it.
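        The x and y lists in the plotting example were typed out by hand. As a sketch of how the list techniques from this unit fit in, the same data could be generated with a loop and append, then handed to plt.plot(x,y) exactly as before. Only the data-building step is shown here:

```python
# Build the data lists with a loop instead of typing every value
x = []
y = []
for n in range(1, 11):      # n = 1, 2, ..., 10
    x.append(n)
    y.append(n * n)         # square of n

print(x)    # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(y)    # [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
print(len(x) == len(y))     # True: plotting needs lists of equal length
```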

    • Study Session Video Review

    • Unit 4 Review and Assessment

      • In this video, course designer Eric Sakk walks through the major topics we covered in Unit 4. As you watch, work through the exercises to try them out yourself.

      • Take this assessment to see how well you understood this unit.

        • This assessment does not count towards your grade. It is just for practice!
        • You will see the correct answers when you submit your answers. Use this to help you study for the final exam!
        • You can take this assessment as many times as you want, whenever you want.