UTF-8 Encoded Strings and count()

It isn’t the fact that a string is Unicode which makes count() an O(N) operation. It depends on the encoding. Older languages commonly use UCS-2, UTF-16 or UTF-32. I believe Python uses one of these internally; as far as I know, CPython picks a fixed width of 1, 2 or 4 bytes per character for each string, depending on its contents. With these encodings each character typically takes a fixed number of bytes, but each of them has its problems. UCS-2 can’t encode all Unicode characters. UTF-16 doesn’t always have fixed-length characters (anything outside the Basic Multilingual Plane takes two 16-bit units), so it has no advantage over UTF-8 here, but all the disadvantages. UTF-32 is the only one of these which can both encode every Unicode character and make count() O(1). The problem is that it is very inefficient: for text that is mostly ASCII it takes 4 times as much space as ASCII or UTF-8.
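To make the size trade-off concrete, here is a small Python sketch that encodes the same strings in each form and counts the bytes. The helper name `encoded_sizes` and the sample strings are my own choices for illustration, not anything from a particular library.

```python
def encoded_sizes(text):
    """Return how many bytes `text` occupies in each encoding."""
    # The "-le" codecs omit the byte order mark, so only the text itself is counted.
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


# "p\u00e5l" is "pål" with a precomposed å; "\U0001F600" is an emoji outside the BMP.
for sample in ["pal", "p\u00e5l", "\U0001F600"]:
    print(repr(sample), encoded_sizes(sample))
```

For plain ASCII, UTF-32 comes out at four times the size of UTF-8, and the emoji shows UTF-16 needing two 16-bit units for a single character.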

For this reason modern programming languages tend to avoid UTF-32. If you look at new languages like Swift, Go and Julia, they all use UTF-8. UTF-8 has the advantage of being backwards compatible with ASCII and being very space efficient. The downside is that a character can take 1–4 bytes. So e.g. the word “pal” takes 3 bytes, while “pål” takes 4 bytes. Thus you can’t determine the length of the string in characters simply by looking at the byte count; you have to walk through the bytes. By the way, this isn’t unique to variable-width Unicode encodings. Any system which uses null-terminated strings rather than storing the size will have an O(N) length lookup. C-style strings in C/C++ are one example.
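The reason counting is O(N) can be sketched in a few lines of Python. This is only an illustration, assuming the input is valid UTF-8; `count_code_points` is a name I made up, and it counts Unicode code points, which is not always the same as what a reader would call a character.

```python
def count_code_points(utf8_bytes):
    """Count code points in valid UTF-8 by visiting every byte once: O(N)."""
    count = 0
    for byte in utf8_bytes:
        # Continuation bytes have the form 0b10xxxxxx.
        # Every other byte marks the start of a new code point.
        if (byte & 0b11000000) != 0b10000000:
            count += 1
    return count


data = "p\u00e5l".encode("utf-8")  # "pål"
print(len(data))                   # 4 bytes
print(count_code_points(data))     # 3 code points
```

The byte count is available immediately, but the character count only falls out after looking at every byte.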

Are linked lists useful?

You raised the issue of linked lists. I admit I seldom use linked lists in imperative programming languages. Most people will probably say the same. However, this might give the wrong impression of the usefulness of linked lists. Linked lists work very well as immutable data structures, because you can add elements to the front of a linked list without modifying anything that an existing reference to the list points at. You can also take the tail of a list without mutating the list or copying it, so memory usage doesn’t increase. For this reason linked lists are at the core of most functional languages I know of, e.g. LISP, Scheme, Haskell and OCaml. In classic LISP it is pretty much the only data structure.
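Here is a rough Python sketch of that sharing, in the spirit of a LISP cons cell. The names `Node`, `cons` and `to_list` are hypothetical helpers I wrote for this example; functional languages have all of this built in.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    """An immutable cons cell: one value plus a reference to the rest of the list."""
    head: object
    tail: Optional["Node"]


def cons(value, rest):
    """Return a new list with `value` in front; `rest` is left untouched."""
    return Node(value, rest)


def to_list(node):
    """Copy the linked list into a Python list, just for printing."""
    items = []
    while node is not None:
        items.append(node.head)
        node = node.tail
    return items


base = cons(2, cons(3, None))
a = cons(1, base)  # 1 -> 2 -> 3
b = cons(0, base)  # 0 -> 2 -> 3

print(to_list(a), to_list(b), to_list(base))
print(a.tail is b.tail)  # True: both lists share the same tail, nothing was copied
```

Both `a` and `b` reuse `base` as their tail, and `base` itself is exactly what it was before.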

Python packages

I agree that Python is a lot more convenient than e.g. C++ due to the packages. Python’s whole “batteries included” approach is something I think most modern languages have taken to heart. They have realized that a good package system and an extensive standard library have to be built in from the start to make the language useful. E.g. what attracted me to Google’s Go was my previous positive experience with Python and a certain affinity for C programming. Go replicated Python’s extensive package system and kept the principle that there should be one way to do things. Having jumped into an existing Ruby on Rails project, I think it is easy to appreciate this Python philosophy.

Keep in mind this is why I compared Julia to Python: of all possible choices, I thought Python was the second best. If I’d tried doing scripting in Java, C#, C++ or Go, you can bet you would have seen a lot more complaints. I actually did experiment with doing it in Swift, as that is really my main language. However, Swift is simply not suited to that sort of thing at the moment, as the libraries are generally geared towards GUI application development and not throwaway scripts. It certainly has the potential to occupy that space though. The feature set should allow it.

Geek dad, living in Oslo, Norway with passion for UX, Julia programming, science, teaching, reading and writing.
