Steve Bennett blogs

…about maps, open data, Git, and other tech.

Cycletour.org: a better map for Australian cycle tours

Cycletour.org is a tool for planning cycle tours in Australia, and particularly Victoria. I made it because Google Maps is virtually useless for this: poor coverage in the bush and inappropriate map styling make cycle tour planning a very frustrating experience.

Let’s say we want to plan a trip from Warburton to Stratford, through the hills. This is what Google Maps with “bicycling directions” offers:

Google Maps – useless for planning cycle tours.

Very few roads are shown at this scale. Unlike motorists, we cyclists want to travel long distances on small roads. A 500 kilometre journey on narrow backstreets would be heaven on a bike, and a nightmare in a car. So you need to see all those roads when zoomed out.

Worse, small towns such as Noojee, Walhalla and Woods Point are completely missing!

Enter Cycletour.org:


You can plan a route by clicking a start and end, then dragging the route around:


It doesn’t offer safe or scenic route selection. The routing engine (OSRM) just picks the fastest route, and doesn’t take hills into account. You can download your route as a GPX file, or copy a link to a permanent URL.

Cartography

The other major features of cycletour.org’s map style are:

Bike paths are shown prominently. Rail trails (old train lines converted into bike paths) are given a special yellow highlighting as they tend to be tourist attractions in their own right.

Train lines (in green) are given prominence, as they provide transport to and from trips.

Towns are only shown if there is at least one food-related amenity within a certain distance. This is by far the most important information about a town. Places that are simply “localities” with no amenities are relegated to a microscopic label.

Major roads are dark gray, progressing to lighter colours for minor roads. Unsealed roads are dashed. Off-road tracks are dashed red lines. Tracks that are tagged “four-wheel drive only” have a subtle cross-hashing.

And of course amenities useful to cyclists are shown: supermarkets, campgrounds, mountain huts, bike shops, breweries, wineries, bakeries, pubs etc etc. Yes, well-supplied towns look messy, but as a user, I still prefer having more information in front of me.

Terrain

The terrain data is a 20-metre resolution digital elevation model from DEPI within Victoria, trickily combined with a 90-metre DEM elsewhere, sourced from SRTM (NASA). I use TileMill's elevation shading feature, scaled so that sea level is a browny-green, and the highest Australian mountains (around 2200m) are white, with green in between. 20-metre contours are shown, labelled at 100m intervals.

I’m really happy with how it looks. Many other comparable maps have either excessively dark hill shading, or heavy contours – or both.

For comparison: 4UMaps, Komoot, OpenCycleMap, Sigma, Mapbox Outdoors, and Google Maps (terrain mode).

Other basemaps


VicMap

I’ve included an assortment of common basemaps, including most of the above. But the most useful is perhaps VicMap, because it represents a completely different data source: the government’s official maps.

Layers

Vegetation

There are also optional overlays. Find a good spot to stealth camp with the vegetation layer.

Or avoid busy roads with the truck volume layer. This data comes from VicRoads.

The bike shops layer makes contingency planning a bit easier, by making bike shops visible even when zoomed way out. The data is OpenStreetMap, so if you know of a bike shop that’s missing (or one that has since closed down), please update it so everyone can benefit.


Mobile

Unfortunately, the site is pretty broken on mobile. But you can download the tiles for offline use on your Android phone using the freemium app Maverick. It works really well.

Other countries


is.cycletour.org for Iceland. Yes, it’s real – but I don’t know how long I will maintain it.

It’s a pretty major technical undertaking to run a map for the whole world. I’ve automated the process for setting up cycletour.org as much as possible, and created my own version for Iceland and England when I travelled there in mid 2014. If you’re interested in running your own, get in touch and I’ll try to help out.


Feedback?

I’d love to hear from anyone who uses cycletour.org to plan a trip. Ideas? Thoughts? Bugs? Suggestions? Send ‘em to stevage@gmail.com, or on Twitter at @Stevage1.

Normalize cross-tabs for Tableau: a free Google Sheets tool

Problem

You want to do some visualisation magic in Tableau, but your spreadsheet looks like this:

All those green columns are dependent variables: independent observations about one location defined by the white columns.

Tableau would be so much happier if your spreadsheet looked like this:

This is called "normalizing" the "cross-tab" format, or converting from "wide format" to "long format", or "UNPIVOT". Tableau provides an Excel plugin for reshaping data. Unfortunately, if you don't use Excel, you're stuck. It's kind of weird.
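If it helps to picture the reshaping, here is roughly the same transformation expressed in SQL (SQL Server calls the operation UNPIVOT; the portable way is a UNION); the table and column names here are invented purely for illustration:

-- Illustration only (hypothetical table and columns): each value column of the
-- wide table becomes its own set of rows in the long table.
SELECT location, 'measure_a' AS variable, measure_a AS value FROM wide_table
UNION ALL
SELECT location, 'measure_b', measure_b FROM wide_table
UNION ALL
SELECT location, 'measure_c', measure_c FROM wide_table;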

Solution

Anyway, I’ve made a Google Sheets script “Normalize cross-tab” that will do it for you.

As the instructions say, to use it, you:

  1. Reorganise your data so that all the dependent variable columns are to the right of all the independent ones; then
  2. Place the cursor somewhere in the first (leftmost) dependent variable column.

It then creates a new sheet, “NormalizedResult”, and puts the result there.

How to use

It’s surprisingly clumsy to share Google Scripts, at least until the new “Add-ons” feature is mature. Here’s the best I can do for you:

1. Copy the script to the clipboard

Go to https://raw.githubusercontent.com/stevage/normalize-crosstab/master/normalizeCrossTab.gs, select all the text, and copy to the clipboard.

2. Upload your spreadsheet to Google Sheets

Upload your Excel spreadsheet into Google Sheets, if it’s not there already.

3. Tools > Script Editor…

4. Click “Spreadsheet”

5. Paste

In the window labelled “Code.gs”, select all the text and paste the script from the clipboard over it.

6. Save.

You need to give this script “project” a name; it doesn’t matter what you call it.

7. Select the “start” function.

8. Click Run

Click Continue and accept the authorisation request.

9. Follow the instructions of the script

Now, switch windows to your Google Sheet, and you’ll see the sidebar.

10. Download your normalised spreadsheet

On the NormalizedResult page, choose File > Download as…


If you want to convert several spreadsheets, you can save yourself pain by loading them all into the same workbook. Just remember that the script will always save its output to NormalizedResult.

7 reasons to release that government dataset

As a data guru in residence, I’m helping government bodies prioritise which datasets to release as open data. Sometimes people say “No one would ever find this data interesting, so why bother releasing it?” I think there are several distinct reasons why a given dataset might be worth releasing. Some datasets are valuable for several reasons simultaneously. Some aren’t valuable at all.

When a public servant comments that a potential dataset isn’t interesting or useful, ask: “Are there other reasons to release it?”

But if a dataset fails to meet any of these criteria? You have my permission not to release it.

#1 Build an app around it

Census Explorer, by Yuri Feldman, allows easy exploration of part of the 2011 Australian Census.

Datasets like public transport timetables, public bike share station status, or parking space availability are obvious candidates for third party developers to use to build an app. Unfortunately, these examples also require near-realtime feeds in order to be useful.

#2 Support other apps

Even if a dataset isn’t interesting or useful enough to warrant an app in its own right, it could add value to another website or app if it’s easy to use. I’ve come across many of these:

  • Average traffic volume on roads maintained by VicRoads, used to help cyclists decide which roads to avoid, on cycletour.org.
  • The slope of footpaths around Melbourne can help wheelchair users navigate the city.
  • The location and species of every tree in Melbourne can add colour and interest to a map of the city.
  • Locations of drinking fountains could be useful for cycling, jogging, or dog walking apps or websites.
VicRoads traffic volume

Which way would you cycle to Port Albert?

#3 Interesting for research

If a dataset is big, rich, detailed and high quality, then there’s a pretty good chance it’s worthy of some kind of analysis. If it’s unique enough, then it might even interest a researcher in starting a research project just to look at this dataset.

Examples: building permits database, public transport timetables (for urban planning).

#4 Supporting other research

Much more common than such rich datasets are the small datasets that researchers find useful to solve particular problems, add context, or strengthen an analysis. Local Government Area boundaries aren’t inherently interesting, but they’re one of the geospatial datasets that researchers request most often. The ATO’s Standard Business Rules taxonomy sounds incredibly dry to me, but is of potential use to lots of people trying to glue different kinds of data and applications together.

#5 Policy and analysis

Lots of organisations need government data to develop internal strategies or policies to be shared with the public – or even to influence government. Typically they get the data either by transcribing tables from official reports, or by developing direct relationships with the government body in question. Publishing data directly to an open data portal allows a wider range of groups to make use of it, without the overhead of having to ask whether the data is available. Data that is collected regularly, in the same format, is particularly likely to be useful.

#6 Transparency

If the data relates to how government decisions are made, it may be worth releasing to demonstrate transparency – regardless of how much the dataset is even used. For example, releasing annual budget data as an easy to use spreadsheet makes a big political statement about willingness to be scrutinised. Even if no citizen takes up the opportunity to crunch the numbers, they may still appreciate having that option.

Examples: annual budgets, revenue sources (parking meters, speeding fines), parliamentary voting records.

#7 Insights for government

If you’re really lucky, the dataset you publish may help another part of government do something useful. I think good things happen when people can access data without having to ask anyone for it, and the same goes for government itself. You can’t really expect insights, but if it happens – great.

The Data Guru in Residence

Cross-posted at Code for Australia.

Last week, Code for Australia launched its first fellowship program, a four-month project where a civic-minded developer will try a new approach to helping government solve problems with their data. For the next few months, I’ll be the Data Guru in Residence, blogging mostly to http://melbdataguru.tumblr.com. The program got a brief mention in The Age. My goals are to find interesting and useful datasets, help make them public, and do fun stuff with them. It’s a kind of test run for the Code for Australia hacker in residence program currently being developed. Since I work for the University of Melbourne, I’ll be targeting datasets that are useful for researchers, and using VicNode to store data wherever it’s needed.

To start with, I’m spending some time with the CityLab team at City of Melbourne. They’re very progressive on the open data front, and their Open Data Platform has some really high quality datasets, like the 70,000-tree Urban Forest or the Development Activity Monitor which contains detailed information on property developments.

“Living, Breathing Melbourne”, our GovHack Project, would be so much better with live data feeds.

Some of the immediate datasets on the radar are live feeds from the city’s pedestrian sensors and bike share stations. I’d love to incorporate these into the successful GovHack project, Living Breathing Melbourne, built with Yuri Feldman and Andrew Chin. There’s also lots of interesting data from the Census of Land Use and Employment, with immense detail on how floorspace is divided up between residential, retail, commercial and so on. There are Mahlstedt fire plans, LIDAR data, and a really detailed, textured 3D model of the CBD. And of course other data that’s already public, but whose full potential hasn’t yet been realised.

If you’re from a government body (Federal, State, Council, or agency), based in or around Melbourne and you could use the services of a Data Guru, please get in touch!

Chromecast in the real world: six casting workflows

For such a simple device, Google’s Chromecast has created a surprisingly complex network of technology at my place.

Google says that Chromecast works roughly like this:

Chromecast Google view

Actually it’s more complicated than that. My setup is about as simple as you can get (no NAS, no existing media servers, no Netflix or Hulu or Foxtel or anything), and it looks like this:

 

Chromecast in the real world. Not so simple, really.

 

Six casting workflows

That is, depending on what exactly I want to watch and how, I have to choose between six different workflows:

  1. YouTube? Just go to the YouTube website in Chrome, and click the Chromecast button in the video window. This works really well. Great for music playlists, too.
  2. iView or SBS? Go to the site, and use the Google Cast extension to “TabCast”. This works so-so. It’s great for randomly showing something funny you found on the web, though.
  3. Movies you’ve downloaded? Use the VideoStream Chrome app to load it directly off disk. This works perfectly.
  4. Movies in your Plex library? Use the Plex web interface. For some reason you have to go through http://plex.tv, and the whole experience is a bit complicated. There are some issues with transcoding that I don’t really understand.
  5. Vimeo? Plex to the rescue. Add Vimeo as a channel (a slightly complicated procedure to view your own uploads).
  6. Want to watch something without using your computer? There are only a couple of “Google Cast ready Android apps” (YouTube is the only one that works well for me), or use BubbleUPnP to access your Plex library.

And I haven’t even mentioned a couple of other complications.

My advice? Figure out the smallest number of workflows to do everything you want to do, and get rid of any extraneous apps, servers, websites etc.

Google’s world

Google’s world centres around casting stuff from your phone. If that was all you could do, the Chromecast would suck. There are few apps, and a lot of them are very niche (eg, anime or baseball), junk, or just don’t really work (like the Red Bull app, which drops out every few minutes).

Fortunately, third party tools like VideoStream and Plex fill in a lot of the gaps.

But does it work?

The end result is actually great. Compared to having to plug my laptop into the TV, these things are now easy and fun:

  • Put on some background music: go to YouTube, Pandora or GrooveShark, and cast. No more hooking up audio cables.
  • Show a silly video to my partner. Even from the other room. Stuff I previously wouldn’t have bothered with, but it’s so easy – the TV even turns on by itself.
  • Keep watching a video while doing something else. Easy to leave my study, keep watching the same thing while making coffee or something.
  • Show photos: Just go to Google Plus or Flickr, and cast.

Web map projections: the bare minimum you need to know

TileMill wants to know: what projection is this data?

If you’re making maps, you will probably need to know something about cartographic projections. Here’s the minimum.

  1. The globe is round, maps are flat. Each of the hundreds of different methods for converting from round to flat is a projection.
  2. When you have a latitude and longitude, you have unprojected coordinates. Anything you can do with these doesn’t require choosing a projection.
  3. Most consumer web maps use the Web Mercator projection, also known as the Google Web Map de facto standard, EPSG:900913 (“google” written with numbers), EPSG:3857, etc.
  4. Government agencies, desktop apps and other stuff often use plain WGS84 latitude/longitude coordinates, also known as EPSG:4326.
  5. It is technically straightforward to convert from unprojected coordinates to any projection, or between projections, using GIS packages or command line tools like GDAL (there’s a small example after this list). It can be slow to do this on the fly.
  6. Each projection is defined using a Spatial Reference System. An SRS can also define systems of unprojected coordinates, and even other planets.
  7. There are half a dozen common formats for describing the SRS, including:
    1. SRID, an identifier including the identifier scheme, like "EPSG:3857", "ESRI:102113" or "SR-ORG:7483".
    2. proj4, a short piece of text with lots of + and =, used by tools like GDAL and TileMill. It looks like:
      +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs 
    3. Well-known text (WKT), a verbose format that can also be used to define spatial data. For example:
      GEOGCS["GCS_Oman",
      DATUM["Oman",
      SPHEROID["Clarke_1880_RGS",6378249.145,293.465]],
      PRIMEM["Greenwich",0],
      UNIT["Degree",0.017453292519943295],
      AUTHORITY["EPSG","37206"]]
  8. The tool you are working with (eg, TileMill) will only support certain projections. You need to:
    1. Find data that is in the right projection (Web Mercator is the safest), or convert it; and
    2. Tell the tool what projection the data is in, if it can’t guess. You will have to pick from a list, or supply one of the formats above that the tool supports.
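To make point 5 concrete, here is a small hedged example using PostGIS (one option among the GIS packages mentioned); the coordinates are just an illustrative Melbourne longitude/latitude:

-- Reproject a lon/lat point (EPSG:4326) into Web Mercator (EPSG:3857).
SELECT ST_AsText(
  ST_Transform(
    ST_SetSRID(ST_MakePoint(144.9631, -37.8136), 4326),
    3857));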

Multivariate binary symbol maps with TileMill

I help researchers make maps of their research. An archaeologist recently wanted to visualise the distribution of some iron-age artefacts around the Levant, based on a spreadsheet of thousands of rows. Each row represents one kind of artefact at a given site, such as “3 incised bangles, subtype I.b.iv, at Gath.” What are these maps called? I’ll go with “multivariate binary symbol map”.

It sounded like a job for CartoDB, but as the requirements unfolded, she wanted pretty specific cartography, plus a custom base map of rivers, historical boundaries etc. So we used TileMill instead, although we didn’t end up getting all that done.


This is where we got to. Each symbol next to a place name represents the presence of a specific type of artefact. ‘Eitun has pins of Type 1 with “incised decorations”, Far’ah has pins of Type 1 with “incised decorations”, “plain decorations” and “ribbed/grooved decorations”.

The most complex of these maps has 6 different attributes.

Loading the data

With a clearer understanding of exactly what we were trying to achieve, I probably would have done something simpler to calculate each of these attributes, such as using Excel. Instead, I loaded the data into PostGIS and wrote some queries. TileMill supports CSV files directly, but unlike CartoDB, doesn’t load the data into a database, so you can’t run SQL queries.

This post from “The World is a Village” explains how to load CSV into PostGIS, but in summary:
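Very roughly, and assuming a PostGIS-enabled database, the steps look something like the sketch below. Only the artefacts table, its lon/lat columns and the geom column come from the real project; the other column names are placeholders.

-- Create a table matching the spreadsheet (column list abbreviated).
CREATE TABLE artefacts (site text, subtype text, quantity integer,
                        lon double precision, lat double precision);

-- In psql, bulk-load the CSV:
-- \copy artefacts FROM 'artefacts.csv' WITH (FORMAT csv, HEADER true)

-- Add a geometry column and populate it from the lon/lat columns:
ALTER TABLE artefacts ADD COLUMN geom geometry(Point, 4326);
update artefacts set geom = ST_SetSRID(ST_MakePoint(lon,lat),4326);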

The most interesting line is:

update artefacts set geom = ST_SetSRID(ST_MakePoint(lon,lat),4326);

That’s what converts the raw lon and lat columns into a geometry column so that TileMill can plot it.

Views

To determine “are there any artefacts of type X in location Y”, an easy way is to write a view. Each column is a different subquery, for a different X.
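Here’s a minimal sketch of what such a view could look like. The artefacts table is the one loaded above; the site and subtype columns are assumptions, and the real view has many more subqueries, one per artefact type.

-- One row per site, one count column per artefact type.
CREATE VIEW artefact_summary AS
SELECT s.site, s.geom,
       (SELECT count(*) FROM artefacts a
         WHERE a.site = s.site AND a.subtype = '1a') AS subtype_1a,
       (SELECT count(*) FROM artefacts a
         WHERE a.site = s.site AND a.subtype = '1b') AS subtype_1b
FROM (SELECT DISTINCT ON (site) site, geom FROM artefacts) s;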


The result is one row per location, with a numeric column for each artefact type.

So, in TileMill we can now use a filter like [subtype_1a>0] to decide whether to place a symbol.

TileMill

Because there were so many maps to produce (5 of this type, plus another 11), I created them all in one project, each as a single layer.


The #map1 to #map12 layers each refer to a different set of data. Each layer pulls in the same spreadsheet, and styles it identically, with the only difference being a single filter.


That turned out to work really well.

But back to the main problem of showing symbols for attributes. It’s easy to show a single symbol if an attribute is present (like a coffee icon if a site is a cafe). But how do you show 4 symbols simultaneously, without them overlapping?

I thought of two approaches.

Symbol approach 1: Fonts

It’s theoretically possible to construct a text string, with an appropriate font. The string could look like “A Q Z”, where A gets rendered as a square, Q as a circle and Z as a star. Unfortunately I couldn’t make it work. I just couldn’t find an open truetype font that would behave like this. I tried loading various WingDings fonts, but always got little boxes instead of symbols.

There are projects like Map Icons or Font Awesome which sort of do this, but using web technologies that aren’t compatible with TileMill. The only proof of concept I achieved was using punctuation.


Using fonts makes it very easy to space icons appropriately.

Using punctuation in this way just doesn’t look good.

Symbol approach 2: marker icons

So the second approach is using traditional markers, and finding a way to position them appropriately. In CartoCSS, there’s no “marker-dx” to offset a marker, but there is “marker-transform”. So you can use SVG transforms, such as translate().

marker-transform:translate(10,-5);

That positions your marker 10 pixels right, and 5 pixels up.

Each different symbol has to be given its own layer (::square, ::circle…), and a different translation offset: (10, -5), (10, 5), (20, -5) etc.

This guarantees that they don’t collide, and mostly looks good, although it inevitably leads to some odd positioning.

With enough time, you could write some fancy SQL that would stack symbols from the left, avoiding any gaps.
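For what it’s worth, here’s a rough sketch of what that SQL could look like, reusing the hypothetical artefact_summary view from earlier: every symbol present at a site gets a slot number counted from the left, which could then drive the translate() offset.

-- Assign each symbol present at a site a slot (0, 1, 2, ...) from the left,
-- so markers can be offset by slot * spacing with no gaps.
SELECT site, geom, symbol,
       row_number() OVER (PARTITION BY site ORDER BY symbol) - 1 AS slot
FROM (
    SELECT site, geom, 'subtype_1a' AS symbol FROM artefact_summary WHERE subtype_1a > 0
    UNION ALL
    SELECT site, geom, 'subtype_1b' AS symbol FROM artefact_summary WHERE subtype_1b > 0
) AS present;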

Other TileMill styling

The only other styling of note is that the text labels should appear right-justified, to the left of the exact position. The CartoCSS designation for this is text-horizontal-alignment: left.


You can see the full TileMill project on Github.

 

The Australian’s menacing editorial

An editorial published in The Australian on the 21st of March set a new low standard in writing about conflicts between cyclists and cars. Prompted by video of a cyclist colliding with a taxi door, the editorial combined a strong anti-cyclist viewpoint (as it’s entitled to do in the opinion section) with some astonishing ignorance and lousy argumentation.

It’s so terrible, I’ve commented on each sentence. (Even the grammar is bad: “The problem of city cyclists reached their apogee…”, “clogging-up lanes”)

The Australian: The arrogant sense of entitlement in our inner cities is also evident in the ever-growing number of cyclists snaking their way through pedestrians on overcrowded pathways, darting between cars and clogging-up lanes on our congested roadways.

Comment: Cyclists are entitled to ride on roads. Just ask our Police Chief Commissioner, Ken Lay. Cyclists don’t “clog up” roads. If anything, the opposite is true, since each takes up less room than a car.

The Australian: The problem of city cyclists reached their apogee in Melbourne this week when a cyclist was “doored” on busy Collins Street, after a passenger opened a taxi door and a rider crashed into it.

Comment: (Nothing factually wrong here, although the “problem” described is obviously subjective.)

The Australian: Neither the taxi nor its passenger could be deemed at fault because a narrow “bike lane” inhibited the taxi from stopping next to the kerb.

Comment:

1. The passenger is clearly committing the offence of causing a hazard to a cyclist by opening a door.

2. This stretch of road is a no-stopping area: the taxi could not have stopped anyway.

3. Cars are allowed to stop in bike lanes.

4. Even if cars weren’t allowed to stop in bike lanes, the suggestion that this would excuse the opening of a door into the path of a cyclist is outrageous.

The Australian: The passenger was lucky to avoid serious injury.

Comment: The risk to the passenger in this case is much lower than the risk to the cyclist, as the collision risk is in the moment immediately following the door being opened – before the passenger gets out. The suggestion here is absurd.

The Australian: What makes this incident even more absurd is that, although the lane was marked by a bicycle symbol, it was not actually a dedicated bicycle lane.

Comment: Whether or not the cyclist was in a bike lane is irrelevant to the offence committed. I can’t fathom what “absurdity” is created by the painted bike lane not being an actual bike lane.

The Australian: Melbourne bike lanes must have signage, fixed to a pole, that shows the start and finish of a lane, as well as clear markings on the road itself.

Comment: This may be true, but not relevant.

The Australian: The state’s bicycle operations officer — yes, there is such a position — admits there is confusion for cyclists, pedestrians and motorists.

Comment: This is possibly a reference to this interview in the Age on March 20. This statement doesn’t seem relevant, other than to imply that the cyclist is somehow at fault for being doored, due to being “confused”. (Why is it surprising that there is a police officer dedicated to cyclists? There are whole sections devoted to motorists.)

The Australian: Cyclists, including the one “doored” this week, are using cameras to film such incidents so they can make insurance claims.

Comment: Very few cyclists use such cameras, which is why this incident is getting so much attention. There is an unpleasant (possibly unintended) implication here that users of such video cameras are somehow actively seeking such incidents.

The Australian: The Victorian government imposed even tougher on-the-spot fines in 2012 for people who opened car doors in the direct path of cyclists.

Comment: True. (As far as I know.)

The Australian: For too long, authorities have bowed to the demands of selfish cyclists and their lobby groups.

Comment: This hyperbolic statement doesn’t seem well supported by facts. The equivalent statement for motorists is much better supported.

The Australian: Truth is, our cities are dominated by cars because they are sprawling.

Comment: Certainly true in outer suburbs that lack good public transport, but irrelevant when discussing an incident in the CBD.

The Australian: We have no equivalent of Amsterdam and should stop pretending we do.

Comment: Australia has no equivalent of Amsterdam? Or Melbourne is no Amsterdam? If the implication is that cycling is fundamentally incompatible with Melbourne’s geography, then this is demonstrably incorrect. Currently about 15% of commuters to the CBD each day travel by bike. This is not a fringe activity, by any stretch.

Cycletouring and OpenStreetMapping: a beautiful symbiosis

Contributing to OpenStreetMap is diversely rewarding: you help other people, you make open data as a whole more viable, you learn a lot about the area you’re mapping, and it’s fun. But sometimes it’s just plain pragmatic. Last weekend, I organised a cycle tour from Bendigo to Avenel, via the O’Keefe Rail Trail, Lake Eppalock, Colbinabbin, Rushworth, Murchison and Nagambie. When I started planning the route, OpenStreetMap looked like this:


The major features are all there, but what’s missing is what matters most to cycle tourists: quiet country roads, and road surfaces. Is there a way to get from Eppalock to Colbinabbin on only sealed roads? Is Buffalo Swamp Rd (near Murchison) really sealed? A great way for me to research is to add to OpenStreetMap: use aerial imagery to add new roads, paying attention to whether they look sealed or not from the air.

From the aerial imagery, Buffalo Swamp Road is obviously not sealed after all. By the time I was done, the map of the area looked like this:


Notice how many “sealed” roads have turned out to be dirt, but also how many other unmapped little roads have been added to the map.

Once this is done, the steps are:

  1. Finalise the route, using OSRM.
  2. Send GPX files to everyone on the trip
  3. Load the GPX files onto both my GPS and Maverick Pro, an Android App
  4. Also load the cycletour.org tiles into Maverick
  5. Ride
  6. Update OpenStreetMap afterwards with any fresh information – obstacles, unexpected connections, local businesses, and so on.

There’s still lots more to add, but it’s nice that just planning this one trip has significantly improved coverage in a whole region like this.

Git: what they didn’t tell you


Of all the well-documented difficulties I’ve had working with Git over the years, a few conceptual difficulties really stand out. They’re quirks in the Git architecture that took me far too long to realise, far too long to believe, or far too long to really grasp. And maybe you have the same problem without realising it.

Branch names are completely arbitrary


git branch -d master
git checkout develop -b master

There, I did it. I’m now calling the develop branch master. What you call this branch, and what I call it, and what your Github repo calls it, and what my Github repo calls it just don’t matter. Four different names? No problem. There are some flimsy conventions that Git half-heartedly follows to link two branches with the same name, but it gives up pretty easily.

Remote branches are local

I have very frequently fallen into this trap:

$ git diff origin/master
$

No differences, so my branch must be in sync with Origin, right? Wrong. What is true is that my branch is in sync with the local copy of Origin. If you don’t run git fetch, then Git will never even update its local copies.

Technically, Git has always been upfront about this. The Git book opens the section on remote branches with:

Remote branches are references to the state of branches on your remote repositories.

But it’s counterintuitive, and so I keep messing it up. I keep hoping (and assuming) that Git will one day include an auto-fetch option, where it constantly synchronises remote branches.

‘Detached HEAD’ mode is fine

Here’s the message that we have seen many times:

Note: checking out 'origin/develop'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

git checkout -b new_branch_name

This scary looking message threw me off for a long time, despite the fact that it’s actually one of Git’s most helpful messages – it tells you everything you need to know.

It boils down to this: if you’re not at the tip of a branch, any commits you make don’t belong to any branch, so they’re easy to lose. The two most common situations that cause this are:

    1. Checking out a remote branch. You should do this:

      git checkout origin/develop -b mydevelop
      or even, if you want to abandon your branch completely:
      git branch -d develop
      git checkout origin/develop -b develop
  2. Checking out a commit in the middle of a branch, like:

    git checkout a8e6b18

    Usually in this case you just want to look at it, so you can just ignore the message.

There’s nothing special about ‘git clone’

For a long time, I thought that ‘git init’, ‘git pull’ and ‘git clone’ somehow created repositories that were different, even if they ended up with the same commits in them. It’s hard to recreate my state of mind, but I spent a long time trying to salvage certain directories on disk when I should have just abandoned them.

Similarly, there is no difference between:

  1. git clone http://github.com/stevage/myrepo
  2. git init
    git remote add origin http://github.com/stevage/myrepo
    git pull origin master

Well, in the second case Git’s a bit confused about which local branches map onto which remote branches, so you have to be more explicit or fix it with some configuration option.

Don’t call any remote ‘origin’


For some reason, Git encourages you to call the source of the first clone “origin”. I have found this very confusing and ultimately very unhelpful. Let’s say you’re working on a project called widget, and you fork it in Github so you can work on it. You will want both remotes accessible locally, so you will probably do one of these:

  1. git clone http://github.com/stevage/widget
    git remote add widget http://github.com/widget/widget
  2. git clone http://github.com/widget/widget
    git remote add mine http://github.com/stevage/widget

So you either have remotes called “origin” and “widget”, or remotes called “origin” and “mine”. But on the next project, you might make the opposite choice, and soon you really don’t remember what “origin” means.

My tip: never name any remote “origin” ever. Name them all after their Github username.

  1. git init
    git remote add widget http://github.com/widget/widget
    git remote add stevage http://github.com/stevage/widget
    git fetch --all
    git checkout stevage/master -b master

You will never understand “git reset”‘s options

The difference between “git reset”, “git reset --soft” and “git reset --hard” is beyond your comprehension. And you probably wanted “git checkout” anyway.

Rule of thumb:

  • If your directory is FUBAR: git reset --hard
  • If you just want to throw away changes to one file: git checkout file
  • In all other cases, google it.