Steve Bennett blogs

…about maps, open data, Git, and other tech.

Category Archives: Uncategorized

7 reasons to release that government dataset

As a data guru in residence, I’m helping government bodies prioritise which datasets to release as open data. Sometimes people say “No one would ever find this data interesting, so why bother releasing it?” I think there are several distinct reasons why a given dataset might be worth releasing. Some datasets are valuable for several reasons simultaneously. Some aren’t valuable at all.

When a public servant comments that a potential dataset isn’t interesting or useful, ask: “are there other reasons to release it?”

But if a dataset fails to meet any of these criteria? You have my permission not to release it.

#1 Build an app around it

Census Explorer, by Yuri Feldman, allows easy exploration of part of the 2011 Australian Census.

Datasets like public transport timetables, public bike share station status, or parking space availability are obvious candidates for third party developers to use to build an app. Unfortunately, these examples also require near-realtime feeds in order to be useful.

#2 Support other apps

Even if a dataset isn’t interesting or useful enough to warrant an app in its own right, it could add value to another website or app if it’s easy to use. I’ve come across many of these:

  • Average traffic volume on roads maintained by VicRoads, used to help cyclists decide which roads to avoid, on cycletour.org.
  • The slope of footpaths around Melbourne can help wheelchair users navigate the city.
  • The location and species of every tree in Melbourne can add colour and interest to a map of the city.
  • Locations of drinking fountains could be useful for cycling, jogging, or dog walking apps or websites.
Vicroads traffic volume

Which way would you cycle to Port Albert?

#3 Interesting for research

If a dataset is big, rich, detailed and high quality, then there’s a pretty good chance it’s worthy of some kind of analysis. If it’s unique enough, then it might even interest a researcher in starting a research project just to look at this dataset.

Examples: building permits database, public transport timetables (for urban planning).

#4 Supporting other research

Much more common than such rich datasets are small datasets that researchers find useful to solve particular problems, add context, or strengthen an analysis. Local Government Area boundaries aren’t inherently interesting, but they’re one of the geospatial datasets that researchers request the most often. The ATO’s Standard Business Rules taxonomy sounds incredibly dry to me, but is of potential use to lots of people trying to glue different kinds of data and applications together.

#5 Policy and analysis

Lots of organisations need government data to develop internal strategies or policies to be shared with the public – or even to influence government. Typically they get the data either by transcribing tables from official reports, or by developing direct relationships with the government body in question. Publishing data directly to an open data portal allows a wider range of groups to make use of it, without the overhead of having to ask whether the data is available. Data that is collected regularly, in the same format, is particularly likely to be useful.

#6 Transparency

If the data relates to how government decisions are made, it may be worth releasing to demonstrate transparency – regardless of how much the dataset is actually used. For example, releasing annual budget data as an easy-to-use spreadsheet makes a big political statement about willingness to be scrutinised. Even if no citizen takes up the opportunity to crunch the numbers, they may still appreciate having that option.

Examples: annual budgets, revenue sources (parking meters, speeding fines), parliamentary voting records.

#7 Insights for government

If you’re really lucky, the dataset you publish may help another part of government do something useful. I think good things happen when people can access data without having to ask anyone for it, and the same goes for governments themselves. You can’t really expect insights, but if it happens – great.


The Australian’s menacing editorial

An editorial published in The Australian on the 21st of March set a new low standard in writing about conflicts between cyclists and cars. Prompted by video of a cyclist colliding with a taxi door, the editorial combined a strong anti-cyclist viewpoint (as it’s entitled to do in the opinion section) with some astonishing ignorance and lousy argumentation.

It’s so terrible, I’ve commented on each sentence. (Even the grammar is bad: “The problem of city cyclists reached their apogee…”, “clogging-up lanes”)

The Australian says…

Comment

The arrogant sense of entitlement in our inner cities is also evident in the ever-growing number of cyclists snaking their way through pedestrians on overcrowded pathways, darting between cars and clogging-up lanes on our congested roadways.

Cyclists are entitled to ride on roads. Just ask our Police Chief Commissioner, Ken Lay.

Cyclists don’t “clog-up” roads. If anything, the opposite is true, since each takes up less room than a car.

The problem of city cyclists reached their apogee in Melbourne this week when a cyclist was “doored” on busy Collins Street, after a passenger opened a taxi door and a rider crashed into it.

(Nothing factually wrong here, although the “problem” described is obviously subjective.)

Neither the taxi nor its passenger could be deemed at fault because a narrow “bike lane” inhibited the taxi from stopping next to the kerb.

 

1. The passenger is clearly committing the offence of causing a hazard to a cyclist by opening a door.

2. This stretch of road is a no-stopping area: the taxi could not have stopped anyway.

3. Cars are allowed to stop in bike lanes.

4. Even if cars weren’t allowed to stop in bike lanes, the suggestion that this would excuse the opening of a door into the path of a cyclist is outrageous.

The passenger was lucky to avoid serious injury.

The risk to the passenger in this case is much lower than the risk to the cyclist, as the collision risk is in the moment immediately following the door being opened – before the passenger gets out. The suggestion here is absurd.

What makes this incident even more absurd is that, although the lane was marked by a bicycle symbol, it was not actually a dedicated bicycle lane.

Whether or not the cyclist was in a bike lane is irrelevant to the offence committed. I can’t fathom what “absurdity” is created by the painted bike lane not being an actual bike lane.

Melbourne bike lanes must have signage, fixed to a pole, that shows the start and finish of a lane, as well as clear markings on the road itself.

This may be true, but not relevant.

The state’s bicycle operations officer — yes, there is such a position — admits there is confusion for cyclists, pedestrians and motorists.

This is possibly a reference to this interview in The Age on March 20. This statement doesn’t seem relevant, other than to imply that the cyclist is somehow at fault for being doored, due to being “confused”. (Why is it surprising that there is a police officer dedicated to cyclists? There are whole sections devoted to motorists.)

Cyclists, including the one “doored” this week, are using cameras to film such incidents so they can make insurance claims.

Very few cyclists use such cameras, which is why this incident is getting so much attention. There is an unpleasant (possibly unintended) implication here that users of such video cameras are somehow actively seeking such incidents.

The Victorian government imposed even tougher on-the-spot fines in 2012 for people who opened car doors in the direct path of cyclists.

True. (As far as I know.)

For too long, authorities have bowed to the demands of selfish cyclists and their lobby groups.

This hyperbolic statement doesn’t seem well supported by facts. The equivalent statement for motorists is much better supported.

Truth is, our cities are dominated by cars because they are sprawling.

Certainly true in outer suburbs that lack good public transport, but irrelevant when discussing an incident in the CBD.

We have no equivalent of Amsterdam and should stop pretending we do.

Australia has no equivalent of Amsterdam? Or Melbourne is no Amsterdam? If the implication is that cycling is fundamentally incompatible with Melbourne’s geography, then this is demonstrably incorrect. Currently about 15% of commuters to the CBD each day travel by bike. This is not a fringe activity, by any stretch.

Windows red cross errors scam


We have noticed many red cross errors!

I just received an interesting phone call, apparently from a group of Indian scammers. It went roughly like this. (Phrases in bold are things I jotted down during the call.)

  • Hello, I’m Caroline from the Computer Technical Department at Windows Best Help [or Windows Based Help, perhaps]. We’re calling to alert you that for the past four weeks you’ve been receiving red cross errors, which mean you’re subject to internet viruses, and hackers that are trying to break into your computer. Your address is [my address], correct?

Throughout this, I give non-committal “mmm, yes” responses.

  • We’re connected through the Global IT Server. This just an awareness call, nothing to do with telemarketing.
  • Now, go to the home page of your browser.
  • Now press Windows-R, and type “cookies”. [Actually, long description of how to find the Windows key, and spelling out “cookies” in radio code. The first “o” was orange, the second was Oscar.]
  • Now, do you see all those files and folders? All the work you’re doing is stored in those files as a double coded check up.

I’m calling from the Computer Technical Department at Windows Best Help. This has nothing to do with telemarketing!

At this point, she transferred me to her “technical supervisor”. He gave me his name, but I didn’t quite catch it – something like Armin. I asked where they’re based – Kolkata.

  • Are you in front of your computer now? [I admitted that, no, the phone was in a different room.]
  • But I believe that you told Caroline that you were typing the commands and could see the results? [Interesting…I had led her to believe that. Is their operation so small that he listens to the whole conversation?]

Some confusion followed, where I offered to go and run the command for real. I told him to hold the line for a minute, while I went and did it. When I came back, the line was dead. Oops.

I’ve heard of this scam before, but it was entertaining to see it in operation. Too bad I didn’t get to see where it led.

Semantic Google keywords

I seem to have picked up the habit of using semantic keywords in my Google searches. I have no idea if Google actually supports any of them, but, if they don’t, they should!

  1. buy
    Example: buy usb extension cable
    Meaning: I’m specifically looking to purchase something, so don’t show me any site that doesn’t help me buy things, preferably online.
  2. review
    Example: Samsung galaxy note review
    Meaning: I want to read reviews about something. Don’t show any site that doesn’t have at least one review.
  3. experience
    Example: Samsung galaxy note experience
    Meaning: I’m interested in people’s personal experiences. I’m interested in blogs, and informal reviews: not professional reviews from technophiles. 
  4. get/download
    Example: get git
    Meaning: Take me to sites that either let me download Git, or will explain how I can do this.
  5. photo
    Example: photo Birrarung Marr
    Meaning: I want to see what Birrarung Marr looks like. Show me photos, pictures, whatever!
  6. how to
    Example: how to uninstall Crayon Physics
    Meaning: I’m looking for technical solutions, problem solving, troubleshooting type sites.
  7. compare
    Example: compare HDMI DLNA
    Meaning: I’m looking for information about the differences between these two things: massively prioritise pages that treat just these two things.
  8. alternatives
    Example: BeyondPod alternatives
    Meaning: I’m looking for things like BeyondPod, but different to it. Especially show me comparisons between it and its alternatives.
  9. why
    Example: why github
    Meaning: I want to know the benefits of GitHub. Prioritise sites that tell me specifically about its advantages, the problems that it solves – not how to use it, or general descriptions of it.
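
To make the idea concrete, here is a minimal sketch of how a search front end might pull one of these keywords off a query. It says nothing about how Google actually works: the function name `parse_query`, the intent labels and the prefix/suffix split are all assumptions made for illustration.

```python
from typing import Optional, Tuple

# Keywords come from the list above; the intent labels are made up.
PREFIX_KEYWORDS = {
    "how to": "troubleshooting",
    "buy": "purchase",
    "get": "download",
    "download": "download",
    "photo": "images",
    "compare": "comparison",
    "why": "benefits",
}
SUFFIX_KEYWORDS = {
    "review": "reviews",
    "experience": "personal experience",
    "alternatives": "alternatives",
}


def parse_query(query: str) -> Tuple[Optional[str], str]:
    """Split a query into (intent, subject); intent is None if no keyword matches."""
    text = " ".join(query.split())  # normalise whitespace
    lowered = text.lower()
    for keyword, intent in PREFIX_KEYWORDS.items():
        if lowered.startswith(keyword + " "):
            return intent, text[len(keyword) + 1:]
    for keyword, intent in SUFFIX_KEYWORDS.items():
        if lowered.endswith(" " + keyword):
            return intent, text[: -(len(keyword) + 1)]
    return None, text


print(parse_query("buy usb extension cable"))
print(parse_query("BeyondPod alternatives"))
```

A query with no recognised keyword comes back with an intent of `None`, so it could simply fall through to an ordinary search.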

Why is buying stuff from eBay so complicated?

So you’ve found the thing you want, and you already have both an eBay account and a PayPal account. You don’t want to do anything complicated – send this item to me, and charge my (already stored) card appropriately.

  1. Click Buy it Now
  2. Sign in
  3. Click Commit to Buy
  4. Click Pay Now
  5. Click Continue
  6. Sign in again (PayPal this time)
  7. Click Continue (confirming payment type?)
  8. Click Confirm Payment
In other words, “I’d like to buy this. Yes. Yes. Yes. Yes. Yes.” Maybe there was more genius to Amazon’s one-click purchasing than I thought.

Introducing: Cooking for engineers

Here’s what I hate about recipes:

  1. They’re delivered as unstructured narratives.
  2. They mix identifying information (“carrots”) with process information (“thinly sliced”)
  3. They’re overly specific (3/4 of a teaspoon, does it matter?)
  4. They have too many ingredients, and you don’t know which ones you can leave out
  5. They always have one or two ingredients you wouldn’t have lying around. Shrimp paste?

I can’t fix all of that. But here goes:

Cooking for engineers: spicy cauliflower and almonds

"Cooking for engineers" version of a spicy cauliflower dish

A cauliflower dish I made up the other night, seen by an engineer.

Recipe for non-engineers

Here’s what a conventional version of that dish might look like:

Ingredients:

1/2 cauliflower, chopped

1 bok choi

1 tbsp coriander, chopped

1/2 cup slivered almonds

1 tbsp ground turmeric

1 tbsp mustard seeds

1-4 tsp chilli powder, to taste

Put 2cm of water in a saucepan and bring it to the boil. Simmer the cauliflower 2 minutes, then drain, discarding the water. Add half the sesame oil, and return the cauliflower to the saucepan.

Meanwhile, heat the rest of the sesame oil in a frying pan. When hot, add the slivered almonds and remove from the heat, stirring continuously. Once browned, add the almonds to the frying cauliflower, and add the spices. After a few minutes, add the bok choi. Stir frequently until the bok choi is soft, then serve with rice and roti bread.

Principles of cooking for engineers

Here are the rules:

  1. One column per preparation dish (saucepan, mixing bowl, tray…)
  2. Red arrows show cooking. Maybe thicker arrows for hotter.
  3. Ingredients start off to the sides. Name of thing in bold. Quantities as rough as appropriate, in parentheses.
  4. Thin blue arrows show transfer of stuff.
  5. Text between two black lines means “keep doing the thing above until this happens”
  6. Stuff you need to do (processes) in square boxes.
  7. Collapse stuff down wherever appropriate. It’s a communication tool, not an exhaustive process analysis.

Explaining rule 7: the diagram above could have shown a colander as another column, with cauliflower transferred from the saucepan to the colander and back. But why would you do that?
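
The same separation of "what" from "how" can be sketched as data rather than a drawing. This is only one possible reading of the rules: the class names `Ingredient` and `Step`, the `columns` helper, and the way the recipe is broken into steps are all assumptions, and only some of the ingredients are listed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Ingredient:
    name: str           # identifying information, e.g. "cauliflower"
    quantity: str = ""  # as rough as appropriate, e.g. "1/2"
    prep: str = ""      # process information, e.g. "chopped"


@dataclass
class Step:
    vessel: str         # one column per preparation dish
    action: str         # the process box
    add: List[str] = field(default_factory=list)  # things transferred in
    until: str = ""     # "keep doing the thing above until this happens"


def columns(steps: List[Step]) -> List[str]:
    """Vessels in order of first use: the columns of the diagram."""
    seen: List[str] = []
    for step in steps:
        if step.vessel not in seen:
            seen.append(step.vessel)
    return seen


ingredients = [
    Ingredient("cauliflower", "1/2", "chopped"),
    Ingredient("bok choi", "1"),
    Ingredient("slivered almonds", "1/2 cup"),
    Ingredient("chilli powder", "1-4 tsp", "to taste"),
]

# One reading of the conventional recipe; draining is collapsed away (rule 7).
steps = [
    Step("saucepan", "simmer", ["cauliflower"], "2 minutes"),
    Step("frying pan", "stir off the heat", ["slivered almonds"], "browned"),
    Step("saucepan", "fry", ["frying pan contents", "spices"], "a few minutes"),
    Step("saucepan", "stir", ["bok choi"], "bok choi is soft"),
]

print(columns(steps))
```

Here the colander never appears as a vessel, which is the collapsing that rule 7 asks for.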

Discussion

Normal recipe format works pretty well for most people, but I find it takes multiple readings before I can start. Effectively, I’m constructing this kind of flow chart in my head, working out what gets transferred from where, to where. I hope that a refined version of this diagramming methodology would allow you to confidently dive straight in, with no nasty surprises.

On the downside, diagrams are hard to manage. I realise I forgot an ingredient (crushed garlic and ginger paste), but it’s time-consuming to modify the diagram and re-upload.

And lastly, yes, it’s a bit facile to call this “cooking for engineers”. Probably real engineers want precise measurements, correct use of flow control symbols and so forth.

New Gmail feature: auto mailing list management

Ok, Gmail, you’re halfway there: you’ve got labels, and you can detect mailing lists (Google Groups ones, anyway). Now, take it a bit further:

  1. Automatically create a label for every group that I receive mail from. Don’t wait for me to do it.
  2. Make these labels more than just a label. Give them options like “skip the inbox” or “delete on sight”.
  3. Have a view which shows all mail from all mailing lists, with some smart options to make this even more useful.

For extra credit:

4. Detect other sources of regular mail which are not “mailing lists” as such. Newsletters from the bank. Quarterly updates from my alma mater. Treat them exactly the same.

Most of the features are already there, but it’s so tedious having to set up a special rule and label for every single list.
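
Detecting a list is probably the easy part, since mailing list software commonly adds a `List-Id` header (described in RFC 2919). As a rough sketch using Python's standard `email` module, a label could be derived like this. The function name `mailing_list_label` and the rule of using the first dot-separated part of the identifier as the label are assumptions for illustration, not a description of anything Gmail does.

```python
import re
from email import message_from_string
from typing import Optional


def mailing_list_label(raw_message: str) -> Optional[str]:
    """Return a label derived from the List-Id header, or None if there is none."""
    list_id = message_from_string(raw_message).get("List-Id")
    if not list_id:
        return None
    # The header usually looks like: Optional description <listname.example.com>
    match = re.search(r"<([^>]+)>", list_id)
    identifier = match.group(1) if match else list_id.strip()
    # Assumption: the first dot-separated component makes a reasonable label.
    return identifier.split(".")[0]


raw = (
    "From: someone@example.com\n"
    "List-Id: Git users <git-users.googlegroups.com>\n"
    "Subject: hello\n"
    "\n"
    "Message body\n"
)
print(mailing_list_label(raw))
```

Mail without the header returns `None`, which is where the "extra credit" case of newsletters and quarterly updates would need some other heuristic.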

Penny Auctions – a bit of analysis

swoopo, bidray, bidstick (bids tick, apparently), bidrivals and dozens of others are running what we’ll call “penny auctions”. Using bidrivals.com as the example, they all work on the following principles:

  1. There are consumer electronics for auction, usually at big discounts.
  2. It costs a certain amount to make a bid, regardless of whether that bid is ultimately successful. For bidrivals.com, it’s 40 British pence.
  3. Every bid raises the price by a fixed amount. In this example, by 1c. It also extends the auction to last another 15 seconds or so.
  4. If you “win” the auction you must then buy the item at the final price.
  5. It’s not a lottery. Because they say so.

At first glance, the auction looks great – buy a phone for $20! Buy a plasma tv for $1.53!

But not so fast. A couple of things that are not obvious to the beginner:

  1. Every dollar of the final price represents $40 in bidding fees. A $1000 TV selling for $1 is a big loss for the site. The same TV selling for $25 is a small profit. Sold for $1000 it’s a $40,000 profit.
  2. You can use a site-provided bot (“bidbot”, “bidbutler”…) to bid on your behalf. If two people do this simultaneously, they’ll both lose a lot of money with no apparent gain.
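
The arithmetic in the first point can be checked with a few lines of Python. This is a back-of-the-envelope sketch: it treats the bid fee and the price step as 0.40 and 0.01 of the same currency (as the figures above do), assumes the site pays full retail price for the item, and the name `site_profit` is just for illustration.

```python
BID_FEE = 0.40     # cost of placing one bid
PRICE_STEP = 0.01  # amount each bid adds to the price


def site_profit(final_price: float, item_cost: float) -> float:
    """Bidding fees plus sale price, minus what the item cost the site."""
    bids = round(final_price / PRICE_STEP)
    return bids * BID_FEE + final_price - item_cost


for price in (1, 25, 1000):
    print(price, round(site_profit(price, 1000), 2))
```

For the $1000 TV this gives a loss of about $959 at a final price of $1, a profit of about $25 at $25, and a profit of about $40,000 at $1000.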

So, is it a scam? Well, there are really two questions:

  1. If the site is running completely as described, legitimately, and not using shill bidders (bidding on their own auctions), is this an honest way to make a living – and should you participate?
  2. How do you know if a site is legitimate? Is it likely to be?

Is this honest?

I see very little to distinguish these penny auctions from gambling:

  • When you bid, whether you win or not depends entirely on whether anyone else bids in the next 15 seconds. Assuming you’re bidding on an item which is clearly a bargain (eg, $5 for a TV), then the normal considerations of auctions do not apply: any rational person would bid if they could do so for free.
  • The house take is enormous. Frighteningly so. For example, imagine on average the site sells items at a 65% discount from RRP, and bids cost 40 times as much as the amount they increase the value by. This means that it costs on average (100-65)x40 to win a $100 item (whose value is now $65), or in other words (100-65)x40/65≈$21.54 to win one dollar’s worth of value. By comparison, a skilled blackjack player in a casino can pay as little as $1.01 to win $1’s worth of value.
  • Most bids give no return to the bidder. This means that even if you don’t want to call it “gambling”, it should still be regulated, as the potential for dishonesty is great. You don’t want to be bidding for a dead donkey.
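
The house take calculation above, written out as a sketch. The definition of "house take" used here (the share of money paid in that is not returned as value) is an assumption, as are the function names.

```python
def cost_per_dollar_won(discount_pct: float, fee_multiple: float) -> float:
    """Bidding fees paid, across all bidders, per dollar of discount won.

    Works on a notional $100 item sold at `discount_pct` percent off,
    where each bid costs `fee_multiple` times the amount it adds to the price.
    """
    final_price = 100 - discount_pct
    bidding_fees = final_price * fee_multiple
    return bidding_fees / discount_pct


def house_take(cost_per_dollar: float) -> float:
    """Fraction of the money paid in that does not come back as value."""
    return 1 - 1 / cost_per_dollar


penny = cost_per_dollar_won(65, 40)
print(round(penny, 2), round(house_take(penny), 3))
print(round(house_take(1.01), 3))  # the blackjack comparison
```

With a 65% discount and a fee multiple of 40, this comes to roughly 21.5 dollars paid per dollar won, a house take of around 95%, against about 1% for the blackjack example.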

There is quite a bit of antipathy towards these sites: Washington Post, Jeff Atwood, Ed Oswald.

Can you trust them?

There are two main risks:

  • The site may use “shill bidders” to bid on items that would otherwise go for a low price. This could prevent you ever winning, or cause you to spend far more than you want.
  • Even if you “win”, the site may never ship. The whole thing could be a scam.

Fortunately, there are sites on the lookout for this kind of thing, such as pennyauctionwatch.com. There is evidence of dodgy sites, such as fake testimonials.

So, what are the incentives for a site to use shill bidding? Well, as we saw above, the difference between a $1000 item selling for $1 and $25 doesn’t look like much, but it’s the difference between breaking even and posting a big loss. Imagine there is fairly steady bidding activity, but there are just a few gaps before that $25 mark. If the site could shill just a few times, they would massively increase their profitability.

But how much should they shill? Consider two strategies:

  1. bid whenever the time gets to 1 second; or
  2. bid immediately after anyone else bids.

In 1), the shill bids guarantee almost any asking price, as long as there is still some demand. This has the potential to greatly increase profit, and decrease variance.
In 2), half the bids end up being shill bids. This causes two problems: first, you’re directly losing one bid fee for every shill bid. Second, by inflating the price, you’re accelerating reaching the point at which people no longer want to bid, because the prize at stake is shrinking. So if people might normally bid strongly up to half the value of the item, then shilling along the way is just replacing paying bids with free ones. You might even decrease the final sale value, and every dollar of sale value lost is $40 of bidding fees lost.
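
A toy model of why strategy 2 backfires. It assumes, purely for illustration, that bidding stops at a fixed price cap no matter who placed the bids, and that a fixed fraction of all bids are shill bids that earn no fee. The name `paid_fees` and its parameters are hypothetical.

```python
def paid_fees(price_cap: float, shill_fraction: float,
              bid_fee: float = 0.40, price_step: float = 0.01) -> float:
    """Fees collected from real bidders before the price reaches the cap."""
    total_bids = round(price_cap / price_step)
    paying_bids = total_bids * (1 - shill_fraction)
    return paying_bids * bid_fee


# People bid strongly up to half the value of a $1000 item.
print(round(paid_fees(500, 0.0), 2))  # no shilling
print(round(paid_fees(500, 0.5), 2))  # half the bids are shills
```

Under these assumptions the site collects about $20,000 in fees with no shilling and about $10,000 when half the bids are shills, which is the "replacing paying bids with free ones" effect.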

Conclusion: shill bidding seems likely to occur, in small doses, because the incentive is just so strong.

Can you beat them?

Probably not. It’s been tried. To beat it:

  • You have to find a site that is not a complete scam.
  • You have to find a site that is completely honest. Even a little bit of shill bidding will crush you.
  • You have to defeat an absolutely incredible house take of 95% (remember, normal house take for gambling ranges from 1% to 5%).
  • You have to know enough about the auctions and your fellow users to make you fairly confident that no one will bid in the next 15 seconds. In the $1000 TV at $25 case (40c to bid), you need there to be a greater than 1/250 chance that you will win the auction with this one bid. Sound easy? Think: if that were the case, how did the price get to $25? It would stop, on average, at $2.50.

If it is beatable, you’d think people would have done it. And, since they’re capped at 4 wins per month generally, there would not be much harm in them sharing their secret. Unless they have a network of penny auction-beating bots. There’s a thought.

But just in case you wanted to try:

  • Compare sites. Find a safe one that appears to be losing money.
  • Collect lots of data. Try GreaseMonkey.
  • Find the right time of day, with the least competition.
  • Track all the auctions, pick individual moments and place bids.
  • Don’t try and win a specific auction. Bid any time your expectation on that bid is positive. The moment could pass.
  • Consider the effect of distractions. A good moment might be when several auctions are closing at the same time. You could even engineer that by bidding on several simultaneously.
  • Consider using several accounts to bid with, to drive off other bidders. If you know they’re paying attention and will react appropriately, that is. I’m thinking you bid with a group, gradually spacing their bids further apart and hoping you can sneak a 15 second gap through.

