Google spreadsheets as feed stores

September 18, 2009 by hapdaniel

There’s nothing new about this idea, but I wanted to be able to point people to a set-by-step guide to using a Google spreadsheet to store feed URLs and associated data for a pipe. I do, however, assume that you have a Google account and know how to generally use Google spreadsheets.

So let’s start with a spreadsheet that I’ve already set up.

Column A holds the feed URL, column B holds a prefix I want to use on each item title and column C holds the number of items I want to retrieve from each feed. Each column has a title.

Having set the data up I need to share the spreadsheet, as it’s the sharing that will enable me to use a URL for this spreadsheet in Pipes.

The Share menu is in the top right-hand section of the spreadsheet page, and the option I want is "Publish as a web page", which gives a new window. In the top half of the window there are sheet options.

I have a choice between sharing all sheets or just sheet 1, and between having the link to the sheet automatically reflect changes or not. I chose the options you can see above. The next step is click on "Start publishing". This makes available in the bottom half of this window various link options.

From among the various format options available, Web Page, Atom, RSS etc., I’ve chosen the CSV option, which will enable me to use the Fetch CSV module in Pipes. I could have chosen the RSS or Atom options, in which case I would have used a Fetch Feed module. And finally I have the URL I need to use in the CSV module. Here’s the link to the spreadsheet itself. In order for you to be able to follow that link I had to select the "Get the link to share" option as seen in the Share menu above, and then select the option to allow anyone who had the link to view the sheet.

In Pipes I’ll need to use 2 pipes, a main pipe and a sub-pipe. The sub-pipe that will do most of the work.

The main pipe

The main pipe consists of a Fetch CSV module, a sub-pipe in a Loop and the Pipe Output module. The only thing I’ve done with the Fetch CSV is to add the URL.

And the output from that module looks like:

The column names in the spreadsheet become element names in the Fetch CSV output. Because of that don’t use column names in the spreadsheet that contain spaces or ‘odd’ characters such as percent signs. Pipes won’t like it and you won’t be able to reference the element in the rest of the pipe. So what would the output have looked like if the RSS or Atom feeds for the spreadsheet? Here’s the RSS output.

The title element comes from the first column in the spreadsheet and subsequent columns are combined into the description element.

In the Loop and sub-pipe module I can select the relevant elements to be passed to the sub-pipe.

In order to get the sub-pipe module into the Loop I dragged it from the "My pipes" section of the sidebar into a Loop that was already on the canvas.

The sub-pipe

Here I have a URL Input providing the feed URL for a Fetch Feed module, a Text Input and a String Regex module to provide the prefix value created in a Regex module, and a Number Input providing the value for a Truncate module.

So there it is. I did have a rather wordy section about the advantages of this approach, but I’m going to leave that for a later date.

Global replacement using YQL

August 4, 2009 by hapdaniel

Have a look at his Regex module.

The purpose of the rule is to take each @user and around the name build a link for the user’s Twitter page. The description before the Regex is the same as can be seen in the title element. Unfortunately, once the first user name has been captured that name is used in all of the replacements.

It’s the sort of functionality that many Pipes users want, but it’s just never been implemented in Pipes. However, with the help of YQL we can carry out global replacements. I’ll show 2 very similar ways of achieving this, both of which require the use of YQL Execute. I won’t go into any detail about the inner workings of the Execute statements, but there is an example of the approach on the Pipes blog. The images are taken from a pipe that implements both methods (as well as the unsuccessful Regex rule).

In the first method I use a URL Builder to build a URL which can be fed into the Web Service module.

The Base and Path are both the same as the Pipes example, and the q parameter has a similar structure to the Pipes example: a use statement followed by a select statement. The format has to be json, but the diagnostics can be set to "true" (the path in the Web Service will need to be removed in order to see the diagnostics). The regex and replace_with values are the same as those used in the Regex module shown above. The modifiers parameter takes the place of the switches in the Regex module. Instead of specifying an element on which the regex to be used, the value of the relevant element needs to be copied to an element called "yql_string", and the results of the regex will be returned in this element.

Depending on the data that has been passed to the Web Service some of the elements may need to have escape, "\", characters removed. Check elements that  are likely to have quotation marks, HTML tags or control characters.

The second method is very similar to the first, but uses a Fetch Data module instead of a Web Service. The URL Builder is the same apart from the addition of a new "text" parameter.

The Fetch Data module has the path set to results.result.

Because XML and not JSON is being returned there is no escaping issue. However, there is a limit to the length of the text that can be passed, due to the parameters being part of the HTTP request made to YQL.

Passing complex elements to sub-pipes

June 15, 2009 by hapdaniel

I have 2 warnings here. The first warning is that, as far as I know, this technique is completely unsupported. It became possible as a by-product of the change the Pipes team made to allow users to enter their own values into drop-down boxes. The second warning, and this is a general one for passing any data to a sub-pipe, is that there is a limit of about 2,000 characters that can be passed to a sub-pipe.

So what am I talking about? I’m calling an element "complex" if it has one or more sub-elements.

Sio let’s have a look at some pipes. In the main pipe I’m using 2 Item Builders and a Union. Normally there would be some sort of Fetch module instead.

I’m passing item.stuff to the sub-pipe.

Let’s start with the simplest possible sub-pipe for this situation.

Notice that there are no default or debug values in the Text Input. I haven’t the least idea what this complex element is going to look like to the Text Input module. This makes things tricky, because I’m just going to have to assume that I get things right in the sub-pipe and wait until I’m back in the main pipe to check if everything’s working as expected. I’ve used the element name "stuff" in the Item Builder. I could have used any name, but I’ve found that it makes things simpler to use the same name as that of the element being passed to the sub-pipe.

Back in the main pipe we can now have a look at the result of all this.

The output of the Loop has returned, within the item.result element, a copy of the original stuff element. So far so useless. So let’s set about trying to do something useful. Let’s say I want to filter items to permit only items where a stuff.category element contains "Robots". Here’s an attempt at this without trying to use a sub-pipe.

There’s a Split module, and then in the right-hand path I use a Sub-element module to get the category sub-elements as items, filter on my key word and produce a count of the accepted items.

In the left-hand path I use a Regex module to get the Count value from the right path and then filter the items based on that value. Now all this would be great if my mission was to accept all items if any of the categories for any of the items contained the key word, but that’s not the aim. The problem here is that the sub-element is working at the feed level, so on all the items at once, while I need something that will work on the category values for each item. The purpose of this charade is that I can compare how things would look if I used the sub-pipe approach.

The Split and Regex in the sub-pipe are just there as mechanisms for getting the count value out of the pipe, since the Count module cannot be directly connected to the Pipe Output module. In the main pipe there is a Filter module to filter where the value of item.result.count is greater than 0. As I hope you can see, if you keep the name of the element in the sub-pipe the same as the name of the element passed from the main pipe, it’s not too difficult to work out what is needed at the level of the main pipe and then apply that inside the sub-pipe.

If I hadn’t wanted to show a comparison between processing in the main and sub-pipes I would probably have not used a Count in the sub-pipe, but rather used a Truncate module, to 1, after the Filter and connected that to the Pipe Output. In the main pipe I would have set the Filter to permit items where result.content matched the regex "^(?=.)", that is, result.content existed at all (by containing a character).

Concatenation revisited

June 5, 2009 by hapdaniel

Now that Paul Donnelly has shown us the way, let’s look at another way to concatenate elements from items that uses YQL and the Web Service module.

This pipe starts off in the same way as the pipe in my previous post on concatenation, with a URL Input, an Item Builder and a Fetch Feed in a Loop.

Here we use Rename to make a copy of the items (in item.stuff) to an element yql_array. That’s a specific name that will be used by the YQL web service. On the right is a URL Builder with the elements:

  • Base – http://query.yahooapis.com
  • Path – v1/public/yql
  • Parameters
    • q – use http://pdaniel.co.uk/yql/array_concat.xml as array_concat; select * from array_concat where data = @data and delim=@delim and ele=@ele
      This is the YQL statement at the heart of this method. The use clause links to a YQL open data table with an  Execute clause in it, and gives that table a name of array_concat. The select clause is going to pass values for “data”, “delim” and “ele” to the open table. The values for these are specified using YQL variable substitution. It looks like the @data substitution is special, since we ourselves don’t supply a value for it.
    • format – json. That’s the format the Web Service uses.
    • diagnostics – If you’re having problems with a YQL statement you can set this to “true” and get diagnostic information.
    • delim – One of the @ variables that were used. The delim value is a separator for each of the elements we’re going to collect.
    • ele – the element name. In this case we’re concatenating title elements.

In the Web Service module the path is set to query.results.result.items.

The result of the concatenation is always returned in the yql_array_string element. The rest of the pipe then works on that string to produce the desired output.

This approach isn’t just limited to working on whole items as we can use it to concatenate array elements within an item. In this pipe, for instance, category values within an item are being concatenated. This is how the data looks to start with:

In the Rename module we have

and in the URL Builder we have

Update: 10th June. The ele parameter is now optional. This allows for arrays such as category.0, category.1 as well as category.0.title, category.1.title.

Web page section headers in items

May 21, 2009 by hapdaniel

It’s quite common for a web page to be broken into sections with items appearing in each section.

The best way that I’ve found to approach this situation is to firstly break the page into sections, and then break those sections into items. So let’s have a look at that process as seen in this pipe.

As we can see, the kind developer of this page has made it nice and easy for us to split the page into sections.

The first rule extracts the section name from the content and creates a pair of div tags. The div has a class attribute that should have a value that doesn’t appear anywhere else on the page. So in item.section we now have something like “<div class="mysection">Sports</div>”. Next we need to find some portion of the html that occurs at least once for each item (and preferably only once). The html has to be exactly the same for each item. In this case we can identify “<td class="subject"” as a suitable target. So the second rule takes that string and globally prefixes it with the value in item.section. We now have the text of the section appearing in each item. Time to break out the items.

Nothing out of the ordinary here. Use a String Tokenizer to split up the items, and a Filter to get rid of any unwanted items. In this case the heading is going to be saved as a category element. So a Rename creates the category element.

We don’t want to be left with just the text of the heading at this stage, because we have some housekeeping to do, so the Regex rule leaves us with what we originally had in the section element, e.g. “<div class="mysection">Sports</div>”.

Finally, as far as this post is concerned, we can use a String Replace to remove the div that was inserted, and a Regex to leave us with just the text of the heading. The housekeeping might not be strictly necessary, but it could make life easier for later on, since we can refer to the original content of the web page without having to remember that somewhere in there is some added markup.

Concatenating items in a feed

May 20, 2009 by hapdaniel

By which I mean lumping selected elements from all the items in a feed into a single item.

A word of warning first. This technique involves using a string of numbers in the form “0,1,2,3”, where the highest number will be the maximum (-1) number of items  possible in the feed. If you’re looking to lump together 50 items and aren’t willing to build the relevant string then this is not for you.

Let’s have a look at my pipe that does this.

Here we have a URL Input module, an Item Builder a Fetch Data module in a Loop and a Split. The main thing to notice here is that in the normal run of things I could just have used a URL Input module and a Fetch Data, but what I want is for my items to appear as a sub-element array.

If you want to have more than one element in the output, or you want to format the elements, then use a sub-pipe instead of my Fetch Data module to do that work.

Now let’s see what’s happening in the right-hand path of the Split.

We need the Sub-element so that we can then use the Count to output the number of items in the feed. If we have 2 items then the String Regex converts that number to “,2.*”. This is where that string of numbers I mentioned earlier comes in.

The first rule in this String Regex removes “,2” and everything after that, and the second rule wraps what is left inside “${stuff.” and “.title}”. Finally the String Replace replaces each of the commas with the string on the right. The outcome of all this is that we end up with a string like “${stuff.0.title}<br>${stuff.1.title}”. In case you weren’t aware, in the Regex module we can refer to elements of an item using $(element path excluding item.}. And now we can plug this into the Regex module in the left-hand path of the Split.

The grand result of this is that I can show off the first 2 titles of my blog posts.

I think that could be 3 titles fairly soon.

How long is my (piece of) string?

May 11, 2009 by hapdaniel

There’s no built-in module in Pipes that can count the number of characters in a string/element, and in the past I hadn’t been able to come up with a way to do this. But recently I hit on this method, which may be the strangest way you’ll ever see this function implemented.

In the right hand path of the Split module the Regex module replaces each character by “1,”. There is no significance to the 2 characters used.

the String Tokenizer splits the string into separate items using the second character of the replacement string, the comma in this case. All that is left then is to get a count of the elements and insert that figure into the Item Builder that is in the path on the left.

This pipe needs to be used as a sub-pipe. If the routine is used in a main pipe the String Tokenizer will produce a set of items based on the sum of the strings for all of the main items. Since it’s used as a sub-pipe it means that there is a limit of roughly 2,000 characters for the string.

A pipe as a Firefox custom search engine

April 28, 2009 by hapdaniel

There’s a pipe I often use before I reply to a tweet. It shows me recent tweets from and to a Twitter user. There’s no point in replying to someone if they’ve already received a reply that contains the same information.

While using the pipe I began to wonder if I could set it up as a Firefox custom search engine. A quick web search sent me to this article on Wikihow. It looked simple enough – just create a small text file in the Firefox searchplugins directory and that’s it.

After a small amount of experimentation this is what I came up with:

<search
name="Twitter user"
method="GET"
action="http://pipes.yahoo.com/hapdaniel/rnqt_6gE3hGtLKkw_w6H4A"
queryCharset="utf-8">
<input name="user" user>
</search>

There’s a couple of things to notice here. Firstly, in the pipe I have a Text Input module that has the name “user”. If the name had been “tweeter” then the input tag would have been <input name="tweeter" user>. Secondly, don’t try using the pipe link format http://pipes.yahoo.com/pipes/pipe.info?_id=pipeid since the query generated for the search assumes that the search term parameter is going to be the first parameter. You’ll end up with a query that looks like http://pipes.yahoo.com/pipes/pipe.info?_id=pipeid?user=”fred”, which won’t work.

I found that I needed to restart Firefox in order for my search engine to show up.

Update – Sorry folks, but I’m a blogging novice. I’ve changed the link to the original article I mentioned because the original article was updated and simply didn’t reflect what I was talking about. So lesson one is not to link to anything that starts “Wiki”. For now I’ve changed the link to point to the version of the page I was talking about. My second lesson is to realize that the characters I type in when editing a post are not necessarily the characters that will appear on the blog. My simple quotes appear as fancy quotes, so it means that a copy and paste on the text for my search engine does not work.