Posts Tagged ‘Yahoo Pipes’

Generating indexed items

November 22, 2010

This is a post about generating new items in a pipe where those items are sequentially indexed based on a given number.

What would we use this for?
One reason that comes to mind is where the data is spread over a number of web pages. So we go to the first page and there are links to the next 3 pages and a link to the last page. What we want to do is fetch all of the pages. The link to the last page could look like
http://www.somedomain.co.uk/greatstuff?page=17
So we have the number of the last page and need to generate URLs for pages 2 to 17 (we already have page 1).

Background
Paul Donnelly in a Yahoo! Pipes blog post has already shown us how to index a set of existing items.
http://blog.pipes.yahoo.net/2009/06/01/using-yql-execute-to-power-the-pipes-webservice-module/
Look in the last paragraph of that post for the index example.
So the main task that is left is to create the required number of items.

Generating n items
The pipe starts by putting a numeric value into an element "number" and then reducing that by 2. That’s because Paul’s indexing starts with 0 and we want the indexing to start with 2 (in our example usage we already have page 1). The next step is to use a Regex module to create a new element "num" that includes our number.

Imagine an item that had an element "ting" which was an array where the last cell was cell 15. How would we create that array?
This is step 1.

We’ve now created an element "ting" that has a sub-element "15" that has a value "fred". The element name "ting" is not significant. Neither is the value "fred". The important thing to notice here is that we have not given item.num a new value. What we have done is to assign a value to the contents of item.num. Let’s have a look at this when we assign the value of the Loop instead of emitting it.

We can see that the element "num" still has the value "ting.15", but in loop:itembuilder we have an element "ting" with a sub-element "15" which has a value "fred".
So the next step is to take this element "ting" and a sub-element "15" and create a full array.

From there it’s fairly simple to pass the array through Paul Donnelly’s routine and adjust the numbering (as in my pipe). And from there I’ll leave it to your ingenuity to use the "yindex" element to create URLs.

Pipes and the new Twitter id format

October 20, 2010

Twitter will soon be changing the format of the Tweet ID element.
http://bit.ly/aZislf

To see if this change would affect Pipes I set up a simple test Pipe.

https://i0.wp.com/hapdaniel.pwp.blueyonder.co.uk/pdaniel/blog/twitter_id/pipes_test_result.png

In the results from the test pipe the id element should have the same value as the id_str element. This is obviously not the case.

At this stage it’s not clear whether or not the Pipes team can/will make changes to correct the id value, so at this stage the options are:

  1. Wait and see what happens.
  2. Change your pipes to use the string version of the id.

The important dates for the id change are

  1. October 22 – String versions of the id start appearing (whatever that might mean).
  2. November 4 – Ids will increase to ~41 bits.
  3. November 26 – Ids are increased to the full 64 bits.

Oh, and as for YQL

https://i1.wp.com/hapdaniel.pwp.blueyonder.co.uk/pdaniel/blog/twitter_id/yql_test_result.png

The YQL module and JSON output

May 19, 2010

YQL has certain problems with JSON output, and these problems can be reflected in the output of the YQL module in Pipes (which uses the YQL JSON output). These problems can sometimes be avoided by using a YQL query via a Fetch Data module.

Here’s an example pipe that show the difference.

 

Here are some details of the first item obtained using the Fetch Data module.

 

What we get using the YQL module is

As you can see, we have lost the content from the "content" element.

We can see that this is not an issue with Pipes by looking at a comparison between the XML and JSON output in the YQL console.

The XML output gives us

while the JSON output gives us

 

My preference is to go with the Fetch Data module from the beginning.

Update: The issue for this feed has now been fixed. I’m still staying with my preferred approach

YQL Build 6122 and JSON changes

April 14, 2010

This is just a short note to say that a change in the new YQL build 6122 has meant that the use of JSON.stringify may no longer be required (as in open tables fail) in certain circumstances. I will be changing my pipe examples accordingly.

You heard it here first.

Or perhaps you didn’t.

And no, I don’t know the significance of this change beyond the fact that stuff that used to work doesn’t now without the removal of JSON.stringify.

Google spreadsheets as feed stores

September 18, 2009

There’s nothing new about this idea, but I wanted to be able to point people to a set-by-step guide to using a Google spreadsheet to store feed URLs and associated data for a pipe. I do, however, assume that you have a Google account and know how to generally use Google spreadsheets.

So let’s start with a spreadsheet that I’ve already set up.

Column A holds the feed URL, column B holds a prefix I want to use on each item title and column C holds the number of items I want to retrieve from each feed. Each column has a title.

Having set the data up I need to share the spreadsheet, as it’s the sharing that will enable me to use a URL for this spreadsheet in Pipes.

The Share menu is in the top right-hand section of the spreadsheet page, and the option I want is "Publish as a web page", which gives a new window. In the top half of the window there are sheet options.

I have a choice between sharing all sheets or just sheet 1, and between having the link to the sheet automatically reflect changes or not. I chose the options you can see above. The next step is click on "Start publishing". This makes available in the bottom half of this window various link options.

From among the various format options available, Web Page, Atom, RSS etc., I’ve chosen the CSV option, which will enable me to use the Fetch CSV module in Pipes. I could have chosen the RSS or Atom options, in which case I would have used a Fetch Feed module. And finally I have the URL I need to use in the CSV module. Here’s the link to the spreadsheet itself. In order for you to be able to follow that link I had to select the "Get the link to share" option as seen in the Share menu above, and then select the option to allow anyone who had the link to view the sheet.

In Pipes I’ll need to use 2 pipes, a main pipe and a sub-pipe. The sub-pipe that will do most of the work.

The main pipe

The main pipe consists of a Fetch CSV module, a sub-pipe in a Loop and the Pipe Output module. The only thing I’ve done with the Fetch CSV is to add the URL.

And the output from that module looks like:

The column names in the spreadsheet become element names in the Fetch CSV output. Because of that don’t use column names in the spreadsheet that contain spaces or ‘odd’ characters such as percent signs. Pipes won’t like it and you won’t be able to reference the element in the rest of the pipe. So what would the output have looked like if the RSS or Atom feeds for the spreadsheet? Here’s the RSS output.

The title element comes from the first column in the spreadsheet and subsequent columns are combined into the description element.

In the Loop and sub-pipe module I can select the relevant elements to be passed to the sub-pipe.

In order to get the sub-pipe module into the Loop I dragged it from the "My pipes" section of the sidebar into a Loop that was already on the canvas.

The sub-pipe

Here I have a URL Input providing the feed URL for a Fetch Feed module, a Text Input and a String Regex module to provide the prefix value created in a Regex module, and a Number Input providing the value for a Truncate module.

So there it is. I did have a rather wordy section about the advantages of this approach, but I’m going to leave that for a later date.

Global replacement using YQL

August 4, 2009

Have a look at his Regex module.

The purpose of the rule is to take each @user and around the name build a link for the user’s Twitter page. The description before the Regex is the same as can be seen in the title element. Unfortunately, once the first user name has been captured that name is used in all of the replacements.

It’s the sort of functionality that many Pipes users want, but it’s just never been implemented in Pipes. However, with the help of YQL we can carry out global replacements. I’ll show 2 very similar ways of achieving this, both of which require the use of YQL Execute. I won’t go into any detail about the inner workings of the Execute statements, but there is an example of the approach on the Pipes blog. The images are taken from a pipe that implements both methods (as well as the unsuccessful Regex rule).

In the first method I use a URL Builder to build a URL which can be fed into the Web Service module.

The Base and Path are both the same as the Pipes example, and the q parameter has a similar structure to the Pipes example: a use statement followed by a select statement. The format has to be json, but the diagnostics can be set to "true" (the path in the Web Service will need to be removed in order to see the diagnostics). The regex and replace_with values are the same as those used in the Regex module shown above. The modifiers parameter takes the place of the switches in the Regex module. Instead of specifying an element on which the regex to be used, the value of the relevant element needs to be copied to an element called "yql_string", and the results of the regex will be returned in this element.

Depending on the data that has been passed to the Web Service some of the elements may need to have escape, "\", characters removed. Check elements that  are likely to have quotation marks, HTML tags or control characters.

The second method is very similar to the first, but uses a Fetch Data module instead of a Web Service. The URL Builder is the same apart from the addition of a new "text" parameter.

The Fetch Data module has the path set to results.result.

Because XML and not JSON is being returned there is no escaping issue. However, there is a limit to the length of the text that can be passed, due to the parameters being part of the HTTP request made to YQL.

Passing complex elements to sub-pipes

June 15, 2009

I have 2 warnings here. The first warning is that, as far as I know, this technique is completely unsupported. It became possible as a by-product of the change the Pipes team made to allow users to enter their own values into drop-down boxes. The second warning, and this is a general one for passing any data to a sub-pipe, is that there is a limit of about 2,000 characters that can be passed to a sub-pipe.

So what am I talking about? I’m calling an element "complex" if it has one or more sub-elements.

Sio let’s have a look at some pipes. In the main pipe I’m using 2 Item Builders and a Union. Normally there would be some sort of Fetch module instead.

I’m passing item.stuff to the sub-pipe.

Let’s start with the simplest possible sub-pipe for this situation.

Notice that there are no default or debug values in the Text Input. I haven’t the least idea what this complex element is going to look like to the Text Input module. This makes things tricky, because I’m just going to have to assume that I get things right in the sub-pipe and wait until I’m back in the main pipe to check if everything’s working as expected. I’ve used the element name "stuff" in the Item Builder. I could have used any name, but I’ve found that it makes things simpler to use the same name as that of the element being passed to the sub-pipe.

Back in the main pipe we can now have a look at the result of all this.

The output of the Loop has returned, within the item.result element, a copy of the original stuff element. So far so useless. So let’s set about trying to do something useful. Let’s say I want to filter items to permit only items where a stuff.category element contains "Robots". Here’s an attempt at this without trying to use a sub-pipe.

There’s a Split module, and then in the right-hand path I use a Sub-element module to get the category sub-elements as items, filter on my key word and produce a count of the accepted items.

In the left-hand path I use a Regex module to get the Count value from the right path and then filter the items based on that value. Now all this would be great if my mission was to accept all items if any of the categories for any of the items contained the key word, but that’s not the aim. The problem here is that the sub-element is working at the feed level, so on all the items at once, while I need something that will work on the category values for each item. The purpose of this charade is that I can compare how things would look if I used the sub-pipe approach.

The Split and Regex in the sub-pipe are just there as mechanisms for getting the count value out of the pipe, since the Count module cannot be directly connected to the Pipe Output module. In the main pipe there is a Filter module to filter where the value of item.result.count is greater than 0. As I hope you can see, if you keep the name of the element in the sub-pipe the same as the name of the element passed from the main pipe, it’s not too difficult to work out what is needed at the level of the main pipe and then apply that inside the sub-pipe.

If I hadn’t wanted to show a comparison between processing in the main and sub-pipes I would probably have not used a Count in the sub-pipe, but rather used a Truncate module, to 1, after the Filter and connected that to the Pipe Output. In the main pipe I would have set the Filter to permit items where result.content matched the regex "^(?=.)", that is, result.content existed at all (by containing a character).

Concatenation revisited

June 5, 2009

Now that Paul Donnelly has shown us the way, let’s look at another way to concatenate elements from items that uses YQL and the Web Service module.

This pipe starts off in the same way as the pipe in my previous post on concatenation, with a URL Input, an Item Builder and a Fetch Feed in a Loop.

Here we use Rename to make a copy of the items (in item.stuff) to an element yql_array. That’s a specific name that will be used by the YQL web service. On the right is a URL Builder with the elements:

  • Base – http://query.yahooapis.com
  • Path – v1/public/yql
  • Parameters
    • q – use http://pdaniel.co.uk/yql/array_concat.xml as array_concat; select * from array_concat where data = @data and delim=@delim and ele=@ele
      This is the YQL statement at the heart of this method. The use clause links to a YQL open data table with an  Execute clause in it, and gives that table a name of array_concat. The select clause is going to pass values for “data”, “delim” and “ele” to the open table. The values for these are specified using YQL variable substitution. It looks like the @data substitution is special, since we ourselves don’t supply a value for it.
    • format – json. That’s the format the Web Service uses.
    • diagnostics – If you’re having problems with a YQL statement you can set this to “true” and get diagnostic information.
    • delim – One of the @ variables that were used. The delim value is a separator for each of the elements we’re going to collect.
    • ele – the element name. In this case we’re concatenating title elements.

In the Web Service module the path is set to query.results.result.items.

The result of the concatenation is always returned in the yql_array_string element. The rest of the pipe then works on that string to produce the desired output.

This approach isn’t just limited to working on whole items as we can use it to concatenate array elements within an item. In this pipe, for instance, category values within an item are being concatenated. This is how the data looks to start with:

In the Rename module we have

and in the URL Builder we have

Update: 10th June. The ele parameter is now optional. This allows for arrays such as category.0, category.1 as well as category.0.title, category.1.title.

Web page section headers in items

May 21, 2009

It’s quite common for a web page to be broken into sections with items appearing in each section.

The best way that I’ve found to approach this situation is to firstly break the page into sections, and then break those sections into items. So let’s have a look at that process as seen in this pipe.

As we can see, the kind developer of this page has made it nice and easy for us to split the page into sections.

The first rule extracts the section name from the content and creates a pair of div tags. The div has a class attribute that should have a value that doesn’t appear anywhere else on the page. So in item.section we now have something like “<div class="mysection">Sports</div>”. Next we need to find some portion of the html that occurs at least once for each item (and preferably only once). The html has to be exactly the same for each item. In this case we can identify “<td class="subject"” as a suitable target. So the second rule takes that string and globally prefixes it with the value in item.section. We now have the text of the section appearing in each item. Time to break out the items.

Nothing out of the ordinary here. Use a String Tokenizer to split up the items, and a Filter to get rid of any unwanted items. In this case the heading is going to be saved as a category element. So a Rename creates the category element.

We don’t want to be left with just the text of the heading at this stage, because we have some housekeeping to do, so the Regex rule leaves us with what we originally had in the section element, e.g. “<div class="mysection">Sports</div>”.

Finally, as far as this post is concerned, we can use a String Replace to remove the div that was inserted, and a Regex to leave us with just the text of the heading. The housekeeping might not be strictly necessary, but it could make life easier for later on, since we can refer to the original content of the web page without having to remember that somewhere in there is some added markup.

Concatenating items in a feed

May 20, 2009

By which I mean lumping selected elements from all the items in a feed into a single item.

A word of warning first. This technique involves using a string of numbers in the form “0,1,2,3”, where the highest number will be the maximum (-1) number of items  possible in the feed. If you’re looking to lump together 50 items and aren’t willing to build the relevant string then this is not for you.

Let’s have a look at my pipe that does this.

Here we have a URL Input module, an Item Builder a Fetch Data module in a Loop and a Split. The main thing to notice here is that in the normal run of things I could just have used a URL Input module and a Fetch Data, but what I want is for my items to appear as a sub-element array.

If you want to have more than one element in the output, or you want to format the elements, then use a sub-pipe instead of my Fetch Data module to do that work.

Now let’s see what’s happening in the right-hand path of the Split.

We need the Sub-element so that we can then use the Count to output the number of items in the feed. If we have 2 items then the String Regex converts that number to “,2.*”. This is where that string of numbers I mentioned earlier comes in.

The first rule in this String Regex removes “,2” and everything after that, and the second rule wraps what is left inside “${stuff.” and “.title}”. Finally the String Replace replaces each of the commas with the string on the right. The outcome of all this is that we end up with a string like “${stuff.0.title}<br>${stuff.1.title}”. In case you weren’t aware, in the Regex module we can refer to elements of an item using $(element path excluding item.}. And now we can plug this into the Regex module in the left-hand path of the Split.

The grand result of this is that I can show off the first 2 titles of my blog posts.

I think that could be 3 titles fairly soon.