Threading made Easy (2/2)
This post is part two of a small example I wrote to test the new threading features in .Net 4.0. To read part 1, follow this link.
First, a short recap of part 1. This is what I wrote:
What I have to do is get some data from the web, roughly like this (pseudocode):
foreach subject from the list
search for the last 100 tweets on this subject
What was left over was showing some results from the threaded calls in part 1. As it happens, near the end of part 1 I presented the following call as the nicest and cleanest threading mechanism:
public static void GetParallel()
{
    Parallel.ForEach(UrlList, url => { Get(url); });
}
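To try the recap call in isolation, here is a minimal, self-contained sketch. `Get` is a hypothetical stub standing in for the part 1 method (it sleeps briefly and returns the URL's bytes instead of calling the web), and the `Interlocked` counter is my own addition to show one property of `Parallel.ForEach`: it blocks the caller until every iteration has finished.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelRecap
{
    // Hypothetical stub for the part 1 Get(url): pretends to be a slow web
    // call by sleeping, then returns the URL's UTF-8 bytes.
    static byte[] Get(string url)
    {
        Thread.Sleep(10);
        return Encoding.UTF8.GetBytes(url);
    }

    // Same shape as GetParallel above, plus a counter of finished calls.
    public static int GetParallel(IEnumerable<string> urlList)
    {
        int completed = 0;
        Parallel.ForEach(urlList, url =>
        {
            Get(url);
            // several threads may finish at once, so increment atomically
            Interlocked.Increment(ref completed);
        });
        // Parallel.ForEach only returns after every iteration has run,
        // so the counter is complete at this point.
        return completed;
    }
}
```

For example, `ParallelRecap.GetParallel(new[] { "a", "b", "c" })` returns 3. The order in which the URLs are processed is not guaranteed, which is why the counter uses `Interlocked.Increment` rather than `completed++`.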
Threadsafe collections
As the example shows, we’re not doing anything useful with the result returned by Twitter. Get returns a byte array, and what I want to do is store each byte array in a collection of byte arrays. Pretty simple, huh? Almost, because a potential threading problem kicks in: the standard collections in .Net, such as List<T>, are not thread safe when several threads modify them at the same time. We used to solve this with a locking object, locking every piece of code that changes the collection. .Net 4.0 offers a smoother solution: System.Collections.Concurrent.BlockingCollection<T>. The code above then changes into:
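using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static List<byte[]> GetParallel()
{
    // use a thread safe collection for all threads
    BlockingCollection<byte[]> data = new BlockingCollection<byte[]>();
    Parallel.ForEach(UrlList, url => { data.Add(Get(url)); });
    // return an ordinary collection (non thread safe) to the outside world
    return data.ToList<byte[]>();
}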
So all threads store their results in one thread-safe collection, and once all threads are done, an ordinary (non-thread-safe) list is returned, because processing the results afterwards doesn’t need any multithreaded handling.
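For comparison, here is a sketch of both approaches side by side: the pre-4.0 pattern with a lock object around a plain `List<byte[]>`, and the `BlockingCollection<byte[]>` version from above. `Get` is again a hypothetical stub (it returns the URL's UTF-8 bytes), and the class and method names are made up for this example.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

public static class CollectResults
{
    // Hypothetical stub for Get(url): returns the URL's UTF-8 bytes instead of
    // downloading anything.
    static byte[] Get(string url)
    {
        return Encoding.UTF8.GetBytes(url);
    }

    // Pre-.Net 4.0 style: a plain List<T> that is only touched inside a lock.
    public static List<byte[]> WithLock(IEnumerable<string> urlList)
    {
        List<byte[]> results = new List<byte[]>();
        object sync = new object();
        Parallel.ForEach(urlList, url =>
        {
            byte[] bytes = Get(url);      // the slow call runs outside the lock
            lock (sync)
            {
                results.Add(bytes);       // only the Add is serialized
            }
        });
        return results;
    }

    // .Net 4.0 style: BlockingCollection<T> handles the synchronization itself.
    public static List<byte[]> WithBlockingCollection(IEnumerable<string> urlList)
    {
        using (BlockingCollection<byte[]> data = new BlockingCollection<byte[]>())
        {
            Parallel.ForEach(urlList, url => { data.Add(Get(url)); });
            return data.ToList();         // materialized before data is disposed
        }
    }
}
```

Neither version preserves the order of the URL list; results arrive in whatever order the threads finish. If you never need the producer/consumer features of `BlockingCollection<T>` (blocking `Take`, `CompleteAdding`), `ConcurrentBag<T>` from the same `System.Collections.Concurrent` namespace would do the job as well.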
Twitter data structure
Twitter returns its data in JSON format. The search API returns the data in the following structure (see also the Twitter API documentation). Here is an example:
{
"results":
[
{
"text":"@twitterapi http:\/\/tinyurl.com\/ctrefg",
"to_user_id":396524,
"to_user":"TwitterAPI",
"from_user":"jkoum",
"metadata":
{
"result_type":"popular",
"recent_retweets": 109
},
"id":1478555574,
"from_user_id":1833773,
"iso_language_code":"nl",
"source":"<a href=\"http:\/\/twitter.com\/\">twitter<\/a>",
"profile_image_url":"http:\/\/s3.amazonaws.com\/twitter_production\/profile_images\/118412707\/2522215727_a5f07da155_b_normal.jpg",
"created_at":"Wed, 08 Apr 2009 19:22:10 +0000"
},
... truncated ...
],
"since_id":0,
"max_id":1480307926,
"refresh_url":"?since_id=1480307926&q=%40twitterapi",
"results_per_page":15,
"next_page":"?page=2&max_id=1480307926&q=%40twitterapi",
"completed_in":0.031704,
"page":1,
"query":"%40twitterapi"
}
Each byte array in our list holds one JSON response: some metadata describing the query plus a list of all tweets the search returned. Each of these byte arrays needs to be turned into something more meaningful and useful. Here .Net’s DataContractJsonSerializer comes to the rescue: it is fast and, above all, simple. All we have to do is define the elements of the structure we’re interested in, and the serializer does the rest; elements we don’t define are simply skipped. The structure in a .Net-understandable format is:
using System.Runtime.Serialization;

[DataContract]
public class TwitterSearchResultList
{
    [DataMember] public TwitterSearchResult[] results { get; set; }
    [DataMember] public string query { get; set; }
    [DataMember] public double completed_in { get; set; }
}

[DataContract]
public class TwitterSearchResult
{
    [DataMember] public string from_user { get; set; }
    [DataMember] public string text { get; set; }
    [DataMember] public string profile_image_url { get; set; }
}
Transforming the byte arrays with Twitter results into objects is now as easy as:
using System.IO;
using System.Runtime.Serialization.Json;

List<TwitterSearchResultList> tweets = new List<TwitterSearchResultList>();
// one serializer instance can be reused for every byte array
DataContractJsonSerializer serializer = new DataContractJsonSerializer(typeof(TwitterSearchResultList));
foreach (byte[] tweetlist in allTweetLists)
{
    using (MemoryStream tweetStream = new MemoryStream(tweetlist))
    {
        TwitterSearchResultList tsrl = (TwitterSearchResultList)serializer.ReadObject(tweetStream);
        tweets.Add(tsrl);
    }
}
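To check the mapping without calling Twitter, the sketch below feeds a trimmed excerpt of the sample response into `DataContractJsonSerializer`. The `TweetParser` class and its `SampleJson` constant are hypothetical names for this example; the two data contract classes are copied from above so the block compiles on its own.

```csharp
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Json;

[DataContract]
public class TwitterSearchResultList
{
    [DataMember] public TwitterSearchResult[] results { get; set; }
    [DataMember] public string query { get; set; }
    [DataMember] public double completed_in { get; set; }
}

[DataContract]
public class TwitterSearchResult
{
    [DataMember] public string from_user { get; set; }
    [DataMember] public string text { get; set; }
    [DataMember] public string profile_image_url { get; set; }
}

public static class TweetParser
{
    // Deserializes one JSON search response (as raw bytes) into objects.
    public static TwitterSearchResultList Parse(byte[] json)
    {
        DataContractJsonSerializer serializer =
            new DataContractJsonSerializer(typeof(TwitterSearchResultList));
        using (MemoryStream stream = new MemoryStream(json))
        {
            return (TwitterSearchResultList)serializer.ReadObject(stream);
        }
    }

    // Trimmed excerpt of the sample response shown earlier. It keeps two
    // fields with no [DataMember] counterpart (metadata, to_user_id) and
    // omits profile_image_url, which therefore stays null.
    public const string SampleJson =
        "{\"completed_in\":0.031704,\"query\":\"%40twitterapi\"," +
        "\"results\":[{\"from_user\":\"jkoum\"," +
        "\"metadata\":{\"result_type\":\"popular\",\"recent_retweets\":109}," +
        "\"text\":\"@twitterapi http:\\/\\/tinyurl.com\\/ctrefg\",\"to_user_id\":396524}]}";
}
```

Calling `TweetParser.Parse(Encoding.UTF8.GetBytes(TweetParser.SampleJson))` should yield one result with `from_user` "jkoum" and `text` "@twitterapi http://tinyurl.com/ctrefg": the escaped `\/` sequences come back as plain slashes, and the JSON fields without a matching `[DataMember]` are ignored.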
Rendering to html
To render a TwitterSearchResultList in a presentable format I generated some HTML. This could be done more cleanly with XSLT and/or CSS, but for the sake of the example I coded just the bare minimum:
using System.Text;
using System.Text.RegularExpressions;

// list is the TwitterSearchResultList being rendered
StringBuilder stringBuilder = new StringBuilder();
stringBuilder.Append("<TABLE>");
foreach (TwitterSearchResult tweet in list.results)
{
    stringBuilder.Append("<TR>");
    // the tweeter's avatar
    stringBuilder.AppendFormat("<TD><IMG src=\"{0}\" width=\"48\" height=\"48\" /></TD>", tweet.profile_image_url);
    // the tweeter's name
    stringBuilder.AppendFormat("<TD>{0}</TD>", tweet.from_user);
    // the tweet text, with every url turned into a link
    stringBuilder.AppendFormat("<TD>{0}</TD>", Regex.Replace(tweet.text, "(http:/[\\S/]*)", "<A href=\"$1\">$1</A>"));
    stringBuilder.Append("</TR>");
}
stringBuilder.Append("</TABLE>");
Resulting in the following output:

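A possible refinement of the rendering step: the code above pastes the tweet text straight into the HTML, so a tweet containing `<` or `&` can break the table. The sketch below HTML-encodes the text with `System.Net.WebUtility.HtmlEncode` before turning URLs into links, and uses a slightly broader pattern (`https?://\S+`) than the original regex. Both changes, and the `TweetHtml` class name, are my own suggestions and not part of the downloadable source.

```csharp
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

public static class TweetHtml
{
    // Matches http:// or https:// followed by a run of non-whitespace characters.
    static readonly Regex UrlPattern = new Regex(@"(https?://\S+)");

    // Encodes the tweet text first, so markup characters in a tweet are shown
    // as text, then wraps every URL in an anchor.
    public static string Linkify(string tweetText)
    {
        string encoded = WebUtility.HtmlEncode(tweetText);
        return UrlPattern.Replace(encoded, "<a href=\"$1\">$1</a>");
    }

    // Renders one table row: avatar, user name and linkified tweet text.
    public static string Row(string imageUrl, string fromUser, string text)
    {
        StringBuilder sb = new StringBuilder();
        sb.Append("<tr>");
        sb.AppendFormat("<td><img src=\"{0}\" width=\"48\" height=\"48\" /></td>",
            WebUtility.HtmlEncode(imageUrl));
        sb.AppendFormat("<td>{0}</td>", WebUtility.HtmlEncode(fromUser));
        sb.AppendFormat("<td>{0}</td>", Linkify(text));
        sb.Append("</tr>");
        return sb.ToString();
    }
}
```

For instance, `TweetHtml.Linkify("@twitterapi http://tinyurl.com/ctrefg")` returns `@twitterapi <a href="http://tinyurl.com/ctrefg">http://tinyurl.com/ctrefg</a>`, while `TweetHtml.Linkify("1 < 2")` returns `1 &lt; 2`.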
Source code
The full source code for this example can be downloaded here.
Thank you for reading! Questions will be answered, and suggestions are more than welcome.
have fun,
florisz
