My friend Paul has a cool service called Wonderproxy that lets you test and develop GeoIP-based apps without the normal headaches. If you need to simulate remote, international traffic, you should check it out.

Recent Happenings

I've got a bunch of stuff that I haven't found/made time to blog about, so just dropping some quick notes here:

  • I've been invited to speak at PHP Quebec 2009. I've been to this conference a few times (but not for a couple of years, now), and I'm really looking forward to getting back into the conference circuit (as a speaker, not an organizer... think of all the free time I'll have! (-; ). Anyway, I'll be giving a talk entitled "Stupid Browser Tricks" in which I'll talk (at a high level) about Firebug and Selenium IDE, and possibly a few other things like granular browser security, Komodo macros/extensions (like a browser!) and maybe Greasemonkey.
  • This year, I was once again invited back to the Microsoft Web Developers Summit (couldn't think of a better URL). This is a yearly event where Microsoft invites selected members of the PHP community to Redmond to have a discussion on PHP and Microsoft's offerings. This year was definitely the best one yet, as it was better organized, and it felt much less like they were trying to sell us things. Their candor was especially appreciated this year, as I think many of the attendees felt like Microsoft was asking us for our opinions instead of trying to give them to us. I wrote about this last year, and I think what I wrote still rings true, today. Thanks to the organizers... we got some great information, made our opinions clear, and had a LOT of fun (great people!).
  • I tweeted about this, but never posted it on my blog. My colleague Luke Welling is a funny guy.
  • Over the holiday weekend (I got days off, but in Canadia, we celebrate Thanksgiving in October), I found some time to work on a bunch of pet projects, including fale.ca, which is nothing special, but kind of fun. See?
  • Today, I was extended an invitation to join the Habari Cabal, which I quickly accepted. So, if you use Habari and your blog breaks in the future, it's probably my fault.
  • ... and last, but not least, Chris and I—with the help of many other people—managed to almost get the 2008 PHP Advent calendar launched in time. Word on the street is that Jon Tan is going to show the design some love, and we have a feed. The 2007 edition was a success, but was a lot of work, so I offered to pitch in this year. Thanks to everyone who's already submitted... and the rest of you slackers: get to it! (-;

UTF: WTF?

Note: This article first ran in php|architect in March 2008, while I still worked at MTA. Marco (the publisher, and my former colleague) has graciously agreed to allow me to republish this in a more public forum. I've wanted to link a few people to it in the past few months and until now that was only possible if they were php|architect subscribers. That said, if you're into PHP, you really should subscribe to php|a.

As you might know, one of my roles at php|architect is to organize and manage speakers (and their talks) for our PHP conferences.

A while back, PHP 6's main proponent, Andrei Zmievski, submitted a talk that we accepted, entitled "I ♥ Unicode, You ♥ Unicode." When we selected the talk and invited Andrei to attend the conference, he accepted and humorously suggested that we pay special attention to the talk's heart characters when publishing details on the conference website and in other promotional materials. I took his suggestion as wise advice, and double checked the site before releasing it to the public—it worked perfectly.

Within a few hours of publication, Andrei dropped me a note indicating that I hadn't heeded his warning, and that the ♥s weren't showing up properly. The problem turned out to be a bug in a specific version of Firefox, and I believe we resolved it by employing the HTML entity for the heart character. This ordeal, while minor, was my first taste of how bad things would become.

If I had to guess, I would estimate that I've spent somewhere in the range of 40 hours wrangling UTF-8 in the past 3 months, which is not only expensive for my employer, but also disheartening as a developer who's got real work to do. Admittedly, this number is inflated, due to the heavy development cycle we completed with the launch of our new site. As time goes on, though, I don't see this situation improving in the short term (though, if we were to glimpse much further into the future, I'm sure we'll eventually consider this a solved problem).

The main problem with using Unicode, today, is that it's partially supported by some parts of any given tool chain. Sometimes it works great, and other times—due to a given piece of software's lack of implementation (or worse, a partial implementation), human error, or full-on bugs—the chain's weakest link shatters in a non-spectacular way.

As any experienced developer knows, having the weak point of a process collapse is a normal part of building complex systems. We're used to it, and we usually manage this by making the systems less complex, by eliminating the parts that are prone to collapse, or by fixing the broken parts. When implementing a system that may contain Unicode data, today, we're challenged with many potential points of failure that are often difficult to identify, and nearly impossible to replace.

To illustrate, consider an overly simplified web development work—and content delivery—flow: developer creates a file, developer edits file, developer uploads the files to the web server, httpd receives a request from a browser, httpd passes the request to PHP, PHP delivers content back to httpd, httpd delivers content to the visitor's browser. If a single part of this flow fails to handle Unicode properly, a snowball effect causes the rest of the chain to fail.

A more typical flow for me (and our code) goes something like this: create file, edit file, commit file to svn, other developers edit file, others commit to svn, release is rolled from svn, visitor browser requests page, httpd parses request, httpd delivers request to PHP, PHP processes request, PHP (client) calls service to fulfill back-end portions of request (encodes the request in an envelope—we use JSON most of the time), PHP (service) receives request, service retrieves and/or stores data in database, service returns data to PHP client, PHP client processes returned data and in turn delivers it to httpd, httpd returns data to browser.

If you'll bear with me for one last list in this article, that means that any (one or more!) of the following could fail when handling Unicode: developers' editors, developers' transport (either upload or version control), user's browser, user's http proxy, client-side httpd, client-side PHP, client-side encoder (JSON), service-side httpd (especially HTTP headers), service-side decoder, service-side PHP, service-side database client, database protocol character set imbalance, database table charset, database server, service-side encoder, client-side decoder, client-side PHP (again), client-side httpd (including HTTP headers, again), user's proxy (again), and user's browser (again). I've probably even left some out.

As you can see, there are so many points of failure here, that determining the source of an invalid UTF-8 character is torturous, at best.

Recently, I had to wrestle UTF-8 monsters. It turned out to be a combination of user (me) error and an actual bug in PHP, but it was so non-obvious that most of my day melted away while I tried to resolve the issue. I had decided to split a file that contained UTF-8 characters into two files. By default, my editor of choice creates new files using my system character encoding, which happened to be Mac-Roman because I hadn't changed it from Leopard's default. The original file was UTF-8, and the characters displayed normally in the new Mac-Roman file. However, when the data was passed to PHP's json_encode function, the string was arbitrarily truncated, due to a PHP bug.

Because the script that triggered the bug pulled the data from a database, and the data was inserted by another script—the one with the broken encoding/characters—it took me entirely too long to trace it back to the change I'd made to that now-split file. For a while, I even thought that MySQL was storing the data poorly because we'd had problems with that before, and also because the database client I was using that day was reporting the characters improperly, due to its own encoding issues. I believe my blood pressure skyrocketed to dangerous levels, that afternoon.
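In hindsight, a cheap guard at the point where data enters the JSON encoder would have pointed at the real culprit much sooner. Here's a minimal sketch of that idea, assuming PHP 7 or later; isValidUtf8() and safeJsonEncode() are hypothetical helpers I'm making up for illustration, not anything from our code base. Current PHP versions return false from json_encode on invalid UTF-8 instead of truncating, but failing loudly (and early) still seems like the better habit.

```php
<?php
// Sketch only: isValidUtf8() and safeJsonEncode() are hypothetical helpers.
// Assumes PHP 7 or later (scalar type declarations).

function isValidUtf8(string $s): bool
{
    // With the /u modifier, PCRE refuses subjects that are not valid UTF-8,
    // so preg_match() returns false rather than a match count.
    return preg_match('//u', $s) === 1;
}

function safeJsonEncode(string $s): string
{
    if (!isValidUtf8($s)) {
        // Fail here, next to the bad data, instead of downstream.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    $json = json_encode($s);
    if ($json === false) {
        throw new RuntimeException(json_last_error_msg());
    }
    return $json;
}

$good = "I \xE2\x99\xA5 Unicode"; // U+2665 (heart) as three UTF-8 bytes
$bad  = "I \xA5 Unicode";         // a lone 0xA5 byte is never valid UTF-8

var_dump(isValidUtf8($good)); // bool(true)
var_dump(isValidUtf8($bad));  // bool(false)
echo safeJsonEncode($good), "\n";
```

The check only says that the bytes are well-formed UTF-8. It can't tell you that a file was saved in the wrong encoding if the result happens to be valid, so it's a tripwire, not a cure.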

Universal Unicode support is going to be a long uphill battle. I'm not sure I'm ready for it, but I hope it's worth it, nonetheless.

More Web of Trust Thoughts

A while back, I blogged about trust on the web, and how there are a lot of assumptions made by content providers that simply don't carry over to end users, or are just a small (but important) step from being good practices.

Yesterday, at $work, we were talking about something that led to a discussion on SSL, and how I think (hypocritically, since the domain you're reading right now isn't even available on https://) that most sites, even if they don't contain sensitive information, should be available by https, even if the certificate is self-signed.

Chris respectfully (I think (-; ) disagreed with me, saying that certificates that are not trusted by a user's browser are as bad as, or even worse than, not allowing SSL at all. His theory (and I'm sure he'll correct me below if I'm misrepresenting him) is that offering this type of unverifiable certificate is not only useless, but harmful to users because it creates a false sense of security. My retort, though not well received, is that users of modern browsers (Firefox at least) will be notified when a self-signed certificate that they've accepted has changed. This at least allows the user to verify when something is amiss. His rebuttal was that there's no way for the user to tell which certificate is the "good" one, and which is the "bad" one, and I can see his point.

We had a discussion on DNS and how we trust it for a lot of things that we shouldn't, even though we don't want to... especially given the recent problems with DNS. In the end, we all agreed that putting something like http://omniti.com/ on self-signed https serves no practical value as users will a) never use it, b) not know how to verify the certificate, and c) get confused by their browser warning them about security problems.

This led to a few other branches of thinking about SSL. The first was a question Chris asked us, "how do you access your online banking?", clarifying with "how do you get to the login page?" A few of us (myself included) answered "bookmark" while others said they hit their bank's main domain either from URL history or manually, and clicked through from there. Chris's point was that most users visit http://bank.example.com/ and are somehow directed to their https login page. I checked my bank, and bad things happen:

  • visit http://www.royalbank.ca/
  • click "online banking", which links to http://www.rbcroyalbank.com/STRINGHERE/redirect-bank-hp-pagelink-olb.html
  • which redirects, via META tag to: https://www1.royalbank.com/cgi-bin/rbaccess/RESTOFURLHERE
  • user is presented the login form (in https)

My bookmark is the https://www1.royalbank.com/... page, so I feel relatively safe, but let's look at the bad things that happen here:

  • User visits one domain (HTTP, not secure)
  • User is _silently_ redirected to another domain on HTTPS

Why are these bad? Well, aside from the possible confusion of getting bumped from royalbank.ca to rbcroyalbank.com to royalbank.com, the user's chain of trust breaks down when they visit http://royalbank.ca/. That's http, with no "s". If this site were compromised, the user would never know (without careful URL confirmation at the https destination) that s/he had been maliciously redirected to https://www1.roya1bank.com/ (note that the "l" is a "1" (one) in my bad-guy example). Phishers could easily get an SSL certificate for roya1bank.com.
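The "careful URL confirmation" I mention is tedious for a human, but trivial for a machine. As a rough sketch (assuming PHP 7 or later), this is the kind of check I mean: compare the scheme and the exact host of the destination against a short list you already trust. The function name and the allowlist are hypothetical; I'm only using the host names from the example above.

```php
<?php
// Sketch only: isExpectedHttpsDestination() and $trustedHosts are
// hypothetical names used for illustration.

function isExpectedHttpsDestination(string $url, array $expectedHosts): bool
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false; // not an absolute URL we can reason about
    }
    if (strtolower($parts['scheme']) !== 'https') {
        return false; // plain http can be rewritten in transit
    }
    // Exact, case-insensitive host match: "roya1bank" must not pass.
    return in_array(strtolower($parts['host']), $expectedHosts, true);
}

$trustedHosts = array('www1.royalbank.com');

var_dump(isExpectedHttpsDestination('https://www1.royalbank.com/cgi-bin/rbaccess/', $trustedHosts)); // bool(true)
var_dump(isExpectedHttpsDestination('https://www1.roya1bank.com/cgi-bin/rbaccess/', $trustedHosts)); // bool(false)
var_dump(isExpectedHttpsDestination('http://www1.royalbank.com/cgi-bin/rbaccess/', $trustedHosts));  // bool(false)
```

Of course, this only helps if the list itself comes from somewhere trustworthy (my bookmark, in my case), which is exactly the part most users never get.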

That got me thinking a bit about the SSL certificate acquisition process. I'm sure some of the really high-end SSL certificates still come with human validation (a real person looks at the application and makes a real decision about granting the certificate; in the case above, hopefully this would have been caught). Most certificate signing I've seen recently is based on proven ownership of the domain in question. So, as I say, it's trivial for me to go register a domain that LOOKS like a bank. Sure, I'd still have to compromise either the http server or DNS that points at the server, but Kaminsky demonstrated that this isn't so hard (or wasn't until just a few weeks ago).

Let's take it a step further back. If bad guys can compromise DNS, which is inherently insecure (not SSL, no trust model other than IP address, and it runs on UDP(!)), then surely they can trick the certificate authority's SMTP server to deliver mail to another mail exchanger, right?

  • bad guy targets example.com and poisons the certificate authority's DNS for example.com to point MX at an IP controlled by bad guy
  • bad guy generates a certificate signing request (CSR) and sends it to the certificate authority (CA), "From" bob@example.com
  • CA receives the CSR and verifies with whois that the contact for the domain is bob@example.com
  • CA signs the CSR and returns the certificate to bob@example.com (either by mail or through a web interface)
  • bad guy is now in possession of a perfectly valid and trusted https://example.com/ SSL certificate

Scary. You must be thinking that CAs probably have a more secure DNS setup and wouldn't get poisoned (as easily). I believe that to be somewhat true. Let's say it's absolutely true: the CA has 100% perfectly secure DNS. Ok, we'll need to go one step further back:

  • bad guy poisons the DNS for the target's less secure $20/month ISP, example.com, to redirect the MX for example.com to a different server
  • bad guy visits example.com's registrar's web interface and indicates that he has forgotten his password
  • registrar generates password reset URL/instructions and emails it to bob@example.com
  • bad guy receives the hijacked email, logs into the domain and changes the contacts to badguy@example.net, an email account that he controls
  • bad guy generates a CSR and sends it to the CA from badguy@example.net, and continues the process outlined above to receive a legitimate, valid and trusted certificate

In any of these scenarios, hundreds or thousands of account credentials could be acquired—especially with creative use of proxies at the bad guy's malicious server.

We're led to believe that SSL is truly safe, and it's true that the encryption part lives up to the expectation, but modern practice of the certificate generation/signing process certainly leaves something to be desired, I think.

Yeah, it might be a long shot that an attacker could easily poison specific DNS servers on the internet, but again, as Kaminsky showed the world just a few weeks ago, (nearly?) every DNS server on the planet was vulnerable to exactly this type of attack before summer 2008.

Pardon me if I don the tinfoil hat until we all forget about this mess.

PHP-Aware Diff

UPDATED (and intentionally reinserted into the feed):

I've made a bunch of changes to this code, and updated it.

It's quite a bit slower, but I really don't care (-:

It uses my new pet project, the tokalizer.

You'll probably want to grab the newly-compiled diff-php as this is the one I'll be "maintaining" (ie, when someone complains, or when it breaks for me).

(end update)

I've told a few people that I'd blog about this "soon" and that was a while ago, so I figured I'd better get on the ball.

I tweeted this almost two weeks ago:

Derick responded saying that diff -p does this for C. I tried it with PHP, and it gave me the outermost block where the change occurred (ie, the class, not the function). The interesting thing, though, is that it changed the @@ line:

@@ -32,7 +32,7 @@ class Foo2 {

Almost what I was looking for, but not quite. I really wanted a PHP-aware diff that could tell me context.
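If you haven't stared at a unified diff in a while, that @@ line packs in the start line and line count for each side, followed by whatever context the diff program cares to add. Here's a small sketch of pulling it apart, assuming PHP 7.1 or later; parseHunkHeader() is a hypothetical helper for illustration (the script further down takes a cruder approach and only extracts the left start line).

```php
<?php
// Sketch only: parseHunkHeader() is a hypothetical helper.
// Assumes PHP 7.1 or later (nullable return type).

function parseHunkHeader(string $line): ?array
{
    // "@@ -start[,count] +start[,count] @@ optional context"
    $pattern = '/^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@ ?(.*)$/';
    if (preg_match($pattern, $line, $m) !== 1) {
        return null; // not a hunk header
    }
    return array(
        'leftStart'  => (int) $m[1],
        // in the unified format, an omitted count means 1
        'leftCount'  => $m[2] === '' ? 1 : (int) $m[2],
        'rightStart' => (int) $m[3],
        'rightCount' => $m[4] === '' ? 1 : (int) $m[4],
        'context'    => $m[5],
    );
}

$header = parseHunkHeader('@@ -32,7 +32,7 @@ class Foo2 {');
echo $header['leftStart'], ' ', $header['context'], "\n"; // 32 class Foo2 {
```

The context part is free-form text, which is what makes it possible to swap in something more useful than the outermost block.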

So, what's a developer with almost no spare time on his hands (but an idea of how to actually accomplish this pet project) to do? Write it himself, of course! (-:

So, I did. Here's an example of the output:

--- tmp/left.php
+++ tmp/right.php
@@ -1,7 +1,7 @@ (root)
 <?php
 class Foo {
     function bar() {
-        // baz!
+        // bax!
     }
 }
 
@@ -32,7 +32,7 @@ (root):Foo2(class)
 // k
 // l
     function bar2() {
-        // baz2!
+        // bax2!
     }
 }
 
@@ -63,7 +63,7 @@ (root):Foo3(class):bar3(function)
 // k
 // l
         $test = "foo {$test}";
-        // baz2!
+        // bax2!
     }
 
     function bar4() {
@@ -93,7 +93,7 @@ (root):Foo3(class):bar4(function):bar5(function)
 // k
 // l
             $test = "foo {$test}";
-            //baz5
+            //bax5
 // a
 // b
 // c
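The context labels in that output come from walking PHP's own token stream. As a much-simplified sketch of the idea (not the actual script, which also tracks brace depth to know when a class or function ends), this lists class and function declarations along with the line they start on. It assumes PHP 7 or later with the tokenizer extension available, and listDeclarations() is a hypothetical name.

```php
<?php
// Sketch only: listDeclarations() is a hypothetical helper. It finds named
// class and function declarations, and ignores closures and Foo::class.

function listDeclarations(string $source): array
{
    $found  = array();
    $tokens = token_get_all($source);
    $count  = count($tokens);

    for ($i = 0; $i < $count; $i++) {
        if (!is_array($tokens[$i])) {
            continue; // single-character token such as "{" or ";"
        }
        list($id, , $line) = $tokens[$i];
        if ($id !== T_CLASS && $id !== T_FUNCTION) {
            continue;
        }
        // skip whitespace, then expect the declared name
        $j = $i + 1;
        while ($j < $count && is_array($tokens[$j]) && $tokens[$j][0] === T_WHITESPACE) {
            $j++;
        }
        if ($j < $count && is_array($tokens[$j]) && $tokens[$j][0] === T_STRING) {
            $found[] = array(
                'type' => $id === T_CLASS ? 'class' : 'function',
                'name' => $tokens[$j][1],
                'line' => $line,
            );
        }
    }
    return $found;
}

$source = "<?php\nclass Foo {\n    function bar() {\n    }\n}\n";
foreach (listDeclarations($source) as $d) {
    echo "{$d['name']}({$d['type']}) on line {$d['line']}\n";
}
// Foo(class) on line 2
// bar(function) on line 3
```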

Here's the code for my php-aware diff. I use it as my default svn diff command now (see comments). Hope you find it useful, I sure do.

#!/usr/bin/php
<?php
/// PHP-Aware diff
 
/// Copyright 2008, Sean Coates
///   Usage of the works is permitted provided that this instrument is retained
///   with the works, so that any entity that uses the works is notified of this
///   instrument.
///   DISCLAIMER: THE WORKS ARE WITHOUT WARRANTY.
/// (Fair License - http://www.opensource.org/licenses/fair.php )
/// Short license: do whatever you like with this.
 
 
//// save this file as diff-php
////    and make sure /path/to/diff-php is chmod +x
 
//// TO USE from cli:
////    /path/to/diff-php leftfile rightfile   # (compares files, as diff does)
 
////
//// TO USE from svn:
////    in ~/.subversion/config, add: diff-cmd = /path/to/diff-php
 
//// You might need to adjust DIFF_PATH, below
 
// the tokenizer scares me a bit (-:
 
class DiffPHP {
 
    const DEBUG_SYNTAX = false; // set to true to get syntax error data (== broken diffs)
 
    const DIFF_PATH = '/usr/bin/diff';
    const DIFF_OPTS = '-u';
 
    /**
     * The "left" file, as passed by svn (or cli)
     */
    protected $left;
 
    /**
     * The "right" file, as passed by svn (or cli)
     */
    protected $right;
 
    /**
     * A "nice" version of the left file.
     *
     * Instead of foo/bar/.svn/base/whatever.php, it would just be whatever.php
     */
    protected $niceLeft;
 
    /**
     * A "nice" version of the right file.
     *
     * Instead of foo/bar/.svn/base/whatever.php, it would just be whatever.php
     */
    protected $niceRight;
 
    /**
     * Captured file contents (prevents reading the file twice + diff)
     */
    protected $fileContents;
 
    /**
     * The output from the diff executable
     */
    protected $diff;
 
    /**
     * Each chunk of the diff goes in here (begins with a @@ identifier line)
     */
    protected $chunks;
 
    /**
     * Array of tokens from the Left file
     */
    protected $tokens;
 
    /**
     * Mapping of source lines to source class/functions
     */
    protected $lineMap;
 
    /**
     * Current context (used to construct line map)
     */
    protected $context;
 
    /**
     * Brace depth (used to determine if we're still in the current context)
     */
    protected $braceDepth;
 
    /**
     * Bool flag to indicate that syntax is somehow broken
     */
    protected $isBroken;
 
    /**
     * Object-wide index to keep track of the current token number
     */
    protected $tokenIndex;
 
    /**
     * Currently parsing token value
     */
    protected $currentValue;
 
    /**
     * Constructor. The magic happens here. Once instantiated, the entire
     * process runs
     */
    public function __construct() {
        $this->parseArgs();
 
        $this->fileContents = file_get_contents($this->left);
 
        $this->doDiff();
 
        // subject (probably) IS a PHP file:
        if (!isset($_ENV['NODIFFPHP']) && stripos($this->fileContents, '<?') !== false) {
            $this->splitDiff();
            $this->determineHierarchy();
            $this->reconstructDiff();
        } else {
            // not a PHP file; return regular diff:
            echo $this->diff;
        }
    }
 
    /**
     * Parses the passed arguments.
     *
     * Determines if it's svn (7 args) or cli (2 args), and stores the parsed
     * arguments.
     */
    protected function parseArgs() {
        // if this is being called from svn, we'll get 7 arguments
        //   (8th is argv 0 == this script)
        if (8 == $_SERVER['argc']) {
            $this->niceLeft = $_SERVER['argv'][3];
            $this->niceRight = $_SERVER['argv'][5];
            $this->left = $_SERVER['argv'][6];
            $this->right = $_SERVER['argv'][7];
        } else if (3 == $_SERVER['argc']) {
            // 2 arguments means a regular diff
            $this->niceLeft = $_SERVER['argv'][1];
            $this->niceRight = $_SERVER['argv'][2];
            $this->left = $this->niceLeft;
            $this->right = $this->niceRight;
        } else {
            die("See " . __FILE__ . " for details on how to use this script\n");
        }
    }
 
    /**
     * Calls the external diff program to get the base diff
     */
    protected function doDiff() {
        if (is_readable($this->left) && is_readable($this->right)) {
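            // escape the paths so that spaces and shell metacharacters are safe
            $diffCmd = self::DIFF_PATH . ' ' . self::DIFF_OPTS . ' ' . escapeshellarg($this->left) . ' ' . escapeshellarg($this->right);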
            $this->diff = `$diffCmd`;
        } else {
            die("{$this->left} or {$this->right} is not readable\n");
        }
    }
 
    /**
     * Takes an identifier line (looks like: @@ -30,23 +30,79 @@) and returns
     * the begin line number
     */
    protected function parseLineNum($identifier) {
        list(,$from) = explode(" ", $identifier);
        list($from) = explode(',', $from);
        return (int) substr($from, 1);
    }
 
    /**
     * Sanitizes CRLF or CR into just LF
     */
    protected function sanitizeLineEndings($data) {
        // first, sanitize line endings:
        $data = str_replace("\r\n", "\n", $data);
        $data = str_replace("\r",   "\n", $data);
        return $data;
    }    
 
    /**
     * Actually splits the diff into chunks and stores chunks + line numbers
     */
    protected function splitDiff() {
        // now split:
        $this->diff = explode("\n", $this->sanitizeLineEndings($this->diff));
 
        // array to return:
        $this->chunks = array();
 
        // line counter
        $line = 0;
 
        // outer loop: file(s)
        $maxLine = count($this->diff);
 
        // skip first 2 lines as left, right files
        $line += 2;
 
        // descend into data chunks
        while ($line < $maxLine) {
            // next line is the chunk identifier
            $dataChunk = array();
            $dataChunk['identifier'] = $this->diff[$line++];
            $dataChunk['line'] = $this->parseLineNum($dataChunk['identifier']);
            $dataChunk['data'] = array();
            while ($line < $maxLine && !(substr($this->diff[$line], 0, 2) == '@@' && substr($this->diff[$line], -2) == '@@')) {
                $dataChunk['data'][] = $this->diff[$line++];
            }
            $this->chunks[] = $dataChunk;
        }
    }
 
    /**
     * Reconstructs the diff (with adjusted identifier lines, and outputs the
     * result)
     */
    protected function reconstructDiff() {
        $out = "--- {$this->niceLeft}\n+++ {$this->niceRight}\n";
        foreach ($this->chunks as $chunk) {
            $out .= $chunk['identifier'] . "\n";
            $out .= implode("\n", $chunk['data']) ."\n";
        }
        echo $out;
    }
 
    /**
     * Descends into a deeper context
     *
     * @param string $type friendly name, either class or function
     */
    protected function enterContext($type) {
        // next comes whitespace:
        if (is_array($this->tokens[++$this->tokenIndex])) {
            list($token, $this->currentValue) = $this->tokens[$this->tokenIndex];
        } else {
            $token = null;
            $this->currentValue = $this->tokens[$this->tokenIndex];
        }
        if ($token != T_WHITESPACE) {
            // syntax is broken, let's get out of here
            if (self::DEBUG_SYNTAX) {
                die("Syntax broken in whitespace assertion, " . $this->context[count($this->context) - 1] . "\n");
            }
            $this->isBroken = true;
            return;
        }
        $this->checkLineBreak();
 
        // next comes the name:
        if (is_array($this->tokens[++$this->tokenIndex])) {
            list($token, $this->currentValue) = $this->tokens[$this->tokenIndex];
        } else {
            $token = null;
            $this->currentValue = $this->tokens[$this->tokenIndex];
        }
        $this->context[] = $this->currentValue . "({$type})";
 
        // chew through the next few tokens until we get a "{"
        while ($this->currentValue != '{' && $this->tokenIndex < count($this->tokens)) {
            if (is_array($this->tokens[++$this->tokenIndex])) {
                list($token, $this->currentValue) = $this->tokens[$this->tokenIndex];
            } else {
                $token = null;
                $this->currentValue = $this->tokens[$this->tokenIndex];
            }
            $this->checkLineBreak();
            switch ($token) {
                // these are all valid before the brace:
                case null:
                case T_WHITESPACE:
                case T_VARIABLE:
                case T_EXTENDS:
                case T_IMPLEMENTS:
                case T_STRING:
                case T_ARRAY:
                case T_CONSTANT_ENCAPSED_STRING:
                case T_LNUMBER:
                case '=':
                    break;
 
                // if another token is found, then there's a syntax error
                // (this was added to prevent really deep looping)
                default:
                    if (self::DEBUG_SYNTAX) {
                        die("Syntax broken in token assertion, " . $this->context[count($this->context) - 1] . "," . token_name($token) . "\n");
                    }
                    $this->isBroken = true;
                    return;
            }
        }
 
        // found the starting brace
        $this->braceDepth[count($this->context) - 1] = 1;
    }    
 
    /**
     * Tokenizes the code and creates a line map
     */
    protected function tokenizeHierarchy() {
        $this->context = array('(root)');
        $this->lineMap = array('');
        $this->braceDepth = array(0); // the root context starts at depth 0
        $this->tokens = token_get_all($this->sanitizeLineEndings($this->fileContents));
        $this->isBroken = false;
        for ($this->tokenIndex=0; $this->tokenIndex<count($this->tokens); $this->tokenIndex++) {
            if ($this->isBroken) {
                // syntax is somehow broken; return progress, but don't go further
                return;
            }
            if (is_array($this->tokens[$this->tokenIndex])) {
                list($token, $this->currentValue) = $this->tokens[$this->tokenIndex];
            } else {
                $token = null;
                $this->currentValue = $this->tokens[$this->tokenIndex];
            }
 
            switch ($token) {
                // check for class
                case T_CLASS:
                    // found "class"
                    $this->enterContext('class');
                    break;
 
                case T_FUNCTION:
                    // found "function"
                    $this->enterContext('function');
                    break;
 
                default:
                    $idx = count($this->context) - 1;
                    // Interpolation braces ("{$x}" and "${x}") arrive as array
                    // tokens, so test the token ID as well as the literal brace.
                    if (($token === null && $this->currentValue === '{')
                        || $token === T_CURLY_OPEN
                        || $token === T_DOLLAR_OPEN_CURLY_BRACES) {
                        ++$this->braceDepth[$idx];
                    } else if ($token === null && $this->currentValue === '}') {
                        --$this->braceDepth[$idx];
                        if ($this->braceDepth[$idx] == 0 && $idx > 0) {
                            // we're out of this context (never pop the root)
                            array_pop($this->context);
                        } else if ($this->braceDepth[$idx] < 0) {
                            // bad stuff!
                            if (self::DEBUG_SYNTAX) {
                                die("Syntax broken in brace close assertion, " . $this->context[count($this->context) - 1] . "\n");
                            }
                            $this->isBroken = true;
                        }
                    } else {
                        $this->checkLineBreak();
                    }
            }
        }
    }
 
    /**
     * Determines if the currently processing token contains line breaks, and
     * if so, adjusts the lineMap accordingly
     */
    protected function checkLineBreak() {
        // check for new line:
        if (strpos($this->currentValue, "\n") !== false) {
            for ($j=1; $j<=substr_count($this->currentValue, "\n"); $j++) {
                $this->lineMap[] = implode(':', $this->context);
            }
        }
    }
 
    /**
     * Matches the chunk map to the line map
     */
    protected function determineHierarchy() {
        $this->tokenizeHierarchy();
        for ($chunknum=0; $chunknum < count($this->chunks); $chunknum++) {
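            $line = $this->chunks[$chunknum]['line'];
            // the line map may be incomplete if the syntax was broken
            if (isset($this->lineMap[$line])) {
                $this->chunks[$chunknum]['identifier'] .= ' ' . $this->lineMap[$line];
            }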
        }
    }
}
 
new DiffPHP;
 
// komodo: le=unix language=php codepage=utf8 tab=4 notabs indent=4

The most up-to-date version of this file can also be found in my personal svn repository: https://svn.caedmon.net/svn/public/diff-php/diff-php.

Please let me know if you run into any bugs. I'm sure there are a few, but it works pretty well for me.

Is Pagination Still Necessary?

My first network connection device was a 2400 baud modem. Practically speaking, that would allow me to sustain downloads at a rate of less than 250 bytes per second. This was relatively fast at the time; I'd been using my buddy's 1200 baud modem to connect to local BBSs before that modem-netting birthday.

To put this into perspective, the Yahoo! homepage, all considered, is somewhere around 470kB. On my early-90s era modem, it would have taken a little over 30 minutes (half of one hour) to download (in perfect conditions, without protocol overhead (good ol' zmodem), and if my mom didn't happen to pick up the phone during transfer).

For the past few years, I've had a 10 megabit connection (downstream) into my home/office. Under perfect conditions, I can pull the entire Y! homepage, and all attached media in less than half of one second.
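For the curious, the back-of-the-envelope math looks something like this. It's a sketch with assumptions baked in: I'm treating 2400 baud as 2400 bits per second, counting 10 bits on the wire per byte for the modem (8 data bits plus start and stop bits), 8 bits per byte for broadband, and 1 kB as 1024 bytes. transferSeconds() is a hypothetical helper; assumes PHP 7 or later.

```php
<?php
// Sketch only: transferSeconds() is a hypothetical helper, and the
// bits-per-byte figures are assumptions, not measurements.

function transferSeconds(int $bytes, float $bitsPerSecond, int $bitsPerByte): float
{
    return $bytes * $bitsPerByte / $bitsPerSecond;
}

$pageBytes = 470 * 1024; // the ~470kB homepage, taking 1 kB = 1024 bytes

$modem     = transferSeconds($pageBytes, 2400, 10);     // 2400 baud modem
$broadband = transferSeconds($pageBytes, 10000000, 8);  // 10 megabit line

printf("modem: %.1f minutes\n", $modem / 60);  // modem: 33.4 minutes
printf("broadband: %.2f seconds\n", $broadband); // broadband: 0.39 seconds
```

Real transfers are slower in both cases (protocol overhead, latency, and many separate requests), but the ratio is the point.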

In the early 90s, the Y! homepage was obviously much smaller—all pages were smaller—but even with a smaller footprint, many pages took a long time to load. I remember browsing with many windows open (browsers didn't have tabs back then... in fact, we barely had browsers (-: ), loading up a dozen or so pages before alt-tabbing back to the first one I'd queued up a few minutes before (on my 14.4kbps modem, by this time), to see if it had finally finished loading.

To overcome low connection speeds, lack of resources on the client side, and other factors such as connection latency that lead to slow page loads, web pioneers came up with a model for allowing content to be delivered in reasonably sized chunks that is still in use today: pagination. Long lists of (say "100") pieces of data ("search results") were separated into smaller pages (of "10"), including widgets to allow skipping to the next, previous and often any page in the set.

Well... mostly still in use today.
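In case the mechanics aren't obvious, classic pagination boils down to slicing the full result set and working out which page links to show. Here's a minimal sketch, assuming PHP 7 or later; paginate() and its array keys are hypothetical names, and a real application would usually push the slicing down into the database query rather than do it in memory.

```php
<?php
// Sketch only: paginate() is a hypothetical helper that slices an
// in-memory array; pages are numbered from 1.

function paginate(array $items, int $page, int $perPage = 10): array
{
    $totalPages = max(1, (int) ceil(count($items) / $perPage));
    $page = max(1, min($page, $totalPages)); // clamp out-of-range requests

    return array(
        'items'      => array_slice($items, ($page - 1) * $perPage, $perPage),
        'page'       => $page,
        'totalPages' => $totalPages,
        'previous'   => $page > 1 ? $page - 1 : null,
        'next'       => $page < $totalPages ? $page + 1 : null,
    );
}

$results = range(1, 100);       // stand-in for 100 search results
$third   = paginate($results, 3);

echo implode(',', $third['items']), "\n"; // 21,22,23,24,25,26,27,28,29,30
echo "page {$third['page']} of {$third['totalPages']}\n"; // page 3 of 10
```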

Technologies have helped us hack around the idea of separating growing amounts of data into pages. Ajax, for example, allows the dynamic loading of the next set of results without forcing a page reload (often poorly... try bookmarking the result of many of these dynamic populations). Even Mobile Mail on the iPhone/iPod Touch allows something like this.

It seems to me, though, that web interface designers are stuck in this rut of showing end users a mere 10, 20 or even 100 items at a time. My 10Mb connection can handle a lot more traffic than you're sending; your server had better be able to deliver it (and usually, it can); my browser is allowed to allocate much more RAM; and I even like to think that I've microevolved the ability to parse much more data than I could a few years ago.

So, I ask you, fellow web professionals: is pagination still necessary? I obviously don't think so, but I'm not a User Experience guy, I'm a user (and also the guy who has to make the UX happen, and make sure your server can deliver the results mentioned above). Tell me what you think.
