Category: Technology

  • WTF is wrong with hosted gmail?

    I can’t log in to either of my hosted gmail accounts. Anyone else?

    UPDATE: I contacted gmail support and apparently they occasionally lock accounts due to suspicious activity. I think I had two different hosted gmail accounts open in tabs in the same browser. Very suspicious.

Their suggestion: contact them. The response when you do: we occasionally lock accounts – see our help page for details. The help page suggests you contact them.

    So I’m stuck in a loop and it’s pissing me off because I haven’t had access to mark at linebuzz.com for going on 16 hours now.

If you’re thinking of moving your corporate email to hosted Gmail, think twice. You may be up the creek for 16 hours waiting for a locked account to time out.


  • An ode to the end of Facebook

    I rant, Tony rants, Alan ranted.

    With surprisingly similar space-time coordinates.

    Our love of Facebook is duly recanted.

    We’re no longer Zuckerberg’s subordinates.

  • How to record a remote podcast

    A quick article about how to record a remote interview and how to fix the audio levels after the interview.

I got a few questions about the equipment I used to record the podcast interview with Tony yesterday. I recorded it remotely using Skype – Tony was in West Seattle and I’m in Sammamish. We were both wearing headsets, which I recommend, because even though Skype is good at cutting out feedback from a PC speaker, some noise does get through if you’re not wearing a headset.

    I used Pamela to record the audio. I recommend the Pro version because the other versions limit your recording to 30 minutes or less. Pamela is free for the first 30 days and it’s about $12 after that. A tip when using Pamela: To get to the mp3 audio files, right-click on a recording and click “open call recording folder”. It took me a while to figure that out.

The only complaint I have about Pamela is that it doesn’t balance the volume of the caller against the callee. So my voice was very loud and Tony’s was much softer. It’s taking the audio directly from Skype, so perhaps that’s too much to ask. I also haven’t experimented with the Skype audio settings. Fixing this was time consuming:

I used Audacity, an open source sound editor, to fix the difference in audio volume, and besides the interview itself, this took up most of the time I spent putting the podcast together. In Audacity you can see the waveform, and it’s quite clear where the audio level is much lower. So I selected the parts of the audio where Tony speaks and applied the Amplify effect. Amplify automatically detects the largest waveform and sets the amplification so that the largest waveform won’t clip – in other words, it won’t over-amplify and cause distortion. I recommend using the default number it gives you. If that’s too low, look at the area of the clip you’ve selected and you’ll probably see a spike in the waveform that’s causing Amplify to give you a low amplification number. Just select around that spike and you’ll be able to boost the signal more.

    I’m sure there’s an easier way to do this, but I tried using Leveller and a couple of other tools and the results weren’t as good as Amplify.
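If you’re comfortable on the command line, sox (a separate open source audio tool, not part of Audacity) can do a similar peak normalization on a clip. I haven’t compared its results to Amplify’s, and the filenames here are just placeholders:

sox tony_section.wav tony_section_loud.wav norm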

Next time, I’m going to make darn sure my levels are much lower and as close as possible to those of the person I’m calling. Pamela has a level indicator while you’re recording, so I might try using that as a visual guide and tweak Skype’s audio settings.

Once I’d finished working with the clip in Audacity, I saved it as a WAV file rather than using Audacity’s ‘save as mp3’ option, and used RazorLame to convert the WAV to mp3. That gave me more control over the mp3 quality. Under Edit/LAME options, select 24kbit as the bitrate and ‘mono’ as the mode.
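If you’d rather skip the GUI, RazorLame is just a front end for the LAME encoder, so the same settings can be applied straight from the command line (the filenames are placeholders):

lame -b 24 -m m interview.wav interview.mp3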

    Then I just uploaded the file to my blog server and presto!

  • Configuring MySQL and Apache for a faster blog

I logged onto my blog this morning and it wouldn’t load. I tried to ping the server and it was still up. Then I tried ssh’ing into the server and it connected. I hit reload again in my browser and started mumbling WTF.

    Then I ran ‘uptime’ on the server and got something like this:

    09:52:40 up 325 days, 6:45, 2 users, load average: 0.28, 0.28, 0.27

That’s a little high, so I checked how many apache processes there were: it was at MaxClients, so apache was working pretty hard. I checked my Analytics stats, and by 7am today I had already done as much traffic as yesterday.
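A quick way to count the Apache processes (assuming they’re named apache2; on Red Hat style boxes they’re called httpd):

ps aux | grep apache2 | grep -v grep | wc -l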

    So I tailed the web server log file and it just flew off the screen.

I figured out Reddit was the source. Someone had posted a blog entry I wrote yesterday about RescueTime and it was getting a few votes.

I run a standard WordPress.org install (newest version). My server has 1GB of RAM and an AMD Athlon XP 2100 processor. It’s on a 10 megabit backbone, so it has plenty of bandwidth. So I made some basic changes to the server.

Apache needed to handle more concurrent connections because MaxClients was set too low, but the server was using too much memory for me to increase it. MySQL was the memory hog, so I changed the MySQL config to use less memory; fetching blog entries from disk is not that much work.

    My my.cnf file (the config file for mysql) has the following settings now:

    key_buffer = 50M
    sort_buffer_size = 5M
    read_buffer_size = 1M
    read_rnd_buffer_size = 1M
    myisam_sort_buffer_size = 5M
    query_cache_size = 4M

That’s a fairly small key buffer, and the other caches are very low too, but I’m only serving around 300 blog entries, so I could probably do away with the key buffer completely, rely on disk access, and it would still be OK. I left the query cache at 4M in the hope that it would save me some disk access when fetching blog entries.

    I changed Apache’s config from this:

    MinSpareServers 15
    MaxSpareServers 15
    StartServers 15
    MaxClients 30

    to this:

    MinSpareServers 15
    MaxSpareServers 45
    StartServers 30
    MaxClients 60
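The constraint behind those numbers, roughly (the memory-per-child figure is a guess at a typical mod_php Apache process, not something I measured):

MaxClients × memory per Apache child + MySQL’s buffers + the OS < 1GB of physical RAM

With 60 children at, say, 10MB each, Apache can use around 600MB, which is why MySQL had to give some memory back before MaxClients could go up; otherwise the box starts swapping and everything gets far slower.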

That fixed it immediately and my blog is now blazingly fast. 🙂 Right now apache has 49 children, so it’s still getting a lot of traffic, but it’s not hitting MaxClients, which means it’s not turning away users.


  • The Web

    This little guy quietly spun this masterpiece while I was snoozing on the couch below him last night – and he got me thinking about The Web and what the word really means these days. Perhaps I’ve been in the entrepreneurial game for too long now, but it’s beginning to mean:

    • Design
    • User interfaces
    • SEO
    • SEM
    • Traffic
    • Competitive analysis
• Bounce rates
• Return rates
    • Content optimization
    • etc…etc…

    …when it really means one thing:

    COMMUNICATION

Whether it’s buyers communicating with sellers, or mining the world’s collective knowledge via search, or blog trackbacks, or inline comments – it’s all just new ways for us to communicate. Communication is the web’s raison d’être. It’s really that simple.

  • World-wide city database and other (free) geospatial data

The National Geospatial-Intelligence Agency is one of my favorite data sources – it’s also one of my favorite names for any government agency. The agency provides a database of world-wide features which I use as a data source for Geojoey.com’s landmark search feature (top right of the screen).

    These guys are selling the equivalent data for over $300.

For an up-to-date ZIP code database, you should contact USPS.gov and order it from them – which may take a while – government agencies, grumble grumble. Or just buy it from one of the many online sellers for about $50. Census.gov doesn’t do ZIP codes anymore, but if you don’t care about it being current, there’s an old ZIP code database available for download.

Census.gov’s TIGER database is definitely THE source for US geospatial data. My favorite page is the cartographic boundary files they’ve extracted from the database. It has things like ZCTAs (ZIP code boundaries), county boundaries, etc. If you’re handy with a graphics app, you can do all kinds of fun stuff with this data.

  • Programming language choices for entrepreneurs

    I’ll often find myself chatting about choice of technology with fellow entrepreneurs and invariably it’s assumed the new web app is going to be developed in Rails.

I don’t know enough about Rails to judge its worth. I do know that you can develop applications in Rails very quickly and that it scales to handle complexity better than Perl. Rails may, however, have problems scaling performance. I also know that you can’t hire a Rails developer in Seattle for love or money.

    So here are some things to think about when choosing a programming language and platform for your next consumer web business. They are in chronological order – the order you’re going to encounter each issue:

    1. Are you going to be able to hire great talent in languageX for a reasonable price?
    2. Can you code it quickly in languageX?
    3. Is languageX going to scale to handle your traffic?
    4. Is languageX going to scale to handle your complexity?
    5. Is languageX going to be around tomorrow?

    If you answered yes to all 5 of these, then you’ve made the right choice.

I use Perl for my projects, and it does fairly well on most of these criteria. Its weakest point is scaling to handle complexity: Perl lets you invent your own style of coding, so it can become very hard to read someone else’s code. Usually that’s solved through coding by convention. Damian Conway’s Object Oriented Perl is the bible of Perl OO convention in case you’re considering going that route.

  • Saving server costs with Javascript using distributed processing

I run two consumer web businesses, LineBuzz.com and Geojoey.com. Both have more than 50% of the app implemented in Javascript, executing in the browser.

Something that occurred to me a while ago is that, because most of the execution happens inside the browser and uses our visitors’ CPU and memory, I don’t have to worry about my servers providing that CPU and memory.

    I found myself moving processing to the client side where possible.
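As a hypothetical example of the kind of shift I mean (this isn’t LineBuzz or Geojoey code): instead of having the server sort a list of comments and render them into HTML, send the raw data down once and let the visitor’s browser do the sorting and rendering:

// comments is an array of {author, text, date} objects fetched once as data
// (date is assumed to be a numeric timestamp)
function renderComments(comments) {
  // sort newest first on the visitor's CPU instead of the server's
  comments.sort(function (a, b) { return b.date - a.date; });
  var html = '';
  for (var i = 0; i < comments.length; i++) {
    // real code would escape author and text before injecting them into HTML
    html += '<p><b>' + comments[i].author + '</b>: ' + comments[i].text + '</p>';
  }
  document.getElementById('comments').innerHTML = html;
}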

    [Don’t worry, we torture our QA guru with a slow machine on purpose so she will catch any browser slowness we cause]

One downside is that everyone can see the Javascript source code – although it’s compressed, which makes it a little harder to reverse engineer. Usually the most CPU intensive code is also the most interesting.

Another disadvantage is that I’m using a bit more bandwidth. But if the app is not shoveling vast amounts of data around to do its processing, and if I’m OK with exposing parts of my source to competitors, then these issues go away.

    Moving execution to the client side opens up some interesting opportunities for distributed processing.

Let’s say you have 1 million page views a day on your home page. That’s 365 million views per year. Let’s say each user spends an average of 1 minute on your page because they’re reading something interesting.

    So that’s 365 million minutes of processing time you have available per year.

    Converted to years, that’s 694 server years. One server working for 694 years or 694 servers working for 1 year.

But let’s halve it because we haven’t taken into account load times or the fact that Javascript is slower than other languages. So we have 347 server years.

    Or put another way, it’s like having 347 additional servers per year.

The cheapest server at ServerBeach.com costs $75 per month or $900 per year. [It’s a 1.7GHz Celeron with 512MB of RAM – we’re working on minimums here!]

So those 347 server years translate into $312,300 per year.
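In compact form, the back-of-envelope math is:

365,000,000 minutes ÷ 525,600 minutes in a year ≈ 694 server years
694 ÷ 2 (for load times and slow Javascript) ≈ 347 server years
347 × $900 per server per year ≈ $312,300 per year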

My method isn’t very scientific – and if you go around slowing down people’s machines, you’re not going to have 1 million page views per day for very long. But it gives you a general indication of how much money you can save if you can move parts of a CPU intensive web application to the client side.

    So going beyond saving server costs, it’s possible for a high traffic website to do something similar to SETI@HOME and essentially turn the millions of workstations that spend a few minutes on the site each day into a giant distributed processing beowulf cluster using little old Javascript.
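A minimal sketch of what that might look like, written against a made-up /work URL scheme and with the actual computation stubbed out; a real system would also need to verify results, since you’re trusting anonymous browsers:

// runs quietly in the visitor's browser while they read the page
function doSomeWork() {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', '/work/next-unit', true);          // ask the server for a work unit
  xhr.onreadystatechange = function () {
    if (xhr.readyState !== 4 || xhr.status !== 200) return;
    var unit = eval('(' + xhr.responseText + ')');   // 2007-era JSON parsing; we trust our own server
    var result = crunch(unit);                       // the CPU-heavy part you're farming out
    var post = new XMLHttpRequest();
    post.open('POST', '/work/result?id=' + unit.id, true);
    post.send(result);                               // send the answer back
  };
  xhr.send(null);
}

// placeholder for whatever expensive computation you're distributing
function crunch(unit) {
  return 'answer for ' + unit.id;
}

// pick up a new unit every minute so we never peg the visitor's CPU
setInterval(doSomeWork, 60000);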

  • Competitive intelligence tools

    In an earlier post I suggested that too much competitive analysis too early might be a bad idea. But it got me thinking about the tools that are available for gathering competitive intelligence about a business and what someone else might be using to gather data about my business.

    Archive.org

One of my favorites! Use archive.org to see how your competitor’s website evolved from the early days until now. If they have a robots.txt blocking ia_archiver (archive.org’s web crawler) then you’re not going to see anything, but most websites don’t block the crawler. Here’s Google’s early home page from 1998.

For extra intel, combine Alexa with archive.org: find out when your competitor’s traffic spiked, then look at their pages from those dates on archive.org to try and figure out what they did right.

    Yahoo Site Explorer

Site Explorer is useful for seeing who’s linking to your competitor, i.e. who you should be getting backlinks from.

    Netcraft Site Report

    Netcraft have a toolbar of their own. Take a look at the site rank to get an indication of traffic. Click the number to see who has roughly the same traffic. The page also shows some useful stuff like which hosting facility your competitor is using.

    Google pages indexed

What interests me more than PageRank is the number of pages of content a website has, and which of those are indexed and ranking well. Search for ‘site:example.com’ on Google to see all pages that Google has indexed for a given website. Smart website owners don’t optimize for individual keywords or phrases; instead they provide a ton of content that Google indexes, then work on getting a good overall rank for their site and getting search engine traffic for a wide range of keywords. I blogged about this recently on a friend’s blog; it’s called the long tail approach.

    If I’m looking at which pages my competitor has indexed, I’m very interested in what specific content they’re providing. So often I’ll skip to result 900 or more and see what the bulk of their content is. You may dig up some interesting info doing this.

    Technorati Rank, Links and Authority

    If you’re researching a competing blog, use Technorati. Look at the rank, blog reactions (inbound links really) and the technorati authority. Authority is the number of blogs linking to the blog you’re researching in the last 6 months.

    Alexa

Sites like Alexa, Comscore and Compete are incredibly inaccurate and easy to game. Just read this piece by the CEO of Plenty of Fish. Alexa provides an approximation of traffic. It’s also subject to anomalies that throw the stats wildly off, like the time that Digg.com overtook Slashdot.org in traffic: someone on Digg posted an article about the win, all the Digg visitors went to Alexa to look at the stats, and many installed the Alexa toolbar. The result was a big jump in Digg’s traffic according to Alexa when nothing had actually changed.

    Google PageRank

PageRank is only updated about once every two months or more. New sites could be getting a ton of traffic and have no PageRank, while older sites can have huge PageRank but very little content and only rank well for a few keywords. Install the Google Toolbar to see PageRank for any site; you may have to enable it in the advanced options.

    nmap

    This may get you blocked by your ISP and may even be illegal, so I’m just mentioning it for informational purposes and because this may be used on you. nmap is a port scanning tool that will tell you what services someone is running on their server, what operating system they’re running, what other machines are on the same subnet and so on. It’s a favorite used by hackers to find potential targets. It also has the potential to slow down or harm a server. It’s also quite easy to detect if someone is running this on your server and find out who they are. So don’t go and load this on your machine and run it.

    Compete

Compete is basically an Alexa clone. I never use this site because I’ve checked sites that I have real data on and Compete seems way off. They claim to provide demographics too, but if the basics are wrong, how can you trust the demographics?

    whois

    I use unix command line whois, but you can use whois.net if you’re not a geek. We use a domain proxy service to preserve our privacy, but many people don’t. You’ll often dig up some interesting data in whois, like who the parent company of your competitor is, or who founded the company and is still the owner of the domain name. Try googling any corporation or personal names you find and you might come up with even more data.

    HTML source of competitors site

Just take a glance at the headers and footers and any comments in the middle of the pages. Sometimes you can tell what server platform they’re running, or sometimes a silly developer has commented out code that’s part of an as-yet-unreleased feature.

    Personal blogs of competitors and staff

    If you’re researching linebuzz.com and you’re my competitor, then it’s a good idea to keep an eye on this blog. I sometimes talk about the technology we use and how we get stuff done. Same applies for your competitors. Google the founders and management team, find their blogs and read them regularly.

    dig (not Digg.com)

dig is another unix tool that queries DNS servers. Much of this data is available from netcraft.com, mentioned above, but you can use dig to find out who your competitor uses for email with ‘dig mx example.com’, and you can do a reverse lookup on an IP address with ‘dig -x <ip address>’, which may help you find out who their ISP is (netcraft gives you this too).

Another useful thing dig does is give you an indication of how your competitor is set up for web hosting: whether they’re using round-robin DNS (you’ll see multiple A records returned for the same hostname) or a single IP behind a load balancer.

    traceroute

Another unix tool. Run ‘/usr/sbin/traceroute www.example.com’ and you’ll get a list of the hops your traffic takes to reach your competitor’s servers. Look at the last few router hostnames before the final destination of the traffic. You may get data on which country and/or city your competitor’s servers are based in and which hosting provider they use. There’s a rather crummy web based traceroute here.

    Google alerts

Set up Google news, blog and search alerts for both your competitors’ brands and your own, because your competitors may mention you in a blog comment or somewhere else.

    There is a lot more information available via SEC filings, Secretary of State websites and so on – perhaps the subject of a future entry.