Monthly Archives: February 2014

Search Engine Optimization techniques

Devin Bost

2/23/2014

One may ask, “How do we measure the results of our search engine marketing and optimization?” Measuring this data requires Google Analytics, and in this article we will assume that Google Analytics has already been configured. Based on the data Google Analytics provides, the process for improving site metrics is as follows:

  1. First, we set up filters. We use filters to isolate traffic from the following sources (in order of importance, from highest to lowest):
    1. Organic search results;
    2. Paid search results (this only applies when using pay-per-click advertising with Google AdWords);
    3. Unique new visitors from non-search engine sources.
  2. Second, we set our metrics (external variables). We should set up separate metrics for each advertising campaign and for organic search traffic. There are several benchmarks (variables) I like to collect data for:
    1. Number of unique visitors, or unique visitor count;
    2. Unique page view count;
    3. Hit count, partitioned by landing page URL (filtered to display only pages generating one or more unique visits);
    4. Hit count, partitioned first by keyword phrase (the search term used to land on a page), then by landing page URL (the URL the search brought them to); see the partitioning sketch after this list;
    5. Relative position of ranked pages on Google, weighted according to their position (with an exponential decay model I developed; see the weighting sketch after this list);
    6. Return visit count, partitioned by IP address;
    7. Bounce rates:
      1. Partitioned by keyword phrase, then landing page URL, then by number of internal links (aka layer count) clicked on;
      2. Partitioned by landing page URL, then keyword phrase;
    8. Visitor count, partitioned by backlink URL. These are visitors that landed on our site by following a link from someone else’s website, and according to (Brin & Page, 1998), backlinks have been important since the creation of Google’s search algorithm.
  3. Third, we set our internal variables. These are the variables we control directly. This data becomes invaluable once our external variables begin exhibiting acceleration; at that point, we may use mathematical techniques to gain insight into how our changes to page content (internal variables) affect our external variables. It is therefore essential to track changes to site content: it becomes very hard to assess rankings when it is unclear which version of a particular page earned a top ranking, so revision control should be applied across the site. HTML tags must also be analyzed and tracked. Here are descriptions of how these are used:
    1. Title tag: It defines the page title and communicates to the search engines what the page is about. The target keyword must be included in this tag. Because the title is displayed to Google search users, it is also important that we apply some practical psychology here;
    2. Description meta tag: It provides a summary description of the web page and its contents. Also, this description appears (in most cases) in Google search results, just below the title; target keyword must be included in this tag;
    3. URLs: An optimized URL is one that is self-explanatory and self-documenting; the target keyword must be included in the URL;
    4. Heading tags: These tags are used to emphasize important text and a hierarchical structure of keywords on the web page. Heading tags also inform the search engines how the page content is organized. The <h1> tag defines the most important keywords and <h6> defines the least important keywords;
    5. Keyword placement: This data will become more relevant when we start clustering keywords for strategic optimization on keyword stems. There are several techniques that may be applied to this data, depending on how we implement clustering; later on, we can use neural networks and natural language processing for this. Applying language processing techniques is much easier when content is stored in a database that offers out-of-the-box text processing features;
    6. Content keyword density: According to (Parikh & Deshmukh, 2013), search algorithms place great emphasis on keyword density. It is important that targeted keywords have greater density in the relevant content (see the density sketch after this list);
    7. Use of robots.txt: The robots.txt file gives directions to search engines regarding which pages or directories should be crawled. Having this file configured correctly helps ensure that all optimized pages get indexed;
    8. Images: Use the image alt attribute to provide an accurate description of each image; the target keyword should be used in the description, if possible. The alt attribute of the <img> tag specifies the alternate text describing what the image contains if the image is displayed incorrectly or doesn’t load. It is also used by screen readers for people with disabilities;
    9. Use of the “rel=nofollow” attribute: In an HTML anchor tag <a>, the rel attribute defines the relationship between the current page and the page being linked to by the anchor. The nofollow value signals web spiders not to follow the link. In other words, it tells Google that your site is not passing its reputation to the page or pages linked to;
    10. Sitemaps: Keeping the sitemap updated is key for good site rankings. Search engines rely on sitemaps to learn which pages the website currently contains;
    11. Time interval: This is the frequency at which we take measurements. Monthly is fine initially. Once we have enough data to observe our rates of change, we can change our interval to weekly;
  4. We will track internal links and external links once we have traffic that doesn’t bounce. We will discuss this more later. External linking is considered off-site SEO. Important factors, although rather difficult to track, are:
    1. Keyword in the backlink: Google’s ranking algorithm places high value on the text that appears within the link. The text within the link gets associated with the page and describes the page it links to. For this reason, it’s important to have the target keyword within the text of the backlink;
    2. Gradual link-building: It’s important to build backlinks in a gradual manner. The link-building process should be natural and steady. It is for this reason that SEO takes a lot of work and patience to implement. Furthermore, it’s an intentional part of Google’s strategy for gradual reputation building that it not happen quickly or overnight. In fact, if a site were to acquire dozens or hundreds of backlinks overnight, Google would almost certainly consider this a red flag for spam, which would most likely get your site penalized. But if the site content is compelling, people can find it through search (or through other means) and link to it. When this occurs, the site owner has no control over the number of backlinks that the site will generate, and Google can detect that these backlinks weren’t manufactured;
    3. Writing articles to establish domain authority: Writing articles and getting them published on other reputable sites is a strategy that can help your site get backlinks. Getting an article published on trusted sites such as About.com, Wikipedia.org, or NewYorkTimes.com, and getting a backlink in return, will help increase your website’s reputation and help it achieve higher rankings;
    4. Personal networking to establish a reputation: It is recommended that we make efforts to reach out to those in the site’s community, particularly sites “that cover topic areas similar to [ours]. Opening up communication with these sites is usually beneficial.” (www.google.com/webmasters/docs/search-engine-optimization-starter-guide.pdf) Contacting sites related to your site’s subject matter is a great way to network, promote your site, and increase its exposure;
    5. Finding your website’s natural affinity group: Find websites that are related to or cover similar topics as yours for potential networking opportunities. Backlinks from off-topic sites do not count as much as links from sites with content related to yours.
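
To make the partitioned hit counts described above concrete, here is a minimal sketch of counting hits partitioned first by keyword phrase and then by landing page URL. The few rows of sample data are made-up stand-ins; in practice they would come from a Google Analytics export.

```python
# Minimal sketch: hit counts partitioned by keyword phrase, then landing page URL.
# The sample rows are hypothetical stand-ins for a Google Analytics export.
import pandas as pd

hits = pd.DataFrame({
    "keyword_phrase": ["seo tips", "seo tips", "keyword density", "seo tips"],
    "landing_page":   ["/blog/seo", "/blog/seo", "/blog/density", "/home"],
})

counts = (hits.groupby(["keyword_phrase", "landing_page"])
              .size()                 # number of hits per (phrase, page) pair
              .rename("hit_count")
              .reset_index())
print(counts)
```

The same groupby pattern extends to the bounce-rate partitions listed above by aggregating a bounce flag instead of a plain row count.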
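
The exponential decay weighting of ranking positions mentioned above is not spelled out in the original post, so the following is only a minimal sketch of what such a model could look like: each position is weighted by exp(-decay * (position - 1)), so the first result carries full weight and lower positions fall off quickly. The decay constant and the sample rankings are assumptions for illustration, not the author's actual model.

```python
# Minimal sketch of an exponential-decay weighting for Google ranking positions.
# DECAY and the sample rankings are illustrative assumptions.
import math

DECAY = 0.3  # assumed decay constant; larger values discount low positions faster

def position_weight(position, decay=DECAY):
    """Weight a ranking position; position 1 gets weight 1.0."""
    return math.exp(-decay * (position - 1))

# Hypothetical rankings: keyword phrase -> current Google position
rankings = {"seo techniques": 4, "keyword density": 12, "backlink strategy": 2}

for phrase, pos in rankings.items():
    print(f"{phrase!r} at position {pos}: weight {position_weight(pos):.3f}")
print(f"Total weighted visibility: {sum(position_weight(p) for p in rankings.values()):.3f}")
```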
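
Content keyword density also has a simple operational definition: occurrences of the target keyword divided by the total number of words on the page. Below is a minimal sketch of that calculation; the sample content and keyword are made up, and a real implementation would also handle multi-word keyword phrases.

```python
# Minimal sketch of a keyword-density calculation:
# occurrences of the target keyword / total word count.
import re

def keyword_density(content, keyword):
    words = re.findall(r"[a-z0-9']+", content.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word == keyword.lower())
    return hits / len(words)

page = "Search engine optimization helps pages rank. Optimization takes patience."
print(f"{keyword_density(page, 'optimization'):.2%}")  # 2 of 9 words -> 22.22%
```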

References

Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual Web search engine. Computer networks and ISDN systems, 30(1), 107-117.

Parikh, A., & Deshmukh, S. (2013, November). Search Engine Optimization. International Journal of Engineering Research and Technology, 2(11), 3146-3153.


What techniques exist for predicting drug interactions?

By Devin G. Bost

Feb. 17, 2014

Problem One: Identifying the metabolites

Many people overlook the fact that after a drug is consumed, it is broken down by the body. A finite set of mechanisms exists to process the drug. The cytochrome P450 family of enzymes is key in metabolizing most known drugs, breaking the drug down, primarily in the liver, into various components. Many of those components, called metabolites, are still biologically active. In some cases, the metabolites of a drug are much more reactive than the initial drug; thalidomide is a famous example of this. Other good examples include intermediates, coordination complexes, and adducts formed by metabolites of drugs of abuse. Ethanol, for example, is much less dangerous than its metabolite, acetaldehyde, which reacts with DNA, initiating carcinogenesis, and is highly volatile and flammable at room temperature.

Solution: Representing the metabolites as an XML data tree

Large chemical databases allow developers to computationally predict the potential metabolites formed by a given drug. XML is a hierarchical data structure, and recent advances in SQL database technology have improved the functionality and usability of this data type. Using known chemical degradation processes, stored procedures (in the database) can be used to compute each potential metabolite of a given source drug. Mathematical limitations of SQL stored procedures can be overcome by linking to external software libraries. For example, using Microsoft SQL Server 2012, a database administrator can easily import a dynamic-link library (DLL) to extend database functionality with a program written in an object-oriented language (e.g. C++, C#, or VB.NET) running on the Common Language Runtime (CLR). This technique enables database developers to extend the typical functionality of the database and perform advanced computations with external software libraries designed for mathematical modeling and simulation. Using an XML data type, databases can store the entire computed tree of metabolites for each initial drug. After the computations are performed, the data is easily stored in the database for later use.
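
As a rough illustration of the idea (outside the database), the sketch below builds such a metabolite tree as XML in Python. The hard-coded biotransformation rules and compounds are hypothetical stand-ins for what would really be derived from curated metabolism databases and computed by stored procedures.

```python
# Minimal sketch: expand a source drug into an XML tree of potential metabolites.
# METABOLIC_RULES is a tiny, hypothetical stand-in for curated biotransformation data.
import xml.etree.ElementTree as ET

METABOLIC_RULES = {
    "ethanol": ["acetaldehyde"],
    "acetaldehyde": ["acetic acid"],
}

def build_metabolite_tree(compound, parent=None):
    """Recursively attach each known metabolite of `compound` as a child node."""
    if parent is None:
        node = ET.Element("compound", name=compound)
    else:
        node = ET.SubElement(parent, "metabolite", name=compound)
    for child in METABOLIC_RULES.get(compound, []):
        build_metabolite_tree(child, node)
    return node

tree = build_metabolite_tree("ethanol")
print(ET.tostring(tree, encoding="unicode"))
# -> <compound name="ethanol"><metabolite name="acetaldehyde"><metabolite name="acetic acid" /></metabolite></compound>
```

The resulting XML document is exactly the kind of value that could be stored in a SQL Server XML column for later querying.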

Problem Two: How do we identify interactions between drugs and/or metabolites?

According to (Wishart & Materi, 2007), “In the areas of drug discovery and development, [pharmacokinetic (PK)] modeling might be regarded as one of the first and most successful examples of computational system biology,” and, considering also the discoveries of (Huisinga, Telgmann, & Wulkow, 2006) and (Mager, 2006), “A key limitation of ODEs or systems of ODEs is the need for complete and quantitative data on concentrations, reaction rates, diffusion rates, degradation rates and many other parameters.” With special thanks to Henry Eyring.[1]

Solution

Large metabolomics databases, such as the ones listed by (Wishart & Materi, 2007), reduce the number of computational predictions required of the developer. Furthermore, these databases provide a means for testing the accuracy of predictions against known empirical data and experimental findings. Developing and integrating the models for computing and joining the required parameters then remains the ultimate task for the developer.
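
To make the parameter requirement in the quote above concrete, here is a minimal sketch of the simplest ODE-based pharmacokinetic model: one compartment with first-order absorption and elimination. The rate constants and dose are illustrative assumptions rather than values for any real drug, and a realistic interaction model would couple many such equations through shared enzymes and transporters.

```python
# Minimal sketch of a one-compartment pharmacokinetic ODE model.
# ka, ke, and dose are assumed, illustrative values.
import numpy as np
from scipy.integrate import solve_ivp

ka, ke = 1.0, 0.2   # absorption and elimination rate constants (1/h), assumed
dose = 100.0        # administered dose (mg), assumed

def pk_model(t, y):
    gut, plasma = y
    return [-ka * gut,                # drug leaving the gut compartment
            ka * gut - ke * plasma]   # drug entering plasma, then being eliminated

sol = solve_ivp(pk_model, t_span=(0, 24), y0=[dose, 0.0],
                t_eval=np.linspace(0, 24, 25))
print(sol.y[1])  # amount of drug in plasma at each hour over 24 hours
```

Every constant in this toy model (ka, ke, dose) is exactly the kind of parameter the quoted limitation refers to: each must be measured or estimated before the system of ODEs can be solved.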

References

Huisinga, W., Telgmann, R., & Wulkow, M. (2006). The virtual laboratory approach to pharmacokinetics: design principles and concepts. Drug discovery today, 11(17), 800-805.

Mager, D. E. (2006). Quantitative structure–pharmacokinetic/pharmacodynamic relationships. Advanced drug delivery reviews, 58(12), 1326-1356.

Wishart, D. S., & Materi, W. (2007). Current progress in computational metabolomics. Briefings in Bioinformatics, 8(5), 279-293.


[1] Henry Eyring is best known for his contributions to transition state theory. He was recognized with numerous awards in chemistry, including the Wolf Prize in Chemistry and the National Medal of Science. Interestingly, Henry Eyring is the father of a religious leader, Henry B. Eyring. As of Feb. 17, 2014, Henry B. Eyring is serving as a member of the First Presidency of The Church of Jesus Christ of Latter-day Saints.

What are the costs of medical errors in the United States?

By Devin G. Bost

Feb. 17, 2014

Introduction

The estimated cost of medical errors in the United States reached $17.1 billion in 2008 (Van Den Bos, Rustagi, Gray, Ziemkiewicz, & Shreve, 2011). In 1997, the estimated cost of medication-related errors in U.S. nursing homes alone reached $7.6 billion (Desai, Williams, Greene, Pierson, & Hansen, 2011). These errors represent an economic tragedy, as well as a serious opportunity for both the public and private sectors to find ways to leverage information technology to reduce medication prescribing mistakes.

How do we reduce medical errors?

Although scientists have gained a reasonable understanding of the relationships between novel drugs and the receptor sites they bind to, a significantly overlooked complication to these models results from reactions undergone by metabolites of the given drug. In other words, scientists often don’t really know what interactions are occurring between the metabolites of various combinations of drugs. This lack of awareness is only exacerbated in patients with complicated disease states (e.g. diabetes) and complex multi-drug regimens that evolve over time. Luckily, modern technology is changing the way we see these problems, particularly as genomic data becomes more widely available.

What are the industry trends?

Thanks to the Human Genome Project, interest in genomic data is gaining momentum. As more data becomes available, more researchers are opening up access to their data to allow further collaborative research. Databases form the cornerstone of this research. Unfortunately, without common naming conventions or internal references to well-known database schemas, significant variability occurs in the data from one database to another. SQL join operations require data elements to match exactly, so enforcing relationships between databases introduces new integration challenges for the developer. Regarding what is commonly referred to as “big data,” technologies such as Hadoop have emerged for handling large volumes of unstructured data in a high-performance manner. Unfortunately for us, integrating genomic data to perform computational analysis requires data to be structured and relationships to be enforced. Luckily, drugs, for example, can be joined on CAS number, a universally recognized unique identifier, and large ontologies have emerged to join diagnostic terminology to disease states and various forms of adverse events. Other data entities can be joined to the data from PubChem and other popular databases.
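
As a small illustration of the exact-match join described above, the sketch below merges two made-up tables on CAS number. The rows are hypothetical; real records would come from PubChem, DrugBank, or similar sources.

```python
# Minimal sketch of joining two data sources on CAS number.
# The table contents are illustrative, not real extracts from any database.
import pandas as pd

drugs = pd.DataFrame({
    "cas_number": ["50-78-2", "58-08-2"],
    "drug_name":  ["aspirin", "caffeine"],
})
targets = pd.DataFrame({
    "cas_number": ["50-78-2", "58-08-2"],
    "target":     ["COX-1", "adenosine receptor A2A"],
})

# An exact-match (inner) join: rows combine only when the CAS numbers
# agree character for character, which is why consistent identifiers matter.
merged = drugs.merge(targets, on="cas_number", how="inner")
print(merged)
```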

What’s the value of predicting drug interactions?

Assuming that the relationships can be enforced and the data can be properly integrated, the developer now faces a serious question: “How do we computationally model, simulate, and predict the net effects of drug interactions?” Such a model would allow clinical decision makers to choose medicines in a much more deterministic manner. For example, in theory, a physician could use such software to choose analogues of a particular drug to avert drug interactions that would otherwise be caused by a drug for which no analogues currently exist. Such software would also allow physicians to determine when particular drug combinations would be too dangerous to try on patients with complicated medical histories or diseases for which cures have not yet been discovered.

References

Desai, R., Williams, C. E., Greene, S. B., Pierson, S., & Hansen, R. A. (2011). Medication errors during patient transitions into nursing homes: characteristics and association with patient harm. The American journal of geriatric pharmacotherapy, 9(6), 413-422.

Van Den Bos, J., Rustagi, K., Gray, T. H., Ziemkiewicz, E., & Shreve, J. (2011). The $17.1 billion problem: the annual cost of measurable medical errors. Health Affairs, 30(4), 596-603.

The emergent state of drug development:

Synthetic biology in the 21st century

Drug development is emerging with a new twist. One of the greatest discoveries of the Human Genome Project was that our genome changes over time, not just at conception. Scientists recently discovered that if someone were to sample my DNA today and again in one year, my genetics would look totally different.

Due to this discovery, scientists have been racing to develop DNA sequencers and reduce their costs. The pricing trajectory of these devices resembles the pricing of computers in the early 1980s, when computers were the size of a small room and cost more than $500,000 to own. Today, less than 40 years later, computers are the size of your palm and cost only hundreds of dollars.

The best thing this new genetic revolution has introduced, however, is the onset of “big data.” More specifically, this emergent paradigm shift has introduced large, multinational collaborative efforts to collect, store, and share insights into this massive wave of genetic information. As insights emerge, further data emerges. As data emerges, discoveries occur. Those discoveries result in further funding, some from governments, but mostly from philanthropists, charities, and private research organizations. Educational institutions also have a vested interest in preparing and providing additional data to assist in this collaborative effort.

Examples include:

  • Ensembl
  • GenBank
  • DrugBank.ca
  • BLAST
  • ChEMBL
  • PharmGKB
  • PubChem
  • BindingDB


Thankfully, this data is available for development of novel applications. For my purposes, I wish to discover ways of computing drug interactions. These interactions can be predicted by evaluating enzyme interactions, drug metabolism, binding site interactions, and properties associated with those binding sites (e.g. pKi/binding affinity, diffusion coefficients, etc.). Thus, we can obtain a deterministic, differential model for computing the net effects of complex drug interactions in patients taking multiple medications. This will help providers determine whether or not it is safe to prescribe a particular medication for patients with complicated medical histories. It will also help providers prevent dangerous adverse events by helping them determine when a particular drug may push a patient past a “tipping point,” from contributing insignificantly to an adverse event to putting the patient in danger. As data emerges, we will see the paradigm shift away from typical drug discovery toward a protein- and enzyme-driven approach. I will share more on this later.

Developing Successful Business Information Systems Architecture (ISA)

Customer-driven business development

Devin Bost

Feb. 1, 2014

Problem:

“Across a wide spectrum of markets and countries, [information technology] is transcending its traditional ‘back office’ role and is evolving toward a ‘strategic’ role with the potential not only to support chosen business strategies, but also to shape new business strategies” (Henderson & Venkatraman, 1993). What Henderson & Venkatraman recognized over twenty years ago is truer now than ever before. Today, businesses that fail to adopt this concept will never reach their true potential. Ten years later, Davenport & Short identified that this evolution (of the use of technology to drive business strategy) invalidates traditional philosophies in business management: “The conventional wisdom in IT usage has always been to first determine the business requirements of a function, process, or other business entity, and then to develop a system. The problem is that an awareness of IT capabilities can – and should – influence process design” (Davenport & Short, 2003). Today, these perspectives have changed the way strategic business decisions are made. Marketplace globalization has introduced complexities that early pioneers could never have imagined. According to (Teece, 2010), businesses must respond to change more rapidly now than ever. Businesses that are slow to change may lose new market opportunities, fail to meet new regulatory demands, and face risks that are often fatal to the business. According to more contemporary insights, agility has emerged as the true measure of business success: “Faced with rapid and often unanticipated change, agility, [is] defined as the ability to detect and respond to opportunities and threats with ease, speed, and dexterity. . . If IT infrastructure is scalable or adaptable, firms may be better able to implement their market response strategy with ease, speed, and dexterity, and so IT infrastructure flexibility could be viewed as a response capability” (Tallon & Pinsonneault, 2011).

Successful ISA:

  • Increases business agility by improving productivity
  • Increases collaboration (which drives innovation) on projects by improving access, availability, sharing, collection, and reporting of information
  • Improves the density of evidence-based decisions (by reducing time and cost of collecting, measuring, analyzing, and reporting data from internal studies)
  • Improves customer value (by improving response time to customer feedback, improving quality, and increasing customer participation in the business development process), thus creating “customer-driven business development”

To create a successful ISA, companies must align with technological advancements, develop modular business processes, and change traditional management perspectives to develop IT-driven strategy. It is not uncommon for executive management teams to overlook the importance of involving interdisciplinary software architecture teams in executive meetings. Failure to include software architects in executive decisions may produce business objectives that are poorly aligned with technological capabilities. As a consequence, poor alignment with technology may result in technological disasters that reduce business agility rather than increase it. These situations occur frequently in the development of business intelligence, or software reporting intended primarily for executive decision makers. According to (Yeoh & Koronios, 2010), “In order for [business intelligence] initiatives to be taken seriously and to be supported by corporate leadership, they need to be integrated with the overall strategy. Otherwise they will not receive the leadership support that is required to make them successful. . . A BI system that is not business driven is a failed system! BI is a business-centric concept. Sending IT off to solve a problem rarely results in a positive outcome.”

References

Davenport, T. H., & Short, J. E. (2003). Information technology and business process redesign. Operations management: critical perspectives on business and management, 1, 1-27.

Henderson, J. C., & Venkatraman, N. (1993). Strategic alignment: Leveraging information technology for transforming organizations. IBM systems journal, 32(1), 4-16.

Tallon, P. P., & Pinsonneault, A. (2011). Competing perspectives on the link between strategic information technology alignment and organizational agility: Insights from a mediation model. MIS Quarterly, 35(2), 463-484.

Teece, D. J. (2010). Business models, business strategy and innovation. Long range planning, 43(2), 172-194.

Yeoh, W., & Koronios, A. (2010). Critical success factors for business intelligence systems. Journal of computer information systems, 50(3), 23-32.